Diving Deeper with Lightrun

Last modified: September 9, 2022

1. Introduction


In our previous article, we introduced Lightrun – a Developer Observability platform. In this article, we’re going to look deeper into the features that it offers, how we can best use them with our applications, and what we can get from it.


2. Snapshots


In our last article, we briefly examined what snapshots are and what they can do for us. Here, we’re going to look in more depth at how they work, how we can best use them, and which options they offer.


Snapshots are similar to debugger breakpoints. We can register a snapshot on any line of code in our application. Every time it’s triggered, it automatically records the full stack trace and the value of every visible variable. Just like normal breakpoints, this includes local variables, method parameters, and class fields, for every frame up the call stack.


The main difference between snapshots and debugger breakpoints is that snapshots are non-intrusive. They don’t cause anything to block, whether the entire application or just the executing thread. They record the current state of execution, and the application carries on without interruption.


Traditional blocking breakpoints, such as those in a debugger, cause either the single thread or the entire application to pause whilst we look at the details. Lightrun lets us do all of this without affecting the live application at all.


2.1. Placing Snapshots


Snapshots are placed into our application directly from our code editor. In this article, we are using IntelliJ IDEA, but everything can also be achieved from Visual Studio Code. First, we need to decide where in the application we want to place a snapshot. We can then right-click on that line of code and select “Lightrun > Snapshot (Virtual Breakpoint)” from the menu:


place snapshot

Doing this will then open a dialog allowing us to specify the details of the snapshot:


create snapshot

The default behavior is relatively simple, but this is often what is most useful. It will:


  • Take a snapshot of the exact line selected.
  • Apply no conditions on when to trigger the snapshot.
  • Record no extra expressions with the snapshot.
  • Record only the first time the snapshot is triggered.
  • Expire one hour after it was added.

This means that it will record the exact state of execution the next time this line of code is executed, as long as that happens within the next hour. When we’re diagnosing an issue, this is often the most useful setup, because we control exactly what is recorded and avoid noise in the recordings. We’ll look at each of these options later in the article.


Once we have done this, a blue camera icon is placed next to the line of code that the snapshot is registered for. This indicates that our snapshot has been successfully placed and will record when triggered:


snapshot added

When the snapshot gets triggered, we’ll automatically have the details available to see in our editor. This looks and functions almost exactly the same as the IntelliJ breakpoint panel, because it is designed to serve the same purpose:


snapshot triggered

Here we can immediately see the full stack trace for our execution and the variables that were in scope. This includes the variable this, inside which we can see the fields of the current class instance. We can drill into these as far as we want, and click into other frames in the stack trace, in exactly the same way as with a traditional debugger.
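To make the default trigger rules from the Snapshot dialog concrete, here is a small plain-Java model of them. It is not Lightrun’s implementation or API: the SnapshotPolicy class and its tryCapture method are hypothetical names, and the sketch only encodes the documented defaults of no condition, a maximum hit count of one, and a one-hour expiry.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of the documented snapshot defaults; not a Lightrun API.
final class SnapshotPolicy {
    private final Instant expiresAt;
    private final int maxHits;
    private int hits;

    SnapshotPolicy(Instant createdAt, Duration expiry, int maxHits) {
        this.expiresAt = createdAt.plus(expiry);
        this.maxHits = maxHits;
    }

    // Defaults from the dialog: no condition, first hit only, expires after one hour.
    static SnapshotPolicy defaults(Instant createdAt) {
        return new SnapshotPolicy(createdAt, Duration.ofHours(1), 1);
    }

    // Called each time the line executes; returns true if this execution would be recorded.
    synchronized boolean tryCapture(Instant now) {
        if (!now.isBefore(expiresAt) || hits >= maxHits) {
            return false; // expired, or maximum hit count already reached
        }
        hits++;
        return true; // the caller would record state here; nothing blocks or pauses
    }
}
```

In this model, a larger maxHits or a longer Duration plays the role of the maximum hit count and Expiry time options; for instance, new SnapshotPolicy(t0, Duration.ofDays(2), 10) would record up to ten executions over a weekend.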


2.2. Conditional Snapshots


In some scenarios, we want to record our snapshots only when certain conditions are met, for example, only when the current user has a particular username.


Lightrun snapshots allow us to specify a condition as part of setting them up. This is very similar to the way conditional breakpoints work in our debugger. Our condition is specified as a Java expression that evaluates to either true or false. This can access anything that would be visible at the point the snapshot is triggered, meaning any local variables, parameters, class fields, or anything else.


For example, imagine we have a method with a parameter id. We want to record a snapshot when this method is called, but only when the ID provided is a particular value. We can set this up with a condition that compares id against that value:


conditional snapshot

This means that the snapshot will only be triggered when the method is called with our particular test value. Any other live usage of the service will be ignored unless it happens to use the same value. This ensures that what we see in our snapshots pane is exactly what we want, without extra noise cluttering it up and making our diagnosis harder.
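Conceptually, the condition acts as a guard evaluated in the scope of the line where the snapshot sits. The sketch below models that with a plain java.util.function.Predicate. The ConditionalSnapshotDemo class, the findTask method, and the "test-123" value are all hypothetical, and with Lightrun the condition is typed into the dialog as a Java expression rather than compiled into our code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model: a conditional snapshot records only calls whose in-scope values match.
final class ConditionalSnapshotDemo {
    // Stand-in for the condition typed into the Snapshot dialog, e.g. "test-123".equals(id).
    private final Predicate<String> condition;
    private final List<String> recordedIds = new ArrayList<>();

    ConditionalSnapshotDemo(Predicate<String> condition) {
        this.condition = condition;
    }

    // Hypothetical service method that the snapshot is placed on.
    String findTask(String id) {
        if (condition.test(id)) {
            recordedIds.add(id); // a real snapshot would capture the stack and variables here
        }
        return "task-" + id; // normal execution continues whether or not we recorded
    }

    List<String> recordedIds() {
        return List.copyOf(recordedIds);
    }
}
```

Putting the literal first, as in "test-123".equals(id), keeps the expression null-safe if id can be null.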


2.3. Additional Expressions


In some situations, we may have additional values that we want to record as part of a snapshot.


These might be computations from other values that make life easier, for example, drilling into nested values to surface them more easily. They might also be calls to get values that otherwise wouldn’t be recorded, for example, from static accessors such as RequestContextHolder or SecurityContextHolder. They can even be method calls on any of the values that we can see, recording the result of those methods.


Expressions are added in a very similar way to conditions, by entering the expression to be recorded into the Snapshot dialog:


snapshot expressions

We can add as many expressions as needed to a single snapshot, and all of them will be calculated and recorded whenever the snapshot is triggered.


These values then appear in the recorded snapshot as part of the “Variables” pane, with a different icon to indicate that they were manually added expressions instead of automatically detected variables:


snapshot variables

2.4. Recording Multiple Snapshots


In some situations, we might want to record multiple snapshots from the same place. For example, we might want to run several slightly different requests through the system and be able to compare the snapshots for them to identify the differences.


By default, Lightrun Snapshots record only a single snapshot, but we can configure them to record as many as we want by entering a maximum hit count into the Snapshot dialog:


multiple snapshot

Doing this will then record snapshots for this number of executions, and make them available to us in our editor:


view multiples snapshots

Now that we have multiple snapshots to work with, we need to know which one is which. By clicking on the “i” icon beside the “Snapshot” tabs, we get an information dialog about that specific snapshot:


snapshot info

Here we can see the server instance that the snapshot was recorded from and the time that it was recorded. We can now record as many snapshots as we need and determine which ones are which so that we can better diagnose what’s going on.


2.5. Automatically Expiring Snapshots


Recording snapshots has a small performance cost for our application. Snapshots also transfer data from our application to the Lightrun servers, which can potentially incur costs. This means that whilst snapshots are immensely useful for diagnosing issues, we want to ensure that they don’t hang around longer than they’re needed. Lightrun solves this by automatically expiring snapshots, so they only affect our application for as long as we actually need them.


By default, snapshots will automatically be disabled after 1 hour. We can set this to an incredibly short period of time if we want to do some focused testing ourselves with minimal impact. Alternatively, we can set this to a very long period of time, for example, to capture any occurrences of a particular issue happening overnight, over a weekend, or even longer.


We can adjust how long the snapshot stays active from the “Advanced” section of the Create Snapshot dialog. This gives us an extra option for the Expiry time, which lets us specify, in hours, minutes, and seconds, how long the snapshot will be active:


snapshot expiry

After this time, the snapshot remains present so that the recorded snapshots are still available. However, it stops recording anything, even if we haven’t yet reached the maximum hit count. When the expiry time passes, the camera icon for our snapshot turns red to indicate that it is no longer active:


stop recording

Note that the snapshot remains in our system because otherwise, the recorded data wouldn’t be available. However, it won’t record any more data unless it is re-enabled.


3. Logs


Another facility that Lightrun offers is the ability to add logging statements to our application dynamically, without needing to change or restart anything.


Logs are configured in a similar way to Snapshots, but they serve a different purpose. Snapshots record the exact state of the thread at the time they are triggered. Logs instead write the information we ask for to the log stream.


This means that log messages, whether built into the application or added dynamically by Lightrun, are combined into the same log stream, giving us a fuller picture of exactly what is happening.


3.1. Adding Dynamic Logs


Adding dynamic logs with Lightrun is done in a very similar way to adding Snapshots. We right-click on the line we want to add the log statement before and select “Lightrun > Log” from the menu:


dynamic log

This opens a dialog where we can configure the dynamic log statement and then add it to our running application:


create log

This lets us specify the log message to output, which can include dynamic expressions. We can also specify a condition required to trigger the log message, in exactly the same way as snapshot conditions work.


By default, these log messages expire after 1 hour, but, as with snapshots, we can change this by clicking on the “Advanced” button.


Log messages also have a logging level, which defaults to INFO, but we can change it to DEBUG, WARN, or ERROR as desired.


Once we’ve added a log statement, the editor will indicate this in the code view to show where the log statement is and what it is doing:


log statement

3.2. Viewing Logs


By default, our dynamic log messages are written out using Java Util Logging. In this case, we are able to see them interleaved with any other log messages that the application produces, which can give more information:


view logs

It is also possible to have the log messages sent to our editor to view locally. These can be seen in the Lightrun console similar to how we see Snapshots. This can be very useful if we want to add logging to a system without adding extra noise to the output log files, especially if those logs are being consumed by other team members or other systems:


lightrun console logs

We can change where the log output goes by opening the agent menu in the sidebar and selecting Log Piping:


log piping

From here we can select App, which writes through Java Util Logging inside the application; Plugin, which writes to the Lightrun plugin active in our editor; or Both. Note that this setting applies to an entire Lightrun agent, not to individual log messages.


Because of the way the Lightrun agent works, the Java Util Logging config is not the standard one from the application. Instead, there are some Lightrun agent flags that are needed to configure the destination and output format of the Lightrun dynamic logger when it’s writing to Java Util Logging.


3.3. Logging Expressions


Logging simple strings is already useful, but logging values from the application is significantly more useful. Just as our snapshots can include custom expressions, so can our logs.


When we do this with logs, we’re adding the expression directly into the log message. This is done by wrapping the expression in curly braces:


Searching tasks: status={status}, createdBy={createdBy}

When we do this, any of these expressions will be automatically expanded when the log statement is generated:


logging expressions

These expressions can be anything that can be determined at the point the log statement is generated, in the exact same way as for snapshots.


These expressions can sometimes take a lot of CPU time to calculate. If this happens, Lightrun may automatically pause that log so that it does not interfere with the running of the application. As such, it is recommended to keep logging expressions as simple as possible.
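To illustrate what the placeholder expansion does, here is a simplified plain-Java model that replaces each {name} with a value from a map of in-scope variables. This is an assumption-laden sketch rather than Lightrun’s implementation: the real agent evaluates Java expressions inside the braces, whereas this version only looks up bare variable names, and LogTemplate is a hypothetical name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of dynamic log message expansion.
final class LogTemplate {
    // Matches {identifier}; real Lightrun placeholders can hold richer expressions.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    static String expand(String template, Map<String, ?> scope) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            // Unknown names are left untouched so the problem stays visible in the output.
            String replacement = scope.containsKey(name) ? String.valueOf(scope.get(name)) : m.group();
            // quoteReplacement stops '$' or '\' in values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With status bound to "PENDING" and createdBy bound to "alice" (both hypothetical values), the message from this section would expand to "Searching tasks: status=PENDING, createdBy=alice".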


4. Metrics


The final action that we can perform with Lightrun is recording metrics about our application. This lets us see usage details of our application, for example, how often certain things happen or how long they take.


In the same way as with Snapshots and Logs, Metrics are added by right-clicking on the appropriate line of code and selecting “Lightrun > Metrics”:


lightrun metrics

Immediately we can see that this is slightly different, because there are several types of metric that we can add:


  • Counter – This records a simple count of the number of times a line of code was executed.
  • Time Measure – This records the time it takes to get between two lines of code.
  • Method Duration – This records the time it takes between entering and exiting a method.
  • Custom Metric – This uses a custom expression to generate the metric, based on values available in the code.

In each case, we get the standard Lightrun dialog for creating our metric. This lets us configure the metric, including adding a name for the metric, conditions under which it is triggered, and an expiry time after which it stops working – in the exact same way that we can for Snapshots and Logs.


These metrics are output to the logging process by default, but can also be integrated into StatsD, Prometheus, and other tools if desired.


4.1. Counters


Counters are a simple measure of the number of times some code was executed. Every time our line of code is reached, the counter increments by 1, and we can then see how often this has happened.


Adding a counter is done by selecting Counter from the Lightrun menu and then filling in the dialog:


lightrun counter

Most of this is fairly standard. The only bit that is unusual is the “Name” field – we need to give every counter a unique name so that we can track them all.


In particular, one powerful feature here is that we can set up counters – as with all of the metrics – that have conditions attached to them. This gives us the ability to count the number of times a certain line of code is reached only when other conditions are met, for example only for a certain user or affecting certain records.


Adding a counter doesn’t do anything immediately. However, once it is triggered for the first time, it starts reporting the metric values in the same way that logs are output, to whichever destination the log piping setting selects.


Our editor shows the values from the metric every 10 seconds:


log metrics

Whereas the log output shows the metric every second:


log output metrics

This output is controlled by exactly the same setting as log output, and interleaves our metrics with our logs to give a better picture of what is happening. It lets us track how the metric changes over time, and so see the rate at which the code is triggered.
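A conditional counter boils down to a thread-safe increment guarded by a predicate. The sketch below models that in plain Java; ConditionalCounter, its hit method, and the "alice" user are hypothetical, whereas Lightrun adds the counter to the running application without any change to our source.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Predicate;

// Hypothetical model of a named counter metric with an optional condition.
final class ConditionalCounter<T> {
    private final String name;
    private final Predicate<T> condition;
    private final LongAdder count = new LongAdder(); // thread-safe, cheap under contention

    ConditionalCounter(String name, Predicate<T> condition) {
        this.name = name;
        this.condition = condition;
    }

    // Call where the instrumented line would run; counts only when the condition holds.
    void hit(T context) {
        if (condition.test(context)) {
            count.increment();
        }
    }

    long value() {
        return count.sum();
    }

    String name() {
        return name;
    }
}
```

LongAdder is used instead of AtomicLong because many threads may reach the same line at once; it makes reads slightly more expensive in exchange for cheaper concurrent increments.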


4.2. Time Duration


Where Counters measure the number of times a line of code was executed, a Time Duration measures how long the code takes to run. These are also called “TicToc” metrics: Tic to start recording and Toc to stop recording, like the sound of a clock.


When creating a time duration, we need to configure exactly which piece of code we’re measuring. This is done by specifying the lines to start and stop recording on – either by selecting a block of code before opening the dialog or by entering the line numbers within the dialog:


time duration

Other than this, creating a time duration is the same as creating a counter.


Once created, these metrics start outputting immediately, rather than waiting to be triggered for the first time. The output goes to both the plugin and the application’s standard output, exactly as for counters, and shows us how often the section of code was run and the fastest, slowest, and mean times it has taken to run:


time duration metrics
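The TicToc idea maps onto a familiar pattern: read a clock at the start line, read it again at the end line, and fold the difference into running statistics. The following is a minimal sketch using System.nanoTime and java.util.LongSummaryStatistics; TicToc, tic, and toc are hypothetical names, and it models only the count, fastest, slowest, and mean figures described above.

```java
import java.util.LongSummaryStatistics;

// Hypothetical model of a "TicToc" duration metric between two points in the code.
final class TicToc {
    private final LongSummaryStatistics nanos = new LongSummaryStatistics();

    // "Tic": read the start time (nanoTime is monotonic, unlike currentTimeMillis).
    long tic() {
        return System.nanoTime();
    }

    // "Toc": fold the time elapsed since the matching tic into the statistics.
    synchronized void toc(long start) {
        nanos.accept(System.nanoTime() - start);
    }

    synchronized long count() { return nanos.getCount(); }
    synchronized long fastestNanos() { return nanos.getMin(); }
    synchronized long slowestNanos() { return nanos.getMax(); }
    synchronized double meanNanos() { return nanos.getAverage(); }
}
```

Each caller keeps its own start value (the local returned by tic()), so overlapping executions on different threads don’t mix up their timings.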

4.3. Method Duration


Method Duration metrics are essentially the same as Time Duration, but instead of specifying start and end lines we specify the full method:


method duration

Once added, this functions exactly like a time duration metric covering the whole method body. The start time is when the method is entered, and the end time is when it is left, whether by a return or an exception:


method duration logs

We can see from the output here that these are actually “TicToc Log” entries too, only Lightrun has automatically determined the start and stop points for us based on the method itself and not based on lines of code.
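Measuring from method entry to method exit, whether the method returns or throws, is exactly what a try/finally block gives us in plain Java. This sketch wraps a call that way; MethodTimer and timed are hypothetical names, whereas Lightrun determines these start and stop points from the method itself, with no change to our code.

```java
import java.util.LongSummaryStatistics;
import java.util.function.Supplier;

// Hypothetical model of a method-duration metric covering both returns and exceptions.
final class MethodTimer {
    private final LongSummaryStatistics nanos = new LongSummaryStatistics();

    <T> T timed(Supplier<T> method) {
        long start = System.nanoTime(); // method entered
        try {
            return method.get();
        } finally {
            record(System.nanoTime() - start); // method left, by return or by exception
        }
    }

    private synchronized void record(long elapsedNanos) {
        nanos.accept(elapsedNanos);
    }

    synchronized long count() {
        return nanos.getCount();
    }
}
```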


4.4. Custom Metric


Our final metric is the custom metric. This lets us aggregate any numbers that are available in our code, whatever they represent. For example, we might want to track the number of records returned in some search results.


When creating this type of metric, we’re required to specify an expression. This expression returns the number that our metric will aggregate, and, as elsewhere, it can be any expression that can be calculated at this point in our code:


custom metrics

When these metrics are output, they show us the number of times the metric was triggered and the maximum, minimum, and mean values of our expression:


custom metrics logs
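The count, minimum, maximum, and mean reported here are the same figures that Java’s java.util.IntSummaryStatistics keeps. As a hedged illustration, assuming our expression were the size of a search result list, the aggregation could be modelled as below; SearchResultSizes and the sample lists are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.List;

// Hypothetical model of a custom metric: each trigger contributes one number from an expression.
final class SearchResultSizes {
    private final IntSummaryStatistics stats = new IntSummaryStatistics();

    // Stand-in for evaluating an expression such as results.size() when the line runs.
    synchronized void record(List<?> results) {
        stats.accept(results.size());
    }

    // Returns a copy so callers get a consistent view of count, min, max, and mean.
    synchronized IntSummaryStatistics snapshot() {
        IntSummaryStatistics copy = new IntSummaryStatistics();
        copy.combine(stats);
        return copy;
    }
}
```

Recording result lists of sizes 2, 0, and 4 would give a count of 3, a minimum of 0, a maximum of 4, and a mean of 2.0.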

5. Summary


Here we have covered in detail the main ways that Lightrun can give us more insight into our applications and help us better understand how they work.


Why not use it in your next application, to better understand what is going on or to help diagnose any issues that arise?