Microbenchmarking with Java

Last modified: July 24, 2017

1. Introduction


This quick article focuses on JMH (the Java Microbenchmark Harness). First, we'll get familiar with the API and learn its basics. Then we'll look at a few best practices we should consider when writing microbenchmarks.


Simply put, JMH takes care of the things like JVM warm-up and code-optimization paths, making benchmarking as simple as possible.


2. Getting Started


To get started, we can actually keep working with Java 8 and simply define the dependencies:

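A typical Maven setup declares both artifacts; the version below is only illustrative, so check Maven Central for the current release:

```xml
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.19</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.19</version>
</dependency>
```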


The latest versions of the JMH Core and JMH Annotation Processor can be found in Maven Central.


Next, create a simple benchmark by utilizing the @Benchmark annotation (in any public class):


@Benchmark
public void init() {
    // Do nothing
}

Then we add the main class that starts the benchmarking process:


public class BenchmarkRunner {
    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }
}

Now running BenchmarkRunner will execute our arguably somewhat useless benchmark. Once the run is complete, a summary table is presented:


# Run complete. Total time: 00:06:45

Benchmark        Mode  Cnt           Score          Error  Units
BenchMark.init  thrpt  200  3099210741.962 ± 17510507.589  ops/s

3. Types of Benchmarks


JMH supports four benchmark modes: Throughput, AverageTime, SampleTime, and SingleShotTime. These can be configured via the @BenchmarkMode annotation:


@Benchmark
@BenchmarkMode(Mode.AverageTime)
public void init() {
    // Do nothing
}

The resulting table will have an average time metric (instead of throughput):


# Run complete. Total time: 00:00:40

Benchmark       Mode  Cnt   Score  Error  Units
BenchMark.init  avgt   20  ≈ 10⁻⁹         s/op
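The two modes are consistent with each other, since average time per operation is simply the reciprocal of throughput. This small standalone sketch (not part of the benchmark) plugs in the throughput score from the earlier table:

```java
public class ModeRelation {
    public static void main(String[] args) {
        // Throughput score from the table in section 2 (ops/s)
        double opsPerSec = 3_099_210_741.962;
        // Implied average time per operation (s/op)
        double secPerOp = 1.0 / opsPerSec;
        // Sub-nanosecond, which JMH reports as "≈ 10⁻⁹ s/op"
        System.out.println(secPerOp);
    }
}
```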

4. Configuring Warmup and Execution


By using the @Fork annotation, we can set up how the benchmark execution happens: the value parameter controls how many times the benchmark will be executed, and the warmups parameter controls how many times a benchmark will dry run before results are collected, for example:


@Benchmark
@Fork(value = 1, warmups = 2)
public void init() {
    // Do nothing
}

This instructs JMH to run two warm-up forks and discard their results before moving on to real, timed benchmarking.


Also, the @Warmup annotation can be used to control the number of warmup iterations. For example, @Warmup(iterations = 5) tells JMH that five warm-up iterations will suffice, as opposed to the default 20.


5. State


Let’s now examine how a less trivial and more indicative task of benchmarking a hashing algorithm can be performed by utilizing State. Suppose we decide to add extra protection from dictionary attacks on a password database by hashing the password a few hundred times.
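The iterated-hashing idea itself can be sketched in plain Java (without JMH). This standalone sketch uses the JDK's SHA-256 rather than Guava's murmur3_128, and the count of 300 is just an illustrative value:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class IteratedHash {

    // Hash the password repeatedly, feeding each digest back in,
    // so an attacker must pay the hashing cost `iterations` times
    // per password guess
    static byte[] hash(String password, int iterations) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] result = password.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < iterations; i++) {
            result = digest.digest(result);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] hashed = hash("4v3rys3kur3p455w0rd", 300);
        System.out.println(hashed.length); // SHA-256 digest is 32 bytes
    }
}
```

The runtime cost of this scheme grows linearly with the iteration count, which is exactly what the benchmark below measures.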


We can explore the performance impact by using a State object:


@State(Scope.Benchmark)
public class ExecutionPlan {

    @Param({ "100", "200", "300", "500", "1000" })
    public int iterations;

    public Hasher murmur3;

    public String password = "4v3rys3kur3p455w0rd";

    @Setup(Level.Invocation)
    public void setUp() {
        murmur3 = Hashing.murmur3_128().newHasher();
    }
}

Our benchmark method will then look like this:


@Fork(value = 1, warmups = 1)
@Benchmark
public void benchMurmur3_128(ExecutionPlan plan) {

    for (int i = plan.iterations; i > 0; i--) {
        plan.murmur3.putString(plan.password, Charset.defaultCharset());
    }
}


Here, JMH populates the iterations field with the appropriate value from the @Param annotation when the state object is passed to the benchmark method. The @Setup-annotated method is invoked before each invocation of the benchmark and creates a new Hasher, ensuring isolation.


When the execution is finished, we’ll get a result similar to the one below:


# Run complete. Total time: 00:06:47

Benchmark                   (iterations)   Mode  Cnt      Score      Error  Units
BenchMark.benchMurmur3_128           100  thrpt   20  92463.622 ± 1672.227  ops/s
BenchMark.benchMurmur3_128           200  thrpt   20  39737.532 ± 5294.200  ops/s
BenchMark.benchMurmur3_128           300  thrpt   20  30381.144 ±  614.500  ops/s
BenchMark.benchMurmur3_128           500  thrpt   20  18315.211 ±  222.534  ops/s
BenchMark.benchMurmur3_128          1000  thrpt   20   8960.008 ±  658.524  ops/s

6. Dead Code Elimination


When running microbenchmarks, it’s very important to be aware of optimizations. Otherwise, they may affect the benchmark results in a very misleading way.


To make matters a bit more concrete, let’s consider an example:


@Benchmark
public void doNothing() {
}

@Benchmark
public void objectCreation() {
    new Object();
}

We expect object allocation to cost more than doing nothing at all. However, if we run the benchmarks:


Benchmark                 Mode  Cnt  Score   Error  Units
BenchMark.doNothing       avgt   40  0.609 ± 0.006  ns/op
BenchMark.objectCreation  avgt   40  0.613 ± 0.007  ns/op

Apparently, finding a slot in the TLAB, then creating and initializing an object, is almost free! Just by looking at these numbers, we should sense that something doesn't quite add up here.


Here, we’re the victim of dead code elimination. Compilers are very good at optimizing away the redundant code. As a matter of fact, that’s exactly what the JIT compiler did here.


In order to prevent this optimization, we should somehow trick the compiler and make it think that the code is used by some other component. One way to achieve this is just to return the created object:


@Benchmark
public Object pillarsOfCreation() {
    return new Object();
}

Also, we can let the Blackhole consume it:


@Benchmark
public void blackHole(Blackhole blackhole) {
    blackhole.consume(new Object());
}

Having Blackhole consume the object is a way to convince the JIT compiler not to apply the dead code elimination optimization. Anyway, if we run these benchmarks again, the numbers make more sense:


Benchmark                    Mode  Cnt  Score   Error  Units
BenchMark.blackHole          avgt   20  4.126 ± 0.173  ns/op
BenchMark.doNothing          avgt   20  0.639 ± 0.012  ns/op
BenchMark.objectCreation     avgt   20  0.635 ± 0.011  ns/op
BenchMark.pillarsOfCreation  avgt   20  4.061 ± 0.037  ns/op

7. Constant Folding


Let’s consider yet another example:


@Benchmark
public double foldedLog() {
    int x = 8;

    return Math.log(x);
}

Calculations based on constants may return the exact same output, regardless of the number of executions. Therefore, there is a pretty good chance that the JIT compiler will replace the logarithm function call with its result:


public double foldedLog() {
    return 2.0794415416798357;
}

This form of partial evaluation is called constant folding. In this case, constant folding completely avoids the Math.log call, which was the whole point of the benchmark.
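We can check the folded literal for ourselves; this standalone sketch simply evaluates the expression once, outside of any benchmark:

```java
public class FoldedLogCheck {
    public static void main(String[] args) {
        int x = 8;
        // Computed over constants only, so it always yields the same
        // double; agrees with the literal shown above within double
        // precision, which is why the JIT can substitute it
        double folded = Math.log(x);
        System.out.println(folded);
    }
}
```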


In order to prevent constant folding, we can encapsulate the constant state inside a state object:


@State(Scope.Benchmark)
public static class Log {
    public int x = 8;
}

@Benchmark
public double log(Log input) {
    return Math.log(input.x);
}

If we run these benchmarks against each other:


Benchmark             Mode  Cnt          Score          Error  Units
BenchMark.foldedLog  thrpt   20  449313097.433 ± 11850214.900  ops/s
BenchMark.log        thrpt   20   35317997.064 ±   604370.461  ops/s

Apparently, the log benchmark is doing some serious work compared to the foldedLog, which is sensible.


8. Conclusion


This tutorial focused on and showcased the Java Microbenchmark Harness (JMH).


As always, code examples can be found on GitHub.