
Java JIT Inline

This is a follow-up post I wrote after answering a related question on Stack Overflow. The question illustrates why it pays to understand the internals of the machine, the OS, the language runtime, and so on.

Introduction to the JIT

The JIT (just-in-time) compiler is an important component of the JVM, used to improve the performance of Java programs. As its name indicates, it compiles just in time: it translates bytecode into native code for the target platform so that the program runs faster.
Besides this standard compilation, the JIT compiler also performs optimizations such as OSR (on-stack replacement), class hierarchy analysis, and so on.
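As a quick aside on OSR: it typically applies to a method that contains a long-running loop. Such a method may only be invoked once, so invocation counters alone would never make it hot, but the loop's back-edge counter does, and the JVM swaps the interpreted frame for compiled code while the loop is still running. The following is a minimal sketch of my own (not taken from this post's experiment) that usually triggers an OSR compilation, visible as entries marked with % in -XX:+PrintCompilation output:

public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // main() is called only once, so it never becomes hot by invocation count.
        // The hot loop below drives the back-edge counter instead, which is what
        // triggers an on-stack replacement (OSR) compilation of main().
        for (int i = 0; i < 100_000_000; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}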

Today, we will see how the JIT performs inline optimization and how it affects the performance of a program.

Inline Function

First, let's understand what an inline function is. To inline a function is to substitute the function call with the function body. The compiler can inline small, hot functions into their callers, which removes the overhead of a function invocation. This is a common optimization in compiled languages such as C/C++.
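To make the idea concrete, here is a hand-written sketch (my own illustration, not compiler output) of what inlining effectively does to a call site:

class InlineIllustration {
    // Before inlining: each call to square() pays the cost of a method invocation.
    static int square(int x) {
        return x * x;
    }

    static int sumOfSquares(int a, int b) {
        return square(a) + square(b);
    }

    // After inlining: the body of square() is substituted at each call site, so the
    // invocation overhead disappears and later optimizations see the whole expression.
    static int sumOfSquaresInlined(int a, int b) {
        return (a * a) + (b * b);
    }
}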

Then, we want to see when inlining kicks in in the JIT. We already know that the JIT will try to inline frequently called methods in order to avoid the overhead of method invocation. But we should not forget that the heuristic it uses depends both on how often a method is invoked and on how big it is. Keeping these two criteria in mind, we won't miss the optimization when it is possible.
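The size criterion is tunable. HotSpot exposes flags such as -XX:MaxInlineSize (the bytecode-size limit for ordinary methods) and -XX:FreqInlineSize (a larger limit applied to frequently invoked methods). The exact defaults vary across JVM versions, so the values below are only an illustration of how one might raise the limits to experiment:

-XX:MaxInlineSize=70 -XX:FreqInlineSize=400

As the logs later show, fib is only 25 bytes of bytecode, so it sits well under the usual limits.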

Inline in JIT

Now, let's learn how inlining affects the performance of a program through an example.

The following is a simple program which calculates Fibonacci numbers using recursion. I have two versions of the test: one uses a fixed number; the other uses a random number to avoid some JIT optimizations (they give similar results, so for brevity I show only one set of results).

package opt;

import java.util.Random;

public class InlineTest {

    private static int fib(int i) {
        if (i == 0 || i == 1) {
            return 1;
        }
        return fib(i - 1) + fib(i - 2);
    }

    public static void main(String[] args) {
        Random random = new Random();
        long startTime = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            // fib(random.nextInt(5) + 25); // variant with a random argument
            fib(28);                        // variant with a fixed argument
        }
        long endTime = System.nanoTime();
        System.out.format("%.2f seconds elapsed.\n", (endTime - startTime) / 1000.0 / 1000 / 1000);
    }
}

Running the program with the following JVM options, we can print some JIT compiler info to assist our understanding of this process:

-XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation 
  • -XX:+PrintCompilation: logs when JIT standard compilation happens
  • -XX:+UnlockDiagnosticVMOptions: enables other developer flags like -XX:+PrintInlining
  • -XX:+PrintInlining: prints what methods get inlined and where
  • -XX:+CITime: prints various statistics about compilation
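For example, assuming the class above is compiled into the package opt (as the log output below suggests), the full invocation might look like this:

java -XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation opt.InlineTest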

Soon, we get the following results:

  task-id
    152   36       4       opt.InlineTest::fib (25 bytes)
                              @ 14   opt.InlineTest::fib (25 bytes)   inline (hot)
                                @ 14   opt.InlineTest::fib (25 bytes)   recursive inlining is too deep
                                @ 20   opt.InlineTest::fib (25 bytes)   recursive inlining is too deep
...
   2429   42       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
2.28 seconds elapsed.

Accumulated compiler times (for compiled methods only)
------------------------------------------------
  Total compilation time   :  0.015 s
    Standard compilation   :  0.015 s, Average : 0.000
...

We can see two statistics in these results which can be compared with the following test where JIT inlining is disabled:

  • The program runs in 2.28s
  • The JIT compiler costs 0.015s

Using the same program, we can run with the following compiler command, in which

  • -XX:CompileCommand=dontinline,opt.InlineTest::fib forbids the inlining of fib, which can be seen from the output of the compiler;
-XX:+CITime -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintCompilation -XX:CompileCommand=dontinline,opt.InlineTest::fib

The output looks like the following:

  task-id
    149   35       4       opt.InlineTest::fib (25 bytes)
                              @ 14   opt.InlineTest::fib (25 bytes)   disallowed by CompilerOracle
                              @ 20   opt.InlineTest::fib (25 bytes)   disallowed by CompilerOracle
...
   3062   41       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
2.91 seconds elapsed.
Accumulated compiler times (for compiled methods only)
------------------------------------------------
  Total compilation time   :  0.015 s
    Standard compilation   :  0.015 s, Average : 0.000
...

Comparing these two results, we can see:

  • The compiler task id is larger when we forbid inlining, which shows the compiler does more work in that case. In other words, inlining does not cause much extra work for the JIT;
  • The program spent more time (2.91s compared to 2.28s), which shows that inlining hot functions indeed saves us time;
  • The time the JIT compiler costs is almost the same whether inlining is disabled or enabled, which confirms that inlining is indeed a useful technique that does not take extra overhead;

