Monitor: Spring Boot Actuator & Prometheus

Once our application has been developed and is ready to deploy, we need to add one more very important piece of functionality: monitoring the state of the application. That state can be divided into two main categories:

What programmers care about: CPU load, I/O rate, JVM state, API latency, …
What product managers care about: PV (page views), UV (unique visitors), visit paths, element clicks, …

Today we focus only on the first kind of monitoring, which is almost the same for all Java applications.

Actuator & Prometheus

On the one hand, Spring Boot Actuator provides some useful endpoints that expose the state of the application, and since 2.0 it supports output in the Prometheus format. So we can let Actuator collect the information for us rather than doing it ourselves. (Many programming languages offer an interface for exporting the state of some variables, for example JMX in Java.)

On the other hand, as introduced in the book Site Reliability Engineering, Prometheus is a powerful and handy system for storing and querying the metrics of our application. It stores the collected samples in a time series database, with a set of labels identifying each unique series. The following are two samples:

http_request_duration_microseconds{handler="prometheus",instance="192.168.1.100:1234",job="mysql-integration",quantile="0.5"} 75630.254
http_request_duration_microseconds{handler="prometheus",instance="192.168.1.100:1234",job="mysql-integration",quantile="0.9"} 86262.231
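To make the structure of such a sample concrete, here is a minimal sketch in plain Java that splits one line into metric name, labels, and value. The class `SampleLine` is a hypothetical helper written for this post, not part of Prometheus or of any client library, and it assumes label values contain no commas, braces, or escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: holds the three parts of one sample line.
final class SampleLine {
    final String name;
    final Map<String, String> labels;
    final double value;

    SampleLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    // Parses `name{k1="v1",k2="v2"} value` or `name value`.
    // Assumption: label values contain no commas, braces, or escaped quotes.
    static SampleLine parse(String line) {
        String trimmed = line.trim();
        int open = trimmed.indexOf('{');
        int close = trimmed.lastIndexOf('}');
        Map<String, String> labels = new LinkedHashMap<>();
        String name;
        String rest;
        if (open >= 0 && close > open) {
            name = trimmed.substring(0, open);
            String body = trimmed.substring(open + 1, close);
            for (String pair : body.split(",")) {
                if (pair.trim().isEmpty()) {
                    continue; // tolerate an empty label set or a trailing comma
                }
                int eq = pair.indexOf('=');
                String key = pair.substring(0, eq).trim();
                String quoted = pair.substring(eq + 1).trim();
                // Strip the surrounding double quotes from the label value.
                labels.put(key, quoted.substring(1, quoted.length() - 1));
            }
            rest = trimmed.substring(close + 1).trim();
        } else {
            int space = trimmed.indexOf(' ');
            name = trimmed.substring(0, space);
            rest = trimmed.substring(space + 1).trim();
        }
        // The value is the first token after the labels; a timestamp may follow it.
        String valueToken = rest.split("\\s+")[0];
        return new SampleLine(name, labels, Double.parseDouble(valueToken));
    }

    public static void main(String[] args) {
        SampleLine s = parse("http_request_duration_microseconds{handler=\"prometheus\","
                + "instance=\"192.168.1.100:1234\",job=\"mysql-integration\",quantile=\"0.5\"} 75630.254");
        System.out.println(s.name + " " + s.labels + " -> " + s.value);
    }
}
```

The same metric name with a different `quantile` label is a different series, which is why the two samples above are stored separately.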

                    |-------------|
                    |  prometheus |
                    |_____________|
                   /
                  / pull
                 /
            |/info |   |/trace|
            -------------------
            |  micro service  |
            -------------------
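
The pull model in the diagram can be sketched with nothing but the JDK: the service exposes its current state over HTTP, and the monitoring side fetches it whenever it wants. Everything here is an assumption made for illustration: the class `PullModelSketch`, the `/metrics` path, and the `demo_requests_total` counter are hypothetical, and a real Prometheus server does far more than a single GET.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

final class PullModelSketch {
    // Hypothetical application state that we want to monitor.
    static final AtomicLong requests = new AtomicLong();

    // The "micro service" side: expose the counter as plain text on /metrics.
    static HttpServer startService() throws IOException {
        // Port 0 lets the OS pick a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = ("demo_requests_total " + requests.get() + "\n")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // The "prometheus" side: one scrape is just an HTTP GET initiated by the monitor.
    static String scrape(int port) throws IOException {
        URL url = new URL("http://127.0.0.1:" + port + "/metrics");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startService();
        try {
            requests.addAndGet(3); // pretend the service handled three requests
            System.out.print(scrape(server.getAddress().getPort()));
        } finally {
            server.stop(0);
        }
    }
}
```

The service never needs to know where the monitor lives; it only has to answer when asked, which is why the scrape targets are configured on the Prometheus side.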

Prometheus Config

Because Prometheus plays the role of the active consumer that pulls data in our basic model, we need to configure it as follows under scrape_configs:


  # config for scraping a single application
  - job_name: 'test-222'
    scrape_interval: 1m
    metrics_path: '/prometheus'
    static_configs:
      - targets: ['192.168.1.222:8080']
        labels:
          instance: test-222

  # config for all services registered in consul
  - job_name: 'consul'
    scrape_interval: 1m
    metrics_path: '/prometheus'
    consul_sd_configs:
      - server: '192.168.1.204:8501'
    # give the `instance` label a custom value
    relabel_configs:
      - source_labels: [__meta_consul_service_id]
        target_label: instance

In most cases, the median of a monitoring statistic is not very useful, because the distribution of requests usually takes one of two common forms:

In both cases, the median may look fine while concealing server problems. Therefore, using quantiles is preferable. Commonly used ones are the 90th, 95th, and 99th percentiles.
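
A small sketch in plain Java shows how the median can hide a slow tail. The latency numbers are invented for illustration (92 fast requests and 8 very slow ones), and `MedianVsQuantile` with its nearest-rank `percentile` method is a hypothetical helper, not how Prometheus computes quantiles.

```java
import java.util.Arrays;

final class MedianVsQuantile {
    // Nearest-rank percentile: the smallest sample such that at least
    // pct percent of all samples are less than or equal to it.
    static long percentile(long[] sortedAscending, int pct) {
        int n = sortedAscending.length;
        int rank = Math.max(1, (pct * n + 99) / 100); // integer ceil(pct * n / 100)
        return sortedAscending[rank - 1];
    }

    // Made-up data: 92 requests take 50 ms, 8 requests take 2000 ms.
    static long[] sampleLatenciesMillis() {
        long[] latencies = new long[100];
        Arrays.fill(latencies, 0, 92, 50L);
        Arrays.fill(latencies, 92, 100, 2000L);
        return latencies;
    }

    public static void main(String[] args) {
        long[] sorted = sampleLatenciesMillis();
        Arrays.sort(sorted);
        for (int pct : new int[] {50, 90, 95, 99}) {
            System.out.println("p" + pct + " = " + percentile(sorted, pct) + " ms");
        }
    }
}
```

With this data the median and even the 90th percentile stay at 50 ms, while the 95th and 99th percentiles jump to 2000 ms, so only the higher quantiles reveal that 8% of requests are slow.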

In Prometheus, we can select a quantile like this:

# rate() gives a per-second average over the range, so it is not very precise but is well suited to long-range trends (see the references at the bottom for more information)
histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[10m]))
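
To show roughly what happens inside such a query, here is a simplified sketch of estimating a quantile from cumulative histogram buckets by linear interpolation. It is not the actual Prometheus implementation: the class `HistogramQuantileSketch`, the bucket bounds, and the counts are made up, and it assumes at least two buckets, a last bucket with upper bound +Inf, and 0 < q < 1.

```java
final class HistogramQuantileSketch {
    // upperBounds[i] is the `le` bound of bucket i (ascending, last one +Inf).
    // cumulativeCounts[i] is the number of observations <= upperBounds[i].
    static double quantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        if (upperBounds.length < 2 || upperBounds.length != cumulativeCounts.length) {
            throw new IllegalArgumentException("need at least two buckets with matching counts");
        }
        int last = upperBounds.length - 1;
        double total = cumulativeCounts[last];
        if (total == 0) {
            return Double.NaN; // no observations at all
        }
        double rank = q * total;
        // Find the first bucket whose cumulative count reaches the rank.
        int b = 0;
        while (b < last && cumulativeCounts[b] < rank) {
            b++;
        }
        if (b == last) {
            // The quantile falls into the +Inf bucket: fall back to the
            // highest finite bound, since there is nothing to interpolate to.
            return upperBounds[last - 1];
        }
        double start = (b == 0) ? 0.0 : upperBounds[b - 1];
        double end = upperBounds[b];
        double below = (b == 0) ? 0.0 : cumulativeCounts[b - 1];
        double inBucket = cumulativeCounts[b] - below;
        // Assume observations are spread evenly inside the bucket.
        return start + (end - start) * ((rank - below) / inBucket);
    }

    public static void main(String[] args) {
        double[] bounds = {0.1, 0.5, 1.0, Double.POSITIVE_INFINITY};
        double[] counts = {50, 90, 100, 100};
        System.out.println(quantile(0.95, bounds, counts));
    }
}
```

With these made-up buckets, the 0.95 quantile falls halfway into the (0.5, 1.0] bucket, so the estimate is about 0.75 seconds. The result is only as precise as the bucket layout, and feeding in per-second rates instead of raw counts does not change the arithmetic.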

Spring Boot 2.0

After the basic Prometheus configuration, we need to configure Actuator in Spring Boot. By default, Spring Boot 2.0 exposes only a few endpoints (such as health and info) over HTTP, so for convenience we can expose them all like this:

management:
  endpoints:
    web:
      exposure:
        include: '*'

To produce the format Prometheus expects, we need one more dependency besides spring-boot-starter-actuator:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.0.1</version>
</dependency>

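Now, when we start the application, we can see the mapping for /actuator/prometheus in the log (if the application is scraped at this path, metrics_path in the Prometheus config must be set to match):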

2018-03-25 09:08:31,700 INFO --- [main] s.b.a.e.w.s.WebMvcEndpointHandlerMapping : Mapped "{[/actuator/prometheus],methods=[GET],produces=[text/plain;version=0.0.4;charset=utf-8]}" onto public java.lang.Object org.springframework.boot.actuate.endpoint.web.servlet.AbstractWebMvcEndpointHandlerMapping$OperationHandler.handle(javax.servlet.http.HttpServletRequest,java.util.Map<java.lang.String, java.lang.String>)

And when we access the endpoint, we see output like:

# HELP logback_events_total Number of error level events that made it to the logs
# TYPE logback_events_total counter
logback_events_total{level="error",} 0.0
logback_events_total{level="warn",} 0.0
logback_events_total{level="info",} 30.0
logback_events_total{level="debug",} 30.0
logback_events_total{level="trace",} 0.0
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="nonheap",id="Code Cache",} 8527872.0
jvm_memory_used_bytes{area="nonheap",id="Metaspace",} 3.6548264E7
jvm_memory_used_bytes{area="nonheap",id="Compressed Class Space",} 5055592.0
jvm_memory_used_bytes{area="heap",id="PS Eden Space",} 8.7630368E7
jvm_memory_used_bytes{area="heap",id="PS Survivor Space",} 0.0

A simple sample can be found here.

It works, but Spring Boot 2.0 has only just come out (2018.03), which means most applications are not written for 2.0, and upgrading is not easy given the changes between the two major versions. So how do we apply this to projects on older versions? We will continue in the next blog post.

Ref
