
Microservice Kingdom: Memory in Danger


The Microservice Kingdom has grown a lot in recent years. It owns many powerful weapons (frameworks and tools) such as Spring Cloud, Dubbo, ServiceComb and Service Mesh[1]. With the help of these weapons, the Microservice Kingdom has taken much territory from the old Monolithic Kingdom.


One day, while the King of Microservice was inspecting his country, a sentinel called free ran up to him and reported in a hurry, “My Lord, the memory in our server 1 region is in danger: only 5% of the memory is still under the control of our Linux officer.”

“Don’t hurry. Tell me the details. When did this start?” The King was very experienced and wanted more details.

“Ever since server 1 introduced docker, the memory usage of every microservice, such as user-service and gateway-service, has been increasing steadily,” free reported.

“How dare you! Docker is the new military officer[2] our great King brought in,” the DevOps officer who had introduced docker rebuked free angrily.

“Calm down. We know it has nothing to do with docker itself. There must be some misunderstanding between docker and the other microservices. Anyway, does any officer have a suggestion for dealing with the memory crisis in server 1 for now?” the King said.

“Docker supports resource limits. If we have to, we can cap how much memory each container may use,” the DevOps officer said reluctantly.

“Yes, we have to. The Linux officer has a very stubborn but loyal subordinate called the OOM killer. When he detects that there is no memory left for his boss, Linux, he picks a process and kills it to free memory. Even our microservices may be killed in that case[3].”
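To see whom the OOM killer would pick, note that the kernel keeps a current “badness” value for every process in /proc/<pid>/oom_score and, when it must reclaim memory, kills the eligible process with the highest score (restricted to the affected memory cgroup when a container limit is hit). Below is a minimal Java sketch, assuming a Linux host with a readable /proc; the class and method names are illustrative only, and the result is just a snapshot, since scores change as processes allocate memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OomCandidate {
    /** Returns the PID with the highest oom_score, i.e. the process the kernel currently ranks first to kill. */
    static Optional<Integer> highestScore(Map<Integer, Integer> scoreByPid) {
        return scoreByPid.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Integer> scores = new HashMap<>();
        // every numeric directory under /proc is a process; its oom_score file holds the current badness
        try (DirectoryStream<Path> procs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path proc : procs) {
                try {
                    String score = new String(Files.readAllBytes(proc.resolve("oom_score")), StandardCharsets.US_ASCII).trim();
                    scores.put(Integer.parseInt(proc.getFileName().toString()), Integer.parseInt(score));
                } catch (IOException | NumberFormatException e) {
                    // the process exited or its file is unreadable: skip it
                }
            }
        }
        highestScore(scores).ifPresent(pid ->
                System.out.println("PID " + pid + " has the highest oom_score: " + scores.get(pid)));
    }
}
```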

“Can’t we just fire that guy?” asked the DevOps officer, who did not want to restrict his docker’s development.

“No, we can’t. If Linux let memory usage keep growing and did nothing when memory ran out, Linux itself would crash, which would hurt even more people[4].” The King refused.

“OK, I will set the limit.” The DevOps officer decided to do it himself. He sent docker a command[5] and told him to restart:

docker run --memory=xxm --memory-swap=xxm ...
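Inside the container, the limit set by --memory shows up as a cgroup file, which is how a process can discover it. Here is a minimal sketch, assuming the usual mount points: /sys/fs/cgroup/memory.max on cgroup v2 and /sys/fs/cgroup/memory/memory.limit_in_bytes on cgroup v1 (the /lxc/... paths in the kernel log suggest a cgroup v1 setup). The class name is hypothetical. Printing Runtime.maxMemory() next to the limit makes any mismatch between the container and the JVM visible.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalLong;

public class ContainerMemoryLimit {
    // assumed locations of the memory limit: cgroup v2 first, then cgroup v1
    private static final Path[] LIMIT_FILES = {
            Paths.get("/sys/fs/cgroup/memory.max"),
            Paths.get("/sys/fs/cgroup/memory/memory.limit_in_bytes")
    };

    /** Parses a cgroup limit file; "max" (v2) means unlimited. v1 reports a huge number instead. */
    static OptionalLong parseLimit(String content) {
        String value = content.trim();
        if (value.isEmpty() || value.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(value));
    }

    static OptionalLong readLimit() {
        for (Path file : LIMIT_FILES) {
            if (Files.isReadable(file)) {
                try {
                    return parseLimit(new String(Files.readAllBytes(file), StandardCharsets.US_ASCII));
                } catch (IOException | NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        long mib = 1024 * 1024;
        OptionalLong limit = readLimit();
        System.out.println(limit.isPresent()
                ? "container memory limit: " + limit.getAsLong() / mib + " MiB"
                : "no cgroup memory limit found");
        System.out.println("JVM max heap: " + Runtime.getRuntime().maxMemory() / mib + " MiB");
    }
}
```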

A few days later, the docker officer sent the sentinel docker ps in an emergency. After hearing the report, the Microservice King called all his officers together for a meeting.

"Just now, docker reported that our user-service was killed. docker ps, come in and show us the failure.

$ docker ps | grep 'user-service'
$ docker ps -a | grep 'user-service'
0f742445f839        user-service        java ...    16 hours ago  

And the dmesg sentinel also reported that a docker container was killed. Does any officer have a suggestion?" the King said gravely.

All the officers were too afraid to speak, because they understood that this was a very critical failure.

After a while, the core officer Java stepped forward and said, “Our department has some competent staff, such as jps, jinfo and jmap. We can send them to investigate this failure.”

“Fine. Show me why this happened and how to deal with such situations within 2 days,” the King ordered.


The Java officer arrived at the scene and found only some wreckage (the kernel stack trace):

[  583.447974] Pid: 1954, comm: java ...
[  583.447980] Call Trace:
[  583.447998]  [<ffffffff816df13a>] dump_header+0x83/0xbb
[  583.448108]  [<ffffffff816df1c7>] oom_kill_process.part.6+0x55/0x2cf
[  583.448124]  [<ffffffff81067265>] ? has_ns_capability_noaudit+0x15/0x20
[  583.448137]  [<ffffffff81191cc1>] ? mem_cgroup_iter+0x1b1/0x200
[  583.448150]  [<ffffffff8113893d>] oom_kill_process+0x4d/0x50
...
[  583.448275]  [<ffffffff8115b4d3>] do_anonymous_page.isra.35+0xa3/0x2f0
[  583.448288]  [<ffffffff8115f759>] handle_pte_fault+0x209/0x230
[  583.448301]  [<ffffffff81160bb0>] handle_mm_fault+0x2a0/0x3e0
[  583.448320]  [<ffffffff816f844f>] __do_page_fault+0x1af/0x560
[  583.448341]  [<ffffffffa02b0a80>] ? vfsub_read_u+0x30/0x40 [aufs]
[  583.448358]  [<ffffffffa02ba3a7>] ? aufs_read+0x107/0x140 [aufs]
[  583.448371]  [<ffffffff8119bb50>] ? vfs_read+0xb0/0x180
[  583.448384]  [<ffffffff816f880e>] do_page_fault+0xe/0x10
[  583.448396]  [<ffffffff816f4bd8>] page_fault+0x28/0x30
[  583.448405] Task in /lxc/0f742445f8397ee7928c56bcd5c05ac29dcc6747c6d1c3bdda80d8e688fae949 killed as a result of limit of /lxc/0f742445f8397ee7928c56bcd5c05ac29dcc6747c6d1c3bdda80d8e688fae949
[  583.448412] memory: usage xxxMB, limit xxxMB, failcnt 342
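The cgroup path in the last lines ties the kernel's report to a container: docker's full container ID is the 64-character hex string at the end of the path, and docker ps -a shows its first 12 characters (0f742445f839 above). A small Java sketch of that matching, assuming only that the ID is the last path segment (the /lxc/ prefix depends on the docker version and cgroup driver); the class name is hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OomLogParser {
    // the cgroup named after "limit of" owns the exhausted limit; its last segment is the container ID
    private static final Pattern LIMIT = Pattern.compile("killed as a result of limit of \\S*/([0-9a-f]{64})");

    /** Returns the 12-character short ID that docker ps prints, if the line names a container cgroup. */
    static Optional<String> shortContainerId(String dmesgLine) {
        Matcher m = LIMIT.matcher(dmesgLine);
        return m.find() ? Optional.of(m.group(1).substring(0, 12)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "[  583.448405] Task in /lxc/0f742445f8397ee7928c56bcd5c05ac29dcc6747c6d1c3bdda80d8e688fae949 "
                + "killed as a result of limit of /lxc/0f742445f8397ee7928c56bcd5c05ac29dcc6747c6d1c3bdda80d8e688fae949";
        System.out.println(shortContainerId(line).orElse("no container ID found"));
    }
}
```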

There was no heap dump and no core dump, and the investigation reached a stalemate. Just then, docker suggested restarting user-service so that they could jump into the container and watch what happened. Java thought reproducing the error was a good idea, so they started:

$ docker run --memory=xxm --memory-swap=xxm -d --name user-service ...
$ docker exec -it user-service bash

The Java officer entered the container, followed by jps, jmap, jinfo and the officer’s son, java. jmap was eager to show off his ability, so he said, “I can show the memory usage of a process, organized by class loader, or the heap …”

“So show us,” the Java officer interrupted him.

jmap called jps and head to help him, then started working:

$ jps
1 Jar
xxx jps
$ jmap -histo <vmid> | head
 num     #instances         #bytes  class name
----------------------------------------------
   1:          2083       18549536  [B
   2:          1654        2146632  [I
   3:         15388        1471480  [C
   4:          3671         409312  java.lang.Class
   5:         15031         360744  java.lang.String
   6:          2909         314808  [Ljava.lang.Object;
   7:          7206         230592  java.util.concurrent.ConcurrentHashMap$Node
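A quick sanity check on such a histogram is to add up the #bytes column. The sketch below parses the rows shown above; it assumes data rows begin with a rank such as "1:" and skips the header and separator (the class name is illustrative). The seven rows sum to 23483104 bytes, about 22 MiB, which helps explain why the heap analysis turned up nothing alarming. Keep in mind that jmap -histo only covers objects on the Java heap, not the JVM's native memory.

```java
import java.util.Arrays;
import java.util.List;

public class HistoTotal {
    // the rows printed by `jmap -histo <vmid> | head`
    static final List<String> SAMPLE = Arrays.asList(
            " num     #instances         #bytes  class name",
            "----------------------------------------------",
            "   1:          2083       18549536  [B",
            "   2:          1654        2146632  [I",
            "   3:         15388        1471480  [C",
            "   4:          3671         409312  java.lang.Class",
            "   5:         15031         360744  java.lang.String",
            "   6:          2909         314808  [Ljava.lang.Object;",
            "   7:          7206         230592  java.util.concurrent.ConcurrentHashMap$Node");

    /** Sums the #bytes column; only rows whose first column is "<rank>:" are counted. */
    static long totalBytes(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 4 && cols[0].endsWith(":")) {
                total += Long.parseLong(cols[2]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long total = totalBytes(SAMPLE);
        System.out.printf("top rows: %d bytes (%.1f MiB)%n", total, total / (1024.0 * 1024.0));
    }
}
```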

They spent some time analyzing the memory usage and looking for a memory leak. While they were analyzing, the memory usage kept rising. The heap analysis turned up nothing, and they became anxious. At that moment, the Java officer noticed that jps and jinfo were playing around:

$ jps -lvm
1 Jar
...
$ jinfo -flags <vmid> 
...
Non-default VM flags: -XX:CICompilerCount=3
Command line:  -Djava.awt.headless=true ...
$ jinfo -sysprops <vmid>

Just as he was about to get angry, he noticed that the command line lacked -Xmx, the option that sets the maximum heap size of the JVM. Suddenly an insight dawned on him: these microservices had no JVM memory settings at all, because they are Spring Boot projects started with a plain java -jar xxx.jar.
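A running JVM can check what jinfo revealed from the inside, through the standard RuntimeMXBean, whose getInputArguments() returns the JVM options (not the program arguments). A minimal sketch with an illustrative class name; treating -Xmx and -XX:MaxHeapSize= as the only explicit heap settings is a simplifying assumption, since flags such as -XX:MaxRAMPercentage on newer JDKs also influence the heap.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapFlagCheck {
    /** True if any JVM option sets the maximum heap size explicitly. */
    static boolean hasExplicitMaxHeap(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-XX:MaxHeapSize=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM options: " + jvmArgs);
        if (!hasExplicitMaxHeap(jvmArgs)) {
            System.out.println("no -Xmx given: the heap limit comes from JVM defaults");
        }
    }
}
```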

The Java officer called java over and said, “Show us the JVM’s default memory limits in this environment.” java got to work right away:

$ java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize|PermSize'
    uintx ErgoHeapSizeLimit                         = 0                                   {product}
    uintx HeapSizePerGCThread                       = 87241520                            {product}
    uintx InitialHeapSize                          := 192937984                           {product}
    uintx LargePageHeapSizeThreshold                = 134217728                           {product}
    uintx MaxHeapSize                              := 3072327680                          {product}
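The same values can be read from inside a HotSpot JVM through com.sun.management.HotSpotDiagnosticMXBean, which reports the final value of a VM flag much like -XX:+PrintFlagsFinal. MaxHeapSize = 3072327680 bytes is exactly 2930 MiB (about 2.86 GiB); if the common JDK 8 default of one quarter of physical memory applied here, that points to a host with roughly 11.4 GiB of RAM, i.e. a value derived from the host rather than from the container. A sketch, assuming a HotSpot-based JDK; the class name is illustrative:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapFlags {
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        HotSpotDiagnosticMXBean hotspot = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String name : new String[] {"InitialHeapSize", "MaxHeapSize"}) {
            // getVMOption returns the flag's final value as a string
            long bytes = Long.parseLong(hotspot.getVMOption(name).getValue());
            System.out.println(name + " = " + bytes + " bytes (" + toMiB(bytes) + " MiB)");
        }
        // the heap limit the running JVM reports; may be slightly below MaxHeapSize depending on the collector
        System.out.println("Runtime.maxMemory() = " + toMiB(Runtime.getRuntime().maxMemory()) + " MiB");
    }
}
```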

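Seeing that MaxHeapSize (3072327680 bytes, about 2.86 GiB) was much larger than the memory limit docker had set, they all understood: the JVM did not know about docker’s memory limit. It had sized its default heap from the host machine’s memory, so it kept growing the heap instead of staying within the container’s budget, requesting more and more memory until the container exceeded its limit and was killed. The solution was also very simple: add -Xmx700m to the startup options.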
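Note that -Xmx caps only the heap: metaspace, the code cache, thread stacks and other native memory come on top, so the container limit has to leave room for them. The sketch below prints the heap and non-heap memory the JVM has committed (the non-heap figure covers pools such as metaspace and the code cache, not thread stacks) and includes a hypothetical sizing helper, maxHeapMiB, that subtracts a fixed reserve from the container limit; the 2048 MiB and 512 MiB figures are made-up examples, not recommendations. On JDK 10+ and 8u191+, the JVM can also detect the container limit itself (-XX:+UseContainerSupport, enabled by default on Linux) and size the heap as a share of it with -XX:MaxRAMPercentage, an alternative to a fixed -Xmx.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class FootprintReport {
    static final long MIB = 1024 * 1024;

    /** Hypothetical sizing rule: the heap gets whatever the container limit leaves after a fixed non-heap reserve. */
    static long maxHeapMiB(long containerLimitMiB, long nonHeapReserveMiB) {
        if (nonHeapReserveMiB <= 0 || nonHeapReserveMiB >= containerLimitMiB) {
            throw new IllegalArgumentException("reserve must be positive and below the container limit");
        }
        return containerLimitMiB - nonHeapReserveMiB;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        // "committed" is memory the JVM has actually obtained from the OS for that area
        System.out.println("heap committed:     " + heap.getCommitted() / MIB + " MiB (max " + heap.getMax() / MIB + " MiB)");
        System.out.println("non-heap committed: " + nonHeap.getCommitted() / MIB + " MiB");
        System.out.println("example: -Xmx" + maxHeapMiB(2048, 512) + "m for a 2048 MiB limit with a 512 MiB reserve");
    }
}
```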



Ref

Written with StackEdit.


  1. These are four mainstream microservice frameworks and tools in the Java web world.

  2. Docker is a virtualization technique that can be used to isolate resources and environments and to scale services elastically. We call it a ‘military officer’ because we compare microservice frameworks to ‘weapons’; since docker is used to manage those microservice applications, we name it a ‘military officer’.

  3. An example can be found in this blog post: OOM killer kill the tomcat

  4. Actually, we could turn off Linux memory overcommit so that the kernel never promises more memory than it can back, which avoids the crash. But then malloc calls start failing once the commit limit is reached, so many processes cannot proceed and the system may stop working. So we had better not turn this feature off.

  5. Swap in docker also causes a large performance penalty, so we had better disable it. In docker, --memory-swap is the total of memory plus swap, so to disable swap we set --memory and --memory-swap to the same value. For more details, you can refer to this page
