
Google RPC

In this blog post, we are going to dive into what Google's RPC framework – a complex distributed RPC framework – can do, and we will see that RPC is not so easy at large scale. We will focus on two main aspects of the framework: load balancing and overload handling.

Load Balance

In the blog post The Life of a Request to Google (2), we introduced in detail how Google's RPC framework does load balancing. Here, we just skim the backbone:

  • For better performance, the RPC framework keeps long-lived TCP connections to service producers;
  • If a connection stays silent long enough, it falls back to UDP health checks to save cost;
  • Since keeping connections to every producer (possibly hundreds of them) is costly, a deterministic algorithm picks a subset of service producers to use;
  • Weighted Round Robin spreads requests across that subset of producers, with weights updated dynamically according to the responses each producer returns.
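The last step above can be sketched as follows. This is a minimal illustration, not Google's actual implementation: the class name, the starting weights, and the weight-update rule are all assumptions; only the idea of weighting backends by their self-reported utilization and error rate comes from the text.

```python
import random

class WeightedRoundRobinPicker:
    """Sketch of client-side Weighted Round Robin over a backend subset.

    Weights are updated from the load report each backend piggybacks on
    its responses (utilization, error rate), as the quote below describes.
    """

    def __init__(self, backends):
        # Start every backend in the subset with an equal weight.
        self.weights = {b: 1.0 for b in backends}

    def pick(self):
        # Choose a backend with probability proportional to its weight.
        backends = list(self.weights)
        return random.choices(
            backends, weights=[self.weights[b] for b in backends])[0]

    def report(self, backend, cpu_utilization, error_rate):
        # Toy update rule (an assumption, not Google's actual formula):
        # heavily utilized or erroring backends get a lower weight,
        # floored so a backend is never starved of probe traffic.
        self.weights[backend] = max(
            0.05, (1.0 - cpu_utilization) * (1.0 - error_rate))
```

With this shape, a backend that reports high CPU usage sees its weight, and therefore its share of new requests, drop on the very next pick.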

“In each response (including responses to health checks), backends include the current observed rates of queries and errors per second, in addition to the utilization (typically, CPU usage)”
Site Reliability Engineering

Handling Overload

To operate a large distributed system stably, one requirement must be met: handling overload. No matter how well load balancing is done, some part of the system will eventually become overloaded as load increases.

In order to solve a problem, we first need to define the problem.

How to Define Overload?

Before defining overload, we first need to define load. People often use a static statistic like QPS to represent the load of a system, but practice has proven it to be a poor criterion. In the experience of Google's engineers, CPU consumption is usually a better signal of system capacity.

Now we know how to measure the load of a machine and have a standard for deciding when it is overloaded, but how do we detect it? Remember our long-lived connections (and UDP packets)? They carry health-check responses, which report the state of the service provider. If a provider says it is in the lame-duck state, consumers stop sending requests to it. The core of this method is that the service producer reports its own state, rather than having consumers try to guess it, which is hard to do in a distributed system.
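The self-reported health mechanism above can be sketched like this. The state names and class shape are assumptions for illustration; the point from the text is only that the client records what each backend says about itself instead of inferring it.

```python
from enum import Enum

class BackendState(Enum):
    HEALTHY = "healthy"
    LAME_DUCK = "lame_duck"       # backend asks clients to drain traffic
    UNRESPONSIVE = "unresponsive" # no health-check response at all

class ClientHealthView:
    """Client-side view of backend health, driven by self-reported state."""

    def __init__(self):
        self.states = {}

    def on_health_response(self, backend, reported_state):
        # The backend reports its own state; the client never guesses.
        self.states[backend] = reported_state

    def usable_backends(self):
        # Only send new requests to backends reporting themselves healthy;
        # lame-duck backends finish in-flight work but get nothing new.
        return [b for b, s in self.states.items()
                if s is BackendState.HEALTHY]
```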

Client

To solve this problem, we have to do some work on both the client and the server. It is often assumed that overload handling is done purely on the server side; in fact, the client also plays an important role.

Consider the situation where a server receives too much load from its clients. The server has to reject some requests of lower criticality (e.g. health-check requests) or some already-timed-out requests (discarding those saves a lot of work, because the user has probably already sent another request). But for some requests, even rejection is not cheap; it can be almost as expensive as serving the request.

So client-side throttling comes to the rescue. To implement it, Google recommends adaptive throttling, where the client rejects requests locally with the following probability:
$$\max\left(0,\ \frac{requests - K \cdot accepts}{requests + 1}\right)$$

Server

When it comes to the server side, there are three approaches:

  • Redirect request to other service provider when possible;

  • Serve degraded results when necessary: "responses that are not as accurate as or that contain less data than normal responses, but that are easier to compute". For example:

    ◦ Instead of searching an entire corpus to provide the best available results to a search query, search only a small percentage of the candidate set.
    ◦ Rely on a local copy of results that may not be fully up to date, but that is cheaper to use than going against the canonical storage.

  • Reject transparently when all else fails.

That is the overall policy; in detail, the server performs the following tasks:

  • Per-customer limits: an overall traffic budget, an error budget, and a retry budget for each client.
  • The server propagates the deadline and criticality of each request when serving it requires further RPC invocations;
  • When a customer runs out of global quota, a backend server only rejects requests of a given criticality if it is already rejecting all requests of all lower criticalities (in fact, the per-customer limits the system supports, described earlier, can be set per criticality).
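The criticality-ordered rejection rule in the last bullet can be sketched as a threshold that only ever moves up one level at a time. The level names follow the SRE book's four criticality values; the class itself and its `shed_more` trigger are illustrative assumptions.

```python
from enum import IntEnum

class Criticality(IntEnum):
    # Ordered criticality levels; higher value = more critical.
    SHEDDABLE = 0
    SHEDDABLE_PLUS = 1
    CRITICAL = 2
    CRITICAL_PLUS = 3

class CriticalityGate:
    """Server-side sketch: reject requests of a given criticality only
    when all lower criticalities are already being fully rejected."""

    def __init__(self):
        # Lowest criticality still admitted; rises as overload grows.
        self.threshold = Criticality.SHEDDABLE

    def admit(self, request_criticality):
        return request_criticality >= self.threshold

    def shed_more(self):
        # Under growing overload, start rejecting the next level up.
        if self.threshold < Criticality.CRITICAL_PLUS:
            self.threshold = Criticality(self.threshold + 1)
```

Because the threshold moves one level at a time, a CRITICAL request can never be rejected while SHEDDABLE traffic is still being served, which is exactly the ordering the policy requires.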

Written with StackEdit.
