In this blog, we are going to dive into what Google's RPC, a complex distributed RPC framework, can do, and see that RPC is not so easy at large scale. We will focus on two main aspects of the framework: load balancing and overload handling.
Load Balancing
In the blog The Life of a Request to Google (2), we introduced in detail how Google's RPC framework does load balancing. Here we just skim the backbone:
- For better performance, the RPC framework keeps long-lived TCP connections to service producers;
- If a connection has been silent long enough, it falls back to UDP health checks to save cost;
- Because keeping connections to every producer (there may be hundreds) is costly, a deterministic algorithm picks the subset of service producers each client talks to;
- Weighted Round Robin is used to send requests to that subset of producers, and the weights are updated dynamically according to the responses each producer returns (a sketch follows the quote below).
“In each response (including responses to health checks), backends include the current observed rates of queries and errors per second, in addition to the utilization (typically, CPU usage)”
– Site Reliability Engineering
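To make the last point concrete, here is a minimal sketch of smooth Weighted Round Robin in Go, where a backend's weight is recomputed from the CPU utilization it reported in its last response. This is not Google's actual implementation; the `Backend` type, the weight formula, and the addresses are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a hypothetical handle to one service producer in the client's subset.
type Backend struct {
	Addr   string
	weight int
}

// WRRPicker spreads requests over the subset with smooth weighted round robin.
type WRRPicker struct {
	mu       sync.Mutex
	backends []*Backend
	current  []int // running "current weight" per backend
}

func NewWRRPicker(backends []*Backend) *WRRPicker {
	return &WRRPicker{backends: backends, current: make([]int, len(backends))}
}

// Pick adds each backend's weight to its running counter, picks the largest
// counter, then subtracts the total weight from the winner.
func (p *WRRPicker) Pick() *Backend {
	p.mu.Lock()
	defer p.mu.Unlock()
	total, best := 0, 0
	for i, b := range p.backends {
		p.current[i] += b.weight
		total += b.weight
		if p.current[i] > p.current[best] {
			best = i
		}
	}
	p.current[best] -= total
	return p.backends[best]
}

// UpdateWeight is called after each response. Deriving the weight directly
// from reported CPU utilization is an assumption made for this sketch.
func (p *WRRPicker) UpdateWeight(b *Backend, cpuUtilization float64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	w := int((1.0 - cpuUtilization) * 100)
	if w < 1 {
		w = 1
	}
	b.weight = w
}

func main() {
	a := &Backend{Addr: "10.0.0.1:8080", weight: 100}
	b := &Backend{Addr: "10.0.0.2:8080", weight: 100}
	picker := NewWRRPicker([]*Backend{a, b})
	picker.UpdateWeight(b, 0.90) // b reported 90% CPU, so it now gets far less traffic
	for i := 0; i < 6; i++ {
		fmt.Println(picker.Pick().Addr)
	}
}
```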
Handling Overload
To operate a large distributed system stably, one requirement has to be met: handling overload. No matter how well load balancing is done, some part of the system will eventually overload as load increases.
In order to solve a problem, we first need to define the problem.
How to Define Overload?
Before defining overload, we first need to define load. People usually like to use a simple statistic like QPS to represent the load of a system, but practice has shown it is often a poor criterion. In the experience of Google's engineers, CPU consumption is usually a good signal of how close a system is to its capacity.
Now we know what the load of a machine is and have a standard for deciding when it is overloaded, but how do we detect it? Remember our long-lived connections (and UDP packets)? They carry the health-check responses, which report the state of each service producer. If a producer says it is in the lame-duck state, consumers stop sending requests to it. The core of this method is that the service producer reports its own state, rather than having consumers try to guess it, which is hard to do in a distributed system.
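As a rough sketch of what that self-reporting could look like, assume each health-check response carries a lame-duck flag along with the metrics from the quote above; the field names and addresses below are made up for illustration, not a real wire format.

```go
package main

import "fmt"

// HealthResponse sketches what a producer might self-report over the existing
// connection (or UDP health check). Field names are assumptions for this
// example, but the idea matches the quote above: rates, errors, and
// utilization travel with every response.
type HealthResponse struct {
	LameDuck       bool // producer asks clients to stop sending new requests
	CPUUtilization float64
	QPS            float64
	ErrorsPerSec   float64
}

// sendable keeps only the producers that are still accepting work. The
// consumer never guesses: it simply trusts what each producer reported.
func sendable(reports map[string]HealthResponse) []string {
	var out []string
	for addr, r := range reports {
		if !r.LameDuck {
			out = append(out, addr)
		}
	}
	return out
}

func main() {
	reports := map[string]HealthResponse{
		"10.0.0.1:8080": {CPUUtilization: 0.45},
		"10.0.0.2:8080": {LameDuck: true}, // draining before a restart
	}
	fmt.Println(sendable(reports)) // only 10.0.0.1:8080 stays in rotation
}
```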
Client
To solve this problem, we have to do some work on both the client and the server side. It is often misunderstood that overload handling is done purely by the server; in fact, the client also plays an important role.
Consider the situation where a server receives far more load from clients than it can handle. The server has to reject some requests with lower criticality (e.g. health-check requests) or requests that have already timed out (discarding those saves a lot of work, because the user has probably already sent another request). But for some requests, even rejecting them is not cheap; it can be nearly as expensive as serving them.
So client-side throttling comes to the rescue. To implement it, Google recommends adaptive throttling, in which the client rejects new requests locally with the following probability:
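The formula comes from the SRE book's Handling Overload chapter: each client tracks, over a trailing window (the book uses two minutes), `requests`, the number of requests attempted by the application layer, and `accepts`, the number of requests the backend accepted. A new request is rejected locally with probability

max(0, (requests − K × accepts) / (requests + 1))

where `K` is a tunable multiplier (the book suggests 2). Below is a minimal Go sketch of the idea; the sliding-window bookkeeping is collapsed into plain counters and the simulation in `main` is made up, so treat it as an illustration rather than a production implementation.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// AdaptiveThrottler is a simplified sketch of client-side adaptive throttling:
// reject new requests locally with probability
// max(0, (requests - K*accepts) / (requests + 1)).
// Real implementations keep these counters over a sliding window; plain
// counters are used here to keep the sketch short.
type AdaptiveThrottler struct {
	K        float64 // the SRE book suggests K = 2 as a starting point
	requests float64 // requests attempted by the application layer
	accepts  float64 // requests the backend actually accepted
}

// RejectProbability returns the current probability of rejecting locally.
func (t *AdaptiveThrottler) RejectProbability() float64 {
	return math.Max(0, (t.requests-t.K*t.accepts)/(t.requests+1))
}

// Allow decides whether the client should even send this request. Every
// application-layer attempt counts toward requests, including the ones we
// throttle locally, so the ratio keeps reflecting real demand.
func (t *AdaptiveThrottler) Allow() bool {
	p := t.RejectProbability()
	t.requests++
	return rand.Float64() >= p
}

// Record reports the backend's verdict for a request that was actually sent.
func (t *AdaptiveThrottler) Record(acceptedByBackend bool) {
	if acceptedByBackend {
		t.accepts++
	}
}

func main() {
	t := &AdaptiveThrottler{K: 2}
	// Simulate a backend that rejects everything: the local reject probability
	// climbs toward 1 and the client backs off without any coordination.
	for i := 0; i < 1000; i++ {
		if t.Allow() {
			t.Record(false) // backend rejected the request
		}
	}
	fmt.Printf("local reject probability: %.2f\n", t.RejectProbability())
}
```

The nice property is that a well-behaved client needs no extra coordination with the server: once the backend starts rejecting, `accepts` falls behind `requests` and the client sheds load on its own.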
Server
When it comes to the server side, there are three options:
- Redirect requests to another service provider when possible;
- Serve degraded results when necessary: responses that are not as accurate as, or that contain less data than, normal responses, but that are easier to compute. For example (a sketch follows this list):
  - Instead of searching an entire corpus to provide the best available results to a search query, search only a small percentage of the candidate set.
  - Rely on a local copy of results that may not be fully up to date but that will be cheaper to use than going against the canonical storage.
- Reject transparently when all else fails.
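As a toy illustration of the "serve degraded results" option, the sketch below searches only a fixed fraction of the candidate set while the server is overloaded; the corpus, the query, and the 10% fraction are all assumptions made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// search degrades gracefully: under normal load it scans the whole corpus, but
// when the server is overloaded it scans only a fraction of the candidate set
// and returns cheaper, less complete results. A real implementation would
// sample the corpus rather than simply truncating it.
func search(corpus []string, query string, overloaded bool) []string {
	candidates := corpus
	if overloaded {
		candidates = corpus[:len(corpus)/10] // degraded: only 10% of the corpus
	}
	var hits []string
	for _, doc := range candidates {
		if strings.Contains(doc, query) {
			hits = append(hits, doc)
		}
	}
	return hits
}

func main() {
	corpus := make([]string, 0, 100)
	for i := 0; i < 100; i++ {
		corpus = append(corpus, fmt.Sprintf("doc-%d about rpc", i))
	}
	fmt.Println(len(search(corpus, "rpc", false))) // 100 hits, full search
	fmt.Println(len(search(corpus, "rpc", true)))  // 10 hits, degraded search
}
```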
That is the overall policy. In more detail, the server does the following:
- Per-customer limits: each client gets an overall traffic budget, an error budget, and a retry budget.
- The server propagates the deadline and criticality of each request when serving it requires further RPC calls;
- When a customer runs out of global quota, a backend server only rejects requests of a given criticality if it is already rejecting all requests of all lower criticalities (in fact, the per-customer limits described earlier can be set per criticality). A toy sketch of this follows.
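To illustrate that last point, here is a toy per-customer limiter in Go with a separate budget per criticality level. A request spills into lower-level budgets once its own is exhausted, so a given criticality only starts seeing rejections when everything below it is already being rejected. The level names follow the SRE book, but the budgets and the spill-down rule are assumptions made for this sketch.

```go
package main

import "fmt"

// Criticality levels, lowest first; the names follow the SRE book.
type Criticality int

const (
	Sheddable Criticality = iota
	SheddablePlus
	Critical
	CriticalPlus
)

func (c Criticality) String() string {
	return [...]string{"SHEDDABLE", "SHEDDABLE_PLUS", "CRITICAL", "CRITICAL_PLUS"}[c]
}

// customerLimiter is a toy per-customer limiter with one budget per
// criticality level. The budget numbers are made up for this sketch.
type customerLimiter struct {
	remaining [4]int // remaining budget, indexed by Criticality
}

// admit charges the request against its own level first, then against lower
// levels, so a request of a given criticality is rejected only when its level
// and all lower levels are already exhausted.
func (l *customerLimiter) admit(c Criticality) bool {
	for level := c; level >= Sheddable; level-- {
		if l.remaining[level] > 0 {
			l.remaining[level]--
			return true
		}
	}
	return false
}

func main() {
	l := &customerLimiter{remaining: [4]int{2, 0, 1, 1}}
	incoming := []Criticality{Critical, Critical, Critical, Critical, Sheddable}
	for _, c := range incoming {
		fmt.Printf("%-15s admitted=%v\n", c, l.admit(c))
	}
	// The first CRITICAL request uses the CRITICAL budget, the next two spill
	// into the SHEDDABLE budget, and only then does a CRITICAL request get
	// rejected; by that point SHEDDABLE traffic is being rejected as well.
}
```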