The count of active requests may not be a very good proxy for the capability of a given backend: Many requests spend a significant portion of their life just waiting for a response from the network (i.e., waiting for responses to requests they initiate to other backends) and very little time on actual processing. For example, one backend task may be able to process twice as many requests as another (e.g., because it’s running on a machine with a CPU that’s twice as fast as the rest), but the latency of its requests may still be roughly the same as the latency of requests to the other task (because requests spend most of their life just waiting for the network to respond). In this case, because blocking on I/O often consumes zero CPU, very little RAM, and no bandwidth, we’d still want to send twice as many requests to the faster backend. However, Least-Loaded Round Robin will consider both backend tasks equally loaded.
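To make the limitation concrete, here is a minimal sketch of a least-loaded picker that selects by active request count alone, breaking ties round-robin. The `backend` struct and `pickLeastLoaded` function are hypothetical names for illustration, not any particular implementation; the point is simply that nothing in the selection logic can see CPU speed, so a backend that is twice as fast but has the same number of in-flight requests receives no extra traffic.

```go
package main

import "fmt"

// backend tracks only what Least-Loaded Round Robin can see:
// the number of requests currently in flight.
type backend struct {
	name   string
	active int // in-flight request count
}

// pickLeastLoaded returns the index of the backend with the fewest
// active requests, starting the scan at rrStart so that ties are
// broken round-robin style. Note that nothing here inspects CPU
// speed, RAM, or bandwidth -- only the in-flight count.
func pickLeastLoaded(backends []backend, rrStart int) int {
	best := rrStart % len(backends)
	for i := 1; i < len(backends); i++ {
		idx := (rrStart + i) % len(backends)
		if backends[idx].active < backends[best].active {
			best = idx
		}
	}
	return best
}

func main() {
	// "fast" runs on a CPU twice as fast as "slow", but both have
	// 10 requests in flight because those requests are mostly
	// blocked waiting on the network.
	backends := []backend{
		{name: "fast", active: 10},
		{name: "slow", active: 10},
	}

	// With equal active counts, the picker simply alternates:
	// the faster backend gets no more traffic than the slower one.
	for rr := 0; rr < 4; rr++ {
		choice := pickLeastLoaded(backends, rr)
		fmt.Printf("request %d -> %s\n", rr, backends[choice].name)
		backends[choice].active++
	}
}
```

Running this sketch prints an even fast/slow/fast/slow assignment, even though we would prefer the faster backend to absorb roughly twice the load.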