SRE (Chapter 17 - Testing for Reliability)
Testing is the mechanism you use to demonstrate specific areas of equivalence when changes occur.
Each test that passes both before and after a change reduces the uncertainty for which the analysis needs to allow.
Thorough testing helps us predict the future reliability of a given site with enough detail to be practically useful.
The amount of testing you need to conduct depends on the reliability requirements for your system.
As the percentage of your codebase covered by tests increases, you reduce uncertainty and the potential decrease in reliability from each change. Adequate testing coverage means that you can make more changes before reliability falls below an acceptable level.
Relationships Between Testing and Mean Time to Repair (1)
Passing a test or a series of tests doesn’t necessarily prove reliability. However, tests that are failing generally prove the absence of reliability.
A monitoring system can uncover bugs, but only as quickly as the reporting pipeline can react.
The Mean Time to Repair (MTTR) measures how long it takes the operations team to fix the bug, either through a rollback or another action.
Zero MTTR occurs when a system-level test is applied to a subsystem, and that test detects the exact same problem that monitoring would detect.
The more bugs you can find with zero MTTR, the higher the Mean Time Between Failures (MTBF) experienced by your users.
Types of Software Testing (2)
Traditional Tests (non-production)
Unit tests
A unit test is the smallest and simplest form of software testing.
Unit tests are also employed as a form of specification to ensure that a function or module exactly performs the behavior required by the system.
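A minimal sketch of such a test, assuming a hypothetical truncate helper and using Python's standard unittest framework; the tests double as a specification of the required behavior:

```python
import unittest

def truncate(text, limit):
    """Hypothetical helper: return text cut to at most `limit` characters."""
    return text if len(text) <= limit else text[:limit]

class TruncateTest(unittest.TestCase):
    # Each test case states one piece of the behavior the system requires.
    def test_short_text_is_unchanged(self):
        self.assertEqual(truncate("abc", 10), "abc")

    def test_long_text_is_cut_to_limit(self):
        self.assertEqual(truncate("abcdef", 3), "abc")

if __name__ == "__main__":
    unittest.main()
```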
Integration tests
Software components that pass individual unit tests are assembled into larger components.
Engineers then run an integration test on an assembled component to verify that it functions correctly.
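A small sketch of the idea, using hypothetical components (InMemoryStore and CounterService): the integration test checks that the assembled pieces cooperate, not just that each behaves in isolation:

```python
import unittest

class InMemoryStore:
    """Hypothetical storage component, already covered by its own unit tests."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

class CounterService:
    """Hypothetical component assembled on top of the store."""
    def __init__(self, store):
        self._store = store

    def increment(self, key):
        new_value = (self._store.get(key) or 0) + 1
        self._store.put(key, new_value)
        return new_value

class CounterIntegrationTest(unittest.TestCase):
    # Exercises the assembled component end to end through its dependency.
    def test_increments_persist_through_the_store(self):
        service = CounterService(InMemoryStore())
        self.assertEqual(service.increment("hits"), 1)
        self.assertEqual(service.increment("hits"), 2)

if __name__ == "__main__":
    unittest.main()
```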
System tests
A system test is the largest scale test that engineers run for an undeployed system.
All modules belonging to a specific component, such as a server that passed integration tests, are assembled into the system.
Then the engineer tests the end-to-end functionality of the system.
Types
Smoke tests
Smoke tests, in which engineers test very simple but critical behavior, are among the simplest type of system tests.
Smoke tests are also known as sanity testing, and serve to short-circuit additional and more expensive testing.
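A minimal smoke-test sketch, assuming a hypothetical local test instance that exposes a /healthz endpoint; if this trivial check fails, there is no point running the larger suites:

```python
import sys
import urllib.request

def smoke_test(base_url):
    """Hypothetical smoke test: fail fast if the freshly built server
    cannot even answer a trivial health request."""
    with urllib.request.urlopen(base_url + "/healthz", timeout=5) as resp:
        if resp.status != 200:
            sys.exit("smoke test failed: unexpected status %d" % resp.status)

if __name__ == "__main__":
    smoke_test("http://localhost:8080")  # assumed local test instance
```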
Performance tests
Response times for dependencies or resource requirements may change dramatically during the course of development, so a system needs to be tested to make sure that it doesn't become incrementally slower without anyone noticing (before it gets released to users). For example, a given program may evolve to need 32 GB of memory when it formerly only needed 8 GB, or a 10 ms response time might turn into 50 ms, and then into 100 ms.
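A sketch of such a latency check, with an assumed 50 ms budget and a stand-in request handler; a real performance test would measure the actual serving path and its memory footprint:

```python
import time

RESPONSE_TIME_BUDGET_S = 0.050  # assumed budget: 50 ms per request

def handle_request():
    """Stand-in for the code path under test."""
    time.sleep(0.010)

def test_average_latency_stays_within_budget():
    # Time a batch of requests so a gradual slowdown is caught before
    # release, rather than being noticed by users afterwards.
    n = 100
    start = time.monotonic()
    for _ in range(n):
        handle_request()
    average = (time.monotonic() - start) / n
    assert average <= RESPONSE_TIME_BUDGET_S, (
        "average latency %.3fs exceeds the %.3fs budget"
        % (average, RESPONSE_TIME_BUDGET_S)
    )

if __name__ == "__main__":
    test_average_latency_stays_within_budget()
```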
Regression tests
Regression tests can be analogized to a gallery of rogue bugs that historically caused the system to fail or produce incorrect results.
By documenting these bugs as tests at the system or integration level, engineers refactoring the codebase can be sure that they don’t accidentally introduce bugs that they’ve already invested time and effort to eliminate.
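A sketch of one such "rogue bug" captured as a test; the parse_endpoint helper and the historical crash it documents are hypothetical:

```python
import unittest

def parse_endpoint(spec):
    """Hypothetical parser that once crashed when the port was omitted."""
    host, _, port = spec.partition(":")
    return host, int(port) if port else None

class EndpointParsingRegressionTest(unittest.TestCase):
    # Regression test: documents a historical bug where "db.example.com:"
    # (a trailing colon with no port) raised ValueError, so a future
    # refactoring cannot quietly reintroduce it.
    def test_trailing_colon_without_port_does_not_crash(self):
        self.assertEqual(parse_endpoint("db.example.com:"),
                         ("db.example.com", None))

if __name__ == "__main__":
    unittest.main()
```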
Production Tests (Black Box)
Rollouts Entangle Tests
It’s often said that testing is (or should be) performed in a hermetic environment. This statement implies that production is not hermetic.
Of course, production usually isn’t hermetic, because rollout cadences make live changes to the production environment in small and well-understood chunks.
Configuration test
At Google, web service configurations are described in files that are stored in our version control system.
For each configuration file, a separate configuration test examines production to see how a particular binary is actually configured and reports discrepancies against that file.
Such tests are inherently not hermetic, as they operate outside the test infrastructure sandbox.
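A deliberately non-hermetic sketch of the idea, with an assumed path for the checked-in file and an assumed debug endpoint that reports the binary's live configuration:

```python
import json
import urllib.request

def test_production_matches_checked_in_config():
    """Hypothetical configuration test: ask production how a binary is
    actually configured and report any drift from version control."""
    with open("configs/frontend.json") as f:           # assumed checked-in file
        intended = json.load(f)
    url = "http://frontend.prod.example.com/configz"   # assumed debug endpoint
    with urllib.request.urlopen(url, timeout=10) as resp:
        actual = json.load(resp)
    assert actual == intended, "production configuration has drifted from source control"

if __name__ == "__main__":
    test_production_matches_checked_in_config()
```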
Stress test
In order to safely operate a system, SREs need to understand the limits of both the system and its components. In many cases, individual components don’t gracefully degrade beyond a certain point—instead, they catastrophically fail.
Engineers use stress tests to find the limits on a web service.
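A rough sketch of probing for that limit by ramping up load; the URL and ramp parameters are assumptions, and a real stress test would drive concurrent traffic with a proper load-generation framework rather than sequential requests:

```python
import urllib.request

def request_ok(url):
    """Return True if a single request succeeds within a short deadline."""
    try:
        with urllib.request.urlopen(url, timeout=1) as resp:
            return resp.status == 200
    except Exception:
        return False

def find_breaking_point(url, start=10, step=10, ceiling=1000):
    """Hypothetical stress test: keep raising the number of requests per
    round until the service starts failing them, then report the last
    load it handled cleanly."""
    load = start
    while load <= ceiling:
        successes = sum(request_ok(url) for _ in range(load))
        if successes < load:
            return load - step  # last round that completed without failures
        load += step
    return ceiling

if __name__ == "__main__":
    print(find_breaking_point("http://localhost:8080/work"))  # assumed test instance
```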
Canary test
To conduct a canary test, a subset of servers is upgraded to a new version or configuration and then left in an incubation period.
Should no unexpected variances occur, the release continues and the rest of the servers are upgraded in a progressive fashion. Should anything go awry, the modified servers can be quickly reverted to a known good state.
We commonly refer to the incubation period for the upgraded servers as "baking the binary."
A canary test isn’t really a test; rather, it’s structured user acceptance. Whereas configuration and stress tests confirm the existence of a specific condition over deterministic software, a canary test is more ad hoc. It only exposes the code under test to less predictable live production traffic, and thus, it isn’t perfect and doesn’t always catch newly introduced faults.
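A sketch of the acceptance decision made after the baking period, comparing the canary's error rate against the baseline fleet's; the tolerance factor and the numbers in the example are assumptions:

```python
def canary_healthy(canary_errors, canary_requests,
                   baseline_errors, baseline_requests,
                   tolerance=1.5):
    """Hypothetical acceptance check: continue the rollout only if the
    canary's error rate is not meaningfully worse than the rest of the
    fleet's; otherwise revert the modified servers to a known good state."""
    canary_rate = canary_errors / max(canary_requests, 1)
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    return canary_rate <= baseline_rate * tolerance

# Example: 12 errors in 10,000 canary requests vs. 90 in 100,000 baseline requests.
assert canary_healthy(12, 10_000, 90, 100_000)
```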
Creating a Test and Build Environment (3)
One way to establish a strong testing culture is to start documenting all reported bugs as test cases.
If every bug is converted into a test, each test is supposed to initially fail because the bug hasn’t yet been fixed. As engineers fix the bugs, the software passes testing and you’re on the road to developing a comprehensive regression test suite.
Once source control is in place, you can add a continuous build system that builds the software and runs tests every time code is submitted.
We’ve found it optimal if the build system notifies engineers the moment a change breaks a software project.
There are a variety of tools to help you quantify and visualize the level of test coverage you need. Use these tools to shape the focus of your testing: approach the prospect of creating highly tested code as an engineering project rather than a philosophical mental exercise. Instead of repeating the ambiguous refrain "We need more tests," set explicit goals and deadlines.
Remember that not all software is created equal. Life-critical or revenue-critical systems demand substantially higher levels of test quality and coverage than a non-production script with a short shelf life.
Testing at Scale (4)
Barrier Defenses Against Risky Software
Automation tools are also software.
Because their risk footprint appears out-of-band for a different layer of the service, their testing needs are more subtle.
Testing Scalable Tools
Testing Disaster
Using Statistical Tests
Fuzzing
Chaos Monkey