Tech Arch Resiliency - Coggle Diagram
Infrastructure
Infra, PaaS, Serverless, Managed Services
System Maturity: 1. Availability => 2. Fault Tolerance (survives component breaks / failures) => 3. Resiliency (handles attacks, bad data, and extreme traffic cost-effectively)
- Availability
(Focus on uptime, e.g. a 99.99% target)
- Redundancy => Fail-over
- Load Balancing
- Monitoring
- Disaster Recovery
- Replication
- Observability
- Rate Limiting / Load Shedding
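Rate limiting and load shedding can be illustrated with a token bucket. The following is a minimal sketch, not a production limiter: the class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for illustration, and the sketch is not thread-safe.

```python
import time


class TokenBucket:
    """Token-bucket rate limiter; requests beyond the budget are shed."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so tests can control time
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # shed this request (for example, answer HTTP 429)
```

A caller would check `bucket.allow()` before doing the work and reject the request cheaply when it returns `False`, so overload degrades service for some requests instead of taking the whole system down.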
- Fault Tolerance
(Zero Downtime Goal)
- Active-Active Redundancy
- Statelessness
- Quorum-based systems
CAP / PACELC, Eventual Consistency
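For quorum-based systems, a common rule of thumb is that reads see the latest write when the read and write quorums overlap, i.e. R + W > N for N replicas. The helper names below (`quorums_overlap`, `majority`, `tolerated_failures`) are hypothetical and only sketch the arithmetic.

```python
def quorums_overlap(n, w, r):
    """True when every read quorum shares a node with every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def majority(n):
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """Nodes that can fail while a majority quorum is still reachable."""
    return n - majority(n)
```

With N = 3, choosing W = 2 and R = 2 gives overlapping quorums and tolerates one failed node; W = 1 and R = 1 is faster but only eventually consistent.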
- All HA/FT strategies: Redundancy, failover, load balancing, replication.
- Circuit Breakers: Preventing cascading failures by "breaking" the circuit to a failing service
- Retries and Timeouts: Handling transient errors and preventing indefinite waits.
- Bulkheads: Isolating components so a failure in one doesn't affect others
- Chaos Engineering: Proactively injecting faults into the system to test its weaknesses and improve its ability to withstand real-world disruptions.
- Automated Healing/Self-Healing: Systems automatically detecting and resolving problems (e.g., restarting failed services).
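Retries for transient errors are usually combined with exponential backoff and jitter so that many clients do not retry in lockstep. The sketch below assumes the wrapped operation enforces its own timeout and raises `TimeoutError` or `ConnectionError` on transient failure; the function name `retry` and its parameters are illustrative, not taken from any particular library.

```python
import random
import time


def retry(operation, attempts=3, base_delay=0.1, max_delay=2.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); retry transient errors with backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

Errors outside `retry_on` (for example a validation error) propagate immediately, since retrying a non-transient failure only adds load.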
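Note: Circuit Breakers and Bulkheads both aim to isolate failures, but with different mechanisms: a circuit breaker stops calls to a dependency that is already failing, while a bulkhead caps the resources any one component can consume so it cannot starve the others.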
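The difference between the two patterns can be sketched side by side. This is a simplified, single-threaded circuit breaker and a semaphore-based bulkhead; the class names, thresholds, and the injectable `clock` are assumptions for illustration, and a real breaker would also need locking around its state.

```python
import threading
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected immediately."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0       # success closes the circuit again
        self.opened_at = None
        return result


class Bulkhead:
    """Caps concurrent calls so one dependency cannot use every worker."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full; rejecting call")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

The breaker reacts to observed failures over time, whereas the bulkhead enforces a fixed resource limit regardless of whether the dependency is healthy; the two are often used together.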
Failures are inevitable: network partitions, server crashes, and disk failures will happen.
Resiliency means handling them cost-effectively.
KPIs
Recovery Time Objective (RTO) [maximum acceptable downtime for an organization's operations during a disruption]
Recovery Point Objective (RPO) [maximum amount of data loss that an organization can tolerate during a disruption]
Mean Time To Recovery (MTTR) and Mean Time Between Failures (MTBF)
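These KPIs connect to an uptime target through simple arithmetic. Steady-state availability is commonly estimated as MTBF / (MTBF + MTTR), and an availability target implies a downtime budget for a given period. The function names below are hypothetical and the sketch assumes MTBF and MTTR are measured in the same unit.

```python
def availability(mtbf_hours, mttr_hours):
    """Estimated steady-state availability as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_budget_minutes(availability_target, period_days=365):
    """Downtime allowed over the period for a target such as 0.9999."""
    return (1 - availability_target) * period_days * 24 * 60
```

For example, a 99.99% target over a 365-day year leaves a budget of about 52.56 minutes of downtime, and a system that fails on average every 999 hours and takes 1 hour to recover has an estimated availability of 99.9%.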
Resources
Deloitte Resiliency framework, FMEA
Architecture Evaluation
OPTR, TOGAF, ATAM (Scenario-driven review)