Azure Well-Architected Framework
Reliability
Principles
Design for scale out
Design for failure
Design for self-healing
Observe application health
Design for business requirements (SLA 99.99%)
Drive automation
Patterns
High Availability (percentage of uptime)
Deployment Stamps
Deploy multiple independent copies of application components, including data stores
Geodes
Deploy backend services into a set of geographical nodes, each of which can service any client request in any region
Health Endpoint Monitoring
Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals
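A minimal sketch of such a health endpoint in Python, using only the standard library; the dependency checks (check_database, check_cache) are placeholders for real probes, not part of the original diagram:
```python
# Minimal health endpoint sketch: external monitoring tools poll /health
# at regular intervals and get an aggregate status of key dependencies.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def check_database() -> bool:
    return True  # placeholder: e.g. run "SELECT 1" against the real database

def check_cache() -> bool:
    return True  # placeholder: e.g. ping the cache

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        checks = {"database": check_database(), "cache": check_cache()}
        healthy = all(checks.values())
        body = json.dumps({"status": "healthy" if healthy else "unhealthy",
                           "checks": checks}).encode()
        self.send_response(200 if healthy else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```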
Queue-Based Load Leveling
Use a queue that acts as a buffer between a task and a service that it invokes, to smooth intermittent heavy loads
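A minimal sketch of the idea, assuming an in-process queue stands in for a real message broker: bursts of requests are buffered, and a worker drains them at a rate the downstream service can sustain.
```python
# Queue-based load leveling sketch: the queue absorbs an intermittent burst
# so the service behind the worker is never hit with the full spike at once.
import queue
import threading
import time

work_queue: "queue.Queue[str]" = queue.Queue()

def producer():
    # Simulates an intermittent heavy load: 20 requests arrive at once.
    for i in range(20):
        work_queue.put(f"request-{i}")

def worker():
    # The downstream service is invoked at a steady, sustainable rate.
    while True:
        item = work_queue.get()
        time.sleep(0.1)          # stand-in for calling the real service
        print("processed", item)
        work_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
producer()
work_queue.join()
```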
Throttling
Control the consumption of resources
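One common way to implement throttling is a token bucket; the sketch below is illustrative, with the rate and capacity values chosen arbitrarily:
```python
# Token-bucket throttling sketch: callers consume tokens that refill at a
# fixed rate; requests beyond the budget are rejected (e.g. with HTTP 429)
# instead of overloading the resource.
import time

class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_sec=5, capacity=10)
for i in range(15):
    print(i, "accepted" if bucket.allow() else "throttled (HTTP 429)")
```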
Bulkhead
Isolate elements of an application into pools so that if one fails, the others will continue to function
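A simple sketch of the bulkhead idea, using one bounded thread pool per downstream dependency; the dependency names and pool sizes are hypothetical:
```python
# Bulkhead sketch: each dependency gets its own bounded pool of concurrent
# calls, so exhausting one pool cannot starve calls to the other.
from concurrent.futures import ThreadPoolExecutor
import time

payments_pool = ThreadPoolExecutor(max_workers=5)   # isolated pool A
inventory_pool = ThreadPoolExecutor(max_workers=5)  # isolated pool B

def call_payments(order_id: str) -> str:
    time.sleep(0.1)  # stand-in for a remote call that may hang or fail
    return f"payment ok for {order_id}"

def call_inventory(order_id: str) -> str:
    time.sleep(0.1)
    return f"inventory ok for {order_id}"

# Even if the payments dependency stalls and fills its 5 workers,
# inventory calls continue to be served from their own pool.
futures = [payments_pool.submit(call_payments, f"o{i}") for i in range(10)]
futures += [inventory_pool.submit(call_inventory, f"o{i}") for i in range(10)]
for f in futures:
    print(f.result())
```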
Circuit Breaker
Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource
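A minimal circuit-breaker sketch; the failure threshold and reset timeout are arbitrary example values:
```python
# Circuit-breaker sketch: after N consecutive failures the circuit opens and
# calls fail fast; after a cool-down the next call is allowed through as a
# half-open trial.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result

# Usage: breaker = CircuitBreaker(); breaker.call(fetch_remote_resource)
```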
Resilience (ability to gracefully handle and recover from failures)
Bulkhead
Isolate elements of an application into pools so that if one fails, the others will continue to function
Circuit Breaker
Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource
Compensating Transaction (related to the Saga pattern)
Undo the work performed by a series of steps, which together define an eventually consistent operation
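A minimal sketch of compensation: each step registers an undo action, and if a later step fails the recorded compensations run in reverse order. The booking steps below are hypothetical placeholders.
```python
# Compensating-transaction sketch: undo completed steps of an eventually
# consistent operation when a later step fails.
from typing import Callable, List, Tuple

Step = Tuple[Callable[[], None], Callable[[], None]]  # (action, compensation)

def run_with_compensation(steps: List[Step]) -> None:
    done: List[Callable[[], None]] = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        for compensate in reversed(done):
            compensate()   # best-effort undo of the steps that already ran
        raise

# Hypothetical workflow: if charging the card fails, the reservations are
# cancelled to restore consistency.
run_with_compensation([
    (lambda: print("reserve flight"), lambda: print("cancel flight")),
    (lambda: print("reserve hotel"),  lambda: print("cancel hotel")),
    (lambda: print("charge card"),    lambda: print("refund card")),
])
```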
Health Endpoint Monitoring
Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals
Leader Election
Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances
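A simplified lease-based sketch of leader election; a production implementation would keep the lease in shared storage (for example a blob lease or a database row) rather than in process memory, and the names below are placeholders.
```python
# Leader-election sketch: whichever instance holds a non-expired lease is the
# leader; if it stops renewing, another instance can take over.
import threading
import time

class Lease:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.holder = None
        self.expires = 0.0
        self._lock = threading.Lock()

    def try_acquire(self, candidate: str) -> bool:
        with self._lock:
            now = time.monotonic()
            if self.holder in (None, candidate) or now > self.expires:
                self.holder, self.expires = candidate, now + self.ttl
                return True
            return False

lease = Lease(ttl=5.0)

def instance(name: str):
    for _ in range(3):
        if lease.try_acquire(name):
            print(name, "is the leader; coordinating the other instances")
        else:
            print(name, "is a follower; waiting")
        time.sleep(1)

for node in ("node-a", "node-b"):
    threading.Thread(target=instance, args=(node,)).start()
```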
Queue-Based Load Leveling
Use a queue that acts as a buffer between a task and a service that it invokes, to smooth intermittent heavy loads
Retry
Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that's previously failed.
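A minimal retry sketch with exponential backoff and jitter, retrying only on failure types that are plausibly transient; the attempt count and delays are example values:
```python
# Retry sketch: transparently retry a failed operation a bounded number of
# times, backing off exponentially and adding jitter to avoid retry storms.
import random
import time

def retry(operation, attempts: int = 5, base_delay: float = 0.5,
          transient=(ConnectionError, TimeoutError)):
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except transient:
            if attempt == attempts:
                raise                                  # give up after the last attempt
            delay = base_delay * (2 ** (attempt - 1))  # exponential backoff
            time.sleep(delay + random.uniform(0, delay))

# Usage (hypothetical client): retry(lambda: client.get("/orders/42"))
```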
Scheduler Agent Supervisor
Coordinate a set of actions across a distributed set of services and other remote resources
Performance Efficiency
Principles
Design for horizontal scaling
Define a capacity model according to the business requirements
Test the limits for predicted and random spikes and fluctuations in load
Use PaaS offerings
Choose the right resources and right-size
Apply strategies in your design early
strive for stateless application design
store state externally in a database or distributed cache
use caching where possible
Shift-left on performance testing
Run load and stress tests
Establish performance baselines
Run the test in the continuous integration (CI) build pipeline
Continuously monitor for performance in production
Monitor the health of the entire solution
Reevaluate the needs of the workload continuously
Checklist
Application design
Design for scaling
Scale as a unit
Take advantage of platform autoscaling features
Partition the workload
Avoid client affinity (that is, keep instances stateless)
Offload CPU-intensive and I/O-intensive tasks as background tasks
Data management
Use data partitioning
Design for eventual consistency
Reduce chatty interactions between components and services
where possible, combine several related operations into a single request
use stored procedures in databases to encapsulate complex logic, and reduce the number of round trips and resource locking
Use queues to level the load for high velocity data writes
Use a queue that acts as a buffer between a task and a service that it invokes
This can smooth intermittent heavy loads that may otherwise cause the service to fail or the task to time out
Minimize the load on the data store
The data store is commonly a processing bottleneck, a costly resource, and often not easy to scale out
Typically, it's much easier to scale out the application than the data store, so you should attempt to do as much of the compute-intensive processing as possible within the application
Minimize the volume of data retrieved
Aggressively use caching
Handle data growth and retention
Optimize Data Transfer Objects (DTOs) using an efficient binary format
DTOs are passed between the layers of an application many times
Enable client side caching
Consider denormalizing data
Consider if some additional storage volume and duplication is acceptable in order to reduce the load on the data store
Implementation
Use asynchronous calls
Carry out performance profiling and load testing
Compress highly compressible data
Minimize the number of connections required
Send requests in batches to optimize network use
Avoid a requirement to store server-side session state where possible
Use lightweight frameworks and libraries
Patterns
Throttling
Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service.
Static Content Hosting
Deploy static content to a cloud-based storage service that can deliver it directly to the client.
Sharding
Divide a data store into a set of horizontal partitions or shards.
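A minimal sketch of shard routing by hashing the partition key; the shard connection strings are placeholders, and a real design would consider consistent hashing or range-based sharding to limit data movement when shards are added.
```python
# Sharding sketch: a stable hash of the partition key routes each record to
# one of N horizontal partitions.
import hashlib

SHARDS = ["shard-0-connection", "shard-1-connection", "shard-2-connection"]

def shard_for(key: str) -> str:
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARDS)
    return SHARDS[index]

print(shard_for("customer-1001"))   # always maps to the same shard
print(shard_for("customer-1002"))
```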
Queue-Based Load Leveling
Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads.
Priority Queue
Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those with a lower priority.
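A minimal sketch using the standard-library priority queue, with hypothetical task names; lower numbers are served first, so urgent requests overtake bulk work even when they arrive later.
```python
# Priority-queue sketch: higher-priority requests are processed first.
import queue

pq: "queue.PriorityQueue[tuple[int, str]]" = queue.PriorityQueue()
pq.put((2, "bulk report generation"))
pq.put((0, "payment authorization"))   # highest priority
pq.put((1, "order confirmation email"))

while not pq.empty():
    priority, task = pq.get()
    print(f"processing (priority {priority}):", task)
```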
Materialized View
Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations.
Index Table
Create indexes over the fields in data stores that are frequently referenced by queries.
Event Sourcing
Use an append-only store to record the full series of events that describe actions taken on data in a domain.
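A minimal event-sourcing sketch with an in-memory append-only store and a hypothetical account domain; current state is rebuilt by replaying the events.
```python
# Event-sourcing sketch: state changes are recorded as immutable events,
# and the current state is derived by replaying the full series.
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Event:
    kind: str      # e.g. "Deposited" or "Withdrawn"
    amount: int

event_store: List[Event] = []          # append-only; events are never updated

def deposit(amount: int):  event_store.append(Event("Deposited", amount))
def withdraw(amount: int): event_store.append(Event("Withdrawn", amount))

def current_balance() -> int:
    balance = 0
    for e in event_store:              # replay the event stream
        balance += e.amount if e.kind == "Deposited" else -e.amount
    return balance

deposit(100); withdraw(30); deposit(5)
print(current_balance())               # 75
```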
CQRS
Segregate operations that read data from operations that update data by using separate interfaces.
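A minimal CQRS sketch with an in-memory write model and a separately shaped read model; the order domain and class names are hypothetical, and in practice the read model is often updated asynchronously via events.
```python
# CQRS sketch: commands (writes) and queries (reads) go through separate
# interfaces, which can later be scaled and stored independently.
from dataclasses import dataclass
from typing import Dict, List

_orders: Dict[str, dict] = {}          # write model
_order_summaries: List[str] = []       # read model, shaped for queries

@dataclass
class PlaceOrder:                      # command
    order_id: str
    item: str

class OrderCommandHandler:
    def handle(self, cmd: PlaceOrder) -> None:
        _orders[cmd.order_id] = {"item": cmd.item}
        _order_summaries.append(f"{cmd.order_id}: {cmd.item}")

class OrderQueryService:
    def list_summaries(self) -> List[str]:
        return list(_order_summaries)

OrderCommandHandler().handle(PlaceOrder("o-1", "keyboard"))
print(OrderQueryService().list_summaries())
```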
Choreography
Have each component of the system participate in the decision-making process about the workflow of a business transaction, instead of relying on a central point of control.
Cache-Aside
Load data on demand into a cache from a data store.
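A minimal cache-aside sketch using an in-memory dictionary as the cache; in practice the cache would be a distributed store such as Redis, and the data-store lookup below is a placeholder.
```python
# Cache-aside sketch: check the cache first, fall back to the data store on a
# miss, then populate the cache so later reads are cheap.
import time

cache = {}            # maps key -> (timestamp, value)
TTL_SECONDS = 60

def load_from_database(product_id: str) -> dict:
    return {"id": product_id, "name": "example product"}  # placeholder query

def get_product(product_id: str) -> dict:
    entry = cache.get(product_id)
    if entry and time.monotonic() - entry[0] < TTL_SECONDS:
        return entry[1]                       # cache hit
    product = load_from_database(product_id)  # cache miss: read the store
    cache[product_id] = (time.monotonic(), product)
    return product

print(get_product("p-42"))  # miss, loads from the store
print(get_product("p-42"))  # hit, served from the cache
```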
Scalability (ability of a system to handle increased load)
Application design
Always design stateless services
Always design for horizontal scaling
Prefer asynchronous inter-service communication, such as messages or events
Use caching where possible to avoid heavy load on the database
Always use a CDN for static resources
Infrastructure
Use Autoscaling to manage load increases and decreases
Plan for growth by adding scale units
Every component in the infrastructure must be scalable
Database
DB Sharding
DB Replica: one write, multiple read replicas
DB Caching
Security
Principles
Plan security readiness
Strive to adopt and implement security practices in architectural design decisions and operations with minimal friction.
Design to protect confidentiality
Prevent exposure to privacy, regulatory, application, and proprietary information through access restrictions and obfuscation techniques.
Design to protect integrity
Prevent corruption of design, implementation, operations, and data to avoid disruptions that can stop the system from delivering its intended utility or cause it to operate outside the prescribed limits. The system should provide information assurance throughout the workload lifecycle.
Design to protect availability
Prevent or minimize system and workload downtime and degradation in the event of a security incident by using strong security controls. You must maintain data integrity during the incident and after the system recovers.
Sustain and evolve your security posture
Incorporate continuous improvement and apply vigilance to stay ahead of attackers who are continuously evolving their attack strategies.
Checklist
Establish a security baseline
that's aligned to compliance requirements, industry standards, and platform recommendations. Regularly measure your workload architecture and operations against the baseline to sustain or improve your security posture over time.
Maintain a secure development lifecycle
by using a hardened, mostly automated, and auditable software supply chain. Incorporate a secure design by using threat modeling to safeguard against security-defeating implementations.
Classify and consistently apply sensitivity and information type labels
on all workload data and systems involved in data processing.
Create intentional segmentation and perimeters in your architecture design
and in the workload's footprint on the platform. The segmentation strategy must include networks, roles and responsibilities, workload identities, and resource organization.
Implement strict, conditional, and auditable identity and access management (IAM)
across all workload users, team members, and system components.
Isolate, filter, and control network traffic
across both ingress and egress flows.
Encrypt data
by using modern, industry-standard methods to guard confidentiality and integrity.
Protect application secrets
by hardening their storage and restricting access and manipulation and by auditing those actions.
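A hedged sketch of one way to apply this on Azure: read secrets from Azure Key Vault at runtime instead of embedding them in code or configuration. It assumes the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are hypothetical.
```python
# Secret-retrieval sketch: the application fetches the secret from a hardened
# store (Key Vault) whose access is restricted and audited.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()          # managed identity, CLI login, etc.
client = SecretClient(
    vault_url="https://example-vault.vault.azure.net",  # hypothetical vault
    credential=credential,
)
db_password = client.get_secret("database-password").value  # hypothetical name
# Use db_password to build the connection string at runtime; never log it.
```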
Implement a holistic monitoring strategy
that relies on modern threat detection mechanisms.
Establish a comprehensive testing regimen
that combines approaches to prevent security issues, validate threat prevention implementations, and test threat detection mechanisms.
Define and test effective incident response procedures
Tradeoffs
Security tradeoffs with Reliability
Tradeoff: Increased complexity
Tradeoff: Increased critical dependencies
Tradeoff: Increased complexity of disaster recovery
Tradeoff: Increased rate of change
Security tradeoffs with Cost Optimization
Tradeoff: Additional infrastructure
Tradeoff: Increased demand on infrastructure
Tradeoff: Increased process and operational costs
Security tradeoffs with Operational Excellence
Tradeoff: Complications in observability and serviceability
Tradeoff: Decreased agility and increased complexity
Tradeoff: Increased coordination efforts
Security tradeoffs with Performance Efficiency
Tradeoff: Increased latency and overhead
Tradeoff: Increased chance of misconfiguration
Operational Excellence
Principles
Embrace DevOps culture
Establish development standards
Evolve operations with observability
Deploy with confidence
Reach the desired state of deployment with predictability.
Automate for efficiency
Adopt safe deployment practices
Implement guardrails in the deployment process to minimize the effect of errors or unexpected conditions.