Please enable JavaScript.
Coggle requires JavaScript to display documents.
Database architectures (SQL/NoSQL) - Coggle Diagram
Database architectures (SQL/NoSQL)
Choosing SQL vs NoSQL
✅ Expected to Know
When to use relational vs. non-relational DBs.
✅ Best Practices
Use polyglot persistence: choose the right DB per service/domain
Clearly document rationale for each DB choice
Align DB type with CAP tradeoffs (Consistency, Availability, Partition tolerance)
Schema Design & Modeling
✅ Expected to Know
Principles for modeling relational and document-oriented data
✅ Best Practices
Normalize SQL schemas up to 3NF, then denormalize for performance if needed
In NoSQL, model around access patterns—avoid joins
Avoid schema-on-write lock-in by using loosely coupled schemas with validation layers
✅ Patterns
Star Schema / Snowflake Schema for analytics (SQL)
Aggregate Design for NoSQL (e.g., one document per business entity)
Schema Versioning: use schema_migrations (SQL) or version fields (NoSQL)
Scaling Patterns
✅ Expected to Know
How to scale vertically and horizontally for different DBs
✅ Best Practices
Use application-level sharding only when native features fall short
Monitor query plans (SQL) and partition key access patterns (NoSQL)
Understand consistency levels (eventual, strong, quorum)
✅ SQL
Read replicas (e.g., Aurora, PostgreSQL)
Sharding (key-based, range-based, hybrid)
Connection pooling (PgBouncer, RDS Proxy)
✅ NoSQL
Automatic partitioning (DynamoDB, Cassandra)
Hot partition avoidance
Replication factor tuning (e.g., MongoDB)
Transactions & Consistency
✅ Expected to Know
Data consistency guarantees and when to compromise
✅ Best Practices
Avoid 2PC (two-phase commit) across services
Document consistency expectations per data flow
✅ Patterns
Outbox pattern for event publishing with consistency
Compensating transactions in distributed systems
✅ SQL
Full ACID support
Use isolation levels: READ COMMITTED, REPEATABLE READ, SERIALIZABLE
✅ NoSQL
DynamoDB Transactions, MongoDB multi-doc transactions (less efficient)
Eventual consistency for high availability
Sagas for distributed transactions across services
Observability, Health & Maintenance
✅ Expected to Know
How to monitor and manage DB systems in production
✅ Best Practices
Backup regularly and automate DR plans (cross-region snapshots)
Enable automated patching and failover
Perform capacity planning and enforce limits (row size, table count, etc.)
✅ Patterns
Slow query logging
Query tracing with observability tools (Datadog, Prometheus + exporters)
Connection health checks for Kubernetes readiness probes
✅ Metrics to Monitor
Query latency, cache hit ratio, deadlocks, replication lag, read/write IOPS
Security, Compliance & Data Governance
✅ Expected to Know
Securing and governing data across all layers
✅ Best Practices
Encrypt at rest and in transit (KMS, SSL)
Use IAM roles or managed identities for DB access
Track audit logs, access patterns, and GDPR requirements (right to delete, data residency)
✅ Patterns
Field-level encryption (e.g., card number, PII)
RBAC or ABAC models for DB access control
Data masking for non-prod environments
Backup, Archiving & Data Lifecycle
✅ Expected to Know
How to manage long-term data storage and archiving
✅ Best Practices
Define data lifecycle policies for all datasets
Test point-in-time recovery (PITR) and snapshot restoration regularly
Ensure backup encryption and regional replication
✅ Patterns
Time-to-live (TTL) in DynamoDB or MongoDB for expirable data
Cold data tiering to S3 Glacier
Automated backups with retention policies
Multi-Tenancy & Isolation
✅ Expected to Know
Designing for SaaS or multi-customer environments
✅ Best Practices
Use row-level security (RLS) for tenant isolation
Consider query quotas or throttling per tenant
Design indexes and partitions with tenant-aware strategy
✅ Patterns
Single DB, shared schema with tenant ID columns
Single DB, schema per tenant (PostgreSQL)
Database per tenant for extreme isolation
Operational Tradeoffs & Costs
✅ Expected to Know
Choosing databases based on operational cost and complexity
✅ Best Practices
Prefer managed DBs (e.g., RDS, DynamoDB, MongoDB Atlas) when possible
Monitor connection limits, IO costs, storage growth
Use auto-scaling policies (Aurora Serverless, DynamoDB on-demand) wisely
Data Integration & Warehousing
✅ Expected to Know
How to connect OLTP databases to analytical and ML platforms
✅ Best Practices
Keep OLTP and OLAP workloads separate
Define SLA-aware data freshness strategies
Clean, deduplicate, and validate before analytics
✅ Patterns
CDC (Change Data Capture) with Debezium or DMS
ETL/ELT pipelines to Redshift, BigQuery, Snowflake
Data Lakes with S3 + Glue/Athena