Thirteenth reading - Oracle and High Availability, Ariana Alvarado Molina …
Importance of Data Protection
Oracle databases store valuable organizational data.
Timely access to data for business decisions is crucial.
Key Roles (Database Administrator, System Administrator, System Architect)
Responsible for implementing techniques to protect data.
Foundation includes proper backup operations.
Availability Strategy Beyond Backups
Essential for avoiding various outages (e.g., disk failures, primary site failure).
Software solutions complement backup operations.
Reliability of Hardware and Software
Hardware and software failures occur occasionally.
Complexity increases the likelihood of downtime.
Disaster Recovery and High Availability
Adequate recovery plans and options differentiate inconvenience from disaster.
Chapter explores Oracle's options for deploying high availability.
Oracle's High Availability Options
Leveraging built-in capabilities like instance recovery.
Options include Active Data Guard and Real Application Clusters.
Implementation of Procedures
Appropriate procedures are crucial for high availability.
Maximum Availability Architecture (MAA)
Oracle's comprehensive approach to high availability.
Encompasses various aspects discussed in the chapter.
Definition of Availability
Availability refers to a system being both "up" and "working."
Up means the Oracle database can be accessed.
Working means delivering the expected functionality with performance that meets service-level agreements (SLAs).
Downtime and Its Elimination
Unplanned downtime should be eliminated.
Caused by server, storage, network, software failures, and human error.
Planned downtime during changes should be minimized.
Impact of Database Failures
Businesses rely on data availability for crucial decisions.
Web-based solutions amplify the impact of database failures.
Failures in systems accessed externally can harm financial health and image.
Customer Service Application Example
Interruptions in services like package tracking can drive customers to competitors.
Challenges in Multiple Systems
Accessing data across multiple systems increases failure chances.
Failures can render an entire supply chain inaccessible.
Measuring High Availability
Availability Percentage
99%: 3.65 days of annual downtime.
99.9%: 8.76 hours of annual downtime.
99.99%: 52.56 minutes of annual downtime.
99.999%: 5.26 minutes of annual downtime.
For large-scale systems, availability above 99% can be costly, and each further increment of availability adds cost.
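The downtime figures above follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative, not from the reading):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a system may be down at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = annual_downtime_minutes(pct)
    print(f"{pct}%: {minutes:.2f} min = {minutes / 60:.2f} h = {minutes / 1440:.2f} days")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is one way to see why each step up tends to cost more.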
Considerations for Availability
Timing Matters
Availability requirements during working hours differ from 24/7.
Scheduling planned downtime after hours can be a deliberate strategy for reducing unplanned failures.
Global Operations
Multinational companies with global operations may require continuous availability.
Contextualizing 24/7/365 Availability
Balancing availability requirements with deployment and maintenance costs.
Examination of complexity and cost may lead to compromises on extreme availability.
Impact of Unexpected Availability Loss
Unexpected loss affects business and IT productivity.
Regardless of business opportunity costs, downtime impact is significant.
The System Stack and Availability
Causes of Unplanned Downtime
Various causes, ranging from easily preventable to requiring significant infrastructure investments.
Consideration of frequent causes in planning for application and database availability.
System Components and Technology Stack
A complete system comprises hardware, software, and networking components in a technology stack.
Individual component availability doesn't guarantee overall system availability.
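One way to see why highly available components do not guarantee a highly available system: if every layer must be up for the application to work, the availabilities multiply. The sketch below assumes layers fail independently and uses made-up figures for five hypothetical layers.

```python
from math import prod


def stack_availability(layer_availabilities):
    """Availability of a stack in which every layer must be up (layers in series)."""
    return prod(layer_availabilities)


# Hypothetical layers: network, server hardware, operating system, database, application.
layers = [0.999, 0.999, 0.999, 0.999, 0.999]
overall = stack_availability(layers)
print(f"Each layer: 99.9%, whole stack: {overall:.3%}")
```

Under these assumptions the stack as a whole is less available than any single layer, so the weakest or most numerous layers deserve attention first.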
Achieving High Availability
Different strategies for each system component.
Failures in components above the database can impact database access.
System Stack Layers
Physical and logical layers cooperate to deliver an application.
Server hardware, software, and the database form the foundational layers.
Database Impact on System Stack
Oracle database failure affects higher stack levels.
Data loss or corruption can impact overall application integrity.
Server Hardware, Storage, and Database Instance Failure
Server and Storage Failure
Abrupt causes of unplanned downtime.
Server crash due to hardware or software issues.
A crash causes failure of the Oracle instance, not loss of the database itself.
Crash Impact on Oracle
Instance failure affects the delivery of promised functionality.
Data safety maintained in disk files despite a system crash.
Instance Recovery:
Process of cleaning up after a crash.
Active queries and transactions abruptly terminated.
Connected sessions lose the server process.
What Is Instance Recovery?
Instance Recovery in Oracle
Automatically triggered after an instance failure.
Oracle detects the need for recovery from the control file and the datafile headers.
Actions During Recovery:
Recovers all committed transactions.
Rolls back or undoes in-flight transactions.
Transaction Commit Confirmation:
Details written to current online redo log.
Confirmation sent back to client application.
Phases of Instance Recovery
Instance Recovery Phases
Roll Forward Phase
Uses redo logs to reapply changes from last checkpoint to failure.
Closes the gap between online redo logs and datafiles.
Rollback or Transaction Recovery
Rolls back uncommitted transactions.
Performed in the background, a technique called deferred rollback.
Checkpoint Concept
Checkpoints synchronize data blocks in datafiles with the redo log.
Recorded in control file, datafile headers, and redo log.
Determines the recovery starting point.
Deferred Rollback
Background rollback of uncommitted transactions after the roll forward phase.
Reduces downtime and variability in recovery times.
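The two phases can be illustrated with a toy model. This is a simplified sketch, not Oracle's implementation: the RedoRecord type and the dictionary standing in for a datafile are hypothetical, and the "before" value kept in each record stands in for the undo information Oracle stores separately.

```python
from dataclasses import dataclass


@dataclass
class RedoRecord:
    txn: str     # transaction identifier
    key: str     # item being changed
    before: int  # value before the change (stand-in for undo)
    after: int   # value after the change


def instance_recovery(datafile, redo_since_checkpoint, committed_txns):
    """Toy recovery: roll forward all redo, then roll back uncommitted changes."""
    data = dict(datafile)
    # Phase 1, roll forward: reapply every change logged since the last
    # checkpoint, whether or not its transaction committed.
    for rec in redo_since_checkpoint:
        data[rec.key] = rec.after
    # Phase 2, rollback: undo changes of uncommitted transactions,
    # newest first, so the oldest "before" value is restored last.
    for rec in reversed(redo_since_checkpoint):
        if rec.txn not in committed_txns:
            data[rec.key] = rec.before
    return data


datafile = {"a": 1, "b": 2}  # state on disk as of the last checkpoint
redo = [
    RedoRecord("T1", "a", 1, 10),   # T1 committed before the crash
    RedoRecord("T2", "b", 2, 20),   # T2 was still in flight
    RedoRecord("T2", "b", 20, 30),
]
recovered = instance_recovery(datafile, redo, committed_txns={"T1"})
print(recovered)  # {'a': 10, 'b': 2}
```

The committed change to "a" survives and the in-flight changes to "b" are undone. In this toy both phases run inline; with deferred rollback the second phase would proceed in the background after the database is already open.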
Protecting Against System Failure
Component Redundancy:
Implement redundancy for critical system components.
Data Guard Deployment
Use Data Guard to maintain an alternate site in case the primary site fails.
Real Application Clusters (RAC):
Deploy RAC for database continuity during instance failure.
Additional Measures:
Details not provided
Component Redundancy in Hardware
Ensure fault-tolerant hardware components.
Include redundancy for:
Disk drives
Disk controllers
Flash memory
CPUs
Power supplies
Oracle's engineered systems (e.g., Oracle Exadata) are pre-configured for redundancy.
Disk failure is a primary focus for redundancy because it is relatively likely and many redundant solutions are available.
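The benefit of redundant components can be estimated with a simple formula: the unit is down only when every copy is down at once. A minimal sketch, assuming copies fail independently, any one copy is enough to keep working, and switchover is instantaneous (the figures are made up):

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of a component deployed as `copies` independent redundant units."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1 - (1 - single) ** copies


for copies in (1, 2, 3):
    print(f"{copies} power supplies at 99% each: {redundant_availability(0.99, copies):.4%}")
```

Real failures are often correlated (shared power, shared firmware), so the result is best read as an upper bound.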
Disk Redundancy
Need for Redundancy
Disk failures are more frequent due to the increased number of disks in large databases.
RAID (Redundant Array of Inexpensive Disks) is commonly used for protection.
RAID Concepts
Data duplication on another disk.
Data striped across multiple disks.
Parity calculation for redundancy.
Parity Calculation Example
Parity: E = A + B + C + D
If drive B is lost: B = E - A - C - D
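The parity idea above is written with addition and subtraction for readability. RAID implementations typically use bitwise XOR instead, which is its own inverse, so the same operation both computes parity and rebuilds a lost block. A small sketch with made-up four-byte blocks:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Four hypothetical data blocks, one per drive.
a, b, c, d = b"ORCL", b"DATA", b"REDO", b"UNDO"
parity = xor_blocks([a, b, c, d])

# Drive B is lost: rebuild its block from the parity and the surviving drives.
recovered_b = xor_blocks([parity, a, c, d])
print(recovered_b)  # b'DATA'
```

Any single lost block can be rebuilt this way; losing two blocks at once cannot be repaired with one parity block.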
Automatic Storage Management
Functionality
Manages storage placement for Oracle Database files.
Recommended MAA solution for storage failures and data corruption.
Features
SAME Approach
"Stripe and Mirror Everything."
Handles various disk types, including JBOD arrays.
Redundancy
Mirroring at disk or per-file level.
Dynamic redistribution to avoid disk bottlenecks.
Flexibility
Adding or removing disks without service interruption.
Automatic rebalancing and remirroring on disk failure.
Performance Enhancements
Fast mirror resynchronization for quicker recovery.
Flex ASM allows ASM instances to run on servers separate from the database servers.
Suitable for managing a database storage grid.
Site and Computer Server Failover
Automatic Recovery
Oracle Database automatically recovers from system crashes.
Ensures data integrity in a relational database.
Involves downtime during recovery.
Failover for Server Failures
Multiple servers employed for failover.
Mitigates downtime caused by server failures.
Implemented through Data Guard or Real Application Clusters (RAC).
Protection from Site Failures
Challenges in protecting against complete primary site failure.
Risks include physical, environmental, and hardware issues.
Monitoring and redundancy controls for power supply, climate control, server redundancy, and data redundancy.
Oracle Data Guard and Site Failures
Purpose
Recover databases from site failures, storage failures, and data corruption.
Configurations
Physical standby (MAA best practice).
Snapshot standby for testing.
Logical standby to reduce planned downtime.
Features
Supports up to 30 physical standby Oracle databases.
Redo can be cascaded from the primary through one standby to further standbys.
Disaster recovery for multitenant container databases (CDBs) in Oracle Database 12c.
Physical Standby Database
Keep a copy of database files at a second location.
Ship redo logs to the second site and apply them to the copy.
Standby becomes the production database in case of primary site failure.
Causes of Data Loss
Unshipped archived redo logs.
Unarchived filled online redo logs.
Unarchived current online redo logs.
Automation and Fast-Start Failover
Automated archiving of redo logs to multiple destinations.
Fast-start failover to limit redo loss exposure.
Oracle Data Guard Broker
Enables monitoring and control of standby databases.
Single command for failover.
Oracle Active Data Guard and Zero Data Loss
Oracle Active Data Guard Overview
An option for Oracle Database Enterprise Edition.
Deployed to avoid data loss.
Real-time apply for immediate redo data application.
Data Loss Modes
Maximum Protection Mode: No data loss; the primary shuts down if the write to the remote log file fails.
Maximum Performance Mode: High protection, asynchronous writes to standby without impacting primary performance.
Reporting and Querying
Standby used for reporting.
The standby can be queried, including DML against global temporary tables, while redo apply is active.
Scaling Reads
Oracle Database 12c supports scaling reads across multiple standby databases simultaneously.
Oracle GoldenGate and Replication
Utilized for site and computer failover without unplanned downtime.
Relies on log-based change data capture and replication.
Suitable for heterogeneous server platforms and database implementations.
Requires only GoldenGate software without additional Oracle tools.
Involves extensive scripting and testing for implementation.
Data still in the replication queue when a server goes down may be unavailable at the secondary site.
Real Application Clusters and Instance Failures
Oracle-recommended Maximum Availability Architecture (MAA) solution.
Database spread across multiple nodes, each with an active Oracle instance.
Clients can connect to any instance for database access.
Workload distributed over nodes, leading to lower resource usage per server.
Minimal overhead for maintaining cluster roles on each node.
Trade-off between performance degradation on node failure and the cost of additional nodes.
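The trade-off between degradation and node count can be put in numbers. A rough sketch, assuming equally sized nodes and an evenly spread workload (real clusters rarely balance this neatly):

```python
def surviving_capacity(nodes: int, failed: int = 1) -> float:
    """Fraction of cluster capacity left after `failed` of `nodes` equal nodes go down."""
    if not 0 <= failed < nodes:
        raise ValueError("failed must be at least 0 and less than nodes")
    return (nodes - failed) / nodes


for n in (2, 4, 8):
    left = surviving_capacity(n)
    print(f"{n} nodes, 1 failure: {left:.1%} capacity left, "
          f"each survivor carries {1 / left:.2f}x its normal load")
```

With two nodes a single failure halves capacity; with eight nodes it removes one eighth, at the cost of buying and running more servers.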
Oracle Transparent Application Failover
Seamless migration of users' sessions between Oracle instances.
Transparent high availability during instance failures.
Migration of users from highly active to less active instances.
Clients automatically reconnect to another instance.
Option to preconnect clients to an alternate instance during login for faster failover.
Resubmits queries active during the instance failure.
Oracle ensures read consistency for correct results.
Potential response lag for "next" row requests.
Failover callbacks are useful for re-initializing session state.
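The query-resubmission behavior can be illustrated with a toy simulation. Nothing here is an Oracle API: ToyInstance and InstanceDown are hypothetical stand-ins, and the sketch assumes every instance returns the same rows in the same order, which is the role read consistency plays in the real feature.

```python
class InstanceDown(Exception):
    """Raised by the toy instance to simulate an instance failure."""


class ToyInstance:
    def __init__(self, name, rows, fail_at=None):
        self.name = name
        self.rows = rows
        self.fail_at = fail_at  # row position at which this instance "crashes"

    def fetch(self, position):
        """Return the row at `position`, or None when the result set is exhausted."""
        if self.fail_at is not None and position >= self.fail_at:
            raise InstanceDown(self.name)
        return self.rows[position] if position < len(self.rows) else None


def fetch_all_with_failover(instances):
    """Fetch every row, moving to the next instance and resuming at the same row on failure."""
    results = []
    current = 0
    while True:
        try:
            row = instances[current].fetch(len(results))
        except InstanceDown:
            current += 1  # reconnect to another instance
            if current == len(instances):
                raise  # no surviving instance is left
            continue  # resubmit the fetch where it left off
        if row is None:
            return results
        results.append(row)


rows = ["r1", "r2", "r3", "r4"]
cluster = [ToyInstance("node1", rows, fail_at=2), ToyInstance("node2", rows)]
print(fetch_all_with_failover(cluster))  # ['r1', 'r2', 'r3', 'r4']
```

The client sees one uninterrupted result set. The retry on the surviving instance is where the lag on the "next" row request would come from.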
Ariana Alvarado Molina - 2021089068