Please enable JavaScript.
Coggle requires JavaScript to display documents.
Databases - Coggle Diagram
Databases
Normalisation
Technique for designing relational database tables to minimize duplication of information to safeguard the database against logical or structural problems.
Provides rules that help:
- No redundancy
- Consistent data throughout linked tables.
- records can be added and removed without issues.
- Complex queries can be carried out..
First normal form
- No columns with repeated or similar data
- Each data item cannot be broken down any further - atomic
- Each row is unique: it has a primary key
- Each field has a unique name
Second normal form
- No partial dependencies.
- The table must already be in first normal form
Third normal form
- It is already in 2NF
- No non-key attributes that depend on another non-key attribute
Benefits and problems
Benefits:
- Database doesn't have redundant data. Smaller size, less money spent on storage
- Less data to search through, faster to run a query on the data
- No data duplication, better data integrity and less risk of mistakes
- No data duplication, less chance of storing two or more different copies of the data
- One change can be made which can be cascaded across any related records
Problems:
- Be careful with making data atomic. Just because you can split some types of data further, it isn't always necessarily the correct thing to do
- You can end up with more tables than an unnormalized database
- The more tables and the more complex the database, the slower queries can be to run
- Necessary to assign more relationships to interact with larger numbers of tables
- With more tables, setting up queries can become more complex
SQL
-
SQL commands
A statement or command in SQL is called a query. Queries are formed from the built in commands in SQL.
- CREATE TABLE - create an empty table
- DROP - delete a table or even a complete database
- INSERT - insert a row into a table
- UPDATE - alter data in a row
- SELECT - retrieve data from the database.
- WHERE sets up one or more conditions to filter the query
- AND allows more than one condition to be included
- FROM identifies the table to be used
SQL create/alter
Commands that alter a database
Commands to change the structure of a table
- Adding a new field:
ALTER TABLE table_name ADD column_name datatype
- Removing an existing field
ALTER TABLE table_name DROP COLUMN column_name
- Commands to manage the data
Let us insert a row of data into the table
INSERT INTO TABLE
(FIELD1
, FIELD2
, FIELD3
) VALUES ('X', 'Y', 'Z');
- Update an existing record
UPDATE TABLE
SET FIELD1
=CONDITION1 WHERE FIELD2
=CONDITION2
SQL extraction
Commands to extract data:
- Selection
Let us select every record from the table:
SELECT * FROM TABLE
WHERE 1
The wildcard * means 'every field' and the WHERE 1 means 'match every row'. Therefore this will retreive the whole table.
- Counting
We may just want to know the number of record
SELECT COUNT (FIELD
) FROM TABLE
Counts the records whose FIELD is not empty and return the number.
- Filtering selection
Only retrieve records with a specific condition
SELECT*FROMTABLE
WHEREFIELD
= CONDITION
- Combining conditions
Combine conditions with logical AND / OR.
SELECT*FROMTABLE
WHEREFIELD
=CONDITION1 ORFIELD
='CONDITION2'
-
Primary key
A field(s) in a table which is unique identifier for every record in the table.
Automatically indexed. There are two types:
-
-
Transaction processing
Database transaction - operations that update one or more databases in some way.A transaction is a single operation executed on data. Has two outcomes:
- The transaction succeeds
- The transaction fails
A partially-successful transaction that fails to complete properly should be detected by the system and cancelled so that the database is in the same state as it was as if the transaction never happened.
CRUD
Transaction changes the state of a database, there are four basic operations.
- CREATE
- READ
- UPDATE
- DELETE
ACID rule
Developed to help make sure that the contents and layout of a database aren't damaged or broken even if a transaction fails part way through. These rules are built into database management systems:
- Atomicity
- Consistency
- Isolation
- Durability
Atomicity
Atomic - cannot be broken down into smaller parts.
An atomic transaction is indivisible.
If one of the operations fails, the management system detects this and rolls back the transaction entirely, ensuring that database integrity is maintained. If the transaction fails, then the database is not updated, as if it never happened.
For a transaction to have atomicity, the transaction must fully happen or not happen at all. It must not be completed partially
Consistency
Every database has rules about the kind of data held within its fields and the relationships between tables and keys.
Consistency - transaction must ensure that all the referential integrity rules are being followed.
Consistency means that data integrity remains intact after the transaction.
Isolation
Isolated transaction cannot be interfered with while underway.
Data field/table - placed in a 'locked' status to prevent other transactions affecting it. Once transaction is done, lock is removed.
Relevant if multiple people are working with the database at the same time. If two transactions are running and each needs data that has been isolated by the other. This can lead to a deadlock where neither transaction can proceed.
Simultaneous execution of transactions should lead to the same result as if they were executed one after the other.
Durability
Whatever transaction has taken place, update must remain in place even if there is a system failure.
Durable transaction is recorded so results cannot be easily lost.
A successful transaction updates the database stored in secondary storage - hard drive. Opposite of a durable transaction would be storing the changes in a way that they could be easily lost - for example, never moving them out of main memory.
Why use a database?
- Vast amounts of information can be stored.
- Queries can be run to search for specific records or groups of records.
- Information can be extracted from the database and exported into a word processing package for mail merging
Most official (or spam) letters will be personally addressed to you, with your name, address, account details, etc. This is done by running a query on a database and finding a set of records which match a set of criteria. The data from the query is automatically extracted from the database and placed into a pre-written letter, merged, printed and sent to you. This technique enables companies to contact thousands of people very quickly.
- Queries and reports can be produced from the stored data
Queries on a database can result in a huge amount of results being returned. These results are shown as a list of records in a table. Whilst tables can be sorted into a logical order it can be hard to spot trends or patterns and a table isn't really the best way of presenting data to other people. Reports can be created from the records returned by the query. A report will enable the information to be presented in a professional way with graphs, labels, annotation and your company logo.
Parts of a database
- Table
Database may contain one or more tables. If there is only one table then all of the records held in the database will be in that table. If there are multiple tables then it is normal for each table to store records related to a particular category.
Each table should be given a relevant name.
- Record
Each row in a table is called a record. A record is made up of individual pieces of information that are all related.
- Field
Each column in a table is called a field. A field contains one specific piece of data about a particular item. Each field should be given a relevant name.
Flat file database
Consists of a single table which stores all of the data held within that database
Easy to set up and can be made in software that many have access to such as a spreadsheet.
Repeats data unnecessarily - 'data duplication'.
There is so much data duplication, flat-file databases tend to require more storage space than relational databases.
Relational database
Consists of a number of linked tables. Each table contains records about a particular entity.
Once the tables have been set up, a relationship can be created to link them together.
This arrangement of related tables is known as a 'relational database'. It recognises the difference between entities and uses different tables for each entity.
Benefit: data doesn't have to be duplicated.
Reducing data duplication reduces the amount of data which needs to be stored, making the database smaller. It reduces the risk of mistakes, because every time you have to type the same data in, there is a risk you could mis-spell it.
Secondary key
An index used to search and sort through the database with more convenience and speed.
Purpose: to store and search for data. When databases become large they can take a while to search.
Fields that are likely to be searched upon can be indexed for faster searching. This field, because it is not a primary key, is called a secondary key.
Foreign key
Used to link tables together and create a relationship. A field in one table that is linked to the primary key in another table.
Referential integrity
Process of ensuring consistency as it makes sure that information is not removed if it is required elsewhere in the database.
It checks that every foreign key value has a matching value in the corresponding primary key. Referential integrity uses these to ensure that there are no orphan records (records unlinked to any other table, even though it should).
Data consistency
When a database transaction can change data in acceptable ways. Data inserted or manipulated in the database must follow pre-defined rules such as constraints and triggers. With multi-user databases, such as client-server databases, data consistency means that each user has a reliable view of the data, i.e. data should be in line with the changes made by the user’s own transactions and transactions of other users. It ensures uniformity and accuracy of data between different applications using the database and as data travels across a network.
Data redundancy
Occurs when there is duplication of data or unnecessary data is kept in the database. This can happen when the same value is repeated in two or more fields in a table, or the same field is replicated in two or more tables. If a change is made in one place, then not all the instances of the data will have the same value, leading to data corruption, compromise data accuracy and increase storage requirements. Database normalisation is the main way of dealing with data redundancy.
Data independency
Is achieved when the database is designed so that it reflects the natural relationships between entities and is not structured in a way that suits a specific application. So, when changes are made to the database structure or definition, they will not affect existing database applications that use the database.
BDMS
Managing the modifications so that database application programs don’t need to be rewritten every time a change occurs to the database. The users’ view and experience of the data is not compromised.
- Physical data independence - ability to adapt physical aspects of the database without affecting the database applications.
- Logical data independence - ability to adapt logical aspects of the database without affecting the database applications.
Data integrity
Data integrity - maintaining and ensuring the reliability of data in terms of its accuracy, completeness, and consistency over its lifecycle. Data corruption - when unintentional changes occur to data resulting in errors. Data integrity includes the following:
- Entity integrity - every table must have a primary key, which must have a unique value and can’t be null.
- Domain integrity - every attribute in a relational database is associated with a domain - allowed values the attribute can contain and is defined by characteristics: data type, length of data that it can accept, any constraints on values that are allowed, if there is a default value, if it can accept null values.
- Referential integrity - if a record is added to a table and a value is entered into a foreign key field, the value must exist in the primary key field of another table. Primary/foreign key relationship ensures that a record in one table is linked to a record in another. Sometimes foreign key can be null to signify that there is no relationship. If null values are not allowed, then there must be a related record.
Record locking
Preventing simultaneous access to data in a database, to prevent inconsistent results. Deadlock can occur as a result.
-
-
Capturing data
Process of getting the information you wish to use.
Banks use Mangetic Ink Character to capture data from cheques apart from the amount which must be entered manually.
-
Managing the data
To manipulate the information in any type of way. E.g. sorting through or selecting certain parts using SQL
A database is a collection of data or information which is held together in an organised or logical way.