Exchanging data - Coggle Diagram
Exchanging data
Compression, encryption and hashing
Compression:
- The process used to reduce the storage space required by a file
- Particularly important for sharing files over networks or the Internet
- Increases the number of files that can be transferred in a given time
- Downloading a compressed file is faster than downloading the full version
Lossy vs lossless compression:
- Lossy compression reduces the size of the file by permanently removing some information, so the original can't be perfectly restored
- Lossless compression reduces the size of a file without losing any information
Run length encoding:
- A method of lossless compression
- Repeated values are removed and replaced with one occurrence followed by the number of times it should be repeated
- Relies on consecutive pieces of data being the same
- Doesn't offer a great reduction in file size if there's little repetition
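As a rough illustration, run length encoding can be sketched in Python as below. This is a minimal sketch assuming the data is a string of characters and that each run is stored as a (character, count) pair; real RLE formats pack the counts into bytes in various ways.

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Replace each run of repeated characters with (character, run length)."""
    encoded: list[tuple[str, int]] = []
    for ch in data:
        if encoded and encoded[-1][0] == ch:
            # Same as the previous character: extend the current run
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            # A new run starts here
            encoded.append((ch, 1))
    return encoded


def rle_decode(encoded: list[tuple[str, int]]) -> str:
    """Expand each (character, count) pair back into the original run."""
    return "".join(ch * count for ch, count in encoded)


print(rle_encode("AAAABBBCCD"))  # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
```

Encoding "ABCD" with this sketch produces four pairs, which takes more space than the original: the little-repetition case described above.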
Dictionary encoding:
- A method of lossless compression
- Frequently occurring pieces of data replaced with an index
- Compressed data is stored alongside a dictionary
- Dictionary matches frequently occurring data to an index
- Original data can be restored using the dictionary
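A minimal word-level sketch of dictionary encoding, assuming whole words separated by single spaces are the "pieces of data". Practical dictionary coders such as LZW build their dictionaries from repeated byte sequences instead, so treat this only as an illustration of the index-and-dictionary idea.

```python
def dict_encode(text: str) -> tuple[dict[int, str], list[int]]:
    """Replace each word with an index into a dictionary of distinct words."""
    dictionary: dict[int, str] = {}
    lookup: dict[str, int] = {}  # reverse mapping, only needed while encoding
    encoded: list[int] = []
    for word in text.split(" "):
        if word not in lookup:
            index = len(dictionary)
            dictionary[index] = word
            lookup[word] = index
        encoded.append(lookup[word])
    return dictionary, encoded


def dict_decode(dictionary: dict[int, str], encoded: list[int]) -> str:
    """Restore the original text by looking each index up in the dictionary."""
    return " ".join(dictionary[i] for i in encoded)


dictionary, encoded = dict_encode("the cat sat on the mat the end")
print(dictionary)  # {0: 'the', 1: 'cat', 2: 'sat', 3: 'on', 4: 'mat', 5: 'end'}
print(encoded)     # [0, 1, 2, 3, 0, 4, 0, 5]
```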
Encryption:
- Used to keep data secure when it's being transmitted
Symmetric encryption:
- Both sender and receiver share the same private key
- The key is distributed in a process called a key exchange
- This key is used for both encrypting and decrypting data
- The key must be kept secret
- If the key is intercepted then any communications sent can be decrypted and read
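To illustrate one shared key doing both jobs, here is a toy symmetric cipher that XORs each byte with a repeating key. It is not secure and is only a stand-in for real symmetric algorithms such as AES; the key value is hypothetical.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key; applying it twice restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


shared_key = b"secret"  # hypothetical key agreed during the key exchange
ciphertext = xor_cipher(b"Meet at noon", shared_key)  # sender encrypts
plaintext = xor_cipher(ciphertext, shared_key)        # receiver decrypts with the same key
print(plaintext)  # b'Meet at noon'
```

Anyone who intercepts `shared_key` can run the same function and read every message, which is why the key exchange is the weak point of symmetric encryption.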
Asymmetric encryption:
- Two keys are used: public and private
- The public key can be published anywhere
- The private key must be kept secret
- Together, the keys are known as a key pair
- The keys are mathematically related to one another
- Messages encrypted with the public key can only be decrypted with the corresponding private key
- Encrypting a message using your private key verifies that the message was sent by you. If your public key can decrypt a message, then it must have been encrypted with your private key, which only you have access to
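The mathematical relationship between the two keys can be illustrated with "textbook" RSA using tiny primes. This is a toy sketch only: the key sizes are far too small, and real systems add padding and sign a hash of the message rather than the message itself.

```python
# Toy key generation with tiny primes (real keys use primes hundreds of digits long)
p, q = 61, 53
n = p * q                # 3233, shared by both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(message: int, key: tuple[int, int]) -> int:
    """Raise the message to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


# Confidentiality: anyone encrypts with the public key, only the private key decrypts
ciphertext = apply_key(65, public_key)
assert apply_key(ciphertext, private_key) == 65

# Verification: encrypt with the private key, and the public key recovers the message
signature = apply_key(65, private_key)
assert apply_key(signature, public_key) == 65
```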
Hashing:
- An input (called a key) is turned into a fixed size value (called a hash)
- A vast number of algorithms, called hash functions, do this
- The output of a hash function can't be reversed to form the key
- Passwords can therefore be stored as hashes, which can't be reversed to gain the passwords
- A hash table is a data structure which holds key-value pairs
- Hash tables can be used to look up data in an array in constant time (on average)
- Hash tables are used extensively in situations where a lot of data needs to be stored with constant access times. For example, in caches and databases
- If two keys produce the same hash, a collision is said to occur
- Methods to overcome collisions include storing colliding items together in a list under the same hash value (chaining) and using a second hash function to generate a new hash
- A good hash function should have a low chance of collision and should be quick to calculate
- A hash function's output should be smaller than the input it was provided
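A hash table that handles collisions by storing colliding items together in a list (chaining) might be sketched as follows. The hash function here (sum of character codes modulo the table size) is a deliberately simple assumption chosen so that collisions are easy to produce; it is not a good hash function. In practice Python's built-in dict is already a hash table.

```python
class HashTable:
    """A fixed-size hash table that resolves collisions by chaining."""

    def __init__(self, size: int = 8) -> None:
        # Each slot (bucket) holds a list of (key, value) pairs
        self.buckets: list[list[tuple[str, object]]] = [[] for _ in range(size)]

    def _hash(self, key: str) -> int:
        # Illustrative hash function: sum of character codes mod table size
        return sum(ord(ch) for ch in key) % len(self.buckets)

    def put(self, key: str, value: object) -> None:
        bucket = self.buckets[self._hash(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:  # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key (possibly a collision): add to the list

    def get(self, key: str) -> object:
        for existing_key, value in self.buckets[self._hash(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable()
table.put("cat", 1)
table.put("act", 2)  # same characters as "cat", so the same hash: a collision
print(table.get("cat"), table.get("act"))  # 1 2
```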
Networks
Networks and protocols
Characteristics of a network:
- Two or more computers connected together that can transmit data
- Physical topology is the physical layout of the network
- Logical topology is the way in which data flows around a network
Topologies
- Bus topology: network topology where all terminals (devices) are connected to a backbone cable
Advantages:
- Cheaper to set up, doesn't require any additional hardware
Disadvantages:
- If backbone cable fails, the entire network gets disconnected
- As traffic increases, performance decreases
- All computers can see the data transmission
- Star topology: uses a central node (switch/computer) to direct the flow of data, MAC (Media Access Control) addresses identify each device
Advantages:
- Performance is consistent even if network is being heavily used
- If one cable fails, only that station is affected
- Transmits data faster, so it gives better performance than bus topology
- It's easy to add new stations
- No data collisions
Disadvantages:
- Expensive due to switch and cabling
- If the central switch fails, the rest of the network fails
- Mesh topology: every node is connected to every other node
- Most commonly found with wireless technology like Wi-Fi
Advantages:
- No cabling cost
- As the number of nodes increases, the reliability and speed of the network improve
- Nodes automatically get incorporated
- It's faster since nodes don't go through a central switch
Disadvantages:
- You have to purchase devices with wireless capabilities
- Maintaining the network is difficult
Protocols:
- Sets of rules defining how two devices communicate with each other
- Need to be standard so all devices can communicate, regardless of manufacturer
The Internet structure
- A network of networks
- Allows computers on opposite sides of the globe to communicate with each other
The TCP/IP stack:
- Transmission Control Protocol/Internet Protocol
- A stack of networking protocols that work together passing packets during communication
Protocol layering:
- Application layer:
- Based at the top of the stack
- Specifies which protocol needs to be used for the application data being sent (e.g. HTTP for web pages)
- Transport layer:
- Uses TCP to establish an end-to-end connection between the source and recipient computer
- Splits up data into packets
- Labels packets with their packet number and the port number
- Requests retransmission of any lost packets
- Network layer:
- Adds source and destination IP addresses
- Routers operate on the network layer and the router is what uses the IP addresses to forward the packets
- Link layer:
- The connection between the network devices
- Adds the MAC address identifying the Network Interface Cards of the source and destination computers
- On the recipient's computer the layers occur again in reverse:
- Link layer:
- Removes the MAC addresses
- Network layer:
- Transport layer:
- Removes the port number and reassembles the packets
- Application layer:
- Presents the data to the recipient in the form it was sent
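The way each layer adds its own information on the way down, and strips it on the way up, can be modelled with nested dictionaries. This is a simplified sketch: the port number (80, as used by HTTP), the IP addresses (from documentation ranges) and the MAC addresses are hypothetical, and real headers contain many more fields.

```python
def encapsulate(message: str, size: int = 5) -> list[dict]:
    """Pass data down the stack: each layer wraps the previous layer's output."""
    frames = []
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    for seq, chunk in enumerate(chunks):
        # Transport layer: packet number and (hypothetical) port number
        segment = {"port": 80, "seq": seq, "data": chunk}
        # Network layer: source and destination IP addresses
        packet = {"src_ip": "192.0.2.1", "dst_ip": "198.51.100.7", "segment": segment}
        # Link layer: source and destination MAC addresses
        frame = {"src_mac": "02:00:00:00:00:01", "dst_mac": "02:00:00:00:00:02", "packet": packet}
        frames.append(frame)
    return frames


def decapsulate(frames: list[dict]) -> str:
    """Pass data up the stack on the recipient's computer, in reverse order."""
    segments = []
    for frame in frames:
        packet = frame["packet"]     # link layer: remove the MAC addresses
        segment = packet["segment"]  # network layer: remove the IP addresses
        segments.append(segment)
    # Transport layer: reassemble by packet number, then drop the port numbers
    segments.sort(key=lambda s: s["seq"])
    return "".join(s["data"] for s in segments)


frames = encapsulate("Hello, world!")
print(len(frames))                # 3
print(decapsulate(frames[::-1]))  # Hello, world!
```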
LANs and WANs:
- Local area network (LAN) is a network spread over a small geographical area
- Wide area network (WAN) is a network spread over a large geographical area, that typically requires extra hardware
DNS:
- Domain name system
- The system used to name resources on the internet
- A hierarchy where each smaller domain is separated from the larger domain by a full stop
- DNS server translates domain names into IP addresses when we access a website
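The right-to-left hierarchy can be illustrated with a toy resolver that walks nested dictionaries from the top-level domain down to a host. The zone data and IP addresses are hypothetical, and a real lookup queries a chain of separate DNS servers with caching; in Python, `socket.gethostbyname` performs a real lookup through the operating system.

```python
# Hypothetical zone data: each level only knows about the level directly below it
ROOT = {
    "com": {"example": {"www": "192.0.2.10"}},
    "uk": {"co": {"example": {"www": "192.0.2.20"}}},
}


def resolve(domain: str) -> str:
    """Walk the hierarchy from the top-level domain down to the host."""
    node = ROOT
    for label in reversed(domain.split(".")):  # "www.example.com" -> com, example, www
        if not isinstance(node, dict) or label not in node:
            raise LookupError(f"cannot resolve {domain}")
        node = node[label]
    if not isinstance(node, str):
        raise LookupError(f"{domain} is a domain, not a host")
    return node


print(resolve("www.example.com"))    # 192.0.2.10
print(resolve("www.example.co.uk"))  # 192.0.2.20
```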
Packet switching
- A method of communicating packets of data across a network
- A packet is just a section of data
- Packets aren't limited to a single route
Advantages:
- There are multiple methods to ensure data arrives
- There is more than one route to the other devices, so if one path breaks another can be used
- You can transfer packets over very large networks to allow communication globally
Disadvantages:
- Time is spent deconstructing and reconstructing the data packets
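The ability to route around a broken path can be sketched with a breadth-first search over a small, hypothetical network of routers. Choosing the route with the fewest hops is an assumption made for simplicity; real routers use routing protocols that can weigh other factors such as congestion.

```python
from collections import deque

# Hypothetical network: routers A-E and the links between them
links = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}


def find_route(network, start, goal):
    """Breadth-first search for a path with the fewest hops from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # follow back-pointers to rebuild the path
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(network[node]):  # sorted for a deterministic result
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no route exists


print(find_route(links, "A", "E"))  # ['A', 'B', 'D', 'E']

# The B-D link fails: later packets can still get through via C
links["B"].discard("D")
links["D"].discard("B")
print(find_route(links, "A", "E"))  # ['A', 'C', 'D', 'E']
```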
Circuit switching
- A method of communication where a direct link is created between two devices
- Link maintained for the entire conversation
- The two devices must transfer and receive data at the same rate
Advantages:
- The data arrives in a logical order which results in a quicker reconstruction of the data
- This enables two users to hold a call without delay in speech
Disadvantages:
- Bandwidth is wasted during periods of time where no data is sent
- The devices must transfer and receive data at the same time
- Since switches are used, electrical interference is produced which can corrupt or lose data
Data packets:
- Segments of data
- Contains various pieces of information
- Header
- Sender's and recipient's IP addresses
- Protocol being used
- Packet number (the packet's position in the order)
- Time to live/hop limit
- Payload
- Trailer
- Checksum, or cyclic redundancy check
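The trailer's error check can be illustrated by comparing a simple additive checksum with a cyclic redundancy check from Python's standard `zlib` module. The 16-bit sum is a hypothetical, simplified checksum; real protocols define their own exact algorithms.

```python
import zlib


def simple_checksum(data: bytes) -> int:
    """Add up every byte and keep the result within 16 bits."""
    return sum(data) % 65536


payload = b"Hello, network!"
sent_checksum = simple_checksum(payload)  # sender stores these in the trailer
sent_crc = zlib.crc32(payload)

corrupted = b"Hellp, network!"  # one byte changed in transit
print(simple_checksum(corrupted) == sent_checksum)  # False: error detected
print(zlib.crc32(corrupted) == sent_crc)            # False: error detected
```

Swapping two bytes (`b"Hlelo, network!"`) leaves the simple sum unchanged, so it goes undetected, while the CRC still changes: one reason CRCs are preferred.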
Network hardware
Network Interface Cards (NICs):
- Required to connect to a network
- Each NIC has a unique Media Access Control (MAC) address, which identifies the device on the network
Switches:
- Used to direct the flow of data across the network
- Most commonly used in a star topology
Wireless Access Points (WAPs):
- Allow devices to connect to a network wirelessly
- More commonly used to connect devices to a router which can allow internet access
- Used in mesh networks
Routers:
- Used to connect two or more networks together
- One network will often belong to the ISP's network (internet service providers' network) allowing the network to connect to the internet
Gateways:
- Used when protocols aren't the same between networks
- Translate protocols so that networks using different protocols can communicate
- Remove the header from each packet and repackage the remaining data with a new header for the protocol of the new network
Databases
Relational database
Relational databases:
- A relational database is one which uses different tables for different entities
- An entity is an item of interest about which information is stored
Flat file:
- A flat file database consists of a single file
- The flat file will most likely be based around a single entity and its attributes
- Attributes are the categories about which data is collected
- Flat files are typically written out in the following way: Entity1(Attribute1, Attribute2, Attribute3, ...)
Entity relationship modelling:
- One-to-one: Each entity can only be linked to one other entity
- One-to-many: One entity can be associated with many other entities
- Many-to-many: One entity can be associated with many other entities and the same applies the other way round
Normalisation:
- The process of coming up with the best possible design for a relational database is called normalisation
- Normalisation tries to accomplish the following things:
- No redundancy (unnecessary duplicates)
- Consistent data throughout linked tables
- Records can be added and removed without issues
- Complex queries can be carried out
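First normal form:
- No attribute can contain more than a single value
Second normal form:
- Is in first normal form
- No partial dependencies (every non-key attribute depends on the whole of the primary key, not just part of a composite key)
Third normal form:
- Is in second normal form
- Contains no non-key dependencies
- A non-key dependency is when an attribute depends on another non-key attribute rather than on the primary key; in third normal form, every non-key attribute depends on the primary key and nothing else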
Indexing:
- Method used to store the position of each record when ordered by a certain attribute
- Used to look up and access data quickly
- Primary key is automatically indexed
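The idea of an index (attribute values stored in order alongside each record's position) can be sketched with a sorted list and a binary search from the standard `bisect` module. The records are hypothetical, and real database systems typically use structures such as B-trees rather than a plain sorted list.

```python
import bisect

# Records stored in the order they were added (position = index in the list)
records = [
    {"id": 1, "surname": "Patel"},
    {"id": 2, "surname": "Jones"},
    {"id": 3, "surname": "Smith"},
    {"id": 4, "surname": "Evans"},
]

# Index on surname: (attribute value, record position) pairs kept in sorted order
surname_index = sorted((r["surname"], pos) for pos, r in enumerate(records))


def find_by_surname(surname):
    """Binary-search the index, then jump straight to the record's position."""
    i = bisect.bisect_left(surname_index, (surname, -1))
    if i < len(surname_index) and surname_index[i][0] == surname:
        return records[surname_index[i][1]]
    return None


print(surname_index)             # [('Evans', 3), ('Jones', 1), ('Patel', 0), ('Smith', 2)]
print(find_by_surname("Smith"))  # {'id': 3, 'surname': 'Smith'}
```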
Handling data
Capturing data:
- Data needs to be input into the database and there are various ways of doing this
- The chosen method is always dependent on the context
- Data may need to be manually entered or scanned using methods such as Magnetic Ink Character Recognition (MICR) which is used with cheques
Selecting and managing data:
- Selecting the correct data is an important part of data preprocessing
- This could involve selecting only data that fits certain criteria
- Collected data can be managed using SQL to sort, restructure and select certain sections
Exchanging data:
- Exchanging data is the process of transferring the data that has been collected
- One common example of this is EDI (Electronic Data Interchange)
SQL
- SQL stands for Structured Query Language and is a declarative language used to manipulate databases
SELECT, FROM, WHERE:
- The SELECT statement is used to collect fields from a given table
- The FROM statement specifies which table/tables the information will come from
- The WHERE statement specifies the search criteria
ORDER BY:
- The ORDER BY part of the query specifies which field to sort by and whether the results are in ascending (ASC) or descending (DESC) order
JOIN:
- JOIN provides a method of combining rows from multiple tables based on a common field between them
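These clauses can be tried out with Python's built-in `sqlite3` module and a throwaway in-memory database. The tables and data are hypothetical; the table is named Orders because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT, City TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Total REAL);
    INSERT INTO Customer VALUES (1, 'Ada', 'Leeds'), (2, 'Alan', 'York'), (3, 'Grace', 'Leeds');
    INSERT INTO Orders VALUES (10, 1, 25.0), (11, 3, 40.0), (12, 1, 15.5);
""")

# SELECT fields FROM a table WHERE a condition holds, sorted with ORDER BY
rows = conn.execute(
    "SELECT Name FROM Customer WHERE City = ? ORDER BY Name DESC", ("Leeds",)
).fetchall()
print(rows)  # [('Grace',), ('Ada',)]

# JOIN combines rows from both tables on the shared CustomerID field
joined = conn.execute("""
    SELECT Customer.Name, Orders.Total
    FROM Customer JOIN Orders ON Customer.CustomerID = Orders.CustomerID
    ORDER BY Orders.Total
""").fetchall()
print(joined)  # [('Ada', 15.5), ('Ada', 25.0), ('Grace', 40.0)]
```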
CREATE:
- The CREATE statement allows you to make new databases and tables (CREATE TABLE)
- You need to specify a few details for each attribute:
- Whether it is the primary key
- Its data type
- Whether it needs to be filled in
ALTER:
- This is used to add, delete or modify the columns in a table
INSERT INTO:
- This is used to insert a new record in a table
UPDATE:
- This is used to update a record in a table
DELETE:
- This is used to delete a record from a database table
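A sketch of the statements above, again using `sqlite3` with a hypothetical Student table. ALTER support varies between database systems: SQLite can add, rename and (in recent versions) drop columns, while other systems also allow modifying an existing column's type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: each attribute gets a data type; the key and required fields are marked
conn.execute("""
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name      TEXT NOT NULL,
        Year      INTEGER
    )
""")

# ALTER: add a new column to the existing table
conn.execute("ALTER TABLE Student ADD COLUMN Email TEXT")

# INSERT INTO: add new records
conn.execute("INSERT INTO Student (StudentID, Name, Year) VALUES (1, 'Sam', 12)")
conn.execute("INSERT INTO Student (StudentID, Name, Year) VALUES (2, 'Priya', 13)")

# UPDATE: change an existing record (without WHERE, every row would change)
conn.execute("UPDATE Student SET Email = 'sam@example.com' WHERE StudentID = 1")

# DELETE: remove a record
conn.execute("DELETE FROM Student WHERE StudentID = 2")

print(conn.execute("SELECT * FROM Student").fetchall())
# [(1, 'Sam', 12, 'sam@example.com')]
```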
Referential integrity
- Referential integrity is the process of ensuring consistency between linked tables
- This makes sure that information isn't removed if it is required elsewhere in a linked table
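Many database systems enforce referential integrity through foreign keys. In this `sqlite3` sketch with hypothetical tables, deleting a department that an employee still refers to is refused. SQLite only enforces foreign keys once `PRAGMA foreign_keys` is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (
        EmpID  INTEGER PRIMARY KEY,
        Name   TEXT,
        DeptID INTEGER REFERENCES Department(DeptID)
    );
    INSERT INTO Department VALUES (1, 'Sales');
    INSERT INTO Employee VALUES (100, 'Lee', 1);
""")

blocked = False
try:
    # Employee 100 still refers to department 1, so this delete is refused
    conn.execute("DELETE FROM Department WHERE DeptID = 1")
except sqlite3.IntegrityError as err:
    blocked = True
    print("Blocked:", err)
```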
Transaction processing:
- A transaction is defined as a single operation executed on data
- Transaction must be processed in line with ACID
ACID (atomicity, consistency, isolation, durability):
- Atomicity: A transaction must be processed in its entirety or not at all
- Consistency: A transaction must keep referential integrity rules between linked tables
- Isolation: Simultaneous execution of transactions must lead to the same result as if they were executed one after the other
- Durability: Once a transaction has been committed, its changes will remain, even in the event of a power failure or crash
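Atomicity can be demonstrated with a hypothetical bank transfer in `sqlite3`: using the connection as a context manager commits the transaction if both updates succeed and rolls it back if either fails. The CHECK constraint stopping negative balances is an assumption added to trigger a failure; note that an in-memory database offers no durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (ID INTEGER PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))")
conn.execute("INSERT INTO Account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(amount: int, from_id: int, to_id: int) -> bool:
    """Move money between accounts as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE ID = ?", (amount, to_id))
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE ID = ?", (amount, from_id))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed, so the credit above was undone too


print(transfer(30, 1, 2))   # True
print(transfer(500, 1, 2))  # False: account 1 would go negative
print(conn.execute("SELECT Balance FROM Account ORDER BY ID").fetchall())  # [(70,), (80,)]
```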
Record locking:
- The process of preventing simultaneous access of records in a database
- This is used to prevent inconsistencies or a loss of updates
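- While a record is locked, anyone else trying to access it must wait until the lock is released
- The biggest problem with this is deadlock: two users each lock a record that the other needs, so both wait forever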
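Record locking can be sketched in Python with a `threading.Lock` guarding a single hypothetical record: each thread must hold the lock before it reads and updates the value, so no update is lost. A database would manage these locks itself; this only models the idea.

```python
import threading

balance = {"record_42": 0}      # hypothetical shared record
record_lock = threading.Lock()  # one lock guarding that record


def deposit(times: int) -> None:
    for _ in range(times):
        with record_lock:  # lock the record; other threads wait here
            current = balance["record_42"]
            balance["record_42"] = current + 1  # read-modify-write can't be interleaved


workers = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(balance["record_42"])  # 40000: no updates were lost
```

Deadlock arises when two threads each hold one lock and wait for the other's. A common way to avoid it is for every thread to acquire locks in the same order.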
Redundancy:
- The process of having one or more copies of the data in physically different locations
- This means that if one copy is damaged, the data can be recovered from the others
Web technologies
Web development
HTML:
- HTML is the language/script that web pages are written in
- It allows a browser to interpret and render a webpage for the viewer by describing the structure and order of the webpage
- The language uses tags written in angle brackets (<tag>, </tag>). A webpage has two sections: a head and a body
HTML tags:
- <html>: all code written within these tags is interpreted as HTML
- <body>: defines the content in the main browser content area
- <link>: this is used to link to a css stylesheet
- <head>: defines the browser tab or window heading area
- <title>: defines the text that appears in the tab or window heading area
- <h1>, <h2>, <h3>: heading styles in decreasing sizes
- <p>: a paragraph separated with a line space above and below
- <img>: self-closing image tag with parameters (<img src="location" height="x" width="y">)
- <a>: anchor tag defining a hyperlink with location parameters (<a href = location> link text</a>)
- <ol>: defines an ordered list
- <ul>: defines an unordered list
- <li>: defines an individual list item
- <div>: creates a division of a page into separate areas, each of which can be referred to uniquely by name (<div id="page">)
Classes and identifiers:
- Classes and identifiers are names given to HTML elements so that CSS selectors can style them; this means groups of items can be styled together. In HTML they are usually applied to div tags
- Identifiers are defined with an initial hash (#) and must be unique on each webpage
- Classes are defined with a full stop as a prefix to the class name, classes can be used multiple times on a webpage
CSS:
- CSS is a script/language like HTML, except it is used to describe the style of a webpage
- CSS can be used to specify the way HTML elements look; styles can be applied to whole tags such as <h1>, <p> or <div>
- CSS can be used in two different forms: internal/embedded or external
- Internal/embedded CSS is placed inside <style> tags and is entered directly within the HTML document
- External CSS is written in a separate stylesheet file, which is linked to the HTML document with the <link> tag
JavaScript:
- JavaScript is a language which has a similar layout to languages like Python. Its main function is to add interactivity to websites
- JavaScript is interpreted rather than compiled, so that it can be run by the browser every time the webpage is displayed
- JavaScript can be used to process input data on the client's computer, and may change the local page interactively
- The local computer can check for and fix invalid data before sending it to the server
- This eases the traffic on busy servers
Search engine indexing
Search engines:
- A search engine is a program that searches through a database of internet addresses looking for resources based on criteria set by the user
- Search engines rely on an index of web pages. Web crawlers are used to collect this information: they traverse the internet page by page, following links to other sites
- The web crawlers collect keywords and phrases from each webpage. Web crawlers also collect metadata from websites; this is information specified by the website owner
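The traversal a web crawler performs can be sketched as a breadth-first search over a hypothetical set of pages, building an index from keywords to the pages that contain them. A real crawler would fetch pages over HTTP and parse their HTML and metadata; here each page's keywords and links are simply given.

```python
from collections import deque

# Hypothetical web: each page maps to its keywords and the pages it links to
WEB = {
    "home.html":  {"keywords": ["welcome", "news"], "links": ["news.html", "about.html"]},
    "news.html":  {"keywords": ["news", "sport"], "links": ["home.html", "sport.html"]},
    "about.html": {"keywords": ["about", "contact"], "links": []},
    "sport.html": {"keywords": ["sport", "football"], "links": ["news.html"]},
}


def crawl(start: str) -> dict[str, set[str]]:
    """Visit every reachable page once and build a keyword -> pages index."""
    index: dict[str, set[str]] = {}
    visited = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for word in WEB[page]["keywords"]:
            index.setdefault(word, set()).add(page)
        for link in WEB[page]["links"]:  # follow links to discover new pages
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return index


index = crawl("home.html")
print(sorted(index["news"]))  # ['home.html', 'news.html']
```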
Page rank algorithm:
- The page rank algorithm ranks each web page, higher ranked pages will show up first on the search engine. There are two factors which determine the page rank of a page:
- How many incoming links it has from other pages
- The page rank of the web pages that link to it
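The data structure used is a directed graph showing which pages link to which: the webpages are nodes and the links between pages are the arcs between the nodes. The PageRank algorithm itself is given by:
$$\text{PageRank}(x) = (1 - d) + d\left(\frac{\text{PageRank}(T_1)}{\text{Count}(T_1)} + \dots + \frac{\text{PageRank}(T_n)}{\text{Count}(T_n)}\right)$$
- T1 ... Tn are the pages that link to page x
- Count(Ti) is the number of outgoing links on page Ti
- d is a damping factor, usually set to 0.85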
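The formula can be applied repeatedly until the ranks settle. This sketch assumes a hypothetical three-page web in which every page has at least one outgoing link, starts every page at a rank of 1 and uses d = 0.85; it runs a fixed number of iterations rather than testing for convergence.

```python
# Hypothetical directed graph: each page maps to the pages it links to
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}


def pagerank(graph: dict[str, list[str]], d: float = 0.85, iterations: int = 100) -> dict[str, float]:
    """Repeatedly apply PR(x) = (1 - d) + d * sum(PR(T) / Count(T)) over pages T linking to x."""
    ranks = {page: 1.0 for page in graph}  # initial guess for every page
    for _ in range(iterations):
        new_ranks = {}
        for page in graph:
            # Each page T linking here passes on its rank divided by its outgoing link count
            incoming = sum(ranks[t] / len(graph[t]) for t in graph if page in graph[t])
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


for page, rank in sorted(pagerank(links).items()):
    print(page, round(rank, 2))
# A 1.16
# B 0.64
# C 1.19
```

Because each page passes on all of its rank, the ranks in this form add up to the number of pages (3 here).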