Chapter1: Basic
Types of Big Data
Structured data:
- Data is ordered in a well-structured format;
all entities in the same group share the same attributes.
- Is easily organised:
+ Managed with Create, Read, Update, Delete (CRUD) operations.
+ Avoids the usual difficulties of unstructured/semi-structured data because the data follows a predefined schema -> easy to process, analyze, and integrate with other data sources.
- Stored in traditional/relational databases (spreadsheets, Structured Query Language (SQL) databases).
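The CRUD operations on schema-bound data can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the "customers" table, its columns, and the sample rows are hypothetical and not part of the original notes.

```python
import sqlite3

# In-memory relational database; the table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

# Create: every row must fit the predefined schema.
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Alice", "Hanoi"))
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Bob", "Berlin"))

# Read
rows_after_create = conn.execute(
    "SELECT id, name, city FROM customers ORDER BY id"
).fetchall()

# Update
conn.execute("UPDATE customers SET city = ? WHERE name = ?", ("Munich", "Bob"))

# Delete
conn.execute("DELETE FROM customers WHERE name = ?", ("Alice",))

rows_final = conn.execute(
    "SELECT id, name, city FROM customers ORDER BY id"
).fetchall()
conn.close()

print(rows_after_create)
print(rows_final)
```

Because every row shares the same attributes, the same query works for all of them, which is what makes structured data easy to process and integrate.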
Unstructured data:
- Data is not ordered in a well-structured format.
- Doesn't fit relational DBs; data is only loosely divided into categories
(text files (docs, PDF), video, audio, social media content).
- Stored in data lakes, data warehouses, NoSQL DBs (e.g. MongoDB).
Semi-structured data:
- Mixture of structured and unstructured data
e.g. email (date, from, to vs content), image (date, location vs pixels).
- Machine learning can be used to process unstructured data quickly.
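The email example can be sketched with Python's standard email module: the headers form the structured part, the body is the unstructured part. The message text and addresses below are made up for illustration.

```python
from email import message_from_string

# A hypothetical raw email: headers first, then a blank line, then the body.
raw = (
    "Date: Mon, 01 Jan 2024 09:00:00 +0000\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Sensor report\n"
    "\n"
    "The weather station was offline this morning, see the notes.\n"
)

msg = message_from_string(raw)

# Structured part: fixed, named fields that could go straight into a table.
structured = {"date": msg["Date"], "from": msg["From"], "to": msg["To"]}

# Unstructured part: free text with no predefined schema.
unstructured = msg.get_payload()

print(structured)
print(unstructured)
```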
Data sources
From social media: posts, comments, photos, videos...
From machines: IoT devices such as weather sensors, fitness trackers, vehicles... now nearly half of all big data.
Data platform and layers
Data platform:
- Is an integrated set of technologies that collectively meets the end-to-end data needs of an organization.
- Enables acquisition, preparation, delivery, storage, governance, and security of your data for users and applications.
- Is key to unlocking the value of data.
Data platform's layers:
- Data source
- Ingestion layer
- Processing layer
- Storage layer
- Visualization layer
- Security
- Data governance
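How data might flow through the first five layers can be sketched as a chain of small Python functions. This is a toy illustration under stated assumptions: the function names, the sensor records, and the per-sensor average are all hypothetical, and security and data governance are not modelled here.

```python
import json
from statistics import mean


def data_source():
    """Data source: hypothetical raw readings, one of them malformed."""
    return [
        '{"sensor": "s1", "temp_c": 21.5}',
        '{"sensor": "s1", "temp_c": 22.5}',
        '{"sensor": "s2", "temp_c": 18.0}',
        "not json",
    ]


def ingest(raw_records):
    """Ingestion layer: parse incoming records and skip malformed ones."""
    parsed = []
    for record in raw_records:
        try:
            parsed.append(json.loads(record))
        except json.JSONDecodeError:
            continue
    return parsed


def process(records):
    """Processing layer: average temperature per sensor."""
    by_sensor = {}
    for rec in records:
        by_sensor.setdefault(rec["sensor"], []).append(rec["temp_c"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def store(result, storage):
    """Storage layer: a plain dict stands in for a real database."""
    storage.update(result)
    return storage


def visualize(storage):
    """Visualization layer: render one text line per sensor."""
    return [f"{sensor}: {temp:.1f} C" for sensor, temp in sorted(storage.items())]


storage = store(process(ingest(data_source())), {})
report = visualize(storage)
print("\n".join(report))
```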
Types of data platform
Centralized data platform:
- (nearly) All layers are implemented on one central platform.
- Central platform contains all data from all departments/applications.
- Centralized computing: One platform on one centralized server.
Distributed data platform:
- Separation only by department/application data, or
- Layers of one large platform are implemented in different locations.
- Distributed computing: one platform is distributed across multiple servers.
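One common way to spread a platform's data over multiple servers is hash partitioning, sketched below. This is an assumption for illustration, not something the notes prescribe; the function names, the record keys, and the server count are hypothetical.

```python
import zlib


def assign_server(key, num_servers):
    """Map a record key to a server index using a stable CRC32 hash."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def partition(records, num_servers):
    """Split (key, value) records into one shard per server."""
    shards = [[] for _ in range(num_servers)]
    for key, value in records:
        shards[assign_server(key, num_servers)].append((key, value))
    return shards


# Hypothetical records keyed by department name.
records = [("sales", 120), ("hr", 15), ("sales", 80), ("finance", 42)]
shards = partition(records, num_servers=3)

for index, shard in enumerate(shards):
    print(f"server {index}: {shard}")
```

Because the hash is deterministic, all records with the same key land on the same server, so a per-department query only needs to contact one machine.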
Platform as a Service: the data platform implementation is offered by a third party.
- Includes infrastructure (servers, network, storage), applications (DB management systems), and solutions for developing, operating, and running applications.
On-premises: you implement the data platform yourself, on your own servers.