Big Data Analytics in Bioinformatics

What is big data in bioinformatics?

Comparison

SQL

NoSQL

MongoDB design specifications

Application of big data analytics in bioinformatics (research paper)

Performance

Examples: MySQL, Oracle

Well suited to structured data and transactional, high-performance workloads

Treat performance as part of the design specification so that the database can handle the expected workload.

Distributed Database Management System

Relational Database Management System (RDBMS)

Horizontally scaled

Examples: MongoDB, CouchDB, Cassandra

Indexes

Dynamic Schema

Identify which fields will be used in queries and create indexes on those fields to optimize query performance.
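As a rough illustration of why indexing the queried fields helps, the sketch below builds a dictionary-based index over a hypothetical `gene` field in plain Python. The sample documents and field names are assumptions made for the example, and MongoDB itself uses B-tree indexes rather than a dictionary; the point is only that an indexed lookup avoids scanning every document.

```python
from collections import defaultdict

# Hypothetical sample documents; field names are illustrative only.
documents = [
    {"_id": 1, "gene": "BRCA1", "organism": "human"},
    {"_id": 2, "gene": "TP53", "organism": "human"},
    {"_id": 3, "gene": "BRCA1", "organism": "mouse"},
]


def find_by_scan(docs, field, value):
    """Check every document: the cost grows with the collection size."""
    return [d for d in docs if d.get(field) == value]


def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index


def find_by_index(docs, index, value):
    """Jump straight to the matching documents through the index."""
    return [docs[pos] for pos in index.get(value, [])]


gene_index = build_index(documents, "gene")
matches = find_by_index(documents, gene_index, "BRCA1")
print([d["_id"] for d in matches])
```

Both lookups return the same documents; the indexed one simply does less work per query, at the cost of building and maintaining the index.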

Fixed schema for organizing data

Not suitable for complex queries

Data stored in collections and documents

Data Model

Uses denormalized data structure

Determine how the data will be organized into collections and documents, and consider the relationships between different types of data.
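The choice between embedding related data in one document and referencing it from another collection can be sketched with plain Python dictionaries. All record names, fields, and values below are made up for illustration and do not come from a real dataset.

```python
# Hypothetical records; every name and value here is illustrative only.

# Embedded (denormalized): variants live inside the sample document,
# so a single read returns the sample together with its variants.
sample_embedded = {
    "_id": "sample-001",
    "organism": "human",
    "variants": [
        {"gene": "BRCA1", "position": 100, "ref": "A", "alt": "G"},
        {"gene": "TP53", "position": 250, "ref": "C", "alt": "T"},
    ],
}

# Referenced: variants sit in their own collection and point back to the
# sample by id, which is closer to a normalized relational layout.
samples = [{"_id": "sample-001", "organism": "human"}]
variants = [
    {"_id": 1, "sample_id": "sample-001", "gene": "BRCA1", "position": 100},
    {"_id": 2, "sample_id": "sample-001", "gene": "TP53", "position": 250},
]


def variants_for(sample_id, variant_docs):
    """Second lookup needed in the referenced layout."""
    return [v for v in variant_docs if v["sample_id"] == sample_id]


embedded_genes = [v["gene"] for v in sample_embedded["variants"]]
referenced_genes = [v["gene"] for v in variants_for("sample-001", variants)]
print(embedded_genes == referenced_genes)
```

Embedding tends to suit data that is read together, while referencing may be preferable when the related data grows without bound or is shared between documents.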

Replication

Vertically scaled

Include replication in the design specification so that the database can tolerate server failures and provide reliable access to data.
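How replication keeps data available after a server failure can be sketched with a toy model. The class and member names below are hypothetical, and the model is deliberately simplified: it copies writes to every live member immediately and promotes a survivor alphabetically, whereas a real MongoDB replica set replicates asynchronously and elects a new primary by majority vote.

```python
import copy


class ReplicaSetSketch:
    """Toy model: one primary plus secondaries, each holding a full copy."""

    def __init__(self, member_names):
        self.data = {name: {} for name in member_names}
        self.alive = set(member_names)
        self.primary = member_names[0]

    def write(self, key, value):
        """Accept the write only if a primary is up, then copy it to live members."""
        if self.primary not in self.alive:
            raise RuntimeError("no primary available")
        for name in self.alive:
            self.data[name][key] = copy.deepcopy(value)

    def fail(self, name):
        """Mark a member as down; promote a survivor if it was the primary."""
        self.alive.discard(name)
        if name == self.primary and self.alive:
            self.primary = sorted(self.alive)[0]

    def read(self, key):
        """Read from the current primary."""
        if self.primary not in self.alive:
            raise RuntimeError("no primary available")
        return self.data[self.primary][key]


rs = ReplicaSetSketch(["node-a", "node-b", "node-c"])
rs.write("sample-001", {"organism": "human"})
rs.fail("node-a")  # simulate losing the primary
print(rs.primary, rs.read("sample-001"))
```

The write made before the failure is still readable afterwards because a secondary already held a copy.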

Can be used for complex queries

Sharding

Include sharding in the design specification to improve scalability and performance by distributing data across multiple servers.
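Sharding, that is, splitting a collection across several servers according to a shard key, can be sketched as a routing function. The shard count, the `read_id` key, and the sample documents are assumptions for the example. The modulo rule is also a simplification: MongoDB's hashed sharding assigns ranges of hash values to chunks rather than taking a remainder.

```python
import hashlib

NUM_SHARDS = 3  # assumed cluster size for the illustration


def shard_for(shard_key_value, num_shards=NUM_SHARDS):
    """Route a shard key value to a shard number using a stable hash."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(docs, shard_key, num_shards=NUM_SHARDS):
    """Group documents by the shard that their key value routes to."""
    shards = {i: [] for i in range(num_shards)}
    for doc in docs:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


# Hypothetical sequencing-read documents.
reads = [{"read_id": f"read-{i}", "length": 150} for i in range(12)]
shards = distribute(reads, "read_id")
for shard_id, docs in shards.items():
    print(shard_id, len(docs))
```

Because the hash is deterministic, a later query for a given `read_id` can be sent to one shard instead of all of them.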

Backup and Recovery

Srivastava, A., Naik, A. (2021). Big Data Analysis in Bioinformatics. In: Singh, V., Kumar, A. (eds) Advances in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-33-6191-1_22


Supriya, P., Marudamuthu, B., Soam, S.K., Rao, C.S. (2021). Trends and Application of Data Science in Bioinformatics. In: Rautaray, S.S., Pemmaraju, P., Mohanty, H. (eds) Trends of Data Science and Applications. Studies in Computational Intelligence, vol 954. Springer, Singapore. https://doi.org/10.1007/978-981-33-6815-6_12


Ocaña, K.A.C.S., Silva, V., Oliveira, D.d., Mattoso, M. (2015). Data Analytics in Bioinformatics: Data Science in Practice for Genomics Analysis Workflows. In: 2015 IEEE 11th International Conference on e-Science, Munich, Germany, pp. 322-331. https://doi.org/10.1109/eScience.2015.50


Meher, J. (2021). Potential Applications of Deep Learning in Bioinformatics Big Data Analysis. In: Prakash, K.B., Kannan, R., Alexander, S., Kanagachidambaresan, G.R. (eds) Advanced Deep Learning for Engineers and Scientists. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-66519-7_7

Include backup and recovery in the design specification so that data can be restored easily after a disaster or data loss.
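A backup and restore cycle can be sketched in plain Python by writing documents to a JSON Lines file and reading them back. The file name and sample documents are hypothetical, and this is only an illustration of the idea; a production MongoDB deployment would more likely rely on tools such as `mongodump` and `mongorestore`.

```python
import json
import os
import tempfile


def backup(docs, path):
    """Write one JSON document per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as handle:
        for doc in docs:
            handle.write(json.dumps(doc) + "\n")


def restore(path):
    """Read the documents back from a JSON Lines backup file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical collection contents.
collection = [
    {"_id": "sample-001", "organism": "human"},
    {"_id": "sample-002", "organism": "mouse"},
]

with tempfile.TemporaryDirectory() as tmp:
    backup_path = os.path.join(tmp, "samples.jsonl")
    backup(collection, backup_path)
    collection_after_loss = []  # simulate losing the live data
    collection_after_loss = restore(backup_path)

print(collection_after_loss == collection)
```

A recovery plan is only as good as its last tested restore, so the round-trip check at the end is the part worth keeping in any real procedure.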

In bioinformatics, big data refers to the huge volume of biological and genetic information generated by techniques and technologies such as sequencing, microarray analysis, and mass spectrometry. This data is often complex, varied, and multidimensional, which makes it difficult for traditional data processing methods to handle and analyse.

Characteristics

Volume

Variety

Veracity

Velocity

Value

Group Members:
Sayang Elyiana Amiera
Phang Cheng Yi
Indira Thangaraj
Keshiniy Mogan
Iman Ehsan