Big Data Analytics in Bioinformatics
What is big data in bioinformatics?
Comparison
SQL
NoSQL
MongoDB design specifications
Application of big data analytics in bioinformatics (research paper)
Performance
Example: MySQL, Oracle
Good at structured data and transactional, high-performance workloads
Consider performance as part of the design specification to ensure that the database can handle the expected workload.
Distributed Database Management System
Relational Database Management System (RDBMS)
Horizontally scaled
Example: MongoDB, CouchDB, Cassandra
Indexes
Dynamic Schema
Identify which fields will be used in queries and create indexes on those fields to optimize query performance.
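The effect of indexing a queried field can be illustrated with a minimal, conceptual sketch in plain Python. It is not MongoDB's actual index implementation (MongoDB builds its own index structures when `createIndex` is called on a collection); the sample documents and the field names `gene` and `organism` are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are illustrative only.
documents = [
    {"_id": 1, "gene": "BRCA1", "organism": "human"},
    {"_id": 2, "gene": "TP53", "organism": "human"},
    {"_id": 3, "gene": "BRCA1", "organism": "mouse"},
]

def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index

def find_with_scan(docs, field, value):
    """Without an index: every document has to be inspected."""
    return [doc for doc in docs if doc.get(field) == value]

def find_with_index(docs, index, value):
    """With an index: jump straight to the matching documents."""
    return [docs[position] for position in index.get(value, [])]

gene_index = build_index(documents, "gene")
matches = find_with_index(documents, gene_index, "BRCA1")
print([doc["_id"] for doc in matches])  # [1, 3]
```

The indexed lookup returns the same documents as the full scan, but it only touches the entries recorded for the requested value, which is why indexes pay off on fields that queries filter on often.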
Fixed schema for organizing data
Not suitable for complex queries
Data stored in collections and documents
Data Model
Uses denormalized data structures
Determine how the data will be organized into collections and documents, and consider the relationships between different types of data.
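One way to picture this design choice is to compare two layouts for the same data: embedding related records inside one document (denormalized), or keeping them in separate collections linked by a reference field. The sketch below uses plain Python dictionaries as stand-ins for documents; the names `samples`, `reads`, `sample_id`, and `sequence` are hypothetical and do not come from any real schema.

```python
# Hypothetical documents; collection and field names are assumptions.

# Option 1: embed related data inside one document (denormalized).
sample_embedded = {
    "_id": "sample-001",
    "organism": "human",
    "reads": [
        {"read_id": "r1", "sequence": "ACGT"},
        {"read_id": "r2", "sequence": "TTGA"},
    ],
}

# Option 2: keep related data in separate collections, linked by reference.
samples = [{"_id": "sample-001", "organism": "human"}]
reads = [
    {"_id": "r1", "sample_id": "sample-001", "sequence": "ACGT"},
    {"_id": "r2", "sample_id": "sample-001", "sequence": "TTGA"},
]

def reads_for_sample(sample_id, read_collection):
    """Resolve the reference by filtering on the linking field."""
    return [r for r in read_collection if r["sample_id"] == sample_id]

embedded_sequences = [r["sequence"] for r in sample_embedded["reads"]]
referenced_sequences = [r["sequence"] for r in reads_for_sample("sample-001", reads)]
print(embedded_sequences == referenced_sequences)  # True
```

Embedding returns everything in a single read, while referencing keeps each document small and avoids duplicating data shared by many parents; which one fits depends on how the data is queried.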
Replication
Vertically scaled
Consider replication as part of the design specification so that the database can tolerate server failures and provide reliable access to data.
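The idea of tolerating a server failure through replication can be shown with a toy model. This is only a sketch under simplifying assumptions: writes are copied to every member immediately and the next remaining member is promoted on failure, whereas a real MongoDB replica set replicates asynchronously and elects a new primary. The class name `ReplicaSet` and the node names are hypothetical.

```python
class ReplicaSet:
    """Toy model: every member keeps a full copy of the data."""

    def __init__(self, member_names):
        self.members = {name: [] for name in member_names}
        self.primary = member_names[0]

    def write(self, doc):
        # Store a copy of the write on every member (synchronous, for simplicity).
        for data in self.members.values():
            data.append(dict(doc))

    def fail(self, name):
        # Remove a failed member; promote another one if it was the primary.
        del self.members[name]
        if name == self.primary:
            self.primary = next(iter(self.members))

    def read(self):
        # Reads are served by the current primary.
        return self.members[self.primary]

replica_set = ReplicaSet(["node-a", "node-b", "node-c"])
replica_set.write({"gene": "BRCA1"})
replica_set.fail("node-a")
print(replica_set.primary, replica_set.read())  # node-b [{'gene': 'BRCA1'}]
```

Because each member holds a copy, the data written before the failure is still readable after the original primary is gone.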
Can be used for complex queries
Sharding
Consider sharding as part of the design specification to improve scalability and performance by distributing data across multiple servers.
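Sharding, that is, splitting a collection across several servers according to a shard key, can be sketched as follows. This is a simplified hash-based routing example and not MongoDB's actual chunk-based mechanism (MongoDB supports both ranged and hashed sharding); the shard key `sample_id`, the shard count, and the helper names are assumptions for illustration.

```python
import hashlib

def shard_for(key, shard_count):
    """Pick a shard by hashing the shard key (stable across runs)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

def distribute(docs, shard_key, shard_count):
    """Place each document on the shard chosen by its shard key value."""
    shards = [[] for _ in range(shard_count)]
    for doc in docs:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards

# Hypothetical documents keyed by a made-up sample identifier.
documents = [{"sample_id": f"sample-{n:03d}"} for n in range(12)]
shards = distribute(documents, "sample_id", 3)
for number, shard in enumerate(shards):
    print(number, len(shard))
```

A query that includes the shard key can be routed to a single shard with `shard_for`, while a query without it has to ask every shard, so the choice of shard key matters for performance.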
Backup and Recovery
Srivastava, A., Naik, A. (2021). Big Data Analysis in Bioinformatics. In: Singh, V., Kumar, A. (eds) Advances in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-33-6191-1_22
Supriya, P., Marudamuthu, B., Soam, S.K., Rao, C.S. (2021). Trends and Application of Data Science in Bioinformatics. In: Rautaray, S.S., Pemmaraju, P., Mohanty, H. (eds) Trends of Data Science and Applications. Studies in Computational Intelligence, vol 954. Springer, Singapore. https://doi.org/10.1007/978-981-33-6815-6_12
Ocaña, K.A.C.S., Silva, V., Oliveira, D.d., Mattoso, M. (2015). Data Analytics in Bioinformatics: Data Science in Practice for Genomics Analysis Workflows. In: 2015 IEEE 11th International Conference on e-Science, Munich, Germany, pp. 322-331. https://doi.org/10.1109/eScience.2015.50
Meher, J. (2021). Potential Applications of Deep Learning in Bioinformatics Big Data Analysis. In: Prakash, K.B., Kannan, R., Alexander, S., Kanagachidambaresan, G.R. (eds) Advanced Deep Learning for Engineers and Scientists. EAI/Springer Innovations in Communication and Computing. Springer, Cham. https://doi.org/10.1007/978-3-030-66519-7_7
Consider backup and recovery as part of the design specification so that data can be restored easily after a disaster or data loss.
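The backup-and-restore cycle can be sketched in a few lines of plain Python. This is a conceptual example that writes documents to a JSON file and reads them back; an actual MongoDB deployment would normally use dedicated tools such as `mongodump` and `mongorestore`. The sample documents and the file name `backup.json` are assumptions.

```python
import json
import os
import tempfile

def backup(docs, path):
    """Write all documents to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(docs, handle)

def restore(path):
    """Read the documents back from the JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

# Hypothetical documents standing in for a collection.
documents = [{"_id": 1, "gene": "BRCA1"}, {"_id": 2, "gene": "TP53"}]
backup_path = os.path.join(tempfile.mkdtemp(), "backup.json")

backup(documents, backup_path)
documents = []  # simulate data loss
documents = restore(backup_path)
print(len(documents))  # 2
```

A backup is only useful if restoring from it has been tried, so a design specification would typically cover both how often backups run and how recovery is verified.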
In bioinformatics, big data refers to the huge volume of biological and genetic information generated by techniques and technologies such as sequencing, microarray analysis, and mass spectrometry. This data is often complex, varied, and multidimensional, which makes it challenging to handle and analyse with traditional data processing methods.
Characteristics
Volume
Variety
Veracity
Velocity
Value
Group Members:
Sayang Elyiana Amiera
Phang Cheng Yi
Indira Thangaraj
Keshiniy Mogan
Iman Ehsan