Twelfth reading - Analysis of Big Data Security Practices, Ariana Alvarado…
Choosing the correct distribution is the initial step.
Hadoop originally focused on freely available public data.
Initially, security was not a primary concern.
Its capabilities have since expanded beyond the initial requirements.
Sensitive data handling poses a significant challenge.
Lack of awareness or maturity in handling big data opportunities.
BIG DATA SERVICE PROVIDERS
The big data market has seen a surge in tools and providers.
Top players include Cloudera, Hortonworks, MapR, and IBM BigInsights.
They offer diverse solutions: native tools with Hadoop plugins or standalone Hadoop platforms.
Cloudera
Founded in 2008 by professionals from Yahoo and Google.
Pioneer in customizing Apache Hadoop core.
Leader in user base, utilizing Apache Hadoop in its distribution.
Hortonworks
Established in 2011, Hortonworks is a leading Hadoop distributor.
Provides an open-source data platform based on Core Apache Hadoop.
Distributes Apache Hadoop without added exclusive components.
Team contributed to key Hadoop advancements.
MapR
Apache Hadoop's open-source version has limitations.
MapR replaces HDFS with its proprietary file system, MapRFS, to provide enterprise-grade features.
IBM BigInsights
IBM offers a Spark and Hadoop suite for business enterprises.
Provides a complete solution, including Spark, to scale analytics quickly and easily.
Available on-premises, on-cloud, and integrates seamlessly with other existing systems.
KEY BIG DATA SECURITY ISSUES
Securing a Hadoop cluster is more complex due to its diverse applications.
Nine key security issues are recognized in the current IT landscape for Big Data.
Hadoop’s Data Placement Technique and Multi-tenancy
Issues
Hadoop commonly serves multiple applications and tenants.
Adapting to multitenant scenarios requires changes in how tenants' data is placed.
Security controls are vital for data privacy, especially in multitenant scenarios.
Control over Data
Role-based access is crucial in relational databases (RDBMS) and data warehouses.
Hadoop should adopt role-based access, groups, and security schemes, similar to RDBMS platforms.
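A minimal Python sketch of the role-based model mentioned above. The users, roles, and permission strings are hypothetical and do not correspond to any Hadoop or RDBMS API; the sketch only shows how access flows from user to role to permission.

```python
# Hypothetical roles and permissions for illustration; not a Hadoop API.
ROLE_PERMISSIONS = {
    "analyst": {"read:/data/sales"},
    "etl": {"read:/data/sales", "write:/data/sales"},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"etl"},
}


def is_allowed(user: str, action: str, path: str) -> bool:
    """Return True if any of the user's roles grants the action on the path."""
    wanted = f"{action}:{path}"
    return any(
        wanted in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Permissions attach to roles rather than to individual users, so changing what a group of users may do means editing one role entry.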
Inter-node Communication Issues
Default communication between Hadoop nodes is unsecured, risking data inspection and tampering.
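The tampering risk can be illustrated with a message authentication code. The sketch below is a generic Python example, not the mechanism Hadoop itself uses; the shared key and the function names are assumptions. It detects altered messages only, and does not prevent inspection, which requires encrypting the channel.

```python
import hashlib
import hmac

# Assumption: both nodes already share this key through some secure channel.
SHARED_KEY = b"example-shared-key"


def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return an HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Compare tags in constant time; False means the message or tag changed."""
    return hmac.compare_digest(sign(message, key), tag)
```

The sender transmits the message together with `sign(message)`, and the receiver drops anything for which `verify` returns False.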
Client Interaction
Clients interact with the resource manager and nodes, creating a potential risk.
Infected or compromised clients may send malicious data or links to services.
Difficulty in protecting nodes from clients, clients from nodes, and name servers from nodes.
The absence of proper integration mechanisms raises overall security concerns in Hadoop.
Distributed Nodes Issues
Processing runs on the nodes where the resources are, enabling massively parallel and efficient computation.
Challenges in verifying security across clusters with many moving data blocks.
Authentication of Applications and Nodes
Authentication reduces unauthorized use of services.
Stolen or duplicated Kerberos tickets can authenticate malicious clients.
Service cloning can introduce corrupted services into the cluster.
Kerberos plays a crucial role in authenticating users and services in the Hadoop cluster.
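Two common ways to limit the damage from a stolen or duplicated ticket are a short ticket lifetime and a cache of credentials already presented. The Python sketch below is a toy illustration of those two checks, not the Kerberos protocol and not a Hadoop API; the class name, fields, and principal names are hypothetical.

```python
import time
from typing import Optional


class ReplayGuard:
    """Toy check: reject expired tickets and repeated authenticators."""

    def __init__(self) -> None:
        # Remembers (principal, auth_time) pairs that were already accepted.
        self._seen = set()

    def accept(self, principal: str, auth_time: float, expires_at: float,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if now >= expires_at:
            return False  # ticket lifetime is over
        key = (principal, auth_time)
        if key in self._seen:
            return False  # same authenticator presented a second time
        self._seen.add(key)
        return True
```

A real service would also evict old cache entries and verify the ticket cryptographically; both steps are omitted here.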
Audit and Logging
Swift detection of a cluster breach is crucial.
Logging plays a vital role in providing a record of activity, enhancing capabilities for security monitoring.
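As a sketch of what a record of activity might look like, the Python example below writes one structured audit line per access decision. The logger name and field names are assumptions chosen for illustration and do not reflect the audit format of any Hadoop distribution.

```python
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("cluster.audit")  # hypothetical logger name


def audit_record(user: str, action: str, resource: str, allowed: bool) -> str:
    """Build one JSON audit line, send it to the audit logger, and return it."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    }
    line = json.dumps(record, sort_keys=True)
    audit_log.info(line)
    return line
```

Recording denied attempts as well as granted ones gives monitoring tools something to alert on when a breach is in progress.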
Data at Rest Protection
Encryption is the standard for safeguarding data at rest in a cluster.
It prevents unauthorized access to files, enhancing overall security.
Administrative Data Access
Administrators have full access to cluster data.
Need to establish clear separation of responsibilities among administrators.
Prevent unauthorized direct access through robust access controls.
Implement encryption technologies for additional data protection.
Cloudera
Guard the Perimeter – Authentication
Cloudera Manager automates the authentication setup process, reducing tedious manual tasks.
Access Controls – Authorization
Hadoop uses POSIX-style permissions (owner, group, others) for data access control.
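The POSIX approach can be sketched in a few lines of Python. This is a simplified illustration of how owner, group, and other permission bits are evaluated, not HDFS code; the function name and parameters are assumptions, and superuser bypass and ACLs are left out.

```python
import stat


def can_access(mode: int, owner: str, group: str,
               user: str, user_groups: set, want: str) -> bool:
    """Evaluate a POSIX-style mode for one of 'r', 'w', or 'x'."""
    bits = {
        "r": (stat.S_IRUSR, stat.S_IRGRP, stat.S_IROTH),
        "w": (stat.S_IWUSR, stat.S_IWGRP, stat.S_IWOTH),
        "x": (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH),
    }[want]
    if user == owner:
        return bool(mode & bits[0])  # owner class applies, nothing else
    if group in user_groups:
        return bool(mode & bits[1])  # group class
    return bool(mode & bits[2])      # everyone else
```

With mode `0o640`, the owner can read and write, members of the file's group can only read, and all other users are denied.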
Gain Visibility – Auditing
Auditing is important for understanding where cluster data comes from and how it is used.
Cluster and RPC Authentication – Impersonation
Hadoop services authenticate to one another over RPC using Kerberos.
Data Protection – Encryption
TLS, centrally deployed via Cloudera Manager, ensures secure data transmission.
Hortonworks
Guard the Perimeter – Authentication
Kerberos, integrated with Apache Ambari, authenticates Hadoop users.
Apache Knox ensures perimeter security by acting as a central entry point.
Access Controls – Authorization
Apache Ranger manages access control for consistent administration across Hadoop components.
Gain Visibility – Auditing
Apache Atlas, from Hortonworks, manages cluster metadata and classifies data.
It performs auditing and oversees the security management of cluster data.
Cluster and RPC Authentication – Impersonation
RPC connections use the Java Simple Authentication and Security Layer (SASL), with SSL for encryption.
Data Protection – Encryption
Hortonworks secures data in motion using wire encryption.
Commercial tools like Voltage and Dataguise go beyond encryption, adding de-identification at the source.
Ariana Alvarado Molina - 2021089068