Analysis of Big Data Security Practices - Coggle Diagram
Analysis of Big Data Security Practices
Introduction
Hadoop
Initially developed to analyze large-scale data
Choosing the right distribution
Security
In Hadoop, security was not a concern at first.
Hadoop evolution
Security has become a major concern for sensitive data
Big data
Lets organizations pursue new opportunities and stay ahead of competitors
Data is wasted when organizations lack experience with big data
Big Data Service Providers
Cloudera Hadoop Distribution
Founded in 2008
Pioneer in Apache Hadoop Core
Hortonworks Data Platform
Founded in 2011
Provides a core Apache Hadoop-based platform for big data
MapR Distribution
Replaces HDFS with its own MapR-FS file system
IBM BigInsights
Includes Spark for fast scaling of analytics
Key Big Data Security Issues
Hadoop’s Data Placement Technique and Multi-tenancy
Issues
Hadoop’s current data placement strategy is inadequate
Not well suited to multi-tenancy
Control over Data
Hadoop does not provide roles, groups, security schemes, or similar controls
Inter-node Communication Issues
Communication between nodes is not secured
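Node-to-node traffic is commonly protected with TLS. As a minimal sketch, assuming a cluster-private certificate authority and hypothetical certificate paths, Python's standard `ssl` module can build a server-side context that rejects any peer node that does not present a valid certificate (mutual TLS). This is a generic illustration; it is not how Hadoop itself configures RPC or data-transfer encryption, and `make_node_server_context` is a hypothetical helper.

```python
import ssl
from typing import Optional


def make_node_server_context(cafile: Optional[str] = None,
                             certfile: Optional[str] = None,
                             keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a certificate from the peer node."""
    # Purpose.CLIENT_AUTH creates a context for the accepting (server) side.
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse legacy protocol versions.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Mutual TLS: the connecting node must present a certificate we can verify.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cafile:
        # Hypothetical path to the cluster CA that signs every node certificate.
        ctx.load_verify_locations(cafile=cafile)
    if certfile:
        # Hypothetical paths to this node's own certificate and private key.
        ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

A node would wrap its listening socket with `ctx.wrap_socket(sock, server_side=True)`; a connection from a peer without a certificate signed by the cluster CA then fails during the handshake.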
Client Interaction
No security mechanism is provided to protect client nodes.
Distributed Nodes Issues
Moving computation is cheaper than moving data
Authentication of Applications and Nodes
Reduces the use of unauthorized services
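One way to authenticate nodes and applications without contacting a central server on every request is a signed, expiring token. Hadoop's delegation tokens, for example, carry a password computed as an HMAC over the token identifier. The sketch below is a simplified, hypothetical `TokenIssuer` using HMAC-SHA256; it illustrates the idea and is not Hadoop's token format or API.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple


class TokenIssuer:
    """Hypothetical issuer of short-lived, HMAC-signed service tokens."""

    def __init__(self, master_key: bytes, lifetime_s: int = 3600):
        self._key = master_key          # shared only by the issuer and verifiers
        self._lifetime = lifetime_s

    def _sign(self, identifier: bytes) -> bytes:
        return hmac.new(self._key, identifier, hashlib.sha256).digest()

    def issue(self, service: str, now: Optional[float] = None) -> Tuple[bytes, bytes]:
        """Return (identifier, password); the identifier embeds an expiry time."""
        now = time.time() if now is None else now
        identifier = f"{service}|{int(now) + self._lifetime}".encode()
        return identifier, self._sign(identifier)

    def verify(self, identifier: bytes, password: bytes, now: Optional[float] = None) -> bool:
        """Accept only an untampered identifier that has not yet expired."""
        now = time.time() if now is None else now
        # compare_digest runs in constant time, so timing does not leak matching bytes.
        if not hmac.compare_digest(self._sign(identifier), password):
            return False
        expiry = int(identifier.decode().rsplit("|", 1)[1])
        return now < expiry
```

Because the expiry is inside the signed identifier, a service cannot extend its own token, and a changed service name invalidates the password.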
Audit and Logging
Logging of activity is needed to detect suspicious behaviors
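Logging only helps if an intruder cannot quietly rewrite the log. As a minimal sketch of a generic technique (not a feature of any distribution compared here), each audit entry can store a SHA-256 hash that also covers the previous entry's hash, so editing or deleting an earlier record becomes detectable. The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, Any]) -> str:
    # sort_keys gives a stable serialization, so the same event always hashes the same.
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_entry(log: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Append an event whose hash also covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify_chain(log: List[Dict[str, Any]]) -> bool:
    """Recompute every link; altering or removing an earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

The chain alone does not reveal truncation of the newest entries; in practice the latest hash would also be copied to a separate system.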
Data at Rest Protection
Encryption could help protect data at rest
Administrative Data Access
Use several administrators to separate data responsibilities
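Separating administrative duties can be checked mechanically. The sketch below assumes a hypothetical mapping from administrator to assigned duties and a hypothetical list of conflicting duty pairs (for instance, whoever administers HDFS should not also manage encryption keys); it reports every administrator who holds a conflicting pair.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical duty pairs that a single administrator should never hold together.
CONFLICTS = {
    frozenset({"hdfs_superuser", "key_admin"}),
    frozenset({"audit_admin", "hdfs_superuser"}),
}


def separation_violations(assignments: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return (admin, duty_a, duty_b) for every conflicting pair an admin holds."""
    violations = []
    for admin, duties in sorted(assignments.items()):
        # Check every pair of this admin's duties against the conflict list.
        for a, b in combinations(sorted(duties), 2):
            if frozenset({a, b}) in CONFLICTS:
                violations.append((admin, a, b))
    return violations
```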
Big Data Players - Security Comparison
Cloudera
Guard the Perimeter – Authentication
Kerberos and AD/LDAP for user and service authentication
Cloudera Manager
Access Controls – Authorization
POSIX permissions
Access Control Lists (ACLs)
Role-based access control (RBAC)
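HDFS applies POSIX-style owner/group/other permission bits and, when enabled, POSIX-style ACL entries. The following is a simplified sketch of that check under stated assumptions: the ACL mask entry, default ACLs, and the superuser bypass are omitted, and `FileMeta` and `allowed` are hypothetical names rather than an HDFS API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

READ, WRITE, EXECUTE = 4, 2, 1  # POSIX permission bits


@dataclass
class FileMeta:
    owner: str
    group: str
    mode: int  # e.g. 0o750: owner rwx, group r-x, other ---
    named_users: Dict[str, int] = field(default_factory=dict)   # ACL entries such as user:carol:r--
    named_groups: Dict[str, int] = field(default_factory=dict)  # ACL entries such as group:auditors:r--


def allowed(meta: FileMeta, user: str, groups: Set[str], want: int) -> bool:
    """Simplified POSIX permission + ACL check (mask, default ACLs, superuser omitted)."""
    def grants(bits: int) -> bool:
        return (bits & want) == want

    if user == meta.owner:                # 1. owner bits
        return grants((meta.mode >> 6) & 7)
    if user in meta.named_users:          # 2. named-user ACL entry
        return grants(meta.named_users[user])
    group_perms = [perm for g, perm in meta.named_groups.items() if g in groups]
    if meta.group in groups:
        group_perms.append((meta.mode >> 3) & 7)
    if group_perms:                       # 3. any matching group entry may grant access
        return any(grants(p) for p in group_perms)
    return grants(meta.mode & 7)          # 4. "other" bits
```

The order matters: a named-user entry is consulted before any group entry, so it can restrict a user even when one of their groups would allow more.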
Gain Visibility – Auditing
Cloudera Navigator
Cluster and RPC Authentication – Impersonation
Kerberos RPC
Data Protection – Encryption
TLS deployed with Cloudera Manager
HDFS encryption
Navigator Encrypt and Navigator Key Trustee
Cloudera Navigator Key Trustee
Hortonworks
Guard the Perimeter – Authentication
Kerberos with Apache Ambari to authenticate users
Apache Knox for perimeter security
Access Controls – Authorization
Apache Ranger access control
Cluster Administrators can define security policies
Permissions for LDAP-based groups or individual users can be set in Ranger
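Ranger policies attach allowed access types to users or groups for a resource. The sketch below is a hypothetical, much-reduced model of that idea, not Ranger's policy engine or API: resources are wildcard paths, a matching deny policy overrides any allow (an assumption of this model), and anything not explicitly allowed is denied.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Policy:
    resource: str                          # wildcard path, e.g. "/finance/*"
    accesses: FrozenSet[str]               # e.g. {"read", "write"}
    users: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()
    deny: bool = False                     # deny policies override allows


def is_allowed(policies: List[Policy], path: str, user: str,
               groups: Set[str], access: str) -> bool:
    def applies(p: Policy) -> bool:
        # A policy applies when path, access type, and principal all match.
        return (fnmatchcase(path, p.resource)
                and access in p.accesses
                and (user in p.users or bool(groups & p.groups)))

    matching = [p for p in policies if applies(p)]
    if any(p.deny for p in matching):
        return False
    # Default deny: access is granted only by an explicit allow.
    return bool(matching)
```

With an allow policy for the `analysts` group on `/finance/*` and a deny policy for user `bob` on `/finance/payroll/*`, an analyst can read quarterly reports while `bob` is blocked from payroll files despite his group membership.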
Cluster and RPC Authentication – Impersonation
RPC connections use SASL (Simple Authentication and Security Layer)
Gain Visibility – Auditing
Apache Atlas (Hortonworks) manages metadata, classifies data, performs audits, and supports data security.
SSL supports encryption
Data transfer protocol encrypted with RC4 or 3DES
Data Protection – Encryption
Wire Encryption, Voltage and Dataguise for data movement
Encryption of data at rest with Transparent Data Encryption (TDE)
MapR
Authentication
Kerberos and Native Authentication
Authorization
Access Control Expressions (ACEs), MapR's most expressive authorization model
ACEs express access control with Boolean logic.
Control data access to MapR tables, files, directories, volumes, and streams
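ACEs combine principals with Boolean operators. A minimal evaluator, assuming the `u:<user>`, `g:<group>`, and `r:<role>` principal syntax with `!`, `&`, `|`, and parentheses, and assuming the usual precedence (`!` binds tightest, then `&`, then `|`), might look like this; `evaluate` is a hypothetical helper, not part of any MapR library.

```python
import re
from typing import List, Set

# A token is a u:/g:/r: principal or one of the operators ! & | ( ).
TOKEN = re.compile(r"\s*(?:([ugr]):([\w.-]+)|([!&|()]))")


def tokenize(expr: str) -> List[str]:
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad ACE near {expr[pos:]!r}")
        tokens.append(m.group(0).strip())
        pos = m.end()
    return tokens


def evaluate(expr: str, user: str, groups: Set[str], roles: Set[str]) -> bool:
    """Recursive-descent evaluation of an ACE for one user."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():                 # lowest precedence: a | b
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()       # always parse, so tokens are consumed
            value = value or rhs
        return value

    def parse_and():                # a & b
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():                # !a
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():               # principal or parenthesized expression
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if tok is None or ":" not in tok:
            raise ValueError(f"unexpected token {tok!r}")
        kind, name = tok.split(":", 1)
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        return name in roles        # kind == "r"

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in ACE")
    return result
```

For example, `g:analysts & !u:bob` admits every analyst except `bob`, which a plain ACL listing users and groups cannot express directly.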
Uses ACLs for permissions to manage the cluster
Auditing
Logs access to audit-enabled data
SIEM tools can analyze these records
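Before audit records reach a SIEM, a simple filter can already surface suspicious behavior such as repeated denied requests. The sketch assumes HDFS-style audit lines with tab-separated `key=value` fields (`allowed`, `ugi`, `ip`, `cmd`, `src`, ...) after an `FSNamesystem.audit:` marker; the exact prefix depends on the version and log4j layout, and the threshold of three denials is an arbitrary illustrative choice.

```python
from collections import Counter
from typing import Dict, Iterable, List


def parse_audit_line(line: str) -> Dict[str, str]:
    """Split the tab-separated key=value fields after the audit marker (assumed layout)."""
    _, _, body = line.partition("FSNamesystem.audit: ")
    fields = {}
    for part in (body or line).strip().split("\t"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields


def flag_denials(lines: Iterable[str], threshold: int = 3) -> List[str]:
    """Return principals with at least `threshold` denied requests."""
    denied = Counter()
    for line in lines:
        fields = parse_audit_line(line)
        if fields.get("allowed") == "false":
            # ugi looks like "alice (auth:KERBEROS)"; keep only the principal name.
            denied[fields.get("ugi", "?").split(" ")[0]] += 1
    return sorted(u for u, n in denied.items() if n >= threshold)
```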
Encryption
For data in motion, over-the-wire encryption is used between MapR nodes.
IBM BigInsights
Authentication
LDAP and Kerberos for user authentication
Apache Knox
Authorization
Authorization in BigInsights
Auditing
IBM Security
Guardium Data Activity Monitor
Encryption
IBM Security Guardium Data Encryption and Hadoop Transparent Data Encryption
IBM Security Guardium Data Encryption
SSL and TLS Certificates