Bigtable
Instances
Clusters
Nodes
1 or 2 clusters
Instance Type
Production
Development
Storage Types
Replication
App profile
Routing policy
Schema
Row key
Avoid
Frequently updated
Hashed value
Sequential
Domain name
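The "Avoid" list above implies its opposite: a good row key spreads writes across the keyspace (e.g. by promoting a field like a device or user ID to the front) instead of using purely sequential values. A minimal sketch, assuming a hypothetical device-event table where `device_id` and a reversed timestamp are combined so writes fan out across devices and each device's newest rows sort first:

```python
import sys

def make_row_key(device_id: str, ts_millis: int) -> str:
    """Build a row key that avoids the hotspots caused by purely
    sequential keys: the device_id prefix spreads writes across
    devices, and the reversed timestamp makes newer events sort
    first within each device's key range."""
    reversed_ts = sys.maxsize - ts_millis
    return f"{device_id}#{reversed_ts}"

# Keys for the same device sort newest-first; different devices
# occupy separate, contiguous key ranges.
keys = sorted(make_row_key(d, t)
              for d, t in [("dev1", 100), ("dev1", 200), ("dev2", 150)])
```

The reversed timestamp works for lexicographic ordering here because `sys.maxsize - ts` keeps a fixed digit count for realistic timestamps.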
Column families
Concepts
Each table has only one index: the row key
Atomic only at row level
Keep all info of an entity in a single row
Related entities should be in adjacent rows
Tables are sparse
Guideline
Single value < 10MB
Single row < 100MB
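The concepts above (one entity per row, sparse tables) can be illustrated with a toy in-memory model, not the real client API: each row stores only the cells it actually has, so an unset column consumes no space at all.

```python
# Toy model of a sparse Bigtable table: a row is a dict keyed by
# (column family, qualifier), holding only the cells that exist.
table = {}

def put(row_key, family, qualifier, value):
    """Write one cell; all of an entity's data lives under one row key."""
    table.setdefault(row_key, {})[(family, qualifier)] = value

put("user#alice", "profile", "email", "alice@example.com")
put("user#bob", "profile", "phone", "555-0100")  # bob has no email cell

# Sparseness: bob's row lacks an 'email' cell entirely -- it is not
# stored as an empty value.
has_email = ("profile", "email") in table["user#bob"]  # False
```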
Performance
Best for > 1 TB of data used over a long period of time
Causes of slower performance
Workload is small and short
Bad schema:
R/W are not evenly distributed
Rows contain large amounts of data (> 1KB)
Rows contain too many cells
Overloaded:
Not enough nodes
Client is not in the same zone as the cluster
How data is optimized over time
Tables are sharded into tablets
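Each tablet is a contiguous range of row keys, and Bigtable splits busy tablets over time to rebalance load across nodes. A minimal sketch of that splitting idea, with hypothetical names and a toy split at the median key:

```python
def split_tablet(rows, tablet):
    """Split a tablet (a contiguous [start, end) range of row keys)
    at the median key present in it, roughly how a hot tablet gets
    divided so its halves can move to different nodes."""
    start, end = tablet
    keys = [k for k in sorted(rows) if start <= k < end]
    mid = keys[len(keys) // 2]
    return (start, mid), (mid, end)

rows = ["a", "b", "c", "d"]
left, right = split_tablet(rows, ("a", "z"))  # ("a","c") and ("c","z")
```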
Troubleshooting
As few clients as possible
Check usage pattern for hotspots
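Checking usage patterns for hotspots boils down to asking whether one key prefix receives a disproportionate share of requests. A minimal sketch, assuming a hypothetical access log of row keys and a threshold of 50%:

```python
from collections import Counter

def hotspot_prefixes(accessed_keys, prefix_len=6, threshold=0.5):
    """Flag row-key prefixes receiving more than `threshold` of all
    requests -- a sign reads/writes are not evenly distributed."""
    counts = Counter(k[:prefix_len] for k in accessed_keys)
    total = len(accessed_keys)
    return [p for p, c in counts.items() if c / total > threshold]

keys = ["user1#a", "user1#b", "user1#c", "user2#a"]
hot = hotspot_prefixes(keys)  # "user1#" takes 3 of 4 requests
```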
Migration
HBase to Bigtable
HBase -> sequence files
Sequence files -> Cloud Storage
Transfer Appliance: > 20TB
distcp
BW > 100 Mbps AND data < 20 TB
Can start a Hadoop job to copy to Cloud Storage
gsutil
BW > 100 Mbps AND data < 10 TB
Storage Transfer Service
Bigtable (via Dataflow)
Hadoop jobs to Dataproc
Access control
Project level
Instance level
Limits
Only single-row transactions
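"Only single-row transactions" means all mutations sent for one row apply atomically, but there is no transaction spanning multiple rows. A toy illustration (not the real client API) of that all-or-nothing behavior within a row:

```python
def mutate_row(table, row_key, mutations):
    """Apply a batch of (family, qualifier, value) mutations to ONE
    row atomically: either every mutation lands or, on any error,
    none of them do."""
    row = dict(table.get(row_key, {}))  # stage changes on a copy
    for family, qualifier, value in mutations:
        if value is None:
            raise ValueError("invalid mutation")
        row[(family, qualifier)] = value
    table[row_key] = row  # commit the whole batch at once

table = {}
mutate_row(table, "r1", [("cf", "a", "1"), ("cf", "b", "2")])
try:
    # Second batch fails partway; the staged copy is discarded, so
    # the row still reflects only the first, fully applied batch.
    mutate_row(table, "r1", [("cf", "a", "9"), ("cf", "c", None)])
except ValueError:
    pass
```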