Layers (3)
Big data and ML products
History
Challenges
Large datasets
Fast-changing data
Varied data
Need
Indexing the World Wide Web
Sol: inventing new data processing methods
2002:
GFS
: handles data sharing and petabyte storage at scale; served as the foundation of Cloud Storage and of what would become the "managed storage functionality" in BigQuery
2004: (prob: indexing the exploding volume of content on the web) - sol: introducing
MapReduce
: new style of data processing designed to manage large-scale data processing across big clusters of commodity servers.
2005: (prob: recording and retrieving millions of streaming user actions with high throughput) - sol: release of
Cloud Bigtable
2008: (prob: With MapReduce available, some developers were restricted by the need to write code to manage their infrastructure, which prevented them from focusing on application logic), sol: introducing
Dremel
: a new approach to big data processing that breaks the data into smaller chunks called shards and then compresses them. Dremel then uses a query optimizer to share tasks between the many shards of data and the Google data centers, which process queries and deliver results. (The big innovation was that Dremel autoscaled to meet query demands.)
2010:
Colossus
(cluster-level file system and successor to the Google File System) &
BigQuery
(see def under Storage; announced May 2010, generally available Nov 2011)
2012:
Spanner
(see def under Storage)
2015:
Pub/Sub
(def: a service used for streaming analytics and data integration pipelines to ingest and distribute data) &
TensorFlow
2018:
TPU
&
AutoML
(a suite of ML products)
2021:
Vertex AI
(a unified ML platform)
Big data and ML product line
Cloud Storage, Dataproc, Cloud Bigtable, BigQuery, Dataflow, Firestore, Pub/Sub, Looker, Cloud Spanner, AutoML, and Vertex AI
Compute & storage
Compute
Requirements
Prob
Since 2012, the computational power required for ML applications has no longer followed Moore's Law: it doubles roughly every 3.5 months, far outpacing the rate at which CPUs and GPUs improve
Sol
The use of TPUs
Specs
Introduced in 2016
ASICs (application-specific integrated circuits)
Domain-specific hardware (vs GPUs and CPUs: general-purpose hardware)
Accelerate ML workloads (tailoring the architecture to a domain's computation needs, such as matrix multiplication in ML)
Faster & more energy-efficient for AI and ML applications than CPUs and GPUs
Included in Google Cloud products and services
Storage
What
Reduce the time and effort needed to store data
How
By creating an elastic storage bucket directly in a web interface or through the command line (for example, in Cloud Storage); see the sketch below
Offers relational DBs, non-relational DBs, and worldwide object storage
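A minimal sketch of the programmatic path, assuming the google-cloud-storage Python client; the project and bucket names below are placeholders:

from google.cloud import storage  # pip install google-cloud-storage

# Placeholder project; credentials come from the environment
# (e.g. GOOGLE_APPLICATION_CREDENTIALS or gcloud auth).
client = storage.Client(project="my-project")

# Bucket names are globally unique; location and default storage
# class are set at creation time.
bucket = client.bucket("my-example-bucket")
bucket.storage_class = "STANDARD"
new_bucket = client.create_bucket(bucket, location="us-central1")
print(f"Created bucket {new_bucket.name} in {new_bucket.location}")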
Networking & security
Services
Compute & Storage
Compute
Compute engine
Specs
IaaS offering
Compute, storage, and networking delivered virtually, as in a physical data center
Max flexibility
Runs in individual VMs
Google Kubernetes engine (GKE)
Specs
Containerized apps (container = code packaged up with all its dependencies)
Run in a cloud environment
App Engine
Specs
Fully managed PaaS offering
Binds code to the libraries it needs
Allows resources to focus on the app logic
Cloud functions
Specs
FaaS offering
Execute code in response to events
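A minimal sketch of an HTTP-triggered function, assuming the Functions Framework for Python; the function and parameter names are illustrative:

import functions_framework  # pip install functions-framework

# Cloud Functions invokes this handler for each HTTP request;
# other trigger types include Pub/Sub messages and Cloud Storage events.
@functions_framework.http
def hello_http(request):
    # `request` is a Flask Request object.
    name = request.args.get("name", "world")
    return f"Hello, {name}!"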
Cloud run
Specs
Fully managed compute platform
Enables running request- or event-driven stateless workloads
Without having to worry about servers (abstracts away all infrastructure management): automatically scales up and down, etc.
Charges only for the resources used
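A minimal sketch of a Cloud Run-ready service; the only contract assumed is that the container listens on the port passed in the PORT environment variable (Flask is used purely for illustration):

import os
from flask import Flask  # pip install flask

app = Flask(__name__)

@app.route("/")
def index():
    # Stateless handler: Cloud Run adds or removes instances
    # (down to zero) based on incoming request load.
    return "Hello from Cloud Run!"

if __name__ == "__main__":
    # Cloud Run injects the port to listen on via PORT.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))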
Storage
Cloud Bigtable
Specs
best for real-time, high-throughput applications that require only millisecond latency
Cloud SQL
Specs
def: fully managed relational database service (MySQL, PostgreSQL, and SQL Server)
Cloud storage
Specs
Managed service for storing unstructured data (object files)
object: an immutable piece of data consisting of a file of any format
Objects are stored in containers called "buckets"
Buckets are associated with a project
Projects can be grouped under an organization
Each project, bucket, and object in Google Cloud is a resource (as are things such as Compute Engine instances)
App examples
serving website content
storing data for archival
disaster recovery
distributing large data objects to end users via Direct Download
Storage classes
Standard storage (Hot data): 1- best for frequently accessed or "hot" data, 2- also great for data that is stored for only brief periods of time
Nearline storage (Once per month): e.g. data backups, long-tail multimedia content, data archiving
Coldline storage (Once every 90 days at most): another low-cost option for storing infrequently accessed data
Archive storage (Once a year or less): lowest-cost option (higher costs for data access and operations and 365-day minimum storage duration) - used ideally for data archiving, online backup and disaster recovery
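A minimal sketch of uploading an object and later moving it to a colder class, assuming the google-cloud-storage Python client; bucket and object names are placeholders:

from google.cloud import storage  # pip install google-cloud-storage

client = storage.Client(project="my-project")   # placeholder project
bucket = client.bucket("my-example-bucket")     # placeholder bucket

# An object ("blob") is immutable: re-uploading under the same name
# creates a new version rather than editing the object in place.
blob = bucket.blob("backups/2024/archive.zip")
blob.upload_from_filename("archive.zip")

# Demote rarely accessed data to a cheaper storage class.
blob.update_storage_class("COLDLINE")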
Cloud Spanner
Specs
def: a globally available and scalable relational database
Firestore
Specs
def: transactional NoSQL, document-oriented database
BigQuery
Specs
def: Google's data warehouse solution: a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data
is a PaaS that supports querying using ANSI SQL
has built-in ML capabilities
Dremel is the query engine behind BigQuery
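A minimal sketch of an analytical query through the BigQuery Python client, run here against a public dataset; the project name is a placeholder (the built-in ML capabilities are likewise exposed through SQL, e.g. CREATE MODEL):

from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# A typical OLAP-style aggregation in ANSI SQL; Dremel shards and
# executes it across Google's data centers behind the scenes.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.name, row.total)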
Big data and ML products
Categories
(according to the Data-to-AI workflow)
Storage (5)
Cloud Storage, + relational: Cloud SQL, Cloud Spanner + non-relational: Cloud Bigtable, Firestore
(note: BigQuery, the warehouse, is categorized under Analytics below, although it also provides managed storage)
Analytics (3)
BigQuery
Looker
Looker Studio
ML
ML & AI solutions (4)
(built on the ML development platform)
Document AI
Contact Center AI
Retail Product Discovery
Healthcare Data Engine
ML development platform (4)
Vertex AI
(the primary product, which includes the other 3)
AutoML
Vertex AI Workbench
TensorFlow
Ingestion & process
(products for ingesting both real-time and batch data; see the Pub/Sub sketch after this list)
Dataproc
Dataflow
Cloud Data Fusion
Pub/Sub
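A minimal sketch of publishing an event to Pub/Sub with the Python client; project and topic names are placeholders, and the topic is assumed to already exist:

from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")  # placeholders

# publish() is asynchronous: it returns a future that resolves to the
# server-assigned message ID once the message has been ingested.
future = publisher.publish(topic_path, data=b"page_view", origin="web")
print(future.result())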
Infrastructure
Geographical management
Distribution
Locations (5 major geographic areas)
Regions (34)
Zones
Characteristics
Compute & storage are decoupled (contrary to desktop computing), enabling proper scaling
Multiple compute & storage services are available to meet each app's specific compute & storage needs
Choosing the services to use
Criteria
Type of data
Structured
Def
Tables - rows - columns
Types
Transactional workloads:
stem from Online Transaction Processing (OLTP) systems (used when fast inserts and updates are required to build row-based records, usually to maintain a system snapshot)
Require relatively standardized queries that only impact a few records
Accessed using SQL
better for local/regional scalability
Cloud SQL
global scalability
Cloud Spanner
Accessed without SQL (NoSQL; see the sketch after this list)
Firestore
Analytical workloads:
stem from Online Analytical Processing (OLAP) systems (used when entire datasets need to be read)
Often require complex queries (e.g. aggregations)
Accessed using SQL
BigQuery
Accessed without SQL (NoSQL)
Cloud Bigtable
(best for real-time, high-throughput applications that require only millisecond latency)
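A minimal sketch contrasting the two NoSQL picks above, assuming the Firestore and Bigtable Python clients; all project, instance, and table names are placeholders (the Bigtable column family is assumed to exist):

from google.cloud import firestore  # pip install google-cloud-firestore
from google.cloud import bigtable   # pip install google-cloud-bigtable

# Transactional, document-oriented workload: Firestore.
db = firestore.Client(project="my-project")
db.collection("users").document("alice").set({"plan": "pro", "credits": 10})

# High-throughput, millisecond-latency workload: Bigtable.
bt = bigtable.Client(project="my-project")
table = bt.instance("my-instance").table("events")
row = table.direct_row(b"user#alice#2024-01-01")
row.set_cell("stats", b"clicks", b"42")  # family "stats" must exist
row.commit()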
Unstructured
Services to use
Usually suited to
Cloud Storage
BigQuery
now offers the possibility to store unstructured data as well
Def
Documents
Images
Audio files
Business need