Execution Environments

Dataflow

Apache Beam

Java

Python

Autoscaling

CPU utilization

Work remaining

Throughput

Dynamic Workload
Re-balancing

Keep all workers busy

Eliminates Stragglers
(small chunk of data
takes longer to process)

uses work
stealing/shedding
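
A toy sketch of the work-stealing idea noted above, not Dataflow's actual scheduler. It assumes unit-cost tasks and a hypothetical helper `run_with_work_stealing`, in which an idle worker takes half of the busiest worker's remaining tasks.

```python
from collections import deque


def run_with_work_stealing(queues):
    """Simulate workers that each process one task per round.

    Assumption for illustration: an idle worker steals half of the
    remaining tasks from the currently busiest worker.
    Returns the number of rounds needed to finish all tasks.
    """
    queues = [deque(q) for q in queues]
    rounds = 0
    while any(queues):
        # Idle workers steal from the tail of the longest queue.
        for q in queues:
            if not q:
                victim = max(queues, key=len)
                for _ in range(len(victim) // 2):
                    q.append(victim.pop())
        # Every worker that has work processes one task this round.
        for q in queues:
            if q:
                q.popleft()
        rounds += 1
    return rounds
```

With all eight tasks initially on one of four workers, the simulation finishes in 2 rounds; a single worker would need 8.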

Batch/Stream processing

fraud detection

IoT

gaming

click stream,
point of sale, segmentation

Cloud Functions

node.js

index.js

package.json

pub/sub

cloud storage

http
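
A minimal sketch of a Pub/Sub-triggered function. The notes list node.js; this uses Python instead, and it assumes the background-function style in which the message payload arrives base64-encoded under `event["data"]`. The handler name `handle_pubsub` and the JSON payload are hypothetical.

```python
import base64
import json


def handle_pubsub(event, context=None):
    """Decode a Pub/Sub message payload and return it as a dict.

    Assumption: the publisher sent a JSON object as the message body.
    """
    raw = base64.b64decode(event.get("data", "")).decode("utf-8")
    return json.loads(raw) if raw else {}
```

The handler can be exercised locally by passing a dict with a base64-encoded `data` field, without deploying anything.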

App Engine Flexible

use cases

backends for mobile

web apps

HTTP APIs

SSH to VMs

disabled by default

microservices

gcloud app deploy

uploads sources into GCS

builds Docker image

pushes image into
Container registry

creates Load Balancer

creates/manages
VMs in 3 zones

setup monitoring, logging,
health checks, error reporting

0 downtime
traffic split

canary

AB testing

limitations

just http(s)

access to tmpfs only

  • in memory
    no persistent disks!

at least 2 nodes running at all times!

App Engine Standard

scales to 0

must use App Engine Standard APIs

not portable to other
compute environments:

  • not using client libraries

Kubernetes Engine

for complex, portable applications:

  • as it runs containers

http(s) or any other protocol

worker node(s)

pod(s)

container(s)

kube proxy

kubelet

master node(s)

scheduler

controller manager

api server

Compute Engine

Predefined and Custom
machine types

Persistent Disks (standard, balanced), Local SSDs

Instance Groups

lift and shift

VM Startup

provisioning

boot

request

plan ahead for traffic boost

1 Application

1..n Services

1..n Versions

1..n Instances

at least 'default'

traffic migration -> 100% to certain version
traffic split

App Engine SDK to run locally

services available

task queue

scheduled tasks

memcache

search

logs

auth

60sec timeout

no PD, no local persistence

pay per hour

pay per class

Endpoints

ESP v2

Extensible Service
Proxy (ESP)

app engine flex

GCE

GKE

k8s

app engine

cloud functions

GKE

Cloud Run

GCE

k8s

auth

logging

monitoring

Apigee

analytics

monetization

business oriented

select region

select region

go

java

python

Network bandwidth scales with vCPU count

2Gbps/vCPU

max up to 200Gbps (for 176 vCPU)

min 10Gbps (for 2/4 vCPU)
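
A small arithmetic sketch that takes the figures in these notes at face value (2 Gbps per vCPU, a 10 Gbps floor for small machines, a 200 Gbps ceiling). The function name `estimated_egress_gbps` is hypothetical, and real limits vary by machine family, so treat this as a restatement of the notes rather than a sizing tool.

```python
def estimated_egress_gbps(vcpus):
    """Estimate bandwidth from the note's figures only.

    Assumptions: 2 Gbps per vCPU, never below 10 Gbps (the figure
    noted for 2/4 vCPU machines) and never above 200 Gbps.
    """
    if vcpus < 2:
        raise ValueError("the notes only give figures for 2 or more vCPUs")
    return max(10, min(200, 2 * vcpus))
```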

predefined

shared core

memory cpu

high mem

memory optimized

cpu optimized

preemptible

up to 24 hours

sole tenant nodes

Cloud Run

regional

stateless

http

pubsub

can be run inside GKE

shares network & file system

etcd

cloud controller manager

GKE

Node pools

assign based on label

private cluster
possible

$HOME/.kube/config

Deployment

kubectl run --generator=deployment/apps.v1 (deprecated; newer kubectl: kubectl create deployment)

kubectl apply

cloud console

kubectl autoscale deployment <name> --min=3 --max=10 --cpu-percent=<target>
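
A sketch of the replica calculation behind CPU-based autoscaling, using the documented Horizontal Pod Autoscaler ratio (desired = ceil(current replicas × current metric / target metric)) clamped to the min/max from the command above. The function name is hypothetical, and the real controller also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent,
                     target_cpu_percent, min_replicas=3, max_replicas=10):
    """Scale the replica count by the ratio of observed to target CPU.

    Simplification: no tolerance band, no stabilization window.
    """
    desired = math.ceil(
        current_replicas * current_cpu_percent / target_cpu_percent
    )
    # Clamp to the --min / --max bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas at 90% CPU against a 60% target scale to 6, while a very low load still keeps the minimum of 3.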

rollingUpdate

recreate

pods deleted and recreated

kubectl rollout undo deployment <name>

kubectl rollout pause/resume deployment <name>

A Deployment's rollout is triggered if and only if the Deployment's Pod template (that is, .spec.template) is changed.

Jobs

Job

Parallel Job

Cron Job

parallelism > 1


either in yaml
or via
scale

Kind: CronJob

Scheduling

nodeSelector

nodeAffinity

podAffinity

topology: node, zone, region

preferredDuringSchedulingIgnoredDuringExecution

requiredDuringSchedulingIgnoredDuringExecution
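
A simplified sketch of how the two affinity flavours differ: "required" rules filter nodes out, "preferred" rules only influence ranking. The `schedule` function and the dict shapes are hypothetical; the real scheduler uses weighted terms and many more predicates.

```python
def schedule(pod, nodes):
    """Return the name of the chosen node, or None if nothing fits.

    Assumptions: pod["required"] and pod["preferred"] are plain
    label dicts; each matched preferred label is worth one point.
    """
    required = pod.get("required", {})
    preferred = pod.get("preferred", {})

    # required...: hard filter, non-matching nodes are excluded.
    candidates = [
        n for n in nodes
        if all(n["labels"].get(k) == v for k, v in required.items())
    ]
    if not candidates:
        return None  # the pod would stay Pending

    # preferred...: soft rule, only used to rank the candidates.
    def score(node):
        return sum(
            1 for k, v in preferred.items() if node["labels"].get(k) == v
        )

    return max(candidates, key=score)["name"]
```

"IgnoredDuringExecution" means neither rule is re-evaluated for pods that are already running, which is why the sketch only models the placement decision.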

Taints - defined on nodes

tolerations - on pods
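
A sketch of the taint/toleration matching rule: a pod fits a node only if every taint on the node is matched by one of the pod's tolerations. The helper `tolerates` is hypothetical and treats all taint effects as hard constraints, which is a simplification (PreferNoSchedule is only a soft preference in Kubernetes).

```python
def tolerates(taints, tolerations):
    """True if each node taint is matched by at least one toleration."""

    def matches(tol, taint):
        # An empty effect on the toleration matches any effect.
        if tol.get("effect") and tol["effect"] != taint["effect"]:
            return False
        if tol.get("operator", "Equal") == "Exists":
            # Exists with no key tolerates every taint.
            return tol.get("key") in (None, taint["key"])
        return (tol.get("key") == taint["key"]
                and tol.get("value") == taint.get("value"))

    return all(any(matches(tol, t) for tol in tolerations) for t in taints)
```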

Networking

alias IP ranges
~4000 IPs per cluster

NEG: Network Endpoint Groups
used by LB - traffic from LB to PODS directly
(not to NODES)

Security

network policy

pod level firewall rules

nodes recreated

enable for master and nodes
deploy network policies

Storage

Volume

EmptyDir

ephemeral

shares pod's lifecycle

from node's local disk
or memory

ConfigMap

Secret

= ConfigMap but secured

always in-mem (tmp file system)

downwardAPI

pod's metadata

ephemeral

ephemeral

PV, PVC

persistent

size, class, access (R/W)

ReadWriteOnce

mounted to single node

ReadOnlyMany

mounted to several nodes

ReadWriteMany

not supported by GCP disks
but supported by NFS systems

k8s has only ServiceAccounts,
user identities managed outside

RBAC

Roles
(namespace level)


and ClusterRoles

Subject

resources + verbs

user

group

serviceAccount
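
A sketch of the RBAC model in the notes: a role is a set of rules (resources + verbs), and a binding attaches a role to subjects (user, group, serviceAccount). The data shapes and the `is_allowed` helper are hypothetical; real RBAC also covers apiGroups, resourceNames and namespace scoping.

```python
def is_allowed(subject, verb, resource, roles, bindings):
    """Check whether any role bound to the subject grants verb on resource.

    Assumptions: subjects are (kind, name) tuples; roles maps a role
    name to a list of {"resources": [...], "verbs": [...]} rules.
    """
    for binding in bindings:
        if subject not in binding["subjects"]:
            continue
        for rule in roles[binding["role"]]:
            if resource in rule["resources"] and verb in rule["verbs"]:
                return True
    # RBAC is additive: with no matching rule, access is denied.
    return False
```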

disable access to node's metadata
by removing permission
compute.instances.get


because the node's metadata contains the secret with the cert the kubelet uses to talk to the API server

Security Context
Pod spec

Pod Security
Policy

PCollections

PTransforms

Map

FlatMap

ParDo

GroupByKey

CombinePerKey - usually more efficient than GroupByKey

Flatten == union
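
A plain-Python analogue of the transforms listed above, meant only to show their semantics on in-memory lists. It does not use the Apache Beam SDK; the helper names `group_by_key` and `combine_per_key` are hypothetical stand-ins for the Beam transforms of similar name.

```python
from collections import defaultdict


def group_by_key(pairs):
    """GroupByKey analogue: collect all values for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def combine_per_key(pairs, fn):
    """CombinePerKey analogue: fold values per key without
    materialising the full list of values first."""
    combined = {}
    for key, value in pairs:
        combined[key] = fn(combined[key], value) if key in combined else value
    return combined


lines = ["a b", "b c b"]
# FlatMap: one input element yields zero or more output elements.
words = [w for line in lines for w in line.split()]
# Map: exactly one output element per input element.
pairs = [(w, 1) for w in words]
grouped = group_by_key(pairs)
counts = combine_per_key(pairs, lambda x, y: x + y)
# Flatten: union of several collections into one.
flattened = words + ["d"]
```

The combine variant is the reason for the "more efficient" note: partial sums can be computed before data is shuffled, whereas grouping has to move every value for a key to one place.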

firebase

60sec default timeout

/tmp memory mount

use cases

tiny ETL

webhooks

event handling

can generate Dockerfile

can migrate to GCE, GKE, functions

by IP

by cookie

rand
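
A sketch of a sticky traffic split keyed by IP or cookie: the key is hashed to a point in [0, 1) and mapped onto cumulative version weights, so the same client keeps landing on the same version. The function `pick_version` and the hashing scheme are assumptions for illustration, not App Engine's actual implementation.

```python
import hashlib


def pick_version(key, split):
    """Map a client key (IP address or cookie value) to a version.

    split: list of (version, weight) pairs whose weights sum to 1.0.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for version, weight in split:
        cumulative += weight
        if point < cumulative:
            return version
    return split[-1][0]  # guard against floating-point rounding
```

A random split would replace the hash with a random number per request, which gives the right proportions but no stickiness.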

uses iptables

accessed by GCP tools via private IP

accessed by Authorized networks (trusted IP ranges)

gcloud container clusters get-credentials <cluster> --zone <zone>

Spot VMs

next generation of preemptible VMs

no live migration

no auto restart

no max runtime limit

Confidential VMs

encrypt data being processed

Shielded VMs

Secure Boot, virtual
trusted platform module (vTPM)-enabled Measured Boot

NEW: stateful IP addresses