Worker Node, localhost
Pod, localhost
Pod, API Server, Worker Node, User…
-
VM2
netns1
veth1 > 172.16.0.2
veth1 / veth2
br0
172.16.2.1
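A minimal iproute2 sketch of what those labels translate to, assuming netns1 is a network namespace, veth1/veth2 are a veth pair and br0 is a Linux bridge on the host (addresses copied verbatim from the map):

    ip netns add netns1
    ip link add veth1 type veth peer name veth2     # veth pair: veth1 <-> veth2
    ip link set veth1 netns netns1                  # veth1 moves into the namespace
    ip netns exec netns1 ip addr add 172.16.0.2/24 dev veth1
    ip netns exec netns1 ip link set veth1 up
    ip link add br0 type bridge                     # host-side bridge
    ip addr add 172.16.2.1/24 dev br0               # bridge address from the map
    ip link set veth2 master br0                    # plug the host end into the bridge
    ip link set veth2 up && ip link set br0 up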
-
within that cluster we have 2 nodes .. each node gets an IP address from the node network address space and a separate carve-out for its pods from the cluster IP address space
-
Kubernetes does not say anything about things outside of the cluster.. it acts as if they don't exist
they are not part of the Kubernetes networking model
-
-
-
each cluster has a private network, the pods get IPs from that private network, and there are bridges/gateways into and out of the cluster set up by our environment
-
our nodes exist with 2 interfaces
one leg on the main net (the 10.240 net) and one leg on the private net (the 10.0 net), and they know how to route traffic from each side
Gateways are basically access mechanisms / traffic does not reach the pods directly
using our node as the gateway
simplest form of gateway - no extra load balancers or routers or anything
Traffic
one leg on the main net (the node IP address, the 10.240 net) and one leg on the private net (the 10.0 net), and they know how to route traffic from each side
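An illustrative picture of the routing state on such a two-legged node (interface names and exact subnets are assumptions based on the 10.240 node net and 10.0 pod net above):

    sysctl -w net.ipv4.ip_forward=1           # let the node forward between its two legs
    ip addr add 10.240.0.2/24 dev eth0        # leg on the main (node) network
    ip addr add 10.0.2.1/24 dev cbr0          # leg on the private (pod) network
    ip route add 10.0.3.0/24 via 10.240.0.3   # pods hosted on the other node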
going deeper into the nodePort service
flow
a client at 10.128.x sends a packet targeted at a NodePort .. the client only knows the node and the node port, not the pod port, and after it gets to the node we need a load-balancing mechanism in place to do the work (and DNS to tell us which service to forward that packet to) ..
when it arrives at the node > the node uses the destination port (DPort) of the IP packet to figure out which service we're talking to (talking to 30001 > service foo / talking to 30002 > service oth), which is what we call destination network address translation (DNAT)
it then picks one of the backends that it understands as being part of that service and rewrites the packet to point at that pod (possibly depending on the IP the packet came from) >> the most common implementations of this are iptables, IPVS, nftables ...
on the reverse path the host does the opposite of the flow and converts the packet back, so the client sees the node IP as the source of the packet
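A hand-rolled sketch of the kind of DNAT rules behind this idea (service ports and pod IPs are invented for illustration; real kube-proxy programs its own chains, but the mechanism is the same):

    # NodePort 30001 -> a backend pod of service foo, 30002 -> a backend pod of service oth
    iptables -t nat -A PREROUTING -p tcp --dport 30001 \
      -j DNAT --to-destination 10.0.2.15:8080
    iptables -t nat -A PREROUTING -p tcp --dport 30002 \
      -j DNAT --to-destination 10.0.3.22:9090
    # conntrack reverses the translation automatically for the reply packets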
- a common thing people do is ingress the traffic into an L7 proxy >> they run something like Envoy or nginx inside the cluster, forward all outside traffic through a NodePort into that proxy, and use it to do the forwarding to the rest of the applications, and this is where we see in-cluster nginx ingress controllers
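That pattern boils down to a NodePort Service sitting in front of the proxy pods; a minimal sketch (names, labels and port numbers are assumptions, not taken from the notes):

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-proxy
    spec:
      type: NodePort
      selector:
        app: ingress-proxy      # the pods running nginx/Envoy
      ports:
      - port: 80                # proxy port inside the cluster
        targetPort: 80
        nodePort: 30080         # all outside traffic enters here on every node
    EOF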
Egress IP Masquerading (Service NAT)
traffic leaving this machine will look like it came from this machine
a packet leaving a node from a pod will take that node's IP as its source IP at the node edge
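A sketch of the usual masquerade rule behind this (the 10.0.0.0/16 pod CIDR is an assumption):

    # SNAT pod traffic that leaves the cluster so it carries the node's IP;
    # the exclusion keeps pod-to-pod traffic untouched
    iptables -t nat -A POSTROUTING -s 10.0.0.0/16 ! -d 10.0.0.0/16 -j MASQUERADE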
VIP
instead of having to know which node and which node port to use, we are given a virtual IP address that represents a service in our cluster .. similar to using the destination port on a node, here we use the destination IP
but with egress we still need something like SNAT
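Conceptually the node now matches on the destination IP instead of the destination port; a hand-rolled equivalent (VIP and pod address invented for illustration):

    # packets addressed to the service VIP 10.96.0.50:80 get rewritten to a backend pod
    iptables -t nat -A PREROUTING -d 10.96.0.50/32 -p tcp --dport 80 \
      -j DNAT --to-destination 10.0.2.15:8080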
Gateway : proxy ingress
a more proxy-ful way of doing things: instead of just a (virtual) IP address we have an actual proxy in the path
the main difference is that a virtual IP does not terminate a TCP session .. a proxy will terminate the TCP session and start a new one
the packet arrives at the proxy and the proxy chooses which backend it will go to and forwards the packet
the proxy can route either to NodePorts or directly to pod IPs
the proxy obscures the client IP address and traffic will appear as if it comes from that proxy .. so if we need our pods to know the client IPs then we need to choose another mechanism
and again we still need SNAT to leave the cluster (egress)
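As an illustration of the proxy model, a minimal nginx-style reverse-proxy config written out via a heredoc (backend addresses are assumptions; an Envoy config would play the same role):

    cat <<'EOF' > /etc/nginx/conf.d/ingress.conf
    upstream backend_pods {
        server 10.0.2.15:8080;          # pod IPs or NodePorts, the proxy's choice
        server 10.0.3.22:8080;
    }
    server {
        listen 80;                      # the client's TCP session terminates here
        location / {
            proxy_pass http://backend_pods;                 # a new TCP session to a backend
            proxy_set_header X-Forwarded-For $remote_addr;  # keep the client IP in a header
        }
    }
    EOF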
-
-
-
in this example we see 2 nodes connected on the same subnet / a virtual switch between them // reachability and connectivity achieved by specifying routes
however the nodes/servers could be on different subnets, and in that case simple route manipulation is not going to work and we have to use overlay nets
after seeing overlays we have the following example
suppose these servers are on different subnets and there is a router between them, so we can't really use simple routes to provide connectivity amongst the various containers on the 2 different servers
so we need to come up with a strategy for setting up a simple overlay net
we set up an overlay to provide connectivity amongst these various containers on the 2 different servers
and so what we want to do is set up a UDP tunnel between the 2 servers; the tunnel has a different IP address belonging to the net on each side, and this is how communication works / data is encapsulated inside UDP packets
the procedure is similar to before, the difference in defining variables is that we define a tunnel IP address on each server side
further on, for the specific tunnel setup we take advantage of a specific utility called socat
a powerful utility that provides connectivity between 2 data sources using various methods: UDP / TCP / sockets / SSL / pipes...
in our case we use UDP, defining on each side the port, the IP address and the type
this utility offers 2 tunnel device types >> tun (layer 3, carries IP packets) and tap (layer 2 encapsulation, carries Ethernet frames)
and we can use tshark for example to monitor traffic and test connectivity, e.g. by monitoring veth11 while VM1 pings VM2
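A rough sketch of such a socat UDP tunnel plus a capture to verify it (host addresses, port and tunnel IPs are invented; option spelling may differ slightly between socat versions, so check the man page):

    # on VM1: wrap tun0 traffic in UDP datagrams towards VM2
    socat UDP:VM2_ADDR:9000,bind=:9000 TUN:192.168.255.1/24,tun-name=tun0,iff-up &
    # on VM2: the mirror image of the same command
    socat UDP:VM1_ADDR:9000,bind=:9000 TUN:192.168.255.2/24,tun-name=tun0,iff-up &
    # watch the tunnel while VM1 pings VM2 across it
    tshark -i tun0 icmp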
-
-
each time we create a Linux host, a default netns is also installed
a netns is basically a container of everything and anything that has to do with the net
IP addresses / net interfaces / firewall rules / routing tables ...
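A quick way to see that isolation from the shell (the namespace name is arbitrary):

    ip netns add demo                  # a fresh namespace
    ip netns exec demo ip -br addr     # only its own loopback, nothing from the host
    ip netns exec demo ip route        # its own (empty) routing table
    ip netns exec demo iptables -S     # its own firewall rules
    ip netns del demo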
-
when the Calico CNI is installed it creates the veth part, and that comes in a pair: one end in the pod/Kubernetes side itself and the other on the host
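On a Calico node you can spot those host-side veth ends and the per-pod routes they carry (the cali… interface prefix is Calico's convention; exact names vary):

    ip -br link | grep cali       # host ends of the pod veth pairs
    ip route | grep cali          # a /32 route per local pod, pointing at its veth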
-
-
if pods on the same node need to communicate, they can do that through their Calico interfaces connecting them to the same host, without the need for the tunnel
for communication between pods on different hosts it has to go through the tunnel, and this is where the IP-in-IP transformation occurs
the node creates a new IP header and sticks everything from that pod/eth0 communication inside it, masquerading the source with its own IP > 172.16.94.0
and sends it through the node's eth0 interface
IP-in-IP packet
- frame (Ethernet)
- outer IP (IPIP) header (host source/dest)
- inner IP (IPv4 of the pods) header (pod source/dest)
- TCP
- HTTP / ...
and the destination host does the reverse of that, and likewise for the response
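You can watch that encapsulation on the node's outer interface; IP-in-IP is IP protocol 4, so filtering on it shows only the tunneled traffic (interface names assumed):

    tcpdump -ni eth0 ip proto 4     # outer IPIP packets between the nodes
    tshark -i tunl0                 # Calico's IPIP tunnel device on the node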
-
the default encapsulation for Calico is the IP-in-IP protocol, which involves wrapping a layer 3 IP packet inside an extra IP header; it can only encapsulate IP packets..
how Calico uses BGP to share and exchange route information between nodes to facilitate pod network communication
-
-
-
IP-in-IP is needed in several cases, but it's actually better not to use encapsulation where we can >> it makes communication faster
-
makes routing decisions based on paths, network policies or rule-sets configured by a system administrator.. basically the internet is built on this protocol
- who on the internet can I send a packet to
- make a decision about which route a packet should take
BGP connects different networks by exchanging routing information stored on different nodes of each network and the paths related to each network
Calico takes advantage of this protocol in order to share routes amongst its nodes
when we install Calico it creates by default a full mesh of internal BGP connections
and this is done by peering each node with every other node
every route change on one node is reflected on every other node
-
if our nodes are not on the same subnet Calico can leverage IP-in-IP / in case IP-in-IP traffic is blocked, the full mesh can still be established over an L3 network
in case we are using VXLAN, BGP is not supported
-
-
Calico installs some software that we can think of as a vRouter on each node, so the node basically becomes like an independent network (which is what BGP works with: connecting networks), and this is how the full mesh is established (each node (net) is peered with every other node (net))
-
-
-
rather than establishing a full mesh across all nodes, we can select some of them, establish the full mesh amongst those, and have the other nodes peered to them (to the nodes directly part of the full mesh); every one of those peered nodes then has its changes reflected through the node connecting it to the full mesh, and thus to the entire cluster, and this is possible using route reflectors
-
we can configure Calico to peer directly with our infrastructure
meaning we disable Calico's full-mesh behaviour and instead peer it with our L3 ToR (top-of-rack) switches, and this is the best option as we have full control over our infra
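A hedged sketch of what that configuration can look like with calicoctl and the projectcalico.org/v3 API (ToR address and AS numbers are placeholders):

    calicoctl apply -f - <<'EOF'
    # turn off the default node-to-node full mesh...
    apiVersion: projectcalico.org/v3
    kind: BGPConfiguration
    metadata:
      name: default
    spec:
      nodeToNodeMeshEnabled: false
      asNumber: 64512
    ---
    # ...and peer every node with the ToR router instead
    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: tor-peer
    spec:
      peerIP: 10.240.0.1
      asNumber: 64513
    EOF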
-
normally these pods communicate with each other through the bridge
this is not the only way; we can also do it through iptables, depending on the CNI we are using
but if a pod wants to communicate with a pod on the other node, they have to use an L2/L3 overlay network, which basically hides the networking complexity between pods on different nodes, as if they were on the same node
before, we created the container network interfaces and the bridges by hand, and we also set up the tunnel between nodes manually; the CNI does all of that for us, providing all those services and creating the container networking
Kubernetes does not actually manage pod networking itself, the CNI does all of that... as implementations/plugins there are Flannel and Calico ..
pods on the same node communicate with each other through the bridge and iptables, depending on the CNI provider
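For a feel of what a CNI configuration looks like, a minimal config for the reference bridge plugin (file name, bridge name and subnet are assumptions; Flannel/Calico generate richer versions of this automatically):

    cat <<'EOF' > /etc/cni/net.d/10-bridge.conf
    {
      "cniVersion": "1.0.0",
      "name": "podnet",
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "10.0.2.0/24"
      }
    }
    EOF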