
Kubernetes Networking: Behind the scenes


One of the things I love the most about Kelsey Hightower's Kubernetes The Hard Way guide, other than the fact that it just works (even on AWS!), is that it keeps networking clean and simple: a perfect opportunity to understand, for example, what the role of the Container Network Interface (CNI) is. Having said that, Kubernetes networking is not really very intuitive, especially for newcomers... and do not forget "there is no such thing as container networking".

While there are very good resources around this topic (links here), I couldn't find a single example that connects all of the dots with the command outputs that network engineers love and hate, showing what is actually happening behind the scenes. So, I decided to curate this information from a number of different sources to hopefully help you better understand how things are tied together. This is not only important for verification purposes, but also to ease troubleshooting. You can follow along with this example in your own Kubernetes The Hard Way cluster, as all of the IP addressing and settings are taken from it (May 2018 commits, before Nabla Containers).

Let's start from the end; we have three controller and three worker nodes. You might notice there are also at least three different private network subnets! Bear with me, we will explore them all. Keep in mind that while we refer to very specific IP prefixes, these are just the ones chosen for the Kubernetes The Hard Way guide, so they have local significance and you can choose any other RFC 1918 address block for your environment. I will post a separate blog post for IPv6.

Node network

This is the internal network all your nodes are part of, specified with the flag --private-network-ip in GCP or the option --private-ip-address in AWS when provisioning the compute resources.

Provisioning controller nodes in GCP

Provisioning controller nodes in AWS

Each of your instances will then have two IP addresses: a private one from the node network (controllers: 10.240.0.1${i}/24, workers: 10.240.0.2${i}/24) and a public IP address assigned by your Cloud provider, which we will discuss later on when we get to NodePorts.

GCP

$ gcloud compute instances list
NAME          ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
controller-0  us-west1-c  n1-standard-1               10.240.0.10  35.231.XXX.XXX  RUNNING
worker-1      us-west1-c  n1-standard-1               10.240.0.21  35.231.XX.XXX   RUNNING
...

AWS

$ aws ec2 describe-instances \
    --query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value[],PrivateIpAddress,PublicIpAddress]' \
    --output text | sed '$!N;s/\n/ /'
10.240.0.10 34.228.XX.XXX controller-0
10.240.0.21 34.173.XXX.XX worker-1
...

All nodes should be able to ping each other if the security policies are correct (...and if ping is actually installed on the host).

Pod network

This is the network where pods live. Each worker node runs a subnet of this network. In our setup, POD_CIDR=10.200.${i}.0/24 for worker-${i}.

To understand how this is set up, we need to take a step back and review the Kubernetes networking model, which requires that:

- All containers can communicate with all other containers without NAT
- All nodes can communicate with all containers (and vice-versa) without NAT
- The IP that a container sees itself as is the same IP that others see it as

Considering there can be multiple ways to meet these requirements, Kubernetes will typically hand off the network setup to a CNI plugin.

A CNI plugin is responsible for inserting a network interface into the container network namespace (e.g. one end of a veth pair) and making any necessary changes on the host (e.g. attaching the other end of the veth into a bridge). It should then assign the IP to the interface and set up the routes consistent with the IP Address Management section by invoking the appropriate IPAM plugin. [CNI Plugin Overview]
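To make that contract concrete, here is a minimal sketch of how a runtime invokes a CNI plugin: the network configuration arrives on stdin, everything else is passed as CNI_* environment variables, and the plugin prints a JSON result with the IP it assigned. The namespace name and container ID below are made up, and I'm assuming the bridge config file the guide installs at /etc/cni/net.d/10-bridge.conf; this is only to illustrate the interface, not something you would normally run by hand.

# Create a scratch network namespace to play the role of a pod's netns (name is made up)
sudo ip netns add cni-test

# Invoke the bridge plugin the way the runtime would: config on stdin, CNI_* variables in the environment
sudo CNI_COMMAND=ADD \
     CNI_CONTAINERID=test-0001 \
     CNI_NETNS=/var/run/netns/cni-test \
     CNI_IFNAME=eth0 \
     CNI_PATH=/opt/cni/bin \
     /opt/cni/bin/bridge < /etc/cni/net.d/10-bridge.conf

# The plugin prints a JSON result listing the interfaces it created and the IP handed out by the IPAM plugin.
# Running the same command with CNI_COMMAND=DEL undoes the setup.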
Network namespace

"A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes." [Namespaces man page]

Linux provides seven different namespaces (Cgroup, IPC, Network, Mount, PID, User and UTS). Network namespaces (CLONE_NEWNET) determine the network resources that are available to a process: "each network namespace has its own network devices, IP addresses, IP routing tables, /proc/net directory, port numbers, and so on". [Namespaces in operation]

Virtual Ethernet (Veth) devices

"A virtual network (veth) device pair provides a pipe-like abstraction that can be used to create tunnels between network namespaces, and can be used to create a bridge to a physical network device in another namespace. When a namespace is freed, the veth devices that it contains are destroyed." [Network namespace man page]

Let's bring this down to earth and see how all of this applies to our cluster. First of all, network plugins in Kubernetes come in a few flavors, CNI plugins being one of them (why not CNM?). The kubelet on each node tells the container runtime which network plugin to use. The Container Network Interface (CNI) sits in the middle, between the container runtime and the network implementation. Only the CNI plugin configures the network.

"The CNI plugin is selected by passing Kubelet the --network-plugin=cni command-line option. Kubelet reads a file from --cni-conf-dir (default /etc/cni/net.d) and uses the CNI configuration from that file to set up each pod's network." [Network Plugin Requirements]

The actual CNI plugin binaries are located in --cni-bin-dir (default /opt/cni/bin).

Notice our kubelet.service execution parameters include --network-plugin=cni:

[Service]
ExecStart=/usr/local/bin/kubelet \
  --config=/var/lib/kubelet/kubelet-config.yaml \
  --network-plugin=cni \
  ...

Kubernetes first creates the network namespace for the pod before invoking any plugins. This is done by creating a pause container that "serves as the 'parent container' for all of the containers in your pod" [The Almighty Pause Container]. Kubernetes then invokes the CNI plugin to join the pause container to a network. All containers in the pod use the pause network namespace (netns).

{
  "cniVersion": "0.3.1",
  "name": "bridge",
  "type": "bridge",
  "bridge": "cnio0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [{"subnet": "${POD_CIDR}"}]
    ],
    "routes": [{"dst": "0.0.0.0/0"}]
  }
}

Our CNI config indicates we use the bridge plugin to configure an L2 Linux software bridge in the root namespace with the name cnio0 (the default name is cni0) that acts as a gateway ("isGateway": true). It will also set up a veth pair to attach the pod to the bridge just created.

To allocate L3 information such as IP addresses, an IPAM plugin (ipam) is called. The type is host-local in this case, "which stores the state locally on the host filesystem, therefore ensuring uniqueness of IP addresses on a single host" [host-local plugin]. The IPAM plugin returns this information to the previous plugin (bridge), so any routes provided in the config can be configured ("routes": [{"dst": "0.0.0.0/0"}]). If no gateway is provided, it will be derived from the subnet. A default route is also configured in the pod network namespace, pointing to the bridge (which is configured with the first IP of the pod subnet).
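To make this less abstract, here is a rough, hand-rolled equivalent of what the bridge plugin and host-local do for every new pod on worker-0. The namespace name (demo-pod) and the address 10.200.0.50 are made up for this sketch (pick an address host-local has not already handed out); the bridge cnio0 and the gateway 10.200.0.1 come from our config.

# Illustrative only: reproduce the bridge plugin's plumbing by hand on worker-0
sudo ip netns add demo-pod                                     # the "pod" network namespace
sudo ip link add veth-demo type veth peer name veth-demo-pod   # a veth pair in the root namespace
sudo ip link set veth-demo-pod netns demo-pod                  # move one end into the pod netns
sudo ip netns exec demo-pod ip link set veth-demo-pod name eth0
sudo ip link set veth-demo master cnio0 up                     # attach the host end to the bridge
sudo ip netns exec demo-pod ip link set lo up
sudo ip netns exec demo-pod ip addr add 10.200.0.50/24 dev eth0    # the kind of address host-local would allocate
sudo ip netns exec demo-pod ip link set eth0 up
sudo ip netns exec demo-pod ip route add default via 10.200.0.1    # the bridge holds the first IP of the subnet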
Last, but not least, we also requested to masquerade ("ipMasq": true) traffic originating from the pod network. We don't really need NAT here, but that is the config in Kubernetes The Hard Way. So, for the sake of completeness, I should mention the entries in iptables that the bridge plugin configured for this particular example: all packets from the pod whose destination is not in the range 224.0.0.0/4 will be NAT'ed, which is somehow not aligned with "all containers can communicate with all other containers without NAT". Well, we will prove you don't need NAT shortly.

Pod routing

We are now ready to configure pods. We are going to take a look at all the network namespaces in one of the worker nodes and analyze one of them after creating an nginx deployment as described here. We will use lsns with the option -t to select the type of namespace (net).

ubuntu@worker-0:~$ sudo lsns -t net
        NS TYPE NPROCS   PID USER COMMAND
4026532089 net     113     1 root /sbin/init
4026532280 net       2  8046 root /pause
4026532352 net       4 16455 root /pause
4026532426 net       3 27255 root /pause

We can find out the inode number of these with the option -i in ls.

ubuntu@worker-0:~$ ls -1i /var/run/netns
4026532352 cni-1d85bb0c-7c61-fd9f-2adc-f6e98f7a58af
4026532280 cni-7cec0838-f50c-416a-3b45-628a4237c55c
4026532426 cni-912bcc63-712d-1c84-89a7-9e10510808a0

Optionally, you can also list all the network namespaces with ip netns.

ubuntu@worker-0:~$ ip netns
cni-912bcc63-712d-1c84-89a7-9e10510808a0 (id: 2)
cni-1d85bb0c-7c61-fd9f-2adc-f6e98f7a58af (id: 1)
cni-7cec0838-f50c-416a-3b45-628a4237c55c (id: 0)

In order to see all the processes running in the network namespace cni-912bcc63-712d-1c84-89a7-9e10510808a0 (4026532426), you could do something like:

ubuntu@worker-0:~$ sudo ls -l /proc/[1-9]*/ns/net | grep 4026532426 | cut -f3 -d"/" | xargs ps -p
  PID TTY STAT TIME COMMAND
27255 ?   Ss   0:00 /pause
27331 ?   Ss   0:00 nginx: master process nginx -g daemon off;
27355 ?   S    0:00 nginx: worker process

This indicates we are running nginx in a pod along with pause. The pause container and the rest of the containers in the pod share the net and ipc namespaces. Let's keep the pause PID 27255 handy.

Let's now see what kubectl can tell us about this pod:

$ kubectl get pods -o wide | grep nginx
nginx-65899c769f-wxdx6 1/1 Running 0 5d 10.200.0.4 worker-0

Some more details:

$ kubectl describe pods nginx-65899c769f-wxdx6
Name:          nginx-65899c769f-wxdx6
Namespace:     default
Node:          worker-0/10.240.0.20
Start Time:    Thu, 05 Jul 2018 14:20:06 -0400
Labels:        pod-template-hash=2145573259
               run=nginx
Annotations:
Status:        Running
IP:            10.200.0.4
Controlled By: ReplicaSet/nginx-65899c769f
Containers:
  nginx:
    Container ID: containerd://4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7
    Image:        nginx
...

We have the pod name nginx-65899c769f-wxdx6 and the ID of one of the containers in it (nginx), but nothing about pause yet. It's time to dig deeper on the worker node to connect all the dots.
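Before we do, a quick sanity check that ties the pause PID back to the namespace inode we saw with lsns: /proc/<pid>/ns/net is a symlink whose target encodes that inode.

ubuntu@worker-0:~$ sudo readlink /proc/27255/ns/net    # expect net:[4026532426], the same inode lsns reported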
Keep in mind Kubernetes The Hard Way doesn't use Docker, so we will use the containerd CLI ctr to explore the container details.

ubuntu@worker-0:~$ sudo ctr namespaces ls
NAME   LABELS
k8s.io

With the containerd namespace (k8s.io), we can get the container IDs for nginx:

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers ls | grep nginx
4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7 docker.io/library/nginx:latest io.containerd.runtime.v1.linux

And pause:

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers ls | grep pause
0866803b612f2f55e7b6b83836bde09bd6530246239b7bde1e49c04c7038e43a k8s.gcr.io/pause:3.1 io.containerd.runtime.v1.linux
21640aea0210b320fd637c22ff93b7e21473178de0073b05de83f3b116fc8834 k8s.gcr.io/pause:3.1 io.containerd.runtime.v1.linux
d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6 k8s.gcr.io/pause:3.1 io.containerd.runtime.v1.linux

The container ID for nginx ending in 983c7 matches what we got with kubectl. Let's see if we can find out which pause container belongs to the nginx pod.

ubuntu@worker-0:~$ sudo ctr -n k8s.io task ls
TASK                                                             PID   STATUS
...
d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6 27255 RUNNING
4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7 27331 RUNNING

Do you remember the PIDs 27331 and 27355 running in the network namespace cni-912bcc63-712d-1c84-89a7-9e10510808a0?

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers info d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6
{
  "ID": "d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6",
  "Labels": {
    "io.cri-containerd.kind": "sandbox",
    "io.kubernetes.pod.name": "nginx-65899c769f-wxdx6",
    "io.kubernetes.pod.namespace": "default",
    "io.kubernetes.pod.uid": "0b35e956-8080-11e8-8aa9-0a12b8818382",
    "pod-template-hash": "2145573259",
    "run": "nginx"
  },
  "Image": "k8s.gcr.io/pause:3.1",
...

And:

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers info 4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7
{
  "ID": "4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7",
  "Labels": {
    "io.cri-containerd.kind": "container",
    "io.kubernetes.container.name": "nginx",
    "io.kubernetes.pod.name": "nginx-65899c769f-wxdx6",
    "io.kubernetes.pod.namespace": "default",
    "io.kubernetes.pod.uid": "0b35e956-8080-11e8-8aa9-0a12b8818382"
  },
  "Image": "docker.io/library/nginx:latest",
...

We now know exactly which containers are running in this pod (nginx-65899c769f-wxdx6) and network namespace (cni-912bcc63-712d-1c84-89a7-9e10510808a0):

- nginx (ID: 4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7)
- pause (ID: d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6)
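As an aside, instead of grepping through every container you can ask containerd to filter by the CRI labels we just saw. This is only a sketch; the filter grammar may differ between containerd versions.

# All containers belonging to our pod, selected by label instead of grep
sudo ctr -n k8s.io containers ls 'labels."io.kubernetes.pod.name"==nginx-65899c769f-wxdx6'

# Only the sandbox (pause) container of that pod
sudo ctr -n k8s.io containers ls 'labels."io.kubernetes.pod.name"==nginx-65899c769f-wxdx6,labels."io.cri-containerd.kind"==sandbox'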
So, how is this pod (nginx-65899c769f-wxdx6) actually connected to the network? Let's take the pause PID 27255 we got before to run commands in its network namespace (cni-912bcc63-712d-1c84-89a7-9e10510808a0).

ubuntu@worker-0:~$ sudo ip netns identify 27255
cni-912bcc63-712d-1c84-89a7-9e10510808a0

We will use nsenter for this purpose, with the option -t to specify the target PID and -n without a file in order to enter the network namespace of the target process (27255). Let's see what ip link show,

ubuntu@worker-0:~$ sudo nsenter -t 27255 -n ip link show
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if7: mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0a:58:0a:c8:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0

and ifconfig eth0 say:

ubuntu@worker-0:~$ sudo nsenter -t 27255 -n ifconfig eth0
eth0: flags=4163 mtu 1500
    inet 10.200.0.4 netmask 255.255.255.0 broadcast 0.0.0.0
    inet6 fe80::2097:51ff:fe39:ec21 prefixlen 64 scopeid 0x20
    ether 0a:58:0a:c8:00:04 txqueuelen 0 (Ethernet)
    RX packets 540 bytes 42247 (42.2 KB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 177 bytes 16530 (16.5 KB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

We confirm the IP address we got before from kubectl get pod is configured on the pod's eth0 interface. This interface is part of a veth pair: one end is in the pod, the other in the root namespace. To find out what interface is on the other end, we use ethtool.

ubuntu@worker-0:~$ sudo ip netns exec cni-912bcc63-712d-1c84-89a7-9e10510808a0 ethtool -S eth0
NIC statistics:
    peer_ifindex: 7

This tells us the peer ifindex is 7. We can now check what that is in the root namespace. We can do this with ip link:

ubuntu@worker-0:~$ ip link | grep '^7:'
7: veth71f7d238@if3: mtu 1500 qdisc noqueue master cnio0 state UP mode DEFAULT group default

To double-check:

ubuntu@worker-0:~$ sudo cat /sys/class/net/veth71f7d238/ifindex
7

Cool, the virtual link is clear now. We can see what else is connected to our Linux bridge with brctl:

ubuntu@worker-0:~$ brctl show cnio0
bridge name  bridge id          STP enabled  interfaces
cnio0        8000.0a580ac80001  no           veth71f7d238
                                             veth73f35410
                                             vethf273b35f

So we have this: the bridge cnio0 in the root namespace, with one veth leg per pod attached to it, and the peer end of each veth showing up as eth0 inside the corresponding pod network namespace.

Validate routing

How do we actually forward traffic? Let's look at the routing table in the pod's network namespace:

ubuntu@worker-0:~$ sudo ip netns exec cni-912bcc63-712d-1c84-89a7-9e10510808a0 ip route show
default via 10.200.0.1 dev eth0
10.200.0.0/24 dev eth0 proto kernel scope link src 10.200.0.4

So, we know how to get to the root namespace at least (default via 10.200.0.1). Let's check the host's route table now:

ubuntu@worker-0:~$ ip route list
default via 10.240.0.1 dev eth0 proto dhcp src 10.240.0.20 metric 100
10.200.0.0/24 dev cnio0 proto kernel scope link src 10.200.0.1
10.240.0.0/24 dev eth0 proto kernel scope link src 10.240.0.20
10.240.0.1 dev eth0 proto dhcp scope link src 10.240.0.20 metric 100

We know how to forward packets to the VPC router (your VPC has an implicit router, which normally has the second address in the primary IP range of the subnet). Now, does the VPC router know how to reach each pod network? No, it doesn't, so you would expect the CNI plugin to install routes there, or you just do it manually (as in the guide). I haven't checked yet, but the AWS CNI plugin probably handles this for us in AWS.
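For reference, this is roughly what the guide's pod network routes lab does on GCP: one static route per worker, telling the VPC router to reach each POD_CIDR via that worker's node address. The route and network names below follow the guide's naming convention; treat it as a sketch rather than an exact copy of the lab.

# One route per worker: 10.200.${i}.0/24 is reachable via node 10.240.0.2${i}
for i in 0 1 2; do
  gcloud compute routes create kubernetes-route-10-200-${i}-0-24 \
    --network kubernetes-the-hard-way \
    --next-hop-address 10.240.0.2${i} \
    --destination-range 10.200.${i}.0/24
done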
Keep in mind there are tons of CNI plugins out there; this example represents the simplest network setup.

NAT Deep dive

Let's create two identical busybox containers with a Replication Controller, using kubectl create -f busybox.yaml. We get:

$ kubectl get pods -o wide
NAME           READY STATUS  RESTARTS AGE IP          NODE
busybox0-g6pww 1/1   Running 0        4s  10.200.1.15 worker-1
busybox0-rw89s 1/1   Running 0        4s  10.200.0.21 worker-0
...

Pings from one container to another should be successful:

$ kubectl exec -it busybox0-rw89s -- ping -c 2 10.200.1.15
PING 10.200.1.15 (10.200.1.15): 56 data bytes
64 bytes from 10.200.1.15: seq=0 ttl=62 time=0.528 ms
64 bytes from 10.200.1.15: seq=1 ttl=62 time=0.440 ms

--- 10.200.1.15 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.440/0.484/0.528 ms

To understand the traffic flow, you can either capture packets with tcpdump or use conntrack.

ubuntu@worker-0:~$ sudo conntrack -L | grep 10.200.1.15
icmp 1 29 src=10.200.0.21 dst=10.200.1.15 type=8 code=0 id=1280 src=10.200.1.15 dst=10.240.0.20 type=0 code=0 id=1280 mark=0 use=1

The pod's source IP address 10.200.0.21 is translated to the node IP 10.240.0.20.

ubuntu@worker-1:~$ sudo conntrack -L | grep 10.200.1.15
icmp 1 28 src=10.240.0.20 dst=10.200.1.15 type=8 code=0 id=1280 src=10.200.1.15 dst=10.240.0.20 type=0 code=0 id=1280 mark=0 use=1

You can see the counters increasing in iptables as follows:

ubuntu@worker-0:~$ sudo iptables -t nat -Z POSTROUTING -L -v
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
...
    5   324 CNI-be726a77f15ea47ff32947a3 all -- any any 10.200.0.0/24 anywhere /* name: "bridge" id: "631cab5de5565cc432a3beca0e2aece0cef9285482b11f3eb0b46c134e457854" */
Zeroing chain `POSTROUTING'

On the other hand, if we removed "ipMasq": true from the CNI plugin config, we would see the following (we don't recommend changing this config on a running cluster; this is only for educational purposes):

$ kubectl get pods -o wide
NAME           READY STATUS  RESTARTS AGE IP          NODE
busybox0-2btxn 1/1   Running 0        16s 10.200.0.15 worker-0
busybox0-dhpx8 1/1   Running 0        16s 10.200.1.13 worker-1
...

Ping should still work:

$ kubectl exec -it busybox0-2btxn -- ping -c 2 10.200.1.13
PING 10.200.1.13 (10.200.1.13): 56 data bytes
64 bytes from 10.200.1.13: seq=0 ttl=62 time=0.515 ms
64 bytes from 10.200.1.13: seq=1 ttl=62 time=0.427 ms

--- 10.200.1.13 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.427/0.471/0.515 ms

Without NAT in this case:

ubuntu@worker-0:~$ sudo conntrack -L | grep 10.200.1.13
icmp 1 29 src=10.200.0.15 dst=10.200.1.13 type=8 code=0 id=1792 src=10.200.1.13 dst=10.200.0.15 type=0 code=0 id=1792 mark=0 use=1

ubuntu@worker-1:~$ sudo conntrack -L | grep 10.200.1.13
icmp 1 27 src=10.200.0.15 dst=10.200.1.13 type=8 code=0 id=1792 src=10.200.1.13 dst=10.200.0.15 type=0 code=0 id=1792 mark=0 use=1

So, we just verified "all containers can communicate with all other containers without NAT".

You probably noticed in the busybox example that the IP addresses allocated to the busybox pods were different in each case. What if we wanted to make these containers reachable, so the other pods could consume them? You could take their current pod IP addresses, but these will eventually change. For this reason, you want to configure a Service resource that will proxy requests to a set of ephemeral pods.

"A Service in Kubernetes is an abstraction which defines a logical set of Pods and a policy by which to access them" [Kubernetes Services]
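For illustration, here is a minimal ClusterIP Service manifest for the nginx deployment we created earlier. The Service name is made up, the selector matches the run=nginx label we saw in kubectl describe, and the cluster IP gets allocated from the guide's service range (10.32.0.0/24). It is roughly what kubectl expose deployment nginx --port 80 would create.

apiVersion: v1
kind: Service
metadata:
  name: nginx-clusterip        # hypothetical name for this example
spec:
  type: ClusterIP              # the default Service type
  selector:
    run: nginx                 # matches the label on our nginx pods
  ports:
  - protocol: TCP
    port: 80                   # the port exposed on the cluster IP
    targetPort: 80             # the port the nginx containers listen on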
There are different ways to expose a Service; the default type is ClusterIP, which sets up an IP address out of the cluster's service CIDR (only reachable from within the cluster). One example is the DNS Cluster Add-on configured in Kubernetes The Hard Way. kubectl reveals the Service keeps track of the endpoints and will do the translation for you.

$ kubectl -n kube-system describe services
...
Selector:   k8s-app=kube-dns
Type:       ClusterIP
IP:         10.32.0.10
Port:       dns 53/UDP
TargetPort: 53/UDP
Endpoints:  10.200.0.27:53
Port:       dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints:  10.200.0.27:53
...

How exactly? iptables again. Let's go through the rules that were created for this example. You can list them all with the iptables-save command.

As packets are produced by a process (OUTPUT) or have just arrived on the network interface (PREROUTING), they are inspected by the following iptables chains:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES

The following targets match TCP packets destined to 10.32.0.10 port 53 and translate the destination address to 10.200.0.27 port 53.

-A KUBE-SERVICES -d 10.32.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-32LPCMGYG6ODGN3H
-A KUBE-SEP-32LPCMGYG6ODGN3H -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.200.0.27:53

The following targets match UDP packets destined to 10.32.0.10 port 53 and translate the destination address to 10.200.0.27 port 53.

-A KUBE-SERVICES -d 10.32.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-LRUTK6XRXU43VLIG
-A KUBE-SEP-LRUTK6XRXU43VLIG -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.200.0.27:53

There are other types of Services in Kubernetes; NodePort in particular is also covered in Kubernetes The Hard Way (see Smoke Test: Services).

kubectl expose deployment nginx --port 80 --type NodePort

NodePort exposes the Service on each node's IP at a static port (the NodePort), so you can access the Service from outside the cluster. You can check the port allocated with kubectl (31088 in this example).

$ kubectl describe services nginx
...
Type:       NodePort
IP:         10.32.0.53
Port:       80/TCP
TargetPort: 80/TCP
NodePort:   31088/TCP
Endpoints:  10.200.1.18:80
...

The pod is now reachable from the Internet at http://${EXTERNAL_IP}:31088/, where EXTERNAL_IP is the public IP address of any of your worker instances. I used worker-0's public IP address in this example.
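If you want to try this yourself, something along the lines of the guide's smoke test should work. The sketch below assumes GCP and that a firewall rule allowing the NodePort is in place (as the guide configures); it simply looks up the allocated port and a worker's public address instead of hard-coding them.

# Grab the allocated NodePort and a worker's public IP, then hit the service from outside the cluster
NODE_PORT=$(kubectl get svc nginx -o jsonpath='{.spec.ports[0].nodePort}')
EXTERNAL_IP=$(gcloud compute instances describe worker-0 \
  --format 'value(networkInterfaces[0].accessConfigs[0].natIP)')
curl -I http://${EXTERNAL_IP}:${NODE_PORT}/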
The request is received by the node with private IP 10.240.0.20 (the Cloud provider handles the public-facing NAT); however, the service is actually running on another node (worker-1, as you can tell by the endpoint's IP address 10.200.1.18).

ubuntu@worker-0:~$ sudo conntrack -L | grep 31088
tcp 6 86397 ESTABLISHED src=173.38.XXX.XXX dst=10.240.0.20 sport=30303 dport=31088 src=10.200.1.18 dst=10.240.0.20 sport=80 dport=30303 [ASSURED] mark=0 use=1

So the packet is forwarded from worker-0 to worker-1, where it reaches its destination.

ubuntu@worker-1:~$ sudo conntrack -L | grep 80
tcp 6 86392 ESTABLISHED src=10.240.0.20 dst=10.200.1.18 sport=14802 dport=80 src=10.200.1.18 dst=10.240.0.20 sport=80 dport=14802 [ASSURED] mark=0 use=1

Is it ideal? Probably not, but it works. The iptables rules programmed in this case are:

-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:" -m tcp --dport 31088 -j KUBE-SVC-4N57TFCL4MD7ZTDA
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m comment --comment "default/nginx:" -j KUBE-SEP-UGTFMET44DQG7H7H
-A KUBE-SEP-UGTFMET44DQG7H7H -p tcp -m comment --comment "default/nginx:" -m tcp -j DNAT --to-destination 10.200.1.18:80

In other words, the destination address of packets with destination port 31088 is translated to 10.200.1.18, and the port is translated from 31088 to 80.

We didn't cover the Service type LoadBalancer, which exposes the service externally using a cloud provider's load balancer, as this post is long enough already.

While this might seem like a lot, we are only scratching the surface. I'm planning to cover IPv6, IPVS, eBPF and a couple of interesting actual CNI plugins next.

I hope this has been informative. Please let me know if you think I got something wrong or spot any typos.

Further reading:

Kubernetes multi-cluster networking made simple
How to run IPv6-enabled Docker containers on AWS
