Revision history – Software Heritage archive

https://github.com/cilium/cilium

31 August 2024, 23:00:30 UTC

Revision	Author	Date	Message	Commit Date
cdb4503	Paul Chaignon	19 August 2021, 12:12:44 UTC	routing: Fix incorrect detection of Linux slave devices [ upstream commit 07a443e9d262b4ed60ce7355740903ffe634f8d3 ] Using method Slave() exposed by the netlink package doesn't always work. In particular, it doesn't work on AKS, maybe because there's no master bond interface in that case. We should instead rely on the flags passed by Linux's netlink API. Fixes: 3e245517 ("routing: Fix incorrect interface selection for pod routes") Signed-off-by: Paul Chaignon <paul@cilium.io>	19 August 2021, 14:11:08 UTC
a182cd3	Paul Chaignon	16 August 2021, 19:33:01 UTC	routing: Fix incorrect interface selection for pod routes [ upstream commit 3e245517c9112b664e01cd47c0900beacbdedf93 ] The Configure method relies on the MAC address to select the proper egress interface for new pods (EKS and AKS). Several interfaces can however have the same MAC address in the case of slave devices. In such a case, the wrong interface may be selected. To avoid this, we skip Linux slave devices during the lookup by MAC address. Thus, in case of slave devices, we will select the master device. Fixes: 26308b63 ("Implement support for cilium-health in ENI mode") Signed-off-by: Paul Chaignon <paul@cilium.io>	17 August 2021, 17:37:04 UTC
5c8f74c	Paul Chaignon	16 August 2021, 19:29:51 UTC	routing: Throw error if MAC lookup finds several devices [ upstream commit 11c0faa94730d489a1fa5dc989410d5e12009ee2 ] When setting up the Linux routes and rules for ENI and Azure, we lookup the interfaces by their MAC addresses. In that case, we want to ensure a single interface is found for the given MAC address. If several are found, we throw an error now rather than to fail in a more obscure way down the line. Signed-off-by: Paul Chaignon <paul@cilium.io>	17 August 2021, 17:37:02 UTC
5bcf83c	Joe Stringer	19 July 2021, 23:45:00 UTC	Prepare for release v1.9.9 Signed-off-by: Joe Stringer <joe@cilium.io>	19 July 2021, 23:56:42 UTC
92be5a0	Joao Victorino	01 July 2021, 10:54:40 UTC	cgroups: Fix improper error on return [ upstream commit e3f9c61a67815fcbb22bc545267496d8372bd218 ] [ Backporter's notes: handle rename pkg/cgroups/cgroups{,_linux}.go ] If cgroupv2 is not mounted, a logical mistake returns an error even after a successful mount, which is silently ignored by the caller function. This commit fixes the logical mistake and takes care of printing a warning message if cgroupv2 could not be mounted. Fixes: #15997 Signed-off-by: Joao Victorino <joao@accuknox.com> Signed-off-by: Joe Stringer <joe@cilium.io>	19 July 2021, 19:34:42 UTC
3ebf33b	Tobias Klauser	14 July 2021, 13:36:58 UTC	Update Go to 1.15.14 Signed-off-by: Tobias Klauser <tobias@cilium.io>	16 July 2021, 21:51:36 UTC
05fdc2d	Jarno Rajahalme	06 July 2021, 04:29:30 UTC	wip: Add WaitGroup for SelectorCache user notifications [ upstream commit fc6ef4d5cd0764c7e67a72ed62b105e4c1c80263 ] Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	16 July 2021, 11:58:43 UTC
55b58b2	Jarno Rajahalme	04 June 2021, 06:30:21 UTC	policy: Make selectorcache callbacks lock-free [ upstream commit 7e91f36c5c9845af8de62a652a5406c206b0bb24 ] Make IdentitySelectionUpdated() callbacks lock-free by queueing them while still holding selectorcache lock (to keep FIFO order) and calling from a goroutine not holding any locks. This prevents deadlocks caused by the implementation of IdentitySelectionUpdated() taking locks such as endpoint or selectorcache locks. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	16 July 2021, 11:58:43 UTC
2ee4c66	André Martins	03 July 2021, 23:22:39 UTC	Revert "policy: Make selectorcache callbacks lock-free" [ upstream commit a97bd0d8f99fa18be533570b9f6afa4ce4649f3a ] This reverts commit a75599da7964fb5e24c3362dfbdedf7d2f455089, as it seems to be causing a lot of FQDN flakes throughout the entire CI. Signed-off-by: André Martins <andre@cilium.io>	16 July 2021, 11:58:43 UTC
8ca9a6a	Jarno Rajahalme	08 July 2021, 18:52:42 UTC	envoy: Keep track of proxy listeners separately [ upstream commit 099c34d977b73491618454d1a9ea253623665c2d ] Since the addition of Envoy prometheus listener it has been possible to have non-proxy listeners configured with Envoy. Waiting for Envoy N/ACKs must be disabled when no proxy listeners are configured, even if a prometheus listener may still be configured. Without this fix adding endpoints may fail due to not receiving N/ACKs from Envoy after Envoy has been started due to an L7 network policy, and this policy is removed, if the Cilium option '--proxy-prometheus-port' is also configured. Fixes: #12949 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	16 July 2021, 11:58:43 UTC
b47dc3d	Aditi Ghag	07 July 2021, 22:11:20 UTC	install/kubernetes: Remove `sh` and `mount` dependency from init container [ upstream commit a76bbde4320591399cd94392ab39650737ee4e13 ] The mount-cgroup init container runs a mount command on the underlying host using `nsenter`. However, certain distros like Talos don't have `sh` or `mount` utilities available. Hence, move the logic to check and mount cgroup2 fs to a statically linked Go program binary. Fixes: fa8bea45562f ("cilium-daemonset: Fix ineffective socket-lb caused by incorrect cgroup2 fs mount") Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
4cc07fe	Aditi Ghag	07 July 2021, 00:48:58 UTC	install/kubernetes: Mount host's `/proc` in init container [ upstream commit eecca2e0d912b5c17a998c317b4c3d3e4c142e89 ] The mount-cgroup init container needs to run a mount command on the underlying host. But the current approach to mount `/proc/1/ns` fails on distros like Fedora when running on minikube - mounting "/proc/1/ns" to rootfs at "/var/lib/docker/.../merged/hostpid1ns" caused: permission denied: unknown Mount host's `/proc` instead. Fixes: fa8bea45562f ("cilium-daemonset: Fix ineffective socket-lb caused by incorrect cgroup2 fs mount") Reported-By: André Martins <andre@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
023c489	Aditi Ghag	07 July 2021, 01:37:07 UTC	install/kubernetes: Set image pull policy for init container [ upstream commit f9b79c7bb776653fdc002eb8bfbff051d36156ad ] Fixes: fa8bea45562f ("cilium-daemonset: Fix ineffective socket-lb caused by incorrect cgroup2 fs mount") Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
4e3a256	Aditi Ghag	10 June 2021, 20:04:52 UTC	docs: Add troubleshooting steps to the kube-proxy free guide [ upstream commit f263235b4e3f0de9dbddbf6353f65dbd6c0ad036 ] Document the requirement that Cilium agent needs to be able to attach BPF cgroup programs at the host cgroup root, in order for socket-based load balancing (aka host-reachable services) to be effective for other pods and host processes. More details in the PR - https://github.com/cilium/cilium/pull/16259 Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
be9e4a2	Aditi Ghag	10 June 2021, 20:02:42 UTC	docs: Document failure scenario for kind deployment [ upstream commit ca8456ca5606bc03d643bd4eaccaab751b742f06 ] Deploying a kind cluster in an environment where Cilium is already running (for example, in the Cilium development VM) can lead to Cilium pods crashing. This can also happen if there are other BPF cgroup programs attached to the parent ``cgroup`` hierarchy of the kind container nodes. Relevant Linux kernel code reference - https://elixir.bootlin.com/linux/latest/source/kernel/bpf/cgroup.c#L457. Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
69d7080	Aditi Ghag	08 June 2021, 06:13:20 UTC	Revert "cgroups: Determine cgroup v2 hierarchy root for Kind" [ upstream commit 0c166f6c6488d3ef12afb6015a86cdd222c890e9 ] This reverts commit e9ce8306400bf416087046b8d5b013b23ebdcb3e. This logic is no longer needed as we mount cgroup v2 filesystem from the underlying kubernetes node. This will enable cilium to correctly attach BPF programs at every `kind` node's cgroup root. Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
633db21	Aditi Ghag	24 June 2021, 05:19:35 UTC	cilium-daemonset: Host cgroup root mount as alternative to auto-mount [ upstream commit 826531447fa5ba18d2fe0df8d7eed9881b47235d ] Cilium agent daemonset auto mounts cgroup2 filesystem on the host by default. However, it needs to mount host's `/proc` inside an init container in order to do that. To disable this auto-mount behavior, we introduce a helm option. When auto-mount is disabled, users can specify the mount point on the underlying host where cgroup v2 fs is already mounted. We then volume mount this directory inside the cilium agent pod. The reason why we don't set the host cgroup2 mount point to a hard-coded path such as `/sys/fs/cgroup`, is because cgroup2 filesystem mount point can be platform dependent. See this note in the cgroup manpage [1] - >Note that on many modern systems, systemd(1) automatically mounts the cgroup2 filesystem at /sys/fs/cgroup/unified during the boot process. [1] https://man7.org/linux/man-pages/man7/cgroups.7.html Suggested-by: Kornilios Kourtis <kornilios@isovalent.com>. Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
00d4873	Aditi Ghag	21 May 2021, 05:10:57 UTC	cilium-daemonset: Fix ineffective socket-lb caused by incorrect cgroup2 fs mount [ upstream commit fa8bea45562f7ea3005708e968c419720a0ad190 ] If container runtimes are run with cgroup v2, Cilium agent pod would be deployed in a separate cgroup namespace. For example, Docker container runtime with cgroupv2 support switched to private cgroup namespace mode as the default [1]. Due to cgroup namespaces [2], the cgroup fs mounted by the Cilium pod points to a virtualized cgroup hierarchy instead of the host cgroup root. As a result, BPF programs are attached to the nested cgroup root, and socket-lb isn't effective for other pods. Fix: Mount cgroup2 fs from the host so that BPF programs are attached at the host cgroup root. A new init container is added to the Cilium Daemonset that mounts cgroup2 fs on the host. The `/proc/1/ns/` directory on the host is required to be mounted so that cgroup and mount namespaces are enabled as enterable namespaces while running the `nsenter` command. Additionally, cgroup2 fs can be attached to different paths so let's mount it on the host at a cilium-specific custom location. Cilium can thus have control over the location (e.g., create the directory if it doesn't exist). This also helps in effectively identifying if a cgroup2 mount already exists at the custom location. [1] https://docs.docker.com/config/containers/runmetrics/#running-docker-on-cgroup-v2 [2] https://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html Reported-By: Kornilios Kourtis <kornilios@isovalent.com> Fixes: #15137 Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
9450bfb	Aditi Ghag	14 June 2021, 22:02:28 UTC	defaults: Update default cgroup root [ upstream commit 8b9bc2ed952533d296066ed711ddfae06c2c7ed4 ] `/var/run` is a symlink to `/run` on most platforms, and may not always be present. Also, this is consistent with the `DefaultMapRootFallback` currently configured in the agent. Example - $ sudo mount -t cgroup2 none /var/run/cilium/cgroupv2 $ mount | grep cgroup none on /run/cilium/cgroupv2 type cgroup2 (rw,relatime) Signed-off-by: Aditi Ghag <aditi@cilium.io>	15 July 2021, 22:31:28 UTC
eafd12f	Jarno Rajahalme	07 July 2021, 11:19:33 UTC	iptables: Remove leading zeroes [ upstream commit d5ff6879dbc50de93cde07b4e6c87f2581106f34 ] Remove leading zeroes from marks, as 'iptables' is not formatting them. This allows proper matching of existing rules and avoids appending duplicate rules. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2021, 16:38:56 UTC
01e7687	Jarno Rajahalme	02 June 2021, 03:08:52 UTC	endpoint: Do not panic in Finalize() [ upstream commit 28e7e39047622a317670638e40b69b4aa4087811 ] Panicking in Finalize functions may leave the endpoint locked and brick the whole agent. Better to avoid it and log errors instead, and unlock the Endpoint in a defer if it still happens. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2021, 16:38:56 UTC
b14da88	Jarno Rajahalme	01 June 2021, 04:27:37 UTC	iptables: Keep old rules while adding new ones [ upstream commit 5839d2322f3b691e419fcad25a01c29373d96996 ] Keep old iptables rules by renaming Cilium chains so that new rules can be added while old are still in use. Copy old TPROXY rules from the renamed old rules. Remove the backups only after new rules have been successfully added. This change makes it possible to keep old rules in effect while adding new ones without special consideration for transient rules. On first initialization only copy over the DNS proxy TPROXY rules, as other proxies can't reuse old proxy ports across restarts. Pick the last applicable proxy port from iptables, if multiple are present. Remove stale TPROXY rules once the current port is known. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2021, 16:38:56 UTC
e3fd9a2	Jarno Rajahalme	18 June 2021, 19:07:09 UTC	iptables: Add rudimentary unit testing [ upstream commit 537715af01ae560e950563ab866751098d433e59 ] Wrap "iptables" and "ip6tables" programs with iptablesInterface so that unit testing can mock up the executables. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2021, 16:38:56 UTC
49cac4c	Gilberto Bertin	13 May 2021, 08:28:48 UTC	test: re-enable K8sDatapathConfig Host firewall test [ upstream commit 2c10568cada51702d2d2e97ad4ed49d1f8f587a0 ] This commit re-enables the "K8sDatapathConfig Host firewall tests With native routing" test to run with kube-proxy Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
1973b1d	Gilberto Bertin	18 June 2021, 10:22:59 UTC	bpf: fix iptables masquerading for node -> remote pod traffic [ upstream commit 31927a2e8db5c7ced889cc3618d2372ba4e999c9 ] When Cilium runs with KPR, host-firewall or bandwidth manager, it will try to auto-derive one or more devices to which the bpf_host program is attached. This program will, among other things, redirect ingress traffic destined to a pod to the pod's lxc device using `bpf_redirect()`. This causes the traffic to bypass the nf_conntrack table, leading to a situation where traffic leaving the pod after the connection's been established will be (incorrectly) masqueraded in case Iptables masquerading is enabled, since the connection is not tracked by netfilter. This commit fixes this by skipping `bpf_redirect()` when we detect this case (i.e. traffic is flowing through bpf_host attached to a physical device and Cilium has installed Iptables rules which require conntrack). Fixes: #14859 Suggested-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
c0a6ffd	Daniel Borkmann	01 March 2021, 13:04:18 UTC	bpf: enable bpf host routing for tunnels [ upstream commit ffd02dd37aebbea366df9cadc752fe95fb2ba137 ] Lift this constraint now that it is working for tunnels, too. We also transparently get the local Pod->Pod optimization through this. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
f4159af	Daniel Borkmann	03 March 2021, 10:40:29 UTC	bpf: generally return after endpoint lookup when !from_host [ upstream commit fcf61a7c7154587e36402d10c77992e76551ffe4 ] After the endpoint lookup, we should generally punt up to the stack when traffic arrives on phy dev from external (!from_host). The remainder of the handle_ipv{4,6}() code really only deals with the case when traffic was egressing from cilium_host device. Note that the tunnel encap handling for the nodeport case is done elsewhere in tail_nodeport_nat_ipv{4,6}(). Suggested-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
4a6016e	Daniel Borkmann	02 March 2021, 12:23:51 UTC	bpf: do not blindly push to stack for bpf host routing on encap [ upstream commit 8b0a9a82a77e433b6ec8200df655cf627a3fb317 ] Also in case of vxlan/geneve, let the bpf_host perform local delivery into Pods, for example, for the case of K8s services where traffic arrives on the phys dev and does not go via vxlan/geneve dev. For this scenario, the same optimizations can be performed as with the direct routing case. Hence lift the skip_redirect constraints for encaps given the return path in bpf_lxc will also support this. Typical case for this is cloud LB pushing inbound traffic to a node's NodePort service as one example where this will improve performance. For the bpf_host prog attached to the phy dev, this means that we perform the ipv4_local_delivery() into a local Pod backend for a service more efficiently compared to before where it gets pushed up the stack, then routed into cilium_host and pushed from there. Note that in tunnel mode the Pod's host-facing lxc devices do not have a policy tc egress program attached, so the tail call into the v{4,6} policy prog of the bpf_lxc is now done at an earlier point which also becomes visible in the RR numbers. Only if there's no local endpoint for the target address, we push up the stack via CTX_ACT_OK as before. Before: root@apoc:~# netperf -H 192.168.180.28 -t TCP_RR -l20 -- -P 13000,12866 MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 13000 AF_INET to 192.168.180.28 () port 12866 AF_INET : demo : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 131072 1 1 20.00 8709.14 16384 131072 After: root@apoc:~# netperf -H 192.168.180.28 -t TCP_RR -l20 -- -P 13000,12866 MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 13000 AF_INET to 192.168.180.28 () port 12866 AF_INET : demo : first burst 0 Local /Remote Socket Size Request Resp. Elapsed Trans. 
Send Recv Size Size Time Rate bytes Bytes bytes bytes secs. per sec 16384 131072 1 1 20.00 21983.21 16384 131072 If Pod <-> Pod traffic needs to go over vxlan/geneve, the gains will be smaller since bpf_host needs to push to upper stack for triggering bpf_overlay. We still do the redirect_peer() from the overlay, just that the gain might be less visible in the big picture since the path with vxlan/geneve needs to traverse upper layers like routing/netfilter. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
58a0a04	Daniel Borkmann	01 March 2021, 14:11:00 UTC	bpf: fix up pkt for bpf host routing in tunneling mode [ upstream commit dd7805a2a14ef6080867c1aaf653f630a970eefb ] When switching netns when coming from overlay the packet type is not set to HOST, so we need to do it here in order to avoid being dropped in IP layer. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
a51cd00	Daniel Borkmann	03 March 2021, 07:50:13 UTC	bpf: disable bpf host routing for flannel chaining [ upstream commit 67eb9de049825fbc1dd655c81f8d25f41deca6fa ] When Cilium's datapath is chained in any way, all bets are off. Let's not bother with such a niche case for bpf host routing. Based on recent issues (#15095, #15170) it seems like users might still run with flannel. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
ee3b8a8	Gilberto Bertin	02 June 2021, 06:30:34 UTC	endpoint: trigger k8s sync controller on identity update [ upstream commit 9e086277de3e23450953c2afd24a68fd727d3066 ] When an endpoint's identity is updated, Cilium does not immediately sync the new state with k8s, but rather waits up to 10 seconds for the sync-to-k8s-ciliumendpoint controller to run, meaning that the new identity can remain unannounced for up to 10 seconds. This commit fixes this by explicitly triggering the k8s sync controller whenever an endpoint's identity is updated. Fixes: #15097 Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
95bfbe5	Gilberto Bertin	01 June 2021, 11:48:24 UTC	controller: allow to manually trigger it [ upstream commit c61d02fc4233fe925e4d0ca87fa768723190b195 ] Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>	15 July 2021, 12:04:18 UTC
3368b06	dependabot[bot]	12 July 2021, 14:07:10 UTC	build(deps): bump docker/setup-buildx-action from 1.5.0 to 1.5.1 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.5.0 to 1.5.1. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/e673438944759779e411a0f7ceef3ba437dccfa0...abe5d8f79a1606a2d3e218847032f3f2b1726ab0) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	12 July 2021, 15:04:14 UTC
3e9887c	Chris Tarazi	30 June 2021, 06:26:15 UTC	daemon: Add Azure IPAM mode for setting the native routing CIDR [ upstream commit dc7df4d85 ] [ Backporter's notes: Removed reference to AlibabaCloud IPAM ] This will allow the router IP restoration logic to pick up the correct pod CIDR to validate the router IP. This also fixes the issue where upon Cilium restart, additional IPs were erroneously assigned to `cilium_host`. Signed-off-by: Chris Tarazi <chris@isovalent.com>	12 July 2021, 01:42:16 UTC
d812037	Chris Tarazi	30 June 2021, 06:23:25 UTC	azure, ipam, k8s: Derive primary / VPC CIDR of Azure interface [ upstream commit 8d8a7f88c ] [ Backporter's notes: Resolved conflicts: * CRD schema version * CRD Azure API fields slightly differ * Removed reference to AlibabaCloud IPAM ] To align with other CRD-backed IPAM modes such as ENI and Alibaba, derive the VPC CIDR from the Azure API and set it as the native routing CIDR. This enables the subsequent commit to use the CIDR to validate the router IPs upon restoration. Signed-off-by: Chris Tarazi <chris@isovalent.com>	12 July 2021, 01:42:16 UTC
586ea07	Chris Tarazi	30 June 2021, 23:10:21 UTC	ipam: Fix return inside deriveVpcCIDR() [ upstream commit fc06cbc22 ] [ Backporter's notes: Removed reference to AlibabaCloud IPAM as it doesn't exist in the v1.9 tree. ] The `return` statement wasn't placed in the correct place, as the code should return as soon as a valid result is found. Signed-off-by: Chris Tarazi <chris@isovalent.com>	12 July 2021, 01:42:16 UTC
6cba74c	Chris Tarazi	28 June 2021, 17:07:45 UTC	daemon, node: Fix faulty router IP restoration logic [ upstream commit ff63b0775c0d7603822d79c36c32d274e1ea6a53 ] [ Backporter's notes: Removed AlibabaCloud IPAM as it's not available in the v1.9 tree. ] When running in ENI or Alibaba IPAM mode, or any CRD-backed IPAM mode ("crd") and upon Cilium restart, it was very likely that `cilium_host` was assigned an additional IP. Below is a case where Cilium was restarted 3 times, hence getting 3 additional router IPs: ``` 4: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default qlen 1000 link/ether 66:03:3c:07:8c:47 brd ff:ff:ff:ff:ff:ff inet 192.168.35.9/32 scope link cilium_host valid_lft forever preferred_lft forever inet 192.168.34.37/32 scope link cilium_host valid_lft forever preferred_lft forever inet 192.168.57.107/32 scope link cilium_host valid_lft forever preferred_lft forever inet6 fe80::6403:3cff:fe07:8c47/64 scope link valid_lft forever preferred_lft forever ``` This was because in CRD-backed IPAM modes, we wait until we fully sync with K8s in order to derive the VPC CIDR, which becomes the pod CIDR on the node. Since the router IP restoration logic was using a different pod CIDR during the router IP validation check, it was erroneously discarding it. This was observed with: ``` 2021-06-25T13:59:47.816069937Z level=info msg="The router IP (192.168.135.3) considered for restoration does not belong in the Pod CIDR of the node. Discarding old router IP." cidr=10.8.0.0/16 subsys=node ``` This is problematic because the extraneous router IPs could be also assigned to pods, which would break pod connectivity. The fix is to break up the router IP restoration process into 2 parts. The first is to attempt a restoration of the IP from the filesystem (`node_config.h`). We also fetch the router IPs from Kubernetes resources since they were already retrieved prior inside k8s.WaitForNodeInformation(). 
Then after the CRD-backed IPAM is initialized and started (*Daemon).startIPAM() is called, we attempt the second part. This includes evaluating which IPs (either from filesystem or from K8s) should be set as the router IPs. The IPs from the filesystem take precedence. In case the node was rebooted, the filesystem will be wiped so then we'd rely on the IPs from the K8s resources. At this point in the daemon initialization, we have the correct CIDR range as the pod CIDR range to validate the chosen IP. Fixes: beb8bdea3 ("k8s, node: Restore router IPs (`cilium_host`) from K8s resource") Signed-off-by: Chris Tarazi <chris@isovalent.com>	12 July 2021, 01:42:16 UTC
fb819aa	Chris Tarazi	25 May 2021, 19:38:10 UTC	k8s, node: Restore router IPs (`cilium_host`) from K8s resource [ upstream commit beb8bdea384fdc4ccb10769142c8981bb10334d5 ] [ Backporter's notes: * Resolved simple conflict with RegisterCRDs() inside pkg/k8s/init.go. Resolution was to keep both the newly added function and RegisterCRDs(). * Due to https://github.com/cilium/cilium/pull/14800 not being backported to the v1.9 tree, we don't have the ability to set the router IPs via args, hence I needed to modify the warning msg in the case of mismatched router IPs between the filesystem and k8s resources. The user will not be advised to set the router IPs as a workaround, like they would in the v1.10 version of this. ] Previously, after a node reboot, Cilium would allocate a new router IP and append it to the slice of node IPs. Since the node IPs have already been synced to the K8s resource, meaning there are already IPs present (from the previous Cilium instance), the router IP is appended to the slice. In other parts of Cilium, it is assumed that the router IP is the first node IP (first element of the slice). Since the new router IP has been appended to the end, it is no longer where it is expected, aka no longer the first element. This causes a mismatch of which router IP is to be used. There should only ever be one router IP (one IPv4 or one IPv6). In case of a node reboot, the router IPs cannot be restored because they are wiped away due to the Cilium state dir being mounted as a tmpfs [1]. This commit fixes this to restore the router IPs from the K8s resource (Node or CiliumNode) if they are present in the annotations. This prevents the possibility of having more than one router IP, as described above. Note that router IPs from the K8s resource are only restored if no router IP was found on the filesystem, which is considered the source of truth. In other words, the filesystem takes precedence over the K8s resource. 
The user is warned in cases of a mismatch between the two different sources. We also check that the IP to be restored is within the pod / node CIDR range, otherwise we ignore it from restoration. [1]: Linux distributions mount /run as tmpfs and Cilium's default state directory is created under /run. (It's worth mentioning that it's also common for /var/run to be symlinked to /run.) Fixes: https://github.com/cilium/cilium/issues/16279 Signed-off-by: Chris Tarazi <chris@isovalent.com>	12 July 2021, 01:42:16 UTC
11bba4f	Chris Tarazi	01 June 2021, 20:08:37 UTC	node: Clear router IPs on Uninitialize() [ upstream commit d620a92632610e293d03e248ac802c0a1177dfa7 ] The subsequent commit will add unit tests that make use of ipv{4,6}RouterAddress and state will need to be cleared during testing. Signed-off-by: Chris Tarazi <chris@isovalent.com>	12 July 2021, 01:42:16 UTC
5d934d7	Chris Tarazi	01 June 2021, 19:33:37 UTC	node: Modify SetIPv6NodeRange() to accept cidr.CIDR [ upstream commit 0db244468ef973a19be507725f35efe2c6d164d5 ] This conforms SetIPv6NodeRange() to have the same prototype as SetIPv4AllocRange(). There was no benefit for them to be different. It will ease the subsequent commits. Signed-off-by: Chris Tarazi <chris@isovalent.com>	12 July 2021, 01:42:16 UTC
6ad543c	Chris Tarazi	21 June 2021, 19:20:41 UTC	k8s: Update libraries to 1.19.12 Also update the k8s tests versions to 1.18.20 and 1.19.12. Signed-off-by: Chris Tarazi <chris@isovalent.com>	08 July 2021, 19:51:03 UTC
bb58c2b	Aditi Ghag	29 June 2021, 23:22:50 UTC	bugtool: Collect BPF cgroup programs related information [ upstream commit 607ca9386269ea0aa240e8273d24508f67a00838 ] `bpftool cgroup tree [CGROUP_ROOT]` [1] provides information about BPF cgroup programs attached at the specified cgroup root. This is particularly useful in checking if the programs are attached at the right cgroup hierarchy. [1] https://manpages.ubuntu.com/manpages/focal/man8/bpftool-cgroup.8.html Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
6e96370	Bruno Miguel Custódio	02 July 2021, 11:39:27 UTC	Revert "docs: add 'endpointRoutes.enabled=true' to aws-cni" [ upstream commit 8c94f11e481107b9c7ae9f257d2919c640d13b75 ] This reverts commit 437e2bbd745a074b6dd140e4bd17208e3ba499f0. The original issue has been fixed, and hence this can be removed (c.f. https://github.com/cilium/cilium/pull/16227). Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
1ec5f51	André Martins	02 July 2021, 01:04:48 UTC	contrib/docs: rename 'cilium-actions.yml' with 'maintainers-little-helper.yaml' [ upstream commit d936ebf18cc329628529d7881cf5c86082de3fec ] Commit a93c0ed53691 renamed the MLH configuration file. Unfortunately in a lot of places this filename was set and this commit renames those locations with this new filename. Fixes: a93c0ed53691 (".github: Rename maintainer's little helper's config file") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
c0ffbc4	Derek Gaffney	29 June 2021, 22:21:10 UTC	Fix maglev.hashSeed byte length references in docs Signed-off-by: Derek Gaffney <derekmgaffney@gmail.com> [ upstream commit d97240695287087418f4447e6ce1abd5850a3cc1 ] Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
21b974c	André Martins	01 July 2021, 04:15:16 UTC	test/helpers: retrieve kube-apiserver logs [ upstream commit 445af9a1b4e32038ffda698f3f7583d30741149c ] To help debug certain flakes, we need kube-apiserver logs available in the test sysdump. This commit adds the ability to retrieve such logs. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
e1b7cbd	André Martins	01 July 2021, 04:13:18 UTC	test/k8sT: set imagePullPolicy for cilium/log-gatherer stable tag [ upstream commit f470a071bd5df373578f720f852a0bd1c53731d8 ] cilium/log-gatherer:v1.1 is not mutable, thus we don't need to always perform a pull of that docker image from docker hub. Fixes: a9285f49ca65 ("[CI] Move vagrant start script to separate file") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
09020ef	André Martins	01 July 2021, 04:10:47 UTC	test: fix gathering of kubelet logs [ upstream commit da0fbad0b3be3ccfad8d73f599140b36470e484f ] When using journalctl to read the logs of another system, one needs to explicitly pass -D and the directory containing the logs to successfully read the log messages. Fixes: a9285f49ca65 ("[CI] Move vagrant start script to separate file") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
f188ac7	Paul Chaignon	16 June 2021, 22:13:33 UTC	contrib: Identify upstream commits by author and date [ upstream commit 4ddb158e2189fd4298d8adb92200b6937122cb5f ] When listing the commits of pull requests to backport, GitHub doesn't offer a way to find the corresponding commits merged in master. We therefore have to do it manually. To that end, we first retrieve a candidate commit by matching on the exact commit title. Several commits can have the same title however, so we need another check to confirm the candidate commit is the same commit as the pull request's. We currently use 'git patch-id' for the second check. That command computes a unique ID for a patch. It can however have false negatives. For example, 9515d1e ("docs: add a reference of helm values") and de62fa3 ("docs: add a reference of helm values") refer to the same patch, the first being from the pull request and the second from master (i.e., once merged). Nevertheless, when we run 'git patch-id', we get two different IDs: $ git show 9515d1e \| git patch-id 5d928411d72fcdb5c9c24ab2138896e6709e578c 9515d1ea37f1d1122ece73cf061cf47590e90f9e $ git show de62fa3 \| git patch-id de14f63774d0f56ecc1e22db615987bedffe1e4b de62fa37c9ac679fd45bb617e8759dd7a4918ccb Comparing the two commits shows that the difference is actually due to changes not introduced by this commit: $ diff <(git show 9515d1e) <(git show de62fa3) [...] 1997,1998c1997,1998 < @@ -118,7 +118,7 @@ contributors across the globe, there is almost always someone available to help. < \| debug.enabled \| bool \| `false` \| Enable debug logging \| --- > @@ -119,7 +119,7 @@ contributors across the globe, there is almost always someone available to help. > \| disableEndpointCRD \| string \| `"false"` \| Disable the usage of CiliumEndpoint CRD \| [...] We however don't need to use 'git patch-id'. Using the author's email address and date (+ commit title) is usually enough to uniquely identify commits on master. 
If someone sends two commits with the same title and author date (to the second), then they are definitely trying to game the system. In that unlikely event, we have two rounds of reviews (original pull request and backport pull request) to catch it. This commit implements that change. "%ae%at" (author email followed by author date without spaces) is used as the commit ID instead of the ID generated by git patch-id. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
6771b45	Sebastian Wicki	16 June 2021, 10:08:41 UTC	ci: Disable NFS locking [ upstream commit 1dd477dd4198b5bf5e20d8d6b3d4a55d46bc8e89 ] This is an attempt to fix the recent issues with NFS locking in CI, e.g. issue #16551 From the nfs(5) manpage: > When using the nolock option, applications can lock files, but such > locks provide exclusion only against other applications running on > the same client. Remote applications are not affected by these locks. Since in CI, we do not have any remote applications accessing the shared folder, only using local locks should be safe and more robust than using distributed locking. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
7f04d53	André Martins	17 May 2021, 22:08:56 UTC	pkg/k8s: add pod IP event change [ upstream commit e92dc6ac6b766e793091410d0cf58c61b01d424d ] This is a follow up of 6bd98ad7e443 ("handle IP addresses modification in running nodes and CEPs") for more information read the commit description of that commit. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com>	08 July 2021, 19:47:17 UTC
88dc081	dependabot[bot]	05 July 2021, 14:05:44 UTC	build(deps): bump docker/build-push-action from 2.5.0 to 2.6.1 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.5.0 to 2.6.1. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/ad44023a93711e3deb337508980b4b5e9bcdc5dc...1bc1040caef9e604eb543693ba89b5bf4fc80935) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	05 July 2021, 20:47:55 UTC
9ea19bd	dependabot[bot]	02 July 2021, 14:06:29 UTC	build(deps): bump docker/setup-buildx-action from 1.4.1 to 1.5.0 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.4.1 to 1.5.0. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/a1c666d855a037f439ebb7bf701ee144fcadd307...e673438944759779e411a0f7ceef3ba437dccfa0) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	02 July 2021, 16:16:54 UTC
0b8b2e4	Bruno Miguel Custódio	30 June 2021, 14:56:55 UTC	docs: update the version specific notes table Updates the table in the "Version Specific Notes" subsection of the "Upgrade" page in order to be explicit about the supported upgrade paths. [ upstream commit eb9a5c4 ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	02 July 2021, 12:46:08 UTC
22f9e39	Nicolas Busseneau	30 June 2021, 16:51:58 UTC	workflows: update Kind version to 0.11.1 This is necessary to work around a probable GH infrastructure issue where 0.9.0 suddenly started not to work in GH Actions: https://github.com/helm/kind-action/issues/42 Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>	30 June 2021, 18:05:17 UTC
f26e76b	dependabot[bot]	30 June 2021, 14:06:27 UTC	build(deps): bump helm/kind-action from 1.1.0 to 1.2.0 Bumps [helm/kind-action](https://github.com/helm/kind-action) from 1.1.0 to 1.2.0. - [Release notes](https://github.com/helm/kind-action/releases) - [Commits](https://github.com/helm/kind-action/compare/v1.1.0...v1.2.0) --- updated-dependencies: - dependency-name: helm/kind-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	30 June 2021, 18:05:17 UTC
e03b6d2	Tobias Klauser	23 June 2021, 15:09:55 UTC	Update Go to 1.15.13 Signed-off-by: Tobias Klauser <tobias@cilium.io>	30 June 2021, 14:22:01 UTC
152ec04	dependabot[bot]	29 June 2021, 14:08:12 UTC	build(deps): bump docker/setup-buildx-action from 1.3.0 to 1.4.1 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.3.0 to 1.4.1. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/v1.3.0...a1c666d855a037f439ebb7bf701ee144fcadd307) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	29 June 2021, 21:57:30 UTC
1355540	Jarno Rajahalme	04 June 2021, 06:30:21 UTC	policy: Make selectorcache callbacks lock-free [ upstream commit a75599da7964fb5e24c3362dfbdedf7d2f455089 ] Make IdentitySelectionUpdated() callbacks lock-free by queueing them while still holding selectorcache lock (to keep FIFO order) and calling from a goroutine not holding any locks. This prevents deadlocks caused by the implementation of IdentitySelectionUpdated() taking locks such as endpoint or selectorcache locks. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
0754dd4	Aditi Ghag	16 June 2021, 04:45:27 UTC	lrp: Refactor logic executed on policy delete [ upstream commit 92d851dbf3eee0f51cb3944f8b2745044dde5dbd ] The `deletePolicyService` function was previously common to both delete policy and delete service callbacks. Refactor the logic to pass the policy config directly, thereby skip config look up. Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
a2c8160	Aditi Ghag	16 June 2021, 04:40:21 UTC	lrp: Skip restoring service on delete operation [ upstream commit a7d73e4c8063457b4223285dcb4ba232930bbc3b ] Previously, we were restoring the original clusterIP service even when the service was deleted. Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
6996c64	Paul Chaignon	16 June 2021, 12:16:30 UTC	ipsec: Fix logging of SPI after key rotations [ upstream commit d42614e0a053fb37dd16130776616a3b88431224 ] Five minutes after IPsec key rotations, we cleanup the old IPsec state and print the following message: level=info msg="New encryption keys reclaiming SPI" spi=0 subsys=ipsec Unfortunately, due to a bug the SPI was always 0 in that log message. This commit changes it and also logs the old SPI value if we have it: level=info msg="New encryption keys reclaiming SPI" SPI=7 oldSPI=0 subsys=ipsec Fixes: 3f12fb6 ("cilium: ipsec, add cleanup xfrm routine") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
23ba8d3	Martynas Pumputis	17 June 2021, 09:54:19 UTC	node-neigh: Use arping ts in last ping hashmap [ upstream commit 4c4a5dc5d5aa80a26de8ea589ac51014f7057480 ] The change is probably a no-op, but it should improve the last ping timestamp precision. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
ff715c1	Martynas Pumputis	17 June 2021, 09:34:24 UTC	node-neigh: Add retry for concurrent arping test case [ upstream commit 8260f9dd72bee0a62708128d71194e9d4eb6887b ] The test became notoriously flaky. It seems that some goroutines were lagging behind with the updates and they were overwriting the new MAC addr entry with the obsolete one. To fix this, retry multiple times until the correct entry is found. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
20e7166	Martynas Pumputis	17 June 2021, 09:33:33 UTC	testutils: Add WaitUntilWithSleep [ upstream commit 128f0f8db3c2bb53f041c02c3ca8f866a8b2dc55 ] As for some cases WaitUntil() is a DoS tool. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
3516e3c	Daniel Borkmann	21 June 2021, 11:54:59 UTC	bpf: fix hw_csum issue for icmp probe packets [ upstream commit 27122d4d666be42b564a06200c32647ca3c73405 ] Example trace seen in dmesg: [...] [ 7710.165608] enp10s0f0np0: hw csum failure [ 7710.165621] skb len=84 headroom=78 headlen=84 tailroom=30 mac=(64,14) net=(78,20) trans=98 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0) hash(0x14006e3a sw=0 l4=0) proto=0x0800 pkttype=0 iif=4 [ 7710.165631] dev name=enp10s0f0np0 feat=0x0x0032b18217514ba9 [ 7710.165635] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7710.165638] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7710.165641] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7710.165644] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7710.165646] skb headroom: 00000040: b8 ce f6 05 e7 62 b8 ce f6 05 e7 76 08 00 [ 7710.165649] skb linear: 00000000: 45 00 00 54 8a 07 00 00 40 01 84 e8 c0 a8 a0 04 [ 7710.165652] skb linear: 00000010: 0a 9a 00 73 00 00 23 57 00 f8 15 db cd 74 d0 60 [ 7710.165654] skb linear: 00000020: 00 00 00 00 5c 2d 0d 00 00 00 00 00 10 11 12 13 [ 7710.165657] skb linear: 00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 [ 7710.165660] skb linear: 00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 [ 7710.165663] skb linear: 00000050: 34 35 36 37 [ 7710.165665] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7710.165668] skb tailroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 7710.165672] CPU: 26 PID: 0 Comm: swapper/26 Not tainted 5.13.0-rc3+ #174 [ 7710.165674] Hardware name: Gigabyte Technology Co., Ltd. 
X570 AORUS MASTER/X570 AORUS MASTER, BIOS F22 08/20/2020 [ 7710.165676] Call Trace: [ 7710.165677] <IRQ> [ 7710.165680] dump_stack+0x7d/0x9c [ 7710.165683] netdev_rx_csum_fault.part.0+0x41/0x45 [ 7710.165686] netdev_rx_csum_fault.cold+0xb/0x10 [ 7710.165687] __skb_checksum_complete+0xdd/0xf0 [ 7710.165690] ? skb_send_sock_locked+0x20/0x20 [ 7710.165692] ? reqsk_fastopen_remove+0x190/0x190 [ 7710.165693] nf_ip_checksum+0x5b/0x120 [ 7710.165697] nf_conntrack_icmpv4_error+0x112/0x160 [nf_conntrack] [ 7710.165706] nf_conntrack_in.cold+0x1d/0x74 [nf_conntrack] [ 7710.165714] ? nft_do_chain_inet_ingress+0x280/0x2e0 [nf_tables] [ 7710.165722] ipv4_conntrack_in+0x14/0x20 [nf_conntrack] [ 7710.165731] nf_hook_slow+0x44/0xb0 [ 7710.165733] nf_hook_slow_list+0x71/0xf0 [ 7710.165735] ip_sublist_rcv+0x1d1/0x1f0 [ 7710.165737] ? ip_sublist_rcv+0x1f0/0x1f0 [ 7710.165739] ip_list_rcv+0xf5/0x120 [ 7710.165741] __netif_receive_skb_list_core+0x228/0x250 [ 7710.165745] netif_receive_skb_list_internal+0x1a1/0x2b0 [ 7710.165747] napi_complete_done+0x7a/0x1b0 [ 7710.165749] mlx5e_napi_poll+0x16e/0x730 [mlx5_core] [ 7710.165795] __napi_poll+0x31/0x170 [ 7710.165796] net_rx_action+0x22f/0x280 [ 7710.165798] __do_softirq+0xce/0x281 [ 7710.165800] irq_exit_rcu+0xa2/0xd0 [ 7710.165803] common_interrupt+0x8d/0xa0 [ 7710.165805] </IRQ> [ 7710.165806] asm_common_interrupt+0x1e/0x40 [ 7710.165808] RIP: 0010:cpuidle_enter_state+0xcc/0x360 [...] The trace was only reproducible with NICs using CHECKSUM_COMPLETE as csum type for inbound packets. It has been observed with mlx5, for example. The hw csum failure was only reproducible under the following conditions: - Protocol is ICMP, e.g. 
triggered by Cilium health probe packets - Pod from one node was pinging a remote node address - BPF based masquerading was used to SNAT Pod IP to node IP - BPF NAT engine found a collision in the NAT table such that it was forced to select a different ICMP id, and hence caused L4 rewrites In the case of ICMPv4 the bug was that BPF_F_PSEUDO_HDR was used for updating the L4 checksum. However, ICMPv4 does not have a pseudo header; only ICMPv6 does. The packet based csum was okay either way, but the flag resulted in a buggy skb->csum. Setting the flag to 0 for ICMPv4 stopped the hw csum traces. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
45ee508	Jarno Rajahalme	15 June 2021, 00:32:22 UTC	k8s: Fix logging [ upstream commit db06a64c3e0ecb87a5d7ba23dd33c09628f78456 ] Log the correct field for HostIP. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
17c6ab1	Mauricio Vásquez	04 June 2021, 15:36:36 UTC	pkg/option: Fix default assignment of EnableWellKnownIdentities [ upstream commit 67b946de0539edea49e7fd1079c5b83681a30f74 ] Fixes: 09d9e1e0e2d9 ("policy: Disable well-known identities for non-managed etcd") Signed-off-by: Mauricio Vásquez <mauricio@accuknox.com> Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Aditi Ghag <aditi@cilium.io>	28 June 2021, 12:42:47 UTC
864d451	Bruno Miguel Custódio	25 June 2021, 09:53:00 UTC	Fix typo. [ upstream commit 776af64 ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	25 June 2021, 10:05:16 UTC
ac17c06	Bruno Miguel Custódio	08 June 2021, 11:23:37 UTC	Add missing descriptions to 'Helm Reference'. Some fields appear in the 'Helm Reference' page without an associated description. This commit aims at fixing that. [ upstream commit 2b77f96 ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	25 June 2021, 10:05:16 UTC
271ab51	Bruno Miguel Custódio	25 June 2021, 09:47:00 UTC	Include codegen warning in 'helm-values.rst'. Include a comment in 'helm-values.rst' indicating that the file is generated automatically. This will hopefully limit the risk of having contributors opening PRs to edit it directly. [ upstream commit ea0820e ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	25 June 2021, 10:05:16 UTC
7833969	Bruno Miguel Custódio	25 June 2021, 09:46:38 UTC	Add 'helm-values.rst' to '.gitattributes'. Adding so that GitHub automatically folds its diff by default for reviews given that it is a generated file. [ upstream commit 93392dd ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	25 June 2021, 10:05:16 UTC
adb46b0	Bruno Miguel Custódio	08 June 2021, 11:13:20 UTC	Use a fork of 'helm-docs'. 'helm-docs' has a bug which causes it to include comments belonging to previously-appearing but commented-out fields. A fix has been proposed in https://github.com/norwoodj/helm-docs/pull/99, but hasn't been reviewed yet. Until said PR is merged, it's preferable to switch to a fork containing the fix so we can have a proper description for our Helm chart fields. [ upstream commit 8d4f1ea ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	25 June 2021, 10:05:16 UTC
8c3c43c	Bruno Miguel Custódio	24 June 2021, 08:37:08 UTC	docs: add a reference of helm values Manual backport of de62fa3 to the v1.9 branch. [ upstream commit de62fa3 ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com>	25 June 2021, 10:05:16 UTC
d799edf	dependabot[bot]	23 June 2021, 14:07:19 UTC	build(deps): bump docker/login-action from 1.9.0 to 1.10.0 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.9.0 to 1.10.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/28218f9b04b4f3f62068d7b6ce6ca5b26e35336c...f054a8b539a109f9f41c372932f1ae047eff08c9) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	23 June 2021, 17:01:58 UTC
33bdcf2	Sebastian Wicki	15 June 2021, 09:09:18 UTC	docs: Hubble UI does not show L7 endpoints anymore [ upstream commit fea1b27541c2dbe83fa5bdfc2808737f94ef7802 ] This removes a note in the Hubble getting started guide that recommends turning on L7 visibility to see what L7 endpoints are being used by flows. This feature has been removed in Hubble UI 0.7 (Cilium 1.9), making the note obsolete. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
7d2969f	Maciej Kwiek	11 June 2021, 10:25:18 UTC	ci: restart portmap service on ci nodes [ upstream commit ad65c7939cb75e362aa24012b4a99f1db3e2a3a3 ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
c893e19	Sebastian Wicki	17 May 2021, 17:02:08 UTC	docs: ENIs should not be unmanaged by the OS [ upstream commit b15cee151fc70274125bfbc122fb1c7c60e0671b ] When ENIs are managed by services such as NetworkManager or systemd-networkd, it can happen that they interfere with Cilium's configuration. For example, systemd-networkd can remove the ENI IP assigned by Cilium if the carrier is temporarily down, thus breaking SNAT. We previously had a similar section regarding NetworkManager and DHCP in the EKS installation guide, but the EKS guide has since been replaced by the Cilium CLI installation guide. This section here therefore acts as a replacement and states that the devices need to be unmanaged (e.g. disabling DHCP is not enough for systemd-networkd). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
1d8e9d4	Quentin Monnet	07 May 2021, 10:41:24 UTC	docs: add a "Copy Commands" button for shell-session snippets [ upstream commit 869e678b1ae3461b169259155e3bb52b6b4fa072 ] Add a "Copy Commands" to some code blocks. This new button attempts to copy only commands (and not their output) to the clipboard. The distinction between commands and output relies on the presence of a prompt symbol, either "$" or "#", at the beginning of the commands. If a command ends with a trailing backslash, copy the next line as well. For example, the following snippet: .. code-block:: shell-session $ ls -l foo cat $ echo 1 \ 2 \ 3\ 4 $nospace # exit should place the following text into the clipboard: ls -l echo 1 2 3 4 exit The button is added for the following blocks, when they contain several lines and at least one command is found in the block: - "code-block", but with language "shell-session" only, - Literal blocks ("::"), - Parsed literals. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
6ce4d15	Gilberto Bertin	04 June 2021, 15:12:45 UTC	node: fix arpping test [ upstream commit 5a418a372f38004dae12275a5a3c0df6338cbd16 ] In TestArpPingHandling, wait for all goroutines that are inserting the new neighbors to finish before deleting the node. Fixes: #16221 Suggested-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
e7a03c0	Chris Tarazi	04 May 2021, 23:48:40 UTC	docs: Recommend use of dev VM for backporting [ upstream commit 7a4184f1195c0dd81a84cd3b265de19fb0f0fbb8 ] This will reduce chances of users using their own vagrant VMs which may come with libraries that are incompatible with our dependencies. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
27ff8dc	Chris Tarazi	04 May 2021, 23:41:02 UTC	docs: Update requirements for backporting [ upstream commit 6032268f7d815f858c7135cb61e8bd8afae39b95 ] Since we want to move forward with using the GitHub CLI for creating backports, the previously listed optional items are actually required. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
82fb9e5	Martynas Pumputis	12 May 2021, 13:51:58 UTC	daemon: Improve log msg of device auto-detection [ upstream commit 117be40f577d71ac542fccfb595d3cc97ebbdae5 ] Previously, the msg was misleading by stating that devices were being derived for the NodePort BPF. It's no longer the case, as the same devices are used by host-fw and bwm. Reported-by: Gilberto Bertin <gilberto@isovalent.com> Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
cf8659a	Martynas Pumputis	12 May 2021, 13:50:20 UTC	daemon: Remove redundant device derivation for host-fw [ upstream commit b0e2881d6a2614cc6ba387e384a3dda39a0d7ee5 ] The devices are being derived by handleNativeDevices() invoked above. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
e6426a6	Paul Chaignon	19 May 2021, 21:33:58 UTC	test: Use new test-verifier image in K8sVerifier [ upstream commit 4b3ec5760061e24dce7c749b624e1b5bb5f64c4c ] Until now, K8sVerifier was using the cilium-builder image to build the datapath and run verifier-test.sh. Having a K8sVerifier-specific image also allows us to include a patch for the tc binary, to increase the maximum size of the verifier log buffer. In the K8sVerifier test, we load all BPF programs in verbose mode, so the log buffer is always needed (vs. only in case of retry following a load error usually). A small log buffer can lead to a load failure that is actually a false positive (it's just the log buffer being too small and not an actual issue with the BPF program). Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
bfce530	John Watson	26 May 2021, 23:21:06 UTC	install: Allow setting enable-health-check-nodeport to 'false' [ upstream commit b69258b55db65cd50ab21eb5891f107c82131c8a ] Signed-off-by: John Watson <johnw@planetscale.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
115b3d1	Aditi Ghag	27 May 2021, 22:44:22 UTC	docs: Clarify LRP loop related note [ upstream commit 27838336eb8de7213cc66ffcff686d1d9f6c0001 ] The previous document note can lead to confusion for readers, whereby it's misinterpreted as node-local translation being skipped, but regular translation can happen. Clarify how we avoid forming a loop by rewording the note. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
4ddc3f0	Paul Chaignon	11 May 2021, 14:06:38 UTC	bpf/Makefile: Remove workaround for complexity issue [ upstream commit 05512b2851fce6eab19722bd7d284d88068795e5 ] On master and with kernels 5.10+, we have a complexity issue when ENABLE_HOST_SERVICES_FULL is undefined (i.e., socket-level load balancing is disabled and additional code compiled in bpf_lxc as a replacement). Our verifier test included a workaround for that issue, by always defining ENABLE_HOST_SERVICES_FULL on newer kernels. This commit removes that workaround since the previous commit fixed the complexity issue. Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
68a2a05	Paul Chaignon	11 May 2021, 13:55:25 UTC	loader, bpf: Use mcpu=v3 on kernels 5.10+ [ upstream commit 631f3510efe3260b097bae465a8e82cdc13db08b ] Set mcpu=v3 in the compiler on kernels 5.10+ to use all available eBPF instructions and 32-bit registers. This change fixes the complexity issue we're hitting on v5.10+ when socket-level load balancing is disabled (via enable-host-services=false or kube-proxy-replacement=disabled). Using the third eBPF instruction set doesn't reduce complexity for all BPF programs but it leads to more standard numbers, with less variations in complexities. A big part of this improvement is due to the implicit use of mattr=+alu32 to enable 32-bit eBPF registers. In addition to the end-to-end test on bpf-next, this change was tested on kernels 5.10 and 5.11 with the existing verifier-test.sh, compiling the datapath with both KERNEL=netnext and KERNEL=419. Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
4d36cac	Paul Chaignon	21 April 2021, 15:15:14 UTC	bpf: Fixes to support mattr=+alu32 [ upstream commit ad0d3cf341402feaf34cb612c0f57732b0f5d5b4 ] mattr=+alu32, supported since LLVM 7.0 and implied by mcpu=v3, enables the use of 32-bit registers in BPF bytecode. Enabling this compiler option can however result in loading issues as illustrated below. 12: (61) r1 = *(u32 *)(r0 +80) // ctx->data_end 13: (61) r6 = *(u32 *)(r0 +76) // ctx->data 14: (bc) w7 = w6 // <- verifier loses track of inferred pkt type here. [...] 38: (71) r1 = *(u8 *)(r7 +20) R7 invalid mem access 'inv' These errors typically happen because the data and data_end pointers are actually 32-bit registers. Depending on how these pointers are used, LLVM sometimes makes use of that assumption (e.g., 32-bit assignment on instruction 14 above). The verifier is however not able to follow and rejects such programs. We can usually work around those by ensuring these pointers are only used via 64-bit types. This commit implements this wherever needed to pass the verifier. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
e665022	Jarno Rajahalme	12 January 2021, 19:46:47 UTC	datapath: Do not use proxy original source address with tunneling [ upstream commit 4b769cb53a15a7dcc30b8d2eac36094cb4cc071e ] Tunnel headers carry the source security ID so the use of original source address on Envoy upstream connections is not needed when tunneling. This commit disables the use of original source address when tunneling is used, which allows Envoy redirection to work also when using Kind to simulate multiple nodes in a single docker host. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Paul Chaignon <paul@cilium.io>	23 June 2021, 10:15:52 UTC
9f13155	dependabot[bot]	18 June 2021, 14:06:09 UTC	build(deps): bump actions/upload-artifact from 2.2.3 to 2.2.4 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2.2.3 to 2.2.4. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/ee69f02b3dfdecd58bb31b4d133da38ba6fe3700...27121b0bdffd731efa15d66772be8dc71245d074) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	21 June 2021, 12:09:35 UTC
45df313	dependabot[bot]	17 June 2021, 14:06:36 UTC	build(deps): bump actions/download-artifact from 2.0.9 to 2.0.10 Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2.0.9 to 2.0.10. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/158ca71f7c614ae705e79f25522ef4658df18253...3be87be14a055c47b01d3bd88f8fe02320a9bb60) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	17 June 2021, 18:47:33 UTC
527494c	Paul Chaignon	07 June 2021, 14:24:18 UTC	.github: Rename maintainer's little helper's config file This commit renames the config. file to better clarify its purpose. Signed-off-by: Paul Chaignon <paul@cilium.io>	17 June 2021, 12:11:55 UTC
1fd0920	Alexandre Perrin	07 June 2021, 15:52:24 UTC	examples: add an example of a hubble-cli Deployment In order to debug Relay to Hubble connectivity issues, it is sometimes useful to have a Pod running with the Hubble CLI. Because the Relay image is based on a scratch image, kubectl exec'ing into it is not possible. While the Hubble CLI can be found in the Cilium Pods, the Relay certificate needed to establish the mTLS handshake to the Hubble server is not mounted into the Cilium Pods. This commit introduces a new hubble-cli Deployment example. When debugging Relay mTLS issues, it can be used to quickly run a hubble-cli Pod: kubectl apply -n kube-system -f path/to/hubble-cli.yaml Since the Relay mTLS certificates are mounted into the hubble-cli Pods, one can connect to a Hubble server given its IP address and ServerName: kubectl exec -it -n kube-system deployment/hubble-cli -- \ hubble observe --server tls://${IP?}:4244 \ --tls-server-name ${SERVERNAME?} \ --tls-ca-cert-files /var/lib/hubble-relay/tls/hubble-server-ca.crt \ --tls-client-cert-file /var/lib/hubble-relay/tls/client.crt \ --tls-client-key-file /var/lib/hubble-relay/tls/client.key Both ${IP} and ${SERVERNAME} can be obtained by either looking at the Hubble Relay Pod logs or alternatively by running: kubectl exec -it -n kube-system deployment/hubble-cli -- \ hubble watch peers --server unix:///var/run/cilium/hubble.sock Signed-off-by: Alexandre Perrin <alex@kaworu.ch>	17 June 2021, 12:10:15 UTC
9167551	Joe Stringer	21 April 2021, 19:34:59 UTC	docs: Update release process against template [ upstream commit a14bf9e213bb8fbaaa3b7b27dc178790c3a8ff33 ] Some recent template changes have not yet been propagated into the docs, update the docs with the latest steps. Signed-off-by: Joe Stringer <joe@cilium.io>	11 June 2021, 21:17:30 UTC
df35463	Joe Stringer	20 April 2021, 23:48:18 UTC	contrib: Automate digest PR creation [ upstream commit 893d0e7ec5766c03da2f0e7b8c548f7c4d89fcd7 ] [ Backporter's notes: Dropped conflicts in .github/ issue template ] There's still some interactive bits here just for safety, but one less step in the template. Signed-off-by: Joe Stringer <joe@cilium.io>	11 June 2021, 21:17:30 UTC
0a0b0b4	Joe Stringer	20 April 2021, 23:44:36 UTC	contrib: Make docker digest pull more idempotent [ upstream commit ef199d851e2077b8568df9fc79463ea7daaff9db ] Check the args properly so we don't require a version to be specified, and recreate the digest file every time the script is run. Signed-off-by: Joe Stringer <joe@cilium.io>	11 June 2021, 21:17:30 UTC
42155b7	dependabot[bot]	08 June 2021, 06:16:14 UTC	build(deps): bump KyleMayes/install-llvm-action from 1.3.0 to 1.4.0 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.3.0 to 1.4.0. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/v1.3.0...v1.4.0) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	08 June 2021, 22:06:43 UTC
82b5af8	Gaurav Genani	07 May 2021, 20:16:18 UTC	bugtool: add missing bpftool map dumps [ upstream commit c573ff85c02a3a404bfd6873baf65b5ea408cdf0 ] Fixes:#16008 Signed-off-by: Gaurav Genani <h3llix.pvt@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	03 June 2021, 14:45:48 UTC
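
Revision f188ac7 in the table above replaces 'git patch-id' with a commit ID built from "%ae%at" (author email followed by author date, without spaces), checked together with the commit title. The following is a minimal Python sketch of that matching rule, not the actual contrib script: the `Commit` type, the `find_upstream` helper, the email addresses, and the timestamps are all hypothetical, and only the short hashes and the title are taken from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Commit:
    sha: str
    title: str          # what git prints for %s
    author_email: str   # what git prints for %ae
    author_time: int    # what git prints for %at (Unix timestamp, seconds)


def commit_id(commit: Commit) -> str:
    """Mimic "%ae%at": author email followed by author date, no separator."""
    return f"{commit.author_email}{commit.author_time}"


def find_upstream(pr_commit: Commit, master: Iterable[Commit]) -> Optional[Commit]:
    """Return the first master commit with the same title and the same ID."""
    wanted = commit_id(pr_commit)
    for candidate in master:
        if candidate.title == pr_commit.title and commit_id(candidate) == wanted:
            return candidate
    return None


# Hypothetical data: emails and timestamps are made up for illustration.
pr = Commit("9515d1e", "docs: add a reference of helm values",
            "dev@example.com", 1623850000)
master = [
    # Same title, different author: must not match.
    Commit("aaaaaaa", "docs: add a reference of helm values",
           "other@example.com", 1623850000),
    # Same title, same author email and author date: the merged commit.
    Commit("de62fa3", "docs: add a reference of helm values",
           "dev@example.com", 1623850000),
]

match = find_upstream(pr, master)
print(match.sha if match else "no match")
```

Unlike a patch ID, this key does not depend on the diff context, so a rebase that shifts hunk offsets (as in the 9515d1e / de62fa3 example) leaves it unchanged.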

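
Revision 3516e3c in the table above hinges on the fact that the ICMPv4 checksum covers only the ICMP message, with no pseudo header, whereas ICMPv6 includes one. The sketch below illustrates that point in plain Python rather than BPF C; it is not the Cilium datapath code, and the helper names (`internet_checksum`, `icmpv4_message`, `rewrite_ident`) are hypothetical. It builds an echo request, rewrites the identifier as a NAT engine might on a collision, and recomputes the checksum over the ICMP bytes alone.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmpv4_message(msg_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMPv4 echo-style message.

    The checksum is computed over the ICMP header and payload only;
    no IP addresses (pseudo header) are involved for ICMPv4.
    """
    header = struct.pack("!BBHHH", msg_type, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", msg_type, 0, csum, ident, seq) + payload


def rewrite_ident(message: bytes, new_ident: int) -> bytes:
    """Replace the echo identifier and recompute the checksum from scratch."""
    msg_type, _code, _csum, _ident, seq = struct.unpack("!BBHHH", message[:8])
    return icmpv4_message(msg_type, new_ident, seq, message[8:])


request = icmpv4_message(8, 0x1234, 1, b"ping")   # type 8 = echo request
translated = rewrite_ident(request, 0x4321)

# A message with a correct checksum sums to zero when re-checked.
print(internet_checksum(request) == 0, internet_checksum(translated) == 0)
```

A real datapath would update the checksum incrementally instead of recomputing it, but the set of bytes covered is the same.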