https://github.com/cilium/cilium

06915ce Prepare for release v1.11.19 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> 26 July 2023, 18:28:54 UTC
fefa16b endpoint: Do not override deny entries with proxy redirects [ upstream commit 8aa89ef7088108fe7c5dfdb482ee57fb4ee02d25 ] Use DenyPreferredInsert instead of directly manipulating policy map state to make sure deny entries are not overridden by new proxy redirect entries. Prior to this fix it was possible for a proxy redirect to be pushed onto the policy map when it should have been overridden by a deny, at least in these cases: - L3-only deny with L3/L4 redirect: No redirect should be added as the L3 is denied - L3-only deny with L4-only redirect: L4-only redirect should be added and an L3/L4 deny should also be added, but the L3/L4 deny is only added by deny preferred insert, and is missed when the map is manipulated directly. A new test case verifies this. It is clear that in the latter case the addition of the redirect cannot be completely blocked, so we can't fix this by making AllowsL4 more restrictive. But also in the former case it is possible that the deny rule only covers a subset of security identities, while the redirect rule covers some of the same security identities, but also some more that should not be blocked. Hence the correct fix here is to leave AllowsL4 to be L3-independent, and cover these cases with deny preferred insert instead of adding redirect entries to the map directly. This commit also contains a related change that allows a redirect entry to be updated, maybe with a changed proxy port. I've not seen evidence that this is currently fixing a bug, but it feels like a real possibility. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 July 2023, 13:02:50 UTC
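To make the deny-precedence rule above concrete, here is a minimal Go sketch of a deny-preferred insert. The types (Key, Entry, MapState) and the method are simplified, hypothetical stand-ins; Cilium's real policy MapState API is considerably richer:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for Cilium's policy map state types.
type Key struct {
	Identity uint32 // 0 means wildcard identity (an L4-only entry)
	DestPort uint16 // 0 means wildcard port (an L3-only entry)
}

type Entry struct {
	IsDeny    bool
	ProxyPort uint16 // non-zero means proxy redirect
}

type MapState map[Key]Entry

// denyPreferredInsert inserts an entry without ever overriding an
// existing deny with a redirect, and derives the L3/L4 deny entries
// implied by broader L3-only denies.
func (ms MapState) denyPreferredInsert(k Key, e Entry) {
	if old, ok := ms[k]; ok && old.IsDeny && !e.IsDeny {
		return // never override a deny with a redirect/allow
	}
	if !e.IsDeny && k.Identity == 0 {
		// L4-only redirect: for every L3-only deny, add a more
		// specific L3/L4 deny so denied identities stay denied.
		for dk, de := range ms {
			if de.IsDeny && dk.DestPort == 0 {
				ms[Key{Identity: dk.Identity, DestPort: k.DestPort}] = Entry{IsDeny: true}
			}
		}
	}
	ms[k] = e
}

func main() {
	ms := MapState{}
	ms.denyPreferredInsert(Key{Identity: 42}, Entry{IsDeny: true})     // L3-only deny
	ms.denyPreferredInsert(Key{DestPort: 80}, Entry{ProxyPort: 15001}) // L4-only redirect
	fmt.Println(ms[Key{Identity: 42, DestPort: 80}].IsDeny)            // true: implied L3/L4 deny was added
}
```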
6dc3fb3 policy: Export DenyPreferredInsertWithChanges, make revertible [ upstream commit 9f52abbfdb6d5570b91fe4c1809e4ac02bc7cc0f ] Export DenyPreferredInsertWithChanges and make it revertible by taking a map of old values as a new optional argument. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 July 2023, 13:02:50 UTC
6193eb6 envoy: Bump envoy version to v1.24.10 This is for the CVEs below from upstream. CVEs: https://github.com/envoyproxy/envoy/security/advisories/GHSA-pvgm-7jpg-pw5g https://github.com/envoyproxy/envoy/security/advisories/GHSA-69vr-g55c-v2v4 https://github.com/envoyproxy/envoy/security/advisories/GHSA-mc6h-6j9x-v3gq https://github.com/envoyproxy/envoy/security/advisories/GHSA-7mhv-gr67-hq55 Build: The build is coming from https://github.com/cilium/proxy/actions/runs/5661705068/job/15340176601 Release: https://github.com/envoyproxy/envoy/releases/tag/v1.24.10 Signed-off-by: Tam Mach <tam.mach@cilium.io> 26 July 2023, 03:42:44 UTC
af3facf docs/ipsec: Document RSS limitation [ upstream commit c9983ef8c5c03eac868aa9fe48ce2d9771074255 ] [ Backporter's notes: the changes had to be manually backported to the appropriate files for v1.11, as the docs were restructured in fbc53d084ce34159a3fde3b19e26fc2fbbef9e52 and 69d07f79cb17dd0a543043152a32604bb4226ee3 since then. ] All IPsec traffic between two nodes is always sent on a single IPsec flow (defined by outer source and destination IP addresses). As a consequence, RSS on such traffic is ineffective and throughput will be limited to the decryption performance of a single core. Reported-by: Ryan Drew <ryan.drew@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 26 July 2023, 01:02:34 UTC
d559a71 docs: Specify Helm chart version in "cilium install" commands [ upstream commit 12fc68a11f5773d3292d543266e0f16bd0119a0f ] [ Backporter's notes: the changes had to be manually backported to the appropriate files for v1.11, as the docs were restructured in fbc53d084ce34159a3fde3b19e26fc2fbbef9e52 and 69d07f79cb17dd0a543043152a32604bb4226ee3 since then. ] - For the main branch latest docs, clone the Cilium GitHub repo and use "--chart-directory ./install/kubernetes/cilium" flag. - For stable branches, set "--version" flag to the version in the top-level VERSION file. Fixes: #26931 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 26 July 2023, 01:02:34 UTC
2da39b2 docs/ipsec: Extend troubleshooting section [ upstream commit 3ba76e5781050f3a3f2402f54fb4a6ad34944eb1 ] [ Backporter's notes: the changes had to be manually backported to the appropriate files for v1.11, as the docs were restructured in fbc53d084ce34159a3fde3b19e26fc2fbbef9e52 and 69d07f79cb17dd0a543043152a32604bb4226ee3 since then. ] Recent bugs with IPsec have highlighted a need to document several caveats of IPsec operations. This commit documents those caveats as well as common XFRM errors. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 26 July 2023, 01:02:34 UTC
3446ec2 test: Print error messages in case of failure [ upstream commit 67a3ab3533a7f77aa4241c0da6b04f5b31da9af9 ] [ Backporter's notes: the changes had to be manually backported to the appropriate files for v1.11, as they were renamed in ffd7e57b377f982fb57cf574564b7f1debef74a4 since then. (main > v1.11) test/k8s/datapath_configuration.go > test/k8sT/DatapathConfiguration.go ] If we check res.WasSuccessful() instead of res, then ginkgo won't print the error message in case the command wasn't successful. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 26 July 2023, 01:02:34 UTC
0470ee5 test: Avoid downloading conntrack package at runtime [ upstream commit a58cb6a25753a5ddb38b46709bc120c51cd0e56b ] [ Backporter's notes: the changes had to be manually backported to the appropriate files for v1.11, as they were renamed in ffd7e57b377f982fb57cf574564b7f1debef74a4 since then. (main > v1.11) test/k8s/datapath_configuration.go > test/k8sT/DatapathConfiguration.go test/k8s/manifests/log-gatherer.yaml > test/k8sT/manifests/log-gatherer.yaml ] The 'Skip conntrack for pod traffic' test currently downloads the conntrack package at runtime to be able to flush and list Linux's conntrack entries. This sometimes fails because of connectivity issues to the package repositories. Instead, we've now included the conntrack package in the log-gatherer image. We can use those pods to run conntrack commands instead of using the Cilium agent pods. Fixes: 496ce420958 ("iptables: add support for NOTRACK rules for pod-to-pod traffic") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 26 July 2023, 01:02:34 UTC
459f545 docs/ipsec: Clarify limitation on number of nodes [ upstream commit 39a9def6c24ff08fc2e7d66d6284586051a30146 ] The limitation on the number of nodes in the cluster when using IPsec applies to clustermeshes as well and is the total number of nodes. This limitation arises from the use of the node IDs, which are encoded on 16-bits. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 21 July 2023, 22:13:45 UTC
50a26e4 bpf, daemon: Have bpf_host support both values for skb->cb[4] [ upstream commit 420d7faea339d8d93852da305a818cdebd41730e ] Commits 4c7cce1bf0 ("bpf: Remove IP_POOLS IPsec code") and 19a62da4e ("bpf: Lookup tunnel endpoint for IPsec rewrite") changed the way we pass the tunnel endpoint. We used to pass it via skb->cb[4] and read it in bpf_host to build the encapsulation. We changed that in the above commits to pass the identity via skb->cb[4] instead. Therefore, on upgrades, for a short while, we may end up with bpf_lxc writing the identity into skb->cb[4] (new datapath version) and bpf_host interpreting it as the tunnel endpoint (old version). Reloading bpf_host before bpf_lxc is not enough to fix it because then, for a short while, bpf_lxc would write the tunnel endpoint in skb->cb[4] (old version) and bpf_host would interpret it as the security identity (new version). In addition to reloading bpf_host first, we also need to make sure that it can handle both cases (skb->cb[4] has tunnel endpoint or identity). To distinguish between those two cases and interpret skb->cb[4] correctly, bpf_host will rely on the first byte: in the case of the tunnel endpoint it can't be zero because that would mean the IP address is within the special purpose range 0.0.0.0/8; in the case of the identity, it has to be zero because identities are on 24-bits. This commit implements those changes. Commit ca9c056deb ("daemon: Reload bpf_host first in case of IPsec upgrade") had already made the agent reload bpf_host first for ENI and Azure IPAM modes, so we just need to extend it to all IPAM modes. Note that the above bug on upgrades doesn't cause an immediate packet drop at the sender. Instead, it seems the packet is encrypted twice. The (unverified) assumption here is that the encapsulation is skipped because the tunnel endpoint IP address is invalid (being a security identity on 24-bits). The encrypted packet is then sent again to cilium_host where the encryption bit is reapplied (given the destination IP address is a CiliumInternalIP). And it goes through the XFRM encryption again. On the receiver's side, the packet is decrypted once as expected. It is then recirculated to bpf_overlay which removes the packet mark. Given it is still an ESP (encrypted) packet, it goes back through the XFRM decryption layer. But since the packet mark is now zero, it fails to match any XFRM IN state. The packet is dropped with XfrmInNoStates. This can be seen in the following trace: <- overlay encrypted flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> stack encrypted flow 0x6fc46fc4 , identity 134400->44 state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 <- overlay encrypted flow 0x6fc46fc4 , identity unknown->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> host from flow 0x6fc46fc4 , identity unknown->43 state unknown ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 -> stack flow 0x6fc46fc4 , identity unknown->unknown state unknown ifindex cilium_host orig-ip 0.0.0.0: 10.0.9.91 -> 10.0.8.32 The packet comes from the overlay encrypted, is sent to the stack to be decrypted, and comes back still encrypted. Fixes: 4c7cce1bf0 ("bpf: Remove IP_POOLS IPsec code") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 21 July 2023, 22:13:45 UTC
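A sketch of that disambiguation rule, transposed to Go for illustration (the real check lives in the C datapath; this assumes the address's first octet sits in the top byte of the cb[4] value):

```go
package main

import (
	"fmt"
	"net"
)

// classifyCB4 applies the rule from the commit above: identities fit
// in 24 bits, so their top byte is zero, while a valid tunnel-endpoint
// IPv4 address can never start with a zero octet (0.0.0.0/8 is a
// special-purpose range).
func classifyCB4(v uint32) string {
	if v>>24 != 0 {
		ip := net.IPv4(byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
		return "tunnel endpoint " + ip.String()
	}
	return fmt.Sprintf("security identity %d", v)
}

func main() {
	fmt.Println(classifyCB4(0x0a000920)) // first octet 10 -> tunnel endpoint 10.0.9.32
	fmt.Println(classifyCB4(134400))     // fits in 24 bits -> security identity
}
```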
fba7b12 ipsec: Remove workarounds for path asymmetry that was removed [ upstream commit 0a8f2c4ee43e55d8f82e67bba2af186435ff4a29 ] Commits 3b3e8d0b1 ("node: Don't encrypt traffic to CiliumInternalIP") and 5fe2b2d6d ("bpf: Don't encrypt on path hostns -> remote pod") removed a path asymmetry on the paths hostns <> remote pod. They however failed to remove workarounds that we have for this path asymmetry. In particular, we would encrypt packets on the path pod -> remote node (set SPI in the node manager) to try and avoid the path asymmetry, by also encrypting the replies. And, as a result of this first change, we would also need to handle a corner case in the datapath to apply the correct reverse DNAT for reply traffic. All of that is unnecessary now that we fixed the path asymmetry. Fixes: 3b3e8d0b1 ("node: Don't encrypt traffic to CiliumInternalIP") Fixes: 5fe2b2d6d ("bpf: Don't encrypt on path hostns -> remote pod") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 21 July 2023, 22:13:45 UTC
cf2563d ipsec: Don't match on source IP for XFRM OUT policies [ upstream commit ebd02f1f62c945dc1aabf9694cdfcc51d91df86b ] On IPAM modes with one pod CIDR per node, the XFRM OUT policies look like below: src 10.0.1.0/24 dst 10.0.0.0/24 dir out priority 0 ptype main mark 0x66d11e00/0xffffff00 tmpl src 10.0.1.13 dst 10.0.0.214 proto esp spi 0x00000001 reqid 1 mode tunnel When sending traffic from the hostns, however, it may not match the source CIDR above. Traffic from the hostns may indeed leave the node with the NodeInternalIP as the source IP (vs. CiliumInternalIP which would match). In such cases, we don't match the XFRM OUT policy and fall back to the catch-all default-drop rule, ending up with XfrmOutPolBlock packet drops. Why wasn't this an issue before? It was. Traffic would simply go in plain-text (which is okay given we never intended to encrypt hostns traffic in the first place). What changes is that we now have a catch-all default-drop XFRM OUT policy to avoid leaking plain-text traffic. So it now results in XfrmOutPolBlock errors. In commit 5fe2b2d6da ("bpf: Don't encrypt on path hostns -> remote pod") we removed encryption for the path hostns -> remote pod. Unfortunately, that doesn't mean the issue is completely gone. On a new Cilium install, we won't see this issue of XfrmOutPolBlock drops for hostns traffic anymore. But on existing clusters, we will still see those drops during the upgrade, after the default-drop rule is installed but before hostns traffic encryption is removed. None of this is an issue on AKS and ENI IPAM modes because there, the XFRM OUT policies look like: src 0.0.0.0/0 dst 10.0.0.0/16 dir out priority 0 ptype main mark 0x66d11e00/0xffffff00 tmpl src 10.0.1.13 dst 10.0.0.214 proto esp spi 0x00000001 reqid 1 mode tunnel Thus, hostns -> remote pod traffic is matched regardless of the source IP being selected and packets are not dropped by the default-drop rule. We can therefore avoid the upgrade drops by changing the XFRM OUT policies to never match on the source IPs, as on AKS and ENI IPAM modes. Fixes: 7d44f3750 ("ipsec: Catch-default default drop policy for encryption") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 21 July 2023, 22:13:45 UTC
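For illustration, a sketch of such a source-wildcarded XFRM OUT policy using the github.com/vishvananda/netlink package; the addresses, SPI, and mark are the example values from the message above, not Cilium's exact code. Linux-only, and requires CAP_NET_ADMIN:

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// Wildcard source so hostns traffic (NodeInternalIP source) still
	// matches; destination stays the remote pod CIDR.
	_, wildcard, _ := net.ParseCIDR("0.0.0.0/0")
	_, dstCIDR, _ := net.ParseCIDR("10.0.0.0/24")

	pol := &netlink.XfrmPolicy{
		Src:  wildcard,
		Dst:  dstCIDR,
		Dir:  netlink.XFRM_DIR_OUT,
		Mark: &netlink.XfrmMark{Value: 0x66d11e00, Mask: 0xffffff00},
		Tmpls: []netlink.XfrmPolicyTmpl{{
			Src:   net.ParseIP("10.0.1.13"),
			Dst:   net.ParseIP("10.0.0.214"),
			Proto: netlink.XFRM_PROTO_ESP,
			Mode:  netlink.XFRM_MODE_TUNNEL,
			Spi:   1,
			Reqid: 1,
		}},
	}
	if err := netlink.XfrmPolicyAdd(pol); err != nil {
		log.Fatal(err)
	}
}
```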
748644d node: Don't encrypt traffic to CiliumInternalIP [ upstream commit 3b3e8d0b1194ebc1d8c8f0f1525045b521e1c7f9 ] For similar reasons as in the previous commit, we don't want to encrypt traffic going from a pod to the CiliumInternalIP. This is currently the only node IP address type that is associated with an encryption key. Since we don't encrypt traffic from the hostns to remote pods anymore (see previous commit), encrypting traffic going to a CiliumInternalIP (remote node) would result in a path asymmetry: traffic going to the CiliumInternalIP would be encrypted, whereas reply traffic coming from the CiliumInternalIP wouldn't. This commit removes that case and therefore ensures we never encrypt traffic going to a node IP address. Reported-by: Gray Lian <gray.liang@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 21 July 2023, 22:13:45 UTC
0551b5d bpf: Don't encrypt on path hostns -> remote pod [ upstream commit 5fe2b2d6da76d6b1d334f3ce9c0c59371b239892 ] In pod-to-pod encryption with IPsec and tunneling, Cilium currently encrypts traffic on the path hostns -> remote pod even though traffic is in plain-text on the path remote pod -> hostns. When using native routing, neither of those paths is encrypted because traffic from the hostns doesn't go through the bpf_host BPF program. Cilium's Transparent Encryption with IPsec aims at encrypting pod-to-pod traffic. It is therefore unclear why we are encrypting traffic from the hostns. The simple fact that only one direction of the connection is encrypted begs the question of its usefulness. It's possible that this traffic was encrypted by mistake: some of this logic is necessary for node-to-node encryption with IPsec (not supported anymore) and pod-to-pod encryption may have been somewhat simplified to encrypt *-to-pod traffic. Encrypting traffic from the hostns nevertheless creates several issues. First, this situation creates a path asymmetry between the forward and reply paths of hostns<>remote pod connections. Path asymmetry issues are well known to be a source of bugs, from '--ctstate INVALID -j DROP' iptables rules to NAT issues. Second, Gray recently uncovered a separate bug which, when combined with this encryption from hostns, can prevent Cilium from starting. That separate bug is still being investigated but it seems to cause the reload of bpf_host to depend on Cilium connecting to etcd in a clustermesh context. If this etcd is a remote pod, Cilium connects to it on the hostns -> remote pod path. The bpf_host program being unloaded[1], it fails. We end up in a cyclic dependency: bpf_host requires connectivity to etcd, connectivity to etcd requires bpf_host. This commit therefore removes encryption with IPsec for the path hostns -> remote pod when using tunneling (already unencrypted when using native routing). 1 - More specifically, in Gray's case, the bpf_host program is already loaded, but it needs to be reloaded because the IPsec XFRM config changed. Without this reload, encryption fails. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 21 July 2023, 22:13:45 UTC
6892a24 bpf: Remove IPsec dead code in bpf_host [ upstream commit fd6fa25103d5170f294edd283393fe222c5fef8b ] TL;DR: this commit removes a bit of dead code that seems to have been intended for IPsec in native routing mode but is never actually executed. These code paths are only executed if going through cilium_host and coming from the host (see !from_host check above). For remote destinations, we only go through cilium_host if the destination is part of a remote pod CIDR and we are running in tunneling mode. In native routing mode, we go straight to the native device. Example routing table for tunneling (10.0.0.0/24 is the remote pod CIDR): 10.0.0.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61 mtu 1373 <- we follow this 10.0.1.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61 10.0.1.61 dev cilium_host scope link 192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.11 Example routing table for native routing: 10.0.0.0/24 via 192.168.56.12 dev enp0s8 <- we follow this 10.0.1.0/24 via 10.0.1.61 dev cilium_host src 10.0.1.61 10.0.1.61 dev cilium_host scope link 192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.11 Thus, this code path is only used for tunneling with IPsec. However, IPsec in tunneling mode should already be handled by the encap_and_redirect_with_nodeid call above in the same functions (see info->key argument). So why was this added? It was added in commit b76e6eb59 ("cilium: ipsec, support direct routing modes") to support "direct routing modes". I found that very suspicious because, per the above, in native routing mode, traffic from the hostns shouldn't even go through cilium_host. I thus tested it out. I've checked IPsec with native routing mode, with and without endpoint routes. I can confirm that, in all those cases, traffic from the hostns is not encrypted when going to a remote pod. Therefore, this code is dead. I'm unsure when it died. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 21 July 2023, 22:13:45 UTC
303134b ci: increase ginkgo timeout Increase ginkgo kernel test timeout from 170m to 200m to avoid unnecessary timeouts during test execution. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 20 July 2023, 17:58:56 UTC
d4a6ded docs: Pick up PyYAML 6.0.1 [ upstream commit e06e70e26fdde5205429b71fdc5263b0d8905adc ] Revert commit 04d48fe3, and pick up PyYAML 6.0.1. Fixes: #26873 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 19 July 2023, 14:13:57 UTC
180e564 Fix "make -C Documentation builder-image" [ upstream commit 04d48fe3706a83d5612da1195fac78dc69c1a7b4 ] Use this workaround until the issue gets fixed: https://github.com/yaml/pyyaml/issues/601#issuecomment-1638509577 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 19 July 2023, 14:13:57 UTC
a3d58a4 client, health/client: set dummy host header on unix:// local communication [ upstream commit b9ec2aaece578278733e473a72bb5594f621d495 ] Go 1.20.6 added a security fix [1] which leads to stricter sanitization of the HTTP host header in the net/http client. Cilium's pkg/client currently sets the Host header to the UDS path (e.g. /var/run/cilium/cilium.sock), however the slashes in that Host header now lead net/http to reject it. RFC 7230, Section 5.4 states [2]: > If the authority component is missing or undefined for the target URI, > then a client MUST send a Host header field with an empty field-value. The authority component is undefined for the unix:// scheme. Thus, the correct value to use would be the empty string. However, this does not work due to OpenAPI runtime using the same value for the URL's host and the http client's host header. Thus, use a dummy value "localhost". [1] https://go.dev/issue/60374 [2] https://datatracker.ietf.org/doc/html/rfc7230#section-5.4 Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 19 July 2023, 14:13:57 UTC
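A self-contained sketch of the resulting pattern: dial the Unix socket directly and let the URL carry the dummy "localhost" host. The socket path matches the commit message; the /v1/healthz endpoint is illustrative:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
)

func main() {
	const sock = "/var/run/cilium/cilium.sock"
	client := &http.Client{
		Transport: &http.Transport{
			// Dial the UDS regardless of the URL's host component.
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", sock)
			},
		},
	}
	// "localhost" is a dummy: since Go 1.20.6, a Host header containing
	// slashes (like a UDS path) is rejected by net/http.
	resp, err := client.Get("http://localhost/v1/healthz")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body))
}
```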
4fa7df8 envoy: Bump envoy to v1.24.9 This is to include the fix for below CVE. CVE: https://github.com/envoyproxy/envoy/security/advisories/GHSA-jfxv-29pc-x22r GHA build: https://github.com/cilium/proxy/actions/runs/5544741749/jobs/10122649239 Signed-off-by: Tam Mach <tam.mach@cilium.io> 14 July 2023, 05:45:28 UTC
e4b0551 ariane: don't skip verifier and l4lb tests on vendor/ changes [ upstream commit 1f35bafb3d1f754a20374d177a65ed8076ee9486 ] Both of these workflows use binaries that are built in CI making use of various vendored dependencies, so run them as well on PRs only changing vendor/. backporting conflicts: * tests-datapath-verifier.yaml doesn't exist in the v1.11 branch Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 13 July 2023, 09:23:29 UTC
57302bf chore(deps): update hubble cli to v0.12.0 Signed-off-by: renovate[bot] <bot@renovateapp.com> 13 July 2023, 04:00:58 UTC
5ce15f8 test/provision/compile.sh: Make usable from dev VM [ upstream commit 0112ddbb6960e0cbf6153e2fa3c229a32f358af8 ] Add missing 'sudo' commands so that this can be run from a shell in a dev VM to launch a local cilium agent in docker. Only install the bpf mount unit to systemd if not already mounted. This avoids error messages like this: Unit sys-fs-bpf.mount has a bad unit file setting With these changes Cilium agent can be compiled and launched in docker, assuming the VM hostname does NOT include "k8s", like so: $ SKIP_TEST_IMAGE_DOWNLOAD=1 VMUSER=${USER} PROVISIONSRC=test/provision test/provision/compile.sh After this 'docker ps' should show a "cilium" container. This can be used, for example, to quickly run Cilium agent locally to observe agent startup and exit logs via 'docker logs cilium -f' when stopping cilium with 'docker stop cilium'. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 11 July 2023, 14:06:20 UTC
1ea0df0 test: Fix ACK and FIN+ACK policy drops in hostfw tests [ upstream commit 439a0a059fdcabe23a33b427b637494bc5a59eda ] First see the code comments for the full explanation. This issue with the faulty conntrack entries when enforcing host policies is suspected to cause the flakes that have been polluting host firewall tests. We've seen this faulty conntrack issue happen mostly to health and kube-apiserver connections. And it turns out that the host firewall flakes look like they are caused by connectivity blips on kube-apiserver's side, with error messages such as: error: unable to upgrade connection: Authorization error (user=kube-apiserver-kubelet-client, verb=create, resource=nodes, subresource=proxy) This commit therefore tries to work around the issue of faulty conntrack entries in host firewall tests. If the flakes are indeed caused by those faulty entries, we shouldn't see them happen anymore. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 11 July 2023, 14:06:20 UTC
3553673 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 07 July 2023, 10:40:27 UTC
d8a2814 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 07 July 2023, 09:43:42 UTC
065e7aa chore(deps): update docker.io/library/ubuntu:20.04 docker digest to c9820a4 Signed-off-by: renovate[bot] <bot@renovateapp.com> 07 July 2023, 09:43:42 UTC
5945b86 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 07 July 2023, 09:34:16 UTC
7679b69 chore(deps): update actions/setup-go action to v4 Signed-off-by: renovate[bot] <bot@renovateapp.com> 07 July 2023, 09:08:29 UTC
1963e2d chore(deps): update docker.io/library/alpine docker tag to v3.16.6 Signed-off-by: renovate[bot] <bot@renovateapp.com> 05 July 2023, 14:29:15 UTC
a706f92 chore(deps): update docker.io/library/alpine docker tag to v3.16.6 Signed-off-by: renovate[bot] <bot@renovateapp.com> 05 July 2023, 14:27:09 UTC
fe69ac2 ci: rework workflows to be triggered by Ariane on 1.11 [ upstream commit 9949c5a1891aff8982bfc19e7fc195e7ecc2abf1 ] This is a custom backport, please see upstream commit for full details. In this commit, we move stable workflows from the `main` branch back into the 1.11 stable branch now that workflows are triggered via `workflow_dispatch` events in the appropriate context. Since these new workflows were previously living in `main`, we also need to backport dependencies on `.github/actions` configuration files. We take the opportunity to adjust the configuration files as appropriate, notably in terms of K8s version coverage, to ensure that we only test K8s versions officially supported by the stable branch. In particular for 1.11, the AKS workflow was NOT backported because 1.11 only supports K8s versions up to 1.23, and 1.23 is not available on AKS anymore. The stable 1.11 AKS workflow from `main` was already disabled when this change happened, as per our testing policy, so we are not backporting it. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 05 July 2023, 12:10:08 UTC
c32bff4 ci: add Ariane configuration file for 1.11 [ upstream commit 4a9ee81c6b6bdb5b63e61287a93ab67a77255c4c ] This is a custom backport, please see upstream commit for full details. Ariane is a new GitHub App intended to trigger Cilium CI workflows based on trigger phrases commented in pull requests, in order to replace the existing `issue_comment`-based workflows and simplify our CI stack. This commit adds a configuration setting up triggers such that existing 1.11 workflows can be triggered with the usual `/test-backport-1.11`, and based on the same PR changelist match / ignore rules. In particular for 1.11, the AKS workflow was NOT backported because 1.11 only supports K8s versions up to 1.23, and 1.23 is not available on AKS anymore. The stable 1.11 AKS workflow from `main` was already disabled when this change happened, as per our testing policy, so we are not backporting it. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 05 July 2023, 12:10:08 UTC
7cf6b03 install: Don't install CNI binaries if cni.install=false [ upstream commit 390b4dc0d9ef63ae30f435e3ea2926069ef5a78b ] Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
ec12aef ipsec: Split removeStaleXFRMOnce to fix deprioritization issue [ upstream commit f4f3656b32492d31abc533d12692e6fe9b4d32f9 ] We expect deprioritizeOldOutPolicy() to be executed for IPv4 and IPv6, but removeStaleXFRMOnce prevents the second call. If both IPv4 and IPv6 are enabled, the v6 xfrm policy won't be deprioritized due to this issue. This commit fixes it by splitting removeStaleXFRMOnce into removeStaleIPv4XFRMOnce and removeStaleIPv6XFRMOnce. Fixes: https://github.com/cilium/cilium/commit/688dc9ac802b11f6c16a9cbc5d60baaf77bd6ed0 Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
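The shape of the fix, sketched in Go with the names from the commit message (one sync.Once per IP family so the second family's cleanup is no longer skipped); the function body here is a placeholder:

```go
package main

import (
	"fmt"
	"sync"
)

var (
	removeStaleIPv4XFRMOnce sync.Once
	removeStaleIPv6XFRMOnce sync.Once
)

// deprioritizeOldOutPolicy runs the stale-XFRM deprioritization at most
// once per IP family, instead of once globally.
func deprioritizeOldOutPolicy(ipv6 bool) {
	once := &removeStaleIPv4XFRMOnce
	if ipv6 {
		once = &removeStaleIPv6XFRMOnce
	}
	once.Do(func() {
		fmt.Printf("deprioritizing stale XFRM OUT policies (ipv6=%v)\n", ipv6)
	})
}

func main() {
	deprioritizeOldOutPolicy(false)
	deprioritizeOldOutPolicy(true) // no longer a no-op after the IPv4 run
}
```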
63fa3d1 cli: Print NodeID in hex [ upstream commit e956bb1a29e131e730d74572a97152d547101143 ] [ backporter's notes: conflicts due to string format being different in v1.11, applied changes based on v1.11 format. ] The Node ID is used in the SKB mark used by XFRM policies. The latter print it in hex. So, let's reduce the mental strain a bit when debugging IPsec issues. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
43c277a bugtool: Add cilium bpf nodeid list [ upstream commit 18f85a014282dec7ddf2c2bf39d54564670352ec ] To help detect when the IPcache is out of sync with locally stored Node IDs. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
c9c2262 docs: clarify that L3 DNS policies require L7 proxy enabled [ upstream commit e0931df324592358d4645a9f9a31ca87aeddaf70 ] [ backporter's notes: conflicts due to docs structure change, manually applied changes to the corresponding file pre-structure change. ] Add a note to the L3 policy documentation clarifying that L3 DNS policies require the L7 proxy enabled and an L7 policy for DNS traffic so Cilium can intercept DNS responses. Previously, the documentation linked to other sections describing the DNS Proxy, but I know at least a few people who were surprised that a policy under "L3 Examples" would require an L7 proxy. Hopefully adding a note near the beginning of the section will make this requirement more obvious. Signed-off-by: Will Daly <widaly@microsoft.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
5650804 docs: reword incorrect L7 policy description [ upstream commit 68bff35b533fcd4224236a4bef27b5e711c87c69 ] [ backporter's notes: conflicts due to docs structure change, manually applied changes to the corresponding file pre-structure change. ] Fixing incorrect description of the GET /public policy in the L7 section. Signed-off-by: Peter Jausovec <peter.jausovec@solo.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
576580e docker: Detect default "desktop-linux" builder [ upstream commit 13f146eb117f54a20199bfecc1ab226eb9df6bfb ] Newer Docker Desktop releases may have a default builder named "desktop-linux" that is not buildx-capable. Detect that name, as well as the old "default", to determine whether a new buildx builder needs to be created. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
ce13536 proxy: Increment non-DNS proxy ports on failure [ upstream commit 894aa4e062abda757e7cc952d4663206aa75b08d ] [ backporter's notes: conflicts due to ProxyType not existing on v1.11, used parserType as the v1.11 equivalent. ] Increment non-DNS proxy ports on failure even if DNS has been configured with a static port. Fixes: #20896 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
204a802 proxy: Only update redirects with configured proxy ports [ upstream commit ca6199827b9a68fd78227cc31afa712a7e7b51f1 ] [ backporter's notes: conflicts due to ProxyType not existing on v1.11, used parserType as the v1.11 equivalent. ] Only update an existing redirect if it is configured. This prevents a Cilium agent panic when trying to update a redirect with a released proxy port. This has only been observed to happen with explicit Envoy listener redirects in CiliumNetworkPolicy when the listener has been removed. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
8d92f63 proxy: Do not panic on local error [ upstream commit 525007f69a87282cb4056820c722a1593402bf0d ] [ backporter's notes: conflicts due to proxy_test.go not existing on v1.11, these changes were skipped. ] CreateOrUpdateRedirect called a nil revertFunc when any local error was returned. This was done using the pattern `return 0, err, nil, nil` which sets the revertFunc return variable as nil, but this was called on a deferred function to revert any changes on a local error. Fix this by calling RevertStack.Revert() directly on the deferred function, and setting the return variable if there was no local error. This was hit any time a CiliumNetworkPolicy referred to a non-existing listener. Add a test case that reproduced the panic and works after the fix. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 29 June 2023, 11:24:54 UTC
ed26f99 v1.11 docs: Use stable-v0.14.txt for cilium-cli version The next cilium-cli release is v0.15.0 with Helm mode as the default installation mode. Continue to use v0.14 cilium-cli for v1.11 docs since we haven't validated v1.11 docs using Helm mode. Also change the branch name from master to main. The default branch name recently changed from master to main in cilium-cli repo. Ref: https://github.com/cilium/cilium-cli/pull/1759 Ref: #26430 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 23 June 2023, 22:24:45 UTC
a88d414 envoy: Bump minor version to v1.24.x This commit is to bump envoy version to v1.24.8, as envoy v1.23 will be EOL next month as per [release schedule](https://github.com/envoyproxy/envoy/blob/main/RELEASES.md#major-release-schedule) The image is coming from below run https://github.com/cilium/proxy/actions/runs/5291782230/jobs/9585253849 Signed-off-by: Tam Mach <tam.mach@cilium.io> 19 June 2023, 22:31:56 UTC
9af9345 envoy: Bump envoy version to v1.23.10 This is for latest patch release from upstream https://github.com/envoyproxy/envoy/releases/tag/v1.23.10 https://www.envoyproxy.io/docs/envoy/latest/version_history/v1.23/v1.23.10 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 15 June 2023, 16:12:00 UTC
a336dda images: introduce update script update-cilium-envoy-image This commit introduces the script `update-cilium-envoy-image.sh` (and corresponding make target) which fetches the latest cilium-envoy image by fetching the relevant data from its github repo. It updates the cilium Dockerfile. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 15 June 2023, 16:12:00 UTC
ba9d077 install: Update image digests for v1.11.18 Generated from https://github.com/cilium/cilium/actions/runs/5279434084. ## Docker Manifests ### cilium `docker.io/cilium/cilium:v1.11.18@sha256:dda94072012c328fe0d00838f2f7d8ead071019d1d1950ecf44060640bf93cae` `quay.io/cilium/cilium:v1.11.18@sha256:dda94072012c328fe0d00838f2f7d8ead071019d1d1950ecf44060640bf93cae` ### clustermesh-apiserver `docker.io/cilium/clustermesh-apiserver:v1.11.18@sha256:b3e8de4e56c5e16ab8f4482cebf3a12bb12826ba3da3e5890de1ecdc2b34a3ed` `quay.io/cilium/clustermesh-apiserver:v1.11.18@sha256:b3e8de4e56c5e16ab8f4482cebf3a12bb12826ba3da3e5890de1ecdc2b34a3ed` ### docker-plugin `docker.io/cilium/docker-plugin:v1.11.18@sha256:b086fc1ec24b9b2b0bc5f7f525ef76ff608c26dc1bdd76d46729871cbbfb4b08` `quay.io/cilium/docker-plugin:v1.11.18@sha256:b086fc1ec24b9b2b0bc5f7f525ef76ff608c26dc1bdd76d46729871cbbfb4b08` ### hubble-relay `docker.io/cilium/hubble-relay:v1.11.18@sha256:4899d8a98c05ccb7bb3d0b54e18dc72147995b2e8a18db19805d15933ec6e45d` `quay.io/cilium/hubble-relay:v1.11.18@sha256:4899d8a98c05ccb7bb3d0b54e18dc72147995b2e8a18db19805d15933ec6e45d` ### operator-alibabacloud `docker.io/cilium/operator-alibabacloud:v1.11.18@sha256:590062c3797c0d0732d848b8fa09cd5aaf5ce2cbbbc5f5fc860bde79d27c743c` `quay.io/cilium/operator-alibabacloud:v1.11.18@sha256:590062c3797c0d0732d848b8fa09cd5aaf5ce2cbbbc5f5fc860bde79d27c743c` ### operator-aws `docker.io/cilium/operator-aws:v1.11.18@sha256:4b3aeeb5d0de096d68ab249845c4c53c7c595735d529a13a81540597a6b29bb5` `quay.io/cilium/operator-aws:v1.11.18@sha256:4b3aeeb5d0de096d68ab249845c4c53c7c595735d529a13a81540597a6b29bb5` ### operator-azure `docker.io/cilium/operator-azure:v1.11.18@sha256:c833cd215dafcb9a73dc1d435d984038fc46ebd9a0b3d50ceeb8f8c4c7e9ac3d` `quay.io/cilium/operator-azure:v1.11.18@sha256:c833cd215dafcb9a73dc1d435d984038fc46ebd9a0b3d50ceeb8f8c4c7e9ac3d` ### operator-generic `docker.io/cilium/operator-generic:v1.11.18@sha256:bccdcc3036b38581fd44bf7154255956a58d7d13006aae44f419378911dec986` `quay.io/cilium/operator-generic:v1.11.18@sha256:bccdcc3036b38581fd44bf7154255956a58d7d13006aae44f419378911dec986` ### operator `docker.io/cilium/operator:v1.11.18@sha256:0c09e5188d5d8899e7b037fafcc1928a68872f1e48e5f7a128799594c99f8282` `quay.io/cilium/operator:v1.11.18@sha256:0c09e5188d5d8899e7b037fafcc1928a68872f1e48e5f7a128799594c99f8282` Signed-off-by: Quentin Monnet <quentin@isovalent.com> 15 June 2023, 14:55:26 UTC
f5d7e2d Prepare for release v1.11.18 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 14 June 2023, 21:00:38 UTC
9d72663 docs: Promote Deny Policies out of Beta Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 13 June 2023, 22:55:09 UTC
1e8d21b docs: fix wording for the upgrade guide Rephrase a recent change to Documentation/operations/upgrade.rst. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 13 June 2023, 19:22:22 UTC
c3ffd99 ipsec: Don't rely on output-marks to know if state exists On kernels before 4.19, the XFRM output mark is not fully supported. Thus, when comparing XFRM states, if we compare the output marks, the existing states will never match the new state. The new state will have an output mark, but the states installed in the kernel don't have it (because the kernel ignored it). As a result, our current logic will assume that the state we want to install doesn't already exist, it will try to install it, fail because it already exists, assume there's a conflicting state, throw an error, remove the conflicting state, and install the new (but identical) one. The end result is therefore the same: the new state is in place in the kernel. But on the way to installing it, we will emit an unnecessary error and temporarily remove the state (potentially causing packet drops). Instead, we can safely ignore the output-marks when comparing states. We don't expect any states with same IPs, SPI, and marks, but different output-marks anyway. The only way this could happen is if someone manually added such a state. Even if they did, the only impact would be that we wouldn't overwrite the manually-added state with the different output-mark. This patch is only necessary on v1.12 and earlier versions of Cilium because v1.13 dropped support for Linux <4.19. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
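A hedged sketch of such a comparison over the netlink XfrmState type; the set of fields compared is illustrative, the point is simply that the output-mark is left out of the equality check:

```go
package main

import (
	"fmt"
	"net"

	"github.com/vishvananda/netlink"
)

// xfrmStateEqual compares the identifying fields of two XFRM states,
// deliberately ignoring the output-mark: kernels before 4.19 drop it
// on installation, so including it would make states already in the
// kernel never match the state we want to install.
func xfrmStateEqual(a, b *netlink.XfrmState) bool {
	return a.Src.Equal(b.Src) &&
		a.Dst.Equal(b.Dst) &&
		a.Spi == b.Spi &&
		xfrmMarkEqual(a.Mark, b.Mark)
	// a.OutputMark is intentionally not compared.
}

func xfrmMarkEqual(a, b *netlink.XfrmMark) bool {
	if a == nil || b == nil {
		return a == b
	}
	return a.Value == b.Value && a.Mask == b.Mask
}

func main() {
	s := &netlink.XfrmState{Src: net.ParseIP("10.0.1.1"), Dst: net.ParseIP("10.0.2.1"), Spi: 3}
	fmt.Println(xfrmStateEqual(s, s)) // true
}
```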
2c0a06d ipsec: Don't attempt per-node route deletion when nonexistent [ upstream commit 1e1e2f7e410d24e4af2d6dbd2cb2ceb016fb76b7 ] Commit 3e59b681f ("ipsec: Per-node XFRM states & policies for EKS & AKS") changed the XFRM config to have one state and policy per remote node in IPAM modes ENI and Azure. The IPsec cleanup logic was therefore also updated to call deleteIPsec() whenever a remote node is deleted. However, we missed that the cleanup logic also tries to remove the per-node IP route. In case of IPAM modes ENI and Azure, the IP route however stays as before: we have a single route for all remote nodes. We therefore don't have anything to clean up. Because of this unnecessary IP route cleanup attempt, an error message was printed for every remote node deletion: Unable to delete the IPsec route OUT from the host routing table This commit fixes it to avoid attempting this unnecessary cleanup. Fixes: 3e59b681f ("ipsec: Per-node XFRM states & policies for EKS & AKS") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 13 June 2023, 19:22:04 UTC
60f5a1b ipsec: Only match appropriate XFRM configs with node ID [ upstream commit 57eac9d8b42a19f5aeae412f38de3eaf8bfadc4a ] With commit 9cc8a89f9 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") we rely on the node ID to find XFRM states and policies that belong to remote nodes, to clean them up when remote nodes are deleted. This commit makes sure that we only do this for XFRM states and policies that actually match on these node IDs. That should only be the case if the mark mask matches on node ID bits. Thus it should look like 0xffffff00 (matches on node ID, SPI, and encryption bit) or 0xffff0f00 (matches on node ID and encryption bit). Fixes: 9cc8a89f9 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 June 2023, 19:22:04 UTC
319fa31 ipsec: Only delete ipsec endpoint when node ID is not 0 [ upstream commit 25064d1ec51895ab89e2f736fcf7c6c66dfb5551 ] After applying a backport of 9cc8a89f9 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were getting occasional spikes of "no inbound state" xfrm errors (XfrmInNoStates). These lead to packet loss and brief outages for applications sending traffic to the node on which the spikes occur. I noticed that the "No node ID found for node." logline would appear at the time of these spikes and from the code this is logged when the node ID cannot be resolved. Looking a bit further the call to `DeleteIPsecEndpoint` will end up deleting the xfrm state for any state that matches the node id as derived from the mark in the state. The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP has a mark of `0xd00` which when shifted >> 16 in `getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets deleted and the kernel drops all the inbound traffic as it no longer matches a state. This commit updates that logic to skip the XFRM state and policy deletion when the node ID is zero. Fixes: 9cc8a89f9 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs") Signed-off-by: Steven Johnson <sjdot@protonmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 June 2023, 19:22:04 UTC
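Taken together, the two fixes above amount to two small guards, sketched here in Go; the helper name getNodeIDFromXfrmMark comes from the commit messages, the rest is hypothetical:

```go
package main

import "fmt"

const (
	markMaskNodeID     = 0xffffff00 // node ID + SPI + encryption bit
	markMaskNodeIDOnly = 0xffff0f00 // node ID + encryption bit
)

// matchesOnNodeID reports whether an XFRM mark mask actually encodes
// the node ID bits; only such states/policies are safe to clean up by
// node ID.
func matchesOnNodeID(mask uint32) bool {
	return mask == markMaskNodeID || mask == markMaskNodeIDOnly
}

// getNodeIDFromXfrmMark extracts the node ID from the upper 16 bits of
// the packet mark.
func getNodeIDFromXfrmMark(mark uint32) uint16 {
	return uint16(mark >> 16)
}

func main() {
	mark := uint32(0xd00) // the shared inbound state's mark, per the commit above
	if nodeID := getNodeIDFromXfrmMark(mark); nodeID == 0 {
		fmt.Println("node ID 0: skip deletion, this is a shared inbound state")
	} else {
		fmt.Printf("delete XFRM configs for node ID 0x%x\n", nodeID)
	}
}
```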
00bb13b ipsec: Fix IPv6 wildcard CIDR used in some IPsec policies [ upstream commit d0ab559441311dbe0908834a86d633aa9eeb6a84 ] We use this wildcard IPv6 CIDR in the catch-all default-drop OUT policy as well as in the FWD policy. It was incorrectly set to ::/128 instead of ::/0 and would therefore not match anything instead of matching everything. This commit fixes it. Fixes: e802c2985 ("ipsec: Refactor wildcard IP variables") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 June 2023, 19:22:04 UTC
3a5aa36 ipsec: Change XFRM FWD policy to simplest wildcard [ upstream commit ac54f2965908c06ff53e5a63a0f47b2448204a18 ] We recently changed our XFRM configuration to have one XFRM OUT policy per remote node, regardless of the IPAM mode being used. In doing so, we also moved the XFRM FWD policy to be installed once per remote node. With ENI and Azure IPAM modes, this wouldn't cause any issue because the XFRM FWD policy is the same regardless of the remote node. On other IPAM modes, however, the XFRM FWD policy is for some reason different depending on the remote node that triggered the installation. As a result, for those IPAM modes, one FWD policy is installed per remote node. And the deletion logic triggered on node deletions wasn't updated to take that into account. We thus have a leak of XFRM FWD policies. In the end, our FWD policy just needs to allow everything through without encrypting it. It doesn't need to be specific to any remote node. We can simply wildcard the match completely, to look like: src 0.0.0.0/0 dst 0.0.0.0/0 dir fwd priority 2975 ptype main tmpl src 0.0.0.0 dst 192.168.134.181 proto esp reqid 1 mode tunnel level use So we match all packets regardless of source and destination IPs. We don't match on the packet mark. There's a small implementation hurdle here. Because we used to install FWD policies of the form "src 0.0.0.0/0 dst 10.0.1.0/24", the kernel was able to deduce which IP family we are matching against and would adapt the 0.0.0.0/0 source CIDR to ::/0 as needed. Now that we are matching on 0/0 for both CIDRs, it cannot deduce this anymore. So instead, we must detect the IP family ourselves and use the proper CIDRs. In addition to changing the XFRM FWD policy to the above, we can also stop installing it once per remote node. It's enough to install it when we receive the event for the local node, once. Fixes: 3e59b681f ("ipsec: Per-node XFRM states & policies for EKS & AKS") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 June 2023, 19:22:04 UTC
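The IP-family hurdle mentioned above reduces to picking the right wildcard CIDR explicitly; a small Go sketch, where the returned CIDR would be used for both ends of the fully wildcarded FWD policy:

```go
package main

import (
	"fmt"
	"net"
)

// wildcardCIDR returns 0.0.0.0/0 or ::/0 depending on the family of
// the local tunnel IP, since a fully wildcarded FWD policy no longer
// lets the kernel infer the family from the destination CIDR.
func wildcardCIDR(localIP net.IP) *net.IPNet {
	if localIP.To4() != nil {
		return &net.IPNet{IP: net.IPv4zero, Mask: net.CIDRMask(0, 32)}
	}
	return &net.IPNet{IP: net.IPv6zero, Mask: net.CIDRMask(0, 128)}
}

func main() {
	fmt.Println(wildcardCIDR(net.ParseIP("192.168.134.181"))) // 0.0.0.0/0
	fmt.Println(wildcardCIDR(net.ParseIP("fd00::1")))         // ::/0
}
```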
10a2851 loader: In IPsec reload ignore veth devices & fix settle wait [ upstream commit 592777da560bea7838b99223386e943c08d5d052 ] reloadIPSecOnLinkChanges() did not ignore veth device updates causing reload to be triggered when new endpoints were created. Ignore any updates with "veth" as device type. The draining of updates during settle wait was broken due to unintentional breaking out of the loop. Removed the break. Fixes: bf0940b4ff ("loader: Reinitialize IPsec on device changes on ENI") Signed-off-by: Jussi Maki <jussi@isovalent.com> 13 June 2023, 19:22:04 UTC
f421409 loader: Do not fatal on IPsec reinitialization [ upstream commit 470465550bc446b920a62c5b7f7b521cd10b0a9b ] Now that the code is reloading the bpf_network program at runtime we should not fatal if we fail to reload the program since this may be caused by ongoing interface changes (e.g. interface was being removed). Change the log.Fatal into log.Error and keep loading to other interfaces. Fixes: bf0940b4ff ("loader: Reinitialize IPsec on device changes on ENI") Signed-off-by: Jussi Maki <jussi@isovalent.com> 13 June 2023, 19:22:04 UTC
fdba480 ipsec: Allow old and new XFRM OUT states to coexist for upgrade [ upstream commit c0d9b8c9e791b8419c63e5e80b52bc2b39f80030 ] Commit 73c36d45e0 ("ipsec: Match OUT XFRM states & policies using node IDs") changed our XFRM states to match on packet marks of the form 0xXXXXYe00/0xffffff00 where XXXX is the node ID and Y is the SPI. The previous format for the packet mark in XFRM states was 0xYe00/0xff00. According to the Linux kernel these two states conflict (because 0xXXXXYe00/0xffffff00 ∈ 0xYe00/0xff00). That means we can't add the new state while the old one is around. Thus, in commit ddd491bd8 ("ipsec: Custom check for XFRM state existence"), we removed any old conflicting XFRM state before adding the new ones. That however causes packet drops on upgrades because we may remove the old XFRM state before bpf_lxc has been updated to use the new 0xXXXXYe00/0xffffff00 mark. Instead, we would need both XFRM state formats to coexist for the duration of the upgrade. Impossible, you say! Don't despair. Things are actually a bit more complicated (it's IPsec and Linux after all). While Linux doesn't allow us to add 0xXXXXYe00/0xffffff00 when 0xYe00/0xff00 exists, it does allow adding in the reverse order. That seems to be because 0xXXXXYe00/0xffffff00 ∈ 0xYe00/0xff00 but 0xYe00/0xff00 ∉ 0xXXXXYe00/0xffffff00 [1]. Therefore, to have both XFRM states coexist, we can remove the old state, add the new one, then re-add the old state. That is allowed because we never try to add the new state when the old is present. During the short period of time when we have removed the old XFRM state, we can have packet drops due to the missing state. These drops should be limited to the specific node pair this XFRM state is handling. This will also only happen on upgrades. Finally, this shouldn't happen with ENI and Azure IPAM modes because they don't have such old conflicting states. I tested this upgrade path on a 20-nodes GKE cluster running our drop-sensitive application, migrate-svc, scaled up to 50 clients and 30 backends. I didn't get a single packet drop despite the application consistently sending packets back and forth between nodes. Thus, I think the window for drops to happen is really small. Diff before/after the upgrade (v1.13.0 -> this patch, GKE): src 10.24.1.77 dst 10.24.2.207 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x3e00/0xff00 output-mark 0xe00/0xf00 aead rfc4106(gcm(aes)) 0xfc2d0c4e646b87ff2d0801b57997e3598eab0d6b 128 - anti-replay context: seq 0x0, oseq 0x2c, bitmap 0x00000000 + anti-replay context: seq 0x0, oseq 0x16, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 + src 10.24.1.77 dst 10.24.2.207 + proto esp spi 0x00000003 reqid 1 mode tunnel + replay-window 0 + mark 0x713f3e00/0xffffff00 output-mark 0xe00/0xf00 + aead rfc4106(gcm(aes)) 0xfc2d0c4e646b87ff2d0801b57997e3598eab0d6b 128 + anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000 + sel src 0.0.0.0/0 dst 0.0.0.0/0 We can notice that the counters for the existing XFRM state also changed (decreased). That's expected since the state got recreated. 1 - I think this is because XFRM states don't have priorities. So when two XFRM states would match a given packet (in our case a packet with mark XXXXYe00), the oldest XFRM state is taken. Thus, by not allowing to add a more specific match after a more generic one, the kernel ensures that the more specific match is always taken when both match a packet. That likely corresponds to user expectations. That is, if both 0xXXXXYe00/0xffffff00 and 0xYe00/0xff00 match a packet, we would probably expect 0xXXXXYe00/0xffffff00 to be used. Fixes: ddd491bd8 ("ipsec: Custom check for XFRM state existence") Fixes: 73c36d45e0 ("ipsec: Match OUT XFRM states & policies using node IDs") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
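For reference, the two mark layouts discussed above can be sketched as bit-packing helpers in Go (the 0x713f node ID reproduces the example diff; the helper names are hypothetical):

```go
package main

import "fmt"

const encryptBits = 0xe00 // encryption marker bits, per the masks above

// Old format: 0xYe00/0xff00 (SPI + encryption bits only).
func oldXfrmMark(spi uint8) uint32 {
	return uint32(spi)<<12 | encryptBits
}

// New format: 0xXXXXYe00/0xffffff00 (node ID in the upper 16 bits).
func newXfrmMark(nodeID uint16, spi uint8) uint32 {
	return uint32(nodeID)<<16 | uint32(spi)<<12 | encryptBits
}

func main() {
	fmt.Printf("old: %#x/0xff00\n", oldXfrmMark(3))             // 0x3e00
	fmt.Printf("new: %#x/0xffffff00\n", newXfrmMark(0x713f, 3)) // 0x713f3e00
}
```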
5fd9b0b daemon: Reload bpf_host first in case of IPsec upgrade [ upstream commit ca9c056deb31f6e0747c951be24b25d67ea99d6d ] As explained in the previous commit, we need to switch our IPsec logic from one implementation to another. This implementation requires some synchronized work between bpf_lxc and bpf_host. To enable this switch without causing drops, the previous commit made bpf_host support both implementations. This isn't quite enough though. For this to work, we need to ensure that bpf_host is always reloaded before any bpf_lxc is loaded. That is, we need to load the bpf_host program that supports both implementations before we actually start the switch from one implementation to the second. This commit makes that change in the order of BPF program reloads. Instead of regenerating the bpf_host program (i.e., the host endpoint's datapath) in a goroutine like other BPF programs, we will regenerate it first, as a blocking operation. Regenerating the host endpoint's datapath separately like this will delay the agent startup. This regeneration was measured to take around 1 second on an EKS cluster (though it can probably grow to a few seconds depending on the node type and current load). That should stay fairly small compared to the overall duration of the agent startup (around 30 seconds). Nevertheless, this separate regeneration is only performed when we actually need it: for IPsec with EKS or AKS IPAM modes. Fixes: 4c7cce1bf ("bpf: Remove IP_POOLS IPsec code") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
7afecc7 bpf: Support the old IP_POOLS logic in bpf_host [ upstream commit 0af2303f534bb155918e86f07f0f3f4686d2a927 ] This commit reverts the bpf_host changes of commit 4c7cce1bf ("bpf: Remove IP_POOLS IPsec code"). The IP_POOLS IPsec code was a hack to avoid having one XFRM OUT policy and state per remote node. Instead, we had a single XFRM OUT policy and state that would encrypt traffic as usual, but encapsulate it with placeholder IP addresses, such as 0.0.0.0 -> 192.168.0.0. Those outer IP addresses would then be rewritten to the proper IPs in bpf_host. To that end, bpf_lxc would pass the destination IP address, the tunnel endpoint, to bpf_host via a skb->cb slot. The source IP address was hardcoded in the object file. Commit 4c7cce1bf ("bpf: Remove IP_POOLS IPsec code") thus got rid of that hack to instead have per-node XFRM OUT policies and states. The kernel therefore directly writes the proper outer IP addresses. Unfortunately, the transition from one implementation to the other isn't so simple. If we simply remove the old IP_POOLS code as done in commit 4c7cce1bf, then we will have drops on upgrade. We have two cases, depending on which of bpf_lxc or bpf_host is reloaded first: 1. If bpf_host is reloaded before the new bpf_lxc is loaded, then it won't rewrite the outer IP addresses anymore. In that case, we end up with packets of the form 0.0.0.0 -> 192.168.0.0 leaving on the wire. Obviously, they don't go far and end up dropped. 2. If bpf_lxc is reloaded before the new bpf_host, then it will reuse skb->cb for something else and the XFRM layer will handle the outer IP addresses. But because bpf_host is still on the old implementation, it will try to use skb->cb to rewrite the outer IP addresses. We thus end up with gibberish outer destination IP addresses. One way to fix this is to have bpf_host support both implementations. This is what this commit does. The logic to rewrite the outer IP addresses is reintroduced in bpf_host, but it is only executed if the outer source IP address is 0.0.0.0. That way, we will only rewrite the outer IP addresses if bpf_lxc is on the old implementation and the XFRM layer didn't write the proper outer IPs. Fixes: 4c7cce1bf ("bpf: Remove IP_POOLS IPsec code") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
a8ce874 ipsec: Deprioritize old XFRM OUT policy for dropless upgrade [ upstream commit a11d088154b2d3fe50d0ce750aca87b3fabb19e5 ] This is a revert, or rather a reimplementation, of commit 688dc9ac8 ("ipsec: Remove stale XFRM states and policies"). In that commit, we would remove the old XFRM OUT policies and states because they conflict with the new ones and prevent the installation to proceed. This removal however causes a short window of packet drops on upgrade, between the time the old XFRM configs are removed and the new ones are added. These drops would show up as XfrmOutPolBlock because packets then match the catch-all default-drop XFRM policy. Instead of removing the old XFRM configs, a better, less-disruptive approach is to deprioritize them and add the new ones in front. To that end, we "lower" the priority of the old XFRM OUT policy from 0 to 50 (0 is the highest-possible priority). By doing this the XFRM OUT state is also indirectly deprioritized because it is selected by the XFRM OUT policy. As with the code from commit 688dc9ac8 ("ipsec: Remove stale XFRM states and policies"), this whole logic can be removed in v1.15, once we are sure that nobody is upgrading with the old XFRM configs in place. At that point, we will be able to completely clean up those old XFRM configs. The priority of 50 was chosen arbitrarily, to be between the priority of new XFRM OUT configs (0) and the priority of the catch-all default-drop policy (100), while leaving space if we need to add additional rules of different priorities. Diff before/after upgrade (v1.13.0 -> this patch, GKE): src 10.24.1.0/24 dst 10.24.2.0/24 - dir out priority 0 + dir out priority 50 mark 0x3e00/0xff00 tmpl src 10.24.1.77 dst 10.24.2.207 proto esp spi 0x00000003 reqid 1 mode tunnel Fixes: 688dc9ac8 ("ipsec: Remove stale XFRM states and policies") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
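A sketch of the deprioritization with github.com/vishvananda/netlink, using the selector, mark, and template values from the diff above; illustrative, not Cilium's exact code, and it requires Linux with CAP_NET_ADMIN:

```go
package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

// Between the new per-node policies (0) and the catch-all drop (100).
const oldXFRMOutPolicyPriority = 50

func main() {
	_, src, _ := net.ParseCIDR("10.24.1.0/24")
	_, dst, _ := net.ParseCIDR("10.24.2.0/24")
	pol := &netlink.XfrmPolicy{
		Src:      src,
		Dst:      dst,
		Dir:      netlink.XFRM_DIR_OUT,
		Mark:     &netlink.XfrmMark{Value: 0x3e00, Mask: 0xff00},
		Priority: oldXFRMOutPolicyPriority,
		Tmpls: []netlink.XfrmPolicyTmpl{{
			Src:   net.ParseIP("10.24.1.77"),
			Dst:   net.ParseIP("10.24.2.207"),
			Proto: netlink.XFRM_PROTO_ESP,
			Mode:  netlink.XFRM_MODE_TUNNEL,
			Spi:   3,
			Reqid: 1,
		}},
	}
	// Update in place: same selector and mark, lower priority, so the
	// new per-node policies (priority 0) win while the old one remains.
	if err := netlink.XfrmPolicyUpdate(pol); err != nil {
		log.Fatal(err)
	}
}
```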
e64caec ipsec: Lower priority of catch-all XFRM policies [ upstream commit 3e898f26063531b9bf3883c5c79e347f15112631 ] This commit lowers the priority of the catch-all default-drop XFRM OUT policies, from 1 to 100. For context, 0 is the highest possible priority. This change will allow us to introduce several levels of priorities for XFRM OUT policies in subsequent commits. Diff before/after this patch: src 0.0.0.0/0 dst 0.0.0.0/0 - dir out action block priority 1 + dir out action block priority 100 mark 0xe00/0xf00 Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
ac40896 ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs [ upstream commit 9cc8a89f914195d52a8b3df021215b4051348b45 ] Our logic to clean up old XFRM configs on node deletion currently relies on the destination IP to identify the configs to remove. That doesn't work with ENI and Azure IPAMs, but until recently, it didn't need to. On ENI and Azure IPAMs we didn't have per-node XFRM configs. That changed in commit 3e59b681f ("ipsec: Per-node XFRM states & policies for EKS & AKS"). We now need to clean up per-node XFRM configs for ENI and Azure IPAM modes as well, and we can't rely on the destination IP for that because the XFRM policies don't match on that destination IP. Instead, since commit 73c36d45e0 ("ipsec: Match OUT XFRM states & policies using node IDs"), we match the per-node XFRM configs using node IDs encoded in the packet mark. The good news is that this is true for all IPAM modes (whether Azure, ENI, cluster-pool, or something else). So our cleanup logic can now rely on the node ID of the deleted node to clean up its XFRM states and policies. This commit implements that. Fixes: 3e59b681f ("ipsec: Per-node XFRM states & policies for EKS & AKS") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
df1effa node_ids: New helper function getNodeIDForNode [ upstream commit 3201a5ee689ba650df414d3417d9a9a0ad677bf7 ] This commit simply refactors some existing code into a new getNodeIDForNode function. This function will be called from elsewhere in a subsequent commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
2217b29 loader: Reinitialize IPsec on device changes on ENI [ upstream commit bf0940b4ff6fcc54227137c1322c2e632e7a1819 ] If IPsec is enabled along with the ENI IPAM mode we need to load the bpf_network program onto new ENI devices when they're added at runtime. To fix this, we subscribe to netlink link updates to detect when new (non-veth) devices are added and reinitialize IPsec to load the BPF program onto the devices. The compilation of the bpf_network program has been moved to Reinitialize() to avoid spurious recompilation on reinitialize. Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
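A minimal sketch of such a subscription with vishvananda/netlink; reinitializeIPsec below is a stand-in for the loader logic, not Cilium's actual function:

    package main

    import (
        "log"

        "github.com/vishvananda/netlink"
    )

    func main() {
        updates := make(chan netlink.LinkUpdate)
        done := make(chan struct{})
        defer close(done)

        // Receive a LinkUpdate for every link change on the node.
        if err := netlink.LinkSubscribe(updates, done); err != nil {
            log.Fatal(err)
        }

        for u := range updates {
            // Skip veth devices: new ENIs show up as other link types.
            if u.Link.Type() == "veth" {
                continue
            }
            log.Printf("device %s changed, reinitializing IPsec", u.Link.Attrs().Name)
            reinitializeIPsec()
        }
    }

    // reinitializeIPsec is a placeholder for the loader logic that
    // (re)attaches the bpf_network program to the encryption interfaces.
    func reinitializeIPsec() {}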
692f395 loader: Allow reinitializeIPSec to run multiple times [ upstream commit e880002be665e96473daced96f809b3b04f81e27 ] reinitializeIPSec only runs the interface detection if EncryptInterface is empty. Since it sets it after detecting interfaces, it will only run the detection once. Let's change that to run the detection even if the EncryptInterface list isn't empty. That will allow us to rerun the detection when new ENI devices are added on EKS. One consequence of this change is that we will now attach to all interfaces even if the user configured --encrypt-interface. That is fine because --encrypt-interface shouldn't actually be used in ENI mode. In ENI mode, we want to attach to all interfaces as we don't have a guarantee on which interface the IPsec traffic will come in. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
f1a44e9 ipsec: Flag to switch between IP types used for IPsec encap [ upstream commit 963e45b1c9a0a0d6420cfed6b0aaabbe45cb630e ] On EKS and AKS, IPsec used NodeInternalIPs for the encapsulation. This commit introduces a new flag to allow switching from NodeInternalIPs to CiliumInternalIPs; it defaults to the former. This new flag allows for step 3 of the migration plan defined in the previous commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
a4ce174 ipsec: Accept both CiliumInternalIP and NodeInternalIP on decrypt [ upstream commit 6b3b50d2f568bb145b09e5947ebe55df46e5bc3b ] On EKS and AKS, we currently use NodeInternalIPs for the IPsec tunnels. A subsequent commit will allow us to change that and switch to using CiliumInternalIPs (as done on GKE). For that to be possible without breaking inter-node connectivity for the whole duration of the switch, we need an intermediate mode where both CiliumInternalIPs and NodeInternalIPs are accepted on ingress. The idea is that we will then have a two-step migration from NodeInternalIP to CiliumInternalIP:

1. All nodes are using NodeInternalIP.
2. Upgrade to the version of Cilium that supports both NodeInternalIP and CiliumInternalIP and encapsulates IPsec traffic with NodeInternalIP.
3. Via an agent flag, tell Cilium to switch to encapsulating IPsec traffic with CiliumInternalIP.
4. All nodes are using CiliumInternalIP.

This commit implements the logic for step 2 above. To that end, we will duplicate the XFRM IN states such that we have both:

    src 0.0.0.0 dst [NodeInternalIP]     # existing
    src 0.0.0.0 dst [CiliumInternalIP]   # new

thus matching and being able to receive IPsec packets with an outer destination IP of either NodeInternalIP or CiliumInternalIP. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
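A sketch of installing the duplicated XFRM IN states via vishvananda/netlink; the addresses, SPI, and key material are placeholders, not values Cilium would use:

    package main

    import (
        "log"
        "net"

        "github.com/vishvananda/netlink"
    )

    // addXfrmStateIn installs one XFRM IN state for the given outer
    // destination IP. Calling it once with the NodeInternalIP and once
    // with the CiliumInternalIP makes the node accept IPsec traffic
    // addressed to either IP during the migration.
    func addXfrmStateIn(dst net.IP) error {
        state := &netlink.XfrmState{
            Src:   net.ParseIP("0.0.0.0"),
            Dst:   dst,
            Proto: netlink.XFRM_PROTO_ESP,
            Mode:  netlink.XFRM_MODE_TUNNEL,
            Spi:   3, // placeholder SPI
            Aead: &netlink.XfrmStateAlgo{
                Name:   "rfc4106(gcm(aes))",
                Key:    make([]byte, 20), // placeholder key; never use a zero key
                ICVLen: 64,
            },
        }
        return netlink.XfrmStateAdd(state)
    }

    func main() {
        // First IP stands in for the NodeInternalIP, second for the
        // CiliumInternalIP; both are invented for this sketch.
        for _, ip := range []string{"192.168.1.10", "10.24.1.77"} {
            if err := addXfrmStateIn(net.ParseIP(ip)); err != nil {
                log.Fatal(err)
            }
        }
    }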
2acc610 ipsec: Reintroduce NodeInternalIPs for EKS & AKS IPsec tunnels [ upstream commit 66c45ace70f1355d44efb9c325694375751a943d ] This is a partial revert of commit 3e59b681f ("ipsec: Per-node XFRM states & policies for EKS & AKS"). One change that commit 3e59b681f ("ipsec: Per-node XFRM states & policies for EKS & AKS") made on EKS and AKS was to switch from using NodeInternalIPs to using CiliumInternalIPs for the outer IPsec (ESP) IP addresses. That made the logic more consistent with the logic we use for other IPAM schemes (e.g., GKE). It however causes serious connectivity issues on upgrades and downgrades, mostly because not all nodes are typically updated to the new Cilium version at the same time. If we consider two pods on nodes A and B trying to communicate, then node A may be using the old NodeInternalIPs while node B is already on the new CiliumInternalIPs. When node B sends traffic to node A, node A doesn't have the XFRM IN state necessary to decrypt it. The same happens in the other direction. This commit reintroduces the NodeInternalIPs for EKS and AKS. Subsequent commits will introduce additional changes to enable a proper migration path from NodeInternalIPs to CiliumInternalIPs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
dbcd8a0 ipsec: Inverse another set of conditions to reduce indentations [ upstream commit 64f4c23aefa1483e185b492dffeded6655da22e0 ] No functional changes. Best viewed with git show -b or the equivalent on GitHub to not show space-only changes. Same as the previous commit but on a different set of conditions. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
ed3fb59 ipsec: Inverse condition to reduce indentations [ upstream commit 2ada282ab10952177041367a94a001bf238798e9 ] No functional changes. Best viewed with git show -b or the equivalent on GitHub to not show space-only changes. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
f9f49b4 ipsec: Split enableIPsec into IPv4 and IPv6 [ upstream commit bbc50a3d1d18769c7a4c6751fd2eb40a678536a5 ] This small bit of refactoring will make later changes a bit easier. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
e3e3509 ipsec: Don't remove stale XFRM IN configs [ upstream commit 600c7d4846989fb058fbd7ec400fe1a0a499efc7 ] The XFRM IN policies and states didn't change, so we should never need to remove any stale XFRM IN configs. Let's thus simplify the logic to find stale policies and states accordingly. I would expect this incorrect removal to cause a few drops on agent restart, but after multiple attempts to reproduce on small (3-node) and larger (20-node) clusters (EKS & GKE) with a drop-sensitive application (migrate-svc), I'm not able to see such drops. I'm guessing this is because we reinstall the XFRM IN configs right after we remove them, so there isn't really much time for a packet to be received and dropped. Fixes: 688dc9ac80 ("ipsec: Remove stale XFRM states and policies") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 13 June 2023, 19:22:04 UTC
22e3800 Revert "Temporarily disable part of the conformance-kind test" This reverts commit 296303b838acc4a8a580fc5e2199e448344fb742. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 13 June 2023, 08:01:36 UTC
ce8b131 Revert "Set hostServices=true for smoke test" This reverts commit fb34e019488f48ea085f03a1e3adbfe76922114f. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 13 June 2023, 08:01:36 UTC
584281a test/upgrade: disable check for the number of long-term connections When downgrading from this version to v1.10 (and v1.11.{0,1}), some long-term connections may be reset, which breaks the CI, e.g.:

    /home/jenkins/workspace/Cilium-PR-K8s-1.16-kernel-net-next/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:527
    migrate-svc restart count values do not match
    Expected
        <int>: 0
    to be identical to
        <int>: 5
    /home/jenkins/workspace/Cilium-PR-K8s-1.16-kernel-net-next/src/github.com/cilium/cilium/test/k8sT/Updates.go:347

Disable this check and document it in the Upgrade Guide. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 12 June 2023, 23:31:37 UTC
d3cf7ff datapath: Reduce from-LXC complexity [ upstream commit 7575ba03ccaa5038255c8c1e5c31a8011ed9aaa1 ] [ Backport note: Due to many conflicts it was cherry-picked manually. This change also includes the following minor fix: 8dbde7237c ("test/bpf: Fix format of check-complexity.sh script"). ] Split the from-LXC code paths into two tail calls when per-packet load balancing is needed. When per-packet load balancing is not needed, this should have minimal impact on datapath performance. The kernel 4.9 verifier was unhappy when the service load balancing code was removed from handle_ipv6_from_lxc() and replaced with a simpler state restoration call, but only if the host firewall was enabled and per-packet LB was on. Had to shuffle code around to make the verifier happy again. Shuffled the IPv4 code to keep the two similar, but the git diff looks scarier there. Finally, had to conditionally apply a revalidate data call to make the verifier happy also in the verifier complexity tests (test-verifier). Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 12 June 2023, 23:31:37 UTC
d675632 wireguard, linuxnodehandler: untangle wg from lnh [ upstream commit c8598f8227665054d05cfeec3f65b15005f087c5 ] [ backporter notes: 1. conflicts due to moved files 2. had to embed NodeHandler in the WireguardAgent interface, so that the fact that the WG agent is a node handler is passed through to the daemon ] The reason the WireGuard agent node event handling was contained within the linuxNodeHandler code was routing, which is no longer the case. In addition, entangling the two leads to a deadlock, as diagnosed in GitHub issue #24574. This patch thus implements NodeHandler for the WireGuard agent, which subscribes to the NodeManager itself. That way, the wait cycle of the deadlock is broken, as the linuxNodeHandler doesn't acquire the IPCache lock while holding its own lock. From the perspective of the agent, the invocations of the callbacks change insofar as, previously, the linuxNodeHandler would only forward node events once it considered itself "initialised". Specifically, this excluded the initial sync of nodes performed on subscribe. However, I didn't see a reason to specifically replicate this behaviour. Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 12 June 2023, 21:41:23 UTC
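To make the subscription pattern concrete, a self-contained sketch follows; Manager, NodeHandler, and the replay-on-subscribe behaviour are illustrative stand-ins for Cilium's node manager API, not its actual types:

    package main

    import "fmt"

    type Node struct{ Name string }

    type NodeHandler interface {
        NodeAdd(n Node)
        NodeDelete(n Node)
    }

    type Manager struct {
        nodes    map[string]Node
        handlers []NodeHandler
    }

    // Subscribe registers h and replays all known nodes to it, so a
    // late subscriber (like the WireGuard agent) still sees the
    // initial state instead of depending on another handler's
    // "initialised" flag.
    func (m *Manager) Subscribe(h NodeHandler) {
        m.handlers = append(m.handlers, h)
        for _, n := range m.nodes {
            h.NodeAdd(n)
        }
    }

    type wgAgent struct{}

    func (wgAgent) NodeAdd(n Node)    { fmt.Println("wg: add", n.Name) }
    func (wgAgent) NodeDelete(n Node) { fmt.Println("wg: del", n.Name) }

    func main() {
        m := &Manager{nodes: map[string]Node{"node-1": {Name: "node-1"}}}
        m.Subscribe(wgAgent{}) // replays node-1 immediately
    }

Because the agent subscribes directly, no event has to pass through the linuxNodeHandler while it holds its own lock, which is what breaks the deadlock's wait cycle.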
f7e3800 dp/types: split NodeNeighbors out of NodeHandler [ upstream commit 1aa06c6456d943badfcddb8c5e3765bdd128c469 ] [ backporter notes: conflicts due to moved files] The NodeHandler interface still contains two different concepts. Remove the NodeNeighbor discovery/handling from it, and delete the different stub implementations. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 12 June 2023, 21:41:23 UTC
7ad5c9f dp/types: move nodeIDs out of NodeHandler iface [ upstream commit 8294624839da166c8d6c5bbf473895607f56fd62 ] [ backporter notes: 1. conflicts due to moved files 2. had to mildly reshuffle fake datapath to hold a ptr to struct instead of interface, due to the splitting of the NodeHandler interface. ] The NodeHandler interface is too large, as can be seen in various implementations which are only implementing a subset of methods. This patch splits off the NodeID handling part, and removes the stub methods from noop implementations. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 12 June 2023, 21:41:23 UTC
9e191f4 datapath/linux: return ptr to struct not interface [ upstream commit d8a0be674d655202277ddafd29729b032e9c2ee0 ] [ backporter notes: conflicts due to moved files ] Return a pointer to the implementing struct instead of the implemented interface from the constructor, as is commonly considered idiomatic Go. The constructor is already in a linux-specific package. To ensure that we really do implement the interface we want, add a static type check in the form of a variable assignment. As a side effect, this allows us to drop a number of type assertions in tests. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 12 June 2023, 21:41:23 UTC
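The idiom in question, as a small self-contained example (the types are illustrative, not Cilium's):

    package main

    import "fmt"

    type NodeHandler interface {
        NodeAdd(name string) error
    }

    type linuxNodeHandler struct{}

    // Compile-time check that *linuxNodeHandler satisfies NodeHandler;
    // the build fails if a method is missing or has the wrong signature.
    var _ NodeHandler = (*linuxNodeHandler)(nil)

    // NewNodeHandler returns the concrete type, not the interface
    // ("accept interfaces, return structs"), which lets callers and
    // tests reach struct-specific methods without type assertions.
    func NewNodeHandler() *linuxNodeHandler {
        return &linuxNodeHandler{}
    }

    func (h *linuxNodeHandler) NodeAdd(name string) error {
        fmt.Println("node added:", name)
        return nil
    }

    func main() {
        h := NewNodeHandler() // concrete type: no assertion needed
        _ = h.NodeAdd("node-1")
    }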
049e4d7 chore(deps): update dependency cilium/hubble to v0.11.6 Signed-off-by: renovate[bot] <bot@renovateapp.com> 11 June 2023, 01:14:32 UTC
3480b8d Add github workflow to push development helm charts to quay.io [ upstream commit d90803cec8e7a1508558b183600cde7dcbdd719f ] [ upstream commit 2c367de92d1825bf2ab80de2fd4e848ee2e28c46 ] [ Backporter's notes: - Removed github workflow. Workflow operates from main branch. - Applied /usr/bin/env hunk to the script (was treewide). ] Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 09 June 2023, 18:37:57 UTC
a8f0b88 helm: Value for enable-ipsec-key-watcher [ upstream commit 3ee2fb7dd5f08dc1353266670646584e9dca1b47 ] [ backporter's note: Fixed minor conflict in the Helm template ] This commit adds a Helm value for the enable-ipsec-key-watcher agent flag introduced in the previous commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 09 June 2023, 17:26:49 UTC
48bf046 ipsec, option: Allow users to disable the IPsec key watcher [ upstream commit a579e9b58fb1cdbbc2c6b88c8f85d50aec98c99c ] [ backporter's note: The IPsec key duration option doesn't exist on this branch, so I removed it. Also, this PR contains the commit to add a Helm option for the IPsec key rotation duration. I talked with the original author and dropped that commit since it was accidentally introduced. ] The IPsec key watcher is used to automatically detect and apply changes in the key (typically during key rotations). Having this watcher avoids having to restart the agents to apply the key change. It can however be desirable to only apply the key change when the agent is restarted: it gives the user control over when exactly the change happens. It may also be used as a way to switch from one IPsec implementation to another (XFRM configs specifically): the user rotates the key just before the upgrade; on upgrade, the SPI is implicitly used to distinguish between the old and new implementations as well as the old and new keys. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 09 June 2023, 17:26:49 UTC
c9b2c1f install: Fail helm if kube-proxy-replacement is not valid [ upstream commit f64e0739d0293ed0e118df48371c4631f4573a06 ] [ backporter's note: Removed variables in the Helm templates that don't exist on this branch. Also, kubeProxyReplacement=probe is still valid in this branch, so I added it to the error condition. ] Fail helm if kube-proxy-replacement is set to, or defaults to, an invalid value. kube-proxy-replacement can default to the deprecated (and since removed) "probe" value, and users can also explicitly set it to an incorrect value. It is better to fail at Helm install time than to have the cilium-agent fail to start. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 09 June 2023, 17:26:49 UTC
71728fd Pick up the latest startup-script image [ upstream commit 5c9b66ce093520b29e03d2ed36f5a2bd5d1b6db4 ] [ backporter's note: Fixed conflict in the install/kubernetes/Makefile.values and regenerated relevant documents. ] Upgrading this image is not automated yet. Ref: #25773 Ref: https://github.com/cilium/image-tools/pull/218 Ref: https://quay.io/repository/cilium/startup-script?tab=tags Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 09 June 2023, 11:16:01 UTC
186cb98 test: Collect sysdump as part of artifacts [ upstream commit e93fdd87b96328847bfe31ab9343ccfef4843b93 ] Once we have a sysdump in the test artifacts, a lot of the files we collect will become duplicates. This commit however doesn't remove all those duplicate files from the test artifacts. Let's wait a bit and confirm the sysdump collection always works before cleaning things up. The sysdump collection was tested by making a test fail on purpose. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 09 June 2023, 11:16:01 UTC
fb34e01 Set hostServices=true for smoke test Setting hostServices=false with KPR=partial increases complexity which breaks the SmokeTest on newer kernels. Disable it temporarily. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 08 June 2023, 14:35:32 UTC
296303b Temporarily disable part of the conformance-kind test Temporarily disable the ipsec part of the conformance-kind test, as it is currently broken on new kernels (most probably, due to the commit [1] which greatly increases complexity). A proper fix would be to split bpf_lxc.c:from-container into more tail calls, but this is work in progress. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=354e8f1970f821d4952458f77b1ab6c3eb24d530 Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 08 June 2023, 14:35:32 UTC
176a695 chore(deps): update quay.io/cilium/hubble docker tag to v0.11.6 Signed-off-by: renovate[bot] <bot@renovateapp.com> 07 June 2023, 22:06:19 UTC
7dc319e bug: Fix Potential Nil Reference in GetLabels Implementation [ upstream commit bfbe5a26a458e114a5b8b261ed719a85a8ceff35 ] The policyIdentityLabelLookup wrapper for Endpoint implements the GetLabels interface method. This is necessary for constructing the MapState of the policy engine. This implementation incorrectly did not check if the identity returned by LookupIdentityByID was nil. This commit fixes that bug, which heretofore had not caused any issues. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 02 June 2023, 13:46:30 UTC
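The shape of the fix, as a self-contained sketch; the lookup function and types are stand-ins, not Cilium's actual API:

    package main

    import "fmt"

    type Identity struct{ Labels []string }

    // lookupIdentityByID stands in for the allocator lookup, which can
    // return nil when the numeric identity is unknown.
    func lookupIdentityByID(id int) *Identity {
        if id == 42 {
            return &Identity{Labels: []string{"k8s:app=demo"}}
        }
        return nil
    }

    // getLabels is the fixed implementation: it checks for a nil
    // identity before dereferencing it, returning nil labels instead
    // of panicking.
    func getLabels(id int) []string {
        ident := lookupIdentityByID(id)
        if ident == nil { // the check that was missing
            return nil
        }
        return ident.Labels
    }

    func main() {
        fmt.Println(getLabels(42)) // [k8s:app=demo]
        fmt.Println(getLabels(7))  // [] (nil), no panic
    }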
37a0453 policy: Fix concurrent access of SelectorCache [ upstream commit 52ace8e9ea318fe79e86731bddbc0abc97843311 ] Marco Iorio reports that with the previous code, Cilium could crash at runtime after importing a network policy, with the following error printed to the logs:

    fatal error: concurrent map read and map write

The path for this issue is also printed in the logs, with the following call stack:

    pkg/policy.(*SelectorCache).GetLabels(...)
    pkg/policy.(*MapStateEntry).getNets(...)
    pkg/policy.entryIdentityIsSupersetOf(...)
    pkg/policy.MapState.denyPreferredInsertWithChanges(...)
    pkg/policy.MapState.DenyPreferredInsert(...)
    pkg/policy.(*EndpointPolicy).computeDirectionL4PolicyMapEntries(...)
    pkg/policy.(*EndpointPolicy).computeDesiredL4PolicyMapEntries(...)
    pkg/policy.(*selectorPolicy).DistillPolicy(...)
    pkg/policy.(*cachedSelectorPolicy).Consume(...)
    pkg/endpoint.(*Endpoint).regeneratePolicy(...)
    ...

Upon further inspection, this call path is not grabbing the SelectorCache lock at any point. If we check all of the incoming calls to this function, we can see multiple higher-level functions calling into this function. The following tree starts from the deepest level of the call stack, and increasing indentation represents one level higher in the call stack.

    INCOMING CALLS
    - f GetLabels  github.com/cilium/cilium/pkg/policy • selectorcache.go
     - f getNets  github.com/cilium/cilium/pkg/policy • mapstate.go
      - f entryIdentityIsSupersetOf  github.com/cilium/cilium/pkg/policy • mapstate.go
       - f denyPreferredInsertWithChanges  github.com/cilium/cilium/pkg/policy • mapstate.go
        - f DenyPreferredInsert  github.com/cilium/cilium/pkg/policy • mapstate.go
         - f computeDirectionL4PolicyMapEntries  github.com/cilium/cilium/pkg/policy • resolve.go
          - f computeDesiredL4PolicyMapEntries  github.com/cilium/cilium/pkg/policy • resolve.go
           + f DistillPolicy  github.com/cilium/cilium/pkg/policy • resolve.go  <--- No SelectorCache lock
         - f DetermineAllowLocalhostIngress  github.com/cilium/cilium/pkg/policy • mapstate.go
          + f DistillPolicy  github.com/cilium/cilium/pkg/policy • resolve.go  <--- No SelectorCache lock
        - f consumeMapChanges  github.com/cilium/cilium/pkg/policy • mapstate.go
         + f ConsumeMapChanges  github.com/cilium/cilium/pkg/policy • resolve.go  <--- Already locks the SelectorCache

Read the above tree as "GetLabels() is called by getNets()", "getNets() is called by entryIdentityIsSupersetOf()", and so on. Siblings at the same level of indent represent alternate callers of the function that is one level of indentation less in the tree, i.e., DenyPreferredInsert() and consumeMapChanges() both call denyPreferredInsertWithChanges(). As annotated above, we see that calls through DistillPolicy() do not grab the SelectorCache lock. Given that ConsumeMapChanges() grabs the SelectorCache lock, we cannot introduce a new lock acquisition in any descendant function, otherwise it would introduce a deadlock in goroutines that follow that call path. This provides us the option to lock at some point from the sibling of consumeMapChanges() or higher in the call stack. Given that the ancestors of DenyPreferredInsert() are all from DistillPolicy(), we can amortize the cost of grabbing the SelectorCache lock by grabbing it once for the policy distillation phase rather than putting the lock into DenyPreferredInsert(), where the SelectorCache could be locked and unlocked for each map state entry.
Future work could investigate whether these call paths could make use of the IdentityAllocator's cache of local identities for the GetLabels() call rather than relying on the SelectorCache, but for now this patch should address the immediate locking issue that triggers agent crashes. CC: Nate Sweet <nathanjsweet@pm.me> Fixes: c9f0def587e6 ("policy: Fix Deny Precedence Bug") Reported-by: Marco Iorio <marco.iorio@isovalent.com> Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 02 June 2023, 13:46:30 UTC
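A self-contained sketch of that amortization; the types and method names are stand-ins for illustration, not Cilium's actual code:

    package main

    import (
        "fmt"
        "sync"
    )

    // SelectorCache stands in for Cilium's selector cache: a map
    // guarded by a mutex, read via GetLabels.
    type SelectorCache struct {
        mu     sync.RWMutex
        labels map[int][]string
    }

    // getLabelsLocked assumes the caller holds sc.mu; it does no
    // per-call locking, so it is safe from both locked call paths.
    func (sc *SelectorCache) getLabelsLocked(id int) []string {
        return sc.labels[id]
    }

    // DistillPolicy amortizes the lock: it is taken once for the whole
    // distillation instead of once per map state entry, and no
    // descendant function takes it again (which would deadlock callers
    // like ConsumeMapChanges that already hold it).
    func DistillPolicy(sc *SelectorCache, ids []int) [][]string {
        sc.mu.RLock()
        defer sc.mu.RUnlock()
        var out [][]string
        for _, id := range ids {
            out = append(out, sc.getLabelsLocked(id))
        }
        return out
    }

    func main() {
        sc := &SelectorCache{labels: map[int][]string{
            1: {"k8s:app=a"},
            2: {"k8s:app=b"},
        }}
        fmt.Println(DistillPolicy(sc, []int{1, 2}))
    }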
b6970da policy: Fix Deny Precedence Bug [ upstream commit c9f0def587e662c2b2ac4501362c6f44aa62ee71 ]

- Add Tests for Deny Precedence Bug: Currently, when a broad "Deny" policy is paired with a specific "Unmanaged CIDR" policy, the "Unmanaged CIDR" policy will still be inserted into the policy map for an endpoint. This results in "Deny" policies not always taking precedence over "Allow" policies. This test confirms the bug's existence.
- Fix Deny Precedence Bug: When the policy map state is created, CIDRs are now checked against one another to ensure that deny rules supersede allow rules when they should. `DenyPreferredInsert` has been refactored to use utility methods that make the complex boolean logic of policy precedence more atomic. Add a `NetsContainsAny` method to `pkg/ip` to compare cases where one set of networks contains or is equal to any network in another set.
- endpoint: Add policy.Identity Implementation: A `policy.Identity` implementation is necessary for the incremental updates to the endpoint's policy map that can occur with L7 changes. Valid deny-policy entries may prohibit these L7 changes based on CIDR rules, which are only obtainable by looking up all potentially conflicting policies' labels. Thus `l4.ToMapState` needs access to the identity allocator to look up "random" identity labels.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 02 June 2023, 13:46:30 UTC
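A plausible shape for such a helper, as a sketch (not necessarily the exact pkg/ip implementation):

    package main

    import (
        "fmt"
        "net"
    )

    // netContains reports whether outer contains or equals inner:
    // same address family, outer's prefix no longer than inner's, and
    // outer covering inner's base IP.
    func netContains(outer, inner *net.IPNet) bool {
        outerOnes, outerBits := outer.Mask.Size()
        innerOnes, innerBits := inner.Mask.Size()
        return outerBits == innerBits && outerOnes <= innerOnes && outer.Contains(inner.IP)
    }

    // NetsContainsAny reports whether any network in a contains or
    // equals any network in b.
    func NetsContainsAny(a, b []*net.IPNet) bool {
        for _, an := range a {
            for _, bn := range b {
                if netContains(an, bn) {
                    return true
                }
            }
        }
        return false
    }

    func main() {
        _, deny, _ := net.ParseCIDR("10.0.0.0/8")
        _, allow, _ := net.ParseCIDR("10.1.2.0/24")
        // The broad deny covers the specific allow, so the deny must win.
        fmt.Println(NetsContainsAny([]*net.IPNet{deny}, []*net.IPNet{allow})) // true
    }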
ddeaf64 envoy: Never use x-forwarded-for header [ upstream commit e8fcd6bfcd91e0eabe7d4049f84b1f706d68bc38 ] Envoy by default gets the source address from the `x-forwarded-for` header, if present. Always add an explicit `use_remote_address: true` for Envoy HTTP Connection Manager configuration to disable the default behavior. Also set the `skip_xff_append: true` option to retain the old behavior of not adding `x-forwarded-for` headers on cilium envoy proxy. Setting these options is not really needed for admin and metrics listeners, or most of the tests, but we add them there too in case anyone uses them as a source of inspiration for a real proxy configuration. This fixes incorrect hubble flow data when HTTP requests contain an `x-forwarded-for` header. This change has no effect on Cilium policy enforcement where the source security identity is always resolved before HTTP headers are parsed. Fixes: #25630 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> 01 June 2023, 07:35:43 UTC
887c79b test/fqdn: Switch from jenkins.cilium.io to cilium.io [ upstream commit f66f4b159d24e1b5c8e4d92a69932ef003cff87d ] jenkins.cilium.io has been down since Thursday. We can simply switch to cilium.io. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 24 May 2023, 12:32:25 UTC
a7edc23 test/fqdn: Avoid hardcoding the test FQDN [ upstream commit 4bcfebc11c29e94cc91a6cc5a027562f9ec0be20 ] Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 24 May 2023, 12:32:25 UTC