https://github.com/cilium/cilium

cf6e022 Prepare for release v1.14.8 Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 13 March 2024, 21:53:07 UTC
d713993 images: update cilium-{runtime,builder} Signed-off-by: André Martins <andre@cilium.io> 12 March 2024, 19:50:20 UTC
dd92cdd images: bump cni plugins to v1.4.1 The result of running ``` images/scripts/update-cni-version.sh 1.4.1 ``` Signed-off-by: André Martins <andre@cilium.io> 12 March 2024, 19:50:20 UTC
15a3714 wireguard: Improve L7 proxy traffic detection [ upstream commit 96e01adeaa51c85a5671d34c4715c07de97e26e1 ] Use marks set by the proxy instead of assuming that each pkt from HOST_ID w/o MARK_MAGIC_HOST belongs to the proxy. In addition, in the tunneling mode the mark might get reset before entering wg_maybe_redirect_to_encrypt(), as the proxy packets are instead routed to from_host@cilium_host. The latter calls inherit_identity_from_host() which resets the mark. In this case, rely on the TC index. Suggested-by: Gray Lian <gray.liang@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> 12 March 2024, 14:24:57 UTC
ffdb9dc wireguard: Encrypt L7 proxy pkts to remote pods [ upstream commit 26f83491643b8c9b2921544ad340d8de07e9138c ] Marco reported that the following L7 proxy traffic is leaked (bypasses the WireGuard encryption): 1. WG: tunnel, L7 egress policy: forward traffic is leaked 2. WG: tunnel, DNS: all DNS traffic is leaked 3. WG: native routing, DNS: all DNS traffic is leaked This was reported before the introduction of --wireguard-encapsulate [1]. The tunneling leak cases are obvious: the L7 proxy traffic got encapsulated by Cilium's tunneling device, which made it bypass the redirection to Cilium's WireGuard device. However, [1] fixed this behavior. For Cilium v1.15 (upcoming) nothing needs to be configured. Meanwhile, for v1.14.4 users need to set --wireguard-encapsulate=true. The native routing case is more tricky. The L7 proxy traffic got the src IP of the host instead of the client pod, so the redirection was bypassed. To fix this, we extended the redirection check to identify L7 proxy traffic. [1]: https://github.com/cilium/cilium/pull/28917 Reported-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> 12 March 2024, 14:24:57 UTC
39705b4 wireguard: unconditionally add NodeInternalIPs to allowed IPs [ upstream commit 1eb12e071ff3ee95bf209a6e9eaf25caa7c0c006 ] Currently, we add the remote NodeInternalIPs to the list of allowed IPs associated with a given WireGuard peer only in certain circumstances, and more specifically when either tunneling or node to node encryption are enabled. However, this logic doesn't practically buy us anything in terms of additional security, but causes potential traffic disruption in case users want to enable/disable node2node encryption in a running cluster. Hence, let's just get rid of it, and unconditionally add NodeInternalIPs to the list of allowed IPs. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 12 March 2024, 14:24:57 UTC
96defa9 chore(deps): update hubble cli to v0.13.2 Signed-off-by: renovate[bot] <bot@renovateapp.com> 12 March 2024, 12:35:00 UTC
6a989a2 fqdn: prevent conntrack GC from reaping newly-added IPs [ upstream commit b284170774203631f9cd3d3b8524dcdfb8b04e22 ] [ backporter's notes: one signature change, passing string instead of IP. Otherwise unchanged ] A bug was found where a low-TTL name was incorrectly reaped despite being part of an active connection. After looking at logs, and reproducing locally, it was determined that there is an unfortunate interleaving between the DNS and CT GC loops. The code attempts to prevent this issue by ensuring that names inserted after CT GC has started are exempt from reaping. However, we don't actually track the insertion time, we track the DNS TTL expiration time, which is strictly in the past. In fact, it can be up to a minute in the past. We shouldn't rely on timestamps anyway, as the scheduler can always play tricks on us. So, if a CT GC run has started and finished in the time between name expiration and insertion into zombies, the IP address is immediately considered dead and unnecessarily reaped. Timeline: T1. name expires T2. CT GC starts and finishes T3. Zombies.SetCTGCTime(T2) T4. Zombies.Upsert(name, T1) T5. Zombies.GC() At T5, Zombies.GC will remove IPs associated with the name, because T2 > T1. The solution is to use an explicit serial number to ensure that CT GC has completed a full run before we are allowed to delete an IP. We actually need to let CT GC run twice, as it may have started before this zombie was added and thus not marked it alive. Additionally, we already have a grace period, the idle connection timeout, that gives applications a chance to re-use an expired IP. However, we did not respect this grace period if the IP in question did not have an entry in conntrack. So, pad deletion time by this grace period as well, just to be sure this grace period applies to all possible deletions. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 12 March 2024, 12:17:12 UTC
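A minimal sketch of the serial-number gating this commit describes, with hypothetical names and types (not the actual Cilium fqdn code): an IP becomes reapable only after conntrack GC has completed two full passes since the zombie was inserted, and the idle-connection grace period has elapsed on top of that.

```go
package fqdn

import "time"

// zombie is a stand-in for a DNS name/IP pair awaiting deletion.
type zombie struct {
	insertedAtGCRun uint64    // CT GC serial number at insertion time
	aliveAt         time.Time // last time the IP was seen alive
}

type zombieList struct {
	ctGCRuns    uint64 // number of completed CT GC passes
	gracePeriod time.Duration
}

// markCTGCDone bumps the serial number after a full conntrack GC pass.
func (zl *zombieList) markCTGCDone() { zl.ctGCRuns++ }

// reapable requires two completed GC passes since insertion (the first
// pass may have started before the zombie existed, so it cannot have
// marked it alive) and pads deletion by the grace period.
func (zl *zombieList) reapable(z *zombie, now time.Time) bool {
	return zl.ctGCRuns >= z.insertedAtGCRun+2 &&
		now.Sub(z.aliveAt) > zl.gracePeriod
}
```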
3106435 conntrack: only bump FQDN GC time when CT GC successful [ upstream commit 03f8c85a452125289244d9027f92ddf84650822d ] [ backporter's notes: needed to adapt to some gc signature changes, but logic unchanged. ] The FQDN GC subsystem waits for a successful CT GC run before marking IPs as stale. However, we were erroneously marking CT GC as successful even on failure, or when it was only run for a single family. So, only notify the FQDN subsystem when we've done a successful GC pass for all configured families. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 12 March 2024, 12:17:12 UTC
f522b97 Bump google.golang.org/protobuf (v1.14) Resolves CVE-2024-24786. Signed-off-by: Feroz Salam <feroz.salam@isovalent.com> 12 March 2024, 07:08:47 UTC
3987378 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 11 March 2024, 17:46:35 UTC
4985d11 chore(deps): update docker.io/library/ubuntu:22.04 docker digest to 77906da Signed-off-by: renovate[bot] <bot@renovateapp.com> 11 March 2024, 17:46:35 UTC
a9cdd88 chore(deps): update stable lvh-images Signed-off-by: renovate[bot] <bot@renovateapp.com> 11 March 2024, 17:19:17 UTC
b59164c iptables: Read CNI chaining mode from CNI config manager [ upstream commit: 77053ae0b24e8ba61477cd42d88ba6e8468e09f4 ] [ backporter's note: iptables does not have its own cell in v1.14, thus the reference to the CNIConfigManager is passed in the Init method. ] The CNI chaining mode option has been moved to the CNI cell in commit 1254bf403f. Since it is not a global config option anymore, the iptables manager will not see any change to that value, and its field `CNIChainingMode` will always be an empty string. Thus, with the following config option values: - "enable-endpoint-routes": true - "cni-chaining-mode": "aws-cni" the delivery interface referenced in the rules installed by the manager is "lxc+" instead of "eni+". This commit fixes this by adding a CNI config manager reference to the iptables manager parameters, in order to read the current setting for the chaining mode during rules installation. Fixes: 1254bf403f ("daemon / cni: move to Cell, watch for changes") Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 11 March 2024, 16:21:53 UTC
1256b5e bugtool: Capture memory fragmentation info from /proc [ upstream commit 1c3a17f672f6da2332b3731329aead13b3c17e22 ] This information can be useful to understand why memory allocation in the kernel may fail (ex. for maps or for XFRM). I've checked that these two files are accessible from a typical cilium-agent deployment (on GKE). Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 11 March 2024, 14:52:58 UTC
4326188 container/bitlpm: Add Lookup Boolean Return Value [ upstream commit 5a96a95cdebf9d161d73642ef863b23d0a8f5484 ] Lookup currently returns the default value of the bitlpm.Trie when it fails to find a match. There are cases where comparing the default value to the return value is logically expensive (i.e. code needs to be written to do the comparison). Lookup can easily return a boolean value to indicate whether it failed. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 11 March 2024, 14:52:58 UTC
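The motivation is Go's comma-ok idiom; a tiny runnable illustration (a map stands in for the trie here, this is not the actual bitlpm API):

```go
package main

import "fmt"

// lookup mimics the new Lookup contract: the boolean distinguishes
// "no match" from "matched an entry whose value is the zero value".
func lookup[T any](m map[string]T, key string) (T, bool) {
	v, ok := m[key]
	return v, ok
}

func main() {
	prefixes := map[string]int{"10.0.0.0/8": 0} // zero value stored on purpose
	v, ok := lookup(prefixes, "10.0.0.0/8")
	fmt.Println(v, ok) // 0 true — without ok, this hit would look like a miss
	_, ok = lookup(prefixes, "192.168.0.0/16")
	fmt.Println(ok) // false — a definitive miss, no zero-value comparison needed
}
```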
ca4ff2a Update kafka-sw-gen-traffic.sh [ upstream commit 7a5a4295f8ca75a21e57969ef01a4926641c2ce1 ] Fixed `kubectl exec` syntax Signed-off-by: Dean <22192242+saintdle@users.noreply.github.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 11 March 2024, 14:52:58 UTC
a9a01c2 loader: also populate NATIVE_DEV_IFINDEX for cilium_overlay [ upstream commit 6b98a0b210ea2eae59e2eeb399aad3ce121f3caf ] Avoid any odd surprises when this macro ends up being used by shared nodeport.h code. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 11 March 2024, 14:52:58 UTC
343b300 bitlpm: Factor out common code [ upstream commit 6b63ea2f83fa3a471651c949e96b0a3ddccb9618 ] Reduce code repetition by defining a 'traverse' function that is shared between multiple functions. Clarify comments. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 11 March 2024, 14:52:58 UTC
644c2ce xds: Avoid xds timeout due to agent restart in envoy DS mode [ upstream commit d7dba5e4628fcc15b237a157f5fefd49dceb0f88 ] For external envoy, the xds server and envoy have different life cycles, i.e. each runs in its own pod and can be deployed or restarted independently. This commit handles the case where the xds server in the cilium agent got restarted and the nonce value is always 0. Sample error ``` 2024-02-05T12:49:51.771714518Z level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=105.68356ms bpfWaitForELF="24.396µs" bpfWriteELF=1.802221ms ciliumEndpointName=cilium-test/client-56f8968958-fqdl4 containerID=245b2aaac2 containerInterface=eth0 datapathPolicyRevision=5 desiredPolicyRevision=6 endpointID=134 error="Error while configuring proxy redirects: proxy state changes failed: context canceled" identity=1713 ipv4=10.244.1.1 ipv6="fd00:10:244:1::9544" k8sPodName=cilium-test/client-56f8968958-fqdl4 mapSync=2.476505ms policyCalculation=1.240346ms prepareBuild="437.049µs" proxyConfiguration="837.119µs" proxyPolicyCalculation="234.369µs" proxyWaitForAck=2m34.697546384s reason="policy rules added" subsys=endpoint total=2m34.818201428s waitingForCTClean=270ns waitingForLock="2.605µs" ``` Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 11 March 2024, 14:52:58 UTC
91666a8 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 11 March 2024, 08:37:33 UTC
c11283d bpf: host: optimize from-host's ICMPv6 path [ upstream commit: 475a1949a5621132777aa74a595b93259d1cefcd ] [ backporter's notes: minor conflict due to v1.15 icmp6_host_handle() doesn't have ext_err parameter. ] The ICMPv6 handling in handle_ipv6() is only required for the HostFW or by from-netdev. Exclude it otherwise. This is a minor optimization for dc9dfd72f2ae ("bpf: Re-introduce ICMPv6 NS responder on from-netdev"). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> 08 March 2024, 09:33:26 UTC
c8c4737 bpf/tests: Add IPv6 NDP bpf test [ upstream commit: 8d4db8988128c6110d99f3b6ae0f072b12b86b61 ] [ backporter's notes: minor changes due to lack of #30467 and #27134 ] This commit adds bpf/tests/ipv6_ndp_from_netdev_test.c to cover two scenarios: 1. from_netdev receives IPv6 NS for a pod IP on the same host 2. from_netdev receives IPv6 NS for the node IP (eth0's addr) For case 1, from_netdev should return a NA on behalf of the target pod to avoid https://github.com/cilium/cilium/issues/30926. For case 2, it must return the NS to the stack to address https://github.com/cilium/cilium/issues/14509. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> 08 March 2024, 09:33:26 UTC
dbd6cac bpf: Re-introduce ICMPv6 NS responder on from-netdev [ upstream commit: dc9dfd72f2ae650f9d59cf82aba379f925a22d82 ] [ backporter's notes: in v1.14 icmp6_handle_ns() doesn't have ext_err parameter, so icmp6_host_handle() doesn't need to accept it either. ] This reverts commit 658071414ca4606e537bc4bbb37dcae5e18cd7dc, to fix the breakage of "IPv6 NS responder for pod" introduced by https://github.com/cilium/cilium/pull/12086 (bpf: Reply NA when recv ND for local IPv6 endpoints). 658071414ca4606e537bc4bbb37dcae5e18cd7dc was merged to solve https://github.com/cilium/cilium/issues/14509. To not revive #14509, this commit also passes through ICMPv6 NS if the target is native node IP (eth0's addr). By letting stack take care of those NS-for-node-IP packets, we managed to: 1. Solve #14509 again, but in a way keeping NS responder. The cause of #14509 was NS responder always generates ND whose source IP is "router_ip" (cilium_internal_ip) rather than "node_ip". Once we pass those NS-for-node-IP packets to stack, the ND response would naturally have "node_ip" as source. 2. Avoid the fib_lookup failure mentioned at https://github.com/cilium/cilium/pull/30837#issuecomment-1960897445. icmp6_host_handle() also has a new parameter `handle_ns` to control if we want NS responder to be active. If it is called from `to-netdev` code path, handle_ns is set to false. This is suggested by julianwiedmann. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> 08 March 2024, 09:33:26 UTC
caf0bb6 bpf/tests: Remove SKIP_ICMPV6_NS_HANDLING from tc_nodeport_l3_dev.c [ upstream commit: 60c5e76db1a485a3c27e837a178b583a01b2e308 ] [ backporter's notes: adds tc_nodeport_l3_dev.o into NOCOVER_PATTERN to avoid verifier issue as v1.14 still has bpf coverage test. ] SKIP_ICMPV6_NS_HANDLING was there to pass the bpf coverage test, which was removed by https://github.com/cilium/cilium/pull/28090. In the meantime, removing SKIP_ICMPV6_NS_HANDLING from tc_nodeport_l3_dev.c prevents "potential missed tailcall" errors introduced by https://github.com/cilium/cilium/pull/30467, as tail_icmp6_handle_ns() doesn't exist when SKIP_ICMPV6_NS_HANDLING is defined, but still gets tail-called by icmp6_handle_ns(). Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> 08 March 2024, 09:33:26 UTC
2a0aa85 bpf: nat: use icmp6_load_type() instead of ctx_load_bytes() [ upstream commit: a28f0fc75db6a3e5e384e12904ef7efe615aad50 ] Replace open-coded variants of the helper. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> 08 March 2024, 09:33:26 UTC
c22944d bpf: icmp6: have icmp6_load_type() take a L4 offset [ upstream commit: ffbd7af823a35baddbfed9d72ec296bc5f0a12e0 ] Right now the helper takes a L3 offset, and assumes a packet without extension headers. Change it to a L4 offset, so that callers can pass this when available. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> 08 March 2024, 09:33:26 UTC
17a105c bpf: icmp6: have icmp6_load_type() return an error [ upstream commit: 283b8b7e1c9afb9a8d99e5d61e52db355772b2ce ] [ backporter's notes: minor conflicts in wireguard.h ] Under the hood this uses ctx_load_bytes(), which can fail. Return such an error to the caller. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> 08 March 2024, 09:33:26 UTC
cbfc433 envoy: Bump golang version to 1.21.8 This is to pick up the new image with the updated golang version, and other dependency bumps. Related commit: https://github.com/cilium/proxy/commit/bbde4095997ea57ead209f56158790d47224a0f5 Related build: https://github.com/cilium/proxy/actions/runs/8179371187/job/22365308893 Signed-off-by: Tam Mach <tam.mach@cilium.io> 07 March 2024, 19:00:55 UTC
b8379f2 patches: Call upstream callbacks via UpstreamFilterManager Envoy has moved the encodeHeaders() call to a new call path in the upstream decoder filter. Move the upstream callbacks iteration call there to be just before the encodeHeaders() call, and call the iteration via UpstreamFilterManager so that the callbacks registered in the downstream filter manager are used. Call sendLocalReply also via the UpstreamFilterManager to have its local state updated properly for upstream processing. One more note compared to the patches for 1.27+: the encodeHeaders() call is still available in onPoolReady(), so we should move our patch calling iterateUpstreamCallbacks() after it. Relates: https://github.com/envoyproxy/envoy/pull/26916/files#r1176556258 Related commit: https://github.com/cilium/proxy/commit/860c2219c1d3a0e531c36bd2171d0b1678bba530 Related build: https://github.com/cilium/proxy/actions/runs/8156758309/job/22298887449 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> 07 March 2024, 13:15:56 UTC
953f72b images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 07 March 2024, 11:18:36 UTC
8eaa6cc chore(deps): update go to v1.21.8 Signed-off-by: renovate[bot] <bot@renovateapp.com> 07 March 2024, 11:18:36 UTC
579d0a4 proxy: also install from-ingress-proxy rules with per-EP routing This is a v1.14-only patch, the closest upstream equivalent is 217ae4f64183 ("Re-introduce 2005 route table"). Egressing traffic would usually get routed straight to eth0. Install the 2005 rule to divert the traffic into cilium_host first. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
bde2d5e bpf: host: also handle from-egress proxy traffic [ upstream commit e96e9cd7542063ac6314f76c492e5b1ef41ee639 ] The from-host path already knows how to handle traffic that comes from the ingress proxy. Extend this logic to also cover traffic that originates from the egress proxy. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
4680356 datapath: disable net.ipv4.ip_early_demux for IPsec + L7 proxy [ upstream commit 5201896e0a393ec4199cf9b5be4ebac6374be12a ] [ backporter's notes: this is a backport to pre-cell iptables ] After forward traffic for an egress proxy connection has traversed through cilium_host / cilium_net, we expect IPsec-marked packets to get handled by xfrm. This currently conflicts with early demux, which matches the connection's transparent socket and assigns it to the packet: ``` // https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/tcp_ipv4.c#L1770 int tcp_v4_early_demux(struct sk_buff *skb) { ... sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif, inet_sdif(skb)); if (sk) { skb->sk = sk; ... } ``` It then gets dropped in ip_forward(), before reaching xfrm: ``` // https://elixir.bootlin.com/linux/v6.2/source/net/ipv4/ip_forward.c#L100 int ip_forward(struct sk_buff *skb) { ... if (unlikely(skb->sk)) goto drop; ... } ``` To avoid this we disable early demux in an L7 + IPsec config. Note that the L7 proxy feature needs to deal with similar troubles, as the comment for inboundProxyRedirectRule() describes. Ideally we would build a similar solution for IPsec, diverting traffic with policy routing so that it doesn't get intercepted by early demux. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
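A hedged sketch of the workaround: conditionally writing the sysctl via procfs when both features are enabled (hypothetical helper; the actual backport wires this through Cilium's sysctl machinery):

```go
package main

import (
	"fmt"
	"os"
)

// disableEarlyDemux sets net.ipv4.ip_early_demux=0 so that forwarded,
// IPsec-marked egress-proxy packets reach xfrm instead of being claimed
// by the proxy's transparent socket during early demux.
func disableEarlyDemux(ipsecEnabled, l7ProxyEnabled bool) error {
	if !ipsecEnabled || !l7ProxyEnabled {
		return nil // only the IPsec + L7 proxy combination needs this
	}
	return os.WriteFile("/proc/sys/net/ipv4/ip_early_demux", []byte("0"), 0o644)
}

func main() {
	if err := disableEarlyDemux(true, true); err != nil {
		fmt.Fprintln(os.Stderr, "disabling early demux:", err)
	}
}
```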
004fa85 iptables: preserve encrypt mark for egress proxy forward traffic [ upstream commit f018b20e9ef6c28bc37a94112b53ed9ad6890534 ] Once forward traffic for an egress proxy connection has traversed through cilium_host / cilium_net, we expect IPsec-marked packets to get handled by xfrm. But this currently conflicts with an iptables rule for the proxy's transparent socket, which then over-writes the mark: -A CILIUM_PRE_mangle -m socket --transparent -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff We can avoid this by adding an extra filter to this rule, so that it doesn't match IPsec-marked packets. Signed-off-by: Zhichuan Liang<gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
4f42d9a bpf: host: skip from-proxy handling in from-netdev [ upstream commit d4b81c03dbdb25f3f51d90149097669c31d0d59d ] from-proxy traffic gets redirected to cilium_host. Skip the proxy paths when handle_ipv*_cont() is included by from-netdev. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
48cd7a8 iptables: filter table accepts from-proxy packets [ upstream commit 244a5e93f0be099a3c59ee8f87fdfd26849a6de7 ] GKE has DROP policy for filter table, so we have to explicitly accept proxy traffic. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
b914bf5 proxy: opt-out from SNAT for L7 + Tunnel for some scenarios [ upstream commit 9fbd5a814b47131887661748996d876f541da3b8 ] Currently the L7 proxy performs SNAT for traffic when tunnel routing is enabled, even for cluster-internal traffic. This prevents cilium_host from detecting pod-level traffic, and we thus can't apply features. Modify SupportsOriginalSourceAddr(), so that the proxy doesn't SNAT such traffic when some conditions are met. Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
9fe7f9c pkg: proxy: only install from-proxy rules/routes for native routing [ upstream commit 0ebe5162373c00f85e7ae43d0bc5d474fa08c485 ] [ backporter's notes: this is a custom backport to init.sh ] With tunnel routing, traffic to remote pods already flows via cilium_host. This is sufficient for what IPsec requires. Thus currently only native routing requires the custom redirect logic for from-ingress proxy traffic. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
ed64f4d bpf: l3: restore MARK_MAGIC_PROXY_INGRESS for from-proxy traffic [ upstream commit d2f1ea09b48416805600c8524443468ea4ffdaaf ] With https://github.com/cilium/cilium/pull/29530 in place, we now also divert proxy traffic to cilium_host when per-EP routes are enabled. But we potentially still need to deliver this traffic to a local endpoint - say for a pod-to-pod connection on the same node, with L7 proxy inbetween. In a configuration with per-EP routes but no BPF Host-Routing, l3_local_delivery() transfers the source identity to the skb->mark and redirects to bpf_lxc, where the to-container program handles the packet. If we transfer the packet with MARK_MAGIC_IDENTITY, to-container will look up the network policy and redirect to the L7 proxy *again*. Thus we need to fully restore the proxy's actual mark, so that to-container's inherit_identity_from_host() call finds the expected magic ID. It then sets the TC_INDEX_F_FROM_INGRESS_PROXY flag, and skips the redirect to L7 proxy. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
5f9c8fb bpf: work around scrubbing of skb->mark during veth transition [ upstream commit 3a93b00269b1fb762b2c9f98ab67a6ed3a333dda ] Previously we set skb->mark in from_host@cilium_host, expecting the mark to remain unchanged after the kernel transmits the skb from cilium_host to cilium_net. The skb->mark is for instance used to transport IPsec-related information. However, as of 2023-10-19, kernel 5.10 still misses the backport patch[1] to fix a bug in skb_scrub_packet() which clears skb->mark for veth_xmit even if the veth pair is under the same netns: https://elixir.bootlin.com/linux/v5.10.198/source/include/linux/netdevice.h#L3975 To avoid hitting this issue, this patch sets metadata in skb->cb to survive skb_scrub_packet(), then to_host@cilium_net can retrieve this info and set the proper mark. Only the from_host bpf is setting cb, while the from_lxc bpf is still using mark. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within the same name space") Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
5390899 bpf_host can handle packets passed from L7 proxy [ upstream commit e78ff1690e4ab862057a6aefe5f0729340694254 ] Previously https://github.com/cilium/cilium/pull/25440 removed bpf_host's logic for host-to-remote-pod packets. However, we recently realized such host-to-remote-pod traffic can also be pod-to-pod traffic passing through the L7 proxy. This commit made bpf_host capable of handling these host-to-remote-pod packets as long as they originate from the L7 proxy. Fixes: cilium/cilium#25440 Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
823c594 Re-introduce 2005 route table [ upstream commit 217ae4f64183ce5112633338c88af2f16dfa8a14 ] [ backporter's notes: this is a custom backport to init.sh ] This commit re-introduced the 2005 routes that were removed by https://github.com/cilium/cilium/commit/9dd6cfcdf4406938c35c6ce2e8cc38fb5f2e9ea8 (datapath: remove 2005 route table for ipv6 only) and https://github.com/cilium/cilium/commit/c1a0dba3c0c79dc773ed9a9f75d5aa87b30f44f0 (datapath: remove 2005 route table for ipv4 only). Signed-off-by: Robin Gögge <r.goegge@gmail.com> Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
176422a Allow proxy replies to WORLD_ID [ upstream commit ac6385637a7bc39ec636e3808d3a5e9c13cb3c0e ] This is an alternative approach to fix cilium/cilium#21954, so that we can re-introduce the 2005 from-proxy routing rule in following patches to fix L7 proxy issues. This commit simply allows packets to WORLD as long as they are from ingress proxy. This was one of the solution suggested by Martynas, as recorded in commit message cilium/cilium@c534bb7: One fix was to extend the troublesome check https://github.com/cilium/cilium/blob/v1.12.3/bpf/bpf_host.c#L626 by allowing proxy replies to `WORLD_ID`. To tell if an skb is originated from ingress proxy, the commit extends the semantic of existing flags `TC_INDEX_F_SKIP_{INGRESS,EGRESS}_PROXY`, renames flags to clarify the changed meaning. Fixes: cilium/cilium#21954 (Reply from pod to outside is dropped when L7 ingress policy is used) Signed-off-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 March 2024, 07:25:45 UTC
6f15b00 chore(deps): update dependency cilium/cilium-cli to v0.16.0 Signed-off-by: renovate[bot] <bot@renovateapp.com> 06 March 2024, 01:48:12 UTC
4196a7f cli: Replace --cluster-name with --helm-set cluster.name [ upstream commit cfb11589e9758053f92c371a8fa71185f26f3f3f ] The --cluster-name flag got removed in cilium/cilium-cli#2351. Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 05 March 2024, 23:44:04 UTC
9a7a5d3 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 05 March 2024, 10:10:07 UTC
8c5ae98 chore(deps): update stable lvh-images Signed-off-by: renovate[bot] <bot@renovateapp.com> 04 March 2024, 20:10:57 UTC
edb5212 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 04 March 2024, 08:35:50 UTC
ee4d3cb ci/ipsec: Fix downgrade version retrieval [ upstream commit 6fee46f9e7531fd29ed290d5d4024dd951635e88 ] [ backporter's note: - e2e upgrade test doesn't exist in this branch. Removed it. - Minor conflict in tests-clustermesh-upgrade.yaml ++<<<<<<< HEAD + if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then + SHA="${{ inputs.SHA }}" + else + SHA="${{ github.sha }}" + fi ++======= + CILIUM_DOWNGRADE_VERSION=$(contrib/scripts/print-downgrade-version.sh stable) + echo "downgrade_version=${CILIUM_DOWNGRADE_VERSION}" >> $GITHUB_OUTPUT ++>>>>>>> 8c3b175f5d (ci/ipsec: Fix downgrade version retrieval) ] Figuring out the right "previous patch release version number" to downgrade to in print-downgrade-version.sh turns out to be more complex than expected [0][1][2][3]. This commit is an attempt to 1) fix issues with the current script and 2) overall make the script clearer, so we can avoid repeating these mistakes. As for the fixes, there are two things that are not correct with the current version. First, we're trying to validate the existence of the tag to downgrade to, in case the script runs on top of a release preparation commit for which the VERSION file has been updated to a value that does not yet have a corresponding tag. This part of the script is actually OK, but not the way we call it in the IPsec workflow: we use "fetch-tags: true" but "fetch-depth: 0" (the default), and the two are not compatible, as a shallow clone results in no tags being fetched. To address this, we retrieve the tag differently: instead of relying on "fetch-tags" from the workflow, we call "git fetch" from the script itself, provided the preconditions are met (we only run it from a Git repository, if the "origin" remote is defined). If the tag exists, either locally or remotely, then we can use it. Otherwise, the script considers that it runs from a release preparation Pull Request, and decrements the patch release number. The second issue is that we would return no value from the script if the patch release is zero. This is to avoid any attempt to find a previous patch release when working on a development branch. However, this logic is incorrect (it comes from a previous version of the script where we would always decrement the patch number). After the first release of a new minor version, it's fine to have a patch number at 0. What we should check instead is whether the version ends with "-dev". This commit brings additional changes for clarity: more comments, and a better separation between the "get latest patch release" and "get previous stable branch" cases, moving the relevant code to independent functions, plus better argument handling. We also edit the IPsec workflow to add some logs about the version retrieved. The logs should also display the script's error messages, if any, that are printed to stderr.
Sample output from the script:

  VERSION     Tag exists  Previous minor  Previous patch release
  1.14.3      Y           v1.13           v1.14.3
  1.14.1      Y           v1.13           v1.14.1
  1.14.0      Y           v1.13           v1.14.0
  1.14.1-dev  N           v1.13           <error>
  1.15.0-dev  N           v1.14           <error>
  1.13.90     N           v1.12           v1.13.89 <- decremented
  2.0.0       N           <error>         <error>
  2.0.1       N           <error>         v2.0.0 <- decremented
  2.1.1       N           v2.0            v2.1.0 <- decremented

[0] 56dfec2f1ac5 ("contrib/scripts: Support patch releases in print-downgrade-version.sh") [1] 4d7902f54a74 ("contrib/scripts: Remove special handling for patch release number 90") [2] 5581963cbf94 ("ci/ipsec: Fix version retrieval for downgrades to closest patch release") [3] 3803f539a740 ("ci/ipsec: Fix downgrade version for release preparation commits") Fixes: 3803f539a740 ("ci/ipsec: Fix downgrade version for release preparation commits") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 01 March 2024, 07:41:50 UTC
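The decision logic the commit describes, sketched in Go purely for illustration (the real implementation is the bash script contrib/scripts/print-downgrade-version.sh; the function name here is hypothetical, but its behavior on edge cases follows the sample table above):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// previousPatchRelease mirrors the script's rules: refuse development
// versions, use the tag for VERSION if it exists, otherwise assume a
// release-preparation commit and decrement the patch number.
func previousPatchRelease(version string, tagExists func(tag string) bool) (string, error) {
	if strings.HasSuffix(version, "-dev") {
		return "", fmt.Errorf("%s is a development version", version)
	}
	if tagExists("v" + version) {
		return "v" + version, nil
	}
	parts := strings.Split(version, ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("unexpected version format %q", version)
	}
	patch, err := strconv.Atoi(parts[2])
	if err != nil || patch == 0 {
		return "", fmt.Errorf("cannot decrement patch number in %q", version)
	}
	return fmt.Sprintf("v%s.%s.%d", parts[0], parts[1], patch-1), nil
}

func main() {
	exists := func(tag string) bool { return tag == "v1.14.3" } // pretend only v1.14.3 is tagged
	fmt.Println(previousPatchRelease("1.14.3", exists))     // v1.14.3 <nil>
	fmt.Println(previousPatchRelease("1.13.90", exists))    // v1.13.89 <nil> (decremented)
	fmt.Println(previousPatchRelease("1.15.0-dev", exists)) // error: development version
}
```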
51602e1 pkg/endpoint: remove reserved:init from endpoints [ upstream commit 96209790241da7be21e7c1be23e11afe1f84de4f ] [ backporter's note: Had the following conflict ++<<<<<<< HEAD ++======= + // If the endpoint is in an 'init' state we need to remove this label + // regardless of the "sourceFilter". Otherwise, we face risk of leaving the + // endpoint with the reserved:init state forever. + // We will perform the replacement only if: + // - there are new identity labels being added; + // - the sourceFilter is not any; If it is "any" then it was already + // replaced by the previous replaceIdentityLabels call. + // - the new identity labels don't contain the reserved:init label + // - the endpoint is in this init state. + if len(identityLabels) != 0 && + sourceFilter != labels.LabelSourceAny && + !identityLabels.Has(labels.NewLabel(labels.IDNameInit, "", labels.LabelSourceReserved)) && + e.IsInit() { + + idLabls := e.OpLabels.IdentityLabels() + delete(idLabls, labels.IDNameInit) + rev = e.replaceIdentityLabels(labels.LabelSourceAny, idLabls) + } + ++>>>>>>> 4ec84be6b1 (pkg/endpoint: remove reserved:init from endpoints) Took 4ec84be6b1's version. ] Previously, a bug introduced in e43b759bab69 caused the 'reserved:init' label to persist even after an endpoint received its security identity labels. This resulted in endpoints being unable to send or receive any network traffic. This fix ensures that the 'reserved:init' label is properly removed once initialization is complete. Fixes: e43b759bab69 ("pkg/endpoint: keep endpoint labels for their original sources") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 01 March 2024, 07:41:50 UTC
65f2d2e helm: Probe Envoy DaemonSet localhost IP directly [ upstream commit 29a7918faaa947fdae29fca17d83027c012b11e9 ] On IPv6-only clusters, querying localhost for the health check could attempt to check 127.0.0.1, presumably depending on host DNS configuration. As the health check does not listen on IPv4 when .Values.ipv4.enabled is false, this health check could fail. This patch uses the same logic as the bootstrap-config.json file to ensure a valid IP is always used for the health check. Fixes: #30968 Fixes: 859d2a9676c4 ("helm: use /ready from Envoy admin iface for healthprobes on daemonset") Signed-off-by: Andrew Titmuss <iandrewt@icloud.com> 01 March 2024, 05:15:58 UTC
b14d4ca bgpv1: Downgrade peer state transition logs to Debug [ upstream commit 148f81f1e3a486b8ab99ec50f2db52bb2e98f00f ] Users can now easily check the current peering state with `cilium bgp peers` command. Thus state transition logs become relatively unimportant for users. Downgrade the logs to debug level. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 01 March 2024, 05:15:58 UTC
08277b9 bgpv1: Remove unnecessary stat logs from neighbor reconciler [ upstream commit c00330cdf437c4a35d53b87709454cabdeb30c56 ] [ backporter's note: neighbor.go is still under pkg/bgpv1/manager/. Do the same change for pkg/bgpv1/manager/reconcile.go. ] We don't need to show create/update/delete counts because we show logs for all create/update/delete operations anyway. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 01 March 2024, 05:15:58 UTC
0fb82e9 bgpv1: Remove noisy logs from neighbor reconciler [ upstream commit 66e5de684e4ed49d806d0a6e2b1a6f272d87bb63 ] [ backporter's note: neighbor.go is still under pkg/bgpv1/manager/. Do the same change for pkg/bgpv1/manager/reconcile.go. ] Remove noisy logs generated for every single reconciliation. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 01 March 2024, 05:15:58 UTC
47badc6 bgpv1: Inform when the node is not selected anymore [ upstream commit 4c5f79d4af115393d625fed6eb455265b628a9a9 ] [ backporter's note: Initialize LocalNodeStore on test init and deinitialize on test deinit. ] When users stop selecting the node with CiliumBGPPeeringPolicy, BGP Control Plane removes all running virtual router instances. However, this is only reported at Debug level. Upgrade it to Info level, since this is important information that helps users investigate session disruptions caused by configuration mistakes. Also, the log is generated and full reconciliation happens even if there is no previous policy applied. This means that when there's no policy applied and any relevant resource (e.g. Service) is updated, it would generate the log and perform a full withdrawal needlessly. Introduce a flag that indicates whether there is a previous policy and conditionally trigger log generation and full withdrawal. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 01 March 2024, 05:15:58 UTC
6c7f81c bgpv1: Remove a noisy log in Controller [ upstream commit 329fefb06c0cd58ea2ff3e361e1fc9d70ada2ef4 ] [ backporter's note: Fix minor conflict due to the c.BGPMgr.ConfigurePeers fixture change. ] The Controller generates a log for every single reconciliation. This is noisy and not very useful, since users don't care about the reconciliation happening, but about its outcome. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 01 March 2024, 05:15:58 UTC
f3ac3cc endpoint: don't create endpoint with labels [ upstream commit cb1533394cb4f00a69f38fd1fd1e51790b18fde1 ] When an endpoint is created and `EndpointChangeRequest` contains labels, endpoint regeneration might not be triggered, as regeneration is only triggered when labels change. Unfortunately this does not happen when epTemplate.Labels is set with the same labels as `EndpointChangeRequest`. This commit fixes the above issue by not setting epTemplate.Labels. Fixes: #29776 Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz> 01 March 2024, 05:15:58 UTC
1dae60b lbipam: copy slice before modification in (*LBIPAM).handlePoolModified [ upstream commit 344180046abdb8c0057864031d7091d57aa91467 ] In Go 1.22, slices.Delete will clear the slice elements that got discarded. This leads to the slice containing the existing ranges in (*LBIPAM).handlePoolModified to be cleared while being looped over, leading to the following nil dereference in TestConflictResolution: ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ PANIC package: github.com/cilium/cilium/operator/pkg/lbipam • TestConflictResolution ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a8c814] goroutine 22 [running]: testing.tRunner.func1.2({0x1d5e400, 0x39e3fe0}) /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1631 +0x1c4 testing.tRunner.func1() /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1634 +0x33c panic({0x1d5e400?, 0x39e3fe0?}) /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/runtime/panic.go:770 +0x124 github.com/cilium/cilium/operator/pkg/lbipam.(*LBRange).EqualCIDR(0x400021d260?, {{0x24f5388?, 0x3fce4e0?}, 0x400012c018?}, {{0x1ea5e20?, 0x0?}, 0x400012c018?}) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/range_store.go:151 +0x74 github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolModified(0x400021d260, {0x24f5388, 0x3fce4e0}, 0x40000ed200) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:1392 +0xfa0 github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).poolOnUpsert(0x400021d260, {0x24f5388, 0x3fce4e0}, {{0xffff88e06108?, 0x10?}, {0x4000088808?, 0x40003ea910?}}, 0x40000ed080?) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:279 +0xe0 github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolEvent(0x400021d260, {0x24f5388?, 0x3fce4e0?}, {{0x214e78e, 0x6}, {{0x400034d1d8, 0x6}, {0x0, 0x0}}, 0x40000ed080, ...}) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:233 +0x1d8 github.com/cilium/cilium/operator/pkg/lbipam.(*newFixture).UpsertPool(0x40008bfe18, 0x40002a4b60, 0x40000ed080) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam_fixture_test.go:177 +0x148 github.com/cilium/cilium/operator/pkg/lbipam.TestConflictResolution(0x40002a4b60) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam_test.go:56 +0x3fc testing.tRunner(0x40002a4b60, 0x22a2558) /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1689 +0xec created by testing.(*T).Run in goroutine 1 /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1742 +0x318 FAIL github.com/cilium/cilium/operator/pkg/lbipam 0.043s Fix this by cloning the slice before iterating over it. Signed-off-by: Tobias Klauser <tobias@cilium.io> 01 March 2024, 05:15:58 UTC
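The Go 1.22 behavior change and the fix, reduced to a standalone example (simplified from the lbipam case, where the cleared element was a range pointer):

```go
package main

import (
	"fmt"
	"slices"
)

func ptr(i int) *int { return &i }

func main() {
	ranges := []*int{ptr(1), ptr(2), ptr(3)}

	// Since Go 1.22, slices.Delete zeroes the elements it discards at the
	// tail of the slice. Ranging over `ranges` itself while deleting from
	// it would therefore observe a nil pointer at the old tail index.
	// Cloning the slice before iterating keeps the loop safe.
	for _, r := range slices.Clone(ranges) {
		if *r == 2 {
			i := slices.Index(ranges, r)
			ranges = slices.Delete(ranges, i, i+1)
		}
	}
	fmt.Println(len(ranges)) // 2
}
```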
b13d6b5 slices: don't modify input slices in test [ upstream commit 32543a40b1b6c5335a89850a6a25e556c6b2fe8b ] In Go 1.22, slices.CompactFunc will clear the slice elements that got discarded. This makes TestSortedUniqueFunc fail if it is run in succession to other tests modifying the input slice. Avoid this case by not modifying the input slice in the test case but make a copy for the sake of the test. Signed-off-by: Tobias Klauser <tobias@cilium.io> 01 March 2024, 05:15:58 UTC
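The sibling pitfall with slices.CompactFunc, again as a minimal standalone sketch of the fix:

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

func main() {
	input := []string{"a", "A", "b"}

	// Go 1.22: CompactFunc zeroes the discarded elements of its argument,
	// so `input` would be left as ["a", "b", ""] — surprising any later
	// test that reuses it. Work on a copy instead.
	got := slices.CompactFunc(slices.Clone(input), func(a, b string) bool {
		return strings.EqualFold(a, b)
	})
	fmt.Println(got)   // [a b]
	fmt.Println(input) // [a A b] — unmodified thanks to the clone
}
```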
ec0b2e5 bpf: nodeport: add missing ifindex in NAT trace event Looks like I missed some parts when resolving conflicts in the backport for 1113d70. Fixes: d086a71be998 ("bpf: nodeport: populate ifindex in NAT trace event") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 29 February 2024, 18:03:50 UTC
23d47ca images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 27 February 2024, 19:25:35 UTC
853b986 chore(deps): update go to v1.21.7 Signed-off-by: renovate[bot] <bot@renovateapp.com> 27 February 2024, 19:25:35 UTC
54d889e chore(deps): update actions/download-artifact action to v4.1.3 Signed-off-by: renovate[bot] <bot@renovateapp.com> 27 February 2024, 09:58:35 UTC
b049f7d chore(deps): update quay.io/lvh-images/kind docker tag to v6.6-20240221.111541 Signed-off-by: renovate[bot] <bot@renovateapp.com> 27 February 2024, 09:57:15 UTC
5047d99 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 26 February 2024, 15:27:08 UTC
511880c chore(deps): update all-dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 26 February 2024, 15:27:08 UTC
e4872c2 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 26 February 2024, 15:07:12 UTC
fb26a16 gha: align again conformance clustermesh matrix entries with main Now that a known interoperability issue between external kvstore and wireguard has been fixed [1], let's also switch the last conformance clustermesh matrix entry to use the external kvstore, for symmetry with v1.15 and main. [1]: 2e7a1c3c6db7 ("node: Fix inconsistent EncryptKey index handling") Fixes: a5de29e58908 ("gha: extend conformance clustermesh to also cover external kvstores") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 22 February 2024, 20:02:40 UTC
d086a71 bpf: nodeport: populate ifindex in NAT trace event [ upstream commit 1113d7091166f94d10971e8401cdadd842f1cada ] This helps to clarify the exact origin of a TO_NETWORK trace event. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 22 February 2024, 09:36:28 UTC
1611ab0 bpf: nat: report pre-SNAT address in trace event [ upstream commit 2853c5232ee528c1a2f4550f57f19e19271a162f ] [ backporter's notes: obtaining the pre-SNAT address is a bit more complicated in the v1.14 code base ...] When applying SNAT to a packet, also report the original source address in the subsequent trace event. This helps to associate the internal and external view of a connection. We use the `orig_addr` field in the trace event, which was originally introduced back with b3aa583d494a ("bpf: Report original source IP in TRACE_TO_LXC") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 22 February 2024, 09:36:28 UTC
f49f484 policy: Only record an old entry if needed [ upstream commit 99168243892ab5d7fae9819cb58b80b31d8fb8ac ] Only record an old entry in ChangeState if it existed before this round of changes. We do this by testing if the entry is already in Adds. If not, then we record the old entry key and value. If the Adds entry exists, however, this entry may have only been added on this round of changes and we do not record the old value. This is safe due to the fact that when the Adds entry is created, the Old value is stored before adding the Adds entry, so for the first Adds entry the Old value does not yet exist and will be added. This removes extraneous Old entries that did not actually originally exist. Before this, ChangeState.Revert could restore an entry that should not exist, based on these extraneous Old entries. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 February 2024, 16:08:35 UTC
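A sketch of that record-only-if-preexisting rule, with simplified hypothetical types (the real code operates on policy map keys and entries):

```go
package policy

// changeState is a simplified stand-in for the policy ChangeState.
type changeState struct {
	adds map[string]struct{} // entries added during this round of changes
	old  map[string]int      // pre-change values, used by Revert
}

// recordOld saves oldValue for key only if the entry existed before this
// round: a key already present in adds was created during this round, so
// there is no genuine old value to restore on Revert.
func (cs *changeState) recordOld(key string, oldValue int) {
	if _, addedThisRound := cs.adds[key]; addedThisRound {
		return
	}
	if _, recorded := cs.old[key]; !recorded {
		cs.old[key] = oldValue
	}
}
```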
e8770c6 ci: Restrict running tests to only the organization-members team [ upstream commit b19321e0274cc168295e0c270275f0f835bbe2ae ] This commit updates the Ariane configuration to include the GitHub organization team 'organization-members' in the list of allowed teams. Consequently, only members of this specific team will have the authorization to initiate test runs via issue comments. Signed-off-by: Birol Bilgin <birol@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
4c4357f pkg: Add Bitwise LPM Trie Library [ upstream commit 27430d4b97fdfc3ae345cfe9f181d74c3eaf6996 ] This bitwise lpm trie is a non-thread-safe binary trie that indexes arbitrarily long bit-based keys with associated prefixes indexed from most significant bit to least significant bit using the longest prefix match algorithm. Documenting the behavior of the datastructure is localized around the method calls in the trie.go file. The tests specifically test boundary cases for the various methods and fuzzes the RangeLookup method. Updating CODEOWNERS to put sig-policy and ipcache in charge of this library. Fixes: #29519 Co-authored-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
cdee8b4 ci: change ariane config codeowners [ upstream commit bb81c06cb6a58251fb0aa966b89a52b457795433 ] The current process delegates the review of ariane-config.yaml changes to the contributing group. With this commit, reviewing responsibilities are transferred to the github-sec and ci-structure groups. Signed-off-by: Birol Bilgin <birol@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
62e5d62 ci: Update tested K8S versions across all cloud providers [ upstream commit 14d68f20830dd286be2c9710c0a10fb823ad019d ] This commit revises the Kubernetes versions tested for compatibility across all supported cloud providers. Additionally, it adjusts the default Kubernetes version to match the default version provided by each cloud provider Signed-off-by: Birol Bilgin <birol@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
a843481 ci: Address AKS release cycle gap [ upstream commit d7f5e58d55d77d5ad8e15cb7e564828ac6bf96ee ] In the AKS release cycle, a gap exists between the introduction of new supported Kubernetes versions and the removal of older versions, leading to failures in scheduled tests. This PR introduces the capability to disable older Kubernetes versions, mitigating test failures. Signed-off-by: Birol Bilgin <birol@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
2e10b6c Network performance: fix native routing case [ upstream commit dc6cf34a32859c78fe41252ed095895d31bab9f8 ] While fixing one of the review comments in the PR that introduced this test, I changed the datapath mode to be explicitly set from matrix.mode. Unfortunately, setting `native` makes it actually use `tunneling` mode. Switching to `gke` mode resolves this issue. Fixes #30247 Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
4726c98 labels: don't alloc a buf per label for SortedList [ upstream commit c1e21d8b7089b0b4fc51992685b223063f358d5f ] SortedList appears prominently in both CPU and Heap pprofs in a scenario with an FQDN policy matching S3. This is because DNS requests to S3 return short-lived DNS responses from a pool of many IPs, each of which will receive a CIDR identity in cilium. Since we furthermore call SortedList repeatedly for these CIDR identities (represented as a set of CIDR labels containing all super-CIDRs), avoiding ~32 buffer allocations per call is worth it.

Before:
Labels_SortedList-10          1000000   3079 ns/op   504 B/op   13 allocs/op
Labels_SortedListCIDRIDs-10     52702  21417 ns/op  3680 B/op   41 allocs/op

After:
Labels_SortedList-10          1000000   2164 ns/op   360 B/op    3 allocs/op
Labels_SortedListCIDRIDs-10     72180  15444 ns/op  1624 B/op    3 allocs/op

Benchstat:
                      │    old      │     opt                          │
                      │   sec/op    │   sec/op     vs base             │
Labels_SortedList-10    3.279µ ± 6%   2.209µ ± 6%  -32.65% (p=0.000 n=10)
                      │    B/op     │    B/op      vs base             │
Labels_SortedList-10    504.0 ± 0%    360.0 ± 0%   -28.57% (p=0.000 n=10)
                      │  allocs/op  │  allocs/op   vs base             │
Labels_SortedList-10    13.000 ± 0%   3.000 ± 0%   -76.92% (p=0.000 n=10)

pkg: github.com/cilium/cilium/pkg/labels/cidr
                             │    old      │     opt                           │
                             │   sec/op    │   sec/op      vs base             │
Labels_SortedListCIDRIDs-10    21.23µ ± 5%   14.89µ ± 9%   -29.87% (p=0.000 n=10)
                             │    B/op     │    B/op       vs base             │
Labels_SortedListCIDRIDs-10    3.594Ki ± 0%  1.586Ki ± 0%  -55.87% (p=0.000 n=10)
                             │  allocs/op  │  allocs/op    vs base             │
Labels_SortedListCIDRIDs-10    41.000 ± 0%   3.000 ± 0%    -92.68% (p=0.000 n=10)

Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
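The shape of the optimization, as a generic sketch rather than the actual labels code: sort once, then write every label through one shared buffer instead of allocating a formatting buffer per label.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// sortedList serializes sorted key=value pairs through a single shared
// buffer; the pre-optimization version formatted each label into its own
// freshly allocated buffer (~one allocation per label).
func sortedList(labels map[string]string) []byte {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b bytes.Buffer // reused across all labels
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(labels[k])
		b.WriteByte(';')
	}
	return b.Bytes()
}

func main() {
	fmt.Printf("%s\n", sortedList(map[string]string{
		"cidr:10.0.0.0/8": "",
		"k8s:app":         "web",
	}))
}
```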
d9245b0 identity/cache: only call SortedList for release [ upstream commit 9b43d42ae8f1eed0a781a98de772dfa1993fbde4 ] This is on the hot path when we have a fqdn policy for S3, where a single hostname maps to many IPs. This occurs due to a combination of factors: 1. We allocate a CIDR identity for each IP for a hostname which matches a fqdn policy. 2. For each CIDR identity, we generate labels for all super-CIDRs (i.e. CIDRs which contain this CIDR). 3. We opportunistically allocate an identity for all IPs which are matched by a fqdn selector. For those we already knew, other parts of the code decrement the ref count again. 4. We use the SortedList serialization as the key to lookup for existing identities here, which sorts and serializes the labels to a byte array, which is reasonably expensive. Since we can't as easily fix the other factors at play here, at least avoid doing it twice for each label set during the opportunistic acquire/release path. We can lookup by identity for the fast path, and build the string representation for the slower path if need be. We hit this path for every IP returned by S3 that's still being kept alive (be that because of DNS TTL or the zombie mechanism), hence we can easily get to 5k acquire/release pairs per DNS request. While we're at it, also reduce the critical section by moving the SortedList call outside in lookupOrCreate. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
92e239e labels/cidr: benchmark SortedList on CIDR labels [ upstream commit aa5f6998bda903837ddc560d4abf1b3d6d24898e ] This case is exercised heavily in the toFQDN policy incremental policy computation flow. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 21 February 2024, 15:58:15 UTC
19d7326 ci-e2e: restore 6.1 kernels I accidentally swapped out 6.0 with 6.6 instead of 6.1. Use 6.1 to have a more consistent configuration across branches. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 20 February 2024, 11:35:35 UTC
a51c7a8 chore(deps): update dependency cilium/cilium-cli to v0.15.23 Signed-off-by: renovate[bot] <bot@renovateapp.com> 20 February 2024, 11:31:58 UTC
1fb49c2 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 17 February 2024, 10:07:28 UTC
c48dce2 chore(deps): update docker.io/library/ubuntu:22.04 docker digest to e9569c2 Signed-off-by: renovate[bot] <bot@renovateapp.com> 17 February 2024, 10:07:28 UTC
13f2cd0 srv6: Fix packet drop with GSO type mismatch [ upstream commit 12e3ae9936bd82924a995cbf2a2eb6284d6db0cb ] When the Pod generates a TCP stream larger than the MSS, it may be sent as a GSO large packet. We observed that in such a case, the SRv6-encapsulated packet is dropped. The root cause was a misuse of ctx_adjust_hroom. We call it ctx_adjust_hroom(ctx, growth, BPF_ADJ_ROOM_MAC, 0), but this way, the helper is not aware of what kind of encapsulation we want to perform, so it doesn't adjust skb->inner_protocol (should be ETH_P_IP) and skb_shinfo->gso_type (should be SKB_GSO_IPXIP6 | SKB_GSO_TCPV4) appropriately. As a result, the packet will be dropped in ip4ip6_gso_segment due to the flag mismatch. Use the BPF_F_ADJ_ROOM_ENCAP_L3_IPV6 flag, which was introduced to solve this problem. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> 16 February 2024, 13:40:00 UTC
4bfd2ca workflows: Clean IPsec test output [ upstream commit 3c479d406ab1abc548b317f02ab2ccd1a3bb20ef ] The test output is riddled with logs such as: Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init) This gets particularly noisy when waiting for the key rotation to complete, during which time we run kubectl exec repeatedly. This commit fixes it. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> 16 February 2024, 13:40:00 UTC
bc3ac36 docs: Document XfrmInStateInvalid errors [ upstream commit c19a84ef74a57ccbed4ed6ecdf810cf0c030e689 ] This error can happen if a state is being destroyed while packets are in flight. It should be rare as the window in the kernel where it can happen is very short. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> 16 February 2024, 13:40:00 UTC
991d626 set R-bit in graceful restart [ upstream commit 11b01e397984763c283845ded34657682f9ce6fe ] Set the R-bit in graceful restart negotiation so that routes are not withdrawn by the Receiving Speaker. Details are in https://www.rfc-editor.org/rfc/rfc4724.html#section-4.1 Fixes: #28168 Signed-off-by: ArsenyBelorukov <arsenig.n@gmail.com> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 16 February 2024, 13:38:58 UTC
f26e34f Upgrade GoBGP to v3.23.0 To pull-in the upstream change (https://github.com/osrg/gobgp/pull/2761) to fix #30367. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 16 February 2024, 13:38:58 UTC
1b33c9b bgpv1: Make NeighborReconciler idempotent [ upstream commit 192f37cd00faf18ec48f1025cf470daa73ebad76 ] [ backporter's note: NeighborReconciler doesn't have any metadata infrastructure because we don't have the MD5 password feature, so we needed to introduce it in this commit. ] Currently, NeighborReconciler only reconciles on old vs new configuration differences and doesn't take GoBGP's actual running state into account. As a result, when the reconciliation of another reconciler fails and CurrentServer.Config is not updated, it tries to repeat the previous reconciliation again. However, when that involves adding a neighbor, it fails, because adding a neighbor is not idempotent in GoBGP. To solve this issue, we track the neighbor state in metadata and compare the diff between the new config and that state instead of the old config. This makes NeighborReconciler idempotent. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 16 February 2024, 04:59:57 UTC
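A sketch of the idempotency pattern described here, with hypothetical simplified types (the real reconciler keeps richer per-neighbor metadata and also handles updates and deletes):

```go
package bgp

// Neighbor is a simplified stand-in for a BGP neighbor configuration.
type Neighbor struct{ ID, Address string }

// router abstracts the operation that is not idempotent in GoBGP:
// adding a neighbor that already exists returns an error.
type router interface {
	AddNeighbor(n Neighbor) error
}

// neighborReconciler tracks which neighbors were actually applied to the
// router, keyed by neighbor ID, so reconciliation diffs against the real
// running state instead of the last *intended* configuration.
type neighborReconciler struct {
	applied map[string]Neighbor // metadata: neighbors known to exist on the router
}

// reconcile adds only the neighbors that are not already applied, making
// repeated calls after a partial failure safe.
func (r *neighborReconciler) reconcile(rt router, desired []Neighbor) error {
	for _, n := range desired {
		if _, ok := r.applied[n.ID]; ok {
			continue // already on the router; skip the non-idempotent add
		}
		if err := rt.AddNeighbor(n); err != nil {
			return err // retry later; r.applied still reflects reality
		}
		r.applied[n.ID] = n
	}
	return nil
}
```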
ec0425b bgpv1: Introduce a helper method to construct neighbor ID [ upstream commit 7684f35d055e367f37834cd131358c08df91e460 ] [ backporter's note: Moved neighbor reconciler's logic from individual file to reconciler.go and consolidate neighborReconciler function into NeighborReconciler.Reconcile. ] There are a lot of repetition of the same fmt.Sprintf in NeighborReconciler that constructs the same neighbor ID. Introduce a helper method to construct neighbor ID. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 16 February 2024, 04:59:57 UTC
b30f6aa bgpv1: Run reconcilers twice in the tests [ upstream commit 7275198b82fd7bcc6f6edc295097dde8500a64ab ] [ backporter's comment: All reconcilers are separated into files upstream. In v1.14, they are still in the same file. Needed to move test code. ] Extend existing reconciler test cases to always execute Reconcile() twice. This simulates the behavior that BGPRouterManager doesn't clear the BGP server on reconciliation failure and the next event occurs, and ensures the reconcilers are not relying on that behavior. As a result, existing reconcilers except NeighborReconciler passed the test. The fix for the neighbor reconciler will follow. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 16 February 2024, 04:59:57 UTC
5f2db81 bgpv1: Don't stop/unregister BGP server on failure [ upstream commit 1676530af81cb97b2c8859e8bdb544749a8cb5b5 ] [ backporter's note: v1.14 is missing various refactoring commits from upstream. Needed to adjust some variable names, field names, and so on. ] Currently, when the BGP Control Plane meets an error during reconciliation, it stops and unregisters the server. This carries a risk when we accidentally introduce a situation where reconcilers repeatedly return errors for a long time. In such a situation, the BGP Control Plane repeatedly creates/destroys the server, which may cause BGP session flapping or frequent route updates/withdrawals. We have this problem in two places. The first one is in BGPRouterManager.register(). When the initial reconciliation fails, the BGP Control Plane stops the server it just created. In this case, we don't need to stop the server. We can register the server with the RouterManager before reconciliation and don't have to unregister it even if the initial reconciliation fails. This is because we have retry logic now. The reconciliation failure triggers the retry and, in the next cycle, the reconciliation will be handled in BGPRouterManager.reconcile() because the server is already registered. The second problem is in BGPRouterManager.reconcile(). When the reconciliation fails, the BGP Control Plane stops the failing BGP server and unregisters it. Since we have retry logic, we don't need to stop BGP servers. We can return the error and retry later until we recover from the failure state. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 16 February 2024, 04:59:57 UTC
02fe280 test: fix default ipv6 subnet [ upstream commit e5e72abaa20499ba757450a425aade807b764543 ] Newer docker seems to refuse the default subnet we are using: level=warning msg="Unable to create docker network cilium-net: Error response from daemon: invalid network config: invalid subnet ::1/112: it should be ::/112\" Let's go with the machine's suggestion. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 15 February 2024, 09:38:49 UTC
44df1cc plugins/cilium-docker: don't force docker API version [ upstream commit 4fcb735684dc2decbc85e5faddc2d4314b4920f3 ] Get rid of docker version force, since it causes the following errors: error="Error response from daemon: client version 1.21 is too old. Minimum supported API version is 1.24, please upgrade your client to a newer version" subsys=cilium-docker-driver The reason why we hardcode the version has been lost to the sands of time it seems. We only use the plugin for testing so there shouldn't be much harm. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 15 February 2024, 09:38:49 UTC
56bf249 workflows: replace references to bpf-next with current LTS We currently don't update bpf-next kernels, since we don't want to cause additional regressions. This is a problem since lvh updates may break existing images. Instead of relying on bpf-next, use the current LTS kernel in workflows. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 15 February 2024, 09:38:49 UTC
b85a6ff chore(deps): update stable lvh-images Signed-off-by: renovate[bot] <bot@renovateapp.com> 15 February 2024, 09:38:49 UTC
5fd020a chore(deps): update all github action dependencies to v4 Signed-off-by: renovate[bot] <bot@renovateapp.com> 15 February 2024, 05:07:15 UTC