https://github.com/cilium/cilium

Revision Author Date Message Commit Date
2db45c4 Prepare for release v1.15.0 Signed-off-by: André Martins <andre@cilium.io> 31 January 2024, 19:26:55 UTC
b75fba3 node/manager: Fix encryptKey for health/ingress IPs [ upstream commit d4be9e87cce2c90df29a4727b20bdee7b4bc05e8 ] WireGuard-based encryption uses different values when it comes to the EncryptKey field in the IPCache: - For endpoints, the EncryptKey should always be non-zero if pod-to-pod encryption is enabled. - For nodes, the EncryptKey should be non-zero if node-to-node encryption is enabled, and if the node has not opted out of node-to-node encryption. When creating IPCache entries for regular endpoints, the EncryptKey value is taken from either the CEP/CES Kubernetes custom resource, or from the IP entry in the kvstore. However, the IPCache entries for the health and ingress endpoints are derived from the node resource. Before this commit, those entries therefore also used the node's EncryptKey. When a node opted out of node-to-node encryption however, that meant that we did not encrypt traffic for health and ingress to those nodes. Thus, this commit works around that issue by ignoring the node's EncryptKey for the health and ingress endpoints and instead always encrypting that traffic (which is what we do for regular endpoints with WireGuard encryption already). This commit can be considered a workaround. Ideally, we would have separate fields in the node resource for the node and its special endpoints. That however is a larger schema change, thus this commit focuses on backportable changes. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 31 January 2024, 15:36:07 UTC
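The two EncryptKey rules in this commit message, together with the health/ingress workaround, can be sketched as a small Go function. This is only an illustration of the rule as described: `entryKind`, `encryptConfig`, and `encryptKeyFor` are hypothetical names rather than Cilium's actual API, and the key value is a placeholder.

```go
package main

import "fmt"

// entryKind distinguishes the IPCache entry types discussed in the
// commit message. The type and its values are hypothetical.
type entryKind int

const (
	kindEndpoint        entryKind = iota // regular pod endpoint
	kindNode                             // the node's own IPs
	kindHealthOrIngress                  // health/ingress endpoints derived from the node resource
)

// encryptConfig is a hypothetical view of the relevant settings.
type encryptConfig struct {
	PodToPod   bool  // pod-to-pod encryption enabled
	NodeToNode bool  // node-to-node encryption enabled
	NodeOptOut bool  // this node opted out of node-to-node encryption
	Key        uint8 // placeholder non-zero key index
}

// encryptKeyFor returns the EncryptKey an IPCache entry should carry.
// Health and ingress entries follow the endpoint rule, so a node-level
// opt-out does not disable encryption for them.
func encryptKeyFor(kind entryKind, cfg encryptConfig) uint8 {
	if kind == kindNode {
		if cfg.NodeToNode && !cfg.NodeOptOut {
			return cfg.Key
		}
		return 0
	}
	if cfg.PodToPod {
		return cfg.Key
	}
	return 0
}

func main() {
	cfg := encryptConfig{PodToPod: true, NodeToNode: true, NodeOptOut: true, Key: 255}
	fmt.Println("node:", encryptKeyFor(kindNode, cfg))
	fmt.Println("health/ingress:", encryptKeyFor(kindHealthOrIngress, cfg))
}
```

With the opt-out set, the node entry gets a zero key while the health/ingress entry keeps the non-zero key, which matches the behavior the commit describes.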
24a07ba wireguard: Fix node-to-node encryption opt-out in kvstore mode [ upstream commit 930d4427817dc2fedbcb53c18f20243721c36cc5 ] This commit fixes an issue where the node's EncryptKey field in the kvstore was not set to zero for nodes which opted out of node-to-node encryption, thereby effectively encrypting traffic for nodes which should not have been encrypted. This was due to the fact that the logic zeroing the local node's EncryptKey based on `localNode.OptOutNodeEncryption` was only performed when writing to the K8s CiliumNode CRD (in `NodeDiscovery.mutateNodeResource`), but not when writing to the kvstore (in the controller started by `NodeDiscovery.updateLocalNode`). This commit fixes this issue by zeroing `localNode.EncryptKey` directly at the source of truth in the `LocalNodeStore`. In addition, we need to be careful when that value is being written: - To avoid flapping values, the EncryptKey value needs to be initialized before we publish the local node object to the kvstore or k8s. Therefore, initialization of the EncryptKey field needs to happen before `LocalNodeSynchronizer.InitLocalNode` returns. - In order to determine if the local node has to opt out of node-to-node encryption, we need to know the local node's labels. Those are only available after `localNodeSynchronizer.initFromK8s` has been called - therefore we cannot set the field in a hive constructor like we do for some other fields that need to be initialized early. Therefore, this commit moves the initialization of the WireGuard related fields into `LocalNodeSynchronizer.InitLocalNode`. This satisfies the above constraints for the `EncryptKey` and `OptOutNodeEncryption` values. The initialization of `WireguardPubKey` and the accompanying annotation can be performed earlier without loss of correctness, but to keep all local node initialization in the same place, we now also write those fields during `InitLocalNode`. This is safe, because `InitLocalNode` happens before the K8s/kvstore node object is published. This commit incidentally also fixes a bug where the `OptOutNodeEncryption` field in the WireGuard agent was read out too early, i.e. before `InitLocalNode` had a chance to initialize it. That bug had little consequence, as it only materialized in `cilium status` not reporting the opt-out status correctly, and caused node IPs to be unnecessarily added to the WireGuard peer list. Neither issue affected the encrypted traffic. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 31 January 2024, 15:36:07 UTC
109031a wireguard: Remove unused field member [ upstream commit 230bdd98450b8d3eff14b388ff5d388188f1c428 ] This commit removes the localNodeStore field from the WireGuard agent, as it is not accessed outside of the constructor. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 31 January 2024, 15:36:07 UTC
9274027 node: Remove superfluous if condition [ upstream commit 42910d414676a6a1786a05afca714bf3e5bd53ba ] This removes an unnecessary if condition in `GetEndpointEncryptKeyIndex`. If WireGuard is enabled, the WireGuard agent (see `pkg/wireguard/agent.NewAgent`) always sets a public key for the local node. Therefore this if condition is always true and we currently have no plans for nodes to have WireGuard enabled without them also producing a public key. Therefore this commit removes that superfluous condition. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 31 January 2024, 15:36:07 UTC
ec38881 node: Fix inconsistent EncryptKey index handling [ upstream commit 9cd56a2fbeed048b6e2709f9f5a55fe18d3b0bfe ] WireGuard-based encryption uses different values when it comes to the EncryptKey field in the IPCache: - For endpoints, the EncryptKey should always be non-zero if pod-to-pod encryption is enabled. - For nodes, the EncryptKey should be non-zero if node-to-node encryption is enabled, and if the node has not opted out of node-to-node encryption. Before this commit, we were deriving the EncryptKey of endpoints written to the kvstore IPCache (see `runIPIdentitySync`) based on the node's EncryptKey - which is the wrong source of truth for that value. Luckily, due to a bug, it is currently not possible for Cilium nodes running in kvstore mode to opt-out of node-to-node encryption, so the value was always effectively non-zero and the result was accidentally correct. However, as we want to fix the node-to-node opt-out mechanism in a subsequent commit, we ought to fix that first. Therefore, this commit fixes up all call sites which set an endpoint's EncryptKey to use the same source of truth. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 31 January 2024, 15:36:07 UTC
5c3fc96 node: drop unused SetOptOutNodeEncryption function [ upstream commit 0f01daa40eb5b5145b1383f5f926e3c744a18ef6 ] This global setter is no longer used since 95029ec3e99d ("daemon: Implement LocalNodeInitializer to fill local node info"). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 31 January 2024, 15:36:07 UTC
9e710c6 Fix error when using multiple allowRoutes namespaces in gateway Fixes: 30085 Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> 31 January 2024, 14:55:49 UTC
0f2a341 ci: update docs-builder Signed-off-by: Cilium Imagebot <noreply@cilium.io> 31 January 2024, 12:07:29 UTC
6fcf213 ci: Move gs bucket env variable to set-env-variables action. [ upstream commit e9b0ae0b54c9635c852e8cbcfa6bffb653aaf7ff ] Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
d93d705 Add network performance CI test. [ upstream commit 027bd96af890c865ce0c99788e991f85dfe65389 ] For now, we cover following matrix of features: - tunneling/direct-routing - no encryption/ipsec - hubble enabled/disabled All results are exported in a format compatible with Perfdash, where we can visualize results and see regressions/improvements for specific configurations. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
8520377 health-server: Fix various health probing bugs [ upstream commit 100818f882d8ba72544e51dd9859a0bb36e761f5 ] Fixes #29566 There were three issues with health-reporting/probing: - Whenever node was updated, it was received in nodesAdded and was overriding icmp result reporting node as unreachable - If Icmp probe stopped working and there were no node updates, it was reporting node as healthy even though probe was failing. - Http prober was not triggered at the start and only after probeInterval. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
e4b79dd health-server: remove unused proto parameter from resolveIP [ upstream commit ec6c6d0d5c731c58e9fdee6274520c38d48c658d ] Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
619e45d testing: Update kube-proxy-replacement flag values [ upstream commit 3f2f369433ba798a639f242021b542f3a663f264 ] The partial and strict values are being deprecated. Relates: #26036 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
8f759e9 docs: Update kube-proxy-replacement flag values [ upstream commit 11355d5df302774ed579a36d15dcfbe14afaf7b8 ] The partial and strict values are being deprecated. Relates: #26036 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
539ca3d gha: Update kube-proxy-replacement flag values [ upstream commit 12babe68cf1d5b3247d3b6356c5a4bffcabd5666 ] The partial and strict values are being deprecated. Relates: #26036 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
f4b358d wireguard: also account for tunnel overhead [ upstream commit 44c3dd0e6b1074ef9450d96eae8d523be13a21d2 ] Since #29000, packets are always encapsulated before they are encrypted with WireGuard. Therefore, we also need to take the tunnel overhead into account for the route MTU. This fixes a performance regression. Before this commit, the iperf3 bandwidth for WireGuard-encrypted pod-to-pod traffic was ~102 Mbits/sec. With this patch the bandwidth increases to 656 Mbits/sec. Without encryption the bandwidth is ~2 Gbits/sec. Fixes: b67291f039266418a9050dd47c4d01ff857865b8 Signed-off-by: Leonard Cohnen <lc@edgeless.systems> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
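The MTU arithmetic behind this fix can be sketched as follows. The overhead constants (80 bytes for WireGuard, 50 bytes for the tunnel) and the function name are assumptions chosen for illustration; the commit message does not state the actual values.

```go
package main

import "fmt"

// Illustrative overheads in bytes. These values are assumptions and are
// not taken from the commit message.
const (
	wireguardOverhead = 80
	tunnelOverhead    = 50
)

// wireguardRouteMTU returns the route MTU for WireGuard-encrypted traffic
// when packets are encapsulated before encryption: both overheads must be
// subtracted from the device MTU.
func wireguardRouteMTU(deviceMTU int) int {
	return deviceMTU - wireguardOverhead - tunnelOverhead
}

func main() {
	fmt.Println(wireguardRouteMTU(1500)) // device MTU minus both overheads
}
```

Subtracting only the WireGuard overhead would leave the route MTU too large by the tunnel overhead, so full-size packets would no longer fit on the wire after both headers are added.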
ee871c0 hive: Remove lifecycle type aliases and refactor uses [ upstream commit 19615a1c9c786b46461a89ce2395c31c2361747b ] This removes the now unnecessary type aliases for lifecycle related types from the hive package (hive.Lifecycle etc.) and refactors the uses to use cell.Lifecycle etc. Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
cb0ed6d hive: Add module ID to hive start/stop hook output [ upstream commit 07595dae12fcd52fbea8e339f8fca6d1a20fb2b2 ] The move of the lifecycle into separate package and having two implementations caused the hook output to be broken. This moves the lifecycle to the cell package and removes the internalLifecycle. With the lifecycle in the cell package we then add module IDs to the hooks to provide where the hook was added from as there's already many duplicate start/stop functions (e.g. from health reporter and job groups). "go run ./daemon hive" before: • hive.(*internalLifecycle).Append.func2 (pkg/hive/hive.go:185) • *job.group.Stop After: • *cell.reporterHooks.Stop (agent.datapath.bandwidth-manager) • *job.group.Stop (agent.datapath.l2-responder) Fixes: 7f94c897a1 ("hive: move lifecycle types to separate package.") Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
7e21bf4 ci/ipsec: Fix version retrieval for downgrades to closest patch release [ upstream commit 5581963cbf9489d980b9b5a8ccf1c5e017e35d3c ] This commit brings two fixes to the script that we use to determine to which version we should upgrade/downgrade in some CI workflows. The first fix is the most important one. When looking for the closest patch version, make the script return the value in VERSION instead of decrementing it. The rationale is that for stable branches, VERSION already points to the latest patch release, there is no need to decrease it further! This fix does not affect the output for the calculation of the previous minor version number. The second fix is simply the addition of an error message in case the minor version number is 0, to get some explicit error instead of a silent failure if we ever reach Cilium 2.0.0. Updated samples of numbers from VERSION and the corresponding values returned: VERSION Previous minor Previous patch release 1.14.3 v1.13 v1.14.3 1.14.1 v1.13 v1.14.1 1.14.0 v1.13 <error> 1.14.1-dev v1.13 v1.14.1 1.15.0-dev v1.14 <error> 1.13.90 v1.12 <error> 2.0.1 <error> v2.0.1 Fixes: 56dfec2f1ac5 ("contrib/scripts: Support patch releases in print-downgrade-version.sh") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
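The sample table in this commit message can be reproduced with a short Go sketch. The real logic lives in a shell script; the function names here are hypothetical, and treating patch numbers 0 and 90 as "no patch release available" is inferred from the sample rows rather than taken from the script itself.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// errNoTarget signals that no suitable version exists (the "<error>" cells).
var errNoTarget = errors.New("no suitable version")

// parseVersion splits "1.14.1-dev" into [1, 14, 1], ignoring any suffix.
func parseVersion(version string) ([3]int, error) {
	var nums [3]int
	core, _, _ := strings.Cut(version, "-")
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return nums, fmt.Errorf("malformed version %q", version)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nums, fmt.Errorf("malformed version %q: %w", version, err)
		}
		nums[i] = n
	}
	return nums, nil
}

// previousMinor returns the previous minor branch, e.g. "v1.13" for 1.14.x.
func previousMinor(version string) (string, error) {
	v, err := parseVersion(version)
	if err != nil {
		return "", err
	}
	if v[1] == 0 {
		return "", errNoTarget // explicit error instead of a silent failure
	}
	return fmt.Sprintf("v%d.%d", v[0], v[1]-1), nil
}

// previousPatch returns the closest patch release: the value in VERSION
// itself, not a decremented one. Patch 0 and 90 are treated as having no
// patch release (an assumption inferred from the sample table).
func previousPatch(version string) (string, error) {
	v, err := parseVersion(version)
	if err != nil {
		return "", err
	}
	if v[2] == 0 || v[2] == 90 {
		return "", errNoTarget
	}
	return fmt.Sprintf("v%d.%d.%d", v[0], v[1], v[2]), nil
}

func main() {
	for _, version := range []string{"1.14.3", "1.14.1-dev", "1.15.0-dev", "2.0.1"} {
		minor, errMinor := previousMinor(version)
		patch, errPatch := previousPatch(version)
		fmt.Println(version, minor, errMinor, patch, errPatch)
	}
}
```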
a36432c pkg/endpoint: fix endpoint health update always being ok. [ upstream commit 9bf7a8ec60e7637403bc09bbc00b1e28e322a86f ] In cases where the endpoint regen fails, such as in a complexity issue, the endpoint's reporter should be put into a degraded state. However, currently it will degrade the endpoint hr, and then immediately clear the degraded state with hr.Ok(...). This fixes that to only clear if there is no error while regenerating. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
7efa7b8 bpf: nodeport: split up ingress path when HostFW is enabled [ upstream commit 36c7d66282449265eb4a897a4e18b39cbcdb177d ] On modern kernels, bpf_host currently builds a tail_nodeport_nat_ingress_ipv4() that includes the code for RevSNAT, HostFW and RevDNAT. But at least for the 6.1 kernel we're scratching at the complexity limit and hitting verifier troubles in main [0] and 1.15 [1]. It's currently unclear why we can't reliably(!) reproduce those troubles in CI. Take the pressure off by splitting the tail-call into two parts, whenever the HostFW is enabled - leaving RevSNAT and HostFW in the first part, while RevDNAT is handled in a separate tail-call. This prevents the trace_ctx from the RevSNAT to reach the nodeport_add_tunnel_encap() call in the RevDNAT code, but that's acceptable for now. [0]: https://github.com/cilium/cilium/issues/30266 [1]: https://github.com/cilium/cilium/issues/30093 Co-developed-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
210fddb hubble-ui: release v0.12.3 [ upstream commit 3092ed1bc9e26136891976a0b30955da79eaa787 ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
e28a503 docs: egressgw: describe routing on Gateway node [ upstream commit e777df1b68a2e1763b33813b7f06a10765aeb442 ] https://github.com/cilium/cilium/pull/26215 changed how we do egressGW-specific routing on the gateway node - instead of installing custom IP rules, we rely on the node's routing setup. https://github.com/cilium/cilium/pull/30286 then fixed up a corner-case on older kernels. Reflect both parts in the docs. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
2a6659b docs: Update the Gateway API badge [ upstream commit 0983a5795ecdc5dc5a9909ff238d0a460978354e ] This is to update the Gateway API version to align with the conformance tests running in CI. Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
149d20e Fix where ls was an alias for flush in bpf auth [ upstream commit 7141c22936a432dfd5b46545bdc5478757de938e ] Remove a misplaced ls alias that caused cilium-dbg bpf auth ls to flush the map. Signed-off-by: Maartje Eyskens <maartje@eyskens.me> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
4b65c6a fixed typo [ upstream commit 746b0f3eb35fe311d2751023fc0e576b2a62f6f5 ] Signed-off-by: Nico Vibert <nicolas.vibert@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
eb99373 doc: Add Egress Gateway Policy warning [ upstream commit f30fd6044d827fd67f21d7ba11ab373562764dc5 ] This commit adds a warning to the Egress Gateway documentation to help users avoid deploying a known bad configuration. Co-authored-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: soggiest <nicholas@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
b508384 ci: update docs-builder [ upstream commit 068dc473ba94456770b6138c84832471715f0258 ] Signed-off-by: Cilium Imagebot <noreply@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
b9f4a11 build(deps): bump jinja2 from 3.1.2 to 3.1.3 in /Documentation [ upstream commit a388c42e5d29539f98698448f8388f36c71c751f ] Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.2...3.1.3) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> 31 January 2024, 12:07:29 UTC
5fa62a9 Fix quoting in nodeinit temporary cilium config [ upstream commit 87d948e7f25fccc07a5b1fa4e80dc97dc79f15b8 ] The Cilium nodeinit startup script lays down a temporary CNI config in order to be able to restart a version of containerd that doesn't allow a missing CNI config. This commit fixes an issue with missing double quotes in the temporary config which causes an error in containerd and leads to NotReady Kubernetes nodes. I also considered heredoc or escaping the quote characters but settled on single quoting as I think it's the most readable one-line solution without needing to deal with the indentation issue with heredoc. Signed-off-by: Tom Cowling <952241+tlcowling@users.noreply.github.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
c7427f0 bpf: lb: return drop reasons from __lb4_rev_nat() [ upstream commit 3932a4b9f4577a2133a6d436cc20542d9b48f8ef ] Fix up some ctx_load_bytes() usage to return a drop reason, and not the raw kernel errno. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
c80e752 init well-known identity before new policy repository [ upstream commit db14f4bf937b1ea1598d24942c5cb32d56fd7a80 ] Fixes: #30051 Signed-off-by: Yingnan Zhang <342144303@qq.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
9b58f67 docs: warn users that IPsec and KPR are mutually exclusive [ upstream commit 09f18fdce65b8b020f2a5c345e199396d8bc38b1 ] Signed-off-by: Filip Nikolic <oss.filipn@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
1b1d4fe proxy: fix rule deletion if protocol family is unsupported [ upstream commit 945ad0c6fe41d178513a80ecf9f49380624f3cee ] Currently we try to remove IPv6 proxy rules if the IPv6 option is disabled. This is to clean up those rules if a previously running agent has installed them but was restarted with a configuration change. This can fail if the underlying kernel has no IPv6 support. This commit fixes this, by allowing the necessary netlink syscall to fail with EAFNOSUPPORT. Fixes: #29965 Signed-off-by: Robin Gögge <r.goegge@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
7d0f31a daemon/cmd: Updates restoreIPCache() to use errors.Is() [ upstream commit d721b8be7df8372acc3832498638f8c5a699f0f5 ] Previously, the restoreIPCache() method would return an error on new installs because it was checking for the presence of the "file or dir missing" error but this error was being wrapped by another method in the call tree. This PR updates the restoreIPCache() method to use errors.Is() that reports whether any error in err's tree matches the target and thus reports a nil error on new installs when the "cilium_ipcache" file does not exist. Fixes: #29328 Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
958c136 fix: PromQL syntax on cilium policy query [ upstream commit d84a6507bea048c1eb912ccc3c54a50293748282 ] Signed-off-by: Ludovic Ortega <ludovic.ortega@adminafk.fr> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
ef9d1b7 Fix unchecked error in datapath/linux/ipsec [ upstream commit 5b04e085b4b9d0fedbd0afced0195551a50ea637 ] Ensure errs are checked for the calls below: - deleteNodeIPSecOutRoute - replaceNodeIPSecOutRoute Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
63c31e6 Encryption status refactored. [ upstream commit 000edcee7c0a264567781a8fee3114963a1b34d4 ] Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
84a8442 OpenAPI spec updated and used for encrypt status. [ upstream commit 7cdadbcea9c9f48e717a670ab287b1933703705a ] Global variable `countErrors` converted to the function local. Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
9e1b46f IPsec encrypt status JSON output implementation. [ upstream commit f299dc171e4f07c7ad4634f88c118a7154503d7f ] Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
90cdc16 linux/node: don't run validation functions if not yet initialized [ upstream commit 5eee9de362bba7e5af1d563b681c0ef1cd9937b7 ] The Node{Add,Update,Delete} functions of the linux node handler are already guarded in order not to execute the underlying logic if the node subsystem is not yet fully initialized. Once initialized, all updates are then automatically replayed. Yet, this does not apply to the NodeValidateImplementation and AllNodeValidateImplementation functions, which can also be invoked asynchronously, leading to a panic if not fully initialized (even without panicking, we would be enforcing an incorrect configuration, possibly disrupting existing connections): github.com/cilium/cilium/pkg/datapath/linux.(*linuxNodeHandler).nodeUpdate(0xc0022be1a0, 0x0, 0xc001936480, 0x0) /go/src/github.com/cilium/cilium/pkg/datapath/linux/node.go:1030 +0x142d github.com/cilium/cilium/pkg/datapath/linux.(*linuxNodeHandler).NodeValidateImplementation(_, {{0xc000f10720, 0x1b}, {0xc00068b2d8, 0x13}, {0xc000d6a3c0, 0x4, 0x4}, 0xc0005fc0e8, {0x0, ...}, ...}) /go/src/github.com/cilium/cilium/pkg/datapath/linux/node.go:1337 +0xc8 github.com/cilium/cilium/pkg/node/manager.(*manager).backgroundSync.func1({0x4019e80, 0xc0022be1a0}) /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:342 +0x9a github.com/cilium/cilium/pkg/node/manager.(*manager).Iter(0x3251f40?, 0xc001f0bdb8) /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:174 +0xdb github.com/cilium/cilium/pkg/node/manager.(*manager).backgroundSync(0xc00083c460, {0x400c390, 0xc00135b630}) /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:341 +0x4ab github.com/cilium/workerpool.(*WorkerPool).run.func1() Let's fix this by also checking the initialization status there. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
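The guard described in this commit can be sketched as a handler that skips validation until it has been initialized. The `nodeHandler` type and its methods are hypothetical and much simpler than the real linux node handler; the sketch only shows the shape of the check.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeHandler is a hypothetical, simplified handler with an
// initialization flag protected by a mutex.
type nodeHandler struct {
	mu          sync.RWMutex
	initialized bool
}

// markInitialized records that the node subsystem is ready.
func (h *nodeHandler) markInitialized() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.initialized = true
}

// validate runs check only once the handler is initialized. Before that
// it is a no-op, so an asynchronous caller cannot touch half-built state.
func (h *nodeHandler) validate(check func() error) error {
	h.mu.RLock()
	defer h.mu.RUnlock()
	if !h.initialized {
		return nil
	}
	return check()
}

func main() {
	h := &nodeHandler{}
	ran := false
	_ = h.validate(func() error { ran = true; return nil })
	fmt.Println("ran before init:", ran)
	h.markInitialized()
	_ = h.validate(func() error { ran = true; return nil })
	fmt.Println("ran after init:", ran)
}
```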
e252429 hubble: add support for TRACE_REASON_SRV6_{ENCAP,DECAP} [ upstream commit a6bfb7928e74f11211449f16ec778dc3e0721317 ] Consider encap/decap as egress/ingress (respectively) and both as unknown reply ct status. Signed-off-by: Alexandre Perrin <alex@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
bba0e9a srv6,bpf: add srv6 related trace reasons [ upstream commit 2de0feac405cfd711dbe82d14fd46d2ed2d6b9e3 ] Include a trace reason for SRv6 encapsulation and decapsulation. This greatly improves the debugging process, indicating whether SRv6 VPN related packets are processed by our datapath. Signed-off-by: ldelossa <louis.delos@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
1832ebc srv6,bpf: rename egress_policies.h to srv6.h [ upstream commit b09561c949a88d38384a2be101db56290993e870 ] The only functions left in egress_policies.h are SRv6 related. Let's rename this to 'srv6.h' and update references to the old file name. Signed-off-by: ldelossa <louis.delos@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
02a43b6 bpf: egressgw: handle missing L2 resolution in from-overlay [ upstream commit 918e47baf59effdf9b60096a3182fc32b693d276 ] With a previous patch, egress_gw_fib_lookup_and_redirect() now potentially doesn't redirect the packet, and just returns CTX_ACT_OK instead. Handle this by forwarding the packet to the stack, as was done prior to 9c1d1defb8ba ("egressgateway: Redirect from bpf_overlay to egress gw SNAT netdev"). Ideally this happens just once per connection - the pass through the stack should trigger a fresh ARP resolution, and subsequent traffic can obtain a L2 resolution from the FIB lookup. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
f803040 bpf: egressgw: tolerate missing L2 resolution on FIB lookup [ upstream commit e2760e62db7851d6459e3cea84f9b57aa55ddd92 ] When egress_gw_fib_lookup_and_redirect() in to-netdev selects the final egress interface for a packet (based on its desired EgressIP), the FIB lookup potentially returns BPF_FIB_LKUP_RET_NO_NEIGH. For 5.10+ kernels this is gracefully handled in fib_do_redirect() by redirecting to the neigh subsystem. But for older kernels we have no possibility to fall back to the NEIGH map, and the packet would just get dropped with DROP_NO_FIB / BPF_FIB_MAP_NO_NEIGH. Have egress_gw_fib_lookup_and_redirect() catch this case, and just let the packet continue on the current egress interface. Users that strictly require the *correct* egress interface need to run a 5.10+ kernel. We'll update the relevant code in from-overlay with a subsequent patch. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
94b30f6 bpf: complexity-tests: enable EgressGW for bpf-overlay [ upstream commit bc65ca3ad50fe0c4150f68d3a56c0cff9bbf7410 ] Commit 9c1d1defb8ba ("egressgateway: Redirect from bpf_overlay to egress gw SNAT netdev") introduced some EgressGW code into bpf_overlay. Cover it in the complexity configs. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
fceb007 l2announcer: Retry getting lease after losing it [ upstream commit 00b1e5e6031630cf5a8ab3da698505504b5da62d ] Once a service gets selected we start leader election. However, if we lose the lease for some reason, we don't retry getting it until the service is deselected and reselected, recreated, or the agent restarts. This commit surrounds the lease leader election logic with a loop that ends when the context is cancelled. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 January 2024, 12:07:29 UTC
94491f6 bgpv1: remove references to advertisement from CiliumBGPPeeringPolicy Advertisement field got introduced into CiliumBGPFamily type when adding v2 APIs. This field is only required in new CiliumBGPPeerConfig structures. This change removes advertisement from CiliumBGPFamily and introduces new type CiliumBGPFamilyWithAdverts which will be used in v2 APIs. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 31 January 2024, 00:43:21 UTC
34e3348 envoy: Bump envoy version for x/net library Relates: https://github.com/cilium/proxy/pull/510 Related build: https://github.com/cilium/proxy/actions/runs/7697413984/job/20974389523 Signed-off-by: Tam Mach <tam.mach@cilium.io> 29 January 2024, 22:05:14 UTC
9ab203f chore(deps): update docker.io/library/alpine docker tag to v3.19.1 Signed-off-by: renovate[bot] <bot@renovateapp.com> 29 January 2024, 10:14:16 UTC
87e7d23 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 28 January 2024, 20:06:17 UTC
434d442 chore(deps): update docker.io/library/golang:1.21.6 docker digest to 76aadd9 Signed-off-by: renovate[bot] <bot@renovateapp.com> 28 January 2024, 20:06:17 UTC
9d1ae5b images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 27 January 2024, 13:16:13 UTC
040b5a2 chore(deps): update docker.io/library/ubuntu:22.04 docker digest to e6173d4 Signed-off-by: renovate[bot] <bot@renovateapp.com> 27 January 2024, 13:16:13 UTC
ac37d81 chore(deps): update gcr.io/distroless/static-debian11:nonroot docker digest to 112a87f Signed-off-by: renovate[bot] <bot@renovateapp.com> 27 January 2024, 13:15:28 UTC
2682b34 chore(deps): update stable lvh-images Signed-off-by: renovate[bot] <bot@renovateapp.com> 26 January 2024, 19:07:00 UTC
69b62e8 envoy: Bump envoy image to fix SO_REUSEPORT with BPF TPROXY Currently, if BPF TPROXY is enabled (`bpf.tproxy=true`), the BPF socket lookup for the proxy port fails because Envoys Proxy listener socket is always configured with the socket option `SO_REUSEPORT`. It ignores the fact that port reuse on the Listener socket is explicitly disabled via Envoy Listener API (`enable_reuse_port=false`) if BPF TPROXY is enabled (due to incompatibilities). Therefore, this commit bumps the envoy image to the latest version that doesn't set the socket option `SO_REUSEPORT` on the Listener socket. Relates: cilium/proxy#505 Fixes: #27498 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 26 January 2024, 17:25:13 UTC
3b1f681 chore(deps): update hubble cli to v0.13.0 Signed-off-by: renovate[bot] <bot@renovateapp.com> 24 January 2024, 09:29:45 UTC
6dae3cf bpf: overlay: restore bpf_clear_meta() in from-overlay [ upstream commit 1ab043d546e52fb2428300e6c6ea35fa3bd7c711 ] Prior to 8ea31e07de2f ("bpf: Decapsulate traffic encapsulated with pod IPs") we were clearing the skb->cb on entry of from-overlay. For hs-ipcache this wasn't possible anymore, as from-netdev manually strips the tunnel encap and transfers its content via skb->cb. But we should still clear the skb->cb when hs-ipcache is disabled, and thus avoid handling stale data. Reported-by: Gray Lian <gray.liang@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
1d4cfae Add ServiceMonitor config for Agent Envoy when enabled [ upstream commit 3e09aa147fd1e9023561f88f5f9e305ddafa9ebb ] This commit adds Prometheus config to scrape Envoy metrics from the Envoy port (default 9964) on the Agent when Envoy is enabled. It uses the newer `.Values.envoy` section of the Helm chart, as we want to emphasize using that config regardless of where Envoy is running (in-agent or in a separate Daemonset). Signed-off-by: Nick Young <nick@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
647974a gha: explicitly specify beefier runner type for clustermesh workflows [ upstream commit b20038e242d7caa239c5973d2cc6a6865d03335e ] Clustermesh workflows need to set up two multi-node kind clusters, which don't fit well in the default GH runners (2 vCPU and 7GiB of RAM). Although GitHub recently upgraded [1] the default runners for OSS projects to 4 vCPU and 16GiB of RAM, let's still make it explicit that these workflows actually need that amount of power to run seamlessly. [1]: https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/ Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
5edb384 helm: Add extraVolumeMounts to config init container [ upstream commit fb4e5607ec8bd349bc05922fe93524c91693dc64 ] Signed-off-by: Andrii Iuspin <andrii.iuspin@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
828f040 doc: Add Azure CNI Powered by cilium as external installer [ upstream commit 8180cac8382dde4afdfe2e74bf2cac33355eb35a ] Added a doc updating the installation instructions for Cilium via an Azure CNI Powered by Cilium AKS cluster. Added a page describing delegated IPAM. Signed-off-by: Tamilmani <tamanoha@microsoft.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
1ee27c5 pkg/nodediscovery: Updates updateCiliumNodeResource() Warning Message [ upstream commit aef05232ec08ca6876ab057ff1b03ad407a74734 ] Previously, updateCiliumNodeResource() would emit a warning message whenever the k8s client could not get the local CiliumNode resource from the k8s api server. This caused the following benign log message for new installations since the CiliumNode resource has yet to be created: `level=warning msg="Unable to get node resource" error="ciliumnodes.cilium.io \"kind-control-plane\" not found" subsys=nodediscovery` This PR updates updateCiliumNodeResource() to only generate the warning message when the maximum number of attempts has been reached. Fixes: #29330 Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
617579a envoy: Bump envoy image to include proxy_protocol filter [ upstream commit f5c6b8a44f275b4697c24d4cd5ab74987fc93f88 ] Related build: https://github.com/cilium/proxy/actions/runs/7537100790/job/20515509923 Relates: https://github.com/cilium/proxy/pull/487 Fixes: https://github.com/cilium/cilium/issues/30180 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
41be945 bpf: nodeport: opt-out from neighbour map when XDP-forwarding via tunnel [ upstream commit cc25b9194c89be5d5b0f245335cae5b0d74f21c0 ] When XDP manually builds the tunnel headers and forwards to a remote node, it makes no sense to rely on the neighbour map for L2 resolution. We have to trust that the agent installs managed neigh entries for all other nodes, and thus the FIB lookup will always return a L2 resolution. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
ad2b84e bpf: fib: refactor fib_do_redirect() [ upstream commit c66d1e16ba3471564b80d37db90040c1ed624dcc ] Clarify the different paths of L2 resolution: 1. when the neigh-resolver is available, always use it. Forward the next-hop info from a preceding FIB lookup where available. 2. otherwise fallback to the neigh map, for callers that have opted in. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
893484d bpf: fib: fix DMAC rewrite with ENABLE_SKIP_FIB [ upstream commit f3c34162e9de55537a1c54cc71cd90fb63a0997f ] A recent FIB refactor introduced a bug, where fib_redirect*() no longer performs a FIB lookup if ENABLE_SKIP_FIB is set. But for configs without neigh-resolver, some code paths (that can't fall back to the neigh map) strictly require this FIB lookup to obtain the next-hop's MAC address. Fix things by reintroducing the FIB lookup when neigh_resolver_available() returns false. Fixes: e30e18b646f6 ("bpf,fib: use fib_do_redirect in fib_redirect") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
a5cb3cc bpf: fib: require opt-in for neighbour map fallback in fib_do_redirect() [ upstream commit bb06f2eb1fba730f2b548b660d4b1ab7593843b5 ] The neighbour map is populated by the inbound nodeport path, and used to cache the client's MAC address. Therefore it only makes sense to use this fallback in the LB's reply path. Opt-out from using it in - the LB NAT forward path - the LB DSR forward path - the outbound EgressGW paths - bpf_lxc's reply path, as that's only used with ENABLE_HOST_ROUTING and thus can always use the neigh-resolver. Note that callers which can't use the neigh-map will need *some* sort of toleration for failed L2 resolution / DROP_NO_FIB result. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
afabe56 test/controlplane: add field filterlist case for ciliumnodelist. [ upstream commit 5a00ed7fa0aa0bca5eb58da4afd7c7a7701dfe74 ] This fixes panic in controlplane tests introduced by previous commits related to CiliumNode Resource[T]. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
9668600 daemon: add unit test for local node init from k8s. [ upstream commit 3a2267bc6521f52e84266d5e2115c2d35102e0db ] This tests code path where node ip4/ip6 are not configured manually and thus restoration is attempted from local Node/Cilium node objects. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
c8f9b62 daemon: use local CiliumNode resource to populate CiliumInternalIP. [ upstream commit c67d4eeea73c9d192776afab6ebc4f6107a21dc3 ] It appears that during recent refactors, restoring cilium addresses from k8s node objects would only use k8s Node types. However, in order to restore cilium_host router interface IP from k8s (the prioritized restore method), the agent needs to find an IP of type CiliumInternalIP. This type is only enumerated on CN types, not K8s Nodes, so in its current state all attempts to restore from k8s would return a nil IP. As well, we've noticed that non-k8s restorations can occasionally produce unexpected new IPs causing issues when running in vxlan/ipsec mode due to delay between xfrm state and the router ip being emitted via apiserver. Note: Most cilium host restores should succeed on the configuration based restore which takes precedence over k8s based restore. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
ef63ef6 gha: postpone checkout of the untrusted context [ upstream commit 0c080f64c22cc43b9bf035216ddc6614ad68366b ] As an additional security measure, let's postpone the checkout of the untrusted context after the setup of the test environment. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
f2f568a gha: keep trusted and untrusted paths separate, and simplify actions ref [ upstream commit 247e6e0fcf4879fef966b6bccd2f692241a63015 ] A few GHA workflows got recently modified to hardcode the repository and branch of the actions hosted locally (e.g., [1]). This was a security measure, as they are triggered after checking out the untrusted context (i.e., PR branch), and thus it would be possible for an external PR to inject malicious code. Yet, at the same time, this change mostly defeats the smooth development process enabled by ariane (which automatically uses the workflow and context from the PR for trusted branches -- i.e., in cilium/cilium), requiring again to manually modify those references for testing purposes. Similarly, it also requires manual adaptations when changes are backported to stable branches, or to allow running them from forks, which are easy to overlook. As an alternative solution, let's only check out the helm chart from the untrusted context in a separate directory, without overriding any of the trusted files (i.e., from the target branch) retrieved initially. This way, we are guaranteed that the local github actions are always trusted (as we are not overriding them, nor are we executing any script which could modify them), and can be invoked directly, without any additional constraint. A key aspect for this is that helm charts cannot execute arbitrary code in the client host. Another difference, compared to the previous approach, is that now we also execute the `./contrib/scripts/kind.sh` script from the trusted context (i.e., target branch) instead of the PR context. However, this file is effectively part of the workflow definition, and this change brings consistency with the rest of it. The same also applies for the Gateway API conformance tests. 
[1]: 654d92f29c4f ("ci-e2e: Use lvh-kind in secure way") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
4ad24e0 gha: improve conformance-clustermesh workflow coverage [ upstream commit cbae172f82c0e519a5f3d05ac8c31b204531b795 ] Extend the conformance clustermesh workflow to additionally run the tests which require the presence of an extra Kubernetes node where Cilium is not running. In particular, north/south loadbalancing (i.e., global service NodePorts accessed from outside the cluster) and compatibility between ingress and global services. To this end, the test clusters now include one control-plane node and two workers. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
b7aefa7 gha: prevent circular dependency in clustermesh-upgrade workflow [ upstream commit 00ed827d9c6f7962df34f3a500468a1b42c7f2f4 ] The simultaneous restart of the clustermesh-apiserver pods in both clusters after rolling out all agents can lead to a circular dependency when Cilium is configured in tunneling mode and KPR=true [1]. For the moment, let's avoid triggering this scenario in CI, as it is unlikely to happen in real environments. We never hit this issue before because we only had one worker node, which is targeted by the NodePort, and apparently the clustermesh-apiserver was always scheduled there. [1]: https://github.com/cilium/cilium/issues/30156 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
9ede93d gha: increase ip-identities-sync-timeout in clustermesh-upgrade [ upstream commit ab2a1492cfe0592b4b7bcd27e41f464e1976f713 ] Currently, it matches the `cilium clustermesh status` wait timeout, making it harder to pinpoint the cause of possible failures, as changes may intervene before collecting the sysdump. Let's raise it to decorrelate the two timeouts. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
5d788ad gha: test highest possible cluster ID in conformance clustermesh [ upstream commit df3ab28daf71e4aebe8833ea950664d8448f66fe ] 809764feed5b ("workflow/clustermesh: set maxConnectedClusters") extended the conformance clustermesh tests to additionally configure the maximum number of possible clusters (either 255 or 511). Let's also configure the two clusters with the extreme cluster ID values, to make sure that the entire range works as expected. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
5575547 gha: drop duplicate bpf.monitorAggregation in conformance clustermesh [ upstream commit b48a281bf294ce610a057a84f86ef49e5d97aa8d ] It is already configured by the helm-default action, so let's remove the additional explicit configuration. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
05e35b8 gha: extend clustermesh upgrade to also cover external kvstores [ upstream commit 683a8e620f40bf35ec79738447bfff2787dea0fc ] [ backporter's notes: in Cilium v1.14 and earlier, the clustermesh configuration secret is created only when the clustermesh-apiserver is enabled. For this reason, we need to enable it also when actually connecting to a remote kvstore cluster, although with zero replicas. ] Let's extend the clustermesh upgrade/downgrade workflow with a new matrix entry to also cover the external kvstores configuration. We leverage the newly introduced kvstore action to setup the etcd containers and retrieve the appropriate parameters. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
001a478 gha: improve max connected clusters coverage in conformance clustermesh [ upstream commit 5e8f85d55689c8164689408959ee3a711ac746c3 ] Make sure that the max connected clusters option works as expected in all configurations: clustermesh, kvstoremesh and external kvstore. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
6112947 gha: extend conformance clustermesh to also cover external kvstores [ upstream commit 403b3a265085ea3e08c72b59ee2d5dd2836fe4a4 ] Let's extend the conformance clustermesh workflow to also cover the external kvstores configuration in addition to plain clustermesh and kvstoremesh. To avoid increasing the number of matrix entries, let's convert two of the already existing ones over to this mode. We leverage the newly introduced kvstore action to setup the etcd containers and retrieve the appropriate parameters. Cluster Mesh configurations are directly specified at installation time, as 'cilium clustermesh connect' does not support this scenario. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
9d7947a gha: slight matrix generalization in conformance clustermesh [ upstream commit b311e79dd8f5126dfd0e917459e1d4f5d286f377 ] As a preparation for the subsequent commit, let's slightly generalize the matrix definition in the conformance clustermesh workflow, replacing the current 'kvstoremesh' boolean entry with 'mode', which can be set to either 'clustermesh', 'kvstoremesh', or, soon, 'external'. Additionally, let's also shuffle a bit the other parameters, to increase the coverage of dual stack clusters and avoid losing coverage due to the subsequent changes. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
2ea5b6f gha: introduce kvstore action [ upstream commit 425b459e018af8a8af8a18f30159681a565dd078 ] [ backporter's notes: skipped renovate and codeowners changes, as only relevant in the main branch. ] Introduce a new GHA action responsible for generating the appropriate TLS certificates and starting the given number of single replica etcd clusters. It is intended to be leveraged by different workflows (e.g., clustermesh ones) to test Cilium when configured to connect to an external kvstore. In detail, it takes as input: * the number of single replica etcd clusters to be created; * the etcd image, which should be overridden only for testing purposes, as automatically bumped by renovate; * the base name of each container (to which the index is appended); * the Docker network the containers are attached to; and returns as output: * the path to the definition of the cilium-etcd-secrets secret, containing the TLS information to connect to the external kvstore; * the parameters to configure Cilium to connect to the external kvstore; they are parametrized through the KVSTORE_ID variable to specify the ID of the kvstore to connect to; * the clustermesh configuration to connect each cluster to all the remote ones (except for the cluster names, which should be specified externally). Let's additionally assign the new action to the kvstore and sig-clustermesh teams for review, as well as extend the renovate configuration to automatically update the etcd image when appropriate. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 24 January 2024, 08:45:12 UTC
051c06b conformance-e2e: enforce no missed tail calls occurring during tests [ upstream commit f28817b682415399bfb841bea3efe67a030dd1e1 ] Signed-off-by: Timo Beckers <timo@isovalent.com> 23 January 2024, 12:56:39 UTC
7c44247 loader: install an ELF's policy programs before attaching tc/xdp hooks [ upstream commit 4651129e13921825c53e3850b4a53ed8eac0b506 ] See code comments for a detailed description of the problem. This commit installs policy programs before attaching tc/xdp hooks since doing things in the wrong order means dropping tail calls when handling traffic if the policy programs aren't inserted. Signed-off-by: Timo Beckers <timo@isovalent.com> 23 January 2024, 12:56:39 UTC
d79035b bpf: lower pending map removal warning to info level [ upstream commit 760a109e0cb5d5a0b4c65811f43a2c5ca972a2ae ] This has been making ci-ginkgo fail recently. With the removal of map migrations around the corner (https://github.com/cilium/cilium/issues/29333), and having declared bankruptcy on the Ginkgo test suite, let's not waste more time chasing this bugbear. Signed-off-by: Timo Beckers <timo@isovalent.com> 23 January 2024, 12:56:39 UTC
d5b066e loader: ignore context cancellations during map migration [ upstream commit 385dbe51bff4f474f5267b8773532fc00ad6a389 ] Allowing replaceDatapath() to be cancelled in the middle of an ongoing map migration is a potential source of chaos. We've recently seen some flakes with errors like `Removed pending pinned map, did the agent die unexpectedly?`, so let's remove this context check to reduce the likelihood of that happening. Signed-off-by: Timo Beckers <timo@isovalent.com> 23 January 2024, 12:56:39 UTC
2fee9c9 Update v1.15.0-RC.1 digests Signed-off-by: André Martins <andre@cilium.io> 16 January 2024, 13:39:55 UTC
f582c55 Prepare for release v1.15.0-rc.1 Signed-off-by: André Martins <andre@cilium.io> 16 January 2024, 12:47:13 UTC
3a0ea0f ci: Add a call to the update label backport action [ upstream commit 7fc78e9d5f42d84d71d699f6de5cfb701438e06a ] Add an action to call the workflow that updates the labels of backported PRs in stable branches. This commit is based on the following commits by Fabio from v1.14 branch: - 81ade5f693b8 ("ci: Call the workflow to update labels of backported PRs") - a5a047f2fa84 ("ci: Use pull_request_target in update label workflow") The primary change here is to list all maintained branches in a single workflow on main in order to simplify the maintenance burden when creating new stable branches (eg, during v1.15 stable branch creation). This action will not trigger from the main branch for PRs targeted to stable branches. However, when we copy this workflow to stable branches, it will run for PRs targeted to that stable branch (assuming that the versions referenced in this file are kept in sync with the branch version). Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Co-authored-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 January 2024, 12:14:49 UTC
13915b1 daemon: Remove obsolete bpf-lb-dev-ip-addr-inherit option [ upstream commit 3381e0f3326a1ac4bacee3044ef96298dea0f6a5 ] This option was added for a niche use-case that no longer needs it, and the agent no longer supported it. Remove the remaining code related to it. Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 January 2024, 12:14:49 UTC
0aa4fed L7LB: fix Envoy backend (endpoint) synchronization [ upstream commit 7318ce2d0d89a91227e3f313adebce892f3c388e ] Currently, when multiple `CiliumEnvoyConfig`s reference the same backend service on different ports, the `frontendPorts` that are used to filter the backends is always overwritten with the ports of the last modified CEC. As a result, not all the Cilium Backends are synchronized to Envoy as Endpoints. This breaks connectivity. Therefore, this commit fixes the frontendPorts by using the ports of all referencing CiliumEnvoyConfigs. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 January 2024, 12:14:49 UTC
a6a2641 ci-clustermesh-upgrade: Adjust name of test, to match cilium-cli's [ upstream commit 501cb42a54b2af7637e0e72da882f6d44edf25d6 ] At some point (v0.15.18), connectivity test "no-missed-tail-calls" was renamed as "no-unexpected-packet-drops" in cilium-cli [0]. We now use a cilium-cli version that contains the change, but we did not update the name of the test to run in the workflow. Let's adjust it now. [0] cilium/cilium-cli@4880c91a726d ("connectivity: Check for unexpected packet drops") Fixes: 16fe16637833 ("gh/workflows: Bump CLI to v0.15.18") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 January 2024, 12:14:49 UTC
c57fe67 policy: Fix MapState.Equals() [ upstream commit 862fcd56b37465bf46717f4248f3ad29c019b0ff ] Compare the entries of 'msA' and 'msB' rather than 'msB' against itself. Simplify the body of the comparison function for readability. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 January 2024, 12:14:49 UTC
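The class of bug described in c57fe67 can be illustrated with a minimal sketch. The `mapState` type and both functions below are hypothetical stand-ins written for this note, not Cilium's actual `MapState` code; they only show how a comparison that reads from `msB` on both sides always succeeds, and how comparing `msA` against `msB` behaves instead.

```go
package main

import "fmt"

// mapState is a hypothetical, simplified stand-in for a policy map state.
// It is NOT Cilium's MapState type.
type mapState map[string]int

// equalsBuggy shows one way such a bug can look: the loop iterates msB
// and looks each key up in msB again, so every entry trivially matches.
func equalsBuggy(msA, msB mapState) bool {
	if len(msA) != len(msB) {
		return false
	}
	for k, vB := range msB {
		if v, ok := msB[k]; !ok || v != vB {
			return false
		}
	}
	return true
}

// equals compares every entry of msA against the matching entry of msB.
// Equal lengths plus "every key of msA is in msB with the same value"
// implies the two maps hold the same entries.
func equals(msA, msB mapState) bool {
	if len(msA) != len(msB) {
		return false
	}
	for k, vA := range msA {
		if vB, ok := msB[k]; !ok || vB != vA {
			return false
		}
	}
	return true
}

func main() {
	a := mapState{"allow:80": 1}
	b := mapState{"allow:80": 2}
	fmt.Println(equalsBuggy(a, b)) // true: the difference goes unnoticed
	fmt.Println(equals(a, b))      // false: the entries differ
}
```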
6dfa9af compile: avoid nil deref of Cmd.ProcessState if compileCmd fails to start [ upstream commit 1275493e4107cf36e0ed1e76f39188ea40464333 ] The gotcha with Cmd.ProcessState is documented in comments. I'm not sure if we're really interested in Maxrss of failed compilations, or if it really needs to be debug-logged. For troubleshooting something like this, we'd want to reproduce this locally anyway, at which point we can hack in a few log lines. I didn't want to switch to a separate Cmd.Start() and Cmd.Wait(), so the maxrss logic was consolidated into a single block, only executed when compilation was successful, where Cmd.ProcessState is guaranteed to be set. Fixes #29989. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 January 2024, 12:14:49 UTC
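The gotcha mentioned in 6dfa9af is that `exec.Cmd.ProcessState` is only populated once a process has actually been started and waited for; if the command fails to start, the field stays nil. A minimal sketch of the guarded pattern follows, assuming a Unix-like platform where `SysUsage()` returns a `*syscall.Rusage`. The helper name `runAndReportMaxRSS` and the binary path are hypothetical and are not taken from the Cilium source.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// runAndReportMaxRSS is a hypothetical helper. It runs the command and
// reads the peak RSS only after a successful run, where ProcessState is
// guaranteed to be set. On any error it returns early and never touches
// ProcessState, which is nil when the process could not be started.
func runAndReportMaxRSS(name string, args ...string) (int64, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Run(); err != nil {
		return 0, err
	}
	// Assumption: Unix-like OS, where SysUsage yields *syscall.Rusage.
	ru, ok := cmd.ProcessState.SysUsage().(*syscall.Rusage)
	if !ok {
		return 0, nil
	}
	return int64(ru.Maxrss), nil
}

func main() {
	// The path below is deliberately nonexistent, so the command cannot start.
	_, err := runAndReportMaxRSS("/nonexistent/hypothetical-compiler")
	fmt.Println("start failed:", err != nil)
}
```

Consolidating the resource-usage logic into the success path, as the commit describes, avoids splitting the call into separate `Start()` and `Wait()` steps.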
053bb4e option: Add --dnsproxy-enable-transparent-mode (default false) [ upstream commit 35162d1fb57c6aaeb8a57d3ec866625f1a2838b5 ] Add dnsproxy-enable-transparent-mode option to enable DNS Proxy transparent mode. If 'true', Cilium DNS proxy will use the original source address of the source pod in the forwarded DNS requests. Local host sources and destinations are excepted due to networking stack compatibility reasons, but the use of the original address is typically not significant for node local traffic. Defaults to 'false' for backwards compatibility for upgrades, or to 'true' for Cilium 1.12 onwards. Transparent mode is not compatible with CNI chaining modes, so if CNI chaining is used, transparent mode will not be set unless explicitly set with helm value 'dnsProxy.enableTransparentMode=true'. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 15 January 2024, 19:28:43 UTC
65eefa1 dnsproxy: Do not use original source when not possible [ upstream commit 824e969f26d8bc68ee1a00cddbe25c29a876544c ] Do not use original source for server running in the local node, or when the destination is outside of the cluster, as there is a risk of missing masquerade on the upstream connection. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 15 January 2024, 19:28:43 UTC