https://github.com/cilium/cilium

Revision Author Date Message Commit Date
0861396 Revert "bpf: drop SVC traffic if no backend is available" This reverts commit 183501124869aca9d4d5d596e30c32cd00261b91. 23 February 2023, 13:40:02 UTC
d3385c4 Revert "bpf: Block service connections if no backend is available" This reverts commit 20c5bfd4c5190240dc88656bed1fa01d32269c68. 23 February 2023, 13:39:47 UTC
c24d7e9 Revert "bpf: Return -EADDRINUSE from post_bind where possible" This reverts commit 6937f679d02af3508607b7ddab75e527059403ae. 23 February 2023, 13:39:39 UTC
c9723a8 Prepare for release v1.13.0 Signed-off-by: André Martins <andre@cilium.io> 15 February 2023, 13:50:52 UTC
641107f Revert "Pick up etcd v3.5.7" [ upstream commit 11773bcb74a7ea4d1b2c40076d7ad6e8b055e35e ] This reverts commit 3443288ac520c56051cd5db29c06784b4e353770. Let's revert this change until we address #23760. Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 15 February 2023, 12:43:34 UTC
749d432 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 15 February 2023, 10:41:21 UTC
c366dea images: golang version to 1.19.6 Since Golang 1.20.0 had a regression we have decided to release 1.13.0 with 1.19.x instead. However the change made in 626ce4d8c4a8ca84a2ff1287a6780235af4877e4 was merged but not reverted. This commit sets the Golang version to the last stable release, which also includes some security bug fixes. Signed-off-by: André Martins <andre@cilium.io> 15 February 2023, 10:41:21 UTC
3bcda46 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 14 February 2023, 13:17:01 UTC
8622f54 images/runtime: bump FORCE_BUILD to force regeneration This will force a regeneration of the image build which will fix some vulnerabilities in the last Cilium image. Signed-off-by: André Martins <andre@cilium.io> 14 February 2023, 13:17:01 UTC
19245f3 fix:file fd should be closed [ upstream commit a97aa9c6a1da1ecc97faf29d21f542751bb350c0 ] fix:file fd should be closed Signed-off-by: jiuker <2818723467@qq.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
7d5fb1d k8s: simplify (*Service).DeepEqual [ upstream commit 32e0021bd915e13f381ba642f827c28e21949fa8 ] The generated (*Service).deepEqual method already checks the Shared, IncludeExternal and ServiceAffinity members (among others). Thus we can drop the explicit check. Also move the call to s.deepEqual further up as it is computationally less expensive than e.g. comparing FrontendIPs. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
a97c67f operator/watchers: replace unnecessary global var sharedOnly [ upstream commit 263e2842e7df6cb5e21e970e58fb63eed93efdc9 ] Pass it as a parameter to k8sServiceHandler instead. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
2249dd8 clustermesh-apiserver: fix store name in fatal message [ upstream commit 638408e5a79a784df2aeb6405dad79f7a5b126ac ] Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
d87c09a clustermesh: narrow scope of lock in (*globalServiceCache).onUpdate [ upstream commit 794dd9576e352a48cabc69a199814141dd8878dc ] The mutex doesn't need to be held yet while constructing scopedLog. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
7c56087 clustermesh: correct debug messages in OnDelete [ upstream commit 52d0f0298ee2019d6faccff7d59a03a51875f69a ] These were probably copied from OnUpdate without modification. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
f927518 docs: add limitation concerning IPsec and ipv6-only clusters [ upstream commit de745d1d05d047edb7357b662e53fd80ae0dcab4 ] Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
a29f2ea ipsec: explicit that ipv6-only mode is not supported [ upstream commit 2f0284c3c534ed35fdbfd4b9a545c4715851cb8e ] This commit introduces two checks, during the agent initialization phase and in the generation of the node config header, to prevent a panic if IPsec is enabled and IPv4 is not (i.e., the node has no IPv4 address set). Instead, a more human-friendly error is returned. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
a773303 cli: Command line to dump the node ID BPF map [ upstream commit 05d57905622289b4c203ce84cf76d66aa2bdc200 ] The new "cilium bpf nodeids" command dumps the node IDs stored in the BPF map as well as the associated node IP addresses. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
a8d1a43 datapath, daemon: Restore node IDs from BPF map [ upstream commit 7cbc0a6dc7ef02621a1f732b7ec17c7ac720af1f ] This commit restores node IDs on agent startup from the BPF map. The restoration happens right after BPF maps are initialized. It first dumps the content of the map and then atomically replaces the node manager's Golang map. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
17c3d28 datapath: Update BPF map with node IDs [ upstream commit a3041709c1b2af553032138a95b5ec35b9cd1605 ] We need to update that BPF map when we are mapping a new node IP to a new or existing node ID. Similarly, we need to update the map when removing a mapping. This commit refactors the existing code for the in-memory Golang map such that the Golang and BPF maps are updated at the same time. It tries to ensure that they both stay in sync. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
4364482 bpf, maps: Define new map for node IDs [ upstream commit 39f4bccf8fbaad6e4cf59a7a648230ac2fedec5a ] This new map will map node IP addresses to their IDs. In the next commits it will be used to save and restore node IDs across agent restarts. Later on, it can be used to check (1) if an IP is a NodeExternalIP for BPF masquerading and (2) if an IP is a node IP once we replace REMOTE_NODE_ID with per-node identities. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
b1cdfe7 cilium-health status: fix endpoint reachability in succinct view [ upstream commit 429c56f485e3f1582b057da610f72da2942f1ddc ] This commit fixes the reporting of `cilium-health status --succinct`, and specifically of the endpoints reachability column in succinct view. The issue was caused by the `GetAllEndpointAddresses` function returning only secondary addresses, leading the status to be reported as reachable in case of failures associated with the primary address (or if this was the only one present). Fixes: 3d8de4cb5b5d ("health/client: Include secondary endpoint status") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
2d01105 docs: Link KNP sections together to reduce duplication [ upstream commit b3fa5f83a0fe26a956a9f4b4583f6162757ce5db ] These separate sections were differing from each other, so create a reference linking them together. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
436db6e daemon: Deprecate SockOps [ upstream commit 18aaf22869275819db7994c8bf623ee825d9ab78 ] The feature will be removed in v1.14. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
087d255 ci: Update docs-builder image for documentation workflow [ upstream commit bd0ac13e5825eae3ed3a6b239c0c68e12cd47b20 ] Update the docs-builder image used by the CI workflow to pick-up the changes introduced in commit e4889d72f9a8 ("docs: Drop sphinxcontrib-openapi fork, switch back to upstream"), so we can get rid of our openapi fork. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
7ae1b93 hubble/metrics: handle SCTP port in flows to world metrics [ upstream commit 666100d96f89afde0d9fb05130997519a5407f87 ] Hubble now has support for the SCTP L4 protocol. When the "port" option is set for the flows to world metric, we should also update the port label for SCTP flows. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
4ddecae examples, docs: simplify manifest use in clustermesh global services example [ upstream commit a73044d17f6fc29806e8ccf9aaa14d5032c420fc ] The Service defined in rebel-base-global-shared.yaml is also defined in the cluster{1,2}.yaml files already [1] [2]. Thus, there is no need to `kubectl apply -f` that manifest separately. Also inline the manifest into the docs as it is now no longer referenced as a file. [1] https://github.com/cilium/cilium/blob/01f7d9a4a51735afdd1d0b8288cd34e539eaa8a2/examples/kubernetes/clustermesh/global-service-example/cluster1.yaml#L1-L13 [2] https://github.com/cilium/cilium/blob/01f7d9a4a51735afdd1d0b8288cd34e539eaa8a2/examples/kubernetes/clustermesh/global-service-example/cluster2.yaml#L1-L13 Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
195432c Uniform leftover annotations in clustermesh docs [ upstream commit b673d04c13f019a9d2595077772b28cd350d1a1a ] This commit updates the annotation keys in the newly added global and shared services reference documentation. Fixes: 5abb0144c002 ("clustermesh docs: add global and shared services reference") Related: 1f1f715bc0ec ("k8s: switch service-related annotations to the service.cilium.io/... form") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
dddc2e1 docs: Drop sphinxcontrib-openapi fork, switch back to upstream [ upstream commit e4889d72f9a8f86bee47327dbab7b46773607a7e ] Once upon a time, Cilium docs used the openapi Sphinx add-on to generate its API reference based on the code. And things were good. One day, Dependabot raised a security alert, stating that Mistune v2.0.2 was vulnerable to catastrophic backtracking [0] - this is a regex parsing thing. Mistune was a dependency to m2r, an add-on to parse Markdown in Sphinx, which in turn was a dependency to openapi. The easy path would have been to update m2r to use the latest, fixed Mistune version; but m2r was incompatible with Mistune >= 2.0.0, and also it was no longer in development. There was a fork, m2r2, which had little activity, and would avoid the security issue by very simply pinning the Mistune version to 0.8.4 (which would either fail to build Cilium's reference correctly, or bring some incompatibilities with other dependencies, at this point the narrator does not remember for sure). There was a fork of the fork, sphinx-mdinclude. We could use that project to update openapi, except that it was not compatible with recent versions of docutils, and that this would cause openapi's test suite to fail to pass. ... So we ended up forking the openapi repository to update the dependency to sphinx-mdinclude locally, and this is what we've been using since last summer. And things were good again. But things are even better when they go upstream [citation needed]. We also filed the issue for docutils compatibility in sphinx-mdinclude [1]. It was fixed (thanks!). We submitted a PR to have openapi switch to sphinx-mdinclude [2]. It was adjusted (thanks!), merged, and a new tag was created. Now at last, we can switch back to the upstream version of openapi! [And the build system lived happily ever after.] 
[0]: https://github.com/advisories/GHSA-fw3v-x4f2-v673 [1]: https://github.com/omnilib/sphinx-mdinclude/issues/8 [2]: https://github.com/sphinx-contrib/openapi/pull/127 I did _not_ run `make -C Documentation update-requirements`, because the resulting changes seemed to break the Netlify preview [3]. I stuck to openapi and bumped sphinx-mdinclude to >= 0.5.2, as required by openapi. [3] https://app.netlify.com/sites/docs-cilium-io/deploys/63c55fcc5531c6000838b87c Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
ecd3bef bugtool: Dump envoy metrics for troubleshooting [ upstream commit 0307adde7b68d4664d9eabf0b339aea748eb637d ] Users might not have prometheus metrics endpoint enabled as part of existing Cilium installation. This commit is to add the capability to dump envoy metrics without the need of re-installation with additional helm flag, or updating existing cilium config map. One common use case is to check if there is any connectivity issue (e.g. 503, timeout, etc) for egress traffic. For example, the below metrics are part of the dump, these two metrics clearly signal some configuration issues with TLS egress.
```bash
envoy_cluster_upstream_rq{envoy_response_code="503",envoy_cluster_name="egress-cluster-tls"} 100
envoy_cluster_upstream_cx_connect_fail{envoy_cluster_name="egress-cluster-tls"} 300
```
Testing was done locally by running curl command in pod manually
```bash
$ kubectl exec -n kube-system ds/cilium -- curl --unix-socket /var/run/cilium/envoy-admin.sock http:/admin/stats/prometheus > metrics_dump.txt
$ cat metrics_dump.txt | wc -l
2753
```
Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
05dbf44 gha: Rename ConformanceKind1.19 to ConformanceKind [ upstream commit bc2af2a48d94f628dc59395f8c9afb8705ebb086 ] After #22325 is merged, this job is no longer using k8s 1.19, so it's better to rename the job to just ConformanceKind to avoid any potential confusion. Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
d821d46 egressgw: selectively update policies' matchedEndpointIDs [ upstream commit ae6f2fe0c9472608971fe2bef49b0fc5416ff2e9 ] Instead of updating each policy's matchedEndpointIDs for each reconciliation run, only update them: * when a new policy is added (only update the single policy being added) * when an endpoint is updated or deleted (in this case update all policies) Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
8f32ea7 egressgw: pass event that triggered reconciliation to reconcile() [ upstream commit adbae703a6f68c6995e19bf46ce85b2351f82b14 ] This will give the reconcile() method more context on the event that triggered the reconciliation and will be used in a subsequent commit to optimize part of that logic. Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
70801f3 egressgw: cache matching endpoints in policy config [ upstream commit 139e91e3a570fbd240752a37164a855d82c51149 ] Currently whenever we need to determine if a policy matches an endpoint, we call the (*EndpointSelector).Matches() method passing the target endpoint's labels. Turns out this method is quite expensive so instead of recomputing each time if a policy matches a given endpoint, cache this information once for each (*Manager)reconcile() invocation, and then in (*PolicyConfig)selectsEndpoint() use the cached information. Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
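The caching idea in this commit can be illustrated with a minimal sketch. This is not the egress gateway code: the `endpoint` type, the `matches` callback standing in for the expensive `(*EndpointSelector).Matches()` call, and the `updateMatchedEndpointIDs` helper are simplified assumptions made for illustration. Only the names `selectsEndpoint` and `matchedEndpointIDs` are taken from the commit messages.

```go
package main

import "fmt"

// endpoint is a simplified stand-in for an endpoint with an ID and labels.
type endpoint struct {
	id     uint64
	labels map[string]string
}

// policyConfig caches which endpoints a policy selects.
type policyConfig struct {
	// matches stands in for the expensive selector evaluation (assumption).
	matches            func(labels map[string]string) bool
	matchedEndpointIDs map[uint64]struct{}
}

// updateMatchedEndpointIDs evaluates the selector once per endpoint and
// stores the result, so later lookups do not re-run the selector.
func (p *policyConfig) updateMatchedEndpointIDs(eps []endpoint) {
	p.matchedEndpointIDs = make(map[uint64]struct{}, len(eps))
	for _, ep := range eps {
		if p.matches(ep.labels) {
			p.matchedEndpointIDs[ep.id] = struct{}{}
		}
	}
}

// selectsEndpoint answers from the cache only.
func (p *policyConfig) selectsEndpoint(ep endpoint) bool {
	_, ok := p.matchedEndpointIDs[ep.id]
	return ok
}

func main() {
	evaluations := 0
	p := &policyConfig{matches: func(l map[string]string) bool {
		evaluations++
		return l["app"] == "web"
	}}
	eps := []endpoint{
		{id: 1, labels: map[string]string{"app": "web"}},
		{id: 2, labels: map[string]string{"app": "db"}},
	}
	p.updateMatchedEndpointIDs(eps)
	for i := 0; i < 3; i++ {
		for _, ep := range eps {
			p.selectsEndpoint(ep)
		}
	}
	fmt.Println("selector evaluations:", evaluations)
}
```

The selector runs once per endpoint when the cache is rebuilt, however many times `selectsEndpoint` is called afterwards; the cost is that the cache must be refreshed whenever policies or endpoints change.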
e2df91c docs: Add guide for proxy load balancing feature [ upstream commit 837da071dcd42f34eab20127395c0c1a81efaf89 ] This adds a quick getting-started guide for the proxy LB feature introduced in PR #21244. Relates: #21244 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
4eee143 docs,daemon: Added discouragement warnings for MetalLB to docs and agent [ upstream commit 7043cddc8739848c1e477cc8b3d212acf548e2f4 ] Added warnings explaining that development on the MetalLB based BGP feature has stopped. The feature will not be deprecated in 1.13, so noted that security updates and bugfixes will still be applied. Even though we can't officially deprecate yet, we would like to highly discourage adoption of the old feature unless absolutely necessary. Fixes: #22246 Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
f2f444d docs: refactor AKS installation instructions [ upstream commit 26909e74945afe8274b82e041fd549f52abc42e3 ] BYOCNI was initially introduced as the preferred installation method for Cilium on AKS clusters in d8259c1a806965c8e23f0b355a1ee99884796717, at the cost of doubling the number of AKS tabs in Getting Started and Helm guides. Since then: - More tabs have been added, making it even more complex to navigate the options. - BYOCNI is now GA (Azure CLI version 2.39.0). - [Azure CNI Powered by Cilium](https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium) has been announced, further complexifying the Cilium on AKS landscape. In order to reduce bloat and streamline AKS installation instructions, we refactor the AKS instructions in a single tab. We use this opportunity to strongly encourage users to use BYOCNI, and prepare for Azure IPAM legacy retirement. Even though we could very well add it into the current structure, Azure CNI Powered by Cilium has not been introduced as another installation option here, because the Cilium distribution used in this case is maintained and controlled by AKS, and not by the Cilium community. We felt this was sensible considering there is already a similar situation with GKE's Dataplane V2 and it is not listed in Cilium documentation either. [ Full details of edits since the diff is a bit hard to parse: - Getting Started Guides: - We had 2 separate AKS tabs for creating AKS clusters (one for BYOCNI and one for Azure IPAM), now we have only one AKS tab and it only explains how to create a cluster for BYOCNI. This is the only set of instructions that was removed, and it was done intentionally so as to just silently encourage users that don't have a cluster yet to use BYOCNI. - We had 2 separate AKS tabs for installing Cilium in an AKS cluster (one for BYOCNI and one for Azure IPAM) but they actually contained the exact same installation instructions. 
This is because the Cilium CLI is responsible for automatically detecting which mode to use based on the cluster type. Now we have only one AKS tab with the installation instructions up front, and then sub-tabs for both modes with the rest of the previous info we had (requirements + limitations). - So putting the 2 together, if it happens that someone already had an AKS cluster and did not create it with BYOCNI, it'll still work, and if someone actually does want to use Azure IPAM intentionally, they can still figure it out based on the requirements. - Helm: - We had 2 separate AKS tabs for installing Cilium in an AKS cluster (one for BYOCNI and one for Azure IPAM). Now we have only one AKS tab with sub-tabs for both modes with all the previous info we had (installation instructions + requirements + limitations). - Both: - BYOCNI is made even more explicit as the preferred option for installing Cilium on AKS, since it's now GA on AKS. - Azure IPAM has been re-dubbed Legacy Azure IPAM to double down on that, and also in preparation for the fact we might want to stop maintaining it. ] Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
a6bbf1b docs: correct Prometheus port [ upstream commit c0fd6f6925bddb067348537f3b096e0d0db258c4 ] Signed-off-by: Liz Rice <liz@lizrice.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
df95baa test: Remove IPsec + ep routes + VXLAN from ginkgo tests [ upstream commit 9b9fbd93bf749e4ca7eb0c76e32f3d331288ee3c ] Commit 7dd3fc25 ("workflows: Cover IPsec+VXLAN+endpoint routes in datapath tests") added a new ginkgo test case to cover IPsec + VXLAN + endpoint routes. This test case is however now also covered in the Cilium Datapath workflow, so we can remove it from ginkgo. It was missed in 5c811513 ("test/k8s: Remove some encryption tests") because the two pull requests happened almost at the same time. Fixes: 5c811513 ("test/k8s: Remove some encryption tests") Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
e171a17 k8s: switch config-related annotations to the config.cilium.io/... form [ upstream commit 1a8e04c7030685d97750a03bffc90bad3bee355e ] This commit converts the config-related annotations to the config.cilium.io/... form, to match the style adopted by Kubernetes (e.g., kubernetes.io/egress-bandwidth). The old prefix is not maintained as an alias, since this feature has not yet been released. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
90ba65c k8s: switch policy-related annotations to the policy.cilium.io/... form [ upstream commit 2ea70faf8654dcc19e248aeca1bd2cff1242cddf ] This commit converts the policy-related annotations to the policy.cilium.io/... form, to match the style adopted by Kubernetes (e.g., kubernetes.io/egress-bandwidth). The old ones are maintained as an alias for backward compatibility. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
d225e6a k8s: switch ingress-related annotations to the ingress.cilium.io/... form [ upstream commit 146b9a137fc661108e9cd7999282e1af1787535d ] This commit converts the ingress-related annotations to the ingress.cilium.io/... form, to match the style adopted by Kubernetes (e.g., kubernetes.io/egress-bandwidth). The old ones are maintained as an alias for backward compatibility. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
e9710e3 k8s: switch network-related annotations to the network.cilium.io/... form [ upstream commit 27da86467e79f0440bcde3d6719a55fbcafc669b ] This commit converts the network-related annotations to the network.cilium.io/... form, to match the style adopted by Kubernetes (e.g., kubernetes.io/egress-bandwidth). The old ones are maintained as an alias for backward compatibility. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
e520986 k8s: switch service-related annotations to the service.cilium.io/... form [ upstream commit 1f1f715bc0ece1921f7bba6659b0760384219e3a ] This commit converts the service-related annotations to the service.cilium.io/... form, to match the style adopted by Kubernetes (e.g., kubernetes.io/egress-bandwidth). The old ones are maintained as an alias for backward compatibility, except for the keys associated with proxy load balancing, as not yet documented. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
545d1e9 k8s: prep work to uniform annotations style [ upstream commit 6be2ebbae7528a8ea4755ddc00457a45c0261a9f ] This is a preparatory commit to the conversion of the annotations to the xxxxxx.cilium.io/... form, introducing a function to retrieve the value of an annotation associated with a given key, or one of the additional aliases if not found. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
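The alias lookup described in this commit might look roughly like the sketch below. It is not the function added by the commit: the name `getAnnotation`, its signature, and the annotation keys in the example are hypothetical and serve only to show the "preferred key first, then aliases" behavior.

```go
package main

import "fmt"

// getAnnotation returns the value stored under key, or under the first alias
// that is present when key itself is missing. Name and signature are
// hypothetical, chosen for illustration only.
func getAnnotation(annotations map[string]string, key string, aliases ...string) (string, bool) {
	if v, ok := annotations[key]; ok {
		return v, true
	}
	for _, alias := range aliases {
		if v, ok := annotations[alias]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	// Both keys are made-up examples of a new-style key and a legacy alias.
	legacy := map[string]string{"io.cilium/example": "true"}
	v, ok := getAnnotation(legacy, "service.cilium.io/example", "io.cilium/example")
	fmt.Println(v, ok)
}
```

Checking the new-style key first means an object carrying both forms resolves to the new one, while objects that only carry the old key keep working.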
20f40e4 etcd kvstore: rate limit watch retries on list errors [ upstream commit 9d41018b6404bfe525044735a791a9ff2d055ea5 ] Currently, operations performed by the cilium agent against an etcd server are controlled by a global token bucket rate limiter, configured with 20 QPS by default. ListAndWatch operations lead to a tight retry loop (moderated only by the above rate limiter) if the initial Get (List) operation fails due to server errors, e.g., due to a temporary overload ("etcdserver: too many requests" error), contributing to further overwhelming the server. For instance, such situation occurred in the context of clustermesh, following a brief network disruption (#22037). This commit introduces an exponential rate limiter to control the backoff in case of errors in the Get operation performed as part of ListAndWatch, to reduce the generated load against the server. The backoff is automatically adjusted based on the number of nodes in the cluster. The correct functioning of the rate limiter has been tested leveraging a mock etcd server configured to return the selected error on Range requests. Fixes: #22037 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
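To illustrate the retry behavior this commit describes, here is a minimal sketch of an exponential backoff around a failing list call. It does not reproduce Cilium's rate limiter: the `backoff` type, `listWithRetry`, `noSleep`, and all parameter values are assumptions made for the example, and the adjustment based on cluster size mentioned in the commit is left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errOverloaded mimics the temporary server error quoted in the commit message.
var errOverloaded = errors.New("etcdserver: too many requests")

// backoff is a hypothetical exponential backoff calculator.
type backoff struct {
	min, max time.Duration
	factor   float64
}

// delay returns the wait before retry number attempt (0-based), capped at max.
func (b backoff) delay(attempt int) time.Duration {
	d := float64(b.min)
	for i := 0; i < attempt; i++ {
		d *= b.factor
		if d >= float64(b.max) {
			return b.max
		}
	}
	return time.Duration(d)
}

// noSleep is a no-op sleeper, handy when exercising the retry loop quickly.
func noSleep(time.Duration) {}

// listWithRetry calls list until it succeeds or maxAttempts is reached,
// waiting b.delay(attempt) between consecutive failures.
func listWithRetry(list func() error, b backoff, maxAttempts int, sleep func(time.Duration)) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = list(); err == nil {
			return nil
		}
		if attempt < maxAttempts-1 {
			sleep(b.delay(attempt))
		}
	}
	return err
}

func main() {
	b := backoff{min: 10 * time.Millisecond, max: 80 * time.Millisecond, factor: 2}
	calls := 0
	err := listWithRetry(func() error {
		calls++
		if calls < 3 {
			return errOverloaded
		}
		return nil
	}, b, 5, time.Sleep)
	fmt.Println("calls:", calls, "err:", err)
}
```

Compared with a fixed-rate limiter, the growing delay means a server that keeps failing sees progressively fewer retries from each client instead of a steady stream.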
b7bc2dc fqdn/dnsproxy: move init LRU cache call out of StartDNSProxy. [ upstream commit 50af04b9ca22019b7d499f9da4df4afd1c12d0dd ] LRU init does global init and doesn't need to be part of starting the FQDN proxy. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
58dcb28 clustermesh: close etcd connection on config retrieval error [ upstream commit d6759a299442851e975362f1eddb3b49e0f8306b ] Ensure that the backend connection to a remote etcd server in the context of clustermesh gets properly closed when the connect-time validation check fails due to an error, to prevent leaking the connection itself and associated resources, such as the goroutine performing the watch of 'cilium/.heartbeat' Fixes: 5e5a26e5da4d ("clustermesh: Implement a basic connect-time validation") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
f591e0d clustermesh docs: add global and shared services reference [ upstream commit 5abb0144c0024140f6aa812913b7c47e8ae3fe0b ] This commit adds a section to the cluster-mesh load-balancing and service discovery documentation, including a reference flow chart describing the behavior considering different combinations of the global-service and shared-service annotation values. The flow chart is a slightly adapted version of that proposed by @YutaroHayakawa (https://github.com/cilium/cilium/pull/23298#discussion_r1087433845), changing *Local* and *Remote* to *Cluster1* and *Cluster2* to reduce the possible confusion when describing the endpoints (i.e., to avoid referring both to the clusters themselves and to the location of the endpoints with the same terms). The mermaid flow chart has been generated separately and imported in the documentation as an SVG file (the code is attached as a comment). Although suboptimal, the direct usage of sphinxcontrib-mermaid to generate it in the docs proved to be problematic, and would lead to the addition of multiple dependencies to the builder image. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
8179b07 tunnel, node: Populate tunnel map with node IDs [ upstream commit 9a70864c6384a5708f30d71386315305af149e4f ] Same as with the ipcache, we need the tunnel map to carry a node ID for each remote node it references. We can use some existing padding for that purpose so the map size won't change. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
813d6ef bpf, maps: Give the tunnel map its own structs [ upstream commit 16e132ee4fb60a382c7bb2504a5f697317c270cf ] Prior to this commit, the same struct, struct endpoint_key, was used for several maps including the endpoints and tunnel maps. Some fields of the struct are unused when used with some of the maps, but that isn't really an issue as the memory footprint of those maps is small. Nevertheless, in a subsequent commit, the value of the tunnel map will be changed to include new information. If using struct endpoint_key, then this change would also impact the endpoints map. Changing the endpoints map layout breaks connectivity on upgrade and must therefore be handled with care (by migrating the map content prior to reloading the full datapath). That is a bit annoying and likely to introduce bugs. Instead, this commit separates the structs used for the endpoints and tunnel maps. The tunnel map now has new struct tunnel_key and struct tunnel_value, specialized struct that no other map uses. Hence, when changing the tunnel_value struct in a subsequent commit, it won't affect the endpoints map. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
66c49a7 ipcache: Populate ipcache with node IDs [ upstream commit 814d3c797589bf32020bcbdfeaff1a141d7ce540 ] We need each remote endpoint to have a node ID referencing its host. To that end, we retrieve or compute a node ID from one of the host IPs and include it in the ipcache entry for that endpoint. We only need remote endpoints to have that ID (information about the local node is already available as static data) so the node ID for e.g. local pods doesn't matter. Note that we could also have introduced a new map to hold the node IP to node ID mapping. That would have used less space (one entry per node instead of one per pod) at the expense of a second map lookup. This commit makes the choice of avoiding the second lookup because that could get expensive and would increase complexity. On the other hand, the ipcache map's size isn't really an issue. Also note that the ipcache map will keep the same size given we are simply using implicit padding of struct remote_endpoint_info. That struct will stay 12 bytes long. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
2d27cef cli: Command line to dump the agent's node IDs [ upstream commit 70abe92c40f80cd9aebc6a596db18c17a43d3e5f ] The new "cilium nodeid" command dumps the node IDs allocated by the agent as well as the associated node IP addresses. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
c20c3c7 api, daemon, node: Expose allocated node ID in agent API [ upstream commit 7289744f84c81114e1a0fe32f7e70354bcb40a34 ] This commit exposes the node IDs allocated by the agent in the REST API, alongside the associated node IP addresses. The main use case is troubleshooting as it may be useful to dump the current list of node IDs and their associated IP addresses in case of incidents. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
c8f0f62 datapath: Introduce node IDs [ upstream commit af88b42bd4ccca25225bb7683c7682b2d6260fa8 ] Node IDs are node-scoped unique identities for all cluster nodes. This commit introduces two methods, to allocate and release such IDs. They will be used in a subsequent patchset to identify remote nodes in our datapath. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
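As a rough illustration of what "allocate and release" could mean for such IDs, the sketch below hands out small integer IDs keyed by node IP and recycles released ones. Everything here is an assumption: the `nodeIDAllocator` type, its methods, the one-ID-per-IP simplification, and the 16-bit ID width are not taken from the Cilium source, and exhaustion of the ID space is not handled.

```go
package main

import "fmt"

// nodeIDAllocator is a hypothetical allocator mapping node IPs to small IDs.
type nodeIDAllocator struct {
	idsByIP map[string]uint16
	free    []uint16 // released IDs available for reuse
	next    uint16   // highest ID handed out so far
}

func newNodeIDAllocator() *nodeIDAllocator {
	return &nodeIDAllocator{idsByIP: make(map[string]uint16)}
}

// allocate returns the ID already bound to ip, or binds a recycled or new one.
func (a *nodeIDAllocator) allocate(ip string) uint16 {
	if id, ok := a.idsByIP[ip]; ok {
		return id
	}
	var id uint16
	if n := len(a.free); n > 0 {
		id = a.free[n-1]
		a.free = a.free[:n-1]
	} else {
		a.next++
		id = a.next
	}
	a.idsByIP[ip] = id
	return id
}

// release unbinds ip and makes its ID reusable; it reports whether ip was known.
func (a *nodeIDAllocator) release(ip string) bool {
	id, ok := a.idsByIP[ip]
	if !ok {
		return false
	}
	delete(a.idsByIP, ip)
	a.free = append(a.free, id)
	return true
}

func main() {
	a := newNodeIDAllocator()
	fmt.Println(a.allocate("10.0.0.1"), a.allocate("10.0.0.2"))
	a.release("10.0.0.1")
	fmt.Println(a.allocate("10.0.0.3"))
}
```

Recycling released IDs keeps the ID space compact, which matters when the ID has to fit into spare padding of existing map entries.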
9afe083 Revert "images/cilium-test: New test suite image" [ upstream commit 6e61aed0744f7a3a78fc713722527e118f9f37c5 ] This reverts commits 4004c33dceaa10390e09047ebaec4f3d25bec75b and 044afab2ecf30b822739806b6484e01f0d9385f1. The second commit was a fix for the first so it can now be reverted as well. This code isn't used anywhere anymore. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 February 2023, 20:12:40 UTC
305c383 chore(deps): update docker.io/library/golang:1.19.5 docker digest to 572f680 Signed-off-by: Renovate Bot <bot@renovateapp.com> 13 February 2023, 10:14:14 UTC
feaa4ce chore(deps): update all github action dependencies Signed-off-by: Renovate Bot <bot@renovateapp.com> 12 February 2023, 09:49:59 UTC
1705379 chore(deps): update dependency cilium/hubble to v0.11.1 Signed-off-by: Renovate Bot <bot@renovateapp.com> 10 February 2023, 22:34:29 UTC
978f1f1 .github/workflows: add version number in GH action Related to https://github.com/cilium/cilium/pull/23625 This comment will allow renovate to automatically update GH actions dependencies. Signed-off-by: André Martins <andre@cilium.io> 10 February 2023, 22:28:36 UTC
2f28224 build(deps): bump github/codeql-action from 2.2.2 to 2.2.3 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.2 to 2.2.3. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/39d8d7e78f59cf6b40ac3b9fbebef0c753d7c9e5...8775e868027fa230df8586bdf502bbd9b618a477) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 10 February 2023, 15:08:51 UTC
acead49 proxy: Bump the API proxy Signed-off-by: Tam Mach <tam.mach@cilium.io> 09 February 2023, 21:46:00 UTC
1cd4b13 envoy: Bump envoy version to 1.22.7 [upstream commit 81e2491] The runtime feature guard envoy.reloadable_features.no_extension_lookup_by_name is enabled to avoid proxy crash for cilium.tls_wrapper. https://www.envoyproxy.io/docs/envoy/latest/version_history/v1.22/v1.22.0.html Relates: https://github.com/cilium/proxy/pull/101 Signed-off-by: Tam Mach <tam.mach@cilium.io> 09 February 2023, 21:46:00 UTC
ee59a47 Revert "chore(deps): update docker.io/library/golang docker tag to v1.20.0" This reverts commit 80de240277939f7e726c6696d75fd9601d21ba89. The Golang v1.20.0 release includes a regression [1]. We should wait for a new patch release before updating. 1 - https://github.com/golang/go/issues/58293 Reported-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 08 February 2023, 08:56:28 UTC
9051756 chore(deps): update quay.io/cilium/hubble docker tag to v0.11.1 Signed-off-by: Renovate Bot <bot@renovateapp.com> 07 February 2023, 21:09:00 UTC
560cacc build(deps): bump github/codeql-action from 2.2.1 to 2.2.2 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.2.1 to 2.2.2. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/3ebbd71c74ef574dbc558c82f70e52732c8b44fe...39d8d7e78f59cf6b40ac3b9fbebef0c753d7c9e5) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 07 February 2023, 20:58:24 UTC
80de240 chore(deps): update docker.io/library/golang docker tag to v1.20.0 Signed-off-by: Renovate Bot <bot@renovateapp.com> 07 February 2023, 20:36:51 UTC
e27c600 build(deps): bump docker/setup-buildx-action from 2.4.0 to 2.4.1 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 2.4.0 to 2.4.1. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/15c905b16b06416d2086efa066dd8e3a35cc7f98...f03ac48505955848960e80bbb68046aa35c7b9e7) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 06 February 2023, 23:41:31 UTC
626ce4d images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 06 February 2023, 22:44:37 UTC
8a65205 chore(deps): update base-images Signed-off-by: Renovate Bot <bot@renovateapp.com> 06 February 2023, 22:44:37 UTC
6bccdf7 renovate: Replace update-hubble-version.sh with Renovate Bot [ upstream commit f6aeb5df60ed3817c9b840a46cdab95f79e9231b ] This commit removes the update-hubble-version.sh script and replaces it with two Renovate configuration entries. The first entry, `kubernetes`, is using Renovate's standard `kubernetes` and `docker` managers to update the Hubble CLI Docker image in `examples/hubble/hubble-cli.yaml` whenever there is a newer Hubble CLI docker image. To make this take effect, we have to set the `ignorePaths` to an empty list, because the `config:base` preset ignores any paths that contain an `/examples/` directory: https://docs.renovatebot.com/presets-default/#ignoremodulesandtests Setting the `ignorePaths` to the empty list is safe, because our configuration relies on `includePaths` instead. Additionally, because Renovate only updates the Docker digest if it is present, this commit also pins the Hubble CLI docker image digest so that it is updated by Renovate in subsequent PRs. The second new entry in the Renovate config is a custom `regexManager`. It defines two custom regexes to bump the version and digest of the Hubble CLI GitHub release. The syntax is inspired by Renovate's own Regex Manager Presets defined here: https://docs.renovatebot.com/presets-regexManagers/ The digest update is the most interesting part of this PR. Renovate's `github-releases` manager has a sophisticated mechanism to update digests of artifacts, without any direct reference to the artifact's name or architecture. The way this works is that Renovate will fetch all digest files in the old release, find the digest file which contains the `currentDigest` sha, and then replace that digest sha with the one found in the matching artifact file of the new release. For this to work, we need to store what version the digest is coming from, thus the custom regex also contains a `digestVersion` field. 
For more information, see the [mapDigestAssetToRelease](https://github.com/renovatebot/renovate/blob/461cbc75518946fe6c7612122f9ccb2e8202e583/lib/modules/datasource/github-releases/index.ts#L127-L128) function in Renovate's source code. Lastly, this commit removes the `update-hubble-version.sh` script to avoid diverging update mechanisms. Because we no longer need to generate a `hubble-version.sh` file, that file is also now inlined directly by `download-hubble.sh`. I have tested this commit on my fork of cilium/cilium. Renovate (in self-hosting mode) created the following two PRs: - https://github.com/gandro/cilium/pull/85 - https://github.com/gandro/cilium/pull/86 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 06 February 2023, 22:19:13 UTC
9793417 docs: Add deprecation warning for socks-enable. Signed-off-by: Martynas Pumputis <m@lambda.lt> 03 February 2023, 23:20:12 UTC
189c53f build(deps): bump actions/github-script from 6.3.3 to 6.4.0 Bumps [actions/github-script](https://github.com/actions/github-script) from 6.3.3 to 6.4.0. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](https://github.com/actions/github-script/compare/v6.3.3...98814c53be79b1d30f795b907e553d8679345975) --- updated-dependencies: - dependency-name: actions/github-script dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 03 February 2023, 22:33:44 UTC
693bede images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 03 February 2023, 12:08:03 UTC
1e06371 chore(deps): update docker.io/library/ubuntu:22.04 docker digest to f05532b Signed-off-by: Renovate Bot <bot@renovateapp.com> 03 February 2023, 12:08:03 UTC
a106c2d .github/workflows: PR labeler fix GH workflow if expression [ upstream commit ddf6ade75d06a85beb85a69f56603c860878b753 ] The evaluation of the expression in the if statement was incorrect since the AND operator should be performed within the ${{ }}. Fixes: c6e037b557f2 (".github/workflows: set right secret name") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
2b80cd8 gh/workflows: Enable IPv6 in ci-datapath [ upstream commit d48a35df93213eb11bc05f61c6f778e56b453f5a ] [ Backport note: Fixed minor conflict. ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
d9358b5 hubble: Fix the condition to apply for syn-only flag [ upstream commit 0abaec50281f50826715911b45c78674163f1fe6 ] It turned out that is_reply field is not set for dropped flows, so don't ignore flows unless it's explicitly set to true. Fixes: c29141fd54c3 ("hubble: Add "syn-only" option to flows-to-world metric") Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
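The condition described in d9358b5 can be illustrated with a minimal Go sketch. It assumes a hypothetical `flow` type whose `IsReply` field is a pointer so that "not set" can be told apart from "false"; the names are illustrative and are not Hubble's actual API. A flow is skipped only when the field is explicitly true, so dropped flows with an unset field are still counted.

```go
package main

import "fmt"

// flow is a hypothetical stand-in for a Hubble flow record.
// IsReply is nil when the field was never set (e.g. for dropped flows).
type flow struct {
	IsReply *bool
}

// isExplicitReply reports whether the flow is explicitly marked as a reply.
// An unset field is treated as "not a reply", so such flows are not ignored.
func isExplicitReply(f flow) bool {
	return f.IsReply != nil && *f.IsReply
}

func main() {
	yes, no := true, false
	fmt.Println(isExplicitReply(flow{}))              // unset: false
	fmt.Println(isExplicitReply(flow{IsReply: &no}))  // false
	fmt.Println(isExplicitReply(flow{IsReply: &yes})) // true
}
```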
3443288 Pick up etcd v3.5.7 [ upstream commit c07498e4828b7ae5fbaa0a6e95747a6ead82edb3 ] Pick up the latest etcd patch release. This version fixes a bunch of critical security vulnerabilities. [ Backport note: Fixed conflict due to version and digest being concatenated and part of the same variable on v1.13, not on master branch. ] Ref: https://quay.io/repository/coreos/etcd?tab=tags Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
ace114c clustermesh: uniform global and shared service annotations behavior [ upstream commit 0240ab4cce977ccbaa3af24b3c293e805f138357 ] Currently, marking a service as *global* through the dedicated annotation in the local cluster only leads to the usage of both local and remote endpoints in that cluster, and in that case the *shared* service annotation possibly specified in the remote cluster(s) is not respected. This commit adapts the behavior such that: * the backends of a given service are shared only in case that service has the global-service annotation set to true, and the shared-service annotation is either not present, or set to true; * the shared-service and service-affinity annotations take effect only if the global-service one is also present, otherwise they are silently ignored. The full list of possibilities, in case of two clusters, is exemplified in the following: 1. the local svc is not marked as global: => local svc backends include local endpoints only; 2. the local svc is marked as global, the remote one has no annotations: => local svc backends include local endpoints only; 3. the local svc is marked as global, the remote one is marked as global (thus, implicitly as shared): => local svc backends include local and remote endpoints; 4. the local svc is marked as global, the remote one is marked as shared (but not global): => local svc backends include local endpoints only; 5. the local svc is marked as global, the remote one is marked as global and not shared: => local svc backends include local endpoints only. No changes are introduced in case of external workloads (i.e., all services' backends are still synchronized). Fixes: f0465215aec0 ("clustermesh: Correct shared service annotation behaviour") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
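The five cases listed in ace114c reduce to a single predicate, sketched below in Go. The `svcAnnotations` type and `includeRemoteBackends` function are hypothetical names chosen for illustration and do not come from the Cilium code base; the sketch only models the local cluster's view of one remote cluster and ignores external workloads.

```go
package main

import "fmt"

// svcAnnotations is a hypothetical, simplified view of a service's
// clustermesh annotations in one cluster.
type svcAnnotations struct {
	Global bool  // global-service annotation set to true
	Shared *bool // shared-service annotation; nil when not present
}

// includeRemoteBackends reports whether the local service should use the
// remote cluster's endpoints in addition to its own.
func includeRemoteBackends(local, remote svcAnnotations) bool {
	if !local.Global {
		return false // case 1: local service is not global
	}
	if !remote.Global {
		return false // cases 2 and 4: shared is ignored without global
	}
	// case 3 (shared absent or true) versus case 5 (shared set to false)
	return remote.Shared == nil || *remote.Shared
}

func main() {
	yes, no := true, false
	global := svcAnnotations{Global: true}
	fmt.Println(includeRemoteBackends(svcAnnotations{}, global))                            // case 1: false
	fmt.Println(includeRemoteBackends(global, svcAnnotations{}))                            // case 2: false
	fmt.Println(includeRemoteBackends(global, global))                                      // case 3: true
	fmt.Println(includeRemoteBackends(global, svcAnnotations{Shared: &yes}))                // case 4: false
	fmt.Println(includeRemoteBackends(global, svcAnnotations{Global: true, Shared: &no}))   // case 5: false
}
```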
5c7f64e test/k8s: Remove some encryption tests [ upstream commit 5c811513eae2bfe2bf745277f5c7b71b9290ca14 ] They are already covered by the ci-datapath. Keep only the IPSec + bpf_host, as suggested by Paul (for IPSec we have only coverage for bpf_network). [ Backport note: Fixed minor conflict, by removing the whole test all the same. ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
183fa3f .github/workflows: set right secret name [ upstream commit c6e037b557f23b831c839d6dc8a53e431a48a143 ] The secret store in GH settings is CHECK_TEAM_ORG_APP_ID, and not the CHECK_TEAM_ORG which was used during development. [ Backport note: Fixed minor conflict due to digest update from Dependabot. ] Fixes: a71374d83214 (".github/workflows: fix external contribution detection") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
8d67d41 .github: set do not use provenance from docker buildx [ upstream commit f1cbb5f8a908659817e5c0e6a3692424fa647ab9 ] This reverts commit 9ab03d05ef3d347a305456f66cb3901a7b389e31. GitHub recently rolled out Docker buildx version v0.10.0 on their builders, which transparently changed the MediaType of docker images to OCI v1 and added provenance attestations. Unfortunately, various tools we use in CI like SBOM tooling and docker manifest inspect do not properly support some aspect of the new image formats. This resulted in breaking CI, with some messages like this: level=fatal msg="generating doc: creating SPDX document: generating SPDX package from image ref quay.io/cilium/docker-plugin-ci:XXX: generating image package" This could also lead CI to fail while waiting for image builds to complete, because the command we use to test whether the image is available did not support the image types. The commit 9ab03d05ef3d347a305456f66cb3901a7b389e31 attempted to fix this problem by pinning the buildx version to v0.9.1 but unfortunately that didn't work since that version became unavailable. This commit reverts those changes and adds the "provenance: false", which is a flag available in docker buildx >= v0.10.0, to disable the provenance attestation. [ Backport note: Fixed minor conflict. ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
e1148e5 helm: Bump Hubble UI image to v0.10.0 [ upstream commit 59f4d11a1d460cc051026be3ab64f4a35f47a9bb ] [ Backport note: Fixed conflict due to version and digest being concatenated and part of the same variable on v1.13, not on master branch. ] Signed-off-by: Paulo Gomes <pjbgf@linux.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
2ba1bdb .github/workflows: fix typo in organization parameter [ upstream commit 5e04a7e3044385d08dee0d41feeffbec216f69b7 ] The comma should not be part of the organization name. Fixes: a71374d83214 (".github/workflows: fix external contribution detection") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
3730905 ingress: prefer annotation over ingressClassName [ upstream commit a40b8bf5cb4a9219c2e24bd43500c744eaa4550d ] The Kubernetes API documentation says that the ingressClassName field "replaces the deprecated `kubernetes.io/ingress.class` annotation. For backwards compatibility, when that annotation is set, it must be given precedence over this field." Other Kubernetes software relies on this, for example cert-manager and the Hashicorp Vault Helm chart. This patch makes the Cilium ingress controller conform to the spec, by preferring the annotation if it is set, and then falling back to ingressClassName. Fixes: #22340 Signed-off-by: Nikhil Jha <hi@nikhiljha.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
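The precedence rule from 3730905 can be sketched in Go as follows. The helper name `effectiveIngressClass` is hypothetical, and treating "set" as "the annotation key is present" is an assumption of this sketch rather than something the commit message states.

```go
package main

import "fmt"

// Deprecated annotation that must take precedence when set.
const ingressClassAnnotation = "kubernetes.io/ingress.class"

// effectiveIngressClass returns the class an ingress controller should act on:
// the annotation if present, otherwise spec.ingressClassName, otherwise "".
func effectiveIngressClass(annotations map[string]string, ingressClassName *string) string {
	if class, ok := annotations[ingressClassAnnotation]; ok {
		return class
	}
	if ingressClassName != nil {
		return *ingressClassName
	}
	return ""
}

func main() {
	field := "nginx"
	annotated := map[string]string{ingressClassAnnotation: "cilium"}
	fmt.Println(effectiveIngressClass(annotated, &field)) // cilium
	fmt.Println(effectiveIngressClass(nil, &field))       // nginx
}
```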
93f4f34 .github/workflows: fix external contribution detection [ upstream commit a71374d83214766f3aefecbf46e17f78c9d28f80 ] GH is unable to detect if a user belongs to an organization unless that user has that membership visible in their GH profile. In order to fix this detection, this commit adds a script to perform the membership request with a GH token that has permission to read all members that belong to the Cilium organization. [ Backport note: Fixed minor conflict due to digest updates from Dependabot. ] Fixes: d4aedd9c592f (".github/workflows: print author association") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
61440d2 kvstore, clustermesh: pass context to BackendOperations.Close [ upstream commit 520ab3a50ea70ec15dfea05bb1ad579bbc3d1f4d ] Allow to pass a context, namely the controller context to the backend close operations. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
44c4756 clustermesh: pass controller context to KVStore operations [ upstream commit 4d757e40dce89e2b65380ce4cbebf8e593b715cf ] Make sure these operations get cancelled properly in case the controller terminates. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
39f149a proxy: Fix deadlock in error path of CreateOrUpdateRedirect [ upstream commit a1442e0d6d0601fa09a59b7f322b2aead4edc37e ] When an error occurs during CreateOrUpdateRedirect, it tries to revert previous actions by invoking the `revertFunc` in some (but not all) error paths. This can unfortunately lead to a deadlock, because one of the `RevertFunc`s in its `revertStack` is the one returned by `p.removeRedirect`, which states in its contract that the `p.mutex` must not be held when the revert func is called. This contract was violated in the previous code, as `revertFunc` was called on errors paths without unlocking `p.mutex` first. By moving the `revertFunc` into a defer statement, we ensure it's executed for all error paths. This means that the new code now also calls `implUpdateRevertFunc` in the two error cases "unable to update existing redirect" and "unable to remove old redirect". Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
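The locking pattern described in 39f149a can be shown with a small Go sketch. It is not Cilium's `CreateOrUpdateRedirect`; the `proxy` type, its fields, and the `fail` parameter are hypothetical. The point is that the deferred function first releases the mutex and only then runs the revert functions, which themselves take the mutex, so the revert runs on every error path without deadlocking.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// proxy is a hypothetical, heavily simplified stand-in for the real type.
type proxy struct {
	mutex     sync.Mutex
	redirects map[string]int
}

// createOrUpdate sets a redirect port and reverts the change on any error.
// The fail flag simulates a later step going wrong.
func (p *proxy) createOrUpdate(name string, port int, fail bool) (err error) {
	var revertStack []func()

	p.mutex.Lock()
	defer func() {
		// Unlock first: the revert functions must not be called with the mutex held.
		p.mutex.Unlock()
		if err != nil {
			for i := len(revertStack) - 1; i >= 0; i-- {
				revertStack[i]()
			}
		}
	}()

	old, existed := p.redirects[name]
	p.redirects[name] = port
	revertStack = append(revertStack, func() {
		p.mutex.Lock() // would deadlock if the caller still held the mutex
		defer p.mutex.Unlock()
		if existed {
			p.redirects[name] = old
		} else {
			delete(p.redirects, name)
		}
	})

	if fail {
		return errors.New("unable to update existing redirect")
	}
	return nil
}

func main() {
	p := &proxy{redirects: map[string]int{"a": 1}}
	fmt.Println(p.createOrUpdate("a", 2, true), p.redirects["a"])
	fmt.Println(p.createOrUpdate("b", 3, false), p.redirects["b"])
}
```

Using a named return value lets the deferred function see the error regardless of which `return` statement produced it.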
0f72d60 tests: add exception for etcd error [ upstream commit e3803b41670945d73373b6858d7ffb16b7bedd64 ] The etcd error 'Failed to update lock: etcdserver: request timed out' does not seem to be related with Cilium. Thus, we will add it to the list of exceptions and not fail the CI because of this error. This seems to be a consequence of etcd's errors such as: - 'Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"' - 'Failed to update lock: etcdserver: request timed out' Those issues are referred in the etcd repository: - https://github.com/etcd-io/etcd/issues/14071 - https://github.com/etcd-io/etcd/issues/14027#issuecomment-1293725491 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
4c263f6 test: print log messages that need to be investigated [ upstream commit 7692443803a94435fb78a054ecfdffba70ed649d ] There is more value in printing the log messages than the exception, so we should add to the Jenkins test output the messages that caused the test to fail. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
ef74244 cilium: Fix missing error log dump from compilation [ upstream commit 59b9433ef8a4ca20b04cae81e06ebd0126e474c0 ] When compilation fails in Cilium, it currently does not dump the actual compilation error. This is in particular annoying given the developer experience suffers from it if an error cannot be reproduced otherwise. Turns out this got changed via 6882e98baf7c ("datapath: always generate BTF debug information") where it switched from `compileAndLink(ctx, prog, dir, debug, args...)` to unconditional `compileAndLink(ctx, prog, dir, true, args...)` which then accidentally made the warning messages disappear from regular (non-debug) mode in the scopedLog selector. Before looking into this, I was not aware that these would show up in debug mode. Make this a plain log.Warn given this is something we should warn about and have easy introspection no matter if in debug or non-debug mode. Before fix: [...] level=error msg="Failed to compile bpf_host.dbg.o: exit status 1" compiler-pid=3097802 linker-pid=3097803 subsys=datapath-loader level=error msg="BPF template object creation failed" bpfHeaderfileHash=b67b2d6708b2cb48f42dc6305e5802be6fba717325a2006082c3a0b396e67630 error="failed to compile template program /var/run/cilium/state/templates/b67b2d6708b2cb48f42dc6305e5802be6fba717325a2006082c3a0b396e67630: Failed to compile bpf_host.dbg.o: exit status 1" subsys=datapath-loader [...] After fix: [...] 
level=error msg="Failed to compile bpf_host.dbg.o: exit status 1" compiler-pid=3086249 linker-pid=3086250 subsys=datapath-loader level=warning msg="In file included from /var/lib/cilium/bpf/bpf_host.c:55:" subsys=datapath-loader level=warning msg="/var/lib/cilium/bpf/lib/nodeport.h:570:8: error: implicit declaration of function 'fib_redirect_v6' [-Werror,-Wimplicit-function-declaration]" subsys=datapath-loader level=warning msg=" ret = fib_redirect_v6(ctx, l3_off, ip6, ctx_get_ifindex(ctx), &oif);" subsys=datapath-loader level=warning msg=" ^" subsys=datapath-loader level=warning msg="/var/lib/cilium/bpf/lib/nodeport.h:601:8: error: implicit declaration of function 'fib_redirect_v4' [-Werror,-Wimplicit-function-declaration]" subsys=datapath-loader level=warning msg=" ret = fib_redirect_v4(ctx, l3_off, ip4, ctx_get_ifindex(ctx), &oif);" subsys=datapath-loader level=warning msg=" ^" subsys=datapath-loader level=warning msg="2 errors generated." subsys=datapath-loader level=error msg="BPF template object creation failed" bpfHeaderfileHash=b67b2d6708b2cb48f42dc6305e5802be6fba717325a2006082c3a0b396e67630 error="failed to compile template program /var/run/cilium/state/templates/b67b2d6708b2cb48f42dc6305e5802be6fba717325a2006082c3a0b396e67630: Failed to compile bpf_host.dbg.o: exit status 1" subsys=datapath-loader [...] Fixes: 6882e98baf7c ("datapath: always generate BTF debug information") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
307d271 certloader: test reloading by waiting on the files to be reloaded [ upstream commit 15ff0dc2a481b31324bcfa84902acdd029376458 ] Before this patch, certloader reloading tests were using a short static timeout after which we would expect the files to be reloaded. Because of that the reloading tests were flaky, as the timeout was too short in some testing environments. Instead of increasing the timeout, this patch makes the tests actively wait on the files to be reloaded, looping over a ticker and checking that the reloading conditions are met. We're relying on the per-test timeout (default 10m) to bail out. Fixes https://github.com/cilium/cilium/issues/21999 Signed-off-by: Alexandre Perrin <alex@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
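The "loop over a ticker until the condition holds" approach from 307d271 can be sketched in Go as below. The `reloader` type and `waitUntil` helper are hypothetical and only loosely modelled on the generation counter mentioned in the neighbouring certloader commit; like the tests described there, the helper has no timeout of its own and would rely on an outer per-test timeout.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// reloader is a hypothetical object that counts how often it has reloaded.
type reloader struct {
	generation uint64
}

func (r *reloader) reload()     { atomic.AddUint64(&r.generation, 1) }
func (r *reloader) gen() uint64 { return atomic.LoadUint64(&r.generation) }

// waitUntil blocks, re-checking cond on every tick, until cond returns true.
// There is deliberately no timeout here; the caller's test timeout bails out.
func waitUntil(interval time.Duration, cond func() bool) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		if cond() {
			return
		}
	}
}

func main() {
	r := &reloader{}
	go func() {
		time.Sleep(20 * time.Millisecond) // simulate a reload that takes a while
		r.reload()
	}()
	waitUntil(5*time.Millisecond, func() bool { return r.gen() >= 1 })
	fmt.Println("reloaded, generation:", r.gen())
}
```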
c74bd30 certloader: add a generation counter for caCertPool and keypair [ upstream commit 07a983a18acc7eb2a5b17230f57bd10d387fbfd4 ] This will be used in the following patches for certloader testing. Signed-off-by: Alexandre Perrin <alex@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
839228a certloader: remove timeout from future reload testing [ upstream commit 4ab5b683d24be70c7c7f5589e952ed3b2df5540c ] It has proven flaky and we should be able to rely on the per-test timeout (default to 10m) instead. Fixes https://github.com/cilium/cilium/issues/22750 Signed-off-by: Alexandre Perrin <alex@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
2d890ec docs: Disable exclusive lock when installing in chaining mode with aws-cni [ upstream commit e61cee7884ecd9340eb090920990586219eb6cc7 ] When Cilium is installed via Helm with `cni.exclusive=true`, AWS CNI functionality is not restored when Cilium is uninstalled resulting in EKS nodes to be in NotReady state due to ``` container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized ``` This is because original CNI config file `10-aws.conflist` was renamed to `10-aws.conflist.cilium_bak` as a result of exclusive locking mechanism. Setting `cni.exclusive=false` prevents this issue. Signed-off-by: Martin Odstrcilik <martin.odstrcilik@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
22f2858 bpf: Remove link scope of cilium_host's IPv4 address [ upstream commit 92a3e31428f60b91a1db6f7a2a658fa3e11fc2ad ] Kube-proxy always masquerades DNATed packets going to NodePort services. This is to ensure that reply packets always flow through the intermediate, DNATing node. Consider the following path: pod@node1 -> nodeport@node2 -> backend@node3 A packet is sent from pod@node1 to a NodePort service with node2's IP address. Node2 DNATs the packet and forwards it to the backend on node3. If node2 doesn't also masquerade the packet, the reply packet will be sent directly to node1, bypassing the reverse DNAT. In tunneling mode however, kube-proxy appears unable to pick the correct source IP for masquerading. Consider the following packet flow (under VXLAN + endpoint routes + IPsec [1]): <- endpoint 656 flow 0x5c7eb4 , identity 20590->unknown state unknown ifindex 0 orig-ip 0.0.0.0: 10.0.1.172:57110 -> 192.168.56.12:30656 tcp SYN -> stack flow 0x5c7eb4 , identity 20590->host state new ifindex 0 orig-ip 0.0.0.0: 10.0.1.172:57110 -> 192.168.56.12:30656 tcp SYN <- host flow 0x5c7eb4 , identity 20590->unknown state unknown ifindex lxc7e0fe2229abe orig-ip 0.0.0.0: 10.0.2.15:45035 -> 10.0.0.165:8080 tcp SYN -> stack flow 0x5c7eb4 , identity 20590->unknown state unknown ifindex cilium_host orig-ip 0.0.0.0: 10.0.2.15:45035 -> 10.0.0.165:8080 tcp SYN <- stack encrypted flow 0x5c7eb4 , identity 20590->unknown state new ifindex cilium_net orig-ip 0.0.0.0: 10.0.2.15:45035 -> 10.0.0.165:8080 tcp SYN -> overlay encrypted flow 0x5c7eb4 , identity 20590->unknown state new ifindex cilium_vxlan orig-ip 0.0.0.0: 10.0.2.15:45035 -> 10.0.0.165:8080 tcp SYN Client pod 10.0.1.172 sends a packet to NodePort 30656 on node 2. That packet is masqueraded to 10.0.2.15 (line 3), the IP on the default interface. This choice is incorrect as the packet will then go through the tunnel and not the underlay. 
The reply will therefore not be sent through the tunnel and may even fail if 10.0.2.15 isn't routable from node 2 (as is the case in our testing setup). Instead, kube-proxy should pick the IP address of cilium_host, which belongs to the node's pod CIDR, thus ensuring the reply will be routed through the tunnel. Why isn't it? Checking the kernel's source code [2], we can see that the scope of IP addresses on the interfaces is taken into account in addition to the destination IP (and other packet information in case of source routing, etc.). Specifically, in the case of netfilter's masquerading, inet_select_addr is called with a scope of RT_SCOPE_UNIVERSE (0). Therefore, only IP addresses with a scope equal to RT_SCOPE_UNIVERSE will be picked. This commit thus removes the link scope on the IPv4 address of cilium_host, such that the address now has a RT_SCOPE_UNIVERSE scope (default). This will be tested in the Cilium Datapath workflow via a subsequent pull request, but we need to fix one other bug before we can do that. 1 - IPsec doesn't matter to the bug here. Endpoint routes however does. If endpoint routes is enabled, Cilium adds a masquerading rule in front of kube-proxy's to always masquerade DNATed pod traffic to cilium_host IP address. See [3] for details. 2 - https://github.com/torvalds/linux/blob/v5.19/net/ipv4/devinet.c#L1324 3 - https://github.com/cilium/cilium/blob/v1.13.0-rc4/pkg/datapath/iptables/iptables.go#L1216-L1242 Co-authored-by: Liu Xu <liuxu623@gmail.com> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
1fa9489 egressgw: ensure stale IP routes/rules are deleted [ upstream commit ecad441f53707c63a82c696678e3ea0b35216a15 ] when the --install-egress-gateway-routes agent option is set to false Fixes: #23255 Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC
70e5732 egressgw: pass option.Config.InstallEgressGatewayRoutes to manager [ upstream commit 68bc33f79646a83fb0558bfc05abcf6c20394d91 ] Instead of accessing the global config, pass the variable to NewEgressGatewayManager() and store it in the manager. Signed-off-by: Gilberto Bertin <jibi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 01 February 2023, 15:32:06 UTC