https://github.com/cilium/cilium

Revision Author Date Message Commit Date
d637121 Prepare for release v1.11.11 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 16 November 2022, 17:55:57 UTC
a83182b images: update cilium-{runtime,builder} Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 16 November 2022, 05:13:35 UTC
be42a2a docs: Reword note in Azure CNI chaining documentation [ upstream commit b3cd077f711de5d33d0320cd67e1f9487478a383 ] Clarify that Azure CNI chaining is different than Azure CNI Powered by Cilium. Signed-off-by: Will Daly <widaly@microsoft.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 15 November 2022, 21:42:59 UTC
af9f8c8 ipam: Fix overlapping PodCIDR allocation [ upstream commit 4c9c1d352abd78ef85917402d3169bdb611456b8 ] This commit fixes an edge case in the `NodesPodCIDRManager`. If there were any nodes on operator startup which have no PodCIDRs, the operator would sometimes assign PodCIDRs to these nodes which have already been allocated to other nodes. The operator assumed that when `k8sCiliumNodesCacheSynced` closes, all node events have been processed. And it proceeds to call `Resync` on the `nodeManager`. The `NodesPodCIDRManager` will queue any nodes without PodCIDRs to be allocated once the `canAllocatePodCIDRs` variable is set. This variable is set by the `Resync`. So, the assumption/expected behavior is that the `NodesPodCIDRManager.Update` function has been called for all nodes in the cache before `Resync` is called. However, this wasn't the case. The `startSynchronizingCiliumNodes` function starts the informer and connects the nodeManager to it. But instead of handling the events immediately, the callbacks enqueue the events, to be handled by a separate goroutine. This means that `k8sCiliumNodesCacheSynced` is closed once all of the node events are enqueued, not when they have been processed by the `nodeManager`. This commit fixes this behavior by processing all events immediately in the informer callbacks until the full sync is complete, at which point we will switch over to using the workqueue. Fixes: #21482 Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 15 November 2022, 21:42:59 UTC
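The idea behind this fix can be illustrated with a minimal Go sketch. It is not the operator's code: `dispatcher`, `nodeEvent`, `OnEvent`, and `MarkSynced` are hypothetical names, and a buffered channel stands in for the workqueue. Events are handled inline until the initial sync is marked complete, so "synced" implies "processed"; only afterwards are events deferred.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeEvent is a hypothetical stand-in for a CiliumNode informer event.
type nodeEvent struct{ name string }

// dispatcher handles events inline until MarkSynced is called; afterwards it
// hands them to a queue that a separate worker would drain.
type dispatcher struct {
	mu      sync.Mutex
	synced  bool
	handled []string       // events processed synchronously
	queue   chan nodeEvent // events deferred to a worker
}

func newDispatcher() *dispatcher {
	return &dispatcher{queue: make(chan nodeEvent, 16)}
}

// OnEvent is what an informer callback would call.
func (d *dispatcher) OnEvent(ev nodeEvent) {
	d.mu.Lock()
	if !d.synced {
		// Initial sync: process immediately, so that "cache synced"
		// implies "every node has been seen by the manager".
		d.handled = append(d.handled, ev.name)
		d.mu.Unlock()
		return
	}
	d.mu.Unlock()
	d.queue <- ev // steady state: defer to the worker
}

// MarkSynced switches the dispatcher over to queued handling.
func (d *dispatcher) MarkSynced() {
	d.mu.Lock()
	d.synced = true
	d.mu.Unlock()
}

// Handled returns a copy of the names processed synchronously.
func (d *dispatcher) Handled() []string {
	d.mu.Lock()
	defer d.mu.Unlock()
	return append([]string(nil), d.handled...)
}

func main() {
	d := newDispatcher()
	d.OnEvent(nodeEvent{name: "node-a"})
	d.OnEvent(nodeEvent{name: "node-b"})
	d.MarkSynced() // only now would a resync be safe to run
	d.OnEvent(nodeEvent{name: "node-c"})
	fmt.Println(d.Handled(), len(d.queue))
}
```

In this sketch the first two events are visible to the handler before the sync point, while the third waits in the queue.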
09c020f Add a section with distro-specific considerations [ upstream commit e121b5d89a65aacc24f01b154afd7c2944557d5b ] Over time we've been accumulating some knowledge about particular Linux distributions and groups of distributions that has gone largely unnoted in our documentation. A good understanding and implementation of these considerations are extremely important to ensure that Cilium runs properly, so this commit attempts to add a subsection containing this information. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 15 November 2022, 21:42:59 UTC
64441a3 docs: Remove autoDirectNodeRoutes where not needed [ upstream commit 34127e6acef82c007a3c492a180c20561d769d6d ] The KPR guide contains the autoDirectNodeRoutes option in most Helm commands, but that option isn't a requirement for KPR subfeatures and may even fail if Kubernetes nodes are not L2-connected. Signed-off-by: Paul Chaignon <paul@cilium.io> 15 November 2022, 21:42:59 UTC
72635fb chore(deps): update module go to 1.17 Signed-off-by: Renovate Bot <bot@renovateapp.com> 15 November 2022, 00:20:31 UTC
0a5613e chore(deps): update docker.io/library/alpine docker tag to v3.16.3 Signed-off-by: Renovate Bot <bot@renovateapp.com> 15 November 2022, 00:16:37 UTC
4375d0a chore(deps): update docker.io/library/alpine docker tag to v3.16.3 Signed-off-by: Renovate Bot <bot@renovateapp.com> 15 November 2022, 00:15:31 UTC
c19eac3 build(deps): bump github/codeql-action from 2.1.30 to 2.1.32 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.30 to 2.1.32. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/18fe527fa8b29f134bb91f32f1a5dc5abb15ed7f...4238421316c33d73aeea2801274dd286f157c2bb) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 15 November 2022, 00:11:24 UTC
2de6b53 chore(deps): update docker.io/library/alpine:3.12.7 docker digest to de25c7f Signed-off-by: Renovate Bot <bot@renovateapp.com> 14 November 2022, 14:52:57 UTC
5efc8f4 chore(deps): update docker.io/library/alpine:3.16.2 docker digest to 65a2763 Signed-off-by: Renovate Bot <bot@renovateapp.com> 14 November 2022, 11:31:47 UTC
508c21c chore(deps): update docker.io/library/golang:1.17.13 docker digest to 87262e4 Signed-off-by: Renovate Bot <bot@renovateapp.com> 14 November 2022, 11:31:07 UTC
7784135 chore(deps): update docker.io/library/ubuntu:20.04 docker digest to 450e066 Signed-off-by: Renovate Bot <bot@renovateapp.com> 14 November 2022, 11:25:04 UTC
2d87c0f build(deps): bump golangci/golangci-lint-action from 3.3.0 to 3.3.1 Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3.3.0 to 3.3.1. - [Release notes](https://github.com/golangci/golangci-lint-action/releases) - [Commits](https://github.com/golangci/golangci-lint-action/compare/07db5389c99593f11ad7b44463c2d4233066a9b1...0ad9a0988b3973e851ab0a07adf248ec2e100376) --- updated-dependencies: - dependency-name: golangci/golangci-lint-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 14 November 2022, 08:50:03 UTC
02bdd4d build(deps): bump github/codeql-action from 2.1.29 to 2.1.30 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.29 to 2.1.30. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/ec3cf9c605b848da5f1e41e8452719eb1ccfb9a6...18fe527fa8b29f134bb91f32f1a5dc5abb15ed7f) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 03 November 2022, 13:48:13 UTC
6af996e EndpointManager: fix deadlock when releasing an endpoint [ upstream commit 061e55f3fe7f9e0028c7fa779997c22ce9670ae7 ] In high-churn clusters, there can be a three-party deadlock between the EndpointManager, the PolicyRepository, and a given Endpoint. One of the "links in the chain" is merely trying to get the container ID and namespace+name of an Endpoint for logging, which we already have. So, rather than trying to lock an Endpoint to get its identifiers again, just use the copy we already have. Fixes: dae07b58 (endpointmanager: Remove goroutine for ID release) Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 02 November 2022, 16:02:14 UTC
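A small Go sketch of the approach described in this commit message, under stated assumptions: `endpoint`, `endpointIDs`, `snapshotIDs`, and `releaseLogLine` are hypothetical names, not Cilium's types. The identifiers are copied once while the lock is legitimately held, and the logging path uses only that copy, so it never needs the endpoint lock again.

```go
package main

import (
	"fmt"
	"sync"
)

// endpoint is a hypothetical, heavily reduced endpoint type.
type endpoint struct {
	mu          sync.RWMutex
	containerID string
	podName     string
}

// endpointIDs is an immutable copy of the identifiers needed for logging.
type endpointIDs struct {
	containerID string
	podName     string
}

// snapshotIDs copies the identifiers under the endpoint lock.
func (e *endpoint) snapshotIDs() endpointIDs {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return endpointIDs{containerID: e.containerID, podName: e.podName}
}

// releaseLogLine formats a log message from the copy only; it takes no lock.
func releaseLogLine(ids endpointIDs) string {
	return fmt.Sprintf("releasing endpoint containerID=%s pod=%s",
		ids.containerID, ids.podName)
}

func main() {
	ep := &endpoint{containerID: "abc123", podName: "default/web-0"}
	ids := ep.snapshotIDs()

	ep.mu.Lock() // simulate another party holding the endpoint lock
	fmt.Println(releaseLogLine(ids))
	ep.mu.Unlock()
}
```

Because `releaseLogLine` touches no shared state, it completes even while another party holds the endpoint's write lock.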
097a980 docs: Update k8s NetworkPolicy descriptions [ upstream commit 6fbbbb9469a86d9b5458d995b775d192cd862d8a ] Add some additional notes to highlight how Cilium's k8s NetworkPolicy support works, including: - Egress policies have not been beta for several years. - Port Ranges. [ Backport note: File Documentation/network/kubernetes/policy.rst has been renamed into Documentation/concepts/kubernetes/policy.rst on master branch. ] Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 02 November 2022, 16:02:14 UTC
006ae3f Fixed CCNP garbage collection [ upstream commit 694892c74c1e204ac63cc5e639cda35269367794 ] CCNPs are converted internally into CNPs, but metadata.name has been forgotten Fixes #21393 [ Backport note: File operator/cilium_node.go has been renamed into operator/cmd/cilium_node.go in master branch. ] Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 02 November 2022, 16:02:14 UTC
2bc7e52 build(deps): bump github/codeql-action from 2.1.28 to 2.1.29 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.28 to 2.1.29. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/cc7986c02bac29104a72998e67239bb5ee2ee110...ec3cf9c605b848da5f1e41e8452719eb1ccfb9a6) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 28 October 2022, 13:47:49 UTC
61974c6 build(deps): bump KyleMayes/install-llvm-action from 1.5.5 to 1.6.0 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.5 to 1.6.0. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/4f17b6579351fb03506d988e59077826c366412c...665aaf9d6fba342a852f55fecc5688e7f00e6663) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 25 October 2022, 09:28:10 UTC
f354eb5 build(deps): bump actions/upload-artifact from 3.1.0 to 3.1.1 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.0 to 3.1.1. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/3cea5372237819ed00197afe530f5a7ea3e805c8...83fd05a356d7e2593de66fc9913b3002723633cb) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 21 October 2022, 22:55:01 UTC
d2529b4 build(deps): bump actions/setup-go from 3.3.0 to 3.3.1 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3.3.0 to 3.3.1. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/268d8c0ca0432bb2cf416faae41297df9d262d7f...c4a742cab115ed795e34d4513e2cf7d472deb55f) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 21 October 2022, 22:02:21 UTC
b2b2863 build(deps): bump actions/download-artifact from 3.0.0 to 3.0.1 Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 3.0.0 to 3.0.1. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/fb598a63ae348fa914e94cd0ff38f362e927b741...9782bd6a9848b53b110e712e20e42d89988822b7) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 21 October 2022, 20:51:31 UTC
2a4e0ba build(deps): bump golangci/golangci-lint-action from 3.2.0 to 3.3.0 Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3.2.0 to 3.3.0. - [Release notes](https://github.com/golangci/golangci-lint-action/releases) - [Commits](https://github.com/golangci/golangci-lint-action/compare/537aa1903e5d359d0b27dbc19ddd22c5087f3fbc...07db5389c99593f11ad7b44463c2d4233066a9b1) --- updated-dependencies: - dependency-name: golangci/golangci-lint-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 21 October 2022, 20:51:05 UTC
1da9918 build(deps): bump github/codeql-action from 2.1.27 to 2.1.28 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.27 to 2.1.28. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/807578363a7869ca324a79039e6db9c843e0e100...cc7986c02bac29104a72998e67239bb5ee2ee110) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 18 October 2022, 17:09:00 UTC
2c8e8b2 build(deps): bump docker/setup-buildx-action from 2.2.0 to 2.2.1 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 2.2.0 to 2.2.1. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/c74574e6c82eeedc46366be1b0d287eff9085eb6...8c0edbc76e98fa90f69d9a2c020dcb50019dc325) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 18 October 2022, 17:08:43 UTC
b91ff88 build(deps): bump docker/setup-buildx-action from 2.1.0 to 2.2.0 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 2.1.0 to 2.2.0. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/95cb08cb2672c73d4ffd2f422e6d11953d2a9c70...c74574e6c82eeedc46366be1b0d287eff9085eb6) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 18 October 2022, 00:28:29 UTC
a2f8daa install: Update image digests for v1.11.10 Generated from https://github.com/cilium/cilium/actions/runs/3265802054. `docker.io/cilium/cilium:v1.11.10@sha256:b804f33301dc57c38839c41a1ddac26e3c25bcc35d4cb50df38075b8348395b5` `quay.io/cilium/cilium:v1.11.10@sha256:b804f33301dc57c38839c41a1ddac26e3c25bcc35d4cb50df38075b8348395b5` `docker.io/cilium/clustermesh-apiserver:v1.11.10@sha256:a6f4901e29876666e99deba16cefa326a5f14343671742f87270139171256190` `quay.io/cilium/clustermesh-apiserver:v1.11.10@sha256:a6f4901e29876666e99deba16cefa326a5f14343671742f87270139171256190` `docker.io/cilium/docker-plugin:v1.11.10@sha256:fda9c537cdceb64a5b164bbbd458221de1789d032b96ff8de59f64334b9c1eab` `quay.io/cilium/docker-plugin:v1.11.10@sha256:fda9c537cdceb64a5b164bbbd458221de1789d032b96ff8de59f64334b9c1eab` `docker.io/cilium/hubble-relay:v1.11.10@sha256:3186f65d6dcbc42f5ca32beca183b93470dc88ca2cee28c01ec89fb1a909d609` `quay.io/cilium/hubble-relay:v1.11.10@sha256:3186f65d6dcbc42f5ca32beca183b93470dc88ca2cee28c01ec89fb1a909d609` `docker.io/cilium/operator-alibabacloud:v1.11.10@sha256:8b1910d6e5ebfee50191a9c80f24ffdb737f4f908c1a287b205402ee9e109be4` `quay.io/cilium/operator-alibabacloud:v1.11.10@sha256:8b1910d6e5ebfee50191a9c80f24ffdb737f4f908c1a287b205402ee9e109be4` `docker.io/cilium/operator-aws:v1.11.10@sha256:f5bd0b9cac11667e63fb70ad9d33aab7c59c0f270c334af4ccfd8bb2d6b62210` `quay.io/cilium/operator-aws:v1.11.10@sha256:f5bd0b9cac11667e63fb70ad9d33aab7c59c0f270c334af4ccfd8bb2d6b62210` `docker.io/cilium/operator-azure:v1.11.10@sha256:ab2f74c1d478b53ac1ac4081dab261b4ecd2ea0beda4b73c75e0578e0f1238a9` `quay.io/cilium/operator-azure:v1.11.10@sha256:ab2f74c1d478b53ac1ac4081dab261b4ecd2ea0beda4b73c75e0578e0f1238a9` `docker.io/cilium/operator-generic:v1.11.10@sha256:6a947cc0655ad0383b929267fe21ab86dd72c05792a8f4056c513f39f87b53ac` `quay.io/cilium/operator-generic:v1.11.10@sha256:6a947cc0655ad0383b929267fe21ab86dd72c05792a8f4056c513f39f87b53ac` 
`docker.io/cilium/operator:v1.11.10@sha256:69f5207388a3247b537946e344f7f9a6b4b6b3a26eaaa7e50fef801d986977c3` `quay.io/cilium/operator:v1.11.10@sha256:69f5207388a3247b537946e344f7f9a6b4b6b3a26eaaa7e50fef801d986977c3` Signed-off-by: Quentin Monnet <quentin@isovalent.com> 17 October 2022, 16:38:41 UTC
7e6367f build(deps): bump docker/setup-qemu-action from 2.0.0 to 2.1.0 Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 2.0.0 to 2.1.0. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](https://github.com/docker/setup-qemu-action/compare/8b122486cedac8393e77aa9734c3528886e4a1a8...e81a89b1732b9c48d79cd809d8d81d79c4647a18) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 13 October 2022, 17:21:18 UTC
ba5d468 build(deps): bump docker/build-push-action from 3.1.1 to 3.2.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 3.1.1 to 3.2.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/c84f38281176d4c9cdb1626ffafcd6b3911b5d94...c56af957549030174b10d6867f20e78cfd7debc5) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 13 October 2022, 17:20:09 UTC
323cac6 build(deps): bump docker/login-action from 2.0.0 to 2.1.0 Bumps [docker/login-action](https://github.com/docker/login-action) from 2.0.0 to 2.1.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/49ed152c8eca782a232dede0303416e8f356c37b...f4ef78c080cd8ba55a85445d5b36e214a81df20a) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 13 October 2022, 17:19:59 UTC
b58bc73 build(deps): bump docker/setup-buildx-action from 2.0.0 to 2.1.0 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 2.0.0 to 2.1.0. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/dc7b9719a96d48369863986a06765841d7ea23f6...95cb08cb2672c73d4ffd2f422e6d11953d2a9c70) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 13 October 2022, 17:19:49 UTC
a791fb0 build(deps): bump actions/cache from 3.0.10 to 3.0.11 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.10 to 3.0.11. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/56461b9eb0f8438fd15c7a9968e3c9ebb18ceff1...9b0c1fce7a93df8e3bb8926b0d6e9d89e92f20a7) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 13 October 2022, 17:08:45 UTC
87e9bc1 build(deps): bump dorny/paths-filter from 2.10.2 to 2.11.1 Bumps [dorny/paths-filter](https://github.com/dorny/paths-filter) from 2.10.2 to 2.11.1. - [Release notes](https://github.com/dorny/paths-filter/releases) - [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md) - [Commits](https://github.com/dorny/paths-filter/compare/v2.10.2...4512585405083f25c027a35db413c2b3b9006d50) --- updated-dependencies: - dependency-name: dorny/paths-filter dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 12 October 2022, 21:42:17 UTC
97f04bb Prepare for release v1.11.10 Signed-off-by: Quentin Monnet <quentin@isovalent.com> 11 October 2022, 20:47:01 UTC
9586d22 images: update cilium-{runtime,builder} Signed-off-by: Quentin Monnet <quentin@isovalent.com> 11 October 2022, 12:04:08 UTC
31de89f ipcache: Defer CIDR identity release and deletion [ upstream commit 4afbcc38ce91118f1cb4a9f715a268775cdf7f6e ] [ Backporter's notes: Switch from netip to string internally, since the older branch doesn't use netip in ipcache. Additionally, adapt to static IPIdentityCache. ] A user reports the following deadlock between lock chain [1] and lock chain [2]. Lock chain 2 is at a high level trying to implement the new network policy for an endpoint. As part of this step, it calculates a new policy, then attempts to switch the policy pointer over to the new policy, and then garbage collects references & objects in the old policy. When garbage collecting the old policy objects, it goes into the selectorcache, identifies that certain identities are no longer used by the selector, and cleans up those identities. Hence we end up in the ReleaseCIDRIdentitiesByID() call, attempting to grab the ipcache lock. Commit 40e13ea attempted to mitigate a bug where interleaved ordering of identity reference counting and ipcache updates could lead to inconsistent internal state (=> packet drops). The core idea there was that these two operations (identity refcount update + ipcache update) must happen together in the same critical section, but the exact timing when these two operations occur is not particularly important. I think they should happen in a canonical order, but when that order is iterated is not critical. Given that overall agent state is eventually consistent, cleanup of these ipcache entries can happen at any subsequent time. The sooner the better for sure, but I don't think that there are any hard constraints on this. If lock chain 2 was not holding the endpoint lock while releasing these resources, then this deadlock would not occur. So, this commit proposes to add a new queue + GC goroutine in the ipcache that handles the release of these CIDR identities out-of-band so that they can be cleaned up while not holding the endpoint lock. 
Then, from the endpoint policy generation perspective it can continue with the remaining endpoint policy calculation / datapath updates, return, and then that will allow lock chain 1 to proceed since it's waiting for all endpoints to regenerate their policies, then once that completes & unlocks the lock, this new dedicated goroutine can do this cleanup. [1]: Lock chain 1: ipc.Lock() -> e.lockAlive() When removeLabelsFromIPs is triggered through an EP change, it performs ipc.Lock() first, then collects information about added/deleted identities in a loop and then calls ipc.UpdatePolicyMaps(...), which then calls UpdatePolicyMaps(...) from the EndpointManager. The call stack of this is: ``` goroutine 237 [semacquire, 417 minutes]: sync.runtime_Semacquire(0xc00038d880?) /usr/local/go/src/runtime/sema.go:56 +0x25 sync.(*WaitGroup).Wait(0xc000398700?) /usr/local/go/src/sync/waitgroup.go:136 +0x52 github.com/cilium/cilium/pkg/ipcache.(*IPCache).UpdatePolicyMaps(0xc00083c980, {0x3461140, 0xc000128008}, 0xc?, 0xc0024410e0) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:235 +0xc7 github.com/cilium/cilium/pkg/ipcache.(*IPCache).removeLabelsFromIPs(0xc00083c980, 0xc0013d5778?, {0x2f2a9f6, 0xf}) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:414 +0x7c5 github.com/cilium/cilium/pkg/ipcache.(*IPCache).RemoveLabelsExcluded(0xc00083c980, 0xc0000d5c50, 0xc000bf41d8?, {0x2f2a9f6, 0xf}) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:328 +0x1ab github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).handleKubeAPIServerServiceEPChanges(0xc000a91d40, 0xc001a739b0?) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint.go:135 +0x5b github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).addKubeAPIServerServiceEPSliceV1(0x1861426?, 0xc0020f2c30) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:205 +0x452 github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).updateK8sEndpointSliceV1(0xc000a91d40, 0xc0020f2c30?, 0xc0020f2c30?) 
/go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:178 +0x69 github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).endpointSlicesInit.func2({0x2ebcec0?, 0xc0020f3a00?}, {0x2ebcec0, 0xc0020f2c30}) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:71 +0x125 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:239 github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1({0x2a48d00?, 0xc000a9b5f0?}) /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:103 +0x2fe k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc000c5a960, 0xc00013b240) /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:554 +0x566 k8s.io/client-go/tools/cache.(*controller).processLoop(0xc000eff710) /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:184 +0x36 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40d645?) /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x102fe25?, {0x3436f20, 0xc000b54b10}, 0x1, 0xc0005e06c0) /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000eff778?, 0x3b9aca00, 0x0, 0x40?, 0x7f72ce949c00?) /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(...) 
/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 k8s.io/client-go/tools/cache.(*controller).Run(0xc000eff710, 0xc0005e06c0) /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:155 +0x2c5 created by github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).endpointSlicesInit /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:156 +0x759 ``` UpdatePolicyMaps will then start a goroutine for each EP and just return a WaitGroup, so that the caller can wait on it. The goroutines then end up locking the EP lock via lockAlive(). [2]: Lock chain 2: e.lockAlive() -> ipc.Lock() If at the same time an EndpointRegenerationEvent is handled, it might end up locking the Endpoint locks first and then (through a quite deep stack) end up calling ipc.Lock() in IPCache.releaseCIDRIdentities(), which will then cause a deadlock. The stack trace looks like this: ``` goroutine 455 [select, 417 minutes]: golang.org/x/sync/semaphore.(*Weighted).Acquire(0xc000842050, {0x3461140, 0xc000128000}, 0x40000000) /go/src/github.com/cilium/cilium/vendor/golang.org/x/sync/semaphore/semaphore.go:60 +0x345 github.com/cilium/cilium/pkg/lock.(*SemaphoredMutex).Lock(...) /go/src/github.com/cilium/cilium/pkg/lock/semaphored_mutex.go:30 github.com/cilium/cilium/pkg/ipcache.(*IPCache).Lock(...) /go/src/github.com/cilium/cilium/pkg/ipcache/ipcache.go:121 github.com/cilium/cilium/pkg/ipcache.(*IPCache).releaseCIDRIdentities(0xc00083c980, {0x3461178, 0xc000445680}, 0x0?) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:203 +0x85 github.com/cilium/cilium/pkg/ipcache.(*IPCache).ReleaseCIDRIdentitiesByID(0xc00083c980, {0x3461178, 0xc000445680}, {0x0, 0x0, 0x1bf08eb000?}) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:265 +0x497 github.com/cilium/cilium/daemon/cmd.cachingIdentityAllocator.ReleaseCIDRIdentitiesByID(...) 
/go/src/github.com/cilium/cilium/daemon/cmd/identity.go:118 github.com/cilium/cilium/pkg/policy.(*SelectorCache).releaseIdentityMappings(0xc000398700, {0x0, 0x0, 0x0}) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:556 +0x9d github.com/cilium/cilium/pkg/policy.(*SelectorCache).RemoveSelectors(0xc000398700, {0xc001a00210, 0x1, 0x0?}, {0x3429f40, 0xc003d42600}) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:961 +0xc5 github.com/cilium/cilium/pkg/policy.(*L4Filter).removeSelectors(0xc003d42600, 0xc001c02518?) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:627 +0x185 github.com/cilium/cilium/pkg/policy.(*L4Filter).detach(0xb?, 0x0?) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:634 +0x1e github.com/cilium/cilium/pkg/policy.L4PolicyMap.Detach(...) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:804 github.com/cilium/cilium/pkg/policy.(*L4Policy).Detach(0xc0048b0180, 0x1?) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:1013 +0x7b github.com/cilium/cilium/pkg/policy.(*selectorPolicy).Detach(...) /go/src/github.com/cilium/cilium/pkg/policy/resolve.go:104 github.com/cilium/cilium/pkg/policy.(*cachedSelectorPolicy).setPolicy(0xc000398770?, 0xc000c275c0?) /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:188 +0x3b github.com/cilium/cilium/pkg/policy.(*PolicyCache).updateSelectorPolicy(0xc00098a090, 0xc000c275c0) /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:124 +0x195 github.com/cilium/cilium/pkg/policy.(*PolicyCache).UpdatePolicy(...) 
/go/src/github.com/cilium/cilium/pkg/policy/distillery.go:153 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regeneratePolicy(0xc000e47c00) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:230 +0x22b github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(0xc000e47c00, 0xc0015cf400) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:809 +0x2c5 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc000e47c00, 0xc0015cf400) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:579 +0x189 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc000e47c00, 0xc0015cf400) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:398 +0x7a5 github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc005584d10, 0xc0003dae80?) /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:53 +0x325 github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1() /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:245 +0x13b sync.(*Once).doSlow(0x1?, 0x442205?) /usr/local/go/src/sync/once.go:68 +0xc2 sync.(*Once).Do(...) /usr/local/go/src/sync/once.go:59 github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run(0x0?) /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:233 +0x45 created by github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:229 +0x76 ``` Thanks to John Watson, Alexander Block, and Chris Tarazi for their reports and assistance in digging to the bottom of this issue. CC: Joe Stringer <joe@cilium.io> Fixes: 40e13ea2a5a9 ("ipcache: Fix race in identity/ipcache release") Signed-off-by: Joe Stringer <joe@cilium.io> 11 October 2022, 05:04:19 UTC
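The "queue + GC goroutine" idea from the commit above can be sketched in Go as follows. This is an illustration under assumptions, not the ipcache implementation: `deferredReleaser`, `Enqueue`, `Remaining`, and `Shutdown` are hypothetical names, and a plain mutex stands in for the ipcache lock. The point is that `Enqueue` never takes the cache lock, so a caller holding an endpoint lock cannot block on it; the background goroutine takes the cache lock later.

```go
package main

import (
	"fmt"
	"sync"
)

// deferredReleaser releases prefix references out-of-band.
type deferredReleaser struct {
	cacheMu sync.Mutex     // stands in for the ipcache lock
	refs    map[string]int // prefix -> reference count

	queueMu sync.Mutex
	queue   []string // prefixes waiting to be released

	wake chan struct{} // signals that the queue is non-empty
	stop chan struct{}
	done chan struct{}
}

func newDeferredReleaser(refs map[string]int) *deferredReleaser {
	r := &deferredReleaser{
		refs: refs,
		wake: make(chan struct{}, 1),
		stop: make(chan struct{}),
		done: make(chan struct{}),
	}
	go r.run()
	return r
}

// Enqueue records a release request without touching cacheMu.
func (r *deferredReleaser) Enqueue(prefix string) {
	r.queueMu.Lock()
	r.queue = append(r.queue, prefix)
	r.queueMu.Unlock()
	select {
	case r.wake <- struct{}{}: // wake the worker
	default: // a wake-up is already pending
	}
}

// drain applies all queued releases while holding the cache lock.
func (r *deferredReleaser) drain() {
	r.queueMu.Lock()
	batch := r.queue
	r.queue = nil
	r.queueMu.Unlock()

	r.cacheMu.Lock()
	defer r.cacheMu.Unlock()
	for _, p := range batch {
		r.refs[p]--
		if r.refs[p] <= 0 {
			delete(r.refs, p)
		}
	}
}

func (r *deferredReleaser) run() {
	defer close(r.done)
	for {
		select {
		case <-r.wake:
			r.drain()
		case <-r.stop:
			r.drain() // flush anything still queued
			return
		}
	}
}

// Shutdown stops the worker after a final drain.
func (r *deferredReleaser) Shutdown() {
	close(r.stop)
	<-r.done
}

// Remaining reports how many prefixes still have references.
func (r *deferredReleaser) Remaining() int {
	r.cacheMu.Lock()
	defer r.cacheMu.Unlock()
	return len(r.refs)
}

func main() {
	r := newDeferredReleaser(map[string]int{"10.0.0.0/24": 1, "10.0.1.0/24": 2})

	r.cacheMu.Lock()         // someone else holds the cache lock...
	r.Enqueue("10.0.0.0/24") // ...yet enqueueing does not block
	r.cacheMu.Unlock()

	r.Shutdown()
	fmt.Println(r.Remaining())
}
```

The sketch uses an unbounded slice for the queue so that `Enqueue` cannot block even under load; a bounded channel would reintroduce a blocking point.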
3855e4e ipcache: Fix metadata access from CIDR allocation [ upstream commit 021ea42e6f8cfbb139286db88c4d5e2f84dac93f ] [ Backporter's notes: Fixed release of metadata lock in error case. Additionally, adapt to static IPIdentityCache. ] The locking order for metadata -> IPCache should be first grabbing the metadata lock then the IPCache lock, according to the documentation in the metadata structure. Correct this lock ordering to conform to the documented pattern. This sort of improper lock ordering could theoretically cause a deadlock if for instance the label injector ran at the same time as this function is called. Found by inspection. Fixes: 40e13ea2a5a9 ("ipcache: Fix race in identity/ipcache release") Signed-off-by: Joe Stringer <joe@cilium.io> 11 October 2022, 05:04:19 UTC
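A consistent lock order, as this commit message describes, can be sketched in Go like this. The names (`store`, `withBothLocks`, `allocateCIDR`, `injectLabel`) are hypothetical and the two mutexes merely stand in for the metadata and IPCache locks. Every path that needs both locks goes through one helper that takes the metadata lock first, and `defer` ensures both are released on error paths too.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// store holds two pieces of state guarded by two separate locks.
type store struct {
	metadataMu sync.Mutex // always acquired first
	ipcacheMu  sync.Mutex // always acquired second
	labels     map[string]string
	identities map[string]int
}

func newStore() *store {
	return &store{labels: map[string]string{}, identities: map[string]int{}}
}

// withBothLocks runs fn holding both locks in the documented order:
// metadata first, then ipcache. Deferred unlocks also cover error returns.
func (s *store) withBothLocks(fn func() error) error {
	s.metadataMu.Lock()
	defer s.metadataMu.Unlock()
	s.ipcacheMu.Lock()
	defer s.ipcacheMu.Unlock()
	return fn()
}

// allocateCIDR records an identity for a prefix.
func (s *store) allocateCIDR(prefix string, id int) error {
	return s.withBothLocks(func() error {
		if prefix == "" {
			return errors.New("empty prefix")
		}
		s.identities[prefix] = id
		return nil
	})
}

// injectLabel records a label for a prefix, using the same lock order.
func (s *store) injectLabel(prefix, label string) error {
	return s.withBothLocks(func() error {
		s.labels[prefix] = label
		return nil
	})
}

func main() {
	s := newStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func(id int) {
			defer wg.Done()
			_ = s.allocateCIDR("10.0.0.0/24", id)
		}(i)
		go func() {
			defer wg.Done()
			_ = s.injectLabel("10.0.0.0/24", "cidr")
		}()
	}
	wg.Wait()
	fmt.Println(s.allocateCIDR("", 1) != nil, len(s.labels), len(s.identities))
}
```

If one of the two functions took the locks in the opposite order, the concurrent loop in `main` could deadlock; with a single order it always finishes.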
d81ebdb fqdn: Upsert all identities to ipcache [ upstream commit e6ad7438357da93e5c5dbf823e71ae349adde61d ] [ Backporter's notes: Conflicts were mostly to convert the methods on (ipc *IPIdentityCache) to the global IPIdentityCache. ] Previously, the logic would only upsert identities into the IPCache if the identity was newly allocated. Logically this makes sense, as the relationship between a CIDR identity and the ipcache should be tightly coupled. However, we have observed in some user environments that ipcache entries may end up being removed from the datapath and the corresponding identity would remain allocated in userspace. As a result, the next time a DNS request arrives which intends to make use of that identity for subsequent connection attempts, it would not populate the ipcache with the identity, leading to packet loss on the connection allowed by ToFQDNs policy. In order to mitigate this issue, ensure that all identities used in DNS responses are populated into the datapath, and track a metric for any cases where this occurs for identities that we expect to already be present in the IPCache. This way, active issues should be mitigated, but we also still have a way to detect whether this mitigation is necessary and whether we need to further investigate the root cause of this issue. Signed-off-by: Joe Stringer <joe@cilium.io> 11 October 2022, 05:04:19 UTC
a3cbee8 ipcache: Fix race in identity/ipcache release [ upstream commit 40e13ea2a5a944a45761fc433c4c971536957f4b ] [ Backporter's notes: Conflicts were mostly to convert the methods on (ipc *IPIdentityCache) to the global IPIdentityCache. Also included a lock leak fix (9238841, "ipcache: Fix lock leak") not in the original commit. ] Create a critical section for identity release + removal from ipcache. Otherwise, it's possible to trigger the following race condition: Goroutine 1 | Goroutine 2 ---------------------------+-------------------------------------- releaseCIDRIdentities() | AllocateCIDRs() -> Release(..., id, ...) | | -> allocate(...) | -> ipc.UpsertGeneratedIdentities(...) -> ipc.deleteLocked(...) | In this case, the expectation from Goroutine 2 is that a new identity is allocated and that identity is inserted into the ipcache, but the result is that the identity is allocated but the ipcache entry is missing. This is partly because the identity released in goroutine 1 is different from the newly allocated identity in goroutine 2, however goroutine 1 will delete the ipcache entry based on the prefix and not the identity. Therefore it's possible for goroutine 1 to delete the ipcache entry corresponding to the identity allocated in goroutine 2. Note that for balancing the upsert / release, we perhaps should cover the entire allocation + ipcache push in Upsert() with the same locking. However, on upsert there is an optional feature of the API to defer the ipcache upsert to a later point, governed by the caller. There is currently no way to extend the locking over that much longer time period, so we only cover the allocation step there. This should still be safe, as one of the following cases should occur: Goroutine 1 | Goroutine 2 ---------------------------+-------------------------------------- Lock | Release() | deleteLocked() | Unlock | | Lock | ipc.allocate() | Unlock | .... 
(repeat below) | Lock | Upsert | Unlock Goroutine 1 | Goroutine 2 ---------------------------+-------------------------------------- | Lock | ipc.allocate() (increment refcount) | Unlock Lock | Release() | (no deleteLocked() due | to refcount from (2)) | Unlock | | .... (repeat below) | Lock | Upsert | Unlock Found by code inspection. Suggested-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 11 October 2022, 05:04:19 UTC
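The second interleaving in the commit above (allocate raises the refcount, so a concurrent release does not delete the ipcache entry) can be sketched as follows. This is a simplified model under stated assumptions: `cidrState`, `allocate`, `upsert`, and `release` are hypothetical names, and a single mutex stands in for the critical section the commit introduces.

```go
package main

import (
	"fmt"
	"sync"
)

// cidrState is a hypothetical model: one mutex guards both the identity
// refcounts and the prefix-keyed ipcache entries.
type cidrState struct {
	mu      sync.Mutex
	refs    map[string]int
	ipcache map[string]bool
}

func newCIDRState() *cidrState {
	return &cidrState{refs: map[string]int{}, ipcache: map[string]bool{}}
}

// allocate only increments the refcount; the ipcache upsert may be deferred.
func (s *cidrState) allocate(prefix string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.refs[prefix]++
}

// upsert pushes the prefix into the ipcache at a later point.
func (s *cidrState) upsert(prefix string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ipcache[prefix] = true
}

// release drops one reference and deletes the ipcache entry in the same
// critical section, but only when the last reference goes away.
func (s *cidrState) release(prefix string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs[prefix] == 0 {
		return
	}
	s.refs[prefix]--
	if s.refs[prefix] == 0 {
		delete(s.refs, prefix)
		delete(s.ipcache, prefix)
	}
}

func main() {
	s := newCIDRState()
	s.allocate("10.1.0.0/16")
	s.upsert("10.1.0.0/16")

	s.allocate("10.1.0.0/16") // goroutine 2 takes a second reference
	s.release("10.1.0.0/16")  // goroutine 1 releases; entry must survive
	fmt.Println(s.ipcache["10.1.0.0/16"])
}
```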
358dd0f ipcache: Add metrics for upsert/delete/recover [ upstream commit 044cd8f23cc531d7441127a15832c39fc43d159d ] These error and total metrics will help users and developers understand ipcache operations at runtime. One specific "recover" error that will occur at runtime will be measured in an upcoming commit. This is the primary motivation for introducing these metrics. Signed-off-by: Joe Stringer <joe@cilium.io> 11 October 2022, 05:04:19 UTC
836f6c7 ipsec: Simplify UpsertIPsecEndpoint prototype [ upstream commit b1d7882b05556a7f80c39e4e046bb50246586ad3 ] The `fwd` argument of the UpsertIPsecEndpoint function is used as the matching CIDR for the destination in the FWD XFRM policy. That CIDR should always be equal to the local CIDR and we already have that as the first argument of UpsertIPsecEndpoint. Therefore, we don't need the third, `fwd`, argument. This commit removes it. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 11 October 2022, 01:21:25 UTC
05bd06e ipsec: Remove superfluous FWD XFRM policy [ upstream commit acd24a10e4bed9c21e025b295939dedd91bb748d ] We currently install two FWD XFRM policies: one as part of UpsertIPsecEndpoint when called for the In direction and another one as part of enableIPsec, even though that function already calls UpsertIPsecEndpoint. Only one FWD XFRM policy is needed to match all forward traffic. This commit removes one of the policies. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 11 October 2022, 01:21:25 UTC
b47ca05 ipsec: Set 0/0 as source of FWD XFRM policy [ upstream commit 49ef791e3f6b04aa3873481e07038f0b9bcf39bb ] We want the FWD XFRM policy to allow all traffic through so we can simply set its source CIDR to 0.0.0.0/0. Similarly, the source IP used in the template doesn't matter so we can set it to 0.0.0.0 to clarify that to the kernel. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 11 October 2022, 01:21:25 UTC
8c012e0 ipsec: Fix incorrect CIDR in XFRM IN policy for proxy [ upstream commit 3650e7b89d5d7ccf3822be600fdb82bd60a07469 ] When IPsec is enabled, we have one XFRM IN policy with mark 0x200 (proxy redirect) configured to allow proxy traffic through. That is needed because that traffic is redirected through the INPUT netfilter chains and the XFRM lookup as part of TPROXY. In EKS & AKS, the CIDR to match destination IP addresses of those packets is incorrect. Instead of being the CIDR(s) encompassing all pod IP addresses, it's the CIDR for the encryption interface. The IP address from the encryption interface should only be used as the outer destination IP address of IPsec encapsulation, as shown below (/16 to match packets in dst; 116.92 IP address as tmpl dst). Before: src 0.0.0.0/0 dst 192.168.116.92/19 dir in priority 0 ptype main mark 0x200/0xf00 tmpl src 0.0.0.0 dst 192.168.116.92 proto esp reqid 1 mode tunnel level use After: src 0.0.0.0/0 dst 192.168.0.0/16 dir in priority 0 ptype main mark 0x200/0xf00 tmpl src 0.0.0.0 dst 192.168.116.92 proto esp reqid 1 mode tunnel level use This bug was causing packet drops when using IPsec with L7 policies (including FQDN policies). It was introduced by a9f18f36e ("datapath/linux/ipsec: Insert additional In rule when tunneling") which introduced this XFRM IN policy for proxy traffic. This new policy was copied from the XFRM IN policy used to decrypt traffic. But in the XFRM IN policy for decryption it's okay to use this /19 CIDR because it's before decryption & decapsulation so that CIDR will match the outer destination IP address (even a /32 would). That's not the case for the inner packet, after decryption. Fixes: a9f18f36e ("datapath/linux/ipsec: Insert additional In rule when tunneling") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 11 October 2022, 01:21:25 UTC
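The difference between the two destination selectors in the commit above can be checked with Go's standard `net` package. This is a small sketch, not Cilium code: the helper `matches` and the pod address `192.168.10.5` are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// matches reports whether ip falls inside the given CIDR selector.
func matches(cidr, ip string) bool {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	parsed := net.ParseIP(ip)
	return parsed != nil && ipnet.Contains(parsed)
}

func main() {
	podIP := "192.168.10.5" // hypothetical pod address, not from the commit

	// "Before": the selector derived from the encryption interface covers
	// only 192.168.96.0/19, so this inner destination is not matched.
	fmt.Println(matches("192.168.116.92/19", podIP))

	// "After": the /16 selector covers the pod address.
	fmt.Println(matches("192.168.0.0/16", podIP))

	// The outer destination (the interface address itself) matches the /19,
	// which is why the decryption policy was unaffected.
	fmt.Println(matches("192.168.116.92/19", "192.168.116.92"))
}
```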
0e58132 ipsec: Fix slightly incorrect assumption [ upstream commit ec66f947a57eddc49fb6a447a16db22783da1fe7 ] Commit 592ff13a ("ipsec: Simplify XFRM IN policies") simplified the XFRM IN policies on the assumption that only one of a matching IN policy or IN state is needed (requirement 1 below). Things are actually a bit more complicated: we do need an XFRM IN policy matching incoming packets even if we have an XFRM IN state for that, but any XFRM policy with a template matching the XFRM state is good enough. Said another way, there are two requirements: 1. Either an XFRM IN policy or an XFRM IN state matches the incoming packet. AND 2. If an XFRM IN state matches the packet, that state must also match an existing XFRM IN policy's template. (If the first requirement isn't satisfied, we get XfrmInNoPols. If the second isn't, we get XfrmInTmplMismatch.) Despite the incorrect assumption, commit 592ff13a ("ipsec: Simplify XFRM IN policies") didn't introduce any bug. In 592ff13a, we removed one of the two XFRM policies we had because an XFRM IN state was already matching packets for the second policy. That didn't break requirement 2 because the first policy, which was not removed, has a template that matches the XFRM IN state. Even if there are currently no bugs introduced, a later change may bring a bug because of this incorrect assumption. This commit therefore partially reverts 592ff13a. We keep some of the simplification (setup doesn't depend on tunneling) and revert the rest. We will have two XFRM IN policies again. Fixes: 592ff13a ("ipsec: Simplify XFRM IN policies") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 11 October 2022, 01:21:25 UTC
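The two requirements stated in the commit above can be written down as a small decision function. This is only a restatement of the commit message as a toy model, not the kernel's XFRM lookup; the function name `xfrmInVerdict` and its boolean parameters are hypothetical.

```go
package main

import "fmt"

// xfrmInVerdict models the two requirements from the commit message:
//  1. an XFRM IN policy or an XFRM IN state must match the packet;
//  2. if a state matches, it must also match some IN policy's template.
func xfrmInVerdict(policyMatches, stateMatches, stateMatchesTmpl bool) string {
	if !policyMatches && !stateMatches {
		return "XfrmInNoPols"
	}
	if stateMatches && !stateMatchesTmpl {
		return "XfrmInTmplMismatch"
	}
	return "accept"
}

func main() {
	// Proxy traffic: a policy matches, no state is involved.
	fmt.Println(xfrmInVerdict(true, false, false))
	// Encrypted traffic whose state matches an existing policy template.
	fmt.Println(xfrmInVerdict(false, true, true))
	// Encrypted traffic whose state matches no policy template.
	fmt.Println(xfrmInVerdict(false, true, false))
}
```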
3576990 bugtool: Fix pprof default ports [ upstream commit b33adeb6f88701979d98095149d6ecb359dc8812 ] gops and pprof do not use the same protocol to collect profile data. Thus, the default port for pprof debug endpoints in `cilium-bugtool` should not be the one used for gops, but the default one for pprof itself. Besides, clustermesh-apiserver does not support pprof yet, but only gops. Thus, the help message for the pprof port option in cilium-bugtool is fixed accordingly. Fixes: #416319b1cd (bugtool: Default to the agent's gops port) Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 10 October 2022, 09:41:14 UTC
e09cfb3 contrib: avoid reviews from non-collaborators [ upstream commit 5520be196693d7fce495a91fbf21e3838d5e7343 ] submit-backport tried to create a backport PR with reviews from all contributors whose fixes are being backported, including people who do not have collaborator status in the repository. GitHub only allows reviews to be assigned to collaborators, and thus rejected the review assignments. This commit changes submit-backport to filter the review assignments to only include collaborators. Fixes: #21548 Signed-off-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 October 2022, 09:41:14 UTC
274cbf9 ipsec: Simplify UpsertIPsecEndpoint CIDR arguments [ upstream commit 645da8065170baf7933c18b1b565e02757416e3a ] The previous commit changed the UpsertIPsecEndpoint function as follows: - UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, ... + UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, outerLocal, outerRemote net.IP, ... The first two CIDR arguments, `local` and `remote`, now don't need to carry the outer IP addresses (moved to `outerLocal` and `outerRemote`). We can therefore change calls to this function so that those two first arguments carry only the CIDR (i.e., changed from e.g. 192.168.56.11/24 to 192.168.56.0/24). As a result, we also don't need to mask those two arguments when we want only the CIDR part. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 October 2022, 09:41:14 UTC
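The split between the CIDR part and the outer IP described in the commit above corresponds to what Go's `net.ParseCIDR` returns. The sketch below is illustrative only; the helper `splitCIDR` is a hypothetical name and is not part of Cilium.

```go
package main

import (
	"fmt"
	"net"
)

// splitCIDR separates a value such as 192.168.56.11/24 into the host
// address (usable as an outer IP) and the masked network (usable as the
// CIDR that matches packets).
func splitCIDR(s string) (outer net.IP, cidr *net.IPNet, err error) {
	return net.ParseCIDR(s)
}

func main() {
	outer, cidr, err := splitCIDR("192.168.56.11/24")
	if err != nil {
		panic(err)
	}
	fmt.Println(outer) // 192.168.56.11
	fmt.Println(cidr)  // 192.168.56.0/24
}
```

Once the two values travel in separate arguments, they can diverge, as in the `local=0.0.0.0/0` with `outerLocal=192.168.56.11` case mentioned in the earlier commit.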
24fd48d ipsec: New arguments for UpsertIPsecEndpoint [ upstream commit 91fdc20faa8496fc57b157db87aad760418329df ] This commit adds two new arguments to UpsertIPsecEndpoint to specify the outer source and destination IP address for IPsec. It doesn't include any functional changes. - UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, ... + UpsertIPsecEndpoint(local, remote, fwd *net.IPNet, outerLocal, outerRemote net.IP, ... Until now, those two outer IP addresses were carried as part of the first two CIDR arguments, `local` and `remote`. For example, `local` would be equal to 192.168.56.11/24 where 192.168.56.0/24 would be used to match packets in XFRM policies and 192.168.56.11 as the outer IP address in XFRM states. The outer IPs are now in separate arguments and the next commit will change the local and remote arguments to not carry the IPs. Why this change? Because in a subsequent commit, I will need the CIDR and IP arguments to diverge. For example, we will have UpsertIPsecEndpoint calls with `local=0.0.0.0/0` and `outerLocal=192.168.56.11`. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 October 2022, 09:41:14 UTC
fdc55f9 ipsec: Rename variables in enableIPsec [ upstream commit 3fe791905a098ca5f729933d0a5593c812dc218f ] This commit has no functional changes. It simply renames a few variables in enableIPsec to make their relationships clearer. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 October 2022, 09:41:14 UTC
0c00a9d ipsec: Simplify DeleteIPsecEndpoint parameter [ upstream commit 8ae15622379b2b2614eee0864c714f46b3cf66cd ] Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 October 2022, 09:41:14 UTC
6ac0f9c fqdn: dnsproxy: properly forward the original security identity [ upstream commit: afa968b111cd9701520b6ebb8f24c0bdfaa62d3d ] [ Backporter's notes: resolved conflicts in pkg/datapath/iptables/iptables.go ] The recent commit 44c1def67854 ("fqdn: dnsproxy: forward the original security identity") wrongly assumed that setting the (SecID << 16 | 0x0F00) magic mark is enough to pass the original identity from the DNS proxy to the tunnel. However, if iptables are installed, this is not the case: the socket mark will be set to 0x0C00 by an iptables rule. Add an exception to this rule to pass the identity. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 09 October 2022, 04:14:39 UTC
718b19a fqdn: dnsproxy: forward the original security identity (tunnel case) [ upstream commit: 44c1def67854cd1c4e575828c0824394cbeebb67 ] [ Backporter's notes: conflicts in pkg/fqdn/dnsproxy/{proxy,udp}.go ] Consider a situation in which a pod, which is also subject to an egress policy, performs a DNS request. This request is redirected to the DNS proxy, which performs address resolution. The DNS proxy runs in the host network namespace, and thus the DNS request has the host identity. When the DNS server is subject to an ingress policy, this request may be denied, because the DNS server will see a request from a 'remote-node' identity. Here is an example configuration which will not let pods labeled woo=hoo access DNS servers running on a different host, while this should be allowed: apiVersion: "cilium.io/v2" kind: CiliumClusterwideNetworkPolicy metadata: name: "egress-dns" spec: endpointSelector: matchLabels: woo: hoo egress: - toEndpoints: - matchLabels: io.kubernetes.pod.namespace: kube-system k8s-app: kube-dns toPorts: - ports: - port: "53" protocol: UDP rules: dns: - matchPattern: "*" --- apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy metadata: name: "ingress-dns" namespace: kube-system spec: endpointSelector: matchLabels: k8s-app: kube-dns ingress: - fromEndpoints: - {} toPorts: - ports: - port: "53" protocol: UDP Patch the DNS proxy to pass the original security identity with the DNS request using the SO_MARK socket option. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 09 October 2022, 04:14:39 UTC
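The `(SecID << 16 | 0x0F00)` expression quoted in the commit above can be worked through in a few lines of Go. This sketch follows the formula literally and assumes an identity that fits in 16 bits; the names `identityMark` and `identityFromMark` are hypothetical, and the real encoding in Cilium may treat larger identities differently.

```go
package main

import "fmt"

// markIdentityBits is the low-order constant from the commit message.
const markIdentityBits = 0x0F00

// identityMark applies the formula from the commit message literally.
// Assumption: secID fits in 16 bits, so nothing is lost in the shift.
func identityMark(secID uint32) uint32 {
	return secID<<16 | markIdentityBits
}

// identityFromMark recovers the identity under the same assumption.
func identityFromMark(mark uint32) uint32 {
	return mark >> 16
}

func main() {
	mark := identityMark(1234)
	fmt.Printf("0x%08X\n", mark) // 0x04D20F00
	fmt.Println(identityFromMark(mark))
}
```

An iptables rule that rewrites the mark afterwards would overwrite the identity bits, which is the failure the commit describes.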
66c1144 vendor: update github.com/miekg/dns to v1.1.50 [ upstream commit 8f377e819b79b2087e0ca4d7ec5ca150c511973e ] [ Backporter's notes: this is not a cherry-pick, but the same process as is described in the description below ] Update the github.com/miekg/dns repository to v1.1.50 (= rebase our fork of the miekg/dns and point to the result in the replace section of the go.mod). Update process: * replace github.com/miekg/dns => github.com/cilium/dns v1.1.51-0.20220729113855-5b94b11b46fc * go mod tidy * go mod vendor Updating the source code breaks build, so patch the pkg/fqdn/dnsproxy/udp.go correspondingly. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 09 October 2022, 04:14:39 UTC
33fafb2 build(deps): bump github/codeql-action from 2.1.26 to 2.1.27 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.26 to 2.1.27. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/e0e5ded33cabb451ae0a9768fc7b0410bad9ad44...807578363a7869ca324a79039e6db9c843e0e100) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 08 October 2022, 10:29:02 UTC
c189c76 build(deps): bump actions/checkout from 3.0.2 to 3.1.0 Bumps [actions/checkout](https://github.com/actions/checkout) from 3.0.2 to 3.1.0. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/2541b1294d2704b0964813337f33b291d3f8596b...93ea575cb5d8a053eaa0ac8fa3b40d7e05a33cc8) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 05 October 2022, 09:48:35 UTC
a67821f test: node: use Eventually() to check CiliumNode labels This should fix a flake that causes the test to fail as we are not allowing enough time for the changes to the Node label to be applied to the related CiliumNode object Signed-off-by: Gilberto Bertin <jibi@cilium.io> 04 October 2022, 10:26:22 UTC
780e230 test: rename Node.go to node.go to make it consistent with the other test files Signed-off-by: Gilberto Bertin <jibi@cilium.io> 04 October 2022, 10:26:22 UTC
ce24d08 daemon: avoid nil pointer dereference on invalid endpoint state [ upstream commit 2e36f12f7e23fea5377764c2ff9c969d8ebf34d6 ] In case the call to endpoint.NewEndpointFromChangeModel in (*Daemon).createEndpoint fails (e.g. due to invalid data in the request), the returned *endpoint.Endpoint is nil while err is non-nil. However, invalidDataError is called with ep=nil, leading to a nil pointer dereference in ep.SetState. Fixes: 0d6b7ade8d3f ("endpoint: Add Invalid state") Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
ecfe23d ipsec: Simplify XFRM IN policies [ upstream commit 592ff13ae8f93f18fe78180d91f5b4cb4be02562 ] **TL;DR.** We only need one of an XFRM IN policy or an XFRM IN state to match each packet. This commit removes one superfluous XFRM IN policy and enables some additional simplification as a result. What XFRM IN policy we install currently depends on whether we are running in tunneling mode and with or without endpoint routes: In tunneling mode: XFRM IN policy matching on mark 0x200/0xf00 (for proxy) XFRM IN policy matching on mark 0xd00/0xf00 (for decrypt.) In native routing mode with endpoint routes: XFRM IN policy matching on mark 0x200/0xf00 (for proxy) In all cases, we also have: XFRM IN state matching on mark 0xd00/0xf00 (for decrypt.) The two policies in tunneling mode were introduced by a9f18f36 ("datapath/linux/ipsec: Insert additional In rule when tunneling"). The additional case for endpoint routes was introduced by 3ffe49e1 ("ipsec: Fix L7 with endpoint routes"). Now, I got to wonder how 3ffe49e1 even worked as it was missing an XFRM IN policy for 0xd00 which a9f18f36 suggested was necessary. After some local testing, it turns out that the two XFRM IN policies for tunneling mode are not required. All we need is to have either (1) an XFRM IN policy or (2) an XFRM IN state matching the packets. The XFRM state is needed if we want to decrypt packets; the XFRM policy is needed to not drop packets that don't match an XFRM state. Given we always have an XFRM IN state for packets coming with the decryption mark, we don't need an XFRM IN policy for that. We only need an XFRM IN policy for packets coming with the proxy mark because we don't have a state for those, rightly so as we don't want to decrypt them. This commit therefore removes the XFRM IN policy for decryption. It also removes any dependency on particular options: we will always install the XFRM IN policy for the proxy. 
It doesn't hurt to have that policy even if not required (e.g., in native routing mode without endpoint routes). **How was this tested?** This change was tested with our Jenkins IPsec tests (including the quarantined one for VXLAN), as well as with GKE and EKS clusters of 3 nodes. In all cases, the connectivity tests were executed and L7 policies were thus covered. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
8280208 cmd/bpf: Log if no policy maps found [ upstream commit 41d14997a270faf9a7917e66fcda27776be2bdb8 ] Explicitly log if no policy maps are found to improve debuggability. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
69f0488 Fix a typo in the comment example [ upstream commit 3e0d6796c251fb5b671f2ff87e8b9c9b62982137 ] It's `log` in the [config.go](https://github.com/cilium/cilium/blob/master/pkg/option/config.go#L988). Also fix the delimiter (the actual CLI accepts a space but not a comma). Signed-off-by: Vladimir Pouzanov <farcaller@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
2757237 makefile: use versioned go container when formatting after api generate. [ upstream commit 563787ebb318d8cbd756784fbad940ff67c3f368 ] Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
66995b4 bugtool: Dump envoy config for troubleshooting [ upstream commit 0cb4b9718330bd27b8c931b1cee285d2b9b9ec90 ] There is more and more usage of envoy proxy in cilium, so it's better to have utility to dump its config for troubleshooting later. ``` root@minikube:/home/cilium# tar -xf /tmp/cilium-bugtool-20220918-132428.955+0000-UTC-2901326153.tar root@minikube:/home/cilium# ls -lrt cilium-bugtool-20220918-132428.955+0000-UTC-2901326153/cmd/envoy-config.json -rw-r--r-- 1 root root 28645 Sep 18 13:24 cilium-bugtool-20220918-132428.955+0000-UTC-2901326153/cmd/envoy-config.json ``` Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
f1e7f87 daemon: Fix a nil dereference on cleanup when DNS proxy is not enabled [ upstream commit 619366f813b53a0ce03003bcc3394ab49a67f98c ] The DNS proxy is only allocated if L7 proxy is enabled. The cleanup code did not check whether it was allocated, causing a nil dereference on shutdown. The controlplane test was modified to only set the mock DNS proxy when L7 proxy is enabled, to reflect the behavior of bootstrapFQDN and catch similar issues in the future. Fixes: #21264 Fixes: 266f705888 ("dnsproxy: add cleanup") Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
ec45a74 Remove Slack notifications [ upstream commit ac377a419e43b5b6b3cbbf86d62122f877e10ec0 ] Let's use Grafana instead to monitor CI stability. Ref: https://github.com/cilium/cilium/pull/21238 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
c446999 helm: Fix post-start and pre-stop hooks for cilium-nodeinit [ upstream commit 751e2df4de00cbdec57ac05ca8bf07d83d29d72a ] Signed-off-by: John Watson <johnw@planetscale.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 04 October 2022, 10:21:21 UTC
f0b3312 build(deps): bump actions/cache from 3.0.8 to 3.0.10 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.8 to 3.0.10. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/fd5de65bc895cf536527842281bea11763fefd77...56461b9eb0f8438fd15c7a9968e3c9ebb18ceff1) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 03 October 2022, 17:23:07 UTC
f7744b3 build(deps): bump github/codeql-action from 2.1.25 to 2.1.26 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.25 to 2.1.26. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/86f3159a697a097a813ad9bfa0002412d97690a4...e0e5ded33cabb451ae0a9768fc7b0410bad9ad44) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 29 September 2022, 21:19:58 UTC
d51dbbb build(deps): bump 8398a7/action-slack from 3.13.2 to 3.14.0 Bumps [8398a7/action-slack](https://github.com/8398a7/action-slack) from 3.13.2 to 3.14.0. - [Release notes](https://github.com/8398a7/action-slack/releases) - [Commits](https://github.com/8398a7/action-slack/compare/22048831299719d772f51719ca7384e34b4cc61d...a189acbf0b7ea434558662ae25a0de71df69a435) --- updated-dependencies: - dependency-name: 8398a7/action-slack dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 26 September 2022, 17:07:26 UTC
c53b2e6 build(deps): bump helm/kind-action from 1.3.0 to 1.4.0 Bumps [helm/kind-action](https://github.com/helm/kind-action) from 1.3.0 to 1.4.0. - [Release notes](https://github.com/helm/kind-action/releases) - [Commits](https://github.com/helm/kind-action/compare/d08cf6ff1575077dee99962540d77ce91c62387d...9e8295d178de23cbfbd8fa16cf844eec1d773a07) --- updated-dependencies: - dependency-name: helm/kind-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 26 September 2022, 17:06:35 UTC
5026add build(deps): bump KyleMayes/install-llvm-action from 1.5.4 to 1.5.5 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.4 to 1.5.5. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/c538b5e281d5fc40848a3a62636a3a2b6f5a1cfa...4f17b6579351fb03506d988e59077826c366412c) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 26 September 2022, 08:27:28 UTC
a127a89 build(deps): bump github/codeql-action from 2.1.24 to 2.1.25 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.24 to 2.1.25. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/904260d7d935dff982205cbdb42025ce30b7a34f...86f3159a697a097a813ad9bfa0002412d97690a4) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 26 September 2022, 08:25:26 UTC
6994ef9 build(deps): bump github/codeql-action from 2.1.22 to 2.1.24 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.22 to 2.1.24. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/b398f525a5587552e573b247ac661067fafa920b...904260d7d935dff982205cbdb42025ce30b7a34f) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 18 September 2022, 11:04:06 UTC
509f637 install: Update image digests for v1.11.9 Generated from https://github.com/cilium/cilium/actions/runs/3053490428. `docker.io/cilium/cilium:v1.11.9@sha256:a732e57cb4881abe4783562bbba0045209ef85542372b44ce61584c887c49878` `quay.io/cilium/cilium:v1.11.9@sha256:a732e57cb4881abe4783562bbba0045209ef85542372b44ce61584c887c49878` `docker.io/cilium/clustermesh-apiserver:v1.11.9@sha256:7fdc72903f079a55a5906e64d01fcc7d86024b08d82425b5d63d392e4b21e1a2` `quay.io/cilium/clustermesh-apiserver:v1.11.9@sha256:7fdc72903f079a55a5906e64d01fcc7d86024b08d82425b5d63d392e4b21e1a2` `docker.io/cilium/docker-plugin:v1.11.9@sha256:d627d49e18ddf9a343403328497e1c5fe6501c0841e31fc974439a06ef338d46` `quay.io/cilium/docker-plugin:v1.11.9@sha256:d627d49e18ddf9a343403328497e1c5fe6501c0841e31fc974439a06ef338d46` `docker.io/cilium/hubble-relay:v1.11.9@sha256:0b2f19895de281e4a416700b17a4dc9b8d3b80eb7b5b65dac173880f5113084e` `quay.io/cilium/hubble-relay:v1.11.9@sha256:0b2f19895de281e4a416700b17a4dc9b8d3b80eb7b5b65dac173880f5113084e` `docker.io/cilium/operator-alibabacloud:v1.11.9@sha256:c179af970e6cffaafecd808f5aa3f5fe3a70151a6ff3192ffbdfa852ae7447c2` `quay.io/cilium/operator-alibabacloud:v1.11.9@sha256:c179af970e6cffaafecd808f5aa3f5fe3a70151a6ff3192ffbdfa852ae7447c2` `docker.io/cilium/operator-aws:v1.11.9@sha256:e07670cfed71007fd49c27c5a7805b8c949caedfc60296b9712b98dbaff82db8` `quay.io/cilium/operator-aws:v1.11.9@sha256:e07670cfed71007fd49c27c5a7805b8c949caedfc60296b9712b98dbaff82db8` `docker.io/cilium/operator-azure:v1.11.9@sha256:65d1c2a43af3700211290a46ee71dfff194475ac94175b5281dd2c839cf37b31` `quay.io/cilium/operator-azure:v1.11.9@sha256:65d1c2a43af3700211290a46ee71dfff194475ac94175b5281dd2c839cf37b31` `docker.io/cilium/operator-generic:v1.11.9@sha256:d98c1d94da2ef597981e16fe8d894103f49b5174e6b36f91341e9fbcd723668b` `quay.io/cilium/operator-generic:v1.11.9@sha256:d98c1d94da2ef597981e16fe8d894103f49b5174e6b36f91341e9fbcd723668b` 
`docker.io/cilium/operator:v1.11.9@sha256:f6fad3a2c62e8406636976e13d90d852c9e64a353fb303edb492ee9bc6fa2f3f` `quay.io/cilium/operator:v1.11.9@sha256:f6fad3a2c62e8406636976e13d90d852c9e64a353fb303edb492ee9bc6fa2f3f` Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 14 September 2022, 16:37:53 UTC
4409e95 Prepare for release v1.11.9 Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 14 September 2022, 14:09:26 UTC
3092dce Do not enable health checks on Terminating backends [ upstream commit c462868b82a0d0019fb45d40440329a362dde90d ] [ Backporter's notes: Changed to use Terminating field directly since there is no State field in v1.11 ] Previously cilium-agent did not switch off the health check server if only Terminating Endpoints are present on a Node with trafficPolicy: Local Service. Fixes: cilium#21061 Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com> 14 September 2022, 12:02:43 UTC
8acf759 Documentation: run with endpoint routes under aws-cni chaining [ upstream commit 13bcd1b617fa83d14b74da70cea2640f1707e26d ] Similar to #19088, endpoint routes are also required for some features like NodePort-type services to work under aws-cni chaining. This commit adds the endpointRoutes.enabled setting to the Helm snippet in the docs. Related: #21126 Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 14 September 2022, 12:00:54 UTC
b1a362d kvstore/allocator: fix panic on receiving identity keys with an empty value [ upstream commit 6fef26f23e53c05ca5082c3d776f99338242e3dc ] This problem is triggered when the event type is "UPDATE" AND the value is an empty string, leaving the `key` variable uninitialized: ``` panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x1cc4362] goroutine 1430 [running]: github.com/cilium/cilium/pkg/allocator.(*Allocator).encodeKey(0xc00f4fab00, 0x0, 0x0, 0xc0207a5838, 0x2ac8c01) /go/src/github.com/cilium/cilium/pkg/allocator/allocator.go:457 +0x22 github.com/cilium/cilium/pkg/allocator.(*cache).OnModify(0xc00f4fab98, 0x333f4, 0x2f28b38, 0xc03d7b2318) /go/src/github.com/cilium/cilium/pkg/allocator/cache.go:144 +0x22d github.com/cilium/cilium/pkg/kvstore/allocator.(*kvstoreBackend).ListAndWatch(0xc005029e80, 0x2f1c088, 0xc0000c8008, 0x2f1c3d0, 0xc00f4fab98, 0xc00f5aaea0) /go/src/github.com/cilium/cilium/pkg/kvstore/allocator/allocator.go:624 +0x2a7 github.com/cilium/cilium/pkg/allocator.(*cache).start.func1(0xc00f4fab98) /go/src/github.com/cilium/cilium/pkg/allocator/cache.go:198 +0x73 created by github.com/cilium/cilium/pkg/allocator.(*cache).start /go/src/github.com/cilium/cilium/pkg/allocator/cache.go:197 +0xee ``` "CREATE" events were not affected because the `OnAdd()` handler already checks for a nil pointer. Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 14 September 2022, 12:00:54 UTC
205d6f4 install: add TerminationMessagePolicy to cilium pods [ upstream commit f62b617059407f437757458fc4318064076c32e7 ] This "captures" the last few lines of logs and sets them as the TerminationMessage in the Status. This means that errors are preserved even if logs are lost (e.g. because of node restarts). Signed-off-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 14 September 2022, 12:00:54 UTC
1127030 docs: fix check-crd-compat-table script [ upstream commit ab21ecbd53546abbc8d472498f1bbfae22842ee5 ] Since `head` terminates its execution before reading all of the input from the previous command, the previous command will receive a SIGPIPE signal. This, together with the fact that the scripts are set with `set -o pipefail` and `set -e`, makes the script terminate abruptly, causing the CRD compatibility table to be generated incorrectly. For more information see: https://www.greenend.org.uk/rjk/tech/shellmistakes.html#pipeerrors Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 14 September 2022, 12:00:54 UTC
9c1f29f k8s: fix test flake in TestGenerateToCIDRFromEndpoint. Fixes: #21145 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 12 September 2022, 15:29:16 UTC
5039c94 operator: update CiliumNode in kvstore without lease [ upstream commit 3abbf57b079ddf5ccfd9d1e57f4ef8eadbf9fa98 ] Under normal circumstances, the agents should keep their own CiliumNode up to date in the kvstore. In case of an agent restarting or otherwise failing to renew the lease, the operator's sync logic might take over and update the key with its own lease. This could lead to problems when the respective agent comes back up and tries to renew the lease for its own CiliumNode entry. To prevent this situation, let the operator k8s->kvstore sync logic for CiliumNodes update the entries without taking a lease. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
c8f30ee kvstore/store: refactor syncLocalKey{,s} to take a lease argument [ upstream commit 93ad408292d9808b3adaafec5cee91de8be5affe ] Rather than always setting lease=true in the call to the backend's UpdateIfDifferent method, allow callers to request attachment of a lease. Convert all current callers to call with lease=true. Refactoring change only, no change in functionality. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
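The shape of the refactoring in c8f30ee, and how 5039c94 can then opt out of a lease, might look roughly like the sketch below. `fakeBackend`, `updateIfDifferent` and this `syncLocalKey` are hypothetical in-memory stand-ins; the real kvstore signatures are not reproduced here.

```go
package main

import "fmt"

// fakeBackend is a hypothetical in-memory stand-in for a kvstore backend.
type fakeBackend struct {
	values map[string]string
	leased map[string]bool
}

func newFakeBackend() *fakeBackend {
	return &fakeBackend{values: map[string]string{}, leased: map[string]bool{}}
}

// updateIfDifferent writes value only if it differs from the stored one and
// records whether the caller asked for a lease to be attached.
// It reports whether a write happened.
func (b *fakeBackend) updateIfDifferent(key, value string, lease bool) bool {
	if old, ok := b.values[key]; ok && old == value {
		return false
	}
	b.values[key] = value
	b.leased[key] = lease
	return true
}

// syncLocalKey forwards the caller's lease choice instead of hard-coding true.
func syncLocalKey(b *fakeBackend, key, value string, lease bool) bool {
	return b.updateIfDifferent(key, value, lease)
}

func main() {
	b := newFakeBackend()
	syncLocalKey(b, "nodes/agent-owned", "v1", true)      // agent keeps its lease
	syncLocalKey(b, "nodes/operator-synced", "v1", false) // operator writes without one
	fmt.Println(b.leased["nodes/agent-owned"], b.leased["nodes/operator-synced"])
}
```

Passing the lease decision as a parameter keeps existing callers unchanged (they pass `true`) while letting a single caller choose differently.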
c27468c kvstore/store: remove unused (*SharedStore).UpdateLocalKey [ upstream commit 4fe3615efcabe7d0cf25d6102e54bb1fc16fddf5 ] It's unused since commit 960da244c42d ("kvstore/store: Do not remove local key on sync failure"). Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
1b27825 kvstore: use (*etcdClient).GetSessionLeaseID [ upstream commit 8bde91aea6c4542d58edeec80e5f1b873ca707ab ] Use the existing method instead of open-coding it in (*etcdClient).UpdateIfDifferentIfLocked and (*etcdClient).UpdateIfDifferent. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
70080df hubble-ui: release v0.9.2 [ upstream commit c3feb60e6f63f8a88c668d5e030dd1ac94160395 ] Added `hubble.ui.frontend.server.ipv6.enabled` helm flag to control nginx server ipv6 listener Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
2065ad4 daemon: Coalesce endpoint CIDRs in ENI mode [ upstream commit c87cdeb2496ed99fe3550efa25b515797fe6ab20 ] Fixes: #18868. Multiple CIDRs are currently not coalesced for the health endpoint when setting up the corresponding routing tables. This results in orphaned routing entries that may conflict when IPs are reused for workload pods after an agent restart. Addresses comment https://github.com/cilium/cilium/pull/20112#issuecomment-1180343763 Signed-off-by: Simone Sciarrati <s.sciarrati@gmail.com> Signed-off-by: Federico Hernandez <f@ederi.co> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
37c6158 datapath: allow packets to and from eni+ container interfaces [ upstream commit 48d46eb621494c09af916d156c286af6aa23e4de ] AWS CNI chaining yields container interface names like 'eni621c0fc8425', not the usual 'lxcXYZ'. This causes packets for local endpoints to be dropped in CILIUM_FORWARD when they are called through a NodePort. Before the patch, the CILIUM_FORWARD chain looks like this: ``` -A CILIUM_FORWARD -o cilium_host -m comment --comment "cilium: any->cluster on cilium_host forward accept" -j ACCEPT -A CILIUM_FORWARD -i cilium_host -m comment --comment "cilium: cluster->any on cilium_host forward accept (nodeport)" -j ACCEPT -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept" -j ACCEPT -A CILIUM_FORWARD -i cilium_net -m comment --comment "cilium: cluster->any on cilium_net forward accept (nodeport)" -j ACCEPT -A CILIUM_FORWARD -o lxc+ -m comment --comment "cilium: any->cluster on lxc+ forward accept" -j ACCEPT -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept (nodeport)" -j ACCEPT ``` This doesn't match any packets to or from `eni+` container interfaces, letting them fall through to the `KUBE-FORWARD` chain instead: ``` -A FORWARD -m comment --comment "cilium-feeder: CILIUM_FORWARD" -j CILIUM_FORWARD -A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD ... -A KUBE-FORWARD -m conntrack --ctstate INVALID -j DROP ``` Initial SYN packets go through to the Pod, SYN-ACK responses from local NodePort services are bpf_redirect'ed back out the physical interface to the client, but any follow-up packets from the client arriving at the node are considered invalid by netfilter's conntrack since the reply packet bypassed the stack, and thus dropped. This commit takes care of adding `-i eni+` and `-o eni+` iptables rules to make sure world->container packets are never dropped in the stack. 
Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
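The rule additions described in 37c6158 can be pictured with a small generator. This is only a sketch: `forwardRules` is a hypothetical helper, the rules omit the `--comment` matches shown in the commit message, and the exact rules Cilium installs may differ.

```go
package main

import "fmt"

// forwardRules builds iptables-style rule strings that accept traffic to and
// from interfaces matching each wildcard (for example "lxc+" or "eni+").
func forwardRules(chain string, ifaces []string) []string {
	var rules []string
	for _, ifc := range ifaces {
		rules = append(rules,
			fmt.Sprintf("-A %s -i %s -j ACCEPT", chain, ifc),
			fmt.Sprintf("-A %s -o %s -j ACCEPT", chain, ifc),
		)
	}
	return rules
}

func main() {
	// Covering both prefixes means chained AWS CNI interfaces are matched too.
	for _, r := range forwardRules("CILIUM_FORWARD", []string{"lxc+", "eni+"}) {
		fmt.Println(r)
	}
}
```

Because these rules sit in CILIUM_FORWARD, matching packets are accepted before they can reach the `--ctstate INVALID -j DROP` rule in KUBE-FORWARD.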
e203598 daemon,pkg: pull cni-chaining-mode configmap key into DaemonConfig [ upstream commit 99114351abd69f8859fd856ec45040002c95bcc0 ] The `cni-chaining-mode` ConfigMap key was introduced with the initial implementation of AWS CNI chaining, b568d2a179 ("cni: Add support for AWS CNI chaining") but was only used as an environment variable in the CNI installer script(s), not in the agent itself. This commit pulls in the key as a DaemonConfig value from the Cilium ConfigMap and removes manual parsing of CILIUM_CNI_CHAINING_MODE. Signed-off-by: Timo Beckers <timo@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
a6d8370 datapath: tolerate missing ifaces when setting rp_filter sysctl [ upstream commit 86e736f285c923c144b96b6078ddcd19def59074 ] At the point where systemd-sysctl applies our rp_filter settings, the host might not have any cilium_* and/or lxc_* interfaces yet. But systemd-sysctl treats the failure to resolve these globs as a hard error: systemd-sysctl[9354]: Couldn't resolve glob 'net/ipv4/conf/lxc*/rp_filter': No such file or directory systemd[1]: systemd-sysctl.service: Main process exited, code=exited, status=1/FAILURE systemd[1]: systemd-sysctl.service: Failed with result 'exit-code'. Adding the `-` option makes systemd-sysctl tolerate such errors. Fixes: 6432558898aa ("datapath: Create sysctl `rp_filter` overwrite config on agent init") Suggested-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
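A sketch of how such tolerant sysctl.d lines could be rendered is shown below. `rpFilterLines` is a hypothetical helper, and the glob names and the value `0` are assumptions for illustration; only the leading `-` (ignore failures for this assignment) comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// rpFilterLines renders sysctl.d assignments for the given interface globs.
// When tolerant is true each line gets a leading "-", which tells
// systemd-sysctl to ignore failures for that assignment.
func rpFilterLines(ifaceGlobs []string, value int, tolerant bool) string {
	var sb strings.Builder
	for _, g := range ifaceGlobs {
		if tolerant {
			sb.WriteString("-")
		}
		fmt.Fprintf(&sb, "net.ipv4.conf.%s.rp_filter = %d\n", g, value)
	}
	return sb.String()
}

func main() {
	// The globs and the value 0 are illustrative assumptions.
	fmt.Print(rpFilterLines([]string{"lxc*", "cilium_*"}, 0, true))
}
```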
2c026b9 operator: do not GC kvstore nodes if CiliumNodes are not available [ upstream commit 62548f2bc1fad0093b88eef34edc477bd4aad98c ] If users deploy Cilium without creating any CiliumNodes, Cilium Operator will GC all kvstore nodes once it starts. This commit adds a guardrail to prevent such behavior. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
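One way to picture the guardrail from 2c026b9 is the sketch below. `staleKVStoreNodes` is a hypothetical helper, and treating an empty CiliumNode set as "not available" is an assumption about the intent; the operator's real check may be implemented differently.

```go
package main

import "fmt"

// staleKVStoreNodes returns the kvstore node names that have no matching
// CiliumNode. If no CiliumNodes are known at all, it returns nothing, on the
// assumption that an empty set means "not available" rather than "all deleted".
func staleKVStoreNodes(kvNodes []string, ciliumNodes map[string]struct{}) []string {
	if len(ciliumNodes) == 0 {
		return nil
	}
	var stale []string
	for _, n := range kvNodes {
		if _, ok := ciliumNodes[n]; !ok {
			stale = append(stale, n)
		}
	}
	return stale
}

func main() {
	kv := []string{"node-a", "node-b"}

	// No CiliumNodes available: nothing is garbage collected.
	fmt.Println(len(staleKVStoreNodes(kv, nil)))

	// With CiliumNodes present, only unmatched kvstore entries are stale.
	fmt.Println(staleKVStoreNodes(kv, map[string]struct{}{"node-a": {}}))
}
```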
ec54d49 metallb: bump to latest fork version [ upstream commit 4c77f5476fb0d9424bda051612212ae72bc58087 ] Bumps to Cilium's latest metallb fork version. This bump alleviates a plain log message which ran in a hot loop. Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
626f350 dnsproxy: add cleanup [ upstream commit 266f70588825716c5ef4d0ceff9201ba8e6fa44b ] This change adds a Cleanup function to dnsproxy, which is added to the daemon cleanup module. The cleanup closes the TCP and UDP sockets, which causes the proxy to stop serving DNS traffic before shutdown. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
663a34e dnsproxy: populate DNS clients before proxy start [ upstream commit 588555069ac08bbbd31af77943e33c128c5344e8 ] This change causes the DNS clients that dnsproxy uses to connect to upstream DNS servers to be populated before the proxy binds to its sockets. Clients being set after the proxy binds might have caused some DNS traffic to be dropped while the proxy was starting up. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
071c575 pkg/k8s/watcher: fix deadlock with service event handler & CES watcher. [ upstream commit 4b87ccc9809bb8a37e51bf5c22726bbff866e541 ] There is a deadlock that can occur when a k8s service update and a policy update occur at the same time. In practice, this can occur in the following situation: 1. CiliumEndpointSlice k8s watcher performs an update due to a new watcher event. The handler logic for this first goes to hold a lock on the IPCache. Next, this triggers an endpoint regeneration via the endpoint manager. Note: This code path will wait for endpoint regeneration to complete via a passed WaitGroup. To complete this task, endpoint manager attempts to lock policyRepository. Effectively, this means that CES handler has locking dependencies on IPCache's lock and policyRepo's lock (transitively, by waiting on endpointManager endpoint regeneration). It will not release the IPCache lock until endpoint regen is done, thus waiting on the policyRepo lock. 2. The k8sServiceHandler control loop performs an update due to kube-apiserver service record change (i.e. this is common on EKS where the control plane IPs change often). A new policyRepository.Translator is constructed with k8s.RuleTranslator{} with AllocatedPrefixes being enabled. This implementation of the Translator holds a reference to ipcache and uses that to make necessary prefix updates to ipcache during the translation. This is passed to policyRepository to perform policy rule translation, which locks itself before proceeding to use translator.Translate(...) to perform translation on its state. The k8sServiceHandler now holds nested locks on policyRepo -> ipcache. At this point, let's say codepath 1 is holding the ipcache lock and waiting on the policyRepo lock (nested ipCache -> policyRepo). At the same time, codepath 2 (i.e. k8sServiceHandler) just grabbed a policyRepo lock and is waiting for the ipcache lock. 
Codepath 2 (which holds policyRepo) needs ipcache to unlock, which is held by Codepath 1, which is waiting for policyRepo to unlock. The following is a stack trace of such a case occurring: 101 occurences. Sample stack trace: 6 occurences. Sample stack trace: sync.runtime_SemacquireMutex(0xc0018f0e08?, 0x20?, 0xc000c12740?) /usr/local/go/src/runtime/sema.go:71 +0x25 sync.(*RWMutex).RLock(...) /usr/local/go/src/sync/rwmutex.go:63 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regeneratePolicy(0xc0010c7c00) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:198 +0x11a github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(0xc0010c7c00, 0xc0005be400) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:814 +0x2c5 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc0010c7c00, 0xc0005be400) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:584 +0x189 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc0010c7c00, 0xc0005be400) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:398 +0x7a5 github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc0099405b0, 0x2a27540?) /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:53 +0x325 github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run.func1() /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:245 +0x13b sync.(*Once).doSlow(0x2f14d01?, 0x4422a5?) /usr/local/go/src/sync/once.go:68 +0xc2 sync.(*Once).Do(...) /usr/local/go/src/sync/once.go:59 github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).run(0x0?) /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:233 +0x45 created by github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:229 +0x76 1 occurences. Sample stack trace: sync.runtime_Semacquire(0xc0003f44d0?) /usr/local/go/src/runtime/sema.go:56 +0x25 sync.(*WaitGroup).Wait(0xc0003f5420?) 
/usr/local/go/src/sync/waitgroup.go:136 +0x52 github.com/cilium/cilium/pkg/ipcache.(*IPCache).UpdatePolicyMaps(0xc001003580, {0x3468338, 0xc00007e038}, 0xa?, 0xc008c15e60) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:235 +0xc7 github.com/cilium/cilium/pkg/ipcache.(*IPCache).removeLabelsFromIPs(0xc001003580, 0xc005d73778?, {0x2f35b2b, 0xf}) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:414 +0x7c5 github.com/cilium/cilium/pkg/ipcache.(*IPCache).RemoveLabelsExcluded(0xc001003580, 0xc0000e3110, 0xc001506dd8?, {0x2f35b2b, 0xf}) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:328 +0x1ab github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).handleKubeAPIServerServiceEPChanges(0xc001586d80, 0xc003ec89b0?) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint.go:135 +0x5b github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).addKubeAPIServerServiceEPSliceV1(0xf3c386?, 0xc001ab7d40) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:205 +0x452 github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).updateK8sEndpointSliceV1(0xc001586d80, 0xc001ab7d40?, 0xc001ab7d40?) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:178 +0x69 github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).endpointSlicesInit.func2({0x2ec7ea0?, 0xc00294c410?}, {0x2ec7ea0, 0xc001ab7d40}) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:71 +0x125 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...) 
/go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:239 github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1({0x2a4b9c0?, 0xc00057d1e8?}) /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:103 +0x2fe k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop(0xc001b805a0, 0xc000927940) /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:554 +0x566 k8s.io/client-go/tools/cache.(*controller).processLoop(0xc001bda1b0) /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:184 +0x36 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40d6a5?) /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x3e k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xed53e5?, {0x343e1c0, 0xc000d50450}, 0x1, 0xc000929980) /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xb6 k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001bda218?, 0x3b9aca00, 0x0, 0x30?, 0x7f587b87fd30?) /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x89 k8s.io/apimachinery/pkg/util/wait.Until(...) /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 k8s.io/client-go/tools/cache.(*controller).Run(0xc001bda1b0, 0xc000929980) /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:155 +0x2c5 created by github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).endpointSlicesInit /go/src/github.com/cilium/cilium/pkg/k8s/watchers/endpoint_slice.go:156 +0x759 1 occurences. Sample stack trace: sync.runtime_SemacquireMutex(0xc000880000?, 0x20?, 0x21?) /usr/local/go/src/runtime/sema.go:71 +0x25 sync.(*RWMutex).RLock(...) 
/usr/local/go/src/sync/rwmutex.go:63 github.com/cilium/cilium/pkg/ipcache.(*metadata).get(0xc00104f770?, {0xc0069e9160?, 0x9?}) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:90 +0x66 github.com/cilium/cilium/pkg/ipcache.(*IPCache).GetIDMetadataByIP(...) /go/src/github.com/cilium/cilium/pkg/ipcache/metadata.go:86 github.com/cilium/cilium/pkg/ipcache.(*IPCache).AllocateCIDRs(0xc001003580, {0xc008680cf0, 0x2, 0x0?}, {0x0, 0x0, 0x0?}, 0x0) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:57 +0x22b github.com/cilium/cilium/pkg/k8s.RuleTranslator.generateToCidrFromEndpoint({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001f667e0, ...) /go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:124 +0xb3 github.com/cilium/cilium/pkg/k8s.RuleTranslator.populateEgress({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001f667e0, ...) /go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:62 +0x172 github.com/cilium/cilium/pkg/k8s.RuleTranslator.TranslateEgress({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001f667e0, ...) /go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:51 +0x18e github.com/cilium/cilium/pkg/k8s.RuleTranslator.Translate({0xc001003580, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, {0xc008c15e00}, 0xc001905e60, 0x0, 0x1}, 0xc001c66750, ...) 
/go/src/github.com/cilium/cilium/pkg/k8s/rule_translate.go:33 +0x117 github.com/cilium/cilium/pkg/policy.(*Repository).TranslateRules(0xc0003f5490, {0x3440260, 0xc0025fd280}) /go/src/github.com/cilium/cilium/pkg/policy/repository.go:627 +0x10b github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).k8sServiceHandler.func1({0x0, {{0xc005bb63c0, 0xa}, {0xc005bb6378, 0x7}}, 0xc0015f0c80, 0x0, 0xc003165f50, 0xc001bc9c80}) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:586 +0xc9e github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).k8sServiceHandler(0xc001586d80) /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:623 +0x9f created by github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).RunK8sServiceHandler /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:629 +0x56 This commit solves this situation by moving the IPCache allocation out of the k8s.RuleTranslator Translator implementation. This moves the responsibility for updating the IPCache out of the translator and removes the nested policyRepo -> ipcache locks in the translator. So, in situations like the one described, the translation no longer has a dependency on ipcache. Codepath 2 will be able to complete, releasing the policyRepo lock and allowing Codepath 1 to proceed. Note: Rule translation prefixes are not used in other usages of k8s.RuleTranslator called from endpoint watcher handler. So we don't have to add the same ipcache logic as in k8sServiceHandler. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Reported-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
6dafc55 Coalesce of health endpoint CIDRs [ upstream commit 10f1193f887b11629c6c3bde4bc2f5bdc13c8358 ] Fixes: #18868. Multiple CIDRs are currently not coalesced for the health endpoint when setting up the corresponding routing tables. This results in orphaned routing entries that may conflict when IPs are reused for workload pods after an agent restart. Signed-off-by: Simone Sciarrati <s.sciarrati@gmail.com> Signed-off-by: Federico Hernandez <f@ederi.co> Signed-off-by: Jussi Maki <jussi@isovalent.com> 12 September 2022, 15:29:16 UTC
1a89c30 test: update k8s versions to the latest patched releases Also update k8s libraries to v0.23.10 Signed-off-by: André Martins <andre@cilium.io> 06 September 2022, 07:38:34 UTC