https://github.com/cilium/cilium

Revision  Message  Commit Date
33ca4b9 Prepare for release v1.10.11 Signed-off-by: André Martins <andre@cilium.io> 10 May 2022, 00:35:51 UTC
3016cfb helm: Add nodes-gc-interval attribute [ upstream commit 167fa2bac9c7fa062046de6f73da98d585266274 ] This commit is to map operator CLI node GC interval flag in helm value and config map. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
b7bdd9a operator: Add cilium node garbage collector [ upstream commit edc1a0a0d2689473469d209027024962a9d55073 ] In the normal scenario, CiliumNode is created by the agent with owner references attached, as introduced in PR[0] below. However, there can be cases where the CiliumNode is created by the IPAM module[1], which doesn't set any ownerReferences at all. In this case, if the corresponding node got terminated and never came back with the same name, the CiliumNode resource is left dangling and needs to be garbage collected. This commit adds a garbage collector for CiliumNode with the following logic: - The garbage collector runs at a predefined interval (e.g. specified by the flag --nodes-gc-interval). - Each run checks whether the CiliumNode has a counterpart k8s Node resource, and removes the node from the GC candidate list if required. - If yes, the CiliumNode is considered valid, happy day. - If no, check if ownerReferences are set. - If yes, let k8s perform garbage collection. - If no, mark the node as a GC candidate. If in the next run this node is still a GC candidate, remove it. References: [0]: https://github.com/cilium/cilium/pull/17329 [1]: https://github.com/cilium/cilium/blob/master/pkg/ipam/allocator/podcidr/podcidr.go#L258 Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
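A minimal Go sketch of the two-pass mark-and-delete logic described above, using simplified in-memory types instead of the real Kubernetes and CiliumNode clients (names such as `gcCandidates` and `runOnce` are illustrative, not the actual operator code):

```go
package main

import "fmt"

// ciliumNode is a simplified stand-in for the CiliumNode resource.
type ciliumNode struct {
	name     string
	hasOwner bool // true if ownerReferences are set
}

// gc remembers nodes that were missing a k8s Node counterpart in the previous run.
type gc struct {
	candidates map[string]bool
}

// runOnce is called once per GC interval (e.g. --nodes-gc-interval).
func (g *gc) runOnce(ciliumNodes []ciliumNode, k8sNodes map[string]bool, deleteNode func(string)) {
	next := map[string]bool{}
	for _, cn := range ciliumNodes {
		switch {
		case k8sNodes[cn.name]:
			// A counterpart k8s Node exists: the CiliumNode is valid, drop any candidacy.
		case cn.hasOwner:
			// ownerReferences are set: let Kubernetes garbage collect it.
		case g.candidates[cn.name]:
			// Missing for two consecutive runs without an owner: delete it.
			deleteNode(cn.name)
		default:
			// First time seen without a counterpart: mark as GC candidate.
			next[cn.name] = true
		}
	}
	g.candidates = next
}

func main() {
	g := &gc{candidates: map[string]bool{}}
	nodes := []ciliumNode{{name: "gone-node"}}
	k8s := map[string]bool{} // "gone-node" has no k8s Node counterpart
	del := func(n string) { fmt.Println("deleting dangling CiliumNode:", n) }
	g.runOnce(nodes, k8s, del) // first run: marked as candidate
	g.runOnce(nodes, k8s, del) // second run: deleted
}
```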
69bc6b0 operator: Add sync.Once for k8s node watcher [ upstream commit 64b37e1c478ca3a6f61389aea9c8838edb49f604 ] This makes sure that the k8s node watcher is only set up at most once. Also, a synced channel is added so that consumers of this store know when the syncing process is done. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
3ac7f8e operator: Refactor k8s node watcher for re-usability [ upstream commit e80f60d381238218d1d656035396ff903ed0eef9 ] This commit lifts and shifts the current initialization of the k8s node watcher into a separate function with a single scope of work, so that it can be re-used later. There is no change in logic. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
9fdb3a4 images/cilium: remove cilium group from Dockerfile [ upstream commit 67f74ff432010770b43286f32110b8f4cd338e1b ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
46af205 test/upgrade: use the unreleased helm chart of stable branches [ upstream commit 88d31cdbb052d00ab23575ffc8f73eedfd4437a7 ] The upgrade tests are using the official helm charts with unreleased Cilium images. This might cause the upgrade tests to fail in case the changes done in the unreleased Cilium versions require a new helm chart release. To fix this problem the upgrade tests will now use the unreleased helm charts as well. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
da1c1f8 hubble/relay: Make Peer Service a K8s Service [ upstream commit 21e6e6ade37fb6ac9d38dd885c3f236bdb869c26 ] Currently Hubble-Relay builds its client list by querying the Peer Service over the local Hubble Unix domain socket. This goes against best security practices (sharing files across pods) and is not allowed on platforms that strictly enforce SELinux policies (e.g. OpenShift). This PR enables, by default, the creation of a Kubernetes Service that proxies the Hubble Peer Service so that Hubble-Relay can use it to build its client list, eliminating the need for a shared Unix domain socket completely. Helm values and configurations have been added to enable the service in a cilium deployment. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
52896d6 images/runtime: update CNI plugins to 1.1.1 This allows Cilium to run in environments using containerd which do not have a loopback CNI plugin binary available. Reported-by: Krzysztof Nazarewski on Slack Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 09 May 2022, 21:26:20 UTC
f916333 build(deps): bump docker/build-push-action from 2.10.0 to 3 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.10.0 to 3. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/ac9327eae2b366085ac7f6a2d02df8aa8ead720a...e551b19e49efd4e98792db7592c17c09b89db8d8) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:43 UTC
3d16a60 build(deps): bump docker/setup-buildx-action from 1.7.0 to 2 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.7.0 to 2. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/f211e3e9ded2d9377c8cadc4489a4e38014bc4c9...dc7b9719a96d48369863986a06765841d7ea23f6) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:33 UTC
b305900 build(deps): bump docker/setup-qemu-action from 1.2.0 to 2 Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 1.2.0 to 2. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](https://github.com/docker/setup-qemu-action/compare/27d0a4f181a40b142cce983c5393082c365d1480...8b122486cedac8393e77aa9734c3528886e4a1a8) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:26 UTC
7fc216f build(deps): bump docker/login-action from 1.14.1 to 2 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.14.1 to 2. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/dd4fa0671be5250ee6f50aedf4cb05514abda2c7...49ed152c8eca782a232dede0303416e8f356c37b) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:17 UTC
922b949 pkg/k8s: use subresource "nodes/status" to update node annotations [ upstream commit 9014253d3640f1d2df836890f52497ac4072d88d ] We can use the "status" subresource to update node annotations which also allow us to reduce the clusterrole's permissions of the cilium DaemonSet even further. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
23f77d5 operator: move certain K8s Node operations to cilium-operator [ upstream commit f612c97aacbb44e6cc7c3587541c53dd0296d5ea ] To decrease the amount of permissions Cilium requires to operate in a cluster, the node taint removal and the setup of the NetworkUnavailable node condition can be handled by cilium-operator. Cilium-operator will remove, if set, Cilium's specific node taints from the Kubernetes nodes, as well as set the NetworkUnavailable node condition to 'false', once it detects there is a "Ready" Cilium pod on that node. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
c374991 install: default AnnotateK8sNode to true [ upstream commit 73d6cae2c90600cee5c61a0ea452b7a2a3129dd9 ] Since this option only existed to set up annotations in Kubernetes Nodes before the introduction of CiliumNodes, contrary to the upstream commit this option will be kept to 'true' with the possibility for users to change it to 'false'. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
9bc8cf0 install/kubernetes: trimmed down clustermesh-apiserver's ClusterRole Trimmed down clustermesh-apiserver's ClusterRole to the exact permissions that clustermesh-apiserver requires. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
5602c59 install/kubernetes: remove finalizers for Cilium resources [ upstream commit d02833801430125c018d96083881d0387554d053 ] Follow up of 0f4d3a71b055 ("helm: Remove Unnecessary RBAC Permissions for Agent") Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
81e2049 install/kubernetes: remove update pod from Cilium's clusterrole [ upstream commit 2d63c9b17bdb8838683990d96fda5f579dd56da5 ] Cilium does not need to perform any Pod update thus this permission can be removed from Cilium's Cluster Role. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
bf033a5 pkg/k8s: remove BlockOwnerDeletion: true from CEP [ upstream commit 900f66879ad4c66e62eaa334fe7bd6ab2119e5b1 ] Since Cilium does not set any finalizer in the owner of the CEP, a Pod, it does not make sense to set "BlockOwnerDeletion: true". Regardless of this option being `true` or `false`, the Pod's dependent, in this case the CEP, is always* garbage collected by Kubernetes. *Only if the user deletes the Pod with the "orphan" deletion cascading strategy will the CEP be kept. However, Cilium Operator will garbage collect orphaned Cilium Endpoints every 5 minutes by default. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
3e1c673 metrics: Add go_* metrics and go_build_info metrics [ upstream commit 6c5e2d66f3e3efcf9723c4cbb19d92ae80bcb1d7 ] [ Backporter's notes: Needed to bump the Prometheus module in order to make the upstream commit pass Go's mod checks. ] Prometheus provides metrics collectors that expose go runtime and go build information, which can be useful to server administrators, lets expose them. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 03 May 2022, 22:32:28 UTC
8f2e008 docs: set the right url for API version check [ upstream commit af8151d730ac48789773ad5c970c6d2858bab76c ] The right format for this field should contain the protocol and a trailing "/" to work properly. Fixes: b3b05029e4c9 ("docs: fix version warning URL to point to docs.cilium.io") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io> 03 May 2022, 17:13:29 UTC
ca473d2 docs: Update max MTU value for Nodeport XDP on AWS [ upstream commit 1db91caffba860afb81f796ea021f4db0712a42b ] The documentation for setting up Nodeport XDP acceleration on AWS mentions that the MTU for the ena interface must be lowered so that XDP can work. This is indeed necessary, but the value provided as the maximum possible MTU is outdated and does not work. After installing the latest kernel through the RPM package kernel-ng (as prescribed in the documentation), the EKS nodes currently end up with Linux 5.10: $ uname -r 5.10.106-102.504.amzn2.x86_64 If we keep on following the docs and lower the MTU to 3818, the Cilium pods fail to get ready, and tell in their logs that the XDP program cannot be set due to the MTU. This is also confirmed from the dmesg of the nodes: [ 3617.059219] ena 0000:00:05.0 eth0: Failed to set xdp program, the current MTU (3818) is larger than the maximum allowed MTU (3498) while xdp is on The value 3818 comes from the legacy definition of ENA_XDP_MAX_MTU, in drivers/net/ethernet/amazon/ena/ena_netdev.h, which used to be defined as such: #define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \ VLAN_HLEN - XDP_PACKET_HEADROOM) Where ETH_HLEN is 14, ETH_FCS_LEN and VLAN_HLEN are both 4, and XDP_PACKET_HEADROOM is 256. But after Linux commit 08fc1cfd2d25 ("ena: Add XDP frame size to amazon NIC driver"), from Linux 5.8, the definition changed to: #define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \ VLAN_HLEN - XDP_PACKET_HEADROOM - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) As a result, the maximum value for the MTU for kernels 5.8+ is 3498 bytes. This is indeed the maximum value that I could use when setting up XDP on an EKS cluster. Let's update the documentation accordingly. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io> 03 May 2022, 17:13:29 UTC
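The arithmetic behind the two MTU values can be reproduced directly from the kernel macros quoted above. The sketch below assumes typical x86_64 values for ENA_PAGE_SIZE (4096), the SKB_DATA_ALIGN alignment (64-byte cache lines) and sizeof(struct skb_shared_info) (320 bytes):

```go
package main

import "fmt"

func main() {
	const (
		enaPageSize       = 4096 // ENA_PAGE_SIZE (assumed x86_64 page size)
		ethHlen           = 14   // ETH_HLEN
		ethFcsLen         = 4    // ETH_FCS_LEN
		vlanHlen          = 4    // VLAN_HLEN
		xdpPacketHeadroom = 256  // XDP_PACKET_HEADROOM
		skbSharedInfo     = 320  // sizeof(struct skb_shared_info), assumed x86_64 value
		cacheLine         = 64   // SMP_CACHE_BYTES used by SKB_DATA_ALIGN
	)
	// SKB_DATA_ALIGN rounds its argument up to the cache line size.
	align := func(n int) int { return (n + cacheLine - 1) / cacheLine * cacheLine }

	pre58 := enaPageSize - ethHlen - ethFcsLen - vlanHlen - xdpPacketHeadroom
	post58 := pre58 - align(skbSharedInfo)
	fmt.Println("ENA_XDP_MAX_MTU before Linux 5.8:", pre58)  // 3818
	fmt.Println("ENA_XDP_MAX_MTU since Linux 5.8: ", post58) // 3498
}
```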
f892cde identity: Initialize local identity allocator early [ upstream commit 2e5f35b79e70e9de517b46615ce8e4abc6f9769d ] Move local identity allocator initialization to NewCachingIdentityAllocator() so that it is initialized when the allocator is returned to the caller. Also make the events channel and start the watcher in NewCachingIdentityAllocator(). Close() will no longer GC the local identity allocator or stop the watcher. Now that the locally allocated identities are persisted via the bpf ipcache map across restarts, recycling them at runtime via Close() would be inappropriate. This is then used in daemon bootstrap to restore locally allocated identities before new policies can be received via Cilium API or k8s API. This fixes the issue where CIDR policies were received from k8s before locally allocated (CIDR) identities were restored, causing the identities derived from the received policy to be newly allocated with different numeric identity values, ultimately causing policy drops during Cilium restart. Fixes: #19360 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 29 April 2022, 17:27:22 UTC
7e72244 daemon: Do not try to detect Dump support [ upstream commit b61a347f58102a2a71d47615408e74f8e5dafbc9 ] ipcache SupportsDump() and SupportsDelete() open the map to probe for the support if the map is not already open, and also schedule the bpf-map-sync-cilium_ipcache controller. If the controller is run before initMaps(), initMaps will fail as the controller will leave the map open and initMaps() assumes this not to be the case. Solve this by not trying to detect dump support, but by trying to dump and seeing if it succeeds. This fixes a Cilium Agent crash on kernels that do not support ipcache dump operations and when certain Cilium features are enabled on slow machines that caused the scheduled controller to run too soon. Fixes: 19360 Fixes: 19495 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Cilium Maintainers <maintainer@cilium.io> 29 April 2022, 17:27:22 UTC
55e076c docs: fix version warning URL to point to docs.cilium.io [ upstream commit b3b05029e4c955c6014c5778f595af6bbd4db2e8 ] Due to some CORS policy, the requests being performed from docs.cilium.io to readthedocs.org were being denied. This was causing the warning banner to never show up in the documentation. To avoid this problem a page redirect was configured in readthedocs settings to redirect docs.cilium.io/version to readthedocs.org/api/v2/version which will hopefully fix the issue and the API endpoint was set to docs.cilium.io. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
9e95af3 fqdn: Limit the number of zombies per host [ upstream commit ac93cb4f359b7eb93e5ce97a38650062a989ed24 ] [ Backporter's notes: Minor conflicts in endpoint.go, fqdn/cache.go ] Commit f6ce522d55d5 ("FQDN: Added garbage collector functions.") introduced a per-host limit on the number of IPs to be associated in the DNS cache, but at the time we did not support keeping FQDN entries alive beyond DNS TTL ("zombie entries"). These were later added in commit f6293725867c ("fqdn: Add and use DNSZombieMappings in Endpoint"), but at that time no such per-host limit was imposed on these zombie entries. Commit 5923dafd88be ("fqdn: keep IPs alive if their name is alive") later adjusted the zombie garbage collection to allow zombies to stay alive as long as any IP that shares the same FQDN is marked as alive. Unfortunately, this led to situations where a very high number of DNS cache entries remain in the cache beyond the DNS TTL, simply because one IP for the given name continues to be used. In the case of something like Amazon S3, where DNS TTLs are known to be low, and IP recycling high, if an app constantly made requests via ToFQDNs policy towards names hosted by this service, this could lead to thousands of stale FQDN mappings accumulating in the cache. For each of these mappings, Cilium would allocate corresponding identities, and when this is combined with a permissive pod policy, this could lead to policymaps becoming full, and error messages in the logs like: msg="Failed to add PolicyMap key" ... error="Unable to update element for map with file descriptor 67: argument list too long" This could also prevent new pods from being scheduled on nodes, as Cilium would be unable to implement the full requested policy for the new endpoints. In order to mitigate this situation, extend the per-host limit configuration to apply separately also to zombie entries. This allows up to 'ToFQDNsMaxIPsPerHost' FQDN entries that are alive (i.e. below DNS TTL) in addition to a further 'ToFQDNsMaxIPsPerHost' zombie entries corresponding to connections which remain alive beyond the DNS TTL. Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
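A rough Go sketch of the per-host cap described above, applied to zombie entries in addition to live ones. The `zombie` type and `limitZombiesPerHost` function are simplified illustrations of the sort-then-trim idea, not the actual fqdn package types:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// zombie is a simplified stand-in for a DNS mapping kept alive beyond its TTL.
type zombie struct {
	ip       string
	lastSeen time.Time
}

// limitZombiesPerHost keeps at most max zombies for a single FQDN,
// preferring the most recently seen IPs.
func limitZombiesPerHost(zombies []zombie, max int) []zombie {
	sort.Slice(zombies, func(i, j int) bool {
		return zombies[i].lastSeen.After(zombies[j].lastSeen)
	})
	if len(zombies) > max {
		zombies = zombies[:max]
	}
	return zombies
}

func main() {
	now := time.Now()
	zs := []zombie{
		{"10.0.0.1", now.Add(-3 * time.Hour)},
		{"10.0.0.2", now.Add(-1 * time.Hour)},
		{"10.0.0.3", now.Add(-2 * time.Hour)},
	}
	for _, z := range limitZombiesPerHost(zs, 2) {
		fmt.Println("kept:", z.ip)
	}
}
```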
955ba82 fqdn: Refactor zombie sort function [ upstream commit aafc70b44e44519eced5927d715587d1a38c01b8 ] It'll be useful to reuse this in-place zombie sort in an upcoming patch, split it out in preparation. Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
1ee37c1 make: check that Go major/minor version matches required version [ upstream commit ec187e876308d125240591256d9c522814e6fc90 ] [ Backporter's notes: Conflicted with v1.10 tree. Resolved to just add the new target and hook it into the Makefile. ] Currently, when building Cilium with a Go major/minor version other than the one specified in `GO_VERSION`, the build fails with: package github.com/cilium/cilium/cilium: build constraints exclude all Go files It's not obvious from the compiler error message that this is due to a mismatching Go version. This change adds a `check-go-version` target which checks the Go compiler version used for building Cilium (as specified in `$(GO)`) against the version pinned in the `GO_VERSION` file, i.e. the version used to build Cilium in CI. This check is required to pass in the `precheck` target which should surface mismatching Go versions in a more user-friendly way. Example with matching version: $ go version go version go1.18.1 linux/amd64 $ make check-go-version Example with mismatching version: $ go1.17.9 version go version go1.17.9 linux/amd64 $ make GO=go1.17.9 check-go-version Installed Go version 1.17 does not match requested Go version 1.18 make: *** [Makefile:602: check-go-version] Error 1 Suggested-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
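The check itself only compares the major/minor part of the compiler version against the pinned one; the real check lives in the Makefile, but a small Go sketch of the same comparison (where `pinned` stands in for the contents of the GO_VERSION file) looks like this:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
)

// majorMinor reduces a version like "go1.18.1" or "1.18.1" to "1.18".
func majorMinor(v string) string {
	v = strings.TrimPrefix(v, "go")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return v
	}
	return parts[0] + "." + parts[1]
}

func main() {
	pinned := "1.18.1" // stand-in for the GO_VERSION file contents
	installed := runtime.Version()
	if majorMinor(installed) != majorMinor(pinned) {
		fmt.Printf("Installed Go version %s does not match requested Go version %s\n",
			majorMinor(installed), majorMinor(pinned))
		os.Exit(1)
	}
}
```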
ee0049f build(deps): bump docker/setup-buildx-action from 1.6.0 to 1.7.0 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.6.0 to 1.7.0. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/94ab11c41e45d028884a99163086648e898eed25...f211e3e9ded2d9377c8cadc4489a4e38014bc4c9) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 28 April 2022, 22:26:41 UTC
fec2ba7 build(deps): bump actions/checkout from 3.0.1 to 3.0.2 Bumps [actions/checkout](https://github.com/actions/checkout) from 3.0.1 to 3.0.2. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/dcd71f646680f2efd8db4afa5ad64fdcba30e748...2541b1294d2704b0964813337f33b291d3f8596b) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 26 April 2022, 23:24:53 UTC
28713df docs: Update section title to improve readability [ upstream commit 79d53af7d5f0c3b91c17ff7cb5cc0b203d34f1d1 ] Local redirect policy requires Kube-proxy replacement, and the feature flag to be enabled. Rename the section that outlines these steps so that users are less likely to miss them. Suggested-by: Raymond de Jong <raymond.dejong@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
67f0d77 pkg/redirectpolicy: Improve error logs [ upstream commit 6c34c93dce924bc24072c23a154a62cf568b4d38 ] Improve error logs thrown by the port validation logic so that users can take the necessary actions. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
7e425d4 docs: improve description for session affinity with KPR [ upstream commit b7002f593b44850af162e8f8f5f01a0dcc92e65b ] Use the correct terminology ('session affinity' vs 'service affinity'), and fix a typo. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
d7ea906 pkg/bpf: add map name in error message for OpenParallel [ upstream commit 29c3ebdc0ace37e3068945942d35fe3e3786f4cd ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
56a1c45 jenkinsfiles: Increase VM boot timeout [ upstream commit cfec27a217259e2002932864fe69e1df072319bc ] This commit increases the VM boot timeout while decreasing the overall timeout :mindblown: We currently run the vagrant-ci-start.sh script with a 15m timeout and retry twice if it fails. That takes up to 45m in total if all attempts fail, as is frequently happening in CI right now. In particular, if the script simply fails because it's taking on average more than 15m then it is likely to fail all three times. This commit instead increases the timeout from 15m to 25m and removes the retries. The goal is obviously to succeed on the first try :p Ideally, we would investigate why it is now taking longer to start the VM. But this issue has been happening for a long time. And because of the retries, we probably didn't even notice the increase at the beginning: if it takes on average 15min, it might fail half the time and the test might still succeed most of the time. That is, the retries helped to hide the increase. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
43824d8 install: Update image digests for v1.10.10 Generated from https://github.com/cilium/cilium/actions/runs/2053999962. `docker.io/cilium/cilium:v1.10.10@sha256:83bfc1052543e8b1e31f06fa2b5bbd2bd41cc79f264010241fc1994e35281616` `quay.io/cilium/cilium:v1.10.10@sha256:83bfc1052543e8b1e31f06fa2b5bbd2bd41cc79f264010241fc1994e35281616` `docker.io/cilium/clustermesh-apiserver:v1.10.10@sha256:e13d41db3f5ee93d8b3abcaa10cc4005522bc797be3d69fc96ac5e03b60c7b11` `quay.io/cilium/clustermesh-apiserver:v1.10.10@sha256:e13d41db3f5ee93d8b3abcaa10cc4005522bc797be3d69fc96ac5e03b60c7b11` `docker.io/cilium/docker-plugin:v1.10.10@sha256:cd45b531e97b588d4e8c825cb588611949044db4351dcffeacf92ba2f4208054` `quay.io/cilium/docker-plugin:v1.10.10@sha256:cd45b531e97b588d4e8c825cb588611949044db4351dcffeacf92ba2f4208054` `docker.io/cilium/hubble-relay:v1.10.10@sha256:a0769e44299bba301dee08d489f4e2d3b3924916bed985346dcf9fcf10861c8a` `quay.io/cilium/hubble-relay:v1.10.10@sha256:a0769e44299bba301dee08d489f4e2d3b3924916bed985346dcf9fcf10861c8a` `docker.io/cilium/operator-alibabacloud:v1.10.10@sha256:6154fcc069700cca6754cff0ee7bf6990bbf4a2865076b5358cb0c70c0043d52` `quay.io/cilium/operator-alibabacloud:v1.10.10@sha256:6154fcc069700cca6754cff0ee7bf6990bbf4a2865076b5358cb0c70c0043d52` `docker.io/cilium/operator-aws:v1.10.10@sha256:9bc04377606cb57c16f699a5b34dcdd6b6ffc1c4f43f5e6da81015fc16c10edc` `quay.io/cilium/operator-aws:v1.10.10@sha256:9bc04377606cb57c16f699a5b34dcdd6b6ffc1c4f43f5e6da81015fc16c10edc` `docker.io/cilium/operator-azure:v1.10.10@sha256:6973d45f7255c1791c0502339675a42105b8cbeca1a98634362623433674efe1` `quay.io/cilium/operator-azure:v1.10.10@sha256:6973d45f7255c1791c0502339675a42105b8cbeca1a98634362623433674efe1` `docker.io/cilium/operator-generic:v1.10.10@sha256:8a317287b6ac8fe0ba4999342c9627dc913e0c1591552164f96d0aadf5d1a740` `quay.io/cilium/operator-generic:v1.10.10@sha256:8a317287b6ac8fe0ba4999342c9627dc913e0c1591552164f96d0aadf5d1a740` `docker.io/cilium/operator:v1.10.10@sha256:8462f34a9c081126c9281bc637d76b3c7f81668bbb77a4a66a3dda16554915e9` `quay.io/cilium/operator:v1.10.10@sha256:8462f34a9c081126c9281bc637d76b3c7f81668bbb77a4a66a3dda16554915e9` Signed-off-by: Joe Stringer <joe@cilium.io> 19 April 2022, 04:12:59 UTC
8a77a4a Prepare for release v1.10.10 Signed-off-by: Joe Stringer <joe@cilium.io> 15 April 2022, 16:03:20 UTC
3f3fec1 build(deps): bump actions/checkout from 3.0.0 to 3.0.1 Bumps [actions/checkout](https://github.com/actions/checkout) from 3.0.0 to 3.0.1. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/a12a3943b4bdde767164f792f33f40b04645d846...dcd71f646680f2efd8db4afa5ad64fdcba30e748) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 14 April 2022, 20:31:12 UTC
47e81a6 envoy: Limit accesslog socket permissions [ upstream commit 5595e622243948f74187b449186e4575f451b9e5 ] [ Backporter's notes: trivial conflicts in `cilium-agent.md` and `pkg/envoy/accesslog_server.go` due to other changes in the lines right next to this backport since v1.10. ] Limit access to Cilium xDS and access log sockets to root and group 1337 used by Istio sidecars. Fixes: #3131 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
266183d Add a 'Limitations' section to 'External Workloads'. [ upstream commit 3454eceacc5933a98c60481d909e8878c665aed2 ] This commit adds a 'Limitations' section to the 'External Workloads' page, initially referring to the fact that transparent encryption to/from external workloads is currently not supported. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
9228eed ci: run documentation workflow on workflow changes [ upstream commit 5272fceb64bdd0c9609b8b18331a5c99b25ae0a5 ] Let's run the documentation GitHub workflow when the related YAML file changes, in order to make sure that the workflow is still working as expected after the change. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
698afeb ci: pin down image for documentation workflow [ upstream commit 0da7224218ab96d5f5a8f7e9d267e9743c07c83a ] Instead of using the ":latest" version of the docs-builder image, pin down the version to indicate a specific version to use. The context for this change is some preparation for updating the version of Sphinx used by the image. Specifying an explicit image to use has the following advantages: - When using ":latest" we have to update the image _and_ the workflow at the same time, or the workflow will break. By contrast, once we pin down the image, we can push a new image on Docker without breaking the workflow, and then update the workflow to switch to the new image, on the same PR that updates the build process. - This helps testing an experimental ":latest" image from a PR, without breaking the workflow on the master branch. - If anything goes wrong, this makes it easier to revert the change by rolling back to a previous pinned image, without having to push again a rolled-back docs-builder image as the new ":latest". Most other workflows, if not all, already pin down the images they use. The pinned image is the current ":latest", so there should be no change to the current state of the workflow. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
adff281 Use RLock while reading data in DNS proxy [ upstream commit 8233b4eaff9110b516fa82d56963054f133c5045 ] This changes the locking behaviour of the GetRules and CheckAllowed DNSProxy methods to use a read lock, which should allow concurrent DNS traffic handling goroutines to get the data they need from the proxy faster. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
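A minimal sketch of the locking pattern: read-only accessors take the shared lock so concurrent DNS-handling goroutines don't serialize on each other, while writers keep the exclusive lock. The `proxy` type and method bodies are illustrative, not the actual DNSProxy implementation:

```go
package main

import (
	"fmt"
	"sync"
)

type proxy struct {
	mu    sync.RWMutex
	rules map[uint64][]string // endpoint ID -> allowed DNS patterns (simplified)
}

// GetRules only reads shared state, so an RLock suffices and readers can proceed in parallel.
func (p *proxy) GetRules(endpointID uint64) []string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return append([]string(nil), p.rules[endpointID]...)
}

// UpdateRules mutates shared state and therefore still needs the exclusive lock.
func (p *proxy) UpdateRules(endpointID uint64, rules []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.rules[endpointID] = rules
}

func main() {
	p := &proxy{rules: map[uint64][]string{}}
	p.UpdateRules(42, []string{"*.example.com"})
	fmt.Println(p.GetRules(42))
}
```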
4f5148c wireguard: Reject duplicate public keys [ upstream commit 130506e63f66bcb73339aecb857dbd374ef9094b ] This fixes an issue where a node could duplicate another node's public key and thereby force all cilium-agent instances to overwrite the original node's WireGuard peer. While the impersonating node cannot decrypt any traffic (it does not know the original node's private key, nor would traffic be redirected to the impersonating node due to the lack of AllowedIPs on the impersonating peer), the impersonating node could still effectively deny any traffic to the original node. This commit fixes the issue by keeping track of which node has added a given public key. Because node names are guaranteed to be unique, only one node can own a certain public key this way. Any attempt to upsert a public key already owned by another node will be rejected and a warning will be emitted in the cilium-agent logs. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
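A simplified Go sketch of the ownership check described above: the first node to register a public key owns it, and an upsert of the same key from a different node name is rejected. The types and names are illustrative, not the agent's WireGuard code:

```go
package main

import (
	"errors"
	"fmt"
)

type peerRegistry struct {
	// pubKeyOwner maps a WireGuard public key to the node that registered it.
	pubKeyOwner map[string]string
}

func (r *peerRegistry) upsertPeer(nodeName, pubKey string) error {
	if owner, ok := r.pubKeyOwner[pubKey]; ok && owner != nodeName {
		// Another node already owns this key: reject instead of overwriting its peer.
		return errors.New("public key " + pubKey + " already owned by node " + owner)
	}
	r.pubKeyOwner[pubKey] = nodeName
	return nil
}

func main() {
	r := &peerRegistry{pubKeyOwner: map[string]string{}}
	fmt.Println(r.upsertPeer("node-a", "KEY1")) // <nil>
	fmt.Println(r.upsertPeer("node-b", "KEY1")) // rejected: key owned by node-a
}
```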
c5fa809 fix(helm): do not override CHECKPOINT_PATH [ upstream commit b38d59167ba32aa420df3a9ee4dfe8eb8bcf0457 ] [ Backporter's notes: conflicts due to structure around daemonset manifest having changed in #16795, which was not backported to v1.10. Changes have been manually reapplied where appropriate in `install/kubernetes/cilium/templates/cilium-nodeinit-daemonset.yaml`. ] Take advantage of the dynamic CHECKPOINT_PATH in the startup-script: https://github.com/cilium/image-tools/blob/396d5a19690229b42398a8be63b5712ea64a2601/images/startup-script/manage-startup-script.sh#L58 Signed-off-by: Raphaël Pinson <raphael@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
a77f324 fix(helm): use a directory to store bootstrap time [ upstream commit 27a93e89f1b2ad39d5859cf3535b3d37be2aa6f0 ] [ Backporter's notes: conflicts due to structure around daemonset manifest having changed in #16795, which was not backported to v1.10. Changes have been manually reapplied where appropriate in `install/kubernetes/cilium/templates/cilium-nodeinit-daemonset.yaml` and `install/kubernetes/cilium/templates/cilium-agent-daemonset.yaml`. Please note in particular the Helm condition in that last file is different than in `master`, even though it should not have any impact. ] Fixes: #14483 Signed-off-by: Raphaël Pinson <raphael@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
cec90cb daemon: Restore old CIDR identities [ upstream commit 2d343847846b9b51a774f69c6296463812c9be7b ] Dump old ipcache on daemon startup and restore all locally allocated (CIDR) identities with the same numeric security identity as before. This helps avoid transient drops during restart due to identity changes. Any restored identities that remain unused after 10 minutes are removed from the IP cache. This grace period is configurable with a new agent command line option --identity-restore-grace-period that takes a duration, e.g., identity-restore-grace-period=5m for 5 minutes. Note that this feature requires support from the Linux kernel. Kernels from 4.11 to 4.15 inclusive do not support this feature. Testing is done in the FQDN runtime tests to verify that locally allocated identities are stable across restart, and that they also reappear in selectorcache after FQDN policies are re-applied. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
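A rough sketch of the restore-then-expire flow described above: restored CIDR-to-identity mappings are kept, and whatever is still unused when the grace period (--identity-restore-grace-period) fires is released. The types here are simplified placeholders; the real daemon wires this into the ipcache and identity allocator:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type restoredIdentities struct {
	mu   sync.Mutex
	byIP map[string]uint32 // CIDR -> restored numeric identity
	used map[string]bool
}

// markUsed is called when a restored identity is referenced again after restart.
func (r *restoredIdentities) markUsed(cidr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.byIP[cidr]; ok {
		r.used[cidr] = true
	}
}

// releaseUnusedAfter drops restored entries that nothing claimed within the grace period.
func (r *restoredIdentities) releaseUnusedAfter(grace time.Duration, release func(string, uint32)) {
	time.AfterFunc(grace, func() {
		r.mu.Lock()
		defer r.mu.Unlock()
		for cidr, id := range r.byIP {
			if !r.used[cidr] {
				release(cidr, id)
				delete(r.byIP, cidr)
			}
		}
	})
}

func main() {
	r := &restoredIdentities{
		byIP: map[string]uint32{"192.0.2.0/24": 16777300, "198.51.100.0/24": 16777301},
		used: map[string]bool{},
	}
	r.markUsed("192.0.2.0/24")
	r.releaseUnusedAfter(100*time.Millisecond, func(cidr string, id uint32) {
		fmt.Printf("releasing unused restored identity %d for %s\n", id, cidr)
	})
	time.Sleep(200 * time.Millisecond)
}
```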
68cd8dd ipcache: Add SupportsDump() helper [ upstream commit b7d83d671e50afa7a5943f9a1bf7de0bdeedc306 ] Add SupportsDump() helper reusing the probing already done for SupportsDelete(). Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
a615ab3 hubble: use HasLocalScope() [ upstream commit 84f467313dce6bad193e91eb0a77b7f883cad099 ] Use HasLocalScope() helper rather than open coding it. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
d5b5e7d vendor: pull in the latest changes from github.com/vishvananda/netlink [ upstream commit fbf37525921b5eb0a838ea10b46cee718ec953ea ] Includes fix for broken compilation on mac OS - https://github.com/vishvananda/netlink/pull/730. go get github.com/vishvananda/netlink@main;go mod tidy;go mod vendor Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
e5e5b4a test: Fix whitespace in docker-run-cilium [ upstream commit 262ac5fbbd4270766b639529482121a55e2cbeaa ] Add space between provided and default args. Fixes: #19310 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
d39a89e test: Support provisioning in non-vagrant VMs [ upstream commit 43db56895dc5a70d31582688a3b253f81ab31ceb ] Add support for passing VMUSER (which defaults to vagrant) to ease running tests in non-vagrant VMs. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
4a1d92f test: Allow runtime tests to use pre-built Cilium image [ upstream commit ea68a7cddfe4e4348a62895bfea7874669f5a601 ] Pass CILIUM_IMAGE and CILIUM_TAG from environment to provisioning. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
14b4997 test: Runtime use Cilium docker image [ upstream commit 324228b5a3bad3e65d957ad2895985de2a605cd1 ] Run Cilium in a docker container for the Runtime tests. Keep the systemd cilium.service, but use a new script to run Cilium from a docker container there. This design has a high degree of compatibility with the prior approach of running cilium-agent directly from cilium.docker. Test scripts are organized so that there is no change when running Cilium in a development VM. There, cilium-agent is still run on the host as before. While working with this I noticed that Cilium operator fails to run in Runtime tests, as it now assumes it can reach the k8s api-server. Cilium agent fails after a while due to this if it is using the etcd kvstore, as the heartbeats are missing. That's why the kvstore needs to return to the default (consul) configuration after the etcd test. Previously this was done after each test, but now this is done after all (two) of the kvstore tests, speeding up the tests a bit. Do not pass explicit options when they are the same as the defaults. This also avoids using the systemd template where bare Cilium agent options are expected. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
ad0ea07 test: Pull images for runtime or k8s tests, not for both [ upstream commit 9d4b1bb322e955a715b2a2b9ac610e0dfbd2284a ] Runtime tests do not need images used by k8s tests, nor do k8s tests need images used by runtime tests. Pull images only for the test suite in use. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 April 2022, 13:33:39 UTC
d8db929 build(deps): bump actions/cache from 3.0.1 to 3.0.2 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.1 to 3.0.2. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/136d96b4aee02b1f0de3ba493b1d47135042d9c0...48af2dc4a9e8278b89d7fa154b955c30c6aaab09) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 12 April 2022, 12:59:40 UTC
5096d21 workflows: Use docker manifest inspect for waiting images [ upstream commit 0c817496e8eae32862f439686d2678329c369116 ] We are currently using the quay.io API to wait for new images we are building to be available. Recently, the quay.io API made a breaking change and our CI was affected. As a countermeasure, we decided to use the `docker manifest inspect` command. This commit converts the quay.io API calls to `docker manifest inspect`. Some tests running in the MacOS environment (for nested virtualization support) are modified to install the Docker CLI at the beginning, right after we set the check status to pending. Co-authored-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 06 April 2022, 15:03:35 UTC
ce9b1e2 ipcache: Add test asserting out-of-order Kubernetes events [ upstream commit 31a5ff1620e7f3b222af0c884351e161e04539cd ] These tests answer the following questions: * What happens if we receive an add/update event from k8s with a pod that is using the same IP address as an already-gone-pod-but-delete-event-not-received? * What happens if we receive a delete of an already gone pod after we have received an add/update from a new pod using that same IP address? What these tests confirm is that Kubernetes events that are out-of-order are handled as they're received. Meaning the ipcache doesn't have any special logic to handle, for example, whether an ipcache delete for a pod X with IP A is the same pod X (by namespace & name) which previously inserted an ipcache entry. Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 06 April 2022, 15:03:35 UTC
8d8ab18 docs: Clarify use of the `eni.subnetTagsFilter` option [ upstream commit 212d6e78229ab8365ecc80337218c29d8107f33a ] The `eni.subnetTagsFilter` option is notoriously hard to use correctly. If it is used with tags that don't match the subnet of the pre-attached ENI, Cilium agent will never become ready (#18239). This PR removes it from the ENI documentation (which most users will use as a reference configuration) such that no one enables this option without being aware of its requirements. This PR also adds additional context to the Helm value. We might deprecate and remove the option in the future as well (#19181). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 06 April 2022, 15:03:35 UTC
1661408 test: Wait until host EP is in ready state [ upstream commit 3b9b098138d6197f2f7f9a8477bb33f0b698bb1e ] Previously (hopefully), we saw many CI flakes which were due to the first request from outside to a k8s Service failing. E.g., Can not connect to service "http://192.168.37.11:30146" from outside cluster After some investigation it became obvious why it happened. Cilium-agent becomes ready before the host endpoints get regenerated (e.g., bpf_netdev_eth0.o). This leads to old programs handling requests, which might fail in different ways. For example the following request failed in the K8sServicesTest Checks_N_S_loadbalancing Tests_with_direct_routing_and_DSR suite: {..., "IP":{"source":"192.168.56.13","destination":"10.0.1.105", ..., "trace_observation_point":"TO_OVERLAY","interface":{"index":40}, ...} The previous suite was running in the tunnel mode, so the old program was still trying to send the packet over the tunnel which no longer existed. This resulted in a silent drop. Fix this by making the CI wait after deploying Cilium until the host EP is in the "ready" state. This should ensure that the host EP programs have been regenerated. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Paul Chaignon <paul@cilium.io> 06 April 2022, 15:03:35 UTC
86ff454 build(deps): bump KyleMayes/install-llvm-action from 1.5.1 to 1.5.2 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.1 to 1.5.2. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/v1.5.1...v1.5.2) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 04 April 2022, 15:11:46 UTC
22b624a routing: Consistent conditions for ip rule creation/deletion [ upstream commit 298ece34407caee7eeffacdac35a85af094ecb4b ] In addition to the inconsistency fixed in the previous commit, there is another we need to fix. On the deletion of ip rules with the form "to x.x.x.x/x", we check that we are running in ENI mode. We don't perform the same check on the creation of those rules. Thus, on AKS and AlibabaCloud, we could theoretically create rules of one form and attempt to delete the other form. In practice, this isn't causing a bug because: - The CIDR used in the "to x.x.x.x/x" form appears to always be 0.0.0.0/0 and is therefore equivalent to not having a to CIDR specified. - We don't support equivalent flags to --aws-release-excess-ips on AKS and AlibabaCloud and, as detailed in the previous commit, that flag is necessary to trigger the bug. Thus, this commit is fixing an inconsistency in the code that doesn't have any consequence today. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 04 April 2022, 09:59:59 UTC
0bc463c routing: Fix the incorrect deletion of IP rules [ upstream commit 4e422e321750047b6280c14f1df365063bb8ae27 ] The IP rules for ENI can have two forms, with or without the to x.x.x.x/x part. There is a bug between the creation and deletion of rules, where the conditions to decide which form to use aren't the same. As a result, we would attempt to remove the wrong IP rules on pod deletion, causing a warning message in logs and leaving a stale IP rule behind: level=warning msg="No rule matching found" rule="111: from 100.69.11.94/32 to 10.244.0.0/16 lookup 0" subsys=linux-routing In case a new rule is later added for a new pod with the same IP, we can end up with a conflict if the new pod is on a different ENI. We will then have two rules for the same IP with lookups in different routing tables. 111: from 10.19.192.168 lookup 10 111: from 10.19.192.168 lookup 11 This conflict can then cause connectivity issues because of pod traffic leaving through the wrong interface. Flag --aws-release-excess-ips needs to be enabled for the bug to occur. Otherwise the IP address stays attached to the same ENI and we don't end up with an IP rule conflict. Reported-by: Sebastian Wicki <gandro@gmx.net> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 04 April 2022, 09:59:59 UTC
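The essence of the fix is that rule creation and deletion must agree on whether the "to x.x.x.x/x" part is included. A toy Go sketch of making that decision in one place, so both paths always build the same rule (the `ipRule` type and `eniMode` flag are illustrative only, not the linux-routing package):

```go
package main

import "fmt"

type ipRule struct {
	priority int
	from     string
	to       string // empty means no "to x.x.x.x/x" part
	table    int
}

// buildRule is shared by both the install and the delete path, so the
// condition deciding whether to include the "to" CIDR can never diverge.
func buildRule(podIP string, table int, eniMode bool, egressCIDR string) ipRule {
	r := ipRule{priority: 111, from: podIP + "/32", table: table}
	if eniMode {
		r.to = egressCIDR
	}
	return r
}

func main() {
	add := buildRule("100.69.11.94", 10, true, "10.244.0.0/16")
	del := buildRule("100.69.11.94", 10, true, "10.244.0.0/16")
	fmt.Println("rules match for add and delete:", add == del)
}
```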
0d493fd build(deps): bump actions/cache from 3.0.0 to 3.0.1 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.0 to 3.0.1. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/4b0cf6cc4619e737324ddfcec08fff2413359514...136d96b4aee02b1f0de3ba493b1d47135042d9c0) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 31 March 2022, 13:15:01 UTC
90d29ab pkg/redirectpolicy, docs: Add missing namespace check [ upstream commit 2efbdd68a7a578874d1a9ecb884bfba4f76e0f0b ] The Local Redirect Policy (LRP) namespace needs to match that of the backend pods selected by the LRP. This check was missing in the case where backend pods are deployed after an LRP that selects them was applied. Added unit tests. Reported-by: Joe Stringer <joe@covalent.io> Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
5f45074 command: Add additional string map unit test case [ upstream commit 1a34bc544e529957198777c68722baadaa215fba ] This test already passes, but it's a good corner case to test to avoid running into regressions. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
31ac2d8 cmd: Always use GetStringMapString parse result [ upstream commit 96e7fa64d577ab311b1fe5f5c671f5fed4dc90ce ] Because string map options are configured via `flags.Var`, they are assigned twice: once in `flags.Var`, and then again in `option.Config.Populate()` with `GetStringMapString`. The latter is needed because it works around a viper bug where ConfigMap options were not properly parsed (cilium/cilium#18328). This commit ensures we're always using the result of the fallback parser, which fixes a bug where a ConfigMap value of `{}` was parsed as `map["{}":""]`. Fixes: 768659f2fe15 ("cmd: Fix issue reading string map type via config map") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
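A small sketch of the behaviour the fix guarantees, namely that `{}` (and an empty string) parse to an empty map rather than to `map["{}":""]`. The `parseStringMap` function is an illustration, not the actual GetStringMapString code path:

```go
package main

import (
	"fmt"
	"strings"
)

// parseStringMap parses "k1=v1,k2=v2" style values, treating "" and "{}" as empty maps.
func parseStringMap(s string) map[string]string {
	out := map[string]string{}
	s = strings.TrimSpace(s)
	if s == "" || s == "{}" {
		return out
	}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) == 2 {
			out[kv[0]] = kv[1]
		}
	}
	return out
}

func main() {
	fmt.Println(parseStringMap("{}"))               // map[] rather than map[{}:]
	fmt.Println(parseStringMap("cluster=prod,a=b")) // map[a:b cluster:prod]
}
```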
e4cbd7c jenkinsfiles: Update calls to Quay API [ upstream commit 277ed8c26e17fc704a32d4ec2349b50f26f82121 ] This is a follow up to 8b67e749 ("workflows: Update call to Quay API"). We also need to change the Quay API calls we're doing in the Jenkinsfiles. [ Backport notes: Minor conflict: "operator-generic-ci" image is "operator-ci" on branch 1.9. ] Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
cc3fe35 workflows: Update call to Quay API [ upstream commit 8b67e7495d48d9e57fee8890545e2a8985d2e88e ] Quay changed its API on the 22nd. This commit updates the step that checks for the Docker images in all workflows to use the new API. [ Backport notes: - Not all files are present on 1.9, only the existing ones are updated. - Minor conflict: "operator-generic-ci" image is "operator-ci" on branch 1.9. ] Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
c242c2a wireguard: Fix invalid bits when agent init [ upstream commit 085874cbbb1194d7c7dd07298edf5f33ce065e8c ] This commit uses the correct CIDR mask when the WireGuard agent initializes. Fixes: https://github.com/cilium/cilium/issues/19106 Signed-off-by: Ye Sijun <junnplus@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
0a231f8 helm: Add RBAC Permissions to Clustermesh APIServer Clusterrole [ upstream commit 75f597be318002f0281a64be81666af4f9d5be2d ] The Clustermesh-APIServer creates a CiliumEndpoint and sets a node as its ownerReference, also setting blockOwnerDeletion to "true". If the OwnerReferencesPermissionEnforcement admission controller is enabled (such as in environments like Openshift) then the Clustermesh-APIServer will fail to create the CiliumEndpoint as it has insufficient privileges to set blockOwnerDeletion of a node. It needs to be able to "update" "nodes/finalizers" in order to do this. See #19053 for more details and references. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
9ccb92a selectorcache: Release identities without sc.Mutex [ upstream commit 7f9cfd1fe380e4a26657a4d5137152e36b09b0bd ] Similar to the prior commit, when releasing identities, it is possible for the identity allocator to attempt to hold SelectorCache.mutex while handling the release of identities. Therefore, this commit moves the release of identities outside of the critical section for the SelectorCache during AddFQDNSelector() (during selector creation), as well as during the removal of the selector from the SelectorCache. Found by code inspection. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
fff10aa selectorcache: Allocate identities without sc.Mutex [ upstream commit 295e85733966759365422ea5576f38a56aea3f85 ] Michi and Maciej report that they can trigger a deadlock as part of FQDN selectorcache preparation: goroutine 873 [chan send, 969 minutes]: github.com/cilium/cilium/pkg/identity/cache.(*localIdentityCache).lookupOrCreate(0xc00096b440, 0xc0011baf00, 0x0, 0xc000f4c800, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/identity/cache/local.go:106 +0x32a github.com/cilium/cilium/pkg/identity/cache.(*CachingIdentityAllocator).AllocateIdentity(0xc0001decc0, 0x2f6c4a0, 0xc00263e660, 0xc0011baf00, 0xc000a18400, 0x0, 0x2ed1e00, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/identity/cache/allocator.go:377 +0xbfa github.com/cilium/cilium/pkg/ipcache.allocateCIDRs(0xc001031000, 0x816, 0x816, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:105 +0x27d github.com/cilium/cilium/pkg/ipcache.AllocateCIDRsForIPs(0xc0010ce000, 0x816, 0x955, 0x0, 0x0, 0x0, 0x8000101, 0x0, 0xffffffffffffffff) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:66 +0x71 github.com/cilium/cilium/daemon/cmd.cachingIdentityAllocator.AllocateCIDRsForIPs(0xc0001decc0, 0xc0010ce000, 0x816, 0x955, 0x0, 0x252fbe0, 0xc00076ca38, 0x2598940, 0xc00148a610, 0x0) /go/src/github.com/cilium/cilium/daemon/cmd/identity.go:124 +0x4d github.com/cilium/cilium/pkg/policy.(*fqdnSelector).allocateIdentityMappings(0xc0012cfa40, 0x7fe4824e20f8, 0xc0001decc0, 0xc0013b9050) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:502 +0x2dd github.com/cilium/cilium/pkg/policy.(*SelectorCache).AddFQDNSelector(0xc000483420, 0x2f0dc80, 0xc001848f00, 0xc000fcef40, 0x32, 0x0, 0x0, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:831 +0x3db ... goroutine 440 [semacquire, 969 minutes]: sync.runtime_SemacquireMutex(0xc000483424, 0xc001163500, 0x1) /usr/local/go/src/runtime/sema.go:71 +0x47 sync.(*Mutex).lockSlow(0xc000483420) /usr/local/go/src/sync/mutex.go:138 +0x105 sync.(*Mutex).Lock(...) /usr/local/go/src/sync/mutex.go:81 sync.(*RWMutex).Lock(0xc000483420) /usr/local/go/src/sync/rwmutex.go:111 +0x90 github.com/cilium/cilium/pkg/policy.(*SelectorCache).UpdateIdentities(0xc000483420, 0xc0009685a0, 0xc0009685d0, 0xc000699e40) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:960 +0x55 github.com/cilium/cilium/daemon/cmd.(*Daemon).UpdateIdentities(0xc0002ed8c0, 0xc0009685a0, 0xc0009685d0) /go/src/github.com/cilium/cilium/daemon/cmd/policy.go:99 +0x6a github.com/cilium/cilium/pkg/identity/cache.(*identityWatcher).watch.func1(0xc000655140, 0xc0001decf8) /go/src/github.com/cilium/cilium/pkg/identity/cache/cache.go:205 +0x20c created by github.com/cilium/cilium/pkg/identity/cache.(*identityWatcher).watch /go/src/github.com/cilium/cilium/pkg/identity/cache/cache.go:154 +0x49 Commit b10797c7d87b ("fqdn: Move identity allocation to FQDN selector") contains the following fateful quote in its commit message: > This commit should have no functional changes. Technically the identity > allocation is moved inside the SelectorCache critical section while > holding the mutex, but CIDR identities are only ever locally allocated > within memory anyway so this is not expected to block for a long time. 
Evidently, the author (yours truly) was unaware that deep in the local identity allocator, it holds a notification mechanism to notify the rest of the selectorcache about the newly allocated identity; or indeed that this in itself would also attempt to grab the SelectorCache mutex in order to propagate this identity into the rest of the policy subsystem. On average, this issue would not trigger a deadlock as there is a channel between the two goroutines at the beginning of this commit message. As long as the channel does not become full, and the goroutines are handled by different OS threads, the first goroutine above would not stall at (*localIdentityCache).lookupOrCreate and would instead proceed to release the SelectorCache mutex, at which point the identityWatcher from the second goroutine could acquire the mutex and make progress. However, in certain cases, particularly if many IPs were associated with a single FQDN, and policy recalculation was triggered (perhaps due to creation of a new pod or new FQDN policies), the deadlock could be triggered by filling up the channel. This patch fixes the issue by pushing the identity allocation outside of the critical section of the SelectorCache, so that even if there are many queued identity updates to handle and they fill up the channel, first goroutine will block at (*localIdentityCache).lookupOrCreate() without holding the SelectorCache mutex. This should allow the second goroutine to make progress regardless of the channel filling up or the first goroutine blocking. The following patch will apply similar logic to the identity release path. [ Backport note: - Conflicts in allocateIdentityMappings() and transferIdentityReferencesToSelector(). - Addressing those conflicts ended up in effectively backporting commit 07be425ed6e3 ("policy: Reduce allocations during FQDN processing") as well. ] Fixes: b10797c7d87b ("fqdn: Move identity allocation to FQDN selector") Reported-by: Michi Mutsuzaki <michi@isovalent.com> Reported-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
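The shape of the fix, in both this commit and the previous one, is to do the potentially blocking allocator work before taking the SelectorCache lock. A condensed Go sketch of that pattern, with placeholder types rather than the actual policy package:

```go
package main

import (
	"fmt"
	"sync"
)

type selectorCache struct {
	mu         sync.Mutex
	identities map[string]int
}

// allocate stands in for the identity allocator call, which in the real code
// can block on a notification channel serviced by another goroutine that
// itself needs the selectorCache lock; here it is simplified to a pure function.
func allocate(name string) int { return len(name) }

func (sc *selectorCache) addFQDNSelector(name string) {
	// 1. Do the potentially blocking work with no lock held.
	id := allocate(name)

	// 2. Only then enter the critical section to publish the result.
	sc.mu.Lock()
	defer sc.mu.Unlock()
	sc.identities[name] = id
}

func main() {
	sc := &selectorCache{identities: map[string]int{}}
	sc.addFQDNSelector("*.example.com")
	fmt.Println(sc.identities)
}
```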
65b30bc helm: Remove Unnecessary RBAC Permissions for Agent [ upstream commit 0f4d3a71b05504ac56d2f9aa38916bb654b61642 ] In October 2020, we made changes[1] to the cilium-agent's ClusterRole to be more permissive. We did this because Openshift enables[2] the OwnerReferencesPermissionEnforcement[3] admission controller. This admission controller prevents changes to the metadata.ownerReferences of any object unless the entity (the cilium-agent in this case) has permission to delete the object as well. Furthermore, the controller protects metadata.ownerReferences[x].blockOwnerDeletion of a resource unless the entity (again, the cilium-agent) has "update" access to the finalizer of the object having its deletion blocked. The original PR mistakenly assumed we set ownerReferences on pods and expanded cilium-agent's permissions beyond what was necessary. Cilium-agent only sets ownerReferences on a CiliumEndpoint and the blockOwnerDeletion field propagates up to the "owning" pod of the endpoint. Cilium-agent only needs to be able to delete CiliumEndpoints (which it has always been able to) and "update" pod/finalizers (to set the blockOwnerDeletion field on CiliumEndpoints). All other changes contained in #13369 were unnecessary. 1 https://github.com/cilium/cilium/pull/13369 2 https://docs.openshift.com/container-platform/4.6/architecture/admission-plug-ins.html 3 https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#ownerreferencespermissionenforcement Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
dd2dbc6 hubble/recorder: Sanitize pcap filename [ upstream commit 4baf2da2086a2c946c85c7b6d7e8295dc856f258 ] This removes any special characters from the generated pcap filename. This fixes a bug where we accidentally added a slash to the filename when we added support for the cluster name. This broke file creation, as file names cannot contain slashes. Fixes: 3203df90821f ("hubble: Hubble node_name field should contain cluster name") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
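A tiny sketch of the sanitization idea described above: strip or replace characters that are not safe in a file name, so a node name containing the cluster name (and thus a '/') can no longer break file creation. The regular expression and function name are illustrative, not the recorder's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// unsafeChars matches anything outside a conservative allow-list for file names.
var unsafeChars = regexp.MustCompile(`[^a-zA-Z0-9._-]`)

func sanitizePcapName(name string) string {
	return unsafeChars.ReplaceAllString(name, "_")
}

func main() {
	// "cluster/node" would otherwise be treated as a directory component.
	fmt.Println(sanitizePcapName("default/cluster-1/node-2.pcap"))
}
```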
1058f80 Fix unhelpful error emitted when we try to set up base devices [ upstream commit 04c29baf64ea6385c14eb97a2ca19a710c585437 ] Signed-off-by: kerthcet <kerthcet@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:41:35 UTC
dbe923c install: Update image digests for v1.10.9 Generated from https://github.com/cilium/cilium/actions/runs/2053999962. `docker.io/cilium/cilium:v1.10.9@sha256:ebe1696cebfdaa95112a48c40bcef01e2771113ae2d4be0dda762bece78293b9` `quay.io/cilium/cilium:v1.10.9@sha256:ebe1696cebfdaa95112a48c40bcef01e2771113ae2d4be0dda762bece78293b9` `docker.io/cilium/clustermesh-apiserver:v1.10.9@sha256:ae43adb896c47a0ebfc2124afc0c46393ec694285e744d00df57ba697d957442` `quay.io/cilium/clustermesh-apiserver:v1.10.9@sha256:ae43adb896c47a0ebfc2124afc0c46393ec694285e744d00df57ba697d957442` `docker.io/cilium/docker-plugin:v1.10.9@sha256:33d64e022c57f8f48c8b51ade2ee360d7b2be47af36d3b33b55994ed66a7019a` `quay.io/cilium/docker-plugin:v1.10.9@sha256:33d64e022c57f8f48c8b51ade2ee360d7b2be47af36d3b33b55994ed66a7019a` `docker.io/cilium/hubble-relay:v1.10.9@sha256:92a895d77a8d6c71efceedc111adc848842e5c8bb8ee6d8c0a7812a97ebc1e00` `quay.io/cilium/hubble-relay:v1.10.9@sha256:92a895d77a8d6c71efceedc111adc848842e5c8bb8ee6d8c0a7812a97ebc1e00` `docker.io/cilium/operator-alibabacloud:v1.10.9@sha256:9d66f9ee6080ffedf46cfcb554a7303b25a15935e7bcdc73e742e1830da08f82` `quay.io/cilium/operator-alibabacloud:v1.10.9@sha256:9d66f9ee6080ffedf46cfcb554a7303b25a15935e7bcdc73e742e1830da08f82` `docker.io/cilium/operator-aws:v1.10.9@sha256:ff3dc39157b4a0935495a3cb417ce3e4c70ca906a0f7e79a323b2e920a0b0265` `quay.io/cilium/operator-aws:v1.10.9@sha256:ff3dc39157b4a0935495a3cb417ce3e4c70ca906a0f7e79a323b2e920a0b0265` `docker.io/cilium/operator-azure:v1.10.9@sha256:1af911a1a15bc7be78ca2e4f8a0ccdff037f0077005993e8c44a88f881c59018` `quay.io/cilium/operator-azure:v1.10.9@sha256:1af911a1a15bc7be78ca2e4f8a0ccdff037f0077005993e8c44a88f881c59018` `docker.io/cilium/operator-generic:v1.10.9@sha256:7f9bf92d7e38372dc19899cc3055a04d93095886687f3c03e41932ad5a32e3ac` `quay.io/cilium/operator-generic:v1.10.9@sha256:7f9bf92d7e38372dc19899cc3055a04d93095886687f3c03e41932ad5a32e3ac` `docker.io/cilium/operator:v1.10.9@sha256:a4f7f32530b8632eecfec3b051ac6616fdc7f6c25a91f48721676d4c2a0edf67` `quay.io/cilium/operator:v1.10.9@sha256:a4f7f32530b8632eecfec3b051ac6616fdc7f6c25a91f48721676d4c2a0edf67` Signed-off-by: André Martins <andre@cilium.io> 28 March 2022, 20:09:53 UTC
0338ca7 Prepare for release v1.10.9 Signed-off-by: André Martins <andre@cilium.io> 28 March 2022, 17:55:12 UTC
d034b08 Clarify taint effects in the documentation. [ upstream commit 4e6b6b5c6359857230b1b502d1fce1a57e5c78c2 ] As part of a previous PR, 'NoExecute' started being recommended as the effect that should be placed on nodes to avoid unmanaged pods. While this is correct and required for guaranteeing, as far as possible, that pods don't come up as unmanaged, there are some considerations and trade-offs worth pointing out. This PR attempts to clarify the taint-based approach to preventing unmanaged pods by adding a page describing the known implications of each effect. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 28 March 2022, 17:43:22 UTC
9702355 build(deps): bump actions/cache from 2.1.7 to 3 Bumps [actions/cache](https://github.com/actions/cache) from 2.1.7 to 3. - [Release notes](https://github.com/actions/cache/releases) - [Commits](https://github.com/actions/cache/compare/937d24475381cd9c75ae6db12cb4e79714b926ed...4b0cf6cc4619e737324ddfcec08fff2413359514) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 26 March 2022, 00:42:44 UTC
3f3eb8e update cilium-{runtime,builder} Signed-off-by: Joe Stringer <joe@cilium.io> 18 March 2022, 19:39:55 UTC
302c805 docs: fix tip about opening the Hubble server port on all nodes [ upstream commit e396b5e41ed61402e3cfffff449763ef4c208616 ] The documentation page about setting up Hubble observability wrongly states that TCP port 4245 needs to be open on all nodes running Cilium to allow Hubble Relay to operate correctly. This is incorrect. Port 4245 is actually the default port used by Hubble Relay, which is a regular deployment and doesn't require any particular action from the user. However, Hubble server uses port 4244 by default and, given that it is embedded in the Cilium agent and exposed on a host port, port 4244 must be open on all nodes to allow Hubble Relay to connect to each Hubble server instance. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
7c8eae1 Fix 'node-init' in GKE's 'cos' images. [ upstream commit ea9fd6f97b6e7b0d115067dc9f69ba461055530f ] Turns out that in GKE's 'cos' images the 'containerd' binary is still present even though '/etc/containerd/config.toml' is not. Hence, the kubelet wrapper would still be installed for these images according to the current check, even though it's not necessary. What's worse, starting the kubelet would fail because the 'sed' command targeting the aforementioned file would fail. This PR changes the check to rely on the presence of the '--container-runtime-endpoint' flag in the kubelet, which is probably a more reliable way of detecting '*_containerd' flavours and only applying the fix in these cases. Fixes #19015. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Co-authored-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
46287ed helm: check for contents of bootstrapFile [ upstream commit 6c15df9a1ad978427fc291c1e3ab681faa8342d0 ] Checking for the existence of the .Values.nodeinit.bootstrapFile file is a no-op because the file is created by kubelet if it does not exist. Instead, we should check whether the file has contents, which is when we can be sure the node-init DaemonSet has started. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
76faf6c ipam/crd: Fix spurious CiliumNode update status failures [ upstream commit 18b10b49fc34b9748f7b86fda872e3cb1375a859 ] When running in CRD-based IPAM modes (Alibaba, Azure, ENI, CRD), it is possible to observe spurious "Unable to update CiliumNode custom resource" failures in the cilium-agent. The full error message is as follows: "Operation cannot be fulfilled on ciliumnodes.cilium.io <node>: the object has been modified; please apply your changes to the latest version and try again". It means that the Kubernetes `UpdateStatus` call has failed because the local `ObjectMeta.ResourceVersion` of the submitted CiliumNode version is out of date. In the presence of races, this error is expected and will resolve itself once the agent receives a more recent version of the object with the new resource version. However, it is possible that the resource version of a `CiliumNode` object is bumped even though the `Spec` or `Status` of the `CiliumNode` remains the same. This happens, for example, when `ObjectMeta.ManagedFields` is updated by the Kubernetes apiserver. Unfortunately, `CiliumNode.DeepEqual` does _not_ consider any `ObjectMeta` fields (including the resource version). Therefore two objects with different resource versions are considered the same by the `CiliumNode` watcher used by IPAM. But to be able to successfully call `UpdateStatus` we need to know the most recent resource version. Otherwise, `UpdateStatus` will always fail until the `CiliumNode` object is updated externally for some reason. Therefore, this commit modifies the logic to always store the most recent version of the `CiliumNode` object, even if `Spec` or `Status` has not changed. This in turn allows `nodeStore.refreshNode` (which invokes `UpdateStatus`) to always work on the most recently observed resource version. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
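A hedged sketch of the gist of this change: keep the newest observed object even when Spec/Status are unchanged, so the next UpdateStatus call carries a current ResourceVersion. The types and names below are stand-ins, not the real nodeStore code.

```
// Sketch only: always retain the latest observed CiliumNode so that a
// later UpdateStatus uses the current ResourceVersion. CiliumNode and
// onCiliumNodeUpdate are stand-ins for the real types and handlers.
package sketch

import "sync"

type CiliumNode struct {
	ResourceVersion string
	Spec            string // stand-in for the real Spec
	Status          string // stand-in for the real Status
}

type nodeStore struct {
	mu   sync.Mutex
	node *CiliumNode
}

// onCiliumNodeUpdate is called from the watcher. Previously the store
// could skip updates whose Spec/Status were unchanged, leaving a stale
// ResourceVersion behind and making UpdateStatus fail with a conflict.
func (s *nodeStore) onCiliumNodeUpdate(newNode *CiliumNode) {
	s.mu.Lock()
	defer s.mu.Unlock()

	specOrStatusChanged := s.node == nil ||
		s.node.Spec != newNode.Spec || s.node.Status != newNode.Status

	// Always remember the most recent version so refreshNode /
	// UpdateStatus works against the latest ResourceVersion.
	s.node = newNode

	if specOrStatusChanged {
		// ...trigger IPAM resync, pool recalculation, etc...
	}
}
```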
f1c276d bpf: avoid encrypt_key map lookup if IPsec is disabled [ upstream commit 3cb9ba147c0423a828e9df4330411e8b35bee4f2 ] In the bpf_lxc program's functions ipv6_l3_from_lxc and handle_ipv4_from_lxc, encrypt_key is currently always looked up in the encrypt map, regardless of whether IPsec is enabled or not. However, its value is only actually used when IPsec is enabled. Thus, the call can be avoided when IPsec is disabled. This also slightly reduces program size if !defined(ENABLE_IPSEC). Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
d63fc05 bpf: fix -Wunused-but-set-variable errors when building with LLVM 14 [ upstream commit 71452d57fae5f2c1568a8ef043fcd305559b31d6 ] Building the BPF datapath with LLVM 14 leads to the following errors: bpf_lxc.c:101:16: error: variable 'daddr' set but not used [-Werror,-Wunused-but-set-variable] union v6addr *daddr, orig_dip; ^ bpf_lxc.c:103:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable] __u8 encrypt_key = 0; ^ bpf_lxc.c:102:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable] __u32 tunnel_endpoint = 0; ^ bpf_lxc.c:526:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable] __u8 encrypt_key = 0; ^ bpf_lxc.c:525:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable] __u32 tunnel_endpoint = 0; ^ These are normally warnings, but errors in this case due to the use of -Werror when compiling Cilium's bpf programs. Fix these by marking the affected variables as __maybe_unused. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
49a3bff build(deps): bump docker/build-push-action from 2.9.0 to 2.10.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.9.0 to 2.10.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/7f9d37fa544684fb73bfe4835ed7214c255ce02b...ac9327eae2b366085ac7f6a2d02df8aa8ead720a) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 16 March 2022, 11:17:32 UTC
a39a72e Include nodeName on pods [ upstream commit 4240222a6f8c8a65181d35132c09217d32947c82 ] Needed for a follow-up change to address an endpoint restoration issue (https://github.com/cilium/cilium/issues/18923). Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 16 March 2022, 10:01:58 UTC
cbdf0eb Add metrics for endpoint objects garbage collection [ Backporter's notes: Conflicts around import statement ordering from v1.12 dev cycle and metrics from CES feature in v1.11. Nothing major. ] [ upstream commit 1b7bce37a929656c4594b591da5815626ec8e1e5 ] Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 16 March 2022, 10:01:58 UTC
f90ef11 Prevent CiliumEndpoint removal by non-owning agent [ Backporter's notes: Conflicts resolved around imports from v1.11 version and logfields that don't exist in the v1.10 tree. Nothing major. ] [ upstream commit 6f7bf6c51f7a86e458947149a72b4c12f42c331c ] CEPs are created as well as updated based on informer store data local to an agent's node but (necessarily) deleted globally from the API server. This can currently lead to situations where an agent that does not own a CEP deletes an unrelated CEP. Avoid this problem by having agents maintain the CEP UID and use it as a precondition when deleting CEPs. This guarantees that only the owning agents can delete "their" CEPs. Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 16 March 2022, 10:01:58 UTC
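A hedged sketch of using the stored UID as a delete precondition via client-go's DeleteOptions; the interface and names are illustrative, not the exact agent code.

```
// Sketch: delete a CiliumEndpoint only if its UID still matches the
// one this agent created, so an agent on another node (or a stale
// cache) cannot remove an unrelated CEP that happens to share the
// name. The cepDeleter interface and names are placeholders.
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// cepDeleter abstracts the generated CiliumEndpoint client's Delete call.
type cepDeleter interface {
	Delete(ctx context.Context, name string, opts metav1.DeleteOptions) error
}

func deleteOwnedCEP(ctx context.Context, c cepDeleter, name string, ownedUID types.UID) error {
	return c.Delete(ctx, name, metav1.DeleteOptions{
		// The API server rejects the delete if the live object's UID
		// differs, guaranteeing only the owning agent removes "its" CEP.
		Preconditions: &metav1.Preconditions{UID: &ownedUID},
	})
}
```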
c04f41e install/helm: Add Image Override Option to All Images In order to enable offline deployment for certain platforms (like OpenShift) we need to be able to have a universal override for all images so that the OpenShift certified operator can list its "related images"[1][2]. [1]https://docs.openshift.com/container-platform/4.9/operators/operator_sdk/osdk-generating-csvs.html#olm-enabling-operator-for-restricted-network_osdk-generating-csvs [2]https://redhat-connect.gitbook.io/certified-operator-guide/appendix/offline-enabled-operators Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 09 March 2022, 12:49:18 UTC
c3ed5a9 Add support to configure Clustermesh connections in Helm Chart In order to connect Clustermesh clusters without the cilium-cli tool, we would need to manually patch the cilium agent with hostAliases and configure the cilium-clustermesh secret with mTLS material from the connected clusters. This commit adds support to connect multiple Clustermesh clusters using the Helm Chart. Fixes: cilium#17811 Signed-off-by: Samuel Torres <samuelpirestorres@gmail.com> Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 09 March 2022, 12:49:18 UTC
efd883e Update Go to 1.16.15 Signed-off-by: Tobias Klauser <tobias@cilium.io> 09 March 2022, 12:44:32 UTC
e062349 ctmap: Fix data race for accessing nat maps [ upstream commit a0f5c0d6804a39160ac252b0276573ec02da2f12 ] Commit c9810bf7b2 introduced garbage collection for cleaning orphan entries in the nat maps whereby concurrent accesses to the maps weren't serialized. The nat maps accessed via ct map construct are susceptible to data races in asynchronously running goroutines upon agent restart when endpoints are restored - ``` 2022-02-21T02:42:13.757888057Z WARNING: DATA RACE 2022-02-21T02:42:13.757895621Z Write at 0x00c00081a830 by goroutine 360: 2022-02-21T02:42:13.757912783Z github.com/cilium/cilium/pkg/bpf.(*Map).Close() 2022-02-21T02:42:13.757920669Z /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:581 +0x1c4 2022-02-21T02:42:13.757927597Z github.com/cilium/cilium/pkg/maps/ctmap.doGC4·dwrap·4() 2022-02-21T02:42:13.757934561Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:422 +0x39 2022-02-21T02:42:13.757941184Z github.com/cilium/cilium/pkg/maps/ctmap.doGC4() 2022-02-21T02:42:13.757947352Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:482 +0x4d5 2022-02-21T02:42:13.757953881Z github.com/cilium/cilium/pkg/maps/ctmap.doGC() 2022-02-21T02:42:13.757960362Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:517 +0x17e 2022-02-21T02:42:13.757966185Z github.com/cilium/cilium/pkg/maps/ctmap.GC() 2022-02-21T02:42:13.757972307Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:537 +0xc6 2022-02-21T02:42:13.757978599Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).garbageCollectConntrack() 2022-02-21T02:42:13.757986321Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1034 +0x804 2022-02-21T02:42:13.757992160Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTableLocked() 2022-02-21T02:42:13.757998853Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1039 +0x173 2022-02-21T02:42:13.758004601Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTable() 2022-02-21T02:42:13.758010701Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1049 +0x44 2022-02-21T02:42:13.758016604Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps.func1() 2022-02-21T02:42:13.758022804Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:761 +0x132 2022-02-21T02:42:13.758034551Z Previous read at 0x00c00081a830 by goroutine 100: 2022-02-21T02:42:13.758040659Z github.com/cilium/cilium/pkg/bpf.(*Map).DumpReliablyWithCallback() 2022-02-21T02:42:13.758046461Z /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:756 +0x804 2022-02-21T02:42:13.758053696Z github.com/cilium/cilium/pkg/maps/nat.(*Map).DumpReliablyWithCallback() 2022-02-21T02:42:13.758059818Z /go/src/github.com/cilium/cilium/pkg/maps/nat/nat.go:121 +0x7b7 2022-02-21T02:42:13.758065580Z github.com/cilium/cilium/pkg/maps/ctmap.PurgeOrphanNATEntries() 2022-02-21T02:42:13.758072272Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:612 +0x790 2022-02-21T02:42:13.758078005Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758084196Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:214 +0xb0c 2022-02-21T02:42:13.758090362Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758096712Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5 2022-02-21T02:42:13.758103134Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758109338Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5 2022-02-21T02:42:13.758127517Z 
github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1() 2022-02-21T02:42:13.758135513Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a 2022-02-21T02:42:13.758141621Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758147715Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5 2022-02-21T02:42:13.758153712Z github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1() 2022-02-21T02:42:13.758160127Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a ``` Fixes: c9810bf7b2 ("ctmap: GC orphan SNAT entries") Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
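The general shape of removing such a race is to serialize access to the shared map handle so one goroutine cannot close it while another is still iterating; the following Go sketch illustrates that idea only and is not the actual ctmap/nat fix.

```
// Sketch only: serialize Close() and dump iteration on a shared map
// handle so the GC goroutine and the endpoint-restore goroutine do not
// race. Not the actual Cilium ctmap/nat implementation.
package sketch

import "sync"

type natMap struct {
	mu     sync.Mutex
	closed bool
	// ... underlying handle / fd ...
}

func (m *natMap) DumpWithCallback(cb func(key, value string)) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.closed {
		return nil // map already released; nothing to walk
	}
	// ...iterate entries under the lock and invoke cb for each...
	return nil
}

func (m *natMap) Close() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.closed = true
	// ...release the underlying handle...
	return nil
}
```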
4eb07a3 node: Fix incorrect comment for S/GetRouterInfo [ upstream commit d7f64076334d0a62e6c45f4ee42327f173d2b9df ] This function is not specific to ENI IPAM mode anymore since Alibaba and Azure's IPAM modes are also using it. Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
7f4e8ae linux,ipam: Use subnet IPsec for Azure IPAM [ upstream commit 7bc57616b39502a95cbf97dbf6eda6318506f426 ] When using Azure's IPAM mode, we don't have non-overlapping pod CIDRs for each node, so we can't rely on the default IPsec mode where we use the destination CIDRs to match the xfrm policies. Instead, we need to enable subnet IPsec as in EKS. In that case, the dir=out xfrm policy and state look like: src 0.0.0.0/0 dst 10.240.0.0/16 dir out priority 0 mark 0x3e00/0xff00 tmpl src 0.0.0.0 dst 10.240.0.0 proto esp spi 0x00000003 reqid 1 mode tunnel src 0.0.0.0 dst 10.240.0.0 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x3e00/0xff00 output-mark 0xe00/0xf00 aead rfc4106(gcm(aes)) 0x567a47ff70a43a3914719a593d5b12edce25a971 128 anti-replay context: seq 0x0, oseq 0x105, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 As can be seen the xfrm policy matches on a broad /16 encompassing all endpoints in the cluster. The xfrm state then matches the policy's template. Finally, to write the proper outer destination IP, we need to define the IP_POOLS macro in our datapath. That way, our BPF programs will determine the outer IP from the ipcache lookup. Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
4473d90 hubble: Added nil check in filterByTCPFlags() to avoid segfault [ upstream commit 7e8b65187a4d37e0c41fd29c8d853ad44ecb5fd9 ] Cilium agent crashes when an L7/HTTP flow is passed to the TCP flag flow filter (`filterByTCPFlags`). This is because HTTP flows will have some L4/TCP info such as src/dst port in the flow struct, but will not contain TCP flags. Added a `nil` check for the TCP flags pointer to avoid the segfault. Fixes: #18830 Signed-off-by: Wazir Ahmed <wazir@accuknox.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
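A minimal sketch of the defensive check described, over simplified stand-in types rather than the real Hubble flow structures.

```
// Sketch: guard against flows (e.g. L7/HTTP) that carry L4/TCP port
// info but no TCP flags before dereferencing the flags pointer. The
// types here are simplified stand-ins for the Hubble flow API.
package sketch

type tcpFlags struct{ SYN, ACK, FIN bool }

type tcp struct {
	SourcePort, DestinationPort uint32
	Flags                       *tcpFlags // nil for L7/HTTP flows
}

func filterByTCPFlags(t *tcp, want tcpFlags) bool {
	// Without this check, an HTTP flow with ports set but Flags == nil
	// would crash the agent with a nil pointer dereference.
	if t == nil || t.Flags == nil {
		return false
	}
	return (!want.SYN || t.Flags.SYN) &&
		(!want.ACK || t.Flags.ACK) &&
		(!want.FIN || t.Flags.FIN)
}
```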
23d5779 docs: update Azure Service Principal / IPAM documentation [ upstream commit d9d23ba78fe692e2548047af067122017254c2a5 ] When installing Cilium in an AKS cluster, the Cilium Operator requires an Azure Service Principal with sufficient privileges to the Azure API for the IPAM allocator to be able to work. Previously, the `az ad sp create-for-rbac` was assigning by default the `Contributor` role to new Service Principals when none was provided via the optional `--role` flag, whereas it now does not assign any role at all. This of course breaks IPAM allocation due to insufficient permissions, resulting in operator failures of this kind: ``` level=warning msg="Unable to synchronize Azure virtualnetworks list" error="network.VirtualNetworksClient#ListAll: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"AuthorizationFailed\" Message=\"The client 'd09fb531-793a-40fc-b934-7af73ca60e32' with object id 'd09fb531-793a-40fc-b934-7af73ca60e32' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/read' over scope '/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167' or the scope is invalid. If access was recently granted, please refresh your credentials.\"" subsys=azure level=fatal msg="Unable to start azure allocator" error="Initial synchronization with instances API failed" subsys=cilium-operator-azure ``` We update the documentation guidelines for new installations to assign the `Contributor` role to new Service Principals used for Cilium. We also take the opportunity to: - Update Azure IPAM required privileges documentation. - Make it so users can now set up all AKS-specific required variables for a Helm install in a single command block, rather than have it spread over several command blocks with intermediate steps and temporary files. - Have the documentation recommend creating Service Principals with privileges over a restricted scope (AKS node resource group) for increased security. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC