29da866 Prepare for release v1.9.15 Signed-off-by: Joe Stringer <joe@cilium.io> 15 April 2022, 16:03:05 UTC
fa0deae install/helm: Add Image Override Option to All Images In order to enable offline deployment for certain platforms (like OpenShift) we need to be able to have a universal override for all images so that the OpenShift certified operator can list its "related images"[1][2]. [1]https://docs.openshift.com/container-platform/4.9/operators/operator_sdk/osdk-generating-csvs.html#olm-enabling-operator-for-restricted-network_osdk-generating-csvs [2]https://redhat-connect.gitbook.io/certified-operator-guide/appendix/offline-enabled-operators Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 15 April 2022, 02:05:55 UTC
c100646 envoy: Limit accesslog socket permissions [ upstream commit 5595e622243948f74187b449186e4575f451b9e5 ] [ Backporter's notes: trivial conflicts in `cilium-agent.md` and `pkg/envoy/accesslog_server.go` due to other changes in the lines right next to this backport since v1.9. ] Limit access to Cilium xDS and access log sockets to root and group 1337 used by Istio sidecars. Fixes: #3131 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 18:45:42 UTC
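The scheme above boils down to standard unix socket ownership and mode bits. A minimal Go sketch, assuming a helper name and socket path chosen for illustration (not Cilium's actual code):
```go
package main

import (
	"fmt"
	"os"
)

const istioProxyGID = 1337 // group ID used by Istio sidecar proxies

// restrictSocket limits a unix socket to root and group 1337, mirroring
// the access scheme described in the commit above. Hypothetical helper.
func restrictSocket(path string) error {
	if err := os.Chown(path, 0, istioProxyGID); err != nil { // root:1337
		return err
	}
	return os.Chmod(path, 0o660) // rw for owner and group, no access for others
}

func main() {
	// Illustrative path only; the real socket location is configuration-dependent.
	if err := restrictSocket("/var/run/cilium/access_log.sock"); err != nil {
		fmt.Println("restrict:", err)
	}
}
```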
1253e91 ci: pin down image for documentation workflow [ upstream commit 0da7224218ab96d5f5a8f7e9d267e9743c07c83a ] [ Backporter's notes: conflicts due to `documentation.yaml` not existing in v1.9. Changes were manually applied to the previous file at `docs.yaml`. The other commit in this PR 5272fceb64bdd0c9609b8b18331a5c99b25ae0a5 was dropped because it cannot be applied to the older version of `docs.yaml` on v1.9. ] Instead of using the ":latest" version of the docs-builder image, pin down the version to indicate a specific version to use. The context for this change is some preparation for updating the version of Sphinx used by the image. Specifying an explicit image to use has the following advantages: - When using ":latest" we have to update the image _and_ the workflow at the same time, or the workflow will break. By contrast, once we pin down the image, we can push a new image on Docker without breaking the workflow, and then update the workflow to switch to the new image, on the same PR that updates the build process. - This helps testing an experimental ":latest" image from a PR, without breaking the workflow on the master branch. - If anything goes wrong, this makes it easier to revert the change by rolling back to a previous pinned image, without having to push again a rolled-back docs-builder image as the new ":latest". Most other workflows, if not all, already pin down the images they use. The pinned image is the current ":latest", so there should be no change to the current state of the workflow. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 18:45:42 UTC
7303b8a ipcache: Add test asserting out-of-order Kubernetes events [ upstream commit 31a5ff1620e7f3b222af0c884351e161e04539cd ] [ Backporter's notes: the tests had to be adapted because the signature of `IPIdentityCache.Upsert` has been changed in c5d4b7efc978ff4ae99f23bee3078f885a94892f in v1.10, which was not backported to v1.9. ] These tests answer the following questions: * What happens if we receive an add/update event from k8s with a pod that is using the same IP address as an already-gone-pod-but-delete-event-not-received? * What happens if we receive a delete of an already gone pod after we have received an add/update from a new pod using that same IP address? What these tests confirm is that out-of-order Kubernetes events are handled as they're received. This means the ipcache doesn't have any special logic to handle, for example, whether an ipcache delete for a pod X with IP A is the same pod X (by namespace & name) which previously inserted an ipcache entry. Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 18:45:42 UTC
601a76d docs: mark node-to-node IPsec encryption as beta [ upstream commit 7eb7bc6aaff0aa8a3891843348d20249c20e8d50 ] [ Backporter's notes: conflicts due to `encryption-ipsec.rst` not existing in v1.9. Changes were manually applied to the previous file at `encryption.rst`. ] Mark node-to-node encryption explicitly as a beta feature, to indicate that some issues might remain to be fixed. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 18:45:42 UTC
07e8bd4 test/helpers: Fix incorrect count of endpoints [ upstream commit c877d325058df31cb6edb8b41f18de4fa56e0553 ] The test helper WaitEndpointsReady waits for all endpoints on the node to be in ready state with a non-init security identity. To that end, it lists all endpoints in the format [container-name]=[state],[identity], transforms that into a Go map m1, and iterates through the map to construct a new map m2 with state => counter. If it counts as many values (endpoints) in m1 as the counter m2[ready], then all endpoints are ready. However, the number of values in m1 isn't actually equal to the number of endpoints. The container name, used as the key, may be empty for several endpoints, including the host endpoint and endpoints in init state. The last endpoint with an empty container name will therefore overwrite previous entries in the map. That leads the function to such conclusions as: =ready,5 httpd3=ready,31837 app2=ready,28159 =ready,1 httpd2=ready,4632 app1=ready,49770 httpd1=ready,14980 cilium-health=ready,4 '7' containers are in a 'ready' state of a total of '7' containers. It counts 7 containers in ready state, when there are 8 containers. Here the difference matters because the first container, which got overwritten in the map, shouldn't be considered "ready" by this function since it has the init (5) identity. As a fix, we can use the Cilium endpoint ID as the key to the map, as it is guaranteed to be unique per endpoint, contrary to the container name. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 12 April 2022, 12:53:46 UTC
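A short Go illustration of the fix, with made-up endpoint data: keying by the unique endpoint ID means no entry can overwrite another, and init-identity endpoints are excluded from the ready count.
```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Illustrative "[state],[identity]" values keyed by the unique Cilium
	// endpoint ID instead of the (possibly empty) container name.
	endpoints := map[string]string{
		"101": "ready,5",     // init identity (5): must not count as ready
		"102": "ready,31837", // httpd3
		"103": "ready,28159", // app2
		"104": "ready,1",     // host endpoint (empty container name upstream)
	}
	ready := 0
	for _, v := range endpoints {
		parts := strings.Split(v, ",")
		if parts[0] == "ready" && parts[1] != "5" { // exclude init identity
			ready++
		}
	}
	fmt.Printf("%d of %d endpoints ready\n", ready, len(endpoints)) // 3 of 4
}
```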
273f681 test: Fix whitespace in docker-run-cilium [ upstream commit 262ac5fbbd4270766b639529482121a55e2cbeaa ] Add space between provided and default args. Fixes: #19310 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 12 April 2022, 12:53:46 UTC
610656e test: Support provisioning in non-vagrant VMs [ upstream commit 43db56895dc5a70d31582688a3b253f81ab31ceb ] Add support for passing VMUSER (which defaults to vagrant) to ease running tests in non-vagrant VMs. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 12 April 2022, 12:53:46 UTC
5aea34e test: Allow runtime tests to use pre-built Cilium image [ upstream commit ea68a7cddfe4e4348a62895bfea7874669f5a601 ] Pass CILIUM_IMAGE and CILIUM_TAG from environment to provisioning. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 12 April 2022, 12:53:46 UTC
45a736d test: Runtime use Cilium docker image [ upstream commit 324228b5a3bad3e65d957ad2895985de2a605cd1 ] Run Cilium in a docker container for the Runtime tests. Keep the systemd cilium.service, but use a new script to run Cilium from a docker container from there. This design has a high degree of compatibility with the prior approach of running cilium-agent directly from cilium.docker. Test scripts are organized so that there is no change when running Cilium in a development VM; there, cilium-agent is still run in the host as before. While working with this I noticed that the Cilium operator fails to run in Runtime tests as it now assumes it can reach the k8s api-server. The Cilium agent fails after a while due to this if it is using the etcd kvstore, as the heartbeats are missing. That's why the kvstore needs to return to the default (consul) configuration after the etcd test. Previously this was done after each test, but now this is done after all (two) of the kvstore tests, speeding up the tests a bit. Do not pass explicit options when they are the same as the defaults. This also avoids using the systemd template where bare Cilium agent options are expected. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 12 April 2022, 12:53:46 UTC
ecbf5d8 test: Pull images for runtime or k8s tests, not for both [ upstream commit 9d4b1bb322e955a715b2a2b9ac610e0dfbd2284a ] Runtime tests do not need images used by k8s tests, nor do k8s tests need images used by runtime tests. Pull images only for the test suite in use. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 12 April 2022, 12:53:46 UTC
da49c17 build(deps): bump KyleMayes/install-llvm-action from 1.5.1 to 1.5.2 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.1 to 1.5.2. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/v1.5.1...v1.5.2) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 04 April 2022, 15:13:24 UTC
6298c86 pkg/redirectpolicy, docs: Add missing namespace check [ upstream commit 2efbdd68a7a578874d1a9ecb884bfba4f76e0f0b ] Local Redirect Policy (LRP) namespace needs to match with the backend pods selected by the LRP. This check was missing in the case where backend pods are deployed after an LRP that selects them was applied. Added unit tests. Reported-by: Joe Stringer <joe@covalent.io> Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:35:00 UTC
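A minimal sketch of the missing check, assuming a hypothetical helper (podSelectedByLRP is not Cilium's actual function name): when a backend pod event arrives after the LRP was applied, the pod's namespace must match the policy's namespace before the label selector is consulted.
```go
package lrp

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// podSelectedByLRP decides whether a pod may back a Local Redirect Policy.
// A pod can only back an LRP that lives in the same namespace.
func podSelectedByLRP(lrpNamespace string, selector labels.Selector, pod *corev1.Pod) bool {
	if pod.Namespace != lrpNamespace {
		return false // namespace mismatch: never a valid backend
	}
	return selector.Matches(labels.Set(pod.Labels))
}
```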
b6a0831 helm: Add RBAC Permissions to Clustermesh APIServer Clusterrole [ upstream commit 75f597be318002f0281a64be81666af4f9d5be2d ] The Clustermesh-APIServer creates a CiliumEndpoint and sets a node as its ownerReference, also setting blockOwnerDeletion to "true". If the OwnerReferencesPermissionEnforcement admission controller is enabled (such as in environments like OpenShift) then the Clustermesh-APIServer will fail to create the CiliumEndpoint as it has insufficient privileges to set blockOwnerDeletion of a node. It needs to be able to "update" "nodes/finalizers" in order to do this. See #19053 for more details and references. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:35:00 UTC
66748f2 helm: Remove Unnecessary RBAC Permissions for Agent [ upstream commit 0f4d3a71b05504ac56d2f9aa38916bb654b61642 ] In October 2020, we made changes[1] to the cilium-agent's ClusterRole to be more permissive. We did this because OpenShift enables[2] the OwnerReferencesPermissionEnforcement[3] admission controller. This admission controller prevents changes to the metadata.ownerReferences of any object unless the entity (the cilium-agent in this case) has permission to delete the object as well. Furthermore, the controller protects metadata.ownerReferences[x].blockOwnerDeletion of a resource unless the entity (again, the cilium-agent) has "update" access to the finalizer of the object having its deletion blocked. The original PR mistakenly assumed we set ownerReferences on pods and expanded cilium-agent's permissions beyond what was necessary. Cilium-agent only sets ownerReferences on a CiliumEndpoint and the blockOwnerDeletion field propagates up to the "owning" pod of the endpoint. Cilium-agent only needs to be able to delete CiliumEndpoints (which it has always been able to) and "update" pod/finalizers (to set the blockOwnerDeletion field on CiliumEndpoints). All other changes contained in #13369 were unnecessary. 1 https://github.com/cilium/cilium/pull/13369 2 https://docs.openshift.com/container-platform/4.6/architecture/admission-plug-ins.html 3 https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#ownerreferencespermissionenforcement [ Backport notes: The files have been renamed: - install/kubernetes/cilium/templates/cilium-agent/clusterrole.yaml is, on v1.9: install/kubernetes/cilium/templates/cilium-agent-clusterrole.yaml - install/kubernetes/cilium/templates/cilium-preflight/clusterrole.yaml is, on v1.9: install/kubernetes/cilium/templates/cilium-preflight-clusterrole.yaml Additionally, we run the following: make -C install/kubernetes experimental-install quick-install and commit the changes. ] Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 30 March 2022, 13:35:00 UTC
6c502a1 install: Update image digests for v1.9.14 Generated from https://github.com/cilium/cilium/actions/runs/2053994470. `docker.io/cilium/cilium:v1.9.14@sha256:2c6ce93fa7e625979043a387eb998c17ad57df8768d89facb9b715da42a4c51c` `quay.io/cilium/cilium:v1.9.14@sha256:2c6ce93fa7e625979043a387eb998c17ad57df8768d89facb9b715da42a4c51c` `docker.io/cilium/clustermesh-apiserver:v1.9.14@sha256:a0da5edf0372899647da51de1b277f0bab8e676d694aee7f939cddfdd3172010` `quay.io/cilium/clustermesh-apiserver:v1.9.14@sha256:a0da5edf0372899647da51de1b277f0bab8e676d694aee7f939cddfdd3172010` `docker.io/cilium/docker-plugin:v1.9.14@sha256:74ae7f865202cbb22029686e5f4484afb57178c67d6daf0d08014bb695b2c9b3` `quay.io/cilium/docker-plugin:v1.9.14@sha256:74ae7f865202cbb22029686e5f4484afb57178c67d6daf0d08014bb695b2c9b3` `docker.io/cilium/hubble-relay:v1.9.14@sha256:fd6ab1aea260abc5f64eca26c1b1e7009983e4aaa8e5d098e8d442f7659603fb` `quay.io/cilium/hubble-relay:v1.9.14@sha256:fd6ab1aea260abc5f64eca26c1b1e7009983e4aaa8e5d098e8d442f7659603fb` `docker.io/cilium/operator-aws:v1.9.14@sha256:8484021ef6a794027b0dae5625d7248402686a7338d6cd36300885cf3d4f5e47` `quay.io/cilium/operator-aws:v1.9.14@sha256:8484021ef6a794027b0dae5625d7248402686a7338d6cd36300885cf3d4f5e47` `docker.io/cilium/operator-azure:v1.9.14@sha256:a118239016a7dab7bc3fedfa3d4f0c632867529e23952b0a0bf5ab2cbaa7d9b2` `quay.io/cilium/operator-azure:v1.9.14@sha256:a118239016a7dab7bc3fedfa3d4f0c632867529e23952b0a0bf5ab2cbaa7d9b2` `docker.io/cilium/operator-generic:v1.9.14@sha256:bdcfd2eade99933f2fda55ef79ea697ddfad3512f65b15bcd0ba7702518c1ba3` `quay.io/cilium/operator-generic:v1.9.14@sha256:bdcfd2eade99933f2fda55ef79ea697ddfad3512f65b15bcd0ba7702518c1ba3` `docker.io/cilium/operator:v1.9.14@sha256:ab416c1759421c2c07ea856b71a5560c1bebc4fe37ec01b4266191a15321c5aa` `quay.io/cilium/operator:v1.9.14@sha256:ab416c1759421c2c07ea856b71a5560c1bebc4fe37ec01b4266191a15321c5aa` Signed-off-by: André Martins <andre@cilium.io> 28 March 2022, 20:09:50 UTC
f444269 Prepare for release v1.9.14 Signed-off-by: André Martins <andre@cilium.io> 28 March 2022, 17:54:03 UTC
49e885e build(deps): bump actions/cache from 2.1.7 to 3 Bumps [actions/cache](https://github.com/actions/cache) from 2.1.7 to 3. - [Release notes](https://github.com/actions/cache/releases) - [Commits](https://github.com/actions/cache/compare/v2.1.7...v3) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 26 March 2022, 00:42:33 UTC
5212169 Update Cilium base images Signed-off-by: Joe Stringer <joe@cilium.io> 18 March 2022, 19:39:51 UTC
50a44ff Add metrics for endpoint objects garbage collection [ upstream commit 1b7bce37a929656c4594b591da5815626ec8e1e5 ] Signed-off-by: Timo Reimann <ttr314@googlemail.com> 17 March 2022, 17:11:16 UTC
ca528ad Prevent CiliumEndpoint removal by non-owning agent [ upstream commit 6f7bf6c51f7a86e458947149a72b4c12f42c331c ] CEPs are created as well as updated based on informer store data local to an agent's node but (necessarily) deleted globally from the API server. This can currently lead to situations where an agent that does not own a CEP deletes an unrelated CEP. Avoid this problem by having agents maintain the CEP UID and using it as a precondition when deleting CEPs. This guarantees that only the owning agents can delete "their" CEPs. Signed-off-by: Timo Reimann <ttr314@googlemail.com> 17 March 2022, 17:11:16 UTC
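In apimachinery terms, the UID precondition looks roughly like the sketch below (helper and clientset wiring are illustrative; the actual change may differ): the API server rejects the delete unless the live object still carries the recorded UID.
```go
package cep

import (
	"context"

	ciliumclient "github.com/cilium/cilium/pkg/k8s/client/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// deleteOwnedCEP deletes a CiliumEndpoint only if its UID still matches the
// one this agent recorded when it created or last updated the CEP.
func deleteOwnedCEP(ctx context.Context, c ciliumclient.Interface, ns, name string, ownedUID types.UID) error {
	return c.CiliumV2().CiliumEndpoints(ns).Delete(ctx, name, metav1.DeleteOptions{
		Preconditions: &metav1.Preconditions{UID: &ownedUID},
	})
}
```
If the CEP was deleted and recreated by another agent in the meantime, the live UID differs and the API server returns a conflict instead of removing the unrelated object.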
f57c600 build(deps): bump docker/build-push-action from 2.9.0 to 2.10.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.9.0 to 2.10.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/7f9d37fa544684fb73bfe4835ed7214c255ce02b...ac9327eae2b366085ac7f6a2d02df8aa8ead720a) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 16 March 2022, 11:17:15 UTC
6529b8e ci: remove box download timeout in upstream tests [ upstream commit 96f4050963881e84ccec0540b78277987c25e360 ] This timeout can be too small when the host has to download all boxes due to not having any of the boxes required for the SHA to be tested. In particular this is prone to happen on backport PRs, since it's more likely for the job to be scheduled on a node that primarily ran `master` pipelines up to that point. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 15 March 2022, 08:40:24 UTC
bede2c7 node: Fix incorrect comment for S/GetRouterInfo [ upstream commit d7f64076334d0a62e6c45f4ee42327f173d2b9df ] This function is not specific to ENI IPAM mode anymore since Alibaba and Azure's IPAM modes are also using it. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 15 March 2022, 08:40:24 UTC
08cce4c linux,ipam: Use subnet IPsec for Azure IPAM [ upstream commit 7bc57616b39502a95cbf97dbf6eda6318506f426 ] When using Azure's IPAM mode, we don't have non-overlapping pod CIDRs for each node, so we can't rely on the default IPsec mode where we use the destination CIDRs to match the xfrm policies. Instead, we need to enable subnet IPsec as in EKS. In that case, the dir=out xfrm policy and state look like: src 0.0.0.0/0 dst 10.240.0.0/16 dir out priority 0 mark 0x3e00/0xff00 tmpl src 0.0.0.0 dst 10.240.0.0 proto esp spi 0x00000003 reqid 1 mode tunnel src 0.0.0.0 dst 10.240.0.0 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x3e00/0xff00 output-mark 0xe00/0xf00 aead rfc4106(gcm(aes)) 0x567a47ff70a43a3914719a593d5b12edce25a971 128 anti-replay context: seq 0x0, oseq 0x105, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 As can be seen the xfrm policy matches on a broad /16 encompassing all endpoints in the cluster. The xfrm state then matches the policy's template. Finally, to write the proper outer destination IP, we need to define the IP_POOLS macro in our datapath. That way, our BPF programs will determine the outer IP from the ipcache lookup. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 15 March 2022, 08:40:24 UTC
819fae7 docs: update Azure Service Principal / IPAM documentation [ upstream commit d9d23ba78fe692e2548047af067122017254c2a5 ] When installing Cilium in an AKS cluster, the Cilium Operator requires an Azure Service Principal with sufficient privileges to the Azure API for the IPAM allocator to be able to work. Previously, the `az ad sp create-for-rbac` was assigning by default the `Contributor` role to new Service Principals when none was provided via the optional `--role` flag, whereas it now does not assign any role at all. This of course breaks IPAM allocation due to insufficient permissions, resulting in operator failures of this kind: ``` level=warning msg="Unable to synchronize Azure virtualnetworks list" error="network.VirtualNetworksClient#ListAll: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"AuthorizationFailed\" Message=\"The client 'd09fb531-793a-40fc-b934-7af73ca60e32' with object id 'd09fb531-793a-40fc-b934-7af73ca60e32' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/read' over scope '/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167' or the scope is invalid. If access was recently granted, please refresh your credentials.\"" subsys=azure level=fatal msg="Unable to start azure allocator" error="Initial synchronization with instances API failed" subsys=cilium-operator-azure ``` We update the documentation guidelines for new installations to assign the `Contributor` role to new Service Principals used for Cilium. We also take the opportunity to: - Update Azure IPAM required privileges documentation. - Make it so users can now set up all AKS-specific required variables for a Helm install in a single command block, rather than have it spread over several command blocks with intermediate steps and temporary files. - Have the documentation recommend creating Service Principals with privileges over a restricted scope (AKS node resource group) for increased security. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 15 March 2022, 08:40:24 UTC
e114e5d jenkinsfiles: bump runtime tests VM boot timeout [ upstream commit 4c3bd27c275cb16c8d4dca62d7fe51e649ecd98e ] We are hitting this timeout sometimes, and it seems it was previously updated on the regular pipelines (see 31a622ea40ff9b47bb73469b89c51db2d090b0e2) but not on the runtime pipeline. We remove the inner timeout as the outer one is practically redundant here, since the steps outside of the inner loop are almost instantaneous. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 15 March 2022, 08:40:24 UTC
c086753 build(deps): bump actions/upload-artifact from 2.3.1 to 3 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2.3.1 to 3. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/82c141cc518b40d92cc801eee768e7aafc9c2fa2...6673cd052c4cd6fcf4b4e6e60ea986c889389535) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 04 March 2022, 03:09:12 UTC
73aee34 build(deps): bump actions/download-artifact from 2.1.0 to 3 Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2.1.0 to 3. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/f023be2c48cc18debc3bacd34cb396e0295e2869...fb598a63ae348fa914e94cd0ff38f362e927b741) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 04 March 2022, 03:08:35 UTC
dfea4f5 build(deps): bump docker/login-action from 1.14.0 to 1.14.1 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.14.0 to 1.14.1. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/bb984efc561711aaa26e433c32c3521176eae55b...dd4fa0671be5250ee6f50aedf4cb05514abda2c7) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 04 March 2022, 03:07:57 UTC
37ff807 build(deps): bump actions/checkout from 2 to 3 Bumps [actions/checkout](https://github.com/actions/checkout) from 2 to 3. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v2...v3) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 04 March 2022, 03:07:50 UTC
bcc2df9 build(deps): bump docker/login-action from 1.13.0 to 1.14.0 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.13.0 to 1.14.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/6af3c118c8376c675363897acf1757f7a9be6583...bb984efc561711aaa26e433c32c3521176eae55b) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 01 March 2022, 23:17:29 UTC
e53fe0b build(deps): bump actions/setup-go from 2.2.0 to 3 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2.2.0 to 3. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/v2.2.0...v3) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 01 March 2022, 23:17:17 UTC
764445d build(deps): bump KyleMayes/install-llvm-action from 1.5.0 to 1.5.1 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.0 to 1.5.1. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/v1.5.0...v1.5.1) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 28 February 2022, 13:26:45 UTC
c5d2713 build(deps): bump golangci/golangci-lint-action from 2.5.2 to 3 Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 2.5.2 to 3. - [Release notes](https://github.com/golangci/golangci-lint-action/releases) - [Commits](https://github.com/golangci/golangci-lint-action/compare/v2.5.2...v3) --- updated-dependencies: - dependency-name: golangci/golangci-lint-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 28 February 2022, 13:26:25 UTC
a32bf38 install: Update image digests for v1.9.13 Generated from https://github.com/cilium/cilium/actions/runs/1890091413. `docker.io/cilium/cilium:v1.9.13@sha256:12752fd66c5448194062befaf59aaefc446cbff729aa8b2d7ea4801113d3a31a` `quay.io/cilium/cilium:v1.9.13@sha256:12752fd66c5448194062befaf59aaefc446cbff729aa8b2d7ea4801113d3a31a` `docker.io/cilium/clustermesh-apiserver:v1.9.13@sha256:3c5ae05e0c10a24a2e1c1d269a8522346dc33671fae82d9b15a4b93f9d25710c` `quay.io/cilium/clustermesh-apiserver:v1.9.13@sha256:3c5ae05e0c10a24a2e1c1d269a8522346dc33671fae82d9b15a4b93f9d25710c` `docker.io/cilium/docker-plugin:v1.9.13@sha256:903def48e38ba32e519950fc119bc8982e84cbfbc5aa2599bf31232a203d1afe` `quay.io/cilium/docker-plugin:v1.9.13@sha256:903def48e38ba32e519950fc119bc8982e84cbfbc5aa2599bf31232a203d1afe` `docker.io/cilium/hubble-relay:v1.9.13@sha256:bd374bd8cd6abccce817f6cfabd5e58f243a7ec8d0fcf4dd22f0713713ab6969` `quay.io/cilium/hubble-relay:v1.9.13@sha256:bd374bd8cd6abccce817f6cfabd5e58f243a7ec8d0fcf4dd22f0713713ab6969` `docker.io/cilium/operator-aws:v1.9.13@sha256:9a3d04b41be1b3d79d079e3ee8021230440845073aa1beca6e7835743fbdc017` `quay.io/cilium/operator-aws:v1.9.13@sha256:9a3d04b41be1b3d79d079e3ee8021230440845073aa1beca6e7835743fbdc017` `docker.io/cilium/operator-azure:v1.9.13@sha256:aab870367b39b7220fcc0997b13a4d5b8f78696ee9e39caf742f90b504a92fa8` `quay.io/cilium/operator-azure:v1.9.13@sha256:aab870367b39b7220fcc0997b13a4d5b8f78696ee9e39caf742f90b504a92fa8` `docker.io/cilium/operator-generic:v1.9.13@sha256:826136116ce840ae37efad5e63d4e2a6d7f47a3277b840ab3d45758f19f1fc78` `quay.io/cilium/operator-generic:v1.9.13@sha256:826136116ce840ae37efad5e63d4e2a6d7f47a3277b840ab3d45758f19f1fc78` `docker.io/cilium/operator:v1.9.13@sha256:18423690655c2c9c4190657608a6b3753b87fd8fd151f112ea216aa9e3cc4fec` `quay.io/cilium/operator:v1.9.13@sha256:18423690655c2c9c4190657608a6b3753b87fd8fd151f112ea216aa9e3cc4fec` Signed-off-by: Joe Stringer <joe@cilium.io> 24 February 2022, 05:52:51 UTC
f2160fc Prepare for release v1.9.13 Signed-off-by: Joe Stringer <joe@cilium.io> 23 February 2022, 22:16:26 UTC
4274343 envoy: Update to 1.21.1 [ upstream commit 571a48430b01230378efce8be9df636b3c2b7777 ] Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 23 February 2022, 20:46:47 UTC
7d3e1a8 envoy: Update to release 1.21.0 [ upstream commit 28f0dae2666d917a818f90a8c84f147fdf977daf ] [ Backporter's notes: Dropped all Envoy API changes, adapted BPF TPROXY compatibility to the older API. ] Envoy Go API is updated to contain the generated validation code. Envoy image is updated to support the new EndpointId option for the bpf_metadata listener filter. NPDS field 'Policy' is renamed as 'EndpointID'. 'Policy' field was not used for anything, so might as well recycle it while this API is not yet public. Envoy retries may fail on "address already in use" when the original source address and port are used on upstream connections. Cilium typically does this in the egress proxy listeners. Fix this by using a Cilium Envoy build that always sets SO_REUSEADDR when original source address and port is used. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Revert "envoy: Update to release 1.21.0" This reverts commit 377dec2d4eca3f239ff6c72f85b3e9fb9c466d21. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 23 February 2022, 20:46:47 UTC
de7f9fe Update Cilium base images Signed-off-by: Joe Stringer <joe@cilium.io> 22 February 2022, 18:58:26 UTC
ab265c7 build(deps): bump docker/build-push-action from 2.8.0 to 2.9.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.8.0 to 2.9.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/1814d3dfb36d6f84174e61f4a4b05bd84089a4b9...7f9d37fa544684fb73bfe4835ed7214c255ce02b) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 21 February 2022, 19:19:16 UTC
f83dcb3 build(deps): bump actions/setup-go from 2.1.5 to 2.2.0 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2.1.5 to 2.2.0. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/v2.1.5...v2.2.0) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 21 February 2022, 19:19:00 UTC
912bb78 build(deps): bump docker/login-action from 1.12.0 to 1.13.0 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.12.0 to 1.13.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/42d299face0c5c43a0487c477f595ac9cf22f1a7...6af3c118c8376c675363897acf1757f7a9be6583) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 21 February 2022, 19:18:43 UTC
fb02c6e pkg/datapath/linux: Fix asymmetric IPsec logic on delete [ upstream commit 0bd4e04b15d136e54c3930b60e5c4c129ec869ef ] With ENI IPAM mode and IPsec enabled, users were reporting cases where connectivity to particular pods breaks, and correlated with those drops, the following error msg: ``` Unable to delete the IPsec route OUT from the host routing table ``` In addition, it was also reported that the connectivity outage would only last for a few minutes before resolving itself. The issue turned out to be that upon node deletion, the logic to handle the IPsec cleanup is asymmetric with the IPsec logic to handle a node create / update. Here's how: * With ENI mode and IPsec, subnet encryption mode is enabled implicitly. * Background: Users can explicitly enable subnet encryption mode by configuring `--ipv4-pod-subnets=[cidr1,cidr2,...]`. * Background: ENIs are part of subnet(s). * Cilium with ENI mode automatically appends the node's ENIs' subnets' CIDRs to this slice. * For example, node A has ENI E which is a part of subnet S with CIDR C. Therefore, `--ipv4-pod-subnets=[C]`. * This means that each node should have an IPsec OUT routes for each pod subnet, i.e. each ENI's subnet, as shown by (*linuxNodeHandler).nodeUpdate() which contains the IPsec logic on a node create / update. * Upon a node delete [(*linuxNodeHandler).nodeDelete()], we clean up the "old" node. When it gets to the IPsec logic, it removes the routes for the pod subnets as well, i.e. removes the route to the ENI's subnet from the local node. From the example above, it'd remove the route for CIDR C. * This is problematic because in ENI mode, different nodes can share the same ENI's subnet, meaning subnets are NOT exclusive to a node. For example, a node B can also have ENI E with a subnet C attached to it. * As for how the nodes were fixing themselves, it turns out that (*Manager).backgroundSync() runs on an interval which calls NodeValidateImplementation() which calls down to (*linuxNodeHandler).nodeUpdate() thereby running the IPsec logic of a node create / update which reinstates the missing routes. Therefore, we shouldn't be deleting these routes because pods might still be relying on them. By comparing the IPsec delete logic with [1], we see that they're asymmetric. This commit fixes this asymmetry. [1]: Given subnetEncryption=true, notice how we only call enableSubnetIPsec() if the node is local. That is not the case on node delete. ``` func (n *linuxNodeHandler) nodeUpdate(oldNode, newNode *nodeTypes.Node, firstAddition bool) error { ... if n.nodeConfig.EnableIPSec && !n.subnetEncryption() && !n.nodeConfig.EncryptNode { n.enableIPsec(newNode) newKey = newNode.EncryptionKey } ... if n.nodeConfig.EnableIPSec && !n.subnetEncryption() { n.encryptNode(newNode) } if newNode.IsLocal() { isLocalNode = true ... if n.subnetEncryption() { n.enableSubnetIPsec(n.nodeConfig.IPv4PodSubnets, n.nodeConfig.IPv6PodSubnets) } ... return nil } ``` Fixes: 645de9dee63 ("cilium: remove encryption route and rules if crypto is disabled") Co-authored-by: John Fastabend <john@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 21 February 2022, 19:18:21 UTC
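Reading the excerpt above, the shape of the symmetric fix would be roughly the following sketch (method body simplified and deleteIPsec hypothetical; the actual patch may differ):
```go
func (n *linuxNodeHandler) nodeDelete(oldNode *nodeTypes.Node) error {
	// Mirror nodeUpdate(): per-node IPsec routes are only installed when
	// subnet encryption is off, so only then is it safe to remove them here.
	if n.nodeConfig.EnableIPSec && !n.subnetEncryption() {
		n.deleteIPsec(oldNode)
	}
	// With subnet encryption, the OUT routes cover whole subnet CIDRs that
	// other nodes may share; deleting them on node removal would break
	// pods that still rely on those routes.
	return nil
}
```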
c4c8566 pkg/datapath/linux: Add CIDR logfield to IPsec route logs [ upstream commit e41aea01908e49381e4dae10fface2a48f230731 ] This helps in scenarios where the user reports this log msg, but we are missing the actual CIDR from the route that failed to be deleted or created. Signed-off-by: Chris Tarazi <chris@isovalent.com> 21 February 2022, 19:18:21 UTC
f55b5e2 pkg/datapath/linux: Remove unnecessary branch in IPsec route functions [ upstream commit 7e5022e5086e109ffdbc385a1789df498006be0e ] These if-statements are unnecessary because upon code analysis, we can tell that it's not possible for the input to be nil. Remove these statements to simplify the flow of the function. In other words, now we know for a fact that calling these function will result in a route insert. Signed-off-by: Chris Tarazi <chris@isovalent.com> 21 February 2022, 19:18:21 UTC
cc23064 pkg/datapath, pkg/node/manager: Clarify NodeValidateImplementation godoc [ upstream commit a6e847766e012074aa14fa98ea0a5185434f0f0c ] Document the intent of NodeValidateImplementation(). Signed-off-by: Chris Tarazi <chris@isovalent.com> 21 February 2022, 19:18:21 UTC
1838771 ipcache: Reduce identity scope for other "hosts" [ upstream commit f6a4104253f90dd71a99b83393dc048e9ed1d807 ] This patch updates the Cilium logic for handling remote node identity updates to ensure that when Cilium's '--enable-remote-node-identity' flag is configured, each Cilium node will consistently consider all other nodes as having the "remote-node" identity. This fixes an issue where users reported policy drops from remote nodes -> pods, even though the policy appeared to allow this. The issue was limited to kvstore configurations of Cilium, and does not affect configurations where CRDs are used for sharing information within the cluster. For background: When Cilium starts up, it locally scans for IP addresses associated with the node, and updates its own IPcache to associate those IPs with the "host" identity. Additionally, it will also publish this information to other nodes so that they can make policy decisions regarding traffic coming from IPs attached to nodes in the cluster. Before commit 7bf60a59f072 ("nodediscovery: Fix local host identity propagation"), Cilium would propagate the identity "remote-node" as part of these updates to other nodes. After that commit, it would propagate the identity "host" as part of these updates to other nodes. When receiving these updates, Cilium would trust the identity directly and push IP->Identity mappings like this into the datapath, regardless of whether the '--enable-remote-node-identity' setting was configured or not. As such, when the above commit changed the behaviour, it triggered a change in policy handling behaviour. The '--enable-remote-node-identity' flag was initially introduced to allow the security domain of remote nodes in the cluster to be considered differently vs. the local host. This can be important as Kubernetes defines that the host should always have access to pods on the node, so if all nodes are considered the same as the "host", this can represent a larger open policy surface for pods than necessary in a zero trust environment. Given the potential security implications of this setting, at the time that it was introduced, we introduced mitigations both in the control plane and in the data plane. Whenever the datapath is configured with --enable-remote-node-identity=true, it will also distrust any reports that peer node identities are "host", even if the ipcache itself reports this. In this situation, the datapath does not accept that the traffic is from the "host". Rather, it demotes the identity of the traffic, considering it part of the "world". The motivation behind this is that allowing "world" is a very permissive policy, so if the user is OK with allowing "world" traffic then it is likely that they will be OK with accepting any traffic like this which purports to be coming from a "host" in the cluster. As a result of the above conditions, users running in kvstore mode who upgraded from earlier Cilium versions to 1.9.12, 1.10.6 or 1.11.0 (and other releases up until this patch is released as part of an official version) could observe traffic drops for traffic from nodes in the cluster towards pods on other nodes in the cluster. Hubble would report that the traffic is coming "from the world" (identity=2), despite having a source address of another node in the cluster. We considered multiple approaches to solving this issue: A) Revert the commit that introduced the issue (see GH-18763). * Evidently, by this point there are multiple other codepaths relying on the internal storage of the local node's identity as Host, which would make this more difficult. B) Ensure that the kvstore propagation code propagates the current node's identity as "remote-node", as other nodes may expect. * In cases of versions with mixed knowledge of remote-node-identity (for instance during upgrade), newer nodes could end up propagating the new identity, but old nodes would not understand how to calculate policy with this identity in consideration, so this could result in similar sorts of policy drops during upgrade. C) In the case when --enable-remote-node-identity=true, ensure that when Cilium receives updates from peer nodes, it demotes the "host" identity reported by peer nodes down to "remote-node" for the associated IP addresses. This way, the impact of the flag is limited to the way that the current node configures itself only. If the datapath is then informed (via ipcache) that these IPs correspond to "remote-node", then the policy will be correctly assessed. This commit takes approach (C). Fixes: 7bf60a59f072 ("nodediscovery: Fix local host identity propagation") Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 15 February 2022, 09:11:39 UTC
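A minimal Go sketch of approach (C), assuming a hypothetical helper around Cilium's reserved-identity constants from pkg/identity: when an ipcache update arrives from a peer node and remote-node identities are enabled, a reported "host" identity is demoted before it reaches the local datapath.
```go
package ipcachefix

import "github.com/cilium/cilium/pkg/identity"

// demotePeerIdentity never accepts "host" for an IP owned by another node
// when remote-node identities are enabled. Hypothetical helper.
func demotePeerIdentity(remoteNodeIdentityEnabled, fromLocalNode bool, id identity.NumericIdentity) identity.NumericIdentity {
	if remoteNodeIdentityEnabled && !fromLocalNode && id == identity.ReservedIdentityHost {
		return identity.ReservedIdentityRemoteNode
	}
	return id
}
```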
fc65b58 ui: update envoy config to work with v1.18.4 The current Envoy config is not compatible with Envoy version 1.18.4 Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> 10 February 2022, 12:13:51 UTC
5f4bec2 labelfilter: Refine default label regexps [ upstream commit 422d7fc95c7bdb5acf37094b47a2ed92cc245fd3 ] Cilium treats label patterns as regular expressions. The existing default labels, e.g. "!k8s.io", used a '.', which matches any character. This led to the default labels being too permissive in their matching and consequently labels like "k8sXo" being excluded from the identity, with consequent security implications. This commit properly escapes the regular expressions used in the default labels. Signed-off-by: Tom Payne <tom@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 February 2022, 14:34:46 UTC
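A standalone Go illustration of the bug and the fix (not Cilium's actual filter code): escaping the pattern with regexp.QuoteMeta makes the '.' literal, so look-alike labels no longer match.
```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Unescaped: '.' matches any character, so a filter for "k8s.io"
	// also matches look-alikes such as "k8sXo".
	loose := regexp.MustCompile("^k8s.io$")
	// Escaped: QuoteMeta turns "k8s.io" into `k8s\.io`, matching literally.
	strict := regexp.MustCompile("^" + regexp.QuoteMeta("k8s.io") + "$")

	fmt.Println(loose.MatchString("k8sXo"))   // true (too permissive)
	fmt.Println(strict.MatchString("k8sXo"))  // false
	fmt.Println(strict.MatchString("k8s.io")) // true
}
```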
c8a5d8b feat: add hands-on tutorials [ upstream commit e83c81882b4a87a54ab52355eae32ec073cea22d ] Add the following text to documentation: Hands-on tutorial in a live environment to quickly get started with Cilium. Signed-off-by: Van Le <vannnyle@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 February 2022, 14:34:46 UTC
fea52f1 contrib: Fix backport submission for own PRs [ upstream commit 1b42f7a0cb61208d9070313e526a769983fe5b59 ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding one's own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 01 February 2022, 17:29:46 UTC
a8d2ae9 update k8s library versions to k8s 1.19.16 Signed-off-by: André Martins <andre@cilium.io> 25 January 2022, 08:50:36 UTC
f8a0671 install: Update image digests for v1.9.12 Generated from https://github.com/cilium/cilium/actions/runs/1715602995. `docker.io/cilium/cilium:v1.9.12@sha256:7d4ef9dc7e504ba1a55a01dd743260daced11ded02fc268965ef2c98eb8b8bde` `quay.io/cilium/cilium:v1.9.12@sha256:7d4ef9dc7e504ba1a55a01dd743260daced11ded02fc268965ef2c98eb8b8bde` `docker.io/cilium/clustermesh-apiserver:v1.9.12@sha256:f9eba125ca3d9e9014613a8e43e92afd635320e82736e75d9329de5054076449` `quay.io/cilium/clustermesh-apiserver:v1.9.12@sha256:f9eba125ca3d9e9014613a8e43e92afd635320e82736e75d9329de5054076449` `docker.io/cilium/docker-plugin:v1.9.12@sha256:2ceaf31e8e66a050992cfc02b1c0cdffe29df2f787ca86a8448b6fb7aebbeca6` `quay.io/cilium/docker-plugin:v1.9.12@sha256:2ceaf31e8e66a050992cfc02b1c0cdffe29df2f787ca86a8448b6fb7aebbeca6` `docker.io/cilium/hubble-relay:v1.9.12@sha256:67c5ce60e2f7cfd6f28b68b164bb910c41be365b9e17553c8a963dd456de204f` `quay.io/cilium/hubble-relay:v1.9.12@sha256:67c5ce60e2f7cfd6f28b68b164bb910c41be365b9e17553c8a963dd456de204f` `docker.io/cilium/operator-aws:v1.9.12@sha256:5702f3e1195e3ba7dfadeb6dd6eeb585af6051abf81fb633dc796489660d1b8b` `quay.io/cilium/operator-aws:v1.9.12@sha256:5702f3e1195e3ba7dfadeb6dd6eeb585af6051abf81fb633dc796489660d1b8b` `docker.io/cilium/operator-azure:v1.9.12@sha256:0e3e1e07f4b0847b26363d6100e57a743963307a2f781a292d5831046b4e51f7` `quay.io/cilium/operator-azure:v1.9.12@sha256:0e3e1e07f4b0847b26363d6100e57a743963307a2f781a292d5831046b4e51f7` `docker.io/cilium/operator-generic:v1.9.12@sha256:b89b16476cf6500d68763a70fb3d449e0309296bd00122cbe24f306c7e5e5180` `quay.io/cilium/operator-generic:v1.9.12@sha256:b89b16476cf6500d68763a70fb3d449e0309296bd00122cbe24f306c7e5e5180` `docker.io/cilium/operator:v1.9.12@sha256:ba08cb3378e6b254d96029fa971e3314c8e8c23f322cfcb004b242d1b03bbf19` `quay.io/cilium/operator:v1.9.12@sha256:ba08cb3378e6b254d96029fa971e3314c8e8c23f322cfcb004b242d1b03bbf19` Signed-off-by: Joe Stringer <joe@cilium.io> 19 January 2022, 01:58:09 UTC
d12a812 Prepare for release v1.9.12 Signed-off-by: Joe Stringer <joe@cilium.io> 19 January 2022, 00:41:46 UTC
98e7eba build(deps): bump docker/build-push-action from 2.7.0 to 2.8.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.7.0 to 2.8.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/a66e35b9cbcf4ad0ea91ffcaf7bbad63ad9e0229...1814d3dfb36d6f84174e61f4a4b05bd84089a4b9) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 18 January 2022, 20:58:38 UTC
1a51596 docs: Fix cilium-runtime image bump instructions This is a process that reliably works, unlike the previous one which relies on Quay.io's ability to pull images from DockerHub. It does require a core maintainer to prepare the changes, but that's fine for now since v1.9 should change infrequently now. Signed-off-by: Joe Stringer <joe@cilium.io> 17 January 2022, 16:41:27 UTC
298d8e5 Update Cilium base images Signed-off-by: Joe Stringer <joe@cilium.io> 14 January 2022, 23:08:57 UTC
6374259 ci: use python3 instead of python [ upstream commit dddbbe709e2827873420fef9b635152340f37f91 ] Our CI nodes no longer have `python` binary, python3 is available instead. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 11 January 2022, 18:30:33 UTC
e527987 bpf: Reset Pod's queue mapping in host veth to fix phys dev mq selection [ upstream commit ecdff123780dcc50599e424cbbc77edf2c70e396 ] Fix TX queue selection problem on the phys device as reported by Laurent. At high throughput, they noticed a significant amount of TCP retransmissions that they tracked back to qdisc drops (fq_codel was used). Suspicion is that kernel commit edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence") caused this due to its unconditional skb_record_rx_queue() which sets queue mapping to 1, and thus this gets propagated all the way to the physical device, hitting only a single queue in a mq device. Let's have bpf_lxc reset it as a workaround until we have a kernel fix. Doing this unconditionally is good anyway in order to avoid Pods messing with TX queue selection. Kernel will catch up with fix in 710ad98c363a ("veth: Do not record rx queue hint in veth_xmit"). Fixes: #18311 Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Laurent Bernaille <laurent.bernaille@datadoghq.com> Link (Bug): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=edbea922025169c0e5cdca5ebf7bf5374cc5566c Link (Fix): https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=710ad98c363a66a0cd8526465426c5c5f8377ee0 Signed-off-by: Paul Chaignon <paul@cilium.io> 11 January 2022, 17:53:10 UTC
807cc54 docs: improve Kubespray installation guide [ upstream commit d8577ff9a9fe2dee7a684130346d695e092025a1 ] Previously, the Kubespray documentation recommended changing the role variables. However, changing the role files in an Ansible playbook could lead to problems. So, with this commit, the documentation recommends using the extra variables or editing the group_vars files. Co-authored-by: Yasin Taha Erol <yasintahaerol@gmail.com> Signed-off-by: necatican <necaticanyildirim@gmail.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 11 January 2022, 17:53:10 UTC
3a16f37 build(deps): bump 8398a7/action-slack from 3.12.0 to 3.13.0 Bumps [8398a7/action-slack](https://github.com/8398a7/action-slack) from 3.12.0 to 3.13.0. - [Release notes](https://github.com/8398a7/action-slack/releases) - [Commits](https://github.com/8398a7/action-slack/compare/c9ff874f8549f97317ec9f6162d5449ee77bc984...a74b761b4089b5d730d813fbedcd2ec5d394f3af) --- updated-dependencies: - dependency-name: 8398a7/action-slack dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 10 January 2022, 17:00:39 UTC
4594244 build(deps): bump actions/setup-go from 2.1.4 to 2.1.5 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2.1.4 to 2.1.5. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/v2.1.4...v2.1.5) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 05 January 2022, 21:35:10 UTC
ac28c7f CODEOWNERS: janitors renamed to tophat The janitors team was renamed to tophat so we need to update the code owners accordingly. Signed-off-by: Paul Chaignon <paul@cilium.io> 03 January 2022, 23:02:20 UTC
4a27874 build(deps): bump docker/login-action from 1.10.0 to 1.12.0 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.10.0 to 1.12.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/f054a8b539a109f9f41c372932f1ae047eff08c9...42d299face0c5c43a0487c477f595ac9cf22f1a7) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 20 December 2021, 19:04:32 UTC
3821525 build(deps): bump actions/upload-artifact from 2.3.0 to 2.3.1 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2.3.0 to 2.3.1. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/da838ae9595ac94171fa2d4de5a2f117b3e7ac32...82c141cc518b40d92cc801eee768e7aafc9c2fa2) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 16 December 2021, 14:44:39 UTC
9134946 ci: Restart pods when toggling KPR switch [ upstream commit 06d9441d49b0b25e86af58bac16281a6950cbc27 ] Previously, in the graceful backend termination test we switched to KPR=disabled and we didn't restart CoreDNS. Before the switch, CoreDNS@k8s2 -> kube-apiserver@k8s1 was handled by the socket-lb, so the outgoing packet was $CORE_DNS_IP -> $KUBE_API_SERVER_NODE_IP. The packet should have been BPF masq-ed. After the switch, the BPF masq is no longer in place, so the packets from CoreDNS are subject to iptables masquerading (they can be either dropped by the invalid rule or masqueraded to some other port). Combined with CoreDNS being unable to recover from connectivity errors [1], CoreDNS was no longer able to receive updates from the kube-apiserver, thus NXDOMAIN errors for the new service name. To avoid such flakes, forcefully restart the DNS pods if a KPR setting change is detected. [1]: https://github.com/cilium/cilium/pull/18018 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 15 December 2021, 16:19:50 UTC
b083cf0 test: Redeploy DNS pods in AfterAll for datapath tests [ upstream commit c18cfc874620163769f0f66863b1c15962d25e1f ] Commit a0e77127dcd7 ("test: Redeploy DNS after changing endpointRoutes") didn't go quite far enough: It ensured that between individual tests in a given file, the DNS pods would be redeployed during the next run if there were significant enough datapath changes. However, the way it did this was by storing state within the 'kubectl' variable, which is recreated in each test file. So if the last test in one CI run enabled endpoint routes mode, then the DNS pods would not be redeployed to disable endpoint routes mode as part of the next test. Fix it by redeploying DNS after removing Cilium from the cluster. Kubernetes will remove the current DNS pods and reschedule them, but they will not launch until the next test deploys a new version of Cilium. Reported-by: Chris Tarazi <chris@isovalent.com> Fixes: a0e77127dcd7 ("test: Redeploy DNS after changing endpointRoutes") Related: #16717 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 15 December 2021, 16:19:50 UTC
b3e4669 test: Redeploy DNS after changing endpointRoutes [ upstream commit a0e77127dcd7165ed1b07e3c800e41f74a342cbe ] In general up until now, Cilium has expected endpointRoutes mode to be set to exactly one value upon deployment and for that value to stay the same for the remainder of operation. Toggling it can lead to a mix of endpoints in different datapath modes which is not well covered in CI. In Github issue #16717 we observed that if the testsuite toggles this setting then we can end up with kubedns pods remaining in endpoint routes mode, even though the rest of the daemon (and other pods) are not configured in this mode. This can lead to connectivity issues in DNS, and a range of test failures in subsequent tests because DNS is broken. Longer term to resolve this, we could improve on Cilium to ensure that users can successfully toggle this setting on or off at runtime and properly handle this case, or alternatively shift all logic over to endpoint-routes mode by default and disable the other option. Given that CI for the master branch is in a poor state due to this issue today, and that part of the issue is CI reconfiguring the datapath state of Cilium during the test setup in an unsupported manner, this commit proposes to force DNS pod redeployment as part of setup any time a test reconfigures the endpointRoutes mode. This should mitigate the testing side issue while we mull over the right longer-term solution. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 15 December 2021, 16:19:50 UTC
a73c819 docs: fix link to signoff / certificate of origin section [ upstream commit 6c169f63ec254de7777483b6f01c261215f9ec9c ] Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: nathanjsweet <nathanjsweet@pm.me> 15 December 2021, 16:19:50 UTC
0c611a7 test: Extend coredns clusterrole with additional resource permissions [ upstream commit 854bb8601e420f2087f2f54e1890aae976f464da ] Commit 398d55cd didn't add permissions for the `endpointslices` resource to the coredns `clusterrole` on k8s < 1.20. As a result, core-dns deployments failed on these versions with the error - `2021-11-30T14:09:43.349414540Z E1130 14:09:43.349292 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kube-system:coredns" cannot list resource "endpointslices" in API group "discovery.k8s.io" at the cluster scope` Fixes: 398d55cd Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me> 15 December 2021, 16:19:50 UTC
0d516cd test: Fix incorrect selector for netperf-service [ upstream commit 8002a50acba951b810b37b3748ec5ba90218fc63 ] Caught by random chance when using this manifest to test something locally. Might as well fix it in case someone uses this in the future and the service is not working as expected. AFAICT, no CI failures occurred from this typo because the Chaos test suite (only suite which uses this manifest) doesn't assert any traffic to the service, but rather to the netperf-server directly. Fixes: b4a3cf6abc6 ("Test: Run netperf in background while Cilium pod is being deleted") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: nathanjsweet <nathanjsweet@pm.me> 15 December 2021, 16:19:50 UTC
78c1141 test/contrib: Bump CoreDNS version to 1.8.3 [ upstream commit 398d55cd94c0e16dc19b03c53f7b5040c1dd8f13 ] As reported in [1], Go's HTTP2 client < 1.16 had some serious bugs which could result in lost connections to kube-apiserver. Worse, the client couldn't recover. In the case of CoreDNS, the loss of connectivity to kube-apiserver was not even logged. I validated this by adding the following rule on the node which was running the CoreDNS pod (6443 port as the socket-lb was doing the service xlation):

```
iptables -I FORWARD 1 -m tcp --proto tcp --src $CORE_DNS_POD_IP \
    --dport=6443 -j DROP
```

After upgrading CoreDNS to a version compiled with Go >= 1.16, the pod not only logged the errors, but also recovered from them quickly. An example of such an error:

```
W1126 12:45:08.403311 1 reflector.go:436] pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: watch of *v1.Endpoints ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
```

To determine the minimum version bump, I used the following:

```
for i in 1.7.0 1.7.1 1.8.0 1.8.1 1.8.2 1.8.3 1.8.4; do
    docker run --rm -ti "k8s.gcr.io/coredns/coredns:v$i" --version
done
CoreDNS-1.7.0 linux/amd64, go1.14.4, f59c03d
CoreDNS-1.7.1 linux/amd64, go1.15.2, aa82ca6
CoreDNS-1.8.0 linux/amd64, go1.15.3, 054c9ae
k8s.gcr.io/coredns/coredns:v1.8.1 not found: manifest unknown:
k8s.gcr.io/coredns/coredns:v1.8.2 not found: manifest unknown:
CoreDNS-1.8.3 linux/amd64, go1.16, 4293992
CoreDNS-1.8.4 linux/amd64, go1.16.4, 053c4d5
```

Hopefully, the bumped version will fix the CI flakes in which a service domain name is not available after 7min. In other words, CoreDNS is not able to resolve the name, which means that it hasn't received an update from the kube-apiserver for the service. [1]: https://github.com/kubernetes/kubernetes/issues/87615#issuecomment-803517109 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: nathanjsweet <nathanjsweet@pm.me> 15 December 2021, 16:19:50 UTC
d5ff3f7 install: Fix hubble-ui image references [ upstream commit 701967f31b7121dc811b3de6ca0fbbb284009749 ] [ Backport notes: ran `make -C install/kubernetes`, bundling other changes in the commit. ] The variable was not enclosed fully in speechmarks, observed by seeing failures in `make -C install/kubernetes` while backporting this to older branches. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 15 December 2021, 14:02:06 UTC
91d4e88 ui: v0.8.5 [ upstream commit 7d748630fd965639d0baa43fe977afef98e436d2 ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 15 December 2021, 14:02:06 UTC
4b8c115 build(deps): bump actions/upload-artifact from 2.2.4 to 2.3.0 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2.2.4 to 2.3.0. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/27121b0bdffd731efa15d66772be8dc71245d074...da838ae9595ac94171fa2d4de5a2f117b3e7ac32) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 10 December 2021, 15:21:51 UTC
62bcb98 build(deps): bump actions/download-artifact from 2.0.10 to 2.1.0 Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2.0.10 to 2.1.0. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/3be87be14a055c47b01d3bd88f8fe02320a9bb60...f023be2c48cc18debc3bacd34cb396e0295e2869) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 10 December 2021, 15:21:37 UTC
1b159dd Docs: Install Cilium overlay mode on EKS If `--set ipam.mode=eni` is included in the helm install command, the Cilium CNI controller crashes repeatedly. Instead, run the following snippet and the controller works fine.

```
helm install cilium cilium/cilium --version 1.9.11 \
  --namespace kube-system \
  --set egressMasqueradeInterfaces=eth0 \
  --set nodeinit.enabled=true
```

Fixes: #12981 Signed-off-by: Oliver Wang <a0924100192@gmail.com> 06 December 2021, 18:51:46 UTC
33b9e70 bugtool: fix IP route debug gathering commands [ upstream commit e38e3c44f712b5f0ecf33efd1867c0ae16b241f7 ] Commit 8bcc4e5dd830 ("bugtool: avoid allocation on conversion of execCommand result to string") broke the `ip route show` commands because the change from `string` to `[]byte` causes the `%v` formatting verb to emit the raw byte slice, not the string. Fix this by using the `%s` formatting verb to make sure the argument gets interpreted as a string. Also fix another instance in `writeCmdToFile` where `fmt.Fprint` is now invoked with a byte slice. Grepping for `%v` in bugtool sources and manually inspecting all changes from commit 8bcc4e5dd830 showed no other instances where a byte slice could potentially end up being formatted in a wrong way. Fixes: 8bcc4e5dd830 ("bugtool: avoid allocation on conversion of execCommand result to string") Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 02 December 2021, 23:08:48 UTC
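For context on the `%v`-versus-`%s` behavior the commit above describes, here is a minimal standalone Go sketch (not the actual bugtool code) showing how the verb choice changes the output once the value is a `[]byte`:

```go
package main

import "fmt"

func main() {
	out := []byte("default via 10.0.0.1 dev eth0")

	// %v prints the raw byte slice, e.g. [100 101 102 ...]
	fmt.Printf("%v\n", out)

	// %s interprets the bytes as text: default via 10.0.0.1 dev eth0
	fmt.Printf("%s\n", out)
}
```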
b51a044 docs: add registry (quay.io/) for pre-loading images for kind [ upstream commit 4758bef62869d60df45a383c4be813ebed1343c8 ] The docs recommend pulling the image with `docker pull cilium/cilium:|IMAGE_TAG|`, which downloads from docker.io. However, the operator loads images from quay.io. The two should match; otherwise the pre-loaded image is downloaded for nothing. [ Backport note: Skip files - test/provision/manifest/1.20/coredns_deployment.yaml - test/provision/manifest/1.20/eks/coredns_deployment.yaml - test/provision/manifest/1.21/coredns_deployment.yaml - test/provision/manifest/1.21/eks/coredns_deployment.yaml - test/provision/manifest/1.22/coredns_deployment.yaml - test/provision/manifest/1.22/eks/coredns_deployment.yaml as none of them are present on v1.9. ] Signed-off-by: adamzhoul <adamzhoul186@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 02 December 2021, 23:08:48 UTC
befda9c bugtool: fix data race occurring when running commands [ upstream commit 690c11201d9e48c0210c8dc644cc281f40e896ad ] A version of the classic closure with concurrency gotcha. Bind the value of cmd in the loop to a new variable to address the issue. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 29 November 2021, 13:00:38 UTC
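The "classic closure with concurrency gotcha" named above is the pre-Go-1.22 behavior where the loop variable is shared across iterations, so every goroutine may observe the final value. A generic illustration of the bug and the rebinding fix (the `cmd` name mirrors the commit's description; the commands are placeholders):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	cmds := []string{"ip route show", "ip link", "ip addr"}
	var wg sync.WaitGroup

	for _, cmd := range cmds {
		cmd := cmd // bind the loop value to a fresh variable per iteration
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Without the rebinding above, all goroutines could observe
			// the last value of cmd (data race on Go < 1.22).
			fmt.Println("running:", cmd)
		}()
	}
	wg.Wait()
}
```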
2472efc ipsec: fix source template in skip rule [ upstream commit b385f0f1c0e20e87af4315477d78266b9261d8d1 ] This patch modifies a forward policy update introduced by 0b52fd76c0101e966d701c07ca174517948739e4 so that the template source matches the source, which is 0.0.0.0/0 (wildcard). This modification addresses an issue of intermittent packet drops, as discussed in detail below. During an investigation of intermittent dropped packets in AKS (kernel 5.4.0-1061-azure) with IPSec enabled, there was an increase of XfrmInTmplMismatch errors in /proc/net/xfrm_stat as packets were dropped. Tracing revealed that the packets were dropped due to an __xfrm_policy_check failure when the packet was redirected from eth0 (after decryption) to the LXC device of a pod. Further investigation attributed the drops to changes in the forwarding policy. Specifically, the forwarding policy would change as:

```
src 0.0.0.0/0 dst 10.240.0.0/16
        dir fwd priority 2975
-       tmpl src 0.0.0.0 dst 10.240.0.19
+       tmpl src 10.240.0.19 dst 10.240.0.61
                proto esp reqid 1 mode tunnel
                level use
```

And back:

```
src 0.0.0.0/0 dst 10.240.0.0/16
        dir fwd priority 2975
-       tmpl src 10.240.0.19 dst 10.240.0.61
+       tmpl src 0.0.0.0 dst 10.240.0.19
                proto esp reqid 1 mode tunnel
                level use
```

The above change was caused by: func (n *linuxNodeHandler) enableIPsec(newNode *nodeTypes.Node) in pkg/datapath/linux/node.go. Modifying the code to avoid changing the policy eliminated the packet drops. There are two places where the xfrm policy is updated in enableIPsec(): (1) inside UpsertIPsecEndpoint() when an IN policy is specified (as happens if newNode.IsLocal()) (2) in enableIPsec() itself, as introduced by 0b52fd76c0101e966d701c07ca174517948739e4 For example, adding log messages to IpSecReplacePolicyFwd and UpsertIPsecEndpoint produced:

```
level=info msg="IpSecReplacePolicyFwd: src=0.0.0.0/0 dst=10.240.0.61/16 tmplSrc=10.240.0.19/16 tmplDst=10.240.0.61/16" subsys=ipsec
level=info msg="UpsertIPsecEndpoint: local:10.240.0.19/16 remote:0.0.0.0/0 fowrard:10.240.0.19/16" subsys=ipsec
level=info msg="IpSecReplacePolicyFwd: src=0.0.0.0/0 dst=10.240.0.19/16 tmplSrc=0.0.0.0/0 tmplDst=10.240.0.19/16" subsys=ipsec
level=info msg="UpsertIPsecEndpoint: exit" subsys=ipsec
```

Additional testing revealed that the update resulting in a template with tmplSrc=10.240.0.19/16 was the culprit for the packet drops. Making the template source match the source, which is a wildcard in update (2), eliminated the packet drops. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 29 November 2021, 13:00:38 UTC
713521c ipsec: Fix L7 with endpoint routes [ upstream commit 3ffe49e181727c1c964e0b4f4d89e6c511b9e44e ] With the previous patch, when IPsec and endpoint routes are enabled, packets flow directly from bpf_network to bpf_lxc via the Linux stack, instead of going through bpf_host. However, we noticed that, when L7 policies are applied, connectivity fails between the proxy and the destination:

```
43.808: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 from-endpoint FORWARDED (TCP Flags: SYN)
43.808: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 UNKNOWN 5 (TCP Flags: SYN)
43.808: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 to-proxy FORWARDED (TCP Flags: SYN)
43.808: cilium-test/echo-other-node-697d5d69b7-lglxf:8080 -> cilium-test/client2-6dd75b74c6-9nxhx:45704 from-proxy FORWARDED (TCP Flags: SYN, ACK)
43.808: cilium-test/echo-other-node-697d5d69b7-lglxf:8080 -> cilium-test/client2-6dd75b74c6-9nxhx:45704 to-endpoint FORWARDED (TCP Flags: SYN, ACK)
43.808: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 from-endpoint FORWARDED (TCP Flags: ACK)
43.808: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 to-proxy FORWARDED (TCP Flags: ACK)
43.810: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 from-endpoint FORWARDED (TCP Flags: ACK, PSH)
43.810: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 to-proxy FORWARDED (TCP Flags: ACK, PSH)
43.810: cilium-test/echo-other-node-697d5d69b7-lglxf:8080 -> cilium-test/client2-6dd75b74c6-9nxhx:45704 from-proxy FORWARDED (TCP Flags: ACK)
43.810: cilium-test/echo-other-node-697d5d69b7-lglxf:8080 -> cilium-test/client2-6dd75b74c6-9nxhx:45704 to-endpoint FORWARDED (TCP Flags: ACK)
43.810: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 http-request FORWARDED (HTTP/1.1 GET http://10.240.0.55:8080/)
43.812: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 from-network FORWARDED (TCP Flags: SYN)
43.812: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 from-stack FORWARDED (TCP Flags: SYN)
43.812: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 to-endpoint FORWARDED (TCP Flags: SYN)
43.812: cilium-test/echo-other-node-697d5d69b7-lglxf:8080 -> cilium-test/client2-6dd75b74c6-9nxhx:45704 from-endpoint FORWARDED (TCP Flags: SYN, ACK)
43.812: cilium-test/echo-other-node-697d5d69b7-lglxf:8080 -> cilium-test/client2-6dd75b74c6-9nxhx:45704 to-stack FORWARDED (TCP Flags: SYN, ACK)
43.813: cilium-test/echo-other-node-697d5d69b7-lglxf:8080 -> cilium-test/client2-6dd75b74c6-9nxhx:45704 from-network FORWARDED (TCP Flags: SYN, ACK)
44.827: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 from-network FORWARDED (TCP Flags: SYN)
44.827: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 from-stack FORWARDED (TCP Flags: SYN)
44.827: cilium-test/client2-6dd75b74c6-9nxhx:45704 -> cilium-test/echo-other-node-697d5d69b7-lglxf:8080 to-endpoint FORWARDED (TCP Flags: SYN)
```

We can see the TCP handshake between the client and proxy, followed by an attempt to perform the TCP handshake
between the proxy and server. That second part fails, as the SYN+ACK packets sent by the server never seem to reach the proxy; they are dropped somewhere after the from-network observation point. At the same time, we can see the IPsec error counter XfrmInNoPols increasing on the client node. This indicates that SYN+ACK packets were dropped after decryption, because the XFRM state used for decryption doesn't match any XFRM policy. The XFRM state used for decryption is:

```
src 0.0.0.0 dst 10.240.0.18
        proto esp spi 0x00000003 reqid 1 mode tunnel
        replay-window 0
        mark 0xd00/0xf00 output-mark 0xd00/0xf00
        aead rfc4106(gcm(aes)) 0x3ec3e84ac118c3baf335392e7a4ea24ee3aecb2b 128
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
        sel src 0.0.0.0/0 dst 0.0.0.0/0
```

And it should match the following XFRM policy:

```
src 0.0.0.0/0 dst 10.240.0.0/16
        dir in priority 0
        mark 0xd00/0xf00
        tmpl src 0.0.0.0 dst 10.240.0.18
                proto esp reqid 1 mode tunnel
```

After the packet is decrypted, however, we hit the following rule in iptables because we're going to the proxy:

```
-A CILIUM_PRE_mangle -m socket --transparent -m comment --comment "cilium: any->pod redirect proxied traffic to host proxy" -j MARK --set-xmark 0x200/0xffffffff
```

As a result, the packet mark is set to 0x200 and we don't match the 0xd00 packet mark of the XFRM policy anymore. The packet is therefore dropped with XfrmInNoPols. To avoid this, we can simply mark the XFRM policy optional when endpoint routes are enabled, in the same way we do for tunneling. Fixes: 287f49c2 ("cilium: encryption, fix redirect when endpoint routes enabled") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 29 November 2021, 13:00:38 UTC
d4bd263 config: Fix incorrect packet path with IPsec and endpoint routes [ upstream commit 573159dffc68aeb3a48280661063a97f0dd9cc55 ] Note this commit was applied before as 7ef59aa9 and reverted due to a broken test in commit 346713b2. When endpoint routes are enabled, we attach a BPF program on the way to the container and add a Linux route to the lxc interface. So when coming from bpf_network with IPsec, we should use that route to go directly to the lxc device and its attached BPF program. In contrast, when endpoint routes are disabled, we run the BPF program for ingress pod policies from cilium_host, via a tail call in bpf_host. Therefore, in that case, we need to jump from bpf_network to cilium_host first, to follow the correct path to the lxc interface. That's what commit 287f49c2 ("cilium: encryption, fix redirect when endpoint routes enabled") attempted to implement for when endpoint routes are enabled. Its goal was to go directly from bpf_network to the stack in that case, to use the per-endpoint Linux routes to the lxc device. That commit however implements a noop change: ENABLE_ENDPOINT_ROUTES is defined as a per-endpoint setting, but then used in bpf_network, which is not tied to any endpoint. In practice, that means the macro is defined in the ep_config.h header files used by bpf_lxc, whereas bpf_network (from which the macro is used) relies on the node_config.h header file. The fix is therefore simple: we need to define ENABLE_ENDPOINT_ROUTES as a global config, written in node_config.h. To reproduce the bug and validate the fix, I deployed Cilium on GKE (where endpoint routes are enabled by default) with:

```
helm install cilium ./cilium --namespace kube-system \
  --set nodeinit.enabled=true \
  --set nodeinit.reconfigureKubelet=true \
  --set nodeinit.removeCbrBridge=true \
  --set cni.binPath=/home/kubernetes/bin \
  --set gke.enabled=true \
  --set ipam.mode=kubernetes \
  --set nativeRoutingCIDR=$NATIVE_CIDR \
  --set nodeinit.restartPods=true \
  --set image.repository=docker.io/pchaigno/cilium-dev \
  --set image.tag=fix-ipsec-ep-routes \
  --set operator.image.repository=quay.io/cilium/operator \
  --set operator.image.suffix="-ci" \
  --set encryption.enabled=true \
  --set encryption.type=ipsec
```

I then deployed the below manifest and attempted a curl request from pod client to the service echo-a.
```
metadata:
  name: echo-a
  labels:
    name: echo-a
spec:
  template:
    metadata:
      labels:
        name: echo-a
    spec:
      containers:
      - name: echo-a-container
        env:
        - name: PORT
          value: "8080"
        ports:
        - containerPort: 8080
        image: quay.io/cilium/json-mock:v1.3.0
        imagePullPolicy: IfNotPresent
        readinessProbe:
          timeoutSeconds: 7
          exec:
            command:
            - curl
            - -sS
            - --fail
            - --connect-timeout
            - "5"
            - -o
            - /dev/null
            - localhost:8080
  selector:
    matchLabels:
      name: echo-a
  replicas: 1
apiVersion: apps/v1
kind: Deployment
---
metadata:
  name: echo-a
  labels:
    name: echo-a
spec:
  ports:
  - name: http
    port: 8080
  type: ClusterIP
  selector:
    name: echo-a
apiVersion: v1
kind: Service
---
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "l3-rule"
spec:
  endpointSelector:
    matchLabels:
      name: client
  ingress:
  - fromEndpoints:
    - matchLabels:
        name: echo-a
---
apiVersion: v1
kind: Pod
metadata:
  name: client
  labels:
    name: client
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - echo-a
        topologyKey: kubernetes.io/hostname
  containers:
  - name: netperf
    args:
    - sleep
    - infinity
    image: cilium/netperf
```

Fixes: 287f49c2 ("cilium: encryption, fix redirect when endpoint routes enabled") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 29 November 2021, 13:00:38 UTC
a4d175d nodediscovery: Fix local host identity propagation [ upstream commit 7bf60a59f0720e5546c68dccf4e4fa133407355f ] The local NodeDiscovery implementation was previously informing the rest of the Cilium agent that the local node's identity is "Remote Node" because of the statically initialized "identity.GetLocalNodeID" value. However, that value should only ever be used for external workloads cases in order to prepare the source identity used for transmitting traffic to other Cilium nodes. It should not be used for locally determining the identity of traffic coming from the host itself. Fix this by hardcoding the identity to "Host" identity. Fixes: c864fd3cf5cd ("daemon: Split IPAM bootstrap, join cluster in between") Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
e62eb70 bpf: Additional tail call for IPvX-only setups [ upstream commit 1b6a98ccf809bc5b93a0d39cba2aac39888c2ebc ] When both IPv4 and IPv6 are enabled, we split the to/from-container BPF programs into two code paths, one for each IP family, to reduce program size and complexity. Because our existing K8sVerifier test only covers the IPv4+IPv6 configuration, new complexity and program size issues sneaked in for the IPvX-only setups. These new issues occur when to/from-container contain both the initial IP parsing code and the IPv4 (resp. IPv6---we have one issue per family) code path. Splitting these programs such that they only contain the initial IP parsing code is enough to fix these issues. Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
e6be7d2 bugtool: write files without buffering if no postprocessing is needed [ upstream commit 8d4ec81ae46807f1ca8d2db3a646fe7f28f9f488 ] When the command output doesn't need to be postprocessed, the output can be written directly to the file without buffering. This should significantly reduce memory usage. On my test system this reduces total RSS by about one third:

```
Before:
$ /usr/bin/time -f 'RSS=%MKB' cilium-bugtool
RSS=63168KB

After:
$ /usr/bin/time -f 'RSS=%MKB' cilium-bugtool
RSS=42752KB
```

Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
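A minimal sketch of the idea, under stated assumptions (the helper name and paths are hypothetical, not the actual bugtool implementation): stream the command's stdout straight into the destination file instead of collecting it in an in-memory buffer first:

```go
package main

import (
	"os"
	"os/exec"
)

// writeCmdOutput streams the command's stdout directly to path,
// avoiding an intermediate in-memory buffer. Hypothetical helper.
func writeCmdOutput(path, name string, args ...string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	cmd := exec.Command(name, args...)
	cmd.Stdout = f // output goes straight to disk, no buffering in our code
	return cmd.Run()
}

func main() {
	_ = writeCmdOutput("/tmp/ip-route.txt", "ip", "route", "show")
}
```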
8ad70fd bugtool: avoid allocation on conversion of execCommand result to string [ upstream commit 8bcc4e5dd830654a156d0f73bf9bc2d7863d7402 ] Converting []byte to string can cause an allocation due to the fact that strings are immutable in Go. By letting execCommand return []byte instead of string we can avoid some of these allocations, i.e. also the allocation caused by further processing of that return value. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
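To illustrate why returning `[]byte` can save an allocation, here is a sketch with a hypothetical `execCommand` (only the signature shape is taken from the commit message): the `string(...)` conversion copies the bytes, so skipping it avoids the copy:

```go
package main

import (
	"fmt"
	"os/exec"
)

// execCommand returns the raw output. Callers that only write the
// bytes to a file or format them with %s never pay for a
// []byte -> string conversion. Hypothetical signature.
func execCommand(name string, args ...string) ([]byte, error) {
	return exec.Command(name, args...).CombinedOutput()
}

func main() {
	out, err := execCommand("echo", "hello")
	if err != nil {
		return
	}
	// string(out) would allocate and copy; %s formats the bytes directly.
	fmt.Printf("%s", out)
}
```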
d123469 bugtool: use github.com/cilium/workerpool to parallelize tasks [ upstream commit 64940c453e2942745166a24b39acafd4cb2a16d2 ] Instead of open-coding a worker pool, use the existing github.com/cilium/workerpool package. Also limit the number of workers to the number of CPUs by default, which should help to reduce excessive memory usage by too many parallel goroutines, at the price of potentially slightly slower bugtool report creation. If users want more parallel tasks, the number can be specified using the newly introduced `--parallel-workers` option. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
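A rough usage sketch of github.com/cilium/workerpool as described above; the exact method signatures are recalled from the library's documented usage, so treat them as assumptions rather than a verified API. Create a pool bounded by the CPU count, submit tasks, then drain:

```go
package main

import (
	"context"
	"fmt"
	"runtime"

	"github.com/cilium/workerpool"
)

func main() {
	// Bound parallelism by the number of CPUs, as the commit describes.
	wp := workerpool.New(runtime.NumCPU())
	defer wp.Close()

	for _, task := range []string{"task-a", "task-b", "task-c"} {
		task := task // rebind per iteration (Go < 1.22)
		// Submit queues the task; it runs once a worker is free.
		_ = wp.Submit(task, func(ctx context.Context) error {
			fmt.Println("running", task)
			return nil
		})
	}

	// Drain blocks until all submitted tasks have completed.
	if _, err := wp.Drain(); err != nil {
		fmt.Println("drain:", err)
	}
}
```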
eef23cf bugtool: move regex compilation out of loop [ upstream commit 6a5490fab9be0f88619d861f709fcaf67371b1bb ] There is no need to recompile the constant regexp for each iteration of the loop. Also, hashEncryptionKeys is called in a loop, so move the regexp compilation to a global var to avoid recompiling it. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
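The pattern here is the standard hoisting of `regexp.MustCompile` out of hot paths into a package-level var, so it runs once at init. A minimal before/after sketch (the pattern and function are hypothetical stand-ins, not the actual bugtool regexp):

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package init instead of on every call.
var keyRegex = regexp.MustCompile(`\b(0x[0-9a-f]+)\b`) // hypothetical pattern

// hashEncryptionKeysLike stands in for a function called in a loop;
// it must not recompile the regexp on each invocation.
func hashEncryptionKeysLike(line string) string {
	return keyRegex.ReplaceAllString(line, "[redacted]")
}

func main() {
	fmt.Println(hashEncryptionKeysLike("aead rfc4106 0x3ec3e84a 128"))
}
```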
a1df229 bugtool: preallocate ethtool command slices [ upstream commit f05a344504c8cea44ae8f90cc67da4c5f99ae432 ] Avoid reallocations and GC pressure in the loop. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
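Preallocation here means sizing the slice's capacity up front with `make`, so `append` never triggers a reallocation inside the loop. A generic sketch (the device and flag lists are hypothetical, not the bugtool ones):

```go
package main

import "fmt"

func main() {
	devices := []string{"eth0", "eth1", "cilium_host"}
	flags := []string{"", "-k", "-i", "-S"} // hypothetical ethtool flag set

	// Capacity is known in advance: one command per device/flag pair,
	// so the backing array is allocated exactly once.
	cmds := make([]string, 0, len(devices)*len(flags))
	for _, dev := range devices {
		for _, flag := range flags {
			cmds = append(cmds, fmt.Sprintf("ethtool %s %s", flag, dev))
		}
	}
	fmt.Println(len(cmds), "commands built without reallocation")
}
```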
195c9ec bugtool: fix typo in function name and file names [ upstream commit e562904d45b404316c6e88e8bd08a3803d94fc93 ] s/ethool/ethtool/ Also fix the build for other non-Linux platforms besides macOS. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
af7d975 .github: Rename project/ci-force to ci/flake [ upstream commit 988e26e29329807d269e715904e72221f46aed09 ] Following discussion in the community meeting, we decided to rename the project/ci-force label to ci/flake. We need to rename it in MLH and the issue template. Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
093bf30 .github: Increase reporting threshold for new flakes [ upstream commit 693163e45335077b1a20bf465a2a711ab013233f ] MLH assumes a flake is a new one if the similarity to existing flakes is below 75%. This threshold is a bit low for flakes affecting the same test but failing with a different error message. We can adjust to 85% and see. Related: https://github.com/cilium/cilium/issues/17270. Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
70a027c test: Do not require netpols in 'waitNextPolicyRevisions()' [ upstream commit c8d2fc7129bcc236c130141a680861e56908fad4 ] 'waitNextPolicyRevisions()' currently returns 'true' when no k8s network policies are applied, bypassing the Cilium agent policy revision wait in this case. As our tests typically (perhaps always) have no NPs applied, we have not actually waited for CNP or CCNP changes to take place in all Cilium PODs before proceeding with the tests. This may have caused CI flakes. Fix this by removing the code that checks for the presence of NPs. Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
60dc27f docs: Use git+https in requirements.txt [ upstream commit 87195bf104be5aca0c19072a7b6706cfa9c37b06 ] Netlify preview is currently failing with the following error:

```
Collecting git+git://github.com/cilium/sphinx_rtd_theme.git@v0.7 (from -r requirements.txt (line 23))
  Cloning git://github.com/cilium/sphinx_rtd_theme.git (to revision v0.7) ...
  Running command git clone -q git://github.com/cilium/sphinx_rtd_theme.git ...
  fatal: remote error:
    The unauthenticated git protocol on port 9418 is no longer supported.
    Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
```

Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
4256589 bug/pkg/health: Fix Nil Address Issue in Node Update Mechanism [ upstream commit c9da51c627b12c92a68cddb10d3876b3bc851eeb ] It is possible for a node with a "<nil>" primary address to be "added" to the health pkg cache. When this happens, subsequent updates **will not** flush out the bad value and thus it will persist as a valid status for a node until the entire cilium daemon is reset. This fixes the bug by not caching nil values at all. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
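A minimal sketch of the guard described above (hypothetical types and names, not the actual pkg/health code): refuse to cache any node whose primary address is missing, so a bad value can never displace or persist as a valid entry:

```go
package main

import "fmt"

type node struct {
	Name    string
	Primary string // primary address; may be empty or "<nil>"
}

var cache = map[string]node{}

// updateNode caches a node only if its primary address is usable;
// otherwise any existing valid entry is left untouched.
func updateNode(n node) {
	if n.Primary == "" || n.Primary == "<nil>" {
		return // never cache a nil address
	}
	cache[n.Name] = n
}

func main() {
	updateNode(node{Name: "node1", Primary: "<nil>"})
	fmt.Println(len(cache), "entries cached") // 0
}
```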
8ea4a7f bpf: Verifier tests with a single IP family for bpf_{host,lxc} [ upstream commit 9acd9d363b8b29b3928cda482afb1bb08356e39c ] In bpf_host and bpf_lxc, we split some BPF programs into tail calls conditionally depending on whether both IPv4 and IPv6 are enabled or only one of the two. These two options can therefore have an impact on whether we reach the complexity limit. This commit duplicates the existing tested datapath configurations of bpf_host and bpf_lxc, but with only one of IPv4 or IPv6 enabled. We are now testing 3 datapath configurations per kernel instead of 1. Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC
709e15c test/K8sVerifier: Support testing several configs per program [ upstream commit 6bd608a333a9e8e17113a3217a0e1f576c302f88 ] [ Backporter's notes: For the complexity-tests files, I retrieved the versions from the v1.10 branch (closer to v1.9) and removed options introduced in v1.10: ENABLE_DSR_ICMP_ERRORS, TUNNEL_MODE, ENABLE_EGRESS_GATEWAY, ETH_HLEN, ENABLE_CUSTOM_CALLS. Conversely, NEEDS_RELAX_VERIFIER was added to 4.19 and 5.4 configurations as per the BPF Makefile. ] Until now, K8sVerifier relied only on bpf/Makefile to compile BPF programs for verifier tests. The Makefile would define, for each BPF program, the set of configs to enable to maximize program size and complexity. All BPF programs would then be compiled at once and loaded with a single call to verifier-test.sh. This commit rewrites most of K8sVerifier to support testing more than one datapath config per BPF program. The list of datapath configs to test for each program is defined in a file at bpf/complexity-tests/[kernel]/[program].txt. For each BPF program and for each config in the file, K8sVerifier then compiles and loads the program. This change will allow us to significantly increase our complexity coverage by testing more configurations. Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2021, 09:47:56 UTC