https://github.com/cilium/cilium

0338ca7 Prepare for release v1.10.9 Signed-off-by: André Martins <andre@cilium.io> 28 March 2022, 17:55:12 UTC
d034b08 Clarify taint effects in the documentation. [ upstream commit 4e6b6b5c6359857230b1b502d1fce1a57e5c78c2 ] As part of a previous PR, 'NoExecute' started being recommended as the effect that should be placed on nodes to avoid unmanaged pods. While this is correct and required to best guarantee that pods don't come up as unmanaged, there are some considerations and trade-offs worth pointing out. This PR attempts to clarify the taint-based approach to preventing unmanaged pods by adding a page describing the known implications of each effect. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 28 March 2022, 17:43:22 UTC
9702355 build(deps): bump actions/cache from 2.1.7 to 3 Bumps [actions/cache](https://github.com/actions/cache) from 2.1.7 to 3. - [Release notes](https://github.com/actions/cache/releases) - [Commits](https://github.com/actions/cache/compare/937d24475381cd9c75ae6db12cb4e79714b926ed...4b0cf6cc4619e737324ddfcec08fff2413359514) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 26 March 2022, 00:42:44 UTC
3f3eb8e update cilium-{runtime,builder} Signed-off-by: Joe Stringer <joe@cilium.io> 18 March 2022, 19:39:55 UTC
302c805 docs: fix tip about opening the Hubble server port on all nodes [ upstream commit e396b5e41ed61402e3cfffff449763ef4c208616 ] The documentation page about setting up Hubble observability wrongly states that TCP port 4245 needs to be open on all nodes running Cilium to allow Hubble Relay to operate correctly. This is incorrect. Port 4245 is actually the default port used by Hubble Relay, which is a regular deployment and doesn't require any particular action from the user. However, Hubble server uses port 4244 by default and, given that it is embedded in the Cilium agent and uses a host port, that port must be open on all nodes to allow Hubble Relay to connect to each Hubble server instance. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
7c8eae1 Fix 'node-init' in GKE's 'cos' images. [ upstream commit ea9fd6f97b6e7b0d115067dc9f69ba461055530f ] Turns out that in GKE's 'cos' images the 'containerd' binary is still present even though '/etc/containerd/config.toml' is not. Hence, the kubelet wrapper would still be installed for these images according to the current check, even though it's not necessary. What's worse, starting the kubelet would fail because the 'sed' command targeting the aforementioned file would fail. This PR changes the check to rely on the presence of the '--container-runtime-endpoint' flag in the kubelet, which is probably a more reliable way of detecting '*_containerd' flavours and only applying the fix in these cases. Fixes #19015. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Co-authored-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
46287ed helm: check for contents of bootstrapFile [ upstream commit 6c15df9a1ad978427fc291c1e3ab681faa8342d0 ] Checking for the existence of the .Values.nodeinit.bootstrapFile file will be a no-op because the file is created by kubelet if it does not exist. Instead, we should check if the file has some contents inside of it, which is when we can be sure the node-init DaemonSet has started. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
76faf6c ipam/crd: Fix spurious CiliumNode update status failures [ upstream commit 18b10b49fc34b9748f7b86fda872e3cb1375a859 ] When running in CRD-based IPAM modes (Alibaba, Azure, ENI, CRD), it is possible to observe spurious "Unable to update CiliumNode custom resource" failures in the cilium-agent. The full error message is as follows: "Operation cannot be fulfilled on ciliumnodes.cilium.io <node>: the object has been modified; please apply your changes to the latest version and try again". It means that the Kubernetes `UpdateStatus` call has failed because the local `ObjectMeta.ResourceVersion` of the submitted CiliumNode version is out of date. In the presence of races, this error is expected and will resolve itself once the agent receives a more recent version of the object with the new resource version. However, it is possible that the resource version of a `CiliumNode` object is bumped even though the `Spec` or `Status` of the `CiliumNode` remains the same. This for example happens when `ObjectMeta.ManagedFields` is updated by the Kubernetes apiserver. Unfortunately, `CiliumNode.DeepEqual` does _not_ consider any `ObjectMeta` fields (including the resource version). Therefore two objects with different resource versions are considered the same by the `CiliumNode` watcher used by IPAM. But to be able to successfully call `UpdateStatus` we need to know the most recent resource version. Otherwise, `UpdateStatus` will always fail until the `CiliumNode` object is updated externally for some reason. Therefore, this commit modifies the logic to always store the most recent version of the `CiliumNode` object, even if `Spec` or `Status` has not changed. This in turn allows `nodeStore.refreshNode` (which invokes `UpdateStatus`) to always work on the most recently observed resource version. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
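For readers unfamiliar with the fix above, a minimal Go sketch of the idea, with illustrative types rather than the actual pkg/ipam code: the watcher unconditionally caches the latest observed object so that status updates always carry the newest ResourceVersion.

```go
// Package sketch illustrates the idea behind the fix; the real types live in
// pkg/ipam and pkg/k8s. Only the fields relevant here are shown.
package sketch

import "sync"

// ciliumNode is an illustrative stand-in for the CiliumNode custom resource.
type ciliumNode struct {
	ResourceVersion string
	Spec            string
	Status          string
}

// nodeStore caches the most recently observed CiliumNode so that UpdateStatus
// always runs against the latest ResourceVersion.
type nodeStore struct {
	mu      sync.Mutex
	ownNode *ciliumNode
}

// observe is called for every watcher update. Before the fix, objects whose
// Spec/Status were deep-equal to the cached copy were effectively ignored, so
// ResourceVersion bumps (e.g. from ManagedFields updates) were lost and every
// subsequent UpdateStatus hit a conflict. The fix is to always keep the
// freshest object, even when Spec and Status are unchanged.
func (s *nodeStore) observe(newNode *ciliumNode) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ownNode = newNode
}
```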
f1c276d bpf: avoid encrypt_key map lookup if IPsec is disabled [ upstream commit 3cb9ba147c0423a828e9df4330411e8b35bee4f2 ] In the bpf_lxc program's functions ipv6_l3_from_lxc and handle_ipv4_from_lxc, currently encrypt_key is always looked up in the encrypt map, regardless of whether IPsec is enabled or not. However, its value is only actually used when IPsec is enabled. Thus, the call can be avoided when IPsec is disabled. This also slightly reduces program size if !defined(ENABLE_IPSEC). Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
d63fc05 bpf: fix -Wunused-but-set-variable errors when building with LLVM 14 [ upstream commit 71452d57fae5f2c1568a8ef043fcd305559b31d6 ] Building the BPF datapath with LLVM 14 leads to the following errors: bpf_lxc.c:101:16: error: variable 'daddr' set but not used [-Werror,-Wunused-but-set-variable] union v6addr *daddr, orig_dip; ^ bpf_lxc.c:103:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable] __u8 encrypt_key = 0; ^ bpf_lxc.c:102:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable] __u32 tunnel_endpoint = 0; ^ bpf_lxc.c:526:7: error: variable 'encrypt_key' set but not used [-Werror,-Wunused-but-set-variable] __u8 encrypt_key = 0; ^ bpf_lxc.c:525:8: error: variable 'tunnel_endpoint' set but not used [-Werror,-Wunused-but-set-variable] __u32 tunnel_endpoint = 0; ^ These are normally warnings, but errors in this case due to the use of -Werror when compiling Cilium's bpf programs. Fix these by marking the affected variables as __maybe_unused. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Tam Mach <tam.mach@isovalent.com> 17 March 2022, 17:19:51 UTC
49a3bff build(deps): bump docker/build-push-action from 2.9.0 to 2.10.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.9.0 to 2.10.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/7f9d37fa544684fb73bfe4835ed7214c255ce02b...ac9327eae2b366085ac7f6a2d02df8aa8ead720a) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 16 March 2022, 11:17:32 UTC
a39a72e Include nodeName on pods [ upstream commit 4240222a6f8c8a65181d35132c09217d32947c82 ] Needed for a follow-up change to address an endpoint restoration issue (https://github.com/cilium/cilium/issues/18923). Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 16 March 2022, 10:01:58 UTC
cbdf0eb Add metrics for endpoint objects garbage collection [ Backporter's notes: Conflicts around import statement ordering from v1.12 dev cycle and metrics from CES feature in v1.11. Nothing major. ] [ upstream commit 1b7bce37a929656c4594b591da5815626ec8e1e5 ] Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 16 March 2022, 10:01:58 UTC
f90ef11 Prevent CiliumEndpoint removal by non-owning agent [ Backporter's notes: Conflicts resolved around imports from v1.11 version and logfields that don't exist in the v1.10 tree. Nothing major. ] [ upstream commit 6f7bf6c51f7a86e458947149a72b4c12f42c331c ] CEPs are created as well as updated based on informer store data local to an agent's node but (necessarily) deleted globally from the API server. This can currently lead to situations where an agent that does not own a CEP deletes an unrelated CEP. Avoid this problem by having agents maintain the CEP UID and using it as a precondition when deleting CEPs. This guarantees that only the owning agents can delete "their" CEPs. Signed-off-by: Timo Reimann <ttr314@googlemail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 16 March 2022, 10:01:58 UTC
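A hedged sketch of the client-go pattern the commit describes; the helper name and the use of a Pod client are illustrative (the real code goes through the Cilium clientset for CiliumEndpoints):

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// deleteOwned passes the UID the agent recorded for "its" object as a delete
// precondition, so the API server rejects the delete (409 Conflict) if the
// name now belongs to an object created by another agent. A Pod is used here
// only because the client-go pattern is identical.
func deleteOwned(ctx context.Context, client kubernetes.Interface, ns, name string, ownedUID types.UID) error {
	return client.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
		Preconditions: &metav1.Preconditions{UID: &ownedUID},
	})
}
```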
c04f41e install/helm: Add Image Override Option to All Images In order to enable offline deployment for certain platforms (like OpenShift) we need to be able to have a universal override for all images so that the OpenShift certified operator can list its "related images"[1][2]. [1]https://docs.openshift.com/container-platform/4.9/operators/operator_sdk/osdk-generating-csvs.html#olm-enabling-operator-for-restricted-network_osdk-generating-csvs [2]https://redhat-connect.gitbook.io/certified-operator-guide/appendix/offline-enabled-operators Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 09 March 2022, 12:49:18 UTC
c3ed5a9 Add support to configure Clustermesh connections in Helm Chart In order to connect Clustermesh clusters without the cilium-cli tool, we would need to manually patch the cilium agent with hostAliases and configure the cilium-clustermesh secret with mTLS material from the connected clusters. This commit adds support to connect multiple Clustermesh clusters using the Helm Chart. Fixes: cilium#17811 Signed-off-by: Samuel Torres <samuelpirestorres@gmail.com> Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 09 March 2022, 12:49:18 UTC
efd883e Update Go to 1.16.15 Signed-off-by: Tobias Klauser <tobias@cilium.io> 09 March 2022, 12:44:32 UTC
e062349 ctmap: Fix data race for accessing nat maps [ upstream commit a0f5c0d6804a39160ac252b0276573ec02da2f12 ] Commit c9810bf7b2 introduced garbage collection for cleaning orphan entries in the nat maps whereby concurrent accesses to the maps weren't serialized. The nat maps accessed via ct map construct are susceptible to data races in asynchronously running goroutines upon agent restart when endpoints are restored - ``` 2022-02-21T02:42:13.757888057Z WARNING: DATA RACE 2022-02-21T02:42:13.757895621Z Write at 0x00c00081a830 by goroutine 360: 2022-02-21T02:42:13.757912783Z github.com/cilium/cilium/pkg/bpf.(*Map).Close() 2022-02-21T02:42:13.757920669Z /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:581 +0x1c4 2022-02-21T02:42:13.757927597Z github.com/cilium/cilium/pkg/maps/ctmap.doGC4·dwrap·4() 2022-02-21T02:42:13.757934561Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:422 +0x39 2022-02-21T02:42:13.757941184Z github.com/cilium/cilium/pkg/maps/ctmap.doGC4() 2022-02-21T02:42:13.757947352Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:482 +0x4d5 2022-02-21T02:42:13.757953881Z github.com/cilium/cilium/pkg/maps/ctmap.doGC() 2022-02-21T02:42:13.757960362Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:517 +0x17e 2022-02-21T02:42:13.757966185Z github.com/cilium/cilium/pkg/maps/ctmap.GC() 2022-02-21T02:42:13.757972307Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:537 +0xc6 2022-02-21T02:42:13.757978599Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).garbageCollectConntrack() 2022-02-21T02:42:13.757986321Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1034 +0x804 2022-02-21T02:42:13.757992160Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTableLocked() 2022-02-21T02:42:13.757998853Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1039 +0x173 2022-02-21T02:42:13.758004601Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).scrubIPsInConntrackTable() 2022-02-21T02:42:13.758010701Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:1049 +0x44 2022-02-21T02:42:13.758016604Z github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps.func1() 2022-02-21T02:42:13.758022804Z /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:761 +0x132 2022-02-21T02:42:13.758034551Z Previous read at 0x00c00081a830 by goroutine 100: 2022-02-21T02:42:13.758040659Z github.com/cilium/cilium/pkg/bpf.(*Map).DumpReliablyWithCallback() 2022-02-21T02:42:13.758046461Z /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:756 +0x804 2022-02-21T02:42:13.758053696Z github.com/cilium/cilium/pkg/maps/nat.(*Map).DumpReliablyWithCallback() 2022-02-21T02:42:13.758059818Z /go/src/github.com/cilium/cilium/pkg/maps/nat/nat.go:121 +0x7b7 2022-02-21T02:42:13.758065580Z github.com/cilium/cilium/pkg/maps/ctmap.PurgeOrphanNATEntries() 2022-02-21T02:42:13.758072272Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:612 +0x790 2022-02-21T02:42:13.758078005Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758084196Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:214 +0xb0c 2022-02-21T02:42:13.758090362Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758096712Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5 2022-02-21T02:42:13.758103134Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758109338Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5 2022-02-21T02:42:13.758127517Z 
github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1() 2022-02-21T02:42:13.758135513Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a 2022-02-21T02:42:13.758141621Z github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC() 2022-02-21T02:42:13.758147715Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:189 +0x6a5 2022-02-21T02:42:13.758153712Z github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1() 2022-02-21T02:42:13.758160127Z /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:90 +0x40a ``` Fixes: c9810bf7b2 ("ctmap: GC orphan SNAT entries") Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
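To illustrate the kind of serialization such a fix requires, here is a minimal Go sketch with an illustrative map wrapper (not the actual pkg/bpf type): one mutex guards both closing and dumping the shared NAT map handle.

```go
package sketch

import "sync"

// bpfMap is an illustrative stand-in for the agent's BPF map wrapper in
// pkg/bpf. The essence of the fix is that the endpoint-restore CT GC (which
// may Close() the NAT map) and the orphan-NAT GC (which dumps it) must not
// touch the same handle concurrently, so accesses are serialized.
type bpfMap struct {
	mu sync.Mutex // guards fd across Close and dump operations
	fd int
}

func (m *bpfMap) Close() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.fd = -1 // release the handle while holding the lock
	return nil
}

func (m *bpfMap) DumpWithCallback(cb func(key, value []byte)) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.fd < 0 {
		return nil // already closed; nothing to dump
	}
	_ = cb // iterate map entries and invoke cb for each (omitted)
	return nil
}
```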
4eb07a3 node: Fix incorrect comment for S/GetRouterInfo [ upstream commit d7f64076334d0a62e6c45f4ee42327f173d2b9df ] This function is not specific to ENI IPAM mode anymore since Alibaba and Azure's IPAM modes are also using it. Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
7f4e8ae linux,ipam: Use subnet IPsec for Azure IPAM [ upstream commit 7bc57616b39502a95cbf97dbf6eda6318506f426 ] When using Azure's IPAM mode, we don't have non-overlapping pod CIDRs for each node, so we can't rely on the default IPsec mode where we use the destination CIDRs to match the xfrm policies. Instead, we need to enable subnet IPsec as in EKS. In that case, the dir=out xfrm policy and state look like: src 0.0.0.0/0 dst 10.240.0.0/16 dir out priority 0 mark 0x3e00/0xff00 tmpl src 0.0.0.0 dst 10.240.0.0 proto esp spi 0x00000003 reqid 1 mode tunnel src 0.0.0.0 dst 10.240.0.0 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x3e00/0xff00 output-mark 0xe00/0xf00 aead rfc4106(gcm(aes)) 0x567a47ff70a43a3914719a593d5b12edce25a971 128 anti-replay context: seq 0x0, oseq 0x105, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 As can be seen the xfrm policy matches on a broad /16 encompassing all endpoints in the cluster. The xfrm state then matches the policy's template. Finally, to write the proper outer destination IP, we need to define the IP_POOLS macro in our datapath. That way, our BPF programs will determine the outer IP from the ipcache lookup. Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
4473d90 hubble: Added nil check in filterByTCPFlags() to avoid segfault [ upstream commit 7e8b65187a4d37e0c41fd29c8d853ad44ecb5fd9 ] Cilium agent crashes when an L7/HTTP flow is passed to TCP flag flow filter (`filterByTCPFlags`). This is because HTTP flows will have some L4/TCP info such as src/dst port in the flow struct, but will not contain TCP flags. Added `nil` check for TCP flag pointer to avoid the segfault. Fixes: #18830 Signed-off-by: Wazir Ahmed <wazir@accuknox.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
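A minimal Go sketch of the nil-check pattern, using illustrative flow structures rather than the actual Hubble flow API:

```go
package main

import "fmt"

// Minimal illustrative flow shapes: an L7/HTTP flow still carries L4/TCP
// ports, but its TCP flags pointer is nil, which is what the filter tripped on.
type tcpFlags struct{ SYN, ACK bool }

type tcpInfo struct {
	SourcePort, DestinationPort uint32
	Flags                       *tcpFlags
}

// matchTCPFlags sketches the fixed behaviour: return false when the flow has
// no TCP flags instead of dereferencing a nil pointer.
func matchTCPFlags(t *tcpInfo, wantSYN bool) bool {
	if t == nil || t.Flags == nil {
		return false // e.g. an HTTP flow: ports present, flags absent
	}
	return !wantSYN || t.Flags.SYN
}

func main() {
	httpFlow := &tcpInfo{SourcePort: 34567, DestinationPort: 80} // Flags == nil
	fmt.Println(matchTCPFlags(httpFlow, true))                   // false, no crash
}
```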
23d5779 docs: update Azure Service Principal / IPAM documentation [ upstream commit d9d23ba78fe692e2548047af067122017254c2a5 ] When installing Cilium in an AKS cluster, the Cilium Operator requires an Azure Service Principal with sufficient privileges to the Azure API for the IPAM allocator to be able to work. Previously, the `az ad sp create-for-rbac` was assigning by default the `Contributor` role to new Service Principals when none was provided via the optional `--role` flag, whereas it now does not assign any role at all. This of course breaks IPAM allocation due to insufficient permissions, resulting in operator failures of this kind: ``` level=warning msg="Unable to synchronize Azure virtualnetworks list" error="network.VirtualNetworksClient#ListAll: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code=\"AuthorizationFailed\" Message=\"The client 'd09fb531-793a-40fc-b934-7af73ca60e32' with object id 'd09fb531-793a-40fc-b934-7af73ca60e32' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/read' over scope '/subscriptions/22716d91-fb67-4a07-ac5f-d36ea49d6167' or the scope is invalid. If access was recently granted, please refresh your credentials.\"" subsys=azure level=fatal msg="Unable to start azure allocator" error="Initial synchronization with instances API failed" subsys=cilium-operator-azure ``` We update the documentation guidelines for new installations to assign the `Contributor` role to new Service Principals used for Cilium. We also take the opportunity to: - Update Azure IPAM required privileges documentation. - Make it so users can now set up all AKS-specific required variables for a Helm install in a single command block, rather than have it spread over several command blocks with intermediate steps and temporary files. - Have the documentation recommend creating Service Principals with privileges over a restricted scope (AKS node resource group) for increased security. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
fc23a2e jenkinsfiles: bump runtime tests VM boot timeout [ upstream commit 4c3bd27c275cb16c8d4dca62d7fe51e649ecd98e ] We are hitting this timeout sometimes, and it seems it was previously updated on the regular pipelines (see 31a622ea40ff9b47bb73469b89c51db2d090b0e2) but not on the runtime pipeline. We remove the inner timeout as the outer one is practically redundant here, as the steps outside of the inner loop are almost instantaneous. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
c871de7 docs: Remove trailing step in AWS helm install [ upstream commit 6c647ba7b06c93bf0350907255ab3c43154810b5 ] Commit 706c9009dc39 ("docs: re-write docs to create clusters with tainted nodes") removed the command for this instruction step, so there's no need to have the instruction any more. Remove it. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
e1ddb1e pkg/datapath/linux: Fix asymmetric IPsec logic on delete [ upstream commit 0bd4e04b15d136e54c3930b60e5c4c129ec869ef ] With ENI IPAM mode and IPsec enabled, users were reporting cases where connectivity to particular pods breaks, and correlated with those drops, the following error msg: ``` Unable to delete the IPsec route OUT from the host routing table ``` In addition, it was also reported that the connectivity outage would only last for a few minutes before resolving itself. The issue turned out to be that upon node deletion, the logic to handle the IPsec cleanup is asymmetric with the IPsec logic to handle a node create / update. Here's how: * With ENI mode and IPsec, subnet encryption mode is enabled implicitly. * Background: Users can explicitly enable subnet encryption mode by configuring `--ipv4-pod-subnets=[cidr1,cidr2,...]`. * Background: ENIs are part of subnet(s). * Cilium with ENI mode automatically appends the node's ENIs' subnets' CIDRs to this slice. * For example, node A has ENI E which is a part of subnet S with CIDR C. Therefore, `--ipv4-pod-subnets=[C]`. * This means that each node should have an IPsec OUT routes for each pod subnet, i.e. each ENI's subnet, as shown by (*linuxNodeHandler).nodeUpdate() which contains the IPsec logic on a node create / update. * Upon a node delete [(*linuxNodeHandler).nodeDelete()], we clean up the "old" node. When it gets to the IPsec logic, it removes the routes for the pod subnets as well, i.e. removes the route to the ENI's subnet from the local node. From the example above, it'd remove the route for CIDR C. * This is problematic because in ENI mode, different nodes can share the same ENI's subnet, meaning subnets are NOT exclusive to a node. For example, a node B can also have ENI E with a subnet C attached to it. * As for how the nodes were fixing themselves, it turns out that (*Manager).backgroundSync() runs on an interval which calls NodeValidateImplementation() which calls down to (*linuxNodeHandler).nodeUpdate() thereby running the IPsec logic of a node create / update which reinstates the missing routes. Therefore, we shouldn't be deleting these routes because pods might still be relying on them. By comparing the IPsec delete logic with [1], we see that they're asymmetric. This commit fixes this asymmetry. [1]: Given subnetEncryption=true, notice how we only call enableSubnetIPsec() if the node is local. That is not the case on node delete. ``` func (n *linuxNodeHandler) nodeUpdate(oldNode, newNode *nodeTypes.Node, firstAddition bool) error { ... if n.nodeConfig.EnableIPSec && !n.subnetEncryption() && !n.nodeConfig.EncryptNode { n.enableIPsec(newNode) newKey = newNode.EncryptionKey } ... if n.nodeConfig.EnableIPSec && !n.subnetEncryption() { n.encryptNode(newNode) } if newNode.IsLocal() { isLocalNode = true ... if n.subnetEncryption() { n.enableSubnetIPsec(n.nodeConfig.IPv4PodSubnets, n.nodeConfig.IPv6PodSubnets) } ... return nil } ``` Fixes: 645de9dee63 ("cilium: remove encryption route and rules if crypto is disabled") Co-authored-by: John Fastabend <john@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
d678394 pkg/datapath/linux: Add CIDR logfield to IPsec route logs [ upstream commit e41aea01908e49381e4dae10fface2a48f230731 ] This helps in scenarios where the user reports this log msg, but we are missing the actual CIDR from the route that failed to be deleted or created. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
5ec365d pkg/datapath/linux: Remove unnecessary branch in IPsec route functions [ upstream commit 7e5022e5086e109ffdbc385a1789df498006be0e ] These if-statements are unnecessary because, upon code analysis, we can tell that it's not possible for the input to be nil. Remove these statements to simplify the flow of the function. In other words, now we know for a fact that calling these functions will result in a route insert. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
d390e08 pkg/datapath, pkg/node/manager: Clarify NodeValidateImplementation godoc [ upstream commit a6e847766e012074aa14fa98ea0a5185434f0f0c ] Document the intent of NodeValidateImplementation(). Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
21d7eb3 go.mod, vendor: pull in latest changes from github.com/vishvananda/netlink [ upstream commit 3222f50d05d722e39563fa65fe99b52c0858941a ] As of Linux kernel commit torvalds/linux@68ac0f3810e7 ("xfrm: state and policy should fail if XFRMA_IF_ID 0"), specifying xfrm if_id = 0 leads to EINVAL being returned. So far, the netlink library always specified the XFRMA_IF_ID netlink attribute, regardless of its value. Upstream PR https://github.com/vishvananda/netlink/pull/727 changed this behavior to only set XFRMA_IF_ID in case XfrmState.Ifid or XfrmPolicy.Ifid are != 0 to fix the issue. Updated using: go get github.com/vishvananda/netlink@main go mod tidy go mod vendor Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 07 March 2022, 08:38:06 UTC
718188f build(deps): bump actions/upload-artifact from 2.3.1 to 3 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2.3.1 to 3. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/82c141cc518b40d92cc801eee768e7aafc9c2fa2...6673cd052c4cd6fcf4b4e6e60ea986c889389535) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 04 March 2022, 03:08:46 UTC
add2e33 build(deps): bump actions/download-artifact from 2.1.0 to 3 Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2.1.0 to 3. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](https://github.com/actions/download-artifact/compare/f023be2c48cc18debc3bacd34cb396e0295e2869...fb598a63ae348fa914e94cd0ff38f362e927b741) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 04 March 2022, 03:08:26 UTC
fc4bf52 Prevent unmanaged pods in GKE's containerd flavors The changes that we have been doing to /etc/defaults/kubelet are reset on node reboots, as is apparently the whole /etc directory --- which also means that /etc/cni/net.d/05-cilium.conf is removed. This would not be a problem if our assumption that the node taint we recommend placing on the nodes would come back upon reboots held true, but in practice it doesn't. Besides this, it seems that containerd will reinstate its CNI configuration file, and it will do so way before Cilium has had the chance to re-run on the node and re-create its CNI configuration, causing pods to be assigned IPs by the default CNI rather than by Cilium in the meantime. This commit attempts to prevent that from happening by observing that /home/kubernetes/bin/kubelet (i.e. the actual kubelet binary) is kept between reboots and executed concurrently with containerd by systemd. We leverage this empirical observation to replace this kubelet binary with a wrapper script that, under the required conditions, disables containerd, patches its configuration, removes undesired CNI configuration files, re-enables containerd and becomes the kubelet. [ upstream commit 36585e41ec9aaf1d768aa228083e605c804e74b8 ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com> Co-authored-by: Alexandre Perrin <alex@kaworu.ch> Co-authored-by: Chris Tarazi <chris@isovalent.com> 03 March 2022, 14:52:48 UTC
8d41036 Recommend 'NoExecute' instead of 'NoSchedule'. To prevent situations in which the GKE node is forcibly stopped and re-created from causing unmanaged pods, and building on the observation that the node comes back with the same name and pods are already scheduled there, we change the recommended taint effect from NoSchedule to NoExecute, to cause any previously scheduled pods to be evicted, preventing them from getting IPs assigned by the default CNI. This should not impact other environments due to the nature of 'NoExecute', so we recommend it everywhere. [ upstream commit b049574616e2effcaafa65cb5700e661aafc2076 ] Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com> Co-authored-by: Tam Mach <sayboras@yahoo.com> 03 March 2022, 14:52:48 UTC
fe0ebc3 alibabacloud: Read pre-allocate from CNI config [ upstream commit 842f6c8d2c1dd75a1c1cabd0f3d963ac5d082dc8 ] Currently, a cilium-agent using Alibaba IPAM mode doesn't respect the pre-allocate configuration from the CNI config file when creating the CiliumNode resource, and the value of pre-allocate is always the default value of 8. This patch makes this option configurable via the CNI config. Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 03 March 2022, 14:52:48 UTC
5f0b760 alibabacloud: Fix panic due to invalid metric name [ upstream commit 76e3aac3b3bcf34f9ee68ce3496b86fe7bc5b18a ] error message: panic: descriptor Desc{fqName: "cilium_operator_alibaba-cloud_api_duration_seconds", help: "Duration of interactions with API", constLabels: {}, variableLabels: [operation response_code]} is invalid: "cilium_operator_alibaba-cloud_api_duration_seconds" is not a valid metric name Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 03 March 2022, 14:52:48 UTC
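To illustrate why registration panics, a small Go check against the legacy Prometheus metric-name pattern; this is only an illustration, the actual fix renames the subsystem:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Legacy Prometheus metric names may contain only letters, digits,
// underscores and colons, and may not start with a digit.
var metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

func main() {
	bad := "cilium_operator_alibaba-cloud_api_duration_seconds"
	good := strings.ReplaceAll(bad, "-", "_")

	fmt.Println(metricNameRE.MatchString(bad))  // false: the '-' is what triggers the registration panic
	fmt.Println(metricNameRE.MatchString(good)) // true
}
```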
e921691 build(deps): bump docker/login-action from 1.14.0 to 1.14.1 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.14.0 to 1.14.1. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/bb984efc561711aaa26e433c32c3521176eae55b...dd4fa0671be5250ee6f50aedf4cb05514abda2c7) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 01 March 2022, 23:17:45 UTC
7fc0fbd build(deps): bump actions/checkout from 2.4.0 to 3 Bumps [actions/checkout](https://github.com/actions/checkout) from 2.4.0 to 3. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/ec3a7ce113134d7a93b817d10a8272cb61118579...a12a3943b4bdde767164f792f33f40b04645d846) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 01 March 2022, 23:09:13 UTC
706fe01 build(deps): bump actions/setup-go from 2.2.0 to 3 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2.2.0 to 3. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/bfdd3570ce990073878bf10f6b2d79082de49492...f6164bd8c8acb4a71fb2791a8b6c4024ff038dab) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 01 March 2022, 11:06:55 UTC
53b3100 build(deps): bump golangci/golangci-lint-action from 3.0.0 to 3.1.0 Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3.0.0 to 3.1.0. - [Release notes](https://github.com/golangci/golangci-lint-action/releases) - [Commits](https://github.com/golangci/golangci-lint-action/compare/c675eb70db3aa26b496bc4e64da320480338d41b...b517f99ae23d86ecc4c0dec08dcf48d2336abc29) --- updated-dependencies: - dependency-name: golangci/golangci-lint-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 01 March 2022, 11:06:15 UTC
98c8544 build(deps): bump docker/login-action from 1.13.0 to 1.14.0 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.13.0 to 1.14.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/6af3c118c8376c675363897acf1757f7a9be6583...bb984efc561711aaa26e433c32c3521176eae55b) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 01 March 2022, 11:05:41 UTC
6ea55da build(deps): bump KyleMayes/install-llvm-action from 1.5.0 to 1.5.1 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.0 to 1.5.1. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/v1.5.0...v1.5.1) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 28 February 2022, 13:25:38 UTC
680f210 build(deps): bump golangci/golangci-lint-action from 2.5.2 to 3 Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 2.5.2 to 3. - [Release notes](https://github.com/golangci/golangci-lint-action/releases) - [Commits](https://github.com/golangci/golangci-lint-action/compare/5c56cd6c9dc07901af25baab6f2b0d9f3b7c3018...c675eb70db3aa26b496bc4e64da320480338d41b) --- updated-dependencies: - dependency-name: golangci/golangci-lint-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 25 February 2022, 15:42:45 UTC
fa82b35 install: Update image digests for v1.10.8 Generated from https://github.com/cilium/cilium/actions/runs/1890093745. `docker.io/cilium/cilium:v1.10.8@sha256:e6147e39a03c685e5f1225c5642e1358dcd4899bbd94e8a043bb4be52cd2f008` `quay.io/cilium/cilium:v1.10.8@sha256:e6147e39a03c685e5f1225c5642e1358dcd4899bbd94e8a043bb4be52cd2f008` `docker.io/cilium/clustermesh-apiserver:v1.10.8@sha256:c675830b9f87596680d2a45cd78c2d64ab1ceb8707629e8da71217f64e5e72e1` `quay.io/cilium/clustermesh-apiserver:v1.10.8@sha256:c675830b9f87596680d2a45cd78c2d64ab1ceb8707629e8da71217f64e5e72e1` `docker.io/cilium/docker-plugin:v1.10.8@sha256:d442e44a50ff188ca90a0af04778574348d23e21c059763491a4527ea94e0b38` `quay.io/cilium/docker-plugin:v1.10.8@sha256:d442e44a50ff188ca90a0af04778574348d23e21c059763491a4527ea94e0b38` `docker.io/cilium/hubble-relay:v1.10.8@sha256:4d2ee6b41475f6d74855d77b018f508ba978d964528f903c8e3e7be8dd275b31` `quay.io/cilium/hubble-relay:v1.10.8@sha256:4d2ee6b41475f6d74855d77b018f508ba978d964528f903c8e3e7be8dd275b31` `docker.io/cilium/operator-alibabacloud:v1.10.8@sha256:5b488759bd37890aaf1607287b902f1199288f35fc6d8ee9e2f51644f8fdc646` `quay.io/cilium/operator-alibabacloud:v1.10.8@sha256:5b488759bd37890aaf1607287b902f1199288f35fc6d8ee9e2f51644f8fdc646` `docker.io/cilium/operator-aws:v1.10.8@sha256:d591b998273f8601dd42a3f0a0b097d65077c30255b7dc5af837e0118bda6f5f` `quay.io/cilium/operator-aws:v1.10.8@sha256:d591b998273f8601dd42a3f0a0b097d65077c30255b7dc5af837e0118bda6f5f` `docker.io/cilium/operator-azure:v1.10.8@sha256:81b62495f6c682446a07f7a5ca9ec2887c99f4820b460a3c5610ecec05789140` `quay.io/cilium/operator-azure:v1.10.8@sha256:81b62495f6c682446a07f7a5ca9ec2887c99f4820b460a3c5610ecec05789140` `docker.io/cilium/operator-generic:v1.10.8@sha256:a77dff6103d047d8810ea5e80067b2fade6d099771c8dda197bdba5e4e2f0255` `quay.io/cilium/operator-generic:v1.10.8@sha256:a77dff6103d047d8810ea5e80067b2fade6d099771c8dda197bdba5e4e2f0255` `docker.io/cilium/operator:v1.10.8@sha256:98b31afa482cb9160d7bf7420b2fabf32435c0ea44164696013b5356014809c7` `quay.io/cilium/operator:v1.10.8@sha256:98b31afa482cb9160d7bf7420b2fabf32435c0ea44164696013b5356014809c7` Signed-off-by: Joe Stringer <joe@cilium.io> 24 February 2022, 05:52:56 UTC
b8e0a95 Prepare for release v1.10.8 Signed-off-by: Joe Stringer <joe@cilium.io> 23 February 2022, 22:29:19 UTC
73beb94 envoy: Update to 1.21.1 [ upstream commit 571a48430b01230378efce8be9df636b3c2b7777 ] Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 23 February 2022, 20:47:57 UTC
bf884a6 envoy: Update to release 1.21.0 [ upstream commit 28f0dae2666d917a818f90a8c84f147fdf977daf ] [ Backporter's notes: Dropped all Envoy API changes, adapted BPF TPROXY compatibility to the older API. ] Envoy Go API is updated to contain the generated validation code. Envoy image is updated to support the new EndpointId option for the bpf_metadata listener filter. NPDS field 'Policy' is renamed as 'EndpointID'. 'Policy' field was not used for anything, so might as well recycle it while this API is not yet public. Envoy retries may fail on "address already in use" when the original source address and port are used on upstream connections. Cilium typically does this in the egress proxy listeners. Fix this by using a Cilium Envoy build that always sets SO_REUSEADDR when original source address and port is used. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 23 February 2022, 20:47:57 UTC
63da905 update cilium-{runtime,builder} Signed-off-by: Joe Stringer <joe@cilium.io> 22 February 2022, 18:58:36 UTC
8bc765c build(deps): bump docker/login-action from 1.12.0 to 1.13.0 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.12.0 to 1.13.0. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/42d299face0c5c43a0487c477f595ac9cf22f1a7...6af3c118c8376c675363897acf1757f7a9be6583) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 21 February 2022, 19:17:06 UTC
713e44a Update Go to 1.16.14 Signed-off-by: Tobias Klauser <tobias@cilium.io> 16 February 2022, 13:36:57 UTC
4359883 ipcache: Reduce identity scope for other "hosts" [ upstream commit f6a4104253f90dd71a99b83393dc048e9ed1d807 ] This patch updates the Cilium logic for handling remote node identity updates to ensure that when Cilium's '--enable-remote-node-identity' flag is configured, each Cilium node will consistently consider all other nodes as having the "remote-node" identity. This fixes an issue where users reported policy drops from remote nodes -> pods, even though the policy appeared to allow this. The issue was limited to kvstore configurations of Cilium, and does not affect configurations where CRDs are used for sharing information within the cluster. For background: When Cilium starts up, it locally scans for IP addresses associated with the node, and updates its own IPcache to associate those IPs with the "host" identity. Additionally, it will also publish this information to other nodes so that they can make policy decisions regarding traffic coming from IPs attached to nodes in the cluster. Before commit 7bf60a59f072 ("nodediscovery: Fix local host identity propagation"), Cilium would propagate the identity "remote-node" as part of these updates to other nodes. After that commit, it would propagate the identity "host" as part of these updates to other nodes. When receiving these updates, Cilium would trust the identity directly and push IP->Identity mappings like this into the datapath, regardless of whether the '--enable-remote-node-identity' setting was configured or not. As such, when the above commit changed the behaviour, it triggered a change in policy handling behaviour. The '--enable-remote-node-identity' flag was initially introduced to allow the security domain of remote nodes in the cluster to be considered differently vs. the local host. This can be important as Kubernetes defines that the host should always have access to pods on the node, so if all nodes are considered the same as the "host", this can represent a larger open policy surface for pods than necessary in a zero trust environment. Given the potential security implications of this setting, at the time that it was introduced, we introduced mitigations both in the control plane and in the data plane. Whenever the datapath is configured with --enable-remote-node-identity=true, it will also distrust any reports that peer node identities are "host", even if the ipcache itself reports this. In this situation, the datapath does not accept that the traffic is from the "host". Rather, it demotes the identity of the traffic to considering it as part of the "world". The motivation behind this is that allowing "world" is a very permissive policy, so if the user is OK with allowing "world" traffic then it is likely that they will be OK with accepting any traffic like this which purports to be coming from a "host" in the cluster. As a result of the above conditions, users running in kvstore mode who upgraded from earlier Cilium versions to 1.9.12, 1.10.6 or 1.11.0 (and other releases up until this patch is released as part of an official version) could observe traffic drops for traffic from nodes in the cluster towards pods on other nodes in the cluster. Hubble would report that the traffic is coming "from the world" (identity=2), despite having a source address of another node in the cluster. We considered multiple approaches to solving this issue: A) Revert the commit that introduced the issue (see GH-18763). 
* Evidently, by this point there are multiple other codepaths relying on the internal storage of the local node's identity as Host, which would make this more difficult. B) Ensure that the kvstore propagation code propagates the current node's identity as "remote-node", as other nodes may expect. * In cases of versions with mixed knowledge of remote-node-identity (for instance during upgrade), newer nodes could end up propagating the new identity, but old nodes would not understand how to calculate policy with this identity in consideration, so this could result in similar sorts of policy drops during upgrade. C) In the case when --enable-remote-node-identity=true, ensure that when Cilium receives updates from peer nodes, it demotes the "host" identity reported by peer nodes down to "remote-node" for the associated IP addresses. This way, the impact of the flag is limited to the way that the current node configures itself only. If the datapath is then informed (via ipcache) that these IPs correspond to "remote-node", then the policy will be correctly assessed. This commit takes approach (C). Fixes: 7bf60a59f072 ("nodediscovery: Fix local host identity propagation") Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 16 February 2022, 09:52:22 UTC
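A minimal Go sketch of approach (C), with illustrative identity constants and function names (not the actual ipcache code):

```go
package sketch

// Illustrative reserved-identity values; the real constants live in
// pkg/identity.
const (
	identityHost       = 1
	identityRemoteNode = 6
)

// demotePeerHostIdentity sketches approach (C): when remote-node identities
// are enabled and an ipcache update from a *peer* node claims the "host"
// identity, the receiving agent records the IP as "remote-node" instead, so
// its datapath evaluates policy against remote-node rather than demoting the
// traffic to "world".
func demotePeerHostIdentity(id int, fromLocalNode, enableRemoteNodeIdentity bool) int {
	if enableRemoteNodeIdentity && !fromLocalNode && id == identityHost {
		return identityRemoteNode
	}
	return id
}
```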
3e39dcc docs: add NeighDiscovery, arping and neighbour to spelling wordlist Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 14 February 2022, 11:29:19 UTC
eb9f359 docs: Update clustermesh example verification steps [ upstream commit 461d6d1abab46e759c9666c6cf2c1c3a198a0128 ] This commit updates the verification command to avoid the need to pass the pod name explicitly, so that it's easier for users to just copy and paste. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 14 February 2022, 11:29:19 UTC
b38cd45 ci: fix QEMU image build following Google Cloud SDK updates [ upstream commit 7bcf6d19a2b85da135b7d39cae659f8c3cbce8ea ] Recent Google Cloud SDK updates broke our QEMU image build: ``` 205.3 /usr/bin/gcloud: 190: exec: /usr/bin/../lib/google-cloud-sdk/platform/bundledpythonunix/bin/python3: not found ``` Google tracker link: https://issuetracker.google.com/issues/216325949 This has been reportedly fixed in version 371, however we still hit an issue: ``` 271.2 Setting up google-cloud-sdk (371.0.0-0) ... 271.9 /usr/bin/gcloud: 192: exec: python: not found ``` This is because the `python` dependency has been removed from `google-cloud-sdk`: ``` 23.07 The following NEW packages will be installed: 23.07 google-cloud-sdk kubectl ``` Previously, from a working run: ``` 21.85 The following NEW packages will be installed: 21.85 google-cloud-sdk kubectl libexpat1 libmpdec2 libpython3-stdlib 21.85 libpython3.8-minimal libpython3.8-stdlib mime-support python3 21.85 python3-minimal python3.8 python3.8-minimal ``` Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 14 February 2022, 11:29:19 UTC
9011bd0 helm: Add values for custom service monitor annotations [ upstream commit f49d39243ffa9891e6e511a600527a4c4a9b8936 ] Define Helm values to include custom annotations for cilium-agent / cilium-operator / hubble service monitors similar to how you can define custom annotations for other resources. Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 14 February 2022, 11:29:19 UTC
958d2c2 build(deps): bump actions/setup-go from 2.1.5 to 2.2.0 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2.1.5 to 2.2.0. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/424fc82d43fa5a37540bae62709ddcc23d9520d4...bfdd3570ce990073878bf10f6b2d79082de49492) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 10 February 2022, 17:49:04 UTC
3465d91 ci: remove box download timeout in upstream tests [ upstream commit 96f4050963881e84ccec0540b78277987c25e360 ] This timeout can be too small when the host has to download all boxes due to not having any of the boxes required for the SHA to be tested. In particular this is prone to happen on backport PRs, since it's more likely for the job to be scheduled on a node that primarily ran `master` pipelines up to that point. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
9044b9e docs(labels): Document that labels are in fact patterns [ upstream commit cf39553156f9933d809a18af157354b1de3d3acd ] Signed-off-by: Tom Payne <tom@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
6688d93 labelfilter: Refine default label regexps [ upstream commit 422d7fc95c7bdb5acf37094b47a2ed92cc245fd3 ] Cilium treats label patterns as regular expressions. The existing default labels, e.g. "!k8s.io", used a '.', which matches any character. This led to the default labels being too permissive in their matching and consequently labels like "k8sXo" being excluded from the identity, with consequent security implications. This commit properly escapes the regular expressions used in the default labels. Signed-off-by: Tom Payne <tom@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
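A small Go illustration of the escaping problem and its fix via regexp.QuoteMeta; the filter value here is an example, not the actual labelfilter code:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Label filters are regular expressions, so a literal filter like
	// "k8s.io" must have its '.' escaped or it will also match look-alikes.
	raw := "k8s.io"
	permissive := regexp.MustCompile("^" + raw + "$")
	escaped := regexp.MustCompile("^" + regexp.QuoteMeta(raw) + "$")

	fmt.Println(permissive.MatchString("k8sXio")) // true: unintended match
	fmt.Println(escaped.MatchString("k8sXio"))    // false
	fmt.Println(escaped.MatchString("k8s.io"))    // true
}
```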
87be500 docs: export KUBECONFIG for cilium-cli with k3s [ upstream commit b4c9ba134cea5c0cb1678b6454f1a826eb13ad02 ] Previously, commit 606b5fe9f49f ("docs: KUBECONFIG for cilium-cli with k3s") changed `cilium install` commands to set KUBECONFIG explicitly. However, successive cilium-cli commands (e.g. `cilium status` or `cilium connectivity test` in the getting started guide) will need the corresponding kubeconfig as well. Thus, suggest to `export KUBECONFIG` at the top of the k3s specific guides. For https://github.com/cilium/cilium-cli/issues/696 Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
dc56afe docs: fix `cilium install` command in k3s guide [ upstream commit ab891aecefc0305b6bc421cb5e341ff3adf6dc33 ] Explicitly set `KUBECONFIG` when using `cilium install`, otherwise installation might target the wrong cluster. Follow commit 606b5fe9f49f ("docs: KUBECONFIG for cilium-cli with k3s") which already changed this in the quick start guide. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
857ae25 docs: disable k3s network policy enforcement [ upstream commit 363cee2db713a5a1be837b22bf3ca06200c2fa3e ] We already suggest to disable k3s network policy enforcement in the quick start guide, see commit 1178b04de559 ("docs: disable k3s network policy enforcement") and commit 7bd301bb5606 ("docs(k3s): add back the flag to disable network policies"). Suggest doing so in the k3s-specific advanced installation guide as well. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
c5caa23 cmd: Fix stringer for mapOption [ upstream commit ee7b4020163831baf835d82bf678e3432706d022 ] The Set() function for this struct expects the k=v format; however, the String() implementation does not return the same format, which causes issues while parsing a few fields such as kvstore-opt and api-rate-limit. Relates to cb14db88d7 Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
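A minimal Go sketch of a flag type whose String() mirrors the k=v format accepted by Set(); the type name and parsing details are illustrative, not the actual mapOption implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// mapOptions is an illustrative re-creation of the flag type: Set() parses
// comma-separated k=v pairs, and the fix makes String() emit the same format
// so values such as kvstore-opt round-trip cleanly.
type mapOptions map[string]string

func (m mapOptions) Set(value string) error {
	for _, kv := range strings.Split(value, ",") {
		k, v, ok := strings.Cut(kv, "=")
		if !ok {
			return fmt.Errorf("invalid option %q, expected key=value", kv)
		}
		m[k] = v
	}
	return nil
}

// String returns the options in the same k=v,k=v form that Set() accepts.
func (m mapOptions) String() string {
	pairs := make([]string, 0, len(m))
	for k, v := range m {
		pairs = append(pairs, k+"="+v)
	}
	sort.Strings(pairs) // deterministic output
	return strings.Join(pairs, ",")
}

func main() {
	opts := mapOptions{}
	_ = opts.Set("etcd.config=/var/lib/etcd-config/etcd.config,etcd.qps=100")
	fmt.Println(opts.String())
}
```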
39177b0 cmd: Fix issue reading string map type via config map [ upstream commit 768659f2fe1520e89330cdc3f12aa26e219e9669 ] As mentioned in the upstream issue below, there is a discrepancy in viper while reading the string map string data type, i.e. the k=v pair format was not supported; only the `{"k":"v"}` format is allowed. This commit wraps the GetStringMapString implementation to handle such cases. Also, during bootstrap, if any flag has an invalid value, a fatal log will be printed for early detection and awareness. Relates https://github.com/spf13/viper/issues/911 Fixes https://github.com/cilium/cilium/issues/18328 Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
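A hedged Go sketch of such a wrapper around viper; the function name and exact fallback rules are assumptions, not the actual Cilium helper:

```go
package sketch

import (
	"strings"

	"github.com/spf13/viper"
)

// getStringMapString is a sketch of the wrapper described above: viper only
// decodes the JSON form ({"k":"v"}), so when that yields nothing but a raw
// string value exists, fall back to parsing k=v,k=v pairs.
func getStringMapString(v *viper.Viper, key string) map[string]string {
	if m := v.GetStringMapString(key); len(m) > 0 {
		return m
	}
	out := map[string]string{}
	for _, kv := range strings.Split(v.GetString(key), ",") {
		if k, val, ok := strings.Cut(kv, "="); ok {
			out[strings.TrimSpace(k)] = strings.TrimSpace(val)
		}
	}
	return out
}
```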
5bfea24 This change fixes command-line argument processing in clustermesh-apiserver [ upstream commit d76a06b99d1727cb5807eab3ab326131ffb9011e ] Command-line arguments accessed through the option.Config object (e.g. --identity-allocation-mode, --kvstore-opt) were not processed properly. The function option.Config.Populate() was called too soon (before rootCmd.Execute() was called, but os.Args are only processed by rootCmd.Execute()). Also, the debug log-level setting was missing, so debug messages did not work at all. Signed-off-by: adam.bocim <adam.bocim@firma.seznam.cz> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
f73105a test: Report information to debug failures [ upstream commit 9020fdad47eb2894a7a11f5985087b34dc1d1160 ] Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
f36afa7 test: Add tests for local redirect policy of type address matcher [ upstream commit 66967dd5e3fdce3cddef56cd40b959552dae9a60 ] Test kiam-like use cases (LRPs selecting host-networked pods) with an address matcher LRP - https://docs.cilium.io/en/latest/gettingstarted/local-redirect-policy/#addressmatcher. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
6146f2b k8s/watchers: Fix local redirect policies selecting host networked pods [ upstream commit fcbeb6402b2ce63be373e27a216c4b6806188def ] Commits da35c88eb9f8 and 0ab4fa184d3a introduced a regression for local-redirect use cases like kiam, whereby host networked pod updates were skipped. As a result, node-local redirection for cases where LRPs select host networked pods as backends broke. Tested the fix on an EKS configured with kiam setup. Fixes: da35c88eb9f8 ("k8s/watchers: don't silently ignore (*K8sWatcher).updatePodHostData error") Fixes: 0ab4fa184d3a (pkg/k8s: ignore certain ipcache errors) Fixes: #16920 Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
a7853b1 docs: Update kiam example in local-redirect-policy doc [ upstream commit ab88f85ab9596b5eb7dadfb9a04ae26e8d1ad479 ] The configuration option is moved from extra args - https://artifacthub.io/packages/helm/uswitch/kiam/5.9.0. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
c8eae74 bugtool: Collect local redirect policy related data [ upstream commit 1287c63681ddd3d1bc36368d33d0fd3674b30ad4 ] Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
c5ca178 feat: add hands-on tutorials [ upstream commit e83c81882b4a87a54ab52355eae32ec073cea22d ] Add the following text to documentation: Hands-on tutorial in a live environment to quickly get started with Cilium. Signed-off-by: Van Le <vannnyle@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 February 2022, 12:15:55 UTC
052ceb7 datapath: Only unload obsolete XDP when attached [ upstream commit 6f9e1df113a2a00a7b80360fa9c939b9fc6c2fe4 ] Currently, we determine whether a device is attached with an XDP program by checking if `link.Attrs().Xdp == nil`. We need to check for `Xdp.Attached` as well, otherwise, the XDP unloading will be executed on innocent devices every time the agent restarts, which might further cause network interruption for some NIC drivers, e.g. Mellanox mlx5. Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
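A minimal Go sketch of the corrected check using the vishvananda/netlink types mentioned above:

```go
package sketch

import "github.com/vishvananda/netlink"

// xdpAttached sketches the corrected condition: it is not enough that the
// kernel reports XDP info for the link; Attached must also be true before the
// agent attempts to unload anything from the device.
func xdpAttached(link netlink.Link) bool {
	xdp := link.Attrs().Xdp
	return xdp != nil && xdp.Attached
}
```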
c64ac25 test: Clean up DNS pods more gracefully [ upstream commit 8302f0ec299a3f9100ac72d2d0c113466de9894a ] As of the writing of this patch, Cilium does not support transitioning between enable-endpoint-routes=true and enable-endpoint-routes=false. Commit c18cfc874620 ("test: Redeploy DNS pods in AfterAll for datapath tests") attempted to ensure that DNS pods are properly deleted from nodes prior to completing a particular test file. This was to ensure that when making changes to Cilium configuration like toggling endpoint routes mode, the pods are properly switched over to the new mode. Previously, the order of the test cleanup could cause problems with terminating the kube-dns pods because Cilium was first deleted from the cluster, but the kube-dns pods would get stuck in "Terminating" state. This is believed to be because the nodes would still attempt to call the cilium-cni binary in order to remove the DNS pods, and that would attempt to reach out to the cilium-agent on the node, which is no longer present. The motivation behind this patch, then, is to first scale the pods down to zero so that no kube-dns pods are present in the cluster during test file cleanup, then shut down Cilium. Finally, we restore the number of replicas back to the original value, so that when the next test sets Cilium back up, the kubedns pods will be ready to be deployed afresh. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
55e8ef7 helpers: Scale DNS pods up/down with replicas [ upstream commit 95a46c65223b205b1c15da21a9c560e2a2bef487 ] Use the spec.replicas field in the kube-dns deployment to reconfigure the number of DNS pods in the cluster during pod restart. This should restart the kube-dns pods correctly, similar to just deleting the pods. A subsequent patch intends to split this usage out depending the status of Cilium deployments, in order to mitigate issues where kube-dns pods are deployed differently between test runs due to Cilium configuration (like endpoint routes mode). Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
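A hedged client-go sketch of the scale-down/restore idea; the function name is illustrative and this is not the test framework's actual helper:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleDeployment remembers the current replica count of a deployment (e.g.
// kube-dns), sets a new one before tearing Cilium down, and lets the caller
// restore the saved count afterwards.
func scaleDeployment(ctx context.Context, c kubernetes.Interface, ns, name string, replicas int32) (previous int32, err error) {
	scale, err := c.AppsV1().Deployments(ns).GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		return 0, err
	}
	previous = scale.Spec.Replicas
	scale.Spec.Replicas = replicas
	_, err = c.AppsV1().Deployments(ns).UpdateScale(ctx, name, scale, metav1.UpdateOptions{})
	return previous, err
}
```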
fb882ec test: Fix CustomCalls pod teardown [ upstream commit fb2d11c445ddf3d3013da868f84db671c16e566d ] Deleting all pods after deleting Cilium is a sure-fire way to introduce flakiness where pods hang in Terminating state during deletion, because the Cilium pods have been cleaned up and kubelet can't call into the CNI to properly clean up the pods. Delete the pods first and wait for termination to finish. CC: Quentin Monnet <quentin@isovalent.com> Fixes: 9d4e99d500d0 ("tests: rework custom calls's AfterEach/AfterAll blocks to skip if needed") Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
bee0525 test: Properly skip CustomCalls test [ upstream commit 119653b08cf08f10d22e3a4466fa96e810986176 ] Previously the skip was on the Context which meant that the BeforeAll/AfterAll for the K8sCustomCalls Describe would be run, which could deploy pods and fail to clean them up properly. Skip these unnecessary steps as they can cause problems in test runs where this test is skipped, and because it's unnecessary work. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
14980ad test: Properly skip K8sBandwidth tests [ upstream commit 216bcb187afe0ac6f080e2ee74030f25e31cb5b3 ] Previously the skip was on the Context which meant that the BeforeAll/AfterAll for the K8sBandwidthTest Describe would be run, which could deploy pods and fail to clean them up properly. Skip these unnecessary steps as they can cause problems in test runs where this test is skipped, and because it's unnecessary work. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
4e279a6 test: Improve error reporting when wait fails [ upstream commit 323cb8738fe183f0a34479a76767868f40f00b6a ] When the test fails because it is waiting for containers to terminate, add this information into the error that is reported so that developers can more easily debug the issue that's occurring. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
2b6e557 test: Fix external_ips test cleanup [ upstream commit 2ae89c185c5a14e2db354434d461659120866fa4 ] This test seems like it's always skipped, so who knows whether it works or not. But this should at least mitigate issues with cleaning up state at the end of the test, similar to the issues in the adjacent commits. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
00ea832 test: Fix pod cleanup after LRP test group [ upstream commit 6c3bf645629d52250570b8163062f337cd27675e ] According to the Ginkgo e2e test docs, all "AfterEach" statements are executed before all "AfterAll" statements [0]. Previously, this code was assuming that the AfterAll inside a context would run first (to delete pods), then the AfterEach outside the context would run (to wait for the pods to terminate), and then the AfterAll outside the context would finally clean up the Cilium pods. The result was that in issue 18447, the next test to run could hit a race condition where pods were deleted but did not fully terminate before Cilium was removed. Cilium ends up getting deleted before all the pods, which means that there is no longer any way to execute the CNI DEL operation for those pods, so they get stuck in Terminating state. Fix it by moving the check that the test pods are fully terminated into the AfterAll statement where we initiate the deletion of those pods, so that it completes before that statement returns. [0]: https://docs.cilium.io/en/latest/contributing/testing/e2e/#example-test-layout Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
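The structural fix reads roughly like the Ginkgo sketch below, written against upstream Ginkgo v2 with invented helper names; Cilium's e2e suite uses its own ginkgo-ext wrappers, but the point is the same: the test pods are deleted and waited on inside the same cleanup block, before Cilium itself is removed.

```go
package lrp_test

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

func TestLRP(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "K8sLRP suite")
}

// Stand-ins for the suite's real helpers.
func deployCilium()                    {}
func removeCilium()                    {}
func deployTestPods()                  {}
func deleteTestPods()                  {}
func waitForTestPodTermination() error { return nil }

var _ = Describe("K8sLRPTest", Ordered, func() {
	BeforeAll(func() {
		deployCilium()
		deployTestPods()
	})

	It("redirects traffic to a local backend", func() {
		// ... test body ...
	})

	AfterAll(func() {
		// Delete the test pods AND wait for them to fully terminate
		// before Cilium is removed. If Cilium went away first, kubelet
		// could no longer run the CNI DEL operation for the pods and
		// they would hang in Terminating state.
		deleteTestPods()
		Expect(waitForTestPodTermination()).To(Succeed())

		removeCilium()
	})
})
```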
a54c2c0 test/runtime: fix flake on non-ready endpoints [ upstream commit f49dda790824eba09b0f030de17f18f8184ff469 ] Ensure that all endpoints exist and are ready before attempting to extract a specific endpoint from GetAllEndpointsIds results. Follows commit 3738680a91eb ("test/runtime: Fix flake on reserved:init endpoints") which introduced a similar check before calls to GetEndpointIds. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
6db1feb test/runtime: define initContainer as const [ upstream commit b6a41d2f8689ef2aae87de5f85aa0a778690f4f3 ] The init container name doesn't change during tests, so define it const. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
e38f9d9 metrics: Expose xfrm stats via prometheus metrics [ upstream commit 7818a5f7805131b40b941c923a34e046fdb565b7 ] This commit exposes XFRM stats via Prometheus metrics when IPsec is enabled. Fixes: #14725 Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
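A rough stand-alone sketch of the idea, reading a couple of counters from /proc/net/xfrm_stat with the prometheus/procfs package and exporting them as gauges; this is not Cilium's actual collector, and the metric and label names are invented.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/prometheus/procfs"
)

// xfrmErrors mirrors a subset of /proc/net/xfrm_stat as gauges.
var xfrmErrors = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "ipsec_xfrm_error", // illustrative name, not Cilium's
	Help: "Number of XFRM errors by type",
}, []string{"error"})

func collect() error {
	stats, err := procfs.NewXfrmStat() // parses /proc/net/xfrm_stat
	if err != nil {
		return err
	}
	xfrmErrors.WithLabelValues("XfrmInError").Set(float64(stats.XfrmInError))
	xfrmErrors.WithLabelValues("XfrmOutError").Set(float64(stats.XfrmOutError))
	return nil
}

func main() {
	prometheus.MustRegister(xfrmErrors)

	// Refresh the gauges periodically in the background.
	go func() {
		for range time.Tick(10 * time.Second) {
			if err := collect(); err != nil {
				log.Println("collecting xfrm stats:", err)
			}
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```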
00d45e6 doc: harmonize managed Kubernetes provider names [ upstream commit c22b8d2a44f01db1aaa5b75459b658ca8d44f004 ] Before this patch, the documentation was inconsistent in how it named managed Kubernetes providers. Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
b2268b2 doc: fix copy/paste from the EKS Quick Installation guide [ upstream commit d201abb0cc1fca1222f47b9f7726b0317574d891 ] When copying from a shell-session code block, hash (#) characters are omitted. In this case, the hash characters are used to comment lines in a here doc YAML file and must be included. Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
314d4ec doc: add a Requirements section for EKS [ upstream commit 6985c49dd60748f81cccf33aab3589bbb4ee6b67 ] To bring it in line with other provider documentation. Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
a921142 doc: remove cluster creation from EKS requirements [ upstream commit 33689da45955a19d07b90b1d1410b6df6012293a ] Before this patch, the EKS requirements documentation would also include creating a new cluster with eksctl. Because of that, the Quick Installation guide would have the cluster creation instructions duplicated, and the Helm installation guide would include the cluster creation (only for EKS). Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
3bdd1d2 contrib: Fix backport submission for own PRs [ upstream commit 1b42f7a0cb61208d9070313e526a769983fe5b59 ] On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding one's own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 04 February 2022, 15:21:52 UTC
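The core of the fix is simple enough to sketch in a few lines: drop the PR author from the requested-reviewers list before calling the GitHub API. This is conceptual only; the real change lives in the contrib/backporting shell scripts, and the usernames below are just examples.

```go
package main

import "fmt"

// filterOutAuthor returns the reviewer list with the PR author removed,
// since GitHub rejects review requests for the author (HTTP 422).
func filterOutAuthor(reviewers []string, author string) []string {
	out := make([]string, 0, len(reviewers))
	for _, r := range reviewers {
		if r != author {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	fmt.Println(filterOutAuthor([]string{"joestringer", "aanm"}, "joestringer"))
}
```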
956676a build(deps): bump docker/build-push-action from 2.8.0 to 2.9.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.8.0 to 2.9.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/1814d3dfb36d6f84174e61f4a4b05bd84089a4b9...7f9d37fa544684fb73bfe4835ed7214c255ce02b) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 03 February 2022, 15:37:18 UTC
c985f58 workflows: allow CI to run on fork repositories [ upstream commit cc85af87ac65afebf047d1d7a20a7ce2aa16c4d6 ] Currently CI will run only on the base cilium/cilium repo, due to a check in most workflows in the form of: if: ${{ github.repository == 'cilium/cilium' }} The original intent of this check was to avoid running CI on forks, since workflow secrets are usually not configured, resulting in multiple failures. It turned out that the ability to run CI on forks is something useful to have, and as a bonus, it would ease the development and maintenance of new and existing workflows. Because of this, this commit removes the aforementioned check. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 27 January 2022, 16:18:41 UTC
22d0f1a workflows: enable CI for feature branches [ upstream commit c7d983637515c1010cc7db4d3c0ed75e8abb5a88 ] This commit enables CI for all feature branches based on the v1.10 one. The naming convention for the base feature branch is: ft/v1.10/<feature> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 27 January 2022, 16:18:41 UTC
94002a1 update k8s library versions k8s 1.19.16, 1.20.15 and 1.21.9 Signed-off-by: André Martins <andre@cilium.io> 25 January 2022, 15:18:57 UTC
1c4e32d install: Update image digests for v1.10.7 Generated from https://github.com/cilium/cilium/actions/runs/1715612597. `docker.io/cilium/cilium:v1.10.7@sha256:e23f55e80e1988db083397987a89967aa204ad6fc32da243b9160fbcea29b0ca` `quay.io/cilium/cilium:v1.10.7@sha256:e23f55e80e1988db083397987a89967aa204ad6fc32da243b9160fbcea29b0ca` `docker.io/cilium/clustermesh-apiserver:v1.10.7@sha256:9afb0a15afffdf84812c8174df9de86e35239fb87a6ffd9539877a9e643d8132` `quay.io/cilium/clustermesh-apiserver:v1.10.7@sha256:9afb0a15afffdf84812c8174df9de86e35239fb87a6ffd9539877a9e643d8132` `docker.io/cilium/docker-plugin:v1.10.7@sha256:7178d952e22c5fadd42dab3e0ee5e174c922cb811d9f5c01143fb0227bb42ad6` `quay.io/cilium/docker-plugin:v1.10.7@sha256:7178d952e22c5fadd42dab3e0ee5e174c922cb811d9f5c01143fb0227bb42ad6` `docker.io/cilium/hubble-relay:v1.10.7@sha256:385fcc4fa315eb6b66626c3e5f607b6b6514c8c3a863c47c2b2dbc97790acb47` `quay.io/cilium/hubble-relay:v1.10.7@sha256:385fcc4fa315eb6b66626c3e5f607b6b6514c8c3a863c47c2b2dbc97790acb47` `docker.io/cilium/operator-alibabacloud:v1.10.7@sha256:7a6ccc99195ae6a8216d2a1e1e0cc05d49c2d263b194895da264899fe9d0f45a` `quay.io/cilium/operator-alibabacloud:v1.10.7@sha256:7a6ccc99195ae6a8216d2a1e1e0cc05d49c2d263b194895da264899fe9d0f45a` `docker.io/cilium/operator-aws:v1.10.7@sha256:97b378e0e3b6b5ade6ae1706024c7a25fe6fc48e00102b65a6b7ac51d6327f40` `quay.io/cilium/operator-aws:v1.10.7@sha256:97b378e0e3b6b5ade6ae1706024c7a25fe6fc48e00102b65a6b7ac51d6327f40` `docker.io/cilium/operator-azure:v1.10.7@sha256:556d692b2f08822101c159d9d6f731efe6c437d2b80f0ef96813e8745203c852` `quay.io/cilium/operator-azure:v1.10.7@sha256:556d692b2f08822101c159d9d6f731efe6c437d2b80f0ef96813e8745203c852` `docker.io/cilium/operator-generic:v1.10.7@sha256:d0b491d8d8cb45862ed7f0410f65e7c141832f0f95262643fa5ff1edfcddcafe` `quay.io/cilium/operator-generic:v1.10.7@sha256:d0b491d8d8cb45862ed7f0410f65e7c141832f0f95262643fa5ff1edfcddcafe` `docker.io/cilium/operator:v1.10.7@sha256:cd80afc7a5a7a70130fad0ef61977fb3dc42f8fb73201ce244b0f39843ab4b82` `quay.io/cilium/operator:v1.10.7@sha256:cd80afc7a5a7a70130fad0ef61977fb3dc42f8fb73201ce244b0f39843ab4b82` Signed-off-by: Joe Stringer <joe@cilium.io> 19 January 2022, 01:58:18 UTC
3e77756 Prepare for release v1.10.7 Signed-off-by: Joe Stringer <joe@cilium.io> 19 January 2022, 00:48:33 UTC
58f5aee build(deps): bump docker/build-push-action from 2.7.0 to 2.8.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.7.0 to 2.8.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/a66e35b9cbcf4ad0ea91ffcaf7bbad63ad9e0229...1814d3dfb36d6f84174e61f4a4b05bd84089a4b9) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 18 January 2022, 20:32:18 UTC
42df137 bpf: Reset Pod's queue mapping in host veth to fix phys dev mq selection [ upstream commit ecdff123780dcc50599e424cbbc77edf2c70e396 ] Fix a TX queue selection problem on the physical device as reported by Laurent. At high throughput, they noticed a significant amount of TCP retransmissions that they tracked back to qdisc drops (fq_codel was used). The suspicion is that kernel commit edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence") caused this due to its unconditional skb_record_rx_queue() which sets the queue mapping to 1, and thus this gets propagated all the way to the physical device, hitting only a single queue in a multi-queue (mq) device. Let's have bpf_lxc reset it as a workaround until we have a kernel fix. Doing this unconditionally is good anyway in order to avoid Pods messing with TX queue selection. The kernel will catch up with the fix in 710ad98c363a ("veth: Do not record rx queue hint in veth_xmit"). Fixes: #18311 Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Laurent Bernaille <laurent.bernaille@datadoghq.com> Link (Bug): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=edbea922025169c0e5cdca5ebf7bf5374cc5566c Link (Fix): https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=710ad98c363a66a0cd8526465426c5c5f8377ee0 Signed-off-by: Aditi Ghag <aditi@cilium.io> 18 January 2022, 16:02:50 UTC
e131335 test: bump l4lb kind in Vagrantfile to 0.11.1 [ upstream commit 018c94536f27c868a7795c6ba66c50559692ce28 ] The 0.11.1 release bumps the base ubuntu image to 21.04 [1], which should fix the issue we are seeing with the current test: ++ docker exec -i kind-control-plane /bin/sh -c 'echo $(( $(ip -o l show eth0 | awk "{print $1}" | cut -d: -f1) ))' [..] Reading package lists... E: The repository 'http://security.ubuntu.com/ubuntu groovy-security Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu groovy Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu groovy-updates Release' does not have a Release file. E: The repository 'http://archive.ubuntu.com/ubuntu groovy-backports Release' does not have a Release file. Error: Process completed with exit code 100. [1] https://github.com/kubernetes-sigs/kind/releases/tag/v0.11.1 Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io> 18 January 2022, 16:02:50 UTC
d1c416a Fix possible IP leak in case ENIs are not present in the CN yet [ upstream commit aea1b9f24ade9068711a1a555c7345188b9f736b ] buildAllocationResult may return an error in case of inconsistencies found in the local CN's status. For example, there are situations where an IP is already part of spec.ipam.pool (including the resource/ENI where the IP comes from), while the corresponding ENI is not part of status.eni.enis yet. If that is the case, the IP would be allocated (e.g. by allocateNext) and then marked as allocated (via a.markAllocated). Shortly after that, a.buildAllocationResult() would fail and then NOT undo the changes done by a.markAllocated(). This will then result in the IP never being freed up again. At the same time, kubelet will keep scheduling PODs onto the same node without knowing that IPs have run out, thus causing new PODs to never get an IP. Why exactly this inconsistency between the spec and the status arises is a different topic and should maybe be investigated further. This commit/PR fixes the issue by simply moving a.markAllocated() after the a.buildAllocationResult() call, so that the function bails out early enough. Some additional info on how I encountered this issue and how it might be reproduced: We have a cluster running that does automatic downscaling of all deployments at night and then relies on cluster-autoscaler to also shut down nodes. The next morning, all deployments are upscaled again, causing cluster-autoscaler to also start many nodes at once. This causes many nodes to appear in k8s at the same time, all being `NotReady` at the beginning. Cilium agents are then started on each node. When cilium agents start to get ready, the nodes are also marked `Ready`, causing the k8s scheduler to immediately schedule dozens of PODs onto the `Ready` nodes, long before cilium-operator had a chance to attach new ENIs and IPs to the fresh nodes. This means that all PODs scheduled to the fresh nodes run into a temporary state where the CNI plugin reports that there are no more IPs available. All this is expected and normal until this point. After a few seconds, cilium-operator finishes attaching new ENIs to the fresh nodes and then tries to update the CN. The update to the spec.pool seems to be successful then, causing the agent to allocate the IP. But as the update to the status seems to fail, the agent then bails out with the IP being marked as used and thus causing the leak. This only happens under very high load on the apiserver. At the same time, I can observe errors like these happening in cilium-operator: ``` level=warning msg="Failed to update CiliumNode" attempt=1 error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"ip-100-66-62-168.eu-central-1.compute.internal\": the object has been modified; please apply your changes to the latest version and try again" instanceID=i-009466ca3d82a1ec0 name=ip-100-66-62-168.eu-central-1.compute.internal subsys=ipam updateStatus=true ``` Please note the `attempt=1` in the log line; it indicates that the first attempt also failed and that no further attempt is made (looking at the many `for retry := 0; retry < 2; retry++` loops found in the code). I assume (without knowing for certain) that this is the reason for the inconsistency in spec vs status. Signed-off-by: Alexander Block <ablock84@gmail.com> Signed-off-by: Aditi Ghag <aditi@cilium.io> 18 January 2022, 16:02:50 UTC
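The ordering change is easier to see in a condensed sketch with hypothetical types and method names standing in for the real IPAM code: the address is only recorded as allocated once building the allocation result has succeeded, so a spec/status inconsistency no longer leaks the IP.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

type AllocationResult struct {
	IP net.IP
}

type allocator struct {
	allocated map[string]bool
}

// buildAllocationResult stands in for the step that can fail when the ENI
// backing the IP is not present in the CiliumNode status yet.
func (a *allocator) buildAllocationResult(ip net.IP) (*AllocationResult, error) {
	if ip == nil {
		return nil, errors.New("ENI for IP not found in status")
	}
	return &AllocationResult{IP: ip}, nil
}

func (a *allocator) markAllocated(ip net.IP) { a.allocated[ip.String()] = true }

// allocate builds the result first and only then records the IP as used,
// mirroring the fix of moving markAllocated after buildAllocationResult.
func (a *allocator) allocate(ip net.IP) (*AllocationResult, error) {
	result, err := a.buildAllocationResult(ip)
	if err != nil {
		// Bail out before touching allocation state: no leak on failure.
		return nil, err
	}
	a.markAllocated(ip)
	return result, nil
}

func main() {
	a := &allocator{allocated: map[string]bool{}}
	if _, err := a.allocate(nil); err != nil {
		fmt.Println("allocation failed cleanly:", err)
	}
	fmt.Println("allocated IPs:", a.allocated)
}
```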
67545c3 images: update cilium-{runtime,builder} for Go 1.16.13 While at it also bump the ubuntu:20.04 image to latest version to pick up updated systemd packages. Signed-off-by: Tobias Klauser <tobias@cilium.io> 17 January 2022, 16:40:26 UTC
b59c73f Update Go to 1.16.13 Signed-off-by: Tobias Klauser <tobias@cilium.io> 17 January 2022, 16:40:26 UTC
6d1bc27 egressgateway: fix initial reconciliation [ upstream commit ab9bfd71c9cb552445555167375b27c610ee19c6 ] When a new egress gateway manager is created, it will wait for the k8s cache to be fully synced before running the first reconciliation. Currently the logic is based on the WaitUntilK8sCacheIsSynced method of the Daemon object, which waits for the k8sCachesSynced channel to be closed (which indicates that the cache has indeed been synced). The issue with this approach is that the Daemon object is passed to the NewEgressGatewayManager method _before_ its k8sCachesSynced channel is properly initialized. This in turn causes the WaitUntilK8sCacheIsSynced method to never return. Since NewEgressGatewayManager must be called before that channel is initialized, we need to switch to a polling approach, where the k8sCachesSynced state is checked periodically. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 17 January 2022, 16:11:56 UTC
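A minimal sketch of the polling approach, with invented names rather than the actual egress gateway manager code: instead of blocking on a channel that may not have been created when the manager was built, the manager periodically checks a flag until the cache reports as synced, then runs the first reconciliation.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// cacheSynced stands in for the daemon's "k8s caches synced" state, which
// can be queried at any time, even before the daemon finishes its setup.
var cacheSynced atomic.Bool

// waitForK8sCacheSync polls the synced flag instead of waiting on a
// channel that might not have been initialized yet.
func waitForK8sCacheSync(interval time.Duration) {
	for !cacheSynced.Load() {
		time.Sleep(interval)
	}
}

func main() {
	// Simulate the daemon completing its k8s cache sync after a while.
	go func() {
		time.Sleep(3 * time.Second)
		cacheSynced.Store(true)
	}()

	waitForK8sCacheSync(500 * time.Millisecond)
	fmt.Println("k8s cache synced, running first egress gateway reconciliation")
}
```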