https://github.com/cilium/cilium

fceed3f Prepare for release v1.10.13 Signed-off-by: Joe Stringer <joe@cilium.io> 15 July 2022, 17:24:54 UTC
19569fe test: More fine-grained host policies [ upstream commit 61f15b36a01431869d5b7654714c08bc40f3d2eb ] The previous commit fixed a bug in the host firewall where IPv4 traffic from local pods would be incorrectly processed as IPv6 traffic. That path is covered by several end-to-end tests in our CI but none of them failed. In those tests, we perform a TCP request on port 80 and a UDP request on port 69; the first is allowed by policies, while the second isn't and should be dropped. The requests are allowed and denied as expected on the bogus path. So let's take a look at how that's possible. We don't have any policy verdict for the TCP request, but we have one for the reply. So the TCP SYN packet skipped policy enforcement altogether. $ ks exec cilium-cgg9s -- cilium monitor -t policy-verdict Policy verdict log: flow 0xd9cfbd98 local EP ID 1630, remote ID 3812, proto 6, egress, action allow, match L3-Only, 192.168.56.11:80 -> 10.0.0.240:39134 tcp SYN, ACK Because of the bug, the IPv4 SYN packet went through ipv6_host_policy_ingress. The first step of that function was to lookup the destination security ID using the destination IP address. If the destination security ID is *not* HOST_ID, then host policy enforcement is skipped. Unsurprisingly, looking up an IPv4 address in the ipcache with an IPv6 prefix returns nothing and the security ID is set to WORLD_ID. Host policies are thus skipped and the SYN packet goes through. Now, why was the SYN+ACK allowed by policies? The above policy verdict says it comes from an L3 rule. Looking at the policies, we do have an L3 egress rule to allow pods not involved in the test to communicate with the host (on egress, only hostns -> zgroup=testserver requests are expected): - toEndpoints: - matchExpressions: - key: zgroup operator: NotIn values: [testServer] Unfortunately, that ends up matching the return traffic of the zgroup=testclient -> hostns connection. But then, why wasn't the UDP request allowed through as well? That would have failed the test and we would have caught this bug. If we look at policy verdicts for the UDP request, we do find the same sort of allow: $ ks exec cilium-cgg9s -- cilium monitor -t policy-verdict Policy verdict log: flow 0x0 local EP ID 1630, remote ID 3812, proto 17, egress, action allow, match L3-Only, 10.0.2.15:69 -> 10.0.0.240:54715 udp So the request isn't blocked by policies. If we look at drops: $ ks exec cilium-cgg9s -- cilium monitor -t drop xx drop (Invalid packet) flow 0x0 to endpoint 0, , identity unknown->unknown: 10.0.0.240:49043 -> 10.0.2.15:69 udp The most likely reason for an Invalid packet drop is that we received a packet that is too small (e.g., we're trying to read the IPv4 header but the packet is smaller than the minimum IPv4 length). So in this case, we received an IPv4+UDP packet that is smaller than the minimal IPv6 length and thus dropped it. Hence, the UDP connection fails, as expected by our test. This commit improves the test by making sure the L3 rule applies only to pods not involved in the test, neither as servers nor as clients. Signed-off-by: Paul Chaignon <paul@cilium.io> 15 July 2022, 16:45:09 UTC
a8d84ac bpf: Fix typo in host firewall tail call [ upstream commit 56bbbedfb8adf49ee95dca2ce336f7735104ccd4 ] There's a typo in the host firewall tail call that can lead to us executing the IPv6 code for an IPv4 packet when enforcing host policies for a packet coming from a local pod, when IPv4, IPv6, and endpoint routes are all enabled. The typo was found while working on splitting bpf_host into more BPF programs. The next commit explains why the end-to-end tests didn't uncover this bug and improves those tests. Fixes: 5b05cc92 ("bpf: Split handle_lxc_traffic with tail calls") Signed-off-by: Paul Chaignon <paul@cilium.io> 15 July 2022, 16:45:09 UTC
52d8326 images: update cilium-{runtime,builder} Signed-off-by: Joe Stringer <joe@cilium.io> 15 July 2022, 16:19:42 UTC
e3946ad docs: Bump up Netlify Python version to 3.8 [ upstream commit 72374ab9c5f1bcc22a87babed5a76de9c015d1e7 ] Python 3.7 is no longer supported. Ref: https://github.com/netlify/build-image/blob/focal/included_software.md Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
fea9f05 Add Peer Service to Cilium DS Port List [ upstream commit bb250dcd3edc3c9cd3ec3113ee11dd3f18253e57 ] [ Backporter's notes: moved to cilium-agent-daemonset.yaml ] For the sake of documenting the Peer Service port. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
d9642e6 daemon: Fix issues where stale router IPs were not cleaned up [ upstream commit bcf5f25948827bce437c61eff2efbc41e55662fd ] Previously, removeOldRouterState would only try to clean up stale router IPs if the `cilium_host` device had more than one address assigned to it. The implementation however is incorrect if router IP restoration failed, because if restoration failed, then `restoredIP` would be nil and the function would assume the IPv6 address family, thus only trying to remove IPv6 addresses. But even with the proper address family, the existing logic seems to have a bug if there is only one single stale IP on the `cilium_host` device: If restoration failed and there is only one IP on the host device, then the IP assigned to the device cannot be the restored one, thus it needs to be removed. If the single stale IP is not removed and remains on the device, once a new router IP is allocated by `allocateRouterIPv{4,6}`, and after a new router IP is added to the interface in `init.sh` (via `Daemon.init`), then `cilium_host` device will wrongly end up with two router IPs (one of which is stale). Fixes: fcd00390c30c ("daemon, node: Remove old, discarded router IPs from `cilium_host`") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
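The cleanup logic described above can be sketched in isolation. The following is a minimal illustration (plain Go with invented names, not Cilium's actual removeOldRouterState): the address family comes from configuration rather than from a possibly-nil restored IP, and every address of that family that is not the restored IP is considered stale, which also covers the single-stale-IP case.

```go
package main

import (
	"fmt"
	"net"
)

// staleRouterIPs returns the addresses on cilium_host that should be removed:
// every address of the requested family that is not the restored router IP.
// When restoration failed (restored == nil), all addresses of that family are
// stale, which handles the single-stale-IP case described above.
func staleRouterIPs(addrs []net.IP, restored net.IP, ipv6 bool) []net.IP {
	var stale []net.IP
	for _, a := range addrs {
		isV6 := a.To4() == nil
		if isV6 != ipv6 {
			continue // wrong family for this pass
		}
		if restored != nil && a.Equal(restored) {
			continue // keep the restored router IP
		}
		stale = append(stale, a)
	}
	return stale
}

func main() {
	addrs := []net.IP{net.ParseIP("10.0.0.83"), net.ParseIP("fd00::175")}
	// IPv4 restoration failed: the lone IPv4 address must still be removed.
	fmt.Println(staleRouterIPs(addrs, nil, false)) // [10.0.0.83]
}
```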
6cc171a Use '.Values.enableIPv4Masquerade' in 'node-init'. [ upstream commit 2aa01c274056fd094459e19a5662197971ab0e65 ] [ Backporter's notes: applied to cilium-nodeinit-daemonset.yaml ] I'm not 100% sure if this was overlooked during the `masquerade` → `enableIPv4Masquerade` renaming (https://github.com/cilium/cilium/pull/14124), but assuming that was the case, this commit changes the reference from the old value to the new one so the value of `enableIPv4Masquerade` is actually taken into account in `node-init`. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
ac5be1f docs: Document clustermesh datapath configuration for non-tunneled modes [ upstream commit efcac8fec83c6b2551066b3f50a821a3e2124847 ] Add a section for cluster addressing requirements for non-tunneled datapath modes. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
955c70f docs: Improve policy troubleshooting guide [ upstream commit dafb365c1489deca96a7291a035d8a166e6587f1 ] Update this section of the troubleshooting guide to give pointers to users what to do when policy is not enforced against certain pods. For hostNetwork, this means potentially using host policies. For pods started before Cilium, this means restarting the pods. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
668454a ctmap: Do not use nil locks [ upstream commit 09f13a0ed6ff491749827fd64a180e7d9b1a65c3 ] natMapLock is nil for local CT maps and attempting to access a nil lock results in a SIGSEGV: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d6df5b] goroutine 1358 [running]: github.com/cilium/cilium/pkg/maps/ctmap.doGC4(0xc004c510e0, 0xc00284ffa8) /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:429 +0xbb github.com/cilium/cilium/pkg/maps/ctmap.doGC(0x10?, 0x2aecbc0?) /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:532 +0xf1 github.com/cilium/cilium/pkg/maps/ctmap.GC(0xc0045a80f0?, 0xc00284ffa8?) /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:552 +0x6c github.com/cilium/cilium/pkg/maps/ctmap/gc.runGC(0xc0044a9180, 0x1, 0x1, 0x0, 0x1?) /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:195 +0x54f github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable.func1() /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:103 +0x646 created by github.com/cilium/cilium/pkg/maps/ctmap/gc.Enable /go/src/github.com/cilium/cilium/pkg/maps/ctmap/gc/gc.go:45 +0x231 Fix this by only attempting to lock if the lock is not nil. Fixes: 18952 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
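The shape of the fix is simple; a minimal sketch (illustrative, not the actual ctmap code) of guarding a possibly-nil lock before taking it:

```go
package main

import (
	"fmt"
	"sync"
)

// gcNATEntries sketches the fix described above: local CT maps have no
// associated NAT map lock, so the lock is only taken when it is non-nil
// instead of dereferencing a nil pointer.
func gcNATEntries(natLock *sync.Mutex) {
	if natLock != nil {
		natLock.Lock()
		defer natLock.Unlock()
	}
	// ... walk the map and delete orphaned NAT entries ...
	fmt.Println("GC pass done")
}

func main() {
	gcNATEntries(nil)           // local CT map: no NAT lock, must not panic
	gcNATEntries(&sync.Mutex{}) // global CT map: lock held during the pass
}
```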
87940df docs: remove "custom taint scenario" for AKS clusters using Azure ipam [ upstream commit 4ac0c321a1c14070cc2653e6c207ed9838f01406 ] Our previously recommended scenario for creating new AKS clusters with proper taint management is voided due to https://github.com/Azure/AKS/issues/2934. We switch to not recommending using taints anymore and instead warn users about the limits of not using taints and the fact we have no standard and foolproof alternative to recommend. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
1c4d81d docs: add BYOCNI as preferred option in AKS instructions [ upstream commit d8259c1a806965c8e23f0b355a1ee99884796717 ] Following up on the previous commit: we add documentation related to the new AKS BYOCNI capabilities, and recommend it as the preferred installation method for AKS clusters. Since users might still want to use the integration offered by Azure IPAM, we elect to keep both the older Azure IPAM-based AKS installation method and the new BYOCNI-based installation method in the documentation. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
85c36b0 helm: add new `aksbyocni.enabled` integration [ upstream commit 05fff61afa492a9897b0baf33e9d9f87be01b43d ] [ Backporter's notes: regenerated docs ] Add a new "datapath mode" to the Helm chart for AKS clusters using the new BYOCNI feature. This integration is specifically added to avoid confusion with the older Azure integration, as AKS clusters created in BYOCNI mode cannot use the Azure IPAM (Azure API not available in BYOCNI mode). Context: Microsoft has recently released support for "Bring your own CNI" when creating AKS clusters, allowing users to create clusters with no CNI plugin pre-installed. This is extremely useful for Cilium as previously setting up an AKS cluster and installing Cilium was very complicated, due to the need to use multiple nodepools with a complex taint system in order to ensure application pods would not get scheduled before Cilium agents were ready to manage connectivity in the cluster. The BYOCNI feature has been available in preview as an `az` CLI extension for some time now, and Microsoft has published a dedicated documentation page outlining the prerequisites and implications (notably in terms of support policy) of BYOCNI: https://docs.microsoft.com/en-us/azure/aks/use-byo-cni?tabs=azure-cli BYOCNI in its current state is a strong enough proposal to warrant a change to the recommendations we make to our users for newly created AKS clusters. Here are the changes necessary to run Cilium on an AKS cluster in BYOCNI mode: - Azure IPAM does not work in BYOCNI mode as the Azure API is not available. Instead of using the Azure operator, we must use the generic operator with Cluster Pool IPAM. - Direct routing does not work in BYOCNI mode. Instead, we must use VXLAN encapsulation. To this end, we simply create a new Helm integration completely separate from the previous `azure.enabled`, since the old method and the new BYOCNI methods are mutually exclusive due to Azure IPAM not being available. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 15 July 2022, 10:32:06 UTC
cabc658 datapath: Create sysctl `rp_filter` overwrite config on agent init SystemD versions greater than 245 will create sysctl config which sets the `rp_filter` value for all network interfaces to 1. This conflicts with Cilium, which requires `rp_filter` to be 0 on interfaces it uses. This commit adds a small utility/tool, `sysctlfix`, which will insert a config file into the `/etc/sysctl.d` dir with the highest priority, containing directives to disable `rp_filter` and perhaps other sysctl config in the future. This utility is called as an init container before the cilium agent starts. Because the sysctl config is in place before the agent starts, all interfaces created by the agent and matching the pattern in the config file will have `rp_filter` disabled, even when SystemD >=245 is installed. Fixes: #10645 Fixes: #19909 Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 15 July 2022, 01:44:04 UTC
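A rough sketch of the idea behind such an init step, under the assumption that a lexically-last file in /etc/sysctl.d overrides distribution defaults; the file name and interface patterns below are illustrative assumptions, not necessarily what Cilium writes:

```go
package main

import (
	"log"
	"os"
	"path/filepath"
)

func main() {
	// Directives that keep rp_filter disabled on Cilium-managed interfaces.
	// The interface patterns are assumptions for illustration.
	conf := "# Written by an init container before the agent starts.\n" +
		"net.ipv4.conf.lxc*.rp_filter = 0\n" +
		"net.ipv4.conf.cilium_*.rp_filter = 0\n"

	// A high lexical priority ensures this file is applied after (and thus
	// overrides) the systemd >= 245 default that sets rp_filter = 1.
	path := filepath.Join("/etc/sysctl.d", "99-zzz-override-rp-filter.conf")
	if err := os.WriteFile(path, []byte(conf), 0o644); err != nil {
		log.Fatal(err)
	}
}
```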
3f822c2 update k8s versions to the latest releases Update k8s library versions: - 1.21.14 Followed the steps mentioned in https://docs.cilium.io/en/latest/contributing/development/dev_setup/#patch-version Signed-off-by: André Martins <andre@cilium.io> 14 July 2022, 21:33:52 UTC
aaa72b1 build(deps): bump actions/cache from 3.0.4 to 3.0.5 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.4 to 3.0.5. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d...0865c47f36e68161719c5b124609996bb5c40129) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 14 July 2022, 15:09:58 UTC
11a295b cilium: update cilium-{runtime,builder} Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 12 July 2022, 16:09:43 UTC
c3522d3 image/runtime: Fix kube proxy and Cilium iptables and nftables collision [ upstream commit 369f3f917ec9f2d9bb46dde2e4aaba64934d60a4 ] Cilium currently chooses to use iptables-legacy or iptables-nft using an iptables-wrapper script. The script currently does a simple check to see if there are more than 10 rules in iptables-legacy and if so picks legacy mode. Otherwise it will pick whichever has more rules, nft or legacy. See [1] for the original wrapper this is taken from. This however can be problematic in some cases. We've hit an environment where arguably broken pods are inserting rules directly into iptables without checking legacy or nft. This can happen in cases of pods that are older, for example, and use an older package of iptables before 1.8.4 that was buggy or missing nft altogether. At any rate, when this happens it becomes a race to see what pods come online first and insert rules into the table, and if it's greater than 10, Cilium will flip into legacy mode. This becomes painfully obvious if the agent is restarted after the system has been running and these buggy pods already created their rules. At this point Cilium may be using legacy while kube-proxy and kubelet are running in nft space (more on why this is bad below). We can quickly check this from a sysdump with a few one-liners: $ find . -name iptables-nft-save* | xargs wc -l 1495 ./cilium-bugtool-cilium-1234/cmd/iptables-nft-save--c.md $ find . -name iptables-save* | xargs wc -l 109 ./cilium-bugtool-cilium-1234/cmd/iptables-save--c.md Here we see that a single node has a significant number of rules in both nft and legacy tables. In the above example we dove into the legacy table and found the normal CILIUM-* chains and rules. Then in the nft tables we see the standard KUBE-PROXY-* chains and rules. Another scenario where we can create a similar problem is with an old kube-proxy. In this hypothetical scenario the user upgrades to a new distribution/kernel with a base iptables image that points to iptables-nft. This will cause kubelet to use nft tables, but because of the older version of kube-proxy it may use iptables. Now kubelet and kube-proxy are out of sync. Now how should Cilium pick nft or legacy? Let's analyze the two scenarios. Assume Cilium and kube-proxy pick differently. First we might ask what runs first, nft or iptables. From the kernel side it's unclear to me. The hooks are run walking an array, but it appears those hooks are registered at runtime. So it's up to which hooks register first. And hooks register at init, so now we are left wondering which of nft or legacy registers first. This may very well depend on whether iptables-legacy or iptables-nft runs first, because the init of the module is done on demand with a request_module helper. So, bottom line, ordering is fragile at best. For this discussion let's assume we can't make any claims on whether nft or iptables runs first. Next, let's assume kube-proxy is in nft and Cilium is in legacy and nft runs first. Now this will break Cilium's expectation that the rules for Cilium are run before kube-proxy and any other iptables rules. The result can be drops in the datapath. The example that led us on this adventure is IPsec traffic hitting a kube-proxy -j DROP rule because it never ran the Cilium -j ACCEPT rule we expected to be inserted into the front of the chain. So clearly this is no good. Just to cover our cases, consider Cilium is run first and then kube-proxy is run. Well, we are still stuck: from the kernel code side the hooks are executed in a for loop, and an ACCEPT will run the next hook instead of the normal behavior of accepting the skb and not running any further rules. The next hook in this case will have the kube-proxy rules and we hit the same -j DROP rule again. Finally, because we can't depend on the order of nft vs legacy running, it doesn't matter if Cilium and kube-proxy flip to put Cilium on nft and kube-proxy on legacy; we get the same problem. Because Cilium and kube-proxy are coupled, in that they both manage iptables for datapath flows, they need to be on the same hook. We could try to do this by doing [2] and following kubelet, and assuming kube-proxy does the same, everything should be OK. The problem is if kube-proxy is not updated and doesn't follow kubelet, we again get stuck with Cilium and kube-proxy using different hooks. To fix this case, modify [2] so that Cilium follows kube-proxy instead of following kubelet. This will force Cilium and kube-proxy to at least choose the same hook and avoid the faults outlined above. There is a corner case if kube-proxy is not up before Cilium, but experimentally it seems kube-proxy is started close to kubelet and init paths, so it is in fact up before Cilium, making this OK. If we ever need to verify this in a sysdump we can check startAt times in the k8s-pod.yaml to confirm the start ordering of pods. For reference, the original iptables-wrapper script that Cilium used previous to this patch comes from [1]. This patch is based off of the new wrapper [2] in the k8s upstream repo. [1]: https://github.com/kubernetes/kubernetes/pull/82966 [2]: https://github.com/kubernetes-sigs/iptables-wrappers/blob/master/iptables-wrapper-installer.sh Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 12 July 2022, 16:09:43 UTC
9d13ec6 build(deps): bump actions/setup-go from 3.2.0 to 3.2.1 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3.2.0 to 3.2.1. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/b22fbbc2921299758641fab08929b4ac52b32923...84cbf8094393cdc5fe1fe1671ff2647332956b1a) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 11 July 2022, 15:19:06 UTC
443c304 nodediscovery: make LocalNode return a deep copy of localNode [ upstream commit 88e683528e92e20fe9557520a869030f234738a5 ] as that object contains a few slices that will get modified after StartDiscovery() is called Fixes: 3d6ed0432a ("nodediscovery: add LocalNode method") Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
4fe618b docs: update IPSec docs [ upstream commit 08d571a97ff6402c8654f99c4af42afbd25848a6 ] Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
1109ef8 ipsec: replace 0 with FAMILY_ALL netlink constant [ upstream commit d0ac0fa70de6931e83ffeb19a5838a964254c0b7 ] Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
5ad5142 logfields: add SPI and OldSPI IPSec constants [ upstream commit 604ec254cca5057bac189a7b30e123399f6015a5 ] Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
8d0513e ipsec: rename ipSecKeysGlobalLock to ipSecLock [ upstream commit 20cd921d8d35970c5cc58a71743122a899353c69 ] as now this lock is used to protect also ipSecCurrentKeySPI in addition to ipSecKeysGlobal. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
7778971 ipsec: fix stale keys reclaim logic [ upstream commit 3b2e982bc8a131744aa403bc6616568794d8bfa5 ] The current logic used to reclaim stale IPSec keys is incorrect. In fact, whenever a new key is added, we just spawn a new goroutine which, after linux_defaults.IPsecKeyDeleteDelay time, will delete all keys different from the one just added. The issue with this approach is that if a new key is added while one of these goroutines is still running, this goroutine will end up deleting the new key. This commit fixes this by introducing a single new goroutine that periodically removes any stale key (XFRM state and XFRM policy) after linux_defaults.IPsecKeyDeleteDelay has passed since the key was replaced. Fixes: #19814 Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
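The replacement pattern can be sketched generically (names and types below are illustrative, not the actual IPSec code): one long-lived goroutine scans for keys whose replacement happened more than the delete delay ago, instead of one delayed-deletion goroutine per key change.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type keyReclaimer struct {
	mu          sync.Mutex
	replacedAt  map[uint32]time.Time // SPI -> when the key was replaced
	deleteDelay time.Duration
}

// markReplaced records that a key (identified by SPI) was superseded.
func (r *keyReclaimer) markReplaced(spi uint32) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.replacedAt[spi] = time.Now()
}

// run periodically reclaims keys replaced more than deleteDelay ago. Because
// it re-checks timestamps on every tick, a key rotated again in the meantime
// is never deleted prematurely, unlike the per-key goroutines.
func (r *keyReclaimer) run(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			r.mu.Lock()
			for spi, when := range r.replacedAt {
				if time.Since(when) >= r.deleteDelay {
					fmt.Printf("reclaiming stale XFRM state/policy for SPI %d\n", spi)
					delete(r.replacedAt, spi)
				}
			}
			r.mu.Unlock()
		case <-stop:
			return
		}
	}
}

func main() {
	r := &keyReclaimer{replacedAt: map[uint32]time.Time{}, deleteDelay: 50 * time.Millisecond}
	stop := make(chan struct{})
	go r.run(10*time.Millisecond, stop)
	r.markReplaced(42)
	time.Sleep(100 * time.Millisecond)
	close(stop)
}
```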
9b1c3ec ipsec: add ipSecXfrmMark{Set,Get}SPI helpers [ upstream commit b2331289c703156b5a965a4e488424353f1c0580 ] which allow encoding an SPI value in, and extracting it from, an XfrmMark. Conflicts: * minor conflict in pkg/datapath/linux/ipsec/ipsec_linux.go as a bunch of constants are not defined in v1.10 Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
907acd3 nodediscovery: add LocalNode method [ upstream commit 3d6ed0432ad94e0615c3417cb17fc487270331b8 ] This method will be used in a subsequent commit to retrieve a copy of the localNode object stored in the nodeDiscovery one. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
d64cb54 ipsec: implement keyfile watcher [ upstream commit dce0d53359abd388bc6f72dd9134354482bf698a ] This commit adds a new filesystem watcher for the IPSec subsystem, which in turn will enable automatic key rotation without the need to restart the agent whenever a change to the IPSec keyfile is detected. Conflicts: * minor conflit in daemon/cmd/daemon.go as v1.10 does not have support for runtime device detection Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
f15eb77 Move fswatcher package out of crypto/certloader [ upstream commit d3b2d54d6f8c298478166c546155c0ab8ade497a ] Given that this logic is totally agnostic to the crypto/certloader package and in a subsequent commit it will be used also in the IPSec subsystem, make it a top level package. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
91813d7 nodediscovery: use time.After() instead of time.NewTimer().C [ upstream commit b36cd0186e4c704514de17c23c763afab3caa8c1 ] Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
3d3ae5a nodediscovery: split StartDiscovery into smaller methods [ upstream commit c19ef1c36ba890945507a59e3af642db8c743f27 ] In preparation for a subsequent change, this commit moves some of the (*NodeDiscovery).StartDiscovery() logic into separate methods: * fillLocalNode(), responsible for syncing the localNode object with the actual state of the node * updateLocalNode(), responsible for publishing the updated KV store entry and/or CiliumNode object for the local node In addition to that, this commit also introduces the UpdateLocalNode() method, which ensures that the proper lock is held while syncing the localNode object state with fillLocalNode() and then publishing the new object with updateLocalNode(). No functional changes are introduced. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
96b7a40 nodediscovery: Make LocalNode object private [ upstream commit e52fe1d59d1c6f51a76993641ea9412e0f44f749 ] The LocalNode currently consists of static fields which are never changed after the local node has been initialized. However, with the introduction of secondary allocation CIDRs, we want to be able to update the secondary allocation CIDR fields dynamically at run-time. To ensure writes to the object are always protected by a mutex, and to ensure that any changes to the object are propagated to relevant subsystems, this commit makes the `LocalNode` object private. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
8d13c9e nodediscovery: Unify source of truth for node name [ upstream commit 9c1cb4856322a80174799dae142ef761e51e2310 ] Clustermesh related code is relying on `NodeDiscovery.LocalNode.FullName()` to obtain the globally unique name of the local node. This field is initialized from `nodeTypes.GetName()`. This code removes all uses of `LocalNode.FullName()` and replaces it with an equivalent call to `nodeTypes.GetAbsoluteNodeName()`. This allows us to make the `NodeDiscovery.LocalNode` private in a subsequent commit. Conflicts: * minor conflict in daemon/cmd/daemon.go as NodeName does not exist in the v1.10 clustermesh.Configuration type Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
a2e73f8 health: Move endpoint IP to node package [ upstream commit ed934cb958c52515be37998038301229959a0fd2 ] The Cilium health endpoint IP is the only static local node field which is not accessible from the node package. To avoid code modifying the `NodeDiscovery.LocalNode` field (which will become more dynamic in a subsequent mode), the health IP is now moved to the node package with a corresponding getter and setter like all other static node fields. Conficts: * minor conflict in the definition of LaunchAsEndpoint() in cilium-health/launch/endpoint.go Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
0658131 node: add GetAbsoluteNodeName method This commit backports just the GetAbsoluteNodeName method from 3203df90821f562d655bbda24fd87716f6c7661a, as it will be required in a subsequent commit. Signed-off-by: Maddy007-maha <mahadev.panchal@accuknox.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 06 July 2022, 14:49:11 UTC
76834b8 Considering VPC's secondary CIDRs during cilium_host IP restoration [ upstream commit 49832633602e8c20c3f40b359c035099dd20f751 ] During the cilium_host IP restoration process, the agent verifies whether the IP is contained in the VPC CIDRs. However, the check is only made against the VPC's primary CIDR. This commit adds support to include secondary CIDRs as well. When router IP restoration fails, cilium_host can end up with multiple IPs attached to it because removeOldRouterState() considers a nil IP as IPv6 and doesn't remove stale IPv4 IPs. Will address this issue in a separate PR. Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 06 July 2022, 13:39:36 UTC
71ee6d5 bug: Prevent CiliumIdentities from Being Deleted Improperly [ upstream commit 959568bc3fc0de093bba3093dbac25ea28136c18 ] Allocated CiliumIdentities can sometimes be improperly deleted before being utilized. Instead of having the operator delete them immediately, they are now "marked" for later deletion. This will give a Node a chance to avoid racing the operator for the identity. If the CiliumIdentity is going to be used by a Node the Node will see the mark for deletion and remove it before utilizing the CiliumIdentity. A race between unmarking for deletion on the Node and deletion from the operator is avoided by enforcing a ResourceVersion check on deletion in the operator logic. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
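The "delete only if unchanged since marking" part can be sketched with a ResourceVersion precondition on the delete call; this is a minimal illustration assuming k8s.io/apimachinery is available, with the typed client interface standing in for whatever client deletes CiliumIdentity objects:

```go
package identitygc

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// identityDeleter stands in for a typed client that deletes CiliumIdentity
// objects; only its shape matters for this sketch.
type identityDeleter interface {
	Delete(ctx context.Context, name string, opts metav1.DeleteOptions) error
}

// deleteIfStillMarked sketches the race avoidance described above: the
// operator only deletes an identity if its ResourceVersion is unchanged since
// it was marked for deletion. A node that unmarked the identity in the
// meantime bumps the ResourceVersion, so the apiserver rejects the delete.
func deleteIfStillMarked(ctx context.Context, c identityDeleter, name, markedRV string) error {
	return c.Delete(ctx, name, metav1.DeleteOptions{
		Preconditions: &metav1.Preconditions{ResourceVersion: &markedRV},
	})
}
```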
3b3c0b0 Add training and support information to Getting Help [ upstream commit 4d67a8d7a76157a5da6430cecdcf9feefd581770 ] Signed-off-by: Liz Rice <liz@lizrice.com> Co-authored-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
587f746 helm: Fix cluster-id arguments in clustermesh deployment [ upstream commit b10d97c245a3525436633509925f96c13ee48b75 ] This commit is to pass the cluster-id argument from env in clustermesh deployment, otherwise the cluster-id will be defaulted to 0, which potentially causes issues with identity conflict. Before ``` $ ksyslo clustermesh-apiserver-7d77845567-5kq2d -c apiserver level=info msg=" --allocator-list-timeout='3m0s'" subsys=clustermesh-apiserver level=info msg=" --cluster-id='0'" subsys=clustermesh-apiserver ... ``` After ``` $ ksyslo clustermesh-apiserver-cb8f9bb7-689kn -c apiserver level=info msg=" --allocator-list-timeout='3m0s'" subsys=clustermesh-apiserver level=info msg=" --cluster-id='1'" subsys=clustermesh-apiserver level=info msg=" --cluster-name='test'" subsys=clustermesh-apiserver ... ``` Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
9c41866 Add ESP to firewall requirement for IPSec enabled Cilium [ upstream commit 307314469284124755e4d73124e70bc2ec01eb98 ] Signed-off-by: Divine Odazie <dodazie@gmail.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
f3ea7c3 jenkinsfiles: fix docker manifest inspect commands in GKE pipeline [ upstream commit 0f53a7caa44762c03d630386098fe8e408a33d24 ] Currently the `docker manifest inspect` commands used to wait for the CI images to be ready fail with the following error: 15:38:33 + docker manifest inspect quay.io/cilium/cilium-ci:c5626b69982a6af9d523a398538499e300d8c834} 15:38:33 invalid reference format 15:38:34 + docker manifest inspect quay.io/cilium/operator-generic-ci:c5626b69982a6af9d523a398538499e300d8c834} 15:38:34 invalid reference format 15:38:35 + docker manifest inspect quay.io/cilium/hubble-relay-ci:c5626b69982a6af9d523a398538499e300d8c834} 15:38:35 invalid reference format Drop the trailing `}`. Fixes: 0c817496e8ea ("workflows: Use docker manifest inspect for waiting images") Fixes: 89f67901cad6 ("jenkinsfiles: add `IMAGE_REGISTRY` env parameter") Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
ac6577b docs(policy): add note on L7 policies & cilium agent availability [ upstream commit 3628513f754c382406ed25416673f9283b53243a ] Signed-off-by: Raphaël Pinson <raphael@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
19d2e91 docs(policy): add a note on DNS proxy & cilium agent availability [ upstream commit 39c8531f207ccdae14a2cd7e7a4a08ceab61b3d6 ] Signed-off-by: Raphaël Pinson <raphael@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
0caaa20 ci: provide CI images with unstripped binaries [ upstream commit 00b2a2f9aa2d1d1df4299fad6ed02bd40841b1fd ] These will be tagged with :<SHA>-unstripped and :latest-unstripped and are e.g. useful for profiling. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
159e1be Add a note about conflicting node CIDRs #20204 [ upstream commit 590388d728de5ae17f950cb8bf1cc694049522d3 ] Signed-off-by: Wojtek Czekalski <me@wczekalski.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
2b616a2 nodediscovery: ensure we cache the nodeResource [ upstream commit a91e00e0e7aa506ae036bf3b9e9c09ecef9407a7 ] When retrying we have to explicitly use the previously fetched nodeResource in case we want to skip fetching it. Otherwise we end up with a null pointer exception. Fixes: 91e68c207308 ("nodediscovery: ensure instanceID is not empty") Signed-off-by: Odin Ugedal <ougedal@palantir.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
f1f39af Add metric on datapath update latency due to FQDN IP updates [ upstream commit 9ede0e6a0b445cc2f154e3c52a9888a1b7a7cf8f ] While there is an overall processing metric, it would be good to extract the datapath update time from other sections of the FQDN processing. This makes it easier to diagnose the source of any backups. It should be noted that dataplaneTime will be capped to option.Config.FQDNProxyResponseMaxDelay. After that time has elapsed, the update is cancelled and the DNS packet forwarded to the endpoint. Signed-off-by: Rahul Joshi <rkjoshi@google.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
4de29b6 Unset `ImageReference` when updating Azure VMSS [ upstream commit 444e00fa714fdf86142c474396cd62ee3610834d ] This is to avoid 403 errors when updating an Azure VMSS instance to assign additional IP addresses. The 403 can occur as the result of a permissions check when the image reference contains a subscription ID which the service principal making the request doesn't have access to, which is the case for Azure Compute Gallery images, as they contain an Azure-owned subscription ID. Unsetting this property doesn't affect the instance, it just tells Azure that we don't want to change the value. Fixes: #19695 Signed-off-by: Andrew Bulford <andrew.bulford@form3.tech> Signed-off-by: Paul Chaignon <paul@cilium.io> 05 July 2022, 17:07:40 UTC
8a1bed5 datapath: Set WORLD_ID in fwd-ed NodePort BPF requests This commit is a backport of 595ddcdd ("datapath: Set WORLD_ID in fwd-ed NodePort BPF requests"). The main difference is that the behavior introduced in the commit is protected by the hidden flag --bpf-lb-preserve-world-id (defaults to false). This is needed to avoid breaking existing network policies assumptions on v1.11. Signed-off-by: Martynas Pumputis <m@lambda.lt> 05 July 2022, 15:04:18 UTC
82207b9 helm: disable the peer service by default Commit da1c1f8b3ff607844a84ec096ff9879931e5113c turns the peer service into a Kubernetes service. The rationale for the change is that instead of cross-mounting the Hubble Unix domain socket from the Cilium pod to the Hubble Relay pod, Hubble Relay can directly query the Kubernetes service to get information about Hubble peers. This is a security improvement as it follows best-practices and streamlines the installation process on platforms that strictly enforce SELinux policies such as OpenShift. However, this change is not minor and users have reported issues with Hubble when upgrading from v1.10.x to v1.10.11. In order to follow the principle of least astonishment and to avoid breaking a Hubble deployment during a patch upgrade, this commit disables the peer service by default such that the old behavior is retained, which shouldn't cause any issue when upgrading. Users of platforms which need the peer service can always explicitly enable it in v1.10. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 27 June 2022, 10:20:22 UTC
232969c iptables: fix typo in addProxyRule condition [ upstream commit 2d944e93b987f9b8afc53e8fbf1b07df2c7a1b41 ] In d812b925de ("iptables: don't ignore errors") we introduced a typo in the logic used to select old proxy rules in addProxyRules: - if strings.Contains(rule, "-A CILIUM_PRE_mangle ") && strings.Contains(rule, "cilium: TPROXY to host "+name) && !strings.Contains(rule, portMatch) { + if strings.Contains(rule, "-A CILIUM_PRE_mangle ") && !strings.Contains(rule, "cilium: TPROXY to host "+name) && strings.Contains(rule, portMatch) { Then later on, in c61038bff4 ("iptables: invert conditions to simplify logic"), assuming the condition was correct, we just inverted it: - if strings.Contains(rule, "-A CILIUM_PRE_mangle ") && !strings.Contains(rule, "cilium: TPROXY to host "+name) && strings.Contains(rule, portMatch) { + if !strings.Contains(rule, "-A CILIUM_PRE_mangle ") || strings.Contains(rule, "cilium: TPROXY to host "+name) || !strings.Contains(rule, portMatch) { The correct condition to use should be the initial one, inverted: if !strings.Contains(rule, "-A CILIUM_PRE_mangle ") || !strings.Contains(rule, "cilium: TPROXY to host "+name) || strings.Contains(rule, portMatch) { Fixes: #19693 Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
654106a logfields: add Chain constant [ upstream commit c91114303659a0b717f62f88ec36755681d1fa91 ] used for iptables chains Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
704c69b iptables: always try to remove old IPv6 rules [ upstream commit c4d00ef6a98c51666e244dce64868e90b8fa0d88 ] Currently we try to remove old IPv6 rules only if IPv6 support is enabled, which means we'll leave old IPv6 rules around in case the agent is restarted after disabling IPv6 support. What we should do is always try to remove old IPv6 rules, and check only for support for ip6tables. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
370ed98 iptables: add debug logs for all commands [ upstream commit 7258d05aaf242885f59b865ac264c13819f0e9e6 ] Rather than inconsistently logging some of the iptables/ipset operations, add debug logs for all commands (i.e. all runProg() and runProgCombinedOutput() calls). Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
b9d58d2 iptables: add ruleReferencesDisabledChain and isDisabledChain helpers [ upstream commit a7e548b4b95303edf0c57f9b400fd00988a2ca60 ] which allow slightly simplifying the logic around skipping operations for the disabled chains. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
95ad731 iptables: refactor removeRules [ upstream commit 2e08aa5cabc3ab95dfc688c401d937f30f277656 ] No need to loop twice for v4 and v6 rules as the 2 protocols share the same tables. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
2f5c534 iptables: refactor InstallNoTrackRules [ upstream commit 6d584d7f23cdaa13a99193001ee20068607ca77c ] No need to pass the ingress parameter to endpointNoTrackRules as we are always installing rules for both ingress and egress. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
5572fb8 iptables: refactor addProxyRules [ upstream commit eb68bdb14ee8de03592322d948ed9f229d90bf6b ] loop over L4 protocols to install rules rather than hardcoding all cases Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
f600f3a iptables: invert conditions to simplify logic [ upstream commit c61038bff44a5d28c7f0b0fe99b55efb0fd70486 ] Invert the matching condition in removeCiliumRules, doCopyProxyRule and addProxyRules to allow continuing to the next rule and reduce branching. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
4ab6d58 iptables: pass context to InstallRules and InstallProxyRules [ upstream commit 9cd9b43fb08a65b1f1ec8016072cf80ea6595ab9 ] in order to allow stopping the retries if they're in the middle of executing when the daemon is shutting down. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
f14e137 iptables: retry installing rules on failure [ upstream commit 3a0130acf8dc5b841a5f71d9b1355d1209ba7768 ] This commit ensures that in case of any error the InstallRules() and InstallProxyRules() functions will keep retrying a certain amount of times before returning an error to the caller. Since there's going to be an exponential wait time between retries, this commit moves also the IptablesManager lock from InstallRules()/InstallProxyRules() to doInstallRules()/doInstallProxyRules(). Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
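The retry behaviour described here can be sketched generically (this is not the actual InstallRules implementation, just an illustration of retry-with-exponential-backoff that also honors a cancellation context, tying into the context plumbing from the previous commit):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff retries fn up to attempts times, doubling the wait between
// tries, and aborts early if ctx is cancelled (e.g. on daemon shutdown).
func retryWithBackoff(ctx context.Context, attempts int, initial time.Duration, fn func() error) error {
	wait := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		select {
		case <-time.After(wait):
			wait *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retryWithBackoff(context.Background(), 5, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("transient iptables failure")
		}
		return nil
	})
	fmt.Println(calls, err) // 3 <nil>
}
```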
1ae3632 iptables: don't ignore errors [ upstream commit d812b925dec2d8be09834277050050d67681fd7c ] This commit ensure that no errors are ignored while installing iptables rules. Currently it's possible we miss some errors, for example when renaming the Cilium ruleset to the "OLD_" prefix, which can in turn cause failures in installing subsequent rules and thus inconsistencies in the iptables ruleset managed by Cilium. This change will allow in a subsequent commit to retry installing the whole iptables ruleset in case of any failure, to ensure the ruleset is always in a consistent state. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
f2ab9f0 iptables: move custom chains logic into custom_chain.go [ upstream commit 597ad32c1c58af210953bb3f80912ab1d810d8db ] This commit moves all the logic related to custom chains into its own custom_chain.go file. No functional changes are introduced. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
3756f1a iptables: make IptablesManager thread safe [ upstream commit b31fc5d580937acaa01a6caee56ae5561342604a ] as its methods can be called concurrently from different subsystems. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
de9f28d datapath: Always use of wait argument on iptables commands. [ upstream commit aa329c87529cb316e4d92f9e5806a344897c6534 ] A missing wait arg can lead to spurious iptables command failure if two of them are executed in parallel due to an internal lock being held. This may cause test flakes due to missing iptables rules. Solve this by always injecting the wait args at runProgCombinedOutput() that ultimately does all iptables calls and remove the wait args from all the callers. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 June 2022, 13:36:55 UTC
1a9ecf1 install: Update image digests for v1.10.12 Generated from https://github.com/cilium/cilium/actions/runs/2505052631. `docker.io/cilium/cilium:v1.10.12@sha256:6a119c4f249d42df0d5654295ac9466da117f9b838ff48b4bc64234f7ab20b80` `quay.io/cilium/cilium:v1.10.12@sha256:6a119c4f249d42df0d5654295ac9466da117f9b838ff48b4bc64234f7ab20b80` `docker.io/cilium/clustermesh-apiserver:v1.10.12@sha256:0dd9df6e4b20f7120f10ddee560e0c285875657f0db0e2a18bc0cd748f86a84c` `quay.io/cilium/clustermesh-apiserver:v1.10.12@sha256:0dd9df6e4b20f7120f10ddee560e0c285875657f0db0e2a18bc0cd748f86a84c` `docker.io/cilium/docker-plugin:v1.10.12@sha256:f913939f14bd6f1dff769af0de116d79f454f0091da933f6fb1d8485c07b1566` `quay.io/cilium/docker-plugin:v1.10.12@sha256:f913939f14bd6f1dff769af0de116d79f454f0091da933f6fb1d8485c07b1566` `docker.io/cilium/hubble-relay:v1.10.12@sha256:fd3829bf67f2f3d3471da6ded9c636b22feb9a31feaac4509a295043e93af169` `quay.io/cilium/hubble-relay:v1.10.12@sha256:fd3829bf67f2f3d3471da6ded9c636b22feb9a31feaac4509a295043e93af169` `docker.io/cilium/operator-alibabacloud:v1.10.12@sha256:72de09e0e7a17de8e61e03f251d698c68e2e8e1f1fa1ada67200920a6cad6d0a` `quay.io/cilium/operator-alibabacloud:v1.10.12@sha256:72de09e0e7a17de8e61e03f251d698c68e2e8e1f1fa1ada67200920a6cad6d0a` `docker.io/cilium/operator-aws:v1.10.12@sha256:06b31f3d9baa2be911b90ab933bb8dc08a1bd5e3104f5e90b9cb51a9dd9142f6` `quay.io/cilium/operator-aws:v1.10.12@sha256:06b31f3d9baa2be911b90ab933bb8dc08a1bd5e3104f5e90b9cb51a9dd9142f6` `docker.io/cilium/operator-azure:v1.10.12@sha256:7c920352c82cd10b402d14902f119d75e45f6faa103f2ea89f760cf5de5301f3` `quay.io/cilium/operator-azure:v1.10.12@sha256:7c920352c82cd10b402d14902f119d75e45f6faa103f2ea89f760cf5de5301f3` `docker.io/cilium/operator-generic:v1.10.12@sha256:35288de36cd1b6fe65e55a9b878100c2ab92ac88ed6a3ab04326e00326cff3f7` `quay.io/cilium/operator-generic:v1.10.12@sha256:35288de36cd1b6fe65e55a9b878100c2ab92ac88ed6a3ab04326e00326cff3f7` `docker.io/cilium/operator:v1.10.12@sha256:e466554afdfcefae92d2757c4acd364afb803be54fd529404677c83039b86163` `quay.io/cilium/operator:v1.10.12@sha256:e466554afdfcefae92d2757c4acd364afb803be54fd529404677c83039b86163` Signed-off-by: Joe Stringer <joe@cilium.io> 15 June 2022, 21:16:29 UTC
ba175f5 pkg/redirectpolicy: Fix panic in service matcher LRP handling [ upstream commit 4bcd7d23648341fedb6a4ba67243f52053ddee3a ] When a service matcher LRP and the selected backend pods are deployed first, we previously didn't check if the LRP frontend information (aka clusterIP) is available. This led to agent panic. The frontend information is populated only when the LRP selected service event is received. This issue won't be hit when the selected service was deployed prior to the LRP or backend pod. Reported-by: Karsten Nielsen Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 15 June 2022, 14:32:22 UTC
f8b2564 build(deps): bump helm/kind-action from 1.2.0 to 1.3.0 Bumps [helm/kind-action](https://github.com/helm/kind-action) from 1.2.0 to 1.3.0. - [Release notes](https://github.com/helm/kind-action/releases) - [Commits](https://github.com/helm/kind-action/compare/94729529f85113b88f4f819c17ce61382e6d8478...d08cf6ff1575077dee99962540d77ce91c62387d) --- updated-dependencies: - dependency-name: helm/kind-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 15 June 2022, 09:03:27 UTC
08521c9 Prepare for release v1.10.12 Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2022, 23:41:39 UTC
9175938 envoy: Bump cilium envoy to latest version v1.21.3 [ upstream commit 85819de7518f411d61df106200dae247973c5117 ] The image digest comes from the build below: https://github.com/cilium/proxy/runs/6816960166?check_suite_focus=true. Release note: https://www.envoyproxy.io/docs/envoy/v1.21.3/version_history/current Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 June 2022, 10:02:46 UTC
73f2028 ipam: Remove superfluous if statement [ upstream commit eac0dee9d6110d340ea7df6885486b4327052d2f ] The `node.Spec.IPAM.Pool` value is always overwritten after the removed `if` statement, so there is no need to initialize it. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 June 2022, 17:11:53 UTC
7adfc56 ipam: Fix inconsistent update of CiliumNodes [ upstream commit 6a1f1757c586ebda992a900fe0d757b6590e3d14 ] Currently, when the cilium-operator attaches new ENIs to a node, we update the corresponding CiliumNode in two steps: first the .Status, then the .Spec [1]. That can result in an inconsistent state, where the CiliumNode .Spec.IPAM.Pool contains new IP addresses associated with the new ENI, while .Status.ENI.ENIs is still missing the ENI. This inconsistency manifests as a fatal: level=fatal msg="Error while creating daemon" error="Unable to allocate router IP for family ipv4: failed to associate IP 10.12.14.5 inside CiliumNode: unable to find ENI eni-9ab538c64feb9f59e" subsys=daemon This inconsistency occurs because the following can happen: 1. cilium-operator attaches a new ENI to the CiliumNode. 2. Still at cilium-operator, .Spec is synced with kube-apiserver. The IP pool is updated with a new set of IP addresses and the new ENI. 3. The agent receives this half-updated CiliumNode. 4. It allocates an IP address for the router from the pool of IPs attached to the new ENI, using .Spec.IPAM.Pool. 5. It fails because the new ENI is not listed in the .Status.ENI.ENIs of the CiliumNode object. 6. At cilium-operator, .Status is updated with the new ENI. But wait, you said .Status is updated before .Spec in the function you linked? Yes, but we read the state to populate CiliumNode from two separate places (n.ops.manager.instances and n.available) in the syncToAPIServer function and we don't have anything to prevent having a half updated (one place only) state in the middle of the update function. We lock twice, once for each place, instead of once for the whole CiliumNode update. So having a half updated state in the middle of the function would technically be the same as updating .Spec first and .Status second. We can fix this by first creating a snapshot of the pool, then writing the .Status metadata (which may be more recent than the pool snapshot, which is safe, see comment in the source code of this patch), and then writing the pool to .Spec. This ensures that the .Status is always updated before .Spec, but at the same time also ensures that .Status is still more recent than .Spec. 1 - https://github.com/cilium/cilium/blob/v1.12.0-rc2/pkg/ipam/node.go#L966-L1012 Co-authored-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 June 2022, 17:11:53 UTC
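The corrected ordering can be sketched in a few lines (types and function names are invented for illustration, not the actual pkg/ipam code): snapshot the pool first, publish .Status, and only then publish the snapshotted pool to .Spec.

```go
package main

import "fmt"

// ciliumNodeView is a simplified stand-in for the fields involved in the
// ordering problem described above.
type ciliumNodeView struct {
	SpecPool   []string // .Spec.IPAM.Pool
	StatusENIs []string // .Status.ENI.ENIs
}

// syncToAPIServer sketches the corrected update order: snapshot the pool
// first, then publish .Status (which may be newer than the snapshot, which is
// fine), and only then publish the snapshotted pool to .Spec. The agent can
// therefore never observe an IP in .Spec whose ENI is missing from .Status.
func syncToAPIServer(snapshotPool func() []string, currentENIs func() []string,
	updateStatus, updateSpec func(ciliumNodeView) error) error {

	pool := snapshotPool() // 1. freeze the pool before anything is written

	if err := updateStatus(ciliumNodeView{StatusENIs: currentENIs()}); err != nil { // 2. .Status first
		return err
	}
	return updateSpec(ciliumNodeView{SpecPool: pool}) // 3. .Spec last, from the older snapshot
}

func main() {
	_ = syncToAPIServer(
		func() []string { return []string{"10.12.14.5"} },
		func() []string { return []string{"eni-9ab538c64feb9f59e"} },
		func(v ciliumNodeView) error { fmt.Println("status:", v.StatusENIs); return nil },
		func(v ciliumNodeView) error { fmt.Println("spec:", v.SpecPool); return nil },
	)
}
```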
6bacf77 ui: drop envoy proxy container [ upstream commit cb6c554f0b27685c4032b591d091464e7e6004b3 ] Previously we used an envoy proxy container to convert grpc-web traffic to true grpc traffic, so the ui backend accepts real grpc traffic. There is an alternative approach of using a special wrapper for the grpc service on the backend: https://github.com/improbable-eng/grpc-web/tree/master/go/grpcweb Related code changes on hubble-ui itself were implemented in this PR: https://github.com/cilium/hubble-ui/pull/226 Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 June 2022, 17:11:53 UTC
4c7a729 Also take secondary CIDRs into account when checking IPv4NativeRoutingCIDR [ upstream commit e8b1210fb1e9a17f7c32e366c3f1dfd3b846b24d ] The given IPv4NativeRoutingCIDR is not necessarily part of the primary VPC CIDR and may as well be part of one of the secondary CIDRs. We should take these into account as well before bailing out. Signed-off-by: Alexander Block <ablock84@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 09 June 2022, 17:11:09 UTC
13d883b Move auto detection logic for IPv4NativeRoutingCIDR into own function [ upstream commit 6c6ab7422590518045ab0fc3b9afcf4951cb04c1 ] Signed-off-by: Alexander Block <ablock84@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 09 June 2022, 17:11:09 UTC
5bebaf5 build(deps): bump actions/cache from 3.0.3 to 3.0.4 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.3 to 3.0.4. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/30f413bfed0a2bc738fdfd409e5a9e96b24545fd...c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 08 June 2022, 23:20:44 UTC
b1e007a bugtool: Add structured node and health output [ upstream commit c6af5800f1f4e38d0ca5b161fac3750791ad9452 ] This commit adds the `-o json` output to `cilium node list` and `cilium-health status`, as the text version of both does not contain all details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
c85faff Add constants for identity types [ upstream commit aa7572be0f7cfece0ebfb739716ab970c478d358 ] Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
a6ffac9 Add type label to the identity metric [ upstream commit f9863ec990507a52bf2074a93d8f4725556aad0c ] Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
c4aca8b clustermesh: Add ownerReferences for CiliumNodes [ upstream commit 3500290754b098c843ad774884626f35f5866a5a ] This commit is to add ownerReferences for CiliumNodes created by CiliumExternalWorkload, so that we don't unintentionally GC invalid CN. Thanks to @nathanejohnson for reporting this issue. Fixes: #19907 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
168a8c1 pkg/fqdn: Fix missing delete for forward map [ upstream commit f439177bed8f99e04e1ad4dbbea5d5cbc54341ed ] This commit fixes a bug where the keys of the forward map inside the DNS cache were never removed, causing the map to grow forever. By contrast, the reverse map keys were being deleted. For both the forward and reverse maps (which are both maps whose values are another map), the inner map keys were being deleted. In other words, the delete on the outer map key was missing for the forward map. In addition to fixing the bug, this commit expands the unit test coverage to assert after any deletes (entries expiring or GC) that the forward and reverse maps contain what we expect. Particularly, in an environment where there are many unique DNS lookups (unique FQDNs) being done, this forward map could grow quite large over time, especially for a long-lived workload (endpoint). This fixes this memory-leak-like bug. Fixes: cf387ce5058 ("fqdn: Introduce TTL-aware cache for DNS retention") Fixes: f6ce522d55d ("FQDN: Added garbage collector functions.") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
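The class of bug fixed here is easy to see in miniature; a generic sketch (not the actual fqdn cache code) of a map-of-maps where the outer delete must accompany the inner one:

```go
package main

import "fmt"

// expire sketches the leak pattern described above: for a map of maps,
// deleting only the inner keys leaves the (possibly empty) inner map behind
// forever. The outer key must be dropped once its inner map is empty.
func expire(forward map[string]map[string]struct{}, name, ip string) {
	inner, ok := forward[name]
	if !ok {
		return
	}
	delete(inner, ip)
	if len(inner) == 0 {
		delete(forward, name) // the missing outer delete in the bug above
	}
}

func main() {
	forward := map[string]map[string]struct{}{
		"example.com": {"10.0.0.1": {}},
	}
	expire(forward, "example.com", "10.0.0.1")
	fmt.Println(len(forward)) // 0: no leaked outer key
}
```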
ee92f56 helm: use port 80/443 by default for the peer service [ upstream commit 7c86fe77439a328ea905928fb937cb17c3d4e875 ] When the service port for the peer service is not specified, it is automatically assigned port 80 or port 443 (when TLS is enabled). Using these ports makes it easy to understand whether TLS is enabled for the service or not. Moreover, it makes the behavior consistent with the Hubble Relay service. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
d281e7a metrics: Bump prometheus client library [ upstream commit 044681a8540ba5cfc39b400bfc79a3c1723f9f13 ] This commit is to bump prometheus client library to the latest (e.g. v1.12.2) which will have a better support for go collector of different go versions, and fix NaN value. Fixes: #19985 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
7c589ca daemon, metrics: Expose active FQDN connections per endpoint [ upstream commit 47870fe0d3ebdb4bcb859031480241fceadef48b ] This commit exposes new metrics that show the number of active names and IPs in the DNS cache, and number of alive FQDN connections that have expired (aka zombies) per endpoint. This is useful to track the endpoint's DNS cache and DNS zombie cache sizes over time. Note that these metrics are only updated during the FQDN GC which is currently invoked every minute. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
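Per-endpoint gauges of this kind can be sketched with the prometheus client library; the metric and label names below are illustrative assumptions, not Cilium's actual metric names:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// Illustrative per-endpoint gauges that a periodic FQDN GC pass would update.
var (
	fqdnActiveNames = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "cilium",
		Subsystem: "fqdn",
		Name:      "active_names",
		Help:      "Number of active DNS names in the cache, per endpoint.",
	}, []string{"endpoint"})

	fqdnAliveZombies = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Namespace: "cilium",
		Subsystem: "fqdn",
		Name:      "alive_zombie_connections",
		Help:      "Expired but still-alive FQDN connections, per endpoint.",
	}, []string{"endpoint"})
)

func main() {
	prometheus.MustRegister(fqdnActiveNames, fqdnAliveZombies)

	// In the real agent these would be set from the FQDN GC, roughly every minute.
	fqdnActiveNames.WithLabelValues("1630").Set(42)
	fqdnAliveZombies.WithLabelValues("1630").Set(3)
}
```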
cf0b687 pkg/metrics: Define FQDN subsystem [ upstream commit d9ae1c4d790b5780b38d36a8c592286bd2e44599 ] This commit contains no functional changes and is only cosmetic to ease future commits when adding new FQDN-related metrics. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
c100a47 pkg/fqdn: Provide DNSCache count for metrics collection [ upstream commit 95362deef50f7077c516da7a6bc4a03b05d361d8 ] This commit adds a new convenience functions to get a count of * how many entries are inside the DNS cache (IPs) * and how many FQDNs are inside the DNS cache It will be used by upcoming commits to expose these values as a metrics. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
ae30cfb daemon: Change the default FQDN regex LRU to be 1024 [ upstream commit ce9583d8d8bbe618fb08abfa94c35e4fcc3153c7 ] Following the previous commit's benchmark result, let's update the LRU default size to be 1024, given that it only results in an increase of a few tens of MBs when the cache nears full. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
9d0e0f1 dnsproxy: Add benchmark for large FQDN-based CNPs [ upstream commit 38c00367c7e0cefa1c9bbd51ad4414a839015bff ] When comparing the efficiency of increasing the LRU size from 128 to 1024 with ~22k CNPs, we see the following results: ``` # LRU size 128. $ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > old.txt # LRU size 1024. $ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > new.txt $ benchcmp old.txt new.txt benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat benchmark old ns/op new ns/op delta Benchmark_perEPAllow_setPortRulesForID_large-8 3954101340 3010934555 -23.85% benchmark old allocs new allocs delta Benchmark_perEPAllow_setPortRulesForID_large-8 26480632 24167742 -8.73% benchmark old bytes new bytes delta Benchmark_perEPAllow_setPortRulesForID_large-8 2899811832 1824062992 -37.10% ``` Here's the raw test run with LRU size at 128: ``` Before (N=1) Alloc = 31 MiB HeapInuse = 45 MiB Sys = 1260 MiB NumGC = 15 After (N=1) Alloc = 445 MiB HeapInuse = 459 MiB Sys = 1260 MiB NumGC = 40 ``` Here's the raw test run with LRU size at 1024: ``` Before (N=1) Alloc = 31 MiB HeapInuse = 48 MiB Sys = 1177 MiB NumGC = 17 After (N=1) Alloc = 78 MiB HeapInuse = 93 MiB Sys = 1177 MiB NumGC = 53 ``` We can see that it's saving ~300MB. Furthermore, if we compare the memprofiles from the benchmark run via ``` go tool pprof -http :8080 -diff_base memprofile.out memprofile.1024.out ``` we see an ~800MB reduction in the regex compilation. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
51f4daf daemon, fqdn: Add flag to control FQDN regex LRU size [ upstream commit 5fa7ae278340c41a60050cf564d60a52ad588b1b ] Advanced users can configure the LRU size for the cache holding the compiled regex expressions of FQDN match{Pattern,Name}. This is useful if users are experiencing high memory usage spikes with many FQDN policies that have repeated matchPattern or matchName across many different policies. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
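The pattern behind this flag is a bounded cache of compiled regexes keyed by the pattern string, so identical matchPattern/matchName entries across policies compile only once. A minimal sketch, assuming the widely used github.com/hashicorp/golang-lru package and illustrative function names:
```
package fqdnsketch

import (
	"regexp"

	lru "github.com/hashicorp/golang-lru"
)

// newRegexCache builds the LRU; the size would come from the new agent
// flag (1024 by default after the previous commit).
func newRegexCache(size int) (*lru.Cache, error) {
	return lru.New(size)
}

// compileCached returns a cached *regexp.Regexp for the pattern, compiling
// and inserting it on a miss. Repeated patterns across policies therefore
// share one compiled regex instead of each holding their own copy.
func compileCached(cache *lru.Cache, pattern string) (*regexp.Regexp, error) {
	if v, ok := cache.Get(pattern); ok {
		return v.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	cache.Add(pattern, re)
	return re, nil
}
```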
2a3ff7c pkg/labels: Optimize SortedList() and FormatForKVStore() [ upstream commit 0790e076cb0a7a83a6314eb559c97811d3240bcb ] FormatForKVStore() previously returned a string for no reason as every caller converted the return value to a byte slice. This allows us to eliminate string concatenation entirely and use the bytes.Buffer directly. Building on the above, given that SortedList() returns a byte slice and calls FormatForKVStore() for its output, we can optimize it with the same technique to eliminate string concatenation. Here are the benchmark comparisons: ``` $ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > old.txt $ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > new.txt $ benchcmp old.txt new.txt benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat benchmark old ns/op new ns/op delta BenchmarkLabels_SortedList-8 2612 1120 -57.12% BenchmarkLabel_FormatForKVStore-8 262 54.5 -79.18% benchmark old allocs new allocs delta BenchmarkLabels_SortedList-8 35 13 -62.86% BenchmarkLabel_FormatForKVStore-8 4 1 -75.00% benchmark old bytes new bytes delta BenchmarkLabels_SortedList-8 1112 664 -40.29% BenchmarkLabel_FormatForKVStore-8 96 48 -50.00% ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
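A simplified before/after sketch of the buffer-based approach; the label type and serialization format are reduced to essentials here and are not the exact pkg/labels definitions.
```
package labelsketch

import "bytes"

type label struct {
	Source, Key, Value string
}

// Before (sketch): string concatenation allocates per label, and callers
// then convert the string back into a byte slice anyway.
func formatForKVStoreOld(l label) string {
	return l.Source + ":" + l.Key + "=" + l.Value + ";"
}

// After (sketch): write straight into a shared bytes.Buffer, so a whole
// sorted label list is serialized with one growing buffer and no
// intermediate strings.
func formatForKVStoreInto(buf *bytes.Buffer, l label) {
	buf.WriteString(l.Source)
	buf.WriteByte(':')
	buf.WriteString(l.Key)
	buf.WriteByte('=')
	buf.WriteString(l.Value)
	buf.WriteByte(';')
}

func sortedListSketch(labels []label) []byte {
	var buf bytes.Buffer
	for _, l := range labels {
		formatForKVStoreInto(&buf, l)
	}
	return buf.Bytes()
}
```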
2841700 pkg/labels: Add benchmark for hot labels code [ upstream commit 351c5d8f66eb7a3a35f833462a256c9fda36f51e ] SortedList() and FormatForKVStore() can be very hot code in environments where there's constant policy churn, especially CIDR policies where there can be a large number of CIDR labels. This commit adds benchmarks for later commits to use as a baseline. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
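An illustrative baseline benchmark in the same spirit, reusing the sortedListSketch helper and label type from the previous sketch (so it assumes the same package); the actual benchmarks exercise the real pkg/labels types.
```
package labelsketch

import "testing"

func BenchmarkSortedListSketch(b *testing.B) {
	lbls := []label{
		{Source: "k8s", Key: "app", Value: "web"},
		{Source: "cidr", Key: "10.0.0.0/8", Value: ""},
	}
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = sortedListSketch(lbls)
	}
}
```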
7211294 pkg/policy/api: Optimize FQDNSelector String() [ upstream commit 4c2c2446577d92fb15f0314efa487d77a6ce0812 ] Use strings.Builder instead of fmt.Sprintf() and preallocate the size of the string so that Go doesn't need to over-allocate if the string ends up longer than what the buffer growth algorithm predicts. Results: ``` $ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > old.txt $ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > new.txt $ benchcmp old.txt new.txt benchmark old ns/op new ns/op delta BenchmarkFQDNSelectorString-8 690 180 -73.97% benchmark old allocs new allocs delta BenchmarkFQDNSelectorString-8 9 4 -55.56% benchmark old bytes new bytes delta BenchmarkFQDNSelectorString-8 288 208 -27.78% ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
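A sketch of the strings.Builder-with-Grow pattern described above, assuming simplified field names; the exact output format of the real FQDNSelector.String() may differ.
```
package policysketch

import "strings"

type fqdnSelector struct {
	MatchName    string
	MatchPattern string
}

// String sizes the builder once up front, so the result is produced with a
// single allocation instead of fmt.Sprintf's formatting machinery.
func (s fqdnSelector) String() string {
	const fixed = len("MatchName: ") + len(", MatchPattern: ")
	var b strings.Builder
	b.Grow(fixed + len(s.MatchName) + len(s.MatchPattern))
	b.WriteString("MatchName: ")
	b.WriteString(s.MatchName)
	b.WriteString(", MatchPattern: ")
	b.WriteString(s.MatchPattern)
	return b.String()
}
```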
7702056 endpoint: Fix lock contention in header file sync [ upstream commit 507332a082c8c4780e67f219dadf42f61772e908 ] Upon a successful DNS response, Cilium's DNS proxy code syncs the DNS history state to the individual endpoint's header file. Previously, this sync was done inside a trigger; however, the calling code, (*Endpoint).SyncEndpointHeaderFile(), acquired a write-lock for no good reason. This effectively negated the benefits of having the DNS history sync behind a 5-second trigger. This is especially suboptimal because the header file sync was causing Cilium to serialize the processing of DNS requests for a single endpoint. To illustrate the impact more concretely: if a single endpoint makes 10 DNS requests at the same time, acquiring the write-lock causes those 10 requests to be processed one at a time. For the sake of posterity, this is not the case if 10 endpoints make DNS requests in parallel. This has a performance impact both CPU-wise and memory-wise. Take, for example, a bursty DNS request environment: it could cause an uptick in memory usage due to many goroutines being created and blocked by the serialized locking. Now that the code all executes behind a trigger, we can remove the lock completely and initialize the trigger where the Endpoint object is created (e.g. createEndpoint(), parseEndpoint()). The lock is now only taken every 5 seconds, when the trigger runs. This should relieve the lock contention drastically. For context, in a user's environment where the pprof was shared with us, there were around 440 goroutines, with 203 of them stuck waiting inside SyncEndpointHeaderFile(). We can also modify SyncEndpointHeaderFile() to no longer return an error, because invoking the trigger cannot fail. If we fail to initialize the trigger itself, we log an error, but this is essentially impossible because it can only fail if the trigger func is nil (which we control). The locking contention was understood by inspecting the pprof via the following command and subsequent code inspection. ``` go tool trace -http :8080 ./cilium ./pprof-trace ``` Suggested-by: Michi Mutsuzaki <michi@isovalent.com> Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
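A generic sketch of the coalescing pattern described above (this is not Cilium's pkg/trigger API): DNS handlers only flip a flag, and the endpoint lock is taken by a single background writer once per interval.
```
package endpointsketch

import (
	"sync"
	"time"
)

// headerSync coalesces many "rewrite the header file" requests into at
// most one write per interval, so DNS request handling never blocks on a
// per-endpoint write-lock.
type headerSync struct {
	mu      sync.Mutex
	pending bool
	write   func() // takes the endpoint lock here, once per interval
}

func newHeaderSync(interval time.Duration, write func()) *headerSync {
	h := &headerSync{write: write}
	go func() {
		for range time.Tick(interval) {
			h.mu.Lock()
			run := h.pending
			h.pending = false
			h.mu.Unlock()
			if run {
				h.write()
			}
		}
	}()
	return h
}

// kick is what the DNS proxy path calls after each response: it only flips
// a flag, so concurrent DNS requests are never serialized by the sync.
func (h *headerSync) kick() {
	h.mu.Lock()
	h.pending = true
	h.mu.Unlock()
}
```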
f4b2e1d cmd: Allow more complicated patterns in map string type. [ upstream commit 070ded019adbfee49d73bc7be0c6ba3bac0ef59c ] The previous PR #18478 wrapped the existing viper GetStringMapString function to get around upstream bugs; however, it unintentionally restricted a few formats that were previously supported in Cilium, such as: - --aws-instance-limit-mapping=c6a.2xlarge=4,15,15,m4.xlarge=1,5,10 - --api-rate-limit=endpoint-create=rate-limit:10/s,rate-burst:10,parallel-requests:10,auto-adjust:true For such complicated attributes, we now allow the comma character in the value part of a key-value pair. As Go's built-in regex library doesn't support look-ahead, this commit replaces the strings.Split call with a custom implementation to handle this scenario. Relates: #18478 Fixes: #18973 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
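A simplified sketch of the splitting idea: a comma starts a new pair only when the following segment contains '='; otherwise the segment belongs to the previous value. The function name and exact semantics are assumptions, not the agent's implementation.
```
package cmdsketch

import "strings"

func splitKeyValuePairs(s string) map[string]string {
	out := map[string]string{}
	currentKey := ""
	for _, seg := range strings.Split(s, ",") {
		if k, v, ok := strings.Cut(seg, "="); ok {
			// Segment introduces a new key=value pair.
			currentKey = k
			out[currentKey] = v
			continue
		}
		if currentKey != "" {
			// No '=' here: the comma was part of the previous value.
			out[currentKey] += "," + seg
		}
	}
	return out
}
```
With this, splitKeyValuePairs("c6a.2xlarge=4,15,15,m4.xlarge=1,5,10") yields {"c6a.2xlarge": "4,15,15", "m4.xlarge": "1,5,10"}, matching the first flag format above.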
9745941 docs: Fix incorrect FQDN flag [ upstream commit 9c6e4245f0761d3e8bcf904785290e85f8fd336b ] Fixes: f6ce522d ("FQDN: Added garbage collector functions.") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
b8aeb84 docs: Fix max SPI value for IPsec key rotations [ upstream commit 54d708e5d812a00451adab99dae01609447de2cf ] The SPI value is expected to take 4 bits at most, so its maximum value should be 15, not 16 (2^4 - 1 = 15). Let's fix that in the key rotation documentation. The agent also rejects the value 0, so the allowed values are [1;15]. Reported-by: Odin Ugedal via Slack Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
a7eb2f3 Add counter to track all datapath timeouts due to FQDN IP updates [ upstream commit 29268926f571c3a008bafcd18ecfc9b494877627 ] Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmil.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
b1c1d8a api: change "group not found" log to debug [ upstream commit 80092ce0b2bf5351e4d71527904cbe401c2b8e4b ] Since commit 67f74ff ("images/cilium: remove cilium group from Dockerfile") the cilium group is no longer created in the image running the agent, resulting in the following log message on cilium-agent start: level=info msg="Group not found" error="group: unknown group cilium" file-path=/var/run/cilium/cilium.sock group=cilium subsys=api Change the log message to debug level to avoid confusion. Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
cfbcf30 Bugtool: Add additional tc commands. [ upstream commit b13dc89166e9a3d9eafe4a77fd96b389a05cfe1a ] The tc command prints out information not shown by bpftool. Additionally, when debugging Cilium issues we may need information about tc entities that are not managed by Cilium. This adds extra commands to be run with cilium-bugtool, including listing tc qdiscs and getting filter/class/chain info for all network interfaces. Fixes: #17468 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
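As an illustration, this is roughly how such per-interface tc snapshots could be collected by shelling out; the exact command set and the wiring inside cilium-bugtool are assumptions here.
```
package bugtoolsketch

import (
	"fmt"
	"os/exec"
)

// tcSnapshot runs tc inspection commands for one interface and returns the
// combined output, so the bug report captures qdisc/filter/class/chain
// state even for tc objects Cilium does not manage.
func tcSnapshot(iface string) (string, error) {
	cmds := [][]string{
		{"tc", "qdisc", "show", "dev", iface},
		{"tc", "filter", "show", "dev", iface},
		{"tc", "class", "show", "dev", iface},
		{"tc", "chain", "show", "dev", iface},
	}
	var out string
	for _, c := range cmds {
		b, err := exec.Command(c[0], c[1:]...).CombinedOutput()
		if err != nil {
			return out, fmt.Errorf("%v: %w", c, err)
		}
		out += fmt.Sprintf("# %v\n%s\n", c, b)
	}
	return out, nil
}
```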