Revision Author Date Message Commit Date
bffc6cd WIP Signed-off-by: Martynas Pumputis <m@lambda.lt> 07 November 2023, 12:47:40 UTC
518d7cc daemon: Log stale router IPs in debug mode This helps troubleshooting potential bugs in CI. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
ed20c8a ipam: Add unit test for `reallocateDatapathIPs` This slightly modifies the `reallocateDatapathIPs` function to take a mock allocator as its first argument, and adds a basic unit test which tests for the precedence logic we want. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
5a1fe11 node: Remove unused code around `cilium_host` IP restoration The previous commit made these functions obsolete. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
6195edf daemon: Do not attempt to remove IPs on missing interface Before the previous commit, `removeOldRouterState` would only be called with a restored IP address, which could be nil. Now however, it is always called with the new router IP, which could have been restored, or could have been a new allocation. In either case, we want to remove any non-matching IPs from `cilium_host`. However, the case with new allocations now also happens the first time Cilium is started, in which case there is no `cilium_host` device yet. This commit therefore does not treat a missing `cilium_host` as an error. This fixes the following warning which was introduced by the previous commit: ``` level=warning msg="Failed to remove old router IPs from cilium_host." attempt=1 error="Link not found\nLink not found" subsys=daemon ``` Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
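The "missing device is not an error" behaviour described in the entry above can be sketched as follows. This is a minimal illustration, not the daemon's code: `errLinkNotFound`, `staleRouterIPs` and the `listAddrs` callback are hypothetical names standing in for the real netlink lookup.

```go
package main

import (
	"errors"
	"fmt"
)

// errLinkNotFound is a hypothetical sentinel standing in for the
// "Link not found" error returned when a device does not exist.
var errLinkNotFound = errors.New("link not found")

// staleRouterIPs returns the addresses on dev that differ from keep.
// listAddrs is a hypothetical lookup function. A missing device yields
// an empty result and no error, since there is nothing to clean up.
func staleRouterIPs(listAddrs func(dev string) ([]string, error), dev, keep string) ([]string, error) {
	addrs, err := listAddrs(dev)
	if errors.Is(err, errLinkNotFound) {
		// First start: the device was never created, so nothing is stale.
		return nil, nil
	}
	if err != nil {
		return nil, fmt.Errorf("listing addresses on %s: %w", dev, err)
	}
	var stale []string
	for _, a := range addrs {
		if a != keep {
			stale = append(stale, a)
		}
	}
	return stale, nil
}

func main() {
	missing := func(string) ([]string, error) { return nil, errLinkNotFound }
	present := func(string) ([]string, error) { return []string{"10.0.0.1", "10.0.0.9"}, nil }

	fmt.Println(staleRouterIPs(missing, "cilium_host", "10.0.0.1"))
	fmt.Println(staleRouterIPs(present, "cilium_host", "10.0.0.1"))
}
```

Any other lookup error is still propagated, so a retry loop around the caller keeps its purpose.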
c646636 daemon: Simplify `cilium_host` IP restoration Before this commit, the `cilium_host` (aka router) IP restoration was done in two steps: The first step collected previous IPs from the filesystem and Kubernetes and attempted to guess which one might still be valid in the new IPAM configuration. This validation was done using either the Pod CIDR, the VPC CIDR, or the native routing CIDR. In a second step, the IP was then actually re-allocated via the IPAM subsystem. That second step could however still fail - just because an IP is part of the VPC CIDR or native routing CIDR does not mean it is actually still assigned to the node. This commit attempts to simplify the logic: Instead of trying to guess if the restored IP(s) could be allocated using IPAM, we just try to allocate them. If the allocation fails, we still have fallback logic, and the old IP address(es) are still removed from the `cilium_host` interface. This way, the old CIDR check becomes obsolete, as we now use the IPAM subsystem as the source of truth if an IP can be restored or not. This also fixes a bug where Multi-Pool IPAM always reallocated the router IP, because it did not implement the CIDR check in the first step. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
481a406 daemon: Simplify router IP from FS restoration Before this commit, we were setting the restored IP during `node.AutoComplete`, getting it in `Daemon.restoreCiliumHostIPs`, only to set it again in `node.RestoreHostIPs` if it passed validation. This commit simplifies the logic to extract the IPs from the file system, and only set them after validation. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
dc21b3a daemon: Move cilium_host IP removal loop into a separate function This commit contains no functional changes and prepares the code for changes coming up in subsequent commits. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
b41b4d2 daemon: Split out `removeOldRouterState` from `restoreCiliumHostIPs` The `restoreCiliumHostIPs` function does two things: Determine which IP to restore, and then remove any IPs from the cilium_host device which are not the restored one. This commit splits out the second part, so it can be retried independently. There is no reason to retry the first part, as it is deterministic. This commit prepares the code for further changes following in subsequent commits. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 07 November 2023, 11:25:57 UTC
85d8e6a Update cilium status command module health output To date, we render modules health information as tabular data. As modules health instrumentation grows this rendering makes it hard for users to grok the output. This PR proposes changing the rendering of modular health using a tree like structure to give users a better line of sight to gain a better understanding of the agent modules state. Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com> 07 November 2023, 10:59:41 UTC
1c1b7eb devel: stop kind from unnecessarily pulling images Because `make kind-install` can default to a :latest image, the kubelet will pull images on every restart. This makes restarts use unnecessary bandwidth. So, change the ImagePullPolicy to IfNotPresent. Also, pre-load the image via `kind load docker-image` to prevent all the nodes pulling the image in parallel. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 07 November 2023, 10:37:36 UTC
182c85c kind-image-fast-*: stop deleting pods for every node The loop inadvertently restarted all cilium pods once **for every node** rather than once per cluster. Oops. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 07 November 2023, 10:37:36 UTC
81c45d2 bpf: Remove strict encrypt check from bpf_overlay Previously, the strict encrypt check [1] was running in bpf_overlay (in addition to bpf_host). That particular check was assuming that no pod-to-pod unencrypted packet should be seen by bpf_overlay. However, after the previous commit it's no longer the case. So, remove the check, and only keep the one in bpf_host. A nice side-effect of the previous commit is that for WG+tunnel we automatically enforce the strict mode w/o relying on strict_allow(). I.e., any tunnel-encapsulated traffic is going to be dropped until cilium-agent has propagated the destination node's IP addr into WG's allowed-ips list for that node. This commit also drops the WG strict mode test case for tunneling, as the test configuration is no longer applicable, and the test is going to be migrated to the CLI connectivity suite. [1]: https://github.com/cilium/cilium/pull/21856 Signed-off-by: Martynas Pumputis <m@lambda.lt> 07 November 2023, 10:09:30 UTC
0af609f ci-e2e,ipsec: Bump CLI vsn To include the encryption suite changes [1] [2] [1]: https://github.com/cilium/cilium-cli/pull/2055 [2]: https://github.com/cilium/cilium-cli/pull/2089 Signed-off-by: Martynas Pumputis <m@lambda.lt> 07 November 2023, 10:09:30 UTC
b67291f bpf: Encap with cilium_{vxlan,geneve} before passing to WG So that a src security ID can be transferred to a remote node (e.g., for netpol checks). This commit changes a pkt path when WireGuard + tunneling are enabled AND the newly introduced --wireguard-encapsulate is set. Previously, we had the following: ┌──────┐ 1. ┌──────┐ 4. │ lxc0 ├──────────────► eth0 ├──────► └──────┘ └─┬───▲┘ │ │ │ │ 2.│ │ 3. │ │ ┌───────────────┐ ┌───▼───┴────┐ │ cilium_vxlan │ │cilium_wg0 │ └───────────────┘ └────────────┘ With this change: ┌──────┐ ┌──────┐ │ lxc0 │ ┌──────────► eth0 ├─────► └───┬──┘ │ └─┬───▲┘ 5. │ │ │ │ │ │ │ │ 1.│ 2.│ 3. │ │ 4. │ │ │ │ ┌─────▼──────┴──┐ ┌───▼───┴────┐ │ cilium_vxlan │ │cilium_wg0 │ └───────────────┘ └────────────┘ A side effect of this change is that host-to-remote-pod traffic is going to be encrypted (previously it was not). The change was first made available in v1.14 [1] (controlled w/ --wireguard-encapsulate, which defaults to false). To avoid breaking connections during an upgrade from v1.14 to v1.15 (due to missing node IPs within allowed-ips), in v1.14 we populate those IPs regardless whether the feature is enabled. [1]: https://github.com/cilium/cilium/pull/28917 Signed-off-by: Martynas Pumputis <m@lambda.lt> 07 November 2023, 10:09:30 UTC
1f1886b fqdn/dnsproxy: drop dependency on global EnableIPv{4,6} option Pass them as parameters to StartDNSProxy rather than depending on the global config. Signed-off-by: Tobias Klauser <tobias@cilium.io> 07 November 2023, 09:44:15 UTC
90ad1cd ci: disable envoy tracing in multi-pool workflow The multi-pool workflow doesn't exercise envoy in particular and the verbose logs make it harder to analyze logs. Disable them. Signed-off-by: Tobias Klauser <tobias@cilium.io> 07 November 2023, 09:43:56 UTC
4343ab0 Updates BGP CP Developer Docs Removes the deprecated `tunnel` Helm value in favor of the `routingMode` value in relevant files. Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> 07 November 2023, 09:42:03 UTC
c186d20 operator: use params struct (cell.In) for ctrl-runtime & gw-api cells With this commit, params struct are used in the controller-runtime & gateway-api cells. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 07 November 2023, 09:41:47 UTC
3505a59 operator: use k8s rest client config from clientset cell With this commit, the k8s rest client config used by the controller-runtime's manager is retrieved from the clientset hivecell. This way, the presence of a k8s api server can be checked before initializing the manager. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 07 November 2023, 09:41:47 UTC
40f0d07 operator: errorhandling when adding types to controller-runtime scheme This commit adds proper error handling when adding types to the controller-runtime scheme. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 07 November 2023, 09:41:47 UTC
84d96c1 operator: refactor controller-runtime reconciler registration This commit refactors the registration of the various Gateway API related reconcilers to the controller-runtime manager by reducing boilerplate code. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 07 November 2023, 09:41:47 UTC
e61cfdf operator: use job for controller-runtime manager lifecycle Currently the manager of the controller-runtime is managed within a plain goroutine. This commit refactors this towards using a OneShot job provided by the Hive framework. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 07 November 2023, 09:41:47 UTC
8f2935e operator: introduce controller-runtime cell Currently the Gateway API controller (and its reconcilers) is the only one that is making use of the controller-runtime library. To prepare its use for other use-cases, this commit extracts the controller-runtime integration into its own cell by providing the relevant components (Manager & Scheme) to dependent cells. The new controller-runtime cell is responsible for starting the manager itself. Dependent cells add their reconcilers to the manager and types to the scheme respectively. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 07 November 2023, 09:41:47 UTC
8579930 operator: remove unused internal gateway api model This commit removes the unused and empty model within the gateway controller. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 07 November 2023, 09:41:47 UTC
18879b5 service: fix service manager interface mismatch caused by merge race Due to a merge race, the SyncWithK8sFinished method of the ServiceManager interface did not match the actual implementation. Let's fix it. Fixes: 66ba850fd1c1 ("services: refactor SyncWithK8sFinished to return stale services") Fixes: cf4279c68202 ("services: don't wait for clustermesh to delete local stale backends") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 07 November 2023, 09:32:10 UTC
66ba850 services: refactor SyncWithK8sFinished to return stale services Refactor the SyncWithK8sFinished function to return the list of services with stale backends, which should be refreshed, rather than directly refreshing them. This makes the separation more clear, allowing to avoid having to pass the refresh function as parameter and preventing possible deadlocks due to incorrect mutex locking (due to the interdependencies between the service subsystem and service cache). Suggested-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 07 November 2023, 08:22:11 UTC
cf4279c services: don't wait for clustermesh to delete local stale backends fe4dda76dd6a ("services: prevent temporary connectivity loss on agent restart") modified the handling of restored backends to prevent possibly causing temporary connectivity disruption on agent restart if a service is either associated with multiple endpointslices (e.g., it has more than 100 backends, or is dual stack) or has backends spanning across multiple clusters (i.e., it is a global service). At a high level, we now keep a list of restored backends, which continue being merged with the ones we received an update for, until the bootstrap phase completes. At that point, we trigger an update for each service still associated with stale backends, so that they can be removed. One drawback associated with this approach, though, is that when clustermesh is enabled we currently wait for full synchronization from all remote clusters before triggering the removal of stale backends, regardless of whether the given service is global (i.e., possibly includes also remote backends) or not. One specific example in which such behavior is problematic relates to the clustermesh-apiserver. Indeed, if it gets restarted at the same time as the agents (e.g., during an upgrade), the associated service might end up including both the address of the previous pod (which is now stale) and that of the new one, which is correct. When kvstoremesh is enabled, local agents connect to it through that service. In this case, there's a circular dependency: the agent may pick the stale backend, and the connection to etcd fails, which in turn prevents the synchronization from starting and eventually completing, which is what triggers the removal of the stale backend. Although this dependency eventually resolves as a different backend is picked by the service load-balancing algorithm, unnecessary delay is introduced (the same could also happen for remote agents connecting through a NodePort if KPR is enabled).
To remove this dependency, let's perform a two-pass cleanup of stale backends: the first one as soon as we synchronize with Kubernetes, targeting non-global services only; the second triggered by full clustermesh synchronization, covering all remaining ones. Hence, non-global services can be fully functional also before the completion of the full clustermesh synchronization. It is worth mentioning that this fix applies to all non-global services, which can now converge faster also in case of large clustermeshes. Co-authored-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 07 November 2023, 08:22:11 UTC
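The two-pass cleanup described above can be illustrated with a small sketch. The `service` type and `cleanupPass` function are hypothetical and only model the selection logic (non-global services first, everything remaining later); they are not the service manager's actual types.

```go
package main

import "fmt"

// service is a hypothetical, simplified view of a load-balanced service.
type service struct {
	name   string
	global bool     // true if the service may include remote-cluster backends
	stale  []string // restored backends not confirmed by any update yet
}

// cleanupPass drops stale backends from every eligible service and returns
// the names of the services that were refreshed. Global services are only
// eligible when includeGlobal is true.
func cleanupPass(svcs []*service, includeGlobal bool) []string {
	var refreshed []string
	for _, s := range svcs {
		if len(s.stale) == 0 || (s.global && !includeGlobal) {
			continue
		}
		s.stale = nil
		refreshed = append(refreshed, s.name)
	}
	return refreshed
}

func main() {
	svcs := []*service{
		{name: "clustermesh-apiserver", global: false, stale: []string{"10.0.0.5"}},
		{name: "shared-db", global: true, stale: []string{"10.1.0.8"}},
	}
	// First pass: as soon as synchronization with Kubernetes completes.
	fmt.Println("after k8s sync:", cleanupPass(svcs, false))
	// Second pass: once all remote clusters are synchronized.
	fmt.Println("after clustermesh sync:", cleanupPass(svcs, true))
}
```

In this model the local service loses its stale backend in the first pass, so a connection attempt through it no longer depends on clustermesh synchronization.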
97af800 bpf: fine-tune PACKET_HOST adjustment in redirect_ep() Instead of inflicting this code path on every user of redirect_ep(), limit it to actual tunnel traffic. This is possible as we now have the `from_tunnel` parameter in bpf_lxc's policy tail-call function. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 November 2023, 01:16:17 UTC
0cd9780 bpf: lxc: fine-tune `from_tunnel` path in ingress tail-call Wrap the access to CB_FROM_TUNNEL in HAVE_ENCAP, so that it can be easily optimized out when there's no tunnel traffic in the cluster. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 November 2023, 01:16:17 UTC
746fb2a bpf: lxc: remove kube-proxy workaround in to-container program This specific workaround claims to handle service replies that came in via tunnel, and require RevDNAT by kube-proxy before delivery to the local pod. But it's located in the to-container program (and so the next hop will be the veth peer, *not* the kernel stack). And l3_local_delivery() in from-overlay will have already applied the same workaround, before passing the packet to the stack. Thus it's safe to remove the stale code in bpf_lxc. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 November 2023, 01:16:17 UTC
0e74363 bpf: l3: use `from_tunnel` parameter for kube-proxy workaround To stay consistent with bpf_lxc, prefer the `from_tunnel` parameter over IS_BPF_OVERLAY to detect whether l3_local_delivery() was inlined by from-overlay. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 November 2023, 01:16:17 UTC
37308ef bpf: lxc: use CB_FROM_TUNNEL for kube-proxy workaround in ingress tail-call We recently added a `from_tunnel` parameter to the local delivery path of IPv4 traffic, which gets passed to the pod's ingress tail-call via CB_FROM_TUNNEL. Add the same parameter for the IPv6 path, and then use it in bpf_lxc to fine-tune the kube-proxy workaround. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 November 2023, 01:16:17 UTC
2685d16 bpf: nodeport: fix up indentation in ingress path 5e2202af5c10 ("bpf: nodeport: make Ingress RevDNAT tail-call conditional") introduced some mis-indentation, fix this up again. While at it also clean up some tiny whitespace damage for two jump labels. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 November 2023, 01:12:19 UTC
229b446 bpf: nodeport: re-introduce Ingress HostFW between RevSNAT and RevDNAT 5e2202af5c10 ("bpf: nodeport: make Ingress RevDNAT tail-call conditional") changed the sequence in which RevNAT and Ingress HostFW are applied on modern kernels. Where previously we would apply them in the order of RevSNAT / Ingress policy / RevDNAT, the new sequence is RevSNAT / RevDNAT / Ingress policy as we now inline nodeport_rev_dnat_ingress_ipv*(), and then only on the recircle pass through the Ingress policy in bpf_host's handle_ipv*(). With the subtle difference that 1. we now apply policy *after* RevDNAT, and thus the .saddr is the VIP instead of the backendIP, and 2. after RevDNAT, we might redirect to a different interface (and thus *not* recircle, skipping Ingress policy entirely). Restore the old sequence, by shuffling the relevant HostFW policy code back into place between RevSNAT and RevDNAT. Having it there was the plan all along :). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 07 November 2023, 01:12:19 UTC
80d99a6 bpf: Add TC_ACT_REDIRECT check for nodeport Relates: https://github.com/cilium/cilium/pull/18894#discussion_r1373896641 Signed-off-by: Tam Mach <tam.mach@cilium.io> 06 November 2023, 21:26:08 UTC
e72677d k8s ingress & gateway api: qualify envoy clusters and their references Currently, the Clusters resources in the CiliumEnvoyConfig aren't qualified with the namespace and name of the CEC itself. This leads to issues when updating the CiliumEnvoyConfigs due to changes in the K8s Ingress & Gateway API resources. If multiple resources are referencing the same K8s Service, the Cluster resource gets deleted if one of these K8s Ingress or Gateway resources gets deleted - and therefore breaks the other resources. With this commit, the name of the Cluster resource and their references gets qualified like the rest of the resources. For the EDS lookup of the endpoint addresses, the field `service_name` of the Cluster is used. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 06 November 2023, 16:32:33 UTC
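To see why qualifying the Cluster name matters, consider the sketch below. The name format produced by `qualifiedClusterName` is purely an assumption for illustration; the entry above only states that the name is qualified with the namespace and name of the CEC and that `service_name` is used for the EDS lookup.

```go
package main

import "fmt"

// qualifiedClusterName builds a cluster name that is unique per
// CiliumEnvoyConfig. The exact format is hypothetical.
func qualifiedClusterName(cecNamespace, cecName, svcNamespace, svcName, port string) string {
	return fmt.Sprintf("%s/%s/%s:%s:%s", cecNamespace, cecName, svcNamespace, svcName, port)
}

func main() {
	// clusters maps a cluster name to the service_name used for EDS lookup.
	clusters := map[string]string{}

	// Two Ingress resources reference the same backing Service.
	for _, cec := range []string{"ingress-a", "ingress-b"} {
		name := qualifiedClusterName("default", cec, "default", "web", "80")
		clusters[name] = "default/web:80"
	}

	// Deleting the first Ingress removes only its own cluster entry.
	delete(clusters, qualifiedClusterName("default", "ingress-a", "default", "web", "80"))

	for name, eds := range clusters {
		fmt.Printf("cluster %q still resolves endpoints via %q\n", name, eds)
	}
}
```

With an unqualified name both resources would share a single map key, and deleting one would remove the cluster the other still depends on.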
550b56e cmd: Unit test for parseNodeID Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
0e5d3c3 cmd: New flag to flush only XFRM configs for a given node ID This can be useful to flush the XFRM configs of stale node IDs. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
47e1b3f ipsec: Move getNodeIDFromXfrmMark to pkg/common We will use this function from cilium-dbg in the subsequent commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
c924bd6 cmd: Unit test for the filterXFRMs function We test both a single call to filterXFRMs and two chained calls. The latter is because we will need to chain calls for different filters because they are ANDed. For example, filtering on both the SPI and the node ID should only flush XFRM configs that match for both the given SPI and node ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
dd8920a cmd: Refactor XFRM filter function to ease generalization Refactor the filterXFRMBySPI function to be able to filter by other things than SPI without duplicating the main logic. The new function filterXFRMs takes two predicate functions instead of hardcoding the comparison to "spi". No functional changes in this commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
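The predicate-based filtering and the ANDed chaining mentioned in the two entries above can be sketched like this. The `xfrmState` type and the single-predicate `filterStates` helper are simplified, hypothetical stand-ins; the real `filterXFRMs` is described as taking two predicate functions and operates on netlink types.

```go
package main

import "fmt"

// xfrmState is a hypothetical, reduced view of an XFRM state.
type xfrmState struct {
	SPI    uint8
	NodeID uint16
}

// filterStates keeps only the states for which keep returns true.
func filterStates(states []xfrmState, keep func(xfrmState) bool) []xfrmState {
	var out []xfrmState
	for _, s := range states {
		if keep(s) {
			out = append(out, s)
		}
	}
	return out
}

// bySPI and byNodeID build predicates for one criterion each.
func bySPI(spi uint8) func(xfrmState) bool {
	return func(s xfrmState) bool { return s.SPI == spi }
}

func byNodeID(id uint16) func(xfrmState) bool {
	return func(s xfrmState) bool { return s.NodeID == id }
}

func main() {
	states := []xfrmState{
		{SPI: 3, NodeID: 10},
		{SPI: 3, NodeID: 11},
		{SPI: 4, NodeID: 10},
	}
	// Chaining two calls ANDs the filters: only SPI 3 on node ID 10 remains.
	selected := filterStates(filterStates(states, bySPI(3)), byNodeID(10))
	fmt.Println(selected)
}
```

Adding a new filter criterion then only requires a new predicate constructor, not another copy of the loop.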
5c7cfe6 cmd: New flag to flush only XFRM configs for given SPI This is useful, for example, to manually delete the XFRM config corresponding to an old key. It will warn if the user is about to delete all XFRM configs, on the assumption that this isn't the intended action or the filter wouldn't be necessary. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
37b611e cmd: Add confirmation to encrypt flush command The cilium-dbg encrypt flush command removes all XFRM states and policies on the node. That will lead to packet drops until connections are reestablished. Traffic will also be sent in plain text between pods. This commit therefore asks for confirmation when running the command, to ensure nobody performs this action by mistake. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
fe08772 ipsec: Move getSPIFromXfrmPolicy to pkg/common This function will be used from cilium-dbg so we need to expose it from a shared package. We already have such a package for IPsec utility functions in pkg/common/ipsec. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 16:28:05 UTC
f0d2d78 Augment node manager module health reporting Add instrumentation to node manager to provide module health reporting when node CRUD operations are detected. Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com> 06 November 2023, 14:51:14 UTC
ab56f67 bgpv1: Use specific log message and remove unused parameter This commit doesn't introduce any functional changes; it's solely about modifications related to log messages and parameters. Specifically: 1. In the diff(), when calling registerOrReconcileDiff and withdrawDiff, more specific log messages can be used. 2. In the withdrawDiff(), the `policy` parameter is not used and can be safely removed. 3. Additionally, withdrawDiff() is actually populating the `withdraw` field of reconcileDiff, while reconcileDiff doesn't have a `remove` field. Signed-off-by: Huagong Wang <wanghuagong@kylinos.cn> 06 November 2023, 14:50:55 UTC
fb2f404 Makefile: go test with -vet=all Without -vet=all, go test doesn't run the same set of vets as go vet. Since we dropped go vet ./... from the integration tests, we weren't running the full suite anymore. Fixes: f6346d2856 (make: drop redundant `go vet ./...` from integration tests) Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 06 November 2023, 14:47:54 UTC
d9a4fdd fieldmask_test: avoid copying flow Avoid copying the flow protobuf message, as this triggers go vet to complain about copying a mutex (in internal protobuf state). No functional changes intended, simply appeasing vet. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 06 November 2023, 14:47:54 UTC
50b3e8e images: update cilium-builder Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 06 November 2023, 14:47:54 UTC
2123859 Revert "Add deepcopy plugin" This reverts commit 9f4dcea6b2dadab1a659b6a07b07eb8b5f48e86f. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 06 November 2023, 14:47:54 UTC
7cc1852 Revert "Run deepcopy protoc plugin only for flow.proto" This reverts commit cc94abe1afb8cf5e6479603b12ba89fcfeb7e387. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 06 November 2023, 14:47:54 UTC
46bea90 Change Hubble dashboard name Update hubble dashboard name Signed-off-by: Dean <22192242+saintdle@users.noreply.github.com> 06 November 2023, 14:47:22 UTC
4b098b1 Update USERS.md Signed-off-by: eliranw <39266788+eliranw@users.noreply.github.com> 06 November 2023, 14:47:02 UTC
4e8df52 CODEOWNERS: IPsec owns pkg/common/ipsec This directory was introduced in commit 2218611873de ("cmd, common: Move countUniqueIPsecKeys to common/ipsec pkg"), but the CODEOWNERS were not updated. Fixes: 2218611873de ("cmd, common: Move countUniqueIPsecKeys to common/ipsec pkg") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 14:22:40 UTC
8fb0dd0 ci: Remove useless quotes in update label workflow Quotes are not needed for GHA variables, so let's remove them. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 06 November 2023, 14:15:45 UTC
c2674ae ingress: update resources on changed ingress class field If an Ingress resource with `ingressClass: cilium` is changed to a different value, the corresponding resources (CEC, Endpoints & Service) aren't removed (mode dedicated) or the shared CiliumEnvoyConfig isn't updated (mode shared). Therefore, this commit reflects the changes on the corresponding resources when the `ingressClass` of an Ingress gets updated from `cilium` to something else. Fixes: #23781 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 06 November 2023, 14:01:05 UTC
66b71f3 bpf/tests: Fixed `loop not unrolled` error in pktgen On newer clang versions, the `pktgen_finish` function would throw an error when compiling: `loop not unrolled: the optimizer was unable to perform the requested transformation`. I moved the logic inside the switch cases into their own inlined functions, this seems to resolve the issue. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 06 November 2023, 11:26:08 UTC
1900f4a datapath: Move linuxNodeHandler IPsec functions to their own file This commit has no functional changes. It simply moves all the linuxNodeHandler functions that pertain to IPsec to a new file, ipsec.go. This will ease review assignments by ensuring that we don't require an IPsec review on non-IPsec code and vice versa. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 11:18:30 UTC
ed546a7 gateway-api: Check for required CRDs upon startup Currently, if the required CRDs are not installed, Cilium Operator will just crash due to the below error. This commit is to perform pre-flight check, and avoid the crash. ``` 2023-11-03T11:53:45.585078169Z level=fatal msg="failed to start: failed to create gateway controller: failed to get API group resources: unable to retrieve the complete list of server APIs: gateway.networking.k8s.io/v1: the server could not find the requested resource" subsys=cilium-operator-generic ``` Signed-off-by: Tam Mach <tam.mach@cilium.io> 06 November 2023, 10:56:21 UTC
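The pre-flight idea from the entry above, checking for required CRDs before creating the controller instead of failing fatally, can be sketched as follows. `missingCRDs` and the `installed` map are hypothetical; in the operator the installed set would come from the Kubernetes API, and the CRD names listed here are examples only.

```go
package main

import (
	"fmt"
	"sort"
)

// missingCRDs returns the required CRD names that are absent from the
// installed set, sorted for stable log output.
func missingCRDs(required []string, installed map[string]bool) []string {
	var missing []string
	for _, name := range required {
		if !installed[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Example names only; the real list of required CRDs is an assumption.
	required := []string{
		"gateways.gateway.networking.k8s.io",
		"httproutes.gateway.networking.k8s.io",
	}
	installed := map[string]bool{
		"gateways.gateway.networking.k8s.io": true,
	}

	if m := missingCRDs(required, installed); len(m) > 0 {
		// Log and skip the controller rather than crash the operator.
		fmt.Println("Gateway API controller not started, missing CRDs:", m)
		return
	}
	fmt.Println("starting Gateway API controller")
}
```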
3aa51eb bpf: ipsec: move get_min_encrypt_key() to encrypt.h Keep this function co-located with the other IPsec code, and bring it into CODEOWNERS scope of cilium/ipsec. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 06 November 2023, 09:52:50 UTC
df969b7 ipsec: Remove dead code for IPsec node encryption Node encryption for IPsec hasn't been supported since 1d2674df ("docs: ipsec: remove node-to-node encryption") and subsequent commits. The feature also hadn't been working for several releases. This commit simply removes the code for that feature. This code has no use now and makes changes to IPsec slightly more difficult. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 06 November 2023, 07:25:37 UTC
a42ef40 gateway-api: Enable Response header test for standard channel This is to enable HTTPResponseHeaderModification feature for standard CRDs. Signed-off-by: Tam Mach <tam.mach@cilium.io> 03 November 2023, 23:25:22 UTC
ef1b0eb gateway-api: Add support for HTTPRoute timeout We need to remove max stream duration to make the timeout effective in the route action; max stream duration can also be set in the HTTP connection manager. Also note that this feature is only available in the experimental CRDs. Signed-off-by: Tam Mach <tam.mach@cilium.io> 03 November 2023, 23:25:22 UTC
54230ee gateway-api: Enable CI for Backend protocol features This commit enables two new features, HTTPRouteBackendProtocolH2C and HTTPRouteBackendProtocolWebSocket. Note that we can simply reverse the feature flags (e.g. supported vs exempt) for better maintenance and faster lookup. Signed-off-by: Tam Mach <tam.mach@cilium.io> 03 November 2023, 23:25:22 UTC
efbaaf1 docs: Add upgrade note for Gateway API Signed-off-by: Tam Mach <tam.mach@cilium.io> 03 November 2023, 23:25:22 UTC
636d7b6 gateway-api: Bump the version to v1.0.0 There is one small change in Gateway status in upstream, in which HTTPRoute having ResolveRef as False is still allowed to be attached into Gateway resource, but respective Gateway listener Programmed status should be set as False. Due to the bug in conformance test in v1.0.0, we need to use the commit hash from upstream main branch. Signed-off-by: Tam Mach <tam.mach@cilium.io> 03 November 2023, 23:25:22 UTC
a48bce8 ctmap: improve dump of CT_SERVICE entries The CT tuple (== key) for CT entries is typically stored in "reply" layout - .saddr/.daddr match a reply packet, - .sport/.dport are in reverse order of a reply packet The exception is CT_SERVICE entries, where the CT tuple is stored in "forward" layout - .saddr/.daddr match a forward packet, - .sport/.dport are in reverse order of a forward packet ctmap's .Dump() implementations didn't consider this, so when dumping a CT map the CT_SERVICE entries would be printed in opposite direction. Fix up the formatting, and also print CT_SERVICE entries as dedicated type ("TCP SVC") instead of aliasing with "TCP OUT" entries. before: --- TCP OUT 10.96.0.1:443 -> 10.244.0.113:46298 service expires=153061 RxPackets=0 RxBytes=1 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=145061 Flags=0x0010 [ SeenNonSyn ] RevNAT=1 SourceSecurityID=0 IfIndex=0 TCP OUT 10.96.0.1:443 -> 10.244.0.161:59970 service expires=153062 RxPackets=0 RxBytes=1 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=145061 Flags=0x0010 [ SeenNonSyn ] RevNAT=1 SourceSecurityID=0 IfIndex=0 TCP OUT 10.96.0.1:443 -> 10.244.0.87:49382 service expires=153062 RxPackets=0 RxBytes=1 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=145061 Flags=0x0010 [ SeenNonSyn ] RevNAT=1 SourceSecurityID=0 IfIndex=0 after: --- TCP SVC 10.244.0.113:46298 -> 10.96.0.1:443 expires=155382 RxPackets=0 RxBytes=1 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=147382 Flags=0x0010 [ SeenNonSyn ] RevNAT=1 SourceSecurityID=0 IfIndex=0 TCP SVC 10.244.0.161:59970 -> 10.96.0.1:443 expires=155376 RxPackets=0 RxBytes=1 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=147376 Flags=0x0010 [ SeenNonSyn ] RevNAT=1 SourceSecurityID=0 IfIndex=0 TCP SVC 10.244.0.87:49382 -> 10.96.0.1:443 expires=155365 RxPackets=0 RxBytes=1 
RxFlagsSeen=0x00 LastRxReport=0 TxPackets=0 TxBytes=0 TxFlagsSeen=0x1a LastTxReport=147361 Flags=0x0010 [ SeenNonSyn ] RevNAT=1 SourceSecurityID=0 IfIndex=0 Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 03 November 2023, 16:41:09 UTC
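The layout difference described in the ctmap entry above can be made concrete with a small formatter. This is one reading of the commit message, not the actual ctmap code: `ctTuple` and `formatEntry` are hypothetical, and only the address and port ordering is modelled.

```go
package main

import "fmt"

// ctTuple is a hypothetical, simplified CT key. For regular entries the
// addresses match a reply packet; for service entries they match a forward
// packet. In both layouts the ports are stored in reverse order.
type ctTuple struct {
	SAddr, DAddr string
	SPort, DPort uint16
	Service      bool
}

// formatEntry always prints the connection in forward direction.
func formatEntry(t ctTuple) string {
	if t.Service {
		// Forward layout: addresses are already in forward order,
		// only the ports need to be swapped back.
		return fmt.Sprintf("TCP SVC %s:%d -> %s:%d", t.SAddr, t.DPort, t.DAddr, t.SPort)
	}
	// Reply layout: addresses are swapped relative to the forward direction.
	return fmt.Sprintf("TCP OUT %s:%d -> %s:%d", t.DAddr, t.SPort, t.SAddr, t.DPort)
}

func main() {
	svc := ctTuple{SAddr: "10.244.0.113", DAddr: "10.96.0.1", SPort: 443, DPort: 46298, Service: true}
	fmt.Println(formatEntry(svc))
}
```

Applying the reply-layout branch to the same service tuple would print `10.96.0.1:443 -> 10.244.0.113:46298`, which is the reversed direction shown in the "before" output.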
2c76783 Add IDNIC/Kadabra as user to Cilium Signed-off-by: Ardika Bagus <me@ardikabs.com> 03 November 2023, 12:46:35 UTC
1a0553c hubble: Add config option to redact user info in L7 flows Add business logic to L7 HTTP parser to conditionally redact sensitive user info (e.g., password used in basic authentication) when present in observed URLs. * Add the '--hubble-redact-http-userinfo' option to the Cilium CLI. Preserve existing functionality by setting it to true by default. * Add unit tests to verify that password in observed URL is redacted. * Fix issue in L7 HTTP parser where sensitive values were redacted in (L7) HTTP flows, but not in (L7) HTTP summaries. * Update documentation as needed. * Update Helm chart templates, values and docs as needed. Closes #23887 Signed-off-by: Ioannis Androulidakis <androulidakis.ioannis@gmail.com> 03 November 2023, 08:35:42 UTC
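Redacting the password from an observed URL, as described in the entry above, can be sketched with Go's standard `net/url` package. The `redactUserInfo` helper and the `REDACTED` placeholder are assumptions for illustration; the entry does not show what the Hubble parser emits.

```go
package main

import (
	"fmt"
	"net/url"
)

// redactUserInfo replaces the password in the URL's userinfo part, if any,
// with a fixed placeholder and leaves everything else untouched.
func redactUserInfo(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.User != nil {
		if _, hasPassword := u.User.Password(); hasPassword {
			// The placeholder value is hypothetical.
			u.User = url.UserPassword(u.User.Username(), "REDACTED")
		}
	}
	return u.String(), nil
}

func main() {
	for _, raw := range []string{
		"http://user:secret@example.com/path",
		"http://example.com/path",
	} {
		redacted, err := redactUserInfo(raw)
		if err != nil {
			fmt.Println("cannot parse URL:", err)
			continue
		}
		fmt.Println(redacted)
	}
}
```

Running the same helper over both the flow and its summary avoids the mismatch the entry mentions, where only one of the two was redacted.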
3b9ea60 clustermesh-apiserver: use absolute path to include Makefile.defs Align the clustermesh-apiserver Makefile with the strategy already adopted in the ones for the other components, so that it can be included from elsewhere. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 03 November 2023, 07:48:57 UTC
d016191 clustermesh-apiserver: include kvstoremesh as subcommand Merge the kvstoremesh logic into the clustermesh-apiserver as a separate subcommand, so that we can get rid of one image and reduce the overall time to bootstrap the clustermesh-apiserver deployment. The additional boilerplate to build the kvstoremesh images is dropped, except for the Dockerfile, which is currently preserved to prevent CI failures (and will be removed through a subsequent commit). Size of the images before this change: $ make docker-clustermesh-apiserver-image $ docker inspect -f '{{ .Size }}' quay.io/cilium/clustermesh-apiserver:latest | \ numfmt --to=si --suffix=B --format="%.2f" ---> 72.09MB $ make docker-kvstoremesh-image $ docker inspect -f '{{ .Size }}' quay.io/cilium/kvstoremesh:latest | \ numfmt --to=si --suffix=B --format="%.2f" ---> 64.62MB Size of the image after this change: $ make docker-clustermesh-apiserver-image $ docker inspect -f '{{ .Size }}' quay.io/cilium/clustermesh-apiserver:latest | \ numfmt --to=si --suffix=B --format="%.2f" ---> 72.26MB Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 03 November 2023, 07:48:57 UTC
043adc2 kvstoremesh: refactor cobra cmd and hive to avoid globals Refactor kvstoremesh to mimic the structure adopted by the agent, as a preparation to merge it with the clustermesh-apiserver in a single binary. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 03 November 2023, 07:48:57 UTC
c902330 clustermesh-apiserver: move to a subcommand Move the current clustermesh-apiserver code to a dedicated subcommand, to prepare for the subsequent merging of kvstoremesh into the same binary. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 03 November 2023, 07:48:57 UTC
4edee93 clustermesh-apiserver: refactor cobra cmd and hive to avoid globals Refactor the clustermesh-apiserver to mimic the structure adopted by the agent, as a preparation to merge it with kvstoremesh in a single binary. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 03 November 2023, 07:48:57 UTC
d221e96 k8s: Log Warning for Policies that Support "EndPort" Cilium does not currently support port ranges in network policies. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 03 November 2023, 06:26:24 UTC
6e7b22a bpf: proxy: pass IPv4 header to ctx_redirect_to_proxy_hairpin() Avoid re-validating the IPv4 header when called from nodeport_lb4(). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 03 November 2023, 03:13:15 UTC
8f88a2a bpf: lxc: pass L3 header to ipv*_policy() In the tail_ipv*_to_endpoint() path we already have a validated L3 header. Make it possible to pass this header to ipv*_policy(). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 03 November 2023, 03:13:15 UTC
f8c96d6 etcd: drop encode/decode operations tracing Only a type conversion, nothing useful to log there (even at trace level). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 02 November 2023, 16:21:36 UTC
adb3713 etcd: always reuse the same logger Avoid recreating a new logger instance for every operation, as it is an expensive operation in terms of allocations. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 02 November 2023, 16:21:36 UTC
7033b39 etcd: output detailed watcher logs only if tracing is enabled Detailed logs concerning each key/value event received by every watcher can be extremely verbose even at low scales, causing other useful kvstore related debug messages to be missed. Hence, let's output them only when verbose kvstore logs are enabled. This is also consistent with all other kvstore operations. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 02 November 2023, 16:21:36 UTC
4a7c416 kvstore: drop watcher's name from log messages In all usages, the name is somewhat related to the watched prefix. Hence, let's just drop it, and consistently use the prefix in messages for additional clarity. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 02 November 2023, 16:21:36 UTC
cadbfec kvstore: drop ListAndWatch alias function This alias appeared to be used only in a unit test. Hence, let's just get rid of it. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 02 November 2023, 16:21:36 UTC
0c3d7e9 kvstore: drop Debug variable controlled by ldflags The logging level can be configured through a dedicated flag (and environment variable). Let's drop this legacy ldflags variable, as not specific to the kvstore package only. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 02 November 2023, 16:21:36 UTC
f911b12 enabled initialDelaySeconds on StartupProbe Signed-off-by: jignyasamishra <iamjignyasa@gmail.com> 02 November 2023, 09:48:29 UTC
f8e9472 Replace metricsmap-bpf-prom-sync with Prometheus Collector pattern Previously, the 'metricsmap-bpf-prom-sync' controller was responsible for periodically collecting cilium_datapath drop and forward metrics. This commit introduces the metricsmapCollector, which implements the Prometheus Collector interface within the metricsmap package. As a result, the aforementioned controller has been removed. The metricsmapCollector is registered within the global Prometheus registry during the initialization of metricsmap. The metrics.Register function has been modified to propagate errors from the registry.Register function instead of simply overriding them with nil. The metricsmapCollector comprises two metrics maps: forwardedMetricsMap and droppedMetricsMap. These maps are populated within a callback function passed to the IterateWithCallback function. This approach serves two primary purposes: 1. Separation of Map Iteration and Metric Update: By separating the iteration over the BPF map and the updating of Prometheus metrics, the implementation ensures that no partial metrics are exposed in case of map iteration failure. 2. Normalization of exposed Metrics: Unlike the statement in the bpf/lib/metrics.h comments, which suggests exposing only one reason label for forwarded metrics, the eBPF code exposes multiple reasons. Through testing, it was found that reasons 0 and 3 were exposed, which triggered this error in the Prometheus client: https://github.com/prometheus/client_golang/issues/242 Furthermore, the cilium command has an indirect dependency on the metricsmap package. To prevent metrics map initialization from happening each time the cilium command is executed, metricsmap is converted to a hive Cell and injected into the cilium daemon Infrastructure module. For details see this comment: https://github.com/cilium/cilium/pull/27370#issuecomment-1710805663. Fixes: #27058 Signed-off-by: Boris Petrovic <carnerito.b@gmail.com> 02 November 2023, 08:39:42 UTC
4a87137 add v1.15.0-pre.2 release Signed-off-by: André Martins <andre@cilium.io> 02 November 2023, 08:19:33 UTC
fb996fa makefile: add back the sed command to update the logo path Fixes: 44698df ("helm: simplify auto TLS annotations and various cleanup based on PR feedback") Signed-off-by: Brad Whitfield <bradswhitfield@gmail.com> 01 November 2023, 19:50:49 UTC
e494302 Config option to set the default IP Pool Config option to customize the default IP Pool when using MultiPool Fixes: #27131 Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> 01 November 2023, 17:00:41 UTC
36b7802 ci: Bump timeout on ci-runtime privileged workflow Bump timeout from 20 to 30 minutes due to repeated workflow cancellations due to timeout on otherwise successful workflow runs. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 01 November 2023, 15:27:12 UTC
7e11821 datapath/devices: Avoid adding duplicate address Since subscribing to links/addresses/routes and then listing them is racy we might end up seeing the same address in the initial listing and as an update. Avoid adding the address twice by checking its existence first. Side note: we need the initial listing to be able to populate the tables before readers access them in order to keep the existing non-reconciling semantics. The netlink library we're using does not have a mechanism to inform that the initial listing is done in the subscriptions (even though netlink DONE message is sent for this), so instead DevicesController does subscribe + list which necessitates dealing with issues like this. Signed-off-by: Jussi Maki <jussi@isovalent.com> 01 November 2023, 12:27:00 UTC
60fd85a devices: Flush routes on device delete As Linux does not send a comprehensive set of route delete messages on device deletion, flush the routes on the link delete message. Since the link and route messages are coming over separate sockets they may come out of order, so keep track of which link indexes have been deleted and ignore route updates for deleted links. Linux does reuse the index after the 32 bits roll around, so remove the "dead link index" on link creation. Signed-off-by: Jussi Maki <jussi@isovalent.com> 01 November 2023, 12:27:00 UTC
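A minimal sketch of the bookkeeping described in 60fd85a, assuming a simplified in-memory model: the `routeTracker` type, its method names, and the string route representation are all hypothetical.

```go
package main

import "fmt"

// routeTracker is a hypothetical model of the state kept per link index.
type routeTracker struct {
	routes map[int][]string   // routes keyed by link index
	dead   map[int]struct{}   // indexes of links known to be deleted
}

func newRouteTracker() *routeTracker {
	return &routeTracker{
		routes: make(map[int][]string),
		dead:   make(map[int]struct{}),
	}
}

// onLinkCreate clears the dead marker, since an index can be reused.
func (t *routeTracker) onLinkCreate(index int) {
	delete(t.dead, index)
}

// onLinkDelete flushes all routes of the link and marks the index dead.
func (t *routeTracker) onLinkDelete(index int) {
	delete(t.routes, index)
	t.dead[index] = struct{}{}
}

// onRouteUpdate records a route unless the link is known to be deleted,
// which covers route messages arriving after the link delete message.
func (t *routeTracker) onRouteUpdate(index int, route string) bool {
	if _, isDead := t.dead[index]; isDead {
		return false
	}
	t.routes[index] = append(t.routes[index], route)
	return true
}

func main() {
	t := newRouteTracker()
	t.onLinkCreate(7)
	fmt.Println(t.onRouteUpdate(7, "10.0.0.0/24"))
	t.onLinkDelete(7)
	fmt.Println(t.onRouteUpdate(7, "10.0.1.0/24")) // late update is ignored
	t.onLinkCreate(7)                              // index reused by a new link
	fmt.Println(t.onRouteUpdate(7, "10.0.2.0/24"))
}
```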
ac24ad4 datapath: Ensure device and route tables are populated before readers To enforce that the device and route tables are populated before being read from, provide the Table[*Device] and Table[*Route] from the DevicesController constructor. This makes sure that anything that depends on e.g. 'Table[*Device]' will be constructed after DevicesController and thus the DevicesController start hook will execute before it. Signed-off-by: Jussi Maki <jussi@isovalent.com> 01 November 2023, 12:27:00 UTC
c5b65f5 statedb: Add NewPrivateRWTableCell Add a helper to construct only RWTable[T] and TableMeta to allow creating a table in a way that enforces that the table is populated by the producer before it can be read: var Cell = cell.Module("example", "Example module", // Privately provides RWTable[*Foo]: statedb.NewPrivateRWTableCell[*Foo]("foos", FooIDIndex), cell.Provide(New), ) func New(lc hive.Lifecycle, t statedb.RWTable[*Foo]) (FooController, Table[*Foo]) { fooCtrl := ... lc.Append(fooCtrl) // fooCtrl now starts before anything that reads Table[*Foo] return fooCtrl, t } Signed-off-by: Jussi Maki <jussi@isovalent.com> 01 November 2023, 12:27:00 UTC
28a3cb7 bpf: lb: fix missing drop reason in reverse_map_l4_port() l4_load_port() is just a thin wrapper around ctx_load_bytes(), which returns raw kernel errnos. Translate these to a Cilium-internal drop reason before returning to the caller. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 01 November 2023, 09:37:51 UTC
4110131 dnsproxy: convert LookupEndpointByIP to use netip.Addr Modernize the package by switching to the netip.Addr type provided since Go 1.18. This simplifies the code and avoids some conversions from/to string. Signed-off-by: Tobias Klauser <tobias@cilium.io> 01 November 2023, 08:38:40 UTC
28ce005 endpointmanager: fix bpf policy pressure getting stuck. Currently the policy map pressure metric is only updated when a new pressure value that is higher than the current one is set. This means that the metric can only ever go up, so when maps are shrunk (e.g., after doing a cilium fqdn cache clean) the metric never goes down. This changes the behavior of the metric to maintain a map of map pressure values. When the trigger is invoked, it iterates over all values and finds the max, updating the map_pressure gauge for policymaps to the max value. Endpoints that are shut down have their values removed. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 01 November 2023, 06:44:41 UTC
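The behavior described in 28ce005 can be approximated as follows: keep the latest pressure value per endpoint and recompute the maximum on demand, so the reported value can drop again. The `pressureTracker` type and its methods are hypothetical names for this sketch, not the endpointmanager's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// pressureTracker is a hypothetical holder of the latest policy map
// pressure per endpoint ID.
type pressureTracker struct {
	mu     sync.Mutex
	values map[uint16]float64
}

func newPressureTracker() *pressureTracker {
	return &pressureTracker{values: make(map[uint16]float64)}
}

// set records the current pressure for an endpoint, replacing any
// earlier (possibly higher) value.
func (p *pressureTracker) set(id uint16, v float64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.values[id] = v
}

// remove drops the value of an endpoint that has been shut down.
func (p *pressureTracker) remove(id uint16) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.values, id)
}

// max returns the highest current pressure, or 0 with no endpoints.
// This is the value a gauge would be set to when the trigger fires.
func (p *pressureTracker) max() float64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	highest := 0.0
	for _, v := range p.values {
		if v > highest {
			highest = v
		}
	}
	return highest
}

func main() {
	t := newPressureTracker()
	t.set(1, 0.9)
	t.set(2, 0.4)
	fmt.Println(t.max())
	t.set(1, 0.2) // map shrank, pressure goes down
	fmt.Println(t.max())
	t.remove(2) // endpoint shut down
	fmt.Println(t.max())
}
```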
904ceb3 gh/workflows: Dump Cilium LB node logs in case of failure The Cilium standalone LB does not run as a K8s pod, so the regular Cilium's sysdump collection does not work. Instead, just show docker container logs of the LB. Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> 01 November 2023, 06:42:14 UTC
a2e5509 api, cli: Show srv6 status in cilium status Signed-off-by: Husni Alhamdani <dhanielluis@gmail.com> 01 November 2023, 01:58:08 UTC
71d7b94 contrib: Fix prerelease pullPolicy When running the post-release.sh (and hence pull-docker-manifests.sh) scripts for prereleases, the pull policy would be set wrong by default (set to Always, rather than IfNotPresent). Fix it. Reported-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 31 October 2023, 19:26:40 UTC