Revision Author Date Message Commit Date
c092f67 test: [regression] Re-enable tests on non-AKS platforms The tests were incorrectly skipped on non-AKS platforms due to the regression introduced recently that flipped the conditions to skip the test on the AKS platform. Fixes: 0a92cc57a0 (test: remove references to v4.19) Signed-off-by: Aditi Ghag <aditi@cilium.io> 26 February 2024, 08:45:33 UTC
213c128 fix(deps): update all go dependencies main Signed-off-by: renovate[bot] <bot@renovateapp.com> 26 February 2024, 08:37:41 UTC
3b8d95c Egress rule supports CiliumCIDRGroup Fixes: #30597 Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> 26 February 2024, 08:33:25 UTC
b463146 bgpv2: update advertisement CRD to include service options Define service options field (optional) in Advertisement struct. This will be used to add additional metadata related to advertisement type Service. In this case, we add additional advertisement sub-types like LoadBalancerIP, ClusterIP and ExternalIP for a service to be advertised via BGP. This change also fixes minor typos and renaming of exported fields in BGP advertisement file to have consistent prefix. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 26 February 2024, 08:32:27 UTC
8fcfad9 bpf: correctly encapsulate pod to node traffic with kube-proxy+hostfw When the host firewall is enabled in tunneling mode, pod to node traffic needs to be forwarded through the tunnel in order to preserve the security identity (as otherwise the source IP address would be SNATted), which is required to enforce ingress host policies. One tricky case is represented by node (or hostns pod) to pod traffic via services with local ExternalTrafficPolicy, when KPR is disabled. Indeed, in this case, the SYN packet is routed natively (as both the source and the destination are node IPs) to the destination node, and then DNATted to one of the backend IPs, without being SNATted at the same time. Yet, the SYN+ACK packet would then be incorrectly redirected through the tunnel (as the destination is a node IP, associated with a tunnel endpoint in the ipcache), hence breaking the connection, while it should be passed to the stack to be rev DNATted and then forwarded accordingly. In detail, reporting the description from c8052a1fab8b, the broken packet path is node1 --VIP--> pod@node2 (VIP is node2IP): - SYN leaves node1 via native device with node1IP -> VIP - SYN is DNATed on node2 to node1IP -> podIP - SYN is delivered to lxc device with node1IP -> podIP - SYN+ACK is sent from lxc device with podIP -> node1IP - SYN+ACK is redirected in BPF directly to cilium_vxlan - SYN+ACK arrives on node1 via tunnel with podIP -> node1IP - RST is sent because podIP doesn't match VIP c8052a1fab8b attempted to fix this issue for the kube-proxy+hostfw (and IPSec) scenarios by always passing the packets to the stack, so that it doesn't bypass conntrack. The IPSec specific workaround got then removed in 0a8f2c4ee43e, as that path asymmetry is no longer present. 
However, always passing packets to the stack breaks the host firewall policy enforcement for pod to node traffic, as at that point there's no route which redirects these packets back to the tunnel to preserve the security identity, and they get simply masqueraded and routed natively. To prevent this issue, let's pass packets to the stack only if they are a reply with destination identity matching a remote node, as in that case they may need to be rev DNATted. There are two possibilities at that point: (a) the destination is a CiliumInternalIP address, and the reply needs to go through the tunnel -- node routes ensure that the packet is first forwarded to cilium_host, before being redirected through the tunnel; (b) the destination is one of the other node addresses, and the reply needs to be forwarded natively according to the local routing table (as node to pod/node traffic never goes through the tunnel unless the source is a CiliumInternalIP address). Overall, this change addresses the externalTrafficPolicy=local service case, while still preserving encapsulation in all other cases. As a side effect, it also improves the performance in the kube-proxy + hostfw case, as pod to pod traffic gets now also redirected immediately through the tunnel, instead of being sent via the stack. Fixes: c8052a1fab8b ("bpf: Do not bypass conntrack if running kube-proxy+hostfw or IPSec") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 26 February 2024, 08:32:20 UTC
1907334 identitybackend: clean up TestGetIdentity The previous patch explains and fixes a flake, this patch removes some of the remaining cruft from earlier attempts at fixing said flake, as well as running the test in parallel (for efficiency). Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 23 February 2024, 18:34:51 UTC
deb0687 identitybackend: address race condition in test TestGetIdentity has been unreliable, even withstanding some previous attempts at deflaking. The issue lies in the use of the k8s fake infrastructure: the simple testing object tracker of client-go does _not_ set the ResourceVersion for resources created. This interacts badly with the logic of the client-go reflector's ListAndWatch method, which relies on the resource version to close the racy window between its List and Watch calls. The real k8s api-server will replay events which occur after the completion of List and before the establishment of the Watch, thanks to the ResourceVersion. The object tracker's Watch implementation, however, does (and can) not do so, as it doesn't have a resource version to determine which events it would need to replay. Notably, the HasSynced method of the informer will return true once the initial List has succeeded. This isn't a guarantee for the Watch to be established (and indeed, the reflector establishes the Watch _after_ the list). This is fine for reality, again thanks to the resource version and the api-server replaying. The race, hence, is that the creation of the identities can happen concurrently to the establishment of the watch (HasSynced guarantees that it happens _after_ the list), and thus we race the creation of the "RaceFreeWatcher" in the object tracker. If the watcher is late, it misses the creation of an identity, and we time out waiting on the wait group. To fix this, instead of attempting to wait for the Watch establishment (which doesn't seem easy, on first glance), just create the resources _before_ list and watch is started, so that they are returned in the initial list call. Prior to this patch, the following commandline typically failed quickly: while true; do go test ./pkg/k8s/identitybackend -run 'TestGetIdentity' -v -count=1 -timeout=10s || break; done After this patch, it ran thousands of times reliably. 
Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 23 February 2024, 18:34:51 UTC
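The race described in the commit above can be reproduced in miniature. The following is a minimal sketch, assuming a toy tracker that keeps no resource versions; `fakeTracker` and `listThenWatch` are hypothetical names for illustration and are not the client-go or Cilium types.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeTracker is a toy stand-in for an object tracker without resource
// versions: a watcher only receives events sent after it registered.
type fakeTracker struct {
	mu       sync.Mutex
	objects  []string
	watchers []chan string
}

// create stores the object and notifies the watchers registered so far.
func (t *fakeTracker) create(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.objects = append(t.objects, name)
	for _, w := range t.watchers {
		w <- name // buffered, so this does not block in the demo
	}
}

// list returns a snapshot of all objects created so far.
func (t *fakeTracker) list() []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	return append([]string(nil), t.objects...)
}

// watch registers a new watcher; nothing created earlier is replayed.
func (t *fakeTracker) watch() <-chan string {
	t.mu.Lock()
	defer t.mu.Unlock()
	w := make(chan string, 16)
	t.watchers = append(t.watchers, w)
	return w
}

// listThenWatch imitates a reflector: List first, Watch afterwards. The gap
// callback runs in the window between the two calls.
func listThenWatch(t *fakeTracker, gap func()) []string {
	seen := t.list()
	gap()
	w := t.watch()
	for {
		select {
		case name := <-w:
			seen = append(seen, name)
		default:
			return seen
		}
	}
}

func main() {
	racy := &fakeTracker{}
	missed := listThenWatch(racy, func() { racy.create("identity-1") })
	fmt.Println(len(missed)) // 0: created in the gap, never replayed

	fixed := &fakeTracker{}
	fixed.create("identity-1") // create before List, as the fix does
	seen := listThenWatch(fixed, func() {})
	fmt.Println(len(seen)) // 1: returned by the initial List
}
```

Creating the objects before the list call sidesteps the window entirely, which is why the test no longer needs to know when the watch is established.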
6b2d186 job: avoid a race condition in ExitOnCloseFnCtx The test attempted to avoid closing a channel multiple times by setting 'started' to nil. However, since the outer scope will wait on 'started', if started is set to nil before the outer scope waits, it will wait indefinitely - resulting in a time out of the test. Fixes: daa85a0f4a (jobs,test: Fix TestTimer_ExitOnCloseFnCtx channel close panic) Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 23 February 2024, 17:27:16 UTC
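Receiving from a nil channel blocks forever in Go, which is why setting 'started' to nil could hang the waiter. The commit message does not say how the test was changed; one common alternative, sketched here with the hypothetical `startedSignal` type, is to guard the close with `sync.Once` and leave the channel itself untouched.

```go
package main

import (
	"fmt"
	"sync"
)

// startedSignal lets a callback that may run many times signal "started"
// exactly once, without ever replacing the channel the waiter reads from.
type startedSignal struct {
	once sync.Once
	ch   chan struct{}
}

func newStartedSignal() *startedSignal {
	return &startedSignal{ch: make(chan struct{})}
}

// markStarted closes the channel on the first call; later calls are no-ops,
// so there is no double-close panic.
func (s *startedSignal) markStarted() {
	s.once.Do(func() { close(s.ch) })
}

// wait blocks until markStarted has been called at least once.
func (s *startedSignal) wait() {
	<-s.ch
}

func main() {
	s := newStartedSignal()
	for i := 0; i < 3; i++ {
		go s.markStarted() // repeated calls are safe
	}
	s.wait()
	fmt.Println("started")
}
```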
0c77529 test: l4lb switch k8s-node-port type to lb With the recent changes in NodePort reconciliation (see https://github.com/cilium/cilium/pull/30374) it is needed to switch service type from --k8s-node-port to --k8s-load-balancer as the VIP is not assigned to the node. Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz> 23 February 2024, 12:38:35 UTC
e17cf21 gha: drop unused check_url environment variable This variable used to be used in combination with the Sibz/github-status-action action, which we replaced with myrotvorets/set-commit-status-action when reworking the workflows to be triggered by Ariane [1]. Given it is now unused, let's get rid of the leftover environment variable, so that we also stop copying it to new workflows. [1]: 9949c5a1891a ("ci: rework workflows to be triggered by Ariane") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 23 February 2024, 10:50:46 UTC
0fa767a datapath/linux: require HAVE_LARGE_INSN_LIMIT Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 23 February 2024, 09:39:18 UTC
4d2c576 datapath/linux: move creation of features.h to daemon startup Remove another side-effect from CheckRequirements. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 23 February 2024, 09:39:18 UTC
6ce6ac9 clustermesh: add byNamespace map to the global service cache Also store service in the GlobalServiceCache by namespace so that we can get all the services inside a namespace which is needed by the EndpointSlice synchronization feature. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> 23 February 2024, 09:29:38 UTC
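A secondary index by namespace, as described in the commit above, can be sketched as follows. This is an illustration only: `serviceIndex` and its methods are hypothetical, and `namespacedName` is a local stand-in with the same shape as `types.NamespacedName` so that the example has no dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// namespacedName mirrors the shape of types.NamespacedName (assumption:
// defined locally to keep the sketch dependency-free).
type namespacedName struct {
	Namespace string
	Name      string
}

// serviceIndex is a hypothetical cache with a primary map and a
// per-namespace index that is kept in sync on every change.
type serviceIndex struct {
	byName      map[namespacedName]struct{}
	byNamespace map[string]map[namespacedName]struct{}
}

func newServiceIndex() *serviceIndex {
	return &serviceIndex{
		byName:      map[namespacedName]struct{}{},
		byNamespace: map[string]map[namespacedName]struct{}{},
	}
}

// upsert records the service in both maps.
func (s *serviceIndex) upsert(k namespacedName) {
	s.byName[k] = struct{}{}
	if s.byNamespace[k.Namespace] == nil {
		s.byNamespace[k.Namespace] = map[namespacedName]struct{}{}
	}
	s.byNamespace[k.Namespace][k] = struct{}{}
}

// remove deletes the service and drops the namespace entry once it is empty.
func (s *serviceIndex) remove(k namespacedName) {
	delete(s.byName, k)
	delete(s.byNamespace[k.Namespace], k) // deleting from a nil map is a no-op
	if len(s.byNamespace[k.Namespace]) == 0 {
		delete(s.byNamespace, k.Namespace)
	}
}

// namesIn returns the sorted service names known in one namespace.
func (s *serviceIndex) namesIn(ns string) []string {
	names := make([]string, 0, len(s.byNamespace[ns]))
	for k := range s.byNamespace[ns] {
		names = append(names, k.Name)
	}
	sort.Strings(names)
	return names
}

func main() {
	idx := newServiceIndex()
	idx.upsert(namespacedName{Namespace: "default", Name: "web"})
	idx.upsert(namespacedName{Namespace: "default", Name: "api"})
	idx.upsert(namespacedName{Namespace: "kube-system", Name: "dns"})
	fmt.Println(idx.namesIn("default")) // [api web]
}
```

The per-namespace lookup avoids scanning every cached service when only one namespace is of interest.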
2bf3cb3 clustermesh: refactor GlobalServiceCache to use types.NamespacedName Refactor globalServiceCache byName map to use types.NamespacedName instead of a string in which the name and namespace were concatenated with a `/` as a separator. Using this type makes the existing behavior more explicit and type safe, and prevents potential errors where we would pass a malformed string. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> 23 February 2024, 09:29:38 UTC
0291627 clustermesh: add GlobalServiceCache in clustermesh/common package With the ongoing work on EndpointSlice synchronization, we need to have a `globalServiceCache` there as well so this commit is refactoring the GlobalServiceCache to be inside the common clustermesh package so that we can avoid code duplication and access to this struct outside of the pkg/clustermesh package. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> 23 February 2024, 09:29:38 UTC
0048669 daemon: inline lookupIPsBySecID Directly use d.ipcache.LookupByIdentity instead of declaring the separate single-use lookupIPsBySecID method. Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 February 2024, 22:59:30 UTC
00e45d7 ingress: add unit test for namespace termination error This commit adds a unit test for the case where reconciliation shouldn't emit an error if resource creation is failing due to Namespace termination. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 22 February 2024, 20:02:33 UTC
3b8c517 ingress: don't emit an error message on namespace termination When a namespace containing an Ingress configured in dedicated mode gets deleted, the controller may first receive the event corresponding to the deletion of one of the dependent resources, and only afterwards of the ingress itself. Hence, it will attempt to recreate these resources (e.g., the CiliumEnvoyConfig), but fail as rejected by the k8s APIServer with an error along the lines of: failed to create or update CiliumEnvoyConfig: ciliumenvoyconfigs.cilium.io \"cilium-ingress-cilium-test-ingress-service\" is forbidden: unable to create new content in namespace cilium-test because it is being terminated Let's replace this error with an informative message at lower severity, given that the situation resolves itself, and there's no actionable item for a user reading the error. This also prevents the error from being flagged by the Cilium CLI no-errors-in-logs check. Since we don't return error to controller-runtime, the reconciliation will not be retried automatically. However, we will be woken up again once the ingress deletion event is received, so that we can proceed with the cleanup. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 22 February 2024, 20:02:33 UTC
955049a controlplane: wait for watcher establishment Use the infrastructure introduced in the previous commit to deflake control plane tests which update k8s state after starting the agent. Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 22 February 2024, 19:54:40 UTC
ba99d74 controlplane: add mechanism to wait for watchers We've recently learned that the fake k8s client set's object tracker does not respect the semantics of the real api-server when it comes to 'Watch': since the object tracker does not care for ResourceVersions, it cannot respect the version from which it ought to replay events. As a result, the default informer (more precisely, its reflector) is racy: it uses a ListAndWatch approach, which relies on this resource version to avoid a race window between the end of list and the beginning of watch. Therefore, all informers used in cilium have a low chance of hitting this race when used with a k8s fake object tracker. This is somewhat known in the k8s community, see for example [1]. However, the upstream response is that one simply shouldn't be using the fake infrastructure to test real informers. Unfortunately, this pattern is used somewhat pervasively inside the cilium tests, specifically so in the controlplane tests. This patch introduces a mechanism which reduces the likelihood of hitting the flake, under the assumption that we do not (often) establish multiple watchers for the same resource. In the following patch, we'll use the new infrastructure to reduce the flakiness of tests. [1]: https://github.com/kubernetes/kubernetes/issues/95372 Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 22 February 2024, 19:54:40 UTC
7df437a controlplane: fix panic: send on closed channel Rarely, the control plane test panics, due to a send on a closed channel. This can occur in a narrow race window in the filteringWatcher: 1. Stop is called on the child watcher 2. Child watcher calls stop on parent watcher 3. Concurrently, an event is dequeued from the parent result chan, and we enter the filtering logic. 4. The parent result chan is closed, and we close the child event channel 5. The filter is matched, and we attempt to write on the closed channel, which causes the panic. Instead of closing the channel in the Stop method, close the channel from the writing goroutine (as is commonly considered best practice in Go.) Fixes: fa89802ce7 (controlplane: Implement filtering of objects with field selectors) Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 22 February 2024, 19:54:40 UTC
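The pattern named in the commit above, closing a channel only from the goroutine that writes to it, can be sketched as follows. `filterEvents` is a hypothetical helper working on plain integers, not the actual filteringWatcher; the stop channel plays the role of the Stop method.

```go
package main

import "fmt"

// filterEvents forwards the values from in that satisfy keep. The goroutine
// that writes to out is also the only one that closes it, so a send can
// never hit a closed channel. Closing stop asks the writer to quit early.
func filterEvents(in <-chan int, stop <-chan struct{}, keep func(int) bool) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out) // closed by the writer, never by the stopping side
		for {
			select {
			case <-stop:
				return
			case v, ok := <-in:
				if !ok {
					return // parent channel closed
				}
				if !keep(v) {
					continue
				}
				select {
				case out <- v:
				case <-stop:
					return
				}
			}
		}
	}()
	return out
}

func main() {
	in := make(chan int)
	stop := make(chan struct{})
	out := filterEvents(in, stop, func(v int) bool { return v%2 == 0 })

	go func() {
		for i := 1; i <= 4; i++ {
			in <- i
		}
		close(in)
	}()

	for v := range out {
		fmt.Println(v) // prints 2, then 4
	}
}
```

Because stopping only signals the writer instead of closing the output channel directly, steps 4 and 5 of the race above can no longer interleave.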
51101e9 controlplane: remove obsolete CNPNodeStatusGC test The control plane test for CNPNodeStatusGC was initially intended to check the behaviour of running with and without GC. With the deprecation of the CNP node status, the GC now amounts to unconditional cleanup. The commit mentioned below had broken the test insofar as it removed the differentiating factor: the configuration. In other words, we were running the same code twice, expecting a different outcome. Ideally, this would have broken the first variant - with GC "disabled", but unfortunately the test was racy in itself. I suspect that with progressing modularisation of the agent and operator, somewhere along the line we lost the guarantee that the GC happens before StartOperator returns. Hence we checked that the CNPs were unchanged while the operator was starting up concurrently - a classic race condition. Since the whole test is obsolete, simply remove it, and leave the second variant of the test around to check that we actually perform the deletion. Fixes: c15f8e4a24 (Remove skip-cnp-status-startup-clean) Co-authored-by: Fabian Fischer <fabian.fischer@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 22 February 2024, 19:54:40 UTC
46198b5 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 22 February 2024, 17:16:09 UTC
117fb1f gh: remove 4.19 from maintainers little helper Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 22 February 2024, 15:17:50 UTC
0a92cc5 test: remove references to v4.19 Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 22 February 2024, 15:17:50 UTC
e4a007f ci-verifier: remove v4.19 complexity tests Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 22 February 2024, 15:17:50 UTC
16fbb0a bpf: remove references to v4.19 Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 22 February 2024, 15:17:50 UTC
bcd3ec8 Add Equinix NL Managed Services to USERS.md. Signed-off-by: Robin Elfrink <robin.elfrink@eu.equinix.com> 22 February 2024, 13:02:15 UTC
c790cfd helm: add hubble drop event emitter to the helm chart Signed-off-by: Robin Elfrink <robin@15augustus.nl> 22 February 2024, 13:02:15 UTC
efd9425 hubble: add option to emit v1.events related to pods on packet drop This adds the option to hubble to emit v1.Events related to v1.Pods when a packet drop to or from that pod is detected. By default a packet drop with the same source/destination IP address results in an Event only once per two minutes. Fixes: #29399 Events will not currently show with `kubectl describe pod ...`; that requires the event emitter to add the pod's `metadata.uid` which is unknown at the time of emitting. Retrieving `metadata.uid` is considered a too expensive call at this moment. Signed-off-by: Robin Elfrink <robin@15augustus.nl> 22 February 2024, 13:02:15 UTC
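The "once per two minutes per source/destination pair" behaviour described above can be sketched with a small per-key rate limiter. This is an assumption-laden illustration: `dropEventLimiter` and `flowKey` are hypothetical names, and the actual Hubble emitter may be implemented differently.

```go
package main

import (
	"fmt"
	"time"
)

// flowKey identifies a packet drop by its source and destination IP.
type flowKey struct {
	src, dst string
}

// dropEventLimiter allows at most one event per key and interval.
type dropEventLimiter struct {
	interval time.Duration
	last     map[flowKey]time.Time
}

func newDropEventLimiter(interval time.Duration) *dropEventLimiter {
	return &dropEventLimiter{interval: interval, last: map[flowKey]time.Time{}}
}

// allow reports whether an event for k may be emitted at time now, and
// records the emission if so.
func (l *dropEventLimiter) allow(k flowKey, now time.Time) bool {
	if t, ok := l.last[k]; ok && now.Sub(t) < l.interval {
		return false
	}
	l.last[k] = now
	return true
}

func main() {
	l := newDropEventLimiter(2 * time.Minute)
	k := flowKey{src: "10.0.0.1", dst: "10.0.0.2"}
	start := time.Now()

	fmt.Println(l.allow(k, start))                     // true
	fmt.Println(l.allow(k, start.Add(30*time.Second))) // false
	fmt.Println(l.allow(k, start.Add(2*time.Minute)))  // true
}
```

Passing the timestamp in explicitly keeps the limiter easy to test without sleeping.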
62fd791 Fix Hubble label selector parsing for labels with dots Hubble label filters did not properly parse labels with dots `.` and no explicit source prefix. The label `foo.bar/buzz` was interpreted as the cilium label `foo:bar/buzz`. This was caused by the fact that the ciliumLabels.LabelArray extends the k8sLabels.Selector logic with support for Cilium source prefixes such as "k8s:foo" or "any:bar", by treating the string before the first dot as the source prefix, i.e. `k8s.foo` is treated like `k8s:foo`. This is needed because k8sLabels.Selector does not support colons in labels. So a hubble label filter `k8s:foo=bar` becomes `k8s.foo=bar`. This works fine for label filters with explicit prefixes. It also works fine for label filters without a prefix. If a label does not contain a dot it will be treated as having `any` source. This does not work for labels without a prefix that contain a dot. So the label selector `foo.bar/buzz=test` will match the label `foo:bar/buzz` instead of matching `any:foo.bar/buzz`. This commit fixes this by explicitly adding a `any:` prefix to any key in the label selector. Signed-off-by: Fabian Fischer <fabian.fischer@isovalent.com> 22 February 2024, 11:34:08 UTC
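The parsing ambiguity described above can be illustrated with two small functions. This is a simplified sketch, not the Hubble or Cilium code: `splitSource` and `normalizeKey` are hypothetical helpers, and the sketch assumes the `any` prefix is only added to keys that carry no explicit `source:` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSource mimics the behaviour described above: the text before the
// first dot is treated as the label source. Keys without a dot get "any".
func splitSource(key string) (source, name string) {
	if i := strings.IndexByte(key, '.'); i >= 0 {
		return key[:i], key[i+1:]
	}
	return "any", key
}

// normalizeKey rewrites a filter key before parsing. An explicit
// "source:name" becomes "source.name"; every other key is prefixed with
// "any." so that dots inside the name are no longer mistaken for a source.
func normalizeKey(key string) string {
	if src, rest, ok := strings.Cut(key, ":"); ok {
		return src + "." + rest
	}
	return "any." + key
}

func main() {
	// Without normalization the dotted key is misread as source "foo".
	src, name := splitSource("foo.bar/buzz")
	fmt.Printf("unfixed: %s:%s\n", src, name) // unfixed: foo:bar/buzz

	for _, key := range []string{"k8s:foo", "app", "foo.bar/buzz"} {
		src, name := splitSource(normalizeKey(key))
		fmt.Printf("%s -> %s:%s\n", key, src, name)
	}
}
```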
62dd4b8 add Pionative to USERS.md Signed-off-by: Pieter van der Giessen <pieter@pionative.com> 22 February 2024, 10:41:10 UTC
23e39a9 Fix datasource for Hubble DNS and Network dashboards Signed-off-by: Pieter van der Giessen <pieter@pionative.com> 22 February 2024, 10:41:10 UTC
09bd1b9 Add OpenVEX document This commit adds an [OpenVEX][0] document. OpenVEX allows the publication of assessments of vulnerability applicability in a standardised format. Adding this document will primarily allow us to set the [Grype container scanning action][1] to hard fail. We currently do not have a good way of marking false positives, which means that inapplicable vulnerabilities show up repeatedly on scans. Adding these vulnerabilities to the OpenVEX document will remove them from the Grype scan results, allowing us to focus on new/applicable issues as they arise. Users might also see a benefit from the fact that they can use the document to automatically exclude triaged CVEs from their scan results - [Trivy][2] and [Grype][3] both support OpenVEX. [0]: https://github.com/openvex/spec [1]: https://github.com/cilium/cilium/actions/workflows/container-scan.yaml [2]: https://aquasecurity.github.io/trivy/test/docs/supply-chain/vex/#openvex [3]: https://github.com/anchore/grype?tab=readme-ov-file#vex-support Signed-off-by: Feroz Salam <feroz.salam@isovalent.com> 22 February 2024, 09:08:46 UTC
4a0c964 ingress: remove unused annotations The following Ingress annotations aren't actually used by any Ingress related logic. - `ingress.cilium.io/tcp-keep-alive` - `ingress.cilium.io/tcp-keep-alive-idle` - `ingress.cilium.io/tcp-keep-probe-interval` - `ingress.cilium.io/tcp-keep-probe-max-failures` - `ingress.cilium.io/websocket` Support has been removed with https://github.com/cilium/cilium/pull/21386, while introducing the shared loadbalancer support. Therefore, this commit removes the unused annotations and their functionality. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 22 February 2024, 09:07:37 UTC
148f81f bgpv1: Downgrade peer state transition logs to Debug Users can now easily check the current peering state with `cilium bgp peers` command. Thus state transition logs become relatively unimportant for users. Downgrade the logs to debug level. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 22 February 2024, 09:07:17 UTC
4baab3d bgpv1: Remove noisy log from route policy reconciler Remove a noisy log which will be generated for every single reconciliation from route policy reconciler. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 22 February 2024, 09:07:17 UTC
c00330c bgpv1: Remove unnecessary stat logs from neighbor reconciler We don't need to show create/update/delete counts because we show logs for all create/update/delete operations anyway. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 22 February 2024, 09:07:17 UTC
66e5de6 bgpv1: Remove noisy logs from neighbor reconciler Remove noisy logs generated for every single reconciliation. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 22 February 2024, 09:07:17 UTC
4c5f79d bgpv1: Inform when the node is not selected anymore When users stop selecting the node with CiliumBGPPeeringPolicy, BGP Control Plane removes all running virtual router instances. However, this is only reported at Debug level. Upgrade it to Info level, since this is important information that helps users investigate session disruptions caused by misconfiguration. Also, the log is generated and full reconciliation happens even if there is no previous policy applied. This means that when there's no policy applied and any relevant resource (e.g. Service) is updated, it will generate the log and perform a needless full withdrawal. Introduce a flag that indicates whether there is a previous policy and conditionally trigger log generation and full withdrawal. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 22 February 2024, 09:07:17 UTC
329fefb bgpv1: Remove a noisy log in Controller The Controller generates a log for every single reconciliation. This is noisy and doesn't make much sense, since users don't care about a reconciliation happening, but about its outcome. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 22 February 2024, 09:07:17 UTC
54c9341 loader: move Loader interface into separate package this will allow to import the Loader interface without importing the full loader package, which in some cases may end up generating import cycles Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 22 February 2024, 08:40:27 UTC
399beb7 bpf: nodeport: add nodeport_nat_egress_ipv4_hook infra this commit adds a hooking point to nodeport_nat_egress_ipv4_hook in nodeport.h that can be used by cilium plugins to extend the functionality of this function. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 21 February 2024, 20:55:27 UTC
37ab8c2 bpf: overlay: add overlay_ingress_policy hook infra this commit adds a hooking point to handle_ipv4 in bpf_overlay that can be used by cilium plugins to extend the functionality of this function. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 21 February 2024, 20:55:27 UTC
d8acad6 bpf: nat: add snat_v4_needs_masquerade hook infra this commit adds a hooking point to snat_v4_needs_masquerade that can be used by cilium plugins to extend the functionality of this function. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 21 February 2024, 20:55:27 UTC
b6d2df2 renovate: match rhel8 lvh image updates Currently, the rhel8 lvh images are not updated by renovate (see e.g. commit c32ad87d4991 ("chore(deps): update all lvh-images main")) because the versioning regex is not matching the rhel8-X.Y format. Fix the regex so the image name format is matched as well. Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 February 2024, 19:13:25 UTC
7b4c0b0 Fixes #30634: Add default `divisor` for `GOMEMLIMIT` Signed-off-by: Donnie McMahan <jmcmaha1@gmail.com> 21 February 2024, 18:48:05 UTC
cb15333 endpoint: don't create endpoint with labels When endpoint is created and `EndpointChangeRequest` contains labels, it might cause the endpoint regeneration to not be triggered as it is only triggered when labels are changed. Unfortunately this does not happen when epTemplate.Labels are set with the same labels as `EndpointChangeRequest`. This commit fixes the above issue by not setting epTemplate.Labels. Fixes: #29776 Signed-off-by: Ondrej Blazek <ondrej.blazek@firma.seznam.cz> 21 February 2024, 18:41:25 UTC
0765ddf docs: Add reference to BGP Control Plane from Multi-Pool IPAM page Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> 21 February 2024, 17:50:31 UTC
7e69a3a Docs: restructure pitfalls section for policies This commit moves pitfalls and examples for them closer together, in order to make it more readable. Signed-off-by: darox <maderdario@gmail.com> 21 February 2024, 15:41:46 UTC
8091b98 Docs: add note on matchExpressions for cnp and ccnp This addition explains how a single matchExpressions works i.e. logical AND. On top of that it shows how to achieve a logical OR by having multiple matchExpressions. Signed-off-by: darox <maderdario@gmail.com> 21 February 2024, 15:41:46 UTC
f7cded3 ci: replace v4.19 with RHEL8 kernels Kernel v4.19 is the oldest kernel we currently test on. With the upcoming minimum kernel version bump to v5.4 this configuration becomes obsolete. Instead of removing it, test against the RHEL8 kernel. This gives us coverage of the two "minimum" versions we need to support: v5.4 and whatever RHEL 8.6 ships. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 21 February 2024, 15:15:27 UTC
391c6b9 docs: Document NodePort BPF and iptables SNAT port collision The collision is mentioned in https://github.com/cilium/cilium/issues/23604. Signed-off-by: Martynas Pumputis <m@lambda.lt> 21 February 2024, 14:27:27 UTC
79808c0 datapath: move state directory manipulation out of CheckRequirements For some reason CheckRequirements creates state directories and also changes the working directory of the current process. Move the code into daemon setup, which is the only caller. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 21 February 2024, 12:30:31 UTC
755a6b9 datapath/linux: remove kernel and clang version check We check for a minimum kernel and clang version on start up. The kernel version check is dubious since the minimum version we need is 4.19. It also doesn't handle the fact that RHEL 8 kernels report a version of 4.18 even though being functionally close to a 5.4 kernel. The clang version check is similarly dubious, since 3.8 hasn't worked in a long time. Remove the check outright. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 21 February 2024, 12:30:31 UTC
d078811 docs: update Linux requirements Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 21 February 2024, 12:30:31 UTC
b3d7d4d renovate: try to group dependency updates on single PR Since we have tried to group all dependencies on a single PR we can remove the more specific ones such as "golang-images" and "alpine-images". For the "spire-images", we don't need to define a minimum "allowedVersion" for the main branch since renovate will not downgrade versions. Same logic was applied for the "docker.io/library/busybox". For the kindest images, there is also no need to upgrade them separately since they can be grouped together with the other dependencies. Signed-off-by: André Martins <andre@cilium.io> 20 February 2024, 19:09:42 UTC
a100265 images: update cilium-{runtime,builder} Signed-off-by: André Martins <andre@cilium.io> 20 February 2024, 16:36:29 UTC
71e89b8 chore(deps): update go to v1.22.0 Signed-off-by: renovate[bot] <bot@renovateapp.com> 20 February 2024, 16:36:29 UTC
8791007 docs: kpr: DSR-Geneve with native-routing requires tunnelProtocol Reflect the config change from https://github.com/cilium/cilium/pull/29051 in the kubeproxy-free docs for DSR-Geneve. Fixes: https://github.com/cilium/cilium/issues/30845 Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 20 February 2024, 16:12:14 UTC
fa7ac75 fix(deps): update all go dependencies main Signed-off-by: renovate[bot] <bot@renovateapp.com> 20 February 2024, 13:46:55 UTC
6964ac0 bpf: Implement handling of flag_skip_tunnel for host egress traffic This commit teaches the v4 and v6 paths of bpf_host.c that handle traffic originating from nodes how to respect the `flag_skip_tunnel` field in the ipcache, which was added in commit d9be0a0. If this field is set, then packets egressing from nodes will bypass tunnel encapsulation and will be sent up the stack for further processing. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 20 February 2024, 12:07:43 UTC
f69db40 bpf: Implement handling of flag_skip_tunnel for cil_to_netdev masq. This commit teaches the v4 and v6 paths of the nodeport stack in cil_to_netdev how to respect the `flag_skip_tunnel` field in the ipcache, which was added in commit d9be0a0. The nodeport-relevant code in cil_to_netdev handles special cases where BPF masquerading is required, such as for sending traffic from a local pod to the IP of a remote node. Remote node endpoints with this field set will have packets forwarded to them directly, without the use of masquerading. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 20 February 2024, 12:07:32 UTC
c1761f7 bpf: Implement handling of flag_skip_tunnel for revNAT nodeport traffic This commit teaches the v4 and v6 paths of the nodeport revNAT stack how to respect the `flag_skip_tunnel` field in the ipcache, which was added in commit d9be0a0. If this field is set, then reply packets from backends reached through a nodeport will not be sent to originating clients using a tunnel. Instead the packet will be sent up the stack for further processing. This commit covers both revSNAT and revDNAT. The only functional changes made were to the revDNAT parts of the datapath, as the revSNAT parts are not dependent on tunnel logic. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 20 February 2024, 12:07:32 UTC
a497697 bpf: Implement handling of flag_skip_tunnel for nodeport NAT traffic This commit teaches the v4 and v6 paths that handle nodeport traffic requiring NAT how to respect the `flag_skip_tunnel` field in the ipcache, which was added in commit d9be0a0. Remote endpoints that have this field set will be forwarded traffic from the nodeport stack without the use of a tunnel. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 20 February 2024, 12:07:32 UTC
3708240 bpf, tests: Add helper for clearing bpf maps This commit adds a helper function to the bpf test suite named "clear_map". It takes one argument, a pointer to a bpf map, and clears out all of the elements it contains. This is useful for tests which depend on map state, as maps are not reset between subsequent tests in the same program. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 20 February 2024, 12:07:32 UTC
662f10d egressgw: remove nodeDataStore map from Manager Reduce the amount of state kept by the Manager. `Manager.nodeDataStore` appears to be redundant while `Manager.nodes` exists, so refactor to remove it while keeping the behaviour identical. Signed-off-by: Mark Pashmfouroush <mark@isovalent.com> 20 February 2024, 12:06:43 UTC
3441800 lbipam: copy slice before modification in (*LBIPAM).handlePoolModified In Go 1.22, slices.Delete will clear the slice elements that got discarded. This leads to the slice containing the existing ranges in (*LBIPAM).handlePoolModified to be cleared while being looped over, leading to the following nil dereference in TestConflictResolution: ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ PANIC package: github.com/cilium/cilium/operator/pkg/lbipam • TestConflictResolution ┃ ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛ panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a8c814] goroutine 22 [running]: testing.tRunner.func1.2({0x1d5e400, 0x39e3fe0}) /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1631 +0x1c4 testing.tRunner.func1() /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1634 +0x33c panic({0x1d5e400?, 0x39e3fe0?}) /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/runtime/panic.go:770 +0x124 github.com/cilium/cilium/operator/pkg/lbipam.(*LBRange).EqualCIDR(0x400021d260?, {{0x24f5388?, 0x3fce4e0?}, 0x400012c018?}, {{0x1ea5e20?, 0x0?}, 0x400012c018?}) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/range_store.go:151 +0x74 github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolModified(0x400021d260, {0x24f5388, 0x3fce4e0}, 0x40000ed200) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:1392 +0xfa0 github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).poolOnUpsert(0x400021d260, {0x24f5388, 0x3fce4e0}, {{0xffff88e06108?, 0x10?}, {0x4000088808?, 0x40003ea910?}}, 0x40000ed080?) 
/home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:279 +0xe0 github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolEvent(0x400021d260, {0x24f5388?, 0x3fce4e0?}, {{0x214e78e, 0x6}, {{0x400034d1d8, 0x6}, {0x0, 0x0}}, 0x40000ed080, ...}) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:233 +0x1d8 github.com/cilium/cilium/operator/pkg/lbipam.(*newFixture).UpsertPool(0x40008bfe18, 0x40002a4b60, 0x40000ed080) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam_fixture_test.go:177 +0x148 github.com/cilium/cilium/operator/pkg/lbipam.TestConflictResolution(0x40002a4b60) /home/travis/gopath/src/github.com/cilium/cilium/operator/pkg/lbipam/lbipam_test.go:56 +0x3fc testing.tRunner(0x40002a4b60, 0x22a2558) /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1689 +0xec created by testing.(*T).Run in goroutine 1 /home/travis/.gimme/versions/go1.22.0.linux.arm64/src/testing/testing.go:1742 +0x318 FAIL github.com/cilium/cilium/operator/pkg/lbipam 0.043s Fix this by cloning the slice before iterating over it. Signed-off-by: Tobias Klauser <tobias@cilium.io> 20 February 2024, 11:39:42 UTC
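The effect described in this commit message can be reproduced in isolation. The following is a minimal sketch, assuming Go 1.22 or later; `removeFirst` and the string-pointer slice are hypothetical stand-ins for illustration and are not the lbipam code.

```go
package main

import (
	"fmt"
	"slices"
)

// removeFirst drops the first element of ranges (hypothetical helper).
// When clone is true it works on a copy, so the caller's slice is untouched.
func removeFirst(ranges []*string, clone bool) []*string {
	if clone {
		ranges = slices.Clone(ranges)
	}
	return slices.Delete(ranges, 0, 1)
}

func main() {
	a, b := "10.0.0.0/24", "10.0.1.0/24"

	// In place: the caller still holds a length-2 view of the backing array.
	// From Go 1.22 on, slices.Delete zeroes the discarded tail element, so a
	// loop still ranging over `existing` would meet a nil pointer.
	existing := []*string{&a, &b}
	removeFirst(existing, false)
	fmt.Println("tail is nil after in-place delete:", existing[1] == nil)

	// On a clone: the caller's slice keeps both original elements.
	existing = []*string{&a, &b}
	removeFirst(existing, true)
	fmt.Println("tail is nil after delete on clone:", existing[1] == nil)
}
```

Cloning costs one extra allocation per call, which is usually acceptable when the slice is being iterated and modified in the same function.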
5276ce4 Lint: Remove unused method to appease linter Signed-off-by: Rafael da Fonseca <rafael.fonseca@wildlifestudios.com> 20 February 2024, 10:02:55 UTC
3a00f5b Fix: Remove IP filters from initial GC This commit changes the more destructive initial GC of the conntrack that is executed during initial agent startup to behave like a normal GC. The previous implementation no longer provides any real value and caused some valid entries to be removed from conntrack. There is no longer any reason to try to guess which entries could be valid; a regular GC is enough. Fixes: #29667 Signed-off-by: Rafael da Fonseca <rafael.fonseca@wildlifestudios.com> 20 February 2024, 10:02:55 UTC
b318c67 chore(deps): update gcr.io/distroless/static-debian11:nonroot docker digest to 6a3500b Signed-off-by: renovate[bot] <bot@renovateapp.com> 20 February 2024, 09:27:22 UTC
6eae743 chore(deps): update dependency cilium/cilium-cli to v0.15.23 Signed-off-by: renovate[bot] <bot@renovateapp.com> 20 February 2024, 08:11:38 UTC
5a6a605 chore(deps): update golangci/golangci-lint docker tag to v1.56.2 Signed-off-by: renovate[bot] <bot@renovateapp.com> 20 February 2024, 07:40:31 UTC
07b0935 GCP performance OIDC auth. Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com> 19 February 2024, 19:39:21 UTC
db5db03 chore(deps): update golangci/golangci-lint-action action to v4 Signed-off-by: renovate[bot] <bot@renovateapp.com> 19 February 2024, 17:08:28 UTC
a9d83fe daemon: Refactor syncHostIPs syncHostIPs depends on the set of node addresses. This is now a changing set and thus hostIPs need to be reconciled if it changes. In order to get rid of the "deviceReloader" and eventually end up in a situation where components directly watch device & addresses tables, move the syncHostIPs into a reconciliation loop that watches the addresses. For now the periodic sync is still kept as-is, though it likely could be removed and replaced with a retry mechanism. As a side cleanup, bubble up the error from startDaemon instead of using log.Fatal. Signed-off-by: Jussi Maki <jussi@isovalent.com> 19 February 2024, 15:07:03 UTC
f21f303 daemon: bubble up error from startDaemon instead of fataling Return the error from startDaemon through the start hook to abort shutdown cleanly rather than using log.Fatal. Signed-off-by: Jussi Maki <jussi@isovalent.com> 19 February 2024, 15:07:03 UTC
0d382f3 daemon: Move VTEP setup to its own controller The VTEP setup was put into the syncHostIPs controller with which it has nothing in common. Move it into its own controller to allow refactoring the syncHostIPs controller in the next commit. Signed-off-by: Jussi Maki <jussi@isovalent.com> 19 February 2024, 15:07:03 UTC
970208f iptables: Fix `New port number` case in TestAddProxyRules{v4,v6} The third part of the `TestAddProxyRulesv4` has a comment stating: // New port number, adds new ones, deletes stale rules. Does not touch OLD_ chains But it actually uses the same port number as the first part of the test, starting from the same installed iptables rules. As a result, it does not actually test that the rules are updated after a call to addProxyRules with a new proxy port. The test case has been fixed to start from a redirection already installed for port "37379" and then call addProxyRules to change the port to "37380". This way, the test case exercises the ability of the code to update the rules when a new port is used for the proxy. The same fix has been applied for the similar case in TestAddProxyRulesv6. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 19 February 2024, 14:17:25 UTC
2ac3f52 chore(deps): update all kind-images main Signed-off-by: renovate[bot] <bot@renovateapp.com> 19 February 2024, 13:22:04 UTC
7f5132e chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 19 February 2024, 13:21:57 UTC
3b510f4 Update deprecated Prometheus Metrics Fixes: #30584 Signed-off-by: John Karoyannis <karoyannis@yahoo.com> 19 February 2024, 13:08:09 UTC
a871875 fix(deps): update module github.com/tidwall/gjson to v1.17.1 Signed-off-by: renovate[bot] <bot@renovateapp.com> 19 February 2024, 12:56:22 UTC
1f6cf15 gh: template: query whether the bug is a regression Let's make it easier for users to tell us about regressions. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 19 February 2024, 13:02:42 UTC
dceb890 tests: Add failed to list CRD to ignored warning logs This commit adds the 'failed to list CRDs' log - "the server could not find the requested resource" - to the list of accepted warning logs. This message is already in the list of ignored error logs, however this message can also appear as a warning log from klog. Fixes: #30776 Related: #26591 Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 19 February 2024, 09:48:02 UTC
b19321e ci: Restrict running tests to only the organization-members team This commit updates the Ariane configuration to include the GitHub organization team 'organization-members' in the list of allowed teams. Consequently, only members of this specific team will have the authorization to initiate test runs via issue comments. Signed-off-by: Birol Bilgin <birol@cilium.io> 18 February 2024, 15:10:01 UTC
3919c3f ci: fix typo in generate-k8s-api workflow Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> 18 February 2024, 14:30:29 UTC
27430d4 pkg: Add Bitwise LPM Trie Library This bitwise lpm trie is a non-thread-safe binary trie that indexes arbitrarily long bit-based keys with associated prefixes indexed from most significant bit to least significant bit using the longest prefix match algorithm. Documentation of the data structure's behavior is localized around the method calls in the trie.go file. The tests specifically test boundary cases for the various methods and fuzz the RangeLookup method. CODEOWNERS is updated to put sig-policy and ipcache in charge of this library. Fixes: #29519 Co-authored-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 18 February 2024, 14:31:24 UTC
68c9d18 chore(deps): update dependency kubernetes-sigs/kind to v0.22.0 Signed-off-by: renovate[bot] <bot@renovateapp.com> 18 February 2024, 12:09:10 UTC
0553aea Revert "workflow: yaml change" This reverts commit dbc52a0ffb261e96cac28f2ffb1b25f66edb5b93. Signed-off-by: André Martins <andre@cilium.io> 18 February 2024, 10:38:50 UTC
dbc52a0 workflow: yaml change Currently, Cilium attaches SBOMs to images using cosign attach sbom --sbom sbom.spdx quay.io/${{ env.QUAY_ORGANIZATION }}/${{ matrix.name }}@${{ steps.docker_build_release.outputs.digest }} during build images workflows. This raises the following warning: --- WARNING: SBOM attachments are deprecated and support will be removed in a Cosign release soon after 2024-02-22 (see sigstore/cosign#2755). Instead, please use SBOM attestations. --- Cilium build image workflows need to migrate to using cosign attest Fixes: #30664 Signed-off-by: Umesh Keerthy <umesh.freelance@gmail.com> 17 February 2024, 20:33:26 UTC
32543a4 slices: don't modify input slices in test In Go 1.22, slices.CompactFunc will clear the slice elements that got discarded. This makes TestSortedUniqueFunc fail if it is run in succession to other tests modifying the input slice. Avoid this case by not modifying the input slice in the test case but make a copy for the sake of the test. Signed-off-by: Tobias Klauser <tobias@cilium.io> 17 February 2024, 14:30:11 UTC
a60f3ce images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 17 February 2024, 12:01:22 UTC
79cace9 chore(deps): update docker.io/library/ubuntu:22.04 docker digest to f9d633f Signed-off-by: renovate[bot] <bot@renovateapp.com> 17 February 2024, 12:01:22 UTC
48bd2ac statedb: Fix race between Observable and DB stopping Since "Observable" forks a goroutine that is not tied to the lifecycle of the application, the "observe" goroutine may call DeleteTracker.Close after DB.Stop, leading to: panic: send on closed channel goroutine 106 [running]: github.com/cilium/cilium/pkg/statedb.(*DeleteTracker[...]).Close(0x0) /host/pkg/statedb/deletetracker.go:76 +0x21e While it would be ideal that goroutines created by statedb would be tied to its lifecycle and thus Stop() could wait for e.g. all observable goroutines to be finished, it's not enough as DeleteTrackers may be created outside and stopped after DB. Thus this commit changes the logic to make it safe to call DeleteTracker.Close() even after the DB has stopped. The fix was validated by adding a "defer time.Sleep(100*time.Millisecond)" to observable.go before the "tracker.Close()" to force it to run after DB.Stop, with it failing with "send on closed channel" before the fix and passing after. As a future follow-up it would make sense to use a Hive job group tied to DB's lifecycle to make sure all goroutines are cleaned up (this follow-up will be done against the cilium/statedb repo as it's being moved there). The fix in this commit is already part of cilium/statedb repo and does not need to be ported. Fixes: #30806 Fixes: 23b0492f30c8 ("statedb2: StateDB v2.0 with per-table locks and deletion tracking") Signed-off-by: Jussi Maki <jussi@isovalent.com> 17 February 2024, 10:40:13 UTC
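One common way to make a late Close safe is to never close the channel that late callers send on, and to signal shutdown through a separate channel instead. The sketch below shows that general technique only; the commit message does not say how the actual fix is implemented, and the `db` and `tracker` types here are hypothetical.

```go
package main

import "fmt"

// db is a stand-in for a store that trackers notify when they close.
type db struct {
	events  chan string   // never closed, so sending can never panic
	stopped chan struct{} // closed exactly once by Stop
}

func newDB() *db {
	return &db{events: make(chan string, 1), stopped: make(chan struct{})}
}

// Stop signals shutdown. It must be called at most once in this sketch.
func (d *db) Stop() { close(d.stopped) }

// tracker is a stand-in for an object that may outlive the db.
type tracker struct {
	db   *db
	name string
}

// Close notifies the db and reports whether the notification was delivered.
// After Stop it returns false instead of panicking.
func (t *tracker) Close() bool {
	// Check for shutdown first so the post-Stop result is deterministic.
	select {
	case <-t.db.stopped:
		return false
	default:
	}
	// Otherwise send, but give up if the db stops while we wait.
	select {
	case t.db.events <- t.name:
		return true
	case <-t.db.stopped:
		return false
	}
}

func main() {
	d := newDB()
	early := &tracker{db: d, name: "early"}
	late := &tracker{db: d, name: "late"}

	fmt.Println("early close delivered:", early.Close())
	d.Stop()
	fmt.Println("late close delivered:", late.Close())
}
```

The design choice is that only the receiver's owner closes a channel, and it closes the signal channel rather than the data channel, so senders with an independent lifecycle can always exit cleanly.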
6c07f8c chore(deps): update dependency cilium/cilium-cli to v0.15.22 Signed-off-by: renovate[bot] <bot@renovateapp.com> 17 February 2024, 00:13:30 UTC
d89ae98 chore(deps): update all github action dependencies Signed-off-by: renovate[bot] <bot@renovateapp.com> 16 February 2024, 23:41:45 UTC
4926949 fix(deps): update all go dependencies main Signed-off-by: renovate[bot] <bot@renovateapp.com> 16 February 2024, 23:40:47 UTC
9ff8d31 ingress: pass enforcedHttps from config (cell) to reconciler Currently, the config property `enforce-ingress-https` doesn't have any effect as its value isn't propagated to the ingresscontroller reconciler. Hence, Ingress HTTPS is never enforced globally (only via annotation). This commit fixes this by passing the config value to the reconciler. Fixes: #30616 Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 February 2024, 22:04:20 UTC
d1834ba chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.5.12 Signed-off-by: renovate[bot] <bot@renovateapp.com> 16 February 2024, 14:38:30 UTC