https://github.com/cilium/cilium

Revision Author Date Message Commit Date
936738a policy: Move Listener from L4Filter to PerSelectorPolicy Allow different selectors on the same L4Filter to use different Envoy Listeners. This relaxes the policy import (L4Filter merge) logic by only failing out if there is a Listener conflict on the same cached selector. This change is needed to allow different Envoy Listeners to be applied on traffic on the same port, depending on the destination (for an egress policy). Consequently, we must handle conflicting proxy ports on the same MapState key, originating from different selectors selecting the same remote identity. We do this with a new optional Listener priority value. Listener priority, if not specified, or for redirects for which an explicit listener name is not given, defaults to the value of the proxy port itself. This serves as a tie-breaking rule so that the redirection is deterministic also in cases where a policy with a listener reference and a CNP L7 policy on different selectors happen to select the same identities. The proxy port value is also used as a tie-breaker when the same identity is selected by two different selectors on different rules that specify different listeners but with the same priority. While this is an arbitrary choice, it is better than allowing the selected listener to vary depending on rule insertion order, or the random Go map iteration order when generating the map state. By convention proxy port values are between 10000-20000, so defining any (allowed) priority value gives precedence to that listener reference against listener references without an explicit priority. Use MapState.Diff to report the difference between the obtained and expected MapState on test failures. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 January 2024, 20:05:42 UTC
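The tie-breaking rule described in 936738a can be sketched roughly as follows. This is only an illustration, not Cilium's code: the `redirect` type, the use of 0 for "priority not specified", and the assumption that the numerically lower value wins are all hypothetical choices made for the sketch.

```go
package main

import "fmt"

// redirect is a hypothetical, simplified stand-in for a proxy redirect
// contributed by one selector. It is not a Cilium type.
type redirect struct {
	Listener  string
	Priority  uint16 // 0 means "not specified" in this sketch
	ProxyPort uint16
}

// effectivePriority falls back to the proxy port value when no explicit
// priority was given, mirroring the default described in the commit message.
func effectivePriority(r redirect) uint16 {
	if r.Priority == 0 {
		return r.ProxyPort
	}
	return r.Priority
}

// less reports whether a takes precedence over b.
// Assumption: lower values win; the proxy port breaks priority ties.
func less(a, b redirect) bool {
	pa, pb := effectivePriority(a), effectivePriority(b)
	if pa != pb {
		return pa < pb
	}
	return a.ProxyPort < b.ProxyPort
}

// pick returns the winning redirect. The result does not depend on the
// order of the candidates, so insertion order or map iteration order
// cannot change which listener is selected.
func pick(candidates []redirect) (redirect, bool) {
	if len(candidates) == 0 {
		return redirect{}, false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if less(c, best) {
			best = c
		}
	}
	return best, true
}

func main() {
	implicit := redirect{Listener: "l7-policy", ProxyPort: 12000}
	explicit := redirect{Listener: "custom", Priority: 100, ProxyPort: 15000}

	w1, _ := pick([]redirect{implicit, explicit})
	w2, _ := pick([]redirect{explicit, implicit})
	fmt.Println(w1.Listener, w2.Listener)
}
```

Because proxy ports are conventionally in the 10000-20000 range, any small explicit priority in this sketch outranks a redirect that only has the port-derived default.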
9942ff8 policy: Look up proxy port when creating mapstate entries Populate mapstate entries with the actual proxy redirection port number, when available. Still need to use the fake port 1 when a redirect has not been realized at the time of entry creation. Endpoint.realizedRedirects map now holds zero valued redirect ports for Istio sidecars, so that the lookup can be made without taking endpoint's mutex (without using Endpoint.hasSidecarProxy). Zero valued redirects are not created or removed from the proxy package. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 January 2024, 20:05:41 UTC
099a1d3 endpoint: Make realizedRedirects lockless Endpoint's realizedRedirects is used to look up a proxy port for a redirect during policy map updates. Make access to it lockless by storing an atomic pointer to the map, and considering the stored map immutable. The set of realized redirects initially starts empty, and all required (desired) redirects are added to it. After that the unwanted redirects are removed by comparing the old realized redirects and the new desired redirects. After this the desired redirects becomes the new realized redirects. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 19 January 2024, 19:30:31 UTC
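The "atomic pointer to an immutable map" pattern from 099a1d3 might look roughly like the sketch below. The `redirectStore` type and its methods are hypothetical names for illustration; only `sync/atomic.Pointer` (Go 1.19+) is a real API here.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// redirectStore is a hypothetical stand-in for an endpoint's realized
// redirects: proxy ID -> proxy port. It is not the Cilium type.
type redirectStore struct {
	m atomic.Pointer[map[string]uint16]
}

// newRedirectStore starts with an empty, non-nil map so that readers
// never have to check for a nil pointer.
func newRedirectStore() *redirectStore {
	s := &redirectStore{}
	empty := map[string]uint16{}
	s.m.Store(&empty)
	return s
}

// Lookup reads the current map without taking any lock. This is safe
// because a published map is never modified afterwards.
func (s *redirectStore) Lookup(id string) (uint16, bool) {
	port, ok := (*s.m.Load())[id]
	return port, ok
}

// Replace publishes a new set of redirects. The input is copied so that
// later changes by the caller cannot affect concurrent readers.
func (s *redirectStore) Replace(desired map[string]uint16) {
	next := make(map[string]uint16, len(desired))
	for id, port := range desired {
		next[id] = port
	}
	s.m.Store(&next)
}

func main() {
	s := newRedirectStore()
	s.Replace(map[string]uint16{"ingress:80": 12345})
	port, ok := s.Lookup("ingress:80")
	fmt.Println(port, ok)
}
```

Writers still need to be serialized among themselves (for example by the existing endpoint lock); the sketch only removes locking from the read path.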
d6b0dc9 endpoint: Add listener to proxyID Add listener to the proxyID. This is needed so that different listeners can be supported on the same port, for different destinations/sources. The listener name also needs to be passed on via policy.MapStateEntry. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 19 January 2024, 19:30:31 UTC
a3f0852 endpoint: Add proxy port to proxy stats key Proxy stats contains the destination port of redirected traffic. When a single port can be redirected to multiple listeners, depending on the destination (or source), their stats entries need to be kept separate. One way of doing this is to add the proxy port to the proxy stats key. Proxy port is wired through the ProxyId field in the cilium.bpf_metadata filter config, and will be carried over to the access log messages from there. Proxy stats are endpoint specific, so the endpoint id need not be in proxy stats key. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 19 January 2024, 19:30:30 UTC
b20038e gha: explicitly specify beefier runner type for clustermesh workflows Clustermesh workflows need to set up two multi-node kind clusters, which don't fit well in the default GH runners (2 vCPU and 7GiB of RAM). Although GitHub recently upgraded [1] the default runners for OSS projects to 4 vCPU and 16GiB of RAM, let's still make it explicit that these workflows actually need that amount of power to run seamlessly. [1]: https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/ Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 19 January 2024, 15:03:59 UTC
8609c5f makefile: make kind clustermesh clusters dual stack Create the clustermesh kind clusters as dual stack, and configure Cilium to enable both IP families, to simplify testing IPv6-related changes and features. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 19 January 2024, 10:14:06 UTC
fb4e560 helm: Add extraVolumeMounts to config init container Signed-off-by: Andrii Iuspin <andrii.iuspin@isovalent.com> 19 January 2024, 09:52:17 UTC
8180cac doc: Add Azure CNI Powered by Cilium as external installer Added a doc with updated instructions for installing Cilium via an Azure CNI Powered by Cilium AKS cluster. Added a page describing delegated IPAM. Signed-off-by: Tamilmani <tamanoha@microsoft.com> 19 January 2024, 09:48:00 UTC
7e3b41b api: Promote field_mask from experimental to stable Also deprecated experimental field_mask option Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> 18 January 2024, 18:51:59 UTC
aef0523 pkg/nodediscovery: Updates updateCiliumNodeResource() Warning Message Previously, updateCiliumNodeResource() would emit a warning message whenever the k8s client could not get the local CiliumNode resource from the k8s api server. This caused the following benign log message for new installations since the CiliumNode resource has yet to be created: `level=warning msg="Unable to get node resource" error="ciliumnodes.cilium.io \"kind-control-plane\" not found" subsys=nodediscovery` This PR updates updateCiliumNodeResource() to only generate the warning message when the maximum number of attempts has been reached. Fixes: #29330 Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> 18 January 2024, 13:54:00 UTC
adeec1d gateway-api: use scheme to check if MCS API ServiceImport is supported Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> 18 January 2024, 10:37:00 UTC
ebb6222 gateway-api: factorize the logic to get the service name Now that we have ServiceImport support there is more code to get the real backend service name with all the {HTTP,GRPC,TLS}Route so this commit essentially factors out most of this logic to simplify the code. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> 18 January 2024, 10:37:00 UTC
e070511 gateway-api: make ServiceImport CRD optional Add a check to make sure the ServiceImport CRD is installed. This check is done lazily once at the init time, the Cilium operator should be restarted after the user installs the ServiceImport CRD for this feature to work. This is required as the ruling on whether or not the controller will watch ServiceImport is only made at init time while there are other checks that are done at runtime, so we only check this at init time and keep the result for all of those for consistency and to prevent weird state. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> 18 January 2024, 10:37:00 UTC
13fd228 gateway-api: add ServiceImport support Add ServiceImport support in Cilium Gateway API. The only implementation supported by this commit is the one where ServiceImport references an existing "derived" Service with the annotation `multicluster.kubernetes.io/derived-service`. This way of implementing `ServiceImport` makes it a "dummy" object while the real logic is still with the Service object. This implementation of MCS API is not enforced by the MCS API KEP but other implementations are very unlikely as they would need to modify kube-proxy (and/or other kube-proxy replacements) to support `ServiceImport` objects natively. It's also the recommended approach in the mcs-api repo "reference implementation" and what's planned for Cilium cluster mesh. Since we do not support any ServiceImport that doesn't have a derived annotation, the support in Cilium Gateway API is mainly about making sure the annotation and the derived service actually exist and swapping ServiceImport for the derived Service right before ingesting it in Envoy. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr> 18 January 2024, 10:37:00 UTC
82d3a26 Update readme with v1.15.0-rc.1 Signed-off-by: André Martins <andre@cilium.io> 18 January 2024, 10:02:12 UTC
2c40a75 .github: Fix LVH image bump for main branch André reports that the main branch isn't receiving stable image updates, likely because the second rule here is overwriting the first rule for the quay.io/lvh-images/kind package name. Fix it by removing main from the special "stable" branch rule and ensuring that the main branch rule applies for not only bpf-next images, but all lvh-images. Fixes: 4e93d90fc71b (".github: Don't update LVH images on stable branches") Reported-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 17 January 2024, 15:03:28 UTC
f28817b conformance-e2e: enforce no missed tail calls occurring during tests Signed-off-by: Timo Beckers <timo@isovalent.com> 17 January 2024, 14:49:22 UTC
4651129 loader: install an ELF's policy programs before attaching tc/xdp hooks See code comments for a detailed description of the problem. This commit installs policy programs before attaching tc/xdp hooks since doing things in the wrong order means dropping tail calls when handling traffic if the policy programs aren't inserted. Signed-off-by: Timo Beckers <timo@isovalent.com> 17 January 2024, 14:49:22 UTC
f5c6b8a envoy: Bump envoy image to include proxy_protocol filter Related build: https://github.com/cilium/proxy/actions/runs/7537100790/job/20515509923 Relates: https://github.com/cilium/proxy/pull/487 Fixes: https://github.com/cilium/cilium/issues/30180 Signed-off-by: Tam Mach <tam.mach@cilium.io> 17 January 2024, 08:21:49 UTC
90dbb40 Remove pkg/option/fake It is no longer needed. Signed-off-by: Aleksander Mistewicz <amistewicz@google.com> 16 January 2024, 20:03:04 UTC
f4c5f45 Remove Configuration interface from pkg/ipam Signed-off-by: Aleksander Mistewicz <amistewicz@google.com> 16 January 2024, 20:03:04 UTC
9c0249b Remove redundant Configuration interface and use option.DaemonConfig directly It used an interface, presumably, to make it easier to override some configuration changes. As a result node/manager sometimes used option.Config directly instead of the local reference in m.conf, which is a potential source of bugs. Signed-off-by: Aleksander Mistewicz <amistewicz@google.com> 16 January 2024, 20:03:04 UTC
d9be0a0 bpf, ipcache: Add flag_skip_tunnel field to remote_endpoint_info Consume 8 bits of padding from the ipcache remote_endpoint_info struct and reserve them for optional flags. This commit adds a single new flag, `flag_skip_tunnel`, to signal that the attached endpoint shall not be forwarded through a VXLAN/Geneve tunnel, regardless of the Cilium configuration. Co-authored-by: Marco Iorio <marco.iorio@isovalent.com> Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 16 January 2024, 19:02:38 UTC
cc25b91 bpf: nodeport: opt-out from neighbour map when XDP-forwarding via tunnel When XDP manually builds the tunnel headers and forwards to a remote node, it makes no sense to rely on the neighbour map for L2 resolution. We have to trust that the agent installs managed neigh entries for all other nodes, and thus the FIB lookup will always return a L2 resolution. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 16 January 2024, 17:14:25 UTC
c66d1e1 bpf: fib: refactor fib_do_redirect() Clarify the different paths of L2 resolution: 1. when the neigh-resolver is available, always use it. Forward the next-hop info from a preceding FIB lookup where available. 2. otherwise fallback to the neigh map, for callers that have opted in. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 16 January 2024, 17:14:25 UTC
f3c3416 bpf: fib: fix DMAC rewrite with ENABLE_SKIP_FIB A recent FIB refactor introduced a bug, where fib_redirect*() no longer performs a FIB lookup if ENABLE_SKIP_FIB is set. But for configs without neigh-resolver, some code paths (that can't fall back to the neigh map) strictly require this FIB lookup to obtain the next-hop's MAC address. Fix things by reintroducing the FIB lookup when neigh_resolver_available() returns false. Fixes: e30e18b646f6 ("bpf,fib: use fib_do_redirect in fib_redirect") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 16 January 2024, 17:14:25 UTC
bb06f2e bpf: fib: require opt-in for neighbour map fallback in fib_do_redirect() The neighbour map is populated by the inbound nodeport path, and used to cache the client's MAC address. Therefore it only makes sense to use this fallback in the LB's reply path. Opt-out from using it in - the LB NAT forward path - the LB DSR forward path - the outbound EgressGW paths - bpf_lxc's reply path, as that's only used with ENABLE_HOST_ROUTING and thus can always use the neigh-resolver. Note that callers which can't use the neigh-map will need *some* sort of toleration for failed L2 resolution / DROP_NO_FIB result. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 16 January 2024, 17:14:25 UTC
0ac6390 bpf: introduce ctx_load_and_clear_meta() When handling the metadata that cilium stores into skb->cb, a typical pattern is to (1) first load a field, and then (2) clear the same field. Add a combined helper for this pattern. This helps to keep the load/store steps in sync, and reduces boilerplate code. This also brings minor savings for the nodeport.h code that is included into bpf_xdp. For XDP the ->cb is emulated with a BPF map, and the combined helper requires only one map access. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 16 January 2024, 16:54:51 UTC
5a00ed7 test/controlplane: add field filterlist case for ciliumnodelist. This fixes panic in controlplane tests introduced by previous commits related to CiliumNode Resource[T]. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 16 January 2024, 16:43:26 UTC
3a2267b daemon: add unit test for local node init from k8s. This tests code path where node ip4/ip6 are not configured manually and thus restoration is attempted from local Node/Cilium node objects. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 16 January 2024, 16:43:26 UTC
c67d4ee daemon: use local CiliumNode resource to populate CiliumInternalIP. It appears that during recent refactors, restoring cilium addresses from k8s node objects would only use k8s Node types. However, in order to restore cilium_host router interface IP from k8s (the prioritized restore method), the agent needs to find an IP of type CiliumInternalIP. This type is only enumerated on CN types, not K8s Nodes, so in its current state all attempts to restore from k8s would return a nil IP. As well, we've noticed that non-k8s restorations can occasionally produce unexpected new IPs causing issues when running in vxlan/ipsec mode due to delay between xfrm state and the router ip being emitted via apiserver. Note: Most cilium host restores should succeed on the configuration based restore which takes precedence over k8s based restore. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 16 January 2024, 16:43:26 UTC
8fa57e7 doc: Updated RKE/Rancher guides * Updated Helm installation instructions for RKE * Updated installation instructions for standalone RKE1/2 clusters * Updates installation instructions for Rancher-managed RKE1/2 clusters. Tested with the most recent Rancher version 2.8.0. Signed-off-by: Philip Schmid <philip.schmid@isovalent.com> 16 January 2024, 16:42:47 UTC
7cc920b statedb: Add Observable function As we already have lots of code doing processing via event streams (e.g. Resource[T]), make it easier to migrate to the Kubernetes source from Resource[T] by making it possible to observe the table as a stream.Observable. The workqueue that Resource[T] provides is not supported. It wasn't used in many cases anyway and for those the workqueue can be implemented directly. Signed-off-by: Jussi Maki <jussi@isovalent.com> 16 January 2024, 16:08:14 UTC
b5f9e42 statedb: Add Map function to map iterators Useful when combined with CollectSet:
    type Foo struct { Key string }
    var iter Iterator[Foo]
    var keys sets.Set[string]
    keys = CollectSet(
        Map(
            iter,
            func(f Foo) string { return f.Key }))
Signed-off-by: Jussi Maki <jussi@isovalent.com> 16 January 2024, 16:08:14 UTC
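A self-contained approximation of the Map and CollectSet combination from b5f9e42 is sketched below. The `Iterator` interface here is deliberately simplified and hypothetical (the statedb iterator has a different signature), and a plain `map[T]struct{}` stands in for `sets.Set`.

```go
package main

import "fmt"

// Iterator is a simplified, hypothetical iterator interface used only
// for this sketch; it is not the statedb definition.
type Iterator[T any] interface {
	Next() (T, bool)
}

// sliceIter iterates over a slice, consuming it from the front.
type sliceIter[T any] struct{ items []T }

func (it *sliceIter[T]) Next() (T, bool) {
	if len(it.items) == 0 {
		var zero T
		return zero, false
	}
	v := it.items[0]
	it.items = it.items[1:]
	return v, true
}

// mapIter lazily applies fn to every element of the wrapped iterator.
type mapIter[In, Out any] struct {
	in Iterator[In]
	fn func(In) Out
}

func (it *mapIter[In, Out]) Next() (Out, bool) {
	v, ok := it.in.Next()
	if !ok {
		var zero Out
		return zero, false
	}
	return it.fn(v), true
}

// Map wraps an iterator so that it yields fn(x) for each x.
func Map[In, Out any](in Iterator[In], fn func(In) Out) Iterator[Out] {
	return &mapIter[In, Out]{in: in, fn: fn}
}

// CollectSet drains the iterator into a set of unique values.
func CollectSet[T comparable](it Iterator[T]) map[T]struct{} {
	set := map[T]struct{}{}
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		set[v] = struct{}{}
	}
	return set
}

type Foo struct{ Key string }

func main() {
	var iter Iterator[Foo] = &sliceIter[Foo]{
		items: []Foo{{Key: "a"}, {Key: "b"}, {Key: "a"}},
	}
	keys := CollectSet(Map(iter, func(f Foo) string { return f.Key }))
	fmt.Println(len(keys))
}
```

Map is lazy in this sketch: no element is transformed until CollectSet pulls it, so no intermediate slice is built.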
ea7cb80 statedb: replace the buffer based KeySet with simpler implementation Memory profiling showed that we were allocating fair bit in NewKeySet. This can be avoided in cases where the indexer only returns a single key by using a special case implementation. This replaces KeySet with a struct of head & tail. This allows writing indexers that can return a constant and avoid all memory allocation for the key. Signed-off-by: Jussi Maki <jussi@isovalent.com> 16 January 2024, 16:08:14 UTC
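The head-and-tail idea in ea7cb80 could be pictured with the sketch below. It is an assumption-laden illustration, not the statedb implementation: the `Key` and `KeySet` definitions, the `SingleKey` helper, and the `constantIndexer` function are hypothetical.

```go
package main

import "fmt"

// Key is a byte-slice index key (hypothetical simplification).
type Key []byte

// KeySet stores the first key directly in head. The tail slice is only
// needed when an indexer returns more than one key, so the common
// single-key case requires no extra allocation for the set itself.
type KeySet struct {
	head Key
	tail []Key
}

// SingleKey builds a set with exactly one key without allocating a slice.
func SingleKey(k Key) KeySet {
	return KeySet{head: k}
}

// NewKeySet builds a set from any number of keys.
func NewKeySet(keys ...Key) KeySet {
	if len(keys) == 0 {
		return KeySet{}
	}
	return KeySet{head: keys[0], tail: keys[1:]}
}

// Foreach calls fn for every key in the set, head first.
// In this sketch a nil head means the set is empty.
func (ks KeySet) Foreach(fn func(Key)) {
	if ks.head == nil {
		return
	}
	fn(ks.head)
	for _, k := range ks.tail {
		fn(k)
	}
}

// allKey is allocated once and shared by every call to constantIndexer.
var allKey = Key("all")

// constantIndexer is a hypothetical indexer that places every object
// under the same constant key.
func constantIndexer() KeySet {
	return SingleKey(allKey)
}

func main() {
	constantIndexer().Foreach(func(k Key) { fmt.Println(string(k)) })
	NewKeySet(Key("a"), Key("b")).Foreach(func(k Key) { fmt.Println(string(k)) })
}
```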
be9110b statedb: Add NumObjects method and Derive utility Table[T].NumObjects() returns the number of objects in the table in O(1) time. Derive transforms objects from an input table to an output table. Useful in conjunction with a reconciler where the desired state is derived from a single input table. Example:
    // Assuming we have Table[*Foo] and RWTable[*Bar] and that
    // *Bar is an object we want reconciled.
    cell.Invoke(
        statedb.Derive[*Foo, *Bar](
            func(foo *Foo, deleted bool) (*Bar, statedb.DeriveResult) {
                if deleted {
                    return &Bar{
                        ID: foo.ID, // Only need enough for primary key
                        Status: reconciler.StatusPendingDelete(),
                    }, statedb.DeriveUpdate
                }
                return &Bar{
                    ID: foo.ID,
                    Quux: foo.Quux,
                    Status: reconciler.StatusPending(),
                }, statedb.DeriveInsert
            },
        ),
    )
Signed-off-by: Jussi Maki <jussi@isovalent.com> 16 January 2024, 16:08:14 UTC
e7faf78 bpf: test: future-proof some kernel version checks Instead of listing all kernel versions that support a specific feature, just list the old version(s) that *don't* support the feature. This avoids updating the version list whenever we add support for a new kernel. Suggested-by: Lorenz Bauer <lmb@isovalent.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 16 January 2024, 14:54:46 UTC
2a1d0ca workflows: conformance-eks: use env.QUAY_ORGANIZATION_DEV Enable the usual customization of the quay repo location. Fixes: c26c55b1b724 ("ci: fix eks image pull flake") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 16 January 2024, 14:14:59 UTC
fffe8f2 test: un-ignore "Policy map sync fixed errors" It should (hopefully) be fixed now. Fixes: #29727 Signed-off-by: Casey Callendrello <cdc@isovalent.com> 16 January 2024, 13:28:05 UTC
4c737f4 endpoint: pause policymap-sync controller during regeneration During regeneration, we don't consistently hold the endpoint lock. This leaves some windows wherein an updated policy may be partially applied. As a side effect, the periodic policymap-sync reconciler occasionally complains (rather rudely) about benign inconsistencies. (This warning shows up in the logs and can cause CI failures). So, don't run the controller while we're in a half-applied state. This state will be quickly resolved, so the controller should succeed on the next round. Additionally, regeneration performs the equivalent synchronization *anyways*, so we're not actually missing a synchronization. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 16 January 2024, 13:28:05 UTC
87f1cbe Helm: additional info for mtu value This commit adds additional information on which interfaces the mtu value configures. Signed-off-by: darox <maderdario@gmail.com> 16 January 2024, 11:51:30 UTC
0c080f6 gha: postpone checkout of the untrusted context As an additional security measure, let's postpone the checkout of the untrusted context after the setup of the test environment. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 11:15:31 UTC
247e6e0 gha: keep trusted and untrusted paths separate, and simplify actions ref A few GHA workflows got recently modified to hardcode the repository and branch of the actions hosted locally (e.g., [1]). This was a security measure, as they are triggered after checking out the untrusted context (i.e., PR branch), and thus it would be possible for an external PR to inject malicious code. Yet, at the same time, this change mostly defeats the smooth development process enabled by ariane (which automatically uses the workflow and context from the PR for trusted branches -- i.e., in cilium/cilium), requiring again to manually modify those references for testing purposes. Similarly, it also requires manual adaptations when changes are backported to stable branches, or to allow running them from forks, which are easy to overlook. As an alternative solution, let's only check out the helm chart from the untrusted context in a separate directory, without overriding any of the trusted files (i.e., from the target branch) retrieved initially. This way, we are guaranteed that the local github actions are always trusted (as we are not overriding them, nor we are executing any script which could modify them), and can be invoked directly, without any additional constraint. A key aspect for this is that helm charts cannot execute arbitrary code in the client host. Another difference, compared to the previous approach, is that now we also execute the `./contrib/scripts/kind.sh` script from the trusted context (i.e., target branch) instead of the PR context. However, this file is effectively part of the workflow definition, and this change brings consistency with the rest of it. The same also applies for the Gateway API conformance tests. [1]: 654d92f29c4f ("ci-e2e: Use lvh-kind in secure way") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 11:15:31 UTC
4e93d90 .github: Don't update LVH images on stable branches The v1.14 workflows have obtained tweaks to avoid renovate from updating the dependencies. Rather than editing each workflow on each stable branch, configure the renovate config to avoid updating those dependencies on the stable branches. This is done by splitting the current group for lvh-images into one that applies to bpf-next images (only for main) and one for all other lvh-images (for all maintained branches). Signed-off-by: Joe Stringer <joe@cilium.io> 16 January 2024, 10:49:04 UTC
cbae172 gha: improve conformance-clustermesh workflow coverage Extend the conformance clustermesh workflow to additionally run the tests which require the presence of an extra Kubernetes node where Cilium is not running. In particular, north/south loadbalancing (i.e., global service NodePorts accessed from outside the cluster) and compatibility between ingress and global services. To this end, the test clusters now include one control-plane node and two workers. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 10:30:11 UTC
00ed827 gha: prevent circular dependency in clustermesh-upgrade workflow The simultaneous restart of the clustermesh-apiserver pods in both clusters after rolling out all agents can lead to a circular dependency when Cilium is configured in tunneling mode and KPR=true [1]. For the moment, let's avoid triggering this scenario in CI, as it is unlikely to happen in real environments. We never hit this issue before because we only had one worker node, which is targeted by the NodePort, and apparently the clustermesh-apiserver was always scheduled there. [1]: https://github.com/cilium/cilium/issues/30156 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 10:30:11 UTC
ab2a149 gha: increase ip-identities-sync-timeout in clustermesh-upgrade Currently, it matches the `cilium clustermesh status` wait timeout, making it harder to pinpoint the cause of possible failures, as changes may intervene before collecting the sysdump. Let's raise it to decorrelate the two timeouts. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 10:30:11 UTC
df3ab28 gha: test highest possible cluster ID in conformance clustermesh 809764feed5b ("workflow/clustermesh: set maxConnectedClusters") extended the conformance clustermesh tests to additionally configure the maximum number of possible clusters (either 255 or 511). Let's also configure the two clusters with the extreme cluster ID values, to make sure that the entire range works as expected. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 10:30:11 UTC
b48a281 gha: drop duplicate bpf.monitorAggregation in conformance clustermesh It is already configured by the helm-default action, so let's remove the additional explicit configuration. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 10:30:11 UTC
d05cb83 pkg/comparator: Migrate tests to std Go testing pkg Migrate tests from checkmate (the temporary wrapper for gopkg.in/check.v1) to the standard Go testing framework. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 16 January 2024, 10:30:01 UTC
68c41de pkg/comparator: Remove unused Map{Bool,String}Equals Since all usages of MapStringEquals have been replaced by maps.Equal, the function is now unused, so it can be removed. Also, MapBoolEquals is unused too and it can be replaced by maps.Equal as well, so remove that too. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 16 January 2024, 10:30:01 UTC
fbebd40 k8s: Use maps.Equal in place of comparator.MapStringEquals Since Go 1.21 the maps package and its Equal function are available. All usages of the comparator.MapStringEquals can then be replaced with the new available function. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 16 January 2024, 10:30:01 UTC
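For reference, the standard-library replacement mentioned in fbebd40 behaves as in the short sketch below. `maps.Equal` is the real Go 1.21 function; the `labelsEqual` wrapper and the sample label maps are made up for illustration.

```go
package main

import (
	"fmt"
	"maps"
)

// labelsEqual is a hypothetical wrapper showing how a call site that
// previously compared two string maps with a custom helper can use
// maps.Equal from the standard library (Go 1.21+) instead.
func labelsEqual(a, b map[string]string) bool {
	return maps.Equal(a, b)
}

func main() {
	a := map[string]string{"app": "web", "tier": "frontend"}
	b := map[string]string{"tier": "frontend", "app": "web"}
	c := map[string]string{"app": "web"}

	fmt.Println(labelsEqual(a, b)) // same keys and values
	fmt.Println(labelsEqual(a, c)) // different number of entries
}
```

`maps.Equal` treats a nil map and an empty map as equal, since both have length zero.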
c95c827 pkg/comparator: Remove unused Compare function The Compare function from the comparator package is unused, so it should be removed. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 16 January 2024, 10:30:01 UTC
ac9a430 envoy: precompute preferred backends in backendsync Currently, filtering the service backends that should be synced to Envoy for L7 loadbalancing might call out to `filterPreferredBackends` multiple times, even though it would be possible to precompute them. Therefore, this commit refactors the EnvoyL7LBBackendSyncer to precompute the preferred backends. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
4e2645e envoy: fix error in k8s watcher Change error message to start with lowercase. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
056f1ab envoy: provide EnvoyServiceBackendSyncer via envoy Hive Cell This commit moves the initialization of the EnvoyServiceBackendSyncer into the corresponding Envoy Hive Cell. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
1943007 envoy: move EnvoyServiceBackendSyncer from watchers to envoy package After removing the dependency from the service package (service manager) to envoy, it's possible to move the EnvoyServiceBackendSyncer from the k8s watchers package to the envoy package. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
9ab57ac envoy: move UpsertEnvoyEndpoints logic from xDSserver to CEC watcher Currently, the xDSServer provides the possibility to update Envoy endpoints based on loadbalancer backend information. This logic is only used by the CiliumEnvoyConfig watcher to update the backend services of a CEC accordingly. With the introduction of a service backend sync callback, this logic can be moved from the xDS server to the CEC watcher. This way we can get rid of unwanted dependencies from the envoy module to the loadbalancer module. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
f8f2bc1 service: remove envoyXdsServer from service manager The service manager no longer depends on the envoy xDS server. Hence let's remove the field from the manager and its hive cell. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
f82405c service: introduce callback for L7LB backend sync Currently, `Service.RegisterL7LBServiceBackendSync` is built quite Envoy specific, hence it also contains logic how to filter Service backends by the frontendPorts that are passed to the method. To be able to remove the dependency from the ServiceManager to Envoy specific details (implementation and the dependency to the Envoy xDS server), this commit refactors the backend sync registration to receive a callback that gets called whenever a Service (and its backends) changed. This way, the Envoy specific details can be removed from the ServiceManager. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
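The callback-based decoupling described in f82405c might be pictured as in the sketch below. All names (`Backend`, `Service`, `BackendSyncer`, `ServiceManager`, `RegisterBackendSync`, `Upsert`) are hypothetical simplifications and do not reflect the signatures in Cilium's service package.

```go
package main

import "fmt"

// Backend and Service are hypothetical, minimal stand-ins.
type Backend struct {
	Addr string
	Port uint16
}

type Service struct {
	Name     string
	Backends []Backend
}

// BackendSyncer is called whenever a service or its backends change.
// Proxy-specific logic (such as filtering by port) lives in the callback,
// not in the manager.
type BackendSyncer func(svc Service)

// ServiceManager knows nothing about the proxy; it only invokes the
// callbacks registered for a service name.
type ServiceManager struct {
	services map[string]Service
	syncers  map[string][]BackendSyncer
}

func NewServiceManager() *ServiceManager {
	return &ServiceManager{
		services: map[string]Service{},
		syncers:  map[string][]BackendSyncer{},
	}
}

// RegisterBackendSync registers cb for the named service and, if the
// service already exists, syncs its current state immediately.
func (m *ServiceManager) RegisterBackendSync(name string, cb BackendSyncer) {
	m.syncers[name] = append(m.syncers[name], cb)
	if svc, ok := m.services[name]; ok {
		cb(svc)
	}
}

// Upsert stores the service and notifies every registered callback.
func (m *ServiceManager) Upsert(svc Service) {
	m.services[svc.Name] = svc
	for _, cb := range m.syncers[svc.Name] {
		cb(svc)
	}
}

func main() {
	m := NewServiceManager()
	m.RegisterBackendSync("default/web", func(svc Service) {
		fmt.Printf("sync %s: %d backend(s)\n", svc.Name, len(svc.Backends))
	})
	m.Upsert(Service{Name: "default/web", Backends: []Backend{{Addr: "10.0.0.1", Port: 8080}}})
}
```

With this shape the dependency points from the proxy integration to the service manager, rather than the other way around.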
47c46f3 service: remove duplicated import This commit removes the duplicated import `pkg/datapath/types` from `pkg/service/service.go`. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
5efacda service: introduce struct L7LBResourceName Currently, the struct `loadbalancer.ServiceName` is used when defining a reference to a `CiliumEnvoyConfig` during L7LB service registration. This commit introduces a dedicated struct `L7LBResourceName` to prevent confusion. In addition, the corresponding fields in the `L7LBInfo` are renamed to be Envoy agnostic. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
90005a5 service: remove parameter frontendPorts from RegisterL7LBService Currently the parameter `frontendPorts` in the function `ServiceManager.RegisterL7LBService` is not used by external callers, as its sole purpose is to register a proxy port for a given service. It's only used within the `ServiceManager` in case of calling the function `ServiceManager.RegisterL7LBServiceBackendSync`. Therefore, this commit removes the parameter `frontendPorts` from `RegisterL7LBService` by properly implementing the logic of `RegisterL7LBServiceBackendSync` instead of delegating to `RegisterL7LBService`. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 16 January 2024, 10:29:43 UTC
7fc78e9 ci: Add a call to the update label backport action Add an action to call the workflow that updates the labels of backported PRs in stable branches. This commit is based on the following commits by Fabio from v1.14 branch: - 81ade5f693b8 ("ci: Call the workflow to update labels of backported PRs") - a5a047f2fa84 ("ci: Use pull_request_target in update label workflow") The primary change here is to list all maintained branches in a single workflow on main in order to simplify the maintenance burden when creating new stable branches (eg, during v1.15 stable branch creation). This action will not trigger from the main branch for PRs targeted to stable branches. However, when we copy this workflow to stable branches, it will run for PRs targeted to that stable branch (assuming that the versions referenced in this file are kept in sync with the branch version). Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Co-authored-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 16 January 2024, 09:18:49 UTC
683a8e6 gha: extend clustermesh upgrade to also cover external kvstores Let's extend the clustermesh upgrade/downgrade workflow with a new matrix entry to also cover the external kvstores configuration. We leverage the newly introduced kvstore action to setup the etcd containers and retrieve the appropriate parameters. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 07:58:05 UTC
5e8f85d gha: improve max connected clusters coverage in conformance clustermesh Make sure that the max connected clusters option works as expected in all configurations: clustermesh, kvstoremesh and external kvstore. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 07:58:05 UTC
403b3a2 gha: extend conformance clustermesh to also cover external kvstores Let's extend the conformance clustermesh workflow to also cover the external kvstores configuration in addition to plain clustermesh and kvstoremesh. To avoid increasing the number of matrix entries, let's convert two of the already existing ones over to this mode. We leverage the newly introduced kvstore action to setup the etcd containers and retrieve the appropriate parameters. Cluster Mesh configurations are directly specified at installation time, as 'cilium clustermesh connect' does not support this scenario. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 07:58:05 UTC
b311e79 gha: slight matrix generalization in conformance clustermesh As a preparation for the subsequent commit, let's slightly generalize the matrix definition in the conformance clustermesh workflow, replacing the current 'kvstoremesh' boolean entry with 'mode', which can be set to either 'clustermesh', 'kvstoremesh', or, soon, 'external'. Additionally, let's also shuffle a bit the other parameters, to increase the coverage of dual stack clusters and avoid losing coverage due to the subsequent changes. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 07:58:05 UTC
425b459 gha: introduce kvstore action Introduce a new GHA action responsible for generating the appropriate TLS certificates and starting the given number of single replica etcd clusters. It is intended to be leveraged by different workflows (e.g., clustermesh ones) to test Cilium when configured to connect to an external kvstore. In detail, it takes as input: * the number of single replica etcd clusters to be created; * the etcd image, which should be overridden only for testing purposes, as automatically bumped by renovate; * the base name of each container (to which the index is appended); * the Docker network the containers are attached to; and returns as output: * the path to the definition of the cilium-etcd-secrets secret, containing the TLS information to connect to the external kvstore; * the parameters to configure Cilium to connect to the external kvstore; they are parametrized through the KVSTORE_ID variable to specify the ID of the kvstore to connect to; * the clustermesh configuration to connect each cluster to all the remote ones (except for the cluster names, which should be specified externally). Let's additionally assign the new action to the kvstore and sig-clustermesh teams for review, as well as extend the renovate configuration to automatically update the etcd image when appropriate. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 16 January 2024, 07:58:05 UTC
0558e8f lb-ipam: Add annotation alias with lbipam.cilium.io prefix We recently adopted a new convention for annotation names (e.g. service.cilium.io) instead of cilium.io/service-xxx. This commit supports the same convention for the LB-IPAM module. Signed-off-by: Tam Mach <tam.mach@cilium.io> 16 January 2024, 04:38:30 UTC
ddb206c helm: Permit selection of datasources in UI Signed-off-by: Pat Riehecky <riehecky@fnal.gov> 16 January 2024, 04:37:11 UTC
8b5869f k8s: Fix envoyConfig description on CNP/CCNP CRDs Updated envoyConfig description on Listener struct, and re-generated CRDs with make manifests target. Signed-off-by: Hector Monsalve <hmonsalv@gmail.com> 15 January 2024, 16:49:09 UTC
cc47583 CODEOWNERS: sig-scalability owns scalability-specific GH workflows Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 15 January 2024, 16:28:25 UTC
a843543 CODEOWNERS: Add sig-scalability team to CODEOWNERS Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 15 January 2024, 16:28:25 UTC
fb92d06 bgpv1: Modularize test fixtures Modularize BGP test fixtures so that the test BGP cell can be constructed with more flexibility when needed. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> 15 January 2024, 15:29:23 UTC
e3c4d8e chore(deps): update dependency cilium/cilium-cli to v0.15.20 Signed-off-by: renovate[bot] <bot@renovateapp.com> 15 January 2024, 11:41:36 UTC
3381e0f daemon: Remove obsolete bpf-lb-dev-ip-addr-inherit option This option was added for a niche use-case that no longer needs it, and the agent no longer supported it. Remove the remaining code related to it. Signed-off-by: Jussi Maki <jussi@isovalent.com> 15 January 2024, 10:28:40 UTC
7318ce2 L7LB: fix Envoy backend (endpoint) synchronization Currently, when multiple `CiliumEnvoyConfig`s reference the same backend service on different ports, the `frontendPorts` that are used to filter the backends is always overwritten with the ports of the last modified CEC. As a result, not all the Cilium Backends are synchronized to Envoy as Endpoints. This breaks connectivity. Therefore, this commit fixes the frontendPorts by using the ports of all referencing CiliumEnvoyConfigs. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 15 January 2024, 08:58:23 UTC
e1032a0 Add cilium operator go runtime sched latency metrics The operator depends on various goroutines being scheduled on time to perform critical tasks. Significant scheduling lags could indicate potential issues with the operator functions. - Added Go scheduler latency metrics tracking to the cilium operator. Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com> 15 January 2024, 08:56:35 UTC
80ebd70 bpf: lxc: remove CB_FROM_TUNNEL upgrade toleration for IPv6 This workaround was added by https://github.com/cilium/cilium/pull/29304 to deal with up-/downgrade troubles between v1.14 and v1.15. As the comment says, we can now remove this workaround for the 1.16 devel cycle. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 15 January 2024, 07:50:05 UTC
ea3a82d chore(deps): update dependency eksctl-io/eksctl to v0.167.0 Signed-off-by: renovate[bot] <bot@renovateapp.com> 13 January 2024, 12:55:24 UTC
effb1ea fix(deps): update all go dependencies main Signed-off-by: renovate[bot] <bot@renovateapp.com> 13 January 2024, 12:36:50 UTC
42f1e68 ipam/crd: remove redundant `len` and `nil` check From the Go specification [1]: "1. For a nil slice, the number of iterations is 0." "3. If the map is nil, the number of iterations is 0." `len` returns 0 if the slice or map is nil [2]. Therefore, checking `len(v) > 0` before a loop is unnecessary. [1]: https://go.dev/ref/spec#For_range [2]: https://pkg.go.dev/builtin#len Signed-off-by: Eng Zer Jun <engzerjun@gmail.com> 12 January 2024, 23:54:14 UTC
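The language rule quoted above can be checked with a short, self-contained sketch. The helper names `countItems` and `countKeys` are illustrative only and do not come from the Cilium code base.

```go
package main

import "fmt"

// countItems ranges over v without any len or nil guard.
// For a nil slice the loop body simply never runs.
func countItems(v []string) int {
	n := 0
	for range v {
		n++
	}
	return n
}

// countKeys does the same for a map; ranging over a nil map is also safe.
func countKeys(m map[string]int) int {
	n := 0
	for range m {
		n++
	}
	return n
}

func main() {
	var s []string       // nil slice
	var m map[string]int // nil map
	fmt.Println(len(s), countItems(s))
	fmt.Println(len(m), countKeys(m))
	fmt.Println(countItems([]string{"a", "b"}))
}
```

Note that this applies to reads only: writing to a nil map still panics, so a guard remains necessary before an assignment.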
1b0a00e update 'kind-install-cilium-fast' Makefile target comment Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 12 January 2024, 20:40:20 UTC
3babde7 add a `fast` make target for kind-clustermesh The current make targets for making a pair of clustermesh'd kind clusters builds all container images and loads them on all nodes for both clusters. This is slow and quite resource intensive. This commit adds a `kind-install-cilium-clustermesh-fast` target, which utilizes the existing `kind-image-fast` target to build and copy only the binaries. The workflow for utilizing fast builds for clustermesh looks like this: $ make kind-clustermesh $ make kind-install-cilium-clustermesh-fast And can be followed with any of the `kind-image-fast*` targets to re-build/copy the binaries. Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 12 January 2024, 20:40:20 UTC
501cb42 ci-clustermesh-upgrade: Adjust name of test, to match cilium-cli's At some point (v0.15.18), connectivity test "no-missed-tail-calls" was renamed as "no-unexpected-packet-drops" in cilium-cli [0]. We now use a cilium-cli version that contains the change, but we've omitted to update the name of the test to run in the workflow. Let's adjust it now. [0] cilium/cilium-cli@4880c91a726d ("connectivity: Check for unexpected packet drops") Fixes: 16fe16637833 ("gh/workflows: Bump CLI to v0.15.18") Signed-off-by: Quentin Monnet <quentin@isovalent.com> 12 January 2024, 20:12:55 UTC
862fcd5 policy: Fix MapState.Equals() Compare the entries of 'msA' and 'msB' rather than 'msB' against itself. Simplify the body of the comparison function for readability. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 12 January 2024, 19:44:02 UTC
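The class of bug fixed here (comparing one operand against itself) is easy to illustrate with simplified types. In the sketch below, `entry`, `state`, and `equals` are hypothetical stand-ins and do not mirror Cilium's actual `MapState` implementation.

```go
package main

import "fmt"

// entry and state are simplified, hypothetical stand-ins for a map state.
type entry struct {
	ProxyPort uint16
	Deny      bool
}

type state map[string]entry

// equals reports whether a and b hold exactly the same entries.
// Every lookup goes from a into b; looking up b's keys in b again
// would trivially succeed and hide any difference.
func equals(a, b state) bool {
	if len(a) != len(b) {
		return false
	}
	for k, va := range a {
		vb, ok := b[k]
		if !ok || va != vb {
			return false
		}
	}
	return true
}

func main() {
	a := state{"k1": {ProxyPort: 10001}}
	b := state{"k1": {ProxyPort: 10002}}
	fmt.Println(equals(a, a), equals(a, b))
}
```

Because the lengths are compared first, a one-directional lookup from `a` into `b` is sufficient to establish equality.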
1275493 compile: avoid nil deref of Cmd.ProcessState if compileCmd fails to start The gotcha with Cmd.ProcessState is documented in comments. I'm not sure if we're really interested in Maxrss of failed compilations, or if it really needs to be debug-logged. For troubleshooting something like this, we'd want to reproduce this locally anyway, at which point we can hack in a few log lines. I didn't want to switch to a separate Cmd.Start() and Cmd.Wait(), so the maxrss logic was consolidated into a single block, only executed when compilation was successful, where Cmd.ProcessState is guaranteed to be set. Fixes #29989. Signed-off-by: Timo Beckers <timo@isovalent.com> 12 January 2024, 16:19:03 UTC
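The gotcha mentioned above is that `exec.Cmd.ProcessState` stays nil when the process never starts, so it must only be dereferenced after a successful run. A minimal sketch, assuming a hypothetical helper `runAndMeasure` and using user CPU time as a portable stand-in for the max RSS value the commit refers to:

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// runAndMeasure is a hypothetical helper. It runs a command and returns
// its user CPU time, touching ProcessState only after Run succeeded.
func runAndMeasure(name string, args ...string) (time.Duration, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Run(); err != nil {
		// If the binary could not be started, cmd.ProcessState is nil
		// here, so calling a method on it would panic.
		return 0, fmt.Errorf("running %s: %w", name, err)
	}
	// Run returned nil, so the process started and exited:
	// ProcessState is guaranteed to be set.
	return cmd.ProcessState.UserTime(), nil
}

func main() {
	// The path below is assumed not to exist, so the start itself fails.
	_, err := runAndMeasure("/nonexistent/example-compiler")
	fmt.Println("error:", err)
}
```

Keeping the resource-usage logic in the success branch avoids splitting the call into separate `Start` and `Wait` steps, which matches the trade-off described in the commit message.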
b120e23 helm: Bump helm-toolbox version This is to pick the latest helm version, which fixes the below issue. Relates: https://github.com/cilium/helm-toolbox/pull/2 Relates: #28777 Fixes: #30039 Signed-off-by: Tam Mach <tam.mach@cilium.io> 12 January 2024, 12:56:26 UTC
2edc491 gateway: Add GRPCRoute support for status changed predicate This was missed in the previous commit 8a421e7. Signed-off-by: Tam Mach <tam.mach@cilium.io> 12 January 2024, 10:29:16 UTC
114d239 ci conformance e2e: increase request timeout from 10s to 30s. Based on investigation here: https://github.com/cilium/cilium/issues/27762#issuecomment-1886329997 I'd like to increase the response timeout for the request, to see if the json-mock application is hanging or if this is some kind of proxy related issue. Addresses: #27762 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 12 January 2024, 09:10:26 UTC
6e71588 tests: check for pending maps after network policy tests finish There is a flaky test failure due to `Removed pending pinned map, did the agent die unexpectedly?` being logged by the cilium agent when it finds remains of a previous cilium agent. As far as I can tell this comes about since the testsuite reconfigures cilium between test runs, and that reconfiguration happens while the old agent is applying some config. Right now it's almost impossible to debug the issue since we get the logs for the wrong test. Try to make the "correct" test fail by adding an assertion to test cleanup which checks that there are no maps pending. Adding this check seems to make the problem occur a lot less frequently, which suggests a race of some sort. Updates https://github.com/cilium/cilium/issues/30101 Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 11 January 2024, 22:18:59 UTC
760a109 bpf: lower pending map removal warning to info level This has been making ci-ginkgo fail recently. With the removal of map migrations around the corner (https://github.com/cilium/cilium/issues/29333), and having declared bankruptcy on the Ginkgo test suite, let's not waste more time chasing this bugbear. Signed-off-by: Timo Beckers <timo@isovalent.com> 11 January 2024, 22:14:31 UTC
385dbe5 loader: ignore context cancellations during map migration Allowing replaceDatapath() to be cancelled in the middle of an ongoing map migration is a potential source of chaos. We've recently seen some flakes with errors like `Removed pending pinned map, did the agent die unexpectedly?`, so let's remove this context check to reduce the likelihood of that happening. Signed-off-by: Timo Beckers <timo@isovalent.com> 11 January 2024, 22:14:31 UTC
3dc3a9b operator/identitygc: remove unused GC.allocationCfg It has been unused since commit 0f323a0feb4a ("refactor: replace identity allocation globals"). Removing it also allows dropping SharedConfig.K8sNamespace, which was only used to initialize GC.allocationCfg.k8sNamespace. Signed-off-by: Tobias Klauser <tobias@cilium.io> 11 January 2024, 18:10:02 UTC
a255997 docs: fix chained veth plugin example We previously looked up the chaining mode by name, but this is non-obvious and unnecessary. So, we added the CNI chaining-mode parameter. But, we failed to update the docs to reference this. Fixes: #28714 Signed-off-by: Casey Callendrello <cdc@isovalent.com> 11 January 2024, 16:13:55 UTC
c0dadbe Revert "renovate: don't separate minor/patch updates of Go modules" This reverts commit fece63cd2e171cf2be68c95e8d7f5e35e81e6a4f. Reason for revert: breaks renovate Signed-off-by: Tobias Klauser <tobias@cilium.io> 11 January 2024, 15:42:29 UTC
6edd682 optimize kind setup Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io> 11 January 2024, 15:14:38 UTC
6c6c121 route: dedicated net ns for each subtest of runListRules Currently, there are cases where the test TestListRules/returns_all_rules fails with the following error:
```
--- FAIL: TestListRules (0.02s)
    --- FAIL: TestListRules/returns_all_rules#01 (0.00s)
        route_linux_test.go:490: expected len: 2, got: 3
              []netlink.Rule{
                {
            -     Priority: -1,
            +     Priority: 9,
                  Family: 10,
            -     Table: 255,
            +     Table: 2004,
            -     Mark: -1,
            +     Mark: 512,
            -     Mask: -1,
            +     Mask: 3840,
                  Tos: 0,
                  TunID: 0,
                  ... // 11 identical fields
                  IPProto: 0,
                  UIDRange: nil,
            -     Protocol: 2,
            +     Protocol: 0,
                },
            +   s"ip rule 100: from all to all table 255",
                {Priority: 32766, Family: 10, Table: 254, Mark: -1, ...},
              }
```
It looks like there's a switch of the network namespace during the test execution. Therefore, this commit locks the OS thread for the execution of the test that runs in a dedicated network namespace. In addition, each sub-test of the table-driven test set runs in its own network namespace, as each sub-test runs in its own goroutine. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 11 January 2024, 15:13:39 UTC