Revision Author Date Message Commit Date
c464e66 helm: mount kvstoremesh-specific certificate into cilium agents Let's additionally mount the kvstoremesh-specific certificate into cilium agents, so that it can be used to authenticate against the local etcd instance storing the cached data. The secret entry is always configured (although marked as optional), regardless of whether KVStoreMesh is actually enabled or not, so that it can be automatically mounted in case it gets subsequently enabled, without requiring a restart of the agents. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 13 June 2024, 13:01:41 UTC
9ffeba1 helm: generate dedicated certificate for kvstoremesh access Extend the helm chart to additionally generate the "local" certificate with the common name matching the newly introduced "local" etcd user, when kvstoremesh is enabled. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 13 June 2024, 13:01:41 UTC
cb6a58b clustermesh: granular etcd permissions for kvstoremesh cached data Currently, the same etcd user (i.e., remote) is granted permissions to read the whole content of the clustermesh-apiserver's sidecar etcd instance, including also the data cached by kvstoremesh, when enabled. In an effort to harden the overall clustermesh posture, let's introduce a separate and dedicated user for local access, to ensure that remote clusters cannot access cached data, as it may include information that they would not normally have access to. Specifically, the remote user is intended to have access only to the information regarding the local cluster, while the local user can access cached data about remote clusters only. Still, for backward compatibility purposes, the remote user still retains access to cached data as well in this release. The reason being that there would otherwise be a time window upon upgrade in which Cilium Agents would lose access to the kvstoremesh data (especially in large clusters). Indeed, the new certificate would be mounted by the agents only upon rollout, but the configuration would be immediately reloaded (thus targeting the new, not yet mounted, certificate), hence breaking the access to the information cached by kvstoremesh. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 13 June 2024, 13:01:41 UTC
22bab80 Fix release build SBOM generation - Use the correct image to generate SBOMs - Stop release asset uploads, which can require extra permissions (that this workflow doesn't have) and that we don't need. Signed-off-by: Feroz Salam <feroz.salam@isovalent.com> 13 June 2024, 12:08:45 UTC
70db218 endpoint: remove unused parameter from Add/NewEndpoint functions This commit removes unused parameters from the functions `Add*Endpoint` and `New*Endpoint` from the EndpointManager. - `reason` - `nodeName` Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 13 June 2024, 12:07:31 UTC
9f6d52e bpf: avoid race when selecting the RevSNAT port The logic to allocate an SNAT mapping contains a race condition. At a high level it does the following: if (!revsnat_exists(port)) { if (!create_revsnat(port)) return error; ... } Two concurrent executions of the datapath may both pass the revsnat_exists check, which then leads to one of them bailing out since create_revsnat fails. Instead simply try to create the RevSNAT entry. If that fails we retry with another port. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 13 June 2024, 12:03:41 UTC
dc52072 clustermesh: switch to surge upgrade strategy. With the introduction of Clustermesh support for HA deployment in #31677, let's change the upgrade strategy to make sure that the Clustermesh control plane is always available. This is also the configuration that we test against in CI tests - maxSurge=1 and maxUnavailable=0. On top of that, change the required antiAffinity to preferred to cover the case of a single-node cluster. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 13 June 2024, 12:01:54 UTC
55a9d2e loader: cache parsed CollectionSpec The object cache parses an ELF from disk any time it is asked for a template. This is wasteful since parsing the ELF is quite resource intensive. Cache the parsed CollectionSpec instead. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 13 June 2024, 11:55:56 UTC
70f6608 loader: evict object cache when datapath config changes The object cache currently does no invalidation, which means that we accumulate cachedObject in memory and template ELF on disk. Use update of the base datapath hash as an opportunity to evict some of that cache. In practice this is probably not a big issue: datapath config changes rarely if ever, and we only have templates for endpoints and host endpoint. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 13 June 2024, 11:55:56 UTC
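The two loader commits above (caching the parsed spec, and evicting on a base hash change) can be illustrated with a small Go sketch. All names here (`objectCache`, `parsedSpec`, `get`, `updateBaseHash`) are hypothetical, and the sketch evicts every entry on a hash change, which is an assumption: the commit message only says that some of the cache is evicted.

```go
package main

import (
	"fmt"
	"sync"
)

// parsedSpec stands in for the result of an expensive ELF parse
// (hypothetical type, for illustration only).
type parsedSpec struct{ template string }

// objectCache keeps parsed specs keyed by template hash and remembers the
// base datapath hash they were produced for.
type objectCache struct {
	mu       sync.Mutex
	baseHash string
	specs    map[string]*parsedSpec
	parses   int // counts expensive parses, to make the caching visible
}

func newObjectCache(baseHash string) *objectCache {
	return &objectCache{baseHash: baseHash, specs: map[string]*parsedSpec{}}
}

// get returns the cached spec for a template, parsing it at most once.
func (c *objectCache) get(templateHash string) *parsedSpec {
	c.mu.Lock()
	defer c.mu.Unlock()
	if s, ok := c.specs[templateHash]; ok {
		return s
	}
	c.parses++ // placeholder for reading and parsing the ELF from disk
	s := &parsedSpec{template: templateHash}
	c.specs[templateHash] = s
	return s
}

// updateBaseHash drops cached entries when the base datapath hash changes.
// Assumption: everything is evicted; a real cache may be more selective.
func (c *objectCache) updateBaseHash(h string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if h == c.baseHash {
		return
	}
	c.baseHash = h
	c.specs = map[string]*parsedSpec{}
}

func main() {
	c := newObjectCache("base-1")
	c.get("endpoint")
	c.get("endpoint") // served from the cache, no second parse
	c.updateBaseHash("base-2")
	c.get("endpoint") // parsed again after eviction
	fmt.Println(c.parses) // prints "2"
}
```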
ec651ea service: remove monitoragent nil-check Currently, accessing the monitoragent from the service manager is guarded with nil-checks as unit-tests don't provide a monitoragent. This commit removes the check in favor of a fake implementation that is used and passed in the tests. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 13 June 2024, 11:54:09 UTC
2e308f9 service: unexport NewService NewService is no longer used as the service manager is provided via Hive Cell. Therefore, this commit un-exports the function `NewService`. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 13 June 2024, 11:54:09 UTC
0078c6a logging: Pass debug to slog as well Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 13 June 2024, 12:00:36 UTC
de97dd8 chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 13 June 2024, 11:06:19 UTC
74039b5 k8s: remove unused method NewStandaloneClientset This commit removes the deprecated and now unused method `NewStandaloneClientset`. In the meantime, all usages have been moved to use the Hive Cell dependency directly. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 13 June 2024, 10:41:35 UTC
511f077 clustermesh: enable kvstoremesh by default KVStoreMesh has been introduced in v1.14. We have been running KVStoreMesh tests since then, while also testing the upgrade path from "vanilla" Clustermesh to KVStoreMesh and back. There has also been visible adoption by users in the community. Let's mark KVStoreMesh as stable and enable it by default. Note: Once 1.16 is out, we will need to update CI test Cilium Cluster Mesh upgrade (ci-clustermesh) Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 13 June 2024, 10:14:43 UTC
fd0b235 bpf: overlay: remove wireguard.h include This probably is no longer needed since 81c45d2280d7 ("bpf: Remove strict encrypt check from bpf_overlay"). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 13 June 2024, 07:14:02 UTC
28f7863 bpf: encap: clean up some unneeded includes The include for l3.h is no longer needed since 3aa51eb3052d ("bpf: ipsec: move get_min_encrypt_key() to encrypt.h"), and the last wireguard usage went away with b67291f03926 ("bpf: Encap with cilium_{vxlan,geneve} before passing to WG"). This uncovered some implicit includes for hs-ipcache, fix them up. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 13 June 2024, 07:14:02 UTC
dbd33db helm/certgen: add generation of hubble-ui-client-certs certificate This certificate appeared not to be generated by certgen, hence leading to an inconsistency with respect to the other certificate generation modes. Let's fix this divergence to ensure that they are all equivalent. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 13 June 2024, 06:58:19 UTC
b7ac7c3 helm/certgen: bump certgen to v0.2.0 and adapt configuration Bump certgen to v0.2.0, which enables the definition of the certificates to be generated via a generic configuration, agnostic of the Cilium specific details. Hence, let's refactor the certgen configuration and explicitly define the characteristics of the certificates to be generated. While being there, let's also correctly propagate the extra DNS names and IP addresses, that were previously ignored if certgen was used. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 13 June 2024, 06:58:19 UTC
8910af5 helm/certgen: explicitly specify CA secret namespace and CN In preparation for bumping the certgen version, which changes the default values to be independent of Cilium, let's explicitly specify the namespace containing the CA secret and its common name. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 13 June 2024, 06:58:19 UTC
dd97ac3 helm/certgen: always specify --ca-secret-name Currently, we specify the --ca-secret-name certgen parameter only if both tls.ca.cert and tls.ca.key are specified. However, the secret name is always relevant, regardless of whether the CA is explicitly specified, and the default value historically matched the specified one. As a preparation for bumping the certgen version, which changes the default secret name to be independent of Cilium, let's always specify the corresponding parameter. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 13 June 2024, 06:58:19 UTC
b429b08 bpf: transport source identity in MARK_MAGIC_OVERLAY Provide easy access to the security identity which is embedded into Cilium's overlay traffic. And start making use of it in the encrypted-overlay path, to avoid some manual packet parsing. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 13 June 2024, 06:34:52 UTC
b0e0b0c bpf: propagate src sec id from ingress bpf_overlay to egress bpf_host as in a subsequent change bpf_host will need to access the source identity that was carried over the tunnel Signed-off-by: Gilberto Bertin <jibi@cilium.io> 13 June 2024, 06:08:14 UTC
c3fcea2 gateway-api: Add periodic headless service sync This is to handle the case of late arrival or late creation of Endpoint or EndpointSlice associated to headless service. Fixes: cce40804c3ac9f564859d788faef981a697de7ac Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
7ab2a58 gateway-api: Add support for listener isolation This commit is to support the Listener Isolation concept from the upstream, which allows at most one Listener to match a request, and only Routes attached to that Listener are used for routing. Relates: https://github.com/kubernetes-sigs/gateway-api/pull/2465 Relates: https://github.com/kubernetes-sigs/gateway-api/pull/3047 Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
4f92330 gateway-api: Shorten the service name in CEC We already perform the shortening in the below commit, this is to make sure that the same service is used. Relates: d6fbccf96cdc0a5f3bdf7aa7ac6006a100a09ba9 Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
8efbb71 gateway-api: Avoid unnecessary reconciliation in GAMMA This is to avoid any unnecessary reconciliation for non-GAMMA HTTPRoute: - Explicitly check if Kind and Group are not nil; as per the Gateway API spec, nil values are meant for Gateway. - Add GAMMA check for backend services and listening service. Additionally, one small correction on Reason status is added to make sure that the space character is not used. ``` 2024-06-06T05:34:31.583996151Z time="2024-06-06T05:34:31Z" level=error msg="Reconciler error" HTTPRoute="{attaches-to-wildcard-example-com-with-hostname-intersection gateway-conformance-infra}" controller=httproute controllerGroup=gateway.networking.k8s.io controllerKind=HTTPRoute error="failed to update HTTPRoute status: HTTPRoute.gateway.networking.k8s.io \"attaches-to-wildcard-example-com-with-hostname-intersection\" is invalid: parents[0].conditions[0].reason: Invalid value: \"Invalid HTTPRoute\": parents[0].conditions[0].reason in body should match '^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$'" name=attaches-to-wildcard-example-com-with-hostname-intersection namespace=gateway-conformance-infra reconcileID="\"2c43d9eb-52ad-4344-b0ff-e58c227221fb\"" subsys=controller-runtime ``` Relates: 363fdd4ff951e02ebf666b1dccf17d0dfb5a0f47 Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
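The Reason validation failure quoted in the log above can be reproduced with a short Go check. The pattern is copied verbatim from the error message; `validReason` is a hypothetical helper, and "InvalidHTTPRoute" is only an assumed example of a space-free value, not necessarily the string the commit uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern is the pattern quoted in the validation error above.
var reasonPattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`)

// validReason reports whether a condition reason satisfies the pattern.
// Hypothetical helper, for illustration only.
func validReason(reason string) bool {
	return reasonPattern.MatchString(reason)
}

func main() {
	fmt.Println(validReason("Invalid HTTPRoute")) // false: the space is rejected
	fmt.Println(validReason("InvalidHTTPRoute"))  // true
}
```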
4d6bee1 gateway-api: Avoid partial wildcards in server names This is to make sure that we don't have any "*" in server name slice, and avoid the below NACK issue in Envoy. ``` 2024-06-06T05:15:22.640515083Z time="2024-06-06T05:15:22Z" level=warning msg="NACK received for versions after 233 and up to 234; waiting for a version update before sending again" subsys=xds xdsAckedVersion=233 xdsClientNode="host~127.0.0.1~no-id~localdomain" xdsDetail="Error adding/updating listener(s) gateway-conformance-infra/cilium-gateway-same-namespace-with-https-listener/listener: error adding listener '127.0.0.1:14239': partial wildcards are not supported in \"server_names\"\n" xdsNonce=234 xdsStreamID=6 xdsTypeURL=type.googleapis.com/envoy.config.listener.v3.Listener ``` Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
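A minimal Go sketch of the filtering described above, assuming that the problematic entry is a bare "*" as the commit message states. `dropBareWildcards` is a hypothetical helper name, not the function in the Cilium code base.

```go
package main

import "fmt"

// dropBareWildcards returns the server names without any entry that is
// exactly "*". Other names, including "*.example.com", are kept as-is.
func dropBareWildcards(names []string) []string {
	out := make([]string, 0, len(names))
	for _, n := range names {
		if n == "*" {
			continue
		}
		out = append(out, n)
	}
	return out
}

func main() {
	names := []string{"*", "*.example.com", "foo.example.com"}
	fmt.Println(dropBareWildcards(names)) // prints "[*.example.com foo.example.com]"
}
```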
5b9364a gha: Update conformance profiles for Gateway API As part of v1.1.0, there is a new list of valid conformance profile values (e.g. GATEWAY-HTTP,GATEWAY-TLS,GATEWAY-GRPC,MESH-HTTP,MESH-GRPC). Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
f03eca9 gha: Swap feature flag name for MeshConsumerRoute As mentioned in 8de7a903400aa237600c1f49d0d4ef16503c2ee3, we can use the feature flag MeshConsumerRoute in v1.1.0 instead. Relates: 8de7a903400aa237600c1f49d0d4ef16503c2ee3 Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
f29c6c4 gateway-api: Bump to version v1.1.0 While GRPCRoute is still available in beta/alpha version, some of the related attribute structs are only available in v1, hence it's better to bump GRPCRoute to v1 as well. Another goal is to pick up the new conformance tests as well as bug fixes from the upstream. Signed-off-by: Tam Mach <tam.mach@cilium.io> 13 June 2024, 04:03:39 UTC
3f8585a gh: e2e-upgrade: disable config 7 The config is reliably failing [0]. Stabilize the workflow so that we can make it required. [0] https://github.com/cilium/cilium/issues/32689 Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 13 June 2024, 00:49:27 UTC
b49f912 gha: Only retrieve IPv4 CIDR from docker network It seems like github runner is enabled with docker dual stack, so the current docker network inspect command might return IPv6 instead of IPv4 CIDR, which breaks LB IPPool configuration. Sample output of `docker network inspect kind` command can be found as per below. This commit is to make sure that we only retrieve IPv4 CIDR in docker network inspect command. Additionally, some echo/cat statements are added to make similar issues more visible in the future. ``` [ { "Name": "kind", "Id": "43e3b3267092150f5f2e6f2053157d912ad6b5a4ce20f700e1e9be547a437f75", "Created": "2024-06-12T14:18:17.733107881Z", "Scope": "local", "Driver": "bridge", "EnableIPv6": true, "IPAM": { "Driver": "default", "Options": {}, "Config": [ { "Subnet": "fc00:f853:ccd:e793::/64" }, { "Subnet": "172.18.0.0/16", "Gateway": "172.18.0.1" } ] }, "Internal": false, "Attachable": false, "Ingress": false, "ConfigFrom": { "Network": "" }, "ConfigOnly": false, "Containers": { "748d7161857ca5e610f196299828eacafcbdb069d38c00e4e6c14cdeefada9c5": { "Name": "chart-testing-control-plane", "EndpointID": "0f1a5bbeb14929200ed13cb289afd6bf5f9f455d4ed75bb3a26e167e67bf7784", "MacAddress": "02:42:ac:12:00:02", "IPv4Address": "172.18.0.2/16", "IPv6Address": "fc00:f853:ccd:e793::2/64" }, "c2030425e24a11ea208b87c5d70e194b0f51eee133f09b67404fd2bf97410f13": { "Name": "chart-testing-worker", "EndpointID": "81489bd101e483be7270e2b5dd7e0bf3a0163b89650d7ef69cc4ce43454479e3", "MacAddress": "02:42:ac:12:00:03", "IPv4Address": "172.18.0.3/16", "IPv6Address": "fc00:f853:ccd:e793::3/64" } }, "Options": { "com.docker.network.bridge.enable_ip_masquerade": "true", "com.docker.network.driver.mtu": "1500" }, "Labels": {} } ] ``` Signed-off-by: Tam Mach <tam.mach@cilium.io> 12 June 2024, 22:23:31 UTC
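The selection problem described above (an IPv6 subnet listed before the IPv4 one) can be illustrated in Go. This is a sketch, not the command used in the workflow: `ipv4Subnet` and the `network` struct are hypothetical, and only the fields needed from the sample output are modelled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
)

// network models just the part of `docker network inspect` output that
// carries the subnets (hypothetical, trimmed-down type).
type network struct {
	IPAM struct {
		Config []struct {
			Subnet string `json:"Subnet"`
		} `json:"Config"`
	} `json:"IPAM"`
}

// ipv4Subnet returns the first IPv4 subnet found, skipping IPv6 entries
// regardless of the order in which they are listed.
func ipv4Subnet(inspectJSON []byte) (netip.Prefix, error) {
	var nets []network
	if err := json.Unmarshal(inspectJSON, &nets); err != nil {
		return netip.Prefix{}, err
	}
	for _, n := range nets {
		for _, c := range n.IPAM.Config {
			p, err := netip.ParsePrefix(c.Subnet)
			if err != nil {
				continue // ignore entries that are not valid CIDRs
			}
			if p.Addr().Is4() {
				return p, nil
			}
		}
	}
	return netip.Prefix{}, fmt.Errorf("no IPv4 subnet found")
}

func main() {
	// Subnets taken from the sample output quoted in the commit message.
	sample := []byte(`[{"Name":"kind","IPAM":{"Config":[
		{"Subnet":"fc00:f853:ccd:e793::/64"},
		{"Subnet":"172.18.0.0/16","Gateway":"172.18.0.1"}]}}]`)
	p, err := ipv4Subnet(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // prints "172.18.0.0/16"
}
```

Taking the first `Config` entry unconditionally would have returned the IPv6 subnet for this sample, which is the failure mode the commit describes.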
7af9a1e chore(deps): update golangci/golangci-lint docker tag to v1.59.1 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 12 June 2024, 20:57:25 UTC
6aa1bc8 Update CEPS watchdog Given the inherent reconciliation to check on ceps bpf programs, using logs with error severity could be confusing. Also we currently don't log out the cep name which will help for further investigation. * Change logger from error to warning * Add cep name to log message Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com> 12 June 2024, 20:54:31 UTC
2e2c6c5 policy: determine subject identities via SelectorCache In order to determine applicable identities to which a policy applies, we need to evaluate label selectors. Given that we already have an efficient mechanism for caching label selectors (the SelectorCache), we should use that for subject endpoints as well. This refactors the PolicyRepository to use the SelectorCache when determining subject identities. It removes yet another static cache of matched identities and a corresponding event bus. It also saves memory in the case of reused selectors, which is common. An important consideration is that any new identities must be in the selectorcache *before* that endpoint is regenerated, or else it will not get the correct set of policies. Indeed this is safe, because identity allocation updates the SelectorCache synchronously, and endpoints must have their security identity allocated before they can use it. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 12 June 2024, 18:32:24 UTC
437cc73 policy/selectorcache: correctly handle mutating IDs While computing the delta on an ID allocation, the SelectorCache incorrectly handled the case where a label change caused an identity to no longer be selected by a selector. The only identity that should have mutable labels is the local host, so this is not actually a visible bug. In preparation for using the SelectorCache to determine policy targets, however, it is now necessary. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 12 June 2024, 18:32:24 UTC
e20ed9c bpf: host: add host_egress_policy hook this commit adds a hooking point to cil_to_netdev in bpf_host.c that can be used by cilium plugins to extend the functionality of this function. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 12 June 2024, 14:53:20 UTC
9d4b3de removed deprecated calls and added nolint for strings.Title Signed-off-by: yogesh1801 <yogeshsingla481@gmail.com> 12 June 2024, 13:29:10 UTC
a1be027 docs: egressgw: remove kernel requirement We already require a 5.4 kernel (https://github.com/cilium/cilium/pull/30869). We also explicitly check for HAVE_LARGE_INSN_LIMIT (https://github.com/cilium/cilium/pull/30896), which afaik was the main reason for the 5.2 kernel requirement. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 12 June 2024, 12:54:26 UTC
b5ad1e4 daemon: remove unused policyupdater dependency from daemon/daemonparams With the removal of the k8swatcher initialization from the daemon bootstrap, the dependency to the policyUpdater can be removed from the daemon & daemonParams struct. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
5375256 nodediscovery: explicit dependency to k8sNodeWatcher Currently, the k8sWatcher is set as dependency on the nodeDiscovery during agent initialization by using the method `RegisterK8sSetters`. Trying to add an explicit dependency from the NodeDiscovery to the `K8sWatcher` results in a cyclic dependency via datapath. With the modularization of the k8sWatcher into smaller cells, it's possible to define the explicit dependency only to the `k8sCiliumNodeWatcher`, as this is the only part the NodeDiscovery is interested in. This way, there's no cyclic dependency. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
0f586d0 k8s: move init test to new watcher_test.go This commit extracts the k8sWatcher related unit test into its own file `watcher_test.go`. (Separate commit to keep the git history). Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
e44d411 k8s: rename watcher_test.go to service_test.go Currently, the file `watcher_test.go` mostly contains service related unit tests. Therefore, the file gets renamed to `service_test.go`. An upcoming commit will extract the only K8sWatcher related test into `watcher_test.go`. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
4780989 k8s: remove k8sSvcCache from k8swatcher and use directly as daemon dep Currently, during daemon initialization, multiple components access the k8sSvcCache through the corresponding exported field in the k8sWatcher. This commit removes the field from the k8swatcher and forces the daemon to depend on the `k8sSvcCache` directly. In addition, some tests of the k8sWatcher would have been freed up from using the k8sWatcher at all, as they were only testing service logic. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
da23ff4 k8s: extract k8sCiliumEndpointsWatcher Currently, all the k8s watchers of the `k8sWatcher` are defined in the same struct, have access to all the same dependency fields and are provided as one Cell. This commit extracts the k8s CiliumEndpoints watcher into its own sub-cell that is provided privately. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
62e214b k8s: extract k8sCiliumLRPWatcher Currently, all the k8s watchers of the `k8sWatcher` are defined in the same struct, have access to all the same dependency fields and are provided as one Cell. This commit extracts the k8s CiliumLRP watcher into its own sub-cell that is provided privately. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
0c8ab4f k8s: extract k8sEndpointsManager Currently, all the k8s watchers of the `k8sWatcher` are defined in the same struct, have access to all the same dependency fields and are provided as one Cell. This commit extracts the k8s Endpoints watcher into its own sub-cell that is provided privately. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
5249766 k8s: extract k8sServiceManager Currently, all the k8s watchers of the `k8sWatcher` are defined in the same struct, have access to all the same dependency fields and are provided as one Cell. This commit extracts the k8s Service watcher into its own sub-cell that is provided privately. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
35bbd26 k8s: extract k8sNamespaceWatcher Currently, all the k8s watchers of the `k8sWatcher` are defined in the same struct, have access to all the same dependency fields and are provided as one Cell. This commit extracts the k8s Namespace watcher into its own sub-cell that is provided privately. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
7b44b07 k8s: extract k8sCiliumNodeWatcher Currently, all the k8s watchers of the `k8sWatcher` are defined in the same struct, have access to all the same dependency fields and are provided as one Cell. This commit extracts the k8s CiliumNode watcher into its own sub-cell that is provided privately. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
de58c84 k8s: extract k8sPodWatcher Currently, all the k8s watchers of the `k8sWatcher` are defined in the same struct, have access to all the same dependency fields and are provided as one Cell. This commit extracts the k8s Pod watcher into its own sub-cell that is provided privately. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
914063a k8s: extract k8sEventReporter Currently, k8s event reporting is part of the k8sWatcher. It's used by sub-watchers of the k8swatcher itself, but also by external watchers (e.g. IPAM watcher). As a first step to further modularize the k8swatcher into its smaller components, the k8s event reporting is extracted into its own cell and struct `k8sEventReporter`. This way, other components can depend on it. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
a7c3744 k8s: introduce k8s watcher cell Currently, the k8swatcher is initialized in the daemon bootstrap function `newDaemon`. With the modularization of all its dependencies into their own Hive Cell, it's about time to move the initialization of the k8sWatcher into its own Hive Cell too. In a first step, the cell only provides the pre-initialized struct, without moving any of the lifecycle aspects into the Cell. For the time being, these are being kept in the daemon initialization. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 12 June 2024, 11:47:59 UTC
973d540 envoy: Remove un-necessary warning log filtering Relates: https://github.com/cilium/cilium/pull/31108 Relates: https://github.com/envoyproxy/envoy/pull/30735 Signed-off-by: Tam Mach <tam.mach@cilium.io> 12 June 2024, 09:58:44 UTC
ca81c9c bpf: host: use security identities in to-netdev's trace notifications For some types of traffic, to-netdev derives precise security identities. Consistently use these values in the trace notifications. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 12 June 2024, 08:20:41 UTC
65e93a2 ci: add tests for migration to CiliumEndpointSlice This commit adds CI to test that the migration from CiliumEndpoint to CiliumEndpointSlice does not disturb long-lived connections. A Kind cluster is set up without CiliumEndpointSlice enabled. Long-lived connections are set up. Then, CES is enabled, the operator is restarted and then the agent, after the CES CRD is created. Then, the connectivity test is run to ensure long-lived connections were not broken. Signed-off-by: jshr-w <shjayaraman@microsoft.com> 12 June 2024, 08:17:42 UTC
811cb7f make: Add include to Makefile.override within binary-specific makefiles make: Add include to Makefile.override in binary Makefiles This commit adds an include statement for Makefile.override in Makefiles specific to building Cilium's go binaries. Makefile.override is included in the top-level Makefile as a method for optionally overriding variables, however it is not included in any of these binary-specific Makefiles. This means that the ability to override variables is only available for targets in the top-level Makefile, preventing use cases where overriding variables used in these binary-specific Makefiles can be useful. As an example, this commit would allow one to override the GO variable to specify a specific go binary to use in order to build a target. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 12 June 2024, 08:17:30 UTC
9cfa1a2 make, docker: Add ADDITIONAL_MODIFIERS environment variable This commit adds a new environment variable to the docker-specific aspects of the Cilium Makefiles named `ADDITIONAL_MODIFIERS`. This environment variable can be used to modify the `MODIFIERS` docker build arg, adding in any extra values that haven't previously been specified via a preset, such as `RACE` or `NOSTRIP`. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 12 June 2024, 08:17:30 UTC
c4aebae docker, ci: Create generalized MODIFIERS build arg This commit replaces the NOSTRIP, NOOPT, LOCKDEBUG, RACE, V and LIBNETWORK_PLUGIN docker build args with a single, generic build arg named "MODIFIERS". This allows for arbitrary flags to be passed to make when building a docker image as well as removes the need for modifications to dockerfiles when a new build-time modifier is added. One example use case is using `Makefile.overrides` to define a new flag that can be passed to make when building docker images. The new flag could enable appending values to the MODIFIERS build argument, which would allow the propagation of configuration variables down to make invocations used to build binaries within a Dockerfile. Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 12 June 2024, 08:17:30 UTC
9334d97 l2-discovery: fix health reporting for link updater As-is, when l2 neighbor discovery is enabled, the node-neighbor-link-updater controller fails with "invalid node spec found in queue". This is due to a bug in the controller's DoFunc, where an empty list is treated the same as an invalid queue entry. When this controller fails, `cilium status` reports errors for all nodes in the cluster similar to the following: ``` cilium cilium-mgstt controller node-neighbor-link-updater is failing since 21s (49x): invalid node spec found in queue: (*manager.nodeQueueEntry)(nil) ``` To differentiate between an empty queue and a nil item, the queue's `pop` method now also returns a bool to indicate whether an element was successfully retrieved from the queue. Fixes: #8d525fe Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 12 June 2024, 08:17:06 UTC
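The fix described above, where `pop` also returns a bool, can be sketched as follows. The types are simplified stand-ins: `queue`, `nodeQueueEntry` and `process` are illustrative and omit any locking the real controller may need.

```go
package main

import "fmt"

// nodeQueueEntry is a simplified stand-in for the queued item type.
type nodeQueueEntry struct{ name string }

// queue is a minimal FIFO; the real implementation may differ.
type queue struct{ items []*nodeQueueEntry }

func (q *queue) push(e *nodeQueueEntry) { q.items = append(q.items, e) }

// pop returns the head of the queue and true, or (nil, false) when the queue
// is empty. The bool lets callers tell "nothing queued" apart from a nil item.
func (q *queue) pop() (*nodeQueueEntry, bool) {
	if len(q.items) == 0 {
		return nil, false
	}
	e := q.items[0]
	q.items = q.items[1:]
	return e, true
}

// process handles one queue item. An empty queue is not an error; a nil item
// that was actually queued still is.
func process(q *queue) error {
	e, ok := q.pop()
	if !ok {
		return nil
	}
	if e == nil {
		return fmt.Errorf("invalid node spec found in queue: %#v", e)
	}
	return nil
}

func main() {
	q := &queue{}
	fmt.Println(process(q) == nil) // prints "true": empty queue, no failure
	q.push(nil)
	fmt.Println(process(q) != nil) // prints "true": a queued nil item is reported
}
```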
22b3e82 bgpv2: Allow empty advertisement Remove unnecessary restriction. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 12 June 2024, 08:08:24 UTC
26325a8 docs: ipsec: mention dependency on transparent mode for DNS proxy For connections that are established by the DNS proxy, this is required to detect the original source IP and apply IPsec policy accordingly. The agent fatals if IPsec and L7 proxy are enabled, but the DNS proxy is not set to transparent mode. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 12 June 2024, 05:14:37 UTC
9c7bd8a gha: bump status wait timeouts in clustermesh upgrade/downgrade tests The blamed commit already increased the post-upgrade timeout. However, we have now started witnessing failures in the other wait operations as well, due to endpoint regeneration not completing on time. Hence, let's bump all timeouts to 10m. Related: 01c3b8376046 ("gha: bump post-upgrade timeout in clustermesh upgrade/downgrade tests") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 12 June 2024, 01:53:58 UTC
a57393f README: Update releases Signed-off-by: Quentin Monnet <qmo@qmon.net> 11 June 2024, 20:45:44 UTC
a1d0307 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 11 June 2024, 20:08:20 UTC
8afe844 images: Fix copyo mistake in error message This error message was copied from the equivalent runtime script. Fix it. Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2024, 20:08:20 UTC
f639135 .github: Regenerate api/v1 when updating builder The builder image contains the 'protoc' binary which can generate different API files when it's updated, notably because protoc decides to encode its own version into the files it outputs. Add a step in the builder image update workflow to update the api/v1 files. Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2024, 20:08:20 UTC
a37eaad ci: Enable LRP connectivity tests Signed-off-by: Aditi Ghag <aditi@cilium.io> 11 June 2024, 16:34:05 UTC
478e637 bpf: Disable conflicting per packet LB Per-packet LB is disabled in certain cases like when socket-LB is enabled, and load-balancing is handled in bpf_sock. However, there are other features (e.g., L7 LB) that require per-packet LB. This can conflict with processing local-redirect services in some cases. Based on user configured local redirect policies, load-balancing can be skipped for certain local-redirect services. More specifically, LB is skipped in some cases when users deploy LRPs with skipRedirectFromBackend flag. Per packet LB should not override LB decisions made for local-redirect services in bpf_sock. Signed-off-by: Aditi Ghag <aditi@cilium.io> 11 June 2024, 16:34:05 UTC
961820e docs: Promote local redirect policy feature to stable Signed-off-by: Aditi Ghag <aditi@cilium.io> 11 June 2024, 15:05:14 UTC
4a3b6c8 bgpv2: Remove node selector check from v2 PodCIDRReconciler Remove unnecessary CiliumNode label selector check for PodCIDR advertisements. This was reflected from the BGPv1 code, but for BGPv2 we would like to avoid it, as this behavior is inconsistent with other advertisement types (other advertisement types advertise the paths for selected resources, but PodCIDR only applies to the local node). Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> 11 June 2024, 13:44:33 UTC
085343b docs: add upgrade note about the slightly different dialer behavior The port specified as part of the kvstore address is now respected also when the address matches a Kubernetes service, to prevent inconsistencies if the service includes multiple ports. Additionally, mention that the etcd.operator option is no longer required, and has been removed. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
27e425c k8s: remove the now unused TransformToK8sService helper Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
6e763ac kvstore: remove the now unused IsEtcdOperator,SplitK8sServiceURL funcs Additionally drop the etcd.operator kvstore option, which is no longer required as the service resolver logic is now always enabled. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
28a7e82 service: drop the legacy and now unused custom dialer All usages have been converted over to the generic implementation in the previous commits. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
0bd5200 cilium-dbg: use newly introduced custom dialer in troubleshoot commands Let's make the troubleshoot commands uniform by also using the generic custom dialer implementation, and clean up the existing hacks. We stick to the existing implementation and don't use the service resolver in this case, to avoid starting an informer from a CLI tool. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
5560cfb operator: use newly introduced custom dialer and resolver for etcd Similarly as for the Cilium agent, let's migrate the operator to use the newly introduced dialer and service resolver for etcd, and untangle it from the SyncK8sServices option, so that it can be turned off independently for performance reasons when not necessary (i.e., if clustermesh is not used). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
d57a922 daemon: use newly introduced custom dialer and resolver for etcd Migrate the Cilium agent to use the newly introduced generic custom dialer and service resolver for etcd, to decouple the custom dialer logic from the service cache. In an effort to simplify the logic, the dialer is always registered (i.e., without performing the kvstore.IsEtcdOperator check), as the dialer is transparent if not matching a service name. Similarly, we don't explicitly wait for cache synchronization, as that's already automatically performed by the resolver to retrieve the service store. Additionally, in case the timeout expires, the etcd client would simply retry connecting again, eventually succeeding. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
9db5384 clustermesh: switch to newly introduced custom dialer and resolver Migrate the clustermesh cells, both in the agent and in the operator (for endpointslice synchronization) to use the newly introduced generic custom dialer and service resolver. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
0e8c5a6 agent: introduce service resolver to map svc DNS name to ClusterIP Let's introduce a new cell which provides the resolver logic to map DNS names matching Kubernetes services to the corresponding ClusterIP address. It is backed by a lazy resource.Store, which is started only upon the first translation request for a service DNS name (i.e., either matching name.namespace, or name.namespace.svc[.other]). Overall, it is a generalized version to replace the already existing approaches spread across the codebase, and in particular: * the reliance upon the ServiceCache, which in certain circumstances may not be available (e.g., in the operator); * the similar approach already leveraged in the clustermesh/epslicesync package, which is more naive, and doesn't support lazy startup. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
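A minimal sketch of the idea in 0e8c5a6, assuming a plain map in place of the lazy resource.Store. The names `parseServiceName`, `lazyResolver`, and `load` are hypothetical and are not the identifiers used in the Cilium codebase; `sync.Once` stands in for the lazy startup that happens on the first translation request.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parseServiceName extracts (name, namespace) from a hostname shaped like
// name.namespace or name.namespace.svc[.other]. Anything else is rejected.
func parseServiceName(host string) (name, namespace string, ok bool) {
	parts := strings.Split(host, ".")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", "", false
	}
	if len(parts) > 2 && parts[2] != "svc" {
		return "", "", false
	}
	return parts[0], parts[1], true
}

// lazyResolver maps service DNS names to ClusterIPs. The backing store is a
// hypothetical stand-in: it is loaded only on the first matching request.
type lazyResolver struct {
	once  sync.Once
	load  func() map[string]string // returns "namespace/name" -> ClusterIP
	store map[string]string
}

// Resolve returns the ClusterIP for host, or false if host does not look like
// a service name or no such service is known.
func (r *lazyResolver) Resolve(host string) (string, bool) {
	name, namespace, ok := parseServiceName(host)
	if !ok {
		return "", false
	}
	r.once.Do(func() { r.store = r.load() })
	ip, found := r.store[namespace+"/"+name]
	return ip, found
}

func main() {
	r := &lazyResolver{load: func() map[string]string {
		fmt.Println("store started")
		return map[string]string{"kube-system/clustermesh-apiserver": "10.96.0.20"}
	}}
	fmt.Println(r.Resolve("clustermesh-apiserver.kube-system.svc.cluster.local"))
	fmt.Println(r.Resolve("clustermesh-apiserver.kube-system"))
	fmt.Println(r.Resolve("www.example.com"))
}
```

Note that a two-label name such as `example.com` also matches the `name.namespace` shape. In this sketch it triggers a store lookup that finds nothing, so the caller falls back to the original address.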
7a46fd4 agent: introduce new generic context dialer with resolvers support It allows registering a set of resolvers to translate the target hostname into the corresponding IP address, or possibly another alias DNS name. The dialer eventually calls (&net.Dialer).DialContext with the first successfully translated address, or the original one otherwise (ports are never modified). Its main purpose is to be used as a DialOption for etcd, and resolve DNS names representing k8s services to the corresponding ClusterIP without depending on CoreDNS. Overall, it represents a generic version of, and aims to replace, the already existing k8s.CreateCustomDialer utility. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 13:43:14 UTC
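The behavior described in 7a46fd4 can be sketched roughly as follows. The `Resolver` interface, `StaticResolver`, `translate`, and `NewDialer` are hypothetical names chosen for illustration, not the actual Cilium API. The returned function has the signature that gRPC's `grpc.WithContextDialer` accepts, which is one plausible way to plug it into an etcd client.

```go
package main

import (
	"context"
	"fmt"
	"net"
)

// Resolver translates a hostname into an IP address or an alias DNS name.
// Hypothetical interface for this sketch.
type Resolver interface {
	Resolve(ctx context.Context, host string) (string, error)
}

// StaticResolver is a toy resolver backed by a fixed host -> target table.
type StaticResolver map[string]string

func (s StaticResolver) Resolve(_ context.Context, host string) (string, error) {
	if target, ok := s[host]; ok {
		return target, nil
	}
	return "", fmt.Errorf("no mapping for %q", host)
}

// translate rewrites the host part of hostport using the first resolver that
// succeeds. The port is never modified. If no resolver succeeds, or hostport
// cannot be split, the original address is returned unchanged.
func translate(ctx context.Context, hostport string, resolvers ...Resolver) string {
	host, port, err := net.SplitHostPort(hostport)
	if err != nil {
		return hostport
	}
	for _, r := range resolvers {
		if target, err := r.Resolve(ctx, host); err == nil {
			return net.JoinHostPort(target, port)
		}
	}
	return hostport
}

// NewDialer returns a dial function that translates the address and then
// delegates to (&net.Dialer).DialContext.
func NewDialer(resolvers ...Resolver) func(context.Context, string) (net.Conn, error) {
	return func(ctx context.Context, addr string) (net.Conn, error) {
		return (&net.Dialer{}).DialContext(ctx, "tcp", translate(ctx, addr, resolvers...))
	}
}

func main() {
	svc := StaticResolver{"etcd.kube-system.svc": "10.96.0.10"}
	ctx := context.Background()
	fmt.Println(translate(ctx, "etcd.kube-system.svc:2379", svc)) // 10.96.0.10:2379
	fmt.Println(translate(ctx, "example.com:443", svc))           // example.com:443
	_ = NewDialer(svc) // would be handed to the client as its dial function
}
```

Because unmatched hostnames pass through untouched, a dialer of this shape can be registered unconditionally, which is consistent with the "transparent if not matching a service name" remark in d57a922.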
0a634bf CODEOWNERS: Move devcontainer to cilium/ci When updating the builder image, this file gets updated, then pulls in @cilium/contributing as a codeowner. Move it over to cilium/ci to reduce the number of touchpoints for builder update points. Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2024, 12:23:44 UTC
6a1222d helm: directly leverage cilium.ca.setup for hubble certs generation Rather than using the intermediate hubble-generate-certs.helm.setup-ca, which performs the same steps. This brings consistency with the same operations performed for clustermesh-related certificates, and prevents divergences when generating/retrieving the CA certificate. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 09:43:11 UTC
519d391 helm/certgen: use namespaced RBAC for hubble certs generation Convert the ClusterRole/ClusterRoleBinding to Role/RoleBinding to reduce the overall permissions considering that certgen only needs to access the secrets in the local namespace, based on the current configuration. This also aligns it with the equivalent permissions used for clustermesh. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 09:41:51 UTC
11aa5e3 fix(deps): update all go dependencies main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 11 June 2024, 09:13:00 UTC
28f308a doc: Listed L2LB LB class to LB IPAM doc Added the L2LB LoadBalancerClass `io.cilium/l2-announcer` to the LB IPAM documentation page. Signed-off-by: Philip Schmid <phisch@cisco.com> 11 June 2024, 09:00:04 UTC
4b1aba4 Remove etcd.managed Helm setting The etcd-operator Helm templates rely on a piece of software which is no longer maintained upstream, and it relies on outdated CRDs which are no longer supported since Kubernetes 1.22. The setting has been hidden and not documented for several releases, we can remove it now. Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2024, 08:58:00 UTC
f99f10b docs: Deprecate support for podnetwork etcd Running Etcd in podnetwork to distribute state between Cilium instances introduces a range of challenges to bootstrapping and ensuring reliable connectivity within the cluster. We've deprecated in-built support in the Helm charts for this sort of configuration for several releases, and documented suggested alternatives. If we deprecate this feature then we can simplify some of the operations inside the cilium-agent. For alternative installation steps, see https://docs.cilium.io/en/stable/installation/k8s-install-external-etcd/#admin-install-daemonset . Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2024, 08:56:45 UTC
e9d8122 renovate: prevent upgrading certgen to v0.2 in stable branches certgen v0.2 is going to introduce breaking changes. Hence, let's introduce a new renovate rule to prevent it from being upgraded in stable branches. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 08:02:42 UTC
96989f4 renovate: remove unnecessary etcd-related constraint This etcd-related constraint appears to have been added in the blamed commit. However, it doesn't seem intentional, considering that the latest etcd version is currently v3.5.14. Hence, let's just drop it. Fixes: b3d7d4d1dcd2 ("renovate: try to group dependency updates on single PR") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 June 2024, 08:02:42 UTC
936f928 ci-e2e: Add the coverage for Ingress + bpf.masquerade Hopefully, this will help to catch some issues with Ingress. Signed-off-by: Tam Mach <tam.mach@cilium.io> 11 June 2024, 07:01:14 UTC
6947d82 maps: nat: remove rtp.log Looks like this was accidentally checked in by https://github.com/cilium/cilium/pull/32152. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 10 June 2024, 19:40:44 UTC
719eb4f fqdn: ToFQDN policy performance improvements This commit implements `CFP-28427: ToFQDN policy performance improvements`. It is highly recommended to consult the CFP, as it contains all the high-level design decisions and mechanisms found in this commit. The rest of this commit message therefore only explains the "what" and "where", and not the "why". Before this commit, there was circular interaction between the `SelectorCache` and `NameManager`: `SelectorCache` would tell `NameManager` about new `ToFQDN` selectors, and `NameManager` would in turn inform `SelectorCache` about the IPs selected by that `ToFQDN` selector. This commit simplifies this logic by removing the backlink from the `NameManager` to the `SelectorCache`. IPs are instead now labelled with the selector as an `fqdn` identity label in IPCache, thus not requiring any direct changes to the `SelectorCache` when a new IP is discovered that shares the identity with an old IP. If there is identity allocation needed for an observed IP, the `SelectorCache` is still updated, but only via `IPCache`, and no longer directly from `NameManager`. I recommend first looking at the changes to `SelectorCache` in `pkg/policy`. Note the following changes: 1. The `identityNotifier` interface (implemented by `NameManager`) is simplified: We no longer care about IPs selected by a FQDN selector, and we no longer need to care about potential deadlocks, as there are no calls back from `NameManager` to `SelectorCache` in the invoked functions (the indirect backlink from `NameManager` to `SelectorCache` via `IPCache` happens in `NameManager.UpdateGenerateDNS` - but this function is called by the DNS proxy whenever it observes a new DNS lookup and thus is called without the selector cache lock held). 2. `UpdateFQDNSelector` (previously invoked by `NameManager`) is removed - `SelectorCache` no longer directly needs to know the IPs matched by a selector. 3. 
The `fqdnSelector` type is simplified: Instead of containing the list of CIDR identities (one for each selected IP) and checking for the CIDR identity in `matches`, we now can simply treat the FQDN selector as a label and thus check if the requested identity has the FQDN selector label. 4. All the unit test logic around managing the selected IPs is removed, as all the responsibility for updating IPs now lies in `NameManager`. For the `NameManager` in `pkg/fqdn`, the changes are as follows: 1. Minor changes to the query functions in `DNSCache`: Instead of just listing or checking the existence of an IP, we now want to know about `(name, IP)` pairs (needed later for updating `IPCache`). 2. Similarly, where before we only cared about the mapping between an `FQDNSelector` and the selected IPs, we now want to know what `(name, IP)` pairs are matched by a particular selector. Thus `mapSelectorsToIPsLocked` is replaced with `mapSelectorsToNamesLocked` and the unit tests are updated as well. 3. `RegisterFQDNSelector` now checks if the new selector needs to be added to any known `(name, IP)` pairs as an `fqdn` label, and `UnregisterFQDNSelector` potentially removes `fqdn` labels from IPs. 4. `UpdateGenerateDNS` (invoked for DNS lookups) determines the labels of any newly discovered IP and now directly spawns the goroutine to wait for the new `(IP, identity)` pair to be injected into `IPCache`. Previously, this waiting was done as part of the call to `UpdateSelectors`, previously implemented in `daemon/cmd/fqdn.go` (and now removed). 5. `ForceGenerateDNS` is removed. It was previously called by the `NameManager` GC to remove IPs from the `SelectorCache`, but since the `SelectorCache` no longer knows about IPs, the function is obsolete (note that `IPCache` removals are still performed upon GC). 6. Changes in `CompleteBootstrap` to deal with the upgrade logic when upgrading from Cilium v1.15. See bullet point 9 below for details. 7. 
`updateDNSIPs` (called from `UpdateGenerateDNS`, i.e. upon new DNS lookups) now determines the labels for every newly observed IP based on the available FQDN selectors, and no longer upserts CIDR identities. Note that we only update the labels matching the looked up `dnsName`. If an IP happens to also map to a different domain name and uses a different set of selectors for the alternative name, those labels in IPCache are unaffected by the call to `updateMetadata`, as every call to IPCache uses the DNS name as the resource owner. 8. The `ipcacheResource`, `updateMetadata`, and `maybeRemoveMetadata` contain the calls to `IPCache` to update labels for a given `(name, IP)` pair. There are two main differences to before: Instead of upserting or removing CIDR prefixes, we now add labels. And instead of having one update per prefix, we now have one update per `(name, IP)` pair, meaning a single prefix (aka "IP") might have multiple IPCache resource owners in the `NameManager` (i.e. one for each `name` mapping to that IP). 9. `RestoreCache` and `CompleteBootstrap` contain the logic to initialize `IPCache` when upgrading from Cilium v1.15. This requires the previous Cilium instance to have checkpointed the known `ToFQDN` selectors, which are read in during upgrade and used to derive and inject the `IPCache` labels we expect to have once endpoint regeneration has finished. After endpoint regeneration, those restored labels are then removed, leaving the real labels in place. In contrast to all other `IPCache` updates (where each update to an IP is "owned" by the DNS name mapping to that IP, and we rely on `IPCache` to merge those labels), the resource owner here is static. This is because they are all added at once (in `RestoreCache`) and removed at once (in `CompleteBootstrap`), and no per-name tracking is required. 10. Various changes to unit tests. 
The old unit tests tested the interaction between `NameManager` and `SelectorCache`, whereas the new unit tests now test the interaction between `NameManager` and `IPCache`. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 10 June 2024, 16:06:10 UTC
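The per-`(name, IP)` ownership model from 719eb4f (items 7 and 8 of the `NameManager` list) can be illustrated with a toy model. This is only a sketch: `labelStore` and its methods are hypothetical and do not mirror the real `IPCache` metadata API. It assumes each DNS name acts as a separate owner of the labels it contributes to an IP, and that the effective label set is the merge across all owners.

```go
package main

import (
	"fmt"
	"sort"
)

// labelStore is a toy stand-in for per-prefix metadata:
// IP -> resource owner (DNS name) -> labels contributed by that owner.
type labelStore map[string]map[string][]string

// upsert replaces the labels that owner contributes to ip. Labels from other
// owners of the same ip are left untouched.
func (s labelStore) upsert(ip, owner string, labels ...string) {
	if s[ip] == nil {
		s[ip] = make(map[string][]string)
	}
	s[ip][owner] = labels
}

// remove drops everything owner contributed to ip.
func (s labelStore) remove(ip, owner string) {
	delete(s[ip], owner)
	if len(s[ip]) == 0 {
		delete(s, ip)
	}
}

// merged returns the sorted union of the labels from all owners of ip.
func (s labelStore) merged(ip string) []string {
	seen := make(map[string]struct{})
	for _, labels := range s[ip] {
		for _, l := range labels {
			seen[l] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for l := range seen {
		out = append(out, l)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := labelStore{}
	// Two names resolve to the same IP and match different selector sets.
	s.upsert("192.0.2.1", "a.example.com", "fqdn:a.example.com", "fqdn:*.example.com")
	s.upsert("192.0.2.1", "b.example.com", "fqdn:*.example.com")
	fmt.Println(s.merged("192.0.2.1"))

	// Dropping one name only removes what that name contributed.
	s.remove("192.0.2.1", "a.example.com")
	fmt.Println(s.merged("192.0.2.1"))
}
```

The point of the sketch is the isolation property: an update or removal for one name cannot disturb labels that another name still contributes to the same IP.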
625e39f fqdn: Derive domain labels from FQDN selectors This commit adds logic to derive identity labels for `(name, IP)` pairs from selectors. The basic idea is that any ToFQDN selector matching the qname of the DNS lookup is added as a label to each IP returned by that DNS lookup. The functions added here will be used in a subsequent commit. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 10 June 2024, 16:06:10 UTC
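A rough sketch of the derivation in 625e39f, under several assumptions. The `Selector` type, the `fqdn:` label rendering, and `deriveLabels` are hypothetical. Pattern matching here uses Go's `path.Match` as a stand-in, which is not the matching logic Cilium uses for `matchPattern`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Selector is a simplified ToFQDN selector: either an exact name or a
// wildcard pattern. Hypothetical type for this sketch.
type Selector struct {
	MatchName    string
	MatchPattern string
}

// matches reports whether the selector selects qname. path.Match is a
// swapped-in approximation of wildcard matching, not Cilium's semantics.
func (s Selector) matches(qname string) bool {
	if s.MatchName != "" {
		return strings.EqualFold(s.MatchName, qname)
	}
	ok, err := path.Match(strings.ToLower(s.MatchPattern), strings.ToLower(qname))
	return err == nil && ok
}

// label renders the selector as an identity label. The exact label format is
// an assumption made for illustration.
func (s Selector) label() string {
	if s.MatchName != "" {
		return "fqdn:" + s.MatchName
	}
	return "fqdn:" + s.MatchPattern
}

// deriveLabels attaches the label of every selector matching qname to each IP
// returned by the DNS lookup for qname.
func deriveLabels(qname string, ips []string, selectors []Selector) map[string][]string {
	var labels []string
	for _, sel := range selectors {
		if sel.matches(qname) {
			labels = append(labels, sel.label())
		}
	}
	out := make(map[string][]string, len(ips))
	for _, ip := range ips {
		out[ip] = append([]string(nil), labels...)
	}
	return out
}

func main() {
	selectors := []Selector{
		{MatchName: "docs.cilium.io"},
		{MatchPattern: "*.cilium.io"},
		{MatchName: "example.com"},
	}
	got := deriveLabels("docs.cilium.io", []string{"192.0.2.10", "192.0.2.11"}, selectors)
	for _, ip := range []string{"192.0.2.10", "192.0.2.11"} {
		fmt.Println(ip, got[ip])
	}
}
```

Every IP from the same lookup receives the same label set, since the labels depend only on which selectors match the queried name.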
dfc11ab daemon: Wait for initial IPCache revision This introduces a wait for the initial IPCache revision after K8s caches have synced. This ensures that all prefix labels are injected and available in the new IPCache before restoration starts. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 10 June 2024, 16:06:10 UTC
ed299a3 ipcache: Always add world label to identities with fqdn label A subsequent commit will change prefix labels upserted by the name manager to use `fqdn`-labels instead of `cidr`-labels. Because a CIDR identity currently always also has the world label, we want to mirror that logic for identities with an `fqdn` label, so that IPs allowed by a ToFQDN policy remain selectable by a `reserved:world` selector. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 10 June 2024, 16:06:10 UTC
4035fea labels: Simplify `IsReserved` implementation This contains no functional changes and is a drive-by cleanup. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 10 June 2024, 16:06:10 UTC
999d5f0 daemon: Also restore checkpointed FQDN identities This commit modifies the IPCache restoration to restore all local identity entries, not just CIDR identities. This is required because FQDN labels are derived from ToFQDN selectors, which are only available during endpoint regeneration. To ensure that identities of prefixes in IPCache don't change during initial regeneration, we provide the expected `fqdn` labels before regeneration. The real labels are added during regeneration, therefore the restored ones can be safely removed in `releaseRestoredIdentities`. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 10 June 2024, 16:06:10 UTC
fda5b55 clustermesh: drain all known entries upon cluster ID change Recent changes introduced improved validation to ensure that the information retrieved from remote clusters matches the advertised cluster ID, and discard it otherwise. Let's additionally fully drain all previously known entries upon cluster ID change. Indeed, although synthetic deletion events would be generated in any case upon initial listing (as the entries with the incorrect cluster ID would not pass validation), that would leave a window of time in which there would still be stale entries for a cluster ID that has already been released, potentially leading to inconsistencies if the same ID is acquired again in the meanwhile by a different cluster. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 10 June 2024, 16:06:02 UTC