https://github.com/cilium/cilium

Revision Author Date Message Commit Date
479c7a4 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 07 July 2024, 04:29:56 UTC
42b26f5 chore(deps): update docker.io/library/golang:1.22.5 docker digest to fcae9e0 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 07 July 2024, 04:15:32 UTC
ebbb0cf images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 04 July 2024, 07:38:26 UTC
50a09d4 chore(deps): update go to v1.22.5 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 04 July 2024, 07:38:26 UTC
be000ef renovate: remove concurrency group from renovate's Base Image Release Build [ oss commit 1a33ff0d12653cabc27b95066bba64bf33121e44 ] The "Base Image Release Build - Renovate" workflow doesn't need a concurrency group, as it will use the concurrency group of the workflow that it calls, "./.github/workflows/build-images-base.yaml". Using concurrency groups on both workflows results in the following error: Canceling since a deadlock for concurrency group 'Base Image Release Build - Renovate-refs/heads/renovate/main-all-dependencies' was detected between 'top level workflow' and 'build-base-images-from-renovate' Fixes: f054f94b24b9 (".github: add workflow for renovate to build base images") Signed-off-by: André Martins <andre@cilium.io> 03 July 2024, 12:23:28 UTC
4116562 renovate: add all dependencies of Makefile.values [ oss commit 99846fd67db870f4d6ff2ae0e9f73df43e2a4e7b ] Now we can let renovate update the dependencies of all images from Makefile.values. Signed-off-by: André Martins <andre@cilium.io> 03 July 2024, 12:23:28 UTC
5030488 chore(deps): update kindest/node docker tag to v1.30.2 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 02 July 2024, 09:20:07 UTC
5973d91 chore(deps): update quay.io/lvh-images/kind docker tag to bpf-20240628.013131 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 02 July 2024, 09:16:09 UTC
5020775 chore(deps): update all github action dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 02 July 2024, 09:16:06 UTC
e7f47e6 install: Update image digests for v1.16.0-rc.1 Generated from https://github.com/cilium/cilium/actions/runs/9718847068 ## Docker Manifests ### cilium `quay.io/cilium/cilium:v1.16.0-rc.1@sha256:0729d9eff50c2c6b798c073c6ecac15c880095c989bf4312b43da7be90bb44f2` ### clustermesh-apiserver `quay.io/cilium/clustermesh-apiserver:v1.16.0-rc.1@sha256:59ddda649562bbf369dc6584f4bf8a699e80b9db3db8f93010df8ccf11ea5eb6` ### docker-plugin `quay.io/cilium/docker-plugin:v1.16.0-rc.1@sha256:93b95ca13e00b3178ae2efa063bb44cbb1fc3030c84277fbaea8f0415bc6a8bf` ### hubble-relay `quay.io/cilium/hubble-relay:v1.16.0-rc.1@sha256:8c941e9c9cb94d23874b988adb9794a497e6d35f9893ef741e37838add909413` ### operator-alibabacloud `quay.io/cilium/operator-alibabacloud:v1.16.0-rc.1@sha256:488cf234f6730b989162e2cb2de4b479ff312d0392ec6a4bb57d697606e36a3a` ### operator-aws `quay.io/cilium/operator-aws:v1.16.0-rc.1@sha256:798917d351dc2ec53e9b71be6d3397c10a0d2d12135ac6a6e9d999862107d432` ### operator-azure `quay.io/cilium/operator-azure:v1.16.0-rc.1@sha256:0f8b0ebe8e5dc9908418602be49dfb40e5f938ed99fe1d3ddc1fec066fb42e37` ### operator-generic `quay.io/cilium/operator-generic:v1.16.0-rc.1@sha256:300d55216909d163060aae17de6305084c8208871d25f8e5962e643f6b58e216` ### operator `quay.io/cilium/operator:v1.16.0-rc.1@sha256:52adead4d4440bc85e66b32fe2ed4336cdb6b89cf4c7b2658f394e00705c2e92` Signed-off-by: Joe Stringer <joe@cilium.io> 28 June 2024, 22:23:43 UTC
ab10ea7 Prepare for release v1.16.0-rc.1 Signed-off-by: Joe Stringer <joe@cilium.io> 28 June 2024, 21:43:32 UTC
f5bdf28 Prepare v1.16 stable branch Signed-off-by: André Martins <andre@cilium.io> 28 June 2024, 20:13:38 UTC
5614531 contrib,tool: exclude slice cleanup The config_replacement.h file was removed, so it is no longer needed in the exclude slice. Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com> 28 June 2024, 12:45:05 UTC
0666847 clustermesh: grant read permissions to the cilium/.heartbeat prefix The blamed commits introduced the configuration of more granular permissions to access the different data stored in etcd. Among others, it granted access to the cilium/.heartbeat key, which is used to check the connection healthiness. However, this setup turned out to be too restrictive, as we actually watch the entire corresponding prefix on the client side, triggering warnings like: msg="Unable to list keys before starting watcher" error="etcdserver: permission denied" prefix=cilium/.heartbeat Let's address this by relaxing the configuration and granting access to the entire cilium/.heartbeat prefix. Although we could also change the client to only retrieve and watch the specific key, this would still pose backward compatibility issues, as old clients would continue attempting to access the entire prefix. Fixes: cb6a58bef00b ("clustermesh: granular etcd permissions for kvstoremesh cached data") Fixes: aa10df3a4c6a ("kvstore: correctly assign permissions to single key, rather than prefix") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 28 June 2024, 12:44:55 UTC
0d27ba2 Makefile: suppress error in comment line. Prior to this change, integration tests on Ubuntu Minimal 20.04 LTS fail with an error on this comment line. Adding `-@` before the comment allows integration tests to pass. Signed-off-by: Paulo Castello da Costa <pcastello@google.com> 28 June 2024, 10:29:20 UTC
486515e bgpv2: Cleanup defaulting of peer config Use default peer config only when PeerConfigRef is not specified. If it is specified but not existing, skip the peer configuration instead of using the default values. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> 28 June 2024, 09:43:16 UTC
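The defaulting rule in the commit above (use defaults only when no reference is given, skip the peer when the reference points at nothing) can be sketched in Go. This is a minimal illustration, not Cilium's code: the `peerConfig` type, the `resolvePeerConfig` helper, and the default value of 90 are all hypothetical.

```go
package main

import "fmt"

// peerConfig is a hypothetical, heavily reduced peer configuration.
type peerConfig struct {
	HoldTimeSeconds int
}

// defaultPeerConfig holds illustrative default values (assumed, not Cilium's).
var defaultPeerConfig = peerConfig{HoldTimeSeconds: 90}

// resolvePeerConfig returns the configuration to use for a peer and whether
// the peer should be configured at all.
//   - ref == nil: no reference was specified, so the defaults apply.
//   - ref set but missing from the store: the peer is skipped (ok == false).
func resolvePeerConfig(ref *string, store map[string]peerConfig) (cfg peerConfig, ok bool) {
	if ref == nil {
		return defaultPeerConfig, true
	}
	cfg, ok = store[*ref]
	return cfg, ok
}

func main() {
	store := map[string]peerConfig{"fast": {HoldTimeSeconds: 9}}
	missing := "does-not-exist"

	fmt.Println(resolvePeerConfig(nil, store))      // defaults are used
	fmt.Println(resolvePeerConfig(&missing, store)) // peer is skipped
}
```

Returning a boolean rather than silently falling back keeps a mistyped reference from being masked by default values.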
7918efb gateway-api: Un-set externalTrafficPolicy on LB service for host network Support has recently been added for setting an `externalTrafficPolicy` on the service generated for a Gateway. Unfortunately, this broke support for the case when `hostNetworkEnabled` is enabled, as only _externally-facing_ services (which services of type `ClusterIP` are _not_) may have an `externalTrafficPolicy`. This commit addresses that by explicitly removing the `externalTrafficPolicy` when `hostNetworkEnabled == true`. Fixes: 95886de65f7 Signed-off-by: Stefan Zwanenburg <stefan@zwanenburg.info> 28 June 2024, 07:20:32 UTC
0a5d2f7 policy: wrong cidrGroupRefs select nothing not all When the current code translates a CNP with _only_ a dangling cidrGroupRef, it fails to distinguish between nil (meaning an unset selector, selecting everything) and the empty list (selecting nothing). This leads to bizarre situations, such as the following. With "dangling" pointing to a non-existent cidrGroup, the CNP below allowed egress traffic to port 80 instead of denying it. egress: - toCIDRSet: [{cidrGroupRef: dangling}] toPorts: ( ... 80 ... ) To fix this, transform the nil to an empty list in an overly explicit manner to make clear why this is necessary. It's somewhat subtle as in Go, most of the time, nil and empty list are equivalent, hence be explicit to avoid an easy refactor breaking this again. In addition, of course, adapt the tests so that they would catch it too. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 28 June 2024, 07:17:47 UTC
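The nil-versus-empty distinction described in the commit above is easy to lose in Go, because both slices have length zero. The sketch below illustrates the idea only; `selects` and `resolve` are hypothetical helpers and do not reflect Cilium's policy code.

```go
package main

import "fmt"

// selects is a hypothetical matcher: a nil selector list means "unset"
// (select everything), while an empty non-nil list means "select nothing".
func selects(cidrs []string, peer string) bool {
	if cidrs == nil {
		return true // unset selector matches every peer
	}
	for _, c := range cidrs {
		if c == peer {
			return true
		}
	}
	return false // empty or non-matching list matches nothing
}

// resolve drops dangling references. It always returns a non-nil slice so
// that "every reference was dangling" is never mistaken for "no selector".
func resolve(refs []string, known map[string]string) []string {
	out := []string{} // explicitly empty, deliberately not nil
	for _, r := range refs {
		if cidr, ok := known[r]; ok {
			out = append(out, cidr)
		}
	}
	return out
}

func main() {
	var unset []string
	resolved := resolve([]string{"dangling"}, map[string]string{})

	fmt.Println(len(unset), len(resolved))     // both lengths are zero
	fmt.Println(unset == nil, resolved == nil) // only one of them is nil
	fmt.Println(selects(unset, "10.0.0.1"), selects(resolved, "10.0.0.1"))
}
```

Had `resolve` started from `var out []string`, a rule containing only dangling references would have been treated as matching every peer.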
10da51d policy: add test for non-existent cidrgroupref Dangling CIDR group references should not be part of the result set, as the code currently correctly handles, but the tests don't specify. Add a test for this condition. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 28 June 2024, 07:17:47 UTC
ea39384 install/kubernetes: update nodeinit image to latest version Renovate does not pick up new version as tag is in sha format rather than regular semver. Related: #32181 Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 28 June 2024, 07:17:18 UTC
2f56ef2 docs: Add note about WG and MTU with CNI chaining Mention cni.enableRouteMTUForCNIChaining introduced by https://github.com/cilium/cilium/pull/33190. Signed-off-by: Martynas Pumputis <m@lambda.lt> 28 June 2024, 05:50:24 UTC
5c2b51b Requeue CEC handling on service details lookup failure This commit ensures that if, during the handling of a CEC or CCEC, the manager cannot lookup the details of a Service, the reconcile of that object will fail, and Hive will requeue the update. Also changes the logs here to be Info rather than Error and makes clear that this will be retried. Also changes Dedicated Ingress resource creation so that the Loadbalancer Service is created _before_ the CiliumEnvoyConfig, which should reduce the incidence of Service lookup failures during CEC or CCEC processing. Signed-off-by: Nick Young <nick@isovalent.com> 28 June 2024, 04:50:38 UTC
a2a9a72 Handle nil service retrieved from Resource store Signed-off-by: Nick Young <nick@isovalent.com> 28 June 2024, 04:50:38 UTC
97e883d Fix CiliumEnvoyConfig Nodeport handling Adds additional service redirect handling for Services with Nodeports set, which will automatically include the Nodeport in the set of redirected ports if ports to redirect are specified. Also removes a hack in the Dedicated Ingress code that was introduced to solve this problem previously. Signed-off-by: Nick Young <nick@isovalent.com> 28 June 2024, 04:50:38 UTC
ca01c05 egressgw: Validate endpoint identity before fetching labels Validate endpoint identity metadata before dereferencing it to fetch identity labels. In case of missing identity, the update is rejected. Fixes: #33268 Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 28 June 2024, 00:18:32 UTC
28f7309 operator: include CRD categories when applying cilium CRDs Currently, when the Cilium Operator applies the Cilium CRDs to the cluster, the categories of the CRD aren't configured. This prevents listing all resources by category. This commit fixes this by including the categories when applying the CRDs. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 28 June 2024, 00:16:51 UTC
c2edc7c docs: Add node-local dns troubleshooting steps Signed-off-by: Aditi Ghag <aditi@cilium.io> 28 June 2024, 00:15:56 UTC
69e5ede codeowners: Update teams for local-redirect-policy docs Signed-off-by: Aditi Ghag <aditi@cilium.io> 28 June 2024, 00:15:56 UTC
c9ccf08 Revert "IPAM: Adds IPv6 Prefix Delegation Config Option" This reverts commit 2fc54dd6dd9edb507cb3111e175071000189b81c. Signed-off-by: Chris Tarazi <chris@isovalent.com> 27 June 2024, 18:29:17 UTC
aeb5f5d bpf/tests: Add test to check L3 -> tunnel redirect To test the fix in the previous commit. Ideally, we would want to test against running L3 netdev and vxlan netdev with Cilium's BPF progs loaded. Unfortunately, we don't have an infrastructure for such a test. Thus, only check that the L2 hdr is appended before the redirect. Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 17:57:06 UTC
d4d3aa0 datapath: Fix redirect from L3 netdev to tunnel Previously, the redirect from L3 netdev (in nodeport.h) to Cilium's tunnel device failed due to the bpf_redirect returning -ERANGE [1]. Further investigation revealed the following values for the troublesome skb: skb->len = 60 skb->data = 0x000000009566a77e skb->head = 0x000000001a70ff88 skb->network_header = 64 "skb->data" was set to a weird location when compared to "skb->head", which caused the check to fail. Adding the L2 hdr before redirecting to the tunnel device resolved the issue. P.S., I wanted to use "if (__ctx_is == __ctx_skb)" instead of the #if. Unfortunately, then the bpf_xdp.c fails to compile with: error: use of undeclared identifier 'ENCAP_IFINDEX' [1]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/net/core/filter.c?h=v6.7.4#n2147 Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 17:57:06 UTC
5024cb8 operator/identitygc: Disable identitygc when Operator manages CID EnableOperatorManageCIDs enables operator to manage CID by running a CID controller. If enabled, Identity GC cell is then disabled because CID controller takes care of garbage collection. Signed-off-by: Ovidiu Tirla <otirla@google.com> 27 June 2024, 15:44:46 UTC
63e4767 helm: ensure that envoy daemonset is installed only when needed As `.Values.ingressController.enabled` and `.Values.gatewayAPI.enabled` require `.Values.l7Proxy` to be set it's sufficient to just perform a check on that value. Signed-off-by: Filip Nikolic <oss.filipn@gmail.com> 27 June 2024, 15:18:16 UTC
e49c7b3 bpf/tests: Add BPF_TEST_FILE to run a single test Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 14:04:37 UTC
af4ebb7 gh/workflows: Set enableRouteMTUForCNIChaining in ci-awscni This is one of very few CI workflows that use CNI chaining. Enable the new option to get some test coverage. Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 07:39:02 UTC
d10c92c gh: Split actions/aws into two This commit splits actions/aws into actions/aws-cni and actions/eks. This is needed for a subsequent commit in which conformance-aws-cni.yaml will enable WireGuard in one of its matrix configurations. Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 07:39:02 UTC
512b708 helm: Add cni.enableRouteMTUForCNIChaining Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 07:39:02 UTC
0cee8ad daemon,cni,mtu: Add EnableRouteMTUForCNIChaining flag The new flag ("--enable-route-mtu-for-cni-chaining") enables the route MTU for pod netns when CNI chaining is used. The feature was introduced by [1]. Initially, the feature could be enabled only via a CNI config. However, modifying Cilium's CNI config in the case of CNI chaining has proven too difficult. Therefore, this commit exposes the feature via the cilium-agent flag. [1]: https://github.com/cilium/cilium/pull/26495/ Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 07:39:02 UTC
4a38c68 api: Add enableRouteMTUForCNIChaining The daemon option is going to be used to enable [1] via a cilium-agent flag. [1]: https://github.com/cilium/cilium/pull/26495/ Signed-off-by: Martynas Pumputis <m@lambda.lt> 27 June 2024, 07:39:02 UTC
2de95c0 bpf: rename UINT8_MAX to UINT16_MAX and fix cluster_id casts The UINT8_MAX macro is defined as the 16-bit hex value 0xFFFF. Since cluster_id is now a uint16 to allow for extended Cluster Mesh, this renames the macro to UINT16_MAX, rather than changing the value. Lingering casts of cluster_id as a uint8 have also been corrected to ensure its usage as uint16. Fixes: e8752ec125fc ("bpf: Define UINT8_MAX") Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 27 June 2024, 02:03:12 UTC
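Why a lingering 8-bit cast on a 16-bit cluster ID matters can be shown with a few lines of Go. The commit itself concerns C code in the BPF datapath; this is only an analogous sketch, and the `narrow` helper and the sample ID 300 are made up for illustration.

```go
package main

import "fmt"

// narrow mimics a stale uint8 cast applied to a 16-bit cluster ID.
// The conversion keeps only the low 8 bits of the value.
func narrow(clusterID uint16) uint8 {
	return uint8(clusterID)
}

func main() {
	var clusterID uint16 = 300 // fits in 16 bits, but not in 8
	fmt.Println(clusterID, narrow(clusterID))
}
```

Any ID above 255 silently aliases a smaller ID once it passes through such a cast, which is why the casts had to be widened together with the type.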
5c89cca ci: Extend K8s FQDN test to assert numeric identities after restoration This extends the "Restart Cilium validate that FQDN is still working" Ginkgo test to check that numeric identities do not change after a Cilium agent restart. It does this by fetching the numeric identity of the looked up IP from IPCache and compares it to the numeric identity selected by the ToFQDN selector. In addition, the test also checks that these identities are stable across the restart. This should imply that connections during restoration should also not observe any drops. This is added to Ginkgo rather than Cilium CLI as these kind of state assertions across Cilium agent restarts would require changes to the `conn-disrupt-test-setup`, which in turn also requires us to set up additional client and server pods for checking ToFQDN connectivity, which the current upgrade tests do not facilitate. This might be added in the future as follow-up work, but to have basic coverage of ToFQDN restart persistence with K8s policies this commit provides an easy intermediate solution. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 26 June 2024, 19:25:58 UTC
a0c0a1f chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 26 June 2024, 19:12:30 UTC
ab10324 cilium, docs: Add note about netkit in upgrade guide To provide a pointer to the performance guide but also to add a note about in-place upgrades. Closed: #33291 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 26 June 2024, 15:11:48 UTC
d2425c8 cilium, docs: Add note about upgrades in performance doc Add information about how to deal with the settings in context of existing clusters. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 26 June 2024, 15:11:48 UTC
631ff6f Documentation: Add troubleshooting section to L2 Announcements Since its introduction, there has been a small yet steady stream of similar issues reported by users who are trying to use L2. This patch adds a troubleshooting section to the L2 Announcements which should help users to diagnose and resolve common issues on their own. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 26 June 2024, 14:26:28 UTC
601123c envoy: update envoy 1.29.x to v1.29.6 (main) Relates: https://github.com/cilium/proxy/pull/816 Signed-off-by: Tam Mach <tam.mach@cilium.io> 26 June 2024, 13:57:26 UTC
53060c6 chore(deps): update all github action dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 26 June 2024, 13:38:32 UTC
bb211e7 docs: BGPv2 CP documentation update This PR updates Cilium's documentation for the BGPv2 Control Plane. In specific, it adds a caution note for the transport section. Signed-off-by: David Swafford <dswafford@coreweave.com> 26 June 2024, 13:24:20 UTC
3a669e9 docs: Improve note on kube-apiserver entity limitations Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 26 June 2024, 13:22:03 UTC
471f19a envoy: Enable DaemonSet only for new installation This commit is to _only_ enable Envoy DS for new installation. The existing cilium installation can opt in by setting upgradeCompatibility value as per recommended upgrade guide. One user-defined function (e.g envoyDeamonSetEnabled) is added to avoid code duplication. Another point worth noting is the usage of eq function to enforce boolean data type for inline variable Relates: #30034, #33261 Signed-off-by: Tam Mach <tam.mach@cilium.io> 26 June 2024, 09:21:16 UTC
43642e5 pkg/k8s: Index Pod resources by namespace The Pod resource will be used by Operator Managing CIDs to reconcile all the pods in a namespace when the namespace labels are added or removed. Related #27752 Signed-off-by: Ovidiu Tirla <otirla@google.com> 25 June 2024, 21:18:38 UTC
037480a operator/k8s: Add Namespace resource to operator resources cell The Namespace resource will be used by Operator Managing CIDs to fetch relevant labels to create CIDs. Related #27752 Signed-off-by: Ovidiu Tirla <otirla@google.com> 25 June 2024, 21:18:38 UTC
d0a299b pkg/k8s: Add indexer for CiliumIdentity objects based on security labels Add indexer for CiliumIdentity objects based on security labels to enable efficient lookup of existing CIDs for reuse during allocation when Cilium Operator manages Cilium Identities. Related #27752 Signed-off-by: Ovidiu Tirla <otirla@google.com> 25 June 2024, 21:18:38 UTC
fdf435a fix(deps): update module github.com/hashicorp/go-hclog to v1.6.3 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 25 June 2024, 18:54:35 UTC
0c9b710 .github: update kindest to 1.30.0 Signed-off-by: André Martins <andre@cilium.io> 25 June 2024, 15:11:16 UTC
1e4d3ae hive: Fixed copy-paste error in reconciler.Metrics implementation The code checked if `IncrementalReconciliationTotalErrors` was enabled but then modified `IncrementalReconciliationCurrentErrors`. Seems like a copy-paste error from the case above, fixed it. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 25 June 2024, 10:57:36 UTC
b3ed575 use CiliumEndpoint watcher to remove pod metrics across all nodes Fixes #31889 by using the CiliumEndpoints watcher to drive metrics deletion instead of the Pod watcher. Metrics deletion was handled by the pod watcher: when a pod was deleted, so too were all the metrics associated with it. However, if a cilium agent is configured with CiliumEndpointCRD enabled, it will only watch for pods on the current node. This leads to a leak where metrics would never be reaped on nodes where the pod was not running. Every node watches all the CiliumEndpoints, and the endpoint resources contain enough information to trigger the deletion of the metrics. This change moves the deletion to the CiliumEndpoint watcher and uses the endpoint name and namespace to delete the metric when the endpoint is deleted. Signed-off-by: Steve Gargan <steve.gargan@gmail.com> 25 June 2024, 10:43:37 UTC
638be8f docs: Mention new `toFQDNs` implementation in upgrade notes This adds the upgrade notes for the new ``toFQDNs`` implementation. It mentions upgrade impact and the new metrics added to troubleshoot it. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 25 June 2024, 10:22:58 UTC
c1c3164 docs: Add section on `toFQDNs` metrics This adds a section in the `toFQDNs` troubleshooting guide on how the identity usage can be monitored. It makes use of the metrics added in previous commits. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 25 June 2024, 10:22:58 UTC
af5c1e8 docs: Update troubleshooting guides for ToFQDN This commit updates our docs to use the new `fqdn` identities introduced by commit 719eb4f8cb0831db11184ed7d6667268fbb83720 - rather than the previously used `cidr` identities. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 25 June 2024, 10:22:58 UTC
9518639 metrics: Add `fqdn_selectors` metric This adds a new simple metric which counts the number of registered `ToFQDN` selectors. This, in combination with the previously added `identity_label_sources` metric, allows users to monitor how many `fqdn` identities are allocated compared to how many `ToFQDN` selectors are registered. If there are orders of magnitude more identities than selectors, then this indicates that selectors are overlapping in different combinations, which can cause the local identity space to exhaust quickly. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 25 June 2024, 10:22:58 UTC
64cd9f8 metrics: Add `identity_label_sources` metric This adds a new metric which counts the number of identities per label source. This gives users a more precise breakdown of what types of identities are allocated than the existing `identities` metrics. For example, the new metric allows users to track precisely how many identities contain a `fqdn` or `cidr` label, whereas the per-type metric puts them in the same bucket. There are only about a dozen different label sources, so cardinality of the metric should be low. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 25 June 2024, 10:22:58 UTC
aca59cd policy: take SelectorCache read lock when applying incremental changes We need to take a lock on the SelectorCache for a subtle reason: deny insertion needs the ability to convert a numerical ID to a CIDR, and the SelectorCache provides this functionality. Accordingly, we must lock the SelectorCache while applying changes. Most trees that lead to this take a RLock() of the SelectorCache. This call was an oversight. It does not modify the SelectorCache and thus may safely take an RLock as well. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 25 June 2024, 08:53:26 UTC
ac28fb9 clustermesh: synchronously close backend on reconnection The historical comment mentioned that the resources needed to be released asynchronously as they may time out in case of errors. However, all watchers have been refactored over time, and are now released synchronously. The only remaining bit regards closing the etcd client, which is not blocking. Hence, let's close it synchronously as well (as we already do in case of errors), for consistency and to simplify catching possible issues in tests. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 25 June 2024, 08:53:11 UTC
f30d83a kvstore: drop context from the Close() signature The context parameter is never used, and close is not expected to be a blocking operation, hence let's just clean it up. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 25 June 2024, 08:53:11 UTC
5a64653 endpoint: skip unnecessary syncPolicyMaps on first regeneration Endpoints on first regeneration are in a funny state. They reference an old IPCache (which is recreated on agent restart), but their policy maps are not swapped. This means they may be in a state where IPs map to old identities, but those identities are no longer in the new ipcache. This is not 100% fixable as long as we do not recreate the BPF PolicyMap, but we can significantly reduce the time we are in this skewed state, by eliding the first syncPolicyMaps() call. We call syncPolicyMaps twice. Once before compilation, and once after. The first call is to make it safe to clean up stale Envoy resources. However, the first-ever call to syncPolicyMaps will have no stale redirects to clean up regardless. Even worse, it all-but guarantees we are in this skewed state for a particularly long time, as we will certainly be recompiling the endpoint's BPF code. Fortunately, the fix is easy: we can skip the first syncPolicyMaps, and rely on the second, which occurs very quickly after compilation finishes. This minimizes the potential for spurious policy drops. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 25 June 2024, 08:25:13 UTC
56494b1 build-images-base: push to branch if pull request ref doesn't exist With the introduction of workflow_call by f054f94b24b9, pushing changes to the branch was not possible when the event was type "workflow_call" as the github.event.pull_request.head.ref does not exist. Fixes: f054f94b24b9 (".github: add workflow for renovate to build base images") Signed-off-by: André Martins <andre@cilium.io> 25 June 2024, 07:34:20 UTC
e13fa8b helm: drop IDENTITY_ALLOCATION_MODE env var from clustermesh-apiserver The associated flag is no longer configurable since 26d08a888d8a ("clustermesh-apiserver: extract external workloads in a separate cell"), because the clustermesh-apiserver cannot be enabled in combination with kvstore identity allocation mode. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 25 June 2024, 04:58:06 UTC
036ecab policy: Enable Port Ranges - Remove EndPort warning - Add CRD/API support for Endport. - Modify the prefix logic for bpf map insertion to account for port ranges. - Update all policymap calls that need to add port mask to the argument. - Update the k8s network policy tests with a range look up test. - Pass Endport to Envoy. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 24 June 2024, 21:27:54 UTC
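One common way to fit a port range into a prefix-matched map is to split the range into aligned power-of-two blocks, each expressed as a port plus a number of significant leading bits. The Go sketch below shows that general technique under stated assumptions; `portPrefix` and `rangeToPrefixes` are hypothetical names, and the source does not confirm that Cilium's implementation works this way.

```go
package main

import (
	"fmt"
	"math/bits"
)

// portPrefix is a hypothetical type: Port with its top Bits bits significant.
// Bits == 16 matches exactly one port, Bits == 0 matches every port.
type portPrefix struct {
	Port uint16
	Bits int
}

// rangeToPrefixes splits the inclusive range [start, end] into the minimal
// list of aligned power-of-two blocks. It returns nil when start > end.
func rangeToPrefixes(start, end uint16) []portPrefix {
	var out []portPrefix
	s, e := uint32(start), uint32(end) // widen to avoid uint16 overflow
	for s <= e {
		// Grow the block while it stays aligned at s and inside the range.
		size := uint32(1)
		for size < 1<<16 && s&(size*2-1) == 0 && s+size*2-1 <= e {
			size *= 2
		}
		hostBits := bits.Len32(size) - 1 // log2 of the block size
		out = append(out, portPrefix{Port: uint16(s), Bits: 16 - hostBits})
		s += size
	}
	return out
}

func main() {
	// Ports 80 through 90 split into blocks covering 80-87, 88-89 and 90.
	fmt.Println(rangeToPrefixes(80, 90))
}
```

The decomposition needs at most 30 entries for any 16-bit range, so a range never expands into one map entry per port.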
d63157d policy: Refactor L4PolicyMap to Index Port Ranges The L4PolicyMap needs to index L4Filters by port and protocol. With the introduction of port ranges the golang builtin map type no longer suffices to index the L4Filters. This commit refactors the type to take a mix of named ports (which can never have an EndPort) as well as simple port numbers, and ranges. All usage of the type is rewritten to account for the changes. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 24 June 2024, 21:27:54 UTC
bc13dbb Revert "CI: bump default FQDN datapath timeout from 100 to 250ms" This reverts commit 34caeb233b0d47a0e4f9553fbd7ac532e0c1a5f8. Thanks to recent improvements on `main`, P99 latency for Cilium DNS in the fqdn-perf workflow has dropped from ~100ms to ~15ms. This should allow us to fall back to the default FQDN timeout of 100ms. Note that in contrast to the above commit, this commit should not be backported to older branches, as the performance fixes are not backported either. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 24 June 2024, 19:31:53 UTC
c43b5b4 fix(deps): update all go dependencies main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 19:28:23 UTC
24f6630 correct spelling mistake in bgp docs Signed-off-by: Dean <22192242+saintdle@users.noreply.github.com> 24 June 2024, 16:52:07 UTC
a31dcbd cni: Revert "cni: Use correct route MTU for various cloud cidrs" The PR #32244, that was merged with commit 29a340e, was intended to fix IP fragmentation with WireGuard deployments, causing poor network throughput and increased network latency. Unfortunately, after this PR was merged, users began reporting issues with Cilium modifying the MTU of the default interface of the node. This commit reverts the blamed commit in an attempt to fix said issues. The surfaced side-effect is tracked in issue #33303. Fixes: #33258 Signed-off-by: Ryan Drew <ryan.drew@isovalent.com> 24 June 2024, 16:12:03 UTC
50f29b3 policy: Replace panics with error logs with stacktrace Production code should not panic on a subsystem error. Log an error instead and include a stacktrace in the error log message. github.com/hashicorp/go-hclog was already vendored, so this is not adding a new dependency to the project. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 24 June 2024, 15:38:43 UTC
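The pattern in the commit above, logging an error with a stack trace where code previously panicked, can be sketched with the Go standard library alone. The commit uses github.com/hashicorp/go-hclog; this sketch swaps in `log` and `runtime/debug` instead, and `checkState` and `errInvalidState` are hypothetical names.

```go
package main

import (
	"errors"
	"log"
	"runtime/debug"
)

// errInvalidState is a hypothetical sentinel error for this sketch.
var errInvalidState = errors.New("invalid policy state")

// checkState reports an inconsistent state without crashing the process.
// Where older code might have called panic(), it logs the error together
// with the current goroutine's stack trace and returns the error.
func checkState(consistent bool) error {
	if consistent {
		return nil
	}
	log.Printf("error: %v\n%s", errInvalidState, debug.Stack())
	return errInvalidState
}

func main() {
	if err := checkState(false); err != nil {
		log.Println("continuing after:", err)
	}
}
```

The stack trace preserves the debugging value of a panic while leaving the caller free to degrade gracefully.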
ba10b99 bpf,tests: Add IPv4 checksum validation The PR adds eBPF tests for IPv4 checksum calculation and validation. Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com> 24 June 2024, 15:10:57 UTC
748176f chore(deps): update dependency renovatebot/renovate to v37.415.0 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 14:31:28 UTC
2ce0d2e chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 14:31:17 UTC
5afc247 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> Signed-off-by: André Martins <andre@cilium.io> 24 June 2024, 14:26:46 UTC
7a7200d chore(deps): update docker.io/library/golang:1.22.4 docker digest to a66eda6 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 14:26:46 UTC
5143566 fix(deps): update all go dependencies main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 14:23:46 UTC
6496430 hubble: deflake TestLocalObserverServer_NodeLabels Before this patch, TestLocalObserverServer_NodeLabels was flaky because it didn't wait on the LocalNodeWatcher to have been updated at least once before using it. This commit makes NewLocalNodeWatcher retrieve the local node info so that the returned LocalNodeWatcher is usable right away. This means that the Hubble startup will block until the LocalNodeStore is ready with local node info, which in practice should not be an issue. Reported-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Alexandre Perrin <alex@isovalent.com> 24 June 2024, 13:54:30 UTC
de87391 build-images-base: cancel github runs based on branch name With the introduction of workflow_call by f054f94b24b9, the concurrency group started to cancel jobs based on the workflow name alone, which caused workflow runs created by this workflow to be canceled even if they were opened from different branches. Fixes: f054f94b24b9 (".github: add workflow for renovate to build base images") Signed-off-by: André Martins <andre@cilium.io> 24 June 2024, 13:05:53 UTC
b8464fd ctmap: Stop GC handler if signal map is closed When Cilium receives a SIGTERM, the signal manager (`pkg/signal`) cell stop hook is invoked, which in turn informs all signal handlers that the manager has been closed (by passing in a `nil` reader). This causes handlers such as `ChannelHandler` to close their respective channels too. This means that the CT GC (which uses a `ChannelHandler`) needs to deal with the case where its channel is closed during cilium-agent shutdown. This commit addresses that by stopping the GC go routine if the signals channel is closed. Previously, the GC go routine interpreted the closed channel as a signal to run GC and started doing back-to-back GC in a busy loop, resulting in wasted CPU cycles during shutdown. This was also visible in the following log lines being repeated many times during shutdown: ``` level=info msg="Starting initial GC of connection tracking" subsys=ct-nat-map-gc level=debug msg="Registered BPF map" path=/sys/fs/bpf/tc/globals/cilium_ct4_global subsys=bpf level=debug msg="Registered BPF map" path=/sys/fs/bpf/tc/globals/cilium_ct_any4_global subsys=bpf level=debug msg="Unregistered BPF map" path=/sys/fs/bpf/tc/globals/cilium_ct_any4_global subsys=bpf level=debug msg="Unregistered BPF map" path=/sys/fs/bpf/tc/globals/cilium_ct4_global subsys=bpf ``` With this commit applied, the GC go routine exits properly, emitting a log line while doing so. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 24 June 2024, 10:08:26 UTC
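The busy loop described in the commit above follows from a Go language rule: a receive on a closed channel never blocks and yields the zero value. A loop that treats every receive as a wake-up therefore spins forever unless it checks the second return value. The sketch below illustrates the fix in isolation; `runGC` is a hypothetical stand-in, not the ctmap code.

```go
package main

import "fmt"

// runGC is a hypothetical GC loop driven by a wake-up channel. It returns
// the number of GC passes it ran before the channel was closed.
func runGC(signals <-chan struct{}) int {
	passes := 0
	for {
		_, ok := <-signals
		if !ok {
			// The channel was closed: receives now return immediately with
			// ok == false. Without this check the loop would keep "running
			// GC" back to back and burn CPU during shutdown.
			fmt.Println("signal channel closed, stopping GC")
			return passes
		}
		passes++ // placeholder for one real GC pass
	}
}

func main() {
	signals := make(chan struct{}, 2)
	signals <- struct{}{}
	signals <- struct{}{}
	close(signals) // simulates the signal manager shutting down

	fmt.Println("GC passes:", runGC(signals))
}
```

Values buffered before the close are still delivered, so pending wake-ups are handled before the loop exits.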
1ac7ffd chore(deps): update all github action dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 10:08:07 UTC
c969368 fix(deps): update kubernetes packages to v0.30.2 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 10:07:51 UTC
22fbf3c fix(deps): update aws-sdk-go-v2 monorepo Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 24 June 2024, 09:38:12 UTC
7f98d35 bpf: ensure test objects are compiled before tests are run The run target doesn't have a prerequisite on the .o files it ends up testing, so an invocation like make -j2 run has an undefined order. The go test may be invoked before all objects have been compiled. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 24 June 2024, 09:02:33 UTC
0477a09 daemon/ipam: don't swallow parse error of CIDR It's possible that an invalid CIDR is passed to coalesceCIDRs, so we should not swallow the error of parsing it. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 24 June 2024, 04:18:40 UTC
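As an illustration of the point made in 0477a09, the sketch below returns the parse error to the caller instead of discarding it. It does not reproduce `coalesceCIDRs`; `parseCIDRs` is a hypothetical helper, and the use of `net/netip` is an assumption made for this example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// parseCIDRs is a hypothetical helper: it parses every input string and
// returns the first parse error with context instead of skipping bad entries.
func parseCIDRs(cidrs []string) ([]netip.Prefix, error) {
	out := make([]netip.Prefix, 0, len(cidrs))
	for _, s := range cidrs {
		p, err := netip.ParsePrefix(s)
		if err != nil {
			// Wrap the error so the caller can see which input was invalid.
			return nil, fmt.Errorf("invalid CIDR %q: %w", s, err)
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	if _, err := parseCIDRs([]string{"10.0.0.0/8", "not-a-cidr"}); err != nil {
		fmt.Println("error:", err)
	}
}
```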
c3836a5 chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 22 June 2024, 16:24:33 UTC
78f3457 chore(deps): update all-dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 22 June 2024, 08:36:47 UTC
114935b .github: fix cloud workflows for renovate Ensure consistency by sanitizing the 'OWNER' field in these workflows. This matches the approach used in other workflows. Fixes: 6f461ea592ca ("run CI automatically for renovate") Signed-off-by: André Martins <andre@cilium.io> 21 June 2024, 22:09:33 UTC
4e63fac operator: Remove deprecated CES sync errors metric Deprecated in v1.14 so we can remove it in v1.16. Fixes: https://github.com/cilium/cilium/issues/23747 Signed-off-by: Chris Tarazi <chris@isovalent.com> 21 June 2024, 21:21:27 UTC
c929bef k8s: Wait for CEC/CCEC sync on bootstrap Include CEC and CCEC into the set of k8s resources that must be synchronized before endpoints are regenerated. This reduces endpoint regenerations during restart and reduces L7 policy churn. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 June 2024, 21:16:39 UTC
99a1539 Revert "policy/k8s: Fix bug where policy synchronization event was lost" This reverts commit 3f9c28810b543bf95d8c169c73f670fc45a9726e. In addition to the revert: - registerResourceWithSyncFn() are moved from the start hook to the constructor so that they are called before daemon.InitK8sSubsystem(). - PolicyManager promise is resolved by newDaemonPromise() after Daemon has been constructed, but before the daemon waits for the k8s resources to have synced. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 June 2024, 21:16:39 UTC
369e927 daemon: Check that DaemonConfig is not changed after being published Add sha256 sum to config and validate that the config does not change between the resolution of the DaemonConfig promise and the resolution of the Daemon promise. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 June 2024, 21:16:39 UTC
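The check in 369e927 can be pictured as taking a checksum when the config is published and comparing it again later. The sketch below is only an illustration of that idea under stated assumptions: the `config` struct, its fields, and the JSON serialization are hypothetical and are not how Cilium's `DaemonConfig` is actually summed.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// config is a hypothetical, much reduced stand-in for a daemon configuration.
type config struct {
	ClusterName string
	EnableIPv4  bool
}

// checksum serializes the config and returns its SHA-256 sum.
func checksum(c config) ([32]byte, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(b), nil
}

func main() {
	cfg := config{ClusterName: "default", EnableIPv4: true}
	published, _ := checksum(cfg) // taken when the config is published

	cfg.EnableIPv4 = false // an unexpected late modification

	current, _ := checksum(cfg)
	if current != published {
		fmt.Println("config changed after being published")
	}
}
```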
f3f78f7 option: Make ConfigPatchMutex a pointer Make ConfigPatchMutex a pointer to be able to take a shallow copy for summing the immutable parts of Config (everything other than the IntOptions). Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 June 2024, 21:16:39 UTC
17faa10 daemon: Do not stall hive Stop if k8s caches are never synced Make wait on k8s cache sync conditional on the daemon context not being canceled. Apply same logic in the goroutine waiting for endpoint restoration. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 June 2024, 21:16:39 UTC
9463a86 daemon: Avoid blocking during hive start Execute startDaemon() in a goroutine, which resolves daemon promise when done. Change dependent start hooks to wait for the daemon promise in goroutines so that the execution of start hooks is not stalled. This way the hive may run concurrently with daemon start. Move some early validation from startDaemon() to the start hook so that we can fail out while we can simply return the error from the start hook. Move saving of daemon config to file also away from startDaemon, so that we can resolve the DaemonConfig promise from the start hook itself. The daemon promise will be resolved from the goroutine that runs startDaemon(), or rejected if there was any error. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 June 2024, 21:16:39 UTC
d56b5d1 k8s/synced: Log an error if a resource is added to blocking set too late Log an error so that CI will flake when a resource is added to the blocking set after caches have already synchronized. Moved provision of k8s.CacheStatus type to k8s/synced and provision it in k8s/synced to avoid circular dependency. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 21 June 2024, 21:16:39 UTC