https://github.com/cilium/cilium

Revision Author Date Message Commit Date
79084b6 CODEOWNERS: Update ownership for EGW docs dir restructure. Now egw/docs-structure teams will be called to review: * EGW TOC page in ./networking * All child pages in ./networking/egress-gateway. This will make it easier to manage ownership in the future as all new EGW pages in the egw dir will be owned by egw & docs-structure teams. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 12 July 2024, 01:14:20 UTC
ade188c Documentation: move egress gateway pages to separate dir. Now only egw rst file in top level Documentation/networking is the TOC file. All further EGW specific docs will be put in: Documentation/networking/egress-gateway. This will also make managing CODEOWNER ownership easier, subsequent commits will move ownership to new directory. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 12 July 2024, 01:14:20 UTC
f5e613a Documentation: create dedicated egress-gateway section on index. EGW is a feature on its own. With the addition of the advanced troubleshooting topics the "external networking" section seems to no longer adequately encapsulate its various topics. This splits EGW and Ext Networking sections using the following structure. External networking Setting up Support for External Workloads (beta) Egress Gateway Egress Gateway Advanced Troubleshooting Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 12 July 2024, 01:14:20 UTC
e3cd607 Documentation/metrics.rst: fix unnecessary back-quoting on metric name. Should just be one back quote instead of three, this was being rendered to show one the inner quotes when viewing the HTML documentation. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 12 July 2024, 01:14:20 UTC
9413abd Documentation: add page for egress gateway troubleshooting. New document is a general troubleshooting document for egress-gateway. This probably belongs in its own page as the content is more detailed than the general egress-gateway documentation page (which has a troubleshooting page around fixing common EGW CEGP configuration issues). This also references this document at the bottom of the egress-gateway page. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 12 July 2024, 01:14:20 UTC
f67b9b9 chore(deps): update cilium/cilium-cli action to v0.16.13 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 12 July 2024, 00:09:56 UTC
3fa44b5 conformance-{gateway-api,ingress}: Run cilium-cli inside Docker - Run cilium-cli inside a container in preparation to merge cilium-cli repo to cilium repo as proposed in CFP-25694 [^1]. - Move "Install Cilium CLI" step after "Create kind cluster" step so that cilium-cli can access .kube/config file. - Bump cilium-cli version to v0.16.13 to pick up cilium/cilium-cli#2672 - Add --disable-check=minimum-version flag to cilium install. Checking Kind version doesn't make sense when you run cilium-cli from inside a container since it cannot access the kind binary on the host. [^1]: https://github.com/cilium/design-cfps/pull/9 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 11 July 2024, 21:00:12 UTC
d1abb01 LocalRedirect: RedirectBackend and RedirectFront are immutable Local Redirect Policy updates are currently not supported. If there are any changes to be made, delete the existing policy, and re-create a new one. Introducing CEL expressions to prevent users from modifying the redirectFront and redirectBackend fields Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> 11 July 2024, 20:42:18 UTC
245ec0b README: Update releases Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 11 July 2024, 20:09:10 UTC
231f3ba github template: remove jenkins CI template Jenkins is no longer being used in Cilium. Signed-off-by: André Martins <andre@cilium.io> 11 July 2024, 18:24:53 UTC
d7e7cc8 .github: rewrite issue template We have been receiving a lot of GitHub issues for older Cilium versions that have bugs fixed by recent versions. To prevent situations where we request users to update Cilium after they have created the GitHub issue, we will request users to try to update Cilium before creating the GitHub issue by modifying the issue template. Signed-off-by: André Martins <andre@cilium.io> 11 July 2024, 18:24:53 UTC
30a897a docs: Add note about socketLB.hostNamespaceOnly to Kata page Signed-off-by: Martynas Pumputis <m@lambda.lt> 11 July 2024, 17:50:35 UTC
5369196 renovate: onboard etcd image used in integration tests Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 July 2024, 17:16:25 UTC
0edc35c Fix Hubble drop event emitter reasons join in config Viper.GetStringSlice() uses spf13.cast.ToStringSlice() that relies on strings.Fields() to split a string into an array. Fields() splits on whitespace, not on commas. hubble-drop-events-reasons joins Helm values list elements using a comma, so the splitting doesn't work and the resulting array of reasons is invalid. No packet drop reasons match and so we get no events even when we should. Switching to joining using a space resolves the issue. Signed-off-by: Eric Mountain <eric.mountain@datadoghq.com> 11 July 2024, 13:10:28 UTC
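The splitting behaviour described in this commit can be reproduced with the standard library alone. The sketch below only demonstrates that `strings.Fields` splits on whitespace and not on commas; the reason names are placeholders chosen for illustration, and `splitReasons` is a hypothetical helper, not Cilium or Viper code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitReasons is a hypothetical helper that splits the way strings.Fields
// does: on runs of whitespace only.
func splitReasons(raw string) []string {
	return strings.Fields(raw)
}

func main() {
	// Placeholder reason names, used only for illustration.
	reasons := []string{"reason_a", "reason_b"}

	commaJoined := strings.Join(reasons, ",")
	spaceJoined := strings.Join(reasons, " ")

	// The comma-joined value stays a single element, so no individual
	// reason can match it.
	fmt.Printf("%q\n", splitReasons(commaJoined))

	// The space-joined value is split back into the original elements.
	fmt.Printf("%q\n", splitReasons(spaceJoined))
}
```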
a96c9a9 clustermesh: prevent test races leveraging renameio.WriteFile This prevents race conditions in tests caused by the inotify watcher observing incomplete file writes, that can otherwise occur if using the os.WriteFile function from the standard library. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 July 2024, 12:15:45 UTC
ea18cea kvstore: don't compute metrics scope for each watch event Let's compute it only once at the beginning when starting the kvstore, to save a few unnecessary operations. We additionally trim a possible leading / separator from the prefix, to ensure that it doesn't count against the number of tokens for scope computation. Most notably, this allows coalescing the kvstore_events_queue metrics for the synced prefixes of all clusters into a single set, which does not depend on the number of clusters we are connected to. In the best case (i.e., when connecting to 254 clusters), this allows to drop ~3k metric series emitted by each Cilium agent, and KVStoreMesh replica. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 July 2024, 12:15:26 UTC
3ceabd3 store: optimize KVPair marshalling and unmarshalling The KVPair struct implements the store.Key interface, and it is used to marshal/unmarshal simple key-value pairs, with no further interpretation of the value structure. Most notably, it is used for data retrieval and propagation in kvstoremesh. Currently, both the key and the value are stored as strings. However, this leads to unnecessary allocations during the marshalling and unmarshalling phase, as the value is provided as a byte slice. Hence, let's convert the struct to store it directly as a byte slice as well, to prevent them. The NewKVPair method is not updated, as its parameters are currently always provided as strings. Results of the following simple benchmark: func BenchmarkMarshalUnmarshal(b *testing.B) { key := "cilium/cache/nodes/v1/cluster-001/balanced-pigeon" kv := NewKVPair(key, strings.Repeat(".", 512)) for range b.N { out, _ := kv.Marshal() _ = kv.Unmarshal(key, out) } } Before: BenchmarkMarshalUnmarshal-20 14551974 92.92 ns/op 512 B/op 1 allocs/op BenchmarkMarshalUnmarshal-20 14522690 90.91 ns/op 512 B/op 1 allocs/op BenchmarkMarshalUnmarshal-20 14063810 89.61 ns/op 512 B/op 1 allocs/op BenchmarkMarshalUnmarshal-20 14396366 88.02 ns/op 512 B/op 1 allocs/op BenchmarkMarshalUnmarshal-20 13878943 89.60 ns/op 512 B/op 1 allocs/op After: BenchmarkMarshalUnmarshal-20 1000000000 0.1146 ns/op 0 B/op 0 allocs/op BenchmarkMarshalUnmarshal-20 1000000000 0.1183 ns/op 0 B/op 0 allocs/op BenchmarkMarshalUnmarshal-20 1000000000 0.1164 ns/op 0 B/op 0 allocs/op BenchmarkMarshalUnmarshal-20 1000000000 0.1209 ns/op 0 B/op 0 allocs/op BenchmarkMarshalUnmarshal-20 1000000000 0.1162 ns/op 0 B/op 0 allocs/op Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 July 2024, 12:15:26 UTC
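The difference between the two representations can be sketched as follows. Both types below are hypothetical stand-ins written for illustration, not the actual store.KVPair code: the string-backed variant converts between `string` and `[]byte` on every call, which copies the data, while the byte-slice variant hands the slice through unchanged. Whether the real implementation aliases or copies the caller's buffer is not stated in the commit message, so the aliasing shown here is an assumption.

```go
package main

import "fmt"

// kvPairString is a hypothetical variant that stores the value as a string.
type kvPairString struct {
	Key   string
	Value string
}

// Marshal converts the string to a byte slice, which copies the value.
func (kv *kvPairString) Marshal() ([]byte, error) { return []byte(kv.Value), nil }

// Unmarshal converts the byte slice to a string, which copies it again.
func (kv *kvPairString) Unmarshal(key string, data []byte) error {
	kv.Key, kv.Value = key, string(data)
	return nil
}

// kvPairBytes is a hypothetical variant that stores the value as a byte slice.
type kvPairBytes struct {
	Key   string
	Value []byte
}

// Marshal returns the stored slice as-is, without a conversion.
func (kv *kvPairBytes) Marshal() ([]byte, error) { return kv.Value, nil }

// Unmarshal keeps a reference to data (assumption: the caller does not
// modify the buffer afterwards).
func (kv *kvPairBytes) Unmarshal(key string, data []byte) error {
	kv.Key, kv.Value = key, data
	return nil
}

func main() {
	kv := &kvPairBytes{}
	_ = kv.Unmarshal("cilium/cache/nodes/v1/cluster-001/balanced-pigeon", []byte("payload"))
	out, _ := kv.Marshal()
	fmt.Println(kv.Key, string(out))
}
```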
a0b0151 store: removed unused KVPair.DeepKeyCopy method Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 11 July 2024, 12:15:26 UTC
7e9de94 gh: nat46x64: don't set DSR dispatch mode when LB mode is SNAT Apply a small cleanup to make the config consistent. There is no point in configuring a DSR dispatch mode when the cluster is not using DSR. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 11 July 2024, 09:06:54 UTC
358740c Fix bug where ec2-api-endpoint used the wrong endpoint We have set up a proxy for AWS services and need to set a customized AWS endpoint for Cilium. We set ec2-api-endpoint: ec2.custom.com in the Cilium config, but get this error in the Cilium log: level=fatal msg="Unable to init eni allocator" error="unable to update instance type to adapter limits from EC2 API: operation error EC2: DescribeInstanceTypes, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: request send failed, Post "https://ec2.us-east-1.amazonaws.com/": EOF" subsys=cilium-operator This patch fixes the EC2 endpoint resolver issue by using the correct service name Fixes: #33597 Signed-off-by: Archer Wu <archerwu9425@icloud.com> Signed-off-by: Joe Stringer <joe@cilium.io> 10 July 2024, 23:29:08 UTC
cd9bd94 Update cni-chaining-azure-cni.rst Update cni-chaining-azure-cni.rst Azure chain mode should follow same logic as per https://github.com/cilium/cilium/pull/23159 so that cilium in chain mode should not own /etc/cni/net.d or uninstallation will be problematic Signed-off-by: Mais <8318308+Mais316@users.noreply.github.com> Signed-off-by: Mais <mai.saleh@siemens.com> 10 July 2024, 23:28:30 UTC
f534200 helm: expose socket linger timeout helm option Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 10 July 2024, 19:31:23 UTC
75289b3 fqdn: add socket linger timeout flag Add a flag to allow setting the SO_LINGER socketopt on the socket for the connection of DNS proxy to the upstream DNS server. The flag is a tristate: -1 to disable, 0 to send TCP RST on close, and a timeout value (in seconds) to let close block at most. Setting linger can positively impact the operation of Cilium's DNS proxy due to the following circumstances: 1. When a DNS response is large, the server can truncate its reply. Using the truncated flag on DNS responses, the server can signal to the client that it did not receive the full response, and that it should retry the query using TCP. 2. DNS clients often simultaneously request both A and AAAA records. Typically, they do so from the same source port. When using TCP, both queries are sent via the same, persistent TCP connection. 3. Cilium's DNS proxy fails to correctly handle this persistent TCP connection - it opens a connection to the upstream DNS server _per query_. 4. When running in transparent mode, the DNS proxy thus may attempt to bind the same local source IP/port combination rapidly - this only works since the proxy sets the SO_REUSEADDR socket opt. Unfortunately, depending on network circumstances, the second bind-then-connect can fail due to a kernel bug. Setting the linger timeout to zero has the effect of sending a TCP RST as soon as the connection is closed - i.e. forcefully slamming the connection shut. This works around the issue of DNS proxy's errors when reusing the same local ip/port combination in a relatively non-invasive way, since it avoids the kernel bug around socket reuse. However, it doesn't solve the root cause of the proxy attempting to open two connections when it should persist the connection just like the client does. Co-authored-by: David Bimmler <david.bimmler@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 10 July 2024, 19:31:23 UTC
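A tristate linger setting like the one described can be sketched with Go's standard `net.TCPConn.SetLinger`. This is a minimal illustration, not the DNS proxy code: `applyLinger` is a hypothetical helper, and mapping a negative value to "leave the operating system default untouched" is an assumption about what "disable" means.

```go
package main

import (
	"fmt"
	"net"
)

// applyLinger is a hypothetical helper for a tristate linger setting:
//   - negative: do nothing and keep the operating system default,
//   - zero: Close discards unsent data, which resets the connection,
//   - positive: upper bound, in seconds, on how long Close may linger.
func applyLinger(conn *net.TCPConn, seconds int) error {
	if seconds < 0 {
		return nil
	}
	return conn.SetLinger(seconds)
}

func main() {
	// A loopback listener stands in for an upstream server.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		fmt.Println("dial:", err)
		return
	}
	tcpConn, ok := conn.(*net.TCPConn)
	if !ok {
		conn.Close()
		return
	}

	if err := applyLinger(tcpConn, 0); err != nil {
		fmt.Println("set linger:", err)
	}
	// With linger set to zero, closing aborts the connection instead of
	// going through the normal shutdown sequence.
	fmt.Println("close:", tcpConn.Close())
}
```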
c0ad441 docs: cleanup upgrade docs on main Accidentally, we left out 1.16/1.15 upgrade notes. Also, remove the warning about updating cilium-cli, as cilium-cli no longer supports uninstall/install since v0.16.0 Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 10 July 2024, 17:35:53 UTC
3a21748 ingress: Set active backend number for Local ETP This commit is to set number of active local endpoints for L7 service, which is used by CiliumEnvoyConfig (directly) and Ingress/GatewayAPI (indirectly). As the proxy port is available, it's most likely that Envoy listener is up and running, just a note that there might be a chance that the resources in CEC are problematic and not working as expected, however, it's out of scope for HealthCheckNodePort for the underlying L7 service. Relates: #33547 Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 July 2024, 16:33:36 UTC
961a1a6 pkg/hive: Add FeatureLifecycle Lifecycles can be initiated or terminated based on feature names. Associated lifecycle hooks are grouped by feature, and methods are exposed to control the start and stop actions for all hooks within a feature group. Signed-off-by: Ovidiu Tirla <otirla@google.com> 10 July 2024, 16:32:09 UTC
e2b5a79 gh: cilium-config: enable IPv6 BPF Masquerade with HostFW https://github.com/cilium/cilium/pull/31511 enabled the combination of HostFW with IPv6 BPF Masquerading. Reflect this in the cilium-config action. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 10 July 2024, 15:55:04 UTC
25ae144 renovate: do not update etcd on older branches Since etcd v3.5.5, bash is no longer present on etcd images and we can't upgrade to newer versions as we still use a bash init script to setup clustermesh. Signed-off-by: André Martins <andre@cilium.io> 10 July 2024, 15:53:34 UTC
a217960 chore(deps): update dependency cilium/cilium-cli to v0.16.12 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 10 July 2024, 14:55:21 UTC
342bb98 gha: Add http client timeout in Ingress Relates: https://github.com/cilium/ingress-controller-conformance/pull/3 Relates: https://github.com/cilium/cilium/issues/31857 Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 July 2024, 13:35:22 UTC
7e0d074 bpf: fix flaky nat test We recently fixed the port allocation behaviour of the NAT helpers. Most tests were fixed to accommodate this, but test_nat4_icmp_error_udp_egress slipped through the cracks. Fixes: 9b6c0bddb6 ("bpf: fix __snat_clamp_port_range not using highest port") Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 10 July 2024, 13:12:29 UTC
2c7df4f ci: Set cluster id in external workloads With KVStoreMesh enabled by default, we rely on having cluster id. Currently, external-workloads did not have it set in CI, which was causing CI failure when trying to update cilium-cli containing change from https://github.com/cilium/cilium-cli/pull/2660 Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 10 July 2024, 12:04:33 UTC
df79ac9 vendor: Bump StateDB to version v0.2.1 Update StateDB and fix up the API usage. The new version brings a simpler API and defaults for the reconciler to make its usage easier. Signed-off-by: Jussi Maki <jussi@isovalent.com> 10 July 2024, 11:51:22 UTC
f650fd8 option: Make TestDaemonConfig_StoreInFile less brittle Verify that the error contains "Config differs:" substring rather than comparing the whole error as it contains contextual information that changes over time. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 10 July 2024, 11:35:00 UTC
5f2e61a auth: fix fatal error: concurrent map iteration and map write Fixes: #33468 Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> 10 July 2024, 09:51:06 UTC
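The Go runtime raises this fatal error when one goroutine ranges over a map while another writes to it. The commit message does not describe the actual fix, so the sketch below shows only one common remedy, guarding the map with a `sync.RWMutex`; `safeMap` and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// safeMap is a hypothetical map wrapper guarded by a read-write mutex.
type safeMap struct {
	mu sync.RWMutex
	m  map[string]int
}

func newSafeMap() *safeMap { return &safeMap{m: make(map[string]int)} }

// Set takes the write lock, so it never overlaps with an iteration.
func (s *safeMap) Set(key string, value int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = value
}

// Sum iterates under the read lock, so no write can happen mid-iteration.
func (s *safeMap) Sum() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	total := 0
	for _, v := range s.m {
		total += v
	}
	return total
}

func main() {
	s := newSafeMap()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				s.Set(fmt.Sprintf("key-%d-%d", n, j), 1)
				_ = s.Sum() // iterate while other goroutines write
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(s.Sum())
}
```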
7db9251 gateway-api: Switch to slog Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 July 2024, 03:08:19 UTC
a804c17 docs: Update LRP feature status Signed-off-by: Aditi Ghag <aditi@cilium.io> 10 July 2024, 01:55:26 UTC
1de1d87 docs,LRP: Add steps to restart agent and operator pods The step is required for the relevant CRDs to be registered, and the feature configuration change to take effect. Signed-off-by: Aditi Ghag <aditi@cilium.io> 10 July 2024, 01:55:26 UTC
d2515d2 .github: add release tool workflow This workflow will allow release managers to create releases from GitHub UI. This will be used in combination with the release tool that is available under https://github.com/cilium/release.git Signed-off-by: André Martins <andre@cilium.io> 09 July 2024, 19:04:54 UTC
729afaa docs: Remove CNCF graduation from the roadmap Cilium graduated in 2023, so we can remove this from the roadmap :tada: Signed-off-by: Joe Stringer <joe@cilium.io> 09 July 2024, 18:57:48 UTC
3f8ec1d chore(deps): update dependency go to v1.22.5 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 09 July 2024, 16:16:30 UTC
8629d79 envoy: Ignore All Non-TCP Protocols When converting L4Filter policies to L7 policies envoy was explicitly checking for the unsupported protocols UDP, and SCTP. This is not robust logic as other unsupported protocols might eventually be added to the list. Instead this continuation should be the default case. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 09 July 2024, 16:00:08 UTC
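The change in logic can be sketched as an allow-list instead of a deny-list: rather than skipping the protocols known to be unsupported, skip everything that is not TCP. All types and names below are hypothetical and do not mirror the Cilium policy types.

```go
package main

import "fmt"

// l4Filter is a hypothetical, simplified filter with only the fields needed
// for this illustration.
type l4Filter struct {
	Protocol string
	Port     int
}

// tcpOnly keeps only TCP filters. Any other protocol, including ones added
// in the future, is skipped by default.
func tcpOnly(filters []l4Filter) []l4Filter {
	var out []l4Filter
	for _, f := range filters {
		if f.Protocol != "TCP" {
			continue
		}
		out = append(out, f)
	}
	return out
}

func main() {
	filters := []l4Filter{
		{Protocol: "TCP", Port: 80},
		{Protocol: "UDP", Port: 53},
		{Protocol: "SCTP", Port: 9999},
		{Protocol: "FUTURE", Port: 1}, // placeholder for a protocol added later
	}
	fmt.Println(tcpOnly(filters))
}
```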
f18ce4a bgpv2/contrib: update cilium version in labs and minor fixes Updating cilium version to v1.16.0-rc.0 for lab deployments. Removed bgpv2Enable flag, it is no longer required. This change also modifies peering to loopback instead of link address, this makes sure that multiple Cilium nodes can use same BGP peer address in cluster configuration. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 09 July 2024, 15:34:22 UTC
7c68b7a docs: remove mention of clustermesh + L7 policies + tunnel limitation The mention of this limitation appears to have been introduced as part of an early clustermesh documentation version [1], more than 5 years ago. However, since then, the limitation must have been lifted, as we have been successfully testing this combination of features in CI for quite some time. Hence, let's remove this outdated mention from the docs. [1]: 23a71f242a63 ("doc: Update ClusterMesh documentation for 1.4") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 09 July 2024, 14:46:13 UTC
d7743b7 pkg/policy: Replace kr/pretty by just fmt package Standard libraries are getting better in recent go versions, this helps to remove outdated library kr/pretty (last update is 2 years ago) in the codebase. Signed-off-by: Tam Mach <tam.mach@cilium.io> 09 July 2024, 14:06:09 UTC
92805f5 pkg/comparator: Remove un-used function The function CompareWithNames is no longer used, so better to remove it. Signed-off-by: Tam Mach <tam.mach@cilium.io> 09 July 2024, 14:06:09 UTC
4f51f44 fix(deps): update sigs.k8s.io/gateway-api digest to 895f957 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 09 July 2024, 12:21:06 UTC
82930f2 chore(deps): update gcr.io/distroless/static-debian11:nonroot docker digest to 63ebe03 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 09 July 2024, 12:20:13 UTC
0f97b3c build(deps): bump certifi from 2023.7.22 to 2024.7.4 in /Documentation Bumps [certifi](https://github.com/certifi/python-certifi) from 2023.7.22 to 2024.7.4. - [Commits](https://github.com/certifi/python-certifi/compare/2023.07.22...2024.07.04) --- updated-dependencies: - dependency-name: certifi dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> 09 July 2024, 11:34:54 UTC
10b3a73 .github: No need to install Docker in complexity-test manually anymore After [1] has been merged, Docker is part of complexity-test images, therefore it's no longer needed to install it manually. [1]: https://github.com/cilium/little-vm-helper-images/pull/461 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> 09 July 2024, 08:14:10 UTC
5167d6d bitlpm: Add Return Value to Upsert Method It would be helpful to know when the Upsert method is adding a new key or replacing it. This updates the Upsert method to do that. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 09 July 2024, 07:32:03 UTC
de8b7e4 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 09 July 2024, 07:27:20 UTC
11b3ea3 chore(deps): update docker.io/library/golang:1.22.5 docker digest to fcae9e0 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 09 July 2024, 07:27:20 UTC
325b5df chore(deps): update all github action dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 08 July 2024, 20:23:52 UTC
b08a66f fix(deps): update all go dependencies main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 08 July 2024, 19:38:27 UTC
33840e1 renovate: assign on auto-merge According to Renovate's documentation: By default, Renovate will not assign reviewers and assignees to an automerge-enabled PR unless it fails status checks. By configuring this setting to true, Renovate will instead always assign reviewers and assignees for automerging PRs at time of creation. Signed-off-by: André Martins <andre@cilium.io> 08 July 2024, 19:36:10 UTC
ad7d792 docs: Update LVH VM image pull instructions To adjust to the changes from https://github.com/cilium/little-vm-helper/pull/133. Signed-off-by: Martynas Pumputis <m@lambda.lt> 08 July 2024, 18:00:25 UTC
3621830 bpf: avoid modulo in __snat_clamp_port Modulo is an expensive operation. Replace it with an integer multiplication adapted from bounded random number generation. Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 08 July 2024, 17:06:51 UTC
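The multiplication technique from bounded random number generation maps a 32-bit value onto a range of size n by computing `(val * n) >> 32` in 64-bit arithmetic, which avoids the modulo. The sketch below is written in Go for illustration and is not the BPF C code; `clampPort` is a hypothetical name, and the closed range `[start, end]` with `start <= end` is an assumption.

```go
package main

import "fmt"

// clampPort is a hypothetical helper that maps val onto the closed range
// [start, end] with a multiply and a shift instead of a modulo.
// It assumes start <= end.
func clampPort(start, end uint16, val uint32) uint16 {
	n := uint64(end-start) + 1 // number of ports in the closed range
	// val < 2^32, so (val * n) >> 32 is always in [0, n-1].
	offset := (uint64(val) * n) >> 32
	return start + uint16(offset)
}

func main() {
	// The port numbers are arbitrary example values.
	for _, val := range []uint32{0, 1 << 31, 0xFFFFFFFF} {
		fmt.Println(clampPort(32768, 65535, val))
	}
}
```

Because the offset never exceeds `end - start`, the result includes `end` and a range with `start == end` always yields `start`.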
9b6c0bd bpf: fix __snat_clamp_port_range not using highest port __snat_clamp_port_range has an output in the range [start, end), but the call sites expect it to produce a value in [start, end]. Operate on the full range and fix tests which relied on the [start, end) interpretation. Fixes: 92cd032bf7 ("bpf, snat: select lru map if available otherwise fall back to htab") Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 08 July 2024, 17:06:51 UTC
69b8bb6 bpf: fix nodeport asserts The nodeport code has a couple of configurable port ranges. In the code they are used as closed ranges [min, max] but the asserts do not allow min == max, aka the interval containing just min. Fixes: d3b6298de7 ("bpf: add build assertions for nodeport assumptions") Signed-off-by: Lorenz Bauer <lmb@isovalent.com> 08 July 2024, 17:06:51 UTC
b59efff chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 08 July 2024, 15:18:12 UTC
859854d renovate: Only update patch version for statedb For example, the version v0.2.x contains a couple of breaking changes in APIs. Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Tam Mach <sayboras@yahoo.com> 08 July 2024, 15:14:32 UTC
fdf7f09 bpf/complexity-test: Cover bpf_wireguard.c The files are copied from bpf/complexity-tests/*/bpf_host/*.txt, followed by s/-DENABLE_IPSEC/-DENABLE_WIREGUARD/. Signed-off-by: gray <gray.liang@isovalent.com> 08 July 2024, 14:35:10 UTC
3afd5cf bpf/Makefile: Add bpf_wireguard.o for build test MAX_WIREGUARD_OPTIONS and WIREGUARD_OPTIONS are set in a similar way to MAX_HOST_OPTIONS and HOST_OPTIONS, with ENABLE_IPSEC replaced by ENABLE_WIREGUARD. Signed-off-by: gray <gray.liang@isovalent.com> 08 July 2024, 14:35:10 UTC
89edd94 Add CODEOWNERS for bpf_wireguard.c Signed-off-by: gray <gray.liang@isovalent.com> 08 July 2024, 14:35:10 UTC
c8a4fa3 Attach to-wireguard to the tc egress of cilium_wg0 This is to handle rev-DNAT when L7 ingress proxy is enabled with wireguard, nodeport, native routing, and KPR. Consider a pod-to-remote-nodeport connection, the reply packet will pass: from_lxc@veth -> redirected to L7 proxy -> from_host@cilium_host -> to_host@cilium_net -> to_netdev@eth0 -> redirected to cilium_wg0. This patch ensures the new attached to-wireguard@cilium_wg0 can do the necessary rev-DNAT. Fixes: https://github.com/cilium/cilium/issues/32899 Signed-off-by: gray <gray.liang@isovalent.com> 08 July 2024, 14:35:10 UTC
24e85ac gha: introduce clustermesh scale test Introduce a new GitHub Actions Workflow to exercise clustermesh, and thus Cilium, at high scale. The workflow mimics other scale-related tests and deploys a Kubernetes cluster hosted on GCP leveraging Kops. In this case, the cluster is composed of a single control-plane node and a single worker node. Additionally, we leverage cmapisrv-mock [1] to mock an arbitrary number of remote clusters, each composed of a given number of nodes, endpoints, and identities, and associated churn rate. The cmapisrv-mock is always scheduled on the control plane node, and is currently configured to mock 250 clusters, encompassing a total of 25k nodes, 250k endpoints and 25k identities. Additionally, for each cluster, the configured churn rates are nodes: 0.1 QPS, endpoints: 1 QPS and identities: 0.2 QPS, for a total of ~325 QPS. Similarly to other scale-related workflows, we leverage ClusterLoader2 [2] to setup the monitoring stack, run the tests and gather the results in the appropriate format to then plot them using perfdash [3]. Specifically, we run it once to setup the monitoring stack, then configure Cilium to connect to the mocked clusters, and then run it a second time to execute the actual tests and gather the results. Currently, the test is composed of a first step starting a given number of pods scheduled on the worker nodes (with 5 QPS) and measuring the startup latency, to observe possible slow-downs when a large number of nodes, endpoints and identities has already been injected. Subsequently, the test sleeps for 5 minutes to observe the effects of the remote clusters churn, and eventually tears down the pods and gathers the results. [1]: https://github.com/cilium/scaffolding/tree/main/cmapisrv-mock [2]: https://github.com/kubernetes/perf-tests/tree/master/clusterloader2 [3]: https://github.com/kubernetes/perf-tests/tree/master/perfdash Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 08 July 2024, 13:55:38 UTC
a7484dd Adding custom labels to hubble ui deployment Fixes: #33582 Signed-off-by: Andrey Maltsev <maltsev.andrey@gmail.com> 08 July 2024, 13:55:19 UTC
723fd17 Update CiliumEnvoyConfig handling for headless Services Hive will automatically retry these, so this does not need to be logged at Error level. Non-headless Services already behave in this way after #33266 was merged. Signed-off-by: Nick Young <nick@isovalent.com> 08 July 2024, 12:44:20 UTC
d65f9af renovate: add auto-approve bot for renovate PRs Enable the GitHub "auto-merge" feature in the repository settings at https://github.com/<org>/<repo>/settings. If Renovate detects this feature, it allows PRs to be auto-merged by GitHub. GitHub will auto-merge a PR if all required checks pass and CODEOWNERS have reviewed it. If these conditions are unmet, GitHub won't merge the PR. To allow Renovate to auto-approve its own PRs, configure Renovate to request a review from the bot `ciliumbot` for PRs with trusted dependencies. The `reviewers` configuration in Renovate will ensure `ciliumbot` is the sole reviewer of Renovate's PRs. Create a GitHub Action triggered by a review request event, ensuring the PR review was requested by the Renovate bot, the PR was created by Renovate, and the review request is for `ciliumbot`. Ensure `ciliumbot` belongs to some teams of the CODEOWNERS file but is not auto-assigned reviews by GitHub. This setup allows `ciliumbot` to provide the necessary approvals without manual intervention, enabling seamless integration of Renovate to auto-approve PRs. The teams that `ciliumbot` will belong to are the ones that usually are selected to review renovate PRs when a trusted dependency is updated. Signed-off-by: André Martins <andre@cilium.io> 08 July 2024, 10:21:15 UTC
743ee43 fix(deps): update aws-sdk-go-v2 monorepo Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 08 July 2024, 10:02:25 UTC
4db2be8 socketlb: tolerate cgroupv1 when detaching bpf programs This fixes a regression where Cilium fails to start because it fails to detach socketlb progs from a cgroupv1. This happens because QueryPrograms will fail with EBADF on a cgroupv1. To cleanup remaining programs from socketlb, we always try to remove these programs from the root cgroup. This commit ensures we don't fail to detach socketlb programs from a cgroupv1, as we would never succeed to attach any programs here in the first place. Signed-off-by: Robin Gögge <r.goegge@isovalent.com> 08 July 2024, 09:54:13 UTC
ca048e2 bpf: nodeport: clear EDT info for "Port unreachable" ICMP reply The EDT ID for pod-egressing traffic is stored in skb->queue_mapping, and checked by the Bandwidth Manager code in to-netdev. When the N/S LB responds to a service request with a "Port unreachable" ICMP message, it rebuilds the initial request and redirects it back out (passing through to-netdev). As the NIC driver potentially filled the skb->queue_mapping field on RX, we need to clear it to avoid aliasing with genuine EDT marking. This is in line with what the nodeport code does when forwarding LB'ed requests to a remote backend. Fixes: 2c2c5ae08e7e ("bpf: Send ICMP unreachable when no service backends are available") Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 08 July 2024, 09:52:07 UTC
9404e92 chore(deps): update all github action dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 08 July 2024, 09:45:36 UTC
6c0da24 Remove deprecated call to BuildNameToCertificate Signed-off-by: David Chosrova <dchosrova@gmail.com> 08 July 2024, 09:40:05 UTC
f6e846f Remove deprecated call to BuildNameToCertificate Signed-off-by: David Chosrova <dchosrova@gmail.com> 08 July 2024, 09:40:05 UTC
b301855 helm: improve the clustermesh-apiserver service annotations comment In particular, let's update the deprecated annotations to configure an internal load balancer service on the different cloud providers: * https://learn.microsoft.com/en-us/azure/aks/internal-lb?tabs=set-service-annotations#create-an-internal-load-balancer * https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/service/annotations/#lb-internal * https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer#load_balancer_types Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 08 July 2024, 09:09:31 UTC
e9d007c clustermesh: cover watchdog logic with integration tests To validate the correct functioning of the watchdog logic, which is responsible for restarting the etcd connection in case of errors. To this end, the backend and clusterlock factories have been parametrized so that they can be overridden for testing purposes. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 08 July 2024, 05:34:05 UTC
633f350 clustermesh: cleanup usage of changed channel Now that this channel is no longer used for testing purposes, let's remove it and simplify the logic to synchronously call the same function both when connecting to a given cluster for the first time, and in case of subsequent configuration changes, as it only retriggers a controller already running in the background. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 08 July 2024, 05:34:05 UTC
5576a41 clustermesh: add unit test to cover the ConfigFiles function Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 08 July 2024, 05:34:05 UTC
de8316e clustermesh: rework TestWatchConfigDirectory test with mocked observer Rather than using the real clustermesh implementation, which requires hacks such as the skipKvstoreConnection global variable, and blocks a subsequent refactoring to remove the usage of the changed channel. While being there, let's also replace path with filepath, which is more appropriate when dealing with paths representing files, and test that the watcher correctly returns an error if configured with a non-existing directory. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 08 July 2024, 05:34:05 UTC
fb0efd2 clustermesh: improve integration tests coverage of common logic Specifically, let's validate the correct behavior for the different lifecycle events, and in particular when starting and stopping the clustermesh subsystem, and when adding, modifying and removing cluster configurations. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com> 08 July 2024, 05:34:05 UTC
c274e50 envoy: Add renovate configuration for cilium-proxy image version This is to make sure that we don't unintentionally bump the minor version for stable branches. Signed-off-by: Tam Mach <tam.mach@cilium.io> 08 July 2024, 03:49:52 UTC
6083768 fix(deps): update opentelemetry-go monorepo to v1.28.0 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 07 July 2024, 09:36:55 UTC
fc69880 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 05 July 2024, 19:51:02 UTC
261af51 chore(deps): update all-dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com> 05 July 2024, 19:51:02 UTC
59e734b dev: support for an additional kind values file Currently the Cilium Helm values that are used for all of the Make kind-* targets are provided by static values files or an optional kind-custom.yaml file that is under git ignore (for local dev purposes). This commit introduces the possibility to pass an additional values file as a make argument. ``` ADDITIONAL_KIND_VALUES_FILE=contrib/testing/kind-feature-x.yaml make kind-debug ``` This provides the possibility to use feature-specific values files during development and reduces the need for a specific make target. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 05 July 2024, 12:17:24 UTC
2271d1e dev: add option to delete docker containers when deleting kind cluster Currently deleting a kind cluster with `make kind-down` fails if there are docker containers attached to the kind docker network. ``` ❯ make kind-down ./contrib/scripts/kind-down.sh Deleting cluster "kind" ... Deleted nodes: ["kind-worker" "kind-control-plane"] Error response from daemon: error while removing network: network kind-cilium id be5f3e19dd958de25745635986363284e14b38504af10329f69f5176779cab3a has active endpoints ``` In some cases adding docker containers to the same network is part of the test- / dev-setup. Therefore it would be great to automatically delete these docker containers before deleting the network. This commit introduces the possibility to delete the docker containers by passing the env var `DELETE_CONTAINERS` to the make target. ``` DELETE_CONTAINERS=true make kind-down ``` Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 05 July 2024, 12:17:16 UTC
d36d992 build(install,dashboards): update cilium-agent Grafana dashboard In order to make the dashboard compatible with Grafana 11, the panels got upgraded. Issue: https://github.com/cilium/cilium/issues/31850 Signed-off-by: Sebastian Gaiser <sebastiangaiser@users.noreply.github.com> 05 July 2024, 06:31:47 UTC
4709707 bgpv2: Fix description of Selector behavior in CiliumBGPAdvertisement CRD The actual implementation as well as the intention is to not advertise any prefixes if the Selector CiliumBGPAdvertisement is not specified. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com> 05 July 2024, 05:57:35 UTC
ecdf16d fqdn-perf: allow to inject additional metrics measurements Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com> 04 July 2024, 18:59:39 UTC
cd2f736 renovate: stop wireguard updates Renovate has been failing recently to check for the wireguard updates. Also this dependency hasn't received updates for months so it's safe to ignore it temporarily. Signed-off-by: André Martins <andre@cilium.io> 04 July 2024, 18:51:20 UTC
9cb4ae5 .github: Clean up cilium-cli action usages - Install cilium-cli in the default location to be consistent with other workflows. When cilium/design-cfps#9 eventually gets implemented, we might want to put the cilium-cli code under cilium-cli/ directory. - For conformance-ginkgo.yaml, use the cilium-cli action to install cilium-cli in /host directory. Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 04 July 2024, 17:51:47 UTC
4906db5 ipsec: Delete old, deprioritized XFRM OUT rules In commit a11d088154 ("ipsec: Deprioritize old XFRM OUT policy for dropless upgrade"), we added special logic to handle the upgrade to v1.15 and previous versions, particularly around the replacement of XFRM OUT policies. All users are now expected to have upgraded, so we can remove this logic. ...or more precisely, update it. Instead of deprioritizing old XFRM OUT policies, the same logic will now remove them. This may seem strange because, in the previous commit, I advocated for not removing stale XFRM states. The difference here is two-fold. First, the logic to remove is almost the same as to deprioritize so I think the risk of introducing a regression in this way is low. Second, we know that a large number of XFRM policies on the system can have an impact on performance so there is an incentive (even if small) to remove stale policies. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 04 July 2024, 17:27:54 UTC
2d8c746 ipsec: Remove stale upgrade logic around XFRM states In commits f1c4a6e593 ("ipsec: Allow old and new XFRM IN states to coexist for upgrade") and c0d9b8c9e7 ("ipsec: Allow old and new XFRM OUT states to coexist for upgrade"), we added special logic to handle upgrades to v1.15.0 and previous versions. This logic was required because we needed to change the structure of our XFRM states and Linux doesn't offer a way to replace them atomically. So we had to detect conflicts and temporarily remove conflicting XFRM states while we add the new ones. Painful times... With the v1.17 development cycle starting, these upgrades are now clearly behind us. All IPsec users should have the new XFRM states in place. We can therefore remove this special logic. Note that we never cleaned up the old XFRM states. It's unclear that we should. They are now unused and not causing problems. They will naturally disappear as users rotate nodes and clusters. Adding logic to specifically remove them may carry more risk than benefit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 04 July 2024, 17:27:54 UTC
e357fdb ipsec: Remove stale code from v1.15 Commit a4c43f358ee ("ipsec: Do not use AllocCIDR with subnet encryption") added code to remove an old bogus route in v1.15. In v1.17, we can now assume this route was removed for all users and the related code can be removed. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 04 July 2024, 17:27:54 UTC
d2c1411 bpf: Remove old mark logic for IPsec upgrades Commit 420d7faea3 ("bpf, daemon: Have bpf_host support both values for skb->cb[4]") introduced a special logic to handle the upgrades to v1.15.0 and previous versions. This logic was needed because the way we use skb->cb[4] changed. In v1.17, we can assume that all users have now gone through the upgrade and this logic isn't needed anymore. This commit therefore deletes it. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 04 July 2024, 17:27:54 UTC
b855b25 node/manager: synthesize node deletion events When the cilium agent is down (due to a crash or an upgrade), it can miss node events. Upon startup, live nodes are upserted, but when deletions are missed, the agent fails to clean up node-related system state. Examples of such state include bpf map entries, xfrm states or routes. In particular, the agent fails to clean up node IP to nodeID mappings in the nodeid bpf map. Since K8s will happily recycle such IPs, this can lead to breakage, as the agent associates the wrong nodeID with IPs. To avoid leaking this state, the node manager now dumps its view of the current set of nodes to a file in the runtime state directory, which can be read on restart of an agent. This is similar to how we restore other state upon restart. When reading this file, it's important to avoid resurrecting long-gone nodes (as we don't know for how long the agent was down) - instead, we merely take note of which nodes we knew of in the past, compare that to the nodes we consider live (once synced to k8s), and delete the ones which seem to have disappeared. The motivation to build this reconciliation based on full state dumps to disk is that downstream code generally assumes to have access to a full node object in the deletion callbacks. This makes it infeasible to base the pruning on just the information available in bpf maps. In an alternative design, downstream subsystems are responsible for cleaning up their own state based on just a node identifier, but current code doesn't allow for this. Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 04 July 2024, 14:53:36 UTC
545fbc8 controlplane: clear environment after test Clearing the environment in the middle of the test can cause failures related to state being deleted, as the "environment" being cleared is simply the StateDir of the agent. Fixes: 940b186ab4 ("test/controlplane: Fix tests after removal of global hives") Signed-off-by: David Bimmler <david.bimmler@isovalent.com> 04 July 2024, 14:53:36 UTC
f5f1e5a policy: Fix mapstate.Diff() used in tests Use the actual unexpected value, rather than the one that was not found. Remove the import of unused "testing" from production code. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 04 July 2024, 12:55:04 UTC
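Note on b855b25 (node/manager: synthesize node deletion events): the reconciliation described in that commit message can be illustrated roughly as follows. This is a minimal sketch and not the code from the commit. The Node type, the nodes.json file name, the JSON encoding and the function names dumpNodes, loadNodes and staleNodes are all assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// Node is a hypothetical stand-in for the full node object that
// deletion callbacks expect to receive.
type Node struct {
	Name string `json:"name"`
	IP   string `json:"ip"`
}

// dumpNodes writes the current view of the node set to path.
func dumpNodes(path string, nodes []Node) error {
	data, err := json.Marshal(nodes)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// loadNodes reads a previous dump. A missing file is treated as
// "no nodes known from the past" rather than as an error.
func loadNodes(path string) ([]Node, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	var nodes []Node
	if err := json.Unmarshal(data, &nodes); err != nil {
		return nil, err
	}
	return nodes, nil
}

// staleNodes returns the restored nodes that are absent from the live
// set. Restored nodes are never re-added to the live set; they are only
// used to decide which deletions need to be synthesized.
func staleNodes(restored []Node, live map[string]struct{}) []Node {
	var stale []Node
	for _, n := range restored {
		if _, ok := live[n.Name]; !ok {
			stale = append(stale, n)
		}
	}
	return stale
}

func main() {
	dir, err := os.MkdirTemp("", "node-state")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	statePath := filepath.Join(dir, "nodes.json") // hypothetical file name

	// Before the restart: the agent knows three nodes and dumps them.
	known := []Node{
		{Name: "node-a", IP: "10.0.0.1"},
		{Name: "node-b", IP: "10.0.0.2"},
		{Name: "node-c", IP: "10.0.0.3"},
	}
	if err := dumpNodes(statePath, known); err != nil {
		panic(err)
	}

	// After the restart: node-c was deleted while the agent was down.
	restored, err := loadNodes(statePath)
	if err != nil {
		panic(err)
	}
	live := map[string]struct{}{"node-a": {}, "node-b": {}}

	for _, n := range staleNodes(restored, live) {
		// A real agent would invoke its node deletion handlers here.
		fmt.Printf("synthesizing delete for %s (%s)\n", n.Name, n.IP)
	}
}
```

The dump keeps the whole node object rather than just an identifier because, as the commit message notes, downstream deletion callbacks generally assume access to a full node object.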