Revision history - refs/heads/wip-debug-link-not-found-v2 - origin: https://github.com/cilium/cilium

Revision	Author	Author Date	Message	Commit Date
eeb3e8c	Tom Hadlaw	20 July 2024, 19:48:55 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	20 July 2024, 19:48:55 UTC
acc80d2	Tom Hadlaw	20 July 2024, 19:25:33 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	20 July 2024, 19:25:33 UTC
78932e9	Tom Hadlaw	19 July 2024, 22:59:29 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	19 July 2024, 22:59:29 UTC
b5a204b	Tom Hadlaw	19 July 2024, 22:52:00 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	19 July 2024, 22:52:00 UTC
351599e	Tom Hadlaw	19 July 2024, 22:08:38 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	19 July 2024, 22:08:38 UTC
b2ed61b	Tom Hadlaw	19 July 2024, 21:58:15 UTC	wip-lock-logging Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	19 July 2024, 21:59:58 UTC
41759ca	Tom Hadlaw	18 July 2024, 23:33:45 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	18 July 2024, 23:51:28 UTC
98c2d44	Tom Hadlaw	18 July 2024, 22:02:12 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	18 July 2024, 22:02:12 UTC
3c10bef	Tom Hadlaw	18 July 2024, 21:44:43 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	18 July 2024, 21:44:43 UTC
9c015ea	Tom Hadlaw	18 July 2024, 21:08:58 UTC	very-interesting... Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	18 July 2024, 21:08:58 UTC
eba69fa	Tom Hadlaw	18 July 2024, 20:37:59 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	18 July 2024, 20:37:59 UTC
92d190a	Tom Hadlaw	18 July 2024, 20:22:38 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	18 July 2024, 20:24:04 UTC
1dd1083	Tom Hadlaw	18 July 2024, 20:19:40 UTC	wip Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>	18 July 2024, 20:24:04 UTC
cd8f76e	Aditi Ghag	16 July 2024, 14:40:20 UTC	install: Document requirements for SYS_ADMIN Switching into non-root network namespaces requires SYS_ADMIN permissions. The health endpoint infrastructure has had this dependency, socket-LB will have a similar requirement. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io>	16 July 2024, 16:13:50 UTC
28ec68c	Aditi Ghag	11 July 2024, 14:37:24 UTC	docs: Update socket-LB connection termination note Update the section as the agent now terminates stale connections in pod network namespaces as well. Signed-off-by: Aditi Ghag <aditi@cilium.io>	16 July 2024, 16:13:50 UTC
cca92d0	Aditi Ghag	11 July 2024, 22:26:34 UTC	pkg/service: Terminate stale pod netns connections Extension of PR#25169 [1] to terminate pod netns connections to deleted service backends in pod network namespaces. [1] https://github.com/cilium/cilium/pull/25169 Signed-off-by: Aditi Ghag <aditi@cilium.io>	16 July 2024, 16:13:50 UTC
7d65d92	Aditi Ghag	13 June 2024, 21:53:10 UTC	install: Mount /var/run/netns This is required to be able to exec into pod network namespaces by opening `/var/run/netns/<pod-netns>`. The current use case is to be able to terminate stale pod connections when socket-LB is enabled. Signed-off-by: Aditi Ghag <aditi@cilium.io>	16 July 2024, 16:13:50 UTC
d2f642a	Aditi Ghag	11 July 2024, 22:25:30 UTC	config: Add flag for socket LB pod connections termination Signed-off-by: Aditi Ghag <aditi@cilium.io>	16 July 2024, 16:13:50 UTC
02415be	Louis DeLosSantos	11 July 2024, 02:16:25 UTC	node: disambiguate nodeID allocation variable Fly-by nit, but semantics matter. The returned ID from n.allocateIDForNode could be for the local node. The variable makes the reader think otherwise. Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>	16 July 2024, 08:04:05 UTC
3c7b20d	Julian Wiedmann	15 July 2024, 11:49:40 UTC	bpf: proxy: only change packet type when needed Pull the ctx_change_type() call into the code path that's described by the code comment. Thus limiting the cases where we touch the packet type. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>	16 July 2024, 03:05:45 UTC
186b84e	Julian Wiedmann	27 May 2024, 13:13:56 UTC	bpf: fib: only touch ext_err on an actual error The ext_err variable should only be set on an actual error. But our FIB redirect code currently also sets it to transport the FIB lookup result into fib_do_redirect(). Therefore an unrelated error in the subsequent code flow will inherit this ext_err value, and report a confusing error status. Fix up a few parts to only set ext_err on an unsupported FIB lookup result, and move the FIB lookup result into a separate variable where needed. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>	16 July 2024, 02:32:47 UTC
e402a56	Julian Wiedmann	27 May 2024, 13:05:21 UTC	bpf: fib: extract error check from fib_do_redirect() Evaluate the FIB lookup result in each caller. Or more precisely, in each user of fib_lookup_*(). Signed-off-by: Julian Wiedmann <jwi@isovalent.com>	16 July 2024, 02:32:47 UTC
522b98e	Julian Wiedmann	08 January 2024, 07:52:25 UTC	bpf: fib: tests: report the correct flags value on error Report the flags that we're complaining about. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>	16 July 2024, 02:32:47 UTC
c57bfb6	Alexander Berger	13 July 2024, 10:18:17 UTC	fix: support validation of stringToString values in ConfigMap Fixes: #33095 https://github.com/cilium/cilium/issues/33095 Signed-off-by: Alexander Berger <alex-berger@gmx.ch>	16 July 2024, 01:56:06 UTC
36ae9be	Liam Parker	12 July 2024, 16:27:07 UTC	Update multicast.rst missing 5 in second octet of example Signed-off-by: Liam Parker <liamchat500@gmail.com>	16 July 2024, 01:21:17 UTC
38e9666	Joe Stringer	15 July 2024, 23:49:38 UTC	README: Update releases Signed-off-by: Joe Stringer <joe@cilium.io>	16 July 2024, 01:19:02 UTC
fb47ea5	Jarno Rajahalme	26 June 2024, 15:30:53 UTC	policy: Remove ForEachAllow() and ForEachDeny() from MapState interface Currently there are no users so it is a good time to remove these. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
675a5c0	Jarno Rajahalme	26 June 2024, 16:21:08 UTC	policy: Add visibility keys without ranging over all entries Use the cidr index to find interesting items for visibility key injection without ranging over all entries in the MapState. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
7f1522f	Jarno Rajahalme	25 June 2024, 20:20:34 UTC	policy: Avoid scanning all entries with auth rules Use the port/proto trie index to exclude uninteresting keys from the scan needed for propagating auth properties from more general rules to more specific ones. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
be49b7e	Jarno Rajahalme	26 June 2024, 13:32:03 UTC	policy: Propagate proxy port and auth type in added Auth L3/L4 entries The added L3/L4 entries were created with 0 proxy port, while the proxy port should be copied from the L4 rule. Auth type should propagate from a L3 rule with protocol and wildcard port when another L3-wildcard rule without auth type has the same protocol and a specific port. In this case a new entry with the L3 and proto from the L3-rule and the port (and the same protocol) from the L4-only rule should be created. Add unit testing to validate the above. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
83e85ec	Jarno Rajahalme	01 July 2024, 06:56:42 UTC	policy: Use CIDR index instead of scanning all entries for deny Use PortProto and CIDR tries instead of scanning all entries when preserving precedence for deny rules. On a test benchmark with 1000 random allow and deny CIDR rules each, this speeds up MapState generation by >1800x, hopefully making deny policies useful in practical situations. An optional validator interface is added to mapState so that we can keep the existing identityIsSupersetOf logic in unit tests as additional validation that the ancestor and descendant logic is visiting only expected entries. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
036b921	Jarno Rajahalme	28 June 2024, 07:18:26 UTC	policy: Add cidr index to mapstate Add CIDR index to mapStateMap alongside the ID map. This allows for more efficient lookup of super/sub-set entries for deny rule insertion. This requires passing Identities interface along so that the CIDR can be gotten from the selector cache. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
a86501a	Jarno Rajahalme	26 June 2024, 13:48:29 UTC	policy: Simplify by using mapStateMap directly Use the type mapStateMap directly rather than via mapState when adding l3l4 entries for Auth policies. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
98fae35	Jarno Rajahalme	26 June 2024, 13:37:10 UTC	policy: Copy L4-key instead of L3-only key Copy the L4-key when creating new L3/L4 entries due to policy precedence rules and copy the Identity from the L3-only key. This way the code is simpler and less error prone. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
14dbc0d	Jarno Rajahalme	22 June 2024, 21:21:58 UTC	policy: Refactor newMapState Remove the initMap from newMapState() and add withState(), and the same for the interface returning versions (NewMapState() and WithState()). This helps reduce the refactoring changes in the next commit. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
2cb809d	Jarno Rajahalme	28 June 2024, 07:03:08 UTC	policy: Make mapstate modifiers private Make mapstate modifiers private. This helps later when modifiers need to have selector cache locked. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
8473be3	Jarno Rajahalme	21 June 2024, 21:38:48 UTC	policy: add a main mapstate map Add back the simple Go map to hold MapStateEntries, make trie just an index to it. This change allows adding additional indices in later commits. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
ca30f69	Jarno Rajahalme	19 June 2024, 07:18:23 UTC	policy: avoid indirection for contained mapStateMaps Avoid unnecessary indirection. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
1654c11	Jarno Rajahalme	19 June 2024, 07:55:37 UTC	policy: Avoid copies for allKey Do not copy 'allKey' when not really needed. Have one of them for each traffic direction instead. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
2c8dd44	Jarno Rajahalme	18 June 2024, 14:08:04 UTC	policy: Clarify comment Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
4f391e4	Jarno Rajahalme	25 June 2024, 12:18:53 UTC	policy: Cache prefixes separately Change selector cache to return a netip.Prefix instead of *net.IPNet. This avoids conversions down the line. Store identity prefix for each CIDR identity separately in a sync.Map, so that selector cache mutex is not needed to access them. Rename GetNetsLocked as GetPrefix, as it is now returning a single netip.Prefix and it does not need locking. Suggested-by: Casey Callendrello <cdc@isovalent.com> Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
6363913	Jarno Rajahalme	01 July 2024, 06:47:25 UTC	policy: Add deny rule test and benchmark Add test and benchmark with a mix of deny and allow CIDR rules. Add identity for each CIDR so that 'toMapState' has some work to do. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
244d001	Jarno Rajahalme	22 June 2024, 06:07:13 UTC	policy: Generalize bootstrapRepo for different sets of identities Change bootstrapRepo to not generate identities directly, but via the passed in rule generation function. This way the generated identities can be tailored for the rules being used. This becomes useful with additional CIDR tests. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
77a188d	Jarno Rajahalme	22 June 2024, 06:16:43 UTC	policy: Remove dead test code Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	15 July 2024, 20:35:14 UTC
e541aa0	Joe Stringer	11 July 2024, 21:55:40 UTC	docs: Assume ready-to-merge is not for humans Cilium committers today in general have access to merge PRs that both (a) have all codeowner reviews covered for the PR, and (b) all required workflows are passing their tests for the PR. The committer then takes responsibility if they decide to merge the PR. For instance, if the PR passes CI but introduces unreliability into the testsuite, then they should be proactive about addressing any problems introduced by the PR, preparing a revert if necessary, or facilitating the revert if another solution cannot be proposed within a reasonable timeframe. Once a PR passes workflow checks and has all required codeowner approvals, the Maintainer's Little Helper should set the "ready-to-merge" label. @cilium/tophat will periodically review the list of PRs that are ready to merge, and will merge them. We have previously experienced situations where contributors may set the "ready-to-merge" label before the required review and testing steps have been completed, and this can lead to breakage in the main branch. This should be avoided wherever possible. As such, this commit removes wording that presumes that _humans_ will set the label, and prefers wording where it is assumed that _robots_ will set the label, then humans may be there mostly for a sanity check. This should help to formalize generally two expectations for PRs: - All PRs must pass all required CI checks. Always use /test commands. - Do not use "ready-to-merge" label as a way to bypass required checks. There will always be exceptions to these rules, and we can deal with those exceptions on an ad-hoc basis through communication on the PR. If the above rules are not working very well, then we can also iterate on the implementation of them to make them work better, for instance by reviewing codeowners groups or adjusting required CI jobs. Signed-off-by: Joe Stringer <joe@cilium.io>	15 July 2024, 17:03:31 UTC
9b9dacf	Marco Iorio	11 July 2024, 15:18:56 UTC	etcd: explicitly initialize lastHeartbeat in statusChecker The lastHeartbeat value is updated upon reception of heartbeat events. Currently, it is additionally set upon the list done event: while the outcome is technically correct, leading it to be always initialized before starting the statusChecker, it is also a code smell, as it mixes responsibilities. Let's instead move the initialization to the statusChecker function itself, and drop the subsequent IsZero check, which already led to issues in the past. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	15 July 2024, 16:39:57 UTC
dcebbfa	Marco Iorio	09 July 2024, 16:17:37 UTC	etcd: don't create unnecessary sessions in the clustermesh context Currently, the etcd client leverages a session to wait for the establishment of the initial connection. However, this potentially introduces a quite significant overhead in the context of clustermesh, due to the usage of short-lived (25 seconds) etcd sessions which grow in number proportionally to the Cilium agents in each given cluster, times the number of clusters in the clustermesh. Yet, these sessions, despite being renewed until the client gets closed, are never used after this initial check. In an effort to improve the overall clustermesh performance, and reduce the overhead introduced on the etcd instances, let's rework this logic to not depend on the creation of a session in the clustermesh context (i.e., if the lock quorum check is disabled), but rather just wait until the heartbeat watcher is started successfully (i.e., the list operation completed). Most notably, this (a) preserves the current behavior of the etcd client reporting readiness only once successfully connected, (b) guarantees that there's already been an interaction with the target etcd instance (even if the session check is disabled), to ensure that its corresponding cluster ID has been retrieved if using the "clusterLock" interceptors (i.e., in the clustermesh context) and (c) prevents the client from turning ready if the heartbeat watcher cannot be started, avoiding regressions in this respect and ensuring that the status checker logic can operate correctly. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	15 July 2024, 16:39:57 UTC
2931622	Marco Iorio	09 July 2024, 15:55:16 UTC	etcd: simplify heartbeat watcher setup logic An etcd watcher is automatically closed upon context cancellation, hence we don't need any special handling for that. Additionally, let's make the events channel unbuffered, as there's no point in performing any buffering in this context. Finally, let's ignore deletion events, as they do not represent a valid heartbeat signal. Conversely, we consider a ListDone event a valid signal, to ensure that the lastHeartbeat variable is set to a non-zero value, effectively starting the status check. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	15 July 2024, 16:39:57 UTC
218c63a	Marco Iorio	09 July 2024, 15:42:45 UTC	etcd: make waitForInitLock function synchronous Convert the waitForInitLock function to run synchronously, rather than internally starting a separate goroutine and returning the result via a channel. This makes the overall logic more straightforward to reason about, especially when the lock quorum check is disabled (this check has been moved to the beginning of the function). While at it, let's also rename the function to maybeWaitForInitLock, to make explicit the fact that this check may be skipped if disabled via the corresponding option. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	15 July 2024, 16:39:57 UTC
a47d9c9	Feroz Salam	15 July 2024, 14:50:21 UTC	Allow Renovate to bump to golang v1.22 Golang v1.21 will be EOL soon – this change will allow us to keep on top of security updates to the standard library. Signed-off-by: Feroz Salam <feroz.salam@isovalent.com>	15 July 2024, 16:22:38 UTC
3de8537	Martynas Pumputis	11 July 2024, 07:16:16 UTC	daemon: Do not require socketLB for BPF masq Previously, we required socketLB to be enabled in order for BPF masquerade to properly function. The reasoning was outlined in [1] and [2]. As pointed out by Julian Wiedmann, [3] resolved the following NAT reply issue: On the remote node, the reply (dst=the client node IP) gets masqueraded by the BPF-masq feature, because we masquerade pod -> remote host IP in the tunnel mode (see comment in the "snat_v4_needed()" for the reason), and currently we don't consult the CT map to see whether a packet is a reply. Thus, we can remove the check. [1]: https://github.com/cilium/cilium/issues/15437 [2]: https://github.com/cilium/cilium/commit/50e59c309e6d86adad84dc175678a91dce6def03 [3]: https://github.com/cilium/cilium/pull/17168 Signed-off-by: Martynas Pumputis <m@lambda.lt>	15 July 2024, 14:34:41 UTC
511ae88	Aleksander Mistewicz	09 July 2024, 14:00:49 UTC	Present potential errors as metrics instead of potentially spamming logs Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>	15 July 2024, 14:02:23 UTC
9c6a1d0	Aleksander Mistewicz	04 July 2024, 14:24:33 UTC	Introduce a hard limit for the number of ACT metrics Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>	15 July 2024, 14:02:23 UTC
c6ad9eb	Aleksander Mistewicz	04 July 2024, 13:56:49 UTC	Add Garbage Collector to Active Connection Tracking Metrics Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>	15 July 2024, 14:02:23 UTC
3d77ccb	Aleksander Mistewicz	10 June 2024, 14:37:33 UTC	Add Active Connection Tracking metrics It is recommended to set conntrack-gc-max-interval to 1m to clean up failed connections in a timely manner. Metrics will disappear once they have been stale for 10 minutes. Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>	15 July 2024, 14:02:23 UTC
e076845	Aleksander Mistewicz	26 June 2024, 09:59:29 UTC	Add pkg/act to CODEOWNERS Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>	15 July 2024, 14:02:23 UTC
a8a7b60	Aleksander Mistewicz	11 June 2024, 12:42:44 UTC	Abort initialization for zero-sized ACT map Signed-off-by: Aleksander Mistewicz <amistewicz@google.com>	15 July 2024, 14:02:23 UTC
3da3216	Rastislav Szabo	01 July 2024, 07:08:03 UTC	ci: Enable BGP Control Plane testing in e2e tests Enables BGP Control Plane testing in ci-e2e and ci-e2e-upgrade workflows. Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>	15 July 2024, 13:30:30 UTC
d734436	Sebastian Wicki	15 July 2024, 09:22:54 UTC	policy/k8s: Fix race in service notification shutdown This fixes a race caught by CI on `main` where `serviceQueue.dequeue` was not woken up when `ctx` was cancelled. The error was that `dequeue` would enter `cond.Wait()` without checking if the context was cancelled before it was called. I have reproduced the race via the following command and verified that this patch fixes it: ``` cd pkg/policy/k8s go test -run '^\QTest_serviceNotificationsQueue\E$' -v . -count 1000 -timeout 1s ``` Fixes: e0fc5e6a0ca3 ("policy/k8s: Fix deadlock in ToServices implementation") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>	15 July 2024, 11:52:36 UTC
f8eba9a	Jussi Maki	08 July 2024, 13:50:52 UTC	datapath: Prefer node IPs for NodePort and Primary The K8s Node IP address should be preferred when choosing the NodePort and Primary (BPF masquerade) addresses. Signed-off-by: Jussi Maki <jussi@isovalent.com>	15 July 2024, 10:56:44 UTC
887bb88	Jussi Maki	08 July 2024, 10:19:17 UTC	datapath: Fix incremental update to NodeAddress table When new node addresses were added after Cilium had started, the update to the internal table of node addresses accidentally removed existing addresses, because it compared the NodeAddress instead of the netip.Addr, so a change in e.g. the NodePort flag was taken to indicate that the address should be removed. As an example: Before: NodeAddress{Device: "eth0", Addr: "10.0.0.5", NodePort: true} After (expected): NodeAddress{Device: "eth0", Addr: "10.0.0.1", NodePort: true} NodeAddress{Device: "eth0", Addr: "10.0.0.5", NodePort: false} After (actual): NodeAddress{Device: "eth0", Addr: "10.0.0.1", NodePort: true} What happened was that the code was removing anything old that wasn't in the new set of addresses, leading it to essentially this comparison: NodeAddress{"eth0", "10.0.0.5", true} != NodeAddress{"eth0", "10.0.0.5", false} The result was that the address "10.0.0.5" was updated and then subsequently deleted. Fix this by reworking the code to keep the old set of addresses as netip.Addr and reducing that set down when inserting the new entries and then finally cleaning up what remains. The prefix match comment on the device name and the prefix length check were removed, as this issue had been fixed in StateDB. As far as we can tell, the only production impact this could have had would have been to users of v1.16 pre-releases or users running v1.15 with the experimental runtime device detection enabled (only then would changes to Table[NodeAddress] have mattered). Fixes: #33234 Signed-off-by: Jussi Maki <jussi@isovalent.com>	15 July 2024, 10:56:44 UTC
c3ba2db	Jussi Maki	10 July 2024, 13:25:38 UTC	datapath: Fix update of fallback addresses when address is removed The fallback addresses in Table[NodeAddress] were not properly updated when the address was removed. Fix this by clearing the addresses first when updating the fallbacks. Signed-off-by: Jussi Maki <jussi@isovalent.com>	15 July 2024, 10:56:44 UTC
abe0acc	Marco Hofstetter	15 July 2024, 07:27:23 UTC	helm: remove duplicate metrics for Envoy pod Currently, enabling Prometheus (`envoy.prometheus.enabled=true`) results in duplicated metrics for the Envoy daemonset Pods if scraping via a dedicated `ServiceMonitor` isn't enabled. The reason is that Prometheus scrape annotations are added to both the `Pod` (in the `DaemonSet`) and the `Service`. Therefore, this commit removes the Envoy K8s `Service` completely, as it is only used for Prometheus scraping. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>	15 July 2024, 09:14:37 UTC
f92c43e	Marco Iorio	12 July 2024, 06:49:05 UTC	CODEOWNERS: assign etcdinit to both kvstore and clustermesh teams The etcdinit logic is used as part of the clustermesh-apiserver init container responsible for performing the etcd initialization, mainly with respect to users and roles. Hence, let's update the codeowners file to assign it to both the kvstore and clustermesh teams. Suggested-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	15 July 2024, 09:04:44 UTC
c56ece3	Fabio Falzoi	12 July 2024, 10:45:49 UTC	ci: Update branch matrix list in call-backport-label-updater In the main branch, the Call Backport Label Updater workflow is just a placeholder that should be copied to each new stable branch just after creation. Doing that enables the workflow in the stable branch to automatically update the backport PR labels once they get merged. In order to be ready for the next stable branch (v1.17), update the list to include the last three stable versions plus v1.17. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>	15 July 2024, 09:04:54 UTC
6a5d338	Michi Mutsuzaki	14 July 2024, 17:40:49 UTC	Run cilium-cli inside Docker I forgot to update these two workflows in #33753. Fixes: 7bf5d1de48 ("Run cilium-cli inside Docker") Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	15 July 2024, 08:30:19 UTC
e0fc5e6	Sebastian Wicki	11 July 2024, 14:30:46 UTC	policy/k8s: Fix deadlock in ToServices implementation This commit fixes a deadlock between the `k8s.ServiceCache` and `policyWatcher`. The interaction between the two components is as follows: 1. `ServiceCache` observes a service change event; this is forwarded to `policyWatcher` in a buffered channel (default capacity 128). 2. `policyWatcher` receives the service event, determines if any CNPs are affected by this service event, and then calls back into the `ServiceCache` via `ForEachService` to determine the endpoints for each service. This unfortunately can lead to a deadlock when the notifications channel fills up: `ServiceCache` will attempt to send the next notification with its mutex held, while `policyWatcher` concurrently attempts to call into `ForEachService`, which attempts to acquire the `ServiceCache` mutex, thereby causing a deadlock. This commit works around this issue by queueing received notifications in an unbounded queue. This way, the `ServiceCache` sender is never blocked, as it will always be able to enqueue a notification. This is not a very elegant solution, but it solves the issue without restructuring the code too much. The proper long-term solution is likely to use per-CIDR labels to implement `ToServices` in policy, similar to cilium/cilium#33441. This would not only simplify the `policyWatcher` logic and decouple it from the `ServiceCache`, it would also reduce the number of policy maps updated whenever the endpoints of a selected service change. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>	15 July 2024, 07:51:48 UTC
cacd5dd	Michi Mutsuzaki	14 July 2024, 17:34:33 UTC	time: Add UTC cilium-cli uses time.UTC variable [^1], so we'll need it when cilium-cli repo gets merged to cilium repo [^2]. [^1]: https://github.com/cilium/cilium-cli/blob/9ffcbb478ceb038b8b643dccd4f693efd47475e5/sysdump/writers.go#L74-L75 [^2]: https://github.com/cilium/design-cfps/pull/9 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	14 July 2024, 23:11:31 UTC
3fa34b1	cilium-renovate[bot]	14 July 2024, 02:28:48 UTC	chore(deps): update all lvh-images main Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>	14 July 2024, 12:12:16 UTC
575dbcc	cilium-renovate[bot]	14 July 2024, 08:13:40 UTC	fix(deps): update aws-sdk-go-v2 monorepo Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>	14 July 2024, 11:10:45 UTC
7062673	cilium-renovate[bot]	14 July 2024, 02:28:41 UTC	chore(deps): update all github action dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>	14 July 2024, 10:53:17 UTC
ce267df	cilium-renovate[bot]	14 July 2024, 00:27:47 UTC	chore(deps): update all github action dependencies Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>	14 July 2024, 10:40:59 UTC
cd8f44c	cilium-renovate[bot]	14 July 2024, 04:12:46 UTC	chore(deps): update cilium/little-vm-helper action to v0.0.19 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>	14 July 2024, 06:56:44 UTC
00ae9e3	Cilium Imagebot	14 July 2024, 00:32:01 UTC	images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io>	14 July 2024, 06:53:34 UTC
5b91714	cilium-renovate[bot]	14 July 2024, 00:27:35 UTC	chore(deps): update go Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>	14 July 2024, 06:53:34 UTC
7bf5d1d	Michi Mutsuzaki	12 July 2024, 00:32:02 UTC	Run cilium-cli inside Docker - Run cilium-cli inside a container in preparation to merge cilium-cli repo to cilium repo as proposed in CFP-25694 [^1]. - Move "Install Cilium CLI" step after the cluster creation step so that cilium-cli can access .kube/config file. - Run "aws configure" for workflows that run on an EKS cluster. - Add --disable-check=minimum-version flag to cilium install. Checking Kind version doesn't make sense when you run cilium-cli from inside a container since it cannot access the kind binary on the host. - Set --conn-disrupt-test-restarts-path flag. The default path under /tmp doesn't work, as cilium-cli can only access the current working directory when running inside a container [^2]. [^1]: https://github.com/cilium/design-cfps/pull/9 [^2]: https://github.com/cilium/cilium-cli/blob/cbc20a32e7996113e202aa13bdcd637dc05e66af/.github/tools/cilium.sh#L11 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	13 July 2024, 10:37:13 UTC
7febf7e	Steven Johnson	10 July 2024, 15:15:25 UTC	Add metric for identity GC latency. This is useful for measuring changes that could impact GC performance e.g. qps throttling, moving from crd -> kvstore etc Signed-off-by: Steven Johnson <sjdot@protonmail.com>	12 July 2024, 23:43:59 UTC
5d8a2d9	Satish Matti	20 June 2024, 23:10:30 UTC	fix: trigger host endpoint regeneration only when required Explicitly check if host endpoint's identity labels are changed before calling endpoint.UpdateLabelsFrom function. Without this change, host endpoint is regenerated even if the old and new identity labels are the same. Signed-off-by: Satish Matti <smatti@google.com>	12 July 2024, 22:55:49 UTC
9707a51	Michi Mutsuzaki	11 July 2024, 21:41:59 UTC	ipsec: Run cilium-cli inside Docker - Run cilium-cli inside a container in preparation to merge cilium-cli repo to cilium repo as proposed in CFP-25694 [^1]. - Move "Install Cilium CLI" step after "Create kind cluster" step so that cilium-cli can access .kube/config file. - Add --disable-check=minimum-version flag to cilium install. Checking Kind version doesn't make sense when you run cilium-cli from inside a container since it cannot access the kind binary on the host. - Set --conn-disrupt-test-{restarts-path,xfrm-errors-path} flags. The default path under /tmp doesn't work, as cilium-cli can only access the current working directory when running inside a container [^2]. [^1]: https://github.com/cilium/design-cfps/pull/9 [^2]: https://github.com/cilium/cilium-cli/blob/cbc20a32e7996113e202aa13bdcd637dc05e66af/.github/tools/cilium.sh#L11 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	12 July 2024, 21:00:31 UTC
287e530	Marco Iorio	11 July 2024, 10:29:19 UTC	clustermesh: fix rare deadlock due to race condition The clustermesh subsystem is currently affected by a possible, although rare, race condition in the cluster config retrieval logic, which can cause a deadlock due to attempting to write to an unbuffered channel no one is receiving from anymore. More in detail, this can happen upon context expiration, as at that point the select statement unblocks, and the function returns, with a deferred function waiting for the termination of the controller. However, the controller body may still attempt to send to the channel, if the context got canceled after having already retrieved the cluster configuration. While extremely unlikely to occur in a production environment, this issue manifested in CI [1], and can be reliably reproduced adding a sleep statement before sending to the `cfgch` channel, and running the `TestClusterMeshMultipleAddRemove` test. Let's fix this by making the channel buffered, to ensure that the controller never blocks while attempting to send to it. [1]: https://github.com/cilium/cilium/actions/runs/9887657579/job/27309862099 Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	12 July 2024, 16:53:43 UTC
c735ab0	Gilberto Bertin	12 July 2024, 14:51:29 UTC	workflows: e2e-upgrade: bump timeout to 90 minutes Signed-off-by: Gilberto Bertin <jibi@cilium.io>	12 July 2024, 16:41:39 UTC
cb4144b	Paul Chaignon	12 July 2024, 14:43:05 UTC	workflow: Use per-tunnel keys for the IPsec upgrade test All users should now be using per-tunnel keys (that is, with the + sign in the IPsec secret) instead of the insecure global key. Our up/downgrade test for IPsec should therefore reflect that. Reported-by: Marcel Zieba <marcel.zieba@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>	12 July 2024, 15:59:11 UTC
3405535	Nicolas Busseneau	12 July 2024, 14:41:34 UTC	ci: allow all GKE K8s release channels In dd947b3a383098a5df39616a2a8850ac64072bcd, we introduced filtering of supported K8s versions for GKE, but restricted it to the regular release channel. However, new versions of K8s are added to the rapid release channel, and older K8s versions might stick around longer in the stable release channel, and we do not specifically request any release channel to be used when we create clusters, so we want all release channels to be considered for availability in the filtering mechanism. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>	12 July 2024, 15:38:57 UTC
5a76cf2	Aditi Ghag	10 July 2024, 16:14:44 UTC	bpf: Fix complexity issue failing cil_sock{4,6}_connect prog load Verifier complexity error ``` {"level":"fatal","msg":"failed to start: daemon creation failed: error while initializing daemon: failed while reinitializing datapath: failed loading eBPF collection into the kernel: program cil_sock4_connect: load program: permission denied: 373: (07) r9 += 4: R9 pointer arithmetic on map_value_or_null prohibited, null-check it first (759 line(s) omitted)","subsys":"daemon"} 2024-07-10T08:01:19.145344750Z Verifier error: program cil_sock6_connect: load program: permission denied: 399: (07) r7 += 4: R7 pointer arithmetic on map_value_or_null prohibited, null-check it first (611 line(s) omitted) ``` Dylan reported - ``` We get a map value back here https://github.com/cilium/cilium/blob/main/bpf/bpf_sock.c#L387 Which we then null-check like we are supposed to do. However, for some reason the generated byte code does the backend->port access first so https://github.com/cilium/cilium/blob/main/bpf/bpf_sock.c#L422 or https://github.com/cilium/cilium/blob/main/bpf/bpf_sock.c#L434 We don't have the actual bytecode at the moment, but what I suspect is happening is that the compiler is computing the offset of backend->port without accessing the memory before the null check. ``` Add a barrier call so that compiler doesn't reorder the memory accesses. Signed-off-by: Aditi Ghag <aditi@cilium.io>	12 July 2024, 14:21:48 UTC
6e17c40	Marco Iorio	10 July 2024, 15:37:24 UTC	bugtool: scrape heap profiles in protocol buffer format by default By default, cilium-bugtool currently scrapes heap profiles in the legacy text format (i.e., debug=1), rather than in gzipped-compressed protocol buffer format (i.e., debug=0) [1]. Let's flip this value to use the modern format by default, as it allows to run `go tool pprof` without having to explicitly specify the target binary, and produces an output file an order of magnitude smaller, based on a local test. Additionally, this also ensures consistency with the cpu profile, which is hardcoded to using the binary format. [1]: https://pkg.go.dev/runtime/pprof#pkg-notes Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	12 July 2024, 14:07:48 UTC
7294716	Fabio Falzoi	09 July 2024, 17:25:25 UTC	policy/k8s: Add GH issue 33432 non regression test Add a non regression test for GitHub issue #33432: updating a CNP with a nil ToEndpoints slice to an empty non-nil ToEndpoints slice should result in the update not being discarded and the rules in the policy repository reporting the empty non-nil ToEndpoints slice. Related: https://github.com/cilium/cilium/issues/33432 Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>	12 July 2024, 14:07:39 UTC
410028d	Fabio Falzoi	01 July 2024, 14:48:36 UTC	policy/api: Use a custom DeepEqual method to compare EgressCommonRule The semantic of a nil slice in one of the EgressCommonRule fields is different from the semantic of an empty non-nil slice. In order to compare two EgressCommonRule instances correctly, we add a custom DeepEqual method that explicitly checks for this case before calling the autogenerated method. This allows to correctly propagate CNP/CCNP updates when, for example, the ToEndpoints selector is changed from nil to an empty non-nil slice. In the latter case, the CNP should not select any identity, falling back to the default behavior (e.g.: default deny for an allow policy). Fixes: 6ccd044b05 ("policy: Do not select any identity with egress empty slices") Fixes: https://github.com/cilium/cilium/issues/33432 Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>	12 July 2024, 14:07:39 UTC
e0c5bb7	Fabio Falzoi	01 July 2024, 14:48:08 UTC	policy/api: Use a custom DeepEqual method to compare IngressCommonRule The semantic of a nil slice in one of the IngressCommonRule fields is different from the semantic of an empty non-nil slice. In order to compare two IngressCommonRule instances correctly, we add a custom DeepEqual method that explicitly checks for this case before calling the autogenerated method. This allows to correctly propagate CNP/CCNP updates when, for example, the FromEndpoints selector is changed from nil to an empty non-nil slice. In the latter case, the CNP should not select any identity, falling back to the default behavior (e.g.: default deny for an allow policy). Fixes: e97df7badd ("policy: Do not select any identity with ingress empty slices") Fixes: https://github.com/cilium/cilium/issues/33432 Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>	12 July 2024, 14:07:39 UTC
9397552	Fabio Falzoi	10 July 2024, 09:11:32 UTC	slices: Add XorNil helper XorNil is a helper that returns true only if one of the two input slices is nil and the other is not. It is useful when comparing two slices and the semantic of a nil slice is different from the semantic of an empty non-nil one. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>	12 July 2024, 14:07:39 UTC
b8d7a18	Julian Wiedmann	12 July 2024, 11:40:42 UTC	bpf: wireguard: skip RevDNAT check for overlay traffic Overlay traffic will never require RevDNAT processing. Preserve the MAGIC_MARK_OVERLAY when redirecting to the wireguard interface, and skip the RevDNAT check there accordingly. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>	12 July 2024, 13:56:43 UTC
2fdd9fb	Julian Wiedmann	12 July 2024, 11:32:14 UTC	bpf: wireguard: fine-tune retrieval of sec identity from mark Only retrieve the identity from the mark when it's identified as MARK_MAGIC_IDENTITY. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>	12 July 2024, 13:56:43 UTC
5752148	cilium-renovate[bot]	12 July 2024, 11:07:08 UTC	chore(deps): update gcr.io/etcd-development/etcd docker tag to v3.5.14 Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>	12 July 2024, 13:44:50 UTC
b360a77	Jussi Maki	12 July 2024, 10:52:48 UTC	vendor: Bump to StateDB v0.2.2 This fixes an issue calculating the next refresh time in the reconciler refreshing code that caused busy looping looking for objects to refresh. Signed-off-by: Jussi Maki <jussi@isovalent.com>	12 July 2024, 12:06:26 UTC
5581044	mrproliu	07 July 2024, 02:46:23 UTC	cert: Adding H2 Protocol Support when Get gRPC Config For Client Signed-off-by: mrproliu <741550557@qq.com>	12 July 2024, 09:23:08 UTC
152867f	Louis DeLosSantos	08 July 2024, 16:31:29 UTC	ipsec: do not explicitly prioritize host endpoint regeneration This code was introduced in #25735. As described well in: 16e446c0ffd6424a1a1b737cb359c6e9ef339e3d bpf: Support the old IP_POOLS logic in bpf_host and b429c42a4f8d7217cc998ffa50d0d18d9e27054b daemon: Reload bpf_host first in case of IPsec upgrade This code was put in place to ensure the refactored 'bpf_host' program, which supports the IP_POOLS hack upgrade compatibility, is installed before any other eBPF prog. Now that we are N+2 versions from this introduction, and we instruct individuals to upgrade consecutively, this code can be removed. Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>	12 July 2024, 08:52:31 UTC
4440433	Tim Horner	08 July 2024, 16:33:46 UTC	operator: remove unused CES constants Remove unused constants related to CiliumEndpointSlice options that were left behind during modularization efforts. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>	12 July 2024, 07:20:23 UTC
8d1fca3	Tim Horner	05 June 2024, 17:14:07 UTC	operator/CES: cleanup and add helm support for slice mode The supported values for the `--ces-slice-mode` option are a bit redundant (cesSliceModeIdentity, cesSliceModeFCFS). This commit shortens them to `identity` and `fcfs`, and adds support for configuring the CES slice mode via Helm. Signed-off-by: Tim Horner <timothy.horner@isovalent.com>	12 July 2024, 07:20:23 UTC
9799002	Marco Iorio	31 May 2024, 14:54:06 UTC	clustermesh: drop deprecated clustermesh-ip-identities-sync-timeout flag The flag got deprecated in [1] in favor of the more generic clustermesh-sync-timeout one, and it is now time to remove it. [1]: cc7c27da59dc ("clustermesh: introduce circuit breaker in wait for synchronization") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	12 July 2024, 07:20:09 UTC
0d896f8	Marco Iorio	31 May 2024, 14:46:04 UTC	clustermesh: remove deprecated has-cluster-config key The ".has-cluster-config" key has been deprecated as no longer necessary in [1], and can now be safely removed. [1]: 65ece676c95d ("clustermesh: add deprecation notice to the has-cluster-config key") Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	12 July 2024, 07:20:09 UTC
2a25a31	Marco Iorio	11 July 2024, 08:03:05 UTC	gha: compress profiles archives in scale/perf tests This allows to reduce the resulting size of the archive between half and one fifth of the original size, reducing the space used in the bucket, and the subsequent download time. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>	12 July 2024, 07:19:48 UTC
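Commits 9397552, e0c5bb7 and 410028d rely on the difference between a nil slice and an empty non-nil slice. XorNil returns true only if exactly one input is nil, and the custom DeepEqual methods check this case before deferring to the autogenerated comparison. The sketch below illustrates the idea under several assumptions: the generic `XorNil` signature is a guess, and `selector`, `rule`, and `elementsEqual` are hypothetical, trimmed-down stand-ins. `elementsEqual` models a comparison that looks only at length and elements, which the commit messages imply the autogenerated method does.

```go
package main

import "fmt"

// XorNil reports whether exactly one of a and b is nil.
// Sketch only: the real helper's signature in Cilium may differ.
func XorNil[T any](a, b []T) bool {
	return (a == nil) != (b == nil)
}

// selector is a hypothetical stand-in for an endpoint selector.
type selector struct{ Label string }

// rule is a hypothetical stand-in for IngressCommonRule, reduced to one slice field.
type rule struct {
	FromEndpoints []selector
}

// elementsEqual compares only length and elements, so it treats a nil slice
// and an empty non-nil slice as equal.
func elementsEqual(a, b []selector) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// DeepEqual first rejects the nil-vs-empty case, then falls back to the
// element-wise comparison.
func (r *rule) DeepEqual(o *rule) bool {
	if XorNil(r.FromEndpoints, o.FromEndpoints) {
		return false
	}
	return elementsEqual(r.FromEndpoints, o.FromEndpoints)
}

func main() {
	oldRule := &rule{FromEndpoints: nil}          // selector not set
	newRule := &rule{FromEndpoints: []selector{}} // set, but selects nothing

	fmt.Println(elementsEqual(oldRule.FromEndpoints, newRule.FromEndpoints)) // true: update looks like a no-op
	fmt.Println(oldRule.DeepEqual(newRule))                                  // false: update is propagated
}
```

If nil and empty compared as equal, the update from 33432's scenario would be discarded as unchanged. The explicit `XorNil` check is what lets it through.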