https://github.com/cilium/cilium

Revision Author Date Message Commit Date
e3a3c9f test: Update unit-tests/build.sh Remove the comment related to travis, also run with all available procs. Signed-off-by: Tam Mach <tam.mach@cilium.io> 02 April 2024, 12:52:02 UTC
8b10d0d test: Switch to tag version for tparse v0.13.2 Signed-off-by: Tam Mach <tam.mach@cilium.io> 02 April 2024, 12:52:02 UTC
2fc6922 README: Update releases Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 02 April 2024, 12:24:41 UTC
76867e2 feat: Add the http return code to metric api_processed_total Signed-off-by: Vipul Singh <vipul21sept@gmail.com> 02 April 2024, 11:47:49 UTC
d32b438 Apply suggestions from code review Co-authored-by: Ryan Drew <learnitall0@gmail.com> Signed-off-by: simonfelding <45149055+simonfelding@users.noreply.github.com> 02 April 2024, 09:58:01 UTC
430d023 docs: Suggest operator logs for troubleshooting Signed-off-by: simonfelding <45149055+simonfelding@users.noreply.github.com> undo final newline 02 April 2024, 09:58:01 UTC
a63a88b No longer true as of Istio 1.21 Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io> 02 April 2024, 07:44:24 UTC
f0597c0 bpf: use `bpf_htons` instead of using shift The current implementation using shift does not take into account endianness. `bpf_htons()` detects which endianness is used and converts the value appropriately. Also, this commit defines `bpf_u8_to_be16()` that wraps `bpf_htons()` because converting 8-bit ICMP types to 16-bit does not depend on the host byte order. Signed-off-by: Tomoki Sugiura <tomoki-sugiura@cybozu.co.jp> 02 April 2024, 00:05:58 UTC
8e1c73d api: Upgrade go-swagger version to v0.30.5 Also to add the renovate configuration for auto update version later. Just a note we might still need to run `make generate-api` manually till the work with self-hosted renovate with post-hook is done. Signed-off-by: Tam Mach <tam.mach@cilium.io> 01 April 2024, 21:40:22 UTC
7f505d7 IPAM: Refactors Node Type to Support IP Families Previously, the IPAM Node type represented IP information such as pools, allocations, etc. that are specific to IPv4. This PR introduces the following changes: - Adds the IPAllocAttrs type to represent IP-specific allocation attributes. - Updates the Node type to expose separate attributes for IPv4 and IPv6. - Updates Node instantiation, methods, etc. for the Node type changes introduced in this PR. - Updates the internal resyncStats API to expose separate attributes for IPv4 and IPv6 node statistics. - Updates the AllocationAction API to expose separate IP allocation attributes for IPv4 and IPv6. Note that the `EmptyInterfaceSlots` is not IP family specific and therefore will continue to be a `Statistics` field. - Updates cloud provider IPAM pkgs for API changes. __Note:__ This PR does not implement IPv6 Node attributes. Supports: #19251 Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> 01 April 2024, 16:45:34 UTC
65807c2 pkg/ip: Updates PrefixToIps() to Limit the Number of Returned IPs Previously, PrefixToIps() generates and returns all the IP addresses in the provided CIDR. This creates performance and scalability issues when working with large IPv6 CIDRs. This PR adds the `maxIPs` parameter to limit the number of generated and returned prefixes. Supports #19251 Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io> 01 April 2024, 16:37:39 UTC
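The idea behind the `maxIPs` limit in 65807c2 can be sketched as follows. This is an illustrative Go example built on the standard `net/netip` package, not the actual `PrefixToIps()` implementation; the function name `prefixToIPs` and its exact signature are assumptions.

```go
package main

import (
	"fmt"
	"net/netip"
)

// prefixToIPs returns at most maxIPs addresses from the given CIDR,
// starting at the network address. (Hypothetical stand-in for the
// function changed in the commit; its real signature may differ.)
func prefixToIPs(cidr string, maxIPs int) ([]netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked() // normalize to the network address

	var ips []netip.Addr
	for addr := prefix.Addr(); addr.IsValid() && prefix.Contains(addr); addr = addr.Next() {
		if len(ips) >= maxIPs {
			break // stop early instead of walking a huge IPv6 range
		}
		ips = append(ips, addr)
	}
	return ips, nil
}

func main() {
	ips, err := prefixToIPs("fd00::/64", 3)
	fmt.Println(ips, err)
}
```

Without the cap, the loop over a /64 would try to produce 2^64 addresses, which is the scalability problem the commit message describes.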
ffe4ce8 policy: Mention EgressDeny in CIDRGroupRef docs Update CIDRGroupRef docstring to take into account the support for referenced CIDRGroup in EgressDeny rules. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 01 April 2024, 15:08:13 UTC
3b62c79 policy/k8s: Add support for CIDRGroupRef in EgressDeny Current version of CNP translation lacks support for translating referenced CiliumCIDRGroup objects in EgressDeny rules. The commit adds the missing logic and extends the unit tests suite to take into account the EgressDeny rules. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 01 April 2024, 15:08:13 UTC
136dde6 policy/k8s: Add support for CIDRGroupRef in IngressDeny Current version of CNP translation lacks support for translating referenced CiliumCIDRGroup objects in IngressDeny rules, despite mentioning it in the CIDRGroupRef field docstring. The commit adds the missing logic and extends the unit tests suite to take into account the IngressDeny rules. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 01 April 2024, 15:08:13 UTC
b8203b0 helm: Bump minimum k8s version to v1.21+ This commit is to bump minimum k8s version to v1.21. Ideally, we should bump to v1.26 as per our support matrix, but some CI jobs are still older versions as per below linked PR, hence I think v1.21 is a good balance. Relates: https://github.com/cilium/cilium/pull/29888 Relates: https://github.com/cilium/cilium/issues/30106 Signed-off-by: Tam Mach <tam.mach@cilium.io> 30 March 2024, 02:23:35 UTC
76a659c loader: only detach Cilium-owned XDP programs when XDP is disabled Currently, even when Cilium's XDP features are disabled, the Cilium agent will still attempt to detach a program attached to the legacy netlink XDP hook on managed interfaces. This is so the agent does the right thing when a user first enables and then disables an XDP feature, where the user would expect Cilium's XDP programs to be removed. However, this is at odds with users wanting to run their own XDP programs on Cilium-managed interfaces. Even with XDP disabled, the agent will unconditionally remove any XDP programs. This patch narrows down this behaviour by checking the name of the program attached to the legacy XDP hook before detaching it. If the kernel-provided name is not a prefix of the name expected by the agent, the program is left on the interface. Note that with XDP enabled, legacy XDP programs will always be replaced with Cilium programs. Signed-off-by: Timo Beckers <timo@isovalent.com> 29 March 2024, 11:24:58 UTC
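The ownership check described in 76a659c can be sketched in Go as below. The prefix direction follows the commit message (the kernel-provided name must be a prefix of the expected name), presumably because the kernel stores BPF object names in a fixed 16-byte field and truncates longer ones. The function `shouldDetach`, the program names, and the empty-name guard are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldDetach reports whether a program attached to the legacy XDP hook
// looks like it belongs to the agent. kernelName is the (possibly
// truncated) name reported by the kernel; expectedName is the full name
// the agent would have used. (Hypothetical helper.)
func shouldDetach(kernelName, expectedName string) bool {
	// Assumption: an unnamed program is never treated as agent-owned.
	if kernelName == "" {
		return false
	}
	return strings.HasPrefix(expectedName, kernelName)
}

func main() {
	const expected = "example_xdp_entry_point" // hypothetical agent program name

	fmt.Println(shouldDetach("example_xdp_ent", expected)) // truncated agent program
	fmt.Println(shouldDetach("user_xdp_prog", expected))   // user-owned program
}
```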
129f2e2 ci/ipsec: Print more info to debug credentials removal check failures In commit 6fee46f9e753 ("ci/ipsec: Fix downgrade version retrieval") we added a check to make sure that GitHub credentials are removed before pulling the untrusted branch from the Pull Request's author. It appears that this check occasionally fails and causes the whole job to abort. But Cilium's repository _is_ public, and it's unclear why ".private == false" does not evaluate to "false" as we expected in that case. Did the curl request fail? Did the reply miss the expected .private field? We'll probably loosen the check as a workaround, but before that it would be interesting to understand better what's going on. Here we remove the -s flag from curl and print the reply from the GitHub API request, so we can better understand what's going on next time we observe a failure. Signed-off-by: Quentin Monnet <qmo@qmon.net> 29 March 2024, 09:06:49 UTC
464bbe4 fix 'mismatch' typos in error messages Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 March 2024, 15:48:57 UTC
7baeac2 multicast: change list methods to use BatchLookup Modifying group list and subscriber list methods to use BatchLookup instead of iterating individual key, val pair. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 28 March 2024, 13:52:39 UTC
dc221b0 multicast: fix multicast map name in ELF ignore prefixes Fix multicast outer map name from cilium_mcast_group_v4_outer to cilium_mcast_group_outer_v4_map. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 28 March 2024, 13:52:39 UTC
806c5c2 pkg/nodediscovery,daemon: modularize node discovery This commit modularizes the node discovery package. Previously, node discovery was created by the daemon, but since all the parameters it needs are already in the hive, we can create node discovery in the hive too. We also split off the creation of the local node config into its own cell, since a few components, such as the loader, are interested in the local node config without needing the full node discovery. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 28 March 2024, 13:52:02 UTC
362c094 images: update cilium-{runtime,builder} Signed-off-by: Cilium Imagebot <noreply@cilium.io> 28 March 2024, 12:19:48 UTC
cd5bc4e testdata: minimize build output by reducing header includes This patch should make testdata play a bit nicer with backports, since including headers like node_config.h, ep_config.h and maps.h cause potential churn in the resulting BTF info. Include a minimal subset of headers and reduce testdata code to what's strictly necessary for the Go tests to run. Signed-off-by: Timo Beckers <timo@isovalent.com> 28 March 2024, 12:19:48 UTC
5c35dc3 Makefile: declare CILIUM_BUILDER_IMAGE in Makefile.defs Centralize the declaration so we can assume it's present in other Makefiles importing Makefile.defs. Signed-off-by: Timo Beckers <timo@isovalent.com> 28 March 2024, 12:19:48 UTC
2d0c970 Remove `HAVE_CHANGE_TAIL` The value of `HAVE_CHANGE_TAIL` was dependent on the result of a feature probe that tests for the presence of the `bpf_skb_change_tail` helper function, which was added in kernel v4.9. Now that the minimum supported kernel version is v5.4, we can remove the probe and assume we always have this feature available. Given the existence of global asserts for significantly newer features, I think it's safe to not add an explicit assert for this feature. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 28 March 2024, 10:13:24 UTC
50547dd Remove `HAVE_SOCKET_LOOKUP` define The `HAVE_SOCKET_LOOKUP` define was used to check if the current kernel had the `bpf_sk_lookup_tcp` helper. This is the case for kernels after 4.20. Now that the minimum kernel version is 5.4, we can remove this define and assume that the kernel has this feature. Given the presence of global assertions for features that are newer than this helper, I believe it is safe to not add explicit assertions for this feature. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 28 March 2024, 10:13:24 UTC
458b5cc test: Update KPR value in ipsec upgrade jobs There is a merge race between the below two PRs, which leads to failure in CI job for ipsec upgrade config 5.15. https://github.com/cilium/cilium/actions/runs/8461160283/job/23180526103 ``` Error: Unable to upgrade Cilium: execution error at (cilium/templates/cilium-configmap.yaml:70:5): kubeProxyReplacement must be explicitly set to a valid value (true or false) to continue. ``` Relates: #31637, #https://github.com/cilium/cilium/pull/31637 Signed-off-by: Tam Mach <tam.mach@cilium.io> 28 March 2024, 08:45:01 UTC
ac804b6 install/kubernetes: use renovate to update quay.io/cilium/startup-script Make sure the latest version of the image is used in the helm charts by letting renovatebot update it automatically. Signed-off-by: Tobias Klauser <tobias@cilium.io> 28 March 2024, 07:40:10 UTC
2d32dab install/kubernetes: use digest for nodeinit image Like other images used in the Cilium helm chart, use a digest in addition to the tag for the nodeinit image. Signed-off-by: Tobias Klauser <tobias@cilium.io> 28 March 2024, 07:40:10 UTC
dbf327d all: remove repetitive words Signed-off-by: deterclosed <fliter@outlook.com> 28 March 2024, 03:07:01 UTC
5daf681 lint: Remove temp variable in the 'for' loop Since golang 1.22+, temp variable in the for loop can be removed. There is new linter copyloopvar in latest golangci-lint, however, there are a lot of false positive now, so probably after a few versions, we can enable it in .golangci.yaml. Signed-off-by: Tam Mach <tam.mach@cilium.io> 27 March 2024, 22:34:36 UTC
6a83269 cleanup: Remove deprecated values for KPR This commit is to remove all deprecated values (strict, disabled, probe and partial) for kubeProxyReplacement. Relates: #26036, #26496 Signed-off-by: Tam Mach <tam.mach@cilium.io> 27 March 2024, 22:33:00 UTC
5864db7 workflows: Cover IPsec encrypted overlay mode in end-to-end tests Encrypted overlay was introduced in d6693413e8afb ("bpf: encrypt overlay traffic"). As the name indicates, with that feature, Cilium will also encrypt the overlay itself (i.e., the VXLAN headers). The present commit covers this configuration in the two IPsec workflows. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 27 March 2024, 17:19:11 UTC
034aee7 fix: Delegated ipam not configure ipv6 in ipv6 disabled case Delegated IPAM may return an IPv6 address to the Cilium CNI even if IPv6 is disabled in the Cilium agent config. In this scenario, IPv6 node addressing is not set, which causes the Cilium CNI to crash when delegated IPAM returns an IPv6 address. Signed-off-by: Tamilmani <tamanoha@microsoft.com> 27 March 2024, 17:16:29 UTC
f2d804b loader: clean up tcx bpf_links created by newer Cilium versions A follow-up commit will introduce attaching TC programs using tcx. Those attachments cannot be overridden using netlink. If an older version of Cilium wants to replace a TC program on a managed interface, it'll need to remove the tcx attachment first. This commit teaches the agent to remove leftover tcx link objects from previous installs, before reattaching it using netlink. Note that this transition is never seamless, since some time passes between deleting the link and attaching the new program using netlink. However, as explained in 7a8e3c810c ("loader: clean up XDP bpf_links created by newer Cilium versions"), this downgrade path should rarely happen. Signed-off-by: Robin Gögge <r.goegge@isovalent.com> Co-authored-by: Timo Beckers <timo@isovalent.com> 27 March 2024, 17:04:38 UTC
e2d90da loader: aggregate replaceDatapath arguments The arguments to the replaceDatapath functions are already quite numerous and make the function signature hard to read. In preparation for future commits, this patch aggregates almost all arguments to the function into one option parameter. Signed-off-by: Robin Gögge <r.goegge@isovalent.com> 27 March 2024, 17:04:38 UTC
377df9b test/verifier: Sort BPF program names for stable output Repeated runs of `go test ./test/verifier` print program complexity in random order. Sorting by external wrappers is not feasible, because there are groups (each object file compiled with a certain set of defines) that need to be sorted individually. Make the output stable. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> 27 March 2024, 12:54:43 UTC
820aa07 workflows: Debug info for key rotations During the key rotations, we compare the number of keys to the expected number to know where we are in the process (started the rotation or finished it). The expected number of keys depends on the configuration so let's print it in the logs to help debug. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 27 March 2024, 11:51:37 UTC
60e7212 test/verifier: Keep existing environment when running make Don't purge the environment when running `make -C bpf` in the verifier tests, because unsetting $PATH and $HOME has numerous undesired side effects: 1. Go is not found in complexity-test little-vm-helper images. 2. Git can't find its config in complexity-test LVH images. 3. The user can't override the path to clang. Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> 27 March 2024, 10:55:27 UTC
283cb04 workflows: ipsec-e2e: add missing key types for some configs These configs were recent additions, and missed the introduction of the key-type-* parameters. Add them now. Suggested-by: Paul Chaignon <paul.chaignon@gmail.com> Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 27 March 2024, 10:55:06 UTC
23dd8de Document the process for disabling workflows Co-authored-by: Quentin Monnet <qmo@qmon.net> Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 27 March 2024, 10:40:39 UTC
d00547a bpf,test: add tests for vxlan helper functions Add unit tests for new vxlan helper functions in tunnel.h Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com> 27 March 2024, 10:11:33 UTC
e1951e9 bpf: add trace notification for overlay encryption Add a trace notification when we are redirecting a packet back into the stack for XFRM encryption. Trace example: -> stack flow 0xc218244b , identity unknown->unknown state encrypt-overlay ifindex 0 orig-ip 0.0.0.0: 172.18.0.3:58167 -> 172.18.0.2:8472 udp Signed-off-by: ldelossa <louis.delos@isovalent.com> 27 March 2024, 10:11:33 UTC
d669341 bpf: encrypt overlay traffic This commit introduces the ability to encrypt overlay traffic before it leaves the host. The 'cil_to_netdev' function is updated to sniff into overlay packets (only VXLAN supported for now) and determine if the ENCRYPTED_OVERLAY_ID security identifier is present in the overlay's header. If it is, a new function in encrypt.h will set the appropriate packet mark on the skb and redirect the packet to the ingress of the interface it was egressing on. When the packet is seen on the ingress side of the device it will be submitted to the XFRM hooks in the output routing path and the XFRM subsystem will encrypt the packet. Subsequent changes to the IPSec control plane to create the appropriate states and policies to support this are required. Signed-off-by: ldelossa <louis.delos@isovalent.com> 27 March 2024, 10:11:33 UTC
8f172e2 ipsec: add encrypted overlay flags This commit adds both the agent and datapath flags required to enable the "Encrypted Overlay" feature. The datapath will use the ENABLE_ENCRYPTED_OVERLAY feature flag. The agent will use "encryption.ipsec.encryptOverlay". Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com> 27 March 2024, 10:11:33 UTC
8503f96 datapath: add EncryptedOverlayID reserved ID 11 This commit adds a new reserved security identity for signaling overlay traffic which must be IPSec encrypted. When the eBPF datapath encounters an egress packet with this security identity in an overlay header (currently only VXLAN supported) it will subject the packet to IPSec encryption and rewrite the overlay header with the correct security identity before the packet leaves the host. Therefore, this identity should NEVER be seen on traffic entering or leaving the node via the network. Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com> 27 March 2024, 10:11:33 UTC
43bd8c1 cilium-health: Fix broken retry loop in `cilium-health-ep` controller This commit fixes a bug in the `cilium-health-ep` controller restart logic where it did not give the cilium-health endpoint enough time to start up before it was re-created. For context, the `cilium-health-ep` controller performs two tasks: 1. Launch the cilium-health endpoint when the controller is started for the first time. 2. Ping the cilium-health endpoint, and if it does not reply, destroy and re-create it. The controller has a `RunInterval` of 60 seconds and a default `ErrorRetryBaseDuration` of 1 second. This means that after launching the initial cilium-health endpoint, we wait for 60 seconds before we attempt to ping it. If that ping succeeds, we then keep pinging the health endpoint every 60 seconds. However, if a ping fails, the controller deletes the existing endpoint and creates a new one. Because the controller then also returns an error, it is re-run after one second, because in the failure case a controller retries with an interval of `consecutiveErrors * ErrorRetryBaseDuration`. This meant that after a failed ping, we deleted the unreachable endpoint, created a new one, and after 1s would immediately try to ping it. Because the newly launched endpoint is unlikely to be reachable after just one second (it requires a full endpoint regeneration with BPF compilation), the `cilium-health-ep` logic would declare the still-starting endpoint as dead and re-create it. This loop would continue endlessly, causing lots of unnecessary CPU churn, until enough consecutive errors had happened for the wait time between launch and the first ping to be long enough for a cilium-health endpoint to be fully regenerated. This commit attempts to fix the logic by not immediately killing an unreachable health endpoint and instead waiting for three minutes to pass before we attempt to try again. Three minutes should hopefully be enough time for the initial endpoint regeneration to succeed. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 27 March 2024, 09:59:03 UTC
e2e97f3 docs: Document No node ID drops in case of remote node deletion While testing cluster scale downs, we noticed that under constant traffic load, we sometimes had drops of type "No node ID found". We confirmed that these are expected when the remote node was just deleted, the delete event received by the local agent, but a local pod is still sending traffic to pods on that node. In that case, the node is removed from the node ID map, but information on pods hosted by that node may still be present. This commit documents it with the other expected reasons for "No node ID found" drops. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 27 March 2024, 07:45:48 UTC
ebf272d contrib: Add devcontainer setup script and doc update. Signed-off-by: Tomoya Fujita <Tomoya.Fujita@sony.com> 27 March 2024, 02:14:15 UTC
9b1a7c3 iptables: Extract runnable interface from iptablesInterface The current iptablesInterface is mainly used to mock the iptables and ip6tables commands in unit testing. Hence it includes the runProgOutput and runProg methods. However, it also includes other methods that are not strictly necessary for testing, so it may be built as an extension of a slim runnable interface that includes just what we need to mock the iptables command execution. As a side benefit, this eliminates the need for mocking the getVersion method. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 26 March 2024, 21:54:53 UTC
4577df2 iptables: Migrate tests to std Go testing pkg Migrate tests from checkmate (the temporary wrapper for gopkg.in/check.v1) to the standard Go testing framework. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 26 March 2024, 21:54:53 UTC
5185252 testing: Update Restore Sort Method Signatures The Sort methods are updated to take an unused testing.T structure to indicate to all callers that they are only for testing purposes. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 26 March 2024, 17:19:11 UTC
abd7c6e fqdn: Fallback to Version 1 Port Lookups In cases where a port-protocol is not present in a restored port protocol, look up the Version 1 version of the PortProto in case a Version 1 PortProto was restored. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 26 March 2024, 17:19:11 UTC
6baab36 endpoint: Create a New Restore Field for DNS DNSRulesV2 accounts for protocol and DNSRules does not. DNSProxy needs to account for both, and endpoint needs to be able to restore from a downgrade. DNSRulesV2 is used by default now, but DNSRules is maintained in case of a downgrade. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 26 March 2024, 17:19:11 UTC
bc7fbf3 fqdn: Add Protocol to DNS Proxy Cache DNS Proxy indexes domain selectors by port only. In cases where protocols collide on port the DNS proxy may have a more restrictive selector than it should because it does not merge port protocols for L7 policies (only ports). All callers of the DNS Proxy are updated to add protocol to any DNS Proxy entries, and all tests are updated to test for port-protocol merge errors. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 26 March 2024, 17:19:11 UTC
1941679 fqdn: Update DNS Restore to Index to PortProto DNS Proxy needs to account for protocol when indexing L7 DNS rules that it needs to adhere to, otherwise L7 rules with differing port-protocols can override each other (nondeterministically) and create overly restrictive, and incorrect DNS rules. The problem with accounting for protocol is that Endpoint restoration logic uses DNS rules that index to port-only as JSON saved to disk. Adding an additional protocol index to a map structure changes the JSON structure and breaks restoration logic between Cilium versions. This change makes the map index backwards compatible, since it changes the index from a uint16 to a uint32, both of which marshal the same into a JSON structure. The endpoint restoration logic will succeed between versions, because the older version will be automatically differentiated with a lack of a 1-bit at bit position 24. Version 2 will save a 1 bit at the 24th bit going forward to differentiate when protocol is indexed or not present. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 26 March 2024, 17:19:11 UTC
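The backwards-compatible index from 1941679 can be sketched as a bit layout. The commit message only states that the index widens from uint16 to uint32 and that a set bit at position 24 marks a Version 2 entry; placing the protocol number in bits 16 to 23, and all names below, are assumptions of this sketch.

```go
package main

import "fmt"

// PortProto is a map index that is either a bare port (Version 1) or a
// port plus protocol with a version marker (Version 2).
type PortProto uint32

// v2Flag is the marker bit at position 24 described in the commit message.
const v2Flag PortProto = 1 << 24

// makeV1 builds the legacy, port-only index. Its numeric value equals the
// old uint16 key, so previously saved JSON still decodes to the same key.
func makeV1(port uint16) PortProto { return PortProto(port) }

// makeV2 builds a protocol-aware index.
// Assumption: the protocol number occupies bits 16-23.
func makeV2(port uint16, proto uint8) PortProto {
	return v2Flag | PortProto(proto)<<16 | PortProto(port)
}

func (pp PortProto) IsV2() bool   { return pp&v2Flag != 0 }
func (pp PortProto) Port() uint16 { return uint16(pp) }
func (pp PortProto) Proto() uint8 { return uint8(pp >> 16) }

func main() {
	old := makeV1(53)
	udp := makeV2(53, 17) // 17 is the IANA protocol number for UDP
	tcp := makeV2(53, 6)  // 6 is the IANA protocol number for TCP

	fmt.Println(old.IsV2(), old.Port())
	fmt.Println(udp.IsV2(), udp.Port(), udp.Proto())
	fmt.Println(udp != tcp) // same port, different protocol, distinct keys
}
```

Because both index versions are plain integers, they marshal into JSON map keys the same way, which is what keeps restoration working across versions.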
54b2ce4 ci-e2e: Add e2e test with WireGuard + Host Firewall To get more coverage about the host firewall, let's add a new job in the e2e test suites to run it alongside WireGuard encryption. Signed-off-by: Quentin Monnet <qmo@qmon.net> 26 March 2024, 14:34:53 UTC
147a9c4 docs,bgpv1: A few minor wording improvements Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 26 March 2024, 14:16:40 UTC
5d682ad docs,bgpv1: Node failure scenario Add a node failure scenario doc Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> Co-authored-by: Harsimran Pabla <128612031+harsimran-pabla@users.noreply.github.com> Co-authored-by: Ryan Drew <learnitall0@gmail.com> 26 March 2024, 14:16:40 UTC
5e5ed75 docs,bgpv1: Add Node Shutdown operation guide Add an operation guide to shut down the node while avoiding packet loss as much as possible. Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com> 26 March 2024, 14:16:40 UTC
f5a34f7 node: Log local boot ID We have very little logging of the boot IDs. Really fixing that will require a bit of work to not be too verbose, but in the meantime, we should at least log the local boot ID. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
98dd97b ipsec: fix per-node-pair-key computation This commit ensures that - each time we compute a per-node-pair-key we create an empty slice with the correct length first, and then append all the input data instead of appending to one of the input slices (`globalKey`) directly. - the IPs that are used as arguments in `computeNodeIPsecKey` are canonical, meaning IPv4 IPs consist of 4 bytes and IPv6 IPs consist of 16 bytes. This is necessary to always have the same inputs on all nodes when computing the per-node-pair-key. Without this IPs might not match on the byte level, e.g on one node the input is a v6 mapped v4 address (IPv4 address in 16 bytes) and on the other it isn't when used as input to the hash function. This will generate non-matching keys. Co-authored-by: Zhichuan Liang <gray.liang@isovalent.com> Signed-off-by: Robin Gögge <r.goegge@gmail.com> 26 March 2024, 13:47:12 UTC
2e321eb k8s: bump CRD schema version When adding the BootID field to the CiliumNode CRD, we forgot to bump the version, which is an issue when, after a Cilium upgrade, the operator tries to update the CiliumNode objects to include the BootID field. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
07711d8 ipsec: disallow empty bootid for key generation A node update that doesn't contain a BootID will cause the creation of non-matching XFRM IN and OUT states across the cluster as the BootID is used to generate per-node key pairs. Non-matching XFRM states will result in XfrmInStateProtoError, causing packet drops. An empty BootID should thus be treated as an error, and Cilium should not attempt to derive per-node keys from it. Signed-off-by: Robin Gögge <r.goegge@gmail.com> 26 March 2024, 13:47:12 UTC
e8ddc88 workflows: Extend IPsec key rotation coverage Since commit 4cf468b91b ("ipsec: Control use of per-node-pair keys from secret bit"), IPsec key rotations can be used to switch from the single-key system to the per-tunnel key system (also referred to as per-node-pair key system). Our key rotation test in CI was updated to cover such a switch. This commit extends it to also cover traditional key rotations, with both the new and old key systems. The switch back into a single-key system is also covered. These special key rotations are controlled with a single + sign. Adding it after the SPI in the IPsec Kubernetes secret is enough to switch to a per-tunnel key system. We thus simply need to cover all 4 cases of having or not having the + sign in the old and new secrets. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
e448644 workflows: Rename argument of key-rotation action to key-algo The subsequent commit will introduce other arguments that are also named "type" so let's make the existing one more precise. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
8e1c313 conn-disrupt: Allowlist XfrmInNoStates packet drops The IPsec fixes will introduce a few XfrmInNoStates packet drops on up/downgrades due to non-atomic Linux APIs (can't replace XFRM states atomically). Those are limited to a very short time (time between two netlink syscalls). We however need to allowlist them in the CI. Since we're using the conn-disrupt GitHub action from main, we need to allowlist in main for the pull request's CI to pass. Note that despite the expected-xfrm-errors flag, the tests will still fail if we get 10 or more such drops. We don't expect so many XfrmInNoStates drops so we still want to fail in that case. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
b511bd1 ipsec: Control use of per-node-pair keys from secret bit The ESN bit in the IPsec secret will be used to indicate whether per-node-pair keys should be used or if the global key should remain in use. Specifically, it consists of a '+' sign after the SPI number in the secret. This ESN bit will be used to transition from a global key system to a per-node-pair system at runtime. We would typically rely on an agent flag for such a configuration. However, in this case, we need to perform a key rotation at the same time as we change the key system. Encoding the key system in the IPsec secret achieves that. By transitioning from the global to the per-node-pair keys via a key rotation, we ensure that the two can coexist during the transition. The old, global key will have XFRM rules with SPI n, whereas the new, per-node-pair keys will have XFRM rules with SPI n+1. Using a bit in the IPsec secret is also easier to test because we already have all the logic to test key rotation (whereas we would need new logic to test a flag change). The users therefore need to perform a key rotation from e.g.: 3 rfc4106(gcm(aes)) [...] 128 to: 4+ rfc4106(gcm(aes)) [...] 128 The key rotation test in CI is updated to cover a rotation from 3 to 4+ (meaning a rotation into the new per-node-pair key system). Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
7a2a18d ipsec: Enable ESN anti-replay protection Now we can enable ESN anti-replay with a window size of 1024. If a node reboots, then everyone updates the related keys with the new one due to the different bootid; the node itself is already generating the keys with the new bootid. The window is used to allow for out-of-order packets: anti-replay still doesn't allow replaying any packet, but keeps a bitmap and can accept out-of-order packets within the window size range. For more information, check section "A2. Anti-Replay Window" of RFC 4303. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
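Reading the '+' marker from b511bd1 could look like the Go sketch below. It only inspects the first field of the secret, as in the `3 ...` and `4+ ...` examples from the commit message; the function name `parseSPIField` and the 8-bit limit on the SPI are assumptions, and this is not the agent's actual parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSPIField splits the first field of an IPsec secret into the SPI
// number and a flag telling whether a trailing '+' requested the
// per-node-pair key system. (Hypothetical helper.)
func parseSPIField(field string) (spi uint64, perNodePair bool, err error) {
	if strings.HasSuffix(field, "+") {
		perNodePair = true
		field = strings.TrimSuffix(field, "+")
	}
	// Assumption: the SPI fits in 8 bits.
	spi, err = strconv.ParseUint(field, 10, 8)
	if err != nil {
		return 0, false, fmt.Errorf("invalid SPI %q: %w", field, err)
	}
	return spi, perNodePair, nil
}

func main() {
	secrets := []string{
		"3 rfc4106(gcm(aes)) [...] 128",
		"4+ rfc4106(gcm(aes)) [...] 128",
	}
	for _, secret := range secrets {
		spi, perNodePair, err := parseSPIField(strings.Fields(secret)[0])
		fmt.Println(spi, perNodePair, err)
	}
}
```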
913995f docs: Document Xfrm{In,Out}NoStates on node reboots When a node reboots the key used to communicate with it is expected to change due to the new boot id generated. While the new key is being installed we may need to do it non-atomically (delete + insert), so packets to/from that node might be dropped which would cause increases in the XfrmNoStatesIn/Out. Add a note about it in the docs so users are not surprised. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
840c88e ipsec: Update existing states when a node's bootid changes When we detect that a node's bootid has changed, we need to update the IPsec states. Unfortunately this is not as straightforward as it should be, because we may receive the new boot ID before a CiliumInternalIP is assign to the node. In such a case, we can't install the XFRM states yet because we don't have the CiliumInternalIP, but we need to remember that the boot ID changed and states should be replaced. We therefore record that information in a map, ipsecUpdateNeeded, which is later read to see if the boot ID changed. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
b409312 ipsec: Use boot IDs when deriving per-node keys We need to ensure we never have two packets encrypted with the same key and sequence number. To that end, in previous commits, we introduced per-node-pair keys. That is however not sufficient. Since the sequence numbers can start from zero on node boot, if a node reboots, it will start sending traffic encrypted again with the same key and sequence number as it did before. To fix that, we need to ensure the per-node-pair keys change on node reboots. We achieve that by using the boot ID in the per-node-pair key calculation. For a pair of nodes A and B with IP addresses a and b and boot IDs x and y, we will therefore install two different keys. Node A: XFRM IN: key(b+a+y+x), XFRM OUT: key(a+b+x+y). Node B: XFRM IN: key(a+b+x+y), XFRM OUT: key(b+a+y+x). This is done such that, for each pair of nodes A, B, the key used for decryption on A (XFRM IN) is the same key used for encryption on B (XFRM OUT), and vice versa. Since we are now retrieving the local node's boot ID as part of the IPsec code, we need to initialize the mocked local node store in the unit tests. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
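The key layout in this commit can be sketched as follows, to show why a reboot yields fresh keys in both directions. The function names, the string encoding of IPs and boot IDs, and the use of an untruncated SHA-256 digest are assumptions made for illustration; only the ordering key(a+b+x+y) / key(b+a+y+x) comes from the commit message.

```python
import hashlib


def key(global_key: bytes, *parts: str) -> bytes:
    """Hash the global key with the ordered parts (encoding is an assumption)."""
    h = hashlib.sha256(global_key)
    for part in parts:
        h.update(part.encode())
    return h.digest()


def xfrm_keys(global_key, local_ip, remote_ip, local_boot, remote_boot):
    """Keys one node would install for a given peer (hypothetical helper)."""
    return {
        # Decrypt what the peer sends: peer's IP and boot ID come first.
        "in": key(global_key, remote_ip, local_ip, remote_boot, local_boot),
        # Encrypt what we send: our own IP and boot ID come first.
        "out": key(global_key, local_ip, remote_ip, local_boot, remote_boot),
    }


g = b"example-global-key"  # placeholder, not a real secret
node_a = xfrm_keys(g, "a", "b", "x", "y")
node_b = xfrm_keys(g, "b", "a", "y", "x")

# After node A reboots, its boot ID changes from "x" to "x2".
node_a_rebooted = xfrm_keys(g, "a", "b", "x2", "y")
```

In this sketch `node_a["out"]` equals `node_b["in"]` and vice versa, and both keys held by `node_a_rebooted` differ from the pre-reboot ones, so a sequence number restarting from zero is never paired with an old key.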
c72e9f4 k8s, node: Add bootid to CiliumNode resource Read and export the local bootid via CiliumNode. We'll need it in a subsequent commit to generate new IPsec keys when a node reboots. This commit also collects the boot_id file as part of the bugtool report. Signed-off-by: Nikolay Aleksandrov <nikolay@isovalent.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Co-authored-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
f1c4a6e ipsec: Allow old and new XFRM IN states to coexist for upgrade This commit extends the logic from commit c0d9b8c9e ("ipsec: Allow old and new XFRM OUT states to coexist for upgrade") to have both the old and the new XFRM IN states in place. This is necessary to avoid packet drops during the upgrade. As with the XFRM OUT states, we can't add the new IN state while the old one is in place. We therefore need to first remove the old state, to then add the new one. See c0d9b8c9e ("ipsec: Allow old and new XFRM OUT states to coexist for upgrade") for details. Note this commit also removes the comparison of output-marks. Output-marks aren't actually used by the kernel to decide if two states conflict. And in the case of XFRM IN states, the output-marks changed a bit as well. Despite being different, the states still conflict. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
c017f65 ipsec: Per-node XFRM IN states We want to have one IPsec key per node1->node2 (not including node2->node1 which will get a different key). We therefore need per-node XFRM states on the receive/decrypt side to carry each node's key. This commit implements that change. Thus, instead of creating a unique XFRM IN state when we receive the local node, we will create an XFRM IN state every time we receive a remote node. Signed-off-by: Paul Chaignon <paul@cilium.io> 26 March 2024, 13:47:12 UTC
7f35fc5 ipsec, bpf: Match XFRM IN states using mark instead of source IP It turns out that two XFRM IN states can't have the same mark and destination IP, even if they have different source IPs. That's an issue in our case because each node1->node2 pair will have its own IPsec key. Therefore, we need one XFRM state per origin node on input. Since we can't differentiate those XFRM states by their source IPs, we will have to differentiate using the marks. To do so, we need to convert the source IP into a packet mark before matching against XFRM states. We can write these packet marks in bpf_network, before going up the stack for decryption. And conveniently, we've just introduced a way to convert each cluster node into an ID, the node ID, which fits in the packet mark. This commit therefore performs a node ID map lookup to retrieve the node ID using the outer source IP address when packets are first processed in bpf_network. We clear the node ID from the packet mark after decryption using XFRM (output-mark). If no node ID is found for the outer source IP, we drop the packet. It seems preferable to drop it from BPF with all the contextual information rather than let it proceed to the XFRM layer where it will be dropped with only an error code incrementing. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
41d74a0 ipsec: Replace states with the old IPsec key In the previous commit, we changed the way we compute the IPsec keys. We therefore need to replace the XFRM states to use the new keys. Our current update logic however doesn't take this case into account. It compares states based on IPs, marks, and SPIs, but it doesn't compare the keys. It would therefore assume that the correct states are already installed. This commit extends that logic to detect a difference in encryption keys and, if such a difference exists, remove the old states. Signed-off-by: Paul Chaignon <paul@cilium.io> 26 March 2024, 13:47:12 UTC
c28e046 ipsec: Compute per-node-pair IPsec keys We need to ensure the (key used, sequence number) tuple for each encrypted packet is always unique on the network. Today that's not the case because the key is the same for all nodes and the sequence number starts at 0 on node reboot. To enable this, we will derive one key per node pair from a global key shared across all nodes. We need it per node pair and not per node because the first message emitted from A to B shouldn't be using the same key as the first message emitted from B to A, to satisfy the above requirement. To that end, for each node pair (A, B), we compute a key as follows: key = sha256(global_key + ip_of_a + ip_of_b). The sha256 sum is then truncated to the expected length. Once computed, we install the derived keys such that the key used for encryption on node A is the same as the key used for decryption on node B. Node A: XFRM IN: key(b+a), XFRM OUT: key(a+b). Node B: XFRM IN: key(a+b), XFRM OUT: key(b+a). Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
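The derivation formula in this commit, key = sha256(global_key + ip_of_a + ip_of_b) followed by truncation, might look roughly like the sketch below. The function name, the packed binary encoding of the IP addresses, and the 16-byte default key length are assumptions; the commit message does not specify them.

```python
import hashlib
import ipaddress


def derive_pair_key(global_key: bytes, src_ip: str, dst_ip: str, key_len: int = 16) -> bytes:
    """Derive a key for traffic from src_ip to dst_ip (illustrative sketch)."""
    # Assumption: IPs are hashed in packed binary form, concatenated in order.
    material = (
        global_key
        + ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
    )
    # SHA-256 yields 32 bytes; keep only as many as the cipher expects.
    return hashlib.sha256(material).digest()[:key_len]


global_key = bytes(range(32))  # placeholder, not a real secret
a, b = "10.0.0.1", "10.0.0.2"

# Node A encrypts towards B with key(a+b); node B decrypts with the same key.
a_out = derive_pair_key(global_key, a, b)
b_in = derive_pair_key(global_key, a, b)
# The reverse direction gets a different key.
b_out = derive_pair_key(global_key, b, a)
```

Because the source address is always hashed before the destination address, each direction of a node pair ends up with its own key even though both are derived from the same global key.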
c0d5d90 ipsec: Move enableIPsecIPv{4,6} preconditions to caller This small bit of refactoring will ease a subsequent commit. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
01e93ba ipsec: New IPsec secret bit to indicate per-node-pair keys Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> 26 March 2024, 13:47:12 UTC
2f82995 daemon,test: remove enableRemoteNodeIdentity flag The enable-remote-node-identity agent flag was marked as deprecated for 1.15 in commit cf472ef ("daemon: Deprecate EnableRemoteNodeIdentity"). Signed-off-by: Donia Chaiehloudj <donia.cld@isovalent.com> 26 March 2024, 13:30:18 UTC
57a13a4 manager: Remove legacy node ip behaviour for remote node identity This commit wipes the code from the legacy remote node ip behaviour. It also removes the `TestRemoteNodeIdentities` test case in `manager_test` and completes `TestIPCache`. Signed-off-by: Donia Chaiehloudj <donia.cld@isovalent.com> 26 March 2024, 13:30:18 UTC
b546034 helm,documentation: Remove enableRemoteNodeIdentity agent flag The enable-remote-node-identity agent flag was marked as deprecated for 1.15 in commit cf472ef ("daemon: Deprecate EnableRemoteNodeIdentity"). This commit removes the option in the helm values and updates the documentation. Signed-off-by: Donia Chaiehloudj <donia.cld@isovalent.com> 26 March 2024, 13:30:18 UTC
c0f9955 contrib/vagrant: Remove enableRemoteNodeIdentity agent flag The enable-remote-node-identity agent flag was marked as deprecated for 1.15 in commit cf472ef ("daemon: Deprecate EnableRemoteNodeIdentity"). This commit removes the option from the deployment tool. Signed-off-by: Donia Chaiehloudj <donia.cld@isovalent.com> 26 March 2024, 13:30:18 UTC
010fb44 bgpv2: adding pod ip pool reconciler Introducing PodIPPoolReconciler in BGPv2 reconcilers. Desired prefixes are calculated based on CiliumBGPAdvertisement of type BGPCiliumPodIPPoolAdvert and matching CiliumPodIPPool prefixes assigned to CiliumNode object. It will use Advertisement reconciler to calculate diff and push desired configuration to underlying router interface. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 26 March 2024, 11:59:07 UTC
8ec942b bgpv2: pod_cidr reconciler to add paths corresponding to address family Adding address family check to make sure we add v4 prefixes to v4 address family and v6 prefixes to v6 address family. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 26 March 2024, 11:59:07 UTC
a00a5cd bgpv2: advertisement reconciler to compute all address families Advertisement reconciler is consumed by multiple reconcilers, all of which work on AFPathsMap instead of a slice of Paths. In order to simplify the other reconcilers, the advertisement reconciler will consume AFPathsMap and reconcile paths per address family. This change also replaces the slice of paths per address family with a map of paths per address family. This ensures that callers do not have duplicate paths in the desired path list and simplifies calculation of the diff. Signed-off-by: harsimran pabla <hpabla@isovalent.com> 26 March 2024, 11:59:07 UTC
3d95fbc bugtool: Collect hubble metrics Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> 26 March 2024, 10:59:45 UTC
7992f75 ingress/gateway-api: sorted virtual hosts Currently, while translating K8s Ingress or Gateway API resources into Envoy resources, the virtualhosts aren't sorted. This leads to situations (especially in combination with Shared Ingress) where the order of the virtual hosts isn't guaranteed. Therefore, this commit orders the virtualhosts within an Envoy RouteConfiguration by their name. This influences the Envoy route matching process (https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/route_matching), but only by making it constant and not random. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 26 March 2024, 07:28:12 UTC
16f6afe ingress: sort all shared ingresses during model generation Currently, when building the model for shared Ingress, all Ingresses in the cluster are listed and processed. The order of the Ingresses can differ and potentially influence the generated CiliumEnvoyConfig. This can lead to unnecessary reconciles. (Even though the internal translation already handles a stable CiliumEnvoyConfig generation where possible.) In addition to the existing stable translation logic, this commit sorts all shared Ingresses by their namespace and name before processing. This way a consistent translation is more likely to be guaranteed. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 26 March 2024, 07:27:17 UTC
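The idea of sorting by namespace and name before processing can be sketched generically. The `Ingress` type and `sorted_ingresses` helper below are hypothetical stand-ins and not Cilium's Go types; the sketch only shows that a compound sort key makes the processing order independent of the listing order.

```python
from typing import NamedTuple


class Ingress(NamedTuple):
    """Hypothetical stand-in for a shared Ingress object."""
    namespace: str
    name: str


def sorted_ingresses(ingresses):
    """Return ingresses ordered by (namespace, name), regardless of input order."""
    return sorted(ingresses, key=lambda ing: (ing.namespace, ing.name))


listing_1 = [Ingress("prod", "web"), Ingress("dev", "api"), Ingress("dev", "admin")]
listing_2 = list(reversed(listing_1))  # same objects, different listing order

# Both listings lead to the same processing order.
order_1 = sorted_ingresses(listing_1)
order_2 = sorted_ingresses(listing_2)
```

Any output generated by walking `order_1` or `order_2` is then identical, which is what avoids spurious differences between reconciliations.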
a2bf108 docs: ipsec: document native-routing + Egress proxy case Let the docs reflect the limitation from https://github.com/cilium/cilium/security/advisories/GHSA-j89h-qrvr-xc36. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 26 March 2024, 06:35:11 UTC
d3e62cb ipam: Remove unused variable Even though this variable is exported, none of the code uses it which makes it safe to remove. Signed-off-by: Chris Tarazi <chris@isovalent.com> 26 March 2024, 05:44:38 UTC
4367ffa docs: add section for scale implications of nodemap size. With previous commits adding the ability to adjust nodemap size, this adds a section explaining the implications of the nodemap sizing. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 26 March 2024, 05:43:51 UTC
3d641e1 nodemap: add validation to check that node map max is at least 16384. This is the constant default size prior to adding the flag. There's not much reason to lower this value so to avoid edge cases we'll just say that this is the lower bound. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 26 March 2024, 05:43:51 UTC
a596a5a helm: add bpf.nodeMapMax helm val to configure node map size. This can be used to override the default node-map-max value which sets bpf node map size. In some cases, node map size may need to be overridden for very large clusters. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 26 March 2024, 05:43:51 UTC
0c3570c nodemap: add node-map-max flag to configure nodemap bpf size. Using the node-map-max flag, one can now override the default 16k node map size. This may be needed for large clusters, where the number of distinct node IPs in the cluster exceeds the standard size. Also provides Size() to nodemap.Map interface such that loader can use this to set the NODE_MAP_MAX var while building bpf programs. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 26 March 2024, 05:43:51 UTC
305ea74 ingress/gateway-api: ordered envoy filterchain for TLS listener Currently, while translating K8s Ingress or Gateway API resources into Envoy resources, the filterchain for TLS listeners is in random order. This leads to situations (especially in combination with Shared Ingress) where the order of the filterchains isn't guaranteed - resulting in unnecessary reconciliations. Therefore, this commit orders the filterchains within an Envoy Listener by the name of the backends. This makes the translation deterministic. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 25 March 2024, 22:19:29 UTC
d8082f9 ingress/gateway-api: ordered envoy filterchain Currently, while translating K8s Ingress or Gateway API resources into Envoy resources, the filterchain is in random order. This leads to situations (especially in combination with Shared Ingress) where the order of the filterchains isn't guaranteed - resulting in unnecessary reconciliations. Therefore, this commit orders the filterchains within an Envoy Listener by the namespace and name of the TLS secret. This makes the translation deterministic. Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com> 25 March 2024, 22:19:29 UTC
a5f2a72 chore(deps): update golangci/golangci-lint docker tag to v1.57.1 Signed-off-by: renovate[bot] <bot@renovateapp.com> 25 March 2024, 16:03:11 UTC
460fc38 bpf: lb: have __lb*_rev_nat() take the source port from CT tuple Instead of loading the source port from the packet, obtain it from the provided CT tuple. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 25 March 2024, 15:17:23 UTC