Revision history - refs/heads/beta/service-mesh - origin: https://github.com/cilium/cilium

Revision	Author	Date	Message	Commit Date
d8460a4	André Martins	09 December 2021, 15:37:33 UTC	.github: add parameter to allow for image suffix This commit adds a workflow parameter to allow for a developer-defined image suffix. It's useful if developers don't want to use the default "beta" prefix for the Docker image repository. Signed-off-by: André Martins <andre@cilium.io>	09 December 2021, 18:24:50 UTC
26a551e	Jarno Rajahalme	07 December 2021, 19:43:28 UTC	helm: Add clusterroles for ciliumenvoyconfigs, ingresses Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:30 UTC
1716da7	Jarno Rajahalme	06 December 2021, 15:31:01 UTC	ingress: Fix generated code Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:30 UTC
c6a5327	Jarno Rajahalme	06 December 2021, 14:47:52 UTC	ingress: Lint fixes Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:30 UTC
030e5c5	Jarno Rajahalme	06 December 2021, 08:32:33 UTC	ingress: Comment on ingress target ports It seems these are ignored in all cases. Ingress still must specify them. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:30 UTC
25f030c	Jarno Rajahalme	06 December 2021, 07:59:00 UTC	ingress: Use unique naming for Envoy resources Prepend Envoy resource names with ingress namespace and name so that multiple ingresses can be configured at the same time. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:30 UTC
d75a5e6	Jarno Rajahalme	03 December 2021, 15:04:48 UTC	ingress: Handle updates Handle ingress updates by updating the CiliumEnvoyConfig if needed. The ingress service is assumed to remain the same. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:30 UTC
178aba0	Jarno Rajahalme	03 December 2021, 12:28:34 UTC	ingress: Minor fixes Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:30 UTC
011a657	Michi Mutsuzaki	16 November 2021, 18:26:14 UTC	specify protocol options Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	07 December 2021, 19:56:29 UTC
5be8a70	Michi Mutsuzaki	01 November 2021, 17:39:56 UTC	try configuring tls Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	07 December 2021, 19:56:29 UTC
8aa3438	Michi Mutsuzaki	19 October 2021, 23:57:45 UTC	add service Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	07 December 2021, 19:56:29 UTC
9d83a8c	Michi Mutsuzaki	31 October 2021, 23:37:57 UTC	Add DeletionTimestamp Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	07 December 2021, 19:56:29 UTC
34d77a1	Michi Mutsuzaki	19 October 2021, 00:05:45 UTC	Generate k8s client and register Ingress types Re-generate k8s client by running `make generate-k8s-api` and register Ingress and IngressClass types. Ref: #17191 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	07 December 2021, 19:56:29 UTC
8546ce5	Michi Mutsuzaki	19 October 2021, 00:03:50 UTC	Add types related to K8s Ingress This commit adds K8s types related to Ingress. The current plan is to implement ingress controller logic in cilium-operator that translates ingress definitions to corresponding CiliumEnvoyConfig custom resources. Ref: #17191 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>	07 December 2021, 19:56:29 UTC
ff43f87	Jarno Rajahalme	01 November 2021, 18:08:38 UTC	datapath: Reduce from LXC complexity Split from-LXC code paths into two tail calls when per-packet load balancing is needed. When per-packet load balancing is not needed, this should have minimal impact on datapath performance. The IPv6 code path is refactored to follow the pattern of the IPv4 code path more closely. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
7e8aee1	Jarno Rajahalme	20 October 2021, 07:23:33 UTC	datapath: Redirect nodeport L7 LB services Redirect L7 LB nodeport service connections received by the host to an Envoy listener. Prior to this commit, these services worked as intended only when the service connection originated from a pod. This commit implements L7 LB redirection for nodeport services received by the node from other nodes or from an external load balancer. Redirect to nodeport L7 LB from bpf_sock for hostns sources. This is needed because this case is not handled by the eBPF datapath at the TC level. Note that this requires a recent kernel to work (Linux 5.12+). With older kernels, nodeport services only work between nodes, not from a node host namespace to the node itself. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
3d475fb	Jarno Rajahalme	21 September 2021, 07:52:29 UTC	bpf: Add support for re-entering LXC egress path after L7 LB Allow host to tail call to lxc egress for policy enforcement after L7 LB. This requires the L7 LB proxy to set the source Endpoint ID in the socket mark. This mark includes a new magic value that the datapath recognizes to subject the traffic to source LXC (policy enforcement) processing again. Note that this does not loop endlessly due to the L7 LB switching from service frontend IP to backend IP. iptables rules are amended to avoid overwriting the new "from L7 LB" mark with the "from host" mark. L7 load balancer upstream connections may go to the egress proxy listener of the source pod. Pod connections to proxy listeners are already not tracked to avoid conflicts due to reuse of the same 5-tuple on both sides of the (egress) proxy. Add NOTRACK rules for L7 LB connections that may go to the egress proxy for the same reason. A new flag 'from_l7lb' is added to conntrack to enable redirecting reply packets back to the L7 LB proxy. Add a new map for bpf_host to tail call into for LXC policy enforcement after L7 LB. Some kernels reached their eBPF complexity limit with the new code compiled in. Address this by making all L7 LB related code in bpf_lxc.c conditional on the new ENABLE_L7_LB. This allows the new feature to be used on kernels that allow a bit more complex eBPF programs. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
0085812	Jarno Rajahalme	17 September 2021, 10:13:11 UTC	envoy: Add xDS resource validation Make sure Envoy xDS resources have names and validate xDS resources before adding them to the xDS cache. This validation cannot be done at k8s CRD apply time, so a k8s CRD can contain invalid Envoy configurations. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
affba99	Jarno Rajahalme	19 October 2021, 14:46:17 UTC	envoy: Do not wait for RDS, CDS, EDS, or SDS Do not wait for routes (RDS), clusters (CDS), cluster load assignments (EDS), or secrets (SDS), as they will only be acked if referenced from a listener that may not exist yet or may have been deleted already. Add Cilium agent option --envoy-config-timeout to limit the maximum time the agent waits for Envoy (Listener) configuration to be acked, defaulting to 2 minutes. This unblocks k8s watchers if the agent ever waits for Envoy for resources that are not being acknowledged. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
61648cf	Jarno Rajahalme	17 September 2021, 12:45:22 UTC	envoy: Register k8s service redirection to CRD listener Register L7 LB service redirections. Use 'L7LBInfo' to keep track of the CiliumEnvoyConfig that requested a service to be registered for L7 load balancing. Only one CiliumEnvoyConfig may request redirection of a given service at a time, as otherwise the choice between Envoy listeners would be ambiguous. Service backends are synced to Envoy, so that the ingress service can load balance to them. Datapath L7LB redirection is only requested for 'services' in a CiliumEnvoyResources. Backends are synced to Envoy also for 'backendServices'. This allows the regular eBPF LB to be used for pod-to-pod traffic for 'backendServices' while L7 LB is used for ingress to those same services. Note: 'Ingress' CEC configurations are still experimental. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
37e4bf3	Jarno Rajahalme	17 September 2021, 12:27:50 UTC	envoy: Allocate CRD listener ports if unspecified Add support for dynamically created (CRD) proxy ports in pkg/proxy. Allocate proxy ports for CRD listeners without Listener.Address specification and install datapath TPROXY rules if Listener.Address is left empty. Allocated proxy ports are sticky in order to return the same port when the same listener (by name) is reconfigured or configured again after deletion. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
776fadc	Jarno Rajahalme	10 September 2021, 13:24:00 UTC	service: Push Envoy Endpoints for L7 LB k8s services Upsert Envoy endpoint resources for services that have been selected for L7 LB in CiliumEnvoyConfig CRDs. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
66afaec	Jarno Rajahalme	06 September 2021, 17:09:37 UTC	lb: Add support for L7 load balancers Add new SVC_FLAG_L7LOADBALANCER to designate services that are not loadbalanced in BPF, but are forwarded to a local TPROXY loadbalancer instead. The set of proxy ports registered to datapath was limited to the preconfigured set of HTTP, DNS, and proxylib ports. Allow adding (CRD) proxy ports to this set. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
426e719	Jarno Rajahalme	25 October 2021, 23:04:57 UTC	envoy: Initial SDS support Add initial support for Secret Discovery Service. Cilium xDS cluster details need to be given on each instance; omitting the SDS config source is not yet supported. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
e6b43ef	Jarno Rajahalme	06 September 2021, 09:00:48 UTC	envoy: Add xDS support for RDS, CDS, & EDS Add support for Route, Cluster, and Endpoint discovery services. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
7da6cff	Jarno Rajahalme	17 August 2021, 15:40:20 UTC	k8s: Add CiliumEnvoyConfig CRD Add a new CiliumEnvoyConfig to cilium.io API v2alpha1 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
3ddbf4d	Jarno Rajahalme	15 October 2021, 08:19:27 UTC	envoy: Clarify xDS revert function locking xDS revert functions cannot be called from the callback functions provided to xDS cache update functions, because the locking involved would lead to a deadlock. Revert functions are designed to be called after a larger transaction fails, not after a single resource update fails. This is why the revert functions assume they are called outside of any callback function and therefore take the locks they need. Clarify this and add a type that calls the revert functions in reverse order. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
abb91b4	Jarno Rajahalme	19 August 2021, 14:52:35 UTC	envoy: Call 'callback' also when using current resource version Also call the completion callback when there is no change in the cached resource and the current version is used but has not been acknowledged yet. In current use, the prior behavior did not cause a problem, but the new Envoy CRD code depends on the callback being called in this case as well. Note that the callback is NOT called when there is no change in the resource and the current version has already been acknowledged. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
aef9e87	Jarno Rajahalme	18 August 2021, 14:30:21 UTC	envoy: Switch to protobuf Go APIv2 Switch to protobuf Go APIv2 by changing imports from 'github.com/golang/protobuf' to 'google.golang.org/protobuf'. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
2cabef7	Jarno Rajahalme	11 November 2021, 14:44:18 UTC	Envoy: Update API and image for CiliumEnvoyConfig support The Envoy Go API is updated to contain the generated validation code. The Envoy image is updated to support the new EndpointId option for the bpf_metadata listener filter. The NPDS field 'Policy' is renamed to 'EndpointID'. The 'Policy' field was not used for anything, so we might as well recycle it while this API is not yet public. Envoy retries may fail on "address already in use" when the original source address and port are used on upstream connections. Cilium typically does this in the egress proxy listeners. Fix this by using a Cilium Envoy build that always sets SO_REUSEADDR when the original source address and port are used. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>	07 December 2021, 19:56:29 UTC
6f6dd13	Joe Stringer	06 December 2021, 16:10:29 UTC	install: Update image digests for v1.11.0 Generated from https://github.com/cilium/cilium/actions/runs/1545290132. `docker.io/cilium/cilium:v1.11.0@sha256:ea677508010800214b0b5497055f38ed3bff57963fa2399bcb1c69cf9476453a` `quay.io/cilium/cilium:v1.11.0@sha256:ea677508010800214b0b5497055f38ed3bff57963fa2399bcb1c69cf9476453a` `docker.io/cilium/clustermesh-apiserver:v1.11.0@sha256:361942671ce067cc7f3e97c2114512283148bcee5ec29e4f0a828869aedd4ced` `quay.io/cilium/clustermesh-apiserver:v1.11.0@sha256:361942671ce067cc7f3e97c2114512283148bcee5ec29e4f0a828869aedd4ced` `docker.io/cilium/docker-plugin:v1.11.0@sha256:2b7df46918ba832f7c55bc7255f8599af30aa8dc43d62f854b7f10b43f8387c9` `quay.io/cilium/docker-plugin:v1.11.0@sha256:2b7df46918ba832f7c55bc7255f8599af30aa8dc43d62f854b7f10b43f8387c9` `docker.io/cilium/hubble-relay:v1.11.0@sha256:306ce38354a0a892b0c175ae7013cf178a46b79f51c52adb5465d87f14df0838` `quay.io/cilium/hubble-relay:v1.11.0@sha256:306ce38354a0a892b0c175ae7013cf178a46b79f51c52adb5465d87f14df0838` `docker.io/cilium/operator-alibabacloud:v1.11.0@sha256:e61929869d59c5093c6d129ca1c21386338e1387051779d499a988545680b00a` `quay.io/cilium/operator-alibabacloud:v1.11.0@sha256:e61929869d59c5093c6d129ca1c21386338e1387051779d499a988545680b00a` `docker.io/cilium/operator-aws:v1.11.0@sha256:5f60a4e17ab33a3dcd2a942802b15f9e7be3d18f24464f31bba81a65a117e094` `quay.io/cilium/operator-aws:v1.11.0@sha256:5f60a4e17ab33a3dcd2a942802b15f9e7be3d18f24464f31bba81a65a117e094` `docker.io/cilium/operator-azure:v1.11.0@sha256:c1b41e6cbf6f1e0bb417170ac79eb6d78a7e39b775f1131a1104546fd18d745f` `quay.io/cilium/operator-azure:v1.11.0@sha256:c1b41e6cbf6f1e0bb417170ac79eb6d78a7e39b775f1131a1104546fd18d745f` `docker.io/cilium/operator-generic:v1.11.0@sha256:b522279577d0d5f1ad7cadaacb7321d1b172d8ae8c8bc816e503c897b420cfe3` `quay.io/cilium/operator-generic:v1.11.0@sha256:b522279577d0d5f1ad7cadaacb7321d1b172d8ae8c8bc816e503c897b420cfe3` 
`docker.io/cilium/operator:v1.11.0@sha256:c802c16b7ab561075c08779c0e4c53acdb97753c38f27424bc243e444aa524b9` `quay.io/cilium/operator:v1.11.0@sha256:c802c16b7ab561075c08779c0e4c53acdb97753c38f27424bc243e444aa524b9` Signed-off-by: Joe Stringer <joe@cilium.io>	06 December 2021, 18:52:33 UTC
27e0848	Joe Stringer	05 December 2021, 23:34:41 UTC	Prepare for release v1.11.0 Signed-off-by: Joe Stringer <joe@cilium.io>	06 December 2021, 15:22:37 UTC
9e9ca6d	André Martins	03 December 2021, 00:15:53 UTC	docs: fix eksctl ClusterConfig to allow copy [ upstream commit 00275427db4addae523c17fc5424bab63cacc029 ] This commit fixes the eksctl ClusterConfig to allow for copy. It is merely a workaround for now until a proper fix is available. Fixes: 706c9009dc39 ("docs: re-write docs to create clusters with tainted nodes") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
0884239	Martynas Pumputis	03 December 2021, 08:53:45 UTC	docs: Clarify deprecated "prefilter-devices" [ upstream commit cc1ded8aefd72159066bebe37af17d890a14f749 ] Make it clear how users can select devices for the prefiltering. Reported-by: André Martins <andre@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
d92980f	Sebastian Wicki	01 December 2021, 09:52:57 UTC	images: Bump Hubble CLI to v0.9.0 [ upstream commit 6bd38339e4d55198befab99d2147038189b32b07 ] This bumps the Hubble CLI to the recently released version 0.9.0. Hubble CLI v0.9.0 has been released to include the Hubble protobuf API changes present in Cilium v1.11-rc3 and thus is intended to be bundled with the final Cilium v1.11 release. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
b0c91ae	André Martins	02 December 2021, 03:07:09 UTC	docs: cleanup and tidy up the 1.11 upgrade guide [ upstream commit ce68d37266ba70a9b77de70f7657930ca874980f ] This upgrade guide contained all other versions in it. To prevent users from mistakenly reading an old upgrade guide, we should remove those leftovers. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
0c83cb7	Alexandre Perrin	02 December 2021, 09:34:43 UTC	doc: add upgrade note about nativeRoutingCIDR deprecation [ upstream commit b0ab42558e2b80839f2b321475646e909011185c ] Missed by e03bfffd55466366289944dd087b9ae18593355f Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
000edbd	Gilberto Bertin	02 December 2021, 13:45:37 UTC	docs: clarify upgrade impact for clients using an egress gateway [ upstream commit 2273b041c152805cab3989c50d5c4735af854227 ] Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
5b27de2	Joe Stringer	03 December 2021, 21:19:33 UTC	helm: Fix operator cloud image digests [ upstream commit 915a7f5f4727208bc4b0ff79725781a1f4039d13 ] Tested by applying this patch to v1.11 branch and validating that the digest matches the correct cloud image vs. the v1.11.0-rc3 images on Quay.io: $ helm template cilium ./install/kubernetes/cilium/ --version 1.10.0-rc3 \ --namespace kube-system --set eni.enabled=true --set ipam.mode=eni \ --set egressMasqueradeInterfaces=eth0 --set tunnel=disabled \ \| grep operator.*sha image: quay.io/cilium/operator-aws:v1.11.0-rc3@sha256:5ea0ccb6a866a5fb13f4bdfcf1ed8bce12a1355cb10a0914ea52af25f3a8f931 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
97df361	Martynas Pumputis	03 December 2021, 15:09:29 UTC	service: Always allocate higher ID for svc/backend [ upstream commit 33bd95c6375c4a494b47fc3634a1eb0a8892660a ] Previously, a backend or a service could be allocated an ID such that ID_backend_A < ID < ID_backend_B. This could happen after a cilium-agent restart, as nextID was not advanced upon the restoration of IDs. This could lead to situations in which the per-packet LB selected a backend that did not belong to the requested service when the following happened in chronological order: 1. A client made a request to the service and the backend with ID_x was chosen. 2. The service endpoint (backend) with ID_x was removed. 3. cilium-agent was restarted. 4. A new service backend which does not belong to the initial service was created and was allocated ID_x. 5. The CT_SERVICE entry for the old connection was not removed by the CT GC. 6. The same client made a new connection to the same service from the same src port. The above led lb{4,6}_local() to select the wrong backend, as it found the CT_SERVICE entry with the backend ID_x. Advancing nextID upon restoration only partly mitigates the issue. The real fix would be to introduce a match map whose key would be (svc_id, backend_id), populated by the agent. The lb{4,6}_local() routines would consult the map to detect whether the backend belongs to the service. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	05 December 2021, 23:01:54 UTC
b423190	Joe Stringer	02 December 2021, 02:29:35 UTC	aws: Disable flaky test [ upstream commit 0c7fe95b2fb16964cb932f82cedfd33ff1502c5c ] This test has been flaky for well over a year now, see issue 11560. Track re-enablement in https://github.com/cilium/cilium/projects/173 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	03 December 2021, 21:11:31 UTC
7201336	Joe Stringer	02 December 2021, 02:18:16 UTC	test: Quarantine Secondary nodeport device tests [ upstream commit 2d7602e9aeb8123d61922a862b726284106853e1 ] See issue 18072 for more details about the flaky test. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	03 December 2021, 21:11:31 UTC
81bef41	Aditi Ghag	02 December 2021, 16:25:37 UTC	test: Extend coredns clusterrole with additional resource permissions [ upstream commit 854bb8601e420f2087f2f54e1890aae976f464da ] Commit 398d55cd didn't add permissions for the `endpointslices` resource to the coredns `clusterrole` on k8s < 1.20. As a result, core-dns deployments failed on these versions with the error: `2021-11-30T14:09:43.349414540Z E1130 14:09:43.349292 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: Failed to watch v1beta1.EndpointSlice: failed to list v1beta1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kube-system:coredns" cannot list resource "endpointslices" in API group "discovery.k8s.io" at the cluster scope` Fixes: 398d55cd Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	03 December 2021, 21:11:31 UTC
c5fd114	André Martins	01 December 2021, 22:20:00 UTC	test/helpers: use rc.0 as the default version of kubectl [ upstream commit 75fbebbfbb5de9a591042386a40e6fab4eafaac9 ] Since we only update the Kubernetes version tested on our CI when the first RC is announced, we should use that binary instead of the `.0`, as the `.0` is not available at the time the rc.0 is released. Fixes: 61812551f659 ("test: ensure kubectl version is available for test run") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	03 December 2021, 21:11:31 UTC
9d87903	André Martins	01 December 2021, 22:19:03 UTC	Revert "test/helpers: fix ensure kubectl version to work for RCs" [ upstream commit 6c432fb1f73f4f099a6059700cfb0c1ed72ef7a2 ] This reverts commit bb6ef27c7c3628e5cd22072caaae5e0c399a31a5. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	03 December 2021, 21:11:31 UTC
7df4a2e	Martynas Pumputis	26 November 2021, 12:54:30 UTC	test/contrib: Bump CoreDNS version to 1.8.3 [ upstream commit 398d55cd94c0e16dc19b03c53f7b5040c1dd8f13 ] As reported in [1], Go's HTTP2 client < 1.16 had some serious bugs which could result in lost connections to kube-apiserver. Worse, the client couldn't recover. In the case of CoreDNS, the loss of connectivity to kube-apiserver was not even logged. I have validated this by adding the following rule on the node which was running the CoreDNS pod (port 6443, as the socket-lb was doing the service xlation): iptables -I FORWARD 1 -m tcp --proto tcp --src $CORE_DNS_POD_IP \ --dport=6443 -j DROP After upgrading CoreDNS to a version compiled with Go >= 1.16, the pod not only logged the errors but was also able to recover from them quickly. An example of such an error: W1126 12:45:08.403311 1 reflector.go:436] pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: watch of *v1.Endpoints ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding To determine the minimum version to bump to, I used the following: for i in 1.7.0 1.7.1 1.8.0 1.8.1 1.8.2 1.8.3 1.8.4; do docker run --rm -ti "k8s.gcr.io/coredns/coredns:v$i" \ --version done CoreDNS-1.7.0 linux/amd64, go1.14.4, f59c03d CoreDNS-1.7.1 linux/amd64, go1.15.2, aa82ca6 CoreDNS-1.8.0 linux/amd64, go1.15.3, 054c9ae k8s.gcr.io/coredns/coredns:v1.8.1 not found: manifest unknown: k8s.gcr.io/coredns/coredns:v1.8.2 not found: manifest unknown: CoreDNS-1.8.3 linux/amd64, go1.16, 4293992 CoreDNS-1.8.4 linux/amd64, go1.16.4, 053c4d5 Hopefully, the bumped version will fix the CI flakes in which a service domain name is not available after 7min. In other words, CoreDNS is not able to resolve the name, which means that it hasn't received an update from the kube-apiserver for the service. 
[1]: https://github.com/kubernetes/kubernetes/issues/87615#issuecomment-803517109 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: nathanjsweet <nathanjsweet@pm.me>	03 December 2021, 21:11:31 UTC
12bb19b	Aditi Ghag	02 December 2021, 02:25:51 UTC	test: Replace `WaitUntilMatch` with `Eventually` [ upstream commit 1987b67b36b60dd72a9f3f8dc633f42efb1bbeae ] The library function provides the same functionality. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
0b524f6	Aditi Ghag	30 November 2021, 01:55:44 UTC	test: Fix graceful termination test flake [ upstream commit 8986930b4b9edaedc066343cbb30e523fa463e98 ] The graceful termination test apps [1] are updated to fix flakes in the test logic. Specifically, read and write deadlines are added to the socket calls on the server side, so that the server doesn't block on those calls when a `SIGTERM` is received on termination. While at it, the test logic is also updated to validate that connectivity between client and server stays intact for at least the configured `terminationGracePeriodInSeconds` duration. [1] https://github.com/cilium/graceful-termination-test-apps Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
d27e750	Aditi Ghag	01 December 2021, 00:19:03 UTC	Revert "test/Services: Quarantine 'Checks graceful termination'" [ upstream commit 32b5bb2caeeb7c3156d2a1d1ecab68f9c149374b ] This reverts commit cbbea398 Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
fcb0390	Sebastian Wicki	30 November 2021, 16:28:02 UTC	health: Use signal.NotifyContext [ upstream commit 6334f982cf531b24262739beada87a6c5240d65a ] This is a cleanup commit with no functional change. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
dd8d524	Sebastian Wicki	30 November 2021, 13:38:55 UTC	ci: Set ClusterHealthPort in K8sHealth [ upstream commit cfd9da24370428f384b99e05be7be6b6da254f96 ] This sets a custom value for `cluster-health-port` in the K8sHealth test suite, to ensure we support setting a custom health port (e.g. used in OpenShift, which we do not test in our CI at the moment). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
77d3828	Sebastian Wicki	30 November 2021, 13:18:52 UTC	health: Fix cluster-health-port for health endpoint [ upstream commit c640c712f9415ad6425cd202f1807fb73c62bae9 ] To determine cluster health, Cilium exposes an HTTP server on each node as well as on the artificial health endpoint running on each node. Both use the same port, which can be configured via `cluster-health-port` (introduced in #16926) and defaults to 4240. This commit fixes a bug where the port specified by `cluster-health-port` was not passed to the Cilium health endpoint responder. This meant that `cilium-health-responder` was always listening on the default port instead of the one configured by the user, while the probe tried to connect via `cluster-health-port`. This resulted in the cluster being reported as unhealthy whenever `cluster-health-port` was set to a non-default value (which is the case in our OpenShift OLM for v1.11): ``` Nodes: gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost): Host connectivity to 10.0.128.2: ICMP to stack: OK, RTT=634.746µs HTTP to agent: OK, RTT=228.066µs Endpoint connectivity to 10.128.11.73: ICMP to stack: OK, RTT=666.83µs HTTP to agent: Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused ``` Fixes: e624868e165d ("health: Add a flag to set HTTP port") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
f0cdb57	André Martins	30 November 2021, 03:17:08 UTC	.github: add workflow to build beta images [ upstream commit 420028f0b0118b833cf2c7bcfeb3b4278833b9d5 ] With this new workflow, developers will be able to release beta features that are created on top of an existing release. The workflow to create a new beta image is as follows: 1. Push a branch into Cilium's repository named `feature/<stable-branch>/<feature-name>`, where `<stable-branch>` is the branch the feature is based on and `<feature-name>` is the name of the feature being released. 2. Trigger the workflow by going to [1], run the workflow from the `feature/<stable-branch>/<feature-name>` branch, and enter an image tag name. The tag name should be in the format `vX.Y.Z-<feature-name>`, where `vX.Y.Z` is the version the branch is built on and `<feature-name>` is the name of the feature. 3. Ping one of the maintainers or anyone from the cilium-build team to approve the build and release process of this feature. [1] https://github.com/cilium/cilium/actions/workflows/build-images-beta.yaml Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
cdec70e	Chris Tarazi	02 November 2021, 19:36:12 UTC	daemon, node: Remove old, discarded router IPs from `cilium_host` [ upstream commit fcd00390c30c6eeffbe2fefa81d5b22e59397297 ] In the previous commit (referenced below), we forgot to remove the old router IPs from the actual interface (`cilium_host`). This caused connectivity issues in user environments where the discarded, stale IPs were reassigned to pods, causing the ipcache entries for those IPs to have `remote-node` identity. To fix this, we remove all IPs from the `cilium_host` interface that weren't restored during the router IP restoration process. This step correctly finalizes the restoration process for router IPs. Fixes: ff63b0775c0 ("daemon, node: Fix faulty router IP restoration logic") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
cc1e75d	Chris Tarazi	03 November 2021, 22:22:54 UTC	node: Add missing fallback to router IP from CiliumNode for restoration [ upstream commit 02fa124f73e44cde4124c9f37325ce66c338aa98 ] Previously, when router IPs were available both from the filesystem and from the CiliumNode resource, we missed a fallback to the CiliumNode IP if the IP from the FS was outside the provided CIDR range. In other words, we returned early when the FS IP did not belong to the CIDR, without checking whether the IP from the CiliumNode was a valid fallback. This commit adds the missing case logic and also adds more documentation to the function. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
fba005e	Joe Stringer	30 November 2021, 03:50:04 UTC	test/DatapathConfiguration: Quarantine 'Encapsulation' [ upstream commit 0fc11885327277b0413f4cf449828c2b9774cb38 ] CC: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
d935d8d	Joe Stringer	30 November 2021, 03:41:20 UTC	test/Services: Quarantine 'IPv6 masquerading across K8s nodes' [ upstream commit f77a8d8263664dcccd23217b20b123d28eb2dbb6 ] CC: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
bb918fe	Joe Stringer	30 November 2021, 03:37:10 UTC	test/Services: Quarantine 'Checks graceful termination' [ upstream commit cbbea398c3fdf77aefbcb75a74467b69b07a000e ] CC: Aditi Ghag <aditi@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
50665ba	Joe Stringer	30 November 2021, 03:30:53 UTC	test/Services: Quarantine 'Tests with direct routing' [ upstream commit dea1343e749fcf998b0d283e738917d193491f50 ] CC: Martynas Pumputis <m@lambda.lt> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
356fd51	Joe Stringer	30 November 2021, 02:36:46 UTC	test/Services: Quarantine 'Checks service on same node' [ upstream commit ca4ed8dac7f7f7454d6eff86c2158c1ed45d7752 ] CC: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
be0d7c8	Joe Stringer	30 November 2021, 02:28:15 UTC	contrib: Add quarantine commit creation script [ upstream commit 34c1d6e3703b88745f4c7321cf572263bbeac159 ] usage: ./contrib/scripts/quarantine.sh "<focus-phrase>" This will generate a commit that quarantines the tests that match the specified focus phrase. It mostly works, but if the declarations for tests are made across multiple lines then it will be unable to locate the line to execute the quarantine. There's also a bit of a trick in selecting the right phrase to quarantine; often it will make sense to use the last set of words in a test name for a failing test. Typically these start with something like 'Checks ...' or 'Tests ...' so that only the inner-most 'It' or 'Context' statement is quarantined. However, if a more widespread issue is present then it may make sense to quarantine something using a phrase in the middle or even at the start of the test name. Other hints may be gathered by studying the Jenkins UI, the CI dashboard, and/or the GitHub issues page for issues labeled with 'ci/flake' which have been recently updated. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
d6a1678	Chris Tarazi	24 November 2021, 20:03:19 UTC	test: Fix incorrect selector for netperf-service [ upstream commit 8002a50acba951b810b37b3748ec5ba90218fc63 ] Caught by random chance when using this manifest to test something locally. Might as well fix it in case someone uses this in the future and the service is not working as expected. AFAICT, no CI failures occurred from this typo because the Chaos test suite (only suite which uses this manifest) doesn't assert any traffic to the service, but rather to the netperf-server directly. Fixes: b4a3cf6abc6 ("Test: Run netperf in background while Cilium pod is being deleted") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
3897a1a	Kornilios Kourtis	30 November 2021, 16:25:04 UTC	docs: KUBECONFIG for cilium-cli with k3s [ upstream commit 606b5fe9f49f1734d15fcd2d914e56ffa59a82e1 ] Clarify how cilium-cli can work with k3s Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
d0e5341	Paul Chaignon	29 November 2021, 18:18:37 UTC	bpf: Add WireGuard to complexity and compile tests [ upstream commit 04bf74c8444cc0b3aa5de380d2c542df540543a7 ] ENABLE_WIREGUARD was missing from the compile tests in bpf/Makefile and from the complexity tests in bpf/complexity-tests. We could therefore have missed new complexity issues or compilation errors occurring only when WireGuard is enabled. Fixes: 8930bebe ("daemon: Configure Wireguard for local node") Reported-by: Joe Stringer <joe@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
6a54d77	Tobias Klauser	30 November 2021, 12:08:22 UTC	bugtool: fix IP route debug gathering commands [ upstream commit e38e3c44f712b5f0ecf33efd1867c0ae16b241f7 ] Commit 8bcc4e5dd830 ("bugtool: avoid allocation on conversion of execCommand result to string") broke the `ip route show` commands because the change from `[]byte` to `string` causes the `%v` formatting verb to emit the raw byte slice, not the string. Fix this by using the `%s` formatting verb to make sure the argument gets interpreted as a string. Also fix another instance in `writeCmdToFile` where `fmt.Fprint` is now invoked with a byte slice. Grepping for `%v` in bugtool sources and manually inspecting all changes from commit 8bcc4e5dd830 showed no other instances where a byte slice could potentially end up being formatted in a wrong way. Fixes: 8bcc4e5dd830 ("bugtool: avoid allocation on conversion of execCommand result to string") Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
59c7319	Daniel Borkmann	30 November 2021, 09:45:29 UTC	neigh, test: Bump max timeout for tests [ upstream commit 7376df3dada0e8114a62c1369d5d25ae90b94fdc ] There has been a report that the neighbor tests took slightly longer than expected and while there was nothing wrong with them, the timeout kicked in and led to failure. Slightly bump it to avoid flakes like these. Fixes: #18013 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
0d24037	Daniel Borkmann	30 November 2021, 09:39:08 UTC	neigh, test: Also retry upon temporary NUD_FAILED state [ upstream commit 98697f3bb0f74011e8a374476edd7ca794582236 ] Wasn't able to reproduce the flake even after running the test overnight. The only explanation I'd have is that there is a small/rare flake due to a temporary NUD_FAILED state where we won't retry again. Closes: #18004 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
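The retry idea in 0d24037 can be illustrated with a small polling loop. The NUD_* values are the Linux neighbor states from `linux/neighbour.h`. `needsRetry` and `waitForNeighbor` are hypothetical helpers, not the actual test code. A real test would also sleep between attempts and enforce a timeout.

```go
package main

import "fmt"

// Neighbor states as defined in the Linux uapi header linux/neighbour.h.
const (
	nudIncomplete = 0x01
	nudReachable  = 0x02
	nudDelay      = 0x08
	nudProbe      = 0x10
	nudFailed     = 0x20
)

// needsRetry reports whether the poll should try again. Hypothetical rule:
// NUD_FAILED is treated as possibly temporary, like the resolution states.
func needsRetry(state int) bool {
	switch state {
	case nudIncomplete, nudDelay, nudProbe, nudFailed:
		return true
	}
	return false
}

// waitForNeighbor polls getState up to maxAttempts times and returns the
// last observed state. Sleeps and timeouts are omitted for brevity.
func waitForNeighbor(getState func() int, maxAttempts int) int {
	state := getState()
	for i := 1; i < maxAttempts && needsRetry(state); i++ {
		state = getState()
	}
	return state
}

func main() {
	states := []int{nudIncomplete, nudFailed, nudReachable}
	i := 0
	next := func() int { s := states[i]; i++; return s }
	fmt.Printf("final state: %#x\n", waitForNeighbor(next, 5)) // final state: 0x2
}
```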
0cec63b	Gobinath Krishnamoorthy	05 November 2021, 01:06:48 UTC	Prometheus lint errors in operator metrics [ upstream commit 65a46b50da85958b3df8e3b432abd8ff47497168 ] Promtool identified following lint errors when running against operator metrics 1) cilium_operator_identity_gc_entries_total non-counter metrics should not have "_total" suffix 2) cilium_operator_identity_gc_runs_total non-counter metrics should not have "_total" suffix Add relevant changes in upgrade documentation for 1.10 and 1.11 Fixing both the non-counter metrics. Signed-off-by: Gobinath Krishnamoorthy <gobinathk@google.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
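The promtool findings fixed in 0cec63b follow the Prometheus naming convention that reserves the `_total` suffix for counters. The Go check below is hypothetical and is not promtool's code. For illustration it assumes that the affected metrics are gauges, and the suffix-free name is used only as an example.

```go
package main

import (
	"fmt"
	"strings"
)

// metricType mirrors the Prometheus metric types relevant to this rule.
type metricType string

const (
	counter metricType = "counter"
	gauge   metricType = "gauge"
)

// lintTotalSuffix flags non-counter metrics whose name ends in "_total".
func lintTotalSuffix(name string, t metricType) error {
	if t != counter && strings.HasSuffix(name, "_total") {
		return fmt.Errorf("%s: non-counter metrics should not have \"_total\" suffix", name)
	}
	return nil
}

func main() {
	fmt.Println(lintTotalSuffix("cilium_operator_identity_gc_entries_total", gauge))
	fmt.Println(lintTotalSuffix("cilium_operator_identity_gc_entries", gauge)) // <nil>
}
```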
33bb807	Alexandre Perrin	26 November 2021, 15:38:55 UTC	doc: use ipv4NativeRoutingCIDR instead of nativeRoutingCIDR [ upstream commit e03bfffd55466366289944dd087b9ae18593355f ] As the latter has been deprecated in favor of the former. Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
2915156	Martynas Pumputis	26 November 2021, 19:35:23 UTC	ci: Restart pods when toggling KPR switch [ upstream commit 06d9441d49b0b25e86af58bac16281a6950cbc27 ] Previously, in the graceful backend termination test we switched to KPR=disabled and we didn't restart CoreDNS. Before the switch, CoreDNS@k8s2 -> kube-apiserver@k8s1 was handled by the socket-lb, so the outgoing packet was $CORE_DNS_IP -> $KUBE_API_SERVER_NODE_IP. The packet should have been BPF masq-ed. After the switch, the BPF masq is no longer in place, so the packets from CoreDNS are subject to the iptables' masquerading (they can be either dropped by the invalid rule or masqueraded to some other port). Combined with CoreDNS unable to recover from connectivity errors [1], the CoreDNS was no longer able to receive updates from the kube-apiserver, thus NXDOMAIN errors for the new service name. To avoid such flakes, forcefully restart the DNS pods if the KPR setting change is detected. [1]: https://github.com/cilium/cilium/pull/18018 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
849079d	adamzhoul	26 November 2021, 03:22:01 UTC	docs: add registry (quay.io/) for pre-loading images for kind [ upstream commit 4758bef62869d60df45a383c4be813ebed1343c8 ] in doc, it recommends docker pull image, but the command is : docker pull cilium/cilium:\|IMAGE_TAG\| this will download from docker.io However, in operator, it loads images from quay.io we should keep them the same, otherwise, we download for nothing. Signed-off-by: adamzhoul <adamzhoul186@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
7e3b0b3	Austin Cawley-Edwards	22 November 2021, 21:40:58 UTC	docs: correct ec2 modify net iface action [ upstream commit ce45bc36946120ee5495be23ccc753d5e1910c8c ] `ModifyNetworkInterface` -> `ModifyNetworkInterfaceAttribute` see: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_ModifyNetworkInterfaceAttribute.html Signed-off-by: austin ce <austin.cawley@gmail.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
32b4e15	Weilong Cui	16 November 2021, 22:21:31 UTC	Adds a locked function to do ipcache delete on metadata match [ upstream commit 3650544c5d213586843c8376f8741b430979dae9 ] Fixes a potential race condition introduced in PR #17161. Suggested-by: Joe Stringer <joe@cilium.io> Signed-off-by: Weilong Cui <cuiwl@google.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
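The race addressed by 32b4e15 is a check-then-act problem. If the metadata comparison and the delete do not happen under the same lock, another writer can change the entry in between. A minimal sketch with a hypothetical `ipCache` type (not Cilium's ipcache API):

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical stand-in for an ipcache entry and its metadata.
type entry struct {
	identity uint32
	source   string
}

type ipCache struct {
	mu      sync.Mutex
	entries map[string]entry
}

// deleteIfMetadataMatches removes ip only if its current entry still carries
// the expected metadata. Check and delete happen under one lock, so no other
// writer can update the entry between the comparison and the removal.
func (c *ipCache) deleteIfMetadataMatches(ip string, want entry) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	cur, ok := c.entries[ip]
	if !ok || cur != want {
		return false
	}
	delete(c.entries, ip)
	return true
}

func main() {
	c := &ipCache{entries: map[string]entry{"10.0.0.7": {identity: 42, source: "k8s"}}}
	fmt.Println(c.deleteIfMetadataMatches("10.0.0.7", entry{identity: 7, source: "k8s"}))  // false
	fmt.Println(c.deleteIfMetadataMatches("10.0.0.7", entry{identity: 42, source: "k8s"})) // true
}
```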
cb07b26	Dmitry Kharitonov	26 November 2021, 19:47:51 UTC	ui: v0.8.3 [ upstream commit ff8a7e63c1f9a4681910a5d8c0e89f270453c4ed ] [ Backport note: Ran the following commands and committed the changes: - "make -C install/kubernetes" - "make -C Documentation update-helm-values" ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	02 December 2021, 23:11:56 UTC
f185a6d	Nicolas Busseneau	30 November 2021, 18:04:42 UTC	workflows: fix build-and-push-with-qemu on v1.11 We removed the PR bits in `build-and-push-with-qemu` in 5ae34ddc6cb28cfe9a6fae3679462ea63ba0be3a, which means the job should use `${{ github.sha }}` directly. Fixes: 8bbae9cb4323bf3dd94936e355b0c2aad96d0df8 Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>	02 December 2021, 22:40:27 UTC
7ed8a42	Nicolas Busseneau	30 November 2021, 17:16:34 UTC	mlh: update Jenkins jobs following 1.23 support Following merge of #18027, we now support K8s 1.23 on branch 1.11 and have rotated the Jenkins test jobs as follow: - Changed: Kernel 4.9 testing on K8s 1.23 (instead of 1.22) - Changed: Kernel 4.19 testing on K8s 1.22 (instead of 1.21) - Changed: Kernel 5.4 testing on K8s 1.21 (instead of 1.20) - Added: Kernel 4.9 testing on K8s 1.21 See the Table of Truth™️ for up to date status on CI testing: https://docs.google.com/spreadsheets/d/1TThkqvVZxaqLR-Ela4ZrcJ0lrTJByCqrbdCjnI32_X0/edit#gid=0 Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>	30 November 2021, 17:46:06 UTC
887f3d7	André Martins	25 November 2021, 23:35:28 UTC	test/helpers: fix ensure kubectl version to work for RCs [ upstream commit bb6ef27c7c3628e5cd22072caaae5e0c399a31a5 ] Fixes: 61812551f659 ("test: ensure kubectl version is available for test run") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
c02e1e5	André Martins	25 November 2021, 03:06:49 UTC	Update k8s tests and libraries to v1.23.0-rc.0 [ upstream commit c56075dec217db86a2c259be34bf7794be027e90 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
ab1c408	Sebastian Wicki	17 November 2021, 17:47:35 UTC	ipam/crd: Fix spurious CiliumNode update status failures [ upstream commit 18b10b49fc34b9748f7b86fda872e3cb1375a859 ] When running in CRD-based IPAM modes (Alibaba, Azure, ENI, CRD), it is possible to observe spurious "Unable to update CiliumNode custom resource" failures in the cilium-agent. The full error message is as follows: "Operation cannot be fulfilled on ciliumnodes.cilium.io <node>: the object has been modified; please apply your changes to the latest version and try again". It means that the Kubernetes `UpdateStatus` call has failed because the local `ObjectMeta.ResourceVersion` of submitted CiliumNode version is out of date. In the presence of races, this error is expected and will resolve itself once the agent receives a more recent version of the object with the new resource version. However, it is possible that the resource version of a `CiliumNode` object is bumped even though the `Spec` or `Status` of the `CiliumNode` remains the same. This happens, for example, when `ObjectMeta.ManagedFields` is updated by the Kubernetes apiserver. Unfortunately, `CiliumNode.DeepEqual` does _not_ consider any `ObjectMeta` fields (including the resource version). Therefore two objects with different resource versions are considered the same by the `CiliumNode` watcher used by IPAM. But to be able to successfully call `UpdateStatus` we need to know the most recent resource version. Otherwise, `UpdateStatus` will always fail until the `CiliumNode` object is updated externally for some reason. Therefore, this commit modifies the logic to always store the most recent version of the `CiliumNode` object, even if `Spec` or `Status` has not changed. This in turn allows `nodeStore.refreshNode` (which invokes `UpdateStatus`) to always work on the most recently observed resource version. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
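The fix in ab1c408 can be modeled as a store that compares only `Spec` and `Status` to decide whether anything changed, yet always keeps the newest object, so that a later `UpdateStatus` carries the latest resource version. The types below are hypothetical, trimmed-down stand-ins for the real CiliumNode and node store.

```go
package main

import "fmt"

// Hypothetical, trimmed-down CiliumNode with only the fields needed here.
type objectMeta struct{ ResourceVersion string }

type ciliumNode struct {
	ObjectMeta objectMeta
	Spec       string
	Status     string
}

// equalSpecStatus mimics a comparison that, like CiliumNode.DeepEqual as
// described in the commit message, ignores ObjectMeta entirely.
func equalSpecStatus(a, b *ciliumNode) bool {
	return a.Spec == b.Spec && a.Status == b.Status
}

type nodeStore struct{ own *ciliumNode }

// update reports whether Spec or Status changed, but stores the incoming
// object in every case so the newest ResourceVersion is never lost.
func (s *nodeStore) update(n *ciliumNode) bool {
	changed := s.own == nil || !equalSpecStatus(s.own, n)
	s.own = n // store even when nothing relevant changed
	return changed
}

func main() {
	s := &nodeStore{}
	s.update(&ciliumNode{ObjectMeta: objectMeta{ResourceVersion: "100"}, Spec: "a", Status: "x"})
	changed := s.update(&ciliumNode{ObjectMeta: objectMeta{ResourceVersion: "101"}, Spec: "a", Status: "x"})
	fmt.Println(changed, s.own.ObjectMeta.ResourceVersion) // false 101
}
```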
b544383	Gilberto Bertin	25 November 2021, 19:13:32 UTC	egressgateway: refactor manager logic [ upstream commit ed73a3174c868dde427c6a11194adc5f59f4a0f1 ] This commit refactors the egress gateway manager in order to provide a single `reconcile()` method which will be invoked on all events received by the manager. This method is responsible for adding and removing entries to and from the egress policy map. In addition to this, the manager will now wait for the k8s cache to be fully synced before running its first reconciliation, in order to always have the egress_policy map in a consistent state with the k8s configuration. Fixes: #17380 Fixes: #17753 Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
65e6bec	Gilberto Bertin	25 November 2021, 17:08:06 UTC	daemon: add WaitUntilK8sCacheIsSynced method [ upstream commit d9b60f7102777c84f4917a6953be5e3538084c65 ] which will block the caller until the agent has fully synced its k8s cache. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
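The pattern described in b544383 and 65e6bec is a single `reconcile()` that handles every event, gated on the k8s cache being synced. It is sketched below with hypothetical types. A closed channel stands in for `WaitUntilK8sCacheIsSynced`, and plain Go maps stand in for the k8s-derived state and the BPF egress policy map.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical stand-in for the egress gateway manager.
type manager struct {
	mu        sync.Mutex
	synced    bool              // set once the k8s cache has synced
	desired   map[string]string // policy name -> gateway IP, built from k8s events
	policyMap map[string]string // stand-in for the BPF egress policy map
}

func newManager() *manager {
	return &manager{desired: map[string]string{}, policyMap: map[string]string{}}
}

// startAfterSync blocks until cacheSynced is closed and then runs the
// first reconciliation.
func (m *manager) startAfterSync(cacheSynced <-chan struct{}) {
	<-cacheSynced
	m.mu.Lock()
	defer m.mu.Unlock()
	m.synced = true
	m.reconcile()
}

// Event handlers only record desired state, then call reconcile().
func (m *manager) OnAddPolicy(name, gatewayIP string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.desired[name] = gatewayIP
	m.reconcile()
}

func (m *manager) OnDeletePolicy(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.desired, name)
	m.reconcile()
}

// reconcile must be called with m.mu held. It adds missing or outdated
// entries and removes entries no longer backed by a policy.
func (m *manager) reconcile() {
	if !m.synced {
		return // never act on a partially synced view of k8s state
	}
	for name, gw := range m.desired {
		if m.policyMap[name] != gw {
			m.policyMap[name] = gw
		}
	}
	for name := range m.policyMap {
		if _, ok := m.desired[name]; !ok {
			delete(m.policyMap, name)
		}
	}
}

func main() {
	m := newManager()
	synced := make(chan struct{})
	done := make(chan struct{})
	go func() {
		m.startAfterSync(synced)
		close(done)
	}()

	m.OnAddPolicy("p1", "192.168.1.10")
	m.OnAddPolicy("p2", "192.168.1.11")
	m.OnDeletePolicy("p2")
	close(synced) // the k8s cache is now fully synced
	<-done
	fmt.Println(m.policyMap) // map[p1:192.168.1.10]
}
```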
83f2560	Gilberto Bertin	24 November 2021, 13:44:43 UTC	docs: add a note on egress gateway upgrade impact for 1.11 [ upstream commit cdb4b461560565f13cc574ab0f70bf40d4876c0c ] Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
b8506a1	Gilberto Bertin	24 November 2021, 14:11:10 UTC	bpf: rename egress policy map and its fields [ upstream commit 2b079593b04e5fb1fe2dbc6095921b15415e4a1f ] to make it more clear it's related to the egress gateway policies Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
7b062d3	Gilberto Bertin	25 November 2021, 19:24:26 UTC	maps: switch egressmap to cilium/ebpf package [ upstream commit 3ba8e6e481fe6601747c62369790bcc9d79fa0b6 ] Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
7293c11	Martynas Pumputis	19 November 2021, 07:27:18 UTC	docs: Mention service topology in KPR guide [ upstream commit 0b27f8046f4a33ecde68472cf7effaa9869ebd9e ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
5a9d369	Martynas Pumputis	19 November 2021, 07:20:22 UTC	helm: Add loadBalancer.serviceTopology [ upstream commit 545d94c33c0c45b349c93fa83117d50895ab9fb5 ] This enables k8s service topology aware hints. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
fc0efe5	Martynas Pumputis	18 November 2021, 09:46:32 UTC	k8s: Add unit tests for topology aware hints [ upstream commit ed9c7cebaae73b3fc5b5f9019a71a9f2b4d245d3 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
355c86b	Martynas Pumputis	18 November 2021, 09:46:15 UTC	k8s: Fix endpoints returned by update routine [ upstream commit 8442d6e2323e7a78b98df60bf87bc3b715719e65 ] Previously, the function returned all passed endpoints instead of the ones which were filtered and correlated by correlateEndpoints(). The change is no-op, as nobody was consuming the return value of UpdateEndpoint*(). Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
2d383d6	Martynas Pumputis	17 November 2021, 13:05:54 UTC	k8s: Implement svc topology aware hints [ upstream commit 6ddfbd27def16d8f3cab04026dad9af9db42f678 ] This commit implements the topology aware hints for k8s services described in [1]. The idea of the feature is to provision service endpoints only if their zone hints match the self node's "topology.kubernetes.io/zone" label value. The main benefit is that it allows service traffic to prefer zone-local endpoints which could be used e.g., to avoid costs associated with crossing cloud network zones. Also, it might yield better performance for service traffic, as the nearer endpoints are preferred. The hints for endpoints are set by kube-controller-manager. The heuristics are described in [1]. The hints are set in the EndpointsliceV1 object (this is the reason why we don't implement the hints parsing for other endpoint object types). I considered implementing the feature in "pkg/service" instead of "pkg/k8s". The main reasons for choosing the latter are (1) that this feature is k8s specific and (2) that in the near future we probably will merge "pkg/service" with "pkg/maps/lbmap", as both deal with the low-level datapath specific details. [1]: https://kubernetes.io/docs/concepts/services-networking/topology-aware-hints/ Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
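The endpoint filtering described in 2d383d6 can be sketched as below. `backend` and `filterByZone` are hypothetical. In the EndpointSlice v1 API the hints live in `hints.forZones` as a list of `{name: <zone>}` entries, which this sketch simplifies to plain strings. Keeping all backends when one of them has no hints is an assumption of this sketch, not something stated in the commit.

```go
package main

import "fmt"

// backend is a hypothetical, trimmed-down service endpoint carrying the
// zone names from its EndpointSlice hints.
type backend struct {
	IP       string
	ForZones []string
}

// filterByZone keeps backends whose hints include nodeZone, the value of
// the node's "topology.kubernetes.io/zone" label.
func filterByZone(backends []backend, nodeZone string) []backend {
	var local []backend
	for _, b := range backends {
		if len(b.ForZones) == 0 {
			// Assumption: a backend without hints disables filtering.
			return backends
		}
		for _, z := range b.ForZones {
			if z == nodeZone {
				local = append(local, b)
				break
			}
		}
	}
	return local
}

func main() {
	backends := []backend{
		{IP: "10.0.1.10", ForZones: []string{"zone-a"}},
		{IP: "10.0.2.10", ForZones: []string{"zone-b"}},
	}
	for _, b := range filterByZone(backends, "zone-a") {
		fmt.Println(b.IP) // 10.0.1.10
	}
}
```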
f1d770a	Martynas Pumputis	17 November 2021, 19:52:22 UTC	k8s: Extend Node subscriber to accept swg [ upstream commit 14b70adad3a4e95f66cf7fbcafedfcb0346f4353 ] The swg (stoppable wait group) is used by the service_cache.go when syncing k8s caches upon the agent startup. Until now, service_cache was consuming only Service and Endpoint* objects. However, for the upcoming service topology aware hints feature we need to add (self) Node object as well to the list. This is because the feature needs to get the "topology.kubernetes.io/zone" of the self Node. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
17d5daf	Martynas Pumputis	17 November 2021, 10:11:51 UTC	daemon: Add --enable-service-topology [ upstream commit 2ddf5e706cdf0a08274aa7d2ce2e7c4d5e6beda4 ] It's going to be used by the k8s service topology aware hints feature to be implemented in the next commit. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
9d353df	Martynas Pumputis	17 November 2021, 09:49:01 UTC	k8s: Add Hints.ForZone field to slim Endpoint [ upstream commit 2ac1403949b5241f035c1ed6faecf446cde34d36 ] This is going to be used by the upcoming (service) topology aware hints feature. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
db03b99	Joe Stringer	24 November 2021, 01:45:48 UTC	docs: Add cilium "managed pods" example [ upstream commit c46a0280127ef6f6079c27a9aca5d1228112d3db ] This example demonstrates a case where all pods are managed by Cilium. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
36e0acd	Joe Stringer	24 November 2021, 01:41:09 UTC	docs: Document recent feature deprecations [ upstream commit 4ce5cef0ca808368a88a84c7f8b74854940e6854 ] Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
c2bb31a	Joe Stringer	24 November 2021, 01:38:00 UTC	Remove remaining references to Mesos [ upstream commit b0a9510bb508a778ce828bdfccbec0faaa02a9a1 ] Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
6995ea1	Joe Stringer	24 November 2021, 01:33:46 UTC	docs: Deprecate 'cilium policy trace' [ upstream commit 747ef3acf6a202ae151f53a3b52ffc0afad98bd5 ] Support for the various policy types in the in-pod 'cilium policy trace' command has not kept pace with the development on the core policy model. Deprecate this tool so that users are not misled by the confusing and often wrong policy trace output. Users are suggested to use one of the alternative methods to reason about their policies: * https://app.networkpolicy.io * https://docs.cilium.io/en/stable/gettingstarted/policy-creation/ Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
1b9650b	Joe Stringer	24 November 2021, 01:14:06 UTC	docs: Deprecate Consul support [ upstream commit fb65f8c55b085936bfb3af16f475f2187a2d2383 ] Consul support has been primarily used for developer environments in local testing, but we are not aware of any users running clusters depending on Consul for Cilium control plane co-ordination. Deprecate it in preparation to remove support in a future release, to minimize the maintenance burden of this code. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
1662658	Joe Stringer	24 November 2021, 00:51:19 UTC	docs: Deprecate IPVLAN support [ upstream commit abb1d069298232b8326795418f2b5a8834a23ea9 ] IPVLAN support has a list of caveats in terms of features, few users and fewer maintainers. Recently, we improved virtual ethernet support in the kernel to gain many of the performance advantages of IPVLAN. Unless there is strong community support for maintaining this feature going forward, it will make sense to remove support in the v1.12 development cycle. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
bb76b35	André Martins	24 November 2021, 03:13:41 UTC	docs: remove mention of 250 nodes for kvstore [ upstream commit 7eaafc8afb35260372a492636031b1336158482b ] Most of the use cases don't require setting up a KVstore to use Cilium. This commit updates the documentation to reflect the current situations where someone would like to set up a KVStore. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC
c1c7caf	Aditi Ghag	23 November 2021, 06:33:39 UTC	daemon/cmd: Extend Cilium status with graceful termination flag [ upstream commit eeb7f1b464b283dc0c2b2bcada2a149b40372437 ] The status only reflects the value of the flag 'enable-k8s-terminating-endpoints'. Per the (kube-proxy-replacement) documentation, the relevant feature gate still needs to be enabled in kubernetes deployments >= v1.20. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>	30 November 2021, 16:58:37 UTC