Revision Author Date Message Commit Date
373d2ca WIP Signed-off-by: Martynas Pumputis <m@lambda.lt> 07 December 2021, 14:33:25 UTC
37c3499 Chore: Change ip address used in testing to RFC1918 address space Fixes: #17701 Signed-off-by: Aniruddha Amit Dutta <duttaaniruddha31@gmail.com> 06 December 2021, 01:59:11 UTC
f458639 build(deps): bump gopkg.in/ini.v1 from 1.66.0 to 1.66.2 Bumps [gopkg.in/ini.v1](https://github.com/go-ini/ini) from 1.66.0 to 1.66.2. - [Release notes](https://github.com/go-ini/ini/releases) - [Commits](https://github.com/go-ini/ini/compare/v1.66.0...v1.66.2) --- updated-dependencies: - dependency-name: gopkg.in/ini.v1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 06 December 2021, 01:57:53 UTC
1802445 docs: prevent search engines from indexing old branches To prevent users from landing on outdated results in search engines, this commit adds the required configuration. Based on https://stackoverflow.com/questions/63542354/readthedocs-robots-txt-and-sitemap-xml Signed-off-by: André Martins <andre@cilium.io> 04 December 2021, 00:46:05 UTC
0027542 docs: fix eksctl ClusterConfig to allow copy This commit fixes the eksctl ClusterConfig to allow for copy. It is merely a workaround for now until a proper fix is available. Fixes: 706c9009dc39 ("docs: re-write docs to create clusters with tainted nodes") Signed-off-by: André Martins <andre@cilium.io> 03 December 2021, 22:18:50 UTC
cc1ded8 docs: Clarify deprecated "prefilter-devices" Make it clear how users can select devices for the prefiltering. Reported-by: André Martins <andre@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> 03 December 2021, 21:48:10 UTC
6bd3833 images: Bump Hubble CLI to v0.9.0 This bumps the Hubble CLI to the recently released version 0.9.0. Hubble CLI v0.9.0 has been released to include the Hubble protobuf API changes present in Cilium v1.11-rc3 and thus is intended to be bundled with the final Cilium v1.11 release. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 03 December 2021, 21:47:31 UTC
ce68d37 docs: cleanup and tidy up the 1.11 upgrade guide This upgrade guide contained all other versions in it. To prevent users from mistakenly reading an old upgrade guide, we should remove those leftovers. Signed-off-by: André Martins <andre@cilium.io> 03 December 2021, 21:47:15 UTC
b0ab425 doc: add upgrade note about nativeRoutingCIDR deprecation Missed by e03bfffd55466366289944dd087b9ae18593355f Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 03 December 2021, 21:46:59 UTC
2273b04 docs: clarify upgrade impact for clients using an egress gateway Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 03 December 2021, 21:46:50 UTC
915a7f5 helm: Fix operator cloud image digests Tested by applying this patch to v1.11 branch and validating that the digest matches the correct cloud image vs. the v1.11.0-rc3 images on Quay.io: $ helm template cilium ./install/kubernetes/cilium/ --version 1.10.0-rc3 \ --namespace kube-system --set eni.enabled=true --set ipam.mode=eni \ --set egressMasqueradeInterfaces=eth0 --set tunnel=disabled \ | grep operator.*sha image: quay.io/cilium/operator-aws:v1.11.0-rc3@sha256:5ea0ccb6a866a5fb13f4bdfcf1ed8bce12a1355cb10a0914ea52af25f3a8f931 Signed-off-by: Joe Stringer <joe@cilium.io> 03 December 2021, 21:34:34 UTC
33bd95c service: Always allocate higher ID for svc/backend Previously, it was possible for a backend or a service to be allocated an ID such that ID_backend_A < ID < ID_backend_B. This could have happened after a cilium-agent restart, as the nextID was not advanced upon the restoration of IDs. This could have led to situations in which the per-packet LB could select a backend which did not belong to a requested service when the following happened in chronological order: 1. Previously the same client made the request to the service and the backend with ID_x was chosen. 2. The service endpoint (backend) with ID_x was removed. 3. cilium-agent was restarted. 4. A new service backend which does not belong to the initial service was created and got the ID_x allocated. 5. The CT_SERVICE entry for the old connection was not removed by the CT GC. 6. The same client made a new connection to the same service from the same src port. The above led the lb{4,6}_local() to select the wrong backend, as it found the CT_SERVICE entry with the backend ID_x. The advancement of the nextID upon the restoration only partly mitigates the issue. The real fix would be to introduce a match map whose key would be (svc_id, backend_id), and which would be populated by the agent. The lb{4,6}_local() routines would consult the map to detect whether the backend belongs to the service. Signed-off-by: Martynas Pumputis <m@lambda.lt> 03 December 2021, 18:42:56 UTC
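The mitigation described in 33bd95c can be pictured with a minimal sketch. The `idAllocator` type and its methods are hypothetical names chosen for illustration, not the agent's actual allocator, and ID wrap-around is ignored. The point is only that restoring an ID pushes `nextID` past it, so new allocations never land between previously used IDs.

```go
package main

import "fmt"

// idAllocator is a hypothetical, simplified stand-in for a service/backend
// ID allocator; it is not the agent's real implementation.
type idAllocator struct {
	nextID uint32
	inUse  map[uint32]struct{}
}

func newIDAllocator() *idAllocator {
	return &idAllocator{nextID: 1, inUse: make(map[uint32]struct{})}
}

// restore re-registers an ID recovered after a restart and advances nextID
// past it, so later allocations are always higher than any restored ID.
func (a *idAllocator) restore(id uint32) {
	a.inUse[id] = struct{}{}
	if id >= a.nextID {
		a.nextID = id + 1
	}
}

// acquire hands out the next unused ID (wrap-around is not handled here).
func (a *idAllocator) acquire() uint32 {
	for {
		id := a.nextID
		a.nextID++
		if _, taken := a.inUse[id]; !taken {
			a.inUse[id] = struct{}{}
			return id
		}
	}
}

func main() {
	a := newIDAllocator()
	a.restore(3)
	a.restore(7)
	fmt.Println(a.acquire()) // 8: above every restored ID, not 4, 5 or 6
}
```

As the commit message notes, this only narrows the window for ID reuse; it does not replace a per-service membership check.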
a1fdcb9 Makefile.docker: replace hardcoded "docker" command with $(CONTAINER_ENGINE) Fix the image build, which breaks when a custom container engine or a privileged command is specified, as is needed in some environments, such as: $ CONTAINER_ENGINE="sudo docker" DOCKER_IMAGE_TAG=xxx make docker-cilium-image -j4 Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> 02 December 2021, 18:49:02 UTC
0c7fe95 aws: Disable flaky test This test has been flaky for well over a year now, see issue 11560. Track re-enablement in https://github.com/cilium/cilium/projects/173 Signed-off-by: Joe Stringer <joe@cilium.io> 02 December 2021, 18:33:11 UTC
2d7602e test: Quarantine Secondary nodeport device tests See issue 18072 for more details about the flaky test. Signed-off-by: Joe Stringer <joe@cilium.io> 02 December 2021, 18:32:59 UTC
a7855a5 .github: Disable EKS encryption tests These tests are flaky, see issue 16938. Tracking to fix them in https://github.com/cilium/cilium/projects/173. Signed-off-by: Joe Stringer <joe@cilium.io> 02 December 2021, 18:32:20 UTC
854bb86 test: Extend coredns clusterrole with additional resource permissions Commit 398d55cd didn't add permissions for `endpointslices` resource to the coredns `clusterrole` on k8s < 1.20. As a result, core-dns deployments failed on these versions with the error - `2021-11-30T14:09:43.349414540Z E1130 14:09:43.349292 1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: Failed to watch *v1beta1.EndpointSlice: failed to list *v1beta1.EndpointSlice: endpointslices.discovery.k8s.io is forbidden: User "system:serviceaccount:kube-system:coredns" cannot list resource "endpointslices" in API group "discovery.k8s.io" at the cluster scope` Fixes: 398d55cd Signed-off-by: Aditi Ghag <aditi@cilium.io> 02 December 2021, 18:03:33 UTC
af6b795 dependabot: disable all AWS package updates This will prevent dependabot from updating subpackages of github.com/aws/aws-sdk-go-v2 as well as github.com/aws/smithy-go Fixes: 4762a4abae5a ("dependabot: disable cloud provider SDK updates") Signed-off-by: Tobias Klauser <tobias@cilium.io> 02 December 2021, 14:44:33 UTC
75fbebb test/helpers: use rc.0 as the default version of kubectl Since we only update the Kubernetes version tested on our CI when the first RC is announced we should use that binary instead of the `.0` as the `.0` is not available at the time the rc.0 is released. Fixes: 61812551f659 ("test: ensure kubectl version is available for test run") Signed-off-by: André Martins <andre@cilium.io> 02 December 2021, 11:24:51 UTC
6c432fb Revert "test/helpers: fix ensure kubectl version to work for RCs" This reverts commit bb6ef27c7c3628e5cd22072caaae5e0c399a31a5. Signed-off-by: André Martins <andre@cilium.io> 02 December 2021, 11:24:51 UTC
1987b67 test: Replace `WaitUntilMatch` with `Eventually` The library function provides the same functionality. Signed-off-by: Aditi Ghag <aditi@cilium.io> 02 December 2021, 09:50:11 UTC
8986930 test: Fix graceful termination test flake The graceful termination test apps [1] are updated to fix flakes in the test logic. Specifically, read and write deadlines were added to the socket calls on the server side. This way the server doesn't block on the socket calls when a `SIGTERM` event is received on termination. While at it, the test logic was also updated to validate that connectivity between client and server stays intact for at least the configured `terminationGracePeriodInSeconds` duration. [1] https://github.com/cilium/graceful-termination-test-apps Signed-off-by: Aditi Ghag <aditi@cilium.io> 02 December 2021, 09:50:11 UTC
32b5bb2 Revert "test/Services: Quarantine 'Checks graceful termination'" This reverts commit cbbea398 Signed-off-by: Aditi Ghag <aditi@cilium.io> 02 December 2021, 09:50:11 UTC
8fd2bb9 bpf/Makefile: Remove unnecessary shell references These Makefiles were sprinkled with semi-colons, causing the overall statement to be run as a series of commands in a shell and to ignore the result. Remove them and rely on regular Makefile statements. Signed-off-by: Joe Stringer <joe@cilium.io> 01 December 2021, 14:05:36 UTC
47dab33 bpf: Quieten mock targets Make the bpf mock testing framework targets respect the user's verbosity flag. Signed-off-by: Joe Stringer <joe@cilium.io> 01 December 2021, 14:05:36 UTC
a663671 docs: Remove manual installation instruction for `kind` clustermesh The clustermesh guide has installation instructions using cilium CLI so let's use that. Signed-off-by: Aditi Ghag <aditi@cilium.io> 01 December 2021, 09:41:44 UTC
6334f98 health: Use signal.NotifyContext This is a cleanup commit with no functional change. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 30 November 2021, 23:18:49 UTC
cfd9da2 ci: Set ClusterHealthPort in K8sHealth This sets a custom value for `cluster-health-port` in the K8sHealth test suite, to ensure we support setting a custom health port (e.g. used in OpenShift, which we do not test in our CI at the moment). Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 30 November 2021, 23:18:49 UTC
c640c71 health: Fix cluster-health-port for health endpoint To determine cluster health, Cilium exposes an HTTP server both on each node and on the artificial health endpoint running on each node. The port used for this HTTP server is the same and can be configured via `cluster-health-port` (introduced in #16926) and defaults to 4240. This commit fixes a bug where the port specified by `cluster-health-port` was not passed to the Cilium health endpoint responder, which meant that `cilium-health-responder` was always listening on the default port instead of the one configured by the user, while the probe tried to connect via `cluster-health-port`. This resulted in the cluster being reported as unhealthy whenever `cluster-health-port` was set to a non-default value (which is the case in our OpenShift OLM for v1.11): ``` Nodes: gandro-7bmc2-worker-2-blgxf.c.cilium-dev.internal (localhost): Host connectivity to 10.0.128.2: ICMP to stack: OK, RTT=634.746µs HTTP to agent: OK, RTT=228.066µs Endpoint connectivity to 10.128.11.73: ICMP to stack: OK, RTT=666.83µs HTTP to agent: Get "http://10.128.11.73:9940/hello": dial tcp 10.128.11.73:9940: connect: connection refused ``` Fixes: e624868e165d ("health: Add a flag to set HTTP port") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 30 November 2021, 23:18:49 UTC
420028f .github: add workflow to build beta images With this new workflow, developers will be able to release beta features that are created on top of an existing release. The workflow to create a new beta image is as follows: 1. Push a branch into Cilium's repository with the name: `feature/<stable-branch>/<feature-name>` where `<stable-branch>` represents the branch on which the feature is based and `<feature-name>` represents the name of the feature being released. 2. Trigger the workflow by going into [1], use the workflow from the `feature/<stable-branch>/<feature-name>` branch and write an image tag name. The tag name should be in the format `vX.Y.Z-<feature-name>` where `vX.Y.Z` is the version on which the branch is built, and `<feature-name>` the name of the feature. 3. Ping one of the maintainers or anyone from the cilium-build team to approve the build and release process of this feature. [1] https://github.com/cilium/cilium/actions/workflows/build-images-beta.yaml Signed-off-by: André Martins <andre@cilium.io> 30 November 2021, 22:43:38 UTC
fcd0039 daemon, node: Remove old, discarded router IPs from `cilium_host` In the previous commit (referenced below), we forgot to remove the old router IPs from the actual interface (`cilium_host`). This caused connectivity issues in user environments where the discarded, stale IPs were reassigned to pods, causing the ipcache entries for those IPs to have `remote-node` identity. To fix this, we remove all IPs from the `cilium_host` interface that weren't restored during the router IP restoration process. This step correctly finalizes the restoration process for router IPs. Fixes: ff63b0775c0 ("daemon, node: Fix faulty router IP restoration logic") Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 November 2021, 21:32:22 UTC
02fa124 node: Add missing fallback to router IP from CiliumNode for restoration Previously in the case that both router IPs from the filesystem and the CiliumNode resource were available, we missed a fallback to the CiliumNode IP, if the IP from the FS was outside the provided CIDR range. In other words, we returned early that the FS IP does not belong to the CIDR, without checking if the IP from the CiliumNode was a valid fallback. This commit adds the missing case logic and also adds more documentation to the function. Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 November 2021, 21:32:22 UTC
1a49543 install: add tolerations for the certgen cronjob Enable pod tolerations for the certgen cronjob pods to allow jobs on tainted nodes. Signed-off-by: David Wolffberg <davidwolffberg@gmail.com> 30 November 2021, 21:31:18 UTC
0fc1188 test/DatapathConfiguration: Quarantine 'Encapsulation' CC: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 30 November 2021, 18:28:14 UTC
f77a8d8 test/Services: Quarantine 'IPv6 masquerading across K8s nodes' CC: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 30 November 2021, 18:28:14 UTC
cbbea39 test/Services: Quarantine 'Checks graceful termination' CC: Aditi Ghag <aditi@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 30 November 2021, 18:28:14 UTC
dea1343 test/Services: Quarantine 'Tests with direct routing' CC: Martynas Pumputis <m@lambda.lt> Signed-off-by: Joe Stringer <joe@cilium.io> 30 November 2021, 18:28:14 UTC
ca4ed8d test/Services: Quarantine 'Checks service on same node' CC: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 30 November 2021, 18:28:14 UTC
34c1d6e contrib: Add quarantine commit creation script usage: ./contrib/scripts/quarantine.sh "<focus-phrase>" This will generate a commit that quarantines the tests that match the specified focus phrase. It mostly works, but if the declarations for tests are made across multiple lines then it will be unable to locate the line to execute the quarantine. There's also a bit of a trick in selecting the right phrase to quarantine; often it will make sense to use the last set of words in a test name for a failing test. Typically these start with something like 'Checks ...' or 'Tests ...' so that only the inner-most 'It' or 'Context' statement is quarantined. However, if a more widespread issue is present then it may make sense to quarantine something using a phrase in the middle or even at the start of the test name. Other hints may be gathered by studying the Jenkins UI, the CI dashboard, and/or the GitHub issues page for issues labeled with 'ci/flake' which have been recently updated. Signed-off-by: Joe Stringer <joe@cilium.io> 30 November 2021, 18:28:14 UTC
8002a50 test: Fix incorrect selector for netperf-service Caught by random chance when using this manifest to test something locally. Might as well fix it in case someone uses this in the future and the service is not working as expected. AFAICT, no CI failures occurred from this typo because the Chaos test suite (only suite which uses this manifest) doesn't assert any traffic to the service, but rather to the netperf-server directly. Fixes: b4a3cf6abc6 ("Test: Run netperf in background while Cilium pod is being deleted") Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 November 2021, 18:05:08 UTC
606b5fe docs: KUBECONFIG for cilium-cli with k3s Clarify how cilium-cli can work with k3s Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> 30 November 2021, 16:49:06 UTC
04bf74c bpf: Add WireGuard to complexity and compile tests ENABLE_WIREGUARD was missing from the compile tests in bpf/Makefile and from the complexity tests in bpf/complexity-tests. We could therefore have missed new complexity issues or compilation errors occurring only when WireGuard is enabled. Fixes: 8930bebe ("daemon: Configure Wireguard for local node") Reported-by: Joe Stringer <joe@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 30 November 2021, 16:40:38 UTC
2f749ab build(deps): bump gopkg.in/ini.v1 from 1.64.0 to 1.66.0 Bumps [gopkg.in/ini.v1](https://github.com/go-ini/ini) from 1.64.0 to 1.66.0. - [Release notes](https://github.com/go-ini/ini/releases) - [Commits](https://github.com/go-ini/ini/compare/v1.64.0...v1.66.0) --- updated-dependencies: - dependency-name: gopkg.in/ini.v1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 30 November 2021, 16:39:05 UTC
e9b28d8 build(deps): bump github.com/aws/aws-sdk-go-v2/config Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.10.0 to 1.10.3. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.10.0...config/v1.10.3) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/config dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 30 November 2021, 16:38:39 UTC
4762a4a dependabot: disable cloud provider SDK updates The cloud provider SDKs are updated too frequently, often times without any code changes affecting Cilium. However, these PRs still require developer's time reviewing/approving such PRs and increase CI cost. Thus, exclude these dependencies from automatic updates and instead update them manually once every month. Signed-off-by: Tobias Klauser <tobias@cilium.io> 30 November 2021, 16:24:08 UTC
aef5002 test: temporarily increase Hubble buffer size to 64k Temporarily increase the Hubble buffer size in order to capture more flows. This will hopefully help us understand why the K8sEgressGatewayTest is occasionally failing (#18012) Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 30 November 2021, 14:12:05 UTC
e38e3c4 bugtool: fix IP route debug gathering commands Commit 8bcc4e5dd830 ("bugtool: avoid allocation on conversion of execCommand result to string") broke the `ip route show` commands because the change from `string` to `[]byte` causes the `%v` formatting verb to emit the raw byte slice, not the string. Fix this by using the `%s` formatting verb to make sure the argument gets interpreted as a string. Also fix another instance in `writeCmdToFile` where `fmt.Fprint` is now invoked with a byte slice. Grepping for `%v` in bugtool sources and manually inspecting all changes from commit 8bcc4e5dd830 showed no other instances where a byte slice could potentially end up being formatted in a wrong way. Fixes: 8bcc4e5dd830 ("bugtool: avoid allocation on conversion of execCommand result to string") Signed-off-by: Tobias Klauser <tobias@cilium.io> 30 November 2021, 13:31:05 UTC
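The formatting difference behind e38e3c4 is standard `fmt` behavior and can be reproduced in isolation. The helper names `withV` and `withS` and the sample value are illustrative only and do not come from the bugtool sources.

```go
package main

import "fmt"

// withV formats a byte slice with %v, which prints the individual byte
// values rather than the text they encode.
func withV(b []byte) string { return fmt.Sprintf("%v", b) }

// withS formats a byte slice with %s, which interprets the bytes as a string.
func withS(b []byte) string { return fmt.Sprintf("%s", b) }

func main() {
	table := []byte("main")
	fmt.Println("ip route show table " + withV(table)) // ip route show table [109 97 105 110]
	fmt.Println("ip route show table " + withS(table)) // ip route show table main
}
```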
7376df3 neigh, test: Bump max timeout for tests There has been a report that the neighbor tests took slightly longer than expected and while there was nothing wrong with them, the timeout kicked in and led to failure. Slightly bump it to avoid flakes like these. Fixes: #18013 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 30 November 2021, 12:20:37 UTC
98697f3 neigh, test: Also retry upon temporary NUD_FAILED state Wasn't able to reproduce the flake even after running the test overnight. The only explanation I'd have is that there is a small/rare flake due to a temporary NUD_FAILED state where we won't retry again. Closes: #18004 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 30 November 2021, 12:20:37 UTC
fc0045f docs: Mention how to build images for local CI testing Signed-off-by: Martynas Pumputis <m@lambda.lt> 30 November 2021, 11:52:57 UTC
1b42f7a contrib: Fix backport submission for own PRs On GitHub, one cannot request oneself to review one's own PR. This results in the following problem when submitting a backport PR: $ submit-backport Using GitHub repository joestringer/cilium (git remote: origin) Sending PR for branch v1.10: v1.10 backports 2021-11-23 * #17788 -- Additional FQDN selector identity tracking fixes (@joestringer) Once this PR is merged, you can update the PR labels via: ```upstream-prs $ for pr in 17788; do contrib/backporting/set-labels.py $pr done 1.10; done ``` Sending pull request... remote: remote: Create a pull request for 'pr/v1.10-backport-2021-11-23' on GitHub by visiting: remote: https://github.com/joestringer/cilium/pull/new/pr/v1.10-backport-2021-11-23 remote: Error requesting reviewer: Unprocessable Entity (HTTP 422) Review cannot be requested from pull request author. Signal ERR caught! Traceback (line function script): 58 main /home/joe/git/cilium/contrib/backporting/submit-backport Fix this by excluding one's own username from the reviewers list. Signed-off-by: Joe Stringer <joe@cilium.io> 30 November 2021, 11:50:28 UTC
816849a build(deps): bump github.com/Azure/azure-sdk-for-go Bumps [github.com/Azure/azure-sdk-for-go](https://github.com/Azure/azure-sdk-for-go) from 59.3.0+incompatible to 59.4.0+incompatible. - [Release notes](https://github.com/Azure/azure-sdk-for-go/releases) - [Changelog](https://github.com/Azure/azure-sdk-for-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/Azure/azure-sdk-for-go/compare/v59.3.0...v59.4.0) --- updated-dependencies: - dependency-name: github.com/Azure/azure-sdk-for-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 30 November 2021, 11:46:00 UTC
71381e8 elf: skip TestWrite if ELF file wasn't built This will skip the test when running the tests standalone (i.e. via `go test` and not via Makefile). See #17536 for more details about this particular file, which applied the same principle to the benchmark in that test suite. See also #16914 Reported-by: Hemanth Malla <hemanth.malla@datadoghq.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 30 November 2021, 11:45:46 UTC
54bd57b docs: update k8s instructions on how to update k8s libraries Signed-off-by: André Martins <andre@cilium.io> 30 November 2021, 01:59:49 UTC
65a46b5 Prometheus lint errors in operator metrics Promtool identified following lint errors when running against operator metrics 1) cilium_operator_identity_gc_entries_total non-counter metrics should not have "_total" suffix 2) cilium_operator_identity_gc_runs_total non-counter metrics should not have "_total" suffix Add relevant changes in upgrade documentation for 1.10 and 1.11 Fixing both the non-counter metrics. Signed-off-by: Gobinath Krishnamoorthy <gobinathk@google.com> 29 November 2021, 18:34:02 UTC
398d55c test/contrib: Bump CoreDNS version to 1.8.3 As reported in [1], Go's HTTP2 client < 1.16 had some serious bugs which could result in lost connections to kube-apiserver. Worse than this was that the client couldn't recover. In the case of CoreDNS the loss of connectivity to kube-apiserver was not even logged. I have validated this by adding the following rule on the node which was running the CoreDNS pod (6443 port as the socket-lb was doing the service xlation): iptables -I FORWARD 1 -m tcp --proto tcp --src $CORE_DNS_POD_IP \ --dport=6443 -j DROP After upgrading CoreDNS to the one which was compiled with Go >= 1.16, the pod was not only logging the errors, but was also able to recover from them quickly. An example of such an error: W1126 12:45:08.403311 1 reflector.go:436] pkg/mod/k8s.io/client-go@v0.20.2/tools/cache/reflector.go:167: watch of *v1.Endpoints ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding To determine the min vsn bump, I was using the following: for i in 1.7.0 1.7.1 1.8.0 1.8.1 1.8.2 1.8.3 1.8.4; do docker run --rm -ti "k8s.gcr.io/coredns/coredns:v$i" \ --version done CoreDNS-1.7.0 linux/amd64, go1.14.4, f59c03d CoreDNS-1.7.1 linux/amd64, go1.15.2, aa82ca6 CoreDNS-1.8.0 linux/amd64, go1.15.3, 054c9ae k8s.gcr.io/coredns/coredns:v1.8.1 not found: manifest unknown: k8s.gcr.io/coredns/coredns:v1.8.2 not found: manifest unknown: CoreDNS-1.8.3 linux/amd64, go1.16, 4293992 CoreDNS-1.8.4 linux/amd64, go1.16.4, 053c4d5 Hopefully, the bumped version will fix the CI flakes in which a service domain name is not available after 7min. In other words, CoreDNS is not able to resolve the name which means that it hasn't received an update from the kube-apiserver for the service. [1]: https://github.com/kubernetes/kubernetes/issues/87615#issuecomment-803517109 Signed-off-by: Martynas Pumputis <m@lambda.lt> 29 November 2021, 18:17:01 UTC
e03bfff doc: use ipv4NativeRoutingCIDR instead of nativeRoutingCIDR As the latter has been deprecated in favor of the former. Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 November 2021, 17:25:55 UTC
04c29ba Fix unhelpful error emitted when we try to setup base devices Signed-off-by: kerthcet <kerthcet@gmail.com> 29 November 2021, 16:11:32 UTC
06d9441 ci: Restart pods when toggling KPR switch Previously, in the graceful backend termination test we switched to KPR=disabled and we didn't restart CoreDNS. Before the switch, CoreDNS@k8s2 -> kube-apiserver@k8s1 was handled by the socket-lb, so the outgoing packet was $CORE_DNS_IP -> $KUBE_API_SERVER_NODE_IP. The packet should have been BPF masq-ed. After the switch, the BPF masq is no longer in place, so the packets from CoreDNS are subject to the iptables' masquerading (they can be either dropped by the invalid rule or masqueraded to some other port). Combined with CoreDNS unable to recover from connectivity errors [1], the CoreDNS was no longer able to receive updates from the kube-apiserver, thus NXDOMAIN errors for the new service name. To avoid such flakes, forcefully restart the DNS pods if the KPR setting change is detected. [1]: https://github.com/cilium/cilium/pull/18018 Signed-off-by: Martynas Pumputis <m@lambda.lt> 29 November 2021, 16:10:49 UTC
1f71f4e build(deps): bump github.com/aliyun/alibaba-cloud-sdk-go Bumps [github.com/aliyun/alibaba-cloud-sdk-go](https://github.com/aliyun/alibaba-cloud-sdk-go) from 1.61.1340 to 1.61.1357. - [Release notes](https://github.com/aliyun/alibaba-cloud-sdk-go/releases) - [Changelog](https://github.com/aliyun/alibaba-cloud-sdk-go/blob/master/ChangeLog.txt) - [Commits](https://github.com/aliyun/alibaba-cloud-sdk-go/compare/v1.61.1340...v1.61.1357) --- updated-dependencies: - dependency-name: github.com/aliyun/alibaba-cloud-sdk-go dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 29 November 2021, 14:13:50 UTC
6c1dae8 CODEOWNERS: clean-up entries for deleted files Remove from CODEOWNERS the patterns for which we no longer have any entry in the repository. List obtained with: while read i; do case "$i" in /*) # Remove leading slash and use ls LIST+=" ${i#/}" ;; *) # No leading slash: maybe not at the root, use find [[ -n $(find . -name "$i" -print -quit) ]] || echo "$i" ;; esac done <<< $(awk '/^[^#]/ {print $1}' CODEOWNERS) ls -- $LIST 2>&1 >/dev/null | sed "s=.*'\(.*\)':.*=/\1=" Fixes: b8401aa2edd6 ("checkpatch: remove checkpatch-related files from the repository") Fixes: 72e7740245c5 ("doc: Move goverance documentation to a more visible") Fixes: bf6039b99c33 ("doc: Remove obsolete Docker getting started guide") Fixes: db9b6f71453c ("docs: add cilium-operator technical overview documentation") Fixes: 26a80c381696 ("jenkinsfile: Remove stale symlinks") Fixes: b667f010fb81 ("pkg/k8s: self contain CRDs in common directory") Signed-off-by: Quentin Monnet <quentin@isovalent.com> 29 November 2021, 10:50:29 UTC
e5d84ac CODEOWNERS: fix wildcard patterns for files under daemon/cmd/ The syntax for wildcards in the CODEOWNERS file is a simple asterisk "*", which should not be preceded by a dot (".*") like other languages may use for regular expressions. For most entries, this means that only files starting with e.g. "ipcache.", such as "ipcache.go", are covered, but not "ipcache_test.go". For "kube_proxy.*", this even means there are no related entries, given that all files follow the pattern "kube_proxy_*". Let's replace the use of ".*" by "*" (or simply ".go" where relevant) in the CODEOWNERS file. References: - https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners#codeowners-syntax - https://git-scm.com/docs/gitignore#_pattern_format Fixes: a6677888fe04 ("CODEOWNERS: Attach entries to root of repository") Signed-off-by: Quentin Monnet <quentin@isovalent.com> 29 November 2021, 10:50:29 UTC
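The pattern difference in e5d84ac can be approximated with Go's `path.Match`, which implements shell-style globbing. This is an assumption made for illustration: CODEOWNERS follows gitignore-style rules, which differ in other respects, but for plain file names the treatment of `*` and a literal `.` is the same. The `matches` helper is hypothetical.

```go
package main

import (
	"fmt"
	"path"
)

// matches reports whether name matches the glob pattern. A malformed
// pattern is treated as a non-match.
func matches(pattern, name string) bool {
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

func main() {
	for _, pattern := range []string{"ipcache.*", "ipcache*"} {
		for _, name := range []string{"ipcache.go", "ipcache_test.go"} {
			fmt.Printf("%-10s %-16s %v\n", pattern, name, matches(pattern, name))
		}
	}
}
```

With `ipcache.*` the literal dot must be present, so `ipcache_test.go` is not covered; `ipcache*` covers both files.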
4758bef docs: add registry (quay.io/) for pre-loading images for kind The docs recommend pre-pulling the image with docker pull, but the command is: docker pull cilium/cilium:|IMAGE_TAG| which downloads from docker.io. However, the operator loads images from quay.io. We should keep them the same; otherwise, the image is downloaded for nothing. Signed-off-by: adamzhoul <adamzhoul186@gmail.com> 29 November 2021, 10:14:05 UTC
ce45bc3 docs: correct ec2 modify net iface action `ModifyNetworkInterface` -> `ModifyNetworkInterfaceAttribute` see: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_ModifyNetworkInterfaceAttribute.html Signed-off-by: austin ce <austin.cawley@gmail.com> 26 November 2021, 21:43:29 UTC
3650544 Adds a locked function to do ipcache delete on metadata match Fixes potential racing condition introduced in PR #17161. Suggested-by: Joe Stringer <joe@cilium.io> Signed-off-by: Weilong Cui <cuiwl@google.com> 26 November 2021, 21:42:02 UTC
93d4a62 mlh: update Jenkins jobs following 1.23 support Following merge of #18008, we now support K8s 1.23 and have rotated the Jenkins test jobs as follows: - Changed: Kernel 4.9 testing on K8s 1.23 (instead of 1.22) - Changed: Kernel 4.19 testing on K8s 1.22 (instead of 1.21) - Changed: Kernel 5.4 testing on K8s 1.21 (instead of 1.20) - Added: Kernel 4.9 testing on K8s 1.21 See the Table of Truth:tm: for up to date status on all trigger phrases: https://docs.google.com/spreadsheets/d/1TThkqvVZxaqLR-Ela4ZrcJ0lrTJByCqrbdCjnI32_X0/edit#gid=0 Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 26 November 2021, 21:39:51 UTC
ff8a7e6 ui: v0.8.3 Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> 26 November 2021, 21:39:05 UTC
bb6ef27 test/helpers: fix ensure kubectl version to work for RCs Fixes: 61812551f659 ("test: ensure kubectl version is available for test run") Signed-off-by: André Martins <andre@cilium.io> 26 November 2021, 17:08:30 UTC
c56075d Update k8s tests and libraries to v1.23.0-rc.0 Signed-off-by: André Martins <andre@cilium.io> 26 November 2021, 17:08:30 UTC
18b10b4 ipam/crd: Fix spurious CiliumNode update status failures When running in CRD-based IPAM modes (Alibaba, Azure, ENI, CRD), it is possible to observe spurious "Unable to update CiliumNode custom resource" failures in the cilium-agent. The full error message is as follows: "Operation cannot be fulfilled on ciliumnodes.cilium.io <node>: the object has been modified; please apply your changes to the latest version and try again". It means that the Kubernetes `UpdateStatus` call has failed because the local `ObjectMeta.ResourceVersion` of the submitted CiliumNode version is out of date. In the presence of races, this error is expected and will resolve itself once the agent receives a more recent version of the object with the new resource version. However, it is possible that the resource version of a `CiliumNode` object is bumped even though the `Spec` or `Status` of the `CiliumNode` remains the same. This happens, for example, when `ObjectMeta.ManagedFields` is updated by the Kubernetes apiserver. Unfortunately, `CiliumNode.DeepEqual` does _not_ consider any `ObjectMeta` fields (including the resource version). Therefore two objects with different resource versions are considered the same by the `CiliumNode` watcher used by IPAM. But to be able to successfully call `UpdateStatus` we need to know the most recent resource version. Otherwise, `UpdateStatus` will always fail until the `CiliumNode` object is updated externally for some reason. Therefore, this commit modifies the logic to always store the most recent version of the `CiliumNode` object, even if `Spec` or `Status` has not changed. This in turn allows `nodeStore.refreshNode` (which invokes `UpdateStatus`) to always work on the most recently observed resource version. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 26 November 2021, 13:39:41 UTC
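The idea in 18b10b4 can be illustrated with a minimal sketch. The `node` and `store` types, `sameContent`, and `onUpdate` are hypothetical simplifications, not Cilium's `CiliumNode`, `nodeStore`, or `DeepEqual`. The store always keeps the newest object, so the latest resource version is available for status updates, while still reporting whether the content actually changed.

```go
package main

import "fmt"

// node is a hypothetical, pared-down stand-in for a CiliumNode object.
type node struct {
	ResourceVersion string
	Spec            string
}

// sameContent mimics an equality check that ignores metadata such as the
// resource version.
func sameContent(a, b *node) bool { return a.Spec == b.Spec }

// store is a hypothetical holder for the most recently observed object.
type store struct {
	current *node
}

// onUpdate always stores the newest object so that later status updates
// carry the latest resource version. It returns true only when the content
// changed and dependent state would need to be recomputed.
func (s *store) onUpdate(n *node) bool {
	changed := s.current == nil || !sameContent(s.current, n)
	s.current = n
	return changed
}

func main() {
	s := &store{}
	fmt.Println(s.onUpdate(&node{ResourceVersion: "1", Spec: "A"})) // true
	// Only metadata changed: no recomputation, but the version is kept.
	fmt.Println(s.onUpdate(&node{ResourceVersion: "2", Spec: "A"})) // false
	fmt.Println(s.current.ResourceVersion)                          // 2
}
```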
ed73a31 egressgateway: refactor manager logic This commit refactors the egress gateway manager in order to provide a single `reconcile()` method which will be invoked on all events received by the manager. This method is responsible for adding and removing entries to and from the egress policy map. In addition to this, the manager will now wait for the k8s cache to be fully synced before running its first reconciliation, in order to always have the egress_policy map in a consistent state with the k8s configuration. Fixes: #17380 Fixes: #17753 Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 26 November 2021, 08:05:35 UTC
d9b60f7 daemon: add WaitUntilK8sCacheIsSynced method which will block the caller until the agent has fully synced its k8s cache. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 26 November 2021, 08:05:35 UTC
cdb4b46 docs: add a note on egress gateway upgrade impact for 1.11 Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 26 November 2021, 08:05:35 UTC
2b07959 bpf: rename egress policy map and its fields to make it more clear it's related to the egress gateway policies Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 26 November 2021, 08:05:35 UTC
3ba8e6e maps: switch egressmap to cilium/ebpf package Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 26 November 2021, 08:05:35 UTC
0b27f80 docs: Mention service topology in KPR guide Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
545d94c helm: Add loadBalancer.serviceTopology This enables k8s service topology aware hints. Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
ed9c7ce k8s: Add unit tests for topology aware hints Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
8442d6e k8s: Fix endpoints returned by update routine Previously, the function returned all passed endpoints instead of the ones which were filtered and correlated by correlateEndpoints(). The change is a no-op, as nobody was consuming the return value of UpdateEndpoint*(). Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
6ddfbd2 k8s: Implement svc topology aware hints This commit implements the topology aware hints for k8s services described in [1]. The idea of the feature is to provision service endpoints only if their zone hints match the self node's "topology.kubernetes.io/zone" label value. The main benefit is that it allows service traffic to prefer zone-local endpoints, which could be used, e.g., to avoid costs associated with crossing cloud network zones. Also, it might yield better performance for service traffic, as the nearer endpoints are preferred. The hints for endpoints are set by kube-controller-manager. The heuristics are described in [1]. The hints are set in the EndpointsliceV1 object (this is the reason why we don't implement the hints parsing for other endpoint object types). I considered implementing the feature in "pkg/service" instead of "pkg/k8s". The main reasons for choosing the latter are (1) that this feature is k8s specific and (2) that in the near future we probably will merge "pkg/service" with "pkg/maps/lbmap", as both deal with the low-level datapath specific details. [1]: https://kubernetes.io/docs/concepts/services-networking/topology-aware-hints/ Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
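The zone-matching rule described in commit 6ddfbd2 can be sketched in Go as follows. This is a minimal illustration, not the code from the commit: the `endpoint` type, its `ForZones` field, and the `filterByZone` function are hypothetical names, and any fallback behavior (for example, when no endpoint carries a matching hint) is deliberately left out.

```go
package main

// endpoint is a hypothetical, simplified service endpoint.
type endpoint struct {
	Addr     string
	ForZones []string // zone hints attached to the endpoint, possibly empty
}

// filterByZone keeps only the endpoints whose zone hints contain selfZone,
// where selfZone stands for the value of the local node's
// "topology.kubernetes.io/zone" label.
func filterByZone(eps []endpoint, selfZone string) []endpoint {
	out := make([]endpoint, 0, len(eps))
	for _, ep := range eps {
		for _, z := range ep.ForZones {
			if z == selfZone {
				out = append(out, ep)
				break // one matching hint is enough
			}
		}
	}
	return out
}
```

With endpoints hinted for zones `a`, `b`, and both `a` and `b`, calling `filterByZone` with `selfZone` set to `a` keeps the first and third endpoints in their original order.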
14b70ad k8s: Extend Node subscriber to accept swg The swg (stoppable wait group) is used by the service_cache.go when syncing k8s caches upon the agent startup. Until now, service_cache was consuming only Service and Endpoint* objects. However, for the upcoming service topology aware hints feature we need to add (self) Node object as well to the list. This is because the feature needs to get the "topology.kubernetes.io/zone" of the self Node. Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
2ddf5e7 daemon: Add --enable-service-topology It's going to be used by the k8s service topology aware hints feature to be implemented in the next commit. Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
2ac1403 k8s: Add Hints.ForZone field to slim Endpoint This is going to be used by the upcoming (service) topology aware hints feature. Signed-off-by: Martynas Pumputis <m@lambda.lt> 25 November 2021, 17:34:05 UTC
c46a028 docs: Add cilium "managed pods" example This example demonstrates the case where all pods are managed by Cilium. Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2021, 15:58:28 UTC
4ce5cef docs: Document recent feature deprecations Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2021, 15:58:28 UTC
b0a9510 Remove remaining references to Mesos Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2021, 15:58:28 UTC
747ef3a docs: Deprecate 'cilium policy trace' Support for the various policy types in the in-pod 'cilium policy trace' command has not kept pace with the development on the core policy model. Deprecate this tool so that users are not misled by the confusing and often wrong policy trace output. Users are suggested to use one of the alternative methods to reason about their policies: * https://app.networkpolicy.io * https://docs.cilium.io/en/stable/gettingstarted/policy-creation/ Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2021, 15:58:28 UTC
fb65f8c docs: Deprecate Consul support Consul support has been primarily used for developer environments in local testing, but we are not aware of any users running clusters depending on Consul for Cilium control plane co-ordination. Deprecate it in preparation to remove support in a future release, to minimize the maintenance burden of this code. Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2021, 15:58:28 UTC
abb1d06 docs: Deprecate IPVLAN support IPVLAN support has a list of caveats in terms of features, few users and fewer maintainers. Recently, we improved virtual ethernet support in the kernel to gain many of the performance advantages of IPVLAN. Unless there is strong community support for maintaining this feature going forward, it will make sense to remove support in the v1.12 development cycle. Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2021, 15:58:28 UTC
b12ecd1 Makefile: Add kind-image target Add a new kind-image target, mirroring the microk8s target for building and tagging the image. To use: $ make kind <install Cilium> $ make kind-image $ kubectl -n kube-system set image daemonset/cilium cilium-agent=localhost:5000/cilium/cilium-dev:local Signed-off-by: Joe Stringer <joe@cilium.io> 24 November 2021, 18:24:53 UTC
fdfd78d Makefile: Use target-specific vars for microk8s The microk8s target previously relied on an env var declared at the top-level of the Makefile, which will conflict with an upcoming commit. Move it into a target-specific variable. This also means we can remove the redundant tagging operation. Signed-off-by: Joe Stringer <joe@cilium.io> 24 November 2021, 18:24:53 UTC
8c333e0 build(deps): bump github.com/go-openapi/strfmt from 0.21.0 to 0.21.1 Bumps [github.com/go-openapi/strfmt](https://github.com/go-openapi/strfmt) from 0.21.0 to 0.21.1. - [Release notes](https://github.com/go-openapi/strfmt/releases) - [Commits](https://github.com/go-openapi/strfmt/compare/v0.21.0...v0.21.1) --- updated-dependencies: - dependency-name: github.com/go-openapi/strfmt dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 24 November 2021, 16:16:56 UTC
e3e1fea build(deps): bump aws-actions/configure-aws-credentials Bumps [aws-actions/configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials) from 1.5.11 to 1.6.0. - [Release notes](https://github.com/aws-actions/configure-aws-credentials/releases) - [Changelog](https://github.com/aws-actions/configure-aws-credentials/blob/master/CHANGELOG.md) - [Commits](https://github.com/aws-actions/configure-aws-credentials/compare/0d9a5be0dceea74e09396820e1e522ba4a110d2f...ea7b857d8a33dc2fb4ef5a724500044281b49a5e) --- updated-dependencies: - dependency-name: aws-actions/configure-aws-credentials dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 24 November 2021, 13:51:31 UTC
7eaafc8 docs: remove mention of 250 nodes for kvstore Most of the use cases don't require setting up a KVstore to use Cilium. This commit updates the documentation to reflect the current situations where someone would like to set up a KVStore. Signed-off-by: André Martins <andre@cilium.io> 24 November 2021, 12:27:45 UTC
86b836d build(deps): bump github/codeql-action from 1.0.23 to 1.0.24 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 1.0.23 to 1.0.24. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/a627e9fa504113bfa8e90a9b429b157a38b1cdbd...e095058bfa09de8070f94e98f5dc059531bc6235) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 24 November 2021, 12:27:31 UTC
eeb7f1b daemon/cmd: Extend Cilium status with graceful termination flag The status only reflects the value of the flag 'enable-k8s-terminating-endpoints'. Per the (kube-proxy-replacement) documentation, the relevant feature gate still needs to be enabled in kubernetes deployments >= v1.20. Signed-off-by: Aditi Ghag <aditi@cilium.io> 23 November 2021, 23:27:38 UTC
1ccb845 install/kubernetes: fix helm generation for operator image digest This commit fixes the image digest as part of the operator deployment. Fixes: 4638de2ad17b ("cleanup the cilium helm chart:") Signed-off-by: André Martins <andre@cilium.io> 23 November 2021, 21:36:20 UTC
cb1bf90 bpf: Fix l4lb stale map removal under cni mode When the agent starts up we can see the following maps being removed as stale maps: [...] level=info msg="Restored endpoint" endpointID=3747 ipAddr="[ ]" subsys=endpoint level=info msg="Finished regenerating restored endpoints" regenerated=1 subsys=daemon total=1 level=info msg="Removed stale bpf map" file-path=/sys/fs/bpf/tc/globals/cilium_capture_cache subsys=daemon level=info msg="Removed stale bpf map" file-path=/sys/fs/bpf/tc/globals/cilium_ktime_cache subsys=daemon [...] This is due to pcap.h being included from nodeport.h where the former defines mentioned maps unconditionally. Rework it, so that both are only created in L4LB mode. Fixes: #17935 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 23 November 2021, 20:23:26 UTC
f579ab7 bpf: Move time cache into separate header file Reduces scope to where it is really used given this creates a map. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 23 November 2021, 20:23:26 UTC
3bd4ad6 workflows: Run CodeQL workflow if the workflow is edited We use path filters in the CodeQL workflow to avoid running it for unrelated changes. However, the workflow file itself is missing from the path filters. As a result, the CodeQL workflow isn't run when the GitHub Actions it uses are updated by dependabot. This commit fixes it. Signed-off-by: Paul Chaignon <paul@cilium.io> 23 November 2021, 17:31:09 UTC