https://github.com/cilium/cilium

Revision Author Date Message Commit Date
769efda Prepare for release v1.13.0-rc2 Signed-off-by: André Martins <andre@cilium.io> 01 November 2022, 12:57:18 UTC
8aec7c6 update AUTHORS and Documentation Signed-off-by: André Martins <andre@cilium.io> 01 November 2022, 12:57:18 UTC
c84c03f Fix: prevent goroutine leakage Use the ctx passed to startSynchronizingCiliumNodes instead of wait.NeverStop. Signed-off-by: kerthcet <kerthcet@gmail.com> 01 November 2022, 10:05:49 UTC
b8e8ca4 hive: Add title to Module() and enforce format To support short module identifiers for use in logs the module name is now an identifier with forced format (lower-case, 30 chars). This however hampers readability when visualizing the hive, so for this purpose we add a title to the module. This is also forced to only contain alpha-numeric characters and be at most 80 characters in length to mostly keep the module line in "PrintObjects" short enough to fit on one line. Signed-off-by: Jussi Maki <jussi@isovalent.com> 01 November 2022, 10:00:35 UTC
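The ID and title constraints described above can be enforced with two regular expressions. The sketch below is hypothetical, and its exact patterns are assumptions. The ID is limited to lower-case letters, digits and `-`, at most 30 characters. The title is limited to letters and digits at most 80 characters, and also permits spaces so that multi-word titles are possible. The real hive checks may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed formats; the exact patterns used by hive may differ.
var (
	// Module ID: starts with a lower-case letter, then lower-case letters,
	// digits or '-', 1 to 30 characters in total.
	idRegex = regexp.MustCompile(`^[a-z][a-z0-9-]{0,29}$`)
	// Module title: letters, digits and spaces, 1 to 80 characters.
	titleRegex = regexp.MustCompile(`^[a-zA-Z0-9 ]{1,80}$`)
)

// validateModule rejects IDs and titles outside the assumed formats.
func validateModule(id, title string) error {
	if !idRegex.MatchString(id) {
		return fmt.Errorf("invalid module id %q: must be lower-case and at most 30 characters", id)
	}
	if !titleRegex.MatchString(title) {
		return fmt.Errorf("invalid module title %q: must be alphanumeric and at most 80 characters", title)
	}
	return nil
}

func main() {
	fmt.Println(validateModule("k8s-client", "Kubernetes client"))
	fmt.Println(validateModule("K8sClient", "Kubernetes client"))
}
```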
e946cbb Remove log message exception from tests Signed-off-by: Thomas Balthazar <thomas@balthazar.info> 01 November 2022, 10:00:23 UTC
3d79d98 Use the new error type to decide on the log level We want a log level of "debug" instead of "error" when this error occurs: "local-redirect service exists for frontend, skip update for svc lrp-demo-service" Fixes: #16400 Signed-off-by: Thomas Balthazar <thomas@balthazar.info> 01 November 2022, 10:00:23 UTC
60f3bc0 Test the new error type This adds unit tests for the new error type and function added in the previous commit. Signed-off-by: Thomas Balthazar <thomas@balthazar.info> 01 November 2022, 10:00:23 UTC
d5feb78 Add a new ErrLocalRedirectServiceExists error type As explained in #16400, we want to be able to choose the log level depending on this error type. Signed-off-by: Thomas Balthazar <thomas@balthazar.info> 01 November 2022, 10:00:23 UTC
4f464bc build(deps): bump github.com/hashicorp/consul/api from 1.15.2 to 1.15.3 Bumps [github.com/hashicorp/consul/api](https://github.com/hashicorp/consul) from 1.15.2 to 1.15.3. - [Release notes](https://github.com/hashicorp/consul/releases) - [Changelog](https://github.com/hashicorp/consul/blob/main/CHANGELOG.md) - [Commits](https://github.com/hashicorp/consul/compare/api/v1.15.2...api/v1.15.3) --- updated-dependencies: - dependency-name: github.com/hashicorp/consul/api dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 01 November 2022, 09:59:51 UTC
228f495 alibabacloud: Fix create ENI failure due to invalid parameter SecondaryPrivateIpAddressCount is optional but must not be zero. If it is zero, omit it. Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> 01 November 2022, 09:59:17 UTC
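The fix amounts to sending an optional numeric parameter only when it is non-zero. The following sketch illustrates this with a plain parameter map. The map-based request shape and the `VSwitchId` key are assumptions for illustration and are not the Alibaba Cloud SDK API.

```go
package main

import (
	"fmt"
	"strconv"
)

// createENIParams builds request parameters for ENI creation. Only the
// SecondaryPrivateIpAddressCount name comes from the commit; the map-based
// request is a hypothetical stand-in for the SDK request type.
func createENIParams(vSwitchID string, secondaryIPCount int) map[string]string {
	params := map[string]string{"VSwitchId": vSwitchID}
	// SecondaryPrivateIpAddressCount is optional but must not be zero,
	// so it is only sent when a positive count is requested.
	if secondaryIPCount > 0 {
		params["SecondaryPrivateIpAddressCount"] = strconv.Itoa(secondaryIPCount)
	}
	return params
}

func main() {
	fmt.Println(createENIParams("vsw-123", 0))
	fmt.Println(createENIParams("vsw-123", 5))
}
```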
5a6b713 Remove chart fields planned for removal in 1.12 Signed-off-by: xin.li <xin.li@daocloud.io> 01 November 2022, 01:52:06 UTC
ab62c64 correct the stale documentation link Signed-off-by: Dmitry Savintsev <dmitris@users.noreply.github.com> 01 November 2022, 01:50:20 UTC
7f7c00b makefile: cleanup comments The commit 7303c02bc2 ("bpf-mock makefile: Remove removed mock target") removed the bpf-mock-lint target, but left behind some comments about running it. Remove the mentions of bpf-mock. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 31 October 2022, 23:14:20 UTC
b2bff0f makefile: add a new target to run 'golangci-lint run --fix' There is a make target to execute the 'golangci-lint run' command, but there is no simple way to execute it with a '--fix' argument. Add a new target to do this. If more arguments need to be added to the golangci-lint command, then one can do this by setting the newly added GOLANGCI_LINT_ARGS variable. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 31 October 2022, 23:14:20 UTC
021f75b build(deps): bump github/codeql-action from 2.1.28 to 2.1.29 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.28 to 2.1.29. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/cc7986c02bac29104a72998e67239bb5ee2ee110...ec3cf9c605b848da5f1e41e8452719eb1ccfb9a6) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 31 October 2022, 22:31:00 UTC
572f3b6 daemon: Use disabled k8s client in daemon tests The refactoring to remove the global clients caused the daemon tests to fail as now the fake clientset was passed down to NodeDiscovery. Previously it worked as k8s.SetClients() wasn't called in daemon tests. Fix this by implementing Disable() support in the fake client and disable it in daemon tests. Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 October 2022, 19:48:41 UTC
927bd8c k8s: Remove the global k8s client accessors This removes the k8s.Client(), k8s.CiliumClient() etc. client accessors and refactors all prior uses of them to use a client.Clientset instead. Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 October 2022, 19:48:41 UTC
2db7211 k8s/client: Add getters This adds the GetSecrets, GetK8sNode and GetCiliumNode into the Clientset in preparation for removing the global k8s.Client() etc. accessors that implemented these. Signed-off-by: Jussi Maki <jussi@isovalent.com> 31 October 2022, 19:48:41 UTC
0c21fc3 Sign Cilium container images Implement container image signing using cosign in build-images workflows. Leveraging image signing gives users confidence that the container images they got from the container registry were the trusted code that the maintainer built and published. Fixes: cilium#19282 Signed-off-by: Sandipan Panda <samparksandipan@gmail.com> 31 October 2022, 11:10:32 UTC
9a0858e gha: Add gateway API conformance test This commit adds the Gateway API conformance test from upstream. The goal is to have it running on every PR, so that we can catch any issue due to regression, refactoring or adding new features. By default, the upstream conformance suite is not configured with the query param matching feature. To reduce the coupling with upstream, conformance_test.go is added for flexibility; for example, query param tests are enabled. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
e3dfed8 gateway-api: Support v1alpha2 ReferenceGrant API This commit is to support ReferenceGrant for cross-namespace resources: - Secret is referenced in Gateway - Service is referenced in HTTPRoute The conformance test is also enabled with ReferenceGrant feature. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
759f716 gateway/api: Add secrets sync mechanism This commit makes sure that any TLS-related secret is synced to the cilium-secrets namespace, so that the agent's permission is scoped down to a single namespace instead of cluster-wide. The same approach is used in Ingress. However, it's better to keep it separate because: - the underlying framework is different (e.g. controller-runtime) - it is a placeholder to support the ReferenceGrant API later. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
a3e32f0 gateway-api: Add reconciliation step for Gateway For gateway resources, the reconciliation should start if any of the following events happen: - Changes in the related GatewayClass - Changes in the status of any HTTPRoute - Changes in the status of owning LB services - Changes in owning CEC (as it currently has no status subresource) - Changes in any Secret used in TLS As we are using the same LB service for all listeners, ListenerStatus is all or nothing. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
db3ab62 gateway-api/translation: Add translation logic This commit translates our internal representation to CEC, LB Service and Endpoints. The logic is the same as in the default translator, except for a few tweaks: - hostname matching is suffix based - multiple listeners might have the same port number (e.g. 80 or 443), so we need to consider only unique values. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
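When several listeners share a port such as 80 or 443, the LB Service needs each port only once. The following minimal sketch shows this de-duplication. The `Listener` type and `uniquePorts` helper are hypothetical simplifications and are not Cilium's translator types. The ports are sorted so that the output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Listener is a hypothetical, trimmed-down listener model.
type Listener struct {
	Name string
	Port uint32
}

// uniquePorts returns each listener port once, in ascending order, so that
// several listeners sharing 80 or 443 yield a single service port each.
func uniquePorts(listeners []Listener) []uint32 {
	seen := make(map[uint32]struct{})
	var ports []uint32
	for _, l := range listeners {
		if _, ok := seen[l.Port]; ok {
			continue // port already emitted by an earlier listener
		}
		seen[l.Port] = struct{}{}
		ports = append(ports, l.Port)
	}
	sort.Slice(ports, func(i, j int) bool { return ports[i] < ports[j] })
	return ports
}

func main() {
	ls := []Listener{{"http-a", 80}, {"https-a", 443}, {"http-b", 80}, {"https-b", 443}}
	fmt.Println(uniquePorts(ls))
}
```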
58c8aff gateway-api/ingestion: Add ingestion logic This commit converts Gateway API resources to our internal representation. The logic is pretty simple; just a few things to highlight compared to Ingress: - Query match and header match are supported. - Request header filter is supported for operations Set, Add and Remove. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
c90d388 gateway-api/model: Support request header filters This commit is to support request header add/set/removal operation. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
ee85bf8 gateway-api/model: Support direct response for Route This is to support the scenario in which no backend is valid or available. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
51e128c gateway-api/model: Support hostnames in routes This commit adds support for hostnames at the route level, mainly for stricter domain validation compared to the listener domain. For example, a listener might have a wildcard domain such as *.example.com, but each route might have its own sub-domains such as route1.example.com or route2.example.com. If nothing is specified at the route level, the value from the listener is honoured. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
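The route-level hostname rule above can be combined with the suffix-based matching mentioned in db3ab62. The following is a hypothetical sketch, not Cilium's model code. A leading `*.` is treated as a suffix match, so `*.example.com` covers `route1.example.com` but not `example.com` itself. When a route has no hostnames, the listener hostname is used. The sketch ignores edge cases such as an empty listener hostname or wildcard route hostnames.

```go
package main

import (
	"fmt"
	"strings"
)

// hostnameMatches reports whether host falls under pattern. A leading "*."
// means suffix matching: "*.example.com" matches "route1.example.com" and
// "a.b.example.com", but not "example.com".
func hostnameMatches(pattern, host string) bool {
	if strings.HasPrefix(pattern, "*.") {
		return strings.HasSuffix(host, pattern[1:]) // keeps the leading "."
	}
	return pattern == host
}

// effectiveHostnames returns the route hostnames allowed by the listener
// hostname, or the listener hostname itself when the route sets none.
func effectiveHostnames(listenerHost string, routeHosts []string) []string {
	if len(routeHosts) == 0 {
		return []string{listenerHost}
	}
	var out []string
	for _, h := range routeHosts {
		if hostnameMatches(listenerHost, h) {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	fmt.Println(effectiveHostnames("*.example.com", []string{"route1.example.com", "other.org"}))
	fmt.Println(effectiveHostnames("*.example.com", nil))
}
```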
71cc3be gateway-api/model: Support multiple TLS secrets This commit is to support multiple TLS secrets, which can be useful for some use cases in Gateway API. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
bb37b82 translator: Refactor the shared translator This commit is just to lift and shift existing shared translator (used in Ingress) to higher level, so that it can be re-used naturally for both Ingress and Gateway API. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
edd998b gateway-api/model: Extend matching rules in Route This commit adds support for header and query param matching, and also adds a weight attribute for backends. The goal is to prepare for supporting more options in HTTPRoute from Gateway API in subsequent changes. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
d0a43aa gateway-api: Add reconciliation step for HTTPRoute For HTTPRoute resources, the reconciliation should start if any of the following events happen: - Changes in the HTTPRoute itself - Changes in related backend services - Changes in the parent Gateway spec (e.g. allowedRoutes) The current reconciliation loop tries its best to make sure that the HTTPRoute is attachable to the Gateway. If all validations pass, the Accepted condition is updated to True, which signals the reconciliation loop for parent Gateway resources. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
8733eb0 gateway-api: Add reconciliation step for GatewayClass This adds a simple reconciliation loop, which just checks the controller name and updates the status accordingly. Currently, there is no support for GatewayClass configuration from either a custom resource or a configmap (preferred), hence the Accepted condition is simply updated to True. Future improvements can add different sets of configuration parameters (e.g. internal vs external, etc). Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
410af24 helm,cli: Add option to enable Gateway API feature This commit only adds a flag for enabling Gateway API support. The permissions for the operator clusterrole are updated as required. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
31e6f9a operator: Add skeleton to Gateway API controller This commit leverages the controller-runtime library for Gateway API controllers. The operator-sdk CLI is used to create the scaffold structure and code.
```
operator-sdk create api --group gateway --version v1beta1 --kind GatewayClass --resource --controller --namespaced=false
operator-sdk create api --group gateway --version v1beta1 --kind Gateway --resource --controller
operator-sdk create api --group gateway --version v1beta1 --kind HTTPRoute --resource --controller
```
One adjustment is to separate the reconciliation event trigger (e.g. watching specific resources) and the reconciliation logic itself into different files (e.g. gateway.go and gateway_reconcile.go). This creates some space for the actual implementation later. Signed-off-by: Tam Mach <tam.mach@cilium.io> 31 October 2022, 10:26:27 UTC
7cc3e3a fqdn/dnsproxy: Rewrite dnsproxy benchmark This rewrites the Benchmark_perEPAllow_setPortRulesForID benchmark to test a "real world" scenario in terms of allocations, CPU usage and heap memory. Long regexes take a lot of space, and the previous version of this benchmark wasn't very representative. In general, this test now simulates an agent that has 60 running pods at any given time, and they keep recycling until we have created endpoint number 10000 (nEPs). They all have distinct rules, with some exceptions: every everyNIsEqual-th EP has the same DNS rules, and every everyNHasWildcard-th EP has a wildcard rule. This is very useful since we don't need to rely on some CNP dump, and it behaves like a real world agent with a "limited" number of pods running at any given time. It also helps benchmark new changes, since the pprof dump now only contains the relevant data, and not mostly YAML+JSON parsing like Benchmark_perEPAllow_setPortRulesForID_large. Also, previously it looked like this benchmark had two distinct selectors, but since they were null, in the end it only had one. The new "B(HeapInUse)/op" value is a rough calculation of how much the heap size grows from the start of the benchmark to the end. This is a very good proxy for how much memory the dnsproxy will consume during runtime.
$ go test -v -bench='^Benchmark_perEPAllow_setPortRulesForID$' -run="^$" -benchtime 1x -memprofile memprofile.prof ./pkg/fqdn/dnsproxy

cache size of 1024:
Benchmark_perEPAllow_setPortRulesForID-7   1    91863112900 ns/op   1786241024 B(HeapInUse)/op    78642436208 B/op   167454992 allocs/op
cache size of 128:
Benchmark_perEPAllow_setPortRulesForID-7   1    88338408700 ns/op    224722944 B(HeapInUse)/op    78642493144 B/op   167456155 allocs/op
cache size of 1:
Benchmark_perEPAllow_setPortRulesForID-7   1    79318500100 ns/op    105177088 B(HeapInUse)/op    86872021272 B/op   180312236 allocs/op
cache size of 0 (disabled):
Benchmark_perEPAllow_setPortRulesForID-7   1   144261660700 ns/op    209502208 B(HeapInUse)/op   169280546296 B/op   309321759 allocs/op

Comparing the old default of a 1024 cache to 128:

name                              old time/op          new time/op          delta
_perEPAllow_setPortRulesForID-7   83.7s ±17%           103.3s ±18%          +23.44%  (p=0.032 n=5+5)

name                              old B(HeapInUse)/op  new B(HeapInUse)/op  delta
_perEPAllow_setPortRulesForID-7   1.79G ± 0%           0.22G ± 0%           -87.40%  (p=0.008 n=5+5)

name                              old alloc/op         new alloc/op         delta
_perEPAllow_setPortRulesForID-7   78.6GB ± 0%          78.6GB ± 0%          ~        (p=0.310 n=5+5)

name                              old allocs/op        new allocs/op        delta
_perEPAllow_setPortRulesForID-7   167M ± 0%            167M ± 0%            ~        (p=0.421 n=5+5)

Signed-off-by: Odin Ugedal <ougedal@palantir.com> 28 October 2022, 13:47:50 UTC
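A custom per-op metric like "B(HeapInUse)/op" can be reported with `testing.B.ReportMetric`. The following minimal sketch assumes that heap growth is measured as the change in `runtime.MemStats.HeapInuse` between the start and end of a run, with a GC before each reading. This is one plausible reading of the commit message, not its actual code. The 1 MiB retained allocations are a placeholder workload, not the dnsproxy rules.

```go
package main

import (
	"fmt"
	"runtime"
	"testing"
)

// heapInUse forces a GC and returns runtime.MemStats.HeapInuse, so the
// reading reflects live objects rather than uncollected garbage.
func heapInUse() uint64 {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapInuse
}

// measureHeapGrowth runs f and returns how much HeapInuse grew (0 if it shrank).
func measureHeapGrowth(f func()) uint64 {
	start := heapInUse()
	f()
	end := heapInUse()
	if end < start {
		return 0
	}
	return end - start
}

// sink keeps the placeholder workload's allocations reachable.
var sink [][]byte

// BenchmarkRetainedHeap shows the reporting side: growth divided by b.N,
// reported under the custom unit "B(HeapInUse)/op". It would live in a
// _test.go file and run with -benchtime 1x, as in the command above.
func BenchmarkRetainedHeap(b *testing.B) {
	sink = nil
	growth := measureHeapGrowth(func() {
		for i := 0; i < b.N; i++ {
			sink = append(sink, make([]byte, 1<<20)) // placeholder workload
		}
	})
	b.ReportMetric(float64(growth)/float64(b.N), "B(HeapInUse)/op")
}

func main() {
	sink = nil
	growth := measureHeapGrowth(func() {
		for i := 0; i < 8; i++ {
			sink = append(sink, make([]byte, 1<<20))
		}
	})
	fmt.Printf("heap in use grew by %d bytes\n", growth)
}
```

Unlike `B/op`, which counts all bytes allocated, this metric tracks memory that is still live at the end. This matches the intent of using it as a proxy for steady-state memory use.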
b3cd077 docs: Reword note in Azure CNI chaining documentation Clarify that Azure CNI chaining is different than Azure CNI Powered by Cilium. Signed-off-by: Will Daly <widaly@microsoft.com> 28 October 2022, 13:00:46 UTC
88c40e7 build(deps): bump go.opentelemetry.io/otel/trace from 1.10.0 to 1.11.1 Bumps [go.opentelemetry.io/otel/trace](https://github.com/open-telemetry/opentelemetry-go) from 1.10.0 to 1.11.1. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.10.0...v1.11.1) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/trace dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 28 October 2022, 12:58:34 UTC
6da1aae build(deps): bump google.golang.org/grpc from 1.49.0 to 1.50.1 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.49.0 to 1.50.1. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.49.0...v1.50.1) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 28 October 2022, 12:58:26 UTC
34e824d datapath: don't require TUNNEL_MAP for EgressGW In a direct-routing config, we don't actually require the TUNNEL_MAP for EgressGW. In the from-container path, handle_ipv4_from_lxc() gets the tunnel_endpoint from the EgressGW policy (and can trust that it's != 0). So extract an optimized __encap_and_redirect_lxc() that doesn't depend on TUNNEL_MAP. In the reply path, rev_nodeport_lb4() queries the IPCache to obtain the source node of an EgressGW connection. If that fails, there's no further fallback to the TUNNEL_MAP. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 October 2022, 12:58:15 UTC
cc4af99 bpf: encap: fix comment for encap_and_redirect_with_nodeid() encap_and_redirect_with_nodeid() doesn't do any tunnel lookup. Consequently it also doesn't return DROP_NO_TUNNEL_ENDPOINT. None of the callers is trying to handle such an error either, so this was presumably just a copy&paste typo. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 October 2022, 12:58:15 UTC
4665481 bpf: extract Egress Gateway logic Move most of the Egress Gateway-specific code into its header file, so that readers don't have to concern themselves with all the details. Fixes: https://github.com/cilium/cilium/issues/19785 Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 October 2022, 12:58:15 UTC
433a2f8 bpf: test: add TC per-packet LB test for service without backend Packets that are addressed to a VIP without any backend should be dropped. As the VIP doesn't get translated, this currently works "by accident" if no matching allow-policy for the VIP is installed. But we actually want this to happen independently of policy, with a proper drop reason. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 October 2022, 12:56:30 UTC
1a65968 bpf: test: add XDP LB test for service without backend Packets that are addressed to a VIP without any backend should be dropped. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 October 2022, 12:56:30 UTC
5b8a06e Fix behavior where packets leave node if there are no backends Resolves an issue where an outgoing packet destined for a service is not dropped if the service does not have any backends. Currently we do not return the service if there are no backends for it, meaning we never drop a packet in this case and instead simply route it through the kernel's default routes. Fixes: #21453 Signed-off-by: Michael Aspinwall <maspinwall@google.com> 28 October 2022, 12:56:30 UTC
dd08c3f fix: correction in PR #21825 This change corrects a typo from PR #21825 and changes the use of method 'GetBool' from 'viper' package to 'vp'. Signed-off-by: Nishant Burte <nburte@google.com> 28 October 2022, 12:54:43 UTC
4574392 resource: Make the resource lazy by default For some resource kinds we only have observers if a feature has been enabled. Rather than guarding the creation of the resource based on a set of "enabled" flags, make the resource lazy and only start the informer when the first observer (or call to Store()) is seen. Signed-off-by: Jussi Maki <jussi@isovalent.com> 28 October 2022, 12:53:09 UTC
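Lazy start-on-first-use maps naturally onto `sync.Once`. The following is a minimal sketch under assumptions. The `lazyResource` type is hypothetical, and starting the informer is reduced to incrementing a counter so the behavior can be observed.

```go
package main

import (
	"fmt"
	"sync"
)

// lazyResource defers starting its informer until it is actually used:
// either an observer subscribes or Store() is called. All names here are
// hypothetical; "starting the informer" is reduced to a counter.
type lazyResource struct {
	once      sync.Once
	starts    int // how many times the informer was started
	mu        sync.Mutex
	observers []func(key string)
}

// ensureStarted starts the informer exactly once, on first use.
func (r *lazyResource) ensureStarted() {
	r.once.Do(func() {
		r.starts++ // stand-in for launching the Kubernetes informer
	})
}

// Observe registers fn for future events and triggers the lazy start.
func (r *lazyResource) Observe(fn func(key string)) {
	r.ensureStarted()
	r.mu.Lock()
	r.observers = append(r.observers, fn)
	r.mu.Unlock()
}

// Store returns the (here empty) object store, also triggering the start.
func (r *lazyResource) Store() map[string]string {
	r.ensureStarted()
	return map[string]string{}
}

func main() {
	r := &lazyResource{}
	fmt.Println("informer starts before use:", r.starts)
	r.Observe(func(key string) {})
	_ = r.Store()
	fmt.Println("informer starts after use:", r.starts)
}
```

If a feature is disabled and nothing ever observes the resource, no informer is started and no watch is opened.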
0f17d75 Enable icmp error replies with new flag ICMP error replies (packet size too big) are sent in dsr_reply_icmp4 and dsr_reply_icmp6 functions. These replies are sent only when cilium is enabled in LBOnly mode and NodePortAcceleration is set to true. This change allows icmp error replies to be sent to the client in cilium CNI mode when enable-pmtu-discovery flag is set to true. By default, this flag is false. Fixes: #21795 Signed-off-by: Nishant Burte <nburte@google.com> 27 October 2022, 14:26:09 UTC
5905304 hubble: Add "hubble-prefer-ipv6" option When a node has mixed IPv4 and IPv6 addresses, but internal cluster communication is only IPv6-based, Hubble would announce wrong node addresses, which would break communication to the agents. This option allows controlling this behavior explicitly. The previous IPv4 preference is kept as the default. Signed-off-by: Heiko Rothe <me@heikorothe.com> 26 October 2022, 16:44:31 UTC
3b35e9d Enable exemplars for histogram queries in Hubble L7 workloads dashboard With https://github.com/cilium/cilium/pull/21599 it's possible to extract traceIDs into exemplars from HTTP flows, but to visualize them in the Hubble L7 HTTP workloads dashboard, we need to explicitly enable exemplars on the panels we wish to have them displayed on. This commit enables exemplars on the HTTP request duration by source/destination panels. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> 26 October 2022, 15:37:37 UTC
9fc43c0 bpf: nodeport: clarify handling for non-routable ClusterIP Move the check for *_svc_is_routable() into the preceding if-statement, where we already checked that *_lookup_service() returned a non-NULL svc. Then turn the whole construct into a simple if-else statement. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 26 October 2022, 15:37:09 UTC
7fb2d89 bpf: nodeport: remove bpf_skip_nodeport*() wrappers Just call the ctx_skip_nodeport*() helpers directly. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 26 October 2022, 15:37:09 UTC
5cbe132 bpf: clean up skip_redirect in bpf_host Currently we first set the skip_redirect flag, and then later act upon it. Just do everything in one step instead. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 26 October 2022, 15:37:09 UTC
8752c38 bpf: remove unused ipv4_ct_tuple in bpf_host Going back in history, this has been unused for a long time. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 26 October 2022, 15:37:09 UTC
345c283 Pin gcloud CLI version Ref: https://cloud.google.com/sdk/docs/release-notes#40500_2022-10-04 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 26 October 2022, 09:39:21 UTC
e704d0c dns: Add DataSource field to ProxyRequestContext Add a DataSource field to ProxyRequestContext so that the NotifyOnDNSMsgFunc callback can retrieve the data source information from the request context instead of assuming the data source is always "proxy". Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 25 October 2022, 19:39:04 UTC
bb3d217 build(deps): bump KyleMayes/install-llvm-action from 1.5.5 to 1.6.0 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.5 to 1.6.0. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/4f17b6579351fb03506d988e59077826c366412c...665aaf9d6fba342a852f55fecc5688e7f00e6663) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 25 October 2022, 08:47:30 UTC
750d716 build(deps): bump github.com/onsi/gomega from 1.20.2 to 1.22.1 Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.20.2 to 1.22.1. - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](https://github.com/onsi/gomega/compare/v1.20.2...v1.22.1) --- updated-dependencies: - dependency-name: github.com/onsi/gomega dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 25 October 2022, 08:45:29 UTC
2524f98 chore: Fix typo in contrib/script/kind-shell-helpers.sh Signed-off-by: SADIK KUZU <sadikkuzu@hotmail.com> 25 October 2022, 08:45:12 UTC
811bf1b Rename env var required to disable Flannel See https://docs.k3s.io/reference/env-variables for documentation on the INSTALL_K3S_EXEC environment variable. Signed-off-by: Eric Hausig <eric.hausig+github@gmail.com> 25 October 2022, 08:44:54 UTC
0b7919a alibabacloud: Release ENI after failure to attach Delete ENI after a failed AttachNetworkInterface API call. Fixes: #21747 Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> 25 October 2022, 08:44:44 UTC
fc23d12 ipam/metrics: Deprecate `cilium_operator_ipam_available_interfaces` Deprecate `cilium_operator_ipam_available_interfaces` and add the following metrics: * cilium_operator_ipam_interface_candidates * cilium_operator_ipam_empty_interface_slots Related: #21747 Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> 25 October 2022, 08:44:44 UTC
42fadfd ipam/crd: Fix ENI leak due to miscounting of empty interface slots Currently, in CRD IPAM modes, before the operator creates a new interface (or ENI for brevity), it checks the number of remaining empty ENI slots to ensure the max ENI limit is not exceeded. However, the check is made against the field `AllocationAction.AvailableInterfaces`, which is set as `(empty ENI slots) + (attached ENIs that have not yet reached the IP address capacity)` in `PrepareIPAllocation`. This makes the ENI limit check ineffective in situations where an ENI with free space somehow can't be assigned more IPs, e.g. when the ENI's subnet is exhausted. As a result, the operator continuously tries to create and attach ENIs to the instance; the createInterface calls succeed but the attachInterface calls might fail due to the ENI limit, and ENIs are leaked and eventually exhausted. The cause of the miscounting might be the ambiguous field name `AvailableInterfaces`, which might be interpreted with different meanings in different contexts: 1. empty ENI slots, in the interface creation context 2. attached ENIs with IP space, in the IP allocation context 3. the sum of 1. and 2., in the node IPAM stats context This patch fixes this by checking the max ENI limit against the correct value, and also replaces `AllocationAction.AvailableInterfaces` with the clearer names `InterfaceCandidates` and `EmptyInterfaceSlots` to avoid potential misunderstanding in the future. Fixes: #21747 Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> 25 October 2022, 08:44:44 UTC
4c9c1d3 ipam: Fix overlapping PodCIDR allocation This commit fixes an edge case in the `NodesPodCIDRManager`. If there were any nodes without PodCIDRs at operator startup, the operator would sometimes assign these nodes PodCIDRs that had already been allocated to other nodes. The operator assumed that when `k8sCiliumNodesCacheSynced` closes, all node events have been processed, and it then proceeds to call `Resync` on the `nodeManager`. The `NodesPodCIDRManager` will queue any nodes without PodCIDRs to be allocated once the `canAllocatePodCIDRs` variable is set. This variable is set by the `Resync`. So, the assumption/expected behavior is that the `NodesPodCIDRManager.Update` function has been called for all nodes in the cache before `Resync` is called. However, this wasn't the case. The `startSynchronizingCiliumNodes` function starts the informer and connects the nodeManager to it. But instead of handling the events immediately, the callbacks enqueue the events, to be handled by a separate goroutine. This means that `k8sCiliumNodesCacheSynced` is closed once all of the node events are enqueued, not when they have been processed by the `nodeManager`. This commit fixes this behavior by processing all events immediately in the informer callbacks until the full sync is complete, at which point it switches over to using the workqueue. Fixes: #21482 Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 25 October 2022, 08:44:29 UTC
0f4ee73 ipam: Add test for overlapping PodCIDR assignment edge case There exists an edge case in the operator's pod CIDR allocation code which causes the assignment of overlapping PodCIDRs. This commit adds a test which currently fails, proving that the bug exists. The test checks that when the operator starts up, all assigned PodCIDRs are marked as allocated, so they will never be given out to nodes seeking a new PodCIDR. The next commit in this patch set will fix the bug, using this test to validate correct behavior. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 25 October 2022, 08:44:29 UTC
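The difference between the old and the new limit check fits in a few lines. The struct below keeps only the two field names the commit introduces. Everything else, including the `canCreateInterface` helper, is a hypothetical simplification. The example reproduces the failure scenario from the message: an instance at its ENI limit whose one attached ENI has free capacity but cannot get more IPs.

```go
package main

import "fmt"

// AllocationAction keeps only the two fields that, per the commit, replace
// AvailableInterfaces; the rest of the real struct is omitted.
type AllocationAction struct {
	// InterfaceCandidates: attached interfaces that can still take more IPs.
	InterfaceCandidates int
	// EmptyInterfaceSlots: interfaces the instance can still attach.
	EmptyInterfaceSlots int
}

// canCreateInterface enforces the max-ENI limit against empty slots only.
// Hypothetical helper.
func canCreateInterface(a AllocationAction) bool {
	return a.EmptyInterfaceSlots > 0
}

func main() {
	// At the ENI limit; one attached ENI has room but its subnet is exhausted.
	a := AllocationAction{InterfaceCandidates: 1, EmptyInterfaceSlots: 0}
	oldAvailable := a.InterfaceCandidates + a.EmptyInterfaceSlots // old AvailableInterfaces
	fmt.Println("old check allows create:", oldAvailable > 0)
	fmt.Println("new check allows create:", canCreateInterface(a))
}
```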
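The fix follows a "process inline until synced, then queue" pattern. The following minimal sketch is hypothetical. A buffered channel stands in for the client-go workqueue, and appending to a slice stands in for `nodeManager.Update`. It shows why every node seen before the sync signal is guaranteed to have been handled before `Resync` runs.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeHandler processes node events inline until the initial cache sync
// completes, and only afterwards hands them to a queue. All names are
// hypothetical stand-ins for the operator's types.
type nodeHandler struct {
	mu        sync.Mutex
	synced    bool
	processed []string    // nodes already passed to the manager
	queue     chan string // post-sync events, drained by a worker
}

// OnUpdate is the informer callback.
func (h *nodeHandler) OnUpdate(node string) {
	h.mu.Lock()
	if !h.synced {
		// Initial listing: handle now, so every node seen before the sync
		// signal has reached the manager before Resync runs.
		h.processed = append(h.processed, node)
		h.mu.Unlock()
		return
	}
	h.mu.Unlock()
	h.queue <- node // steady state: defer to a worker goroutine
}

// MarkSynced switches the handler to queued mode; Resync is safe afterwards.
func (h *nodeHandler) MarkSynced() {
	h.mu.Lock()
	h.synced = true
	h.mu.Unlock()
}

func main() {
	h := &nodeHandler{queue: make(chan string, 16)}
	h.OnUpdate("node-a")
	h.OnUpdate("node-b")
	h.MarkSynced()
	fmt.Println("processed before resync:", h.processed)
	h.OnUpdate("node-c")
	fmt.Println("queued after sync:", <-h.queue)
}
```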
0d6346d install: move cni config management to the agent This commit makes use of the logic added to the agent to manage the CNI configuration file. The "cni-uninstall.sh" script is no longer necessary, so we can just remove that. The cni-install.sh script only does binary installation now. It plumbs some helm variables a bit differently, but shouldn't change any behavior. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 25 October 2022, 08:44:19 UTC
12b7b11 agent: generate and manage CNI configuration file, add to healthz Rather than doing this as part of a PostStart hook, let's move the logic into the agent itself. This commit adds the logic to generate an appropriate CNI config file, ported over from cni-install.sh. It is fairly straightforward except for some subtleties around AWS. This means that pod creation no longer fails while the agent is starting up, which improves user experience. This also wires up status probing so that the daemon doesn't consider itself ready until a CNI configuration file is successfully written. This makes rollouts safer, since the DaemonSet controller stops a rollout whose new pods never become ready. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 25 October 2022, 08:44:19 UTC
7d881aa vendor: add tidwall/gjson tidwall/sjson Signed-off-by: Casey Callendrello <cdc@isovalent.com> 25 October 2022, 08:44:19 UTC
174a500 api: add CNI config file writer status This adds a field to determine the status of the CNI file writer controller. This will be incorporated into the overall Cilium status, since a failure to write CNI config may be a cluster-breaking event. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 25 October 2022, 08:44:19 UTC
79fc7f2 docs: fix CNI setup mistakes The documentation erroneously stated that CNI configuration would be preserved. I verified that this isn't true; the CNI file is always overwritten (unless CNI management is disabled). Also, an environment variable was confusingly described. Signed-off-by: Casey Callendrello <cdc@isovalent.com> 25 October 2022, 08:44:19 UTC
7032bf5 build(deps): bump github.com/coreos/go-systemd/v22 from 22.3.2 to 22.4.0 Bumps [github.com/coreos/go-systemd/v22](https://github.com/coreos/go-systemd) from 22.3.2 to 22.4.0. - [Release notes](https://github.com/coreos/go-systemd/releases) - [Commits](https://github.com/coreos/go-systemd/compare/v22.3.2...v22.4.0) --- updated-dependencies: - dependency-name: github.com/coreos/go-systemd/v22 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 25 October 2022, 08:43:20 UTC
b4a4b62 docs: Update Ingress troubleshooting docs This commit is to add a small note showing which resources should be checked for shared LB mode. Signed-off-by: Tam Mach <tam.mach@cilium.io> 22 October 2022, 00:12:49 UTC
bd72515 docs: Update Ingress example docs The following items are done as part of this commit: - Embed the Ingress resource in the docs, so that users can reference it more easily. - Update the output snippet for shared LB mode. Signed-off-by: Tam Mach <tam.mach@cilium.io> 22 October 2022, 00:12:49 UTC
e004b64 docs: Update Ingress installation docs with LB mode This is to make sure that the LB mode is visible for end users to discover and follow. Signed-off-by: Tam Mach <tam.mach@cilium.io> 22 October 2022, 00:12:49 UTC
181059d gha: Add wait step for Conformance Ingress (shared) This is to make sure that the cleanup of sanity test is done before the conformance test is kicked off. The goal is to avoid any conflict in HTTP rules as the shared LB is used in this case. Signed-off-by: Tam Mach <tam.mach@cilium.io> 22 October 2022, 00:12:49 UTC
39ce43c helm: move pod capability configuration to chart values The previous iteration of the chart had a hardcoded `SYS_MODULE` capability, which rendered the Pod unschedulable on platforms without kernel modules (such as Talos Linux). This change moves the default capabilities of all (init) containers to the values, which makes them configurable for end users. The old behaviour (which includes `SYS_MODULE`) remains the default. Fixes #20636 Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com> 22 October 2022, 00:10:10 UTC
d37c0b2 helm: document securityContext values It was documented as an object, while sub-values have clear meanings. Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com> 22 October 2022, 00:10:10 UTC
d9d04a1 probe: fix kernel config probe log In https://github.com/cilium/cilium/issues/20858 and in my lab, I tested a kernel with 1) a missing kernel config file (/boot/config-xxx and /proc/config.gz) and 2) a missing required kernel config option, CONFIG_NET_CLS_ACT. When both conditions 1 and 2 are met, the cilium agent logs neither the missing kernel config file nor the missing required kernel config option. The reason is that when the kernel config file is missing, SystemConfigProbes() returns ErrKernelConfigNotFound, but its caller only logs an error when SystemConfigProbes() detects a missing required kernel config option. That never happens, because SystemConfigProbes() has already returned and the kernel config option check is skipped. After the fix:
Warning for a missing kernel config file:
    level=info msg="clang (10.0.0) and kernel (5.18.0) versions: OK!" subsys=linux-datapath
    level=info msg="linking environment: OK!" subsys=linux-datapath
    level=warning msg="Kernel Config file not found: Check required kernel config parameter \ if cilium agent fail to start!" subsys=probes
    …SNIP…
    level=warning msg="+ tc qdisc replace dev cilium_vxlan clsact" subsys=datapath-loader
    level=warning msg="Error: Cannot find ingress queue for specified device." subsys=datapath-loader
Warning for a missing required kernel config parameter when the kernel config file is found:
    level=info msg="clang (10.0.0) and kernel (5.18.0) versions: OK!" subsys=linux-datapath
    level=info msg="linking environment: OK!" subsys=linux-datapath
    level=warning msg="BPF system config check: NOT OK." error="CONFIG_NET_CLS_ACT kernel parameter \ is required (needed for: Essential eBPF infrastructure)" subsys=linux-datapath
    …SNIP…
    level=warning msg="+ tc qdisc replace dev cilium_vxlan clsact" subsys=datapath-loader
    level=warning msg="Error: Cannot find ingress queue for specified device." subsys=datapath-loader
Fix: #20858 Signed-off-by: Vincent Li <v.li@f5.com> 21 October 2022, 23:00:11 UTC
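The control-flow problem described in d9d04a1 can be illustrated with a small sketch. Only the sentinel name ErrKernelConfigNotFound comes from the commit; the map-based config, the functions systemConfigProbes and probeWarning, and the warning strings are hypothetical. The point is that the "config file not found" error needs its own branch instead of being silently dropped by a caller that only looks for missing options.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKernelConfigNotFound mirrors the sentinel error named in the commit.
var ErrKernelConfigNotFound = errors.New("kernel config file not found")

// systemConfigProbes is a hypothetical stand-in for SystemConfigProbes():
// a nil config means no kernel config file could be read; otherwise every
// required option must be enabled.
func systemConfigProbes(config map[string]bool, required []string) error {
	if config == nil {
		return ErrKernelConfigNotFound
	}
	for _, opt := range required {
		if !config[opt] {
			return fmt.Errorf("%s kernel parameter is required", opt)
		}
	}
	return nil
}

// probeWarning returns the warning a caller should log, or "" if none.
// The buggy caller only reported the missing-option case; this reports both.
func probeWarning(config map[string]bool, required []string) string {
	err := systemConfigProbes(config, required)
	switch {
	case err == nil:
		return ""
	case errors.Is(err, ErrKernelConfigNotFound):
		return "kernel config file not found: check required kernel config parameters"
	default:
		return "BPF system config check: NOT OK: " + err.Error()
	}
}

func main() {
	required := []string{"CONFIG_NET_CLS_ACT"}
	fmt.Println(probeWarning(nil, required))
	fmt.Println(probeWarning(map[string]bool{"CONFIG_NET_CLS_ACT": false}, required))
	fmt.Printf("%q\n", probeWarning(map[string]bool{"CONFIG_NET_CLS_ACT": true}, required))
}
```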
b91326b build(deps): bump go.etcd.io/etcd/[api|client/pkg]/v3 from 3.5.4 to 3.5.5 Bumps [go.etcd.io/etcd/api/v3](https://github.com/etcd-io/etcd) from 3.5.4 to 3.5.5. - [Release notes](https://github.com/etcd-io/etcd/releases) - [Changelog](https://github.com/etcd-io/etcd/blob/main/Dockerfile-release.amd64) - [Commits](https://github.com/etcd-io/etcd/compare/v3.5.4...v3.5.5) --- updated-dependencies: - dependency-name: go.etcd.io/etcd/api/v3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 21 October 2022, 22:56:01 UTC
37a1329 build(deps): bump KyleMayes/install-llvm-action from 1.5.4 to 1.5.5 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.4 to 1.5.5. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/c538b5e281d5fc40848a3a62636a3a2b6f5a1cfa...4f17b6579351fb03506d988e59077826c366412c) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 21 October 2022, 22:43:52 UTC
a003635 clustermesh-apiserver: Add support for pprof Add support for pprof debug endpoints, disabled by default. Default port is set to 6063, to avoid clashing with daemon default port (6060), operator default port (6061) and hubble-relay default port (6062). Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 21 October 2022, 22:33:31 UTC
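A minimal sketch of an opt-in pprof endpoint on a dedicated port, as described in a003635. Only "disabled by default" and the 6063 default port come from the commit. The function name newPprofServer, the localhost bind address and the exact handler set are assumptions, not the clustermesh-apiserver's code.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/pprof"
	"strconv"
)

// Default port from the commit message: 6063 avoids the agent (6060),
// operator (6061) and hubble-relay (6062) defaults.
const defaultPprofPort = 6063

// newPprofServer (hypothetical) returns nil when pprof is disabled; otherwise
// it returns a server exposing the standard pprof handlers on localhost:port.
func newPprofServer(enabled bool, port int) *http.Server {
	if !enabled {
		return nil
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, ...
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return &http.Server{
		Addr:    net.JoinHostPort("localhost", strconv.Itoa(port)),
		Handler: mux,
	}
}

func main() {
	fmt.Println(newPprofServer(false, defaultPprofPort) == nil) // disabled by default
	srv := newPprofServer(true, defaultPprofPort)
	fmt.Println(srv.Addr)
	// A real program would run srv.ListenAndServe() in a goroutine here.
}
```

Importing net/http/pprof also registers its handlers on http.DefaultServeMux as a side effect. Serving them from a dedicated mux keeps the debug endpoints confined to the opt-in port.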
c9f7f90 hive: Use a new type for the start and stop contexts to avoid confusion The context passed to Start and Stop hooks is used to abort the start and stop in case of a timeout. This however is not that evident to devs new to this. This changes context.Context to hive.HookContext to make it clear that the context has a narrow purpose. Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
e266722 k8s/resource: Make handlers optional in Event.Handle() In many cases the subscriber to resource events does not care to handle some event types (e.g. sync). Make the handler functions optional so the subscriber doesn't need to implement a no-op handler. Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
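The idea in e266722, where a nil handler is simply skipped, can be sketched as follows. The Event type, its kinds and the simplified Handle signature are hypothetical stand-ins. The real k8s/resource API carries more context, such as completion callbacks.

```go
package main

import "fmt"

// EventKind enumerates hypothetical event types delivered to subscribers.
type EventKind int

const (
	Sync EventKind = iota // initial listing complete
	Upsert
	Delete
)

// Event is a simplified, hypothetical resource event.
type Event[T any] struct {
	Kind EventKind
	Key  string
	Obj  T
}

// Handle dispatches to the handler for the event kind. Every handler is
// optional: a nil handler is treated as a successful no-op.
func (ev Event[T]) Handle(onSync func() error, onUpdate, onDelete func(key string, obj T) error) error {
	switch ev.Kind {
	case Sync:
		if onSync != nil {
			return onSync()
		}
	case Upsert:
		if onUpdate != nil {
			return onUpdate(ev.Key, ev.Obj)
		}
	case Delete:
		if onDelete != nil {
			return onDelete(ev.Key, ev.Obj)
		}
	}
	return nil
}

func main() {
	onUpdate := func(key string, obj int) error {
		fmt.Println("updated", key, obj)
		return nil
	}
	events := []Event[int]{{Kind: Sync}, {Kind: Upsert, Key: "a", Obj: 1}, {Kind: Delete, Key: "a"}}
	for _, ev := range events {
		// Only an update handler is supplied; sync and delete are no-ops.
		if err := ev.Handle(nil, onUpdate, nil); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```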
8bf430b k8s/resource: Drop Store[T] from SyncEvent The store is already available via Resource.Store() and in most cases one uses either the Store directly, or the stream of events. When using the event stream the store is usually not needed as subscribers build their own projected view of the state via update events. Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
655c313 stream: Add goroutine leak checks As stream is another library besides pkg/k8s/resource that uses a lot of goroutines, add the leak checks here as well. Fix an issue in Debounce: by handling the context cancellation directly instead of relying on the error from the src observable, Debounce ended up not draining the errors channel, causing a goroutine leak (ToChannel blocked on sending to the errs channel). Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
434c660 hive: Make Hook into an interface Simplifies dealing with lifecycle in the normal case:
    func (*MyObject) Start(context.Context) error { ... }
    func (*MyObject) Stop(context.Context) error { ... }
    func New(lc fx.Lifecycle) *MyObject {
        mo := &MyObject{}
        lc.Append(mo)
        return mo
    }
Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
8fb0504 k8s/resource: Make retries configurable Instead of a fixed 5 retries allow constructing the resource with a configurable retry strategy:
    type ErrorHandler func(key Key, numRetries int, err error) ErrorAction
    func WithErrorHandler(h ErrorHandler) Option
Resource is now constructed with New that takes functional options:
    func New[T k8sRuntime.Object](lc hive.Lifecycle, lw cache.ListerWatcher, opts ...Option) Resource[T]
This new API now allows building the ListerWatcher using other objects:
    cell.Provide(
        func(lc hive.Lifecycle, c client.Clientset, cfg MyConfig) resource.Resource[*MyObject] {
            lw := /* constructed based on 'cfg' */
            return resource.New(lc, lw)
        })
Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
d9ac332 k8s/client: Set up clientset in constructor The clientset was only constructed from OnStart in order to stop use of it prior to it being started (so that we can inspect the object graph without unwanted side-effects). This however made it inconvenient to use in cases where a specific interface was being extracted from the clientset, e.g. to build a ListerWatcher. The usability hit is not worth it, so change the k8s client to construct a usable clientset at construction time. Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
1577620 logging: Initialize klog from SetupLogging The init() in logging bridges the klog library (used by client-go) to logrus. It uses the WriterLevel() method, which forks off a goroutine that implements an io.Writer using a pipe. This unfortunately leaves multiple goroutines around, which cause false positives in goroutine leak checks. Since SetupLogging() is always called in our applications, we can perform the klog initialization from there and allow tests to do goroutine checks. Signed-off-by: Jussi Maki <jussi@isovalent.com> 21 October 2022, 22:33:02 UTC
320762b fqdn: convert cacheEntry.IPs and (*DNSCache).Update to netip.Addr Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
944c93a ip: add MustAddrsFromIPs This helper function converts a []net.IP slice to a []netip.Addr slice and will be used while transitioning the code base to use netip types. Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
1f8f99d fqdn: remove benchmarks covering net standard library functionality These served their purpose for performance evaluation during the initial implementation. Now we're switching to net/netip, so these benchmarks can be removed. Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
6765bc8 fqdn: convert DNSZombieMapping.IP to netip.Addr Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
d352da2 fqdn: use netip.Addr as DNSZombieMapping.deletes map key This avoids converting to a string for map lookups. Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
224f539 fqdn: use netip.Addr as DNSCache.reverse map key This avoids converting to a string for map lookups. Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
f73e29d fqdn: use netip.Addr as ipEntries map key This avoids converting to a string for map lookups. Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
d76fe63 fqdn: remove always-nil (*DNSZombieMappings).ForceExpire argument The cidr argument to (*DNSZombieMappings).ForceExpire is always passed as nil from all existing code paths. All usage where the cidr argument is provided is internal to the package and directly calls (*DNSZombieMappings).forceExpireLocked. Signed-off-by: Tobias Klauser <tobias@cilium.io> 21 October 2022, 22:29:50 UTC
ab59a05 pkg/labels: Optimize LabelArray String() Use a string builder instead of concatenating strings manually, as the former is much more performant.
    $ benchstat old.txt new.txt
    name                 old time/op    new time/op    delta
    LabelArray_String-7  5.97µs ±11%    1.17µs ±13%    -80.34%  (p=0.000 n=30+30)
    name                 old alloc/op   new alloc/op   delta
    LabelArray_String-7  6.93kB ± 0%    0.50kB ± 0%    -92.73%  (p=0.000 n=30+30)
    name                 old allocs/op  new allocs/op  delta
    LabelArray_String-7  78.0 ± 0%      6.0 ± 0%       -92.31%  (p=0.000 n=30+30)
Signed-off-by: Odin Ugedal <ougedal@palantir.com> 21 October 2022, 22:28:26 UTC
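A sketch of the strings.Builder technique from ab59a05. Label and LabelArray here are simplified, hypothetical stand-ins for pkg/labels, and the "[source:key=value ...]" output format is an assumption. The point is that every fragment is appended to one growing buffer instead of each `+` concatenation allocating a new intermediate string.

```go
package main

import (
	"fmt"
	"strings"
)

// Label and LabelArray are simplified, hypothetical stand-ins.
type Label struct {
	Key, Value, Source string
}

type LabelArray []Label

// String renders the array as "[source:key=value ...]" by writing all
// fragments into a single strings.Builder.
func (ls LabelArray) String() string {
	var sb strings.Builder
	sb.WriteString("[")
	for i, l := range ls {
		if i > 0 {
			sb.WriteString(" ")
		}
		sb.WriteString(l.Source)
		sb.WriteString(":")
		sb.WriteString(l.Key)
		sb.WriteString("=")
		sb.WriteString(l.Value)
	}
	sb.WriteString("]")
	return sb.String()
}

func main() {
	ls := LabelArray{{Key: "app", Value: "web", Source: "k8s"}, {Key: "env", Value: "prod", Source: "k8s"}}
	fmt.Println(ls.String()) // [k8s:app=web k8s:env=prod]
}
```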
cdf0d12 pkg/labels: Optimize LabelArray GetModel() Allocate a long enough slice right away to avoid unnecessary allocations.
    $ benchstat old.txt new.txt
    name                   old time/op    new time/op    delta
    LabelArray_GetModel-7  2.02µs ±13%    1.46µs ±18%    -28.04%  (p=0.000 n=29+30)
    name                   old alloc/op   new alloc/op   delta
    LabelArray_GetModel-7  1.22kB ± 0%    0.62kB ± 0%    -48.68%  (p=0.000 n=30+30)
    name                   old allocs/op  new allocs/op  delta
    LabelArray_GetModel-7  32.0 ± 0%      27.0 ± 0%      -15.62%  (p=0.000 n=30+30)
Signed-off-by: Odin Ugedal <ougedal@palantir.com> 21 October 2022, 22:28:26 UTC
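The pre-allocation in cdf0d12 amounts to giving the result slice its final capacity up front, so append never has to grow and copy it. The types and the label string format below are simplified, hypothetical stand-ins for pkg/labels.

```go
package main

import "fmt"

// Label and LabelArray are simplified, hypothetical stand-ins.
type Label struct{ Key, Value, Source string }

func (l Label) String() string { return l.Source + ":" + l.Key + "=" + l.Value }

type LabelArray []Label

// GetModel returns the string form of every label. The result slice is
// allocated once with capacity len(ls), so append never reallocates it.
func (ls LabelArray) GetModel() []string {
	res := make([]string, 0, len(ls))
	for _, l := range ls {
		res = append(res, l.String())
	}
	return res
}

func main() {
	m := LabelArray{{Key: "app", Value: "web", Source: "k8s"}, {Key: "env", Value: "prod", Source: "k8s"}}.GetModel()
	fmt.Println(m, len(m), cap(m)) // [k8s:app=web k8s:env=prod] 2 2
}
```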
bc20e84 fqdn/dnsproxy: Add concurrency grace period parameter Add a parameter to set the concurrency grace period of the dnsproxy. Doing this instead of relying directly on the value from the option package enables better reuse of the dnsproxy from other consumers. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 21 October 2022, 22:25:01 UTC