Revision Author Date Message Commit Date
08521c9 Prepare for release v1.10.12 Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2022, 23:41:39 UTC
9175938 envoy: Bump cilium envoy to latest version v1.21.3 [ upstream commit 85819de7518f411d61df106200dae247973c5117 ] The images digest is coming from below build. https://github.com/cilium/proxy/runs/6816960166?check_suite_focus=true. Release note: https://www.envoyproxy.io/docs/envoy/v1.21.3/version_history/current Signed-off-by: Tam Mach <tam.mach@cilium.io> 10 June 2022, 10:02:46 UTC
73f2028 ipam: Remove superfluous if statement [ upstream commit eac0dee9d6110d340ea7df6885486b4327052d2f ] The `node.Spec.IPAM.Pool` value is always overwritten after the removed `if` statement, so there is no need to initialize it. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 June 2022, 17:11:53 UTC
7adfc56 ipam: Fix inconsistent update of CiliumNodes [ upstream commit 6a1f1757c586ebda992a900fe0d757b6590e3d14 ] Currently, when the cilium-operator attaches new ENIs to a node, we update the corresponding CiliumNode in two steps: first the .Status, then the .Spec [1]. That can result in an inconsistent state, where the CiliumNode .Spec.IPAM.Pool contains new IP addresses associated with the new ENI, while .Status.ENI.ENIs is still missing the ENI. This inconsistency manifests as a fatal error: level=fatal msg="Error while creating daemon" error="Unable to allocate router IP for family ipv4: failed to associate IP 10.12.14.5 inside CiliumNode: unable to find ENI eni-9ab538c64feb9f59e" subsys=daemon This inconsistency occurs because the following can happen: 1. cilium-operator attaches a new ENI to the CiliumNode. 2. Still at cilium-operator, .Spec is synced with kube-apiserver. The IP pool is updated with a new set of IP addresses and the new ENI. 3. The agent receives this half-updated CiliumNode. 4. It allocates an IP address for the router from the pool of IPs attached to the new ENI, using .Spec.IPAM.Pool. 5. It fails because the new ENI is not listed in the .Status.ENI.ENIs of the CiliumNode object. 6. At cilium-operator, .Status is updated with the new ENI. But wait, you said .Status is updated before .Spec in the function you linked? Yes, but we read the state to populate CiliumNode from two separate places (n.ops.manager.instances and n.available) in the syncToAPIServer function and we don't have anything to prevent having a half-updated (one place only) state in the middle of the update function. We lock twice, once for each place, instead of once for the whole CiliumNode update. So having a half-updated state in the middle of the function would technically be the same as updating .Spec first and .Status second. 
We can fix this by first creating a snapshot of the pool, then writing the .Status metadata (which may be more recent than the pool snapshot, which is safe, see comment in the source code of this patch), and then writing the pool to .Spec. This ensures that the .Status is always updated before .Spec, but at the same time also ensures that .Status is still more recent than .Spec. 1 - https://github.com/cilium/cilium/blob/v1.12.0-rc2/pkg/ipam/node.go#L966-L1012 Co-authored-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 June 2022, 17:11:53 UTC
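The ordering described in 7adfc56 (snapshot the pool, write .Status, then write .Spec from the older snapshot) can be illustrated with a small Go sketch. Every type and function name below (`nodeState`, `ciliumNode`, `syncNode`) is a hypothetical stand-in rather than Cilium's actual code, and the `between` callback exists only to simulate an ENI being attached in the middle of the update.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeState is a hypothetical stand-in for the operator's in-memory view.
type nodeState struct {
	mu   sync.Mutex
	pool map[string]string // IP address -> ID of the ENI that owns it
	enis map[string]bool   // IDs of attached ENIs
}

// ciliumNode is a hypothetical, heavily reduced CiliumNode object.
type ciliumNode struct {
	SpecPool   map[string]string
	StatusENIs map[string]bool
}

// attachENI records a new ENI and its IPs under a single lock.
func (n *nodeState) attachENI(eni string, ips ...string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.enis[eni] = true
	for _, ip := range ips {
		n.pool[ip] = eni
	}
}

func (n *nodeState) snapshotPool() map[string]string {
	n.mu.Lock()
	defer n.mu.Unlock()
	out := make(map[string]string, len(n.pool))
	for ip, eni := range n.pool {
		out[ip] = eni
	}
	return out
}

func (n *nodeState) snapshotENIs() map[string]bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	out := make(map[string]bool, len(n.enis))
	for eni := range n.enis {
		out[eni] = true
	}
	return out
}

// syncNode follows the order from the commit message.
func syncNode(n *nodeState, cn *ciliumNode, between func()) {
	pool := n.snapshotPool() // 1. snapshot the pool first
	if between != nil {
		between() // simulates a concurrent ENI attachment
	}
	cn.StatusENIs = n.snapshotENIs() // 2. write .Status (may be newer than the pool)
	cn.SpecPool = pool               // 3. write .Spec from the older snapshot
}

// consistent reports whether every pooled IP refers to an ENI listed in status.
func consistent(cn *ciliumNode) bool {
	for _, eni := range cn.SpecPool {
		if !cn.StatusENIs[eni] {
			return false
		}
	}
	return true
}

func main() {
	n := &nodeState{
		pool: map[string]string{"10.0.0.1": "eni-a"},
		enis: map[string]bool{"eni-a": true},
	}
	cn := &ciliumNode{}
	syncNode(n, cn, func() { n.attachENI("eni-b", "10.0.0.2") })
	fmt.Println("consistent:", consistent(cn))
}
```

Because the pool snapshot is taken before the status is read, the status can only list more ENIs than the pool refers to, never fewer, which is the safe direction for the agent.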
6bacf77 ui: drop envoy proxy container [ upstream commit cb6c554f0b27685c4032b591d091464e7e6004b3 ] Previously we used an envoy proxy container to convert grpc-web traffic to true grpc traffic, so that the ui backend accepts real grpc traffic. An alternative approach is to use a special wrapper for the grpc service on the backend: https://github.com/improbable-eng/grpc-web/tree/master/go/grpcweb The related code changes in hubble-ui itself were implemented in this PR: https://github.com/cilium/hubble-ui/pull/226 Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 09 June 2022, 17:11:53 UTC
4c7a729 Also take secondary CIDRs into account when checking IPv4NativeRoutingCIDR [ upstream commit e8b1210fb1e9a17f7c32e366c3f1dfd3b846b24d ] The given IPv4NativeRoutingCIDR is not necessarily part of the primary VPC CIDR and may as well be part of one of the secondary CIDRs. We should take these into account as well before bailing out. Signed-off-by: Alexander Block <ablock84@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 09 June 2022, 17:11:09 UTC
13d883b Move auto detection logic for IPv4NativeRoutingCIDR into own function [ upstream commit 6c6ab7422590518045ab0fc3b9afcf4951cb04c1 ] Signed-off-by: Alexander Block <ablock84@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 09 June 2022, 17:11:09 UTC
5bebaf5 build(deps): bump actions/cache from 3.0.3 to 3.0.4 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.3 to 3.0.4. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/30f413bfed0a2bc738fdfd409e5a9e96b24545fd...c3f1317a9e7b1ef106c153ac8c0f00fed3ddbc0d) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 08 June 2022, 23:20:44 UTC
b1e007a bugtool: Add structured node and health output [ upstream commit c6af5800f1f4e38d0ca5b161fac3750791ad9452 ] This commit adds the `-o json` output to `cilium node list` and `cilium-health status`, as the text version of both does not contain all details. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
c85faff Add constants for identity types [ upstream commit aa7572be0f7cfece0ebfb739716ab970c478d358 ] Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
a6ffac9 Add type label to the identity metric [ upstream commit f9863ec990507a52bf2074a93d8f4725556aad0c ] Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmail.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
c4aca8b clustermesh: Add ownerReferences for CiliumNodes [ upstream commit 3500290754b098c843ad774884626f35f5866a5a ] This commit is to add ownerReferences for CiliumNodes created by CiliumExternalWorkload, so that we don't unintentionally GC invalid CN. Thanks to @nathanejohnson for reporting this issue. Fixes: #19907 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
168a8c1 pkg/fqdn: Fix missing delete for forward map [ upstream commit f439177bed8f99e04e1ad4dbbea5d5cbc54341ed ] This commit fixes a bug where the keys of the forward map inside the DNS cache were never removed, causing the map to grow forever. By contrast, the reverse map keys were being deleted. For both the forward and reverse maps (which are both maps whose values are another map), the inner map keys were being deleted. In other words, the delete on the outer map key was missing for the forward map. In addition to fixing the bug, this commit expands the unit test coverage to assert after any deletes (entries expiring or GC) that the forward and reverse maps contain what we expect. Particularly, in an environment where there are many unique DNS lookups (unique FQDNs) being done, this forward map could grow quite large over time, especially for a long-lived workload (endpoint). This fixes this memory-leak-like bug. Fixes: cf387ce5058 ("fqdn: Introduce TTL-aware cache for DNS retention") Fixes: f6ce522d55d ("FQDN: Added garbage collector functions.") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
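The bug fixed in 168a8c1 comes down to deleting inner map keys without ever deleting the now-empty outer key. A minimal sketch of the corrected pattern follows; the `dnsCache` type and its fields are hypothetical simplifications, not the real pkg/fqdn data structures.

```go
package main

import "fmt"

// dnsCache is a hypothetical, simplified two-way cache.
type dnsCache struct {
	forward map[string]map[string]struct{} // name -> set of IPs
	reverse map[string]map[string]struct{} // IP -> set of names
}

func newDNSCache() *dnsCache {
	return &dnsCache{
		forward: map[string]map[string]struct{}{},
		reverse: map[string]map[string]struct{}{},
	}
}

func (c *dnsCache) add(name, ip string) {
	if c.forward[name] == nil {
		c.forward[name] = map[string]struct{}{}
	}
	c.forward[name][ip] = struct{}{}
	if c.reverse[ip] == nil {
		c.reverse[ip] = map[string]struct{}{}
	}
	c.reverse[ip][name] = struct{}{}
}

// remove deletes the inner entries and, once an inner map is empty,
// the outer key as well. Skipping the outer delete leaves one empty
// map behind per unique name, so the outer map grows without bound.
func (c *dnsCache) remove(name, ip string) {
	if ips, ok := c.forward[name]; ok {
		delete(ips, ip)
		if len(ips) == 0 {
			delete(c.forward, name)
		}
	}
	if names, ok := c.reverse[ip]; ok {
		delete(names, name)
		if len(names) == 0 {
			delete(c.reverse, ip)
		}
	}
}

func main() {
	c := newDNSCache()
	c.add("cilium.io", "192.0.2.1")
	c.remove("cilium.io", "192.0.2.1")
	fmt.Println(len(c.forward), len(c.reverse))
}
```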
ee92f56 helm: use port 80/443 by default for the peer service [ upstream commit 7c86fe77439a328ea905928fb937cb17c3d4e875 ] When the service port for the peer service is not specified, it is automatically assigned port 80 or port 443 (when TLS is enabled). Using these ports makes it easy to understand whether TLS is enabled for the service or not. Moreover, it makes the behavior consistent with the Hubble Relay service. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
d281e7a metrics: Bump prometheus client library [ upstream commit 044681a8540ba5cfc39b400bfc79a3c1723f9f13 ] This commit is to bump prometheus client library to the latest (e.g. v1.12.2) which will have a better support for go collector of different go versions, and fix NaN value. Fixes: #19985 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
7c589ca daemon, metrics: Expose active FQDN connections per endpoint [ upstream commit 47870fe0d3ebdb4bcb859031480241fceadef48b ] This commit exposes new metrics that show the number of active names and IPs in the DNS cache, and number of alive FQDN connections that have expired (aka zombies) per endpoint. This is useful to track the endpoint's DNS cache and DNS zombie cache sizes over time. Note that these metrics are only updated during the FQDN GC which is currently invoked every minute. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
cf0b687 pkg/metrics: Define FQDN subsystem [ upstream commit d9ae1c4d790b5780b38d36a8c592286bd2e44599 ] This commit contains no functional changes and is only cosmetic to ease future commits when adding new FQDN-related metrics. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
c100a47 pkg/fqdn: Provide DNSCache count for metrics collection [ upstream commit 95362deef50f7077c516da7a6bc4a03b05d361d8 ] This commit adds new convenience functions to get a count of * how many entries are inside the DNS cache (IPs) * and how many FQDNs are inside the DNS cache They will be used by upcoming commits to expose these values as metrics. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
ae30cfb daemon: Change the default FQDN regex LRU to be 1024 [ upstream commit ce9583d8d8bbe618fb08abfa94c35e4fcc3153c7 ] Following the previous commit's benchmark result, let's update the LRU default size to be 1024, given that it only results in an increase of a few tens of MBs when the cache nears full. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
9d0e0f1 dnsproxy: Add benchmark for large FQDN-based CNPs [ upstream commit 38c00367c7e0cefa1c9bbd51ad4414a839015bff ] When comparing the efficiency of increasing the LRU size from 128 to 1024 with ~22k CNPs, we see the following results:
```
# LRU size 128.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > old.txt
# LRU size 1024.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > new.txt
$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                          old ns/op      new ns/op      delta
Benchmark_perEPAllow_setPortRulesForID_large-8     3954101340     3010934555     -23.85%

benchmark                                          old allocs     new allocs     delta
Benchmark_perEPAllow_setPortRulesForID_large-8     26480632       24167742       -8.73%

benchmark                                          old bytes      new bytes      delta
Benchmark_perEPAllow_setPortRulesForID_large-8     2899811832     1824062992     -37.10%
```
Here's the raw test run with LRU size at 128:
```
Before (N=1) Alloc = 31 MiB HeapInuse = 45 MiB Sys = 1260 MiB NumGC = 15
After (N=1) Alloc = 445 MiB HeapInuse = 459 MiB Sys = 1260 MiB NumGC = 40
```
Here's the raw test run with LRU size at 1024:
```
Before (N=1) Alloc = 31 MiB HeapInuse = 48 MiB Sys = 1177 MiB NumGC = 17
After (N=1) Alloc = 78 MiB HeapInuse = 93 MiB Sys = 1177 MiB NumGC = 53
```
We can see that it's saving ~300MB. Furthermore, if we compare the memprofiles from the benchmark run via
```
go tool pprof -http :8080 -diff_base memprofile.out memprofile.1024.out
```
we see an ~800MB reduction in the regex compilation. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
51f4daf daemon, fqdn: Add flag to control FQDN regex LRU size [ upstream commit 5fa7ae278340c41a60050cf564d60a52ad588b1b ] Advanced users can configure the LRU size for the cache holding the compiled regex expressions of FQDN match{Pattern,Name}. This is useful if users are experiencing high memory usage spikes with many FQDN policies that have repeated matchPattern or matchName across many different policies. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
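To make the idea behind 51f4daf concrete, here is a minimal sketch of an LRU cache for compiled regular expressions, built only on the Go standard library. The `regexLRU` type, its `compiles` counter and the `get` method are hypothetical illustrations; Cilium's real cache and flag wiring are not reproduced here, and this sketch is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
)

// entry pairs a pattern with its compiled form.
type entry struct {
	pattern string
	re      *regexp.Regexp
}

// regexLRU is a hypothetical, non-concurrent LRU of compiled regexes.
type regexLRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // pattern -> element in order
	compiles int                      // number of regexp.Compile calls
}

func newRegexLRU(capacity int) *regexLRU {
	return &regexLRU{
		capacity: capacity,
		order:    list.New(),
		items:    map[string]*list.Element{},
	}
}

// get returns the cached regex for pattern, compiling it on a miss and
// evicting the least recently used entry when the cache is over capacity.
func (c *regexLRU) get(pattern string) (*regexp.Regexp, error) {
	if el, ok := c.items[pattern]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	c.compiles++
	c.items[pattern] = c.order.PushFront(&entry{pattern: pattern, re: re})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).pattern)
	}
	return re, nil
}

func main() {
	c := newRegexLRU(2)
	for _, p := range []string{`^a\.io$`, `^b\.io$`, `^a\.io$`} {
		if _, err := c.get(p); err != nil {
			fmt.Println("compile error:", err)
		}
	}
	fmt.Println("compilations:", c.compiles)
}
```

A larger capacity trades a bounded amount of retained memory for fewer repeated compilations, which is the tradeoff the benchmark in 9d0e0f1 measures.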
2a3ff7c pkg/labels: Optimize SortedList() and FormatForKVStore() [ upstream commit 0790e076cb0a7a83a6314eb559c97811d3240bcb ] FormatForKVStore() previously returned a string for no reason as every caller converted the return value to a byte slice. This allows us to eliminate string concatenation entirely and use the bytes.Buffer directly. Building on the above, given that SortedList() returns a byte slice and calls FormatForKVStore() for its output, we can optimize it with the same technique to eliminate string concatenation. Here are the benchmark comparisons:
```
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > old.txt
$ go test -v -run '^$' -bench 'BenchmarkLabels_SortedList|BenchmarkLabel_FormatForKVStore' -benchmem ./pkg/labels > new.txt
$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                             old ns/op     new ns/op     delta
BenchmarkLabels_SortedList-8          2612          1120          -57.12%
BenchmarkLabel_FormatForKVStore-8     262           54.5          -79.18%

benchmark                             old allocs     new allocs     delta
BenchmarkLabels_SortedList-8          35             13             -62.86%
BenchmarkLabel_FormatForKVStore-8     4              1              -75.00%

benchmark                             old bytes     new bytes     delta
BenchmarkLabels_SortedList-8          1112          664           -40.29%
BenchmarkLabel_FormatForKVStore-8     96            48            -50.00%
```
Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
2841700 pkg/labels: Add benchmark for hot labels code [ upstream commit 351c5d8f66eb7a3a35f833462a256c9fda36f51e ] SortedList() and FormatForKVStore() can be very hot code in environments where there's constant policy churn, especially CIDR policies where there can be a large number of CIDR labels. This commit adds benchmarks for later commits to use as a baseline. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
7211294 pkg/policy/api: Optimize FQDNSelector String() [ upstream commit 4c2c2446577d92fb15f0314efa487d77a6ce0812 ] Use strings.Builder instead of fmt.Sprintf() and preallocate the size of the string so that Go doesn't need to over-allocate if the string ends up longer than what the buffer growth algorithm predicts. Results:
```
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > old.txt
$ go test -v -run '^$' -bench 'BenchmarkFQDNSelectorString' -benchmem ./pkg/policy/api > new.txt
$ benchcmp old.txt new.txt
benchmark                         old ns/op     new ns/op     delta
BenchmarkFQDNSelectorString-8     690           180           -73.97%

benchmark                         old allocs     new allocs     delta
BenchmarkFQDNSelectorString-8     9              4              -55.56%

benchmark                         old bytes     new bytes     delta
BenchmarkFQDNSelectorString-8     288           208           -27.78%
```
Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
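The technique named in 7211294 (a strings.Builder with its capacity reserved up front) looks roughly like the sketch below. The `fqdnSelector` type and the exact output format are assumptions made for illustration and may differ from the real FQDNSelector in pkg/policy/api.

```go
package main

import (
	"fmt"
	"strings"
)

// fqdnSelector is a hypothetical stand-in with two string fields.
type fqdnSelector struct {
	MatchName    string
	MatchPattern string
}

// String builds the result in a single preallocated buffer instead of
// going through fmt.Sprintf. The output format here is an assumption.
func (s fqdnSelector) String() string {
	const (
		namePrefix    = "MatchName: "
		patternPrefix = ", MatchPattern: "
	)
	var b strings.Builder
	// Reserve the exact final size so the builder never has to regrow.
	b.Grow(len(namePrefix) + len(s.MatchName) + len(patternPrefix) + len(s.MatchPattern))
	b.WriteString(namePrefix)
	b.WriteString(s.MatchName)
	b.WriteString(patternPrefix)
	b.WriteString(s.MatchPattern)
	return b.String()
}

func main() {
	s := fqdnSelector{MatchName: "cilium.io", MatchPattern: "*.cilium.io"}
	fmt.Println(s.String())
}
```

Since every part of the output is a plain string, the final length is known before any byte is written, which is what makes the exact `Grow` call possible.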
7702056 endpoint: Fix lock contention header file sync [ upstream commit 507332a082c8c4780e67f219dadf42f61772e908 ] Upon a successful DNS response, Cilium's DNS proxy code will sync the DNS history state to the individual endpoint's header file. Previously, this sync was done inside a trigger, however the calling code, (*Endpoint).SyncEndpointHeaderFile(), acquired a write-lock for no good reason. This effectively negated the benefits of having the DNS history sync behind a trigger of 5 seconds. This is especially suboptimal because the header file sync is actually causing Cilium to serialize processing the DNS request for a single endpoint. To illustrate the impact of the above a bit more concretely, if a single endpoint does 10 DNS requests at the same time, acquiring the write-lock causes the processing of those 10 requests to be done one at a time. For the sake of posterity, this is not the case if 10 endpoints were to make DNS requests in parallel. This obviously has a performance impact, both in terms of being slow CPU-wise and in terms of memory. In an environment with bursty DNS requests, for example, it could cause an uptick in memory usage due to many goroutines being created and blocking due to the serialized nature of locking. Now that the code is all executing behind a trigger, we can remove the lock completely and initialize the trigger setup where the Endpoint object is created (e.g. createEndpoint(), parseEndpoint()). Now the lock is only taken every 5 seconds when the trigger runs. This should relieve the lock contention drastically. For context, in a user's environment where the pprof was shared with us, there were around 440 goroutines with 203 of them stuck waiting inside SyncEndpointHeaderFile(). We can also modify SyncEndpointHeaderFile() to no longer return an error, because it's not possible for invoking the trigger to fail. 
If we fail to initialize the trigger itself, then we log an error, but this is essentially impossible because it can only fail if the trigger func is nil (which we control). Understanding the locking contention came from inspecting the pprof via the following command and subsequent code inspection. ``` go tool trace -http :8080 ./cilium ./pprof-trace ``` Suggested-by: Michi Mutsuzaki <michi@isovalent.com> Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com> 08 June 2022, 08:16:39 UTC
f4b2e1d cmd: Allow more complicated patterns in map string type. [ upstream commit 070ded019adbfee49d73bc7be0c6ba3bac0ef59c ] The previous PR #18478 wraps the existing viper GetStringMapString function to get around upstream bugs. However, it unintentionally restricted a few formats that were previously supported in Cilium, such as: - --aws-instance-limit-mapping=c6a.2xlarge=4,15,15,m4.xlarge=1,5,10 - --api-rate-limit=endpoint-create=rate-limit:10/s,rate-burst:10,parallel-requests:10,auto-adjust:true For complicated attributes, we allow the comma character in the value part of a key-value pair. As Go's built-in regex library does not support look-ahead, this commit replaces the strings.Split function with a custom implementation to handle such scenarios. Relates: #18478 Fixes: #18973 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
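One way to parse the two formats quoted in f4b2e1d is to treat a comma as a pair separator only when the token after it contains `=`. The sketch below takes that approach; it is an assumption-laden illustration (the function name `parseMapString` is hypothetical) and not the implementation from the commit, and it would mis-parse values that themselves contain `=`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMapString splits "k1=v1,v2,k2=v3" into {"k1": "v1,v2", "k2": "v3"}.
// A comma-separated token without "=" is treated as a continuation of
// the previous value rather than as a new key-value pair.
func parseMapString(s string) map[string]string {
	out := map[string]string{}
	var key string
	for _, tok := range strings.Split(s, ",") {
		if k, v, ok := strings.Cut(tok, "="); ok {
			key = k
			out[key] = v
		} else if key != "" {
			out[key] += "," + tok
		}
	}
	return out
}

func main() {
	fmt.Println(parseMapString("c6a.2xlarge=4,15,15,m4.xlarge=1,5,10"))
	fmt.Println(parseMapString("endpoint-create=rate-limit:10/s,rate-burst:10,parallel-requests:10,auto-adjust:true"))
}
```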
9745941 docs: Fix incorrect FQDN flag [ upstream commit 9c6e4245f0761d3e8bcf904785290e85f8fd336b ] Fixes: f6ce522d ("FQDN: Added garbage collector functions.") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
b8aeb84 docs: Fix max SPI value for IPsec key rotations [ upstream commit 54d708e5d812a00451adab99dae01609447de2cf ] The SPI value is expected to take 4 bits at most, so its maximum value should be 15, not 16. Let's fix that in the key rotation documentation. The agent also rejects value 0, so allowed values are [1;15]. Reported-by: Odin Ugedal via Slack Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
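The arithmetic behind b8aeb84 is that 4 bits can hold the values 0 through 2^4 - 1 = 15, and 0 is additionally rejected. A minimal sketch of such a range check, with the hypothetical names `spiBits`, `maxSPI` and `validSPI`, could look like this:

```go
package main

import "fmt"

const (
	spiBits = 4              // the SPI is encoded on 4 bits
	maxSPI  = 1<<spiBits - 1 // largest value that fits in 4 bits
)

// validSPI reports whether spi lies in the allowed range [1;15].
func validSPI(spi int) bool {
	return spi >= 1 && spi <= maxSPI
}

func main() {
	for _, spi := range []int{0, 1, 15, 16} {
		fmt.Println(spi, validSPI(spi))
	}
}
```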
a7eb2f3 Add counter to track all datapath timeouts due to FQDN IP updates [ upstream commit 29268926f571c3a008bafcd18ecfc9b494877627 ] Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmil.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
b1c1d8a api: change "group not found" log to debug [ upstream commit 80092ce0b2bf5351e4d71527904cbe401c2b8e4b ] Since commit 67f74ff ("images/cilium: remove cilium group from Dockerfile") the cilium group is no longer created in the image running the agent, resulting in the following log message on cilium-agent start: level=info msg="Group not found" error="group: unknown group cilium" file-path=/var/run/cilium/cilium.sock group=cilium subsys=api Change the log message to debug level to avoid confusion. Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
cfbcf30 Bugtool: Add additional tc commands. [ upstream commit b13dc89166e9a3d9eafe4a77fd96b389a05cfe1a ] The tc command prints out information not shown by bpftool. It is also possible that we may need information about tc entities that are not managed by Cilium when debugging Cilium issues. This adds extra commands to be run with cilium-bugtool, including listing tc qdiscs and getting filter/class/chain info for all network interfaces. Fixes: #17468 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
b04c14a Do not disable peer service when hubble.listenAddress is empty [ upstream commit 0b58178dd5d8d9fac123c1cd7bb1c66e7c42c97f ] Configure the peerService to access hubble on the hubble.peerService.targetPort rather than determining the port from hubble.listenAddress which may be empty when using a sidecar to proxy to hubble. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
91d6b40 docs: Remove '\r' chars from grep result to parse Alpine image name [ upstream commit f17a17a3677de9eac7284c0546f77ab8401d9f35 ] The first step for building the cilium/docs-builder image, used for building Cilium's documentation, consists in pre-pulling the image with Docker (to avoid failures from buildkit). The relevant command is formed by parsing the name of the Alpine image from the Dockerfile. On some setups, for example on Ubuntu running in Windows WSL with the Cilium repository mounted from a Windows partition, the Dockerfile may contain DOS-style line breaks (CR-LF). The result from "grep" being piped to xargs and passed to "docker pull", we get an error because Docker cannot recognise a valid reference with this '\r' character at the end of the string. Let's remove any carriage return characters before feeding the line to xargs. Reported-by: Yoyo Wu <yoyo19980720@163.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
225f11d docs: Add docs-builder build as dependency to live preview [ upstream commit 70676ec2c0f50074238b9da9edbe5b38a8f3544d ] With "make render-docs-live-preview", we use the cilium/docs-builder image to build a preview of the documentation, to serve it locally, and to watch the source files for changes to automatically update the preview. When the Docker image is present locally, the command uses it. When this is not the case, it pulls it from Docker, in its ":latest" version by default. This can be an issue due to commit 0da7224218ab ("ci: pin down image for documentation workflow"), where we pinned down the docs-builder image to use in the CI. Since this commit, the reference image is no longer ":latest", but the tag in use in the CI files. As a consequence, the live preview may attempt to use an outdated version of the image. This is currently the case: running the command with no local image raises an error about a missing "myst_parser" extension, which is not present on the version tagged with ":latest". To fix this, we mark builder-image as a dependency for the render-docs-live-preview target, so that the image gets built locally. Reported-by: Yoyo Wu <yoyo19980720@163.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
73d5621 ipsec: Fix off-by-one error on max SPI [ upstream commit c88244cc3be4bbb92cc64714d545b555e4afaeb8 ] We encoded the SPI (aka keyID) on 4 bits [1] in the xfrm and packet marks. The maximum value is therefore 15 and not 16. This commit fixes the check on the maximum keyID value. Note the documentation for IPsec key rotation already has the correct value [2] so there shouldn't be any users with an incorrect keyID. 1 - https://github.com/cilium/cilium/blob/v1.10.1/pkg/datapath/linux/ipsec/ipsec_linux.go#L147-L150 2 - https://docs.cilium.io/en/v1.10/gettingstarted/encryption-ipsec/#key-rotation Fixes: b698972 ("cilium: ipsec, support rolling updates") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 07 June 2022, 10:03:21 UTC
0f78a8c build(deps): bump actions/cache from 3.0.2 to 3.0.3 Bumps [actions/cache](https://github.com/actions/cache) from 3.0.2 to 3.0.3. - [Release notes](https://github.com/actions/cache/releases) - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md) - [Commits](https://github.com/actions/cache/compare/48af2dc4a9e8278b89d7fa154b955c30c6aaab09...30f413bfed0a2bc738fdfd409e5a9e96b24545fd) --- updated-dependencies: - dependency-name: actions/cache dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 31 May 2022, 17:59:45 UTC
3d481ff tests: fix K8sCLI tests based on labels [ upstream commit b42e5a0d197b1dba8a118725dd4238e271d874d2 ] New GKE clusters have the automatic labelling feature gate enabled by default, so the labels used in the `Identity CLI testing` `K8sCLI` test need to be updated with the additional `k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name` automatic label. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 30 May 2022, 21:48:37 UTC
fe6d2ab jenkins: switch to ad-hoc GKE cluster creation/deletion [ upstream commit 6260073b25f1584e3ad4ba67c034c421ea40f91b ] The general idea is to remove the need for our permanent pool of GKE clusters + management cluster (that manages the pool via Config Connector). Instead, we switch to ad-hoc clusters as we do on CI 3.0. This should: - Remove the upper limit on the number of concurrent Jenkins GKE jobs. - Remove the need for permanent clusters (reduce CI costs). - Have no effect on the setup time required before the tests actually start running on GKE clusters. - Improve control over GKE features (e.g. `DenyServiceExternalIPs` admission controller) that cannot be controlled via CNRM / Config Connector. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io> 30 May 2022, 21:48:37 UTC
0032f94 ci: set Cilium base version to v1.10.12 in kind conformance test The agent health check port will be changed for v1.10.12, ref. https://github.com/cilium/cilium-cli/pull/869. Set the base version explicitly to already use the correct port in PRs in preparation for v1.10. Signed-off-by: Tobias Klauser <tobias@cilium.io> 30 May 2022, 21:48:37 UTC
dd21fd1 daemon, helm: change default agent health check port to avoid conflicts [ upstream commit 22cd47ef496be1c0b78ff8d146b2240810e78978 ] The default value 9876 for the agent health port introduced in commit efffbdbebdbf ("daemon: expose HTTP endpoint on localhost for health checks") conflicts with Istio's ControlZ port. The latter has been in use for longer. Because the port is only exposed on localhost for use by liveness/readiness probe, we can change the default value without breaking users. Thus, change it to 9879 for which there doesn't seem to be any documented use. Suggested-by: Joe Stringer <joe@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 May 2022, 21:48:37 UTC
d146c29 docs: agent health port is only listening on localhost [ upstream commit 422a9df7bea3ed139f39690a7da0e74094b9d00c ] Ref. https://github.com/cilium/cilium/blob/dfa6b157e8e9484c65fd938b2b45a4f5f50c61f9/daemon/cmd/agenthealth.go#L25-L31 Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 May 2022, 21:48:37 UTC
e712453 cli: Update regex for key value validation [ upstream commit eaa714110d878a59e84b960a6cbc05e321b6e0fe ] This commit is to allow empty value, as well as @ character in value for key value pair validation. Fixes: #19793 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 May 2022, 21:48:37 UTC
d14a799 docs: Fix incorrect command in IPsec GSG [ upstream commit c0e20dc99a836ab786f68b14ef290c3dfb42e0a2 ] The encryption-interface flag doesn't exist. It is called encrypt-interface. Fixes: 3662560f ("Add new unified Helm guide") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 May 2022, 21:48:37 UTC
fd4ac3c Add concurrency limiting for DNS message processing [ upstream commit f48198195c29976fb4d378c281ba6d2aa20a9e3f ] This commit does the following: * Adds a configuration option for controlling the concurrency of the DNS proxy * Adds a configuration option for the semaphore (from above) timeout * Exposes an additional metric for the time taken to perform the policy check on a DNS request within the DNS proxy The concurrency limitation is done by introducing a semaphore to the DNS proxy. By default, no such limit is imposed. Users are advised to take into account the number of DNS requests[1] and how many CPUs are on each node in their cluster in order to come up with an appropriate concurrency limit. In addition, we expose the semaphore grace period as a configurable option. Choosing the "right" value for this timeout is a tradeoff that we shouldn't really make for the user. The semaphore grace period is there to prevent the situation where Cilium deadlocks, or where a consistently high rate of DNS traffic leaves Cilium unable to keep up. See https://github.com/cilium/cilium/pull/19543#discussion_r862136748 by <joe@cilium.io>. The user can take into account the rate at which they expect DNS requests to be flowing into Cilium and how many of those requests should be processed without retrying. If retrying isn't an issue, then keeping the grace period at 0 (default) will immediately free the goroutine handling the DNS request if the semaphore acquire fails. Conversely, if a backlog of "unproductive" goroutines is acceptable (and DNS request retries are not), then setting the grace period is advisable. This gives the goroutines some time to acquire the semaphore. Goroutines could pile up if the grace period is too high and there's a consistently high rate of DNS requests. It's worth noting that blindly increasing the concurrency limit will not linearly improve performance. 
It might actually degrade instead due to internal downstream lock contention (as seen by the recent commits to move Endpoint-related functions to use read-locks). Ultimately, it becomes a tradeoff between high number of semaphore timeouts (dropped DNS requests that must be retried) or high number of (unproductive) goroutines, which can consume system resources. [1]: The metric to monitor is ``` cilium_policy_l7_total{rule="received"} ``` Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 May 2022, 21:48:37 UTC
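The semaphore and grace period described in fd4ac3c can be sketched with a buffered channel from the standard library. The `limiter` type and its methods are hypothetical and do not reflect Cilium's actual option names or implementation; the sketch also assumes a limit of at least 1, whereas the commit's default is no limit at all.

```go
package main

import (
	"fmt"
	"time"
)

// limiter is a hypothetical counting semaphore with a grace period.
type limiter struct {
	slots chan struct{} // buffered; one element per in-flight request
	grace time.Duration // how long acquire may wait for a free slot
}

func newLimiter(limit int, grace time.Duration) *limiter {
	return &limiter{slots: make(chan struct{}, limit), grace: grace}
}

// acquire takes a slot if one is free. With a grace period of 0 it gives
// up at once; otherwise it waits up to the grace period before giving up.
func (l *limiter) acquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
	}
	if l.grace <= 0 {
		return false // caller drops the request; the client must retry
	}
	timer := time.NewTimer(l.grace)
	defer timer.Stop()
	select {
	case l.slots <- struct{}{}:
		return true
	case <-timer.C:
		return false
	}
}

// release frees a slot previously taken by acquire.
func (l *limiter) release() { <-l.slots }

func main() {
	l := newLimiter(1, 0)
	fmt.Println("first:", l.acquire())  // takes the only slot
	fmt.Println("second:", l.acquire()) // no free slot and no grace period
	l.release()
	fmt.Println("third:", l.acquire()) // the slot is free again
}
```

With a grace period of 0 a rejected request returns immediately, while a non-zero grace period keeps the goroutine parked until a slot frees up or the timer fires, which is the backlog tradeoff the commit message describes.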
778196d helm: don't generate the hubble-peer svc during preflight checks [ upstream commit 35f2ff3f84c5caaf4506210cc3c320f3ff7f51f4 ] Before this patch, the hubble-peer Service would be deployed during the preflight check, which would in turn prevent Cilium from being installed, as the installation would attempt to deploy the Service again. Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 May 2022, 21:48:37 UTC
eec579c fqdn/dnsproxy: Improve error wrapping [ upstream commit 6b57b5257ced34b765bba77e702141541a5ef3b6 ] This commit fixes the error wrapping inside the dnsproxy package. When a DNS response is being processed by the DNS proxy inside NotifyOnDNSMsg(), we check ProxyRequestContext. If the request response timed out, we annotate the metrics specifically to indicate that it timed out. This relies on the errors being properly wrapped. In order to do this, errors.As() is used and all errors are properly wrapped with '%w'. Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 May 2022, 21:48:37 UTC
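Why the '%w' wrapping mentioned in the commit above matters for errors.As() can be shown with a small, self-contained sketch. The `timeoutError` type and the `forward`, `forwardOpaque` and `isTimeout` helpers are hypothetical stand-ins, not names from the dnsproxy package.

```go
package main

import (
	"errors"
	"fmt"
)

// timeoutError is a hypothetical error type standing in for whatever
// timeout error the proxy needs to recognize.
type timeoutError struct{ op string }

func (e *timeoutError) Error() string { return e.op + ": timed out" }

// forward wraps the underlying error with %w, keeping it in the error chain.
func forward() error {
	return fmt.Errorf("forwarding DNS request: %w", &timeoutError{op: "read"})
}

// forwardOpaque formats the cause with %v, which flattens it to text
// and drops it from the error chain.
func forwardOpaque() error {
	return fmt.Errorf("forwarding DNS request: %v", &timeoutError{op: "read"})
}

// isTimeout reports whether any error in the chain is a *timeoutError.
func isTimeout(err error) bool {
	var te *timeoutError
	return errors.As(err, &te)
}

func main() {
	fmt.Println(isTimeout(forward()))       // true
	fmt.Println(isTimeout(forwardOpaque())) // false
}
```

Both errors print the same message, but only the '%w' variant lets a caller such as a metrics annotator detect the timeout programmatically.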
e577e4e workflow: l4lb: pass correct path for PR checkout Passing ${{ github.workspace }}/pull-request to the init.sh script as path of the PR checkout is incorrect, as inside the VM the repository will be under a different path. Instead of that, pass the correct absolute path for the PR checkout. Moreover, checkout first the upstream repo (under cilium/cilium) and only after that the PR one (under cilium/cilium/pull-request), as otherwise the upstream checkout will nuke the directory with PR one. This PR includes only the changes related to test/l4lb/test.sh, as the workflow is being updated on the master PR #20007. Fixes: #20004 Signed-off-by: Gilberto Bertin <jibi@cilium.io> 30 May 2022, 14:25:35 UTC
aa410e4 tests-l4lb: Use Helm chart from local branch In addition to the upstream (i.e. v1.10, v1.11 and master branches) checkouts, checkout also the pull request branch so that test.sh uses the local Helm chart from the pull request. This ensures that any Helm-related changes get reflected in the CI run. This PR includes only the changes related to test/l4lb/test.sh, as the workflow is being updated on the master PR #19953. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 30 May 2022, 10:25:20 UTC
44392b1 build(deps): bump actions/setup-go from 3.1.0 to 3.2.0 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3.1.0 to 3.2.0. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/fcdc43634adb5f7ae75a9d7a9b9361790f7293e2...b22fbbc2921299758641fab08929b4ac52b32923) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 27 May 2022, 08:26:19 UTC
2199c7b .github/workflows: bump kind workflow to cilium-cli v0.10.6 This should fix issues occurring after the change of the agent health check port in https://github.com/cilium/cilium/pull/19830 Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 May 2022, 13:45:53 UTC
5329912 build(deps): bump actions/upload-artifact from 3.0.0 to 3.1.0 Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.0.0 to 3.1.0. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/6673cd052c4cd6fcf4b4e6e60ea986c889389535...3cea5372237819ed00197afe530f5a7ea3e805c8) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 23 May 2022, 20:57:15 UTC
04998af build(deps): bump KyleMayes/install-llvm-action from 1.5.2 to 1.5.3 Bumps [KyleMayes/install-llvm-action](https://github.com/KyleMayes/install-llvm-action) from 1.5.2 to 1.5.3. - [Release notes](https://github.com/KyleMayes/install-llvm-action/releases) - [Commits](https://github.com/KyleMayes/install-llvm-action/compare/v1.5.2...v1.5.3) --- updated-dependencies: - dependency-name: KyleMayes/install-llvm-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 23 May 2022, 20:57:07 UTC
752e0bd bug: Fix Hubble Peer Service Helm File Location For 1.10 Hubble Peer Service needs to live at hubble-peer-service.yaml in the root template directory and not ./hubble/peer-service.yaml Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 23 May 2022, 18:54:08 UTC
449ec45 k8s: Update libraries to v1.21.11 Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 23 May 2022, 12:43:25 UTC
4d48e94 .github/workflows: bump kind workflow to cilium-cli v0.10.5 Signed-off-by: Tobias Klauser <tobias@cilium.io> 20 May 2022, 17:07:40 UTC
ec3ee92 build(deps): bump actions/setup-go from 3.0.0 to 3.1.0 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 3.0.0 to 3.1.0. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/f6164bd8c8acb4a71fb2791a8b6c4024ff038dab...fcdc43634adb5f7ae75a9d7a9b9361790f7293e2) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 17 May 2022, 18:48:52 UTC
8140e68 docs: Document operator.unmanagedPodWatcher [ upstream commit 75be918a9f5a2f31a3e5efb3404d694415052397 ] Describe what the new operator.unmanagedPodWatcher option does in the upgrade guide. Signed-off-by: Joe Stringer <joe@cilium.io> 16 May 2022, 22:35:41 UTC
01a2683 install: Update image digests for v1.10.11 Generated from https://github.com/cilium/cilium/actions/runs/2333766876. `docker.io/cilium/cilium:v1.10.11@sha256:48e1a261046c2e534e370f960f0920233f9fd5ad4623aebdca0e403264a06202` `quay.io/cilium/cilium:v1.10.11@sha256:48e1a261046c2e534e370f960f0920233f9fd5ad4623aebdca0e403264a06202` `docker.io/cilium/clustermesh-apiserver:v1.10.11@sha256:ea07dd1c842befe9c5941a328497a47d41b2af47379527750e4b0f03af20532b` `quay.io/cilium/clustermesh-apiserver:v1.10.11@sha256:ea07dd1c842befe9c5941a328497a47d41b2af47379527750e4b0f03af20532b` `docker.io/cilium/docker-plugin:v1.10.11@sha256:b2bec081798391e348b1dcb6669a523e3a8adc70850c403d923fa897688251f6` `quay.io/cilium/docker-plugin:v1.10.11@sha256:b2bec081798391e348b1dcb6669a523e3a8adc70850c403d923fa897688251f6` `docker.io/cilium/hubble-relay:v1.10.11@sha256:8f30fb40bd46be4d1bfb55eb91cff7d0f8958eeb486d6184b5495f6624cf6ff1` `quay.io/cilium/hubble-relay:v1.10.11@sha256:8f30fb40bd46be4d1bfb55eb91cff7d0f8958eeb486d6184b5495f6624cf6ff1` `docker.io/cilium/operator-alibabacloud:v1.10.11@sha256:83e18445ef3285317ed712514966cda8213722f548bea5ded61ad3446067b94b` `quay.io/cilium/operator-alibabacloud:v1.10.11@sha256:83e18445ef3285317ed712514966cda8213722f548bea5ded61ad3446067b94b` `docker.io/cilium/operator-aws:v1.10.11@sha256:aed283cb4932fec07746c09770b7a9ec959aab6d5051dfdd3449c9d7d9be2a33` `quay.io/cilium/operator-aws:v1.10.11@sha256:aed283cb4932fec07746c09770b7a9ec959aab6d5051dfdd3449c9d7d9be2a33` `docker.io/cilium/operator-azure:v1.10.11@sha256:1acea544097ede5f120d190309b46c1ea62da5fa6c61203945073d86a7891203` `quay.io/cilium/operator-azure:v1.10.11@sha256:1acea544097ede5f120d190309b46c1ea62da5fa6c61203945073d86a7891203` `docker.io/cilium/operator-generic:v1.10.11@sha256:468ce59342298f1cf87ca8512cd9192754e83348b347a4bc7c27158ef9c4a37d` `quay.io/cilium/operator-generic:v1.10.11@sha256:468ce59342298f1cf87ca8512cd9192754e83348b347a4bc7c27158ef9c4a37d` 
`docker.io/cilium/operator:v1.10.11@sha256:d24af610f2e55f9ff1737690bb09ae948cc390c9cfd88b5d3728d747cc7a3a25` `quay.io/cilium/operator:v1.10.11@sha256:d24af610f2e55f9ff1737690bb09ae948cc390c9cfd88b5d3728d747cc7a3a25` Signed-off-by: Joe Stringer <joe@cilium.io> 16 May 2022, 19:17:20 UTC
821d769 build(deps): bump golangci/golangci-lint-action from 3.1.0 to 3.2.0 Bumps [golangci/golangci-lint-action](https://github.com/golangci/golangci-lint-action) from 3.1.0 to 3.2.0. - [Release notes](https://github.com/golangci/golangci-lint-action/releases) - [Commits](https://github.com/golangci/golangci-lint-action/compare/b517f99ae23d86ecc4c0dec08dcf48d2336abc29...537aa1903e5d359d0b27dbc19ddd22c5087f3fbc) --- updated-dependencies: - dependency-name: golangci/golangci-lint-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 11 May 2022, 18:40:49 UTC
33ca4b9 Prepare for release v1.10.11 Signed-off-by: André Martins <andre@cilium.io> 10 May 2022, 00:35:51 UTC
3016cfb helm: Add nodes-gc-interval attribute [ upstream commit 167fa2bac9c7fa062046de6f73da98d585266274 ] This commit is to map operator CLI node GC interval flag in helm value and config map. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
b7bdd9a operator: Add cilium node garbage collector [ upstream commit edc1a0a0d2689473469d209027024962a9d55073 ] In the normal scenario, a CiliumNode is created by the agent with owner references always attached (see the PR below [0]). However, a CiliumNode can also be created by the IPAM module [1], which does not set any ownerReferences. In that case, if the corresponding node is terminated and never comes back with the same name, the CiliumNode resource is left dangling and needs to be garbage collected. This commit adds a garbage collector for CiliumNode with the following logic: - The garbage collector runs at a predefined interval (specified by the flag --nodes-gc-interval). - Each run checks whether the CiliumNode has a counterpart k8s node resource. - If yes, the CiliumNode is considered valid and is removed from the GC candidates if it was one. - If no, check whether ownerReferences are set. - If yes, let k8s perform garbage collection. - If no, mark the node as a GC candidate. If the node is still a GC candidate in the next run, remove it. References: [0]: https://github.com/cilium/cilium/pull/17329 [1]: https://github.com/cilium/cilium/blob/master/pkg/ipam/allocator/podcidr/podcidr.go#L258 Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
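The two-pass candidate logic listed in the commit above can be sketched as follows. This is an illustration under simplifying assumptions, not the operator's code: `ciliumNode`, `gc` and `run` are hypothetical names, and the Kubernetes node store is reduced to a plain map.

```go
package main

import "fmt"

// ciliumNode is a simplified, hypothetical stand-in for the CiliumNode resource.
type ciliumNode struct {
	name      string
	hasOwners bool // true if ownerReferences are set
}

// gc remembers which nodes were marked as candidates in the previous run.
type gc struct {
	candidates map[string]bool
}

// run performs one garbage collection pass and returns the names of the
// CiliumNodes that should be deleted in this pass.
func (g *gc) run(nodes []ciliumNode, k8sNodes map[string]bool) []string {
	var deleted []string
	for _, n := range nodes {
		switch {
		case k8sNodes[n.name]:
			// A counterpart k8s node exists: valid, no longer a candidate.
			delete(g.candidates, n.name)
		case n.hasOwners:
			// ownerReferences are set: leave it to Kubernetes garbage collection.
		case g.candidates[n.name]:
			// Still orphaned since the previous run: remove it.
			deleted = append(deleted, n.name)
			delete(g.candidates, n.name)
		default:
			// First time seen as orphaned: only mark it.
			g.candidates[n.name] = true
		}
	}
	return deleted
}

func main() {
	g := &gc{candidates: map[string]bool{}}
	nodes := []ciliumNode{{name: "live"}, {name: "owned", hasOwners: true}, {name: "orphan"}}
	k8s := map[string]bool{"live": true}

	fmt.Println(g.run(nodes, k8s)) // [] (orphan is only marked)
	fmt.Println(g.run(nodes, k8s)) // [orphan]
}
```

Waiting one extra interval before deleting gives a node that is merely slow to register a chance to reappear before its CiliumNode is removed.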
69bc6b0 operator: Add sync.Once for k8s node watcher [ upstream commit 64b37e1c478ca3a6f61389aea9c8838edb49f604 ] This makes sure that the k8s node watcher is set up at most once. Also, a synced channel is added so that consumers of this store know when the syncing process is done. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
3ac7f8e operator: Refactor k8s node watcher for re-usability [ upstream commit e80f60d381238218d1d656035396ff903ed0eef9 ] This commit lifts and shifts the current initialization of the k8s node watcher into another function with a single scope of work, so that it can be re-used later. There is no change in logic. Signed-off-by: Tam Mach <tam.mach@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
9fdb3a4 images/cilium: remove cilium group from Dockerfile [ upstream commit 67f74ff432010770b43286f32110b8f4cd338e1b ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
46af205 test/upgrade: use the unreleased helm chart of stable branches [ upstream commit 88d31cdbb052d00ab23575ffc8f73eedfd4437a7 ] The upgrade tests are using the official helm charts with unreleased Cilium images. This might cause the upgrade tests to fail in case the changes done in the unreleased Cilium versions require a new helm chart release. To fix this problem the upgrade tests will now use the unreleased helm charts as well. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
da1c1f8 hubble/relay: Make Peer Service a K8s Service [ upstream commit 21e6e6ade37fb6ac9d38dd885c3f236bdb869c26 ] Currently Hubble-Relay builds its client list by querying the Peer Service over the local Hubble Unix domain socket. This goes against best security practices (sharing files across pods) and is not allowed on platforms that strictly enforce SELinux policies (e.g. OpenShift). This PR enables, by default, the creation of a Kubernetes Service that proxies the Hubble Peer Service so that Hubble-Relay can use it to build its client list, eliminating the need for a shared Unix domain socket completely. Helm values and configurations have been added to enable the service in a cilium deployment. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 09 May 2022, 23:56:00 UTC
52896d6 images/runtime: update CNI plugins to 1.1.1 This allows Cilium to run in environments using containerd which do not have a loopback CNI plugin binary available. Reported-by: Krzysztof Nazarewski on Slack Suggested-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 09 May 2022, 21:26:20 UTC
f916333 build(deps): bump docker/build-push-action from 2.10.0 to 3 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.10.0 to 3. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/ac9327eae2b366085ac7f6a2d02df8aa8ead720a...e551b19e49efd4e98792db7592c17c09b89db8d8) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:43 UTC
3d16a60 build(deps): bump docker/setup-buildx-action from 1.7.0 to 2 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.7.0 to 2. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/f211e3e9ded2d9377c8cadc4489a4e38014bc4c9...dc7b9719a96d48369863986a06765841d7ea23f6) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:33 UTC
b305900 build(deps): bump docker/setup-qemu-action from 1.2.0 to 2 Bumps [docker/setup-qemu-action](https://github.com/docker/setup-qemu-action) from 1.2.0 to 2. - [Release notes](https://github.com/docker/setup-qemu-action/releases) - [Commits](https://github.com/docker/setup-qemu-action/compare/27d0a4f181a40b142cce983c5393082c365d1480...8b122486cedac8393e77aa9734c3528886e4a1a8) --- updated-dependencies: - dependency-name: docker/setup-qemu-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:26 UTC
7fc216f build(deps): bump docker/login-action from 1.14.1 to 2 Bumps [docker/login-action](https://github.com/docker/login-action) from 1.14.1 to 2. - [Release notes](https://github.com/docker/login-action/releases) - [Commits](https://github.com/docker/login-action/compare/dd4fa0671be5250ee6f50aedf4cb05514abda2c7...49ed152c8eca782a232dede0303416e8f356c37b) --- updated-dependencies: - dependency-name: docker/login-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> 06 May 2022, 12:27:17 UTC
922b949 pkg/k8s: use subresource "nodes/status" to update node annotations [ upstream commit 9014253d3640f1d2df836890f52497ac4072d88d ] We can use the "status" subresource to update node annotations which also allow us to reduce the clusterrole's permissions of the cilium DaemonSet even further. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
23f77d5 operator: move certain K8s Node operations to cilium-operator [ upstream commit f612c97aacbb44e6cc7c3587541c53dd0296d5ea ] To decrease the number of permissions Cilium requires to operate in a cluster, the node taint removal and the setup of the node condition NetworkUnavailable can be done through cilium-operator. Cilium-operator will remove, if set, the Cilium-specific node taints from the Kubernetes nodes and set the NetworkUnavailable node condition to 'false' once it detects there is a "Ready" Cilium pod on that node. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
c374991 install: default AnnotateK8sNode to true [ upstream commit 73d6cae2c90600cee5c61a0ea452b7a2a3129dd9 ] Since this option only existed to set up annotations in Kubernetes Nodes before the introduction of CiliumNodes, contrary to the upstream commit this option will be kept to 'true' with the possibility for users to change it to 'false'. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
9bc8cf0 install/kubernetes: trimmed down clustermesh-apiserver's ClusterRole Trimmed down clustermesh-apiserver's ClusterRole to the exact permissions that clustermesh-apiserver requires. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
5602c59 install/kubernetes: remove finalizers for Cilium resources [ upstream commit d02833801430125c018d96083881d0387554d053 ] Follow up of 0f4d3a71b055 ("helm: Remove Unnecessary RBAC Permissions for Agent") Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
81e2049 install/kubernetes: remove update pod from Cilium's clusterrole [ upstream commit 2d63c9b17bdb8838683990d96fda5f579dd56da5 ] Cilium does not need to perform any Pod update thus this permission can be removed from Cilium's Cluster Role. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
bf033a5 pkg/k8s: remove BlockOwnerDeletion: true from CEP [ upstream commit 900f66879ad4c66e62eaa334fe7bd6ab2119e5b1 ] Since Cilium does not set any finalizer in the owner of the CEP, a Pod, it does not make sense to set "BlockOwnerDeletion: true". Regardless of this option being `true` or `false`, the Pod dependent, in this case the CEP, is always* Garbage Collected by Kubernetes. *Only if the user specifies the pod deletion with the "orphan" deletion cascading strategy that the CEP will be kept. However, Cilium Operator will garbage collect orphaned Cilium Endpoints every 5 minutes by default. Signed-off-by: André Martins <andre@cilium.io> 04 May 2022, 20:19:43 UTC
3e1c673 metrics: Add go_* metrics and go_build_info metrics [ upstream commit 6c5e2d66f3e3efcf9723c4cbb19d92ae80bcb1d7 ] [ Backporter's notes: Needed to bump the Prometheus module in order to make the upstream commit pass Go's mod checks. ] Prometheus provides metrics collectors that expose Go runtime and Go build information, which can be useful to server administrators, so let's expose them. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 03 May 2022, 22:32:28 UTC
8f2e008 docs: set the right url for API version check [ upstream commit af8151d730ac48789773ad5c970c6d2858bab76c ] The right format for this field should contain the protocol and a trailing "/" to work properly. Fixes: b3b05029e4c9 ("docs: fix version warning URL to point to docs.cilium.io") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io> 03 May 2022, 17:13:29 UTC
ca473d2 docs: Update max MTU value for Nodeport XDP on AWS [ upstream commit 1db91caffba860afb81f796ea021f4db0712a42b ] The documentation for setting up Nodeport XDP acceleration on AWS mentions that the MTU for the ena interface must be lowered so that XDP can work. It is indeed necessary; but the value which is provided as the maximal possible MTU is outdated, and not working. After installing the latest kernel through the RPM package kernel-ng (as prescribed in the documentation), the EKS nodes currently end up with Linux 5.10: $ uname -r 5.10.106-102.504.amzn2.x86_64 If we keep on following the docs and lower the MTU to 3818, the Cilium pods fail to get ready, and tell in their logs that the XDP program cannot be set due to the MTU. This is also confirmed from the dmesg of the nodes: [ 3617.059219] ena 0000:00:05.0 eth0: Failed to set xdp program, the current MTU (3818) is larger than the maximum allowed MTU (3498) while xdp is on The value 3818 comes from the legacy definition of ENA_XDP_MAX_MTU, in drivers/net/ethernet/amazon/ena/ena_netdev.h, which used to be defined as such: #define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \ VLAN_HLEN - XDP_PACKET_HEADROOM) Where ETH_HLEN is 14, ETH_FCS_LEN and VLAN_HLEN are both 4, and XDP_PACKET_HEADROOM is 256. But after Linux commit 08fc1cfd2d25 ("ena: Add XDP frame size to amazon NIC driver"), from Linux 5.8, the definition changed to: #define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \ VLAN_HLEN - XDP_PACKET_HEADROOM - \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) As a result, the maximum value for the MTU for kernels 5.8+ is 3498 bytes. This is indeed the maximum value that I could use when setting up XDP on an EKS cluster. Let's update the documentation accordingly. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io> 03 May 2022, 17:13:29 UTC
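The arithmetic behind the two MTU limits quoted in the commit above can be checked with a short sketch. Two values are assumptions not stated in the message: `enaPageSize` is taken to be 4096 (a 4 KiB page), and `skbSharedInfoAligned` is set to 320, which is simply the difference between the two limits the message reports (3818 - 3498) and may differ on other architectures or kernel configurations.

```go
package main

import "fmt"

const (
	enaPageSize       = 4096 // assumption: 4 KiB page size
	ethHLen           = 14   // ETH_HLEN, from the commit message
	ethFCSLen         = 4    // ETH_FCS_LEN, from the commit message
	vlanHLen          = 4    // VLAN_HLEN, from the commit message
	xdpPacketHeadroom = 256  // XDP_PACKET_HEADROOM, from the commit message

	// Assumption: SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) on the
	// kernel in question, inferred from 3818 - 3498.
	skbSharedInfoAligned = 320
)

// legacyMaxMTU follows the ENA_XDP_MAX_MTU definition used before Linux 5.8.
func legacyMaxMTU() int {
	return enaPageSize - ethHLen - ethFCSLen - vlanHLen - xdpPacketHeadroom
}

// currentMaxMTU follows the definition used from Linux 5.8 onwards, which
// additionally reserves room for the aligned skb_shared_info structure.
func currentMaxMTU() int {
	return legacyMaxMTU() - skbSharedInfoAligned
}

func main() {
	fmt.Println(legacyMaxMTU())  // 3818
	fmt.Println(currentMaxMTU()) // 3498
}
```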
f892cde identity: Initialize local identity allocator early [ upstream commit 2e5f35b79e70e9de517b46615ce8e4abc6f9769d ] Move local identity allocator initialization to NewCachingIdentityAllocator() so that it is initialized when the allocator is returned to the caller. Also make the events channel and start the watcher in NewCachingIdentityAllocator(). Close() will no longer GC the local identity allocator or stop the watcher. Now that the locally allocated identities are persisted via the bpf ipcache map across restarts, recycling them at runtime via Close() would be inappropriate. This is then used in daemon bootstrap to restore locally allocated identities before new policies can be received via Cilium API or k8s API. This fixes the issue where CIDR policies were received from k8s before locally allocated (CIDR) identities were restored, causing the identities derived from the received policy to be newly allocated with different numeric identity values, ultimately causing policy drops during Cilium restart. Fixes: #19360 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 29 April 2022, 17:27:22 UTC
7e72244 daemon: Do not try to detect Dump support [ upstream commit b61a347f58102a2a71d47615408e74f8e5dafbc9 ] ipcache SupportDump() and SupportsDelete() open the map to probe for the support if the map is not already open and also schedule the bpf-map-sync-cilium_ipcache controller. If the controller is run before initMaps(), initMaps() will fail as the controller will leave the map open and initMaps() assumes this not to be the case. Solve this by not trying to detect dump support, but try dump and see if it succeeds. This fixes a Cilium Agent crash on kernels that do not support ipcache dump operations and when certain Cilium features are enabled on slow machines that caused the scheduled controller to run too soon. Fixes: 19360 Fixes: 19495 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Cilium Maintainers <maintainer@cilium.io> 29 April 2022, 17:27:22 UTC
55e076c docs: fix version warning URL to point to docs.cilium.io [ upstream commit b3b05029e4c955c6014c5778f595af6bbd4db2e8 ] Due to some CORS policy, the requests being performed from docs.cilium.io to readthedocs.org were being denied. This was causing the warning banner to never show up in the documentation. To avoid this problem a page redirect was configured in readthedocs settings to redirect docs.cilium.io/version to readthedocs.org/api/v2/version which will hopefully fix the issue and the API endpoint was set to docs.cilium.io. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
9e95af3 fqdn: Limit the number of zombies per host [ upstream commit ac93cb4f359b7eb93e5ce97a38650062a989ed24 ] [ Backporter's notes: Minor conflicts in endpoint.go, fqdn/cache.go ] Commit f6ce522d55d5 ("FQDN: Added garbage collector functions.") introduced a per-host limit on the number of IPs to be associated in the DNS cache, but at the time we did not support keeping FQDN entries alive beyond DNS TTL ("zombie entries"). These were later added in commit f6293725867c ("fqdn: Add and use DNSZombieMappings in Endpoint"), but at that time no such per-host limit was imposed on these zombie entries. Commit 5923dafd88be ("fqdn: keep IPs alive if their name is alive") later adjusted the zombie garbage collection to allow zombies to stay alive as long as any IP that shares the same FQDN is marked as alive. Unfortunately, this led to situations where a very high number of DNS cache entries remain in the cache beyond the DNS TTL, simply because one IP for the given name continues to be used. In the case of something like Amazon S3, where DNS TTLs are known to be low, and IP recycling high, if an app constantly made requests via ToFQDNs policy towards names hosted by this service, this could lead to thousands of stale FQDN mappings accumulating in the cache. For each of these mappings, Cilium would allocate corresponding identities, and when this is combined with a permissive pod policy, this could lead to policymaps becoming full, and error messages in the logs like: msg="Failed to add PolicyMap key" ... error="Unable to update element for map with file descriptor 67: argument list too long" This could also prevent new pods from being scheduled on nodes, as Cilium would be unable to implement the full requested policy for the new endpoints. In order to mitigate this situation, extend the per-host limit configuration to apply separately also to zombie entries. 
This allows up to 'ToFQDNsMaxIPsPerHost' FQDN entries that are alive (ie below DNS TTL) in addition to a further 'ToFQDNsMaxIPsPerHost' zombie entries corresponding to connections which remain alive beyond the DNS TTL. Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
955ba82 fqdn: Refactor zombie sort function [ upstream commit aafc70b44e44519eced5927d715587d1a38c01b8 ] It'll be useful to reuse this in-place zombie sort in an upcoming patch, split it out in preparation. Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
1ee37c1 make: check that Go major/minor version matches required version [ upstream commit ec187e876308d125240591256d9c522814e6fc90 ] [ Backporter's notes: Conflicted with v1.10 tree. Resolved to just add the new target and hook it into the Makefile. ] Currently, when building Cilium with a Go major/minor version other than the one specified in `GO_VERSION`, the build fails with: package github.com/cilium/cilium/cilium: build constraints exclude all Go files It's not obvious from the compiler error message that this is due to mismatching Go version. This change adds a `check-go-version` target which checks the Go compiler version used for building Cilium (as specified in `$(GO)`) against the version pinned in the `GO_VERSION` file, i.e. the version used to build Cilium in CI. This check is required to pass in the `precheck` target which should surface mismatching Go versions in a more user-friendly way. Example with matching version: $ go version go version go1.18.1 linux/amd64 $ make check-go-version Example with mismatching version: $ go1.17.9 version go version go1.17.9 linux/amd64 $ make GO=go1.17.9 check-go-version Installed Go version 1.17 does not match requested Go version 1.18 make: *** [Makefile:602: check-go-version] Error 1 Suggested-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 28 April 2022, 22:50:55 UTC
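The comparison performed by the `check-go-version` target in the commit above (major/minor only, ignoring the patch level) can be sketched in Go. The real check lives in the Makefile; the `majorMinor` and `checkGoVersion` helpers below are hypothetical and only mirror the behaviour shown in the example output.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor reduces a version such as "go1.18.1" or "1.18.1" to "1.18".
func majorMinor(version string) (string, error) {
	v := strings.TrimPrefix(strings.TrimSpace(version), "go")
	parts := strings.Split(v, ".")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", fmt.Errorf("cannot parse Go version %q", version)
	}
	return parts[0] + "." + parts[1], nil
}

// checkGoVersion returns an error when the installed compiler and the pinned
// version differ in their major/minor part; patch levels are ignored.
func checkGoVersion(installed, required string) error {
	got, err := majorMinor(installed)
	if err != nil {
		return err
	}
	want, err := majorMinor(required)
	if err != nil {
		return err
	}
	if got != want {
		return fmt.Errorf("installed Go version %s does not match requested Go version %s", got, want)
	}
	return nil
}

func main() {
	fmt.Println(checkGoVersion("go1.18.1", "1.18.1")) // matching major/minor
	fmt.Println(checkGoVersion("go1.17.9", "1.18.1")) // mismatch is reported
}
```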
ee0049f build(deps): bump docker/setup-buildx-action from 1.6.0 to 1.7.0 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.6.0 to 1.7.0. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/94ab11c41e45d028884a99163086648e898eed25...f211e3e9ded2d9377c8cadc4489a4e38014bc4c9) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 28 April 2022, 22:26:41 UTC
fec2ba7 build(deps): bump actions/checkout from 3.0.1 to 3.0.2 Bumps [actions/checkout](https://github.com/actions/checkout) from 3.0.1 to 3.0.2. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/dcd71f646680f2efd8db4afa5ad64fdcba30e748...2541b1294d2704b0964813337f33b291d3f8596b) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 26 April 2022, 23:24:53 UTC
28713df docs: Update section title to improve readability [ upstream commit 79d53af7d5f0c3b91c17ff7cb5cc0b203d34f1d1 ] Local redirect policy requires Kube-proxy replacement, and the feature flag to be enabled. Rename the section that outlines these steps so that users are less likely to miss them. Suggested-by: Raymond de Jong <raymond.dejong@isovalent.com> Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
67f0d77 pkg/redirectpolicy: Improve error logs [ upstream commit 6c34c93dce924bc24072c23a154a62cf568b4d38 ] Improve error logs thrown by port validation logic so that user can take necessary actions. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
7e425d4 docs: improve description for session affinity with KPR [ upstream commit b7002f593b44850af162e8f8f5f01a0dcc92e65b ] Use the correct terminology ('session affinity' vs 'service affinity'), and fix a typo. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
d7ea906 pkg/bpf: add map name in error message for OpenParallel [ upstream commit 29c3ebdc0ace37e3068945942d35fe3e3786f4cd ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
56a1c45 jenkinsfiles: Increase VM boot timeout [ upstream commit cfec27a217259e2002932864fe69e1df072319bc ] This commit increases the VM boot timeout while decreasing the overall timeout :mindblown: We currently run the vagrant-ci-start.sh script with a 15m timeout and retry twice if it fails. That takes up to 45m in total if all attempts fail, as is frequently happening in CI right now. In particular, if the script simply fails because it's taking on average more than 15m then it is likely to fail all three times. This commit instead increases the timeout from 15m to 25m and removes the retries. The goal is obviously to succeed on the first try :p Ideally, we would investigate why it is now taking longer to start the VM. But this issue has been happening for a long time. And because of the retries, we probably didn't even notice the increase at the beginning: if it takes on average 15min, it might fail half the time and the test might still succeed most of the time. That is, the retries help hide the increase. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 22 April 2022, 06:29:03 UTC
43824d8 install: Update image digests for v1.10.10 Generated from https://github.com/cilium/cilium/actions/runs/2053999962. `docker.io/cilium/cilium:v1.10.10@sha256:83bfc1052543e8b1e31f06fa2b5bbd2bd41cc79f264010241fc1994e35281616` `quay.io/cilium/cilium:v1.10.10@sha256:83bfc1052543e8b1e31f06fa2b5bbd2bd41cc79f264010241fc1994e35281616` `docker.io/cilium/clustermesh-apiserver:v1.10.10@sha256:e13d41db3f5ee93d8b3abcaa10cc4005522bc797be3d69fc96ac5e03b60c7b11` `quay.io/cilium/clustermesh-apiserver:v1.10.10@sha256:e13d41db3f5ee93d8b3abcaa10cc4005522bc797be3d69fc96ac5e03b60c7b11` `docker.io/cilium/docker-plugin:v1.10.10@sha256:cd45b531e97b588d4e8c825cb588611949044db4351dcffeacf92ba2f4208054` `quay.io/cilium/docker-plugin:v1.10.10@sha256:cd45b531e97b588d4e8c825cb588611949044db4351dcffeacf92ba2f4208054` `docker.io/cilium/hubble-relay:v1.10.10@sha256:a0769e44299bba301dee08d489f4e2d3b3924916bed985346dcf9fcf10861c8a` `quay.io/cilium/hubble-relay:v1.10.10@sha256:a0769e44299bba301dee08d489f4e2d3b3924916bed985346dcf9fcf10861c8a` `docker.io/cilium/operator-alibabacloud:v1.10.10@sha256:6154fcc069700cca6754cff0ee7bf6990bbf4a2865076b5358cb0c70c0043d52` `quay.io/cilium/operator-alibabacloud:v1.10.10@sha256:6154fcc069700cca6754cff0ee7bf6990bbf4a2865076b5358cb0c70c0043d52` `docker.io/cilium/operator-aws:v1.10.10@sha256:9bc04377606cb57c16f699a5b34dcdd6b6ffc1c4f43f5e6da81015fc16c10edc` `quay.io/cilium/operator-aws:v1.10.10@sha256:9bc04377606cb57c16f699a5b34dcdd6b6ffc1c4f43f5e6da81015fc16c10edc` `docker.io/cilium/operator-azure:v1.10.10@sha256:6973d45f7255c1791c0502339675a42105b8cbeca1a98634362623433674efe1` `quay.io/cilium/operator-azure:v1.10.10@sha256:6973d45f7255c1791c0502339675a42105b8cbeca1a98634362623433674efe1` `docker.io/cilium/operator-generic:v1.10.10@sha256:8a317287b6ac8fe0ba4999342c9627dc913e0c1591552164f96d0aadf5d1a740` `quay.io/cilium/operator-generic:v1.10.10@sha256:8a317287b6ac8fe0ba4999342c9627dc913e0c1591552164f96d0aadf5d1a740` 
`docker.io/cilium/operator:v1.10.10@sha256:8462f34a9c081126c9281bc637d76b3c7f81668bbb77a4a66a3dda16554915e9` `quay.io/cilium/operator:v1.10.10@sha256:8462f34a9c081126c9281bc637d76b3c7f81668bbb77a4a66a3dda16554915e9` Signed-off-by: Joe Stringer <joe@cilium.io> 19 April 2022, 04:12:59 UTC
8a77a4a Prepare for release v1.10.10 Signed-off-by: Joe Stringer <joe@cilium.io> 15 April 2022, 16:03:20 UTC
3f3fec1 build(deps): bump actions/checkout from 3.0.0 to 3.0.1 Bumps [actions/checkout](https://github.com/actions/checkout) from 3.0.0 to 3.0.1. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/a12a3943b4bdde767164f792f33f40b04645d846...dcd71f646680f2efd8db4afa5ad64fdcba30e748) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 14 April 2022, 20:31:12 UTC
47e81a6 envoy: Limit accesslog socket permissions [ upstream commit 5595e622243948f74187b449186e4575f451b9e5 ] [ Backporter's notes: trivial conflicts in `cilium-agent.md` and `pkg/envoy/accesslog_server.go` due to other changes in the lines right next to this backport since v1.10. ] Limit access to Cilium xDS and access log sockets to root and group 1337 used by Istio sidecars. Fixes: #3131 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC
266183d Add a 'Limitations' section to 'External Workloads'. [ upstream commit 3454eceacc5933a98c60481d909e8878c665aed2 ] This commit adds a 'Limitations' section to the 'External Workloads' page, initially referring to the fact that transparent encryption to/from external workloads is currently not supported. Signed-off-by: Bruno M. Custódio <brunomcustodio@gmail.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 14 April 2022, 19:33:02 UTC