https://github.com/cilium/cilium

Revision Author Date Message Commit Date
411f93e hubble: Set drop reason to POLICY_DENIED for L7 dropped flows Set drop reason to POLICY_DENIED for L7 dropped flows instead of setting it to DROP_REASON_UNKNOWN. Fixes: #22402 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> 28 November 2022, 23:35:58 UTC
e8407ba docs: clarifications about CNCF maintainer status Signed-off-by: Liz Rice <liz@lizrice.com> 28 November 2022, 23:08:11 UTC
a75e24b test: Remove flaking test Remove new part of TLS test that keeps flaking in most PRs. Will be added back when flaking is resolved. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 28 November 2022, 19:48:37 UTC
cf3cc16 fqdn: dnsproxy: fix forwarding of the original security identity for TCP In the case of TCP it is not enough to do net.Dial + setsockopt(SO_MARK), as in this case the TCP SYN will have a wrong identity, e.g.: Policy verdict log: flow 0x7a95a133 local EP ID 393, remote ID 14616, proto 6, egress, action redirect, match L3-L4, 10.244.1.122:42437 -> 10.244.1.120:53 tcp SYN Policy verdict log: flow 0x907eaa19 local EP ID 458, remote ID host, proto 6, ingress, action allow, match L3-Only, 172.19.0.2:56276 -> 10.244.1.120:53 tcp SYN Here the second message has the wrong identity (host). We still allow the traffic, as the origin is the local host and coredns is running on the same host, but this will not work for a remote host if the ingress policy doesn't allow the remote-node identity. To fix this we need to pass a Control parameter to Dial, so that setsockopt(2) is called before connect(2). With such a change we now see the correct identity in the case of TCP: Policy verdict log: flow 0xeb7902a9 local EP ID 393, remote ID 14616, proto 6, egress, action redirect, match L3-L4, 10.244.1.122:36661 -> 10.244.1.120:53 tcp SYN Policy verdict log: flow 0x4efbc5a0 local EP ID 458, remote ID 41903, proto 6, ingress, action allow, match L3-L4, 172.19.0.2:40508 -> 10.244.1.120:53 tcp SYN Fixes: 44c1def67854 ("fqdn: dnsproxy: forward the original security identity") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 28 November 2022, 18:10:19 UTC
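The ordering described in cf3cc16 (mark the socket first, connect second) can be sketched with the standard library's `net.Dialer.Control` hook. This is a minimal, Linux-only illustration and not the actual dnsproxy code: the function name `newMarkedDialer`, the timeout, and the mark value are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// newMarkedDialer (hypothetical helper) returns a dialer whose Control hook
// sets SO_MARK on the socket after it is created but before connect(2) is
// issued, so the mark is already in place when the TCP SYN is sent.
func newMarkedDialer(mark int) *net.Dialer {
	return &net.Dialer{
		Timeout: 2 * time.Second, // arbitrary value for the sketch
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			// RawConn.Control runs the callback with the raw file descriptor.
			if err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_MARK, mark)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
}

func main() {
	d := newMarkedDialer(0x1234) // hypothetical mark value
	// Setting SO_MARK normally requires CAP_NET_ADMIN; without it the
	// Control hook returns an error and the dial fails before connecting.
	conn, err := d.Dial("tcp", "127.0.0.1:53")
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	defer conn.Close()
	fmt.Println("connected from", conn.LocalAddr())
}
```

Calling `setsockopt` on the connection returned by a plain `net.Dial` would be too late for TCP, because the SYN has already left by the time `Dial` returns.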
8264fd4 fqdn: dnsproxy: fix forwarding of the security identity for cluster mesh The commit 44c1def67854 wrongly forwarded only lower 16 bits of the original identity. This might corrupt identities when cluster-id is not zero (as the cluster-id is encoded in bits 16..23 of the identity) and leads to policy drops due to unknown identity, e.g. xx drop (Policy denied) flow 0xd1a7add4 to endpoint 3966, file bpf_lxc.c line 2032, , identity 47657->157516: 10.2.3.223:55853 -> 10.2.3.206:53 udp (Here the security identity 47657 doesn't exist, as it should actually be equal to 0x10000|47657 = 113193.) Fix this by also storing bits 16..23 of the identity in the skb mark according to the datapath ABI, i.e., skb mark should be equal to (id << 16) | (id >> 16). Fixes: 44c1def67854 ("fqdn: dnsproxy: forward the original security identity") Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Anton Protopopov <aspsk@isovalent.com> 28 November 2022, 18:10:19 UTC
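The bit layout described in 8264fd4 can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, the identity is treated as a 24-bit value (16 identity bits plus the cluster-id in bits 16..23), and any other bits Cilium may set in the skb mark are not modelled.

```go
package main

import "fmt"

// identityToMark (hypothetical name) places the lower 16 bits of the identity
// in the upper half of the mark and bits 16..23 (the cluster-id) in the low
// byte. For 24-bit identities this equals (id << 16) | (id >> 16) in 32 bits.
func identityToMark(id uint32) uint32 {
	return (id&0xFFFF)<<16 | (id&0xFF0000)>>16
}

// markToIdentity (hypothetical name) reverses identityToMark.
func markToIdentity(mark uint32) uint32 {
	return mark>>16 | (mark&0xFF)<<16
}

// truncatedMark models the behaviour the commit message describes as wrong:
// only the lower 16 bits of the identity are forwarded.
func truncatedMark(id uint32) uint32 {
	return (id & 0xFFFF) << 16
}

func main() {
	const id = 0x10000 | 47657 // 113193, the identity from the commit message

	fmt.Printf("full mark:      %#x -> identity %d\n",
		identityToMark(id), markToIdentity(identityToMark(id)))
	fmt.Printf("truncated mark: %#x -> identity %d\n",
		truncatedMark(id), markToIdentity(truncatedMark(id)))
}
```

With the truncated mark the receiver recovers 47657, the non-existent identity seen in the drop message; with the full mark it recovers 113193.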
b8a6791 Some tofqdn flags not being parsed Signed-off-by: Carlos Castro <carlos.castro@jumo.world> 28 November 2022, 16:23:25 UTC
bb28996 ctmap: Add missing FromL7LB flag 'FromL7LB' was not added for string conversion when it was added to the map, do it now. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 28 November 2022, 16:18:54 UTC
f59df85 ignore auto-generated pkg/k8s/client directories for PR reviews and codeownership Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 28 November 2022, 11:26:38 UTC
70252f4 Update k8s tests and libraries to v1.26.0-rc.0 Upstream changes included changing Ingress.LoadBalancerStatus from corev1.LoadBalancerStatus to networkingv1.IngressLoadBalancerStatus. This required the addition of 2 new factory funcs to convert slim.LoadBalancerIngress to networkingv1.IngressLoadBalancerIngress and another to convert LoadBalancerStatus to IngressLoadBalancerStatus in the slim client. See: https://github.com/kubernetes/kubernetes/pull/106242 Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 28 November 2022, 11:26:38 UTC
f09610e ingestion/gateway-api: Map backend weight to model This commit makes sure the weight value is propagated to the internal model. Relates: 58c8aff11062f944e9f3a18569c647c64edd1bc9 Reported-by: Nico Vibert <nicolas.vibert@isovalent.com> Signed-off-by: Tam Mach <tam.mach@cilium.io> 28 November 2022, 10:05:45 UTC
15baaec .clomonitor: Update CLOMonitor checks exemptions Add dangerous workflow, signed releases and token permissions checks to CLOMonitor exemptions. Signed-off-by: Sandipan Panda <samparksandipan@gmail.com> 28 November 2022, 10:05:01 UTC
93ed15d build(deps): bump google.golang.org/grpc from 1.50.1 to 1.51.0 Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.50.1 to 1.51.0. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.50.1...v1.51.0) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 28 November 2022, 10:02:07 UTC
d895d08 bpf: lb: remove direction argument in lb*_extract_key() It's always CT_EGRESS. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 November 2022, 10:01:36 UTC
1a2fb11 bpf: nodeport: fine-tune path for delivery to local backend When delivering a packet to its selected backend, we already have a check for whether the backend is local. Also use this path when deciding whether the packet should be passed up to the stack. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 November 2022, 10:01:36 UTC
79ea936 bpf: nodeport: reduce scope of macaddr variables The macaddr variables are only needed when updating the neighbour map. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 November 2022, 10:01:36 UTC
e20e925 datapath: remove unused ENCRYPT_NODE macro It's safe to remove this unused macro. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 28 November 2022, 09:58:33 UTC
5b573c3 Make fsnotify event more readable. Signed-off-by: yanggang <gang.yang@daocloud.io> 28 November 2022, 09:57:51 UTC
e6fb48a helm: Add secret permission for agent This commit is to make sure that cilium agent has required secret permission if gateway api (but not Ingress) is enabled. The original commit 759f7161a925b4e837338bd5c667c1abd8e59452 added the same logic for operator, but missed out agent part. The end-goal is to have ingress and gateway api as independent features, so that users can just enable only what they need. Without this change, gateway API will only work if and only if ingressController.enabled is set and default secret namespace is used (e.g. cilium-secrets). Relates: 759f7161a925b4e837338bd5c667c1abd8e59452 Signed-off-by: Tam Mach <tam.mach@cilium.io> 28 November 2022, 09:55:59 UTC
c057af2 build(deps): bump golang.org/x/tools from 0.2.0 to 0.3.0 Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.2.0 to 0.3.0. - [Release notes](https://github.com/golang/tools/releases) - [Commits](https://github.com/golang/tools/compare/v0.2.0...v0.3.0) --- updated-dependencies: - dependency-name: golang.org/x/tools dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 28 November 2022, 09:54:12 UTC
b9f7292 envoy: Do not set AutoSNI options Cilium filters already set SNI when available, and Envoy may crash if auto_sni option is used in this case. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
18a2b2c proxylib: Do not log raw policies Policies may contain large sets of TLS certificates, avoid polluting the logs with them. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
663f7c0 envoy: Add TLS filter chains for TCP proxy Add TLS filter chains so that TLS can be used also with TCP proxy. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
c733439 policy: Allow TLS termination and origination without L7 rules Add new L7ParserType "tls" to be used when TLS termination and/or origination is needed, and when no L7 policy is to be used. Use Envoy TCP proxy for TLS termination and/or origination in this case. artii.herokuapp.com is no more, so tests against it fail. Remove them and unquarantine the TLS test. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
86a6445 policy: Fix comments Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
ec4f132 policy: Factor out L7ParserType.Merge() Factor out the merging logic of L7ParserTypes and add a unit test. This makes adding new types with more complex merging logic easier in the future. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
d9471a0 policy: Use generated DeepEqual() in PerSelectorPolicy.Equal() Use generated DeepEqual() in PerSelectorPolicy.Equal() instead of reflection. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
ca801ee policy-api: Use Len for IsEmpty Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
c43a02d Envoy: Upgrade for SNI enforcement Update Envoy image with: - websocket filters (cilium.network.websocket.client and cilium.network.websocket.server) - use upstream destination address for egress policy enforcement only if listener is an L7 LB listener. This allows listener to tunnel pod traffic while the original destination address is used for policy enforcement rather than the tunnel destination address. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 26 November 2022, 15:03:24 UTC
c892dbb build(deps): bump github.com/hashicorp/consul/api from 1.15.3 to 1.17.0 Bumps [github.com/hashicorp/consul/api](https://github.com/hashicorp/consul) from 1.15.3 to 1.17.0. - [Release notes](https://github.com/hashicorp/consul/releases) - [Changelog](https://github.com/hashicorp/consul/blob/main/CHANGELOG.md) - [Commits](https://github.com/hashicorp/consul/compare/api/v1.15.3...api/v1.17.0) --- updated-dependencies: - dependency-name: github.com/hashicorp/consul/api dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 26 November 2022, 01:21:15 UTC
cf6b274 build(deps): bump go.etcd.io/etcd/client/v3 from 3.5.5 to 3.5.6 Bumps [go.etcd.io/etcd/client/v3](https://github.com/etcd-io/etcd) from 3.5.5 to 3.5.6. - [Release notes](https://github.com/etcd-io/etcd/releases) - [Changelog](https://github.com/etcd-io/etcd/blob/main/Dockerfile-release.amd64) - [Commits](https://github.com/etcd-io/etcd/compare/v3.5.5...v3.5.6) --- updated-dependencies: - dependency-name: go.etcd.io/etcd/client/v3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 26 November 2022, 01:18:40 UTC
09f72f7 resource: Fix queue entry coalescing The entries added to the resource's workqueue were added as pointers, which messes up the comparisons, causing coalescing to not happen. This causes TestResource_Retries to flake sometimes with: Error: Not equal: expected: 5 actual : 10 Test: TestResource_Retries Messages: expected to see 5 retries for update What happens is that the key gets requeued as &updateEntry{key}, which doesn't match the previous &updateEntry{key}, so it's effectively a new entry with its own rate limiting and retry count state, and thus we end up seeing more retries than expected. This fixes the issue by adding the entries by value. The comparisons of syncEntry and updateEntry are now trivially correct. The deleteEntry carries the pointer to the last known state of the deleted object, but this is fine since there can only be one such object. Signed-off-by: Jussi Maki <jussi@isovalent.com> 26 November 2022, 01:17:30 UTC
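The pointer-versus-value comparison behind 09f72f7 can be reproduced without any queue library. The sketch below uses a hypothetical `dedupQueue` that, like a deduplicating workqueue, treats two items as the same when Go's `==` on the interface values says so; the `updateEntry` type is modelled on the name in the commit message, with an assumed single `key` field.

```go
package main

import "fmt"

// updateEntry mirrors the name used in the commit message; the single string
// field is an assumption made for this sketch.
type updateEntry struct{ key string }

// dedupQueue (hypothetical) keeps a set of pending items keyed by the item
// itself, so equal items coalesce into one entry.
type dedupQueue struct {
	pending map[interface{}]struct{}
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: map[interface{}]struct{}{}}
}

// Add inserts the item unless an equal item is already pending.
func (q *dedupQueue) Add(item interface{}) { q.pending[item] = struct{}{} }

// Len reports the number of distinct pending items.
func (q *dedupQueue) Len() int { return len(q.pending) }

func main() {
	// By pointer: each &updateEntry{...} is a distinct allocation, so the
	// two adds compare unequal even though the keys match.
	byPtr := newDedupQueue()
	byPtr.Add(&updateEntry{key: "default/pod-a"})
	byPtr.Add(&updateEntry{key: "default/pod-a"})

	// By value: struct values compare field by field, so the second add
	// coalesces with the first.
	byVal := newDedupQueue()
	byVal.Add(updateEntry{key: "default/pod-a"})
	byVal.Add(updateEntry{key: "default/pod-a"})

	fmt.Println("by pointer:", byPtr.Len(), "by value:", byVal.Len())
}
```

Any per-item state keyed the same way, such as a retry counter, is split across the two pointer entries, which matches the doubled retry count reported in the commit message.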
0d2c2a8 doc: fixed broken doc link in helm chart Signed-off-by: David Calvert <david@0xdc.me> 26 November 2022, 01:11:16 UTC
c2d6908 preflight: Fail 'validate-cnp' check for empty to/from endpoints selector Previously, 'validate-cnp' preflight check would log a verbose warning if it detected a CCNP with an empty toEndpoints/fromEndpoints selector and pass the check with the following output: time="2022-11-03T15:50:04Z" level=info msg="Validation OK!" CiliumClusterwideNetworkPolicy=test-empty-endpointselector time="2022-11-03T15:50:04Z" level=info msg="All CCNPs and CNPs valid!" This could be misleading and tempt the user to ignore the warning. The preflight check will now fail with the following output: time="2022-11-03T16:05:30Z" level=error msg="Unexpected validation error" CiliumClusterwideNetworkPolicy=test-empty-endpointselector error="use of empty toEndpoints/fromEndpoints selector" time="2022-11-03T16:05:30Z" level=error msg="Start hook failed" error="Found invalid CiliumClusterwideNetworkPolicy" function="cilium/cmd.validateCNPCmd.func1.1 (preflight_k8s_valid_cnp.go:41)" subsys=hive time="2022-11-03T16:05:30Z" level=info msg="Stop hook executed" duration="21.858µs" function="pkg/k8s/client.(*compositeClientset).onStop-fm (<autogenerated>:1)" subsys=hive time="2022-11-03T16:05:30Z" level=fatal msg="failed to start: Found invalid CiliumClusterwideNetworkPolicy" Fixes: #17471 Signed-off-by: Tim Horner <timothy.horner@isovalent.com> 26 November 2022, 00:44:07 UTC
0101700 hubble/metrics: Fix label ordering in Hubble TCP metrics The code setting the flag label value assumes that it's the first label in the slice. If context options are enabled, then it's not true, so one of the context labels incorrectly gets the flag value, and the flag label gets discarded. Fixes: d4d73681026b ("hubble/metrics: Replace panic in contextLabels with error log") Signed-off-by: Anna Kapuscinska <anna@isovalent.com> 26 November 2022, 00:42:58 UTC
faa0135 test: Move log-gatherer image to Quay Some CI jobs are failing because we are getting rate-limited on docker.io for the log-gatherer image. André copied it to Quay and we can now use that instead of docker.io. Signed-off-by: Paul Chaignon <paul@cilium.io> 26 November 2022, 00:40:40 UTC
655ed8d docs: Add LB-IPAM documentation This commit adds documentation for the LB-IPAM feature. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 26 November 2022, 00:39:59 UTC
8df3d1e operator: Add LB-IPAM This commit adds the LB-IPAM feature. LB-IPAM allows users to specify a set of pools containing one or more CIDRs. Services of type LoadBalancer will receive Ingress IPs from these pools. LB-IPAM is part of the ongoing work to add service announcements to the BGP Control Plane. However, the component is designed to be generic so it can be used by other features as well. Co-authored-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 26 November 2022, 00:39:59 UTC
c278055 k8s: Rename and reuse BGP IP Pool This commit renames the CiliumBGPLoadBalancerIPPool CRD to CiliumLoadBalancerIPPool so it may be used for load balancers other than those that use BGP. The IP Pool will be used by the operator's LB IPAM component, and the contents of the CRD have been updated to match the new requirements. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> 26 November 2022, 00:39:59 UTC
86e41f1 k8s/resource: Expose the underlying cache.Store in Store[T] To make it easier to partially transition to using Resource[T], expose the underlying cache.Store. Hopefully temporary :fingerscrossed:. Signed-off-by: Jussi Maki <jussi@isovalent.com> 26 November 2022, 00:39:59 UTC
ebb9a78 k8s/slim: Add missing fields needed by LB-IPAM This adds: - metav1.Condition, metav1.ConditionStatus - metav1.ObjectMeta.Generation - corev1.IPFamilyPolicy - corev1.IPFamilyPolicyType - corev1.LoadBalancerClass - corev1.Service.{IPFamilyPolicy, LoadBalancerClass} - corev1.ServiceStatus.Condition Signed-off-by: Jussi Maki <jussi@isovalent.com> 26 November 2022, 00:39:59 UTC
ec41c3d test: remove kube-proxy-replacement: probe from upstream tests This option was removed by 691f1c33c9ad and broke all upstream tests. This commit removes this setting as well to make the tests pass. As some tests are failing because KPR is now disabled, we need to set sessionAffinity=true to make the relevant session affinity conformance tests pass. Fixes: 691f1c33c9ad ("daemon: Remove KPR=probe") Signed-off-by: André Martins <andre@cilium.io> 25 November 2022, 21:27:59 UTC
fe39350 helm: Do not create Grafana dashboards by default The default in #21181 was true, but not everyone uses Grafana and this was already brought up in a comment in the previous PR that it can cause troubles with the cilium upgrade preflight manifest. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> 25 November 2022, 11:27:36 UTC
14babaf chore: fix typo in enableCNPWatcher comment Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 25 November 2022, 11:27:22 UTC
f4ff2ce operator: rate limit CNP nodes status clean up When the option `disable-cnp-status-updates` is set to true, the operator, at startup, will garbage collect all stale status nodes updates in CNPs and CCNPs. To avoid an excessive request rate to the API server, the clean up is rate limited. The request rate per second and the maximum allowed burst of requests are controlled, respectively, by the two new options `cnp-status-cleanup-qps` and `cnp-status-cleanup-burst`. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 25 November 2022, 11:27:22 UTC
c2c66a8 operator: add a flag to skip CNP status cleaning at startup When the option `disable-cnp-status-updates` is set to true, the operator, at startup, will garbage collect all stale status nodes updates in CNPs and CCNPs. The new option `skip-cnp-status-startup-cleaning` may be used to skip this clean up so as to speed up the operator startup. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 25 November 2022, 11:27:22 UTC
20bd519 operator: clear CNP status nodes if updates disabled When the option `disable-cnp-status-updates` is set to true, no policy enforcement update is tracked in CiliumNetworkPolicies. However, if the option was previously set to false, the field status.nodes still contains the last status of each node when the feature was turned off. Currently, the GC in the cilium operator removes status entries only if the relative node has been turned off. Given that these stale updates may hinder scalability for large clusters, we clean up all those entries at startup if `disable-cnp-status-updates` is set to true. Fixes #20231 Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 25 November 2022, 11:27:22 UTC
fdc6d39 operator: use GC controller context while patching CNPs Use the context from the GC controller to execute the update queries. Doing so, possible pending queries will be cancelled as soon as the controller context is cancelled. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 25 November 2022, 11:27:22 UTC
22ba23e operator: fix typos in CNP node status gc Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 25 November 2022, 11:27:22 UTC
1b33ead operator: preallocate cnp list backing array The number of returned CiliumClusterwideNetworkPolicies is known in advance, so the preallocation of the backing array will avoid reallocations after the append. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> 25 November 2022, 11:27:22 UTC
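The preallocation pattern named in 1b33ead is a general Go idiom. A minimal sketch follows; the `listItem` and `policy` types and the `toPolicies` function are hypothetical stand-ins, not the operator's actual types.

```go
package main

import "fmt"

// listItem and policy are hypothetical stand-ins for the listed objects and
// the converted form kept by the caller.
type listItem struct{ Name string }
type policy struct{ Name string }

// toPolicies converts a list whose length is known in advance. Allocating
// the backing array once with capacity len(items) means append never has to
// grow and copy it.
func toPolicies(items []listItem) []policy {
	out := make([]policy, 0, len(items))
	for _, it := range items {
		out = append(out, policy{Name: it.Name})
	}
	return out
}

func main() {
	items := []listItem{{"a"}, {"b"}, {"c"}}
	out := toPolicies(items)
	fmt.Println(len(out), cap(out))
}
```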
ee428c8 workflows: aks: bump timeout to 60m Some test runs are timing out as each of the 2 connectivity test runs takes about 18/19 minutes. So bump the timeout to 1 hour. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 25 November 2022, 11:27:06 UTC
a447012 bugtool: add missing bpftool vtep map dump add missing bpftool vtep map dump in cilium bugtool Signed-off-by: Vincent Li <v.li@f5.com> 24 November 2022, 19:32:28 UTC
0ded29b daemon: Deprecate force-local-policy-eval-at-source This should never have been exposed to users in the first place. It also causes issues when set to true, as explained in the previous commit. There are other ways to control if policy enforcement happens at the source or not (enable-endpoint-routes). Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2022, 17:07:41 UTC
3277400 options: Disable force-local-policy-eval-at-source by default The force-local-policy-eval-at-source flag was introduced in commit c525c755 ("bpf: Continue to enforce policy at source endpoint unless disabled"). It is enabled by default and causes Cilium to always enforce policies at the source when the destination is a local pod. Unfortunately, this flag is also causing issues when both endpoint routes and tunneling are enabled [1] (a configuration that was not possible at the time the flag was introduced). We have enough test coverage (L7 on multiple cloud providers) now to be able to safely disable this flag by default. We can remove it after a couple releases. 1 - https://github.com/cilium/cilium/issues/14657 Signed-off-by: Paul Chaignon <paul@cilium.io> 24 November 2022, 17:07:41 UTC
b332f4b workflows: aks: collect sysdumps for each failing test Signed-off-by: Gilberto Bertin <jibi@cilium.io> 24 November 2022, 14:32:56 UTC
88d494a workflows: aks: enable connectivity test debug logs Signed-off-by: Gilberto Bertin <jibi@cilium.io> 24 November 2022, 14:32:56 UTC
048ac27 operator: Use hand picked bucket values It gives nicer looking values than the computed version. Signed-off-by: Aleksander Mistewicz <amistewicz@google.com> 24 November 2022, 13:02:06 UTC
4488a6e operator: Adjust buckets for CEP queue delay histogram In the CiliumEndpointSliceQueueDelay histogram, the buckets configuration option was unset, so the defaults were used (https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables). This change doubles the number of buckets and increases the end of the last bucket to 1 hour, as values larger than the default range can be observed in large clusters. Signed-off-by: Aleksander Mistewicz <amistewicz@google.com> 24 November 2022, 13:02:06 UTC
0274afc operator: Fix bucket width for CEP histogram to the documented values In the CiliumEndpointSliceDensity histogram, the buckets configuration option was unset, so the defaults were used (they have values from 0 to 10 as seen here: https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables). This change makes the width of the buckets 10 as documented. BUG=254474623 Signed-off-by: Aleksander Mistewicz <amistewicz@google.com> 24 November 2022, 13:02:06 UTC
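To make "buckets of width 10" from 0274afc concrete, the sketch below computes linear bucket upper bounds using only the standard library. `linearBuckets` is a hypothetical stand-in that follows the same start/width/count convention as `prometheus.LinearBuckets`; the start value and bucket count used in `main` are assumptions, since the commit message states only the width.

```go
package main

import "fmt"

// linearBuckets (hypothetical stand-in) returns count upper bounds, the
// first at start and each following one width higher than the previous.
func linearBuckets(start, width float64, count int) []float64 {
	if count < 1 {
		return nil
	}
	buckets := make([]float64, count)
	for i := range buckets {
		buckets[i] = start
		start += width
	}
	return buckets
}

func main() {
	// Width 10 comes from the commit message; start=10 and count=10 are
	// assumed values for illustration only.
	fmt.Println(linearBuckets(10, 10, 10))
}
```

With default buckets that top out at 10, every observation above 10 would fall into the implicit +Inf bucket, so a density histogram could not distinguish between larger values.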
3632e16 ip: remove deprecated and unused GetCIDRPrefixesFromIPs Last remaining use was removed in commit bbcadc43758b ("treewide: Switch policy CIDR handling to netip"). Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 November 2022, 12:53:04 UTC
929089e ip: remove unused IPNetToPrefix This helper function is now unused, remove it. Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 November 2022, 12:53:04 UTC
9834253 daemon: convert Daemon.restoredCIDRs to netip.Prefix This avoids conversions to/from net.IPNet when populating and accessing the restored CIDRs. Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 November 2022, 12:53:04 UTC
567031a maps/ipcache: add key.Prefix Same as Key.IPNet, but returns a netip.Prefix instead of *net.IPNet. This will be used in a successive commit. Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 November 2022, 12:53:04 UTC
de82f8c daemon/policy: Reduce overhead of policy deletion This reduces the overhead of deleting policies, since it will now only loop through the policies in the repository once instead of twice. We originally found this when some of our clusters started having networking problems where legitimate traffic was randomly dropped on pod startup. After a while, we tracked it down to the main cilium event loop having a bad time: due to CPU contention, it was unable to keep up with the creation and deletion of policies in the cluster. We grabbed a pprof and realized that the biggest users of CPU time were "(*Daemon) policyAdd" and "(*Daemon) policyDelete". Overall, we would have expected them to be ~equally costly, and when looking at why, we saw that "(*Daemon) policyDelete" was effectively spending double the amount of CPU time, and that it was calling both "(*Repository) SearchRLocked" and "(*Repository) DeleteByLabelsLocked" for every policy delete; and that they were both ~equally expensive. After some more investigation, we realised that we could omit the call to "(*Repository) SearchRLocked". Signed-off-by: Odin Ugedal <ougedal@palantir.com> Signed-off-by: Odin Ugedal <odin@uged.al> 24 November 2022, 12:03:27 UTC
8df41dc Revert "relay: Add Go runtime metrics and process metrics" This reverts commit f0fa683870e1030707ed01b4d4b23b57b2d5c6a8. It appears to introduce a double-initialization of metrics, causing relay initialization failures. Signed-off-by: Joe Stringer <joe@cilium.io> 23 November 2022, 20:31:00 UTC
f3a22e8 docs: add instructions to build the base images from external forks When opening a PR to update the base images from external forks, the bot does not have necessary permissions to push the changes into the fork. For those cases the developer should amend the commit locally and push the changes themselves. Fixes: c5a778723a43 ("add auto-commit capability to build base images GH workflow") Signed-off-by: André Martins <andre@cilium.io> 23 November 2022, 20:19:56 UTC
4571310 build(deps): bump go.etcd.io/etcd/api/v3 from 3.5.5 to 3.5.6 Bumps [go.etcd.io/etcd/api/v3](https://github.com/etcd-io/etcd) from 3.5.5 to 3.5.6. - [Release notes](https://github.com/etcd-io/etcd/releases) - [Changelog](https://github.com/etcd-io/etcd/blob/main/Dockerfile-release.amd64) - [Commits](https://github.com/etcd-io/etcd/compare/v3.5.5...v3.5.6) --- updated-dependencies: - dependency-name: go.etcd.io/etcd/api/v3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 23 November 2022, 16:32:32 UTC
b70eb15 build(deps): bump go.etcd.io/etcd/client/pkg/v3 from 3.5.5 to 3.5.6 Bumps [go.etcd.io/etcd/client/pkg/v3](https://github.com/etcd-io/etcd) from 3.5.5 to 3.5.6. - [Release notes](https://github.com/etcd-io/etcd/releases) - [Changelog](https://github.com/etcd-io/etcd/blob/main/Dockerfile-release.amd64) - [Commits](https://github.com/etcd-io/etcd/compare/v3.5.5...v3.5.6) --- updated-dependencies: - dependency-name: go.etcd.io/etcd/client/pkg/v3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 23 November 2022, 16:30:09 UTC
366b968 daemon/cmd: Fix error handling for getting proxy port The error check handling should be done immediately after the GetProxyPort() call, in order to error out as soon as possible. This unchecked error can cascade to code integrations with the Agent and cause potentially difficult to track down behavior. Signed-off-by: Chris Tarazi <chris@isovalent.com> 23 November 2022, 13:13:09 UTC
f0fa683 relay: Add Go runtime metrics and process metrics Currently the agent has a GoCollector and ProcessCollector but relay does not, this updates the relay for consistency and enhanced debuggability. Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> 23 November 2022, 09:20:58 UTC
b27208a .github: fix bpf-checks on ubuntu-latest runner Take the same approach as in 5f7aa03fcc7b (".github: Explicitly set build-commits job runner image version"). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 23 November 2022, 09:17:55 UTC
5116f7a gha: Pin ubuntu-20.04 for conformance-test-ipv6 This commit is to avoid ubuntu version drift for runner, till the proper version upgrade is done. Signed-off-by: Tam Mach <tam.mach@cilium.io> 23 November 2022, 09:16:44 UTC
54e70e2 docs: Update Cilium Sphinx RTD Theme reference This updates Documentation/requirements.txt to reference a new commit hash on the theme's v1.0 branch. This will trigger an RTD build. Signed-off-by: Stacy Kim <stacy.kim@ucla.edu> 23 November 2022, 01:13:37 UTC
5f7aa03 .github: Explicitly set build-commits job runner image version github: Install libtinfo5 for clang in build-commits CI job Signed-off-by: Chance Zibolski <chance.zibolski@gmail.com> 22 November 2022, 21:51:08 UTC
ee4ea1a test: Fail on router IP mismatch warnings We try to restore the router IP both from the filesystem (first) and from Kubernetes objects (as a fallback). If the two IP addresses don't match, we emit a warning. There is no good reason for this to happen in CI so we should fail the test if that warning ever shows up. Doing so would have prevented the flake fixed by the previous commit. Signed-off-by: Paul Chaignon <paul@cilium.io> 22 November 2022, 21:21:57 UTC
1e947e9 pkg/nodediscovery: do not use Node annotations when mutating CiliumNode When using CiliumNode, the agent's source of truth should be the agent itself and not k8s node annotations. Thus we will not use the annotations for the CiliumInternalIP address when generating a CiliumNode from the k8s Node resource. Signed-off-by: André Martins <andre@cilium.io> 22 November 2022, 21:21:57 UTC
0696874 pkg/k8s: do not read k8s node annotations if they are not written When there is an annotation in the k8s node object, the annotation `io.cilium.network.ipv4-cilium-host` is used as the CiliumInternal IP address of the CiliumNode object in [1]. Whenever Cilium is updating any state into the CiliumNode, it retrieves all IP addresses from the k8s node, including the ones from annotations, and appends the local node's IP addresses, including the newly correct internal / router IP address, in [2]. Since this is a list, the annotation's IP address is always used first and all other Cilium agents will wrongly use it for any operation. [1] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L453-L459 [2] https://github.com/cilium/cilium/blob/927bd8c26904ff92e42c61cec6d00ea8ac062c05/pkg/nodediscovery/nodediscovery.go#L474-L489 Fixes: 73d6cae2c906 ("install: default AnnotateK8sNode to false") Signed-off-by: André Martins <andre@cilium.io> 22 November 2022, 21:21:57 UTC
d5ada37 docs: fix deployment resource type output Since k8s removed support for the extensions/v1beta1 API version after 1.16, we should update the docs to the latest stable version. Signed-off-by: cleverhu <shouping.hu@daocloud.io> 22 November 2022, 19:39:41 UTC
b00806f bugtool: Fix URL to blog.ralch.com Signed-off-by: yanggang <gang.yang@daocloud.io> Signed-off-by: Joe Stringer <joe@cilium.io> 22 November 2022, 18:20:03 UTC
6e13c3b images: update cilium-{runtime,builder} Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 22 November 2022, 14:39:30 UTC
005f900 docker: Do not specify syntax Not specifying the syntax starts builds faster, but relies on the default syntax being recent enough. This is currently the case, so remove the syntax references. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 22 November 2022, 14:39:30 UTC
6c98f15 operator: fix CEP GC When CEP was converted to an internal CEP structure, the UID field was not copied, causing the delete requests of CEPs to have their UID precondition set as empty. When kube-apiserver received this delete request it didn't delete the CEP because an empty CEP UID didn't match an existent UID. Fixes: 6f7bf6c51f7a ("Prevent CiliumEndpoint removal by non-owning agent") Reported-by: Bruno Custódio <bruno@isovalent.com> Signed-off-by: André Martins <andre@cilium.io> 22 November 2022, 12:12:02 UTC
3a5e985 pkg/k8s: fallback on retrieving CiliumNode from kube-apiserver Retrieving objects from caches can be useful to prevent doing useless requests to kube-apiserver. In the unlikely event that the object doesn't exist in the local cache, Cilium can try to retrieve it from kube-apiserver directly. For this particular case, with CiliumNode, it is causing Cilium to fatal as it is unable to retrieve CiliumNode from the cache due to subsystem initialization issues, thus we will fall back on retrieving the object directly from kube-apiserver. In this case, the subsystem initialization issue happened due to the fact that the CiliumNode watcher is blocked on its event handler by the egressGatewayManager [1], which is blocked by the initialization of the identity allocator [2]. Unfortunately, the identity allocator is only initialized at a later stage, preventing the CiliumNode cache from being populated with all of its nodes. [1] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/k8s/watchers/cilium_node.go#L56 [2] https://github.com/cilium/cilium/blob/933bdcbec9319b0148b12688f720fbaaf55e0dba/pkg/egressgateway/manager.go#L83 Fixes: 69e4c6974891 ("k8s: optimize API calls made to kube-apiserver") Signed-off-by: André Martins <andre@cilium.io> 22 November 2022, 12:03:46 UTC
2b7bbe3 build(deps): bump github/codeql-action from 2.1.30 to 2.1.32 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 2.1.30 to 2.1.32. - [Release notes](https://github.com/github/codeql-action/releases) - [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/github/codeql-action/compare/18fe527fa8b29f134bb91f32f1a5dc5abb15ed7f...4238421316c33d73aeea2801274dd286f157c2bb) --- updated-dependencies: - dependency-name: github/codeql-action dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 22 November 2022, 11:06:05 UTC
13a6109 build(deps): bump github.com/spf13/cobra from 1.5.0 to 1.6.1 Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.5.0 to 1.6.1. - [Release notes](https://github.com/spf13/cobra/releases) - [Commits](https://github.com/spf13/cobra/compare/v1.5.0...v1.6.1) --- updated-dependencies: - dependency-name: github.com/spf13/cobra dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 22 November 2022, 11:05:33 UTC
a20094d bpf: egressgw: clarify IPSec key for tunnel encapsulation The `encrypt_key` in handle_ipv4_from_lxc() is obtained from an IPCache lookup for the packet's `daddr`. It doesn't make sense to use this key in the context of redirecting EgressGW traffic: here the tunnel's remote endpoint is not `daddr`, but an EgressGW node. As EgressGW and IPSec are currently mutually exclusive, we can just hard-code this parameter to 0 for now. In the future we would need to look up the IPSec key of the selected EgressGW node. Signed-off-by: Julian Wiedmann <jwi@isovalent.com> 22 November 2022, 10:58:18 UTC
cdff193 Fix broken documentation URL in helm chart template The link to the OSS documentation for policy-enforcement-modes is incorrect in the helm chart template. This is a minor fix to point to the correct documentation URL. Signed-off-by: Navin Kukreja <navin.kukreja@isovalent.com> Co-authored-by: Raphaël Pinson <raphael@isovalent.com> 22 November 2022, 10:57:58 UTC
f072dbd docs: Fix incorrect FQDN metrics which are disabled by default These metrics were incorrectly documented as enabled by default, which confused users. Fix the docs to state that they are disabled by default and must be enabled explicitly via --metrics. Fixes: 1133bd5d30 ("docs: Added `Default` column in metrics details") Fixes: https://github.com/cilium/cilium/pull/20255 Signed-off-by: Chris Tarazi <chris@isovalent.com> 22 November 2022, 01:06:01 UTC
e3b0095 docs: Update API rate limiter metrics to match style of other metrics We do this by removing the extraneous "cilium_" prefix from the metrics to align with the other metric names in this file. Signed-off-by: Chris Tarazi <chris@isovalent.com> 22 November 2022, 01:06:01 UTC
933bdcb docs: Clarify wildcards and subdomains in FQDN policies Signed-off-by: flxman <felix.farjsjo@gmail.com> 21 November 2022, 20:04:29 UTC
cf354e9 policy/trace: Remove redundant files Signed-off-by: Rushikesh Butley <rushikeshbutley@gmail.com> 21 November 2022, 19:23:52 UTC
9b78190 policy/trace: Remove unused yaml parsing function Signed-off-by: Rushikesh Butley <rushikeshbutley@gmail.com> 21 November 2022, 19:23:52 UTC
89e4454 cli: Remove yaml parser from policy trace Signed-off-by: Rushikesh Butley <rushikeshbutley@gmail.com> 21 November 2022, 19:23:52 UTC
923be90 daemon/cmd: cleanup, remove superfluous sprintf. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 21 November 2022, 10:39:02 UTC
e46054c k8s/watchers: prevent panic when cep has no network status. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 21 November 2022, 10:39:02 UTC
5268b58 daemon/cmd: make CES cleanup behaviour explicit. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 21 November 2022, 10:39:02 UTC
618ba55 daemon: add cleanup for stale local ciliumendpoints. CiliumEndpoints can become stale when they still reference existing Pods that are no longer managed by Cilium. In this scenario, the operator will not GC these CEPs because they have a valid pod owner reference. This commit adds an init cleanup which removes stale CEPs. In addition, the CEP/CES K8s watchers will mark such CEPs for deletion and a controller GC routine will periodically GC the old CEPs. Fixes #17631 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 21 November 2022, 10:39:02 UTC
264dd3f daemon: add indexer to CES watcher which indexes local CES. Added support for an indexing informer in k8s/watchers, as well as a custom indexer func which maintains an index on CESs containing local endpoints by their underlying endpoint names. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 21 November 2022, 10:39:02 UTC
5bb04cf pkg/option: add flag for toggling stale CEP cleanup. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> 21 November 2022, 10:39:02 UTC
cdb1f3f workflows: aks: enable debug This commit enables debug logging for all AKS workflows, which should help debug some flaky tests. Signed-off-by: Gilberto Bertin <jibi@cilium.io> 21 November 2022, 10:22:23 UTC
123b0d9 chore(deps): update docker.io/library/golang:1.19.3 docker digest to dc76ef0 Signed-off-by: Renovate Bot <bot@renovateapp.com> 18 November 2022, 23:37:40 UTC