https://github.com/cilium/cilium

ff4f5b3 test commit Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 23 May 2023, 13:57:22 UTC
b0836e8 Prepare for release v1.10.5 Signed-off-by: Joe Stringer <joe@cilium.io> 14 October 2021, 00:37:27 UTC
27553fb Update Go to 1.16.9 Signed-off-by: Tobias Klauser <tobias@cilium.io> 13 October 2021, 23:05:01 UTC
f5a5149 test: Basic e2e test for egress gateway [ upstream commit 0ed817c78b54f9b1d6b0ef184e1670354768993b ] The original patch (https://github.com/cilium/cilium/pull/17377/commits/06e1f1c3784fafe426fc7a78743ebfb1229bc731) for this test included an additional policy in test/k8sT/manifests/egress-nat-policy.yaml:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumEgressNATPolicy
metadata:
  name: egress-to-black-hole
spec:
  egress:
  - podSelector:
      matchLabels:
        zgroup: testDSClient
    namespaceSelector:
      matchLabels:
        ns: cilium-test
  # Route everything to a black hole.
  # It shouldn't affect in-cluster traffic.
  destinationCIDRs:
  - 0.0.0.0/0
  egressSourceIP: 1.1.1.1 # It's a black hole
```

The black-hole policy was meant to test https://github.com/cilium/cilium/pull/17377/commits/b8c757a8b2dd3cd7b8c331f3ca5d38713c79a967, which aimed to address https://github.com/cilium/cilium/issues/16147. The above patch, however, led to a verification error, so it was excluded from this PR. Signed-off-by: Yongkun Gui <ygui@google.com> Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 13 October 2021, 13:45:50 UTC
e9e9b81 datapath: egress gw: fix non-tunnel mode [ upstream commit 14331816e9b748cdd6ca8d5ab247e9e357eea64f ] When a client uses an egress gateway node, it forwards traffic via a vxlan tunnel to the egress gateway node. If the datapath is configured in non-tunnel mode (direct routing), replies from the gateway to the client do not go via the tunnel. This causes these replies to be dropped by iptables because none of Cilium's FORWARD rules match them. This patch identifies the above packets (i.e., from egress gw to client), and steers them via the vxlan tunnel after rev-SNAT is performed, even when the datapath is configured in non-tunnel mode. A suggestion by Paul and Martynas (@brb) was to use the following condition to identify said packets: > if rev-SNATed IP ∈ native CIDR && rev-SNATed IP !∈ node pod CIDR => send to tunnel This patch, instead, checks the egress gateway policy map. This seems like a safer approach, because all packets that match the contents of the above map in the forward direction will be forwarded to the gw node. Fixes: #17386 Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 13 October 2021, 13:45:50 UTC
abba902 daemon: add mock dns proxy to tests [ upstream commit 0666f5355de975733e8d1b094aade7fd3a6b8786 ] This change allows daemon integration tests to run with a mock DNS proxy. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 13 October 2021, 13:45:50 UTC
d7e77df fqdn: Add proxy interface [ upstream commit 1fc4208789d913f255bd99c77956de3ac7cc4bec ] This change adds an interface for abstracting away the FQDN proxy. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 13 October 2021, 13:45:50 UTC
5a81d23 bpf: Add extension for running sock LB on MKE-related containers [ upstream commit 13ebeb0f83737d0509a1f18ac0e1899b381c92c1 ] This adds two hidden/undocumented options to the agent which allow Cilium in KPR=strict mode to be deployed with Mirantis Kubernetes Engine (MKE): --enable-mke=true --mke-cgroup-mount="" (auto-detection as default, or for manual specification:) --mke-cgroup-mount="/sys/fs/cgroup/net_cls,net_prio" MKE adds a number of Docker containers onto each MKE node which are otherwise neither visible nor managed from the Cilium side, for example: docker network inspect ucp-bridge -f "{{json .Containers }}" | jq . | grep Name "Name": "ucp-kv", "Name": "ucp-kube-controller-manager", "Name": "ucp-kube-apiserver", "Name": "ucp-swarm-manager", "Name": "ucp-kubelet", "Name": "ucp-auth-store", "Name": "ucp-cluster-root-ca", "Name": "ucp-hardware-info", "Name": "ucp-client-root-ca", "Name": "ucp-kube-scheduler", "Name": "ucp-proxy", "Name": "ucp-controller", They [0] contain things like the kube-apiserver, which then live in their own network namespaces with their own private address range of 172.16.0.0/12. The links to the hostns are set up from the MKE side as veth pairs connected to a bridge device: [...] 
59: br-61d49ba5e56d: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether 02:42:b2:e4:55:ff brd ff:ff:ff:ff:ff:ff inet 172.19.0.1/16 brd 172.19.255.255 scope global br-61d49ba5e56d valid_lft forever preferred_lft forever 61: vethd56c086@if60: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-61d49ba5e56d state UP group default link/ether 06:ad:07:c6:55:e8 brd ff:ff:ff:ff:ff:ff link-netnsid 4 63: veth7db52f6@if62: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-61d49ba5e56d state UP group default link/ether aa:10:e2:d8:b7:6c brd ff:ff:ff:ff:ff:ff link-netnsid 5 65: vethe23d66c@if64: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-61d49ba5e56d state UP group default link/ether ba:f1:e3:de:ce:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 6 [...] This is different compared to regular K8s deployments, where such components reside in the hostns. For the socket LB, which is enabled in KPR=strict deployments, this is problematic as these containers are not seen the same way as the hostns and therefore not all of the translations we perform in the hostns (like accessing NodePort via loopback/local addressing, etc.) might take place. We've noticed this in particular in combination with the work in f7303afa1412 ("Adds a new option to skip socket lb when in pod ns"), which is needed to get the Istio use-case working under MKE, since the latter treats these MKE system containers in the same way as application Pods and therefore disables socket LB for them, while no bpf_lxc-style per-packet translation gets attached at tc, hence the complete lack of service translation in this scenario. One observation in MKE environments is that cgroups-v2 is only supported with MCR 20.10, which is not available in every deployment at this point. However, MKE makes use of cgroup-v1 and under /sys/fs/cgroup/net_cls/ it populates both the com.docker.ucp/ and docker/ subdirectories. 
One idea for a non-intrusive fix to get KPR=strict deployments working is to tag these containers' net_cls controllers with a magic marker which can then be read out from the socket LB with the kernel extension we added some time ago [1]. Given this relies on 'current' as the task, we can query get_cgroup_classid() to determine that this should have similar service handling behavior as in the hostns. This works reliably as 'current' points to the application doing the syscall, which is always in process context. Pods are under /sys/fs/cgroup/net_cls/kubepods/ whereas all MKE containers are under /sys/fs/cgroup/net_cls/{com.docker.ucp,docker}/. Upon agent start, it will set a net_cls tag for all paths under the latter. On the cgroup side, this will walk all sockets of all processes of a given cgroup and tag them. If MKE sets up a subpath under the latter, it will automatically inherit the net_cls tag as per cgroup semantics. This has two limitations which were found to be acceptable: i) this will only work in Kind environments with the kernel fixes we upstreamed in [2], and ii) no other application on the node can use the same net_cls tag. Running MKE on Kind is not supported at the moment, so i) is a non-issue right now. And it's very unlikely to run into collisions related to ii). This approach has been tested on RHEL8, and Duffie asserted that connectivity works as expected [when testing] manually. For the record, there were two alternative options that were weighed against this approach: i) attaching cgroups-v2 non-root programs, ii) per-packet translation at the tc level. Unfortunately i) was not an option since MKE does not support cgroups-v2 (nor will in the near future) and therefore MKE-related containers are also not in their own cgroup-v2 path in the unified hierarchy. Otherwise it would have allowed for a clean way to override default behavior for specific containers. 
And option ii) would have been very intrusive: the agent would need to detect MKE-related veth devices, attach to tc ingress and tc egress, and we would have to split out the bpf_lxc service translation bits or attach some form of stripped-down bpf_lxc object to them in order to perform DNAT and reverse DNAT. The approach taken here achieves the same in just a few lines of extra code. [0] https://docs.mirantis.com/mke/3.4/ref-arch/manager-nodes.html https://docs.mirantis.com/mke/3.4/ref-arch/worker-nodes.html [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a52ae4e32a61ad06ef67f0b3123adbdbac4fb83 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8520e224f547cd070c7c8f97b1fc6d58cff7ccaa https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=78cc316e9583067884eb8bd154301dc1e9ee945c Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Duffie Cooley <dcooley@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 12 October 2021, 12:32:52 UTC
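The cgroup-v1 tagging step described in the commit above could be sketched as a small shell helper. The function name, the magic classid value, and the hardcoded subdirectory names are illustrative assumptions, not the agent's actual implementation:

```shell
# Sketch of the net_cls tagging idea: write a magic classid into the
# net_cls controller of the MKE container cgroups. Per cgroup-v1
# semantics, the kernel walks and tags all sockets of the cgroup's
# tasks, and sub-cgroups created later inherit the tag.
tag_mke_net_cls() {
  # $1: net_cls cgroup mount (e.g. /sys/fs/cgroup/net_cls), $2: classid
  root=$1
  classid=$2
  for cg in "$root/com.docker.ucp" "$root/docker"; do
    [ -d "$cg" ] || continue
    echo "$classid" > "$cg/net_cls.classid"
  done
}
```

The socket LB side would then compare get_cgroup_classid() of 'current' against the same magic value to decide whether to apply hostns-style service handling.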
9d1752f helm: Expose l2 neigh discovery related agent flags [ upstream commit da746075e6f51943453bfe7a6d3bda5a3824ccaa ] This commit exposes the following flags: - "--enable-l2-neigh-discovery" via "l2NeighDiscovery.enabled". - "--arping-refresh-period" via "l2NeighDiscovery.refreshPeriod". Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Chris Tarazi <chris@isovalent.com> 11 October 2021, 23:57:41 UTC
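A minimal sketch of setting the two newly exposed helm values; the release name, chart reference, and the example values are assumptions, not defaults:

```shell
# Sketch: enable L2 neighbor discovery and tune the ARP refresh period
# via the helm values exposed by this commit (values are examples).
helm upgrade cilium cilium/cilium --namespace kube-system \
  --set l2NeighDiscovery.enabled=true \
  --set l2NeighDiscovery.refreshPeriod=30s
```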
a4ca47a datapath/linux: enable neighbor discovery in unit tests [ upstream commit 6e022750e78819397e4e886727d3f153412d8841 ] This option should be enabled in unit tests since it is also enabled as a flag in the agent. Fixes: cee08cd2b299 ("daemon: Make L2 neighbor discovery configurable.") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 11 October 2021, 23:57:41 UTC
b4c9323 daemon, ipam, option: Introduce ability to bypass IP availability error [ upstream commit baa9e9a200a96eb4cf97a3e660af3f5f02ae3762 ] In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade (to a version which includes [1]), endpoint restoration will fail with [2]. In order to work around this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which will instruct Cilium to continue if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors will not be bypassed, in order to reduce the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool. This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is the change described in [1], where the primary ENI IP was removed from the IPAM pool. In any other mode that this flag is enabled in, the user is warned that the flag is not intended for other modes and will have no effect. This patch is intended to be reverted in the future, once users of Cilium no longer upgrade from versions prior to [1] and this stop-gap solution is no longer required. I propose that we revert this in the release following the one this patch makes it in (N+1). How was this tested? 
1) Deployed a Cilium version that doesn't include [1] on an EKS cluster
2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP
3) Upgraded Cilium to a version which does include [1] and observed [2] failures
4) Reset the cluster back to the state from (2)
5) Upgraded Cilium to the version that contains this commit
6) Observed log msgs from this commit and endpoint restoration succeeding
7) Scaled the Deployment to 0 and back up, to restart all pods
8) Observed that they all get fresh IPs and none of the "unavailable" IPs

[1]: https://github.com/cilium/cilium/pull/15453
[2]:

```json
{
  "time": "2021-09-20T16:57:00.400086481Z",
  "level": "WARN",
  "origin": "cilium.io/agent",
  "message": "Unable to restore endpoint, ignoring",
  "params": {
    "endpointID": "992",
    "error": "Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available",
    "k8sPodName": "default/pod-1",
    "subsys": "daemon"
  }
}
```

Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
124fa2b ipam: Convert IP not available error into sentinel error [ upstream commit c88e06fd35e4480d08782c1bad8728b22fcafa39 ] This will be used in the subsequent commit to check for a specific error when allocating an IP. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
06a35c1 Adds a new option to skip socket lb when in pod ns [ upstream commit f7303afa14121a288773bd3854231a658a37b88d ] This is for compatibility with Istio in kube-proxy free mode. Currently, even though Istio still gets all traffic within the pod namespace, the original service VIP is lost during socket LB, causing it to miss all Istio routing chains and therefore bypassing all Istio functionality. This adds a new option to bypass socket LB in the pod namespace. When enabled, service resolution for connections from pod namespaces will be handled in bpf_lxc at the veth. For host-namespaced pods, socket LB is kept as is. Signed-off-by: Weilong Cui <cuiwl@google.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
ba84d60 Add neighbor discovery behavior docs to kubeproxy-free. [ upstream commit 0410c924268214e5caaf573bfbc317f26a685abe ] xref: https://github.com/cilium/cilium/pull/16974 Signed-off-by: Ayodele Abejide <abejideayodele@gmail.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
91a9199 test: bump coredns version to 1.7.0 [ upstream commit f6f2406017ca6cca537400bc6fbf4b32ebec42e2 ] coredns < 1.7.0 has a bug that makes service resolution become out-of-sync with the latest state from Kubernetes in case coredns suffers a disconnection from kube-apiserver [1]. This bug is fixed in all versions 1.7.0 and above. [2] In our CI this affects all Kubernetes jobs 1.18 and below and can result in flaky tests with logs similar to the following: ``` service IP retrieved from DNS (10.101.253.144) does not match the IP for the service stored in Kubernetes (10.108.15.225) ``` [1] https://github.com/coredns/coredns/issues/3587 [2] https://github.com/coredns/coredns/pull/3924 Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
07733b0 dnsproxy: unit test GetSelectorRegexMap [ upstream commit fcc345ec0a00ad4a385f650910f1d35e653a1a2e ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
6337fa5 proxy: Expose cachedSelectorREEntry type, regex map retrieval [ upstream commit 5936a5a7a84c218a7a20d0e7295d1ec4487d38d9 ] This type will be useful for serializing cache entries for communication between agent and external components Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
59b274e daemon: Make L2 neighbor discovery configurable. [ upstream commit cee08cd2b29927f81e2eca3ad4aea8741a380d70 ] This allows users to opt out of Cilium's neighbor discovery mechanisms if they do not want Cilium populating the neighbor table with MAC addresses of neighbors it might have discovered via its discovery process. Signed-off-by: Ayodele Abejide <abejideayodele@gmail.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
0836078 Update language on libceph with kubeproxy-free [ upstream commit a8b34806e2905c675b90b61f5c9d8ae7609057f2 ] It was not clear whether kernel v5.8 has a problem with libceph or whether 5.8 fixes the problem. Reword the sentence based on feedback to make it clearer and easier to read. Signed-off-by: Ville Ojamo <bluikko@users.noreply.github.com> Signed-off-by: Glib Smaga <code@gsmaga.com> 07 October 2021, 15:35:33 UTC
4204f66 node: don't exclude IPs from devices in unknown oper state [ upstream commit 6dbabed5c70570e69110fb09ba40f14abc32615d ] In initExcludedIPs() we build a list of IPs that Cilium needs to exclude to operate. One check to determine if an IP should be excluded is based on the state of the net device: if the device is not up, then its IPs are excluded. Unfortunately, this check is not enough, as it's possible to have a device reporting an unknown state (because its driver is missing the operstate handling, e.g. a dummy device) while still being operational. This commit changes the logic in initExcludedIPs() to not exclude IPs of devices reporting an unknown state. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
df1c742 .github/workflows: add step to verify helm-charts images versions [ upstream commit c4773d834c99d4ea8788dd86ce00e6c32c55937f ] As image versions are supposed to be set in the Makefile, we should add a step on the GH workflow to verify the correctness of those versions. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
7c97bc7 install/kubernetes: set right version on Makefile [ upstream commit 5d37a2fcedcbe142449fc91866ae1944fc3c9dfa ] The Makefile contains all component versions which are then used to generate the helm charts. This commit fixes some of those versions that got out-of-sync with the right versions. Fixes: 206105f4462c ("helm: use 'quay.io/cilium/certgen:v0.1.5'") Fixes: 09f3c81920bd ("helm: upgrade envoy to v1.18.4 for hubble-ui") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
4d9b307 pkg/identity: Add missing labels to well-known identities [ upstream commit b281dd741d0904b088c11e38f22d87b5fc364e96 ] Kubernetes 1.21 automatically adds a new label to all namespaces when the NamespaceDefaultLabelName feature gate is enabled. (https://kubernetes.io/docs/concepts/overview/_print/#automatic-labelling) This commit adds an additional entry for all well-known identities adding that label. Signed-off-by: Mauricio Vásquez <mauricio@accuknox.com> Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
ad50b02 pkg/fqdn: fix memory leak [ upstream commit 898322767feae24f7f0e657f1dc5fbdd666bcc3f ] In the FQDN architecture there is a DNS Cache per endpoint, used to track which domain names each endpoint makes DNS requests for, and a global DNS Cache whose main purpose is to help track which api.FQDNSelector present in the policy applies to locally running endpoints. The latter, as opposed to the former, didn't have any cleanup mechanism for the map that tracked which entries should be garbage collected, causing the global DNS Cache to grow. This commit prevents those entries from being tracked for garbage collection in the global DNS Cache. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
8de8ee2 pkg/fqdn: clean unused code [ upstream commit 900825528354b52810a3b5e95a63d6fcf81b2c83 ] The public function ForceExpiredByNames is not executed from anywhere so this function can be safely removed. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
5061f07 helm: upgrade envoy to v1.18.4 for hubble-ui [ upstream commit 09f3c81920bd3eb6a4ec13ea54694feb63fc25e9 ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
5b4a3b7 pkg/k8s: fix User-Agent for kubernetes client [ upstream commit 9e4d84b17c1e1e52ba5413763c44a62510f84675 ] The Kubernetes client's User-Agent was never set and would always fall back to the default value. This commit fixes this issue; now all Cilium components will correctly present their User-Agent. Fixes: b31ed337090a ("Add k8s client qps and burst as cli flags for the operator") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
247d509 hubble: Display proxy redirects in policy verdict events [ upstream commit c40ed791109d6b1500b2a86c2a75126e507a934e ] Before this commit, Hubble was ignoring proxy redirection information from the policy-verdict events it received from the datapath. For example, a cilium monitor event such as: Policy verdict log: flow 0x0 local EP ID 1531, remote ID 35429, proto 17, egress, action redirect, match L3-L4, 10.240.0.62:37282 -> 10.240.0.63:53 udp would be displayed in hubble observe as: Sep 15 17:23:11.960: cilium-test/client-6488dcf5d4-f9kfl:37282 -> kube-system/coredns-d4866bcb7-zh5jv:53 L3-L4 FORWARDED (UDP) This commit adds a new verdict REDIRECTED to signal such event. Such events now appear as: default/pod-to-external-fqdn-allow-google-cnp-5ff4986c89-n87h2:58314 -> kube-system/coredns-755cd654d4-j4vzh:53 UNKNOWN 5 (UDP) A subsequent patch to the Hubble command line will display value 5 as "REDIRECTED". Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
3e4d9f0 contrib/backporting: add environment variables to set ORG and REPO [ upstream commit 83d30deca58251e7246039ed100183d68c7a0d6a ] Having these environment variables allows the cherry-pick script to be used on other projects that are not Cilium. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
3b1034d daemon: Add --derive-masquerade-ip-addr-from-device opt [ upstream commit d204d789746b1389cc2ba02fdd55b81a2f55b76e ] The new option is used to specify a device whose globally scoped IP addr should be used for BPF-based masquerading. This is a workaround for an environment which uses ECMP for outgoing traffic via multiple devices and has a dedicated device whose IP addr should be used for the masquerading. The workaround is relevant until https://github.com/cilium/cilium/issues/17158 has been resolved (thus, we hide the flag). Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
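A sketch of how the hidden flag from this commit might be passed to the agent; the device name is an example, and `--enable-bpf-masquerade` is the existing toggle that BPF-based masquerading requires:

```shell
# Sketch: in an ECMP setup, pin BPF masquerading to the globally scoped
# IP of one dedicated device (eth1 is an illustrative device name).
cilium-agent --enable-bpf-masquerade=true \
  --derive-masquerade-ip-addr-from-device=eth1
```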
3a3b497 Fix overwriting iptables for kube-proxy free installation [ upstream commit 27fd5cc0f7177e62e7facaeec70ed9eae9ac7a00 ] Signed-off-by: Stijn Smits <stijn@stijn98s.nl> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
eca4f60 bugtool: Include listing of egress gateway map [ upstream commit 3441acc9b88d2552e2b51289360798e432e2b4b2 ] Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
0ff0073 operator: Improve identity GC efficiency [ upstream commit ede69e83021a02768382b7d7b3ff128933f870c1 ] With this commit, the identity GC rate limit (--identity-gc-rate-interval) becomes the effective rate at which identities are garbage collected. Previously, the identity GC interval (--identity-gc-interval) would cause the Operator to GC for that much time, then sleep for that much time, rinse and repeat, effectively halving the rate. To use concrete numbers for an example, let's say our interval is 5m and our GC rate is 1000 per minute. It would mean that previously, we would GC 5000 identities at a maximum per 10m (assuming that deletion takes 0s). How was that calculated? Each minute, we GC 1000 identities. After 5m, we have GC'd 5000 identities. But now we have to sleep for 5m because that's our GC interval, making our effective GC rate 500 per minute (instead of 1000/m). Now, we compute the time taken to perform the actual GC and subtract that from the interval. So in the above example, we eliminate the dead time of 5m and avoid slashing our effective GC rate in half. This change allows the Operator to keep up with the demand more efficiently. The Operator will warn if the GC duration took longer than the interval and set the sleep duration to 0. Suggested-by: Joe Stringer <joe@cilium.io> Suggested-by: Dan Wendlandt <dan@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
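The before/after arithmetic in the commit message above can be checked with quick shell arithmetic, using the same numbers (5m interval, 1000 identities per minute):

```shell
rate=1000       # identities GC'd per minute (--identity-gc-rate-interval)
interval=5      # GC interval in minutes (--identity-gc-interval)
per_pass=$((rate * interval))              # 5000 identities per GC pass
old=$((per_pass / (interval + interval)))  # pass + full sleep = 10m window
new=$((per_pass / interval))               # sleep shrunk by GC duration
echo "old=${old}/m new=${new}/m"           # -> old=500/m new=1000/m
```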
4d8383b logfields: Add Hint field [ upstream commit 2b44dcbd7eb77e8506d91b01f1789f0fc9be04a1 ] This is useful in warning or error level messages to help nudge the user in the right direction when troubleshooting. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
c056c27 pkg/nodediscovery: Error if impossible to set OwnerReference on CiliumNode [ upstream commit b0c33939f8b4d61c1afc35e9b80395ae922f2b25 ] It is impossible to set the OwnerReference if we fail to fetch the corresponding Kubernetes Node and the existing CiliumNode resource doesn't already have it set. We can rely on the OwnerReference being set because this logic was added in v1.6, which is a sufficiently early version of Cilium. [1] The reason for doing this is to ensure that the OwnerReference can always be set. If it cannot, this should be treated as an error and we shouldn't proceed. Cilium should not run in an environment where the Kubernetes Node resource is missing. [1]: 5c365f2c6d7930dcda0b8f0d5e6b826a64022a4f ("ipam: Automatically create CiliumNode resource on startup") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
1a78f16 ipam, operator: Remove CiliumNode deletion logic from CiliumNode watcher [ upstream commit 71a65cb1830d37eb186c8ffda0bca6affec62fb5 ] We don't need to implement this logic for two reasons: 1) We rely on CiliumNode resources to be deleted / cleaned up by attaching the corresponding K8s Node as an `ownerReference` in the CiliumNode. 2) It is redundant to delete the CiliumNode in response to an event...of the CiliumNode deletion itself. In very rare cases, this logic can actually delete a newly created CiliumNode by accident (see example below). Instead, keep all deletion logic besides the actual K8s API calls (DELETE) and perform a Get() to ensure that it's been deleted. Otherwise, log to the user that the resource may still exist. Example: Say an existing node was deleted and then recreated in quick succession with the same name. When the node is recreated, the agent will be scheduled on it. During bootstrap it'll create a corresponding CiliumNode resource. Given that only one Operator is operational at any time in a cluster, it is already running on another node in the cluster. The node-delete event will first delete the K8s node and then trigger a CN-delete via reason 1 from above. It is possible for the CN-delete event to be delayed such that it is received after the node-create event (the recreate). When the CN-delete event is received by the already-running Operator, the CiliumNode watcher logic will then trigger (erroneously) another CN-delete, thereby deleting the CiliumNode resource while the K8s node is still alive. Fixes: 6d44f4cd ("operator: sync cilium nodes to kvstore instead of k8s nodes") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
8f2647a docs: Fix version sorting for CRD schema docs [ upstream commit 98a995c9b5633cca0ed576e1c201b71a3f05b52d ] Use "sort -V" (versions) rather than "sort -n" (numeric) so that the docs list the minor versions in chronological order. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
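The difference fixed by the commit above is easy to reproduce: `sort -n` reads each line as a single number, so `1.10` sorts as 1.1, while `sort -V` compares version components:

```shell
printf '1.9\n1.10\n1.2\n' | sort -n   # -> 1.10, 1.2, 1.9 (1.10 read as 1.1)
printf '1.9\n1.10\n1.2\n' | sort -V   # -> 1.2, 1.9, 1.10 (chronological minors)
```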
35f2ca2 docs: Fix up broken minikube link [ upstream commit 9e740b187ea733b4db27cb050ce8a418350bd818 ] The section that this guide refers to is now its own dedicated page guide, and users can use any environment to test it out. Fix the redirect. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 02 October 2021, 12:52:15 UTC
a7a5463 ci: update cilium-cli to 0.9.1 [ upstream commit acf34318e49df9060fc51b205d1a0069c988ee62 ] This change updates cilium-cli to 0.9.1 in github action workflows files. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
599fa33 workflows: pin `cilium-cli` version to v0.8.6 [ upstream commit 2202dae9a4e74fc1d485d5da48484cb7ae79671b ] In #16892, we switched from pinning CLI version in workflows to using the latest stable version automatically. This can cause issues if a new release does not play nice with the set of environments tested by the workflows on `cilium/cilium`. We are reverting to pinning CLI version so as to have better control over the test environment, and avoid new CLI releases negatively impacting `cilium/cilium` workflows immediately upon release. With the CLI version pinned, any issues with the new version will be detected in the PR bumping the pinned version, allowing us to fix them prior to merging. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
0c69d17 jenkinsfiles: Don't display nulls in current build display name [ upstream commit e0da2e441e506279bdc9d3089024655abfbebb8d ] Signed-off-by: Tom Payne <tom@isovalent.com> Co-authored-by: Nicolas Busseneau <nicolas@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
795ed5f docs: Clarify exact requirements for the egress gateway [ upstream commit 82469c3f79ca326f8dbae2e7374a8a9fdc99a125 ] The egress gateway doesn't exactly require our kube-proxy replacement to be enabled. It only requires BPF masquerading which itself requires BPF NodePort. Enabling KPR is just an easy way to enable BPF NodePort. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
66d3316 datapath/linux/ethtool: use IoctlGetEthtoolDrvinfo [ upstream commit d82ac6f54c0118088cc46d8d892ff5e87cf5d09e ] Use the ioctl wrapper provided in the golang.org/x/sys/unix package with the correctly padded ifreqData struct, rather than providing our own wrapper and struct which is incorrectly padded. Also add a simple unit test and make sure the package is only built on Linux. Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
9d56b51 vendor: bump golang.org/x/sys to latest version [ upstream commit 6418ade4ba021ca5d8b46dac70d707617e5dd0a4 ] This pulls in a few fixes around ioctl wrappers wrt. unsafe.Pointer usage and fixes ifreqEthtool to be correctly padded. Ref. golang/sys@e5e7981a10699f0af2ffea4c0e0f542e447b2e4a Ref. golang/sys@b4502255bfe74b77ffdeb6ed972eeecd98d13721 Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
289d46a test: Enable BPF masq in test from k8s [ upstream commit 4b92d2de7f36de246dd5b2144f1e8d2532f23f0e ] Previously, these tests were failing due to our datapath masquerading replies from pod to outside. As this got fixed in the previous commit, we can enable BPF-based masquerading. This will also give us some coverage for the fix. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
3f62d49 datapath: Improve snat_v4_needed() comment [ upstream commit 55bfba9dd5b574bacb5b4afbfa337fc9d4cdd5cf ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
9cbb6e5 datapath: Do not SNAT replies to outside [ upstream commit d3ff998f2b30b41227a395b6415627d8cb611e69 ] Previously, the BPF-based masquerading (--enable-bpf-masquerade=true) was wrongly masquerading replies from a pod to an outside host when that host had initiated the connection. This was possible when, e.g., the outside host had a route to the pod CIDR. To fix this, we introduce a lightweight CT lookup function ct_is_reply4() which checks whether a given flow is a reply. The lookup function is called in snat_v4_needed(). As a side note, I've tried to move the port extraction to a separate function, but unfortunately it hits complexity issues on the 4.19 kernel in the "K8sDatapathConfig AutoDirectNodeRoutes Check direct connectivity with per endpoint routes" suite: BPF program is too large. Processed 131073 insn libbpf: failed to load program 'handle_to_container' libbpf: failed to load object '624_next/bpf_lxc.o' Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
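The actual fix lives in the C BPF datapath (ct_is_reply4() called from snat_v4_needed()); as a rough Go sketch of the idea only — the `tuple` and `conntrack` types below are made up for illustration, not Cilium's CT structures — a packet is a reply if its reversed tuple is already tracked, and only non-replies get masqueraded:

```go
package main

import "fmt"

// tuple identifies a connection in one direction.
type tuple struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// reverse returns the tuple of the opposite direction.
func (t tuple) reverse() tuple {
	return tuple{srcIP: t.dstIP, dstIP: t.srcIP, srcPort: t.dstPort, dstPort: t.srcPort}
}

// conntrack records connections seen in the inbound (outside-initiated) direction.
type conntrack map[tuple]bool

// isReply reports whether pkt is a reply to a connection initiated
// from the outside, i.e. its reversed tuple is already tracked.
func (ct conntrack) isReply(pkt tuple) bool {
	return ct[pkt.reverse()]
}

// needsSNAT masquerades only packets that are not replies.
func (ct conntrack) needsSNAT(pkt tuple) bool {
	return !ct.isReply(pkt)
}

func main() {
	ct := conntrack{}
	// Outside host 192.0.2.1 initiated a connection to pod 10.0.0.5.
	ct[tuple{"192.0.2.1", "10.0.0.5", 40000, 80}] = true

	reply := tuple{"10.0.0.5", "192.0.2.1", 80, 40000}
	fmt.Println(ct.needsSNAT(reply)) // false: replies must not be masqueraded
}
```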
1fcaa62 test/runtime: Look into log errors after test start [ upstream commit 38994b01cdd6b707a422263e379692dcacaf14e1 ] When running runtime tests locally, the tests sometimes fail as level=error log entries are found that are the result of cilium-agent restarts during provisioning. This is similar to the fix done in https://github.com/cilium/cilium/pull/14529. Signed-off-by: Jussi Maki <jussi@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
f084d79 fix bandwidth-manager install error Signed-off-by: JinLin Fu <withlin@apache.org> [ upstream commit 712af8e26e4afe1f380ee081b152fba1392b3b96 ] Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
72c3768 test: Skip Istio test on k8s <1.17 [ upstream commit 3992048f6ac44b2e44f4c2d9d157a3987b883af0 ] Istio 1.10 requires at least k8s version 1.17. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
87fc5ad istio: Update to release 1.10.4 [ upstream commit 4c87394b7a2380909d86ffa3e8da47df0fff2e98 ] Update Cilium Istio integration to Istio release 1.10.4. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 29 September 2021, 08:42:55 UTC
99bb382 build(deps): bump 8398a7/action-slack from 3.9.3 to 3.10.0 Bumps [8398a7/action-slack](https://github.com/8398a7/action-slack) from 3.9.3 to 3.10.0. - [Release notes](https://github.com/8398a7/action-slack/releases) - [Commits](https://github.com/8398a7/action-slack/compare/047b09b154480ed39076984b64f324fff010d703...c84a35cfa82a01f3733a3cbf5d5260123e55c2f9) --- updated-dependencies: - dependency-name: 8398a7/action-slack dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 21 September 2021, 18:45:10 UTC
94e3efe Update Go to 1.16.8 Signed-off-by: Tobias Klauser <tobias@cilium.io> 20 September 2021, 17:17:43 UTC
d84ea98 fix MLH config trigger Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com> 20 September 2021, 07:36:42 UTC
e23dc61 Populate v1 backend map from v2 backend map This is a potentially lossy process: ideally we would copy everything from the v2 map to the v1 map, but since the v1 map's key type is smaller (16-bit vs 32-bit in v2), entries with an ID larger than 64k are dropped. This means existing connections could potentially be interrupted when downgrading from v2 to v1. This logic needs to go into 1.10 releases, because we will introduce the v2 map in 1.11 and the downgrade logic needs to be in place. Signed-off-by: Weilong Cui <cuiwl@google.com> 13 September 2021, 19:36:17 UTC
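The lossy copy described above can be sketched with plain Go maps (illustrative only — the real code operates on pinned BPF maps, not Go maps): entries whose 32-bit ID does not fit in a 16-bit v1 key are counted and skipped.

```go
package main

import "fmt"

// copyBackends sketches the lossy v2→v1 backend map downgrade: v1 keys
// are 16-bit backend IDs, so any v2 entry whose 32-bit ID exceeds
// 0xFFFF cannot be represented and is dropped.
func copyBackends(v2 map[uint32]string) (v1 map[uint16]string, dropped int) {
	v1 = make(map[uint16]string, len(v2))
	for id, backend := range v2 {
		if id > 0xFFFF {
			dropped++ // connections to this backend may be interrupted
			continue
		}
		v1[uint16(id)] = backend
	}
	return v1, dropped
}

func main() {
	v2 := map[uint32]string{
		42:    "10.0.0.1:80",
		70000: "10.0.0.2:80", // > 64k: cannot fit in a v1 key
		65535: "10.0.0.3:80",
	}
	v1, dropped := copyBackends(v2)
	fmt.Println(len(v1), dropped) // 2 1
}
```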
1f0d836 build(deps): bump 8398a7/action-slack from 3.9.2 to 3.9.3 Bumps [8398a7/action-slack](https://github.com/8398a7/action-slack) from 3.9.2 to 3.9.3. - [Release notes](https://github.com/8398a7/action-slack/releases) - [Commits](https://github.com/8398a7/action-slack/compare/e74cd4e48f4452e8158dc4f8bcfc780ae6203364...047b09b154480ed39076984b64f324fff010d703) --- updated-dependencies: - dependency-name: 8398a7/action-slack dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 13 September 2021, 19:30:50 UTC
097dec6 node: Fix race condition on labels' getter/setter [ upstream commit ac0d184d51d67fe9eeabd5683bd21c263c55c9e6 ] Our race detection pipeline detected a race condition between the setter and getter for the local node's labels: WARNING: DATA RACE Write at 0x000005882150 by goroutine 154: github.com/cilium/cilium/pkg/node.SetLabels() /go/src/github.com/cilium/cilium/pkg/node/host_endpoint.go:33 +0x1dc github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).updateK8sNodeV1() /go/src/github.com/cilium/cilium/pkg/k8s/watchers/node.go:89 +0x1cc github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).nodesInit.func1() /go/src/github.com/cilium/cilium/pkg/k8s/watchers/node.go:44 +0xd1 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:231 +0x8f k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd() <autogenerated>:1 +0x2a github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1() /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:119 +0x297 k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:544 +0x50d k8s.io/client-go/tools/cache.(*controller).processLoop() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x83 k8s.io/client-go/tools/cache.(*controller).processLoop-fm() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:181 +0x4a k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1() /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x75 k8s.io/apimachinery/pkg/util/wait.BackoffUntil() /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xba k8s.io/apimachinery/pkg/util/wait.JitterUntil() /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x114 k8s.io/apimachinery/pkg/util/wait.Until() 
/go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x507 k8s.io/client-go/tools/cache.(*controller).Run() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x4a9 Previous read at 0x000005882150 by main goroutine: github.com/cilium/cilium/pkg/nodediscovery.(*NodeDiscovery).StartDiscovery() /go/src/github.com/cilium/cilium/pkg/nodediscovery/nodediscovery.go:182 +0x3ab github.com/cilium/cilium/daemon/cmd.NewDaemon() /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:762 +0x34d6 github.com/cilium/cilium/daemon/cmd.runDaemon() /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1558 +0xb8b github.com/cilium/cilium/daemon/cmd.glob..func1() /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:137 +0x3a4 github.com/spf13/cobra.(*Command).execute() /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:854 +0x8cb github.com/spf13/cobra.(*Command).ExecuteC() /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:958 +0x4b2 github.com/spf13/cobra.(*Command).Execute() /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:895 +0x88 github.com/cilium/cilium/daemon/cmd.Execute() /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:162 +0x69 main.main() /go/src/github.com/cilium/cilium/daemon/main.go:22 +0x2f runtime.main() /usr/local/go/src/runtime/proc.go:225 +0x255 github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:437 +0x58b github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/prog.go:453 +0x518 github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:419 +0x4ab github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:235 +0x439 github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:217 
+0x3cc runtime.doInit() /usr/local/go/src/runtime/proc.go:6309 +0xeb github.com/cilium/ebpf/internal/btf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/btf/btf.go:728 +0x1cc This commit fixes it. Fixes: 81f0626a ("datapath: Include the host endpoint ID in all endpoint headers") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io> 08 September 2021, 14:43:09 UTC
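The usual fix for a setter/getter race like the one above is to guard the shared labels map with a sync.RWMutex and hand out copies to readers. A generic Go sketch of that pattern follows (illustrative only, not Cilium's actual pkg/node code):

```go
package main

import (
	"fmt"
	"sync"
)

// A RWMutex guards the local node's label set so that a writer (e.g.
// a k8s watcher) and readers (e.g. node discovery) cannot race.
var (
	mu     sync.RWMutex
	labels map[string]string
)

// SetLabels replaces the label set; callers must not retain the map.
func SetLabels(l map[string]string) {
	mu.Lock()
	defer mu.Unlock()
	labels = l
}

// GetLabels returns a copy so readers never observe concurrent writes.
func GetLabels() map[string]string {
	mu.RLock()
	defer mu.RUnlock()
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		out[k] = v
	}
	return out
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); SetLabels(map[string]string{"zone": "a"}) }()
	go func() { defer wg.Done(); _ = GetLabels() }() // safe concurrent read
	wg.Wait()
	fmt.Println(GetLabels()["zone"]) // a
}
```

Running such a program under `go test -race` (or `go run -race`) is how a regression like this is caught.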
b612bd4 pkg/fqdn: use LRU in FQDN policy calculation [ upstream commit 42417c725ab7fae78acbb921c91977c1c977bc87 ] Using an LRU for memory-intensive operations such as regex.Compile brings some benefits, as presented in the following benchmarks [1]. These benchmarks assumed there were 2 CNPs that shared 100 FQDN `matchPattern` on a node with 20 endpoints. [1] ``` name old time/op new time/op delta _perEPAllow_setPortRulesForID-8 13.9ms ± 6% 1.2ms ±63% -91.10% (p=0.008 n=5+5) name old alloc/op new alloc/op delta _perEPAllow_setPortRulesForID-8 17.4MB ± 0% 0.6MB ± 0% -96.56% (p=0.008 n=5+5) name old allocs/op new allocs/op delta _perEPAllow_setPortRulesForID-8 42.8k ± 0% 8.1k ± 0% -81.13% (p=0.008 n=5+5) ``` Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io> 08 September 2021, 14:43:09 UTC
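As an illustration of the technique — not the actual pkg/fqdn code, which relies on an existing LRU library — a small LRU memoizing regexp.Compile results could look like this, so that policies sharing the same FQDN matchPattern compile each regex only once:

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
)

// regexCache memoizes regexp.Compile results with a small LRU.
type regexCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // pattern -> element holding *entry
}

type entry struct {
	pattern string
	re      *regexp.Regexp
}

func newRegexCache(capacity int) *regexCache {
	return &regexCache{cap: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the compiled regex for pattern, compiling it at most once
// while it stays in the cache.
func (c *regexCache) Get(pattern string) (*regexp.Regexp, error) {
	if el, ok := c.items[pattern]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	el := c.order.PushFront(&entry{pattern, re})
	c.items[pattern] = el
	if c.order.Len() > c.cap { // evict least recently used
		last := c.order.Back()
		c.order.Remove(last)
		delete(c.items, last.Value.(*entry).pattern)
	}
	return re, nil
}

func main() {
	c := newRegexCache(128)
	a, _ := c.Get(`^[a-z]+\.cilium\.io$`)
	b, _ := c.Get(`^[a-z]+\.cilium\.io$`) // cache hit: same *Regexp
	fmt.Println(a == b, a.MatchString("docs.cilium.io"))
}
```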
653cc7d vendor: update mongo-driver to 1.5.1 to fix CVE-2021-20329 [ upstream commit 1695d9c59ac4e78b5a02a96e83a57ec07ddbaa7f ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Aditi Ghag <aditi@cilium.io> 08 September 2021, 14:43:09 UTC
ddd5ac1 build(deps): bump docker/setup-buildx-action from 1.5.1 to 1.6.0 Bumps [docker/setup-buildx-action](https://github.com/docker/setup-buildx-action) from 1.5.1 to 1.6.0. - [Release notes](https://github.com/docker/setup-buildx-action/releases) - [Commits](https://github.com/docker/setup-buildx-action/compare/abe5d8f79a1606a2d3e218847032f3f2b1726ab0...94ab11c41e45d028884a99163086648e898eed25) --- updated-dependencies: - dependency-name: docker/setup-buildx-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 06 September 2021, 16:12:17 UTC
c100aa9 install: Update image digests for v1.10.4 Generated from https://github.com/cilium/cilium/actions/runs/1196157166. `docker.io/cilium/cilium:v1.10.4@sha256:7d354052ccf2a7445101d78cebd14444c7c40129ce7889f2f04b89374dbf8a1d` `quay.io/cilium/cilium:v1.10.4@sha256:7d354052ccf2a7445101d78cebd14444c7c40129ce7889f2f04b89374dbf8a1d` `docker.io/cilium/cilium:stable@sha256:7d354052ccf2a7445101d78cebd14444c7c40129ce7889f2f04b89374dbf8a1d` `quay.io/cilium/cilium:stable@sha256:7d354052ccf2a7445101d78cebd14444c7c40129ce7889f2f04b89374dbf8a1d` `docker.io/cilium/clustermesh-apiserver:v1.10.4@sha256:280c6230d32d7045089141177d5b052559ea194006bf1b02d84ab332812cc8c0` `quay.io/cilium/clustermesh-apiserver:v1.10.4@sha256:280c6230d32d7045089141177d5b052559ea194006bf1b02d84ab332812cc8c0` `docker.io/cilium/clustermesh-apiserver:stable@sha256:280c6230d32d7045089141177d5b052559ea194006bf1b02d84ab332812cc8c0` `quay.io/cilium/clustermesh-apiserver:stable@sha256:280c6230d32d7045089141177d5b052559ea194006bf1b02d84ab332812cc8c0` `docker.io/cilium/docker-plugin:v1.10.4@sha256:da57f22cb4984031d7d424539901730e6c093ef46f839e60ca25dfc2d056c3a2` `quay.io/cilium/docker-plugin:v1.10.4@sha256:da57f22cb4984031d7d424539901730e6c093ef46f839e60ca25dfc2d056c3a2` `docker.io/cilium/docker-plugin:stable@sha256:da57f22cb4984031d7d424539901730e6c093ef46f839e60ca25dfc2d056c3a2` `quay.io/cilium/docker-plugin:stable@sha256:da57f22cb4984031d7d424539901730e6c093ef46f839e60ca25dfc2d056c3a2` `docker.io/cilium/hubble-relay:v1.10.4@sha256:be17169d2b68a974e9e27bc194e0c899dbec8caee9dd95011654b75d775d413d` `quay.io/cilium/hubble-relay:v1.10.4@sha256:be17169d2b68a974e9e27bc194e0c899dbec8caee9dd95011654b75d775d413d` `docker.io/cilium/hubble-relay:stable@sha256:be17169d2b68a974e9e27bc194e0c899dbec8caee9dd95011654b75d775d413d` `quay.io/cilium/hubble-relay:stable@sha256:be17169d2b68a974e9e27bc194e0c899dbec8caee9dd95011654b75d775d413d` 
`docker.io/cilium/operator-alibabacloud:v1.10.4@sha256:39810dcfba0ca4dc02fcc1ac7515b87e362b6eb5c174cd08d3f511f48e2de108` `quay.io/cilium/operator-alibabacloud:v1.10.4@sha256:39810dcfba0ca4dc02fcc1ac7515b87e362b6eb5c174cd08d3f511f48e2de108` `docker.io/cilium/operator-alibabacloud:stable@sha256:39810dcfba0ca4dc02fcc1ac7515b87e362b6eb5c174cd08d3f511f48e2de108` `quay.io/cilium/operator-alibabacloud:stable@sha256:39810dcfba0ca4dc02fcc1ac7515b87e362b6eb5c174cd08d3f511f48e2de108` `docker.io/cilium/operator-aws:v1.10.4@sha256:45df7a09f8278a9c2313fa7d96e4254873c4e3fc42b181fd174985d6eafee326` `quay.io/cilium/operator-aws:v1.10.4@sha256:45df7a09f8278a9c2313fa7d96e4254873c4e3fc42b181fd174985d6eafee326` `docker.io/cilium/operator-aws:stable@sha256:45df7a09f8278a9c2313fa7d96e4254873c4e3fc42b181fd174985d6eafee326` `quay.io/cilium/operator-aws:stable@sha256:45df7a09f8278a9c2313fa7d96e4254873c4e3fc42b181fd174985d6eafee326` `docker.io/cilium/operator-azure:v1.10.4@sha256:f3fed6efdabc69731cbad1c883e6f0821511fa60fd62138ab63046f32ea56be0` `quay.io/cilium/operator-azure:v1.10.4@sha256:f3fed6efdabc69731cbad1c883e6f0821511fa60fd62138ab63046f32ea56be0` `docker.io/cilium/operator-azure:stable@sha256:f3fed6efdabc69731cbad1c883e6f0821511fa60fd62138ab63046f32ea56be0` `quay.io/cilium/operator-azure:stable@sha256:f3fed6efdabc69731cbad1c883e6f0821511fa60fd62138ab63046f32ea56be0` `docker.io/cilium/operator-generic:v1.10.4@sha256:c49a14e34634ff1a494c84b718641f27267fb3a0291ce3d74352b44f8a8d2f93` `quay.io/cilium/operator-generic:v1.10.4@sha256:c49a14e34634ff1a494c84b718641f27267fb3a0291ce3d74352b44f8a8d2f93` `docker.io/cilium/operator-generic:stable@sha256:c49a14e34634ff1a494c84b718641f27267fb3a0291ce3d74352b44f8a8d2f93` `quay.io/cilium/operator-generic:stable@sha256:c49a14e34634ff1a494c84b718641f27267fb3a0291ce3d74352b44f8a8d2f93` `docker.io/cilium/operator:v1.10.4@sha256:4679c953207a3fe9cfbd9b4a3f41149a8bddf1cc8f944f6d5c7f5b345338d98d` 
`quay.io/cilium/operator:v1.10.4@sha256:4679c953207a3fe9cfbd9b4a3f41149a8bddf1cc8f944f6d5c7f5b345338d98d` `docker.io/cilium/operator:stable@sha256:4679c953207a3fe9cfbd9b4a3f41149a8bddf1cc8f944f6d5c7f5b345338d98d` `quay.io/cilium/operator:stable@sha256:4679c953207a3fe9cfbd9b4a3f41149a8bddf1cc8f944f6d5c7f5b345338d98d` Signed-off-by: Joe Stringer <joe@cilium.io> 03 September 2021, 00:33:39 UTC
2a46fd6 Prepare for release v1.10.4 Signed-off-by: Joe Stringer <joe@cilium.io> 01 September 2021, 22:03:22 UTC
430e348 pkg/controller: set runFunc to true when controller is manually triggered [ upstream commit 4ebcc27ec1c6539ad31416b2f2b5e335b6dd6761 ] `runFunc` should be set to true when the controller is manually triggered, otherwise the controller might never be executed in the case where its run interval is zero. Fixes: c61d02fc4233 ("controller: allow to manually trigger it") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com> 01 September 2021, 18:42:55 UTC
d6d3a90 pkg/k8s: use a controller manager in K8sClient [ upstream commit 5949ae4b7cfb8c2fd7ea258e06c1fd7de2b77de0 ] Since "controller.NewManager()" returns a new Manager, triggering a controller from this newly returned manager does not work because the manager does not have any controller set in it. In order to be able to re-trigger controllers, a single instance of this manager is set in the K8sClient struct. Fixes: bd34b95a7939 ("pkg/k8s: remove node.cilium.io/agent-not-ready taint from nodes") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com> 01 September 2021, 18:42:55 UTC
8f44433 Update cilium base images Signed-off-by: Joe Stringer <joe@cilium.io> 31 August 2021, 17:50:13 UTC
cc120d8 build(deps): bump actions/setup-go from 2.1.3 to 2.1.4 Bumps [actions/setup-go](https://github.com/actions/setup-go) from 2.1.3 to 2.1.4. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](https://github.com/actions/setup-go/compare/37335c7bb261b353407cff977110895fa0b4f7d8...331ce1d993939866bb63c32c6cbbfd48fa76fc57) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> 30 August 2021, 20:49:20 UTC
e5513b3 envoy: Update to 1.18.4 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 30 August 2021, 18:29:38 UTC
ea6ebf0 envoy: Update to release 1.18.3 [ upstream commit 74e89a4d55b774c5c95853f522c9a7bc63c5e692 ] Signed-off-by: Jarno Rajahalme <jarno@isovalent.com> 30 August 2021, 18:29:38 UTC
91bd24b routing: Fix incorrect detection of Linux slave devices [ upstream commit 37d6c8d9c41766c356534ddbf2572c4b0f1ef019 ] Using method Slave() exposed by the netlink package doesn't always work. In particular, it doesn't work on AKS, maybe because there's no master bond interface in that case. We should instead rely on the flags passed by Linux's netlink API. Fixes: 3e245517 ("routing: Fix incorrect interface selection for pod routes") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 August 2021, 06:35:34 UTC
d96c5f0 proxylib: Use channel instead of atomics [ upstream commit d2cc2b9e582fc6c0ef5110f31abcc274eb9d4624 ] This commit contains no functional change. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 August 2021, 06:35:34 UTC
9104a49 proxylib: Fix race in accesslog_server [ upstream commit 82ec9f0952048fefde14a21d5117b8e1856dfd78 ] This introduces a mutex for the connections list, as there is a race where `Close()` starts iterating over the connections while the accept loop is trying to append a new connection to it. Fixes: #16371 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 August 2021, 06:35:34 UTC
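A generic Go sketch of the fix (an illustrative accessLogServer type, with strings standing in for net.Conn): the accept path and Close() take the same mutex, and no new connection is accepted once Close() has started.

```go
package main

import (
	"fmt"
	"sync"
)

// accessLogServer sketches the pattern: the accept loop appends to
// conns while Close iterates it, so both hold the same mutex.
type accessLogServer struct {
	mu    sync.Mutex
	conns []string // stands in for []net.Conn
	done  bool
}

// add registers a newly accepted connection, refusing it if Close
// has already started.
func (s *accessLogServer) add(c string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.done {
		return false
	}
	s.conns = append(s.conns, c)
	return true
}

// Close marks the server as closing and "closes" every tracked
// connection under the lock, returning how many it closed.
func (s *accessLogServer) Close() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.done = true
	n := len(s.conns)
	s.conns = nil
	return n
}

func main() {
	s := &accessLogServer{}
	s.add("conn-1")
	s.add("conn-2")
	fmt.Println(s.Close(), s.add("conn-3")) // 2 false
}
```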
7168de1 proxylib: Fix data race in unit test [ upstream commit c7cc24f09c7a9e8a84fd86ad9ef61e7374d32f6f ] This fixes a data race where we were accessing raw integers in unit tests from two different go routines without using atomics. Fixes: #16315 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 August 2021, 06:35:34 UTC
6aca25a policy: Fix transient policy deny during restart due to missing ipcache [ upstream commit ff7bacb1bfa1c2cf3e865f2beed46275c7947a1b ] Currently, during endpoint restoration, the ipcache map is unpinned and recreated by Map.OpenParallel. Regenerated endpoints look up entries in the newly created ipcache map, so the regeneration of endpoints should wait for the ipcache map to be synchronized. However, the regeneration of the host endpoint doesn't wait for ipcache map sync, which introduces a transient policy deny. This patch fixes this by making all endpoints wait for ipcache map sync. Fixes: #15878 Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 August 2021, 06:35:34 UTC
5972f48 policy: Fix transient policy deny during restart due to empty policy map [ upstream commit 175dab2f6488ffb2d314e0e0e6c4afdbe49a2a04 ] Currently, during endpoint restoration, policy maps are flushed before they are refilled, which introduces a transient policy deny. Instead of flushing all policy map entries while restoring/initializing the endpoint policyMap, this patch synchronizes the in-memory realized state with the BPF map entries, and any potential discrepancy between desired and realized state is dealt with by the following e.syncPolicyMap. Fixes: #15878 Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 August 2021, 06:35:34 UTC
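The reconcile-instead-of-flush approach can be sketched with plain Go maps (the real code walks BPF policy maps; the names here are illustrative): stale entries are deleted and missing ones inserted in place, so at no point is the map empty while valid policy exists.

```go
package main

import "fmt"

// syncPolicyMap reconciles the realized map with the desired state
// without ever flushing it, so no transient-deny window opens during
// endpoint restoration.
func syncPolicyMap(desired, realized map[string]bool) (added, deleted int) {
	for k := range realized {
		if !desired[k] {
			delete(realized, k) // stale entry: remove in place
			deleted++
		}
	}
	for k := range desired {
		if !realized[k] {
			realized[k] = true // missing entry: insert in place
			added++
		}
	}
	return added, deleted
}

func main() {
	realized := map[string]bool{"allow:80": true, "allow:8080": true} // restored from previous run
	desired := map[string]bool{"allow:80": true, "allow:443": true}
	added, deleted := syncPolicyMap(desired, realized)
	fmt.Println(added, deleted, realized["allow:80"]) // 1 1 true
}
```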
1d9256c proxylib/test: fix data race between StartAccessLogServer and Close [ upstream commit 47d4224060367b85d870db9b65480f14d669dbcf ] Don't attempt to accept the connection if the server is closing. Fixes #16296 Signed-off-by: Tobias Klauser <tobias@cilium.io> 24 August 2021, 06:35:34 UTC
93db262 build(deps): bump docker/build-push-action from 2.6.1 to 2.7.0 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 2.6.1 to 2.7.0. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](https://github.com/docker/build-push-action/compare/1bc1040caef9e604eb543693ba89b5bf4fc80935...a66e35b9cbcf4ad0ea91ffcaf7bbad63ad9e0229) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> 20 August 2021, 16:45:42 UTC
4bc9424 hubble: Never fail with ErrInvalidRead [ upstream commit 98b5fb3f964fc421c56b3e5e9899812e9e1242eb ] Currently GetFlows() fails with the following error when a position in the ring buffer being read by Ring.read() has been overwritten: requested data has been overwritten and is no longer available This turned out to be impractical as it makes it difficult to read all the flows in the ring buffer (e.g. `hubble observe --all`). GetFlows() would fail if Hubble observes a single flow between the reader rewinding to the oldest position and retrieving the entry. This patch modifies Ring.read() so that GetFlows() returns LostEvent instead of stopping with an error. The caller of GetFlows() can then decide how to handle LostEvent. Note that this makes the behavior of Ring.read() consistent with that of Ring.readFrom() used in the follow mode. It generates LostEvent and continues following instead of failing with ErrInvalidRead. Fixes: #17036 Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 19 August 2021, 09:15:17 UTC
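A toy Go sketch of the reader-side behavior (an illustrative ring type, not Hubble's actual container/ring package): when the requested position has already been overwritten by the writer, report how many events were lost and resume at the oldest readable slot, instead of failing the whole read.

```go
package main

import "fmt"

// ring is a fixed-size buffer whose writer may lap slow readers.
type ring struct {
	buf   []string
	write uint64 // total number of events ever written
}

func (r *ring) put(e string) {
	r.buf[r.write%uint64(len(r.buf))] = e
	r.write++
}

// read returns (event, lost, nextPos). lost > 0 stands in for a
// LostEvent covering the overwritten range; read never errors out.
func (r *ring) read(pos uint64) (string, uint64, uint64) {
	oldest := uint64(0)
	if r.write > uint64(len(r.buf)) {
		oldest = r.write - uint64(len(r.buf))
	}
	if pos < oldest { // overwritten: report the loss, don't fail
		return "", oldest - pos, oldest
	}
	return r.buf[pos%uint64(len(r.buf))], 0, pos + 1
}

func main() {
	r := &ring{buf: make([]string, 2)}
	for _, e := range []string{"f1", "f2", "f3"} { // f1 gets overwritten
		r.put(e)
	}
	ev, lost, next := r.read(0)
	fmt.Println(lost, next) // one event lost; resume at oldest position
	ev, lost, next = r.read(next)
	fmt.Println(ev, lost) // oldest surviving event, no further loss
}
```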
050f1ed datapath: panic explicitly when IP of direct-routing-device not found Slightly modified the ipv4 code to work with HostSliceToNetwork, which has been removed on master. [ upstream commit 83edea82d215930c1ee714b322d6e59cbbfec766 ] A user reported the following panic on agent startup: ``` level=info msg=" --devices=''" subsys=daemon level=info msg=" --direct-routing-device='wg0'" subsys=daemon ... level=info msg="Trying to auto-enable \"enable-node-port\", \"enable-external-ips\", \"enable-host-reachable-services\", \"enable-host-port\", \"enable-session-affinity\" features" subsys=daemon ... level=info msg="Cluster-ID is not specified, skipping ClusterMesh initialization" subsys=daemon panic: runtime error: index out of range [3] with length 0 goroutine 1 [running]: encoding/binary.bigEndian.Uint32(...) /usr/local/go/src/encoding/binary/binary.go:112 github.com/cilium/cilium/pkg/byteorder.HostSliceToNetwork(0x0, 0x0, 0x0, 0xa, 0x4795aa0, 0x41983f8) /go/src/github.com/cilium/cilium/pkg/byteorder/byteorder.go:134 +0x24f github.com/cilium/cilium/pkg/datapath/linux/config.(*HeaderfileWriter).WriteNodeConfig(0x4792cf0, 0x2e86b00, 0xc0005b4fe8, 0xc0000fcc68, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/datapath/linux/config/config.go:426 +0x47db github.com/cilium/cilium/daemon/cmd.(*Daemon).createNodeConfigHeaderfile(0xc0001d5200, 0x0, 0x0) /go/src/github.com/cilium/cilium/daemon/cmd/datapath.go:72 +0x3e9 github.com/cilium/cilium/daemon/cmd.(*Daemon).init(0xc0001d5200, 0x4239c20, 0x42278e0) /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:233 +0x6a6 ``` After some investigation, this is how the panic happened: 1. --devices='' && --direct-routing-device='wg0' resulted in final option.Config.Devices=["eth0"] on their machine, which further led to 2. no IP address was initialized in NodePort IPv4 address map for `wg0`, then 3.
`nodePortIPv4Addrs["wg0"]` returned an empty net.IP object, and the subsequent byte-order conversion panicked as above. Although the user is to blame for the misconfiguration, the panic message is not friendly for ordinary users trying to determine what happened and how to fix it. This patch improves it by checking the existence of the IP address before using it, and panicking explicitly with a more user-friendly message. Update: this patch also helps even though c042c05f8bf was added recently, as the latter also accesses indexes before checking IP existence: ```go // NetIPv4ToHost32 converts a net.IP to a uint32 in host byte order. ip // must be an IPv4 address, otherwise the function will panic. func NetIPv4ToHost32(ip net.IP) uint32 { ipv4 := ip.To4() _ = ipv4[3] // Assert length of ipv4. return Native.Uint32(ipv4) } ``` Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Timo Beckers <timo@isovalent.com> 19 August 2021, 09:15:17 UTC
4139182 routing: Fix incorrect interface selection for pod routes [ upstream commit 3e245517c9112b664e01cd47c0900beacbdedf93 ] The Configure method relies on the MAC address to select the proper egress interface for new pods (EKS and AKS). Several interfaces can however have the same MAC address in the case of slave devices. In such a case, the wrong interface may be selected. To avoid this, we skip Linux slave devices during the lookup by MAC address. Thus, in case of slave devices, we will select the master device. Fixes: 26308b63 ("Implement support for cilium-health in ENI mode") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com> 19 August 2021, 09:15:17 UTC
f5c7d80 routing: Throw error if MAC lookup finds several devices [ upstream commit 11c0faa94730d489a1fa5dc989410d5e12009ee2 ] When setting up the Linux routes and rules for ENI and Azure, we look up the interfaces by their MAC addresses. In that case, we want to ensure a single interface is found for the given MAC address. If several are found, we now throw an error rather than fail in a more obscure way down the line. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com> 19 August 2021, 09:15:17 UTC
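A simplified Go sketch of the uniqueness check (plain structs instead of netlink links; the real code also skips Linux slave devices, as described in the previous commit): if several interfaces share the MAC, error out early rather than pick one arbitrarily.

```go
package main

import "fmt"

type iface struct {
	name string
	mac  string
}

// findByMAC returns the single interface matching mac. If several
// match (e.g. slave devices sharing the master's MAC), it errors out
// rather than selecting one arbitrarily.
func findByMAC(ifaces []iface, mac string) (string, error) {
	var matches []string
	for _, i := range ifaces {
		if i.mac == mac {
			matches = append(matches, i.name)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no interface found with MAC %s", mac)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("found %d interfaces with MAC %s: %v", len(matches), mac, matches)
	}
}

func main() {
	ifaces := []iface{
		{"eth1", "aa:bb:cc:dd:ee:01"},
		{"eth1.100", "aa:bb:cc:dd:ee:01"}, // slave sharing the MAC
	}
	_, err := findByMAC(ifaces, "aa:bb:cc:dd:ee:01")
	fmt.Println(err != nil) // true: ambiguous lookup is rejected
}
```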
159f453 pkg/redirectpolicy: Make code robust against incorrect policy configurations [ upstream commit fbeb5c819abb70b985eb105542b8290c4f44883b ] If an incorrect service name is specified in an LRP, we need to guard against the case where the service IP can't be retrieved. Also, when a valid LRP is applied before the service it selects, service information won't be available in the manager callback that adds the LRP. The LRP will be applied when the service callback is later received by the manager. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com> 19 August 2021, 09:15:17 UTC
2960a34 Makefile: fix typo in helper message [ upstream commit 7c85c4364d3dbff20a1f2877303d041dd725fb63 ] Fixes: 2186ae4c3079 ("make: add help target to root Makefile for printing info about available targets") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Timo Beckers <timo@isovalent.com> 19 August 2021, 09:15:17 UTC
a9d0e3c Update Go to 1.16.7 Signed-off-by: Tobias Klauser <tobias@cilium.io> 19 August 2021, 07:11:53 UTC
4a9ba52 docs: Regenerate helm values Regenerates the helm values documentation on the v1.10 branch, as some entries were out of date. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
dd9723f helm: use 'quay.io/cilium/certgen:v0.1.5' [ upstream commit 206105f4462c8ab70820e6b0bf5e9e974f46bcc2 ] Related to the previous commit, v0.1.5 of 'cilium/certgen' adds '*.mesh.cilium.io' to the list of SANs for the server certificate generated for 'clustermesh-apiserver'. Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
694ded0 helm: add '*.mesh.cilium.io' to the list of SANs [ upstream commit 3753fe72460d47f4c1dc861c7fc2342df1d116b3 ] Currently, the server certificate generated by Helm for 'clustermesh-apiserver' doesn't include '*.mesh.cilium.io', which is used alongside host aliases when establishing a cluster mesh. This commit addresses that by adding said domain to the list of SANs. Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
d221704 install/kubernetes: use bidirectional mounts to mount bpf fs [ upstream commit f7a3f59fd74983c600bfce9cac364b76d20849d9 ] Bidirectional mounts are available in Kubernetes since 1.4 [1]. This allows Cilium container to mount the bpf fs automatically and propagate the mount into the host. This will improve Cilium's UX as it will remove the requirement of mounting the BPF fs in the host. [1] https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
af4f10a node-neigh: Wait instead of sleeping in unit tests [ upstream commit 2017e04b6bf40291e3e6e8cbd0ce5537fe5d0110 ] We can inspect the neighLastPingByNextHop map to check when insertNeighbor() or deleteNeighbor() was called. Fixes: e68848b98004 ("remove ARP entries left from previous Cilium run") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
3c25bb2 docs: Fix missing quote in gcloud command for GKE [ upstream commit 9fa20c82becafb075d1fa41f151cc8f002b08644 ] This was causing the command to fail if copy-pasted directly. Fixes: 3662560f897 ("Add new unified Helm guide") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
5f65e82 policy: Fix cilium policy trace output when only deny rules are applied [ upstream commit 9badbd7ce6fc77e8af9d0ebb8e4c211958f49657 ] Currently, deny rules are not counted in `*Repository.GetRulesMatching()`. Therefore, this function returns `false` even though a rule is applied if it is a deny rule. This causes incorrect output of the `cilium policy trace` command in some cases. This PR makes `*Repository.GetRulesMatching()` count ingress/egress deny rules, and adds a unit test for this. Signed-off-by: Tomoki Sugiura <cheztomo513@gmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
f20df2d install: Fix README links to getting started guides [ upstream commit f5c39586866486ab3532f2a3947e50cf7350763d ] Fix two issues in the README links for getting started guides: * Bad formatting of the image for the Self-Managed K8s option * Broken links to the docs page for getting started. Link all of the links to the same page, like we do on cilium.io. Fixes: 3662560f8979 ("Add new unified Helm guide") Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
c40652e operator: misc. refactoring and code removal [ upstream commit 94f94e5451697132ae1bf5381e3f03882e3ed49d ] Add better code comments as well as variable names. Also remove the code handling the GC of CiliumNodes, since that is taken care of by Kubernetes, as CiliumNodes have owner references set to the K8s Node. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 13 August 2021, 14:58:28 UTC
bcdc9e9 add MLH config for flake tracking Signed-off-by: André Martins <andre@cilium.io> 04 August 2021, 21:30:30 UTC
f607d59 hubble/recorder: Be more explicit about mutex [ upstream commit 6f9b87549f23effae024934ec3fcdec4dba95aa3 ] This commit documents which fields are now protected by the mutex in `type sink` and updates two usages accordingly, by moving channel operations out of the critical section. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 01 August 2021, 04:48:32 UTC
73fd753 hubble/recorder: Fix grpc send from concurrent go routine [ upstream commit f8ff0514cc671adfcfa31f4b1103ad32d2e4ea78 ] This commit fixes a concurrency issue in the implementation of the Hubble Recorder API. Before this commit, we were sending responses to the client from both the main `Record` function, as well as the `watchRecording` function which was spawned in a separate go routine. However, sending to a grpc.ServerStream from multiple go routines is _not_ safe: https://pkg.go.dev/google.golang.org/grpc#ServerStream It is however safe to have one go routine receive from, and another go routine send to the stream. Therefore, this commit restructures the Hubble Recorder API in such a way that only the `Record` stub ever sends back messages to the client. Receiving is done in a separate go routine which forwards all received messages into a channel, allowing us to select on incoming responses. In addition, this commit hopefully also makes the logic a bit easier to read, as it tries to separate the cleanup of resources and communicating with the client more explicitly. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 01 August 2021, 04:48:32 UTC
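The single-sender pattern described in the commit above can be sketched in isolation. This is a minimal, hypothetical illustration (the `sendAll` function and message names are invented for the example, not taken from the Cilium code): a producer goroutine never touches the stream itself; it only forwards into a channel, and one loop performs every "send", mirroring the grpc.ServerStream rule that at most one goroutine may call Send.

```go
package main

import "fmt"

// sendAll demonstrates the single-sender pattern: the producer
// goroutine forwards messages into respCh instead of writing to the
// stream directly, and only this function's select loop performs the
// "send" (modeled here as appending to the returned slice).
func sendAll(msgs []string) []string {
	respCh := make(chan string)
	done := make(chan struct{})

	// Producer goroutine: never calls Send itself.
	go func() {
		for _, m := range msgs {
			respCh <- m // unbuffered: blocks until the sender loop receives
		}
		close(done)
	}()

	var sent []string // stands in for calls to stream.Send
	for {
		select {
		case m := <-respCh:
			sent = append(sent, m)
		case <-done:
			// done is only closed after every send on respCh has been
			// received, so no messages are lost here.
			return sent
		}
	}
}

func main() {
	sent := sendAll([]string{"rec-start", "rec-stats", "rec-stop"})
	fmt.Println("sent", len(sent), "responses") // prints: sent 3 responses
}
```

Because `respCh` is unbuffered, each producer send completes only once the loop has received it, so closing `done` cannot race ahead of pending messages.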
97800d5 hubble/recorder: Improve ergonomics by not closing stats channel [ upstream commit ee00c7e9ff37c4da634ad9e48dbbe31c999299ed ] Since we have introduced the `Handle.Done` channel, we do not have to signal the shutdown of the sink by closing the statistics channel anymore. Instead, consumers can now wait on the `Handle.Done` channel getting closed. While there are not many benefits in this version of the code, it will make the select statement in a subsequent commit much more readable. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 01 August 2021, 04:48:32 UTC
9d2113a hubble/recorder: Automatically unregister sink when stopped [ upstream commit 3a462198847edbae6e7876be651a7bfc3b405053 ] This commit changes the interface of sink.Dispatch from an explicit `RegisterSink`+`UnregisterSink` pair to a `StartSink` call which will unregister itself when it stops due to an error, an expired context, or an explicit stop request. This commit does not introduce any functional changes, it is purely a refactoring. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 01 August 2021, 04:48:32 UTC
d5b3a75 hubble/recorder: Stop without draining queue [ upstream commit b72eacfe58f20c45dd15cf9e99adbf0bf48857dc ] Previously, we waited for the recording queue to drain when the client requested a stop. However, because the client has no visibility into the queue (and indeed doesn't even know if there are queued records when they issue a stop request), this does not provide any value to the client. Therefore, this PR changes the semantics of a stop request by immediately initiating a shutdown, instead of waiting for the queue to drain. This ensures that the resulting recording more closely matches the observed statistics at the time when the client issued a stop request. It also simplifies the code a bit. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 01 August 2021, 04:48:32 UTC
96994a7 hubble/recorder: Allow multiple go routines to wait for sink [ upstream commit eccc712d1e9f1895f23326457bbca956c9d27de7 ] This commit splits the sink's `done chan error` channel (which only allows a single consumer to wait on the sink to finish) into a `chan struct{}` channel and a `lastError error` variable. This enables us to signal that the sink has finished by closing the channel instead of sending a value over it. Closing the channel allows multiple go routines to block on this event via `<-s.done`. The final error value can then be retrieved via `s.err()`. This pattern is very similar to how `context.Context` works. This commit does not yet make use of this functionality. The changes enabled by this will follow in a subsequent commit. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 01 August 2021, 04:48:32 UTC
117c3c5 backporting: Detect only one related commit [ upstream commit 9abbbbfef0431f0e54dbe863648329c7bf138ee4 ] Recently, the check-stable script has suggested every single possible match for commits where the name does not uniquely identify the commit. This can be a bit confusing to backporters since it looks like there are many commits to backport as part of this PR, but the second and later ones are not necessary to backport. * PR: 16589 -- vagrant: Bump all Vagrant box versions (@pchaigno) -- https://github.com/cilium/cilium/pull/16589 Merge with 1 commit(s) merged at: Tue, 22 Jun 2021 12:36:17 -0700! Branch: master (!) refs/pull/16589/head ---------- ------------------- v (start) | edf76fb1ef6b58d5ef90b439d54134f314ed086e 5bef5d77137a9ecc5d3f2b72149307ffdd52cd42 4dc60e6faf654d7424ee959867a774205b3fed13 816b3231cdbc39f4bcdd3e6f5b40a056459a478c 51826b31087496d108044f3bffbf304580fffb4a df8238d451d755d5be75e202be89b4f88067c77b a4e7bc6c1f0e96078793458b6719b9a3999b89db via fb723f8133c40faa068a5a401f594622668b2753 ("vagrant: Bump all Vagrant box versions") v (end) Probably within the last year of commits, we should be able to correlate the exact commit that needs backporting, so iterate through those to find the exact commit. If none of those are the correct commit, fail out and push back to the backporter to figure out. This allows us to now accurately pick the correct commit in most cases: * PR: 16589 -- vagrant: Bump all Vagrant box versions (@pchaigno) -- https://github.com/cilium/cilium/pull/16589 Merge with 1 commit(s) merged at: Tue, 22 Jun 2021 12:36:17 -0700! Branch: master (!) refs/pull/16589/head ---------- ------------------- v (start) | edf76fb1ef6b58d5ef90b439d54134f314ed086e via fb723f8133c40faa068a5a401f594622668b2753 ("vagrant: Bump all Vagrant box versions") v (end) Manually tested by substituting a known commit into 'related_commits', and by checking the current v1.8 backports which includes an ambiguous commit due to a revert+reapply in the master branch. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 01 August 2021, 04:48:32 UTC