https://github.com/cilium/cilium

faa658a Prepare for release v1.7.10 Signed-off-by: Joe Stringer <joe@cilium.io> 30 September 2020, 23:05:17 UTC
2c3f4d7 envoy: Stop using deprecated filter names Stop using deprecated Envoy filter names in order to get rid of deprecation warning logs. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 30 September 2020, 17:38:45 UTC
863a98f Envoy: Update to release 1.14.5 [ upstream commit 3cf224e99523273d01835fb83876d92765c90b6b ] Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@cilium.io> 30 September 2020, 17:38:45 UTC
0e9b244 envoy: Require Node only on the first request of a stream [ upstream commit 89f654d44351aa3fad671ee298f5beaadd2e5704 ] xDS request's Node field has grown huge and Envoy optionally only sends it in the first request. Use this option and adapt xDS stream processing accordingly. Partially also: [ upstream commit df0c9bbb1f3d25c1c1a534f66d28083e741354aa ] Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 30 September 2020, 17:38:45 UTC
cd5c6c5 k8s: Remove CRD deleting functionality [ upstream commit 5c6aad6d5c78d80283c8d5b0238c180576ae1416 ] This commit removes the ability to delete CRDs from Cilium because that would delete all the CRs in the cluster. Follow-up from: https://github.com/cilium/cilium/pull/11477#discussion_r487816729 Updates: https://github.com/cilium/cilium/issues/12737 Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 29 September 2020, 21:54:19 UTC
aa927f9 iptables: comment on xt_connmark requirement for EKS rules [ upstream commit 815be6aa8d1660b37fd33c492ec8db39b07f5d78 ] EKS requires some specific rules for asymmetric routing with multi-node NodePort traffic. These rules rely on the xt_connmark kernel module, which is usually loaded by iptables when necessary. The rules are installed when the selected IPAM is ENI, meaning they are installed on AWS (but not only EKS). The xt_connmark module should be loaded in a similar way, unless loading modules after boot has been disabled, in which case the setup fails and the agent crashes. Add a comment to at least help debug the issue. Longer term, we may want to add more explicit hints to the logs if too many users hit the issue, but that would require parsing iptables' output for the specific error, so let's see how it goes with a simple comment in the code for now. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 29 September 2020, 21:54:19 UTC
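For illustration only (not part of the commit, which merely adds a code comment): a minimal Go sketch of how an agent could check whether xt_connmark is loaded and log a hint before installing CONNMARK rules. The function name is hypothetical; /proc/modules only lists loadable modules, so a negative result is a hint, not proof.

package main

import (
	"bufio"
	"log"
	"os"
	"strings"
)

// xtConnmarkLoaded reports whether the xt_connmark kernel module appears
// in /proc/modules. Built-in modules do not show up here, so treat a
// negative result as a hint only.
func xtConnmarkLoaded() bool {
	f, err := os.Open("/proc/modules")
	if err != nil {
		return false
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "xt_connmark ") {
			return true
		}
	}
	return false
}

func main() {
	if !xtConnmarkLoaded() {
		log.Println("hint: iptables -j CONNMARK rules require xt_connmark; " +
			"if module auto-loading is disabled, load it manually before starting the agent")
	}
}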
a0d754b iptables, loader: use interface with default route for EKS rules [ upstream commit a301853023ef79b36649dc985e0e4b0d9db0edbe ] Multi-node NodePort traffic on EKS needs specific rules regarding asymmetric routing. These rules were implemented for the eth0 interface (by name), because this is what EKS uses with the default Amazon Linux 2 distribution. But EKS can also run with Ubuntu, for example, and the name of the interface is not the same in that case. Instead of "eth0", use the interface with the default route. This is a quick fix, and longer term we want to add the rules to all relevant interfaces, as discussed in #12770. Fixes: #12770 Fixes: #13143 Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 29 September 2020, 21:54:19 UTC
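A minimal sketch of the general technique (finding the interface that carries the IPv4 default route) using the vishvananda/netlink library that Cilium vendors. The helper name is made up and this is not the actual loader code.

package main

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// defaultRouteIface returns the name of the interface carrying the IPv4
// default route, instead of hard-coding "eth0".
func defaultRouteIface() (string, error) {
	routes, err := netlink.RouteList(nil, netlink.FAMILY_V4)
	if err != nil {
		return "", err
	}
	for _, r := range routes {
		if r.Dst != nil {
			if ones, _ := r.Dst.Mask.Size(); ones != 0 {
				continue // has a destination prefix, not the default route
			}
		}
		link, err := netlink.LinkByIndex(r.LinkIndex)
		if err != nil {
			return "", err
		}
		return link.Attrs().Name, nil
	}
	return "", fmt.Errorf("no IPv4 default route found")
}

func main() {
	name, err := defaultRouteIface()
	fmt.Println(name, err)
}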
2a0308c iptables, loader: skip rules for EKS asymmetric routing if !IPv4 [ upstream commit 09e9a469f667b877bdcedbe4137d8919c7420aa5 ] EKS needs some specific rules for asymmetric routing with multi-node NodePort traffic. These rules are implemented only for IPv4, so we can avoid installing them when IPv4 is disabled. This is what this commit does. Note that this check is, in fact, not necessary at the moment, because as the config package says: "IPv6 cannot be enabled in ENI IPAM mode". So we always run with IPv4. But let's have it for good measure, to avoid issues if IPv6 support comes in the future. For the same reason, we also do not have to implement equivalent rules for IPv6 at the moment. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 29 September 2020, 21:54:19 UTC
1c4c3c1 loader: move ENI rules for asymmetric routing to dedicated function [ upstream commit 01f8dcc51c84e1cab269f84e782e09d8261ac495 ] EKS needs some specific rules for NodePort traffic (see PR #12770, or comments in the code, for details). Part of these rules was added directly in the body of the Reinitialize() function in the loader. To make them easier to maintain or extend, let's move them to a dedicated function called by Reinitialize(). No functional change. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 29 September 2020, 21:54:19 UTC
de660c3 cilium: encrypt-node creates two IPsec tunnels but only uses one [ upstream commit 86858947d0e52d48c0b31fd496467b07bbed3c79 ] When we enable encrypt-node to encrypt all host traffic instead of just the pod traffic we set up two tunnels. The first tunnel is the typical one used in the case where encrypt-node is not set. This tunnel uses the cilium_host ip address range. The second tunnel uses the node's IP per the encrypt-interface parameter or, in the auto-discovered case, the IP found after inspecting the route tables. On new kernels, having duplicate entries in the 'ip xfrm policy' table that match multiple states appears to be causing packets to be dropped. This was working on 4.19 kernels, but broke on upgrade to 5.4 kernels. We have not found the exact patch nor fixed the regression. But it is actually better/simpler to only have a single tunnel running in the encrypt-node case. This means fewer xfrm rules and is much easier to understand when only one rule can match a packet. Before this patch the xfrm policy/state looked like this with states and policies for both the 44.* xfrm tunnel and the 10.* xfrm tunnel.
# ip x s src 0.0.0.0 dst 44.0.0.20 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0xd00/0xf00 output-mark 0xd00 aead rfc4106(gcm(aes)) 0xd0561beea6ab84e073edf1e76c49e9c0c6531d52 128 anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 src 0.0.0.0 dst 10.0.0.200 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0xd00/0xf00 output-mark 0xd00 aead rfc4106(gcm(aes)) 0xd0561beea6ab84e073edf1e76c49e9c0c6531d52 128 anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 src 44.0.0.20 dst 44.0.0.10 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x3e00/0xff00 output-mark 0xe00 aead rfc4106(gcm(aes)) 0xd0561beea6ab84e073edf1e76c49e9c0c6531d52 128 anti-replay context: seq 0x0, oseq 0x25d, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 src 10.0.0.200 dst 10.0.1.135 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x3e00/0xff00 output-mark 0xe00 aead rfc4106(gcm(aes)) 0xd0561beea6ab84e073edf1e76c49e9c0c6531d52 128 anti-replay context: seq 0x0, oseq 0x589, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0
# ip x p src 0.0.0.0/0 dst 44.0.0.20/32 dir fwd priority 0 mark 0xd00/0xf00 tmpl src 0.0.0.0 dst 44.0.0.20 proto esp reqid 1 mode tunnel src 0.0.0.0/0 dst 44.0.0.20/32 dir in priority 0 mark 0xd00/0xf00 tmpl src 0.0.0.0 dst 44.0.0.20 proto esp reqid 1 mode tunnel src 0.0.0.0/0 dst 10.0.0.0/24 dir fwd priority 0 mark 0xd00/0xf00 tmpl src 0.0.0.0 dst 10.0.0.200 proto esp reqid 1 mode tunnel src 0.0.0.0/0 dst 10.0.0.0/24 dir in priority 0 mark 0xd00/0xf00 tmpl src 0.0.0.0 dst 10.0.0.200 proto esp reqid 1 mode tunnel src 0.0.0.0/0 dst 10.0.1.0/24 dir out priority 0 mark 0x3e00/0xff00 tmpl src 44.0.0.20 dst 44.0.0.10 proto esp spi 0x00000003 reqid 1 mode tunnel src 0.0.0.0/0 dst 44.0.0.10/32 dir out priority 0 mark 0x3e00/0xff00 tmpl src 44.0.0.20 dst 44.0.0.10 proto esp spi 0x00000003 reqid 1 mode tunnel src 10.0.0.0/24 dst 10.0.1.0/24 dir out priority 0 mark 0x3e00/0xff00 tmpl src 10.0.0.200 dst 10.0.1.135 proto esp spi 0x00000003 reqid 1 mode tunnel
Now we get only the single 44.* xfrm tunnel:
# ip x s src 0.0.0.0 dst 44.0.0.20 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0xd00/0xf00 output-mark 0xd00 aead rfc4106(gcm(aes)) 0xd0561beea6ab84e073edf1e76c49e9c0c6531d52 128 anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0 src 44.0.0.20 dst 44.0.0.10 proto esp spi 0x00000003 reqid 1 mode tunnel replay-window 0 mark 0x3e00/0xff00 output-mark 0xe00 aead rfc4106(gcm(aes)) 0xd0561beea6ab84e073edf1e76c49e9c0c6531d52 128 anti-replay context: seq 0x0, oseq 0x4423, bitmap 0x00000000 sel src 0.0.0.0/0 dst 0.0.0.0/0
# ip x p src 0.0.0.0/0 dst 44.0.0.20/32 dir fwd priority 0 mark 0xd00/0xf00 tmpl src 0.0.0.0 dst 44.0.0.20 proto esp reqid 1 mode tunnel src 0.0.0.0/0 dst 44.0.0.20/32 dir in priority 0 mark 0xd00/0xf00 tmpl src 0.0.0.0 dst 44.0.0.20 proto esp reqid 1 mode tunnel src 0.0.0.0/0 dst 10.0.1.0/24 dir out priority 0 mark 0x3e00/0xff00 tmpl src 44.0.0.20 dst 44.0.0.10 proto esp spi 0x00000003 reqid 1 mode tunnel src 0.0.0.0/0 dst 44.0.0.10/32 dir out priority 0 mark 0x3e00/0xff00 tmpl src 44.0.0.20 dst 44.0.0.10 proto esp spi 0x00000003 reqid 1 mode tunnel
There are two key pieces here. First, pod traffic is encrypted by the policy with dst 10.0.1.0/24, which is the peer node's pod IP range. If we had multiple nodes we would have multiple entries here. Second, because we now always encrypt with the 44.* IP tunnel, we can eliminate a set of state rules. Considering this is both cleaner and will work on all kernels regardless of a fix for the kernel regression, let's use this for encrypt-node use cases instead. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 29 September 2020, 21:54:19 UTC
fb10a31 identity: Avoid kvstore lookup for local identities [ upstream commit f3424a3690b4aca315a7b5e9c40b5fa8ed270e36 ] Reserved and CIDR identities are local to the agent and not stored in the kvstore. This commit changes the identity cache to avoid performing a kvstore lookup for CIDR entries (which is currently done when a CIDR identity is released). This is a follow-up to https://github.com/cilium/cilium/pull/13205#discussion_r490306171 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 29 September 2020, 21:31:11 UTC
4bae20d operator: fix operator behaviour when kube-apiserver is down [ upstream commit 30835c7a69dc863076d24708ad004d8ccd1f9f7f ] * This commit fixes an issue in the Cilium dev environment wherein, if kube-apiserver is stopped, cilium-operator does not restart after losing leader election. This was happening because we were returning exit code 0 on losing leader election, which caused systemd not to restart cilium-operator since that exit is not regarded as a failure. This was working fine with the Kubernetes deployment of the operator, as its restart policy is set to always. * One edge case is also fixed: we now exit if an error is returned when updating K8s capabilities. Earlier this could lead to inconsistent behaviour in the cluster, as capabilities could be misinterpreted if kube-apiserver was down. Fixes: df90c99905ad "operator: support HA mode for operator using k8s leaderelection library" Fixes #13185 Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 24 September 2020, 09:28:31 UTC
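A hedged sketch of the relevant pattern with client-go's leaderelection callbacks: exiting with a non-zero code when leadership is lost so that systemd (Restart=on-failure) or a Kubernetes restart policy restarts the process. The function name is invented; this is not the actual operator code.

package main

import (
	"context"
	"os"

	"k8s.io/client-go/tools/leaderelection"
)

// operatorCallbacks returns callbacks that would be plugged into a
// leaderelection.LeaderElectionConfig elsewhere.
func operatorCallbacks(run func(ctx context.Context)) leaderelection.LeaderCallbacks {
	return leaderelection.LeaderCallbacks{
		OnStartedLeading: run,
		OnStoppedLeading: func() {
			// Exiting with code 0 here is treated by systemd as a clean stop,
			// so the unit is never restarted; exit non-zero instead so that
			// Restart=on-failure (or the pod restart policy) kicks in.
			os.Exit(1)
		},
		OnNewLeader: func(identity string) {},
	}
}

func main() {
	cb := operatorCallbacks(func(ctx context.Context) { /* operator work while leading */ })
	_ = cb
}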
b8d50d9 doc: typo fix in gettingstarted clustermesh [ upstream commit e2a935d06fb0883e42d09bcbb0c65b9b8306771a ] Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 24 September 2020, 09:28:31 UTC
153dccd docs: Fix cilium-operator cmdref This was missed during an earlier backport, fix it up. Fixes: 8c23646845d1 ("operator: Encase CNP status handover in CLI option") Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
3b91443 identity: Perform identity lookup in remote kvstore caches too [ upstream commit 72f6685ddb0b69b312cca9652147c16484229f11 ] When looking up a security identity (either by its numeric id or by labels) in user-space (e.g. in Hubble or the API), we want to ensure that identities owned by remote clusters in a cluster mesh are included too. Before this commit, the `GetIdentities` function of the identity allocator (e.g. used for `cilium identity list`) would return all global identities (i.e. including the ones from remote clusters as well), while `LookupIdentity{,ByID}` would only return identities found in the main kvstore, ignoring any identities cached from remote kvstores. This fixes multiple missed annotations which can occur in cluster-mesh setups: - Hubble failed to annotate identities from remote clusters (#13076). - While the API would list remote identities in `/v1/identities`, performing a lookup on identities from remote clusters via API would fail with a "not found" error. This 404 could be observed in `cilium identity get <remote-id>` or in `cilium bpf policy get`. - DNS proxy logrecords would not have the destination endpoint labels populated. - The `CiliumEndpoint.Status.Policy` CRD field would not contain labels for identities for remote clusters (if CEP status updates were enabled). Fixes: #13076 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
2d56d0b endpoint: Update proxy policy after adding redirects [ upstream commit 407c4157e0eafd95316b1ba9b1f7ca2bf40bade5 ] Proxy policy must be updated after redirects have been added, as waiting for ACKs for the current version from the proxy is disabled when there are no redirects (yet). This leads to the Cilium API updating the realized policy too soon, and may lead to test flakes due to traffic being denied because policy has not been initialized yet. Found in local testing with dev VM proxy smoke test. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
8fd63f5 service: Add unit test for HealthCheckNodePort=false [ upstream commit 584318fb77ecb8d09137dcb2891a772cb970c1e2 ] [ Backporter's notes: pkg/service/service_test.go had a few conflicts: * v1.7 UpsertService() takes all parameters directly * GetServiceNameByAddr() doesn't exist * TestHealthCheckNodePort doesn't create a service with id2 Resolve by reworking the code against the v1.7 version to avoid backporting large refactoring commits. ] When HealthCheckNodePort is false, then the service health server will be nil. This commit adds a unit test to ensure that the code does not crash if the service health server is nil. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
f8d9a68 service: Fix panic when restoring services with enable-health-check-nodeport: false [ upstream commit ac2d4525c66785e55fa9b3fc5c142b680e494761 ] [ Backporter's notes: Minor conflict due to lack of backend filter. Dropped the backend filter check. ] We do not instantiate the service health server if HealthCheckNodePort support is disabled in our kube-proxy replacement. This means that we need to check the `EnableHealthCheckNodePort` flag before accessing the service health server to avoid a nil pointer panic. Fixes: edff374340cb ("agent: Add an option to cilium-agent for disabling 'HealthCheckNodePort'") Fixes: #13178 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
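A minimal sketch of the nil-guard described above. All names here (serviceManager, healthServer, enableHealthCheckNodePort) are placeholders, not the actual Cilium symbols.

package main

import "fmt"

type healthServer struct{}

func (h *healthServer) UpsertService(name string) { fmt.Println("upsert", name) }

type serviceManager struct {
	enableHealthCheckNodePort bool
	health                    *healthServer // nil when the feature is disabled
}

func (s *serviceManager) upsert(name string) {
	// Guard every access: with enable-health-check-nodeport=false the health
	// server is never instantiated, so dereferencing it would panic.
	if s.enableHealthCheckNodePort && s.health != nil {
		s.health.UpsertService(name)
	}
}

func main() {
	(&serviceManager{}).upsert("kube-dns")
}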
b13324e contrib: Add release helper scripts [ upstream commit 5388518ca5495d685ea5202694d2b7614f6063e8 ] [ Backporter's notes: Dropped docs, gh templates from commit ] Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
42d51b9 operator: initialize gops in RootCmd Run func [ upstream commit 619294fef138d4b2e0071dbff088656b28724aaa ] [ Backporter's notes: Minor conflict due to no cmdref call in v1.7 version. ] This fixes an issue that `cilium-operator --help` exits with an error "unable to start gops..." instead of displaying the help message. This is similar to what commit 241c069d3f95 ("daemon/cmd: initialize gops in RootCmd execution function") did for the same problem in cilium-agent. Reported-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
b90af91 daemon, operator, test: log gops start failure to stderr instead of stdout [ upstream commit d570daaf06075d1d599bdbd061d22a3be7a08bbc ] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
c9a3abc changing operator ID and identity [ upstream commit 3d6cf1b1be6bfc1736ba5b09cedf82fd1765193e ] Signed-off-by: Gaurav Yadav <gaurav.dev.iiitm@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
bb0cd89 adding info message in log [ upstream commit 4c68f8adc22b71fb1869629c0d541ee097a058f4 ] Signed-off-by: Gaurav Yadav <gaurav.dev.iiitm@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
0fbbfd9 adding logs in a structured way [ upstream commit b10fd49501a88cf197907989cfd579863417bd75 ] Signed-off-by: Gaurav Yadav <gaurav.dev.iiitm@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
558a01b daemon/cmd: initialize gops in RootCmd execution function [ upstream commit 241c069d3f951bd52c29ec85ebeedd49ce3203b7 ] [ Backporter's notes: Minor conflict due to cmdref generation not being present in daemon on v1.7.x. ] This commit fixes a bug introduced in 299629d932d8 wherein `cilium-agent --help` exits with an error (unable to start gops...). This happens because the user started the cilium-agent process with root privileges, which created the directory `/home/vagrant/.config/gops` owned by the user root. Any further attempt to run cilium-agent without root privileges will print out this error. The commit fixes running commands like `cilium-agent --help` or `cilium-agent --cmdref <dir>` without root privileges by moving gops initialization to a later stage in the agent. Fixes: 299629d932d8 "daemon: open socket in agent for gops stack traces" Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 September 2020, 08:43:05 UTC
c22283b test: update k8s test versions to 1.16.15 and 1.17.12 Also update Kubernetes libraries to 1.17.12 Signed-off-by: André Martins <andre@cilium.io> 17 September 2020, 20:01:15 UTC
8c23646 operator: Encase CNP status handover in CLI option [ upstream commit d1c6a750dabe799fad6bbdc70218d8ab4351c083 ] The code for handling CNP status updates from other nodes via the kvstore was previously not covered by the same option that enables this functionality in the cilium-agent daemon. As such, this could cause the logic to run, including goroutines for each CNP, in scenarios where this logic is not in use. Improve memory usage by not running this logic when the option is disabled. Co-authored-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: André Martins <andre@cilium.io> 10 September 2020, 21:30:45 UTC
84b9b2d docs: add wildcard from endpoints clusterwide policy example [ upstream commit 2231af01774bdf0399f6a409d8d466c4339df87f ] Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
0534d97 cilium/preflight: add check for empty to/fromEndpoint in CCNP validation [ upstream commit 91a6fc482e9b1e815de1ebe4bbb995de87de0078 ] * This commit extends the cilium preflight validate-cnp check. When validating CCNP it checks if there is an empty to/from endpoint selector in the rules and warns about the problem and a possible fix. * This is to help users with upgrade scenarios when using Cilium. For a more detailed discussion on the problem, see issue - https://github.com/cilium/cilium/issues/12844 Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
c5afae6 pkg/k8s: add unit tests for empty to/fromEndpoint with ccnp [ upstream commit abe5e24849a9a81767ee675a7a179f12e10d3793 ] Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
7d9905c pkg/policy: fix endpoint selection for wildcard to/fromEndpoint in CCNP [ upstream commit 905b8d41d09caa387e8a54f041d02d2b47ade968 ] * This commit fixes an issue in endpoint selection when we provide a wildcard for to/fromEndpoint in CCNP. When a wildcard is provided in the CCNP fromEndpoint selector, we end up with a truly empty endpoint selector, which results in allowing all traffic. The commit restricts this to only include endpoints that are managed by Cilium by checking for the presence of the namespace label on the endpoint. * For a more detailed explanation of the approach and the issue, take a look at the discussion following this GitHub comment - https://github.com/cilium/cilium/pull/12890#issuecomment-674358012 Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
3236442 test: Detect missed tail calls on upgrade/downgrade test [ upstream commit e9b3844f57fc13a870d113a568153867d4c4c464 ] Connectivity disruptions caused by missed tail calls were recently reported at #13015. It was caused by an incorrect handling of a call map rename. We didn't detect it because we don't have code to specifically detect missed tail calls during the upgrade/downgrade test; the test only fails if the connectivity is broken during a long enough period. This commit adds a new function to retrieve the sum of 'Missed tail calls' metrics across all Cilium pods. It is then used in the test after both the upgrade and the subsequent downgrade to check that no drops due to missed tail calls happened. This new test was tested by: - backporting to v1.8 and checking that missed tail calls are indeed detected. - backporting the fixes on the v1.7 (#13052) and v1.8 (#13051) branches and checking that no more missed tail calls were detected. We need to wait for both #13052 and #13051 to be merged and backported before we can backport this test to v1.7 and v1.8, as it will otherwise fail. Related: #13015, #13051, #13052 Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
b513e35 policy/api: update toServices docs to include current support status [ upstream commit 09b3df13a5f3e7232075bff6b33fd3b885726301 ] Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
e43cf00 policy/groups: add unit tests for CCNP toGroups derivative policies [ upstream commit ff997740c46443bd6a6cc17365878ce42554c6cd ] Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
9dfa1b1 pkg/policy: fix toGroups derivative policy for clusterwide policies [ upstream commit 5519f2bab286143670cf985aa7046c492d77be94 ] * This commit fixes an inherent issue with CCNP where, if toGroups is specified in a clusterwide policy, it has no effect in creating a derived policy. The reason is that we handle CCNP similarly to CNP, as both are converted into SlimCNPs and processed in a similar way. For CCNP, creation of the derived policy fails because we try to update its status as a CNP, which is not possible. * This commit introduces a fix by checking the namespace field in the converted SlimCNP and handling the CCNP case in a manner suited to that type. Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 10 September 2020, 05:58:12 UTC
08a4294 daemon: Fix handling of policy call map on downgrades The name of the policy call map on the bpffs was changed between 1.7 and 1.8 by commit 5d6b669. Commit 6bada02 then added code to delete the old map name on initialization of the agent. We cannot simply delete the old policy call map because it might still be used by endpoints (until they are regenerated) and the base programs (until they are reloaded). However, if we rename the map in the bpffs, it shouldn't affect BPF programs that are using it and they will then pick up the new name on reload. The reverse renaming operation is needed in 1.7 to allow for downgrades from 1.8. Fixes: 5d6b669 ("maps/policymap: Rename policy call map to clarify intent") Fixes: 6bada02 ("daemon: Remove old policy call map") Signed-off-by: Paul Chaignon <paul@cilium.io> 07 September 2020, 17:56:36 UTC
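A hedged sketch of the rename-instead-of-delete approach described above: renaming a pinned map on bpffs keeps the underlying BPF map (and its existing users) intact, so endpoints and base programs keep a valid reference until they are reloaded. The paths and map names below are illustrative assumptions, not the exact names used by the daemon.

package main

import (
	"log"
	"os"
)

// renamePinnedMap renames a pinned bpffs entry if the old pin exists and the
// new one does not. bpffs supports rename within the same mount.
func renamePinnedMap(oldPath, newPath string) error {
	if _, err := os.Stat(newPath); err == nil {
		return nil // already renamed
	}
	if _, err := os.Stat(oldPath); os.IsNotExist(err) {
		return nil // nothing to do
	}
	return os.Rename(oldPath, newPath)
}

func main() {
	// Assumed example paths; the real 1.7/1.8 policy call map names differ.
	if err := renamePinnedMap(
		"/sys/fs/bpf/tc/globals/cilium_call_policy",
		"/sys/fs/bpf/tc/globals/cilium_policy",
	); err != nil {
		log.Fatal(err)
	}
}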
8c62494 Update kops installation documentation [ upstream commit 06dbc52d6edb6a0d2f24b150e96c513f5815ae0e ] Removes references to things like CoreOS and old etcd versions. Also added some further reading links for those who want to configure cilium on kops further. Signed-off-by: Ole Markus With <o.with@sportradar.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 04 September 2020, 21:31:46 UTC
ea83d37 docs: Cosmetic format fix [ upstream commit beb401bf019839f6aba0d6abfdfab6bb22320d7d ] Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 04 September 2020, 21:31:46 UTC
20b1ced docs: Fix inconsistent title formats [ upstream commit 5d9139f014ff5230df726e7044c2c12fffe189e6 ] Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> Signed-off-by: Paul Chaignon <paul@cilium.io> 04 September 2020, 21:31:46 UTC
9a3bd9c docs: Replace references to old demo app [ upstream commit e166ae0d417fcfd01c2df01ad3f501ea0dfa5f84 ] Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 04 September 2020, 21:31:46 UTC
3ece8b8 test: Fix GuestBook test [ upstream commit 8aba1c195ec96f8696ef354c65874be8e6b7d088 ] ae9e4be updated the GuestBook images and labels, but failed to make the same label update in the test itself. Thus, since then, we have not been running any connectivity check in the GuestBook test. That went unnoticed because we didn't check that the set of pods returned (from which we run connectivity checks) was not empty. This commit fixes it by: 1. Updating the label in the test itself to app=guestbook. 2. Adding a check that the set of pods selected isn't empty. However, the nc utility we were using to check connectivity from the frontend pods to the Redis backend isn't available in the new images. Therefore, we also need to: 3. Use curl instead inside the frontend pods to check that the PHP frontend works as expected and is able to contact the Redis backend. That's it? No. Turns out some of the pod labels and names have also been hardcoded in the Docker images and have been updated (mostly to use more neutral terms). 4. Update the YAML file to better match [1]. We however can't update the 'redis-master' name because our v6 frontend image has it hardcoded. The v5 frontend image at [1] has 'redis-leader' as the name, but somehow not the v6. We want to use the v6 image because it is a lot bigger (cf. dffb450fe7). 5. And finally, Bob's our uncle! 1 - https://cloud.google.com/kubernetes-engine/docs/tutorials/guestbook Fixes: #12994 Fixes: ae9e4be ("test: replace guestbook test docker image") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 04 September 2020, 21:31:46 UTC
e7ae390 Prepare for release v1.7.9 Signed-off-by: Joe Stringer <joe@cilium.io> 02 September 2020, 03:24:30 UTC
957c4c5 pkg/kvstore: set endpoint shuffle in etcd client connectivity [ upstream commit 642a2e1f516bb2ba423cde4c083668c89b757533 ] The endpoint shuffle was happening before loading the etcd configuration. To have the endpoints shuffled we should do it after loading the configuration from disk. Fixes: b95650b30b46 ("etcd: Shuffle list of etcd endpoints") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 02 September 2020, 01:33:44 UTC
5e82934 docs: Mention L7 limitation in Calico chaining GSG [ upstream commit 40a30cfc928833d5227f7c2097503999ce58b612 ] Several users have reported issues with L7 policies when running Cilium in chaining configuration on top of Calico. The long-term solution to this issue is well known (BPF TPROXY), but we should add a note to the documentation in the meantime to warn users. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 02 September 2020, 01:33:44 UTC
9ee7668 daemon: Add --transmit-host-src option By default to facilitate smooth upgrade from v1.7.x to v1.7.y with --enable-remote-node-identity=false, this option ensures that v1.7.9 or later will transmit traffic from the local node towards other nodes with the "remote-node" identity, as v1.7.7 or earlier did. This flag is the third in a trifecta of flags used to control how traffic is transmitted and received between Cilium nodes: * --allow-host-src: Allows traffic from nodes that report the traffic as from the host identity, such as v1.6.x or earlier. * --transmit-host-src: Sends traffic towards other nodes with the source identity host, to ensure that 1.6.x nodes will accept traffic from this node during transition. * --enable-remote-node-identity: Treat network policy that allows from host identity to also allow from remote-node identity. When performing an upgrade from v1.6.x to v1.7.x, the following steps should be taken to prevent datapath outage: * Upgrade to Cilium v1.7.x with --transmit-host-src=true, --allow-host-src=true and --enable-remote-node-identity=false. This ensures that traffic from v1.6.x with source identity host will be accepted by v1.7.x nodes; v1.7.x nodes will transmit with source identity host so that v1.6.x nodes will accept the traffic; and that no network policy change is required to support this transition. * Reconfigure Cilium v1.7.x to disable --transmit-host-src. Traffic will still be accepted from other v1.7.x nodes if sent from the host identity, but after this configuration step all nodes will transmit the remote-node identity. * Reconfigure Cilium v1.7.x to disable --allow-host-src. After the previous step, no node should be transmitting the source identity so there is no need to accept traffic from remote nodes with this src. * Adapt network policy to ensure that traffic from remote nodes is explicitly allowed in the policy, eg via "fromEntities" policy that matches on the remote-node identity. * Finally, configure --enable-remote-node-identity=true. This will mean that traffic from remote nodes is not automatically allowed unless explicitly declared in the policy from the previous step. Signed-off-by: Joe Stringer <joe@cilium.io> 02 September 2020, 01:32:05 UTC
1749f6c daemon: Default --allow-host-src to false If the user specifies --enable-remote-node-identity=false, previously we would default --allow-host-src to true, which had the side effect of also transmitting using the source identity "host". In this mode, during upgrade from v1.7.7 to v1.7.8, as new v1.7.8 nodes came online and began to send packets to v1.7.7 nodes with source identity "host", the v1.7.7 nodes would drop the traffic. Remove this behaviour to fix upgrades from v1.7.7 to v1.7.9. Signed-off-by: Joe Stringer <joe@cilium.io> 02 September 2020, 01:32:05 UTC
d7ca75c doc: Document seamless upgrade process for remote-node identity change The upgrade instructions have been incomplete and did not allow for a seamless upgrade from 1.6.x to 1.7.0. Signed-off-by: Thomas Graf <thomas@cilium.io> 02 September 2020, 00:36:08 UTC
b5bc930 Prepare for release v1.7.8 Signed-off-by: Joe Stringer <joe@cilium.io> 28 August 2020, 20:34:55 UTC
8d9e532 daemon: Add hidden allow-remote-src flag This flag allows overriding the default setting for whether to accept traffic into a node if the source node reports the source identity as "host". Default is "auto" - allow if enableRemoteNodeIdentity is disabled; deny if enableRemoteNodeIdentity is enabled. If specified directly, it will take on the configured behaviour (allow such traffic if true, otherwise drop it). Signed-off-by: Joe Stringer <joe@cilium.io> 28 August 2020, 20:10:47 UTC
830f9cd bpf: Allow from host with remote-node-identity=false According to the documentation: One can set enable-remote-node-identity=false in the ConfigMap to retain the Cilium 1.6.x behavior. The above is true when evaluating policy, but not true from the DROP_INVALID_IDENTITY perspective as during an upgrade from v1.6 to v1.7, v1.6 nodes may send traffic to v1.7 nodes with the "host" identity and the v1.7 nodes will drop such traffic with this error code. Mitigate this by also covering this datapath case with the `EnableRemoteNodeIdentity` flag check. Signed-off-by: Joe Stringer <joe@cilium.io> 28 August 2020, 20:10:47 UTC
f3e57e4 iptables, loader: add rules to ensure symmetric routing for AWS ENI traffic [ upstream commit 132088c996a59e64d8f848c88f3c0c93a654290c ] Multi-node NodePort traffic with AWS ENI needs a set of specific rules that are usually set by the AWS DaemonSet: # sysctl -w net.ipv4.conf.eth0.rp_filter=2 # iptables -t mangle -A PREROUTING -i eth0 -m comment --comment "AWS, primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-xmark 0x80/0x80 # iptables -t mangle -A PREROUTING -i eni+ -m comment --comment "AWS, primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80 # ip rule add fwmark 0x80/0x80 lookup main These rules mark packets coming from another node through eth0, and restore the mark on the return path to force a lookup into the main routing table. Without them, the "ip rules" set by the cilium-cni plugin tell the host to lookup into the table related to the VPC for which the CIDR used by the endpoint has been configured. We want to reproduce equivalent rules to ensure correct routing, or multi-node NodePort traffic will not be routed correctly. This could be observed with the pod-to-b-multi-node-nodeport pod from connectivity check never getting ready. This commit makes the loader and iptables module create the relevant rules when ENI is in use. The rules are nearly identical to those from the aws daemonset (different comments, different interface prefix for conntrack return path, explicit preference for ip rule): # sysctl -w net.ipv4.conf.<egressMasqueradeInterfaces>.rp_filter=2 # iptables -t mangle -A PREROUTING -i <egressMasqueradeInterfaces> -m comment --comment "cilium: primary ENI" -m addrtype --dst-type LOCAL --limit-iface-in -j CONNMARK --set-xmark 0x80/0x80 # iptables -t mangle -A PREROUTING -i lxc+ -m comment --comment "cilium: primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80 # ip rule add fwmark 0x80/0x80 lookup main pref 109 Fixes: #12098 Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 28 August 2020, 20:09:06 UTC
1819921 daemon: properly maintain node lists on updates [ upstream commit 5550c0f3f2206d05f3ef3af569ab756cbba94fae ] NodeAdd and NodeUpdate update the node state for clients so that the changes can be returned when a client requests them. If a node was added and then updated, its old and new version would be on the added list and its old version on the removed list. Instead, we can just update the node on the added list. Note that the setNodes() function on pkg/health/server/prober.go first deletes the removed nodes and then adds the new ones, which means that the old version of the node would be added and remain as stale on the health server. This was found during investigation of issues with inconsistent health reports when nodes are added/removed from the cluster (e.g., #11532), and it seems to fix inconsistencies observed in a small-scale test I did to reproduce the issue. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 28 August 2020, 18:26:15 UTC
93f4976 docs: limit copybutton to content area only [ upstream commit 6711a0ce13cceb217df187c492f11e7879cb3a09 ] Fixes copy button to not conflict with the search Signed-off-by: Sergey Generalov <sergey@genbit.ru> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 28 August 2020, 18:26:15 UTC
ad25469 Upgrade Cilium docs theme version [ upstream commit eeec4d0e00549a886511069ebf6784042d93550c ] Signed-off-by: Nicolas Jacques <neela@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 28 August 2020, 18:26:15 UTC
035a85b vagrant: Don't use the NFS device's IP as node IP [ upstream commit 1c37921003a824568d8165de4625c4ce390df37c ] The K8s node IP is the IP address propagated to other nodes and mapped to the REMOTE_NODE_ID in the ipcache. We therefore don't want to use the IP address of the NFS interface (enp0s9) for that. When we use that IP address, any policy using the remote-node identity (or host in case the two aren't dissociated) will fail to resolve properly. In general, I don't think K8s even needs to know about the NFS interface or its IP addresses. Fixes: 0eafea4 ("examples/kubernetes-ingress: fixing scripts to run k8s 1.8.1") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 28 August 2020, 18:26:15 UTC
2cebf91 fix: node-init should use docker if /etc/crictl.yaml not found [ upstream commit 552c823f561149213807627cfbd724c39dbd8a10 ] This script has several tests for what the container runtime situation looks like to determine how best to restart the underlying containers (going around the kubelet) so that the new networking configuration can take effect. The first test looks to see if the crictl config file is configured to use docker, but if that file doesn't exist then it fails. I believe docker is the default if this hasn't been configured at all, so if that file doesn't exist, use docker. Fixes #12850 Signed-off-by: Nathan Bird <njbird@infiniteenergy.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 28 August 2020, 01:54:54 UTC
9fdee51 etcd: Make keepalive interval and timeout configurable [ upstream commit a4a1df0289a3067e3c9913c894b322b64cc3b0e1 ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 28 August 2020, 01:54:54 UTC
a1326e5 pkg/kvstore: add gRPC keep alives for etcd connectivity [ upstream commit 268f4066e4f8d245f67d3cfc305a11d76ffffb1e ] If the client does not receive a keep alive from the server, that connection should be closed so the etcd client library does proper round robin for the other available endpoints. This might be a little bit aggressive in a larger environment if all clients perform keep alive requests to the etcd servers. Some testing could be done to verify if there is a large overhead of doing these keep alive requests. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 28 August 2020, 01:54:54 UTC
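A sketch of client-side gRPC keepalives with the etcd clientv3 API (import path depends on the etcd client version vendored). The endpoint names and durations are illustrative, not Cilium's defaults.

package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// If the server stops answering keepalive pings, the connection is torn
	// down and the client balancer can move on to another etcd endpoint.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:            []string{"https://etcd-0:2379", "https://etcd-1:2379"},
		DialKeepAliveTime:    15 * time.Second, // ping interval (example value)
		DialKeepAliveTimeout: 5 * time.Second,  // close connection if no ack (example value)
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
}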
8f5a278 datapath: Pull skb data in to-netdev path [ upstream commit 2960b5f56ad048fe04560e01349b36c2422c8afc ] It has been reported [1][2] that ICMP packets are being dropped by a receiving node due to DROP_INVALID when bpf_host was attached to the receiving iface. Further look into the issue revealed that the drops were happening because IP headers were not in the skb linear data (unsuccessful revalidate_data() caused the DROP_INVALID return). Fix this by making sure that the first invocation of revalidate_data() in the "to-netdev" path will always do skb_data_pull() before deciding that the packet is invalid. [1]: https://github.com/cilium/cilium/issues/11802 [2]: https://github.com/cilium/cilium/issues/12854 Reported-by: Andrei Kvapil <kvapss@gmail.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 28 August 2020, 01:54:54 UTC
6d2c9d1 docs/metrics: Correct label typo `equal` in metrics.rst [ upstream commit 85600be8c0d73ca564661979f37c37e63340cd2d ] This PR is to correct simple typo equal in metrics.rst Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 28 August 2020, 01:54:54 UTC
2f6f93a doc: fix the AKS installation validation Before this patch, the AKS documentation tries to find cilium in the kube-system namespace although it is installed in the cilium namespace (similar to the GKE documentation). Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 27 August 2020, 17:03:47 UTC
35f1b10 doc: Specify CILIUM_NAMESPACE for Hubble installation instruction [ upstream commit 575bff841b3f796d255e009da74944e4b7166b3a ] This makes it easier to follow the instructions, especially for GKE which uses cilium namespace instead of kube-system. Signed-off-by: Michi Mutsuzaki <michi@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> 27 August 2020, 17:03:47 UTC
88ecd7f operator: make EC2 AWS API endpoint configurable Add a new --ec2-api-endpoint operator option which allows specifying a custom AWS API endpoint for the EC2 service. One possible use-case for this is the usage of FIPS endpoints, see https://aws.amazon.com/compliance/fips/. For example, to use API endpoint ec2-fips.us-west-1.amazonaws.com, the AWS operator can be called using: cilium-operator --ec2-api-endpoint=ec2-fips.us-west-1.amazonaws.com Updates #12620 Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 18 August 2020, 07:01:58 UTC
bb0ee9c Istio: Update to release 1.5.9 [ upstream commit 8dca1c8e10138e99e664951a4d3540154bb25117 ] Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 14 August 2020, 12:46:19 UTC
34c7adf daemon: Add hidden --k8s-sync-timeout option [ upstream commit bd89e83a4245769dac42860cf928e2dd7c227ce1 ] This option governs how long Cilium agent will wait to synchronize local caches with global Kubernetes state before exiting. The default is 3 minutes. Don't expose it by default, this is for advanced tweaking. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 14 August 2020, 12:46:19 UTC
fbab570 k8s: update k8s versions to 1.17.11 Also update test versions to 1.17.11 and 1.16.14 Signed-off-by: André Martins <andre@cilium.io> 14 August 2020, 10:41:13 UTC
01b935e operator: Fix non-leader crashing with kvstore [ upstream commit 3d376fae42ab0fac43403dfec6f08fe7eecb3234 ] A non-leader operator will hang during its healthcheck report as it tries to check the status of the kvstore. The reason it hangs is because the leader operator is the only one that has access to the client. This hang causes an HTTP level timeout on the kubelet liveness check. The timeout then causes kubelet to roll the pod, eventually into CrashLoopBackOff. ``` Warning Unhealthy 8m17s (x19 over 17m) kubelet, ip-10-0-12-239.us-west-2.compute.internal Liveness probe failed: Get http://127.0.0.1:9234/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers) ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> 11 August 2020, 03:46:40 UTC
de6a5ac Update Go to 1.13.15 Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 07 August 2020, 15:35:11 UTC
5aeb7ff docs: clarify Kubernetes compatibility with Cilium [ upstream commit 038877cc5a71cb546a27d53f97d4b4fa46f592b4 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 07 August 2020, 05:41:47 UTC
492a26c docs: add current k8s network policy limitations [ upstream commit c767682be85bb96e7398aa492f538867591943b3 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 07 August 2020, 05:41:47 UTC
0cee8c8 doc: update #ebpf Slack channel name [ upstream commit 0547ea4b5c86e81bf8967ed21dd87b0007fd3d67 ] The Slack channel dedicated to discussions on eBPF and datapath has been renamed from #bpf to #eBPF (on 2020-08-03). Report this change to Cilium's documentation, and also turn "BPF" into "eBPF" on the updated page. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 07 August 2020, 05:41:47 UTC
1c83817 Makes ExecInPods not wait for pod ready during log gathering upon test failure. [ upstream commit 7986df9d25ad345be97e71a5af3e3720367c9533 ] Signed-off-by: Weilong Cui <cuiwl@google.com> Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 07 August 2020, 05:41:47 UTC
f2615d6 test: Disable K8sKubeProxyFreeMatrix [ upstream commit e09d991a1970ffd4a286382322091ffc64d40add ] The suite does not provide much value because of the following reasons: - It does not test the kube-proxy replacement from outside, so only bpf_sock is tested. - K8sServicesTest should provide the same coverage. - It takes 20min to run the suite. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 07 August 2020, 05:41:47 UTC
b67a6fe install: Default cilium-operator replicas to 1 Given that Cilium v1.7.x was not released with operator HA and we are otherwise disabling this functionality (including etcd heartbeats) by default in v1.7 branch, we should also not surprise users during the v1.7.7->v1.7.8 point release update by upgrading to find that there are more copies of the cilium-operator running in their cluster. Users can still opt in by explicitly configuring the number of replicas. Signed-off-by: Joe Stringer <joe@cilium.io> 05 August 2020, 21:47:27 UTC
da02ac8 install/kubernetes: do not schedule cilium-operator pods in same node [ upstream commit bde2daf77fa8bac84a66c803ecc18f78df779082 ] Since Cilium Operator is running in host network, two or more pods can't run on the same node at the same time or they will clash on the ports they use for liveness and / or readiness health checks. Fixes: 930bde726974 ("install: update helm templates to add HA capabilities for operator") Signed-off-by: André Martins <andre@cilium.io> 05 August 2020, 21:47:27 UTC
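A sketch of the scheduling constraint described above, expressed with the Kubernetes Go API types: never co-locate two operator pods on one node, since both would bind the same host-network health-check port. The label key/value is an assumption, not necessarily what the Helm chart uses.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	affinity := &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			// Hard requirement: pods matching the selector must not share a node.
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"io.cilium/app": "operator"}, // assumed label
				},
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
	fmt.Println(affinity.PodAntiAffinity != nil)
}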
233aeb1 test: generate cilium helm template validating against k8s cluster [ upstream commit 82cc7c3d076149e147cd120f7ee4424317a39ccd ] * Use --validate with `helm template` command to validate the generated manifest against the associated kubernetes cluster * For more information see - https://github.com/cilium/cilium/pull/12409#discussion_r453313631 Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: André Martins <andre@cilium.io> 05 August 2020, 21:47:27 UTC
b88b96e install: update helm templates to add HA capabilities for operator [ upstream commit 930bde726974320196583cde03ccf1c57af55606 ] Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: André Martins <andre@cilium.io> 05 August 2020, 21:47:27 UTC
a625703 operator: support HA mode for operator using k8s leaderelection library [ upstream commit df90c99905ad107710ce66d2dd36820f068db189 ] * Make leaderelection parameters configurable using command line flags * Update cmdref to include documentation for new flags. Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: André Martins <andre@cilium.io> 05 August 2020, 21:47:27 UTC
81c2d35 k8s: add coordinationv1 capability check to k8s version package [ upstream commit fb101dfc04ddb6277413207eb6b6580f4be82b82 ] * Introduces config option `K8sLeasesFallbackDiscoveryEnabled` to check if fallback discovery is enabled for Leases. * K8sLeasesFallbackDiscovery is enabled by default only in operator. Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: André Martins <andre@cilium.io> 05 August 2020, 21:47:27 UTC
f7431a5 vendor: vendor kubernetes leaderelection library [ upstream commit 66c3d9c4d3ef85d57914ec1a595f0226fcde7e00 ] Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> Signed-off-by: André Martins <andre@cilium.io> 05 August 2020, 21:47:27 UTC
469af4e rand: rename RandomRune* funcs to RandomString* [ upstream commit 3c9c9709cf6a82da9060fa824119c24463e18c0a ] These functions return strings, not a rune. Rename them accordingly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: André Martins <andre@cilium.io> 05 August 2020, 21:47:27 UTC
ae9f83e ctmap: Add unit test for ICMP CT/NAT GC [ upstream commit ae3917b64dd04d35c68e5746b1b670555a9f0ffe ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 05 August 2020, 21:47:27 UTC
efeec6c datapath: Fix CT tuple ports for ICMP Echo [ upstream commit c633c2d02d59cc9429800bd1912187f246eaf0ac ] Previously, when an ICMP EchoRequest was sent from one node A to another node B with Echo ID > NAT_MIN_EGRESS, the ICMP EchoReply sent from B -> A created a CT entry and NAT entries which could not be related by GC. E.g. node A (192.168.34.12) pings node B (192.168.34.11): ICMP IN 192.168.34.12:0 -> 192.168.34.11:38193 XLATE_DST 192.168.34.11:38193 Created=6292sec HostLocal=1 ICMP OUT 192.168.34.11:38193 -> 192.168.34.12:0 XLATE_SRC 192.168.34.11:38193 Created=6292sec HostLocal=1 ICMP OUT 192.168.34.11:0 -> 192.168.34.12:38193 expires=16783063 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=1 TxBytes=50 TxFlagsSeen=0x00 LastTxReport=16783005 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=0 IfIndex=0 This made the NAT entries escape the CT GC, meaning that the CT entry was removed while the NAT entries were kept, which made them stay forever until a user manually ran "cilium bpf nat flush". Fix this by setting the ICMP Echo ID in a port which belongs to the address of the local node, so that the CT GC can relate the NAT entries. In the previous example, the CT entry after the fix is the following: ICMP OUT 192.168.34.11:38193 -> 192.168.34.12:0 expires=16783063 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=1 TxBytes=50 TxFlagsSeen=0x00 LastTxReport=16783005 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=0 IfIndex=0 The fix does not change the ID placement in the port for the case when B -> A sends an ICMP EchoRequest. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 05 August 2020, 21:47:27 UTC
bd9d164 Prepare for release v1.7.7 Signed-off-by: Joe Stringer <joe@cilium.io> 05 August 2020, 01:26:41 UTC
c55a82d etcd: Fix firstSession error handling [ upstream commit 40026dbb211a43061ac8bbd9d534a3a1fa1e562f ] The commit bf8e4327448 ("etcd: Ensure that firstSession is closed") incorrectly assumed that only a single reader exists for firstSession. This is not the case and the error returned via the channel will only be read by one of the readers, the other readers will assume success and continue in their code logic even though the etcd client is being shut down. Fixes: bf8e4327448 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 05 August 2020, 01:00:07 UTC
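A sketch of the pitfall described in the commit above, not the actual etcd code: a value sent on a channel is consumed by exactly one reader, so signalling shutdown to several waiters that way silently misses all but one of them. Closing a channel, by contrast, is observed by every reader. All names here are illustrative.

package main

import (
	"fmt"
	"sync"
)

func main() {
	firstSessionDone := make(chan struct{})
	var sessionErr error
	var wg sync.WaitGroup

	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-firstSessionDone // every waiter is released by close()
			fmt.Println("reader", id, "saw:", sessionErr)
		}(i)
	}

	sessionErr = fmt.Errorf("etcd client shutting down")
	close(firstSessionDone) // broadcast: all readers observe the shutdown
	wg.Wait()
}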
cb0befd test: Validate FQDN connectivity during restart This commit reworks how the FQDN test is run. It now validates that connectivity is still available during a Cilium restart, instead of waiting until Cilium is back up. This allows validating https://github.com/cilium/cilium/pull/12718 and https://github.com/cilium/cilium/pull/12731, which improve the time it takes the proxy to respond to DNS requests. Signed-off-by: Chris Tarazi <chris@isovalent.com> 04 August 2020, 10:10:47 UTC
0f78aa0 test: Remove temporary wait for endpoints in FQDN This can be removed now that https://github.com/cilium/cilium/pull/12718 has been merged. The aforementioned PR should improve the DNS connectivity downtime when Cilium is restarting. Related: https://github.com/cilium/cilium/pull/12731 Signed-off-by: Chris Tarazi <chris@isovalent.com> 04 August 2020, 10:10:47 UTC
3ffa3c5 test: Refactor FQDN test for readability This commit splits out the function to test connectivity into components to be reused. The call sites are consolidated. Signed-off-by: Chris Tarazi <chris@isovalent.com> 04 August 2020, 10:10:47 UTC
629efeb etcd: Disable heartbeat quorum check by default Disable the heartbeat check by default, as the sudden requirement for the cilium-operator to be always available can come as a surprise to existing 1.7 users. Require etcd.enableHeartbeat=true to be set in order to enable the requirement for the heartbeat. Signed-off-by: Thomas Graf <thomas@cilium.io> 04 August 2020, 08:05:33 UTC
5c12c1a endpoint: Demote proxy stats not found warnings to debug level "Proxy stats not found when updating" warnings are currently issued if stats updates are received for a proxy redirect that can not be found. There are two common scenarios where this can happen as part of normal operation: 1. A policy change removed a proxy redirect, and stats updates from requests that were redirected to the proxy before the datapath redirect entry was removed are received. 2. DNS proxy issues stats for requests that have been forwarded on the basis of a restored DNS policy while the Endpoint policy has not yet been computed. Demote this log message to debug level to avoid these false warnings. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 August 2020, 10:23:27 UTC
d7da834 dnsproxy: Use restored Endpoints before Endpoints are available Use restored Endpoints during Cilium restart when Endpoints are not yet available. Do not error out if the destination IP cannot be found in the ipcache, but default to the WORLD destination security identity instead. This allows IP-based restored rules to be processed before the ipcache is fully updated. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 August 2020, 10:23:27 UTC
adac95f ipcache: Fix unit test flake This list was unsorted which caused random ordering at test run time. Fixes: #12733 Fixes: c3e19d8ee0f6 ("dnsproxy: Use restored rules during restart") Signed-off-by: Joe Stringer <joe@cilium.io> 01 August 2020, 00:40:20 UTC
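A tiny sketch of the usual fix for this kind of flake: impose a deterministic order on the slice before asserting on it. The values are illustrative, not from the actual unit test.

package main

import (
	"fmt"
	"sort"
)

func main() {
	got := []string{"10.0.1.0/24", "10.0.0.0/24", "192.168.0.0/16"}
	sort.Strings(got) // deterministic order regardless of map/goroutine iteration order
	fmt.Println(got)
}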
88174d7 fqdn/dnsproxy: set SO_REUSEPORT on listening socket Now that we start re-using the same port for the DNS proxy across restarts (see #12718), set the SO_REUSEPORT option on the listening socket. This gives the proxy a better chance to re-bind() upon restart. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 31 July 2020, 14:25:32 UTC
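A sketch of how SO_REUSEPORT can be set before bind() in Go using net.ListenConfig; the UDP port 10001 is just an example, not necessarily the proxy's port, and this is not the actual dnsproxy code.

package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	lc := net.ListenConfig{
		// Control runs on the raw fd after socket() but before bind(),
		// which is exactly when SO_REUSEPORT must be set.
		Control: func(network, address string, c syscall.RawConn) error {
			var soErr error
			err := c.Control(func(fd uintptr) {
				soErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			})
			if err != nil {
				return err
			}
			return soErr
		},
	}
	pc, err := lc.ListenPacket(context.Background(), "udp", ":10001")
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()
}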
a1b7232 agent: Fix bootstrap metric for kvstore [ Backporter's notes: Resolved conflict with `k8sCachesSynced` channel, which was moved to another location in the upstream commit. ] [ upstream commit 87d68ea095158dbf347bdd5ea6aca17566dd05a2 ] Do not account kvstore initialization as k8s bootstrap time. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 31 July 2020, 10:18:57 UTC
389705e k8s: Register CRDs in parallel [ Backporter's notes: Ran `go mod tidy && go mod vendor` to retrieve errgroup external package. ] [ upstream commit c8fd3e9c5d10914576abd670971f81e4c7c60a3a ] Individual CRD registrations do not depend on each other, the registration can be done in parallel. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 31 July 2020, 10:18:57 UTC
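A sketch of the parallel-registration pattern using golang.org/x/sync/errgroup (the package the backport note above vendors). The CRD names and the register function are stand-ins, not the actual Cilium code.

package main

import (
	"fmt"

	"golang.org/x/sync/errgroup"
)

// registerCRD is a placeholder for the real per-CRD registration call.
func registerCRD(name string) error {
	fmt.Println("registering", name)
	return nil
}

func main() {
	var g errgroup.Group
	for _, crd := range []string{"ciliumnetworkpolicies", "ciliumendpoints", "ciliumnodes"} {
		crd := crd // capture loop variable for the goroutine
		g.Go(func() error { return registerCRD(crd) })
	}
	// Wait returns the first non-nil error from any registration goroutine.
	if err := g.Wait(); err != nil {
		fmt.Println("CRD registration failed:", err)
	}
}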
2ba8ae9 fqdn: Limit the max processing in GetRules() Normally the number of DNS proxy rules should be very small. To guard against pathological cases, limit the number of IPs processed to 1000 per port. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 31 July 2020, 07:17:16 UTC
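A minimal sketch of the guard described above: cap how many IPs per port are processed so a pathological rule set stays bounded. The constant, types, and function name are illustrative.

package main

import "fmt"

const maxIPsPerPort = 1000

// capIPs bounds the work done per port in the pathological case.
func capIPs(ips []string) []string {
	if len(ips) > maxIPsPerPort {
		return ips[:maxIPsPerPort]
	}
	return ips
}

func main() {
	fmt.Println(len(capIPs(make([]string, 5000)))) // prints 1000
}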
759af79 endpoint: Remove restored DNS rules Remove restored DNS rules after a successful regeneration, and also at endpoint delete to cover endpoints that were never regenerated. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 31 July 2020, 07:17:16 UTC
401f171 endpoint: Update DNSRules on header rewrite Update DNSRules, if any, before writing headers to capture potentially changed allowed destination IPs. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 31 July 2020, 07:17:16 UTC
c3e19d8 dnsproxy: Use restored rules during restart Store current DNS rules with the Endpoint and use them in the DNS proxy during initial regeneration of the restored endpoints. DNS proxy starts with restored DNS rules based on allowed IP addresses. These rules are removed for each endpoint as soon as the first regeneration completes. Such restored rules should allow DNS requests to be served, but for new DNS resolutions to be added to the Endpoint's policy the restored endpoints must still have their first regeneration completed. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 31 July 2020, 07:17:16 UTC
99196d5 daemon: Minimize DNS proxy downtime on restart Start proxy support earlier in the daemon bootstrap, notably before any k8s setup. Fetch old endpoints earlier so that the DNS history is available before k8s is set up, and move DNS proxy initialization earlier in the bootstrap. Reuse the DNS proxy port from the previous run on restart unless overridden by an explicit Cilium agent option. These changes allow the DNS proxy to start serving requests as soon as the toFQDN policy is received from k8s and avoid any service disruption previously possible due to endpoints being regenerated before the DNS proxy was started. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 31 July 2020, 07:17:16 UTC