https://github.com/cilium/cilium

4dce9cb Prepare for release v1.7.6 Signed-off-by: Joe Stringer <joe@cilium.io> 02 July 2020, 21:55:06 UTC
55893b8 Dockerfile: Bump cilium-runtime image Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 02 July 2020, 21:41:42 UTC
761c958 contrib: Anchor bpftool to stable 5.7.x tree Generate bpftool binaries from a known stable release rather than the latest development tree. Signed-off-by: Joe Stringer <joe@cilium.io> 02 July 2020, 21:41:42 UTC
b49dd74 bpf: Use same file as Golang side instead of nproc [ upstream commit 1bd46240479805818370daadd342cf5c8aa271f6 ] On the Golang side we get the number of CPUs from /sys/devices/system/cpu/possible. On the BPF side, we use $(nproc -all). nproc calls num_processors() from the gnulib. That function, however, may not always return the value from the /sys file above. Instead, we should use the exact same source as the Golang side to ensure both sides have the same value and avoid issues later on. See #12070 for details. The __NR_CPUS__ values in test/bpf/Makefile and bpf/Makefile.bpf do not need to be in sync with Golang values because these files are only used for unit tests, sparse, and compile testing. Fixes: 8191b16 ("bpf: Use `nproc --all` for __NR_CPUS__") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 02 July 2020, 02:34:12 UTC
6247cc5 bpf: Use `nproc --all` for __NR_CPUS__ [ upstream commit 8191b16178231e3e6070cc410b6cd89805bd18a6 ] This uses `-D__NR_CPUS__=$(nproc --all)` (or `GetNumPossibleCPUs` when invoked from Go) to compile the datapath. This fixes an issue where cilium monitor fails to report any events on AKS, due to the `perf_event_array` map duplicates being created with different max_entries sizes, presumably causing the datapath to write to the first one, while the agent is reading from the second one. This bug occurs for example on AKS due to the present/possible cpuset on the VMs. The default Standard_D2s_v3 node size has 2 present CPUs, but 128 possible CPUs in /sys/devices/system/cpu. Fixes: #12070 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 02 July 2020, 02:34:12 UTC
c93e4b2 maps/metricsmap, common: Move getNumPossibleCPUs to common.util [ upstream commit 62b1c86e5ffc09eac461a139add4d8a1d6565951 ] getNumPossibleCPUs is needed for all per-cpu map and perf event arrays. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 02 July 2020, 02:34:12 UTC
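For illustration, a minimal Go sketch (not the Cilium implementation, whose helper lives in the common package) of deriving the CPU count from /sys/devices/system/cpu/possible, the same source the agent uses on the Go side; the function name and range parsing are assumptions for this sketch:

```
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// possibleCPUs parses /sys/devices/system/cpu/possible, which contains a
// range list such as "0-127" or "0", and returns the number of possible CPUs.
func possibleCPUs() (int, error) {
	data, err := os.ReadFile("/sys/devices/system/cpu/possible")
	if err != nil {
		return 0, err
	}
	total := 0
	for _, field := range strings.Split(strings.TrimSpace(string(data)), ",") {
		if lo, hi, ok := strings.Cut(field, "-"); ok {
			start, err1 := strconv.Atoi(lo)
			end, err2 := strconv.Atoi(hi)
			if err1 != nil || err2 != nil {
				return 0, fmt.Errorf("bad range %q", field)
			}
			total += end - start + 1
		} else {
			total++ // single CPU entry, e.g. "0"
		}
	}
	return total, nil
}

func main() {
	n, err := possibleCPUs()
	if err != nil {
		panic(err)
	}
	// This is the kind of value that would be passed to the datapath as __NR_CPUS__.
	fmt.Println("possible CPUs:", n)
}
```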
564cffc doc: revamp kata containers getting started guide This commit updates the getting started guide for kata containers in the following ways: - Remove all custom instructions that were likely copied over from external sources, namely the official Kata Containers, CRI-O and containerd guides. These turned out to be outdated for the most part. Instead, this guide now points the reader to the official guides from the Kata Containers documentation to setup Kata Containers and a Kubernetes cluster. - By removing custom instructions and linking to the official Kata Containers documentation, this guide is now also more generic in that it should work for any platform that supports the Kata Containers runtime instead of being specific to Google Compute Engine (GCE). - This guide now being generic, rename it, including the file name, to just kata instead of kata-gce. - Include `k8s-install-download-release.rst` instead of duplicating the instructions. - Add a note that this guide has only been validated using instructions for GCE. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 01 July 2020, 21:45:38 UTC
8f51abc pkg/k8s: add validation schema for MatchLabels [ upstream commit f5b188772dcf571c6837b542cc463b3f98d12bd4 ] From now on, the validation schema for MatchLabels will only allow a maximum of 63 characters with the regex '^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$' similar to what is used in k8s structures: "Valid label values must be 63 characters or less and must be empty or begin and end with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between." Ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ This fixes an issue where a user could create a badly defined CNP which wouldn't have a map[string]string in the matchLabels field. This CNP would then be accepted by the kube-apiserver and make Cilium error out and possibly crash. Unfortunately not all k8s versions support this fix and we can only backport it to Cilium versions that have a minimum support for k8s >= 1.11. Signed-off-by: André Martins <andre@cilium.io> 01 July 2020, 13:46:18 UTC
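A minimal Go sketch showing how the quoted regex and the 63-character limit behave; the helper name is illustrative and not part of Cilium's schema code:

```
package main

import (
	"fmt"
	"regexp"
)

// labelValueRE mirrors the schema constraint quoted above: empty, or starting
// and ending with an alphanumeric character, with dashes, underscores and
// dots allowed in between.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

func validLabelValue(v string) bool {
	return len(v) <= 63 && labelValueRE.MatchString(v)
}

func main() {
	fmt.Println(validLabelValue("app.kubernetes.io-name")) // true
	fmt.Println(validLabelValue("-leading-dash"))          // false
}
```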
05ea996 cilium/cmd: use right CCNP validation [ upstream commit 35501c724ca6b354bf39e83a8a4a96567e6a4301 ] The CCNP validation is different from the CNP validation so we need to validate the CCNP with the right schema validation. Fixes: 9b0ae85b5fd3 ("k8s: Fix CCNP for host policies") Signed-off-by: André Martins <andre@cilium.io> 01 July 2020, 13:46:18 UTC
dd82d42 pkg/k8s: replace init with specific functions [ upstream commit ae19a9d402dc7d243d99e4b80b00f240664a77a4 ] Having an init function to initialize all structures does not initialize the different fields of 'CNPCRV' in case this variable is accessed outside the 'v2/client' package. Replacing the 'init' function with dedicated functions that initialize those fields allows 'CNPCRV' to have the fields properly initialized. Signed-off-by: André Martins <andre@cilium.io> 01 July 2020, 13:46:18 UTC
9945ca5 test: set nativeRoutingCIDR [ upstream commit fc94aa1cb45320b5a81edc5b11b87aa89c43ef03 ] As we are currently running our CI with a CIDR from the Cilium-Operator, which is "10.0.0.0/16", we should set it as part of our 'nativeRoutingCIDR'. Fixes: ace902d42715 ("helm: Enable BPF masquerading by default") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 01 July 2020, 13:46:18 UTC
d7054d3 daemon: add native-routing-cidr as part of the daemon flags [ upstream commit 658f9db20852c74e80c2f33f86d6f70ee38d402f ] Fixes: c496e25635c2 ("eni: Support masquerading") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 01 July 2020, 13:46:18 UTC
cf319a1 pkg/option: return error in case validation of IPv4NativeRoutingCIDR fail [ upstream commit 03c39a01bb87ca2d6982a2e55f794cf08ba1cbd1 ] Fixes: e7d4f5c6af7d ("daemon: validate IPv4NativeRoutingCIDR value in DaemonConfig") Signed-off-by: André Martins <andre@cilium.io> 01 July 2020, 13:46:18 UTC
c020865 option: Require native-routing-cidr only if IPv4 is enabled [ upstream commit 93d32ddf211a157f33a73731f2ab02ae8c69a721 ] Otherwise, when running with IPv6-only the agent fails with the following: level=fatal msg="Error while creating daemon" error="invalid daemon configuration: native routing cidr must be configured with option --native-routing-cidr in combination with --masquerade --tunnel=disabled --ipam=hostscope-legacy" subsys=daemon Also, we currently do not masquerade IPv6. Fixes: e7d4f5c6af ("daemon: validate IPv4NativeRoutingCIDR value in DaemonConfig") Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: André Martins <andre@cilium.io> 01 July 2020, 13:46:18 UTC
3620ab8 daemon: validate IPv4NativeRoutingCIDR value in DaemonConfig [ upstream commit e7d4f5c6af7d7bb6be8630c8d36c6595022d828f ] Signed-off-by: Rene Zbinden <rene.zbinden@postfinance.ch> Signed-off-by: André Martins <andre@cilium.io> 01 July 2020, 13:46:18 UTC
df8412b helm: add possibility to configure native-routing-cidr [ upstream commit 606736c36652aba228b96809b99f805a0f5f9c8b ] Signed-off-by: Rene Zbinden <rene.zbinden@postfinance.ch> Signed-off-by: André Martins <andre@cilium.io> 01 July 2020, 13:46:18 UTC
f797d80 istio: Update to 1.5.7 [ upstream commit 31f8ba02339f7cfb0a3018354a59f9f129b7e6f3 ] Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 July 2020, 11:25:54 UTC
5dda066 test: Download correct cilium-istioctl for the executing OS. [ upstream commit 12ef7d191c96eeb65916675ffd1eaffbd4031edc ] Only use the Ginkgo runtime OS for determining which cilium-istioctl binary to download if the command executor is local; otherwise default to "linux". This supports Ginkgo running in OSX both with local and SSH Executors. Fixes: #11905 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 July 2020, 11:25:54 UTC
d544205 test: Remove ginkgo linux dependency [ upstream commit 015ffbde0a66fe20fbf6028d05231e356804b7c1 ] Remove ginkgo test source dependency on pkg/iptables, as that only compiles on Linux. This allows running ginkgo from OSX against GKE, for example. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 July 2020, 11:25:54 UTC
b0025ed test: Add retries to curl command [ upstream commit 47f8d321da248699d14932e43ed7407e620e4194 ] The curl is reaching out to a world / external resource so retrying is acceptable, and helps reduce test flakes. Fixes: https://github.com/cilium/cilium/pull/11797 Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 July 2020, 11:25:54 UTC
cc60b5a test: Skip Istio test if Ginkgo runs on unsupported runtime. [ upstream commit 471fe63bc855eb96385628b7968b470fa19de513 ] Skip Istio test if running cilium-istioctl is not supported for the current Go runtime. Support running Istio test from OSX by downloading the osx version of cilium-istioctl if the test suite is running in OSX. This allows running the Istio test on a remote cluster (e.g., GKE) when Ginkgo is running on OSX. On Windows the test is skipped, even though the cilium-istioctl binary is released also for Windows, but this has not been tested yet. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 July 2020, 11:25:54 UTC
3c048c3 envoy: Update to 1.13.3 [ upstream commit b796665e077a6fc2ad9a2fe53bb36f79a0057240 ] This fixes the following CVEs for the Envoy version 1.13.x: - CVE-2020-12603 (CVSS score 7.0, High): Envoy through 1.14.2 may consume excessive amounts of memory when proxying HTTP/2 requests or responses with many small (e.g., 1 byte) data frames. - CVE-2020-12605 (CVSS score 7.0, High): Envoy through 1.14.2 may consume excessive amounts of memory when processing HTTP/1.1 headers with long field names or requests with long URLs. - CVE-2020-8663 (CVSS score 7.0, High): Envoy version 1.14.2 or earlier may exhaust file descriptors and/or memory when accepting too many connections. - CVE-2020-12604 (CVSS score 5.3, Medium): Envoy through 1.14.2 is susceptible to increased memory usage in the case where an HTTP/2 client requests a large payload but does not send enough window updates to consume the entire stream and does not reset the stream. The attacker can cause data associated with many streams to be buffered forever. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 01 July 2020, 07:07:18 UTC
768179e endpoint: Use kvstore timeout for undo [ upstream commit 8bb5382fd916c534bca22469413edf817d0178d3 ] When there's some kind of late error / failure and a newly allocated identity must be released, allow the kvstore connectivity timeout to be customised via the standard kvstore connectivity timeout. This path may still be called from endpoint create, so it's not appropriate to block for up to two minutes to attempt to roll back the identity allocation here. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 01 July 2020, 07:07:18 UTC
c60f2ec endpoint: Inherit context during identity allocation [ upstream commit fa8857fbd09f4c2045bf96fc700927a82c3d1100 ] Inherit the identity allocation context from the parent function when calling into identityLabelsChanged(). This function isn't a background thread, and it receives a context so it should respect the passed context. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 01 July 2020, 07:07:18 UTC
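A self-contained Go sketch of the pattern this commit applies: deriving the allocation context from the caller instead of context.Background() so cancellation propagates. The function names here are illustrative, not Cilium's:

```
package main

import (
	"context"
	"fmt"
	"time"
)

// doAllocate stands in for a blocking allocation call that respects ctx.
func doAllocate(ctx context.Context, labels []string) error {
	select {
	case <-time.After(50 * time.Millisecond): // pretend work
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// allocateIdentity derives its timeout from the caller's context instead of
// context.Background(), so cancellation in the parent propagates down.
func allocateIdentity(ctx context.Context, labels []string) error {
	ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	return doAllocate(ctx, labels)
}

func main() {
	parent, cancel := context.WithCancel(context.Background())
	cancel() // parent already cancelled: the allocation returns immediately
	fmt.Println(allocateIdentity(parent, []string{"k8s:app=web"}))
}
```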
abd782e iptables: Remove '--nowildcard' from socket match [ upstream commit ca767ee2e1fff301c99ee7f63a190e3367d726cb ] '--no-wildcard' allows the socket match to find zero-bound (listening) sockets, which we do not want, as this may intercept (reply) traffic intended for other nodes when an ephemeral source port number allocated in one node happens to be the same as the allocated proxy port number in 'this' node (the node doing the iptables socket match changed here). Fixes: #12241 Related: #8864 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
03300f3 make: fix LOCKDEBUG env variable reference for docker-plugin-image [ upstream commit 8cf1e203b826a0a28243e83631dd7c37e46179e0 ] This commit fixes a typo that prevents LOCKDEBUG from working for the `docker-plugin-image` make target. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
1a6daa7 fqdn/dnsproxy/proxy_test: increase timeout for DNS TCP client exchanges [ upstream commit c93459e984c408e3f991d07655391f8adfaef7ce ] Under heavy load, the round-trip-time (RTT) for DNS requests between a TCP client and a DNS proxy may exceed the 100ms timeout specified when creating the client in the dnsproxy tests. This was observed on the test-PR #12298, with a RTT value going up to 296ms (under exceptional memory strain). This might be the cause for the rare flakes reported in #12042. Let's increase this timeout. The timeout is only used a couple of times in the tests, so increasing it by a few hundred milliseconds would have no visible impact. And because we expect all requests from the TCP client to succeed on the L4 anyway (i.e. it should never time out in our tests), this should not prolong at all the execution of tests in the normal case. Let's also retrieve and print the RTT value for that request in case of error, to get more info if this change were not enough to fix the flake. Hopefully fixes: #12042 Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
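A small sketch using the github.com/miekg/dns client (the library used by the proxy tests) to show a DNS exchange with a more generous timeout and the per-request RTT that can be logged on failure; the server address and timeout value are illustrative:

```
package main

import (
	"fmt"
	"time"

	"github.com/miekg/dns"
)

func main() {
	// A TCP DNS client with a generous timeout; under heavy load the RTT can
	// exceed a small bound like 100ms, so tests benefit from a larger value.
	client := &dns.Client{Net: "tcp", Timeout: 500 * time.Millisecond}

	m := new(dns.Msg)
	m.SetQuestion(dns.Fqdn("cilium.io"), dns.TypeA)

	// Exchange returns the measured round-trip time, which can be printed on
	// error to diagnose slow responses.
	r, rtt, err := client.Exchange(m, "127.0.0.1:53")
	if err != nil {
		fmt.Printf("exchange failed after rtt=%v: %v\n", rtt, err)
		return
	}
	fmt.Printf("rcode=%d rtt=%v answers=%d\n", r.Rcode, rtt, len(r.Answer))
}
```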
447e75a policy/api: add ability to select reserved:health entity from CCNP [ upstream commit 839030549275119406c9f9f859a3bd90afa1d138 ] With the introduction of CiliumClusterwideNetworkPolicies a policy has the ability to select the 'reserved:health' endpoints. If a user applies a policy such as [0], it will block all connectivity for the health endpoints. As this does not seem expected, since Cilium will report that the connectivity across Cilium pods is not working, the user will need to deploy a CCNP that allows connectivity between health endpoints. This can be done with [1]. This commit introduces 'reserved:health' as an entity that can be selected by CNP and / or CCNP. [0] ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
5240ef9 pkg/endpoint: wait for global security identities on restore [ upstream commit 54d2175113a57e9506d024098ec7806eafa6525e ] If the KVStore connectivity is not reliable during the endpoint restore process Cilium can end up with an endpoint in a 'restoring' state in case the global security identities sync would fail or time out. Adding a controller will make sure Cilium will wait until the global security identities are synced or until the endpoint is removed before restoring the endpoint. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
9037430 pkg/endpoint: wait for security identity on restore [ upstream commit 8e1a7fa112e46bb507e3610c39f9c5b941f71241 ] If the KVStore connectivity is not reliable during the endpoint restore process Cilium can end up with an endpoint in a 'restoring' state in case the ep's security identity resolution fails. Adding a controller will make sure Cilium will retry to get an identity for that endpoint until the endpoint is removed or the connectivity with the allocator is successful. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
d0823ea kvstore: Align LockPath() timeouts with session [ upstream commit b99c7b82bc692e53c79806e2c81ed0fe1bd01ab9 ] When renewSession() attempts to re-establish a session with an etcd peer, it may block for some period of time, currently 10s. At the moment, this timeout is added to the LockPath() timeout because the LockPath() context timeout is created *after* acquiring the lock. Move the context creation before the lock acquisition to allow these timers to overlap to provide more accurate bounding on the timeouts here. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
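A minimal Go sketch of the ordering change described here: the timeout context is created before acquiring the local lock, so time spent waiting on the lock counts against the same deadline rather than being added on top of it. Names and durations are illustrative:

```
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

var kvLock sync.Mutex

// lockPath creates its timeout context *before* acquiring the local lock, so
// any wait for the lock (e.g. while a session is being renewed) overlaps with
// the same deadline instead of extending the total time spent.
func lockPath(parent context.Context, path string) error {
	ctx, cancel := context.WithTimeout(parent, 1*time.Minute)
	defer cancel()

	kvLock.Lock()
	defer kvLock.Unlock()

	select {
	case <-time.After(10 * time.Millisecond): // pretend kvstore round trip
		return nil
	case <-ctx.Done():
		return fmt.Errorf("locking %s: %w", path, ctx.Err())
	}
}

func main() {
	fmt.Println(lockPath(context.Background(), "cilium/state/identities"))
}
```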
a56fc31 kvstore: Add session renew backoff [ upstream commit 8524fca879be3bd31a7f2c7a4a351d737477d0af ] A user reported an issue where Cilium would become unable to provision new endpoints, with the following messages recurring in the agent logs: [T=0s] ... level=info msg="New endpoint" [T=0s] ... level=info msg="Resolving identity labels (blocking) [T=43s]... level=error msg="Forcefully unlocking local kvstore lock" [T=89s]... level=info msg="Removed endpoint" Upon further investigation of "gops stack" output from a sysdump in the environment, the following stack trace was observed on Cilium 1.6.8: goroutine 589 [select, 14 minutes]: github.com/cilium/cilium/vendor/google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc00214d500, 0x8, 0xc000740d10) /go/src/github.com/cilium/cilium/vendor/google.golang.org/grpc/internal/transport/transport.go:245 +0xcc github.com/cilium/cilium/vendor/google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...) /go/src/github.com/cilium/cilium/vendor/google.golang.org/grpc/internal/transport/transport.go:256 github.com/cilium/cilium/vendor/google.golang.org/grpc.(*csAttempt).recvMsg(0xc000740d10, 0x32ec340, 0xc001caf890, 0x0, 0xc001e52650, 0x0) /go/src/github.com/cilium/cilium/vendor/google.golang.org/grpc/stream.go:850 +0x70a github.com/cilium/cilium/vendor/google.golang.org/grpc.(*clientStream).RecvMsg.func1(0xc000740d10, 0x0, 0x0) /go/src/github.com/cilium/cilium/vendor/google.golang.org/grpc/stream.go:715 +0x46 ... github.com/cilium/cilium/vendor/go.etcd.io/etcd/etcdserver/etcdserverpb.(*leaseClient).LeaseGrant(0xc00042a238, 0x38e6ae0, 0xc00005cb40, 0xc00150ad20, 0xc001fbaea0, 0x4 /go/src/github.com/cilium/cilium/vendor/go.etcd.io/etcd/etcdserver/etcdserverpb/rpc.pb.go:3792 +0xd3 github.com/cilium/cilium/vendor/go.etcd.io/etcd/clientv3.(*retryLeaseClient).LeaseGrant(0xc000ec0510, 0x38e6ae0, 0xc00005cb40, 0xc00150ad20, 0x53bcba0, 0x3, 0x3, 0x1319 /go/src/github.com/cilium/cilium/vendor/go.etcd.io/etcd/clientv3/retry.go:144 +0xeb github.com/cilium/cilium/vendor/go.etcd.io/etcd/clientv3.(*lessor).Grant(0xc0006a8640, 0x38e6ae0, 0xc00005cb40, 0x19, 0x0, 0xc000abd680, 0xc000abd708) /go/src/github.com/cilium/cilium/vendor/go.etcd.io/etcd/clientv3/lease.go:216 +0x98 github.com/cilium/cilium/vendor/go.etcd.io/etcd/clientv3/concurrency.NewSession(0xc000c15860, 0xc001e52f50, 0x1, 0x1, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/vendor/go.etcd.io/etcd/clientv3/concurrency/session.go:46 +0x308 github.com/cilium/cilium/pkg/kvstore.(*etcdClient).renewLockSession(0xc000b00790, 0xc0227a56a6, 0x14dffe3633def) /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:467 +0xde github.com/cilium/cilium/pkg/kvstore.connectEtcdClient.func4(0x38e6ae0, 0xc00005d640, 0x53bd3e0, 0x2) /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:583 +0x2a github.com/cilium/cilium/pkg/controller.(*Controller).runController(0xc000c06ff0) /go/src/github.com/cilium/cilium/pkg/controller/controller.go:203 +0x9e1 created by github.com/cilium/cilium/pkg/controller.(*Manager).UpdateController /go/src/github.com/cilium/cilium/pkg/controller/manager.go:112 +0xae0 In particular, it's useful to note: * The goroutine has been blocking for 14 minutes attempting to make progress on re-establishing the etcd session. * This path is hit by a controller responsible for handling etcd session renewal in the event of etcd connectivity issues. It would not be triggered by initial etcd connectivity failures, only when etcd connectivity was successfully established then subsequently lost. 
* NewSession() attempts to pick one of the configured etcd peers to connect to, and will block until connectivity to that peer is restored. It will not by itself back off and re-attempt connectivity to another node in the etcd cluster. * This path is in the critical section for writelock of `etcdClient.RWMutex` so will block out all etcd client reads. The above is consistent with the agent logs: Effectively, a new endpoint is scheduled and Cilium attempts to allocate an identity for it. This process requires the kvstore and will block on the lock. Given that it makes no progress, first we see the lock liveness check kick in to forcefully unlock the path lock (`Forcefully unlocking local kvstore lock`), then later we see cilium-cni / kubelet give up on creating the endpoint and remove it again. This patch fixes the issue by introducing a timeout when attempting to renew the session. Each time `NewSession()` is called, the etcd client library will pick an etcd peer to connect to. The intent behind the timeout is to provide a way for Cilium to detect when it is attempting to connect to an unhealthy peer, and try to reconnect to one of the other peers in the etcd cluster in the hopes that that other peer will be healthy. In the event that there is an etcd connectivity outage where one of three etcd peers is unhealthy, we expect that the remaining two can retain quorum and continue to operate despite the outage of the third peer. In this case if Cilium was attempting to connect to the third (unhealthy) peer, Cilium would previously block indefinitely. With this patch Cilium will time out after statusCheckTimeout (10s) and re-establish a session to the remaining etcd peers, thereby unblocking subsequent endpoint provisioning. Signed-off-by: Joe Stringer <joe@cilium.io> Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
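A generic Go sketch of bounding a blocking session-renew call with a timeout, assuming a statusCheckTimeout of 10s as described above; this is not the Cilium etcd client code, just the pattern it applies:

```
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

const statusCheckTimeout = 10 * time.Second

// renewSessionWithTimeout bounds a blocking "establish session" call so that
// an unhealthy peer cannot stall the caller indefinitely; on timeout the
// caller can retry and let the client library pick another peer.
func renewSessionWithTimeout(ctx context.Context, newSession func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, statusCheckTimeout)
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- newSession(ctx) }()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return errors.New("timed out renewing kvstore session, retrying against another peer")
	}
}

func main() {
	// Simulate a peer that never answers, with a short demo deadline.
	hang := func(ctx context.Context) error { <-ctx.Done(); return ctx.Err() }
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	fmt.Println(renewSessionWithTimeout(ctx, hang))
}
```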
a75e143 helm/operator: fix IPv6 liveness probe address for operator [ upstream commit 462e08d9d22fb42ff51ee274788e4e586d3069cb ] In an IPv6 only cluster, the operator host was set to `[::1]` in the Helm chart. This is a problem as it is "bracketed" when concatenated with the port to form an invalid address: `[[::1]]:9234`. In turn, this prevented the liveness probe from working: Warning Unhealthy 10m (x11 over 14m) kubelet, kind-worker2 Liveness probe failed: Get http://[[::1]]:9234/healthz: dial tcp: address [[::1]]:9234: missing port in address This commit fixes this issue by removing the extraneous brackets in the Helm chart of the Cilium operator. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
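A small Go example of why double-bracketing breaks: net.JoinHostPort brackets IPv6 literals exactly once, whereas concatenating an already-bracketed host with a port produces the malformed "[[::1]]:9234" seen in the probe failure:

```
package main

import (
	"fmt"
	"net"
)

func main() {
	// Correctly formats host:port for both IPv6 and IPv4 literals.
	fmt.Println(net.JoinHostPort("::1", "9234"))       // [::1]:9234
	fmt.Println(net.JoinHostPort("127.0.0.1", "9234")) // 127.0.0.1:9234
}
```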
187622a istio: Update to 1.5.6 [ upstream commit 640f66917fa9657513790cb48d3ec2d330de9295 ] Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
70162e3 Fix syslog hook missing in DefaultLogger [ upstream commit 95c74cac37dad12a54561103a9896c11625748ed ] When specifying the syslog driver for cilium-agent, all logs will still be output to `stdout`. This is because each module uses the `DefaultLogger` (cilium's customized logger, not logrus's default logger), while the syslog hook is not added to this logger. This patch fixes this problem by adding the syslog hook to `DefaultLogger`. Signed-off-by: arthurchiao <arthurchiao@hotmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
47d5aff docs: Fix GKE install native routing CIDR option. [ upstream commit 947a58671ffba42fbaf4a284de9484e4fd536b66 ] Helm name for the native routing CIDR is "global.nativeRoutingCIDR". Cilium agent command line option name is "native-routing-cidr". Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
212435a pkg/option: disable K8sEventHandover if Cilium is running without KVStore [ upstream commit 0ee0458af776f770aa1e03a82419e96242ed3d7d ] Running Cilium with K8sEventHandover and without a KVStore configured leaves it unable to delete any CNP after it has been created. Fixes: e4e83e80128a ("daemon: Allow kvstore to be unconfigured") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
b17eb29 contrib: Fixup k8s-cilium-exec.sh script [ upstream commit 819185568eb5cf653052043c5299d1224460368b ] This commit adds a few misc. changes such as defining a new helper function and using double-quotes around variables for consistency. Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
10d8d8e contrib: Fix passing multiple args to script [ upstream commit 1f8cc15adeab17a17b7910370aa15ce9c5b8909f ] This commit makes it possible to do the following: ``` $ ./contrib/k8s/k8s-cilium-exec.sh bash -c "cilium status && hostname && echo" ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
593f675 doc: Add note not encrypting within the node [ upstream commit dc4f5499ca98bb3006b2bbeb9930a5f255a781dd ] Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
9034c62 doc: Add misc. improvements to encryption guide [ upstream commit 3f963925828852c71e55521e18dc13737a82c9c9 ] Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
18f9121 contrib: Add K8S_NAMESPACE to control namespace [ upstream commit d1f2b481f12112e9f7155ceebfdd4be085cba0cf ] This updates the script to allow the user to specify which namespace Cilium is deployed in. Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
109672e Update xargs usage in restart-pods documentation [ upstream commit b330436f0b105c10d906d4f34977c358d7e148d1 ] The restart command in the documentation pipes the complete output of awk into xargs, which creates an invalid kubectl command. This commit updates xargs to use per-line mode and to omit empty input so that it does not execute an empty "kubectl delete pod" on the last line. Signed-off-by: Arthur Evstifeev <aevstifeev@gitlab.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
f169bf8 Fix setting monitorAggregationLevel to max reflects via CLI [ upstream commit 52b44d13ed7d0572e8bc8122256e72b28dbf218d ] This PR fixes setting MonitorAggregationLevel to max so that it is correctly reflected via the CLI. Fixes: #12000 Signed-off-by: Swaminathan Vasudevan <svasudevan@suse.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
05032eb ginkgo-ext: Fix data-race in Writer [ upstream commit df114a6f87b8aea4ddcd74a8f118953f249fce7b ] `ginkgoext.Writer` is used as the output buffer behind `ginkgoext.GinkgoPrint`. `GinkgoPrint` can be called from multiple go routines concurrently. For example, `kubectl.validateCilium` spawns `cilium{Status,Controllers,Health}PreFlightCheck` in three separate go routines, all of which call `ginkgoext.By`, which calls `ginkgoext.GinkgoPrint` which calls `ginkgoext.Writer.Write`. This hopefully fixes a spurious panic which was observed in https://jenkins.cilium.io/job/Cilium-PR-K8s-newest-kernel-4.9/719, as the "slice bounds out of range" panic seems to be a common failure when `bytes.Buffer.Write` is called unsynchronized from different go routines. ``` 09:55:43 STEP: Performing Cilium controllers preflight check panic: runtime error: slice bounds out of range [532609:532604] goroutine 871 [running]: bytes.(*Buffer).Write(0x2f09ca0, 0xc0006f4000, 0x2e, 0x4000, 0x2e, 0x0, 0x0) /usr/local/go/src/bytes/buffer.go:174 +0x101 github.com/cilium/cilium/test/ginkgo-ext.(*Writer).Write(0xc000373b00, 0xc0006f4000, 0x2e, 0x4000, 0x2eca680, 0xc000210c58, 0x40be5b) /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/writer.go:46 +0xb2 fmt.Fprintln(0x1f3aaa0, 0xc000373b00, 0xc000210ca8, 0x1, 0x1, 0xc000221b90, 0x2d, 0x2d) /usr/local/go/src/fmt/print.go:265 +0x8b github.com/cilium/cilium/test/ginkgo-ext.GinkgoPrint(0xc000221b90, 0x2d, 0x0, 0x0, 0x0) /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:178 +0xa7 github.com/cilium/cilium/test/ginkgo-ext.By(0x1c9e224, 0x1e, 0x0, 0x0, 0x0) /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:170 +0x12f github.com/cilium/cilium/test/helpers.(*Kubectl).ciliumHealthPreFlightCheck(0xc00099fa40, 0x1f392a0, 0xc000652120) /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/helpers/kubectl.go:3496 +0x7a github.com/cilium/cilium/test/helpers.(*Kubectl).validateCilium.func3(0xc0000645a0, 0xc000206470) /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/test/helpers/kubectl.go:3386 +0x2e golang.org/x/sync/errgroup.(*Group).Go.func1(0xc00074cd80, 0xc000796bc0) /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x59 created by golang.org/x/sync/errgroup.(*Group).Go /home/jenkins/workspace/Cilium-PR-K8s-newest-kernel-4.9/k8s-1.18-gopath/src/github.com/cilium/cilium/vendor/golang.org/x/sync/errgroup/errgroup.go:54 +0x66 Ginkgo ran 1 suite in 3m21.079336902s Test Suite Failed ``` Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 June 2020, 23:06:29 UTC
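A self-contained Go sketch of the fix's general shape: bytes.Buffer is not safe for unsynchronized use, so the writer takes a mutex in Write. This is a sketch of the pattern, not the ginkgo-ext code itself:

```
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// Writer wraps bytes.Buffer with a mutex so concurrent Write calls are safe.
type Writer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (w *Writer) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.buf.Write(p)
}

func main() {
	w := &Writer{}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Fprintf(w, "preflight check %d done\n", i) // concurrent writers
		}(i)
	}
	wg.Wait()
	fmt.Print(w.buf.String())
}
```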
d9afed2 fqdn: Fix panic on MarshalJSON [ upstream commit 486bedff7375892f07d9c9a5308e522bf30d51e4 ] We need to hold a lock on DNSZombieMappings when calling json.Marshal(), otherwise we risk hitting panics on concurrent writes. panic: reflect: call of reflect.Value.IsNil on zero Value [recovered] goroutine 2807 [running]: ... panic(...) /usr/local/go/src/runtime/panic.go:679 +0x1b2 reflect.Value.IsNil(...) /usr/local/go/src/reflect/value.go:1073 ... github.com/cilium/cilium/pkg/fqdn.(*DNSZombieMappings).MarshalJSON(...) /go/src/github.com/cilium/cilium/pkg/fqdn/cache.go:949 +0x3b Fixes: f629372 ("fqdn: Add and use DNSZombieMappings in Endpoint") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 22 June 2020, 17:21:28 UTC
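A minimal Go sketch of the MarshalJSON-under-lock pattern this fix applies; the type and fields are illustrative, not the DNSZombieMappings structure:

```
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// zombieMappings sketches a structure whose MarshalJSON must hold the same
// lock that writers hold, otherwise json.Marshal can observe a map that is
// being mutated concurrently and panic.
type zombieMappings struct {
	mu      sync.Mutex
	Deletes map[string][]string `json:"deletes"`
}

func (z *zombieMappings) Upsert(ip, name string) {
	z.mu.Lock()
	defer z.mu.Unlock()
	z.Deletes[ip] = append(z.Deletes[ip], name)
}

func (z *zombieMappings) MarshalJSON() ([]byte, error) {
	z.mu.Lock()
	defer z.mu.Unlock()
	// Marshal through an alias type to avoid recursing back into MarshalJSON.
	type alias zombieMappings
	return json.Marshal((*alias)(z))
}

func main() {
	z := &zombieMappings{Deletes: map[string][]string{}}
	z.Upsert("10.0.0.1", "example.com")
	out, _ := json.Marshal(z)
	fmt.Println(string(out))
}
```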
c89f032 k8s: update k8s libraries to v1.17.7 Also update k8s test versions to v1.16.11 and v1.17.7 Signed-off-by: André Martins <andre@cilium.io> 22 June 2020, 12:01:52 UTC
0343402 logo: change SVG file used for the logo [ upstream commit 8a02976798b02f2b4c9ab4f7c50504ec54b47c97 ] Replace the SVG for Cilium's logo by a version picked up from [0]. The latter has a few differences: - Wider space between logo (hexagons) and text - Proportionally bigger text - Darker text - No white borders inside some hexagons (SVG artifacts) It also corresponds to the logo found on https://cilium.io (although that one is a .png). The logo is used in the top-left corner of the documentation, and at the top of the rendered README.rst on GitHub. [0]: https://github.com/cilium/sphinx_rtd_theme/blob/master/docs/demo/static/cilium-logo.svg Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 18 June 2020, 00:25:16 UTC
6b7ae2e docs: point cilium docs into a stable version of sphinx theme [ upstream commit 4ea9b4f82a4761345e612d37945c86e88494dfce ] [ Backporter's notes: Minor conflict with requirements ] - point to v0.6 version of the sphinx theme Signed-off-by: Sergey Generalov <sergey@genbit.ru> Signed-off-by: Joe Stringer <joe@cilium.io> 18 June 2020, 00:25:16 UTC
d56ccc6 docs: introduce white cilium docs version [ upstream commit 1319eff554ab4b7ec6320fe2d8f93d1fa92a0af5 ] [ Backporter's notes: Minor rebase issue with requirements detection, Fixed by extending Makefile requirements check ] - use forked version of sphinx_rtd_theme - removes _themes/sphinx_rtd_theme - use white colors to be inline with cilium.io design - use cilium.io navigation for top menu Signed-off-by: Sergey Generalov <sergey@genbit.ru> Signed-off-by: Joe Stringer <joe@cilium.io> 18 June 2020, 00:25:16 UTC
0d318dc cilium, encryption: for tunnel'ed packets push encryption packets to stack [ upstream commit 984ce4986b72137624af52bd3382f7333cf9140b ] Avoid adding extra marks and labels to ip6 packets which can result in dropped packets in ip6 case. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 15 June 2020, 15:09:11 UTC
f524ca0 Prepare for release v1.7.5 Signed-off-by: André Martins <andre@cilium.io> 12 June 2020, 12:55:38 UTC
27ddfb2 contrib/backporting: remove requires-janitor-review label [ upstream commit 477c4872cb4af11156df600d3092b7bd6aaf5a1d ] The backport PRs shouldn't add the label requires-janitor-review since the review is automatically requested by the CODEOWNERS file. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 12 June 2020, 09:52:30 UTC
58aeaca iptables: add transient rules removal failure to badLogMessages [ upstream commit e78405a087c5884184330a56fa4861ca1c8e630e ] [ Backporter's notes: Minor conflicts against PR #10639 and CI failure log messages. ] In an attempt to catch #11276 in CI, let's add any message related to a failure to flush or delete the chain related to iptables transient rules to the list of badLogMessages we want to catch. We need to filter on the name of the chain for transient rules to avoid false positives, which requires exporting that name. We also need to modify the error log message, to avoid adding four distinct logs to the list (combinations for iptables/ip6tables, flush/delete). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 12 June 2020, 09:52:30 UTC
d4d11e2 iptables: log unexpected failures to delete transient rules [ upstream commit 052fa31faea2ed1792762aab7edf3e47ead2929c ] When Cilium reinitialises its daemon, transient iptables rules are set up to avoid dropping packets while the regular rules are not in place. On rare occasions, setting up those transient rules has been found to fail, for an unknown reason (see issue #11276). The error message states that the "Chain already exists", even though we try to flush and remove any leftover from previous transient rules before adding the new ones. It sounds likely that removing the leftovers is failing, but we were not able to understand why, because we quieten the function to avoid spurious warnings the first time we try to remove them (since none is existing). It would be helpful to get more information to understand what happens in those rare occasions where setting up transient rules fails. Let's find a way to get more logs, without making too much noise. We cannot warn unconditionally in remove() since we want removal in the normal case to remain quiet. What we can do is logging when the "quiet" flag is not passed, _or_ when the error is different from the chain being not found, i.e. when the error is different from the one we want to silence on start up. This means matching on the error message returned by ip(6)tables. It looks fragile, but at least this message has not changed since 2009, so it should be relatively stable and pretty much the same on all supported systems. Since remove() is used for chains other than for transient rules too, we also match on chain name to make sure we are dealing with transient rules if ignoring the "quiet" flag. This additional logging could be removed once we reproduce and fix the issue. Alternative approaches could be: - Uncoupling the remove() function for transient rules and regular rules, to avoid matching on chain name (but it sounds worse). - Logging on failure for all rules even when the "quiet" flag is passed, but on "info" level instead of "warning". This would still require a modified version of runProg(), with also a modified version of CombinedOutput() in package "exec". Here I chose to limit the number of logs and keep the changes local. - Listing the chain first before trying to remove it, so we only try to remove if it exists, but this would likely add unnecessary complexity and latency. Should help with (but does not solve): #11276 Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 12 June 2020, 09:52:30 UTC
5f16e27 datapath/loader: refactor function to remove iptables chain [ upstream commit 66aa41dacbec28e2411e5b9b385fcfac16f62ec1 ] We do the same thing for IPv4 and IPv6, with just the name of the program changing. Then we do nearly the same thing for flushing and deleting a chain. Let's refactor (no functional change). Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 12 June 2020, 09:52:30 UTC
e47b1d7 datapath/loader: carry on on failure to set up transient iptables rules [ upstream commit b2c73e75cf4b1e42e39d5a4edc138faead740677 ] When Reinitialize()-ing the datapath, transient iptables rules are set up to avoid dropping packets while Cilium's rules are not in place. On rare occasions, a failure to add those rules has been observed (see issue #11276), leading to an early exit from Reinitialize() and a failure to set up the daemon. But those transient rules are just used to lend a hand and keep packets going for a very small window of time: it does not actually matter much if we fail to install them, and it should not stop the reinitializing of the daemon. Let's simply log a warning and carry on if we fail to add those rules. Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 12 June 2020, 09:52:30 UTC
9614ba1 envoy: Include detail in NACK warning [ upstream commit 2ed885141ff5637cbd0bcb0272f9638330a1f7c8 ] Include Envoy-supplied error detail in NACK warning logs to facilitate debugging. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@cilium.io> 12 June 2020, 09:52:30 UTC
7600599 cilium: encryption fixes when fib helper is not supported [ upstream commit 0d89f055806dc83b7d7b0586e5c08cd597a92bdc ] [ Backporter's notes: Minor conflicts with asm version of this code, bpf_netdev -> bpf_host prog rename, bpf sha ] Encryption fixes: when setting the encrypt ctx->mark field we need to skip resetting it with the source identity. It's not needed in the encryption case anyway because we already checked the endpoint is remote before encoding the encryption signal. Next, if fib lookup is not available we will discover the route at init time and encode it in the ENCRYPT_IFACE define. If this field is non-zero we should use it. Otherwise in some configurations where there is not a route to egress in the main routing table the packet will be dropped. Fixes: 86db0fde493fc ("cilium: encryption, use fib_lookup to rewrite dmac/smac") Fixes: f25d8b97e908 ("bpf: Preserve source identity for hairpin via stack") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2020, 18:58:59 UTC
15b9c83 etcd: propagate Context from higher-level calls [ upstream commit 941e3281a8979a035ec3a04d8ee78c6f413275ba ] This makes sure that cancelation signals or timeouts are propagated properly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2020, 18:58:59 UTC
bee8baf cilium, test: Only run sockops tests on 4.19 and bpf-next kernels [ upstream commit c9ea0e9d36d35a9f83d7742d435e5d110b43cacc ] [ Backporter's notes: Had to adjust the helper for only net-next since 4.19 is not enabled in v1.7 CI ] Let's avoid pointless runs of sockops tests where the feature is disabled due to kernel limitations. Should save us a couple of runs in CI. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2020, 18:58:59 UTC
b87ff31 pkg/k8s: wait for EndpointSlice to be ready before initializing Cilium [ upstream commit b4627b74c151acbd699a54154325383febdbc549 ] Similar to the v1.Endpoints, in case Cilium is running with EndpointSlice enabled, it should wait for the k8s watchers watching those resources before it starts. Fixes: c3b5ca6fdc40 ("add support for k8s endpoint slice") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2020, 18:58:59 UTC
1c8a8b2 pkg/k8s: fix endpoint slice support [ upstream commit 009fb349bd715ff09bbab871c73ee9356187bbd1 ] [ Backporter's notes: Rebased against lack of slim k8s structures ] The endpoint slice implementation did not account for the fact that a service could map to multiple endpoint slices. Wrongly assuming that only a single endpoint slice existed for a single service can cause Cilium to fail the translation of a service to a backend in case 2 or more endpoint slices existed. In some rare occurrences where a single endpoint slice was deleted and another one was recreated, for the same service, it could cause Cilium to stop doing service translation entirely. The unit test added easily replicates the issue as we can see the test fails in the current master where the 2nd endpoint slice added overwrote the 1st endpoint slice: ``` c.Assert(endpoints.String(), check.Equals, "2.2.2.2:8080/TCP,2.2.2.3:8080/TCP") Expected :string = "2.2.2.2:8080/TCP,2.2.2.3:8080/TCP" Actual :string = "2.2.2.3:8080/TCP" ``` Fixes: c3b5ca6fdc40 ("add support for k8s endpoint slice") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2020, 18:58:59 UTC
79b5cc8 service: Clean up HealthCheckNodePort server when traffic policy changes [ upstream commit 21de74dbe67da0c4b0267c47628cdef869c1e1c4 ] [ Backporter's notes: Dropped testsuite changes.] This commit fixes a case where the HealthCheckNodePort service was not removed properly anymore when a LoadBalancer service was changed from externalTrafficPolicy=Local to externalTrafficPolicy=Cluster. The unit tests are extended to capture this behavior. Fixes: f7b037887d6b ("service: Fix wrong localEndpoints count in HealthCheckNodePort") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2020, 18:58:59 UTC
f9198b8 service/healthserver: Remove unused variable [ upstream commit 8460ac6627602aa5fc0e7d27421ccf88f928ffc8 ] Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 11 June 2020, 18:58:59 UTC
8ef8eac test: Wrap helm inside Eventually clauses [ upstream commit 7d26df19dce3668e896b74e7ecb604abb6d63746 ] This commit is an attempt to add retry logic to Helm operations in the Kubernetes test suite. Signed-off-by: Chris Tarazi <chris@isovalent.com> 10 June 2020, 23:07:38 UTC
479fab0 service: Fix wrong localEndpoints count in HealthCheckNodePort [ upstream commit f7b037887d6b516050c60c01b301b90871342752 ] This fixes an issue with the `HealthCheckNodePort` server where it would non-deterministically sometimes return a non-zero `localEndpoints` count on nodes which do not have local endpoints. Because Cilium internally creates a service object per frontend IP, we end up with multiple services sharing the same name. In the case where a `LoadBalancer` service has `externalTrafficPolicy=Local` with no local backends, Cilium will still create a `ClusterIP` sibling service which retains the non-local backends. In that case, we must take care to not incorporate the `ClusterIP` backends into the `localEndpoints` count intended for external traffic. The final count is dependent on the order in which services are added to the service manager, which explains why the occurrence of this bug was non-deterministic. This commit fixes this issue by checking that the service may only contain local backends before its count is added to the `HealthCheckNodePort` server. Fixes: #11043 Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 10 June 2020, 23:07:38 UTC
c1143d4 Include directions to restart pods in the k3s install guide [ upstream commit 110ecb4be381a560f43662b05f326f8895a6b8a6 ] Fixes: #11821 Signed-off-by: Sean Winn <sean@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 10 June 2020, 23:07:38 UTC
8ef0977 ci: Change vagrant timeout mechanism [ upstream commit 03602e30eca2d5b313e13b1f1c0a6f4e6cf35920 ] Due to a bug in Jenkins, nesting a timeout in a retry block causes the build to abort. Work around this by using a shell-based timeout. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 10 June 2020, 23:07:38 UTC
1f6d0e4 Envoy: Update to Envoy 1.13.2 [ upstream commit 65303257bc3fe97bd40f9c3463a00dc22b5fa0f1 ] Update proxy image to build with Envoy 1.13.2. This fixes CVE-2020-11080 by rejecting HTTP/2 SETTINGS frames with too many parameters. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 10 June 2020, 20:27:48 UTC
6d96f38 daemon: Fix waiting on metrics in endpoint_test.go [ upstream commit 19030a7583844ad358e3ca0ce54c2c6324d07850 ] This commit fixes a flake where the metrics do not update as quickly as we expect. It is fixed by waiting for up to 10 seconds to retrieve the metric value we expect. Fixes: 4a4a59b127 ("daemon: Add assertions for endpoint state metrics") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
022349e ipcache: Fix deadlock when ipcache GC results in datapath reload [ upstream commit 3285cbc53fb41913e6d20fad38667b12961b47ad ] The following deadlock can occur when the ipcache GC relies on map renames and datapath reloads to delete entries in combination with endpoint regenrations triggered by the FQDN proxy which perform ipcache upserts as part of regenerations: Routine 1: The following go routine holds the ipcache mutex while garbage collecting: ``` goroutine 779 [semacquire, 48 minutes]: sync.runtime_SemacquireMutex(0xc0008f4f68, 0xc007663600, 0x0) /usr/local/go/src/runtime/sema.go:71 +0x47 sync.(*RWMutex).Lock(0xc0008f4f60) /usr/local/go/src/sync/rwmutex.go:103 +0x88 github.com/cilium/cilium/pkg/datapath/loader.(*Loader).Reinitialize(0xc0003d70a0, 0x2759d20, 0xc000468bc0, 0x2771e60, 0xc000515680, 0x2329, 0x7f87768eac80, 0xc00022ff10, 0x26faf60, 0xc0004a18b0, ...) /go/src/github.com/cilium/cilium/pkg/datapath/loader/base.go:124 +0xcd main.(*Daemon).TriggerReloadWithoutCompile(0xc000515680, 0x2332316, 0x10, 0x12, 0xc008f75060, 0x16) /go/src/github.com/cilium/cilium/daemon/daemon.go:588 +0x202 github.com/cilium/cilium/pkg/datapath/ipcache.(*BPFListener).garbageCollect(0xc0006f8960, 0x2759d20, 0xc003748000, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/datapath/ipcache/listener.go:306 +0xadc github.com/cilium/cilium/pkg/datapath/ipcache.(*BPFListener).OnIPIdentityCacheGC.func1(0x2759d20, 0xc003748000, 0x3a5bf00, 0x2) /go/src/github.com/cilium/cilium/pkg/datapath/ipcache/listener.go:340 +0x3e github.com/cilium/cilium/pkg/controller.(*Controller).runController(0xc001e1e000) /go/src/github.com/cilium/cilium/pkg/controller/controller.go:205 +0xa2a created by github.com/cilium/cilium/pkg/controller.(*Manager).updateController /go/src/github.com/cilium/cilium/pkg/controller/manager.go:120 +0xb09 ``` As part of this, Reinitialize() is called which will require the compilation mutex to be acquired. Routine 2 The following ongoing endpoint regeneration is holding the compilation lock and thus blocks Routine 1 from completing. It is itself blocked on the FQDN NameManager mutex. ``` goroutine 7227352 [semacquire, 48 minutes]: sync.runtime_SemacquireMutex(0xc0005c3744, 0xc009bf4c00, 0x1) /usr/local/go/src/runtime/sema.go:71 +0x47 sync.(*Mutex).lockSlow(0xc0005c3740) /usr/local/go/src/sync/mutex.go:138 +0xfc sync.(*Mutex).Lock(...) /usr/local/go/src/sync/mutex.go:81 github.com/cilium/cilium/pkg/fqdn.(*NameManager).Lock(0xc0005c3740) /go/src/github.com/cilium/cilium/pkg/fqdn/name_manager.go:93 +0x47 github.com/cilium/cilium/pkg/policy.(*SelectorCache).AddFQDNSelector(0xc0004691c0, 0x26faf00, 0xc00acd6e70, 0x0, 0x0, 0xc004106590, 0xb, 0x0, 0x0, 0x40ca00) /go/src/github.com/cilium/cilium/pkg/policy/selectorcache.go:580 +0x13f github.com/cilium/cilium/pkg/policy.(*L4Filter).cacheFQDNSelector(...) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:392 github.com/cilium/cilium/pkg/policy.(*L4Filter).cacheFQDNSelectors(0xc00acd6e70, 0xc008c7d100, 0x38, 0x38, 0xc0004691c0) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:387 +0xad github.com/cilium/cilium/pkg/policy.createL4Filter(0x2739660, 0xc001c49680, 0xc006beab60, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:489 +0x7ce github.com/cilium/cilium/pkg/policy.createL4EgressFilter(...) /go/src/github.com/cilium/cilium/pkg/policy/l4.go:599 github.com/cilium/cilium/pkg/policy.mergeEgressPortProto(0x2739660, 0xc001c49680, 0xc0052087d0, 0xc006beab60, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0, ...) 
/go/src/github.com/cilium/cilium/pkg/policy/rule.go:637 +0x187 github.com/cilium/cilium/pkg/policy.mergeEgress(0x2739660, 0xc001c49680, 0xc0052087d0, 0xc006beab60, 0x1, 0x1, 0x0, 0x0, 0x0, 0xc00b91f2c0, ...) /go/src/github.com/cilium/cilium/pkg/policy/rule.go:572 +0xdbd github.com/cilium/cilium/pkg/policy.(*rule).resolveEgressPolicy(0xc0079a0400, 0x2739660, 0xc001c49680, 0xc0052087d0, 0xc005208418, 0xc0090738f0, 0x0, 0x0, 0x0, 0xc0090738f0, ...) /go/src/github.com/cilium/cilium/pkg/policy/rule.go:675 +0x265 github.com/cilium/cilium/pkg/policy.ruleSlice.resolveL4EgressPolicy(0xc008d0a9c0, 0x5, 0x8, 0x2739660, 0xc001c49680, 0xc0052087d0, 0xc008d6c300, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/policy/rules.go:106 +0x70b github.com/cilium/cilium/pkg/policy.(*Repository).resolvePolicyLocked(0xc000246460, 0xc0059fe820, 0xc0000066c3, 0xc0030b0960, 0xc006176201) /go/src/github.com/cilium/cilium/pkg/policy/repository.go:665 +0x3f9 github.com/cilium/cilium/pkg/policy.(*PolicyCache).updateSelectorPolicy(0xc000342b20, 0xc0059fe820, 0x3a5bf00, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:135 +0x148 github.com/cilium/cilium/pkg/policy.(*PolicyCache).UpdatePolicy(...) /go/src/github.com/cilium/cilium/pkg/policy/distillery.go:170 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regeneratePolicy(0xc001427080, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:175 +0x255 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).runPreCompilationSteps(0xc001427080, 0xc005d3a500, 0xc008d0a800, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:736 +0x96a github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc001427080, 0xc005d3a500, 0x0, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:508 +0x1e0 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc001427080, 0xc005d3a500, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:349 +0x841 github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc003530210, 0xc0068115c0) /go/src/github.com/cilium/cilium/pkg/endpoint/events.go:58 +0x459 github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run.func1() /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:260 +0x187 sync.(*Once).doSlow(0xc005f60e58, 0xc009bbb330) /usr/local/go/src/sync/once.go:66 +0xe3 sync.(*Once).Do(0xc005f60e58, 0xc009bbb330) /usr/local/go/src/sync/once.go:57 +0x45 created by github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run /go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:248 +0xa9 ``` Routine 3 This third routine is holding the FQDN NameManager mutex while itself waiting on the ipcache mutex which is held by routine 1. ``` goroutine 7229843 [select, 48 minutes]: golang.org/x/sync/semaphore.(*Weighted).Acquire(0xc000464230, 0x2759d60, 0xc000058018, 0x40000000, 0xc0008f9740, 0x0) /go/src/github.com/cilium/cilium/vendor/golang.org/x/sync/semaphore/semaphore.go:60 +0x28e github.com/cilium/cilium/pkg/lock.(*SemaphoredMutex).Lock(...) /go/src/github.com/cilium/cilium/pkg/lock/semaphored_mutex.go:41 github.com/cilium/cilium/pkg/ipcache.(*IPCache).Upsert(0xc000464280, 0xc008858dd0, 0xd, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1000596, 0x231acf1, ...) 
/go/src/github.com/cilium/cilium/pkg/ipcache/ipcache.go:184 +0x93 github.com/cilium/cilium/pkg/ipcache.allocateCIDRs(0xc004bb2a00, 0x98, 0x98, 0x0, 0x0, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:97 +0x7aa github.com/cilium/cilium/pkg/ipcache.AllocateCIDRsForIPs(0xc0046a2000, 0x98, 0x200, 0x1, 0x1, 0x0, 0x1f4a440, 0xc00a768900) /go/src/github.com/cilium/cilium/pkg/ipcache/cidr.go:52 +0x65 main.identitiesForFQDNSelectorIPs(0xc00a7687e0, 0xc0008f9740, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/daemon/fqdn.go:87 +0x38e main.(*Daemon).updateSelectors(0xc000515680, 0x2759da0, 0xc000e09a40, 0xc00a7687e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/daemon/fqdn.go:320 +0x40 github.com/cilium/cilium/pkg/fqdn.(*NameManager).UpdateGenerateDNS(0xc0005c3740, 0x2759da0, 0xc000e09a40, 0xbfae6eb0118dc6cd, 0x131b26c90d12, 0x3a5bf00, 0xc004cf9910, 0x0, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/fqdn/name_manager.go:225 +0x48f main.(*Daemon).notifyOnDNSMsg(0xc000515680, 0xbfae6eb0118dc6cd, 0x131b26c90d12, 0x3a5bf00, 0xc001427080, 0xc00a2bdec0, 0x11, 0xc00ae52340, 0x10, 0xc003564f30, ...) /go/src/github.com/cilium/cilium/daemon/fqdn.go:584 +0x12cd github.com/cilium/cilium/pkg/fqdn/dnsproxy.(*DNSProxy).ServeDNS(0xc0008c8e80, 0x27818e0, 0xc00302e1e0, 0xc0072a4f30) /go/src/github.com/cilium/cilium/pkg/fqdn/dnsproxy/proxy.go:475 +0x238c github.com/miekg/dns.(*Server).serveDNS(0xc000563a40, 0xc001f8be00, 0x6b, 0x200, 0xc00302e1e0) /go/src/github.com/cilium/cilium/vendor/github.com/miekg/dns/server.go:597 +0x20b github.com/miekg/dns.(*Server).serveUDPPacket(0xc000563a40, 0xc00047d060, 0xc001f8be00, 0x6b, 0x200, 0xc000df0380, 0xc0092cf2b0) /go/src/github.com/cilium/cilium/vendor/github.com/miekg/dns/server.go:552 +0xb2 github.com/miekg/dns.(*Server).serveUDP.func2(0xc000563a40, 0xc00047d060, 0xc001f8be00, 0x6b, 0x200, 0xc000df0380, 0xc0092cf2b0) /go/src/github.com/cilium/cilium/vendor/github.com/miekg/dns/server.go:478 +0x67 created by github.com/miekg/dns.(*Server).serveUDP /go/src/github.com/cilium/cilium/vendor/github.com/miekg/dns/server.go:477 +0x272 ``` Fixes: #11946 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
c2fe822 endpoint: Fix data races while accessing GetIdentity() [ upstream commit 10f4e2ce3664fecc6c30cb971a5ed7924bcea692 ] [ Backporter's notes: Minor conflicts scattered across five files. ] Calls to GetIdentity() have been made under both assumptions: that the endpoint is locked and that it is not. * daemon/cmd/fqdn.go / notifyOnDNSMsg(): LookupEndpointIDByIP(). No locking. --> Vulnerable * pkg/datapath/linux/config/config.go / writeStaticData() e.owner.Datapath().WriteEndpointConfig -> WriteEndpointConfig() -> writeStaticData() The endpoint is locked. Not using epCacheInfo. Must use a non-locking variation or a deadlock may occur. --> Not vulnerable * pkg/datapath/loader/template.go e.realizeBPFState() -> -> CompileOrLoad -> ELFSubstitutions() -> elfVariableSubstitutions() -> CompileAndLoad() -> compileAndLoad() -> realizeBPFState() -> ReloadDatapath() -> -> ReloadDatapath() -> reloadHostDatapath() -> patchHostNetdevDatapath() -> ELFSubstitutions() -> elfVariableSubstitutions() Uses epInfoCache --> Not vulnerable * pkg/envoy/server.go e.updateNetworkPolicy() -> UpdateNetworkPolicy() The endpoint is locked. Must use a non-locking variation. --> Not vulnerable * pkg/hubble/parser/threefour/parser.go Decode() -> resolveEndpoint() --> Vulnerable Fixes: #11932 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
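A small Go sketch of the locking/non-locking accessor split that this kind of fix relies on: a locking getter for external callers and an "RLocked" variant for call paths that already hold the endpoint lock. The type and field names are illustrative:

```
package main

import (
	"fmt"
	"sync"
)

type Endpoint struct {
	mu       sync.RWMutex
	identity uint32
}

// GetIdentity takes the read lock and is safe to call without holding it.
func (e *Endpoint) GetIdentity() uint32 {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.getIdentityRLocked()
}

// getIdentityRLocked assumes the caller already holds e.mu, so call sites
// that hold the endpoint lock avoid a deadlock while unlocked call sites
// avoid a data race by using GetIdentity instead.
func (e *Endpoint) getIdentityRLocked() uint32 {
	return e.identity
}

func main() {
	e := &Endpoint{identity: 1234}
	fmt.Println(e.GetIdentity())
}
```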
bce1e4b defaults: Increase IPAM expiration to 10 minutes [ upstream commit 53a80b0057d739cf4f98b489191891a5b212279e ] It has been observed that 3 minutes is not sufficient. Endpoint creations are currently dependent on kubelet cancelling the request or scheduling a new one. The IPAM expiration may never occur before such an event happens as otherwise the IP is returned prematurely. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
ddd8fb1 agent: Cancel endpoint create request as it becomes obsolete [ upstream commit 27b0a88ba587491f2704e6194e6a5930293cf713 ] So far, the endpoint create request has relied on timing out eventually and checking liveness of the endpoint in various spots to then abort the creation. This has the disadvantage that if a blocking operation such as etcd interactions are delayed for a long time, kubelet may schedule a new pod creation attempt while the old endpoint is still being created, leading to parallel endpoint create events for the same pod. Establish a new map which keeps track of all endpoint create requests and: * Cancel any endpoint create request if an endpoint delete request for the same endpoint is being received. * Cancel any endpoint create request if a new endpoint create request for the same pod is being received. The new request will continue. In order to assist in troubleshooting of create endpoint related issues, the list of ongoing endpoint creations is printed in the debuginfo output. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
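A Go sketch of the bookkeeping described above: one cancellable entry per in-flight endpoint create, keyed by pod, so a newer create or a delete for the same pod can cancel the older request. The type and method names are illustrative, not the agent's:

```
package main

import (
	"context"
	"fmt"
	"sync"
)

type createRequests struct {
	mu      sync.Mutex
	pending map[string]context.CancelFunc // key: namespace/pod
}

// begin cancels any older request for the same pod and registers a new one.
func (c *createRequests) begin(parent context.Context, pod string) context.Context {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cancel, ok := c.pending[pod]; ok {
		cancel() // the newer create supersedes the old one
	}
	ctx, cancel := context.WithCancel(parent)
	c.pending[pod] = cancel
	return ctx
}

// cancel aborts an in-flight create, e.g. when a delete request arrives.
func (c *createRequests) cancel(pod string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cancel, ok := c.pending[pod]; ok {
		cancel()
		delete(c.pending, pod)
	}
}

func main() {
	reqs := &createRequests{pending: map[string]context.CancelFunc{}}
	ctx := reqs.begin(context.Background(), "default/nginx-1")
	reqs.cancel("default/nginx-1") // delete arrived: abort the create
	fmt.Println(ctx.Err())         // context canceled
}
```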
37b3a34 agent: Log endpoint create, patch, and delete requests [ upstream commit 31e2e1166b2c937c4150527d01c5de5a8b9c1f5a ] Add an info log message for all create, patch and delete requests. The volume should be low enough. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
2b99aec datapath: Only NOTRACK proxy return traffic going to Cilium datapath [ upstream commit 02b43fc85e602bceac61914d7117281768a5f079 ] [ Backporter's notes: Added extra "! -s $internal_ip" match to resolve conflict on older v1.7 branch. ] Proxy return traffic accessed via a k8s NodePort will not be routed back via Cilium bpf datapath, so such traffic needs to have possible reverse NAT applied. Setting NOTRACK prevented this. Fix this by setting NOTRACK only on packets heading back to the Cilium datapath (-o lxc+ and -o cilium_host). Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
92492d1 daemon: Clarify log msg how to use only TCP socket-lb [ upstream commit 6bd5b9c4e14621bc036c16d82ed4624110f5358c ] Fix: https://github.com/cilium/cilium/issues/11055 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
4cca0fd policy: Fix rule translation test flake [ upstream commit c54a76f7af13762b9191f50e96c083624bbe8cdf ] [ Backporter's notes: Minor conflict in imports ] The rule_translate_test.go was checking the output of an unsorted slice, which could occasionally fail with: FAIL: rule_translate_test.go:197: K8sSuite.TestGenerateToCIDRFromEndpoint rule_translate_test.go:228: c.Assert(string(rule.ToCIDRSet[0].Cidr), Equals, epIP1+"/32") ... obtained string = "10.1.1.2/32" ... expected string = "10.1.1.1/32" Fix it by implementing some basic sorting functions in the policy api package and using these in the offending tests. Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
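A sorted slice is the straightforward way to make such assertions order-independent. A minimal sketch of the idea (the `CIDRRule` type and field here are stand-ins, not the exact policy API types):

```
package main

import (
	"fmt"
	"sort"
)

// CIDRRule is a stand-in for the policy API type used in the test.
type CIDRRule struct {
	Cidr string
}

func main() {
	// Generated rules may come back in arbitrary order; sort before
	// asserting on individual elements so the test is deterministic.
	toCIDRSet := []CIDRRule{{Cidr: "10.1.1.2/32"}, {Cidr: "10.1.1.1/32"}}
	sort.Slice(toCIDRSet, func(i, j int) bool {
		return toCIDRSet[i].Cidr < toCIDRSet[j].Cidr
	})
	fmt.Println(toCIDRSet[0].Cidr) // 10.1.1.1/32
}
```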
3698c7c pkg/k8s: delete toCIDRSets for more than 2 endpoints [ upstream commit 90217c1aa758ea54ffa3aed92b4ec337e35d8eb3 ] In case a toServices rule selected a service that contained more than 1 endpoint, the generated rules could never be deleted. This can easily be reproduced by adding one more endpoint to the unit test of the deleteToCidrFromEndpoint function. Fixes: bae09f7cc960 ("Move ToCIDR gen logic to k8s package") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 10 June 2020, 09:27:27 UTC
8ccccd9 daemon: Add assertions for endpoint state metrics [ upstream commit 4a4a59b127c25abe5983fb0c7ba494cd9a4dd365 ] This commit adds new assertions on endpoint state metrics to the existing test cases. It helps validate that when an endpoint fails during creation, it will be deleted, and the bookkeeping of the endpoint state metrics is correct. Additionally, this commit increases the code coverage around the creation of endpoints when they fail. Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 23:59:34 UTC
c061871 endpoint: Improve tests to assert against metrics [ upstream commit ccde86d3b2ea068c03ee7503232fc48f89bb4c30 ] This commit augments the endpoint state transition unit tests to assert against incrementing / decrementing metrics. It also adds new tests to cover the new endpoint state, "invalid". Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 23:59:34 UTC
d5a78c5 endpoint: Add Invalid state [ upstream commit 0d6b7ade8d3fface0559ead13818c8b81692feab ] This commit adds a new Endpoint state called "invalid-endpoint". This state represents endpoints that failed during their creation due to invalid data. Previously, endpoints that failed during their creation due to invalid data were ignored and left running. This would cause leaking metrics as they'd likely be stuck in "waiting-for-identity" which can be alarming for the user. Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 23:59:34 UTC
4209ac4 metrics: Add getter for gauge values [ upstream commit 94f0b691c1b08ee6203e3170230230d0572f3a0b ] This is useful for observing gauge values in tests. Previously, only counter values had a getter. Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 23:59:34 UTC
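Reading a gauge's current value in a test is typically done by writing the metric into a `dto.Metric` and reading the value back. A small sketch of such a helper follows; it uses the standard Prometheus Go client and is not necessarily how Cilium's own getter is written.

```
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	dto "github.com/prometheus/client_model/go"
)

// getGaugeValue reads the current value of a gauge, which is handy
// for asserting on metrics in unit tests.
func getGaugeValue(g prometheus.Gauge) float64 {
	var m dto.Metric
	if err := g.Write(&m); err != nil {
		return 0
	}
	return m.GetGauge().GetValue()
}

func main() {
	g := prometheus.NewGauge(prometheus.GaugeOpts{Name: "endpoint_state_total"})
	g.Inc()
	g.Inc()
	g.Dec()
	fmt.Println(getGaugeValue(g)) // 1
}
```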
6f27f52 Retry on ciliumnode update/create [ upstream commit c9c77a2b1c8e165aa742cb123a1a1351cf50d4f4 ] [ Backporter's notes ] * Conflicts with several features introduced in v1.8 branch: * Azure IPAM * Kubernetes 1.18 client library rebase (context parameters) * pkg/node/types refactor Currently, we don't retry on updates/creates for the CiliumNode when the agent starts up. This is suboptimal because the operator and the agent can end up "fighting" each other since both of them create/update the CiliumNode resource. This can cause the agent to continually fail to start and prolong the time before a node is ready for use. Signed-off-by: Ashray Jain <ashrayj@palantir.com> Signed-off-by: Joe Stringer <joe@cilium.io> 05 June 2020, 07:41:27 UTC
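The shape of such a retry is a bounded loop with backoff rather than failing on the first create/update error. The following is only a generic sketch of that idea; the real code retries the CiliumNode create/update against the Kubernetes API.

```
package main

import (
	"errors"
	"fmt"
	"time"
)

// retry runs fn up to attempts times, sleeping between tries, instead
// of giving up on the first failed create/update.
func retry(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(delay)
		delay *= 2 // back off to avoid hammering the apiserver
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(5, 100*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("apiserver conflict") // simulated transient failure
		}
		return nil // e.g. the CiliumNode update succeeded
	})
	fmt.Println(err, calls)
}
```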
36a0729 eni: Use deep-copied object to prevent data race This fixes a potential data race as `n.resource` is a live pointer to an object. Fixes: 06bce43ab8 ("aws/eni: Fix race condition leading to overaggressive ENI allocation") Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 00:02:38 UTC
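The general remedy for a live shared pointer is to take a deep copy before handing the object to concurrent readers. A minimal, race-free sketch of that idea (the `Resource` type is hypothetical, standing in for the live CiliumNode object):

```
package main

import (
	"fmt"
	"sync"
)

// Resource stands in for the live object held by the ENI node.
type Resource struct {
	AvailableIPs []string
}

// DeepCopy returns an independent copy so concurrent readers never
// observe in-place mutations of the live object.
func (r *Resource) DeepCopy() *Resource {
	cp := &Resource{AvailableIPs: make([]string, len(r.AvailableIPs))}
	copy(cp.AvailableIPs, r.AvailableIPs)
	return cp
}

func main() {
	live := &Resource{AvailableIPs: []string{"10.0.0.1"}}

	// Copy first and hand only the copy to the concurrent reader;
	// later mutations of the live object cannot race with it.
	snapshot := live.DeepCopy()

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		fmt.Println(len(snapshot.AvailableIPs))
	}()

	live.AvailableIPs = append(live.AvailableIPs, "10.0.0.2")
	wg.Wait()
}
```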
a518327 eni: Fix data race when updating limits [ upstream commit cac8d0d8b5f3a7a3e2c12db4dac01fa5ea671b11 ] Fixes: ``` WARNING: DATA RACE Write at 0x00c0005b0750 by goroutine 308: runtime.mapassign_faststr() /usr/local/go/src/runtime/map_faststr.go:202 +0x0 github.com/cilium/cilium/pkg/aws/eni.UpdateLimitsFromUserDefinedMappings() /home/vagrant/go/src/github.com/cilium/cilium/pkg/aws/eni/limits.go:269 +0xdf github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestUpdateLimitsFromUserDefinedMappings() /home/vagrant/go/src/github.com/cilium/cilium/pkg/aws/eni/limits_test.go:47 +0x11d runtime.call32() /usr/local/go/src/runtime/asm_amd64.s:539 +0x3a reflect.Value.Call() /usr/local/go/src/reflect/value.go:321 +0xd3 gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1() /home/vagrant/go/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1() /home/vagrant/go/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9 ``` Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 00:02:38 UTC
373d11d eni: Fix data race around ENI map Fixes: ``` WARNING: DATA RACE Write at 0x00c0003e4450 by goroutine 81: github.com/cilium/cilium/pkg/aws/eni.(*Node).recalculateLocked() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node.go:235 +0x75 github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).resyncNode() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:261 +0xeb github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Resync.func1() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:309 +0x88 Previous read at 0x00c0003e4450 by goroutine 123: github.com/cilium/cilium/pkg/aws/eni.(*Node).maintainIpPool() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node.go:736 +0xa74 github.com/cilium/cilium/pkg/aws/eni.(*Node).MaintainIpPool() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node.go:776 +0x90 github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Update.func1() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:154 +0x8b github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter() /home/chris/code/cilium/cilium-backports/pkg/trigger/trigger.go:210 +0x4b9 Goroutine 81 (running) created at: github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Resync() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:308 +0x27c github.com/cilium/cilium/pkg/aws/eni.NewNodeManager.func1() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:111 +0x101 github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter() /home/chris/code/cilium/cilium-backports/pkg/trigger/trigger.go:210 +0x4b9 Goroutine 123 (running) created at: github.com/cilium/cilium/pkg/trigger.NewTrigger() /home/chris/code/cilium/cilium-backports/pkg/trigger/trigger.go:133 +0x23d github.com/cilium/cilium/pkg/aws/eni.(*NodeManager).Update() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager.go:149 +0x482 github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes() /home/chris/code/cilium/cilium-backports/pkg/aws/eni/node_manager_test.go:579 +0x9c8 runtime.call32() /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a reflect.Value.Call() /usr/lib/go/src/reflect/value.go:321 +0xd3 gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1() /home/chris/go/pkg/mod/gopkg.in/check.v1@v1.0.0-20180628173108-788fd7840127/check.go:781 +0xa0a gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1() /home/chris/go/pkg/mod/gopkg.in/check.v1@v1.0.0-20180628173108-788fd7840127/check.go:675 +0xd9 ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 00:02:38 UTC
ddfbc91 eni/node_manager_test: Add set up and tear down [ upstream commit 916826877d15798783a37221052dc4aa94b61380 ] In this test suite, the `metricsapi` is a global variable which is shared among all the `Test*` functions. It is possible that it becomes polluted over time during test execution. This is an attempt to resolve the following: ``` FAIL: node_manager_test.go:563: ENISuite.TestNodeManagerManyNodes node_manager_test.go:602: c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs) ... Error: Node node53 allocation mismatch. expected: 10 allocated: 18 node_manager_test.go:602: c.Errorf("Node %s allocation mismatch. expected: %d allocated: %d", s.name, minAllocate, node.Stats().AvailableIPs) ... Error: Node node59 allocation mismatch. expected: 10 allocated: 18 node_manager_test.go:617: c.Assert(metricsapi.AllocatedIPs("available"), check.Equals, numNodes*minAllocate) ... obtained int = 1016 ... expected int = 1000 ... OOPS: 17 passed, 1 FAILED ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 00:02:38 UTC
1decbdb eni: Fix data races around `Node.enis` [ upstream commit 699a8a51516d9ed97819878d95c79f618fa79227 ] This fixes data races found when running the unit-tests with `-race`. The mutex is to protect access to `n.enis`. ``` WARNING: DATA RACE Write at 0x00c0007b40e0 by goroutine 135: github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs() /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:430 +0xfa github.com/cilium/cilium/pkg/ipam.(*Node).recalculate() /home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:342 +0x7a github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:389 +0x88 Previous read at 0x00c0007b40e0 by goroutine 46: github.com/cilium/cilium/pkg/aws/eni.(*Node).findNextIndex() /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:292 +0x86 github.com/cilium/cilium/pkg/aws/eni.(*Node).CreateInterface() /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:339 +0x584 github.com/cilium/cilium/pkg/ipam.(*Node).createInterface() /home/chris/code/cilium/cilium/pkg/ipam/node.go:435 +0x290 github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool() /home/chris/code/cilium/cilium/pkg/ipam/node.go:628 +0x85d github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool() /home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82 github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter() /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9 Goroutine 135 (running) created at: github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:388 +0x2c5 github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101 github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter() /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9 Goroutine 46 (running) created at: github.com/cilium/cilium/pkg/trigger.NewTrigger() /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523 github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes() /home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a runtime.call32() /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a reflect.Value.Call() /usr/lib/go/src/reflect/value.go:321 +0xd3 gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1() /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1() /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9 ``` ``` WARNING: DATA RACE Write at 0x00c000110060 by goroutine 94: github.com/cilium/cilium/pkg/aws/eni.(*Node).ResyncInterfacesAndIPs() /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:439 +0xfa github.com/cilium/cilium/pkg/ipam.(*Node).recalculate() /home/chris/code/cilium/cilium/pkg/ipam/node.go:352 +0xfb github.com/cilium/cilium/pkg/ipam.(*NodeManager).resyncNode() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:342 +0x7a github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync.func1() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:389 +0x88 Previous read at 0x00c000110060 by goroutine 92: 
github.com/cilium/cilium/pkg/aws/eni.(*Node).PrepareIPAllocation() /home/chris/code/cilium/cilium/pkg/aws/eni/node.go:211 +0xefc github.com/cilium/cilium/pkg/ipam.(*Node).determineMaintenanceAction() /home/chris/code/cilium/cilium/pkg/ipam/node.go:542 +0x184 github.com/cilium/cilium/pkg/ipam.(*Node).maintainIPPool() /home/chris/code/cilium/cilium/pkg/ipam/node.go:578 +0x53 github.com/cilium/cilium/pkg/ipam.(*Node).MaintainIPPool() /home/chris/code/cilium/cilium/pkg/ipam/node.go:663 +0x82 github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func1() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:241 +0x8b github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter() /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9 Goroutine 94 (running) created at: github.com/cilium/cilium/pkg/ipam.(*NodeManager).Resync() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:388 +0x2c5 github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:168 +0x101 github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter() /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:206 +0x4b9 Goroutine 92 (running) created at: github.com/cilium/cilium/pkg/trigger.NewTrigger() /home/chris/code/cilium/cilium/pkg/trigger/trigger.go:129 +0x23d github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update() /home/chris/code/cilium/cilium/pkg/ipam/node_manager.go:236 +0x523 github.com/cilium/cilium/pkg/aws/eni.(*ENISuite).TestNodeManagerManyNodes() /home/chris/code/cilium/cilium/pkg/aws/eni/node_manager_test.go:593 +0x80a runtime.call32() /usr/lib/go/src/runtime/asm_amd64.s:539 +0x3a reflect.Value.Call() /usr/lib/go/src/reflect/value.go:321 +0xd3 gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1() /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xa0a gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1() /home/chris/code/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xd9 ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> 05 June 2020, 00:02:38 UTC
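The fixes in this group all come down to routing every read and write of the shared ENI state through a mutex, so the resync goroutine and the IP pool maintainer cannot race. A minimal sketch of that pattern, with illustrative names rather than Cilium's actual fields:

```
package main

import (
	"fmt"
	"sync"
)

// node sketches an RWMutex-guarded map in the spirit of Node.enis:
// all access goes through the lock.
type node struct {
	mu   sync.RWMutex
	enis map[string]int // ENI ID -> number of addresses, for example
}

func (n *node) updateENI(id string, addrs int) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.enis[id] = addrs
}

func (n *node) totalAddresses() int {
	n.mu.RLock()
	defer n.mu.RUnlock()
	total := 0
	for _, a := range n.enis {
		total += a
	}
	return total
}

func main() {
	n := &node{enis: map[string]int{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			n.updateENI(fmt.Sprintf("eni-%d", i), i)
		}(i)
	}
	wg.Wait()
	fmt.Println(n.totalAddresses())
}
```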
2758412 datapath: Accept proxy traffic [ upstream commit c4553a8bca8367a85ca1aa07343526c629f6143b ] The forward chain rules have depended on the local delivery interface which, depending on the setting of enable-endpoint-routes, is either `cilium_host` or `lxc+`. This is sufficient for all regular traffic. For proxy redirection traffic, all traffic still passes through cilium_host regardless of the value of enable-endpoint-routes. Example of existing rules: ``` -A CILIUM_FORWARD -o lxc+ -m comment --comment "cilium: any->cluster on lxc+ forward accept" -j ACCEPT -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept (nodeport)" -j ACCEPT -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept" -j ACCEPT ``` This problem was masked because Kubernetes would install these wide-reaching rules: ``` -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT ``` However, more recent versions of Kubernetes would install more fine-grained rules when the PodCIDR is known to the host: ``` -A KUBE-FORWARD -s 10.10.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT ``` This will still mask the problem if Cilium uses the PodCIDR for IP allocation. However, in case Cilium does not use the announced PodCIDR, these rules would no longer allow the proxy redirection traffic, causing proxy redirection to break. Fixes: #11235 Fixes: #11807 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
0aff78c iptables: de-duplicate code in (*IptablesManager).ciliumNoTrackXfrmRules [ upstream commit 7408fc5ef2b5ca2d887d55304c37d247f10a59f0 ] Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
7f26932 iptables: de-duplicate code for forward chain rules [ upstream commit 3088697661c9423a89cfed66d4a995c758285d47 ] Move the installation of the forward chain rules into a separate function which is called from (*IptablesManager).TransientRulesStart and (*IptablesManager).InstallRules. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
f90ba4c agent: Fix data race when accessing d.monitorAgent [ upstream commit 870894a8126469d1710f57631cf46021c31bb3de ] Fixes: ``` 2020-04-15T08:20:51.171924491Z WARNING: DATA RACE 2020-04-15T08:20:51.171929471Z Write at 0x00c0002d89c0 by main goroutine: 2020-04-15T08:20:51.171932068Z github.com/cilium/cilium/daemon/cmd.NewDaemon() 2020-04-15T08:20:51.171934565Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:521 +0x3811 2020-04-15T08:20:51.171937059Z github.com/cilium/cilium/daemon/cmd.(*Daemon).init() 2020-04-15T08:20:51.171939762Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:210 +0x4a4 2020-04-15T08:20:51.171942223Z github.com/cilium/cilium/daemon/cmd.NewDaemon() 2020-04-15T08:20:51.171944604Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:508 +0x2fa5 2020-04-15T08:20:51.171947111Z github.com/cilium/cilium/daemon/cmd.NewDaemon() 2020-04-15T08:20:51.171949522Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:411 +0x463e 2020-04-15T08:20:51.171951975Z github.com/cilium/cilium/daemon/cmd.runDaemon() 2020-04-15T08:20:51.171954378Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1194 +0x345 2020-04-15T08:20:51.171956805Z github.com/cilium/cilium/daemon/cmd.glob..func1() 2020-04-15T08:20:51.171959248Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:107 +0x108 2020-04-15T08:20:51.171961654Z github.com/cilium/cilium/daemon/cmd.initEnv() 2020-04-15T08:20:51.171964065Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1073 +0x2c56 2020-04-15T08:20:51.171966471Z github.com/cilium/cilium/daemon/cmd.glob..func1() 2020-04-15T08:20:51.171968921Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:105 +0xee 2020-04-15T08:20:51.171971339Z github.com/spf13/cobra.(*Command).execute() 2020-04-15T08:20:51.171973711Z /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:830 +0x8e0 2020-04-15T08:20:51.171983921Z github.com/spf13/cobra.(*Command).ExecuteC() 2020-04-15T08:20:51.17198653Z /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:914 +0x41a 2020-04-15T08:20:51.171989059Z github.com/spf13/cobra.(*Command).Execute() 2020-04-15T08:20:51.171991461Z /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:864 +0x251 2020-04-15T08:20:51.171993892Z github.com/cilium/cilium/daemon/cmd.Execute() 2020-04-15T08:20:51.171996301Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:138 +0x232 2020-04-15T08:20:51.171998765Z main.main() 2020-04-15T08:20:51.172001107Z /go/src/github.com/cilium/cilium/daemon/main.go:22 +0x2f 2020-04-15T08:20:51.172003564Z 2020-04-15T08:20:51.172005844Z Previous read at 0x00c0002d89c0 by goroutine 82: 2020-04-15T08:20:51.172008245Z github.com/cilium/cilium/daemon/cmd.(*Daemon).SendNotification() 2020-04-15T08:20:51.172010648Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:633 +0xc0 2020-04-15T08:20:51.172013085Z github.com/cilium/cilium/pkg/datapath/ipcache.(*BPFListener).notifyMonitor() 2020-04-15T08:20:51.172015524Z /go/src/github.com/cilium/cilium/pkg/datapath/ipcache/listener.go:113 +0x288 2020-04-15T08:20:51.172019826Z github.com/cilium/cilium/pkg/datapath/ipcache.(*BPFListener).OnIPIdentityCacheChange() 2020-04-15T08:20:51.172022388Z /go/src/github.com/cilium/cilium/pkg/datapath/ipcache/listener.go:142 +0x21e 2020-04-15T08:20:51.172024963Z github.com/cilium/cilium/pkg/ipcache.(*IPCache).Upsert() 2020-04-15T08:20:51.172027361Z /go/src/github.com/cilium/cilium/pkg/ipcache/ipcache.go:278 +0xaa0 2020-04-15T08:20:51.172029767Z 
github.com/cilium/cilium/daemon/cmd.(*Daemon).syncEndpointsAndHostIPs() 2020-04-15T08:20:51.17203218Z /go/src/github.com/cilium/cilium/daemon/cmd/datapath.go:275 +0xcd0 2020-04-15T08:20:51.172034647Z github.com/cilium/cilium/daemon/cmd.(*Daemon).init.func2() 2020-04-15T08:20:51.17203703Z /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:224 +0x41 2020-04-15T08:20:51.172039455Z github.com/cilium/cilium/pkg/controller.(*Controller).runController() 2020-04-15T08:20:51.172041885Z /go/src/github.com/cilium/cilium/pkg/controller/controller.go:205 +0xc71 ``` Fixes: #10987 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
bfdde04 connectivity-check: Do not perform hostport in standard check [ upstream commit d961b9dea5d8ed828b0028713b84a9b0f1f60404 ] Due to HostPort not being enabled by default, do not perform the check by default. Require the "connectivity-check-hostport.yaml" to be deployed. Fixes: #11563 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
6b5431a proxy: Do not decrement proxy port reference count when reverting. [ upstream commit 2a6198956dc22967b97af0437c3227af592ea25f ] Proxy port reference count is incremented only when an ACK has been received from all proxies in a specific policy update. If any of the proxies fail to ACK in time, the revert function is called. Proxy port reference counts must not be decremented at this time, as they have not been incremented yet. Fixes: #11637 Fixes: #6921 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
59e3523 etcd: Increase status check timeout to 10 seconds [ upstream commit 3430980860e68dbb92d7f0faa79a49d09b6e6786 ] If the node is under heavy CPU load, the status check may take longer than 5 seconds. Increase it to 10 seconds and also make it configurable with an environment variable. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
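A timeout with a default that can be overridden by an environment variable usually looks like the sketch below. The variable name used here is purely illustrative, not Cilium's actual knob.

```
package main

import (
	"fmt"
	"os"
	"time"
)

// statusCheckTimeout returns the etcd status check timeout, defaulting
// to 10s but allowing an override through an environment variable.
func statusCheckTimeout() time.Duration {
	const def = 10 * time.Second
	if v := os.Getenv("ETCD_STATUS_CHECK_TIMEOUT"); v != "" { // hypothetical variable name
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return def
}

func main() {
	fmt.Println(statusCheckTimeout())
}
```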
afb6e91 ipcache: Add unit tests for IP shadowing [ upstream commit d909b14c93abe77573df31aac59f7b1d8b41e883 ] These unit tests validate the bug fixed by the prior commit where entries dumped from the ipcache may not consistently map IPs to the correct Identity. Note that there is a potential Golang map dump ordering aspect to these tests so depending on the Go version used they may/may not consistently fail. They consistently fail for me prior to the fix (eg v1.8.0-rc1), and consistently pass with the fix, but YMMV. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
4f61bbe ipcache: Fix Dump precedence for overlapping IPs [ upstream commit a899305bce1f27596a30ef97a01acc9f824c4a82 ] Cilium defines a precedence ordering between sources for mappings from IP to Identity, defined in pkg/source/source.go. These are used to determine which Identity should be used for an IP when multiple sources provide conflicting reports of the Identity to be associated with an IP. This ordering was handled in the case of source.Generated CIDR/FQDN -sourced identities by inserting potentially two overlapping entries into the ipcache.IPIdentityCache when an endpoint's IP is the same as a CIDR mapping: * An endpoint Identity would be inserted with the key 'w.x.y.z', and * A CIDR Identity would be inserted with the key 'w.x.y.z/32'. (IPv6: /128) During Upsert() and Delete(), when overlapping entries existed in the map, this overlap would be resolved by directly checking whether another entry exists with/without the `/32` suffix (IPv6: /128) and either hiding the update from listeners (when a shadowed mapping is upserted), or converting the delete to an update (when a shadowing entry is deleted, revealing the underlying shadowed entry). During DumpToListenerLocked() however, this shadowing would not be resolved and instead both entries would be dumped to the caller in an arbitrary order. This is particularly notable on Linux 4.11 to 4.15 where LPM support is available but deletion is not supported. In these cases, Cilium periodically dumps the ipcache to a listener using this function, and populates the BPF ipcache map using this dump. Any time this dump occurs (default 5 minutes), it would be possible for the ipcache mapping to be modified to map the IP to either of the cached identities. Depending on the Go runtime in use by the version of Cilium, this may or may not consistently provide particular dump ordering. Resolve this issue by keeping track of shadowed entries with an explicit boolean field in the value of the map, and avoiding dumping such entries in the DumpToListenerLocked() function. Fixes: #11517 Reported-by: Will Deuschle <wdeuschle@palantir.com> Suggested-by: Jarno Rajahalme <jarno@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
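The core of the fix is that the shadowing relationship is recorded explicitly on the cached entry rather than being re-derived (inconsistently) at dump time. A simplified sketch of what "skip shadowed entries while dumping" means, with illustrative types only:

```
package main

import "fmt"

// entry mirrors the idea of tracking shadowed ipcache mappings with an
// explicit flag instead of relying on dump ordering.
type entry struct {
	identity uint32
	shadowed bool
}

func dump(cache map[string]entry) map[string]uint32 {
	out := map[string]uint32{}
	for prefix, e := range cache {
		if e.shadowed {
			continue // hidden by the preferred mapping for the same IP
		}
		out[prefix] = e.identity
	}
	return out
}

func main() {
	cache := map[string]entry{
		"10.0.0.1":    {identity: 42},                       // endpoint identity wins
		"10.0.0.1/32": {identity: 16777217, shadowed: true}, // CIDR identity is shadowed
	}
	fmt.Println(dump(cache)) // only the endpoint mapping is dumped
}
```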
6f35417 proxy: Never release proxy port for non-dynamic listeners [ upstream commit af9fd59b7ce0816794b9ab782e3b10f15d1df846 ] Due to an apparent reference counting problem, where the DNS redirect count reaches zero even though the reference count is set to one by SetProxyPort(), it is possible for the DNS proxy listening port to be reallocated and the corresponding datapath redirection rules changed to a new port, while the DNS proxy is incapable of changing its listening port. Fix this by marking a proxy port set via SetProxyPort() as static and adding corresponding conditions that prevent the release and reallocation of the proxy port even if the reference count reaches zero. Fixes: #11637 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 June 2020, 21:00:28 UTC
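The "static" flag described above effectively pins the port: the reference count may drop to zero, but release is suppressed. A minimal sketch of that behaviour, with hypothetical field names:

```
package main

import "fmt"

// proxyPort sketches a reference-counted port that is never released
// once marked static, mirroring the idea of pinning the DNS proxy's
// listening port.
type proxyPort struct {
	port     uint16
	refCount int
	static   bool
}

// release drops one reference and reports whether the port may now be
// reallocated. A static port stays reserved even at zero references,
// so the proxy never has to rebind to a different port.
func (p *proxyPort) release() bool {
	if p.refCount > 0 {
		p.refCount--
	}
	return p.refCount == 0 && !p.static
}

func main() {
	dns := &proxyPort{port: 10001, refCount: 1, static: true}
	fmt.Println(dns.release()) // false: the port is kept
}
```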