https://github.com/cilium/cilium

f909e32 Prepare v1.9.0-rc2 release Signed-off-by: André Martins <andre@cilium.io> 19 October 2020, 13:54:21 UTC
92d5d60 Prepare v1.9.0-rc2 release Signed-off-by: André Martins <andre@cilium.io> 19 October 2020, 11:38:54 UTC
289e999 cilium: fixup encrypt lib to pass non ipv4/v6 traffic up stack When moving encryption code into a library we made some errors. First, we created a path where we redirect non-encrypted data to cilium_ifindex. Usually this is fine except for the case where we need to use the iptables snat/dnat rules on the traffic and we bypass those by doing a BPF redirect. This manifested as a failure when running the cilium test-connectivity example yaml. The host to pod service IP was broken. Next, we also skipped doing data_pull in all cases, which causes some issues up the stack if we have not pulled the data in after popping the IP header off. Fixes: 9ed106a017c52 ("cilium: create lib for encryption") Signed-off-by: John Fastabend <john.fastabend@gmail.com> 19 October 2020, 08:52:43 UTC
2e0539a loader: Check if device has BPF prog before trying to detach it When running Cilium with the devices set, but neither kube-proxy-free mode nor the host firewall enabled, cilium-agent will attempt to remove the previous BPF program from the native devices' egress tc hook. If no program is attached there (i.e., Cilium was running without host firewall and kube-proxy replacement before restarting), the removal will fail with the following errors. level=error msg="Command execution failed" cmd="[tc filter delete dev enp0s8 egress]" error="exit status 2" subsys=datapath-loader level=warning msg="Error: Parent Qdisc doesn't exists." subsys=datapath-loader level=warning msg="We have an error talking to the kernel, -1" subsys=datapath-loader This commit fixes this error by first checking that the device has a BPF program attached on egress. I tested this in the dev. VM, by first starting Cilium with our kube-proxy-replacement, then without, but keeping the devices configuration. Fixes: a695f53 ("Endpoint for host") Signed-off-by: Paul Chaignon <paul@cilium.io> 19 October 2020, 08:29:27 UTC
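A minimal Go sketch of the "check before detach" idea from the commit above: inspect the output of `tc filter show dev <dev> egress` and only attempt the delete if a BPF filter is actually attached. The helper name and the parsing are illustrative, not Cilium's actual loader code.

```go
package main

import (
	"fmt"
	"strings"
)

// hasBPFFilter reports whether the output of
// `tc filter show dev <dev> egress` mentions a BPF filter.
// A device with no filters produces empty output, so running
// `tc filter delete` there fails with "Parent Qdisc doesn't exists."
func hasBPFFilter(tcOutput string) bool {
	for _, line := range strings.Split(tcOutput, "\n") {
		if strings.Contains(line, "bpf") {
			return true
		}
	}
	return false
}

func main() {
	attached := "filter protocol all pref 1 bpf chain 0 handle 0x1 bpf_host.o:[to-netdev]"
	fmt.Println(hasBPFFilter(attached)) // true: safe to delete
	fmt.Println(hasBPFFilter(""))       // false: skip the delete
}
```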
ee7e681 docs/performance: update scripts repo and tf ver Renamed the scripts repo to `cilium-perf-networking.git` to make it clearer and less annoying for people who have to check it out. While we are at it, add a note about the Terraform version that we expect. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> 19 October 2020, 08:28:47 UTC
290d9e9 daemon: Init endpoint queue during validation This commit fixes the following errors: ``` level=error msg="Unable to enqueue endpoint policy visibility event" containerID=9f680a5847 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3479 error="unable to Enqueue event" identity=8771 ipv4=10.116.2.10 ipv6= k8sPodName=cilium-monitoring/grafana-6d49bd9ff7-s8zsd subsys=endpoint ``` These errors occurred because during endpoint validation (when the endpoint is being restored), its event queue has not been initialized yet. Once the endpoint is eventually exposed to the endpoint manager (after restoration), it will begin processing the events off the queue. Signed-off-by: Chris Tarazi <chris@isovalent.com> 19 October 2020, 08:27:38 UTC
79bf425 endpoint: Add function to initialize event queue This is useful during endpoint validation when endpoints are being restored. When they are being restored, their event queue is not yet initialized because they haven't been exposed to the endpoint manager. It is important to initialize an endpoint's event queue so that events are not missed during their restoration. Signed-off-by: Chris Tarazi <chris@isovalent.com> 19 October 2020, 08:27:38 UTC
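The two commits above describe an enqueue-before-init failure. A minimal Go sketch of the pattern (the types and method names are illustrative stand-ins, not Cilium's endpoint API): enqueueing fails until the queue is explicitly initialized, which is why restoration must initialize it before events arrive.

```go
package main

import (
	"errors"
	"fmt"
)

// eventQueue is a toy stand-in for the endpoint event queue.
type eventQueue struct {
	events chan string
}

type endpoint struct {
	queue *eventQueue
}

// InitEventQueue mirrors the new helper's idea: give the endpoint a
// queue before it is exposed to the endpoint manager.
func (e *endpoint) InitEventQueue() {
	e.queue = &eventQueue{events: make(chan string, 16)}
}

// Enqueue fails when the queue was never initialized, which is the
// "unable to Enqueue event" error seen during restoration.
func (e *endpoint) Enqueue(ev string) error {
	if e.queue == nil {
		return errors.New("unable to Enqueue event")
	}
	e.queue.events <- ev
	return nil
}

func main() {
	ep := &endpoint{}
	fmt.Println(ep.Enqueue("policy-visibility")) // error before init
	ep.InitEventQueue()
	fmt.Println(ep.Enqueue("policy-visibility")) // nil after init
}
```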
fad24e6 helm: Enable mTLS for Hubble in experimental-install.yaml This will deploy the cilium/certgen container as a Kubernetes Job and CronJob to generate the mTLS certificates needed for Hubble Relay. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 October 2020, 17:36:44 UTC
6c8b369 helm: Add K8s Job and CronJob for Hubble mTLS certificate (re)generation This adds a new method to automatically generate mTLS certificates for Hubble at run-time without relying on Helm. This is useful, for example, for pregenerated YAML manifests (e.g. `experimental-install.yaml`), which with the Helm-based approach would re-use the same set of certificates for all installations. This new approach generates the certificates at run-time instead. When `hubble.tls.auto.method` is set to `cronJob`, the generated YAML will create a Kubernetes Job to generate the certificates at installation time. The job runs in the host namespace and will store the certificates as Kubernetes secrets. Once the secrets are created, Kubernetes will automatically mount the generated certificates into the already running Cilium pod, allowing it to pick them up and start serving the Hubble API with mTLS (c.f. #13249). In addition to the one-shot job that generates the initial set of certificates, a recurring Kubernetes CronJob is deployed to regenerate the certificates at a regular interval (regardless of the expiration date). This cronjob can be optionally disabled in Helm by unsetting `hubble.tls.auto.cronJob.schedule`. The source code for the cron job docker image can be found at https://github.com/cilium/certgen. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 October 2020, 17:36:44 UTC
05034a1 helm: Make Hubble mTLS secret/configmap mounts optional Since #13249, if Hubble is configured to serve the API via mTLS, it is able to delay starting the server until the certificates are present on the file system. As such, we can now mark the mounts for the certificates as optional, as we can already start the Cilium agent without the Hubble server. The Hubble server will be started automatically once the TLS secrets are ready and the files are mounted by Kubernetes. The current Hubble server implementation prints an info log message if the certificate files are not present after 30 seconds. This allows users to troubleshoot if the Hubble server is not started due to missing certificates. The mounts are still required for Hubble Relay. In contrast to cilium-agent, it cannot operate without the certificates present. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 16 October 2020, 17:36:44 UTC
535f501 Fix extraction of manifest for OpenShift Since Cilium subcharts were refactored into a single large chart, the layout of `helm template` output also changed. This fixes the OpenShift instructions to work with the new layout. Fixes: e9cb43c03179 (Helm: full refactor of helm charts, default values implemented, tests updated, kind cni integration) Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 16 October 2020, 15:35:06 UTC
be0cabe build: Fix CC for CGO compilation for Arm Setting of CC environment variable was accidentally dropped and broke the build. This change re-introduces CC for Arm target. Fixes: #13551 (implement multiple platform architecture docker images for cilium and hubble-relay) Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 16 October 2020, 15:02:07 UTC
3ff45d2 images: Fix handling of git tags (cilium/image-tools#72) Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 16 October 2020, 14:25:33 UTC
9aa5117 implement multiple platform architecture docker images for cilium and hubble-relay Signed-off-by: Michael Fornaro <20387402+xUnholy@users.noreply.github.com> refactor Signed-off-by: Michael Fornaro <20387402+xUnholy@users.noreply.github.com> 16 October 2020, 13:13:17 UTC
6e0ddd3 docs: Update CI documentation following Helm refactoring Fixes: e9cb43c ("Helm: full refactor of helm charts, default values implemented, tests updated, kind cni integration") Signed-off-by: Paul Chaignon <paul@cilium.io> 16 October 2020, 08:07:28 UTC
2283103 cli: Add cilium bpf lb maglev get $SVC_ID Maglev lookup tables can be HUGE. So, instead of dumping them all, give a tool to users to dump an individual table by service ID (aka revNAT ID). Signed-off-by: Martynas Pumputis <m@lambda.lt> 16 October 2020, 06:40:16 UTC
778ed6a pkg/azure/ipam: fix data race in (*Node).PopulateStatusFields Fixes the following data race:
```
WARNING: DATA RACE
Write at 0x00c000358270 by goroutine 57:
  github.com/cilium/cilium/pkg/azure/ipam.(*InstancesManager).Resync()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/azure/ipam/instances.go:98 +0x933
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).instancesAPIResync()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/ipam/node_manager.go:186 +0x8b
  github.com/cilium/cilium/pkg/ipam.NewNodeManager.func1()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/ipam/node_manager.go:168 +0x8f
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/trigger/trigger.go:206 +0x5cf

Previous read at 0x00c000358270 by goroutine 250:
  github.com/cilium/cilium/pkg/azure/ipam.(*Node).PopulateStatusFields()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/azure/ipam/node.go:51 +0x11a
  github.com/cilium/cilium/pkg/ipam.(*Node).syncToAPIServer()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/ipam/node.go:716 +0x232
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update.func4()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/ipam/node_manager.go:305 +0x75
  github.com/cilium/cilium/pkg/trigger.(*Trigger).waiter()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/trigger/trigger.go:206 +0x5cf

Goroutine 57 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/trigger/trigger.go:129 +0x2c4
  github.com/cilium/cilium/pkg/ipam.NewNodeManager()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/ipam/node_manager.go:163 +0x356
  github.com/cilium/cilium/pkg/azure/ipam.(*IPAMSuite).TestIpamManyNodes()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/azure/ipam/ipam_test.go:319 +0x52e
  runtime.call32()
      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/runtime/asm_amd64.s:540 +0x3d
  reflect.Value.Call()
      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 +0xd8
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/travis/gopath/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xabb
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/travis/gopath/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xe1

Goroutine 250 (running) created at:
  github.com/cilium/cilium/pkg/trigger.NewTrigger()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/trigger/trigger.go:129 +0x2c4
  github.com/cilium/cilium/pkg/ipam.(*NodeManager).Update()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/ipam/node_manager.go:300 +0xfe5
  github.com/cilium/cilium/pkg/azure/ipam.(*IPAMSuite).TestIpamManyNodes()
      /home/travis/gopath/src/github.com/cilium/cilium/pkg/azure/ipam/ipam_test.go:345 +0xbcd
  runtime.call32()
      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/runtime/asm_amd64.s:540 +0x3d
  reflect.Value.Call()
      /home/travis/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 +0xd8
  gopkg.in/check%2ev1.(*suiteRunner).forkTest.func1()
      /home/travis/gopath/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:781 +0xabb
  gopkg.in/check%2ev1.(*suiteRunner).forkCall.func1()
      /home/travis/gopath/src/github.com/cilium/cilium/vendor/gopkg.in/check.v1/check.go:675 +0xe1
```
Closes #13580 Fixes: b89979a697be ("ipam: Move instance ID retrieval into generic code") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 16 October 2020, 06:38:42 UTC
7295879 build(helm): Add makefile target to generate helm README.md This commit adds a Makefile target to generate the helm README.md from values.yaml and a template file. A GitHub Action job is added that fails the build if there are any dirty changes. Closes #13583 Signed-off-by: Tam Mach <sayboras@yahoo.com> 16 October 2020, 06:35:49 UTC
d749185 helm: Update README.md for helm chart This commit updates README.md from values.yaml as per the latest helm chart. Relates #13583 Signed-off-by: Tam Mach <sayboras@yahoo.com> 16 October 2020, 06:35:49 UTC
3f765be Improve policy documentation * Add more information in the L7 HTTP example * Add pointers to the L7 examples Signed-off-by: Manuel Buil <mbuil@suse.com> 15 October 2020, 23:10:22 UTC
b0a7e08 Adding affinity for the operator Signed-off-by: Youssef Azrak <yazrak.tech@gmail.com> 15 October 2020, 22:59:20 UTC
4ab0161 datapath/linux: return errors when unable to setup encrypt interface When unable to setup the encryption interface we should error out prematurely so the user will be aware of the possible errors with the encryption interface set. Fixes: f8ca446c07e1 ("cilium: Add option ipv*-pod-subnets to enable chaining + encryption") Signed-off-by: André Martins <andre@cilium.io> 15 October 2020, 22:45:41 UTC
8c3b56d helm: Remove unused serviceAccount values PR #13259 made the service account configuration a top level value (`.Values.serviceAccounts`). This commit removes the now unused service accounts under `hubble.ui.serviceAccount` and `hubble.relay.serviceAccount`, which are supposed to be configured via `serviceAccounts.ui` and `serviceAccounts.relay`. In addition, also fixes the service account value in the Hubble UI cluster role and cluster role binding. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 15 October 2020, 22:45:06 UTC
22c1a27 datapath: Avoid loops with local-redirect service translation - Service translation logic for a local-redirect service can cause packets to be looped back to a service node-local backend after translation. This can happen when the node-local backend itself tries to connect to the service frontend for which it acts as a backend. There are cases where this can break traffic flow if the backend needs to forward the redirected traffic to the actual service frontend. Hence, allow service translation for pod traffic getting redirected to a backend (across namespaces), but skip service translation for a backend connecting to itself or to another service backend within the same namespace. - Wrap the sk_lookup_* calls in #ifdef so that they'll be skipped on older kernels that don't support them. Also, disable this code path for IPv6 (specifically, IPv4-mapped IPv6 addresses). For example, a local-redirect service exists with a frontend <169.254.169.254, 80> and backend be1. When traffic destined to the frontend originates from be1 in namespace ns1 (host ns in the case of hostNetwork pods or a child ns for regular pods), and be1 is selected as the backend, the traffic would get looped back to be1. Identify such cases by doing a socket lookup for be1 <ip, port> in its namespace, ns1, and skip service translation. Testing done: Verified the desired behavior with the LRP backend deployed in hostNetwork mode. Created the following LRP on an EKS cluster running kiam proxy pods:
```
apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
  name: "lrp-addr"
spec:
  redirectFrontend:
    addressMatcher:
      ip: "169.254.169.254"
      toPorts:
        - port: "80"
          protocol: TCP
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        type: proxy
    toPorts:
      - port: "8181"
        protocol: TCP
```
```
cilium service list
ID   Frontend             Service Type    Backend
4    169.254.169.254:80   LocalRedirect   1 => 192.168.85.200:8181

kubectl exec nginx-server-2 -- curl -s -w "\n" -X GET http://169.254.169.254/latest/meta-data/iam/security-credentials/
```
Service translation done for pod traffic:
```
cilium monitor --from 1228
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
-> stack flow 0xbdbbfc09 identity 7679->31968 state new ifindex 0 orig-ip 0.0.0.0: 192.168.114.69:52880 -> 192.168.99.244:8181 tcp SYN
-> endpoint 1228 flow 0xda3b5569 identity 31968->7679 state reply ifindex 0 orig-ip 192.168.99.244: 192.168.99.244:8181 -> 192.168.114.69:52880 tcp SYN, ACK
```
Service translation skipped for kiam pod backend traffic:
```
06:49:49.848156 Out 02:91:74:9f:76:8d ethertype IPv4 (0x0800), length 76: 192.168.33.99.41746 > 169.254.169.254.80: Flags [S], seq 3745277117, win 62727, options [mss 8961,sackOK,TS val 3718432708 ecr 0,nop,wscale 7], length 0
06:49:49.848232 In 02:cc:a8:69:5f:89 ethertype IPv4 (0x0800), length 76: 169.254.169.254.80 > 192.168.33.99.41746: Flags [S.], seq 3888998160, ack 3745277118, win 63196, options [mss 9040,sackOK,TS val 1696374119 ecr 3718432708,nop,wscale 7], length 0
```
Signed-off-by: Aditi Ghag <aditi@cilium.io> 15 October 2020, 21:56:14 UTC
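A heavily simplified Go sketch of the loop-avoidance decision described above. The real check is a BPF socket lookup for the backend tuple in the sender's namespace; here it is reduced to "is the sender itself a node-local backend of this local-redirect service?" All names are illustrative.

```go
package main

import "fmt"

// skipLRPTranslation models (very loosely) the datapath decision:
// if the connecting pod is itself a node-local backend of the
// local-redirect service, skip translation so the packet is not
// looped back to it. The real implementation uses sk_lookup_* in
// the sender's network namespace instead of an IP comparison.
func skipLRPTranslation(srcIP string, backendIPs []string) bool {
	for _, be := range backendIPs {
		if srcIP == be {
			return true
		}
	}
	return false
}

func main() {
	backends := []string{"192.168.85.200"} // e.g. the kiam proxy pod
	// Backend connecting to its own frontend: skip translation.
	fmt.Println(skipLRPTranslation("192.168.85.200", backends))
	// Regular pod traffic: translate to the local backend.
	fmt.Println(skipLRPTranslation("192.168.114.69", backends))
}
```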
bf16257 Azure IPAM: option to ignore primary addresses Cilium Azure IPAM allocates only secondary (in the Azure sense) IPConfigs, and must be provided with an interface that has a primary IPConfiguration ready. Nodes starting with a primary IPConfig will be served that IPConfig's address and its subnet's route by Azure DHCP (as opposed to non-primary IPConfigurations, which are managed by third-party tools like Cilium). Unless explicitly configured to disable DHCP on that interface, hosts will then be configured to use that IP and route. Assuming a host with several interfaces, by placing that primary IP address into its IPAM pool, Cilium can allocate it to a pod (or e.g. the cilium_host interface), and install rules and routes which will make the pod preempt inbound traffic for that address. This also implies traffic from that interface's subnet won't reach the host (or host-networked pods), but the pod instead (and the host can't exchange with other pods or hosts in that subnet). The requirement to provide a pre-existing interface makes misconfiguration more likely on Azure (than e.g. on AWS), and requires some non-intuitive host setup. Disabling primary addr usage by default would be a regression, so inappropriate for a 1.8 backport, hence the added flag. Signed-off-by: Benjamin Pineau <benjamin.pineau@datadoghq.com> 15 October 2020, 19:46:51 UTC
58d352f Revert "Differentiate UDP and TCP Protocols in Services" This reverts commit 107b0f5616cdc07204dc5c212e4e9de1b48aa745. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 15 October 2020, 19:08:51 UTC
4ac7670 identity: Fix potential data races and nil pointer panics The local and global allocators are initialized asynchronously in `InitIdentityAllocator`. Therefore we must use the initialization channels as a synchronization point before we can check for `m.{localIdentities,IdentityAllocator} != nil`. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 15 October 2020, 17:26:27 UTC
005291c identity: Fix nil pointer panic in LookupIdentityByID Because the identity allocator is initialized asynchronously via `InitIdentityAllocator`, the local identity allocator might not have been initialized yet when the lookup functions are called. This can cause nil pointer panics, as observed in #13479. Before b194612c004c3e69289286e9a35d337b2645fc50, this nil pointer panic could not occur in `LookupIdentityByID` as the function checked for `m.IdentityAllocator != nil`, which also implies `m.localIdentities != nil`. This commit adds an explicit check for `m.localIdentities` and fixes a potential data race by checking the initialization channels before accessing `m.localIdentities` or `m.IdentityAllocator`. Fixes: #13479 Fixes: b194612c004c ("identity: Avoid kvstore lookup for local identities") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 15 October 2020, 17:26:27 UTC
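A minimal Go sketch of the "check the init channel before touching the field" pattern from the two identity commits above (the type and field names are illustrative, not Cilium's actual allocator):

```go
package main

import "fmt"

// allocator models asynchronous initialization: accessors must use
// the init channel as a synchronization point before touching
// localIdentities, otherwise they may observe nil and panic.
type allocator struct {
	localIdentitiesInitialized chan struct{}
	localIdentities            map[int]string
}

func newAllocator() *allocator {
	return &allocator{localIdentitiesInitialized: make(chan struct{})}
}

// initLocal is what the async InitIdentityAllocator step would do.
func (m *allocator) initLocal() {
	m.localIdentities = map[int]string{1: "local"}
	close(m.localIdentitiesInitialized)
}

// lookupByID returns ("", false) instead of panicking when the
// allocator has not been initialized yet.
func (m *allocator) lookupByID(id int) (string, bool) {
	select {
	case <-m.localIdentitiesInitialized:
		v, ok := m.localIdentities[id]
		return v, ok
	default:
		return "", false // not initialized: no lookup possible yet
	}
}

func main() {
	m := newAllocator()
	fmt.Println(m.lookupByID(1)) // miss before init, no panic
	m.initLocal()
	fmt.Println(m.lookupByID(1)) // hit after init
}
```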
6fbcdae pkg/k8s: mark unused 'k8s-watcher-queue-size' flag for removal Signed-off-by: André Martins <andre@cilium.io> 15 October 2020, 17:08:20 UTC
2f48782 Create healthz HTTP endpoint for kube-proxy replacement This PR adds code that creates a healthz HTTP endpoint for kube-proxy replacement. It also provides an option to configure the bind address through cilium-agent and helm charts. Fixes: #10621 Signed-off-by: Swaminathan Vasudevan <svasudevan@suse.com> Co-authored-by: Valas Valancius <valas@google.com> 15 October 2020, 16:31:26 UTC
ddc2984 CODEOWNERS: change docs to docs-structure Rename team docs to docs-structure. Signed-off-by: André Martins <andre@cilium.io> 15 October 2020, 16:28:31 UTC
598090c pkg/policy: ignore test mutex comparison Comparing both mutexes causes the race detector to have false positives. As a workaround we can assign a new mutex before performing the comparison of both structures as mutexes are not relevant for the validity of the test. Signed-off-by: André Martins <andre@cilium.io> 15 October 2020, 14:42:09 UTC
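The workaround above, assigning fresh mutexes before comparing structures in a test, can be sketched in Go as follows (the `policyCache` type is a made-up stand-in; only the technique matches the commit):

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// policyCache is an illustrative struct embedding a mutex, like the
// policy structures compared in the test.
type policyCache struct {
	mu    sync.RWMutex
	rules map[string]int
}

// equalForTest assigns fresh mutexes to both structures before
// comparing them. This keeps the race detector from flagging reads
// of lock state touched by other goroutines, and mutex state is
// irrelevant to the validity of the test anyway.
func equalForTest(a, b *policyCache) bool {
	a.mu = sync.RWMutex{}
	b.mu = sync.RWMutex{}
	return reflect.DeepEqual(a, b)
}

func main() {
	a := &policyCache{rules: map[string]int{"allow": 1}}
	b := &policyCache{rules: map[string]int{"allow": 1}}
	fmt.Println(equalForTest(a, b)) // true: rules match
}
```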
b34e5d8 service: Use initNextID in acquireLocalID() When rolling over, it should use initNextID instead of FirstFreeServiceID, which doesn't belong to the IDAllocator. This would create problems if FirstFreeServiceID and FirstFreeBackendID have different values although now they happen to be the same. Fixes: ab9cf4ba4206 ("service: Make local ID allocator more service agnostic") Signed-off-by: Han Zhou <hzhou8@ebay.com> 15 October 2020, 13:31:35 UTC
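A small Go sketch of the rollover fix described above: the allocator wraps back to its own `initNextID` rather than a global constant like `FirstFreeServiceID`, so service and backend allocators stay correct even if their ranges diverge. This is a simplified model, not Cilium's `IDAllocator`.

```go
package main

import "fmt"

type idAllocator struct {
	initNextID uint32 // first ID this allocator may hand out
	maxID      uint32
	nextID     uint32
	used       map[uint32]bool
}

func newIDAllocator(first, max uint32) *idAllocator {
	return &idAllocator{initNextID: first, maxID: max, nextID: first, used: map[uint32]bool{}}
}

// acquireLocalID rolls over to the allocator's own initNextID (the
// bug was rolling over to an unrelated global constant instead).
func (a *idAllocator) acquireLocalID() (uint32, bool) {
	for i := uint32(0); i <= a.maxID-a.initNextID; i++ {
		id := a.nextID
		a.nextID++
		if a.nextID > a.maxID {
			a.nextID = a.initNextID // wrap within our own range
		}
		if !a.used[id] {
			a.used[id] = true
			return id, true
		}
	}
	return 0, false // range exhausted
}

func main() {
	a := newIDAllocator(10, 12)
	for i := 0; i < 3; i++ {
		id, _ := a.acquireLocalID()
		fmt.Println(id)
	}
	_, ok := a.acquireLocalID()
	fmt.Println(ok) // false once 10..12 are all taken
}
```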
cbe9182 Update Go to 1.15.3 Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 15 October 2020, 13:07:44 UTC
09619ae helm: improve hubble related config documentation in values file Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 15 October 2020, 09:50:41 UTC
07ac1ff api-limiter: Make auto adjust test less flaky Fixes: #13508 The auto adjust unit test relies on time.Sleep to schedule multiple requests. The existing sleep times were close to the limit for the test to pass. This commit shortens the times so the test is less sensitive to delays and more likely to pass. The use of a fake clock in the tests, e.g. github.com/jonboulle/clockwork, to make the tests both reliable and more accurate was investigated. However, the existing code builds on golang.org/x/time/rate which itself does not have support for a fake clock in tests, so golang.org/x/time/rate would need to be updated first. Signed-off-by: Tom Payne <tom@isovalent.com> 15 October 2020, 08:45:02 UTC
cd2cff8 api-limiter: Add missing type declaration Signed-off-by: Tom Payne <tom@isovalent.com> 15 October 2020, 08:45:02 UTC
08db0ce lock: fix data race in (*SemaphoredMutexSuite).TestParallelism() In this case it's fine to use the global source (which is concurrency-safe) as it is seeded with time.Now().UnixNano(). Closes #13569 Fixes: fac5ddea5007 ("rand: add and use concurrency-safe PRNG source") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 15 October 2020, 08:27:16 UTC
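The fix above leans on the fact that Go's top-level math/rand functions serialize access to a global source, so they are safe to call from many goroutines; only a shared `*rand.Rand` built with `rand.New(rand.NewSource(...))` races. A small demonstration:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// parallelDraws calls the top-level math/rand functions from n
// goroutines. These share a lock-protected global source, so
// concurrent use is race-free; each goroutine also writes to its
// own slice index, so the slice writes don't race either.
func parallelDraws(n int) []int {
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = rand.Intn(100) // safe: global, locked source
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(len(parallelDraws(8)), "concurrent draws completed")
}
```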
8754e59 k8s: install all CRDs regardless of options used To avoid complexity in the daemon, the operator should install all CRDs regardless of the options used in the operator. This prevents Cilium from stopping its initialization while waiting for the CRD "ciliumidentities.cilium.io". Signed-off-by: André Martins <andre@cilium.io> 15 October 2020, 08:22:00 UTC
a73cb4d k8s: Add ability to watch for full CRD object This fixes the CRD controller to work for K8s 1.14 and below. Fixes: https://github.com/cilium/cilium/issues/13498 Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
c9474ac k8s: Add capability flag for watching metav1.POM The ability to watch metav1.PartialObjectMetadata (or POM) and metav1.Table was introduced in K8s 1.15. [1] This is relevant because our CRD controller attempts to fetch CRDs in the cluster efficiently by requesting the CRD in a POM object. The CRD controller does this in order to avoid requesting the full object, which may contain a large validation schema and other irrelevant fields. This is important because in large-scale environments, all agents will request all the CRDs at once which will put unnecessary load on the apiserver. However, we cannot perform this request at all on versions of K8s 1.14 and below. Therefore, we must fall back to requesting the full CRD object. Hence, this commit allows us to check whether the apiserver supports this action, so that we can efficiently request CRDs on versions that do support it. [1]: KEP: https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190322-server-side-get-to-ga.md#goals [2]: PR: https://github.com/kubernetes/kubernetes/pull/71548 Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
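The capability decision above boils down to a version gate. A minimal Go sketch of that gate, with a made-up helper name (the real code derives capabilities from the detected apiserver version rather than a bare major/minor pair):

```go
package main

import "fmt"

// supportsPartialObjectMetadata reports whether an apiserver of the
// given version can serve metav1.PartialObjectMetadata, which is
// available since K8s 1.15. On older versions the caller must fall
// back to fetching the full CRD object.
func supportsPartialObjectMetadata(major, minor int) bool {
	return major > 1 || (major == 1 && minor >= 15)
}

func main() {
	fmt.Println(supportsPartialObjectMetadata(1, 14)) // false: full CRD fetch
	fmt.Println(supportsPartialObjectMetadata(1, 19)) // true: POM fetch
}
```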
ee603b9 daemon: Fix receive on K8s cache channel in init The channel should only be received from if K8s is enabled, otherwise it will block forever. Once the K8s resources are synced, the channel will be closed and unblocks. Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
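A tiny Go sketch of the fix above: only receive from the sync channel when K8s is enabled, since a channel that is never closed would block the receiver forever. The function and channel names are illustrative.

```go
package main

import "fmt"

// waitForCacheSync blocks on the channel only when K8s is enabled.
// With K8s disabled the channel is never closed, so receiving from
// it unconditionally would block the daemon's init forever.
func waitForCacheSync(k8sEnabled bool, cachesSynced <-chan struct{}) string {
	if !k8sEnabled {
		return "skipped: k8s disabled"
	}
	<-cachesSynced // closed once all K8s resources are synced
	return "synced"
}

func main() {
	synced := make(chan struct{})
	close(synced) // simulate a completed resource sync
	fmt.Println(waitForCacheSync(true, synced))
	fmt.Println(waitForCacheSync(false, nil)) // returns immediately
}
```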
e91f803 daemon, k8s: Make agent wait for CRDs Previously, cilium-operator was responsible for waiting for the CRDs to be registered because the agent was registering them. Now the roles have been inverted. This commit implements a new CRD controller that will fetch the CRDs efficiently, by requesting them as metav1.PartialObjectMetadata. These objects contain the minimal metadata to identify the object, without containing the full validation schema, for example, in the case of CRDs. This is important because moving the waiting to the agent, from cilium-operator, means that there are many more agents that will hit the K8s apiserver upon startup than cilium-operators would, due to cilium-agent being a DaemonSet. The efficient retrieval of the CRDs will help minimize the load on the apiserver when the agents start up. This commit effectively reverts 8b4b01009 ("operator: Revert waiting for CRDs"), but introduces the functionality in a completely different manner. In addition, this commit reorders the agent (daemon) initialization steps to first wait until all Cilium CRDs have been registered. It is important to perform this first before the CiliumNode and other node initialization code runs, as specific node information is needed when the Services and Pods controllers begin. Once the node initialization code is run, then we can kick off all the K8s watchers (controllers). This also prevents the agent from blocking its restart while it's waiting for the K8s subsystem to be initialized. Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
bbdedd3 k8s: Allow hasSynced func to be passed directly A hasSynced function is part of a controller which returns a boolean indicating whether it has synced all the K8s resources it was watching to its cache. This is a preparatory commit for a future commit that will create a CRD controller with a custom hasSynced function, rather than depending on the underlying K8s controller implementation. The reason is that we needed to customize how the CRD controller retrieves CRDs for efficiency purposes, which is important in large-scale environments. Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
a18159b k8s: Fix variable name of CLRP CRD This is a cosmetic change. Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
1e13ef0 k8s: Define and use dedicated apiextensions client This helps code reuse. This client is used for registering the CRDs and to wait for the CRDs. Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
65f510e k8s: Refactor K8s client code This commit only contains non-functional changes. It is a refactor meant to ease the readability of the file. The highlight of the changes: - Rearrange functions from most important to least important - Unexport CreateClient() because it was not used outside of this file - Rename vars and functions to be consistent with Golang style - Fix Godocs on each exported type Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
ce30735 daemon: Move CRD wait timeout option to agent Previously, this option existed in cilium-operator. It has been deprecated in 325547f6e ("operator: Deprecate crd-wait-timeout"). This commit moves this option to the agent. In a future commit, the agent will be taking on the responsibility of waiting for the CRDs to be registered. Previously, cilium-operator had this responsibility. Signed-off-by: Chris Tarazi <chris@isovalent.com> 15 October 2020, 08:22:00 UTC
0f1d040 Fix typo in UpdateEC2AdapterLimitViaAPI command line flag This PR fixes the typo in UpdateEC2AdapterLimitViaAPI command line flag by introducing a new command line flag and deprecating the old command line flag. Fixes: #12396 Signed-off-by: Swaminathan Vasudevan <svasudevan@suse.com> 15 October 2020, 07:42:26 UTC
b9fdf18 helm: Bump dependency versions Signed-off-by: Joe Stringer <joe@cilium.io> 14 October 2020, 22:28:26 UTC
44c591d install: Fix version update scripting Fix the Makefile to properly inject the appropriate image tags into the various Cilium values fields during the release bump process. Remove the redundant comments explaining repository, tag and pullPolicy since they would interfere with the sed expression here that matches the line after the repository line to update the tag. Signed-off-by: Joe Stringer <joe@cilium.io> 14 October 2020, 22:28:26 UTC
007c13a install: Make shell invocation more obvious This particular statement was being used to iterate over a single item in a variable which is a particular filepath, but really what is needed is a shell invocation to run multiple commands with shell syntax. Modify it to make this more clear. Signed-off-by: Joe Stringer <joe@cilium.io> 14 October 2020, 22:28:26 UTC
dae3d3c install: Add workaround for helm chart versions This overrides the version in the tree so that when you run `make -C install/kubernetes`, it doesn't break the YAML generation for the values.yaml and quick-install.yaml with an upcoming patch. When we branch for v1.9 (circa v1.9.0-rc2), we can drop this. Signed-off-by: Joe Stringer <joe@cilium.io> 14 October 2020, 22:28:26 UTC
a2d2478 test: Fix kube-proxy-free on GKE due to wrong k8sServiceHost value On GKE, kube-apiserver doesn't run on the first node as in our Jenkins builds, leading to a failure to start Cilium in kube-proxy-free mode. Since kube-proxy is always provisioned on GKE, we don't need to specify k8sServiceHost and k8sServicePort on GKE. Reported-by: Gilberto Bertin <gilberto@isovalent.com> Suggested-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Paul Chaignon <paul@cilium.io> 14 October 2020, 17:56:38 UTC
efa6665 allocator/podcidr: fix race conditions in tests This commit fixes some race conditions found in the unit tests of podcidr IPAM allocator. Signed-off-by: André Martins <andre@cilium.io> 14 October 2020, 17:49:20 UTC
0609014 pkg/idpool: fix test for race detector This test was failing in race detector mode as the number of identities was lower than the number of goroutines. This caused the IDs to never become available for all goroutines to be successful. Signed-off-by: André Martins <andre@cilium.io> 14 October 2020, 15:43:06 UTC
cd84519 vagrant: New helpers to get logs and watch pods - kslogs dumps all Cilium logs from all agents. - wk displays the output of 'kubectl get pods' with updates every 2s. Further arguments can be passed to wk, e.g., 'wk -o wide'. - wks does the same as wk but for the kube-system namespace. Signed-off-by: Paul Chaignon <paul@cilium.io> 14 October 2020, 14:49:33 UTC
00f1228 vagrant: Add same kubectl aliases as for test VMs Test VMs have 'k', 'ks', and 'cilium_pod' as aliases since commit 53cf46a ("test: Add bash aliases and completion for kubectl"). The present commit adds the same aliases for dev. VMs. Signed-off-by: Paul Chaignon <paul@cilium.io> 14 October 2020, 14:49:33 UTC
41c8944 docs: Add Hubble to SIGs table Signed-off-by: Beatriz Martínez <beatriz@isovalent.com> 14 October 2020, 14:36:50 UTC
078ec54 install/kubernetes: consistent case spelling of iptables related values Make the case spelling of the newly introduced "ipTablesRandomFully" value consistent with other iptables option values which use the "iptables" spelling. Fixes: 4e39def13bca ("daemon: Enable configuration of iptables --random-fully") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 14 October 2020, 14:27:19 UTC
7d234ae pkg/hubble: ignore klog/v2 in goleak detector Signed-off-by: André Martins <andre@cilium.io> 14 October 2020, 13:47:18 UTC
e38b336 docs: Fix shell syntax issue in OpenShift guide This issue affects zsh users, for some reason bash doesn't attempt to interpret the `.SecurityGroups[0].GroupId` expression, while zsh does and it results in an error like this: zsh: no matches found: .SecurityGroups[0].GroupId Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'> BrokenPipeError: [Errno 32] Broken pipe Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> 14 October 2020, 13:28:02 UTC
bab1ea2 test: Increase timeout for waiting LB IP addr on GKE Recently, the test case which waits for the LB IP addr started to fail: github.com/cilium/cilium/test/ginkgo-ext/scopes.go:514 Cannot retrieve loadbalancer IP for test-lb Expected <*errors.errorString | 0xc000324af0>: { s: "could not get service LoadBalancer IP addr: 30s timeout expired", } to be nil github.com/cilium/cilium/test/k8sT/Services.go:920 This happens because on GKE we use the GCP LB, whose provisioning might take longer than 30s. For now, let's increase the timeout. If that doesn't help, we can always install a dummy-lb instead. Signed-off-by: Martynas Pumputis <m@lambda.lt> 14 October 2020, 12:50:18 UTC
34626c3 ci: Do not label control plane nodes with cilium.io/node Labeling control plane nodes with cilium.io/node when running against a target cluster was causing some applications to fail to schedule properly. Fixes: #13395 Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 14 October 2020, 12:06:46 UTC
3453877 Fix wrong csum for non-first ipv4 fragments Before this commit, NAT would blindly change L4 ports on non-first IPv4 fragments, causing a wrong csum. This commit also fixes the same issue during reverse NAT for DSR. Signed-off-by: Yuan Liu <liuyuan@google.com> 14 October 2020, 10:53:17 UTC
350f0b3 test: Test iptables masquerading with --random-fully Signed-off-by: Paul Chaignon <paul@cilium.io> 14 October 2020, 09:41:50 UTC
4e39def daemon: Enable configuration of iptables --random-fully Signed-off-by: Karl Heins <karlheins@northwesternmutual.com> 14 October 2020, 09:41:50 UTC
afe94d1 helm: Correct indentation for imagePullSecret This commit corrects the indentation of the imagePullSecret object. Signed-off-by: Tam Mach <sayboras@yahoo.com> 14 October 2020, 09:28:51 UTC
626a23f test/vagrant: Fix NFS setup for test VMs The well-known 'interrupted system call' errors started popping up in the test VMs as well. These are caused by newer Go versions being more stringent with slow system calls through the rsync shared directory. Using NFS proved to be a good way to avoid these errors in the dev. VMs. This commit fixes the NFS setup in test VMs to allow for the same workaround. Signed-off-by: Paul Chaignon <paul@cilium.io> 14 October 2020, 08:57:45 UTC
b461666 bpf_host: describe the position of {to,from}-{host,netdev} in the data path Add some comments about when and where exactly the BPF host entrypoints are attached to. Signed-off-by: Timo Beckers <timo@isovalent.com> 14 October 2020, 08:41:13 UTC
8f0e7fa bpf: only clean up XDP from devices with XDP attached Currently, during agent startup, Cilium removes XDP from all interfaces except `cilium_host`, `cilium_net` and `$XDP_DEV`, regardless of whether an XDP program is attached. For some drivers, e.g. Mellanox mlx5, the command `ip link set dev $DEV xdpdrv off` causes a device reset even if no XDP program is attached, which introduces node and pod network interruption. This patch adds a check for XDP program existence to avoid such network interruption. Fixes: #13526 Reported-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> 14 October 2020, 08:35:08 UTC
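The shape of the fix, guard the destructive detach behind an "is anything attached?" query, can be sketched as follows. This is a simplified stand-in: the agent queries the kernel via netlink attributes, whereas here we just scan `ip link show` output for an XDP mode marker; all function names are illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// hasXDPAttached is a stand-in for querying the kernel for an
// attached XDP program; it scans `ip link show` output for one of
// the XDP mode markers that iproute2 prints for attached programs.
func hasXDPAttached(ipLinkOutput string) bool {
	for _, field := range strings.Fields(ipLinkOutput) {
		switch field {
		case "xdp", "xdpgeneric", "xdpdrv", "xdpoffload":
			return true
		}
	}
	return false
}

// maybeDetachXDP only issues the detach command when a program is
// actually attached, avoiding the device reset some NICs (e.g.
// Mellanox mlx5) perform on `xdpdrv off` even with nothing loaded.
func maybeDetachXDP(dev, ipLinkOutput string) string {
	if !hasXDPAttached(ipLinkOutput) {
		return "skip " + dev // nothing attached: avoid resetting the NIC
	}
	return "ip link set dev " + dev + " xdpdrv off"
}

func main() {
	fmt.Println(maybeDetachXDP("eth0", "5: eth0: <BROADCAST,UP> mtu 1500 xdpdrv qdisc mq"))
	fmt.Println(maybeDetachXDP("eth1", "6: eth1: <BROADCAST,UP> mtu 1500 qdisc mq"))
}
```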
a0ad5ad EDT: disable bandwidth-manager for new deployments Bandwidth manager is a beta feature as of this commit. Disable it by default for new deployments. Signed-off-by: Quentin Monnet <quentin@isovalent.com> 14 October 2020, 08:32:59 UTC
42b8e4d K8sBandwidthTest: enforce usage of bandwidth manager when deploying Make sure that tests for the bandwidth manager deploy with the manager enabled, in case this were not performed by default. Signed-off-by: Quentin Monnet <quentin@isovalent.com> 14 October 2020, 08:32:59 UTC
4dde09f helm: remove random value file It looks like this file was wrongly committed as part of the Helm refactoring PR (#13259). Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 14 October 2020, 08:30:31 UTC
7ce9432 helm: bring back hubble dependencies validation PR #11577 added checks to ensure that the following conditions are met: - Hubble UI requires Hubble Relay - Hubble Relay requires Hubble These checks were pruned as part of the Helm refactoring PR (#13259), so this commit adds them back. This ensures that invalid Hubble settings, such as enabling the UI without enabling Relay, are caught early to avoid confusion. Fixes: #13537 Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 14 October 2020, 08:30:03 UTC
c04d5d1 install: repository changed from quay.io to docker.io for hubble-ui Signed-off-by: Renat Tuktarov renat@isovalent.com 14 October 2020, 08:29:32 UTC
d55b0cb CODEOWNERS: fix owner assignment for hubble related helm charts PR #13259 moved helm chart files around and the CODEOWNERS file was not updated accordingly. Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 14 October 2020, 08:29:01 UTC
e3df4a8 helm: remove unused var in make quick-install target Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net> 14 October 2020, 08:28:19 UTC
2099610 docs: Adjust the hubble CLI definition The wording on it was a bit off. Signed-off-by: Glib Smaga <code@gsmaga.com> 14 October 2020, 08:26:16 UTC
b2e58d7 Fix kubectl command in cassandra NetworkPolicy documentation. Fixes: #13544 Signed-off-by: Vadim Ponomarev <velizarx@gmail.com> 13 October 2020, 19:17:31 UTC
985e8cb maps: move mocks into separate testutils/mockmaps package This commit introduces some preliminary work to consolidate the mock maps used for testing (#13305) by moving the implementations of the different mocks (ctmap, lbmap and nat) into a single `mockmaps` package in `testutils`. No changes in the interface of the mocks are introduced yet. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 13 October 2020, 16:14:27 UTC
0d4c4b6 fqdn: remove leftover godoc comments mentioning DNS poller Follow-up for #13229 which removed the DNS poller implementation. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 13 October 2020, 12:04:06 UTC
a5d2a4c Fix script cass-populate-tables from cassandra examples. Issue: #13533 Signed-off-by: Vadim Ponomarev <velizarx@gmail.com> 13 October 2020, 12:02:40 UTC
e86c29c kvstore: Do not write to read-only keys in join-cluster mode When in --join-cluster mode, node registration is done using the "noderegister" key. For such nodes the kvstore only allows writes to the "noderegister" and ".initlock" keys, so avoid writing or deleting other keys. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 13 October 2020, 09:07:56 UTC
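The join-cluster restriction above amounts to an allow-list on key prefixes before issuing kvstore writes. A minimal sketch of that idea, with key paths simplified from the prefixes named in the commit message (the exact cilium kvstore paths are not shown here):

```go
package main

import (
	"fmt"
	"strings"
)

// writableInJoinClusterMode mirrors the fix: when the agent runs with
// --join-cluster, the kvstore only permits writes to the
// "noderegister" and ".initlock" keys, so any other write or delete
// must be skipped rather than attempted (and rejected).
func writableInJoinClusterMode(key string) bool {
	return strings.Contains(key, "noderegister") ||
		strings.Contains(key, ".initlock")
}

func main() {
	for _, k := range []string{
		"cilium/state/noderegister/v1/node1",
		"cilium/.initlock/abc",
		"cilium/state/nodes/v1/node1",
	} {
		fmt.Printf("%s writable=%v\n", k, writableInJoinClusterMode(k))
	}
}
```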
3e85bd5 build: Add a debug make target This target is unstripped and built with compiler optimizations and inlining disabled, which makes it suitable for debugging. Signed-off-by: Aditi Ghag <aditi@cilium.io> 13 October 2020, 09:02:25 UTC
1f896fe vagrant: Default to NFS in the dev. VMs and remove support for rsync. Rsync support has been broken for a few Go releases because it's too slow and now results in many 'interrupted system call' errors. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 October 2020, 09:00:50 UTC
94fd90d doc: Kubeadm guide The official kubeadm documentation [0] used to have instructions for deploying various network providers, including Cilium, but it doesn't anymore. This change adds instructions about deploying Cilium on kubeadm managed clusters to our documentation. [0] https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/ Fixes #13278 Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 13 October 2020, 08:54:39 UTC
d996d1f test: Disable hostfw by default when running tests locally The host firewall is disabled by default in the CI (unless you have label ci/host-firewall on your PR). The same should be true when running tests locally with the test VMs. os.Getenv("HOST_FIREWALL") returns an empty string for undefined env. variables, so we also need to match on that. Signed-off-by: Paul Chaignon <paul@cilium.io> 13 October 2020, 08:52:37 UTC
4d9167f maglev: Hash with pkg/murmur3 The main change in this commit is the change of the murmur3 implementation. This allows us to get rid of the github.com/spaolacci/murmur3 dependency. Arguably, our own implementation of the hash has better test coverage because of the randomized testing against the reference implementation. It also has fewer LoC, because we need only the 128-bit variant of murmur3 and don't need to implement the Hash interface. The next big change is that instead of hashing twice to retrieve two 64-bit values, we hash once to retrieve a 128-bit value. Previously, the 64-bit variant under the hood was calling the 128-bit variant and then dropping the second 64-bit value. This gives us the following speed-up: MaglevTestSuite.BenchmarkGetMaglevTable 10 187498662 ns/op (BEFORE) MaglevTestSuite.BenchmarkGetMaglevTable 10 181484487 ns/op (AFTER) In addition, we make the TestBackendRemove case more robust by checking a percentage of changes after the removal instead of expecting that the existing backend configuration won't change (which is not true, see Section 5.3 in the paper [1]). Finally, we make InitMaglevSeeds return an error instead of panicking, as it's the caller's responsibility (not the library's) to decide what to do with a failure. [1]: https://static.googleusercontent.com/media/research.google.com/en/pubs/archive/44824.pdf Signed-off-by: Martynas Pumputis <m@lambda.lt> 13 October 2020, 08:49:48 UTC
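The "hash once, get both 64-bit values" optimization can be sketched as follows. The Go standard library has no murmur3, so this sketch substitutes FNV-1a/128 purely for illustration; the (offset, skip) derivation follows the Maglev paper's permutation scheme, and all names here are illustrative rather than Cilium's actual API:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// hash128 stands in for murmur3_x64_128: a single 128-bit hash call
// yields both 64-bit values, instead of invoking the hash twice and
// discarding half of the output each time.
func hash128(backend string) (uint64, uint64) {
	h := fnv.New128a()
	h.Write([]byte(backend))
	sum := h.Sum(nil) // 16 bytes
	return binary.BigEndian.Uint64(sum[:8]), binary.BigEndian.Uint64(sum[8:])
}

// permutation derives the Maglev (offset, skip) pair for a backend
// over a prime table size m, as in the paper's table-population step:
// offset in [0, m), skip in [1, m).
func permutation(backend string, m uint64) (offset, skip uint64) {
	h1, h2 := hash128(backend)
	return h1 % m, (h2 % (m - 1)) + 1
}

func main() {
	const m = 65537 // prime table size
	offset, skip := permutation("10.0.0.1:80", m)
	fmt.Println(offset < m, skip >= 1 && skip < m)
}
```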
8d54087 murmur3: Add reference implementation of murmur3_x64_128 The reference implementation [1] is used for testing the pkg/murmur3 implementation. It's licensed under MIT License, thus the extra license header in murmur3_reference.go. [1]: https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp Signed-off-by: Martynas Pumputis <m@lambda.lt> 13 October 2020, 08:49:48 UTC
5232207 murmur3: Add native implementation of murmur3_x64_128 This is going to replace github.com/spaolacci/murmur3. Signed-off-by: Martynas Pumputis <m@lambda.lt> 13 October 2020, 08:49:48 UTC
80a7179 endpoint: Differentiate between Stop and Delete action This change moves the code which stops the endpoint's goroutines into a separate Stop function, which can be used directly when we want to stop goroutines without deleting BPF maps and datapath state - for example, when we shut down the agent due to a received signal. Fixes: #13198 Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 13 October 2020, 08:42:58 UTC
1662955 pkg/azure: protect against potential deadlock Fixes the following potential deadlock ``` POTENTIAL DEADLOCK: Inconsistent locking. saw this ordering in one goroutine: happened before pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<< pkg/azure/ipam/node.go:162 ipam.(*Node).ResyncInterfacesAndIPs { n.manager.mutex.RLock() } pkg/ipam/node.go:358 ipam.(*Node).recalculate { a, err := n.ops.ResyncInterfacesAndIPs(context.TODO(), scopedLog) } pkg/ipam/node.go:334 ipam.(*Node).UpdatedResource { n.recalculate() } pkg/ipam/node_manager.go:315 ipam.(*NodeManager).Update { return node.UpdatedResource(resource) } pkg/azure/ipam/ipam_test.go:345 ipam.(*IPAMSuite).TestIpamManyNodes { mngr.Update(state[i].cn) } ~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:475 reflect.Value.call { call(frametype, fn, args, uint32(frametype.size), uint32(retOffset)) } ~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 reflect.Value.Call { return v.call("Call", in) } pkg/../vendor/gopkg.in/check.v1/check.go:781 check%2ev1.(*suiteRunner).forkTest.func1 { c.method.Call([]reflect.Value{reflect.ValueOf(c)}) } pkg/../vendor/gopkg.in/check.v1/check.go:675 check%2ev1.(*suiteRunner).forkCall.func1 { dispatcher(c) } happened after pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<< pkg/ipam/types/types.go:340 types.(*InstanceMap).ForeachAddress { m.mutex.RLock() } pkg/azure/ipam/node.go:164 ipam.(*Node).ResyncInterfacesAndIPs { n.manager.instances.ForeachAddress(n.node.InstanceID(), func(instanceID, interfaceID, ip, poolID string, addressObj ipamTypes.Address) error { } pkg/ipam/node.go:358 ipam.(*Node).recalculate { a, err := n.ops.ResyncInterfacesAndIPs(context.TODO(), scopedLog) } pkg/ipam/node.go:334 ipam.(*Node).UpdatedResource { n.recalculate() } pkg/ipam/node_manager.go:315 ipam.(*NodeManager).Update { return node.UpdatedResource(resource) } pkg/azure/ipam/ipam_test.go:345 
ipam.(*IPAMSuite).TestIpamManyNodes { mngr.Update(state[i].cn) } ~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:475 reflect.Value.call { call(frametype, fn, args, uint32(frametype.size), uint32(retOffset)) } ~/.gimme/versions/go1.15.2.linux.amd64/src/reflect/value.go:336 reflect.Value.Call { return v.call("Call", in) } pkg/../vendor/gopkg.in/check.v1/check.go:781 check%2ev1.(*suiteRunner).forkTest.func1 { c.method.Call([]reflect.Value{reflect.ValueOf(c)}) } pkg/../vendor/gopkg.in/check.v1/check.go:675 check%2ev1.(*suiteRunner).forkCall.func1 { dispatcher(c) } in another goroutine: happened before pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<< pkg/ipam/types/types.go:381 types.(*InstanceMap).ForeachInterface { m.mutex.RLock() } pkg/azure/ipam/node.go:74 ipam.(*Node).PrepareIPAllocation { err = n.manager.instances.ForeachInterface(n.node.InstanceID(), func(instanceID, interfaceID string, interfaceObj ipamTypes.InterfaceRevision) error { } pkg/ipam/node.go:548 ipam.(*Node).determineMaintenanceAction { a.allocation, err = n.ops.PrepareIPAllocation(scopedLog) } pkg/ipam/node.go:584 ipam.(*Node).maintainIPPool { a, err := n.determineMaintenanceAction() } pkg/ipam/node.go:678 ipam.(*Node).MaintainIPPool { err := n.maintainIPPool(ctx) } pkg/ipam/node_manager.go:272 ipam.(*NodeManager).Update.func1 { if err := node.MaintainIPPool(context.TODO()); err != nil { } pkg/trigger/trigger.go:206 trigger.(*Trigger).waiter { t.params.TriggerFunc(reasons) } happened after pkg/lock/lock_debug.go:78 lock.(*internalRWMutex).RLock { i.RWMutex.RLock() } <<<<< pkg/azure/ipam/instances.go:75 ipam.(*InstancesManager).FindSubnetForAllocation { m.mutex.RLock() } pkg/azure/ipam/node.go:115 ipam.(*Node).PrepareIPAllocation.func1 { poolID, available := n.manager.FindSubnetForAllocation(preferredPoolIDs) } pkg/ipam/types/types.go:364 types.foreachInterface { if err := fn(instanceID, rev.Resource.InterfaceID(), rev); err != nil { } 
pkg/ipam/types/types.go:386 types.(*InstanceMap).ForeachInterface { return foreachInterface(instanceID, instance, fn) } pkg/azure/ipam/node.go:74 ipam.(*Node).PrepareIPAllocation { err = n.manager.instances.ForeachInterface(n.node.InstanceID(), func(instanceID, interfaceID string, interfaceObj ipamTypes.InterfaceRevision) error { } pkg/ipam/node.go:548 ipam.(*Node).determineMaintenanceAction { a.allocation, err = n.ops.PrepareIPAllocation(scopedLog) } pkg/ipam/node.go:584 ipam.(*Node).maintainIPPool { a, err := n.determineMaintenanceAction() } pkg/ipam/node.go:678 ipam.(*Node).MaintainIPPool { err := n.maintainIPPool(ctx) } pkg/ipam/node_manager.go:272 ipam.(*NodeManager).Update.func1 { if err := node.MaintainIPPool(context.TODO()); err != nil { } pkg/trigger/trigger.go:206 trigger.(*Trigger).waiter { t.params.TriggerFunc(reasons) } ``` Fixes: 24cb0618756b ("azure: Calculate available addresses based on subnet resource") Signed-off-by: André Martins <andre@cilium.io> 13 October 2020, 07:40:20 UTC
f6e42c4 pkg/ipam: avoid deadlock of node manager in case of error In case of an error, the node manager mutex was never unlocked, which would then cause a deadlock on this structure. Fixes: 2eb51a30c9d2 ("eni: Refactor ENI IPAM into generic ipam.NodeManager") Signed-off-by: André Martins <andre@cilium.io> 13 October 2020, 07:40:20 UTC
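The bug class fixed here, an early error return that skips `Unlock` and leaves the mutex held forever, is one of the most common Go deadlock patterns. A minimal sketch (the type and method names are illustrative, not the actual pkg/ipam code):

```go
package main

import (
	"fmt"
	"sync"
)

type nodeManager struct {
	mu    sync.Mutex
	nodes map[string]struct{}
}

// update defers the unlock immediately after acquiring the lock, so
// every return path, including the error path that previously left
// the mutex held, releases it.
func (m *nodeManager) update(name string) error {
	m.mu.Lock()
	defer m.mu.Unlock() // runs on the error path too

	if name == "" {
		// Before the fix pattern: returning here without Unlock would
		// deadlock every subsequent caller of update.
		return fmt.Errorf("invalid node name")
	}
	m.nodes[name] = struct{}{}
	return nil
}

func main() {
	m := &nodeManager{nodes: map[string]struct{}{}}
	fmt.Println(m.update(""))      // error returned, lock released
	fmt.Println(m.update("node1")) // succeeds: no deadlock after the error
}
```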
12942bb helm: Update documentation links to point to stable These settings should point to the currently supported stable version of the docs rather than the development version, so point to stable instead of latest. Signed-off-by: Joe Stringer <joe@cilium.io> 13 October 2020, 07:37:49 UTC
9ab7bfc Prepare for release v1.9.0-rc1 Signed-off-by: Joe Stringer <joe@cilium.io> 13 October 2020, 00:01:20 UTC
b553f49 Enables LRP to react on pod readiness. Currently, the local redirect service won't react if a backend pod becomes unhealthy. This adds logic that checks the pod status within the watcher and updates LRP configs accordingly. To achieve the above goal, PodConditions are added into the slim APIs to allow watching the "ready" condition of pods. This PR also adds logic in the LRP manager to restore the service when no backend is healthy. Signed-off-by: Weilong Cui <cuiwl@google.com> 12 October 2020, 22:02:36 UTC
e9cb43c Helm: full refactor of helm charts, default values implemented, tests updated, kind cni integration Signed-off-by: Sean Winn <sean@isovalent.com> Fixes: #13210 ```release-note Helm charts have been fully re-structured to a single chart for cilium with no dependency on sub-charts. More than 170 global values have been properly scoped to the cilium chart with sane defaults defined. Users upgrading from prior versions of cilium should be sure to read the upgrade guide for specific instructions. ``` 12 October 2020, 21:31:01 UTC
4c954ee Fixes LRP service deletion and restoration - Deletes all frontends before resetting the config map. Otherwise, for named ports, later iterations would have a frontend with <nil> IP. - Moves the restoration logic outside of the deleting loop, since we only need to restore the service once after *all* frontends with LRP are removed for this service. Signed-off-by: Weilong Cui <cuiwl@google.com> 12 October 2020, 20:44:33 UTC
d3b4745 Adds a basic e2e test for LRP. Adds 2 test cases: 1. checks LRP should redirect traffic *only* to local backend. 2. checks removing LRP restores original service (that traffic goes to both backends). Signed-off-by: Weilong Cui <cuiwl@google.com> 12 October 2020, 20:44:33 UTC