https://github.com/cilium/cilium

bc2c3ee Prepare for release v1.2.8 Signed-off-by: Ian Vernon <ian@cilium.io> 19 February 2019, 08:26:56 UTC
7f4fd0e docs: Add note about triggering builds with net-next [ upstream commit 23979ed3ecded429e575dbd51f9f61e0176922b8 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 14 February 2019, 19:21:04 UTC
a17ac4f docs: fixed copy buttons icon [ upstream commit 48939c075af6e7c4e858658befd112ae6b5cdd26 ] Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 14 February 2019, 19:21:04 UTC
dc7f394 fqdn: Avoid regenerations on each poller update When the DNS Poller would update DNS info we called StopPollDNSName before calling StartPollDNSName. In cases where a new rule was added or one removed, this was ok. In cases where we replaced an existing rule, we would remove a rule-uuid -> DNS name association, deleting the IPs for that DNS name, and then add it again. This caused the update logic to always treat the IP update as new, instead of only when the IPs had changed (since it had no copy of the IPs to compare to). This ultimately resulted in a regeneration with every poller update. The solution is to call StopPollDNSName on rules only after we have called StartPollDNSName, and to more aggressively reference count UUID-based associations. This is especially important because we now randomize UUIDs given to rules, and it was possible to not clean up the UUID of an old rule and regenerate it back into existence. Signed-off-by: Ray Bejjani <ray@covalent.io> 13 February 2019, 19:34:49 UTC
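A minimal Go sketch of the reference-counting idea above, using hypothetical names rather than the actual fqdn package API:
```
package main

import "fmt"

// refcountedNames tracks how many active rules reference each DNS name.
// Illustrative types only; not the actual cilium fqdn package API.
type refcountedNames struct {
	refs map[string]int // DNS name -> number of referencing rules
}

func (r *refcountedNames) start(name string) { r.refs[name]++ }

// stop forgets a name (and its cached IPs) only once no rule references
// it, so replacing a rule never makes its IPs look brand new.
func (r *refcountedNames) stop(name string) {
	if r.refs[name]--; r.refs[name] <= 0 {
		delete(r.refs, name)
	}
}

func main() {
	r := &refcountedNames{refs: map[string]int{}}
	r.start("example.com") // old rule
	r.start("example.com") // replacement rule: start it first...
	r.stop("example.com")  // ...then stop the old rule
	fmt.Println(r.refs["example.com"]) // 1: cached IPs survive the swap
}
```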
199fa74 docs: bump copyright headers to 2017-2019 [ upstream commit d96d205a58fc1329b23915bfa588ebab1efeb84c ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 11 February 2019, 13:00:08 UTC
1023733 correct button labels for various cases [ upstream commit 7fd1b6751f8aeb551f758045de4479bb7083573d ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 11 February 2019, 13:00:08 UTC
e7f153a downgrade to es5 syntax [ upstream commit 240563170d0920b8669b65d2ab7ac6efaa2695d7 ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 11 February 2019, 13:00:08 UTC
493c4bc added copy buttons for code blocks [ upstream commit d3ef57eae8bd03e4eb8f74749df329881d56316e ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 11 February 2019, 13:00:08 UTC
3dcabe0 backporting: Add set-labels commands to check-stable [ upstream commit 0c1dbad17c2ddb13697d889787b8f0bac3f2fc0f ] Output the full set-labels.py one-liner command when running check-stable to assist backporters attempting to backport. It's up to the backporter to exclude particular PRs if they should not be included in the backport. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 02 February 2019, 00:19:50 UTC
948779a docs: Fix backporting shell example formatting [ upstream commit bf17c1b8dd9ab07a1d5127e5bb02127623e818c8 ] Sphinx seems to expect indents of three spaces, make each indentation in this section conform with this standard. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 02 February 2019, 00:19:50 UTC
9c6f540 Makefile: Serve render-docs on port 9080. [ upstream commit c75b9609f3da2b38d5cc202ca949acbe662f864d ] When running `make render-docs`, serve it on port 9080 so it's less likely to conflict with other local web servers running on port 8080. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 02 February 2019, 00:19:50 UTC
2365e8e docs: Update backporting for the latest scripts [ upstream commit 0b40eba0dd61bd6b547bd14cb53b4fdae5f8894b ] Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 02 February 2019, 00:19:50 UTC
48e9202 backporting: Add summary log option to check-stable [ upstream commit f6b46388de4ad7a012670bbeea91d4e786c98133 ] Allow writing a nicely-formatted PR message to a file: ``` $ check-stable 1.4 my-pr.txt $ cat my-pr.txt v1.4 backports 2019-01-30 * #xxxx -- commit title (@author) ... ``` Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 02 February 2019, 00:19:50 UTC
340f8a9 docs: Streamline and tidy backporting docs [ upstream commit e7e3e72a7db645a0cbc6733429945aaf93d70e34 ] A couple of steps could be compressed into one, and this could make better use of sphinx features to better format the steps. Signed-off-by: Joe Stringer <joe@covalent.io> 02 February 2019, 00:19:50 UTC
fa4061b contrib: Accept multiple commits in 'cherry-pick' [ upstream commit 8b74d6b7139a992b3905f26f886d1317a749f5ae ] Update the 'cherry-pick' command to accept multiple commits, which it will attempt to apply one-by-one until either they all apply or a patch fails to apply. When a patch fails to apply, it will terminate and not continue applying the rest of the commits in the list. Signed-off-by: Joe Stringer <joe@cilium.io> 02 February 2019, 00:19:50 UTC
6f5932f kvstore: Decrease stale lock timeout from 2 minutes to 30 seconds [ upstream commit 06a4c4ec17b28cc7a2081f98526d444347ef43c0 ] Even 30 seconds is an incredibly long time to hold a distributed lock and should never happen. Lowering this timeout ensures that Cilium recovers quickly when the local locking state gets out of sync. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 29 January 2019, 02:28:12 UTC
a70a796 kvstore: Release local kvstore lock after timeout [ upstream commit 8267b5b086f9a7cde9b1fc126d156900eeaf1415 ] [ Backporter's notes: commit 5a2ad0dd0a5ad7206489dc9f06f1208461fb96ee was missing, causing a minor conflict in the log message. Manually resolved. ] Cilium does distributed locking via etcd. For this purpose we first acquire a local lock indexed by the etcd key path. This is needed because etcd locks only protect against other clients and not from mutual access from the same etcd client. We then acquire the lock in etcd itself. We store locally that we are currently holding the etcd lock. When the etcd lock is unlocked, this local state is removed. For some reason that remains unclear, the local state was left behind and the local unlocking never took place. etcd was identified to have crashed several times based on logs, which could have caused the etcd client to return something unexpected. When unable to acquire the local lock, we ignore this state and ask etcd directly. Unfortunately the original request requiring the etcd lock, in this case identity allocation, has typically already timed out. The current code never corrects the local state until the cilium pod is restarted. The new behavior is to release the local lock forcefully after the timeout and start a new lock acquisition cycle with a new timeout. This ensures that if multiple local consumers are waiting, only one will be able to acquire the lock. Fixes: #6667 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 29 January 2019, 02:28:12 UTC
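A minimal Go sketch of the forced-release pattern this commit describes, with a token channel standing in for the real kvstore local lock (names and structure are illustrative):
```
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("timeout while waiting for local lock")

// localLock is token-based so a stale holder can be overridden.
type localLock struct{ ch chan struct{} }

func newLocalLock() *localLock {
	l := &localLock{ch: make(chan struct{}, 1)}
	l.ch <- struct{}{} // token starts out available
	return l
}

func (l *localLock) lock(timeout time.Duration) error {
	select {
	case <-l.ch:
		return nil
	case <-time.After(timeout):
		return errTimeout
	}
}

// forceRelease puts the token back even if the holder never unlocked,
// correcting local state that got out of sync (e.g. after an etcd error).
func (l *localLock) forceRelease() {
	select {
	case l.ch <- struct{}{}:
	default: // token already present
	}
}

func main() {
	l := newLocalLock()
	l.lock(time.Millisecond) // simulate a holder that never unlocks
	for l.lock(10*time.Millisecond) != nil {
		// Stale holder: release forcefully and start a fresh
		// acquisition cycle; only one waiter wins the new token.
		l.forceRelease()
	}
	fmt.Println("lock acquired after forced release")
}
```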
dfe140b endpoint: Update LXC map before proxy ack wait and signalling [ upstream commit 7bdde059140137464c54cc55772ac28b02fd6762 ] In BPF regeneration, update the LXC map for the endpoint before waiting for an ACK from Envoy, as the LXC map update is required for the endpoint to receive any traffic. Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 18 January 2019, 23:30:44 UTC
b7edb8f endpoint: signal when BPF program is compiled for the first time [ upstream commit 4b4602677a4bc06874d5920d5f35df814929456c ] The Istio CI test has been unstable because of behavior in some pods where, if DNS fails to resolve before the BPF program for the pod is installed, the pod will not be able to connect to the DNS server to resolve DNS again until it restarts, even after a BPF program is installed. Prior to this change, if Cilium is running in tandem with Istio, we returned before the BPF program was installed for the endpoint, to avoid deadlock when configuring Envoy. To ensure that this sort of behavior in pods does not occur again, only return early from endpoint creation if the endpoint has been determined to be Istio-injected and has a BPF program installed. This PR adds a channel which is closed once the initial BPF compilation has occurred for an endpoint. The state of this channel is used to determine if a BPF program has been compiled for the endpoint. Fixes: #5859 Signed-off-by: Ian Vernon <ian@cilium.io> 18 January 2019, 23:30:44 UTC
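The closed-channel signal is a standard Go idiom; here is a minimal sketch of it with illustrative names, not the actual endpoint API:
```
package main

import (
	"fmt"
	"sync"
)

type endpoint struct {
	firstBuild chan struct{} // closed once the first BPF build completes
	closeOnce  sync.Once
}

func newEndpoint() *endpoint {
	return &endpoint{firstBuild: make(chan struct{})}
}

// buildComplete signals the first successful build; closing is idempotent.
func (e *endpoint) buildComplete() {
	e.closeOnce.Do(func() { close(e.firstBuild) })
}

// hasBPFProgram reports whether a program was ever compiled, without blocking.
func (e *endpoint) hasBPFProgram() bool {
	select {
	case <-e.firstBuild:
		return true
	default:
		return false
	}
}

func main() {
	e := newEndpoint()
	fmt.Println(e.hasBPFProgram()) // false
	e.buildComplete()
	fmt.Println(e.hasBPFProgram()) // true
}
```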
24c4229 Prepare for release v1.2.7 Signed-off-by: Ian Vernon <ian@cilium.io> 17 January 2019, 22:45:04 UTC
3e21a06 endpoint: do not regenerate health endpoint after identity change The health endpoint's identity never changes after initial bootstrapping, and the initial regeneration is handled specially when the health endpoint is launched, so skip regeneration of the health endpoint when its identity is set. Signed-off-by: Ian Vernon <ian@cilium.io> 17 January 2019, 12:00:45 UTC
fe96766 health: acquire lock to set state to WaitingToRegenerate in LaunchAsEndpoint [ upstream commit 99799331dac3a0dea8706d5764ff8bf8ea9bc8f2 ] `SetStateLocked` requires that the endpoint mutex be held, so acquire it when bootstrapping the health endpoint. Fixes: 74d2ce2 ("agent: Fix endpoint removal when createEndpoint() fails") Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
6978d62 agent: Fix endpoint restore with unmounted BPF filesystem [ upstream commit 19be15be0f3748613942ff250075e44635997b45 ] The function Daemon.restoreOldEndpoints() had errored out when the endpoint BPF map could not be opened. This would prevent restoring of endpoints when the BPF filesystem was not mounted. The error does not matter if we can't open the map as the map will be initialized from scratch and no cleaning up of old potentially stale entries is necessary. Fixes: 4ecf111c35f ("agent: Fix temporary corruption of BPF endpoint map on restart") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
3c765d7 daemon: Do not omit any error of createNodeConfigHeaderfile [ upstream commit 952b4942cd3f9c0a8f0f802c6d0a1c3a33d41132 ] Previously, any error returned by the function was omitted. Fixes: cf52c0691 ("daemon: factor out node config headerfile into separate") Signed-off-by: Martynas Pumputis <martynas@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
5fbe558 endpoint: Require SyncBuildEndpoint flag to wait for endpoint build to complete [ upstream commit 23071fec6509a412556fb63ce1cfebd070c64464 ] If endpoint creation waits for the entire regeneration of an endpoint to complete (e.g., policy generation, BPF compilation / loading, and proxy updates), then creating endpoints which have L7 policy selecting them while Cilium is running with Istio results in a deadlock. This is because when a pod is created, its sidecar proxy is not started yet, so when addition of the endpoint into the endpoint manager waits for regeneration to complete, the regeneration gets stuck because it is trying to configure a proxy which does not exist yet. To get around this, do not synchronously wait for the entire regeneration process to succeed for a given endpoint in endpointmanager. Instead, pass SyncBuildEndpoint from the docker plugin to provide the same behavior as before, and replace the call to RegenerateWait() with Regenerate(). When the SyncBuildEndpoint flag is provided while creating an endpoint, the API will still wait synchronously for the generation of policy and the loading of the endpoint's BPF program, but not for the entire regeneration process, which includes proxy updates, to complete. This means that when CNI creates an endpoint, initial L3/L4 connectivity is in place by the time endpoint creation completes. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
32cac9b agent: Fix endpoint removal when createEndpoint() fails [ upstream commit a01f37e47b06ba29ae9262ffea5873105e3a4fa5 ] This fixes two scenarios where a failure in createEndpoint() can lead to the controller created in endpoint.UpdateLabels() being leaked due to lack of invoking `d.deleteEndpoint()`. This resulted in an endpoint that existed but was invisible in the endpoint list, and that would be continuously rebuilt via the identity controller. 1. Endpoint build triggered via AddEndpoint() fails 2. Client context timeout causes exit after UpdateLabels() Fixes: 186b5e93a84 ("daemon: do not add endpoint if client connection closes during add operation") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
23f1f3c pkg/health: protect local variable against concurrent writes [ upstream commit 5bb3d41ad6d52df8717b10904edcfe8a0a7ada0e ] ``` ================== WARNING: DATA RACE Write at 0x00c0004c8590 by goroutine 26: github.com/cilium/cilium/pkg/health/server.(*Server).getNodes() /go/src/github.com/cilium/cilium/pkg/health/server/server.go:122 +0xa18 github.com/cilium/cilium/pkg/health/server.(*Server).FetchStatusResponse() /go/src/github.com/cilium/cilium/pkg/health/server/server.go:189 +0x3f github.com/cilium/cilium/pkg/health/server.(*putStatusProbe).Handle() /go/src/github.com/cilium/cilium/pkg/health/server/status.go:53 +0xab github.com/cilium/cilium/api/v1/health/server/restapi/connectivity.(*PutStatusProbe).ServeHTTP() /go/src/github.com/cilium/cilium/api/v1/health/server/restapi/connectivity/put_status_probe.go:59 +0x2b1 github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware.NewOperationExecutor.func1() /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/operation.go:28 +0xb3 net/http.HandlerFunc.ServeHTTP() /usr/local/go/src/net/http/server.go:1964 +0x51 github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware.NewRouter.func1() /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/router.go:73 +0x440 net/http.HandlerFunc.ServeHTTP() /usr/local/go/src/net/http/server.go:1964 +0x51 github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware.Redoc.func1() /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/redoc.go:72 +0xff net/http.HandlerFunc.ServeHTTP() /usr/local/go/src/net/http/server.go:1964 +0x51 github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware.Spec.func1() /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/runtime/middleware/spec.go:45 +0xee net/http.HandlerFunc.ServeHTTP() /usr/local/go/src/net/http/server.go:1964 +0x51 net/http.serverHandler.ServeHTTP() /usr/local/go/src/net/http/server.go:2741 +0xc4 net/http.(*conn).serve() /usr/local/go/src/net/http/server.go:1847 +0x80a Previous write at 0x00c0004c8590 by goroutine 58: github.com/cilium/cilium/pkg/health/server.(*Server).getNodes() /go/src/github.com/cilium/cilium/pkg/health/server/server.go:122 +0xa18 github.com/cilium/cilium/pkg/health/server.(*Server).runActiveServices.func1() /go/src/github.com/cilium/cilium/pkg/health/server/server.go:220 +0x6f github.com/cilium/cilium/vendor/github.com/servak/go-fastping.(*Pinger).run() /go/src/github.com/cilium/cilium/vendor/github.com/servak/go-fastping/fastping.go:454 +0xa33 Goroutine 26 (running) created at: net/http.(*Server).Serve() /usr/local/go/src/net/http/server.go:2851 +0x4c5 github.com/cilium/cilium/vendor/github.com/tylerb/graceful.(*Server).Serve() /go/src/github.com/cilium/cilium/vendor/github.com/tylerb/graceful/graceful.go:307 +0x464 github.com/cilium/cilium/api/v1/health/server.(*Server).Serve.func1() /go/src/github.com/cilium/cilium/api/v1/health/server/server.go:177 +0x80 Goroutine 58 (running) created at: github.com/cilium/cilium/vendor/github.com/servak/go-fastping.(*Pinger).RunLoop() /go/src/github.com/cilium/cilium/vendor/github.com/servak/go-fastping/fastping.go:362 +0x199 github.com/cilium/cilium/pkg/health/server.(*prober).RunLoop() /go/src/github.com/cilium/cilium/pkg/health/server/prober.go:354 +0x4b github.com/cilium/cilium/pkg/health/server.(*Server).runActiveServices() /go/src/github.com/cilium/cilium/pkg/health/server/server.go:224 +0x1ab 
github.com/cilium/cilium/pkg/health/server.(*Server).Serve.func2() /go/src/github.com/cilium/cilium/pkg/health/server/server.go:251 +0x38 ================== ``` Fixes: d95ad2edb5eb ("cilium-health: Report ping status to other nodes") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
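The fix pattern for a race like this is to guard the shared field with a mutex and hand out copies; a minimal Go sketch under those assumptions (names are illustrative, not the cilium-health API):
```
package main

import (
	"fmt"
	"sync"
)

// server guards state written by both HTTP handlers and the prober goroutine.
type server struct {
	mu    sync.RWMutex
	nodes map[string]string
}

func (s *server) setNodes(n map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes = n
}

// getNodes hands out a copy so callers cannot race on the internal map.
func (s *server) getNodes() map[string]string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]string, len(s.nodes))
	for k, v := range s.nodes {
		out[k] = v
	}
	return out
}

func main() {
	s := &server{nodes: map[string]string{}}
	go s.setNodes(map[string]string{"node1": "10.0.0.1"})
	fmt.Println(len(s.getNodes()))
}
```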
03c47de pkg/endpoint: do DeepCopy for proxy stats [ upstream commit 4c50884acb3cb74037733ccd395e60ba6d464032 ] Fixes a race condition where the value of a field from ProxyStatistics was read by a goroutine but it was previously written by another goroutine. ``` WARNING: DATA RACE Read at 0x00c0021860d8 by goroutine 295: encoding/json.isEmptyValue() /usr/local/go/src/reflect/value.go:959 +0x171 encoding/json.(*structEncoder).encode() ... encoding/json.Marshal() /usr/local/go/src/encoding/json/encode.go:160 +0x73 github.com/cilium/cilium/vendor/github.com/go-openapi/swag.WriteJSON() /go/src/github.com/cilium/cilium/vendor/github.com/go-openapi/swag/json.go:58 +0x17f github.com/cilium/cilium/api/v1/models.(*Endpoint).MarshalBinary() /go/src/github.com/cilium/cilium/api/v1/models/endpoint.go:99 +0x51 github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2.(*CiliumEndpointDetail).DeepCopyInto() /go/src/github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2/types.go:292 +0xce github.com/cilium/cilium/pkg/k8s/endpointsynchronizer.(*EndpointSynchronizer).RunK8sCiliumEndpointSync.func1() /go/src/github.com/cilium/cilium/pkg/k8s/endpointsynchronizer/cep.go:223 +0x7d9 github.com/cilium/cilium/pkg/controller.(*Controller).runController() /go/src/github.com/cilium/cilium/pkg/controller/controller.go:194 +0x537 Previous write at 0x00c0021860d8 by goroutine 224: github.com/cilium/cilium/pkg/endpoint.(*Endpoint).UpdateProxyStatistics() /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1794 +0x12e github.com/cilium/cilium/pkg/envoy.(*accessLogServer).logRecord() /go/src/github.com/cilium/cilium/pkg/envoy/accesslog_server.go:173 +0x6fd github.com/cilium/cilium/pkg/envoy.(*accessLogServer).accessLogger() /go/src/github.com/cilium/cilium/pkg/envoy/accesslog_server.go:127 +0x5b2 ``` Fixes d8cd3052b6b8 ("k8s: Add CiliumEndpoint sync controller") Fixes cdfb40c8aa4c ("api: Refactor /endpoint API for 1.0") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
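A minimal Go sketch of the deep-copy approach: copy the stats while holding the lock, then marshal the private copy outside the critical section (types are illustrative):
```
package main

import (
	"encoding/json"
	"sync"
)

type proxyStats struct {
	Requests int64 `json:"requests"`
}

type endpoint struct {
	mu    sync.Mutex
	stats proxyStats
}

func (e *endpoint) updateStats() {
	e.mu.Lock()
	e.stats.Requests++
	e.mu.Unlock()
}

// marshalStats copies the struct under the lock so json.Marshal never
// reads fields another goroutine is writing.
func (e *endpoint) marshalStats() ([]byte, error) {
	e.mu.Lock()
	cp := e.stats // value copy is a deep copy for this flat struct
	e.mu.Unlock()
	return json.Marshal(cp)
}

func main() {
	e := &endpoint{}
	go e.updateStats()
	_, _ = e.marshalStats()
}
```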
f37ac82 lbmap: Fix consistent load balancing when reusing backend holes [ upstream commit 7c14b8af93df56f49a12b2267cba1fda1b0d3344 ] Re-use of holes in the list of backends can cause existing load-balancing decisions to be changed and thus cause connection disruption. Only re-use holes when scaling down to 0 and back up. Otherwise, the list of backends has to grow continuously. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
e408495 lbmap: Use length of backend map index instead of uniqueBackends map when growing [ upstream commit 122c50116570439fe7af9357aa7ca4065e3a8e7e ] The existing code likely worked correctly but it's not logical to use the unique backends. Fortunately, when growing beyond holes, the number of unique backends should always be identical to the number of backend indices. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
b753384 lbmap: Avoid panic in case backendsByMapIndex contains holes [ upstream commit 341d8f2962f35fac5ce0dc4623575bffe995d3ca ] This should never happen but it may have happened. It's still unclear whether this has happened due to a locking bug or another corruption. As the previous commit has fixed locking, print a fat error when a hole is encountered but avoid the panic. Fixes: #6537 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
2015fb4 lbmap: Add locking to bpfService and lbmapCache [ upstream commit ceb7fdbd92f3e7cb6ec74f57c1df87fed1c76efc ] This code has been assuming that the caller provides appropriate protection, but it's unclear whether that is always guaranteed. Add a mutex to bpfService and lbmapCache to protect data access. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
00901cd kubernetes: Change terminationGracePeriodSeconds to 1 [ upstream commit ca912066f54bc3e5aa4f52ec14403f843789c5ab ] There is no graceful shutdown involved so we can limit the graceful shutdown period to a second. Any Cilium pod in terminating state only causes downtime of new pods being scheduled onto the node. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
c3cb83c daemon: do not add endpoint if client connection closes during add operation [ upstream commit 186b5e93a847d3f85d42787bb171bbdbf838f44c ] If a client closes the connection during an endpoint creation with the cilium-agent while the cilium-agent is resolving the identity for that particular endpoint, the cilium-agent will continue to create the endpoint, although the endpoint should not be created as the connection with the client was terminated. This bug also explains the problem where an interface does not exist when cilium-agent tries to regenerate the endpoint. Since the client, in this case cilium-cni, times out while waiting for a response from the cilium-agent, cilium-cni cleans up the interface as it assumes the cilium-agent will not be able to perform the task requested. As cilium-agent was not aware the client terminated the connection, once it tries to regenerate the endpoint, it will fail to do so as the interface no longer exists, since it was removed by the client, cilium-cni. Fixes: 65fe98c4f391 ("cni: Synchroneous pod label retrieval on CNI add") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
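A minimal Go sketch of honoring a client disconnect mid-operation by checking the request context between steps (function names are illustrative, not the daemon API):
```
package main

import (
	"context"
	"fmt"
	"time"
)

func createEndpoint(ctx context.Context) error {
	// ...resolve the identity (slow)...
	time.Sleep(10 * time.Millisecond)
	// Abort instead of creating an endpoint whose interface the
	// disconnected client may already have cleaned up.
	if err := ctx.Err(); err != nil {
		return fmt.Errorf("aborting endpoint creation: %w", err)
	}
	// ...the client is still waiting; safe to finish creation...
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // the client (cilium-cni) gave up
	fmt.Println(createEndpoint(ctx))
}
```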
6d3299f doc: Fix non-critical CVE-2017-18342: [ upstream commit 7164739417d9b40e4a7d1c378afb55a28edf792c ] In PyYAML before 4.1, the yaml.load() API could execute arbitrary code. In other words, yaml.safe_load is not used. PyYAML is only used in the documentation building chain. This is not security relevant for Cilium. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 15 January 2019, 18:58:02 UTC
a64edee iptables: Fix iptables removal logic on bootstrap [ upstream commit 36cdd98c7278c87c4034c7a8fbdc57b74271ecbc ] The existing legacy rule removal logic on bootstrap removed all rules which contains the word "cilium". While this removed Cilium relevant rules, it also incorrectly removed rules installed by the portmap/hostport plugin if the plugin was configured with a name that contained the string cilium. Example CNI configuration: ``` { "cniVersion": "0.3.1", "name": "cilium-portmap", "plugins": [ { "type": "cilium-cni" }, { "type": "portmap", "capabilities": { "portMappings": true } } ] } ``` Example of incorrectly removed rule: -A CNI-HOSTPORT-DNAT -m comment --comment "dnat name: \"cilium-portmap\" id: \"95dc537b9152da5f91be3fc5692bf91592bf1871b6e61755aed2056a03e98c4f\"" -j CNI-DN-258a52f03b4b7aa8abdc5 The fix is to be more restrictive in selecting rules to remove and limit it to rules which contain the string "CILIUM_". Fixes: #6499 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 15 January 2019, 13:14:59 UTC
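A minimal Go sketch of the stricter match: classify a rule as Cilium's only when it carries the "CILIUM_" chain-name marker, not merely the word "cilium" (the helper name is hypothetical):
```
package main

import (
	"fmt"
	"strings"
)

// isCiliumRule reports whether a dumped iptables rule belongs to Cilium.
func isCiliumRule(rule string) bool {
	return strings.Contains(rule, "CILIUM_")
}

func main() {
	hostport := `-A CNI-HOSTPORT-DNAT -m comment --comment "dnat name: \"cilium-portmap\"" -j CNI-DN-258a52f`
	cilium := `-A FORWARD -j CILIUM_FORWARD`
	fmt.Println(isCiliumRule(hostport)) // false: left alone now
	fmt.Println(isCiliumRule(cilium))   // true: removed on bootstrap
}
```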
cc836e8 dns tech preview title update Signed-off-by: Cynthia Thomas <cynthia@covalent.io> 03 January 2019, 10:40:56 UTC
54aa458 doc update dns title tech-preview Signed-off-by: Cynthia Thomas <cynthia@covalent.io> 02 January 2019, 22:31:36 UTC
05c75d9 bpf: Fix reading flags attributes from /proc [ upstream commit 6174f80bb6911986045966170b08bea49e31d150 ] This patch fixes a bug in the reading of the flags attributes from the filesystem. The entry in fdinfo prints a hex value in the form "0xabcdef", however we were previously expecting direct hexits without the "0x" prefix. This led to userspace treating the flags as 0, even if the /proc filesystem presented a different value for the map attribute. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 14 December 2018, 17:18:32 UTC
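A minimal Go sketch of the parsing fix: treat the fdinfo value as a 0x-prefixed hex number (the "map_flags" line format follows the commit message; the helper is hypothetical):
```
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFlags reads a line such as "map_flags:\t0x1" from fdinfo.
func parseFlags(line string) (uint64, error) {
	v := strings.TrimSpace(strings.TrimPrefix(line, "map_flags:"))
	// Strip the "0x" prefix instead of expecting bare hexits.
	return strconv.ParseUint(strings.TrimPrefix(v, "0x"), 16, 64)
}

func main() {
	flags, err := parseFlags("map_flags:\t0x1")
	fmt.Println(flags, err) // 1 <nil>
}
```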
26249e7 bpf: Add unit test for map info reading [ upstream commit 9252e3f35f14c94dbbb154fb1f0823fe6961d362 ] This unit test checks that when creating a map from userspace, the mapinfo that we read back out of /proc filesystem matches the map info for the map that was just created. This revealed the bug in the previous commit. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 14 December 2018, 17:18:32 UTC
c885fd0 metricsmap: fix retrieval of possible CPU count [ upstream commit 6b7b62c07ff7e685a6588cad91a31b993a42bd0a ] We use "/sys/devices/system/cpu/possible" to get the number of possible CPUs. The number is used to determine how many entries there are per value in a BPF map of the "BPF_MAP_TYPE_*_PERCPU_*" type. Previously, the retrieval function contained two bugs: - It didn't handle the case of sparse CPU allocations. - It logged an error when only one CPU was possible. The side effect of the second bug was that upon issuing a cilium CLI command, 'level=error msg="unable to parse sysfs to get CPU count" error="input does not match format"' was printed as part of the command output. Fixes #4559. Signed-off-by: Martynas Pumputis <martynas@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 14 December 2018, 17:18:32 UTC
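A minimal Go sketch of counting CPUs from a possibly sparse list such as "0-3,5" (an illustrative helper, not the metricsmap code):
```
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// possibleCPUs counts CPUs in a sysfs list like "0", "0-3", or "0,2-3".
func possibleCPUs(s string) (int, error) {
	n := 0
	for _, part := range strings.Split(strings.TrimSpace(s), ",") {
		if lo, hi, ok := strings.Cut(part, "-"); ok {
			a, err1 := strconv.Atoi(lo)
			b, err2 := strconv.Atoi(hi)
			if err1 != nil || err2 != nil {
				return 0, fmt.Errorf("bad range %q", part)
			}
			n += b - a + 1
		} else if _, err := strconv.Atoi(part); err != nil {
			return 0, fmt.Errorf("bad cpu %q", part)
		} else {
			n++
		}
	}
	return n, nil
}

func main() {
	fmt.Println(possibleCPUs("0-3,5")) // 5 <nil>
	fmt.Println(possibleCPUs("0"))     // 1 <nil>: one CPU is not an error
}
```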
88f6431 Prepare for v1.2.6 release Signed-off-by: Ian Vernon <ian@cilium.io> 05 December 2018, 19:02:08 UTC
1b2cc17 identity: Block createEndpoint() while identity is being resolved [ upstream commit 54d94a5b1a3c70f2385a395fa0c0c9aeb83a73a6 ] Commit 65fe98 has changed the endpoint creation API to resolve and assign the endpoint labels in a synchronous manner when running in Kubernetes mode. However, the identity resolution has remained non-blocking. This leads to the endpoint not being assigned an identity for some period of time. Previously, the init identity was resolved immediately due to not depending on the kvstore. Resolve the identity while creating the endpoint via the API. This guarantees that an endpoint has a proper identity from the moment it starts up. The consequence is that endpoint creation and thus CNI ADD will fail for pods when the kvstore is not available and the pod is not using a well-known identity. Fixes: 65fe98c4f39 ("cni: Synchroneous pod label retrieval on CNI add") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 04 December 2018, 00:41:28 UTC
af386a4 bpf: Don't reset TCP timer on final ACK [ upstream commit fd5ea2a3d2132c1f130be2ccf2b465e0e9754041 ] A typical TCP connection close looks something like: -> FIN <- ACK, FIN -> ACK or -> FIN <- ACK <- FIN -> ACK For each direction when the FIN is received, either entry->rx_closing or entry->tx_closing is set. This is triggered via the caller's code, which chooses ACTION_CREATE or ACTION_CLOSE depending on the presence of TCP_FLAG_RST or TCP_FLAG_FIN. When the final ACK packet arrives, it does not have a `RST` or `FIN`, so the action ends up being ACTION_CREATE. As a result, with the existing logic, the final ACK will *always refresh the timer* back to the full 12-hour TCP timeout, after the FINs previously reduced the entry timeout to CT_CLOSE_TIMEOUT. This patch alleviates this by only resetting the closing state and timeout if it appears that a brand new connection is establishing with the same 5-tuple. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 04 December 2018, 00:41:28 UTC
e8a145a bpf: Fix tcp flag access [ upstream commit bbc5a51407a8c8294495a5d8433685195a14f91a ] Previously, access into tcp flags was governed using a bitfield declared based on the endianness of the host CPU, even though the packet data is always in network byte-order. This would mean that any direct access of flags would access the wrong bits in the packet, and would cause conntrack entries for closed TCP connections to expire as long as 12 hours after a connection was closed. Fix this issue by redefining the tcp flags struct to store in a 32-bit structure, then use the Linux TCP_FLAG_* defines to check / store the appropriate TCP flag bits. Fixes: #6280 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 04 December 2018, 00:41:28 UTC
58aaaf6 bpf: Relax verifier in IPv6 drop case [ upstream commit 717a4683f5077a74a1b4a3a3687aaa91fe508f27 ] Before: $ make -C bpf && sudo ./test/bpf/check-complexity.sh | grep -A 2 IPV6_FROM_LXC ... Prog section '2/10' (tail_call IPV6_FROM_LXC) loaded (31)! - Instructions: 4006 (0 over limit) processed 62569 insns After: $ make -C bpf && sudo ./test/bpf/check-complexity.sh | grep -A 2 IPV6_FROM_LXC ... Prog section '2/10' (tail_call IPV6_FROM_LXC) loaded (31)! - Instructions: 4014 (0 over limit) processed 49669 insns Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 04 December 2018, 00:41:28 UTC
84b8113 Jenkins: Run cleanup script before build. Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 03 December 2018, 23:14:19 UTC
346e08d cni: Synchroneous pod label retrieval on CNI add [ upstream commit 65fe98c4f391b82e47c47f63e556d9a835cee99c ] Leverage the CNI_ARGS to retrieve the pod name and namespace and issue a call to the apiserver to retrieve the pod labels. This allows retrieving the pod labels and allocating the identity synchronously on the endpoint creation API call. This makes the CNI endpoint creation code divert from the CNM plugin. The CNM plugin continues to rely on the workloads API to retrieve the metadata. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 29 November 2018, 19:29:20 UTC
3c18119 envoy: Use datapath timeouts [ upstream commit 46d6ad03a1a9d2b2b23a1f373a8dcde30ea7098c ] The Cilium bpf datapath uses an idle connection timeout of 12 hours. Using a timeout that large for Envoy may lead to Envoy runtime failure, as Envoy uses a file descriptor for each connection and it is possible to run out of available file descriptors under an attack. HTTP downloads may take a long time, however, so the Envoy request timeout should be long enough to allow any reasonable download to complete. A tighter idle timeout could be configured to clean up Envoy state if the download becomes stale. Idle timeout can, however, interfere with HTTP long polling and GRPC streaming. This commit changes the default value of the request timeout to one hour to allow for longer downloads. Idle timeout is kept as unlimited, as is the maximum GRPC timeout. This means that GRPC requests may take longer than one hour to complete without being interrupted. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: vagrant <vagrant@runtime1> 29 November 2018, 11:21:01 UTC
0564325 envoy: Make timeouts user-configurable [ upstream commit fae186d26e284dd9354ac9ec700b6fe4617588a8 ] Allow users to configure the timeouts Cilium configures for Envoy. This is required due to Envoy defaults not working in usual situations, e.g., HTTP downloads taking more than 15 seconds to complete. Note that this commit should not change behavior as the default values of parameters not previously configured are the same as Envoy defaults: - request timeout: 15 seconds - retry timeout: 0 (same as request timeout, effectively disabled as there is no time left to do any retries) We had already previously configured max GRPC timeout as 0 (unlimited), max number of retries as 3, and connect timeout as 1 second. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 29 November 2018, 11:21:01 UTC
2fbef00 bpf: Fix node-port access to l7 proxy [ upstream commit 45f824c077599446f8e5936f9550bfad72dfc33b ] Commit 1671a19eb979 ("bpf: Avoid routing loops for former local endpoint IPs") adjusted the logic for the cilium_host devices to drop traffic which is destined for somewhere that we don't control, under the assumption that it is likely something that we previously controlled, but no longer do. One hole in this logic was that when external endpoints reach to an endpoint that has L7 policy applied, the traffic is first routed into the localhost to terminate the connection in the L7 proxy, which then sends back a response to establish the connection. This traffic may be sent back over the cilium_host device, and will then be dropped by the new logic to avoid routing loops. Fix this case by only dropping traffic for destinations that lie within the range of IPs managed by this node, and allowing other traffic to pass. Fixes: 1671a19eb979 ("bpf: Avoid routing loops for former local endpoint IPs") Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 21 November 2018, 23:57:10 UTC
c6ef928 test: reduce # of expected tunnels [ upstream commit 054882d139f7c98fd326984b1dff505cd9691394 ] Now that the node no longer inserts its own address into the tunnel map, update the assertion in the CI to account for the local node IPs no longer being in the tunnel map. Signed-off-by: Ian Vernon <ian@cilium.io> 16 November 2018, 03:44:53 UTC
aaf58c7 bpf: Avoid routing loops for former local endpoint IPs [ upstream commit 1671a19eb979db1bba2da01d8e00ff62e6b4519b ] The bpf_netdev BPF program is used for traffic in device snooping mode as well as to handle packets from the host. When running in the mode to handle traffic from the host and local delivery is unsuccessful as well as encapsulation is not an option, packets were routed back to the host so far. This behavior is required in snooping mode but leads to routing loops until the TTL decreases to zero when running in host mode. Add a special case to drop the packet early without requiring the TTL to decrease to 0 first. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 16 November 2018, 03:44:53 UTC
f4db136 node: Don't insert own node into tunnel map [ upstream commit 3fc98a700666f7e395b9bd40ad2c531779f2a10c ] Inserting the own node address into the tunnel map can lead to unnecessary encapsulation when the destination IP belongs to a former local endpoint IP that no longer exists. It's harmless but inefficient. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 16 November 2018, 03:44:53 UTC
a10fb32 k8s: CEP controller retries k8s version checks [ upstream commit 384b0a1c84b7a3ac5be210af46eb501a3fd7e17c ] When requests for the k8s version are flaky, newly created CEP controllers would fail the version check and never try again. The check is now repeated in the controller until it succeeds, and the result cached. Signed-off-by: Ray Bejjani <ray@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 14 November 2018, 16:42:46 UTC
09815b5 docs: use READTHEDOCS version in version warning [ upstream commit 68f60eb0b4921927092be2ef040e846f0a1f33cc ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 16:18:21 UTC
5667e83 cilium: spelling: sha is an acronym replace with SHA [ upstream commit 26d57d4fa7ea9c2e4cc32f7534a9c81d02347d92 ] When running 'make html' in Documentation, my version of the command `sphinx-build -b spelling ...` threw an error on sha. Presumably I have a different dictionary than standard builds (this is a bare-metal build on my dev box). But really, sha is an acronym, so let's capitalize it, and this has the nice side benefit of getting docs to build for me. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
5d464a1 cmd: fix documentation links for cmdref [ upstream commit 86578d79e15a31a843fe62906beeb38aa721b5d3 ] The cmdref points the "SEE ALSO" links to non-existing files; this commit points the "SEE ALSO" links to the right location. Reported-by: Strukov Anton <savemech@gmail.com> Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
7363740 endpoint: Unlock endpoint to prevent deadlocks. [ upstream commit 9f0d2c4e3582b177f6559fb2866539258fe9a6d0 ] Endpoint was inadvertently left locked by ModifyIdentityLabels() if the endpoint is already disconnected or disconnecting when ModifyIdentityLabels() acquired the endpoint lock. Unlock also in this case. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
608dff1 docs: remove height for all images [ upstream commit 76502b239e971e3aba2432f881bf0be28b2cfb18 ] This makes images have the right size when presented in the browser. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
f1226cc test: Do not clean during parallel builds. [ upstream commit 88852b32a638140a87cc7bfb00656ab5fb2e66c3 ] Making "clean" during a parallel build interferes with another build making "install". Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
6d68200 idpool: Fix leaseAvailableID() and slice out of bounds [ upstream commit 14858b1d7c77717cfe189742d94def8881237712 ] leaseAvailableID() was not returning all available IDs and was unnecessarily using random.Intn when allocating each identity instead of generating a random sequence at the beginning. Also fixes a slice out-of-bounds access when Remove() is called on an exhausted ID pool. The use of the random variable was not protected; concurrent ID pool usage could potentially corrupt the random variable. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
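A minimal Go sketch of the pre-shuffled pool idea: shuffle once at construction, lease under a mutex, and report exhaustion instead of indexing out of bounds (illustrative, not the idpool API):
```
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

type idPool struct {
	mu   sync.Mutex
	free []int
}

// newIDPool generates the random sequence once, up front.
func newIDPool(min, max int) *idPool {
	ids := make([]int, 0, max-min+1)
	for id := min; id <= max; id++ {
		ids = append(ids, id)
	}
	rand.Shuffle(len(ids), func(i, j int) { ids[i], ids[j] = ids[j], ids[i] })
	return &idPool{free: ids}
}

// lease returns a random available ID, or false when the pool is exhausted.
func (p *idPool) lease() (int, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.free) == 0 {
		return 0, false // no out-of-bounds access on an empty pool
	}
	id := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return id, true
}

func main() {
	p := newIDPool(1, 3)
	for id, ok := p.lease(); ok; id, ok = p.lease() {
		fmt.Println(id)
	}
}
```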
eef2c47 idpool: Factor out IDPool from allocator into package for reuse [ upstream commit 07e26fdbeff1d48f0e7ef67055b04ec698bdc71f ] No functional changes are made in this commit. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
34ac0fa lbmap: Add unit test for getBackends() [ upstream commit dabace8a2a455c7e90daf2089ef748ffe31bf89c ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
6bc1418 service: Restore bpfservice cache on startup [ upstream commit fa9d0cb8e38ea2beb5622bf46178f852011fd31c ] Lack of restoring the cache resulted in the backends being rewritten in a different order on the first service update. This resulted in connections getting reset when the agent restarts. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
bc54e43 service: Restore service IDs before connecting to Kubernetes apiserver [ upstream commit 3b5133ad54766ea7f8e201aaa1ce7beb266e0d04 ] So far, the service restore functionality in Kubernetes mode did not preserve the service ID. This was not even possible as the synchronization was executed after receiving the initial set of Service objects. At this point, the Service and Endpoints handler had already modified the services in the datapath. This resulted in a new service ID being allocated. Perform an early service ID restore before connecting to the API server to ensure that the ID cache is filled with all currently used service-ID-to-service-IP mappings, to avoid changing service IDs on startup. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
c0d417d lbmap: Retrieve service ID when dumping BPF map [ upstream commit 63b1a27472bae5c74e0184d8ded983310e710b62 ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
8414d73 docs: remove user flag when rendering documentation locally [ upstream commit 4a3ebcaa90dc97b9f7d87d5fa10c7434863452bf ] As a documentation dependency writes files into the Python library directory while the documentation is being built, we have to remove the user flag when creating the HTML files from the documentation. This will require developers to use root privileges to remove the generated documentation in `Documentation/_build/html`. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
348d1c8 docs: add warning in docs for older versions [ upstream commit 5a7e89100f2bc50d05fb96f0830b7e0868e9dc76 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 12 November 2018, 12:26:15 UTC
2f16e3a docs: fix CVE-2018-18074 [ upstream commit c3b3235f0a6be78736b1cf77809938b9aeda8012 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 31 October 2018, 13:59:49 UTC
12dcd01 Test: stop background monitor command. [ upstream commit 516f13825f252f4e444b6d0ffd96028e545e555f ] Always call cancel in the for loop so the background monitor command is not leaked. Fix #6022 Signed-off-by: Eloy Coto <eloy.coto@gmail.com> Signed-off-by: Ray Bejjani <ray@covalent.io> 29 October 2018, 17:04:49 UTC
21e1955 examples/kubernetes: fix cilium tolerations [ upstream commit 1f39ded3bfa4e0165d55be46b6af1de5b7b7f265 ] If the operator "Exists" is not specified, Kubernetes by default compares whether the value of the toleration equals the value specified on the Kubernetes node. Reported-by: Lennart Weller <lhw@ring0.de> Signed-off-by: André Martins <andre@cilium.io> 26 October 2018, 17:41:38 UTC
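Expressed with the k8s.io/api/core/v1 types, the difference is the Operator field; the key and effect below are illustrative, not necessarily the exact toleration from the Cilium DaemonSet:
```
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	tol := corev1.Toleration{
		Key: "node-role.kubernetes.io/master", // illustrative key
		// "Exists" matches the taint regardless of its value; the
		// default ("Equal") requires an exact value match.
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	}
	fmt.Printf("%+v\n", tol)
}
```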
6521a9e Update kube-router.rst Signed-off-by: Carson Anderson <ca@carsonoid.net> 26 October 2018, 15:44:03 UTC
bb849d6 fqdn: Make Rule UUIDs random instead of depending on the labels. Rules may not have any labels and many rules may share the same labels. Therefore a rule can't be uniquely identified by its labels. Generate a random rule UUID for each rule with "toFQDNs" so that we can identify the same rule when needed for an update. Unit tests are updated to not assume a stable UUID for a given set of labels. Tests are also modified to make sure each rule has a unique UUID regardless of its labels, and that a UUID is generated even if the rule has no labels. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 24 October 2018, 19:53:25 UTC
f094f0a Prepare for release v1.2.5 Signed-off-by: Ian Vernon <ian@cilium.io> 23 October 2018, 09:54:56 UTC
4ef5ea9 k8s: Increase CEP GC interval to 30 minutes [ upstream commit 5f91ba58fcf13152c7a3d3521d175935a073dfac ] In case the lists done by the GC controller are problematic, we now slow this down considerably. Signed-off-by: Ray Bejjani <ray@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 19 October 2018, 04:03:19 UTC
4922a44 k8s: Simpler CEP GC should-run logic [ upstream commit 6d096583e1b009e2f406251d841ef48d3663fe81 ] CEPs of one node need to be garbage collected by other nodes, since the original node is no longer present to delete its endpoints. We previously selected probabilistically which remaining node should run the GC, but this is hard to reason about. The selection is now based on node names, where the lowest in sorted order is selected per cluster. This assumes that the node list on each node is the same most of the time. This change also skips executing the GC on the first ever run, as that occurs on agent startup and may spam GC runs needlessly. The GC now also runs when there is only a single node. Signed-off-by: Ray Bejjani <ray@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 19 October 2018, 04:03:19 UTC
fe6efd3 k8s: Add --disable-endpoint-crd to disable use of the CEP CRD [ upstream commit 324ee3887b1de9c17c656b912a514cb5d67d71af ] Synchronization of the CEP CRD can have a large performance impact. Provide an option to disable the feature if the CRD is not required. Signed-off-by: Thomas Graf <thomas@cilium.io> 18 October 2018, 23:13:04 UTC
eb21925 V1.2: Add toleration on not-ready Commit 923491437e61c01a78dd9abedef9535906074dae added a new toleration. That commit cannot be backported because the 1.2 branch uses kubecfg instead of kubectl patch. Fix #5924 Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 18 October 2018, 21:22:05 UTC
dd6bcc0 pkg/endpoint: fix global k8sServerVer variable assignment [ upstream commit 86c88ab1f12cc6285399a4798e4b66d5b26e6fe9 ] As the variable is being globally assigned for all endpoints, if one endpoint re-sets it to nil it can cause a nil pointer exception during a controller execution, causing the cilium-agent to crash. Fixes: a8a81b32a771 ("pkg/endpoint: add UpdateStatus functionality for CEP") Signed-off-by: André Martins <andre@cilium.io> 18 October 2018, 16:41:53 UTC
7f6cde4 endpoint: Wait for CT cleanup to complete before BPF compilation [ upstream commit eaa486d787efd37110c70784c95c52535969f8ea ] Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
cd7289b envoy: Pass nil completion if Acks are not expected. [ upstream commit 8babf8dff27bc5eeef391cd01084474e14069293 ] Completions are cleaned up when an ack or nack for the resource type is received. This never happens if there are no configured L7 proxies. Prevent indefinite collection of completions in this case by passing nil completion to Upsert(). Fixes: 99a73a2fbd ("envoy: Update revision upon acks from NPDS") Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
b36c37c daemon: Clean up k8s health EP pidfile on startup [ upstream commit 71a863422a43e0499cd9b777b68261227544de2a ] When in kubernetes mode, every time Cilium starts, there is a new PID namespace. As such, any pidfiles that remain on the filesystem on startup are pointing to PIDs which may be reused in the new PID namespace, so it's not safe to trust their contents. In particular, for the health endpoint, we make use of a PIDfile to allow the Cilium agent to find and kill a health endpoint if it becomes unresponsive. However, we should never try to kill the health endpoint based on a PID from a previous PID namespace. Therefore, delete the pidfile for the health endpoint when starting up, just before the health endpoint is launched. This will avoid the logic for killing the previous health endpoint (which has been known to unintentionally terminate processes other than health endpoints). Fixes: #5907 Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
ec89eda pidfile: Add 'Remove' to provide pidfile deletion [ upstream commit 29b976f310d9e03e35eecb6b560fd4925f115ec6 ] Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
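A minimal Go sketch of the startup cleanup described above: delete any stale pidfile before launching the health endpoint (the path is illustrative):
```
package main

import (
	"log"
	"os"
)

// removePidfile deletes a pidfile left over from a previous PID namespace;
// a missing file is not an error.
func removePidfile(path string) {
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		log.Printf("failed to remove stale pidfile %s: %v", path, err)
	}
}

func main() {
	removePidfile("/var/run/cilium/state/health-endpoint.pid") // illustrative path
	// ...now safe to launch the health endpoint...
}
```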
ad26c37 allocator: test: Disable GC in GC unit tests [ upstream commit 5348b6b78755a37ab58737ca926af70effbfe313 ] The GC is run manually to test correctness. Disable the background GC to guarantee predictability of the test. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
4967219 allocator: Re-create slave keys when master key is missing [ upstream commit 7b212896ea74b3eb55edc0cc106da82dc26721ca ] So far, the slave key was assumed to be still present when a missing master key was detected. In case all slave keys are missing, the master key will be removed again during the next GC cycle of any node. The behavior is correct but there is unnecessary churn that is never recovered from until the identity is no longer used. By re-creating the slave key, recovery is possible, and it also increases resiliency when the kvstore is completely wiped, as each node will re-create all slave keys of locally used keys. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
15e6d04 allocator: Lock master key prefix when reusing existing cluster identity [ upstream commit 88fccdb49c02c3446d769b28e1997c264b50cc45 ] The master key prefix was only allocated when a master key was freshly allocated. When re-using a master key, there is a small race window in which a remote garbage collector can kick in and delete the master key in between the Get() and the creation of the slave key. The race would be immediately fixed via the master key protection mechanism, but it will lead to an unnecessary delete and re-create event which will trigger unnecessary regenerations. By locking the master key prefix earlier, the GC can be kept out during the race window. The cost is a slightly slower allocation. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
fcfaaef test: fix `go vet` failures in v1.2 Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 13:05:22 UTC
163b303 bump timeout on K8s upstream tests to 40 minutes Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 01:38:46 UTC
e470e6e endpoint: Skip conntrack clean on endpoint restore [ upstream commit b6f9dcc0f99729b361db1776c95410a788afc860 ] The commit cb49db51afa introduced conntrack cleaning on the initial endpoint build to clean eventual state from a previous endpoint build. This is correct in principle, but the commit failed to exclude endpoint builds triggered by restored endpoints. Fixes: cb49db51afa ("endpoint: Clear conntrack on initial endpoint build") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 16 October 2018, 23:18:15 UTC
099654a k8s: Avoid TriggerPolicyUpdates when no ToServices rules are loaded [ upstream commit 964714d51fe6391a41b936cbe96a89dd5ce7aa75 ] The call to regenerate policy for all endpoints due to an Endpoints change on an external service is only required when the policy repository contains ToServices rules. This can avoid a lot of unnecessary calls to TriggerPolicyUpdates(). Fixes: #5871 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 16 October 2018, 23:18:15 UTC
1d29ae7 daemon: add parameter indicating why TriggerPolicyUpdates is called [ upstream commit 897375482278a8753aec40b117afa8a146248f48 ] This function triggers regeneration for all endpoints, which can potentially be very costly. To get more visibility into why it is called, add a string parameter for callers to indicate the reason for calling it. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 16 October 2018, 12:51:58 UTC
af978a0 protect bpf.PerfEvent.Read from infinite loop [ upstream commit 37c67e8085f0cf241904a0274facf9167d7bd7f9 ] An infinite loop in `Read` can be caused by corrupted data in the perf ring buffer. If the timeout in the `Read` loop is reached, the perf event is logged for further debugging. Signed-off-by: Maciej Kwiek <maciej@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC
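A minimal Go sketch of bounding a read loop with a deadline so corrupted ring data cannot spin forever (the structure is illustrative, not the bpf package API):
```
package main

import (
	"errors"
	"fmt"
	"time"
)

var errReadTimeout = errors.New("perf read exceeded deadline; ring may be corrupt")

// readEvents drains events via next() but gives up once the deadline
// passes, surfacing the last event for debugging instead of spinning.
func readEvents(next func() ([]byte, bool), timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ev, ok := next()
		if !ok {
			return nil // ring drained
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("%w: last event %x", errReadTimeout, ev)
		}
		// ...process ev...
	}
}

func main() {
	// A next() that never drains simulates a corrupted ring.
	next := func() ([]byte, bool) { return []byte{0}, true }
	fmt.Println(readEvents(next, time.Millisecond))
}
```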
cc32bf4 bpf: Use 'forwarding_reason' instead of potentially overwritten 'ret' [ upstream commit 1ec1c3e87e24e001c3c762781d8819a645b3fcef ] 'ret' is overwritten in some code paths, so use 'forwarding_reason' instead. Fixes: 07a0969a2b4b ("bpf: Do not redirect replies from a pod to a proxyport.") Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC
8239d7e bpf, perf: refine barriers, tail pointer update and buffers [ upstream commit c3df3d7bd168ada3c4658a270ce955784aa21db3 ] - Refine paired memory barriers from user space side. - In-place tail update which allows the kernel to fill new entries before we finish a slow full run as before. - Increase tmp buffer to avoid potential corruption, and realloc on demand for really large sizes. - Sanity check on ring creation. - Using mask instead of modulo op for ring offset. - Reduce a bit of complexity overall on the read side. - Error logs counter for truncated events. - Unmap golang mmap buffer on close. - Select sample_period of 1 to trigger writes into RB. Future improvements: add a mode sample_period=0,wakeup_events=0 where we act on poll timeout for collecting data that is not too latency sensitive so we can avoid wakeups and batch samples. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC
6f32b11 bpf: Avoid additional cgo call per perf read [ upstream commit 3e391ce21d8e0db83d0736cab5d556cf463178cd ] Every cgo call is expensive. Avoid the call to Cast() as it can be folded into the read. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC
03d6838 pidfile: Log when killing a process [ upstream commit 3d68810350b68eff9f69d23cb1ff310e2ae9a7f2 ] Whenever we kill a process using its pidfile, log a message. These may be useful when trying to trace the origin of a spontaneous program exit. Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC
fa5a4d0 examples/kubernetes: Synchronize CRIO init YAMLs [ upstream commit 3b6e8a9aa377121d6ff1dfc864f18757ef91b4cb ] These YAMLs were out of date compared with the regular Cilium DaemonSet YAMLs, so update them to match. Backporter's notes: Regenerated to include changes in k8s v1.7 YAMLs Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC
615e677 examples/kubernetes: Clean up pidfiles on startup [ upstream commit a12824bdf0a97423c7e935cf7bf9d737e01207ec ] When the Cilium pod starts up, it begins with a brand new PID namespace. Any pidfiles that may have been persisted in the /var/run/cilium directory will refer to PIDs that have been terminated, so these PIDs will be of no use to anything within the daemon. This is believed to be the cause of occasional termination of clang and llc processes during startup with the error "signal: killed". Fixes: #5748 Backporter's notes: Regenerated to include changes in k8s v1.7 YAMLs Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC
16e65f8 bpf: Do not redirect replies from a pod to a proxyport. [ upstream commit 07a0969a2b4badb5e0d3eb874da14e7c2e7d394e ] Replies going into a pod are already not redirected to a proxy. Do the same for replies going out of a pod. Only original direction packets should be redirected to a proxy. Backporter's notes: Rebased conflict due to CT map getter in v1.3+. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 12:02:35 UTC