https://github.com/cilium/cilium

50359ea Prepare for v1.1.6 Signed-off-by: Ian Vernon <ian@cilium.io> 23 October 2018, 13:54:00 UTC
56a94ed daemon: Clean up k8s health EP pidfile on startup [ upstream commit 71a863422a43e0499cd9b777b68261227544de2a ] When in kubernetes mode, every time Cilium starts, there is a new PID namespace. As such, any pidfiles that remain on the filesystem on startup are pointing to PIDs which may be reused in the new PID namespace, so it's not safe to trust their contents. In particular, for the health endpoint, we make use of a PIDfile to allow the Cilium agent to find and kill a health endpoint if it becomes unresponsive. However, we should never try to kill the health endpoint based on a PID from a previous PID namespace. Therefore, delete the pidfile for the health endpoint when starting up, just before the health endpoint is launched. This will avoid the logic for killing the previous health endpoint (which has been known to unintentionally terminate processes other than health endpoints). Fixes: #5907 Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 09:29:02 UTC
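The cleanup semantics described above (delete the health endpoint's pidfile at startup, and treat an already-missing file as success) can be sketched in a few lines of Go. `removeStalePidfile` and the pidfile path are illustrative names for this sketch, not Cilium's actual API:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeStalePidfile deletes a pidfile left over from a previous PID
// namespace. A missing file is not an error: there is simply nothing
// stale to clean up.
func removeStalePidfile(path string) error {
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}

func main() {
	dir, _ := os.MkdirTemp("", "demo")
	defer os.RemoveAll(dir)

	pidfile := filepath.Join(dir, "health-endpoint.pid")
	// A stale PID written by a previous incarnation of the agent; in a
	// fresh PID namespace this number may belong to an unrelated process.
	os.WriteFile(pidfile, []byte("12345"), 0o644)

	if err := removeStalePidfile(pidfile); err != nil {
		panic(err)
	}
	_, err := os.Stat(pidfile)
	fmt.Println(os.IsNotExist(err)) // prints "true": stale pidfile is gone
}
```

Deleting rather than killing avoids ever signalling a PID that was reused by an unrelated process.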
3a76729 pidfile: Add 'Remove' to provide pidfile deletion [ upstream commit 29b976f310d9e03e35eecb6b560fd4925f115ec6 ] Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 17 October 2018, 09:29:02 UTC
b77e542 envoy: Pass nil completion if Acks are not expected. [ upstream commit 8babf8dff27bc5eeef391cd01084474e14069293 ] Completions are cleaned up when an ack or nack for the resource type is received. This never happens if there are no configured L7 proxies. Prevent indefinite collection of completions in this case by passing nil completion to Upsert(). Fixes: 99a73a2fbd ("envoy: Update revision upon acks from NPDS") Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 17 October 2018, 09:15:06 UTC
ec785c1 endpoint: Skip conntrack clean on endpoint restore [ upstream commit b6f9dcc0f99729b361db1776c95410a788afc860 ] The commit cb49db51afa introduced conntrack cleaning on the initial endpoint build to clean up eventual state from a previous endpoint build. This is correct in principle, but the commit failed to exclude endpoint builds triggered by restored endpoints. Fixes: cb49db51afa ("endpoint: Clear conntrack on initial endpoint build") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 16 October 2018, 01:18:34 UTC
24241e5 protect bpf.PerfEvent.Read from infinite loop [ upstream commit 37c67e8085f0cf241904a0274facf9167d7bd7f9 ] Infinite loop in `Read` can be caused by corrupted data in perf ring buffer. If the timeout in `Read` loop is reached, perf event is logged for further debugging. Backporter's notes: Conflict in pkg/bpf/map. Discarded that change. Signed-off-by: Maciej Kwiek <maciej@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 09:54:30 UTC
2c979d4 bpf: Use 'forwarding_reason' instead of potentially overwritten 'ret' [ upstream commit 1ec1c3e87e24e001c3c762781d8819a645b3fcef ] 'ret' is overwritten in some code paths, so use 'forwarding_reason' instead. Fixes: 07a0969a2b4b ("bpf: Do not redirect replies from a pod to a proxyport.") Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 09:54:30 UTC
b41b329 bpf, perf: refine barriers, tail pointer update and buffers [ upstream commit c3df3d7bd168ada3c4658a270ce955784aa21db3 ] - Refine paired memory barriers from user space side. - In place tail update which allows kernel to fill new entries before we finished a slow full run as before. - Increase tmp buffer to avoid potential of corruptions, and realloc on demand for really large sizes. - Sanity check on ring creation. - Using mask instead of modulo op for ring offset. - Reduce a bit complexity overall on read side. - Error logs counter for truncated events. - Unmap golang mmap buffer on close. - Select sample_period of 1 to trigger writes into RB. Future improvements: add a mode sample_period=0,wakeup_events=0 where we act on poll timeout for collecting data that is not too latency sensitive so we can avoid wakeups and batch samples. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 09:54:30 UTC
b54edee bpf: Avoid additional cgo call per perf read [ upstream commit 3e391ce21d8e0db83d0736cab5d556cf463178cd ] Every cgo call is expensive. Avoid the call to Cast() as it can be folded into the read. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Joe Stringer <joe@covalent.io> 12 October 2018, 09:54:30 UTC
02e8d07 examples/kubernetes: Synchronize CRIO init YAMLs [ upstream commit 3b6e8a9aa377121d6ff1dfc864f18757ef91b4cb ] These YAMLs were out of date compared with the regular Cilium DaemonSet YAMLs, so update them to match. Signed-off-by: Joe Stringer <joe@covalent.io> 10 October 2018, 15:01:06 UTC
1769f82 examples/kubernetes: Clean up pidfiles on startup [ upstream commit a12824bdf0a97423c7e935cf7bf9d737e01207ec ] When the Cilium pod starts up, it begins with a brand new PID namespace. Any pidfiles that may have been persisted in the /var/run/cilium directory will refer to PIDs that have been terminated, so these PIDs will be of no use to anything within the daemon. This is believed to be the cause of occasional termination of clang and llc processes during startup with the error "signal: killed". Fixes: #5748 Backporter's notes: Regenerated to include changes in k8s v1.7 YAMLs Signed-off-by: Joe Stringer <joe@covalent.io> 10 October 2018, 15:01:06 UTC
f7f103c bpf: Do not redirect replies from a pod to a proxyport. [ upstream commit 07a0969a2b4badb5e0d3eb874da14e7c2e7d394e ] Replies going into a pod are already not redirected to a proxy. Do the same for replies going out of a pod. Only original direction packets should be redirected to a proxy. Backporter's notes: Rebased conflict due to CT map getter in v1.3+. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Joe Stringer <joe@covalent.io> 10 October 2018, 15:01:06 UTC
863c43b policy: do policy modifications based on the CNP identifiable labels [ upstream commit 7be9127a7e79cf16cc3ae0d05b67238108a48ce4 ] The user installs a CNP with 2 rules defined, 1 of which contains a user-defined label. When doing a CNP update where the rule with the user-defined label no longer exists, that rule would never be deleted by Cilium because there was no reference to those user-defined labels. This commit introduces new functionality in PolicyAdd which allows rules with a specific set of labels to be deleted during a PolicyAdd. This should ideally be used during a PolicyAdd to perform an atomic operation during an update of the CNP. [backport note: dropped tests due to missing dependencies/infra not in 1.1 but used in original patch] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 08 October 2018, 10:05:40 UTC
03c5d16 crio: don't mount bpf path for k8s >= 1.11 It seems that since cri-o 1.11, `/sys/fs/bpf` is already mounted into the container. To prevent Cilium from failing its initialization setup, which checks whether bpf is correctly mounted, the crio kubernetes descriptors won't have the volume mounted for `/sys/fs/bpf`, as it is automatically done by runc. Signed-off-by: André Martins <andre@cilium.io> 07 October 2018, 18:31:54 UTC
2857988 examples/kubernetes: add better comment for bpf-maps volume This will also help the removal of the N lines that start with this exact description. Signed-off-by: André Martins <andre@cilium.io> 07 October 2018, 18:31:54 UTC
3758cae envoy: Use separate clusters for egress and ingress redirects. [ upstream commit 8bd65a6bfd44b9949362999073c363273133623a ] Currently it is possible that an ingress proxy accidentally reuses a connection opened by an egress proxy, since in both cases the connections share the same (original) destination address and the same source security ID. Fix this by using separate clusters for ingress and egress redirects, as each cluster has its own connection pool. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 04 October 2018, 23:01:08 UTC
d9468b3 envoy: Update generated protobufs Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 04 October 2018, 23:01:08 UTC
7f0e672 test: fix star wars demo Fixes: 043a9c89d0a5 ("test: fix star wars demo to run star-wars v1.0") Signed-off-by: André Martins <andre@cilium.io> 02 October 2018, 10:53:22 UTC
a62a974 envoy: Pass error detail when NACK [ upstream commit b636fd92c378d9cfcbbcd67cf18d1d9142777c1a ] Define a 'ProxyError' error type that embodies the error detail received from Envoy when xDS protocols reject resource updates. This helps debugging when policy or listener updates fail. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 October 2018, 23:49:59 UTC
838f530 xds: Start versioning at 1. [ upstream commit a02b33b7d959da8b5cfc6580f97bd44f3ddd1273 ] Reserve version 0 for the initial Envoy xDS request (empty string). This simplifies version number handling as we do not need to deal with uint64 pointers any more. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 October 2018, 23:49:59 UTC
bc6a941 envoy: Make NACK cancel the WaitGroup [ upstream commit 551377ffa0d3cca2e4a6f8502afd640ba7afc9cc ] Allow the completion type to complete in failure. The callback is extended with an error parameter to pass the error condition. The err parameter will be nil when called after positive acknowledgement has been received, non-nil when negative acknowledgement has been received, and 'context.Canceled' or 'context.DeadlineExceeded' when the context has been cancelled or timed out, respectively. A non-nil error parameter causes cancellation of the whole WaitGroup. WaitGroup.Wait() returns the most severe error value encountered, in the following order of severity, from highest to lowest: 1. non-context errors 2. context.Canceled 3. context.DeadlineExceeded 4. nil To correctly detect xDS resource versions that are ACKed or NACKed, the nonce field is changed to be the same as the version in the response messages. Then, in the following request, the VersionInfo field tells us the last ACKed version, while the ResponseNonce field tells us the last processed, but not ACKed version. This means that all versions after VersionInfo up to and including ResponseNonce are NACKed. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 01 October 2018, 23:49:59 UTC
0475e36 daemon: fix potential nil pointer dereference [ upstream commit b7f36b641391c7f74e23dd2ebe70e21ec52363d6 ] During 1.1 backports, I was hitting the following golang panic in the CI: [...] 09:42:59 PANIC: daemon_test.go:158: DaemonEtcdSuite.SetUpTest 09:42:59 09:42:59 ... Panic: runtime error: invalid memory address or nil pointer dereference (PC=0xF3B38A) 09:42:59 09:42:59 /usr/local/go/src/runtime/panic.go:502 09:42:59 in gopanic 09:42:59 /usr/local/go/src/runtime/panic.go:63 09:42:59 in panicmem 09:42:59 /usr/local/go/src/runtime/signal_unix.go:388 09:42:59 in sigpanic 09:42:59 state.go:155 09:42:59 in Daemon.regenerateRestoredEndpoints 09:42:59 daemon.go:1291 09:42:59 in NewDaemon 09:42:59 daemon_test.go:105 09:42:59 in DaemonSuite.SetUpTest 09:42:59 daemon_test.go:160 09:42:59 in DaemonEtcdSuite.SetUpTest 09:42:59 /usr/local/go/src/reflect/value.go:308 09:42:59 in Value.Call 09:42:59 /usr/local/go/src/runtime/asm_amd64.s:2361 09:42:59 in goexit Turns out there's one spot in restoreOldEndpoints() which can return a nil state on error instead of an empty old one as in other error cases. While the issue was reproducing every single run before, I wasn't able to trigger the panic after the fix. Fixes: #5695 Spotted-by: André Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 01 October 2018, 23:47:17 UTC
df30bf6 fix alignment in Go structs [ upstream commit 81da0ceeda66e4bc9e3c72d549b77197c7265f88 ] [ partial backport ] There was a structure missing a field, which was causing a misalignment in `cilium bpf ct list global`. ``` TCP OUT 192.168.33.11 48672:6443 expires=16211 RxPackets=0 RxBytes=0 TxPackets=1 TxBytes=66 Flags=0 RevNAT=3 SourceSecurityID=2038366208 ``` This bug triggered a verification of all golang structures to see if they were missing any field or were misaligned relative to the equivalent C structure. It doesn't solve out-of-order fields between the 2 programming languages, as that should be manually verified by the developer. Now, every time the cilium-agent starts, all equivalent C and Go structs are verified to have the exact offset per field and the exact size in both structures. In case they mismatch, a panic similar to the following will be presented when running cilium-agent: ``` different structure types for CGO and GO: align_checker._Ctype_struct_ct_entry != ctmap.CtEntry panic: C struct field "slave" (2) has different size than GoStruct field "TxFlagsSeen" (1) goroutine 1 [running]: github.com/cilium/cilium/pkg/align_checker.init.0() /home/aanm/git-repos/go/src/github.com/cilium/cilium/pkg/align_checker/align_checker.go:93 +0x1d8 ``` Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 01 October 2018, 23:47:17 UTC
523b740 bpf: Add basic endpointKey.ToIP() test [ upstream commit 6e7c551ec8970960c95f065f663f1758de3e90c8 ] Add a test to ensure that endpointKey.ToIP() converts IPv4 and IPv6 addresses into strings properly. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 01 October 2018, 23:47:17 UTC
c1c7f42 daemon: Improve syncLXCMap failure log [ upstream commit 39634c5e612aa7a245474cb71c1049abfcb9e490 ] This log previously wouldn't provide the context of what was used to perform the lookup, so it was impossible to draw any conclusions from the log message. Fix this by formatting the IP being deleted. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 01 October 2018, 23:47:17 UTC
5fd3362 lxcmap: Fix invalid dumping of IPv4 entries [ upstream commit 72d28b725caa2ba7ea843b4f13d53ee23e26535c ] This would previously format IPv4 addresses in the IPv6 form, which would lead to the following errors: msg="Unable to delete obsolete host IP from BPF map" error="Unable to delete element from map cilium_lxc: no such file or directory" Fix this by using the appropriate function to convert the address into the right form. Fixes: #5678 Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 01 October 2018, 23:47:17 UTC
e96d8bc agent: Fix temporary corruption of BPF endpoint map on restart [ upstream commit 4ecf111c35f047f490f5cc8858d23c15e342541f ] Due to deleting and repopulating the BPF endpoint map on startup, Cilium agent restarts would introduce a race window of a couple of seconds [0] in which local endpoints would not be reachable. This would result in packet drops or, depending on the routing path, a routing loop which eventually leads to a drop followed by an ICMP TTL exceeded. [0] The race window for host IPs was in the microseconds range. The race window for endpoints depended on the number of local endpoints and etcd latency, as each endpoint would reappear as it was regenerated on restore. Fixes: #5651 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 01 October 2018, 23:47:17 UTC
e2d2073 proxy: Check whether a port is already open before allocating [ upstream commit 98626286af4779ed100acb53f0b809c873a5583f ] it to a new proxy redirect. The previous check tried opening and closing the port, which triggered errors in Envoy when it tried to open the allocated port too soon after the check closed the port. However, removing that check exposed Envoy to legitimate port conflicts with other processes opening ports in the same port range. Re-add a check before port allocation, implemented by listing the currently open ports (like netstat) instead of opening and closing the ports. Fixes: #5532 Fixes: #5495 Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 19 September 2018, 10:29:01 UTC
e55cff7 Test/Demos: Make assert more robust. [ upstream commit 07903008ecd60ce7e9db6ccbad79e6e42e85d821 ] As seen in the 1.0 branch and issue #5531, the xwingPods array could have zero length, in which case the assert would not catch the failure, since it only checked that each pod name differs from an empty string. With this commit the assert also checks that the array does not have zero length, so this will no longer be a test flake. Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 19 September 2018, 10:29:01 UTC
41538f1 k8s: Fix CNP delete handling to not rely on rules being embedded [ upstream commit 9073d4252620ec35d49e245053fa3c9d93d0ceb9 ] The delete behavior of CNPs so far was to parse the CNP and extract the labels of the first embedded rule. In case a delete notification did not include the rules, the delete would be ignored but successfully acknowledged. It is not necessary to rely on the rules, as correlation of the CNP to the policy rule is done via namespace+name+resource-type, which is available in the metadata section of the CNP. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 19 September 2018, 10:29:01 UTC
1ad73e2 k8s: Include type of derived k8s resource in policy rule [ upstream commit 483c92c1a3ce2df372f80150536dba8d8305fc3a ] So far, policy rules derived from k8s resources such as CiliumNetworkPolicy and NetworkPolicy have been identified by namespace and name. This is not unique enough as a CiliumNetworkPolicy and NetworkPolicy can have conflicting names. Include the resource type in the policy rule labels to guarantee uniqueness. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 19 September 2018, 10:29:01 UTC
42e5259 test: Fix the semantics of WithTimeout's Timeout [ upstream commit 460c376e45813f77285ea2cc6c23c7854a30807c ] The Go docs specify that a time.Duration "represents the elapsed time between two instants as an int64 nanosecond count." However, TimeoutConfig's Ticker and Timeout fields violated that semantics as they were defined as time.Duration while storing durations in Seconds. Fix the timeout handling in WaitForServiceEndpoints, which converted timeouts into billions of seconds. To prevent such bugs from occurring again, specify clearly that Ticker and Timeout contain seconds, and change their type to int64. Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
043a9c8 test: fix star wars demo to run star-wars v1.0 [ upstream commit 50f29e92fc737d4649257c744156724c064a983c ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
5f2b695 Revert "test/k8sT: use specific commit for cilium/star-wars-demo YAMLs" [ upstream commit fce1786db8d09662b18f18578038b5e9f09a355f ] This reverts commit c162b904a93e7b597a60dd00cbcf9f398d9235c1. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
024c58f tests: disable k8s 1.12-alpha.0 tests [ upstream commit db674150168541c2d1dc6524d761f7e2d0f00906 ] Since k8s 1.12-alpha.0 was build before the integration of the CRD Update Status feature, the tests will always fail as Cilium uses this feature for all k8s versions >= 1.11.0. This commit should be reverted once k8s 1.12-beta.0 is released. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
59f6b8c Revert "Revert "ginkgo-kubernetes-all.Jenkinsfile: move k8s 1.10 and 1.12 to same stage"" [ upstream commit 41d8fc9260bc6b80cf12e01431a70e3c5e5f4423 ] This reverts commit 5e000239faafaa35cbd0bda747259c7016283ea5. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
09aea5b test: update k8s to 1.9.9 and 1.10.5 [ upstream commit e1c154dece59e5c4ff392ac66385ad6a8426c608 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
a72cfe4 Revert "Revert "test: update k8s to 1.8.14, 1.10.4 and 1.11.0"" [ upstream commit 9f979bbb42d6611159211f58f963be3ac3a11528 ] This reverts commit 0f4a1d7bf4042b17c5130f0fb356e37e9e0f8023. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
4809648 test: set default CRI socket [ upstream commit 0c4e2a7b9e9417accea6e9955778e0a0b1c7a860 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
3aa79f4 test/k8sT: wait for DNS to be ready in Kafka pods [ upstream commit 1f2f2807bce9020075e73f8a6fe5facbfed53e56 ] We have observed that while DNS lookups succeed from the host for the various services used in the Kafka Policies test, the CI has failed with errors like the following: "Removing server kafka-service:9092 from bootstrap.servers as DNS resolution failed for kafka-service" To prevent this issue, ensure that "nslookup" succeeds within each Kafka pod itself for "kafka-service". Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 18 September 2018, 07:35:07 UTC
1c4d54d Test: Fix issues with kubernetes test on old branches. In commit `6ca4ac682c2800bd4d5bb7d221a302ef7b0532b8` the logger was not initialized and the test panicked on start; with this change the test no longer panics. Fix #5531 Signed-off-by: Eloy Coto <eloy.coto@gmail.com> 17 September 2018, 12:54:47 UTC
a2551d7 Prepare for 1.1.5 release Signed-off-by: Thomas Graf <thomas@cilium.io> 14 September 2018, 14:11:52 UTC
b99a931 test: add CI test for endpoint with already-allocated identity [ upstream commit 9f12aa2bb8968352944f35a22d4b28b312ce7ce1 ] Add a CI test which ensures that policy is correctly computed for endpoints which have an identity that was created in the key-value store prior to the creation of said endpoint. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 September 2018, 16:06:43 UTC
075979f endpoint: remove revision check around L4 policy calculation [ upstream commit 4c4be8c9965e09a8ae234fa33eb8d19cd3ae489c ] Endpoint policy regeneration logic would check whether the policy repository revision changed since the endpoint's prior policy calculation to determine if the endpoint's L4 policy should be calculated. However, endpoint L4 policy should be recalculated for a variety of other reasons as well, including endpoint identity change. Thus, short-circuiting endpoint L4 policy regeneration based only on the difference between the endpoint's policy revision and the policy repository's revision is incorrect and could lead to cases where L4 policy for an endpoint is not calculated. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 September 2018, 16:06:43 UTC
1d69b98 examples/kubernetes: Add clean-cilium-bpf-state option [ upstream commit b88b879d2b5ab78c5a82398d6a46d084e5892a5b ] Add a new init container option which removes all pinned BPF maps during startup without clearing /var/run/cilium/state/. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 September 2018, 16:06:43 UTC
ea599bc lbmap: Guarantee order of backends while scaling service [ upstream commit 7e2f9149ccdcf8300959f4b53eeb0aa6ae567f3e ] This commit resolves inconsistency when updating loadbalancer BPF map entries. The datapath relies on a consistent mapping of backend index to backend IP, as the backend index is cached in the connection tracking table. The commit resolves the following deficits: * The previous listing of backend services when synchronizing down relied on map iteration. Unfortunately, map iteration order is random, which could lead to unnecessary backend slot reordering. * When scaling down the number of backends, the previous behavior would delete the backends and shift all entries to correspond to the new backend count. This broke the consistent load-balancing mapping for existing connections relying on the shifted backends. Unfortunately, the datapath relies on a hole-free list of backends in order to perform cheap slave selection based on the packet hash. The resolution is to preserve backend slots that are freed up by backend deletions and fill them with duplicates of other backends. In order to keep the load-balancing distribution fair, the backend with the fewest duplicates is nominated to fill in. Fixes: 42254ccbfaf ("lbmap: Support transactional updates") Fixes: #5425 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
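The hole-filling strategy (keep freed slots occupied by duplicating the surviving backend with the fewest copies) can be sketched roughly like this; the function, its string-based backend type, and the tie-break rule are illustrative, not the actual lbmap code:

```go
package main

import "fmt"

// fillHoles keeps deleted backend slots occupied by duplicating the
// surviving backend that currently has the fewest copies. Surviving
// slots never move, so slot->backend mappings cached in conntrack stay
// stable and the list remains hole-free for hash-based slave selection.
func fillHoles(slots []string, deleted map[string]bool) []string {
	count := map[string]int{}
	for _, b := range slots {
		if !deleted[b] {
			count[b]++
		}
	}
	out := make([]string, len(slots))
	for i, b := range slots {
		if !deleted[b] {
			out[i] = b
			continue
		}
		// Nominate the survivor with the fewest duplicates; break ties
		// lexicographically to keep this sketch deterministic.
		best, bestN := "", int(^uint(0)>>1)
		for cand, n := range count {
			if n < bestN || (n == bestN && cand < best) {
				best, bestN = cand, n
			}
		}
		count[best]++
		out[i] = best
	}
	return out
}

func main() {
	slots := []string{"b1", "b2", "b3", "b4"}
	fmt.Println(fillHoles(slots, map[string]bool{"b3": true}))
	// prints "[b1 b2 b1 b4]": b3's slot is reused, others stay put
}
```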
f453d5f proxy: Remove port binding check on redirect creation [ upstream commit 02e737224008e21f04cf7872eaf1e568a0325c3f ] On redirect creation, we were checking that the allocated port could be bound, by opening and closing a listen socket on that port. However, Linux gives no guarantees about the delay after closing when the port can be bound again (by Envoy, in this case). Due to recent performance improvements, the delay between the test and the creation of the listener in Envoy was shortened, which increased the probability of Envoy not being able to bind the port for a new listener. Remove this check altogether. Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
e32a475 lbmap: Support transactional updates [ upstream commit 42254ccbfaf8d3bfd464064aa995a38145bbfb07 ] The existing service update logic deleted and re-added a service every time a service had to be updated. This commit adds update logic to the load balancer map and uses it when adding and updating services. There is still a small race window: 1. If the number of service backends is reduced 2. AND the CPU processing a packet in the datapath gets preempted between looking up the master key and looking up the selected slave. 3. AND the selected slave has an index that is greater than the new number of backends 4. THEN the slave lookup will fail Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
d3e7106 lbmap: Introduce lock to allow for transactional operations [ upstream commit 6339f502979ebcf78072b29f6228e2bcdeabeaf5 ] There is a high level lock in daemon/loadbalanger.go which lbmap currently relies on but this is very fragile and not ideal. Introduce a lock on lbmap level. It's a simple global lock spanning all maps for now. It could eventually become more fine grained to cover only certain maps at a time but the contention on this lock is minimal for now. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
93a9bb3 lbmap: Mark internal APIs as private [ upstream commit 055e0578b6fbab6f028da514e7a29d47d94b225e ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
f755ced endpoint: Fix realized state corruption when initial policy regeneration fails [ upstream commit 46b36d4157aab6cab3336e39ec33026bb162d67b ] When an endpoint regeneration calculates policy for the first time, the policy map is created. Up to now, the regeneration function queued a deferred function that, on error, deleted the policy map again. If the endpoint generation failed *after* the policy map content was already successfully synchronized, the removal of the policy map resulted in a flush of the policy map content without resetting e.realizedMapState, which in turn would make e.syncPolicyMap() believe that the desired state was already fully realized when the BPF map itself was flushed. This would result in the BPF map being out of sync with what is believed to be the realized state. There is no need to defer the deletion of the map if the regeneration fails; the policy map can remain in the endpoint. It will get reused in the next regeneration attempt and will eventually get deleted when the endpoint is removed, together with all other BPF maps tied to the endpoint. Similarly, if the policy map is removed for some reason, the regeneration re-creates the map but did not reset e.realizedMapState and thus would never write to the map. Resolve this by always resetting realizedMapState when the policy map is created. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
b38f763 envoy_test: Better logging for Envoy. [ upstream commit 84f20c307acc9253c1d0cc96c2be404167cf31c4 ] Use a temp directory that remains in the file system after the test is finished so that the Envoy logs can be examined after the test run. Enable flowdebug so that Envoy logging will be set to "debug" Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
4b17b0c envoy: Unix domain socket for Envoy admin. [ upstream commit fc806381d415267bb916236b53d1e9d6d066f372 ] Configure Envoy to use a Unix domain socket for the admin interface. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
b8a91d3 envoy: Use POST for admin interface [ upstream commit ee52d736887622328c90fa8d79d04cd6fdde891d ] Envoy has changed to not allow GET method on the admin interface for requests that mutate Envoy state. Use POST instead so that we can successfully change the log level at runtime and make Envoy quit when Cilium is terminated. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
8fd00b7 allocator: Only re-create verified local keys [ upstream commit 7e5dd1052490134e0d14081736cba67b7181b9cf ] The re-creation logic recreated all local keys, even keys that were not verified yet. Unverified keys are keys that have been locally allocated but not yet synchronized with the kvstore. Upon re-creating such keys, the later initial creation of the key in the regular code path would no longer regard such keys as created and unit tests would fail. This happened when the background sync routine ran in between local allocation and kvstore synchronization. Fixes: #5437 Fixes: 10e50bef7b6a ("allocator: Periodically re-create master keys for local allocations") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
266c0fe allocator: Periodically re-create master keys for local allocations [ upstream commit 10e50bef7b6a4a48b5eab5abe8cc7df0a8bdd2dc ] Add a new master key protection mode of the allocator which will cause any node to immediately re-create master keys of locally used allocations. Also run a background routine to periodically re-create all master keys of locally used allocations. This fixes #5397 while we are not entirely sure what caused the master keys to be deleted. This also indirectly improves behavior of #5399 but does not directly fix it. Fixes: #5397 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
4705bc4 envoy: Set route reply policy to retry on "5xx" [ upstream commit e275ab46683f5e1a840d6b13cc0cbab78cf3bbbb ] On "5xx" policy "Envoy will attempt a retry if the upstream server responds with any 5xx response code, or does not respond at all (disconnect/reset/read timeout). (Includes connect-failure and refused-stream)." This fixes issues due to connections broken by policy changes that either insert or remove a proxy redirect on the path of the connection. The max number of retries is set to 3, as I found out in local testing that sometimes the first retry would end up also re-using a connection broken by a policy change. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
5c24bbc allocator: Log allocator garbage collection events [ upstream commit 8b38474c45fd27daa0a03a5122207b10ae26aa68 ] There was almost no logging information so far. Log errors encountered while performing allocator garbage collection as warnings, and successful deletions as info messages. Even deletion is relatively rare and will happen only once per unused identity on one of the nodes in the cluster. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
c6d7592 agent: Fix periodic agent unhealthiness due to CompilationLock contention [ upstream commit 29f95818ab0f1c19e8b727f8c0e84a22017986a2 ] The agent status report acquires the CompilationLock in an attempt to detect deadlocks. This lock can receive a lot of contention, as it is held throughout entire compilation cycles, at times when many endpoints need to be rebuilt; waiting for this lock can potentially take a minute or longer. There is no point in acquiring this lock with the motivation of detecting deadlocks, as it may legitimately be held for a long time. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 11 September 2018, 18:55:37 UTC
83e1bb8 Revert "endpoint: In BPF regeneration, create/remove listeners early" This reverts commit f556222968a19314950d74f97f834d09174bdb56. Fixes: https://github.com/cilium/cilium/pull/5443 Signed-off-by: Romain Lenglet <romain@covalent.io> 04 September 2018, 22:50:19 UTC
83bea42 Ignore non-existing link error in cni del [ upstream commit dd19c96e33efc8471992ff49968e3a023e5ef4e9 ] Signed-off-by: Maciej Kwiek <maciej@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 04 September 2018, 17:29:40 UTC
f556222 endpoint: In BPF regeneration, create/remove listeners early [ upstream commit d8314c7734c6efee94411bf6e506b91ac853d0e7 ] Create new listeners and remove obsolete listeners as early as possible, just after updating the policy map. This gives more time to Envoy to configure and ACK listener config updates, and reduces the probability of regeneration timeouts due to Envoy. Also remove one endpoint mutex locking section after BPF compilation, in order to reduce the regeneration time. Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 04 September 2018, 17:29:40 UTC
237e131 xds: Ignore completion timeouts on resource upsert and delete [ upstream commit bd42ceca1680b365905a5e84dfbc43b63d1db853 ] Fixes: https://github.com/cilium/cilium/issues/5317 Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 04 September 2018, 17:29:40 UTC
6ca4ac6 Revert "test/k8sT: use specific commit for cilium/star-wars-demo YAMLs" [ upstream commit fce1786db8d09662b18f18578038b5e9f09a355f ] This reverts commit c162b904a93e7b597a60dd00cbcf9f398d9235c1. Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 04 September 2018, 17:29:40 UTC
98edaae lxcmap: Fix always returning an error on delete [ upstream commit a0fbf0adde2c87b1b3b7f32e4464022a7adc2a26 ] Previously, the `errors` variable that would be returned was always initialized to a non-nil empty slice, which would ensure that if the caller checked it against nil that it would appear to always fail, leading to the following false positive warning message: unable to delete element 7439 from map /sys/fs/bpf/tc/globals/cilium_policy_7439: [] Fixes: #5089 Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 04 September 2018, 17:29:40 UTC
a7cf921 lxcmap: Improve error messages in DeleteElement() [ upstream commit a8e1512f3afe3ad88a6f8d708aa4fd1f569ead21 ] Include the map name here, so that the caller doesn't need to. Related: #5089 Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 04 September 2018, 17:29:40 UTC
a13fb85 Change the prometheus yaml to deploy in monitoring namespace [ upstream commit b88691fcb10dc482c2174159c18ebe5cde33e780 ] Prometheus and Grafana are commonly deployed in the monitoring namespace; this change makes it easy to connect to those services without requiring a fully-qualified domain name for each service. Signed-off-by: Arvind Soni <arvindsoni@gmail.com> Signed-off-by: Maciej Kwiek <maciej.iai@gmail.com> 04 September 2018, 17:29:40 UTC
4219289 examples/kubernetes: add node.kubernetes.io/not-ready toleration Kubernetes 1.12 added node.kubernetes.io/not-ready for nodes that don't have their runtime network ready. Since Cilium needs to be deployed on nodes so it can set up the CNI configuration, the not-ready toleration needs to be added to the DaemonSet. Signed-off-by: André Martins <andre@cilium.io> 04 September 2018, 16:07:53 UTC
8d78392 k8s: add /status to RBAC for backport compatibility In Cilium 1.2, a k8s functionality called "CRD Subresources" was added. Cilium enables this functionality by updating its CRD definition in the Kubernetes API server and, with it, the CRD version. This functionality allows Cilium to update only the Cilium Network Policy (CNP) and Cilium Endpoint (CEP) status via the `/status` API endpoint, without sending the full object to the Kubernetes API server. On a downgrade from 1.2 to 1.1 where the user also downgrades the RBAC rules, Cilium will send the full object to Kubernetes whenever it wants to update the status of the CNP or the CEP. As the user downgraded the RBAC definition, Cilium will no longer have permission to write any status for any Cilium object. As the CRD definition was updated in the Kubernetes API server in 1.2, Cilium 1.1 will not perform any changes to that CRD definition, since its version is lower than the one installed in kube-apiserver. To fix this issue, it is easier to backport the RBAC rules to 1.1, which allows Cilium to keep write permissions on `/status`. This downgrade issue will only happen in Kubernetes >= 1.11, which is when "CRD Subresources" were enabled by default in the Kubernetes API server. Signed-off-by: André Martins <andre@cilium.io> 03 September 2018, 11:44:24 UTC
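The backported RBAC rule is, in spirit, a ClusterRole that lists the `/status` subresources explicitly. The fragment below is an illustrative sketch, not the exact manifest from the commit; resource names follow the Cilium CRDs and the verbs are simplified:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cilium  # illustrative name
rules:
  - apiGroups: ["cilium.io"]
    resources:
      - ciliumnetworkpolicies
      - ciliumnetworkpolicies/status   # needed once CRD Subresources are enabled
      - ciliumendpoints
      - ciliumendpoints/status
    verbs: ["*"]
```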
c68f516 k8s: fix clean state InitContainer permissions Signed-off-by: André Martins <andre@cilium.io> 30 August 2018, 19:02:14 UTC
899335b pkg/k8s: properly handle empty NamespaceSelector [ upstream commit 376ab67369722fde7b7d2dda0e1aa75741859b65 ] If a rule is passed with an empty NamespaceSelector, this means that all traffic should be allowed to/from all pods in all namespaces. However, Cilium was improperly treating this as allowing all traffic to/from all destinations/sources, respectively. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
7a22166 endpoint: Clear conntrack on initial endpoint build [ upstream commit cb49db51afaac8aede386f0fdacd38b015b6ee3b ] On regular endpoint lifecycle management, the connection tracking table should automatically be cleared of entries of endpoints that have been removed. Similarly, if the agent crashed, the cleaning during bootstrapping should remove all entries that cannot be related to a restored endpoint. As a last safety net, this commit removes all eventual connection tracking entries related to an endpoint IP that is being re-used. The cleaning is done for the first regeneration of the endpoint, in parallel to the compilation. Fixes: #5239 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
285e339 endpoint: Remove conntrack entries on endpoint removal [ upstream commit 75f1f7234f2268d1f9eb9d039cc2f8e480108f44 ] When an endpoint is removed, all conntrack entries of that endpoint are guaranteed to become invalid. Use the opportunity to remove all such entries. Testing: ``` $ sudo cilium bpf ct list global ICMPv6 IN [f00d::a0f:0:0:3adc]:0 -> [fd01::b]:0 related expires=100021 rx_packets=1 rx_bytes=142 tx_packets=0 tx_bytes=0 flags=10 revnat=0 src_sec_id=1 ICMPv6 OUT [2a00:1450:400a:803::200e]:0 -> [f00d::a0f:0:0:3adc]:0 related expires=100021 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=94 flags=10 revnat=0 src_sec_id=1031 TCP OUT [2a00:1450:400a:803::200e]:80 -> [f00d::a0f:0:0:3adc]:37958 expires=100021 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=94 flags=0 revnat=0 src_sec_id=1031 UDP OUT 8.8.8.8:53 -> 10.11.246.180:48671 expires=100021 rx_packets=1 rx_bytes=98 tx_packets=1 tx_bytes=70 flags=0 revnat=0 src_sec_id=1031 ICMP IN 10.11.224.225:0 -> 10.11.0.1:0 related expires=100022 rx_packets=1 rx_bytes=74 tx_packets=0 tx_bytes=0 flags=10 revnat=0 src_sec_id=1 TCP IN 10.11.224.225:4240 -> 10.11.0.1:54450 expires=121562 rx_packets=5 rx_bytes=475 tx_packets=5 tx_bytes=464 flags=10 revnat=0 src_sec_id=1 ICMP OUT 8.8.8.8:0 -> 10.11.246.180:0 related expires=100021 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=70 flags=10 revnat=0 src_sec_id=1031 UDP OUT 8.8.8.8:53 -> 10.11.246.180:40772 expires=100021 rx_packets=1 rx_bytes=86 tx_packets=1 tx_bytes=70 flags=0 revnat=0 src_sec_id=1031 ICMP OUT 172.217.168.78:0 -> 10.11.246.180:0 related expires=100021 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=74 flags=10 revnat=0 src_sec_id=1031 TCP OUT 172.217.168.78:80 -> 10.11.246.180:36166 expires=121561 rx_packets=5 rx_bytes=814 tx_packets=6 tx_bytes=418 flags=10 revnat=0 src_sec_id=1031 $ docker rm -f 98a611fb2d4f $ sudo cilium bpf ct list global ICMP IN 10.11.224.225:0 -> 10.11.0.1:0 related expires=100022 rx_packets=1 rx_bytes=74 tx_packets=0 tx_bytes=0 flags=10 
revnat=0 src_sec_id=1 TCP IN 10.11.224.225:4240 -> 10.11.0.1:54450 expires=121562 rx_packets=5 rx_bytes=475 tx_packets=5 tx_bytes=464 flags=10 revnat=0 src_sec_id=1 ``` Fixes: #5239 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
f2946a8 conntrack: Remove GCFilterType [ upstream commit d2baaafc39abfcac8315e5a92394692910d18e42 ] Required refactoring to allow simple addition of removal of certain IPs while performing GC. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
7cf0d6a conntrack: Scrub all entries that cannot be associated with restored endpoints [ upstream commit eefe58a3a7e8742909b7e630bcb1359c3dfb3ae1 ] So far, no scrubbing was performed and all entries in the conntrack table have been subject to regular expiration cycles. When starting the agent, the agent restores all previous endpoints. The IPs of all those endpoints are known. We can thus scrub the conntrack table from all entries that cannot be associated with one of the restored endpoints. Testing: Old table ``` $ sudo cilium bpf ct list global ICMPv6 IN [f00d::a0f:0:0:fbac]:0 -> [fd01::b]:0 related expires=98121 rx_packets=1 rx_bytes=142 tx_packets=0 tx_bytes=0 flags=10 revnat=0 src_sec_id=1 TCP OUT [2a00:1450:400a:801::2004]:80 -> [f00d::a0f:0:0:fbac]:43232 expires=98121 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=94 flags=0 revnat=0 src_sec_id=63927 ICMPv6 OUT [2a00:1450:400a:801::2004]:0 -> [f00d::a0f:0:0:fbac]:0 related expires=98121 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=94 flags=10 revnat=0 src_sec_id=63927 TCP OUT 172.217.168.68:80 -> 10.11.248.28:41732 expires=119587 rx_packets=10 rx_bytes=13101 tx_packets=11 tx_bytes=692 flags=10 revnat=0 src_sec_id=63927 TCP OUT 216.58.215.228:80 -> 10.11.248.28:58144 expires=119661 rx_packets=13 rx_bytes=13303 tx_packets=14 tx_bytes=854 flags=10 revnat=0 src_sec_id=63927 ICMP OUT 8.8.8.8:0 -> 10.11.248.28:0 related expires=98121 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=74 flags=10 revnat=0 src_sec_id=63927 ICMP OUT 216.58.215.228:0 -> 10.11.248.28:0 related expires=98121 rx_packets=0 rx_bytes=0 tx_packets=1 tx_bytes=74 flags=10 revnat=0 src_sec_id=63927 UDP OUT 8.8.8.8:53 -> 10.11.248.28:52753 expires=98121 rx_packets=1 rx_bytes=102 tx_packets=1 tx_bytes=74 flags=0 revnat=0 src_sec_id=63927 UDP OUT 8.8.8.8:53 -> 10.11.248.28:52597 expires=98121 rx_packets=1 rx_bytes=90 tx_packets=1 tx_bytes=74 flags=0 revnat=0 src_sec_id=63927 ``` [restart agent] New table: ``` sudo cilium bpf ct list global ``` Related: 
#5239 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
b546103 ctmap: Provide conntrack gc statistics [ upstream commit 4ae95ad4eb6165749d52c406e1aa4f821bee10f7 ] This provides verbose conntrack garbage collection statistics to get a better idea how the conntrack table scales up and down and how long it takes to perform garbage collection. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
65b8979 ctmap: Log fatal message on unsupported ct map type [ upstream commit ff1f1526115cbdec0499298a173ae2ee3ad6670e ] This represents a software bug and normal operation cannot be guaranteed. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
0d72409 conntrack: Mark RunGC() private [ upstream commit 25e718c6be835d4c288e44b1932d38083a7bae4d ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
07cfaff daemon: Expose restored endpoints on start This is to support backporting https://github.com/cilium/cilium/pull/5243 It simply exposes the restored endpoints from NewDaemon. Note that https://github.com/cilium/cilium/pull/5115 does this but it had conflicts itself when backported. Signed-off-by: Ray Bejjani <ray@covalent.io> 21 August 2018, 23:01:43 UTC
8128c5f test/k8sT: use specific commit for cilium/star-wars-demo YAMLs [ upstream commit c162b904a93e7b597a60dd00cbcf9f398d9235c1 ] A recent commit moved the location of files in the cilium/star-wars-demo repository. This broke the CI tests because they assumed the files were at a specific location. Hardcode the base GitHub link to a specific commit to fetch the files for running the test, instead of pointing to master. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 18 August 2018, 09:08:34 UTC
7bd89ae Prepare 1.1.4 Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:16:08 UTC
2e1bfe1 docs: Update sphinx theme to print version for stable [ upstream commit 556036452e9e25b5d417003c4c067622f660aff7 ] Update the sphinx theme so that if the version provided by readthedocs is "stable", then the actual version that the docs are generated from is formatted in the upper left hand navigational bar, rather than "stable". Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:00:44 UTC
ed8f125 docs: Fix theme paths so RTD picks up in-tree theme [ upstream commit 8e7979e5b3c4b1745ad7753c9b0a86ff49d5e0b7 ] Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:00:44 UTC
46fe785 docs: Expand the "v:" to "version:" in the nav bar [ upstream commit aada722e97320cbc5ef7847b4e6a791cac627163 ] The navigational bar has formatted versions like this for some time: "v: v1.0" This looks a bit weird, because the "v" is duplicated. Make it a bit more obvious by printing "version: ..." instead. For active branches, this should show like: "version: v1.1" For other standard readthedocs versions, it should show like: "version: latest" "version: stable" For tagged releases, this should show like: "version: 1.1.1" Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:00:44 UTC
5bc8fb6 pkg/health: remove dereferences of members within pointers [ upstream commit d2c0887afc8843e9c38df1da24ef53eca67dbd53 ] The `NodeStatus` object has fields, like HostStatus, which are pointers. Within `HostStatus`, there are other fields which are public, like `PrimaryAddress`. `pkg/health/client` had multiple areas of the code that tried to access `NodeStatus.HostStatus.PrimaryAddress`, for example. If the HostStatus field was `nil`, this would result in a panic. Remove these pointer-within-pointer dereferences, and instead add wrapper functions to get such members. Add unit tests which test possible permutations of NodeStatus to avoid panics like this in the future if the code is modified. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:00:44 UTC
170efd0 endpointmanager: Fix conntrack GC [ upstream commit cc88da53986a42118f4e8816925fe894cbeab48a ] Commit 61759e706c6b ("ctmap: Add accessor method for path per endpoint") inadvertently hid the map type (IPv4 or IPv6) from the endpointmanager, which was relying on this map type to call the correct ctmap garbage collection function. This commit does a quick fix to fetch the maptype so that it can be used in the same way as it was in previous releases. A followup commit will refactor this to avoid leaking CT map details into the callers. Fixes: 61759e706c6b ("ctmap: Add accessor method for path per endpoint") Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:00:44 UTC
f1fb781 daemon: cache error on policy import for CNP [ upstream commit a09accf3f293da857678432ce110d7c9f6b6aa91 ] If an error occurs during policy import, it should be cached so that if the annotations are updated for a rule, they are aware of said import error, can access the error from a map of CiliumNetworkPolicy to policy-import related metadata (revision number, error from import), and pass this appropriately to the function that updates the status of the CNP. This ensures that annotation updates do not override the "Error" field of the CNP NodeStatus if there was an error importing the policy. Fixes: 43e3c7ee3b0e ("daemon: always use same sync func for CNPNS") Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:00:44 UTC
b8331c0 daemon: always use same sync func for CNPNS [ upstream commit 43e3c7ee3b0e0679826ac896134b74ee28826e16 ] Previously, if the controller for updating the status of a given CNP on a node failed, and the annotations were then updated before the controller had a chance to run again, the controller would be replaced for the given rule, and the annotations would be updated. As a side effect, the rule's other fields (e.g., "Enforcing", "OK", etc.) would never be updated, because the controller trying to do the update was replaced with different functionality which only updated the annotations of the CiliumNetworkPolicyNodeStatus for the given node. To avoid this problem, parameterize the function for updating the status of a CiliumNetworkPolicy on a node, and use this parameterized function in both places where the controller to sync the CNP node status is updated. This ensures that the same behavior occurs when only annotations are updated for the rule and when the rule itself is changed. When a CiliumNetworkPolicy update is received and only the annotations are updated, the rule is not re-added into the policy repository, because there are no actual changes to the policy-related content within the rules themselves. To ensure no race conditions occur between adding of a CiliumNetworkPolicy and updating of the same CiliumNetworkPolicy, add a local cache of CNP --> policy revision at time of import, which is guaranteed to be updated by the code which adds CiliumNetworkPolicies before updates are called for the same CiliumNetworkPolicies. Signed-off-by: Ian Vernon <ian@cilium.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 17 August 2018, 18:00:44 UTC
e8a07c9 Prepare for 1.1.3 release Signed-off-by: Thomas Graf <thomas@cilium.io> 15 August 2018, 08:25:22 UTC
338ea73 ctmap: Detect support for LRU before upgrading [ upstream commit de50402681eda839df1e26dc82d51ed7be6644d9 ] Fixes: fa12caf597f9 ("daemon: Upgrade CT map properties on startup") Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 15 August 2018, 08:25:22 UTC
88619c9 bpf: Read feature probes from filesystem [ upstream commit 288ba06dc5a4e007c543b003d24101f1598daa75 ] After running the ``run_probes.sh`` script, load the results of the feature probes into userspace and cache them for subsequent use by map code. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 15 August 2018, 08:25:22 UTC
5b2c59e bpf: Pack ipv6_ct_tuple to match Golang [ upstream commit b3759ca11d8976386e67e9ff518be15713c42bd8 ] The Golang structure definition for ipv6_ct_tuple (the ct map key) would not pad out the structure to the nearest 8B offset, which means that when using Golang unsafe.Sizeof() for the userspace definition of this struct, it would give a different size compared to the C sizeof() used when declaring the CT map. An upcoming commit relies on the userspace and kernel definitions for these structures to have the same size, so this commit packs the ipv6 tuple to make them the same. Note that the ipv4_ct_tuple was already marked as packed. This should not have upgrade considerations because for Cilium 1.2, the definition of the ct map values already changes so the map must be deleted/recreated. MERGE NOTE: Instead of packing ipv6_ct_tuple, which causes the map size to become incompatible, round up the key size to achieve the same effect. This avoids requiring to re-create the CT maps on upgrades within the 1.1 series. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
12cae9f k8s: Use server version instead of ComponentStatus to check health [ upstream commit 25de08e65ce4ae22caa168551566d71f69729c4d ] The ComponentStatus is no longer a legitimate way of checking the health of the k8s connection. Use the server version instead. Fixes: #5104 Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
1905949 daemon: Remove health-ep on controller stop [ upstream commit 71c48165bd8875c3f7c3380c0b76ef2b30162586 ] Previously, the step to remove the health-ep from the endpointmanager was skipped in the StopFunc for the health-ep controller. Don't skip it. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
969cd89 daemon: Remove health-ep before deleting its devices [ upstream commit bcef24f74d2e729b839b16797d62a530c3a15b5b ] Previously, the health endpoint's veth devices would be removed before it was removed from the endpointmanager. This meant that it was possible for the health-ep controller to check (and fail to connect to) the health endpoint then interrupt an ongoing regeneration for the health endpoint. To avoid this, separate the veth cleanup step from the killing of the process such that this ordering should be followed when the health endpoint becomes unresponsive: * Kill the process * Remove the endpoint * Delete the devices * Re-launch the endpoint Fixes: a722e79bd553 ("daemon: Controllerize cilium-health endpoint") Fixes: #5152 Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
703004e daemon: Refactor health endpoint cleanup code [ upstream commit 24760a864ffaf5c9fbe4a1f89fd986bbf2371caa ] Refactor this code into a separate function for use by subsequent patches. No functional changes. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
82c57d0 launcher: Wait for process to exit and release resources [ upstream commit 8fabd73bca7797cfe0269299327d23852323175b ] Fixes: #5103 Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
dc7f074 daemon: Upgrade CT map properties on startup [ upstream commit fa12caf597f92a6020becf572035f39a89a2c2d7 ] If the CT map has properties that differ from the target properties as configured in the daemon code (for example, a different value size), then remove that map during endpoint restore. When the map is removed, any existing BPF program that refers to the map should continue to operate as normal. When the endpoints being restored are regenerated, they will cause automatic recreation of the new conntrack map (if it doesn't yet exist), and begin to use the new map. Note that when an endpoint switches from the old CT map to the new CT map, as it would in the above case, existing connections may be temporarily disrupted. Once all endpoints have finished using the old CT map, the kernel will automatically relinquish its resources. Fixes: #5070 Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
15db199 ctmap: Add accessor method for path per endpoint [ upstream commit 61759e706c6bb3d5ba35bbd8c54b48360398a784 ] This code was previously tucked in the endpoint manager, which doesn't seem right. Refactor it and make it available in the ctmap package so it can be reused by future commits. Signed-off-by: Joe Stringer <joe@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC
72afa1d controller: Fix controller update [ upstream commit be8c7e39aaca106d7ff73ec98fb365946ce81c7f ] Don't stop the old goroutine to start a new one on update, to prevent having two goroutines running simultaneously for a single controller. Instead, atomically change the parameters of the existing controller. Make sure that all fields (esp. lastError) are updated while locking the controller mutex. Preserving the old controller when it is updated also preserves the controller's UUID and stats (success and failure counts, success and failure timestamps, etc.). Signed-off-by: Romain Lenglet <romain@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> 13 August 2018, 22:11:46 UTC