https://github.com/cilium/cilium

Revision Author Date Message Commit Date
826d75a Prepare for v1.5.0-rc5 Signed-off by: Ian Vernon <ian@cilium.io> 19 April 2019, 00:18:12 UTC
9c4b907 test: Allow Cilium 1.4 to be run with K8s 1.14 [ upstream commit 2cdda25c5d1cd6239f0427f98e8fffaa4fa675ea ] This commit makes it possible to run upgrade tests on k8s 1.14 with Cilium 1.4. Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
89d8833 cilium, template: add cilium_encrypt_state to ignored prefixes [ upstream commit 8fe063a3d2030ea2f7c69016efcea028d8ab9326 ] Seen several warnings in the cilium agent log files as the following: 2019-04-17T20:11:25.804796408Z level=warning msg="Skipping symbol substitution" subsys=elf symbol=cilium_encrypt_state The map is a global one, therefore add it to ignored prefixes to silence the warning. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
441864b cni: Stop removing CNI_CONF_NAME on preStop [ upstream commit 14affce7c973e7240b954474c20ede0c578e0eeb ] The idea behind removing the CNI_CONF_NAME on preStop has been to leave no trace behind when the DaemonSet is removed. This strategy has caused two recurring problems: * When Cilium is restarted, kubelet may fall back to a different CNI configuration during a short race window; this may cause pods to be managed by a CNI which does not provide network security policy enforcement which then leads to unexpected protection loss. * When users provide a manual CNI configuration, the manual file is kept untouched on start but then removed when the agent restarts. Stop removing the CNI configuration Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
7146cd7 cilium: enable encrypt + vxlan test again [ upstream commit 69d6e812c9311200734c1156967f778e5fc844dd ] Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
9784a3c datapath/iptables: Check iptables kernel modules [ upstream commit 5b17c993e57963a86bfa4887b58018e055624086 ] Before this change, ip6tables was called for rules removal even if ip6tables modules were not loaded. After this change, iptables module is going to: - fail if iptables modules are not loaded - fail if ip6tables modules are not loaded, but IPv6 is enabled - not use ip6tables if its modules are not loaded and IPv6 is disabled Fixes: #7453 Fixes: 65cfe4a6cf49 ("iptables: Add support for IPv6, make IPv4 optional.") Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
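The three rules listed in the commit above can be read as a small decision function. The following is a minimal sketch, not the code from the commit: the function name `checkIptablesModules` and its boolean inputs are hypothetical, and the case where ip6tables modules are loaded while IPv6 is disabled is not described in the message, so the sketch simply reports ip6tables as usable there.

```go
package main

import (
	"errors"
	"fmt"
)

// checkIptablesModules is a hypothetical helper that mirrors the three rules
// from the commit message. It returns whether ip6tables may be used.
func checkIptablesModules(iptablesLoaded, ip6tablesLoaded, ipv6Enabled bool) (bool, error) {
	// Rule 1: fail if iptables modules are not loaded.
	if !iptablesLoaded {
		return false, errors.New("iptables kernel modules are not loaded")
	}
	if !ip6tablesLoaded {
		// Rule 2: fail if ip6tables modules are missing but IPv6 is enabled.
		if ipv6Enabled {
			return false, errors.New("IPv6 is enabled but ip6tables kernel modules are not loaded")
		}
		// Rule 3: IPv6 is disabled, so just avoid ip6tables.
		return false, nil
	}
	// Assumption: with the modules loaded, ip6tables is usable.
	return true, nil
}

func main() {
	useIP6, err := checkIptablesModules(true, false, false)
	fmt.Println(useIP6, err)

	_, err = checkIptablesModules(true, false, true)
	fmt.Println(err)
}
```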
02945fe modules: Add utility for checking loaded kernel modules [ upstream commit 6d19ee3cb14794aaa3c667a22b435767c96c2a73 ] This package provides a manager for checking and searching loaded kernel modules. It's going to be used for checking whether iptables and ip6tables are usable. Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
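As an illustration of what listing loaded kernel modules can look like, the sketch below extracts module names from text in the `/proc/modules` format, where the first whitespace-separated field of each line is the module name. The function name `moduleNames` and the sample input are assumptions made for this example; the package added by the commit may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleNames returns the module names found in text formatted like
// /proc/modules (one module per line, name in the first field).
// Hypothetical helper, not the API of the package from the commit.
func moduleNames(procModules string) []string {
	var names []string
	for _, line := range strings.Split(procModules, "\n") {
		fields := strings.Fields(line)
		if len(fields) > 0 {
			names = append(names, fields[0])
		}
	}
	return names
}

func main() {
	// Illustrative sample; on a real system the text would be read from /proc/modules.
	sample := "ip_tables 32768 2 iptable_filter,iptable_nat, Live 0x0000000000000000\n" +
		"x_tables 40960 3 ip_tables,iptable_filter,iptable_nat, Live 0x0000000000000000\n"
	fmt.Println(moduleNames(sample))
}
```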
4691362 set: Add utility for subset checks [ upstream commit 2ce94b1b30891471b346634320d397c0ea812c8a ] This package provides a function which checks whether the first slice of strings is a subset of the second one. It's going to be used for searching loaded kernel modules. Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
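A subset check over string slices, as described in the commit above, can be sketched as follows. The name `isSubset` and the module names in `main` are illustrative assumptions; the function added by the commit may have a different name and signature.

```go
package main

import "fmt"

// isSubset reports whether every string in sub also appears in super.
// Hypothetical name; duplicates in sub are allowed.
func isSubset(sub, super []string) bool {
	seen := make(map[string]struct{}, len(super))
	for _, s := range super {
		seen[s] = struct{}{}
	}
	for _, s := range sub {
		if _, ok := seen[s]; !ok {
			return false
		}
	}
	return true
}

func main() {
	loaded := []string{"ip_tables", "iptable_nat", "iptable_filter", "x_tables"}
	fmt.Println(isSubset([]string{"ip_tables", "iptable_nat"}, loaded)) // true
	fmt.Println(isSubset([]string{"ip6_tables"}, loaded))               // false
}
```

Building a lookup map keeps the check linear in the combined length of both slices, which matters little for a handful of module names but avoids a nested loop.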
5774b5b k8s: Fix leak of k8s controller on kvstore connect & disconnect [ upstream commit a63342e75944a3c43548ce53ee40d22c01176cf4 ] A go channel plus a k8s controller was leaked on each kvstore connect + disconnect event as the minimal k8s event controller was replaced with a full-scope controller. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
8150036 k8s: Disable k8s event handover to kvstore by default [ upstream commit bf613204203a85809ac0bee9effeb368694c69a0 ] This is a new feature and primarily targeted at larger scale. It requires the operator to be functional. This adds complexity which is only required once k8s apiserver event handling at scale becomes a concern. Require users to opt-in. Fixes: #7722 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
f095c92 daemon: Panic if executable name does not match cilium{-agent,-node-monitor,} [ upstream commit 3459f051deb55b6005781bb6181bc12ea540db4f ] Previously, if the executable name of cilium matched none of the above, the executable exited with 0 and didn't log anything. This could have confused users who downloaded the released cilium executables which were suffixed with `-x86_64` and due to the suffix were failing silently. Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
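Dispatching on the executable name, and failing loudly on an unknown one, can be sketched as below. This is an illustration only: `componentFor` is a hypothetical helper, and it returns an error where the commit describes a panic, so the caller decides how to fail.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// componentFor maps the invoked executable path to one of the three names
// from the commit title. Hypothetical helper for illustration.
func componentFor(argv0 string) (string, error) {
	switch name := filepath.Base(argv0); name {
	case "cilium", "cilium-agent", "cilium-node-monitor":
		return name, nil
	default:
		// An unknown name (for example one with a -x86_64 suffix) is
		// reported instead of being ignored silently.
		return "", fmt.Errorf("unexpected executable name %q", name)
	}
}

func main() {
	for _, argv0 := range []string{"/usr/bin/cilium-agent", "./cilium-x86_64"} {
		name, err := componentFor(argv0)
		fmt.Println(name, err)
	}
}
```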
cfd06bb Add `dep check` to travis build [ upstream commit 61960adfba101bec0c231575c7c4b77c585d02f3 ] This ensures that Gopkg.lock and Gopkg.toml are not committed to the repository malformed Signed-off-by: Maciej Kwiek <maciej@covalent.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
6777d2a docs: Add containerd to self-managed installation section [ upstream commit 7b21c8350335f6e42c87f79114703f5fe548049d ] Adds a tabbed section for containerd to the self-managed installation section, next to Docker and CRI-O. Signed-off-by: David Birks <davidebirks@gmail.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> 19 April 2019, 00:02:12 UTC
bb4d546 test: Check whether v2 and legacy svc maps are in sync [ upstream commit 3c302b8c8d4a51c44ba54344cec86a313389bc73 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
875241e test: Extend BpfLBList to list legacy svc BPF maps [ upstream commit 1e8ec20e04bd7fd865ab92c3bf1f0f04d890955c ] Also, add an option to remove duplicate endpoint entries from the legacy svc BPF maps. Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
a73655b cli: Add flag to list legacy service BPF maps [ upstream commit 93768ed585565688b88c93ea46d69dd25900a343 ] This commit adds the `--legacy` flag which can be passed to the `cilium bpf lb list` command in order to dump the legacy service BPF maps. Also, it moves the dumping code to separate functions. Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
dbc2e59 bpf, snat: dump external v4/v6 addresses more clearly into node config [ upstream commit 74ab809120953a923f91aee68bbea178f95fa51a ] It's useful for debugging, so make it clear what Cilium daemon has selected for node_config.h such that this can be introspected easily from the usual locations (/var/run/cilium/state/globals/node_config.h). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
f3c8587 node, address: fix bug where internal IP is selected over external [ upstream commit 08a33486cb819c37cabaebc2c7c55c2063b255b8 ] Commit df63d9b20be1 unfortunately broke SNAT due to selecting a private IP with global scope over a public one. The interface I had specified had two globally scoped IPs, one of them had a 10.0.0.0/8 prefix, the other one not. However, since prior node_config.h was present on the system, it selected the 10.0.0.0/8 one. As a consequence NAT broke since host local communication which was using the public address got then NATed by using the private one and thus the machine loses connectivity. Rework firstGlobal*Addr() to prefer public over private addresses. If there is a preferredIP passed, then it's only preferred pick within the set of public resp. private ones. For IPv6 add similar logic. Also, the findIPv6NodeAddr() function comment mentions an interface that can be passed, however there is no such parameter. Rework the IPv6 logic to reuse all of the IPv4 one such that global scope address of a provided interface is picked if device is configured. The fallback logic is then first trying to find a reduced scope if a device was specified, and if that breaks down, the second fallback is to find the IP considering all interfaces with universe scope (and again, falling back to reduced scope before giving up completely). Fixes: df63d9b20be1 ("Node: Try to prioritize the InternalIPv[46] from restore.") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
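The selection rule described above (public addresses win over private ones, and a preferred IP only breaks ties within its own group) can be sketched as follows. This is not the `firstGlobal*Addr()` code from the commit: `pickNodeAddr` is a hypothetical function, it ignores interface and scope handling, and it relies on `net.IP.IsPrivate`, which requires Go 1.17 or later.

```go
package main

import (
	"fmt"
	"net"
)

// pickNodeAddr chooses an address from candidates, preferring public over
// private addresses. preferred is honored only within the winning group.
// Hypothetical sketch; returns "" when no candidate parses as an IP.
func pickNodeAddr(candidates []string, preferred string) string {
	var public, private []string
	for _, c := range candidates {
		ip := net.ParseIP(c)
		if ip == nil {
			continue // skip strings that are not valid IP addresses
		}
		if ip.IsPrivate() {
			private = append(private, c)
		} else {
			public = append(public, c)
		}
	}
	// Try the public group first, then fall back to the private group.
	for _, group := range [][]string{public, private} {
		if len(group) == 0 {
			continue
		}
		for _, c := range group {
			if c == preferred {
				return c
			}
		}
		return group[0]
	}
	return ""
}

func main() {
	// The restored (preferred) address is private, so the public one is chosen.
	fmt.Println(pickNodeAddr([]string{"10.0.0.5", "198.51.100.7"}, "10.0.0.5"))
}
```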
403886b bpf, snat: select lru map if available otherwise fall back to htab [ upstream commit 92cd032bf7b183241c0e248bd101258474f1c5c8 ] Same principle as in conntrack table. In case of evicting old NAT entries from active connection via LRU mechanism, we could run into two scenarios wrt TCP connections: i) packet coming in via ingress. If there is no state for the given connection, the packet will be dropped. Node might resend in which case we'll have an outgoing connection. In case of the latter, we'll then re-create a new mapping via snat_v{4,6}_new_mapping(). Try to retain evicted 5-tuple mapping if possible such that there is a chance connections would keep working before we try completely random ones. Lift the short interval for GC cleanup when NAT is enabled to not interfere with stale NAT entries in LRU. NAT hash-tab would normally piggy-back on GC. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
e816028 bpf, snat: reject unknown ethertypes early [ upstream commit 7de434985f89bd0bd0e0100db397462af3c1ce5b ] Move the test out of do_netdev() and into each entry point, so we don't even attempt to try SNAT in the first place. This also fixes a verifier issue for older kernels where LLVM is generating ctx+off in a register which is not allowed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
a34ad8f bpf, snat: add cilium monitor support for pre/post snat engine [ upstream commit e84065655c26f7083ad15f1d1845800a578b73e1 ] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
eb861a8 Replace deprecated provider [ upstream commit d8474e499ad1fb779827e506890fa327194e56e7 ] Signed-off-by: Maciej Kwiek <maciej@covalent.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
4f57f75 examples: Add --enable-legacy-service=false to ConfigMap [ upstream commit 201d72fa595bdfddb91f756fc4b45f00456252f0 ] This commit adds the `enable-legacy-services` flag to the ConfigMap and sets it to false, so that new users won't need to maintain unnecessary legacy services. Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
faca16d cilium: Encryption overhead MTU accounting [ upstream commit 658f232b3cabe19316c113037697fdb42a49ff50 ] Account for MTU overhead associated with encryption. Depending on key length this may vary, these calculation use default key size of 128B unless other size is known from reading encryption secrets. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
16528a0 update Vagrantfiles to version 145 [ upstream commit 707129e16ced46b81e212cf418d1ed3a852ed8a4 ] This version contains the cached Docker images for increasing the throughput of the Cilium CI. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> 17 April 2019, 23:41:26 UTC
b84eb52 daemon: Improve config file log handling [ upstream commit ca9397b08ed1a890f745bcf9beb0706b96d1e7de ] The config file log handling would always complain if the user doesn't specify a config file (the default). Lower that complaint to info level. Furthermore, fix up the log usage to properly make use of logrus Fields. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 17 April 2019, 15:07:54 UTC
209a585 daemon: Only invoke daemon init in daemon [ upstream commit b5f1d9f2f0d2293f15fdb1f11f4275be28b47807 ] Previously, the daemon init function would be invoked in non-daemon settings such as when running the cilium cli, or the monitor. Avoid this with a simple Args check at the beginning of the init function. Fixes: #7732 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 17 April 2019, 15:07:54 UTC
4c84f1b contrib: Exit early if no git remote is found [ upstream commit 65b001f4377904768be03384b68a3324f00a072e ] Previously, if no git remote pointing to git@github.com:cilium/cilium.git or https://github.com/cilium/cilium was found, the `check-stable` and `cherry-pick` scripts continued to execute nevertheless which led to cryptic errors. This commit makes both scripts exit early and print an error to stderr that the git remote could not be found. Also, factor out the common code into the `get_remote` function which can be shared by sourcing common.sh. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 17 April 2019, 15:07:54 UTC
452231a daemon: Set backend ID in local LB cache [ upstream commit 43fc56301183e78e7b01f87a5b6234faf180968b ] Currently, the allocation of backend IDs is triggered inside `lbmap` methods. This is far from ideal, as this breaks a separation of concerns between different layers (i.e. the allocation should be really controlled by `daemon`; this is going to be changed in the near future). One bug which slipped unnoticed was that we didn't populate the daemon's local LB cache with the acquired backend IDs which led to non-consistent ordering of service endpoints reported by `cilium service list`, and, as a consequence, this made some CI tests flaky which were expecting some exact service endpoint ordering. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 17 April 2019, 15:07:54 UTC
3655af4 service: Add LookupBackendID method [ upstream commit f462059f4386727d4761952087de6d75d6144759 ] The method is used to look up already allocated backend IDs. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 17 April 2019, 15:07:54 UTC
b8566d0 Prepare for release v1.5.0-rc4 Signed-off by: Ian Vernon <ian@cilium.dev> 16 April 2019, 00:51:13 UTC
1524131 daemon,lbmap: Remove orphan backends [ upstream commit 3d471a5d41a7eea689eea6002e36232ac46658b5 ] This commit removes orphan backends (i.e. backends which are not used by any service) and releases their IDs. Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
8523844 daemon,lbmap: Remove orphan v2 services [ upstream commit 1c5ac0557abaa086e686fbf4079ffa2915f6f3ff ] It's possible to end up in a situation when a v2 service does not have an equivalent legacy service. Consider the following: 1. cilium-agent upgraded to >= v1.5. 2. cilium-agent downgraded to < v1.5. 3. The service is removed (only from the legacy BPF map). 4. cilium-agent upgraded to >= 1.5. If this happens, the v2 service won't be restored from legacy in step 4. To avoid it, we remove orphan services after we have migrated legacy services to v2. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
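The orphan detection described above amounts to a set difference: after legacy services have been migrated, any v2 service without a legacy counterpart is an orphan. The sketch below illustrates that idea under simplifying assumptions; `orphanServices` and the use of plain `uint16` IDs are hypothetical and do not reflect the actual service key types in `lbmap`.

```go
package main

import "fmt"

// orphanServices returns the IDs present in v2 that have no equivalent in
// legacy. Hypothetical helper; real services are keyed by more than an ID.
func orphanServices(v2, legacy []uint16) []uint16 {
	inLegacy := make(map[uint16]bool, len(legacy))
	for _, id := range legacy {
		inLegacy[id] = true
	}
	var orphans []uint16
	for _, id := range v2 {
		if !inLegacy[id] {
			orphans = append(orphans, id)
		}
	}
	return orphans
}

func main() {
	// Service 2 was removed only from the legacy map while running < v1.5.
	fmt.Println(orphanServices([]uint16{1, 2, 3}, []uint16{1, 3})) // [2]
}
```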
64864cb lbmap: Add BackendAddrID.IsIPv6 method [ upstream commit f6a3e506e408518da07a50f1749743416d5ea512 ] This method detects (in a dirty way) whether a backend identified with the given addrID is of the ipv6 type. It will be used when removing orphan backends from the BPF maps. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
fad2e8c lbmap: Fix BackendAddrID of IPv6 backend [ upstream commit f7d43a1e2fb48ed2da2e069b0e5850a5644d9059 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
5e9fa41 logfields: Fix BackendID logfield value [ upstream commit 151d4ddc0f705eeb2545db84c9ffecb92cbac04e ] The field is used when logging a single instance of backend. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
2c44b8f daemon: Use v2 services when syncing with k8s [ upstream commit 33e836c52ffe041c5c662e59b6e0b93e3272396a ] The syncing (`syncLBMapsWithK8s`) happens after the v2 services have been created from the legacy ones, so we can use the v2 for comparing against the k8s ones when syncing. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
9b49c53 daemon: Remove legacy svc BPF maps if they are disabled [ upstream commit b96869fcbfcd71e927ba2e0021da2cd6e1278472 ] This commit tries to remove the legacy svc related maps if the legacy services are disabled. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
ef827fe daemon,lbmap: Do not update legacy svc if they are disabled [ upstream commit 2766bc1273682d1663860c2c9e595d35d1eba336 ] This commit disables updating of the legacy services if they are disabled (`--enable-legacy-services=false`). Two other changes: - Restoring from the legacy services is disabled if the legacy services are disabled. - As we do not support the weighted services, the calculation of the service endpoint weights is disabled to simplify the code. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
6dae468 lbmap: Update revNAT table from v2 routines [ upstream commit 5ab551c7354ef973ca021d458ab901b86914c1ad ] This commit moves the creation and update of rev nat tables from the update function of the legacy services to the update function of the v2 services. This is needed, because we will optionally disable the legacy services. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
7a3e36b lbmap: Exclude master service earlier in dump function [ upstream commit 6a90432f76ab9dce3f334dd5185e9c0c577658fa ] As we do not use master service to extract a service ID [1], we can ignore parsing of the master service earlier which makes the `lbmap.DumpServiceMapsToUserspaceV2` function smaller. [1]: https://github.com/cilium/cilium/pull/7540#discussion_r272913219 Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
2341571 lbmap,daemon: Make removal of lbmap cache more explicit [ upstream commit b791ed867af34a87bc77dd417cb977a6b11fe6c0 ] This commit introduces the `lbmap.DeleteServiceCache` function which is responsible for the removal of the lbmap cache svc entry. Also, the removal is moved from a function which was called when a legacy svc has been removed to a function which manages the removal of both types of services, so that we still remove the cache entry even when the legacy services are disabled. Once we stop supporting the legacy services, we can move the removal to the `lbmap.DeleteServiceV2` function. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
234d851 daemon,bpf: Add --enable-legacy-services flags [ upstream commit d5bc02ecc942f578933f84e1553ac4bc316744ba ] The flag (by default set to "true") enables or disables the legacy services. This can be useful to avoid unnecessary memory usage and to save some CPU cycles when doing a service lookup in the datapath for those deployments which are going to use Cilium v1.5 from the beginning, i.e. won't do an upgrade from v1.4 to v1.5. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
8354070 loadbalancer: Sort backends by ID when listing [ upstream commit 27d82b5ab006a9fa38f8beffcc593a0f491072a1 ] When listing services with `cilium service list`, the order of backends might not reflect the actual order of service values in the svc v2 BPF maps. After the cilium agent has been restarted, it restores the loadbalancer cache from the BPF maps. So, the service value entries might be in a different order than they used to be before the restart. This is fine, but it makes some integration tests fail which expect service values to be in the same order as before the restart. This commit ensures that the service values are listed always in the same order (i.e. sorted by the backend ID). Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
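Sorting by backend ID before listing makes the output independent of the order in which entries were restored. A minimal sketch, assuming a simplified `backend` type with hypothetical `ID` and `Addr` fields rather than the real loadbalancer structures:

```go
package main

import (
	"fmt"
	"sort"
)

// backend is a simplified, hypothetical stand-in for a service backend.
type backend struct {
	ID   uint16
	Addr string
}

// sortByID orders backends in place by ascending ID so that listings are
// stable across agent restarts.
func sortByID(bs []backend) {
	sort.Slice(bs, func(i, j int) bool { return bs[i].ID < bs[j].ID })
}

func main() {
	// Order as it might come back from a map dump after a restart.
	bs := []backend{{3, "10.0.0.3:80"}, {1, "10.0.0.1:80"}, {2, "10.0.0.2:80"}}
	sortByID(bs)
	for _, b := range bs {
		fmt.Println(b.ID, b.Addr)
	}
}
```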
e48c2c4 cli: Use svc v2 maps when listing [ upstream commit 99bf44a02c1a2ff8e8c0e7f2792658196e1639be ] This commit changes the source maps of the `cilium bpf lb list` cmd - instead of the legacy svc BPF maps, the v2 BPF are used. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
2781acc bpf: Add Map.UnpinIfExists method [ upstream commit 496ba15f4e74cb54b25a4bc1d21ac87fc5238aab ] This method is going to be used for removal of the legacy svc related maps which might not exist during the removal. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
17b07cd bpf: Add Map.DumpWithCallbackIfExists method [ upstream commit ceee83340e8555d2eb10b57df11b531c51797d90 ] This method is going to be used by the `cilium service list` handler to dump and parse svc v2 entries. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
1fe60f0 Fix backporting scripts for https users [ upstream commit 4cc1799a47fa4869feaaa554620024a947b7eb56 ] Signed-off-by: Maciej Kwiek <maciej@covalent.io> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
5cfb995 test: Update Istio test to 1.1.2 with proxy 1.1.3. [ upstream commit 3c4c3c22c8c09fcdc6a6e196515f32afa2c8175e ] Replace the istio-cilium.yaml file with a new one, generated with helm using the updated Istio GSG. Replace the bookinfo manifests with ones newly generated with helm. Istio release 1.1.3 is not out yet, but we can use the 1.1.3 proxy with Istio 1.1.2 to get the path normalization fix in. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
172e1d8 istio: Update istio proxy to 1.1.3 [ upstream commit 343dc518cdf2e8c984b240d9e3fc29eec3ca70d1 ] Use proxy with path normalization (1.1.3) Istio release 1.1.3 is not out yet, but we can use the 1.1.3 proxy with Istio 1.1.2 to get the path normalization fix in. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
cc36fde CI: Enforce sensible timeouts. [ upstream commit a1f1d0ff5f2ff158badd24f31122845d97764dc8 ] Default timeout/duration unit is nanoseconds, so specify time.Second where the unit was missing, and enforce minimum timeout of 10 seconds and minimum ticker interval of 1 second in WithTimeout(). The default Ticker duration in WithTimeout was 5 nanoseconds, which may have caused all calls to mysteriously fail if the first call to body() was not successful. Ginkgo functions have optional timeouts (that we usually do not use), expressed in seconds. These should not be mixed up! Istio test may have been failing due to this. Microscope test also had both timeouts and ticker durations without units (defaulting to nanoseconds), which may have contributed to the fact that microscope test had to be disabled. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
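The pitfall described above comes from `time.Duration` being an integer count of nanoseconds, so a bare `5` means 5ns rather than 5s. The sketch below shows the unit problem and one way to enforce the minimums named in the commit; `normalize` and the constant names are hypothetical and are not the actual `WithTimeout()` implementation.

```go
package main

import (
	"fmt"
	"time"
)

const (
	minTimeout = 10 * time.Second // minimum timeout named in the commit
	minTick    = 1 * time.Second  // minimum ticker interval named in the commit
)

// normalize raises timeout and tick to the enforced minimums.
// Hypothetical helper for illustration.
func normalize(timeout, tick time.Duration) (time.Duration, time.Duration) {
	if timeout < minTimeout {
		timeout = minTimeout
	}
	if tick < minTick {
		tick = minTick
	}
	return timeout, tick
}

func main() {
	fmt.Println(time.Duration(5)) // 5ns: a unitless value is nanoseconds
	fmt.Println(5 * time.Second)  // 5s: the unit must be spelled out

	timeout, tick := normalize(5, 5)
	fmt.Println(timeout, tick)
}
```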
7a6ae69 envoy: Update to enable path normalization [ upstream commit 8b4f3cead8b8a0786a0939a6b0f7bf886b0ff993 ] Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
458666b test: Disable flaky encapsulation encryption test [ upstream commit 087abbd318d16a7f2ac4f069d6e26c936d3f0c58 ] The test is not reliable enough to be enabled right now Related: #7615 Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
9a9ee27 Revert "test: Disable flaky encapsulation encryption test" [ upstream commit 6c08f085eeffca1207ddf2681728d712c323e626 ] After fix to correctly handle peer and local keys during key transitions e.g. enabling/disabling encryption like this test. And reducing instruction count to get this running on all kernels and LLVM7. Lets enable now. This reverts commit 08e4dde73a327bf693422df76a67e99f5d5a8655. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
c01bc3c cilium: fix dropping Health node IP updates [ upstream commit 3ed410362178ea7d88644416e6af76406c157b25 ] In my failing ginkgo test for cilium + vxlan + encryption I see Preflight failure. After reviewing the logs I noticed that one node is emitting an event with only the Health IPs changing and in this case the peer is complaining that it is already in the ipcache and therefore is not updating its node map. This results in a situation where the map has the old Health IPs and when we run cilium-health we try to http/ping the old non-existing IP and as expected get a timeout. Running 'cilium node list' on each node shows that the two nodes disagree on the Health IPs. It is also always exactly one of the two nodes in my setup. To fix this update the node map (but don't do datapath reconfigs) on all nodes even if the ipcache has an entry for the node. We can skip the datapath configuration to avoid churn because the datapath does not use the health IPs. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
d40fb20 cilium: combine tunnel and non-tunnel cases into single branch [ upstream commit c8d084775af057641ea2dd6ecea55288f0afeef0 ] Refactoring tunnel and non-tunnel cases in encap.h reduces the insn count a reasonable amount. It also helps identify in encap.h where each call is from. We now have a _lxc encap call and a _netdev encap call to handle calls from ingress of veth pair attached to pods and the cilium_host running bpf_netdev. Before: Prog section '2/10' (tail_call IPV6_FROM_LXC) loaded (40)! - Instructions: 3735 (0 over limit) processed 49451 insns After: Prog section '2/10' (tail_call IPV6_FROM_LXC) loaded (40)! - Instructions: 3711 (0 over limit) processed 48850 insns Gain another 24 instructions. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
e8975a2 cilium: remove relax() calls to get more free insns [ upstream commit d19889561aedd4c06365277ca06325f7f1db3619 ] This particular relax() call does little for complexity limits and I need the extra instructions more. Testing on Vagrant box provided with cilium/test which is using a 4.9 kernel. Before: Prog section '2/10' (tail_call IPV6_FROM_LXC) - Instructions: 3743 (0 over limit) processed 48977 insns After: Prog section '2/10' (tail_call IPV6_FROM_LXC) loaded (40)! - Instructions: 3735 (0 over limit) processed 49451 insns For the win +8 instructions!!! =) Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
e1efa49 cilium: remove unnecessary zero'ing of ip6 endpoint key [ upstream commit abd3a3c88c1448f81519283c2e4833f6ec985c5f ] The structure is initialized at initialization, so there is no need to zero the fields yet again. Apparently the compiler does emit these instructions. To verify that the change reduces complexity, use the Vagrant box provided with Cilium and the following command, #make -C bpf && ./test/bpf/check-complexity.sh Before Prog section '2/10' (tail_call IPV6_FROM_LXC) loaded (40)! - Instructions: 3746 (0 over limit) processed 49004 insns After Prog section '2/10' (tail_call IPV6_FROM_LXC) loaded (40)! - Instructions: 3743 (0 over limit) processed 48977 insns Note, the max instruction limit is 4096 and we will be using the remaining instructions for fixing tunnel keys when encryption keys are changing. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
afc753e cilium: transparent encryption, use correct keys during key rotation [ upstream commit 22e5c6d232dfdfd5186386b02e9a62bf309b8972 ] When keys are being changed from previous key to new key it's possible that nodes will have different sets of keys installed in the datapath. This is a result of either a rolling update and/or because secrets are not propagated at the same time through all the nodes. The result is a peer key ID (e.g. the one looked up in the ipcache) may represent a key that has not yet been installed on the node. Trying to encrypt with that key will result in dropped traffic because the key is not known. So this patch keeps the state of the last key installed in the datapath and uses that. The assumption is that key updates only happen after all nodes have sync'd the previous update. This required implementing another map to hold the state. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@k8s1> 16 April 2019, 00:15:51 UTC
8020677 Doc: Update jinja dependency for documentation building [ upstream commit 32414bd4daeaf931ea94a276b8f441e91517ff6c ] Fixes: CVE-2019-10906 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 April 2019, 19:34:08 UTC
95d58d8 Various bugfixes & improvements to daemon config handling [ upstream commit 54e3c5b765f080a3e40fe644b74fe17981307954 ] Signed-off-by: Ronald van Zantvoort <the.loeki@gmail.com> Signed-off-by: Ian Vernon <ian@cilium.io> 13 April 2019, 19:34:08 UTC
4ca0c31 ipam: Provide ownership information of IP allocations [ upstream commit b2ff83adff68a557e12119ebb2cdf9d89a6cf017 ] Keeps track of who owns an allocated IP and provides it as part of cilium status: ``` IPv4 address pool: 13/65535 allocated from 10.16.0.0/16 IPv6 address pool: 13/65535 allocated from f00d::a10:0:0:0/112 Allocated addresses: 10.16.0.1 (router) 10.16.113.196 (default/echo-79b5fcd4d9-2qlrn [restored]) 10.16.126.206 (default/probe-7b4494bd59-m5222 [restored]) 10.16.139.230 (default/echo-79b5fcd4d9-vwc6l [restored]) 10.16.163.105 (loopback) 10.16.174.44 (default/echo-79b5fcd4d9-cfrgn [restored]) 10.16.183.123 (default/echo-79b5fcd4d9-nk9ln [restored]) 10.16.23.213 (health) 10.16.5.111 (default/probe-7b4494bd59-8rrs2 [restored]) 10.16.71.7 (default/probe-7b4494bd59-whqv8 [restored]) 10.16.76.95 (default/probe-7b4494bd59-pqvx4 [restored]) 10.16.88.220 (default/probe-7b4494bd59-lcd9s [restored]) 10.16.92.82 (default/echo-79b5fcd4d9-bxgw5 [restored]) f00d::a10:0:0:1 (router) f00d::a10:0:0:13ba (default/probe-7b4494bd59-8rrs2 [restored]) f00d::a10:0:0:19f5 (default/probe-7b4494bd59-m5222 [restored]) f00d::a10:0:0:2601 (health) f00d::a10:0:0:69d6 (default/probe-7b4494bd59-whqv8 [restored]) f00d::a10:0:0:71 (default/echo-79b5fcd4d9-bxgw5 [restored]) f00d::a10:0:0:722c (default/probe-7b4494bd59-lcd9s [restored]) f00d::a10:0:0:8e65 (default/echo-79b5fcd4d9-2qlrn [restored]) f00d::a10:0:0:ab5e (default/echo-79b5fcd4d9-nk9ln [restored]) f00d::a10:0:0:b3ec (default/probe-7b4494bd59-pqvx4 [restored]) f00d::a10:0:0:ca90 (default/echo-79b5fcd4d9-cfrgn [restored]) f00d::a10:0:0:d8d4 (default/echo-79b5fcd4d9-vwc6l [restored]) ``` Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 April 2019, 19:34:08 UTC
71cf99c kubernetes-upstream: update to k8s 1.14 [ upstream commit 50ebcf33e94d16b83f22bedba7af0e61aa2b1086 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 April 2019, 19:34:08 UTC
1c06ff2 k8s: Don't bother to create CEP if endpoint is already disconnecting [ upstream commit 4a0c9844e1503be083af9386a48cf8076b49286f ] A common pattern is that an entire namespace is deleted while pods are still being created. This will trigger a lot of errors like the one below. Fix this by not even bothering to create a CEP if the endpoint is already disconnecting again. This will not completely avoid the error but make it less likely. ``` level=error msg="Cannot create CEP" containerID=1c0bc02ec5 controller="sync-to-k8s-ciliumendpoint (1032)" datapathPolicyRevision=43 desiredPolicyRevision=43 endpointID=1032 error="ciliumendpoints.cilium.io \"client2\" is forbidden: unable to create new content in namespace staging-couchdb because it is being terminated" identity=25422 ipv4= ipv6= k8sPodName=staging-couchdb/client2 subsys=endpointsynchronizer ``` Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 April 2019, 19:34:08 UTC
5055967 k8s: Don't error when CEP does not exist on endpoint exit [ upstream commit 97e550f5673a26c9bd83b1143d07f38a4dc2e912 ] An endpoint may be deleted before the CEP was ever created. Skip logging any errors if the error is NotFound and avoid returning an error as the controller was successfully stopped either way. This avoids errors like: ``` level=error msg="Unable to delete CEP" containerID=8ac6da9fb1 controller="sync-to-k8s-ciliumendpoint (963)" datapathPolicyRevision=43 desiredPolicyRevision=43 endpointID=963 error="ciliumendpoints.cilium.io \"couchdb-server\" not found" identity=15353 ipv4= ipv6= k8sPodName=staging-couchdb/couchdb-server subsys=endpointsynchronizer level=warning msg="Error on Controller stop" consecutiveErrors=2 error="ciliumendpoints.cilium.io \"couchdb-server\" not found" name="sync-to-k8s-ciliumendpoint (963)" subsys=controller uuid=8845cc2c-5d17-11e9-916d-02b2f5c938d8 ``` Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Ian Vernon <ian@cilium.io> 13 April 2019, 19:34:08 UTC
e39a323 Node: Try to prioritize the InternalIPv[46] from restore. [ upstream commit df63d9b20be1bed432da09490865a1aa0cd4aaad ] On restore, Cilium can choose a different node CIDR, and the endpoints will then not restore correctly due to the IP CIDR mismatch. This change lists all IPs; if the IP recorded in `node_config.h` is present in that list, it is selected. If it is not present (new installation), all addresses with RT_SCOPE_UNIVERSE are listed and sorted to try to always return the same result. Fix #7637 Signed-off-by: Eloy Coto <eloy.coto@gmail.com> Signed-off-by: Ian Vernon <ian@cilium.io> 13 April 2019, 19:34:08 UTC
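The selection rule described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: `pickNodeIP` is a hypothetical name, addresses are plain strings, and the fallback uses a lexical sort purely to make the result deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// pickNodeIP is a hypothetical helper. It prefers the previously restored
// address when it is still among the candidates; otherwise it falls back to
// the first candidate in sorted order so repeated calls agree.
func pickNodeIP(candidates []string, restored string) string {
	for _, ip := range candidates {
		if ip == restored {
			return ip
		}
	}
	if len(candidates) == 0 {
		return ""
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), candidates...)
	sort.Strings(sorted) // lexical order, assumed good enough for this sketch
	return sorted[0]
}

func main() {
	ips := []string{"10.0.0.9", "10.0.0.2"}
	fmt.Println(pickNodeIP(ips, "10.0.0.9")) // restored address is kept
	fmt.Println(pickNodeIP(ips, ""))         // new installation: sorted fallback
}
```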
4560095 Don't use local remote in backporting scripts [ upstream commit 521ee8357011958d9a099413a7b64688e1180df8 ] I have several local remotes set up on my machine, which helps in parallel development of features. When backporting, check-stable script took my local remotes instead of github origin because path to local remote contained `github.com/cilium/cilium` Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
bc40025 endpointmanager: Avoid regenerating restoring endpoints [ upstream commit 69b90d33381db757e4d35a8e5ef37e39d8217e10 ] If an endpoint is currently being restored, ensure that its state reflects this for the duration of the restoration, and avoid triggering regeneration during this time. Otherwise this could cause the endpoint to be regenerated using intermediate state (eg wrong security identity), rather than continuing to run the previously configured datapath until the newly started Cilium is ready to fully regenerate the endpoint. Related: #7358 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
c576944 endpoint: Sanitize ep.SecurityIdentity on restore [ upstream commit 85e25bcc22bdcc0163a7b581de14cd7658caf69d ] When deserializing an endpoint from json, we previously didn't ensure that the security identity is properly restored. Other users of the security identity would rely upon the `ep.SecurityIdentity.LabelArray` being properly populated at all times, however this field was previously (1) not serialized, and (2) not repopulated upon restore. Sanitize the identity when restoring the endpoint from json. Fixes: #7358 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
7b40d32 cni: Fix CNI delete side-effects [ upstream commit 19115f53270c93ccb77991fdd41ae92e8e759a88 ] Fail the CNI delete request if the agent cannot be contacted to delete the endpoint. Also, don't blindly release the IP address as returned by the endpoint get. This is racy; rely on the delete endpoint call to release the IPs. It will do so in an atomic manner while ensuring that the respective endpoint really owns those IP addresses. Differentiate between errors that can be recovered from, such as failure to reach the agent or timeouts, and errors which are guaranteed to be permanent, such as endpoint or interface nonexistence. Only return an error on recoverable errors to avoid kubelet retrying for a long timeout while minimizing the risk of leaving behind resources. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
6525eb5 endpoint: Delegate IP release on endpoint creation failure [ upstream commit f1b06d593406e6a7f952f7a66844f2a2c7443804 ] The caller of PUT /endpoint/{id} provides the IP. Do not attempt to release the IP provided by the caller. The caller is responsible for releasing the IP, as it has to do so anyway. If the agent releases it as well, a race window exists in which the IP becomes available after the agent release, gets allocated again, and the original owner then releases the IP. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
e4832e9 cni: Always release created resources on failure of CNI ADD [ upstream commit 1be1df03ef088f332133806d38728326028ba247 ] * ep.Addressing was not set in all error paths which could lead to leaking IP addresses * err variable was shadowed or not set in some cases, causing the deferred cleanup to not work correctly. This could lead to leaking of IP address and created interfaces. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
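The shadowing problem named in this commit is a general Go pitfall, and a small sketch may make it concrete. Everything below is hypothetical (the `resources` type, `step`, and both `add*` functions); it only illustrates why a deferred cleanup keyed on `err` can silently be skipped, not how the CNI plugin is written.

```go
package main

import (
	"errors"
	"fmt"
)

// resources is a hypothetical stand-in for what a CNI ADD allocates
// (IP addresses, interfaces). It is not a Cilium type.
type resources struct {
	released bool
}

func (r *resources) release() { r.released = true }

// step is a hypothetical operation that can fail.
func step(fail bool) error {
	if fail {
		return errors.New("step failed")
	}
	return nil
}

// addShadowed shows the bug: the inner `err :=` declares a new variable, so
// the deferred cleanup inspects the outer err, which is still nil.
func addShadowed(r *resources, fail bool) error {
	var err error
	defer func() {
		if err != nil {
			r.release()
		}
	}()
	if err := step(fail); err != nil { // shadows the outer err
		return fmt.Errorf("add: %w", err)
	}
	return err
}

// addFixed uses a named return value, so every return path sets the err that
// the deferred cleanup inspects.
func addFixed(r *resources, fail bool) (err error) {
	defer func() {
		if err != nil {
			r.release()
		}
	}()
	if err = step(fail); err != nil {
		return fmt.Errorf("add: %w", err)
	}
	return nil
}

func main() {
	leaked, cleaned := &resources{}, &resources{}
	_ = addShadowed(leaked, true)
	_ = addFixed(cleaned, true)
	fmt.Println(leaked.released, cleaned.released) // false true
}
```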
e4827c2 endpoint: Improve logging around headerfile writes [ upstream commit b60d512501f089854e25370fbc6768f20c1656d6 ] We haven't been performing any logging around header file writes, even when errors occur during filesystem interaction. Add a debug log before writing, and log any errors that may occur at debug level. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
bed3b2c Vagrantfiles: bump version to 144 [ upstream commit 4e333a290d1fbc9b45096ca75c9e6adf36335768 ] This new VM caches the following image docker.io/cilium/istio_proxy_debug:1.0.0 Caching this image will reduce the amount of time the Istio CI tests take. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 12 April 2019, 12:18:47 UTC
8db47f0 daemon: pass context down into QueueEndpointBuild [ upstream commit 7a352e08365b9c5bdcb099600d42f666541f8809 ] Since regeneration now has context associated with it, we can pass this into `QueueEndpointBuild`, which releases resources (build semaphore acquisition) if the endpoint regeneration is cancelled before the actual build permissions have been acquired. Update unit tests which implement the `Owner` interface to account for this change as well. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
1ea39c9 loader: check whether context is cancelled [ upstream commit 1f602bb2f0992560c2fb577acaa0cd6ed92ab31b ] This allows for detection of when endpoint builds are cancelled, instead of trying to perform operations after operations like endpoint creation have failed. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
e1418ec daemon: pass down context on endpoint creation into regeneration functionality [ upstream commit bb1f2e1dc561b99a0b83a15be08c0fb15b8e9b10 ] Passing down the context from the API request allows for the regeneration to be cancelled immediately if the associated context is cancelled. This is safe to do without worrying about an endpoint being in an indeterminate state in the case of endpoint creation because the endpoint will be deleted immediately afterwards upon the context being cancelled. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
a9b886b endpoint: use parent context with prepareForProxyUpdates [ upstream commit 7c588a70a3eed245067bca595305edc9a2be30da ] This means that if the context passed down through endpoint regeneration is cancelled, the proxy configuration will also be cancelled immediately as well. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
7893e2b endpoint: add Context field to regenerationContext [ upstream commit 7c40dedd4624b03973b17b8bf7aa910f0b26174b ] This will be used to propagate context down from callers of endpoint regenerate so that endpoint builds can be cancelled in the case of endpoint deletion. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
ccaa6dc exec: return for any error from context [ upstream commit 21b156fbb55c12fbf734be3b892d27d95c85a287 ] Since contexts passed down to regeneration can now be canceled, we need to handle not only `DeadlineExceeded` but also `Canceled`. Signed-off by: Ian Vernon <ian@cilium.dev> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
9e28a35 agent: Delete endpoints which failed to restore synchronously [ upstream commit 1a370790e4f9341e4fafde5140d712b1b5d90149 ] Endpoints which fail to restore have been deleted in the background so far. This has potentially overlapped with new endpoints being created which could then cause the release of resources such as IP addresses which are used by new endpoints. In order to avoid potential deadlocks in bootstrapping, identity release has to be disabled as we cannot rely on any kvstore dependency. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
a1baaf5 Change suiteName to not match test folders names. [ upstream commit 25cb8f0d501eed1832059e1b31ebf8bdeb3bfcaf ] To avoid issues in Junit Attachment Jenkins plugin that the className is the same as a folder and attachment fails: ``` 00:59:24.653 Recording test results 00:59:38.788 Attachment fe200965_RuntimeFQDNPolicies_Implements_matchPattern:_"*".zip was referenced from the test 'runtime' but it doesn't exist. Skipping. 00:59:38.938 Attachment a923df9d_RuntimeFQDNPolicies_toFQDNs_populates_toCIDRSet_when_poller_is_disabled_(data_from_proxy)_Policy_addition_after_DNS_lookup.zip was referenced from the test 'runtime' but it doesn't exist. Skipping. ``` From here: https://github.com/jenkinsci/junit-attachments-plugin/blob/b68f75080535d3f264a2408958a88f8b733d30d4/src/main/java/hudson/plugins/junitattachments/GetTestDataMethodObject.java#L193 Signed-off-by: Eloy Coto <eloy.coto@gmail.com> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
8a4177d Documentation: clean up upgrade instructions [ upstream commit 127800d273e41d4ab52792109ba01ef794d3631b ] Versions of cilium which are EOL'd (e.g., v1.0-v1.2) have their own documentation about how to upgrade Cilium. Since we no longer support or update those versions with backports / documentation changes, we can remove the upgrade instructions for them from the upgrade guide. Tell users to refer to the documentation for that version for specific upgrade information to keep the upgrade guide at a sane size and more maintainable. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
8e1c649 identity: Don't serialize reference counts [ upstream commit a078bb313112a801a39234f9fafa610475d77359 ] I'm not sure why this reference count was serialized to json, but I don't see a way how it can be meaningful across cilium instances, so disable serialization/deserialization. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
3dae5bc policy: Fix metrics for policy revision [ upstream commit 9195a107b1d3a54b4b0fd28b23f53f937f899692 ] Since commit 62b8993b2a25, the metrics policy revision was bumped twice in add/delete paths, for example once inside `p.BumpRevision()` and once outside in `AddListLocked()`. Fix this by leaving this responsibility up to `BumpRevision()`. Fixes: 62b8993b2a25 ("Policy: Increment revision using atomic.") Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
1c891e1 test: update k8s test versions to v1.14.1 [ upstream commit da45c47b11f97c84ed996589477d27fc48bfe19e ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
3b6c55c vendor: update k8s dependencies to 1.14.1 [ upstream commit 7e389f35911db69615226b9429e5502091eafffa ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
efd7bb4 cilium: docs update encryption algo example to use GCM [ upstream commit 7972c7f00c7f379ed35cf3f84295430454d18703 ] Use modern GCM-128-AES algorithm in encryption example. We see better performance with this algorithm. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
7c31996 cilium: support aead state keys [ upstream commit 074cb88ad91605c7b7d0c09905b4d627f0036a23 ] Add support aead keys. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
801b5eb cilium: ipsec tests should use decodeIPSecKey for strings to hex [ upstream commit 913a2cdb4f03e16a98af3b182ffa86f7a681f27d ] Currently, we pass keys in to byte[] as a string, which is not reliable; it seems to work, but that is just by chance. Use the decodeIPSecKey() routine to convert strings to hex keys. Note, all code but tests uses decodeIPSecKey(), so this also brings our test code in line with daemon code. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
ef3685d cilium: Policy rules are no longer unique for key [ upstream commit 74c656bb9e53d6228abfe203cac55719102fdc28 ] Now that policy rules are no longer unique per key we do not need to delete them when keys change. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
b1476c0 cilium: ipsec_linux only set spi bit in xfrm mark on egress [ upstream commit 9fea830dcc8db7d293f3cede4fae0a40ab714aaa ] When we add egress encryption policies via xfrm we add them using a mark/mask value like this (mark=0xke00 mask=0xff00) where 'k' is the Key index referencing the associated keys. This is used to select the key to use for encryption on egress when multiple keys are in-use during key rotation or encryption enablement/disablement. For example (mark=0x1e00) will only apply to skbs that should be encrypted with keys added with the 1 index. Similarly 0x2e00 will encrypt with the keys that have index 2, and so on. Being a bit lazy perhaps we install decryption policies using the same mark but with a mask that masks out the 'k' value, e.g. (mark=0xkd00 mask=0x0f00) because the key to use for decryption is specified by the SPI field in the ESP header. Parsing the packet in BPF to additionally provide the 'k' is wasted work. However, when we attempt to delete the keys the kernel compares policy entries to find a matching policy to delete like so, if (pol->type == type && (mark & pol->mark.m) == pol->mark.v && !selector_cmp(sel, &pol->selector) && xfrm_sec_ctx_match(ctx, pol->security)) { Where 'mark' is the value passed in by user, 'pol->mark.m' is the installed policy's mask value and 'pol->mark.v' is the installed policy's mark value. So what is happening is the decrypt entries (the ones with mark=0xkd00 mask=0x0f00) get stuck in the policy database and can never be deleted. This happens because the policy's mask is applied to the input before comparing with the value. This also explains why the key rollover logic gets stuck sometimes because old policies are not being removed completely. This seems like a kernel bug or at least a strange design but the fix from user space is simple. After this patch the decrypt rules do not include the 'k' bits so that (mark=0x0d00 mask=0x0f00). After that, the delete operation is successful. 
This bug was introduced when we added (re)keying support. Unit tests worked previously because if we don't try to add any conflicting entries nothing "bad" happens but this PR deletes policy/state to add a new policy/state with the same src/dst mark/mask but with new keys. And the failure above is seen. Fixes: b6989723a7cc ("cilium: ipsec, support rolling updates") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
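The mark arithmetic in this commit message can be checked with a few lines of Go. The `policy` type and `matches` method are hypothetical; they only reproduce the quoted comparison `(mark & pol->mark.m) == pol->mark.v` with the example values from the message, using key index 1.

```go
package main

import "fmt"

// policy holds an installed policy's mark value and mask (hypothetical type).
type policy struct {
	value, mask uint32
}

// matches reproduces the quoted kernel comparison:
// (mark & pol->mark.m) == pol->mark.v
func (p policy) matches(mark uint32) bool {
	return mark&p.mask == p.value
}

func main() {
	// Old decrypt policy for key index 1: the value carries the 'k' nibble,
	// but the mask strips it from the input before comparing.
	old := policy{value: 0x1d00, mask: 0x0f00}
	// Decrypt policy after the patch: no 'k' bits in the value.
	fixed := policy{value: 0x0d00, mask: 0x0f00}

	fmt.Println(old.matches(0x1d00))   // false: 0x1d00 & 0x0f00 = 0x0d00
	fmt.Println(fixed.matches(0x0d00)) // true

	// No 16-bit mark can ever match the old policy, so it cannot be deleted.
	found := false
	for mark := uint32(0); mark <= 0xffff; mark++ {
		if old.matches(mark) {
			found = true
		}
	}
	fmt.Println(found) // false
}
```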
3472b1c cilium: ipsec_linux, remove DeleteIPSecEndpoint and use SPI version [ upstream commit 6202595c7230b8f957ba4214a21e50d64e907e4a ] DeleteIPSecEndpoint is only used in ipsec_linux_testing; it has been replaced by the routine ipsecDeleteXfrmSPI, which is called once a timer expires and xfrm entries need to be deleted. This patch removes DeleteIPSecEndpoint and converts the testing code to use the same delete method used on the daemon side. ipsecDeleteXfrmSPI(0) deletes all policy/state entries associated with Cilium. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
c20df18 kvstore: Return from LockPath() when local locking is cancelled [ upstream commit 341274337e5dc928b6ed4326776651fa26e23fc0 ] Previously, an attempt was still made to take the global lock, which would then fail and in turn call unlock() on the local lock, which had never been acquired. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
961c02a kvstore: Protect Unlock() from timeout overwrite [ upstream commit 7bc3528336f4496763fffe4d23b69b50b49ec933 ] When the local lock times out, the ownership of the lock can change. Ensure that Unlock() only unlocks the lock if the lock is still held by the respective owner. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
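An ownership-checked unlock along the lines this commit describes might be sketched as follows. The `pathLock` type, its string owner IDs, and the `expire` method that models a timeout are all hypothetical; the kvstore package's real locking code is not shown in this log.

```go
package main

import (
	"fmt"
	"sync"
)

// pathLock is a hypothetical local lock that tracks its current owner.
type pathLock struct {
	mu    sync.Mutex
	owner string // empty when nobody holds the lock
}

// tryLock takes the lock for id if it is free.
func (l *pathLock) tryLock(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.owner != "" {
		return false
	}
	l.owner = id
	return true
}

// expire models a timeout: the current owner loses the lock.
func (l *pathLock) expire() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.owner = ""
}

// unlock releases the lock only if id still owns it, so a stale owner cannot
// release a lock that has since passed to someone else.
func (l *pathLock) unlock(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.owner != id {
		return false
	}
	l.owner = ""
	return true
}

func main() {
	l := &pathLock{}
	l.tryLock("a")
	l.expire()                 // a's hold times out
	l.tryLock("b")             // b becomes the owner
	fmt.Println(l.unlock("a")) // false: a no longer owns the lock
	fmt.Println(l.unlock("b")) // true
}
```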
8ca0708 allocator: Provide info and warning messages around key allocation [ upstream commit 8091d2441f38f1fe763783ea4b23ed19c45f048b ] It is vital to have some visibility into this without requiring debug or trace messages enabled. Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
1b9636c allocator: Block Allocate() and Release() until key list is initialized [ upstream commit ce889ce39084e8be9a199d155ce41ac3e8169003 ] Allowing Allocate() and Release() to start before the initial list of keys has been received causes two actors to modify the local cache, which can result in corruption of the cache, as the cache assumes that ListAndWatch() will guarantee consistency. This may resolve #7598 Related: #7598 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 12 April 2019, 01:40:18 UTC
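Blocking callers until an initial listing has arrived is often done in Go with a channel that is closed exactly once. The sketch below assumes that approach; the `allocator` type, its fields, and the ID scheme are hypothetical and far simpler than Cilium's allocator.

```go
package main

import (
	"fmt"
	"sync"
)

// allocator is a hypothetical, simplified stand-in for a key allocator.
type allocator struct {
	initialized chan struct{} // closed once the initial key list has arrived
	mu          sync.Mutex
	cache       map[string]int
	nextID      int
}

func newAllocator() *allocator {
	return &allocator{
		initialized: make(chan struct{}),
		cache:       map[string]int{},
		nextID:      1,
	}
}

// markInitialized stores the initial key list and unblocks waiting callers.
func (a *allocator) markInitialized(initial map[string]int) {
	a.mu.Lock()
	for key, id := range initial {
		a.cache[key] = id
		if id >= a.nextID {
			a.nextID = id + 1
		}
	}
	a.mu.Unlock()
	close(a.initialized)
}

// allocate waits for the initial list, then reuses a known ID or assigns a
// fresh one.
func (a *allocator) allocate(key string) int {
	<-a.initialized
	a.mu.Lock()
	defer a.mu.Unlock()
	if id, ok := a.cache[key]; ok {
		return id
	}
	id := a.nextID
	a.nextID++
	a.cache[key] = id
	return id
}

func main() {
	a := newAllocator()
	result := make(chan int)
	go func() { result <- a.allocate("app=web") }() // blocks until initialized
	a.markInitialized(map[string]int{"app=web": 5})
	fmt.Println(<-result)             // 5: the existing ID is reused
	fmt.Println(a.allocate("app=db")) // 6: a fresh ID after the known ones
}
```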
1a050eb cilium, bpf: fix panic when run with newer LLVM [ upstream commit 2b27408ef9377effceef4cce64c1a5d9918b5237 ] While debugging another issue I ran into the following panic wrt ELF templating: # daemon/cilium-agent --kvstore consul --kvstore-opt consul.address=127.0.0.1:8500 --enable-ipv6=false --masquerade=true --auto-direct-node-routes=true --disable-envoy-version-check [...] panic: runtime error: index out of range goroutine 318 [running]: github.com/cilium/cilium/pkg/elf.(*symbols).extractFrom(0xc0006520d0, 0xc000f28320, 0xc000f28320, 0x0) /root/go/src/github.com/cilium/cilium/pkg/elf/symbols.go:170 +0x1357 github.com/cilium/cilium/pkg/elf.NewELF(0x317a580, 0xc00000f518, 0xc0001090a0, 0x0, 0x0, 0xc0001090a0) /root/go/src/github.com/cilium/cilium/pkg/elf/elf.go:77 +0x145 github.com/cilium/cilium/pkg/elf.Open(0xc000afd5c0, 0x52, 0xc0000a6c00, 0x7f42126307d8, 0xc0006f20f0) /root/go/src/github.com/cilium/cilium/pkg/elf/elf.go:97 +0x1eb github.com/cilium/cilium/pkg/datapath/loader.CompileOrLoad(0x31b6f80, 0xc0000a6c00, 0x31db1c0, 0xc0006f20f0, 0xc000fdf7c0, 0x0, 0x0) /root/go/src/github.com/cilium/cilium/pkg/datapath/loader/loader.go:123 +0xd6 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).realizeBPFState(0xc000d46000, 0xc000fdf680, 0xc000fdf888, 0xc000fdf680, 0xc0012d5ad8) /root/go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:488 +0x30d github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerateBPF(0xc000d46000, 0x31d7540, 0xc0007b21c0, 0xc000fdf680, 0x0, 0x0, 0x0, 0x0) /root/go/src/github.com/cilium/cilium/pkg/endpoint/bpf.go:415 +0x273 github.com/cilium/cilium/pkg/endpoint.(*Endpoint).regenerate(0xc000d46000, 0x31d7540, 0xc0007b21c0, 0xc000fdf680, 0x0, 0x0) /root/go/src/github.com/cilium/cilium/pkg/endpoint/policy.go:323 +0x704 github.com/cilium/cilium/pkg/endpoint.(*EndpointRegenerationEvent).Handle(0xc00139d8c0, 0xc0000a6ba0) /root/go/src/github.com/cilium/cilium/pkg/endpoint/events.go:54 +0x21b 
github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run.func1() /root/go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:236 +0x144 sync.(*Once).Do(0xc0004f0e78, 0xc0010df540) /usr/local/go/src/sync/once.go:44 +0xb3 created by github.com/cilium/cilium/pkg/eventqueue.(*EventQueue).Run /root/go/src/github.com/cilium/cilium/pkg/eventqueue/eventqueue.go:225 +0xa9 level=fatal msg="Agent pipe unexpectedly closed, shutting down" subsys=cilium-node-monitor Reason is that newer LLVM changed BPF backend to emit symbols under OBJECT instead of NOTYPE symbol scope. Also, BTF line info is not recognized by golang's default ELF parser, so accessing section for it will lead to the oob access panic (sym.Section is 64k here). Just skip these. # llc --version LLVM (http://llvm.org/): LLVM version 9.0.0svn Optimized build. Default target: x86_64-unknown-linux-gnu Host CPU: haswell Registered Targets: bpf - BPF (host endian) bpfeb - BPF (big endian) bpfel - BPF (little endian) x86 - 32-bit X86: Pentium-Pro and above x86-64 - 64-bit X86: EM64T and AMD64 Symtab from bpf_netdev: # readelf -a /var/run/cilium/state/bpf_netdev.o [...] 30: 00000000000000a8 28 OBJECT GLOBAL DEFAULT 14 cilium_calls_netdev_2 31: 0000000000000000 28 OBJECT GLOBAL DEFAULT 14 cilium_events 32: 00000000000000c4 28 OBJECT GLOBAL DEFAULT 14 cilium_ipcache 33: 000000000000001c 28 OBJECT GLOBAL DEFAULT 14 cilium_lxc 34: 0000000000000038 28 OBJECT GLOBAL DEFAULT 14 cilium_metrics 35: 0000000000000054 28 OBJECT GLOBAL DEFAULT 14 cilium_policy 36: 0000000000000070 28 OBJECT GLOBAL DEFAULT 14 cilium_policy_reserved_2 37: 000000000000008c 28 OBJECT GLOBAL DEFAULT 14 cilium_proxy4 [...] Symtab from bpf_lxc: # readelf -a /var/run/cilium/state/templates/c4406a7451ccb067746c874b71aaf19266ce7212/bpf_lxc.o [...] 
285: 000000000000001c 4 OBJECT GLOBAL DEFAULT 11 LXC_ID 286: 0000000000000010 4 OBJECT GLOBAL DEFAULT 11 LXC_IPV4 287: 0000000000000000 4 OBJECT GLOBAL DEFAULT 11 LXC_IP_1 288: 0000000000000004 4 OBJECT GLOBAL DEFAULT 11 LXC_IP_2 289: 0000000000000008 4 OBJECT GLOBAL DEFAULT 11 LXC_IP_3 290: 000000000000000c 4 OBJECT GLOBAL DEFAULT 11 LXC_IP_4 291: 0000000000000014 4 OBJECT GLOBAL DEFAULT 11 NODE_MAC_1 292: 0000000000000018 4 OBJECT GLOBAL DEFAULT 11 NODE_MAC_2 293: 0000000000000020 4 OBJECT GLOBAL DEFAULT 11 SECLABEL 294: 0000000000000024 4 OBJECT GLOBAL DEFAULT 11 SECLABEL_NB 295: 0000000000000000 4 OBJECT GLOBAL DEFAULT 13 ____license [...] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 10 April 2019, 15:00:54 UTC
a2e36e6 docs: Document cilium-operator in concepts section. [ upstream commit 861b39e9fbc6f926a8e890d92573d040b7f0e09e ] Fixes: #7501 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Maciej Kwiek <maciej@covalent.io> 10 April 2019, 15:00:54 UTC
fdf36ec daemon: remove host-allows-world option [ upstream commit aacd71ef7209de51d4b04502c59ea3fd65c66548 ] This is scheduled to be removed in v1.5, so remove it. Signed-off by: Ian Vernon <ian@cilium.io> Signed-off-by: vagrant <vagrant@runtime1> 10 April 2019, 09:31:14 UTC