https://github.com/cilium/cilium

Revision Author Date Message Commit Date
364f55f Prepare for release v1.7.13 Signed-off-by: Joe Stringer <joe@cilium.io> 27 January 2021, 22:30:50 UTC
5cf516a Dockerfile: Bump cilium-runtime image Signed-off-by: Joe Stringer <joe@cilium.io> 27 January 2021, 22:17:43 UTC
a448957 contrib/release: clarify project number for release process [ upstream commit 257e91f0780813da2a319b8999ae9cbbadc12cee ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> 26 January 2021, 11:43:16 UTC
a6258b3 cilium-cni: Fix error handling for bad netns [ upstream commit 6e3ca8f84e643200203aec24b4310ec71c403942 ] If kubelet gives cilium-cni bad input (no netns), the error here would not be returned properly to the caller, which could result in a segfault: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14e0c8b] goroutine 1 [running, locked to thread]: main.cmdAdd(0xc00015a000, 0xc0004d60e8, 0x5) /go/src/github.com/cilium/cilium/plugins/cilium-cni/cilium-cni.go:354 +0x5cb github.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc0005e5d40, 0xc00015a000, 0x1a42f20, 0xc0004de000, 0x18d07c0, 0x0, 0x44a1ef) /go/src/github.com/cilium/cilium/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:185 +0x258 github.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc0005e5d40, 0x18d07c0, 0x0, 0x18d07c8, 0x1a42f20, 0xc0004de000, 0xc000174000, 0x5d, 0xc000174000) /go/src/github.com/cilium/cilium/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:221 +0x546 github.com/containernetworking/cni/pkg/skel.PluginMainWithError(...) /go/src/github.com/cilium/cilium/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:286 github.com/containernetworking/cni/pkg/skel.PluginMain(0x18d07c0, 0x0, 0x18d07c8, 0x1a42f20, 0xc0004de000, 0xc000174000, 0x5d) /go/src/github.com/cilium/cilium/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:301 +0x128 main.main() /go/src/github.com/cilium/cilium/plugins/cilium-cni/cilium-cni.go:85 +0x33c The above logs would typically be pushed to kubelet logs. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 21 January 2021, 20:53:08 UTC
4e0eac8 ip: Fix RemoveCIDR edge condition [ upstream commit 6ad5a22b8644fc23fb5dec2291a80ffce8d8657e ] A CIDR should be able to be removed from itself. Add test cases that would fail without the fix: $ go test ---------------------------------------------------------------------- FAIL: ip_test.go:184: IPTestSuite.TestRemoveCIDRsEdgeCase ip_test.go:190: s.testIPNetsEqual(allowedCIDRs, expectedCIDRs, c) ip_test.go:83: c.Assert(created, HasLen, len(expected)) ... obtained []*net.IPNet = []*net.IPNet(nil) ... n int = 1 ---------------------------------------------------------------------- FAIL: ip_test.go:194: IPTestSuite.TestRemoveCIDRsEdgeCase2 ip_test.go:200: s.testIPNetsEqual(allowedCIDRs, expectedCIDRs, c) ip_test.go:83: c.Assert(created, HasLen, len(expected)) ... obtained []*net.IPNet = []*net.IPNet(nil) ... n int = 1 ---------------------------------------------------------------------- FAIL: ip_test.go:176: IPTestSuite.TestRemoveSameCIDR ip_test.go:180: c.Assert(err, IsNil) ... value *errors.errorString = &errors.errorString{s:"allow CIDR prefix must be a superset of remove CIDR prefix"} ("allow CIDR prefix must be a superset of remove CIDR prefix") OOPS: 14 passed, 3 FAILED Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 21 January 2021, 20:53:08 UTC
af81330 clustermesh: Ignore symlink files on fsnotify events [ upstream commit f03450706a4b3ca4fcfd17def021a74114aac7f5 ] Kubernetes secrets are mapped into the pod using symlinks. The initial scan was already correctly ignoring symlinks but the fsnotify events have not been. This has resulted in invalid cluster configurations being added: ``` ClusterMesh: 0/3 clusters ready, 0 global-services cluster2: not-ready, 0 nodes, 0 identities, 0 services, 0 failures (last: never) └ Waiting for initial connection to be established ..2021_01_08_21_11_57.892158678: not-ready, 0 nodes, 0 identities, 0 services, 0 failures (last: never) └ Waiting for initial connection to be established ..data: not-ready, 0 nodes, 0 identities, 0 services, 0 failures (last: never) └ Waiting for initial connection to be established ``` Fixes: 076b0188b98 ("Inter cluster connectivity (ClusterMesh)") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 21 January 2021, 20:53:08 UTC
a5b91c3 [backport 1.7] pkg/node: Skip setting MTU on local node routes [ upstream commit 4ad84d9cd35c] Previously, we were setting mtu on local node routes, which takes effect in routing in recent kernels. This could lead to drops for jumbo packets when the "Don't fragment" bit is set. Regardless of the kernel routing, we shouldn't restrict mtu on the host local routes. This fix avoids setting an mtu on local node routes (i.e., when enable-local-node flag is enabled). Testing - kubectl get node k8s1 -o json | jq .spec.podCIDR "10.16.0.0/16" Before: 10.16.0.0/16 via 10.16.213.171 dev cilium_host src 10.16.213.171 mtu 1450 After: 10.16.0.0/16 via 10.16.213.171 dev cilium_host src 10.16.213.171 Signed-off-by: Aditi Ghag <aditi@cilium.io> 21 January 2021, 20:37:58 UTC
bd444aa daemon: Plumb the endpoint garbage collector In a user environment, we encountered a scenario where pods would "disappear" and Cilium seemingly received no notification of deletion (eg, CNI DELETE command). If this occurs regularly over time, then this leads to Cilium managing a series of phantom endpoints which no longer have corresponding k8s objects. Some symptoms include: * Exhaustion of IP address management pool, causing inability to deploy new endpoints on the node; * Metrics such as cilium_endpoint_count increase to the maximum number of endpoints on the node (typically limited by IP pool); * Label resolution controllers reporting errors in "cilium status" output, such as: pod.core "my-pod-name" not found; and * Endpoints that Cilium is aware of have no corresponding veth interfaces. Fix this by periodically iterating the list of exposed endpoints and checking that the endpoints are still alive and healthy, by checking that the link is still present on the node. If an endpoint's link is not present for two consecutive iterations of the garbage collection (and the endpoint is not otherwise cleaned up by CNI DELETE or similar operations), then disconnect it from the endpointmanager and release its resources. Signed-off-by: Joe Stringer <joe@cilium.io> 20 January 2021, 08:27:57 UTC
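The "two consecutive iterations" rule from commit bd444aa can be sketched as follows. This is a minimal illustration, not Cilium's implementation: the `endpointGC` type, the `linkChecker` hook, and the use of integer endpoint IDs are all assumptions made for the example.

```go
package main

// linkChecker reports whether the network link for an endpoint still exists.
// Hypothetical hook; a real agent would query the node's interfaces.
type linkChecker func(endpointID int) bool

// endpointGC remembers which endpoints had no link on the previous sweep.
type endpointGC struct {
	missingLastSweep map[int]bool
}

func newEndpointGC() *endpointGC {
	return &endpointGC{missingLastSweep: make(map[int]bool)}
}

// sweep checks every endpoint once and returns the IDs whose link has been
// absent for two consecutive sweeps. The caller is expected to disconnect
// those endpoints and release their resources.
func (gc *endpointGC) sweep(endpoints []int, linkPresent linkChecker) []int {
	var stale []int
	missingNow := make(map[int]bool)
	for _, id := range endpoints {
		if linkPresent(id) {
			continue // healthy: any earlier miss is forgotten
		}
		if gc.missingLastSweep[id] {
			stale = append(stale, id) // second consecutive miss
			continue
		}
		missingNow[id] = true // first miss: remember it for the next sweep
	}
	gc.missingLastSweep = missingNow
	return stale
}
```

Requiring two misses in a row gives an endpoint that is still being set up, or whose link briefly disappears, one full interval to recover before it is collected.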
0e4378a endpointmanager: Add garbage collection routine To improve the robustness of Cilium operations in production environments, introduce a new periodic garbage collection controller that attempts to detect when endpoints are no longer alive and healthy, and disconnects them from the agent. Signed-off-by: Joe Stringer <joe@cilium.io> 20 January 2021, 08:27:57 UTC
49cc364 daemon: Refactor endpoint management to expose Delete Due to some complications around daemon initialization, when deleting an endpoint, the caller must pass in additional parameters like the daemon, the IPAM module, and the endpoint manager. This makes it difficult to share an endpoint deletion implementation with other subsystems. While it'd be nice to better untangle these objects, doing so could accrue some risk which we'd rather avoid while preparing an upcoming bugfix commit. Instead, refactor endpoint management functionality into a new dedicated type in the daemon package, which hopefully can be subsequently refactored into more appropriate places (like the endpointmanager package). No functional changes; this is purely preparatory work for additional callers of endpointManager.Delete() from other packages. Signed-off-by: Joe Stringer <joe@cilium.io> 20 January 2021, 08:27:57 UTC
c4933f1 docs: Add upgrade docs for egress-multi-home-ip-rule-compat Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
332a3fd routing: Fix route collisions in AWS ENI This commit fixes a potential route collision in AWS ENI IPAM modes, where the ifindex could equal the main routing table ID (from 253-255) [1], causing traffic to be subject to these routes incorrectly. This is admittedly rare, but we've seen this from a user report. The impact is that most traffic on the node is suddenly blackholed. To fix this, we say that each device or interface (ENI) will have their own dedicated routing table. The table ID will start with an offset of 10 because it is highly unlikely to collide with the main routing table ID (from 253-255). We grab the number associated with the ENI device (`Number`) and add the offset. For example, if we have an ENI device "eni-0" which has a `Number` of 5, then the table ID will be 10 + 5 = 15. Another important piece to note is that only the egress rule will reside inside the per-device tables, whereas the ingress rule will stay in the main routing table. This is because we want the main routing table to hold the routes to the endpoint. Moving forward, the ENI datapath will now create rules under a new egress priority value (RulePriorityEgressv2), as long as the egress-multi-home-ip-rule-compat flag is false. If it's true, then the datapath will create rules under the original egress priority value (RulePriorityEgress). This helps disambiguate when running with the older or newer ENI datapath. See https://github.com/cilium/cilium/issues/14336. [1]: See ip-route(8) Reported-by: Vlad Ungureanu <vladu@palantir.com> Suggested-by: Joe Stringer <joe@cilium.io> Suggested-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
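The table ID arithmetic described in commit 332a3fd can be sketched as below. The offset of 10 and the 253-255 range come from the commit message; the function and constant names are hypothetical and chosen for illustration only (commit e052560 names the real constant `RouteTableInterfacesOffset`).

```go
package main

// routeTableInterfacesOffset is the offset of 10 quoted in the commit
// message. The identifier here is illustrative.
const routeTableInterfacesOffset = 10

// Range of table IDs that the commit message says the old scheme could
// collide with (see ip-route(8)).
const (
	reservedTableMin = 253
	reservedTableMax = 255
)

// eniTableID returns the per-device routing table ID for an ENI, given the
// ENI's interface number (the `Number` field mentioned in the commit).
func eniTableID(ifaceNumber int) int {
	return routeTableInterfacesOffset + ifaceNumber
}

// collidesWithReserved reports whether a table ID falls in the 253-255 range.
func collidesWithReserved(tableID int) bool {
	return tableID >= reservedTableMin && tableID <= reservedTableMax
}
```

With the commit's own example, an ENI whose `Number` is 5 maps to table 15, which is well clear of the reserved range.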
1b9ed44 routing: Add ENI route table migration logic This commit will fixup the ENI datapath depending on the egress-multi-home-ip-rule-compat flag (see previous commits for context). The migration logic supports both upgrading and downgrading the ENI datapath. This logic must run on startup before the API is served and before the health endpoint is created, so that no endpoints are prematurely created before Cilium has had the chance to migrate the entire datapath. See https://github.com/cilium/cilium/issues/14336. Suggested-by: Joe Stringer <joe@cilium.io> Suggested-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
db23c50 revert: Add ability to extend the revert stack This is useful to aggregate the items to revert in one stack, so that it can all be done at once. Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
e052560 linux_defaults: Add RouteTableInterfacesOffset This new value is the table ID for the per-ENI routing tables in the new ENI datapath. Upcoming commits will use this value and implement the new datapath. See https://github.com/cilium/cilium/issues/14336. Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
aec2d8f linux_defaults: Add RulePriorityEgressv2 This new priority value is vital for disambiguating which rules are still under the old scheme. Without this, upgrading to the new scheme would be difficult, as we aren't able to identify which rules have been fixed up [1]. Furthermore, this would also allow us to enable downgrades from the new scheme, because we would be able to identify which rules need to be modified. [1]: https://github.com/cilium/cilium/issues/14336 Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
b86bf93 cni, routing: Plumb interface number In the previous commit, we added the interface number to the IPAM response for ENI mode. This commit plumbs this new field into the CNI to set up the ENI datapath. Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
a5d696a api: Extend IPAM to accept interface number This is needed in ENI mode. In upcoming commits, the interface number (ENI.Number) will be used to compute the per-ENI table ID in order to store rules and routes for the ENI datapath. Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
d5ad26c api: Expose egress-multi-home-ip-rule-compat flag This is important for use in the CNI to decide whether to use the new ENI datapath (see previous commit for context) or the original datapath. Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
549c256 daemon, option: Add flag egress-multi-home-ip-rule-compat This flag is needed to control the behavior of Cilium when it starts up under ENI mode. If the flag is false, meaning "do not maintain compatibility", then Cilium will attempt to migrate the ENI datapath (`ip rule`s and routes) created under the aforementioned IPAM mode to a new table ID scheme. The table ID refers to the Linux routing policy database tables, aka "routing table". If the flag is true, meaning "maintain compatibility", then Cilium will not attempt to migrate the ENI database under the aforementioned IPAM mode to the new table ID scheme. It will continue to use the original scheme. Additionally, when the flag is true and Cilium finds the rules under the newer scheme (by checking the priority of the rule), it will attempt to migrate back to the original scheme. This allows downgrading Cilium without affecting connectivity. Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
cfab9cf routing: Remove unnecessary debug logs from test Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
18da314 routing: Refactor helper to run function in netns This makes it usable for an upcoming commit which adds a new test suite to this package. Signed-off-by: Chris Tarazi <chris@isovalent.com> 06 January 2021, 19:20:02 UTC
df62e87 k8s: Update libraries to v1.17.16 This also updates the K8s version for provisioning in CI. Signed-off-by: Chris Tarazi <chris@isovalent.com> 22 December 2020, 16:41:31 UTC
47d7b9b ipcache: Use controller.Manager on IPIdentityCache for ipcache-bpf-garbage-collection [ upstream commit 1ef686bfb92bb3cc7ee1ac5a6d5d706835e34806 ] Signed-off-by: John Watson <johnw@planetscale.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 17 December 2020, 18:18:49 UTC
bfd9b79 pkg/k8s: fix k8s_event_lag_seconds for negative time [ upstream commit 478f409ccab1b7bfb73653aed90756ad0ee5cd44 ] On some occasions the metric `k8s_event_lag_seconds` could be presented as an overflown value such as `9223372036854775807`. This commit fixes this by checking whether the calculated value is less than zero and only setting this metric for positive times. Fixes: 4e2913004340 ("pkg/endpoint: calculate Kube API-Server lag from pod events") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 17 December 2020, 18:18:49 UTC
95a76d5 docs: Fix dependency conflict [ upstream commit f15af5e64617ff39629e37ac7dc5f85065921038 ] Pip 20.3 introduced a new dependency resolver[[0]] which silently reinterprets our current requirements file in a different way to resolve the dependencies, which results in the build being broken. On non-aarch64 systems, there are two requirements that satisfy the sphinx-rtd-theme package: * One provided by our own theme repository[[1]] which has nice consistent theming for the website, and * One provided by the upstream sphinx-rtd-theme package. Prior to pip 20.3, the default resolver was able to resolve this conflict to favour our custom theme, which is the one we intend to use in most cases. Unfortunately, with the new resolver, this conflict is resolved the other way. As far as I can tell, there is no "strict" mode to prescribe that pip should resolve conflicts first and fail out if the requirements are ambiguous. Instead, pip silently resolves this conflict and we do not find out that there was ambiguity until later in the process. In addition, on readthedocs.org, new versions of pip are automatically pulled upon each new docs build, which meant that from one day to the next, a previously successful build began to consistently fail with weird errors that imply a problem with the dependency but don't explain why the problem was introduced without changes to code in our build system: Theme error: no theme named 'sphinx_rtd_theme_cilium' found (missing theme.conf?) make[1]: *** [Makefile:48: html] Error 2 make: *** [Makefile:552: test-docs] Error 2 Fix the issue by disambiguating the theme dependency. [0]: http://pyfound.blogspot.com/2020/11/pip-20-3-new-resolver.html [1]: https://github.com/cilium/sphinx_rtd_theme Fixes: #14252 Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 17 December 2020, 18:18:49 UTC
573dd34 docker: rebuild cilium-runtime image The build of the cilium-runtime docker image will pull in the latest version of the openssl package, which includes a fix for CVE-2020-1971. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> 17 December 2020, 11:39:28 UTC
4427286 policy: Don't nil an empty selectors map. [ upstream commit 03bfb2bece5108549b3d613e119059758035d448 ] Turns out unit testing did not need this any more and this actually caused a runtime panic. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 13 December 2020, 12:00:15 UTC
0e19149 policy: Track selectors that contribute to MapStateEntries [ upstream commit 04840b96530031a84bc359c476a59d320617d2db ] Track which selectors in policy require a specific bpf policy map key to be present, and keep policy entries in the map as long as any selector requires its presence. Without this it is possible for a timed-out DNS cache entry to clear a policy cache key that is still required by another selector (FQDN or CIDR). To implement this, each MapStateEntry is now equipped with a set of (cached) selectors through which the policy map key/value was added. 'nil' has the special significance that it is used as the CachedSelector in cases where the policy map entry is added due to some administrative or configuration reason. Currently incremental updates will never remove such entries. Incremental policy updates now simply collect the requested map changes. When the endpoint then pulls the changes they are first applied to the desired policy map (MapState), while tallying which selectors still need the map entries to be present. The actual bpf map diffs are recorded based on the total count of selectors on each map entry. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 11 December 2020, 11:06:40 UTC
5665303 Prepare for release v1.7.12 Signed-off-by: Joe Stringer <joe@cilium.io> 04 December 2020, 10:59:12 UTC
b8b817d vendor: Fix cilium/arping goroutine leak [ upstream commit 24d44500e40af599dfc1b932be0dac1b75504889 ] This fixes a privileged runtime test failure caused by leaked goroutines on arpings with no response. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 03 December 2020, 20:08:31 UTC
54444f3 metrics: add cilium_datapath_nat_gc_entries [ upstream commit 57784e318449e711ffc994ee88609397086330a4 ] [ Backporter's notes: Resolved conflict with gcFamily const types. ] Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 November 2020, 10:10:15 UTC
2994bba metrics: replace replicated "direction" strings with LabelDirection constant [ upstream commit e4bf8ca149a95a611335bc82acd948995393a189 ] Signed-off-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 November 2020, 10:10:15 UTC
befd325 pkg/bpf: Wrap error to fix test failure [ upstream commit 0e220cfc3 ] [ Backporter's notes: The upstream commit is coming from v1.8 branch, as the fix was not needed on v1.9 and master. See https://github.com/cilium/cilium/pull/14022#issuecomment-727923791 and https://github.com/cilium/cilium/pull/13912#issuecomment-730185299 ] To fix the ctmap privileged test failure, the following needs to be applied (otherwise, if ...; errors.Is(err, unix.ENOENT) is always false in the PurgeOrphanNATEntries(); the change was introduced in v1.9 by 2283103): Signed-off-by: Tom Payne <tom@isovalent.com> 30 November 2020, 10:10:15 UTC
fe46860 ctmap: Iterate SNAT map once when doing GC [ upstream commit 0c83c28963cdc2af5514b9707cae42d815afbdc1 ] [ Backporter's notes: Resolved conflict with exporting MapType* consts which were unexported in master branch. ] Previously, after receiving the signal from the datapath, we iterated NAT map twice: first to compare against CT TCP map, second - against CT any map. Obviously, doing the iterations two times was inefficient. This commit fixes that by passing both CT {TCP,any} maps to the NAT GC routine. This allows the NAT GC to iterate once. Suggested-by: Joe Stringer <joe@cilium.io> Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 November 2020, 10:10:15 UTC
471dfd5 ctmap: GC orphan SNAT entries [ upstream commit c9810bf7b235ee371279c85f025535a86b1ea675 ] [ Backporter's note: Resolved slight conflicts with (*Map).DumpEntries and removed reference to NodePort hybrid mode which doesn't exist in v1.7 branch. ] This commit adds a mechanism to remove orphan SNAT entries. We call an SNAT entry orphan if it does not have either a corresponding CT entry or an SNAT entry in a reverse order. Both cases can happen due to LRU eviction heuristics (both CT and NAT maps are of the LRU type). The mechanism for the removal is based on the GC signaling in the datapath. When the datapath SNAT routine fails to find a free mapping after SNAT_SIGNAL_THRES attempts, it sends the signal via the perf ring buffer. The consumer of the buffer is the daemon. After receiving the signal it invokes the CT GC. The newly implemented GC addition iterates over all SNAT entries and checks whether a corresponding CT entry is found, and if not, it tries to remove both SNAT entries (for original and reverse flows). For now, I didn't add GC of orphan SNAT entries created by DSR to keep complexity of changes as low as possible. This will come as a follow up. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Chris Tarazi <chris@isovalent.com> 30 November 2020, 10:10:15 UTC
8473756 cilium: generalize and extend signal framework for CT fill-up [ upstream commit 24fcbe6144df229b302cddf045d9f229f6e69f2e ] Rework the 1:1 relationship with signal to channel and instead allow different signals from BPF datapath for the same go channel. This is useful so we can push the different BPF signals into the metric collection. Wire-up Signal{CT,Nat}FillUp signal into the SignalWakeGC channel. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 30 November 2020, 10:10:15 UTC
a5fc85a bpf: signal agent on CT map update error that CT map is full [ upstream commit fc5a3bd73f39012cedb99cf4b32461d45eeca812 ] There are users on 4.9 kernels who are suffering connectivity loss since the CT map is full and GC doesn't trigger yet. We can help improve the situation with the same framework we set in place for NAT when under stress. Upon insertion error, send a signal to the agent in order to trigger GC so that it can free up old entries. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martynas Pumputis <m@lambda.lt> 30 November 2020, 10:10:15 UTC
a628214 maps/ctmap: unexport NewMap, MapType type and related consts [ upstream commit b563284ac8910b5f26a98a539694fb652c8b56b9 ] All of these are not used outside the ctmap package. Also make the mapTypeIP* consts typed to avoid type conversions when using them. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Martynas Pumputis <m@lambda.lt> 30 November 2020, 10:10:15 UTC
553815e Register logging flags with operator main cmd [ upstream commit 54275be17ab694abc0dc39af6ddf3a87fb3b38c8 ] Signed-off-by: Vlad Ungureanu <vladu@palantir.com> Signed-off-by: André Martins <andre@cilium.io> 26 November 2020, 14:19:39 UTC
73c8d19 Support '--log-opt format=json' option to log in JSON format [ upstream commit 454eae136e7f5e4ba25b0901646bb87c70aa6ca0 ] Signed-off-by: Maxime VISONNEAU <maxime.visonneau@gmail.com> Signed-off-by: André Martins <andre@cilium.io> 26 November 2020, 14:19:39 UTC
491805f endpoint: Add DebugPolicy option Add endpoint DebugPolicy option that, if enabled, logs endpoint policy map update details to /var/run/cilium/state/endpoint-policy.log. The new DebugPolicy option is enabled if the new flag --debug-verbose=policy is set, but can be enabled also independently via: cilium endpoint config <EPID> DebugPolicy=true Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 25 November 2020, 01:27:41 UTC
538223c daemon: Postpone ipcache upserts until after policy changes have been regenerated by endpoints. Move ipcache CIDR upserts and releases to the policy reaction queue, where upserts can be executed after regenerations have been completed, i.e. after endpoint policy maps have been updated. This way IP addresses are mapped to newly allocated identities only after endpoint policy maps are ready to classify them. Correspondingly, on deletes the to-be-deleted CIDR identities are first deleted from ipcache so that when they are deleted from endpoint policy maps they are no longer used in classification. Releases of CIDR identities must still be serialized with ipcache upserts via the policy reaction queue so that they are executed in the same order w.r.t. ipcache upserts as policy deletes and adds. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 24 November 2020, 19:15:19 UTC
451443d fqdn: Delay ipcache upserts until policies have been updated Add a map for newly allocated identities to ipcache.AllocateCIDR functions that the caller can use to upsert the IPs to ipcache later, after affected endpoint policy maps have been updated. Use this new functionality on the DNS proxy code path, that makes sure that new policy map entries are in place before an IP received from a DNS server is placed in ipcache. This is really straightforward as the logic for waiting was already in place for delaying the forwarding of the DNS response. Policy update path is still allowing ipcache upserts at policy ingestion time rather than waiting for the policy maps to be updated. This means that new, more specific CIDRs (e.g., 10.0.0/24) in policies can still cause momentary drops on traffic currently using a less specific CIDR (e.g., 10.0/16). Similarly the DNS poller path still upserts to ipcache before policies have been updated. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 24 November 2020, 19:15:19 UTC
8b35ae4 fqdn: Fix unit test Setting usedServers to nil caused write to nil map on other tests. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 21 November 2020, 00:58:27 UTC
f767866 fqdn: Only keep used IPs for restored DNS rules. [ upstream commit 61efa8fd0e58b65de9628de7e3ef8db0cd4df40c ] The DNS policy may allow a huge number of IPs, only some of which are actual DNS servers. Collect a set of DNS servers that have been allowed in the past and only store allowed IPs that have actually been used. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 21 November 2020, 00:58:27 UTC
2b03d55 fqdn: Fix confusion of ToFQDNs vs. DNS rules. [ upstream commit a218052444243b6e439e77675f5f5034d5e86ffe ] Restored DNS proxy rules are DNS rules, not ToFQDNs rules. Fixes: #13991 Fixes: #13992 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 21 November 2020, 00:58:27 UTC
07f3dee dnsproxy: print total number of rules if too many [ upstream commit b115c544aadc9999b26964687bfebd893798db5f ] GetRules() will not process more than 1000 rules per port. Print the total number of rules in the message. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 21 November 2020, 00:58:27 UTC
5f99a13 fqdn: Make maximum number of IPs per restored rule configurable [ upstream commit 871e7e128b406f0753d48196c8049689d863d798 ] Only count the number of IPs for each FQDN selector/rule when storing rules for restoration, rather than ignoring later rules on a port after previous rules have hit the maximum number of IPs. Make the maximum number of IPs per restored rule configurable with the new option '--tofqdns-max-ips-per-restored-rule' (default 1000). Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 21 November 2020, 00:58:27 UTC
3b0f646 Add Registry Credentials to Tests [ upstream commit e8605f88baabceebef52170b7337c793436492aa ] In order to get around image registry pull limits, credentials can be set. Signed-off-by: Nate Sweet <nathanjsweet@pm.me> Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 21 November 2020, 00:58:27 UTC
6585c1d k8s: update k8s libraries to 1.17.14 Also update k8s test versions to 1.17.14 Signed-off-by: André Martins <andre@cilium.io> 17 November 2020, 17:24:57 UTC
e0fc12f ci: log in to docker in vagrant boxes [ upstream commit 238262f15a681674c2c02b4650e311b947ab44fc ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Tom Payne <tom@isovalent.com> 13 November 2020, 10:04:26 UTC
307ff61 change default docker image repository from docker.io to quay.io [ upstream commit 434b056f7c2f25487068458684247063e2a52452 ] With the recent limitations introduced by Docker, docker hub has been rate limiting docker pulls on images for non-registered users. To avoid this limitation we should switch the default container image repositories to quay.io. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tom Payne <tom@isovalent.com> 13 November 2020, 10:04:26 UTC
a968618 fqdn: keep IPs alive if their name is alive [ upstream commit 5923dafd88be2cef9f292479aff1eb0ea1ff3d05 ] There are applications that when a DNS name resolves to multiple IPs, they will store the IPs and use them past their TTL point. For example: - name resolves to IP1,IP2 - app connects to IP1 - protocol error forces disconnect - app connects to IP2 This patch keeps the IPs that map to a name alive as long as one of the IPs for the given name is alive, so that applications like the one above will not fail. Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 10 November 2020, 08:25:42 UTC
e2e3e6b fqdn: Add a nil check for security id lookup [ upstream commit af95561ff00e815a6509b5fc6fccd17756c7e896 ] The security id lookup could return nil if the identity cache isn't initialized during endpoints restore time, resulting in a crash. Hence, add a nil check before populating log record values. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 10 November 2020, 08:25:42 UTC
4789bef operator: increase GC Rate limit of identities to 2500 per minute [ upstream commit 86e419ead4568c972cf67088441857d8beda07bf ] In clusters that have a high churn of pods being created and deleted with different security identities, garbage collecting 250 identities per minute might not be sufficient. Thus, we are increasing the default limit to 2500 identities per minute. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 10 November 2020, 08:25:42 UTC
5273db1 release: add script to check presence of docker images [ upstream commit 73be2c15ced2d54aecdfa462b38d2aac2e6f631f ] To check if images are published across all repositories the `check-docker-images.sh` script will be able to perform this check of a particular release. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 10 November 2020, 08:25:42 UTC
f3ef8f3 api-limiter: Add documentation for log fields [ upstream commit cd6f6c47a8982d2711ab14e22af75a8926337c20 ] This commit contains no functional changes. It is meant to provide more context and ease understanding of the log msgs from the rate limiter to help debugging, and so on. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 November 2020, 19:56:28 UTC
04d3160 docs: Explain the rate limiter log messages [ upstream commit e9a536c48073225e49d0c96ac49dff9f27e44804 ] Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 November 2020, 19:56:28 UTC
c41f7be docs: Fix format of rate limiter config table [ upstream commit 8fd20e820bd516d856b87266856138002a5bdced ] This commit simply fixes the format by making it consistent. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 November 2020, 19:56:28 UTC
92c8adc k8s: clarify CRD schema versioning and its update process [ upstream commit 72ec75fe0961bb3f106abc96d38f097867546465 ] Add steps on how and when the CRD validation version should be updated. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 November 2020, 19:56:28 UTC
9826c18 backporting: Escape commit message when used as regex [ upstream commit 1f178f092c155c728d6fa9822ee062b2cfa9821b ] This commit escapes the commit message when it is used as a regular expression. Any special character of the POSIX extended regex (`.^$*+?()[{\|`) is prefixed with a backslash. This fixes an issue where `git log` would crash due to the string being passed to `--grep` not being a valid regex, as e.g. in the commit messages found in PR #13674 which contain `$`, `*` and `{`. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 November 2020, 19:56:28 UTC
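The escaping rule from 9826c18 could look like this in Go. The backporting scripts themselves are shell scripts, so this is only an illustrative sketch of the rule, and the `escapeERE` name is made up for the example; only the character set is taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// ereSpecial lists the characters named in the commit message.
const ereSpecial = `.^$*+?()[{\|`

// escapeERE prefixes every special character in s with a backslash so that
// the result matches s literally when used as an extended regular expression.
func escapeERE(s string) string {
	var b strings.Builder
	for _, r := range s {
		if strings.ContainsRune(ereSpecial, r) {
			b.WriteByte('\\')
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	// A subject line containing `$`, `*` and `{`, as in the PR mentioned above.
	fmt.Println(escapeERE("bpf: handle $DEV in {a,b}*"))
}
```

Characters outside the listed set, such as `}`, pass through unchanged.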
edee82a backporting: Clean tmp files after backport with conflicts [ upstream commit 7b4d70015a88437e34476a2360dc14333a7dfd2b ] During the backport process, when cherry-picking commits to backport, the cherry-pick script may fail if there are conflicts. In that case, the temporary file holding the backported commit is not cleaned up. This commit cleans up the temporary file even in case of failure. Suggested-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> 03 November 2020, 19:56:28 UTC
1cb7b5e contrib: Sort authors without depending on locale [ upstream commit d8be57743e34bd1a8350122e40b47e1aaa6ad0f1 ] [ Backporter's notes: The above upstream commit is from the v1.8 branch as this commit was never in master. ] This is a partial backport of https://github.com/cilium/cilium/pull/13106. Signed-off-by: Chris Tarazi <chris@isovalent.com> 03 November 2020, 18:34:42 UTC
6258a70 Prepare for release v1.7.11 Signed-off-by: Chris Tarazi <chris@isovalent.com> 28 October 2020, 00:43:10 UTC
97f4483 Dockerfile: Bump cilium-runtime image Signed-off-by: Chris Tarazi <chris@isovalent.com> 28 October 2020, 00:21:39 UTC
c5480cf docs: Add a note about systemd 245 rp_filter issue [ upstream commit 61100c50b8fece5cac963c67f71c259ca1b05052 ] Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Chris Tarazi <chris@isovalent.com> 24 October 2020, 10:26:31 UTC
709b6a8 pkg/endpoint: calculate Kube API-Server lag from pod events [ upstream commit 4e29130043408063d62e91d21cfcf00f9ef837ed ] Since Cilium receives CNI events when a pod is created, Cilium can calculate the lag for kube-apiserver events by checking the time an ADD event for that Pod was received and subtracting by the time the CNI event for that pod was received. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 24 October 2020, 10:26:31 UTC
792414c pkg/labelsfilter: add more unit test and rewrite docs [ upstream commit 5733f6af771c127ecc2777af4da3d2bcbb870154 ] This change is only adding more unit tests to better understand the behavior of the labelsfilter as well as improving the documentation for the expectation of filtering labels. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 23 October 2020, 18:39:23 UTC
e752015 lbmap: Optimize lbmap byte order related code [ upstream commit 682f6826bce735056a5e0d285ec0cdbb1e6cd9c8 ] Gets rid of ToNetwork() in DumpParser(). Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 23 October 2020, 18:39:23 UTC
4819438 lbmap: Fixed lb cmd byte order issue [ upstream commit c5b709b22cfc0c127c7bc92d7ea2eacdc6b59179 ] The port/RevNat info is stored in the bpf map in network byte order; When displaying the given lbmap content, the port needs to be converted to host byte order. Add ToHost() function for lbmap ToHost converts fields to host byte order. Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 23 October 2020, 18:39:23 UTC
e13de04 metrics: fix negative identity count [ upstream commit 9673c485a72ec93c10e2db1f4fdc8feab45d3d98 ] Identity allocation uses cache and refcnt mechanisms. If the identity info is already in the remote kvstore and localkeys store, it will just increase the refcnt, then notify the caller that this identity is reused. The caller will then not bump up the identity counter. However, there is a corner case that is not handled: refcnt from 0 to 1, which results in a negative identity count in the metrics output. This patch fixes the problem by returning another flag to indicate whether the identity is first-time referenced (refcnt from 0 to 1) or not. The caller then uses this information to determine whether or not to increase the counter. Signed-off-by: arthurchiao <arthurchiao@hotmail.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 23 October 2020, 18:39:23 UTC
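The corner case in e13de04 can be reproduced with a small model. The sketch below assumes a cache whose entries survive at refcount 0; the `identityCache` type and the `gaugeAfter` helper are hypothetical and not taken from the allocator code.

```go
package main

import "fmt"

// identityCache is a hypothetical stand-in for the allocator's local cache.
// Entries stay cached even when their reference count drops to 0.
type identityCache struct {
	refs map[string]int
}

// acquire bumps the reference count. isNew reports that the key was not cached
// before; firstRef reports that the count went from 0 to 1.
func (c *identityCache) acquire(key string) (isNew, firstRef bool) {
	_, cached := c.refs[key]
	c.refs[key]++
	return !cached, c.refs[key] == 1
}

// release drops one reference and reports whether it was the last one.
func (c *identityCache) release(key string) (lastRef bool) {
	if c.refs[key] == 0 {
		return false
	}
	c.refs[key]--
	return c.refs[key] == 0
}

// gaugeAfter acquires and releases the same identity twice and returns the
// resulting gauge value. With useFirstRef=false the gauge is only incremented
// for uncached identities, which models the behavior before the fix.
func gaugeAfter(useFirstRef bool) int {
	c := &identityCache{refs: map[string]int{}}
	gauge := 0
	for i := 0; i < 2; i++ {
		isNew, firstRef := c.acquire("id-1")
		if (useFirstRef && firstRef) || (!useFirstRef && isNew) {
			gauge++
		}
		if c.release("id-1") {
			gauge--
		}
	}
	return gauge
}

func main() {
	fmt.Println("gauge keyed on isNew:   ", gaugeAfter(false))
	fmt.Println("gauge keyed on firstRef:", gaugeAfter(true))
}
```

Keying the increment on the 0-to-1 transition keeps increments and decrements paired, so the gauge cannot go below zero in this model.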
984f7bf test: Display BPF map content on fail [ upstream commit ec2c18a074de2446186854188c4853c5c5664ffc ] Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 22 October 2020, 23:23:11 UTC
fda8422 pkg/endpoint: reduce cardinality of prometheus labels [ upstream commit ec16cab361309155d012ce12b93750fc5b876c9d ] If the controller that is used for label resolution fails, the prometheus metrics will increase in cardinality since the unique controller name was being used as a prometheus label. To avoid this, we will reference these warnings with a common subsystem name, 'resolve-labels'. Fixes: a31ab29f57b2 ("endpoint: Run labels controller under ep manager") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 22 October 2020, 22:04:10 UTC
9494169 backporting: Update labels by default when submitting backport [ upstream commit a8e67f1139aa9f384d2c0365aecfacd3532bda10 ] When submitting the backport PR using submit-backport, the script proposes to update the labels (i.e., remove needs-backport/X and add backport-pending/X): Sending pull request... Everything up-to-date https://github.com/cilium/cilium/pull/13700 Updating labels for PRs 13383 13608 12975 Set labels for all PRs above? [y/N] y Setting labels for PR 13383... ✓ Setting labels for PR 13608... ✓ Setting labels for PR 12975... ✓ The choice defaults to not updating the labels. That may give the wrong impression that it is an optional step---and if you're like me, when you're unsure what an optional step does, you skip it. We should default to setting the labels because we later rely on the labels being set (e.g., when we update them after the PR is merged). This commit changes it to the following: Sending pull request... Everything up-to-date https://github.com/cilium/cilium/pull/13700 Updating labels for PRs 13383 13608 12975 Set labels for all PRs above? [Y/n] Setting labels for PR 13383... ✓ Setting labels for PR 13608... ✓ Setting labels for PR 12975... ✓ Signed-off-by: Paul Chaignon <paul@cilium.io> 22 October 2020, 22:04:10 UTC
bb49619 endpoint: Avoid benign error messages on restoration [ upstream commit 228a485f2a5441f506ba9c0f357321c060f93590 ] During the endpoint restoration process, when we parse the endpoints, we assign them a reserved init identity if they don't already have an identity [0]. If we later remove the endpoint (because the corresponding K8s pod or interface are missing), we attempt to remove the identity from the identity manager. That last operation results in the following error message because the init identity was never added to the manager. level=error msg="removing identity not added to the identity manager!" identity=5 subsys=identitymanager This commit fixes it by skipping the removal attempt from the manager in the case of identity init. 0 - https://github.com/cilium/cilium/blob/80a71791320df34df5b6252b9680553e38d88d20/pkg/endpoint/endpoint.go#L819 Signed-off-by: Paul Chaignon <paul@cilium.io> 22 October 2020, 22:04:10 UTC
a2c9ee2 endpoint: Avoid unnecessary warning logs [ upstream commit df6ede7fa5555984d5b60d961cf3c90a965a6cdb ] Do not log a warning when the ID of a disconnected endpoint can't be released. These changes remove warning logs like: msg="Unable to restore endpoint, ignoring" endpointID=1925 error="interface lxc18d62e89ea16 could not be found" k8sPodName=default/spaceship-d5d56b59-6c582 subsys=daemon msg="Unable to release endpoint ID" error="Unable to release endpoint ID 1925" state=disconnected subsys=endpoint Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Paul Chaignon <paul@cilium.io> 22 October 2020, 22:04:10 UTC
87017ae endpoint: Remove interval for metadata resolver [ upstream commit cb9aa3e73c72f7a9c1bf65609d9db85f2c493e5f ] This controller's run interval is actually unnecessary as this controller is intended to only run once as long as it succeeds. This is evidenced by the fact that we have a `done` channel that's closed when the controller succeeds and a goroutine waiting for `done` to be closed to remove the controller, thereby stopping it. In addition, in the very unlikely case that the controller is scheduled again to run _and_ the goroutine waiting for the `done` channel to be closed has not run yet, the controller could panic because it would close the channel twice. This commit prevents that as the interval defaults to 10m if not provided. If the goroutine waiting on the channel doesn't run within 10m, then we very likely have much worse problems at hand. Signed-off-by: Chris Tarazi <chris@isovalent.com> 22 October 2020, 16:33:58 UTC
44d7996 endpoint: Fix goroutine leak when EP is deleted [ upstream commit 6747a845542b38015917760d9905883170b5496c ] The metadata resolver controller can be put in a scenario where it can never succeed if the Cilium K8s pod cache is out-of-sync with the current view of pods in the cluster. The cache can be out-of-sync if there's heavy load on the cluster and especially on the apiserver. This leaves the Cilium pod cache starved. When the pod cache is starved, Cilium may never have an up-to-date view of the pods running in the cluster, and therefore may never be able to resolve labels for pods because the fetch will return "pod not found". Eventually, kubelet (or maybe even the user) will give up on the pod and remove it, thereby Cilium begins removing the endpoint. The controller is stopped, but the goroutine waiting on the `done` channel will never receive, hence the leak. This commit fixes the goroutine leak when the controller never succeeds and therefore, never closes the `done` channel. The fix is to add a `select` statement to also watch the endpoint's `aliveCtx` which is cancelled (and the context's `Done` channel is closed) when the endpoint is deleted. This commit was validated by forcing the metadata resolver to never find a pod (manually hardcode the wrong pod name), and ensuring that the `gops stack` output does not contain an entry like: ``` goroutine 1244259 [chan receive, 1434 minutes]: github.com/cilium/cilium/pkg/endpoint.(*Endpoint).RunMetadataResolver.func1(0xc0055d2a20, 0xc0049ff600, 0xc0055d2ae0, 0x5d) /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1531 +0x34 created by github.com/cilium/cilium/pkg/endpoint.(*Endpoint).RunMetadataResolver /go/src/github.com/cilium/cilium/pkg/endpoint/endpoint.go:1530 +0x11e ``` Fixes: https://github.com/cilium/cilium/issues/13680 Signed-off-by: Chris Tarazi <chris@isovalent.com> 22 October 2020, 16:33:58 UTC
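The `select`-based fix described in 44d7996 follows a common Go pattern. The sketch below is a simplified illustration of that pattern, not the code from `pkg/endpoint`; the function names are hypothetical, and only the `done` channel and `aliveCtx` come from the commit message.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForResolver blocks until the resolver signals success by closing done,
// or until the endpoint's context is cancelled. It returns true on success.
func waitForResolver(aliveCtx context.Context, done <-chan struct{}) bool {
	select {
	case <-done:
		return true
	case <-aliveCtx.Done():
		return false
	}
}

// waiterExitsOnDelete simulates a resolver that never succeeds and an endpoint
// that is deleted. It reports whether the waiting goroutine terminated.
func waiterExitsOnDelete() bool {
	aliveCtx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{}) // never closed: the resolver never succeeds
	exited := make(chan struct{})
	go func() {
		waitForResolver(aliveCtx, done)
		close(exited)
	}()
	cancel() // the endpoint is deleted
	select {
	case <-exited:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("waiter exited after delete:", waiterExitsOnDelete())
}
```

A plain `<-done` receive in place of the `select` would leave the goroutine blocked forever in this scenario, which matches the `chan receive` stack entry quoted in the commit message.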
05b5cd3 k8s: update k8s libraries to 1.17.13 Also update k8s test versions to 1.17.13 Signed-off-by: André Martins <andre@cilium.io> 21 October 2020, 14:33:16 UTC
8603523 test: Clean up etcd-deployment after test Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:46:34 UTC
7fa76d0 test: Refactor DatapathConfiguration etcd test This commit separates out the preamble of deploying etcd and reconfiguring Cilium to use it into a BeforeEach. This improves the readability of the test. Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:46:34 UTC
f7f3913 test: Avoid reinstalling Cilium after test suite This commit removes an unnecessary step which installs Cilium after every Context block in this test suite. It is unnecessary because each Context will install Cilium and configure it accordingly. This will also speed up the test suite execution time. Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:46:34 UTC
7eef9ed test: Remove leftover etcd operator related code [ upstream commit 43a7bba3bf543eba6deebf35c57f63856f44f819 ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:46:34 UTC
070c410 test: Replace managed etcd test with etcd test [ upstream commit bf3a6c585ce05d83fc52224c602192d547d84302 ] [ Backporter's notes: The deployment manager code doesn't exist in v1.7, so that was omitted in this backport commit. The functional changes that the deployment manager provided were translated for the current code in v1.7. Also, "synchronizeK8sNodes" is not under the "config" Helm subchart in v1.7, but rather under "operator". ] The managed etcd feature is being deprecated soon. The test has been unreliable due to etcd operator not being able to bring up etcd clusters reliably. We want to ensure test coverage for the etcd backing though. Convert the managed etcd test to a more generic test which uses stateless etcd for a reliable outcome. Fixes: #11181 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:46:34 UTC
e1c9152 identity: Fix nil pointer panic in LookupIdentityByID [ upstream commit 005291cf60e0c48d3968093091c090349e623955 ] Because the identity allocator is initialized asynchronously via `InitIdentityAllocator`, the local identity allocator might not have been initialized yet when the lookup functions are called. This can cause nil pointer panics, as observed in #13479. Before b194612c004c3e69289286e9a35d337b2645fc50, this nil pointer panic could not occur in `LookupIdentityByID` as the function checked for `m.IdentityAllocator != nil` which also implies `m.localIdentities != nil`. This commit adds an explicit check for `m.localIdentities` and fixes a potential data race by checking the initialization channels before accessing `m.localIdentities` or `m.IdentityAllocator`. Fixes: #13479 Fixes: b194612c004c ("identity: Avoid kvstore lookup for local identities") Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:33:56 UTC
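One way to guard a lookup against an asynchronously initialized field, in the spirit of e1c9152, is to check an initialization channel before touching the field. The types and method names below are hypothetical and the real allocator may wait or behave differently; only the idea of checking the channel first and then the pointer comes from the commit message.

```go
package main

import "fmt"

// localCache is a hypothetical stand-in for the local identity cache.
type localCache struct {
	byID map[int]string
}

// identityAllocator models an allocator whose cache is set up asynchronously.
type identityAllocator struct {
	initialized     chan struct{} // closed once localIdentities has been set
	localIdentities *localCache   // nil until initialization finishes
}

func newIdentityAllocator() *identityAllocator {
	return &identityAllocator{initialized: make(chan struct{})}
}

// finishInit publishes the cache. Closing the channel after the assignment
// makes the write visible to any reader that observes the closed channel.
func (a *identityAllocator) finishInit(c *localCache) {
	a.localIdentities = c
	close(a.initialized)
}

// lookupByID returns false instead of dereferencing a nil cache when the
// allocator has not been initialized yet.
func (a *identityAllocator) lookupByID(id int) (string, bool) {
	select {
	case <-a.initialized:
	default:
		return "", false // not initialized yet
	}
	if a.localIdentities == nil {
		return "", false
	}
	name, ok := a.localIdentities.byID[id]
	return name, ok
}

func main() {
	a := newIdentityAllocator()
	_, ok := a.lookupByID(1)
	fmt.Println("found before init:", ok)
	a.finishInit(&localCache{byID: map[int]string{1: "cidr:10.0.0.0/8"}})
	name, ok := a.lookupByID(1)
	fmt.Println("found after init:", ok, name)
}
```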
ee1b3e3 service: Use initNextID in acquireLocalID() [ upstream commit b34e5d80ad6929e735e9486f1f38c77df99043cc ] When rolling over, it should use initNextID instead of FirstFreeServiceID, which doesn't belong to the IDAllocator. This would create problems if FirstFreeServiceID and FirstFreeBackendID have different values although now they happen to be the same. Fixes: ab9cf4ba4206 ("service: Make local ID allocator more service agnostic") Signed-off-by: Han Zhou <hzhou8@ebay.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:33:56 UTC
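The rollover issue in ee1b3e3 comes down to an allocator wrapping around to its own starting ID, not to a constant owned by a different ID space. A minimal sketch, assuming a simplified allocator: apart from `initNextID`, all names here are hypothetical.

```go
package main

import "fmt"

// idAllocator hands out IDs in [initNextID, maxID] and reuses released ones.
type idAllocator struct {
	initNextID uint32 // first ID of this allocator's own range
	maxID      uint32
	nextID     uint32
	used       map[uint32]bool
}

func newIDAllocator(first, max uint32) *idAllocator {
	return &idAllocator{initNextID: first, maxID: max, nextID: first, used: map[uint32]bool{}}
}

// acquire returns the next free ID. When the end of the range is reached it
// rolls over to the allocator's own initNextID.
func (a *idAllocator) acquire() (uint32, bool) {
	span := a.maxID - a.initNextID + 1
	for i := uint32(0); i < span; i++ {
		id := a.nextID
		a.nextID++
		if a.nextID > a.maxID {
			a.nextID = a.initNextID
		}
		if !a.used[id] {
			a.used[id] = true
			return id, true
		}
	}
	return 0, false // range exhausted
}

func (a *idAllocator) release(id uint32) {
	delete(a.used, id)
}

func main() {
	a := newIDAllocator(10, 12)
	for i := 0; i < 3; i++ {
		id, _ := a.acquire()
		fmt.Println("acquired", id)
	}
	a.release(10)
	id, ok := a.acquire()
	fmt.Println("after rollover:", id, ok)
}
```

Because the wrap-around target is a field of the allocator, two allocators with different starting IDs cannot leak into each other's ranges.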
1454ca3 bpf: only clean up XDP from devices with XDP attached [ upstream commit 8f0e7fad3aff07cf0b00528080fb42dffe15e1af ] [ Backporter's notes: Includes update to bpf.sha. ] Currently, during agent startup, cilium removes XDP from all interfaces except for `cilium_host`, `cilium_net` and `$XDP_DEV` regardless of whether there is an XDP program attached to it. For some drivers, e.g. Mellanox mlx5, the following command will cause device reset regardless of whether there is an XDP program attached to it, which introduces node and pod network interruption: `ip link set dev $DEV xdpdrv off`. This patch adds a check of XDP program existence to avoid such network interruption. Fixes: #13526 Reported-by: ArthurChiao <arthurchiao@hotmail.com> Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:33:56 UTC
54fdd63 bpf: clean up XDP from prior devices when cilium configuration changes [ upstream commit 1afb536cd61b5a817c472f761979fd08e0611ae9 ] Avoid having to leave around stale XDP programs when the config changes. Therefore do the same as we do in tc which is to clean up prior state. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Chris Tarazi <chris@isovalent.com> 20 October 2020, 08:33:56 UTC
98210c4 contrib: match commit subject exactly when searching for upstream commit [ upstream commit 6557f7557ca7ef65f8097743d4fdb935967d686e ] In generate_commit_list_for_pr, the commit subject is used to determine the upstream commit ID from $REMOTE/master. However, if in the meantime another commit with e.g. a Fixes tag that mentions this commit subject, it appears first and leads to the original commit not being found. This can be demonstrated using #13383: ``` * PR: 13383 -- daemon: Enable configuration of iptables --random-fully (@kh34) -- https://github.com/cilium/cilium/pull/13383 Merge with 2 commit(s) merged at: Wed, 14 Oct 2020 11:41:51 +0200! Branch: master (!) refs/pull/13383/head ---------- ------------------- v (start) | Warning: No commit correlation found! via dbac86cffc6d57e8c093d2821e0d794f4c13d284 ("daemon: Enable configuration of iptables --random-fully") | 350f0b36fd9b4cf23ebc11f4365c5c89591d0ff4 via 22d4554e963e2d8029ff95087ac03e55e90a7377 ("test: Test iptables masquerading with --random-fully") v (end) $ # this is the git log command (with the subject added) from $ # contrib/backporting/check-stable that should extract a single $ # upstream commit $ git log -F --since="1year" --pretty="%H %s" --no-merges --grep "daemon: Enable configuration of iptables --random-fully" origin/master 078ec543d36a8f5d6caed5c4649c74c72090ae20 install/kubernetes: consistent case spelling of iptables related values 4e39def13bca568a21087238877fbc60f8751567 daemon: Enable configuration of iptables --random-fully $ git show 078ec543d36a8f5d6caed5c4649c74c72090ae20 commit 078ec543d36a8f5d6caed5c4649c74c72090ae20 Author: Tobias Klauser <tklauser@distanz.ch> Date: Wed Oct 14 11:58:29 2020 +0200 install/kubernetes: consistent case spelling of iptables related values Make the case spelling of the newly introduced "ipTablesRandomFully" value consistent with other iptables option values which use the "iptables" spelling. 
Fixes: 4e39def13bca ("daemon: Enable configuration of iptables --random-fully") Signed-off-by: Tobias Klauser <tklauser@distanz.ch> ``` Note the `Fixes: ...` line in commit 078ec543d36a8f5d6caed5c4649c74c72090ae20 above. Fix this behavior by grepping for the subject line from start of line: ``` $ git log -F --since="1year" --pretty="%H %s" --no-merges --extended-regexp --grep "^daemon: Enable configuration of iptables --random-fully" origin/master 4e39def13bca568a21087238877fbc60f8751567 daemon: Enable configuration of iptables --random-fully * PR: 13383 -- daemon: Enable configuration of iptables --random-fully (@kh34) -- https://github.com/cilium/cilium/pull/13383 Merge with 2 commit(s) merged at: Wed, 14 Oct 2020 11:41:51 +0200! Branch: master (!) refs/pull/13383/head ---------- ------------------- v (start) | 4e39def13bca568a21087238877fbc60f8751567 via dbac86cffc6d57e8c093d2821e0d794f4c13d284 ("daemon: Enable configuration of iptables --random-fully") | 350f0b36fd9b4cf23ebc11f4365c5c89591d0ff4 via 22d4554e963e2d8029ff95087ac03e55e90a7377 ("test: Test iptables masquerading with --random-fully") v (end) ``` Reported-by: Robin Hahling <robin.hahling@gw-computing.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Nate Sweet <nathanjsweet@pm.me> 20 October 2020, 08:26:58 UTC
2e29740 docs: Fix TLS visibility GSG [ upstream commit 837caf8a649d11d879726415fc61d872b4def62e ] - Use k8s secret key name 'ca.crt' for 'ca-certificates.crt' so that the example CNP works - Change the example monitor output from lyft.com to artii.herokuapp.com - Add deletion of the secrets to clean-up - Fix indentation and white-spacing in enumerated lists so that they are rendered properly Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 13 October 2020, 07:40:53 UTC
b79349f test: fix escaping in l7 tls vis policy [ upstream commit 8ed026ada6b01bf06b05b3cb392c2210eabbbb71 ] Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 13 October 2020, 07:40:53 UTC
da73428 docs, test: replace swapi by artii.herokuapp.com for TLS visibility [ upstream commit 92aa025e90127c8fb5d00bb832882783d2cca318 ] The STAR WARS API used as an example for TLS visibility documentation and tests is no longer maintained [0], and the website is down. Let's switch to something else instead: https://artii.herokuapp.com/, which turns arguments into ASCII art snippets. [0] https://github.com/phalt/swapi Signed-off-by: Quentin Monnet <quentin@isovalent.com> 13 October 2020, 07:40:53 UTC
fbe6d7d cilium.io/v2: fix CNP Deep Equal function [ upstream commit bafc8813f5b044bd396deefbfe50f6426d7024a5 ] An equal function should never modify its fields; doing so might cause race conditions. [ Backport note: Change applied to pkg/k8s/factory_functions.go instead of pkg/k8s/apis/cilium.io/v2/cnp_types.go where the related code is located in v1.9. Also keep the check on reflect.deepEqual(...) added to the value returned by the function, as is the case in branch v1.7 (changed in v1.9 with commit c813a15 ("k8s: Explicitly embed CRD types into CCNP"). ] Fixes: 100d83c27535 ("k8s: ignore kubectl.kubernetes.io/last-applied-configuration annotation") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 12 October 2020, 20:00:40 UTC
ff72ce1 pkg/comparator: add function to compare maps ignoring keys [ upstream commit 94ddcd35946007afb7cf8e513c50b7b2afa16b61 ] MapStringEqualsIgnoreKeys will be used to compare maps while ignoring certain keys. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 12 October 2020, 20:00:40 UTC
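A comparison that ignores selected keys, as introduced in ff72ce1, can be written without mutating either map. The sketch below is an illustration only: the lower-case function name and its signature are assumptions and may differ from the function in `pkg/comparator`.

```go
package main

import "fmt"

// mapStringEqualsIgnoreKeys reports whether m1 and m2 hold the same key/value
// pairs once the keys listed in ignoreKeys are disregarded. Neither map is
// modified, so the function is safe to call on shared objects.
func mapStringEqualsIgnoreKeys(m1, m2 map[string]string, ignoreKeys []string) bool {
	ignored := make(map[string]struct{}, len(ignoreKeys))
	for _, k := range ignoreKeys {
		ignored[k] = struct{}{}
	}
	for k, v1 := range m1 {
		if _, skip := ignored[k]; skip {
			continue
		}
		if v2, ok := m2[k]; !ok || v1 != v2 {
			return false
		}
	}
	for k := range m2 {
		if _, skip := ignored[k]; skip {
			continue
		}
		if _, ok := m1[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	const lastApplied = "kubectl.kubernetes.io/last-applied-configuration"
	a := map[string]string{"team": "net", lastApplied: "{...v1...}"}
	b := map[string]string{"team": "net", lastApplied: "{...v2...}"}
	fmt.Println(mapStringEqualsIgnoreKeys(a, b, []string{lastApplied}))
}
```

Comparing this way avoids the alternative of deleting the ignored annotation from the objects before comparing them, which is the kind of mutation fbe6d7d warns about.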
55f0c5a api-limiter: Add test covering cancelation of rate limiter [ upstream commit 269aa1b69786eed0649162bf319e10ebf5f4b601 ] Test that failing requests will not impact the rate limiter. This is already working correctly so this test only adds additional coverage. Signed-off-by: Thomas Graf <thomas@cilium.io> 09 October 2020, 07:22:43 UTC
61f2aa9 api-limiter: Add stress test [ upstream commit 75159c64565c59175cdae7c127e0394fde148f19 ] Simulate stress by feeding many parallel requests through the API rate limiter. The maximum wait duration will soon be hit for requests causing them to fail. Without the fix, this test would often fail. If it succeeded, the number of retries is in the range of 60-80K. With the fix, the test succeeds consistently and retries are in the range of 6-8K. That's a 10x reduction of retries. Signed-off-by: Thomas Graf <thomas@cilium.io> 09 October 2020, 07:22:43 UTC
8019191 api-limiter: Enforce ParallelRequests before applying rate limiting [ upstream commit 208c69ceddb127d811c7dc011c69bd51b14896e1 ] So far, the rate limiting has been enforced before enforcing parallel requests. In theory, this is better because we can return error code 429 earlier to tell callers to slow down. In practice, many callers will retry immediately anyway which means that all the limiting will happen by the rate limiter. The rate limiter relies on reservations that need to be canceled. If too many requests happen in parallel, reservations can't be canceled quickly enough. By swapping the enforcement of parallel requests with the rate limiter, all requests will block for at least MaxWaitDuration if more than the allowed number of parallel requests are pending which will naturally pace the callers. Swapping the enforcement order requires the acquired semaphore to be released in error cases of the rate limiter. This requires to change the structure of Wait() to have a single error handling structure. By reusing finishRequest(), the metrics handler has to be adjusted slightly to account for new outcomes as it now bumps the metric for canceled requests as well. What remains unchanged is that only successful API requests are used to calculate the mean processing duration. Fixes: 3141e6581e0 ("rate: Add API rate limiting system") Signed-off-by: Thomas Graf <thomas@cilium.io> 09 October 2020, 07:22:43 UTC
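The ordering change in 8019191 can be sketched with a channel used as a semaphore. This is a simplified model, not the `api-limiter` code: the `apiLimiter` type, its fields, and the `reserve` callback (a stand-in for the token-based rate limiter) are hypothetical, and 429 mapping and metrics are left out.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errTooManyParallel = errors.New("too many parallel requests")
	errRateLimited     = errors.New("rate limit exceeded")
)

// apiLimiter is a hypothetical, simplified limiter.
type apiLimiter struct {
	slots   chan struct{} // semaphore; capacity = allowed parallel requests
	reserve func() error  // stand-in for the token-based rate limiter
	maxWait time.Duration
}

// wait first takes a parallel-request slot, blocking for at most maxWait, and
// only then consults the rate limiter. On success the caller must invoke the
// returned release function when the request finishes.
func (l *apiLimiter) wait() (func(), error) {
	timer := time.NewTimer(l.maxWait)
	defer timer.Stop()

	select {
	case l.slots <- struct{}{}:
	case <-timer.C:
		return nil, errTooManyParallel
	}
	if err := l.reserve(); err != nil {
		<-l.slots // give the slot back on the rate limiter's error path
		return nil, err
	}
	return func() { <-l.slots }, nil
}

// slotsHeldAfterRateLimitError returns how many slots remain taken after a
// request was rejected by the rate limiter.
func slotsHeldAfterRateLimitError() int {
	l := &apiLimiter{
		slots:   make(chan struct{}, 1),
		reserve: func() error { return errRateLimited },
		maxWait: 10 * time.Millisecond,
	}
	_, _ = l.wait()
	return len(l.slots)
}

// secondCallerRejected reports whether a second request fails with
// errTooManyParallel while the first one still holds the only slot.
func secondCallerRejected() bool {
	l := &apiLimiter{
		slots:   make(chan struct{}, 1),
		reserve: func() error { return nil },
		maxWait: 10 * time.Millisecond,
	}
	release, _ := l.wait()
	defer release()
	_, err := l.wait()
	return errors.Is(err, errTooManyParallel)
}

func main() {
	fmt.Println(slotsHeldAfterRateLimitError(), secondCallerRejected())
}
```

Blocking on the semaphore first is what paces callers that retry immediately: they spend up to `maxWait` waiting for a slot before they ever reach the rate limiter.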
09fee20 api-limiter: Fix duplicate log message [ upstream commit ecb2369afc1660ae19406a32168f1a838aa06554 ] The following log message was duplicated and printed twice for two different meanings. Fixes: 3141e6581e0 ("rate: Add API rate limiting system") Signed-off-by: Thomas Graf <thomas@cilium.io> 09 October 2020, 07:22:43 UTC
b462f52 contexthelpers: Fix deadlock when nobody recvs on success channel [ upstream commit 4a66961b5ab8857d77bf1be13d56776049b2ca5f ] It has been observed that the "sessionSuccess <- true" statement can block forever. E.g. Cilium v1.7.9: goroutine 742 [chan send, 547 minutes]: github.com/cilium/cilium/pkg/kvstore.(*etcdClient).renewLockSession(0xc000f66000, 0x2790b40, 0xc000e247c0, 0x0, 0x0) /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:657 +0x3e2 github.com/cilium/cilium/pkg/kvstore.connectEtcdClient.func6(0x2790b40, 0xc000e247c0, 0x3aae820, 0x2) /go/src/github.com/cilium/cilium/pkg/kvstore/etcd.go:819 +0x3e github.com/cilium/cilium/pkg/controller.(*Controller).runController(0xc000f42500) /go/src/github.com/cilium/cilium/pkg/controller/controller.go:205 +0xa2a created by github.com/cilium/cilium/pkg/controller.(*Manager).updateController /go/src/github.com/cilium/cilium/pkg/controller/manager.go:120 +0xb09 This can happen when the context cancellation timer fires after a function which consumes the context has returned, but before "sessionSuccess <- true" is executed. After the timeout, the receiving goroutine is closed, making the sending block forever. Fix this by making the sessionSuccess channel buffered. Note that after sending we don't check whether the context has been cancelled, as we expect that any subsequent functions which consume the context will check for the cancellation. Fixes: 02628547 ("etcd: Fix incorrect context usage in session renewal") Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Quentin Monnet <quentin@isovalent.com> 07 October 2020, 16:38:01 UTC
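The effect of buffering the success channel, as done in b462f52, can be shown in isolation. The helper below is a hypothetical demonstration and not the `contexthelpers` code: it starts a sender whose receiver has already gone away and checks whether the sender gets past the send.

```go
package main

import (
	"fmt"
	"time"
)

// senderFinishes starts a goroutine that reports success on a channel nobody
// reads from, and reports whether that goroutine got past the send.
// bufSize 0 models the unbuffered channel, bufSize 1 the buffered one.
func senderFinishes(bufSize int) bool {
	success := make(chan bool, bufSize)
	finished := make(chan struct{})
	go func() {
		success <- true // blocks forever on an unbuffered channel without a receiver
		close(finished)
	}()
	select {
	case <-finished:
		return true
	case <-time.After(100 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered sender finishes:", senderFinishes(0))
	fmt.Println("buffered sender finishes:  ", senderFinishes(1))
}
```

The unbuffered case deliberately leaves one goroutine blocked for the lifetime of the demo, which is the `chan send` state shown in the stack trace above.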