https://github.com/cilium/cilium

4ec9a5d add multistack to dev VM Signed-off-by: André Martins <andre@cilium.io> 11 December 2019, 13:20:18 UTC
9d3374d Improve the 'Setting up Cluster Mesh' page. Signed-off-by: Bruno Miguel Custódio <brunomcustodio@gmail.com> 10 December 2019, 13:49:34 UTC
b23e18e docs: Upgrade notes about tofqdns-min-ttl default and zombies We lowered the default value of tofqdns-min-ttl to 1 hour with the introduction of zombies. This may be unexpected when upgrading, especially as many users would not have changed it from the default. Signed-off-by: Ray Bejjani <ray@isovalent.com> 10 December 2019, 13:47:57 UTC
7141bc5 github: remove github actions integration Since GitHub Actions are not well suited to running on forked branches, we can't run them reliably for forks of cilium/cilium. For this reason we will remove the GitHub Actions temporarily for now. Signed-off-by: André Martins <andre@cilium.io> 10 December 2019, 09:33:46 UTC
b062b32 policy: Remove API limit on number of CIDR prefixes This limit was arbitrary and should not apply to most deployments (on kernels 4.11 or higher). Later code in the daemon will enforce this limit based on whether the limit really needs to be here for the given kernel version, so we can remove these checks entirely from the API validation. Signed-off-by: Joe Stringer <joe@cilium.io> 10 December 2019, 08:12:47 UTC
171378e test: Extend MetalLB test case Send requests to LB from the "client-from-outside" container, which corresponds to a case when a host outside a k8s cluster tries to send a request to a LoadBalancer service. Signed-off-by: Martynas Pumputis <m@lambda.lt> 09 December 2019, 17:17:38 UTC
fe8b695 test: Start a client Docker container on each VM Start a Docker container with the name "client-from-outside" on each test VM. The containers are going to be used for testing cases when connectivity from outside is required (e.g. accessing a load-balancer provisioned by MetalLB). Signed-off-by: Martynas Pumputis <m@lambda.lt> 09 December 2019, 17:17:38 UTC
3c9e502 Use github.com/cilium/cilium/pkg/checker for DeepEquals Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 09 December 2019, 16:09:22 UTC
ca5d030 Add docs for selecting sec groups by tags for ENI IPAM Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 09 December 2019, 16:09:22 UTC
deee08d Allow selecting ENI sec groups by tags Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 09 December 2019, 16:09:22 UTC
5b551b7 Add ability to query EC2 ENI instance via ec2:DescribeInstanceTypes AWS added a new endpoint to the EC2 API that allows describing all the available EC2 instance types in the region (https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstanceTypes.html). This allows us to rely on this going forward instead of keeping a hardcoded map of instance types to adapter limits. This will require a new EC2 API permission to be added to the default IAM permission set: ec2:DescribeInstanceTypes. Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 09 December 2019, 15:24:32 UTC
c82ea8f operator: add node gc controller for clusterwide policies Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 09 December 2019, 14:24:43 UTC
fbf685e operator: add rbac for clusterwide policies to operator Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 09 December 2019, 14:24:43 UTC
25ef8ab operator: add ccnp event watcher to cilium operator * Add kvstore status watcher for CiliumClusterwideNetworkPolicies similar to that of CiliumNetworkPolicies. Signed-off-by: Deepesh Pathak <deepshpathak@gmail.com> 09 December 2019, 14:24:43 UTC
46ce010 docs: Include env variable in EKS e2e examples I forgot to include these previously. They are still needed, for now. Signed-off-by: Ray Bejjani <ray@isovalent.com> 09 December 2019, 12:29:48 UTC
fd3670b fqdn: Add zombie limit Zombies can accumulate under certain pathological scenarios. The default cap is high, and should only impact cases where the cleanup isn't happening fast enough. Signed-off-by: Ray Bejjani <ray@isovalent.com> 09 December 2019, 11:08:39 UTC
3ce8259 cilium: encryption bugtool should remove aead, comp and auth-trunc keys Originally encryption only supported enc and auth key types, but we have since added aead types as well. This adds all the supported algo types for stripping from bugtool; this future-proofs us if we add more keys later and also removes the aead keys, an algo we support today. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 09 December 2019, 07:55:50 UTC
0932671 doc: the namespace is wrong when validating cilium on Azure CNI The namespace mentioned in chaining.yaml is different from the namespace queried using kubectl in the validation section. This seems to be an issue because we are re-using the file k8s-install-validate.rst. Fixes: #9710 Signed-off-by: Swaminathan Vasudevan <svasudevan@suse.com> 06 December 2019, 13:40:55 UTC
5f28f53 Dockerfile runtime: add python3 dependency It seems bpftool can't be built without python3, so we need to install python3 in order to build it. Signed-off-by: André Martins <andre@cilium.io> 06 December 2019, 10:22:23 UTC
f93ee14 bump k8s client libraries to v1.17.0-rc.2 Signed-off-by: André Martins <andre@cilium.io> 05 December 2019, 16:53:19 UTC
433d108 test: Bump K8S_VERSION to 1.17 in vagrant-local-start Bump K8S_VERSION to 1.17 in vagrant-local-start and change the default K8S_NODES to 2. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 05 December 2019, 10:30:59 UTC
356444a test: Support reprovisioning with K8S_NODES=1 Do not try to reprovision k8s2 when K8S_NODES=1. This makes local k8s CI testing more feasible in situations where the local machine does not have enough memory for running the two VMs. Note that some tests are likely to fail due to resource limitations when only one VM is provisioned. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 05 December 2019, 10:30:59 UTC
4dbbe08 test: Delete all old vms before starting new ones Delete all old VMs regardless of the value of K8S_NODES. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 05 December 2019, 10:30:59 UTC
68ac4fe .github: add github actions to cilium Make use of https://github.com/cilium/github-actions to automate PR interactions in Cilium repository. Signed-off-by: André Martins <andre@cilium.io> 05 December 2019, 10:14:24 UTC
507a256 docs: Document running via kubeconfig on EKS We now support using any kubeconfig to run tests, similar to sshconfig. More steps are required but they are not complex. Signed-off-by: Ray Bejjani <ray@isovalent.com> 05 December 2019, 09:38:13 UTC
f0c9854 test: Add test for MetalLB integration The test checks whether the "test-lb" LoadBalancer service is reachable from the "k8s1" and "k8s2" hosts via the IP addr allocated by MetalLB operating in L2 mode. Unfortunately, we cannot run the test from a remote host (which would mimic a real-world case), as the hosts' (VMs') network is not exposed to the host which runs them. MetalLB in L2 mode sends ARP responses for the given IP addr from an elected master node, which makes the master node handle requests to the service via the LoadBalancer IP addr. Signed-off-by: Martynas Pumputis <m@lambda.lt> 04 December 2019, 19:21:55 UTC
36efa08 test: Provision LoadBalancer service test-lb for testDS Signed-off-by: Martynas Pumputis <m@lambda.lt> 04 December 2019, 19:21:55 UTC
741f3f8 test: Add GetLoadBalancerIP helper function The function is going to be used by the LoadBalancer service tests. Signed-off-by: Martynas Pumputis <m@lambda.lt> 04 December 2019, 19:21:55 UTC
165db22 k8s: Add support for LoadBalancer services This commit adds support for services of the LoadBalancer type when running w/o kube-proxy. Such a service has an IP addr assigned by an external loadbalancer (e.g. MetalLB) which might be used to access the service from outside. The IP addr is not assigned immediately. Therefore, the provisioning of the service usually takes place after receiving a second update of the service which contains the IP addr stored in the Service.Status.LoadBalancer.Ingress field (array). The behavior of the service is identical to a service of the ExternalIP "type", therefore no changes are required in the BPF datapath. Signed-off-by: Martynas Pumputis <m@lambda.lt> 04 December 2019, 19:21:55 UTC
d87a793 cilium: add benchmark flag to run benchmarks when set Benchmarks may take some time to run and are likely not always needed. So add a flag and only run when -cilium.benchmark is set. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 04 December 2019, 18:36:00 UTC
0732bc0 cilium: sockops, testing http Add tests for http using wrk. This will deploy an nginx instance and then run wrk2 against that deployment. In the future we can extract these and track performance. For now just verify the stack is working as expected. Next we will add kTLS tests by having a wrk2-ktls and nginx docker images. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 04 December 2019, 18:36:00 UTC
af51920 cilium: benchmark tests for sockops datapath using netperf Run datapath benchmarks for sockops from K8s configuration. This lets me easily test TCP_RR, TCP_CRR, and TCP_STREAM while working on upstream kernels to verify fixes, reproduce issues, etc. As we do performance testing we should start tracking performance output from run to run. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 04 December 2019, 18:36:00 UTC
96eb10a cilium: Add same node connectivity tests to sockops test The sockops case enables sockmap+sockops on the local nodes. Testing pod traffic across nodes is a good negative test but does not actually exercise sockmap+sockops datapath. This adds a pod to pod connectivity test on the same node and adds the test to the sockops test case. After this we start to test the sockops datapath between pods on the same node. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 04 December 2019, 18:36:00 UTC
ccee5aa CI: Delete then create namespaces in policy tests This manifests when re-running tests manually on the same cluster. Cleaning up in an AfterAll or AfterEach would be more correct, but this allows namespaces to persist for debugging when holdEnvironment is set. Deleting the namespace should clear all residual entities, including pods and CNPs, ensuring a clean slate every time. Signed-off-by: Ray Bejjani <ray@isovalent.com> 04 December 2019, 16:26:54 UTC
6a143c8 CI: ExecInPods returns an error when no pods are found Returning an error when no pods match should fix panics when nil error returns imply a non-nil CmdRes. This occurred in some callers of ExecInHostNS, itself a simple wrapper of ExecInPods. Signed-off-by: Ray Bejjani <ray@isovalent.com> 04 December 2019, 16:26:09 UTC
63c2a8c vagrant: Remove workaround to MASQ traffic from k8s2 A leftover from 861c8a901a ("test: Remove workaround to MASQ traffic from k8s2"). Signed-off-by: Martynas Pumputis <m@lambda.lt> 04 December 2019, 16:25:50 UTC
09d9e1e policy: Disable well-known identities for non-managed etcd Add an option --enable-well-known-identities to allow disabling well-known identities for all new deployments which are not using managed etcd. This reduces the number of security identities whitelisted for each endpoint. Signed-off-by: Thomas Graf <thomas@cilium.io> 04 December 2019, 15:48:16 UTC
ed17402 eni: Check instance existence before resolving deficit This patch adds a check for instance existence before resolving the IP deficit; this prevents the operator from creating and attaching ENIs to non-existing instances cyclically. It also adds eth0 to instances in unit tests where necessary. Fixes: #9533 Signed-off-by: Jaff Cheng <jaff.cheng.sh@gmail.com> 03 December 2019, 16:51:17 UTC
8e47017 docs: Keep externalTrafficPolicy=Local limitation NodePort BPF does not support the externalTrafficPolicy=Local setting yet. Therefore, keep the limitation in the docs. Fixes: 7629d9da814ca ("docs: Fix up nodeport limitations") Signed-off-by: Martynas Pumputis <m@lambda.lt> 02 December 2019, 23:52:02 UTC
a6dac63 Add eks specific jenkinsfile Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 02 December 2019, 17:24:58 UTC
bcc21fc Add new startup script image with jq Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 02 December 2019, 17:24:58 UTC
5634675 Move --aws-instance-limit-mapping flag to be on the operator Via https://github.com/cilium/cilium/pull/9236 the option was added to the agent CLI, but all the code that evaluates the AWS instance limits runs in the operator and not the agent. Moved the option to the operator so that the limit map can be updated there. Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 02 December 2019, 14:44:01 UTC
203085a Rate limit ec2:DescribeSubnets Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 02 December 2019, 14:42:48 UTC
07104bb k8s: Remove unused types.Ingress This commit removes no longer used types.Ingress and its related functions. This is a leftover of https://github.com/cilium/cilium/pull/9419. Signed-off-by: Martynas Pumputis <m@lambda.lt> 02 December 2019, 14:42:34 UTC
1230f2b watchers: Add test to check for updates when port is changed This commit adds a unit test to check whether changing a port number of a service triggers a service update. Previously, this was not the case due to k8s.EqualV1Services() ignoring ports. Signed-off-by: Martynas Pumputis <m@lambda.lt> 30 November 2019, 19:44:08 UTC
65357a2 k8s: Use ParseService when comparing two services Previously, the EqualV1Services() function had its own k8s.Service constructor which was not in sync with ParseService() and did not consider some service fields. The consequence of this was that changing a type or ports of a service did not trigger the service update, and thus the service change was not reflected in the SVC BPF maps. Use ParseService() instead of the custom constructor to prevent any discrepancies in the future. Signed-off-by: Martynas Pumputis <m@lambda.lt> 30 November 2019, 19:44:08 UTC
24fc784 doc: Disable masquerading in all chaining guides Not all chaining guides were properly disabling masquerading. In chaining mode, the masquerade decision is delegated to the underlying plugin responsible for the networking. Enabling chaining leads to unnecessary iptables rules being installed. Signed-off-by: Thomas Graf <thomas@cilium.io> 29 November 2019, 15:15:07 UTC
5f58c1f doc: Fix AKS installation guide Transparent mode is no longer required and we can chain on any networking mode of the AKS CNI plugin. Signed-off-by: Thomas Graf <thomas@cilium.io> 29 November 2019, 14:43:48 UTC
4123e62 k8s: Fix Service.DeepEquals for ExternalIP Previously, the DeepEquals() method did not detect a change when an external IP had been added to a service which didn't have any externalIP assigned to it. This prevented such a service from being updated, therefore no externalIP services were provisioned for it. Fixes: a402a424b5 ("add externalIPs implementation in Cilium") Signed-off-by: Martynas Pumputis <m@lambda.lt> 29 November 2019, 09:31:01 UTC
809676c Deprecate/Delete support for monitor v1.0 socket The monitor v1.0 socket protocol has not been used since Cilium v1.2, which is well past support, so we should be able to remove all code associated with monitor v1.0 socket handling as per the recommendation. Fixes: #8955 Signed-off-by: Swaminathan Vasudevan <svasudevan@suse.com> 29 November 2019, 02:06:39 UTC
d868de1 Improve Getting Started guides by providing next-steps sections: * Hubble installation instructions * Next Steps that ask the user to enable DNS visibility * Enable HTTP/L7 Visibility next step * More tutorials Signed-off-by: Sergey Generalov <sergey@isovalent.com> 28 November 2019, 21:24:36 UTC
b4d32f1 Add CRD validation for ciliumnodes Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 28 November 2019, 21:19:34 UTC
8dc0b12 k8s: Fix typo in io.cilium/shared-service annotation The annotation was misspelled as `io.ciliumshared-service`. This adds the missing slash and test cases for the annotation parser. Since the annotation is not documented, no fallback to support the old spelling is introduced. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 28 November 2019, 17:44:51 UTC
eb2714e bpf, nodeport: fix CT GC race where syn/ack reply gets dropped locally While testing BPF nodeport with local backend under stress, I've noticed that once in every several thousand cases the SYN/ACK is 'stuck' and not being pushed back to the remote node. 'Odd' case seen: [...] <- stack flow 0x65f91604 identity 0->0 state new ifindex eno1 orig-ip 0.0.0.0: 192.168.1.120:58490 -> 192.168.1.125:30000 tcp SYN <- host flow 0x65f91604 identity 2->0 state new ifindex eno1 orig-ip 0.0.0.0: 192.168.1.120:58490 -> 10.11.14.171:80 tcp SYN -> endpoint 893 flow 0x65f91604 identity 2->16777217 state established ifindex lxc962bf03db31a orig-ip 192.168.1.120: 192.168.1.120:58490 -> 10.11.14.171:80 tcp SYN <- endpoint 893 flow 0xed432cf identity 16777217->0 state new ifindex 0 orig-ip 0.0.0.0: 10.11.14.171:80 -> 192.168.1.120:58490 tcp SYN, ACK -> stack flow 0xed432cf identity 16777217->2 state reply ifindex 0 orig-ip 0.0.0.0: 10.11.14.171:80 -> 192.168.1.120:58490 tcp SYN, ACK <---- BUGGY [...] 'Regular' case: [...] <- stack flow 0xe5afda01 identity 0->0 state new ifindex eno1 orig-ip 0.0.0.0: 192.168.1.120:58470 -> 192.168.1.125:30000 tcp SYN <- host flow 0xe5afda01 identity 2->0 state new ifindex eno1 orig-ip 0.0.0.0: 192.168.1.120:58470 -> 10.11.14.171:80 tcp SYN -> endpoint 893 flow 0xe5afda01 identity 2->16777217 state established ifindex lxc962bf03db31a orig-ip 192.168.1.120: 192.168.1.120:58470 -> 10.11.14.171:80 tcp SYN <- endpoint 893 flow 0xdb0f605a identity 16777217->0 state new ifindex 0 orig-ip 0.0.0.0: 10.11.14.171:80 -> 192.168.1.120:58470 tcp SYN, ACK <- stack flow 0xe5afda01 identity 0->0 state new ifindex eno1 orig-ip 0.0.0.0: 192.168.1.120:58470 -> 192.168.1.125:30000 tcp ACK <---- OK <- stack flow 0xe5afda01 identity 0->0 state new ifindex eno1 orig-ip 0.0.0.0: 192.168.1.120:58470 -> 192.168.1.125:30000 tcp ACK <- host flow 0xe5afda01 identity 2->0 state new ifindex eno1 orig-ip 0.0.0.0: 192.168.1.120:58470 -> 10.11.14.171:80 tcp ACK [...] 
In the 'odd' case, the SYN/ACK is wrongly being pushed back into the local stack where it is later dropped: [...] 0.63% 0.63% skbaddr=0xffff92ded3c8b700 protocol=2048 location=0xffffffff90c7915b | ---ret_from_fork kthread smpboot_thread_fn run_ksoftirqd __do_softirq net_rx_action process_backlog __netif_receive_skb __netif_receive_skb_one_core ip_rcv ip_rcv_finish ip_local_deliver ip_local_deliver_finish ip_protocol_deliver_rcu tcp_v4_rcv kfree_skb kfree_skb [...] After further debugging, it turns out that we have a CT GC race: for the local backend case we always create 2 separate CT entries in nodeport_lb*(), the second of which is for the purpose of reverse NAT for replies. The race is triggered when the GC runs in parallel and evicts the second CT entry before it hits the CT lookup in ipv4_policy(). The latter will then return CT_NEW and create a new entry, but without the nodeport status bit set. This causes us to miss the CILIUM_CALL_IPV*_NODEPORT_REVNAT tail call in handle_ipv4_from_lxc() and results in the mentioned push-to-stack behavior. Now, if the first entry had been evicted by the GC first, this would not have been an issue, since we would have recreated the second entry once again. As a fix, bump the lifetime of the second entry upon lookup of the first to keep it alive, or, if it was already evicted by the GC, recreate the second entry before the packet is pushed into the pod. Fixes: #9663 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 28 November 2019, 10:57:36 UTC
e0c4115 readme: update info latest stable branch releases Update stable section of README to reflect recent releases: https://github.com/cilium/cilium/releases/tag/v1.6.4 https://github.com/cilium/cilium/releases/tag/v1.5.10 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 28 November 2019, 10:00:44 UTC
f694b3b add support for k8s 1.17 Signed-off-by: André Martins <andre@cilium.io> 27 November 2019, 13:55:41 UTC
adbbcb7 fqdn: remove EP header file sync from MarkDNSCTEntry The issue is that SyncEndpointHeaderFile() takes the EP lock, and from other callsites, e.g. scrubIPsInConntrackTable(), we also take the EP lock. The latter happens before calling into CT GC, and the former while walking it from a doFiltering() callback, hence if both are the same EP we might run into a potential deadlock. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 27 November 2019, 09:13:06 UTC
c942051 cilium: lock GC walks for global CT maps to serialize deletions We've seen reports where many goroutines walk the CT map concurrently in order to erase data from it, namely (*Endpoint).scrubIPsInConntrackTableLocked() seems to be called from i) the REST API handler via (*PutEndpointID).ServeHTTP() -> deleteEndpointQuiet() -> LeaveLocked(), and ii) (*Endpoint).runPreCompilationSteps(). This is suboptimal as iterations through the BPF map could evict keys the other routines are currently holding, and therefore enforce restarts of the walk until we hit the stats.MaxEntries limit of the map w/o having finished the regular walk in DumpReliablyWithCallback(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> 27 November 2019, 09:13:06 UTC
b2b387a cosmetic: Improve identity / allocator logs The allocator log here was duplicated with one that follows two lines down; and one of the identity API calls was missing a debug log for the calls. Signed-off-by: Joe Stringer <joe@cilium.io> 26 November 2019, 22:13:50 UTC
82d394f cilium: encryption, add fib support for ipv6 When we added FIB support we only added it for IPv4. This adds support for IPv6 and fixes an issue where IPv6 packets were being dropped due to the fib lookup failing. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 26 November 2019, 20:05:41 UTC
4d4fed5 cilium: encryption, delete in xfrm rules if encryption toggled We do not currently remove the 'dir in' encryption rules when encryption is disabled. The original thinking was that we could simply leave these xfrm policy/state entries around, because they would only ever be triggered if the datapath marked packets for encryption, which it wouldn't because encryption was disabled. Then if encryption was ever (re)enabled the rules would already be there. But there is a subtle detail: if the cilium_host IP changes after encryption is disabled and it is then (re)enabled with a new IP, the 'dir in' state/policy in the xfrm table will no longer be valid. And if the new cilium_host IP is in the same CIDR we can end up with a collision in the table and possibly use old, out-of-date rules. Fix by removing any 'dir in' rules when encryption is disabled. Signed-off-by: John Fastabend <john.fastabend@gmail.com> 26 November 2019, 20:05:41 UTC
d15b048 docs: add policy visibility status documentation Signed-off-by: André Martins <andre@cilium.io> 26 November 2019, 19:51:53 UTC
0d11dfc [CI] allow to specify focus via GH comment Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 26 November 2019, 18:56:34 UTC
c1a09c3 [CI] fix /var/log/journal mount in log gatherer Signed-off-by: Maciej Kwiek <maciej@isovalent.com> 26 November 2019, 18:56:16 UTC
66cccbb test: Keep box in test/.vagrant/ Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 26 November 2019, 12:35:33 UTC
2086971 test: Support variable number of workers locally. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 26 November 2019, 12:35:33 UTC
f62db98 test: Add `vagrant-local-start.sh` with preloaded VM support Add a new script `test/vagrant-local-start.sh` that can be run in a local development machine with minimal requirements. Make image pulling more efficient by partially provisioning a new VM image that is then duplicated for each k8s node. This way the dependencies never need to be pulled more than once. The local box file stored in `/tmp/` must be manually deleted for it to be re-built on the next invocation of `vagrant-local-start.sh`. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 26 November 2019, 12:35:33 UTC
347139c CI: Only install helm if needed. Skip helm install if the required version of helm is already installed. This helps streamline local testing where the same VMs are being re-used for multiple test runs. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 26 November 2019, 12:35:33 UTC
c8e338f .github: Make it more clear that RN must be removed The release-notes section must either exist with a valid release note, or be removed (so that the relnotes script pulls the PR title). If it exists with the dummy text, then the relnotes script will populate the release notes with this nonsense text instead of the PR title. Signed-off-by: Joe Stringer <joe@cilium.io> 26 November 2019, 12:33:50 UTC
1cd7bb6 docs: Describe how to read from tracing pipe Signed-off-by: Joe Stringer <joe@cilium.io> 26 November 2019, 12:33:24 UTC
509d96e test: Support external DNS lookups also for k8s-1.16 Allow coredns to proxy DNS lookups to external domains also in k8s 1.16. Signed-off-by: Jarno Rajahalme <jarno@covalent.io> 26 November 2019, 12:30:47 UTC
38ca1f8 helm: Fix bug to disable health-checks in chaining mode Endpoint healthchecks fail in these modes and should be disabled. The helm charts accounted for this but it was incorrectly triggered previously. fixes [bdb51a0fb0e1a139eb9a2dfbd024fb3ed1e717d7] Signed-off-by: Ray Bejjani <ray@isovalent.com> 26 November 2019, 09:54:09 UTC
ae878fa pkg/endpoint: delete _next directories during restore This patch detects and deletes, during endpoint restoration, endpoint directories that match `${EPID}_next` or `${EPID}_next_fail` and for which there already exists an endpoint directory `${EPID}`. The idea is to consider such a directory stale (e.g. the process was terminated while regenerating endpoint `${EPID}`), in which case there is no need to attempt to restore an endpoint from it. Fixes #9600 Signed-off-by: ifeanyi <ify1992@yahoo.com> 25 November 2019, 17:00:14 UTC
ae6683a Allow setting timeout on status command We use cilium status as a liveness check, and sometimes (due to other issues) the daemon can fail to respond to the health endpoint within a reasonable amount of time. If this is larger than the liveness timeout of 5 seconds then this can cause the liveness check to time out. In these scenarios, for exec liveness probes, the kubelet does not treat timeouts as failure. (It does do that for http/tcp liveness probes) kubernetes/kubernetes#82987 has more context on the k8s side. Signed-off-by: AshrayJain <ashrayj11@gmail.com> 25 November 2019, 16:58:04 UTC
1aa296a Update status.go 25 November 2019, 16:58:04 UTC
2e1b5aa Allow setting timeout on status command We use cilium status as a liveness check, and sometimes (due to other issues) the daemon can fail to respond to the health endpoint within a reasonable amount of time. If this is larger than the liveness timeout of 5 seconds then this can cause the liveness check to time out. In these scenarios, for exec liveness probes, the kubelet does not treat timeouts as failure. (It does do that for http/tcp liveness probes) kubernetes/kubernetes#82987 has more context on the k8s side. Signed-off-by: AshrayJain <ashrayj11@gmail.com> 25 November 2019, 16:58:04 UTC
2556f27 docs: Explain Alpine/musl Refused/NameError fix Alpine based images, of which there are many, treat the Refused response for the first search-list lookup as a full failure. NXDomain responses are fine, however. We have an option to select this agent-wide but have never really documented it. Signed-off-by: Ray Bejjani <ray@isovalent.com> 25 November 2019, 14:50:47 UTC
ebc19e3 docs: Explain DNS zombies for short/long-lived connections The behaviour around this is different in 1.7 and the advice is no longer valid. We can now handle mixes of connection types from an endpoint and will never evict an in-use IP. Signed-off-by: Ray Bejjani <ray@isovalent.com> 25 November 2019, 14:50:47 UTC
226e606 daemon: Reduce default toFQDNs min TTL values These previously also had to account for connection lifetimes to the IPs in the DNS responses. We now defer deleting IPs until CT GC marks them unused so these times can be reduced considerably. Signed-off-by: Ray Bejjani <ray@isovalent.com> 25 November 2019, 14:50:47 UTC
36579a6 fqdn: Remove zombies when a new DNS lookup has the same IPs To avoid duplicating zombies with live DNS cache entries, when we insert a new DNS lookup into the cache, we also clear any zombies with the same name -> IP mapping. Signed-off-by: Ray Bejjani <ray@isovalent.com> 25 November 2019, 14:48:14 UTC
5a36f9c fqdn: Correctly GC endpoint DNS cache on restore The older code accidentally GCed the global cache instead of the endpoint specific cache. This also caused the GC zombie cascade to happen incorrectly, as the global cache is already empty. fixes 64498e8fdd1e3630323f58a7c047a2fdf3fbf480 Signed-off-by: Ray Bejjani <ray@isovalent.com> 25 November 2019, 14:48:14 UTC
4b31875 vendor: Add cilium/ebpf dependency Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2019, 11:42:28 UTC
6cd71e9 daemon: Disable CGO by default Now that the perf reader implementation is written in pure Go, we can disable CGO for the main binary, saving 9% of the main binary size: Prior to this commit: $ ll daemon/cilium-agent -rwxrwxr-x 1 joe joe 100M Nov 5 10:05 daemon/cilium-agent After this commit: $ ll daemon/cilium-agent -rwxrwxr-x 1 joe joe 91M Nov 5 10:07 daemon/cilium-agent Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2019, 11:42:28 UTC
ae53f9c bpf: Delete perf event reader implementation Now that we're based against the golang perf event reader, we don't need this code any more. Delete it! Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2019, 11:42:28 UTC
ae0fd5b tools: Remove ring-dump tool The ring-dump tool was useful in the past to debug issues with dumps from the perf ring buffer, but now that we're switching the bpf implementation over to a more generic library I don't see the benefit to rebasing / maintaining the tool. If necessary we can always pull this back up from the v1.6 branch. Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2019, 11:42:28 UTC
282bd14 signal: Rebase against pure Go perf reader Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2019, 11:42:28 UTC
36d5cb3 monitor: Rebase against pure Go perf reader Rework the monitor to run against the new pure golang perf reader provided by github.com/cilium/ebpf/perf. Signed-off-by: Joe Stringer <joe@cilium.io> 25 November 2019, 11:42:28 UTC
9975bba pkg/endpoint: fetch pod and namespace labels from local stores This commit reduces the number of k8s apiserver calls by switching to a local store fetch logic for Pods and Namespaces. Signed-off-by: André Martins <andre@cilium.io> 22 November 2019, 17:50:19 UTC
2040431 k8s/watchers: expose Namespace store access To avoid performing more API calls to kube-apiserver we can re-use the Namespaces already stored in the k8s watcher. For this we simply need to expose a function call that can access the internal Namespace store. Signed-off-by: André Martins <andre@cilium.io> 22 November 2019, 17:50:19 UTC
d2011d4 pkg/k8s: add more fields to slimmer Pod type Since k8s apiserver calls would return the full Pod structure, switching to the k8s watcher store will need those missing fields as well so that we can switch the k8s apiserver direct calls with the local store calls. Signed-off-by: André Martins <andre@cilium.io> 22 November 2019, 17:50:19 UTC
acbf880 pkg/endpoint: decrease direct interactions with k8s-apiserver This commit reduces the amount of direct interactions with k8s-apiserver by reusing the k8s watcher stores that are already listening for k8s events. This also fixes a non-trivial bug with the concurrency across the update of the visibility policy. Since this commit stops fetching the most up-to-date annotations from the k8s api-server, we need to use the endpoint's event queue to make sure Pod events and endpoint restoration are processed in order. Otherwise, Cilium could end up with a visibility policy set for a pod that had already removed the policy visibility annotation from its metadata. Signed-off-by: André Martins <andre@cilium.io> 22 November 2019, 17:50:19 UTC
ff75c71 k8s/watchers: expose Pod store access To avoid performing more API calls to kube-apiserver we can re-use the Pods already stored in the k8s watcher. For this we simply need to expose a function call that can access the internal Pod store. Signed-off-by: André Martins <andre@cilium.io> 22 November 2019, 17:50:19 UTC
2525318 pkg/endpoint: add channel that signals when endpoint is exposed It will be useful for the follow-up commits to know when an endpoint is exposed in the endpoint manager. Signed-off-by: André Martins <andre@cilium.io> 22 November 2019, 17:50:19 UTC
7201638 make: Remove GOPATH from swagger command The Cilium repository uses go.mod, so GOPATH is no longer required. Running `make generate-api` when Cilium is cloned into a regular directory and GOPATH is not set resulted in the following error: ``` 2019/11/18 16:03:08 trying to read config from /home/mrostecki/repos/cilium/api/v1/cilium-server.yml 2019/11/18 16:03:08 validating spec /home/mrostecki/repos/cilium/api/v1/openapi.yaml 2019/11/18 16:03:10 preprocessing spec with option: minimal flattening 2019/11/18 16:03:10 building a plan for generation 2019/11/18 16:03:10 lstat /root/go: no such file or directory make: [Makefile:259: generate-api] Error 1 (ignored) sudo podman run --rm -v /home/mrostecki/repos/cilium:/home/mrostecki/repos/cilium -w /home/mrostecki/repos/cilium -e GOPATH= --entrypoint swagger quay.io/goswagger/swagger:v0.20.1 generate client -a restapi \ -t api/v1 -f api/v1/openapi.yaml 2019/11/18 16:03:12 validating spec api/v1/openapi.yaml 2019/11/18 16:03:14 preprocessing spec with option: minimal flattening 2019/11/18 16:03:14 building a plan for generation 2019/11/18 16:03:14 lstat /root/go: no such file or directory make: [Makefile:260: generate-api] Error 1 (ignored) ``` Signed-off-by: Michal Rostecki <mrostecki@opensuse.org> 21 November 2019, 16:42:48 UTC
b36385b cilium: fix disconnects on operator restarts when using ipsec As reported by Laurent: "We noticed that on container restarts established TCP connections were disconnected (under load). After some investigation we noticed these events on the agent for all nodes: - Received key update via kvstore with invalid EncryptionKey:0 - a few minutes later Received key update via kvstore with valid EncryptionKey:x After looking into the code, it seems that on operator restart the node Informer will add all nodes to the kvstore based on the k8s-node resource which does not have the EncryptionKey info and will set it to 0. Watches on the agents will then update the ipsec configuration and create the issue. After a while, the agents update their kvstore resource to the good value and traffic can flow again (I haven't checked but I assume there is a reconcile loop in the agents)." Andre suggested only updating the EncryptionState from NodeUpdated if a valid key is provided. It is not valid to both have encryption enabled and specify the null key, for example. However, for rolling updates disabling encryption we may have a state where encryption is enabled locally but the remote node is disabling encryption, signaled by sending a null encryption key. So instead of checking in NodeUpdated, let's push the key into a K8s annotation; then we can read it out correctly when an event is received and push it down the stack to configure the datapath correctly. This adds the new K8s annotation CiliumEncryptionKey, ".network.encryption-key" Fixes: 500fb2b5cd3e6 ("node: Discover other nodes based on CiliumNode custom resource") Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com> Suggested-by: Andre Martins <andre@cilium.io> Signed-off-by: John Fastabend <john.fastabend@gmail.com> 21 November 2019, 16:41:24 UTC
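The key-handling rule from the commit above — a missing or zero (null) key index means "encryption disabled" and must never be applied as a valid key — can be sketched as below. The annotation name, function, and types here are illustrative only; the commit message shows just a suffix of the real annotation, so we deliberately use a placeholder rather than guess it.

```go
package main

import (
	"fmt"
	"strconv"
)

// annotationEncryptionKey is a placeholder annotation name for this
// sketch; it is NOT Cilium's actual annotation.
const annotationEncryptionKey = "example.io/encryption-key"

// encryptionKeyFromAnnotations parses an IPsec key index from node
// annotations. A missing, malformed, or zero value reports "no valid
// key": treating the zero key as valid is exactly the bug that caused
// agents to reconfigure ipsec on operator restart.
func encryptionKeyFromAnnotations(annotations map[string]string) (uint8, bool) {
	v, ok := annotations[annotationEncryptionKey]
	if !ok {
		return 0, false
	}
	key, err := strconv.ParseUint(v, 10, 8)
	if err != nil || key == 0 {
		return 0, false
	}
	return uint8(key), true
}

func main() {
	node := map[string]string{annotationEncryptionKey: "7"}
	if key, ok := encryptionKeyFromAnnotations(node); ok {
		fmt.Println("configuring datapath with key index", key)
	}
	if _, ok := encryptionKeyFromAnnotations(map[string]string{}); !ok {
		fmt.Println("no key annotation: not applying an encryption key")
	}
}
```

Note the rolling-update subtlety from the commit: a null key can still legitimately *signal* that the remote node is disabling encryption, so the caller decides what a `false` result means; this helper only refuses to hand back zero as a usable key.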
b5c1cbd Add missing words to spelling_wordlist Signed-off-by: Vlad Ungureanu <vladu@palantir.com> 21 November 2019, 02:23:50 UTC
34815d7 README: Add link to Hubble Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> 20 November 2019, 17:21:05 UTC
ca7f685 Updating AUTHORS Signed-off-by: André Martins <andre@cilium.io> 19 November 2019, 16:01:37 UTC
f8ca563 nodeport: Clarify externalIP limitations Clarify the limitations and differences from kube-proxy externalIP handling per the current implementation. Signed-off-by: Joe Stringer <joe@cilium.io> 19 November 2019, 15:38:12 UTC
54f29c8 docs: Document upgrade step necessary for labels change For more information, refer to PR #9519. Add-on manager likes to delete DaemonSets with this label which causes havoc in some environments. We've removed the label to work around this, but it requires an additional step from users. Signed-off-by: Joe Stringer <joe@cilium.io> 19 November 2019, 15:38:12 UTC