https://github.com/cilium/cilium
Revision 3516e3ca15956df81ee51f1a184605d94bdafa58 authored by Daniel Borkmann on 21 June 2021, 11:54:59 UTC, committed by Ilya Dmitrichenko on 28 June 2021, 12:42:47 UTC
[ upstream commit 27122d4d666be42b564a06200c32647ca3c73405 ]

Example trace seen in dmesg:

  [...]
  [ 7710.165608] enp10s0f0np0: hw csum failure
  [ 7710.165621] skb len=84 headroom=78 headlen=84 tailroom=30
                 mac=(64,14) net=(78,20) trans=98
                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
                 csum(0x0 ip_summed=2 complete_sw=0 valid=0 level=0)
                 hash(0x14006e3a sw=0 l4=0) proto=0x0800 pkttype=0 iif=4
  [ 7710.165631] dev name=enp10s0f0np0 feat=0x0x0032b18217514ba9
  [ 7710.165635] skb headroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165638] skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165641] skb headroom: 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165644] skb headroom: 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165646] skb headroom: 00000040: b8 ce f6 05 e7 62 b8 ce f6 05 e7 76 08 00
  [ 7710.165649] skb linear:   00000000: 45 00 00 54 8a 07 00 00 40 01 84 e8 c0 a8 a0 04
  [ 7710.165652] skb linear:   00000010: 0a 9a 00 73 00 00 23 57 00 f8 15 db cd 74 d0 60
  [ 7710.165654] skb linear:   00000020: 00 00 00 00 5c 2d 0d 00 00 00 00 00 10 11 12 13
  [ 7710.165657] skb linear:   00000030: 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23
  [ 7710.165660] skb linear:   00000040: 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33
  [ 7710.165663] skb linear:   00000050: 34 35 36 37
  [ 7710.165665] skb tailroom: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165668] skb tailroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  [ 7710.165672] CPU: 26 PID: 0 Comm: swapper/26 Not tainted 5.13.0-rc3+ #174
  [ 7710.165674] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F22 08/20/2020
  [ 7710.165676] Call Trace:
  [ 7710.165677]  <IRQ>
  [ 7710.165680]  dump_stack+0x7d/0x9c
  [ 7710.165683]  netdev_rx_csum_fault.part.0+0x41/0x45
  [ 7710.165686]  netdev_rx_csum_fault.cold+0xb/0x10
  [ 7710.165687]  __skb_checksum_complete+0xdd/0xf0
  [ 7710.165690]  ? skb_send_sock_locked+0x20/0x20
  [ 7710.165692]  ? reqsk_fastopen_remove+0x190/0x190
  [ 7710.165693]  nf_ip_checksum+0x5b/0x120
  [ 7710.165697]  nf_conntrack_icmpv4_error+0x112/0x160 [nf_conntrack]
  [ 7710.165706]  nf_conntrack_in.cold+0x1d/0x74 [nf_conntrack]
  [ 7710.165714]  ? nft_do_chain_inet_ingress+0x280/0x2e0 [nf_tables]
  [ 7710.165722]  ipv4_conntrack_in+0x14/0x20 [nf_conntrack]
  [ 7710.165731]  nf_hook_slow+0x44/0xb0
  [ 7710.165733]  nf_hook_slow_list+0x71/0xf0
  [ 7710.165735]  ip_sublist_rcv+0x1d1/0x1f0
  [ 7710.165737]  ? ip_sublist_rcv+0x1f0/0x1f0
  [ 7710.165739]  ip_list_rcv+0xf5/0x120
  [ 7710.165741]  __netif_receive_skb_list_core+0x228/0x250
  [ 7710.165745]  netif_receive_skb_list_internal+0x1a1/0x2b0
  [ 7710.165747]  napi_complete_done+0x7a/0x1b0
  [ 7710.165749]  mlx5e_napi_poll+0x16e/0x730 [mlx5_core]
  [ 7710.165795]  __napi_poll+0x31/0x170
  [ 7710.165796]  net_rx_action+0x22f/0x280
  [ 7710.165798]  __do_softirq+0xce/0x281
  [ 7710.165800]  irq_exit_rcu+0xa2/0xd0
  [ 7710.165803]  common_interrupt+0x8d/0xa0
  [ 7710.165805]  </IRQ>
  [ 7710.165806]  asm_common_interrupt+0x1e/0x40
  [ 7710.165808] RIP: 0010:cpuidle_enter_state+0xcc/0x360
  [...]

The trace was only reproducible with NICs using CHECKSUM_COMPLETE as
csum type for inbound packets. It has been observed with mlx5, for
example. The hw csum failure was only reproducible under the following
conditions:

 - Protocol is ICMP, e.g. triggered by Cilium health probe packets
 - Pod from one node was pinging a remote node address
 - BPF based masquerading was used to SNAT Pod IP to node IP
 - BPF NAT engine found a collision in the NAT table such that
   it was forced to select a different ICMP id, and hence caused
   L4 rewrites

In the case of ICMPv4, the bug was that BPF_F_PSEUDO_HDR was used when
updating the L4 checksum. However, ICMPv4 has no pseudo header; only
ICMPv6 does. The checksum in the packet itself was correct either way,
but the flag caused skb->csum to end up with a wrong value. Setting the
flag to 0 for ICMPv4 stopped the hw csum failure traces.
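
The checksum reasoning above can be illustrated with a small userspace
sketch. It is not the Cilium or kernel code: all helper names
(ones_sum, csum_update, icmp_set_id, extra_delta) are hypothetical, and
extra_delta is only a simplified model of a stored packet sum receiving
an adjustment it does not need. The sketch rewrites the ICMP echo
identifier, fixes the in-packet checksum incrementally (RFC 1624), and
shows that the one's-complement sum over the whole ICMP message is
unchanged afterwards, so a previously computed full sum needs no
further correction.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

static uint16_t get16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

static void put16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* One's-complement sum over big-endian 16-bit words (len must be even). */
static uint16_t ones_sum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += get16(buf + i);
	return fold(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_update(uint16_t hc, uint16_t old_val, uint16_t new_val)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~old_val;
	sum += new_val;
	return (uint16_t)~fold(sum);
}

/* An ICMPv4 checksum covers only the ICMP message, no pseudo header:
 * a valid message sums to 0xffff on its own. */
static int icmp_csum_ok(const uint8_t *msg, size_t len)
{
	return ones_sum(msg, len) == 0xffff;
}

/* Rewrite the echo identifier (offset 4) and patch the checksum
 * (offset 2) incrementally. */
static void icmp_set_id(uint8_t *msg, uint16_t new_id)
{
	uint16_t old_id = get16(msg + 4);

	put16(msg + 2, csum_update(get16(msg + 2), old_id, new_id));
	put16(msg + 4, new_id);
}

/* Simplified model (assumption): apply an old -> new delta to a stored
 * full-message sum, as if the changed field lay outside the summed data. */
static uint16_t extra_delta(uint16_t stored, uint16_t old_val, uint16_t new_val)
{
	return fold((uint32_t)stored + (uint16_t)~old_val + new_val);
}
```

Calling icmp_set_id() on a valid echo request leaves icmp_csum_ok()
true and ones_sum() over the message unchanged, while passing that same
sum through extra_delta() yields a different value, which is the kind
of mismatch a CHECKSUM_COMPLETE validation would report.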

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Kornilios Kourtis <kornilios@isovalent.com>
Signed-off-by: Aditi Ghag <aditi@cilium.io>
1 parent 45ee508
bpf: fix hw_csum issue for icmp probe packets