Revision 319fa31794acfd05c8419355888e358dbdf8a58a authored by Steven Johnson on 09 June 2023, 02:42:13 UTC, committed by Sebastian Wicki on 13 June 2023, 19:22:04 UTC
[ upstream commit 25064d1ec51895ab89e2f736fcf7c6c66dfb5551 ]

After applying a backport of 9cc8a89f9 ("ipsec: Fix leak of XFRM
policies with ENI and Azure IPAMs") to 1.11.16, I noticed that we were
getting occasional spikes of "no inbound state" xfrm errors
(XfrmInNoStates). These lead to packet loss and brief outages for
applications sending traffic to the node on which the spikes occur.

I noticed that the "No node ID found for node." logline would appear at
the time of these spikes and from the code this is logged when the node
ID cannot be resolved. Looking a bit further the call to
`DeleteIPsecEndpoint` will end up deleting the xfrm state for any state
that matches the node id as derived from the mark in the state.

The problem seems to be that the inbound state for 0.0.0.0/0 -> node IP
has a mark of `0xd00` which when shifted >> 16 in
`getNodeIDFromXfrmMark` matches nodeID 0 and so the inbound state gets
deleted and the kernel drops all the inbound traffic as it no longer
matches a state.

This commit updates that logic to skip the XFRM state and policy
deletion when the node ID is zero.

Fixes: 9cc8a89f9 ("ipsec: Fix leak of XFRM policies with ENI and Azure IPAMs")
Signed-off-by: Steven Johnson <sjdot@protonmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
1 parent 00bb13b
Raw File
stable.txt
v1.10.5
back to top