Wednesday, November 16, 2022

Egress capture using TPROXY

Very low-level notes on intercepting traffic for Istio and similar apps. iptables provides two mechanisms for capture: REDIRECT and TPROXY. The first is buggy and not recommended by the kernel docs. TPROXY unfortunately requires NET_ADMIN (or root) and is only available in the `PREROUTING` chain, i.e. it can only be used for packets received on an interface, not for packets sent by local apps (`OUTPUT`).

I've been playing with this for some time, and this is what I've found:

  1. Use an OUTPUT chain to mark packets - just like we do for REDIRECT interception
  2. Use a routing table with 'dev lo' to route all marked packets to loopback. 
  3. Apply TPROXY capture on the loopback PREROUTING - if the dest IP is not

It looks like this:

  # Anything with the mark 15001 will be sent to loopback
  ip -4 rule add fwmark 15001 lookup 15001
  ip -4 route add local default dev lo table 15001

  # Calling this chain will set the mark resulting in route to lo
  iptables -t mangle -N ZT_CAPTURE_EGRESS
  iptables -t mangle -A ZT_CAPTURE_EGRESS -j MARK --set-mark 15001

  # PREROUTING on loopback - anything routed by the route table 15001, based on OUTPUT mark
  # Ignore local source or dst - it's not egress
  iptables -t mangle -N ZT_TPROXY
  iptables -t mangle -A ZT_TPROXY -d -j RETURN
  iptables -t mangle -A ZT_TPROXY -d -j RETURN
  iptables -t mangle -A ZT_TPROXY --match mark --mark 15001 -p tcp  -j TPROXY --tproxy-mark 15001/0xffffffff --on-port 15001
  iptables -t mangle -A PREROUTING -i lo -j ZT_TPROXY

  # Table that determines who gets redirected
  iptables -t mangle -N ZT_EGRESS
  iptables -t mangle -A OUTPUT  -j ZT_EGRESS
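
While experimenting it helps to be able to undo all of this. Below is a minimal teardown sketch, not something from the original setup: the `run` and `teardown_capture` helpers and the `DRY_RUN` variable are hypothetical names, and the commands simply mirror the `add`/`-N`/`-A` calls above with their `del`/`-X`/`-D` counterparts.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and DRY_RUN are assumptions, not part of the setup above.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@" || true   # keep going if a rule or chain is already gone
  fi
}

teardown_capture() {
  # Detach the custom chains from the built-in chains first.
  run iptables -t mangle -D OUTPUT -j ZT_EGRESS
  run iptables -t mangle -D PREROUTING -i lo -j ZT_TPROXY

  # Flush, then delete, each custom chain. ZT_EGRESS goes first because
  # it may hold a jump to ZT_CAPTURE_EGRESS.
  for chain in ZT_EGRESS ZT_TPROXY ZT_CAPTURE_EGRESS; do
    run iptables -t mangle -F "$chain"
    run iptables -t mangle -X "$chain"
  done

  # Remove the policy route and the fwmark rule.
  run ip -4 route del local default dev lo table 15001
  run ip -4 rule del fwmark 15001 lookup 15001
}
```

Running `DRY_RUN=1 teardown_capture` prints the commands without touching the host; without `DRY_RUN` it needs root or NET_ADMIN, like the setup itself.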

The OUTPUT chain is similar to regular Istio:

  # Exclude a few ports that should not be captured
  iptables -t mangle -A ZT_EGRESS  -p tcp --dport 15001 -j RETURN
  iptables -t mangle -A ZT_EGRESS  -p tcp --dport 15009 -j RETURN
  iptables -t mangle -A ZT_EGRESS  -p tcp --dport 15008 -j RETURN

  # UID or GID of the app capturing - so it can originate egress without 
  # getting captured again.
  # Best is to use GID - so root user is also captured. However when debugging
  # in an IDE like CLion/Golang it is very easy to set 'run as root' but not
  # 'using group id' - so using uid-owner.
  iptables -t mangle -A ZT_EGRESS -m owner --uid-owner 0 -j RETURN

  # For now capture only the private range; this can be changed
  # to capture everything.
  iptables -t mangle -A ZT_EGRESS -d -j ZT_CAPTURE_EGRESS
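
The comments above prefer matching on GID so that root-owned traffic is captured too. A sketch of that variant, assuming the proxy runs under a dedicated group: the `PROXY_GID` variable and the value 1337 are placeholders, not values from this setup. The rule would take the place of the `--uid-owner 0` line, so it still sits before the capture rule.

```shell
# Assumption: the capturing proxy runs with this group ID (placeholder value).
PROXY_GID=1337

# Skip capture for traffic originated by the proxy's group, so its own
# egress connections are not looped back into it.
iptables -t mangle -A ZT_EGRESS -m owner --gid-owner "$PROXY_GID" -j RETURN
```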

This works for sidecars and avoids the problems with REDIRECT; however, it does require the sidecar to run with the NET_ADMIN capability, which is not always possible. For Ambient Istio (ZTunnel) it may not be needed, since eBPF or veth can be used instead.
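
Since everything here depends on NET_ADMIN, a quick check before applying the rules can save some confusion. This is a rough sketch with hypothetical helper names (`has_net_admin`, `effective_caps`); it relies on CAP_NET_ADMIN being capability number 12 and on the `CapEff` mask in `/proc/self/status`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are assumptions for illustration.

# CAP_NET_ADMIN is capability number 12 in linux/capability.h.
CAP_NET_ADMIN_BIT=12

# has_net_admin HEXMASK: succeeds if bit 12 is set in the capability mask.
has_net_admin() {
  local mask="$1"
  (( (0x$mask >> CAP_NET_ADMIN_BIT) & 1 ))
}

# effective_caps: print the effective capability mask of a process
# started from this shell (Linux only).
effective_caps() {
  awk '/^CapEff:/ {print $2}' /proc/self/status
}
```

Usage would be something like `has_net_admin "$(effective_caps)" || echo "no NET_ADMIN, TPROXY setup will fail"`.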

The other major benefit of TPROXY is that it also allows UDP capture: the 'original DST' lookup used with REDIRECT does not work for UDP. I did a bit of testing with UDP and IPv6, and all seems to be working.
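
For reference, a sketch of what the UDP and IPv6 additions might look like. These lines are an assumption that mirrors the IPv4/TCP rules above, not the exact commands I tested; the IPv6 side would also need the same chains created with `ip6tables`.

```shell
# IPv6: same policy routing as the IPv4 case, using the -6 family.
ip -6 rule add fwmark 15001 lookup 15001
ip -6 route add local ::/0 dev lo table 15001

# UDP on IPv4: a TPROXY rule next to the TCP one in ZT_TPROXY.
iptables -t mangle -A ZT_TPROXY --match mark --mark 15001 -p udp -j TPROXY --tproxy-mark 15001/0xffffffff --on-port 15001
```

On the receiving side the listening socket needs `IP_TRANSPARENT`, and for UDP the original destination comes from the `IP_RECVORIGDSTADDR` ancillary data rather than from the socket address.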
