PF or "traceroute -e -P TCP" bug?

Mon Aug 21 20:03:29 UTC 2006

On Mon, Aug 21, 2006 at 11:23:50AM +0200, Daniel Hartmeier wrote:
> [ I'm CC'ing Crist, maybe he can explain why -e behaves like it does ]
> 
> On Fri, Aug 18, 2006 at 11:57:56PM +0300, Rostislav Krasny wrote:
> 
> > I've tried the new "-e" traceroute option on today's RELENG_6 and
> > found following problem:
> > 
> > > traceroute -nq 1 -e -P TCP -p 80 216.136.204.117
> 
> As I understand the -e option, that should send a sequence of TCP SYNs
> with
> 
>   - constant source port (randomly picked per invocation)

It's actually a trivial encoding of the traceroute process ID so
that two traceroute programs running simultaneously do not
clobber each other. This detail turns out to be important, though.
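A minimal sketch of such a PID encoding follows. It assumes the traditional traceroute formula `(getpid() & 0xffff) | 0x8000` for the probe identifier; treat that as an assumption, not a quote of FreeBSD's traceroute.c. The function names are hypothetical.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: derive a per-process identifier from the PID.
 * Assumption: keep the low 16 bits of the PID and force the top bit on,
 * so the value always lands in 32768..65535. */
uint16_t probe_ident_from_pid(pid_t pid)
{
    return (uint16_t)(((uint32_t)pid & 0xffffu) | 0x8000u);
}

/* With -e, every probe of one invocation would reuse this value as its
 * TCP source port, so concurrent traceroutes usually differ. */
uint16_t current_probe_sport(void)
{
    return probe_ident_from_pid(getpid());
}
```

"Usually" because PIDs that differ only in bit 15, such as 1 and 32769, map to the same port (0x8001).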

>   - constant destination port 80

Yes, the whole point of "-e."

>   - increasing TTL per probe

Yes, the basic kludge that makes traceroute work.

Here is the basic explanation behind the changes:

	http://docs.freebsd.org/cgi/getmsg.cgi?fetch=414378+0+archive/2005/freebsd-net/20050925.freebsd-net

[snip]
> What you changed in your patch is switching to a sequential (instead of
> constant) source port. This forces creation of one state per probe,
> treating each probe as a separate connection. I don't think that's in
> the spirit of the -e option. There's really no need for that, once the
> underlying problem is fixed.

Creating multiple state entries in a firewall really has no consequence
as far as the operation of the "-e" option goes. It doesn't have any
effect on the three essential characteristics of the probe that you listed
above.

> So, why doesn't -e without your patch produce probes that all match a
> single state entry?
> 
> Look at how the TCP sequence numbers are generated across the probes:
> 
> 	tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
> 	    (fixedPort ? outdata->seq : 0));
> 
> This is the problem. traceroute increments the sequence number with each
> probe. I don't know why that is done. Why not use the same th_seq for
> all probes, like an ISN (initial sequence number) would be re-used in
> retransmissions in a real TCP handshake?

'Cause I needed to include that traceroute sequence number somewhere
since it wasn't in the destination port any more.
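To make the effect concrete, here is the quoted th_seq expression as a standalone function. It is a sketch: `sport`, `dport` and `probe_seq` stand in for `tcp->th_sport`, `tcp->th_dport` and `outdata->seq`, byte order is ignored, and the function name is hypothetical.

```c
#include <stdint.h>

/* Sketch of the current th_seq expression. The shift is done in uint32_t
 * because shifting a 16-bit port >= 0x8000 left by 16 in plain int
 * arithmetic would overflow. */
uint32_t isn_current(uint16_t sport, uint16_t dport, int fixed_port,
                     uint32_t probe_seq)
{
    /* With -e (fixed_port != 0) the probe number is added to the low
     * 16 bits; without -e the caller's dport already varies per probe. */
    return ((uint32_t)sport << 16) |
           ((uint32_t)dport + (fixed_port ? probe_seq : 0u));
}
```

With -e, sport 0x8001 and dport 80, probes 1 and 2 get ISNs 0x80010051 and 0x80010052 on the same address/port tuple, which a stateful filter sees as conflicting SYNs. Without -e the ISN changes too, but so does dport, so each probe is a different tuple.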

> If you create state on the first TCP SYN pf sees, pf will note the ISN
> from the traceroute side. When pf sees further SYNs from that side, it
> will deal with them like with any client retransmitting the SYN of the
> handshake (before the peer replies with a SYN+ACK, giving its side's
> ISN). Subsequent TCP SYNs with different ISN matching the address/port
> pairs will be blocked by pf.

That may be a little strict on the part of pf. One has to balance the
"liberal in what you accept" versus being overly strict in security
software. But it would be difficult to come up with a legitimate reason
for a host to send SYNs with differing ISNs to the same
source-IP-source-port-destination-IP-destination-port-tuple on any
timescale less than the MSL.
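The pf behaviour described above can be illustrated with a deliberately simplified model. This is not pf's code: it keeps one entry per address/port 4-tuple, remembers the ISN of the first SYN, passes retransmissions with the same ISN and blocks SYNs with a different one. All names and the fixed-size table are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified state model, NOT pf's implementation. */
struct tuple {
    uint32_t src, dst;       /* IPv4 addresses */
    uint16_t sport, dport;   /* TCP ports */
};

struct state {
    struct tuple key;
    uint32_t isn;            /* ISN noted from the first SYN */
    bool used;
};

#define MAX_STATES 64
static struct state table[MAX_STATES];

static bool same_tuple(const struct tuple *a, const struct tuple *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Returns true if the SYN passes, false if the model blocks it. */
bool model_syn(const struct tuple *t, uint32_t seq)
{
    size_t free_slot = MAX_STATES;

    for (size_t i = 0; i < MAX_STATES; i++) {
        if (!table[i].used) {
            if (free_slot == MAX_STATES)
                free_slot = i;
            continue;
        }
        if (same_tuple(&table[i].key, t)) {
            /* Retransmitted SYN (same ISN) passes; a new ISN is blocked. */
            return table[i].isn == seq;
        }
    }
    if (free_slot == MAX_STATES)
        return false;        /* table full: treat as blocked */
    table[free_slot] = (struct state){ .key = *t, .isn = seq, .used = true };
    return true;             /* first SYN creates the state */
}
```

Feeding it ISNs 0x80010051 and then 0x80010052 on the same tuple passes the first SYN and blocks the second; a probe with a different source port gets its own entry and passes.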

> If this happens on the IP forwarding path (i.e. pf blocks the packet
> outgoing), the stack produces the ICMP host unreachable error that shows
> up as "!H" in traceroute. I assume you have a "pass out on $ext_if keep
> state" rule, and don't filter on the internal interface. If you add
> stateful filtering on the internal interface, I think you'll find that
> subsequent TCP SYNs are blocked without eliciting the ICMP error.
> 
> I suggest traceroute with -e uses fixed th_seq, as in
> 
> -	tcp->th_seq = (tcp->th_sport << 16) | (tcp->th_dport +
> -	   (fixedPort ? outdata->seq : 0));
> +	tcp->th_seq = (tcp->th_sport << 16) | tcp->th_dport;
> 
> Maybe the (fixedPort?:) operands were mistakenly switched, and you want to
> increment th_seq when -e is NOT used, but I can't think off-hand why you
> would.

The ISNs do increment when the "-e" option is not used, since
the dport increments. That's why I didn't realize that incrementing
the ISN across SYNs might cause new problems. The problem with this
patch is that we no longer have traceroute's probe sequence number
anywhere in the TCP header. (Don't bring up the IP header, please.
That's a whole 'nother issue.)
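A sketch of the proposed fixed th_seq makes the objection concrete: the value is built only from the two ports, so every -e probe carries the same ISN, and the probe number cannot be read back from it. Byte order is ignored and the function names are hypothetical.

```c
#include <stdint.h>

/* Proposed -e th_seq: all probes of one invocation (same sport, same
 * dport) share one ISN, like a retransmitted SYN. The shift is done in
 * uint32_t to avoid signed overflow when sport >= 0x8000. */
uint32_t isn_fixed(uint16_t sport, uint16_t dport)
{
    return ((uint32_t)sport << 16) | dport;
}

/* The ISN decodes to the two ports and nothing else: there is no room
 * left for traceroute's probe number. */
uint16_t isn_sport(uint32_t isn) { return (uint16_t)(isn >> 16); }
uint16_t isn_dport(uint32_t isn) { return (uint16_t)(isn & 0xffffu); }
```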

So, to expand on the three points above, we need (1) a fixed
destination port, (2) to increment the IP TTL, (3) the probe
sequence number encoded in some header field, and (4) a source port
chosen so that multiple traceroute invocations do not
share any src-sport-dst-dport tuples during their lifetime.
In the past, using the PID worked for the sport, but think about
what happens if you start with the PID and then increment or
decrement it: you get overlaps between invocations (unless your
system does a decent job with random PIDs, which is unfortunately
not the default on FreeBSD).
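The overlap is easy to see with a small sketch. Assume, hypothetically, that an invocation's base source port is its PID and probe n uses base + n; the made-up helper `shared_sports()` counts the source ports two invocations would have in common toward the same destination and port.

```c
#include <stdint.h>

/* Hypothetical scheme: source port = PID-derived base + probe number.
 * Counts ports used by both invocations when each sends nprobes probes
 * (same dst and dport assumed, so a shared sport means a shared tuple). */
unsigned shared_sports(uint16_t base_a, uint16_t base_b, unsigned nprobes)
{
    unsigned shared = 0;

    for (unsigned i = 0; i < nprobes; i++) {
        uint16_t pa = (uint16_t)(base_a + i);
        for (unsigned j = 0; j < nprobes; j++) {
            if (pa == (uint16_t)(base_b + j)) {
                shared++;
                break;
            }
        }
    }
    return shared;
}
```

With sequential PIDs 1000 and 1001 and 30 probes each, `shared_sports(1000, 1001, 30)` returns 29: ports 1001 through 1029 are used by both invocations.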

The patch posted to freebsd-net addresses these problems. It
varies the source port so that we don't get overlapping
src-sport-dst-dport tuples, and it takes a base source port from
the LSBs of the clock as a "random" number. That would seem
to fix the problem. The only open question is whether that is a good
way to pick the base source port. It's probably good enough,
although some kind of hash of the PID might be better.
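Both options can be sketched under loud assumptions: this message does not say which clock bits the patch uses, so `base_sport_from_clock()` simply takes the low 15 bits of `tv_usec` and forces the top bit, and `base_sport_from_pid_hash()` uses a Knuth-style multiplicative hash as one possible "hash of the PID". All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <sys/types.h>

/* Assumption: base port from the low bits of the microsecond clock,
 * kept in the upper half of the port range. */
uint16_t base_sport_from_clock(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (uint16_t)(0x8000u | ((uint32_t)tv.tv_usec & 0x7fffu));
}

/* Assumption: multiplicative hash of the PID (2654435761 ~ 2^32 / phi),
 * so consecutive PIDs land far apart instead of overlapping. */
uint16_t base_sport_from_pid_hash(pid_t pid)
{
    uint32_t h = (uint32_t)pid * 2654435761u;   /* wraps mod 2^32 */
    return (uint16_t)(0x8000u | (h >> 17));      /* top 15 bits */
}

/* Probe n (from 0) uses base + n, wrapping within the upper half so
 * the port never drops below 0x8000. */
uint16_t probe_sport(uint16_t base, unsigned n)
{
    return (uint16_t)(0x8000u | ((base + n) & 0x7fffu));
}
```

Either base still has to be far enough from other invocations' bases that the probe ranges don't overlap; hashing spreads consecutive PIDs apart but cannot rule out collisions entirely.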
-- 
Crist J. Clark                     |     cjclark at alum.mit.edu