fixes for ipfw and pf lock ordering issues

Tue Sep 28 11:01:16 PDT 2004

On Fri, Sep 24, 2004 at 10:37:54PM +0000, Christian S.J. Peron wrote:
> Good day folks, we need some beta testers
> 
Hi, as an author of LOR reports I feel obliged to test this patch. I had been
running it for two days and intended to report that everything works fine for
me, when a panic occurred. Regrettably, I do not have the actual panic message,
but the trace looks as follows:
pf_socket_lookup(cbb24958,cbb2495c,2,cbb24a0c,c15275a0) at pf_socket_lookup+0x22
pf_test_tcp(cbb249c0,cbb249bc,2,c14d6200,c139e500) at pf_test_tcp+0x648
pf_test(2,c14b8014,cbb24aa8,c15275a0,c15661c0) at pf_test+0x53d
pf_check_out(0,cbb24aa8,c14b8014,2,c15275a0) at pf_check_out+0x6d
pfil_run_hooks(c066da00,cbb24b1c,c14b8014,2,c15275a0) at pfil_run_hooks+0xeb
ip_output(c139e500,0,cbb24ae8,0,0) at ip_output+0x630
tcp_twrespond(c18709a0,10,c0607304,69c,1) at tcp_twrespond+0x1ed
tcp_twstart(c186b380,0,c0606ba2,96f,0) at tcp_twstart+0x1d3
tcp_input(c139d800,14,c14b8014,1,0) at tcp_input+0x2c39
ip_input(c139d800,0,c06053ae,e7,c066d098) at ip_input+0x5b0
netisr_processqueue(c066d098,c0642940,1,c05fb4da,c10d62c0) at netisr_processqueue+0x8e
swi_net(0,0,c05f9b18,269,0) at swi_net+0xe9
ithread_loop(c10de480,cbb24d48,c05f990f,31f,1000000) at ithread_loop+0x172
fork_exit(c04a6520,c10de480,cbb24d48) at fork_exit+0xc6
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xcbb24d7c, ebp = 0 ---
db>

db> show locks
exclusive sleep mutex inp (tcpinp) r = 0 (0xc1527630) locked @ /usr/src/sys/netinet/tcp_input.c:737
exclusive sleep mutex tcp r = 0 (0xc066de6c) locked @ /usr/src/sys/netinet/tcp_input.c:611
db>

(gdb) l *pf_socket_lookup+0x22
0xc043a2d2 is in pf_socket_lookup (/usr/src/sys/contrib/pf/net/pf.c:2414).
2409    #endif
2410            struct inpcb            *inp;
2411
2412    #ifdef __FreeBSD__
2413            if (inp_arg != NULL) {
2414                    *uid = inp_arg->inp_socket->so_cred->cr_uid;
2415                    *gid = inp_arg->inp_socket->so_cred->cr_groups[0];
2416                    return (1);
2417            }
2418    #endif

(gdb) l *pf_test_tcp+0x648
0xc043aef8 is in pf_test_tcp (/usr/src/sys/contrib/pf/net/pf.c:2781).
2776                            r = TAILQ_NEXT(r, entries);
2777                    else if (r->rule_flag & PFRULE_FRAGMENT)
2778                            r = TAILQ_NEXT(r, entries);
2779                    else if ((r->flagset & th->th_flags) != r->flags)
2780                            r = TAILQ_NEXT(r, entries);
2781                    else if (r->uid.op && (lookup != -1 || (lookup =
2782    #ifdef __FreeBSD__
2783                        pf_socket_lookup(&uid, &gid, direction, pd, inp), 1)) &&
2784    #else
2785                        pf_socket_lookup(&uid, &gid, direction, pd), 1)) &&
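
A guess on my side, not verified: the trace comes in through tcp_twstart, so
perhaps the inpcb no longer has a socket attached by the time pf looks at it,
and line 2414 then dereferences a NULL inp_socket. The sketch below is plain
userland C with made-up structures (all the *_sketch names are mine, not from
the kernel), only to show the kind of check I mean. It is not a proposed patch.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct ucred_sketch  { unsigned int cr_uid; unsigned int cr_groups[1]; };
struct socket_sketch { struct ucred_sketch *so_cred; };
struct inpcb_sketch  { struct socket_sketch *inp_socket; };

/*
 * Returns 1 and fills *uid and *gid when the credentials are reachable,
 * 0 when any pointer on the way is NULL (nothing is dereferenced then).
 */
int
socket_lookup_sketch(unsigned int *uid, unsigned int *gid,
    const struct inpcb_sketch *inp_arg)
{
	if (inp_arg == NULL || inp_arg->inp_socket == NULL ||
	    inp_arg->inp_socket->so_cred == NULL)
		return (0);
	*uid = inp_arg->inp_socket->so_cred->cr_uid;
	*gid = inp_arg->inp_socket->so_cred->cr_groups[0];
	return (1);
}
```

What the real code should do when there is no socket (fail the uid match, or
fall back to some other lookup) I cannot say. The sketch simply reports "not
found".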

If there is anything more I can provide, please tell me. I can't get kernel
dumps to work: although I have the KDB_UNATTENDED option in my kernel, it still
gives me a debugger prompt on panics, and when I call panic from the debugger
the machine hangs :S If you know any other way to get the panic message, I'd
appreciate it.

My comments on the patch itself:
Before the patch, I got the LORs and rather rare panics due to this problem.
They happened mainly when changing PF rules, and sometimes on shutdown.

After the patch, I do not get any LOR messages. I tried loading PF rules in a
loop for a few minutes. After that I just left the system on its own, with
some activity on the network (particularly on rules with uid matching). Until
today I was quite happy with that.

If there is anything I can debug more, to help you solve the problem, please
ask.

Cheers,

Wiktor Niesiobedzki

PS. Just for the record - I tried it only with PF. I'm also planning to give
it a shot with my old IPFW rules.