vnet: accessing module's virtualized global variables from another module

Marko Zec zec at fer.hr
Mon May 9 22:11:01 UTC 2011


On Monday 09 May 2011 19:05:28 Mikolaj Golub wrote:
> On Mon, 9 May 2011 16:21:15 +0200 Marko Zec wrote:
>
>  MZ> On Monday 09 May 2011 14:48:25 Mikolaj Golub wrote:
>  >> Hi,
>  >>
>  >> Trying ipfw_nat under VIMAGE kernel I got this panic on the module
>  >> load:
>
>  MZ> Hi,
>
>  MZ> I think the problem here is that curvnet context is not set properly
>  MZ> on entry to ipfw_nat_modevent().  The canonical way to initialize
>  MZ> VNET-enabled subsystems is to trigger them using VNET_SYSINIT() macros
>  MZ> (instead of using modevent mechanisms), which in turn ensure that:
>
>  MZ> a) the initializer function gets invoked for each existing vnet
>  MZ> b) curvnet context is set properly on entry to initializer functions
>  MZ> and
>
> hm, sorry, but I don't see how curvnet context might help here.

You're getting a panic in ipfw_nat_modevent(), which has ipfw_nat_init()
inlined into it, at a point where you attempt to access per-vnet data
without having curvnet context set.  By definition that is not supposed to
work on a VIMAGE kernel, so what you observe is not unexpected at all.
Please set the curvnet context, either via VNET_SYSINIT() macros or by hand
using CURVNET_SET() / CURVNET_RESTORE(), before accessing any V_ data.
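
The save/set/restore bracketing described above can be sketched in plain
userland C.  This is only a model: struct vnet_model, curvnet_model, the
MODEL_CURVNET_* macros and model_init_one() are hypothetical stand-ins, not
the kernel's definitions, and chain_len stands in for some V_ variable.

```c
#include <assert.h>
#include <stddef.h>

/* Userland stand-in for a vnet; chain_len stands in for a V_ variable. */
struct vnet_model {
	int chain_len;
};

/* Stand-in for the per-thread curvnet pointer; NULL means "no context". */
static struct vnet_model *curvnet_model = NULL;

/* Save the previous context in a local, then switch to the given vnet. */
#define MODEL_CURVNET_SET(vnet)					\
	struct vnet_model *saved_vnet_model = curvnet_model;	\
	curvnet_model = (vnet)

/* Switch back to whatever context was active before the SET. */
#define MODEL_CURVNET_RESTORE()					\
	curvnet_model = saved_vnet_model

/* Stand-in for a V_ accessor: only meaningful while a context is set. */
int *
model_v_chain_len(void)
{
	assert(curvnet_model != NULL);
	return (&curvnet_model->chain_len);
}

/* Hypothetical per-vnet initializer, bracketed as the text suggests. */
void
model_init_one(struct vnet_model *vnet)
{
	MODEL_CURVNET_SET(vnet);
	*model_v_chain_len() = 1;	/* per-vnet access inside the bracket */
	MODEL_CURVNET_RESTORE();
}

/* Reports whether a context is currently set (to check the restore). */
int
model_context_is_set(void)
{
	return (curvnet_model != NULL);
}
```

Calling model_init_one() on a struct vnet_model sets its chain_len and
leaves no context behind; calling model_v_chain_len() outside such a
bracket trips the assertion, which is the model's analogue of touching V_
data with no curvnet set.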

Marko


> To me this does not look like a curvnet context problem, or else my
> understanding of how it works is completely wrong.
>
> Below is kgdb session on live VIMAGE system with ipfw.ko loaded.
>
> Let's look at some kernel virtualized variable:
>
> (kgdb) p vnet_entry_ifnet
> $1 = {tqh_first = 0x0, tqh_last = 0x0}
> (kgdb) p &vnet_entry_ifnet
> $2 = (struct ifnethead *) 0x8102d488
>
> As expected the address is in kernel 'set_vnet':
>
> kopusha:/usr/src/sys% kldstat |grep kernel
>  1   69 0x80400000 1092700  kernel
> kopusha:/usr/src/sys% nm /boot/kernel/kernel |grep  __start_set_vnet
> 8102d480 A __start_set_vnet
>
> default vnet:
>
> (kgdb) p vnet0
> $3 = (struct vnet *) 0x86d9b000
>
> Calculate ifnet location on vnet0 (a la VNET_VNET(vnet0, ifnet)):
>
> (kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) &vnet_entry_ifnet
> 0x86d9c008
>
> Access it:
>
> (kgdb) p *((struct ifnethead *)0x86d9c008)
> $4 = {tqh_first = 0x86da5c00, tqh_last = 0x89489c0c}
> (kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_dname
> $7 = 0x80e8b480 "usbus"
> (kgdb) p (*((struct ifnethead *)0x86d9c008)).tqh_first->if_vnet
> $8 = (struct vnet *) 0x86d9b000
>
> Everything looks good. Now try the same with virtualized variable
> layer3_chain from ipfw module:
>
> (kgdb) p vnet_entry_layer3_chain
> $9 = {rules = 0x0, reap = 0x0, default_rule = 0x0, n_rules = 0, static_len
> = 0, map = 0x0, nat = {lh_first = 0x0}, tables = {0x0 <repeats 128 times>},
> rwmtx = {lock_object = { lo_name = 0x0, lo_flags = 0, lo_data = 0,
> lo_witness = 0x0}, rw_lock = 0}, uh_lock = { lock_object = {lo_name = 0x0,
> lo_flags = 0, lo_data = 0, lo_witness = 0x0}, rw_lock = 0}, id = 0, gencnt
> = 0}
>
> "master" variable looks good (initialized to zeros), what about its
> address?
>
> (kgdb) p &vnet_entry_layer3_chain
> $10 = (struct ip_fw_chain *) 0x894a5c00
>
> It points to 'set_vnet' of the ipfw.ko:
>
> kopusha# kldstat |grep ipfw.ko
> 13    2 0x89495000 11000    ipfw.ko
> kopusha:/usr/src/sys% nm /boot/kernel/ipfw.ko |grep  __start_set_vnet
> 00010be0 A __start_set_vnet
> kopusha:/usr/src/sys% printf "0x%x\n" $((0x89495000 + 0x00010be0))
> 0x894a5be0
>
> Calculate layer3_chain location on vnet0 (a la VNET_VNET(vnet0,
> layer3_chain)):
>
> (kgdb) printf "0x%x\n", vnet0->vnet_data_base + (uintptr_t) &vnet_entry_layer3_chain
> 0x8f214780
>
> Try to read it:
>
> (kgdb) p ((struct ip_fw_chain *)0x8f214780)->rwmtx
> $13 = {lock_object = {lo_name = 0x0, lo_flags = 0, lo_data = 0, lo_witness
> = 0x0}, rw_lock = 0}
> (kgdb) p ((struct ip_fw_chain *)0x8f214780)->rules
> $14 = (struct ip_fw *) 0x6
>
> The data looks wrong, yet this is how ipfw_nat accesses this variable.
> I see the same in the crash image:
>
> (kgdb) where
> ...
> #11 0xc09a4882 in _rw_wlock (rw=0xc6d5e91c,
>     file=0xca0ac2e3 "/usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c",
>     line=547) at /usr/src/sys/kern/kern_rwlock.c:238
> #12 0xca0ab841 in ipfw_nat_modevent (mod=0xc98a48c0, type=0, unused=0x0)
>     at /usr/src/sys/modules/ipfw_nat/../../netinet/ipfw/ip_fw_nat.c:547
>
> Note rw=0xc6d5e91c (this is what it crashed on). I get the same address
> by repeating the calculation above:
>
> (kgdb) VNET_VNET vnet0 vnet_entry_layer3_chain
> at 0xc6d5e700 of type = struct ip_fw_chain
> (kgdb) p &((struct ip_fw_chain *)0xc6d5e700)->rwmtx
> $8 = (struct rwlock *) 0xc6d5e91c
>
> Thus ipfw_nat was in the vnet0 context at that point. I have seen crashes
> (in other modules) where the context was not initialised, and they looked
> different.
>
> The right location was 0x86d9c160 (found by adding a print to the ipfw
> module; I don't know an easier way):
>
> (kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rwmtx
> $1 = {lock_object = {lo_name = 0x932ba4b3 "IPFW static rules", lo_flags =
> 69402624, lo_data = 0, lo_witness = 0x86d6ab30}, rw_lock = 1}
> (kgdb) p ((struct ip_fw_chain *)0x86d9c160)->rules
> $2 = (struct ip_fw *) 0x8f2d1c80
>
> So I don't see how to reach a module's virtualized variable from outside
> the module, even from the right vnet context. The linker has this
> information: when loading the module it allocates the variable in the
> 'modspace' of the vnet stacks and relocates the addresses inside the
> module. So the variable is accessible from inside the module, but not
> from outside.
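
The arithmetic in the kgdb session above can be replayed in userland C.
This is a sketch only: the constants are copied from the session (a 32-bit
i386 kernel), vnet_data_base is never printed there and is recovered here
from the ifnet pair, and the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the kgdb session (32-bit i386 kernel). */
#define KERNEL_SET_VNET_START	0x8102d480u	/* nm kernel: __start_set_vnet */
#define ADDR_ENTRY_IFNET	0x8102d488u	/* &vnet_entry_ifnet */
#define VNET0_IFNET		0x86d9c008u	/* ifnet location in vnet0 */
#define ADDR_ENTRY_CHAIN	0x894a5c00u	/* &vnet_entry_layer3_chain */
#define VNET0_CHAIN_ACTUAL	0x86d9c160u	/* printed from inside ipfw.ko */

/* Per-vnet address = vnet_data_base + address of the master copy. */
uint32_t
vnet_location(uint32_t vnet_data_base, uint32_t entry_addr)
{
	return (vnet_data_base + entry_addr);	/* wraps modulo 2^32 */
}

/* Recover vnet_data_base for vnet0 from the known-good ifnet pair. */
uint32_t
infer_vnet_data_base(void)
{
	return (VNET0_IFNET - ADDR_ENTRY_IFNET);
}

/* Start of vnet0's copy of the kernel 'set_vnet' region. */
uint32_t
vnet0_data_start(void)
{
	return (vnet_location(infer_vnet_data_base(), KERNEL_SET_VNET_START));
}

/* Offset of the real layer3_chain inside vnet0's data region. */
uint32_t
chain_offset_in_vnet0(void)
{
	return (VNET0_CHAIN_ACTUAL - vnet0_data_start());
}
```

With these numbers the inferred base is 0x05d6eb80, vnet0's data region
starts at 0x86d9c000, and the real layer3_chain sits 0x160 bytes into it,
whereas feeding the module-local address 0x894a5c00 into the same formula
gives 0x8f214780, the bogus location read in the session.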
>
>  MZ> Cheers,
>
>  MZ> Marko
>
>  >> Fatal trap 12: page fault while in kernel mode
>  >> cpuid = 1; apic id = 01
>  >> fault virtual address   = 0x4
>  >> fault code              = supervisor read, page not present
>  >> instruction pointer     = 0x20:0xc09f098e
>  >> stack pointer           = 0x28:0xf563b944
>  >> frame pointer           = 0x28:0xf563b998
>  >> code segment            = base 0x0, limit 0xfffff, type 0x1b
>  >>                         = DPL 0, pres 1, def32 1, gran 1
>  >> processor eflags        = interrupt enabled, resume, IOPL = 0
>  >> current process         = 4264 (kldload)
>  >>
>  >> witness_checkorder(c6d5e91c,9,ca0ac2e3,223,0,...) at witness_checkorder+0x6e
>  >> _rw_wlock(c6d5e91c,ca0ac2e3,223,0,c0e8f795,...) at _rw_wlock+0x82
>  >> ipfw_nat_modevent(c98a48c0,0,0,75,0,...) at ipfw_nat_modevent+0x41
>  >> module_register_init(ca0ad508,0,c0e8d834,e6,0,...) at module_register_init+0xa7
>  >> linker_load_module(0,f563bc18,c0e8d834,3fc,f563bc28,...) at linker_load_module+0xa05
>  >> kern_kldload(c86835c0,c72d3400,f563bc40,0,c8d0d000,...) at kern_kldload+0x133
>  >> kldload(c86835c0,f563bcec,c09e8940,c86835c0,0,...) at kldload+0x74
>  >> syscallenter(c86835c0,f563bce4,c0ce05dd,c1022150,0,...) at syscallenter+0x263
>  >> syscall(f563bd28) at syscall+0x34
>  >> Xint0x80_syscall() at Xint0x80_syscall+0x21
>  >> --- syscall (304, FreeBSD ELF32, kldload), eip = 0x280da00b, esp = 0xbfbfe79c, ebp = 0xbfbfec88 ---
>  >>
>  >> It crashed on accessing data from the virtualized global variable
>  >> V_layer3_chain in ipfw_nat_modevent(). V_layer3_chain is defined in
>  >> the ipfw module, and it turns out that &V_layer3_chain returns the
>  >> wrong location from anywhere but ipfw.ko.
>  >>
>  >> Maybe this is a known issue, but I have not found any information
>  >> about it, so below are the details of my investigation into why this
>  >> happens.
>  >>
>  >> Virtualized global variables are defined using the VNET_DEFINE() macro,
>  >> which places them in the 'set_vnet' linker set (in the base kernel or
>  >> in module). This is used to
>  >>
>  >> 1) copy these "default" values to each virtual network stack instance
>  >> when created;
>  >>
>  >> 2) act as unique global names by which the variable can be referred to.
>  >> The location of a per-virtual instance variable is calculated at
>  >> run-time like in the example below for layer3_chain variable in the
>  >> default vnet (vnet0):
>  >>
>  >>     vnet0->vnet_data_base + (uintptr_t) & vnet_entry_layer3_chain    (1)
>  >>
>  >> For modules things are more complicated. When a module is loaded, its
>  >> global variables from the 'set_vnet' linker set are copied to the
>  >> kernel 'set_vnet', and for the module to be able to access them the
>  >> linker relocates all references accordingly
>  >> (kern/link_elf.c:elf_relocaddr()):
>  >>
>  >>         if (x >= ef->vnet_start && x < ef->vnet_stop)
>  >>                 return ((x - ef->vnet_start) + ef->vnet_base);
>  >>
>  >> So from inside the module, access to its virtualized variables works,
>  >> but from outside we get the wrong location using a calculation like
>  >> (1) above, because &vnet_entry_layer3_chain returns the address of the
>  >> variable in the module's own 'set_vnet'.
>  >>
>  >> The workaround is to compile such modules into the kernel, or to use
>  >> the hack I have done for ipfw_nat: add a function to the ipfw module
>  >> which returns the location of the virtualized layer3_chain variable,
>  >> and use this location instead of the V_layer3_chain macro (see the
>  >> attached patch).
>  >>
>  >> But I suppose the problem is not new, and a better approach may
>  >> already have been invented to deal with it?
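
The relocation step quoted from elf_relocaddr(), and why an accessor
exported by ipfw.ko sidesteps the problem, can be modelled in userland C.
This is a sketch under stated assumptions: struct elf_file_model and the
function names are hypothetical, vnet_stop is a guess (the end of the
module image), and vnet_base and the vnet0 data base are not printed in the
session but inferred from its addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland model of the linker's per-file vnet bookkeeping. */
struct elf_file_model {
	uint32_t vnet_start;	/* start of the module's own 'set_vnet' */
	uint32_t vnet_stop;	/* end of the module's own 'set_vnet' */
	uint32_t vnet_base;	/* where the set was copied in the kernel's */
};

/* Same range check and arithmetic as the quoted elf_relocaddr() lines. */
uint32_t
model_relocaddr(const struct elf_file_model *ef, uint32_t x)
{
	if (x >= ef->vnet_start && x < ef->vnet_stop)
		return ((x - ef->vnet_start) + ef->vnet_base);
	return (x);
}

/* ipfw.ko as seen in the session. */
const struct elf_file_model ipfw_ko_model = {
	.vnet_start = 0x894a5be0u,	/* 0x89495000 + 0x00010be0 */
	.vnet_stop  = 0x894a6000u,	/* ASSUMPTION: end of module image */
	.vnet_base  = 0x8102d5c0u,	/* INFERRED from 0x86d9c160 */
};

#define VNET0_DATA_BASE		0x05d6eb80u	/* INFERRED: 0x86d9c008 - 0x8102d488 */
#define ADDR_ENTRY_CHAIN	0x894a5c00u	/* &vnet_entry_layer3_chain */

/* What code outside ipfw.ko computes: the unrelocated address goes in. */
uint32_t
chain_location_from_outside(void)
{
	return (VNET0_DATA_BASE + ADDR_ENTRY_CHAIN);
}

/*
 * What ipfw.ko itself computes, and hence what an accessor function
 * exported by ipfw.ko would hand back to ipfw_nat.
 */
uint32_t
chain_location_from_ipfw(void)
{
	return (VNET0_DATA_BASE +
	    model_relocaddr(&ipfw_ko_model, ADDR_ENTRY_CHAIN));
}
```

In this model the outside calculation yields 0x8f214780 and the
module-side one yields 0x86d9c160, matching the wrong and right locations
in the session.  The accessor works because the address is taken inside
ipfw.ko, where the reference has already been relocated.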



