cache_purge > cache_zap segmentation fault

Ali Bahar alih at internetDog.org
Thu May 8 16:53:54 PDT 2003


On Thu, May 08, 2003 at 09:12:52PM +0200, Poul-Henning Kamp wrote:
> In message <20030508150341.B28906 at internetDog.org>, Ali Bahar writes:
 
> If you look at the definition in sys/sys/vnode.h, it actually is pretty

I have. It's the "from us"/"to us" comments that were unclear:

>         LIST_HEAD(, namecache) v_cache_src;     /* c Cache entries from us */
>         TAILQ_HEAD(, namecache) v_cache_dst;    /* c Cache entries to us */
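
For reference, here is how I've been reading the entry structure itself,
paraphrased from sys/kern/vfs_cache.c as I remember it (so the exact
layout and types may not match the tree):

    #include <sys/types.h>          /* u_char */
    #include <sys/queue.h>          /* LIST_ENTRY / TAILQ_ENTRY */

    struct vnode;                   /* opaque here; the real one is in vnode.h */

    /*
     * Paraphrase of struct namecache.  One entry maps one name in one
     * directory to one vnode, and sits on three chains at once.
     */
    struct namecache {
            LIST_ENTRY(namecache)  nc_hash; /* global hash chain */
            LIST_ENTRY(namecache)  nc_src;  /* headed by nc_dvp->v_cache_src
                                             * ("from us") */
            TAILQ_ENTRY(namecache) nc_dst;  /* headed by nc_vp->v_cache_dst
                                             * ("to us") */
            struct vnode *nc_dvp;           /* directory holding the name */
            struct vnode *nc_vp;            /* vnode the name resolves to */
            u_char        nc_nlen;          /* length of the name */
            char          nc_name[];        /* the name itself */
    };

So, if I read the comments right, a given entry is "from" its directory
vnode and "to" the vnode the name resolves to.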

> So imagine this name cache entry:
> 
> 	{
> 	directory vnode:	vnode of "/usr/src"	[vnode#1]
> 	name			"sys"
> 	destination vnode:	vnode of "/usr/src/sys"	[vnode#2]
> 	}

So this must be the namecache entry for /usr/src/sys.

> This name cache entry will be a member of the "v_cache_src" LIST from
> vnode#1, and of the v_cache_dst TAILQ for vnode#2
> 
> Other entries on vnode#1's ..._src LIST will be for /usr/src/bin,
> /usr/src/etc and so on.

> The thing you have to remember is that one vnode can have multiple names
> due to hard links.  If it could have only one, the TAILQ would not
> be necessary.

I believe I understand.
So a 'destination chain' (i.e. the TAILQ headed by v_cache_dst) lists
all the names by which a file (or dir) may be known.
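
To check that against a made-up example: suppose someone runs

    ln /usr/src/sys/Makefile /usr/obj/Makefile.keep

(hypothetical paths). Then the cache could hold two entries for the one
file:

    {
    directory vnode:    vnode of "/usr/src/sys"
    name                "Makefile"
    destination vnode:  vnode of the file       [vnode#3]
    }
    {
    directory vnode:    vnode of "/usr/obj"
    name                "Makefile.keep"
    destination vnode:  vnode of the file       [vnode#3]
    }

Both entries would sit on vnode#3's v_cache_dst TAILQ (one vnode, two
names), while each sits on its own directory's v_cache_src LIST.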

Then, judging by the definition of 'struct namecache' and cache_purge,
there are three name cache chains: hash, source, and destination. I
don't understand why separate system-wide source and destination
chains are needed; I'd have expected that a purge/reclaim would only
need to delete the namecache entry from a single global list.

That is, for each destination namecache entry I find for the recycled
vnode, I'd be deleting only one node from that global chain.
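
For reference, the purge path I'm judging by reads roughly like this
(my paraphrase of the 4.x sys/kern/vfs_cache.c, with the negative-entry
and vnode hold/drop bookkeeping left out, so don't take it as the
literal source):

    static void
    cache_zap(struct namecache *ncp)
    {
            LIST_REMOVE(ncp, nc_hash);      /* off the global hash chain */
            LIST_REMOVE(ncp, nc_src);       /* off nc_dvp's v_cache_src LIST */
            if (ncp->nc_vp != NULL)         /* off nc_vp's v_cache_dst TAILQ */
                    TAILQ_REMOVE(&ncp->nc_vp->v_cache_dst, ncp, nc_dst);
            free(ncp, M_VFSCACHE);
    }

    void
    cache_purge(struct vnode *vp)           /* vp is being recycled */
    {
            /* zap every entry for a name inside this directory ... */
            while (!LIST_EMPTY(&vp->v_cache_src))
                    cache_zap(LIST_FIRST(&vp->v_cache_src));
            /* ... and every entry for a name that resolves to vp */
            while (!TAILQ_EMPTY(&vp->v_cache_dst))
                    cache_zap(TAILQ_FIRST(&vp->v_cache_dst));
    }

Either way, cache_zap ends up unlinking the entry from all three
chains, which is where my crash shows up (more below).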

> >Would you know if the 5.0 modifications could fix this problem?
> 
> I'm not sure I know what the problem is, I just stumbled on your
> email midthread I think...

The relevant posts went out this past Sunday and Wednesday.
In brief, I get a seg fault in 

  getnewvnode > cache_purge(v_cache_dst) > cache_zap(nc_src)

because nc_src.le_next holds a junk value, which then gets dereferenced
(sketched below). The crash happens in various processes at various
times. It started about three weeks ago and has been increasing in
frequency. We're adding some networking modules to the kernel, and
there have been quite a few resulting crashes of the box! :-)
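
For the record, my reading of why a junk le_next faults: LIST_REMOVE()
in <sys/queue.h> expands to roughly the following (paraphrased, not the
literal macro):

    #define LIST_REMOVE(elm, field) do {                            \
            if ((elm)->field.le_next != NULL)                       \
                    (elm)->field.le_next->field.le_prev =           \
                        (elm)->field.le_prev;                       \
            *(elm)->field.le_prev = (elm)->field.le_next;           \
    } while (0)

If le_next holds garbage that isn't NULL, the first branch dereferences
it to patch the back-pointer, and that matches the fault I see in
cache_zap(nc_src).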

Considering its increasing frequency, I even suspected that the
filesystem had been corrupted -- in a way undetected by fsck. But
'normal' filesystem corruption exhibits _random_ crashes, not ones
consistently following the execution path above.

However, your mention of hard links makes me wonder. I thought hard
links were rare. Are they prevalent in a FreeBSD/unix OS tree? (I'll
look this up.) If so, then maybe the underlying filesystem objects
have been corrupted by all the crashes? ... Nah: the files being
accessed when the seg fault occurs are often temporary files
(e.g. .o files during compilation).

Thanks much for your help. Much appreciated.
regards,
ali
-- 
             Jesus was an Arab.