Bug in recent large_alloc changes to the ZFS zio code?
rmtodd at ichotolot.servalan.com
Sun May 31 06:45:19 UTC 2009
Okay, I'm looking at the recent changes in the ZFS zio code to change how
data buffers are allocated (svn r192207). The old code for
zio_data_buf_alloc just called kmem_alloc (the Solaris compatibility
one), which in turn called malloc() with M_WAITOK, so it was always
guaranteed to return a valid, non-null pointer. Fair enough.
The new code has an alternate code path, where in "arc_large_memory_enabled"
mode, it calls the new function zio_large_malloc instead. zio_large_malloc
in turn tries a few times to allocate the required pages using
vm_phys_alloc_contig, but if that fails, it goes ahead and returns NULL.
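
To make the new behavior concrete, here is a minimal userspace sketch of a
"try a few times, then give up" allocator. It is not the actual r192207 code:
sim_contig_alloc, sim_large_malloc, sim_failures_left and the retry count of 3
are all hypothetical names and values I made up for illustration, and plain
malloc() stands in for the contiguous page allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob: how many upcoming allocation attempts should fail. */
static int sim_failures_left = 0;

/* Hypothetical stand-in for a contiguous page allocator that can fail. */
static void *sim_contig_alloc(size_t size)
{
    if (sim_failures_left > 0) {
        sim_failures_left--;
        return NULL;            /* simulated allocation failure */
    }
    return malloc(size);
}

#define SIM_MAX_TRIES 3         /* assumed retry count, not from the source */

/* Retry a bounded number of times; unlike a sleeping allocation,
 * this gives up and returns NULL once the tries are used up. */
static void *sim_large_malloc(size_t size)
{
    for (int i = 0; i < SIM_MAX_TRIES; i++) {
        void *p = sim_contig_alloc(size);
        if (p != NULL)
            return p;
    }
    return NULL;
}
```

The point is just that a bounded retry loop changes the contract: a caller
written against an allocator that never fails now has a NULL case to handle.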
Here's the problem. As near as I can tell, none of the code that calls
zio_data_buf_alloc appears to check for the possibility that the
returned pointer could be NULL, which I guess is reasonable as the original
code could never return NULL. However, the new large malloc code *can* return
NULL, which causes the obvious problem. The other day I mentioned here a
panic I saw where under sufficiently heavy load the GEOM code was
complaining that it had been given a NULL data pointer. It seems to me that
that was likely because zio had tried to allocate a data buffer and gotten
a NULL pointer instead.
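
One possible way to keep the old "never returns NULL" contract would be to
fall back to the ordinary allocation path when the large path fails. The
following is only a userspace sketch of that idea, not a patch against the
real code: every demo_* name is hypothetical, and malloc() here merely stands
in for the sleeping M_WAITOK allocation (in userspace malloc() itself can
still fail, which the kernel path would not).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical knob: when nonzero, the simulated large path fails. */
static int demo_large_path_fails = 0;

/* Hypothetical stand-in for the large-allocation path, which may return NULL. */
static void *demo_large_alloc(size_t size)
{
    return demo_large_path_fails ? NULL : malloc(size);
}

/* Stand-in for the old sleeping allocation path. */
static void *demo_waitok_alloc(size_t size)
{
    return malloc(size);
}

/* Counts how often the fallback was taken, for illustration only. */
static int demo_fallbacks = 0;

/* Try the large path first when enabled; on NULL, fall back to the
 * ordinary path so callers see the same contract as before. */
static void *demo_data_buf_alloc(size_t size, int large_enabled)
{
    if (large_enabled) {
        void *p = demo_large_alloc(size);
        if (p != NULL)
            return p;
        demo_fallbacks++;
    }
    return demo_waitok_alloc(size);
}
```

The alternative would be to audit every caller of zio_data_buf_alloc and add
NULL checks, which touches a lot more code.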