cleaning files beyond EOF

Konstantin Belousov kostikbel at gmail.com
Sun Feb 17 07:48:38 UTC 2013


On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
> 
> > On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> >> I have a (possibly damaged) ffs data block with nonzero data beyond
> >> EOF.  Is anything responsible for clearing this data when the file
> >> is mmapped()?
> >>
> >> At least old versions of gcc mmap() the file and have a bug checking
> >> for EOF.  They read the garbage beyond the end and get confused.
> >
> > Does the 'damaged' status of the data block mean that it contains
> > garbage after EOF on disk?
> 
> Yes, it's at most software damage.  I used a broken version of
> vfs_bio_clrbuf() for a long time and it probably left some unusual
> blocks.  This matters surprisingly rarely.
I recently had to modify vfs_bio_clrbuf().  For me, a bug in the
function mattered a lot, because the function is used, in particular,
to clear indirect blocks.  The bug caused quite random filesystem
failures until I figured it out.  My version of vfs_bio_clrbuf() is
at the end of this message; it avoids accessing b_data.
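
For context, the spot where this bites is roughly the following (a minimal
sketch, not the literal ffs_balloc() code; indirect_lbn and newblk stand
for the values ffs_balloc() computes):

	/*
	 * Get a buffer for a freshly allocated indirect block and rely
	 * on vfs_bio_clrbuf() to zero it.  If clrbuf() leaves garbage
	 * behind, the garbage is later interpreted as block pointers.
	 */
	bp = getblk(vp, indirect_lbn, fs->fs_bsize, 0, 0, 0);
	bp->b_blkno = fsbtodb(fs, newblk);
	vfs_bio_clrbuf(bp);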

> 
> I forgot to mention that this is with an old version of FreeBSD,
> where I changed vfs_bio.c a lot but barely touched vm.
> 
> > UFS uses a small wrapper around vnode_generic_getpages() as the
> > VOP_GETPAGES(), the wrapping code can be ignored for the current
> > purpose.
> >
> > vnode_generic_getpages() iterates over the pages after the bstrategy()
> > and marks the part of the page after EOF valid and zeroes it, using
> > vm_page_set_valid_range().
> 
> The old version has a large non-wrapper in ffs, and vnode_generic_getpages()
> uses vm_page_set_validclean().  Maybe the bug is just in the old
> ffs_getpages().  It seems to do only DEV_BSIZE'ed zeroing stuff.  It
> begins with the same "We have to zero that data" code that forms most
> of the wrapper in the current version.  It normally only returns
> vnode_pager_generic_getpages() after that if bsize < PAGE_SIZE.
> However, my version has a variable which I had forgotten about to
> control this, and the forgotten setting of this variable results in
> always using vnode_pager_generic_getpages(), as in -current.  I probably
> copied some fixes in -current for this.  So the bug can't be just in
> ffs_getpages().
> 
> The "damaged" block is at the end of vfs_default.c.  The file size is
> 25 * PAGE_SIZE + 16.  It is in 7 16K blocks, 2 full 2K frags, and 1 frag
> with 16 bytes valid in it.
But ffs_getpages() might indeed be the culprit.  It calls
vm_page_zero_invalid(), which only has DEV_BSIZE granularity.  I think
that ffs_getpages() should also zero the after-EOF part of the last page
of the file to fix your damage, since a device read cannot read less
than DEV_BSIZE.

diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index ef6194c..4240b78 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -844,9 +844,9 @@ static int
 ffs_getpages(ap)
 	struct vop_getpages_args *ap;
 {
-	int i;
 	vm_page_t mreq;
-	int pcount;
+	uint64_t size;
+	int i, pcount;
 
 	pcount = round_page(ap->a_count) / PAGE_SIZE;
 	mreq = ap->a_m[ap->a_reqpage];
@@ -861,6 +861,10 @@ ffs_getpages(ap)
 	if (mreq->valid) {
 		if (mreq->valid != VM_PAGE_BITS_ALL)
 			vm_page_zero_invalid(mreq, TRUE);
+		size = VTOI(ap->a_vp)->i_size;
+		if (mreq->pindex == OFF_TO_IDX(size))
+			pmap_zero_page_area(mreq, size & PAGE_MASK,
+			    PAGE_SIZE - (size & PAGE_MASK));
 		for (i = 0; i < pcount; i++) {
 			if (i != ap->a_reqpage) {
 				vm_page_lock(ap->a_m[i]);
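
To make the DEV_BSIZE point concrete (a worked example, assuming the usual
PAGE_SIZE of 4096 and DEV_BSIZE of 512, with Bruce's numbers): i_size =
25 * 4096 + 16 = 102416, so OFF_TO_IDX(size) == 25 and size & PAGE_MASK ==
16.  The device read fills whole sectors, so the first 512-byte sector of
page 25 is marked valid even though only its first 16 bytes are inside the
file, and vm_page_zero_invalid() never touches the garbage at bytes 16..511.
The added pmap_zero_page_area() call zeroes everything from EOF to the end
of the page instead.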

On the other hand, it is not clear whether we should indeed protect
against such a case, or just declare the on-disk data broken.

> 
> I have another problem that is apparently with
> vnode_pager_generic_getpages() and now affects -current from about a
> year ago in an identical way with the old version: mmap() is very slow
> in msdosfs.  cmp uses mmap() too much, and reading files sequentially
> using mmap() is 3.4 times slower than reading them using read() on my
> DVD media/drive.  The i/o seems to be correctly clustered for both,
> with average transaction sizes over 50K but tps much lower for mmap().
> Similarly on a (faster) hard disk except the slowness is not as noticeable
> (drive buffering might hide it completely).  However, for ffs files on
> the hard disk, mmap() is as fast as read().

diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index 6393399..83d3609 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -3704,8 +4070,7 @@ vfs_bio_set_valid(struct buf *bp, int base, int size)
 void
 vfs_bio_clrbuf(struct buf *bp) 
 {
-	int i, j, mask;
-	caddr_t sa, ea;
+	int i, j, mask, sa, ea, slide;
 
 	if ((bp->b_flags & (B_VMIO | B_MALLOC)) != B_VMIO) {
 		clrbuf(bp);
@@ -3723,39 +4088,69 @@ vfs_bio_clrbuf(struct buf *bp)
 		if ((bp->b_pages[0]->valid & mask) == mask)
 			goto unlock;
 		if ((bp->b_pages[0]->valid & mask) == 0) {
-			bzero(bp->b_data, bp->b_bufsize);
+			pmap_zero_page_area(bp->b_pages[0], 0, bp->b_bufsize);
 			bp->b_pages[0]->valid |= mask;
 			goto unlock;
 		}
 	}
-	ea = sa = bp->b_data;
-	for(i = 0; i < bp->b_npages; i++, sa = ea) {
-		ea = (caddr_t)trunc_page((vm_offset_t)sa + PAGE_SIZE);
-		ea = (caddr_t)(vm_offset_t)ulmin(
-		    (u_long)(vm_offset_t)ea,
-		    (u_long)(vm_offset_t)bp->b_data + bp->b_bufsize);
+	sa = bp->b_offset & PAGE_MASK;
+	slide = 0;
+	for (i = 0; i < bp->b_npages; i++) {
+		slide = imin(slide + PAGE_SIZE, bp->b_bufsize + sa);
+		ea = slide & PAGE_MASK;
+		if (ea == 0)
+			ea = PAGE_SIZE;
 		if (bp->b_pages[i] == bogus_page)
 			continue;
-		j = ((vm_offset_t)sa & PAGE_MASK) / DEV_BSIZE;
+		j = sa / DEV_BSIZE;
 		mask = ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j;
 		VM_OBJECT_LOCK_ASSERT(bp->b_pages[i]->object, MA_OWNED);
 		if ((bp->b_pages[i]->valid & mask) == mask)
 			continue;
 		if ((bp->b_pages[i]->valid & mask) == 0)
-			bzero(sa, ea - sa);
+			pmap_zero_page_area(bp->b_pages[i], sa, ea - sa);
 		else {
 			for (; sa < ea; sa += DEV_BSIZE, j++) {
-				if ((bp->b_pages[i]->valid & (1 << j)) == 0)
-					bzero(sa, DEV_BSIZE);
+				if ((bp->b_pages[i]->valid & (1 << j)) == 0) {
+					pmap_zero_page_area(bp->b_pages[i],
+					    sa, DEV_BSIZE);
+				}
 			}
 		}
 		bp->b_pages[i]->valid |= mask;
+		sa = 0;
 	}
 unlock:
 	VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
 	bp->b_resid = 0;
 }
 
+void
+vfs_bio_bzero_buf(struct buf *bp, int base, int size)
+{
+	vm_page_t m;
+	int i, n;
+
+	if ((bp->b_flags & B_UNMAPPED) == 0) {
+		BUF_CHECK_MAPPED(bp);
+		bzero(bp->b_data + base, size);
+	} else {
+		BUF_CHECK_UNMAPPED(bp);
+		n = PAGE_SIZE - (base & PAGE_MASK);
+		VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
+		for (i = base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) {
+			m = bp->b_pages[i];
+			if (n > size)
+				n = size;
+			pmap_zero_page_area(m, base & PAGE_MASK, n);
+			base += n;
+			size -= n;
+			n = PAGE_SIZE;
+		}
+		VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
+	}
+}
+
 /*
  * vm_hold_load_pages and vm_hold_free_pages get pages into
  * a buffers address space.  The pages are anonymous and are
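
As a usage note (a hypothetical caller, not part of the patch; boff and
xfersize are just whatever offset and length the caller already has):
filesystem code that currently zeroes part of a buffer through the mapping,

	bzero(bp->b_data + boff, xfersize);

could instead call

	vfs_bio_bzero_buf(bp, boff, xfersize);

which does the same bzero() for mapped buffers and falls back to
pmap_zero_page_area() page by page when the buffer is unmapped.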