cleaning files beyond EOF
Konstantin Belousov
kostikbel at gmail.com
Sun Feb 17 07:48:38 UTC 2013
On Sun, Feb 17, 2013 at 06:01:50PM +1100, Bruce Evans wrote:
> On Sun, 17 Feb 2013, Konstantin Belousov wrote:
>
> > On Sun, Feb 17, 2013 at 11:33:58AM +1100, Bruce Evans wrote:
> >> I have a (possibly damaged) ffs data block with nonzero data beyond
> >> EOF. Is anything responsible for clearing this data when the file
> >> is mmapped()?
> >>
> >> At least old versions of gcc mmap() the file and have a bug checking
> >> for EOF. They read the garbage beyond the end and get confused.
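Such garbage is easy to check for from userland. A minimal sketch,
assuming only the POSIX guarantee that the fragment of the last page
past EOF reads back as zeros, so any nonzero byte it reports is the bug
(the file path is whatever file you suspect):

#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        struct stat st;
        char *p;
        size_t i, len, pgsz;
        int fd;

        if (argc != 2)
                errx(1, "usage: %s file", argv[0]);
        if ((fd = open(argv[1], O_RDONLY)) == -1 || fstat(fd, &st) == -1)
                err(1, "%s", argv[1]);
        pgsz = getpagesize();
        len = ((size_t)st.st_size + pgsz - 1) & ~(pgsz - 1);
        p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                err(1, "mmap");
        /* Everything between EOF and the end of the last page must be 0. */
        for (i = st.st_size; i < len; i++)
                if (p[i] != 0)
                        printf("garbage at offset %zu: %#x\n", i,
                            (unsigned char)p[i]);
        return (0);
}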
> >
> > Does the 'damaged' status of the data block mean that it contains
> > garbage after EOF on disk?
>
> Yes, it's at most software damage. I used a broken version of
> vfs_bio_clrbuf() for a long time and it probably left some unusual
> blocks. This matters surprisingly rarely.
I recently had to modify vfs_bio_clrbuf(). For me, a bug in the
function did matter a lot, because the function is used, in particular,
to clear indirect blocks. The bug caused quite random filesystem
failures until I figured it out. My version of vfs_bio_clrbuf() is
at the end of this message; it avoids accessing b_data.
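Schematically, the change is to zero through the pages instead of
through the mapping (just the idea, not the patch itself):

        /* Old way: requires that bp->b_data points at mapped KVA. */
        bzero(bp->b_data + off, len);

        /*
         * New way: zero through the physical page, no KVA needed; here
         * off is a buffer offset that falls within page bp->b_pages[i].
         */
        pmap_zero_page_area(bp->b_pages[i], off & PAGE_MASK, len);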
>
> I forgot to mention that this is with an old version of FreeBSD,
> where I changed vfs_bio.c a lot but barely touched vm.
>
> > UFS uses a small wrapper around vnode_generic_getpages() as the
> > VOP_GETPAGES(), the wrapping code can be ignored for the current
> > purpose.
> >
> > vnode_generic_getpages() iterates over the pages after the bstrategy()
> > and marks the part of the page after EOF valid and zeroes it, using
> > vm_page_set_valid_range().
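To expand on that: for the page that straddles EOF, the generic code
does roughly the equivalent of the following (a paraphrase, not the
exact -current code):

        /* m straddles EOF; "object" is the vnode's VM object. */
        if (IDX_TO_OFF(m->pindex) + PAGE_SIZE >
            object->un_pager.vnp.vnp_size) {
                int base = (int)(object->un_pager.vnp.vnp_size -
                    IDX_TO_OFF(m->pindex));

                /* Zero the part of the page after EOF ... */
                pmap_zero_page_area(m, base, PAGE_SIZE - base);
                /* ... and mark the part before EOF valid. */
                vm_page_set_valid_range(m, 0, base);
        }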
>
> The old version has a large non-wrapper in ffs, and vnode_generic_getpages()
> uses vm_page_set_validclean(). Maybe the bug is just in the old
> ffs_getpages(). It seems to do only DEV_BSIZE'ed zeroing stuff. It
> begins with the same "We have to zero that data" code that forms most
> of the wrapper in the current version. It normally returns
> vnode_pager_generic_getpages() after that only if bsize < PAGE_SIZE.
> However, my version has a variable, which I had forgotten about, that
> controls this, and the forgotten setting of this variable results in
> always using vnode_pager_generic_getpages(), as in -current. I probably
> copied some fixes from -current for this. So the bug can't be just in
> ffs_getpages().
>
> The "damaged" block is at the end of vfs_default.c. The file size is
> 25 * PAGE_SIZE + 16. It is in 6 16K blocks, 2 full 2K frags, and 1 frag
> with 16 bytes valid in it.
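(With PAGE_SIZE = 4096 that is 25 * 4096 + 16 = 102416 = 6 * 16384 +
2 * 2048 + 16, which matches the block/frag layout above.)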
But ffs_getpages() might indeed be the culprit. It calls
vm_page_zero_invalid(), which only has DEV_BSIZE granularity. I think
that ffs_getpages() should also zero the after-EOF part of the last page
of the file to fix your damage, since a device read cannot transfer less
than DEV_BSIZE.
diff --git a/sys/ufs/ffs/ffs_vnops.c b/sys/ufs/ffs/ffs_vnops.c
index ef6194c..4240b78 100644
--- a/sys/ufs/ffs/ffs_vnops.c
+++ b/sys/ufs/ffs/ffs_vnops.c
@@ -844,9 +844,9 @@ static int
 ffs_getpages(ap)
         struct vop_getpages_args *ap;
 {
-        int i;
         vm_page_t mreq;
-        int pcount;
+        uint64_t size;
+        int i, pcount;
 
         pcount = round_page(ap->a_count) / PAGE_SIZE;
         mreq = ap->a_m[ap->a_reqpage];
@@ -861,6 +861,10 @@ ffs_getpages(ap)
         if (mreq->valid) {
                 if (mreq->valid != VM_PAGE_BITS_ALL)
                         vm_page_zero_invalid(mreq, TRUE);
+                size = VTOI(ap->a_vp)->i_size;
+                if (mreq->pindex == OFF_TO_IDX(size))
+                        pmap_zero_page_area(mreq, size & PAGE_MASK,
+                            PAGE_SIZE - (size & PAGE_MASK));
                 for (i = 0; i < pcount; i++) {
                         if (i != ap->a_reqpage) {
                                 vm_page_lock(ap->a_m[i]);
On the other hand, it is not clear whether we should indeed protect
against such a case, or just declare the on-disk data broken.
>
> I have another problem that is apparently with
> vnode_pager_generic_getpages() and now affects -current from about a
> year ago in an identical way to the old version: mmap() is very slow
> in msdosfs. cmp uses mmap() too much, and reading files sequentially
> using mmap() is 3.4 times slower than reading them using read() on my
> DVD media/drive. The i/o seems to be correctly clustered for both,
> with average transaction sizes over 50K but tps much lower for mmap().
> Similarly on a (faster) hard disk, except the slowness is not as
> noticeable (drive buffering might hide it completely). However, for ffs
> files on the hard disk, mmap() is as fast as read().
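For reference, the mmap() side of such a comparison can be timed from
userland with a sketch like the following (a hypothetical test program,
not related to the patch below; the read() counterpart is just a loop
over read() into a 64K buffer):

#include <sys/mman.h>
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        struct stat st;
        struct timespec t0, t1;
        double secs;
        volatile char sink;
        char *p;
        long pgsz;
        off_t i;
        int fd;

        if (argc != 2)
                errx(1, "usage: %s file", argv[0]);
        if ((fd = open(argv[1], O_RDONLY)) == -1 || fstat(fd, &st) == -1)
                err(1, "%s", argv[1]);
        p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                err(1, "mmap");
        pgsz = sysconf(_SC_PAGESIZE);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < st.st_size; i += pgsz)
                sink = p[i];    /* fault each page in, as cmp(1) does */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        (void)sink;
        secs = t1.tv_sec - t0.tv_sec + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("%.1f MB/s\n", st.st_size / 1048576.0 / secs);
        return (0);
}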
diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index 6393399..83d3609 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -3704,8 +4070,7 @@ vfs_bio_set_valid(struct buf *bp, int base, int size)
 void
 vfs_bio_clrbuf(struct buf *bp)
 {
-        int i, j, mask;
-        caddr_t sa, ea;
+        int i, j, mask, sa, ea, slide;
 
         if ((bp->b_flags & (B_VMIO | B_MALLOC)) != B_VMIO) {
                 clrbuf(bp);
@@ -3723,39 +4088,69 @@ vfs_bio_clrbuf(struct buf *bp)
                 if ((bp->b_pages[0]->valid & mask) == mask)
                         goto unlock;
                 if ((bp->b_pages[0]->valid & mask) == 0) {
-                        bzero(bp->b_data, bp->b_bufsize);
+                        pmap_zero_page_area(bp->b_pages[0], 0, bp->b_bufsize);
                         bp->b_pages[0]->valid |= mask;
                         goto unlock;
                 }
         }
-        ea = sa = bp->b_data;
-        for(i = 0; i < bp->b_npages; i++, sa = ea) {
-                ea = (caddr_t)trunc_page((vm_offset_t)sa + PAGE_SIZE);
-                ea = (caddr_t)(vm_offset_t)ulmin(
-                    (u_long)(vm_offset_t)ea,
-                    (u_long)(vm_offset_t)bp->b_data + bp->b_bufsize);
+        sa = bp->b_offset & PAGE_MASK;
+        slide = 0;
+        for (i = 0; i < bp->b_npages; i++) {
+                slide = imin(slide + PAGE_SIZE, bp->b_bufsize + sa);
+                ea = slide & PAGE_MASK;
+                if (ea == 0)
+                        ea = PAGE_SIZE;
                 if (bp->b_pages[i] == bogus_page)
                         continue;
-                j = ((vm_offset_t)sa & PAGE_MASK) / DEV_BSIZE;
+                j = sa / DEV_BSIZE;
                 mask = ((1 << ((ea - sa) / DEV_BSIZE)) - 1) << j;
                 VM_OBJECT_LOCK_ASSERT(bp->b_pages[i]->object, MA_OWNED);
                 if ((bp->b_pages[i]->valid & mask) == mask)
                         continue;
                 if ((bp->b_pages[i]->valid & mask) == 0)
-                        bzero(sa, ea - sa);
+                        pmap_zero_page_area(bp->b_pages[i], sa, ea - sa);
                 else {
                         for (; sa < ea; sa += DEV_BSIZE, j++) {
-                                if ((bp->b_pages[i]->valid & (1 << j)) == 0)
-                                        bzero(sa, DEV_BSIZE);
+                                if ((bp->b_pages[i]->valid & (1 << j)) == 0) {
+                                        pmap_zero_page_area(bp->b_pages[i],
+                                            sa, DEV_BSIZE);
+                                }
                         }
                 }
                 bp->b_pages[i]->valid |= mask;
+                sa = 0;
         }
 unlock:
         VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
         bp->b_resid = 0;
 }
 
+void
+vfs_bio_bzero_buf(struct buf *bp, int base, int size)
+{
+        vm_page_t m;
+        int i, n;
+
+        if ((bp->b_flags & B_UNMAPPED) == 0) {
+                BUF_CHECK_MAPPED(bp);
+                bzero(bp->b_data + base, size);
+        } else {
+                BUF_CHECK_UNMAPPED(bp);
+                n = PAGE_SIZE - (base & PAGE_MASK);
+                VM_OBJECT_LOCK(bp->b_bufobj->bo_object);
+                for (i = base / PAGE_SIZE; size > 0 && i < bp->b_npages; ++i) {
+                        m = bp->b_pages[i];
+                        if (n > size)
+                                n = size;
+                        pmap_zero_page_area(m, base & PAGE_MASK, n);
+                        base += n;
+                        size -= n;
+                        n = PAGE_SIZE;
+                }
+                VM_OBJECT_UNLOCK(bp->b_bufobj->bo_object);
+        }
+}
+
 /*
  * vm_hold_load_pages and vm_hold_free_pages get pages into
  * a buffers address space.  The pages are anonymous and are
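The point of adding vfs_bio_bzero_buf() is that filesystem code can
zero a byte range of a buffer without knowing whether the buffer is
mapped. A call site conversion would have this shape (a hypothetical
example, not part of the patch):

-        bzero((char *)bp->b_data + boff, xfersize);
+        vfs_bio_bzero_buf(bp, boff, xfersize);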