sa(4) 9.2->10.1, nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split request

Kenneth D. Merry ken at FreeBSD.ORG
Fri Oct 24 23:07:30 UTC 2014


On Thu, Oct 23, 2014 at 20:53:06 +0200, Harald Schmalzbauer wrote:
>  Hello,
> 
> I read about the changes in sa(4) regarding large-block splitting
> and the transitional 'kern.cam.sa.allow_io_split' workaround.
> 
> I'm using bacula (7.0.5) and my previously necessary multi-blocking
> adjustments like "Minimum block size = 2097152" obviously didn't work
> with FreeBSD 10.1 anymore.
> Good news is, they are not needed any more!
> With the default of 126 blocks (64512) I get 60-140MB/s with btape(8)'s
> speed test on my LTO4 (HH) drive and another quick test showed that
> using mbuffer(1) for zfs(8) 'send' isn't needed anymore (| dd
> of=/dev/nsa0 bs=64512 seems to max out LTO4 speed). [with FreeBSD 9 the
> transfer rates were some magnitudes lower with these block size settings!]
> 
> Not so good news is that bacula can't read the tape's label.
> Labeling a tape (with 'label' at bconsole(8) or btape(8)) is
> successful, and btape(8)'s 'readlabel' partially displays the correct
> label, but not the very beginning of the label:
> Volume Label:
> Id : **error**VerNo
> ... rest OK
> 
> While it should read:
> Volume Label:
> Id : Bacula 1.0 immortal
> VerNo : 11
> ...
> 
> When btape(8) starts to read the label, the _subject's error is reported_:
> *nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split
> request*

What blocksize are you using with btape(8)?

What kind of controller are you using?

The reason you get that error message is that the sa(4) driver goes through
physio(9) to get buffers from userland into the kernel.  physio(9) relies
on the vmapbuf()/vunmapbuf() routines to map buffers in and out of the
kernel.

vmapbuf() operates with a page granularity.  The address to be mapped has
to start on a page boundary.  It also uses kernel virtual address segments
that are MAXPHYS in size.  On x86 boxes at least, MAXPHYS is 128KB.

So if you use a blocksize of 128KB, but pass in a pointer that doesn't
start on a page boundary, vmapbuf() will have to map 33 pages instead of
32.  In your case, it will have to start at page address 0x803135000, and
will need 33 4KB pages (132KB), which is greater than the 128KB that fits
in one MAXPHYS-sized segment.
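
The page arithmetic above can be checked with a few lines of Python.  This
is only a sketch of the calculation, not of what vmapbuf() actually does;
the function names are made up for illustration, and the 4KB page size and
128KB MAXPHYS are the values assumed in the text above.

```python
PAGE_SIZE = 4096        # 4KB pages, as assumed in the text above
MAXPHYS = 128 * 1024    # 128KB, the x86 value mentioned above


def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Number of pages touched by the byte range [addr, addr + length)."""
    first_page = addr // page_size
    last_page = (addr + length - 1) // page_size
    return last_page - first_page + 1


def fits_in_one_mapping(addr, length, maxphys=MAXPHYS, page_size=PAGE_SIZE):
    """True if the pages covering the range fit in maxphys bytes."""
    return pages_spanned(addr, length, page_size) * page_size <= maxphys


ptr = 0x803135040                       # pointer from the error message
page_start = ptr - (ptr % PAGE_SIZE)    # round down to the page boundary

print(hex(page_start))                            # 0x803135000
print(pages_spanned(ptr, 128 * 1024))             # 33
print(pages_spanned(page_start, 128 * 1024))      # 32
print(fits_in_one_mapping(ptr, 128 * 1024))       # False
```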

This behavior obviously isn't very user friendly. 

If you want to avoid the problem, try setting your blocksize in Bacula to
4K less than what is reported in kern.cam.sa.0.maxio.  If it's 131072, then
set the blocksize to 126976.
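
To see why one page less is enough, here is a small sketch of the
calculation.  The helper names are hypothetical, and the 4KB page size is
an assumption; substitute whatever kern.cam.sa.0.maxio reports on your
system.

```python
def largest_safe_blocksize(maxio, page_size=4096):
    """Block size that fits in maxio bytes of pages at any buffer offset."""
    return maxio - page_size


def pages_spanned(addr, length, page_size=4096):
    """Number of pages touched by the byte range [addr, addr + length)."""
    return (addr + length - 1) // page_size - addr // page_size + 1


maxio = 131072                          # example value from the text above
blocksize = largest_safe_blocksize(maxio)
print(blocksize)                        # 126976

# Even with the worst-case offset into a page (4095 bytes), the buffer
# still touches no more than maxio / 4096 = 32 pages.
worst = max(pages_spanned(off, blocksize) for off in range(4096))
print(worst)                            # 32
```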

Another way to avoid the problem is to increase MAXPHYS.  Increasing it
beyond kern.cam.sa.0.cpi_maxio won't help anything.  If you increase
it too much, you can run into other problems.

That said, though, you can probably bump it to 512K without much worry.
Put this in your kernel config file and recompile/reinstall your kernel:

options         MAXPHYS="(512*1024)"
options         DFLTPHYS="(512*1024)"

The same thing applies, though -- you'll want to set your blocksize to 1
page less than kern.cam.sa.0.maxio, since Bacula isn't using page-aligned
buffers.
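
For comparison, an application that does control its own buffers could ask
for a page-aligned one.  The following is a minimal sketch in Python, using
an anonymous mmap as the buffer; it has nothing to do with how Bacula
allocates memory, and buffer_address() is a helper made up for this
example.

```python
import ctypes
import mmap


def buffer_address(buf):
    """Return the start address of a writable buffer object."""
    view = ctypes.c_char.from_buffer(buf)
    # The temporary view is released when this function returns, so the
    # mmap can be closed afterwards.
    return ctypes.addressof(view)


BLOCK = 512 * 1024                  # example block size, an assumption

buf = mmap.mmap(-1, BLOCK)          # anonymous mapping, starts on a page
addr = buffer_address(buf)
aligned = (addr % mmap.PAGESIZE == 0)
print(aligned)                      # True
buf.close()
```

With a buffer like this, a full-size block covers exactly
BLOCK / PAGESIZE pages, so there is no extra page to account for.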

> The same error shows up if I configure bacula to use a fixed block size
> of kern.cam.sa.0.maxio (131072).

At that (i.e. the physio(9)) level, variable vs. fixed block mode won't
matter.

> As expected, allowing split (with kern.cam.sa.allow_io_split in
> loader.conf) works around that problem.
> But I'd like to understand why I cannot set kern.cam.sa.0.maxio resp.
> why btape(8) doesn't work 100% correct although blocksize < sa.0.maxio

See above.  The unfortunate thing is that with the above setup, I think
you'll wind up with a bigger block and then a smaller block going onto the
tape in variable block mode at least.
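
A rough sketch of how such a split might come out, purely as an
illustration.  The splitting rule below is an assumption on my part: when
an unaligned request does not fit in one MAXPHYS-sized mapping, the first
piece is capped at MAXPHYS minus one page.  The function name, the 4KB page
size, and the 128KB MAXPHYS are likewise assumptions, and the cap from the
controller's maximum I/O size is left out.

```python
def split_request(addr, length, maxphys=128 * 1024, page_size=4096):
    """Split one request into pieces under the ASSUMED rule described above."""
    pieces = []
    while length > 0:
        poff = addr % page_size         # offset of the pointer into its page
        piece = length
        if piece + poff > maxphys:      # does not fit in a single mapping
            piece = maxphys
            if poff != 0:
                piece -= page_size      # leave room for the partial page
        pieces.append(piece)
        addr += piece
        length -= piece
    return pieces


print(split_request(0x803135040, 131072))   # [126976, 4096]
print(split_request(0x803135000, 131072))   # [131072]
```

Under that assumption, a single 128KB write from an unaligned buffer would
land on tape as one 126976-byte block followed by one 4096-byte block in
variable block mode, which the application has no way of knowing.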

This is an example of why I/O splitting is bad -- you don't have good
visibility from userland into exactly how things are getting put on tape.
The application writes out what it wants, but it doesn't know what size
blocks are hitting the tape.

> I don't have enough understanding to check the code myself, if it's a
> cam/sa(4) issue in FreeBSD or a problem in btape(8) (and also bacula
> itself, most likely the tool shares the code with bacula's storage daemon).
> 
> Any hints highly appreciated!

I have considered implementing a custom read/write routine in the sa(4)
driver to get around some of these issues, but it will require more than
just sa(4) driver modifications for everything to work optimally.

With a custom read/write routine, if we copied data into the kernel, we
could essentially allow any I/O size that the controller and tape drive
support without altering MAXPHYS.  And alignment issues wouldn't matter,
either.

The drawback is that we wouldn't be able to do unmapped I/O for drivers
that support it.  (Unless the user happened to give us a single buffer that
we could send down as an unmapped I/O.)  The unmapped I/O code doesn't
currently handle scatter/gather lists of unmapped buffers.

Another drawback to copying is the increased overhead versus unmapped
I/O.  On modern hardware, though, copying is usually more efficient than
mapping user memory into the kernel's virtual address space, because of the
TLB shootdowns that happen with the mapping operation.

For tape users with just one tape drive, the overhead wouldn't be a big
deal.  If you have lots of tape drives attached to one machine, though, it
could have a noticeable effect.

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG

