Harald Schmalzbauer
2014-Oct-23 18:53 UTC
sa(4) 9.2->10.1, nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split request
Hello,

I read about the changes in sa(4) regarding large-block splitting and the transitional 'kern.cam.sa.allow_io_split' workaround.

I'm using Bacula (7.0.5), and my previously necessary multi-blocking adjustments like "Minimum block size = 2097152" obviously didn't work with FreeBSD 10.1 anymore.
The good news is, they are not needed any more!
With the default of 126 blocks (64512) I get 60-140MB/s with btape(8)'s speed test on my LTO4 (HH) drive, and another quick test showed that using mbuffer(1) for zfs(8) 'send' isn't needed anymore (| dd of=/dev/nsa0 bs=64512 seems to max out LTO4 speed). [With FreeBSD 9 the transfer rates were some orders of magnitude lower with these block size settings!]

The not so good news is that Bacula can't read the tape's label.
Labeling a tape (with 'label' at bconsole(8) or btape(8)) is successful, and btape(8)'s 'readlabel' displays most of the label correctly, but not its very beginning:
Volume Label:
Id : **error**VerNo
...rest OK

While it should read:
Volume Label:
Id : Bacula 1.0 immortal
VerNo : 11
...

When btape(8) starts to read the label, the _subject's error is reported_:
*nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split request*

The same error shows up if I configure Bacula to use a fixed block size of kern.cam.sa.0.maxio (131072).
As expected, allowing splitting (with kern.cam.sa.allow_io_split in loader.conf) works around that problem.
But I'd like to understand why I cannot set kern.cam.sa.0.maxio, or rather why btape(8) doesn't work 100% correctly although blocksize < sa.0.maxio.

I don't have enough understanding to check the code myself to see whether it's a cam/sa(4) issue in FreeBSD or a problem in btape(8) (and also in Bacula itself; most likely the tool shares its code with Bacula's storage daemon).

Any hints highly appreciated!

Thanks,

-Harry
Kenneth D. Merry
2014-Oct-24 23:07 UTC
sa(4) 9.2->10.1, nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split request
On Thu, Oct 23, 2014 at 20:53:06 +0200, Harald Schmalzbauer wrote:
> Hello,
>
> I read about the changes in sa(4) regarding large-block splitting and
> the transitional 'kern.cam.sa.allow_io_split' workaround.
>
> I'm using Bacula (7.0.5), and my previously necessary multi-blocking
> adjustments like "Minimum block size = 2097152" obviously didn't work
> with FreeBSD 10.1 anymore.
> The good news is, they are not needed any more!
> With the default of 126 blocks (64512) I get 60-140MB/s with btape(8)'s
> speed test on my LTO4 (HH) drive, and another quick test showed that
> using mbuffer(1) for zfs(8) 'send' isn't needed anymore (| dd
> of=/dev/nsa0 bs=64512 seems to max out LTO4 speed). [With FreeBSD 9 the
> transfer rates were some orders of magnitude lower with these block
> size settings!]
>
> The not so good news is that Bacula can't read the tape's label.
> Labeling a tape (with 'label' at bconsole(8) or btape(8)) is
> successful, and btape(8)'s 'readlabel' displays most of the label
> correctly, but not its very beginning:
> Volume Label:
> Id : **error**VerNo
> ...rest OK
>
> While it should read:
> Volume Label:
> Id : Bacula 1.0 immortal
> VerNo : 11
> ...
>
> When btape(8) starts to read the label, the _subject's error is reported_:
> *nsa0.0: request ptr 0x803135040 is not on a page boundary; cannot split
> request*

What blocksize are you using with btape(8)? What kind of controller are you using?

The reason you get that error message is that the sa(4) driver goes through physio(9) to get buffers from userland into the kernel. physio(9) relies on the vmapbuf()/vunmapbuf() routines to map buffers in and out of the kernel.

vmapbuf() operates with a page granularity. The address to be mapped has to start on a page boundary. It also uses kernel virtual address segments that are MAXPHYS in size. On x86 boxes at least, MAXPHYS is 128KB.
So if you use a blocksize of 128KB, but pass in a pointer that doesn't start on a page boundary, vmapbuf() will have to map 33 pages instead of 32. In your case, it will have to start at page address 0x803135000, and will need 33 4KB pages, which is greater than 128KB.

This behavior obviously isn't very user friendly.

If you want to avoid the problem, try setting your blocksize in Bacula to 4K less than what is reported in kern.cam.sa.0.maxio. If it's 131072, then set the blocksize to 126976.

Another way to avoid the problem is to increase MAXPHYS. Increasing it beyond kern.cam.sa.0.cpi_maxio won't help anything. If you increase it too much, you can run into other problems. That said, though, you can probably bump it to 512K without much worry. Put this in your kernel config file and recompile/reinstall your kernel:

options MAXPHYS="(512*1024)"
options DFLTPHYS="(512*1024)"

The same thing applies, though -- you'll want to set your blocksize to 1 page less than kern.cam.sa.0.maxio, since Bacula isn't using page-aligned buffers.

> The same error shows up if I configure Bacula to use a fixed block size
> of kern.cam.sa.0.maxio (131072).

At that (i.e. the physio(9)) level, variable vs. fixed block mode won't matter.

> As expected, allowing splitting (with kern.cam.sa.allow_io_split in
> loader.conf) works around that problem.
> But I'd like to understand why I cannot set kern.cam.sa.0.maxio, or
> rather why btape(8) doesn't work 100% correctly although
> blocksize < sa.0.maxio.

See above.

The unfortunate thing is that with the above setup, I think you'll wind up with a bigger block and then a smaller block going onto the tape in variable block mode at least. This is an example of why I/O splitting is bad -- you don't have good visibility from userland into exactly how things are getting put on tape.
The application writes out what it wants, but it doesn't know what size blocks are hitting the tape.

> I don't have enough understanding to check the code myself to see
> whether it's a cam/sa(4) issue in FreeBSD or a problem in btape(8)
> (and also in Bacula itself; most likely the tool shares its code with
> Bacula's storage daemon).
>
> Any hints highly appreciated!

I have considered implementing a custom read/write routine in the sa(4) driver to get around some of these issues, but it will require more than just sa(4) driver modifications for everything to work optimally.

With a custom read/write routine, if we copied data into the kernel, we could essentially allow any I/O size that the controller and tape drive support without altering MAXPHYS. And alignment issues wouldn't matter, either.

The drawback is that we wouldn't be able to do unmapped I/O for drivers that support it. (Unless the user happened to give us a single buffer that we could send down as an unmapped I/O.) The unmapped I/O code doesn't currently handle scatter/gather lists of unmapped buffers.

Another drawback to copying is the increased overhead versus unmapped I/O. Although on modern hardware, copying is usually more efficient than mapping user memory into the kernel's virtual address space, because of the TLB shootdowns that happen with the mapping operation.

For tape users with just one tape drive, the overhead wouldn't be a big deal. If you have lots of tape drives attached to one machine, though, it could have a noticeable effect.

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG
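
For readers following the thread, the page arithmetic in Ken's reply can be checked with a short sketch. This is only a model of the condition he describes (a buffer must fit in a MAXPHYS-sized, page-granular mapping), not the kernel's actual vmapbuf() code. The function names are hypothetical, and the 4KB page size and 128KB MAXPHYS are the x86 values quoted in the thread.

```python
PAGE_SIZE = 4096        # 4KB pages, as quoted in the thread
MAXPHYS = 128 * 1024    # 128KB, the x86 value quoted in the thread


def pages_spanned(addr, length, page_size=PAGE_SIZE):
    """Number of pages touched by the byte range [addr, addr + length)."""
    first_page = addr // page_size
    last_page = (addr + length - 1) // page_size
    return last_page - first_page + 1


def fits_in_one_mapping(addr, length, maxphys=MAXPHYS, page_size=PAGE_SIZE):
    """True if the pages covering the buffer fit in one MAXPHYS-sized segment."""
    return pages_spanned(addr, length, page_size) * page_size <= maxphys


def largest_safe_blocksize(maxio, page_size=PAGE_SIZE):
    """Block size that fits regardless of buffer alignment: one page below maxio.

    Assumes maxio is a multiple of page_size and does not exceed MAXPHYS.
    """
    return maxio - page_size


if __name__ == "__main__":
    ptr = 0x803135040  # pointer from the error message, 0x40 bytes into its page
    print(pages_spanned(ptr, 131072))        # pages needed for a 128KB block
    print(fits_in_one_mapping(ptr, 131072))  # unaligned 128KB block
    print(fits_in_one_mapping(ptr, largest_safe_blocksize(131072)))
```

With the pointer from the error message, a 131072-byte block touches 33 pages and does not fit, while the 126976-byte block Ken suggests touches 32 pages and does. A page-aligned buffer of the full 131072 bytes would also fit, which is why the alignment of the application's buffer matters here.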