Bhyve storage improvements
lists at jnielsen.net
Fri Mar 27 20:37:36 UTC 2015
On Mar 27, 2015, at 11:43 AM, Alexander Motin <mav at freebsd.org> wrote:
> On 27.03.2015 18:47, John Nielsen wrote:
>> Does anyone have plans (or know about any) to implement virtio-scsi support in bhyve? That API does support TRIM and should retain most or all of the low-overhead virtio goodness.
> I was thinking about that (no real plans yet, just some thoughts),
> but I haven't found a good motivation or a full understanding of the
> possible use cases yet.
> I am not sure it is worth emulating the SCSI protocol in addition to
> the already-done ATA in ahci-hd and the simple block layer in
> virtio-blk just to get another, possibly faster-than-AHCI, block
> storage with TRIM/UNMAP. A really good SCSI disk emulation in CTL in
> the kernel takes about 20K lines of code. It would be pointless to
> duplicate that, and merely interfacing to it may be complicated from
> an administration standpoint. Indeed I've seen virtio-blk being
> faster than ahci-hd in some tests, but those tests were highly
> synthetic. I haven't tested it on real workloads, but I have a
> feeling the real difference may not be that large. If somebody wants
> to check -- more benchmarks are highly welcome! On the theoretical
> side I'd note that both the ATA and SCSI protocols on guests go
> through additional ATA/SCSI infrastructure (CAM in FreeBSD), absent
> in the case of the pure-block virtio-blk, so they have some more
> overhead by definition.
Agreed -- more testing is needed to see how much of a performance impact leaving TRIM dependent on AHCI emulation would actually have.
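For anyone who wants a quick starting point, a crude sequential-read timer like the untested sketch below is the sort of thing I mean. The device names are placeholders: in a FreeBSD guest a virtio-blk disk typically shows up as /dev/vtbd0 and an ahci-hd disk as /dev/ada0, so run it against otherwise identical guests:

#include <sys/time.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSZ   (1024 * 1024)   /* 1 MiB per read */
#define NREADS  1024            /* 1 GiB total */

int
main(int argc, char **argv)
{
    struct timeval start, end;
    void *buf;
    double secs;
    int fd, i;

    if (argc != 2)
        errx(1, "usage: %s /dev/<disk>", argv[0]);
    if ((fd = open(argv[1], O_RDONLY)) < 0)
        err(1, "open %s", argv[1]);
    /* Raw disk devices want sector-aligned buffers. */
    if (posix_memalign(&buf, 4096, BUFSZ) != 0)
        errx(1, "posix_memalign");

    gettimeofday(&start, NULL);
    for (i = 0; i < NREADS; i++)
        if (read(fd, buf, BUFSZ) != BUFSZ)
            err(1, "read");
    gettimeofday(&end, NULL);

    secs = (end.tv_sec - start.tv_sec) +
        (end.tv_usec - start.tv_usec) / 1e6;
    printf("%.1f MB/s\n", (NREADS * (BUFSZ / 1048576.0)) / secs);
    return (0);
}

That only exercises one access pattern, of course -- a real comparison would need random I/O and writes as well.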
> The main potential benefit I see from using virtio-scsi is the
> possibility of passing through to the client not a block device but
> some real SCSI device. That could be a local DVD writer, or remote
> iSCSI storage. The latter would be especially interesting for large
> production installations. But the main problem I see here is booting.
> To make the user-level loader boot the kernel from DVD or iSCSI,
> bhyve would have to implement its own SCSI initiator -- a small
> second copy of CAM in user-level. Booting the kernel from some other
> local block storage and then attaching to remote iSCSI storage for
> data would be much easier, but it is not convenient. It is possible
> to not connect to iSCSI directly from user-level, but instead have
> the kernel's CAM do it and then provide both a block layer for
> booting and a SCSI layer for virtio-scsi; but I am not sure it is
> good from a security point of view to let the host system see the
> virtual disks. It might work if CAM could block kernel/GEOM access
> to them, like is done for ZVOLs now with their "geom" and "dev"
> modes, though that complicates CAM and the whole infrastructure.
Yes, pass-through of disk devices opens up a number of possibilities. Would it be feasible to just have bhyve broker between a pass(4) device on the host and virtio_scsi(4) in the guest? That would require that the guest devices (be they local disks, iSCSI LUNs, etc.) be connected to the host, but I'm not sure that's a huge concern -- the host always has a high level of access to the guest's data anyway. (Plus, there's nothing preventing a guest from doing its own iSCSI, etc. after it boots.) Using the existing kernel infrastructure (CAM, the iSCSI initiator, etc.) would also remove the need to duplicate any of it in userland, wouldn't it?
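To make the brokering idea a bit more concrete, here is a rough, untested sketch of the host side using libcam against a pass(4) device. forward_cdb() and the INQUIRY in main() are placeholders of mine -- a real virtio-scsi backend would pull the CDB and data buffer out of the guest's virtqueue instead -- but the CAM plumbing is the stock cam_open_device()/cam_getccb()/cam_fill_csio()/cam_send_ccb() sequence:

#include <sys/types.h>
#include <cam/cam.h>
#include <cam/cam_ccb.h>
#include <cam/scsi/scsi_all.h>
#include <cam/scsi/scsi_message.h>
#include <camlib.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

static int
forward_cdb(struct cam_device *dev, const uint8_t *cdb, size_t cdb_len,
    uint8_t *data, size_t data_len, uint32_t dir /* CAM_DIR_IN/OUT */)
{
    union ccb *ccb;
    int error = 0;

    if ((ccb = cam_getccb(dev)) == NULL)
        return (-1);
    cam_fill_csio(&ccb->csio,
        /*retries*/ 1, /*cbfcnp*/ NULL, /*flags*/ dir,
        /*tag_action*/ MSG_SIMPLE_Q_TAG,
        /*data_ptr*/ data, /*dxfer_len*/ data_len,
        /*sense_len*/ SSD_FULL_SIZE, /*cdb_len*/ cdb_len,
        /*timeout*/ 30 * 1000);
    memcpy(ccb->csio.cdb_io.cdb_bytes, cdb, cdb_len);

    if (cam_send_ccb(dev, ccb) < 0 ||
        (ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP)
        error = -1;     /* would map to a virtio-scsi error here */
    cam_freeccb(ccb);
    return (error);
}

int
main(void)
{
    struct cam_device *dev;
    uint8_t inq[36];
    /* Standard INQUIRY, 36-byte allocation length. */
    uint8_t cdb[6] = { 0x12, 0, 0, 0, sizeof(inq), 0 };

    if ((dev = cam_open_device("/dev/pass0", O_RDWR)) == NULL)
        errx(1, "%s", cam_errbuf);
    if (forward_cdb(dev, cdb, sizeof(cdb), inq, sizeof(inq),
        CAM_DIR_IN) != 0)
        errx(1, "INQUIRY failed");
    printf("vendor/product: %.24s\n", inq + 8);
    cam_close_device(dev);
    return (0);
}

(Link with -lcam; /dev/pass0 is just an example.) All of the sense-data and error mapping a real backend would need is omitted, but it suggests the userland side needn't be a whole second CAM.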
The user-level loader is necessary for now, but once UEFI support exists in bhyve the external loader can go away. Any workarounds like the ones you describe above would similarly be temporary.
Using Qemu+KVM on Linux as a comparison point, there are examples of both kernel-level and user-level access by the host to guest disks. Local disk images (be they raw or qcow2) are obviously manipulated by the Qemu process from userland. RBD (the Ceph/RADOS network block device) is handled in userland. SRP (SCSI RDMA Protocol) is handled in the kernel. There are a few ways to do host- and/or kernel-based iSCSI, and there is also a userland option if you link Qemu against libiscsi when you build it. If we ever do want userland iSCSI support, libiscsi does claim to be "pure POSIX" and to have been tested on FreeBSD, among others.
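For illustration, here is roughly what a minimal userland connection looks like with libiscsi's synchronous API -- untested, and the target URL and initiator IQN are made up:

#include <stdio.h>
#include <stdlib.h>
#include <iscsi/iscsi.h>
#include <iscsi/scsi-lowlevel.h>

int
main(void)
{
    const char *url_str =
        "iscsi://192.0.2.10/iqn.2015-03.net.example:target0/0";
    struct iscsi_context *iscsi;
    struct iscsi_url *url;
    struct scsi_task *task;
    struct scsi_readcapacity10 *rc10;

    iscsi = iscsi_create_context("iqn.2015-03.net.example:initiator");
    if (iscsi == NULL)
        exit(1);
    if ((url = iscsi_parse_full_url(iscsi, url_str)) == NULL) {
        fprintf(stderr, "%s\n", iscsi_get_error(iscsi));
        exit(1);
    }
    iscsi_set_targetname(iscsi, url->target);
    iscsi_set_session_type(iscsi, ISCSI_SESSION_NORMAL);
    iscsi_set_header_digest(iscsi, ISCSI_HEADER_DIGEST_NONE_CRC32C);
    if (iscsi_full_connect_sync(iscsi, url->portal, url->lun) != 0) {
        fprintf(stderr, "%s\n", iscsi_get_error(iscsi));
        exit(1);
    }
    /* Issue a READ CAPACITY(10) to prove the LUN is reachable. */
    task = iscsi_readcapacity10_sync(iscsi, url->lun, 0, 0);
    if (task == NULL || task->status != SCSI_STATUS_GOOD)
        exit(1);
    rc10 = scsi_datain_unmarshall(task);
    printf("%u blocks of %u bytes\n", rc10->lba + 1, rc10->block_size);
    scsi_free_scsi_task(task);
    iscsi_destroy_url(url);
    iscsi_destroy_context(iscsi);
    return (0);
}

(Link with -liscsi.) Whether something like that belongs in bhyve itself, versus leaning on the kernel initiator, is exactly the tradeoff you described.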