On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
> > These 2 patches added virtio-nvme to kernel and qemu,
> > basically modified from virtio-blk and nvme code.
> >
> > As title said, request for your comments.
> >
> > Play it in Qemu with:
> > -drive file=disk.img,format=raw,if=none,id=D22 \
> > -device virtio-nvme-pci,drive=D22,serial=1234,num_queues=4
> >
> > The goal is to have a full NVMe stack from VM guest(virtio-nvme)
> > to host(vhost_nvme) to LIO NVMe-over-fabrics target.
>
> Why is a virtio-nvme guest device needed?  I guess there must either
> be NVMe-only features that you want to pass through, or you think the
> performance will be significantly better than virtio-blk/virtio-scsi?

It simply passes through NVMe commands.
Right now performance is poor. Performance tuning is on my todo list.
It should be as good as virtio-blk/virtio-scsi.

> At first glance it seems like the virtio_nvme guest driver is just
> another block driver like virtio_blk, so I'm not clear why a
> virtio-nvme device makes sense.

I think the future "LIO NVMe target" will only speak the NVMe protocol.

Nick (CCed), could you correct me if I'm wrong?

For the SCSI stack, we have:
  virtio-scsi (guest)
  tcm_vhost (or vhost_scsi, host)
  LIO-scsi-target

For the NVMe stack, we'll have similar components:
  virtio-nvme (guest)
  vhost_nvme (host)
  LIO-NVMe-target

> > Now there is a lot of duplicated code between linux/nvme-core.c and qemu/nvme.c.
> > The ideal result is to have a multi-level NVMe stack (similar to SCSI),
> > so we can re-use the nvme code, for example:
> >
> >              .-------------------------.
> >              | NVMe device register    |
> > Upper level  | NVMe protocol process   |
> >              |                         |
> >              '-------------------------'
> >
> >              .-----------.  .-----------.  .-------------------.
> > Lower level  |   PCIe    |  |  VIRTIO   |  | NVMe over Fabrics |
> >              |           |  |           |  | initiator         |
> >              '-----------'  '-----------'  '-------------------'
>
> You mentioned LIO and SCSI.  How will NVMe over Fabrics be integrated
> into LIO?  If it is mapped to SCSI then using virtio_scsi in the guest
> and tcm_vhost should work.

I think it's not mapped to SCSI.

Nick, would you share more here?

> Please also post virtio draft specifications documenting the virtio device.

I'll do this later.

> Stefan
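For concreteness, the sketch below shows one way a guest virtio-nvme driver could hand an NVMe submission entry plus a completion slot to the host over a virtqueue, modeled on how virtio-blk queues its requests. The virtio_nvme_req layout and the virtio_nvme_queue_cmd() helper are assumptions for illustration only; the posted patches may lay the request out differently.

/*
 * Hypothetical sketch only: a guest-side request layout and queuing helper
 * modeled on virtio-blk.  Not taken from the posted virtio-nvme patches.
 */
#include <linux/virtio.h>
#include <linux/scatterlist.h>
#include <linux/gfp.h>
#include <linux/nvme.h>

struct virtio_nvme_req {
	struct nvme_command cmd;	/* guest-built NVMe SQE, device-readable */
	struct nvme_completion cqe;	/* filled in by the device, device-writable */
};

static int virtio_nvme_queue_cmd(struct virtqueue *vq,
				 struct virtio_nvme_req *req,
				 struct scatterlist *data_sg, bool is_write)
{
	struct scatterlist hdr, status, *sgs[3];
	unsigned int num_out = 0, num_in = 0;

	/* Descriptor 1: the raw NVMe command, read by the device. */
	sg_init_one(&hdr, &req->cmd, sizeof(req->cmd));
	sgs[num_out++] = &hdr;

	/* Descriptor 2: the data buffers; direction depends on the opcode. */
	if (is_write)
		sgs[num_out++] = data_sg;
	else
		sgs[num_out + num_in++] = data_sg;

	/* Descriptor 3: the NVMe completion, written back by the device. */
	sg_init_one(&status, &req->cqe, sizeof(req->cqe));
	sgs[num_out + num_in++] = &status;

	return virtqueue_add_sgs(vq, sgs, num_out, num_in, req, GFP_ATOMIC);
}

The host side (QEMU's device model or a vhost_nvme driver) would then pop the descriptor chain, execute the embedded NVMe command, and write the completion back, which is what would let the same guest driver sit on top of either target.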
On Thu, Sep 10, 2015 at 6:28 PM, Ming Lin <mlin at kernel.org> wrote:
> On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
>> On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
>> > These 2 patches added virtio-nvme to kernel and qemu,
>> > basically modified from virtio-blk and nvme code.
>> >
>> > As title said, request for your comments.
>> >
>> > Play it in Qemu with:
>> > -drive file=disk.img,format=raw,if=none,id=D22 \
>> > -device virtio-nvme-pci,drive=D22,serial=1234,num_queues=4
>> >
>> > The goal is to have a full NVMe stack from VM guest(virtio-nvme)
>> > to host(vhost_nvme) to LIO NVMe-over-fabrics target.
>>
>> Why is a virtio-nvme guest device needed?  I guess there must either
>> be NVMe-only features that you want to pass through, or you think the
>> performance will be significantly better than virtio-blk/virtio-scsi?
>
> It simply passes through NVMe commands.

I understand that.  My question is why the guest needs to send NVMe commands?

If the virtio_nvme.ko guest driver only sends read/write/flush then
there's no advantage over virtio-blk.

There must be something you are trying to achieve which is not
possible with virtio-blk or virtio-scsi.  What is that?

Stefan
On Fri, 2015-09-11 at 08:48 +0100, Stefan Hajnoczi wrote:
> On Thu, Sep 10, 2015 at 6:28 PM, Ming Lin <mlin at kernel.org> wrote:
> > On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
> >> On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
> >> > These 2 patches added virtio-nvme to kernel and qemu,
> >> > basically modified from virtio-blk and nvme code.
> >> >
> >> > As title said, request for your comments.
> >> >
> >> > Play it in Qemu with:
> >> > -drive file=disk.img,format=raw,if=none,id=D22 \
> >> > -device virtio-nvme-pci,drive=D22,serial=1234,num_queues=4
> >> >
> >> > The goal is to have a full NVMe stack from VM guest(virtio-nvme)
> >> > to host(vhost_nvme) to LIO NVMe-over-fabrics target.
> >>
> >> Why is a virtio-nvme guest device needed?  I guess there must either
> >> be NVMe-only features that you want to pass through, or you think the
> >> performance will be significantly better than virtio-blk/virtio-scsi?
> >
> > It simply passes through NVMe commands.
>
> I understand that.  My question is why the guest needs to send NVMe commands?
>
> If the virtio_nvme.ko guest driver only sends read/write/flush then
> there's no advantage over virtio-blk.
>
> There must be something you are trying to achieve which is not
> possible with virtio-blk or virtio-scsi.  What is that?

I actually learned from your virtio-scsi work:
http://www.linux-kvm.org/images/f/f5/2011-forum-virtio-scsi.pdf

Then I thought a full NVMe stack from guest to host to target seemed
reasonable.

I'm trying to achieve similar things as virtio-scsi, but with the NVMe
protocol end to end:

- Effective NVMe passthrough
- Multiple target choices: QEMU, LIO-NVMe (vhost_nvme)
- Almost unlimited scalability: thousands of namespaces per PCI device
- True NVMe device
- End-to-end Protection Information
- ...
Hi Ming & Co,

On Thu, 2015-09-10 at 10:28 -0700, Ming Lin wrote:
> On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
> > On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
> > > These 2 patches added virtio-nvme to kernel and qemu,
> > > basically modified from virtio-blk and nvme code.
> > >
> > > As title said, request for your comments.

<SNIP>

> >
> > At first glance it seems like the virtio_nvme guest driver is just
> > another block driver like virtio_blk, so I'm not clear why a
> > virtio-nvme device makes sense.
>
> I think the future "LIO NVMe target" only speaks NVMe protocol.
>
> Nick(CCed), could you correct me if I'm wrong?
>
> For SCSI stack, we have:
>   virtio-scsi (guest)
>   tcm_vhost (or vhost_scsi, host)
>   LIO-scsi-target
>
> For NVMe stack, we'll have similar components:
>   virtio-nvme (guest)
>   vhost_nvme (host)
>   LIO-NVMe-target
>

I think it's more interesting to consider a 'vhost style' driver that
can be used with unmodified nvme host OS drivers.

Dr. Hannes (CC'ed) did something like this for megasas a few years
back using specialized QEMU emulation plus an eventfd-based LIO fabric
driver, and got it working with Linux and MSFT guests.

Doing something similar for nvme would (potentially) be on par with
current virtio-scsi+vhost-scsi small-block performance for scsi-mq
guests, without the extra burden of a new command-set-specific virtio
driver.

> > > Now there is a lot of duplicated code between linux/nvme-core.c and qemu/nvme.c.
> > > The ideal result is to have a multi-level NVMe stack (similar to SCSI),
> > > so we can re-use the nvme code, for example:
> > >
> > >              .-------------------------.
> > >              | NVMe device register    |
> > > Upper level  | NVMe protocol process   |
> > >              |                         |
> > >              '-------------------------'
> > >
> > >              .-----------.  .-----------.  .-------------------.
> > > Lower level  |   PCIe    |  |  VIRTIO   |  | NVMe over Fabrics |
> > >              |           |  |           |  | initiator         |
> > >              '-----------'  '-----------'  '-------------------'
> >
> > You mentioned LIO and SCSI.  How will NVMe over Fabrics be integrated
> > into LIO?  If it is mapped to SCSI then using virtio_scsi in the guest
> > and tcm_vhost should work.
>
> I think it's not mapped to SCSI.
>
> Nick, would you share more here?
>

(Adding Dave M. CC')

So the NVMe target code needs to function in at least two different modes:

- Direct mapping of nvme backend driver provided hw queues to nvme
  fabric driver provided hw queues.

- Decoding of the NVMe command set for basic Read/Write/Flush I/O, for
  submission to existing backend drivers (eg: iblock, fileio, rd_mcp).

In the former case, it's safe to assume there is anywhere from a very
small amount of code to no code at all involved in fast-path operation.

For more involved logic like PR, ALUA, and EXTENDED_COPY, I think both
modes will still most likely handle some aspects in software, and not
entirely behind a backend nvme host hw interface.

--nab
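To make the second ("decode") mode above concrete, here is a minimal C sketch of how the three basic I/O opcodes could be mapped onto a generic backend call. The nvmet_req context and the nvmet_backend_*() helpers are hypothetical placeholders, not existing LIO or kernel interfaces; only the NVMe opcode and command-field names are real.

/*
 * Hedged sketch of the "decode" mode: translate the three basic NVMe I/O
 * opcodes into a generic backend request that an existing LIO backstore
 * (iblock, fileio, rd_mcp) could service.
 */
#include <linux/nvme.h>
#include <linux/types.h>
#include <linux/errno.h>

struct nvmet_req;	/* hypothetical per-command context */

/* Hypothetical backend entry points, not part of today's LIO API. */
int nvmet_backend_rw(struct nvmet_req *req, u64 slba, u32 nlb, bool write);
int nvmet_backend_flush(struct nvmet_req *req);

static int nvmet_decode_io_cmd(struct nvmet_req *req, struct nvme_command *cmd)
{
	switch (cmd->common.opcode) {
	case nvme_cmd_read:
	case nvme_cmd_write:
		/* NLB in the NVMe rw command is zero-based, hence the +1. */
		return nvmet_backend_rw(req,
					le64_to_cpu(cmd->rw.slba),
					le16_to_cpu(cmd->rw.length) + 1,
					cmd->common.opcode == nvme_cmd_write);
	case nvme_cmd_flush:
		return nvmet_backend_flush(req);
	default:
		/* Everything else needs NVMe-aware handling (or an error). */
		return -EOPNOTSUPP;
	}
}

The direct hw-queue mapping mode, by contrast, would skip this switch entirely and pass submission entries straight through to the backend nvme driver's queues.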
On Wed, Sep 16, 2015 at 11:10 PM, Nicholas A. Bellinger
<nab at linux-iscsi.org> wrote:
> Hi Ming & Co,

Hi Nic,

> On Thu, 2015-09-10 at 10:28 -0700, Ming Lin wrote:
>> On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
>> > On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
>> > > These 2 patches added virtio-nvme to kernel and qemu,
>> > > basically modified from virtio-blk and nvme code.
>> > >
>> > > As title said, request for your comments.
>
> <SNIP>
>
>> >
>> > At first glance it seems like the virtio_nvme guest driver is just
>> > another block driver like virtio_blk, so I'm not clear why a
>> > virtio-nvme device makes sense.
>>
>> I think the future "LIO NVMe target" only speaks NVMe protocol.
>>
>> Nick(CCed), could you correct me if I'm wrong?
>>
>> For SCSI stack, we have:
>>   virtio-scsi (guest)
>>   tcm_vhost (or vhost_scsi, host)
>>   LIO-scsi-target
>>
>> For NVMe stack, we'll have similar components:
>>   virtio-nvme (guest)
>>   vhost_nvme (host)
>>   LIO-NVMe-target
>>
>
> I think it's more interesting to consider a 'vhost style' driver that
> can be used with unmodified nvme host OS drivers.
>
> Dr. Hannes (CC'ed) had done something like this for megasas a few years
> back using specialized QEMU emulation + eventfd based LIO fabric driver,
> and got it working with Linux + MSFT guests.

Are those patches already in qemu upstream and LIO upstream?

I found that you played with this in 2010. Is that it?

[QEMU-KVM]: Megasas + TCM_Loop + SG_IO into Windows XP guests
https://groups.google.com/forum/#!topic/linux-iscsi-target-dev/3hdaI6H3X0A

>
> Doing something similar for nvme would (potentially) be on par with
> current virtio-scsi+vhost-scsi small-block performance for scsi-mq
> guests, without the extra burden of a new command set specific virtio
> driver.
On Wed, 2015-09-16 at 23:10 -0700, Nicholas A. Bellinger wrote:
> Hi Ming & Co,
>
> On Thu, 2015-09-10 at 10:28 -0700, Ming Lin wrote:
> > On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
> > > On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
> > > > These 2 patches added virtio-nvme to kernel and qemu,
> > > > basically modified from virtio-blk and nvme code.
> > > >
> > > > As title said, request for your comments.
>
> <SNIP>
>
> > >
> > > At first glance it seems like the virtio_nvme guest driver is just
> > > another block driver like virtio_blk, so I'm not clear why a
> > > virtio-nvme device makes sense.
> >
> > I think the future "LIO NVMe target" only speaks NVMe protocol.
> >
> > Nick(CCed), could you correct me if I'm wrong?
> >
> > For SCSI stack, we have:
> >   virtio-scsi (guest)
> >   tcm_vhost (or vhost_scsi, host)
> >   LIO-scsi-target
> >
> > For NVMe stack, we'll have similar components:
> >   virtio-nvme (guest)
> >   vhost_nvme (host)
> >   LIO-NVMe-target
> >
>
> I think it's more interesting to consider a 'vhost style' driver that
> can be used with unmodified nvme host OS drivers.
>
> Dr. Hannes (CC'ed) had done something like this for megasas a few years
> back using specialized QEMU emulation + eventfd based LIO fabric driver,
> and got it working with Linux + MSFT guests.
>
> Doing something similar for nvme would (potentially) be on par with
> current virtio-scsi+vhost-scsi small-block performance for scsi-mq
> guests, without the extra burden of a new command set specific virtio
> driver.

Trying to understand it. Is it like below?

 .------------------------.   MMIO   .---------------------------------------.
 | Guest                  |--------->| Qemu                                  |
 | Unmodified NVMe driver |<---------| NVMe device simulation(eventfd based) |
 '------------------------'          '---------------------------------------'
                                               |                   ^
                                  write NVMe   |                   | notify command
                                  command      |                   | completion
                                  to eventfd   |                   | to eventfd
                                               v                   |
                                      .--------------------------------------.
                                      | Host:                                |
                                      | eventfd based LIO NVMe fabric driver |
                                      '--------------------------------------'
                                               |
                                               | nvme_queue_rq()
                                               v
                                      .--------------------------------------.
                                      |             NVMe driver              |
                                      '--------------------------------------'
                                               |
                                               v
                                      .--------------------------------------.
                                      |             NVMe device              |
                                      '--------------------------------------'
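As a concrete illustration of the eventfd-based flow in the diagram above, here is a rough sketch of what the host-side kick/completion path of such a 'vhost style' NVMe fabric driver might look like. The queue structure, the fetch/submit helpers, and the mechanism that schedules the kick work when the doorbell eventfd fires are all assumptions for illustration; only the eventfd_signal() call is an existing kernel API.

/*
 * Hypothetical sketch of the host-side kick/completion path in an
 * eventfd-based "vhost style" NVMe fabric driver.  Not existing code.
 */
#include <linux/kernel.h>
#include <linux/eventfd.h>
#include <linux/workqueue.h>
#include <linux/nvme.h>

struct nvme_vhost_queue {
	struct eventfd_ctx *call_ctx;	/* signalled to notify guest completion */
	struct work_struct  kick_work;	/* scheduled when the doorbell eventfd fires */
	/* shared SQ/CQ rings mapped from guest memory would live here */
};

/* Hypothetical helpers: pull the next SQE, submit it toward the host nvme driver. */
bool nvme_vhost_fetch_sqe(struct nvme_vhost_queue *vq, struct nvme_command *cmd);
int  nvme_vhost_submit(struct nvme_vhost_queue *vq, struct nvme_command *cmd);

static void nvme_vhost_kick_work(struct work_struct *work)
{
	struct nvme_vhost_queue *vq =
		container_of(work, struct nvme_vhost_queue, kick_work);
	struct nvme_command cmd;

	/* Drain everything the guest queued since the last doorbell write. */
	while (nvme_vhost_fetch_sqe(vq, &cmd))
		nvme_vhost_submit(vq, &cmd);
}

/* Called from the host I/O completion path once the backend finishes. */
static void nvme_vhost_complete(struct nvme_vhost_queue *vq)
{
	/* Post the CQE into the shared CQ (omitted), then notify the guest. */
	eventfd_signal(vq->call_ctx, 1);
}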