On Thu, 2015-09-17 at 17:55 -0700, Nicholas A. Bellinger wrote:
> On Thu, 2015-09-17 at 16:31 -0700, Ming Lin wrote:
> > On Wed, 2015-09-16 at 23:10 -0700, Nicholas A. Bellinger wrote:
> > > Hi Ming & Co,
> > >
> > > On Thu, 2015-09-10 at 10:28 -0700, Ming Lin wrote:
> > > > On Thu, 2015-09-10 at 15:38 +0100, Stefan Hajnoczi wrote:
> > > > > On Thu, Sep 10, 2015 at 6:48 AM, Ming Lin <mlin at kernel.org> wrote:
> > > > > > These 2 patches added virtio-nvme to kernel and qemu,
> > > > > > basically modified from virtio-blk and nvme code.
> > > > > >
> > > > > > As title said, request for your comments.
> > > > > >
> > > > > > <SNIP>
> > > > >
> > > > > At first glance it seems like the virtio_nvme guest driver is just
> > > > > another block driver like virtio_blk, so I'm not clear why a
> > > > > virtio-nvme device makes sense.
> > > >
> > > > I think the future "LIO NVMe target" only speaks NVMe protocol.
> > > >
> > > > Nick(CCed), could you correct me if I'm wrong?
> > > >
> > > > For SCSI stack, we have:
> > > > virtio-scsi(guest)
> > > > tcm_vhost(or vhost_scsi, host)
> > > > LIO-scsi-target
> > > >
> > > > For NVMe stack, we'll have similar components:
> > > > virtio-nvme(guest)
> > > > vhost_nvme(host)
> > > > LIO-NVMe-target
> > > >
> > >
> > > I think it's more interesting to consider a 'vhost style' driver that
> > > can be used with unmodified nvme host OS drivers.
> > >
> > > Dr. Hannes (CC'ed) had done something like this for megasas a few years
> > > back using specialized QEMU emulation + eventfd based LIO fabric driver,
> > > and got it working with Linux + MSFT guests.
> > >
> > > Doing something similar for nvme would (potentially) be on par with
> > > current virtio-scsi+vhost-scsi small-block performance for scsi-mq
> > > guests, without the extra burden of a new command set specific virtio
> > > driver.
> >
> > Trying to understand it.
> > Is it like below?
> >
> > .------------------------.  MMIO   .---------------------------------------.
> > | Guest                  |-------->| Qemu                                  |
> > | Unmodified NVMe driver |<--------| NVMe device simulation(eventfd based) |
> > '------------------------'         '---------------------------------------'
> >        |                                       ^
> >        | write NVMe                            | notify command
> >        | command                               | completion
> >        | to eventfd                            | to eventfd
> >        v                                       |
> >   .--------------------------------------.
> >   | Host:                                |
> >   | eventfd based LIO NVMe fabric driver |
> >   '--------------------------------------'
> >                      |
> >                      | nvme_queue_rq()
> >                      v
> >   .--------------------------------------.
> >   | NVMe driver                          |
> >   '--------------------------------------'
> >                      |
> >                      |
> >                      v
> >   .-------------------------------------.
> >   | NVMe device                         |
> >   '-------------------------------------'
> >
>
> Correct.  The LIO driver on KVM host would be handling some amount of
> NVMe host interface emulation in kernel code, and would be able to
> decode nvme Read/Write/Flush operations and translate -> submit to
> existing backend drivers.

Let me call the "eventfd based LIO NVMe fabric driver" the
"tcm_eventfd_nvme" driver.

Currently, LIO frontend drivers (iscsi, fc, vhost-scsi, etc.) talk to LIO
backend drivers (fileio, iblock, etc.) with SCSI commands.

Did you mean the "tcm_eventfd_nvme" driver needs to translate NVMe
commands to SCSI commands and then submit them to the backend driver?

But I thought the future "LIO NVMe target" could support frontend drivers
talking to backend drivers directly with NVMe commands, without translation.
Am I wrong?

> As with the nvme-over-fabric case, it would be possible to do a mapping
> between backend driver queue resources for real NVMe hardware (eg:
> target_core_nvme), but since it would still be doing close to the same
> amount of software emulation for both backend driver cases, I wouldn't
> expect there to be much performance advantage over just using normal
> submit_bio().
>
> --nab
>
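To make the two eventfd arrows in the diagram above concrete, here is a minimal
user-space sketch of the notification paths only: a "kick" eventfd signalled
after the guest's doorbell write has been trapped and the command placed in a
shared submission queue, and a "call" eventfd signalled once the completion has
been posted. This is an illustration of plain eventfd semantics under assumed
names, with both ends living in one process; the real design would use QEMU's
ioeventfd/irqfd wiring with the LIO fabric driver waiting in the kernel.

/*
 * Sketch (assumption, not the actual QEMU/LIO code) of the two eventfd
 * roles in the diagram above.
 */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
    int kick = eventfd(0, 0);   /* guest doorbell  -> host fabric driver */
    int call = eventfd(0, 0);   /* host completion -> guest interrupt    */
    uint64_t one = 1, n;

    /* "QEMU" side: the guest wrote the SQ doorbell, signal the kick fd. */
    write(kick, &one, sizeof(one));

    /* "tcm_eventfd_nvme" side: wake up, consume the kick, pretend we
     * decoded and executed the command, then signal the completion. */
    read(kick, &n, sizeof(n));
    printf("fabric driver woke up after %llu kick(s)\n",
           (unsigned long long)n);
    write(call, &one, sizeof(one));

    /* "QEMU" side again: completion arrives; a guest IRQ would be injected here. */
    read(call, &n, sizeof(n));
    printf("completion notified, would raise guest interrupt now\n");

    close(kick);
    close(call);
    return 0;
}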
On Fri, 2015-09-18 at 11:12 -0700, Ming Lin wrote:
> On Thu, 2015-09-17 at 17:55 -0700, Nicholas A. Bellinger wrote:
> > On Thu, 2015-09-17 at 16:31 -0700, Ming Lin wrote:
> > > On Wed, 2015-09-16 at 23:10 -0700, Nicholas A. Bellinger wrote:
> > > > Hi Ming & Co,

<SNIP>

> > Correct.  The LIO driver on KVM host would be handling some amount of
> > NVMe host interface emulation in kernel code, and would be able to
> > decode nvme Read/Write/Flush operations and translate -> submit to
> > existing backend drivers.
>
> Let me call the "eventfd based LIO NVMe fabric driver" the
> "tcm_eventfd_nvme" driver.
>
> Currently, LIO frontend drivers (iscsi, fc, vhost-scsi, etc.) talk to LIO
> backend drivers (fileio, iblock, etc.) with SCSI commands.
>
> Did you mean the "tcm_eventfd_nvme" driver needs to translate NVMe
> commands to SCSI commands and then submit them to the backend driver?
>

IBLOCK + FILEIO + RD_MCP don't speak SCSI; they simply process I/Os with
LBA + length based on SGL memory, or pass along a FLUSH with LBA +
length.

So once the 'tcm_eventfd_nvme' driver on the KVM host receives an nvme host
hardware frame via eventfd, it would decode the frame and send along the
Read/Write/Flush when exposing existing (non nvme native) backend
drivers.

This doesn't apply to the PSCSI backend driver of course, because it expects
to process actual SCSI CDBs.

> But I thought the future "LIO NVMe target" could support frontend drivers
> talking to backend drivers directly with NVMe commands, without translation.
>

The native target_core_nvme backend driver is not processing nvme
commands per se, but simply exposing nvme hardware queue resources to a
frontend fabric driver.

For the nvme-over-fabrics case, backend nvme submission/completion queues
are mapped to RDMA hardware queues.  So essentially the nvme physical
region page (PRP) is mapped to ib_sge->addr.

For a 'tcm_eventfd_nvme' style host-paravirt fabric case, it's less
clear how exposing native nvme backend hardware resources would work, or
if there is a significant performance benefit over just using
submit_bio() for Read/Write/Flush.

--nab
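For reference, the "decode the frame and send along the Read/Write/Flush with
LBA + length" step could look roughly like the sketch below. The struct is my
own minimal layout mirroring an NVMe submission queue entry (it is not the
kernel's nvme_rw_command): per the NVMe spec, the starting LBA is carried in
CDW10/CDW11 and the zero-based block count in CDW12[15:0], which is all a
FILEIO/IBLOCK backend needs to service the I/O.

/*
 * Sketch: decode an NVMe I/O submission queue entry into the LBA + length
 * form that the existing LIO backends already understand.
 */
#include <stdint.h>
#include <stdio.h>

#define NVME_CMD_FLUSH 0x00
#define NVME_CMD_WRITE 0x01
#define NVME_CMD_READ  0x02

struct nvme_sqe {            /* 64-byte SQ entry; only the fields we need */
    uint8_t  opcode;
    uint8_t  flags;
    uint16_t command_id;
    uint32_t nsid;
    uint64_t rsvd;
    uint64_t metadata;
    uint64_t prp1;           /* data pointer (PRP entry 1) */
    uint64_t prp2;
    uint32_t cdw10;          /* SLBA low  */
    uint32_t cdw11;          /* SLBA high */
    uint32_t cdw12;          /* NLB in bits 15:0, zero-based */
    uint32_t cdw13, cdw14, cdw15;
};

static int decode_io(const struct nvme_sqe *sqe,
                     uint64_t *lba, uint32_t *nr_blocks, int *is_write)
{
    switch (sqe->opcode) {
    case NVME_CMD_READ:
    case NVME_CMD_WRITE:
        *lba = ((uint64_t)sqe->cdw11 << 32) | sqe->cdw10;
        *nr_blocks = (sqe->cdw12 & 0xffff) + 1;   /* 0's based count */
        *is_write = (sqe->opcode == NVME_CMD_WRITE);
        return 0;
    case NVME_CMD_FLUSH:
        *lba = 0;
        *nr_blocks = 0;                           /* flush the whole namespace */
        *is_write = 0;
        return 0;
    default:
        return -1;                                /* not a simple I/O opcode */
    }
}

int main(void)
{
    struct nvme_sqe sqe = { .opcode = NVME_CMD_READ,
                            .cdw10 = 0x1000, .cdw11 = 0, .cdw12 = 7 };
    uint64_t lba; uint32_t nlb; int is_write;

    if (decode_io(&sqe, &lba, &nlb, &is_write) == 0)
        printf("%s lba=%llu blocks=%u\n", is_write ? "WRITE" : "READ",
               (unsigned long long)lba, nlb);
    return 0;
}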
On Fri, Sep 18, 2015 at 2:09 PM, Nicholas A. Bellinger
<nab at linux-iscsi.org> wrote:
> On Fri, 2015-09-18 at 11:12 -0700, Ming Lin wrote:
>> On Thu, 2015-09-17 at 17:55 -0700, Nicholas A. Bellinger wrote:
>> > On Thu, 2015-09-17 at 16:31 -0700, Ming Lin wrote:
>> > > On Wed, 2015-09-16 at 23:10 -0700, Nicholas A. Bellinger wrote:
>> > > > Hi Ming & Co,
>
> <SNIP>
>
> The native target_core_nvme backend driver is not processing nvme
> commands per se, but simply exposing nvme hardware queue resources to a
> frontend fabric driver.
>
> For the nvme-over-fabrics case, backend nvme submission/completion queues
> are mapped to RDMA hardware queues.  So essentially the nvme physical
> region page (PRP) is mapped to ib_sge->addr.
>
> For a 'tcm_eventfd_nvme' style host-paravirt fabric case, it's less
> clear how exposing native nvme backend hardware resources would work, or
> if there is a significant performance benefit over just using
> submit_bio() for Read/Write/Flush.

Now it's much clearer. I'll do a tcm_eventfd_nvme prototype.

Thanks for all the detailed explanation.

>
> --nab
>
On Fri, 2015-09-18 at 14:09 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2015-09-18 at 11:12 -0700, Ming Lin wrote:
> > On Thu, 2015-09-17 at 17:55 -0700, Nicholas A. Bellinger wrote:
> > > On Thu, 2015-09-17 at 16:31 -0700, Ming Lin wrote:
> > > > On Wed, 2015-09-16 at 23:10 -0700, Nicholas A. Bellinger wrote:
> > > > > Hi Ming & Co,

<SNIP>

> IBLOCK + FILEIO + RD_MCP don't speak SCSI; they simply process I/Os with
> LBA + length based on SGL memory, or pass along a FLUSH with LBA +
> length.
>
> So once the 'tcm_eventfd_nvme' driver on the KVM host receives an nvme host
> hardware frame via eventfd, it would decode the frame and send along the
> Read/Write/Flush when exposing existing (non nvme native) backend
> drivers.

I read up on the vhost architecture:
http://blog.vmsplice.net/2011/09/qemu-internals-vhost-architecture.html

The nice thing is that it is not tied to KVM in any way.

For SCSI, there is "virtio-scsi" in the guest kernel and "vhost-scsi" in the
host kernel.

For NVMe, there is no "virtio-nvme" in the guest kernel (just the unmodified
NVMe driver), but I'll do a similar thing in Qemu with the vhost
infrastructure. And there is "vhost_nvme" in the host kernel.

For the "virtqueue" implementation in qemu-nvme, I'll possibly just use/copy
drivers/virtio/virtio_ring.c, the same as linux/tools/virtio/virtio_test.c
does.

A more detailed graph is below. What do you think?

 .-----------------------------------------.            .------------------------.
 | Guest(Linux, Windows, FreeBSD, Solaris) |    NVMe    | qemu                   |
 | unmodified NVMe driver                  |   command  | NVMe device emulation  |
 |                                         | ---------> | vhost + virtqueue      |
 '-----------------------------------------'            '------------------------'
                                                  |            |  ^
                              passthrough         |            |  | kick/notify
                              NVMe command        |            |  | via eventfd        userspace
                              via virtqueue       |            |  |
                                                  v            v  |
 ----------------------------------------------------------------------------------------------
                                                                                     kernel
 .-----------------------------------------------------------------------.
 | LIO frontend driver                                                   |
 |   - vhost_nvme                                                        |
 '-----------------------------------------------------------------------'
     | translate              ^
     | (NVMe command)         |
     | to                     |
     v (LBA, length)          |
 .----------------------------------------------------------------------.
 | LIO backend driver                                                   |
 |   - fileio (/mnt/xxx.file)                                           |
 |   - iblock (/dev/sda1, /dev/nvme0n1, ...)                            |
 '----------------------------------------------------------------------'
     |                        ^
     | submit_bio()           |
     v                        |
 .----------------------------------------------------------------------.
 | block layer                                                          |
 '----------------------------------------------------------------------'
     |                        ^
     |                        |
     v                        |
 .----------------------------------------------------------------------.
 | block device driver                                                  |
 '----------------------------------------------------------------------'
     |              |                |                 |
     |              |                |                 |
     v              v                v                 v
 .------------.  .-----------.  .------------.  .---------------.
 | SATA       |  | SCSI      |  | NVMe       |  | ....          |
 '------------'  '-----------'  '------------'  '---------------'
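As a rough sketch of how the QEMU side could hand the kick/notify eventfds in
the graph above to a vhost_nvme module, the snippet below follows the generic
vhost pattern from the blog post. The VHOST_* ioctls and structs are the
existing interface in <linux/vhost.h> used by vhost-net/vhost-scsi; the
"/dev/vhost-nvme" node, the single queue, and the queue depth are assumptions,
since vhost_nvme does not exist yet.

/*
 * Sketch: wire one queue's kick/call eventfds to a (hypothetical)
 * /dev/vhost-nvme device using the standard vhost ioctls.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

int main(void)
{
    /* Hypothetical device node exposed by the future vhost_nvme module. */
    int vhost = open("/dev/vhost-nvme", O_RDWR);
    if (vhost < 0) {
        perror("open /dev/vhost-nvme (hypothetical)");
        return 1;
    }

    /* Claim the vhost device for this process. */
    if (ioctl(vhost, VHOST_SET_OWNER, NULL) < 0)
        perror("VHOST_SET_OWNER");

    /* One queue as an example; a real NVMe device would set up several. */
    int kick = eventfd(0, 0);   /* signalled by QEMU when the guest rings the doorbell */
    int call = eventfd(0, 0);   /* signalled by vhost_nvme to trigger a guest interrupt */

    struct vhost_vring_state num      = { .index = 0, .num = 128 };  /* queue depth: assumption */
    struct vhost_vring_file kick_file = { .index = 0, .fd = kick };
    struct vhost_vring_file call_file = { .index = 0, .fd = call };

    if (ioctl(vhost, VHOST_SET_VRING_NUM, &num) < 0)
        perror("VHOST_SET_VRING_NUM");
    if (ioctl(vhost, VHOST_SET_VRING_KICK, &kick_file) < 0)
        perror("VHOST_SET_VRING_KICK");
    if (ioctl(vhost, VHOST_SET_VRING_CALL, &call_file) < 0)
        perror("VHOST_SET_VRING_CALL");

    /* Guest memory layout (VHOST_SET_MEM_TABLE) and vring addresses
     * (VHOST_SET_VRING_ADDR) would be programmed here as well. */

    close(call);
    close(kick);
    close(vhost);
    return 0;
}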