Nicholas A. Bellinger
2012-Jul-04 04:24 UTC
[PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
From: Nicholas Bellinger <nab at linux-iscsi.org>

Hi folks,

This series contains patches required to update tcm_vhost <-> virtio-scsi
connected hosts <-> guests to run on v3.5-rc2 mainline code. This series is
available on top of target-pending/auto-next here:

   git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost

This includes the necessary vhost changes from Stefan to get tcm_vhost
functioning, along with a virtio-scsi LUN scanning change to address a client
bug with tcm_vhost that I ran into. Also, the tcm_vhost driver has been merged
into a single source + header file that now lives under drivers/vhost/, along
with the latest tcm_vhost changes from Zhi's tcm_vhost tree.

Here are a couple of screenshots of the code in action using raw IBLOCK
backends provided by FusionIO ioDrive Duo:

   http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
   http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png

So the next steps on my end will be converting tcm_vhost to submit backend I/O
from cmwq context, along with fio benchmark numbers between
tcm_vhost/virtio-scsi and virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.

Please have a look vhost + virtio-scsi folks (mst + paolo CC'ed) and let us
know if you have any concerns.

Thanks!
--nab

Nicholas Bellinger (4):
  vhost: Add vhost_scsi specific defines
  tcm_vhost: Initial merge for vhost level target fabric driver
  virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
  virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs

Stefan Hajnoczi (2):
  vhost: Separate vhost-net features from vhost features
  vhost: make vhost work queue visible

 drivers/scsi/virtio_scsi.c |   20 +-
 drivers/vhost/Kconfig      |    6 +
 drivers/vhost/Makefile     |    1 +
 drivers/vhost/net.c        |    4 +-
 drivers/vhost/tcm_vhost.c  | 1592 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/tcm_vhost.h  |   70 ++
 drivers/vhost/vhost.c      |    5 +-
 drivers/vhost/vhost.h      |    6 +-
 drivers/virtio/virtio.c    |    5 +-
 include/linux/vhost.h      |    9 +
 include/linux/virtio.h     |    1 +
 11 files changed, 1708 insertions(+), 11 deletions(-)
 create mode 100644 drivers/vhost/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm_vhost.h

-- 
1.7.2.5
Nicholas A. Bellinger
2012-Jul-04 04:24 UTC
[PATCH 1/6] vhost: Separate vhost-net features from vhost features
From: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>

In order for other vhost devices to use the VHOST_FEATURES bits the
vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
constant.

Signed-off-by: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab at risingtidesystems.com>
---
 drivers/vhost/net.c   |    4 ++--
 drivers/vhost/vhost.h |    3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index f82a739..072cbba 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
 			return -EFAULT;
 		return vhost_net_set_backend(n, backend.index, backend.fd);
 	case VHOST_GET_FEATURES:
-		features = VHOST_FEATURES;
+		features = VHOST_NET_FEATURES;
 		if (copy_to_user(featurep, &features, sizeof features))
 			return -EFAULT;
 		return 0;
 	case VHOST_SET_FEATURES:
 		if (copy_from_user(&features, featurep, sizeof features))
 			return -EFAULT;
-		if (features & ~VHOST_FEATURES)
+		if (features & ~VHOST_NET_FEATURES)
 			return -EOPNOTSUPP;
 		return vhost_net_set_features(n, features);
 	case VHOST_RESET_OWNER:
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 8de1fd5..07b9763 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -201,7 +201,8 @@ enum {
 	VHOST_FEATURES = (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
 			 (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
 			 (1ULL << VIRTIO_RING_F_EVENT_IDX) |
-			 (1ULL << VHOST_F_LOG_ALL) |
+			 (1ULL << VHOST_F_LOG_ALL),
+	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF),
 };
-- 
1.7.2.5
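
The effect of splitting the mask can be modelled outside the kernel. What follows is a minimal userspace sketch, not the kernel code: the bit numbers are copied by hand from the virtio/vhost headers purely for illustration, and every `DEMO_`/`demo_` name is invented here. It shows that a device checking against the common mask rejects the vhost-net header bit, while vhost-net, checking against the wider mask, still accepts it.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Feature bit numbers, copied by hand for illustration only. */
enum {
	DEMO_VIRTIO_NET_F_MRG_RXBUF      = 15,
	DEMO_VIRTIO_F_NOTIFY_ON_EMPTY    = 24,
	DEMO_VHOST_F_LOG_ALL             = 26,
	DEMO_VHOST_NET_F_VIRTIO_NET_HDR  = 27,
	DEMO_VIRTIO_RING_F_INDIRECT_DESC = 28,
	DEMO_VIRTIO_RING_F_EVENT_IDX     = 29,
};

/* Bits every vhost device may offer (mirrors VHOST_FEATURES after the patch). */
#define DEMO_VHOST_FEATURES \
	((1ULL << DEMO_VIRTIO_F_NOTIFY_ON_EMPTY) | \
	 (1ULL << DEMO_VIRTIO_RING_F_INDIRECT_DESC) | \
	 (1ULL << DEMO_VIRTIO_RING_F_EVENT_IDX) | \
	 (1ULL << DEMO_VHOST_F_LOG_ALL))

/* Common bits plus the two vhost-net specific ones. */
#define DEMO_VHOST_NET_FEATURES \
	(DEMO_VHOST_FEATURES | \
	 (1ULL << DEMO_VHOST_NET_F_VIRTIO_NET_HDR) | \
	 (1ULL << DEMO_VIRTIO_NET_F_MRG_RXBUF))

/* Same shape as the VHOST_SET_FEATURES check: any unknown bit is refused. */
static int demo_check_features(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) ? -EOPNOTSUPP : 0;
}

/* Usage example: the same request against the two masks. */
static void demo_features(void)
{
	uint64_t net_hdr = 1ULL << DEMO_VHOST_NET_F_VIRTIO_NET_HDR;

	/* vhost-net keeps accepting its own bit. */
	assert(demo_check_features(net_hdr, DEMO_VHOST_NET_FEATURES) == 0);
	/* A non-net device using the common mask refuses it. */
	assert(demo_check_features(net_hdr, DEMO_VHOST_FEATURES) == -EOPNOTSUPP);
}
```

Keeping the net-only bits out of the common mask means a new device such as vhost_scsi can compare against VHOST_FEATURES directly without advertising bits it cannot honour.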
Nicholas A. Bellinger
2012-Jul-04 04:24 UTC
[PATCH 2/6] vhost: make vhost work queue visible
From: Stefan Hajnoczi <stefanha at gmail.com>

The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm. Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.

Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions. However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.

Signed-off-by: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Signed-off-by: Nicholas Bellinger <nab at linux-iscsi.org>
---
 drivers/vhost/vhost.c |    5 ++---
 drivers/vhost/vhost.h |    3 +++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 94dbd25..1aab08b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -64,7 +64,7 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
 	return 0;
 }
 
-static void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 {
 	INIT_LIST_HEAD(&work->node);
 	work->fn = fn;
@@ -137,8 +137,7 @@ void vhost_poll_flush(struct vhost_poll *poll)
 	vhost_work_flush(poll->dev, &poll->work);
 }
 
-static inline void vhost_work_queue(struct vhost_dev *dev,
-				    struct vhost_work *work)
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
 {
 	unsigned long flags;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 07b9763..1125af3 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -43,6 +43,9 @@ struct vhost_poll {
 	struct vhost_dev *dev;
 };
 
+void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
+void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
+
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
 		     unsigned long mask, struct vhost_dev *dev);
 void vhost_poll_start(struct vhost_poll *poll, struct file *file);
-- 
1.7.2.5
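
The init/queue/run pattern that this patch exposes can be illustrated with a small single-threaded userspace analogue. This is only a sketch under stated assumptions: it has no locking and no worker thread, all `demo_` names are invented, and the "queueing an already pending item is a no-op" behaviour is modelled with a flag rather than taken from the kernel's list handling. It shows a work item embedded in a device structure, queued from a completion path and later run by the worker.

```c
#include <assert.h>
#include <stddef.h>

struct demo_work;
typedef void (*demo_work_fn_t)(struct demo_work *work);

/* One unit of deferred work; embedded inside a larger device structure. */
struct demo_work {
	struct demo_work *next;
	demo_work_fn_t fn;
	int queued;
};

/* Stand-in for the device that owns the pending-work list. */
struct demo_dev {
	struct demo_work *head;
	struct demo_work **tail;
};

static void demo_dev_init(struct demo_dev *dev)
{
	dev->head = NULL;
	dev->tail = &dev->head;
}

static void demo_work_init(struct demo_work *work, demo_work_fn_t fn)
{
	work->next = NULL;
	work->fn = fn;
	work->queued = 0;
}

/* Append the item unless it is already pending. */
static void demo_work_queue(struct demo_dev *dev, struct demo_work *work)
{
	assert(work->fn != NULL);
	if (work->queued)
		return;
	work->queued = 1;
	work->next = NULL;
	*dev->tail = work;
	dev->tail = &work->next;
}

/* What the worker would do: drain the list, return how many items ran. */
static int demo_run_worker(struct demo_dev *dev)
{
	int ran = 0;

	while (dev->head) {
		struct demo_work *work = dev->head;

		dev->head = work->next;
		if (!dev->head)
			dev->tail = &dev->head;
		work->queued = 0;
		work->fn(work);
		ran++;
	}
	return ran;
}

/* A device with an embedded completion work item, as tcm_vhost does. */
struct demo_scsi {
	int completions;
	struct demo_work completion_work;
};

/* Recover the enclosing structure from the work pointer (container_of style). */
static void demo_complete(struct demo_work *work)
{
	struct demo_scsi *s = (struct demo_scsi *)
		((char *)work - offsetof(struct demo_scsi, completion_work));

	s->completions++;
}
```

In the series itself the equivalent calls are vhost_work_init() in vhost_scsi_open() and vhost_work_queue() in vhost_scsi_complete_cmd(), with the callback running in vhost worker context where the vring is accessible.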
Nicholas A. Bellinger
2012-Jul-04 04:24 UTC
[PATCH 3/6] vhost: Add vhost_scsi specific defines
From: Nicholas Bellinger <nab at risingtidesystems.com>

This patch adds the initial vhost_scsi_ioctl() callers for
VHOST_SCSI_SET_ENDPOINT and VHOST_SCSI_CLEAR_ENDPOINT respectively, and
also adds struct vhost_vring_target that is used by tcm_vhost code when
locating target ports during qemu setup.

Signed-off-by: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab at risingtidesystems.com>
---
 include/linux/vhost.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/include/linux/vhost.h b/include/linux/vhost.h
index e847f1e..33b313b 100644
--- a/include/linux/vhost.h
+++ b/include/linux/vhost.h
@@ -24,7 +24,11 @@ struct vhost_vring_state {
 struct vhost_vring_file {
 	unsigned int index;
 	int fd; /* Pass -1 to unbind from file. */
+};
 
+struct vhost_vring_target {
+	unsigned char vhost_wwpn[224];
+	unsigned short vhost_tpgt;
 };
 
 struct vhost_vring_addr {
@@ -121,6 +125,11 @@ struct vhost_memory {
  * device. This can be used to stop the ring (e.g. for migration). */
 #define VHOST_NET_SET_BACKEND _IOW(VHOST_VIRTIO, 0x30, struct vhost_vring_file)
 
+/* VHOST_SCSI specific defines */
+
+#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct vhost_vring_target)
+#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct vhost_vring_target)
+
 /* Feature bits */
 /* Log all write descriptors. Can be changed while device is active. */
 #define VHOST_F_LOG_ALL 26
-- 
1.7.2.5
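
For readers wondering how a userspace client might drive these ioctls, here is a rough sketch. It is not taken from QEMU or from this series: the structure and request numbers are re-declared locally from the patch above, `VHOST_VIRTIO` is assumed to be 0xAF as in include/linux/vhost.h, and `demo_fill_target()` / `demo_set_endpoint()` are hypothetical helpers. The file descriptor is assumed to refer to an already opened vhost-scsi device; how that device node is named is outside what this sketch covers.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copies of the definitions added by the patch (illustration only). */
#define VHOST_VIRTIO 0xAF /* assumed ioctl magic, as in include/linux/vhost.h */

struct vhost_vring_target {
	unsigned char vhost_wwpn[224];
	unsigned short vhost_tpgt;
};

#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct vhost_vring_target)
#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct vhost_vring_target)

/*
 * Hypothetical helper: fill in the target descriptor. The buffer is zeroed
 * and the name must leave room for a terminating NUL, because the kernel
 * side compares vhost_wwpn against the configured port name with strcmp().
 * Returns 0 on success, -1 if the name does not fit.
 */
static int demo_fill_target(struct vhost_vring_target *t,
			    const char *wwpn, unsigned short tpgt)
{
	size_t len;

	assert(t != NULL && wwpn != NULL);
	len = strlen(wwpn);
	if (len >= sizeof(t->vhost_wwpn))
		return -1;
	memset(t, 0, sizeof(*t));
	memcpy(t->vhost_wwpn, wwpn, len);
	t->vhost_tpgt = tpgt;
	return 0;
}

/* Hypothetical helper: bind an open vhost-scsi fd to a WWPN + TPGT pair. */
static int demo_set_endpoint(int fd, const char *wwpn, unsigned short tpgt)
{
	struct vhost_vring_target t;

	if (demo_fill_target(&t, wwpn, tpgt) < 0)
		return -1;
	return ioctl(fd, VHOST_SCSI_SET_ENDPOINT, &t);
}
```

The WWPN string passed in has to match a target port that was already configured on the host, together with its TPGT, otherwise the lookup in vhost_scsi_set_endpoint() fails with -EINVAL.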
Nicholas A. Bellinger
2012-Jul-04 04:24 UTC
[PATCH 4/6] tcm_vhost: Initial merge for vhost level target fabric driver
From: Nicholas Bellinger <nab at linux-iscsi.org>

This patch adds the initial code for tcm_vhost, a Vhost level TCM
fabric driver for virtio SCSI initiators into KVM guest.

This code is currently up and running on v3.5-rc2 host+guest along
with the virtio-scsi vdev->scan() patch to allow a proper
scsi_scan_host() to occur once the tcm_vhost nexus has been established
by the paravirtualized virtio-scsi client.

(nab: Merge into single source + header file, and move to drivers/vhost/)

Cc: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Jens Axboe <axboe at kernel.dk>
Signed-off-by: Nicholas Bellinger <nab at linux-iscsi.org>
---
 drivers/vhost/Kconfig     |    6 +
 drivers/vhost/Makefile    |    1 +
 drivers/vhost/tcm_vhost.c | 1592 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/tcm_vhost.h |   70 ++
 4 files changed, 1669 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm_vhost.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..a8642e2 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,9 @@ config VHOST_NET
 	  To compile this driver as a module, choose M here: the module will
 	  be called vhost_net.
 
+config TCM_VHOST
+	tristate "TCM_VHOST fabric module (EXPERIMENTAL)"
+	depends on TARGET_CORE && EVENTFD && EXPERIMENTAL && m
+	default n
+	---help---
+	Say M here to enable the TCM_VHOST fabric module for use with virtio-scsi guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..b10c7b1 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
+obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
 vhost_net-y := vhost.o net.o
diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c
new file mode 100644
index 0000000..cd86633
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.c
@@ -0,0 +1,1592 @@
+/*******************************************************************************
+ * Vhost kernel TCM fabric driver for virtio SCSI initiators
+ *
+ * (C) Copyright 2010-2012 RisingTide Systems LLC.
+ * (C) Copyright 2010-2012 IBM Corp.
+ *
+ * Licensed to the Linux Foundation under the General Public License (GPL) version 2.
+ *
+ * Authors: Nicholas A. Bellinger <nab at risingtidesystems.com>
+ *          Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ * + ****************************************************************************/ + +#include <linux/module.h> +#include <linux/moduleparam.h> +#include <generated/utsrelease.h> +#include <linux/utsname.h> +#include <linux/init.h> +#include <linux/slab.h> +#include <linux/kthread.h> +#include <linux/types.h> +#include <linux/string.h> +#include <linux/configfs.h> +#include <linux/ctype.h> +#include <linux/compat.h> +#include <linux/eventfd.h> +#include <linux/vhost.h> +#include <linux/fs.h> +#include <linux/miscdevice.h> +#include <asm/unaligned.h> +#include <scsi/scsi.h> +#include <scsi/scsi_tcq.h> +#include <target/target_core_base.h> +#include <target/target_core_fabric.h> +#include <target/target_core_fabric_configfs.h> +#include <target/target_core_configfs.h> +#include <target/configfs_macros.h> +#include <linux/vhost.h> +#include <linux/virtio_net.h> /* TODO vhost.h currently depends on this */ +#include <linux/virtio_scsi.h> + +#include "vhost.c" +#include "vhost.h" +#include "tcm_vhost.h" + +struct vhost_scsi { + atomic_t vhost_ref_cnt; + struct tcm_vhost_tpg *vs_tpg; + struct vhost_dev dev; + struct vhost_virtqueue vqs[3]; + + struct vhost_work vs_completion_work; /* cmd completion work item */ + struct list_head vs_completion_list; /* cmd completion queue */ + spinlock_t vs_completion_lock; /* protects s_completion_list */ +}; + +/* Local pointer to allocated TCM configfs fabric module */ +struct target_fabric_configfs *tcm_vhost_fabric_configfs; + +/* Global spinlock to protect tcm_vhost TPG list for vhost IOCTL access */ +DEFINE_MUTEX(tcm_vhost_mutex); +LIST_HEAD(tcm_vhost_list); + +static int tcm_vhost_check_true(struct se_portal_group *se_tpg) +{ + return 1; +} + +static int tcm_vhost_check_false(struct se_portal_group *se_tpg) +{ + return 0; +} + +static char *tcm_vhost_get_fabric_name(void) +{ + return "vhost"; +} + +static u8 tcm_vhost_get_fabric_proto_ident(struct se_portal_group *se_tpg) +{ + struct tcm_vhost_tpg *tpg = container_of(se_tpg, + 
struct tcm_vhost_tpg, se_tpg); + struct tcm_vhost_tport *tport = tpg->tport; + + switch (tport->tport_proto_id) { + case SCSI_PROTOCOL_SAS: + return sas_get_fabric_proto_ident(se_tpg); + case SCSI_PROTOCOL_FCP: + return fc_get_fabric_proto_ident(se_tpg); + case SCSI_PROTOCOL_ISCSI: + return iscsi_get_fabric_proto_ident(se_tpg); + default: + pr_err("Unknown tport_proto_id: 0x%02x, using" + " SAS emulation\n", tport->tport_proto_id); + break; + } + + return sas_get_fabric_proto_ident(se_tpg); +} + +static char *tcm_vhost_get_fabric_wwn(struct se_portal_group *se_tpg) +{ + struct tcm_vhost_tpg *tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + struct tcm_vhost_tport *tport = tpg->tport; + + return &tport->tport_name[0]; +} + +u16 tcm_vhost_get_tag(struct se_portal_group *se_tpg) +{ + struct tcm_vhost_tpg *tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + return tpg->tport_tpgt; +} + +static u32 tcm_vhost_get_default_depth(struct se_portal_group *se_tpg) +{ + return 1; +} + +static u32 tcm_vhost_get_pr_transport_id( + struct se_portal_group *se_tpg, + struct se_node_acl *se_nacl, + struct t10_pr_registration *pr_reg, + int *format_code, + unsigned char *buf) +{ + struct tcm_vhost_tpg *tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + struct tcm_vhost_tport *tport = tpg->tport; + + switch (tport->tport_proto_id) { + case SCSI_PROTOCOL_SAS: + return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg, + format_code, buf); + case SCSI_PROTOCOL_FCP: + return fc_get_pr_transport_id(se_tpg, se_nacl, pr_reg, + format_code, buf); + case SCSI_PROTOCOL_ISCSI: + return iscsi_get_pr_transport_id(se_tpg, se_nacl, pr_reg, + format_code, buf); + default: + pr_err("Unknown tport_proto_id: 0x%02x, using" + " SAS emulation\n", tport->tport_proto_id); + break; + } + + return sas_get_pr_transport_id(se_tpg, se_nacl, pr_reg, + format_code, buf); +} + +static u32 tcm_vhost_get_pr_transport_id_len( + struct se_portal_group *se_tpg, + struct se_node_acl 
*se_nacl, + struct t10_pr_registration *pr_reg, + int *format_code) +{ + struct tcm_vhost_tpg *tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + struct tcm_vhost_tport *tport = tpg->tport; + + switch (tport->tport_proto_id) { + case SCSI_PROTOCOL_SAS: + return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg, + format_code); + case SCSI_PROTOCOL_FCP: + return fc_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg, + format_code); + case SCSI_PROTOCOL_ISCSI: + return iscsi_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg, + format_code); + default: + pr_err("Unknown tport_proto_id: 0x%02x, using" + " SAS emulation\n", tport->tport_proto_id); + break; + } + + return sas_get_pr_transport_id_len(se_tpg, se_nacl, pr_reg, + format_code); +} + +static char *tcm_vhost_parse_pr_out_transport_id( + struct se_portal_group *se_tpg, + const char *buf, + u32 *out_tid_len, + char **port_nexus_ptr) +{ + struct tcm_vhost_tpg *tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + struct tcm_vhost_tport *tport = tpg->tport; + + switch (tport->tport_proto_id) { + case SCSI_PROTOCOL_SAS: + return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len, + port_nexus_ptr); + case SCSI_PROTOCOL_FCP: + return fc_parse_pr_out_transport_id(se_tpg, buf, out_tid_len, + port_nexus_ptr); + case SCSI_PROTOCOL_ISCSI: + return iscsi_parse_pr_out_transport_id(se_tpg, buf, out_tid_len, + port_nexus_ptr); + default: + pr_err("Unknown tport_proto_id: 0x%02x, using" + " SAS emulation\n", tport->tport_proto_id); + break; + } + + return sas_parse_pr_out_transport_id(se_tpg, buf, out_tid_len, + port_nexus_ptr); +} + +static struct se_node_acl *tcm_vhost_alloc_fabric_acl(struct se_portal_group *se_tpg) +{ + struct tcm_vhost_nacl *nacl; + + nacl = kzalloc(sizeof(struct tcm_vhost_nacl), GFP_KERNEL); + if (!nacl) { + pr_err("Unable to alocate struct tcm_vhost_nacl\n"); + return NULL; + } + + return &nacl->se_node_acl; +} + +static void tcm_vhost_release_fabric_acl( + struct se_portal_group 
*se_tpg, + struct se_node_acl *se_nacl) +{ + struct tcm_vhost_nacl *nacl = container_of(se_nacl, + struct tcm_vhost_nacl, se_node_acl); + kfree(nacl); +} + +static u32 tcm_vhost_tpg_get_inst_index(struct se_portal_group *se_tpg) +{ + return 1; +} + +/* + * Called by struct target_core_fabric_ops->new_cmd_map() + * + * Always called in process context. A non zero return value + * here will signal to handle an exception based on the return code. + */ +static int tcm_vhost_new_cmd_map(struct se_cmd *se_cmd) +{ + struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd, + struct tcm_vhost_cmd, tvc_se_cmd); + struct scatterlist *sg_ptr, *sg_bidi_ptr = NULL; + u32 sg_no_bidi = 0; + int ret; + /* + * Allocate the necessary tasks to complete the received CDB+data + */ + ret = target_setup_cmd_from_cdb(se_cmd, tv_cmd->tvc_cdb); + if (ret != 0) + return ret; + /* + * Setup the struct scatterlist memory from the received + * struct tcm_vhost_cmd.. + */ + if (tv_cmd->tvc_sgl_count) { + sg_ptr = tv_cmd->tvc_sgl; + /* + * For BIDI commands, pass in the extra READ buffer + * to transport_generic_map_mem_to_cmd() below.. 
+ */ +/* FIXME: Fix BIDI operation in tcm_vhost_new_cmd_map() */ +#if 0 + if (se_cmd->se_cmd_flags & SCF_BIDI) { + mem_bidi_ptr = NULL; + sg_no_bidi = 0; + } +#endif + } else { + /* + * Used for DMA_NONE + */ + sg_ptr = NULL; + } + + /* Tell the core about our preallocated memory */ + return transport_generic_map_mem_to_cmd(se_cmd, sg_ptr, + tv_cmd->tvc_sgl_count, sg_bidi_ptr, + sg_no_bidi); +} + +static void tcm_vhost_release_cmd(struct se_cmd *se_cmd) +{ + return; +} + +static int tcm_vhost_shutdown_session(struct se_session *se_sess) +{ + return 0; +} + +static void tcm_vhost_close_session(struct se_session *se_sess) +{ + return; +} + +static u32 tcm_vhost_sess_get_index(struct se_session *se_sess) +{ + return 0; +} + +static int tcm_vhost_write_pending(struct se_cmd *se_cmd) +{ + /* Go ahead and process the write immediately */ + transport_generic_process_write(se_cmd); + return 0; +} + +static int tcm_vhost_write_pending_status(struct se_cmd *se_cmd) +{ + return 0; +} + +static void tcm_vhost_set_default_node_attrs(struct se_node_acl *nacl) +{ + return; +} + +static u32 tcm_vhost_get_task_tag(struct se_cmd *se_cmd) +{ + return 0; +} + +static int tcm_vhost_get_cmd_state(struct se_cmd *se_cmd) +{ + return 0; +} + +static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *); + +static int tcm_vhost_queue_data_in(struct se_cmd *se_cmd) +{ + struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd, + struct tcm_vhost_cmd, tvc_se_cmd); + vhost_scsi_complete_cmd(tv_cmd); + return 0; +} + +static int tcm_vhost_queue_status(struct se_cmd *se_cmd) +{ + struct tcm_vhost_cmd *tv_cmd = container_of(se_cmd, + struct tcm_vhost_cmd, tvc_se_cmd); + vhost_scsi_complete_cmd(tv_cmd); + return 0; +} + +static int tcm_vhost_queue_tm_rsp(struct se_cmd *se_cmd) +{ + return 0; +} + +static u16 tcm_vhost_set_fabric_sense_len(struct se_cmd *se_cmd, u32 sense_length) +{ + return 0; +} + +static u16 tcm_vhost_get_fabric_sense_len(void) +{ + return 0; +} + +static void 
vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd) +{ + struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd; + + /* TODO locking against target/backend threads? */ + transport_generic_free_cmd(se_cmd, 1); + + if (tv_cmd->tvc_sgl_count) { + u32 i; + for (i = 0; i < tv_cmd->tvc_sgl_count; i++) + put_page(sg_page(&tv_cmd->tvc_sgl[i])); + } + + kfree(tv_cmd); +} + +/* Dequeue a command from the completion list */ +static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion(struct vhost_scsi *vs) +{ + struct tcm_vhost_cmd *tv_cmd = NULL; + + spin_lock_bh(&vs->vs_completion_lock); + if (list_empty(&vs->vs_completion_list)) { + spin_unlock_bh(&vs->vs_completion_lock); + return NULL; + } + + list_for_each_entry(tv_cmd, &vs->vs_completion_list, + tvc_completion_list) { + list_del(&tv_cmd->tvc_completion_list); + break; + } + spin_unlock_bh(&vs->vs_completion_lock); + return tv_cmd; +} + +/* Fill in status and signal that we are done processing this command + * + * This is scheduled in the vhost work queue so we are called with the owner + * process mm and can access the vring. + */ +static void vhost_scsi_complete_cmd_work(struct vhost_work *work) +{ + struct vhost_scsi *vs = container_of(work, struct vhost_scsi, + vs_completion_work); + struct tcm_vhost_cmd *tv_cmd; + + while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs)) != NULL) { + struct virtio_scsi_cmd_resp v_rsp; + struct se_cmd *se_cmd = &tv_cmd->tvc_se_cmd; + int ret; + + pr_debug("%s tv_cmd %p resid %u status %#02x\n", __func__, + tv_cmd, se_cmd->residual_count, se_cmd->scsi_status); + + memset(&v_rsp, 0, sizeof(v_rsp)); + v_rsp.resid = se_cmd->residual_count; + /* TODO is status_qualifier field needed? 
*/ + v_rsp.status = se_cmd->scsi_status; + v_rsp.sense_len = se_cmd->scsi_sense_length; + memcpy(v_rsp.sense, tv_cmd->tvc_sense_buf, + v_rsp.sense_len); + ret = copy_to_user(tv_cmd->tvc_resp, &v_rsp, sizeof(v_rsp)); + if (likely(ret == 0)) + vhost_add_used(&vs->vqs[2], tv_cmd->tvc_vq_desc, 0); + else + pr_err("Faulted on virtio_scsi_cmd_resp\n"); + + vhost_scsi_free_cmd(tv_cmd); + } + + vhost_signal(&vs->dev, &vs->vqs[2]); +} + +static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *tv_cmd) +{ + struct vhost_scsi *vs = tv_cmd->tvc_vhost; + + pr_debug("%s tv_cmd %p\n", __func__, tv_cmd); + + spin_lock_bh(&vs->vs_completion_lock); + list_add_tail(&tv_cmd->tvc_completion_list, &vs->vs_completion_list); + spin_unlock_bh(&vs->vs_completion_lock); + + vhost_work_queue(&vs->dev, &vs->vs_completion_work); +} + +static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd( + struct tcm_vhost_tpg *tv_tpg, + struct virtio_scsi_cmd_req *v_req, + u32 exp_data_len, + int data_direction) +{ + struct tcm_vhost_cmd *tv_cmd; + struct tcm_vhost_nexus *tv_nexus; + struct se_portal_group *se_tpg = &tv_tpg->se_tpg; + struct se_session *se_sess; + struct se_cmd *se_cmd; + int sam_task_attr; + + tv_nexus = tv_tpg->tpg_nexus; + if (!tv_nexus) { + pr_err("Unable to locate active struct tcm_vhost_nexus\n"); + return ERR_PTR(-EIO); + } + se_sess = tv_nexus->tvn_se_sess; + + tv_cmd = kzalloc(sizeof(struct tcm_vhost_cmd), GFP_ATOMIC); + if (!tv_cmd) { + pr_err("Unable to allocate struct tcm_vhost_cmd\n"); + return ERR_PTR(-ENOMEM); + } + INIT_LIST_HEAD(&tv_cmd->tvc_completion_list); + tv_cmd->tvc_tag = v_req->tag; + + se_cmd = &tv_cmd->tvc_se_cmd; + /* + * Locate the SAM Task Attr from virtio_scsi_cmd_req + */ + sam_task_attr = v_req->task_attr; + /* + * Initialize struct se_cmd descriptor from target_core_mod infrastructure + */ + transport_init_se_cmd(se_cmd, se_tpg->se_tpg_tfo, se_sess, exp_data_len, + data_direction, sam_task_attr, + &tv_cmd->tvc_sense_buf[0]); + +#if 0 /* FIXME: 
vhost_scsi_allocate_cmd() BIDI operation */ + if (bidi) + se_cmd->se_cmd_flags |= SCF_BIDI; +#endif + /* + * From here the rest of the se_cmd will be setup and dispatched + * via tcm_vhost_new_cmd_map() from TCM backend thread context + * after transport_generic_handle_cdb_map() has been called from + * vhost_scsi_handle_vq() below.. + */ + return tv_cmd; +} + +/* + * Map a user memory range into a scatterlist + * + * Returns the number of scatterlist entries used or -errno on error. + */ +static int vhost_scsi_map_to_sgl(struct scatterlist *sgl, + unsigned int sgl_count, + void __user *ptr, size_t len, int write) +{ + struct scatterlist *sg = sgl; + unsigned int npages = 0; + int ret; + + while (len > 0) { + struct page *page; + unsigned int offset = (uintptr_t)ptr & ~PAGE_MASK; + unsigned int nbytes = min(PAGE_SIZE - offset, len); + + if (npages == sgl_count) { + ret = -ENOBUFS; + goto err; + } + + ret = get_user_pages_fast((unsigned long)ptr, 1, write, &page); + BUG_ON(ret == 0); /* we should either get our page or fail */ + if (ret < 0) + goto err; + + sg_set_page(sg, page, nbytes, offset); + ptr += nbytes; + len -= nbytes; + sg++; + npages++; + } + return npages; + +err: + /* Put pages that we hold */ + for (sg = sgl; sg != &sgl[npages]; sg++) + put_page(sg_page(sg)); + return ret; +} + +static int vhost_scsi_map_iov_to_sgl(struct tcm_vhost_cmd *tv_cmd, + struct iovec *iov, unsigned int niov, + int write) +{ + int ret; + unsigned int i; + u32 sgl_count; + struct scatterlist *sg; + + /* + * Find out how long sglist needs to be + */ + sgl_count = 0; + for (i = 0; i < niov; i++) { + sgl_count += (((uintptr_t)iov[i].iov_base + iov[i].iov_len + + PAGE_SIZE - 1) >> PAGE_SHIFT) - + ((uintptr_t)iov[i].iov_base >> PAGE_SHIFT); + } + /* TODO overflow checking */ + + sg = kmalloc(sizeof(tv_cmd->tvc_sgl[0]) * sgl_count, GFP_ATOMIC); + if (!sg) + return -ENOMEM; + pr_debug("%s sg %p sgl_count %u is_err %ld\n", __func__, + sg, sgl_count, IS_ERR(sg)); + sg_init_table(sg, 
sgl_count); + + tv_cmd->tvc_sgl = sg; + tv_cmd->tvc_sgl_count = sgl_count; + + pr_debug("Mapping %u iovecs for %u pages\n", niov, sgl_count); + for (i = 0; i < niov; i++) { + ret = vhost_scsi_map_to_sgl(sg, sgl_count, iov[i].iov_base, + iov[i].iov_len, write); + if (ret < 0) { + for (i = 0; i < tv_cmd->tvc_sgl_count; i++) + put_page(sg_page(&tv_cmd->tvc_sgl[i])); + kfree(tv_cmd->tvc_sgl); + tv_cmd->tvc_sgl = NULL; + tv_cmd->tvc_sgl_count = 0; + return ret; + } + + sg += ret; + sgl_count -= ret; + } + return 0; +} + +static void vhost_scsi_handle_vq(struct vhost_scsi *vs) +{ + struct vhost_virtqueue *vq = &vs->vqs[2]; + struct virtio_scsi_cmd_req v_req; + struct tcm_vhost_tpg *tv_tpg; + struct tcm_vhost_cmd *tv_cmd; + u32 exp_data_len, data_first, data_num, data_direction; + unsigned out, in, i; + int head, ret, lun; + + /* Must use ioctl VHOST_SCSI_SET_ENDPOINT */ + tv_tpg = vs->vs_tpg; + if (unlikely(!tv_tpg)) { + pr_err("%s endpoint not set\n", __func__); + return; + } + + mutex_lock(&vq->mutex); + vhost_disable_notify(&vs->dev, vq); + + for (;;) { + head = vhost_get_vq_desc(&vs->dev, vq, vq->iov, + ARRAY_SIZE(vq->iov), &out, &in, + NULL, NULL); + pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n", head, out, in); + /* On error, stop handling until the next kick. */ + if (unlikely(head < 0)) + break; + /* Nothing new? Wait for eventfd to tell us they refilled. 
*/
+		if (head == vq->num) {
+			if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
+				vhost_disable_notify(&vs->dev, vq);
+				continue;
+			}
+			break;
+		}
+
+/* FIXME: BIDI operation */
+		if (out == 1 && in == 1) {
+			data_direction = DMA_NONE;
+			data_first = 0;
+			data_num = 0;
+		} else if (out == 1 && in > 1) {
+			data_direction = DMA_FROM_DEVICE;
+			data_first = out + 1;
+			data_num = in - 1;
+		} else if (out > 1 && in == 1) {
+			data_direction = DMA_TO_DEVICE;
+			data_first = 1;
+			data_num = out - 1;
+		} else {
+			pr_err("Invalid buffer layout out: %u in: %u\n", out, in);
+			break;
+		}
+
+		/*
+		 * Check for a sane resp buffer so we can report errors to
+		 * the guest.
+		 */
+		if (unlikely(vq->iov[out].iov_len !=
+					sizeof(struct virtio_scsi_cmd_resp))) {
+			pr_err("Expecting virtio_scsi_cmd_resp, got %zu bytes\n",
+				vq->iov[out].iov_len);
+			break;
+		}
+
+		if (unlikely(vq->iov[0].iov_len != sizeof(v_req))) {
+			pr_err("Expecting virtio_scsi_cmd_req, got %zu bytes\n",
+				vq->iov[0].iov_len);
+			break;
+		}
+		pr_debug("Calling __copy_from_user: vq->iov[0].iov_base: %p, len: %lu\n",
+				vq->iov[0].iov_base, sizeof(v_req));
+		ret = __copy_from_user(&v_req, vq->iov[0].iov_base, sizeof(v_req));
+		if (unlikely(ret)) {
+			pr_err("Faulted on virtio_scsi_cmd_req\n");
+			break;
+		}
+
+		exp_data_len = 0;
+		for (i = 0; i < data_num; i++) {
+			exp_data_len += vq->iov[data_first + i].iov_len;
+		}
+
+		tv_cmd = vhost_scsi_allocate_cmd(tv_tpg, &v_req,
+					exp_data_len, data_direction);
+		if (IS_ERR(tv_cmd)) {
+			pr_err("vhost_scsi_allocate_cmd failed %ld\n", PTR_ERR(tv_cmd));
+			break;
+		}
+		pr_debug("Allocated tv_cmd: %p exp_data_len: %d, data_direction: %d\n",
+			tv_cmd, exp_data_len, data_direction);
+
+		tv_cmd->tvc_vhost = vs;
+
+		if (unlikely(vq->iov[out].iov_len !=
+				sizeof(struct virtio_scsi_cmd_resp))) {
+			pr_err("Expecting virtio_scsi_cmd_resp, "
+				" got %zu bytes, out: %d, in: %d\n", vq->iov[out].iov_len, out, in);
+			break;
+		}
+
+		tv_cmd->tvc_resp = vq->iov[out].iov_base;
+
+		/*
+		 * Copy in the received CDB
descriptor into tv_cmd->tvc_cdb + * that will be used by tcm_vhost_new_cmd_map() and down into + * target_setup_cmd_from_cdb() + */ + memcpy(tv_cmd->tvc_cdb, v_req.cdb, TCM_VHOST_MAX_CDB_SIZE); + /* + * Check that the recieved CDB size does not exceeded our + * hardcoded max for tcm_vhost + */ + /* TODO what if cdb was too small for varlen cdb header? */ + if (unlikely(scsi_command_size(tv_cmd->tvc_cdb) > TCM_VHOST_MAX_CDB_SIZE)) { + pr_err("Received SCSI CDB with command_size: %d that exceeds" + " SCSI_MAX_VARLEN_CDB_SIZE: %d\n", + scsi_command_size(tv_cmd->tvc_cdb), TCM_VHOST_MAX_CDB_SIZE); + break; /* TODO */ + } + lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF; + + pr_debug("vhost_scsi got command opcode: %#02x, lun: %d\n", + tv_cmd->tvc_cdb[0], lun); + + if (data_direction != DMA_NONE) { + ret = vhost_scsi_map_iov_to_sgl(tv_cmd, &vq->iov[data_first], + data_num, data_direction == DMA_TO_DEVICE); + if (unlikely(ret)) { + pr_err("Failed to map iov to sgl\n"); + break; /* TODO */ + } + } + + /* + * Save the descriptor from vhost_get_vq_desc() to be used to + * complete the virtio-scsi request in TCM callback context via + * tcm_vhost_queue_data_in() and tcm_vhost_queue_status() + */ + tv_cmd->tvc_vq_desc = head; + /* + * Locate the struct se_lun pointer based on v_req->lun, and + * attach it to struct se_cmd + */ + if (transport_lookup_cmd_lun(&tv_cmd->tvc_se_cmd, lun) < 0) { + pr_err("Failed to look up lun: %d\n", lun); + /* NON_EXISTENT_LUN */ + transport_send_check_condition_and_sense(&tv_cmd->tvc_se_cmd, + tv_cmd->tvc_se_cmd.scsi_sense_reason, 0); + continue; + } + /* + * Now queue up the newly allocated se_cmd to be processed + * within TCM thread context to finish the setup and dispatched + * into a TCM backend struct se_device. 
+ */ + transport_generic_handle_cdb_map(&tv_cmd->tvc_se_cmd); + } + + mutex_unlock(&vq->mutex); +} + +static void vhost_scsi_ctl_handle_kick(struct vhost_work *work) +{ + pr_err("%s: The handling func for control queue.\n", __func__); +} + +static void vhost_scsi_evt_handle_kick(struct vhost_work *work) +{ + pr_err("%s: The handling func for event queue.\n", __func__); +} + +static void vhost_scsi_handle_kick(struct vhost_work *work) +{ + struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue, + poll.work); + struct vhost_scsi *vs = container_of(vq->dev, struct vhost_scsi, dev); + + vhost_scsi_handle_vq(vs); +} + +/* + * Called from vhost_scsi_ioctl() context to walk the list of available tcm_vhost_tpg + * with an active struct tcm_vhost_nexus + */ +static int vhost_scsi_set_endpoint( + struct vhost_scsi *vs, + struct vhost_vring_target *t) +{ + struct tcm_vhost_tport *tv_tport; + struct tcm_vhost_tpg *tv_tpg; + int index; + + mutex_lock(&vs->dev.mutex); + /* Verify that ring has been setup correctly. */ + for (index = 0; index < vs->dev.nvqs; ++index) { + /* Verify that ring has been setup correctly. 
*/ + if (!vhost_vq_access_ok(&vs->vqs[index])) { + mutex_unlock(&vs->dev.mutex); + return -EFAULT; + } + } + + if (vs->vs_tpg) { + mutex_unlock(&vs->dev.mutex); + return -EEXIST; + } + mutex_unlock(&vs->dev.mutex); + + mutex_lock(&tcm_vhost_mutex); + list_for_each_entry(tv_tpg, &tcm_vhost_list, tv_tpg_list) { + mutex_lock(&tv_tpg->tv_tpg_mutex); + if (!tv_tpg->tpg_nexus) { + mutex_unlock(&tv_tpg->tv_tpg_mutex); + continue; + } + if (atomic_read(&tv_tpg->tv_tpg_vhost_count)) { + mutex_unlock(&tv_tpg->tv_tpg_mutex); + continue; + } + tv_tport = tv_tpg->tport; + + if (!strcmp(tv_tport->tport_name, t->vhost_wwpn) && + (tv_tpg->tport_tpgt == t->vhost_tpgt)) { + atomic_inc(&tv_tpg->tv_tpg_vhost_count); + smp_mb__after_atomic_inc(); + mutex_unlock(&tv_tpg->tv_tpg_mutex); + mutex_unlock(&tcm_vhost_mutex); + + mutex_lock(&vs->dev.mutex); + vs->vs_tpg = tv_tpg; + atomic_inc(&vs->vhost_ref_cnt); + smp_mb__after_atomic_inc(); + mutex_unlock(&vs->dev.mutex); + return 0; + } + mutex_unlock(&tv_tpg->tv_tpg_mutex); + } + mutex_unlock(&tcm_vhost_mutex); + return -EINVAL; +} + +static int vhost_scsi_clear_endpoint( + struct vhost_scsi *vs, + struct vhost_vring_target *t) +{ + struct tcm_vhost_tport *tv_tport; + struct tcm_vhost_tpg *tv_tpg; + int index; + + mutex_lock(&vs->dev.mutex); + /* Verify that ring has been setup correctly. 
*/ + for (index = 0; index < vs->dev.nvqs; ++index) { + if (!vhost_vq_access_ok(&vs->vqs[index])) { + mutex_unlock(&vs->dev.mutex); + return -EFAULT; + } + } + + if (!vs->vs_tpg) { + mutex_unlock(&vs->dev.mutex); + return -ENODEV; + } + tv_tpg = vs->vs_tpg; + tv_tport = tv_tpg->tport; + + if (strcmp(tv_tport->tport_name, t->vhost_wwpn) || + (tv_tpg->tport_tpgt != t->vhost_tpgt)) { + mutex_unlock(&vs->dev.mutex); + pr_warn("tv_tport->tport_name: %s, tv_tpg->tport_tpgt: %hu" + " does not match t->vhost_wwpn: %s, t->vhost_tpgt: %hu\n", + tv_tport->tport_name, tv_tpg->tport_tpgt, + t->vhost_wwpn, t->vhost_tpgt); + return -EINVAL; + } + atomic_dec(&tv_tpg->tv_tpg_vhost_count); + vs->vs_tpg = NULL; + mutex_unlock(&vs->dev.mutex); + + return 0; +} + +static int vhost_scsi_open(struct inode *inode, struct file *f) +{ + struct vhost_scsi *s; + int r; + + s = kzalloc(sizeof(*s), GFP_KERNEL); + if (!s) + return -ENOMEM; + + vhost_work_init(&s->vs_completion_work, vhost_scsi_complete_cmd_work); + INIT_LIST_HEAD(&s->vs_completion_list); + spin_lock_init(&s->vs_completion_lock); + + s->vqs[0].handle_kick = vhost_scsi_ctl_handle_kick; + s->vqs[1].handle_kick = vhost_scsi_evt_handle_kick; + s->vqs[2].handle_kick = vhost_scsi_handle_kick; + r = vhost_dev_init(&s->dev, s->vqs, 3); + if (r < 0) { + kfree(s); + return r; + } + + f->private_data = s; + return 0; +} + +static int vhost_scsi_release(struct inode *inode, struct file *f) +{ + struct vhost_scsi *s = f->private_data; + + if (s->vs_tpg && s->vs_tpg->tport) { + struct vhost_vring_target backend; + memcpy(backend.vhost_wwpn, s->vs_tpg->tport->tport_name, sizeof(backend.vhost_wwpn)); + backend.vhost_tpgt = s->vs_tpg->tport_tpgt; + vhost_scsi_clear_endpoint(s, &backend); + } + + vhost_dev_cleanup(&s->dev, false); + kfree(s); + return 0; +} + +static int vhost_scsi_set_features(struct vhost_scsi *vs, u64 features) +{ + if (features & ~VHOST_FEATURES) + return -EOPNOTSUPP; + + mutex_lock(&vs->dev.mutex); + if ((features & (1 << 
VHOST_F_LOG_ALL)) && + !vhost_log_access_ok(&vs->dev)) { + mutex_unlock(&vs->dev.mutex); + return -EFAULT; + } + vs->dev.acked_features = features; + /* TODO possibly smp_wmb() and flush vqs */ + mutex_unlock(&vs->dev.mutex); + return 0; +} + +static long vhost_scsi_ioctl(struct file *f, unsigned int ioctl, + unsigned long arg) +{ + struct vhost_scsi *vs = f->private_data; + struct vhost_vring_target backend; + void __user *argp = (void __user *)arg; + u64 __user *featurep = argp; + u64 features; + int r; + + switch (ioctl) { + case VHOST_SCSI_SET_ENDPOINT: + if (copy_from_user(&backend, argp, sizeof backend)) + return -EFAULT; + + return vhost_scsi_set_endpoint(vs, &backend); + case VHOST_SCSI_CLEAR_ENDPOINT: + if (copy_from_user(&backend, argp, sizeof backend)) + return -EFAULT; + + return vhost_scsi_clear_endpoint(vs, &backend); + case VHOST_GET_FEATURES: + features = VHOST_FEATURES; + if (copy_to_user(featurep, &features, sizeof features)) + return -EFAULT; + return 0; + case VHOST_SET_FEATURES: + if (copy_from_user(&features, featurep, sizeof features)) + return -EFAULT; + return vhost_scsi_set_features(vs, features); + default: + mutex_lock(&vs->dev.mutex); + r = vhost_dev_ioctl(&vs->dev, ioctl, arg); + mutex_unlock(&vs->dev.mutex); + return r; + } +} + +static const struct file_operations vhost_scsi_fops = { + .owner = THIS_MODULE, + .release = vhost_scsi_release, + .unlocked_ioctl = vhost_scsi_ioctl, + /* TODO compat ioctl? 
*/ + .open = vhost_scsi_open, + .llseek = noop_llseek, +}; + +static struct miscdevice vhost_scsi_misc = { + MISC_DYNAMIC_MINOR, + "vhost-scsi", + &vhost_scsi_fops, +}; + +static int __init vhost_scsi_register(void) +{ + return misc_register(&vhost_scsi_misc); +} + +static int vhost_scsi_deregister(void) +{ + return misc_deregister(&vhost_scsi_misc); +} + +static char *tcm_vhost_dump_proto_id(struct tcm_vhost_tport *tport) +{ + switch (tport->tport_proto_id) { + case SCSI_PROTOCOL_SAS: + return "SAS"; + case SCSI_PROTOCOL_FCP: + return "FCP"; + case SCSI_PROTOCOL_ISCSI: + return "iSCSI"; + default: + break; + } + + return "Unknown"; +} + +static int tcm_vhost_port_link( + struct se_portal_group *se_tpg, + struct se_lun *lun) +{ + struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + + atomic_inc(&tv_tpg->tv_tpg_port_count); + smp_mb__after_atomic_inc(); + + return 0; +} + +static void tcm_vhost_port_unlink( + struct se_portal_group *se_tpg, + struct se_lun *se_lun) +{ + struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + + atomic_dec(&tv_tpg->tv_tpg_port_count); + smp_mb__after_atomic_dec(); +} + +static struct se_node_acl *tcm_vhost_make_nodeacl( + struct se_portal_group *se_tpg, + struct config_group *group, + const char *name) +{ + struct se_node_acl *se_nacl, *se_nacl_new; + struct tcm_vhost_nacl *nacl; + u64 wwpn = 0; + u32 nexus_depth; + + /* tcm_vhost_parse_wwn(name, &wwpn, 1) < 0) + return ERR_PTR(-EINVAL); */ + se_nacl_new = tcm_vhost_alloc_fabric_acl(se_tpg); + if (!se_nacl_new) + return ERR_PTR(-ENOMEM); +//#warning FIXME: Hardcoded nexus depth in tcm_vhost_make_nodeacl() + nexus_depth = 1; + /* + * se_nacl_new may be released by core_tpg_add_initiator_node_acl() + * when converting a NodeACL from demo mode -> explict + */ + se_nacl = core_tpg_add_initiator_node_acl(se_tpg, se_nacl_new, + name, nexus_depth); + if (IS_ERR(se_nacl)) { + tcm_vhost_release_fabric_acl(se_tpg, se_nacl_new); + 
return se_nacl; + } + /* + * Locate our struct tcm_vhost_nacl and set the FC Nport WWPN + */ + nacl = container_of(se_nacl, struct tcm_vhost_nacl, se_node_acl); + nacl->iport_wwpn = wwpn; + /* tcm_vhost_format_wwn(&nacl->iport_name[0], TCM_VHOST_NAMELEN, wwpn); */ + + return se_nacl; +} + +static void tcm_vhost_drop_nodeacl(struct se_node_acl *se_acl) +{ + struct tcm_vhost_nacl *nacl = container_of(se_acl, + struct tcm_vhost_nacl, se_node_acl); + core_tpg_del_initiator_node_acl(se_acl->se_tpg, se_acl, 1); + kfree(nacl); +} + +static int tcm_vhost_make_nexus( + struct tcm_vhost_tpg *tv_tpg, + const char *name) +{ + struct se_portal_group *se_tpg; + struct tcm_vhost_nexus *tv_nexus; + + mutex_lock(&tv_tpg->tv_tpg_mutex); + if (tv_tpg->tpg_nexus) { + mutex_unlock(&tv_tpg->tv_tpg_mutex); + pr_debug("tv_tpg->tpg_nexus already exists\n"); + return -EEXIST; + } + se_tpg = &tv_tpg->se_tpg; + + tv_nexus = kzalloc(sizeof(struct tcm_vhost_nexus), GFP_KERNEL); + if (!tv_nexus) { + mutex_unlock(&tv_tpg->tv_tpg_mutex); + pr_err("Unable to allocate struct tcm_vhost_nexus\n"); + return -ENOMEM; + } + /* + * Initialize the struct se_session pointer + */ + tv_nexus->tvn_se_sess = transport_init_session(); + if (IS_ERR(tv_nexus->tvn_se_sess)) { + mutex_unlock(&tv_tpg->tv_tpg_mutex); + kfree(tv_nexus); + return -ENOMEM; + } + /* + * Since we are running in 'demo mode' this call with generate a + * struct se_node_acl for the tcm_vhost struct se_portal_group with + * the SCSI Initiator port name of the passed configfs group 'name'. 
+ */ + tv_nexus->tvn_se_sess->se_node_acl = core_tpg_check_initiator_node_acl( + se_tpg, (unsigned char *)name); + if (!tv_nexus->tvn_se_sess->se_node_acl) { + mutex_unlock(&tv_tpg->tv_tpg_mutex); + pr_debug("core_tpg_check_initiator_node_acl() failed" + " for %s\n", name); + transport_free_session(tv_nexus->tvn_se_sess); + kfree(tv_nexus); + return -ENOMEM; + } + /* + * Now register the TCM vHost virtual I_T Nexus as active with the + * call to __transport_register_session() + */ + __transport_register_session(se_tpg, tv_nexus->tvn_se_sess->se_node_acl, + tv_nexus->tvn_se_sess, tv_nexus); + tv_tpg->tpg_nexus = tv_nexus; + + mutex_unlock(&tv_tpg->tv_tpg_mutex); + return 0; +} + +static int tcm_vhost_drop_nexus( + struct tcm_vhost_tpg *tpg) +{ + struct se_session *se_sess; + struct tcm_vhost_nexus *tv_nexus; + + mutex_lock(&tpg->tv_tpg_mutex); + tv_nexus = tpg->tpg_nexus; + if (!tv_nexus) { + mutex_unlock(&tpg->tv_tpg_mutex); + return -ENODEV; + } + + se_sess = tv_nexus->tvn_se_sess; + if (!se_sess) { + mutex_unlock(&tpg->tv_tpg_mutex); + return -ENODEV; + } + + if (atomic_read(&tpg->tv_tpg_port_count)) { + mutex_unlock(&tpg->tv_tpg_mutex); + pr_err("Unable to remove TCM_vHost I_T Nexus with" + " active TPG port count: %d\n", + atomic_read(&tpg->tv_tpg_port_count)); + return -EPERM; + } + + if (atomic_read(&tpg->tv_tpg_vhost_count)) { + pr_err("Unable to remove TCM_vHost I_T Nexus with" + " active TPG vhost count: %d\n", + atomic_read(&tpg->tv_tpg_vhost_count)); + return -EPERM; + } + + pr_debug("TCM_vHost_ConfigFS: Removing I_T Nexus to emulated" + " %s Initiator Port: %s\n", tcm_vhost_dump_proto_id(tpg->tport), + tv_nexus->tvn_se_sess->se_node_acl->initiatorname); + /* + * Release the SCSI I_T Nexus to the emulated vHost Target Port + */ + transport_deregister_session(tv_nexus->tvn_se_sess); + tpg->tpg_nexus = NULL; + mutex_unlock(&tpg->tv_tpg_mutex); + + kfree(tv_nexus); + return 0; +} + +static ssize_t tcm_vhost_tpg_show_nexus( + struct se_portal_group *se_tpg, 
+ char *page) +{ + struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + struct tcm_vhost_nexus *tv_nexus; + ssize_t ret; + + mutex_lock(&tv_tpg->tv_tpg_mutex); + tv_nexus = tv_tpg->tpg_nexus; + if (!tv_nexus) { + mutex_unlock(&tv_tpg->tv_tpg_mutex); + return -ENODEV; + } + ret = snprintf(page, PAGE_SIZE, "%s\n", + tv_nexus->tvn_se_sess->se_node_acl->initiatorname); + mutex_unlock(&tv_tpg->tv_tpg_mutex); + + return ret; +} + +static ssize_t tcm_vhost_tpg_store_nexus( + struct se_portal_group *se_tpg, + const char *page, + size_t count) +{ + struct tcm_vhost_tpg *tv_tpg = container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + struct tcm_vhost_tport *tport_wwn = tv_tpg->tport; + unsigned char i_port[TCM_VHOST_NAMELEN], *ptr, *port_ptr; + int ret; + /* + * Shutdown the active I_T nexus if 'NULL' is passed.. + */ + if (!strncmp(page, "NULL", 4)) { + ret = tcm_vhost_drop_nexus(tv_tpg); + return (!ret) ? count : ret; + } + /* + * Otherwise make sure the passed virtual Initiator port WWN matches + * the fabric protocol_id set in tcm_vhost_make_tport(), and call + * tcm_vhost_make_nexus(). + */ + if (strlen(page) > TCM_VHOST_NAMELEN) { + pr_err("Emulated NAA Sas Address: %s, exceeds" + " max: %d\n", page, TCM_VHOST_NAMELEN); + return -EINVAL; + } + snprintf(&i_port[0], TCM_VHOST_NAMELEN, "%s", page); + + ptr = strstr(i_port, "naa."); + if (ptr) { + if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_SAS) { + pr_err("Passed SAS Initiator Port %s does not" + " match target port protoid: %s\n", i_port, + tcm_vhost_dump_proto_id(tport_wwn)); + return -EINVAL; + } + port_ptr = &i_port[0]; + goto check_newline; + } + ptr = strstr(i_port, "fc."); + if (ptr) { + if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_FCP) { + pr_err("Passed FCP Initiator Port %s does not" + " match target port protoid: %s\n", i_port, + tcm_vhost_dump_proto_id(tport_wwn)); + return -EINVAL; + } + port_ptr = &i_port[3]; /* Skip over "fc." 
*/ + goto check_newline; + } + ptr = strstr(i_port, "iqn."); + if (ptr) { + if (tport_wwn->tport_proto_id != SCSI_PROTOCOL_ISCSI) { + pr_err("Passed iSCSI Initiator Port %s does not" + " match target port protoid: %s\n", i_port, + tcm_vhost_dump_proto_id(tport_wwn)); + return -EINVAL; + } + port_ptr = &i_port[0]; + goto check_newline; + } + pr_err("Unable to locate prefix for emulated Initiator Port:" + " %s\n", i_port); + return -EINVAL; + /* + * Clear any trailing newline for the NAA WWN + */ +check_newline: + if (i_port[strlen(i_port)-1] == '\n') + i_port[strlen(i_port)-1] = '\0'; + + ret = tcm_vhost_make_nexus(tv_tpg, port_ptr); + if (ret < 0) + return ret; + + return count; +} + +TF_TPG_BASE_ATTR(tcm_vhost, nexus, S_IRUGO | S_IWUSR); + +static struct configfs_attribute *tcm_vhost_tpg_attrs[] = { + &tcm_vhost_tpg_nexus.attr, + NULL, +}; + +static struct se_portal_group *tcm_vhost_make_tpg( + struct se_wwn *wwn, + struct config_group *group, + const char *name) +{ + struct tcm_vhost_tport*tport = container_of(wwn, + struct tcm_vhost_tport, tport_wwn); + + struct tcm_vhost_tpg *tpg; + unsigned long tpgt; + int ret; + + if (strstr(name, "tpgt_") != name) + return ERR_PTR(-EINVAL); + if (strict_strtoul(name + 5, 10, &tpgt) || tpgt > UINT_MAX) + return ERR_PTR(-EINVAL); + + tpg = kzalloc(sizeof(struct tcm_vhost_tpg), GFP_KERNEL); + if (!tpg) { + pr_err("Unable to allocate struct tcm_vhost_tpg"); + return ERR_PTR(-ENOMEM); + } + mutex_init(&tpg->tv_tpg_mutex); + INIT_LIST_HEAD(&tpg->tv_tpg_list); + tpg->tport = tport; + tpg->tport_tpgt = tpgt; + + ret = core_tpg_register(&tcm_vhost_fabric_configfs->tf_ops, wwn, + &tpg->se_tpg, tpg, TRANSPORT_TPG_TYPE_NORMAL); + if (ret < 0) { + kfree(tpg); + return NULL; + } + mutex_lock(&tcm_vhost_mutex); + list_add_tail(&tpg->tv_tpg_list, &tcm_vhost_list); + mutex_unlock(&tcm_vhost_mutex); + + return &tpg->se_tpg; +} + +static void tcm_vhost_drop_tpg(struct se_portal_group *se_tpg) +{ + struct tcm_vhost_tpg *tpg = 
container_of(se_tpg, + struct tcm_vhost_tpg, se_tpg); + + mutex_lock(&tcm_vhost_mutex); + list_del(&tpg->tv_tpg_list); + mutex_unlock(&tcm_vhost_mutex); + /* + * Release the virtual I_T Nexus for this vHost TPG + */ + tcm_vhost_drop_nexus(tpg); + /* + * Deregister the se_tpg from TCM.. + */ + core_tpg_deregister(se_tpg); + kfree(tpg); +} + +static struct se_wwn *tcm_vhost_make_tport( + struct target_fabric_configfs *tf, + struct config_group *group, + const char *name) +{ + struct tcm_vhost_tport *tport; + char *ptr; + u64 wwpn = 0; + int off = 0; + + /* if (tcm_vhost_parse_wwn(name, &wwpn, 1) < 0) + return ERR_PTR(-EINVAL); */ + + tport = kzalloc(sizeof(struct tcm_vhost_tport), GFP_KERNEL); + if (!tport) { + pr_err("Unable to allocate struct tcm_vhost_tport"); + return ERR_PTR(-ENOMEM); + } + tport->tport_wwpn = wwpn; + /* tcm_vhost_format_wwn(&tport->tport_name[0], TCM_VHOST__NAMELEN, wwpn); */ + /* + * Determine the emulated Protocol Identifier and Target Port Name + * based on the incoming configfs directory name. + */ + ptr = strstr(name, "naa."); + if (ptr) { + tport->tport_proto_id = SCSI_PROTOCOL_SAS; + goto check_len; + } + ptr = strstr(name, "fc."); + if (ptr) { + tport->tport_proto_id = SCSI_PROTOCOL_FCP; + off = 3; /* Skip over "fc." 
*/ + goto check_len; + } + ptr = strstr(name, "iqn."); + if (ptr) { + tport->tport_proto_id = SCSI_PROTOCOL_ISCSI; + goto check_len; + } + + pr_err("Unable to locate prefix for emulated Target Port:" + " %s\n", name); + return ERR_PTR(-EINVAL); + +check_len: + if (strlen(name) > TCM_VHOST_NAMELEN) { + pr_err("Emulated %s Address: %s, exceeds" + " max: %d\n", name, tcm_vhost_dump_proto_id(tport), + TCM_VHOST_NAMELEN); + kfree(tport); + return ERR_PTR(-EINVAL); + } + snprintf(&tport->tport_name[0], TCM_VHOST_NAMELEN, "%s", &name[off]); + + pr_debug("TCM_VHost_ConfigFS: Allocated emulated Target" + " %s Address: %s\n", tcm_vhost_dump_proto_id(tport), name); + + return &tport->tport_wwn; +} + +static void tcm_vhost_drop_tport(struct se_wwn *wwn) +{ + struct tcm_vhost_tport *tport = container_of(wwn, + struct tcm_vhost_tport, tport_wwn); + + pr_debug("TCM_VHost_ConfigFS: Deallocating emulated Target" + " %s Address: %s\n", tcm_vhost_dump_proto_id(tport), + tport->tport_name);; + + kfree(tport); +} + +static ssize_t tcm_vhost_wwn_show_attr_version( + struct target_fabric_configfs *tf, + char *page) +{ + return sprintf(page, "TCM_VHOST fabric module %s on %s/%s" + "on "UTS_RELEASE"\n", TCM_VHOST_VERSION, utsname()->sysname, + utsname()->machine); +} + +TF_WWN_ATTR_RO(tcm_vhost, version); + +static struct configfs_attribute *tcm_vhost_wwn_attrs[] = { + &tcm_vhost_wwn_version.attr, + NULL, +}; + +static struct target_core_fabric_ops tcm_vhost_ops = { + .get_fabric_name = tcm_vhost_get_fabric_name, + .get_fabric_proto_ident = tcm_vhost_get_fabric_proto_ident, + .tpg_get_wwn = tcm_vhost_get_fabric_wwn, + .tpg_get_tag = tcm_vhost_get_tag, + .tpg_get_default_depth = tcm_vhost_get_default_depth, + .tpg_get_pr_transport_id = tcm_vhost_get_pr_transport_id, + .tpg_get_pr_transport_id_len = tcm_vhost_get_pr_transport_id_len, + .tpg_parse_pr_out_transport_id = tcm_vhost_parse_pr_out_transport_id, + .tpg_check_demo_mode = tcm_vhost_check_true, + .tpg_check_demo_mode_cache = 
tcm_vhost_check_true, + .tpg_check_demo_mode_write_protect = tcm_vhost_check_false, + .tpg_check_prod_mode_write_protect = tcm_vhost_check_false, + .tpg_alloc_fabric_acl = tcm_vhost_alloc_fabric_acl, + .tpg_release_fabric_acl = tcm_vhost_release_fabric_acl, + .tpg_get_inst_index = tcm_vhost_tpg_get_inst_index, + .new_cmd_map = tcm_vhost_new_cmd_map, + .release_cmd = tcm_vhost_release_cmd, + .shutdown_session = tcm_vhost_shutdown_session, + .close_session = tcm_vhost_close_session, + .sess_get_index = tcm_vhost_sess_get_index, + .sess_get_initiator_sid = NULL, + .write_pending = tcm_vhost_write_pending, + .write_pending_status = tcm_vhost_write_pending_status, + .set_default_node_attributes = tcm_vhost_set_default_node_attrs, + .get_task_tag = tcm_vhost_get_task_tag, + .get_cmd_state = tcm_vhost_get_cmd_state, + .queue_data_in = tcm_vhost_queue_data_in, + .queue_status = tcm_vhost_queue_status, + .queue_tm_rsp = tcm_vhost_queue_tm_rsp, + .get_fabric_sense_len = tcm_vhost_get_fabric_sense_len, + .set_fabric_sense_len = tcm_vhost_set_fabric_sense_len, + /* + * Setup function pointers for generic logic in target_core_fabric_configfs.c + */ + .fabric_make_wwn = tcm_vhost_make_tport, + .fabric_drop_wwn = tcm_vhost_drop_tport, + .fabric_make_tpg = tcm_vhost_make_tpg, + .fabric_drop_tpg = tcm_vhost_drop_tpg, + .fabric_post_link = tcm_vhost_port_link, + .fabric_pre_unlink = tcm_vhost_port_unlink, + .fabric_make_np = NULL, + .fabric_drop_np = NULL, + .fabric_make_nodeacl = tcm_vhost_make_nodeacl, + .fabric_drop_nodeacl = tcm_vhost_drop_nodeacl, +}; + +static int tcm_vhost_register_configfs(void) +{ + struct target_fabric_configfs *fabric; + int ret; + + pr_debug("TCM_VHOST fabric module %s on %s/%s" + " on "UTS_RELEASE"\n",TCM_VHOST_VERSION, utsname()->sysname, + utsname()->machine); + /* + * Register the top level struct config_item_type with TCM core + */ + fabric = target_fabric_configfs_init(THIS_MODULE, "vhost"); + if (IS_ERR(fabric)) { + 
pr_err("target_fabric_configfs_init() failed\n"); + return PTR_ERR(fabric); + } + /* + * Setup fabric->tf_ops from our local tcm_vhost_ops + */ + fabric->tf_ops = tcm_vhost_ops; + /* + * Setup default attribute lists for various fabric->tf_cit_tmpl + */ + TF_CIT_TMPL(fabric)->tfc_wwn_cit.ct_attrs = tcm_vhost_wwn_attrs; + TF_CIT_TMPL(fabric)->tfc_tpg_base_cit.ct_attrs = tcm_vhost_tpg_attrs; + TF_CIT_TMPL(fabric)->tfc_tpg_attrib_cit.ct_attrs = NULL; + TF_CIT_TMPL(fabric)->tfc_tpg_param_cit.ct_attrs = NULL; + TF_CIT_TMPL(fabric)->tfc_tpg_np_base_cit.ct_attrs = NULL; + TF_CIT_TMPL(fabric)->tfc_tpg_nacl_base_cit.ct_attrs = NULL; + TF_CIT_TMPL(fabric)->tfc_tpg_nacl_attrib_cit.ct_attrs = NULL; + TF_CIT_TMPL(fabric)->tfc_tpg_nacl_auth_cit.ct_attrs = NULL; + TF_CIT_TMPL(fabric)->tfc_tpg_nacl_param_cit.ct_attrs = NULL; + /* + * Register the fabric for use within TCM + */ + ret = target_fabric_configfs_register(fabric); + if (ret < 0) { + pr_err("target_fabric_configfs_register() failed" + " for TCM_VHOST\n"); + return ret; + } + /* + * Setup our local pointer to *fabric + */ + tcm_vhost_fabric_configfs = fabric; + pr_debug("TCM_VHOST[0] - Set fabric -> tcm_vhost_fabric_configfs\n"); + return 0; +}; + +static void tcm_vhost_deregister_configfs(void) +{ + if (!tcm_vhost_fabric_configfs) + return; + + target_fabric_configfs_deregister(tcm_vhost_fabric_configfs); + tcm_vhost_fabric_configfs = NULL; + pr_debug("TCM_VHOST[0] - Cleared tcm_vhost_fabric_configfs\n"); +}; + +static int __init tcm_vhost_init(void) +{ + int ret; + + ret = vhost_scsi_register(); + if (ret < 0) + return ret; + + ret = tcm_vhost_register_configfs(); + if (ret < 0) + return ret; + + return 0; +}; + +static void tcm_vhost_exit(void) +{ + tcm_vhost_deregister_configfs(); + vhost_scsi_deregister(); +}; + +MODULE_DESCRIPTION("TCM_VHOST series fabric driver"); +MODULE_LICENSE("GPL"); +module_init(tcm_vhost_init); +module_exit(tcm_vhost_exit); diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h 
new file mode 100644
index 0000000..0e8951b
--- /dev/null
+++ b/drivers/vhost/tcm_vhost.h
@@ -0,0 +1,70 @@
+#define TCM_VHOST_VERSION "v0.1"
+#define TCM_VHOST_NAMELEN 256
+#define TCM_VHOST_MAX_CDB_SIZE 32
+
+struct tcm_vhost_cmd {
+	/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
+	int tvc_vq_desc;
+	/* The Tag from include/linux/virtio_scsi.h:struct virtio_scsi_cmd_req */
+	u64 tvc_tag;
+	/* The number of scatterlists associated with this cmd */
+	u32 tvc_sgl_count;
+	/* Pointer to the SGL formatted memory from virtio-scsi */
+	struct scatterlist *tvc_sgl;
+	/* Pointer to response */
+	struct virtio_scsi_cmd_resp __user *tvc_resp;
+	/* Pointer to vhost_scsi for our device */
+	struct vhost_scsi *tvc_vhost;
+	/* The TCM I/O descriptor that is accessed via container_of() */
+	struct se_cmd tvc_se_cmd;
+	/* Copy of the incoming SCSI command descriptor block (CDB) */
+	unsigned char tvc_cdb[TCM_VHOST_MAX_CDB_SIZE];
+	/* Sense buffer that will be mapped into outgoing status */
+	unsigned char tvc_sense_buf[TRANSPORT_SENSE_BUFFER];
+	/* Completed commands list, serviced from vhost worker thread */
+	struct list_head tvc_completion_list;
+};
+
+struct tcm_vhost_nexus {
+	/* Pointer to TCM session for I_T Nexus */
+	struct se_session *tvn_se_sess;
+};
+
+struct tcm_vhost_nacl {
+	/* Binary World Wide unique Port Name for Vhost Initiator port */
+	u64 iport_wwpn;
+	/* ASCII formatted WWPN for Sas Initiator port */
+	char iport_name[TCM_VHOST_NAMELEN];
+	/* Returned by tcm_vhost_make_nodeacl() */
+	struct se_node_acl se_node_acl;
+};
+
+struct tcm_vhost_tpg {
+	/* Vhost port target portal group tag for TCM */
+	u16 tport_tpgt;
+	/* Used to track number of TPG Port/Lun Links wrt to explict I_T Nexus shutdown */
+	atomic_t tv_tpg_port_count;
+	/* Used for vhost_scsi device reference to tpg_nexus */
+	atomic_t tv_tpg_vhost_count;
+	/* list for tcm_vhost_list */
+	struct list_head tv_tpg_list;
+	/* Used to protect access for tpg_nexus */
+	struct mutex tv_tpg_mutex;
+	/* Pointer to the TCM VHost I_T Nexus for this TPG endpoint */
+	struct tcm_vhost_nexus *tpg_nexus;
+	/* Pointer back to tcm_vhost_tport */
+	struct tcm_vhost_tport *tport;
+	/* Returned by tcm_vhost_make_tpg() */
+	struct se_portal_group se_tpg;
+};
+
+struct tcm_vhost_tport {
+	/* SCSI protocol the tport is providing */
+	u8 tport_proto_id;
+	/* Binary World Wide unique Port Name for Vhost Target port */
+	u64 tport_wwpn;
+	/* ASCII formatted WWPN for Vhost Target port */
+	char tport_name[TCM_VHOST_NAMELEN];
+	/* Returned by tcm_vhost_make_tport() */
+	struct se_wwn tport_wwn;
+};
--
1.7.2.5
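
For readers following the request path in tcm_vhost.c, the LUN extraction in vhost_scsi_handle_vq() can be tried in isolation. What follows is a minimal userspace sketch, not code from the patch: the helper name decode_virtio_scsi_lun() is hypothetical, and it only mirrors the expression ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF applied to the 8-byte LUN field of a virtio-scsi request. The comment about the top two bits reflects my understanding of the single-level LUN layout and should be checked against the virtio-scsi specification.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): extract the 14-bit LUN
 * from bytes 2 and 3 of an 8-byte virtio-scsi LUN field, mirroring the
 * expression used in vhost_scsi_handle_vq().
 */
static uint16_t decode_virtio_scsi_lun(const uint8_t lun[8])
{
	/* Byte 2 carries the high bits, byte 3 the low bits. */
	uint16_t raw = (uint16_t)((lun[2] << 8) | lun[3]);

	/*
	 * Keep the low 14 bits. The two bits masked off are assumed here
	 * to hold the addressing-method flag of a single-level LUN.
	 */
	return raw & 0x3FFF;
}
```

With a field of { 1, 0, 0x40, 0x05, 0, 0, 0, 0 } the helper yields LUN 5. Note that byte 1, the target ID, plays no part in the result, which is relevant to the max_id discussion in patch 6/6.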
Nicholas A. Bellinger
2012-Jul-04 04:24 UTC
[PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
From: Nicholas Bellinger <nab at linux-iscsi.org>

This patch changes virtio-scsi to use a new virtio_driver->scan() callback
so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
operation, instead of from within virtscsi_probe().

This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur. This fixes a bug
with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs.

Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.

Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Hannes Reinecke <hare at suse.de>
Signed-off-by: Nicholas Bellinger <nab at linux-iscsi.org>
---
 drivers/scsi/virtio_scsi.c |   15 ++++++++++++---
 drivers/virtio/virtio.c    |    5 ++++-
 include/linux/virtio.h     |    1 +
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 1b38431..391b30d 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -481,9 +481,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
 	err = scsi_add_host(shost, &vdev->dev);
 	if (err)
 		goto scsi_add_host_failed;
-
-	scsi_scan_host(shost);
-
+	/*
+	 * scsi_scan_host() happens in virtscsi_scan() via virtio_driver->scan()
+	 * after VIRTIO_CONFIG_S_DRIVER_OK has been set..
+	 */
 	return 0;

 scsi_add_host_failed:
@@ -493,6 +494,13 @@ virtscsi_init_failed:
 	return err;
 }

+static void virtscsi_scan(struct virtio_device *vdev)
+{
+	struct Scsi_Host *shost = (struct Scsi_Host *)vdev->priv;
+
+	scsi_scan_host(shost);
+}
+
 static void virtscsi_remove_vqs(struct virtio_device *vdev)
 {
 	/* Stop all the virtqueues. */
@@ -537,6 +545,7 @@ static struct virtio_driver virtio_scsi_driver = {
 	.driver.owner = THIS_MODULE,
 	.id_table = id_table,
 	.probe = virtscsi_probe,
+	.scan = virtscsi_scan,
 #ifdef CONFIG_PM
 	.freeze = virtscsi_freeze,
 	.restore = virtscsi_restore,
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f355807..c3b3f7f 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -141,8 +141,11 @@ static int virtio_dev_probe(struct device *_d)
 	err = drv->probe(dev);
 	if (err)
 		add_status(dev, VIRTIO_CONFIG_S_FAILED);
-	else
+	else {
 		add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+		if (drv->scan)
+			drv->scan(dev);
+	}

 	return err;
 }
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 8efd28a..a1ba8bb 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -92,6 +92,7 @@ struct virtio_driver {
 	const unsigned int *feature_table;
 	unsigned int feature_table_size;
 	int (*probe)(struct virtio_device *dev);
+	void (*scan)(struct virtio_device *dev);
 	void (*remove)(struct virtio_device *dev);
 	void (*config_changed)(struct virtio_device *dev);
 #ifdef CONFIG_PM
--
1.7.2.5
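
The ordering that patch 5/6 establishes (probe, then DRIVER_OK, then an optional scan) can be illustrated outside the kernel. The following is a userspace sketch under stated assumptions: every sketch_* name and both status constants are invented for illustration and are not the virtio API. It only models the control flow that the virtio_dev_probe() hunk introduces.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative status bits; names and values are local to this sketch. */
enum { SKETCH_S_DRIVER_OK = 1 << 0, SKETCH_S_FAILED = 1 << 1 };

struct sketch_device {
	unsigned int status;		/* accumulated status bits */
	unsigned int status_at_scan;	/* status observed when scan() ran */
	int scanned;			/* set once scan() has been called */
};

struct sketch_driver {
	int (*probe)(struct sketch_device *dev);
	void (*scan)(struct sketch_device *dev);	/* optional, may be NULL */
};

/* A probe that always succeeds and no longer scans by itself. */
static int sketch_probe(struct sketch_device *dev)
{
	(void)dev;
	return 0;
}

/* Records which status bits were visible at scan time. */
static void sketch_scan(struct sketch_device *dev)
{
	dev->status_at_scan = dev->status;
	dev->scanned = 1;
}

/* Mirrors the patched flow: scan runs only after DRIVER_OK is set. */
static int sketch_dev_probe(struct sketch_driver *drv, struct sketch_device *dev)
{
	int err = drv->probe(dev);

	if (err) {
		dev->status |= SKETCH_S_FAILED;
	} else {
		dev->status |= SKETCH_S_DRIVER_OK;
		if (drv->scan)
			drv->scan(dev);
	}
	return err;
}
```

Because the callback is checked for NULL before it is called, drivers that do not set .scan keep their previous behaviour, which is why only virtio-scsi needs to change in this patch.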
Nicholas A. Bellinger
2012-Jul-04 04:24 UTC
[PATCH 6/6] virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
From: Nicholas Bellinger <nab at linux-iscsi.org>

This is currently required for connecting to tcm_vhost in order to prevent
the client LUN scan from detecting the same tcm_vhost WWPN on multiple
target IDs.

Cc: Paolo Bonzini <pbonzini at redhat.com>
Cc: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Hannes Reinecke <hare at suse.de>
Signed-off-by: Nicholas Bellinger <nab at linux-iscsi.org>
---
 drivers/scsi/virtio_scsi.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 391b30d..8711951 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -475,7 +475,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
 	shost->cmd_per_lun = min_t(u32, cmd_per_lun, shost->can_queue);
 	shost->max_sectors = virtscsi_config_get(vdev, max_sectors) ?: 0xFFFF;
 	shost->max_lun = virtscsi_config_get(vdev, max_lun) + 1;
-	shost->max_id = virtscsi_config_get(vdev, max_target) + 1;
+	/*
+	 * Currently required for tcm_vhost to function..
+	 */
+	shost->max_id = 1;
 	shost->max_channel = 0;
 	shost->max_cmd_len = VIRTIO_SCSI_CDB_SIZE;
 	err = scsi_add_host(shost, &vdev->dev);
--
1.7.2.5
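
To see why limiting max_id matters, here is a deliberately simplified userspace model. It assumes, as the commit message implies, that the endpoint answers identically no matter which target ID is addressed. The sketch_* functions are hypothetical, and a real host scan is more involved than this nested loop (it does not simply probe every LUN in turn); the model only shows how the number of discovered devices scales with max_id.

```c
#include <assert.h>

/*
 * Hypothetical backend: reports a LUN as present regardless of which
 * target id is addressed, mimicking an endpoint that ignores the id.
 */
static int sketch_lun_present(unsigned int id, unsigned int lun)
{
	(void)id;		/* the target id is deliberately ignored */
	return lun == 0;	/* a single LUN 0 is exported */
}

/* Count the devices found when scanning ids [0, max_id), luns [0, max_lun). */
static unsigned int sketch_scan_count(unsigned int max_id, unsigned int max_lun)
{
	unsigned int id, lun, found = 0;

	for (id = 0; id < max_id; id++)
		for (lun = 0; lun < max_lun; lun++)
			if (sketch_lun_present(id, lun))
				found++;
	return found;
}
```

In this model sketch_scan_count(4, 8) reports four devices for what is really one exported LUN, while sketch_scan_count(1, 8) reports one, which is the effect the patch gets by forcing shost->max_id to 1.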
Asias He
2012-Jul-04 04:41 UTC
[PATCH 1/6] vhost: Separate vhost-net features from vhost features
On 07/04/2012 12:24 PM, Nicholas A. Bellinger wrote:
> From: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
>
> In order for other vhost devices to use the VHOST_FEATURES bits the
> vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES
> constant.
>
> Signed-off-by: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
> Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Paolo Bonzini <pbonzini at redhat.com>
> Signed-off-by: Nicholas A. Bellinger <nab at risingtidesystems.com>

I think you need to change drivers/vhost/test.c as well.

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 3de00d9..91d6f06 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -261,14 +261,14 @@ static long vhost_test_ioctl(struct file *f, unsigned int ioctl,
 			return -EFAULT;
 		return vhost_test_run(n, test);
 	case VHOST_GET_FEATURES:
-		features = VHOST_FEATURES;
+		features = VHOST_NET_FEATURES;
 		if (copy_to_user(featurep, &features, sizeof features))
 			return -EFAULT;
 		return 0;
 	case VHOST_SET_FEATURES:
 		if (copy_from_user(&features, featurep, sizeof features))
 			return -EFAULT;
-		if (features & ~VHOST_FEATURES)
+		if (features & ~VHOST_NET_FEATURES)
 			return -EOPNOTSUPP;
 		return vhost_test_set_features(n, features);
 	case VHOST_RESET_OWNER:

> ---
>  drivers/vhost/net.c   |    4 ++--
>  drivers/vhost/vhost.h |    3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index f82a739..072cbba 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -823,14 +823,14 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl,
>  			return -EFAULT;
>  		return vhost_net_set_backend(n, backend.index, backend.fd);
>  	case VHOST_GET_FEATURES:
> -		features = VHOST_FEATURES;
> +		features = VHOST_NET_FEATURES;
>  		if (copy_to_user(featurep, &features, sizeof features))
>  			return -EFAULT;
>  		return 0;
>  	case VHOST_SET_FEATURES:
>  		if (copy_from_user(&features, featurep, sizeof features))
>  			return -EFAULT;
> -		if (features & ~VHOST_FEATURES)
> +		if (features & ~VHOST_NET_FEATURES)
>  			return -EOPNOTSUPP;
>  		return vhost_net_set_features(n, features);
>  	case VHOST_RESET_OWNER:
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 8de1fd5..07b9763 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -201,7 +201,8 @@ enum {
>  	VHOST_FEATURES = (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY) |
>  			 (1ULL << VIRTIO_RING_F_INDIRECT_DESC) |
>  			 (1ULL << VIRTIO_RING_F_EVENT_IDX) |
> -			 (1ULL << VHOST_F_LOG_ALL) |
> +			 (1ULL << VHOST_F_LOG_ALL),
> +	VHOST_NET_FEATURES = VHOST_FEATURES |
>  			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
>  			 (1ULL << VIRTIO_NET_F_MRG_RXBUF),
>  };
>

--
Asias
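
The split between a generic mask and a net-specific superset, together with the features & ~MASK rejection test used in the ioctl handlers, can be sketched in userspace as follows. This is an illustration under assumptions: the SKETCH_* bit positions are placeholders rather than the kernel's values, and sketch_check_features() is a hypothetical helper, not a function from vhost.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit positions are placeholders for this sketch, not the kernel's values. */
enum {
	SKETCH_F_NOTIFY_ON_EMPTY = 0,
	SKETCH_F_INDIRECT_DESC   = 1,
	SKETCH_F_EVENT_IDX       = 2,
	SKETCH_F_LOG_ALL         = 3,
	SKETCH_NET_F_HDR         = 4,
	SKETCH_NET_F_MRG_RXBUF   = 5,
};

/* Features every device type may offer. */
#define SKETCH_FEATURES \
	((1ULL << SKETCH_F_NOTIFY_ON_EMPTY) | \
	 (1ULL << SKETCH_F_INDIRECT_DESC) | \
	 (1ULL << SKETCH_F_EVENT_IDX) | \
	 (1ULL << SKETCH_F_LOG_ALL))

/* The net device offers the generic set plus its own bits. */
#define SKETCH_NET_FEATURES \
	(SKETCH_FEATURES | \
	 (1ULL << SKETCH_NET_F_HDR) | \
	 (1ULL << SKETCH_NET_F_MRG_RXBUF))

/* Reject any requested bit that falls outside the supported mask. */
static int sketch_check_features(uint64_t requested, uint64_t supported)
{
	if (requested & ~supported)
		return -EOPNOTSUPP;
	return 0;
}
```

Checked against SKETCH_FEATURES, a request for the mergeable-rx-buffer bit is refused; checked against SKETCH_NET_FEATURES it is accepted. That mirrors why a non-net vhost device such as vhost-scsi can validate against the generic mask once the net bits have been moved out of it.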
Michael S. Tsirkin
2012-Jul-04 14:02 UTC
[PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On Wed, Jul 04, 2012 at 04:24:00AM +0000, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab at linux-iscsi.org>
>
> Hi folks,
>
> This series contains patches required to update tcm_vhost <-> virtio-scsi
> connected hosts <-> guests to run on v3.5-rc2 mainline code. This series is
> available on top of target-pending/auto-next here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost
>
> This includes the necessary vhost changes from Stefan to to get tcm_vhost
> functioning, along a virtio-scsi LUN scanning change to address a client bug
> with tcm_vhost I ran into.. Also, tcm_vhost driver has been merged into a single
> source + header file that is now living under /drivers/vhost/, along with latest
> tcm_vhost changes from Zhi's tcm_vhost tree.
>
> Here are a couple of screenshots of the code in action using raw IBLOCK
> backends provided by FusionIO ioDrive Duo:
>
> http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png
> http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png
>
> So the next steps on my end will be converting tcm_vhost to submit backend I/O from
> cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and
> virtio-scsi-raw using raw IBLOCK iomemory_vsl flash.

OK so this is an RFC, not for merge yet?

>
> Please have a look vhost + virtio-scsi folks (mst + paolo CC'ed) and let us
> know if you have any concerns.
>
> Thanks!
>
> --nab
> Nicholas Bellinger (4):
>   vhost: Add vhost_scsi specific defines
>   tcm_vhost: Initial merge for vhost level target fabric driver
>   virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN
>     scanning
>   virtio-scsi: Set shost->max_id=1 for tcm_vhost WWPNs
>
> Stefan Hajnoczi (2):
>   vhost: Separate vhost-net features from vhost features
>   vhost: make vhost work queue visible
>
>  drivers/scsi/virtio_scsi.c |   20 +-
>  drivers/vhost/Kconfig      |    6 +
>  drivers/vhost/Makefile     |    1 +
>  drivers/vhost/net.c        |    4 +-
>  drivers/vhost/tcm_vhost.c  | 1592 ++++++++++++++++++++++++++++++++++++++++++++
>  drivers/vhost/tcm_vhost.h  |   70 ++
>  drivers/vhost/vhost.c      |    5 +-
>  drivers/vhost/vhost.h      |    6 +-
>  drivers/virtio/virtio.c    |    5 +-
>  include/linux/vhost.h      |    9 +
>  include/linux/virtio.h     |    1 +
>  11 files changed, 1708 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/vhost/tcm_vhost.c
>  create mode 100644 drivers/vhost/tcm_vhost.h
>
> --
> 1.7.2.5
Paolo Bonzini
2012-Jul-04 14:50 UTC
[PATCH 5/6] virtio-scsi: Add vdrv->scan for post VIRTIO_CONFIG_S_DRIVER_OK LUN scanning
On 04/07/2012 06:24, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger <nab at linux-iscsi.org>
>
> This patch changes virtio-scsi to use a new virtio_driver->scan() callback
> so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
> set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
> operation, instead of from within virtscsi_probe().
>
> This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
> virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
> had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur. This fixes a bug
> with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs.
>
> Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.
>
> Cc: Paolo Bonzini <pbonzini at redhat.com>
> Cc: Stefan Hajnoczi <stefanha at linux.vnet.ibm.com>
> Cc: Zhi Yong Wu <wuzhy at cn.ibm.com>
> Cc: Christoph Hellwig <hch at lst.de>
> Cc: Hannes Reinecke <hare at suse.de>
> Signed-off-by: Nicholas Bellinger <nab at linux-iscsi.org>

Reviewed-by: Paolo Bonzini <pbonzini at redhat.com>

Please send this independently. I think we also want it in stable at vger?

Paolo

> ---
>  drivers/scsi/virtio_scsi.c |   15 ++++++++++++---
>  drivers/virtio/virtio.c    |    5 ++++-
>  include/linux/virtio.h     |    1 +
>  3 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
> index 1b38431..391b30d 100644
> --- a/drivers/scsi/virtio_scsi.c
> +++ b/drivers/scsi/virtio_scsi.c
> @@ -481,9 +481,10 @@ static int __devinit virtscsi_probe(struct virtio_device *vdev)
>  	err = scsi_add_host(shost, &vdev->dev);
>  	if (err)
>  		goto scsi_add_host_failed;
> -
> -	scsi_scan_host(shost);
> -
> +	/*
> +	 * scsi_scan_host() happens in virtscsi_scan() via virtio_driver->scan()
> +	 * after VIRTIO_CONFIG_S_DRIVER_OK has been set..
> +	 */
>  	return 0;
>
>  scsi_add_host_failed:
> @@ -493,6 +494,13 @@ virtscsi_init_failed:
>  	return err;
>  }
>
> +static void virtscsi_scan(struct virtio_device *vdev)
> +{
> +	struct Scsi_Host *shost = (struct Scsi_Host *)vdev->priv;
> +
> +	scsi_scan_host(shost);
> +}
> +
>  static void virtscsi_remove_vqs(struct virtio_device *vdev)
>  {
>  	/* Stop all the virtqueues. */
> @@ -537,6 +545,7 @@ static struct virtio_driver virtio_scsi_driver = {
>  	.driver.owner = THIS_MODULE,
>  	.id_table = id_table,
>  	.probe = virtscsi_probe,
> +	.scan = virtscsi_scan,
>  #ifdef CONFIG_PM
>  	.freeze = virtscsi_freeze,
>  	.restore = virtscsi_restore,
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index f355807..c3b3f7f 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -141,8 +141,11 @@ static int virtio_dev_probe(struct device *_d)
>  	err = drv->probe(dev);
>  	if (err)
>  		add_status(dev, VIRTIO_CONFIG_S_FAILED);
> -	else
> +	else {
>  		add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +		if (drv->scan)
> +			drv->scan(dev);
> +	}
>
>  	return err;
>  }
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 8efd28a..a1ba8bb 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -92,6 +92,7 @@ struct virtio_driver {
>  	const unsigned int *feature_table;
>  	unsigned int feature_table_size;
>  	int (*probe)(struct virtio_device *dev);
> +	void (*scan)(struct virtio_device *dev);
>  	void (*remove)(struct virtio_device *dev);
>  	void (*config_changed)(struct virtio_device *dev);
>  #ifdef CONFIG_PM
>
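The ordering this patch establishes (probe, then DRIVER_OK, then the optional scan hook) can be modelled outside the kernel. The sketch below is a user-space stand-in, not the kernel code: every `demo_*` name is hypothetical, and the status values are assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Status bits; values are assumptions for this model, not taken from the thread. */
enum { DEMO_S_DRIVER_OK = 4, DEMO_S_FAILED = 0x80 };

struct demo_device {
    unsigned status;   /* accumulated status bits */
    int scanned;       /* set once the scan hook has run */
};

struct demo_driver {
    int  (*probe)(struct demo_device *dev);
    void (*scan)(struct demo_device *dev);   /* optional, may be NULL */
};

/* Mirrors the patched control flow: scan runs only after DRIVER_OK is set. */
int demo_dev_probe(const struct demo_driver *drv, struct demo_device *dev)
{
    int err = drv->probe(dev);

    if (err) {
        dev->status |= DEMO_S_FAILED;
    } else {
        dev->status |= DEMO_S_DRIVER_OK;
        if (drv->scan)
            drv->scan(dev);
    }
    return err;
}

int demo_probe_ok(struct demo_device *dev)   { (void)dev; return 0; }
int demo_probe_fail(struct demo_device *dev) { (void)dev; return -1; }

/* Stand-in for a LUN scan: it relies on the device already being live. */
void demo_scan(struct demo_device *dev)
{
    assert(dev->status & DEMO_S_DRIVER_OK);
    dev->scanned = 1;
}
```

Calling `demo_dev_probe()` with `demo_probe_ok` and `demo_scan` leaves `scanned` set. With `demo_probe_fail` the hook is never reached, and a driver whose `scan` is NULL behaves as before the patch.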
Nicholas A. Bellinger
2012-Jul-06 09:13 UTC
SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
On Fri, 2012-07-06 at 09:43 +0400, James Bottomley wrote:
> On Thu, 2012-07-05 at 20:01 -0700, Nicholas A. Bellinger wrote:
>
> > So I'm pretty sure this discrepancy is attributed to the small block
> > random I/O bottleneck currently present for all Linux/SCSI core LLDs
> > regardless of physical or virtual storage fabric.
> >
> > The SCSI wide host-lock less conversion that happened in .38 code back
> > in 2010, and subsequently having LLDs like virtio-scsi convert to run in
> > host-lock-less mode have helped to some extent.. But it's still not
> > enough..
> >
> > Another example where we've been able to prove this bottleneck recently
> > is with the following target setup:
> >
> > *) Intel Romley production machines with 128 GB of DDR-3 memory
> > *) 4x FusionIO ioDrive 2 (1.5 TB @ PCI-e Gen2 x2)
> > *) Mellanox PCI-exress Gen3 HCA running at 56 gb/sec
> > *) Infiniband SRP Target backported to RHEL 6.2 + latest OFED
> >
> > In this setup using ib_srpt + IBLOCK w/ emulate_write_cache=1 +
> > iomemory_vsl export we end up avoiding SCSI core bottleneck on the
> > target machine, just as with the tcm_vhost example here for host kernel
> > side processing with vhost.
> >
> > Using Linux IB SRP initiator + Windows Server 2008 R2 SCSI-miniport SRP
> > (OFED) Initiator connected to four ib_srpt LUNs, we've observed that
> > MSFT SCSI is currently outperforming RHEL 6.2 on the order of ~285K vs.
> > ~215K with heavy random 4k WRITE iometer / fio tests. Note this with an
> > optimized queue_depth ib_srp client w/ noop I/O schedulering, but is
> > still lacking the host_lock-less patches on RHEL 6.2 OFED..
> >
> > This bottleneck has been mentioned by various people (including myself)
> > on linux-scsi the last 18 months, and I've proposed that that it be
> > discussed at KS-2012 so we can start making some forward progress:
>
> Well, no, it hasn't. You randomly drop things like this into unrelated
> email (I suppose that is a mention in strict English construction) but
> it's not really enough to get anyone to pay attention since they mostly
> stopped reading at the top, if they got that far: most people just go by
> subject when wading through threads initially.
>

It most certainly has been made clear to me, numerous times from many
people in the Linux/SCSI community, that there is a bottleneck for small
block random I/O in SCSI core vs. raw Linux/Block, as well as vs. non
Linux based SCSI subsystems.

My apologies if mentioning this issue last year at LC 2011 to you
privately did not take a tone of a more serious nature, or that
proposing a topic for LSF-2012 this year was not a clear enough
indication of a problem with SCSI small block random I/O performance.

> But even if anyone noticed, a statement that RHEL6.2 (on a 2.6.32
> kernel, which is now nearly three years old) is 25% slower than W2k8R2
> on infiniband isn't really going to get anyone excited either
> (particularly when you mention OFED, which usually means a stack
> replacement on Linux anyway).
>

The specific issue was first raised for .38 where we were able to get
most of the interesting high performance LLDs converted to using
internal locking methods so that host_lock did not have to be obtained
during each ->queuecommand() I/O dispatch, right..?

This has helped a good deal for large multi-lun scsi_host configs that
are now running in host-lock less mode, but there is still a large
discrepancy between single LUN and raw struct block_device access even
with LLD host_lock less mode enabled.

Now I think the virtio-blk client performance is demonstrating this
issue pretty vividly, along with this week's tcm_vhost IBLOCK raw block
flash benchmarks that demonstrate some other yet-to-be-determined
limitations for virtio-scsi-raw vs. tcm_vhost for this particular fio
randrw workload.

> What people might pay attention to is evidence that there's a problem in
> 3.5-rc6 (without any OFED crap). If you're not going to bother
> investigating, it has to be in an environment they can reproduce (so
> ordinary hardware, not infiniband) otherwise it gets ignored as an
> esoteric hardware issue.
>

It's really quite simple for anyone to demonstrate the bottleneck
locally on any machine using tcm_loop with raw block flash. Take a
struct block_device backend (like a Fusion IO /dev/fio*) and, using
IBLOCK, export locally accessible SCSI LUNs via tcm_loop..

Using FIO there is a significant drop for randrw 4k performance between
tcm_loop <-> IBLOCK vs. raw struct block device backends. And no, it's
not some type of target IBLOCK or tcm_loop bottleneck, it's a per SCSI
LUN limitation for small block random I/Os on the order of ~75K for each
SCSI LUN.

If anyone has actually gone faster than this with any single SCSI LUN
on any storage fabric, I would be interested in hearing about your
setup.

Thanks,

--nab
Nicholas A. Bellinger
2012-Jul-06 18:21 UTC
SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
On Fri, 2012-07-06 at 17:49 +0400, James Bottomley wrote:
> On Fri, 2012-07-06 at 02:13 -0700, Nicholas A. Bellinger wrote:
> > On Fri, 2012-07-06 at 09:43 +0400, James Bottomley wrote:
> > > On Thu, 2012-07-05 at 20:01 -0700, Nicholas A. Bellinger wrote:
> > >

<SNIP>

> > > > This bottleneck has been mentioned by various people (including myself)
> > > > on linux-scsi the last 18 months, and I've proposed that that it be
> > > > discussed at KS-2012 so we can start making some forward progress:
> > >
> > > Well, no, it hasn't. You randomly drop things like this into unrelated
> > > email (I suppose that is a mention in strict English construction) but
> > > it's not really enough to get anyone to pay attention since they mostly
> > > stopped reading at the top, if they got that far: most people just go by
> > > subject when wading through threads initially.
> > >
> >
> > It most certainly has been made clear to me, numerous times from many
> > people in the Linux/SCSI community that there is a bottleneck for small
> > block random I/O in SCSI core vs. raw Linux/Block, as well as vs. non
> > Linux based SCSI subsystems.
> >
> > My apologies if mentioning this issue last year at LC 2011 to you
> > privately did not take a tone of a more serious nature, or that
> > proposing a topic for LSF-2012 this year was not a clear enough
> > indication of a problem with SCSI small block random I/O performance.
> >
> > > But even if anyone noticed, a statement that RHEL6.2 (on a 2.6.32
> > > kernel, which is now nearly three years old) is 25% slower than W2k8R2
> > > on infiniband isn't really going to get anyone excited either
> > > (particularly when you mention OFED, which usually means a stack
> > > replacement on Linux anyway).
> > >
> >
> > The specific issue was first raised for .38 where we where able to get
> > most of the interesting high performance LLDs converted to using
> > internal locking methods so that host_lock did not have to be obtained
> > during each ->queuecommand() I/O dispatch, right..?
> >
> > This has helped a good deal for large multi-lun scsi_host configs that
> > are now running in host-lock less mode, but there is still a large
> > discrepancy single LUN vs. raw struct block_device access even with LLD
> > host_lock less mode enabled.
> >
> > Now I think the virtio-blk client performance is demonstrating this
> > issue pretty vividly, along with this week's tcm_vhost IBLOCK raw block
> > flash benchmarks that is demonstrate some other yet-to-be determined
> > limitations for virtio-scsi-raw vs. tcm_vhost for this particular fio
> > randrw workload.
> >
> > > What people might pay attention to is evidence that there's a problem in
> > > 3.5-rc6 (without any OFED crap). If you're not going to bother
> > > investigating, it has to be in an environment they can reproduce (so
> > > ordinary hardware, not infiniband) otherwise it gets ignored as an
> > > esoteric hardware issue.
> > >
> >
> > It's really quite simple for anyone to demonstrate the bottleneck
> > locally on any machine using tcm_loop with raw block flash. Take a
> > struct block_device backend (like a Fusion IO /dev/fio*) and using
> > IBLOCK and export locally accessible SCSI LUNs via tcm_loop..
> >
> > Using FIO there is a significant drop for randrw 4k performance between
> > tcm_loop <-> IBLOCK vs. raw struct block device backends. And no, it's
> > not some type of target IBLOCK or tcm_loop bottleneck, it's a per SCSI
> > LUN limitation for small block random I/Os on the order of ~75K for each
> > SCSI LUN.
>
> Here, you're saying here that the end to end SCSI stack tops out at
> around 75k iops, which is reasonably respectable if you don't employ any
> mitigation like queue steering and interrupt polling ... what were the
> mitigation techniques in the test you employed by the way?
>

~75K per SCSI LUN in a multi-lun per host setup is being optimistic btw.
On the other side of the coin, the same pure block device can easily go
~200K per backend.

For the simplest case with tcm_loop, a struct scsi_cmnd is queued via
cmwq to execute in process context -> submit the backend I/O. Once
completed from IBLOCK, the I/O is run through a target completion wq, and
completed back to SCSI.

There is no fancy queue steering or interrupt polling going on (at least
not in tcm_loop) because it's a simple virtual SCSI LLD similar to
scsi_debug.

> But previously, you ascribed a performance drop of around 75% on
> virtio-scsi (topping out around 15-20k iops) to this same problem ...
> that doesn't really seem likely.
>

No. I ascribed the performance difference between virtio-scsi+tcm_vhost
vs. bare-metal raw block flash to this bottleneck in Linux/SCSI.

It's obvious that virtio-scsi-raw going through QEMU SCSI / block is
having some other shortcomings.

> Here's the rough ranges of concern:
>
> 10K iops: standard arrays
> 100K iops: modern expensive fast flash drives on 6Gb links
> 1M iops: PCIe NVMexpress like devices
>
> SCSI should do arrays with no problem at all, so I'd be really concerned
> that it can't make 0-20k iops. If you push the system and fine tune it,
> SCSI can just about get to 100k iops. 1M iops is still a stretch goal
> for pure block drivers.
>

1M iops is not a stretch for pure block drivers anymore on commodity
hardware. 5 Fusion-IO HBAs + Romley HW can easily go 1M random 4k IOPs
using a pure block driver.

The point is that it would currently take at least 2x the amount of SCSI
LUNs in order to even get close to 1M IOPs with a single LLD driver.
And from the feedback from everyone I've talked to, no one has been able
to make Linux/SCSI go 1M IOPs with any kernel.

--nab
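The "at least 2x" estimate can be checked with a little arithmetic on the figures quoted in this message, taken at face value: ~75K IOPS per SCSI LUN (described above as optimistic) and ~200K IOPS per raw block backend. The helper name `demo_units_needed()` is hypothetical; this is a sketch, not a benchmark.

```c
#include <assert.h>

/*
 * Smallest number of units (LUNs or block backends) needed to reach
 * target_iops when each unit sustains per_unit_iops, i.e. a ceiling
 * division.
 */
unsigned demo_units_needed(unsigned long target_iops,
                           unsigned long per_unit_iops)
{
    assert(per_unit_iops > 0);
    return (unsigned)((target_iops + per_unit_iops - 1) / per_unit_iops);
}
```

For a 1M IOPS target, `demo_units_needed(1000000UL, 75000UL)` gives 14 LUNs, while `demo_units_needed(1000000UL, 200000UL)` gives 5 backends. That is roughly 2.8x as many SCSI LUNs as raw block devices, consistent with the "at least 2x" figure.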
Christoph Lameter
2012-Jul-06 20:30 UTC
[Ksummit-2012-discuss] SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
On Fri, 6 Jul 2012, James Bottomley wrote:
> What people might pay attention to is evidence that there's a problem in
> 3.5-rc6 (without any OFED crap). If you're not going to bother
> investigating, it has to be in an environment they can reproduce (so
> ordinary hardware, not infiniband) otherwise it gets ignored as an
> esoteric hardware issue.

The OFED stuff in the meantime is part of 3.5-rc6. Infiniband has been
supported for a long time and it's a very important technology given the
problematic nature of ethernet at high network speeds.

OFED crap exists for those running RHEL5/6. The new enterprise distros are
based on the 3.2 kernel which has pretty good Infiniband support
out of the box.
Nicholas A. Bellinger
2012-Jul-06 22:06 UTC
[Ksummit-2012-discuss] SCSI Performance regression [was Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6]
On Fri, 2012-07-06 at 15:30 -0500, Christoph Lameter wrote:
> On Fri, 6 Jul 2012, James Bottomley wrote:
>
> > What people might pay attention to is evidence that there's a problem in
> > 3.5-rc6 (without any OFED crap). If you're not going to bother
> > investigating, it has to be in an environment they can reproduce (so
> > ordinary hardware, not infiniband) otherwise it gets ignored as an
> > esoteric hardware issue.
>
> The OFED stuff in the meantime is part of 3.5-rc6. Infiniband has been
> supported for a long time and its a very important technology given the
> problematic nature of ethernet at high network speeds.
>
> OFED crap exists for those running RHEL5/6. The new enterprise distros are
> based on the 3.2 kernel which has pretty good Infiniband support
> out of the box.
>

So I don't think the HCAs or Infiniband fabric was the limiting factor
for small block random I/O in the RHEL 6.2 w/ OFED vs. Windows Server
2008 R2 w/ OFED setup mentioned earlier.

I've seen both FC and iSCSI fabrics demonstrate the same type of random
small block I/O performance anomalies with Linux/SCSI clients too. The
v3.x Linux/SCSI clients are certainly better in the multi-lun per host
small block random I/O case, but single LUN performance is (still)
lacking compared to everything else.

Also RHEL 6.2 does have the scsi-host-lock less bits in place now, but
it's been more a matter of converting OFED ib_srp code to run in
host-lock less mode to realize extra gains for multi-lun per host.