Sudeep Dutt
2013-Jul-25 03:31 UTC
[PATCH 0/5] Enable Drivers for Intel MIC X100 Coprocessors.
An Intel MIC X100 device is a PCIe form factor add-in coprocessor card based on the Intel Many Integrated Core (MIC) architecture that runs a Linux OS. It is a PCIe endpoint in a platform and therefore implements the three required standard address spaces, i.e. configuration, memory and I/O. The host OS loads a device driver as is typical for PCIe devices. The card itself runs a bootstrap after reset that transfers control to the card OS downloaded from the host driver. The card OS as shipped by Intel is a Linux kernel with modifications for the X100 devices.

Since it is a PCIe card, it does not have the ability to host hardware devices for networking, storage and console. We provide these devices on X100 coprocessors, thus enabling a self-bootable equivalent environment for applications. A key benefit of our solution is that it leverages the standard virtio framework for network, disk and console devices, though in our case the virtio framework is used across a PCIe bus.

Here is a block diagram of the various components described above. The virtio backends are situated on the host rather than the card, given the better single-threaded performance of the host compared to MIC and the ability of the host to initiate DMAs to/from the card using the MIC DMA engine.
                              |
         +----------+         |             +----------+
         | Card OS  |         |             | Host OS  |
         +----------+         |             +----------+
                              |
+-------+ +--------+ +------+ | +---------+  +--------+ +--------+
| Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
| Net   | |Console | |Block | | |Net      |  |Console | |Block   |
| Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
+-------+ +--------+ +------+ | +---------+  +--------+ +--------+
    |         |         |     |      |            |         |
    |         |         |     |Ring 3|            |         |
    |         |         |     |------|------------|---------|-------
    +-------------------+     |Ring 0+--------------------------+
              |               |      | Virtio over PCIe IOCTLs  |
              |               |      +--------------------------+
       +--------------+       |                   |
       |Intel MIC     |       |           +---------------+
       |Card Driver   |       |           |Intel MIC      |
       +--------------+       |           |Host Driver    |
              |               |           +---------------+
              |               |                   |
 +-------------------------------------------------------------+
 |                                                             |
 |                    PCIe Bus                                 |
 +-------------------------------------------------------------+

The following series of patches are partitioned as follows:

Patch 1: This patch introduces the "Intel MIC Host Driver" in the block
diagram which does the following:
a) Initializes the Intel MIC X100 PCIe devices.
b) Boots and shuts down the card via sysfs entries.
c) Allocates and maps a device page for communication with the
   card driver and updates the device page address via scratchpad
   registers.
d) Provides sysfs entries for family, stepping, state, shutdown
   status, kernel command line, IP address, ramdisk and log buffer
   information.

Patch 2: This patch introduces the "Intel MIC Card Driver" in the block
diagram which does the following:
a) Initializes the Intel MIC X100 platform device and driver.
b) Sets up support to handle shutdown requests from the host.
c) Maps the device page after obtaining the device page address
   from the scratchpad registers updated by the host.
d) Informs the host upon a card crash by registering a panic notifier.
e) Informs the host upon a poweroff/halt event.

Patch 3: This patch introduces the host "Virtio over PCIe" interface for Intel MIC.
It allows creating user space backends on the host and instantiating virtio devices for them on the Intel MIC card. A character device per MIC is exposed with IOCTL, mmap and poll callbacks. This allows the user space backend to:
(a) add/remove a virtio device via a device page.
(b) map (R/O) virtio rings and device page to user space.
(c) poll for availability of data.
(d) copy a descriptor or entire descriptor chain to/from the card.
(e) modify virtio configuration.
(f) handle virtio device reset.
The buffers are copied over using CPU copies for this initial patch and host initiated MIC DMA support is planned for future patches. The avail and desc virtio rings are in host memory and the used ring is in card memory, so that each side updates the rings with writes across PCIe and reads them from its local memory, which performs better than reading across PCIe.

Patch 4: This patch introduces the card "Virtio over PCIe" interface for Intel MIC. It allows virtio drivers on the card to communicate with their user space backends on the host via a device page. Ring 3 apps on the host can add, remove and configure virtio devices. A thin MIC specific virtio_config_ops is implemented which borrows heavily from previous similar implementations in lguest and s390:
  drivers/lguest/lguest_device.c
  drivers/s390/kvm/kvm_virtio.c

Patch 5: This patch introduces a sample user space daemon which implements the virtio device backends on the host. The daemon creates/removes/configures virtio device backends by communicating with the Intel MIC Host Driver. The virtio devices currently supported are virtio net, virtio console and virtio block. The daemon also monitors card shutdown status and takes appropriate actions like killing the virtio backends and resetting the card upon card shutdown and crashes.

The patches have been compiled/validated against v3.10.

Ashutosh Dixit (2):
  Intel MIC Host Driver Changes for Virtio Devices.
  Intel MIC Card Driver Changes for Virtio Devices.

Caz Yokoyama (1):
  Sample Implementation of Intel MIC User Space Daemon.
Sudeep Dutt (2):
  Intel MIC Host Driver for X100 family.
  Intel MIC Card Driver for X100 family.

 Documentation/mic/mic_overview.txt   |   48 +
 Documentation/mic/mpssd/.gitignore   |    1 +
 Documentation/mic/mpssd/Makefile     |   20 +
 Documentation/mic/mpssd/micctrl      |  157 +++
 Documentation/mic/mpssd/mpss         |  246 +++++
 Documentation/mic/mpssd/mpssd.c      | 1732 ++++++++++++++++++++++++++++++++++
 Documentation/mic/mpssd/mpssd.h      |  105 +++
 Documentation/mic/mpssd/sysfs.c      |  108 +++
 drivers/misc/Kconfig                 |    1 +
 drivers/misc/Makefile                |    1 +
 drivers/misc/mic/Kconfig             |   56 ++
 drivers/misc/mic/Makefile            |    6 +
 drivers/misc/mic/card/Makefile       |   11 +
 drivers/misc/mic/card/mic_common.h   |   43 +
 drivers/misc/mic/card/mic_debugfs.c  |  139 +++
 drivers/misc/mic/card/mic_debugfs.h  |   40 +
 drivers/misc/mic/card/mic_device.c   |  311 ++++++
 drivers/misc/mic/card/mic_device.h   |  106 +++
 drivers/misc/mic/card/mic_virtio.c   |  643 +++++++++++++
 drivers/misc/mic/card/mic_virtio.h   |   79 ++
 drivers/misc/mic/card/mic_x100.c     |  253 +++++
 drivers/misc/mic/card/mic_x100.h     |   53 ++
 drivers/misc/mic/common/mic_device.h |   85 ++
 drivers/misc/mic/host/Makefile       |   13 +
 drivers/misc/mic/host/mic_boot.c     |  181 ++++
 drivers/misc/mic/host/mic_common.h   |   37 +
 drivers/misc/mic/host/mic_debugfs.c  |  503 ++++++++++
 drivers/misc/mic/host/mic_debugfs.h  |   34 +
 drivers/misc/mic/host/mic_device.h   |  280 ++++++
 drivers/misc/mic/host/mic_fops.c     |  280 ++++++
 drivers/misc/mic/host/mic_fops.h     |   37 +
 drivers/misc/mic/host/mic_main.c     | 1119 ++++++++++++++++++++++
 drivers/misc/mic/host/mic_smpt.c     |  441 +++++++++
 drivers/misc/mic/host/mic_smpt.h     |  103 ++
 drivers/misc/mic/host/mic_sysfs.c    |  360 +++++++
 drivers/misc/mic/host/mic_virtio.c   |  703 ++++++++++++++
 drivers/misc/mic/host/mic_virtio.h   |  108 +++
 drivers/misc/mic/host/mic_x100.c     |  665 +++++++++++++
 drivers/misc/mic/host/mic_x100.h     |  112 ++
 include/uapi/linux/Kbuild            |    2 +
 include/uapi/linux/mic_common.h      |  242 +++++
 include/uapi/linux/mic_ioctl.h       |  104 ++
 42 files changed, 9568 insertions(+)
 create mode 100644 Documentation/mic/mic_overview.txt
 create mode 100644 Documentation/mic/mpssd/.gitignore
 create mode 100644 Documentation/mic/mpssd/Makefile
 create mode 100755 Documentation/mic/mpssd/micctrl
 create mode 100755 Documentation/mic/mpssd/mpss
 create mode 100644 Documentation/mic/mpssd/mpssd.c
 create mode 100644 Documentation/mic/mpssd/mpssd.h
 create mode 100644 Documentation/mic/mpssd/sysfs.c
 create mode 100644 drivers/misc/mic/Kconfig
 create mode 100644 drivers/misc/mic/Makefile
 create mode 100644 drivers/misc/mic/card/Makefile
 create mode 100644 drivers/misc/mic/card/mic_common.h
 create mode 100644 drivers/misc/mic/card/mic_debugfs.c
 create mode 100644 drivers/misc/mic/card/mic_debugfs.h
 create mode 100644 drivers/misc/mic/card/mic_device.c
 create mode 100644 drivers/misc/mic/card/mic_device.h
 create mode 100644 drivers/misc/mic/card/mic_virtio.c
 create mode 100644 drivers/misc/mic/card/mic_virtio.h
 create mode 100644 drivers/misc/mic/card/mic_x100.c
 create mode 100644 drivers/misc/mic/card/mic_x100.h
 create mode 100644 drivers/misc/mic/common/mic_device.h
 create mode 100644 drivers/misc/mic/host/Makefile
 create mode 100644 drivers/misc/mic/host/mic_boot.c
 create mode 100644 drivers/misc/mic/host/mic_common.h
 create mode 100644 drivers/misc/mic/host/mic_debugfs.c
 create mode 100644 drivers/misc/mic/host/mic_debugfs.h
 create mode 100644 drivers/misc/mic/host/mic_device.h
 create mode 100644 drivers/misc/mic/host/mic_fops.c
 create mode 100644 drivers/misc/mic/host/mic_fops.h
 create mode 100644 drivers/misc/mic/host/mic_main.c
 create mode 100644 drivers/misc/mic/host/mic_smpt.c
 create mode 100644 drivers/misc/mic/host/mic_smpt.h
 create mode 100644 drivers/misc/mic/host/mic_sysfs.c
 create mode 100644 drivers/misc/mic/host/mic_virtio.c
 create mode 100644 drivers/misc/mic/host/mic_virtio.h
 create mode 100644 drivers/misc/mic/host/mic_x100.c
 create mode 100644 drivers/misc/mic/host/mic_x100.h
 create mode 100644 include/uapi/linux/mic_common.h
 create mode 100644 include/uapi/linux/mic_ioctl.h

-- 
1.8.2.1
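
For readers following the device page handoff described in Patch 1 (c) and Patch 2 (c), here is a minimal user-space sketch of the idea. It is not driver code. It assumes the scratchpad registers are 32 bits wide, so the 64-bit device page DMA address is split across MIC_DPLO_SPAD and MIC_DPHI_SPAD (the two offsets defined in patch 1), the same way mic_start() writes dp_dma_addr and dp_dma_addr >> 32. The spad[] array, NUM_SPADS and all helper names below are hypothetical stand-ins for the real MMIO accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Scratchpad offsets as defined in common/mic_device.h in patch 1. */
#define MIC_DPLO_SPAD 14
#define MIC_DPHI_SPAD 15

/* Hypothetical: number of modelled scratchpad registers. */
#define NUM_SPADS 16

/* Hypothetical stand-in for the card's 32-bit scratchpad registers. */
static uint32_t spad[NUM_SPADS];

/* Hypothetical accessors; the driver goes through MMIO ops instead. */
static void write_spad(unsigned int idx, uint32_t val)
{
	assert(idx < NUM_SPADS);
	spad[idx] = val;
}

static uint32_t read_spad(unsigned int idx)
{
	assert(idx < NUM_SPADS);
	return spad[idx];
}

/* Host side: publish the device page address as low and high halves. */
static void host_publish_dp(uint64_t dp_dma_addr)
{
	write_spad(MIC_DPLO_SPAD, (uint32_t)dp_dma_addr);
	write_spad(MIC_DPHI_SPAD, (uint32_t)(dp_dma_addr >> 32));
}

/* Card side: reassemble the 64-bit address from the two halves. */
static uint64_t card_read_dp(void)
{
	uint64_t lo = read_spad(MIC_DPLO_SPAD);
	uint64_t hi = read_spad(MIC_DPHI_SPAD);

	return lo | (hi << 32);
}
```

Calling host_publish_dp() with an address and then card_read_dp() should return the same value, which is all the card needs before it can map the device page. How the card then maps that address is outside this sketch.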
This patch enables the following:
a) Initializes the Intel MIC X100 PCIe devices.
b) Boots and shuts down the card via sysfs entries.
c) Allocates and maps a device page for communication with the
   card driver and updates the device page address via scratchpad
   registers.
d) Provides sysfs entries for family, stepping, state, shutdown
   status, kernel command line, IP address, ramdisk and log buffer
   information.

Co-author: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama at intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche at intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao at intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt at intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong at intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr at intel.com>
---
 drivers/misc/Kconfig                 |    1 +
 drivers/misc/Makefile                |    1 +
 drivers/misc/mic/Kconfig             |   19 +
 drivers/misc/mic/Makefile            |    5 +
 drivers/misc/mic/common/mic_device.h |   81 +++
 drivers/misc/mic/host/Makefile       |   11 +
 drivers/misc/mic/host/mic_boot.c     |  179 ++++++
 drivers/misc/mic/host/mic_common.h   |   37 ++
 drivers/misc/mic/host/mic_debugfs.c  |  366 ++++++++++++
 drivers/misc/mic/host/mic_debugfs.h  |   34 ++
 drivers/misc/mic/host/mic_device.h   |  280 +++++++++
 drivers/misc/mic/host/mic_main.c     | 1095 ++++++++++++++++++++++++++++++++++
 drivers/misc/mic/host/mic_smpt.c     |  441 ++++++++++++++
 drivers/misc/mic/host/mic_smpt.h     |  103 ++++
 drivers/misc/mic/host/mic_sysfs.c    |  360 +++++++++++
 drivers/misc/mic/host/mic_x100.c     |  665 +++++++++++++++++++++
 drivers/misc/mic/host/mic_x100.h     |  112 ++++
 include/uapi/linux/Kbuild            |    1 +
 include/uapi/linux/mic_common.h      |   79 +++
 19 files changed, 3870 insertions(+)
 create mode 100644 drivers/misc/mic/Kconfig
 create mode 100644 drivers/misc/mic/Makefile
 create mode
100644 drivers/misc/mic/common/mic_device.h create mode 100644 drivers/misc/mic/host/Makefile create mode 100644 drivers/misc/mic/host/mic_boot.c create mode 100644 drivers/misc/mic/host/mic_common.h create mode 100644 drivers/misc/mic/host/mic_debugfs.c create mode 100644 drivers/misc/mic/host/mic_debugfs.h create mode 100644 drivers/misc/mic/host/mic_device.h create mode 100644 drivers/misc/mic/host/mic_main.c create mode 100644 drivers/misc/mic/host/mic_smpt.c create mode 100644 drivers/misc/mic/host/mic_smpt.h create mode 100644 drivers/misc/mic/host/mic_sysfs.c create mode 100644 drivers/misc/mic/host/mic_x100.c create mode 100644 drivers/misc/mic/host/mic_x100.h create mode 100644 include/uapi/linux/mic_common.h diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index c002d86..09fcca9 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -536,4 +536,5 @@ source "drivers/misc/carma/Kconfig" source "drivers/misc/altera-stapl/Kconfig" source "drivers/misc/mei/Kconfig" source "drivers/misc/vmw_vmci/Kconfig" +source "drivers/misc/mic/Kconfig" endmenu diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index c235d5b..0b7ea3e 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -53,3 +53,4 @@ obj-$(CONFIG_INTEL_MEI) += mei/ obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/ obj-$(CONFIG_LATTICE_ECP3_CONFIG) += lattice-ecp3-config.o obj-$(CONFIG_SRAM) += sram.o +obj-y += mic/ diff --git a/drivers/misc/mic/Kconfig b/drivers/misc/mic/Kconfig new file mode 100644 index 0000000..aaefd0c --- /dev/null +++ b/drivers/misc/mic/Kconfig @@ -0,0 +1,19 @@ +comment "Intel MIC Host Driver" + +config INTEL_MIC_HOST + tristate "Intel MIC Host Driver" + depends on 64BIT && PCI + default N + help + This enables Host Driver support for the Intel Many Integrated + Core (MIC) family of PCIe form factor coprocessor devices that + run a 64 bit Linux OS. The driver manages card OS state and + enables communication between host and card. 
Intel MIC X100 + devices are currently supported. + + If you are building a host kernel with an Intel MIC device then + say M (recommended) or Y, else say N. If unsure say N. + + More information about the Intel MIC family as well as the Linux + OS and tools for MIC to use with this driver are available from + <http://software.intel.com/en-us/mic-developer>. diff --git a/drivers/misc/mic/Makefile b/drivers/misc/mic/Makefile new file mode 100644 index 0000000..8e72421 --- /dev/null +++ b/drivers/misc/mic/Makefile @@ -0,0 +1,5 @@ +# +# Makefile - Intel MIC Linux driver. +# Copyright(c) 2013, Intel Corporation. +# +obj-$(CONFIG_INTEL_MIC_HOST) += host/ diff --git a/drivers/misc/mic/common/mic_device.h b/drivers/misc/mic/common/mic_device.h new file mode 100644 index 0000000..24934b1 --- /dev/null +++ b/drivers/misc/mic/common/mic_device.h @@ -0,0 +1,81 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". 
+ * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC driver. + * + */ +#ifndef __MIC_COMMON_DEVICE_H_ +#define __MIC_COMMON_DEVICE_H_ + +/** + * struct mic_mw - MIC memory window + * + * @pa: Base physical address. + * @va: Base ioremap'd virtual address. + * @len: Size of the memory window. + */ +struct mic_mw { + phys_addr_t pa; + void __iomem *va; + resource_size_t len; +}; + +/** + * mic_mmio_read() - read from an MMIO register. + * @mw: MMIO register base virtual address. + * @offset: register offset. + * + * RETURNS: register value. + */ +static inline u32 mic_mmio_read(struct mic_mw *mw, u32 offset) +{ + return ioread32(mw->va + offset); +} + +/** + * mic_mmio_write() - write to an MMIO register. + * @mw: MMIO register base virtual address. + * @val: the data value to put into the register + * @offset: register offset. + * + * RETURNS: none. + */ +static inline void +mic_mmio_write(struct mic_mw *mw, u32 val, u32 offset) +{ + iowrite32(val, mw->va + offset); +} + +/* + * Scratch pad register offsets used by the host to communicate + * device page DMA address to the card. + */ +#define MIC_DPLO_SPAD 14 +#define MIC_DPHI_SPAD 15 + +#endif diff --git a/drivers/misc/mic/host/Makefile b/drivers/misc/mic/host/Makefile new file mode 100644 index 0000000..0608bbb --- /dev/null +++ b/drivers/misc/mic/host/Makefile @@ -0,0 +1,11 @@ +# +# Makefile - Intel MIC Linux driver. +# Copyright(c) 2013, Intel Corporation. 
+# +obj-$(CONFIG_INTEL_MIC_HOST) += mic_host.o +mic_host-objs := mic_main.o +mic_host-objs += mic_x100.o +mic_host-objs += mic_sysfs.o +mic_host-objs += mic_boot.o +mic_host-objs += mic_smpt.o +mic_host-objs += mic_debugfs.o diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c new file mode 100644 index 0000000..6485a87 --- /dev/null +++ b/drivers/misc/mic/host/mic_boot.c @@ -0,0 +1,179 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/sched.h> +#include <linux/firmware.h> +#include <linux/delay.h> + +#include "mic_common.h" + +/** + * mic_reset - Reset the MIC device. + * @mdev: pointer to mic_device instance + */ +static void mic_reset(struct mic_device *mdev) +{ + int i; + +#define MIC_RESET_TO (45) + + mdev->ops->reset_fw_ready(mdev); + mdev->ops->reset(mdev); + + for (i = 0; i < MIC_RESET_TO; i++) { + if (mdev->ops->is_fw_ready(mdev)) + return; + /* + * Resets typically take 10s of seconds to complete. + * Since an MMIO read is required to check if the + * firmware is ready or not, a 1 second delay works nicely. 
+ */ + msleep(1000); + } + mic_set_state(mdev, MIC_RESET_FAILED); +} + +/* Initialize the MIC bootparams */ +void mic_bootparam_init(struct mic_device *mdev) +{ + struct mic_bootparam *bootparam = mdev->dp; + + bootparam->magic = MIC_MAGIC; + bootparam->c2h_shutdown_db = mdev->shutdown_db; + bootparam->h2c_shutdown_db = -1; + bootparam->h2c_config_db = -1; + bootparam->shutdown_status = 0; + bootparam->shutdown_card = 0; +} + +/** + * mic_start - Start the MIC. + * @mdev: pointer to mic_device instance + * @buf: buffer containing boot string including firmware/ramdisk path. + * + * This function prepares an MIC for boot and initiates boot. + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +int mic_start(struct mic_device *mdev, const char *buf) +{ + int rc; + mutex_lock(&mdev->mic_mutex); + if (MIC_OFFLINE != mdev->state) { + rc = -EINVAL; + goto unlock_ret; + } + rc = mdev->ops->load_mic_fw(mdev, buf); + if (rc) + goto unlock_ret; + mic_smpt_restore(mdev); + mic_intr_restore(mdev); + mdev->intr_ops->enable_interrupts(mdev); + mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev->dp_dma_addr); + mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32); + mdev->ops->send_firmware_intr(mdev); + mic_set_state(mdev, MIC_ONLINE); +unlock_ret: + mutex_unlock(&mdev->mic_mutex); + return rc; +} + +/** + * mic_stop - Prepare the MIC for reset and trigger reset. + * @mdev: pointer to mic_device instance + * @force: force a MIC to reset even if it is already offline. + * + * RETURNS: None. + */ +void mic_stop(struct mic_device *mdev, bool force) +{ + mutex_lock(&mdev->mic_mutex); + if (MIC_OFFLINE != mdev->state || force) { + mic_bootparam_init(mdev); + mic_reset(mdev); + if (MIC_RESET_FAILED == mdev->state) + goto unlock; + mic_set_shutdown_status(mdev, MIC_NOP); + mic_set_state(mdev, MIC_OFFLINE); + } +unlock: + mutex_unlock(&mdev->mic_mutex); +} + +/** + * mic_shutdown - Initiate MIC shutdown. 
+ * @mdev: pointer to mic_device instance + * + * RETURNS: None. + */ +void mic_shutdown(struct mic_device *mdev) +{ + struct mic_bootparam *bootparam = mdev->dp; + s8 db = bootparam->h2c_shutdown_db; + + mutex_lock(&mdev->mic_mutex); + if (MIC_ONLINE == mdev->state && db != -1) { + bootparam->shutdown_card = 1; + mdev->ops->send_intr(mdev, db); + mic_set_state(mdev, MIC_SHUTTING_DOWN); + } + mutex_unlock(&mdev->mic_mutex); +} + +/** + * mic_shutdown_work - Handle shutdown interrupt from MIC. + * @work: The work structure. + * + * This work is scheduled whenever the host has received a shutdown + * interrupt from the MIC. + */ +void mic_shutdown_work(struct work_struct *work) +{ + struct mic_device *mdev = container_of(work, struct mic_device, + shutdown_work); + struct mic_bootparam *bootparam = mdev->dp; + + mutex_lock(&mdev->mic_mutex); + mic_set_shutdown_status(mdev, bootparam->shutdown_status); + bootparam->shutdown_status = 0; + if (MIC_SHUTTING_DOWN != mdev->state) + mic_set_state(mdev, MIC_SHUTTING_DOWN); + mutex_unlock(&mdev->mic_mutex); +} + +/** + * mic_reset_trigger_work - Trigger MIC reset. + * @work: The work structure. + * + * This work is scheduled whenever the host wants to reset the MIC. + */ +void mic_reset_trigger_work(struct work_struct *work) +{ + struct mic_device *mdev = container_of(work, struct mic_device, + reset_trigger_work); + + mic_stop(mdev, false); +} diff --git a/drivers/misc/mic/host/mic_common.h b/drivers/misc/mic/host/mic_common.h new file mode 100644 index 0000000..e0bf3a8 --- /dev/null +++ b/drivers/misc/mic/host/mic_common.h @@ -0,0 +1,37 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#ifndef _MIC_HOST_COMMON_H_ +#define _MIC_HOST_COMMON_H_ + +#include <linux/cdev.h> +#include <linux/mic_common.h> + +#include "../common/mic_device.h" +#include "mic_device.h" +#include "mic_x100.h" +#include "mic_smpt.h" + +#endif diff --git a/drivers/misc/mic/host/mic_debugfs.c b/drivers/misc/mic/host/mic_debugfs.c new file mode 100644 index 0000000..5b7697e --- /dev/null +++ b/drivers/misc/mic/host/mic_debugfs.c @@ -0,0 +1,366 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. 
+ * + */ +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/sched.h> +#include <linux/debugfs.h> +#include <linux/module.h> +#include <linux/seq_file.h> + +#include "mic_common.h" +#include "mic_debugfs.h" + +/* Debugfs parent dir */ +static struct dentry *mic_dbg; + +/** + * log_buf_seq_show - Display MIC kernel log buffer. + * + * log_buf addr/len is read from System.map by user space + * and populated in sysfs entries. + */ +static int log_buf_seq_show(struct seq_file *s, void *unused) +{ + void __iomem *log_buf_va; + int __iomem *log_buf_len_va; + struct mic_device *mdev = s->private; + void *kva; + int size; + unsigned long aper_offset; + + if (!mdev || !mdev->log_buf_addr || !mdev->log_buf_len) + goto done; + /* + * Card kernel will never be relocated and any kernel text/data mapping + * can be translated to phys address by subtracting __START_KERNEL_map. + */ + aper_offset = (unsigned long)mdev->log_buf_len - __START_KERNEL_map; + log_buf_len_va = mdev->aper.va + aper_offset; + aper_offset = (unsigned long)mdev->log_buf_addr - __START_KERNEL_map; + log_buf_va = mdev->aper.va + aper_offset; + size = ioread32(log_buf_len_va); + + kva = kmalloc(size, GFP_KERNEL); + if (!kva) + goto done; + mutex_lock(&mdev->mic_mutex); + memcpy_fromio(kva, log_buf_va, size); + switch (mdev->state) { + case MIC_ONLINE: + /* Fall through */ + case MIC_SHUTTING_DOWN: + seq_write(s, kva, size); + break; + default: + break; + } + mutex_unlock(&mdev->mic_mutex); + kfree(kva); +done: + return 0; +} + +static int log_buf_open(struct inode *inode, struct file *file) +{ + return single_open(file, log_buf_seq_show, inode->i_private); +} + +static int log_buf_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations log_buf_ops = { + .owner = THIS_MODULE, + .open = log_buf_open, + .read = seq_read, + .llseek = seq_lseek, + .release = log_buf_release +}; + +static int smpt_seq_show(struct seq_file *s, void 
*pos) +{ + int i; + struct mic_device *mdev = s->private; + unsigned long flags; + + seq_printf(s, "MIC %-2d |%-10s| %-14s %-10s %-10s %-10s %-10s\n", + mdev->id, "SMPT entry", "SW DMA addr", + "RefCount", "Register", "Snoop", "RegDMAAddr"); + seq_puts(s, "====================================================\n"); + + if (mdev->smpt) { + struct mic_smpt_info *smpt_info = mdev->smpt; + spin_lock_irqsave(&smpt_info->smpt_lock, flags); + for (i = 0; i < smpt_info->info.num_reg; i++) { + u32 val = mic_mmio_read(&mdev->mmio, + MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_SMPT00 + (4 * i)); + seq_printf(s, "%9s|%-10d| %-#14llx %-10lld %-#10x", + " ", i, smpt_info->entry[i].dma_addr, + smpt_info->entry[i].ref_count, val); + seq_printf(s, " %-10s 0x%llx\n", + (val & 0x1) ? "OFF" : "ON", + ((u64)val >> 2ULL) << + smpt_info->info.page_shift); + } + spin_unlock_irqrestore(&smpt_info->smpt_lock, flags); + } + seq_puts(s, "====================================================\n"); + return 0; +} + +static int smpt_debug_open(struct inode *inode, struct file *file) +{ + return single_open(file, smpt_seq_show, inode->i_private); +} + +static int smpt_debug_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations smpt_file_ops = { + .owner = THIS_MODULE, + .open = smpt_debug_open, + .read = seq_read, + .llseek = seq_lseek, + .release = smpt_debug_release +}; + +static int soft_reset_seq_show(struct seq_file *s, void *pos) +{ + struct mic_device *mdev = s->private; + + mic_stop(mdev, true); + return 0; +} + +static int soft_reset_debug_open(struct inode *inode, struct file *file) +{ + return single_open(file, soft_reset_seq_show, inode->i_private); +} + +static int soft_reset_debug_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations soft_reset_ops = { + .owner = THIS_MODULE, + .open = soft_reset_debug_open, + .read = seq_read, + 
.llseek = seq_lseek, + .release = soft_reset_debug_release +}; + +static int post_code_seq_show(struct seq_file *s, void *pos) +{ + struct mic_device *mdev = s->private; + u32 reg = mdev->ops->get_postcode(mdev); + + seq_printf(s, "%c%c", reg & 0xff, (reg >> 8) & 0xff); + return 0; +} + +static int post_code_debug_open(struct inode *inode, struct file *file) +{ + return single_open(file, post_code_seq_show, inode->i_private); +} + +static int post_code_debug_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations post_code_ops = { + .owner = THIS_MODULE, + .open = post_code_debug_open, + .read = seq_read, + .llseek = seq_lseek, + .release = post_code_debug_release +}; + +static int dp_seq_show(struct seq_file *s, void *pos) +{ + struct mic_device *mdev = s->private; + struct mic_bootparam *bootparam = mdev->dp; + + seq_printf(s, "Bootparam: magic 0x%x\n", + bootparam->magic); + seq_printf(s, "Bootparam: h2c_shutdown_db %d\n", + bootparam->h2c_shutdown_db); + seq_printf(s, "Bootparam: h2c_config_db %d\n", + bootparam->h2c_config_db); + seq_printf(s, "Bootparam: c2h_shutdown_db %d\n", + bootparam->c2h_shutdown_db); + seq_printf(s, "Bootparam: shutdown_status %d\n", + bootparam->shutdown_status); + seq_printf(s, "Bootparam: shutdown_card %d\n", + bootparam->shutdown_card); + + return 0; +} + +static int dp_debug_open(struct inode *inode, struct file *file) +{ + return single_open(file, dp_seq_show, inode->i_private); +} + +static int dp_debug_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations dp_ops = { + .owner = THIS_MODULE, + .open = dp_debug_open, + .read = seq_read, + .llseek = seq_lseek, + .release = dp_debug_release +}; + +static int msi_irq_info_seq_show(struct seq_file *s, void *pos) +{ + struct mic_device *mdev = s->private; + int reg; + int i, j; + u16 entry; + u16 vector; + + if 
(pci_dev_msi_enabled(mdev->pdev)) { + for (i = 0; i < mdev->irq_info.num_vectors; i++) { + if (mdev->pdev->msix_enabled) { + entry = mdev->irq_info.msix_entries[i].entry; + vector = mdev->irq_info.msix_entries[i].vector; + } else { + entry = 0; + vector = mdev->pdev->irq; + } + + reg = mdev->intr_ops->read_msi_to_src_map(mdev, entry); + + seq_printf(s, "%s %-10d %s %-10d MXAR[%d]: %08X\n", + "IRQ:", vector, "Entry:", entry, i, reg); + + seq_printf(s, "%-10s", "offset:"); + for (j = (MIC_NUM_OFFSETS - 1); j >= 0; j--) + seq_printf(s, "%4d ", j); + seq_puts(s, "\n"); + + + seq_printf(s, "%-10s", "count:"); + for (j = (MIC_NUM_OFFSETS - 1); j >= 0; j--) + seq_printf(s, "%4d ", + (mdev->irq_info.mic_msi_map[i] & BIT(j)) ? + 1 : 0); + seq_puts(s, "\n\n"); + } + } else { + seq_puts(s, "MSI/MSIx interrupts not enabled\n"); + } + + return 0; + +} + +static int msi_irq_info_debug_open(struct inode *inode, struct file *file) +{ + return single_open(file, msi_irq_info_seq_show, inode->i_private); +} + +static int msi_irq_info_debug_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations msi_irq_info_ops = { + .owner = THIS_MODULE, + .open = msi_irq_info_debug_open, + .read = seq_read, + .llseek = seq_lseek, + .release = msi_irq_info_debug_release +}; + +/** + * mic_create_debug_dir - Initialize MIC debugfs entries. 
+ */ +void __init mic_create_debug_dir(struct mic_device *mdev) +{ + if (!mic_dbg) + return; + + mdev->dbg_dir = debugfs_create_dir(mdev->name, mic_dbg); + if (!mdev->dbg_dir) + return; + + debugfs_create_file("log_buf", 0444, mdev->dbg_dir, + mdev, &log_buf_ops); + + debugfs_create_file("smpt", 0444, mdev->dbg_dir, + mdev, &smpt_file_ops); + + debugfs_create_file("soft_reset", 0444, mdev->dbg_dir, + mdev, &soft_reset_ops); + + debugfs_create_file("post_code", 0444, mdev->dbg_dir, + mdev, &post_code_ops); + + debugfs_create_file("dp", 0444, mdev->dbg_dir, + mdev, &dp_ops); + + debugfs_create_file("msi_irq_info", 0444, mdev->dbg_dir, + mdev, &msi_irq_info_ops); +} + +/** + * mic_delete_debug_dir - Uninitialize MIC debugfs entries. + */ +void mic_delete_debug_dir(struct mic_device *mdev) +{ + if (!mdev->dbg_dir) + return; + + debugfs_remove_recursive(mdev->dbg_dir); +} + +/** + * mic_init_debugfs - Initialize global debugfs entry. + */ +void __init mic_init_debugfs(void) +{ + mic_dbg = debugfs_create_dir(KBUILD_MODNAME, NULL); + if (!mic_dbg) + pr_err("can't create debugfs dir\n"); +} + +/** + * mic_exit_debugfs - Uninitialize global debugfs entry + */ +void mic_exit_debugfs(void) +{ + debugfs_remove(mic_dbg); +} diff --git a/drivers/misc/mic/host/mic_debugfs.h b/drivers/misc/mic/host/mic_debugfs.h new file mode 100644 index 0000000..ee9241c --- /dev/null +++ b/drivers/misc/mic/host/mic_debugfs.h @@ -0,0 +1,34 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#ifndef _MIC_DEBUGFS_H_ +#define _MIC_DEBUGFS_H_ + +void __init mic_create_debug_dir(struct mic_device *dev); +void mic_delete_debug_dir(struct mic_device *dev); +void __init mic_init_debugfs(void); +void mic_exit_debugfs(void); + +#endif diff --git a/drivers/misc/mic/host/mic_device.h b/drivers/misc/mic/host/mic_device.h new file mode 100644 index 0000000..dd15837 --- /dev/null +++ b/drivers/misc/mic/host/mic_device.h @@ -0,0 +1,280 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. 
+ * + */ +#ifndef _MIC_DEVICE_H_ +#define _MIC_DEVICE_H_ + +#define MIC_MAX_NUM_DEVS 256 +#define MIC_NUM_INTR_TYPES 3 +#define MIC_SHUTDOWN_TIMEOUT (60 * HZ) +#define MIC_MUTEX_HELD true + +/* The minimum number of msix vectors required + * for normal operation */ +#define MIC_MIN_MSIX 5 + +/** + * struct mic_intr_info - Contains h/w specific interrupt sources info + * + * @intr_start_idx: Contains the starting indexes of the + * interrupt types. + * @intr_len: Contains the length of the interrupt types. + */ +struct mic_intr_info { + u16 intr_start_idx[MIC_NUM_INTR_TYPES]; + u16 intr_len[MIC_NUM_INTR_TYPES]; +}; + +/** + * enum mic_intr_type - The type of source that will generate + * the interrupt. The number of types needs to be in sync with + * MIC_NUM_INTR_TYPES + * + * MIC_INTR_DB: The source is a doorbell + * MIC_INTR_DMA: The source is a DMA channel + * MIC_INTR_ERR: The source is an error interrupt e.g. SBOX ERR + */ +enum mic_intr_type { + MIC_INTR_DB = 0, + MIC_INTR_DMA, + MIC_INTR_ERR, +}; + +/** + * enum mic_hw_family - The hardware family to which a device belongs. + */ +enum mic_hw_family { + MIC_FAMILY_X100 = 0, + MIC_FAMILY_UNKNOWN +}; + +/** + * enum mic_stepping - MIC stepping ids. + */ +enum mic_stepping { + MIC_A0_STEP = 0x0, + MIC_B0_STEP = 0x10, + MIC_B1_STEP = 0x11, + MIC_C0_STEP = 0x20, +}; + +/** + * struct mic_irq_info - OS specific irq information + * + * @next_avail_src: next available doorbell that can be assigned. + * @msix_entries: msix entries allocated while setting up MSI-x + * @mic_msi_map: The MSI/MSI-x mapping information. + * @num_vectors: The number of MSI/MSI-x vectors that have been allocated. + * @cb_id: Running count of the number of callbacks registered. + * @mic_intr_lock: spinlock to protect the interrupt callback list. + * @cb_list: Array of callback lists one for each source. 
+ */ +struct mic_irq_info { + int next_avail_src; + struct msix_entry *msix_entries; + u32 *mic_msi_map; + u16 num_vectors; + u32 cb_id; + spinlock_t mic_intr_lock; + struct list_head *cb_list; +}; + +/** + * struct mic_intr_cb - Interrupt callback structure. + * + * @func: The callback function + * @data: Private data of the requester. + * @cb_id: The callback id. Identifies this callback. + * @list: list head pointing to the next callback structure. + */ +struct mic_intr_cb { + irqreturn_t (*func) (int irq, void *data); + void *data; + u32 cb_id; + struct list_head list; +}; + +/** + * struct mic_irq - opaque pointer used as cookie + */ +struct mic_irq; + +/** + * struct mic_device - MIC device information for each card. + * + * @name: Unique name for this MIC device. + * @mmio: MMIO bar information. + * @dbg_dir: debugfs directory of this MIC device. + * @pdev: The PCI device structure. + * @mic_mutex: Mutex for synchronizing access to mic_device. + * @family: The MIC family to which this device belongs. + * @ops: MIC HW specific operations. + * @intr_ops: HW specific interrupt operations. + * @smpt_ops: Hardware specific SMPT operations. + * @id: The unique device id for this MIC device. + * @stepping: Stepping ID. + * @attr_group: Sysfs attribute group. + * @dev: Device for sysfs entries. + * @aper: Aperture bar information. + * @smpt: MIC SMPT information. + * @intr_info: H/W specific interrupt information. + * @cmdline: Kernel command line. + * @ipaddr: IP address for this device. + * @firmware: Firmware file name. + * @ramdisk: Ramdisk file name. + * @bootaddr: MIC boot address. + * @reset_trigger_work: Work for triggering reset requests. + * @shutdown_work: Work for handling shutdown interrupts. + * @state: MIC state. + * @shutdown_status: MIC status reported by card for shutdown/crashes. + * @state_sysfs: Sysfs dirent for notifying ring 3 about MIC state changes. + * @reset_wait: Waitqueue for sleeping while reset completes. 
+ * @log_buf_addr: Log buffer address for MIC. + * @log_buf_len: Log buffer length address for MIC. + * @dp: virtio device page + * @dp_dma_addr: virtio device page DMA address. + * @vdev_list: list of virtio devices. + * @default_attrs: Sysfs attributes. + * @cdev: Character device for MIC. + * @irq_info: The OS specific irq information + * @shutdown_db: shutdown doorbell. + * @shutdown_cookie: shutdown cookie. + */ +struct mic_device { + char name[20]; + struct mic_mw mmio; + struct dentry *dbg_dir; + struct pci_dev *pdev; + struct mutex mic_mutex; + enum mic_hw_family family; + struct mic_hw_ops *ops; + struct mic_hw_intr_ops *intr_ops; + struct mic_smpt_ops *smpt_ops; + int id; + enum mic_stepping stepping; + struct attribute_group attr_group; + struct device *dev; + struct mic_mw aper; + struct mic_smpt_info *smpt; + struct mic_intr_info *intr_info; + char *cmdline; + char *ipaddr; + char *firmware; + char *ramdisk; + u32 bootaddr; + struct work_struct reset_trigger_work; + struct work_struct shutdown_work; + u8 state; + u8 shutdown_status; + struct sysfs_dirent *state_sysfs; + struct completion reset_wait; + void *log_buf_addr; + int *log_buf_len; + void *dp; + dma_addr_t dp_dma_addr; + struct list_head vdev_list; + struct attribute **default_attrs; + struct cdev cdev; + struct mic_irq_info irq_info; + int shutdown_db; + struct mic_irq *shutdown_cookie; +}; + +/** + * struct mic_hw_intr_ops: MIC HW specific interrupt operations + * @intr_init: Initialize H/W specific interrupt information. + * @enable_interrupts: Enable interrupts from the hardware. + * @disable_interrupts: Disable interrupts from the hardware. + * @program_msi_to_src_map: Update MSI mapping registers with + * irq information. + * @read_msi_to_src_map: Read MSI mapping registers containing + * irq information. 
+ */ +struct mic_hw_intr_ops { + void (*intr_init)(struct mic_device *mdev); + void (*enable_interrupts)(struct mic_device *mdev); + void (*disable_interrupts)(struct mic_device *mdev); + void (*program_msi_to_src_map) (struct mic_device *mdev, + int idx, int intr_src, bool set); + u32 (*read_msi_to_src_map) (struct mic_device *mdev, + int idx); +}; + +/** + * struct mic_hw_ops - MIC HW specific operations. + * @aper_bar: Aperture bar resource number. + * @mmio_bar: MMIO bar resource number. + * @init: Initialize the MIC HW information. + * @read_spad: Read from scratch pad register. + * @write_spad: Write to scratch pad register. + * @reset: Reset the remote processor. + * @reset_fw_ready: Reset firmware ready field. + * @is_fw_ready: Check if firmware is ready for OS download. + * @send_firmware_intr: Send an interrupt to the card firmware. + * @send_intr: Send an interrupt for a particular doorbell on the card. + * @ack_interrupt: Hardware specific operations to ack the h/w on + * receipt of an interrupt. + * @load_mic_fw: Load firmware segments required to boot the card + * into card memory. This includes the kernel, command line, ramdisk etc. + * @get_postcode: Get post code status from firmware. 
+ */ +struct mic_hw_ops { + u8 aper_bar; + u8 mmio_bar; + void (*init)(struct mic_device *mdev); + u32 (*read_spad)(struct mic_device *mdev, unsigned int idx); + void (*write_spad)(struct mic_device *mdev, u32 idx, u32 val); + void (*reset)(struct mic_device *mdev); + void (*reset_fw_ready)(struct mic_device *mdev); + bool (*is_fw_ready)(struct mic_device *mdev); + void (*send_firmware_intr)(struct mic_device *mdev); + void (*send_intr)(struct mic_device *mdev, int doorbell); + u32 (*ack_interrupt)(struct mic_device *mdev); + int (*load_mic_fw)(struct mic_device *mdev, const char *buf); + u32 (*get_postcode)(struct mic_device *mdev); +}; + +int mic_start(struct mic_device *mdev, const char *buf); +void mic_stop(struct mic_device *mdev, bool force); +void mic_shutdown(struct mic_device *mdev); +void mic_reset_delayed_work(struct work_struct *work); +void mic_reset_trigger_work(struct work_struct *work); +void mic_shutdown_work(struct work_struct *work); + +int mic_next_db(struct mic_device *mdev); +struct mic_irq *mic_request_irq(struct mic_device *mdev, + irqreturn_t (*func)(int irq, void *data), + const char *name, void *data, int intr_src, + enum mic_intr_type type); + +void mic_free_irq(struct mic_device *mdev, + struct mic_irq *cookie, void *data); +void mic_intr_restore(struct mic_device *mdev); + +void mic_sysfs_init(struct mic_device *mdev); +void mic_bootparam_init(struct mic_device *mdev); +void mic_set_state(struct mic_device *mdev, u8 state); +void mic_set_shutdown_status(struct mic_device *mdev, u8 status); +#endif diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c new file mode 100644 index 0000000..70cc235 --- /dev/null +++ b/drivers/misc/mic/host/mic_main.c @@ -0,0 +1,1095 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + * Global TODO's across the driver to be added after initial base + * patches are accepted upstream: + * 1) Enable DMA support. + * 2) Enable per vring interrupt support. + */ +#include <linux/module.h> +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/interrupt.h> +#include <linux/firmware.h> +#include <linux/completion.h> +#include <linux/poll.h> + +#include "mic_common.h" +#include "mic_debugfs.h" + +static const char mic_driver_name[] = "mic"; + +static DEFINE_PCI_DEVICE_TABLE(mic_pci_tbl) = { + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2250)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2251)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2252)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2253)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2254)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2255)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2256)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2257)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2258)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_2259)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, 
MIC_X100_PCI_DEVICE_225a)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225b)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225c)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225d)}, + {PCI_DEVICE(PCI_VENDOR_ID_INTEL, MIC_X100_PCI_DEVICE_225e)}, + + /* required last entry */ + { 0, } +}; + +MODULE_DEVICE_TABLE(pci, mic_pci_tbl); + +/** + * struct mic_info - Global information about all MIC devices. + * + * @next_id: Next available MIC device id. + * @mic_class: Class of MIC devices for sysfs accessibility. + * @dev: Range for device node numbers. + */ +struct mic_info { + int next_id; + struct class *mic_class; + dev_t dev; +}; + +/* g_mic - Global information about all MIC devices. */ +static struct mic_info g_mic; + +/* Initialize the device page */ +static int mic_dp_init(struct mic_device *mdev) +{ + mdev->dp = kzalloc(MIC_DP_SIZE, GFP_KERNEL); + if (!mdev->dp) { + dev_err(&mdev->pdev->dev, "%s %d err %d\n", + __func__, __LINE__, -ENOMEM); + return -ENOMEM; + } + + mdev->dp_dma_addr = mic_map_single(mdev, + mdev->dp, MIC_DP_SIZE); + if (mic_map_error(mdev->dp_dma_addr)) { + kfree(mdev->dp); + dev_err(&mdev->pdev->dev, "%s %d err %d\n", + __func__, __LINE__, -ENOMEM); + return -ENOMEM; + } + mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev->dp_dma_addr); + mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32); + return 0; +} + +/* Uninitialize the device page */ +static void mic_dp_uninit(struct mic_device *mdev) +{ + mic_unmap_single(mdev, mdev->dp_dma_addr, MIC_DP_SIZE); + kfree(mdev->dp); +} + +/** + * mic_shutdown_db - Shutdown doorbell interrupt handler. 
+ */ +static irqreturn_t mic_shutdown_db(int irq, void *data) +{ + struct mic_device *mdev = data; + struct mic_bootparam *bootparam = mdev->dp; + + mdev->ops->ack_interrupt(mdev); + + switch (bootparam->shutdown_status) { + case MIC_HALTED: + case MIC_POWER_OFF: + case MIC_RESTART: + /* Fall through */ + case MIC_CRASHED: + schedule_work(&mdev->shutdown_work); + break; + default: + break; + }; + return IRQ_HANDLED; +} + +/* + * mic_invoke_callback - Invoke callback functions registered for + * the corresponding source id. + * + * @mdev: pointer to the mic_device instance + * @idx: The interrupt source id. + * + * Returns none. + */ +static inline void mic_invoke_callback(struct mic_device *mdev, int idx) +{ + struct mic_intr_cb *intr_cb; + + spin_lock(&mdev->irq_info.mic_intr_lock); + list_for_each_entry(intr_cb, &mdev->irq_info.cb_list[idx], list) + if (intr_cb->func) + intr_cb->func(mdev->pdev->irq, intr_cb->data); + spin_unlock(&mdev->irq_info.mic_intr_lock); +} + +/** + * mic_interrupt - Generic interrupt handler for + * MSI and INTx based interrupts. + */ +static irqreturn_t mic_interrupt(int irq, void *dev) +{ + struct mic_device *mdev = dev; + struct mic_intr_info *info = mdev->intr_info; + u32 mask; + int i; + + mask = mdev->ops->ack_interrupt(mdev); + if (!mask) + return IRQ_NONE; + + for (i = info->intr_start_idx[MIC_INTR_DB]; + i < info->intr_len[MIC_INTR_DB]; i++) + if (mask & BIT(i)) + mic_invoke_callback(mdev, i); + + return IRQ_HANDLED; +} + +/* Retrieve the next doorbell interrupt source. */ +int mic_next_db(struct mic_device *mdev) +{ + int next_db; + + next_db = mdev->irq_info.next_avail_src % + mdev->intr_info->intr_len[MIC_INTR_DB]; + mdev->irq_info.next_avail_src++; + return next_db; +} + +/* Return the interrupt offset from the index. Index is 0 based. 
*/ +static u16 map_src_to_offset(struct mic_device *mdev, + int intr_src, enum mic_intr_type type) { + + if (type >= MIC_NUM_INTR_TYPES) + return MIC_NUM_OFFSETS; + + if (intr_src >= mdev->intr_info->intr_len[type]) + return MIC_NUM_OFFSETS; + + return mdev->intr_info->intr_start_idx[type] + intr_src; +} + +/* Return next available msix_entry. */ +static struct msix_entry *mic_get_available_vector(struct mic_device *mdev) +{ + int i; + struct mic_irq_info *info = &mdev->irq_info; + + for (i = 0; i < info->num_vectors; i++) { + if (!info->mic_msi_map[i]) + return &info->msix_entries[i]; + } + return NULL; +} + +/** + * mic_intr_restore - Restore h/w specific interrupt + * registers after a card reset. mic_mutex needs to be + * held before calling this function. + * + */ +void mic_intr_restore(struct mic_device *mdev) +{ + int entry, offset; + + if (!pci_dev_msi_enabled(mdev->pdev)) + return; + + WARN_ON(!mutex_is_locked(&mdev->mic_mutex)); + for (entry = 0; entry < mdev->irq_info.num_vectors; entry++) { + for (offset = 0; offset < MIC_NUM_OFFSETS; offset++) { + if (mdev->irq_info.mic_msi_map[entry] & BIT(offset)) + mdev->intr_ops->program_msi_to_src_map(mdev, + entry, offset, true); + } + } +} + +/** + * mic_register_intr_callback - Register a callback handler for the + * given source id. + * + * @mdev: pointer to the mic_device instance + * @idx: The source id to be registered. + * @func: The function to be called when the source id receives + * the interrupt. + * @data: Private data of the requester. + * Return the callback structure that was registered or an + * appropriate error on failure. 
+ */ +static struct mic_intr_cb *mic_register_intr_callback(struct mic_device *mdev, + u8 idx, irqreturn_t (*func) (int irq, void *dev), + void *data) +{ + struct mic_intr_cb *intr_cb; + unsigned long flags; + intr_cb = kmalloc(sizeof(struct mic_intr_cb), GFP_KERNEL); + + if (!intr_cb) + return ERR_PTR(-ENOMEM); + + intr_cb->func = func; + intr_cb->data = data; + intr_cb->cb_id = mdev->irq_info.cb_id++; + + spin_lock_irqsave(&mdev->irq_info.mic_intr_lock, flags); + list_add_tail(&intr_cb->list, &mdev->irq_info.cb_list[idx]); + spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock, flags); + + return intr_cb; +} + +/** + * mic_unregister_intr_callback - Unregister the callback handler + * identified by its callback id. + * + * @mdev: pointer to the mic_device instance + * @idx: The callback structure id to be unregistered. + * Return the source id that was unregistered or MIC_NUM_OFFSETS if no + * such callback handler was found. + */ +static u8 mic_unregister_intr_callback(struct mic_device *mdev, u32 idx) +{ + struct list_head *pos, *tmp; + struct mic_intr_cb *intr_cb; + unsigned long flags; + int i; + + for (i = 0; i < MIC_NUM_OFFSETS; i++) { + spin_lock_irqsave(&mdev->irq_info.mic_intr_lock, flags); + list_for_each_safe(pos, tmp, &mdev->irq_info.cb_list[i]) { + intr_cb = list_entry(pos, struct mic_intr_cb, list); + if (intr_cb->cb_id == idx) { + list_del(pos); + kfree(intr_cb); + spin_unlock_irqrestore( + &mdev->irq_info.mic_intr_lock, flags); + return i; + } + } + spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock, flags); + } + + return MIC_NUM_OFFSETS; +} + +/** + * mic_alloc_msi_map - Allocate mapping information for MSI + * and MSI-x interrupts. + * + * @mdev: pointer to mic_device instance + * + * 0 on success. Appropriate error on failure. 
+ */ +static int mic_alloc_msi_map(struct mic_device *mdev) +{ + mdev->irq_info.mic_msi_map = kzalloc((sizeof(u32) * + mdev->irq_info.num_vectors), GFP_KERNEL); + + if (!mdev->irq_info.mic_msi_map) + return -ENOMEM; + return 0; +} + +/** + * mic_free_msi_map - Free mapping information for MSI + * and MSI-x interrupts. + * + * @mdev: pointer to mic_device instance + */ +static void mic_free_msi_map(struct mic_device *mdev) +{ + kfree(mdev->irq_info.mic_msi_map); +} + + +#define COOKIE_ID_SHIFT 16 +#define GET_ENTRY(cookie) ((cookie) & 0xFFFF) +#define GET_OFFSET(cookie) ((cookie) >> COOKIE_ID_SHIFT) +#define MK_COOKIE(x, y) ((x) | (y) << COOKIE_ID_SHIFT) + +/** + * mic_request_irq - request an irq. mic_mutex needs + * to be held before calling this function. + * + * @mdev: pointer to mic_device instance + * @func: The callback function that handles the interrupt. + * The function needs to call ack_interrupts + * (mdev->ops->ack_interrupt(mdev)) when handling the interrupts. + * @name: The ASCII name of the caller requesting the irq. + * @data: private data that is returned back when calling the + * function handler. + * @intr_src: The source id of the requester. It is the doorbell id + * for Doorbell interrupts and DMA channel id for DMA interrupts. + * @type: The type of interrupt. Values defined in mic_intr_type + * + * returns: The cookie that is opaque to the caller. Passed + * back when calling mic_free_irq. An appropriate error code + * is returned on failure. Caller needs to use IS_ERR(return_val) + * to check for failure and PTR_ERR(return_val) to obtain the + * error code. 
+ * + */ +struct mic_irq *mic_request_irq(struct mic_device *mdev, + irqreturn_t (*func)(int irq, void *dev), + const char *name, void *data, int intr_src, + enum mic_intr_type type) { + + u16 offset; + int rc = 0; + struct msix_entry *msix = NULL; + unsigned long cookie = 0; + u16 entry; + struct mic_intr_cb *intr_cb; + + if (!mdev) { + rc = -EINVAL; + goto err; + } + + WARN_ON(!mutex_is_locked(&mdev->mic_mutex)); + offset = map_src_to_offset(mdev, intr_src, type); + if (offset >= MIC_NUM_OFFSETS) { + dev_err(&mdev->pdev->dev, + "Error mapping index %d to a valid source id.\n", + intr_src); + rc = -EINVAL; + goto err; + } + + if (mdev->irq_info.num_vectors > 1) { + msix = mic_get_available_vector(mdev); + if (!msix) { + dev_err(&mdev->pdev->dev, + "No MSIx vectors available for use.\n"); + rc = -ENOSPC; + goto err; + } + + rc = request_irq(msix->vector, func, 0, name, data); + if (rc) { + dev_dbg(&mdev->pdev->dev, + "request irq failed rc = %d\n", rc); + goto err; + } + + entry = msix->entry; + mdev->irq_info.mic_msi_map[entry] |= BIT(offset); + mdev->intr_ops->program_msi_to_src_map(mdev, + entry, offset, true); + cookie = MK_COOKIE(entry, offset); + dev_dbg(&mdev->pdev->dev, "irq: %d assigned for src: %d\n", + msix->vector, intr_src); + } else { + intr_cb = mic_register_intr_callback(mdev, + offset, func, data); + if (IS_ERR(intr_cb)) { + dev_err(&mdev->pdev->dev, + "No available callback entries for use\n"); + rc = PTR_ERR(intr_cb); + goto err; + } + + entry = 0; + if (pci_dev_msi_enabled(mdev->pdev)) { + mdev->irq_info.mic_msi_map[entry] |= (1 << offset); + mdev->intr_ops->program_msi_to_src_map(mdev, + entry, offset, true); + } + cookie = MK_COOKIE(entry, intr_cb->cb_id); + dev_dbg(&mdev->pdev->dev, "callback %d registered for src: %d\n", + intr_cb->cb_id, intr_src); + } + + return (struct mic_irq *)cookie; +err: + return ERR_PTR(rc); +} + +/** + * mic_free_irq - free irq. mic_mutex + * needs to be held before calling this function. 
+ * + * @mdev: pointer to mic_device instance + * @cookie: cookie obtained during a successful call to mic_request_irq + * @data: private data specified by the calling function during the + * mic_request_irq + * + * returns: none. + */ +void mic_free_irq(struct mic_device *mdev, + struct mic_irq *cookie, void *data) +{ + u32 offset; + u32 entry; + u8 src_id; + unsigned int irq; + + if (!mdev) + return; + + WARN_ON(!mutex_is_locked(&mdev->mic_mutex)); + + entry = GET_ENTRY((unsigned long)cookie); + offset = GET_OFFSET((unsigned long)cookie); + if (mdev->irq_info.num_vectors > 1) { + if (entry >= mdev->irq_info.num_vectors) { + dev_warn(&mdev->pdev->dev, + "entry %d should be < num_irq %d\n", + entry, mdev->irq_info.num_vectors); + return; + } + irq = mdev->irq_info.msix_entries[entry].vector; + free_irq(irq, data); + mdev->irq_info.mic_msi_map[entry] &= ~(BIT(offset)); + mdev->intr_ops->program_msi_to_src_map(mdev, + entry, offset, false); + + dev_dbg(&mdev->pdev->dev, "irq: %d freed\n", irq); + } else { + irq = mdev->pdev->irq; + src_id = mic_unregister_intr_callback(mdev, offset); + if (src_id >= MIC_NUM_OFFSETS) { + dev_warn(&mdev->pdev->dev, "Error unregistering callback\n"); + return; + } + if (pci_dev_msi_enabled(mdev->pdev)) { + mdev->irq_info.mic_msi_map[entry] &= ~(BIT(src_id)); + mdev->intr_ops->program_msi_to_src_map(mdev, + entry, src_id, false); + } + dev_dbg(&mdev->pdev->dev, "callback %d unregistered for src: %d\n", + offset, src_id); + } +} + +/** + * mic_setup_msix - Initializes MSIx interrupts. + * + * @mdev: pointer to mic_device instance + * + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. 
+ */ +static int mic_setup_msix(struct mic_device *mdev) +{ + struct pci_dev *pdev = mdev->pdev; + int rc, i; + + mdev->irq_info.msix_entries = kmalloc(sizeof(struct msix_entry) * + MIC_MIN_MSIX, GFP_KERNEL); + if (!mdev->irq_info.msix_entries) { + rc = -ENOMEM; + goto err_nomem1; + } + + for (i = 0; i < MIC_MIN_MSIX; i++) + mdev->irq_info.msix_entries[i].entry = i; + + rc = pci_enable_msix(pdev, mdev->irq_info.msix_entries, + MIC_MIN_MSIX); + if (rc) { + dev_dbg(&pdev->dev, "Error enabling MSIx. rc = %d\n", rc); + goto err_enable_msix; + } + + mdev->irq_info.num_vectors = MIC_MIN_MSIX; + rc = mic_alloc_msi_map(mdev); + if (rc) + goto err_nomem2; + + dev_dbg(&mdev->pdev->dev, + "%d MSIx irqs setup\n", mdev->irq_info.num_vectors); + return 0; + +err_nomem2: + pci_disable_msix(pdev); +err_enable_msix: + kfree(mdev->irq_info.msix_entries); +err_nomem1: + mdev->irq_info.num_vectors = 0; + return rc; +} + + +/** + * mic_setup_callbacks - Initialize data structures needed + * to handle callbacks. + * + * @mdev: pointer to mic_device instance + */ +static int mic_setup_callbacks(struct mic_device *mdev) +{ + int i; + + mdev->irq_info.cb_list = kmalloc(sizeof(struct list_head) * + MIC_NUM_OFFSETS, GFP_KERNEL); + if (!mdev->irq_info.cb_list) + return -ENOMEM; + + for (i = 0; i < MIC_NUM_OFFSETS; i++) + INIT_LIST_HEAD(&mdev->irq_info.cb_list[i]); + + spin_lock_init(&mdev->irq_info.mic_intr_lock); + return 0; +} + +/** + * mic_release_callbacks - Uninitialize data structures needed + * to handle callbacks. 
+ * + * @mdev: pointer to mic_device instance + */ +static void mic_release_callbacks(struct mic_device *mdev) +{ + unsigned long flags; + struct list_head *pos, *tmp; + struct mic_intr_cb *intr_cb; + int i; + + for (i = 0; i < MIC_NUM_OFFSETS; i++) { + spin_lock_irqsave(&mdev->irq_info.mic_intr_lock, flags); + + if (!list_empty(&mdev->irq_info.cb_list[i])) { + dev_warn(&mdev->pdev->dev, + "irq %d may still be in use.\n", mdev->pdev->irq); + } else { + spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock, + flags); + break; + } + + list_for_each_safe(pos, tmp, &mdev->irq_info.cb_list[i]) { + intr_cb = list_entry(pos, struct mic_intr_cb, list); + list_del(pos); + kfree(intr_cb); + } + spin_unlock_irqrestore(&mdev->irq_info.mic_intr_lock, flags); + } + + kfree(mdev->irq_info.cb_list); +} + +/** + * mic_setup_msi - Initializes MSI interrupts. + * + * @mdev: pointer to mic_device instance + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +static int mic_setup_msi(struct mic_device *mdev) +{ + struct pci_dev *pdev = mdev->pdev; + int rc; + + rc = pci_enable_msi(pdev); + if (rc) { + dev_dbg(&pdev->dev, "Error enabling MSI. rc = %d\n", rc); + return rc; + } + + mdev->irq_info.num_vectors = 1; + rc = mic_alloc_msi_map(mdev); + if (rc) + goto err_nomem1; + + rc = mic_setup_callbacks(mdev); + if (rc) { + dev_err(&pdev->dev, "Error setting up callbacks\n"); + goto err_nomem2; + } + + rc = request_irq(pdev->irq, mic_interrupt, 0 , "mic-msi", mdev); + if (rc) { + dev_err(&pdev->dev, "Error allocating MSI interrupt\n"); + goto err_irq_req_fail; + } + + dev_dbg(&pdev->dev, "%d MSI irqs setup\n", mdev->irq_info.num_vectors); + return 0; + +err_irq_req_fail: + mic_release_callbacks(mdev); +err_nomem2: + mic_free_msi_map(mdev); +err_nomem1: + pci_disable_msi(pdev); + mdev->irq_info.num_vectors = 0; + return rc; +} + +/** + * mic_setup_intx - Initializes legacy interrupts. 
+ * + * @mdev: pointer to mic_device instance + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +static int mic_setup_intx(struct mic_device *mdev) +{ + struct pci_dev *pdev = mdev->pdev; + int rc; + + pci_msi_off(pdev); + + /* Enable intx */ + pci_intx(pdev, 1); + + rc = mic_setup_callbacks(mdev); + if (rc) { + dev_err(&pdev->dev, "Error setting up callbacks\n"); + goto err_nomem; + } + + rc = request_irq(pdev->irq, mic_interrupt, + IRQF_SHARED, "mic-intx", mdev); + if (rc) + goto err; + + dev_dbg(&pdev->dev, "intx irq setup\n"); + + return 0; +err: + mic_release_callbacks(mdev); +err_nomem: + return rc; + +} + +/** + * mic_setup_interrupts - Initializes interrupts. + * + * @mdev: pointer to mic_device instance + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +static int mic_setup_interrupts(struct mic_device *mdev) +{ + int rc; + + rc = mic_setup_msix(mdev); + if (!rc) + goto done; + + rc = mic_setup_msi(mdev); + if (!rc) + goto done; + + rc = mic_setup_intx(mdev); + if (rc) { + dev_err(&mdev->pdev->dev, "no usable interrupts\n"); + return rc; + } +done: + mdev->intr_ops->enable_interrupts(mdev); + return 0; +} + +/** + * mic_free_interrupts - Frees interrupts setup by mic_setup_interrupts + * + * @mdev: pointer to mic_device instance + * + * returns none. 
+ */ +static void mic_free_interrupts(struct mic_device *mdev) +{ + struct pci_dev *pdev = mdev->pdev; + int i; + + mdev->intr_ops->disable_interrupts(mdev); + if (mdev->irq_info.num_vectors > 1) { + for (i = 0; i < mdev->irq_info.num_vectors; i++) { + if (mdev->irq_info.mic_msi_map[i]) + dev_warn(&pdev->dev, "irq %d may still be in use.\n", + mdev->irq_info.msix_entries[i].vector); + } + mic_free_msi_map(mdev); + kfree(mdev->irq_info.msix_entries); + pci_disable_msix(pdev); + } else { + if (pci_dev_msi_enabled(mdev->pdev)) { + free_irq(mdev->pdev->irq, mdev); + mic_free_msi_map(mdev); + pci_disable_msi(pdev); + } else { + free_irq(mdev->pdev->irq, mdev); + } + mic_release_callbacks(mdev); + } +} + +/** + * mic_ops_init: Initialize HW specific operation tables. + * + * @mdev: pointer to mic_device instance + * + * returns none. + */ +static void mic_ops_init(struct mic_device *mdev) +{ + switch (mdev->family) { + case MIC_FAMILY_X100: + mdev->ops = &mic_x100_ops; + mdev->intr_ops = &mic_x100_intr_ops; + mdev->smpt_ops = &mic_x100_smpt_ops; + break; + default: + break; + } +} + +/** + * mic_get_family - Determine hardware family to which this MIC belongs. + * + * @mdev: pointer to mic_device instance + * + * returns family. 
+ */ +static enum mic_hw_family mic_get_family(struct mic_device *mdev) +{ + int dev_id = mdev->pdev->device; + enum mic_hw_family family; + + switch (dev_id) { + case MIC_X100_PCI_DEVICE_2250: + case MIC_X100_PCI_DEVICE_2251: + case MIC_X100_PCI_DEVICE_2252: + case MIC_X100_PCI_DEVICE_2253: + case MIC_X100_PCI_DEVICE_2254: + case MIC_X100_PCI_DEVICE_2255: + case MIC_X100_PCI_DEVICE_2256: + case MIC_X100_PCI_DEVICE_2257: + case MIC_X100_PCI_DEVICE_2258: + case MIC_X100_PCI_DEVICE_2259: + case MIC_X100_PCI_DEVICE_225a: + case MIC_X100_PCI_DEVICE_225b: + case MIC_X100_PCI_DEVICE_225c: + case MIC_X100_PCI_DEVICE_225d: + case MIC_X100_PCI_DEVICE_225e: + family = MIC_FAMILY_X100; + break; + default: + family = MIC_FAMILY_UNKNOWN; + break; + } + return family; +} + +/** + * mic_device_init - Allocates and initializes the MIC device structure + * + * @mdev: pointer to mic_device instance + * @pdev: The pci device structure + * + * returns none. + */ +static void +mic_device_init(struct mic_device *mdev, struct pci_dev *pdev) +{ + mdev->pdev = pdev; + INIT_LIST_HEAD(&mdev->vdev_list); + mutex_init(&mdev->mic_mutex); + mdev->family = mic_get_family(mdev); + mic_ops_init(mdev); + mic_sysfs_init(mdev); + INIT_WORK(&mdev->reset_trigger_work, mic_reset_trigger_work); + INIT_WORK(&mdev->shutdown_work, mic_shutdown_work); + init_completion(&mdev->reset_wait); + mdev->irq_info.next_avail_src = 0; +} + +/** + * mic_device_uninit - Frees resources allocated during mic_device_init(..) 
+ * + * @mdev: pointer to mic_device instance + * + * returns none + */ +static void mic_device_uninit(struct mic_device *mdev) +{ + /* The cmdline sysfs entry might have allocated cmdline */ + kfree(mdev->cmdline); + kfree(mdev->ipaddr); + kfree(mdev->firmware); + flush_work(&mdev->reset_trigger_work); + flush_work(&mdev->shutdown_work); +} + +/** + * mic_probe - Device Initialization Routine + * + * @pdev: PCI device structure + * @ent: entry in mic_pci_tbl + * + * returns 0 on success, < 0 on failure. + */ +static int mic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) +{ + int rc; + struct mic_device *mdev; + char name[20]; + + rc = g_mic.next_id++; + + snprintf(name, sizeof(name), "mic%d", rc); + mdev = kzalloc(sizeof(*mdev), GFP_KERNEL); + if (!mdev) { + rc = -ENOMEM; + dev_err(&pdev->dev, "dev kmalloc failed rc %d\n", rc); + goto dec_num_dev; + } + strncpy(mdev->name, name, sizeof(name)); + mdev->id = rc; + + mic_device_init(mdev, pdev); + + rc = pci_enable_device(pdev); + if (rc) { + dev_err(&pdev->dev, "failed to enable pci device.\n"); + goto uninit_device; + } + + pci_set_master(pdev); + + rc = pci_request_regions(pdev, mic_driver_name); + if (rc) { + dev_err(&pdev->dev, "failed to get pci regions.\n"); + goto disable_device; + } + + rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64)); + if (rc) { + dev_err(&pdev->dev, "Cannot set DMA mask\n"); + goto release_regions; + } + + mdev->mmio.pa = pci_resource_start(pdev, mdev->ops->mmio_bar); + mdev->mmio.len = pci_resource_len(pdev, mdev->ops->mmio_bar); + mdev->mmio.va = pci_ioremap_bar(pdev, mdev->ops->mmio_bar); + if (!mdev->mmio.va) { + dev_err(&pdev->dev, "Cannot remap MMIO BAR\n"); + rc = -EIO; + goto release_regions; + } + + mdev->aper.pa = pci_resource_start(pdev, mdev->ops->aper_bar); + mdev->aper.len = pci_resource_len(pdev, mdev->ops->aper_bar); + mdev->aper.va = ioremap_wc(mdev->aper.pa, mdev->aper.len); + if (!mdev->aper.va) { + dev_err(&pdev->dev, "Cannot remap Aperture BAR\n"); + rc = 
-EIO; + goto unmap_mmio; + } + + mdev->ops->init(mdev); + mdev->intr_ops->intr_init(mdev); + rc = mic_setup_interrupts(mdev); + if (rc) { + dev_err(&pdev->dev, "mic_setup_interrupts failed %d\n", rc); + goto unmap_aper; + } + rc = mic_smpt_init(mdev); + if (rc) { + dev_err(&pdev->dev, "smpt_init failed %d\n", rc); + goto free_interrupts; + } + + pci_set_drvdata(pdev, mdev); + + mdev->dev = device_create(g_mic.mic_class, &pdev->dev, + MKDEV(MAJOR(g_mic.dev), mdev->id), NULL, "%s", mdev->name); + if (IS_ERR(mdev->dev)) { + rc = PTR_ERR(mdev->dev); + dev_err(&pdev->dev, "device_create failed rc %d\n", rc); + goto smpt_uninit; + } + + rc = sysfs_create_group(&mdev->dev->kobj, &mdev->attr_group); + if (rc) { + dev_err(&pdev->dev, "sysfs_create_group failed rc %d\n", rc); + goto destroy_device; + } + mdev->state_sysfs = sysfs_get_dirent(mdev->dev->kobj.sd, + NULL, "state"); + if (!mdev->state_sysfs) { + rc = -ENODEV; + dev_err(&pdev->dev, "sysfs_get_dirent failed rc %d\n", rc); + goto destroy_group; + } + + rc = mic_dp_init(mdev); + if (rc) { + dev_err(&pdev->dev, "mic_dp_init failed rc %d\n", rc); + goto sysfs_put; + } + mutex_lock(&mdev->mic_mutex); + + mdev->shutdown_db = mic_next_db(mdev); + mdev->shutdown_cookie = mic_request_irq(mdev, mic_shutdown_db, + "shutdown-interrupt", mdev, mdev->shutdown_db, MIC_INTR_DB); + if (IS_ERR(mdev->shutdown_cookie)) { + rc = PTR_ERR(mdev->shutdown_cookie); + mutex_unlock(&mdev->mic_mutex); + goto dp_uninit; + } + mutex_unlock(&mdev->mic_mutex); + mic_bootparam_init(mdev); + + mic_create_debug_dir(mdev); + dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name); + return 0; +dp_uninit: + mic_dp_uninit(mdev); +sysfs_put: + sysfs_put(mdev->state_sysfs); +destroy_group: + sysfs_remove_group(&mdev->dev->kobj, &mdev->attr_group); +destroy_device: + device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.dev), mdev->id)); +smpt_uninit: + mic_smpt_uninit(mdev); +free_interrupts: + mic_free_interrupts(mdev); +unmap_aper: + 
iounmap(mdev->aper.va); +unmap_mmio: + iounmap(mdev->mmio.va); +release_regions: + pci_release_regions(pdev); +disable_device: + pci_disable_device(pdev); +uninit_device: + mic_device_uninit(mdev); + kfree(mdev); +dec_num_dev: + g_mic.next_id--; + dev_err(&pdev->dev, "Probe failed rc %d\n", rc); + return rc; +} + +/** + * mic_remove - Device Removal Routine + * + * @pdev: PCI device structure + * + * mic_remove is called by the PCI subsystem to alert the driver + * that it should release a PCI device. + */ +static void mic_remove(struct pci_dev *pdev) +{ + struct mic_device *mdev; + int id; + + mdev = pci_get_drvdata(pdev); + if (!mdev) + return; + + id = mdev->id; + + mic_stop(mdev, false); + mic_delete_debug_dir(mdev); + mutex_lock(&mdev->mic_mutex); + mic_free_irq(mdev, mdev->shutdown_cookie, mdev); + mutex_unlock(&mdev->mic_mutex); + flush_work(&mdev->shutdown_work); + mic_dp_uninit(mdev); + sysfs_put(mdev->state_sysfs); + sysfs_remove_group(&mdev->dev->kobj, &mdev->attr_group); + device_destroy(g_mic.mic_class, MKDEV(MAJOR(g_mic.dev), mdev->id)); + mic_smpt_uninit(mdev); + mic_free_interrupts(mdev); + iounmap(mdev->mmio.va); + iounmap(mdev->aper.va); + mic_device_uninit(mdev); + pci_release_regions(pdev); + pci_disable_device(pdev); + kfree(mdev); + dev_dbg(&pdev->dev, "Removed mic%d\n", id); +} +static struct pci_driver mic_driver = { + .name = mic_driver_name, + .id_table = mic_pci_tbl, + .probe = mic_probe, + .remove = mic_remove +}; + +static int __init mic_init(void) +{ + int ret; + + ret = alloc_chrdev_region(&g_mic.dev, 0, + MIC_MAX_NUM_DEVS, mic_driver_name); + if (ret) { + pr_err("alloc_chrdev_region failed ret %d\n", ret); + goto error; + } + + g_mic.mic_class = class_create(THIS_MODULE, mic_driver_name); + if (IS_ERR(g_mic.mic_class)) { + ret = PTR_ERR(g_mic.mic_class); + pr_err("class_create failed ret %d\n", ret); + goto cleanup_chrdev; + } + + mic_init_debugfs(); + ret = pci_register_driver(&mic_driver); + if (ret) { + pr_err("pci_register_driver 
failed ret %d\n", ret); + goto cleanup_debugfs; + } + return ret; +cleanup_debugfs: + mic_exit_debugfs(); + class_destroy(g_mic.mic_class); +cleanup_chrdev: + unregister_chrdev_region(g_mic.dev, MIC_MAX_NUM_DEVS); +error: + return ret; +} + +static void __exit mic_exit(void) +{ + pci_unregister_driver(&mic_driver); + mic_exit_debugfs(); + class_destroy(g_mic.mic_class); + unregister_chrdev_region(g_mic.dev, MIC_MAX_NUM_DEVS); +} + +module_init(mic_init); +module_exit(mic_exit); + +MODULE_AUTHOR("Intel Corporation"); +MODULE_DESCRIPTION("Intel(R) MIC X100 Host driver"); +MODULE_LICENSE("GPL"); diff --git a/drivers/misc/mic/host/mic_smpt.c b/drivers/misc/mic/host/mic_smpt.c new file mode 100644 index 0000000..459303a --- /dev/null +++ b/drivers/misc/mic/host/mic_smpt.c @@ -0,0 +1,441 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. 
+ * + */ +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/sched.h> + +#include "mic_common.h" + +static inline u64 mic_system_page_mask(struct mic_device *mdev) +{ + return (1ULL << mdev->smpt->info.page_shift) - 1ULL; +} + +static inline u64 mic_sys_addr_to_smpt(struct mic_device *mdev, dma_addr_t pa) +{ + return (pa - mdev->smpt->info.base) >> mdev->smpt->info.page_shift; +} + +static inline u64 mic_smpt_to_pa(struct mic_device *mdev, u64 index) +{ + return mdev->smpt->info.base + (index * mdev->smpt->info.page_size); +} + +static inline u64 mic_smpt_offset(struct mic_device *mdev, dma_addr_t pa) +{ + return pa & mic_system_page_mask(mdev); +} + +static inline u64 mic_smpt_align_low(struct mic_device *mdev, dma_addr_t pa) +{ + return ALIGN(pa - mic_system_page_mask(mdev), + mdev->smpt->info.page_size); +} + +static inline u64 mic_smpt_align_high(struct mic_device *mdev, dma_addr_t pa) +{ + return ALIGN(pa, mdev->smpt->info.page_size); +} + +/* Total Cumulative system memory accessible by MIC across all SMPT entries */ +static inline u64 mic_max_system_memory(struct mic_device *mdev) +{ + return mdev->smpt->info.num_reg * mdev->smpt->info.page_size; +} + +/* Maximum system memory address accessible by MIC */ +static inline u64 mic_max_system_addr(struct mic_device *mdev) +{ + return mdev->smpt->info.base + mic_max_system_memory(mdev) - 1ULL; +} + +/* Check if the DMA address is a MIC system memory address */ +static inline bool +mic_is_system_addr(struct mic_device *mdev, dma_addr_t pa) +{ + return pa >= mdev->smpt->info.base && pa <= mic_max_system_addr(mdev); +} + +/* Populate an SMPT entry and update the reference counts. 
*/ +static void add_smpt_entry(int spt, s64 *ref, u64 addr, + int entries, struct mic_device *mdev) +{ + struct mic_smpt_info *smpt_info = mdev->smpt; + int i; + + for (i = spt; i < spt + entries; i++, + addr += smpt_info->info.page_size) { + if (!smpt_info->entry[i].ref_count && + (smpt_info->entry[i].dma_addr != addr)) { + mdev->smpt_ops->set(mdev, addr, i); + smpt_info->entry[i].dma_addr = addr; + } + smpt_info->entry[i].ref_count += ref[i - spt]; + } +} + +/* + * Find an available MIC address in MIC SMPT address space + * for a given DMA address and size. + */ +static dma_addr_t smpt_op(struct mic_device *mdev, u64 dma_addr, + int entries, s64 *ref, size_t size) +{ + int spt = -1; /* smpt index */ + int ee = 0; /* existing entries */ + int fe = 0; /* free entries */ + int i; + unsigned long flags; + dma_addr_t mic_addr = 0; + dma_addr_t addr = dma_addr; + struct mic_smpt_info *smpt_info = mdev->smpt; + + spin_lock_irqsave(&smpt_info->smpt_lock, flags); + + /* find existing entries */ + for (i = 0; i < smpt_info->info.num_reg; i++) { + if (smpt_info->entry[i].dma_addr == addr) { + ee++; + addr += smpt_info->info.page_size; + } else if (ee) /* cannot find contiguous entries */ + goto not_found; + + if (ee == entries) + goto found; + } + + /* find free entry */ + for (i = 0; i < smpt_info->info.num_reg; i++) { + fe = (smpt_info->entry[i].ref_count == 0) ? 
fe + 1 : 0; + if (fe == entries) + goto found; + } + +not_found: + spin_unlock_irqrestore(&smpt_info->smpt_lock, flags); + return mic_addr; + +found: + spt = i - entries + 1; + mic_addr = mic_smpt_to_pa(mdev, spt); + add_smpt_entry(spt, ref, dma_addr, entries, mdev); + smpt_info->map_count++; + smpt_info->ref_count += (s64)size; + spin_unlock_irqrestore(&smpt_info->smpt_lock, flags); + return mic_addr; +} + +/* + * Returns the number of smpt entries needed for dma_addr to dma_addr + size. + * Also returns the reference count array for each of those entries + * and the starting smpt address. + */ +static int get_smpt_ref_count(s64 *ref, dma_addr_t dma_addr, size_t size, + u64 *smpt_start, struct mic_device *mdev) +{ + u64 start = dma_addr; + u64 end = dma_addr + size; + int i = 0; + + while (start < end) { + ref[i++] = min(mic_smpt_align_high(mdev, start + 1), + end) - start; + start = mic_smpt_align_high(mdev, start + 1); + } + + if (smpt_start) + *smpt_start = mic_smpt_align_low(mdev, dma_addr); + + return i; +} + +/* + * mic_to_dma_addr - Converts a MIC address to a DMA address. + * + * @mdev: pointer to mic_device instance. + * @mic_addr: MIC address. + * + * returns a DMA address. + */ +static dma_addr_t +mic_to_dma_addr(struct mic_device *mdev, dma_addr_t mic_addr) +{ + struct mic_smpt_info *smpt_info = mdev->smpt; + int spt; + dma_addr_t dma_addr; + + if (!mic_is_system_addr(mdev, mic_addr)) { + WARN_ON(1); + return -EINVAL; + } + spt = mic_sys_addr_to_smpt(mdev, mic_addr); + dma_addr = smpt_info->entry[spt].dma_addr + + mic_smpt_offset(mdev, mic_addr); + return dma_addr; +} + +/** + * mic_map - Maps a DMA address to a MIC physical address. + * + * @mdev: pointer to mic_device instance. + * @dma_addr: DMA address. + * @size: Size of the region to be mapped. + * + * This API converts the DMA address provided to a DMA address understood + * by MIC. The caller should check for errors by calling mic_map_error(..). + * + * returns DMA address as required by MIC. 
+ */ +dma_addr_t mic_map(struct mic_device *mdev, dma_addr_t dma_addr, size_t size) +{ + dma_addr_t mic_addr = 0; + int entries; + s64 *ref; + u64 smpt_start; + + if (!size || size > mic_max_system_memory(mdev)) + return mic_addr; + + ref = kmalloc(mdev->smpt->info.num_reg * sizeof(s64), GFP_KERNEL); + if (!ref) + return mic_addr; + + /* + * Get number of smpt entries to be mapped, ref count array + * and the starting smpt address to start the search for + * free or existing smpt entries. + */ + entries = get_smpt_ref_count(ref, dma_addr, size, &smpt_start, mdev); + + /* Set the smpt table appropriately and get 16G aligned mic address */ + mic_addr = smpt_op(mdev, smpt_start, entries, ref, size); + + kfree(ref); + + /* + * If mic_addr is zero then it is an error case, + * since a valid mic_addr can never be zero. + * Otherwise generate mic_addr by adding the 16G offset in dma_addr. + */ + if (!mic_addr && MIC_FAMILY_X100 == mdev->family) { + WARN_ON(1); + return mic_addr; + } else { + return mic_addr + (dma_addr & mic_system_page_mask(mdev)); + } +} + +/** + * mic_unmap - Unmaps a MIC physical address. + * + * @mdev: pointer to mic_device instance. + * @mic_addr: MIC physical address. + * @size: Size of the region to be unmapped. + * + * This API unmaps the mappings created by mic_map(..). + * + * returns None. 
+ */ +void mic_unmap(struct mic_device *mdev, dma_addr_t mic_addr, size_t size) +{ + struct mic_smpt_info *smpt_info = mdev->smpt; + s64 *ref; + int num_smpt; + int spt; + int i; + unsigned long flags; + + if (!size) + return; + + if (!mic_is_system_addr(mdev, mic_addr)) { + WARN_ON(1); + return; + } + + spt = mic_sys_addr_to_smpt(mdev, mic_addr); + ref = kmalloc(mdev->smpt->info.num_reg * sizeof(s64), GFP_KERNEL); + if (!ref) + return; + + /* Get number of smpt entries to be unmapped, ref count array */ + num_smpt = get_smpt_ref_count(ref, mic_addr, size, NULL, mdev); + + spin_lock_irqsave(&smpt_info->smpt_lock, flags); + smpt_info->unmap_count++; + smpt_info->ref_count -= (s64)size; + + for (i = spt; i < spt + num_smpt; i++) { + smpt_info->entry[i].ref_count -= ref[i - spt]; + WARN_ON(smpt_info->entry[i].ref_count < 0); + } + spin_unlock_irqrestore(&smpt_info->smpt_lock, flags); + kfree(ref); +} + +/** + * mic_map_single - Maps a virtual address to a MIC physical address. + * + * @mdev: pointer to mic_device instance. + * @va: Kernel direct mapped virtual address. + * @size: Size of the region to be mapped. + * + * This API calls pci_map_single(..) for the direct mapped virtual address + * and then converts the DMA address provided to a DMA address understood + * by MIC. The caller should check for errors by calling mic_map_error(..). + * + * returns DMA address as required by MIC. + */ +dma_addr_t mic_map_single(struct mic_device *mdev, void *va, size_t size) +{ + dma_addr_t mic_addr = 0; + dma_addr_t dma_addr = + pci_map_single(mdev->pdev, va, size, PCI_DMA_BIDIRECTIONAL); + + if (!pci_dma_mapping_error(mdev->pdev, dma_addr)) { + mic_addr = mic_map(mdev, dma_addr, size); + if (!mic_addr) { + dev_err(&mdev->pdev->dev, + "mic_map failed dma_addr 0x%llx size 0x%lx\n", + dma_addr, size); + pci_unmap_single(mdev->pdev, dma_addr, + size, PCI_DMA_BIDIRECTIONAL); + } + } + return mic_addr; +} + +/** + * mic_unmap_single - Unmaps a MIC physical address. 
+ * + * @mdev: pointer to mic_device instance. + * @mic_addr: MIC physical address. + * @size: Size of the region to be unmapped. + * + * This API unmaps the mappings created by mic_map_single(..). + * + * returns None. + */ +void +mic_unmap_single(struct mic_device *mdev, dma_addr_t mic_addr, size_t size) +{ + dma_addr_t dma_addr = mic_to_dma_addr(mdev, mic_addr); + mic_unmap(mdev, mic_addr, size); + pci_unmap_single(mdev->pdev, dma_addr, size, PCI_DMA_BIDIRECTIONAL); +} + +/** + * mic_smpt_init - Initialize MIC System Memory Page Tables. + * + * @mdev: pointer to mic_device instance. + * + * returns 0 for success and -errno for error. + */ +int mic_smpt_init(struct mic_device *mdev) +{ + int i, err = 0; + dma_addr_t dma_addr; + struct mic_smpt_info *smpt_info; + + mdev->smpt = kmalloc(sizeof(*mdev->smpt), GFP_KERNEL); + if (!mdev->smpt) + return -ENOMEM; + + smpt_info = mdev->smpt; + mdev->smpt_ops->init(mdev); + smpt_info->entry = kmalloc(sizeof(struct mic_smpt) + * smpt_info->info.num_reg, GFP_KERNEL); + if (!smpt_info->entry) { + err = -ENOMEM; + goto free_smpt; + } + spin_lock_init(&smpt_info->smpt_lock); + for (i = 0; i < smpt_info->info.num_reg; i++) { + dma_addr = i * smpt_info->info.page_size; + smpt_info->entry[i].dma_addr = dma_addr; + smpt_info->entry[i].ref_count = 0; + mdev->smpt_ops->set(mdev, dma_addr, i); + } + smpt_info->ref_count = 0; + smpt_info->map_count = 0; + smpt_info->unmap_count = 0; + return 0; +free_smpt: + kfree(smpt_info); + return err; +} + +/** + * mic_smpt_uninit - UnInitialize MIC System Memory Page Tables. + * + * @mdev: pointer to mic_device instance. + * + * returns None. 
+ */ +void mic_smpt_uninit(struct mic_device *mdev) +{ + struct mic_smpt_info *smpt_info = mdev->smpt; + int i; + + dev_dbg(&mdev->pdev->dev, + "nodeid %d SMPT ref count %lld map %lld unmap %lld\n", + mdev->id, smpt_info->ref_count, + smpt_info->map_count, smpt_info->unmap_count); + + for (i = 0; i < smpt_info->info.num_reg; i++) { + dev_dbg(&mdev->pdev->dev, + "SMPT entry[%d] dma_addr = 0x%llx ref_count = %lld\n", + i, smpt_info->entry[i].dma_addr, + smpt_info->entry[i].ref_count); + WARN_ON(smpt_info->entry[i].ref_count); + } + kfree(smpt_info->entry); + kfree(smpt_info); +} + +/** + * mic_smpt_restore - Restore MIC System Memory Page Tables. + * + * @mdev: pointer to mic_device instance. + * + * Restore the SMPT registers to values previously stored in the + * SW data structures. Some MIC steppings lose register state + * across resets and this API should be called for performing + * a restore operation if required. + * + * returns None. + */ +void mic_smpt_restore(struct mic_device *mdev) +{ + int i; + dma_addr_t dma_addr; + + for (i = 0; i < mdev->smpt->info.num_reg; i++) { + dma_addr = mdev->smpt->entry[i].dma_addr; + mdev->smpt_ops->set(mdev, dma_addr, i); + } +} + diff --git a/drivers/misc/mic/host/mic_smpt.h b/drivers/misc/mic/host/mic_smpt.h new file mode 100644 index 0000000..9dc1d4c --- /dev/null +++ b/drivers/misc/mic/host/mic_smpt.h @@ -0,0 +1,103 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#ifndef MIC_SMPT_H +#define MIC_SMPT_H +/** + * struct mic_smpt_ops - MIC HW specific SMPT operations. + * @init: Initialize hardware specific SMPT information in mic_smpt_hw_info. + * @set: Set the value for a particular SMPT entry. + */ +struct mic_smpt_ops { + void (*init)(struct mic_device *mdev); + void (*set)(struct mic_device *mdev, dma_addr_t dma_addr, u8 index); +}; + +/** + * struct mic_smpt - MIC SMPT entry information. + * @dma_addr: Base DMA address for this SMPT entry. + * @ref_count: Number of active mappings for this SMPT entry in bytes. + */ +struct mic_smpt { + dma_addr_t dma_addr; + s64 ref_count; +}; + +/** + * struct mic_smpt_hw_info - MIC SMPT hardware specific information. + * @num_reg: Number of SMPT registers. + * @page_shift: System memory page shift. + * @page_size: System memory page size. + * @base: System address base. + */ +struct mic_smpt_hw_info { + u8 num_reg; + u8 page_shift; + u64 page_size; + u64 base; +}; + +/** + * struct mic_smpt_info - MIC SMPT information. + * @entry: Array of SMPT entries. + * @smpt_lock: Spin lock protecting access to SMPT data structures. + * @info: Hardware specific SMPT information. + * @ref_count: Number of active SMPT mappings (for debug). + * @map_count: Number of SMPT mappings created (for debug). + * @unmap_count: Number of SMPT mappings destroyed (for debug). 
+ */ +struct mic_smpt_info { + struct mic_smpt *entry; + spinlock_t smpt_lock; + struct mic_smpt_hw_info info; + s64 ref_count; + s64 map_count; + s64 unmap_count; +}; + +dma_addr_t mic_map_single(struct mic_device *mdev, void *va, size_t size); +void mic_unmap_single(struct mic_device *mdev, + dma_addr_t mic_addr, size_t size); +dma_addr_t mic_map(struct mic_device *mdev, + dma_addr_t dma_addr, size_t size); +void mic_unmap(struct mic_device *mdev, dma_addr_t mic_addr, size_t size); + +/** + * mic_map_error - Check a MIC address for errors. + * + * @mic_addr: MIC address returned by the mic_map..(..) APIs. + * + * returns true if there was an error during the mic_map..(..) APIs. + */ +static inline bool mic_map_error(dma_addr_t mic_addr) +{ + return !mic_addr; +} + +int mic_smpt_init(struct mic_device *mdev); +void mic_smpt_uninit(struct mic_device *mdev); +void mic_smpt_restore(struct mic_device *mdev); + +#endif diff --git a/drivers/misc/mic/host/mic_sysfs.c b/drivers/misc/mic/host/mic_sysfs.c new file mode 100644 index 0000000..d4d8ee6 --- /dev/null +++ b/drivers/misc/mic/host/mic_sysfs.c @@ -0,0 +1,360 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". 
+ * + * Intel MIC Host driver. + * + */ +#include <linux/module.h> +#include <linux/pci.h> + +#include "mic_common.h" + +/* + * A state-to-string lookup table, for exposing a human readable state + * via sysfs. Always keep in sync with enum mic_states + */ +static const char * const mic_state_string[] = { + [MIC_OFFLINE] = "offline", + [MIC_ONLINE] = "online", + [MIC_SHUTTING_DOWN] = "shutting_down", + [MIC_RESET_FAILED] = "reset_failed", +}; + +/* + * A shutdown-status-to-string lookup table, for exposing a human + * readable state via sysfs. Always keep in sync with enum mic_shutdown_status + */ +static const char * const mic_shutdown_status_string[] = { + [MIC_NOP] = "nop", + [MIC_CRASHED] = "crashed", + [MIC_HALTED] = "halted", + [MIC_POWER_OFF] = "poweroff", + [MIC_RESTART] = "restart", +}; + +void mic_set_shutdown_status(struct mic_device *mdev, u8 shutdown_status) +{ + WARN_ON(!mutex_is_locked(&mdev->mic_mutex)); + dev_info(&mdev->pdev->dev, "Shutdown Status %s -> %s\n", + mic_shutdown_status_string[mdev->shutdown_status], + mic_shutdown_status_string[shutdown_status]); + mdev->shutdown_status = shutdown_status; +} + +void mic_set_state(struct mic_device *mdev, u8 state) +{ + WARN_ON(!mutex_is_locked(&mdev->mic_mutex)); + dev_info(&mdev->pdev->dev, "State %s -> %s\n", + mic_state_string[mdev->state], + mic_state_string[state]); + mdev->state = state; + sysfs_notify_dirent(mdev->state_sysfs); +} + +static ssize_t +show_family(struct device *dev, struct device_attribute *attr, char *buf) +{ + static const char x100[] = "x100"; + static const char unknown[] = "Unknown"; + const char *card = NULL; + struct mic_device *mdev = dev_get_drvdata(dev->parent); + + if (!mdev) + return -EINVAL; + + switch (mdev->family) { + case MIC_FAMILY_X100: + card = x100; + break; + default: + card = unknown; + break; + } + return snprintf(buf, PAGE_SIZE, "%s\n", card); +} +static DEVICE_ATTR(family, S_IRUGO, show_family, NULL); + +static ssize_t +show_stepping(struct device *dev, 
struct device_attribute *attr, char *buf) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + char *string = "??"; + + if (!mdev) + return -EINVAL; + + switch (mdev->family) { + case MIC_FAMILY_X100: + switch (mdev->stepping) { + case MIC_A0_STEP: + string = "A0"; + break; + case MIC_B0_STEP: + string = "B0"; + break; + case MIC_B1_STEP: + string = "B1"; + break; + case MIC_C0_STEP: + string = "C0"; + break; + default: + break; + } + break; + default: + break; + } + return snprintf(buf, PAGE_SIZE, "%s\n", string); +} +static DEVICE_ATTR(stepping, S_IRUGO, show_stepping, NULL); + +static ssize_t +show_micstate(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + + if (!mdev || mdev->state >= MIC_LAST) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%s", + mic_state_string[mdev->state]); +} + +static ssize_t +set_micstate(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + int rc = 0; + struct mic_device *mdev = dev_get_drvdata(dev->parent); + if (!mdev) + return -EINVAL; + if (!strncmp(buf, "boot", strlen("boot"))) { + rc = mic_start(mdev, buf); + if (rc) { + dev_err(&mdev->pdev->dev, + "mic_boot failed rc %d\n", rc); + count = rc; + } + goto done; + } + + if (sysfs_streq(buf, "reset")) { + schedule_work(&mdev->reset_trigger_work); + goto done; + } + + if (sysfs_streq(buf, "shutdown")) { + mic_shutdown(mdev); + goto done; + } + + count = -EINVAL; +done: + return count; +} +static DEVICE_ATTR(state, S_IRUGO|S_IWUSR, show_micstate, set_micstate); + +static ssize_t show_mic_shutdown_status(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + + if (!mdev || mdev->shutdown_status >= MIC_STATUS_LAST) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%s", + mic_shutdown_status_string[mdev->shutdown_status]); +} +static DEVICE_ATTR(shutdown_status, S_IRUGO|S_IWUSR, + 
show_mic_shutdown_status, NULL); + +static ssize_t +show_ipaddr(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + char *ipaddr; + + if (!mdev) + return -EINVAL; + + ipaddr = mdev->ipaddr; + + if (ipaddr) + return snprintf(buf, PAGE_SIZE, "%s\n", ipaddr); + return 0; +} + +static ssize_t +set_ipaddr(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + + if (!mdev) + return -EINVAL; + + kfree(mdev->ipaddr); + + mdev->ipaddr = kzalloc(count + 1, GFP_KERNEL); + if (!mdev->ipaddr) + return -ENOMEM; + + strncpy(mdev->ipaddr, buf, count); + + if (mdev->ipaddr[count - 1] == '\n') + mdev->ipaddr[count - 1] = '\0'; + else + mdev->ipaddr[count] = '\0'; + return count; +} +static DEVICE_ATTR(ipaddr, S_IRUGO | S_IWUSR, show_ipaddr, set_ipaddr); + +static ssize_t +show_cmdline(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + char *cmdline; + + if (!mdev) + return -EINVAL; + + cmdline = mdev->cmdline; + + if (cmdline) + return snprintf(buf, PAGE_SIZE, "%s\n", cmdline); + return 0; +} + +static ssize_t +set_cmdline(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + + if (!mdev) + return -EINVAL; + + kfree(mdev->cmdline); + + mdev->cmdline = kmalloc(count + 1, GFP_KERNEL); + if (!mdev->cmdline) + return -ENOMEM; + + strncpy(mdev->cmdline, buf, count); + + if (mdev->cmdline[count - 1] == '\n') + mdev->cmdline[count - 1] = '\0'; + else + mdev->cmdline[count] = '\0'; + + return count; +} +static DEVICE_ATTR(cmdline, S_IRUGO | S_IWUSR, show_cmdline, set_cmdline); + +static ssize_t +show_log_buf_addr(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + + if (!mdev) + return 
-EINVAL; + + return snprintf(buf, PAGE_SIZE, "%p\n", mdev->log_buf_addr); +} + +static ssize_t +store_log_buf_addr(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + int ret; + unsigned long addr; + + if (!mdev) + return -EINVAL; + + ret = kstrtoul(buf, 16, &addr); + if (ret) + goto exit; + + mdev->log_buf_addr = (void *)addr; + ret = count; +exit: + return ret; +} +static DEVICE_ATTR(log_buf_addr, S_IRUGO | S_IWUSR, + show_log_buf_addr, store_log_buf_addr); + +static ssize_t +show_log_buf_len(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + + if (!mdev) + return -EINVAL; + + return snprintf(buf, PAGE_SIZE, "%p\n", mdev->log_buf_len); +} + +static ssize_t +store_log_buf_len(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct mic_device *mdev = dev_get_drvdata(dev->parent); + int ret; + unsigned long addr; + + if (!mdev) + return -EINVAL; + + ret = kstrtoul(buf, 16, &addr); + if (ret) + goto exit; + + mdev->log_buf_len = (int *)addr; + ret = count; +exit: + return ret; +} +static DEVICE_ATTR(log_buf_len, S_IRUGO | S_IWUSR, + show_log_buf_len, store_log_buf_len); + +static struct attribute *default_attrs[] = { + &dev_attr_family.attr, + &dev_attr_stepping.attr, + &dev_attr_state.attr, + &dev_attr_shutdown_status.attr, + &dev_attr_cmdline.attr, + &dev_attr_ipaddr.attr, + &dev_attr_log_buf_addr.attr, + &dev_attr_log_buf_len.attr, + + NULL +}; + +void mic_sysfs_init(struct mic_device *mdev) +{ + mdev->attr_group.attrs = default_attrs; +} diff --git a/drivers/misc/mic/host/mic_x100.c b/drivers/misc/mic/host/mic_x100.c new file mode 100644 index 0000000..41ca272 --- /dev/null +++ b/drivers/misc/mic/host/mic_x100.c @@ -0,0 +1,665 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/sched.h> +#include <linux/firmware.h> +#include <linux/delay.h> + +#include "mic_common.h" + +/* + * mic_x100_hw_init - Initialize hardware information. + * + * @mdev: pointer to mic_device instance + * + * returns none: + */ +static void mic_x100_hw_init(struct mic_device *mdev) +{ + mdev->stepping = mdev->pdev->revision; +} + +/** + * mic_x100_write_spad() - write to the scratchpad register + * @mdev: pointer to mic_device instance + * @idx: index to the scratchpad register, 0 based + * @val: the data value to put into the register + * + * This function allows writing of a 32bit value to the indexed scratchpad + * register. + * + * RETURNS: none. 
+ */ +static void +mic_x100_write_spad(struct mic_device *mdev, unsigned int idx, u32 val) +{ + dev_dbg(&mdev->pdev->dev, "Writing 0x%x to scratch pad index %d\n", + val, idx); + mic_mmio_write(&mdev->mmio, val, + MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_SPAD0 + idx * 4); +} + +/** + * mic_x100_read_spad() - read from the scratchpad register + * @mdev: pointer to mic_device instance + * @idx: index to scratchpad register, 0 based + * + * This function allows reading of the 32bit scratchpad register. + * + * RETURNS: The value read from the scratchpad register. + */ +static u32 +mic_x100_read_spad(struct mic_device *mdev, unsigned int idx) +{ + u32 val = mic_mmio_read(&mdev->mmio, + MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_SPAD0 + idx * 4); + + dev_dbg(&mdev->pdev->dev, + "Reading 0x%x from scratch pad index %d\n", val, idx); + return val; +} + +/* + * mic_x100_reset_fw_ready - Reset Firmware ready status field. + * @mdev: pointer to mic_device instance + */ +static void mic_x100_reset_fw_ready(struct mic_device *mdev) +{ + mdev->ops->write_spad(mdev, MIC_X100_DOWNLOAD_INFO, 0); +} + +/* + * mic_x100_is_fw_ready - Check if firmware is ready. + * @mdev: pointer to mic_device instance + */ +static bool mic_x100_is_fw_ready(struct mic_device *mdev) +{ + u32 scratch2 = mdev->ops->read_spad(mdev, MIC_X100_DOWNLOAD_INFO); + return MIC_X100_SPAD2_DOWNLOAD_STATUS(scratch2) ? true : false; +} + +/** + * mic_x100_hw_reset - Reset the MIC device. 
+ * @mdev: pointer to mic_device instance + */ +static void mic_x100_hw_reset(struct mic_device *mdev) +{ + u32 reset_reg; + u32 rgcr = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_RGCR; + struct mic_mw *mw = &mdev->mmio; + + /* Ensure all loads and stores have completed */ + mb(); + /* Trigger reset */ + reset_reg = mic_mmio_read(mw, rgcr); + reset_reg |= 0x1; + mic_mmio_write(mw, reset_reg, rgcr); + /* + * It seems we really want to delay at least 1 second + * after touching reset to prevent a lot of problems. + */ + msleep(1000); +} + +/** + * mic_x100_enable_interrupts - Enable interrupts. + * @mdev: pointer to mic_device instance + */ +static void mic_x100_enable_interrupts(struct mic_device *mdev) +{ + u32 reg; + struct mic_mw *mw = &mdev->mmio; + u32 sice0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICE0; + u32 siac0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SIAC0; + + reg = mic_mmio_read(mw, sice0); + reg |= MIC_X100_SBOX_DBR_BITS(0xf) | MIC_X100_SBOX_DMA_BITS(0xff); + mic_mmio_write(mw, reg, sice0); + + /* Enable auto-clear when enabling interrupts. + * Applicable only for MSI-x interrupts. Legacy + * and MSI mode cannot have auto-clear enabled */ + if (mdev->irq_info.num_vectors > 1) { + reg = mic_mmio_read(mw, siac0); + reg |= MIC_X100_SBOX_DBR_BITS(0xf) | + MIC_X100_SBOX_DMA_BITS(0xff); + mic_mmio_write(mw, reg, siac0); + } +} + +/** + * mic_x100_disable_interrupts - Disable interrupts. 
+ * @mdev: pointer to mic_device instance + */ +static void mic_x100_disable_interrupts(struct mic_device *mdev) +{ + u32 reg; + struct mic_mw *mw = &mdev->mmio; + u32 sice0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICE0; + u32 siac0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SIAC0; + u32 sicc0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICC0; + + reg = mic_mmio_read(mw, sice0); + mic_mmio_write(mw, reg, sicc0); + + if (mdev->irq_info.num_vectors > 1) { + reg = mic_mmio_read(mw, siac0); + reg &= ~(MIC_X100_SBOX_DBR_BITS(0xf) | + MIC_X100_SBOX_DMA_BITS(0xff)); + mic_mmio_write(mw, reg, siac0); + } +} + +/** + * mic_x100_get_apic_id - Get bootstrap APIC ID. + * @mdev: pointer to mic_device instance + */ +static u32 mic_x100_get_apic_id(struct mic_device *mdev) +{ + u32 scratch2 = 0; + + scratch2 = mdev->ops->read_spad(mdev, MIC_X100_DOWNLOAD_INFO); + return MIC_X100_SPAD2_APIC_ID(scratch2); +} + +/** + * mic_x100_send_firmware_intr - Send an interrupt to the firmware on MIC. + * @mdev: pointer to mic_device instance + */ +static void mic_x100_send_firmware_intr(struct mic_device *mdev) +{ + u32 apicicr_low; + u64 apic_icr_offset = MIC_X100_SBOX_APICICR7; + int vector = MIC_X100_BSP_INTERRUPT_VECTOR; + struct mic_mw *mw = &mdev->mmio; + + /* + * For MIC we need to make sure we "hit" + * the send_icr bit (13). + */ + apicicr_low = (vector | (1 << 13)); + + mic_mmio_write(mw, mic_x100_get_apic_id(mdev), + MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset + 4); + + /* Ensure all stores have been completed before informing card */ + wmb(); + + /* + * MIC card interrupt triggers only when we write + * the lower part of the address (upper bits). + */ + mic_mmio_write(mw, apicicr_low, + MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset); +} + +/** + * mic_x100_send_sbox_intr - Send an MIC_X100_SBOX interrupt to MIC. 
+ * @mdev: pointer to mic_device instance + */ +static void mic_x100_send_sbox_intr(struct mic_device *mdev, + int doorbell) +{ + struct mic_mw *mw = &mdev->mmio; + u64 apic_icr_offset = MIC_X100_SBOX_APICICR0 + doorbell * 8; + u32 apicicr_low = mic_mmio_read(mw, + MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset); + + /* for MIC we need to make sure we "hit" the send_icr bit (13) */ + apicicr_low = (apicicr_low | (1 << 13)); + + /* Ensure all stores have been completed before sending an interrupt */ + wmb(); + /* MIC card only triggers when we write the lower part of the + * address (upper bits) + */ + mic_mmio_write(mw, apicicr_low, + MIC_X100_SBOX_BASE_ADDRESS + apic_icr_offset); +} + +/** + * mic_x100_send_rdmasr_intr - Send an RDMASR interrupt to MIC. + * @mdev: pointer to mic_device instance + */ +static void mic_x100_send_rdmasr_intr(struct mic_device *mdev, + int doorbell) +{ + int rdmasr_offset = MIC_X100_SBOX_RDMASR0 + (doorbell << 2); + mic_mmio_write(&mdev->mmio, 0, + MIC_X100_SBOX_BASE_ADDRESS + rdmasr_offset); +} + +/** + * mic_x100_send_intr - Send interrupt to MIC. + * @mdev: pointer to mic_device instance + * @doorbell: doorbell number. + */ +static void mic_x100_send_intr(struct mic_device *mdev, int doorbell) +{ + int rdmasr_db; + if (doorbell < MIC_X100_NUM_SBOX_IRQ) { + mic_x100_send_sbox_intr(mdev, doorbell); + } else { + rdmasr_db = doorbell - MIC_X100_NUM_SBOX_IRQ + + MIC_X100_RDMASR_IRQ_BASE; + mic_x100_send_rdmasr_intr(mdev, rdmasr_db); + } +} + +/** + * mic_x100_ack_interrupt - Device specific interrupt handling. + * @mdev: pointer to mic_device instance + * + * Returns: bitmask of doorbell events triggered. + */ +static u32 mic_x100_ack_interrupt(struct mic_device *mdev) +{ + u32 reg = 0; + struct mic_mw *mw = &mdev->mmio; + u32 sicr0 = MIC_X100_SBOX_BASE_ADDRESS + MIC_X100_SBOX_SICR0; + + /* Clear pending bit array. 
*/ + if (MIC_A0_STEP == mdev->stepping) + mic_mmio_write(mw, 1, MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_MSIXPBACR_K1OM); + + if (mdev->irq_info.num_vectors <= 1) { + reg = mic_mmio_read(mw, sicr0); + + if (unlikely(!reg)) + goto done; + + mic_mmio_write(mw, reg, sicr0); + } + + if (mdev->stepping >= MIC_B0_STEP) + mdev->intr_ops->enable_interrupts(mdev); +done: + return reg; +} + +/** + * mic_x100_hw_intr_init() - Initialize h/w specific interrupt + * information. + * @mdev: pointer to mic_device instance + */ +static void mic_x100_hw_intr_init(struct mic_device *mdev) +{ + mdev->intr_info = (struct mic_intr_info *) mic_x100_intr_init; +} + +/** + * mic_x100_read_msi_to_src_map() - read from the MSI mapping registers + * @mdev: pointer to mic_device instance + * @idx: index to the mapping register, 0 based + * + * This function allows reading of the 32bit MSI mapping register. + * + * RETURNS: The value in the register. + */ +static u32 +mic_x100_read_msi_to_src_map(struct mic_device *mdev, int idx) +{ + return mic_mmio_read(&mdev->mmio, + MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_MXAR0_K1OM + idx * 4); +} + +/** + * mic_x100_program_msi_to_src_map() - program the MSI mapping registers + * @mdev: pointer to mic_device instance + * @idx: index to the mapping register, 0 based + * @offset: The bit offset in the register that needs to be updated. + * @set: boolean specifying if the bit in the specified offset needs + * to be set or cleared. + * + * RETURNS: None. + */ +static void +mic_x100_program_msi_to_src_map(struct mic_device *mdev, + int idx, int offset, bool set) +{ + unsigned long reg; + struct mic_mw *mw = &mdev->mmio; + u32 mxar = MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_MXAR0_K1OM + idx * 4; + + reg = mic_mmio_read(mw, mxar); + if (set) + __set_bit(offset, &reg); + else + __clear_bit(offset, &reg); + mic_mmio_write(mw, reg, mxar); +} + +/** + * mic_x100_load_command_line() - Load command line to MIC. 
+ * @mdev: pointer to mic_device instance + * @fw: the firmware image + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +static int +mic_x100_load_command_line(struct mic_device *mdev, const struct firmware *fw) +{ + u32 len = 0; + u32 boot_mem; + char *buf; + void __iomem *cmd_line_va = mdev->aper.va + mdev->bootaddr + fw->size; +#define CMDLINE_SIZE 2048 + + boot_mem = mdev->aper.len >> 20; + buf = kzalloc(CMDLINE_SIZE, GFP_KERNEL); + if (!buf) { + dev_err(&mdev->pdev->dev, + "%s %d allocation failed\n", __func__, __LINE__); + return -ENOMEM; + } + len += snprintf(buf, CMDLINE_SIZE - len, + " mem=%dM crashkernel=1M@80M", boot_mem); + if (mdev->cmdline) + snprintf(buf + len, CMDLINE_SIZE - len, + " %s", mdev->cmdline); + memcpy_toio(cmd_line_va, buf, strlen(buf) + 1); + kfree(buf); + return 0; +} + +/** + * mic_x100_load_ramdisk() - Load ramdisk to MIC. + * @mdev: pointer to mic_device instance + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +static int +mic_x100_load_ramdisk(struct mic_device *mdev) +{ + const struct firmware *fw; + int rc; + struct boot_params __iomem *bp = mdev->aper.va + mdev->bootaddr; + + rc = request_firmware(&fw, + mdev->ramdisk, &mdev->pdev->dev); + if (rc < 0) { + dev_err(&mdev->pdev->dev, + "ramdisk request_firmware failed: %d %s\n", + rc, mdev->ramdisk); + goto error; + } + /* + * Typically the bootaddr for card OS is 64M + * so copy over the ramdisk @ 128M. + */ + memcpy_toio(mdev->aper.va + (mdev->bootaddr << 1), + fw->data, fw->size); + iowrite32(cpu_to_le32(mdev->bootaddr << 1), &bp->hdr.ramdisk_image); + iowrite32(cpu_to_le32(fw->size), &bp->hdr.ramdisk_size); + release_firmware(fw); +error: + return rc; +} + +/** + * mic_x100_get_boot_addr() - Get MIC boot address. + * @mdev: pointer to mic_device instance + * + * This function is called during firmware load to determine + * the address at which the OS should be downloaded in card + * memory i.e. GDDR. 
+ * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +static int +mic_x100_get_boot_addr(struct mic_device *mdev) +{ + u32 scratch2, boot_addr; + int rc = 0; + + scratch2 = mdev->ops->read_spad(mdev, MIC_X100_DOWNLOAD_INFO); + boot_addr = MIC_X100_SPAD2_DOWNLOAD_ADDR(scratch2); + dev_dbg(&mdev->pdev->dev, "%s %d boot_addr 0x%x\n", + __func__, __LINE__, boot_addr); + if (boot_addr > (1U << 31)) { + dev_err(&mdev->pdev->dev, + "incorrect bootaddr 0x%x\n", + boot_addr); + rc = -EINVAL; + goto error; + } + mdev->bootaddr = boot_addr; +error: + return rc; +} + +/* Either a Linux OS or an ELF for flash updates is currently supported */ +enum mic_mode { + MIC_LINUX = 0, + MIC_ELF, +}; + +static const char * const mic_boot_str[] = { + [MIC_LINUX] = "boot:linux:", + [MIC_ELF] = "boot:elf:", +}; + +/* + * mic_parse_fw_path() - Parse firmware/ramdisk path. + * @mdev: pointer to mic_device instance + * @buf: buffer containing boot string. + * + * RETURNS: An appropriate -ERRNO error value on error, or mode on success. 
+ */ +static int mic_parse_fw_path(struct mic_device *mdev, const char *buf) +{ + enum mic_mode mode; + char *firmware, *ramdisk = NULL; + const char *default_mm_image = "mic/RASMM.elf"; + int len; + + if (!strncmp(buf, mic_boot_str[MIC_LINUX], + strlen(mic_boot_str[MIC_LINUX]))) { + mode = MIC_LINUX; + len = strlen(mic_boot_str[MIC_LINUX]); + } else if (!strncmp(buf, mic_boot_str[MIC_ELF], + strlen(mic_boot_str[MIC_ELF]))) { + mode = MIC_ELF; + len = strlen(mic_boot_str[MIC_ELF]); + } else { + dev_err(&mdev->pdev->dev, + "incorrect boot string %s\n", buf); + return -EINVAL; + } + buf += len; + len = strlen(buf); + if (!(len - 1) && mode == MIC_ELF) { + buf = default_mm_image; + len = strlen(default_mm_image); + } + firmware = kmalloc(len + 1, GFP_KERNEL); + if (!firmware) + return -ENOMEM; + memcpy(firmware, buf, len); + if ('\n' == firmware[len - 1]) + firmware[len - 1] = '\0'; + else + firmware[len] = '\0'; + if (MIC_LINUX == mode) { + /* + * if booting linux, the ramdisk image will likely follow. + * The format is "boot:linux:<fw_path>:<ramdisk_path>" + */ + ramdisk = strchr(firmware, ':'); + if (ramdisk) + *ramdisk++ = '\0'; + } + kfree(mdev->firmware); + mdev->firmware = firmware; + mdev->ramdisk = ramdisk; + return mode; +} + +/** + * mic_x100_load_firmware() - Load firmware to MIC. + * @mdev: pointer to mic_device instance + * @buf: buffer containing boot string including firmware/ramdisk path. + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. 
+ */ +static int +mic_x100_load_firmware(struct mic_device *mdev, const char *buf) +{ + int rc, mode; + const struct firmware *fw; + + rc = mic_x100_get_boot_addr(mdev); + if (rc) + goto error; + mode = mic_parse_fw_path(mdev, buf); + if (mode < 0) { + rc = mode; + goto error; + } + /* load OS */ + rc = request_firmware(&fw, mdev->firmware, &mdev->pdev->dev); + if (rc < 0) { + dev_err(&mdev->pdev->dev, + "firmware request_firmware failed: %d %s\n", + rc, mdev->firmware); + goto error; + } + if (mdev->bootaddr > mdev->aper.len - fw->size) { + rc = -EINVAL; + dev_err(&mdev->pdev->dev, "%s %d rc %d bootaddr 0x%x\n", + __func__, __LINE__, rc, mdev->bootaddr); + release_firmware(fw); + goto error; + } + memcpy_toio(mdev->aper.va + mdev->bootaddr, fw->data, fw->size); + mdev->ops->write_spad(mdev, MIC_X100_FW_SIZE, fw->size); + if (MIC_ELF == mode) + goto done; + /* load command line */ + rc = mic_x100_load_command_line(mdev, fw); + if (rc) { + dev_err(&mdev->pdev->dev, "%s %d rc %d\n", + __func__, __LINE__, rc); + goto error; + } + release_firmware(fw); + /* load ramdisk */ + if (mdev->ramdisk) + rc = mic_x100_load_ramdisk(mdev); +error: + dev_dbg(&mdev->pdev->dev, "%s %d rc %d\n", + __func__, __LINE__, rc); +done: + return rc; +} + +/** + * mic_x100_get_postcode() - Get postcode status from firmware. + * @mdev: pointer to mic_device instance + * + * RETURNS: postcode. + */ +static u32 mic_x100_get_postcode(struct mic_device *mdev) +{ + return mic_mmio_read(&mdev->mmio, MIC_X100_POSTCODE); +} + +/** + * mic_x100_smpt_set() - Update an SMPT entry with a DMA address. + * @mdev: pointer to mic_device instance + * + * RETURNS: none. 
+ */ +static void +mic_x100_smpt_set(struct mic_device *mdev, dma_addr_t dma_addr, u8 index) +{ +#define SNOOP_ON (0 << 0) +#define SNOOP_OFF (1 << 0) +/* + * Sbox Smpt Reg Bits: + * Bits 31:2 Host address + * Bits 1 RSVD + * Bits 0 No snoop + */ +#define BUILD_SMPT(NO_SNOOP, HOST_ADDR) \ + (u32)(((((HOST_ADDR) << 2) & (~0x03)) | ((NO_SNOOP) & (0x01)))) + + uint32_t smpt_reg_val = BUILD_SMPT(SNOOP_ON, + dma_addr >> mdev->smpt->info.page_shift); + mic_mmio_write(&mdev->mmio, smpt_reg_val, + MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_SMPT00 + (4 * index)); +} + +/** + * mic_x100_smpt_hw_init() - Initialize SMPT X100 specific fields. + * @mdev: pointer to mic_device instance + * + * RETURNS: none. + */ +static void mic_x100_smpt_hw_init(struct mic_device *mdev) +{ + struct mic_smpt_hw_info *info = &mdev->smpt->info; + + info->num_reg = 32; + info->page_shift = 34; + info->page_size = (1ULL << info->page_shift); + info->base = 0x8000000000ULL; +} + +struct mic_smpt_ops mic_x100_smpt_ops = { + .init = mic_x100_smpt_hw_init, + .set = mic_x100_smpt_set, +}; + +struct mic_hw_ops mic_x100_ops = { + .aper_bar = MIC_X100_APER_BAR, + .mmio_bar = MIC_X100_MMIO_BAR, + .init = mic_x100_hw_init, + .read_spad = mic_x100_read_spad, + .write_spad = mic_x100_write_spad, + .reset = mic_x100_hw_reset, + .reset_fw_ready = mic_x100_reset_fw_ready, + .is_fw_ready = mic_x100_is_fw_ready, + .send_firmware_intr = mic_x100_send_firmware_intr, + .send_intr = mic_x100_send_intr, + .ack_interrupt = mic_x100_ack_interrupt, + .load_mic_fw = mic_x100_load_firmware, + .get_postcode = mic_x100_get_postcode, +}; + +struct mic_hw_intr_ops mic_x100_intr_ops = { + .intr_init = mic_x100_hw_intr_init, + .enable_interrupts = mic_x100_enable_interrupts, + .disable_interrupts = mic_x100_disable_interrupts, + .program_msi_to_src_map = mic_x100_program_msi_to_src_map, + .read_msi_to_src_map = mic_x100_read_msi_to_src_map, +}; diff --git a/drivers/misc/mic/host/mic_x100.h b/drivers/misc/mic/host/mic_x100.h new 
file mode 100644 index 0000000..102fdb8 --- /dev/null +++ b/drivers/misc/mic/host/mic_x100.h @@ -0,0 +1,112 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. 
+ * + */ +#ifndef _MIC_X100_HW_H_ +#define _MIC_X100_HW_H_ + +#define MIC_X100_PCI_DEVICE_2250 0x2250 +#define MIC_X100_PCI_DEVICE_2251 0x2251 +#define MIC_X100_PCI_DEVICE_2252 0x2252 +#define MIC_X100_PCI_DEVICE_2253 0x2253 +#define MIC_X100_PCI_DEVICE_2254 0x2254 +#define MIC_X100_PCI_DEVICE_2255 0x2255 +#define MIC_X100_PCI_DEVICE_2256 0x2256 +#define MIC_X100_PCI_DEVICE_2257 0x2257 +#define MIC_X100_PCI_DEVICE_2258 0x2258 +#define MIC_X100_PCI_DEVICE_2259 0x2259 +#define MIC_X100_PCI_DEVICE_225a 0x225a +#define MIC_X100_PCI_DEVICE_225b 0x225b +#define MIC_X100_PCI_DEVICE_225c 0x225c +#define MIC_X100_PCI_DEVICE_225d 0x225d +#define MIC_X100_PCI_DEVICE_225e 0x225e + +#define MIC_X100_APER_BAR 0 +#define MIC_X100_MMIO_BAR 4 + +#define MIC_X100_SBOX_BASE_ADDRESS 0x00010000 + +#define MIC_X100_SPAD2_DOWNLOAD_STATUS(x) ((x) & 0x1) +#define MIC_X100_SPAD2_APIC_ID(x) (((x) >> 1) & 0x1ff) +#define MIC_X100_SPAD2_DOWNLOAD_ADDR(x) ((x) & 0xfffff000) + +#define MIC_X100_SBOX_SICR0_DBR(x) ((x) & 0xf) +#define MIC_X100_SBOX_SICR0_DMA(x) (((x) >> 8) & 0xff) + +#define MIC_X100_SBOX_SICE0_DBR(x) ((x) & 0xf) +#define MIC_X100_SBOX_DBR_BITS(x) ((x) & 0xf) +#define MIC_X100_SBOX_SICE0_DMA(x) (((x) >> 8) & 0xff) +#define MIC_X100_SBOX_DMA_BITS(x) (((x) & 0xff) << 8) + +#define MIC_X100_DOWNLOAD_INFO 2 +#define MIC_X100_FW_SIZE 5 + +#define MIC_X100_SBOX_SPAD0 0x0000AB20 +#define MIC_X100_SBOX_APICICR0 0x0000A9D0 +#define MIC_X100_SBOX_APICICR7 0x0000AA08 +#define MIC_X100_SBOX_RGCR 0x00004010 +#define MIC_X100_SBOX_SICR0 0x00009004 +#define MIC_X100_SBOX_SICE0 0x0000900C +#define MIC_X100_SBOX_SICC0 0x00009010 +#define MIC_X100_SBOX_SIAC0 0x00009014 +#define MIC_X100_SBOX_MSIXPBACR_K1OM 0x00009084 +#define MIC_X100_SBOX_MXAR0 0x00009040 +#define MIC_X100_SBOX_MXAR0_K1OM 0x00009044 +#define MIC_X100_SBOX_SDBIC3 0x0000CC9C +#define MIC_X100_SBOX_SDBIC0 0x0000CC90 +#define MIC_X100_SBOX_SMPT00 0x00003100 +#define MIC_X100_SBOX_RDMASR0 0x0000B180 + +#define MIC_X100_POSTCODE 0x242c + 
+#define MIC_X100_DOORBELL_IDX_START 0 +#define MIC_X100_NUM_DOORBELL 4 +#define MIC_X100_DMA_IDX_START 8 +#define MIC_X100_NUM_DMA 8 +#define MIC_X100_ERR_IDX_START 30 +#define MIC_X100_NUM_ERR 1 + +#define MIC_X100_NUM_SBOX_IRQ 8 +#define MIC_X100_NUM_RDMASR_IRQ 8 +#define MIC_X100_RDMASR_IRQ_BASE 17 + +#define MIC_NUM_OFFSETS 32 + +static const u16 mic_x100_intr_init[] = { + MIC_X100_DOORBELL_IDX_START, + MIC_X100_DMA_IDX_START, + MIC_X100_ERR_IDX_START, + MIC_X100_NUM_DOORBELL, + MIC_X100_NUM_DMA, + MIC_X100_NUM_ERR, +}; + +/* Host->Card(bootstrap) Interrupt Vector */ +#define MIC_X100_BSP_INTERRUPT_VECTOR 229 + +extern struct mic_hw_ops mic_x100_ops; +extern struct mic_smpt_ops mic_x100_smpt_ops; +extern struct mic_hw_intr_ops mic_x100_intr_ops; + +#endif diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index bdc6e87..8f985dd 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -239,6 +239,7 @@ header-y += media.h header-y += mei.h header-y += mempolicy.h header-y += meye.h +header-y += mic_common.h header-y += mii.h header-y += minix_fs.h header-y += mman.h diff --git a/include/uapi/linux/mic_common.h b/include/uapi/linux/mic_common.h new file mode 100644 index 0000000..b8edede --- /dev/null +++ b/include/uapi/linux/mic_common.h @@ -0,0 +1,79 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC driver. + * + */ +#ifndef __MIC_COMMON_H_ +#define __MIC_COMMON_H_ + +#include <linux/types.h> + +/** + * struct mic_bootparam: Virtio device independent information in device page + * + * @magic: A magic value used by the card to ensure it can see the host + * @c2h_shutdown_db: Card to Host shutdown doorbell set by host + * @h2c_shutdown_db: Host to Card shutdown doorbell set by card + * @h2c_config_db: Host to Card Virtio config doorbell set by card + * @shutdown_status: Card shutdown status set by card + * @shutdown_card: Set to 1 by the host when a card shutdown is initiated + */ +struct mic_bootparam { + __u32 magic; + __s8 c2h_shutdown_db; + __s8 h2c_shutdown_db; + __s8 h2c_config_db; + __u8 shutdown_status; + __u8 shutdown_card; +} __aligned(8); + +/* Device page size */ +#define MIC_DP_SIZE 4096 + +#define MIC_MAGIC 0xc0ffee00 + +/** + * enum mic_states - MIC states. + */ +enum mic_states { + MIC_OFFLINE = 0, + MIC_ONLINE, + MIC_SHUTTING_DOWN, + MIC_RESET_FAILED, + MIC_LAST +}; + +/** + * enum mic_status - MIC status reported by card after + * a host or card initiated shutdown or a card crash. + */ +enum mic_status { + MIC_NOP = 0, + MIC_CRASHED, + MIC_HALTED, + MIC_POWER_OFF, + MIC_RESTART, + MIC_STATUS_LAST +}; + +#endif -- 1.8.2.1
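Reading aid for the host header above (not part of the patch): scratchpad register 2 (MIC_X100_DOWNLOAD_INFO) carries three fields that the host driver decodes with the MIC_X100_SPAD2_* macros. The following is a minimal user-space sketch of that decoding, assuming only the masks, shifts and offsets shown in mic_x100.h. The names spad2_info, spad2_decode, spad_mmio_offset and spad2_selfcheck are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout copied from the MIC_X100_SPAD2_* macros in mic_x100.h. */
#define SPAD2_DOWNLOAD_STATUS(x) ((x) & 0x1)
#define SPAD2_APIC_ID(x)         (((x) >> 1) & 0x1ff)
#define SPAD2_DOWNLOAD_ADDR(x)   ((x) & 0xfffff000)

/* Offsets copied from MIC_X100_SBOX_BASE_ADDRESS and MIC_X100_SBOX_SPAD0. */
#define SBOX_BASE_ADDRESS 0x00010000u
#define SBOX_SPAD0        0x0000AB20u

/* Hypothetical container for the decoded fields; not defined by the patch. */
struct spad2_info {
	bool fw_ready;      /* bit 0: bootstrap is ready for a download */
	uint32_t apic_id;   /* bits 9:1: bootstrap APIC ID */
	uint32_t boot_addr; /* bits 31:12: download address in card memory */
};

/* Split a raw scratchpad 2 value into its three fields. */
static inline struct spad2_info spad2_decode(uint32_t scratch2)
{
	struct spad2_info info = {
		.fw_ready = SPAD2_DOWNLOAD_STATUS(scratch2) != 0,
		.apic_id = SPAD2_APIC_ID(scratch2),
		.boot_addr = SPAD2_DOWNLOAD_ADDR(scratch2),
	};
	return info;
}

/* MMIO BAR offset of scratchpad idx; each scratchpad is 4 bytes wide. */
static inline uint32_t spad_mmio_offset(unsigned int idx)
{
	return SBOX_BASE_ADDRESS + SBOX_SPAD0 + idx * 4u;
}

/* Worked example: ready bit set, APIC ID 5, boot address 64M. */
static inline void spad2_selfcheck(void)
{
	struct spad2_info info = spad2_decode(0x0400000Bu);

	assert(info.fw_ready);
	assert(info.apic_id == 5);
	assert(info.boot_addr == 0x04000000u);
	assert(spad_mmio_offset(2) == 0x0001AB28u);
}
```

The example value uses a 64M boot address because the ramdisk loader comment describes that as the typical bootaddr for the card OS. Bits 11:10 are not covered by any of the three macros, so the sketch ignores them as well.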
This patch does the following: a) Initializes the Intel MIC X100 platform device and driver. b) Sets up support to handle shutdown requests from the host. c) Maps the device page after obtaining the device page address from the scratchpad registers updated by the host. d) Informs the host upon a card crash by registering a panic notifier. e) Informs the host upon a poweroff/halt event. Co-author: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com> Signed-off-by: Caz Yokoyama <Caz.Yokoyama at intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com> Signed-off-by: Nikhil Rao <nikhil.rao at intel.com> Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche at intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt at intel.com> Acked-by: Yaozu (Eddie) Dong <eddie.dong at intel.com> Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr at intel.com> --- drivers/misc/mic/Kconfig | 37 +++++ drivers/misc/mic/Makefile | 1 + drivers/misc/mic/card/Makefile | 10 ++ drivers/misc/mic/card/mic_common.h | 43 +++++ drivers/misc/mic/card/mic_debugfs.c | 139 +++++++++++++++++ drivers/misc/mic/card/mic_debugfs.h | 40 +++++ drivers/misc/mic/card/mic_device.c | 304 ++++++++++++++++++++++++++++++++++++ drivers/misc/mic/card/mic_device.h | 106 +++++++++++++ drivers/misc/mic/card/mic_x100.c | 253 ++++++++++++++++++++++++++++++ drivers/misc/mic/card/mic_x100.h | 53 +++++++ 10 files changed, 986 insertions(+) create mode 100644 drivers/misc/mic/card/Makefile create mode 100644 drivers/misc/mic/card/mic_common.h create mode 100644 drivers/misc/mic/card/mic_debugfs.c create mode 100644 drivers/misc/mic/card/mic_debugfs.h create mode 100644 drivers/misc/mic/card/mic_device.c create mode 100644 drivers/misc/mic/card/mic_device.h create mode 100644 drivers/misc/mic/card/mic_x100.c create mode 100644 drivers/misc/mic/card/mic_x100.h diff --git a/drivers/misc/mic/Kconfig 
b/drivers/misc/mic/Kconfig index aaefd0c..95bc291 100644 --- a/drivers/misc/mic/Kconfig +++ b/drivers/misc/mic/Kconfig @@ -17,3 +17,40 @@ config INTEL_MIC_HOST More information about the Intel MIC family as well as the Linux OS and tools for MIC to use with this driver are available from <http://software.intel.com/en-us/mic-developer>. + +comment "Intel MIC Card Driver" + +config INTEL_MIC_CARD + tristate "Intel MIC Card Driver" + depends on 64BIT + default N + help + This enables card driver support for the Intel Many Integrated + Core (MIC) device family. The card driver communicates shutdown/ + crash events to the host and allows registration/configuration of + virtio devices. + + If you are building a card kernel for an Intel MIC device then + say M (recommended) or Y, else say N. If unsure say N. + + For more information see + <http://software.intel.com/en-us/mic-developer>. + +comment "Intel MIC Card X100 Driver" + +config INTEL_MIC_CARD_X100 + bool "Intel MIC Card X100 Driver" + depends on INTEL_MIC_CARD + default Y if INTEL_MIC_CARD + help + This enables card driver support for Intel Many Integrated + Core (MIC) X100 devices. The X100 specific card driver + registers a platform device/driver to enable basic + INTEL_MIC_CARD functionality. + + If you are building a card kernel for an Intel MIC X100 + device and have enabled INTEL_MIC_CARD then say Y else say N. + If unsure say N. + + For more information see + <http://software.intel.com/en-us/mic-developer>. diff --git a/drivers/misc/mic/Makefile b/drivers/misc/mic/Makefile index 8e72421..05b34d6 100644 --- a/drivers/misc/mic/Makefile +++ b/drivers/misc/mic/Makefile @@ -3,3 +3,4 @@ # Copyright(c) 2013, Intel Corporation. 
# obj-$(CONFIG_INTEL_MIC_HOST) += host/ +obj-$(CONFIG_INTEL_MIC_CARD) += card/ diff --git a/drivers/misc/mic/card/Makefile b/drivers/misc/mic/card/Makefile new file mode 100644 index 0000000..06007ba --- /dev/null +++ b/drivers/misc/mic/card/Makefile @@ -0,0 +1,10 @@ +# +# Makefile - Intel MIC Linux driver. +# Copyright(c) 2013, Intel Corporation. +# +ccflags-y += -DINTEL_MIC_CARD + +obj-$(CONFIG_INTEL_MIC_CARD) += mic_card.o +mic_card-$(CONFIG_INTEL_MIC_CARD_X100) += mic_x100.o +mic_card-y += mic_device.o +mic_card-y += mic_debugfs.o diff --git a/drivers/misc/mic/card/mic_common.h b/drivers/misc/mic/card/mic_common.h new file mode 100644 index 0000000..bba0183b --- /dev/null +++ b/drivers/misc/mic/card/mic_common.h @@ -0,0 +1,43 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. 
+ * + */ +#ifndef _MIC_CARD_COMMON_H_ +#define _MIC_CARD_COMMON_H_ + +#include <linux/mic_common.h> + +#include "../common/mic_device.h" +#include "mic_device.h" +#ifdef CONFIG_INTEL_MIC_CARD_X100 +#include "mic_x100.h" +#endif + +#endif diff --git a/drivers/misc/mic/card/mic_debugfs.c b/drivers/misc/mic/card/mic_debugfs.c new file mode 100644 index 0000000..1a2fa83 --- /dev/null +++ b/drivers/misc/mic/card/mic_debugfs.c @@ -0,0 +1,139 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. 
+ * + */ +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/sched.h> +#include <linux/debugfs.h> +#include <linux/module.h> +#include <linux/delay.h> +#include <linux/seq_file.h> + +#include "mic_common.h" +#include "mic_debugfs.h" + +/* Debugfs parent dir */ +static struct dentry *mic_dbg; + +/** + * intr_test_seq_show - Display MIC kernel log buffer. + * + */ +static int intr_test_seq_show(struct seq_file *s, void *unused) +{ + struct mic_driver *mdrv = s->private; + struct mic_device *mdev = &mdrv->mdev; + + mic_send_intr(mdev, 0); + msleep(1000); + mic_send_intr(mdev, 1); + msleep(1000); + mic_send_intr(mdev, 2); + msleep(1000); + mic_send_intr(mdev, 3); + msleep(1000); + + return 0; +} + +static int intr_test_open(struct inode *inode, struct file *file) +{ + return single_open(file, intr_test_seq_show, inode->i_private); +} + +static int intr_test_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations intr_test_ops = { + .owner = THIS_MODULE, + .open = intr_test_open, + .read = seq_read, + .llseek = seq_lseek, + .release = intr_test_release +}; + +/** + * mic_create_card_debug_dir - Initialize MIC debugfs entries. + */ +void __init mic_create_card_debug_dir(struct mic_driver *mdrv) +{ + struct dentry *d; + + if (!mic_dbg) + return; + + mdrv->dbg_dir = debugfs_create_dir(mdrv->name, mic_dbg); + if (!mdrv->dbg_dir) { + dev_err(mdrv->dev, "Cant create dbg_dir %s\n", mdrv->name); + return; + } + + d = debugfs_create_file("intr_test", 0444, mdrv->dbg_dir, + mdrv, &intr_test_ops); + + if (!d) { + dev_err(mdrv->dev, + "Cant create dbg intr_test %s\n", mdrv->name); + return; + } +} + +/** + * mic_delete_card_debug_dir - Uninitialize MIC debugfs entries. + */ +void mic_delete_card_debug_dir(struct mic_driver *mdrv) +{ + if (!mdrv->dbg_dir) + return; + + debugfs_remove_recursive(mdrv->dbg_dir); +} + + +/** + * mic_init_card_debugfs - Initialize global debugfs entry. 
+ */ +void __init mic_init_card_debugfs(void) +{ + mic_dbg = debugfs_create_dir(KBUILD_MODNAME, NULL); + if (!mic_dbg) + pr_err("can't create debugfs dir\n"); +} + +/** + * mic_exit_card_debugfs - Uninitialize global debugfs entry + */ +void mic_exit_card_debugfs(void) +{ + debugfs_remove(mic_dbg); +} diff --git a/drivers/misc/mic/card/mic_debugfs.h b/drivers/misc/mic/card/mic_debugfs.h new file mode 100644 index 0000000..155f44c --- /dev/null +++ b/drivers/misc/mic/card/mic_debugfs.h @@ -0,0 +1,40 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. 
+ * + */ +#ifndef _MIC_CARD_DEBUGFS_H_ +#define _MIC_CARD_DEBUGFS_H_ + +void __init mic_create_card_debug_dir(struct mic_driver *mdrv); +void mic_delete_card_debug_dir(struct mic_driver *mdrv); +void __init mic_init_card_debugfs(void); +void mic_exit_card_debugfs(void); + +#endif diff --git a/drivers/misc/mic/card/mic_device.c b/drivers/misc/mic/card/mic_device.c new file mode 100644 index 0000000..7bfe2e5 --- /dev/null +++ b/drivers/misc/mic/card/mic_device.c @@ -0,0 +1,304 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. 
+ * + */ +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/interrupt.h> +#include <linux/reboot.h> + +#include "mic_common.h" +#include "mic_debugfs.h" + +static struct mic_driver *g_drv; +static struct mic_irq *shutdown_cookie; + +static void mic_notify_host(u8 state) +{ + struct mic_driver *mdrv = g_drv; + struct mic_bootparam __iomem *bootparam = mdrv->dp; + + iowrite8(state, &bootparam->shutdown_status); + dev_info(mdrv->dev, "%s %d system_state %d\n", + __func__, __LINE__, state); + mic_send_intr(&mdrv->mdev, ioread8(&bootparam->c2h_shutdown_db)); +} + +static int mic_panic_event(struct notifier_block *this, unsigned long event, + void *ptr) +{ + struct mic_driver *mdrv = g_drv; + struct mic_bootparam __iomem *bootparam = mdrv->dp; + + iowrite8(-1, &bootparam->h2c_config_db); + iowrite8(-1, &bootparam->h2c_shutdown_db); + mic_notify_host(MIC_CRASHED); + return NOTIFY_DONE; +} + +static struct notifier_block mic_panic = { + .notifier_call = mic_panic_event, +}; + +static irqreturn_t mic_shutdown_isr(int irq, void *data) +{ + struct mic_driver *mdrv = g_drv; + struct mic_bootparam __iomem *bootparam = mdrv->dp; + + mic_ack_interrupt(&g_drv->mdev); + if (ioread8(&bootparam->shutdown_card)) + orderly_poweroff(true); + return IRQ_HANDLED; +} + +static int mic_shutdown_init(void) +{ + int rc = 0; + struct mic_driver *mdrv = g_drv; + struct mic_bootparam __iomem *bootparam = mdrv->dp; + int shutdown_db; + + shutdown_db = mic_next_card_db(); + shutdown_cookie = mic_request_card_irq(mic_shutdown_isr, + "Shutdown", mdrv, shutdown_db); + if (IS_ERR(shutdown_cookie)) + rc = PTR_ERR(shutdown_cookie); + else + iowrite8(shutdown_db, &bootparam->h2c_shutdown_db); + return rc; +} + +static void mic_shutdown_uninit(void) +{ + struct mic_driver *mdrv = g_drv; + struct mic_bootparam __iomem *bootparam = mdrv->dp; + + iowrite8(-1, &bootparam->h2c_shutdown_db); + mic_free_card_irq(shutdown_cookie, mdrv); +} + +static int __init mic_dp_init(void) +{ + struct 
mic_driver *mdrv = g_drv; + struct mic_device *mdev = &mdrv->mdev; + struct mic_bootparam __iomem *bootparam; + u64 lo, hi, dp_dma_addr; + u32 magic; + + lo = mic_read_spad(&mdrv->mdev, MIC_DPLO_SPAD); + hi = mic_read_spad(&mdrv->mdev, MIC_DPHI_SPAD); + + dp_dma_addr = lo | (hi << 32); + mdrv->dp = mic_card_map(mdev, dp_dma_addr, MIC_DP_SIZE); + if (!mdrv->dp) { + dev_err(mdrv->dev, "Cannot remap Aperture BAR\n"); + return -ENOMEM; + } + bootparam = mdrv->dp; + magic = ioread32(&bootparam->magic); + if (MIC_MAGIC != magic) { + dev_err(mdrv->dev, "bootparam magic mismatch 0x%x\n", magic); + return -EIO; + } + dev_info(mdrv->dev, "bootparam magic success 0x%x\n", magic); + return 0; +} + +/* Uninitialize the device page */ +static void mic_dp_uninit(void) +{ + mic_card_unmap(&g_drv->mdev, g_drv->dp); +} + +/** + * mic_request_card_irq - request an irq. + * + * @func: The callback function that handles the interrupt. + * @name: The ASCII name of the callee requesting the irq. + * @data: private data that is returned back when calling the + * function handler. + * @index: The doorbell index of the requester. + * + * returns: The cookie that is transparent to the caller. Passed + * back when calling mic_free_irq. An appropriate error code + * is returned on failure. Caller needs to use IS_ERR(return_val) + * to check for failure and PTR_ERR(return_val) to obtained the + * error code. + * + */ +struct mic_irq *mic_request_card_irq(irqreturn_t (*func)(int irq, void *data), + const char *name, void *data, int index) +{ + int rc = 0; + unsigned long cookie; + struct mic_driver *mdrv = g_drv; + + rc = request_irq(mic_db_to_irq(mdrv, index), func, + 0, name, data); + if (rc) { + dev_err(mdrv->dev, "request_irq failed rc = %d\n", rc); + goto err; + } + mdrv->irq_info.irq_usage_count[index]++; + cookie = index; + return (struct mic_irq *)cookie; +err: + return ERR_PTR(rc); + +} + +/** + * mic_free_card_irq - free irq. 
+ * + * @cookie: cookie obtained during a successful call to mic_request_irq + * @data: private data specified by the calling function during the + * mic_request_irq + * + * returns: none. + */ +void mic_free_card_irq(struct mic_irq *cookie, void *data) +{ + int index; + struct mic_driver *mdrv = g_drv; + + index = (unsigned long)cookie & 0xFFFFU; + free_irq(mic_db_to_irq(mdrv, index), data); + mdrv->irq_info.irq_usage_count[index]--; +} + +/** + * mic_next_card_db - Get the doorbell with minimum usage count. + * + * Returns the irq index. + */ +int mic_next_card_db(void) +{ + int i; + int index = 0; + struct mic_driver *mdrv = g_drv; + + for (i = 0; i < mdrv->intr_info.num_intr; i++) { + if (mdrv->irq_info.irq_usage_count[i] < + mdrv->irq_info.irq_usage_count[index]) + index = i; + } + + return index; +} + +/** + * mic_init_irq - Initialize irq information. + * + * Returns 0 in success. Appropriate error code on failure. + */ +static int mic_init_irq(void) +{ + struct mic_driver *mdrv = g_drv; + + mdrv->irq_info.irq_usage_count = kzalloc((sizeof(u32) * + mdrv->intr_info.num_intr), + GFP_KERNEL); + if (!mdrv->irq_info.irq_usage_count) + return -ENOMEM; + return 0; +} + +/** + * mic_uninit_irq - Uninitialize irq information. + * + * None. + */ +static void mic_uninit_irq(void) +{ + struct mic_driver *mdrv = g_drv; + + kfree(mdrv->irq_info.irq_usage_count); +} + +/* + * mic_driver_init - MIC driver initialization tasks. + * + * Returns 0 in success. Appropriate error code on failure. + */ +int mic_driver_init(struct mic_driver *mdrv) +{ + int rc; + + g_drv = mdrv; + /* + * Unloading the card module is not supported. The MIC card module + * handles fundamental operations like host/card initiated shutdowns + * and informing the host about card crashes and cannot be unloaded. 
+ */ + if (!try_module_get(mdrv->dev->driver->owner)) { + rc = -ENODEV; + goto done; + } + rc = mic_dp_init(); + if (rc) + goto put; + rc = mic_init_irq(); + if (rc) + goto dp_uninit; + rc = mic_shutdown_init(); + if (rc) + goto irq_uninit; + mic_create_card_debug_dir(mdrv); + atomic_notifier_chain_register(&panic_notifier_list, &mic_panic); +done: + return rc; +irq_uninit: + mic_uninit_irq(); +dp_uninit: + mic_dp_uninit(); +put: + module_put(mdrv->dev->driver->owner); + return rc; +} + +/* + * mic_driver_uninit - MIC driver uninitialization tasks. + * + * Returns None + */ +void mic_driver_uninit(struct mic_driver *mdrv) +{ + mic_delete_card_debug_dir(mdrv); + /* + * Inform the host about the shutdown status i.e. poweroff/restart etc. + * The module cannot be unloaded so the only code path to call + * mic_devices_uninit(..) is the shutdown callback. + */ + mic_notify_host(system_state); + mic_shutdown_uninit(); + mic_uninit_irq(); + mic_dp_uninit(); + module_put(mdrv->dev->driver->owner); +} diff --git a/drivers/misc/mic/card/mic_device.h b/drivers/misc/mic/card/mic_device.h new file mode 100644 index 0000000..c0c79e4 --- /dev/null +++ b/drivers/misc/mic/card/mic_device.h @@ -0,0 +1,106 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. 
+ * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. + * + */ +#ifndef _MIC_CARD_DEVICE_H_ +#define _MIC_CARD_DEVICE_H_ + +/** + * struct mic_intr_info - Contains h/w specific interrupt sources info + * + * @num_intr: The number of irqs available + */ +struct mic_intr_info { + u32 num_intr; +}; + +/** + * struct mic_irq_info - OS specific irq information + * + * @irq_usage_count: usage count array tracking the number of sources + * assigned for each irq. + */ +struct mic_irq_info { + int *irq_usage_count; +}; + +/** + * struct mic_device - MIC device information. + * + * @mmio: MMIO bar information. + */ +struct mic_device { + struct mic_mw mmio; +}; + +/** + * struct mic_driver - MIC card driver information. + * + * @name: Name for MIC driver. + * @dbg_dir: debugfs directory of this MIC device. + * @dev: The device backing this MIC. + * @dp: The pointer to the virtio device page. + * @mdev: MIC device information for the host. + * @hotplug_work: Hot plug work for adding/removing virtio devices. + * @irq_info: The OS specific irq information + * @intr_info: H/W specific interrupt information. 
+ */ +struct mic_driver { + char name[20]; + struct dentry *dbg_dir; + struct device *dev; + void __iomem *dp; + struct mic_device mdev; + struct work_struct hotplug_work; + struct mic_irq_info irq_info; + struct mic_intr_info intr_info; +}; + +/** + * struct mic_irq - opaque pointer used as cookie + */ +struct mic_irq; + +int mic_driver_init(struct mic_driver *mdrv); +void mic_driver_uninit(struct mic_driver *mdrv); +int mic_next_card_db(void); +struct mic_irq *mic_request_card_irq(irqreturn_t (*func)(int irq, void *data), + const char *name, void *data, int intr_src); +void mic_free_card_irq(struct mic_irq *cookie, void *data); +u32 mic_read_spad(struct mic_device *mdev, unsigned int idx); +void mic_send_intr(struct mic_device *mdev, int doorbell); +int mic_db_to_irq(struct mic_driver *mdrv, int db); +u32 mic_ack_interrupt(struct mic_device *mdev); +void mic_hw_intr_init(struct mic_driver *mdrv); +void __iomem * +mic_card_map(struct mic_device *mdev, dma_addr_t addr, size_t size); +void mic_card_unmap(struct mic_device *mdev, void __iomem *addr); + +#endif diff --git a/drivers/misc/mic/card/mic_x100.c b/drivers/misc/mic/card/mic_x100.c new file mode 100644 index 0000000..7da97c2 --- /dev/null +++ b/drivers/misc/mic/card/mic_x100.c @@ -0,0 +1,253 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. + * + */ +#include <linux/module.h> +#include <linux/pci.h> +#include <linux/platform_device.h> + +#include "mic_common.h" +#include "mic_debugfs.h" + +static const char mic_driver_name[] = "mic"; + +static struct mic_driver g_drv; + +/** + * mic_read_spad() - read from the scratchpad register + * @mdev: pointer to mic_device instance + * @idx: index to scratchpad register, 0 based + * + * This function allows reading of the 32bit scratchpad register. + * + * RETURNS: An appropriate -ERRNO error value on error, or zero for success. + */ +u32 mic_read_spad(struct mic_device *mdev, unsigned int idx) +{ + return mic_mmio_read(&mdev->mmio, + MIC_X100_SBOX_BASE_ADDRESS + + MIC_X100_SBOX_SPAD0 + idx * 4); +} + +/** + * __mic_send_intr - Send interrupt to Host. + * @mdev: pointer to mic_device instance + * @doorbell: Doorbell number. + */ +void mic_send_intr(struct mic_device *mdev, int doorbell) +{ + struct mic_mw *mw = &mdev->mmio; + + if (doorbell > MIC_X100_MAX_DOORBELL_IDX) + return; + /* Ensure all stores have been completed before sending an interrupt */ + wmb(); + mic_mmio_write(mw, MIC_X100_SBOX_SDBIC0_DBREQ_BIT, + MIC_X100_SBOX_BASE_ADDRESS + + (MIC_X100_SBOX_SDBIC0 + (4 * doorbell))); +} + +/** + * mic_ack_interrupt - Device specific interrupt handling. 
+ * @mdev: pointer to mic_device instance + * + * Returns: bitmask of doorbell events triggered. + */ +u32 mic_ack_interrupt(struct mic_device *mdev) +{ + return 0; +} + +static inline int mic_get_sbox_irq(int db) +{ + return MIC_X100_IRQ_BASE + db; +} + +static inline int mic_get_rdmasr_irq(int index) +{ + return MIC_X100_RDMASR_IRQ_BASE + index; +} + +/** + * mic_hw_intr_init() - Initialize h/w specific interrupt + * information. + * @mdrv: pointer to mic_driver + */ +void mic_hw_intr_init(struct mic_driver *mdrv) +{ + mdrv->intr_info.num_intr = MIC_X100_NUM_SBOX_IRQ + + MIC_X100_NUM_RDMASR_IRQ; +} + +/** + * mic_db_to_irq - Retrieve irq number corresponding to a doorbell. + * @mdrv: pointer to mic_driver + * @db: The doorbell obtained for which the irq is needed. Doorbell + * may correspond to an sbox doorbell or an rdmasr index. + * + * Returns the irq corresponding to the doorbell. + */ +int mic_db_to_irq(struct mic_driver *mdrv, int db) +{ + int rdmasr_index; + if (db < MIC_X100_NUM_SBOX_IRQ) { + return mic_get_sbox_irq(db); + } else { + rdmasr_index = db - MIC_X100_NUM_SBOX_IRQ + + MIC_X100_RDMASR_IRQ_BASE; + return mic_get_rdmasr_irq(rdmasr_index); + } +} + +/* + * mic_card_map - Allocate virtual address for a remote memory region. + * @mdev: pointer to mic_device instance. + * @addr: Remote DMA address. + * @size: Size of the region. + * + * Returns: Virtual address backing the remote memory region. + */ +void __iomem * +mic_card_map(struct mic_device *mdev, dma_addr_t addr, size_t size) +{ + return ioremap(addr, size); +} + +/* + * mic_card_unmap - Unmap the virtual address for a remote memory region. + * @mdev: pointer to mic_device instance. + * @addr: Virtual address for remote memory region. + * + * Returns: None. 
+ */ +void mic_card_unmap(struct mic_device *mdev, void __iomem *addr) +{ + iounmap(addr); +} + +static int mic_probe(struct platform_device *pdev) +{ + struct mic_driver *mdrv = &g_drv; + struct mic_device *mdev = &mdrv->mdev; + int rc = 0; + + mdrv->dev = &pdev->dev; + snprintf(mdrv->name, sizeof(mic_driver_name), mic_driver_name); + + mdev->mmio.pa = MIC_X100_MMIO_BASE; + mdev->mmio.len = MIC_X100_MMIO_LEN; + mdev->mmio.va = ioremap(MIC_X100_MMIO_BASE, MIC_X100_MMIO_LEN); + if (!mdev->mmio.va) { + dev_err(&pdev->dev, "Cannot remap MMIO BAR\n"); + rc = -EIO; + goto done; + } + mic_hw_intr_init(mdrv); + rc = mic_driver_init(mdrv); + if (rc) { + dev_err(&pdev->dev, "mic_driver_init failed rc %d\n", rc); + goto iounmap; + } + dev_info(&pdev->dev, "Probe successful for %s\n", mic_driver_name); +done: + return rc; +iounmap: + iounmap(mdev->mmio.va); + return rc; +} + +static int mic_remove(struct platform_device *pdev) +{ + struct mic_driver *mdrv = &g_drv; + struct mic_device *mdev = &mdrv->mdev; + + mic_driver_uninit(mdrv); + iounmap(mdev->mmio.va); + return 0; +} + +static void mic_platform_shutdown(struct platform_device *pdev) +{ + mic_remove(pdev); +} + +static struct platform_device mic_platform_dev = { + .name = mic_driver_name, + .id = 0, + .num_resources = 0, +}; + +static struct platform_driver mic_platform_driver = { + .probe = mic_probe, + .remove = mic_remove, + .shutdown = mic_platform_shutdown, + .driver = { + .name = mic_driver_name, + .owner = THIS_MODULE, + }, +}; + +static int __init mic_init(void) +{ + int ret; + + mic_init_card_debugfs(); + ret = platform_device_register(&mic_platform_dev); + if (ret) { + pr_err("platform_device_register ret %d\n", ret); + goto cleanup_debugfs; + } + ret = platform_driver_register(&mic_platform_driver); + if (ret) { + pr_err("platform_driver_register ret %d\n", ret); + goto device_unregister; + } + return ret; + +device_unregister: + platform_device_unregister(&mic_platform_dev); +cleanup_debugfs: + 
mic_exit_card_debugfs(); + return ret; +} + +static void __exit mic_exit(void) +{ + platform_driver_unregister(&mic_platform_driver); + platform_device_unregister(&mic_platform_dev); + mic_exit_card_debugfs(); +} + +module_init(mic_init); +module_exit(mic_exit); + +MODULE_AUTHOR("Intel Corporation"); +MODULE_DESCRIPTION("Intel(R) MIC X100 Card driver"); +MODULE_LICENSE("GPL"); diff --git a/drivers/misc/mic/card/mic_x100.h b/drivers/misc/mic/card/mic_x100.h new file mode 100644 index 0000000..a2ba1a8 --- /dev/null +++ b/drivers/misc/mic/card/mic_x100.h @@ -0,0 +1,53 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. 
+ * + */ +#ifndef _MIC_X100_CARD_H_ +#define _MIC_X100_CARD_H_ + +#define MIC_X100_MMIO_BASE 0x08007C0000ULL +#define MIC_X100_MMIO_LEN 0x00020000ULL +#define MIC_X100_SBOX_BASE_ADDRESS 0x00010000ULL + +#define MIC_X100_SBOX_SPAD0 0x0000AB20 +#define MIC_X100_SBOX_SDBIC0 0x0000CC90 +#define MIC_X100_SBOX_SDBIC0_DBREQ_BIT 0x80000000 +#define MIC_X100_SBOX_RDMASR0 0x0000B180 + +#define MIC_X100_MAX_DOORBELL_IDX 8 + +#define MIC_X100_NUM_SBOX_IRQ 8 +#define MIC_X100_NUM_RDMASR_IRQ 8 +#define MIC_X100_SBOX_IRQ_BASE 0 +#define MIC_X100_RDMASR_IRQ_BASE 17 + +#define MIC_X100_IRQ_BASE 26 + +#endif -- 1.8.2.1
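Editor's note on the doorbell allocation in mic_device.c above: mic_next_card_db() scans the per-doorbell usage counts and returns the first index with the lowest count, mic_request_card_irq() increments that count, and mic_free_card_irq() decrements it. The following is a minimal user-space sketch of that bookkeeping only, not the driver code. The names sketch_next_db, sketch_request_db, sketch_free_db, struct db_usage and SKETCH_NUM_DB are hypothetical, and the pick-and-increment steps are merged into one helper for brevity, whereas the patch keeps them in separate functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical constant: 8 sbox + 8 rdmasr sources, matching the sum
 * that mic_hw_intr_init() stores in intr_info.num_intr in the patch.
 */
#define SKETCH_NUM_DB 16

/* Hypothetical stand-in for mdrv->irq_info.irq_usage_count. */
struct db_usage {
	int count[SKETCH_NUM_DB];
};

/*
 * Same scan as mic_next_card_db(): the comparison is strict, so on a
 * tie the lowest-numbered doorbell wins.
 */
static int sketch_next_db(const struct db_usage *u)
{
	int i;
	int index = 0;

	for (i = 0; i < SKETCH_NUM_DB; i++) {
		if (u->count[i] < u->count[index])
			index = i;
	}
	return index;
}

/*
 * Pick the least-used doorbell and account for the new user. In the
 * patch the increment happens in mic_request_card_irq() after
 * request_irq() succeeds; that kernel call is omitted here.
 */
static int sketch_request_db(struct db_usage *u)
{
	int db = sketch_next_db(u);

	u->count[db]++;
	return db;
}

/* Drop one user of a doorbell, as mic_free_card_irq() does. */
static void sketch_free_db(struct db_usage *u, int db)
{
	assert(db >= 0 && db < SKETCH_NUM_DB);
	u->count[db]--;
}
```

With all counts at zero, successive requests hand out doorbells 0, 1, 2 and so on; once a doorbell is freed, its count drops and it becomes the next candidate again. Sources only start sharing a doorbell after every doorbell has at least one user.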
Sudeep Dutt
2013-Jul-25 03:31 UTC
[PATCH 3/5] Intel MIC Host Driver Changes for Virtio Devices.
From: Ashutosh Dixit <ashutosh.dixit at intel.com>

This patch introduces the host "Virtio over PCIe" interface for
Intel MIC. It allows creating user space backends on the host and
instantiating virtio devices for them on the Intel MIC card. A character
device per MIC is exposed with IOCTL, mmap and poll callbacks. This allows
the user space backend to:
(a) add/remove a virtio device via a device page.
(b) map (R/O) virtio rings and device page to user space.
(c) poll for availability of data.
(d) copy a descriptor or entire descriptor chain to/from the card.
(e) modify virtio configuration.
(f) handle virtio device reset.

The buffers are copied over using CPU copies for this initial patch
and host initiated MIC DMA support is planned for future patches.
The avail and desc virtio rings are in host memory and the used ring
is in card memory to maximize writes across PCIe for performance.

Co-author: Sudeep Dutt <sudeep.dutt at intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama at intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao at intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche at intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt at intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong at intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr at intel.com>
---
 drivers/misc/mic/common/mic_device.h | 4 +
 drivers/misc/mic/host/Makefile | 2 +
 drivers/misc/mic/host/mic_boot.c | 2 +
 drivers/misc/mic/host/mic_debugfs.c | 137 +++++++
 drivers/misc/mic/host/mic_fops.c | 280 ++++++++++++++
 drivers/misc/mic/host/mic_fops.h | 37 ++
 drivers/misc/mic/host/mic_main.c | 24 ++
 drivers/misc/mic/host/mic_virtio.c | 703 +++++++++++++++++++++++++++++++++++
 drivers/misc/mic/host/mic_virtio.h | 108 ++++++
 include/uapi/linux/Kbuild | 1 +
 include/uapi/linux/mic_common.h | 165 +++++++-
include/uapi/linux/mic_ioctl.h | 104 ++++++ 12 files changed, 1566 insertions(+), 1 deletion(-) create mode 100644 drivers/misc/mic/host/mic_fops.c create mode 100644 drivers/misc/mic/host/mic_fops.h create mode 100644 drivers/misc/mic/host/mic_virtio.c create mode 100644 drivers/misc/mic/host/mic_virtio.h create mode 100644 include/uapi/linux/mic_ioctl.h diff --git a/drivers/misc/mic/common/mic_device.h b/drivers/misc/mic/common/mic_device.h index 24934b1..7cdeb74 100644 --- a/drivers/misc/mic/common/mic_device.h +++ b/drivers/misc/mic/common/mic_device.h @@ -78,4 +78,8 @@ mic_mmio_write(struct mic_mw *mw, u32 val, u32 offset) #define MIC_DPLO_SPAD 14 #define MIC_DPHI_SPAD 15 +/* These values are supposed to be in ext_params on an interrupt */ +#define MIC_VIRTIO_PARAM_DEV_REMOVE 0x1 +#define MIC_VIRTIO_PARAM_CONFIG_CHANGED 0x2 + #endif diff --git a/drivers/misc/mic/host/Makefile b/drivers/misc/mic/host/Makefile index 0608bbb..e02abdb 100644 --- a/drivers/misc/mic/host/Makefile +++ b/drivers/misc/mic/host/Makefile @@ -9,3 +9,5 @@ mic_host-objs += mic_sysfs.o mic_host-objs += mic_boot.o mic_host-objs += mic_smpt.o mic_host-objs += mic_debugfs.o +mic_host-objs += mic_fops.o +mic_host-objs += mic_virtio.o diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c index 6485a87..40bcb90 100644 --- a/drivers/misc/mic/host/mic_boot.c +++ b/drivers/misc/mic/host/mic_boot.c @@ -30,6 +30,7 @@ #include <linux/delay.h> #include "mic_common.h" +#include "mic_virtio.h" /** * mic_reset - Reset the MIC device. 
@@ -112,6 +113,7 @@ void mic_stop(struct mic_device *mdev, bool force) { mutex_lock(&mdev->mic_mutex); if (MIC_OFFLINE != mdev->state || force) { + mic_virtio_reset_devices(mdev); mic_bootparam_init(mdev); mic_reset(mdev); if (MIC_RESET_FAILED == mdev->state) diff --git a/drivers/misc/mic/host/mic_debugfs.c b/drivers/misc/mic/host/mic_debugfs.c index 5b7697e..bebc6e3 100644 --- a/drivers/misc/mic/host/mic_debugfs.c +++ b/drivers/misc/mic/host/mic_debugfs.c @@ -32,6 +32,7 @@ #include "mic_common.h" #include "mic_debugfs.h" +#include "mic_virtio.h" /* Debugfs parent dir */ static struct dentry *mic_dbg; @@ -207,7 +208,13 @@ static const struct file_operations post_code_ops = { static int dp_seq_show(struct seq_file *s, void *pos) { struct mic_device *mdev = s->private; + struct mic_device_desc *d; + struct mic_device_ctrl *dc; + struct mic_vqconfig *vqconfig; + __u32 *features; + __u8 *config; struct mic_bootparam *bootparam = mdev->dp; + int i, j; seq_printf(s, "Bootparam: magic 0x%x\n", bootparam->magic); @@ -222,6 +229,53 @@ static int dp_seq_show(struct seq_file *s, void *pos) seq_printf(s, "Bootparam: shutdown_card %d\n", bootparam->shutdown_card); + for (i = sizeof(*bootparam); i < MIC_DP_SIZE; + i += mic_total_desc_size(d)) { + d = mdev->dp + i; + dc = (void *)d + mic_aligned_desc_size(d); + + /* end of list */ + if (d->type == 0) + break; + + if (d->type == -1) + continue; + + seq_printf(s, "Type %d ", d->type); + seq_printf(s, "Num VQ %d ", d->num_vq); + seq_printf(s, "Feature Len %d\n", d->feature_len); + seq_printf(s, "Config Len %d ", d->config_len); + seq_printf(s, "Shutdown Status %d\n", d->status); + + for (j = 0; j < d->num_vq; j++) { + vqconfig = mic_vq_config(d) + j; + seq_printf(s, "vqconfig[%d]: ", j); + seq_printf(s, "address 0x%llx ", vqconfig->address); + seq_printf(s, "num %d ", vqconfig->num); + seq_printf(s, "used address 0x%llx\n", + vqconfig->used_address); + } + + features = (__u32 *) mic_vq_features(d); + seq_printf(s, "Features: Host 
0x%x ", features[0]); + seq_printf(s, "Guest 0x%x\n", features[1]); + + config = mic_vq_configspace(d); + for (j = 0; j < d->config_len; j++) + seq_printf(s, "config[%d]=%d\n", j, config[j]); + + seq_puts(s, "Device control:\n"); + seq_printf(s, "Config Change %d ", dc->config_change); + seq_printf(s, "Vdev reset %d\n", dc->vdev_reset); + seq_printf(s, "Guest Ack %d ", dc->guest_ack); + seq_printf(s, "Host ack %d\n", dc->host_ack); + seq_printf(s, "Used address updated %d ", + dc->used_address_updated); + seq_printf(s, "Vdev 0x%llx\n", dc->vdev); + seq_printf(s, "c2h doorbell %d ", dc->c2h_vdev_db); + seq_printf(s, "h2c doorbell %d\n", dc->h2c_vdev_db); + } + return 0; } @@ -243,6 +297,86 @@ static const struct file_operations dp_ops = { .release = dp_debug_release }; +static int vdev_info_seq_show(struct seq_file *s, void *unused) +{ + struct mic_device *mdev = s->private; + struct list_head *pos, *tmp; + struct mic_vdev *mvdev; + int i, j; + + mutex_lock(&mdev->mic_mutex); + list_for_each_safe(pos, tmp, &mdev->vdev_list) { + mvdev = list_entry(pos, struct mic_vdev, list); + seq_printf(s, "VDEV type %d state %s in %ld out %ld\n", + mvdev->virtio_id, + mic_vdevup(mvdev) ? 
"UP" : "DOWN", + mvdev->in_bytes, + mvdev->out_bytes); + for (i = 0; i < MIC_MAX_VRINGS; i++) { + struct vring_desc *desc; + struct vring_avail *avail; + struct vring_used *used; + int num = mvdev->vring[i].vr.num; + if (!num) + continue; + desc = mvdev->vring[i].vr.desc; + seq_printf(s, "vring i %d avail_idx %d", + i, mvdev->vring[i].info->avail_idx & (num - 1)); + seq_printf(s, " used_idx %d num %d\n", + mvdev->vring[i].info->used_idx & (num - 1), + num); + seq_printf(s, "vring i %d avail_idx %d used_idx %d\n", + i, mvdev->vring[i].info->avail_idx, + mvdev->vring[i].info->used_idx); + for (j = 0; j < num; j++) { + seq_printf(s, "desc[%d] addr 0x%llx len %d", + j, desc->addr, desc->len); + seq_printf(s, " flags 0x%x next %d\n", + desc->flags, + desc->next); + desc++; + } + avail = mvdev->vring[i].vr.avail; + seq_printf(s, "avail flags 0x%x idx %d\n", + avail->flags, avail->idx & (num - 1)); + seq_printf(s, "avail flags 0x%x idx %d\n", + avail->flags, avail->idx); + for (j = 0; j < num; j++) + seq_printf(s, "avail ring[%d] %d\n", + j, avail->ring[j]); + used = mvdev->vring[i].vr.used; + seq_printf(s, "used flags 0x%x idx %d\n", + used->flags, used->idx & (num - 1)); + seq_printf(s, "used flags 0x%x idx %d\n", + used->flags, used->idx); + for (j = 0; j < num; j++) + seq_printf(s, "used ring[%d] id %d len %d\n", + j, used->ring[j].id, used->ring[j].len); + } + } + mutex_unlock(&mdev->mic_mutex); + + return 0; +} + +static int vdev_info_debug_open(struct inode *inode, struct file *file) +{ + return single_open(file, vdev_info_seq_show, inode->i_private); +} + +static int vdev_info_debug_release(struct inode *inode, struct file *file) +{ + return single_release(inode, file); +} + +static const struct file_operations vdev_info_ops = { + .owner = THIS_MODULE, + .open = vdev_info_debug_open, + .read = seq_read, + .llseek = seq_lseek, + .release = vdev_info_debug_release +}; + static int msi_irq_info_seq_show(struct seq_file *s, void *pos) { struct mic_device *mdev = 
s->private; @@ -332,6 +466,9 @@ void __init mic_create_debug_dir(struct mic_device *mdev) debugfs_create_file("dp", 0444, mdev->dbg_dir, mdev, &dp_ops); + debugfs_create_file("vdev_info", 0444, mdev->dbg_dir, + mdev, &vdev_info_ops); + debugfs_create_file("msi_irq_info", 0444, mdev->dbg_dir, mdev, &msi_irq_info_ops); } diff --git a/drivers/misc/mic/host/mic_fops.c b/drivers/misc/mic/host/mic_fops.c new file mode 100644 index 0000000..626a454 --- /dev/null +++ b/drivers/misc/mic/host/mic_fops.c @@ -0,0 +1,280 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. 
+ * + */ +#include <linux/module.h> +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/interrupt.h> +#include <linux/firmware.h> +#include <linux/completion.h> +#include <linux/poll.h> +#include <linux/virtio_ids.h> +#include <linux/mic_ioctl.h> + +#include "mic_common.h" +#include "mic_fops.h" +#include "mic_virtio.h" + +int mic_open(struct inode *inode, struct file *f) +{ + struct mic_vdev *mvdev; + struct mic_device *mdev = container_of(inode->i_cdev, + struct mic_device, cdev); + + mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL); + if (!mvdev) + return -ENOMEM; + + init_waitqueue_head(&mvdev->waitq); + INIT_LIST_HEAD(&mvdev->list); + mvdev->mdev = mdev; + mvdev->virtio_id = -1; + + f->private_data = mvdev; + return 0; +} + +int mic_release(struct inode *inode, struct file *f) +{ + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; + + if (-1 != mvdev->virtio_id) + mic_virtio_del_device(mvdev); + f->private_data = NULL; + kfree(mvdev); + return 0; +} + +long mic_ioctl(struct file *f, unsigned int cmd, unsigned long arg) +{ + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; + void __user *argp = (void __user *)arg; + int ret; + + switch (cmd) { + case MIC_VIRTIO_ADD_DEVICE: + { + ret = mic_virtio_add_device(mvdev, argp); + if (ret < 0) { + dev_err(mic_dev(mvdev), + "%s %d errno ret %d\n", + __func__, __LINE__, ret); + return ret; + } + break; + } + case MIC_VIRTIO_COPY_DESC: + { + struct mic_copy_desc request; + struct mic_copy *copy = &request.copy; + + ret = mic_vdev_inited(mvdev); + if (ret) + return ret; + + if (copy_from_user(&request, argp, sizeof(request))) + return -EFAULT; + + dev_dbg(mic_dev(mvdev), + "%s %d === iovcnt 0x%x vr_idx 0x%x desc_idx 0x%x " + "used_idx 0x%x used_len 0x%x\n", + __func__, __LINE__, copy->iovcnt, + copy->vr_idx, copy->desc_idx, + request.used_desc_idx, request.used_len); + + ret = mic_virtio_copy_desc(mvdev, &request); + if (ret < 0) { + dev_err(mic_dev(mvdev), + "%s %d errno ret %d\n", + __func__, 
__LINE__, ret); + return ret; + } + if (copy_to_user( + &((struct mic_copy_desc __user *)argp)->copy.out_cookie, + &copy->out_cookie, sizeof(copy->out_cookie))) { + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", + __func__, __LINE__, -EFAULT); + return -EFAULT; + } + if (copy_to_user( + &((struct mic_copy_desc __user *)argp)->copy.out_len, + &copy->out_len, sizeof(copy->out_len))) { + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", + __func__, __LINE__, -EFAULT); + return -EFAULT; + } + break; + } + case MIC_VIRTIO_COPY_CHAIN: + { + struct mic_copy request; + + ret = mic_vdev_inited(mvdev); + if (ret) + return ret; + + if (copy_from_user(&request, argp, sizeof(request))) + return -EFAULT; + + dev_dbg(mic_dev(mvdev), + "%s %d === vr_idx 0x%x desc_idx 0x%x iovcnt 0x%x\n", + __func__, __LINE__, + request.vr_idx, request.desc_idx, request.iovcnt); + + ret = mic_virtio_copy_chain(mvdev, &request); + if (ret < 0) { + dev_err(mic_dev(mvdev), + "%s %d errno ret %d\n", + __func__, __LINE__, ret); + return ret; + } + if (copy_to_user( + &((struct mic_copy __user *)argp)->out_cookie, + &request.out_cookie, sizeof(request.out_cookie))) { + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", + __func__, __LINE__, -EFAULT); + return -EFAULT; + } + if (copy_to_user(&((struct mic_copy __user *)argp)->out_len, + &request.out_len, + sizeof(request.out_len))) { + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", + __func__, __LINE__, -EFAULT); + return -EFAULT; + } + break; + } + case MIC_VIRTIO_CONFIG_CHANGE: + { + ret = mic_vdev_inited(mvdev); + if (ret) + return ret; + + ret = mic_virtio_config_change(mvdev, argp); + if (ret < 0) { + dev_err(mic_dev(mvdev), + "%s %d errno ret %d\n", + __func__, __LINE__, ret); + return ret; + } + break; + } + default: + return -ENOIOCTLCMD; + }; + return 0; +} + +/* + * We return POLLIN | POLLOUT from poll when new buffers are enqueued, and + * not when previously enqueued buffers may be available. 
This means that + * in the card->host (TX) path, when userspace is unblocked by poll it + * must drain all available descriptors or it can stall. + */ +unsigned int mic_poll(struct file *f, poll_table *wait) +{ + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; + int mask = 0; + + poll_wait(f, &mvdev->waitq, wait); + + if (mic_vdev_inited(mvdev)) + mask = POLLERR; + else if (mvdev->poll_wake) { + mvdev->poll_wake = 0; + mask = POLLIN | POLLOUT; + } + + return mask; +} + +static inline int +mic_query_offset(struct mic_vdev *mvdev, unsigned long offset, + unsigned long *size, unsigned long *pa) +{ + struct mic_device *mdev = mvdev->mdev; + unsigned long start = MIC_DP_SIZE; + int i; + + /* + * MMAP interface is as follows: + * offset region + * 0x0 virtio device_page + * 0x1000 first vring + * 0x1000 + size of 1st vring second vring + * .... + */ + if (!offset) { + *pa = virt_to_phys(mdev->dp); + *size = MIC_DP_SIZE; + return 0; + } + + for (i = 0; i < mvdev->dd->num_vq; i++) { + if (offset == start) { + *pa = virt_to_phys(mvdev->vring[i].va); + *size = mvdev->vring[i].len; + return 0; + } + start += mvdev->vring[i].len; + } + return -1; +} + +/* + * Maps the device page and virtio rings to user space for readonly access. 
+ */ +int +mic_mmap(struct file *f, struct vm_area_struct *vma) +{ + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; + unsigned long offset = vma->vm_pgoff << PAGE_SHIFT; + unsigned long pa, size = vma->vm_end - vma->vm_start, size_rem = size; + int i, err; + + err = mic_vdev_inited(mvdev); + if (err) + return err; + + if (vma->vm_flags & VM_WRITE) + return -EACCES; + + while (size_rem) { + i = mic_query_offset(mvdev, offset, &size, &pa); + if (i < 0) + return -EINVAL; + err = remap_pfn_range(vma, vma->vm_start + offset, + pa >> PAGE_SHIFT, size, vma->vm_page_prot); + if (err) + return err; + dev_dbg(mic_dev(mvdev), + "%s %d type %d size 0x%lx off 0x%lx pa 0x%lx vma 0x%lx\n", + __func__, __LINE__, mvdev->virtio_id, size, offset, + pa, vma->vm_start + offset); + size_rem -= size; + offset += size; + } + return 0; +} diff --git a/drivers/misc/mic/host/mic_fops.h b/drivers/misc/mic/host/mic_fops.h new file mode 100644 index 0000000..504506c --- /dev/null +++ b/drivers/misc/mic/host/mic_fops.h @@ -0,0 +1,37 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. 
+ * + */ +#ifndef _MIC_FOPS_H_ +#define _MIC_FOPS_H_ + +int mic_open(struct inode *inode, struct file *filp); +int mic_release(struct inode *inode, struct file *filp); +ssize_t mic_read(struct file *filp, char __user *buf, + size_t count, loff_t *pos); +long mic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); +int mic_mmap(struct file *f, struct vm_area_struct *vma); +unsigned int mic_poll(struct file *f, poll_table *wait); + +#endif diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c index 70cc235..dd421d5 100644 --- a/drivers/misc/mic/host/mic_main.c +++ b/drivers/misc/mic/host/mic_main.c @@ -37,6 +37,8 @@ #include "mic_common.h" #include "mic_debugfs.h" +#include "mic_fops.h" +#include "mic_virtio.h" static const char mic_driver_name[] = "mic"; @@ -79,6 +81,15 @@ struct mic_info { /* g_mic - Global information about all MIC devices. */ static struct mic_info g_mic; +static const struct file_operations mic_fops = { + .open = mic_open, + .release = mic_release, + .unlocked_ioctl = mic_ioctl, + .poll = mic_poll, + .mmap = mic_mmap, + .owner = THIS_MODULE, +}; + /* Initialize the device page */ static int mic_dp_init(struct mic_device *mdev) { @@ -968,8 +979,20 @@ static int mic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) mic_bootparam_init(mdev); mic_create_debug_dir(mdev); + cdev_init(&mdev->cdev, &mic_fops); + mdev->cdev.owner = THIS_MODULE; + rc = cdev_add(&mdev->cdev, MKDEV(MAJOR(g_mic.dev), mdev->id), 1); + if (rc) { + dev_err(&pdev->dev, "cdev_add err id %d rc %d\n", mdev->id, rc); + goto cleanup_debug_dir; + } dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name); return 0; +cleanup_debug_dir: + mic_delete_debug_dir(mdev); + mutex_lock(&mdev->mic_mutex); + mic_free_irq(mdev, mdev->shutdown_cookie, mdev); + mutex_unlock(&mdev->mic_mutex); dp_uninit: mic_dp_uninit(mdev); sysfs_put: @@ -1019,6 +1042,7 @@ static void mic_remove(struct pci_dev *pdev) id = mdev->id; mic_stop(mdev, false); + 
cdev_del(&mdev->cdev); mic_delete_debug_dir(mdev); mutex_lock(&mdev->mic_mutex); mic_free_irq(mdev, mdev->shutdown_cookie, mdev); diff --git a/drivers/misc/mic/host/mic_virtio.c b/drivers/misc/mic/host/mic_virtio.c new file mode 100644 index 0000000..7282e12 --- /dev/null +++ b/drivers/misc/mic/host/mic_virtio.c @@ -0,0 +1,703 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#include <linux/module.h> +#include <linux/fs.h> +#include <linux/pci.h> +#include <linux/interrupt.h> +#include <linux/firmware.h> +#include <linux/completion.h> +#include <linux/poll.h> +#include <linux/sched.h> +#include <uapi/linux/virtio_ids.h> +#include <uapi/linux/virtio_net.h> + +#include "mic_common.h" +#include "mic_virtio.h" + +/* See comments in vhost.c for explanation of next_desc() */ +static unsigned next_desc(struct vring_desc *desc) +{ + unsigned int next; + + if (!(le16_to_cpu(desc->flags) & VRING_DESC_F_NEXT)) + return -1U; + next = le16_to_cpu(desc->next); + read_barrier_depends(); + return next; +} + +/* + * Central API which initiates the copies across the PCIe bus. 
+ */ +static int mic_virtio_copy_desc_buf(struct mic_vdev *mvdev, + struct vring_desc *desc, + void __user *ubuf, u32 rem_len, u32 doff, u32 *out_len) +{ + void __iomem *dbuf; + int err; + u32 len = le32_to_cpu(desc->len); + u16 flags = le16_to_cpu(desc->flags); + u64 addr = le64_to_cpu(desc->addr); + + dbuf = mvdev->mdev->aper.va + addr + doff; + *out_len = min_t(u32, rem_len, len - doff); + if (flags & VRING_DESC_F_WRITE) { + /* + * We are copying to IO below and the subsequent + * wmb(..) ensures that the stores have completed. + * We should ideally use something like + * copy_from_user_toio(..) if it existed. + */ + if (copy_from_user(dbuf, ubuf, *out_len)) { + err = -EFAULT; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, err); + goto err; + } + mvdev->out_bytes += *out_len; + wmb(); + } else { + /* + * We are copying from IO below and the subsequent + * rmb(..) ensures that the loads have completed. + * We should ideally use something like + * copy_to_user_fromio(..) if it existed. 
+ */ + if (copy_to_user(ubuf, dbuf, *out_len)) { + err = -EFAULT; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, err); + goto err; + } + mvdev->in_bytes += *out_len; + rmb(); + } + err = 0; +err: + dev_dbg(mic_dev(mvdev), + "%s: ubuf %p dbuf %p rem_len 0x%x *out_len 0x%x " + "dlen 0x%x desc->writable %d err %d\n", + __func__, ubuf, dbuf, rem_len, *out_len, + len, flags & VRING_DESC_F_WRITE, err); + return err; +} + +/* Iterate over the virtio descriptor chain and issue the copies */ +static int _mic_virtio_copy(struct mic_vdev *mvdev, + struct mic_copy *copy, bool chain) +{ + struct mic_vring *vr; + struct vring_desc *desc; + u32 desc_idx = copy->desc_idx; + int ret = 0, iovcnt = copy->iovcnt; + struct iovec iov; + struct iovec __user *u_iov = copy->iov; + u32 rem_ulen, rem_dlen, len, doff; + void __user *ubuf = NULL; + + vr = &mvdev->vring[copy->vr_idx]; + desc = vr->vr.desc; + copy->out_len = 0; + rem_dlen = le32_to_cpu(desc[desc_idx].len); + rem_ulen = 0; + doff = 0; + + while (iovcnt && desc_idx != -1U) { + if (!rem_ulen) { + /* Copy over a new iovec */ + ret = copy_from_user(&iov, u_iov, sizeof(*u_iov)); + if (ret) { + ret = -EINVAL; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, ret); + break; + } + rem_ulen = iov.iov_len; + ubuf = iov.iov_base; + } + ret = mic_virtio_copy_desc_buf(mvdev, + &desc[desc_idx], + ubuf, rem_ulen, doff, &len); + if (ret) + break; + + dev_dbg(mic_dev(mvdev), + "%s: desc_idx 0x%x rem_ulen 0x%x rem_dlen 0x%x " + "doff 0x%x dlen 0x%x\n", + __func__, desc_idx, rem_ulen, rem_dlen, + doff, le32_to_cpu(desc[desc_idx].len)); + + copy->out_len += len; + rem_ulen -= len; + rem_dlen -= len; + ubuf += len; + doff += len; + /* One iovec is now completed */ + if (!rem_ulen) { + iovcnt--; + u_iov++; + } + /* One descriptor is now completed */ + if (!rem_dlen) { + desc_idx = next_desc(&desc[desc_idx]); + if (desc_idx != -1U) { + rem_dlen = le32_to_cpu(desc[desc_idx].len); + doff = 0; + } + } + } + /* + * Return 
EINVAL if a chain should be processed, but we have run out + * of iovecs while there are readable descriptors remaining in the + * chain. + */ + if (chain && desc_idx != -1U && + !(le16_to_cpu(desc->flags) & VRING_DESC_F_WRITE)) { + dev_err(mic_dev(mvdev), "%s not enough iovecs\n", __func__); + ret = -EINVAL; + } + return ret; +} + +static inline void +mic_update_local_avail(struct mic_vdev *mvdev, u8 vr_idx) +{ + struct mic_vring *vr = &mvdev->vring[vr_idx]; + vr->info->avail_idx++; +} + +/* Update the used ring */ +static void mic_update_used(struct mic_vdev *mvdev, u8 vr_idx, + u32 used_desc_idx, u32 used_len) +{ + struct mic_vring *vr = &mvdev->vring[vr_idx]; + u16 used_idx; + s8 db = mvdev->dc->h2c_vdev_db; + + used_idx = vr->info->used_idx & (vr->vr.num - 1); + iowrite32(used_desc_idx, &vr->vr.used->ring[used_idx].id); + iowrite32(used_len, &vr->vr.used->ring[used_idx].len); + wmb(); + iowrite16(++vr->info->used_idx, &vr->vr.used->idx); + dev_dbg(mic_dev(mvdev), + "%s: ======== vr_idx %d used_idx 0x%x used_len 0x%x ========\n", + __func__, vr_idx, used_desc_idx, used_len); + wmb(); + /* Check if the remote device wants us to suppress interrupts */ + if (le16_to_cpu(vr->vr.avail->flags) & VRING_AVAIL_F_NO_INTERRUPT) + return; + if (db != -1) + mvdev->mdev->ops->send_intr(mvdev->mdev, db); +} + +static inline int verify_copy_args(struct mic_vdev *mvdev, + struct mic_copy *request) +{ + if (request->vr_idx >= mvdev->dd->num_vq) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -EINVAL); + return -EINVAL; + } + + if (request->desc_idx >= + le16_to_cpu(mic_vq_config(mvdev->dd)->num)) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -EINVAL); + return -EINVAL; + } + + return 0; +} + +#define PROCESS_DESC_CHAIN true + +/* Copy a specified number of virtio descriptors in a chain */ +int mic_virtio_copy_desc(struct mic_vdev *mvdev, + struct mic_copy_desc *request) +{ + int err; + struct mutex *vr_mutex; + + err = 
verify_copy_args(mvdev, &request->copy); + if (err) + return err; + + vr_mutex = &mvdev->vr_mutex[request->copy.vr_idx]; + mutex_lock(vr_mutex); + if (!mic_vdevup(mvdev)) { + err = -ENODEV; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, err); + goto err; + } + err = _mic_virtio_copy(mvdev, &request->copy, !PROCESS_DESC_CHAIN); + if (err) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, err); + } else if (request->used_desc_idx != -1) { + if (request->used_desc_idx >= + le16_to_cpu(mic_vq_config(mvdev->dd)->num)) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -EINVAL); + err = -EINVAL; + goto err; + } + mic_update_local_avail(mvdev, request->copy.vr_idx); + mic_update_used(mvdev, request->copy.vr_idx, + request->used_desc_idx, request->used_len); + } +err: + mutex_unlock(vr_mutex); + return err; +} + +/* Copy a chain of virtio descriptors */ +int mic_virtio_copy_chain(struct mic_vdev *mvdev, + struct mic_copy *request) +{ + int err; + struct mutex *vr_mutex; + + err = verify_copy_args(mvdev, request); + if (err) + return err; + + vr_mutex = &mvdev->vr_mutex[request->vr_idx]; + mutex_lock(vr_mutex); + if (!mic_vdevup(mvdev)) { + err = -ENODEV; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, err); + goto err; + } + err = _mic_virtio_copy(mvdev, request, PROCESS_DESC_CHAIN); + if (!err) { + mic_update_local_avail(mvdev, request->vr_idx); + mic_update_used(mvdev, request->vr_idx, + request->desc_idx, request->out_len); + } else + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, err); +err: + mutex_unlock(vr_mutex); + return err; +} + +static void mic_virtio_init_post(struct mic_vdev *mvdev) +{ + struct mic_vqconfig *vqconfig = mic_vq_config(mvdev->dd); + int i; + + for (i = 0; i < mvdev->dd->num_vq; i++) { + if (!le64_to_cpu(vqconfig[i].used_address)) { + dev_warn(mic_dev(mvdev), "used_address zero??\n"); + continue; + } + mvdev->vring[i].vr.used = + mvdev->mdev->aper.va + + 
le64_to_cpu(vqconfig[i].used_address); + } + + smp_wmb(); + mvdev->dc->used_address_updated = 0; + + dev_info(mic_dev(mvdev), "%s: device type %d LINKUP\n", + __func__, mvdev->virtio_id); +} + +static inline void mic_virtio_device_reset(struct mic_vdev *mvdev) +{ + int i; + + dev_info(mic_dev(mvdev), "%s: status %d device type %d RESET\n", + __func__, mvdev->dd->status, mvdev->virtio_id); + + for (i = 0; i < mvdev->dd->num_vq; i++) + /* + * Avoid lockdep false positive. The + 1 is for the mic + * mutex which is held in the reset devices code path. + */ + mutex_lock_nested(&mvdev->vr_mutex[i], i + 1); + + /* 0 status means "reset" */ + mvdev->dd->status = 0; + mvdev->dc->vdev_reset = 0; + mvdev->dc->host_ack = 1; + + for (i = 0; i < mvdev->dd->num_vq; i++) { + mvdev->vring[i].info->avail_idx = 0; + mvdev->vring[i].info->used_idx = 0; + } + + for (i = 0; i < mvdev->dd->num_vq; i++) + mutex_unlock(&mvdev->vr_mutex[i]); +} + +void mic_virtio_reset_devices(struct mic_device *mdev) +{ + struct list_head *pos, *tmp; + struct mic_vdev *mvdev; + + dev_info(&mdev->pdev->dev, "%s\n", __func__); + + WARN_ON(!mutex_is_locked(&mdev->mic_mutex)); + list_for_each_safe(pos, tmp, &mdev->vdev_list) { + mvdev = list_entry(pos, struct mic_vdev, list); + mic_virtio_device_reset(mvdev); + mvdev->poll_wake = 1; + wake_up(&mvdev->waitq); + } +} + +void mic_bh_handler(struct work_struct *work) +{ + struct mic_vdev *mvdev = container_of(work, struct mic_vdev, + virtio_bh_work); + + if (mvdev->dc->used_address_updated) + mic_virtio_init_post(mvdev); + + if (mvdev->dc->vdev_reset) + mic_virtio_device_reset(mvdev); + + mvdev->poll_wake = 1; + wake_up(&mvdev->waitq); +} + +static irqreturn_t mic_virtio_intr_handler(int irq, void *data) +{ + + struct mic_vdev *mvdev = data; + struct mic_device *mdev = mvdev->mdev; + + mdev->ops->ack_interrupt(mdev); + schedule_work(&mvdev->virtio_bh_work); + return IRQ_HANDLED; +} + +int mic_virtio_config_change(struct mic_vdev *mvdev, + void __user *argp) +{ + 
DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake); + int ret = 0, retry = 100, i; + struct mic_bootparam *bootparam = mvdev->mdev->dp; + s8 db = bootparam->h2c_config_db; + + mutex_lock(&mvdev->mdev->mic_mutex); + for (i = 0; i < mvdev->dd->num_vq; i++) + mutex_lock_nested(&mvdev->vr_mutex[i], i + 1); + + if (db == -1 || mvdev->dd->type == -1) { + ret = -EIO; + goto exit; + } + + if (copy_from_user(mic_vq_configspace(mvdev->dd), + argp, mvdev->dd->config_len)) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -EFAULT); + ret = -EFAULT; + goto exit; + } + mvdev->dc->config_change = MIC_VIRTIO_PARAM_CONFIG_CHANGED; + smp_wmb(); + mvdev->mdev->ops->send_intr(mvdev->mdev, db); + + for (i = retry; i--;) { + ret = wait_event_timeout(wake, + mvdev->dc->guest_ack, msecs_to_jiffies(100)); + if (ret) + break; + } + + dev_info(mic_dev(mvdev), + "%s %d retry: %d\n", __func__, __LINE__, retry); + mvdev->dc->config_change = 0; + mvdev->dc->guest_ack = 0; +exit: + for (i = 0; i < mvdev->dd->num_vq; i++) + mutex_unlock(&mvdev->vr_mutex[i]); + mutex_unlock(&mvdev->mdev->mic_mutex); + return ret; +} + +static int mic_copy_dp_entry(struct mic_vdev *mvdev, + void __user *argp, + __u8 *type, + struct mic_device_desc **devpage) +{ + struct mic_device *mdev = mvdev->mdev; + struct mic_device_desc dd, *dd_config, *devp; + struct mic_vqconfig *vqconfig; + int ret = 0, i; + bool slot_found = false; + + if (copy_from_user(&dd, argp, sizeof(dd))) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -EFAULT); + return -EFAULT; + } + + if (mic_aligned_desc_size(&dd) > MIC_MAX_DESC_BLK_SIZE + || dd.num_vq > MIC_MAX_VRINGS) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -EINVAL); + return -EINVAL; + } + + dd_config = kmalloc(mic_desc_size(&dd), GFP_KERNEL); + if (dd_config == NULL) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -ENOMEM); + return -ENOMEM; + } + if (copy_from_user(dd_config, argp, mic_desc_size(&dd))) { + ret = 
-EFAULT; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, ret); + goto exit; + } + + vqconfig = mic_vq_config(dd_config); + for (i = 0; i < dd.num_vq; i++) { + if (le16_to_cpu(vqconfig[i].num) > MIC_MAX_VRING_ENTRIES) { + ret = -EINVAL; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, ret); + goto exit; + } + } + + /* Find the first free device page entry */ + for (i = mic_aligned_size(struct mic_bootparam); + i < MIC_DP_SIZE - mic_total_desc_size(dd_config); + i += mic_total_desc_size(devp)) { + devp = mdev->dp + i; + if (devp->type == 0 || devp->type == -1) { + slot_found = true; + break; + } + } + if (!slot_found) { + ret = -EINVAL; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, ret); + goto exit; + } + + /* Save off the type before doing the memcpy. Type will be set in the + * end after completing all initialization for the new device */ + *type = dd_config->type; + dd_config->type = 0; + memcpy(devp, dd_config, mic_desc_size(dd_config)); + + *devpage = devp; +exit: + kfree(dd_config); + return ret; +} + +static void mic_init_device_ctrl(struct mic_vdev *mvdev, + struct mic_device_desc *devpage) +{ + struct mic_device_ctrl *dc; + + dc = mvdev->dc = (void *)devpage + mic_aligned_desc_size(devpage); + + dc->config_change = 0; + dc->guest_ack = 0; + dc->vdev_reset = 0; + dc->host_ack = 0; + dc->used_address_updated = 0; + dc->c2h_vdev_db = -1; + dc->h2c_vdev_db = -1; +} + +int mic_virtio_add_device(struct mic_vdev *mvdev, + void __user *argp) +{ + struct mic_device *mdev = mvdev->mdev; + struct mic_device_desc *dd; + struct mic_vqconfig *vqconfig; + int vr_size, i, j, ret; + u8 type; + s8 db; + char irqname[10]; + struct mic_bootparam *bootparam = mdev->dp; + u16 num; + + mutex_lock(&mdev->mic_mutex); + + ret = mic_copy_dp_entry(mvdev, argp, &type, &dd); + if (ret) { + mutex_unlock(&mdev->mic_mutex); + return ret; + } + + mic_init_device_ctrl(mvdev, dd); + + mvdev->dd = dd; + mvdev->virtio_id = type; + vqconfig 
= mic_vq_config(dd); + INIT_WORK(&mvdev->virtio_bh_work, mic_bh_handler); + + for (i = 0; i < dd->num_vq; i++) { + struct mic_vring *vr = &mvdev->vring[i]; + num = le16_to_cpu(vqconfig[i].num); + mutex_init(&mvdev->vr_mutex[i]); + vr_size = PAGE_ALIGN(vring_size(num, MIC_VIRTIO_RING_ALIGN) + + sizeof(struct _mic_vring_info)); + vr->va = (void *) + __get_free_pages(GFP_KERNEL | __GFP_ZERO, + get_order(vr_size)); + if (!vr->va) { + ret = -ENOMEM; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, ret); + goto err; + } + vr->len = vr_size; + vr->info = vr->va + vring_size(num, MIC_VIRTIO_RING_ALIGN); + vr->info->magic = MIC_MAGIC + mvdev->virtio_id + i; + vqconfig[i].address = mic_map_single(mdev, + vr->va, vr_size); + if (mic_map_error(vqconfig[i].address)) { + free_pages((unsigned long)vr->va, + get_order(vr_size)); + ret = -ENOMEM; + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, ret); + goto err; + } + vqconfig[i].address = cpu_to_le64(vqconfig[i].address); + + vring_init(&vr->vr, num, + vr->va, MIC_VIRTIO_RING_ALIGN); + + dev_dbg(&mdev->pdev->dev, + "%s %d index %d va %p info %p vr_size 0x%x\n", + __func__, __LINE__, i, vr->va, vr->info, vr_size); + } + + snprintf(irqname, sizeof(irqname), + "mic%dvirtio%d", mdev->id, mvdev->virtio_id); + mvdev->virtio_db = mic_next_db(mdev); + mvdev->virtio_cookie = mic_request_irq(mdev, mic_virtio_intr_handler, + irqname, mvdev, mvdev->virtio_db, MIC_INTR_DB); + if (IS_ERR(mvdev->virtio_cookie)) { + ret = PTR_ERR(mvdev->virtio_cookie); + dev_dbg(&mdev->pdev->dev, "request irq failed\n"); + goto err; + } + + mvdev->dc->c2h_vdev_db = mvdev->virtio_db; + + list_add_tail(&mvdev->list, &mdev->vdev_list); + /* + * Now that we are completely initialized, set the type to "commit" + * the addition of the new device. + * For x86 we only need a compiler barrier before dd->type. For other + * platforms we need smp_wmb(..) 
since we are writing to system memory + * and type needs to be visible to all CPUs or MIC. + */ + smp_wmb(); + dd->type = type; + + dev_info(&mdev->pdev->dev, "Added virtio device id %d\n", dd->type); + + db = bootparam->h2c_config_db; + if (db != -1) + mdev->ops->send_intr(mdev, db); + mutex_unlock(&mdev->mic_mutex); + return 0; +err: + vqconfig = mic_vq_config(dd); + for (j = 0; j < i; j++) { + mic_unmap_single(mdev, le64_to_cpu(vqconfig[j].address), + mvdev->vring[j].len); + free_pages((unsigned long)mvdev->vring[j].va, + get_order(mvdev->vring[j].len)); + } + mutex_unlock(&mdev->mic_mutex); + return ret; +} + +void mic_virtio_del_device(struct mic_vdev *mvdev) +{ + struct list_head *pos, *tmp; + struct mic_vdev *tmp_mvdev; + struct mic_device *mdev = mvdev->mdev; + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake); + int i, ret, retry = 100; + struct mic_vqconfig *vqconfig; + struct mic_bootparam *bootparam = mdev->dp; + s8 db; + + mutex_lock(&mdev->mic_mutex); + db = bootparam->h2c_config_db; + if (db == -1) + goto skip_hot_remove; + dev_info(&mdev->pdev->dev, + "Requesting hot remove id %d\n", mvdev->virtio_id); + mvdev->dc->config_change = MIC_VIRTIO_PARAM_DEV_REMOVE; + smp_wmb(); + mdev->ops->send_intr(mdev, db); + for (i = retry; i--;) { + ret = wait_event_timeout(wake, + mvdev->dc->guest_ack, msecs_to_jiffies(100)); + if (ret) + break; + } + dev_info(&mdev->pdev->dev, + "Device id %d config_change %d guest_ack %d\n", + mvdev->virtio_id, mvdev->dc->config_change, + mvdev->dc->guest_ack); + mvdev->dc->config_change = 0; + mvdev->dc->guest_ack = 0; +skip_hot_remove: + mic_free_irq(mdev, mvdev->virtio_cookie, mvdev); + flush_work(&mvdev->virtio_bh_work); + vqconfig = mic_vq_config(mvdev->dd); + for (i = 0; i < mvdev->dd->num_vq; i++) { + mic_unmap_single(mdev, le64_to_cpu(vqconfig[i].address), + mvdev->vring[i].len); + free_pages((unsigned long)mvdev->vring[i].va, + get_order(mvdev->vring[i].len)); + } + + list_for_each_safe(pos, tmp, &mdev->vdev_list) { + tmp_mvdev = 
list_entry(pos, struct mic_vdev, list); + if (tmp_mvdev == mvdev) { + list_del(pos); + dev_info(&mdev->pdev->dev, + "Removing virtio device id %d\n", + mvdev->virtio_id); + break; + } + } + mvdev->dd->type = -1; + mutex_unlock(&mdev->mic_mutex); +} diff --git a/drivers/misc/mic/host/mic_virtio.h b/drivers/misc/mic/host/mic_virtio.h new file mode 100644 index 0000000..1e2a439 --- /dev/null +++ b/drivers/misc/mic/host/mic_virtio.h @@ -0,0 +1,108 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#ifndef MIC_VIRTIO_H +#define MIC_VIRTIO_H + +#include <linux/types.h> +#include <linux/virtio_ring.h> +#include <linux/virtio_config.h> + +#include <linux/mic_ioctl.h> + +/* + * Note on endianness. + * 1. Host can be both BE or LE + * 2. Guest/card is LE. Host uses le_to_cpu to access desc/avail + * rings and ioreadXX/iowriteXX to access used ring. + * 3. Device page exposed by host to guest contains LE values. Guest + * accesses these using ioreadXX/iowriteXX etc. 
This way in general we + * obey the virtio spec according to which guest works with native + * endianness and host is aware of guest endianness and does all + * required endianness conversion. + * 4. Data provided from user space to guest (in ADD_DEVICE and + * CONFIG_CHANGE ioctl's) is not interpreted by the driver and should be + * in guest endianness. + */ + +struct mic_vdev { + int virtio_id; + wait_queue_head_t waitq; + struct mic_device *mdev; + int poll_wake; + unsigned long out_bytes; + unsigned long in_bytes; + struct mic_vring vring[MIC_MAX_VRINGS]; + struct work_struct virtio_bh_work; + struct mutex vr_mutex[MIC_MAX_VRINGS]; + struct mic_device_desc *dd; + struct mic_device_ctrl *dc; + struct list_head list; + int virtio_db; + struct mic_irq *virtio_cookie; +}; + +void mic_virtio_uninit(struct mic_device *mdev); +int mic_virtio_add_device(struct mic_vdev *mvdev, + void __user *argp); +void mic_virtio_del_device(struct mic_vdev *mvdev); +int mic_virtio_config_change(struct mic_vdev *mvdev, + void __user *argp); +int mic_virtio_copy_desc(struct mic_vdev *mvdev, + struct mic_copy_desc *request); +void mic_virtio_reset_devices(struct mic_device *mdev); +int mic_virtio_copy_chain(struct mic_vdev *mvdev, + struct mic_copy *request); +void mic_bh_handler(struct work_struct *work); + +static inline struct device *mic_dev(struct mic_vdev *mvdev) +{ + return &mvdev->mdev->pdev->dev; +} + +static inline int mic_vdev_inited(struct mic_vdev *mvdev) +{ + /* Device has not been created yet */ + if (!mvdev->dd || !mvdev->dd->type) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -EINVAL); + return -EINVAL; + } + + /* Device has been removed/deleted */ + if (mvdev->dd->type == -1) { + dev_err(mic_dev(mvdev), "%s %d err %d\n", + __func__, __LINE__, -ENODEV); + return -ENODEV; + } + + return 0; +} + +static inline bool mic_vdevup(struct mic_vdev *mvdev) +{ + return !!mvdev->dd->status; +} +#endif diff --git a/include/uapi/linux/Kbuild 
b/include/uapi/linux/Kbuild index 8f985dd..1579aab 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -240,6 +240,7 @@ header-y += mei.h header-y += mempolicy.h header-y += meye.h header-y += mic_common.h +header-y += mic_ioctl.h header-y += mii.h header-y += minix_fs.h header-y += mman.h diff --git a/include/uapi/linux/mic_common.h b/include/uapi/linux/mic_common.h index b8edede..2576d0b 100644 --- a/include/uapi/linux/mic_common.h +++ b/include/uapi/linux/mic_common.h @@ -26,7 +26,61 @@ #ifndef __MIC_COMMON_H_ #define __MIC_COMMON_H_ -#include <linux/types.h> +#include <linux/virtio_ring.h> + +#ifndef __KERNEL__ +#define ALIGN(a, x) (((a) + (x) - 1) & ~((x) - 1)) +#define __aligned(x) __attribute__ ((aligned(x))) +#endif + +#define mic_aligned_size(x) ALIGN(sizeof(x), 8) + + +/** + * struct mic_device_desc: Virtio device information shared between the + * virtio driver and userspace backend + * + * @type: Device type: console/network/disk etc. Type 0/-1 terminates. + * @num_vq: Number of virtqueues. + * @feature_len: Number of bytes of feature bits. Multiply by 2: one for + host features and one for guest acknowledgements. + * @config_len: Number of bytes of the config array after virtqueues. + * @status: A status byte, written by the Guest. + * @config: Start of the following variable length config. + */ +struct mic_device_desc { + __s8 type; + __u8 num_vq; + __u8 feature_len; + __u8 config_len; + __u8 status; + __u64 config[0]; +} __aligned(8); + +/** + * struct mic_device_ctrl: Per virtio device information in the device page + * used internally by the host and card side drivers. + * + * @vdev: Used for storing MIC vdev information by the guest. + * @config_change: Set to 1 by host when a config change is requested. + * @vdev_reset: Set to 1 by guest to indicate virtio device has been reset. + * @guest_ack: Set to 1 by guest to ack a command. + * @host_ack: Set to 1 by host to ack a command. 
+ * @used_address_updated: Set to 1 by guest when the used address should be + * updated. + * @c2h_vdev_db: The doorbell number to be used by guest. Set by host. + * @h2c_vdev_db: The doorbell number to be used by host. Set by guest. + */ +struct mic_device_ctrl { + __u64 vdev; + __u8 config_change; + __u8 vdev_reset; + __u8 guest_ack; + __u8 host_ack; + __u8 used_address_updated; + __s8 c2h_vdev_db; + __s8 h2c_vdev_db; +} __aligned(8); /** * struct mic_bootparam: Virtio device independent information in device page @@ -47,6 +101,115 @@ struct mic_bootparam { __u8 shutdown_card; } __aligned(8); +/** + * struct mic_device_page: High level representation of the device page + * + * @bootparam: The bootparam structure is used for sharing information and + * status updates between MIC host and card drivers. + * @desc: Array of MIC virtio device descriptors. + */ +struct mic_device_page { + struct mic_bootparam bootparam; + struct mic_device_desc desc[0]; +}; +/** + * struct mic_vqconfig: This is how we expect the device configuration field + * for a virtqueue to be laid out in config space. + * + * @address: Guest/MIC physical address of the virtio ring + * (avail and desc rings) + * @used_address: Guest/MIC physical address of the used ring + * @num: The number of entries in the virtio_ring + */ +struct mic_vqconfig { + __u64 address; + __u64 used_address; + __u16 num; +} __aligned(8); + +/* The alignment to use between consumer and producer parts of vring. + * This is pagesize for historical reasons. 
*/ +#define MIC_VIRTIO_RING_ALIGN 4096 + +#define MIC_MAX_VRINGS 4 +#define MIC_VRING_ENTRIES 128 + +/* + * Max vring entries (power of 2) to ensure desc and avail rings + * fit in a single page + */ +#define MIC_MAX_VRING_ENTRIES 128 + +/** + * Max size of the desc block in bytes: includes: + * - struct mic_device_desc + * - struct mic_vqconfig (num_vq of these) + * - host and guest features + * - virtio device config space + */ +#define MIC_MAX_DESC_BLK_SIZE 256 + +/** + * struct _mic_vring_info - Host vring info exposed to userspace backend + * + * @avail_idx: host avail idx + * @used_idx: host used idx + * @magic: A magic debug cookie. + */ +struct _mic_vring_info { + __u16 avail_idx; + __u16 used_idx; + int magic; +}; + +/** + * struct mic_vring - Vring information. + * + * @vr: The virtio ring. + * @info: Host vring information exposed to the card. + * @va: The va for the buffer allocated for vr and info. + * @len: The length of the buffer required for allocating vr and info. + */ +struct mic_vring { + struct vring vr; + struct _mic_vring_info *info; + void *va; + int len; +}; + +#define mic_aligned_desc_size(d) ALIGN(mic_desc_size(d), 8) + +#ifndef INTEL_MIC_CARD +static inline unsigned mic_desc_size(const struct mic_device_desc *desc) +{ + return mic_aligned_size(*desc) + + desc->num_vq * mic_aligned_size(struct mic_vqconfig) + + desc->feature_len * 2 + + desc->config_len; +} + +static inline struct mic_vqconfig * +mic_vq_config(const struct mic_device_desc *desc) +{ + return (struct mic_vqconfig *)(desc + 1); +} + +static inline __u8 *mic_vq_features(const struct mic_device_desc *desc) +{ + return (__u8 *)(mic_vq_config(desc) + desc->num_vq); +} + +static inline __u8 *mic_vq_configspace(const struct mic_device_desc *desc) +{ + return mic_vq_features(desc) + desc->feature_len * 2; +} +static inline unsigned mic_total_desc_size(struct mic_device_desc *desc) +{ + return mic_aligned_desc_size(desc) + + mic_aligned_size(struct mic_device_ctrl); +} +#endif + /* 
Device page size */ #define MIC_DP_SIZE 4096 diff --git a/include/uapi/linux/mic_ioctl.h b/include/uapi/linux/mic_ioctl.h new file mode 100644 index 0000000..02e1518 --- /dev/null +++ b/include/uapi/linux/mic_ioctl.h @@ -0,0 +1,104 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC Host driver. + * + */ +#ifndef _MIC_IOCTL_H_ +#define _MIC_IOCTL_H_ + +#include <linux/mic_common.h> + +/* + * mic_copy - MIC virtio descriptor copy. + * + * @iov: An array of IOVEC structures containing user space buffers. + * @iovcnt: Number of IOVEC structures in iov. + * @vr_idx: The vring index. + * @desc_idx: The starting desc index. + * @out_cookie: A cookie returned by the driver to identify this copy. + * @out_len: The aggregate of the total length written to or read from + * the virtio device. + */ +struct mic_copy { +#ifdef __KERNEL__ + struct iovec __user *iov; +#else + struct iovec *iov; +#endif + int iovcnt; + __u8 vr_idx; + __u32 desc_idx; + __u64 out_cookie; + __u32 out_len; +}; + +/* + * mic_copy_desc - MIC virtio copy. + * + * @copy - MIC virtio descriptor copy. + * @used_desc_idx - The desc index to update the used ring with. 
+ * The used index is not updated if used_desc_idx is -1. + * @used_len - The length to update the used ring with. + */ +struct mic_copy_desc { + struct mic_copy copy; + __u32 used_desc_idx; + __u32 used_len; +}; + +/* + * Add a new virtio device + * The (struct mic_device_desc *) pointer points to a device page entry + * for the virtio device consisting of: + * - struct mic_device_desc + * - struct mic_vqconfig (num_vq of these) + * - host and guest features + * - virtio device config space + * The total size referenced by the pointer should equal the size returned + * by mic_desc_size() in mic_common.h + */ +#define MIC_VIRTIO_ADD_DEVICE _IOWR('s', 1, struct mic_device_desc *) + +/* + * Copy the number of entries in the iovec and update the used index + * if requested by the user. + */ +#define MIC_VIRTIO_COPY_DESC _IOWR('s', 2, struct mic_copy_desc *) + +/* + * Copy iovec entries up to the length of the chain. The number of entries + * must be >= the length of the chain else -1 is returned and errno set + * to EINVAL. + */ +#define MIC_VIRTIO_COPY_CHAIN _IOWR('s', 3, struct mic_copy *) + +/* + * Notify virtio device of a config change + * The (__u8 *) pointer points to config space values for the device + * as they should be written into the device page. The total size + * referenced by the pointer should equal the config_len field of struct + * mic_device_desc. + */ +#define MIC_VIRTIO_CONFIG_CHANGE _IOWR('s', 5, __u8 *) + +#endif -- 1.8.2.1
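The size and offset arithmetic behind the ADD_DEVICE payload can be sketched in plain user-space C. This is an illustrative sketch under stated assumptions, not code from the patch: every demo_* name and the sample field values are hypothetical, and the two structures only mirror the field order of struct mic_device_desc and struct mic_vqconfig from mic_common.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a up to the next multiple of x; x must be a power of two. */
#define DEMO_ALIGN(a, x) (((size_t)(a) + (x) - 1) & ~((size_t)(x) - 1))

/* Hypothetical user-space mirror of struct mic_device_desc. */
struct demo_device_desc {
	int8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
	uint8_t status;
	uint64_t config[];
} __attribute__((aligned(8)));

/* Hypothetical user-space mirror of struct mic_vqconfig. */
struct demo_vqconfig {
	uint64_t address;
	uint64_t used_address;
	uint16_t num;
} __attribute__((aligned(8)));

/* The vqconfig array starts right after the fixed descriptor header. */
static size_t demo_vq_config_offset(const struct demo_device_desc *d)
{
	assert(d != NULL);
	return DEMO_ALIGN(sizeof(*d), 8);
}

/* The feature bitmap follows num_vq vqconfig entries. */
static size_t demo_features_offset(const struct demo_device_desc *d)
{
	return demo_vq_config_offset(d) +
		d->num_vq * DEMO_ALIGN(sizeof(struct demo_vqconfig), 8);
}

/* Config space follows two bitmaps: host features, then guest acks. */
static size_t demo_configspace_offset(const struct demo_device_desc *d)
{
	return demo_features_offset(d) + (size_t)d->feature_len * 2;
}

/* Unaligned size of one device page entry, without the control block. */
static size_t demo_desc_size(const struct demo_device_desc *d)
{
	return demo_configspace_offset(d) + d->config_len;
}

/* Entry size rounded up to 8 bytes, where the control block would start. */
static size_t demo_aligned_desc_size(const struct demo_device_desc *d)
{
	return DEMO_ALIGN(demo_desc_size(d), 8);
}
```

With the made-up values num_vq = 2, feature_len = 4 and config_len = 8, the vqconfig array starts at byte 8, the feature bitmap at byte 56, the config space at byte 64, and the whole entry occupies 72 bytes.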
Sudeep Dutt
2013-Jul-25 03:31 UTC
[PATCH 4/5] Intel MIC Card Driver Changes for Virtio Devices.
From: Ashutosh Dixit <ashutosh.dixit at intel.com>

This patch introduces the card "Virtio over PCIe" interface for Intel MIC. It allows virtio drivers on the card to communicate with their user space backends on the host via a device page. Ring 3 apps on the host can add, remove and configure virtio devices.

A thin MIC-specific virtio_config_ops is implemented, borrowing heavily from the earlier implementations in lguest and s390:
drivers/lguest/lguest_device.c
drivers/s390/kvm/kvm_virtio.c

Co-author: Sudeep Dutt <sudeep.dutt at intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
Signed-off-by: Caz Yokoyama <Caz.Yokoyama at intel.com>
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com>
Signed-off-by: Nikhil Rao <nikhil.rao at intel.com>
Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche at intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt at intel.com>
Acked-by: Yaozu (Eddie) Dong <eddie.dong at intel.com>
Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr at intel.com>
---
 drivers/misc/mic/card/Makefile | 1 +
 drivers/misc/mic/card/mic_device.c | 7 +
 drivers/misc/mic/card/mic_virtio.c | 643 +++++++++++++++++++++++++++++++++++++
 drivers/misc/mic/card/mic_virtio.h | 79 +++++
 4 files changed, 730 insertions(+)
 create mode 100644 drivers/misc/mic/card/mic_virtio.c
 create mode 100644 drivers/misc/mic/card/mic_virtio.h

diff --git a/drivers/misc/mic/card/Makefile b/drivers/misc/mic/card/Makefile
index 06007ba..cedb06a 100644
--- a/drivers/misc/mic/card/Makefile
+++ b/drivers/misc/mic/card/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_INTEL_MIC_CARD) += mic_card.o
 mic_card-$(CONFIG_INTEL_MIC_CARD_X100) += mic_x100.o
 mic_card-y += mic_device.o
 mic_card-y += mic_debugfs.o
+mic_card-y += mic_virtio.o
diff --git a/drivers/misc/mic/card/mic_device.c b/drivers/misc/mic/card/mic_device.c
index 7bfe2e5..029fdbc 100644
--- a/drivers/misc/mic/card/mic_device.c
+++ b/drivers/misc/mic/card/mic_device.c
@@
-36,6 +36,7 @@ #include "mic_common.h" #include "mic_debugfs.h" +#include "mic_virtio.h" static struct mic_driver *g_drv; static struct mic_irq *shutdown_cookie; @@ -270,10 +271,15 @@ int mic_driver_init(struct mic_driver *mdrv) rc = mic_shutdown_init(); if (rc) goto irq_uninit; + rc = mic_devices_init(mdrv); + if (rc) + goto shutdown_uninit; mic_create_card_debug_dir(mdrv); atomic_notifier_chain_register(&panic_notifier_list, &mic_panic); done: return rc; +shutdown_uninit: + mic_shutdown_uninit(); irq_uninit: mic_uninit_irq(); dp_uninit: @@ -291,6 +297,7 @@ put: void mic_driver_uninit(struct mic_driver *mdrv) { mic_delete_card_debug_dir(mdrv); + mic_devices_uninit(mdrv); /* * Inform the host about the shutdown status i.e. poweroff/restart etc. * The module cannot be unloaded so the only code path to call diff --git a/drivers/misc/mic/card/mic_virtio.c b/drivers/misc/mic/card/mic_virtio.c new file mode 100644 index 0000000..8178fbf --- /dev/null +++ b/drivers/misc/mic/card/mic_virtio.c @@ -0,0 +1,643 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". 
+ * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Adapted from: + * + * virtio for kvm on s390 + * + * Copyright IBM Corp. 2008 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License (version 2 only) + * as published by the Free Software Foundation. + * + * Author(s): Christian Borntraeger <borntraeger at de.ibm.com> + * + * Intel MIC Card driver. + * + */ +#include <linux/module.h> +#include <linux/fs.h> +#include <linux/kernel_stat.h> +#include <linux/init.h> +#include <linux/bootmem.h> +#include <linux/err.h> +#include <linux/virtio.h> +#include <linux/virtio_config.h> +#include <linux/slab.h> +#include <linux/virtio_console.h> +#include <linux/interrupt.h> +#include <linux/virtio_ring.h> +#include <linux/pfn.h> +#include <linux/delay.h> +#include <linux/sched.h> +#include <linux/wait.h> +#include <linux/io.h> +#include "mic_common.h" +#include "mic_virtio.h" + +#define VIRTIO_SUBCODE_64 0x0D00 + +#define MIC_MAX_VRINGS 4 + +struct mic_vdev { + struct virtio_device vdev; + struct mic_device_desc __iomem *desc; + struct mic_device_ctrl __iomem *dc; + struct mic_device *mdev; + void __iomem *vr[MIC_MAX_VRINGS]; + int used_size[MIC_MAX_VRINGS]; + struct completion reset_done; + struct mic_irq *virtio_cookie; + int c2h_vdev_db; +}; + +static struct mic_irq *virtio_config_cookie; +#define to_micvdev(vd) container_of(vd, struct mic_vdev, vdev) + +static inline struct device *dev(struct mic_vdev *mvdev) +{ + return mvdev->vdev.dev.parent; +} + +/* This gets the device's feature bits. 
*/ +static u32 mic_get_features(struct virtio_device *vdev) +{ + unsigned int i, bits; + u32 features = 0; + struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc; + u8 __iomem *in_features = mic_vq_features(desc); + int feature_len = ioread8(&desc->feature_len); + + bits = min_t(unsigned, feature_len, + sizeof(vdev->features)) * 8; + for (i = 0; i < bits; i++) + if (ioread8(&in_features[i / 8]) & (BIT(i % 8))) + features |= BIT(i); + + return features; +} + +static void mic_finalize_features(struct virtio_device *vdev) +{ + unsigned int i, bits; + struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc; + u8 feature_len = ioread8(&desc->feature_len); + /* Second half of bitmap is features we accept. */ + u8 __iomem *out_features = + mic_vq_features(desc) + feature_len; + + /* Give virtio_ring a chance to accept features. */ + vring_transport_features(vdev); + + memset_io(out_features, 0, feature_len); + bits = min_t(unsigned, feature_len, + sizeof(vdev->features)) * 8; + for (i = 0; i < bits; i++) { + if (test_bit(i, vdev->features)) + iowrite8(ioread8(&out_features[i / 8]) | (1 << (i % 8)), + &out_features[i / 8]); + } +} + +/* + * Reading and writing elements in config space + */ +static void mic_get(struct virtio_device *vdev, unsigned int offset, + void *buf, unsigned len) +{ + struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc; + + if (offset + len > ioread8(&desc->config_len)) + return; + memcpy_fromio(buf, mic_vq_configspace(desc) + offset, len); +} + +static void mic_set(struct virtio_device *vdev, unsigned int offset, + const void *buf, unsigned len) +{ + struct mic_device_desc __iomem *desc = to_micvdev(vdev)->desc; + + if (offset + len > ioread8(&desc->config_len)) + return; + memcpy_toio(mic_vq_configspace(desc) + offset, buf, len); +} + +/* + * The operations to get and set the status word just access the status + * field of the device descriptor. set_status also interrupts the host + * to tell about status changes. 
+ */ +static u8 mic_get_status(struct virtio_device *vdev) +{ + return ioread8(&to_micvdev(vdev)->desc->status); +} + +static void mic_set_status(struct virtio_device *vdev, u8 status) +{ + struct mic_vdev *mvdev = to_micvdev(vdev); + if (!status) + return; + iowrite8(status, &mvdev->desc->status); + mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db); +} + +/* Inform host on a virtio device reset and wait for ack from host */ +static void mic_reset_inform_host(struct virtio_device *vdev) +{ + struct mic_vdev *mvdev = to_micvdev(vdev); + struct mic_device_ctrl __iomem *dc = mvdev->dc; + int retry = 100, i; + + iowrite8(0, &dc->host_ack); + iowrite8(1, &dc->vdev_reset); + mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db); + + /* Wait till host completes all card accesses and acks the reset */ + for (i = retry; i--;) { + if (ioread8(&dc->host_ack)) + break; + msleep(100); + }; + + dev_info(dev(mvdev), "%s: retry: %d\n", __func__, retry); + + /* Reset status to 0 in case we timed out */ + iowrite8(0, &mvdev->desc->status); +} + +static void mic_reset(struct virtio_device *vdev) +{ + struct mic_vdev *mvdev = to_micvdev(vdev); + + dev_info(dev(mvdev), "%s: virtio id %d\n", __func__, vdev->id.device); + + mic_reset_inform_host(vdev); + complete_all(&mvdev->reset_done); +} + +/* + * The virtio_ring code calls this API when it wants to notify the Host. 
+ */ +static void mic_notify(struct virtqueue *vq) +{ + struct mic_vdev *mvdev = vq->priv; + + mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db); +} + +static void mic_del_vq(struct virtqueue *vq, int n) +{ + struct mic_vdev *mvdev = to_micvdev(vq->vdev); + struct vring *vr = (struct vring *) (vq + 1); + + free_pages((unsigned long) vr->used, + get_order(mvdev->used_size[n])); + vring_del_virtqueue(vq); + mic_card_unmap(mvdev->mdev, mvdev->vr[n]); + mvdev->vr[n] = NULL; +} + +static void mic_del_vqs(struct virtio_device *vdev) +{ + struct mic_vdev *mvdev = to_micvdev(vdev); + struct virtqueue *vq, *n; + int idx = 0; + + dev_info(dev(mvdev), "%s\n", __func__); + + list_for_each_entry_safe(vq, n, &vdev->vqs, list) + mic_del_vq(vq, idx++); +} + +/* + * This routine will assign vring's allocated in host/io memory. Code in + * virtio_ring.c however continues to access this io memory as if it were local + * memory without io accessors. + */ +static struct virtqueue *mic_find_vq(struct virtio_device *vdev, + unsigned index, + void (*callback)(struct virtqueue *vq), + const char *name) +{ + struct mic_vdev *mvdev = to_micvdev(vdev); + struct mic_vqconfig __iomem *vqconfig; + struct mic_vqconfig config; + struct virtqueue *vq; + void __iomem *va; + struct _mic_vring_info __iomem *info; + void *used; + int vr_size, _vr_size, err, magic; + struct vring *vr; + u8 type = ioread8(&mvdev->desc->type); + + if (index >= ioread8(&mvdev->desc->num_vq)) + return ERR_PTR(-ENOENT); + + if (!name) + return ERR_PTR(-ENOENT); + + /* First assign the vring's allocated in host memory */ + vqconfig = mic_vq_config(mvdev->desc) + index; + memcpy_fromio(&config, vqconfig, sizeof(config)); + _vr_size = vring_size(config.num, MIC_VIRTIO_RING_ALIGN); + vr_size = PAGE_ALIGN(_vr_size + sizeof(struct _mic_vring_info)); + va = mic_card_map(mvdev->mdev, config.address, vr_size); + if (!va) + return ERR_PTR(-ENOMEM); + mvdev->vr[index] = va; + memset_io(va, 0x0, _vr_size); + vq = vring_new_virtqueue(index, 
+ config.num, MIC_VIRTIO_RING_ALIGN, vdev, + false, + va, mic_notify, callback, name); + if (!vq) { + err = -ENOMEM; + goto unmap; + } + info = va + _vr_size; + magic = ioread32(&info->magic); + dev_info(dev(mvdev), + "%s: magic 0x%x type 0x%x index 0x%x expected 0x%x\n", + __func__, magic, type, index, MIC_MAGIC + type + index); + + if (WARN(magic != MIC_MAGIC + type + index, "magic mismatch")) { + err = -EIO; + goto unmap; + } + + /* Allocate and reassign used ring now */ + mvdev->used_size[index] = PAGE_ALIGN(sizeof(__u16) * 3 + + sizeof(struct vring_used_elem) * config.num); + used = (void *) __get_free_pages(GFP_KERNEL | __GFP_ZERO, + get_order(mvdev->used_size[index])); + if (!used) { + err = -ENOMEM; + dev_err(dev(mvdev), "%s %d err %d\n", __func__, __LINE__, err); + goto del_vq; + } + iowrite64(virt_to_phys(used), &vqconfig->used_address); + + /* + * To reassign the used ring here we are directly accessing + * struct vring_virtqueue which is a private data structure + * in virtio_ring.c. At the minimum, a BUILD_BUG_ON() in + * vring_new_virtqueue() would ensure that + * (&vq->vring == (struct vring *) (&vq->vq + 1)); + */ + vr = (struct vring *) (vq + 1); + vr->used = used; + + vq->priv = mvdev; + return vq; +del_vq: + vring_del_virtqueue(vq); +unmap: + mic_card_unmap(mvdev->mdev, mvdev->vr[index]); + return ERR_PTR(err); +} + +static int mic_find_vqs(struct virtio_device *vdev, unsigned nvqs, + struct virtqueue *vqs[], + vq_callback_t *callbacks[], + const char *names[]) +{ + struct mic_vdev *mvdev = to_micvdev(vdev); + struct mic_device_ctrl __iomem *dc = mvdev->dc; + int i, err, retry = 100; + + /* We must have this many virtqueues. 
*/ + if (nvqs > ioread8(&mvdev->desc->num_vq)) + return -ENOENT; + + for (i = 0; i < nvqs; ++i) { + dev_info(dev(mvdev), "%s: %d: %s\n", __func__, i, names[i]); + vqs[i] = mic_find_vq(vdev, i, callbacks[i], names[i]); + if (IS_ERR(vqs[i])) { + err = PTR_ERR(vqs[i]); + goto error; + } + } + + iowrite8(1, &dc->used_address_updated); + + /* Send an interrupt to the host to inform it that used rings have + * been re-assigned */ + mic_send_intr(mvdev->mdev, mvdev->c2h_vdev_db); + for (i = retry; i--;) { + if (!ioread8(&dc->used_address_updated)) + break; + msleep(100); + }; + + dev_info(dev(mvdev), "%s: retry: %d\n", __func__, retry); + if (!retry) { + err = -ENODEV; + goto error; + } + + return 0; +error: + mic_del_vqs(vdev); + return err; +} + +/* + * The config ops structure as defined by virtio config + */ +static struct virtio_config_ops mic_vq_config_ops = { + .get_features = mic_get_features, + .finalize_features = mic_finalize_features, + .get = mic_get, + .set = mic_set, + .get_status = mic_get_status, + .set_status = mic_set_status, + .reset = mic_reset, + .find_vqs = mic_find_vqs, + .del_vqs = mic_del_vqs, +}; + +static irqreturn_t +mic_virtio_intr_handler(int irq, void *data) +{ + struct mic_vdev *mvdev = data; + struct virtqueue *vq; + + mic_ack_interrupt(mvdev->mdev); + list_for_each_entry(vq, &mvdev->vdev.vqs, list) + vring_interrupt(0, vq); + + return IRQ_HANDLED; +} + +static void mic_virtio_release_dev(struct device *_d) +{ + /* + * No need for a release method similar to virtio PCI. + * Provide an empty one to avoid getting a warning from core. 
+ */ +} + +/* + * adds a new device and register it with virtio + * appropriate drivers are loaded by the device model + */ +static int add_mic_device(struct mic_device_desc __iomem *d, + unsigned int offset, struct mic_driver *mdrv) +{ + struct mic_vdev *mvdev; + int ret; + int virtio_db; + u8 type = ioread8(&d->type); + + mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL); + if (!mvdev) { + dev_err(mdrv->dev, "Cannot allocate mic dev %u type %u\n", + offset, type); + return -ENOMEM; + } + + mvdev->mdev = &mdrv->mdev; + mvdev->vdev.dev.parent = mdrv->dev; + mvdev->vdev.dev.release = mic_virtio_release_dev; + mvdev->vdev.id.device = type; + mvdev->vdev.config = &mic_vq_config_ops; + mvdev->desc = d; + mvdev->dc = (void __iomem *)d + mic_aligned_desc_size(d); + init_completion(&mvdev->reset_done); + + virtio_db = mic_next_card_db(); + mvdev->virtio_cookie = mic_request_card_irq(mic_virtio_intr_handler, + "virtio intr", mvdev, virtio_db); + if (IS_ERR(mvdev->virtio_cookie)) { + ret = PTR_ERR(mvdev->virtio_cookie); + goto kfree; + } + iowrite8((u8)virtio_db, &mvdev->dc->h2c_vdev_db); + mvdev->c2h_vdev_db = ioread8(&mvdev->dc->c2h_vdev_db); + + ret = register_virtio_device(&mvdev->vdev); + if (ret) { + dev_err(dev(mvdev), + "Failed to register mic device %u type %u\n", + offset, type); + goto free_irq; + } + iowrite64((u64)mvdev, &mvdev->dc->vdev); + dev_info(dev(mvdev), "%s: registered mic device %u type %u mvdev %p\n", + __func__, offset, type, mvdev); + + return 0; + +free_irq: + mic_free_card_irq(mvdev->virtio_cookie, mvdev); +kfree: + kfree(mvdev); + return ret; +} + +/* + * match for a mic device with a specific desc pointer + */ +static int match_desc(struct device *dev, void *data) +{ + struct virtio_device *vdev = dev_to_virtio(dev); + struct mic_vdev *mvdev = to_micvdev(vdev); + + return mvdev->desc == (void __iomem *)data; +} + +static void handle_mic_config_change(struct mic_device_desc __iomem *d, + unsigned int offset, struct mic_driver *mdrv) +{ + struct 
mic_device_ctrl __iomem *dc + = (void __iomem *)d + mic_aligned_desc_size(d); + struct mic_vdev *mvdev = (struct mic_vdev *)ioread64(&dc->vdev); + struct virtio_driver *drv; + + if (ioread8(&dc->config_change) != MIC_VIRTIO_PARAM_CONFIG_CHANGED) + return; + + dev_info(mdrv->dev, "%s %d\n", __func__, __LINE__); + drv = container_of(mvdev->vdev.dev.driver, + struct virtio_driver, driver); + if (drv->config_changed) + drv->config_changed(&mvdev->vdev); + iowrite8(1, &dc->guest_ack); +} + +/* + * removes a virtio device if a hot remove event has been + * requested by the host. + */ +static int remove_mic_device(struct mic_device_desc __iomem *d, + unsigned int offset, struct mic_driver *mdrv) +{ + struct mic_device_ctrl __iomem *dc + = (void __iomem *)d + mic_aligned_desc_size(d); + struct mic_vdev *mvdev = (struct mic_vdev *)ioread64(&dc->vdev); + u8 status; + int ret = -1; + + if (ioread8(&dc->config_change) == MIC_VIRTIO_PARAM_DEV_REMOVE) { + dev_info(mdrv->dev, + "%s %d config_change %d type %d mvdev %p\n", + __func__, __LINE__, + ioread8(&dc->config_change), ioread8(&d->type), mvdev); + + status = ioread8(&d->status); + INIT_COMPLETION(mvdev->reset_done); + unregister_virtio_device(&mvdev->vdev); + mic_free_card_irq(mvdev->virtio_cookie, mvdev); + if (status & VIRTIO_CONFIG_S_DRIVER_OK) + wait_for_completion(&mvdev->reset_done); + kfree(mvdev); + iowrite8(1, &dc->guest_ack); + dev_info(mdrv->dev, "%s %d guest_ack %d\n", + __func__, __LINE__, ioread8(&dc->guest_ack)); + ret = 0; + } + + return ret; +} + +#define REMOVE_DEVICES true + +static void scan_devices(struct mic_driver *mdrv, bool remove) +{ + s8 type; + unsigned int i; + struct mic_device_desc __iomem *d; + struct mic_device_ctrl __iomem *dc; + struct device *dev; + int ret; + + for (i = mic_aligned_size(struct mic_bootparam); + i < MIC_DP_SIZE; i += mic_total_desc_size(d)) { + d = mdrv->dp + i; + dc = (void __iomem *)d + mic_aligned_desc_size(d); + + type = ioread8(&d->type); + + /* end of list */ + if 
(type == 0) + break; + + if (type == -1) + continue; + + /* device already exists */ + dev = device_find_child(mdrv->dev, d, match_desc); + if (dev) { + if (remove) + iowrite8(MIC_VIRTIO_PARAM_DEV_REMOVE, + &dc->config_change); + put_device(dev); + handle_mic_config_change(d, i, mdrv); + ret = remove_mic_device(d, i, mdrv); + if (!ret && !remove) + iowrite8(-1, &d->type); + if (remove) { + iowrite8(0, &dc->config_change); + iowrite8(0, &dc->guest_ack); + } + continue; + } + + /* new device */ + dev_info(mdrv->dev, "%s %d Adding new virtio device %p\n", + __func__, __LINE__, d); + if (!remove) + add_mic_device(d, i, mdrv); + } +} + +/* + * hotplug_device tries to find changes in the device page. + */ +static void hotplug_devices(struct work_struct *work) +{ + struct mic_driver *mdrv = container_of(work, + struct mic_driver, hotplug_work); + + scan_devices(mdrv, !REMOVE_DEVICES); +} + +/* + * Interrupt handler for hot plug/config changes etc. + */ +static irqreturn_t +mic_extint_handler(int irq, void *data) +{ + struct mic_driver *mdrv = (struct mic_driver *)data; + + dev_dbg(mdrv->dev, "%s %d hotplug work\n", + __func__, __LINE__); + mic_ack_interrupt(&mdrv->mdev); + schedule_work(&mdrv->hotplug_work); + return IRQ_HANDLED; +} + +/* + * Init function for virtio + */ +int mic_devices_init(struct mic_driver *mdrv) +{ + int rc; + struct mic_bootparam __iomem *bootparam; + int config_db; + + INIT_WORK(&mdrv->hotplug_work, hotplug_devices); + scan_devices(mdrv, !REMOVE_DEVICES); + + config_db = mic_next_card_db(); + virtio_config_cookie = mic_request_card_irq(mic_extint_handler, + "virtio_config_intr", mdrv, config_db); + if (IS_ERR(virtio_config_cookie)) { + rc = PTR_ERR(virtio_config_cookie); + goto exit; + } + + bootparam = mdrv->dp; + iowrite8(config_db, &bootparam->h2c_config_db); + return 0; +exit: + return rc; +} + +/* + * Uninit function for virtio + */ +void mic_devices_uninit(struct mic_driver *mdrv) +{ + struct mic_bootparam __iomem *bootparam = mdrv->dp; + 
iowrite8(-1, &bootparam->h2c_config_db); + mic_free_card_irq(virtio_config_cookie, mdrv); + flush_work(&mdrv->hotplug_work); + scan_devices(mdrv, REMOVE_DEVICES); +} diff --git a/drivers/misc/mic/card/mic_virtio.h b/drivers/misc/mic/card/mic_virtio.h new file mode 100644 index 0000000..6e5e15b --- /dev/null +++ b/drivers/misc/mic/card/mic_virtio.h @@ -0,0 +1,79 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Disclaimer: The codes contained in these modules may be specific to + * the Intel Software Development Platform codenamed: Knights Ferry, and + * the Intel product codenamed: Knights Corner, and are not backward + * compatible with other Intel products. Additionally, Intel will NOT + * support the codes or instruction set in future products. + * + * Intel MIC Card driver. 
+ * + */ +#ifndef __MIC_CARD_VIRTIO_H +#define __MIC_CARD_VIRTIO_H + +/* + * 64 bit I/O access + */ +#ifndef ioread64 +#define ioread64 readq +#endif +#ifndef iowrite64 +#define iowrite64 writeq +#endif + +static inline unsigned mic_desc_size(struct mic_device_desc __iomem *desc) +{ + return mic_aligned_size(*desc) + + ioread8(&desc->num_vq) * mic_aligned_size(struct mic_vqconfig) + + ioread8(&desc->feature_len) * 2 + + ioread8(&desc->config_len); +} + +static inline struct mic_vqconfig __iomem * +mic_vq_config(struct mic_device_desc __iomem *desc) +{ + return (struct mic_vqconfig __iomem *)(desc + 1); +} + +static inline __u8 __iomem * +mic_vq_features(struct mic_device_desc __iomem *desc) +{ + return (__u8 __iomem *)(mic_vq_config(desc) + ioread8(&desc->num_vq)); +} + +static inline __u8 __iomem * +mic_vq_configspace(struct mic_device_desc __iomem *desc) +{ + return mic_vq_features(desc) + ioread8(&desc->feature_len) * 2; +} +static inline unsigned mic_total_desc_size(struct mic_device_desc __iomem *desc) +{ + return mic_aligned_desc_size(desc) + + mic_aligned_size(struct mic_device_ctrl); +} + +int mic_devices_init(struct mic_driver *mdrv); +void mic_devices_uninit(struct mic_driver *mdrv); + +#endif -- 1.8.2.1
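The two-part feature bitmap used by mic_get_features() and mic_finalize_features() can be illustrated with ordinary memory in place of the ioread8()/iowrite8() accessors. The following is a minimal sketch under stated assumptions, not the driver code: the demo_* names are hypothetical, the bitmap is a plain byte array, and the feature word is capped at 32 bits for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bitmap bits that fit in a 32-bit feature word. */
static unsigned int demo_feature_bits(uint8_t feature_len)
{
	size_t nbytes = feature_len < sizeof(uint32_t) ?
			feature_len : sizeof(uint32_t);

	return (unsigned int)(nbytes * 8);
}

/*
 * Collect the features offered by the host. They live in the first
 * feature_len bytes of the bitmap, least significant bit first.
 */
static uint32_t demo_get_features(const uint8_t *bitmap, uint8_t feature_len)
{
	unsigned int i, bits = demo_feature_bits(feature_len);
	uint32_t features = 0;

	assert(bitmap != NULL);
	for (i = 0; i < bits; i++)
		if (bitmap[i / 8] & (1u << (i % 8)))
			features |= UINT32_C(1) << i;

	return features;
}

/*
 * Publish the features accepted by the guest. They go into the second
 * feature_len bytes of the bitmap; the first half is left untouched.
 */
static void demo_finalize_features(uint8_t *bitmap, uint8_t feature_len,
				   uint32_t accepted)
{
	unsigned int i, bits = demo_feature_bits(feature_len);
	uint8_t *out = bitmap + feature_len;

	assert(bitmap != NULL);
	memset(out, 0, feature_len);
	for (i = 0; i < bits; i++)
		if (accepted & (UINT32_C(1) << i))
			out[i / 8] |= (uint8_t)(1u << (i % 8));
}
```

Keeping the offered and accepted bits in separate halves lets the host read back what the guest acknowledged without losing track of what it originally offered.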
Sudeep Dutt
2013-Jul-25 03:31 UTC
[PATCH 5/5] Sample Implementation of Intel MIC User Space Daemon.
From: Caz Yokoyama <Caz.Yokoyama at intel.com> This patch introduces a sample user space daemon which implements the virtio device backends on the host. The daemon creates/removes/configures virtio device backends by communicating with the Intel MIC Host Driver. The virtio devices currently supported are virtio net, virtio console and virtio block. The daemon also monitors card shutdown status and takes appropriate actions like killing the virtio backends and resetting the card upon card shutdown and crashes. Co-author: Ashutosh Dixit <ashutosh.dixit at intel.com> Co-author: Sudeep Dutt <sudeep.dutt at intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com> Signed-off-by: Caz Yokoyama <Caz.Yokoyama at intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com> Signed-off-by: Nikhil Rao <nikhil.rao at intel.com> Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche at intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt at intel.com> Acked-by: Yaozu (Eddie) Dong <eddie.dong at intel.com> --- Documentation/mic/mic_overview.txt | 48 + Documentation/mic/mpssd/.gitignore | 1 + Documentation/mic/mpssd/Makefile | 20 + Documentation/mic/mpssd/micctrl | 157 ++++ Documentation/mic/mpssd/mpss | 246 +++++ Documentation/mic/mpssd/mpssd.c | 1732 ++++++++++++++++++++++++++++++++++++ Documentation/mic/mpssd/mpssd.h | 105 +++ Documentation/mic/mpssd/sysfs.c | 108 +++ 8 files changed, 2417 insertions(+) create mode 100644 Documentation/mic/mic_overview.txt create mode 100644 Documentation/mic/mpssd/.gitignore create mode 100644 Documentation/mic/mpssd/Makefile create mode 100755 Documentation/mic/mpssd/micctrl create mode 100755 Documentation/mic/mpssd/mpss create mode 100644 Documentation/mic/mpssd/mpssd.c create mode 100644 Documentation/mic/mpssd/mpssd.h create mode 100644 Documentation/mic/mpssd/sysfs.c diff --git a/Documentation/mic/mic_overview.txt b/Documentation/mic/mic_overview.txt new file mode 100644 index 
0000000..8b1a916 --- /dev/null +++ b/Documentation/mic/mic_overview.txt @@ -0,0 +1,48 @@ +An Intel MIC X100 device is a PCIe form factor add-in coprocessor +card based on the Intel Many Integrated Core (MIC) architecture +that runs a Linux OS. It is a PCIe endpoint in a platform and therefore +implements the three required standard address spaces i.e. configuration, +memory and I/O. The host OS loads a device driver as is typical for +PCIe devices. The card itself runs a bootstrap after reset that +transfers control to the card OS downloaded from the host driver. +The card OS as shipped by Intel is a Linux kernel with modifications +for the X100 devices. + +Since it is a PCIe card, it does not have the ability to host hardware +devices for networking, storage and console. We provide these devices +on X100 coprocessors thus enabling a self-bootable equivalent environment +for applications. A key benefit of our solution is that it leverages +the standard virtio framework for network, disk and console devices, +though in our case the virtio framework is used across a PCIe bus. + +Here is a block diagram of the various components described above. The +virtio backends are situated on the host rather than the card given better +single threaded performance for the host compared to MIC and the ability of +the host to initiate DMA's to/from the card using the MIC DMA engine. 
+ + | + +----------+ | +----------+ + | Card OS | | | Host OS | + +----------+ | +----------+ + | ++-------+ +--------+ +------+ | +---------+ +--------+ +--------+ +| Virtio| |Virtio | |Virtio| | |Virtio | |Virtio | |Virtio | +| Net | |Console | |Block | | |Net | |Console | |Block | +| Driver| |Driver | |Driver| | |backend | |backend | |backend | ++-------+ +--------+ +------+ | +---------+ +--------+ +--------+ + | | | | | | | + | | | |Ring 3| | | + | | | |------|------------|---------|------- + +-------------------+ |Ring 0+--------------------------+ + | | | Virtio over PCIe IOCTLs | + | | +--------------------------+ + +--------------+ | | + |Intel MIC | | +---------------+ + |Card Driver | | |Intel MIC | + +--------------+ | |Host Driver | + | | +---------------+ + | | | + +-------------------------------------------------------------+ + | | + | PCIe Bus | + +-------------------------------------------------------------+ diff --git a/Documentation/mic/mpssd/.gitignore b/Documentation/mic/mpssd/.gitignore new file mode 100644 index 0000000..8b7c72f --- /dev/null +++ b/Documentation/mic/mpssd/.gitignore @@ -0,0 +1 @@ +mpssd diff --git a/Documentation/mic/mpssd/Makefile b/Documentation/mic/mpssd/Makefile new file mode 100644 index 0000000..63e9edd --- /dev/null +++ b/Documentation/mic/mpssd/Makefile @@ -0,0 +1,20 @@ +# +# Makefile - Intel MIC User Space Tools. +# Copyright(c) 2013, Intel Corporation. +# +ifdef DEBUG +CFLAGS += $(USERWARNFLAGS) -I. -g -Wall -DDEBUG=$(DEBUG) +else +CFLAGS += $(USERWARNFLAGS) -I. 
-g -Wall +endif + +mpssd: mpssd.o sysfs.o + $(CC) $(CFLAGS) -o $@ $^ -lpthread + +install: + install mpss /etc/rc.d/init.d/mpss + install mpssd /usr/sbin/mpssd + install micctrl /usr/sbin/micctrl + +clean: + rm -f mpssd *.o diff --git a/Documentation/mic/mpssd/micctrl b/Documentation/mic/mpssd/micctrl new file mode 100755 index 0000000..8ecf0cd --- /dev/null +++ b/Documentation/mic/mpssd/micctrl @@ -0,0 +1,157 @@ +#!/bin/bash +# Intel MIC Platform Software Stack (MPSS) +# +# Copyright(c) 2013 Intel Corporation. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License, version 2, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 +# USA. +# +# The full GNU General Public License is included in this distribution in +# the file called "COPYING". +# +# Intel MIC User Space Tools. +# +# micctrl - Controls MIC boot/start/stop. +# +# chkconfig: 2345 95 05 +# description: start MPSS stack processing. +# +### BEGIN INIT INFO +# Provides: micctrl +### END INIT INFO + +# Source function library. +. 
/etc/init.d/functions + +sysfs="/sys/class/mic" + +status() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + f=$sysfs/$1 + echo -e $1 state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`" + return 0 + fi + + if [ -d "$sysfs" ]; then + for f in $sysfs/* + do + echo -e ""`basename $f`" state: "`cat $f/state`" shutdown_status: "`cat $f/shutdown_status`"" + done + fi + + return 0 +} + +reset() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + f=$sysfs/$1 + echo reset > $f/state + return 0 + fi + + if [ -d "$sysfs" ]; then + for f in $sysfs/* + do + echo reset > $f/state + done + fi + + return 0 +} + +boot() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + f=$sysfs/$1 + echo "boot:linux:mic/uos.img:mic/$1.image" > $f/state + return 0 + fi + + if [ -d "$sysfs" ]; then + for f in $sysfs/* + do + echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state + done + fi + + return 0 +} + +shutdown() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + f=$sysfs/$1 + echo shutdown > $f/state + return 0 + fi + + if [ -d "$sysfs" ]; then + for f in $sysfs/* + do + echo shutdown > $f/state + done + fi + + return 0 +} + +wait() +{ + if [ "`echo $1 | head -c3`" == "mic" ]; then + f=$sysfs/$1 + while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ] + do + sleep 1 + echo -e "Waiting for $1 to go offline" + done + return 0 + fi + + if [ -d "$sysfs" ]; then + # Wait for the cards to go offline + for f in $sysfs/* + do + while [ "`cat $f/state`" != "offline" -a "`cat $f/state`" != "online" ] + do + sleep 1 + echo -e "Waiting for "`basename $f`" to go offline" + done + done + fi +} + +case $1 in + -s) + status $2 + ;; + -r) + reset $2 + ;; + -b) + boot $2 + ;; + -S) + shutdown $2 + ;; + -w) + wait $2 + ;; + *) + echo $"Usage: $0 {-s (status) |-r (reset) |-b (boot) |-S (shutdown) |-w (wait)}" + exit 2 +esac + +exit $? 
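For callers that prefer C over the shell, the boot request micctrl issues can be sketched as below. This is an illustration only, not part of the patch: demo_boot_cmd and demo_write_state are hypothetical names, the "mic" prefix check copies the script's `head -c3` test, and the reading of the two trailing fields as a card OS image and a per-card image is an assumption drawn from the script alone.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the string micctrl echoes into <sysfs>/<card>/state to boot a card.
 * Returns 0 on success, -1 if the card name does not start with "mic"
 * or the buffer is too small.
 */
static int demo_boot_cmd(char *buf, size_t len, const char *card)
{
	int n;

	if (strncmp(card, "mic", 3) != 0)
		return -1;
	n = snprintf(buf, len, "boot:linux:mic/uos.img:mic/%s.image", card);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a command such as "reset", "shutdown" or a boot string to the
 * card's state file. A newline is appended to mirror what echo does.
 * Returns 0 on success, -1 on any failure.
 */
static int demo_write_state(const char *sysfs, const char *card,
			    const char *cmd)
{
	char path[256];
	FILE *fp;
	int n, ret = 0;

	n = snprintf(path, sizeof(path), "%s/%s/state", sysfs, card);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	fp = fopen(path, "w");
	if (!fp)
		return -1;
	if (fprintf(fp, "%s\n", cmd) < 0)
		ret = -1;
	if (fclose(fp) == EOF)
		ret = -1;
	return ret;
}
```

A caller would build the command with demo_boot_cmd(buf, sizeof(buf), "mic0") and pass it to demo_write_state("/sys/class/mic", "mic0", buf), then poll the same state file until it reads "online" or "offline", as the script's wait function does.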
diff --git a/Documentation/mic/mpssd/mpss b/Documentation/mic/mpssd/mpss new file mode 100755 index 0000000..81993b9 --- /dev/null +++ b/Documentation/mic/mpssd/mpss @@ -0,0 +1,246 @@ +#!/bin/bash +# Intel MIC Platform Software Stack (MPSS) +# +# Copyright(c) 2013 Intel Corporation. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License, version 2, as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 +# USA. +# +# The full GNU General Public License is included in this distribution in +# the file called "COPYING". +# +# Intel MIC User Space Tools. +# +# mpss Start mpssd. +# +# chkconfig: 2345 95 05 +# description: start MPSS stack processing. +# +### BEGIN INIT INFO +# Provides: mpss +# Required-Start: +# Required-Stop: +# Short-Description: MPSS stack control +# Description: MPSS stack control +### END INIT INFO + +# Source function library. +. /etc/init.d/functions + +exec=/usr/sbin/mpssd +sysfs="/sys/class/mic" + +start() +{ + [ -x $exec ] || exit 5 + + echo -e $"Starting MPSS Stack" + + echo -e $"Loading MIC_HOST Module" + + # Ensure the driver is loaded + [ -d "$sysfs" ] || modprobe mic_host + + if [ "`ps -e | awk '{print $4}' | grep mpssd | head -1`" = "mpssd" ]; then + echo -e $"MPSSD already running! " + success + echo + return 0; + fi + + # Start the daemon + echo -n $"Starting MPSSD" + $exec & + RETVAL=$? 
+ if [ $RETVAL -ne 0 ]; then + failure + else + success + fi + echo + + sleep 5 + + # Boot the cards + if [ $RETVAL -eq 0 ]; then + for f in $sysfs/* + do + echo -ne "Booting "`basename $f`" " + echo "boot:linux:mic/uos.img:mic/`basename $f`.image" > $f/state + RETVAL=$? + if [ $RETVAL -ne 0 ]; then + failure + else + success + fi + echo + done + fi + + # Wait till ping works + if [ $RETVAL -eq 0 ]; then + for f in $sysfs/* + do + count=100 + while [ $count -ge 0 ] + do + echo -e "Pinging "`basename $f`" " + ping -c 1 "`cat $f/ipaddr`" &>/dev/null + RETVAL=$? + if [ $RETVAL -eq 0 ]; then + success + break + fi + sleep 1 + count=`expr $count - 1` + done + if [ $RETVAL -ne 0 ]; then + failure + else + success + fi + echo + done + fi + return $RETVAL +} + +stop() +{ + echo -e $"Shutting down MPSS Stack: " + + # Bail out if module is unloaded + if [ ! -d "$sysfs" ]; then + echo -n $"Module unloaded " + killall -9 mpssd 2>/dev/null + success + echo + return 0 + fi + + # Shut down the cards + for f in $sysfs/* + do + echo -e "Shutting down `basename $f` " + echo "shutdown" > $f/state 2>/dev/null + done + + # Wait for the cards to go offline + for f in $sysfs/* + do + while [ "`cat $f/state`" != "offline" ] + do + sleep 1 + echo -e "Waiting for "`basename $f`" to go offline" + done + done + + # Display the status of the cards + for f in $sysfs/* + do + echo -e ""`basename $f`" state: "`cat $f/state`"" + done + + sleep 5 + + # Kill MPSSD now + echo -n $"Killing MPSSD" + killall -9 mpssd 2>/dev/null + RETVAL=$? + if [ $RETVAL -ne 0 ]; then + failure + else + success + fi + echo + return $RETVAL +} + +restart() +{ + stop + sleep 5 + start +} + +status() +{ + if [ -d "$sysfs" ]; then + for f in $sysfs/* + do + echo -e ""`basename $f`" state: "`cat $f/state`"" + done + fi + + if [ "`ps -e | awk '{print $4}' | grep mpssd | head -n 1`" = "mpssd" ]; then + echo "mpssd is running" + else + echo "mpssd is stopped" + fi + return 0 +} + +unload() +{ + if [ ! 
-d "$sysfs" ]; then + echo -n $"No MIC_HOST Module: " + killall -9 mpssd 2>/dev/null + success + echo + return + fi + + stop + RETVAL=$? + + sleep 5 + echo -n $"Removing MIC_HOST Module: " + + if [ $RETVAL = 0 ]; then + sleep 1 + modprobe -r mic_host + RETVAL=$? + fi + + if [ $RETVAL -ne 0 ]; then + failure + else + success + fi + echo + return $RETVAL +} + +case $1 in + start) + start + ;; + stop) + stop + ;; + restart) + restart + ;; + status) + status + ;; + unload) + unload + ;; + *) + echo $"Usage: $0 {start|stop|restart|status|unload}" + exit 2 +esac + +exit $? diff --git a/Documentation/mic/mpssd/mpssd.c b/Documentation/mic/mpssd/mpssd.c new file mode 100644 index 0000000..fc912fe --- /dev/null +++ b/Documentation/mic/mpssd/mpssd.c @@ -0,0 +1,1732 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC User Space Tools. 
+ */ + +#define _GNU_SOURCE + +#include <stdlib.h> +#include <fcntl.h> +#include <getopt.h> +#include <assert.h> +#include <unistd.h> +#include <stdbool.h> +#include <signal.h> +#include <poll.h> +#include <features.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <sys/mman.h> +#include <sys/socket.h> +#include <linux/virtio_ring.h> +#include <linux/virtio_net.h> +#include <linux/virtio_console.h> +#include <linux/virtio_blk.h> +#include <linux/version.h> +#include "mpssd.h" +#include <linux/mic_ioctl.h> +#include <linux/mic_common.h> + +static void init_mic(struct mic_info *mic); + +static FILE *logfp; +static struct mic_info mic_list; + +#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) + +#define min_t(type, x, y) ({ \ + type __min1 = (x); \ + type __min2 = (y); \ + __min1 < __min2 ? __min1 : __min2; }) + +/* align addr on a size boundary - adjust address up/down if needed */ +#define _ALIGN_UP(addr, size) (((addr)+((size)-1))&(~((size)-1))) +#define _ALIGN_DOWN(addr, size) ((addr)&(~((size)-1))) + +/* align addr on a size boundary - adjust address up if needed */ +#define _ALIGN(addr, size) _ALIGN_UP(addr, size) + +/* to align the pointer to the (next) page boundary */ +#define PAGE_ALIGN(addr) _ALIGN(addr, PAGE_SIZE) + +#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) + +/* Insert REP NOP (PAUSE) in busy-wait loops. 
*/ +static inline void cpu_relax(void) +{ + asm volatile("rep; nop" : : : "memory"); +} + +#define GSO_ENABLED 1 +#define MAX_GSO_SIZE (64 * 1024) +#define ETH_H_LEN 14 +#define MAX_NET_PKT_SIZE (_ALIGN_UP(MAX_GSO_SIZE + ETH_H_LEN, 64)) +#define MIC_DEVICE_PAGE_END 0x1000 + +#ifndef VIRTIO_NET_HDR_F_DATA_VALID +#define VIRTIO_NET_HDR_F_DATA_VALID 2 /* Csum is valid */ +#endif + +static struct { + struct mic_device_desc dd; + struct mic_vqconfig vqconfig[2]; + __u32 host_features, guest_acknowledgements; + struct virtio_console_config cons_config; +} virtcons_dev_page = { + .dd = { + .type = VIRTIO_ID_CONSOLE, + .num_vq = ARRAY_SIZE(virtcons_dev_page.vqconfig), + .feature_len = sizeof(virtcons_dev_page.host_features), + .config_len = sizeof(virtcons_dev_page.cons_config), + }, + .vqconfig[0] = { + .num = htole16(MIC_VRING_ENTRIES), + }, + .vqconfig[1] = { + .num = htole16(MIC_VRING_ENTRIES), + }, +}; + +static struct { + struct mic_device_desc dd; + struct mic_vqconfig vqconfig[2]; + __u32 host_features, guest_acknowledgements; + struct virtio_net_config net_config; +} virtnet_dev_page = { + .dd = { + .type = VIRTIO_ID_NET, + .num_vq = ARRAY_SIZE(virtnet_dev_page.vqconfig), + .feature_len = sizeof(virtnet_dev_page.host_features), + .config_len = sizeof(virtnet_dev_page.net_config), + }, + .vqconfig[0] = { + .num = htole16(MIC_VRING_ENTRIES), + }, + .vqconfig[1] = { + .num = htole16(MIC_VRING_ENTRIES), + }, +#if GSO_ENABLED + .host_features = htole32( + 1 << VIRTIO_NET_F_CSUM | + 1 << VIRTIO_NET_F_GSO | + 1 << VIRTIO_NET_F_GUEST_TSO4 | + 1 << VIRTIO_NET_F_GUEST_TSO6 | + 1 << VIRTIO_NET_F_GUEST_ECN | + 1 << VIRTIO_NET_F_GUEST_UFO), +#else + .host_features = 0, +#endif +}; + +static const char *mic_config_dir = "/etc/sysconfig/mic"; +static const char *virtblk_backend = "VIRTBLK_BACKEND"; +static struct { + struct mic_device_desc dd; + struct mic_vqconfig vqconfig[1]; + __u32 host_features, guest_acknowledgements; + struct virtio_blk_config blk_config; +} 
virtblk_dev_page = { + .dd = { + .type = VIRTIO_ID_BLOCK, + .num_vq = ARRAY_SIZE(virtblk_dev_page.vqconfig), + .feature_len = sizeof(virtblk_dev_page.host_features), + .config_len = sizeof(virtblk_dev_page.blk_config), + }, + .vqconfig[0] = { + .num = htole16(MIC_VRING_ENTRIES), + }, + .host_features = + htole32(1<<VIRTIO_BLK_F_SEG_MAX), + .blk_config = { + .seg_max = htole32(MIC_VRING_ENTRIES - 2), + .capacity = htole64(0), + } +}; + +static char *myname; + +static int +tap_configure(struct mic_info *mic, char *dev) +{ + pid_t pid; + char *ifargv[7]; + char ipaddr[IFNAMSIZ]; + int ret = 0; + + pid = fork(); + if (pid == 0) { + ifargv[0] = "ip"; + ifargv[1] = "link"; + ifargv[2] = "set"; + ifargv[3] = dev; + ifargv[4] = "up"; + ifargv[5] = NULL; + mpsslog("Configuring %s\n", dev); + ret = execvp("ip", ifargv); + if (ret < 0) { + mpsslog("%s execvp failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + } + if (pid < 0) { + mpsslog("%s fork failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + + ret = waitpid(pid, NULL, 0); + if (ret < 0) { + mpsslog("%s waitpid failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + + snprintf(ipaddr, IFNAMSIZ, "172.31.%d.254/24", mic->id); + + pid = fork(); + if (pid == 0) { + ifargv[0] = "ip"; + ifargv[1] = "addr"; + ifargv[2] = "add"; + ifargv[3] = ipaddr; + ifargv[4] = "dev"; + ifargv[5] = dev; + ifargv[6] = NULL; + mpsslog("Configuring %s ipaddr %s\n", dev, ipaddr); + ret = execvp("ip", ifargv); + if (ret < 0) { + mpsslog("%s execvp failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + } + if (pid < 0) { + mpsslog("%s fork failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + + ret = waitpid(pid, NULL, 0); + if (ret < 0) { + mpsslog("%s waitpid failed errno %s\n", + mic->name, strerror(errno)); + return ret; + } + mpsslog("MIC name %s %s %d DONE!\n", + mic->name, __func__, __LINE__); + return 0; +} + +static int tun_alloc(struct mic_info *mic, char
*dev) +{ + struct ifreq ifr; + int fd, err; +#if GSO_ENABLED + unsigned offload; +#endif + fd = open("/dev/net/tun", O_RDWR); + if (fd < 0) { + mpsslog("Could not open /dev/net/tun %s\n", strerror(errno)); + goto done; + } + + memset(&ifr, 0, sizeof(ifr)); + + ifr.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_VNET_HDR; + if (*dev) + strncpy(ifr.ifr_name, dev, IFNAMSIZ); + + err = ioctl(fd, TUNSETIFF, (void *) &ifr); + if (err < 0) { + mpsslog("%s %s %d TUNSETIFF failed %s\n", + mic->name, __func__, __LINE__, strerror(errno)); + close(fd); + return err; + } +#if GSO_ENABLED + offload = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | + TUN_F_TSO_ECN | TUN_F_UFO; + + err = ioctl(fd, TUNSETOFFLOAD, offload); + if (err < 0) { + mpsslog("%s %s %d TUNSETOFFLOAD failed %s\n", + mic->name, __func__, __LINE__, strerror(errno)); + close(fd); + return err; + } +#endif + strcpy(dev, ifr.ifr_name); + mpsslog("Created TAP %s\n", dev); +done: + return fd; +} + +#define NET_FD_VIRTIO_NET 0 +#define NET_FD_TUN 1 +#define MAX_NET_FD 2 + +#define USE_MIC_VIRTIO_COPY_CHAIN 0 + +static void * * +get_dp(struct mic_info *mic, int type) +{ + switch (type) { + case VIRTIO_ID_CONSOLE: + return &mic->mic_console.console_dp; + case VIRTIO_ID_NET: + return &mic->mic_net.net_dp; + case VIRTIO_ID_BLOCK: + return &mic->mic_virtblk.block_dp; + } + mpsslog("%s %s %d not found\n", mic->name, __func__, type); + assert(0); + return NULL; +} + +static struct mic_device_desc *get_device_desc(struct mic_info *mic, int type) +{ + struct mic_device_desc *d; + int i; + void *dp = *get_dp(mic, type); + + for (i = mic_aligned_size(struct mic_bootparam); i < PAGE_SIZE; + i += mic_total_desc_size(d)) { + d = dp + i; + + /* End of list */ + if (d->type == 0) + break; + + if (d->type == -1) + continue; + + mpsslog("%s %s d-> type %d d %p\n", + mic->name, __func__, d->type, d); + + if (d->type == (__u8)type) + return d; + } + mpsslog("%s %s %d not found\n", mic->name, __func__, type); + assert(0); + return NULL; +} + +/* See 
comments in vhost.c for explanation of next_desc() */ +static unsigned next_desc(struct vring_desc *desc) +{ + unsigned int next; + + if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) + return -1U; + next = le16toh(desc->next); + return next; +} + +/* Sum up all the IOVEC length */ +static ssize_t +sum_iovec_len(struct mic_copy *copy) +{ + ssize_t sum = 0; + int i; + + for (i = 0; i < copy->iovcnt; i++) + sum += copy->iov[i].iov_len; + return sum; +} + +static inline void verify_out_len(struct mic_info *mic, + struct mic_copy *copy) +{ + if (copy->out_len != sum_iovec_len(copy)) { + mpsslog("%s %s %d BUG copy->out_len 0x%x len 0x%x\n", + mic->name, __func__, __LINE__, + copy->out_len, sum_iovec_len(copy)); + assert(copy->out_len == sum_iovec_len(copy)); + } +} + +/* Display an iovec */ +static void +disp_iovec(struct mic_info *mic, struct mic_copy *copy, + const char *s, int line) +{ + int i; + + for (i = 0; i < copy->iovcnt; i++) + mpsslog("%s %s %d copy->iov[%d] addr %p len 0x%lx\n", + mic->name, s, line, i, + copy->iov[i].iov_base, copy->iov[i].iov_len); +} + +static inline __u16 read_avail_idx(struct mic_vring *vr) +{ + return ACCESS_ONCE(vr->info->avail_idx); +} + +static inline void txrx_prepare(int type, bool tx, struct mic_vring *vr, + struct mic_copy_desc *chain, ssize_t len) +{ + struct mic_copy *copy = &chain->copy; + __u16 avail_idx; + + copy->vr_idx = tx ? 
0 : 1; + avail_idx = read_avail_idx(vr) & (vr->vr.num - 1); + chain->used_desc_idx = copy->desc_idx = + le16toh(vr->vr.avail->ring[avail_idx]); + if (type == VIRTIO_ID_NET) + copy->iov[1].iov_len = len - sizeof(struct virtio_net_hdr); + else + copy->iov[0].iov_len = len; + chain->used_len = len; +} + +/* Central API which triggers the copies */ +static int +mic_virtio_copy(struct mic_info *mic, int fd, + struct mic_vring *vr, struct mic_copy_desc *chain) +{ + int ret; + +#if USE_MIC_VIRTIO_COPY_CHAIN + /* Copy an entire chain at once */ + ret = ioctl(fd, MIC_VIRTIO_COPY_CHAIN, &chain->copy); +#else + ret = ioctl(fd, MIC_VIRTIO_COPY_DESC, chain); + if (ret) { + mpsslog("%s %s %d errno %s ret %d\n", + mic->name, __func__, __LINE__, + strerror(errno), ret); + } +#endif + return ret; +} + +/* + * This initialization routine requires at least one + * vring i.e. vr0. vr1 is optional. + */ +static void * +init_vr(struct mic_info *mic, int fd, int type, + struct mic_vring *vr0, struct mic_vring *vr1, int num_vq) +{ + int vr_size; + char *va; + + vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES, + MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info)); + va = mmap(NULL, MIC_DEVICE_PAGE_END + vr_size * num_vq, + PROT_READ, MAP_SHARED, fd, 0); + if (MAP_FAILED == va) { + mpsslog("%s %s %d mmap failed errno %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + goto done; + } + *get_dp(mic, type) = (void *)va; + vr0->va = (struct mic_vring *)&va[MIC_DEVICE_PAGE_END]; + vr0->info = vr0->va + + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN); + vring_init(&vr0->vr, + MIC_VRING_ENTRIES, vr0->va, MIC_VIRTIO_RING_ALIGN); + mpsslog("%s %s vr0 %p vr0->info %p vr_size 0x%x vring 0x%x ", + __func__, mic->name, vr0->va, vr0->info, vr_size, + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN)); + mpsslog("magic 0x%x expected 0x%x\n", + vr0->info->magic, MIC_MAGIC + type + 0); + assert(vr0->info->magic == MIC_MAGIC + type + 0); + if (vr1) { + vr1->va = (struct mic_vring *)
+ &va[MIC_DEVICE_PAGE_END + vr_size]; + vr1->info = vr1->va + vring_size(MIC_VRING_ENTRIES, + MIC_VIRTIO_RING_ALIGN); + vring_init(&vr1->vr, + MIC_VRING_ENTRIES, vr1->va, MIC_VIRTIO_RING_ALIGN); + mpsslog("%s %s vr1 %p vr1->info %p vr_size 0x%x vring 0x%x ", + __func__, mic->name, vr1->va, vr1->info, vr_size, + vring_size(MIC_VRING_ENTRIES, MIC_VIRTIO_RING_ALIGN)); + mpsslog("magic 0x%x expected 0x%x\n", + vr1->info->magic, MIC_MAGIC + type + 1); + assert(vr1->info->magic == MIC_MAGIC + type + 1); + } +done: + return va; +} + +static void +uninit_vr(struct mic_info *mic, int num_vq) +{ + int vr_size, ret; + + vr_size = PAGE_ALIGN(vring_size(MIC_VRING_ENTRIES, + MIC_VIRTIO_RING_ALIGN) + sizeof(struct _mic_vring_info)); + ret = munmap(mic->mic_virtblk.block_dp, + MIC_DEVICE_PAGE_END + vr_size * num_vq); + if (ret < 0) + mpsslog("%s munmap errno %d\n", mic->name, errno); +} + +static void +wait_for_card_driver(struct mic_info *mic, int fd, int type) +{ + struct pollfd pollfd; + int err; + struct mic_device_desc *desc = get_device_desc(mic, type); + + pollfd.fd = fd; + mpsslog("%s %s Waiting .... desc-> type %d status 0x%x\n", + mic->name, __func__, type, desc->status); + while (1) { + pollfd.events = POLLIN; + pollfd.revents = 0; + err = poll(&pollfd, 1, -1); + if (err < 0) { + mpsslog("%s %s poll failed %s\n", + mic->name, __func__, strerror(errno)); + continue; + } + + if (pollfd.revents) { + mpsslog("%s %s Waiting... 
desc-> type %d status 0x%x\n", + mic->name, __func__, type, desc->status); + if (desc->status & VIRTIO_CONFIG_S_DRIVER_OK) { + mpsslog("%s %s poll.revents %d\n", + mic->name, __func__, pollfd.revents); + mpsslog("%s %s desc-> type %d status 0x%x\n", + mic->name, __func__, type, + desc->status); + break; + } + } + } +} + +/* Spin till we have some descriptors */ +static void +wait_for_descriptors(struct mic_info *mic, struct mic_vring *vr) +{ + __u16 avail_idx = read_avail_idx(vr); + + while (avail_idx == le16toh(ACCESS_ONCE(vr->vr.avail->idx))) { +#ifdef DEBUG + mpsslog("%s %s waiting for desc avail %d info_avail %d\n", + mic->name, __func__, + le16toh(vr->vr.avail->idx), vr->info->avail_idx); +#endif + cpu_relax(); + } +} + +static void * +virtio_net(void *arg) +{ + static __u8 vnet_hdr[2][sizeof(struct virtio_net_hdr)]; + static __u8 vnet_buf[2][MAX_NET_PKT_SIZE] __aligned(64); + struct iovec vnet_iov[2][2] = { + { { .iov_base = vnet_hdr[0], .iov_len = sizeof(vnet_hdr[0]) }, + { .iov_base = vnet_buf[0], .iov_len = sizeof(vnet_buf[0]) } }, + { { .iov_base = vnet_hdr[1], .iov_len = sizeof(vnet_hdr[1]) }, + { .iov_base = vnet_buf[1], .iov_len = sizeof(vnet_buf[1]) } }, + }; + struct iovec *iov0 = vnet_iov[0], *iov1 = vnet_iov[1]; + struct mic_info *mic = (struct mic_info *)arg; + char if_name[IFNAMSIZ]; + struct pollfd net_poll[MAX_NET_FD]; + struct mic_vring tx_vr, rx_vr; + struct mic_copy_desc chain; + struct mic_copy *copy = &chain.copy; + struct mic_device_desc *desc; + int err; + + snprintf(if_name, IFNAMSIZ, "mic%d", mic->id); + mic->mic_net.tap_fd = tun_alloc(mic, if_name); + if (mic->mic_net.tap_fd < 0) + goto done; + + if (tap_configure(mic, if_name)) + goto done; + mpsslog("MIC name %s id %d\n", mic->name, mic->id); + + net_poll[NET_FD_VIRTIO_NET].fd = mic->mic_net.virtio_net_fd; + net_poll[NET_FD_VIRTIO_NET].events = POLLIN; + net_poll[NET_FD_TUN].fd = mic->mic_net.tap_fd; + net_poll[NET_FD_TUN].events = POLLIN; + + if (MAP_FAILED == init_vr(mic, 
mic->mic_net.virtio_net_fd, + VIRTIO_ID_NET, &tx_vr, &rx_vr, + virtnet_dev_page.dd.num_vq)) { + mpsslog("%s init_vr failed %s\n", + mic->name, strerror(errno)); + goto done; + } + + copy->iovcnt = 2; + desc = get_device_desc(mic, VIRTIO_ID_NET); + + while (1) { + ssize_t len; + + net_poll[NET_FD_VIRTIO_NET].revents = 0; + net_poll[NET_FD_TUN].revents = 0; + + /* Start polling for data from tap and virtio net */ + err = poll(net_poll, 2, -1); + if (err < 0) { + mpsslog("%s poll failed %s\n", + __func__, strerror(errno)); + continue; + } + if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK)) + wait_for_card_driver(mic, mic->mic_net.virtio_net_fd, + VIRTIO_ID_NET); + /* + * Check if there is data to be read from TUN and write to + * virtio net fd if there is. + */ + if (net_poll[NET_FD_TUN].revents & POLLIN) { + copy->iov = iov0; + len = readv(net_poll[NET_FD_TUN].fd, + copy->iov, copy->iovcnt); + if (len > 0) { + struct virtio_net_hdr *hdr + = (struct virtio_net_hdr *) vnet_hdr[0]; + + /* Disable checksums on the card since we are on + a reliable PCIe link */ + hdr->flags |= VIRTIO_NET_HDR_F_DATA_VALID; +#ifdef DEBUG + mpsslog("%s %s %d hdr->flags 0x%x ", mic->name, + __func__, __LINE__, hdr->flags); + mpsslog("copy->out_len %d hdr->gso_type 0x%x\n", + copy->out_len, hdr->gso_type); +#endif +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d read from tap 0x%lx\n", + mic->name, __func__, __LINE__, + len); +#endif + wait_for_descriptors(mic, &tx_vr); + txrx_prepare(VIRTIO_ID_NET, 1, &tx_vr, &chain, + len); + + err = mic_virtio_copy(mic, + mic->mic_net.virtio_net_fd, &tx_vr, + &chain); + if (err < 0) { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + } + if (!err) + verify_out_len(mic, copy); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d wrote to net 0x%lx\n", + mic->name, __func__, __LINE__, + sum_iovec_len(copy)); +#endif + /* Reinitialize IOV for next run */ + 
iov0[1].iov_len = MAX_NET_PKT_SIZE; + } else if (len < 0) { + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d read failed %s ", mic->name, + __func__, __LINE__, strerror(errno)); + mpsslog("cnt %d sum %d\n", + copy->iovcnt, sum_iovec_len(copy)); + } + } + + /* + * Check if there is data to be read from virtio net and + * write to TUN if there is. + */ + if (net_poll[NET_FD_VIRTIO_NET].revents & POLLIN) { + while (rx_vr.info->avail_idx != + le16toh(rx_vr.vr.avail->idx)) { + copy->iov = iov1; + txrx_prepare(VIRTIO_ID_NET, 0, &rx_vr, &chain, + MAX_NET_PKT_SIZE + + sizeof(struct virtio_net_hdr)); + + err = mic_virtio_copy(mic, + mic->mic_net.virtio_net_fd, &rx_vr, + &chain); + if (!err) { +#ifdef DEBUG + struct virtio_net_hdr *hdr + = (struct virtio_net_hdr *) + vnet_hdr[1]; + + mpsslog("%s %s %d hdr->flags 0x%x, ", + mic->name, __func__, __LINE__, + hdr->flags); + mpsslog("out_len %d gso_type 0x%x\n", + copy->out_len, + hdr->gso_type); +#endif + /* Set the correct output iov_len */ + iov1[1].iov_len = copy->out_len - + sizeof(struct virtio_net_hdr); + verify_out_len(mic, copy); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, + __LINE__); + mpsslog("%s %s %d ", + mic->name, __func__, __LINE__); + mpsslog("read from net 0x%lx\n", + sum_iovec_len(copy)); +#endif + len = writev(net_poll[NET_FD_TUN].fd, + copy->iov, copy->iovcnt); + if (len != sum_iovec_len(copy)) { + mpsslog("Tun write failed %s ", + strerror(errno)); + mpsslog("len 0x%x ", len); + mpsslog("read_len 0x%x\n", + sum_iovec_len(copy)); + } else { +#ifdef DEBUG + disp_iovec(mic, copy, __func__, + __LINE__); + mpsslog("%s %s %d ", + mic->name, __func__, + __LINE__); + mpsslog("wrote to tap 0x%lx\n", + len); +#endif + } + } else { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + break; + } + } + } + if (net_poll[NET_FD_VIRTIO_NET].revents & POLLERR) { + mpsslog("%s: %s: POLLERR\n", __func__, mic->name); + sleep(1); + } + } +done: + pthread_exit(NULL);
+} + +/* virtio_console */ +#define VIRTIO_CONSOLE_FD 0 +#define MONITOR_FD (VIRTIO_CONSOLE_FD + 1) +#define MAX_CONSOLE_FD (MONITOR_FD + 1) /* must be the last one + 1 */ +#define MAX_BUFFER_SIZE PAGE_SIZE + +static void * +virtio_console(void *arg) +{ + static __u8 vcons_buf[2][PAGE_SIZE]; + struct iovec vcons_iov[2] = { + { .iov_base = vcons_buf[0], .iov_len = sizeof(vcons_buf[0]) }, + { .iov_base = vcons_buf[1], .iov_len = sizeof(vcons_buf[1]) }, + }; + struct iovec *iov0 = &vcons_iov[0], *iov1 = &vcons_iov[1]; + struct mic_info *mic = (struct mic_info *)arg; + int err; + struct pollfd console_poll[MAX_CONSOLE_FD]; + int pty_fd; + char *pts_name; + ssize_t len; + struct mic_vring tx_vr, rx_vr; + struct mic_copy_desc chain; + struct mic_copy *copy = &chain.copy; + struct mic_device_desc *desc; + + pty_fd = posix_openpt(O_RDWR); + if (pty_fd < 0) { + mpsslog("can't open a pseudoterminal master device: %s\n", + strerror(errno)); + goto _return; + } + pts_name = ptsname(pty_fd); + if (pts_name == NULL) { + mpsslog("can't get pts name\n"); + goto _close_pty; + } + printf("%s console message goes to %s\n", mic->name, pts_name); + mpsslog("%s console message goes to %s\n", mic->name, pts_name); + err = grantpt(pty_fd); + if (err < 0) { + mpsslog("can't grant access: %s %s\n", + pts_name, strerror(errno)); + goto _close_pty; + } + err = unlockpt(pty_fd); + if (err < 0) { + mpsslog("can't unlock a pseudoterminal: %s %s\n", + pts_name, strerror(errno)); + goto _close_pty; + } + console_poll[MONITOR_FD].fd = pty_fd; + console_poll[MONITOR_FD].events = POLLIN; + + console_poll[VIRTIO_CONSOLE_FD].fd = mic->mic_console.virtio_console_fd; + console_poll[VIRTIO_CONSOLE_FD].events = POLLIN; + + if (MAP_FAILED == init_vr(mic, mic->mic_console.virtio_console_fd, + VIRTIO_ID_CONSOLE, &tx_vr, &rx_vr, + virtcons_dev_page.dd.num_vq)) { + mpsslog("%s init_vr failed %s\n", + mic->name, strerror(errno)); + goto _close_pty; + } + + copy->iovcnt = 1; + desc = get_device_desc(mic, 
VIRTIO_ID_CONSOLE); + + for (;;) { + console_poll[MONITOR_FD].revents = 0; + console_poll[VIRTIO_CONSOLE_FD].revents = 0; + err = poll(console_poll, MAX_CONSOLE_FD, -1); + if (err < 0) { + mpsslog("%s %d: poll failed: %s\n", __func__, __LINE__, + strerror(errno)); + continue; + } + if (!(desc->status & VIRTIO_CONFIG_S_DRIVER_OK)) + wait_for_card_driver(mic, + mic->mic_console.virtio_console_fd, + VIRTIO_ID_CONSOLE); + + if (console_poll[MONITOR_FD].revents & POLLIN) { + copy->iov = iov0; + len = readv(pty_fd, copy->iov, copy->iovcnt); + if (len > 0) { +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d read from tap 0x%lx\n", + mic->name, __func__, __LINE__, + len); +#endif + wait_for_descriptors(mic, &tx_vr); + txrx_prepare(VIRTIO_ID_CONSOLE, 1, &tx_vr, + &chain, len); + + err = mic_virtio_copy(mic, + mic->mic_console.virtio_console_fd, + &tx_vr, &chain); + if (err < 0) { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + } + if (!err) + verify_out_len(mic, copy); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d wrote to net 0x%lx\n", + mic->name, __func__, __LINE__, + sum_iovec_len(copy)); +#endif + /* Reinitialize IOV for next run */ + iov0->iov_len = PAGE_SIZE; + } else if (len < 0) { + disp_iovec(mic, copy, __func__, __LINE__); + mpsslog("%s %s %d read failed %s ", + mic->name, __func__, __LINE__, + strerror(errno)); + mpsslog("cnt %d sum %d\n", + copy->iovcnt, sum_iovec_len(copy)); + } + } + + if (console_poll[VIRTIO_CONSOLE_FD].revents & POLLIN) { + while (rx_vr.info->avail_idx !+ le16toh(rx_vr.vr.avail->idx)) { + copy->iov = iov1; + txrx_prepare(VIRTIO_ID_CONSOLE, 0, &rx_vr, + &chain, PAGE_SIZE); + + err = mic_virtio_copy(mic, + mic->mic_console.virtio_console_fd, + &rx_vr, &chain); + if (!err) { + /* Set the correct output iov_len */ + iov1->iov_len = copy->out_len; + verify_out_len(mic, copy); +#ifdef DEBUG + disp_iovec(mic, copy, __func__, + __LINE__); 
+ mpsslog("%s %s %d ", + mic->name, __func__, __LINE__); + mpsslog("read from net 0x%lx\n", + sum_iovec_len(copy)); +#endif + len = writev(pty_fd, + copy->iov, copy->iovcnt); + if (len != sum_iovec_len(copy)) { + mpsslog("Tun write failed %s ", + strerror(errno)); + mpsslog("len 0x%x ", len); + mpsslog("read_len 0x%x\n", + sum_iovec_len(copy)); + } else { +#ifdef DEBUG + disp_iovec(mic, copy, __func__, + __LINE__); + mpsslog("%s %s %d ", + mic->name, __func__, + __LINE__); + mpsslog("wrote to tap 0x%lx\n", + len); +#endif + } + } else { + mpsslog("%s %s %d mic_virtio_copy %s\n", + mic->name, __func__, __LINE__, + strerror(errno)); + break; + } + } + } + if (console_poll[NET_FD_VIRTIO_NET].revents & POLLERR) { + mpsslog("%s: %s: POLLERR\n", __func__, mic->name); + sleep(1); + } + } +_close_pty: + close(pty_fd); +_return: + pthread_exit(NULL); +} + +static void +add_virtio_device(struct mic_info *mic, struct mic_device_desc *dd) +{ + char path[PATH_MAX]; + int fd, err; + + snprintf(path, PATH_MAX, "/dev/mic%d", mic->id); + fd = open(path, O_RDWR); + if (fd < 0) { + mpsslog("Could not open %s %s\n", path, strerror(errno)); + return; + } + + err = ioctl(fd, MIC_VIRTIO_ADD_DEVICE, dd); + if (err < 0) { + mpsslog("Could not add %d %s\n", dd->type, strerror(errno)); + close(fd); + return; + } + switch (dd->type) { + case VIRTIO_ID_NET: + mic->mic_net.virtio_net_fd = fd; + mpsslog("Added VIRTIO_ID_NET for %s\n", mic->name); + break; + case VIRTIO_ID_CONSOLE: + mic->mic_console.virtio_console_fd = fd; + mpsslog("Added VIRTIO_ID_CONSOLE for %s\n", mic->name); + break; + case VIRTIO_ID_BLOCK: + mic->mic_virtblk.virtio_block_fd = fd; + mpsslog("Added VIRTIO_ID_BLOCK for %s\n", mic->name); + break; + } +} + +static bool +set_backend_file(struct mic_info *mic) +{ + FILE *config; + char buff[PATH_MAX], *line, *evv, *p; + + snprintf(buff, PATH_MAX, "%s/mpssd%03d.conf", mic_config_dir, mic->id); + config = fopen(buff, "r"); + if (config == NULL) + return false; + do { /* look for 
"virtblk_backend=XXXX" */ + line = fgets(buff, PATH_MAX, config); + if (line == NULL) + break; + if (*line == '#') + continue; + p = strchr(line, '\n'); + if (p) + *p = '\0'; + } while (strncmp(line, virtblk_backend, strlen(virtblk_backend)) != 0); + fclose(config); + if (line == NULL) + return false; + evv = strchr(line, '='); + if (evv == NULL) + return false; + mic->mic_virtblk.backend_file = malloc(strlen(evv)); + if (mic->mic_virtblk.backend_file == NULL) { + mpsslog("can't allocate memory\n", mic->name, mic->id); + return false; + } + strcpy(mic->mic_virtblk.backend_file, evv + 1); + return true; +} + +#define SECTOR_SIZE 512 +static bool +set_backend_size(struct mic_info *mic) +{ + mic->mic_virtblk.backend_size = lseek(mic->mic_virtblk.backend, 0, + SEEK_END); + if (mic->mic_virtblk.backend_size < 0) { + mpsslog("%s: can't seek: %s\n", + mic->name, mic->mic_virtblk.backend_file); + return false; + } + virtblk_dev_page.blk_config.capacity + mic->mic_virtblk.backend_size / SECTOR_SIZE; + if ((mic->mic_virtblk.backend_size % SECTOR_SIZE) != 0) + virtblk_dev_page.blk_config.capacity++; + + virtblk_dev_page.blk_config.capacity + htole64(virtblk_dev_page.blk_config.capacity); + + return true; +} + +static bool +open_backend(struct mic_info *mic) +{ + if (!set_backend_file(mic)) + goto _error_exit; + mic->mic_virtblk.backend = open(mic->mic_virtblk.backend_file, O_RDWR); + if (mic->mic_virtblk.backend < 0) { + mpsslog("%s: can't open: %s\n", mic->name, + mic->mic_virtblk.backend_file); + goto _error_free; + } + if (!set_backend_size(mic)) + goto _error_close; + mic->mic_virtblk.backend_addr = mmap(NULL, + mic->mic_virtblk.backend_size, + PROT_READ|PROT_WRITE, MAP_SHARED, + mic->mic_virtblk.backend, 0L); + if (mic->mic_virtblk.backend_addr == MAP_FAILED) { + mpsslog("%s: can't map: %s %s\n", + mic->name, mic->mic_virtblk.backend_file, + strerror(errno)); + goto _error_close; + } + return true; + + _error_close: + close(mic->mic_virtblk.backend); + _error_free: + 
free(mic->mic_virtblk.backend_file); + _error_exit: + return false; +} + +static void +close_backend(struct mic_info *mic) +{ + munmap(mic->mic_virtblk.backend_addr, mic->mic_virtblk.backend_size); + close(mic->mic_virtblk.backend); + free(mic->mic_virtblk.backend_file); +} + +static bool +start_virtblk(struct mic_info *mic, struct mic_vring *vring) +{ + if (((__u64)&virtblk_dev_page.blk_config % 8) != 0) { + mpsslog("%s: blk_config is not 8 byte aligned.\n", + mic->name); + return false; + } + add_virtio_device(mic, &virtblk_dev_page.dd); + if (MAP_FAILED == init_vr(mic, mic->mic_virtblk.virtio_block_fd, + VIRTIO_ID_BLOCK, vring, NULL, virtblk_dev_page.dd.num_vq)) { + mpsslog("%s init_vr failed %s\n", + mic->name, strerror(errno)); + return false; + } + return true; +} + +static void +stop_virtblk(struct mic_info *mic) +{ + uninit_vr(mic, virtblk_dev_page.dd.num_vq); + close(mic->mic_virtblk.virtio_block_fd); +} + +static __u8 +header_error_check(struct vring_desc *desc) +{ + if (le32toh(desc->len) != sizeof(struct virtio_blk_outhdr)) { + mpsslog("%s() %d: length is not sizeof(virtio_blk_outhd)\n", + __func__, __LINE__); + return -EIO; + } + if (!(le16toh(desc->flags) & VRING_DESC_F_NEXT)) { + mpsslog("%s() %d: alone\n", + __func__, __LINE__); + return -EIO; + } + if (le16toh(desc->flags) & VRING_DESC_F_WRITE) { + mpsslog("%s() %d: not read\n", + __func__, __LINE__); + return -EIO; + } + return 0; +} + +static int +read_header(int fd, struct virtio_blk_outhdr *hdr, __u32 desc_idx) +{ + struct iovec iovec; + struct mic_copy_desc chain; + + iovec.iov_len = sizeof(*hdr); + iovec.iov_base = hdr; + chain.copy.iov = &iovec; + chain.copy.iovcnt = 1; + chain.copy.vr_idx = 0; /* only one vring on virtio_block */ + chain.copy.desc_idx = desc_idx; + chain.copy.out_len = iovec.iov_len; + chain.used_desc_idx = -1U; /* do not update used pointer */ + chain.used_len = 0; + return ioctl(fd, MIC_VIRTIO_COPY_DESC, &chain); +} + +static int +transfer_blocks(int fd, struct iovec 
*iovec, __u32 iovcnt, + __u32 desc_idx, __u32 total_len) +{ + struct mic_copy_desc chain; + + chain.copy.iov = iovec; + chain.copy.iovcnt = iovcnt; + chain.copy.vr_idx = 0; + chain.copy.desc_idx = desc_idx; + chain.copy.out_len = total_len; + chain.used_desc_idx = -1U; /* do not update used pointer */ + chain.used_len = 0; + return ioctl(fd, MIC_VIRTIO_COPY_DESC, &chain); +} + +static __u8 +status_error_check(struct vring_desc *desc) +{ + if (le32toh(desc->len) != sizeof(__u8)) { + mpsslog("%s() %d: length is not sizeof(status)\n", + __func__, __LINE__); + return -EIO; + } + return 0; +} + +static int +write_status(int fd, __u8 *status, __u32 status_desc_idx, + __u32 used_desc_idx, __u32 total_len) +{ + struct iovec iovec; + struct mic_copy_desc chain; + + iovec.iov_base = status; + iovec.iov_len = sizeof(*status); + chain.copy.iov = &iovec; + chain.copy.iovcnt = 1; + chain.copy.vr_idx = 0; /* only one vring on virtio_block */ + chain.copy.desc_idx = status_desc_idx; + chain.copy.out_len = iovec.iov_len; + chain.used_desc_idx = used_desc_idx; /* update used pointer */ + chain.used_len = total_len; + return ioctl(fd, MIC_VIRTIO_COPY_DESC, &chain); +} + +static void * +virtio_block(void *arg) +{ + struct mic_info *mic = (struct mic_info *) arg; + int ret; + struct pollfd block_poll; + struct mic_vring vring; + __u16 avail_idx; + __u32 desc_idx; + struct vring_desc *desc; + struct iovec *iovec, *piov; + __u8 status; + __u32 header_desc_idx, buffer_desc_idx; + ssize_t total_len; + struct virtio_blk_outhdr hdr; + void *fos; + + for (;;) { /* forever */ + if (!open_backend(mic)) { /* No virtblk */ + for (mic->mic_virtblk.signaled = 0; + !mic->mic_virtblk.signaled;) + sleep(1); + continue; + } + + /* backend file is specified. 
*/ + if (!start_virtblk(mic, &vring)) + goto _close_backend; + iovec = malloc(sizeof(*iovec) * + le32toh(virtblk_dev_page.blk_config.seg_max)); + if (!iovec) { + mpsslog("%s: can't alloc iovec: %s\n", + mic->name, strerror(ENOMEM)); + goto _stop_virtblk; + } + + block_poll.fd = mic->mic_virtblk.virtio_block_fd; + block_poll.events = POLLIN; + for (mic->mic_virtblk.signaled = 0; + !mic->mic_virtblk.signaled;) { + block_poll.revents = 0; + /* timeout in 1 sec to see signaled */ + ret = poll(&block_poll, 1, 1000); + if (ret < 0) { + mpsslog("%s %d: poll failed: %s\n", + __func__, __LINE__, + strerror(errno)); + continue; + } + + if (!(block_poll.revents & POLLIN)) { +#ifdef DEBUG + mpsslog("%s %d: block_poll.revents=0x%x\n", + __func__, __LINE__, block_poll.revents); + sleep(1); +#endif + continue; + } + + /* POLLIN */ + while (vring.info->avail_idx !+ le16toh(vring.vr.avail->idx)) { + /* read header element */ + avail_idx + vring.info->avail_idx & + (vring.vr.num - 1); + desc_idx = le16toh( + vring.vr.avail->ring[avail_idx]); + desc = &vring.vr.desc[desc_idx]; +#ifdef DEBUG + mpsslog("%s() %d: avail_idx=%d ", + __func__, __LINE__, + vring.info->avail_idx); + mpsslog("vring.vr.num=%d desc=%p\n", + vring.vr.num, desc); +#endif + status = header_error_check(desc); + ret = read_header( + mic->mic_virtblk.virtio_block_fd, + &hdr, desc_idx); + if (ret < 0) { + mpsslog("%s() %d %s: ret=%d %s\n", + __func__, __LINE__, + mic->name, ret, + strerror(errno)); + break; + } + header_desc_idx = desc_idx; + + /* buffer element */ + piov = iovec; + status = 0; + total_len = 0; + fos = mic->mic_virtblk.backend_addr + + (hdr.sector * SECTOR_SIZE); + buffer_desc_idx = desc_idx + next_desc(desc); + for (desc = &vring.vr.desc[buffer_desc_idx]; + desc->flags & VRING_DESC_F_NEXT; + desc_idx = next_desc(desc), + desc = &vring.vr.desc[desc_idx]) { + piov->iov_len = desc->len; + piov->iov_base = fos; + piov++; + fos += desc->len; + total_len += desc->len; + } + if (hdr.type & 
~(VIRTIO_BLK_T_OUT)) { + /* + VIRTIO_BLK_T_IN - does not do + anything. Probably for documenting. + VIRTIO_BLK_T_SCSI_CMD - for + virtio_scsi. + VIRTIO_BLK_T_GET_ID - virtblk driver + on card makes id as an empty string on + error status. + VIRTIO_BLK_T_FLUSH - turned off in + config space. + VIRTIO_BLK_T_BARRIER - defined but not + used in anywhere. + */ + mpsslog("%s() %d: type %x ", + __func__, __LINE__, + hdr.type); + mpsslog("is not supported\n"); + status = -ENOTSUP; + + } else { + ret = transfer_blocks( + mic->mic_virtblk.virtio_block_fd, + iovec, + piov - iovec, + buffer_desc_idx, + total_len); + if (ret < 0 && + status != 0) + status = ret; + } + /* write status and update used pointer */ + if (status != 0) + status = status_error_check(desc); + ret = write_status( + mic->mic_virtblk.virtio_block_fd, + &status, desc_idx, + header_desc_idx, + total_len + sizeof(hdr) + + sizeof(status)); +#ifdef DEBUG + mpsslog("%s() %d: write status=%d on desc=%p\n", + __func__, __LINE__, + status, desc); +#endif + } + } + free(iovec); +_stop_virtblk: + stop_virtblk(mic); +_close_backend: + close_backend(mic); + } /* forever */ + + pthread_exit(NULL); +} + +static void +reset(struct mic_info *mic) +{ +#define RESET_TIMEOUT 120 + int i = RESET_TIMEOUT; + setsysfs(mic->name, "state", "reset"); + while (i) { + char *state; + state = readsysfs(mic->name, "state"); + if (!state) + goto retry; + mpsslog("%s: %s %d state %s\n", + mic->name, __func__, __LINE__, state); + if ((!strcmp(state, "offline"))) { + free(state); + break; + } + free(state); +retry: + sleep(1); + i--; + } +} + +static int +get_mic_shutdown_status(struct mic_info *mic, char *shutdown_status) +{ + if (!strcmp(shutdown_status, "nop")) + return MIC_NOP; + if (!strcmp(shutdown_status, "crashed")) + return MIC_CRASHED; + if (!strcmp(shutdown_status, "halted")) + return MIC_HALTED; + if (!strcmp(shutdown_status, "poweroff")) + return MIC_POWER_OFF; + if (!strcmp(shutdown_status, "restart")) + return MIC_RESTART; + 
mpsslog("%s: BUG invalid status %s\n", mic->name, shutdown_status); + /* Invalid state */ + assert(0); +}; + +static int get_mic_state(struct mic_info *mic, char *state) +{ + if (!strcmp(state, "offline")) + return MIC_OFFLINE; + if (!strcmp(state, "online")) + return MIC_ONLINE; + if (!strcmp(state, "shutting_down")) + return MIC_SHUTTING_DOWN; + if (!strcmp(state, "reset_failed")) + return MIC_RESET_FAILED; + mpsslog("%s: BUG invalid state %s\n", mic->name, state); + /* Invalid state */ + assert(0); +}; + +static void mic_handle_shutdown(struct mic_info *mic) +{ +#define SHUTDOWN_TIMEOUT 60 + int i = SHUTDOWN_TIMEOUT, ret, stat = 0; + char *shutdown_status; + while (i) { + shutdown_status = readsysfs(mic->name, "shutdown_status"); + if (!shutdown_status) + continue; + mpsslog("%s: %s %d shutdown_status %s\n", + mic->name, __func__, __LINE__, shutdown_status); + switch (get_mic_shutdown_status(mic, shutdown_status)) { + case MIC_RESTART: + mic->restart = 1; + case MIC_HALTED: + case MIC_POWER_OFF: + case MIC_CRASHED: + goto reset; + default: + break; + } + free(shutdown_status); + sleep(1); + i--; + } +reset: + ret = kill(mic->pid, SIGTERM); + mpsslog("%s: %s %d kill pid %d ret %d\n", + mic->name, __func__, __LINE__, + mic->pid, ret); + if (!ret) { + ret = waitpid(mic->pid, &stat, + WIFSIGNALED(stat)); + mpsslog("%s: %s %d waitpid ret %d pid %d\n", + mic->name, __func__, __LINE__, + ret, mic->pid); + } + if (ret == mic->pid) + reset(mic); +} + +static void * +mic_config(void *arg) +{ + struct mic_info *mic = (struct mic_info *)arg; + char *state = NULL; + char pathname[PATH_MAX]; + int fd, ret; + struct pollfd ufds[1]; + char value[4096]; + + snprintf(pathname, PATH_MAX - 1, "%s/%s/%s", + MICSYSFSDIR, mic->name, "state"); + + fd = open(pathname, O_RDONLY); + if (fd < 0) { + mpsslog("%s: opening file %s failed %s\n", + mic->name, pathname, strerror(errno)); + goto error; + } + + do { + ret = read(fd, value, sizeof(value)); + if (ret < 0) { + mpsslog("%s: Failed to 
read sysfs entry '%s': %s\n", + mic->name, pathname, strerror(errno)); + goto close_error1; + } +retry: + state = readsysfs(mic->name, "state"); + if (!state) + goto retry; + mpsslog("%s: %s %d state %s\n", + mic->name, __func__, __LINE__, state); + switch (get_mic_state(mic, state)) { + case MIC_SHUTTING_DOWN: + mic_handle_shutdown(mic); + goto close_error; + default: + break; + } + free(state); + + ufds[0].fd = fd; + ufds[0].events = POLLERR | POLLPRI; + ret = poll(ufds, 1, -1); + if (ret < 0) { + mpsslog("%s: poll failed %s\n", + mic->name, strerror(errno)); + goto close_error1; + } + } while (1); +close_error: + free(state); +close_error1: + close(fd); +error: + init_mic(mic); + pthread_exit(NULL); +} + +static void +set_cmdline(struct mic_info *mic) +{ + char buffer[PATH_MAX]; + int len; + + len = snprintf(buffer, PATH_MAX, + "clocksource=tsc highres=off nohz=off "); + len += snprintf(buffer + len, PATH_MAX, + "cpufreq_on;corec6_off;pc3_off;pc6_off "); + len += snprintf(buffer + len, PATH_MAX, + "ifcfg=static;address,172.31.%d.1;netmask,255.255.255.0", + mic->id); + + setsysfs(mic->name, "cmdline", buffer); + mpsslog("%s: Command line: \"%s\"\n", mic->name, buffer); + snprintf(buffer, PATH_MAX, "172.31.%d.1", mic->id); + setsysfs(mic->name, "ipaddr", buffer); + mpsslog("%s: IPADDR: \"%s\"\n", mic->name, buffer); +} + +static void +set_log_buf_info(struct mic_info *mic) +{ + int fd; + off_t len; + char system_map[] = "/lib/firmware/mic/System.map"; + char *map, *temp, log_buf[17] = {'\0'}; + + fd = open(system_map, O_RDONLY); + if (fd < 0) { + mpsslog("%s: Opening System.map failed: %d\n", + mic->name, errno); + return; + } + len = lseek(fd, 0, SEEK_END); + if (len < 0) { + mpsslog("%s: Reading System.map size failed: %d\n", + mic->name, errno); + close(fd); + return; + } + map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0); + if (map == MAP_FAILED) { + mpsslog("%s: mmap of System.map failed: %d\n", + mic->name, errno); + close(fd); + return; + } + temp = 
strstr(map, "__log_buf"); + if (!temp) { + mpsslog("%s: __log_buf not found: %d\n", mic->name, errno); + munmap(map, len); + close(fd); + return; + } + strncpy(log_buf, temp - 19, 16); + setsysfs(mic->name, "log_buf_addr", log_buf); + mpsslog("%s: log_buf_addr: %s\n", mic->name, log_buf); + temp = strstr(map, "log_buf_len"); + if (!temp) { + mpsslog("%s: log_buf_len not found: %d\n", mic->name, errno); + munmap(map, len); + close(fd); + return; + } + strncpy(log_buf, temp - 19, 16); + setsysfs(mic->name, "log_buf_len", log_buf); + mpsslog("%s: log_buf_len: %s\n", mic->name, log_buf); + munmap(map, len); + close(fd); +} + +static void init_mic(struct mic_info *mic); + +static void +change_virtblk_backend(int x, siginfo_t *siginfo, void *p) +{ + struct mic_info *mic; + + for (mic = mic_list.next; mic != NULL; mic = mic->next) + mic->mic_virtblk.signaled = 1/* true */; +} + +static void +init_mic(struct mic_info *mic) +{ + struct sigaction ignore = { + .sa_flags = 0, + .sa_handler = SIG_IGN + }; + struct sigaction act = { + .sa_flags = SA_SIGINFO, + .sa_sigaction = change_virtblk_backend, + }; + char buffer[PATH_MAX]; + int err; + + /* ignore SIGUSR1 for both process */ + sigaction(SIGUSR1, &ignore, NULL); + + mic->pid = fork(); + switch (mic->pid) { + case 0: + set_log_buf_info(mic); + set_cmdline(mic); + add_virtio_device(mic, &virtcons_dev_page.dd); + add_virtio_device(mic, &virtnet_dev_page.dd); + err = pthread_create(&mic->mic_console.console_thread, NULL, + virtio_console, mic); + if (err) + mpsslog("%s virtcons pthread_create failed %s\n", + mic->name, strerror(err)); + /* + * TODO: Debug why not adding this sleep results in the tap + * interface not coming up during certain runs sporadically. 
+ */ + usleep(1000); + err = pthread_create(&mic->mic_net.net_thread, NULL, + virtio_net, mic); + if (err) + mpsslog("%s virtnet pthread_create failed %s\n", + mic->name, strerror(err)); + err = pthread_create(&mic->mic_virtblk.block_thread, NULL, + virtio_block, mic); + if (err) + mpsslog("%s virtblk pthread_create failed %s\n", + mic->name, strerror(err)); + sigemptyset(&act.sa_mask); + err = sigaction(SIGUSR1, &act, NULL); + if (err) + mpsslog("%s sigaction SIGUSR1 failed %s\n", + mic->name, strerror(errno)); + while (1) + sleep(60); + case -1: + mpsslog("fork failed MIC name %s id %d errno %d\n", + mic->name, mic->id, errno); + break; + default: + if (mic->restart) { + snprintf(buffer, PATH_MAX, + "boot:linux:mic/uos.img:mic/mic%d.image", + mic->id); + setsysfs(mic->name, "state", buffer); + mpsslog("%s restarting mic %d\n", + mic->name, mic->restart); + mic->restart = 0; + } + pthread_create(&mic->config_thread, NULL, mic_config, mic); + } +} + +static void +start_daemon(void) +{ + struct mic_info *mic; + + for (mic = mic_list.next; mic != NULL; mic = mic->next) + init_mic(mic); + + while (1) + sleep(60); +} + +static int +init_mic_list(void) +{ + struct mic_info *mic = &mic_list; + struct dirent *file; + DIR *dp; + int cnt = 0; + + dp = opendir(MICSYSFSDIR); + if (!dp) + return 0; + + while ((file = readdir(dp)) != NULL) { + if (!strncmp(file->d_name, "mic", 3)) { + mic->next = malloc(sizeof(struct mic_info)); + if (mic->next) { + mic = mic->next; + mic->next = NULL; + memset(mic, 0, sizeof(struct mic_info)); + mic->id = atoi(&file->d_name[3]); + mic->name = malloc(strlen(file->d_name) + 16); + if (mic->name) + strcpy(mic->name, file->d_name); + mpsslog("MIC name %s id %d\n", mic->name, + mic->id); + cnt++; + } + } + } + + closedir(dp); + return cnt; +} + +void +mpsslog(char *format, ...) 
+{ + va_list args; + char buffer[4096]; + time_t t; + char *ts; + + if (logfp == NULL) + return; + + va_start(args, format); + vsprintf(buffer, format, args); + va_end(args); + + time(&t); + ts = ctime(&t); + ts[strlen(ts) - 1] = '\0'; + fprintf(logfp, "%s: %s", ts, buffer); + + fflush(logfp); +} + +int +main(int argc, char *argv[]) +{ + int cnt; + + myname = argv[0]; + + logfp = fopen(LOGFILE_NAME, "a+"); + if (!logfp) { + fprintf(stderr, "cannot open logfile '%s'\n", LOGFILE_NAME); + exit(1); + } + + mpsslog("MIC Daemon start\n"); + + cnt = init_mic_list(); + if (cnt == 0) { + mpsslog("MIC module not loaded\n"); + exit(2); + } + mpsslog("MIC found %d devices\n", cnt); + + start_daemon(); + + exit(0); +} diff --git a/Documentation/mic/mpssd/mpssd.h b/Documentation/mic/mpssd/mpssd.h new file mode 100644 index 0000000..b3212e5 --- /dev/null +++ b/Documentation/mic/mpssd/mpssd.h @@ -0,0 +1,105 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC User Space Tools. 
+ */ +#ifndef _MPSSD_H_ +#define _MPSSD_H_ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <fcntl.h> +#include <unistd.h> +#include <dirent.h> +#include <libgen.h> +#include <pthread.h> +#include <stdarg.h> +#include <time.h> +#include <errno.h> +#include <sys/dir.h> +#include <sys/ioctl.h> +#include <sys/poll.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <sys/mman.h> +#include <sys/utsname.h> +#include <sys/wait.h> +#include <netinet/in.h> +#include <arpa/inet.h> +#include <netdb.h> +#include <pthread.h> +#include <signal.h> +#include <limits.h> +#include <syslog.h> +#include <getopt.h> +#include <net/if.h> +#include <linux/if_tun.h> +#include <linux/if_tun.h> +#include <linux/virtio_ids.h> + +#define MICSYSFSDIR "/sys/class/mic" +#define LOGFILE_NAME "/var/log/mpssd" +#define PAGE_SIZE 4096 + +struct mic_console_info { + pthread_t console_thread; + int virtio_console_fd; + void *console_dp; +}; + +struct mic_net_info { + pthread_t net_thread; + int virtio_net_fd; + int tap_fd; + void *net_dp; +}; + +struct mic_virtblk_info { + pthread_t block_thread; + int virtio_block_fd; + void *block_dp; + volatile sig_atomic_t signaled; + char *backend_file; + int backend; + void *backend_addr; + long backend_size; +}; + +struct mic_info { + int id; + char *name; + pthread_t config_thread; + pid_t pid; + struct mic_console_info mic_console; + struct mic_net_info mic_net; + struct mic_virtblk_info mic_virtblk; + int restart; + struct mic_info *next; +}; + +void mpsslog(char *format, ...); +char *readsysfs(char *dir, char *entry); +int setsysfs(char *dir, char *entry, char *value); +#endif diff --git a/Documentation/mic/mpssd/sysfs.c b/Documentation/mic/mpssd/sysfs.c new file mode 100644 index 0000000..7bc9fbf --- /dev/null +++ b/Documentation/mic/mpssd/sysfs.c @@ -0,0 +1,108 @@ +/* + * Intel MIC Platform Software Stack (MPSS) + * + * Copyright(c) 2013 Intel Corporation. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 + * USA. + * + * The full GNU General Public License is included in this distribution in + * the file called "COPYING". + * + * Intel MIC User Space Tools. + */ + +#include "mpssd.h" + +#define PAGE_SIZE 4096 + +char * +readsysfs(char *dir, char *entry) +{ + char filename[PATH_MAX]; + char value[PAGE_SIZE]; + char *string = NULL; + int fd; + int len; + + if (dir == NULL) + snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry); + else + snprintf(filename, PATH_MAX, + "%s/%s/%s", MICSYSFSDIR, dir, entry); + + fd = open(filename, O_RDONLY); + if (fd < 0) { + mpsslog("Failed to open sysfs entry '%s': %s\n", + filename, strerror(errno)); + return NULL; + } + + len = read(fd, value, sizeof(value)); + if (len < 0) { + mpsslog("Failed to read sysfs entry '%s': %s\n", + filename, strerror(errno)); + goto readsys_ret; + } + + value[len] = '\0'; + + string = malloc(strlen(value) + 1); + if (string) + strcpy(string, value); + +readsys_ret: + close(fd); + return string; +} + +int +setsysfs(char *dir, char *entry, char *value) +{ + char filename[PATH_MAX]; + char oldvalue[PAGE_SIZE]; + int fd; + + if (dir == NULL) + snprintf(filename, PATH_MAX, "%s/%s", MICSYSFSDIR, entry); + else + snprintf(filename, PATH_MAX, "%s/%s/%s", + MICSYSFSDIR, dir, entry); + + fd = open(filename, O_RDWR); + if (fd < 0) { + mpsslog("Failed to open 
sysfs entry '%s': %s\n", + filename, strerror(errno)); + return errno; + } + + if (read(fd, oldvalue, sizeof(oldvalue)) < 0) { + mpsslog("Failed to read sysfs entry '%s': %s\n", + filename, strerror(errno)); + close(fd); + return errno; + } + + if (strcmp(value, oldvalue)) { + if (write(fd, value, strlen(value)) < 0) { + mpsslog("Failed to write new sysfs entry '%s': %s\n", + filename, strerror(errno)); + close(fd); + return errno; + } + } + + close(fd); + return 0; +} -- 1.8.2.1
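A note on readsysfs() above: it reads up to sizeof(value) bytes and then stores the terminator at value[len], which is one byte past the end of the buffer whenever the read fills it. setsysfs() likewise passes oldvalue to strcmp() without terminating it. A minimal sketch of a bounded read follows; the helper name read_fd_string() is hypothetical and not part of the patch.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): read at most size - 1 bytes
 * from fd into buf and always NUL-terminate the result.
 * Returns the number of bytes stored, or -1 on error.
 */
static ssize_t read_fd_string(int fd, char *buf, size_t size)
{
	ssize_t len;

	if (size == 0)
		return -1;

	/* Leave room for the terminator so buf[len] is always in bounds. */
	len = read(fd, buf, size - 1);
	if (len < 0) {
		buf[0] = '\0';
		return -1;
	}

	buf[len] = '\0';
	return len;
}
```

With a helper of this shape, both readsysfs() and setsysfs() could work on a terminated string before calling strlen() or strcmp().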
Greg Kroah-Hartman
2013-Jul-25 04:41 UTC
[PATCH 4/5] Intel MIC Card Driver Changes for Virtio Devices.
On Wed, Jul 24, 2013 at 08:31:35PM -0700, Sudeep Dutt wrote:
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
> + * USA.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".

Someone needs to tell Intel to take out the address paragraph of their "standard template", as it's annoying. Please remove it, unless you personally want to keep the file up to date with the address movements of the FSF for the next 40+ years.

> + * Disclaimer: The codes contained in these modules may be specific to
> + * the Intel Software Development Platform codenamed: Knights Ferry, and
> + * the Intel product codenamed: Knights Corner, and are not backward
> + * compatible with other Intel products. Additionally, Intel will NOT
> + * support the codes or instruction set in future products.

What does this mean? That's a new one to me...

> +static inline struct device *dev(struct mic_vdev *mvdev)
> +{
> +	return mvdev->vdev.dev.parent;
> +}

Can you pick a worse name? And you aren't returning the "device", you are returning the parent, so the name (as short as it is) is wrong.

ick.

greg k-h
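To make the naming point concrete, here is a sketch of the same accessor under a name that says what it returns. The name mic_vdev_parent() is only a suggestion, and the struct definitions are minimal stand-ins for the kernel types so the fragment compiles on its own; they are not the real definitions.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct device; only the member used here. */
struct device {
	struct device *parent;
};

/* Stand-in for struct virtio_device; only the member used here. */
struct virtio_device {
	struct device dev;
};

/* Stand-in for the patch's struct mic_vdev; only the member used here. */
struct mic_vdev {
	struct virtio_device vdev;
};

/*
 * Hypothetical replacement for dev(): the name states that the parent
 * of the virtio device is returned, not the virtio device itself.
 */
static inline struct device *mic_vdev_parent(struct mic_vdev *mvdev)
{
	return mvdev->vdev.dev.parent;
}
```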
Rusty Russell
2013-Jul-29 01:58 UTC
[PATCH 4/5] Intel MIC Card Driver Changes for Virtio Devices.
Sudeep Dutt <sudeep.dutt at intel.com> writes:
> From: Ashutosh Dixit <ashutosh.dixit at intel.com>
>
> This patch introduces the card "Virtio over PCIe" interface for
> Intel MIC. It allows virtio drivers on the card to communicate with their
> user space backends on the host via a device page. Ring 3 apps on the host
> can add, remove and configure virtio devices. A thin MIC specific
> virtio_config_ops is implemented which is borrowed heavily from previous
> similar implementations in lguest and s390 @
> drivers/lguest/lguest_device.c
> drivers/s390/kvm/kvm_virtio.c

Wow, this is pretty cool!

Note that the lguest implementation is a bit limited, but lguest doesn't have a real ABI, so we can simply change it if we wanted to. I assume you don't have that luxury.

In particular, you may want to get rid of the historical align constant, and have explicit avail and used addresses.

Cheers,
Rusty.
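For readers unfamiliar with the align constant: in the legacy vring layout the avail ring follows the descriptor table directly, and the used ring starts at the next multiple of the alignment, so all three addresses are derived from one base address. The sketch below computes those offsets and contrasts them with a descriptor that carries each address explicitly. The names legacy_avail_offset(), legacy_used_offset() and struct ring_addrs are hypothetical and for illustration only; the element sizes match struct vring_desc and struct vring_used_elem.

```c
#include <stddef.h>
#include <stdint.h>

#define DESC_SIZE	16u	/* sizeof(struct vring_desc) */
#define USED_ELEM_SIZE	8u	/* sizeof(struct vring_used_elem) */

/* Legacy layout: the avail ring starts right after the descriptor table. */
static size_t legacy_avail_offset(unsigned int num)
{
	return (size_t)DESC_SIZE * num;
}

/*
 * Legacy layout: the used ring starts at the first 'align' boundary after
 * the avail ring. The avail ring holds flags, idx, ring[num] and
 * used_event, all 16 bits wide. 'align' must be a power of two.
 */
static size_t legacy_used_offset(unsigned int num, size_t align)
{
	size_t avail_end = legacy_avail_offset(num) +
			   sizeof(uint16_t) * (3 + (size_t)num);

	return (avail_end + align - 1) & ~(align - 1);
}

/*
 * Hypothetical alternative: publish each ring address explicitly, so
 * nothing depends on an alignment constant shared by both sides.
 */
struct ring_addrs {
	uint64_t desc;
	uint64_t avail;
	uint64_t used;
};
```

With 256 entries and a 4096 byte alignment, the descriptor table occupies the first page, the avail ring begins at offset 4096, and the used ring is pushed out to offset 8192. With explicit addresses, the three parts could be placed independently.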
Michael S. Tsirkin
2013-Jul-29 07:05 UTC
[PATCH 3/5] Intel MIC Host Driver Changes for Virtio Devices.
On Wed, Jul 24, 2013 at 08:31:34PM -0700, Sudeep Dutt wrote:
> From: Ashutosh Dixit <ashutosh.dixit at intel.com>
>
> This patch introduces the host "Virtio over PCIe" interface for
> Intel MIC. It allows creating user space backends on the host and
> instantiating virtio devices for them on the Intel MIC card. A character
> device per MIC is exposed with IOCTL, mmap and poll callbacks. This allows
> the user space backend to:
> (a) add/remove a virtio device via a device page.
> (b) map (R/O) virtio rings and device page to user space.
> (c) poll for availability of data.
> (d) copy a descriptor or entire descriptor chain to/from the card.
> (e) modify virtio configuration.
> (f) handle virtio device reset.
> The buffers are copied over using CPU copies for this initial patch
> and host initiated MIC DMA support is planned for future patches.
> The avail and desc virtio rings are in host memory and the used ring
> is in card memory to maximize writes across PCIe for performance.
>
> Co-author: Sudeep Dutt <sudeep.dutt at intel.com>
> Signed-off-by: Ashutosh Dixit <ashutosh.dixit at intel.com>
> Signed-off-by: Caz Yokoyama <Caz.Yokoyama at intel.com>
> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli at intel.com>
> Signed-off-by: Nikhil Rao <nikhil.rao at intel.com>
> Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche at intel.com>
> Signed-off-by: Sudeep Dutt <sudeep.dutt at intel.com>
> Acked-by: Yaozu (Eddie) Dong <eddie.dong at intel.com>
> Reviewed-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr at intel.com>

I decided to look at the security and ordering of ring accesses. Doing a quick look, I think I found some issues, see comments below. If it were possible to reuse existing ring handling code, such issues would go away automatically. Which brings me to the next question: have you looked at reusing some code under drivers/vhost for host side processing? If not, you probably should.
Is code in vringh.c generic enough to support your use-case, and if not what exactly are the issues preventing this? Thanks,> --- > drivers/misc/mic/common/mic_device.h | 4 + > drivers/misc/mic/host/Makefile | 2 + > drivers/misc/mic/host/mic_boot.c | 2 + > drivers/misc/mic/host/mic_debugfs.c | 137 +++++++ > drivers/misc/mic/host/mic_fops.c | 280 ++++++++++++++ > drivers/misc/mic/host/mic_fops.h | 37 ++ > drivers/misc/mic/host/mic_main.c | 24 ++ > drivers/misc/mic/host/mic_virtio.c | 703 +++++++++++++++++++++++++++++++++++ > drivers/misc/mic/host/mic_virtio.h | 108 ++++++ > include/uapi/linux/Kbuild | 1 + > include/uapi/linux/mic_common.h | 165 +++++++- > include/uapi/linux/mic_ioctl.h | 104 ++++++ > 12 files changed, 1566 insertions(+), 1 deletion(-) > create mode 100644 drivers/misc/mic/host/mic_fops.c > create mode 100644 drivers/misc/mic/host/mic_fops.h > create mode 100644 drivers/misc/mic/host/mic_virtio.c > create mode 100644 drivers/misc/mic/host/mic_virtio.h > create mode 100644 include/uapi/linux/mic_ioctl.h > > diff --git a/drivers/misc/mic/common/mic_device.h b/drivers/misc/mic/common/mic_device.h > index 24934b1..7cdeb74 100644 > --- a/drivers/misc/mic/common/mic_device.h > +++ b/drivers/misc/mic/common/mic_device.h > @@ -78,4 +78,8 @@ mic_mmio_write(struct mic_mw *mw, u32 val, u32 offset) > #define MIC_DPLO_SPAD 14 > #define MIC_DPHI_SPAD 15 > > +/* These values are supposed to be in ext_params on an interrupt */ > +#define MIC_VIRTIO_PARAM_DEV_REMOVE 0x1 > +#define MIC_VIRTIO_PARAM_CONFIG_CHANGED 0x2 > + > #endif > diff --git a/drivers/misc/mic/host/Makefile b/drivers/misc/mic/host/Makefile > index 0608bbb..e02abdb 100644 > --- a/drivers/misc/mic/host/Makefile > +++ b/drivers/misc/mic/host/Makefile > @@ -9,3 +9,5 @@ mic_host-objs += mic_sysfs.o > mic_host-objs += mic_boot.o > mic_host-objs += mic_smpt.o > mic_host-objs += mic_debugfs.o > +mic_host-objs += mic_fops.o > +mic_host-objs += mic_virtio.o > diff --git a/drivers/misc/mic/host/mic_boot.c 
b/drivers/misc/mic/host/mic_boot.c > index 6485a87..40bcb90 100644 > --- a/drivers/misc/mic/host/mic_boot.c > +++ b/drivers/misc/mic/host/mic_boot.c > @@ -30,6 +30,7 @@ > #include <linux/delay.h> > > #include "mic_common.h" > +#include "mic_virtio.h" > > /** > * mic_reset - Reset the MIC device. > @@ -112,6 +113,7 @@ void mic_stop(struct mic_device *mdev, bool force) > { > mutex_lock(&mdev->mic_mutex); > if (MIC_OFFLINE != mdev->state || force) { > + mic_virtio_reset_devices(mdev); > mic_bootparam_init(mdev); > mic_reset(mdev); > if (MIC_RESET_FAILED == mdev->state) > diff --git a/drivers/misc/mic/host/mic_debugfs.c b/drivers/misc/mic/host/mic_debugfs.c > index 5b7697e..bebc6e3 100644 > --- a/drivers/misc/mic/host/mic_debugfs.c > +++ b/drivers/misc/mic/host/mic_debugfs.c > @@ -32,6 +32,7 @@ > > #include "mic_common.h" > #include "mic_debugfs.h" > +#include "mic_virtio.h" > > /* Debugfs parent dir */ > static struct dentry *mic_dbg; > @@ -207,7 +208,13 @@ static const struct file_operations post_code_ops = { > static int dp_seq_show(struct seq_file *s, void *pos) > { > struct mic_device *mdev = s->private; > + struct mic_device_desc *d; > + struct mic_device_ctrl *dc; > + struct mic_vqconfig *vqconfig; > + __u32 *features; > + __u8 *config; > struct mic_bootparam *bootparam = mdev->dp; > + int i, j; > > seq_printf(s, "Bootparam: magic 0x%x\n", > bootparam->magic); > @@ -222,6 +229,53 @@ static int dp_seq_show(struct seq_file *s, void *pos) > seq_printf(s, "Bootparam: shutdown_card %d\n", > bootparam->shutdown_card); > > + for (i = sizeof(*bootparam); i < MIC_DP_SIZE; > + i += mic_total_desc_size(d)) { > + d = mdev->dp + i; > + dc = (void *)d + mic_aligned_desc_size(d); > + > + /* end of list */ > + if (d->type == 0) > + break; > + > + if (d->type == -1) > + continue; > + > + seq_printf(s, "Type %d ", d->type); > + seq_printf(s, "Num VQ %d ", d->num_vq); > + seq_printf(s, "Feature Len %d\n", d->feature_len); > + seq_printf(s, "Config Len %d ", d->config_len); > + 
seq_printf(s, "Shutdown Status %d\n", d->status); > + > + for (j = 0; j < d->num_vq; j++) { > + vqconfig = mic_vq_config(d) + j; > + seq_printf(s, "vqconfig[%d]: ", j); > + seq_printf(s, "address 0x%llx ", vqconfig->address); > + seq_printf(s, "num %d ", vqconfig->num); > + seq_printf(s, "used address 0x%llx\n", > + vqconfig->used_address); > + } > + > + features = (__u32 *) mic_vq_features(d); > + seq_printf(s, "Features: Host 0x%x ", features[0]); > + seq_printf(s, "Guest 0x%x\n", features[1]); > + > + config = mic_vq_configspace(d); > + for (j = 0; j < d->config_len; j++) > + seq_printf(s, "config[%d]=%d\n", j, config[j]); > + > + seq_puts(s, "Device control:\n"); > + seq_printf(s, "Config Change %d ", dc->config_change); > + seq_printf(s, "Vdev reset %d\n", dc->vdev_reset); > + seq_printf(s, "Guest Ack %d ", dc->guest_ack); > + seq_printf(s, "Host ack %d\n", dc->host_ack); > + seq_printf(s, "Used address updated %d ", > + dc->used_address_updated); > + seq_printf(s, "Vdev 0x%llx\n", dc->vdev); > + seq_printf(s, "c2h doorbell %d ", dc->c2h_vdev_db); > + seq_printf(s, "h2c doorbell %d\n", dc->h2c_vdev_db); > + } > + > return 0; > } > > @@ -243,6 +297,86 @@ static const struct file_operations dp_ops = { > .release = dp_debug_release > }; > > +static int vdev_info_seq_show(struct seq_file *s, void *unused) > +{ > + struct mic_device *mdev = s->private; > + struct list_head *pos, *tmp; > + struct mic_vdev *mvdev; > + int i, j; > + > + mutex_lock(&mdev->mic_mutex); > + list_for_each_safe(pos, tmp, &mdev->vdev_list) { > + mvdev = list_entry(pos, struct mic_vdev, list); > + seq_printf(s, "VDEV type %d state %s in %ld out %ld\n", > + mvdev->virtio_id, > + mic_vdevup(mvdev) ? 
"UP" : "DOWN", > + mvdev->in_bytes, > + mvdev->out_bytes); > + for (i = 0; i < MIC_MAX_VRINGS; i++) { > + struct vring_desc *desc; > + struct vring_avail *avail; > + struct vring_used *used; > + int num = mvdev->vring[i].vr.num; > + if (!num) > + continue; > + desc = mvdev->vring[i].vr.desc; > + seq_printf(s, "vring i %d avail_idx %d", > + i, mvdev->vring[i].info->avail_idx & (num - 1)); > + seq_printf(s, " used_idx %d num %d\n", > + mvdev->vring[i].info->used_idx & (num - 1), > + num); > + seq_printf(s, "vring i %d avail_idx %d used_idx %d\n", > + i, mvdev->vring[i].info->avail_idx, > + mvdev->vring[i].info->used_idx); > + for (j = 0; j < num; j++) { > + seq_printf(s, "desc[%d] addr 0x%llx len %d", > + j, desc->addr, desc->len); > + seq_printf(s, " flags 0x%x next %d\n", > + desc->flags, > + desc->next); > + desc++; > + } > + avail = mvdev->vring[i].vr.avail; > + seq_printf(s, "avail flags 0x%x idx %d\n", > + avail->flags, avail->idx & (num - 1)); > + seq_printf(s, "avail flags 0x%x idx %d\n", > + avail->flags, avail->idx); > + for (j = 0; j < num; j++) > + seq_printf(s, "avail ring[%d] %d\n", > + j, avail->ring[j]); > + used = mvdev->vring[i].vr.used; > + seq_printf(s, "used flags 0x%x idx %d\n", > + used->flags, used->idx & (num - 1)); > + seq_printf(s, "used flags 0x%x idx %d\n", > + used->flags, used->idx); > + for (j = 0; j < num; j++) > + seq_printf(s, "used ring[%d] id %d len %d\n", > + j, used->ring[j].id, used->ring[j].len); > + } > + } > + mutex_unlock(&mdev->mic_mutex); > + > + return 0; > +} > + > +static int vdev_info_debug_open(struct inode *inode, struct file *file) > +{ > + return single_open(file, vdev_info_seq_show, inode->i_private); > +} > + > +static int vdev_info_debug_release(struct inode *inode, struct file *file) > +{ > + return single_release(inode, file); > +} > + > +static const struct file_operations vdev_info_ops = { > + .owner = THIS_MODULE, > + .open = vdev_info_debug_open, > + .read = seq_read, > + .llseek = seq_lseek, > + .release 
= vdev_info_debug_release > +}; > + > static int msi_irq_info_seq_show(struct seq_file *s, void *pos) > { > struct mic_device *mdev = s->private; > @@ -332,6 +466,9 @@ void __init mic_create_debug_dir(struct mic_device *mdev) > debugfs_create_file("dp", 0444, mdev->dbg_dir, > mdev, &dp_ops); > > + debugfs_create_file("vdev_info", 0444, mdev->dbg_dir, > + mdev, &vdev_info_ops); > + > debugfs_create_file("msi_irq_info", 0444, mdev->dbg_dir, > mdev, &msi_irq_info_ops); > } > diff --git a/drivers/misc/mic/host/mic_fops.c b/drivers/misc/mic/host/mic_fops.c > new file mode 100644 > index 0000000..626a454 > --- /dev/null > +++ b/drivers/misc/mic/host/mic_fops.c > @@ -0,0 +1,280 @@ > +/* > + * Intel MIC Platform Software Stack (MPSS) > + * > + * Copyright(c) 2013 Intel Corporation. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 > + * USA. > + * > + * The full GNU General Public License is included in this distribution in > + * the file called "COPYING". > + * > + * Intel MIC Host driver. 
> + * > + */ > +#include <linux/module.h> > +#include <linux/fs.h> > +#include <linux/pci.h> > +#include <linux/interrupt.h> > +#include <linux/firmware.h> > +#include <linux/completion.h> > +#include <linux/poll.h> > +#include <linux/virtio_ids.h> > +#include <linux/mic_ioctl.h> > + > +#include "mic_common.h" > +#include "mic_fops.h" > +#include "mic_virtio.h" > + > +int mic_open(struct inode *inode, struct file *f) > +{ > + struct mic_vdev *mvdev; > + struct mic_device *mdev = container_of(inode->i_cdev, > + struct mic_device, cdev); > + > + mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL); > + if (!mvdev) > + return -ENOMEM; > + > + init_waitqueue_head(&mvdev->waitq); > + INIT_LIST_HEAD(&mvdev->list); > + mvdev->mdev = mdev; > + mvdev->virtio_id = -1; > + > + f->private_data = mvdev; > + return 0; > +} > + > +int mic_release(struct inode *inode, struct file *f) > +{ > + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; > + > + if (-1 != mvdev->virtio_id) > + mic_virtio_del_device(mvdev); > + f->private_data = NULL; > + kfree(mvdev); > + return 0; > +} > + > +long mic_ioctl(struct file *f, unsigned int cmd, unsigned long arg) > +{ > + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; > + void __user *argp = (void __user *)arg; > + int ret; > + > + switch (cmd) { > + case MIC_VIRTIO_ADD_DEVICE: > + { > + ret = mic_virtio_add_device(mvdev, argp); > + if (ret < 0) { > + dev_err(mic_dev(mvdev), > + "%s %d errno ret %d\n", > + __func__, __LINE__, ret); > + return ret; > + } > + break; > + } > + case MIC_VIRTIO_COPY_DESC: > + { > + struct mic_copy_desc request; > + struct mic_copy *copy = &request.copy; > + > + ret = mic_vdev_inited(mvdev); > + if (ret) > + return ret; > + > + if (copy_from_user(&request, argp, sizeof(request))) > + return -EFAULT; > + > + dev_dbg(mic_dev(mvdev), > + "%s %d === iovcnt 0x%x vr_idx 0x%x desc_idx 0x%x " > + "used_idx 0x%x used_len 0x%x\n", > + __func__, __LINE__, copy->iovcnt, > + copy->vr_idx, copy->desc_idx, > + 
request.used_desc_idx, request.used_len); > + > + ret = mic_virtio_copy_desc(mvdev, &request); > + if (ret < 0) { > + dev_err(mic_dev(mvdev), > + "%s %d errno ret %d\n", > + __func__, __LINE__, ret); > + return ret; > + } > + if (copy_to_user( > + &((struct mic_copy_desc __user *)argp)->copy.out_cookie, > + &copy->out_cookie, sizeof(copy->out_cookie))) { > + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", > + __func__, __LINE__, -EFAULT); > + return -EFAULT; > + } > + if (copy_to_user( > + &((struct mic_copy_desc __user *)argp)->copy.out_len, > + &copy->out_len, sizeof(copy->out_len))) { > + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", > + __func__, __LINE__, -EFAULT); > + return -EFAULT; > + } > + break; > + } > + case MIC_VIRTIO_COPY_CHAIN: > + { > + struct mic_copy request; > + > + ret = mic_vdev_inited(mvdev); > + if (ret) > + return ret; > + > + if (copy_from_user(&request, argp, sizeof(request))) > + return -EFAULT; > + > + dev_dbg(mic_dev(mvdev), > + "%s %d === vr_idx 0x%x desc_idx 0x%x iovcnt 0x%x\n", > + __func__, __LINE__, > + request.vr_idx, request.desc_idx, request.iovcnt); > + > + ret = mic_virtio_copy_chain(mvdev, &request); > + if (ret < 0) { > + dev_err(mic_dev(mvdev), > + "%s %d errno ret %d\n", > + __func__, __LINE__, ret); > + return ret; > + } > + if (copy_to_user( > + &((struct mic_copy __user *)argp)->out_cookie, > + &request.out_cookie, sizeof(request.out_cookie))) { > + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", > + __func__, __LINE__, -EFAULT); > + return -EFAULT; > + } > + if (copy_to_user(&((struct mic_copy __user *)argp)->out_len, > + &request.out_len, > + sizeof(request.out_len))) { > + dev_err(mic_dev(mvdev), "%s %d errno ret %d\n", > + __func__, __LINE__, -EFAULT); > + return -EFAULT; > + } > + break; > + } > + case MIC_VIRTIO_CONFIG_CHANGE: > + { > + ret = mic_vdev_inited(mvdev); > + if (ret) > + return ret; > + > + ret = mic_virtio_config_change(mvdev, argp); > + if (ret < 0) { > + dev_err(mic_dev(mvdev), > + "%s %d errno ret 
%d\n", > + __func__, __LINE__, ret); > + return ret; > + } > + break; > + } > + default: > + return -ENOIOCTLCMD; > + }; > + return 0; > +} > + > +/* > + * We return POLLIN | POLLOUT from poll when new buffers are enqueued, and > + * not when previously enqueued buffers may be available. This means that > + * in the card->host (TX) path, when userspace is unblocked by poll it > + * must drain all available descriptors or it can stall. > + */ > +unsigned int mic_poll(struct file *f, poll_table *wait) > +{ > + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; > + int mask = 0; > + > + poll_wait(f, &mvdev->waitq, wait); > + > + if (mic_vdev_inited(mvdev)) > + mask = POLLERR; > + else if (mvdev->poll_wake) { > + mvdev->poll_wake = 0; > + mask = POLLIN | POLLOUT; > + } > + > + return mask; > +} > + > +static inline int > +mic_query_offset(struct mic_vdev *mvdev, unsigned long offset, > + unsigned long *size, unsigned long *pa) > +{ > + struct mic_device *mdev = mvdev->mdev; > + unsigned long start = MIC_DP_SIZE; > + int i; > + > + /* > + * MMAP interface is as follows: > + * offset region > + * 0x0 virtio device_page > + * 0x1000 first vring > + * 0x1000 + size of 1st vring second vring > + * .... > + */ > + if (!offset) { > + *pa = virt_to_phys(mdev->dp); > + *size = MIC_DP_SIZE; > + return 0; > + } > + > + for (i = 0; i < mvdev->dd->num_vq; i++) { > + if (offset == start) { > + *pa = virt_to_phys(mvdev->vring[i].va); > + *size = mvdev->vring[i].len; > + return 0; > + } > + start += mvdev->vring[i].len; > + } > + return -1; > +} > + > +/* > + * Maps the device page and virtio rings to user space for readonly access. 
> + */ > +int > +mic_mmap(struct file *f, struct vm_area_struct *vma) > +{ > + struct mic_vdev *mvdev = (struct mic_vdev *)f->private_data; > + unsigned long offset = vma->vm_pgoff << PAGE_SHIFT; > + unsigned long pa, size = vma->vm_end - vma->vm_start, size_rem = size; > + int i, err; > + > + err = mic_vdev_inited(mvdev); > + if (err) > + return err; > + > + if (vma->vm_flags & VM_WRITE) > + return -EACCES; > + > + while (size_rem) { > + i = mic_query_offset(mvdev, offset, &size, &pa); > + if (i < 0) > + return -EINVAL; > + err = remap_pfn_range(vma, vma->vm_start + offset, > + pa >> PAGE_SHIFT, size, vma->vm_page_prot); > + if (err) > + return err; > + dev_dbg(mic_dev(mvdev), > + "%s %d type %d size 0x%lx off 0x%lx pa 0x%lx vma 0x%lx\n", > + __func__, __LINE__, mvdev->virtio_id, size, offset, > + pa, vma->vm_start + offset); > + size_rem -= size; > + offset += size; > + } > + return 0; > +} > diff --git a/drivers/misc/mic/host/mic_fops.h b/drivers/misc/mic/host/mic_fops.h > new file mode 100644 > index 0000000..504506c > --- /dev/null > +++ b/drivers/misc/mic/host/mic_fops.h > @@ -0,0 +1,37 @@ > +/* > + * Intel MIC Platform Software Stack (MPSS) > + * > + * Copyright(c) 2013 Intel Corporation. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 > + * USA. 
> + * > + * The full GNU General Public License is included in this distribution in > + * the file called "COPYING". > + * > + * Intel MIC Host driver. > + * > + */ > +#ifndef _MIC_FOPS_H_ > +#define _MIC_FOPS_H_ > + > +int mic_open(struct inode *inode, struct file *filp); > +int mic_release(struct inode *inode, struct file *filp); > +ssize_t mic_read(struct file *filp, char __user *buf, > + size_t count, loff_t *pos); > +long mic_ioctl(struct file *filp, unsigned int cmd, unsigned long arg); > +int mic_mmap(struct file *f, struct vm_area_struct *vma); > +unsigned int mic_poll(struct file *f, poll_table *wait); > + > +#endif > diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c > index 70cc235..dd421d5 100644 > --- a/drivers/misc/mic/host/mic_main.c > +++ b/drivers/misc/mic/host/mic_main.c > @@ -37,6 +37,8 @@ > > #include "mic_common.h" > #include "mic_debugfs.h" > +#include "mic_fops.h" > +#include "mic_virtio.h" > > static const char mic_driver_name[] = "mic"; > > @@ -79,6 +81,15 @@ struct mic_info { > /* g_mic - Global information about all MIC devices. 
*/ > static struct mic_info g_mic; > > +static const struct file_operations mic_fops = { > + .open = mic_open, > + .release = mic_release, > + .unlocked_ioctl = mic_ioctl, > + .poll = mic_poll, > + .mmap = mic_mmap, > + .owner = THIS_MODULE, > +}; > + > /* Initialize the device page */ > static int mic_dp_init(struct mic_device *mdev) > { > @@ -968,8 +979,20 @@ static int mic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) > mic_bootparam_init(mdev); > > mic_create_debug_dir(mdev); > + cdev_init(&mdev->cdev, &mic_fops); > + mdev->cdev.owner = THIS_MODULE; > + rc = cdev_add(&mdev->cdev, MKDEV(MAJOR(g_mic.dev), mdev->id), 1); > + if (rc) { > + dev_err(&pdev->dev, "cdev_add err id %d rc %d\n", mdev->id, rc); > + goto cleanup_debug_dir; > + } > dev_info(&pdev->dev, "Probe successful for %s\n", mdev->name); > return 0; > +cleanup_debug_dir: > + mic_delete_debug_dir(mdev); > + mutex_lock(&mdev->mic_mutex); > + mic_free_irq(mdev, mdev->shutdown_cookie, mdev); > + mutex_unlock(&mdev->mic_mutex); > dp_uninit: > mic_dp_uninit(mdev); > sysfs_put: > @@ -1019,6 +1042,7 @@ static void mic_remove(struct pci_dev *pdev) > id = mdev->id; > > mic_stop(mdev, false); > + cdev_del(&mdev->cdev); > mic_delete_debug_dir(mdev); > mutex_lock(&mdev->mic_mutex); > mic_free_irq(mdev, mdev->shutdown_cookie, mdev); > diff --git a/drivers/misc/mic/host/mic_virtio.c b/drivers/misc/mic/host/mic_virtio.c > new file mode 100644 > index 0000000..7282e12 > --- /dev/null > +++ b/drivers/misc/mic/host/mic_virtio.c > @@ -0,0 +1,703 @@ > +/* > + * Intel MIC Platform Software Stack (MPSS) > + * > + * Copyright(c) 2013 Intel Corporation. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. 
> + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 > + * USA. > + * > + * The full GNU General Public License is included in this distribution in > + * the file called "COPYING". > + * > + * Intel MIC Host driver. > + * > + */ > +#include <linux/module.h> > +#include <linux/fs.h> > +#include <linux/pci.h> > +#include <linux/interrupt.h> > +#include <linux/firmware.h> > +#include <linux/completion.h> > +#include <linux/poll.h> > +#include <linux/sched.h> > +#include <uapi/linux/virtio_ids.h> > +#include <uapi/linux/virtio_net.h> > + > +#include "mic_common.h" > +#include "mic_virtio.h" > + > +/* See comments in vhost.c for explanation of next_desc() */ > +static unsigned next_desc(struct vring_desc *desc) > +{ > + unsigned int next; > + > + if (!(le16_to_cpu(desc->flags) & VRING_DESC_F_NEXT)) > + return -1U; > + next = le16_to_cpu(desc->next); > + read_barrier_depends(); > + return next; > +} > + > +/* > + * Central API which initiates the copies across the PCIe bus. > + */ > +static int mic_virtio_copy_desc_buf(struct mic_vdev *mvdev, > + struct vring_desc *desc, > + void __user *ubuf, u32 rem_len, u32 doff, u32 *out_len) > +{ > + void __iomem *dbuf; > + int err; > + u32 len = le32_to_cpu(desc->len); > + u16 flags = le16_to_cpu(desc->flags); > + u64 addr = le64_to_cpu(desc->addr); > + > + dbuf = mvdev->mdev->aper.va + addr + doff; > + *out_len = min_t(u32, rem_len, len - doff); > + if (flags & VRING_DESC_F_WRITE) { > + /* > + * We are copying to IO below and the subsequent > + * wmb(..) 
ensures that the stores have completed.

It doesn't - you would need to read card memory for this. What wmb does is order previous stores wrt subsequent stores. So I am guessing you really want to move this wmb to where avail ring is written.

> + * We should ideally use something like > + * copy_from_user_toio(..) if it existed. > + */ > + if (copy_from_user(dbuf, ubuf, *out_len)) { > + err = -EFAULT; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, err); > + goto err; > + } > + mvdev->out_bytes += *out_len; > + wmb(); > + } else { > + /* > + * We are copying from IO below and the subsequent > + * rmb(..) ensures that the loads have completed. > + * We should ideally use something like > + * copy_to_user_fromio(..) if it existed. > + */ > + if (copy_to_user(ubuf, dbuf, *out_len)) { > + err = -EFAULT; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, err); > + goto err; > + } > + mvdev->in_bytes += *out_len; > + rmb(); > + } > + err = 0; > +err: > + dev_dbg(mic_dev(mvdev), > + "%s: ubuf %p dbuf %p rem_len 0x%x *out_len 0x%x " > + "dlen 0x%x desc->writable %d err %d\n", > + __func__, ubuf, dbuf, rem_len, *out_len, > + len, flags & VRING_DESC_F_WRITE, err); > + return err; > +} > + > +/* Iterate over the virtio descriptor chain and issue the copies */ > +static int _mic_virtio_copy(struct mic_vdev *mvdev, > + struct mic_copy *copy, bool chain) > +{ > + struct mic_vring *vr; > + struct vring_desc *desc; > + u32 desc_idx = copy->desc_idx; > + int ret = 0, iovcnt = copy->iovcnt; > + struct iovec iov; > + struct iovec __user *u_iov = copy->iov; > + u32 rem_ulen, rem_dlen, len, doff; > + void __user *ubuf = NULL; > + > + vr = &mvdev->vring[copy->vr_idx]; > + desc = vr->vr.desc; > + copy->out_len = 0; > + rem_dlen = le32_to_cpu(desc[desc_idx].len); > + rem_ulen = 0; > + doff = 0; > + > + while (iovcnt && desc_idx != -1U) { > + if (!rem_ulen) { > + /* Copy over a new iovec */ > + ret = copy_from_user(&iov, u_iov, sizeof(*u_iov)); > + if 
(ret) { > + ret = -EINVAL; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, ret); > + break; > + } > + rem_ulen = iov.iov_len; > + ubuf = iov.iov_base; > + } > + ret = mic_virtio_copy_desc_buf(mvdev, > + &desc[desc_idx], > + ubuf, rem_ulen, doff, &len); > + if (ret) > + break; > + > + dev_dbg(mic_dev(mvdev), > + "%s: desc_idx 0x%x rem_ulen 0x%x rem_dlen 0x%x " > + "doff 0x%x dlen 0x%x\n", > + __func__, desc_idx, rem_ulen, rem_dlen, > + doff, le32_to_cpu(desc[desc_idx].len)); > + > + copy->out_len += len; > + rem_ulen -= len; > + rem_dlen -= len; > + ubuf += len; > + doff += len; > + /* One iovec is now completed */ > + if (!rem_ulen) { > + iovcnt--; > + u_iov++; > + } > + /* One descriptor is now completed */ > + if (!rem_dlen) { > + desc_idx = next_desc(&desc[desc_idx]); > + if (desc_idx != -1U) { > + rem_dlen = le32_to_cpu(desc[desc_idx].len); > + doff = 0; > + }

Looks like desc_idx here can fall outside the range of the desc array.

> + } > + } > + /* > + * Return EINVAL if a chain should be processed, but we have run out > + * of iovecs while there are readable descriptors remaining in the > + * chain. 
> + */ > + if (chain && desc_idx != -1U && > + !(le16_to_cpu(desc->flags) & VRING_DESC_F_WRITE)) { > + dev_err(mic_dev(mvdev), "%s not enough iovecs\n", __func__); > + ret = -EINVAL; > + } > + return ret; > +} > + > +static inline void > +mic_update_local_avail(struct mic_vdev *mvdev, u8 vr_idx) > +{ > + struct mic_vring *vr = &mvdev->vring[vr_idx]; > + vr->info->avail_idx++; > +} > + > +/* Update the used ring */ > +static void mic_update_used(struct mic_vdev *mvdev, u8 vr_idx, > + u32 used_desc_idx, u32 used_len) > +{ > + struct mic_vring *vr = &mvdev->vring[vr_idx]; > + u16 used_idx; > + s8 db = mvdev->dc->h2c_vdev_db; > + > + used_idx = vr->info->used_idx & (vr->vr.num - 1); > + iowrite32(used_desc_idx, &vr->vr.used->ring[used_idx].id); > + iowrite32(used_len, &vr->vr.used->ring[used_idx].len); > + wmb(); > + iowrite16(++vr->info->used_idx, &vr->vr.used->idx); > + dev_dbg(mic_dev(mvdev), > + "%s: ======== vr_idx %d used_idx 0x%x used_len 0x%x ========\n", > + __func__, vr_idx, used_desc_idx, used_len); > + wmb();Are you trying to make sure avail flags read below is ordered with respect to used index write here? 
If yes, you need an mb(), not just a wmb().

> + /* Check if the remote device wants us to suppress interrupts */ > + if (le16_to_cpu(vr->vr.avail->flags) & VRING_AVAIL_F_NO_INTERRUPT) > + return; > + if (db != -1) > + mvdev->mdev->ops->send_intr(mvdev->mdev, db); > +} > + > +static inline int verify_copy_args(struct mic_vdev *mvdev, > + struct mic_copy *request) > +{ > + if (request->vr_idx >= mvdev->dd->num_vq) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -EINVAL); > + return -EINVAL; > + } > + > + if (request->desc_idx >= > + le16_to_cpu(mic_vq_config(mvdev->dd)->num)) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -EINVAL); > + return -EINVAL; > + } > + > + return 0; > +} > + > +#define PROCESS_DESC_CHAIN true > + > +/* Copy a specified number of virtio descriptors in a chain */ > +int mic_virtio_copy_desc(struct mic_vdev *mvdev, > + struct mic_copy_desc *request) > +{ > + int err; > + struct mutex *vr_mutex; > + > + err = verify_copy_args(mvdev, &request->copy); > + if (err) > + return err; > + > + vr_mutex = &mvdev->vr_mutex[request->copy.vr_idx]; > + mutex_lock(vr_mutex); > + if (!mic_vdevup(mvdev)) { > + err = -ENODEV; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, err); > + goto err; > + } > + err = _mic_virtio_copy(mvdev, &request->copy, !PROCESS_DESC_CHAIN); > + if (err) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, err); > + } else if (request->used_desc_idx != -1) { > + if (request->used_desc_idx >= > + le16_to_cpu(mic_vq_config(mvdev->dd)->num)) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -EINVAL); > + err = -EINVAL; > + goto err; > + } > + mic_update_local_avail(mvdev, request->copy.vr_idx); > + mic_update_used(mvdev, request->copy.vr_idx, > + request->used_desc_idx, request->used_len); > + } > +err: > + mutex_unlock(vr_mutex); > + return err; > +} > + > +/* Copy a chain of virtio descriptors */ > +int mic_virtio_copy_chain(struct
mic_vdev *mvdev, > + struct mic_copy *request) > +{ > + int err; > + struct mutex *vr_mutex; > + > + err = verify_copy_args(mvdev, request); > + if (err) > + return err; > + > + vr_mutex = &mvdev->vr_mutex[request->vr_idx]; > + mutex_lock(vr_mutex); > + if (!mic_vdevup(mvdev)) { > + err = -ENODEV; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, err); > + goto err; > + } > + err = _mic_virtio_copy(mvdev, request, PROCESS_DESC_CHAIN); > + if (!err) { > + mic_update_local_avail(mvdev, request->vr_idx); > + mic_update_used(mvdev, request->vr_idx, > + request->desc_idx, request->out_len); > + } else > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, err); > +err: > + mutex_unlock(vr_mutex); > + return err; > +} > + > +static void mic_virtio_init_post(struct mic_vdev *mvdev) > +{ > + struct mic_vqconfig *vqconfig = mic_vq_config(mvdev->dd); > + int i; > + > + for (i = 0; i < mvdev->dd->num_vq; i++) { > + if (!le64_to_cpu(vqconfig[i].used_address)) { > + dev_warn(mic_dev(mvdev), "used_address zero??\n"); > + continue; > + } > + mvdev->vring[i].vr.used = > + mvdev->mdev->aper.va + > + le64_to_cpu(vqconfig[i].used_address); > + } > + > + smp_wmb();

Looking at smp_XX macros, here and elsewhere this driver only has smp_wmb. This seems to violate the SMP barrier pairing rules in Documentation/memory-barriers.txt.

> + mvdev->dc->used_address_updated = 0; > + > + dev_info(mic_dev(mvdev), "%s: device type %d LINKUP\n", > + __func__, mvdev->virtio_id); > +} > + > +static inline void mic_virtio_device_reset(struct mic_vdev *mvdev) > +{ > + int i; > + > + dev_info(mic_dev(mvdev), "%s: status %d device type %d RESET\n", > + __func__, mvdev->dd->status, mvdev->virtio_id); > + > + for (i = 0; i < mvdev->dd->num_vq; i++) > + /* > + * Avoid lockdep false positive. The + 1 is for the mic > + * mutex which is held in the reset devices code path. 
> + */ > + mutex_lock_nested(&mvdev->vr_mutex[i], i + 1); > + > + /* 0 status means "reset" */ > + mvdev->dd->status = 0; > + mvdev->dc->vdev_reset = 0; > + mvdev->dc->host_ack = 1; > + > + for (i = 0; i < mvdev->dd->num_vq; i++) { > + mvdev->vring[i].info->avail_idx = 0; > + mvdev->vring[i].info->used_idx = 0; > + } > + > + for (i = 0; i < mvdev->dd->num_vq; i++) > + mutex_unlock(&mvdev->vr_mutex[i]); > +} > + > +void mic_virtio_reset_devices(struct mic_device *mdev) > +{ > + struct list_head *pos, *tmp; > + struct mic_vdev *mvdev; > + > + dev_info(&mdev->pdev->dev, "%s\n", __func__); > + > + WARN_ON(!mutex_is_locked(&mdev->mic_mutex)); > + list_for_each_safe(pos, tmp, &mdev->vdev_list) { > + mvdev = list_entry(pos, struct mic_vdev, list); > + mic_virtio_device_reset(mvdev); > + mvdev->poll_wake = 1; > + wake_up(&mvdev->waitq); > + } > +} > + > +void mic_bh_handler(struct work_struct *work) > +{ > + struct mic_vdev *mvdev = container_of(work, struct mic_vdev, > + virtio_bh_work); > + > + if (mvdev->dc->used_address_updated) > + mic_virtio_init_post(mvdev); > + > + if (mvdev->dc->vdev_reset) > + mic_virtio_device_reset(mvdev); > + > + mvdev->poll_wake = 1; > + wake_up(&mvdev->waitq); > +} > + > +static irqreturn_t mic_virtio_intr_handler(int irq, void *data) > +{ > + > + struct mic_vdev *mvdev = data; > + struct mic_device *mdev = mvdev->mdev; > + > + mdev->ops->ack_interrupt(mdev); > + schedule_work(&mvdev->virtio_bh_work); > + return IRQ_HANDLED; > +} > + > +int mic_virtio_config_change(struct mic_vdev *mvdev, > + void __user *argp) > +{ > + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake); > + int ret = 0, retry = 100, i; > + struct mic_bootparam *bootparam = mvdev->mdev->dp; > + s8 db = bootparam->h2c_config_db; > + > + mutex_lock(&mvdev->mdev->mic_mutex); > + for (i = 0; i < mvdev->dd->num_vq; i++) > + mutex_lock_nested(&mvdev->vr_mutex[i], i + 1); > + > + if (db == -1 || mvdev->dd->type == -1) { > + ret = -EIO; > + goto exit; > + } > + > + if 
(copy_from_user(mic_vq_configspace(mvdev->dd), > + argp, mvdev->dd->config_len)) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -EFAULT); > + ret = -EFAULT; > + goto exit; > + } > + mvdev->dc->config_change = MIC_VIRTIO_PARAM_CONFIG_CHANGED; > + smp_wmb(); > + mvdev->mdev->ops->send_intr(mvdev->mdev, db); > + > + for (i = retry; i--;) { > + ret = wait_event_timeout(wake, > + mvdev->dc->guest_ack, msecs_to_jiffies(100)); > + if (ret) > + break; > + } > + > + dev_info(mic_dev(mvdev), > + "%s %d retry: %d\n", __func__, __LINE__, retry); > + mvdev->dc->config_change = 0; > + mvdev->dc->guest_ack = 0; > +exit: > + for (i = 0; i < mvdev->dd->num_vq; i++) > + mutex_unlock(&mvdev->vr_mutex[i]); > + mutex_unlock(&mvdev->mdev->mic_mutex); > + return ret; > +} > + > +static int mic_copy_dp_entry(struct mic_vdev *mvdev, > + void __user *argp, > + __u8 *type, > + struct mic_device_desc **devpage) > +{ > + struct mic_device *mdev = mvdev->mdev; > + struct mic_device_desc dd, *dd_config, *devp; > + struct mic_vqconfig *vqconfig; > + int ret = 0, i; > + bool slot_found = false; > + > + if (copy_from_user(&dd, argp, sizeof(dd))) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -EFAULT); > + return -EFAULT; > + } > + > + if (mic_aligned_desc_size(&dd) > MIC_MAX_DESC_BLK_SIZE > + || dd.num_vq > MIC_MAX_VRINGS) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -EINVAL); > + return -EINVAL; > + } > + > + dd_config = kmalloc(mic_desc_size(&dd), GFP_KERNEL); > + if (dd_config == NULL) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -ENOMEM); > + return -ENOMEM; > + } > + if (copy_from_user(dd_config, argp, mic_desc_size(&dd))) { > + ret = -EFAULT; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, ret); > + goto exit; > + } > + > + vqconfig = mic_vq_config(dd_config); > + for (i = 0; i < dd.num_vq; i++) { > + if (le16_to_cpu(vqconfig[i].num) > MIC_MAX_VRING_ENTRIES) { > + ret 
= -EINVAL; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, ret); > + goto exit; > + } > + } > + > + /* Find the first free device page entry */ > + for (i = mic_aligned_size(struct mic_bootparam); > + i < MIC_DP_SIZE - mic_total_desc_size(dd_config); > + i += mic_total_desc_size(devp)) { > + devp = mdev->dp + i; > + if (devp->type == 0 || devp->type == -1) { > + slot_found = true; > + break; > + } > + } > + if (!slot_found) { > + ret = -EINVAL; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, ret); > + goto exit; > + } > + > + /* Save off the type before doing the memcpy. Type will be set in the > + * end after completing all initialization for the new device */ > + *type = dd_config->type; > + dd_config->type = 0; > + memcpy(devp, dd_config, mic_desc_size(dd_config)); > + > + *devpage = devp; > +exit: > + kfree(dd_config); > + return ret; > +} > + > +static void mic_init_device_ctrl(struct mic_vdev *mvdev, > + struct mic_device_desc *devpage) > +{ > + struct mic_device_ctrl *dc; > + > + dc = mvdev->dc = (void *)devpage + mic_aligned_desc_size(devpage); > + > + dc->config_change = 0; > + dc->guest_ack = 0; > + dc->vdev_reset = 0; > + dc->host_ack = 0; > + dc->used_address_updated = 0; > + dc->c2h_vdev_db = -1; > + dc->h2c_vdev_db = -1; > +} > + > +int mic_virtio_add_device(struct mic_vdev *mvdev, > + void __user *argp) > +{ > + struct mic_device *mdev = mvdev->mdev; > + struct mic_device_desc *dd; > + struct mic_vqconfig *vqconfig; > + int vr_size, i, j, ret; > + u8 type; > + s8 db; > + char irqname[10]; > + struct mic_bootparam *bootparam = mdev->dp; > + u16 num; > + > + mutex_lock(&mdev->mic_mutex); > + > + ret = mic_copy_dp_entry(mvdev, argp, &type, &dd); > + if (ret) { > + mutex_unlock(&mdev->mic_mutex); > + return ret; > + } > + > + mic_init_device_ctrl(mvdev, dd); > + > + mvdev->dd = dd; > + mvdev->virtio_id = type; > + vqconfig = mic_vq_config(dd); > + INIT_WORK(&mvdev->virtio_bh_work, mic_bh_handler); > + > + for 
(i = 0; i < dd->num_vq; i++) { > + struct mic_vring *vr = &mvdev->vring[i]; > + num = le16_to_cpu(vqconfig[i].num); > + mutex_init(&mvdev->vr_mutex[i]); > + vr_size = PAGE_ALIGN(vring_size(num, MIC_VIRTIO_RING_ALIGN) + > + sizeof(struct _mic_vring_info)); > + vr->va = (void *) > + __get_free_pages(GFP_KERNEL | __GFP_ZERO, > + get_order(vr_size)); > + if (!vr->va) { > + ret = -ENOMEM; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, ret); > + goto err; > + } > + vr->len = vr_size; > + vr->info = vr->va + vring_size(num, MIC_VIRTIO_RING_ALIGN); > + vr->info->magic = MIC_MAGIC + mvdev->virtio_id + i; > + vqconfig[i].address = mic_map_single(mdev, > + vr->va, vr_size); > + if (mic_map_error(vqconfig[i].address)) { > + free_pages((unsigned long)vr->va, > + get_order(vr_size)); > + ret = -ENOMEM; > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, ret); > + goto err; > + } > + vqconfig[i].address = cpu_to_le64(vqconfig[i].address); > + > + vring_init(&vr->vr, num, > + vr->va, MIC_VIRTIO_RING_ALIGN); > + > + dev_dbg(&mdev->pdev->dev, > + "%s %d index %d va %p info %p vr_size 0x%x\n", > + __func__, __LINE__, i, vr->va, vr->info, vr_size); > + } > + > + snprintf(irqname, sizeof(irqname), > + "mic%dvirtio%d", mdev->id, mvdev->virtio_id); > + mvdev->virtio_db = mic_next_db(mdev); > + mvdev->virtio_cookie = mic_request_irq(mdev, mic_virtio_intr_handler, > + irqname, mvdev, mvdev->virtio_db, MIC_INTR_DB); > + if (IS_ERR(mvdev->virtio_cookie)) { > + ret = PTR_ERR(mvdev->virtio_cookie); > + dev_dbg(&mdev->pdev->dev, "request irq failed\n"); > + goto err; > + } > + > + mvdev->dc->c2h_vdev_db = mvdev->virtio_db; > + > + list_add_tail(&mvdev->list, &mdev->vdev_list); > + /* > + * Now that we are completely initialized, set the type to "commit" > + * the addition of the new device. > + * For x86 we only need a compiler barrier before dd->type. For other > + * platforms we need smp_wmb(..) 
since we are writing to system memory > + * and type needs to be visible to all CPUs or MIC. > + */ > + smp_wmb(); > + dd->type = type; > + > + dev_info(&mdev->pdev->dev, "Added virtio device id %d\n", dd->type); > + > + db = bootparam->h2c_config_db; > + if (db != -1) > + mdev->ops->send_intr(mdev, db); > + mutex_unlock(&mdev->mic_mutex); > + return 0; > +err: > + vqconfig = mic_vq_config(dd); > + for (j = 0; j < i; j++) { > + mic_unmap_single(mdev, le64_to_cpu(vqconfig[j].address), > + mvdev->vring[j].len); > + free_pages((unsigned long)mvdev->vring[j].va, > + get_order(mvdev->vring[j].len)); > + } > + mutex_unlock(&mdev->mic_mutex); > + return ret; > +} > + > +void mic_virtio_del_device(struct mic_vdev *mvdev) > +{ > + struct list_head *pos, *tmp; > + struct mic_vdev *tmp_mvdev; > + struct mic_device *mdev = mvdev->mdev; > + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wake); > + int i, ret, retry = 100; > + struct mic_vqconfig *vqconfig; > + struct mic_bootparam *bootparam = mdev->dp; > + s8 db; > + > + mutex_lock(&mdev->mic_mutex); > + db = bootparam->h2c_config_db; > + if (db == -1) > + goto skip_hot_remove; > + dev_info(&mdev->pdev->dev, > + "Requesting hot remove id %d\n", mvdev->virtio_id); > + mvdev->dc->config_change = MIC_VIRTIO_PARAM_DEV_REMOVE; > + smp_wmb(); > + mdev->ops->send_intr(mdev, db); > + for (i = retry; i--;) { > + ret = wait_event_timeout(wake, > + mvdev->dc->guest_ack, msecs_to_jiffies(100)); > + if (ret) > + break; > + } > + dev_info(&mdev->pdev->dev, > + "Device id %d config_change %d guest_ack %d\n", > + mvdev->virtio_id, mvdev->dc->config_change, > + mvdev->dc->guest_ack); > + mvdev->dc->config_change = 0; > + mvdev->dc->guest_ack = 0; > +skip_hot_remove: > + mic_free_irq(mdev, mvdev->virtio_cookie, mvdev); > + flush_work(&mvdev->virtio_bh_work); > + vqconfig = mic_vq_config(mvdev->dd); > + for (i = 0; i < mvdev->dd->num_vq; i++) { > + mic_unmap_single(mdev, le64_to_cpu(vqconfig[i].address), > + mvdev->vring[i].len); > + free_pages((unsigned 
long)mvdev->vring[i].va, > + get_order(mvdev->vring[i].len)); > + } > + > + list_for_each_safe(pos, tmp, &mdev->vdev_list) { > + tmp_mvdev = list_entry(pos, struct mic_vdev, list); > + if (tmp_mvdev == mvdev) { > + list_del(pos); > + dev_info(&mdev->pdev->dev, > + "Removing virtio device id %d\n", > + mvdev->virtio_id); > + break; > + } > + } > + mvdev->dd->type = -1; > + mutex_unlock(&mdev->mic_mutex); > +} > diff --git a/drivers/misc/mic/host/mic_virtio.h b/drivers/misc/mic/host/mic_virtio.h > new file mode 100644 > index 0000000..1e2a439 > --- /dev/null > +++ b/drivers/misc/mic/host/mic_virtio.h > @@ -0,0 +1,108 @@ > +/* > + * Intel MIC Platform Software Stack (MPSS) > + * > + * Copyright(c) 2013 Intel Corporation. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 > + * USA. > + * > + * The full GNU General Public License is included in this distribution in > + * the file called "COPYING". > + * > + * Intel MIC Host driver. > + * > + */ > +#ifndef MIC_VIRTIO_H > +#define MIC_VIRTIO_H > + > +#include <linux/types.h> > +#include <linux/virtio_ring.h> > +#include <linux/virtio_config.h> > + > +#include <linux/mic_ioctl.h> > + > +/* > + * Note on endianness. > + * 1. Host can be both BE or LE > + * 2. Guest/card is LE. Host uses le_to_cpu to access desc/avail > + * rings and ioreadXX/iowriteXX to access used ring. > + * 3. 
Device page exposed by host to guest contains LE values. Guest > + * accesses these using ioreadXX/iowriteXX etc. This way in general we > + * obey the virtio spec according to which guest works with native > + * endianness and host is aware of guest endianness and does all > + * required endianness conversion. > + * 4. Data provided from user space to guest (in ADD_DEVICE and > + * CONFIG_CHANGE ioctl's) is not interpreted by the driver and should be > + * in guest endianness. > + */ > + > +struct mic_vdev { > + int virtio_id; > + wait_queue_head_t waitq; > + struct mic_device *mdev; > + int poll_wake; > + unsigned long out_bytes; > + unsigned long in_bytes; > + struct mic_vring vring[MIC_MAX_VRINGS]; > + struct work_struct virtio_bh_work; > + struct mutex vr_mutex[MIC_MAX_VRINGS]; > + struct mic_device_desc *dd; > + struct mic_device_ctrl *dc; > + struct list_head list; > + int virtio_db; > + struct mic_irq *virtio_cookie; > +}; > + > +void mic_virtio_uninit(struct mic_device *mdev); > +int mic_virtio_add_device(struct mic_vdev *mvdev, > + void __user *argp); > +void mic_virtio_del_device(struct mic_vdev *mvdev); > +int mic_virtio_config_change(struct mic_vdev *mvdev, > + void __user *argp); > +int mic_virtio_copy_desc(struct mic_vdev *mvdev, > + struct mic_copy_desc *request); > +void mic_virtio_reset_devices(struct mic_device *mdev); > +int mic_virtio_copy_chain(struct mic_vdev *mvdev, > + struct mic_copy *request); > +void mic_bh_handler(struct work_struct *work); > + > +static inline struct device *mic_dev(struct mic_vdev *mvdev) > +{ > + return &mvdev->mdev->pdev->dev; > +} > + > +static inline int mic_vdev_inited(struct mic_vdev *mvdev) > +{ > + /* Device has not been created yet */ > + if (!mvdev->dd || !mvdev->dd->type) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + __func__, __LINE__, -EINVAL); > + return -EINVAL; > + } > + > + /* Device has been removed/deleted */ > + if (mvdev->dd->type == -1) { > + dev_err(mic_dev(mvdev), "%s %d err %d\n", > + 
__func__, __LINE__, -ENODEV); > + return -ENODEV; > + } > + > + return 0; > +} > + > +static inline bool mic_vdevup(struct mic_vdev *mvdev) > +{ > + return !!mvdev->dd->status; > +} > +#endif > diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild > index 8f985dd..1579aab 100644 > --- a/include/uapi/linux/Kbuild > +++ b/include/uapi/linux/Kbuild > @@ -240,6 +240,7 @@ header-y += mei.h > header-y += mempolicy.h > header-y += meye.h > header-y += mic_common.h > +header-y += mic_ioctl.h > header-y += mii.h > header-y += minix_fs.h > header-y += mman.h > diff --git a/include/uapi/linux/mic_common.h b/include/uapi/linux/mic_common.h > index b8edede..2576d0b 100644 > --- a/include/uapi/linux/mic_common.h > +++ b/include/uapi/linux/mic_common.h > @@ -26,7 +26,61 @@ > #ifndef __MIC_COMMON_H_ > #define __MIC_COMMON_H_ > > -#include <linux/types.h> > +#include <linux/virtio_ring.h> > + > +#ifndef __KERNEL__ > +#define ALIGN(a, x) (((a) + (x) - 1) & ~((x) - 1)) > +#define __aligned(x) __attribute__ ((aligned(x))) > +#endif > + > +#define mic_aligned_size(x) ALIGN(sizeof(x), 8) > + > + > +/** > + * struct mic_device_desc: Virtio device information shared between the > + * virtio driver and userspace backend > + * > + * @type: Device type: console/network/disk etc. Type 0/-1 terminates. > + * @num_vq: Number of virtqueues. > + * @feature_len: Number of bytes of feature bits. Multiply by 2: one for > + host features and one for guest acknowledgements. > + * @config_len: Number of bytes of the config array after virtqueues. > + * @status: A status byte, written by the Guest. > + * @config: Start of the following variable length config. > + */ > +struct mic_device_desc { > + __s8 type; > + __u8 num_vq; > + __u8 feature_len; > + __u8 config_len; > + __u8 status; > + __u64 config[0]; > +} __aligned(8); > + > +/** > + * struct mic_device_ctrl: Per virtio device information in the device page > + * used internally by the host and card side drivers. 
> + * > + * @vdev: Used for storing MIC vdev information by the guest. > + * @config_change: Set to 1 by host when a config change is requested. > + * @vdev_reset: Set to 1 by guest to indicate virtio device has been reset. > + * @guest_ack: Set to 1 by guest to ack a command. > + * @host_ack: Set to 1 by host to ack a command. > + * @used_address_updated: Set to 1 by guest when the used address should be > + * updated. > + * @c2h_vdev_db: The doorbell number to be used by guest. Set by host. > + * @h2c_vdev_db: The doorbell number to be used by host. Set by guest. > + */ > +struct mic_device_ctrl { > + __u64 vdev; > + __u8 config_change; > + __u8 vdev_reset; > + __u8 guest_ack; > + __u8 host_ack; > + __u8 used_address_updated; > + __s8 c2h_vdev_db; > + __s8 h2c_vdev_db; > +} __aligned(8); > > /** > * struct mic_bootparam: Virtio device independent information in device page > @@ -47,6 +101,115 @@ struct mic_bootparam { > __u8 shutdown_card; > } __aligned(8); > > +/** > + * struct mic_device_page: High level representation of the device page > + * > + * @bootparam: The bootparam structure is used for sharing information and > + * status updates between MIC host and card drivers. > + * @desc: Array of MIC virtio device descriptors. > + */ > +struct mic_device_page { > + struct mic_bootparam bootparam; > + struct mic_device_desc desc[0]; > +}; > +/** > + * struct mic_vqconfig: This is how we expect the device configuration field > + * for a virtqueue to be laid out in config space. > + * > + * @address: Guest/MIC physical address of the virtio ring > + * (avail and desc rings) > + * @used_address: Guest/MIC physical address of the used ring > + * @num: The number of entries in the virtio_ring > + */ > +struct mic_vqconfig { > + __u64 address; > + __u64 used_address; > + __u16 num; > +} __aligned(8); > + > +/* The alignment to use between consumer and producer parts of vring. > + * This is pagesize for historical reasons. 
*/ > +#define MIC_VIRTIO_RING_ALIGN 4096 > + > +#define MIC_MAX_VRINGS 4 > +#define MIC_VRING_ENTRIES 128 > + > +/* > + * Max vring entries (power of 2) to ensure desc and avail rings > + * fit in a single page > + */ > +#define MIC_MAX_VRING_ENTRIES 128 > + > +/** > + * Max size of the desc block in bytes: includes: > + * - struct mic_device_desc > + * - struct mic_vqconfig (num_vq of these) > + * - host and guest features > + * - virtio device config space > + */ > +#define MIC_MAX_DESC_BLK_SIZE 256 > + > +/** > + * struct _mic_vring_info - Host vring info exposed to userspace backend > + * > + * @avail_idx: host avail idx > + * @used_idx: host used idx > + * @magic: A magic debug cookie. > + */ > +struct _mic_vring_info { > + __u16 avail_idx; > + __u16 used_idx; > + int magic; > +}; > + > +/** > + * struct mic_vring - Vring information. > + * > + * @vr: The virtio ring. > + * @info: Host vring information exposed to the card. > + * @va: The va for the buffer allocated for vr and info. > + * @len: The length of the buffer required for allocating vr and info. 
> + */ > +struct mic_vring { > + struct vring vr; > + struct _mic_vring_info *info; > + void *va; > + int len; > +}; > + > +#define mic_aligned_desc_size(d) ALIGN(mic_desc_size(d), 8) > + > +#ifndef INTEL_MIC_CARD > +static inline unsigned mic_desc_size(const struct mic_device_desc *desc) > +{ > + return mic_aligned_size(*desc) > + + desc->num_vq * mic_aligned_size(struct mic_vqconfig) > + + desc->feature_len * 2 > + + desc->config_len; > +} > + > +static inline struct mic_vqconfig * > +mic_vq_config(const struct mic_device_desc *desc) > +{ > + return (struct mic_vqconfig *)(desc + 1); > +} > + > +static inline __u8 *mic_vq_features(const struct mic_device_desc *desc) > +{ > + return (__u8 *)(mic_vq_config(desc) + desc->num_vq); > +} > + > +static inline __u8 *mic_vq_configspace(const struct mic_device_desc *desc) > +{ > + return mic_vq_features(desc) + desc->feature_len * 2; > +} > +static inline unsigned mic_total_desc_size(struct mic_device_desc *desc) > +{ > + return mic_aligned_desc_size(desc) + > + mic_aligned_size(struct mic_device_ctrl); > +} > +#endif > + > /* Device page size */ > #define MIC_DP_SIZE 4096 > > diff --git a/include/uapi/linux/mic_ioctl.h b/include/uapi/linux/mic_ioctl.h > new file mode 100644 > index 0000000..02e1518 > --- /dev/null > +++ b/include/uapi/linux/mic_ioctl.h > @@ -0,0 +1,104 @@ > +/* > + * Intel MIC Platform Software Stack (MPSS) > + * > + * Copyright(c) 2013 Intel Corporation. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License, version 2, as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * General Public License for more details. 
> + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 > + * USA. > + * > + * The full GNU General Public License is included in this distribution in > + * the file called "COPYING". > + * > + * Intel MIC Host driver. > + * > + */ > +#ifndef _MIC_IOCTL_H_ > +#define _MIC_IOCTL_H_ > + > +#include <linux/mic_common.h> > + > +/* > + * mic_copy - MIC virtio descriptor copy. > + * > + * @iov: An array of IOVEC structures containing user space buffers. > + * @iovcnt: Number of IOVEC structures in iov. > + * @vr_idx: The vring index. > + * @desc_idx: The starting desc index. > + * @out_cookie: A cookie returned by the driver to identify this copy. > + * @out_len: The aggregate of the total length written to or read from > + * the virtio device. > + */ > +struct mic_copy { > +#ifdef __KERNEL__ > + struct iovec __user *iov; > +#else > + struct iovec *iov; > +#endif > + int iovcnt; > + __u8 vr_idx; > + __u32 desc_idx; > + __u64 out_cookie; > + __u32 out_len; > +}; > + > +/* > + * mic_copy_desc - MIC virtio copy. > + * > + * @copy - MIC virtio descriptor copy. > + * @used_desc_idx - The desc index to update the used ring with. > + * The used index is not updated if the used_idx is -1. > + * @used_len - The length to update the used ring with. 
> + */ > +struct mic_copy_desc { > + struct mic_copy copy; > + __u32 used_desc_idx; > + __u32 used_len; > +}; > + > +/* > + * Add a new virtio device > + * The (struct mic_device_desc *) pointer points to a device page entry > + * for the virtio device consisting of: > + * - struct mic_device_desc > + * - struct mic_vqconfig (num_vq of these) > + * - host and guest features > + * - virtio device config space > + * The total size referenced by the pointer should equal the size returned > + * by desc_size() in mic_common.h > + */ > +#define MIC_VIRTIO_ADD_DEVICE _IOWR('s', 1, struct mic_device_desc *) > + > +/* > + * Copy the number of entries in the iovec and update the used index > + * if requested by the user. > + */ > +#define MIC_VIRTIO_COPY_DESC _IOWR('s', 2, struct mic_copy_desc *) > + > +/* > + * Copy iovec entries upto the length of the chain. The number of entries > + * must be >= the length of the chain else -1 is returned and errno set > + * to EINVAL. > + */ > +#define MIC_VIRTIO_COPY_CHAIN _IOWR('s', 3, struct mic_copy *) > + > +/* > + * Notify virtio device of a config change > + * The (__u8 *) pointer points to config space values for the device > + * as they should be written into the device page. The total size > + * referenced by the pointer should equal the config_len field of struct > + * mic_device_desc. > + */ > +#define MIC_VIRTIO_CONFIG_CHANGE _IOWR('s', 5, __u8 *) > + > +#endif > -- > 1.8.2.1
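The size rule stated for MIC_VIRTIO_ADD_DEVICE in the quoted header can be sketched in userspace C. This is only a minimal sketch under stated assumptions, not the driver code: the struct layouts and the size formula are copied from the quoted mic_common.h, with <stdint.h> types standing in for __u8/__u64 and a flexible array member standing in for config[0]. The names ALIGN_UP, desc_size, aligned_desc_size and desc_fits are hypothetical, and the aligned attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same rounding as ALIGN() in the quoted header; x must be a power of 2. */
#define ALIGN_UP(a, x) (((a) + (x) - 1) & ~((size_t)(x) - 1))

#define MAX_DESC_BLK_SIZE 256	/* MIC_MAX_DESC_BLK_SIZE in the patch */
#define MAX_VRINGS 4		/* MIC_MAX_VRINGS in the patch */

/* Mirrors struct mic_device_desc from the quoted header. */
struct device_desc {
	int8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
	uint8_t status;
	uint64_t config[];
} __attribute__((aligned(8)));

/* Mirrors struct mic_vqconfig from the quoted header. */
struct vqconfig {
	uint64_t address;
	uint64_t used_address;
	uint16_t num;
} __attribute__((aligned(8)));

/* Header + per-virtqueue configs + host and guest features + config space. */
static inline size_t desc_size(const struct device_desc *d)
{
	assert(d != NULL);
	return ALIGN_UP(sizeof(*d), 8)
		+ d->num_vq * ALIGN_UP(sizeof(struct vqconfig), 8)
		+ (size_t)d->feature_len * 2
		+ d->config_len;
}

/* Size of the descriptor block rounded up to 8 bytes. */
static inline size_t aligned_desc_size(const struct device_desc *d)
{
	return ALIGN_UP(desc_size(d), 8);
}

/* Same two limits that mic_copy_dp_entry() checks before allocating. */
static inline int desc_fits(const struct device_desc *d)
{
	return d->num_vq <= MAX_VRINGS &&
	       aligned_desc_size(d) <= MAX_DESC_BLK_SIZE;
}
```

For a hypothetical device with two virtqueues, 4 bytes of feature bits and a 6-byte config space, this gives 8 + 2 * 24 + 8 + 6 = 70 bytes on a typical x86-64 ABI, which rounds up to 72 and stays within the 256-byte limit.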
Greg Kroah-Hartman
2013-Aug-01 01:44 UTC
[PATCH 1/5] Intel MIC Host Driver for X100 family.
On Wed, Jul 24, 2013 at 08:31:32PM -0700, Sudeep Dutt wrote:
> This patch enables the following:
> a) Initializes the Intel MIC X100 PCIe devices.
> b) Boots and shuts down the card via sysfs entries.
> c) Allocates and maps a device page for communication with the
>    card driver and updates the device page address via scratchpad
>    registers.
> d) Provides sysfs entries for family, stepping, state, shutdown
>    status, kernel command line, IP address, ramdisk and log buffer
>    information.

As you are creating sysfs entries, you also have to create
Documentation/ABI/ entries in the kernel source at the same time.

Please do this in your next version of this patch.

thanks,

greg k-h
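For readers unfamiliar with the request above, a Documentation/ABI/ entry is a short plain-text stanza per sysfs attribute. The following is only a sketch of the general shape, assuming the field names described in Documentation/ABI/README; every value in angle brackets is a placeholder, not something taken from this patch series.

```
What:		<sysfs path of the attribute, e.g. the card "state" entry>
Date:		<month and year the interface was added>
KernelVersion:	<kernel version that first carries it>
Contact:	<maintainer name and email>
Description:
		<what reading or writing the attribute does, and the
		values it accepts or reports>
```

One stanza of this kind would be expected for each of the sysfs entries listed in d) of the patch description.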
Greg Kroah-Hartman
2013-Aug-01 01:45 UTC
[PATCH 1/5] Intel MIC Host Driver for X100 family.
On Wed, Jul 24, 2013 at 08:31:32PM -0700, Sudeep Dutt wrote:
> This patch enables the following:
> a) Initializes the Intel MIC X100 PCIe devices.
> b) Boots and shuts down the card via sysfs entries.
> c) Allocates and maps a device page for communication with the
>    card driver and updates the device page address via scratchpad
>    registers.
> d) Provides sysfs entries for family, stepping, state, shutdown
>    status, kernel command line, IP address, ramdisk and log buffer
>    information.

That's a lot to do in one patch, almost 4 thousand lines. Can't you
break this up into some more smaller, logical, and reviewable, pieces?

For example, I have no idea what b) is, and how to separate it from the
things you do for a) and c).

thanks,

greg k-h
Greg Kroah-Hartman
2013-Aug-01 01:46 UTC
[PATCH 0/5] Enable Drivers for Intel MIC X100 Coprocessors.
On Wed, Jul 24, 2013 at 08:31:31PM -0700, Sudeep Dutt wrote:
> An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> card based on the Intel Many Integrated Core (MIC) architecture
> that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> implements the three required standard address spaces i.e. configuration,
> memory and I/O. The host OS loads a device driver as is typical for
> PCIe devices. The card itself runs a bootstrap after reset that
> transfers control to the card OS downloaded from the host driver.
> The card OS as shipped by Intel is a Linux kernel with modifications
> for the X100 devices.
>
> Since it is a PCIe card, it does not have the ability to host hardware
> devices for networking, storage and console. We provide these devices
> on X100 coprocessors thus enabling a self-bootable equivalent environment
> for applications. A key benefit of our solution is that it leverages
> the standard virtio framework for network, disk and console devices,
> though in our case the virtio framework is used across a PCIe bus.
>
> Here is a block diagram of the various components described above. The
> virtio backends are situated on the host rather than the card given better
> single threaded performance for the host compared to MIC and the ability of
> the host to initiate DMA's to/from the card using the MIC DMA engine.
>
>                               |
>         +----------+          |            +----------+
>         | Card OS  |          |            | Host OS  |
>         +----------+          |            +----------+
>                               |
> +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
>     |         |         |     |      |            |         |
>     |         |         |     |Ring 3|            |         |
>     |         |         |     |------|------------|---------|-------
>     +-------------------+     |Ring 0+--------------------------+
>               |               |      | Virtio over PCIe IOCTLs  |
>               |               |      +--------------------------+
>        +--------------+       |                   |
>        |Intel MIC     |       |           +---------------+
>        |Card Driver   |       |           |Intel MIC      |
>        +--------------+       |           |Host Driver    |
>               |               |           +---------------+
>               |               |                   |
>    +-------------------------------------------------------------+
>    |                                                             |
>    |                          PCIe Bus                           |
>    +-------------------------------------------------------------+

That's some nice information, why isn't it in one of the patches you
sent, so that others can read it later on to try to figure out what is
going on with this codebase?

thanks,

greg k-h
Greg Kroah-Hartman
2013-Aug-01 01:51 UTC
[PATCH 1/5] Intel MIC Host Driver for X100 family.
On Wed, Jul 24, 2013 at 08:31:32PM -0700, Sudeep Dutt wrote:
> --- /dev/null
> +++ b/drivers/misc/mic/common/mic_device.h
> @@ -0,0 +1,81 @@
> +/*
> + * Intel MIC Platform Software Stack (MPSS)
> + *
> + * Copyright(c) 2013 Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301
> + * USA.

Ok, it's now your task as an Intel employee to get this paragraph
stricken from the "default template" that GPL code in the kernel,
written by Intel developers, has. It's pointless, and unless you want
to be personally responsible for tracking the address changes of the FSF
for the next 40+ years, not something I'm going to ever accept.

I know you aren't responsible for this, but I'm getting really tired of
pointing this out each-and-every-time I see a file from an Intel
developer. So please work to fix this.

thanks,

greg k-h
Asias He
2013-Aug-01 07:45 UTC
[PATCH 0/5] Enable Drivers for Intel MIC X100 Coprocessors.
Hello Sudeep Dutt,

On Wed, Jul 31, 2013 at 06:46:08PM -0700, Greg Kroah-Hartman wrote:
> On Wed, Jul 24, 2013 at 08:31:31PM -0700, Sudeep Dutt wrote:
> > An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> > card based on the Intel Many Integrated Core (MIC) architecture
> > that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> > implements the three required standard address spaces i.e. configuration,
> > memory and I/O. The host OS loads a device driver as is typical for
> > PCIe devices. The card itself runs a bootstrap after reset that
> > transfers control to the card OS downloaded from the host driver.
> > The card OS as shipped by Intel is a Linux kernel with modifications
> > for the X100 devices.
> >
> > Since it is a PCIe card, it does not have the ability to host hardware
> > devices for networking, storage and console. We provide these devices
> > on X100 coprocessors thus enabling a self-bootable equivalent environment
> > for applications. A key benefit of our solution is that it leverages
> > the standard virtio framework for network, disk and console devices,
> > though in our case the virtio framework is used across a PCIe bus.
> >
> > Here is a block diagram of the various components described above. The
> > virtio backends are situated on the host rather than the card given better
> > single threaded performance for the host compared to MIC and the ability of
> > the host to initiate DMA's to/from the card using the MIC DMA engine.
> >
> >                               |
> >         +----------+          |            +----------+
> >         | Card OS  |          |            | Host OS  |
> >         +----------+          |            +----------+
> >                               |
> > +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> > | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> > | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> > | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> > +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> >     |         |         |     |      |            |         |
> >     |         |         |     |Ring 3|            |         |
> >     |         |         |     |------|------------|---------|-------
> >     +-------------------+     |Ring 0+--------------------------+
> >               |               |      | Virtio over PCIe IOCTLs  |
> >               |               |      +--------------------------+
> >        +--------------+       |                   |
> >        |Intel MIC     |       |           +---------------+
> >        |Card Driver   |       |           |Intel MIC      |
> >        +--------------+       |           |Host Driver    |
> >               |               |           +---------------+
> >               |               |                   |
> >    +-------------------------------------------------------------+
> >    |                                                             |
> >    |                          PCIe Bus                           |
> >    +-------------------------------------------------------------+

Could you send the whole series to
virtualization at lists.linux-foundation.org next time?

> That's some nice information, why isn't it in one of the patches you
> sent, so that others can read it later on to try to figure out what is
> going on with this codebase?
>
> thanks,
>
> greg k-h

-- 
Asias
Sudeep Dutt
2013-Aug-02 00:34 UTC
[PATCH 0/5] Enable Drivers for Intel MIC X100 Coprocessors.
On Wed, 2013-07-31 at 18:46 -0700, Greg Kroah-Hartman wrote:
> On Wed, Jul 24, 2013 at 08:31:31PM -0700, Sudeep Dutt wrote:
> > An Intel MIC X100 device is a PCIe form factor add-in coprocessor
> > card based on the Intel Many Integrated Core (MIC) architecture
> > that runs a Linux OS. It is a PCIe endpoint in a platform and therefore
> > implements the three required standard address spaces i.e. configuration,
> > memory and I/O. The host OS loads a device driver as is typical for
> > PCIe devices. The card itself runs a bootstrap after reset that
> > transfers control to the card OS downloaded from the host driver.
> > The card OS as shipped by Intel is a Linux kernel with modifications
> > for the X100 devices.
> >
> > Since it is a PCIe card, it does not have the ability to host hardware
> > devices for networking, storage and console. We provide these devices
> > on X100 coprocessors thus enabling a self-bootable equivalent environment
> > for applications. A key benefit of our solution is that it leverages
> > the standard virtio framework for network, disk and console devices,
> > though in our case the virtio framework is used across a PCIe bus.
> >
> > Here is a block diagram of the various components described above. The
> > virtio backends are situated on the host rather than the card given better
> > single threaded performance for the host compared to MIC and the ability of
> > the host to initiate DMA's to/from the card using the MIC DMA engine.
> >
> >                               |
> >         +----------+          |            +----------+
> >         | Card OS  |          |            | Host OS  |
> >         +----------+          |            +----------+
> >                               |
> > +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> > | Virtio| |Virtio  | |Virtio| | |Virtio   |  |Virtio  | |Virtio  |
> > | Net   | |Console | |Block | | |Net      |  |Console | |Block   |
> > | Driver| |Driver  | |Driver| | |backend  |  |backend | |backend |
> > +-------+ +--------+ +------+ | +---------+  +--------+ +--------+
> >     |         |         |     |      |            |         |
> >     |         |         |     |Ring 3|            |         |
> >     |         |         |     |------|------------|---------|-------
> >     +-------------------+     |Ring 0+--------------------------+
> >               |               |      | Virtio over PCIe IOCTLs  |
> >               |               |      +--------------------------+
> >        +--------------+       |                   |
> >        |Intel MIC     |       |           +---------------+
> >        |Card Driver   |       |           |Intel MIC      |
> >        +--------------+       |           |Host Driver    |
> >               |               |           +---------------+
> >               |               |                   |
> >    +-------------------------------------------------------------+
> >    |                                                             |
> >    |                          PCIe Bus                           |
> >    +-------------------------------------------------------------+
>
> That's some nice information, why isn't it in one of the patches you
> sent, so that others can read it later on to try to figure out what is
> going on with this codebase?
>

The description in the cover letter is also provided in PATCH 5 of the
series @ Documentation/mic/mic_overview.txt [1].

Thanks,
Sudeep Dutt

[1] https://lkml.org/lkml/2013/7/24/812

> thanks,
>
> greg k-h
Pavel Machek
2013-Aug-13 12:43 UTC
[PATCH 0/5] Enable Drivers for Intel MIC X100 Coprocessors.
Hi!

> Since it is a PCIe card, it does not have the ability to host hardware
> devices for networking, storage and console. We provide these devices
> on X100 coprocessors thus enabling a self-bootable equivalent environment
> for applications. A key benefit of our solution is that it leverages
> the standard virtio framework for network, disk and console devices,
> though in our case the virtio framework is used across a PCIe bus.

Interesting...

> Documentation/mic/mic_overview.txt | 48 +
> Documentation/mic/mpssd/.gitignore | 1 +
> Documentation/mic/mpssd/Makefile | 20 +
> Documentation/mic/mpssd/micctrl | 157 +++
> Documentation/mic/mpssd/mpss | 246 +++++
> Documentation/mic/mpssd/mpssd.c | 1732 ++++++++++++++++++++++++++++++++++
> Documentation/mic/mpssd/mpssd.h | 105 +++
> Documentation/mic/mpssd/sysfs.c | 108 +++
> drivers/misc/Kconfig | 1 +
> drivers/misc/Makefile | 1 +
> drivers/misc/mic/Kconfig | 56 ++
> drivers/misc/mic/Makefile | 6 +
> drivers/misc/mic/card/Makefile | 11 +
> drivers/misc/mic/card/mic_common.h | 43 +
> drivers/misc/mic/card/mic_debugfs.c | 139 +++
> drivers/misc/mic/card/mic_debugfs.h | 40 +
> drivers/misc/mic/card/mic_device.c | 311 ++++++
> drivers/misc/mic/card/mic_device.h | 106 +++
> drivers/misc/mic/card/mic_virtio.c | 643 +++++++++++++
> drivers/misc/mic/card/mic_virtio.h | 79 ++
> drivers/misc/mic/card/mic_x100.c | 253 +++++
> drivers/misc/mic/card/mic_x100.h | 53 ++
> drivers/misc/mic/common/mic_device.h | 85 ++
> drivers/misc/mic/host/Makefile | 13 +
> drivers/misc/mic/host/mic_boot.c | 181 ++++

So... there are basically separate computers running on PCIe card
plugged into host computer, right?

Maybe we should have something more prominent than drivers/misc for
this, then? Like drivers/multicomputer?

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Sudeep Dutt
2013-Aug-14 20:24 UTC
[PATCH 0/5] Enable Drivers for Intel MIC X100 Coprocessors.
On Tue, 2013-08-13 at 14:43 +0200, Pavel Machek wrote:
> Hi!
>
> > Since it is a PCIe card, it does not have the ability to host hardware
> > devices for networking, storage and console. We provide these devices
> > on X100 coprocessors thus enabling a self-bootable equivalent environment
> > for applications. A key benefit of our solution is that it leverages
> > the standard virtio framework for network, disk and console devices,
> > though in our case the virtio framework is used across a PCIe bus.
>
> Interesting...
>
> > Documentation/mic/mic_overview.txt | 48 +
> > Documentation/mic/mpssd/.gitignore | 1 +
> > Documentation/mic/mpssd/Makefile | 20 +
> > Documentation/mic/mpssd/micctrl | 157 +++
> > Documentation/mic/mpssd/mpss | 246 +++++
> > Documentation/mic/mpssd/mpssd.c | 1732 ++++++++++++++++++++++++++++++++++
> > Documentation/mic/mpssd/mpssd.h | 105 +++
> > Documentation/mic/mpssd/sysfs.c | 108 +++
> > drivers/misc/Kconfig | 1 +
> > drivers/misc/Makefile | 1 +
> > drivers/misc/mic/Kconfig | 56 ++
> > drivers/misc/mic/Makefile | 6 +
> > drivers/misc/mic/card/Makefile | 11 +
> > drivers/misc/mic/card/mic_common.h | 43 +
> > drivers/misc/mic/card/mic_debugfs.c | 139 +++
> > drivers/misc/mic/card/mic_debugfs.h | 40 +
> > drivers/misc/mic/card/mic_device.c | 311 ++++++
> > drivers/misc/mic/card/mic_device.h | 106 +++
> > drivers/misc/mic/card/mic_virtio.c | 643 +++++++++++++
> > drivers/misc/mic/card/mic_virtio.h | 79 ++
> > drivers/misc/mic/card/mic_x100.c | 253 +++++
> > drivers/misc/mic/card/mic_x100.h | 53 ++
> > drivers/misc/mic/common/mic_device.h | 85 ++
> > drivers/misc/mic/host/Makefile | 13 +
> > drivers/misc/mic/host/mic_boot.c | 181 ++++
>
> So... there are basically separate computers running on PCIe card
> plugged into host computer, right?
>

They are PCIe form factor Coprocessors plugged into the host.

> Maybe we should have something more prominent than drivers/misc for
> this, then? Like drivers/multicomputer?
>

"multicomputer" is an interesting name for this kind of device, but it
has several issues:

a) The definition I found for multicomputer online was "A computer made
   up of several computers. The term generally refers to an architecture
   in which each processor has its own memory rather than multiple
   processors with a shared memory. A multicore computer, although it
   sounds similar, would not be a multicomputer because the multiple
   cores share a common memory." Intel MIC X100 devices typically have
   up to 244 CPUs (61 cores) on the card sharing common card memory, so
   multicomputer would not be accurate based on this definition.
b) X100 MIC devices have always been referred to as Coprocessors and
   never as multicomputers in product specifications @
   http://software.intel.com/en-us/mic-developer
c) multicomputer is a very long path name.

Given these issues, we would like to stick to drivers/misc/mic/ unless
you have objections to this approach.

Thanks for the feedback.
Sudeep Dutt

> Pavel
>