On 07/23/2018 02:38 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
>> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
>>> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>>>> This patch series is the follow-up to the discussions we had earlier
>>>> about the RFC titled [RFC,V2] virtio: Add platform specific DMA API
>>>> translation for virito devices
>>>> (https://patchwork.kernel.org/patch/10417371/). There were suggestions
>>>> about doing away with the two different paths of transactions with the
>>>> host/QEMU, the first being the direct GPA path and the other being the
>>>> DMA API based translations.
>>>>
>>>> The first patch creates a direct GPA mapping based DMA operations
>>>> structure called 'virtio_direct_dma_ops', with exactly the same
>>>> implementation as the direct GPA path which virtio core currently has,
>>>> just wrapped in a DMA API format. Virtio core must use
>>>> 'virtio_direct_dma_ops' instead of the arch default in the absence of
>>>> the VIRTIO_F_IOMMU_PLATFORM flag to preserve the existing semantics.
>>>> The second patch does exactly that inside virtio_finalize_features().
>>>> The third patch removes the default direct GPA path from virtio core,
>>>> forcing it to use DMA API callbacks for all devices. With that change,
>>>> every device must have a DMA operations structure associated with it.
>>>> The fourth patch adds an additional hook which gives the platform an
>>>> opportunity to do yet another override if required. This platform hook
>>>> can be used on POWER Ultravisor based protected guests to load up
>>>> SWIOTLB DMA callbacks that bounce buffer all I/O scatter-gather
>>>> buffers into shared memory before they are consumed on the host side
>>>> (as discussed previously in the above mentioned thread, the host is
>>>> allowed to access only parts of the guest GPA range).
>>>>
>>>> Please go through these patches and review whether this approach
>>>> broadly makes sense. I would appreciate suggestions, inputs, and
>>>> comments regarding the patches or the approach in general. Thank you.
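
For reference, below is a minimal sketch of what such a direct-mapping
DMA ops structure and the feature-based override could look like. This
is illustrative only: the structure members shown are a subset, and the
helper virtio_setup_dma_ops() is a hypothetical name, not necessarily
what the actual patches use.

/*
 * Illustrative sketch only. A DMA ops structure that hands back guest
 * physical addresses unchanged, mirroring the legacy direct GPA
 * behaviour of virtio core.
 */
#include <linux/dma-mapping.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>

static dma_addr_t virtio_direct_map_page(struct device *dev,
                                          struct page *page,
                                          unsigned long offset, size_t size,
                                          enum dma_data_direction dir,
                                          unsigned long attrs)
{
        /* Direct GPA: no IOMMU translation, no bounce buffering */
        return page_to_phys(page) + offset;
}

static void virtio_direct_unmap_page(struct device *dev, dma_addr_t addr,
                                     size_t size,
                                     enum dma_data_direction dir,
                                     unsigned long attrs)
{
        /* Nothing to undo for a direct mapping */
}

static const struct dma_map_ops virtio_direct_dma_ops = {
        .map_page       = virtio_direct_map_page,
        .unmap_page     = virtio_direct_unmap_page,
        /* .alloc, .free, .map_sg etc. would be wrapped the same way */
};

/* Hypothetical helper, called while finalizing features (patch 2 idea) */
static void virtio_setup_dma_ops(struct virtio_device *vdev)
{
        if (!virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
                set_dma_ops(vdev->dev.parent, &virtio_direct_dma_ops);
        /*
         * Patch 4 idea: a platform hook could further override these ops,
         * e.g. with SWIOTLB based ops on POWER protected guests.
         */
}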
>>> I like how patches 1-3 look. Could you test performance
>>> with/without to see whether the extra indirection through
>>> use of DMA ops causes a measurable slow-down?
>>
>> I ran this simple dd command 10 times, where /dev/vda is a virtio
>> block device of 10 GB size.
>>
>> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
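>>
>> (Each run writes bs x count = 8 MiB x 1024 = 8 GiB, i.e. 8589934592
>> bytes, which matches the size dd reports below.)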
>>
>> With and without the patches, the bandwidth, which has a fairly wide
>> range, does not look that different.
>>
>> Without patches
>> ===============
>>
>> ---------- 1 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
>> ---------- 2 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
>> ---------- 3 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
>> ---------- 4 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
>> ---------- 5 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
>> ---------- 6 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
>> ---------- 7 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
>> ---------- 8 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
>> ---------- 9 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
>> ---------- 10 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
>>
>>
>> With patches
>> ============
>>
>> ---------- 1 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
>> ---------- 2 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
>> ---------- 3 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
>> ---------- 4 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
>> ---------- 5 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
>> ---------- 6 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
>> ---------- 7 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
>> ---------- 8 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
>> ---------- 9 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s
>> ---------- 10 ---------
>> 1024+0 records in
>> 1024+0 records out
>> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s
>>
>> Does this look okay?
>
> You want to test IOPS with lots of small writes, using a
> raw ramdisk on the host.

Hello Michael,

I have conducted the following experiments, and here are the results.

TEST SETUP
==========
A virtio block disk is attached to the guest as follows.

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' ioeventfd='off'/>
  <source file='/mnt/disk2.img'/>
  <target dev='vdb' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>

On the host back end, it is a QEMU raw image on a tmpfs file system.

disk:
-rw-r--r-- 1 libvirt-qemu kvm 5.0G Jul 24 06:26 disk2.img

mount:
size=21G on /mnt type tmpfs (rw,relatime,size=22020096k)

TEST CONFIG
===========

FIO (https://linux.die.net/man/1/fio) was run with and without the
patches, using the job files below.

Read test config:
[Sequential]
direct=1
ioengine=libaio
runtime=5m
time_based
filename=/dev/vda
bs=4k
numjobs=16
rw=read
unlink=1
iodepth=256

Write test config:
[Sequential]
direct=1
ioengine=libaio
runtime=5m
time_based
filename=/dev/vda
bs=4k
numjobs=16
rw=write
unlink=1
iodepth=256

The virtio block device comes up as /dev/vda on the guest, with
/sys/block/vda/queue/nr_requests set to 128.

TEST RESULTS
============

Without the patches
-------------------

Read test:
Run status group 0 (all jobs):
READ: bw=550MiB/s (577MB/s), 33.2MiB/s-35.6MiB/s (34.9MB/s-37.4MB/s),
io=161GiB (173GB), run=300001-300009msec
Disk stats (read/write):
vda: ios=42249926/0, merge=0/0, ticks=1499920/0, in_queue=35672384,
util=100.00%

Write test:
Run status group 0 (all jobs):
WRITE: bw=514MiB/s (539MB/s), 31.5MiB/s-34.6MiB/s (33.0MB/s-36.2MB/s),
io=151GiB (162GB), run=300001-300009msec
Disk stats (read/write):
vda: ios=29/39459261, merge=0/0, ticks=0/1570580, in_queue=35745992,
util=100.00%

With the patches
----------------

Read test:
Run status group 0 (all jobs):
READ: bw=572MiB/s (600MB/s), 35.0MiB/s-37.2MiB/s (36.7MB/s-38.0MB/s),
io=168GiB (180GB), run=300001-300006msec
Disk stats (read/write):
vda: ios=43917611/0, merge=0/0, ticks=1934268/0, in_queue=35531688,
util=100.00%

Write test:
Run status group 0 (all jobs):
WRITE: bw=546MiB/s (572MB/s), 33.7MiB/s-35.0MiB/s (35.3MB/s-36.7MB/s),
io=160GiB (172GB), run=300001-300007msec
Disk stats (read/write):
vda: ios=14/41893878, merge=0/0, ticks=8/2107816, in_queue=35535716,
util=100.00%

Results with and without the patches are similar.