Shunsuke Mie
2022-Dec-27 07:49 UTC
[RFC PATCH 4/9] vringh: unify the APIs for all accessors
On Tue, Dec 27, 2022 at 16:04, Michael S. Tsirkin <mst at redhat.com> wrote:
>
> On Tue, Dec 27, 2022 at 11:25:26AM +0900, Shunsuke Mie wrote:
> > Each of the vringh memory accessors for user, kern and iotlb has its
> > own interface that calls common code. But some code is duplicated, and
> > that reduces extensibility.
> >
> > Introduce a struct vringh_ops and provide a common API for all
> > accessors. This makes it easy to extend the vringh code with new
> > memory accessors and simplifies the caller code.
> >
> > Signed-off-by: Shunsuke Mie <mie at igel.co.jp>
> > ---
> >  drivers/vhost/vringh.c | 667 +++++++++++------------------------------
> >  include/linux/vringh.h | 100 +++---
> >  2 files changed, 225 insertions(+), 542 deletions(-)
> >
> > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > index aa3cd27d2384..ebfd3644a1a3 100644
> > --- a/drivers/vhost/vringh.c
> > +++ b/drivers/vhost/vringh.c
> > @@ -35,15 +35,12 @@ static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
> >  }
> >
> >  /* Returns vring->num if empty, -ve on error. */
> > -static inline int __vringh_get_head(const struct vringh *vrh,
> > -                                    int (*getu16)(const struct vringh *vrh,
> > -                                                  u16 *val, const __virtio16 *p),
> > -                                    u16 *last_avail_idx)
> > +static inline int __vringh_get_head(const struct vringh *vrh, u16 *last_avail_idx)
> >  {
> >         u16 avail_idx, i, head;
> >         int err;
> >
> > -       err = getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
> > +       err = vrh->ops.getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
> >         if (err) {
> >                 vringh_bad("Failed to access avail idx at %p",
> >                            &vrh->vring.avail->idx);
>
> I like that this patch removes more lines of code than it adds.
>
> However, one of the design points of the vringh abstractions is that
> they were carefully written to be very low overhead. This is why we are
> passing function pointers to inline functions - the compiler can
> optimize them out.
>
> I think that introducing indirect calls through an ops struct here is
> going to break these assumptions and hurt performance. Unless the
> compiler can somehow figure it out and optimize? I don't see how that's
> possible with the ops pointer in memory, but maybe I'm wrong.

I think your concern is correct. I have to understand the compiler
optimization and redesign this approach if it is needed.

> Was any effort made to test the effect of these patches on performance?

I just tested with vringh_test and already saw a small performance
reduction. I have to investigate that, as you said.

Thank you for your comments.

> Thanks!

Best,
Shunsuke.
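
To make the inlining argument concrete, here is a minimal hypothetical
sketch of the two call patterns being compared. It is not the actual
vringh code; all names (vringh_sketch, get_head_inline, and so on) are
illustrative only. In the first pattern the callback is a compile-time
constant argument to an inline helper and can be folded away; in the
second it is loaded from an ops struct in memory, which generally
compiles to a true indirect call.

#include <linux/types.h>

struct vringh_sketch;

struct vringh_ops_sketch {
	int (*getu16)(const struct vringh_sketch *vrh, u16 *val);
};

struct vringh_sketch {
	struct vringh_ops_sketch ops;
};

/* Pattern 1 (upstream): the callback is a parameter of an inline helper.
 * Each accessor's call site passes a compile-time-constant function, so
 * the compiler can constant-propagate the pointer and inline the
 * callback away entirely. */
static inline int get_head_inline(const struct vringh_sketch *vrh,
				  int (*getu16)(const struct vringh_sketch *,
						u16 *),
				  u16 *val)
{
	return getu16(vrh, val); /* resolved per call site at compile time */
}

/* Pattern 2 (this patch): the callback is loaded from the ops struct
 * embedded in *vrh. The compiler generally cannot prove what it points
 * to, so this compiles to an indirect call (plus retpoline overhead on
 * x86 when CONFIG_RETPOLINE is enabled). */
static inline int get_head_ops(const struct vringh_sketch *vrh, u16 *val)
{
	return vrh->ops.getu16(vrh, val); /* indirect call at run time */
}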
Shunsuke Mie
2022-Dec-27 10:22 UTC
[RFC PATCH 4/9] vringh: unify the APIs for all accessors
On Tue, Dec 27, 2022 at 16:49, Shunsuke Mie <mie at igel.co.jp> wrote:
>
> On Tue, Dec 27, 2022 at 16:04, Michael S. Tsirkin <mst at redhat.com> wrote:
> >
> > On Tue, Dec 27, 2022 at 11:25:26AM +0900, Shunsuke Mie wrote:
> > > Each of the vringh memory accessors for user, kern and iotlb has its
> > > own interface that calls common code. But some code is duplicated, and
> > > that reduces extensibility.
> > >
> > > Introduce a struct vringh_ops and provide a common API for all
> > > accessors. This makes it easy to extend the vringh code with new
> > > memory accessors and simplifies the caller code.
> > >
> > > Signed-off-by: Shunsuke Mie <mie at igel.co.jp>
> > > ---
> > >  drivers/vhost/vringh.c | 667 +++++++++++------------------------------
> > >  include/linux/vringh.h | 100 +++---
> > >  2 files changed, 225 insertions(+), 542 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
> > > index aa3cd27d2384..ebfd3644a1a3 100644
> > > --- a/drivers/vhost/vringh.c
> > > +++ b/drivers/vhost/vringh.c
> > > @@ -35,15 +35,12 @@ static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
> > >  }
> > >
> > >  /* Returns vring->num if empty, -ve on error. */
> > > -static inline int __vringh_get_head(const struct vringh *vrh,
> > > -                                    int (*getu16)(const struct vringh *vrh,
> > > -                                                  u16 *val, const __virtio16 *p),
> > > -                                    u16 *last_avail_idx)
> > > +static inline int __vringh_get_head(const struct vringh *vrh, u16 *last_avail_idx)
> > >  {
> > >         u16 avail_idx, i, head;
> > >         int err;
> > >
> > > -       err = getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
> > > +       err = vrh->ops.getu16(vrh, &avail_idx, &vrh->vring.avail->idx);
> > >         if (err) {
> > >                 vringh_bad("Failed to access avail idx at %p",
> > >                            &vrh->vring.avail->idx);
> >
> > I like that this patch removes more lines of code than it adds.
> >
> > However, one of the design points of the vringh abstractions is that
> > they were carefully written to be very low overhead. This is why we are
> > passing function pointers to inline functions - the compiler can
> > optimize them out.
> >
> > I think that introducing indirect calls through an ops struct here is
> > going to break these assumptions and hurt performance. Unless the
> > compiler can somehow figure it out and optimize? I don't see how that's
> > possible with the ops pointer in memory, but maybe I'm wrong.
>
> I think your concern is correct. I have to understand the compiler
> optimization and redesign this approach if it is needed.
>
> > Was any effort made to test the effect of these patches on performance?
>
> I just tested with vringh_test and already saw a small performance
> reduction. I have to investigate that, as you said.

I attempted to test with perf. I found that the performance of the
patched code is almost the same as the upstream one. However, I have to
investigate why this patch leads to this result, and the profiling
should also be run on more powerful machines.
environment:
$ grep 'model name' /proc/cpuinfo
model name	: Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
model name	: Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
model name	: Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
model name	: Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz

results:
* for patched code

 Performance counter stats for 'nice -n -20 ./vringh_test_patched --parallel --eventidx --fast-vringh --indirect --virtio-1' (20 runs):

          3,028.05 msec task-clock                #    0.995 CPUs utilized            ( +-  0.12% )
            78,150      context-switches          #   25.691 K/sec                    ( +-  0.00% )
                 5      cpu-migrations            #    1.644 /sec                     ( +-  3.33% )
               190      page-faults               #   62.461 /sec                     ( +-  0.41% )
     6,919,025,222      cycles                    #    2.275 GHz                      ( +-  0.13% )
     8,990,220,160      instructions              #    1.29  insn per cycle           ( +-  0.04% )
     1,788,326,786      branches                  #  587.899 M/sec                    ( +-  0.05% )
         4,557,398      branch-misses             #    0.25% of all branches          ( +-  0.43% )

           3.04359 +- 0.00378 seconds time elapsed  ( +-  0.12% )

* for upstream code

 Performance counter stats for 'nice -n -20 ./vringh_test_base --parallel --eventidx --fast-vringh --indirect --virtio-1' (10 runs):

          3,058.41 msec task-clock                #    0.999 CPUs utilized            ( +-  0.14% )
            78,149      context-switches          #   25.545 K/sec                    ( +-  0.00% )
                 5      cpu-migrations            #    1.634 /sec                     ( +-  2.67% )
               194      page-faults               #   63.414 /sec                     ( +-  0.43% )
     6,988,713,963      cycles                    #    2.284 GHz                      ( +-  0.14% )
     8,512,533,269      instructions              #    1.22  insn per cycle           ( +-  0.04% )
     1,638,375,371      branches                  #  535.549 M/sec                    ( +-  0.05% )
         4,428,866      branch-misses             #    0.27% of all branches          ( +- 22.57% )

           3.06085 +- 0.00420 seconds time elapsed  ( +-  0.14% )

> Thank you for your comments.
>
> > Thanks!
>
> Best,
> Shunsuke.
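
As a note on the possible redesign mentioned above: one direction that
is sometimes used to keep an ops-style grouping without paying for the
indirect call is to pass a pointer to a file-scope const ops struct
through an inline helper, so that each accessor's call site supplies a
compile-time constant the compiler can devirtualize. This is only a
hypothetical sketch under that assumption, reusing the illustrative
types from the earlier sketch; it is not something proposed in this
thread.

/* Hypothetical only; names are illustrative. Because &user_ops_sketch
 * is a compile-time constant and the object is static const, the
 * compiler can resolve ops->getu16 after inlining __get_head_sketch()
 * and emit a direct (or fully inlined) call, as with the upstream
 * per-accessor wrappers. */

static int getu16_user_sketch(const struct vringh_sketch *vrh, u16 *val)
{
	/* ...userspace access, e.g. via get_user()... */
	return 0;
}

static const struct vringh_ops_sketch user_ops_sketch = {
	.getu16 = getu16_user_sketch,
};

static inline int __get_head_sketch(const struct vringh_sketch *vrh,
				    const struct vringh_ops_sketch *ops,
				    u16 *val)
{
	return ops->getu16(vrh, val); /* devirtualized when ops is constant */
}

int vringh_get_head_user_sketch(const struct vringh_sketch *vrh, u16 *val)
{
	return __get_head_sketch(vrh, &user_ops_sketch, val);
}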