This patch improves the performance of Standard VGA, the mode used during
Windows boot and by the Linux splash screen.

It does so by buffering all the stdvga programmed output ops and memory
mapped ops (both reads and writes) that are sent to QEMU.

We maintain essential VGA state locally so we can respond immediately to
input and read ops without waiting for QEMU. We snoop output and write ops
to keep our state up to date.

PIO input ops are satisfied from cached state without bothering QEMU.

PIO output and mmio ops are passed through to QEMU, including mmio read ops.
This is necessary because mmio reads can have side effects.

I have changed the format of the buffered_iopage. It used to contain 80
elements of type ioreq_t (48 bytes each). Now it contains 672 elements of
type buf_ioreq_t (6 bytes each). Being able to pipeline 8 times as many ops
improves VGA performance by a factor of 8.

I changed hvm_buffered_io_intercept to use the same registration and
callback mechanism as hvm_portio_intercept rather than the hacky hardcoding
it used before.

In platform.c, I fixed send_timeoffset_req() to set its ioreq size to 8
(rather than 4), and its count to 1 (which was missing).

Signed-off-by: Ben Guthro <bguthro@virtualiron.com>
Signed-off-by: Robert Phillips <rphillips@virtualiron.com>
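For concreteness, here is a sketch of the arithmetic behind those numbers.
The structure below is an illustrative stand-in, not a copy of the real
buf_ioreq_t/buffered_iopage definitions in ioreq.h; only the slot count and
slot sizes come from the description above.

#include <assert.h>
#include <stdint.h>

/* Illustrative only: 672 six-byte slots plus the two ring cursors still fit
 * in one 4 KiB shared page, whereas the old layout spent 80 * 48 = 3840
 * bytes to queue only 80 ops, so roughly 8x as many ops can be pipelined in
 * the same memory. */
#define BUF_IOREQ_SIZE          6      /* bytes per buffered slot (new)   */
#define IOREQ_BUFFER_SLOT_NUM 672
#define XEN_PAGE_SIZE        4096

struct buffered_iopage_sketch {
    uint32_t read_pointer;             /* consumer (qemu) cursor          */
    uint32_t write_pointer;            /* producer (hypervisor) cursor    */
    uint8_t  slots[IOREQ_BUFFER_SLOT_NUM][BUF_IOREQ_SIZE];
};

int main(void)
{
    /* 672 * 6 + 2 * 4 = 4040 bytes <= 4096 */
    assert(sizeof(struct buffered_iopage_sketch) <= XEN_PAGE_SIZE);
    return 0;
}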
On 24/10/07 22:36, "Ben Guthro" <bguthro@virtualiron.com> wrote:

> This patch improves the performance of Standard VGA,
> the mode used during Windows boot and by the Linux
> splash screen.
>
> It does so by buffering all the stdvga programmed output ops
> and memory mapped ops (both reads and writes) that are sent to QEMU.

How much benefit comes from immediate servicing of PIO input ops versus the
massive increase in buffered-io slots? Removing the former optimisation
would certainly make the patch a lot smaller!

What happens across save/restore? The hypervisor's state cache will go away,
won't it? I suppose it's okay if the guest is in SVGA LFB mode at that point
(actually, that's another thing: do you correctly handle hand-off between
VGA and SVGA modes?), but I don't know that we want to rely on that.

 -- Keir
Good questions, Keir. Answers below:

On 10/25/07, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> How much benefit comes from immediate servicing of PIO input ops versus the
> massive increase in buffered-io slots? Removing the former optimisation
> would certainly make the patch a lot smaller!

Subjectively, the performance improvement appears substantial. We have
tested the code with the stdvga emulation, and with and without the
increased number of slots. With more slots the screen painting goes from
being fast to very fast.

As you've noticed, the increase in the number of slots is compensated by
the decrease in slot size (so there is no increase in memory use), at the
cost of packing (and unpacking) ioreqs as they are written to (and read
from) the buffer.

> What happens across save/restore? The hypervisor's state cache will go
> away, won't it? I suppose it's okay if the guest is in SVGA LFB mode at
> that point (actually, that's another thing: do you correctly handle
> hand-off between VGA and SVGA modes?), but I don't know that we want to
> rely on that.

This hasn't been a problem in practice. The guest quickly switches from VGA
to SVGA mode, causing the stdvga code to be largely inactive, and we have
only seen it switch back when the guest blue-screens. The stdvga code
detects that transition correctly and paints the blue screen quickly.

After a restore, the code assumes it is not in standard VGA mode, so it is
largely inactive. That conservative assumption might not be optimal, but it
is correct. I don't believe the failure to save/restore the stdvga cache
will prove problematic, but if it becomes so I will add corrective code.

One might ask (and we did) what the point of all this VGA emulation code is
when it is only active during the boot process (or during blue-screen
painting). The answer is that one wants the user's first experience with
Xen to be positive, and watching an excruciatingly slow Windows boot screen
or Linux splash panel is not.

-- rsp
On 25/10/07 16:28, "Robert Phillips" <rsp.vi.xen@gmail.com> wrote:

>> How much benefit comes from immediate servicing of PIO input ops versus
>> the massive increase in buffered-io slots? Removing the former
>> optimisation would certainly make the patch a lot smaller!
>
> Subjectively, the performance improvement appears substantial. We have
> tested the code with the stdvga emulation, and with and without the
> increased number of slots. With more slots the screen painting goes from
> being fast to very fast.
>
> As you've noticed, the increase in the number of slots is compensated by
> the decrease in slot size (so there is no increase in memory use), at the
> cost of packing (and unpacking) ioreqs as they are written to (and read
> from) the buffer.

I guess what I'm really interested in is the performance /with/ the
increased number of slots and with versus without the stdvga emulation,
since it's the stdvga emulation that really adds the complexity.

 -- Keir
The performance is poor with increased slots and without emulation.

As presented, the emulation code treats the buffered-io slots as an
asynchronous queue. The stdvga emulator pushes ioreqs into the queue but
need not wait for any response, because it can satisfy read requests
locally. (The only time it must wait is when the queue becomes full.)

Without the emulation, the code must block on each read (of which there are
many), waiting for QEMU to provide an answer. This really slows things down
and renders the buffer largely useless. I don't believe it ever gets full;
there are never enough consecutive writes to fill it.

With both increased slots and emulation, the performance feels very much
better. Like taking a stone out of your shoe. :-)

-- rsp

On 10/25/07, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> I guess what I'm really interested in is the performance /with/ the
> increased number of slots and with versus without the stdvga emulation,
> since it's the stdvga emulation that really adds the complexity.
>
> -- Keir
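To make the read-servicing idea concrete, the sketch below shows the general
shape of snooping guest OUTs into a local register latch and answering INs
from it. The port numbers are the standard VGA sequencer and graphics
controller index/data ports; the structure, function names and the amount of
state tracked are illustrative, not the actual stdvga emulation code.

#include <stdint.h>

struct stdvga_cache {
    uint8_t sr_index;          /* last index written to port 0x3c4          */
    uint8_t sr[8];             /* sequencer registers                        */
    uint8_t gr_index;          /* last index written to port 0x3ce           */
    uint8_t gr[16];            /* graphics controller registers              */
};

/* Snoop a guest OUT so the cache stays current; the write itself is still
 * forwarded to QEMU through the buffered-io ring. */
void stdvga_snoop_out(struct stdvga_cache *s, uint16_t port, uint8_t val)
{
    switch (port) {
    case 0x3c4: s->sr_index = val; break;
    case 0x3c5: s->sr[s->sr_index & 7] = val; break;
    case 0x3ce: s->gr_index = val; break;
    case 0x3cf: s->gr[s->gr_index & 15] = val; break;
    }
}

/* Satisfy a guest IN from the cache so the vcpu never waits for QEMU.
 * Returns 1 if handled, 0 to fall back to the slow (synchronous) path. */
int stdvga_cached_in(const struct stdvga_cache *s, uint16_t port, uint8_t *val)
{
    switch (port) {
    case 0x3c4: *val = s->sr_index;              return 1;
    case 0x3c5: *val = s->sr[s->sr_index & 7];   return 1;
    case 0x3ce: *val = s->gr_index;              return 1;
    case 0x3cf: *val = s->gr[s->gr_index & 15];  return 1;
    }
    return 0;
}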
Alex Williamson
2007-Oct-29 18:48 UTC
[Xen-ia64-devel] Re: [Xen-devel] [PATCH] Std VGA Performance
On Wed, 2007-10-24 at 17:36 -0400, Ben Guthro wrote:
> This patch improves the performance of Standard VGA,
> the mode used during Windows boot and by the Linux
> splash screen.

Hi,

   ia64 uses VGA too. I've been able to regain some functionality with the
patch below, but the VGA modes used by our firmware still have significant
issues (once we boot to Linux userspace, VGA text mode gets readable). It
seems like perhaps we've lost support for some basic text VGA modes. I
haven't tried to understand the changes in qemu yet, but are we sacrificing
compatibility for performance? Patch and screen shot below. Thanks,

	Alex

PS - for xen-ia64-devel, both the Open Source and Intel GFW have issues with
EFI text mode w/ this patch (use EFI shell to see it on Intel GFW).

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
---

diff -r 4034317507de xen/arch/ia64/vmx/mmio.c
--- a/xen/arch/ia64/vmx/mmio.c  Mon Oct 29 16:49:02 2007 +0000
+++ b/xen/arch/ia64/vmx/mmio.c  Mon Oct 29 12:29:18 2007 -0600
@@ -56,10 +56,12 @@ static int hvm_buffered_io_intercept(ior
 {
     struct vcpu *v = current;
     spinlock_t *buffered_io_lock;
-    buffered_iopage_t *buffered_iopage =
+    buffered_iopage_t *pg =
         (buffered_iopage_t *)(v->domain->arch.hvm_domain.buffered_io_va);
-    unsigned long tmp_write_pointer = 0;
     int i;
+    buf_ioreq_t bp;
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
 
     /* ignore READ ioreq_t! */
     if ( p->dir == IOREQ_READ )
@@ -75,11 +77,41 @@ static int hvm_buffered_io_intercept(ior
     if ( i == HVM_BUFFERED_IO_RANGE_NR )
         return 0;
 
+    /* Return 0 for the cases we can't deal with. */
+    if ( p->addr > 0xffffful || p->data_is_ptr || p->df || p->count != 1 )
+        return 0;
+
+    bp.type = p->type;
+    bp.dir = p->dir;
+    switch (p->size) {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        gdprintk(XENLOG_INFO, "quadword ioreq type:%d data:%"PRIx64"\n",
+                 p->type, p->data);
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size:%"PRId64"\n", p->size);
+        return 0;
+    }
+
+    bp.data = p->data;
+    bp.addr = qw ? ((p->data >> 16) & 0xfffful) : (p->addr & 0xffffful);
+
     buffered_io_lock = &v->domain->arch.hvm_domain.buffered_io_lock;
     spin_lock(buffered_io_lock);
 
-    if ( buffered_iopage->write_pointer - buffered_iopage->read_pointer ==
-         (unsigned long)IOREQ_BUFFER_SLOT_NUM ) {
+    if ( pg->write_pointer - pg->read_pointer >=
+         (unsigned long)IOREQ_BUFFER_SLOT_NUM - (qw ? 1 : 0) ) {
         /* the queue is full.
          * send the iopacket through the normal path.
          * NOTE: The arithimetic operation could handle the situation for
@@ -89,13 +121,19 @@ static int hvm_buffered_io_intercept(ior
         return 0;
     }
 
-    tmp_write_pointer = buffered_iopage->write_pointer % IOREQ_BUFFER_SLOT_NUM;
-
-    memcpy(&buffered_iopage->ioreq[tmp_write_pointer], p, sizeof(ioreq_t));
+    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
+           &bp, sizeof(bp));
+
+    if (qw) {
+        bp.data = p->data >> 32;
+        bp.addr = (p->data >> 48) & 0xfffful;
+        memcpy(&pg->buf_ioreq[(pg->write_pointer + 1) % IOREQ_BUFFER_SLOT_NUM],
+               &bp, sizeof(bp));
+    }
 
     /*make the ioreq_t visible before write_pointer*/
     wmb();
-    buffered_iopage->write_pointer++;
+    pg->write_pointer += qw ? 2 : 1;
 
     spin_unlock(buffered_io_lock);
On 29/10/07 18:48, "Alex Williamson" <alex.williamson@hp.com> wrote:

> On Wed, 2007-10-24 at 17:36 -0400, Ben Guthro wrote:
>> This patch improves the performance of Standard VGA,
>> the mode used during Windows boot and by the Linux
>> splash screen.
>
> Hi,
>
> ia64 uses VGA too. I've been able to regain some functionality with the
> patch below, but the VGA modes used by our firmware still have significant
> issues (once we boot to Linux userspace, VGA text mode gets readable). It
> seems like perhaps we've lost support for some basic text VGA modes. I
> haven't tried to understand the changes in qemu yet, but are we
> sacrificing compatibility for performance? Patch and screen shot below.
> Thanks,

All that's changed for ia64 is the definition of the buffered ioreq
structure, which has become more densely packed. All the rest of the
acceleration is (currently) x86-specific. So this shouldn't be too hard to
track down...

 -- Keir
On Mon, 2007-10-29 at 19:17 +0000, Keir Fraser wrote:
> All that's changed for ia64 is the definition of the buffered ioreq
> structure, which has become more densely packed. All the rest of the
> acceleration is (currently) x86-specific. So this shouldn't be too hard to
> track down...

Yes, you're right, but easy to overlook, and I'm not sure how it works on
x86. I copied the x86 code for filling in the buffered ioreq, but failed to
notice that it attempts to store 4 bytes of data into a 2 byte field... The
comment for the size entry in buf_ioreq could be interpreted to mean that
only 1, 2, and 8 bytes are expected, but I definitely see 4 bytes on
occasion. I'd guess x86 has a bug here that's simply not exposed because of
the 16-bit code that's probably being used to initialize VGA. I also
question the 8 byte support, which is why I skipped it in the patch below.
Wouldn't an 8 byte MMIO access that isn't a timeoffset be possible? Keir,
please apply this to the staging tree. Thanks,

	Alex

Signed-off-by: Alex Williamson <alex.williamson@hp.com>
---

diff -r 4034317507de xen/arch/ia64/vmx/mmio.c
--- a/xen/arch/ia64/vmx/mmio.c  Mon Oct 29 16:49:02 2007 +0000
+++ b/xen/arch/ia64/vmx/mmio.c  Tue Oct 30 10:03:42 2007 -0600
@@ -55,53 +55,68 @@ static int hvm_buffered_io_intercept(ior
 static int hvm_buffered_io_intercept(ioreq_t *p)
 {
     struct vcpu *v = current;
-    spinlock_t *buffered_io_lock;
-    buffered_iopage_t *buffered_iopage =
+    buffered_iopage_t *pg =
         (buffered_iopage_t *)(v->domain->arch.hvm_domain.buffered_io_va);
-    unsigned long tmp_write_pointer = 0;
+    buf_ioreq_t bp;
     int i;
 
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
     /* ignore READ ioreq_t! */
-    if ( p->dir == IOREQ_READ )
-        return 0;
-
-    for ( i = 0; i < HVM_BUFFERED_IO_RANGE_NR; i++ ) {
-        if ( p->addr >= hvm_buffered_io_ranges[i]->start_addr &&
-             p->addr + p->size - 1 < hvm_buffered_io_ranges[i]->start_addr +
-                                     hvm_buffered_io_ranges[i]->length )
+    if (p->dir == IOREQ_READ)
+        return 0;
+
+    for (i = 0; i < HVM_BUFFERED_IO_RANGE_NR; i++) {
+        if (p->addr >= hvm_buffered_io_ranges[i]->start_addr &&
+            p->addr + p->size - 1 < hvm_buffered_io_ranges[i]->start_addr +
+                                    hvm_buffered_io_ranges[i]->length)
             break;
     }
 
-    if ( i == HVM_BUFFERED_IO_RANGE_NR )
-        return 0;
-
-    buffered_io_lock = &v->domain->arch.hvm_domain.buffered_io_lock;
-    spin_lock(buffered_io_lock);
-
-    if ( buffered_iopage->write_pointer - buffered_iopage->read_pointer ==
-         (unsigned long)IOREQ_BUFFER_SLOT_NUM ) {
+    if (i == HVM_BUFFERED_IO_RANGE_NR)
+        return 0;
+
+    bp.type = p->type;
+    bp.dir = p->dir;
+    switch (p->size) {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    default:
+        /* Could use quad word semantics, but it only appears
+         * to be useful for timeoffset data. */
+        return 0;
+    }
+    bp.data = (uint16_t)p->data;
+    bp.addr = (uint32_t)p->addr;
+
+    spin_lock(&v->domain->arch.hvm_domain.buffered_io_lock);
+
+    if (pg->write_pointer - pg->read_pointer == IOREQ_BUFFER_SLOT_NUM) {
         /* the queue is full.
          * send the iopacket through the normal path.
          * NOTE: The arithimetic operation could handle the situation for
          * write_pointer overflow.
          */
-        spin_unlock(buffered_io_lock);
-        return 0;
-    }
-
-    tmp_write_pointer = buffered_iopage->write_pointer % IOREQ_BUFFER_SLOT_NUM;
-
-    memcpy(&buffered_iopage->ioreq[tmp_write_pointer], p, sizeof(ioreq_t));
-
-    /*make the ioreq_t visible before write_pointer*/
+        spin_unlock(&v->domain->arch.hvm_domain.buffered_io_lock);
+        return 0;
+    }
+
+    memcpy(&pg->buf_ioreq[pg->write_pointer % IOREQ_BUFFER_SLOT_NUM],
+           &bp, sizeof(bp));
+
+    /* Make the ioreq_t visible before write_pointer */
     wmb();
-    buffered_iopage->write_pointer++;
-
-    spin_unlock(buffered_io_lock);
+    pg->write_pointer++;
+
+    spin_unlock(&v->domain->arch.hvm_domain.buffered_io_lock);
 
     return 1;
 }
 
-
 static void low_mmio_access(VCPU *vcpu, u64 pa, u64 *val, size_t s, int dir)
 {
@@ -110,32 +125,36 @@ static void low_mmio_access(VCPU *vcpu,
     ioreq_t *p;
 
     vio = get_vio(v->domain, v->vcpu_id);
-    if (vio == 0) {
-        panic_domain(NULL,"bad shared page: %lx", (unsigned long)vio);
-    }
+    if (!vio)
+        panic_domain(NULL, "bad shared page");
+
     p = &vio->vp_ioreq;
+
     p->addr = pa;
     p->size = s;
     p->count = 1;
+    if (dir == IOREQ_WRITE)
+        p->data = *val;
+    else
+        p->data = 0;
+    p->data_is_ptr = 0;
     p->dir = dir;
-    if (dir==IOREQ_WRITE)     // write;
-        p->data = *val;
-    else if (dir == IOREQ_READ)
-        p->data = 0;          // clear all bits
-    p->data_is_ptr = 0;
+    p->df = 0;
     p->type = 1;
-    p->df = 0;
 
     p->io_count++;
+
     if (hvm_buffered_io_intercept(p)) {
         p->state = STATE_IORESP_READY;
         vmx_io_assist(v);
-        return;
-    } else
-        vmx_send_assist_req(v);
-    if (dir == IOREQ_READ) { // read
+        if (dir != IOREQ_READ)
+            return;
+    }
+
+    vmx_send_assist_req(v);
+    if (dir == IOREQ_READ)
         *val = p->data;
-    }
+
     return;
 }
@@ -227,16 +246,18 @@ static void legacy_io_access(VCPU *vcpu,
     ioreq_t *p;
 
     vio = get_vio(v->domain, v->vcpu_id);
-    if (vio == 0) {
-        panic_domain(NULL,"bad shared page\n");
-    }
+    if (!vio)
+        panic_domain(NULL, "bad shared page\n");
+
     p = &vio->vp_ioreq;
-    p->addr = TO_LEGACY_IO(pa&0x3ffffffUL);
+    p->addr = TO_LEGACY_IO(pa & 0x3ffffffUL);
     p->size = s;
     p->count = 1;
     p->dir = dir;
-    if (dir == IOREQ_WRITE)     // write;
+    if (dir == IOREQ_WRITE)
         p->data = *val;
+    else
+        p->data = 0;
     p->data_is_ptr = 0;
     p->type = 0;
     p->df = 0;
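A tiny illustration of the truncation Alex describes above, assuming the
6-byte slot carries only a 16-bit data field as discussed in this thread;
the value is made up.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t guest_write = 0x00ff00ffu;           /* a 32-bit MMIO write    */
    uint16_t buffered    = (uint16_t)guest_write; /* what lands in the slot */

    /* Prints "guest wrote 0xff00ff, replay sees 0xff": the upper 16 bits
     * of the 4-byte access are silently lost when the op is buffered. */
    printf("guest wrote %#x, replay sees %#x\n",
           (unsigned)guest_write, (unsigned)buffered);
    return 0;
}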
Yeah, hopefully that's a bug in the comment. I would expect 4-byte accesses
to be possible and be handled. As for 8-byte accesses, they can certainly
happen, why not? Unlikely at start of day, but once we're in x86/64 mode
there's no reason why not.

 -- Keir

On 30/10/07 16:19, "Alex Williamson" <alex.williamson@hp.com> wrote:

> Yes, you're right, but easy to overlook, and I'm not sure how it works on
> x86. I copied the x86 code for filling in the buffered ioreq, but failed
> to notice that it attempts to store 4 bytes of data into a 2 byte field...
> The comment for the size entry in buf_ioreq could be interpreted to mean
> that only 1, 2, and 8 bytes are expected, but I definitely see 4 bytes on
> occasion. I'd guess x86 has a bug here that's simply not exposed because
> of the 16-bit code that's probably being used to initialize VGA. I also
> question the 8 byte support, which is why I skipped it in the patch below.
> Wouldn't an 8 byte MMIO access that isn't a timeoffset be possible? Keir,
> please apply this to the staging tree. Thanks,
Alex Williamson
2007-Oct-30 16:40 UTC
[Xen-ia64-devel] Re: [PATCH] Re: [Xen-devel] [PATCH] Std VGA Performance
On Tue, 2007-10-30 at 16:24 +0000, Keir Fraser wrote:
> Yeah, hopefully that's a bug in the comment. I would expect 4-byte
> accesses to be possible and be handled. As for 8-byte accesses, they can
> certainly happen, why not? Unlikely at start of day, but once we're in
> x86/64 mode there's no reason why not.

Right, and it seems that the "quadword" handling is quite specific to
timeoffset. It takes advantage of the fact that there's no address and
stuffs some of the data in there. So, I think both 4 & 8 byte buffered mmio
is likely broken right now on x86. Thanks,

	Alex

-- 
Alex Williamson                             HP Open Source & Linux Org.
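For reference, this is the encoding the ia64 patch earlier in the thread
mirrors from x86: with no real guest address to record, the 64-bit
timeoffset payload is spread across the data and addr fields of two
consecutive slots. The struct and helper names below are illustrative, not
the actual Xen or QEMU code.

#include <stdint.h>

struct slot {
    uint16_t data;   /* 16-bit data field of the 6-byte slot              */
    uint32_t addr;   /* really a 20-bit address field; uint32_t for clarity */
};

/* Pack a 64-bit value into two slots, as the quadword (size==3) case does. */
void encode_qword(uint64_t val, struct slot s[2])
{
    s[0].data = (uint16_t)val;           /* bits  0..15                    */
    s[0].addr = (val >> 16) & 0xfffff;   /* bits 16..35                    */
    s[1].data = (uint16_t)(val >> 32);   /* bits 32..47                    */
    s[1].addr = (val >> 48) & 0xffff;    /* bits 48..63                    */
}

/* Reassemble on the consumer side; bits 32..35 stored in s[0].addr are
 * redundant and ignored here. */
uint64_t decode_qword(const struct slot s[2])
{
    return  (uint64_t)s[0].data
          | ((uint64_t)(s[0].addr & 0xffff) << 16)
          | ((uint64_t)s[1].data << 32)
          | ((uint64_t)s[1].addr << 48);
}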
On 30/10/07 16:40, "Alex Williamson" <alex.williamson@hp.com> wrote:

> On Tue, 2007-10-30 at 16:24 +0000, Keir Fraser wrote:
>> Yeah, hopefully that's a bug in the comment. I would expect 4-byte
>> accesses to be possible and be handled. As for 8-byte accesses, they can
>> certainly happen, why not? Unlikely at start of day, but once we're in
>> x86/64 mode there's no reason why not.
>
> Right, and it seems that the "quadword" handling is quite specific to
> timeoffset. It takes advantage of the fact that there's no address and
> stuffs some of the data in there. So, I think both 4 & 8 byte buffered
> mmio is likely broken right now on x86. Thanks,

I guess we'll see how testing goes over the next little while. If the
bufioreq changes prove to be broken we can back them out before 3.2.0, or
better yet fix them ;-).

 -- Keir
Robert Phillips
2007-Oct-31 19:28 UTC
Re: [PATCH] Re: [Xen-devel] [PATCH] Std VGA Performance
Alex is correct; there is a bug with size=32 (i.e. 4-byte) operations. The
fix is simple. We'll submit an updated patch very soon.

-- rsp

On 10/30/07, Keir Fraser <Keir.Fraser@cl.cam.ac.uk> wrote:
>
> I guess we'll see how testing goes over the next little while. If the
> bufioreq changes prove to be broken we can back them out before 3.2.0, or
> better yet fix them ;-).
>
> -- Keir
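One obvious shape for such a fix, sketched here purely for illustration:
widen the slot's data field to a full 32 bits so 4-byte ops no longer lose
their upper half, accept that each slot grows to 8 bytes (with the usual
bit-field packing), and shrink the slot count so the page still holds the
two ring cursors. Whether this matches the follow-up patch that was actually
submitted is not shown in this thread.

#include <stdint.h>

/* Hypothetical widened slot; names and widths chosen to mirror the fields
 * discussed above, not copied from any particular ioreq.h revision. */
struct buf_ioreq_fixed {
    uint8_t  type;      /* I/O request type                               */
    uint8_t  pad:1;
    uint8_t  dir:1;     /* 1=read, 0=write                                */
    uint8_t  size:2;    /* 0=>1, 1=>2, 2=>4, 3=>8; if 3, use two slots    */
    uint32_t addr:20;   /* physical address / port                        */
    uint32_t data;      /* full 32 bits, so 4-byte ops fit                */
};

/* 511 * 8 + 2 * 4 = 4096: the ring still occupies exactly one page. */
#define IOREQ_BUFFER_SLOT_NUM_FIXED 511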