thr3ads.net - Xen devel - [Xen-devel] early_cpu_init() and identify


Jan Beulich

2007-Jul-13 16:16 UTC

[Xen-devel] early_cpu_init() and identify_cpu()

Is there any reason (other than having things inherited this way from Linux)
that we cannot call identify_cpu() for the boot CPU at the end of
early_cpu_init() rather than explicitly from __start_xen()? And if not, it
would seem reasonable to me to move the two CR4 twiddling pieces out of
__start_xen at the same time.

(I'm not asking because I want to beautify the code, but because I want
the identify to happen earlier: I want to fully set up the VESA console as
early as possible, and there I'd like to be able to set MTRRs, which in
turn depends on identify_cpu() having executed.)

Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2007-Jul-13 16:26 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 13/7/07 17:16, "Jan Beulich" <jbeulich@novell.com> wrote:
> Is there any reason (other than having things inherited this way from Linux)
> that we cannot call identify_cpu() for the boot CPU at the end of
> early_cpu_init() rather than explicitly from __start_xen()? And if not, it
> would seem reasonable to me to at once move the two CR4 twiddling pieces out
> of __start_xen, too.
>
> (I'm not asking because I want to beautify the code, but because I want the
> identify to happen earlier, namely I want to fully set up the VESA console as
> early as possible, but there I'd like to be able to set MTRRs, which in turn
> depends on identify_cpu() having executed.

Isn't it a fairly safe bet that the BIOS will have done this for us and, if
not, that the penalty is a performance loss (probably using WB or UC instead
of WC) rather than a correctness issue? And hence, if we bother to update
the MTRRs at all, then it can at least be left until later in the boot?

 -- Keir



Jan Beulich

2007-Jul-16 06:14 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

>>> Keir Fraser <keir@xensource.com> 13.07.07 18:26 >>>
>On 13/7/07 17:16, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>> Is there any reason (other than having things inherited this way from Linux)
>> that we cannot call identify_cpu() for the boot CPU at the end of
>> early_cpu_init() rather than explicitly from __start_xen()? And if not, it
>> would seem reasonable to me to at once move the two CR4 twiddling pieces out
>> of __start_xen, too.
>>
>> (I'm not asking because I want to beautify the code, but because I want the
>> identify to happen earlier, namely I want to fully set up the VESA console as
>> early as possible, but there I'd like to be able to set MTRRs, which in turn
>> depends on identify_cpu() having executed.
>
>Isn't it a fairly safe bet that the BIOS will have done this for us and, if
>not, that the penalty is a performance loss (probably using WB or UC instead
>of WC) rather than a correctness issue? And hence, if we bother to update
>the MTRRs at all, then it can at least be left until later in the boot?

I've never seen a BIOS set the frame buffer to WC (or anything other than
uncacheable), presumably not least because the address space may get laid out
anew by the OS. And the performance loss in our case is quite significant
(though I assume WC would at best help a little; I'm considering other
approaches, too): scrolling a 1280x1024x16 screen takes on the order of a
second on the test system I'm primarily trying this out on. This is because we
don't want to disturb what dom0 may have written to the screen, and hence we
can't simply redraw. And I'm not certain I want to special-case boot time here
(although I may have no other option: a delay of that order can easily lead to
other failures [like CPUs not properly coming up]).

Of course, I could make the current two-stage vesafb initialization a
three-stage one, doing just the MTRR request in the last stage. But I wanted to
avoid making the code more spaghetti-like than necessary just because of this
little feature...

Jan



Keir Fraser

2007-Jul-16 06:25 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 16/7/07 07:14, "Jan Beulich" <jbeulich@novell.com> wrote:
> I've never seen a BIOS set the frame buffer to WC (or anything other than
> uncachable), presumably not the least because the address space may get
> laid out anew by the OS. And the performance loss in our case is quite
> significant (though I assume WC would at best help a little, I'm considering
> other approaches, too): Scrolling a 1280x1024x16 screen takes, on the test
> system I'm primarily trying this out on, on the order of a second. This is
> because we will want to not disturb what dom0 may have written to the
> screen, and hence we can't simply redraw. And I'm not certain I want to
> special case boot time here (although I may have no other option - delay
> in that order can easily lead to other failures [like CPUs not properly coming
> up]).
>
> Of course, I could make the two stage vesafb initialization as it is right now
> a three stage one, doing just the MTRR request in the last. But I wanted to
> avoid making the code more spaghetti like than necessary just because of
> this little feature...

I'd rather have a later call (but not that late -- before secondary CPUs
come up is fine) than re-order start-of-day cpu detection code. At least in
a first patchset! The MTRR update will still happen before scrolling needs
to occur for the first time, and I don't see that the code will become
spaghetti because of it.

By the way, what makes you think that redrawing the whole screen (presumably
re-pasting text characters one-by-one) would be faster than scrolling?
Sounds slower to me, or is this because read-plus-write of UC memory sucks?

How much faster is scrolling of WC framebuffer on your test system?

 -- Keir




Jan Beulich

2007-Jul-16 06:41 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

>I'd rather have a later call (but not that late -- before secondary CPUs
>come up is fine) than re-order start-of-day cpu detection code. At least in
>a first patchset! The MTRR update will still happen before scrolling needs
>to occur for the first time, and I don't see that the code will become
>spaghetti because of it.

Okay, will do it that way then.

>By the way, what makes you think that redrawing the whole screen (presumably
>re-pasting text characters one-by-one) would be faster than scrolling?
>Sounds slower to me, or is this because read-plus-write of UC memory sucks?

Yes, exactly that (really it's presumably mostly the data dependency of the
writes on the reads; in a write-only scenario that cost is so much smaller).

>How much faster is scrolling of WC framebuffer on your test system?

Haven't tested yet, as I haven't made the adjustments to make WC work so far
(finding why it doesn't work was the last thing I did on Friday)... But as I
said, I don't expect much gain from *just* the attribute change, as reads will
continue to be done UC. As I said, I'm considering alternatives...

Jan



Keir Fraser

2007-Jul-16 06:57 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 16/7/07 07:41, "Jan Beulich" <jbeulich@novell.com> wrote:
>> By the way, what makes you think that redrawing the whole screen (presumably
>> re-pasting text characters one-by-one) would be faster than scrolling?
>> Sounds slower to me, or is this because read-plus-write of UC memory sucks?
>
> Yes, exactly that (really its presumably mostly the data dependency of the
> writes on the reads, which in a write-only scenario is so much smaller).
>
>> How much faster is scrolling of WC framebuffer on your test system?
>
> Haven't tested yet, as I haven't made the adjustments to make WC work so far
> (finding why it doesn't work was the last thing I did on Friday)... But as I
> said, I don't expect much gain from *just* the attribute change, as reads will
> continue to be done UC. As I said, I'm considering alternatives...

Yes, I'd be surprised if WC is that much of a win, since Linux certainly
doesn't appear to mess with MTRRs by default.

The best option would be to re-draw until dom0 starts to boot. After that
switch to scroll, but from that point on Xen doesn't write much to the
console anyway. Supporting both ways doesn't sound that hard, and there's
already a console_endboot() hook to trigger the switch in behaviour.
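A minimal sketch of that boot-time switch, assuming the console keeps a RAM shadow copy of the text: before the switch each new line is shown by rewriting the framebuffer from the shadow copy (writes only); afterwards the framebuffer contents are moved up a row, which means reading video memory. The tiny arrays standing in for the framebuffer and the `vesa_endboot()` and `new_line()` names are hypothetical; only console_endboot() comes from the discussion, and this is not how Xen's console code is actually laid out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical tiny text console: ROWS x COLS character cells. */
#define ROWS 4
#define COLS 8

static char vram[ROWS][COLS];     /* stands in for the (uncached) framebuffer */
static char shadow[ROWS][COLS];   /* RAM copy of the text, owned by the console */
static unsigned long vram_reads;  /* framebuffer reads issued so far */
static bool redraw_mode = true;   /* re-draw while Xen itself is booting */

/* Hypothetical helper, meant to be called from console_endboot(). */
static void vesa_endboot(void)
{
    redraw_mode = false;
}

/* Append one line of text at the bottom, pushing the rest up by a row. */
static void new_line(const char *text)
{
    size_t len;

    assert(text != NULL);
    len = strlen(text);
    if (len > COLS)
        len = COLS;

    /* The shadow copy always scrolls in RAM, which is cheap. */
    memmove(shadow, shadow + 1, (ROWS - 1) * sizeof(shadow[0]));
    memset(shadow[ROWS - 1], ' ', COLS);
    memcpy(shadow[ROWS - 1], text, len);

    if (redraw_mode) {
        /* Re-draw: write every cell from the shadow copy, never read VRAM. */
        memcpy(vram, shadow, sizeof(vram));
        return;
    }

    /* Scroll: move framebuffer contents up a row (one read per cell). */
    for (int r = 0; r < ROWS - 1; r++) {
        for (int c = 0; c < COLS; c++) {
            vram[r][c] = vram[r + 1][c];
            vram_reads++;
        }
    }
    memcpy(vram[ROWS - 1], shadow[ROWS - 1], COLS);
}
```

Both paths leave the same picture on screen; they differ only in whether the framebuffer has to be read.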

But if Linux re-draws the whole screen instead of scrolling, won't Xen's
output get overwritten anyway? That would limit the usefulness of 'vga=keep',
except for crash dumps after which Linux dom0 will not print any more (I
suppose that is the most useful case though). And if Linux *does* scroll,
how come *its* performance doesn't suck?

 -- Keir


Keir Fraser

2007-Jul-16 07:01 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 16/7/07 07:57, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:
> But if Linux re-draws the whole screen instead of scrolling, won't Xen's
> output get overwritten anyway? That would limit 'vga=keep's usefulness,
> except for crash dumps after which Linux dom0 will not print any more (I
> suppose that is the main most useful case though).

If you only cared about crash dumps you could just always use re-draw
behaviour, disable Xen output after console_endboot(), but always
*re-enable* output, at coordinate (0,0) top-left of screen, on any call to
start_sync_console(). That usually indicates something important is being
printed. :-) Starting at top-left means you do not obliterate the most
recent dom0 output.

Would that be acceptable?
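A minimal sketch of that crash-dump-oriented policy, assuming hypothetical hooks called from console_endboot() and start_sync_console(). The state variables and `vga_*` names are illustrative only; line wrapping and the actual drawing are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of Xen's own text output on the VESA console. */
static bool vga_enabled = true;
static unsigned int cur_row, cur_col;

/* Hypothetical hook run from console_endboot(): leave the screen to dom0. */
static void vga_endboot(void)
{
    vga_enabled = false;
}

/*
 * Hypothetical hook run from start_sync_console(): something important
 * (typically a crash dump) follows, so resume output at the top-left corner
 * rather than at the bottom, where the latest dom0 output usually is.
 */
static void vga_start_sync(void)
{
    vga_enabled = true;
    cur_row = 0;
    cur_col = 0;
}

/* Returns true if the character would be drawn (drawing itself omitted). */
static bool vga_putc(char c)
{
    if (!vga_enabled)
        return false;
    if (c == '\n') {
        cur_row++;
        cur_col = 0;
    } else {
        cur_col++;
    }
    return true;
}
```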

 -- Keir




Jan Beulich

2007-Jul-16 07:30 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

>The best option would be to re-draw until dom0 starts to boot. After that
>switch to scroll, but from that point on Xen doesn't write much to the
>console anyway. Supporting both ways doesn't sound that hard, and there's

... depending on the log level. I found it quite helpful to force loglvl=all,
and there's certainly stuff being printed with that.

>already a console_endboot() hook to trigger the switch in behaviour.

Sure.

>But if Linux re-draws the whole screen instead of scrolling, won't Xen's
>output get overwritten anyway? That would limit 'vga=keep's usefulness,

Yes, it does. But there's no way to avoid that other than really establishing
co-operation between Xen and dom0, which I don't think would be upstreamable.

>except for crash dumps after which Linux dom0 will not print any more (I
>suppose that is the main most useful case though). And if Linux *does*

Yes, that's certainly the main intention (after all, you have to force keeping
console output in the first place in order to get into that situation).

>scroll, how come *its* performance doesn't suck?

Again, scrolling by redrawing doesn't suck severely; only moving video memory
from one place to another does.

Jan



Jan Beulich

2007-Jul-16 07:31 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 16.07.07 09:01 >>>
>On 16/7/07 07:57, "Keir Fraser" <Keir.Fraser@cl.cam.ac.uk> wrote:
>
>> But if Linux re-draws the whole screen instead of scrolling, won't Xen's
>> output get overwritten anyway? That would limit 'vga=keep's usefulness,
>> except for crash dumps after which Linux dom0 will not print any more (I
>> suppose that is the main most useful case though).
>
>If you only cared about crash dumps you could just always use re-draw
>behaviour, disable Xen output after console_endboot(), but always
>*re-enable* output, at coordinate (0,0) top-left of screen, on any call to
>start_sync_console(). That usually indicates something important is being
>printed. :-) Starting at top-left means you do not obliterate the most
>recent dom0 output.
>
>Would that be acceptable?

It would be an option, but not my preferred solution.

Jan



Jan Beulich

2007-Jul-16 08:18 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

>How much faster is scrolling of WC framebuffer on your test system?

That's a win of about 30%, and using movsq in memcpy() is another win of
about 50%. Will probably create a vidmemmove()...
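A hedged sketch of what such a vidmemmove() might look like in portable C: it moves video memory in 64-bit units, the access width movsq uses, so the same number of bytes needs fewer individual uncached accesses than a narrower copy. The signature and alignment assumptions (n a multiple of 8, dst and src 8-byte aligned, as whole scanlines usually are, e.g. 2560 bytes at 1280x1024x16) are illustrative, not the eventual Xen code; a real implementation might use `rep movsq` directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical vidmemmove(): memmove() for video memory using 64-bit
 * accesses, so each (uncached) read fetches 8 bytes at once.
 * Assumes n is a multiple of 8 and dst/src are 8-byte aligned.
 */
static void vidmemmove(void *dst, const void *src, size_t n)
{
    volatile uint64_t *d = dst;
    const volatile uint64_t *s = src;
    size_t words = n / sizeof(uint64_t);

    assert(n % sizeof(uint64_t) == 0);

    if ((uintptr_t)dst < (uintptr_t)src) {
        /* Scrolling up (dst below src): copy forwards so overlap is safe. */
        for (size_t i = 0; i < words; i++)
            d[i] = s[i];
    } else {
        /* dst above src: copy backwards so overlap is safe. */
        while (words-- > 0)
            d[words] = s[words];
    }
}
```

For a scroll by one text line, dst would be the start of the visible area and src the start of the second text line, with n covering everything below the first line.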

Jan



Alan Cox

2007-Jul-16 09:23 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

> significant (though I assume WC would at best help a little, I'm considering
> other approaches, too): Scrolling a 1280x1024x16 screen takes, on the test
> system I'm primarily trying this out on, on the order of a second. This is

A lot of work on this has been done in X by the X and GNOME developers,
including some routines which use the largest possible aligned loads to
reduce the big cost (which is PCI reads). Having write-combining video
memory can help (but some cards also put things like command queues there,
which X cannot map WC).

It is usually much cheaper to simply rewrite a text display onto a
framebuffer than to scroll it, as you avoid any PCI read traffic and so get
streaming WC writes (or async SSE writes even without WC). Even
more so when you skip unchanged quadwords.
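A minimal sketch of the "skip unchanged quadwords" idea, assuming the console renders each new frame into RAM and also keeps a RAM copy of what it last wrote, so the framebuffer is only ever written and never read. The function name and buffer layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical redraw helper: 'next' is the freshly rendered screen in RAM,
 * 'shown' is a RAM copy of what was last written to the framebuffer, so 'fb'
 * is never read. Only 64-bit words that differ are written.
 * Returns the number of framebuffer writes performed.
 */
static size_t redraw_changed(volatile uint64_t *fb, uint64_t *shown,
                             const uint64_t *next, size_t words)
{
    size_t writes = 0;

    assert(fb != NULL && shown != NULL && next != NULL);
    for (size_t i = 0; i < words; i++) {
        if (shown[i] != next[i]) {
            fb[i] = next[i];      /* write-only access to video memory */
            shown[i] = next[i];
            writes++;
        }
    }
    return writes;
}
```

The comparison costs RAM reads only; on a text console much of the screen (background, unchanged lines) stays identical between frames, which is where the skipping pays off.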

Alan


Keir Fraser

2007-Jul-17 07:50 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 16/7/07 09:18, "Jan Beulich" <jbeulich@novell.com> wrote:
>> How much faster is scrolling of WC framebuffer on your test system?
>
> That's a win of about 30%, and using movsq in memcpy() is another win of
> about 50%. Will probably create a vidmemmove()...

That still totally sucks then. Paging a full screen of text of, say 60
lines, done line-by-line will still take, say 60*300ms == 20 seconds(ish).

Implement this as a command-line option only if you must have it.

 -- Keir



Keir Fraser

2007-Jul-17 07:55 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 16/7/07 10:23, "Alan Cox" <alan@lxorguk.ukuu.org.uk> wrote:
>> significant (though I assume WC would at best help a little, I'm considering
>> other approaches, too): Scrolling a 1280x1024x16 screen takes, on the test
>> system I'm primarily trying this out on, on the order of a second. This is
>
> A lot of work on this has been done in X by the X and Gnome developers
> including producing some routines which use the largest possible aligned
> loads to reduce the big cost (which is PCI reads). Having a write
> combining video memory can help (but some cards also put stuff like
> command queues there which X cannot WC).

Does this mean that defaulting to setting up WC on a power-of-two-sized
region starting at the framebuffer address is not really safe on a fair
number of systems?
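For reference, a variable-range MTRR covers a naturally aligned power-of-two region of at least 4 KiB. The sketch below (hypothetical helper name and inputs) computes the smallest such region covering [base, base + size); anything else that falls inside that region, such as the command queues Alan mentions, would become WC as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one variable-range MTRR's coverage. */
struct mtrr_region {
    uint64_t base;
    uint64_t size;
};

/*
 * Smallest naturally aligned power-of-two region (minimum 4 KiB) that
 * contains [base, base + size). Overflow near the top of the address
 * space is ignored in this sketch.
 */
static struct mtrr_region mtrr_cover(uint64_t base, uint64_t size)
{
    uint64_t sz = 0x1000;   /* MTRR minimum granularity: 4 KiB */

    assert(size > 0);
    /* Grow until one aligned block of sz bytes reaches the end of the range. */
    while ((base & ~(sz - 1)) + sz < base + size)
        sz <<= 1;
    return (struct mtrr_region){ base & ~(sz - 1), sz };
}
```

For example, a 2 MiB framebuffer at a (hypothetical) address of 0xE0300000 needs an 8 MiB MTRR starting at 0xE0000000, so 6 MiB of neighbouring address space is swept in.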

 -- Keir



Jan Beulich

2007-Jul-17 09:58 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

>>> Keir Fraser <keir@xensource.com> 17.07.07 09:50 >>>
>On 16/7/07 09:18, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>>> How much faster is scrolling of WC framebuffer on your test system?
>>
>> That's a win of about 30%, and using movsq in memcpy() is another win of
>> about 50%. Will probably create a vidmemmove()...
>
>That still totally sucks then. Paging a full screen of text of, say 60
>lines, done line-by-line will still take, say 60*300ms == 20 seconds(ish).

No, you got me wrong: by 'scrolling by one line' I mean scrolling the entire
screen up by a line. It's about half as fast as Linux during boot now, and
visibly slower post-boot.

Jan



Jan Beulich

2007-Jul-17 10:00 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

>>> Keir Fraser <keir@xensource.com> 17.07.07 09:55 >>>
>On 16/7/07 10:23, "Alan Cox" <alan@lxorguk.ukuu.org.uk> wrote:
>
>>> significant (though I assume WC would at best help a little, I'm considering
>>> other approaches, too): Scrolling a 1280x1024x16 screen takes, on the test
>>> system I'm primarily trying this out on, on the order of a second. This is
>>
>> A lot of work on this has been done in X by the X and Gnome developers
>> including producing some routines which use the largest possible aligned
>> loads to reduce the big cost (which is PCI reads). Having a write
>> combining video memory can help (but some cards also put stuff like
>> command queues there which X cannot WC).
>
>Does this mean that defaulting to setting up WC on a power-of-two-sized
>region starting at the framebuffer address is not really safe on a fair
>number of systems?

I suppose so, also based on the fact that Linux is defensive here too,
defaulting to not touching the MTRRs at all. I have now implemented it in a
similar fashion for Xen.

Jan



Keir Fraser

2007-Jul-17 10:05 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 17/7/07 10:58, "Jan Beulich" <jbeulich@novell.com> wrote:
>>> That's a win of about 30%, and using movsq in memcpy() is another win of
>>> about 50%. Will probably create a vidmemmove()...
>>
>> That still totally sucks then. Paging a full screen of text of, say 60
>> lines, done line-by-line will still take, say 60*300ms == 20 seconds(ish).
>
> No, you got me wrong - with 'scrolling by one line' I mean the scrolling the
> entire screen up by a line. It's about half as fast as Linux during boot now,
> visibly slower post-boot.

Re-draw is going to be the default, even if I have to implement it on top of
your patch. :-)

The fact that dom0 will overwrite Xen's output anyway as it re-draws seems
to make scrolling-as-default the inferior option because it's significantly
slower and can provide benefit really only for crash dumps (where a re-draw
based scheme can work just as well).

So I don't understand your attachment to scrolling.

 -- Keir



Keir Fraser

2007-Jul-17 10:15 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 17/7/07 11:00, "Jan Beulich" <jbeulich@novell.com> wrote:
>> Does this mean that defaulting to setting up WC on a power-of-two-sized
>> region starting at the framebuffer address is not really safe on a fair
>> number of systems?
>
> So I suppose, also based on the fact that Linux is defensive here too in
> defaulting to not touching the MTRRs at all. I implemented this in a similar
> fashion for Xen now.

How about remapping the framebuffer specifying WC in the PAT bits, if the
CPU is detected to support PAT? This might work more safely because we can
map at 4kB granularity rather than merely power-of-two. It depends on how
close the command queues actually are to the lfb in the VRAM map.

If we just mapped the 4kB-rounded region specified by lfb_base to
lfb_base+lfb_size (as determined via the VBE Get Mode Info call) as WC,
would that be safe?
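A minimal sketch of the 4 kB rounding described above: it computes the page frames that would be mapped WC, from lfb_base and lfb_size. The helper name is hypothetical, and selecting the PAT entry and building the page-table entries are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define LFB_PAGE_SIZE 0x1000ULL   /* 4 KiB: the granularity of a PAT-typed mapping */

/* First and one-past-last page frame covering [lfb_base, lfb_base + lfb_size). */
static void lfb_page_range(uint64_t lfb_base, uint64_t lfb_size,
                           uint64_t *first_pfn, uint64_t *end_pfn)
{
    assert(lfb_size > 0);
    *first_pfn = lfb_base / LFB_PAGE_SIZE;                                  /* round down */
    *end_pfn = (lfb_base + lfb_size + LFB_PAGE_SIZE - 1) / LFB_PAGE_SIZE;   /* round up */
}
```

For a 1280x1024x16 framebuffer (0x280000 bytes) at a page-aligned base this yields exactly 640 pages, whereas an MTRR would have to cover the next power of two, 4 MiB.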

If so we could use that unconditionally and avoid any MTRR-poking code. PAT
has been around for ages now.

 -- Keir



Keir Fraser

2007-Jul-17 10:23 UTC


Re: [Xen-devel] early_cpu_init() and identify_cpu()

On 17/7/07 11:15, "Keir Fraser" <keir@xensource.com> wrote:
> If we just mapped the 4kB-rounded region specified by lfb_base to
> lfb_base+lfb_size (as determined via the VBE Get Mode Info call) as WC,
> would that be safe?
Actually this is probably okay because any software that does want to access
command queues (e.g., X server) will have its own mapping that will not
specify PAT.WC. In contrast MTRR type specification is used by all mappings
of that physical address range. So long as aliasing of WC and UC was not a
problem, this would work fine. But I think it's WC aliasing with WB/WT that
is a problem.

 -- Keir


