thr3ads.net - Xen devel - [Xen-devel] xen.git branch reorg [Apr 2009]

Jeremy Fitzhardinge

2009-Apr-23 17:38 UTC

[Xen-devel] xen.git branch reorg

I finally fixed the AHCI problem with xen-tip/next and pushed forward 
with the long-threatened xen.git cleanup and reorg.

I've removed a pile of branches on 
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git:

    * xen/*
    * push2/*
    * for-*/*

aside from some branches which contain some work which I need to look 
over again and work out what to do with.

*PLEASE* tell me if I've accidentally deleted a branch with something 
important, and I'll reinstate it.

All the changesets are still there in the repo, and if you have any 
local branches referring to these branches then they'll stay around 
indefinitely.  The removal just means that the branches won't confuse 
any newcomers, and it makes it clear that no further development is 
going to happen on them.

The new branch structure is similar to the old one in overall layout.  
There are two "merged" branches:

    * xen-tip/master - will try to keep as a known-working branch, with
      only tested changes
    * xen-tip/next - current bleeding edge; should at least compile

My planned workflow is:

   1. new development happens on topic branches
   2. those changes are merged with xen-tip/next until they test OK
   3. the changes are then merged onto master (either directly off next,
      or cleanly re-merged)
   4. upstream branches are merged with next and master like topic
      branches; I'll avoid merging them into xen.git topic branches
      unless it's really necessary
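
The workflow above can be tried out in a throwaway local repository. The
following is only a sketch: the topic branch name xen-tip/example-topic, the
file notes.txt and the commit messages are made up for illustration, and
nothing here touches the real xen.git.

```shell
#!/bin/sh
# Local demonstration of the topic -> next -> master flow in a scratch repo.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.org"
# Name the initial branch xen-tip/master regardless of the local default.
git symbolic-ref HEAD refs/heads/xen-tip/master

echo "base" > notes.txt
git add notes.txt
git commit -q -m "base commit"
git branch xen-tip/next

# 1. New development happens on a topic branch (hypothetical name).
git checkout -q -b xen-tip/example-topic
echo "topic change" >> notes.txt
git commit -q -a -m "example topic change"

# 2. Merge the topic into next and test it there.
git checkout -q xen-tip/next
git merge -q --no-ff -m "Merge example-topic into next" xen-tip/example-topic

# 3. Once it tests OK, merge the same topic onto master.
git checkout -q xen-tip/master
git merge -q --no-ff -m "Merge example-topic into master" xen-tip/example-topic

# Show the resulting history of master.
git log --oneline --graph xen-tip/master
```

Merging the topic branch itself into both next and master corresponds to the
"cleanly re-merged" option in step 3; merging next into master would be the
"directly off next" option.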

I won't generally rebase any of the branches, though the "next" and
"master" are more likely to be rebased than the topic branches.

The current set of topic branches is:

    * xen-tip/core
          o core Xen stuff; currently all upstream
    * xen-tip/dom0/acpi
          o host S3 suspend/resume (untested, unmerged)
    * xen-tip/dom0/apic
          o apic changes
    * xen-tip/dom0/backend/core
    * xen-tip/dom0/backend/blkback
    * xen-tip/dom0/backend/netback
          o backend devices
    * xen-tip/dom0/core
          o essential dom0 changes
    * xen-tip/dom0/drm
          o drm/dri changes
    * xen-tip/dom0/gntdev
          o /dev/gntdev
    * xen-tip/dom0/microcode
          o CPU microcode driver
    * xen-tip/dom0/mtrr
          o /proc/mtrr stuff
    * xen-tip/dom0/pci
          o general dom0 PCI/device access changes
    * xen-tip/dom0/swiotlb
          o Xen swiotlb changes
    * xen-tip/dom0/xenfs
          o /proc/xen/privcmd


Thanks,
    J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Pasi Kärkkäinen

2009-Apr-23 18:24 UTC

Re: [Xen-devel] xen.git branch reorg

On Thu, Apr 23, 2009 at 10:38:31AM -0700, Jeremy Fitzhardinge wrote:
> I finally fixed the AHCI problem with xen-tip/next and pushed forward 
> with the long-threatened xen.git cleanup and reorg.
> 
> The new branch structure is similar to the old one in overall layout.  
> There are two "merged" branches:
> 
>    * xen-tip/master - will try to keep as a known-working branch, with
>      only tested changes
>    * xen-tip/next - current bleeding edge; should at least compile
> 
I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
for me.

Btw how does dom0 upstreaming look at the moment? Ingo sent a pull request
for some changes, and those got merged, but how about the rest?

-- Pasi

Jeremy Fitzhardinge

2009-Apr-23 18:32 UTC

Re: [Xen-devel] xen.git branch reorg

Pasi Kärkkäinen wrote:
> I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
> for me.

Thanks.

> Btw how does dom0 upstreaming look at the moment? Ingo sent pull request
> about some changes, and those got merged, but how about the rest?

Ingo basically ignored all the Xen changes in the leadup to the merge 
window, then stomped on my attempt to get them merged with Linus, which 
was all pretty annoying.  It had the doubly-irritating side-effect of 
casting doubt over the controversy-free domU changes, so they didn't get 
merged in the merge window either; by the time Ingo got around to OKing 
them, the window had closed.  So all that got merged in the end was the 
must-have bug fixes.

Linus isn't going to pull any more major functionality changes in the 
-rc kernels, and certainly isn't going to make an exception for Xen.  So 
we're stuck with waiting for the .31 merge window.

    J

Alex Zeffertt

2009-Apr-24 08:59 UTC

Re: [Xen-devel] xen.git branch reorg

Jeremy Fitzhardinge wrote:
> I finally fixed the AHCI problem with xen-tip/next and pushed forward 
> with the long-threatened xen.git cleanup and reorg.
> 
> I've removed a pile of branches on 
> git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git:
> 
>     * xen/*
>     * push2/*
>     * for-*/*
> 

Does this mean we need this patch in xen-unstable.hg?

Update XEN_LINUX_GIT_REMOTEBRANCH to match the changes made in the upstream repo.
Needed if you want setting KERNELS=linux-2.6-pvops in config/Linux.mk to work.

diff -r 8b152638adaa buildconfigs/mk.linux-2.6-pvops
--- a/buildconfigs/mk.linux-2.6-pvops	Thu Apr 23 16:22:48 2009 +0100
+++ b/buildconfigs/mk.linux-2.6-pvops	Fri Apr 24 09:53:40 2009 +0100
@@ -7,7 +7,7 @@

  XEN_LINUX_GIT_URL ?= git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
  XEN_LINUX_GIT_REMOTENAME ?= xen
-XEN_LINUX_GIT_REMOTEBRANCH ?= xen/dom0/hackery
+XEN_LINUX_GIT_REMOTEBRANCH ?= xen-tip/master

  EXTRAVERSION ?
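
The variables in this file are assigned with ?=, which only takes effect when
the variable is not already set, so a value from the environment or the make
command line should take precedence over the default. A minimal sketch of
that behaviour, using a throwaway Makefile rather than the real
buildconfigs/mk.linux-2.6-pvops (the "show" target and the show_branch helper
are hypothetical and exist only to print the value):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)

# Throwaway Makefile: the same ?= default as the old line in the patch,
# plus a hypothetical "show" target that prints the resulting value.
printf 'XEN_LINUX_GIT_REMOTEBRANCH ?= xen/dom0/hackery\nshow:\n\t@echo $(XEN_LINUX_GIT_REMOTEBRANCH)\n' > "$tmp/Makefile"

# Run the "show" target, passing any extra arguments through to make.
show_branch() {
    (cd "$tmp" && make -s show "$@")
}

# Default taken from the Makefile.
show_branch

# Override on the make command line.
show_branch XEN_LINUX_GIT_REMOTEBRANCH=xen-tip/master

# Override through the environment; ?= leaves an already-set variable alone.
(cd "$tmp" && XEN_LINUX_GIT_REMOTEBRANCH=xen-tip/next make -s show)
```

Whether the xen-unstable build passes such an override down to this file
unchanged has not been checked here; the patch remains the way to change the
default for everyone.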




Boris Derzhavets

2009-Apr-24 10:33 UTC

Re: [Xen-devel] xen.git branch reorg

A kernel built from xen-tip/next reports itself as 2.6.30-rc2-tip and behaves
as usual under Xen 3.4-rc2-pre. I noticed no problems with PV DomUs running
CentOS 5.3, F10 or Ubuntu Server 9.04. However, a remote VNC connection to the
Ubuntu Server 9.04 PV DomU is extremely slow, while a VNC connection to the
same DomU from Dom0 runs fine. I'll try to test this on the CentOS and F10
DomUs ASAP.

I should also note that a remote VNC connection to an Ubuntu 9.04 DomU under
the same Xen 3.4 version, with a Dom0 running Suse's xen-ified 2.6.27.5
kernel, behaves just fine.

An IPv6 connection via vinagre (to the Ubuntu Server 9.04 DomU) behaves
exactly the same way as the old-fashioned connection: no problems when
established from Dom0, and almost dead remotely.

Boris.

Jeremy Fitzhardinge

2009-Apr-24 18:17 UTC

Re: [Xen-devel] xen.git branch reorg

Boris Derzhavets wrote:
> Kernel been built based on xen-tip/next appears to have name 2.6.30-rc2-tip
> and behaves under Xen 3.4-rc2-pre as usual. No problems were noticed with
> PV DomUs for CentOS 5.3, F10, Ubuntu Server 9.04. However, remote VNC
> connection to Ubuntu Server 9.04 PV DomU seems to be extremely slow. VNC
> connection to same DomU from Dom0 runs fine. I'll try to test this issue
> for CentOS and F10 DomUs ASAP.
> I also have to notice that remote VNC connection to Ubuntu 9.04 DomU
> running at the same Xen 3.4 version Dom0 with Suse's 2.6.27.5 xen-ified
> kernel behaves just fine.
> IP6v connection via vinagre (for Ubuntu Server 9.04 DomU) behaves exactly
> same way as old fashioned. No problems when been established from Dom0 and
> almost dead remotely.
>

Is it always consistent with the same kernel, or does it change from 
boot to boot?

Could you try to work out what's actually failing with 
tcpdump/wireshark, both from within the domU, and from dom0?  Are 
packets getting lost on tx or rx, or very delayed, or something else?

Thanks,
    J
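
A capture along the lines Jeremy asks for could be taken roughly as follows.
This is a sketch under assumptions: the interface names (eth0 inside the domU,
vif1.0 for the guest's backend interface in dom0), the output file names and
the VNC port 5900 are guesses that need adjusting to the actual setup. With
DRY_RUN set, the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Capture VNC traffic on one interface into a pcap file.
#   -n    no name resolution
#   -s 0  capture whole packets rather than truncated ones
#   -i    interface to listen on
#   -w    write raw packets to a file for later inspection
capture() {
    iface=$1
    outfile=$2
    run tcpdump -n -s 0 -i "$iface" -w "$outfile" tcp port 5900
}

DRY_RUN=1
capture eth0 domU.pcap        # to be run inside the domU
capture vif1.0 dom0-vif.pcap  # to be run in dom0
```

The resulting files can be read back with "tcpdump -n -vv -r domU.pcap" or
opened in wireshark. Comparing the two sides should show whether packets
leave the domU but never arrive in dom0, arrive late, or are resent. Keep in
mind that with checksum offload enabled, a capture on the sending side
commonly reports incorrect TCP checksums, which on its own is not evidence of
a fault.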


Boris Derzhavets

2009-Apr-24 18:50 UTC

Re: [Xen-devel] xen.git branch reorg

In the meantime I see that 2.6.30-rc3, -rc2 and -rc1-tip are all affected.
The solution is the same as for Linux DomUs on Solaris xVM about a year ago:
disable checksum offloading (nothing else) in the Linux DomUs (CentOS 5.3,
Ubuntu 9.04):

/usr/local/sbin/ethtool -K etho tx off

It immediately brings remote VNC connections into good shape. Actually, it
speeds up the network in general.

I will run tcpdump over the weekend to find out what's going wrong. To be
honest, my only experience of catching checksum offloading failures in
tcpdump captures is on Solaris Nevada xVM ;) But I'll post the captured logs
anyway. Brief instructions on where to run tcpdump (and which command-line
options are needed) would help a lot.

Thanks
Boris




Boris Derzhavets

2009-Apr-24 18:56 UTC

Re: [Xen-devel] xen.git branch reorg

It's also always consistent with each of the rc1, rc2 and rc3 kernels.

Boris.

Jeremy Fitzhardinge

2009-Apr-24 19:48 UTC

Re: [Xen-devel] xen.git branch reorg

Boris Derzhavets wrote:
>
>  In meantime time i see,  that 2.6.30-rc3&rc2&rc1-tip are affected.
> Solution is the same as on Solaris xVM Linux DomUs about one
> year ago -  is  to disable checksum (nothing else) offloading at Linux 
> DomUs ( CentOS 5.3, Ubuntu 9.04)
>
> /usr/local/sbin/ethtool -K etho tx off
>

OK, that's a good lead.

Ian, do you remember the story around this checksumming stuff?  Has 
something dropped off netback (or front) that we need?

> I will run "tcpdump" through this weekend to find out what's going
> wrong. To be honest,  i have experience with catching checksum offloading
> failure via tcpdump's  capturing only on Solaris Nevada xVM ;)
> But, i'll post the logs captured anyway.
> Just a brief instruction where to run tcpdump ( and what command line 
> keys are needed ) would help a lot.
>

Actually, I think that's enough to go on for now.

    J

Christophe Saout

2009-Apr-24 22:39 UTC

Re: [Xen-devel] xen.git branch reorg

Hi Jeremy,

> >  In meantime time i see,  that 2.6.30-rc3&rc2&rc1-tip are affected.
> > Solution is the same as on Solaris xVM Linux DomUs about one
> > year ago -  is  to disable checksum (nothing else) offloading at Linux
> > DomUs ( CentOS 5.3, Ubuntu 9.04)
> >
> > /usr/local/sbin/ethtool -K etho tx off
>
> OK, that's a good lead.

Yes, I've been seeing this too (and meant to investigate it before
claiming there's a bug) and I can confirm that turning off segmentation
offloading "cures" the problem here too.

Now the tcpdump on Dom0 looks interesting.  It repeatedly sees a packet
with 2880 byte from DomU coming in, which is then dropped and ICMP
"fragmentation needed" sent back, the DomU resends a 1440 byte packet
(after some delay), which then goes through, but then the next one is a
2880 byte one again, and so on.

FYI: My Dom0 is running NAT, in case this is relevant.

	Christophe
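
The packet sizes in the capture described above are worth a quick check: 2880
is exactly twice 1440. That would be consistent with the DomU handing over a
single unsegmented packet covering two segments while segmentation offload is
enabled, although this interpretation is an assumption and is not confirmed
anywhere in the thread. The arithmetic, with the two sizes taken from the
message:

```shell
#!/bin/sh
large=2880   # size of the packet that dom0 dropped
small=1440   # size of the resent packet that went through

# How many resent-size segments fit into the dropped packet, and what is left over.
printf 'ratio=%d remainder=%d\n' $(( large / small )) $(( large % small ))
```

A whole-number ratio with no remainder is what one would expect if the large
packet is simply two normal segments that were never split.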



Boris Derzhavets

2009-Apr-25 06:55 UTC

Re: [Xen-devel] xen.git branch reorg

Chris,

I have to apologize: turning off segmentation offloading may work as well.
Just now it fixed the VNC issue on bare metal, CentOS 5.2 with the Attansic
Gigabit Ethernet driver (ASUS P5KR). I'm going to try it in the DomUs at my
earliest convenience.

Boris.

Venefax

2009-Apr-25 07:03 UTC

RE: [Xen-devel] xen.git branch reorg

What are the exact commands to turn off Segmentation Offload and Checksum
Offload?

Also, should this be done only in Dom0 or also in the DomUs?

Is this correct, or should it be "eth0"?

/usr/local/sbin/ethtool -K etho tx off

F.Alves

Pasi Kärkkäinen

2009-Apr-25 08:19 UTC

Re: [Xen-devel] xen.git branch reorg

On Fri, Apr 24, 2009 at 11:55:19PM -0700, Boris Derzhavets wrote:
> Chris,
> 
> I have to apologize. Turning off segmentation offloading may work as well.
> Just now it fixed VNC issue on bare metal - CentOS 5.2 with Atansic Gigabit
> Ethernet driver (ASUS P5KR) . I gonna try it at DomUs at my earliest
> convenience.
>

Have you tried other NICs? It could be a bug in the driver for that NIC.

-- Pasi

Boris Derzhavets

2009-Apr-25 08:35 UTC

RE: [Xen-devel] xen.git branch reorg

I tried the following in the DomU:

1. Disable TX checksumming:
# /usr/local/sbin/ethtool -K eth0 tx off

2. Disable TCP segmentation offload:
# /usr/local/sbin/ethtool -K eth0 tso off

3. Disable UDP fragmentation offload:
# /usr/local/sbin/ethtool -K eth0 ufo off

The argument after -K is your Ethernet interface; it might be eth1 or eth2.
Just install ethtool-6 to have "man ethtool" handy.
You can try each of them in turn (first, second, third) and keep whichever
provides relief.

I had to run the second one to bring up VNC on the CentOS 5.2 Dom0
(on bare metal, ASUS P5KR).
I believe the Linux driver for Attansic Gigabit Ethernet is not good.
For DomU access from the net and vice versa, the first one usually
helps (in the DomU).
I don't know what exactly Chris did. The "gso" option fails.
Some network education is obviously required, for myself at least ;)
To read and understand tcpdump's captured logs, for instance.

Boris.
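
The workaround discussed in this thread can be wrapped in a small script. The
following is a sketch only: it assumes ethtool is on the PATH (the thread uses
/usr/local/sbin/ethtool), that the guest interface is eth0, and that turning
off tx and tso is what is wanted; the helper names run and disable_offloads
are made up. With DRY_RUN set, the commands are printed rather than executed.

```shell
#!/bin/sh
# Print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current offload settings, then switch off the two features
# that were reported in the thread to restore remote VNC performance.
disable_offloads() {
    iface=$1
    run ethtool -k "$iface"                   # lower-case -k: query settings
    for feature in tx tso; do
        run ethtool -K "$iface" "$feature" off  # upper-case -K: change a setting
    done
}

DRY_RUN=1
disable_offloads eth0
```

Settings changed with ethtool -K normally do not survive a reboot, so the
commands would have to be reapplied from the guest's network configuration if
the workaround is to stick.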

Boris Derzhavets

2009-Apr-25 08:58 UTC

Re: [Xen-devel] xen.git branch reorg

Pasi,

The most recent boards have integrated Gigabit Ethernet adapters on
PCI-E as usual. My preference for Linux is the Marvell Yukon 8056 PCI-E.
It's the customer's decision what board to purchase and what OS to install.
I am just supposed to fix the bug, or else leave the customer on his own.

Boris

Boris Derzhavets

2009-Apr-25 09:22 UTC

Re: [Xen-devel] xen.git branch reorg

I've just tested:

# /usr/local/sbin/ethtool -K eth0 tso off

in an Ubuntu 9.04 Server PV DomU under a Xen 3.4-rc3-pre Dom0 (with 2.6.30-rc3-tip).
This command also fixes the performance of remote VNC connections to the DomU.

Boris.

Pasi Kärkkäinen

2009-Apr-25 11:54 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Thu, Apr 23, 2009 at 11:32:14AM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
> >for me. 
> >  
> 
> Thanks.
> 

It seems the latest tree crashes for me (it happened yesterday, and again
today after I rebuilt with the latest commits):
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt

Zone PFN ranges:
  DMA      0x00000010 -> 0x00001000
  Normal   0x00001000 -> 0x000229fe
  HighMem  0x000229fe -> 0x00040000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0: 0x00000010 -> 0x0000009f
    0: 0x00000100 -> 0x00001167
    0: 0x00001268 -> 0x00040000
(XEN) d0:v0: unhandled page fault (ec=0003)
(XEN) Pagetable walk from c1268000:
(XEN)  L3[0x003] = 000000003c8f0001 000008f0
(XEN)  L2[0x009] = 000000003d276067 00001276 
(XEN)  L1[0x068] = 000000003d268061 00001268
(XEN) domain_crash_sync called from entry.S (ff19f70e)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.3.1-11.fc11  x86_32p  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) EIP:    e019:[<c088adae>]
(XEN) EFLAGS: 00000206   EM: 1   CONTEXT: pv guest
(XEN) eax: 00000000   ebx: 00800000   ecx: 00200000   edx: c1268000
(XEN) esi: 01268000   edi: c1268000   ebp: c086de5c   esp: c086de1c
(XEN) cr0: 8005003b   cr4: 000006f0   cr3: 3c85c000   cr2: c1268000
(XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
(XEN) Guest stack trace from esp=c086de1c:

[root@dom0test linux-2.6-xen]# gdb ./vmlinux
(gdb) x/i 0xc086de1c
0xc086de1c <init_thread_union+3612>:    add    %al,(%eax)

32b PAE dom0, 32b PAE hypervisor. 
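
The indices in the page-table walk above can be reproduced from the faulting address. The sketch below is an illustration added for readers, not part of the original report. It assumes the standard x86 PAE split (2-bit L3 index, 9-bit L2 and L1 indices, 12-bit page offset) and, for the PFN calculation, the default PAGE_OFFSET of 0xc0000000; the function names are made up for this example.

```sh
#!/bin/sh
# Split a 32-bit PAE virtual address into its page-table indices.
# PAE layout: 2 bits L3 index, 9 bits L2, 9 bits L1, 12 bits page offset.
pae_indices() {
    va=$(( $1 ))
    printf 'L3[0x%03x] L2[0x%03x] L1[0x%03x] offset=0x%03x\n' \
        $(( (va >> 30) & 0x3 )) \
        $(( (va >> 21) & 0x1ff )) \
        $(( (va >> 12) & 0x1ff )) \
        $(( va & 0xfff ))
}

# Guest pseudo-physical frame number behind a lowmem kernel address.
# ASSUMPTION: PAGE_OFFSET is 0xc0000000 (the default 3G/1G split).
lowmem_pfn() {
    printf '0x%08x\n' $(( ($1 - 0xc0000000) >> 12 ))
}

pae_indices 0xc1268000   # prints: L3[0x003] L2[0x009] L1[0x068] offset=0x000
lowmem_pfn 0xc1268000    # prints: 0x00001268
```

The indices match the walk printed by Xen, and PFN 0x1268 is the first frame of the third active range listed in early_node_map.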

> >Btw how does dom0 upstreaming look at the moment? Ingo sent pull request
> >about some changes, and those got merged, but how about the rest? 
> 
> Ingo basically ignored all the Xen changes in the leadup to the merge 
> window, then stomped on my attempt to get them merged with Linus, which 
> was all pretty annoying.  It had the doubly-irritating side-effect of 
> casting doubt over the controversy-free domU changes, so they didn't get
> merged in the merge window either; by the time Ingo got around to OKing 
> them, the window had closed.  So all that got merged in the end was the 
> must-have bug fixes.
> 
Yeah.. :( I really hope next merge window will be better..

> Linus isn't going to pull any more major functionality changes in the
> -rc kernels, and certainly isn't going to make an exception for Xen.  So
> we're stuck with waiting for the .31 merge window.
> 

I think it might be a good idea to start sending the patches already now, so
they get enough review before the .31 merge window.. 

-- Pasi


Pasi Kärkkäinen

2009-Apr-25 12:36 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Sat, Apr 25, 2009 at 02:54:04PM +0300, Pasi Kärkkäinen wrote:
> On Thu, Apr 23, 2009 at 11:32:14AM -0700, Jeremy Fitzhardinge wrote:
> > Pasi Kärkkäinen wrote:
> > >I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
> > >for me. 
> > >  
> > 
> > Thanks.
> > 
> 
> It seems latest tree crashes for me (happened yesterday, and also today,
> just rebuilt the latest commits):
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt
> 

The same kernel seems to boot and work OK on bare metal without Xen.

-- Pasi

M A Young

2009-Apr-25 13:59 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Sat, 25 Apr 2009, Pasi Kärkkäinen wrote:
> It seems latest tree crashes for me (happened yesterday, and also today, 
> just rebuilt the latest commits): 
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt

What are your STACKPROTECTOR settings?

 	Michael Young


Jeremy Fitzhardinge

2009-Apr-25 23:34 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Pasi Kärkkäinen wrote:
> On Thu, Apr 23, 2009 at 11:32:14AM -0700, Jeremy Fitzhardinge wrote:
>> Pasi Kärkkäinen wrote:
>>> I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
>>> for me. 
>> Thanks.
>
> It seems latest tree crashes for me (happened yesterday, and also today,
> just rebuilt the latest commits):
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt
>   

This is xen-tip/next?  Does master work any better?  How about running 
as domU?

Thanks,
    J

William Pitcock

2009-Apr-26 01:28 UTC

Re: [Xen-devel] xen.git branch reorg

Cloning git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
gives me the following:

nenolod@petrie:~/dev-src$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-xen-next
Initialized empty Git repository in /home/nenolod/dev-src/linux-xen-next/.git/
remote: Counting objects: 1299853, done.
remote: Compressing objects: 100% (227847/227847), done.
remote: Total 1299853 (delta 1088065), reused 1275536 (delta 1064482)
Receiving objects: 100% (1299853/1299853), 306.99 MiB | 411 KiB/s, done.
Resolving deltas: 100% (1088065/1088065), done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.

Something seems wrong here.

William


Pasi Kärkkäinen

2009-Apr-26 14:50 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Sat, Apr 25, 2009 at 02:59:09PM +0100, M A Young wrote:
> On Sat, 25 Apr 2009, Pasi Kärkkäinen wrote:
> 
> >It seems latest tree crashes for me (happened yesterday, and also today,
> >just rebuilt the latest commits): 
> >http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt
> 
> What are your STACKPROTECTOR settings?
> 

[root@dom0test linux-2.6-xen]# grep -i protector .config
# CONFIG_CC_STACKPROTECTOR is not set

-- Pasi


Pasi Kärkkäinen

2009-Apr-26 14:51 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Sat, Apr 25, 2009 at 04:34:00PM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >On Thu, Apr 23, 2009 at 11:32:14AM -0700, Jeremy Fitzhardinge wrote:
> >>Pasi Kärkkäinen wrote:
> >>>I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
> >>>for me. 
> >>Thanks.
> >
> >It seems latest tree crashes for me (happened yesterday, and also today,
> >just rebuilt the latest commits):
> >http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt
> 
> This is xen-tip/next?  Does master work any better?  How about running 
> as domU?
> 

Yep, it's xen-tip/next. I'll try xen-tip/master next.. 

-- Pasi

Pasi Kärkkäinen

2009-Apr-26 18:38 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Sun, Apr 26, 2009 at 05:51:08PM +0300, Pasi Kärkkäinen wrote:
> On Sat, Apr 25, 2009 at 04:34:00PM -0700, Jeremy Fitzhardinge wrote:
> > Pasi Kärkkäinen wrote:
> > >On Thu, Apr 23, 2009 at 11:32:14AM -0700, Jeremy Fitzhardinge wrote:
> > >>Pasi Kärkkäinen wrote:
> > >>>I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
> > >>>for me. 
> > >>Thanks.
> > >
> > >It seems latest tree crashes for me (happened yesterday, and also today,
> > >just rebuilt the latest commits):
> > >http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt
> > 
> > This is xen-tip/next?  Does master work any better?  How about running
> > as domU?
> > 
> 
> Yep, it's xen-tip/next. I'll try xen-tip/master next.. 
>

xen-tip/master crashes as well:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-27-xen331-linux-2.6.30-rc3-master-crash-no-highpte.txt

Zone PFN ranges:
  DMA      0x00000010 -> 0x00001000
  Normal   0x00001000 -> 0x000229fe
  HighMem  0x000229fe -> 0x00040000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0: 0x00000010 -> 0x0000009f
    0: 0x00000100 -> 0x00001165
    0: 0x00001266 -> 0x00040000
(XEN) d0:v0: unhandled page fault (ec=0003)
(XEN) Pagetable walk from c1266000:
(XEN)  L3[0x003] = 000000003c8ee001 000008ee
(XEN)  L2[0x009] = 000000003d274067 00001274 
(XEN)  L1[0x066] = 000000003d266061 00001266
(XEN) domain_crash_sync called from entry.S (ff19f70e)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.3.1-11.fc11  x86_32p  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) EIP:    e019:[<c0888d74>]
(XEN) EFLAGS: 00000206   EM: 1   CONTEXT: pv guest
(XEN) eax: 00000000   ebx: 00800000   ecx: 00200000   edx: c1266000
(XEN) esi: 01266000   edi: c1266000   ebp: c086be5c   esp: c086be1c
(XEN) cr0: 8005003b   cr4: 000006f0   cr3: 3c85a000   cr2: c1266000
(XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
(XEN) Guest stack trace from esp=c086be1c:

[root@dom0test linux-2.6-xen]# gdb ./vmlinux
(gdb) x/i 0xc086be1c
0xc086be1c <init_thread_union+3612>:    add    %al,(%eax)

-- Pasi


Ian Campbell

2009-Apr-27 15:44 UTC

Re: [Xen-devel] xen.git branch reorg

On Fri, 2009-04-24 at 04:59 -0400, Alex Zeffertt wrote:
> Jeremy Fitzhardinge wrote:
> > I finally fixed the AHCI problem with xen-tip/next and pushed forward 
> > with the long-threatened xen.git cleanup and reorg.
> > 
> > I've removed a pile of branches on 
> > git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git:
> > 
> >     * xen/*
> >     * push2/*
> >     * for-*/*
> > 
> 
> 
> Does this mean we need this patch in xen-unstable.hg?

I think so yes.

> Update XEN_LINUX_GIT_REMOTEBRANCH to match changes made in upstream repo.
> Needed if you want setting KERNELS=linux-2.6-pvops in config/Linux.mk to
> work.

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> 
> diff -r 8b152638adaa buildconfigs/mk.linux-2.6-pvops
> --- a/buildconfigs/mk.linux-2.6-pvops	Thu Apr 23 16:22:48 2009 +0100
> +++ b/buildconfigs/mk.linux-2.6-pvops	Fri Apr 24 09:53:40 2009 +0100
> @@ -7,7 +7,7 @@
> 
>   XEN_LINUX_GIT_URL ?= git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
>   XEN_LINUX_GIT_REMOTENAME ?= xen
> -XEN_LINUX_GIT_REMOTEBRANCH ?= xen/dom0/hackery
> +XEN_LINUX_GIT_REMOTEBRANCH ?= xen-tip/master
> 
>   EXTRAVERSION ?=
> 
> 
> 
> 

Jeremy Fitzhardinge

2009-Apr-27 19:33 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Pasi Kärkkäinen wrote:
> On Thu, Apr 23, 2009 at 11:32:14AM -0700, Jeremy Fitzhardinge wrote:
>> Pasi Kärkkäinen wrote:
>>> I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
>>> for me. 
>> Thanks.
>
> It seems latest tree crashes for me (happened yesterday, and also today,
> just rebuilt the latest commits):
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-26-xen331-linux-2.6.30-rc3-crash-no-highpte.txt
>
> Zone PFN ranges:
>   DMA      0x00000010 -> 0x00001000
>   Normal   0x00001000 -> 0x000229fe
>   HighMem  0x000229fe -> 0x00040000
> Movable zone start PFN for each node
> early_node_map[3] active PFN ranges
>     0: 0x00000010 -> 0x0000009f
>     0: 0x00000100 -> 0x00001167
>     0: 0x00001268 -> 0x00040000
> (XEN) d0:v0: unhandled page fault (ec=0003)
> (XEN) Pagetable walk from c1268000:
> (XEN)  L3[0x003] = 000000003c8f0001 000008f0
> (XEN)  L2[0x009] = 000000003d276067 00001276 
> (XEN)  L1[0x068] = 000000003d268061 00001268
> (XEN) domain_crash_sync called from entry.S (ff19f70e)
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.3.1-11.fc11  x86_32p  debug=n  Not tainted ]----
> (XEN) CPU:    0
> (XEN) EIP:    e019:[<c088adae>]
> (XEN) EFLAGS: 00000206   EM: 1   CONTEXT: pv guest
> (XEN) eax: 00000000   ebx: 00800000   ecx: 00200000   edx: c1268000
> (XEN) esi: 01268000   edi: c1268000   ebp: c086de5c   esp: c086de1c
> (XEN) cr0: 8005003b   cr4: 000006f0   cr3: 3c85c000   cr2: c1268000
> (XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
> (XEN) Guest stack trace from esp=c086de1c:
>
> [root@dom0test linux-2.6-xen]# gdb ./vmlinux
> (gdb) x/i 0xc086de1c
> 0xc086de1c <init_thread_union+3612>:    add    %al,(%eax)
>
> 32b PAE dom0, 32b PAE hypervisor. 
>   

"x/i 0xc088adae" would be more useful, as that will show the faulting 
instruction.

I just booted a current xen-tip/next kernel as dom0 on my 32-bit machine 
with no problems, so I'm not sure what's going wrong on your machine.

Ian, have you tried 32-bit boots lately?

    J


Pasi Kärkkäinen

2009-Apr-27 19:38 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Mon, Apr 27, 2009 at 12:33:01PM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >On Thu, Apr 23, 2009 at 11:32:14AM -0700, Jeremy Fitzhardinge wrote:
> >>Pasi Kärkkäinen wrote:
> >>>I'll try upgrading from dom0/hackery to xen-tip/next and see how it works
> >>>for me. 
> >>Thanks.
> >
> >[root@dom0test linux-2.6-xen]# gdb ./vmlinux
> >(gdb) x/i 0xc086de1c
> >0xc086de1c <init_thread_union+3612>:    add    %al,(%eax)
> >
> >32b PAE dom0, 32b PAE hypervisor. 
> >  
> "x/i 0xc088adae" would be more useful, as that will show the faulting
> instruction.
> 
> I just booted a current xen-tip/next kernel as dom0 on my 32-bit machine 
> with no problems, so I'm not sure what's going wrong on your machine.
> 

This is from xen-tip/master:

(XEN) d0:v0: unhandled page fault (ec=0003)
(XEN) Pagetable walk from c1266000:
(XEN)  L3[0x003] = 000000003c8ee001 000008ee
(XEN)  L2[0x009] = 000000003d274067 00001274 
(XEN)  L1[0x066] = 000000003d266061 00001266
(XEN) domain_crash_sync called from entry.S (ff19f70e)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.3.1-11.fc11  x86_32p  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) EIP:    e019:[<c0888d74>]
(XEN) EFLAGS: 00000206   EM: 1   CONTEXT: pv guest
(XEN) eax: 00000000   ebx: 00800000   ecx: 00200000   edx: c1266000
(XEN) esi: 01266000   edi: c1266000   ebp: c086be5c   esp: c086be1c
(XEN) cr0: 8005003b   cr4: 000006f0   cr3: 3c85a000   cr2: c1266000
(XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
(XEN) Guest stack trace from esp=c086be1c:

[root@dom0test linux-2.6-xen]# gdb vmlinux
(gdb) x/i 0xc0888d74
0xc0888d74 <__constant_c_memset+21>:    rep stos %eax,%es:(%edi)

-- Pasi
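
Putting the register dump and the faulting instruction together: `rep stos` with edi equal to cr2 (c1266000) is a store to that address. The sketch below decodes the error code and the flag bits of the page-table entries from the walk, purely as an illustration. It assumes the architectural x86 bit layout (error code bit 0 = present, bit 1 = write, bit 2 = user; PTE bit 0 = present, bit 1 = writable, bit 2 = user, bit 5 = accessed, bit 6 = dirty); the function names are made up for this example.

```sh
#!/bin/sh
# Decode the low three bits of an x86 page-fault error code.
decode_ec() {
    ec=$(( $1 ))
    if [ $(( ec & 0x1 )) -ne 0 ]; then present=present; else present=not-present; fi
    if [ $(( ec & 0x2 )) -ne 0 ]; then access=write; else access=read; fi
    if [ $(( ec & 0x4 )) -ne 0 ]; then mode=user; else mode=supervisor; fi
    echo "$present $access $mode"
}

# Decode the common flag bits in the low byte of a page-table entry.
decode_pte_flags() {
    pte=$(( $1 ))
    out=""
    if [ $(( pte & 0x01 )) -ne 0 ]; then out="$out present"; fi
    if [ $(( pte & 0x02 )) -ne 0 ]; then out="$out writable"; else out="$out read-only"; fi
    if [ $(( pte & 0x04 )) -ne 0 ]; then out="$out user"; fi
    if [ $(( pte & 0x20 )) -ne 0 ]; then out="$out accessed"; fi
    if [ $(( pte & 0x40 )) -ne 0 ]; then out="$out dirty"; fi
    echo "${out# }"
}

decode_ec 0x0003              # prints: present write supervisor
decode_pte_flags 0x3d274067   # L2 entry: present writable user accessed dirty
decode_pte_flags 0x3d266061   # L1 entry: present read-only accessed dirty
```

Read this way, the fault is a write to a page that is mapped present but read-only at L1. Why that page is read-only is not established at this point in the thread.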


Jeremy Fitzhardinge

2009-Apr-27 19:46 UTC

Re: [Xen-devel] xen.git branch reorg

Ian Campbell wrote:
> My guess would be the change from
> CHECKSUM_{HW,etc}+skb->proto_{csum,data}_valid to
> CHECKSUM_{UNNECESSARY,PARTIAL,etc} is incomplete/incorrect. This should
> be taken care of by f4f969ffe1d9326ccaace768bde3b33a5ae49e71. I saw
> checksum offloading issues early on when I did this but I thought I got
> it right eventually, I'm pretty certain it was OK for me at the time at
> least.
>   

It looks like the checksum is a secondary issue.  My understanding is 
that things like TSO, USO, LRO, etc, depend on hardware checksumming, so 
disabling checksumming will implicitly disable the large segment 
features too.  And judging from Christophe's report, it seems that's the
issue, with unexpectedly large packets appearing in dom0.

    J



Christophe Saout

2009-Apr-27 20:18 UTC

Re: [Xen-devel] xen.git branch reorg

Hi Jeremy,

> > My guess would be the change from
> > CHECKSUM_{HW,etc}+skb->proto_{csum,data}_valid to
> > CHECKSUM_{UNNECESSARY,PARTIAL,etc} is incomplete/incorrect. This should
> > be taken care of by f4f969ffe1d9326ccaace768bde3b33a5ae49e71. I saw
> > checksum offloading issues early on when I did this but I thought I got
> > it right eventually, I'm pretty certain it was OK for me at the time at
> > least.
> >   
> 
> It looks like the checksum is a secondary issue.  My understanding is 
> that things like TSO, USO, LRO, etc, depend on hardware checksumming, so 
> disabling checksumming will implicitly disable the large segment 
> features too.  And judging from Christophe's report, it seems that's the
> issue, with unexpectedly large packets appearing in dom0.

The checksum offloading is fine, I've checked this in a couple of
different ways.  What I find strange is that with TSO, packets should be
chopped on their way out (in software, if the hardware doesn't support
it).  I am wondering who is sending the "fragmentation needed" reply in
the Dom0 network stack.

ip_forward.c for instance has something like this:

        if (unlikely(skb->len > dst_mtu(&rt->u.dst) && !skb_is_gso(skb) &&
                     (ip_hdr(skb)->frag_off & htons(IP_DF))) && !skb->local_df) {
                IP_INC_STATS(dev_net(rt->u.dst.dev), IPSTATS_MIB_FRAGFAILS);
                icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                          htonl(dst_mtu(&rt->u.dst)));
                goto drop;
        }

So if the packet is *marked* as being segmentation offloaded it should
be passed to the NIC driver where it's handled.  At least that is my
understanding.  Turning on gso/tso support on the outgoing NIC doesn't
make a difference.

It seems that netfront/netback make sure that the GSO bits are passed on
correctly, but I would need to inspect the skbuffs in Dom0 to find out
if this is working correctly.  If the GSO bits in the skbuff were for
some reason missing, it would explain the behaviour I am seeing.  Or, of
course, there's something else I am missing.

	Christophe
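
As a rough model of the condition quoted from ip_forward.c, the sketch below evaluates the same four tests for the packet sizes seen in the tcpdump. It is only an illustration: the function name is made up, the MTU of 1500 is an assumption (the thread does not state it), and DF is assumed set because a "fragmentation needed" reply was observed.

```sh
#!/bin/sh
# Model of the forwarding check: succeeds (exit status 0) when the packet
# would be dropped and ICMP "fragmentation needed" sent back.
# Arguments: len mtu is_gso df local_df   (the last three are 0 or 1)
would_send_frag_needed() {
    len=$1 mtu=$2 is_gso=$3 df=$4 local_df=$5
    [ "$len" -gt "$mtu" ] && [ "$is_gso" -eq 0 ] &&
        [ "$df" -eq 1 ] && [ "$local_df" -eq 0 ]
}

# ASSUMPTION: MTU 1500, DF set, local_df clear.
for gso in 0 1; do
    if would_send_frag_needed 2880 1500 "$gso" 1 0; then
        echo "len=2880 gso=$gso: dropped, ICMP frag needed"
    else
        echo "len=2880 gso=$gso: forwarded"
    fi
done
```

If the GSO marking were lost somewhere between netfront and the Dom0 stack, the first case is what would be observed, which fits the hypothesis in the message above.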




Jeremy Fitzhardinge

2009-Apr-27 20:50 UTC

Re: [Xen-devel] xen.git branch reorg

William Pitcock wrote:
> Cloning git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
> gives me the following:
>
> nenolod@petrie:~/dev-src$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-xen-next
> Initialized empty Git repository in /home/nenolod/dev-src/linux-xen-next/.git/
> remote: Counting objects: 1299853, done.
> remote: Compressing objects: 100% (227847/227847), done.
> remote: Total 1299853 (delta 1088065), reused 1275536 (delta 1064482)
> Receiving objects: 100% (1299853/1299853), 306.99 MiB | 411 KiB/s, done.
> Resolving deltas: 100% (1088065/1088065), done.
> warning: remote HEAD refers to nonexistent ref, unable to checkout.
>
> Something seems wrong here.
>   

I don't think it's terribly significant; it just means the HEAD is 
pointing to a now-deleted branch.  I've re-pointed it to something 
sensible.  But either way, it shouldn't prevent you from checking out 
one of the branches.

Does it work for you now?

    J
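
A minimal sketch of checking out a branch from a clone whose remote HEAD was dangling, as described above. The helper names are made up for this example; only standard git commands are used (`git -C` needs git 1.8.5 or later), and the local branch name is arbitrary.

```sh
#!/bin/sh
# List the remote-tracking branches available in an existing clone.
# Argument: clone-directory
list_remote_branches() {
    git -C "$1" branch -r
}

# Create and check out a local branch that starts at a remote branch.
# Arguments: clone-directory remote-branch local-branch
checkout_remote_branch() {
    git -C "$1" checkout -q -b "$3" "origin/$2"
}

# Usage (not run here; URL and branch names are the ones from this thread):
#   git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-xen-next
#   list_remote_branches linux-xen-next
#   checkout_remote_branch linux-xen-next xen-tip/next next
```

The clone warning only means that no working tree was checked out automatically; the objects and remote-tracking branches are all present, so an explicit checkout is enough.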


William Pitcock

2009-Apr-27 21:07 UTC

Re: [Xen-devel] xen.git branch reorg

On Mon, 2009-04-27 at 13:50 -0700, Jeremy Fitzhardinge wrote:
> William Pitcock wrote:
> > Cloning git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git
> > gives me the following:
> >
> > nenolod@petrie:~/dev-src$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-xen-next
> > Initialized empty Git repository in /home/nenolod/dev-src/linux-xen-next/.git/
> > remote: Counting objects: 1299853, done.
> > remote: Compressing objects: 100% (227847/227847), done.
> > remote: Total 1299853 (delta 1088065), reused 1275536 (delta 1064482)
> > Receiving objects: 100% (1299853/1299853), 306.99 MiB | 411 KiB/s, done.
> > Resolving deltas: 100% (1088065/1088065), done.
> > warning: remote HEAD refers to nonexistent ref, unable to checkout.
> >
> > Something seems wrong here.
> >   
> 
> I don't think it's terribly significant; it just means the HEAD is 
> pointing to a now-deleted branch.  I've re-pointed it to something
> sensible.  But either way, it shouldn't prevent you from checking out
> one of the branches.
> 
> Does it work for you now?

Yeah, it's working here now. Thanks for that.

We intend to start testing the 3.4 release candidate with 2.6.30
paravirt-ops dom0 in our test environment this weekend.

William



Jeremy Fitzhardinge

2009-Apr-27 23:48 UTC

Re: [Xen-devel] xen.git branch reorg

William Pitcock wrote:
> Yeah, it's working here now. Thanks for that.
>
> We intend to start testing the 3.4 release candidate with 2.6.30
> paravirt-ops dom0 in our test environment this weekend.
>   

OK, that'll be interesting.  What's your test environment?

    J


William Pitcock

2009-Apr-28 07:13 UTC

Re: [Xen-devel] xen.git branch reorg

On Mon, 2009-04-27 at 16:48 -0700, Jeremy Fitzhardinge wrote:
> William Pitcock wrote:
> > Yeah, it's working here now. Thanks for that.
> >
> > We intend to start testing the 3.4 release candidate with 2.6.30
> > paravirt-ops dom0 in our test environment this weekend.
>
> OK, that'll be interesting.  What's your test environment?

Paravirtualization-only Nocona-based (early EM64T) Xeon hardware, with
nodes consisting of dual 2.8 GHz CPUs with 8 GB of memory, on Debian
testing.

Production is presently at 3.2 with XenLinux 2.6.18 patches rebased
against 2.6.26. Production machines are dual Opteron 2216 machines with
8-16 GB of RAM, with both HVM and paravirtualized domains.

The test and production grids use the same storage backend, which is
presently provided through exporting LVM volumes with AoE and
cluster-lvm.

I could pull a spare server out of the production grid for testing HVM
under 2.6.30 if needed.

William


Boris Derzhavets

2009-Apr-28 09:14 UTC

Re: [Xen-devel] xen.git branch reorg

> I could pull a spare server out of the production
> grid for testing HVM under 2.6.30 if needed.

As far as I know, Xen 3.4-rc3-pre with a 2.6.30-rc3-tip Dom0
supports only PV DomUs. HVM is still an unsolved problem.
If I am wrong about that, please advise.

Boris.



William Pitcock

2009-Apr-28 14:51 UTC

Re: [Xen-devel] xen.git branch reorg

I think you are talking about HVM PV drivers. HVM itself is no different
than booting a paravirtualized guest as far as the dom0 kernel is
concerned... the only difference is the running qemu-dm process or stub
domain.

William

On Tue, 2009-04-28 at 02:14 -0700, Boris Derzhavets wrote:
> > I could pull a spare server out of the production
> > grid for testing HVM under 2.6.30 if needed.
>
> As far as I know, Xen 3.4-rc3-pre with a 2.6.30-rc3-tip Dom0
> supports only PV DomUs. HVM is still an unsolved problem.
> If I am wrong about that, please advise.
> 
> Boris.

Boris Derzhavets

2009-Apr-28 15:01 UTC

Re: [Xen-devel] xen.git branch reorg

I am talking about usual HVM DomUs.

> the only difference is the running qemu-dm process or stub
> domain.

That's failing with the current 2.6.30-rc3-tip under Xen 3.4-rc3-pre,
at least in my experience.
It should be a well-known issue for Jeremy.

Boris.




Pasi Kärkkäinen

2009-Apr-28 15:22 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Tue, Apr 28, 2009 at 04:05:13PM +0100, Ian Campbell wrote:
> On Mon, 2009-04-27 at 15:38 -0400, Pasi Kärkkäinen wrote:
> > [root@dom0test linux-2.6-xen]# gdb vmlinux
> > (gdb) x/i 0xc0888d74
> > 0xc0888d74 <__constant_c_memset+21>:    rep stos %eax,%es:(%edi)
>
> I see basically the same thing, except I'm testing current xen-tip/next,
>

Good to hear it's not only me :)

I also saw a similar crash on xen-tip/next.

-- Pasi
> (XEN) d0:v0: unhandled page fault (ec=0003)
> (XEN) Pagetable walk from 00000000c0e5d000:
> (XEN)  L4[0x000] = 000000011b537027 0000000000000537
> (XEN)  L3[0x003] = 000000011b5b9027 00000000000005b9
> (XEN)  L2[0x007] = 000000011be64067 0000000000000e64 
> (XEN)  L1[0x05d] = 000000011be5d001 0000000000000e5d
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.4-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e019:[<00000000c056bd2f>]
> (XEN) RFLAGS: 0000000000000287   EM: 1   CONTEXT: pv guest
> (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000400
> (XEN) rdx: 00000000c0e5d000   rsi: 0000000000000000   rdi: 00000000c0e5d000
> (XEN) rbp: 00000000c0555d84   rsp: 00000000c0555d68   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
> (XEN) cr3: 000000011ffb0000   cr2: 00000000c0e5d000
> (XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
> 
> (gdb) disas 0x00000000c056bd2f
> [...]
> 0xc056bd2f <alloc_low_page+47>:	rep stos %eax,%es:(%edi)
> 
> The rough stack trace seems to be (unabridged version below):
>         c056bd2f: alloc_low_page + 47 in section .init.text
>         c056bda8: one_page_table_init + 88 in section .init.text
>         c056cddb: kernel_physical_mapping_init + 571 in section .init.text
>         c03dae40: init_memory_mapping + 800 in section .text
>         c055f656: setup_arch + 1014 in section .init.text
>         c055a7b6: start_kernel + 118 in section .init.text
>         c055a076: i386_start_kernel + 86 in section .init.text
>         c055d1cb: xen_start_kernel + 955 in section .init.text
> 
> I thought
>         commit 4c76c04421dfe7be3e5a1d8ab1b2a3be0b02558e
>         Author: Yinghai Lu <yinghai@kernel.org>
>         Date:   Fri Mar 6 16:49:00 2009 -0800
>         
>             x86: introduce bootmem_state
> might be at fault but reverting doesn't improve matters. There are some
> other cleanup/unification patches in the recent-ish history of init_32.c
> which might be worth investigating further.
> 

William Pitcock

2009-Apr-28 15:33 UTC

Re: [Xen-devel] xen.git branch reorg

Hmm.

I'll try to debug it if I have time over the weekend. It shouldn't be
very hard to do.

Do you have any specifics on what is failing?

William

On Tue, 2009-04-28 at 08:01 -0700, Boris Derzhavets wrote:
> I am talking about usual HVM DomUs.
> > the only difference is the running qemu-dm process or stub
> > domain.
>
> That's failing with the current 2.6.30-rc3-tip under Xen 3.4-rc3-pre,
> at least in my experience.
> It should be a well-known issue for Jeremy.
> 
> Boris.

Boris Derzhavets

2009-Apr-28 15:41 UTC

[Xen-devel] Yum install xen on F10

Pasi,

You wrote on the wiki page:
"These features/patches are backported from Xen 3.4 development/unstable version
to Fedora's Xen 3.3.x."
As far as I can see now,
# yum install xen
on Fedora 10 (64-bit) installs a hypervisor that is unable to load a bzImage
kernel.
So the only option is to wait until F11 GA. (I want to load a bzImage.)

Boris.




      


Boris Derzhavets

2009-Apr-28 15:51 UTC

Re: [Xen-devel] xen.git branch reorg

I've got similar results.

Boris

-- On Mon, 2/23/09, Jeremy Fitzhardinge <jeremy@goop.org> wrote:

    From: Jeremy Fitzhardinge <jeremy@goop.org>
    Subject: Re: [Xen-devel] HVM pvops failures
    To: "Ian Jackson" <Ian.Jackson@eu.citrix.com>
    Cc: "Andrew Lyon" <andrew.lyon@gmail.com>, "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>, "Ian Campbell" <Ian.Campbell@citrix.com>
    Date: Monday, February 23, 2009, 7:12 PM

    Ian Jackson wrote:
    > Andrew Lyon writes ("Re: [Xen-devel] HVM guest question (was Re: [PATCH] ioemu: Cleanup the code of PCI passthrough.)"):
    >> On Mon, Feb 23, 2009 at 2:53 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
    >>> These messages are not very surprising.  Is it working ?
    >>
    >> No, when I try to start HVM on Xen unstable with a pv_ops kernel I get this error:
    >
    > Ah.  This is rather odd.  Normally I would hope that xend would report
    > an exit status.  (I haven't tried pvops with qemu.)

    Hm, I'm getting:
    [2009-02-23 15:26:18 4380] WARNING (image:482) domain win7: device model failure: pid 5409: died due to signal 7; see /var/log/xen/qemu-dm-win7.log

    Hm, signal 7 - SIGBUS.  I wonder if

    Using stub domains doesn't work either.

    > I would suggest running qemu-dm under strace.  This can be done easily
    > enough with a simple wrapper script, something like:
    >   #!/bin/sh
    >   set -e
    >   exec strace -vvs500 -f -o /root/qemu-dm.strace \
    >     /usr/lib/xen/bin/qemu-dm "$@"
    > and then give the name of the script as device_model in your config file.

    I see:

    ...
    5079  ioctl(10, EVIOCGKEYCODE, 0x7fffdfd52b70) = 0
    5079  clock_gettime(CLOCK_MONOTONIC, {1324, 539747423}) = 0
    5079  clock_gettime(CLOCK_MONOTONIC, {1324, 539837298}) = 0
    5079  select(14, [3 6 10 11 13], [], [], {0, 10000}) = 1 (in [10], left {0, 9995})
    5079  read(10, "\36\0\0\0"..., 4)       = 4
    5079  write(10, "\36\0\0\0"..., 4)      = 4
    5079  ioctl(10, EVIOCGKEYCODE, 0x7fffdfd52b70) = 0
    5079  clock_gettime(CLOCK_MONOTONIC, {1324, 540495964}) = 0
    5079  clock_gettime(CLOCK_MONOTONIC, {1324, 540591278}) = 0
    5079  select(14, [3 6 10 11 13], [], [], {0, 10000}) = 1 (in [10], left {0, 9995})
    5079  read(10, "\36\0\0\0"..., 4)       = 4
    5079  write(10, "\36\0\0\0"..., 4)      = 4
    5079  mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0) = 0x7f1ad5f2b000
    5079  ioctl(4, SNDCTL_DSP_STEREO, 0x7fffdfd52230) = 0
    5079  --- SIGBUS (Bus error) @ 0 (0) ---
    5157  +++ killed by SIGBUS +++


    This mmap and ioctl is from /proc/xen/privcmd.

        J
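
The wrapper approach Ian Jackson describes in the quoted message can be set up roughly as follows. This is a sketch, not a tested recipe: the strace flags and the qemu-dm path are taken from the quoted script, while the helper name install_qemu_dm_wrapper and the wrapper location /root/qemu-dm-strace.sh are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: write the strace wrapper to a file and make it executable.
# install_qemu_dm_wrapper is a hypothetical helper name.
install_qemu_dm_wrapper() {
    wrapper=$1
    # Quoted heredoc delimiter: nothing inside is expanded here, so "$@"
    # lands in the wrapper verbatim and forwards xend's arguments to qemu-dm.
    cat > "$wrapper" <<'EOF'
#!/bin/sh
set -e
exec strace -vvs500 -f -o /root/qemu-dm.strace \
    /usr/lib/xen/bin/qemu-dm "$@"
EOF
    chmod +x "$wrapper"
}

# Example (not run here; the path is an assumption):
#   install_qemu_dm_wrapper /root/qemu-dm-strace.sh
# Then point the guest config at the wrapper instead of qemu-dm:
#   device_model = '/root/qemu-dm-strace.sh'
```

Because the wrapper uses exec, strace replaces the shell and the device model still runs under the PID that xend started.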


M A Young

2009-Apr-28 16:02 UTC

Re: [Xen-devel] Yum install xen on F10

On Tue, 28 Apr 2009, Boris Derzhavets wrote:
> Pasi,
>
> You wrote on the wiki page:
> "These features/patches are backported from Xen 3.4 development/unstable
> version to Fedora's Xen 3.3.x."
> As far as I can see now,
> # yum install xen
> on Fedora 10 (64-bit) installs a hypervisor that is unable to load a
> bzImage kernel.
> So the only option is to wait until F11 GA. (I want to load a bzImage.)

Or get the F11 source rpm and do an rpmbuild --rebuild on Fedora 10. It
works for me.

 	Michael Young


Jeremy Fitzhardinge

2009-Apr-28 16:25 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Ian Campbell wrote:
> On Mon, 2009-04-27 at 15:38 -0400, Pasi Kärkkäinen wrote:
>> [root@dom0test linux-2.6-xen]# gdb vmlinux
>> (gdb) x/i 0xc0888d74
>> 0xc0888d74 <__constant_c_memset+21>:    rep stos %eax,%es:(%edi)
>
> I see basically the same thing, except I'm testing current xen-tip/next,
>
> (XEN) d0:v0: unhandled page fault (ec=0003)
> (XEN) Pagetable walk from 00000000c0e5d000:
> (XEN)  L4[0x000] = 000000011b537027 0000000000000537
> (XEN)  L3[0x003] = 000000011b5b9027 00000000000005b9
> (XEN)  L2[0x007] = 000000011be64067 0000000000000e64 
> (XEN)  L1[0x05d] = 000000011be5d001 0000000000000e5d
> (XEN) domain_crash_sync called from entry.S
> (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> (XEN) ----[ Xen-3.4-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e019:[<00000000c056bd2f>]
> (XEN) RFLAGS: 0000000000000287   EM: 1   CONTEXT: pv guest
> (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000400
> (XEN) rdx: 00000000c0e5d000   rsi: 0000000000000000   rdi: 00000000c0e5d000
> (XEN) rbp: 00000000c0555d84   rsp: 00000000c0555d68   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000006f0
> (XEN) cr3: 000000011ffb0000   cr2: 00000000c0e5d000
> (XEN) ds: e021   es: e021   fs: e021   gs: e021   ss: e021   cs: e019
>
> (gdb) disas 0x00000000c056bd2f
> [...]
> 0xc056bd2f <alloc_low_page+47>:	rep stos %eax,%es:(%edi)
>
> The rough stack trace seems to be (unabridged version below):
>         c056bd2f: alloc_low_page + 47 in section .init.text
>         c056bda8: one_page_table_init + 88 in section .init.text
>         c056cddb: kernel_physical_mapping_init + 571 in section .init.text
>         c03dae40: init_memory_mapping + 800 in section .text
>         c055f656: setup_arch + 1014 in section .init.text
>         c055a7b6: start_kernel + 118 in section .init.text
>         c055a076: i386_start_kernel + 86 in section .init.text
>         c055d1cb: xen_start_kernel + 955 in section .init.text
>   
Interesting.  Curiously it's working for me on my machine, but has a
tendency to crash a bit later with a protection fault on an RO page
during memory allocation.  I wonder if it's related...

    J
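
The pagetable walk quoted above already shows the read-only mapping: the L1 entry for the faulting address ends in 001, while the L2 entry ends in 067. Below is a small sketch that decodes the standard x86 flag bits of such an entry. The function name decode_pte_flags is invented for illustration; the bit positions (present, writable, user, accessed, dirty) are the architectural ones.

```shell
#!/bin/sh
# Sketch: decode the low flag bits of an x86 page-table entry.
# Argument: the entry in hex without a 0x prefix, as printed by Xen.
# decode_pte_flags is a hypothetical helper name.
decode_pte_flags() {
    entry=$((0x$1))
    p=$(( entry & 1 ))          # bit 0: present
    w=$(( (entry >> 1) & 1 ))   # bit 1: writable
    u=$(( (entry >> 2) & 1 ))   # bit 2: user-accessible
    a=$(( (entry >> 5) & 1 ))   # bit 5: accessed
    d=$(( (entry >> 6) & 1 ))   # bit 6: dirty
    echo "present=$p writable=$w user=$u accessed=$a dirty=$d"
}

decode_pte_flags 000000011be5d001   # L1 entry from the walk
# prints: present=1 writable=0 user=0 accessed=0 dirty=0
decode_pte_flags 000000011be64067   # L2 entry from the walk
# prints: present=1 writable=1 user=1 accessed=1 dirty=1
```

So the page at c0e5d000 is mapped but not writable, which matches the "protection fault on an RO page" description.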


Jeremy Fitzhardinge

2009-Apr-28 16:28 UTC

Re: [Xen-devel] xen.git branch reorg

William Pitcock wrote:
> Hmm.
>
> I'll try to debug it if I have time over the weekend. It shouldn't be
> very hard to do.
>
> Do you have any specifics on what is failing?

The privcmd mappings are either not getting created properly or are
disappearing for some reason, causing a SIGBUS on qemu's privcmd
mapping.  I put a bunch of printks in at one point, and couldn't see
where it was failing.  I'm planning on having another go with all the
tracing infrastructure I put in place, but feel free to look at it if
you're so inclined.

    J


Jeremy Fitzhardinge

2009-Apr-28 17:30 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Ian Campbell wrote:
>> Interesting.  Curiously it's working for me on my machine, but has a
>> tendency to crash a bit later with a protection fault on an RO page
>> during memory allocation.  I wonder if it's related...
>
> Certainly smells similar.

Yeah.  The fault in both cases has an error code of 3, so a write-protect
fault.  It suggests a pinned page is getting freed into the general heap
for some reason.  Except there are no complaints from Xen about writes
into a pagetable, so that makes it look like the page is being made RO
but not (left) pinned.

    J
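
For reference, the reading of error code 3 as a write-protect fault follows from the low bits of the x86 page-fault error code. A minimal sketch, assuming the ec= value in the Xen log is hexadecimal; the function name decode_pf_error_code is made up for illustration.

```shell
#!/bin/sh
# Sketch: decode the low three bits of an x86 page-fault error code.
# Argument: the code in hex without a 0x prefix (e.g. 0003 from "ec=0003").
# decode_pf_error_code is a hypothetical helper name.
decode_pf_error_code() {
    ec=$((0x$1))
    # bit 0: 0 = page not present, 1 = protection violation
    if [ $(( ec & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # bit 1: 0 = read, 1 = write
    if [ $(( ec & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    # bit 2: 0 = supervisor mode, 1 = user mode
    if [ $(( ec & 4 )) -ne 0 ]; then mode="user"; else mode="supervisor"; fi
    echo "$access access in $mode mode: $cause"
}

decode_pf_error_code 0003
# prints: write access in supervisor mode: protection violation
```

A write to a page that is present but not writable is exactly the case described above.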


Pasi Kärkkäinen

2009-Apr-28 17:38 UTC

[Xen-devel] Re: Yum install xen on F10

On Tue, Apr 28, 2009 at 08:41:55AM -0700, Boris Derzhavets wrote:
> Pasi,
>
> You wrote on the wiki page:
> "These features/patches are backported from Xen 3.4 development/unstable
> version to Fedora's Xen 3.3.x."
> As far as I can see now,
> # yum install xen
> on Fedora 10 (64-bit) installs a hypervisor that is unable to load a
> bzImage kernel.
> So the only option is to wait until F11 GA. (I want to load a bzImage.)
>

Yes, those backported patches are in the F11 xen rpms.

I've been rebuilding that F11 (rawhide) xen src.rpm on F10.

-- Pasi


Boris Derzhavets

2009-Apr-28 17:42 UTC

Re: [Xen-devel] Yum install xen on F10

Downloaded twice from different mirrors:

[root@ServerFDR Download]# rpm -iv xen-3.3.1-11.fc11.src.rpm
warning: xen-3.3.1-11.fc11.src.rpm: Header V3 RSA/SHA256 signature: NOKEY, key ID d22e77f2
xen-3.3.1-11.fc11
warning: user mockbuild does not exist - using root
warning: group mockbuild does not exist - using root
error: unpacking of archive failed on file /root/rpmbuild/SOURCES/grub-0.97.tar.gz;49f73923: cpio: MD5 sum mismatch

The same error comes up each time.

Boris.


Pasi Kärkkäinen

2009-Apr-28 19:33 UTC

Re: [Xen-devel] Yum install xen on F10

On Tue, Apr 28, 2009 at 10:42:48AM -0700, Boris Derzhavets wrote:
> Downloaded twice from different mirrors:
>
> [root@ServerFDR Download]# rpm -iv xen-3.3.1-11.fc11.src.rpm
> warning: xen-3.3.1-11.fc11.src.rpm: Header V3 RSA/SHA256 signature: NOKEY, key ID d22e77f2
> xen-3.3.1-11.fc11
> warning: user mockbuild does not exist - using root
> warning: group mockbuild does not exist - using root
> error: unpacking of archive failed on file /root/rpmbuild/SOURCES/grub-0.97.tar.gz;49f73923: cpio: MD5 sum mismatch
>
> The same error comes up each time.
>

Do you have the latest updates installed on your F10? F11 switched to a new rpm
version, and that new rpm version should be in F10 updates as well.

-- Pasi

Boris Derzhavets

2009-Apr-28 19:40 UTC

Re: [Xen-devel] Yum install xen on F10

Thank you. Will update F10.
Boris.


Boris Derzhavets

2009-Apr-29 06:40 UTC

Re: [Xen-devel] Yum install xen on F10

Pasi,

[root@ServerFDR10 ~]# rpm -qa|grep xen
xen-libs-3.3.0-1.fc10.x86_64
xen-hypervisor-3.3.0-1.fc10.x86_64
jaxen-1.1-1.3.fc10.noarch
xen-runtime-3.3.0-1.fc10.x86_64
xen-3.3.0-1.fc10.x86_64


I've tried both "yum update" and "yum upgrade". In neither case did the
package list shown before the download started include any xen packages
to be refreshed.

Maybe I was doing something wrong?

Boris.


Jeremy Fitzhardinge

2009-Apr-29 16:25 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Ian Campbell wrote:
> On Tue, 2009-04-28 at 13:30 -0400, Jeremy Fitzhardinge wrote:
>> Ian Campbell wrote:
>>>> Interesting.  Curiously it's working for me on my machine, but has a
>>>> tendency to crash a bit later with a protection fault on an RO page
>>>> during memory allocation.  I wonder if it's related...
>>>
>>> Certainly smells similar.
>>
>> Yeah.  The fault in both cases has an error-code of 3, so a write
>> protect fault.  It suggests a pinned page is getting freed into the
>> general heap for some reason.  Except there are no complaints from Xen
>> about writes into a pagetable, so that makes it look like the page is being
>> made RO but not (left) pinned.
>
> The crash Pasi and I are seeing is pretty early on though, is there any
> opportunity for a page table page to have been recycled before
> ~kernel_physical_mapping_init()? I'd have thought not.

No, sounds unlikely.

> I was wondering if perhaps e820_table_start (used by alloc_low_page) had
> somehow got initialised to a bogus value such that it was pointing at
> the domain builder supplied page tables (hence RO but not marked as
> pinned yet).

Well, Xen would know they're pinned, even if the kernel's structures
don't; one would expect to see any completely bogus writes appear on the
Xen console.

> There was some unification work in this area around the
> beginning of March although I'm pretty sure I've had it work much more
> recently. (I guess it might not have been merged into a visible branch
> until more recently, it's a bit hard to tell with git but I don't think
> that's what happened).

Yeah, I think it has been working properly for some time since then, though I
think Pasi has been reporting problems since 32-bit dom0 first booted.

My symptoms are a bit all over the place.  For a while I was just seeing
writes-to-RO pages, but since I added some debugging to try and work out
where that was happening, I'm now seeing more major pagetable corruption
(like instruction fetches failing because of reserved bits being set in
the pagetable...).  So something is stomping on the pagetable, and I think it's
some page being freed.

I think these are somewhat similar to Pasi's symptoms, which he said
disabling HIGHPTE fixed.  I see problems independent of the HIGHPTE
setting.

I have the best success in causing crashes when scp'ing an 8GB file onto my
XFS /home filesystem.  XFS does quite a lot of vmapping, so that may
exacerbate the problem.

I also realized that my Xen doesn't have debug=y set, so I'm probably
missing some information.

    J
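
The "error-code of 3" reasoning in this message follows the architectural
x86 page-fault error code layout, in which bit 0 marks a protection
violation on a present page and bit 1 marks a write access. The Python
sketch below is not from the thread; it only illustrates that decoding,
and the constant and function names are made up for the example.

```python
# Low bits of the x86 page-fault error code (architectural definition).
PF_PRESENT = 0x01  # 0 = page not present, 1 = protection violation
PF_WRITE = 0x02    # 0 = read access, 1 = write access
PF_USER = 0x04     # 0 = supervisor-mode access, 1 = user-mode access
PF_RSVD = 0x08     # a reserved bit was set in a paging-structure entry
PF_INSTR = 0x10    # the fault was caused by an instruction fetch


def decode_pf_error(ec):
    """Return a list of human-readable facts about a page-fault error code."""
    facts = []
    facts.append("protection violation" if ec & PF_PRESENT else "page not present")
    facts.append("write" if ec & PF_WRITE else "read")
    facts.append("user mode" if ec & PF_USER else "supervisor mode")
    if ec & PF_RSVD:
        facts.append("reserved bit set")
    if ec & PF_INSTR:
        facts.append("instruction fetch")
    return facts


# Error code 3, as reported in the thread: a write to a present,
# write-protected page.
print(decode_pf_error(0x3))
```

The same helper also covers the later symptom described above: an
instruction fetch that fails on reserved bits would carry the PF_RSVD and
PF_INSTR bits in its error code.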


Boris Derzhavets

2009-May-01 09:13 UTC

Re: [Xen-devel] Yum install xen on F10

Michael,

I upgraded F10. The "rpmbuild --rebuild" attempt still failed (on a 64-bit
system) due to this line in xen.spec:

# so that x86_64 builds pick up glibc32 correctly
BuildRequires: /usr/include/gnu/stubs-32.h

It's not a problem to obtain and install /usr/include/gnu/stubs-32.h.
I also tried creating a symlink to it in SOURCES. No luck.
The glibc32 headers rpm cannot be installed on F10. It requires
all the glibc32 rpms to be installed. It conflicts with the glibc64 already installed.

As before, I just commented it out and rebuilt the
xen 3.3.1 rpms.

# rpmbuild -ba ./xen.spec

PV domUs are not affected.  I believe it might affect HVM at the point
when the pvops kernel is fixed to support HVM.
On the other hand, I guess F11 will be out earlier.

Thanks.
Boris.






      


John Haxby

2009-May-01 09:26 UTC

Re: [Xen-devel] Yum install xen on F10

Boris Derzhavets wrote:
> Michael,
>
> I upgraded F10 . Attempt "rpmbuild --rebuild" still failed (on 64-bit
> system) due to line in xen.spec:-
>
> # so that x86_64 builds pick up glibc32 correctly
> BuildRequires: /usr/include/gnu/stubs-32.h
>
> It's not a problem to obtain and install /usr/include/gnu/stubs-32.h.
> I also tried create a symlink to it in SOURCES. No luck.
> glibc32 headers rpm cannot be installed on F10. It requires
> all glibc32 rpms install. It conflicts with glibc64 already installed.
>

I have glibc-devel.i386 and glibc-devel.x86_64 installed on my F10
machine, no conflicts.

Do you have some other problem?

jch



Pasi Kärkkäinen

2009-May-01 09:58 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Wed, Apr 29, 2009 at 09:25:24AM -0700, Jeremy Fitzhardinge wrote:
> 
> > There was some unification work in this area around the
> > beginning of March although I'm pretty sure I've had it work much more
> > recently. (I guess it might not have been merged into a visible branch
> > until more recently, it's a bit hard to tell with git but I don't think
> > that's what happened).
> 
> Yeah, I think it has been working properly for some time since then, though I
> think Pasi has been reporting problems since 32-bit dom0 first booted.
> 

Yep, I've had problems from the beginning..

dom0/hackery was working pretty OK for me before it was removed..
I could run and install new PV guests etc.

> My symptoms are a bit all over the place.  For a while I was just seeing
> writes-to-RO pages, but since I added some debugging to try and work out
> where that was happening, I'm now seeing more major pagetable corruption
> (like instruction fetches failing because of reserved bits being set in
> the pagetable...).  So something is stomping pagetable, and I think its
> some page being freed.
> 
> I think these are somewhat similar to Pasi's symptoms which he said that
> disabling HIGHPTE fixed.   I see problems independent of the HIGHPTE
> setting.
> 

CONFIG_HIGHPTE=n fixed my "dom0 kernel crashes while compiling kernel
source in dom0" test..
could be the problem still happens with some other tests though..

> I have best success in causing crashes when scp'ing a 8GB file onto my
> XFS /home filesystem.  XFS does quite a lot of vmapping, so that may
> exacerbate the problem.
> 

I could try scp as well..

> I also realized that my Xen doesn't have debug=y set, so I'm probably
> missing some information.
> 

Hmm, good point. I don't have debug=y either..

-- Pasi


Boris Derzhavets

2009-May-01 10:51 UTC

Re: [Xen-devel] Yum install xen on F10

I've tried to follow your advice:

[root@ServerFDR10 Download]# uname -a
Linux ServerFDR10 2.6.27.21-170.2.56.fc10.x86_64 #1 SMP Mon Mar 23 23:08:10 EDT
2009 x86_64 x86_64 x86_64 GNU/Linux

[root@ServerFDR10 Download]# ls -l
total 29928
-rw-rw-r-- 1 boris boris  4999329 2009-04-30 12:26 glibc-2.9-2.i386.rpm
-rw-rw-r-- 1 boris boris 22727800 2009-04-30 12:35 glibc-common-2.9-2.i386.rpm
-rw-rw-r-- 1 boris boris  2228772 2009-04-30 12:36 glibc-devel-2.9-2.i386.rpm
-rw-rw-r-- 1 boris boris   630193 2009-04-30 12:31 glibc-headers-2.9-2.i386.rpm
-rwxr--r-- 1 root  root       120 2009-04-30 12:38 inst.sh

[root@ServerFDR10 Download]# rpm -ivh glibc-devel-2.9-2.i386.rpm
error: Failed dependencies:
    glibc = 2.9-2 is needed by glibc-devel-2.9-2.i386
    glibc-headers = 2.9-2 is needed by glibc-devel-2.9-2.i386
    libBrokenLocale.so.1 is needed by glibc-devel-2.9-2.i386
    libanl.so.1 is needed by glibc-devel-2.9-2.i386
    libcidn.so.1 is needed by glibc-devel-2.9-2.i386
    libcrypt.so.1 is needed by glibc-devel-2.9-2.i386
    libdl.so.2 is needed by glibc-devel-2.9-2.i386
    libm.so.6 is needed by glibc-devel-2.9-2.i386
    libnsl.so.1 is needed by glibc-devel-2.9-2.i386
    libnss_compat.so.2 is needed by glibc-devel-2.9-2.i386
    libnss_dns.so.2 is needed by glibc-devel-2.9-2.i386
    libnss_files.so.2 is needed by glibc-devel-2.9-2.i386
    libnss_hesiod.so.2 is needed by glibc-devel-2.9-2.i386
    libnss_nis.so.2 is needed by glibc-devel-2.9-2.i386
    libnss_nisplus.so.2 is needed by glibc-devel-2.9-2.i386
    libresolv.so.2 is needed by glibc-devel-2.9-2.i386
    librt.so.1 is needed by glibc-devel-2.9-2.i386
    libthread_db.so.1 is needed by glibc-devel-2.9-2.i386
    libutil.so.1 is needed by glibc-devel-2.9-2.i386

Boris.


M A Young

2009-May-01 10:55 UTC

Re: [Xen-devel] Yum install xen on F10

On Fri, 1 May 2009, Boris Derzhavets wrote:
> Michael,
> 
> I upgraded F10 . Attempt "rpmbuild --rebuild" still failed (on 64-bit
> system) due to line in xen.spec:-
> 
> # so that x86_64 builds pick up glibc32 correctly
> BuildRequires: /usr/include/gnu/stubs-32.h

yum install glibc-devel.i386

 	Michael Young


Boris Derzhavets

2009-May-01 11:19 UTC

Re: [Xen-devel] Yum install xen on F10

Thank you, Michael.
Boris


Jeremy Fitzhardinge

2009-May-01 18:35 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Pasi Kärkkäinen wrote:
> On Wed, Apr 29, 2009 at 09:25:24AM -0700, Jeremy Fitzhardinge wrote:
>>> There was some unification work in this area around the
>>> beginning of March although I'm pretty sure I've had it work much more
>>> recently. (I guess it might not have been merged into a visible branch
>>> until more recently, it's a bit hard to tell with git but I don't think
>>> that's what happened).
>>
>> Yeah, I think it has been working properly for some time since then, though I
>> think Pasi has been reporting problems since 32-bit dom0 first booted.
>
> Yep, I've had problems from the beginning..
>
> dom0/hackery was working pretty OK for me before it was removed..
> I could run and install new PV guests etc.

Sorry, perhaps I was too hasty in removing the branches.  Though you
still have references in your own git tree?

You still had problems with HIGHPTE in hackery?  My suspicion is that
we're seeing the same bug manifesting in a much more obvious way, since
I'm seeing somewhat similar symptoms, even without HIGHPTE.

>> I also realized that my Xen doesn't have debug=y set, so I'm probably
>> missing some information.
>
> Hmm, good point. I don't have debug=y either..

That would be useful.  It would be interesting to see if Xen complains
about any stray pte writes.

    J


Pasi Kärkkäinen

2009-May-05 17:19 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

On Fri, May 01, 2009 at 11:35:19AM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >On Wed, Apr 29, 2009 at 09:25:24AM -0700, Jeremy Fitzhardinge wrote:
> >>>There was some unification work in this area around the
> >>>beginning of March although I'm pretty sure I've had it work much more
> >>>recently. (I guess it might not have been merged into a visible branch
> >>>until more recently, it's a bit hard to tell with git but I don't think
> >>>that's what happened).
> >>
> >>Yeah, I think it has been working properly for some time since then, though I
> >>think Pasi has been reporting problems since 32-bit dom0 first booted.
> >
> >Yep, I've had problems from the beginning..
> >
> >dom0/hackery was working pretty OK for me before it was removed..
> >I could run and install new PV guests etc.
> 
> Sorry, perhaps I was too hasty in removing the branches.  Though you
> still have references in your own git tree?
> 
> You still had problems with HIGHPTE in hackery?  My suspicion is that
> we're seeing the same bug manifesting in a much more obvious way, since
> I'm seeing somewhat similar symptoms, even without HIGHPTE.
> 

Actually I didn't try with CONFIG_HIGHPTE=y for some time, since I didn't see
any notes about it being fixed/changed..

> >>I also realized that my Xen doesn't have debug=y set, so I'm probably
> >>missing some information.
> >
> >Hmm, good point. I don't have debug=y either..
> 
> That would be useful.  It would be interesting to see if Xen complains
> about any stray pte writes.
> 

And here we go:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-28-xen331-linux-2.6.30-rc3-next-crash-no-highpte.txt

(XEN) d0:v0: unhandled page fault (ec=0003)
(XEN) Pagetable walk from c1268000:
(XEN)  L3[0x003] = 000000003c8ee001 000008ee
(XEN)  L2[0x009] = 000000003d276067 00001276 
(XEN)  L1[0x068] = 000000003d268061 00001268
(XEN) domain_crash_sync called from entry.S (ff1a5c72)
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.3.1-12.0customdebug1.fc11  x86_32p  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) EIP:    e019:[<c0888d87>]

[root@dom0test linux-2.6-xen]# gdb vmlinux
(gdb) x/i 0xc0888d87
0xc0888d87 <__constant_c_memset+21>:    rep stos %eax,%es:(%edi)
(gdb) 

-- Pasi
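
The pagetable walk in this log can be cross-checked by hand. The Python
sketch below is not from the thread. It assumes the standard PAE split of
a 32-bit virtual address (2 bits of L3 index, 9 bits each of L2 and L1
index, 12 bits of page offset), which matches the x86_32p build named in
the Xen banner, and the architectural PRESENT and RW flag bits. The
function names are illustrative.

```python
PTE_PRESENT = 0x001  # entry maps a page
PTE_RW = 0x002       # page is writable when set, read-only when clear


def pae_indices(vaddr):
    """Split a 32-bit virtual address into PAE (L3, L2, L1) table indices."""
    l3 = (vaddr >> 30) & 0x3
    l2 = (vaddr >> 21) & 0x1FF
    l1 = (vaddr >> 12) & 0x1FF
    return l3, l2, l1


def is_writable(entry):
    """True if a pagetable entry is present and has the RW bit set."""
    return bool(entry & PTE_PRESENT) and bool(entry & PTE_RW)


# Faulting address and entries copied from the Xen log above.
l3, l2, l1 = pae_indices(0xC1268000)
print("L3[%#05x] L2[%#05x] L1[%#05x]" % (l3, l2, l1))
print("L2 entry writable:", is_writable(0x3D276067))
print("L1 entry writable:", is_writable(0x3D268061))
```

The indices agree with the L3[0x003], L2[0x009] and L1[0x068] lines Xen
printed, and the L1 entry has PRESENT set but RW clear. A read-only
mapping hit by the "rep stos" in memset is consistent with the reported
error code 0003 (write to a present, protected page).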


Jeremy Fitzhardinge

2009-May-05 20:10 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Pasi Kärkkäinen wrote:
> And here we go:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-28-xen331-linux-2.6.30-rc3-next-crash-no-highpte.txt
>
> (XEN) d0:v0: unhandled page fault (ec=0003)
> (XEN) Pagetable walk from c1268000:

Oh, look, we're not reserving the Xen pagetable, and this time we really
should.  Does this work for you?

From f26499cadfd057e4377e92ba680e16fa7bdf9422 Mon Sep 17 00:00:00 2001
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Tue, 5 May 2009 13:08:42 -0700
Subject: [PATCH] xen/i386: reserve Xen pagetables

The Xen pagetables are no longer implicitly reserved as part of the other
i386_start_kernel reservations, so make sure we explicitly reserve them.
This prevents them from being released into the general kernel free page
pool and reused.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 0e13477..801d042 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1799,6 +1799,11 @@ __init pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd,
 
 	pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(__pa(swapper_pg_dir)));
 
+	reserve_early(__pa(xen_start_info->pt_base),
+		      __pa(xen_start_info->pt_base +
+			   xen_start_info->nr_pt_frames * PAGE_SIZE),
+		      "XEN PAGETABLES");
+
 	return swapper_pg_dir;
 }
 #endif	/* CONFIG_X86_64 */
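
The range handed to reserve_early() in the patch above is plain address
arithmetic: it starts at the physical address of pt_base and extends for
nr_pt_frames pages. The Python sketch below is not from the thread and
only mirrors that arithmetic. The pt_base and nr_pt_frames values are
hypothetical, PAGE_OFFSET assumes the default 3G/1G split on 32-bit x86,
and treating the end address as exclusive is also an assumption of the
sketch.

```python
PAGE_SIZE = 4096
PAGE_OFFSET = 0xC0000000  # assumed default 3G/1G split on 32-bit x86


def pa(vaddr):
    """Model of __pa() for the kernel direct mapping on 32-bit x86."""
    return vaddr - PAGE_OFFSET


def xen_pagetable_range(pt_base, nr_pt_frames):
    """Return the (start, end) physical range the patch passes to reserve_early()."""
    start = pa(pt_base)
    end = pa(pt_base + nr_pt_frames * PAGE_SIZE)
    return start, end


def in_range(paddr, rng):
    """True if paddr lies in the range, treating the end as exclusive."""
    return rng[0] <= paddr < rng[1]


# Hypothetical start_info values, chosen only for illustration.
rng = xen_pagetable_range(pt_base=0xC1260000, nr_pt_frames=16)
print("reserved: %#x-%#x" % rng)

# With these made-up values, the faulting address from Pasi's log
# (c1268000) would fall inside the reserved range.
print(in_range(pa(0xC1268000), rng))
```

Without such a reservation, pages in that range can be handed out by the
early allocator while Xen still treats them as read-only pagetable pages,
which is the failure the commit message describes.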




Jiang, Yunhong

2009-May-06 06:48 UTC

RE: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Is there any recommendation as to which branch will work? I hit the same issue
in the xen-tip/master tree.

BTW, has anyone tried the xen-tip/dom0/apic branch? It always complains about an
unknown partition type for sda on my system, while the same image works quite well when
running natively.


Thanks
Yunhong Jiang


Jiang, Yunhong

2009-May-06 07:40 UTC

RE: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

After switching to a new build system, xen-tip/master seems to work.

Thanks
Yunhong Jiang


Jeremy Fitzhardinge

2009-May-06 15:54 UTC

Re: [Xen-devel] xen.git branch reorg / crash with 2.6.30-rc3 pv_ops dom0

Jiang, Yunhong wrote:
> After switching to a new build system, xen-tip/master seems to work.

Yes, I committed a fix to xen-tip/master yesterday.

    J


Pasi Kärkkäinen

2009-May-06 18:54 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Tue, May 05, 2009 at 01:10:43PM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >And here we go:
> >http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-28-xen331-linux-2.6.30-rc3-next-crash-no-highpte.txt
> >
> >(XEN) d0:v0: unhandled page fault (ec=0003)
> >(XEN) Pagetable walk from c1268000:
> 
> Oh, look, we're not reserving the Xen pagetable, and this time we really
> should.  Does this work for you?
> 

Yes, this fixes the problem! Now xen-tip/next pv_ops dom0 boots OK for me,
just like dom0/hackery did earlier!

But there''s more.. I was pretty surprised when I booted up my testbox, 
using the serial console as usual, and in the end of the boot process login
prompt (getty) appeared on the VGA text console (tty1) !!

It seems this (or some other) patch has fixed the 32bit pv_ops dom0 console
problems as well..

Next I tried without the serial console, and yes, now the VGA text console 
works just like it does with the old 2.6.18 xenlinux tree! I can see the
pv_ops dom0 kernel boot messages after Xen boot messages just fine.

My grub.conf:

title Fedora Xen pv_ops dom0-test (2.6.30-rc3-tip)
        root (hd0,0)
        kernel /xen-3.3.gz dom0_mem=1024M loglvl=all guest_loglvl=all
        module /vmlinuz-2.6.30-rc3-tip ro root=/dev/vg00/lv01
        module /initrd-2.6.30-rc3-tip.img

No need to specify console= or earlyprintk= anymore!

Thanks for all the hard work and congratulations!

-- Pasi


Jeremy Fitzhardinge

2009-May-06 21:51 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

Pasi Kärkkäinen wrote:
>> Oh, look, we're not reserving the Xen pagetable, and this time we really
>> should.  Does this work for you?
>
> Yes, this fixes the problem! Now xen-tip/next pv_ops dom0 boots OK for me,
> just like dom0/hackery did earlier!

Great!  I'd be interested to know if you're still having HIGHPTE
problems.  It may or may not have got fixed.

> But there's more.. I was pretty surprised when I booted up my testbox,
> using the serial console as usual, and in the end of the boot process login
> prompt (getty) appeared on the VGA text console (tty1) !!

Yes, I fixed that separately the other day, but wouldn't have noticed
without being able to boot ;)

> Next I tried without the serial console, and yes, now the VGA text console
> works just like it does with the old 2.6.18 xenlinux tree! I can see the
> pv_ops dom0 kernel boot messages after Xen boot messages just fine.

Good.  Have you tried starting X?

    J


Pasi Kärkkäinen

2009-May-07 17:24 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Wed, May 06, 2009 at 02:51:59PM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >>Oh, look, we're not reserving the Xen pagetable, and this time we really
> >>should.  Does this work for you?
> >
> >Yes, this fixes the problem! Now xen-tip/next pv_ops dom0 boots OK for me,
> >just like dom0/hackery did earlier!
>
> Great!  I'd be interested to know if you're still having HIGHPTE
> problems.  It may or may not have got fixed.

I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt

(XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
(XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
(XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()

BUG: unable to handle kernel paging request at c0207d58
IP: [<c0405bf7>] xen_set_pte+0x89/0x93
*pdpt = 000000003c8ef001 
Oops: 0003 [#1] SMP 
Pid: 323, comm: kswapd0 Not tainted (2.6.30-rc3-tip #34) P8SC8
EIP: 0061:[<c0405bf7>] EFLAGS: 00010296 CPU: 0
EIP is at xen_set_pte+0x89/0x93
EAX: 00000000 EBX: c0207d58 ECX: 00000000 EDX: 6b0a6063
ESI: 00000000 EDI: 000bd778 EBP: e254cd80 ESP: e254cd70
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
Process kswapd0 (pid: 323, ti=e254c000 task=e25c8000 task.ti=e254c000)

[root@dom0test linux-2.6-xen]# gdb vmlinux
(gdb) x/i 0xc0405bf7
0xc0405bf7 <xen_set_pte+137>:   mov    %edx,(%ebx)
(gdb) 
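
As a reading aid for the oops above, here is a minimal sketch that decodes the page-fault error code ("Oops: 0003") with the architectural x86 error-code bits and recovers the function start from the reported IP and offset. The helper names are hypothetical and exist only for this example; nothing here is taken from the kernel sources, and it is an illustration of how the logged numbers fit together rather than a diagnosis.

```c
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PF_PROT  0x1u /* set: fault on a present page (protection violation) */
#define PF_WRITE 0x2u /* set: the faulting access was a write */
#define PF_USER  0x4u /* set: the access came from user mode */

/* Hypothetical helpers, named for this example only. */
static int pf_is_protection(uint32_t code) { return (code & PF_PROT) != 0; }
static int pf_is_write(uint32_t code)      { return (code & PF_WRITE) != 0; }
static int pf_is_user(uint32_t code)       { return (code & PF_USER) != 0; }

/* "xen_set_pte+0x89/0x93" means IP = function start + 0x89,
 * so the start is recovered by subtracting the offset. */
static uint32_t symbol_start(uint32_t ip, uint32_t offset)
{
    return ip - offset;
}
```

With the values from the log, error code 0003 reads as a kernel-mode write to a present page. That fits the `mov %edx,(%ebx)` store that gdb shows, with EBX equal to the faulting address c0207d58. The offset 0x89 is 137 in decimal, which matches gdb's `<xen_set_pte+137>`.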

> >Next I tried without the serial console, and yes, now the VGA text console
> >works just like it does with the old 2.6.18 xenlinux tree! I can see the
> >pv_ops dom0 kernel boot messages after Xen boot messages just fine.
>
> Good.  Have you tried starting X?

Nope, not yet. I don't even have X in that box.. I'll try that later :)

-- Pasi

Jeremy Fitzhardinge

2009-May-07 18:30 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

Pasi Kärkkäinen wrote:
> I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt
>
> (XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
> (XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
> (XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
>
> BUG: unable to handle kernel paging request at c0207d58
> IP: [<c0405bf7>] xen_set_pte+0x89/0x93
> *pdpt = 000000003c8ef001 
> Oops: 0003 [#1] SMP 
> Pid: 323, comm: kswapd0 Not tainted (2.6.30-rc3-tip #34) P8SC8
> EIP: 0061:[<c0405bf7>] EFLAGS: 00010296 CPU: 0
> EIP is at xen_set_pte+0x89/0x93
> EAX: 00000000 EBX: c0207d58 ECX: 00000000 EDX: 6b0a6063
> ESI: 00000000 EDI: 000bd778 EBP: e254cd80 ESP: e254cd70
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> Process kswapd0 (pid: 323, ti=e254c000 task=e25c8000 task.ti=e254c000)
>   
Hm, can't have everything I suppose...  I wonder what's going on here; I
haven't seen any problems with highpte at all.  Something .config-specific?

    J

Pasi Kärkkäinen

2009-May-07 18:46 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, May 07, 2009 at 11:30:04AM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
> >http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt
> >
> >(XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
> >(XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
> >(XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
> >
> >BUG: unable to handle kernel paging request at c0207d58
> >IP: [<c0405bf7>] xen_set_pte+0x89/0x93
> >*pdpt = 000000003c8ef001 
> >Oops: 0003 [#1] SMP 
> >Pid: 323, comm: kswapd0 Not tainted (2.6.30-rc3-tip #34) P8SC8
> >EIP: 0061:[<c0405bf7>] EFLAGS: 00010296 CPU: 0
> >EIP is at xen_set_pte+0x89/0x93
> >EAX: 00000000 EBX: c0207d58 ECX: 00000000 EDX: 6b0a6063
> >ESI: 00000000 EDI: 000bd778 EBP: e254cd80 ESP: e254cd70
> > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> >Process kswapd0 (pid: 323, ti=e254c000 task=e25c8000 task.ti=e254c000)
> >
>
> Hm, can't have everything I suppose...  I wonder what's going on here; I
> haven't seen any problems with highpte at all.  Something .config-specific?
>

Hmm.. it could be my .config.

http://pasik.reaktio.net/xen/pv_ops-dom0-debug/config-2.6.30-rc3-tip-next-with-highpte

Also attached to this mail. It's originally based on Fedora 10 default
kernel config.

Anything suspicious?

-- Pasi


Pasi Kärkkäinen

2009-May-14 11:11 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0 / CONFIG_HIGHPTE problems

On Thu, May 07, 2009 at 09:46:24PM +0300, Pasi Kärkkäinen wrote:
> On Thu, May 07, 2009 at 11:30:04AM -0700, Jeremy Fitzhardinge wrote:
> > Pasi Kärkkäinen wrote:
> > >I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
> > >http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt
> > >
> > >(XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
> > >(XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
> > >(XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
> > >
> > >BUG: unable to handle kernel paging request at c0207d58
> > >IP: [<c0405bf7>] xen_set_pte+0x89/0x93
> > >*pdpt = 000000003c8ef001 
> > >Oops: 0003 [#1] SMP 
> > >Pid: 323, comm: kswapd0 Not tainted (2.6.30-rc3-tip #34) P8SC8
> > >EIP: 0061:[<c0405bf7>] EFLAGS: 00010296 CPU: 0
> > >EIP is at xen_set_pte+0x89/0x93
> > >EAX: 00000000 EBX: c0207d58 ECX: 00000000 EDX: 6b0a6063
> > >ESI: 00000000 EDI: 000bd778 EBP: e254cd80 ESP: e254cd70
> > > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
> > >Process kswapd0 (pid: 323, ti=e254c000 task=e25c8000 task.ti=e254c000)
> > >
> >
> > Hm, can't have everything I suppose...  I wonder what's going on here; I
> > haven't seen any problems with highpte at all.  Something .config-specific?
> >
>
> Hmm.. it could be my .config.
>
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/config-2.6.30-rc3-tip-next-with-highpte
>
> Also attached to this mail. It's originally based on Fedora 10 default kernel config.
>
> Anything suspicious?

I can also try with your .config ..

-- Pasi

Jeremy Fitzhardinge

2009-May-15 22:48 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0 / CONFIG_HIGHPTE problems

Pasi Kärkkäinen wrote:
> I can also try with your .config ..
>
> -- Pasi

Attached.

    J


Ian Campbell

2009-May-18 14:57 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

Hi Pasi,

On Thu, 2009-05-07 at 13:24 -0400, Pasi Kärkkäinen wrote:
> On Wed, May 06, 2009 at 02:51:59PM -0700, Jeremy Fitzhardinge wrote:
> > Great!  I'd be interested to know if you're still having HIGHPTE
> > problems.  It may or may not have got fixed.
>
> I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt
>
> (XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
> (XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
> (XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()

I thought I might have a poke at this, how do you go about reproducing
it? I've used your .config and it boots OK -- now I'm trying a kernbench
run since I think you mentioned compiling a kernel in domain 0 at one
point.

How much RAM does your host have? Are you running 32 or 64 bit
hypervisor? Which hypervisor version? From your .config I think your
dom0 kernel is 32 bit, right?

Ian.



Pasi Kärkkäinen

2009-May-18 17:06 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, May 18, 2009 at 03:57:07PM +0100, Ian Campbell wrote:
> Hi Pasi,
>
> On Thu, 2009-05-07 at 13:24 -0400, Pasi Kärkkäinen wrote:
> > On Wed, May 06, 2009 at 02:51:59PM -0700, Jeremy Fitzhardinge wrote:
> > > Great!  I'd be interested to know if you're still having HIGHPTE
> > > problems.  It may or may not have got fixed.
> > >
> >
> > I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
> > http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt
> >
> > (XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
> > (XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
> > (XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
>
> I thought I might have a poke at this, how do you go about reproducing
> it? I've used your .config and it boots OK -- now I'm trying a kernbench
> run since I think you mentioned compiling a kernel in domain 0 at one
> point.

I can reproduce it every time like this:

- boot into pv_ops dom0 kernel
- run "make bzImage && make modules"
- wait for the kernel compilation to finish..
- *crash*, before the compilation finishes, usually within 15-30mins

> How much RAM does your host have? Are you running 32 or 64 bit
> hypervisor? Which hypervisor version? From your .config I think your
> dom0 kernel is 32 bit, right?

My host has 2GB of RAM.

Hypervisor is 32bit PAE, Fedora 11 Xen 3.3.1-11 rpms.
My dom0 kernel is 32bit PAE as well.

grub.conf:

kernel /xen-3.3.gz dom0_mem=1024M
module /vmlinuz-2.6.30-rc3-tip root=/dev/sda1 ro
module /boot/initrd.img-2.6.30-rc3-tip

I'm (was) using xen-tip/next.

-- Pasi

Pasi Kärkkäinen

2009-May-18 17:17 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, May 18, 2009 at 08:06:26PM +0300, Pasi Kärkkäinen wrote:
> On Mon, May 18, 2009 at 03:57:07PM +0100, Ian Campbell wrote:
> > Hi Pasi,
> >
> > On Thu, 2009-05-07 at 13:24 -0400, Pasi Kärkkäinen wrote:
> > > On Wed, May 06, 2009 at 02:51:59PM -0700, Jeremy Fitzhardinge wrote:
> > > > Great!  I'd be interested to know if you're still having HIGHPTE
> > > > problems.  It may or may not have got fixed.
> > > >
> > >
> > > I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
> > > http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt
> > >
> > > (XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
> > > (XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
> > > (XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
> >
> > I thought I might have a poke at this, how do you go about reproducing
> > it? I've used your .config and it boots OK -- now I'm trying a kernbench
> > run since I think you mentioned compiling a kernel in domain 0 at one
> > point.
> > 
> 
> I can reproduce it every time like this:
> 
> - boot into pv_ops dom0 kernel
> - run "make bzImage && make modules"
> - wait for the kernel compilation to finish..
> - *crash*, before the compilation finishes, usually within 15-30mins
> 
Oh, and the crash happens only when dom0 kernel is built with CONFIG_HIGHPTE=y.
dom0 kernel compiled with CONFIG_HIGHPTE=n survives this test without crashing.

-- Pasi

Jeremy Fitzhardinge

2009-May-18 17:39 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

Pasi Kärkkäinen wrote:
> Oh, and the crash happens only when dom0 kernel is built with CONFIG_HIGHPTE=y.
> dom0 kernel compiled with CONFIG_HIGHPTE=n survives this test without crashing.

What distro are you using?  What's the gcc version?

    J


Pasi Kärkkäinen

2009-May-18 17:50 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, May 18, 2009 at 10:39:23AM -0700, Jeremy Fitzhardinge wrote:
> Pasi Kärkkäinen wrote:
> >Oh, and the crash happens only when dom0 kernel is built with
> >CONFIG_HIGHPTE=y.
> >dom0 kernel compiled with CONFIG_HIGHPTE=n survives this test without
> >crashing.
>
> What distro are you using?  What's the gcc version?

I've seen that problem with Fedora 10 and Fedora 11 (rawhide).

Fedora 11 is using:
gcc version 4.4.0 20090427 (Red Hat 4.4.0-3)

Fedora 10 is using:
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7)

-- Pasi

Pasi Kärkkäinen

2009-May-18 19:09 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, May 18, 2009 at 08:06:26PM +0300, Pasi Kärkkäinen wrote:
> On Mon, May 18, 2009 at 03:57:07PM +0100, Ian Campbell wrote:
> > Hi Pasi,
> >
> > On Thu, 2009-05-07 at 13:24 -0400, Pasi Kärkkäinen wrote:
> > > On Wed, May 06, 2009 at 02:51:59PM -0700, Jeremy Fitzhardinge wrote:
> > > > Great!  I'd be interested to know if you're still having HIGHPTE
> > > > problems.  It may or may not have got fixed.
> > > >
> > >
> > > I just tried with CONFIG_HIGHPTE=y but that didn't seem to work:
> > > http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-bootlog-30-xen331-linux-2.6.30-rc3-next-crash-with-highpte.txt
> > >
> > > (XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 6b0a6 (pfn 2c959)
> > > (XEN) mm.c:707:d0 Error getting mfn 6b0a6 (pfn 2c959) from L1 entry 000000006b0a6063 for dom0
> > > (XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
> >
> > I thought I might have a poke at this, how do you go about reproducing
> > it? I've used your .config and it boots OK -- now I'm trying a kernbench
> > run since I think you mentioned compiling a kernel in domain 0 at one
> > point.
> > 
> 
> I can reproduce it every time like this:
> 
> - boot into pv_ops dom0 kernel
> - run "make bzImage && make modules"
> - wait for the kernel compilation to finish..
> - *crash*, before the compilation finishes, usually within 15-30mins
> 
> > How much RAM does your host have? Are you running 32 or 64 bit
> > hypervisor? Which hypervisor version? From your .config I think your
> > dom0 kernel is 32 bit, right?
> > 
> 
> My host has 2GB of RAM.
> 
> Hypervisor is 32bit PAE, Fedora 11 Xen 3.3.1-11 rpms.
> My dom0 kernel is 32bit PAE as well.
>
> grub.conf:
>
> kernel /xen-3.3.gz dom0_mem=1024M
> module /vmlinuz-2.6.30-rc3-tip root=/dev/sda1 ro
> module /boot/initrd.img-2.6.30-rc3-tip

Hmm, that's the usual grub.conf I use, but for capturing the crash logs I
use:

kernel /xen-3.3.gz dom0_mem=1024M loglvl=all guest_loglvl=all com1=19200,8n1 console=com1
module /vmlinuz-2.6.30-rc3-tip root=/dev/sda1 ro console=hvc0 earlyprintk=xen
module /boot/initrd.img-2.6.30-rc3-tip

And Xen hypervisor is a debug build.

-- Pasi

Ian Campbell

2009-May-21 09:08 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, 2009-05-18 at 13:50 -0400, Pasi Kärkkäinen wrote:
[...]

Thanks for all the info Pasi, I've been trying to repro but not
successfully. I set up an environment similar to yours (32p h/v and
kernel, ~512M dom0 RAM, swap) and managed most of an allmodconfig build
before I ran out of RAM trying to link the final vmlinux.

I haven't yet tried different compiler versions, I'm using 4.1.2.

Is it always kswapd<N> which has the issues? (subsequent traces seem to
be from the softlockup caused by the original failure, not repeated
failures). Could you perhaps try reproing with swap disabled?

Have you ever tried with a more recent hypervisor? I'm using
xen-unstable, I guess I should roll back to something based on the FC11
RPMs and try again.

Ian.


Pasi Kärkkäinen

2009-May-22 08:06 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, May 21, 2009 at 10:08:35AM +0100, Ian Campbell wrote:
> On Mon, 2009-05-18 at 13:50 -0400, Pasi Kärkkäinen wrote:
> [...]
>
> Thanks for all the info Pasi, I've been trying to repro but not
> successfully. I set up an environment similar to yours (32p h/v and
> kernel, ~512M dom0 RAM, swap) and managed most of an allmodconfig build
> before I ran out of RAM trying to link the final vmlinux.

Thanks for looking into it :)

I have dom0_mem=1024M but dunno if that makes any difference..

> I haven't yet tried different compiler versions, I'm using 4.1.2.

gcc version 4.4.0 20090427 (Red Hat 4.4.0-3) - Fedora 11
gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) - Fedora 10

Those were the versions I tried with..

> Is it always kswapd<N> which has the issues? (subsequent traces seem to
> be from the softlockup caused by the original failure, not repeated
> failures). Could you perhaps try reproing with swap disabled?

I went through a couple of the logs lately, and yeah, it seems to be kswapd..
I can try without swap.. late next week, I'm away until that..

> Have you ever tried with a more recent hypervisor? I'm using
> xen-unstable, I guess I should roll back to something based on the FC11
> RPMs and try again.

Nope, I haven't tried newer hypervisor yet.. I can try that as well.

-- Pasi

Pasi Kärkkäinen

2009-Jun-04 20:26 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, May 22, 2009 at 11:06:56AM +0300, Pasi Kärkkäinen wrote:
> On Thu, May 21, 2009 at 10:08:35AM +0100, Ian Campbell wrote:
> > On Mon, 2009-05-18 at 13:50 -0400, Pasi Kärkkäinen wrote:
> > [...]
> >
> > Thanks for all the info Pasi, I've been trying to repro but not
> > successfully. I set up an environment similar to yours (32p h/v and
> > kernel, ~512M dom0 RAM, swap) and managed most of an allmodconfig build
> > before I ran out of RAM trying to link the final vmlinux.
> >
>
> Thanks for looking into it :)
>
> I have dom0_mem=1024M but dunno if that makes any difference..
>
> > I haven't yet tried different compiler versions, I'm using 4.1.2.
> > 
> 
> gcc version 4.4.0 20090427 (Red Hat 4.4.0-3) - Fedora 11
> gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) - Fedora 10
> 
> Those were the versions I tried with..
> 
> > Is it always kswapd<N> which has the issues? (subsequent traces seem to
> > be from the softlockup caused by the original failure, not repeated
> > failures). Could you perhaps try reproing with swap disabled?
> >
>
> I went through a couple of the logs lately, and yeah, it seems to be kswapd..
> I can try without swap.. late next week, I'm away until that..
>

I had a HDD crash on my testbox, but now it's up again.. with a fresh
installation of Fedora 11 (rawhide).

I tried a couple of times with the latest xen-tip/next tree, and pv_ops
dom0 kernel built with CONFIG_HIGHPTE=y still crashes during the
"make bzImage && make modules" test.

with swap enabled it takes around 30 minutes to crash.. without swap, the
crash happens in around 15 mins. 

Serial console logs here:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-01-with-highpte.txt
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-02-with-highpte-no-swap.txt

> > Have you ever tried with a more recent hypervisor? I'm using
> > xen-unstable, I guess I should roll back to something based on the FC11
> > RPMs and try again.
> >
>
> Nope, I haven't tried newer hypervisor yet.. I can try that as well.
>

I was using stock Xen 3.3.1-11 from Fedora 11.

What do you suggest me to try next? 

-- Pasi

Pasi Kärkkäinen

2009-Jun-04 20:30 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, Jun 04, 2009 at 11:26:56PM +0300, Pasi Kärkkäinen wrote:
> On Fri, May 22, 2009 at 11:06:56AM +0300, Pasi Kärkkäinen wrote:
> > On Thu, May 21, 2009 at 10:08:35AM +0100, Ian Campbell wrote:
> > > On Mon, 2009-05-18 at 13:50 -0400, Pasi Kärkkäinen wrote:
> > > [...]
> > >
> > > Thanks for all the info Pasi, I've been trying to repro but not
> > > successfully. I set up an environment similar to yours (32p h/v and
> > > kernel, ~512M dom0 RAM, swap) and managed most of an allmodconfig build
> > > before I ran out of RAM trying to link the final vmlinux.
> > >
> >
> > Thanks for looking into it :)
> >
> > I have dom0_mem=1024M but dunno if that makes any difference..
> >
> > > I haven't yet tried different compiler versions, I'm using 4.1.2.
> > > 
> > 
> > gcc version 4.4.0 20090427 (Red Hat 4.4.0-3) - Fedora 11
> > gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) - Fedora 10
> > 
> > Those were the versions I tried with..
> > 
> > > Is it always kswapd<N> which has the issues? (subsequent traces seem to
> > > be from the softlockup caused by the original failure, not repeated
> > > failures). Could you perhaps try reproing with swap disabled?
> > >
> >
> > I went through a couple of the logs lately, and yeah, it seems to be kswapd..
> > I can try without swap.. late next week, I'm away until that..
> >
>
> I had a HDD crash on my testbox, but now it's up again.. with a fresh
> installation of Fedora 11 (rawhide).
> 
> I tried a couple of times with the latest xen-tip/next tree, and pv_ops
> dom0 kernel built with CONFIG_HIGHPTE=y still crashes during the
> "make bzImage && make modules" test.
> 
> with swap enabled it takes around 30 minutes to crash.. without swap, the
> crash happens in around 15 mins. 
> 
> Serial console logs here:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-01-with-highpte.txt
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-02-with-highpte-no-swap.txt
> 
Oh, and it seems to be always kswapd<0> having issues..

-- Pasi
> > > Have you ever tried with a more recent hypervisor? I'm using
> > > xen-unstable, I guess I should roll back to something based on the FC11
> > > RPMs and try again.
> > >
> >
> > Nope, I haven't tried newer hypervisor yet.. I can try that as well.
> >
>
> I was using stock Xen 3.3.1-11 from Fedora 11.
>
> What do you suggest me to try next?
> 
> -- Pasi
> 

Ian Campbell

2009-Jun-05 10:20 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, 2009-06-04 at 16:26 -0400, Pasi Kärkkäinen wrote:
> What do you suggest me to try next?

I'm at a bit of a loss to be honest...

It's interesting that it's always kswapd0 even in the case with no swap
configured. Were you running with CONFIG_SWAP=n or just with the swap
device turned off?

Judging from the backtrace the sequence of events seems to be roughly:
kswapd<0> runs and calls balance_pgdat() which calls shrink_zone(), which in
turn calls shrink_active_list() if inactive_anon_is_low() (so I think we
are dealing with anon pages). shrink_active_list() then iterates over a
list of pages calling page_referenced() on each one. page_referenced()
eventually calls down to page_referenced_one() (presumably via
page_referenced_anon()) and eventually to page_check_address() which
walks the page table and attempts to map the PTE page. This goes via
pte_offset_map() to kmap_atomic_pte() then xen_kmap_atomic_pte(). Here
we check if the page is pinned and then attempt to map it; since we
_think_ the page is not pinned the mapping is writable. However at this
point Xen reports that the page really is pinned (28000001 => Page Type
1 == L1 PT) and we are trying to make a writable mapping (e0000000 =>
Page Type 7 == Writable) which is disallowed.
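
The two hex values in the Xen message can be checked by hand. The sketch below assumes, following the reading above, that the page type sits in the top three bits of the 32-bit type word; that layout is an assumption for illustration and the code is not taken from Xen. The helper names are hypothetical. It also pulls the frame number and the writable bit out of the logged L1 entry, using the architectural x86 PAE page-table entry layout.

```c
#include <stdint.h>

/* Assumption: the page type occupies the top three bits of the type
 * word, as the "Page Type 1" / "Page Type 7" reading implies. */
#define TYPE_SHIFT 29u

/* Hypothetical helper: page type number from a 32-bit type word. */
static unsigned int page_type(uint32_t type_info)
{
    return type_info >> TYPE_SHIFT;
}

/* Architectural x86 page-table entry flag bits. */
#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u

/* Hypothetical helper: frame number held in a 64-bit PAE L1 entry
 * (bits 12..51 of the entry). */
static uint64_t l1e_frame(uint64_t l1e)
{
    return (l1e & 0x000ffffffffff000ULL) >> 12;
}

/* Hypothetical helper: nonzero if the entry is present and writable. */
static int l1e_is_writable(uint64_t l1e)
{
    return (l1e & PTE_PRESENT) && (l1e & PTE_RW);
}
```

For the logged values this gives type 1 for 28000001 and type 7 for e0000000. The L1 entry 000000006b0a6063 names frame 6b0a6 with the present and read/write bits set, which is a writable mapping of a frame that Xen currently types as an L1 page table.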

Do you know which line of xen_set_pte() the fault is occurring at? I
assume either "ptep->pte_high =" or "ptep->pte_low =".

So the question is -- how come we have a page which is pinned but this
fact is not recorded in the struct page information? It might be
interesting to know if the corresponding L3 PT is pinned. If the mm is
active then this should always be the case and I _think_ it would be a
bug for the L3 to be pinned but not all the L1s which it contains.
Can you try this patch which tries to notice this situation and prints
some potentially interesting information; similarly, on the fault it
dumps a little more info. Since I can't repro I've only tested that it
doesn't break normal use, I've not actually seen the debug trigger...

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index abe8e4b..483bad7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -285,7 +285,7 @@ check_v8086_mode(struct pt_regs *regs, unsigned long address,
 		tsk->thread.screen_bitmap |= 1 << bit;
 }
 
-static void dump_pagetable(unsigned long address)
+void dump_pagetable(unsigned long address)
 {
 	__typeof__(pte_val(__pte(0))) page;
 
@@ -603,6 +603,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 	printk_address(regs->ip, 1);
 
 	dump_pagetable(address);
+	printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+	dump_pagetable(fix_to_virt(KM_PTE0));
+	printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+	dump_pagetable(fix_to_virt(KM_PTE1));
 }
 
 static noinline void
diff --git a/include/xen/swiotlb.h b/include/xen/swiotlb.h
index f35183b..5db8659 100644
--- a/include/xen/swiotlb.h
+++ b/include/xen/swiotlb.h
@@ -5,6 +5,10 @@ extern void xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs);
 extern phys_addr_t xen_bus_to_phys(dma_addr_t daddr);
 extern dma_addr_t xen_phys_to_bus(phys_addr_t paddr);
 extern int xen_range_needs_mapping(phys_addr_t phys, size_t size);
+#ifdef CONFIG_XEN_PCI
 extern int xen_wants_swiotlb(void);
+#else
+static inline int xen_wants_swiotlb(void) { return 0; }
+#endif
 
 #endif /* _XEN_SWIOTLB_H */
diff --git a/mm/rmap.c b/mm/rmap.c
index 1652166..ae5d5a0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -267,6 +267,7 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 			  unsigned long address, spinlock_t **ptlp, int sync)
 {
+	struct page *pgd_page, *pte_page;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -285,6 +286,22 @@ pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 	if (!pmd_present(*pmd))
 		return NULL;
 
+	pgd_page = virt_to_page(mm->pgd);
+	pte_page = pmd_page(*pmd);
+
+	if (PagePinned(pgd_page) != PagePinned(pte_page)) {
+		extern void dump_pagetable(unsigned long address);
+		printk(KERN_CRIT "L4 at %p is %s contains L2 at %p which points at an L1 which is %s %s\n",
+		       pgd, PagePinned(pgd_page) ? "pinned" : "unpinned",
+		       pmd, PagePinned(pte_page) ? "pinned" : "unpinned",
+		       PageHighMem(pte_page) ? "highmem" : "lowmem");
+		printk(KERN_CRIT "address %#lx\n", address);
+		dump_pagetable(address);
+		printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+		dump_pagetable(fix_to_virt(KM_PTE0));
+		printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+		dump_pagetable(fix_to_virt(KM_PTE1));
+	}
 	pte = pte_offset_map(pmd, address);
 	/* Make a quick check before getting the lock */
 	if (!sync && !pte_present(*pte)) {



I'd guess that this would at least work around the issue, I doubt it's a
proper fix and it's going to shaft perf I suspect (not that highpte
won't be doing a pretty good job of that ;-)).

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index fefdeee..4c694e4 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1521,7 +1521,7 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
 {
 	pgprot_t prot = PAGE_KERNEL;
 
-	if (PagePinned(page))
+	if (1 || PagePinned(page))
 		prot = PAGE_KERNEL_RO;
 
 	if (0 && PageHighMem(page))


Ian.



Pasi Kärkkäinen

2009-Jun-05 11:23 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, Jun 05, 2009 at 11:20:17AM +0100, Ian Campbell wrote:
> On Thu, 2009-06-04 at 16:26 -0400, Pasi Kärkkäinen wrote:
> > What do you suggest me to try next?
>
> I'm at a bit of a loss to be honest...
>
> It's interesting that it's always kswapd0 even in the case with no swap
> configured. Were you running with CONFIG_SWAP=n or just with the swap
> device turned off?

I just ran "swapoff -a" before testing..

> Judging from the backtrace the sequence of events seems to be roughly:
> kswapd<0> runs and calls balance_pgdat which calls shrink_zone who in
> turn calls shrink_active_list if inactive_anon_is_low() (so I think we
> are dealing with anon pages). shrink_active_list() then iterates over a
> list of pages calling page_referenced() on each one. page_referenced()
> eventually calls down to page_referenced_one() (presumably via
> page_referenced_anon()) and eventually to page_check_address() which
> walks the page table and attempts to map the PTE page. This goes via
> pte_offset_map() to kmap_atomic_pte() then xen_kmap_atomic_pte(). Here
> we check if the page is pinned and then attempt to map it, since we
> _think_ the page is not pinned the mapping is writable. However at this
> point Xen reports that the page really is pinned (28000001 => Page Type
> 1 == L1 PT) and we are trying to make a writable mapping (e0000000 =>
> Page Type 7 == Writable) which is disallowed.
> 
> Do you know which line of xen_set_pte() the fault is occurring at? I
> assume either "ptep->pte_high =" or "ptep->pte_low
=".
> 
I haven''t looked for that.. I guess I should compile debug=y Xen build
again.
> So the question is -- how come we have a page which is pinned but this
> fact is not recorded in the struct page information? It might be
> interesting to know if the corresponding L3 PT is pinned. If the mm is
> active then this should always be the case and I _think_ it would be a
> bug for the L3 to be pinned but not all the L1s which it contains.
> Can you try this patch which tries to notice this situation and prints
> some potentially interesting information, similarly on the fault it
> dumps a little more info. Since I can't repro I've only tested that it
> doesn't break normal use, I've not actually seen the debug trigger...
> 

OK. I'll try later today..

> 
> 
> I'd guess that this would at least work around the issue, I doubt it's a
> proper fix and it's going to shaft perf I suspect (not that highpte
> won't be doing a pretty good job of that ;-)).
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index fefdeee..4c694e4 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1521,7 +1521,7 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  {
>  	pgprot_t prot = PAGE_KERNEL;
>  
> -	if (PagePinned(page))
> +	if (1 || PagePinned(page))
>  		prot = PAGE_KERNEL_RO;
>  
>  	if (0 && PageHighMem(page))
> 

I'll try just the first debug patch first.. so I won't apply this one yet.

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Ian Campbell

2009-Jun-05 11:37 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, 2009-06-05 at 07:23 -0400, Pasi Kärkkäinen wrote:
> > Do you know which line of xen_set_pte() the fault is occurring at? I
> > assume either "ptep->pte_high =" or "ptep->pte_low =".
> > 
> 
> I haven't looked for that.. I guess I should compile debug=y Xen build
> again.

xen_set_pte() is in the kernel rather than Xen so e.g. from
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-01-with-highpte.txt:
[...]
EIP: 0061:[<c0405d63>] EFLAGS: 00010296 CPU: 0
[...]

Can you use gdb to find out what 0xc0405d63 is, e.g. with
"list *0xc0405d63" and/or "disas 0xc0405d63"?

Trying a debug=y Xen might be interesting as well though, it does more
checks etc so perhaps we can spot something odd earlier. Also all my
repro attempts were with debug=y so it would be interesting to know what
happens for you.

Ian.

Pasi Kärkkäinen

2009-Jun-05 13:38 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, Jun 05, 2009 at 11:37:44AM +0000, Ian Campbell wrote:
> On Fri, 2009-06-05 at 07:23 -0400, Pasi Kärkkäinen wrote:
> > > Do you know which line of xen_set_pte() the fault is occurring at? I
> > > assume either "ptep->pte_high =" or "ptep->pte_low =".
> > > 
> > 
> > I haven't looked for that.. I guess I should compile debug=y Xen build
> > again.
> 
> xen_set_pte() is in the kernel rather than Xen so e.g. from
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-01-with-highpte.txt:
> [...]
> EIP: 0061:[<c0405d63>] EFLAGS: 00010296 CPU: 0
> [...]
> 
> Can you use gdb to find out what 0xc0405d63 is, e.g. with "list
> *0xc0405d63" and/or "disas 0xc0405d63"
> 
(gdb) list *0xc0405d63
0xc0405d63 is in xen_set_pte (arch/x86/xen/mmu.c:683).
678             ADD_STATS(pte_update_batched, paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU);
679
680     #ifdef CONFIG_X86_PAE
681             ptep->pte_high = pte.pte_high;
682             smp_wmb();
683             ptep->pte_low = pte.pte_low;
684     #else
685             *ptep = pte;
686     #endif
687     }

Dump of assembler code for function xen_set_pte:
0xc0405cda <xen_set_pte+0>:     push   %ebp
0xc0405cdb <xen_set_pte+1>:     mov    %esp,%ebp
0xc0405cdd <xen_set_pte+3>:     push   %edi
0xc0405cde <xen_set_pte+4>:     push   %esi
0xc0405cdf <xen_set_pte+5>:     mov    %ecx,%esi
0xc0405ce1 <xen_set_pte+7>:     push   %ebx
0xc0405ce2 <xen_set_pte+8>:     mov    %eax,%ebx
0xc0405ce4 <xen_set_pte+10>:    mov    %edx,%eax
0xc0405ce6 <xen_set_pte+12>:    sub    $0x4,%esp
0xc0405ce9 <xen_set_pte+15>:    and    $0x400,%eax
0xc0405cee <xen_set_pte+20>:    je     0xc0405cff <check_zero>
0xc0405cf0 <xen_set_iomap_pte+0>:       mov    %ebx,%eax
0xc0405cf2 <xen_set_iomap_pte+2>:       push   $0x7ff1
0xc0405cf7 <xen_set_iomap_pte+7>:       call   0xc0405c35 <xen_set_domain_pte>
0xc0405cfc <xen_set_iomap_pte+12>:      pop    %ebx
0xc0405cfd <xen_set_iomap_pte+13>:      jmp    0xc0405d65 <xen_set_pte+139>
0xc0405cff <check_zero+0>:      cmpb   $0x0,0xc08f334c
0xc0405d06 <check_zero+7>:      je     0xc0405d1b <xen_set_pte+65>
0xc0405d08 <__constant_c_and_count_memset+0>:   mov    $0x33,%ecx
0xc0405d0d <__constant_c_and_count_memset+5>:   mov    $0xc08f3280,%edi
0xc0405d12 <__constant_c_and_count_memset+10>:  rep stos %eax,%es:(%edi)
0xc0405d14 <check_zero+21>:     movb   $0x0,0xc08f334c
0xc0405d1b <xen_set_pte+65>:    incl   0xc08f32a4
0xc0405d21 <check_zero+0>:      cmpb   $0x0,0xc08f334c
0xc0405d28 <check_zero+7>:      je     0xc0405d3f <xen_set_pte+101>
0xc0405d2a <__constant_c_and_count_memset+0>:   mov    $0x33,%ecx
0xc0405d2f <__constant_c_and_count_memset+5>:   mov    $0xc08f3280,%edi
0xc0405d34 <__constant_c_and_count_memset+10>:  xor    %eax,%eax
0xc0405d36 <__constant_c_and_count_memset+12>:  rep stos %eax,%es:(%edi)
0xc0405d38 <check_zero+23>:     movb   $0x0,0xc08f334c
0xc0405d3f <xen_set_pte+101>:   mov    0xc08f32ac,%edi
0xc0405d45 <xen_set_pte+107>:   mov    %edx,-0x10(%ebp)
0xc0405d48 <xen_set_pte+110>:   call   0xc0422f2a <paravirt_get_lazy_mode>
0xc0405d4d <xen_set_pte+115>:   dec    %eax
0xc0405d4e <xen_set_pte+116>:   sete   %al
0xc0405d51 <xen_set_pte+119>:   movzbl %al,%eax
0xc0405d54 <xen_set_pte+122>:   lea    (%eax,%edi,1),%edi
0xc0405d57 <xen_set_pte+125>:   mov    %edi,0xc08f32ac
0xc0405d5d <xen_set_pte+131>:   mov    %esi,0x4(%ebx)
0xc0405d60 <xen_set_pte+134>:   mov    -0x10(%ebp),%edx
0xc0405d63 <xen_set_pte+137>:   mov    %edx,(%ebx)
0xc0405d65 <xen_set_pte+139>:   lea    -0xc(%ebp),%esp
0xc0405d68 <xen_set_pte+142>:   pop    %ebx
0xc0405d69 <xen_set_pte+143>:   pop    %esi
0xc0405d6a <xen_set_pte+144>:   pop    %edi
0xc0405d6b <xen_set_pte+145>:   pop    %ebp
0xc0405d6c <xen_set_pte+146>:   ret
End of assembler dump.
(gdb) 
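
(Editorial note: the faulting address can be tied to the listing above with simple arithmetic. This sketch uses only addresses that appear in the dump; the helper names are hypothetical.)

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of an address from the start of its containing function. */
static inline uint32_t toy_symbol_offset(uint32_t addr, uint32_t func_start)
{
    assert(addr >= func_start);  /* addr must not precede the function */
    return addr - func_start;
}

/* Worked check with the addresses from the dump. */
static inline void toy_check_dump_addresses(void)
{
    /* EIP is 0xc0405d63; xen_set_pte starts at 0xc0405cda. */
    assert(toy_symbol_offset(0xc0405d63u, 0xc0405cdau) == 137);
}
```

The result, 137 (0x89), matches the <xen_set_pte+137> label on the "mov %edx,(%ebx)" instruction, which gdb attributes to line 683, the store to ptep->pte_low.
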

> Trying a debug=y Xen might be interesting as well though, it does more
> checks etc so perhaps we can spot something odd earlier. Also all my
> repro attempts were with debug=y so it would be interesting to know what
> happens for you.
> 
I'll build a debug=y Xen hypervisor, and also a new CONFIG_HIGHPTE=y kernel
with the debug patch you sent.

Btw. I just realized you said earlier that you tested with dom0_mem=512M ..
that doesn't give you any highmem.. ? Maybe that's why you aren't seeing the
problem..

I have dom0_mem=1024M

-- Pasi

Ian Campbell

2009-Jun-05 13:52 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, 2009-06-05 at 09:38 -0400, Pasi Kärkkäinen wrote:
> (gdb) list *0xc0405d63
> 0xc0405d63 is in xen_set_pte (arch/x86/xen/mmu.c:683).
> 678             ADD_STATS(pte_update_batched, paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU);
> 679
> 680     #ifdef CONFIG_X86_PAE
> 681             ptep->pte_high = pte.pte_high;
> 682             smp_wmb();
> 683             ptep->pte_low = pte.pte_low;
> 684     #else
> 685             *ptep = pte;
> 686     #endif
> 687     }

Good, that makes most sense.

> Btw. I just realized you said earlier that you tested with dom0_mem=512M ..
> that doesn't give you any highmem.. ? Maybe that's why you aren't seeing the
> problem..
> 
> I have dom0_mem=1024M

I'm pretty sure I also tried larger amounts, both dom0_mem=1024M and the
default which is ALL-128M or something. I'll try again to make sure
though.

Thanks,
Ian.


Pasi Kärkkäinen

2009-Jun-05 15:41 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, Jun 05, 2009 at 02:52:59PM +0100, Ian Campbell wrote:
> On Fri, 2009-06-05 at 09:38 -0400, Pasi Kärkkäinen wrote:
> > (gdb) list *0xc0405d63
> > 0xc0405d63 is in xen_set_pte (arch/x86/xen/mmu.c:683).
> > 678             ADD_STATS(pte_update_batched, paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU);
> > 679
> > 680     #ifdef CONFIG_X86_PAE
> > 681             ptep->pte_high = pte.pte_high;
> > 682             smp_wmb();
> > 683             ptep->pte_low = pte.pte_low;
> > 684     #else
> > 685             *ptep = pte;
> > 686     #endif
> > 687     }
> 
> Good that makes most sense. 
> 
I rebuilt my Fedora 11 Xen 3.3.1-11 src.rpm with "debug=y verbose=y
crash_debug=y".

And I rebuilt my pv_ops dom0 kernel (CONFIG_HIGHPTE=y) with your
debugging patch applied. (Some hunks to swiotlb.h failed, because the code
was already there.. with different newlines or so).

Serial console log:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-03-with-highpte-no-swap-with-debug.txt

(XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 683f4 (pfn 29a0b)
(XEN) mm.c:707:d0 Error getting mfn 683f4 (pfn 29a0b) from L1 entry 00000000683f4063 for dom0
(XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
BUG: unable to handle kernel paging request at c0207c80
IP: [<c0405d63>] xen_set_pte+0x89/0x93
*pdpt = 000000003c8ef001 
Fixmap KM_PTE0 @ 0xf57f0000
*pdpt = 000000003c8ef001 
Fixmap KM_PTE0 @ 0xf57ee000
*pdpt = 000000003c8ef001 
Oops: 0003 [#1] SMP 
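
(Editorial note: the number after "Oops:" is the x86 page-fault error code. A minimal decoder for its three lowest bits is sketched below; the struct and function names are hypothetical.)

```c
#include <assert.h>
#include <stdbool.h>

struct toy_pf_error {
    bool protection;  /* bit 0: fault on a present mapping (protection violation) */
    bool write;       /* bit 1: the access was a write */
    bool user;        /* bit 2: the access came from user mode */
};

/* Split a page-fault error code into its three lowest flag bits. */
static inline struct toy_pf_error toy_decode_pf_error(unsigned long code)
{
    struct toy_pf_error e;
    e.protection = (code & 0x1) != 0;
    e.write      = (code & 0x2) != 0;
    e.user       = (code & 0x4) != 0;
    return e;
}

/* Worked check with the code printed in the oops above. */
static inline void toy_check_oops_code(void)
{
    struct toy_pf_error e = toy_decode_pf_error(0x0003);
    assert(e.protection && e.write && !e.user);
}
```

Read this way, 0003 is a write from kernel mode to an address whose mapping is present, i.e. a protection fault rather than a missing page.
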


(gdb) list *0xc0405d63
0xc0405d63 is in xen_set_pte (arch/x86/xen/mmu.c:683).
678             ADD_STATS(pte_update_batched, paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU);
679
680     #ifdef CONFIG_X86_PAE
681             ptep->pte_high = pte.pte_high;
682             smp_wmb();
683             ptep->pte_low = pte.pte_low;
684     #else
685             *ptep = pte;
686     #endif
687     }


(gdb) disas 0xc0405d63
Dump of assembler code for function xen_set_pte:
0xc0405cda <xen_set_pte+0>:     push   %ebp
0xc0405cdb <xen_set_pte+1>:     mov    %esp,%ebp
0xc0405cdd <xen_set_pte+3>:     push   %edi
0xc0405cde <xen_set_pte+4>:     push   %esi
0xc0405cdf <xen_set_pte+5>:     mov    %ecx,%esi
0xc0405ce1 <xen_set_pte+7>:     push   %ebx
0xc0405ce2 <xen_set_pte+8>:     mov    %eax,%ebx
0xc0405ce4 <xen_set_pte+10>:    mov    %edx,%eax
0xc0405ce6 <xen_set_pte+12>:    sub    $0x4,%esp
0xc0405ce9 <xen_set_pte+15>:    and    $0x400,%eax
0xc0405cee <xen_set_pte+20>:    je     0xc0405cff <check_zero>
0xc0405cf0 <xen_set_iomap_pte+0>:       mov    %ebx,%eax
0xc0405cf2 <xen_set_iomap_pte+2>:       push   $0x7ff1
0xc0405cf7 <xen_set_iomap_pte+7>:       call   0xc0405c35 <xen_set_domain_pte>
0xc0405cfc <xen_set_iomap_pte+12>:      pop    %ebx
0xc0405cfd <xen_set_iomap_pte+13>:      jmp    0xc0405d65 <xen_set_pte+139>
0xc0405cff <check_zero+0>:      cmpb   $0x0,0xc08f334c
0xc0405d06 <check_zero+7>:      je     0xc0405d1b <xen_set_pte+65>
0xc0405d08 <__constant_c_and_count_memset+0>:   mov    $0x33,%ecx
0xc0405d0d <__constant_c_and_count_memset+5>:   mov    $0xc08f3280,%edi
0xc0405d12 <__constant_c_and_count_memset+10>:  rep stos %eax,%es:(%edi)
0xc0405d14 <check_zero+21>:     movb   $0x0,0xc08f334c
0xc0405d1b <xen_set_pte+65>:    incl   0xc08f32a4
0xc0405d21 <check_zero+0>:      cmpb   $0x0,0xc08f334c
0xc0405d28 <check_zero+7>:      je     0xc0405d3f <xen_set_pte+101>
0xc0405d2a <__constant_c_and_count_memset+0>:   mov    $0x33,%ecx
0xc0405d2f <__constant_c_and_count_memset+5>:   mov    $0xc08f3280,%edi
0xc0405d34 <__constant_c_and_count_memset+10>:  xor    %eax,%eax
0xc0405d36 <__constant_c_and_count_memset+12>:  rep stos %eax,%es:(%edi)
0xc0405d38 <check_zero+23>:     movb   $0x0,0xc08f334c
0xc0405d3f <xen_set_pte+101>:   mov    0xc08f32ac,%edi
0xc0405d45 <xen_set_pte+107>:   mov    %edx,-0x10(%ebp)
0xc0405d48 <xen_set_pte+110>:   call   0xc0422f2a <paravirt_get_lazy_mode>
0xc0405d4d <xen_set_pte+115>:   dec    %eax
0xc0405d4e <xen_set_pte+116>:   sete   %al
0xc0405d51 <xen_set_pte+119>:   movzbl %al,%eax
0xc0405d54 <xen_set_pte+122>:   lea    (%eax,%edi,1),%edi
0xc0405d57 <xen_set_pte+125>:   mov    %edi,0xc08f32ac
0xc0405d5d <xen_set_pte+131>:   mov    %esi,0x4(%ebx)
0xc0405d60 <xen_set_pte+134>:   mov    -0x10(%ebp),%edx
0xc0405d63 <xen_set_pte+137>:   mov    %edx,(%ebx)
0xc0405d65 <xen_set_pte+139>:   lea    -0xc(%ebp),%esp
0xc0405d68 <xen_set_pte+142>:   pop    %ebx
0xc0405d69 <xen_set_pte+143>:   pop    %esi
0xc0405d6a <xen_set_pte+144>:   pop    %edi
0xc0405d6b <xen_set_pte+145>:   pop    %ebp
0xc0405d6c <xen_set_pte+146>:   ret    
End of assembler dump.
(gdb) 

-- Pasi


Ian Campbell

2009-Jun-05 16:05 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, 2009-06-05 at 11:41 -0400, Pasi Kärkkäinen wrote:
> On Fri, Jun 05, 2009 at 02:52:59PM +0100, Ian Campbell wrote:
> > On Fri, 2009-06-05 at 09:38 -0400, Pasi Kärkkäinen wrote:
> > > (gdb) list *0xc0405d63
> > > 0xc0405d63 is in xen_set_pte (arch/x86/xen/mmu.c:683).
> > > 678             ADD_STATS(pte_update_batched, paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU);
> > > 679
> > > 680     #ifdef CONFIG_X86_PAE
> > > 681             ptep->pte_high = pte.pte_high;
> > > 682             smp_wmb();
> > > 683             ptep->pte_low = pte.pte_low;
> > > 684     #else
> > > 685             *ptep = pte;
> > > 686     #endif
> > > 687     }
> > 
> > Good that makes most sense. 
> > 
> 
> I rebuilt my Fedora 11 Xen 3.3.1-11 src.rpm with "debug=y verbose=y
> crash_debug=y".
> 
> And I rebuilt my pv_ops dom0 kernel (CONFIG_HIGHPTE=y) with your
> debugging patch applied. (Some hunks to swiotlb.h failed, because the code
> was already there.. with different newlines or so).

I think I included those changes to swiotlb.h by mistake anyway.

> Serial console log:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-03-with-highpte-no-swap-with-debug.txt
> 
> (XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 683f4 (pfn 29a0b)
> (XEN) mm.c:707:d0 Error getting mfn 683f4 (pfn 29a0b) from L1 entry 00000000683f4063 for dom0
> (XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
> BUG: unable to handle kernel paging request at c0207c80
> IP: [<c0405d63>] xen_set_pte+0x89/0x93
> *pdpt = 000000003c8ef001 
> Fixmap KM_PTE0 @ 0xf57f0000
> *pdpt = 000000003c8ef001 
> Fixmap KM_PTE0 @ 0xf57ee000
> *pdpt = 000000003c8ef001 
> Oops: 0003 [#1] SMP 

Hmm, this isn't too useful because dump_pagetable() doesn't work for Xen
guests -- it goes direct at the pagetables instead of going via the
normal accessors so it misses the MFN<->PFN translations.
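
(Editorial note: the translation being missed can be illustrated with a toy lookup. The sketch assumes the usual x86 PAE layout in which bits 12-51 of a page-table entry hold the frame number. The table below contains only the single mfn/pfn pair printed in the Xen log above, and every name in it is hypothetical rather than a kernel or Xen interface.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_INVALID_PFN UINT64_MAX

/* Toy machine-to-physical table holding the one pair from the log. */
struct toy_m2p_entry { uint64_t mfn; uint64_t pfn; };
static const struct toy_m2p_entry toy_m2p[] = {
    { 0x683f4, 0x29a0b },
};

/* Frame number stored in an entry: drop the low flag bits and bits 52-63. */
static inline uint64_t toy_pte_frame(uint64_t pte)
{
    return (pte & 0x000ffffffffff000ULL) >> TOY_PAGE_SHIFT;
}

/* Translate a machine frame to a guest frame, or TOY_INVALID_PFN if unknown. */
static inline uint64_t toy_mfn_to_pfn(uint64_t mfn)
{
    for (size_t i = 0; i < sizeof(toy_m2p) / sizeof(toy_m2p[0]); i++) {
        if (toy_m2p[i].mfn == mfn)
            return toy_m2p[i].pfn;
    }
    return TOY_INVALID_PFN;
}

/* Worked check using the L1 entry printed in the Xen log. */
static inline void toy_check_log_values(void)
{
    uint64_t mfn = toy_pte_frame(0x00000000683f4063ULL);
    assert(mfn == 0x683f4);
    assert(toy_mfn_to_pfn(mfn) == 0x29a0b);
}
```

A dumper that skips the second step and treats 0x683f4 as if it were a guest frame number would look at the wrong page, which is the problem described above.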

I had some patches to unify the 32 and 64 bit versions of dump page
table at one point, since the 64 bit version does the right thing. I'll
see if I can find or reproduce them.

Ian.

Ian Campbell

2009-Jun-05 16:12 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, 2009-06-05 at 12:05 -0400, Ian Campbell wrote:
> 
> I had some patches to unify the 32 and 64 bit versions of dump page
> table at one point, since the 64 bit version does the right thing.
> I'll see if I can find or reproduce them.

Couldn't find them but please try this:

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index abe8e4b..e455d56 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -285,46 +285,12 @@ check_v8086_mode(struct pt_regs *regs, unsigned long address,
 		tsk->thread.screen_bitmap |= 1 << bit;
 }
 
-static void dump_pagetable(unsigned long address)
-{
-	__typeof__(pte_val(__pte(0))) page;
-
-	page = read_cr3();
-	page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
-
 #ifdef CONFIG_X86_PAE
-	printk("*pdpt = %016Lx ", page);
-	if ((page >> PAGE_SHIFT) < max_low_pfn
-	    && page & _PAGE_PRESENT) {
-		page &= PAGE_MASK;
-		page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
-							& (PTRS_PER_PMD - 1)];
-		printk(KERN_CONT "*pde = %016Lx ", page);
-		page &= ~_PAGE_NX;
-	}
+#define FMTPTE "ll"
 #else
-	printk("*pde = %08lx ", page);
+#define FMTPTE "l"
 #endif
 
-	/*
-	 * We must not directly access the pte in the highpte
-	 * case if the page table is located in highmem.
-	 * And let's rather not kmap-atomic the pte, just in case
-	 * it's allocated already:
-	 */
-	if ((page >> PAGE_SHIFT) < max_low_pfn
-	    && (page & _PAGE_PRESENT)
-	    && !(page & _PAGE_PSE)) {
-
-		page &= PAGE_MASK;
-		page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
-							& (PTRS_PER_PTE - 1)];
-		printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
-	}
-
-	printk("\n");
-}
-
 #else /* CONFIG_X86_64: */
 
 void vmalloc_sync_all(void)
@@ -440,6 +406,10 @@ check_v8086_mode(struct pt_regs *regs, unsigned long address,
 {
 }
 
+#define FMTPTE "ll"
+
+#endif /* CONFIG_X86_64 */
+
 static int bad_address(void *p)
 {
 	unsigned long dummy;
@@ -447,7 +417,7 @@ static int bad_address(void *p)
 	return probe_kernel_address((unsigned long *)p, dummy);
 }
 
-static void dump_pagetable(unsigned long address)
+void dump_pagetable(unsigned long address)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -462,7 +432,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pgd))
 		goto bad;
 
-	printk("PGD %lx ", pgd_val(*pgd));
+	printk("PGD %"FMTPTE"x ", pgd_val(*pgd));
 
 	if (!pgd_present(*pgd))
 		goto out;
@@ -471,7 +441,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pud))
 		goto bad;
 
-	printk("PUD %lx ", pud_val(*pud));
+	printk("PUD %"FMTPTE"x ", pud_val(*pud));
 	if (!pud_present(*pud) || pud_large(*pud))
 		goto out;
 
@@ -479,7 +449,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pmd))
 		goto bad;
 
-	printk("PMD %lx ", pmd_val(*pmd));
+	printk("PMD %"FMTPTE"x ", pmd_val(*pmd));
 	if (!pmd_present(*pmd) || pmd_large(*pmd))
 		goto out;
 
@@ -487,7 +457,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pte))
 		goto bad;
 
-	printk("PTE %lx", pte_val(*pte));
+	printk("PTE %"FMTPTE"x", pte_val(*pte));
 out:
 	printk("\n");
 	return;
@@ -495,8 +465,6 @@ bad:
 	printk("BAD\n");
 }
 
-#endif /* CONFIG_X86_64 */
-
 /*
  * Workaround for K8 erratum #93 & buggy BIOS.
  *
@@ -603,6 +571,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 	printk_address(regs->ip, 1);
 
 	dump_pagetable(address);
+	printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+	dump_pagetable(fix_to_virt(KM_PTE0));
+	printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+	dump_pagetable(fix_to_virt(KM_PTE1));
 }
 
 static noinline void
diff --git a/init/main.c b/init/main.c
index 33ce929..fee067e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -807,6 +807,7 @@ static void run_init_process(char *init_filename)
 static noinline int init_post(void)
 	__releases(kernel_lock)
 {
+	extern void dump_pagetable(unsigned long address);
 	/* need to finish all async __init code before freeing the memory */
 	async_synchronize_full();
 	free_initmem();
@@ -815,6 +816,9 @@ static noinline int init_post(void)
 	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
 
+	printk(KERN_CRIT "test dump_pagetable on %#lx\n", (unsigned long)__builtin_return_address(0));
+	dump_pagetable((unsigned long)__builtin_return_address(0));
+
 	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
 		printk(KERN_WARNING "Warning: unable to open an initial console.\n");
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 1652166..ae5d5a0 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -267,6 +267,7 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 			  unsigned long address, spinlock_t **ptlp, int sync)
 {
+	struct page *pgd_page, *pte_page;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -285,6 +286,22 @@ pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 	if (!pmd_present(*pmd))
 		return NULL;
 
+	pgd_page = virt_to_page(mm->pgd);
+	pte_page = pmd_page(*pmd);
+
+	if (PagePinned(pgd_page) != PagePinned(pte_page)) {
+		extern void dump_pagetable(unsigned long address);
+		printk(KERN_CRIT "L4 at %p is %s contains L2 at %p which points at an L1 which is %s %s\n",
+		       pgd, PagePinned(pgd_page) ? "pinned" : "unpinned",
+		       pmd, PagePinned(pte_page) ? "pinned" : "unpinned",
+		       PageHighMem(pte_page) ? "highmem" : "lowmem");
+		printk(KERN_CRIT "address %#lx\n", address);
+		dump_pagetable(address);
+		printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+		dump_pagetable(fix_to_virt(KM_PTE0));
+		printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE1));
+		dump_pagetable(fix_to_virt(KM_PTE1));
+	}
 	pte = pte_offset_map(pmd, address);
 	/* Make a quick check before getting the lock */
 	if (!sync && !pte_present(*pte)) {



Pasi Kärkkäinen

2009-Jun-05 18:19 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, Jun 05, 2009 at 05:12:33PM +0100, Ian Campbell wrote:
> On Fri, 2009-06-05 at 12:05 -0400, Ian Campbell wrote:
> > 
> > I had some patches to unify the 32 and 64 bit versions of dump page
> > table at one point, since the 64 bit version does the right thing.
> > I'll see if I can find or reproduce them.
> 
> Couldn't find them but please try this:
> 

I had some problems applying the patch until I figured out it was supposed
to be applied to a clean tree.. hopefully "git checkout file" restores (or
resets) the file to its original form and removes any local changes.

Here goes again:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-04-with-highpte-no-swap-with-debug2.txt


L4 at e1822000 is pinned contains L2 at e1977228 which points at an L1 which is
unpinned low mem address 0x8bf8000
Fixmap KM_PTE0 @ 0xf57f0000
Fixmap KM_PTE0 @ 0xf57ee000
(XEN) mm.c:2006:d0 Bad type (saw 28000001 != exp e0000000) for mfn 62991 (pfn 2406e)
(XEN) mm.c:707:d0 Error getting mfn 62991 (pfn 2406e) from L1 entry 0000000062991063 for dom0
(XEN) mm.c:3640:d0 ptwr_emulate: could not get_page_from_l1e()
BUG: unable to handle kernel paging request at c0207d58
IP: [<c0405d63>] xen_set_pte+0x89/0x93
PGD 8ef001 PUD 8ef001 PMD 1268067 PTE 207061
Fixmap KM_PTE0 @ 0xf57f0000
PGD 8ef001 PUD 8ef001 PMD 207067 PTE 0
Fixmap KM_PTE1 @ 0xf57ee000
PGD 8ef001 PUD 8ef001 PMD 207067 PTE 0
Oops: 0003 [#1] SMP 


(gdb) list *0xc0405d63
0xc0405d63 is in xen_set_pte (arch/x86/xen/mmu.c:683).
678             ADD_STATS(pte_update_batched, paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU);
679
680     #ifdef CONFIG_X86_PAE
681             ptep->pte_high = pte.pte_high;
682             smp_wmb();
683             ptep->pte_low = pte.pte_low;
684     #else
685             *ptep = pte;
686     #endif
687     }


(gdb) disas 0xc0405d63
Dump of assembler code for function xen_set_pte:
0xc0405cda <xen_set_pte+0>:     push   %ebp
0xc0405cdb <xen_set_pte+1>:     mov    %esp,%ebp
0xc0405cdd <xen_set_pte+3>:     push   %edi
0xc0405cde <xen_set_pte+4>:     push   %esi
0xc0405cdf <xen_set_pte+5>:     mov    %ecx,%esi
0xc0405ce1 <xen_set_pte+7>:     push   %ebx
0xc0405ce2 <xen_set_pte+8>:     mov    %eax,%ebx
0xc0405ce4 <xen_set_pte+10>:    mov    %edx,%eax
0xc0405ce6 <xen_set_pte+12>:    sub    $0x4,%esp
0xc0405ce9 <xen_set_pte+15>:    and    $0x400,%eax
0xc0405cee <xen_set_pte+20>:    je     0xc0405cff <check_zero>
0xc0405cf0 <xen_set_iomap_pte+0>:       mov    %ebx,%eax
0xc0405cf2 <xen_set_iomap_pte+2>:       push   $0x7ff1
0xc0405cf7 <xen_set_iomap_pte+7>:       call   0xc0405c35 <xen_set_domain_pte>
0xc0405cfc <xen_set_iomap_pte+12>:      pop    %ebx
0xc0405cfd <xen_set_iomap_pte+13>:      jmp    0xc0405d65 <xen_set_pte+139>
0xc0405cff <check_zero+0>:      cmpb   $0x0,0xc08f334c
0xc0405d06 <check_zero+7>:      je     0xc0405d1b <xen_set_pte+65>
0xc0405d08 <__constant_c_and_count_memset+0>:   mov    $0x33,%ecx
0xc0405d0d <__constant_c_and_count_memset+5>:   mov    $0xc08f3280,%edi
0xc0405d12 <__constant_c_and_count_memset+10>:  rep stos %eax,%es:(%edi)
0xc0405d14 <check_zero+21>:     movb   $0x0,0xc08f334c
0xc0405d1b <xen_set_pte+65>:    incl   0xc08f32a4
0xc0405d21 <check_zero+0>:      cmpb   $0x0,0xc08f334c
0xc0405d28 <check_zero+7>:      je     0xc0405d3f <xen_set_pte+101>
0xc0405d2a <__constant_c_and_count_memset+0>:   mov    $0x33,%ecx
0xc0405d2f <__constant_c_and_count_memset+5>:   mov    $0xc08f3280,%edi
0xc0405d34 <__constant_c_and_count_memset+10>:  xor    %eax,%eax
0xc0405d36 <__constant_c_and_count_memset+12>:  rep stos %eax,%es:(%edi)
0xc0405d38 <check_zero+23>:     movb   $0x0,0xc08f334c
0xc0405d3f <xen_set_pte+101>:   mov    0xc08f32ac,%edi
0xc0405d45 <xen_set_pte+107>:   mov    %edx,-0x10(%ebp)
0xc0405d48 <xen_set_pte+110>:   call   0xc0422f2a <paravirt_get_lazy_mode>
0xc0405d4d <xen_set_pte+115>:   dec    %eax
0xc0405d4e <xen_set_pte+116>:   sete   %al
0xc0405d51 <xen_set_pte+119>:   movzbl %al,%eax
0xc0405d54 <xen_set_pte+122>:   lea    (%eax,%edi,1),%edi
0xc0405d57 <xen_set_pte+125>:   mov    %edi,0xc08f32ac
0xc0405d5d <xen_set_pte+131>:   mov    %esi,0x4(%ebx)
0xc0405d60 <xen_set_pte+134>:   mov    -0x10(%ebp),%edx
0xc0405d63 <xen_set_pte+137>:   mov    %edx,(%ebx)
0xc0405d65 <xen_set_pte+139>:   lea    -0xc(%ebp),%esp
0xc0405d68 <xen_set_pte+142>:   pop    %ebx
0xc0405d69 <xen_set_pte+143>:   pop    %esi
0xc0405d6a <xen_set_pte+144>:   pop    %edi
0xc0405d6b <xen_set_pte+145>:   pop    %ebp
0xc0405d6c <xen_set_pte+146>:   ret    
End of assembler dump.
(gdb) 


-- Pasi



Ian Campbell

2009-Jun-08 15:45 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Fri, 2009-06-05 at 14:19 -0400, Pasi Kärkkäinen wrote:
> On Fri, Jun 05, 2009 at 05:12:33PM +0100, Ian Campbell wrote:
> > On Fri, 2009-06-05 at 12:05 -0400, Ian Campbell wrote:
> > > 
> > > I had some patches to unify the 32 and 64 bit versions of dump page
> > > table at one point, since the 64 bit version does the right thing.
> > > I'll see if I can find or reproduce them.
> > 
> > Couldn't find them but please try this:
> > 
> 
> I had some problems applying the patch until I figured out it was
> supposed to be applied to a clean tree.. hopefully "git checkout file"
> restores (or resets) the file to its original form and removes any
> local changes.

Should work I guess, I usually use "git reset --hard" to undo any local
mods.

> 
> Here goes again:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-04-with-highpte-no-swap-with-debug2.txt
> 
> 
> L4 at e1822000 is pinned contains L2 at e1977228 which points at an L1
> which is unpinned low mem address 0x8bf8000

OK so I think that is interesting. A pinned L4 referencing an unpinned
L1 isn't supposed to happen, I don't think (Jeremy?).

The patch at the end (applies to a clean tree again) walks the lowmem
region of every L4 to ensure that every L1 page is pinned just before
pinning the L4. I hope this will catch the L1 in the act.
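
(Editorial note: the check described above can be modelled outside the kernel. This is a toy model with made-up types, not the patch itself: it only scans a flat array of per-page "pinned" flags, standing in for the L1 pages reachable from one L4, and reports the first page that breaks the rule.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-page state that PagePinned() would report. */
struct toy_page {
    bool pinned;
};

/*
 * Return the index of the first unpinned L1 page, or -1 if every page
 * in the array is pinned (the invariant holds).
 */
static inline int toy_first_unpinned_l1(const struct toy_page *l1_pages,
                                        size_t count)
{
    assert(l1_pages != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!l1_pages[i].pinned)
            return (int)i;
    }
    return -1;
}
```

A caller would run such a scan immediately before pinning the top-level table and log the offending index, so that the report appears at the moment the inconsistency is created rather than later at fault time.
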
> PGD 8ef001 PUD 8ef001 PMD 1268067 PTE 207061
This just tells us that the PT which maps the PTE we were trying to
write is mapped R/O, which is not as interesting as I thought it would
be.
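
(Editorial note: the read-only conclusion follows from the low flag bits of the printed entries. The sketch below tests the standard x86 present and read/write bits; the macro and function names are chosen for illustration only.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_PRESENT 0x001ULL  /* bit 0: entry is valid */
#define TOY_PTE_RW      0x002ULL  /* bit 1: writes are allowed */

static inline bool toy_pte_present(uint64_t pte)
{
    return (pte & TOY_PTE_PRESENT) != 0;
}

static inline bool toy_pte_writable(uint64_t pte)
{
    return (pte & TOY_PTE_RW) != 0;
}

/* Worked check with entries printed in the log. */
static inline void toy_check_logged_entries(void)
{
    /* "PTE 207061": present, but the read/write bit is clear. */
    assert(toy_pte_present(0x207061) && !toy_pte_writable(0x207061));
    /* "PMD 1268067": present and writable. */
    assert(toy_pte_present(0x1268067) && toy_pte_writable(0x1268067));
    /* The L1 entry Xen refused, 0000000062991063, asked for write access. */
    assert(toy_pte_writable(0x0000000062991063ULL));
}
```
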
> Fixmap KM_PTE0 @ 0xf57f0000
> PGD 8ef001 PUD 8ef001 PMD 207067 PTE 0
> Fixmap KM_PTE1 @ 0xf57ee000
> PGD 8ef001 PUD 8ef001 PMD 207067 PTE 0
So these guys are not at fault, although we are in the middle of filling
in KM_PTE0, I think.

I've just had another go at reproducing this with a xen-3.3-testing.hg
hypervisor (both 32 and 64 bit) with a 32 bit kernel and dom0_mem=1024M.
No luck...

Ian.

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f9b252c..538590a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -285,46 +285,12 @@ check_v8086_mode(struct pt_regs *regs, unsigned long address,
 		tsk->thread.screen_bitmap |= 1 << bit;
 }
 
-static void dump_pagetable(unsigned long address)
-{
-	__typeof__(pte_val(__pte(0))) page;
-
-	page = read_cr3();
-	page = ((__typeof__(page) *) __va(page))[address >> PGDIR_SHIFT];
-
 #ifdef CONFIG_X86_PAE
-	printk("*pdpt = %016Lx ", page);
-	if ((page >> PAGE_SHIFT) < max_low_pfn
-	    && page & _PAGE_PRESENT) {
-		page &= PAGE_MASK;
-		page = ((__typeof__(page) *) __va(page))[(address >> PMD_SHIFT)
-							& (PTRS_PER_PMD - 1)];
-		printk(KERN_CONT "*pde = %016Lx ", page);
-		page &= ~_PAGE_NX;
-	}
+#define FMTPTE "ll"
 #else
-	printk("*pde = %08lx ", page);
+#define FMTPTE "l"
 #endif
 
-	/*
-	 * We must not directly access the pte in the highpte
-	 * case if the page table is located in highmem.
-	 * And let's rather not kmap-atomic the pte, just in case
-	 * it's allocated already:
-	 */
-	if ((page >> PAGE_SHIFT) < max_low_pfn
-	    && (page & _PAGE_PRESENT)
-	    && !(page & _PAGE_PSE)) {
-
-		page &= PAGE_MASK;
-		page = ((__typeof__(page) *) __va(page))[(address >> PAGE_SHIFT)
-							& (PTRS_PER_PTE - 1)];
-		printk("*pte = %0*Lx ", sizeof(page)*2, (u64)page);
-	}
-
-	printk("\n");
-}
-
 #else /* CONFIG_X86_64: */
 
 void vmalloc_sync_all(void)
@@ -440,6 +406,10 @@ check_v8086_mode(struct pt_regs *regs, unsigned long address,
 {
 }
 
+#define FMTPTE "ll"
+
+#endif /* CONFIG_X86_64 */
+
 static int bad_address(void *p)
 {
 	unsigned long dummy;
@@ -447,7 +417,7 @@ static int bad_address(void *p)
 	return probe_kernel_address((unsigned long *)p, dummy);
 }
 
-static void dump_pagetable(unsigned long address)
+void dump_pagetable(unsigned long address)
 {
 	pgd_t *pgd;
 	pud_t *pud;
@@ -462,7 +432,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pgd))
 		goto bad;
 
-	printk("PGD %lx ", pgd_val(*pgd));
+	printk("PGD %"FMTPTE"x ", pgd_val(*pgd));
 
 	if (!pgd_present(*pgd))
 		goto out;
@@ -471,7 +441,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pud))
 		goto bad;
 
-	printk("PUD %lx ", pud_val(*pud));
+	printk("PUD %"FMTPTE"x ", pud_val(*pud));
 	if (!pud_present(*pud) || pud_large(*pud))
 		goto out;
 
@@ -479,7 +449,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pmd))
 		goto bad;
 
-	printk("PMD %lx ", pmd_val(*pmd));
+	printk("PMD %"FMTPTE"x ", pmd_val(*pmd));
 	if (!pmd_present(*pmd) || pmd_large(*pmd))
 		goto out;
 
@@ -487,7 +457,7 @@ static void dump_pagetable(unsigned long address)
 	if (bad_address(pte))
 		goto bad;
 
-	printk("PTE %lx", pte_val(*pte));
+	printk("PTE %"FMTPTE"x", pte_val(*pte));
 out:
 	printk("\n");
 	return;
@@ -495,8 +465,6 @@ bad:
 	printk("BAD\n");
 }
 
-#endif /* CONFIG_X86_64 */
-
 /*
  * Workaround for K8 erratum #93 & buggy BIOS.
  *
@@ -598,6 +566,10 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code,
 	printk_address(regs->ip, 1);
 
 	dump_pagetable(address);
+	printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+	dump_pagetable(fix_to_virt(KM_PTE0));
+	printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+	dump_pagetable(fix_to_virt(KM_PTE1));
 }
 
 static noinline void
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 1729178..2c427d3 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1015,13 +1015,34 @@ static int xen_pin_page(struct mm_struct *mm, struct page *page,
 	return flush;
 }
 
+static int xen_check_l1_pinned(pte_t *pte, unsigned long s, unsigned long e, struct mm_walk *walk)
+{
+	extern void dump_pagetable(unsigned long address);
+	struct page *pte_page = virt_to_page(pte);
+
+	if (!PagePinned(pte_page)) {
+		printk(KERN_CRIT "PTE @ %p is an L1 page %p covering %#lx-%#lx which is not pinned\n", pte, pte_page, s, e);
+		dump_pagetable((unsigned long)pte);
+		BUG();
+	}
+
+	return 0;
+}
+
 /* This is called just after a mm has been created, but it has not
    been used yet.  We need to make sure that its pagetable is all
    read-only, and can be pinned. */
 static void __xen_pgd_pin(struct mm_struct *mm, pgd_t *pgd)
 {
+	struct mm_walk xen_pin_walk = {
+		.pte_entry = &xen_check_l1_pinned,
+		.mm = mm,
+	};
+
 	vm_unmap_aliases();
 
+	walk_page_range(0xc0000000, FIXADDR_TOP, &xen_pin_walk);
+
 	xen_mc_batch();
 
 	if (__xen_pgd_walk(mm, pgd, xen_pin_page, USER_LIMIT)) {
diff --git a/init/main.c b/init/main.c
index 33ce929..baf4300 100644
--- a/init/main.c
+++ b/init/main.c
@@ -74,6 +74,8 @@
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
 
+#include <asm/xen/page.h>
+
 #ifdef CONFIG_X86_LOCAL_APIC
 #include <asm/smp.h>
 #endif
@@ -815,6 +817,54 @@ static noinline int init_post(void)
 	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
 
+	{
+		extern void dump_pagetable(unsigned long address);
+		struct page *pgd_page, *pte_page;
+		pgd_t *pgd;
+		pud_t *pud;
+		pmd_t *pmd;
+		phys_addr_t pte_phys;
+		unsigned long address = 0xc08ce011UL;//(unsigned long) __builtin_return_address(0);
+
+		pgd = pgd_offset(&init_mm, address);
+		if (!pgd_present(*pgd))
+			goto skip;
+
+		pud = pud_offset(pgd, address);
+		if (!pud_present(*pud))
+			goto skip;
+
+		pmd = pmd_offset(pud, address);
+		if (!pmd_present(*pmd))
+			goto skip;
+
+		pgd_page = virt_to_page(init_mm.pgd);
+		pte_page = pmd_page(*pmd);
+
+		pte_phys = page_to_phys(pte_page) + pte_index(address);
+		printk(KERN_CRIT "Test debug infrastructure on address %#lx:\n", address);
+		printk(KERN_CRIT "L4 at V:%p/P:%#llx/M:%#llx is %s and contains L2 at V:%p/P:%#llx/M:%#llx = %#llx "
+		       "which points to an L1 P:%#llx/M:%#llx which is %s %s\n",
+		       pgd, virt_to_phys(pgd), virt_to_machine(pgd).maddr,
+		       PagePinned(pgd_page) ? "pinned" : "unpinned",
+		       pmd, virt_to_phys(pmd), virt_to_machine(pmd).maddr,
+		       pmd_val(*pmd),
+		       pte_phys, phys_to_machine(XPADDR(pte_phys)).maddr,
+		       PagePinned(pte_page) ? "pinned" : "unpinned",
+		       PageHighMem(pte_page) ? "highmem" : "lowmem");
+		printk(KERN_CRIT "faulting address %#lx\n", address);
+		dump_pagetable(address);
+		if (!PageHighMem(pte_page)) {
+			printk(KERN_CRIT "lowmem mapping of L1 @ P:%#llx is at V:%p\n", pte_phys, phys_to_virt(page_to_phys(pte_page)));
+			dump_pagetable((unsigned long)phys_to_virt(page_to_phys(pte_page)));
+		}
+		printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+		dump_pagetable(fix_to_virt(KM_PTE0));
+		printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+		dump_pagetable(fix_to_virt(KM_PTE1));
+	}
+	skip:
+
 	if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)
 		printk(KERN_WARNING "Warning: unable to open an initial console.\n");
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 1652166..ced5650 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -52,6 +52,9 @@
 #include <linux/migrate.h>
 
 #include <asm/tlbflush.h>
+#include <asm/io.h>
+
+#include <asm/xen/page.h>
 
 #include "internal.h"
 
@@ -267,6 +270,7 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 			  unsigned long address, spinlock_t **ptlp, int sync)
 {
+	struct page *pgd_page, *pte_page;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -285,6 +289,32 @@ pte_t *page_check_address(struct page *page, struct mm_struct *mm,
 	if (!pmd_present(*pmd))
 		return NULL;
 
+	pgd_page = virt_to_page(mm->pgd);
+	pte_page = pmd_page(*pmd);
+
+	if (PagePinned(pgd_page) != PagePinned(pte_page)) {
+		extern void dump_pagetable(unsigned long address);
+		phys_addr_t pte_phys = page_to_phys(pte_page) + pte_index(address);
+		printk(KERN_CRIT "L4 at V:%p/P:%#llx/M:%#llx is %s and contains L2 at V:%p/P:%#llx/M:%#llx = %#llx "
+		       "which points to an L1 P:%#llx/M:%#llx which is %s %s\n",
+		       pgd, virt_to_phys(pgd), virt_to_machine(pgd).maddr,
+		       PagePinned(pgd_page) ? "pinned" : "unpinned",
+		       pmd, virt_to_phys(pmd), virt_to_machine(pmd).maddr,
+		       pmd_val(*pmd),
+		       pte_phys, phys_to_machine(XPADDR(pte_phys)).maddr,
+		       PagePinned(pte_page) ? "pinned" : "unpinned",
+		       PageHighMem(pte_page) ? "highmem" : "lowmem");
+		printk(KERN_CRIT "faulting address %#lx\n", address);
+		dump_pagetable(address);
+		if (!PageHighMem(pte_page)) {
+			printk(KERN_CRIT "lowmem mapping of L1 @ P:%#llx is at V:%p\n", pte_phys, phys_to_virt(page_to_phys(pte_page)));
+			dump_pagetable((unsigned long)phys_to_virt(page_to_phys(pte_page)));
+		}
+		printk(KERN_CRIT "Fixmap KM_PTE0 @ %#lx\n", fix_to_virt(KM_PTE0));
+		dump_pagetable(fix_to_virt(KM_PTE0));
+		printk(KERN_CRIT "Fixmap KM_PTE1 @ %#lx\n", fix_to_virt(KM_PTE1));
+		dump_pagetable(fix_to_virt(KM_PTE1));
+	}
 	pte = pte_offset_map(pmd, address);
 	/* Make a quick check before getting the lock */
 	if (!sync && !pte_present(*pte)) {



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Ian Campbell

2009-Jun-08 16:00 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, 2009-06-08 at 11:45 -0400, Ian Campbell wrote:
> > L4 at e1822000 is pinned contains L2 at e1977228 which points at an L1
> > which is unpinned low mem address 0x8bf8000
> 
> OK so I think that is interesting. A pinned L4 referencing an unpinned
> L1 isn't supposed to happen, I don't think (Jeremy?).

Interesting:

        pte_t *page_check_address(struct page *page, struct mm_struct *mm,
        [...]
        	pte = pte_offset_map(pmd, address); /* A */
        	/* Make a quick check before getting the lock */
        	if (!sync && !pte_present(*pte)) {
        		pte_unmap(pte);
        		return NULL;
        	}

        	ptl = pte_lockptr(mm, pmd);
        	spin_lock(ptl);
        [...]

So at point A we make a new mapping of a PTE without yet holding the
corresponding PTE lock and this is precisely the point at which things
start to go wrong for us... (coincidence? I think not ;-))

I wonder how this interacts with the logic in
arch/x86/xen/mmu.c:xen_pin_page(), which holds the lock while waiting for
the (deferred) pin multicall to occur? Hmm, no, this is about the
PagePinned flag on the struct page being out of date WRT the actual
pinned status as Xen sees it. We update the PagePinned flag early in
xen_pin_page(), long before Xen sees the pin hypercall, so this window is
the other way round to what would be needed to trigger this bug.

On the other hand xen_unpin_page() looks like it sets up something
roughly like what we need for this issue to trigger.
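
The two orderings can be modelled in user space. This is only a sketch
under the assumptions stated in this mail: the flag is updated before the
batched hypercall takes effect in both the pin and the unpin path, and a
lockless mapper decides between a writable and a read-only mapping from
the flag alone. `struct pt_page`, `model_pin`, `model_unpin` and
`racy_map_is_bad` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one page-table page as seen by Linux and by Xen. */
struct pt_page {
	bool flag_pinned;	/* PagePinned flag in struct page */
	bool xen_pinned;	/* pinned status as the hypervisor sees it */
};

enum step { BEFORE, FLAG_UPDATED, HYPERCALL_DONE };

/* Pin: the flag is set first, the deferred hypercall lands later. */
void model_pin(struct pt_page *p, enum step upto)
{
	assert(p);
	if (upto >= FLAG_UPDATED)
		p->flag_pinned = true;
	if (upto >= HYPERCALL_DONE)
		p->xen_pinned = true;
}

/* Unpin: the flag is cleared first, the deferred hypercall lands later. */
void model_unpin(struct pt_page *p, enum step upto)
{
	assert(p);
	if (upto >= FLAG_UPDATED)
		p->flag_pinned = false;
	if (upto >= HYPERCALL_DONE)
		p->xen_pinned = false;
}

/*
 * A lockless mapper trusts the flag. A writable mapping of a page that
 * Xen still treats as a pinned page table is the failing combination.
 */
bool racy_map_is_bad(const struct pt_page *p)
{
	bool maps_writable = !p->flag_pinned;

	return maps_writable && p->xen_pinned;
}
```

In this model the pin path never produces the bad combination, while the
unpin path does so between the flag update and the hypercall.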

Pasi, in addition to my other mad hack, could you try this:

diff --git a/mm/Kconfig b/mm/Kconfig
index a5b7781..5663548 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -166,6 +166,7 @@ config SPLIT_PTLOCK_CPUS
 	int
 	default "4096" if ARM && !CPU_CACHE_VIPT
 	default "4096" if PARISC && !PA20
+	default "4096" if XEN
 	default "4"

 #

Ian.

Pasi Kärkkäinen

2009-Jun-08 16:13 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, Jun 08, 2009 at 05:00:58PM +0100, Ian Campbell wrote:
> [...]
> Pasi in additional to my other mad hack could you try this:
> 

Ok.. do you want me to try first without this patch? Or should I cancel my
kernel compilation and apply this as well? :)

-- Pasi

Ian Campbell

2009-Jun-08 16:17 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, 2009-06-08 at 12:13 -0400, Pasi Kärkkäinen wrote:
> [...]
> Ok.. do you want me to try first without this patch? Or should I cancel my
> kernel compilation and apply this aswell? :)

Can you try the first patch first, then add this one, please?

Ian.

Pasi Kärkkäinen

2009-Jun-08 16:21 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, Jun 08, 2009 at 05:17:45PM +0100, Ian Campbell wrote:
> [...]
> Can you try the first patch first then add this one please.
> 

Ok. Will do.

I was already starting to feel like 'maybe my hardware is broken' but now
that code looks like it might be an actual bug :)

Let's see.

-- Pasi

Pasi Kärkkäinen

2009-Jun-08 17:05 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, Jun 08, 2009 at 07:21:46PM +0300, Pasi Kärkkäinen wrote:
> [...]

Crash with only the first patch applied:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-05-with-highpte-no-swap-with-debug3.txt

Now I'll try with the second one included as well..

-- Pasi

Pasi Kärkkäinen

2009-Jun-08 19:11 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, Jun 08, 2009 at 08:05:43PM +0300, Pasi Kärkkäinen wrote:
> [...]

And here's one with the second patch applied as well:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-06-with-highpte-no-swap-with-debug4.txt

Seems to be different.. Xen is not complaining anymore..

-- Pasi

Pasi Kärkkäinen

2009-Jun-09 14:53 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, Jun 08, 2009 at 10:11:41PM +0300, Pasi Kärkkäinen wrote:
> [...]

And here's one with only the second patch applied:
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-07-with-highpte-no-swap-with-debug5.txt

Now Xen is complaining again.. does that sound correct?

-- Pasi

Ian Campbell

2009-Jun-09 15:37 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Tue, 2009-06-09 at 10:53 -0400, Pasi Kärkkäinen wrote:
> 
> And here's one with only the second patch applied:
> http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-07-with-highpte-no-swap-with-debug5.txt
> 
> Now Xen is complaining again.. does that sound correct?

Well, it suggests my theory around pte locking and split pte locks may
be invalid... I guess even without split pte locks the call to
kmap_atomic_pte from page_check_address() is still outside
mm->page_table_lock and hence subject to the race.

Without redoing the core locking rules I'm not sure what we could do
about that. Perhaps as a workaround always doing kmap_atomic_pte as a
read-only mapping would be sufficient (it seems to be in this particular
call chain, which never writes the pte, but I didn't check them all and I
guess some of them must want to write).
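
The trade-off of that workaround can be sketched in user space. This is a
simplified model, assuming only what is said above: the mapping
protection is currently chosen from the PagePinned flag, Xen rejects a
writable mapping of a page it holds pinned, and some callers need to
write through the mapping. The enum and function names are hypothetical
stand-ins for PAGE_KERNEL / PAGE_KERNEL_RO and the surrounding logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PAGE_KERNEL and PAGE_KERNEL_RO. */
enum kmap_prot { KMAP_RW, KMAP_RO };

/* Current behaviour: the protection follows the PagePinned flag. */
enum kmap_prot prot_from_flag(bool flag_pinned)
{
	return flag_pinned ? KMAP_RO : KMAP_RW;
}

/* Workaround: ignore the flag and always map read-only. */
enum kmap_prot prot_always_ro(bool flag_pinned)
{
	(void)flag_pinned;
	return KMAP_RO;
}

/* Xen refuses a writable mapping of a page it holds as a pinned table. */
bool xen_accepts(enum kmap_prot prot, bool xen_pinned)
{
	return !(prot == KMAP_RW && xen_pinned);
}

/* A caller that writes through the mapping cannot use a read-only one. */
bool caller_ok(enum kmap_prot prot, bool caller_writes)
{
	return !(caller_writes && prot == KMAP_RO);
}
```

With a stale flag, `prot_from_flag` can yield a mapping Xen rejects;
`prot_always_ro` never does, but it breaks any caller that writes.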

Does this patch (without any of the others) make any difference to you?

--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1522,7 +1522,7 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
 {
 	pgprot_t prot = PAGE_KERNEL;
 
-	if (PagePinned(page))
+	if (1 || PagePinned(page))
 		prot = PAGE_KERNEL_RO;
 
 	if (0 && PageHighMem(page))


Ian.



Jeremy Fitzhardinge

2009-Jun-09 17:28 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

Ian Campbell wrote:
> I wonder how this interacts with the logic in
> arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> the (deferred) pin multicall to occur? Hmm, no this is about the
> PagePinned flag on the struct page which is out of date WRT the actual
> pinned status as Xen sees it -- we update the PagePinned flag early in
> xen_pin_page() long before Xen the pin hypercall so this window is the
> other way round to what would be needed to trigger this bug.
>   
Yes, it looks like you could get a bad mapping here.  An obvious fix
would be to defer clearing the pinned flag in the page struct until
after the hypercall has been issued.  That would make the racy
kmap_atomic_pte map RO, which would be fine unless it actually tries to
modify it (but I can't imagine it would do that unlocked).
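
A small user-space model of that reordering, offered as a sketch rather
than as the eventual kernel change: it compares clearing the flag before
versus after the unpin takes effect, and reports whether a lockless
mapper sampling the flag in between would create a writable mapping of a
page Xen still treats as pinned. `struct pt_state`, `enum unpin_order`
and `unpin_has_bad_window` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page-table page; not kernel code. */
struct pt_state {
	bool flag_pinned;	/* PagePinned flag, read by the lockless mapper */
	bool xen_pinned;	/* pinned status as the hypervisor sees it */
};

enum unpin_order { CLEAR_FLAG_FIRST, CLEAR_FLAG_LAST };

/*
 * Run an unpin in two steps and return true if a mapper that samples the
 * flag between the steps would map the page writable while Xen still
 * considers it pinned.
 */
bool unpin_has_bad_window(enum unpin_order order)
{
	struct pt_state p = { true, true };
	bool bad;

	/* Step 1 of the unpin. */
	if (order == CLEAR_FLAG_FIRST)
		p.flag_pinned = false;
	else
		p.xen_pinned = false;

	/* The mapper samples here: writable only if the flag says unpinned. */
	bad = !p.flag_pinned && p.xen_pinned;

	/* Step 2 of the unpin. */
	if (order == CLEAR_FLAG_FIRST)
		p.xen_pinned = false;
	else
		p.flag_pinned = false;

	/* Either ordering ends fully unpinned. */
	assert(!p.flag_pinned && !p.xen_pinned);
	return bad;
}
```

With the flag cleared last, the mapper in the window sees a pinned flag
and maps read-only, which matches the behaviour described above.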

    J

Pasi Kärkkäinen

2009-Jun-09 18:07 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Tue, Jun 09, 2009 at 04:37:52PM +0100, Ian Campbell
wrote:> On Tue, 2009-06-09 at 10:53 -0400, Pasi Kärkkäinen wrote:
> > 
> > 
> > And here''s one with only the second patch applied:
> >
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-07-with-highpte-no-swap-with-debug5.txt
> > 
> > Now Xen is complaining again.. does that sound correct?
> 
> Well, it suggests my theory around pte locking and split pte locks may
> be invalid... I guess even without split pte locks the call to
> kmap_atomic_pte from page_check_address() is still outside
> mm->page_table_lock and hence subject to the race.
> 
> Without redoing the core locking rules I'm not sure what we could do
> about that. Perhaps as a workaround always doing kmap_atomic_pte as a
> read only mapping would be sufficient (it seems to be in this particular
> call chain which never writes the pte but I didn't check them all and I
> guess some of them must want to write).
> 
> Does this patch (without any of the others) make any difference to you?
> 
Yeah, now the kernel crashes very early during system startup :)
http://pasik.reaktio.net/xen/pv_ops-dom0-debug/pv_ops-dom0-log-08-with-highpte-no-swap-with-debug6.txt

-- Pasi

> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1522,7 +1522,7 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  {
>  	pgprot_t prot = PAGE_KERNEL;
>  
> -	if (PagePinned(page))
> +	if (1 || PagePinned(page))
>  		prot = PAGE_KERNEL_RO;
>  
>  	if (0 && PageHighMem(page))
> 
> 
> Ian.
> 
> 

Ian Campbell

2009-Jun-11 09:02 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
> Ian Campbell wrote:
> > I wonder how this interacts with the logic in
> > arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> > the (deferred) pin multicall to occur? Hmm, no this is about the
> > PagePinned flag on the struct page which is out of date WRT the actual
> > pinned status as Xen sees it -- we update the PagePinned flag early in
> > xen_pin_page() long before Xen the pin hypercall so this window is the
> > other way round to what would be needed to trigger this bug.
> >   
> 
> Yes, it looks like you could get a bad mapping here.  An obvious fix 
> would be to defer clearing the pinned flag in the page struct until 
> after the hypercall has issued.  That would make the racy 
> kmap_atomic_pte map RO, which would be fine unless it actually tries to 
> modify it (but I can't imagine it would do that unlocked).

But would it redo the mapping after taking the lock? It doesn't look
like it does (why would it). So we could end up writing to an unpinned
pte via a R/O mapping.

As an experiment I tried the simple approach of flushing the multicalls
explicitly in xen_unpin_page and then clearing the Pinned bit and it all
goes a bit wrong. eip is "ptep->pte_low = 0" so I think the unpinned but
R/O theory holds...
        BUG: unable to handle kernel paging request at f57ab240
        IP: [<c0486f8b>] unmap_vmas+0x32e/0x5bb
        *pdpt = 00000001002d6001
        Oops: 0003 [#1] SMP
        last sysfs file:
        Modules linked in:
        
        Pid: 719, comm: init Not tainted (2.6.30-rc6-x86_32p-highpte-tip #15)
        EIP: 0061:[<c0486f8b>] EFLAGS: 00010202 CPU: 0
        EIP is at unmap_vmas+0x32e/0x5bb
        EAX: 1dcfb025 EBX: 00000001 ECX: f57ab240 EDX: 00000001
        ESI: 00000001 EDI: 08048000 EBP: e18d8d54 ESP: e18d8cd4
         DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
        Process init (pid: 719, ti=e18d8000 task=e24d8cc0 task.ti=e18d8000)
        Stack:
         00000002 e18ce000 e3204a50 00000000 00000000 e18ad058 e18d8d70 003ff000
         08048000 00000001 00000000 00000001 e25d3e00 08050000 e18ce000 e25d3e00
         f57ab240 00000000 00000000 c17f03ec 00000000 00000000 008d8d4c e18bf200
        Call Trace:
         [<c048a752>] ? exit_mmap+0x74/0xba
         [<c0436b92>] ? mmput+0x37/0x81
         [<c04a0e69>] ? flush_old_exec+0x3bc/0x635
         [<c04a0274>] ? kernel_read+0x34/0x46
         [<c04c7594>] ? load_elf_binary+0x329/0x1189
         [<c049c2da>] ? fsnotify_access+0x4f/0x5a

kmap_atomic_pte doesn't get passed the mm so there is no way to get at
the ptl we would need to do something like clearing the pinned flag
under the lock in xen_unpin_page and holding the lock in
xen_kmap_atomic_pte. (I don't know if that would be valid anyway under
the locking scheme).

The experimental patch:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 1729178..e997813 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1108,9 +1108,7 @@ static void __init xen_mark_init_mm_pinned(void)
 static int xen_unpin_page(struct mm_struct *mm, struct page *page,
 			  enum pt_level level)
 {
-	unsigned pgfl = TestClearPagePinned(page);
-
-	if (pgfl && !PageHighMem(page)) {
+	if (PagePinned(page) && !PageHighMem(page)) {
 		void *pt = lowmem_page_address(page);
 		unsigned long pfn = page_to_pfn(page);
 		spinlock_t *ptl = NULL;
@@ -1136,10 +1134,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
 					pfn_pte(pfn, PAGE_KERNEL),
 					level == PT_PGD ? UVMF_TLB_FLUSH : 0);
 
-		if (ptl) {
-			/* unlock when batch completed */
-			xen_mc_callback(xen_pte_unlock, ptl);
-		}
+		xen_mc_flush();
+
+		ClearPagePinned(page);
+
+		if (ptl)
+			xen_pte_unlock(ptl);
 	}
 
 	return 0;		/* never need to flush on unpin */
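
The two failure modes discussed so far can be replayed in a small userspace model. This is only a sketch under stated assumptions: `mock_page`, `replay_race()` and the enum names are hypothetical and exist nowhere in the kernel, and the model fixes one interleaving by hand instead of running real threads. It assumes that Xen refuses a writable mapping of a page it still treats as a pinned pagetable, and that a write through a read-only mapping faults (which is consistent with the "Oops: 0003" error code above, a kernel-mode write to a present page).

```c
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in the kernel. */
enum pte_map_prot { PTE_MAP_RO, PTE_MAP_RW };
enum outcome { OUTCOME_OK, XEN_REJECTS_RW_MAP, WRITE_FAULT };

struct mock_page {
	bool flag_pinned;	/* models PagePinned(page) in struct page */
	bool xen_pinned;	/* models what Xen currently believes */
};

/*
 * Replays one fixed interleaving of an unpin against a mapper that behaves
 * like xen_kmap_atomic_pte(): it picks the protection from the flag alone
 * and later writes through the mapping it already has.
 *
 * clear_flag_first: true models the existing code (flag cleared, unpin
 *                   hypercall still queued); false models the experiment
 *                   (hypercall flushed, flag cleared afterwards).
 * mapper_in_window: true if the mapper runs between the two halves of the
 *                   unpin, false if it runs after the unpin has completed.
 */
static enum outcome replay_race(bool clear_flag_first, bool mapper_in_window)
{
	struct mock_page pg = { .flag_pinned = true, .xen_pinned = true };
	enum pte_map_prot prot;

	/* First half of the unpin. */
	if (clear_flag_first)
		pg.flag_pinned = false;
	else
		pg.xen_pinned = false;

	if (!mapper_in_window) {
		/* Unpin finishes before the mapper gets to look at the flag. */
		pg.flag_pinned = false;
		pg.xen_pinned = false;
	}

	/* The mapper chooses RO or RW from the flag, not from Xen's state. */
	prot = pg.flag_pinned ? PTE_MAP_RO : PTE_MAP_RW;
	if (prot == PTE_MAP_RW && pg.xen_pinned)
		return XEN_REJECTS_RW_MAP;

	/* Second half of the unpin (a no-op if it already completed). */
	pg.flag_pinned = false;
	pg.xen_pinned = false;

	/* The mapper takes the pte lock and writes without remapping. */
	if (prot == PTE_MAP_RO)
		return WRITE_FAULT;

	return OUTCOME_OK;
}
```

In this model `replay_race(true, true)` gives the "Xen is complaining" case, `replay_race(false, true)` gives the write fault seen in the oops, and both orderings are harmless when the mapper runs outside the window.
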




Pasi Kärkkäinen

2009-Jun-11 09:14 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, Jun 11, 2009 at 10:02:18AM +0100, Ian Campbell wrote:
> On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
> > Ian Campbell wrote:
> > > I wonder how this interacts with the logic in
> > > arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> > > the (deferred) pin multicall to occur? Hmm, no this is about the
> > > PagePinned flag on the struct page which is out of date WRT the actual
> > > pinned status as Xen sees it -- we update the PagePinned flag early in
> > > xen_pin_page() long before Xen the pin hypercall so this window is the
> > > other way round to what would be needed to trigger this bug.
> > >
> >
> > Yes, it looks like you could get a bad mapping here.  An obvious fix
> > would be to defer clearing the pinned flag in the page struct until
> > after the hypercall has issued.  That would make the racy
> > kmap_atomic_pte map RO, which would be fine unless it actually tries to
> > modify it (but I can't imagine it would do that unlocked).
>
> But would it redo the mapping after taking the lock? It doesn't look
> like it does (why would it). So we could end up writing to an unpinned
> pte via a R/O mapping.
>
> As an experiment I tried the simple approach of flushing the multicalls
> explicitly in xen_unpin_page and then clearing the Pinned bit and it all
> goes a bit wrong. eip is "ptep->pte_low = 0" so I think the unpinned but
> R/O theory holds...
>         BUG: unable to handle kernel paging request at f57ab240
>         IP: [<c0486f8b>] unmap_vmas+0x32e/0x5bb
>         *pdpt = 00000001002d6001
>         Oops: 0003 [#1] SMP
>         last sysfs file:
>         Modules linked in:
>
>         Pid: 719, comm: init Not tainted (2.6.30-rc6-x86_32p-highpte-tip #15)
>         EIP: 0061:[<c0486f8b>] EFLAGS: 00010202 CPU: 0
>         EIP is at unmap_vmas+0x32e/0x5bb
>         EAX: 1dcfb025 EBX: 00000001 ECX: f57ab240 EDX: 00000001
>         ESI: 00000001 EDI: 08048000 EBP: e18d8d54 ESP: e18d8cd4
>          DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
>         Process init (pid: 719, ti=e18d8000 task=e24d8cc0 task.ti=e18d8000)
>         Stack:
>          00000002 e18ce000 e3204a50 00000000 00000000 e18ad058 e18d8d70 003ff000
>          08048000 00000001 00000000 00000001 e25d3e00 08050000 e18ce000 e25d3e00
>          f57ab240 00000000 00000000 c17f03ec 00000000 00000000 008d8d4c e18bf200
>         Call Trace:
>          [<c048a752>] ? exit_mmap+0x74/0xba
>          [<c0436b92>] ? mmput+0x37/0x81
>          [<c04a0e69>] ? flush_old_exec+0x3bc/0x635
>          [<c04a0274>] ? kernel_read+0x34/0x46
>          [<c04c7594>] ? load_elf_binary+0x329/0x1189
>          [<c049c2da>] ? fsnotify_access+0x4f/0x5a
>
> kmap_atomic_pte doesn't get passed the mm so there is no way to get at
> the ptl we would need to do something like clearing the pinned flag
> under the lock in xen_unpin_page and holding the lock in
> xen_kmap_atomic_pte. (I don't know if that would be valid anyway under
> the locking scheme).
> 
> The experimental patch:
> 

So should I try with only this patch applied? 

-- Pasi

> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 1729178..e997813 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1108,9 +1108,7 @@ static void __init xen_mark_init_mm_pinned(void)
>  static int xen_unpin_page(struct mm_struct *mm, struct page *page,
>  			  enum pt_level level)
>  {
> -	unsigned pgfl = TestClearPagePinned(page);
> -
> -	if (pgfl && !PageHighMem(page)) {
> +	if (PagePinned(page) && !PageHighMem(page)) {
>  		void *pt = lowmem_page_address(page);
>  		unsigned long pfn = page_to_pfn(page);
>  		spinlock_t *ptl = NULL;
> @@ -1136,10 +1134,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
>  					pfn_pte(pfn, PAGE_KERNEL),
>  					level == PT_PGD ? UVMF_TLB_FLUSH : 0);
>  
> -		if (ptl) {
> -			/* unlock when batch completed */
> -			xen_mc_callback(xen_pte_unlock, ptl);
> -		}
> +		xen_mc_flush();
> +
> +		ClearPagePinned(page);
> +
> +		if (ptl)
> +			xen_pte_unlock(ptl);
>  	}
>  
>  	return 0;		/* never need to flush on unpin */
> 
> 

Ian Campbell

2009-Jun-11 09:18 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, 2009-06-11 at 05:14 -0400, Pasi Kärkkäinen wrote:
> So should I try with only this patch applied?

No, it doesn't even work for me ;-) I'm just about to send another patch
I'd like you to try though...




Ian Campbell

2009-Jun-11 09:18 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

Pasi, to validate the theory that you are seeing races between unpinning
and kmap_atomic_pte can you give this biguglystick approach to solving
it a go.

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 1729178..beeb8e8 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1145,9 +1145,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
 	return 0;		/* never need to flush on unpin */
 }
 
+static DEFINE_SPINLOCK(hack_lock); /* Hack to sync unpin against kmap_atomic_pte */
+
 /* Release a pagetables pages back as normal RW */
 static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
 {
+	spin_lock(&hack_lock);
 	xen_mc_batch();
 
 	xen_do_pin(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
@@ -1173,6 +1176,7 @@ static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
 	__xen_pgd_walk(mm, pgd, xen_unpin_page, USER_LIMIT);
 
 	xen_mc_issue(0);
+	spin_unlock(&hack_lock);
 }
 
 static void xen_pgd_unpin(struct mm_struct *mm)
@@ -1521,6 +1525,9 @@ static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
 static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
 {
 	pgprot_t prot = PAGE_KERNEL;
+	void *ret;
+
+	spin_lock(&hack_lock);
 
 	if (PagePinned(page))
 		prot = PAGE_KERNEL_RO;
@@ -1530,7 +1537,11 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
 		       page_to_pfn(page), type,
 		       (unsigned long)pgprot_val(prot) & _PAGE_RW ? "WRITE" : "READ");
 
-	return kmap_atomic_prot(page, type, prot);
+	ret = kmap_atomic_prot(page, type, prot);
+
+	spin_unlock(&hack_lock);
+
+	return ret;
 }
 #endif
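
What the big lock buys can be shown with a single-threaded model. This is a hedged sketch, not kernel code: `mock_state`, `unpin_begin()`, `unpin_end()` and `try_map()` are hypothetical names, the spinlock is modelled as a plain boolean, and the model assumes the queued multicall batch really is issued before the lock is dropped. Under those assumptions a mapper can only read the flag when it agrees with Xen's view of the page.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model; names are illustrative only. */
struct mock_state {
	bool lock_held;		/* models hack_lock */
	bool flag_pinned;	/* models PagePinned(page) */
	bool xen_pinned;	/* models Xen's view of the page */
};

/* Models the start of __xen_pgd_unpin(): lock, clear flag, queue the unpin. */
static void unpin_begin(struct mock_state *s)
{
	s->lock_held = true;
	s->flag_pinned = false;	/* flag now ahead of Xen's state */
}

/* Models the batch being issued, followed by the unlock. */
static void unpin_end(struct mock_state *s)
{
	s->xen_pinned = false;	/* Xen catches up with the flag */
	s->lock_held = false;
}

/*
 * Models xen_kmap_atomic_pte() taking the same lock. Returns false if the
 * caller would still be spinning; otherwise reports whether the flag it
 * reads matches what Xen believes at that moment.
 */
static bool try_map(const struct mock_state *s, bool *flag_matches_xen)
{
	if (s->lock_held)
		return false;
	*flag_matches_xen = (s->flag_pinned == s->xen_pinned);
	return true;
}
```

The model only covers the moment the mapping is created. It says nothing about a mapping that was set up before the unpin and is written to afterwards, which is one reason to treat the lock as a diagnostic and not as a fix.
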
 




Jeremy Fitzhardinge

2009-Jun-11 15:18 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On 06/11/09 02:02, Ian Campbell wrote:
> On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
>
>> Ian Campbell wrote:
>>
>>> I wonder how this interacts with the logic in
>>> arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
>>> the (deferred) pin multicall to occur? Hmm, no this is about the
>>> PagePinned flag on the struct page which is out of date WRT the actual
>>> pinned status as Xen sees it -- we update the PagePinned flag early in
>>> xen_pin_page() long before Xen the pin hypercall so this window is the
>>> other way round to what would be needed to trigger this bug.
>>>
>> Yes, it looks like you could get a bad mapping here.  An obvious fix
>> would be to defer clearing the pinned flag in the page struct until
>> after the hypercall has issued.  That would make the racy
>> kmap_atomic_pte map RO, which would be fine unless it actually tries to
>> modify it (but I can't imagine it would do that unlocked).
>>
>
> But would it redo the mapping after taking the lock? It doesn't look
> like it does (why would it). So we could end up writing to an unpinned
> pte via a R/O mapping.
>
Hm, yep.  One thing I noticed is that set_pte() is used very rarely, so
it would be no cost to always use a hypercall in that case.  But
xen_set_pte_at() ends up calling xen_set_pte() as well, and I think
that's more common.  Certainly we need to make sure that we're actually
taking advantage of late-pin by directly writing unpinned ptes.

I've been thinking of rearranging the set_pte(_at) pvops a little bit
anyway; it's not obvious we're really getting much benefit from using the
update_va_mapping hypercall, and if we're not using it, then the
set_pte_at pvop is taking a lot of unused parameters.

If we switch to just using mmu_update, then we can just pass the address
and pte value.  But we could also pass the struct page * (which makes a
bit of conceptual sense), so we could easily test directly whether the pte
is pinned, and either use a direct write or hypercall accordingly.
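
The last idea, choosing between a direct write and a hypercall based on the page backing the pagetable, could look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: `mock_page`, `mock_mmu_update()` and `sketch_set_pte()` are hypothetical names, the "hypercall" is just a counted store, and the real pvop signature is not specified anywhere in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
typedef uint64_t mock_pteval_t;

struct mock_page {
	bool pinned;		/* models PagePinned(page) */
};

/* Counters so the path taken can be observed from a caller. */
static unsigned int direct_writes;
static unsigned int hypercalls;

/* Models an mmu_update-style hypercall: the hypervisor performs the store. */
static void mock_mmu_update(mock_pteval_t *ptep, mock_pteval_t val)
{
	*ptep = val;
	hypercalls++;
}

/*
 * Sketch of a set_pte variant that is handed the page holding the
 * pagetable: pinned pagetables are updated via the (mock) hypercall,
 * unpinned ones are written directly.
 */
static void sketch_set_pte(const struct mock_page *pt_page,
			   mock_pteval_t *ptep, mock_pteval_t val)
{
	if (pt_page->pinned) {
		mock_mmu_update(ptep, val);
	} else {
		*ptep = val;
		direct_writes++;
	}
}
```

Note that the flag test here is subject to the same staleness as before, so on its own this does not close the race; it only shows the dispatch that passing the struct page would make possible.
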
> As an experiment I tried the simple approach of flushing the multicalls
> explicitly in xen_unpin_page and then clearing the Pinned bit and it all
> goes a bit wrong. eip is "ptep->pte_low = 0" so I think the unpinned but
> R/O theory holds...
>
Yes, I think the theory is sound.  But I'm curious why Pasi seems to be
able to hit the race easily, but we have not...

     J


Pasi Kärkkäinen

2009-Jun-11 17:24 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, Jun 11, 2009 at 08:18:15AM -0700, Jeremy Fitzhardinge wrote:
> On 06/11/09 02:02, Ian Campbell wrote:
> >On Tue, 2009-06-09 at 13:28 -0400, Jeremy Fitzhardinge wrote:
> >
> >>Ian Campbell wrote:
> >>
> >>>I wonder how this interacts with the logic in
> >>>arch/x86/xen/mmu.c:xen_pin_page() which holds the lock while waiting for
> >>>the (deferred) pin multicall to occur? Hmm, no this is about the
> >>>PagePinned flag on the struct page which is out of date WRT the actual
> >>>pinned status as Xen sees it -- we update the PagePinned flag early in
> >>>xen_pin_page() long before Xen the pin hypercall so this window is the
> >>>other way round to what would be needed to trigger this bug.
> >>>
> >>Yes, it looks like you could get a bad mapping here.  An obvious fix
> >>would be to defer clearing the pinned flag in the page struct until
> >>after the hypercall has issued.  That would make the racy
> >>kmap_atomic_pte map RO, which would be fine unless it actually tries to
> >>modify it (but I can't imagine it would do that unlocked).
> >>
> >
> >But would it redo the mapping after taking the lock? It doesn't look
> >like it does (why would it). So we could end up writing to an unpinned
> >pte via a R/O mapping.
> >
>
> Hm, yep.  One thing I noticed is that set_pte() is used very rarely, so
> it would be no cost to always use a hypercall in that case.  But
> xen_set_pte_at() ends up calling xen_set_pte() as well, and I think
> that's more common.  Certainly we need to make sure that we're actually
> taking advantage of late-pin by directly writing unpinned ptes.
>
> I've been thinking of rearranging the set_pte(_at) pvops a little bit
> anyway; it's not obvious we're really getting much benefit from using the
> update_va_mapping hypercall, and if we're not using it, then the
> set_pte_at pvop is taking a lot of unused parameters.
>
> If we switch to just using mmu_update, then we can just pass the address
> and pte value.  But we could also pass the struct page * (which makes a
> bit of conceptual sense), so we could easily test directly whether the pte
> is pinned, and either use a direct write or hypercall accordingly.
>
> >As an experiment I tried the simple approach of flushing the multicalls
> >explicitly in xen_unpin_page and then clearing the Pinned bit and it all
> >goes a bit wrong. eip is "ptep->pte_low = 0" so I think the unpinned but
> >R/O theory holds...
> >
>
> Yes, I think the theory is sound.  But I'm curious why Pasi seems to be
> able to hit the race easily, but we have not...
>
Yeah, I've been thinking about that too..

My hardware is ~5 years old, but it has been running stable with multiple
distributions and kernel versions, on various types of loads. I think the
hardware should be all fine.

Atm I've been running Fedora 10 and Fedora 11 on it, both seem stable with
the distro-provided kernels.

ie. I'm only seeing the problem on pv_ops dom0 kernel.

My installation is pretty basic/standard.. root-fs on LVM-volume. Can't
really think of anything special..

And the problem seems to be _always_ reproducible with a simple
"make clean && make bzImage && make modules" command on dom0 ..

Anyway, I'll continue testing. Hopefully we get this hunted down :)

-- Pasi


Pasi Kärkkäinen

2009-Jun-11 18:27 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, Jun 11, 2009 at 10:18:34AM +0100, Ian Campbell wrote:
> Pasi, to validate the theory that you are seeing races between unpinning
> and kmap_atomic_pte can you give this biguglystick approach to solving
> it a go.
>
Guess what..

Now my dom0 didn't crash !! (with only this patch applied).
It survived kernel compilation just fine.. first time so far with pv_ops dom0.

I'll try again, just in case.

-- Pasi

Jeremy Fitzhardinge

2009-Jun-11 18:56 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On 06/11/09 10:24, Pasi Kärkkäinen wrote:
> Atm I've been running Fedora 10 and Fedora 11 on it, both seem stable with
> the distro-provided kernels.
>

But not Xen?

> ie. I'm only seeing the problem on pv_ops dom0 kernel.
>
> My installation is pretty basic/standard.. root-fs on LVM-volume. Can't
> really think of anything special..
>

Any odd (=not ext3) filesystems?

     J


Pasi Kärkkäinen

2009-Jun-11 19:02 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, Jun 11, 2009 at 11:56:56AM -0700, Jeremy Fitzhardinge wrote:
> On 06/11/09 10:24, Pasi Kärkkäinen wrote:
> >Atm I've been running Fedora 10 and Fedora 11 on it, both seem stable with
> >the distro-provided kernels.
> >
> But not Xen?
>
I've also been running Xen of CentOS 5.x, and it has been stable as well..

> >ie. I'm only seeing the problem on pv_ops dom0 kernel.
> >
> >My installation is pretty basic/standard.. root-fs on LVM-volume. Can't
> >really think of anything special..
> >
>
> Any odd (=not ext3) filesystems?
>
Only ext3 in use..

But now with the latest "biguglystick" patch from Ian I was able to
successfully run my kernel-compilation test.. (see the other mail).

so it looks like the problem was found!

-- Pasi


Jeremy Fitzhardinge

2009-Jun-11 19:23 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On 06/11/09 12:02, Pasi Kärkkäinen wrote:
> But now with the latest "biguglystick" patch from Ian I was able to
> successfully run my kernel-compilation test.. (see the other mail).
>
> so it looks like the problem was found!
>
Yes.  Still mulling a proper fix though.

     J


Pasi Kärkkäinen

2009-Jun-11 19:34 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, Jun 11, 2009 at 09:27:09PM +0300, Pasi Kärkkäinen wrote:
> On Thu, Jun 11, 2009 at 10:18:34AM +0100, Ian Campbell wrote:
> > Pasi, to validate the theory that you are seeing races between unpinning
> > and kmap_atomic_pte can you give this biguglystick approach to solving
> > it a go.
> >
>
> Guess what..
>
> Now my dom0 didn't crash !! (with only this patch applied).
> It survived kernel compilation just fine.. first time so far with pv_ops dom0.
>
> I'll try again, just in case.
>
Yep, I tried again, and it still worked.

No crashes anymore with this patch :) Congratulations and thanks!

-- Pasi

Ian Campbell

2009-Jun-15 10:03 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, 2009-06-11 at 15:34 -0400, Pasi Kärkkäinen wrote:
> On Thu, Jun 11, 2009 at 09:27:09PM +0300, Pasi Kärkkäinen wrote:
> > On Thu, Jun 11, 2009 at 10:18:34AM +0100, Ian Campbell wrote:
> > > Pasi, to validate the theory that you are seeing races between unpinning
> > > and kmap_atomic_pte can you give this biguglystick approach to solving
> > > it a go.
> > >
> >
> > Guess what..
> >
> > Now my dom0 didn't crash !! (with only this patch applied).
> > It survived kernel compilation just fine.. first time so far with pv_ops dom0.
> >
> > I'll try again, just in case.
> >
>
> Yep, I tried again, and it still worked.
>
> No crashes anymore with this patch :) Congratulations and thanks!

Oh good, thanks for testing. The patch is not really a suitable
long-term fix as it is but it sounds like Jeremy has some ideas.

I'm still curious how come you are the only one who sees this issue.  I
don't recall you having lots of processors in your dmesg which might
make the race more common, nor do you have involuntary preempt enabled.
Very strange. Oh well I guess it doesn't matter now ;-)

Ian.



Pasi Kärkkäinen

2009-Jun-15 10:21 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, Jun 15, 2009 at 11:03:17AM +0100, Ian Campbell wrote:
> On Thu, 2009-06-11 at 15:34 -0400, Pasi Kärkkäinen wrote:
> > On Thu, Jun 11, 2009 at 09:27:09PM +0300, Pasi Kärkkäinen wrote:
> > > On Thu, Jun 11, 2009 at 10:18:34AM +0100, Ian Campbell wrote:
> > > > Pasi, to validate the theory that you are seeing races between unpinning
> > > > and kmap_atomic_pte can you give this biguglystick approach to solving
> > > > it a go.
> > > >
> > >
> > > Guess what..
> > >
> > > Now my dom0 didn't crash !! (with only this patch applied).
> > > It survived kernel compilation just fine.. first time so far with pv_ops dom0.
> > >
> > > I'll try again, just in case.
> > >
> >
> > Yep, I tried again, and it still worked.
> >
> > No crashes anymore with this patch :) Congratulations and thanks!
>
> Oh good, thanks for testing. The patch is not really a suitable
> long-term fix as it is but it sounds like Jeremy has some ideas.
>
Yep. I'm only able to test patches until Thursday this week, after that
I'll be on summer vacation for a month and I don't know yet how much I'm
able to test patches during that period..

> I'm still curious how come you are the only one who sees this issue.  I
> don't recall you having lots of processors in your dmesg which might
> make the race more common, nor do you have involuntary preempt enabled.
> Very strange. Oh well I guess it doesn't matter now ;-)
>
(XEN) Initializing CPU#0
(XEN) Detected 3000.241 MHz processor.
(XEN) CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 04

(XEN) Initializing CPU#1
(XEN) CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 04
(XEN) Total of 2 processors activated.

It's old Intel P4 CPU with hyperthreading, so one physical CPU, seen as two
logical CPUs.

dom0 kernel/domain is seeing both CPUs:

SMP: Allowing 2 CPUs, 0 hotplug CPUs
Initializing CPU#0
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
Initializing CPU#1
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
Brought up 2 CPUs


But yeah, if the reason for the problem looks valid, I guess it doesn't
really matter then :)

-- Pasi
> Ian.
> 
> 
> > 
> > -- Pasi
> > 
> > > 
> > > > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > > > index 1729178..beeb8e8 100644
> > > > --- a/arch/x86/xen/mmu.c
> > > > +++ b/arch/x86/xen/mmu.c
> > > > @@ -1145,9 +1145,12 @@ static int xen_unpin_page(struct
mm_struct *mm, struct page *page,
> > > >  	return 0;		/* never need to flush on unpin */
> > > >  }
> > > >  
> > > > +static DEFINE_SPINLOCK(hack_lock); /* Hack to sync unpin
against kmap_atomic_pte */
> > > > +
> > > >  /* Release a pagetables pages back as normal RW */
> > > >  static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t
*pgd)
> > > >  {
> > > > +	spin_lock(&hack_lock);
> > > >  	xen_mc_batch();
> > > >  
> > > >  	xen_do_pin(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
> > > > @@ -1173,6 +1176,7 @@ static void __xen_pgd_unpin(struct
mm_struct *mm, pgd_t *pgd)
> > > >  	__xen_pgd_walk(mm, pgd, xen_unpin_page, USER_LIMIT);
> > > >  
> > > >  	xen_mc_issue(0);
> > > > +	spin_unlock(&hack_lock);
> > > >  }
> > > >  
> > > >  static void xen_pgd_unpin(struct mm_struct *mm)
> > > > @@ -1521,6 +1525,9 @@ static void xen_pgd_free(struct
mm_struct *mm, pgd_t *pgd)
> > > >  static void *xen_kmap_atomic_pte(struct page *page, enum
km_type type)
> > > >  {
> > > >  	pgprot_t prot = PAGE_KERNEL;
> > > > +	void *ret;
> > > > +
> > > > +	spin_lock(&hack_lock);
> > > >  
> > > >  	if (PagePinned(page))
> > > >  		prot = PAGE_KERNEL_RO;
> > > > @@ -1530,7 +1537,11 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
> > > >  		       page_to_pfn(page), type,
> > > >  		       (unsigned long)pgprot_val(prot) & _PAGE_RW ? "WRITE" : "READ");
> > > >  
> > > > -	return kmap_atomic_prot(page, type, prot);
> > > > +	ret = kmap_atomic_prot(page, type, prot);
> > > > +
> > > > +	spin_unlock(&hack_lock);
> > > > +
> > > > +	return ret;
> > > >  }
> > > >  #endif
> > > >  
> > > > 
> > > > 
> > > 
> 

Ian Campbell

2009-Jun-16 10:35 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Mon, 2009-06-15 at 06:21 -0400, Pasi Kärkkäinen wrote:
> 
> It's old Intel P4 CPU with hyperthreading, so one physical CPU, seen
> as two logical CPUs.

Interesting in the light of a comment from Ingo Molnar this morning on
LKML:

> Plus this system is an old P4 HyperThreading dual-socket system:
> pretty much the only thing HyperThreading is good for on that box is
> finding SMP races: that CPU can (and will) yield between
> hyperthreads on arbitrary instruction boundaries - opening up races
> wide open.

Ian.



Pasi Kärkkäinen

2009-Jun-16 10:56 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Tue, Jun 16, 2009 at 11:35:25AM +0100, Ian Campbell wrote:
> On Mon, 2009-06-15 at 06:21 -0400, Pasi Kärkkäinen wrote:
> > 
> > It's old Intel P4 CPU with hyperthreading, so one physical CPU, seen
> > as two logical CPUs.
> 
> Interesting in the light of a comment from Ingo Molnar this morning on
> LKML:
> > Plus this system is an old P4 HyperThreading dual-socket system: 
> > pretty much the only thing HyperThreading is good for on that box is 
> > finding SMP races: that CPU can (and will) yield between 
> > hyperthreads on arbitrary instruction boundaries - opening up races 
> > wide open.
> 
I knew this box was good for something! :)

-- Pasi


Jeremy Fitzhardinge

2009-Jun-16 19:31 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On 06/16/09 03:35, Ian Campbell wrote:
> Interesting in the light of a comment from Ingo Molnar this morning on
> LKML:
>    
>> Plus this system is an old P4 HyperThreading dual-socket system:
>> pretty much the only thing HyperThreading is good for on that box is
>> finding SMP races: that CPU can (and will) yield between
>> hyperthreads on arbitrary instruction boundaries - opening up races
>> wide open.
>>      
Yes, I was wondering if HT could be a factor here.  My test server is 
HT, but I run it 64-bit...

     J


Pasi Kärkkäinen

2009-Jun-29 21:16 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On Thu, Jun 11, 2009 at 12:23:52PM -0700, Jeremy Fitzhardinge wrote:
> On 06/11/09 12:02, Pasi Kärkkäinen wrote:
> >But now with the latest "biguglystick" patch from Ian I was able to
> >successfully run my kernel-compilation test.. (see the other mail).
> >
> >so it looks like the problem was found!
> >   
> 
> Yes.  Still mulling a proper fix though.
> 
When there's something to test, let me know..

-- Pasi


Dulloor

2009-Jun-29 21:23 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

Please check the other thread titled "pvops bug". I ran into issues without
HT (on a core-2 quad machine). My crude fix and Ian's crude patch, both of
which serialize the unpin pte call, avoid the crash. I can look into it once
I find time. If you have any clue, I can try earlier.

-dulloor

On Tue, Jun 16, 2009 at 3:31 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 06/16/09 03:35, Ian Campbell wrote:
>
>> Interesting in the light of a comment from Ingo Molnar this morning on
>> LKML:
>>
>>
>>> Plus this system is an old P4 HyperThreading dual-socket system:
>>> pretty much the only thing HyperThreading is good for on that box is
>>> finding SMP races: that CPU can (and will) yield between
>>> hyperthreads on arbitrary instruction boundaries - opening up races
>>> wide open.
>>>
>>>
>>
> Yes, I was wondering if HT could be a factor here.  My test server is HT,
> but I run it 64-bit...
>
>
>    J
>

Jeremy Fitzhardinge

2009-Jul-22 18:16 UTC

Re: [Xen-devel] xen.git branch reorg / success with 2.6.30-rc3 pv_ops dom0

On 06/11/09 02:18, Ian Campbell wrote:
> Pasi, to validate the theory that you are seeing races between unpinning
> and kmap_atomic_pte can you give this biguglystick approach to solving
> it a go.
>   
I gave up trying to solve this in any kind of clever way and just
decided to go with a slightly cleaned-up version of this patch.
Unfortunately, I don't think it actually solves the problem because it
doesn't prevent unpin from happening while the page is still kmapped; it
just ends up using the spinlock as a barrier to move the race around to
some timing which presumably mostly avoids the problem.

In principle the fix is to take the lock in xen_kmap_atomic and release
it in xen_kunmap_atomic.  Unfortunately this is pretty ugly and complex
because kmaps are 1) inherently per-cpu, and 2) there can be 2 levels of
kmapping at once.  This means that we'd need 2 per-cpu locks to allow us
to hold these locks for the mapping duration without introducing
deadlocks.  However unpin (and pin, in principle) need to take *all*
these locks to exclude kmap on all cpus.
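
The shape of that scheme can be modelled in userspace. This is only a sketch, not kernel code: it assumes 4 CPUs, represents each per-cpu, per-level lock as a C11 atomic_bool, and uses try-lock semantics in place of real spinning so the behaviour can be followed single-threaded. Every sketch_* name and SKETCH_* constant is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS  4	/* hypothetical CPU count */
#define SKETCH_NR_SLOTS 2	/* the two levels of kmapping */

/* One lock per (cpu, kmap level); static storage starts out unlocked. */
static atomic_bool sketch_slot_lock[SKETCH_NR_CPUS][SKETCH_NR_SLOTS];

/* kmap side: take only this CPU's lock for this level, hold it while mapped. */
static bool sketch_kmap(int cpu, int slot)
{
	return !atomic_exchange(&sketch_slot_lock[cpu][slot], true);
}

/* kunmap side: drop the lock taken by sketch_kmap(). */
static void sketch_kunmap(int cpu, int slot)
{
	atomic_store(&sketch_slot_lock[cpu][slot], false);
}

/*
 * unpin side: must own *every* lock, taken in a fixed order.
 * Returns false, holding nothing, if any CPU still has a mapping.
 */
static bool sketch_try_unpin(void)
{
	int n = SKETCH_NR_CPUS * SKETCH_NR_SLOTS;
	int taken;
	bool ok;

	for (taken = 0; taken < n; taken++) {
		atomic_bool *l = &sketch_slot_lock[taken / SKETCH_NR_SLOTS]
						  [taken % SKETCH_NR_SLOTS];
		if (atomic_exchange(l, true))
			break;	/* someone has this slot kmapped */
	}
	ok = (taken == n);

	/* ... the actual unpin would happen here when ok ... */

	/* Release whatever we managed to take, in reverse order. */
	while (taken-- > 0)
		atomic_store(&sketch_slot_lock[taken / SKETCH_NR_SLOTS]
					      [taken % SKETCH_NR_SLOTS], false);
	return ok;
}
```

The cost is visible in sketch_try_unpin(): every unpin touches all of the locks, no matter which pagetable it is working on.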

This is total overkill, since we only really care about excluding kmap
and unpin from a given pagetable, which suggests that the locks should
be part of the mm rather than global.  Unfortunately k(un)map_atomic_pte
don't get the mm of the pagetable they're trying to pin, and I don't
think we can assume it's the current mm.

Another (pretty hideous) approach might be to make unpin check the state
of the KMAP_PTE[01] slots in each CPU's kmap fixmaps and see if a
mapping exists for a page that it's currently unpinning.  This also has
the problem of being inherently racy; if we unpin the page, there's
going to be a little window after the unpin and before the kmap pte
update (even if they're back-to-back batched hypercalls).

I guess we could have a global rw spinlock; kmap/unmap takes it for
reading, and so can be concurrent with all other kmaps, but pin/unpin
takes it for writing to exclude them.  That would work, but runs the
risk of pin/unpin being livelocked out (I don't think rwspins will
block new readers if there's a pending writer).  Ugly, but it's the only
thing I can think of which actually solves the problem.
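
A toy userspace model of that reader/writer arrangement, as a sketch only. It assumes a hand-rolled C11-atomics lock rather than the kernel's rwlock_t, and uses try-lock calls where real code would spin. The toy_* and sketch_* names are hypothetical, and sketch_kunmap_pte() stands in for an unmap hook that would have to exist for this to work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* state > 0: number of readers; state == -1: one writer; 0: free. */
typedef struct { atomic_int state; } toy_rwlock;

static bool toy_read_trylock(toy_rwlock *l)
{
	int s = atomic_load(&l->state);

	/* Readers get in whenever no writer holds the lock; a *waiting*
	 * writer is not visible here, which is the livelock risk. */
	while (s >= 0) {
		if (atomic_compare_exchange_weak(&l->state, &s, s + 1))
			return true;
	}
	return false;
}

static void toy_read_unlock(toy_rwlock *l)
{
	assert(atomic_load(&l->state) > 0);
	atomic_fetch_sub(&l->state, 1);
}

static bool toy_write_trylock(toy_rwlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(toy_rwlock *l)
{
	atomic_store(&l->state, 0);
}

static toy_rwlock sketch_pte_lock;	/* global, starts out free */

/* kmap side: read-hold the lock for the whole lifetime of the mapping. */
static bool sketch_kmap_pte(void)   { return toy_read_trylock(&sketch_pte_lock); }
static void sketch_kunmap_pte(void) { toy_read_unlock(&sketch_pte_lock); }

/* pin/unpin side: needs the lock exclusively, so no pte page is mapped. */
static bool sketch_try_unpin(void)
{
	if (!toy_write_trylock(&sketch_pte_lock))
		return false;
	/* ... unpin would happen here ... */
	toy_write_unlock(&sketch_pte_lock);
	return true;
}
```

With two mappings outstanding, sketch_try_unpin() keeps failing until both have been unmapped; if readers keep arriving, the writer never gets a turn.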

Oh, crap, we don't have a kunmap_atomic_pte hook.

Thoughts?  Am I overthinking this and missing something obvious?

(PS: avoid CONFIG_HIGHPTE)

    J
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 1729178..beeb8e8 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1145,9 +1145,12 @@ static int xen_unpin_page(struct mm_struct *mm, struct page *page,
>  	return 0;		/* never need to flush on unpin */
>  }
>  
> +static DEFINE_SPINLOCK(hack_lock); /* Hack to sync unpin against kmap_atomic_pte */
> +
>  /* Release a pagetables pages back as normal RW */
>  static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
>  {
> +	spin_lock(&hack_lock);
>  	xen_mc_batch();
>  
>  	xen_do_pin(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
> @@ -1173,6 +1176,7 @@ static void __xen_pgd_unpin(struct mm_struct *mm, pgd_t *pgd)
>  	__xen_pgd_walk(mm, pgd, xen_unpin_page, USER_LIMIT);
>  
>  	xen_mc_issue(0);
> +	spin_unlock(&hack_lock);
>  }
>  
>  static void xen_pgd_unpin(struct mm_struct *mm)
> @@ -1521,6 +1525,9 @@ static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
>  static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  {
>  	pgprot_t prot = PAGE_KERNEL;
> +	void *ret;
> +
> +	spin_lock(&hack_lock);
>  
>  	if (PagePinned(page))
>  		prot = PAGE_KERNEL_RO;
> @@ -1530,7 +1537,11 @@ static void *xen_kmap_atomic_pte(struct page *page, enum km_type type)
>  		       page_to_pfn(page), type,
>  		       (unsigned long)pgprot_val(prot) & _PAGE_RW ? "WRITE" : "READ");
>  
> -	return kmap_atomic_prot(page, type, prot);
> +	ret = kmap_atomic_prot(page, type, prot);
> +
> +	spin_unlock(&hack_lock);
> +
> +	return ret;
>  }
>  #endif
>  
>
>
>   
