Dear Konrad,

just to let you know: I am using your patches[1] on my notebook (Thinkpad
T61p) and they are working perfectly fine for me. I encountered three issues
which I could solve:
* Machine crashes some time after wakeup with "BUG: unable to handle kernel
  NULL pointer dereference at (null)". The crashing process was sshd as I
  am forwarding my window manager from a DomU to X with nouveau running on
  Dom0 with sdm.
  I fixed that by setting all interrupts in the BIOS to "auto-select"
  instead of the fixed default of "IRQ11". Since then I had no more crashes.
* The DomUs do not resync their clock after Dom0 waking up. They
  basically continue to count the time as if the sleep never happened.
  I have to run 'ntpdate' on resume on all the DomUs. I am not sure if
  there are any side effects of this; probably there is a more simple way
  to tell a DomU to reread clock from Dom0?
* vbetool hangs at 100% CPU on resume (i/o waiting, I guess, because
  neither strace nor ltrace do show any activity). Simply killing vbetool
  (no -9) kind of "fixes" the issue. Probably I do not even need to run
  vbetool on resume.

Anyways, thank you very much for your efforts in bringing decent Dom0
support to upstream kernel! Your patches applied cleanly to the
Debian/testing package linux-image-3.0.0-1-amd64 (3.0.0-3) and work just
fine!

best regards,
    Adi Kriegisch

[1] http://lists.xensource.com/archives/html/xen-devel/2011-08/msg01358.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
>>> On 14.09.11 at 10:11, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
> * The DomUs do not resync their clock after Dom0 waking up. They
>   basically continue to count the time as if the sleep never happened.
>   I have to run 'ntpdate' on resume on all the DomUs. I am not sure if
>   there are any side effects of this; probably there is a more simple way
>   to tell a DomU to reread clock from Dom0?

This is a more fundamental problem - upstream pv-ops doesn't make
use of XENFP_settime (or its bogus alias DOM_SETTIME) at all; only
Jeremy's 2.6.32.x tree has this so far.

Jan
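The workaround described above (running 'ntpdate' in every DomU after Dom0 resumes) could be scripted from Dom0. What follows is only a minimal sketch and not something posted in this thread: it assumes each DomU is reachable over ssh as root and has ntpdate installed. The function names, host names and NTP server are hypothetical placeholders.

```python
import subprocess
from typing import Dict, List


def build_resync_commands(domus: List[str], ntp_server: str) -> List[List[str]]:
    """Return one argv list per DomU that runs ntpdate there over ssh.

    Assumption: root login via ssh works for every host in `domus`.
    """
    return [["ssh", f"root@{host}", "ntpdate", ntp_server] for host in domus]


def resync_all(domus: List[str], ntp_server: str) -> Dict[str, int]:
    """Run the resync command on every DomU and collect the exit codes."""
    results = {}
    for host, argv in zip(domus, build_resync_commands(domus, ntp_server)):
        # check=False: a failing DomU should not stop the remaining ones.
        results[host] = subprocess.run(argv, check=False).returncode
    return results
```

Something like resync_all(["desktop-domu"], "pool.ntp.org") could then be called from whatever resume hook the distribution provides. Which hook that is depends on the setup and is not covered here. This only automates the workaround; it does not address the missing settime support Jan describes.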
On Wed, Sep 14, 2011 at 10:11:56AM +0200, Adi Kriegisch wrote:
> Dear Konrad,
>
> just to let you know: I am using your patches[1] on my notebook (Thinkpad

Excellent. Is it OK if I put 'Tested-by: Adi Kriegisch' on them?

> T61p) and they are working perfectly fine for me. I encountered three issues

Wait, T61p.. Can you actually do 64-bit on that laptop?
Or are you using a 32-bit hypervisor?

> which I could solve:
> * Machine crashes some time after wakeup with "BUG: unable to handle kernel
>   NULL pointer dereference at (null)". The crashing process was sshd as I
>   am forwarding my window manager from a DomU to X with nouveau running on
>   Dom0 with sdm.
>   I fixed that by setting all interrupts in the BIOS to "auto-select"
>   instead of the fixed default of "IRQ11". Since then I had no more crashes.

Ok, any other data? Stack trace?

> * The DomUs do not resync their clock after Dom0 waking up. They
>   basically continue to count the time as if the sleep never happened.
>   I have to run 'ntpdate' on resume on all the DomUs. I am not sure if
>   there are any side effects of this; probably there is a more simple way
>   to tell a DomU to reread clock from Dom0?

You know, I don't know. I just never thought about that - um. I wonder
if it is related to the RTC update patch that I've been meaning
to take a look at:

http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00469.html

> * vbetool hangs at 100% CPU on resume (i/o waiting, I guess, because
>   neither strace nor ltrace do show any activity). Simply killing vbetool
>   (no -9) kind of "fixes" the issue. Probably I do not even need to run
>   vbetool on resume.

Why do you run it? Anyhow there is a patch for vbetool to work correctly
with Nvidia drivers .. somewhere. ah, here.

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 1256454..3d91e46 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -316,9 +316,14 @@ static int mmap_mem(struct file *file, struct vm_area_struct *vma)
 						&vma->vm_page_prot))
 		return -EINVAL;

-	vma->vm_page_prot = phys_mem_access_prot(file, vma->vm_pgoff,
-						 size,
-						 vma->vm_page_prot);
+	vma->vm_flags |= VM_RESERVED | VM_IO | VM_PFNMAP | VM_DONTEXPAND;
+	vma->vm_page_prot = __pgprot(
+			pgprot_val(vm_get_page_prot(vma->vm_flags)) |
+			_PAGE_IOMAP |
+			pgprot_val(phys_mem_access_prot(file,
+							vma->vm_pgoff,
+							size,
+							vma->vm_page_prot)));

 	vma->vm_ops = &mmap_mem_ops;

>
> Anyways, thank you very much for your efforts in bringing decent Dom0
> support to upstream kernel! Your patches applied cleanly to the
> Debian/testing package linux-image-3.0.0-1-amd64 (3.0.0-3) and work just
> fine!

Woot!

>
> best regards,
>     Adi Kriegisch
>
> [1] http://lists.xensource.com/archives/html/xen-devel/2011-08/msg01358.html
Dear Konrad,

first off, I am really sorry to first post that everything is just fine and
then -- while responding to your mail -- getting the same crash again. :-(

> > which I could solve:
> > * Machine crashes some time after wakeup with "BUG: unable to handle kernel
> >   NULL pointer dereference at (null)". The crashing process was sshd as I
> >   am forwarding my window manager from a DomU to X with nouveau running on
> >   Dom0 with sdm.
> >   I fixed that by setting all interrupts in the BIOS to "auto-select"
> >   instead of the fixed default of "IRQ11". Since then I had no more crashes.
>
> Ok, any other data? Stack trace?

Find that stuff attached: the archive contains the kernel trace (I took a
photo, typed the stuff and checked twice... In case you want to have the
photo, just tell me) and two relevant parts of the syslog (9.6G
uncompressed):
First part is the last two messages of the suspend process from yesterday
and the wakeup messages from this morning.
The other part is me plugging in my phone (which I used to take pictures of
the kernel trace), mounting it as usb mass storage device and finally
copying images from dom0 to the domU that runs my desktop. Right after
copying finished I was looking at the images for some minutes.

I noticed these WARNINGs on all other crashes too. From the first appearance
of these warnings it took 2 to 10 minutes to crash. Apps that triggered the
warnings were: apt-get, bash, cp, dpkg, gzip, swapd, evtchn, xenfs, xm_wm,
Xorg, vi, "rs:main", "kworker/u:30" and several more...

I hope this helps. I'd be more than happy to help out with testing!

-- Adi
> > Excellent. Is it OK if I put 'Tested-by: Adi Kriegisch' on them?
> Sure, go ahead! ;-)
> Update: No, the system just crashed while writing this mail after about 4 days
> of uptime with many suspend-resume cycles in between... *sigh* :-(

Hmmm.. I wonder if you are hitting the writecombine issue I've seen sometimes.
Just to eliminate it, can you try 'nopat' on the Linux command line?

..
> > Ok, any other data? Stack trace?
> Yes. I will send them in a second mail... I hope I can find all relevant
> information.

<nods>

> > You know, I don't know. I just never thought about that - um. I wonder
> > if it is related to the RTC update patch that I've been meaning
> > to take a look at:
> >
> > http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00469.html
> Sounds like it could be related. Shall I apply that patch? If so, which
> hook takes care that the function is called?

It kind of automatically hooks up. If you can apply it cleanly - sure. But it
might not apply cleanly :-(
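To confirm that a boot really picked up 'nopat', the command line of the running kernel can be inspected. This is a minimal sketch, assuming the standard /proc/cmdline file is readable on the Dom0 kernel; the helper names are made up for illustration.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a whole parameter in `cmdline`.

    Parameters are whitespace-separated; for "key=value" tokens only the
    key is compared, so "nopatch" or "foo=nopat" do not count as "nopat".
    """
    for token in cmdline.split():
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


def running_with_nopat(path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running kernel."""
    return has_kernel_param(Path(path).read_text(), "nopat")
```

Matching whole tokens instead of substrings avoids a false positive from any other parameter that merely contains the text "nopat".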
On Wed, Sep 14, 2011 at 10:28:54AM -0400, Konrad Rzeszutek Wilk wrote:
> > > Excellent. Is it OK if I put 'Tested-by: Adi Kriegisch' on them?
> > Sure, go ahead! ;-)
> > Update: No, the system just crashed while writing this mail after about 4 days
> > of uptime with many suspend-resume cycles in between... *sigh* :-(
>
> Hmmm.. I wonder if you are hitting the writecombine issue I've seen sometimes.
> Just to eliminate it, can you try 'nopat' on the Linux command line?

Sure. Do you know any way to make sure I am hitting the writecombine issue
fast, so that I can make (kind of) sure everything is working?

> > > You know, I don't know. I just never thought about that - um. I wonder
> > > if it is related to the RTC update patch that I've been meaning
> > > to take a look at:
> > >
> > > http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00469.html
> > Sounds like it could be related. Shall I apply that patch? If so, which
> > hook takes care that the function is called?
>
> It kind of automatically hooks up. If you can apply it cleanly - sure. But it
> might not apply cleanly :-(

It does not apply at all:
first hunk fails because there have been some other includes added.
second hunk fails because there is no more
"#endif /* CONFIG_PARAVIRT_CLOCK_VSYSCALL */"... and this is the point
where I can't fix a bug because I do not know enough of the kernel/xen
internals to know what to touch...

-- Adi
On Wed, Sep 14, 2011 at 05:09:34PM +0200, Adi Kriegisch wrote:
> On Wed, Sep 14, 2011 at 10:28:54AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > Excellent. Is it OK if I put 'Tested-by: Adi Kriegisch' on them?
> > > Sure, go ahead! ;-)
> > > Update: No, the system just crashed while writing this mail after about 4 days
> > > of uptime with many suspend-resume cycles in between... *sigh* :-(
> >
> > Hmmm.. I wonder if you are hitting the writecombine issue I've seen sometimes.
> > Just to eliminate it, can you try 'nopat' on the Linux command line?
> Sure. Do you know any way to make sure I am hitting the writecombine issue
> fast, so that I can make (kind of) sure everything is working?

Mysterious applications crashing left and right. Under my box bash stopped
working right and such. Pretty obvious that something went wrong.

>
> > > > You know, I don't know. I just never thought about that - um. I wonder
> > > > if it is related to the RTC update patch that I've been meaning
> > > > to take a look at:
> > > >
> > > > http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00469.html
> > > Sounds like it could be related. Shall I apply that patch? If so, which
> > > hook takes care that the function is called?
> >
> > It kind of automatically hooks up. If you can apply it cleanly - sure. But it
> > might not apply cleanly :-(
> It does not apply at all:

Pfff.. well, I will try to rebase it in a couple of days. Can you ping in a week
if I haven't sent anything to you yet?

> first hunk fails because there have been some other includes added.
> second hunk fails because there is no more
> "#endif /* CONFIG_PARAVIRT_CLOCK_VSYSCALL */"... and this is the point
> where I can't fix a bug because I do not know enough of the kernel/xen
> internals to know what to touch...
On 09/14/2011 01:41 AM, Jan Beulich wrote:
>>>> On 14.09.11 at 10:11, Adi Kriegisch <adi@cg.tuwien.ac.at> wrote:
>> * The DomUs do not resync their clock after Dom0 waking up. They
>>   basically continue to count the time as if the sleep never happened.
>>   I have to run 'ntpdate' on resume on all the DomUs. I am not sure if
>>   there are any side effects of this; probably there is a more simple way
>>   to tell a DomU to reread clock from Dom0?
> This is a more fundamental problem - upstream pv-ops doesn't make
> use of XENFP_settime (or its bogus alias DOM_SETTIME) at all; only
> Jeremy's 2.6.32.x tree has this so far.

I was confused grepping for those: XEN*PF*_settime, or DOM*0*_SETTIME.

Yeah, thanks for the reminder. I've queued that up for the next merge
window.

    J
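The spelling matters when searching a source tree: the identifiers Jeremy gives are XENPF_settime and DOM0_SETTIME, so a search for the misspelled forms comes back empty. Below is a small sketch of such a search. The function name and the restriction to .c and .h files are assumptions made for illustration, and it is in no way a replacement for a plain recursive grep.

```python
import os
import re
from typing import Dict, List, Tuple


def find_identifiers(root: str, names: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each identifier to the (path, line number) pairs where it occurs.

    Only whole-word matches in *.c and *.h files below `root` are reported.
    """
    if not names:
        return {}
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    hits: Dict[str, List[Tuple[str, int]]] = {name: [] for name in names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in sorted(filenames):
            if not fname.endswith((".c", ".h")):
                continue
            path = os.path.join(dirpath, fname)
            # errors="replace": old trees may contain non-UTF-8 bytes.
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, 1):
                    for match in pattern.finditer(line):
                        hits[match.group(1)].append((path, lineno))
    return hits
```

Passing both the correct and the misspelled names, for example find_identifiers("linux", ["XENPF_settime", "XENFP_settime"]), makes it obvious which spelling the tree actually uses, since the wrong one maps to an empty list.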
On Wed, Sep 14, 2011 at 11:47:26AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 14, 2011 at 05:09:34PM +0200, Adi Kriegisch wrote:
> > On Wed, Sep 14, 2011 at 10:28:54AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > Excellent. Is it OK if I put 'Tested-by: Adi Kriegisch' on them?
> > > > Sure, go ahead! ;-)

Now, I think, you may really go ahead! ;-)
'nopat' did the trick for me.

> > > Hmmm.. I wonder if you are hitting the writecombine issue I've seen sometimes.
> > > Just to eliminate it, can you try 'nopat' on the Linux command line?
> > Sure. Do you know any way to make sure I am hitting the writecombine issue
> > fast, so that I can make (kind of) sure everything is working?
> Mysterious applications crashing left and right. Under my box bash stopped
> working right and such. Pretty obvious that something went wrong.

Hmmm... I rethought the workloads I had and found a way to reproduce the
crashes -- more or less reliably:
As I did a complete reinstallation of my notebook, I had a lot of data to
copy. The first bunch of stuff was copied on Dom0 -- this was where I
experienced the crashes; not immediately while copying but a little later
(The warnings in the syslog always happened during copying, btw).
Then -- after the basic setup was done -- I used xm block-attach and copied
tons of stuff within a DomU. I did not experience a single crash while
doing so.
Do you want me to do something to further debug the issue? Just tell me
what I could/should try to do! ;-)

> > > > > http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00469.html
> > > > Sounds like it could be related. Shall I apply that patch? If so, which
> > > > hook takes care that the function is called?
[SNIP]
> > It does not apply at all:
>
> Pfff.. well, I will try to rebase it in a couple of days. Can you ping in a week
> if I haven't sent anything to you yet?

Yes, I will! ;-)

-- Adi
On Wed, Sep 14, 2011 at 11:47:26AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Sep 14, 2011 at 05:09:34PM +0200, Adi Kriegisch wrote:
> > On Wed, Sep 14, 2011 at 10:28:54AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > Excellent. Is it OK if I put 'Tested-by: Adi Kriegisch' on them?

Works like a charm with 'nopat'. No crash ever since.

> > > > > You know, I don't know. I just never thought about that - um. I wonder
> > > > > if it is related to the RTC update patch that I've been meaning
> > > > > to take a look at:
> > > > >
> > > > > http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00469.html
[SNIP]
> Pfff.. well, I will try to rebase it in a couple of days. Can you ping in a week
> if I haven't sent anything to you yet?

Any news on this one? I still have to resync my clock after acpi sleep...

Thanks,
    Adi Kriegisch
On Tue, Sep 27, 2011 at 04:50:08PM +0200, Adi Kriegisch wrote:
> On Wed, Sep 14, 2011 at 11:47:26AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Sep 14, 2011 at 05:09:34PM +0200, Adi Kriegisch wrote:
> > > On Wed, Sep 14, 2011 at 10:28:54AM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > > Excellent. Is it OK if I put 'Tested-by: Adi Kriegisch' on them?
> Works like a charm with 'nopat'. No crash ever since.
>
> > > > > > You know, I don't know. I just never thought about that - um. I wonder
> > > > > > if it is related to the RTC update patch that I've been meaning
> > > > > > to take a look at:
> > > > > >
> > > > > > http://lists.xensource.com/archives/html/xen-devel/2010-02/msg00469.html
> [SNIP]
> > Pfff.. well, I will try to rebase it in a couple of days. Can you ping in a week
> > if I haven't sent anything to you yet?
> Any news on this one? I still have to resync my clock after acpi sleep...

Jeremy just sent out the patches for review.

http://lists.xensource.com/archives/html/xen-devel/2011-09/msg01452.html

Please test.