George,

before diving deeply into the PoD code, I hope you have some idea that might ease the debugging that's apparently going to be needed.

Following the comment immediately before p2m_pod_set_mem_target(), there's an apparent inconsistency with the accounting: While the guest in question properly balloons down to its intended setting (1G, with a maxmem setting of 2G), the combination of the equations

    d->arch.p2m->pod.entry_count == B - P
    d->tot_pages == P + d->arch.p2m->pod.count

doesn't hold (provided I interpreted the meaning of B correctly - I took this from the guest balloon driver's "Current allocation" report, converted to pages); there's a difference of over 13000 pages. Obviously, as soon as the guest uses up enough of its memory, it will get crashed by the PoD code.

In two runs I did, the difference (and hence the number of entries reported in the eventual crash message) was identical, implying to me that this is not a simple race, but rather a systematic problem.

Even on the initial dump taken (when the guest was sitting at the boot manager screen), there already appears to be a difference of 800 pages (it's my understanding that at this point the difference between entries and cache should equal the difference between maxmem and mem).

Does this ring any bells? Any hints how to debug this? In any case I'm attaching the full log in case you want to look at it.

Jan
What seems likely to me is that Xen (setting the PoD target) and the balloon driver (allocating memory) have a different way of calculating the amount of guest memory. So the balloon driver thinks it's done handing memory back to Xen when there are still more outstanding PoD entries than there are entries in the PoD memory pool. What balloon driver are you using? Can you let me know max_mem, target, and what the balloon driver has reached before calling it quits? (Although 13,000 pages is an awful lot to be off by: 54 MB...)

Re what "B" means, below is a rather long-winded explanation that will, hopefully, be clear. :-)

Hmm, I'm not sure what the guest balloon driver's "Current allocation" means either. :-) Does it mean, "Size of the current balloon" (i.e., starts at 0 and grows as the balloon driver allocates guest pages and hands them back to Xen)? Or does it mean, "Amount of memory the guest currently has allocated to it" (i.e., starts at static_max and goes down as the balloon driver allocates guest pages and hands them back to Xen)?

In the comment, B does *not* mean "the size of the balloon" (i.e., the number of pages allocated from the guest OS by the balloon driver). Rather, B means "Amount of memory the guest currently thinks it has allocated to it." B starts at M at boot. The balloon driver will try to make B=T by inflating the size of the balloon to M-T.

Clear as mud? Let's make a concrete example. Let's say static max is 409,600K (100,000 pages). M=100,000 and doesn't change. Let's say that T is 50,000.

At boot:
 * B == M == 100,000
 * P == 0
 * tot_pages == pod.count == 50,000
 * entry_count == 100,000

Thus things hold:
 * 0 <= P (0) <= T (50,000) <= B (100,000) <= M (100,000)
 * entry_count (100,000) == B (100,000) - P (0)
 * tot_pages (50,000) == P (0) + pod.count (50,000)

As the guest boots, pages will be populated from the cache; P increases, but entry_count and pod.count decrease. Let's say that 25,000 pages get allocated just before the balloon driver runs:
 * 0 <= P (25,000) <= T (50,000) <= B (100,000) <= M (100,000)
 * entry_count (75,000) == B (100,000) - P (25,000)
 * tot_pages (50,000) == P (25,000) + pod.count (25,000)

Then the balloon driver runs. It should try to allocate 50,000 pages total (M - T). For simplicity, let's say that the balloon driver only allocates un-allocated pages. When it's halfway there, having allocated 25,000 pages, things look like this:
 * 0 <= P (25,000) <= T (50,000) <= B (75,000) <= M (100,000)
 * entry_count (50,000) == B (75,000) - P (25,000)
 * tot_pages (50,000) == P (25,000) + pod.count (25,000)

Eventually the balloon driver should reach its new target of 50,000, having allocated 50,000 pages:
 * 0 <= P (25,000) <= T (50,000) <= B (50,000) <= M (100,000)
 * entry_count (25,000) == B (50,000) - P (25,000)
 * tot_pages (50,000) == P (25,000) + pod.count (25,000)

The reason for the logic is so that we can do the Right Thing if, after the balloon driver has ballooned half way (to 75,000 pages), the target is changed. If you're not changing the target before the balloon driver has reached its target, ...

 -George
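A minimal, self-contained C sketch of the accounting walked through above, useful for re-checking the invariants against a dump. The struct and helper names are illustrative stand-ins, not the Xen-internal d->tot_pages / d->arch.p2m->pod fields, and the numbers are the ones from the example:

    #include <assert.h>
    #include <stdio.h>

    /* Illustrative model of the PoD accounting, not the Xen source:
     *   M = static max pages, T = target, B = what the guest thinks it has,
     *   P = pages actually populated, pod_count = pages in the PoD cache,
     *   entry_count = outstanding PoD entries in the p2m. */
    struct pod_state {
        long M, T, B, P, entry_count, pod_count, tot_pages;
    };

    static void check(const struct pod_state *s)
    {
        assert(0 <= s->P && s->P <= s->T && s->T <= s->B && s->B <= s->M);
        assert(s->entry_count == s->B - s->P);
        assert(s->tot_pages == s->P + s->pod_count);
    }

    int main(void)
    {
        /* Numbers from the example: M = 100,000 pages, T = 50,000. */
        struct pod_state s = { .M = 100000, .T = 50000, .B = 100000, .P = 0,
                               .entry_count = 100000, .pod_count = 50000,
                               .tot_pages = 50000 };
        check(&s);                                  /* at boot */

        /* Guest populates 25,000 pages from the PoD cache. */
        s.P += 25000; s.entry_count -= 25000; s.pod_count -= 25000;
        check(&s);

        /* Balloon driver inflates by 50,000 pages (M - T); each ballooned
         * page removes one PoD entry and lowers B. */
        s.B -= 50000; s.entry_count -= 50000;
        check(&s);                                  /* B == T == 50,000 */

        printf("invariants hold\n");
        return 0;
    }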
>>> George Dunlap <george.dunlap@eu.citrix.com> 29.01.10 17:01 >>>
> What seems likely to me is that Xen (setting the PoD target) and the
> balloon driver (allocating memory) have a different way of calculating
> the amount of guest memory. So the balloon driver thinks it's done
> handing memory back to Xen when there are still more outstanding PoD
> entries than there are entries in the PoD memory pool. What balloon
> driver are you using?

The one from our forward-ported 2.6.32.x tree. I would suppose there are no significant differences here to the one in 2.6.18, but I wonder how precise the totalram_pages value is that the driver (also in 2.6.18) uses to initialize bs.current_pages. Given that with PoD it is now crucial for the guest to balloon out enough memory, using an imprecise start value is not acceptable anymore. The question however is what more reliable data source one could use (given that any non-exported kernel object is out of the question). And I wonder how this works reliably for others...

> Can you let me know max_mem, target, and what the
> balloon driver has reached before calling it quits? (Although 13,000
> pages is an awful lot to be off by: 54 MB...)

The balloon driver reports the expected state: target and allocation are 1G. But yes - how did I not pay attention to this - the balloon is *far* from being 1G in size (and in fact the difference probably matches quite closely those 54M).

Thanks a lot!

Jan
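A short sketch of the shortfall Jan suspects, using made-up but representative numbers: if the driver seeds bs.current_pages from totalram_pages, which omits the kernel image and bootmem allocations, it stops ballooning before Xen's PoD accounting is satisfied. The macro names and the 13,824-page (~54 MB) gap below are illustrative assumptions, not values taken from the real driver:

    #include <stdio.h>

    /* Illustrative numbers only (4 KiB pages). */
    #define MAXMEM_PAGES   524288L  /* maxmem = 2 GiB: what Xen's PoD code works from */
    #define TARGET_PAGES   262144L  /* target = 1 GiB: written to xenstore            */
    #define HIDDEN_PAGES    13824L  /* kernel image + bootmem, not in totalram_pages  */
                                    /* (~54 MB; stands in for the observed gap)       */

    int main(void)
    {
        /* What an unfixed driver does: seed "current allocation" from
         * totalram_pages, which is already short by HIDDEN_PAGES. */
        long current_pages = MAXMEM_PAGES - HIDDEN_PAGES;   /* totalram_pages      */
        long balloon_size  = current_pages - TARGET_PAGES;  /* pages it hands back */

        /* What Xen's PoD accounting expects the guest to hand back. */
        long needed = MAXMEM_PAGES - TARGET_PAGES;

        printf("driver balloons out %ld pages, PoD needs %ld: short by %ld\n",
               balloon_size, needed, needed - balloon_size);
        return 0;
    }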
PoD is not critical to balloon out guest memory. You can boot with mem == maxmem and then balloon down afterwards just as you could before, without involving PoD. (Or at least, you should be able to; if you can't then it's a bug.) It's just that with PoD you can do something you've always wanted to do but never knew it: boot with 1GiB with the option of expanding up to 2GiB later. :-)

With the 54 megabyte difference: It's not a GiB vs GB thing, is it? (i.e., 2^30 vs 10^9?) The difference between 1 GiB (2^30) and 1 GB (10^9) is about 74 megs, or 18,000 pages.

I guess that is a weakness of PoD in general: we can't control the guest balloon driver, but we rely on it to have the same model of how to translate "target" into # pages in the balloon as the PoD code.

 -George
>>> George Dunlap 01/29/10 7:30 PM >>>
> PoD is not critical to balloon out guest memory. You can boot with mem
> == maxmem and then balloon down afterwards just as you could before,
> without involving PoD. (Or at least, you should be able to; if you
> can't then it's a bug.) It's just that with PoD you can do something
> you've always wanted to do but never knew it: boot with 1GiB with the
> option of expanding up to 2GiB later. :-)

Oh, no, that's not what I meant. What I really wanted to say is that with PoD, a properly functioning balloon driver in the guest is crucial for it to stay alive long enough.

> With the 54 megabyte difference: It's not a GiB vs GB thing, is it?
> (i.e., 2^30 vs 10^9?) The difference between 1 GiB (2^30) and 1 GB
> (10^9) is about 74 megs, or 18,000 pages.

No, that's not the problem. As I understand it now, the problem is that totalram_pages (which the balloon driver bases its calculations on) reflects all memory available after all bootmem allocations were done (i.e. it includes neither the static kernel image nor any memory allocated before or from the bootmem allocator).

> I guess that is a weakness of PoD in general: we can't control the guest
> balloon driver, but we rely on it to have the same model of how to
> translate "target" into # pages in the balloon as the PoD code.

I think this isn't a weakness of PoD, but a design issue in the balloon driver's xenstore interface: While a target value shown in or obtained from the /proc and /sys interfaces naturally can be based on (and reflect) any internal kernel state, the xenstore interface should only use numbers in terms of the full memory amount given to the guest. Hence a target value read from the memory/target node should be adjusted before being put in relation to totalram_pages. And I think this is a general misconception in the current implementation (i.e. it should be corrected not only for the HVM case, but for the pv one as well).

The bad aspect of this is that it will require a fixed balloon driver in any HVM guest that has maxmem > mem when the underlying Xen gets updated to a version that supports PoD. I cannot, however, see an OS- and OS-version-independent alternative (i.e. something to be done in the PoD code or the tools).

Jan
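A sketch of the kind of adjustment Jan is describing: record, at driver initialisation, the fixed bias between the guest's full allocation and totalram_pages, and subtract that bias whenever a target read from xenstore is converted into the driver's internal target. The function and variable names here are hypothetical, and this is not the actual 2.6.18 c/s 989 change:

    #include <stdio.h>

    /* Hypothetical stand-ins for kernel state; not the real balloon driver. */
    static unsigned long totalram_pages_at_init;  /* what the kernel reports      */
    static unsigned long initial_xenstore_pages;  /* memory/target at driver load */
    static unsigned long totalram_bias;           /* pages "hidden" from totalram */

    static void balloon_init(unsigned long totalram, unsigned long xenstore_target)
    {
        totalram_pages_at_init = totalram;
        initial_xenstore_pages = xenstore_target;
        /* Pages the guest owns but the kernel does not count (kernel image,
         * bootmem, etc.).  Assumed constant for the life of the guest. */
        totalram_bias = xenstore_target - totalram;
    }

    /* Convert a xenstore target (full guest allocation) into the value the
     * driver compares against its totalram_pages-based "current allocation". */
    static unsigned long adjust_target(unsigned long xenstore_target)
    {
        return xenstore_target - totalram_bias;
    }

    int main(void)
    {
        /* Example: 2 GiB guest whose kernel reports ~54 MB less than it was given. */
        balloon_init(524288 - 13824, 524288);
        /* Ballooning down to 1 GiB: the driver must aim below 262,144 pages. */
        printf("internal target = %lu pages\n", adjust_target(262144));
        return 0;
    }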
So did you track down where the math error is? Do we have a plan to fix this going forward?

 -George
>>> George Dunlap <george.dunlap@eu.citrix.com> 03.02.10 19:42 >>>
> So did you track down where the math error is? Do we have a plan to fix
> this going forward?

It was in the balloon driver's interaction with xenstore - see 2.6.18 c/s 989.

I have to admit that I cannot see how this issue could have escaped attention when the PoD code was introduced - any guest with PoD in use and an unfixed balloon driver is set to crash sooner or later (implying the unfortunate effect of requiring an update of the pv drivers in HVM guests when upgrading Xen from a PoD-incapable to a PoD-capable version).

Jan
Yeah, the OSS tree doesn't get the kind of regression testing it really needs at the moment. I was using the OSS balloon drivers when I implemented and submitted the PoD code last year. I didn't have any trouble then, and I was definitely using up all of the memory. But I haven't done any testing on OSS since then, basically.

 -George
On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> Yeah, the OSS tree doesn't get the kind of regression testing it
> really needs at the moment. I was using the OSS balloon drivers when
> I implemented and submitted the PoD code last year. I didn't have any
> trouble then, and I was definitely using up all of the memory. But I
> haven't done any testing on OSS since then, basically.

Is it expected that booting HVM guests with maxmem > memory is unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily crash the guest and occasionally the entire server.

Keith Coleman
> Is it expected that booting HVM guests with maxmem > memory is
> unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
> crash the guest and occasionally the entire server.

Obviously the platform should never crash, and that's very concerning.

Are you running a balloon driver in the guest? It's essential that you do, because it needs to get in fairly early in the guest boot and allocate the difference between maxmem and target memory. The populate-on-demand code exists just to cope with things like the memory scrubber running ahead of the balloon driver. If you're not running a balloon driver the guest is doomed to crash as soon as it tries using more than target memory.

All of this requires coordination between the tool stack, PoD code, and PV drivers so that sufficient memory gets ballooned out. I expect the combination that has had the most testing is the XCP toolstack and Citrix PV Windows drivers.

Ian
>>> Keith Coleman <list.keith@scaltro.com> 19.02.10 01:03 >>>
> Is it expected that booting HVM guests with maxmem > memory is
> unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
> crash the guest and occasionally the entire server.

Crashing the guest is expected if the guest doesn't have a fixed balloon driver (i.e. the mentioned c/s would need to be in the sources the pv drivers for the guest were built from).

Crashing the host is certainly unacceptable - please provide logs thereof.

Jan
On Fri, Feb 19, 2010 at 1:53 AM, Ian Pratt <Ian.Pratt@eu.citrix.com> wrote:
> Obviously the platform should never crash, and that's very concerning.
>
> Are you running a balloon driver in the guest? It's essential that you do, because it needs
> to get in fairly early in the guest boot and allocate the difference between maxmem and
> target memory. The populate-on-demand code exists just to cope with things like the memory
> scrubber running ahead of the balloon driver. If you're not running a balloon driver the
> guest is doomed to crash as soon as it tries using more than target memory.
>
> All of this requires coordination between the tool stack, PoD code, and PV drivers so that
> sufficient memory gets ballooned out. I expect the combination that has had the most testing
> is the XCP toolstack and Citrix PV Windows drivers.

Initially I was using the XCP 0.1.1 WinPV drivers (Win Server 2003 SP2) and the guest crashed when I tried to install software via the emulated cdrom. Nothing about the crash was reported in the qemu log file, and xend.log wasn't very helpful either, but here's the relevant portion:

[2010-02-17 20:42:49 4253] DEBUG (DevController:139) Waiting for devices vtpm.
[2010-02-17 20:42:49 4253] INFO (XendDomain:1182) Domain win2 (30) unpaused.
[2010-02-17 20:48:05 4253] WARNING (XendDomainInfo:1888) Domain has crashed: name=win2 id=30.
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2734) XendDomainInfo.destroy: domid=30
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2209) Destroying device model

I unsuccessfully attempted the install several more times, then tried copying files from the emulated cd, which also crashed the guest each time. I wasn't even thinking about the fact that I had set maxmem/pod, so I blamed the XCP WinPV drivers and switched to GPLPV (0.10.0.138). Same crashes with GPLPV.

At this point I hadn't checked 'xm dmesg', which was the only place the pod/p2m error is reported, so I changed to pure HVM mode and tried to copy the files from the emulated cd. That's when the real trouble started. The rdp and vnc connections to the guest froze, as did the ssh to the dom0. This server was also hosting 7 Linux pv guests. I could ping the guests and partially load some of their websites, but couldn't log in via ssh. I suspected that the HDDs were overloaded, causing disk io to block the guests. I was on site, so I went to check the server and was shocked to find no disk activity. The monitor output was blank and I couldn't wake it up. Maybe the usb keyboard was unable to be enumerated, because I couldn't even toggle the numlock, etc. after several reconnections.

I power cycled the host and checked the logs, but there was no evidence of a crash other than one of the software raid devices being unclean on startup. Perhaps there was interesting data logged to 'xm dmesg' or waiting to be written to disk at the time of the crash.
I'm afraid this server/mb is incapable of logging data to the serial port. I've attempted to do so several times, both before and after this crash.

Of course the simple fix is to remove maxmem from the domU config file for the time being. Eventually people will use pod on production systems. Relying on the guest to have a solid balloon driver is unacceptable. A guest could accidentally (or otherwise) remove the pv drivers and bring down an entire host.

When I can free up a server with serial logging for testing I will try to reproduce this crash.

Keith Coleman
On Fri, Feb 19, 2010 at 08:19:15AM +0000, Jan Beulich wrote:
> Crashing the guest is expected if the guest doesn't have a fixed
> balloon driver (i.e. the mentioned c/s would need to be in the
> sources the pv drivers for the guest were built from).
>
> Crashing the host is certainly unacceptable - please provide logs
> thereof.

Was this resolved? Someone was complaining recently that maxmem != memory crashes his Xen host..

-- Pasi