Jacob Gorm Hansen
2004-Apr-04 16:02 UTC
[Xen-devel] Stale mfns in update_queue in XenoLinux, and suspend/resume
Hi,

It seems the suspend code in arch/xen/kernel/setup.c does not flush the mmu_update queue prior to suspend, and that as a result it may crash after resumption because of stale machine page frame references in the queue. Is this correct/should this behaviour be fixed? I am currently investigating a crash in my own migration code, and though I do flush the queue prior to obtaining a checkpoint, I still seem to be hit occasionally by stale references somewhere.

If suspension is going to be safe, I guess all uses of machine addresses should be treated as critical regions, to make sure a suspend/resume does not happen while they are still in scope? I know this will be problematic because of the batching of mmu-updates. Perhaps it would be wise to revert to the old behavior of specifying them as virtual addresses, or maybe they should be converted on the fly, in a cli() context right before the hypercall?

Jacob

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
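To make the "converted on the fly" idea concrete, here is a minimal user-space sketch, not XenoLinux code. It assumes the queue stores pseudo-physical frame numbers and translates them to machine frame numbers only at flush time, with interrupts disabled. Every name in it (phys_to_machine, queue_update, flush_updates, model_cli, model_sti, model_hypercall) is hypothetical and only models the behaviour discussed above.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES  8
#define QUEUE_LEN 4

/* Hypothetical pseudo-physical -> machine frame table. Its contents
 * change across a suspend/resume or a migration. */
static unsigned long phys_to_machine[NR_PAGES];

/* Queued entries hold pseudo-physical frame numbers, which stay valid
 * across a resume, instead of machine frame numbers, which do not. */
struct pending_update {
    unsigned long pfn;
    unsigned long val;
};

static struct pending_update queue[QUEUE_LEN];
static size_t queue_idx;

/* Stand-ins for cli()/sti(); they only record the state. */
static int irqs_disabled;
static void model_cli(void) { irqs_disabled = 1; }
static void model_sti(void) { irqs_disabled = 0; }

/* Stand-in for the hypercall; records the machine frame it was given. */
static unsigned long last_mfn_seen;
static void model_hypercall(unsigned long mfn, unsigned long val)
{
    assert(irqs_disabled);  /* translation and call share one critical region */
    last_mfn_seen = mfn;
    (void)val;
}

/* Translate each entry and issue it with interrupts off, so a suspend
 * cannot slip in between the translation and the hypercall. */
static void flush_updates(void)
{
    model_cli();
    for (size_t i = 0; i < queue_idx; i++)
        model_hypercall(phys_to_machine[queue[i].pfn], queue[i].val);
    queue_idx = 0;
    model_sti();
}

/* Batch an update; flush first if the queue is already full. */
static void queue_update(unsigned long pfn, unsigned long val)
{
    if (queue_idx == QUEUE_LEN)
        flush_updates();
    queue[queue_idx].pfn = pfn;
    queue[queue_idx].val = val;
    queue_idx++;
}
```

In this model an entry queued before a resume is still translated through the table as it stands at flush time, so it cannot carry a stale machine frame number. The cost is one table lookup per entry at flush time.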
Keir Fraser
2004-Apr-05 06:48 UTC
Re: [Xen-devel] Stale mfns in update_queue in XenoLinux, and suspend/resume
> it seems the suspend code in arch/xen/kernel/setup.c does not flush the
> mmu_update queue prior to suspend, and that as a result it may crash
> after resumption because of stale machine page frame references in
> the queue. Is this correct/should this behaviour be fixed? I am
> currently investigating a crash in my own migration code, and though I
> do flush the queue prior to obtaining a checkpoint, I still seem to be
> hit occasionally by stale references somewhere.
>
> If suspension is going to be safe, I guess all uses of machine addresses
> should be treated as critical regions, to make sure a suspend/resume
> does not happen while they are still in scope? I know this will be
> problematic because of the batching of mmu-updates. Perhaps it would be
> wise to revert to the old behavior of specifying them as virtual
> addresses, or maybe they should be converted on the fly, in a cli()
> context right before the hypercall?

Suspend/resume occurs in a process context. Since Xenolinux is uniprocessor, I think that this should mean that there are no outstanding page-update requests. Thinking about it, though, it's possible that interrupt handlers and softirqs may add stuff to the update queue. For safety you might want to flush it immediately after __cli().

 -- Keir
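The ordering suggested here (disable interrupts first, then flush) can be sketched as a small user-space model. This is an illustration under stated assumptions, not the actual setup.c code: update_queue, model_cli, model_sti, model_flush_queue, model_irq_queues_update and model_suspend are all hypothetical names, and the "flush" simply empties an array where the real code would issue the queued updates.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_LEN 8

/* Hypothetical batched update queue holding machine frame numbers. */
static unsigned long update_queue[QUEUE_LEN];
static size_t queue_len;

/* Stand-ins for __cli()/__sti(); they only record the state. */
static int irqs_enabled = 1;
static void model_cli(void) { irqs_enabled = 0; }
static void model_sti(void) { irqs_enabled = 1; }

/* Stand-in for flushing: the real code would hand the entries to the
 * hypervisor; this model just empties the queue. */
static void model_flush_queue(void) { queue_len = 0; }

/* Models an interrupt handler or softirq that queues an update.
 * It can only run while interrupts are enabled. Returns 1 if it ran. */
static int model_irq_queues_update(unsigned long mfn)
{
    if (!irqs_enabled || queue_len == QUEUE_LEN)
        return 0;
    update_queue[queue_len++] = mfn;
    return 1;
}

/* Suspend path: interrupts off first, then flush, so nothing can be
 * queued between the flush and the suspend point. Returns the number
 * of entries still pending at the suspend point. */
static size_t model_suspend(void)
{
    size_t pending;

    model_cli();
    model_flush_queue();
    assert(queue_len == 0);  /* no machine frame references left in scope */
    pending = queue_len;     /* the suspend itself would happen here */
    model_sti();
    return pending;
}
```

Flushing before disabling interrupts would leave a window in which a handler could queue a fresh entry, which is why the flush sits after model_cli() in this sketch.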