Byrne, John (HP Labs)
2010-Apr-22 20:57 UTC
[Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
I am playing with the xenpaging code that was checked into xen 4.0 and I hit a bug because it sends an event channel notification from a tasklet which was getting run from the idle domain on my box. Since domain_create() does not perform the evtchn_init for the idle domain, the event_lock was not initialized and the tasklet would hang the cpu when it tried to acquire the lock. While my immediate problem seems easy enough to work around --- I really can''t see the reason for the tasklet in the first place, so I just got rid of it --- the underlying issue needs a look. Should domain_create() simply initialize all of the idle_domain structure? As far as I can tell, the only reason it doesn''t is to save a little memory. While the event_lock issue can be dealt with simply enough by breaking out the initialization separately, it seems that having the special-case code for the idle domain opens up the possibility for bugs with respect to operations from tasklets. Thanks, John Byrne _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Apr-22 21:18 UTC
Re: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
It ought to be the destination domain''s event-channel info that gets accessed, not that of the domain that happens to be running currently. Idle domains have no event-channel info because it makes no sense for them to have it. I don''t know what xenpaging code you are specifically referring to, but sounds like it is simply broken. K. On 22/04/2010 21:57, "Byrne, John (HP Labs)" <john.l.byrne@hp.com> wrote:> I am playing with the xenpaging code that was checked into xen 4.0 and I hit a > bug because it sends an event channel notification from a tasklet which was > getting run from the idle domain on my box. Since domain_create() does not > perform the evtchn_init for the idle domain, the event_lock was not > initialized and the tasklet would hang the cpu when it tried to acquire the > lock. > > While my immediate problem seems easy enough to work around --- I really can''t > see the reason for the tasklet in the first place, so I just got rid of it --- > the underlying issue needs a look. Should domain_create() simply initialize > all of the idle_domain structure? As far as I can tell, the only reason it > doesn''t is to save a little memory. While the event_lock issue can be dealt > with simply enough by breaking out the initialization separately, it seems > that having the special-case code for the idle domain opens up the possibility > for bugs with respect to operations from tasklets. > > Thanks, > > John Byrne > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Byrne, John (HP Labs)
2010-Apr-22 22:20 UTC
RE: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
Keir, Well yes, the code is broken, but I thought the problem may be more general. Looking at the code more carefully, I think you''re right in that the code is even more broken than I thought and that there is no general problem. This is the tasklet in question from xen/arch/x86/mm/mem_event.c: -static void mem_event_notify(struct domain *d) -{ - prepare_wait_on_xen_event_channel(d->mem_event.xen_port); - notify_via_xen_event_channel(d->mem_event.xen_port); -} The prepare_wait_on_xen_event_channel() looks to be totally bogus. The notify_via_xen_event_channel() is the source of my problem. The first thing it does is grab the lock on the current domain. If the tasklet is running on the idle domain, it hangs because the event_lock of the current domain is not initialized. Looking at the code for notify_via_xen_event_channel() more carefully, it seems it should be calling evtchn_send() instead. I could provide a patch that does this. As I said in the my first e-mail, I don''t see any advantage in doing the notification from a tasklet and my inclination would be to delete it as well and just make the call directly. I''m a little worried that I am only interested in xenpaging at the moment and it looks like it might impact the sharing path as well. John Byrne> -----Original Message----- > From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] > Sent: Thursday, April 22, 2010 2:18 PM > To: Byrne, John (HP Labs); xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] event_lock not initialized in the idle domain > (permitted actions in a tasklet?) > > It ought to be the destination domain''s event-channel info that gets > accessed, not that of the domain that happens to be running currently. > Idle > domains have no event-channel info because it makes no sense for them > to > have it. I don''t know what xenpaging code you are specifically > referring to, > but sounds like it is simply broken. > > K. > > On 22/04/2010 21:57, "Byrne, John (HP Labs)" <john.l.byrne@hp.com> > wrote: > > > I am playing with the xenpaging code that was checked into xen 4.0 > and I hit a > > bug because it sends an event channel notification from a tasklet > which was > > getting run from the idle domain on my box. Since domain_create() > does not > > perform the evtchn_init for the idle domain, the event_lock was not > > initialized and the tasklet would hang the cpu when it tried to > acquire the > > lock. > > > > While my immediate problem seems easy enough to work around --- I > really can''t > > see the reason for the tasklet in the first place, so I just got rid > of it --- > > the underlying issue needs a look. Should domain_create() simply > initialize > > all of the idle_domain structure? As far as I can tell, the only > reason it > > doesn''t is to save a little memory. While the event_lock issue can be > dealt > > with simply enough by breaking out the initialization separately, it > seems > > that having the special-case code for the idle domain opens up the > possibility > > for bugs with respect to operations from tasklets. > > > > Thanks, > > > > John Byrne > > > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xensource.com > > http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Apr-23 07:14 UTC
Re: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
On 22/04/2010 23:20, "Byrne, John (HP Labs)" <john.l.byrne@hp.com> wrote:> The prepare_wait_on_xen_event_channel() looks to be totally bogus. The > notify_via_xen_event_channel() is the source of my problem. The first thing it > does is grab the lock on the current domain. If the tasklet is running on the > idle domain, it hangs because the event_lock of the current domain is not > initialized. Looking at the code for notify_via_xen_event_channel() more > carefully, it seems it should be calling evtchn_send() instead. > > I could provide a patch that does this. As I said in the my first e-mail, I > don''t see any advantage in doing the notification from a tasklet and my > inclination would be to delete it as well and just make the call directly. I''m > a little worried that I am only interested in xenpaging at the moment and it > looks like it might impact the sharing path as well.I think your approach of removing the tasklet was the correct one. See arch/x86/hvm/hvm.c:hvm_send_assist_req() for an example of correct usage of the *_xen_event_channel() helpers. Firstly, that function doesn''t run as a tasklet; Secondly, in fact ''v'' *always* is current. Possibly we should even ASSERT that on entry to the function to make it clearer. I think that the correct functions probably are being used, just in the wrong context. The aim after all is to pause the current vcpu until it receives assistance from a helper in dom0. That is similar to the existing (correct) use of *_xen_event_channel() which is to pause an HVM vcpu which requires help from qemu-dm in a stubdomain or dom0 process. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Byrne, John (HP Labs)
2010-Apr-24 00:30 UTC
RE: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
> -----Original Message----- > From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] > ...snipped... > I think your approach of removing the tasklet was the correct one. See > arch/x86/hvm/hvm.c:hvm_send_assist_req() for an example of correct > usage of > the *_xen_event_channel() helpers. Firstly, that function doesn''t run > as a > tasklet; Secondly, in fact ''v'' *always* is current. Possibly we should > even > ASSERT that on entry to the function to make it clearer. > > I think that the correct functions probably are being used, just in the > wrong context. The aim after all is to pause the current vcpu until it > receives assistance from a helper in dom0. That is similar to the > existing > (correct) use of *_xen_event_channel() which is to pause an HVM vcpu > which > requires help from qemu-dm in a stubdomain or dom0 process.Keir, Unfortunately, in the page sharing case as of now, p2m_mem_paging_populate() (which will eventually want to send the event) can be called from dom0 in the grant code. (Maybe you can see a way to re-engineer this, but it is not obvious to me.) So, I need an interface that allows me to specify the domain. I don''t see one for Xen event channels and the cleanest way I see to implement this is to simply add one. I''m certainly open to suggestions. The handling in the grant code for paging is pretty ugly. There are a couple of holes. The biggest is that I don''t see an obvious way to keep mapped grant pages from being evicted, again. However, I think it will work well enough for me to test what I need. John _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Apr-24 09:45 UTC
RE: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
Well, [prepare_]wait_on_xen_event_channel() makes no sense to call on another domain, certainly given their current implementation. But notify_via_xen_event_channel() could, and you could add a domain parameter to it. I don''t know what dom0 does while the page is paged in? Probably just throws an error code and carries on? In which case thinking about [prepare_]wait_on_xen_event_channel() further would not be necessary, as a dom0 code path can just notify on teh channel and carry on its way. -- Keir ________________________________________ From: Byrne, John (HP Labs) [john.l.byrne@hp.com] Sent: Saturday, April 24, 2010 1:30 AM To: Keir Fraser Cc: xen-devel@lists.xensource.com Subject: RE: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?) Unfortunately, in the page sharing case as of now, p2m_mem_paging_populate() (which will eventually want to send the event) can be called from dom0 in the grant code. (Maybe you can see a way to re-engineer this, but it is not obvious to me.) So, I need an interface that allows me to specify the domain. I don''t see one for Xen event channels and the cleanest way I see to implement this is to simply add one. I''m certainly open to suggestions. The handling in the grant code for paging is pretty ugly. There are a couple of holes. The biggest is that I don''t see an obvious way to keep mapped grant pages from being evicted, again. However, I think it will work well enough for me to test what I need. John _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Byrne, John (HP Labs)
2010-Apr-26 23:01 UTC
RE: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
> -----Original Message----- > From: Keir Fraser [mailto:Keir.Fraser@eu.citrix.com] > > Well, [prepare_]wait_on_xen_event_channel() makes no sense to call on > another domain, certainly given their current implementation. But > notify_via_xen_event_channel() could, and you could add a domain > parameter to it.I''ve added the domain parameter to notify_via_xen_event_channel() as you suggested since it is only used in a couple of places.> > I don''t know what dom0 does while the page is paged in? Probably just > throws an error code and carries on? In which case thinking about > [prepare_]wait_on_xen_event_channel() further would not be necessary, > as a dom0 code path can just notify on teh channel and carry on its > way. >That is the way it is supposed to work, but there seems to be a bit of a disconnect between grant-table code and the driver mods to handle the errors. The driver mods expect a new error code GNTST_eagain, but the grant_table code returns ENOENT in a couple of places. There were also a couple of "GNTxxx_can_fail" flags added to grant-table interface, but they are unused. So things don''t look quite finished. John _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2010-Apr-27 05:48 UTC
Re: [Xen-devel] event_lock not initialized in the idle domain (permitted actions in a tasklet?)
On 27/04/2010 00:01, "Byrne, John (HP Labs)" <john.l.byrne@hp.com> wrote:>> I don''t know what dom0 does while the page is paged in? Probably just >> throws an error code and carries on? In which case thinking about >> [prepare_]wait_on_xen_event_channel() further would not be necessary, >> as a dom0 code path can just notify on teh channel and carry on its >> way. >> > > That is the way it is supposed to work, but there seems to be a bit of a > disconnect between grant-table code and the driver mods to handle the errors. > The driver mods expect a new error code GNTST_eagain, but the grant_table code > returns ENOENT in a couple of places. There were also a couple of > "GNTxxx_can_fail" flags added to grant-table interface, but they are unused. > So things don''t look quite finished.Yeah, I think that is pretty much the situation. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Byrne, John (HP Labs)
2010-Jun-03 01:12 UTC
RE: [Xen-devel] [PATCH[ event_lock not initialized in the idle domain (permitted actions in a tasklet?)
Keir, Sorry this dropped into a hole for a month. I''ve attached a patches against xen-4.0-testing 21172:6e77bf0852ec. It fixes the hangs related to the notify_xen_via_event_channel() by adding the domain argument. The tasklet is deleted because it seems to provide no value. With a couple of minor tweaks, I got paging to work against the 2.6.18 kernel --- the driver changes are not in the pvops kernel --- but there is some actual design work needed to do things right. (The grant failure/retry and the paging daemon need thought.) I''d certainly be happy to talk about things if someone cares and perhaps do the actual work, if time permits. Signed-off-by: John Byrne <john.l.byrne@hp.com>> -----Original Message----- > From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] > Sent: Monday, April 26, 2010 10:49 PM > To: Byrne, John (HP Labs) > Cc: xen-devel@lists.xensource.com > Subject: Re: [Xen-devel] event_lock not initialized in the idle domain > (permitted actions in a tasklet?) > > On 27/04/2010 00:01, "Byrne, John (HP Labs)" <john.l.byrne@hp.com> > wrote: > > >> I don''t know what dom0 does while the page is paged in? Probably > just > >> throws an error code and carries on? In which case thinking about > >> [prepare_]wait_on_xen_event_channel() further would not be > necessary, > >> as a dom0 code path can just notify on teh channel and carry on its > >> way. > >> > > > > That is the way it is supposed to work, but there seems to be a bit > of a > > disconnect between grant-table code and the driver mods to handle the > errors. > > The driver mods expect a new error code GNTST_eagain, but the > grant_table code > > returns ENOENT in a couple of places. There were also a couple of > > "GNTxxx_can_fail" flags added to grant-table interface, but they are > unused. > > So things don''t look quite finished. > > Yeah, I think that is pretty much the situation. > > -- Keir >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Patrick Colp
2010-Jun-04 14:37 UTC
Re: [Xen-devel] [PATCH[ event_lock not initialized in the idle domain (permitted actions in a tasklet?)
Hi, Sorry I missed this thread the first time around. I actually just ran into this same bug myself while trying to further/solidify the mem_paging/mem_event stuff. And, in fact, came to all the same conclusions and even the exact same fix. I was going to clean it up and push it out, but it looks like now I don''t have to :) I''d be happy for any suggestions you have about how to improve things. I''m fully aware the grant table code likely doesn''t work. Around the time we were prepping the mem_paging/mem_sharing patches for submittal, there had been a big change to grant table stuff and I didn''t have access to an ept box at the time. However, that is no longer the case and I would be more than happy to look at any and all issues you''ve been having. Patrick On 2 June 2010 18:12, Byrne, John (HP Labs) <john.l.byrne@hp.com> wrote:> Keir, > > Sorry this dropped into a hole for a month. > > I''ve attached a patches against xen-4.0-testing 21172:6e77bf0852ec. It fixes the hangs related to the notify_xen_via_event_channel() by adding the domain argument. The tasklet is deleted because it seems to provide no value. > > With a couple of minor tweaks, I got paging to work against the 2.6.18 kernel --- the driver changes are not in the pvops kernel --- but there is some actual design work needed to do things right. (The grant failure/retry and the paging daemon need thought.) I''d certainly be happy to talk about things if someone cares and perhaps do the actual work, if time permits. > > Signed-off-by: John Byrne <john.l.byrne@hp.com> > >> -----Original Message----- >> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >> Sent: Monday, April 26, 2010 10:49 PM >> To: Byrne, John (HP Labs) >> Cc: xen-devel@lists.xensource.com >> Subject: Re: [Xen-devel] event_lock not initialized in the idle domain >> (permitted actions in a tasklet?) >> >> On 27/04/2010 00:01, "Byrne, John (HP Labs)" <john.l.byrne@hp.com> >> wrote: >> >> >> I don''t know what dom0 does while the page is paged in? Probably >> just >> >> throws an error code and carries on? In which case thinking about >> >> [prepare_]wait_on_xen_event_channel() further would not be >> necessary, >> >> as a dom0 code path can just notify on teh channel and carry on its >> >> way. >> >> >> > >> > That is the way it is supposed to work, but there seems to be a bit >> of a >> > disconnect between grant-table code and the driver mods to handle the >> errors. >> > The driver mods expect a new error code GNTST_eagain, but the >> grant_table code >> > returns ENOENT in a couple of places. There were also a couple of >> > "GNTxxx_can_fail" flags added to grant-table interface, but they are >> unused. >> > So things don''t look quite finished. >> >> Yeah, I think that is pretty much the situation. >> >> -- Keir >> > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel