Kaushik Kumar Ram
2010-Jul-14 23:59 UTC
[Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
Is it necessary to use blkback_pagemap with blktap2? Since the use of blkback_pagemap is configurable I tried without it and my system crashed (crash dump attached below). Or is it a bug?

I am using about a month old xen-unstable.hg with linux-2.6.18-xen.hg (both 64 bit).

Thanks.
-Kaushik

(XEN) mm.c:889:d0 Error getting mfn 80765 (pfn 3fba6) from L1 entry 8000000080765027 for l1e_owner=0, pg_owner=0
(XEN) mm.c:5046:d0 ptwr_emulate: could not get_page_from_l1e()
Unable to handle kernel paging request at ffff8800388f6688 RIP:
 [<ffffffff803dc7d6>] blktap_map_uaddr_fn+0xa6/0xc0
PGD 1140067 PUD 1141067 PMD 1306067 PTE 80100000388f6065
Oops: 0003 [1] SMP
CPU 0
Modules linked in: e1000e sd_mod ata_piix libata thermal fan
Pid: 4183, comm: blkback.1.sda1 Not tainted 2.6.18.8-xen0 #40
RIP: e030:[<ffffffff803dc7d6>] [<ffffffff803dc7d6>] blktap_map_uaddr_fn+0xa6/0xc0
RSP: e02b:ffff880039d01840 EFLAGS: 00010297
RAX: 8000000080765027 RBX: ffff8800388f6688 RCX: ffff880039d01908
RDX: 00002b218a8d1000 RSI: ffff880001fb15d0 RDI: ffff8800388f6688
RBP: ffff880039d01850 R08: 00000000000388f6 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000002c8 R12: ffff8800388f6688
R13: 00002b218a8d1000 R14: 00002b218a8d2000 R15: ffff88003890e2a0
FS: 00002af9674c06e0(0000) GS:ffffffff8058c000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process blkback.1.sda1 (pid: 4183, threadinfo ffff880039d00000, task ffff88003e8cf080)
Stack: ffff8800388f6688 ffff880001fb15d0 ffff880039d018f0 ffffffff80270033
 0000001a00000039 ffff880039d01908 ffffffff803dc730 ffff88003a714080
 ffff8800389802b0 00002b218a8d2000 00002b218a8d2000 ffff88003c03b430
Call Trace:
 [<ffffffff80270033>] apply_to_page_range+0x4e3/0x590
 [<ffffffff803dc730>] blktap_map_uaddr_fn+0x0/0xc0
 [<ffffffff803dac01>] blktap_map_uaddr+0x21/0x30
 [<ffffffff803db70c>] blktap_device_do_request+0x67c/0xfe0
 [<ffffffff8023f36c>] __mod_timer+0xbc/0xe0
 [<ffffffff802088b0>] __switch_to+0x370/0x5b0
 [<ffffffff8023f1dc>] lock_timer_base+0x2c/0x60
 [<ffffffff8023f9c6>] del_timer+0x56/0x70
 [<ffffffff80344715>] __generic_unplug_device+0x25/0x30
 [<ffffffff803459d0>] generic_unplug_device+0x20/0x60
 [<ffffffff803d3196>] unplug_queue+0x26/0x50
 [<ffffffff803d3dea>] blkif_schedule+0x55a/0x690
 [<ffffffff803d3890>] blkif_schedule+0x0/0x690
 [<ffffffff8024b12a>] kthread+0xda/0x110
 [<ffffffff8020a428>] child_rip+0xa/0x12
 [<ffffffff8024b050>] kthread+0x0/0x110

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Shriram Rajagopalan
2010-Jul-15 18:19 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
IIRC, during my early experiments with blkback and blktap2, I hit a similar error. Tracing through the code, I gathered that the pagemap code is used to manage page grants to the blktap2 kernel driver. So the #else (i.e. !BLKBK_PAGEMAP) code is not going to work.

I suggest you look at blkback_pagemap.c and blktap2/device.c (or something like that) to get a better picture.

On Wed, Jul 14, 2010 at 4:59 PM, Kaushik Kumar Ram <kaushik@rice.edu> wrote:
> Is it necessary to use blkback_pagemap with blktap2? Since the use of
> blkback_pagemap is configurable I tried without it and my system crashed
> (crash dump attached below). Or is it a bug?
>
> I am using about a month old xen-unstable.hg with linux-2.6.18-xen.hg
> (both 64 bit).
>
> Thanks.
> -Kaushik

--
perception is but an offspring of its own self
Kaushik Kumar Ram
2010-Jul-15 19:02 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
On Jul 15, 2010, at 1:19 PM, Shriram Rajagopalan wrote:
> IIRC during my early experiments with blkback & blktap2, I hit a similar
> error.. tracing through the code, I gathered that the pagemap stuff is
> used to manage page grants to blktap2 kernel driver. So, the #else
> (ie !BLKBK_PAGEMAP) code is not going to work.
> I suggest, you try to look at the blkback_pagemap.c and the
> blktap2/device.c or something like that to get a better picture.

Thanks Shriram. I have been looking at the code over the past few days. Since I am not familiar with the Linux block I/O layers, it's taking a lot of time!

It seems that when CONFIG_XEN_BLKBACK_PAGEMAP is enabled, the grant mechanism is also used to map guest pages into user space. This means the guest pages are mapped twice using the grant mechanism: first into dom0 kernel space (in blkback/blkback.c) and then into the tapdisk process's address space (in blktap2/device.c). This is the new implementation of blkback.

When CONFIG_XEN_BLKBACK_PAGEMAP is disabled, the code falls back on the old implementation. Here, the guest pages are mapped into user space by directly manipulating the page tables, without going through the grant mechanism. (Things seem slightly different when XENFEAT_auto_translated_physmap is set, but I will ignore that for now.)

First, does the old way still work? The problem seems to arise when the page table entry is set in blktap_map_uaddr_fn() (in blktap2/device.c). This leads to a page fault that Xen does not seem to handle correctly, and the result is a panic. Why is the page table entry set directly here, without using a hypercall?

Any further explanation will be much appreciated.

Thanks.
-Kaushik
Shriram Rajagopalan
2010-Jul-15 22:06 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
On Thu, Jul 15, 2010 at 12:02 PM, Kaushik Kumar Ram <kaushik@rice.edu> wrote:
> It seems that when CONFIG_XEN_BLKBACK_PAGEMAP is enabled, the grant
> mechanism is also used to map guest pages into user space. This means the
> guest pages are mapped twice using the grant mechanism: first into dom0
> kernel space (in blkback/blkback.c) and then into the tapdisk process's
> address space (in blktap2/device.c). This is the new implementation of
> blkback.

Yep.

> When CONFIG_XEN_BLKBACK_PAGEMAP is disabled, the code falls back on the
> old implementation. Here, the guest pages are mapped into user space by
> directly manipulating the page tables, without going through the grant
> mechanism. (Things seem slightly different when
> XENFEAT_auto_translated_physmap is set, but I will ignore that for now.)

IIRC, XENFEAT_auto_translated_physmap is kind of deprecated. It was used in Xen 3.1 or so, I guess. (Basically, it makes pfn = mfn, instead of the current style with p2m and m2p tables.)

> First, does the old way still work?

AFAIK, no. I am not sure whether some other config option needs to be set to get that old code to work. It looks like dead code to me. I cannot figure out the "backward compatibility" angle to it either.

> The problem seems to arise when the page table entry is set in
> blktap_map_uaddr_fn() (in blktap2/device.c). This leads to a page fault
> that Xen does not seem to handle correctly, and the result is a panic.
> Why is the page table entry set directly here, without using a hypercall?
>
> Any further explanation will be much appreciated.
>
> Thanks.
> -Kaushik

--
perception is but an offspring of its own self
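The remark above that auto-translation "makes pfn = mfn" instead of going through p2m and m2p tables can be illustrated with a toy model. This is only a sketch under stated assumptions: the table contents, the four-page size, and every `toy_` name are hypothetical and are not taken from the Xen or Linux sources.

```c
#include <stdbool.h>

/* Hypothetical toy model: four guest pages backed by machine frames 4..7. */
#define TOY_NR_PAGES 4
#define TOY_MFN_BASE 4

/* p2m: indexed by pseudo-physical frame number (pfn), yields the machine
 * frame number (mfn). */
static const unsigned long toy_p2m[TOY_NR_PAGES] = { 7, 5, 6, 4 };

/* m2p: indexed by (mfn - TOY_MFN_BASE), yields the pfn. It is the inverse
 * of toy_p2m. */
static const unsigned long toy_m2p[TOY_NR_PAGES] = { 3, 1, 2, 0 };

/* Translate pfn -> mfn. With auto-translation the guest sees pfn == mfn,
 * so no table lookup is needed. Caller must pass pfn < TOY_NR_PAGES. */
static unsigned long toy_pfn_to_mfn(unsigned long pfn, bool auto_translated)
{
    if (auto_translated)
        return pfn;
    return toy_p2m[pfn];
}

/* Translate mfn -> pfn. Caller must pass a valid mfn when the tables are
 * in use (TOY_MFN_BASE <= mfn < TOY_MFN_BASE + TOY_NR_PAGES). */
static unsigned long toy_mfn_to_pfn(unsigned long mfn, bool auto_translated)
{
    if (auto_translated)
        return mfn;
    return toy_m2p[mfn - TOY_MFN_BASE];
}
```

In the table-driven mode a round trip through both functions returns the original pfn; in the auto-translated mode both functions are the identity.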
Daniel Stodden
2010-Jul-16 20:16 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
On Thu, 2010-07-15 at 18:06 -0400, Shriram Rajagopalan wrote:
> On Thu, Jul 15, 2010 at 12:02 PM, Kaushik Kumar Ram <kaushik@rice.edu>
> wrote:
> > It seems like on enabling CONFIG_XEN_BLKBACK_PAGEMAP the grant
> > mechanism is used to map guest pages into user space too. This
> > means the guest pages are mapped twice using the grant
> > mechanism, first into dom0 kernel space (in blkback/blkback.c)
> > and then into tapdisk process's address space (in
> > blktap2/device.c). This is the new implementation of blkback.
>
> yep..

Yes, it's pretty much mandatory. It's needed to map foreign frames which have been mapped by blkback back to their grants.

I guess the Kconfigs should reflect that. Didn't expect that it's just set to optional anywhere.

The reason for the duplicate mapping is that userspace has to re-queue those frames at the physical device layer, and -- iirc -- the problem was that queuing pages twice, once on the blktap2 bdev and once on the underlying disk, will deadlock.

So the second grant map basically creates an alias under a second pfn. One page, appearing locally as two separate frames. Not exactly beautiful, but effective.

> > On disabling CONFIG_XEN_BLKBACK_PAGEMAP, the code falls back on
> > the old implementation. Here, the guest pages are mapped into
> > user space by directly manipulating the page tables without
> > going through the grant mechanism. (Things seem slightly
> > different when XENFEAT_auto_translated_physmap is set but I
> > will ignore that for now).
>
> IIRC, that XENFEAT_auto_translated_physmap is kinda deprecated.. it
> was used in xen 3.1 or so I guess.. (basically, it makes pfn = mfn,
> instead of the current style : p2m & m2p tables)

Yes. That code has been there forever and then got carried over from blktap1 to blktap2, basically as-is. Even to pvops, where it's probably broken. Empirical proof that nobody is using blktap2 with autotranslation, at least not on recent kernels.

I guess it's going to stay there until autotranslation either gets more en vogue again or evaporates altogether.

Cheers,
Daniel
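The role described above, mapping a foreign frame that blkback has already mapped back to the grant it came from, amounts to a small lookup table. The following is a minimal sketch of that idea only. The structure layout, the fixed slot count, and all function names are hypothetical and are not taken from blkback_pagemap.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy pagemap: one slot per locally mapped foreign page. */
#define PAGEMAP_SLOTS 8
#define GREF_INVALID  UINT32_MAX   /* marks an empty slot */

struct grant_origin {
    uint16_t domid;   /* domain that granted the frame */
    uint32_t gref;    /* grant reference within that domain */
};

static struct grant_origin pagemap[PAGEMAP_SLOTS];

/* Mark every slot empty. Must be called before any other function. */
static void pagemap_init(void)
{
    for (size_t i = 0; i < PAGEMAP_SLOTS; i++)
        pagemap[i].gref = GREF_INVALID;
}

/* Record which grant backs the page in `slot`.
 * Returns 0 on success, -1 if the slot is out of range, already in use,
 * or the grant reference is the reserved invalid value. */
static int pagemap_set(size_t slot, uint16_t domid, uint32_t gref)
{
    if (slot >= PAGEMAP_SLOTS || gref == GREF_INVALID)
        return -1;
    if (pagemap[slot].gref != GREF_INVALID)
        return -1;
    pagemap[slot].domid = domid;
    pagemap[slot].gref = gref;
    return 0;
}

/* Forget the grant once the page is unmapped. */
static void pagemap_clear(size_t slot)
{
    if (slot < PAGEMAP_SLOTS)
        pagemap[slot].gref = GREF_INVALID;
}

/* Look up the grant behind a page, so a second mapping of the same grant
 * can be created elsewhere. Returns NULL if nothing is recorded. */
static const struct grant_origin *pagemap_lookup(size_t slot)
{
    if (slot >= PAGEMAP_SLOTS || pagemap[slot].gref == GREF_INVALID)
        return NULL;
    return &pagemap[slot];
}
```

In this model the first mapper records (domid, gref) when it maps a page, and the second mapper calls the lookup to find out which grant to map again under a different local frame.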
Jeremy Fitzhardinge
2010-Jul-17 00:53 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
On 07/16/2010 01:16 PM, Daniel Stodden wrote:
> Yes. That code has been there forever and then got carried over from
> blktap1 to blktap2, basically as-is. Even to pvops, where it's probably
> broken. Empirical proof that nobody is using blktap2 with
> autotranslation, at least not on recent kernels.
>
> I guess it's going to stay there until autotranslation either gets more
> en vogue again or evaporates altogether.

auto_translate_physmap will come back if people want to use EPT/NPT with PV guests.

    J
Daniel Stodden
2010-Jul-17 00:56 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
On Fri, 2010-07-16 at 20:53 -0400, Jeremy Fitzhardinge wrote:
> On 07/16/2010 01:16 PM, Daniel Stodden wrote:
> > I guess it's going to stay there until autotranslation either gets more
> > en vogue again or evaporates altogether.
>
> auto_translate_physmap will come back if people want to use EPT/NPT with
> PV guests.

Ah. I always thought that was going to be the replacement, rather than being synonymous in the kernel sources.

Now that you say it, it might make sense. :)

Daniel
Ian Campbell
2010-Jul-19 13:36 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
On Fri, 2010-07-16 at 21:16 +0100, Daniel Stodden wrote:
> The reason for the duplicate mapping is that userspace has to re-queue
> those frames at the physical device layer, and -- iirc -- the problem
> was that queuing pages twice, once on the blktap2 bdev and once on the
> underlying disk, will deadlock.

I was wondering what the duplicate mappings were for just last week.

So is this need to play tricks with the p2m to avoid a deadlock the only dependency blktap2 has on Xen? IOW, if we could find another way around the deadlock, would a) blktap2 be usable on native and/or b) all the Xen-specific bits (grant mappings etc.) be confined to blkback only?

I guess the difference between blktap and e.g. device mapper is that in the latter case the requeuing is done in the kernel, while in the former the page goes via userspace and hence the association with the original I/O is lost?

Ian.
Daniel Stodden
2010-Jul-19 16:53 UTC
Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
On Mon, 2010-07-19 at 09:36 -0400, Ian Campbell wrote:
> So is this need to play tricks with the p2m to avoid a deadlock the only
> dependency blktap2 has on Xen? IOW if we could find another way around
> the deadlock would a) blktap2 be usable on native and/or b) would all
> the Xen specific bits (grant mappings etc) be confined to blkback only?

[cc Jake. Did most of the mapping code, and still the one who knows best what prevents that path from getting simpler.]

Both the xen and native datapaths are presently inlined in the same disk type. The solution to that would be an ops struct to separate the handling. But that's certainly not a hard problem.

Apart from that, I believe native was more of a problem than blkback.

Only from memory: consider non-foreign r/w in dom0. There's going to be a page lock preceding queuing on the tapdev, and a second lock attempt on the path from tapdisk to the physical device, because what userland is sending down the native I/O path is sold as normal user memory.

So it's probably rather a tribute to zero-copy than anything else. The problem might evaporate if the physical I/O were bounced off anon memory. That might be one possible alternative.

Note that the blkback path is different, because it directly goes for the disk queue, not through the filemap. I'd expect that to just work.

> I guess the difference between blktap and e.g. device mapper is that in
> the latter case the requeuing is done in the kernel and in the former the
> page goes via userspace and hence the association with the original I/O
> is lost?

Yep.

I think another difference was that dm nodes only do request translation, then just pass them on to the physical layer. So dm nodes are rather thin compared to a tapdev. But that might not matter here.

Daniel
> -----Original Message-----
> From: Daniel Stodden
> Sent: Monday, July 19, 2010 9:53 AM
> To: Ian Campbell
> Cc: Shriram Rajagopalan; Kaushik Kumar Ram; xen-devel@lists.xensource.com;
> Jake Wires
> Subject: Re: [Xen-devel] blktap2 and CONFIG_XEN_BLKBACK_PAGEMAP
>
> Apart from that, I believe native was more of a problem than blkback.
>
> Only from memory: consider non-foreign r/w in dom0. There's going to
> be a page lock preceding queuing on the tapdev, and a second lock
> attempt on the path from tapdisk to the physical device, because what
> userland is sending down the native I/O path is sold as normal user
> memory.
>
> So it's probably rather a tribute to zero-copy than anything else. The
> problem might evaporate if the physical I/O were bounced off anon
> memory. That might be one possible alternative.

Daniel is correct -- in the non-Xen case, the blktap mapping is used merely to give us a new (unlocked) page struct that tapdisk can send back down through the I/O stack. We could do away with this in the non-Xen case if we give up zero-copy. blktap would still need to use Xen to map foreign pages to tapdisk.
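The trade-off discussed above, a second lock attempt on a page that is already locked versus bouncing the data through separate memory, can be shown with a toy model. This is a sketch only: real page locking sleeps rather than failing, and every `toy_` name, the page size, and the single boolean lock are assumptions made for illustration, not kernel code.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-in for a page with a single lock bit. */
struct toy_page {
    bool locked;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Try to take the page lock. A real kernel would sleep here until the lock
 * is released; the toy model reports failure so the deadlock is visible. */
static bool toy_trylock(struct toy_page *p)
{
    if (p->locked)
        return false;
    p->locked = true;
    return true;
}

static void toy_unlock(struct toy_page *p)
{
    p->locked = false;
}

/* Lower layer (the "physical disk" queue): needs the lock on whatever page
 * it is handed for the duration of the I/O. */
static bool toy_submit_lower(struct toy_page *p)
{
    if (!toy_trylock(p))
        return false;          /* would block forever in the real case */
    toy_unlock(p);
    return true;
}

/* Upper layer forwards the very same page while still holding its lock:
 * the lower layer's lock attempt cannot succeed. */
static bool toy_forward_same_page(struct toy_page *orig)
{
    bool ok;

    if (!toy_trylock(orig))
        return false;
    ok = toy_submit_lower(orig);
    toy_unlock(orig);
    return ok;
}

/* Upper layer copies the payload into a separate, unlocked page first.
 * This gives up zero-copy but hands the lower layer a different page. */
static bool toy_forward_bounced(struct toy_page *orig)
{
    struct toy_page bounce = { .locked = false };
    bool ok;

    if (!toy_trylock(orig))
        return false;
    memcpy(bounce.data, orig->data, TOY_PAGE_SIZE);
    ok = toy_submit_lower(&bounce);
    toy_unlock(orig);
    return ok;
}
```

The alias created by the second mapping plays the same role as the bounce page here, a distinct and unlocked page object, but without the copy. A read path using a bounce page would additionally have to copy the data back into the original page after the I/O completes.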
>>> On 15.07.10 at 21:02, Kaushik Kumar Ram <kaushik@rice.edu> wrote:
> It seems like on enabling CONFIG_XEN_BLKBACK_PAGEMAP the grant mechanism
> is used to map guest pages into user space too. This means the guest pages
> are mapped twice using the grant mechanism, first into dom0 kernel space
> (in blkback/blkback.c) and then into tapdisk process's address space (in
> blktap2/device.c). This is the new implementation of blkback.
>
> On disabling CONFIG_XEN_BLKBACK_PAGEMAP, the code falls back on the old
> implementation. Here, the guest pages are mapped into user space by
> directly manipulating the page tables without going through the grant
> mechanism. (Things seem slightly different when
> XENFEAT_auto_translated_physmap is set but I will ignore that for now).
> First, does the old way still work? The problem seems to arise when the
> page table entry is set in blktap_map_uaddr_fn() (in blktap2/device.c)
> which leads to a page fault that Xen does not seem to handle correctly,
> and this results in a panic. Why is the page table entry set directly
> without using a hypercall here?
>
> Any further explanation will be much appreciated.

How could you have disabled XEN_BLKBACK_PAGEMAP in the first place? It's a prompt-less option after all (for the very reason that it's not optional).

Jan