Hi all,

I just found an xc_map_foreign_range() problem in Xen.

xc_map_foreign_range(), a libxc API backed by privcmd (a Xen kernel
module), can be used to mmap guest VM memory into dom0. However, if we
mmap more than 160 pages, the mapping fails.

Internally, xc_map_foreign_range() uses ioctl() to communicate with
privcmd. There are two ioctl commands: IOCTL_PRIVCMD_MMAPBATCH (legacy)
and IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them consistently
return 0 (success), but the mappings fail after 160 pages have been
mapped.

First, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH
was the only available ioctl command: rc = ioctl(fd,
IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages, subsequent
ioctl calls would set the ioctlx.arr[] items to 140737344202616,
overwriting the original pfn numbers.

Then I updated my Linux kernel to 3.8 so as to test
IOCTL_PRIVCMD_MMAPBATCH_V2: rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2,
&ioctlx). This time, the post-160 ioctl calls would set the
ioctlx.err[] items to EINVAL.

Although I have inserted printk() calls in privcmd.c to track its
execution flow, the result showed what looks like a complete path,
which is quite weird. I have no idea what is happening in privcmd.

Can anyone figure out this problem?

Thanks,

Guanglin
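[For context, a minimal sketch of the kind of call being discussed,
using the libxc API named above. The domain id, starting frame number
and page count are placeholders rather than the poster's actual values;
with more than 160 pages this is where the reported failure would show
up.]

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <xenctrl.h>

#define PAGE_SIZE 4096UL

int main(void)
{
    uint32_t domid = 1;            /* placeholder guest domain id */
    unsigned long start_frame = 0; /* placeholder starting frame number */
    unsigned long npages = 200;    /* more than the 160 pages that reportedly work */

    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    if (!xch) {
        perror("xc_interface_open");
        return 1;
    }

    /* Map npages pages of the guest, read-only, into this process. */
    void *mem = xc_map_foreign_range(xch, domid, npages * PAGE_SIZE,
                                     PROT_READ, start_frame);
    if (!mem)
        fprintf(stderr, "mapping %lu pages failed\n", npages);
    else
        munmap(mem, npages * PAGE_SIZE);

    xc_interface_close(xch);
    return 0;
}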
On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote:
> Hi all,
>
> I just found an xc_map_foreign_range() problem in Xen.
>
> xc_map_foreign_range(), a libxc API backed by privcmd (a Xen kernel
> module), can be used to mmap guest VM memory into dom0. However, if we
> mmap more than 160 pages, the mapping fails.
>
> Internally, xc_map_foreign_range() uses ioctl() to communicate with
> privcmd. There are two ioctl commands: IOCTL_PRIVCMD_MMAPBATCH
> (legacy) and IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them
> consistently return 0 (success), but the mappings fail after 160
> pages have been mapped.
>
> First, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH
> was the only available ioctl command: rc = ioctl(fd,
> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages,
> subsequent ioctl calls would set the ioctlx.arr[] items to
> 140737344202616, overwriting the original pfn numbers.
>
> Then I updated my Linux kernel to 3.8 so as to test
> IOCTL_PRIVCMD_MMAPBATCH_V2: rc = ioctl(fd,
> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 ioctl
> calls would set the ioctlx.err[] items to EINVAL.
>
> Although I have inserted printk() calls in privcmd.c to track its
> execution flow, the result showed what looks like a complete path,
> which is quite weird. I have no idea what is happening in privcmd.
>
> Can anyone figure out this problem?

I wouldn't be all that surprised if there was a hardcoded batch size
limit somewhere in either libxc or the privcmd driver.

If you map your memory in batches of e.g. 128 pages does it all work OK?

If you want to get to the bottom of the 160 page limit you'll probably
have to trace through the code looking for hardcoded sizes or limits on
sizes (e.g. of arrays), or perhaps integer indexes etc. which are too
small and are overflowing.

160 * the size of the various structs involved doesn't look to be all
that interesting (i.e. just over a page size boundary or something), but
that's the sort of direction I would start looking in.

If you can't spot it by eye then you'll likely have to instrument the
code paths with prints to try and track the progress of the initially
supplied buffer through to the hypercall etc.

Ian.
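[A rough illustration of the chunked mapping Ian suggests, reusing the
placeholder domain id and frame values from the earlier sketch: one
xc_map_foreign_range() call per 128-page chunk instead of a single
large call.]

#include <stdio.h>
#include <sys/mman.h>
#include <xenctrl.h>

#define PAGE_SIZE 4096UL
#define CHUNK     128      /* pages per xc_map_foreign_range() call */

/* Returns 0 if every chunk mapped, -1 on the first failure. The
 * mappings are intentionally left in place; a real caller would record
 * each returned pointer and munmap() it later. */
static int map_in_chunks(xc_interface *xch, uint32_t domid,
                         unsigned long start_frame, unsigned long npages)
{
    for (unsigned long done = 0; done < npages; done += CHUNK) {
        unsigned long n = (npages - done < CHUNK) ? (npages - done) : CHUNK;

        void *m = xc_map_foreign_range(xch, domid, n * PAGE_SIZE,
                                       PROT_READ, start_frame + done);
        if (!m) {
            fprintf(stderr, "chunk at frame %lu (%lu pages) failed\n",
                    start_frame + done, n);
            return -1;
        }
    }
    return 0;
}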
2013/7/30 Ian Campbell <Ian.Campbell@citrix.com>:
> On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote:
>> Hi all,
>>
>> I just found an xc_map_foreign_range() problem in Xen.
>>
>> xc_map_foreign_range(), a libxc API backed by privcmd (a Xen kernel
>> module), can be used to mmap guest VM memory into dom0. However, if we
>> mmap more than 160 pages, the mapping fails.
>>
>> Internally, xc_map_foreign_range() uses ioctl() to communicate with
>> privcmd. There are two ioctl commands: IOCTL_PRIVCMD_MMAPBATCH
>> (legacy) and IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them
>> consistently return 0 (success), but the mappings fail after 160
>> pages have been mapped.
>>
>> First, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH
>> was the only available ioctl command: rc = ioctl(fd,
>> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages,
>> subsequent ioctl calls would set the ioctlx.arr[] items to
>> 140737344202616, overwriting the original pfn numbers.
>>
>> Then I updated my Linux kernel to 3.8 so as to test
>> IOCTL_PRIVCMD_MMAPBATCH_V2: rc = ioctl(fd,
>> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 ioctl
>> calls would set the ioctlx.err[] items to EINVAL.
>>
>> Although I have inserted printk() calls in privcmd.c to track its
>> execution flow, the result showed what looks like a complete path,
>> which is quite weird. I have no idea what is happening in privcmd.
>>
>> Can anyone figure out this problem?
>
> I wouldn't be all that surprised if there was a hardcoded batch size
> limit somewhere in either libxc or the privcmd driver.

Hi Ian,

Thank you very much for your reply.

I can confirm that it's a problem in privcmd, because I have debugged
libxc and narrowed the problem down to ioctl(), where privcmd sets
ioctlx.arr[] (IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[]
(IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the failure.

>
> If you map your memory in batches of e.g. 128 pages does it all work OK?

Yes. For example, [0-127] succeeds. However, the subsequent [128-255]
fails because the size of the whole region has exceeded 160 pages.

>
> If you want to get to the bottom of the 160 page limit you'll probably
> have to trace through the code looking for hardcoded sizes or limits on
> sizes (e.g. of arrays), or perhaps integer indexes etc. which are too
> small and are overflowing.
>
> 160 * the size of the various structs involved doesn't look to be all
> that interesting (i.e. just over a page size boundary or something), but
> that's the sort of direction I would start looking in.
>
> If you can't spot it by eye then you'll likely have to instrument the
> code paths with prints to try and track the progress of the initially
> supplied buffer through to the hypercall etc.

Yes. I have been debugging libxc and instrumenting privcmd, but I
couldn't find the hardcoded limit or any other limiting code.

The code in libxc is easier to trace, but privcmd is hard for me
because I lack kernel development experience. I couldn't even find
where privcmd copy_to_user()s or put_user()s the ioctlx.err[] or
ioctlx.arr[] items, even though the execution path of privcmd_ioctl()
looks complete according to my printk() output.

Do you have any idea how privcmd could set ioctlx.err[] in another way?

>
> Ian.
>
>
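[For reference, a sketch of roughly what libxc does underneath
xc_map_foreign_range() when it uses the V2 interface, showing where the
per-page ioctlx.err[] values mentioned above come from: mmap a region
backed by the privcmd device, then ask privcmd to populate it. The
device path, domain id, frame numbers and exact header locations are
assumptions and may differ between Xen and kernel versions.]

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <xenctrl.h>           /* for xen_pfn_t etc. */
#include <xen/sys/privcmd.h>   /* privcmd_mmapbatch_v2_t, IOCTL_PRIVCMD_MMAPBATCH_V2;
                                  header location varies by Xen/kernel version */

#define NPAGES    200
#define PAGE_SIZE 4096UL

int main(void)
{
    int fd = open("/dev/xen/privcmd", O_RDWR);  /* or /proc/xen/privcmd */
    if (fd < 0) { perror("open privcmd"); return 1; }

    /* Reserve a virtual address range backed by the privcmd device. */
    void *addr = mmap(NULL, NPAGES * PAGE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    xen_pfn_t arr[NPAGES];
    int err[NPAGES];
    for (int i = 0; i < NPAGES; i++)
        arr[i] = i;            /* placeholder guest frame numbers */
    memset(err, 0, sizeof(err));

    privcmd_mmapbatch_v2_t ioctlx = {
        .num  = NPAGES,
        .dom  = 1,             /* placeholder domain id */
        .addr = (unsigned long)addr,
        .arr  = arr,
        .err  = err,
    };

    int rc = ioctl(fd, IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx);
    printf("ioctl rc=%d\n", rc);

    /* The symptom described in this thread: rc can be 0 while pages past
     * the 160th are flagged individually in err[]. */
    for (int i = 0; i < NPAGES; i++)
        if (err[i])
            printf("page %d: err=%d\n", i, err[i]);

    return 0;
}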
On Tue, 2013-07-30 at 07:55 -0400, Guanglin Xu wrote:
> 2013/7/30 Ian Campbell <Ian.Campbell@citrix.com>:
> > On Mon, 2013-07-29 at 16:16 -0400, Guanglin Xu wrote:
> >> Hi all,
> >>
> >> I just found an xc_map_foreign_range() problem in Xen.
> >>
> >> xc_map_foreign_range(), a libxc API backed by privcmd (a Xen kernel
> >> module), can be used to mmap guest VM memory into dom0. However, if we
> >> mmap more than 160 pages, the mapping fails.
> >>
> >> Internally, xc_map_foreign_range() uses ioctl() to communicate with
> >> privcmd. There are two ioctl commands: IOCTL_PRIVCMD_MMAPBATCH
> >> (legacy) and IOCTL_PRIVCMD_MMAPBATCH_V2 (newer). Both of them
> >> consistently return 0 (success), but the mappings fail after 160
> >> pages have been mapped.
> >>
> >> First, when my Linux kernel version was 3.5, IOCTL_PRIVCMD_MMAPBATCH
> >> was the only available ioctl command: rc = ioctl(fd,
> >> IOCTL_PRIVCMD_MMAPBATCH, &ioctlx). After mapping 160 pages,
> >> subsequent ioctl calls would set the ioctlx.arr[] items to
> >> 140737344202616, overwriting the original pfn numbers.
> >>
> >> Then I updated my Linux kernel to 3.8 so as to test
> >> IOCTL_PRIVCMD_MMAPBATCH_V2: rc = ioctl(fd,
> >> IOCTL_PRIVCMD_MMAPBATCH_V2, &ioctlx). This time, the post-160 ioctl
> >> calls would set the ioctlx.err[] items to EINVAL.
> >>
> >> Although I have inserted printk() calls in privcmd.c to track its
> >> execution flow, the result showed what looks like a complete path,
> >> which is quite weird. I have no idea what is happening in privcmd.
> >>
> >> Can anyone figure out this problem?
> >
> > I wouldn't be all that surprised if there was a hardcoded batch size
> > limit somewhere in either libxc or the privcmd driver.
>
> Hi Ian,
>
> Thank you very much for your reply.
>
> I can confirm that it's a problem in privcmd, because I have debugged
> libxc and narrowed the problem down to ioctl(), where privcmd sets
> ioctlx.arr[] (IOCTL_PRIVCMD_MMAPBATCH) or ioctlx.err[]
> (IOCTL_PRIVCMD_MMAPBATCH_V2) to indicate the failure.
>
> >
> > If you map your memory in batches of e.g. 128 pages does it all work OK?
>
> Yes. For example, [0-127] succeeds. However, the subsequent [128-255]
> fails because the size of the whole region has exceeded 160 pages.

That's quite interesting.

> >
> > If you want to get to the bottom of the 160 page limit you'll probably
> > have to trace through the code looking for hardcoded sizes or limits on
> > sizes (e.g. of arrays), or perhaps integer indexes etc. which are too
> > small and are overflowing.
> >
> > 160 * the size of the various structs involved doesn't look to be all
> > that interesting (i.e. just over a page size boundary or something), but
> > that's the sort of direction I would start looking in.
> >
> > If you can't spot it by eye then you'll likely have to instrument the
> > code paths with prints to try and track the progress of the initially
> > supplied buffer through to the hypercall etc.
>
> Yes. I have been debugging libxc and instrumenting privcmd, but I
> couldn't find the hardcoded limit or any other limiting code.
>
> The code in libxc is easier to trace, but privcmd is hard for me
> because I lack kernel development experience. I couldn't even find
> where privcmd copy_to_user()s or put_user()s the ioctlx.err[] or
> ioctlx.arr[] items, even though the execution path of privcmd_ioctl()
> looks complete according to my printk() output.
>
> Do you have any idea how privcmd could set ioctlx.err[] in another way?

I'm not familiar with this code.
It sounds like this is most probably a kernel bug; I'd recommend taking
it to the xen-devel list, perhaps CCing Konrad Wilk and David Vrabel.

Ian.