''stable'' pages have always been a bit of a fiction. It''s easy to intentionally modify stable pages under io with some help from page references that ignore mappings and page state. Here''s little test that uses O_DIRECT to get the pinned aio ring pages under IO and then has event completion stores modify them while they''re in flight. It''s a nice quick way to test the consequences of stable pages being modified. It can be used to burp out ratelimited csum failure kernel messages with btrfs, for example. - z #define _GNU_SOURCE #include <stdlib.h> #include <unistd.h> #include <stdio.h> #include <limits.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <sys/uio.h> #include <assert.h> #include <libaio.h> int main(int argc, char **argv) { size_t total = 1 * 1024 * 1024; size_t page_size = sysconf(_SC_PAGESIZE); struct iovec *iov; size_t iov_nr = total / page_size; void *junk; io_context_t ctx = NULL; int nr_iocbs = 3; struct iocb iocbs[nr_iocbs]; struct iocb *iocb_ptrs[nr_iocbs]; struct io_event events[nr_iocbs]; int ret; int fd; int nr; int i; if (argc != 2) { fprintf(stderr, "usage: %s <file_to_overwrite>\n", argv[0]); exit(1); } iov = calloc(iov_nr, sizeof(*iov)); junk = malloc(total); assert(iov && junk); fd = open(argv[1], O_RDWR|O_CREAT|O_DIRECT, 0644); assert(fd >= 0); ret = io_setup(nr_iocbs, &ctx); assert(ret >= 0); for (i = 0; i < iov_nr; i++) { iov[i].iov_base = ctx; iov[i].iov_len = page_size; } /* initial write to allocate the file region */ ret = writev(fd, iov, iov_nr); assert(ret == total); /* * Keep one of each of these iocbs in flight: * * [0]: hopefully fast 0 byte read to keep churning events * [1]: dio read of file bytes to trigger csum verification * [2]: dio write of unstable event pages */ io_prep_pread(&iocbs[0], fd, junk, 0, 0); io_prep_pread(&iocbs[1], fd, junk, total, 0); io_prep_pwritev(&iocbs[2], fd, iov, iov_nr, 0); for (i = 0; i < nr_iocbs; i++) iocb_ptrs[i] = &iocbs[i]; nr = nr_iocbs; for(;;) { ret = io_submit(ctx, nr, iocb_ptrs); assert(ret == nr); nr = io_getevents(ctx, 1, nr_iocbs, events, NULL); assert(nr > 0); for (i = 0; i < nr; i++) iocb_ptrs[i] = events[i].obj; } return 0; } -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Zach Brown (2013-05-30 18:36:10)> ''stable'' pages have always been a bit of a fiction. It''s easy to > intentionally modify stable pages under io with some help from page > references that ignore mappings and page state. > > Here''s little test that uses O_DIRECT to get the pinned aio ring pages > under IO and then has event completion stores modify them while they''re > in flight. > > It''s a nice quick way to test the consequences of stable pages being > modified. It can be used to burp out ratelimited csum failure kernel > messages with btrfs, for example.Changing O_DIRECT in flight has always been a deep dark corner case, and crc errors are the expected result. Have you found anyone doing this in real life? I do like the small test program though, we should extend it into a test to make sure crcs are really crcing. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Changing O_DIRECT in flight has always been a deep dark corner case, and > crc errors are the expected result. Have you found anyone doing this in > real life?Agreed; and no, I haven''t heard of people accidentally modifying stable pages.> I do like the small test program though, we should extend it into a test > to make sure crcs are really crcing.Yeah, the little test is more to make sure that the error paths handle the modification case as expected :). - z -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote:> > Changing O_DIRECT in flight has always been a deep dark corner case, and > > crc errors are the expected result. Have you found anyone doing this in > > real life? > > Agreed; and no, I haven''t heard of people accidentally modifying stable > pages. >Windows does this, it also will send down the same page for different offsets which is why we have that special check in check_direct_IO for reads because that would cause lots of csum errors too. I tried to fix the modified in flight problem by checking the csums of the pages in the io completion handler and re-submitting the IO if the pages had changed, but this of course dramatically reduced performance for all of those well behaved O_DIRECT applications, so in the end I just set nodatasum for that vm image and carried on. I''m not sure what the solution is for this problem. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Josef Bacik (2013-05-31 09:29:07)> On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote: > > > Changing O_DIRECT in flight has always been a deep dark corner case, and > > > crc errors are the expected result. Have you found anyone doing this in > > > real life? > > > > Agreed; and no, I haven''t heard of people accidentally modifying stable > > pages. > > > > Windows does this, it also will send down the same page for different offsets > which is why we have that special check in check_direct_IO for reads because > that would cause lots of csum errors too. I tried to fix the modified in flight > problem by checking the csums of the pages in the io completion handler and > re-submitting the IO if the pages had changed, but this of course dramatically > reduced performance for all of those well behaved O_DIRECT applications, so in > the end I just set nodatasum for that vm image and carried on. I''m not sure > what the solution is for this problem. Thanks,Ugh, I forgot about the windows case. KVM should really be copying the pages for sectors where IO is already in flight. We could do the copies ourselves: mount -o dio_copies -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 31, 2013 at 07:53:29AM -0600, Chris Mason wrote:> Quoting Josef Bacik (2013-05-31 09:29:07) > > On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote: > > > > Changing O_DIRECT in flight has always been a deep dark corner case, and > > > > crc errors are the expected result. Have you found anyone doing this in > > > > real life? > > > > > > Agreed; and no, I haven''t heard of people accidentally modifying stable > > > pages. > > > > > > > Windows does this, it also will send down the same page for different offsets > > which is why we have that special check in check_direct_IO for reads because > > that would cause lots of csum errors too. I tried to fix the modified in flight > > problem by checking the csums of the pages in the io completion handler and > > re-submitting the IO if the pages had changed, but this of course dramatically > > reduced performance for all of those well behaved O_DIRECT applications, so in > > the end I just set nodatasum for that vm image and carried on. I''m not sure > > what the solution is for this problem. Thanks, > > Ugh, I forgot about the windows case. KVM should really be copying the > pages for sectors where IO is already in flight. > > We could do the copies ourselves: mount -o dio_copies >That would work for us, but what about other people that rely on stable pages, like *fs on iscsi and such? It might be good to have a generic mount option that the vfs notices and makes the copying happen before it gets to the file system and that way we''re all save and don''t have a different solution/mount option for each fs. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html