''stable'' pages have always been a bit of a fiction.
It''s easy to
intentionally modify stable pages under io with some help from page
references that ignore mappings and page state.
Here''s little test that uses O_DIRECT to get the pinned aio ring pages
under IO and then has event completion stores modify them while they''re
in flight.
It''s a nice quick way to test the consequences of stable pages being
modified. It can be used to burp out ratelimited csum failure kernel
messages with btrfs, for example.
- z
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <assert.h>
#include <libaio.h>
int main(int argc, char **argv)
{
size_t total = 1 * 1024 * 1024;
size_t page_size = sysconf(_SC_PAGESIZE);
struct iovec *iov;
size_t iov_nr = total / page_size;
void *junk;
io_context_t ctx = NULL;
int nr_iocbs = 3;
struct iocb iocbs[nr_iocbs];
struct iocb *iocb_ptrs[nr_iocbs];
struct io_event events[nr_iocbs];
int ret;
int fd;
int nr;
int i;
if (argc != 2) {
fprintf(stderr, "usage: %s <file_to_overwrite>\n", argv[0]);
exit(1);
}
iov = calloc(iov_nr, sizeof(*iov));
junk = malloc(total);
assert(iov && junk);
fd = open(argv[1], O_RDWR|O_CREAT|O_DIRECT, 0644);
assert(fd >= 0);
ret = io_setup(nr_iocbs, &ctx);
assert(ret >= 0);
for (i = 0; i < iov_nr; i++) {
iov[i].iov_base = ctx;
iov[i].iov_len = page_size;
}
/* initial write to allocate the file region */
ret = writev(fd, iov, iov_nr);
assert(ret == total);
/*
* Keep one of each of these iocbs in flight:
*
* [0]: hopefully fast 0 byte read to keep churning events
* [1]: dio read of file bytes to trigger csum verification
* [2]: dio write of unstable event pages
*/
io_prep_pread(&iocbs[0], fd, junk, 0, 0);
io_prep_pread(&iocbs[1], fd, junk, total, 0);
io_prep_pwritev(&iocbs[2], fd, iov, iov_nr, 0);
for (i = 0; i < nr_iocbs; i++)
iocb_ptrs[i] = &iocbs[i];
nr = nr_iocbs;
for(;;) {
ret = io_submit(ctx, nr, iocb_ptrs);
assert(ret == nr);
nr = io_getevents(ctx, 1, nr_iocbs, events, NULL);
assert(nr > 0);
for (i = 0; i < nr; i++)
iocb_ptrs[i] = events[i].obj;
}
return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe
linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Zach Brown (2013-05-30 18:36:10)> ''stable'' pages have always been a bit of a fiction. It''s easy to > intentionally modify stable pages under io with some help from page > references that ignore mappings and page state. > > Here''s little test that uses O_DIRECT to get the pinned aio ring pages > under IO and then has event completion stores modify them while they''re > in flight. > > It''s a nice quick way to test the consequences of stable pages being > modified. It can be used to burp out ratelimited csum failure kernel > messages with btrfs, for example.Changing O_DIRECT in flight has always been a deep dark corner case, and crc errors are the expected result. Have you found anyone doing this in real life? I do like the small test program though, we should extend it into a test to make sure crcs are really crcing. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Changing O_DIRECT in flight has always been a deep dark corner case, and > crc errors are the expected result. Have you found anyone doing this in > real life?Agreed; and no, I haven''t heard of people accidentally modifying stable pages.> I do like the small test program though, we should extend it into a test > to make sure crcs are really crcing.Yeah, the little test is more to make sure that the error paths handle the modification case as expected :). - z -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote:> > Changing O_DIRECT in flight has always been a deep dark corner case, and > > crc errors are the expected result. Have you found anyone doing this in > > real life? > > Agreed; and no, I haven''t heard of people accidentally modifying stable > pages. >Windows does this, it also will send down the same page for different offsets which is why we have that special check in check_direct_IO for reads because that would cause lots of csum errors too. I tried to fix the modified in flight problem by checking the csums of the pages in the io completion handler and re-submitting the IO if the pages had changed, but this of course dramatically reduced performance for all of those well behaved O_DIRECT applications, so in the end I just set nodatasum for that vm image and carried on. I''m not sure what the solution is for this problem. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Josef Bacik (2013-05-31 09:29:07)> On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote: > > > Changing O_DIRECT in flight has always been a deep dark corner case, and > > > crc errors are the expected result. Have you found anyone doing this in > > > real life? > > > > Agreed; and no, I haven''t heard of people accidentally modifying stable > > pages. > > > > Windows does this, it also will send down the same page for different offsets > which is why we have that special check in check_direct_IO for reads because > that would cause lots of csum errors too. I tried to fix the modified in flight > problem by checking the csums of the pages in the io completion handler and > re-submitting the IO if the pages had changed, but this of course dramatically > reduced performance for all of those well behaved O_DIRECT applications, so in > the end I just set nodatasum for that vm image and carried on. I''m not sure > what the solution is for this problem. Thanks,Ugh, I forgot about the windows case. KVM should really be copying the pages for sectors where IO is already in flight. We could do the copies ourselves: mount -o dio_copies -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 31, 2013 at 07:53:29AM -0600, Chris Mason wrote:> Quoting Josef Bacik (2013-05-31 09:29:07) > > On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote: > > > > Changing O_DIRECT in flight has always been a deep dark corner case, and > > > > crc errors are the expected result. Have you found anyone doing this in > > > > real life? > > > > > > Agreed; and no, I haven''t heard of people accidentally modifying stable > > > pages. > > > > > > > Windows does this, it also will send down the same page for different offsets > > which is why we have that special check in check_direct_IO for reads because > > that would cause lots of csum errors too. I tried to fix the modified in flight > > problem by checking the csums of the pages in the io completion handler and > > re-submitting the IO if the pages had changed, but this of course dramatically > > reduced performance for all of those well behaved O_DIRECT applications, so in > > the end I just set nodatasum for that vm image and carried on. I''m not sure > > what the solution is for this problem. Thanks, > > Ugh, I forgot about the windows case. KVM should really be copying the > pages for sectors where IO is already in flight. > > We could do the copies ourselves: mount -o dio_copies >That would work for us, but what about other people that rely on stable pages, like *fs on iscsi and such? It might be good to have a generic mount option that the vfs notices and makes the copying happen before it gets to the file system and that way we''re all save and don''t have a different solution/mount option for each fs. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html