''stable'' pages have always been a bit of a fiction. 
It''s easy to
intentionally modify stable pages under io with some help from page
references that ignore mappings and page state.
Here''s little test that uses O_DIRECT to get the pinned aio ring pages
under IO and then has event completion stores modify them while they''re
in flight.
It''s a nice quick way to test the consequences of stable pages being
modified.  It can be used to burp out ratelimited csum failure kernel
messages with btrfs, for example.
- z
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <assert.h>
#include <libaio.h>
int main(int argc, char **argv)
{
	size_t total = 1 * 1024 * 1024;
	size_t page_size = sysconf(_SC_PAGESIZE);
	struct iovec *iov;
	size_t iov_nr = total / page_size;
	void *junk;
	io_context_t ctx = NULL;
	int nr_iocbs = 3;
	struct iocb iocbs[nr_iocbs];
	struct iocb *iocb_ptrs[nr_iocbs];
	struct io_event events[nr_iocbs];
	int ret;
	int fd;
	int nr;
	int i;
	if (argc != 2) {
		fprintf(stderr, "usage: %s <file_to_overwrite>\n", argv[0]);
		exit(1);
	}
	iov = calloc(iov_nr, sizeof(*iov));
	junk = malloc(total);
	assert(iov && junk);
	fd = open(argv[1], O_RDWR|O_CREAT|O_DIRECT, 0644);
	assert(fd >= 0);
	ret = io_setup(nr_iocbs, &ctx);
	assert(ret >= 0);
	for (i = 0; i < iov_nr; i++) {
		iov[i].iov_base = ctx;
		iov[i].iov_len = page_size;
	}
	/* initial write to allocate the file region */
	ret = writev(fd, iov, iov_nr);
	assert(ret == total);
	/*
	 * Keep one of each of these iocbs in flight:
	 *
	 * [0]: hopefully fast 0 byte read to keep churning events
	 * [1]: dio read of file bytes to trigger csum verification
	 * [2]: dio write of unstable event pages
	 */
	io_prep_pread(&iocbs[0], fd, junk, 0, 0);
	io_prep_pread(&iocbs[1], fd, junk, total, 0);
	io_prep_pwritev(&iocbs[2], fd, iov, iov_nr, 0);
	for (i = 0; i < nr_iocbs; i++)
		iocb_ptrs[i] = &iocbs[i];
	nr = nr_iocbs;
	for(;;) {
		ret = io_submit(ctx, nr, iocb_ptrs);
		assert(ret == nr);
		nr = io_getevents(ctx, 1, nr_iocbs, events, NULL);
		assert(nr > 0);
		for (i = 0; i < nr; i++)
			iocb_ptrs[i] = events[i].obj;
	}
	return 0;
}
--
To unsubscribe from this list: send the line "unsubscribe
linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Quoting Zach Brown (2013-05-30 18:36:10)> ''stable'' pages have always been a bit of a fiction. It''s easy to > intentionally modify stable pages under io with some help from page > references that ignore mappings and page state. > > Here''s little test that uses O_DIRECT to get the pinned aio ring pages > under IO and then has event completion stores modify them while they''re > in flight. > > It''s a nice quick way to test the consequences of stable pages being > modified. It can be used to burp out ratelimited csum failure kernel > messages with btrfs, for example.Changing O_DIRECT in flight has always been a deep dark corner case, and crc errors are the expected result. Have you found anyone doing this in real life? I do like the small test program though, we should extend it into a test to make sure crcs are really crcing. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Changing O_DIRECT in flight has always been a deep dark corner case, and > crc errors are the expected result. Have you found anyone doing this in > real life?Agreed; and no, I haven''t heard of people accidentally modifying stable pages.> I do like the small test program though, we should extend it into a test > to make sure crcs are really crcing.Yeah, the little test is more to make sure that the error paths handle the modification case as expected :). - z -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote:> > Changing O_DIRECT in flight has always been a deep dark corner case, and > > crc errors are the expected result. Have you found anyone doing this in > > real life? > > Agreed; and no, I haven''t heard of people accidentally modifying stable > pages. >Windows does this, it also will send down the same page for different offsets which is why we have that special check in check_direct_IO for reads because that would cause lots of csum errors too. I tried to fix the modified in flight problem by checking the csums of the pages in the io completion handler and re-submitting the IO if the pages had changed, but this of course dramatically reduced performance for all of those well behaved O_DIRECT applications, so in the end I just set nodatasum for that vm image and carried on. I''m not sure what the solution is for this problem. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Josef Bacik (2013-05-31 09:29:07)> On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote: > > > Changing O_DIRECT in flight has always been a deep dark corner case, and > > > crc errors are the expected result. Have you found anyone doing this in > > > real life? > > > > Agreed; and no, I haven''t heard of people accidentally modifying stable > > pages. > > > > Windows does this, it also will send down the same page for different offsets > which is why we have that special check in check_direct_IO for reads because > that would cause lots of csum errors too. I tried to fix the modified in flight > problem by checking the csums of the pages in the io completion handler and > re-submitting the IO if the pages had changed, but this of course dramatically > reduced performance for all of those well behaved O_DIRECT applications, so in > the end I just set nodatasum for that vm image and carried on. I''m not sure > what the solution is for this problem. Thanks,Ugh, I forgot about the windows case. KVM should really be copying the pages for sectors where IO is already in flight. We could do the copies ourselves: mount -o dio_copies -chris -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 31, 2013 at 07:53:29AM -0600, Chris Mason wrote:> Quoting Josef Bacik (2013-05-31 09:29:07) > > On Fri, May 31, 2013 at 12:24:30AM -0600, Zach Brown wrote: > > > > Changing O_DIRECT in flight has always been a deep dark corner case, and > > > > crc errors are the expected result. Have you found anyone doing this in > > > > real life? > > > > > > Agreed; and no, I haven''t heard of people accidentally modifying stable > > > pages. > > > > > > > Windows does this, it also will send down the same page for different offsets > > which is why we have that special check in check_direct_IO for reads because > > that would cause lots of csum errors too. I tried to fix the modified in flight > > problem by checking the csums of the pages in the io completion handler and > > re-submitting the IO if the pages had changed, but this of course dramatically > > reduced performance for all of those well behaved O_DIRECT applications, so in > > the end I just set nodatasum for that vm image and carried on. I''m not sure > > what the solution is for this problem. Thanks, > > Ugh, I forgot about the windows case. KVM should really be copying the > pages for sectors where IO is already in flight. > > We could do the copies ourselves: mount -o dio_copies >That would work for us, but what about other people that rely on stable pages, like *fs on iscsi and such? It might be good to have a generic mount option that the vfs notices and makes the copying happen before it gets to the file system and that way we''re all save and don''t have a different solution/mount option for each fs. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html