Hi,

the $subj warning appears sometimes in syslog, in my case when
xfstests/209 runs looped. The minimal reproducer is looped mkfs+mount.

The message comes from disk-io.c btree_readpage_end_io_hook():

	if (check_tree_block_fsid(root, eb)) {
		printk_ratelimited(KERN_INFO "btrfs bad fsid on block %llu\n",
				   (unsigned long long)eb->start);
		ret = -EIO;
		goto err;
	}

Relevant syslog messages:

[420367.199710] device fsid dda1a3db-3106-4bb9-8ecf-2e823938d538 devid 1 transid 4 /dev/sda9
[420367.209438] btrfs: force lzo compression
[420367.214695] btrfs: enabling inode map caching
[420367.220404] btrfs: enabling auto defrag
[420367.224193] btrfs: disk space caching is enabled
[420367.323356] btrfs bad fsid on block 20971520
[420367.358349] btrfs bad fsid on block 20971520
[420367.368272] btrfs bad fsid on block 20971520
[420367.376239] btrfs bad fsid on block 20971520
[420367.381836] btrfs bad fsid on block 20971520
[420367.467332] btrfs bad fsid on block 20971520
[420367.473249] btrfs bad fsid on block 20971520
[420367.478649] btrfs: failed to read chunk root on sda9
[420367.487810] btrfs: open_ctree failed

and mount fails.

/proc/partitions:

   8        9   10485760 sda9

That is 10485760 * 1024 bytes, i.e. 2621440 4k blocks. The number 20971520
is not a block number but a byte offset, so the message may be confusing at
first. The real block number is 20971520 / 1024 = 20480 in the 1024-byte
blocks that the filemap below uses.

Used blocks on a freshly created device:

File size of test-10g is 10737418240 (10485760 blocks, blocksize 1024)
 ext     logical    physical   expected  length  flags
   0           0    10248192                2048
   1        4096    10252288   10250239      20
   2       20480    10293248   10252307       4
   3       28672    10301440   10293251       4
   4       36864    10260480   10301443      24
   5       65536    10264576   10260503       4
   6     1085440    10276864   10264579      24
   7    10483712      294912   10276887    2048  eof

It's "extent" nr. 2, not a superblock. The block obviously does not contain
the expected data, though it was submitted and was supposed to be written
by the mkfs step.
The question is where the update is lost -- in the block layer, or in the
disk's write cache. btrfs-debug-tree says it's:

chunk tree
leaf 20971520 items 6 free space 3283 generation 4 owner 3
fs uuid 024cd2e6-d584-493c-af81-fa3e2f548abb
chunk uuid 3f52ec70-89a9-4cd5-b5f0-177d7ae63de3
	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 3897 itemsize 98
		dev item devid 1 total_bytes 10737418240 bytes used 2185232384
	item 1 key (FIRST_CHUNK_TREE CHUNK_ITEM 0) itemoff 3817 itemsize 80
		chunk length 4194304 owner 2 type 2 num_stripes 1
			stripe 0 devid 1 offset 0
	item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 4194304) itemoff 3737 itemsize 80
		chunk length 8388608 owner 2 type 4 num_stripes 1
			stripe 0 devid 1 offset 4194304
	item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 12582912) itemoff 3657 itemsize 80
		chunk length 8388608 owner 2 type 1 num_stripes 1
			stripe 0 devid 1 offset 12582912
	item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 20971520) itemoff 3545 itemsize 112
		chunk length 8388608 owner 2 type 34 num_stripes 2
			stripe 0 devid 1 offset 20971520
			stripe 1 devid 1 offset 29360128
	item 5 key (FIRST_CHUNK_TREE CHUNK_ITEM 29360128) itemoff 3433 itemsize 112
		chunk length 1073741824 owner 2 type 36 num_stripes 2
			stripe 0 devid 1 offset 37748736
			stripe 1 devid 1 offset 1111490560

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Wed, Jan 11, 2012 at 04:46:21PM +0100, David Sterba wrote:
> Hi,
>
> the $subj warning appears sometimes in syslog, in my case when
> xfstests/209 runs looped. The minimal reproducer is looped mkfs+mount.

I've been seeing this as well. It's new with 3.2, and I haven't yet
been able to track it down.

The first thing that happens when we mount the FS is a block layer
invalidate, and that must be dropping the write.

It's also possible (but very unlikely) that mkfs.btrfs is neglecting to
write that block.

Do you have a reliable way to reproduce? If so, can you try with a much
older mkfs.btrfs?

-chris
On Wed, Jan 11, 2012 at 11:37:14AM -0500, Chris Mason wrote:
> I've been seeing this as well. It's new with 3.2, and I haven't yet
> been able to track it down.
>
> The first thing that happens when we mount the FS is a block layer
> invalidate, and that must be dropping the write.
>
> It's also possible (but very unlikely) that mkfs.btrfs is neglecting to
> write that block.
>
> Do you have a reliable way to reproduce? If so, can you try with a much
> older mkfs.btrfs?

I built mkfs from v0.19 and let 209 loop again; the error appeared 6x
during 3 hours, the last 5 occurrences within 10 minutes. Should I try
an even older mkfs?

I will try to catch it with blktrace running.

thanks,
david
On Wed, Jan 11, 2012 at 11:34:34PM +0100, David Sterba wrote:
> On Wed, Jan 11, 2012 at 11:37:14AM -0500, Chris Mason wrote:
> > Do you have a reliable way to reproduce? If so, can you try with a much
> > older mkfs.btrfs?
>
> I built mkfs from v0.19 and let 209 loop again; the error appeared 6x
> during 3 hours, the last 5 occurrences within 10 minutes. Should I try
> an even older mkfs?

Nah, I'd try the latest mkfs on a 3.0 kernel. I think it's a change to
the block device invalidate code that happens on mount.

-chris
On Wed, Jan 11, 2012 at 11:34:34PM +0100, David Sterba wrote:
> I will try to catch it with blktrace running.

Measurement disrupted the experiment. The second I start blktrace, these
messages

[450482.299863] device fsid 7f7bfb60-b8f3-457e-857a-9a1a187f750f devid 1 transid 7 /dev/sda9
[450482.309642] btrfs: force lzo compression
[450482.314802] btrfs: enabling inode map caching
[450482.320363] btrfs: enabling auto defrag
[450482.324138] btrfs: disk space caching is enabled
[450482.378652] btrfs: failed to read chunk root on sda9
[450482.385397] btrfs: open_ctree failed

appear in the log, mount fails and the test is not performed. (And they
immediately stop when blktrace stops.) There are a few occurrences of the

[450491.373282] btrfs bad fsid on block 20971520

message. The blktrace log does not contain any record of 'mkfs' activity;
the other involved processes are there (mount, aio-dio, kernel threads).

The other day I saw

[ 8334.490486] device fsid 830e57b6-b9c3-471c-b4dc-4a8c2c56fb35 devid 1 transid 4 /dev/sda9
[ 8334.500482] btrfs: force lzo compression
[ 8334.505853] btrfs: enabling inode map caching
[ 8334.511539] btrfs: enabling auto defrag
[ 8334.515234] btrfs: disk space caching is enabled
[ 8334.532517] btrfs 0 12582912
[ 8334.551594] btrfs bad tree block start 20971520 12582912
[ 8334.560353] btrfs bad tree block start 0 12582912
[ 8334.568263] btrfs bad tree block start 20971520 12582912
[ 8334.575188] btrfs bad tree block start 20971520 12582912
[ 8334.581946] btrfs bad tree block start 0 12582912
[ 8334.588044] btrfs bad tree block start 20971520 12582912
[ 8334.594543] btrfs: failed to read chunk root on sda9
[ 8334.601040] btrfs: open_ctree failed

a different instance of the same problem. So this looks rather serious.
Per your advice, I'll try to test with other filesystems, with older
kernels, and in the btrfs case add fsync into mkfs.
david
On Mon, Jan 16, 2012 at 03:34:28PM +0100, David Sterba wrote:
> Per your advice, I'll try to test with other filesystems, with older
> kernels, and in the btrfs case add fsync into mkfs.

I left the 3.0.13-based SLES kernel looping and did not trigger the
warning for several hours. In the meantime I grepped through my serial
console logs and found that the first 'bad fsid' message appeared in
3.0.0-rc5+, dated 2011-06-01:

[73673.623530] device fsid 5f1b5c0e-21ff-4896-bf48-8d64558dd205 devid 1 transid 7 /dev/sdb10
[73673.633194] btrfs: enabling auto defrag
[73673.636915] btrfs: enabling disk space caching
[73673.644124] btrfs: enabling inode map caching
[73673.649733] btrfs: force lzo compression
[73673.740630] btrfs bad fsid on block 20971520
[73673.746400] btrfs bad fsid on block 20971520
[73673.760785] btrfs bad fsid on block 20971520
[73673.766284] btrfs: failed to read chunk root on sdb10
[73673.772969] btrfs warning page private not zero on page 20971520
[73673.792224] btrfs: open_ctree failed

and there are several messages from a 3.1.0-rc4 kernel, with no more
occurrences of the "page private not zero" message.

david