2011/10/11 Sage Weil <sage@newdream.net>:
> On Tue, 11 Oct 2011, Christian Brunner wrote:
>> Maybe this one is easier:
>>
>> One of our OSDs isn't starting, because there is no "current"
>> directory. What I have are three snap directories.
>>
>> total 0
>> -rw-r--r-- 1 root root   37 Oct  9 15:57 ceph_fsid
>> -rw-r--r-- 1 root root    8 Oct  9 15:57 fsid
>> -rw-r--r-- 1 root root   21 Oct  9 15:57 magic
>> drwxr-xr-x 1 root root 7986 Oct 11 18:34 snap_506043
>> drwxr-xr-x 1 root root 7986 Oct 11 18:34 snap_507364
>> drwxr-xr-x 1 root root 7814 Oct 11 18:36 snap_507417
>> -rw-r--r-- 1 root root    4 Oct  9 15:57 store_version
>> -rw-r--r-- 1 root root    2 Oct  9 15:57 whoami
>>
>> Is there a way to roll back to the latest?
>
> That's what the OSD actually does on startup (roll back to the newest
> snap_). It's probably a trivial bug that's preventing startup now... I'll
> take a look. In the meantime, you can clone the latest snap_ to current
> and it should start!
>
> sage

This seems to be a btrfs problem. It fails when I'm trying to create the clone:

# btrfs subvolume snapshot snap_507417 current
Create a snapshot of 'snap_507417' in './current'
ERROR: cannot snapshot 'snap_507417'

And I get the following kernel messages:

[ 5863.263950] ------------[ cut here ]------------
[ 5863.269125] WARNING: at fs/btrfs/inode.c:2335 btrfs_orphan_cleanup+0xcd/0x3d0 [btrfs]()
[ 5863.278142] Hardware name: ProLiant DL180 G6
[ 5863.283161] Modules linked in: btrfs zlib_deflate libcrc32c bonding ipv6 serio_raw pcspkr ghes hed iTCO_wdt iTCO_vendor_support ixgbe dca mdio i7core_edac edac_core iomemory_vsl(P) hpsa squashfs usb_storage [last unloaded: scsi_wait_scan]
[ 5863.307774] Pid: 6349, comm: btrfs Tainted: P W 3.0.6-1.fits.2.el6.x86_64 #1
[ 5863.316647] Call Trace:
[ 5863.319648] [<ffffffff8106344f>] warn_slowpath_common+0x7f/0xc0
[ 5863.326536] [<ffffffff810634aa>] warn_slowpath_null+0x1a/0x20
[ 5863.333146] [<ffffffffa023fb0d>] btrfs_orphan_cleanup+0xcd/0x3d0 [btrfs]
[ 5863.340839] [<ffffffffa0238381>] ? join_transaction+0x201/0x250 [btrfs]
[ 5863.348482] [<ffffffffa021fbaa>] ? block_rsv_migrate_bytes+0x3a/0x50 [btrfs]
[ 5863.356590] [<ffffffffa0261a3b>] btrfs_mksubvol+0x2fb/0x380 [btrfs]
[ 5863.363726] [<ffffffffa0261bba>] btrfs_ioctl_snap_create_transid+0xfa/0x150 [btrfs]
[ 5863.372445] [<ffffffffa0261c66>] btrfs_ioctl_snap_create+0x56/0x80 [btrfs]
[ 5863.380398] [<ffffffffa026583e>] btrfs_ioctl+0x2fe/0xd50 [btrfs]
[ 5863.387344] [<ffffffff8125ed20>] ? inode_has_perm+0x30/0x40
[ 5863.393798] [<ffffffff81261a2c>] ? file_has_perm+0xdc/0xf0
[ 5863.400114] [<ffffffff8117086a>] do_vfs_ioctl+0x9a/0x5a0
[ 5863.406244] [<ffffffff81170e11>] sys_ioctl+0xa1/0xb0
[ 5863.412001] [<ffffffff81562882>] system_call_fastpath+0x16/0x1b
[ 5863.418767] ---[ end trace e3234ecab14ad64c ]---
[ 5863.424084] btrfs: Error removing orphan entry, stopping orphan cleanup
[ 5863.431614] btrfs: could not do orphan cleanup -22

Can I use an older snapshot as well?

Regards,
Christian
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
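
The trailing -22 in the last kernel message above is a negated errno value, the usual way kernel code reports an error; 22 is EINVAL. A minimal sketch of decoding such a line follows. The helper name decode_kernel_errno and its regular expression are illustrative assumptions, not part of any btrfs or Ceph tooling.

```python
import errno
import os
import re

# Matches a trailing negative integer such as the "-22" in
# "btrfs: could not do orphan cleanup -22".
_TRAILING_ERRNO = re.compile(r"-(\d+)\s*$")


def decode_kernel_errno(line):
    """Return (number, symbolic name, description), or None if no code is found."""
    m = _TRAILING_ERRNO.search(line)
    if m is None:
        return None
    num = int(m.group(1))
    # errno.errorcode maps numbers to names such as "EINVAL".
    name = errno.errorcode.get(num, "UNKNOWN")
    return num, name, os.strerror(num)


msg = "[ 5863.431614] btrfs: could not do orphan cleanup -22"
print(decode_kernel_errno(msg))
```

The symbolic name only says that some argument or on-disk item was rejected as invalid; it does not by itself identify which orphan entry is at fault.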

On Tue, 11 Oct 2011, Christian Brunner wrote:
> This seems to be a btrfs problem. It fails when I'm trying to create the clone:
>
> # btrfs subvolume snapshot snap_507417 current
> Create a snapshot of 'snap_507417' in './current'
> ERROR: cannot snapshot 'snap_507417'
>
> [...]
> [ 5863.424084] btrfs: Error removing orphan entry, stopping orphan cleanup
> [ 5863.431614] btrfs: could not do orphan cleanup -22
>
> Can I use an older snapshot as well?

You're able to snapshot the others?

Yeah, any of the snap_ directories will work, although keep in mind that when
the OSD starts up it will immediately remove current/ and re-clone the
newest snap_ to current/ again. If the problem is a toxic/broken snap_
dir, you'll need to rename it out of the way to avoid hitting the problem
again...

sage
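
The startup behaviour Sage describes (pick the newest snap_ and clone it to current/) can be sketched roughly as below, which also shows why renaming the broken directory changes what gets picked. This is an illustration only, not the OSD's actual code: it assumes that "newest" means the highest numeric suffix and that only names of the exact form snap_<number> are considered. The function name newest_snap is hypothetical.

```python
import re

# Assumption: only names of the exact form snap_<digits> count as snapshots.
SNAP_RE = re.compile(r"^snap_(\d+)$")


def newest_snap(names):
    """Return the snap_<N> name with the highest N, or None if there is none."""
    best = None
    best_seq = -1
    for name in names:
        m = SNAP_RE.match(name)
        if m is None:
            continue  # e.g. "fsid", "current", or a renamed "broken_snap_..."
        seq = int(m.group(1))
        if seq > best_seq:
            best, best_seq = name, seq
    return best


listing = [
    "ceph_fsid", "fsid", "magic",
    "snap_506043", "snap_507364", "snap_507417",
    "store_version", "whoami",
]
print(newest_snap(listing))  # snap_507417

# After renaming the broken snapshot out of the way, the next one is chosen.
renamed = ["broken_snap_507417" if n == "snap_507417" else n for n in listing]
print(newest_snap(renamed))  # snap_507364
```

Under these assumptions, manually cloning an older snap_ to current/ is not enough on its own, because the broken directory would still be selected at the next start.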

2011/10/11 Sage Weil <sage@newdream.net>:
> On Tue, 11 Oct 2011, Christian Brunner wrote:
>> Can I use an older snapshot as well?
>
> You're able to snapshot the others?
>
> Yeah, any of the snap_ directories will work, although keep in mind when
> the OSD starts up it will immediately remove current/ and re-clone the
> newest snap_ to current/ again. If the problem is a toxic/broken snap_
> dir, you'll need to rename it out of the way to avoid hitting the problem
> again...
>
> sage

OK - renaming snap_507417 to broken_snap_507417 worked.

Two other OSDs crashed the moment it came online again, but as far as I
can see, this is the same problem I've already reported.

After a couple of OSD restarts, I have them all up again.

Thanks for your help.

Christian
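
The rename that resolved this can be sketched as a small helper. It assumes, as the report above suggests, that a plain rename within the OSD data directory is sufficient; the function name quarantine_snap and its prefix parameter are hypothetical, and the demonstration runs on ordinary directories in a temporary location rather than on real btrfs subvolumes.

```python
import os
import tempfile


def quarantine_snap(osd_dir, snap_name, prefix="broken_"):
    """Rename osd_dir/snap_name to osd_dir/<prefix>snap_name; return the new path."""
    src = os.path.join(osd_dir, snap_name)
    dst = os.path.join(osd_dir, prefix + snap_name)
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    if os.path.exists(dst):
        raise FileExistsError(dst)  # refuse to clobber an earlier quarantine
    os.rename(src, dst)
    return dst


# Demonstration on plain directories (stand-ins for the snap_ subvolumes).
with tempfile.TemporaryDirectory() as osd_dir:
    for name in ("snap_506043", "snap_507364", "snap_507417"):
        os.mkdir(os.path.join(osd_dir, name))
    quarantine_snap(osd_dir, "snap_507417")
    print(sorted(os.listdir(osd_dir)))
    # ['broken_snap_507417', 'snap_506043', 'snap_507364']
```

Keeping the directory under a new name, rather than deleting it, leaves the broken snapshot available for later debugging.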