"Budsjö, Martin"
2009-Nov-09 13:37 UTC
[Lustre-discuss] Has anybody seen: ldiskfs_get_inode_block: bad inode number: 1
Hi,

After our upgrade to 1.6.7.2 (from 1.4.12) we started to get "ldiskfs_get_inode_block: bad inode number: 1" errors. This causes the ldiskfs filesystem to remount read-only, making the whole Lustre filesystem read-only.

We took the filesystems offline and ran fsck on them. Some minor errors were found and fixed, but the error persists. Any clue?

Regards
Martin Budsjö

Example of the error:

Nov 4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov 4 03:06:32 hlc305 kernel: Remounting filesystem read-only
Nov 4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov 4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error starting handle for op 4 (35 credits): rc -30
Nov 4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) Skipped 52 previous similar messages
Nov 4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
Nov 4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) Skipped 1 previous similar message
Nov 4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) fsfilt_start: -30
Nov 4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) Skipped 52 previous similar messages
Nov 4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov 4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov 4 03:06:32 hlc305 kernel: LustreError: 1189:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
Nov 4 03:07:14 hlc305 kernel: Lustre: Failing over spehome-MDT0000
Nov 4 03:07:14 hlc305 kernel: Lustre: Skipped 12 previous similar messages
Nov 4 03:07:14 hlc305 kernel: Lustre: *** setting obd spehome-MDT0000 device 'dm-28' read-only ***
Nov 4 03:07:14 hlc305 kernel: Turning device dm-28 (0xfd0001c) read-only
Nov 4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) operation 101 on unconnected MDS from 12345-10.0.0.86@tcp
Nov 4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) Skipped 74 previous similar messages
Nov 4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-107) req@00000100cabd0800 x30168375/t0 o101-><?>@<?>: 0/0 lens 440/0 e 0 to 0 dl 1257300535 ref 1 fl Interpret:/0/0 rc -107/0
Nov 4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 74 previous similar messages
Nov 4 03:07:15 hlc305 kernel: LustreError: 137-5: UUID 'mds5_UUID' is not available for connect (stopping)
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) failure destroying log during cleanup: -30
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) Skipped 6 previous similar messages
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) can't start transaction for 34 blocks (8192 bytes)
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) Skipped 6 previous similar messages
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log record: rc -30
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) Skipped 6 previous similar messages
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Failure re-writing header -30
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Skipped 6 previous similar messages
Nov 4 03:07:15 hlc305 kernel: LustreError: 17131:0:(handler.c:1963:mds_update_server_data()) error writing MDS server data: rc = -30
Nov 4 03:07:15 hlc305 kernel: Lustre: spehome-MDT0000: shutting down for failover; client state will be preserved.
Nov 4 03:07:15 hlc305 kernel: Lustre: MDT spehome-MDT0000 has stopped.
Nov 4 03:07:15 hlc305 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
Nov 4 03:07:15 hlc305 kernel: Removing read-only on unknown block (0xfd0001c)
Nov 4 03:07:15 hlc305 kernel: Lustre: server umount spehome-MDT0000 complete
Nov 4 03:07:16 hlc305 kernel: kjournald starting. Commit interval 5 seconds
Nov 4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
Nov 4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
Nov 4 03:07:16 hlc305 kernel: LDISKFS-fs: recovery complete.
Nov 4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov 4 03:07:16 hlc305 kernel: kjournald starting. Commit interval 5 seconds
Nov 4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
Nov 4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
Nov 4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov 4 03:07:16 hlc305 kernel: Lustre: Enabling user_xattr
Nov 4 03:07:16 hlc305 kernel: Lustre: 17137:0:(mds_fs.c:511:mds_init_server_data()) RECOVERY: service spehome-MDT0000, 166 recoverable clients, last_transno 7646800754
Nov 4 03:07:16 hlc305 kernel: Lustre: 636:0:(mds_lov.c:1075:mds_notify()) MDS spehome-MDT0000: in recovery, not resetting orphans on ost1spehome_UUID
Nov 4 03:07:16 hlc305 kernel: Lustre: MDT spehome-MDT0000 now serving mds5_UUID (spehome-MDT0000/2af526ca-85d4-996b-4773-4898812d6a31), but will be in recovery for at least 5:00, or until 166 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/spehome-MDT0000/recovery_status.
Nov 4 03:07:16 hlc305 kernel: Lustre: Server spehome-MDT0000 on device /dev/vg_mds/spehome has started
Nov 4 03:07:16 hlc305 kernel: Lustre: spehome-MDT0000: temporarily refusing client connection from 10.0.0.69@tcp
Nov 4 03:07:16 hlc305 kernel: Lustre: 1247:0:(ldlm_lib.c:1240:check_and_start_recovery_timer()) spehome-MDT0000: starting recovery timer
Nov 4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) spehome-MDT0000: 165 recoverable clients remain
Nov 4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) Skipped 7 previous similar messages
Nov 4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Orphan d5839c:772a0a47 found and opened in PENDING directory
Nov 4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Skipped 1 previous similar message

--------------------------------------------------------------------------
Confidentiality Notice: This message is private and may contain confidential and proprietary information. If you have received this message in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this message is not permitted and may be unlawful.
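For reference, the offline check described above would look roughly like the following on the MDS. This is only a sketch: the device path /dev/vg_mds/spehome is taken from the log above, the mount point is hypothetical, and the ldiskfs-aware e2fsprogs shipped for Lustre should generally be preferred over the distro version.

# Sketch only: stop the MDT and fsck its backing device while it is unmounted.
umount /mnt/spehome-mdt            # hypothetical mount point; unmounting stops the MDS service in 1.6
e2fsck -fn /dev/vg_mds/spehome     # read-only pass first, to see what would be changed
e2fsck -fy /dev/vg_mds/spehome     # then let it repair what it found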
Andreas Dilger
2009-Nov-09 20:02 UTC
[Lustre-discuss] Has anybody seen: ldiskfs_get_inode_block: bad inode number: 1
On 2009-11-09, at 06:37, Budsjö, Martin wrote:
> After our upgrade to 1.6.7.2 (from 1.4.12) we started to get
> "ldiskfs_get_inode_block: bad inode number: 1" errors. This causes
> the ldiskfs filesystem to remount read-only, making the whole Lustre
> filesystem read-only.
>
> We took the filesystems offline and ran fsck on them. Some minor
> errors were found and fixed, but the error persists. Any clue?

That inode number is not valid for use, so either there is a directory entry with that value (which seems unlikely to exist after an e2fsck run), or some client is incorrectly requesting that inode number (possibly cached after having read it from the MDS previously?). It will probably go away if you reboot the client(s) that have cached this inode number, but it isn't clear from the error messages below which client that is. You might be able to find that in the Lustre kernel debug logs.
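Two ways to narrow that down on the MDS, sketched under the assumption that the MDT backing device is /dev/vg_mds/spehome as shown in the logs in this thread; adjust paths and options to your setup:

# (a) Look for a stray directory entry that still references inode 1.
#     Run against the unmounted MDT device; -c opens it read-only.
debugfs -c -R 'ncheck 1' /dev/vg_mds/spehome

# (b) After the error fires again, dump the Lustre kernel debug buffer on the MDS
#     and inspect the entries around the failed open for the client NID.
lctl dk /tmp/lustre-debug.log
grep -n -e mds_finish_open -e fsfilt_ldiskfs_start /tmp/lustre-debug.log

If ncheck turns up nothing, the stale reference is most likely cached on a client, which matches the suggestion above to reboot the client(s) holding it.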
> Example of the error:
>
> Nov 4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
> Nov 4 03:06:32 hlc305 kernel: Remounting filesystem read-only
> [...]
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.