Moving forward, I created an MDT/MGS on a disk partition:
bio-ppc-45:~# /vol/lustre-1.5.95/sbin/mkfs.lustre --fsname=lus1 --mdt --mgs --reformat /dev/sda2
Permanent disk data:
Target: lus1-MDTffff
Index: unassigned
Lustre FS: lus1
Mount type: ldiskfs
Flags: 0x75
(MDT MGS needs_index first_time update )
Persistent mount opts: errors=remount-ro,iopen_nopriv,user_xattr
Parameters:
device size = 10240MB
formatting backing filesystem ldiskfs on /dev/sda2
target name lus1-MDTffff
4k blocks 0
options -J size=400 -i 4096 -I 512 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L lus1-MDTffff -J size=400 -i 4096 -I 512 -q -O dir_index -F /dev/sda2
Writing CONFIGS/mountdata
bio-ppc-45:~# mount -t lustre /dev/sda2 /mnt/lustre/mds
Mounted no problem. From console:
LDISKFS FS on sda2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.5.95
Build Version: 1.5.95-19700101000000-PRISTINE-.tmp.x.linux-2.6.18-2.6.18
Lustre: Added LNI 192.5.200.69@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
Lustre: mount data:
Lustre: device: /dev/sda2
Lustre: flags: 0
kjournald starting. Commit interval 5 seconds
LDISKFS FS on sda2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: disk data:
Lustre: server: lus1-MDTffff
Lustre: uuid:
Lustre: fs: lus1
Lustre: index: ffff
Lustre: config: 1
Lustre: flags: 0x75
Lustre: diskfs: ldiskfs
Lustre: options: errors=remount-ro,iopen_nopriv,user_xattr
Lustre: params:
Lustre: comment:
kjournald starting. Commit interval 5 seconds
LDISKFS FS on sda2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: mballoc enabled
Lustre: MGS MGS started
Lustre: disk data:
Lustre: server: lus1-MDT0000
Lustre: uuid:
Lustre: fs: lus1
Lustre: index: 0000
Lustre: config: 2
Lustre: flags: 0x5
Lustre: diskfs: ldiskfs
Lustre: options: errors=remount-ro,iopen_nopriv,user_xattr
Lustre: params:
Lustre: comment:
Lustre: Enabling user_xattr
Lustre: lus1-MDT0000: new disk, initializing
Lustre: MDT lus1-MDT0000 now serving dev (lus1-MDT0000/3a216daa-81dc-4683-aefc-4e8660383993) with recovery enabled
Lustre: 0 UP mgs MGS MGS 5
Lustre: 1 UP mgc MGC192.5.200.69@tcp f1842a6f-8434-1fc0-f598-ce739bcf44ee 6
Lustre: 2 UP mdt MDS MDS_uuid 3
Lustre: 3 UP lov lus1-mdtlov lus1-mdtlov_UUID 4
Lustre: 4 UP mds lus1-MDT0000 lus1-MDT0000_UUID 3
Lustre: mount /dev/sda2 complete
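A note on the "flags" values in the output above: the mkfs.lustre output spells out 0x75 as "MDT MGS needs_index first_time update", and after the first mount the console reports 0x5 with index 0000. The following is a minimal sketch of how those values decode as a bitmask. The flag names are the ones mkfs.lustre prints in this thread; the individual bit values are an assumption based on the LDD_F_* constants in Lustre's lustre_disk.h, and decode_flags is a hypothetical helper, not part of any Lustre tool.

```python
# Flag names as printed by mkfs.lustre in this thread. The numeric bit values
# are an assumption taken from the LDD_F_* constants in lustre_disk.h.
LDD_FLAGS = [
    (0x01, "MDT"),
    (0x02, "OST"),
    (0x04, "MGS"),
    (0x10, "needs_index"),
    (0x20, "first_time"),
    (0x40, "update"),
]


def decode_flags(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in LDD_FLAGS if value & bit]


# The four flag values that appear in the mkfs and console output.
for flags in (0x75, 0x5, 0x72, 0x2):
    print(f"{flags:#04x}: {' '.join(decode_flags(flags))}")
```

Read this way, the drop from 0x75 to 0x5 on the MDT (and from 0x72 to 0x2 on the OST further down) is consistent with the needs_index, first_time and update bits being cleared once the target has been assigned index 0000.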
The problem comes when trying to bring an OST online:
bio-ppc-45:~# /vol/lustre-1.5.95/sbin/mkfs.lustre --fsname=lus1 --ost --mgsnode=bio-ppc-45@tcp0 --reformat /dev/sda3
Permanent disk data:
Target: lus1-OSTffff
Index: unassigned
Lustre FS: lus1
Mount type: ldiskfs
Flags: 0x72
(OST needs_index first_time update )
Persistent mount opts: errors=remount-ro,extents,mballoc
Parameters: mgsnode=192.5.200.69@tcp
device size = 102400MB
formatting backing filesystem ldiskfs on /dev/sda3
target name lus1-OSTffff
4k blocks 0
options -J size=400 -i 16384 -I 256 -q -O dir_index -F
mkfs_cmd = mkfs.ext2 -j -b 4096 -L lus1-OSTffff -J size=400 -i 16384 -I 256 -q -O dir_index -F /dev/sda3
Writing CONFIGS/mountdata
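For context on the mkfs options shown for the two targets (-i 4096 -I 512 on the MDT, -i 16384 -I 256 on the OST): in mke2fs, -i is the bytes-per-inode ratio and -I is the on-disk inode size. Here is a rough sketch of what that implies for each device. It assumes the "device size" figures are in MiB and ignores the rounding to block-group boundaries that mke2fs performs, so the numbers are estimates only; estimate_inodes is a hypothetical helper.

```python
MIB = 1024 * 1024


def estimate_inodes(device_mib, bytes_per_inode, inode_size):
    """Estimate inode count and inode-table size (MiB) for mke2fs -i / -I.

    This ignores block-group rounding, so real mke2fs results differ slightly.
    """
    inodes = device_mib * MIB // bytes_per_inode
    table_mib = inodes * inode_size // MIB
    return inodes, table_mib


# Device sizes and options as reported by mkfs.lustre in this thread.
targets = {
    "lus1-MDT": (10240, 4096, 512),
    "lus1-OST": (102400, 16384, 256),
}

for name, (size_mib, ratio, isize) in targets.items():
    inodes, table_mib = estimate_inodes(size_mib, ratio, isize)
    print(f"{name}: ~{inodes} inodes, ~{table_mib} MiB of inode tables")
```

The MDT gets a much denser inode ratio than the OST, which fits its role of holding one inode per file in the filesystem while storing little data of its own.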
bio-ppc-45:~# mount -t lustre /dev/sda3 /mnt/lustre/osd1
[CRASH]
From the console:
LDISKFS FS on sda3, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: mount data:
Lustre: device: /dev/sda3
Lustre: flags: 0
kjournald starting. Commit interval 5 seconds
LDISKFS FS on sda3, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
Lustre: disk data:
Lustre: server: lus1-OSTffff
Lustre: uuid:
Lustre: fs: lus1
Lustre: index: ffff
Lustre: config: 1
Lustre: flags: 0x72
Lustre: diskfs: ldiskfs
Lustre: options: errors=remount-ro,extents,mballoc
Lustre: params: mgsnode=192.5.200.69@tcp
Lustre: comment:
kjournald starting. Commit interval 5 seconds
LDISKFS FS on sda3, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
Lustre: disk data:
Lustre: server: lus1-OST0000
Lustre: uuid:
Lustre: fs: lus1
Lustre: index: 0000
Lustre: config: 2
Lustre: flags: 0x2
Lustre: diskfs: ldiskfs
Lustre: options: errors=remount-ro,extents,mballoc
Lustre: params: mgsnode=192.5.200.69@tcp
Lustre: comment:
BUG: soft lockup detected on CPU#1!
Call Trace:
[C000000076991DC0] [C00000000000EBD0] .show_stack+0x68/0x1b0 (unreliable)
[C000000076991E60] [C000000000077484] .softlockup_tick+0xf0/0x13c
[C000000076991F10] [C000000000056114] .run_local_timers+0x1c/0x30
[C000000076991F90] [C00000000001B6C4] .timer_interrupt+0xa4/0x464
[C000000076992070] [C0000000000034EC] decrementer_common+0xec/0x100
--- Exception: 901 at .generic_find_next_zero_le_bit+0xb8/0x21c
LR = .ldiskfs_mb_init_cache+0x590/0x7a4 [ldiskfs]
[C000000076992360] [D0000000009D55D0] .ldiskfs_mb_init_cache+0x174/0x7a4 [ldiskfs] (unreliable)
[C0000000769924D0] [D0000000009D5DEC] .ldiskfs_mb_load_buddy+0x1ec/0x358 [ldiskfs]
[C000000076992580] [D0000000009D71D4] .ldiskfs_mb_new_blocks+0x514/0x14c4 [ldiskfs]
[C000000076992720] [D0000000009D8448] .ldiskfs_new_block+0x4c/0x5c [ldiskfs]
[C0000000769927B0] [D0000000009D3660] .ldiskfs_ext_get_block+0x318/0x480 [ldiskfs]
[C0000000769928D0] [D0000000009BEEE0] .ldiskfs_getblk+0xa8/0x2dc [ldiskfs]
[C0000000769929D0] [D0000000009C04C4] .ldiskfs_bread+0x18/0xb8 [ldiskfs]
[C000000076992A60] [D000000000995960] .fsfilt_ldiskfs_write_record+0x18c/0x51c [fsfilt_ldiskfs]
[C000000076992B70] [D0000000006AD44C] .llog_lvfs_write_blob+0x124/0x57c [obdclass]
[C000000076992C60] [D0000000006AF4D8] .llog_lvfs_write_rec+0xae0/0xe48 [obdclass]
[C000000076992D60] [D000000000957634] .mgc_copy_handler+0x364/0x860 [mgc]
[C000000076992E60] [D0000000006A665C] .llog_process+0xd54/0x12f0 [obdclass]
[C000000076992F70] [D000000000952A88] .mgc_copy_llog+0xc18/0xe10 [mgc]
[C000000076993050] [D0000000009538E4] .mgc_process_log+0xc64/0x144c [mgc]
[C000000076993240] [D000000000958DC0] .mgc_process_config+0x1290/0x198c [mgc]
[C000000076993330] [D0000000006F5A6C] .lustre_process_log+0x1028/0x15dc [obdclass]
[C000000076993460] [D0000000006F712C] .server_start_targets+0x110c/0x208c [obdclass]
[C000000076993570] [D0000000006F8A50] .server_fill_super+0x9a4/0x1174 [obdclass]
[C000000076993640] [D0000000006FB36C] .lustre_fill_super+0x214c/0x2374 [obdclass]
[C000000076993730] [C0000000000B07B4] .get_sb_nodev+0x88/0xf8
[C0000000769937D0] [D0000000006E5DA4] .lustre_get_sb+0x20/0x38 [obdclass]
[C000000076993850] [C0000000000B032C] .vfs_kern_mount+0x80/0xe8
[C0000000769938F0] [C0000000000B03F0] .do_kern_mount+0x4c/0x80
[C000000076993990] [C0000000000CE580] .do_mount+0x6d8/0x784
[C000000076993D70] [C0000000000CE6EC] .sys_mount+0xc0/0x140
[C000000076993E30] [C00000000000871C] syscall_exit+0x0/0x40
BUG: soft lockup detected on CPU#0!
Call Trace:
[C00000000FFBF0A0] [C00000000000EBD0] .show_stack+0x68/0x1b0 (unreliable)
[C00000000FFBF140] [C000000000077484] .softlockup_tick+0xf0/0x13c
[C00000000FFBF1F0] [C000000000056114] .run_local_timers+0x1c/0x30
[C00000000FFBF270] [C00000000001B6C4] .timer_interrupt+0xa4/0x464
[C00000000FFBF350] [C0000000000034EC] decrementer_common+0xec/0x100
--- Exception: 901 at .lock_kernel+0x64/0x8c
LR = .nfs_permission+0xd8/0x274
[C00000000FFBF640] [C00000000FFBF6E0] 0xc00000000ffbf6e0 (unreliable)
[C00000000FFBF700] [C0000000000B9EC0] .permission+0xb0/0xe8
[C00000000FFBF780] [C0000000000BB848] .__link_path_walk+0x174/0x15f0
[C00000000FFBF910] [C0000000000BCD60] .link_path_walk+0x9c/0x184
[C00000000FFBFA80] [C0000000000BD34C] .do_path_lookup+0x320/0x368
[C00000000FFBFB30] [C0000000000BE0DC] .__user_walk_fd_it+0x60/0xa0
[C00000000FFBFBD0] [C0000000000B46D0] .vfs_stat_fd+0x70/0xc8
[C00000000FFBFD30] [C0000000000B484C] .sys_newstat+0x2c/0x60
[C00000000FFBFE30] [C00000000000871C] syscall_exit+0x0/0x40
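Traces like the two above are easier to compare once each frame is reduced to a function name and the module it belongs to. The following is a small sketch that does this for frames in the format shown here (stack address, return address, symbol+offset/size, optional [module]). It assumes each frame sits on a single line, so frames that were wrapped by a mail client need to be rejoined first. FRAME_RE, parse_frames and SAMPLE are hypothetical names, and the sample is a handful of frames copied from the first trace.

```python
import re
from collections import Counter

# One frame per line: [stack addr] [return addr] .symbol+0xoff/0xsize [module]
FRAME_RE = re.compile(
    r"^\[[0-9A-Fa-f]+\] \[[0-9A-Fa-f]+\] "
    r"\.?(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?",
    re.MULTILINE,
)

SAMPLE = """\
[C000000076992360] [D0000000009D55D0] .ldiskfs_mb_init_cache+0x174/0x7a4 [ldiskfs] (unreliable)
[C0000000769924D0] [D0000000009D5DEC] .ldiskfs_mb_load_buddy+0x1ec/0x358 [ldiskfs]
[C000000076992B70] [D0000000006AD44C] .llog_lvfs_write_blob+0x124/0x57c [obdclass]
[C000000076992D60] [D000000000957634] .mgc_copy_handler+0x364/0x860 [mgc]
[C000000076993730] [C0000000000B07B4] .get_sb_nodev+0x88/0xf8
"""


def parse_frames(text):
    """Return (function, module) pairs; frames without a module tag are 'kernel'."""
    return [
        (m.group("func"), m.group("module") or "kernel")
        for m in FRAME_RE.finditer(text)
    ]


frames = parse_frames(SAMPLE)
for func, module in frames:
    print(f"{module:10s} {func}")

# Count how many frames each module contributes to the trace.
print(Counter(module for _, module in frames))
```

Applied to the full traces, this makes it plain that the CPU#1 trace runs from sys_mount through obdclass and mgc into ldiskfs, while the CPU#0 trace contains only core kernel frames on the NFS lookup path.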
This is, btw, running on a netbooted machine with root on NFS (I
mention it because nfs_permission shows up in the second call trace).
Disk layout:
bio-ppc-45:~# mac-fdisk -l
/dev/sda
        #                type  name           length   base        ( size )  system
/dev/sda1 Apple_partition_map  Apple              63 @ 1           ( 31.5k)  Partition map
/dev/sda2     Apple_UNIX_SVR2  lustre-mds   20971520 @ 64          ( 10.0G)  Linux native
/dev/sda3     Apple_UNIX_SVR2  lustre-osd1 209715200 @ 20971584    (100.0G)  Linux native
/dev/sda4          Apple_Free  Extra       550735984 @ 230686784   (262.6G)  Free space
Block size=512, Number of Blocks=781422768
DeviceType=0x0, DeviceId=0x0
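As a sanity check on the partition map, the base and length columns can be verified arithmetically: each partition should start where the previous one ends, the last one should end at the reported block count, and length times the 512-byte block size should match the "device size" that mkfs.lustre printed. Here is a short sketch using the numbers from the mac-fdisk output above; check_layout and size_mib are hypothetical helper names.

```python
BLOCK_SIZE = 512          # "Block size=512" from mac-fdisk
TOTAL_BLOCKS = 781422768  # "Number of Blocks" from mac-fdisk

# (device, base block, length in blocks) copied from the listing above.
partitions = [
    ("/dev/sda1", 1, 63),
    ("/dev/sda2", 64, 20971520),
    ("/dev/sda3", 20971584, 209715200),
    ("/dev/sda4", 230686784, 550735984),
]


def check_layout(parts, total_blocks):
    """Return True if partitions are contiguous and end exactly at total_blocks."""
    expected_base = parts[0][1]
    for _name, base, length in parts:
        if base != expected_base:
            return False
        expected_base = base + length
    return expected_base == total_blocks


def size_mib(length_blocks):
    """Convert a length in 512-byte blocks to MiB."""
    return length_blocks * BLOCK_SIZE // (1024 * 1024)


print("layout consistent:", check_layout(partitions, TOTAL_BLOCKS))
for name, _base, length in partitions[1:3]:
    print(name, size_mib(length), "MiB")
```

The layout is contiguous with no overlap, and /dev/sda2 and /dev/sda3 come out at 10240 MiB and 102400 MiB, matching the device sizes reported by mkfs.lustre, so the partition table itself looks sound.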
Thanks,
--bob
oh, and one other datapoint - it worked fine when I created mgs/mgt and ost as loopback filesystems on a single machine.

On Nov 3, 2006, at 2:43 PM, Robert Olson wrote:
> Moving forward, I created an MDT/MGS on a disk partition:
> [...]
Hm, I have to take that back. When I tried it on this machine using loopback devices, I got the same crash.

--bob

On Nov 3, 2006, at 3:09 PM, Robert Olson wrote:
> oh, and one other datapoint - it worked fine when I created mgs/mgt
> and ost as loopback filesystems on a single machine.
On Fri, 3 Nov 2006, Robert Olson wrote:
> BUG: soft lockup detected on CPU#1!
> Call Trace:
> [C000000076991DC0] [C00000000000EBD0] .show_stack+0x68/0x1b0 (unreliable)
> [C000000076991E60] [C000000000077484] .softlockup_tick+0xf0/0x13c
> [C000000076991F10] [C000000000056114] .run_local_timers+0x1c/0x30
> [C000000076991F90] [C00000000001B6C4] .timer_interrupt+0xa4/0x464
> [C000000076992070] [C0000000000034EC] decrementer_common+0xec/0x100
> --- Exception: 901 at .generic_find_next_zero_le_bit+0xb8/0x21c
> LR = .ldiskfs_mb_init_cache+0x590/0x7a4 [ldiskfs]
> [C000000076992360] [D0000000009D55D0] .ldiskfs_mb_init_cache+0x174/0x7a4 [ldiskfs] (unreliable)
> [C0000000769924D0] [D0000000009D5DEC] .ldiskfs_mb_load_buddy+0x1ec/0x358 [ldiskfs]

You may need patches from this bug:
https://bugzilla.lustre.org/show_bug.cgi?id=10634

HTH

--
Jean-Marc Saffroy - jean-marc.saffroy@ext.bull.net
Hm, this is tricky. It looks like these patches include:

- the addition of the multiblock allocator, already patched in my kernel (from one of the lustre patches I believe)
- the use of EXT3_MOUNT_MBALLOC
- ext2_find_next_le_bit and its use
- change to the signature of ext3_free_blocks
- addition of buddy allocator

I have the feeling the patches for these all exist in the lustre kernel patches dir, I'm just missing some. Still exploring.

Thanks,
--bob

On Nov 3, 2006, at 5:39 PM, Jean-Marc Saffroy wrote:
> You may need patches from this bug:
> https://bugzilla.lustre.org/show_bug.cgi?id=10634
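Since the first trace was interrupted inside generic_find_next_zero_le_bit and the patch list mentions ext2_find_next_le_bit, a short illustration of what the "le" bitmap helpers are for may be useful. ext2/ext3-style on-disk bitmaps number their bits little-endian (bit n lives in byte n/8, at bit position n%8), which on a big-endian CPU such as PPC64 differs from the native bit numbering of a machine word. The code below is a Python sketch of that concept only. It is not the kernel implementation and not a diagnosis of the lockup, and both function names are used purely for illustration.

```python
def find_next_zero_le_bit(bitmap, size, offset=0):
    """Return the first zero bit at or after offset using little-endian
    bit numbering (bit n is bit n % 8 of byte n // 8); return size if none."""
    for n in range(offset, size):
        if not (bitmap[n // 8] >> (n % 8)) & 1:
            return n
    return size


def find_next_zero_native_be(bitmap, size, offset=0):
    """Naive search that loads the same bytes as one big-endian word and
    numbers bits from the least significant end of that word."""
    word = int.from_bytes(bitmap, "big")
    for n in range(offset, size):
        if not (word >> n) & 1:
            return n
    return size


# An 8-byte bitmap with blocks 0..10 marked in use (little-endian numbering).
bitmap = bytes([0xFF, 0x07, 0, 0, 0, 0, 0, 0])

print(find_next_zero_le_bit(bitmap, 64))     # first free block in LE numbering
print(find_next_zero_native_be(bitmap, 64))  # what a native BE word search sees
```

With this bitmap the little-endian search reports bit 11 as the first free one, while the naive big-endian word search reports bit 0. This is why filesystem code needs dedicated little-endian bitmap helpers on big-endian machines, and why a missing or mismatched helper patch would be a plausible thing to check on this platform.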