Selvi Kadirvel
2006-May-19 07:36 UTC
[Lustre-discuss] Patching 2.6.9-11 kernel with Lustre 1.4.5.1
I have now tried to patch the 2.6.9-11 RHEL kernel with the Lustre 1.4.5
patches, and the kernel seems to build just fine. I can boot into it and
everything seems OK until I copy a directory with a large number of
files from a remote home directory into the local directory:
tar cf - . | ( cd /var/tmp/linux-2.6/ ; tar xvfBp -)
The kernel panics with the oops message shown below. I run the tar
command before I even install Lustre, to test whether the patched
kernel is stable.
Similarly, I can 'scp' small files, but when I try a large
directory, the kernel panics again. Does anyone know what is happening
here? Do I *have* to use the ldiskfs-2.6-rhel4.series? (I used the
2.6-rhel4.series patches, since I got some errors using the latter
during the make modules step.)
Any suggestions/ideas would help. Thanks!
-Selvi
[root@hpcio4 10.13.16.46-2006-01-23-17:45]# cat log
Unable to handle kernel paging request at 000000000000e551 RIP:
<ffffffff8015a64a>{kmem_getpages+130}
PML4 178cc4067 PGD 178cb5067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: netconsole(U) netdump(U) nfs(U) nfsd(U) exportfs
(U) lockd(U) autofs4(U) i2c_dev(U) i2c_core(U) sunrpc(U) dm_mod(U)
button(U) battery(U) ac(U) ohci_hcd(U) ehci_hcd(U) tg3(U) e100(U) mii
(U) bonding(U) sg(U) qla2300(U) qla2xxx(U) scsi_transport_fc(U) ext3
(U) jbd(U) sata_nv(U) libata(U) sd_mod(U) scsi_mod(U)
Pid: 6625, comm: tar Not tainted 2.6.9-11.ELcustom
RIP: 0010:[<ffffffff8015a64a>] <ffffffff8015a64a>{kmem_getpages+130}
RSP: 0018:0000010178cd1a08 EFLAGS: 00010013
RAX: ffffffff7fffffff RBX: 000001007ff11cc0 RCX: 000000000000cc61
RDX: 000001007ff11d28 RSI: 0000010005f926c8 RDI: 000001000000f000
RBP: 0000000000000040 R08: 000001016bc1f000 R09: 00000101764f9e80
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000220
R13: 000001007ff11cc0 R14: 0000000000000000 R15: 0000000000000003
FS: 0000002a95563b00(0000) GS:ffffffff80459800(0000) knlGS:
00000000f7fafbb0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000e551 CR3: 0000000000101000 CR4: 00000000000006e0
Process tar (pid: 6625, threadinfo 0000010178cd0000, task
000001000cd717f0)
Stack: 0000000000000246 0000000000000246 000001007ff11d68
ffffffff8015ad87
0000022000000220 0000000000000220 000001007ff11cc0
000001016c3ffce8
0000000000000000 0000000000000000
Call Trace:<ffffffff8015ad87>{cache_alloc_refill+615}
<ffffffff8015aabf>{kmem_cache_alloc+90}
<ffffffff801dc603>{radix_tree_node_alloc+19}
<ffffffff801dc7bf>{radix_tree_insert+254}
<ffffffffa01b3f4b>{:nfs:nfs_update_request+289}
<ffffffffa01b50e5>{:nfs:nfs_updatepage+288}
<ffffffffa01abd0d>{:nfs:nfs_commit_write+78}
<ffffffff8015514c>{generic_file_buffered_write+927}
<ffffffff801302a5>{try_to_wake_up+734} <ffffffff80155733>
{generic_file_aio_write_nolock+732}
<ffffffff801557e4>{generic_file_aio_write+126}
<ffffffffa01abe08>{:nfs:nfs_file_write+177}
<ffffffff80171e99>{do_sync_write+173} <ffffffff801712e8>
{dentry_open_it+284}
<ffffffff80171443>{filp_open+122} <ffffffff801330ae>
{autoremove_wake_function+0}
<ffffffff8018bc8c>{dnotify_parent+34} <ffffffff80171f94>
{vfs_write+207}
<ffffffff8017207c>{sys_write+69} <ffffffff8011003e>
{system_call+126}
Code: 48 8b 91 f0 18 00 00 76 07 b8 00 00 00 80 eb 0a 48 b8 00 00
RIP <ffffffff8015a64a>{kmem_getpages+130} RSP <0000010178cd1a08>
CR2: 000000000000e551
Mc Carthy, Fergal
2006-May-19 07:36 UTC
[Lustre-discuss] Patching 2.6.9-11 kernel with Lustre 1.4.5.1
The ldiskfs-*.series file would have been used automatically to create
the Lustre ldiskfs.ko. When you build Lustre, it takes a copy of the
ext3 directory from the main kernel and applies the patches to create
the ldiskfs sources.
The problem you have here appears to be some sort of memory corruption
issue which your NFS client is tripping over.
I am not familiar with this issue, but one thought I would suggest is
to try tuning down the max_cached_mb setting for the Lustre file
system; it defaults to 3/4 of available memory, and if you are reading
a large amount of data and transferring it via NFS, that could be
causing memory-pressure issues between NFS and Lustre.
This setting can be found as /proc/fs/lustre/llite/fsX/max_cached_mb,
where X is the mounted Lustre file system index, always 0 if you are
only mounting a single FS.
Try tuning it down to 1/2 or 1/3 of memory and see if that helps...
It should reduce the likelihood of Lustre and NFS fighting over buffer
cache space.
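The sizing advice above can be sketched as a small shell snippet. The /proc path is the one named in this message; the default memory size used here (8 GiB) is purely illustrative — on a real client you would read it from /proc/meminfo. The snippet prints the tuning command rather than running it, so it is safe to try anywhere:

```shell
#!/bin/sh
# Sketch: cap Lustre's client cache at half of RAM, per the advice above.
# MEM_KB defaults to 8 GiB for illustration; on a real node use
#   MEM_KB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
MEM_KB=${MEM_KB:-8388608}
HALF_MB=$((MEM_KB / 1024 / 2))
# Print (rather than execute) the tuning command:
echo "echo $HALF_MB > /proc/fs/lustre/llite/fs0/max_cached_mb"
```

With the 8 GiB default this prints `echo 4096 > /proc/fs/lustre/llite/fs0/max_cached_mb`; run the printed command as root on the client to apply it.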
Also, I suggest searching bugzilla.lustre.org to see if there is a
similar signature matching this oops.
Fergal.
--
Fergal.McCarthy@HP.com
(The contents of this message and any attachments to it are confidential
and may be legally privileged. If you have received this message in
error you should delete it from your system immediately and advise the
sender. To any recipient of this message within HP, unless otherwise
stated, you should consider this message and attachments as "HP
CONFIDENTIAL".)
-----Original Message-----
From: lustre-discuss-admin@lists.clusterfs.com
[mailto:lustre-discuss-admin@lists.clusterfs.com] On Behalf Of Selvi
Kadirvel
Sent: 27 January 2006 18:46
To: lustre-discuss@lists.clusterfs.com
Cc: Selvi Kadirvel
Subject: RE: [Lustre-discuss] Patching 2.6.9-11 kernel with Lustre
1.4.5.1
Selvi Kadirvel
2006-May-19 07:36 UTC
[Lustre-discuss] Patching 2.6.9-11 kernel with Lustre 1.4.5.1
Thanks again, but I think I haven't explained my situation well. I have
not installed the Lustre file system at all. This is only a
Lustre-patched kernel, and it doesn't seem to be stable. Do you suggest
I try to install Lustre even before ensuring that the kernel I have
built is fine?
-Selvi
On Jan 27, 2006, at 2:04 PM, Mc Carthy, Fergal wrote:
> The ldiskfs-*.series file would have been used automatically to create
> the lustre ldiskfs.ko.
Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] Patching 2.6.9-11 kernel with Lustre 1.4.5.1
On Jan 27, 2006 13:45 -0500, Selvi Kadirvel wrote:
> Does anyone know what is happening here? Do I *have* to use the
> ldiskfs-2.6-rhel4.series? (I used the 2.6-rhel4.series of patches
> since I got some errors using the latter during the make modules step)
There may be some confusion here. The ldiskfs series is not to be
applied to your kernel directly. Instead, Lustre uses this series
internally to build a separate filesystem module (ldiskfs), which is
just a patched ext3.
I haven't seen this kind of stack before.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
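The point made above — that the series is applied to a *copy* of ext3, leaving the kernel tree untouched — can be illustrated with a toy example. The file names, the one-line patch, and the series file below are all invented for illustration; they only mimic the shape of the real build:

```shell
#!/bin/sh
# Toy illustration of a "series" build like ldiskfs: copy the pristine
# sources aside and apply each patch listed in a series file, leaving
# the original tree untouched. All names here are hypothetical.
set -e
work=$(mktemp -d)
mkdir -p "$work/ext3" "$work/patches"
echo "original" > "$work/ext3/super.c"
# One trivial patch, plus the series file that lists it:
cat > "$work/patches/rename.patch" <<'EOF'
--- a/super.c
+++ b/super.c
@@ -1 +1 @@
-original
+patched
EOF
echo "rename.patch" > "$work/patches/series"
cp -a "$work/ext3" "$work/ldiskfs"     # pristine ext3 stays intact
while read -r p; do
    patch -p1 -d "$work/ldiskfs" < "$work/patches/$p"
done < "$work/patches/series"
cat "$work/ldiskfs/super.c"   # -> patched
cat "$work/ext3/super.c"      # -> original
rm -rf "$work"
```

The copied tree ends up patched while the source tree is unchanged, which is why the ldiskfs series never needs to be applied to the kernel itself.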
Mc Carthy, Fergal
2006-May-19 07:36 UTC
[Lustre-discuss] Patching 2.6.9-11 kernel with Lustre 1.4.5.1
In that case I would suggest that you try the CFS pre-built RHEL4u1
2.6.9-11 kernel (available from the same place you downloaded the
original Lustre 1.4.5, I believe) and see if it hits the same problem.
If it doesn't see the problem, then it is likely that you need to
revisit your own built kernel. If it does suffer from the same problem,
then I suggest, if you haven't already done it, checking the CFS
bugzilla to see if it is known and possibly fixed.
Fergal.
--
Fergal.McCarthy@HP.com
Selvi Kadirvel
2006-May-19 07:36 UTC
[Lustre-discuss] Patching 2.6.9-11 kernel with Lustre 1.4.5.1
All,
I am trying to patch the 2.6.9-11 kernel with the kernel patches of
Lustre 1.4.5.1. I am using the config file provided with this version
of Lustre. I am getting the following warning messages when I do a
modules_install:
WARNING: /lib/modules/2.6.9-11.ELcustom/kernel/fs/ext3/ext3.ko needs unknown symbol init_ext3_proc
WARNING: /lib/modules/2.6.9-11.ELcustom/kernel/fs/ext3/ext3.ko needs unknown symbol exit_ext3_proc
WARNING: /lib/modules/2.6.9-11.ELcustom/kernel/fs/ext3/ext3.ko needs unknown symbol __d_rehash
WARNING: /lib/modules/2.6.9-11.ELcustom/kernel/fs/ext3/ext3.ko needs unknown symbol __d_move
If I go ahead and boot into this kernel, it panics with a
symbol-undefined error message. Has anyone faced a similar problem
and/or know of a fix for this?
Thanks,
Selvi
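One way to investigate "needs unknown symbol" warnings like these is to check whether the kernel build actually exports the symbols the module needs (the Lustre-patched ext3 expects exports such as init_ext3_proc and __d_rehash that a stock tree lacks). The snippet below simulates a tiny Module.symvers file to show the check; on the real box you would point SYMVERS at the build tree's actual Module.symvers, and the file contents here are an assumption for illustration:

```shell
#!/bin/sh
# Sketch: compare symbols a module needs against the kernel's exported
# symbol list (Module.symvers). Here we fabricate a minimal symvers
# file for illustration; really you would use something like
#   SYMVERS=/usr/src/linux-2.6.9-11/Module.symvers
SYMVERS=$(mktemp)
printf '0x12345678\tprintk\tvmlinux\tEXPORT_SYMBOL\n' > "$SYMVERS"
for sym in init_ext3_proc __d_rehash; do
    if ! grep -qw "$sym" "$SYMVERS"; then
        echo "missing: $sym"   # kernel does not export this symbol
    fi
done
rm -f "$SYMVERS"
```

A missing export here would suggest the kernel was built without the corresponding Lustre patch applied, which matches the boot-time panic described above.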