Hello,

I am using LNetWait (a blocking call) to wait for a particular event. After I receive this event, I break out of the loop that waits for it and proceed, but when another event is added to the event queue the system crashes. I thought LNetPoll would be better, as I could just poll for that particular event without disturbing the event queue, but when I run make it is reported as undefined. Any thoughts?

Thanks,
Ravi
Hi Ravi,

Which version of Lustre/LNet are you using? Are you trying to build some new code on top of LNet? Could you show us some example code, if you don't mind?

By the way, if you are doing this in kernel space, I would suggest using eq_callback (LNetEQAlloc(...eq_callback)) instead of LNetEQPoll/LNetEQWait, since a callback performs better. Polling is not good for performance because all EQs share one single waitq in LNet.

Regards,
Liang
Hello,

Thanks for the reply. I am using lustre-1.8.1.1. I am working on a new module
that is used for some kind of delegation operations, and I am using LNet
operations in this module.

The code that fails is:
    do {
            rc = LNetEQWait(lnet_eq_hd, &ev);
            if (ev.type == LNET_EVENT_PUT)
                    break;
    } while (rc != 0);
Here I am waiting for a PUT event from a client; once it arrives I break out
of the loop and do some operations accordingly. But the next time I perform a
PUT operation (for example) and it gets logged into the event queue, I try
reading that event and the MDS fails.

I also tried using LNetEQPoll in place of LNetEQWait:

    //rc = LNetEQPoll(&lnet_eq_hd, 1, 2000, &ev, &which);

but it is reported as undefined.

Can you please shed some light on the eq_callback function? I haven't found it
in the LNet manual.

The log before crashing shows:
Mar 4 20:35:08 ws11 kernel: type=LNET_EVENT_SEND, pt-idx=53,
mbits=0x1234abcd, rlen=64, mlen=64, md.user_ptr=0xaaaabbbb, hdr-data=0x0
Mar 4 20:35:08 ws11 kernel: status=0, unlnk=0, offset=0, seq=2
// I am assuming it fails here, as everything up to this point prints fine. I
// also want to mention that this operation itself is successful.
Mar 4 20:35:27 ws11 kernel: BUG: soft lockup - CPU#0 stuck for 10s!
[insmod:4609]
Mar 4 20:35:27 ws11 kernel: CPU 0:
Mar 4 20:35:27 ws11 kernel: Modules linked in: tmod(U) ksocklnd(U) ko2iblnd(FU)
lnet(U) libcfs(U) autofs4(U) hidp(U) nfs(U) fscache(U) nfs_acl(U) rfcomm(U)
l2cap(U) bluetooth(U) lockd(U) sunrpc(U) cpufreq_ondemand(U) acpi_cpufreq(U)
freq_table(U) ip_conntrack_netbios_ns(U) ipt_REJECT(U) xt_state(U)
ip_conntrack(U) nfnetlink(U) iptable_filter(U) ip_tables(U) ip6t_REJECT(U)
xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U) rdma_ucm(U) ib_sdp(U)
rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ipoib_helper(U) ib_cm(U) ib_sa(U)
ipv6(U) xfrm_nalgo(U) crypto_api(U) ib_uverbs(U) ib_umad(U) mlx4_en(U)
mlx4_ib(U) mlx4_core(U) loop(U) dm_multipath(U) scsi_dh(U) video(U) hwmon(U)
backlight(U) sbs(U) i2c_ec(U) button(U) battery(U) asus_acpi(U)
acpi_memhotplug(U) ac(U) lp(U) snd_hda_intel(U) snd_seq_dummy(U) snd_seq_oss(U)
snd_seq_midi_event(U) snd_seq(U) snd_seq_device(U) snd_pcm_oss(U)
snd_mixer_oss(U) ib_mthca(U) snd_pcm(U) snd_timer(U) snd_page_alloc(U) ib_mad(U)
snd_hwdep(U) snd(U) sg(U) ib_core(U) e100(U) ide_cd(
Mar 4 20:35:27 ws11 kernel: ) mii(U) serio_raw(U) pcspkr(U) i2c_i801(U)
cdrom(U) soundcore(U) parport_pc(U) shpchp(U) i2c_core(U) parport(U)
dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U)
dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U) ata_piix(U) libata(U) sd_mod(U)
scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Mar 4 20:35:27 ws11 kernel: Pid: 4609, comm: insmod Tainted: GF
2.6.18-128.7.1.el5-lustre.1.8.1.1smp-cust #2
Mar 4 20:35:27 ws11 kernel: RIP: 0010:[<ffffffff80064c54>]
[<ffffffff80064c54>] .text.lock.spinlock+0x2/0x30
Mar 4 20:35:27 ws11 kernel: RSP: 0018:ffff810021099d10 EFLAGS: 00000286
Mar 4 20:35:27 ws11 kernel: RAX: 0000000000000002 RBX: 00000000ffffffff RCX:
ffff810021099df8
Mar 4 20:35:27 ws11 kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI:
ffffffff8888a7a0
Mar 4 20:35:27 ws11 kernel: RBP: ffff81003a04e1c0 R08: ffff810021099ddc R09:
000000001234abcd
.......
I hope this helps
Thanks
_______________________________________________
Lustre-devel mailing list
Lustre-devel at lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-devel
Ravi,

I think the soft lockup is probably because the thread is polling the EQ and
expecting EVENT_PUT while there are a lot of EVENT_SEND events (does the server
keep calling LNetPut or LNetGet with an MD on the same EQ?). The loop then
becomes a busy loop that is always trying to take LNET_LOCK to poll for a new
event, so the kernel can't schedule the watchdog on some CPUs and raises the
warning.
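
To make the loop shape concrete, here is a small userspace sketch. It is not LNet code: `mock_eq`, `mock_event`, `mock_eq_get` and `wait_for_put` are hypothetical stand-ins invented for illustration, and unlike LNetEQWait the mock returns immediately when the queue is empty instead of blocking. It only shows the idea of checking the return code before reading the event and of consuming non-PUT events (such as SEND completions) on the way to the PUT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the LNet types. */
enum mock_ev_type { MOCK_EVENT_SEND, MOCK_EVENT_PUT };

struct mock_event {
    enum mock_ev_type type;
};

struct mock_eq {
    const struct mock_event *events;  /* pre-filled queue contents */
    size_t count;                     /* number of queued events */
    size_t next;                      /* index of the next event to dequeue */
};

/* Dequeue one event. Returns 1 if *ev was filled, 0 if the queue is empty. */
static int mock_eq_get(struct mock_eq *eq, struct mock_event *ev)
{
    if (eq->next >= eq->count)
        return 0;
    *ev = eq->events[eq->next++];
    return 1;
}

/*
 * Drain events until a PUT is found. Returns the number of non-PUT events
 * skipped before the PUT, or -1 if the queue ran dry without one.
 * The return code is checked before ev is read, so ev is never inspected
 * when nothing was dequeued.
 */
static int wait_for_put(struct mock_eq *eq, struct mock_event *ev)
{
    int skipped = 0;

    while (mock_eq_get(eq, ev) == 1) {
        if (ev->type == MOCK_EVENT_PUT)
            return skipped;
        skipped++;  /* e.g. a SEND completion: consume it and keep going */
    }
    return -1;
}
```

In a real kernel thread the skipped events are exactly what keeps such a loop busy, which is why handing the filtering to a callback and sleeping until a PUT arrives is the lighter design.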
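Pseudo code for eq_callback looks like this:

    /* call init_waitqueue_head(&my_waitq) and INIT_LIST_HEAD(&req_list)
     * before first use */
    wait_queue_head_t my_waitq;
    struct list_head  req_list;

    void my_callback(lnet_event_t *ev)
    {
            if (ev->type != LNET_EVENT_PUT)
                    return;

            /* construct request from data in MD */
            req = ....;
            ...
            add_req_to_queue(req, &req_list);
            wake_up(&my_waitq);
    }

    my_thread()
    {
            wait_queue_t wait;
            ...
            rc = LNetEQAlloc(1024, my_callback, &eqh);
            ...
            while (1) {
                    /* handle everything the callback has queued so far */
                    while (!list_empty(&req_list)) {
                            req = list_entry(req_list.next, ...);
                            list_del(&req->list);
                            handle_request(req);
                    }

                    init_waitqueue_entry(&wait, current);
                    add_wait_queue(&my_waitq, &wait);
                    /* mark the thread as sleeping before the final check,
                     * otherwise schedule() returns right away */
                    set_current_state(TASK_INTERRUPTIBLE);
                    if (list_empty(&req_list))
                            schedule();
                    set_current_state(TASK_RUNNING);
                    remove_wait_queue(&my_waitq, &wait);
            }
    }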
NB: this is just pseudo code, and there should be some locks to protect the
request list. If you want some real code that uses eq_callback, please look
into lnet/selftest/rpc.c.
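
As a rough idea of where the locking mentioned above fits, here is a userspace analogue of the same producer/consumer pattern using POSIX threads. It is only a sketch under stated assumptions: `struct request`, `on_put` and `wait_for_request` are hypothetical names, the list is a simple LIFO for brevity, and a mutex plus condition variable stand in for the kernel wait queue. In kernel code the callback likely runs in a context that must not sleep, so a spinlock would probably take the place of the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical request type; "next" links pending requests (LIFO for brevity). */
struct request {
    int payload;
    struct request *next;
};

static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  req_cond = PTHREAD_COND_INITIALIZER;
static struct request *req_head;  /* protected by req_lock */

/* Producer side: plays the role of the EQ callback for a PUT event. */
static void on_put(struct request *req)
{
    pthread_mutex_lock(&req_lock);
    req->next = req_head;
    req_head = req;
    pthread_cond_signal(&req_cond);  /* wake the handler thread, if sleeping */
    pthread_mutex_unlock(&req_lock);
}

/* Consumer side: sleeps until a request is queued, then removes and returns it. */
static struct request *wait_for_request(void)
{
    struct request *req;

    pthread_mutex_lock(&req_lock);
    while (req_head == NULL)  /* re-check the condition after every wakeup */
        pthread_cond_wait(&req_cond, &req_lock);
    req = req_head;
    req_head = req->next;
    pthread_mutex_unlock(&req_lock);
    return req;
}
```

The list is checked and modified only while the lock is held, so a wakeup cannot be lost between the emptiness check and going to sleep.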
Yes, LNetEQPoll is not exported, so if you really want to use it, just add
this line to lnet/lnet/module.c:

    EXPORT_SYMBOL(LNetEQPoll);

Though LNetEQPoll and LNetEQWait should behave exactly the same in your case.
Regards
Liang