Hello,

I see some oddities on Lustre clients running under Xen DomU. I get messages like this:

Jul 29 14:35:23 quark8-1 kernel: Lustre: Request x2674628 sent from stable-OST0001-osc-ffff8801a72a5000 to NID 147.251.9.9@tcp 100s ago has timed out (limit 100s).
Jul 29 14:35:23 quark8-1 kernel: Lustre: stable-OST0001-osc-ffff8801a72a5000: Connection to service stable-OST0001 via nid 147.251.9.9@tcp was lost; in progress operations using this service will wait for recovery to complete.
Jul 29 14:35:23 quark8-1 kernel: LustreError: 128:0:(ldlm_request.c:1033:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Jul 29 14:35:23 quark8-1 kernel: LustreError: 128:0:(ldlm_request.c:1622:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Jul 29 14:35:23 quark8-1 kernel: Lustre: stable-OST0001-osc-ffff8801a72a5000: Connection restored to service stable-OST0001 using nid 147.251.9.9@tcp.

The network is OK all the time. I tried both 1.6.x and 1.8.x Lustre; all the same.

Moreover, from time to time, the Lustre fs gets stuck in:

 [<ffffffff882c5ef3>] :mdc:mdc_close+0x1e3/0x7a0
 [<ffffffff88332f53>] :lustre:ll_close_inode_openhandle+0x1e3/0x650
 [<ffffffff88333a05>] :lustre:ll_mdc_real_close+0x115/0x370
 [<ffffffff883691e1>] :lustre:ll_mdc_blocking_ast+0x1d1/0x570
 [<ffffffff88186720>] :ptlrpc:ldlm_cancel_callback+0x50/0xd0
 [<ffffffff881a0721>] :ptlrpc:ldlm_cli_cancel_local+0x61/0x350
 [<ffffffff881a2025>] :ptlrpc:ldlm_cancel_lru_local+0x165/0x340
 [<ffffffff881a14c7>] :ptlrpc:ldlm_cli_cancel_list+0xf7/0x380
 [<ffffffff881a2263>] :ptlrpc:ldlm_cancel_lru+0x63/0x1b0
 [<ffffffff881b62d7>] :ptlrpc:ldlm_cli_pool_shrink+0xf7/0x240
 [<ffffffff881b365d>] :ptlrpc:ldlm_pool_shrink+0x2d/0xe0
 [<ffffffff881b48fb>] :ptlrpc:ldlm_pools_shrink+0x25b/0x330
 [<ffffffff8025c705>] shrink_slab+0xe2/0x15a

when the DomU is being suspended (most memory and CPU is stolen by another DomU).

Is this something known or unsupported? (I.e., running Lustre under Xen with domain preemption.)

--
Lukáš Hejtmánek
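The stack above shows the hang inside the ldlm pool shrinker (ldlm_pools_shrink called from shrink_slab), so one workaround to try is capping the client-side lock LRU so the shrinker has less lock-cancellation work to do under memory pressure. A hedged sketch; the namespace glob and the value 100 are examples for illustration, not tuned recommendations:

```shell
# On the Lustre client: cap the number of cached DLM locks per OSC namespace
# so memory pressure triggers fewer cancel RPCs (value is illustrative only).
lctl set_param ldlm.namespaces.*osc*.lru_size=100

# Inspect the current RPC timeout -- the "limit 100s" seen in the log above.
lctl get_param timeout
```

Setting lru_size to a fixed value disables the dynamic LRU sizing, trading cache hit rate for predictability when memory is reclaimed.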
On 2010-11-30, at 12:14, Lukas Hejtmanek wrote:
> I see some oddities on Lustre clients running under Xen DomU ...
> when the DomU is being suspended (most memory and CPU is stolen by another DomU).
>
> Is this something known or unsupported? (I.e., running Lustre under Xen with domain preemption)

Lustre servers expect to always be able to communicate with the clients, and expect the clients to be responsive to their requests; otherwise the clients are evicted by the server to keep the filesystem usable. This makes sense in an HPC environment, where the filesystem is a shared resource, clients fail regularly (due to the huge number of clients), and it is more important for the rest of the clients to continue.

There is a mode in which the servers will NOT expect the clients to be responsive, but then it is the client's responsibility to act accordingly and check for pending server requests before it uses any saved state. That mode is used by the "liblustre" code, but is not implemented for the normal Linux client.

Combining these two modes, for use in VM systems where the Linux client may be unresponsive for long periods of time, might make sense, though it would also add a lot of complexity. I don't think anyone is planning to work on this in the near future.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
On Tue, Nov 30, 2010 at 02:27:18PM -0700, Andreas Dilger wrote:
> Combining these two modes, for use in VM systems where the Linux client may
> be unresponsive for long periods of time might make sense, though it can
> also add a lot of complexity. I don't think anyone is planning to work on
> this in the near future.

Thanks. However, the client is not completely unresponsive; it still has CPU and about 500MB RAM, so it should be OK from the Lustre point of view. Also, the problem arises as soon as I begin to shrink memory, e.g., from 8GB to 0.5GB. Sometimes the Lustre client hangs. Is there any conceptual problem with Lustre and the memory shrinker in Xen?

--
Lukáš Hejtmánek
Hello!

On Dec 1, 2010, at 3:12 AM, Lukas Hejtmanek wrote:
> On Tue, Nov 30, 2010 at 02:27:18PM -0700, Andreas Dilger wrote:
>> Combining these two modes, for use in VM systems where the Linux client may
>> be unresponsive for long periods of time might make sense, though it can
>> also add a lot of complexity. I don't think anyone is planning to work on
>> this in the near future.
> Thanks, however, the client is not completely unresponsive, it still has CPU
> and about 500MB RAM so it should be OK from Lustre point of view.
> Also, the problem arises as soon as I begin to shrink memory, e.g., from 8GB
> to 0.5GB. Sometimes, Lustre client hangs. Is there any conceptual problem with
> Lustre and memory shrinker in Xen?

It all depends on how Xen does the shrinking. If it blocks kernel code from execution for long periods of time in the process, it's essentially the same as if the node is suspended for some time. Even if it were to block only certain threads that are trying to access the to-be-shrunk memory, and those happen to be certain Lustre threads, that would still spell trouble if the blocking persists for significant amounts of time.

I personally use Xen without memory shrinking for my testing and it seems to be working just fine.

Bye,
Oleg
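Following Oleg's suggestion of running without memory shrinking, one way to sketch this is to pin the DomU's allocation so ballooning never takes memory away while Lustre is mounted. A hedged example; the domain name "quark8-1" is taken from the log earlier in the thread, the 8192MB figure is illustrative, and the syntax assumes the classic xm toolstack of that era:

```shell
# Set the DomU's memory target equal to its current allocation so the
# balloon driver has nothing to reclaim (domain name and size are examples).
xm mem-set quark8-1 8192
```

Equivalently, keeping `memory` and `maxmem` equal in the domain's config file prevents the balloon from ever being inflated in that guest.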
On Thu, Dec 02, 2010 at 10:18:51AM -0500, Oleg Drokin wrote:
> It all depends on how Xen does the shrinking. If it blocks kernel code from execution for long periods of time in process,
> it's the same as if the node is suspended for some time essentially.
> Even if it would block only certain threads that are trying to access to-be-shrunk memory and it happens to be certain lustre threads,
> that would still spell trouble if the blocking persists for significant amounts of time.
>
> I personally use XEN without memory shrinking for my testing and it seems to be working just fine.

It could stop all the processes. And it pages the most-used memory out into swap. But maybe there are some problems resulting from unlocked pages in Lustre threads?

--
Lukáš Hejtmánek
Hello!

On Dec 2, 2010, at 10:50 AM, Lukas Hejtmanek wrote:
> On Thu, Dec 02, 2010 at 10:18:51AM -0500, Oleg Drokin wrote:
>> It all depends on how Xen does the shrinking. If it blocks kernel code from execution for long periods of time in process,
>> it's the same as if the node is suspended for some time essentially.
>> Even if it would block only certain threads that are trying to access to-be-shrunk memory and it happens to be certain lustre threads,
>> that would still spell trouble if the blocking persists for significant amounts of time.
> It could stop all the processes. And it pages out most used memory into swap.
> But maybe there are some problems resulting from unlocked pages in lustre
> threads?

Well, I assume Xen plays nice with others and does not just unlock pages it did not lock, because that would upset everybody, not just Lustre. In the case of Lustre, I am sure you'd see tons of tripped assertions. Lustre does not have any important processes running in userspace, but there are some important kernel threads that should not be deprived of CPU for too long.

Bye,
Oleg
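Since the kernel threads Oleg mentions must not be starved of CPU when another DomU is busy, one hedged mitigation sketch is to raise the Lustre client DomU's weight in Xen's credit scheduler so it keeps winning CPU time under contention. The domain name and weight value below are illustrative only, and the syntax again assumes the xm toolstack:

```shell
# Give the Lustre-client DomU a higher credit-scheduler weight (default 256),
# so its kernel service threads continue to get CPU under Dom0/DomU contention.
xm sched-credit -d quark8-1 -w 512
```

This does not address memory ballooning, only CPU preemption; the two effects are independent and both can stall a Lustre client long enough to trigger server-side eviction.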