thr3ads.net - Ocfs2 users - [Ocfs2-users] ENOSPC [Mar 2010]


David Johle

2010-Mar-24 00:31 UTC

[Ocfs2-users] ENOSPC

So, in light of prior issues with lock contention caused by writing
Apache logs to shared files, I have started storing them locally on
each node.  I made a script that combines them nightly, before the
statistics generator kicks off its analysis of the previous day's traffic.

This script, using logresolvemerge.pl, writes its output back to the
shared volume for easy reference later.  I figured I would not have
issues with this, as it's a large amount of sequential writes from a
single node at an off-peak time.  However, the merger has been
hanging with high CPU usage.
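
For readers unfamiliar with what the nightly merge does, here is a minimal Python sketch of a timestamp-ordered merge of per-node access logs. It is not logresolvemerge.pl, only an illustration of the same idea. The function names are hypothetical, and it assumes each input file is already in chronological order and uses the bracketed timestamp of the Apache common/combined log format.

```python
import heapq
import re
from datetime import datetime

# Bracketed timestamp as found in Apache common/combined logs,
# e.g. [24/Mar/2010:00:31:05 +0000]. Assumed format, not taken from the thread.
_TS = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def timestamp(line):
    """Return the request time of one log line as a timezone-aware datetime."""
    match = _TS.search(line)
    if match is None:
        raise ValueError("no timestamp in line: %r" % line)
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(sources):
    """Lazily merge iterables of log lines, each already sorted by time."""
    return heapq.merge(*sources, key=timestamp)
```

Because heapq.merge is lazy, open file objects can be passed in directly and the per-node logs are streamed rather than loaded into memory.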

I'm pretty sure I'm running into the famous "free space 
fragmentation" problem, but wanted to confirm that this was the case 
or see if there was additional troubleshooting I can do.

Here's the disk, plenty of overall free space:

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/mpath1   209725440  85311460 124413980  41% /san/live-websites
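
As a sanity check on the figures above, this small Python sketch recomputes the Use% column from the Used and Available columns and shows how the same kind of number can be read with os.statvfs. The helper names are illustrative. Note that both only report aggregate free space; neither reveals how fragmented that space is, which is the suspected problem here.

```python
import os


def use_percent(used_kb, available_kb):
    """Share of the usable space that is occupied, as a float percentage."""
    return 100.0 * used_kb / (used_kb + available_kb)


def free_kb(path):
    """Space available to unprivileged users on the filesystem holding path, in KiB."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize // 1024


# Values copied from the df output above.
print(round(use_percent(85311460, 124413980)))  # 41
```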


While the merge was using 100% of a CPU core, the merged file was
not growing in size and not much I/O was actually happening on the
shared volume, so I ran strace to see what it was doing and got this:

# strace -p 16844
Process 16844 attached - interrupt to quit
read(3, "1\" 200 936 \"http://www.industria"..., 4096) = 4096
write(1, ".NET CLR 1.1.4322; .NET CLR 2.0."..., 4096) = -1 ENOSPC (No space left on device)
read(4, "oration&locationName=South+Jerse"..., 4096) = 4096
write(1, "ivers=8&ngPipelines=600&kvtl230="..., 4096) = -1 ENOSPC (No space left on device)
read(4, "1\" 200 936 \"http://www.industria"..., 4096) = 4096
write(1, "gan+Boulevard&locationCSZ=Salem%"..., 4096) = -1 ENOSPC (No space left on device)
read(3, "HTTP/1.0\" 200 4096 \"-\" \"WinampMP"..., 4096) = 4096
write(1, "elta=.375&zoomlevel=6&label=Sout"..., 4096) = -1 ENOSPC (No space left on device)
read(4, "HTTP/1.0\" 200 4096 \"-\" \"WinampMP"..., 4096) = 4096
write(1, "ident/4.0; .NET CLR 1.1.4322; .N"..., 4096) = -1 ENOSPC (No space left on device)
read(3, "0 36516 \"-\" \"Mozilla/5.0 (compat"..., 4096) = 4096
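
The trace suggests the merger keeps reading and attempting writes after every ENOSPC instead of stopping, which would be consistent with a busy CPU and an output file that never grows. A copy loop along the following lines would fail fast instead. This is a hedged Python sketch, not part of logresolvemerge.pl, and copy_or_abort is a hypothetical name.

```python
import errno
import os


def copy_or_abort(src_fd, dst_fd, chunk_size=4096):
    """Copy src_fd to dst_fd; return bytes written, raising on the first write error."""
    total = 0
    while True:
        chunk = os.read(src_fd, chunk_size)
        if not chunk:
            return total  # end of input
        view = memoryview(chunk)
        while len(view) > 0:
            try:
                written = os.write(dst_fd, view)
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    # Stop at once rather than retrying in a tight loop.
                    raise RuntimeError(
                        "destination reported ENOSPC after %d bytes" % total
                    ) from exc
                raise
            view = view[written:]  # handle short writes
            total += written
```

Raising on the first ENOSPC makes the nightly job exit with an error that can be noticed and alerted on, rather than spinning until someone attaches strace.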


Now I'm really worried about cluster stability, since other routine
writes might start failing soon.  I know the typical workaround is to
reduce the number of node slots, but I don't have any excess slots to
spare.  Are there any other tricks to reduce free space fragmentation?

Sunil Mushran

2010-Mar-25 01:54 UTC


[Ocfs2-users] ENOSPC

Quite a bit of work is ongoing on this front. I'll list all that work
in another email.

Meanwhile, please file a bz (bugzilla report) with the stat_sysdir
output. We'll need that to determine the best way forward.


David Johle

2010-Apr-08 14:46 UTC


[Ocfs2-users] ENOSPC

Just realized I never replied to this, but I had created the bugzilla 
report a little while back.  It can be found at:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1237


