thr3ads.net - samba - Samba and HSM [Jan 2001]


acherry@pobox.com

2001-Jan-09 20:22 UTC

Samba and HSM

Hello-

We've got a bit of a variation of the "multiple smbd processes on an
NFS-mounted filesystem" problem - I'm wondering if someone can be of
help.  In this case, it's not NFS that's the problem, but HSM (which
can cause symptoms similar to a non-responding NFS filesystem).

We are currently running Samba 2.0.6 on our Sun fileservers, primarily
for home directories and group data.  All of the filesystems that are
accessed via Samba are managed by Veritas Storage Migrator (HSM),
using optical and tape (in that order) as secondary storage.

The main symptom we are experiencing is with multiple smbd processes
per user.  The severity of the problem varies depending on the root
cause.  Here are the scenarios that come up:

1. Access to files migrated to tape (blocks smbd)

Files migrated to tape can take as long as 10 minutes to retrieve.
While attempting to access such a file, the Windows NT redirector
times out after about 45 seconds and opens up a new connection
(spawning a second smbd process).  This repeats with each timeout
until retrieval is complete.  The main symptom of this problem is
that users get sharing violations when trying to access their own
files: the blocked smbd process holds locks on other files, and the
"new" smbd process cannot work with these locks.  One thing that may
help here is to increase the redirector timeout so that it waits longer
(if I can ever get our NT admin folks to push out the REG file to all
the clients!).  Unfortunately, this is the least serious of our problems.
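To put rough numbers on scenario 1, here is a small Python model. It assumes (an assumption, not measured behaviour) that each redirector timeout that fires before the recall finishes spawns exactly one extra smbd, and that none of them exit until the recall completes; the 600-second recall and 45-second timeout are the figures quoted above.

```python
import math


def smbd_count(recall_s: float, timeout_s: float) -> int:
    """Estimated smbd processes for one user while one migrated file is recalled.

    Assumed model: the original smbd plus one new smbd for every redirector
    timeout that fires strictly before the recall completes.
    """
    if recall_s <= 0:
        return 1
    # Timeouts fire at timeout_s, 2*timeout_s, ...; count those < recall_s.
    reconnects = max(0, math.ceil(recall_s / timeout_s) - 1)
    return 1 + reconnects


if __name__ == "__main__":
    print(smbd_count(600, 45))  # 10-minute tape recall, 45 s NT timeout -> 14
```

Under this model, one open of one tape-resident file leaves that user with about 14 smbd processes, each a candidate for the lock conflicts described above. Raising the redirector timeout to 600 seconds brings the estimate back to 1.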


2. Full filesystems cause many smbd processes to appear

This is similar to scenario 1, except that access to an entire filesystem
is blocked until the migration system has reclaimed space (i.e. migrated
files out in response to a full disk).  In this case, the scenario
above happens to every user accessing that filesystem until the space
situation is resolved.  How long that takes depends on the migration
criteria, the responsiveness of the secondary media, and sysadmin
response time.


3. General HSM failure

This is obviously the worst situation: the HSM system stops
responding due to some failure, access to all filesystems is blocked,
and the smbd process count doubles after 45 seconds, then continues
to grow by that same amount every 45 seconds until the underlying
problem is fixed.  With 1500+ smbd processes using up all of the system
memory and process slots, fixing the underlying problem becomes difficult.
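Read carefully, the growth in scenario 3 is linear rather than exponential: the count doubles after the first 45 seconds and then grows by the original amount each period. A minimal Python sketch of that reading follows; the baseline session count is a parameter, and the 100 used in the demo is hypothetical since the real figure isn't given.

```python
def total_smbd(base: int, elapsed_s: int, timeout_s: int = 45) -> int:
    """smbd processes after elapsed_s seconds of a complete HSM outage.

    Assumed model of the behaviour described above: each of the `base`
    blocked sessions gains one new smbd per redirector timeout, so the total
    is 2x after one timeout, 3x after two, and so on.
    """
    return base * (1 + elapsed_s // timeout_s)


def seconds_until(base: int, limit: int, timeout_s: int = 45) -> int:
    """Earliest elapsed time (in seconds) at which total_smbd reaches limit."""
    timeouts_needed = max(0, -(-limit // base) - 1)  # ceil(limit / base) - 1
    return timeouts_needed * timeout_s


if __name__ == "__main__":
    # Hypothetical baseline of 100 sessions.
    print(seconds_until(100, 1500))  # -> 630
```

With 100 sessions, the 1500-process mark arrives after 630 seconds, about ten and a half minutes of outage.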


Does anyone have ANY suggestions for getting around this problem?  The
main problem here is that the client is allowed to time out and make
Samba fork off another smbd process.  One suggestion I've seen is
to set keepalives in the smb.conf file (e.g. keepalive = 30), but
whether this will work depends on which process handles the
keepalives.  If the child smbd processes handle the keepalives,
it probably won't help matters, since smbd won't be able to send or
receive keepalives while it is blocked in a read() or write() system
call (which is what happens when HSM cannot immediately satisfy a
request).
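The concern about blocked children can be illustrated with a toy model. This is not Samba's actual main loop; it assumes a single-threaded process that only checks its keepalive timer between requests, so a keepalive that falls due during a blocking read() is deferred until the read returns. The 30-second interval matches the smb.conf example above; the 600-second read stands in for a tape recall.

```python
def keepalive_times(blocking_calls: list[tuple[float, float]],
                    interval: float, horizon: float) -> list[float]:
    """Times at which a single-threaded loop actually sends keepalives.

    blocking_calls: non-overlapping (start, duration) pairs, in seconds,
    during which the process is stuck in a system call.
    """
    calls = sorted(blocking_calls)
    sent = []
    due = interval
    while due <= horizon:
        # A keepalive that falls due mid-call slips to the end of that call.
        for start, duration in calls:
            if start <= due < start + duration:
                due = start + duration
        if due > horizon:
            break
        sent.append(due)
        due += interval
    return sent


if __name__ == "__main__":
    # One 600 s blocking read starting at t=0, keepalive = 30, 700 s window.
    print(keepalive_times([(0, 600)], interval=30, horizon=700))
    # -> [600, 630, 660, 690]
```

In this model the client hears nothing for the first 600 seconds, far longer than the 45-second redirector timeout, which is why keepalives sent only by the blocked child would not be expected to help.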

Oh, to make matters worse, this is a two-node SunCluster HA cluster,
with separate Samba configs per logical host (binding to separate
logical interfaces).  This means it's not unusual for a user to have
two smbd processes running when both logical hosts are failed over to
the same physical host...

Any hints??  We're probably the only site insane enough to combine
Samba + HSM + SunCluster.. :-) :-S

-Andrew Cherry
 UNIX System Admin
 Cummins Engine Company

Gerald Carter

2001-Jan-11 04:40 UTC


Samba and HSM

acherry@pobox.com wrote:
> Any hints??  We're probably the only site insane enough 
> to combine Samba + HSM + SunCluster.. :-) :-S

Andrew,

I think you are the only site I have ever heard of trying 
to do this.  :)  Sorry.  Wish I had more information for you.

Cheers, jerry
----------------------------------------------------------------------
   /\  Gerald (Jerry) Carter                     Professional Services
 \/    http://www.valinux.com/  VA Linux Systems   gcarter@valinux.com
       http://www.samba.org/       SAMBA Team          jerry@samba.org
       http://www.plainjoe.org/                     jerry@plainjoe.org

       "...a hundred billion castaways looking for a home."
                                - Sting "Message in a Bottle" ( 1979 )

David Collier-Brown

2001-Jan-11 12:45 UTC


Samba and HSM

Andrew Cherry wrote:
| Files migrated to tape can take as long as 10 minutes to
| retrieve. While attempting to access such a file, the Windows
| NT redirector times out after about 45 seconds, and opens up 
| a new connection (spawning a second smbd process).  

	Unfortunately, NT doesn't realize that it should tell
	the server that it's disconnecting.  It probably
	assumes that the server has crashed.

	Samba can't do much about this, but it can be told to
	clean up the old (now disconnected) smbd process. Do try
	"keepalive = 60", and we'll see if the code path
	allows keepalive processing to run while there is a
	read outstanding. 

	If not, we'll probably have to raise this on samba-technical
	and see if there's a way to do so. Hmmm..., or perhaps a 
	way to detect excessive HSM/NFS delay at read-time.
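One possible shape for detecting excessive delay at read-time, sketched in Python: hand the read to a worker thread and flag it as slow if it has not completed within a threshold. The threshold, the `read_fn` callable, and what a server would do with the "slow" flag are all assumptions for illustration; nothing like this is claimed to exist in Samba 2.0.6.

```python
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def timed_read(read_fn, threshold_s: float):
    """Run read_fn() in a worker thread and report whether it was slow.

    Returns (data, slow). The read is never cancelled: if it is still
    blocked after threshold_s we mark it slow and keep waiting for it.
    """
    future = _pool.submit(read_fn)
    try:
        return future.result(timeout=threshold_s), False
    except concurrent.futures.TimeoutError:
        # Hypothetical hook: a server could log, answer keepalives, or warn
        # the client here instead of sitting silently in read().
        return future.result(), True


if __name__ == "__main__":
    def slow_recall():
        time.sleep(0.2)  # stand-in for an HSM recall from tape
        return b"late"

    print(timed_read(slow_recall, threshold_s=0.05))  # -> (b'late', True)
```

This only detects the delay; the worker thread is still stuck in the system call, so limiting the process count would need a separate policy on top.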

--dave	
-- 
David Collier-Brown,           | Always do right. This will gratify 
Performance & Engineering Team | some people and astonish the rest.
Americas Customer Engineering  |                      -- Mark Twain
(905) 415-2849                 | davecb@canada.sun.com

Bryan Feir

2001-Jan-11 21:08 UTC


Samba and HSM

David Collier-Brown <David.Collier-Brown@canada.sun.com> wrote:
> Andrew Cherry wrote:
> | Files migrated to tape can take as long as 10 minutes to
> | retrieve. While attempting to access such a file, the Windows
> | NT redirector times out after about 45 seconds, and opens up 
> | a new connection (spawning a second smbd process).  
> 
> 	...
> 
> 	If not, we'll probably have to raise this on samba-technical
> 	and see if there's a way to do so. Hmmm..., or perhaps a 
> 	way to detect excessive HSM/NFS delay at read-time.

   Well, in theory that's possible on the NFS side at least.  NFSv3 has an
error code specifically for this case: NFS3ERR_JUKEBOX.  From RFC 1813:

   The server initiated the request, but was not able to complete it in a
   timely fashion. The client should wait and then try the request with a
   new RPC transaction ID.  For example, this error should be returned from
   a server that supports hierarchical storage and receives a request to
   process a file that has been migrated.  In this case, the server should
   start the immigration process and respond to client with this error.

The proposed NFSv4 has a nearly identical code called NFS4ERR_DELAY.

   So as long as both the NFS client and server fully support NFSv3/4, the
client can find out when the HSM server is trying to locate a file.  Now, of
course, getting that information to a userland program like Samba is another
matter entirely...
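For the NFS side, the client behaviour RFC 1813 asks for (wait, then retry with a new RPC transaction ID) can be sketched as below. NFS3ERR_JUKEBOX is 10008 in RFC 1813; the `send(xid, request)` transport callable and the backoff parameters are hypothetical, and none of this by itself gets the information to a userland program like Samba.

```python
import itertools
import time

NFS3_OK = 0
NFS3ERR_JUKEBOX = 10008  # RFC 1813: request accepted but not finished yet

_next_xid = itertools.count(1)


def call_with_jukebox_retry(send, request, delay_s=1.0, max_delay_s=60.0,
                            max_attempts=20, sleep=time.sleep):
    """Issue request via the hypothetical send(xid, request) -> (status, reply).

    On NFS3ERR_JUKEBOX, wait (doubling the delay up to max_delay_s) and
    retry with a fresh transaction ID, as RFC 1813 asks of clients.
    """
    delay = delay_s
    for _ in range(max_attempts):
        xid = next(_next_xid)  # new xid each time, not a retransmission
        status, reply = send(xid, request)
        if status != NFS3ERR_JUKEBOX:
            return status, reply
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
    raise TimeoutError(f"still NFS3ERR_JUKEBOX after {max_attempts} attempts")
```

Injecting `sleep` keeps the retry policy testable without real waits.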

---------------------------+---------------------------------------------------
Bryan Feir           VA3GBF|"A wrangle is the disinclination of two boarders to
Work:bryan@sgl.crestech.ca | each other that meet together but are not in the
Home:jenora@sympatico.ca   | same line."              -- Stephen Leacock
---------------------------+---------------------------------------------------

David Collier-Brown

2001-Jan-12 20:29 UTC


Samba and HSM

Andrew Cherry wrote:
| This of course goes under the assumption that each user/client
| combination should have only one smbd process.  Can anyone think of
| any situations where a single user logged onto an NT workstation would
| have more than one SMB connection open to the same server?

	Yes, but the SMB spec **specifically** says you're
	allowed to restrict to just one.

	The two-connection case is rare, and NT tries to
	keep you from making more than one connection. 
	Two connections were once used here for secretaries
	connecting as both themselves and as their bosses...
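The "one connection per user" restriction could be enforced with bookkeeping along these lines. This is a hypothetical Python sketch, not a Samba 2.0.x feature: the key includes the server address the client connected to, so the two logical hosts in the SunCluster setup still get one smbd each, and what to do with the stale pid (for example, terminating it to release its locks) is left as a policy decision.

```python
class SessionRegistry:
    """Hypothetical one-smbd-per-(user, client, server address) bookkeeping."""

    def __init__(self) -> None:
        self._owner: dict[tuple[str, str, str], int] = {}

    def register(self, user: str, client: str, server: str, pid: int):
        """Record pid as the live session; return the pid it supersedes, if any."""
        key = (user, client, server)
        stale = self._owner.get(key)
        self._owner[key] = pid
        return stale if stale != pid else None

    def unregister(self, pid: int) -> None:
        """Forget a process that has exited."""
        self._owner = {k: v for k, v in self._owner.items() if v != pid}
```

With a registry like this, the smbd spawned by the redirector's reconnect would identify the blocked one as stale instead of contending with its locks.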

	Give me a call or send me your phone number: we should
	talk about this by voice...

--dave (wearing his Sun hat) c-b
-- 
David Collier-Brown,           | Always do right. This will gratify 
Performance & Engineering Team | some people and astonish the rest.
Americas Customer Engineering  |                      -- Mark Twain
(905) 415-2849                 | davecb@canada.sun.com
