Johannes Amorosa | Celluloid VFX
2016-Apr-05 07:28 UTC
[Samba] Debugging Samba4 - application sometimes fails because files are invisible/gone
Hello Samba list,

we have a problem: our proprietary application sometimes can't find files on our Samba share. I'm hoping for some help on this list.

Our setup is two AD domain controllers replicating each other (Ubuntu 12.04.5 LTS, Version 4.1.17-SerNet-Ubuntu-10.precise), several domain members acting as file servers, and mixed clients (~40 x Win7, Ubuntu and OSX). The ADs use internal DNS.

We have proprietary software that runs as a cluster and needs a common shared network volume. This volume is on a domain member (Ubuntu 12.04.5 LTS, Version 4.1.17-SerNet-Ubuntu-10.precise) with a ZFS RAID, version 0.6.3.

Authentication is done via PAM and works fine. All tests described here [1] succeed, and we have been using this setup in production for over a year.

Problem: Sometimes (1-2 times a month) our application fails with an error message like:

    \\cell-dead-01\deadlinerepo\jobs\56fe4a61b9baa917e4169c31\DraftCreateMovie.py (System.IO.FileNotFoundException)

although the file exists and has the same ACL as everything else:

    /silo/deadlinerepo/jobs/56fe4a61b9baa917e4169c31/DraftCreateMovie.py

We know that ZFS is maybe not production ready and needs to be upgraded to at least 0.6.5.6. We should upgrade Samba as well, to at least 4.2.X. This will hopefully be done in May. It's possible we hit a bug in the application itself.

Meanwhile I'm trying to make sense of the Samba log files and am basically failing because they are so spammy. I configured vfs_full_audit to get to the bottom of these issues and see who is responsible. I'm seeing a lot of errors and want to know what to make of them. In one day audit.log grew to 35 MB.

Here are some snippets:

    deadlinerepo|is_offline|fail (Operation not supported)|scripts/Submission/HServerSubmission.py
    deadlinerepo|translate_name|fail (Operation not supported)
    deadlinerepo|sys_acl_get_file|fail (Operation not supported)|scripts/Submission
    deadlinerepo|open|ok|r|custom/scripts/Submission
    deadlinerepo|realpath|fail (No such file or directory)|custom/events/Draft

Interestingly enough, the app runs perfectly most of the time, but when this happens it ruins a day of computation, and deadlines are always super tight, meaning overtime for some of us. Can someone shed some light on this? Thank you for your time.

Joe

Domain DC config:

    [global]
    ...
    server role = active directory domain controller
    ...

Domain member config:

    [global]
    ...
    security = ADS
    realm = MOO.NET
    encrypt passwords = yes
    ...
    full_audit:prefix = %u|%I|%S
    full_audit:success = open opendir
    full_audit:failure = all !open
    full_audit:facility = local5
    full_audit:priority = notice
    ...

    [deadlinerepo]
    ...
    read only = no
    path = /silo/deadlinerepo
    comment = Deadline Repository
    veto files = /._*/.DS_Store/.Trash*/.TemporaryItems/desktop.ini/.apdisk/
    guest ok = yes
    force user = moo
    browseable = yes
    vfs objects = full_audit
    ...

[1] https://wiki.samba.org/index.php/Setup_a_Samba_Active_Directory_Domain_Controller

--
Johannes Amorosa | Celluloid VFX
Celluloid Visual Effects GmbH & Co. KG
Paul-Lincke-Ufer 39/40, 10999 Berlin
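One way to cut through an audit log of this size is to tally the failures by operation and error text, so that rare entries stand out from the bulk. The following is only a minimal sketch in Python, not anything shipped with Samba: the function names are hypothetical, and it assumes each line is pipe-separated with the operation name immediately before an `ok` or `fail (...)` field, as in the snippets above. Treating "Operation not supported" as ignorable is likewise just an assumption made for illustration; real syslog lines also carry a timestamp and the `%u|%I|%S` prefix, which end up in the `prefix` fields here.

```python
import re
from collections import Counter

# Matches the result field: "ok" or "fail (some error text)".
RESULT_RE = re.compile(r"^(ok|fail)(?: \((.*)\))?$")

# The snippets quoted in the post, used as sample input.
SAMPLE = [
    "deadlinerepo|is_offline|fail (Operation not supported)|scripts/Submission/HServerSubmission.py",
    "deadlinerepo|translate_name|fail (Operation not supported)",
    "deadlinerepo|sys_acl_get_file|fail (Operation not supported)|scripts/Submission",
    "deadlinerepo|open|ok|r|custom/scripts/Submission",
    "deadlinerepo|realpath|fail (No such file or directory)|custom/events/Draft",
]


def parse_audit_line(line):
    """Split one audit line into prefix, operation, result and arguments.

    Returns None if no "ok"/"fail (...)" field is found.
    """
    fields = line.strip().split("|")
    for i, field in enumerate(fields):
        match = RESULT_RE.match(field)
        if match and i >= 1:
            return {
                "prefix": fields[: i - 1],   # everything before the operation
                "op": fields[i - 1],         # operation name, e.g. "realpath"
                "ok": match.group(1) == "ok",
                "error": match.group(2),     # None for successful calls
                "args": fields[i + 1:],      # paths, open mode, ...
            }
    return None


def summarize_failures(lines, ignore_errors=()):
    """Count failed calls per (operation, error), skipping ignored errors."""
    counts = Counter()
    for line in lines:
        entry = parse_audit_line(line)
        if entry is None or entry["ok"]:
            continue
        if entry["error"] in ignore_errors:
            continue
        counts[(entry["op"], entry["error"])] += 1
    return counts
```

Calling `summarize_failures(SAMPLE, ignore_errors={"Operation not supported"})` on the five sample lines leaves only the `realpath` failure, which is the kind of entry one would then correlate by time with the application's FileNotFoundException.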
Jeremy Allison
2016-Apr-08 00:01 UTC
[Samba] Debugging Samba4 - application sometimes fails because files are invisible/gone
On Tue, Apr 05, 2016 at 09:28:12AM +0200, Johannes Amorosa | Celluloid VFX wrote:
> [...]
> Can someone shed some light on this? Thank you for your time.
> Joe

Sorry, but there's not enough info for us to determine what might be the problem. Getting it repeatable will be the first step.
Johannes Amorosa | Celluloid VFX
2016-Apr-08 09:17 UTC
[Samba] Debugging Samba4 - application sometimes fails because files are invisible/gone
On 04/08/2016 02:01 AM, Jeremy Allison wrote:
> Sorry, but there's not enough info for us to determine what might be the problem. Getting it repeatable will be the first step.

Thank you, Jeremy, for answering my post. I have upgraded all our DCs and file servers to 4.2 in the hope of not hitting that bug again. The ZFS upgrade requires a reboot; we have a window next week.

Unfortunately, after the upgrade my audit log stays empty.

--
Johannes Amorosa | Celluloid VFX
Celluloid Visual Effects GmbH & Co. KG
Paul-Lincke-Ufer 39/40, 10999 Berlin
phone +49 (0)30 / 54 735 220
fax +49 (0)30 / 54 735 221
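Because the member config in the original post sends full_audit output through syslog (facility local5, priority notice), an empty audit log could come from the syslog routing as much as from the module itself. A minimal sketch for telling the two apart is to emit a test message at the same facility and priority and see whether it arrives in the log file. This assumes a Unix host with a syslog daemon whose rules route local5 to audit.log, which the post does not show; the function name and the `audit-probe` tag are made up for illustration.

```python
import syslog


def send_audit_probe(tag="audit-probe", message="full_audit routing test"):
    """Send one message at local5.notice and return the priority used.

    The facility and priority mirror full_audit:facility and
    full_audit:priority from the posted smb.conf.
    """
    priority = syslog.LOG_LOCAL5 | syslog.LOG_NOTICE
    syslog.openlog(ident=tag)
    syslog.syslog(priority, message)
    syslog.closelog()
    return priority
```

If the probe message shows up in the audit log but Samba's entries do not, the syslog side is probably fine and the share's `vfs objects` and `full_audit:*` settings after the upgrade are the next place to look. If the probe does not show up either, the syslog rule for local5 is the more likely suspect.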