Sebastian Scholz
2009-Jul-11 07:19 UTC
[Samba] Samba freezes server when writing 50MB MS Word document (sometimes)
Hi list for a couple of days we have the problem that our samba pdc server (ubuntu 8.04 LTS [kernel 2.6.24-24] with 3.0.28a samba package) freezes sometimes when someone saves a specific word document (about 50MB) from MS Word. It's not always the same person. Saving the document - as nearly any huge MS Office document - takes some time, about one minute (guessed). First the server gets really slow. Some user programs on clients fail to write. The CPU usage of the smb process of the user writing the document goes up to nearly 100%. After a while it's not possible to log on to the server - even locally. It asks for a password but there won't be a prompt. Sometimes there is nothing on the console/screen. Hitting a button does not change a thing. It's not possible to kill the process. Another thing is that there are a couple of smbd processes for one user (see attached user smbd log). When rebooting the server it does not shut down properly. I guess the process blocks the harddrive thus it cannot be unmounted. A ALT+Print+u and ALT+Print+b helps. ########################################################################### This is what the syslog says: Jul 3 15:01:44 [..] BUG: soft lockup - CPU#1 stuck for 11s! [smbd:23017] Jul 3 15:01:44 [..] Jul 3 15:01:44 [..] Pid: 23017, comm: smbd Not tainted (2.6.24-24-server #1) Jul 3 15:01:44 [..] EIP: 0060:[<f98c1e0c>] EFLAGS: 00000203 CPU: 1 Jul 3 15:01:44 [..] EIP is at xfs_iext_bno_to_ext+0x8c/0x1f0 [xfs] Jul 3 15:01:44 [..] EAX: 000000ba EBX: 00002404 ECX: 000000ba EDX: 000000b5 Jul 3 15:01:44 [..] ESI: 00000000 EDI: 000051c7 EBP: 00000001 ESP: ee5298a4 Jul 3 15:01:44 [..] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Jul 3 15:01:44 [..] CR0: 8005003b CR2: b7f2f000 CR3: 3577d000 CR4: 000006b0 Jul 3 15:01:44 [..] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Jul 3 15:01:44 [..] DR6: ffff0ff0 DR7: 00000400 Jul 3 15:01:44 [..] [<f989af59>] xfs_bmap_search_multi_extents+0x69/0x100 [xfs] Jul 3 15:01:44 [..] [<f989b05f>] xfs_bmap_search_extents+0x6f/0x150 [xfs] Jul 3 15:01:44 [..] [apic_timer_interrupt+0x28/0x30] apic_timer_interrupt+0x28/0x30 Jul 3 15:01:44 [..] [<f98a4714>] xfs_bmapi+0x1e4/0x16a0 [xfs] Jul 3 15:01:44 [..] [net_rx_action+0x12d/0x210] net_rx_action+0x12d/0x210 Jul 3 15:01:44 [..] [run_timer_softirq+0x30/0x1e0] run_timer_softirq+0x30/0x1e0 Jul 3 15:01:44 [..] [tick_program_event+0x38/0x60] tick_program_event+0x38/0x60 Jul 3 15:01:44 [..] [__do_softirq+0x82/0x110] __do_softirq+0x82/0x110 Jul 3 15:01:44 [..] [<f98c83c9>] xfs_iomap_eof_want_preallocate+0x149/0x220 [xfs] Jul 3 15:01:44 [..] [<f98c8bd3>] xfs_iomap_write_delay+0x153/0x3e0 [xfs] Jul 3 15:01:44 [..] [<f98c7fe7>] xfs_iomap+0x277/0x510 [xfs] Jul 3 15:01:44 [..] [<f98c1eda>] xfs_iext_bno_to_ext+0x15a/0x1f0 [xfs] Jul 3 15:01:44 [..] [<f98c8274>] xfs_iomap+0x504/0x510 [xfs] Jul 3 15:01:44 [..] [<f98e8393>] __xfs_get_blocks+0x73/0x2b0 [xfs] Jul 3 15:01:44 [..] [<f98e8628>] xfs_get_blocks+0x28/0x30 [xfs] Jul 3 15:01:44 [..] [__block_prepare_write+0x228/0x400] __block_prepare_write+0x228/0x400 Jul 3 15:01:44 [..] [sg:__alloc_pages+0x5a/0x380] __alloc_pages+0x5a/0x380 Jul 3 15:01:44 [..] [ext3:__grab_cache_page+0x97/0x1a90] __grab_cache_page+0x97/0xc0 Jul 3 15:01:44 [..] [ext3:block_write_begin+0x4d/0x590] block_write_begin+0x4d/0xf0 Jul 3 15:01:44 [..] [<f98e8600>] xfs_get_blocks+0x0/0x30 [xfs] Jul 3 15:01:44 [..] [<f98e7da6>] xfs_vm_write_begin+0x46/0x50 [xfs] Jul 3 15:01:44 [..] [<f98e8600>] xfs_get_blocks+0x0/0x30 [xfs] Jul 3 15:01:44 [..] [xfs:generic_file_buffered_write+0x117/0xcf0] generic_file_buffered_write+0x117/0x630 Jul 3 15:01:44 [..] [<f98f0880>] xfs_write+0x830/0x910 [xfs] Jul 3 15:01:44 [..] [<f98ec6cc>] xfs_file_aio_write+0x5c/0x70 [xfs] Jul 3 15:01:44 [..] [xfs:do_sync_write+0xd5/0x120] do_sync_write+0xd5/0x120 Jul 3 15:01:44 [..] [<c0145fa0>] autoremove_wake_function+0x0/0x40 Jul 3 15:01:44 [..] [<c0145fa0>] autoremove_wake_function+0x0/0x40 Jul 3 15:01:44 [..] [set_current_groups+0x25e/0x270] set_current_groups+0x25e/0x270 Jul 3 15:01:44 [..] [<c019a2c0>] do_sync_write+0x0/0x120 Jul 3 15:01:44 [..] [vfs_write+0xb9/0x170] vfs_write+0xb9/0x170 Jul 3 15:01:44 [..] [__capable+0x8/0x20] __capable+0x8/0x20 Jul 3 15:01:44 [..] [sys_pwrite64+0x66/0x80] sys_pwrite64+0x66/0x80 Jul 3 15:01:44 [..] [sysenter_past_esp+0x6b/0xa1] sysenter_past_esp+0x6b/0xa1 Jul 3 15:01:44 [..] ======================[.. this repeats x times ...] [.. REBOOT / relevant boot messages ..] Jul 3 15:33:04 [..] [ 115.676800] Driver 'sd' needs updating - please use bus_type methods Jul 3 15:33:04 [..] [ 115.676853] sd 0:0:0:0: [sda] 2927603712 512-byte hardware sectors (1498933 MB) Jul 3 15:33:04 [..] [ 115.676863] sd 0:0:0:0: [sda] Write Protect is off Jul 3 15:33:04 [..] [ 115.676864] sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00 Jul 3 15:33:04 [..] [ 115.676879] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA Jul 3 15:33:04 [..] [ 115.676921] sd 0:0:0:0: [sda] 2927603712 512-byte hardware sectors (1498933 MB) Jul 3 15:33:04 [..] [ 115.676929] sd 0:0:0:0: [sda] Write Protect is off Jul 3 15:33:04 [..] [ 115.676931] sd 0:0:0:0: [sda] Mode Sense: 06 00 10 00 Jul 3 15:33:04 [..] [ 115.676945] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA Jul 3 15:33:04 [..] [ 115.676947] sda: sda1 [.. snipp ..] Jul 3 15:33:04 [..] [ 129.848176] SGI XFS with ACLs, security attributes, realtime, large block numbers, no debug enabled Jul 3 15:33:04 [..] [ 129.848501] SGI XFS Quota Management subsystem Jul 3 15:33:04 [..] [ 129.875533] XFS mounting filesystem sda1 Jul 3 15:33:04 [..] [ 130.036456] Starting XFS recovery on filesystem: sda1 (logdev: internal) Jul 3 15:33:04 [..] [ 131.699948] Ending XFS recovery on filesystem: sda1 (logdev: internal) [.. same error again ..] Jul 3 16:12:25 [..] [ 2488.349286] Filesystem "sda1": Access to block zero in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1cf Jul 3 16:12:25 [..] [ 2488.351994] Filesystem "sda1": Access to block zero in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1cf Jul 3 16:12:25 [..] [ 2488.354591] Filesystem "sda1": Access to block zero in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1cf Jul 3 16:12:25 [..] [ 2488.357180] Filesystem "sda1": Access to block zero in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1cf Jul 3 16:12:35 [..] [ 2499.059013] BUG: soft lockup - CPU#0 stuck for 11s! [smbd:7280] Jul 3 16:12:35 [..] [ 2499.060252] Jul 3 16:12:35 [..] [ 2499.060254] Pid: 7280, comm: smbd Not tainted (2.6.24-24-server #1) Jul 3 16:12:35 [..] [ 2499.060256] EIP: 0060:[<f98aff07>] EFLAGS: 00000282 CPU: 0 Jul 3 16:12:35 [..] [ 2499.060275] EIP is at xfs_bmap_search_multi_extents+0x17/0x100 [xfs] Jul 3 16:12:35 [..] [ 2499.060276] EAX: ef34c038 EBX: ef34c038 ECX: 00000000 EDX: 00001a93 Jul 3 16:12:35 [..] [ 2499.060278] ESI: 00001a93 EDI: f4565d44 EBP: f4565d28 ESP: f4565c04 Jul 3 16:12:35 [..] [ 2499.060279] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Jul 3 16:12:35 [..] [ 2499.060281] CR0: 8005003b CR2: b792fa84 CR3: 351b2000 CR4: 000006f0 Jul 3 16:12:35 [..] [ 2499.060282] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 Jul 3 16:12:35 [..] [ 2499.060283] DR6: ffff0ff0 DR7: 00000400 Jul 3 16:12:35 [..] [ 2499.060293] [<f98b005f>] xfs_bmap_search_extents+0x6f/0x150 [xfs] Jul 3 16:12:35 [..] [ 2499.060315] [apic_timer_interrupt+0x28/0x30] apic_timer_interrupt+0x28/0x30 Jul 3 16:12:35 [..] [ 2499.060322] [<f98b9714>] xfs_bmapi+0x1e4/0x16a0 [xfs] Jul 3 16:12:35 [..] [ 2499.060346] [run_timer_softirq+0x30/0x1e0] run_timer_softirq+0x30/0x1e0 Jul 3 16:12:35 [..] [ 2499.060352] [__do_softirq+0x82/0x110] __do_softirq+0x82/0x110 Jul 3 16:12:35 [..] [ 2499.060359] [<f98b00d8>] xfs_bmap_search_extents+0xe8/0x150 [xfs] Jul 3 16:12:35 [..] [ 2499.060379] [<f99048e5>] xfs_zero_eof+0x185/0x300 [xfs] Jul 3 16:12:35 [..] [ 2499.060405] [<f9905380>] xfs_write+0x330/0x910 [xfs] Jul 3 16:12:35 [..] [ 2499.060429] [skb_release_all+0x54/0x130] skb_release_all+0x54/0x130 Jul 3 16:12:35 [..] [ 2499.060435] [<f99016cc>] xfs_file_aio_write+0x5c/0x70 [xfs] Jul 3 16:12:35 [..] [ 2499.060456] [xfs:do_sync_write+0xd5/0x120] do_sync_write+0xd5/0x120 Jul 3 16:12:35 [..] [ 2499.060462] [<c0145fa0>] autoremove_wake_function+0x0/0x40 Jul 3 16:12:35 [..] [ 2499.060465] [<c0145fa0>] autoremove_wake_function+0x0/0x40 Jul 3 16:12:35 [..] [ 2499.060471] [<c019a2c0>] do_sync_write+0x0/0x120 Jul 3 16:12:35 [..] [ 2499.060473] [vfs_write+0xb9/0x170] vfs_write+0xb9/0x170 Jul 3 16:12:35 [..] [ 2499.060477] [sys_pwrite64+0x66/0x80] sys_pwrite64+0x66/0x80 Jul 3 16:12:35 [..] [ 2499.060479] [sysenter_past_esp+0x6b/0xa1] sysenter_past_esp+0x6b/0xa1 Jul 3 16:12:35 [..] [ 2499.060483] [rt_mutex_slowlock+0x460/0x4b0] rt_mutex_slowlock+0x460/0x4b0 Jul 3 16:12:35 [..] [ 2499.060488] ======================Jul 3 16:12:36 [..] [ 2499.142186] named[5015]: segfault at 005047ff eip b7fd1410 esp 005047ff error 4 Jul 3 16:12:47 [..] [ 2510.849153] BUG: soft lockup - CPU#0 stuck for 11s! [smbd:7280] [..and so on..] ########################################################################### The samba log of the user with the smbd process with pid 23017 says (the main smbd process log does not say anything): [2009/07/02 20:51:03, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service profiles initially as user tomh (uid=2048, gid=1901) (pid 23017) [2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service profiles initially as user tomh (uid=2048, gid=1901) (pid 23017) [2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service profiles initially as user tomh (uid=2048, gid=1901) (pid 23017) [2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service profiles initially as user tomh (uid=2048, gid=1901) (pid 23017) [2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service profiles initially as user tomh (uid=2048, gid=1901) (pid 23017) [2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230) pc_tom (192.168.43.189) closed connection to service profiles [2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230) pc_tom (192.168.43.189) closed connection to service profiles [2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230) pc_tom (192.168.43.189) closed connection to service profiles [2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230) pc_tom (192.168.43.189) closed connection to service profiles [2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230) pc_tom (192.168.43.189) closed connection to service profiles [2009/07/03 15:02:44, 0] lib/util_sock.c:write_data(562) write_data: write failure in writing to client 0.0.0.0. Error Connection reset by peer [2009/07/03 15:02:44, 0] lib/util_sock.c:send_smb(761) Error writing 4 bytes to client. -1. (Connection reset by peer) [2009/07/03 15:02:44, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service tomh initially as user tomh (uid=2048, gid=1901) (pid 11831) [2009/07/03 15:02:44, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service data initially as user tomh (uid=0, gid=1901) (pid 11831) [2009/07/03 15:02:45, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service projekte initially as user tomh (uid=0, gid=1901) (pid 11831) [2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service exchange initially as user tomh (uid=0, gid=1901) (pid 11831) [2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service orga initially as user tomh (uid=0, gid=1901) (pid 11831) [2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service info initially as user tomh (uid=0, gid=1901) (pid 11831) [2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service archiv initially as user tomh (uid=0, gid=1901) (pid 11831) [2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service setup initially as user tomh (uid=2048, gid=1901) (pid 11831) [2009/07/03 15:05:34, 0] lib/util_sock.c:write_data(562) write_data: write failure in writing to client 192.168.43.189. Error Connection reset by peer [2009/07/03 15:05:34, 0] lib/util_sock.c:send_smb(761) Error writing 4 bytes to client. -1. (Connection reset by peer) [2009/07/03 15:05:34, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service tomh initially as user tomh (uid=2048, gid=1901) (pid 11895) [2009/07/03 15:05:38, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service orga initially as user tomh (uid=0, gid=1901) (pid 11895) [2009/07/03 15:05:47, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service projekte initially as user tomh (uid=0, gid=1901) (pid 11895) [2009/07/03 15:07:17, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service projekte initially as user tomh (uid=0, gid=1901) (pid 11920) [2009/07/03 15:07:17, 1] smbd/service.c:make_connection_snum(1033) pc_tom (192.168.43.189) connect to service tomh initially as user tomh (uid=2048, gid=1901) (pid 11920) ########################################################################### Parts of the smb.conf: [global] unix charset = utf-8 workgroup = XXXXXXXX netbios name = XXXXXXXX server string = XXXXXXXX Domain Server enable privileges = yes interfaces = eth0,lo bind interfaces only = Yes passdb backend = ldapsam:ldapi://%2fvar%2frun%2fslapd%2fldapi log level = 1 syslog = 0 log file = /var/log/samba/%m max log size = 5000 smb ports = 137 138 139 445 name resolve order = wins bcast hosts time server = Yes printcap name = CUPS # wins support = Yes dns proxy = Yes ldap suffix = dc=ebox ldap machine suffix = ou=Computers ldap user suffix = ou=Users ldap group suffix = ou=Groups ldap idmap suffix = ou=Idmap ldap admin dn = cn=admin,dc=ebox map acl inherit = Yes printing = cups encrypt passwords = Yes obey pam restrictions = No ldap passwd sync = Yes mangling method = hash2 logon script = logon.bat logon drive = I: logon home logon path = \\XXXXXXXX\profiles\%U domain logons = Yes os level = 65 preferred master = Yes domain master = Yes add user script = /usr/sbin/smbldap-useradd -m "%u" ldap delete dn = Yes add machine script = /usr/sbin/smbldap-useradd -w "%u" add group script = /usr/sbin/smbldap-groupadd -p "%g" add user to group script = /usr/sbin/smbldap-groupmod -m "%u" "%g" delete user from group script = /usr/sbin/smbldap-groupmod -x "%u" "%g" set primary group script = /usr/sbin/smbldap-usermod -g "%g" "%u" create mask = 770 directory mask = 770 force create mode = 770 force directory mode = 770 [.. snipp ...] [projekte] comment = XXXXXXXX Projekte path = /home/samba/shares/projects valid users = "XXXXXXXX" read list write list = "XXXXXXXX" admin users = "XXXXXXXX" read only = No browseable = Yes ########################################################################### The samba shares are behind an Adaptec Raid Controller and are formated with XFS. We did some memtests but they showed no problem. I would like to stick to the 3.0.x branch of samba and to the ubuntu package system (There must be a reason for the LTS). I've read the "samba freezes the server" thread but this is for a different kernel. Any help would be really appreciated Thanks Sebastian
Volker Lendecke
2009-Jul-11 07:49 UTC
[Samba] Samba freezes server when writing 50MB MS Word document (sometimes)
On Sat, Jul 11, 2009 at 09:14:21AM +0200, Sebastian Scholz wrote:> Hi list > > for a couple of days we have the problem that our samba pdc server (ubuntu > 8.04 > LTS [kernel 2.6.24-24] with 3.0.28a samba package) freezes sometimes when > someone saves a specific word document (about 50MB) from MS Word. It's not > always the same person. Saving the document - as nearly any huge MS Office > document - takes some time, about one minute (guessed). > > First the server gets really slow. Some user programs on clients fail to > write. > The CPU usage of the smb process of the user writing the document goes up to > nearly 100%. After a while it's not possible to log on to the server - even > locally. It asks for a password but there won't be a prompt. Sometimes there > is > nothing on the console/screen. Hitting a button does not change a thing. > It's > not possible to kill the process. Another thing is that there are a couple > of > smbd processes for one user (see attached user smbd log).If it's not possible to kill smbd, even with -9, this really sounds like a kernel problem. Look for example at https://bugs.launchpad.net/linux/+bug/254326 Volker -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: Digital signature Url : http://lists.samba.org/archive/samba/attachments/20090711/5967dfee/attachment.bin