Vogel, Sven
2012-Nov-27 13:00 UTC
[Samba] CTDB / Samba / GFS2 - Performance - with Picture Link
Hello,

maybe there is someone who can help and answer a question: why do I get these network graphs on my CTDB clusters? I have two CTDB clusters, one physical and one in a VMware environment.

When I transfer (copy) any files on a Samba share, I get network curves with performance breaks like these. I don't see the transfer stop, but why is that so? Can I change anything, or does anybody know what the problem is?

http://dev.kupper-computer.com/intern/transfer_network.jpg

Thanks
Sven Vogel
Volker Lendecke
2012-Nov-27 13:05 UTC
[Samba] CTDB / Samba / GFS2 - Performance - with Picture Link
On Tue, Nov 27, 2012 at 01:00:49PM +0000, Vogel, Sven wrote:
> When I transfer (copy) any files on a Samba share, I get network curves
> with performance breaks. I don't see the transfer stop, but why is that
> so? Can I change anything, or does anybody know what the problem is?
>
> http://dev.kupper-computer.com/intern/transfer_network.jpg

Do a

    strace -ttT -f -o /tmp/smbd.strace -p <smbd-pid>

and see in /tmp/smbd.strace which syscalls take long.

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de
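For reference, a minimal sketch of how that trace could be gathered, assuming a single client connection; the PID 12513 is taken from the trace later in this thread, and the output path is illustrative:

    # find the smbd child serving the client (PID column of the output)
    smbstatus -b

    # attach strace: -tt = microsecond timestamps, -T = time spent in
    # each syscall, -f = follow forked children, -o = write to a file
    strace -ttT -f -o /tmp/smbd.strace -p 12513

    # reproduce the slow copy, then detach with Ctrl-C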
Vogel, Sven
2012-Nov-27 15:50 UTC
[Samba] CTDB / Samba / GFS2 - Performance - with Picture Link
Hi Volker,

thanks for the fast reply. I used the strace command. I am not much of a strace specialist, but is it possible that the problem is the many polls?

    12513 15:33:24.593065 poll([{fd=9, events=POLLIN|POLLHUP}, {fd=7, events=POLLIN|POLLHUP}, {fd=40, events=POLLIN|POLLHUP}, {fd=32, events=POLLIN|POLLHUP}, {fd=34, events=POLLIN|POLLHUP}], 5, 4436) = 1 ([{fd=32, revents=POLLIN}]) <0.002497>
    12513 15:33:24.595615 read(32, "\0\0\0T", 4) = 4 <0.000017>

I added a link to the strace. I don't see which syscalls take long; there are so many syscalls every second that I don't know what's normal. :-|

http://dev.kupper-computer.com/intern/smbd.txt

Do you have any idea?

Thanks
Sven

-----Original Message-----
From: Volker Lendecke [mailto:Volker.Lendecke at SerNet.DE]
Sent: Tuesday, 27 November 2012 14:06
To: Vogel, Sven
Cc: samba at lists.samba.org
Subject: Re: [Samba] CTDB / Samba / GFS2 - Performance - with Picture Link

> Do a strace -ttT -f -o /tmp/smbd.strace -p <smbd-pid> and see in
> /tmp/smbd.strace which syscalls take long.
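With -T, the time a syscall took is the value in angle brackets at the end of each line, so "which syscalls take long" can be answered mechanically. A rough sketch (the 0.1-second threshold is arbitrary, and calls strace splits into "unfinished"/"resumed" lines sort to the bottom rather than being timed):

    # print every syscall that took longer than 0.1 seconds
    awk -F'[<>]' 'NF > 2 && $(NF-1) + 0 > 0.1' /tmp/smbd.strace

    # or list the 20 slowest calls, slowest first
    awk -F'[<>]' 'NF > 2 {print $(NF-1), $0}' /tmp/smbd.strace | sort -rn | head -20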
Volker Lendecke
2012-Nov-27 16:04 UTC
[Samba] CTDB / Samba / GFS2 - Performance - with Picture Link
On Tue, Nov 27, 2012 at 03:50:40PM +0000, Vogel, Sven wrote:
> I added a link to the strace. I don't see which syscalls take long;
> there are so many syscalls every second that I don't know what's
> normal. :-|
>
> http://dev.kupper-computer.com/intern/smbd.txt
>
> Do you have any idea?

One question -- do you have your brlock.tdb on gfs? If so, move it to a local file system; it will be taken care of by ctdb. Your fcntl calls on that seem slow. Also, you might want to try "posix locking = no".

There is a call at timestamp 15:32:47.383963 that takes 1.9 seconds to find out whether a range is locked. That shows that at this point in time GFS was busy regarding fcntl locks.

Also, your network or your client seems to have a problem. For example, at timestamp 15:32:51.837717 we are waiting 30 milliseconds for a new request from the client. This is very long for a client that is continuously trying to write.

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de
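A quick way to check where brlock.tdb actually lives on each node, sketched with hypothetical paths (the real directory depends on the build options and the ctdb setup):

    # locate the volatile tdb on this node
    find / -name brlock.tdb 2>/dev/null

    # check which file system that directory sits on -- it should be a
    # local one (e.g. ext4), not the shared GFS2 mount
    df -T /var/lib/ctdb    # substitute the directory found above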
Vogel, Sven
2012-Nov-28 11:11 UTC
[Samba] CTDB / Samba / GFS2 - Performance - with Picture Link
Hi Volker,

so I looked for the brlock.tdb file, and it is local on each node. I added "posix locking = no" and "locking = no". I think it runs better now. I uploaded another strace file to the server. What do you think?

http://dev.kupper-computer.com/intern/smbd_no_locking.txt

I also added

    fileid:algorithm = fsname
    vfs objects = fileid

For GFS2, what is better, fsid or fileid?

Thanks
Sven

-----Original Message-----
From: Volker Lendecke [mailto:Volker.Lendecke at SerNet.DE]
Sent: Tuesday, 27 November 2012 17:05
To: Vogel, Sven
Cc: samba at lists.samba.org
Subject: Re: [Samba] CTDB / Samba / GFS2 - Performance - with Picture Link

> One question -- do you have your brlock.tdb on gfs? If so, move it to
> a local file system; it will be taken care of by ctdb. Your fcntl
> calls on that seem slow. Also, you might want to try "posix locking = no".
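For context, a minimal sketch of how those settings could sit in a clustered smb.conf; the share name and path are made up, and the rest of the CTDB configuration is assumed to already be in place:

    [global]
        clustering = yes
        # vfs_fileid generates device IDs that stay stable across the
        # cluster nodes; "fsname" derives them from the file system name
        vfs objects = fileid
        fileid:algorithm = fsname
        # do not mirror SMB byte-range locks as fcntl locks on GFS2
        posix locking = no

    [data]
        path = /mnt/gfs2/data
        read only = no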
Volker Lendecke
2012-Nov-28 11:14 UTC
[Samba] CTDB / Samba / GFS2 - Performance - with Picture Link
On Wed, Nov 28, 2012 at 11:11:16AM +0000, Vogel, Sven wrote:
> so I looked for the brlock.tdb file, and it is local on each node. I
> added "posix locking = no" and "locking = no". I think it runs better
> now. I uploaded another strace file to the server. What do you think?

I would not run with locking=no. It will certainly be faster, but it might cause data corruption.

> http://dev.kupper-computer.com/intern/smbd_no_locking.txt
>
> I also added
>
>     fileid:algorithm = fsname
>     vfs objects = fileid
>
> For GFS2, what is better, fsid or fileid?

Dunno, I never used GFS2, sorry. RedHat ships a cluster product with GFS2 and Samba; maybe they have a recommendation.

Volker

--
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de
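To make the distinction explicit, a sketch of the safer combination implied above (my reading of the advice, not a tested recommendation):

    [global]
        # keep Samba's own byte-range locking: turning this off removes
        # lock enforcement between SMB clients entirely, which is where
        # the data-corruption risk comes from
        locking = yes
        # but skip mirroring those locks into fcntl locks on the
        # cluster file system, which is the part that was slow on GFS2
        posix locking = no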
Vogel, Sven
2012-Nov-29 21:16 UTC
[Samba] CTDB / Samba / GFS2 - Performance - with Picture Link
Hi Volker,

you wrote that it is not so good to set locking = no. Why is that so? I thought the locking stack was

    ctdb (locking) --> dlm_controld (locking) or gfs_controld (locking)

so when I disable locking in Samba, I don't know how this is presented to the cluster file system. I thought the cluster file system would use the locks like this:

    ctdb (locking = no) --> gfs2 (locking)

Sven

-----Original Message-----
From: Volker Lendecke [mailto:Volker.Lendecke at SerNet.DE]
Sent: Wednesday, 28 November 2012 12:15
To: Vogel, Sven
Cc: samba at lists.samba.org
Subject: Re: [Samba] CTDB / Samba / GFS2 - Performance - with Picture Link

> I would not run with locking=no. It will certainly be faster, but it
> might cause data corruption.
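For what it's worth, a rough sketch of the layers involved, as I read the smb.conf parameters (verify against the smb.conf man page):

    SMB client sends a byte-range lock request
        |
        v
    smbd "locking"        bookkeeping in brlock.tdb, shared between the
        |                 nodes by ctdb; locking = no drops this step,
        |                 so SMB clients stop seeing each other's locks
        v
    smbd "posix locking"  additionally mirrors each lock as an fcntl()
        |                 lock on the file; posix locking = no skips
        |                 only this mirroring
        v
    GFS2 / DLM            fcntl locks become cluster-wide DLM locks;
                          GFS2's internal locking for its own metadata
                          and data consistency is unaffected either way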