Hello,

I am running Samba 2.0.7.0 on AIX 4.3.3.0.

I have a 1 TB filesystem on AIX shared with NT and Mac clients. There is heavy I/O activity on this filesystem, and all of a sudden the NT and Mac clients cannot read or write any file on the share.

AIX also runs out of memory whenever this happens, and files cannot be copied onto that filesystem even from the Unix side.

The only remedy is to kill the smbd processes, unmount the filesystem and remount it. Invariably the filesystem is corrupted, and we need to run fsck on it before it can be remounted.

Please note that there is 8 GB of RAM in the AIX machine, and there is absolutely no paging going on.

We think Samba is not freeing the memory it uses when reading files.

Please provide a fix for this.

Regards,
Sumitro Chowdhury
Anderson Merchandisers
ph: 1-806-376-6251 ext 4864

________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com
Hi,

1. I am at AIX 4.3.3.0_02 maintenance release (instfix confirms that).
2. There is 8 GB of RAM.
3. Please explain how to find out "how aggressively I am caching files".
4. The system is an S80.
5. Pinnable memory (vmtune) was 80%. I have now changed it to 60% of RAM.
6. Please explain "memory shared by SAMBA". If you mean shared memory, I have set "shared mem size = 5242880" in smb.conf, but smbstatus shows:
   "Share mode memory usage (bytes):1045920(99%) free + 2184(0%) used + 472(0%) overhead = 1048576 (100%) total"
7. I have about 10 Win98 and 10 Mac clients.
8. The clients copy files from Unix through Samba to their local hard disks, edit them, and then copy them back to Unix through Samba into a different subdirectory (but on the same filesystem). No two clients open the same file at the same time. A single client copies around 1000 files in one transaction, each about 50 MB in size.
9. vmstat 5 output:

    kthr  memory        page                       faults             cpu
    ----- ------------- -------------------------- ------------------ ---------------
     r  b     avm   fre  re  pi  po   fr    sr  cy    in     sy    cs  us  sy  id  wa
     1  3  241311   134   0   0   0  604  1558   0  3518  11448  6706  18   5  57  19
     1  3  241311   180   0   0   0  137   427   0  1938   8882  4026  18   4  60  17
     2  2  241311   132   0   0   0  902  3624   0  4691  16193  8594  21  15  40  23
     1  3  241311   128   0   0   0  105   179   0  2071  11502  4604  20   5  56  19
     1  3  241311   128   0   0   0  350   551   0  2714  10862  5608  18   5  58  19
     1  2  241476   128   0   0   0  459  2293   0  3013  12787  5850  19   6  55  20
     2  3  241319   130   0   0   0  579  1596   0  3473  13111  7201  18   7  56  19
     1  3  241117   128   0   0   0   63   215   0  1902   9049  4211  18   4  59  19
     1  3  241118   127   0   0   0  863  2825   0  4331  15372  8436  20   7  55  19
     1  3  241119   132   0   0   0  333   872   0  2818  15402  5901  24   6  50  19

10. I have kept the debug level at 1; otherwise the log files grow very fast.
11. Error messages in the log.<username> file in the /var/samba/log directory:
   "[2000/06/30 07:36:04, 0] lib/util_sock.c:write_data(508)
    write_data: write failure. Error = There is not enough memory available now."

Please let me know what more information is required and I shall post it ASAP.
Thanks in advance,
Sumitro Chowdhury

>From: William Jojo <jojowil@hvcc.edu>
>To: smc_adsm@hotmail.com
>Subject: Re: SAMBA eats up all memory...
>Date: Wed, 05 Jul 2000 11:25:48 -0400
>
>Sumitro,
>
>That's very interesting...we have the exact same setup here and do not have what you describe. Are you certain all available patches for AIX 4.3.3 are installed? There are known data corruption issues at base levels.
>
>Also, how much memory is in your system to support this amount of I/O? How aggressively are you caching files? What model system is this? How much memory is allowed to be pinned? (vmtune)
>
>How much memory is shared for SAMBA? How many clients do you have, and how many files are you projecting them to open at one time?
>
>I can help you figure this out if you can provide some seriously detailed info for me.
>
>Bill
>
>--
>William E. Jojo, Jr.
>Senior Systems and Network Specialist
>Hudson Valley Community College
>(518) 629 7540
>jojowil@hvcc.edu
>
>One step on your own as you walk all over me
>One head in the clouds you won't let go you're too proud
>One light to the blind and they see
>One touch on the head and we believe
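Item 6 of the message above can be checked with simple arithmetic. The sketch below (Python, chosen only for illustration) pulls the byte counts out of the quoted smbstatus share-mode line, confirms that free + used + overhead equals the total, and compares that total with the `shared mem size` value from smb.conf. The regular expression is an assumption written against this single line of output, not a documented smbstatus format.

```python
import re

# Share-mode summary line exactly as quoted in item 6.
LINE = ("Share mode memory usage (bytes):1045920(99%) free + 2184(0%) "
        "used + 472(0%) overhead = 1048576 (100%) total")

# Value set in smb.conf: "shared mem size = 5242880".
CONFIGURED_BYTES = 5242880

# Hypothetical pattern fitted to this one line; other Samba versions
# may format the summary differently.
PATTERN = re.compile(
    r":(\d+)\(\d+%\) free \+ (\d+)\(\d+%\) used \+ "
    r"(\d+)\(\d+%\) overhead = (\d+) \(\d+%\) total"
)


def parse_share_mode(line: str) -> dict:
    """Return the free/used/overhead/total byte counts from the line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised share-mode line")
    free, used, overhead, total = map(int, match.groups())
    return {"free": free, "used": used, "overhead": overhead, "total": total}


usage = parse_share_mode(LINE)

# The three parts should account for the whole segment.
parts = usage["free"] + usage["used"] + usage["overhead"]
print(f"parts add up to total: {parts == usage['total']}")
print(f"segment in use: {usage['total'] / 2**20:.1f} MB")
print(f"configured:     {CONFIGURED_BYTES / 2**20:.1f} MB")
```

On the quoted figures it reports that the parts add up and that the segment smbstatus describes is 1.0 MB, against 5.0 MB configured: one fifth of the size set in smb.conf.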
Hi,

1. (A) The high-water mark and low-water mark are set to 0.
   (B) maxrandwrt = 0 in vmtune.
   So I would say I/O pacing is off and write-behind is off.

2. Since avm in vmstat is not increasing, there does not seem to be any memory leak, but the system is CERTAINLY running out of memory. What is frustrating is that I CANNOT "see" how the system can run out of memory without (a) a memory leak or (b) heavy paging.

3. lsps -a:

    Page Space  Physical Volume  Volume Group    Size  %Used  Active  Auto  Type
    paging00    hdisk1           rootvg        1024MB      1  yes     yes   lv
    hd6         hdisk0           rootvg         512MB      1  yes     yes   lv

   Essentially there is no disk paging.

4. vmtune output:

    vmtune:  current values:
         -p       -P          -r          -R       -f       -F         -N          -W
    minperm  maxperm  minpgahead  maxpgahead  minfree  maxfree  pd_npages  maxrandwrt
     104855   209711           2           8      120      128     524288           0

         -M       -w       -k        -c         -b           -B          -u         -l     -d
     maxpin  npswarn  npskill  numclust  numfsbufs  hd_pbuf_cnt  lvm_bufcnt  lrubucket  defps
    1258268    12288     3072         1         93         1649           9     131072      1

                    -s         -n        -S              -h
    sync_release_ilock  nokilluid  v_pinshm  strict_maxperm
                     0          0         0               0

    number of valid memory pages = 2097113    maxperm=10.0% of real memory
    maximum pinable=60.0% of real memory      minperm=5.0% of real memory
    number of file memory pages = 1752845     numperm=83.6% of real memory

   I have kept maxperm low to keep file caching to a minimum. I am dead scared of the filesystem getting corrupted (and it is). I figure that if I keep the file cache in memory small, I have less chance of losing too many files when fsck is run. But no luck with this so far. I am thinking of setting minperm to 10% and maxperm to 30%.

5. I would also tend to agree that the Macs are screwing things up, but they require proof. What troubles me is that when the Macs and NT clients stop writing to the shared filesystem, I cannot even copy, move or rm 50 MB files from the AIX prompt. I get "out of memory" errors on the screen. There is nothing in errpt, though, and as you saw, avm is around 1 GB. What is happening to the remaining 7 GB of memory???

Thanks in advance...
Sumitro Chowdhury

>From: William Jojo <jojowil@hvcc.edu>
>Reply-To: jojowil@hvcc.edu
>To: Multiple recipients of list SAMBA <samba@samba.org>
>Subject: Re: SAMBA eats up all memory...
>Date: Thu, 6 Jul 2000 04:11:24 +1000
>
>Sumitro Chowdhury wrote:
> >
> > Hi,
> > 1. I am at AIX 4.3.3.0_02 maint. rel. (instfix confirms that)
>
>That sounds about right.
>
> > 2. There is 8 GB of RAM
>
>Sweet... :)
>
> > 3. Please explain how to find out "how aggressively I am caching files"?
>
>Have you enabled I/O pacing or write-behind? Unless you've heard of them, you're not using them, as they are turned off by default. If you have lots of memory (which you have) and really fast disks (which you probably do), then you shouldn't need them. These concepts are more for memory-challenged systems.
>
> > 4. System is an S80
>
>Sweet...
>
> > 5. PINnable memory (vmtune) was 80%. I changed it now to 60% of RAM.
>
>80 was probably OK. How much perm, I wonder? (See below.) What does the whole vmtune output look like?
>
> > 6. Pl. explain "memory shared by SAMBA". If you mean shared memory,
> >    I have set "shared mem size = 5242880" in smb.conf but smbstatus
> >    shows
> >    "Share mode memory usage (bytes):1045920(99%) free + 2184(0%)
> >    used + 472(0%) overhead = 1048576 (100%) total"
>
>This will not change unless you either 1) reboot or 2) stop Samba, delete the shared memory area with ipcrm and restart Samba. If you are at all squeamish about doing this, reboot...
>
> > 7. I have about 10 Win98 and 10 Mac(s) as client.
>
>Hmmmm. Sounds like a memory leak to me, but what do I know... Can you remove the Macs and try to recreate the problem? One of my colleagues thinks they are the problem.
>
> > 8. The clients copy files from unix thru samba to their local harddisk, then
> >    edit them, and then copy these files to unix thru samba in a different
> >    subdirectory (but in the same filesystem).
> >    No two clients open the same file at one time.
> >    Any one client copies around 1000 files in one transaction, each about
> >    50Mb size.
>
>That should be okay, except if that's where the problem is.
>
> > 9. vmstat 5 output:
> >
> > kthr  memory        page                       faults             cpu
> > ----- ------------- -------------------------- ------------------ ---------------
> >  r  b     avm   fre  re  pi  po   fr    sr  cy    in     sy    cs  us  sy  id  wa
> >  1  3  241311   134   0   0   0  604  1558   0  3518  11448  6706  18   5  57  19
> >  1  3  241311   180   0   0   0  137   427   0  1938   8882  4026  18   4  60  17
> >  2  2  241311   132   0   0   0  902  3624   0  4691  16193  8594  21  15  40  23
> >  1  3  241311   128   0   0   0  105   179   0  2071  11502  4604  20   5  56  19
> >  1  3  241311   128   0   0   0  350   551   0  2714  10862  5608  18   5  58  19
> >  1  2  241476   128   0   0   0  459  2293   0  3013  12787  5850  19   6  55  20
> >  2  3  241319   130   0   0   0  579  1596   0  3473  13111  7201  18   7  56  19
> >  1  3  241117   128   0   0   0   63   215   0  1902   9049  4211  18   4  59  19
> >  1  3  241118   127   0   0   0  863  2825   0  4331  15372  8436  20   7  55  19
> >  1  3  241119   132   0   0   0  333   872   0  2818  15402  5901  24   6  50  19
>
>This indicates several things. 1) Your system's memory requirement is about 943MB for everything to run "nicely" (avm * 4 / 1024). As you can see, demand paging is certainly working: sr is the number of pages scanned by the VMM to satisfy memory requests and fr is the number of pages freed.
>
>At the same time, you're not doing any paging to *or* from disk. I would like to know what lsps -a has to say about all this.
>
>What troubles me is that you have many processors twiddling their thumbs (3-4 at any given interval, based on id) while you have several blocked threads and a somewhat busy VMM. In other words... everything's fairly normal except for what is happening to you.
>
> > 10. I have kept debug level as 1 otherwise the log files are growing
> >     very fast.
>
>Tell me about it...
>
> > 11. error messages in log.<username> file in /var/samba/log directory:
> >     "[2000/06/30 07:36:04, 0] lib/util_sock.c:write_data(508)
> >     write_data: write failure. Error = There is not enough memory available now."
> >
> > Please let me know what more information is required and I shall post it
> > ASAP.
> >
> > Thanks in advance,
> > Sumitro Chowdhury.
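Bill's `avm * 4 / 1024` conversion and the percentages in Sumitro's vmtune listing rest on the same unit: 4 KB pages. A small Python sketch under that assumption (the helper names are hypothetical) reproduces the ~943 MB avm figure and the minperm, maxperm and numperm values vmtune printed, which is a quick way to confirm the units before reasoning about where the memory went.

```python
# AIX vmstat "avm" and the vmtune page counts are in 4 KB pages.
PAGE_KB = 4

# Figures copied from the thread (vmstat 5 and vmtune on the S80).
AVM = 241311           # steady avm across the vmstat 5 samples
VALID_PAGES = 2097113  # "number of valid memory pages"
FILE_PAGES = 1752845   # "number of file memory pages"


def pages_to_mb(pages: int) -> float:
    """Bill's conversion: pages * 4 / 1024 gives megabytes."""
    return pages * PAGE_KB / 1024


def percent_to_pages(total_pages: int, percent: float) -> int:
    """Pages represented by a vmtune percentage such as minperm or maxperm."""
    return int(total_pages * percent / 100)


print(f"avm     ~ {pages_to_mb(AVM):.0f} MB")
print(f"RAM     ~ {pages_to_mb(VALID_PAGES) / 1024:.1f} GB")
print(f"minperm = {percent_to_pages(VALID_PAGES, 5)} pages (5%)")
print(f"maxperm = {percent_to_pages(VALID_PAGES, 10)} pages (10%)")
print(f"numperm = {100 * FILE_PAGES / VALID_PAGES:.1f}% of real memory")
```

The computed minperm (104855) and maxperm (209711) match the columns vmtune printed. numperm comes out at 83.6% even though maxperm is 10%; as far as I understand the AIX 4.3 VMM, maxperm is only a soft limit unless strict_maxperm (the -h value, 0 here) is set, so file pages can exceed it when nothing else is asking for memory.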
Hi Bill,

Thanks for your patience...

[A] My "svmon" and corresponding "vmstat" output:

    neptune02.amerch.com:/ > svmon
                   size      inuse       free        pin    virtual
    memory      2097113     653391        128     101441     263249
    pg space     393216       1241

                   work       pers       clnt
    pin          101579          0          0
    in use     -1141755    1795146          0

    neptune02.amerch.com:/ > vmstat 5
    kthr  memory        page                       faults             cpu
    ----- ------------- -------------------------- ------------------ ---------------
     r  b     avm   fre  re  pi  po   fr    sr  cy    in     sy    cs  us  sy  id  wa
     0  0  263257   137   0   0   0   30    55   0   181    383   130   8   2  88   2
     1  2  263257   128   0   0   0  516   971   0   727   1098   316  17   1  77   5

I am a bit confused by the arithmetic :(

a) Free memory certainly matches between svmon and vmstat (128 frames).
b) avm (263257) closely matches virtual (263249); so far, so good.
c) What does the negative value (-1141755) mean? Yet pers + work = inuse (1795146 + (-1141755) = 653391)!!
d) If pers is what is used for caching files, work is what is used by processes, and fre is free memory, then pers + work + fre should equal the memory size, but in my svmon output it does not.
e) Could you kindly explain my svmon output?
f) Which is stale memory?

[B] The number of processes is around 150 (ps -ef | wc -l), and maxuproc is 500.

ulimit -a:

    time(seconds)        unlimited
    file(blocks)         2097151
    data(kbytes)         131072
    stack(kbytes)        32768
    memory(kbytes)       1048576
    coredump(blocks)     2097151
    nofiles(descriptors) 2000

The smbd processes are running as root and are spawned by inetd.

Please let me know what else to try...

Thanks as always,
Sumitro Chowdhury

>From: William Jojo <jojowil@hvcc.edu>
>To: smc_adsm@hotmail.com
>CC: Multiple recipients of list SAMBA <samba@samba.org>
>Subject: Re: SAMBA eats up all memory...
>Date: Thu, 06 Jul 2000 08:37:50 -0400
>
>Sumitro Chowdhury wrote:
> >
> > Hi,
> > 1. (A) high water mark and low water mark are set to 0
> >    (B) maxrandwrit = 0 in vmtune
> >    So I would say I/O pacing is off and write behind is off.
>
>Okay...I thought so...
>
> > 2. Since avm in vmstat is not increasing, there does not seem to be
> >    any memory leak. but the system is CERTAINLY running out of memory.
> >    This is what is frustating that I canNOT "see" how system can run
> >    out of memory without a) memory leak b)heavy paging.
>
>Actually, that's not entirely accurate. That is the amount of memory in use by programs, not by AIX proper. IOW, if you have:
>
>  perfagent.tools   2.2.33.13   COMMITTED   Local Performance Analysis & Control Commands
>
>installed, run the following (if not, I *strongly* suggest you get it from the CD):
>
>[storage:/] # svmon
>
>               size      inuse       free        pin    virtual
>memory       784359     742339      29687     784359      63734
>pg space     786432      32823
>
>               work       pers       clnt
>pin           30423          0          0
>in use        64830     677475         34
>
>[storage:/] # vmstat 2
>kthr  memory        page                       faults             cpu
>----- ------------- -------------------------- ------------------ ---------------
> r  b     avm   fre  re  pi  po   fr    sr  cy    in     sy    cs  us  sy  id  wa
> 0  0   63479 29957   0   0   0   10    22   0   134    380    66   1   2  93   4
> 0  2   63479 29955   0   0   0    0     0   0   435   2412    55   0   1  98   0
> 0  2   63479 29955   0   0   0    0     0   0   431    211    49   0   0  99   0
> 0  2   63479 29955   0   0   0    0     0   0   444    215    53   0   0  99   0
>
>As you can see, I have a 3GB system, but only 250+MB is tied to processes (work); the rest is in file caching (pers) or is free.
>
> > 3. lsps -a:
> >    Page Space  Physical Volume  Volume Group    Size  %Used  Active  Auto  Type
> >    paging00    hdisk1           rootvg        1024MB      1  yes     yes   lv
> >    hd6         hdisk0           rootvg         512MB      1  yes     yes   lv
> >
> >    Essentialy there is no disk paging.
>
>Which makes sense, since maxperm is 10%: the VMM will leave working pages alone and aggressively steal file pages to minimize paging. What you did was correct but overkill. I would leave maxperm at 80%. This would give 6.4GB to files and 1.6GB to programs and the kernel. The system should still aggressively steal file pages and not do swapping, since you only require ~1GB at present.
>If paging does begin, reduce maxperm by 5% at a time until it stops and levels off, but try to stay at 50% or higher, since the VMM is designed to page to disk as well and is pretty smart. You should always look at avm from vmstat to see exactly what your processes need.
>
> > 4. vmtune output:
>
>See above ;)
>
> > 5. I would also tend to agree that MACs are screwing things up but they
> >    require proof. What troubles me is that when MACs and NT stop
> >    writing to the shared file system, I can't copy , move , rm 50MB
> >    files from the AIX prompt even. I get system out of memory errors on the
> >    screen. Nothing on errpt though and as u saw, avm is around 1GB.
> >    What's happening to the rest 7GB of memory ???
>
>Check out this web page from IBM. It'll help you discover memory leaks in programs.
>
>http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixbman/prftungd/memoryuse.htm
>
>Use smbstatus to get the PIDs of suspicious clients and get the proof you need.
>
>Check out the online docs here:
>
>http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixgen/
>
>There is more:
>
>What's the current number of processes running, and what's the maxuproc value from lsattr -El sys0? Are all your smbd's running as root (which they should be)?
>
>I ask because I am arguing with IBM right now that maxuproc as of 4.3.3 (but not 4.3.2 or lower) seems to affect root (uid 0) when it should not.
>
>Also, what are the ulimit -a values for root and everyone else? These are also stored in /etc/security/limits. You may simply be hitting a data segment or RSS wall, which would look like your system is out of memory or, more to the point, like you have a leaky program (which I really don't think you do... the Samba Team has worked their asses off to make sure this code is clean and fast).
>
>Bill
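Questions (c) and (d) in Sumitro's last message come down to whether the svmon columns add up. The Python sketch below only re-adds the figures printed in the two svmon listings in this thread (Bill's 3 GB machine and the S80); the function name is hypothetical, and it makes no attempt to explain the negative work value. It also converts Bill's suggested maxperm split and the posted ulimit -a values (reported in KB) into GB and MB.

```python
PAGE_KB = 4  # svmon reports 4 KB frames


def to_mb(pages: int) -> float:
    """Convert a count of 4 KB pages to megabytes."""
    return pages * PAGE_KB / 1024


def svmon_balance(size, inuse, free, work, pers, clnt):
    """Check the two sums discussed in questions (c) and (d).

    Returns (work + pers + clnt == inuse, size - inuse - free).
    """
    return work + pers + clnt == inuse, size - inuse - free


# Figures exactly as printed in the two svmon listings in this thread.
bill_ok, bill_gap = svmon_balance(size=784359, inuse=742339, free=29687,
                                  work=64830, pers=677475, clnt=34)
s80_ok, s80_gap = svmon_balance(size=2097113, inuse=653391, free=128,
                                work=-1141755, pers=1795146, clnt=0)

print(f"Bill: columns sum to inuse={bill_ok}, gap={bill_gap} pages "
      f"(~{to_mb(bill_gap):.0f} MB), work ~{to_mb(64830):.0f} MB")
print(f"S80:  columns sum to inuse={s80_ok}, gap={s80_gap} pages "
      f"(~{to_mb(s80_gap) / 1024:.1f} GB)")

# Other numbers quoted in the same exchange.
print(f"maxperm 80% of 8 GB: {0.8 * 8:.1f} GB files, {0.2 * 8:.1f} GB rest")
print(f"ulimit data:   {131072 // 1024} MB")
print(f"ulimit memory: {1048576 // 1024} MB")
```

On both machines work + pers + clnt equals inuse, but inuse + free falls short of size: by 12333 pages (about 48 MB) on Bill's system and by 1443594 pages (about 5.5 GB) on the S80. The 128 MB data limit in the posted ulimit -a output is the kind of per-process wall Bill suggests could look like the system being out of memory.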