Last week I upgraded to Solaris 10 U2 and migrated my old UFS partitions to a ZFS pool. Since then I've noticed some of my nightly cron jobs failing because of memory allocation errors, so today I decided to look into it.

The first thing I looked at was user process memory. The only two real user processes besides the Solaris processes are (Sun Java System) Web Server and MySQL. Neither was taking an appreciable amount of the 3G I have in the system. At any rate, I killed them and did not see any substantial amount of memory freed up. So I started to wonder about kernel memory. Now, I haven't poked around the kernel for a long time, so any pointers would be appreciated.

Anyway, I found the ::memstat dcmd for mdb. I gave it a spin and it looked something like:

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     139650              1091   36%
Anon                        38142               297   10%
Exec and libs                3193                24    1%
Page cache                   8644                67    2%
Free (cachelist)            14135               110    4%
Free (freelist)            185057              1445   48%

Total                      388821              3037
Physical                   382802              2990

Actually, Kernel memory was nearer 1700 MB, but I rebooted without saving the original numbers. That seemed like an awful lot of memory for the kernel to be using.

At this point, I decided to reboot and see what ::memstat showed just after boot. Not surprisingly, Kernel memory was only at about 247 MB. Poking around a little more, I found the ::kmastat dcmd in mdb, ran it a couple of times after boot, and squirreled off its output. Then I ran some of my nightly scripts by hand while watching the ::memstat output, and noticed Kernel creeping up to the 1G number above. The scripts load some data from the MySQL database, which resides on a ZFS filesystem. I ran ::kmastat again, compared the output with the original, and found that the zio_buf caches/arenas were the big culprit behind the newly consumed Kernel allocations.
In fact, the difference between the original and new zio_buf statistics showed about 813M in use, which happens to be just about the same as the change in Kernel memory that ::memstat is showing.

So what's going on? Please help. I want my memory back!

--joe
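As a sanity check on the numbers (a sketch, not from the original mail): the MB column in ::memstat is just the page count times the base page size, which is 8 KB on a sun4u box like this one.

```shell
# Convert ::memstat page counts to MB, assuming the sun4u 8 KB page size.
for pages in 139650 38142 185057; do
    echo "$pages pages = $(( pages * 8 / 1024 )) MB"
done
```

This yields 1091, 297, and 1445 MB, matching the Kernel, Anon, and freelist rows above, so the table is internally consistent with 8 KB pages.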
Joseph Mocker wrote:
...
> Anyway, I found the ::memstat dcmd for mdb. So I gave it a spin and it
> looked something like
>
> [::memstat output elided]
>
> Actually, Kernel memory was nearer 1700 MB but I rebooted w/o saving
> the original numbers. Seemed like an awful lot of memory for the kernel
> to be using.
...
> So what's going on? Please help. I want my memory back!

This is essentially by design, due to the way that ZFS uses kernel memory for caching and other stuff. You can alleviate this somewhat by running a 64-bit processor, which has a significantly larger address space to play with.

best regards,
James C. McPherson
--
Solaris Datapath Engineering
Storage Division
Sun Microsystems
>> So what's going on? Please help. I want my memory back!
>
> This is essentially by design, due to the way that ZFS uses kernel
> memory for caching and other stuff.
>
> You can alleviate this somewhat by running a 64bit processor, which
> has a significantly larger address space to play with.

Uhh. If I don't have any more physical memory, how does a 64-bit processor help?

FWIW, this is on a SunBlade 2000 running in 64-bit mode:

root@watt[27]: uname -a
SunOS watt 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Blade-1000
root@watt[28]: isainfo
sparcv9 sparc
There are two things to note here:

1. The vast majority of the memory is being used by the ZFS cache, but
   appears under 'kernel heap'. If you actually need the memory, it
   _should_ be released. Under UFS, this cache appears as the 'page
   cache', and users understand that it can be released when needed.
   The same is true of ZFS; it's just not accounted for as separate
   memory. Now, the VM hooks needed to do this are somewhat ad hoc at
   the moment, but the ZFS cache should keep itself from consuming 100%
   of the available memory.

2. There is a difference between VA (virtual addressing) and physical
   memory. See the following thread for a more complete discussion:

   http://www.opensolaris.org/jive/thread.jspa?threadID=10774&tstart=45&start=15

So the (apparent) high kernel memory consumption is expected, and does not indicate any type of problem. Applications actually receiving ENOMEM should not happen, and may indicate that there are some circumstances where the VM interfaces are currently inadequate. Someone else on the ZFS team may be able to get some more specifics from you to figure out what's really going on.

- Eric

On Thu, Jul 20, 2006 at 04:03:50PM -0700, Joseph Mocker wrote:
>>> So what's going on? Please help. I want my memory back!
>>
>> This is essentially by design, due to the way that ZFS uses kernel
>> memory for caching and other stuff.
>>
>> You can alleviate this somewhat by running a 64bit processor, which
>> has a significantly larger address space to play with.
>
> Uhh. If I don't have any more physical memory, how does a 64bit
> processor help?
>
> FWIW, this is on a SunBlade 2000 running in 64bit mode:
>
> root@watt[27]: uname -a
> SunOS watt 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Blade-1000
> root@watt[28]: isainfo
> sparcv9 sparc
>
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
Eric,

Thanks for the explanation. I am familiar with the UFS cache and assumed the ZFS cache would work the same way. However, it seems like there are a few bugs here. Here's what I see.

1. I can cause an out-of-memory situation by simply copying a bunch of
   files between folders in a ZFS filesystem. I copied a bunch of
   database files by running the following, during which ZFS apparently
   consumed all the available memory:

   i=1; for file in `find . -type f`; do echo "doing $file"; \
       cat $file > /app/tmp/file.$i; i=`expr $i + 1`; done

2. It doesn't appear that ZFS is actually releasing the cache when
   something like a user process needs it. As a simple test, I went into
   /tmp and started to make 512M files. I was only able to create one
   before I received the error

   Could not set length of tmp.02: No space left on device

Here's the ::memstat output for the whole process.

Before first "mkfile 512m":

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     204322              1596   53%
Anon                        38387               299   10%
Exec and libs                2058                16    1%
Page cache                   1120                 8    0%
Free (cachelist)            59677               466   15%
Free (freelist)             83257               650   21%

Total                      388821              3037
Physical                   382802              2990

After first "mkfile 512m":

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     204625              1598   53%
Anon                        38387               299   10%
Exec and libs                1984                15    1%
Page cache                  65767               513   17%
Free (cachelist)            60598               473   16%
Free (freelist)             17460               136    4%

Total                      388821              3037
Physical                   382802              2990

Subsequent "mkfile 512m"s fail.

--joe

Eric Schrock wrote:
> There are two things to note here:
>
> 1. The vast majority of the memory is being used by the ZFS cache, but
>    appears under 'kernel heap'. If you actually need the memory, it
>    _should_ be released.
[...]
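Incidentally, the backtick `find` in the copy loop above splits filenames on whitespace and silently ignores write failures. A more robust sketch of the same reproduction, demonstrated here against scratch directories instead of the original /app/tmp destination:

```shell
# Space-safe version of the copy loop; reports any failed write.
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'hello\n' > "$src/a file"    # filename with a space
printf 'world\n' > "$src/b"

i=1
find "$src" -type f | while IFS= read -r file; do
    echo "doing $file"
    cat "$file" > "$dst/file.$i" || echo "write of file.$i failed" >&2
    i=$(( i + 1 ))
done

ls "$dst"    # lists file.1 and file.2
```

To reproduce the original test, point src at the database files and dst at the ZFS filesystem while watching ::memstat.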
Joseph Mocker wrote:
> Eric,
>
> Thanks for the explanation. I am familiar with the UFS cache and assumed
> the ZFS cache would work the same way.
[...]
> 2. It doesn't appear that ZFS is actually releasing the cache when
>    something like a user process needs it. As a simple test, I went into
>    /tmp and started to make 512M files. I was only able to create one
>    before I received the error
>
>    Could not set length of tmp.02: No space left on device

How much swap space is configured on this machine?

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts@cyber.eng.sun.com         http://blogs.sun.com/barts
Something I often do when I'm a little suspicious of this sort of activity is to run something that steals vast quantities of memory... eg, something like this:

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	int memsize = 0;
	char *input_string;
	char *memory;

	input_string = malloc(256 * sizeof (char));

	printf("How much memory? :");
	input_string = fgets(input_string, 255, stdin);
	memsize = atoi(input_string);

	printf("mem_size=%d\n", memsize);

	/* request memsize MB, zero-filled */
	memory = calloc(memsize * 1024 * 1024, 1);

	printf("Pausing: hit enter to exit\n");
	input_string = fgets(input_string, 255, stdin);

	exit(0);
}

which allows me to request, say, 500mb of memory. Watching vmstat whilst doing this is interesting. It then runs and uses lots of memory, and causes some pressure. If, at the end when it exits, you have lots of memory free and nothing swapped out, it's all good. :)

quick, dirty, possibly even smelly, with no error checking at all...

:)

Nathan.

On Fri, 2006-07-21 at 09:28, Eric Schrock wrote:
> There are two things to note here:
>
> 1. The vast majority of the memory is being used by the ZFS cache, but
>    appears under 'kernel heap'. If you actually need the memory, it
>    _should_ be released.
[...]
> So the (apparent) high kernel memory consumption is expected, and does
> not indicate any type of problem. Applications actually receiving
> ENOMEM should not happen, and may indicate that there are some
> circumstances where the VM interfaces are currently inadequate.
>
> - Eric

--
Bart Smaalders wrote:
>
> How much swap space is configured on this machine?

Zero. Is there any reason I would want to configure any swap space?

--joe
Yeah, I was a little suspicious of my mkfile-in-tmpfs test, so I went ahead and wrote a program not so different from this one. The results were the same: I could only allocate about 512M before things went bad.

--joe

Nathan Kroenert wrote:
> Something I often do when I'm a little suspicious of this sort of
> activity is to run something that steals vast quantities of memory...
[...]
> quick, dirty, possibly even smelly, with no error checking at all...
>
> Nathan.
> Bart Smaalders wrote:
>> How much swap space is configured on this machine?
>
> Zero. Is there any reason I would want to configure any swap space?

Yes. In this particular case:

total: 213728k bytes allocated + 8896k reserved = 222624k used, 11416864k available

you have 9MB of "reserved" memory, which means it is memory which is not doing anything. Then there is a lot of "dirty" data which is never used again and which could be relegated to disk swap, if only there were some.

Casper
Joseph Mocker wrote:
> Bart Smaalders wrote:
>> How much swap space is configured on this machine?
>
> Zero. Is there any reason I would want to configure any swap space?
>
> --joe

Well, if you want to allocate 500 MB in /tmp and your machine has no swap, you need 500M of physical memory or the write _will_ fail. With no swap configured, every allocation in every process of any malloc'd memory, etc., is locked into RAM.

I just swap on a zvol w/ my ZFS root machine.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
barts@cyber.eng.sun.com         http://blogs.sun.com/barts
Bart Smaalders wrote:
> ...
> I just swap on a zvol w/ my ZFS root machine.

I haven't been watching... what's the current status of using ZFS for swap/dump?

Is a/the swap solution to use mkswap and then specify that file in vfstab?

Darren
On Sat, Jul 22, 2006 at 12:44:16AM +0800, Darren Reed wrote:
> Bart Smaalders wrote:
>> I just swap on a zvol w/ my ZFS root machine.
>
> I haven't been watching... what's the current status of using
> ZFS for swap/dump?
>
> Is a/the swap solution to use mkswap and then specify that file
> in vfstab?

ZFS currently supports swap, but not dump. For swap, just make a zvol and add that to vfstab.

--Bill
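For reference, the zvol-swap recipe Bill describes would look something like the sketch below. The pool name, volume name, and 2g size are illustrative examples, not taken from this thread.

```shell
# Sketch: create a zvol and use it for swap (names/size are examples).
zfs create -V 2g rpool/swapvol

# Add it as swap immediately...
swap -a /dev/zvol/dsk/rpool/swapvol

# ...and persistently, via a line like this in /etc/vfstab:
# /dev/zvol/dsk/rpool/swapvol  -  -  swap  -  no  -
```

This is a Solaris administration fragment to adapt, not something to paste blindly; see Rainer's caveats below about reboot and Live Upgrade behavior on the releases current at the time.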
Casper.Dik@sun.com wrote:
> Yes. In this particular case:
>
> total: 213728k bytes allocated + 8896k reserved = 222624k used, 11416864k available
>
> you have 9MB of "reserved" memory which means it is memory which is not
> doing anything.
>
> Then there is a lot of "dirty" data which is never used again and which
> could be relegated to disk swap, if only there were some.

We've kind of sidetracked, but yes, I do understand the limitations of running without swap. However, in the interest of performance, I, and in fact my whole organization, which runs about 300 servers, disable swap. We've never had an out-of-memory problem in the past because of kernel memory. Is that wrong? We can't typically afford to have the kernel swap out portions of the application to disk and back.

At any rate, I don't think adding swap will fix the problem I am seeing, in that ZFS is not releasing its unused cache when applications need it. Adding swap might allow the kernel to move it out of memory, but when the system needs it again it will have to swap it back in, and only performance suffers, no?

FWIW, here's the current ::memstat and swap output for my system. The reserved number is only about 46M, or about 2% of RAM. Considering the box has 3G, I'm willing to sacrifice 2% in the interest of performance.

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     249927              1952   64%
Anon                        34719               271    9%
Exec and libs                2415                18    1%
Page cache                   1676                13    0%
Free (cachelist)            11796                92    3%
Free (freelist)             88288               689   23%

Total                      388821              3037
Physical                   382802              2990

mock@watt[5]: swap -s
total: 260008k bytes allocated + 47256k reserved = 307264k used, 381072k available
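One small cross-check on the swap -s line above (a sketch, not from the original mail): the fields are related by allocated + reserved = used, all in KB.

```shell
# swap -s invariant: allocated + reserved = used (values in KB,
# taken from the output above).
echo "$(( 260008 + 47256 ))k used"
```

This prints 307264k used, matching the reported total.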
Bart Smaalders wrote:
> Well, if you want to allocate 500 MB in /tmp, and your machine
> has no swap, you need 500M of physical memory or the write
> _will_ fail.
>
> W/ no swap configured, every allocation in every process of
> any malloc'd memory, etc, is locked into RAM.

Yep, understood. In the interest of performance we typically run w/o swap. Is there a way to tune the system so that swap is used only when RAM is full? We've run w/o swap for so long (since 2.7 or 2.8) that we've not kept up with any advances in the swapping algorithms of the kernel.

> I just swap on a zvol w/ my ZFS root machine.

Interesting. Doesn't ZFS have more overhead in this context than just a traditional raw partition? Well, I suppose you get a better guarantee of data accuracy, though.

--joe
> We've kind of sidetracked, but yes, I do understand the limitations of
> running without swap. However, in the interest of performance, I, and in
> fact my whole organization which runs about 300 servers, disable swap.
> We've never had an out of memory problem in the past because of kernel
> memory. Is that wrong? We can't typically afford to have the kernel swap
> out portions of the application to disk and back.

Why do you think your performance *improves* if you don't use swap? It is much more likely it *deteriorates*, because your swap accumulates stuff you do not use.

> At any rate, I don't think adding swap will fix the problem I am seeing
> in that ZFS is not releasing its unused cache when applications need it.
> Adding swap might allow the kernel to move it out of memory but when the
> system needs it again it will have to swap it back in, and only
> performance suffers, no?

Well, you have decided that all application data needs to be memory resident all of the time; but executables don't need to be (they are now tossed out on memory shortage) and ZFS can use less cache than it wants to.

> FWIW, here's the current ::memstat and swap output for my system. The
> reserved number is only about 46M or about 2% of RAM. Considering the
> box has 3G, I'm willing to sacrifice 2% in the interest of performance.
[...]
> mock@watt[5]: swap -s
> total: 260008k bytes allocated + 47256k reserved = 307264k used, 381072k available

So there's 47MB of memory which is not used at all. (Adding swap will give you 47MB of additional free memory without anything being written to disk.) Execs are also pushed out on shortfall.

There is 265 MB of anon memory and we have no clue how much of it is used at all; a large percentage is likely unused.

But OTOH, you have sufficient memory on the freelist, so there is not much of an issue.

Casper
Casper.Dik@sun.com wrote:
>> We've kind of sidetracked, but yes, I do understand the limitations of
>> running without swap. However, in the interest of performance, I, and in
>> fact my whole organization which runs about 300 servers, disable swap.
[...]
>
> Why do you think your performance *improves* if you don't use
> swap? It is much more likely it *deteriorates* because your swap
> accumulates stuff you do not use.

Are you trying to convince me that having applications/application data occasionally swapped out to disk is actually faster than keeping it all in memory?

I have another box, which I LU'd to U1 a while ago. It's actually my primary desktop, a 2100z. After the upgrade I noticed my browser, Firefox, was running slower. It was sluggish to respond when, say, I moved from reading my mail with Thunderbird to Firefox. I looked at swap and, wait a minute, LU had switched on an inactive swap partition I had disabled long ago. I removed the swap partition, and now everything is quite snappy.

The question really becomes: how do I "pin" desirable applications in memory while only allowing "dirty" memory to be shifted out to disk?

And still, regardless of the swap issue, the bigger issue is that ZFS has about 1G of memory it won't free up for applications. Is it relying on the existence of swap to dump those pages out? Or should it be releasing memory itself?

--joe
Bill Moore <Bill.Moore@sun.com> writes:
> On Sat, Jul 22, 2006 at 12:44:16AM +0800, Darren Reed wrote:
>> Bart Smaalders wrote:
>>> I just swap on a zvol w/ my ZFS root machine.
>>
>> I haven't been watching... what's the current status of using
>> ZFS for swap/dump?
>>
>> Is a/the swap solution to use mkswap and then specify that file
>> in vfstab?
>
> ZFS currently supports swap, but not dump. For swap, just make a zvol
> and add that to vfstab.

There are two caveats, though:

* Before SXCR b43, you'll need the fix from CR 6405330 so the zvol is
  added after a reboot. The fix hasn't been backported to S10 U2 (yet?),
  so it is equally affected.

* A Live Upgrade comments out the zvol entry in /etc/vfstab, so you
  (sort of) lose swap after an upgrade ;-(

	Rainer

--
-----------------------------------------------------------------------------
Rainer Orth, Faculty of Technology, Bielefeld University
I need to read through this more thoroughly to get my head around it, but on my first pass, what jumps out at me is that something significant _changed_ in terms of "application" behavior with the introduction of ZFS. I'm not saying that that is a bad thing or a good thing, but it is an important thing, and we should try to understand whether application behavior will, in general, change with the introduction of ZFS, so we can advise users accordingly.

Joe appears to have been a user of Sun systems for some time, with a lot of experience deploying Solaris 8 and Solaris 9. He has successfully deployed systems without physical swap, and I understand his reason for doing so. If the introduction of Solaris 10 and ZFS means we need to change a system parameter when transitioning from S8 or S9, such as configured swap, we need to understand why, and make sure we understand the performance implications.

> Why do you think your performance *improves* if you don't use
> swap? It is much more likely it *deteriorates* because your swap
> accumulates stuff you do not use.

I'm not sure what this is saying, but I don't think it came out right.

As I said, I need to do another pass on the information in the messages to get a better handle on the observed behavior, but this certainly seems like something we should explore further. Watch this space.

/jim

> Well, you have decided that all application data needs to be memory
> resident all of the time; but executables don't need to be (they
> are now tossed out on memory shortage) and that ZFS can use less cache
> than it wants to.
[...]
> But OTOH, you have sufficient memory on the freelist so there is not
> much of an issue.
>
> Casper
> Are you trying to convince me that having applications/application data
> occasionally swapped out to disk is actually faster than keeping it all
> in memory?

Yes. Having more memory available generally causes the system to be faster.

> I have another box, which I LU'd to U1 a while ago. It's actually my
> primary desktop, a 2100z. After the upgrade I noticed my browser,
> Firefox, was running slower. It was sluggish to respond when, say, I
> moved from reading my mail with Thunderbird to Firefox.

Then that's a bug, because something expunged the application when it shouldn't have. If you have enough memory, you should never swap.

Casper
I just ran:

root@crazycanucks(129): mkfile 5000M f3
Could not set length of f3: No space left on device

which fails in anon_resvmem:

dtrace -n 'fbt::anon_resvmem:return /arg1 == 0/ { @a[stack(20)] = count(); }'

              tmpfs`tmp_resv+0x50
              tmpfs`wrtmp+0x28c
              tmpfs`tmp_write+0x50
              genunix`fop_write+0x20
              genunix`write+0x270
              unix`syscall_trap32+0xcc
                1

This could then be:

4034947 anon_swap_adjust(), anon_resvmem() should call kmem_reap() if availrmem is low.
FixedInBuild: snv_42

But it is a best practice to run ZFS with some swap. I actually don't know exactly why, but possibly to account for such bugs as this one.

-r
Ah ha. Interesting procedure and bug report. This is starting to make sense.

Another interesting bug report:

6416757 zfs could still use less memory

This one is more or less the same thing I have noticed. I guess I'll add some swap for the short term. :-(

--joe

Roch wrote:
> I just ran:
>
> root@crazycanucks(129): mkfile 5000M f3
> Could not set length of f3: No space left on device
[...]
> But it is a best practice to run ZFS with some
> swap. I actually don't know exactly why, but
> possibly to account for such bugs as this one.
>
> -r