Hi folks,

I am a member of the Solaris Install team and I am currently working
on making the Slim installer compliant with the ZFS boot design
specification:

http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/

After the ZFS boot project was integrated into Nevada and support for
installation on ZFS root was delivered into the legacy installer, some
differences appeared between how the Slim installer implements ZFS root
and how it is done in the legacy installer.

One part we need to change in the Slim installer is to create swap &
dump on ZFS volumes instead of utilizing a UFS slice for this, as
defined in the design spec and implemented in the SXCE installer.

When reading through the specification and looking at the SXCE
installer source code, I realized some points are not quite clear
to me.

Could I please ask you to help me clarify them so that I follow the
right way as far as the implementation of those features is concerned ?

Thank you very much,
Jan


[i] Formula for calculating dump & swap size
--------------------------------------------

I have gone through the specification and found that the following
formula should be used for calculating the default size of swap & dump
during installation:

o size of dump: 1/4 of physical memory
o size of swap: max of (512MiB, 1% of rpool size)

However, looking at the source code, the SXCE installer calculates the
default sizes using a slightly different algorithm:

size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

Are there any preferences as to which one should be used, or is there
any other possibility we might take into account ?

[ii] Procedure of creating dump & swap
--------------------------------------

Looking at the SXCE source code, I have discovered that the following
commands should be used for creating swap & dump:

o swap
# /usr/sbin/zfs create -b PAGESIZE -V <size_in_mb>m rpool/swap
# /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

o dump
# /usr/sbin/zfs create -b 128*1024 -V <size_in_mb>m rpool/dump
# /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump

Could you please let me know if my observations are correct, or if I
should use a different approach ?

As far as the setting of the volume block size is concerned (-b
option), how are those numbers to be determined ? Will they be the same
in different scenarios, or are there plans to tune them in some way in
the future ?

[iii] Is there anything else I should be aware of ?
---------------------------------------------------
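For a concrete illustration of [i] and [ii]: on a machine with 4 GiB of
physical memory, the SXCE-style default works out to
MAX(512, MIN(4096/2, 32768)) = 2048 MiB, and the command sequence would
look roughly as follows. The 2048m size is only this assumed example,
and pagesize(1) stands in for however the installer obtains the system
page size:

  # /usr/sbin/zfs create -b `/usr/bin/pagesize` -V 2048m rpool/swap
  # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

  # /usr/sbin/zfs create -b 131072 -V 2048m rpool/dump    (131072 = 128*1024)
  # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump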
Hi Jan, comments below...

jan damborsky wrote:
> [i] Formula for calculating dump & swap size
> --------------------------------------------
>
> o size of dump: 1/4 of physical memory

This is a non-starter for systems with 1-4 TBytes of physical memory.
There must be a reasonable maximum cap, most likely based on the size
of the pool, given that we regularly boot large systems from
modest-sized disks.

> o size of swap: max of (512MiB, 1% of rpool size)
>
> However, looking at the source code, the SXCE installer calculates
> the default sizes using a slightly different algorithm:
>
> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
>
> Are there any preferences as to which one should be used, or is there
> any other possibility we might take into account ?

Zero would make me happy :-)  But there are some cases where swap space
is preferred. Again, there needs to be a reasonable cap. In general,
the larger the system, the less use for swap during normal operations,
so for most cases there is no need for really large swap volumes. These
can also be adjusted later, so the default can be modest. One day
perhaps it will be fully self-adjusting, as it is with other
UNIX[-like] implementations.

> [ii] Procedure of creating dump & swap
> --------------------------------------
>
> As far as the setting of the volume block size is concerned (-b
> option), how are those numbers to be determined ? Will they be the
> same in different scenarios, or are there plans to tune them in some
> way in the future ?

Setting the swap blocksize to pagesize is interesting, but should be ok
for most cases. The reason I say it is interesting is that it is
optimized for small systems, but not for larger systems, which
typically see more use of large page sizes. OTOH larger systems should
not swap, so it is probably a non-issue for them. Small systems should
see this as the best solution.

Dump just sets the blocksize to the default, so it is a no-op.
 -- richard

> [iii] Is there anything else I should be aware of ?
> ---------------------------------------------------

Installation should *not* fail due to running out of space because of
large dump or swap allocations. I think the algorithm should first take
into account the space available in the pool after accounting for the
OS.
 -- richard
Richard Elling wrote:
> jan damborsky wrote:
>> [i] Formula for calculating dump & swap size
>> --------------------------------------------
>>
>> o size of dump: 1/4 of physical memory
>
> This is a non-starter for systems with 1-4 TBytes of physical memory.
> There must be a reasonable maximum cap, most likely based on the size
> of the pool, given that we regularly boot large systems from
> modest-sized disks.

Actually, starting with build 90, the legacy installer sets the default
size of the swap and dump zvols to half the size of physical memory,
but no more than 32 GB and no less than 512 MB. Those are just the
defaults. Administrators can use the zfs command to modify the volsize
property of both the swap and dump zvols (to any value, including
values larger than 32 GB).

>> [ii] Procedure of creating dump & swap
>> --------------------------------------
>>
>> o swap
>> # /usr/sbin/zfs create -b PAGESIZE -V <size_in_mb>m rpool/swap
>> # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap
>>
>> o dump
>> # /usr/sbin/zfs create -b 128*1024 -V <size_in_mb>m rpool/dump
>> # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump

The above commands for creating the swap and dump zvols match what the
legacy installer does, as of build 90.

>> As far as the setting of the volume block size is concerned (-b
>> option), how are those numbers to be determined ? Will they be the
>> same in different scenarios, or are there plans to tune them in some
>> way in the future ?

There are no plans to tune this. The block sizes are appropriate for
the way the zvols are to be used.

> Installation should *not* fail due to running out of space because of
> large dump or swap allocations. I think the algorithm should first
> take into account the space available in the pool after accounting
> for the OS.

The Caiman team can make their own decision here, but we decided to be
more hard-nosed about disk space requirements in the legacy install. If
the pool is too small to accommodate the recommended swap and dump
zvols, then maybe this system isn't a good candidate for a zfs root
pool. Basically, we decided that since you almost can't buy disks
smaller than 60 GB these days, it's not worth much effort to facilitate
the setup of zfs root pools on disks that are smaller than that. If you
really need to do so, Jumpstart can be used to set the dump and swap
sizes to whatever you like, at the time of initial install.

Lori
jan damborsky
2008-Jun-24 14:07 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Hi Lori,

Lori Alt wrote:
> Actually, starting with build 90, the legacy installer sets the
> default size of the swap and dump zvols to half the size of physical
> memory, but no more than 32 GB and no less than 512 MB. Those are
> just the defaults. Administrators can use the zfs command to modify
> the volsize property of both the swap and dump zvols (to any value,
> including values larger than 32 GB).

Agreed - the formula [i] is mentioned in the PSARC document, but the
implementation I was investigating by looking at the latest SXCE
installer code is exactly what you are describing here.

Since that calculation is part of the PSARC case, I assumed that every
implementation of ZFS root should follow it in order to be fully
compliant with the ZFS root design ?

> The above commands for creating the swap and dump zvols match what
> the legacy installer does, as of build 90.

ok - Then I will use this implementation also in the Slim installer.

> There are no plans to tune this. The block sizes are appropriate for
> the way the zvols are to be used.

I see - thanks for the clarification.

> The Caiman team can make their own decision here, but we decided to
> be more hard-nosed about disk space requirements in the legacy
> install. If the pool is too small to accommodate the recommended swap
> and dump zvols, then maybe this system isn't a good candidate for a
> zfs root pool. Basically, we decided that since you almost can't buy
> disks smaller than 60 GB these days, it's not worth much effort to
> facilitate the setup of zfs root pools on disks that are smaller than
> that. If you really need to do so, Jumpstart can be used to set the
> dump and swap sizes to whatever you like, at the time of initial
> install.

I would agree with you as far as internal disks are concerned. However,
since the Slim installer also allows installing for example on USB
sticks, which are smaller, the minimum required space might be an
issue.

Do we need to create two separate volumes for swap and dump, or might
one ZFS volume be enough, shared by both swap and dump ?

Thank you very much,
Jan
Hi Richard,

thank you very much for your comments.
Please see my response in line.

Jan

Richard Elling wrote:
> This is a non-starter for systems with 1-4 TBytes of physical memory.
> There must be a reasonable maximum cap, most likely based on the size
> of the pool, given that we regularly boot large systems from
> modest-sized disks.

I agree - there will be both upper (32GiB) as well as lower (512MiB, or
0 for dump ?) bounds defined.

> Setting the swap blocksize to pagesize is interesting, but should be
> ok for most cases. [...] OTOH larger systems should not swap, so it
> is probably a non-issue for them. Small systems should see this as
> the best solution.
>
> Dump just sets the blocksize to the default, so it is a no-op.

I see - thank you for clarifying this.

> Installation should *not* fail due to running out of space because of
> large dump or swap allocations. I think the algorithm should first
> take into account the space available in the pool after accounting
> for the OS.

This is a good point. I can imagine that if a user would like to
install on a USB stick, he would probably be fine with having the
system installed without dump, if the space available is limited.
Mike Gerdts
2008-Jun-24 14:25 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Mon, Jun 23, 2008 at 11:58 AM, Lori Alt <Lori.Alt at sun.com> wrote:
> The Caiman team can make their own decision here, but we decided to
> be more hard-nosed about disk space requirements in the legacy
> install. If the pool is too small to accommodate the recommended swap
> and dump zvols, then maybe this system isn't a good candidate for a
> zfs root pool. Basically, we decided that since you almost can't buy
> disks smaller than 60 GB these days, it's not worth much effort to
> facilitate the setup of zfs root pools on disks that are smaller than
> that. If you really need to do so, Jumpstart can be used to set the
> dump and swap sizes to whatever you like, at the time of initial
> install.

This is extremely bad for virtualized environments. If I have a laptop
with a 150 GB disk, a dual core processor, and 4 GB of RAM, I would
expect that I should have plenty of room to install 10+ virtual
machines and be able to run up to 2 - 4 of them at a time. Requiring
60 GB would mean that I could only install 2 virtual machines - which
is on par with what I was doing with my previous laptop that had a
30 GB disk.

The same argument can be made for VMware, LDoms, Xen, etc., but those
are much more likely to use jumpstart for installations than
laptop-based VMs.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
jan damborsky wrote:
> Agreed - the formula [i] is mentioned in the PSARC document, but the
> implementation I was investigating by looking at the latest SXCE
> installer code is exactly what you are describing here.
>
> Since that calculation is part of the PSARC case, I assumed that
> every implementation of ZFS root should follow it in order to be
> fully compliant with the ZFS root design ?

I guess I don't consider the exact formula a "compliance" issue. I
think it's reasonable to modify the formula if you decide a change is
worthwhile.

> I would agree with you as far as internal disks are concerned.
> However, since the Slim installer also allows installing for example
> on USB sticks, which are smaller, the minimum required space might be
> an issue.

Yes, there could be cases where a smaller swap and dump amount is
reasonable. It all depends on the environment in which the system is
running. I would assume that a system with a USB stick as its root disk
would not be doing the kind of computing needing lots of swap. In your
install procedure, you are free to adapt to the particular environment
you're installing.

> Do we need to create two separate volumes for swap and dump, or might
> one ZFS volume be enough, shared by both swap and dump ?

When swap and dump are stored in zvols, it is no longer permitted that
they share the same space (the implementation of swap and dump zvols
required special handling that makes them mutually exclusive). However,
you could choose not to have a dump zvol at all in some environments
(the USB-stick-as-root environment, for example), as long as you're
willing to not get crash dumps.

Lori
Mike Gerdts wrote:
> This is extremely bad for virtualized environments. If I have a
> laptop with a 150 GB disk, a dual core processor, and 4 GB of RAM, I
> would expect that I should have plenty of room to install 10+ virtual
> machines and be able to run up to 2 - 4 of them at a time. Requiring
> 60 GB would mean that I could only install 2 virtual machines - which
> is on par with what I was doing with my previous laptop that had a
> 30 GB disk.
>
> The same argument can be made for VMware, LDoms, Xen, etc., but those
> are much more likely to use jumpstart for installations than
> laptop-based VMs.

This is a good point. Perhaps at some point we should add back the
capability of overriding the default swap/dump sizes in the interactive
install. However, swap can't always be reduced by much. The default
swap sizes we chose were not totally arbitrary. But of course,
environments differ widely. In some environments, it's probably
reasonable to run with little or no swap.

Right now, you have two options to override the default swap and dump
sizes: use Jumpstart to do the install, or modify the sizes of the swap
and dump zvols after the install completes (using the "zfs set" command
to modify the volsize).

The Caiman team may wish to offer more configurability in this regard
in their install procedure.

Lori
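To make the "zfs set" option concrete: the adjustment amounts to
changing the volsize property of the two zvols. The sizes below are
arbitrary examples, and an active swap device has to be taken out of
use before it can be resized:

  # /usr/sbin/swap -d /dev/zvol/dsk/rpool/swap
  # /usr/sbin/zfs set volsize=4G rpool/swap
  # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

  # /usr/sbin/zfs set volsize=16G rpool/dump

Depending on the build, it may also be necessary to re-run
"dumpadm -d /dev/zvol/dsk/rpool/dump" after resizing the dump zvol.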
Richard Elling
2008-Jun-24 16:41 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
jan damborsky wrote:
> Lori Alt wrote:
>> The Caiman team can make their own decision here, but we decided to
>> be more hard-nosed about disk space requirements in the legacy
>> install. If the pool is too small to accommodate the recommended
>> swap and dump zvols, then maybe this system isn't a good candidate
>> for a zfs root pool. [...]
>
> I would agree with you as far as internal disks are concerned.
> However, since the Slim installer also allows installing for example
> on USB sticks, which are smaller, the minimum required space might be
> an issue.

With ZFS, the actual space used is difficult to predict, so there
should be some leeway allowed. For USB sticks, I'm generally using
compression and copies=2, both of which radically change the actual
space used. It is unlikely that we can install 5 lbs of flour in a 1 lb
bag, but it may not be impossible.

> Do we need to create two separate volumes for swap and dump, or might
> one ZFS volume be enough, shared by both swap and dump ?

IMHO, you can make dump optional, with no dump being the default.
Before Sommerfeld pounces on me (again :-), let me defend myself: the
vast majority of people will never get a core dump, and if they did,
they wouldn't know what to do with it. We will just end up wasting a
bunch of space. As Solaris becomes more popular, this problem becomes
bigger. OTOH, people who actually care about core dumps can enable them
quite easily. WWMSD?
 -- richard
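For reference, the two settings Richard mentions are ordinary ZFS
properties that can be applied to an existing root pool; the dataset
names below are just the usual rpool layout, assumed here for
illustration:

  # /usr/sbin/zfs set compression=on rpool
  # /usr/sbin/zfs set copies=2 rpool/ROOT

Compression reduces the space the installed bits occupy, while copies=2
roughly doubles it in exchange for redundancy on a single device, which
is why the two together make the real space consumption hard to predict
up front.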
Lori Alt wrote:
> Mike Gerdts wrote:
>> This is extremely bad for virtualized environments. If I have a
>> laptop with a 150 GB disk, a dual core processor, and 4 GB of RAM, I
>> would expect that I should have plenty of room to install 10+
>> virtual machines and be able to run up to 2 - 4 of them at a time.
>> [...]
>
> This is a good point. Perhaps at some point we should add back the
> capability of overriding the default swap/dump sizes in the
> interactive install. [...]
>
> The Caiman team may wish to offer more configurability in this regard
> in their install procedure.

I doubt we'd have interest in providing more configurability in the
interactive installer. As Richard sort of points out subsequently, most
people wouldn't know what to do here anyway, and the ones who do
usually use automated provisioning like Jumpstart, where we can provide
those options.

That said, I'd like to default to having a dump volume when space
allows, so that we are in a position to gather crash dumps, since
reproducing them is usually not easy, and almost always undesirable.
It'd be lower priority than having enough space for 2-3 BEs plus swap,
so it might be automatically dropped when space is less than that.

Dave
Bill Sommerfeld
2008-Jun-24 18:20 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, 2008-06-24 at 09:41 -0700, Richard Elling wrote:
> IMHO, you can make dump optional, with no dump being the default.
> Before Sommerfeld pounces on me (again :-))

actually, in the case of virtual machines, doing the dump *in* the
virtual machine into preallocated virtual disk blocks is silly.

if you can break the abstraction barriers a little, I'd think it would
make more sense for the virtual machine infrastructure to create some
sort of snapshot at the time of failure, which could then be converted
into a form that mdb can digest...

 - Bill
Keith Bierman
2008-Jun-24 18:43 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Jun 24, 2008, at 11:01 AM, Dave Miner wrote:
> I doubt we'd have interest in providing more configurability in the
> interactive installer. As Richard sort of points out subsequently,
> most people wouldn't know what to do here anyway, and the ones who do
> usually use automated provisioning like Jumpstart, where we can
> provide those options.

A lot of developers use VMs of one sort or another these days, and few
of them use jumpstart (especially when the entire point of the exercise
is to get their feet wet on new platforms, or new versions of old
platforms).

Perhaps I travel in the wrong circles these days.

-- 
Keith H. Bierman   khbkhb at gmail.com  |  AIM kbiermank
5430 Nassau Circle East  |  Cherry Hills Village, CO 80113  |  303-997-2749
<speaking for myself*> Copyright 2008
Keith Bierman wrote:
> A lot of developers use VMs of one sort or another these days, and
> few of them use jumpstart (especially when the entire point of the
> exercise is to get their feet wet on new platforms, or new versions
> of old platforms).
>
> Perhaps I travel in the wrong circles these days.

All they'd have to do under my suggested solution is make the virtual
disk large enough to get a dump pool created automatically. Our
recommended sizing would encompass that.

I do like Bill's suggestion of getting VMs to snapshot the VM on panic,
though.

Dave
Peter Tribble
2008-Jun-24 21:15 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, Jun 24, 2008 at 8:27 PM, Dave Miner <Dave.Miner at sun.com> wrote:
> Keith Bierman wrote:
>> A lot of developers use VMs of one sort or another these days, and
>> few of them use jumpstart (especially when the entire point of the
>> exercise is to get their feet wet on new platforms, or new versions
>> of old platforms).
>>
>> Perhaps I travel in the wrong circles these days.
>
> All they'd have to do under my suggested solution is make the virtual
> disk large enough to get a dump pool created automatically. Our
> recommended sizing would encompass that.

So remind me again - what is our recommended sizing? (Especially in the
light of this discussion.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Peter Tribble wrote:
> On Tue, Jun 24, 2008 at 8:27 PM, Dave Miner <Dave.Miner at sun.com> wrote:
>> All they'd have to do under my suggested solution is make the
>> virtual disk large enough to get a dump pool created automatically.
>> Our recommended sizing would encompass that.
>
> So remind me again - what is our recommended sizing? (Especially in
> the light of this discussion.)

Dynamically calculated based on info recorded in the image.

http://src.opensolaris.org/source/xref/caiman/slim_source/usr/src/lib/liborchestrator/perform_slim_install.c#om_get_recommended_size

It's in the 4+ GB range right now.

Dave
Darren J Moffat
2008-Jun-25 12:19 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Jan Damborsky wrote:
> Thank you very much all for this valuable input.
>
> Based on the collected information, I would take the following
> approach as far as calculating the size of swap and dump devices on
> ZFS volumes in the Caiman installer is concerned.
>
> [1] The following formula would be used for calculating swap and dump
> sizes:
>
> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
>
> The user can reconfigure this after installation is done, on the live
> system, with the "zfs set" command.

If the minimum space isn't available, do NOT abort the install - just
continue without creating swap space, but put a small warning somewhere
suitable. Don't ask the user to confirm this and don't make a big deal
about it.

-- 
Darren J Moffat
Mike Gerdts
2008-Jun-25 13:09 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky <Jan.Damborsky at sun.com> wrote:
> Thank you very much all for this valuable input.
>
> Based on the collected information, I would take the following
> approach as far as calculating the size of swap and dump devices on
> ZFS volumes in the Caiman installer is concerned.
>
> [1] The following formula would be used for calculating swap and dump
> sizes:
>
> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

dump should scale with memory size, but the size given is completely
overkill. On very active (heavy kernel activity) servers with 300+ GB
of RAM, I have never seen a (compressed) dump that needed more than
8 GB. Even uncompressed, the maximum size I've seen has been in the
18 GB range. This has been without zfs in the mix. It is my
understanding that at one time the arc was dumped as part of kernel
memory, but that was regarded as a bug and has since been fixed. If the
arc is dumped, a value of dump much closer to physical memory is likely
to be appropriate.

As an aside, does the dedicated dump device on all machines make it so
that savecore no longer runs by default? It just creates a lot of extra
I/O during boot (thereby slowing down boot after a crash) and uses a
lot of extra disk space for those who will never look at a crash dump.
Those who actually use it (not the majority target audience for
OpenSolaris, I would guess) will be able to figure out how to enable
(the yet non-existent) svc:/system/savecore:default.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Robert Milkowski
2008-Jun-25 20:07 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Hello Darren,

Wednesday, June 25, 2008, 1:19:53 PM, you wrote:

DJM> Jan Damborsky wrote:
>> [1] The following formula would be used for calculating swap and
>> dump sizes:
>>
>> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
>>
>> The user can reconfigure this after installation is done, on the
>> live system, with the "zfs set" command.

DJM> If the minimum space isn't available, do NOT abort the install -
DJM> just continue without creating swap space, but put a small warning
DJM> somewhere suitable. Don't ask the user to confirm this and don't
DJM> make a big deal about it.

Yeah, I've just tried to install snv_91 on a 16GB CF card with ZFS as
the root file system... and I couldn't, because it wanted almost 40GB
of disk space and I couldn't override it. I know it is not Caiman.
Such behavior is irritating.

-- 
Best regards,
 Robert Milkowski                    mailto:milek at task.gda.pl
                                     http://milek.blogspot.com
Robert Milkowski
2008-Jun-25 20:09 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Hello Mike,

Wednesday, June 25, 2008, 2:09:31 PM, you wrote:

MG> dump should scale with memory size, but the size given is
MG> completely overkill. On very active (heavy kernel activity) servers
MG> with 300+ GB of RAM, I have never seen a (compressed) dump that
MG> needed more than 8 GB. Even uncompressed, the maximum size I've
MG> seen has been in the 18 GB range. This has been without zfs in the
MG> mix. It is my understanding that at one time the arc was dumped as
MG> part of kernel memory, but that was regarded as a bug and has since
MG> been fixed. If the arc is dumped, a value of dump much closer to
MG> physical memory is likely to be appropriate.

Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)

-- 
Best regards,
 Robert Milkowski                    mailto:milek at task.gda.pl
                                     http://milek.blogspot.com
Mike Gerdts
2008-Jun-25 20:36 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Wed, Jun 25, 2008 at 3:09 PM, Robert Milkowski <milek at task.gda.pl> wrote:
> Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)

Was that the size in the dump device or the size in /var/crash? If it
was the size in /var/crash, divide that by the compression ratio
reported on the console after the dump completed.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Mike Gerdts
2008-Jun-25 21:13 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Wed, Jun 25, 2008 at 3:36 PM, Mike Gerdts <mgerdts at gmail.com> wrote:
> On Wed, Jun 25, 2008 at 3:09 PM, Robert Milkowski <milek at task.gda.pl> wrote:
>> Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)
>
> Was that the size in the dump device or the size in /var/crash? If it
> was the size in /var/crash, divide that by the compression ratio
> reported on the console after the dump completed.

I just did some digging for real-life examples. Here are some extremes.
The first one is extreme in size and the second one is extreme in
compression ratio. All of my samples (~20) had compression ratios that
ranged from 3.51 to 6.97.

100% done: 1946979 pages dumped, compression ratio 4.01, dump succeeded
100% done:  501196 pages dumped, compression ratio 6.97, dump succeeded

$ echo '1946979 * 8 / 1024 / 1024 / 4.01' | bc -l
3.70430696634877649625
$ echo '501196 * 8 / 1024 / 1024 / 6.97' | bc -l
.54861148084424318507

The first one is 14.8 GB uncompressed, but wrote 3.7 GB to dump. The
second one was 3.8 GB uncompressed, but wrote 0.5 GB to dump.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
Jan Damborsky
2008-Jun-26 04:09 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Thank you very much all for this valuable input.

Based on the collected information, I would take the following approach
as far as calculating the size of swap and dump devices on ZFS volumes
in the Caiman installer is concerned.

[1] The following formula would be used for calculating swap and dump
sizes:

size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

The user can reconfigure this after installation is done, on the live
system, with the "zfs set" command.

[2] The dump device will be considered optional.

The dump device will be created only if there is appropriate space
available on the disk provided. The minimum disk space required will
not take the dump device into account, thus allowing the user to
install on small disks.

The recommended disk size (which now covers one full upgrade plus 2GiB
of space for additional software) will take the dump device into
account as well. The dump device will then be created if the user
dedicates at least the recommended disk space to the installation.

Please feel free to correct me if I misunderstood some point.

Thank you very much again,
Jan

Dave Miner wrote:
> Peter Tribble wrote:
>> So remind me again - what is our recommended sizing? (Especially in
>> the light of this discussion.)
>
> Dynamically calculated based on info recorded in the image.
>
> http://src.opensolaris.org/source/xref/caiman/slim_source/usr/src/lib/liborchestrator/perform_slim_install.c#om_get_recommended_size
>
> It's in the 4+ GB range right now.
>
> Dave
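Expressed as a shell sketch, the default in [1] comes out as follows.
This is illustrative only - the real calculation lives in the
installer's C code - and the prtconf/awk pipeline is just one assumed
way of obtaining physical memory in MiB:

  phys_mb=`/usr/sbin/prtconf | awk '/^Memory size/ {print $3}'`
  size=`expr $phys_mb / 2`
  [ $size -gt 32768 ] && size=32768     # cap at 32 GiB
  [ $size -lt 512 ] && size=512         # floor of 512 MiB
  echo "default swap/dump volsize: ${size}m"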
Robert Milkowski
2008-Jun-28 02:08 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Hello Mike,

Wednesday, June 25, 2008, 9:36:16 PM, you wrote:

MG> On Wed, Jun 25, 2008 at 3:09 PM, Robert Milkowski <milek at task.gda.pl> wrote:
>> Well, I've seen core dumps bigger than 10GB (even without ZFS)... :)

MG> Was that the size in the dump device or the size in /var/crash? If
MG> it was the size in /var/crash, divide that by the compression ratio
MG> reported on the console after the dump completed.

Good point - it was the file size in /var/crash, so uncompressed.

-- 
Best regards,
 Robert                              mailto:milek at task.gda.pl
                                     http://milek.blogspot.com
jan damborsky
2008-Jun-30 12:33 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Hi Darren,

Darren J Moffat wrote:
> Jan Damborsky wrote:
>> [1] The following formula would be used for calculating swap and
>> dump sizes:
>>
>> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
>>
>> The user can reconfigure this after installation is done, on the
>> live system, with the "zfs set" command.
>
> If the minimum space isn't available, do NOT abort the install - just
> continue without creating swap space, but put a small warning
> somewhere suitable. Don't ask the user to confirm this and don't make
> a big deal about it.

I think it is necessary to have some absolute minimum and not allow the
installer to proceed if the user doesn't provide at least the minimum
required, as we have to make sure that the installation doesn't fail
because of space issues.

As this lower bound is not hard coded but dynamically calculated by the
installer according to the size of the bits to be installed, it
reflects the actual minimum space needed - it is currently ~4GiB.

However, the absolute minimum always includes the minimum swap space,
which is now 512 MiB. I think the algorithm might be modified so that
swap space is not created if space doesn't allow it, but to be honest I
don't know if this is what we want to allow for a normal user.

Thank you,
Jan
Darren J Moffat
2008-Jun-30 13:20 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
jan damborsky wrote:
> I think it is necessary to have some absolute minimum and not allow
> the installer to proceed if the user doesn't provide at least the
> minimum required, as we have to make sure that the installation
> doesn't fail because of space issues.

I very strongly disagree.

Neither swap nor dump is mandatory for running Solaris.

> As this lower bound is not hard coded but dynamically calculated by
> the installer according to the size of the bits to be installed, it
> reflects the actual minimum space needed - it is currently ~4GiB.

Which is unrealistically high given the actual amount of bits that a
minimal install puts on disk.

> However, the absolute minimum always includes the minimum swap space,
> which is now 512 MiB. I think the algorithm might be modified so that
> swap space is not created if space doesn't allow it, but to be honest
> I don't know if this is what we want to allow

Why not ? Swap is not mandatory.

If there is enough space for the packages that will be installed but
not enough for swap or dump, then the installation should proceed; it
just wouldn't create swap or dump.

-- 
Darren J Moffat
jan damborsky
2008-Jun-30 13:51 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Darren J Moffat wrote:
> jan damborsky wrote:
>> I think it is necessary to have some absolute minimum and not allow
>> the installer to proceed if the user doesn't provide at least the
>> minimum required, as we have to make sure that the installation
>> doesn't fail because of space issues.
>
> I very strongly disagree.
>
> Neither swap nor dump is mandatory for running Solaris.

I agree with you on this point. Actually, the posted proposal counts on
dump being optional.

I am sorry about the confusion - by minimum space required I meant the
minimum disk space for installation, not minimum swap or dump space.

>> As this lower bound is not hard coded but dynamically calculated by
>> the installer according to the size of the bits to be installed, it
>> reflects the actual minimum space needed - it is currently ~4GiB.
>
> Which is unrealistically high given the actual amount of bits that a
> minimal install puts on disk.

The installer currently uses the following formula for calculating the
minimum required disk space:

min_size = image_size * 1.2 + MIN_SWAP_SPACE, where MIN_SWAP_SPACE is 512MiB.

>> However, the absolute minimum always includes the minimum swap
>> space, which is now 512 MiB. I think the algorithm might be modified
>> so that swap space is not created if space doesn't allow it, but to
>> be honest I don't know if this is what we want to allow
>
> Why not ? Swap is not mandatory.

I agree - I am just thinking whether it is fine in general to allow a
normal non-experienced user (who is the target audience for the Slim
installer) to run the system without swap. To be honest, I don't know,
since I am not very experienced in this area.

If people agree that this is not an issue at all, I don't have any
objections against making swap optional.

> If there is enough space for the packages that will be installed but
> not enough for swap or dump, then the installation should proceed; it
> just wouldn't create swap or dump.

Please see above.

Thank you,
Jan
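As a worked example of that formula, in the style of the bc estimates
earlier in the thread, and assuming an image size of roughly 3 GiB (the
actual image size varies per build):

  $ echo '3.0 * 1.2 + 0.5' | bc -l
  4.10

which is consistent with the ~4GiB minimum quoted above.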
jan damborsky
2008-Jun-30 14:19 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Hi Mike, Mike Gerdts wrote:> On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky <Jan.Damborsky at sun.com> wrote: >> Thank you very much all for this valuable input. >> >> Based on the collected information, I would take >> following approach as far as calculating size of >> swap and dump devices on ZFS volumes in Caiman >> installer is concerned. >> >> [1] Following formula would be used for calculating >> swap and dump sizes: >> >> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB)) > > dump should scale with memory size, but the size given is completely > overkill. On very active (heavy kernel activity) servers with 300+ GB > of RAM, I have never seen a (compressed) dump that needed more than 8 > GB. Even uncompressed the maximum size I''ve seen has been in the 18 > GB range. This has been without zfs in the mix. It is my > understanding that at one time the arc was dumped as part of kernel > memory but that was regarded as a bug and has sense been fixed. If > the arc is dumped, a value of dump much closer to physical memory is > likely to be appropriate.I would agree that given the fact, user can customize this any time after installation, the smaller upper bound is the better. Would it be fine then to use 16 GiB, or even smaller one would be more appropriate ?> > As an aside, does the dedicated dump on all machines make it so that > savecore no longer runs by default? It just creates a lot of extra > I/O during boot (thereby slowing down boot after a crash) and uses a > lot of extra disk space for those that will never look at a crash > dump. Those that actually use it (not the majority target audience > for OpenSolaris, I would guess) will be able to figure out how to > enable (the yet non-existent) svc:/system/savecore:default. >Looking at the savecore(1M) man pages, it seems that it is managed by svc:/system/dumpadm:default. Looking at the installed system, this service is online. If I understand correctly, you are recommending to disable it by default ? Thank you, Jan
> I agree - I am just thinking, if it is fine in general to allow > normal non-experienced user (who is the target audience for Slim > installer) to run system without swap. To be honest, I don''t know, > since I am not very experienced in this area. > If people agree that this is not issue at all, I don''t have any > objections against making swap optional. >Now that we don''t have to reserve slices for it, making swap optional in the space calculation is fine. We don''t place any lower limits on memory, and it''s just virtual memory, after all. Besides which, we can infer that the system works well enough for the user''s purposes without swap since the boot from the CD won''t have used any swap. Dave
Mike Gerdts
2008-Jun-30 15:52 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Mon, Jun 30, 2008 at 9:19 AM, jan damborsky <Jan.Damborsky at sun.com> wrote:> Hi Mike, > > > Mike Gerdts wrote: >> >> On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky <Jan.Damborsky at sun.com> >> wrote: >>> >>> Thank you very much all for this valuable input. >>> >>> Based on the collected information, I would take >>> following approach as far as calculating size of >>> swap and dump devices on ZFS volumes in Caiman >>> installer is concerned. >>> >>> [1] Following formula would be used for calculating >>> swap and dump sizes: >>> >>> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 >>> GiB)) >> >> dump should scale with memory size, but the size given is completely >> overkill. On very active (heavy kernel activity) servers with 300+ GB >> of RAM, I have never seen a (compressed) dump that needed more than 8 >> GB. Even uncompressed the maximum size I''ve seen has been in the 18 >> GB range. This has been without zfs in the mix. It is my >> understanding that at one time the arc was dumped as part of kernel >> memory but that was regarded as a bug and has sense been fixed. If >> the arc is dumped, a value of dump much closer to physical memory is >> likely to be appropriate. > > I would agree that given the fact, user can customize this any time > after installation, the smaller upper bound is the better. Would > it be fine then to use 16 GiB, or even smaller one would be more > appropriate ?By default, only kernel memory is dumped to the dump device. Further, this is compressed. I have heard that 3x compression is common and the samples that I have range from 3.51x - 6.97x. If you refer to InfoDoc 228921 (contract only - can that be opened or can a Sun employee get permission to post same info to an open wiki?) you will see a method for approximating the size of a crash dump. On my snv_91 virtualbox instance (712 MB RAM configured), that method gave me an estimated (uncompressed) crash dump size of about 450 MB. I induced a panic to test the approximation. In reality it was 323 MB and compress(1) takes it down to 106 MB. My understanding is that the algorithm used in the kernel is a bit less aggressive than the algorithm used by compress(1) so maybe figure 120 - 150 MB in this case. My guess is that this did not compress as well as my other samples because on this smaller system a higher percentage of my kernel pages were not full of zeros. Perhaps the right size for the dump device is more like: MAX(256 MiB, MIN(physical_memory/4, 16 GiB) Further, dumpadm(1M) could be enhanced to resize the dump volume on demand. The size that it would choose would likely be based upon what is being dumped (kernel, kernel+user, etc.), memory size, current estimate using InfoDoc 228921 logic, etc.>> As an aside, does the dedicated dump on all machines make it so that >> savecore no longer runs by default? It just creates a lot of extra >> I/O during boot (thereby slowing down boot after a crash) and uses a >> lot of extra disk space for those that will never look at a crash >> dump. Those that actually use it (not the majority target audience >> for OpenSolaris, I would guess) will be able to figure out how to >> enable (the yet non-existent) svc:/system/savecore:default. >> > > Looking at the savecore(1M) man pages, it seems that it is managed > by svc:/system/dumpadm:default. Looking at the installed system, > this service is online. If I understand correctly, you are recommending > to disable it by default ?"dumpadm -n" is really the right way to do this. 
-- Mike Gerdts http://mgerdts.blogspot.com/
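Mike's revised dump default, written out the same way as a sketch; the constants follow his suggestion above, the function name is illustrative:

    def default_dump_mib(physical_memory_mib):
        # MAX(256 MiB, MIN(physical_memory / 4, 16 GiB))
        return max(256, min(physical_memory_mib // 4, 16 * 1024))

    # Examples: 712 MiB RAM -> 256 MiB, 8 GiB RAM -> 2 GiB, 300 GiB RAM -> 16 GiB.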
Jeff Bonwick
2008-Jun-30 23:19 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
> Neither swap or dump are mandatory for running Solaris.Dump is mandatory in the sense that losing crash dumps is criminal. Swap is more complex. It''s certainly not mandatory. Not so long ago, swap was typically larger than physical memory. But in recent years, we''ve essentially moved to a world in which paging is considered a bug. Swap devices are often only a fraction of physical memory size now, which raises the question of why we even bother. On my desktop, which has 16GB of memory, the default OpenSolaris swap partition is 2GB. That''s just stupid. Unless swap space significantly expands the amount of addressable virtual memory, there''s no reason to have it. There have been a number of good suggestions here: (1) The right way to size the dump device is to let dumpadm(1M) do it based on the dump content type. (2) In a virtualized environment, a better way to get a crash dump would be to snapshot the VM. This would require a little bit of host/guest cooperation, in that the installer (or dumpadm) would have to know that it''s operating in a VM, and the kernel would need some way to notify the VM that it just panicked. Both of these ought to be doable. Jeff
On Mon, Jun 30, 2008 at 04:19:15PM -0700, Jeff Bonwick wrote:> (2) In a virtualized environment, a better way to get a crash dump > would be to snapshot the VM. This would require a little bit > of host/guest cooperation, in that the installer (or dumpadm) > would have to know that it's operating in a VM, and the kernel > would need some way to notify the VM that it just panicked. > Both of these ought to be doable.This is trivial with xVM, btw: just make the panic routine call HYPERVISOR_shutdown(SHUTDOWN_crash); and dom0 will automatically create a full crash dump for the domain, which is readable directly in MDB. As a refinement you might want to only do this if a (suitable) place to crash dump isn't available. regards john
jan damborsky
2008-Jul-01 09:22 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Hi Jeff, Jeff Bonwick wrote:>> Neither swap or dump are mandatory for running Solaris. > > Dump is mandatory in the sense that losing crash dumps is criminal.I think that installer should be tolerant in this point and shouldn''t refuse to proceed with installation if user doesn''t provide enough available disk space to create dump device. It should be probably documented (for example mentioned in release notes) that when minimum disk space is provided for installation, swap & dump are not created.> > Swap is more complex. It''s certainly not mandatory. Not so long ago, > swap was typically larger than physical memory. But in recent years, > we''ve essentially moved to a world in which paging is considered a bug. > Swap devices are often only a fraction of physical memory size now, > which raises the question of why we even bother. On my desktop, which > has 16GB of memory, the default OpenSolaris swap partition is 2GB. > That''s just stupid. Unless swap space significantly expands the > amount of addressable virtual memory, there''s no reason to have it.I agree with you in this point. Since new formula for calculating swap & dump will take into account amount of physical memory, the values should make more sense. That said, this is just default value and certainly wouldn''t be feasible in all situations. However, as this is something which can be changed at will after installation is done, I would rather keep that formula as simple as reasonable.> > There have been a number of good suggestions here: > > (1) The right way to size the dump device is to let dumpadm(1M) do it > based on the dump content type.To be honest, it is not quite clear to me, how we might utilize dumpadm(1M) to help us to calculate/recommend size of dump device. Could you please elaborate more on this ?> > (2) In a virtualized environment, a better way to get a crash dump > would be to snapshot the VM. This would require a little bit > of host/guest cooperation, in that the installer (or dumpadm) > would have to know that it''s operating in a VM, and the kernel > would need some way to notify the VM that it just panicked. > Both of these ought to be doable.Yes - I like this idea as well. But until the appropriate support is provided by virtual tools and/or implemented in kernel, I think (I might be wrong) that in the installer we will still need to use standard mechanisms for now. Thank you, Jan
jan damborsky
2008-Jul-01 09:43 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Mike Gerdts wrote:> On Mon, Jun 30, 2008 at 9:19 AM, jan damborsky <Jan.Damborsky at sun.com> wrote: >> Hi Mike, >> >> >> Mike Gerdts wrote: >>> On Wed, Jun 25, 2008 at 11:09 PM, Jan Damborsky <Jan.Damborsky at sun.com> >>> wrote: >>>> Thank you very much all for this valuable input. >>>> >>>> Based on the collected information, I would take >>>> following approach as far as calculating size of >>>> swap and dump devices on ZFS volumes in Caiman >>>> installer is concerned. >>>> >>>> [1] Following formula would be used for calculating >>>> swap and dump sizes: >>>> >>>> size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 >>>> GiB)) >>> dump should scale with memory size, but the size given is completely >>> overkill. On very active (heavy kernel activity) servers with 300+ GB >>> of RAM, I have never seen a (compressed) dump that needed more than 8 >>> GB. Even uncompressed the maximum size I''ve seen has been in the 18 >>> GB range. This has been without zfs in the mix. It is my >>> understanding that at one time the arc was dumped as part of kernel >>> memory but that was regarded as a bug and has sense been fixed. If >>> the arc is dumped, a value of dump much closer to physical memory is >>> likely to be appropriate. >> I would agree that given the fact, user can customize this any time >> after installation, the smaller upper bound is the better. Would >> it be fine then to use 16 GiB, or even smaller one would be more >> appropriate ? > > By default, only kernel memory is dumped to the dump device. Further, > this is compressed. I have heard that 3x compression is common and > the samples that I have range from 3.51x - 6.97x. > > If you refer to InfoDoc 228921 (contract only - can that be opened or > can a Sun employee get permission to post same info to an open wiki?) > you will see a method for approximating the size of a crash dump. On > my snv_91 virtualbox instance (712 MB RAM configured), that method > gave me an estimated (uncompressed) crash dump size of about 450 MB. > I induced a panic to test the approximation. In reality it was 323 MB > and compress(1) takes it down to 106 MB. My understanding is that the > algorithm used in the kernel is a bit less aggressive than the > algorithm used by compress(1) so maybe figure 120 - 150 MB in this > case. My guess is that this did not compress as well as my other > samples because on this smaller system a higher percentage of my > kernel pages were not full of zeros. > > Perhaps the right size for the dump device is more like: > > MAX(256 MiB, MIN(physical_memory/4, 16 GiB)Thanks a lot for making this investigation and collecting valuable data - I will modify the proposed formula according to your suggestion.> > Further, dumpadm(1M) could be enhanced to resize the dump volume on > demand. The size that it would choose would likely be based upon what > is being dumped (kernel, kernel+user, etc.), memory size, current > estimate using InfoDoc 228921 logic, etc. > >>> As an aside, does the dedicated dump on all machines make it so that >>> savecore no longer runs by default? It just creates a lot of extra >>> I/O during boot (thereby slowing down boot after a crash) and uses a >>> lot of extra disk space for those that will never look at a crash >>> dump. Those that actually use it (not the majority target audience >>> for OpenSolaris, I would guess) will be able to figure out how to >>> enable (the yet non-existent) svc:/system/savecore:default. 
>>> >> Looking at the savecore(1M) man pages, it seems that it is managed >> by svc:/system/dumpadm:default. Looking at the installed system, >> this service is online. If I understand correctly, you are recommending >> to disable it by default ? > > "dumpadm -n" is really the right way to do this.I see - thanks for clarifying it. Jan
jan damborsky
2008-Jul-01 09:45 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Dave Miner wrote:>> I agree - I am just thinking, if it is fine in general to allow >> normal non-experienced user (who is the target audience for Slim >> installer) to run system without swap. To be honest, I don''t know, >> since I am not very experienced in this area. >> If people agree that this is not issue at all, I don''t have any >> objections against making swap optional. >> > > Now that we don''t have to reserve slices for it, making swap optional in > the space calculation is fine. We don''t place any lower limits on > memory, and it''s just virtual memory, after all. Besides which, we can > infer that the system works well enough for the user''s purposes without > swap since the boot from the CD won''t have used any swap.That is a good point. Based on this and also on Jeff''s comment I will make swap optional as well. Thank you, Jan
Jürgen Keil
2008-Jul-01 10:36 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Mike Gerdts wrote> By default, only kernel memory is dumped to the dump device. Further, > this is compressed. I have heard that 3x compression is common and > the samples that I have range from 3.51x - 6.97x.My samples are in the range 1.95x - 3.66x. And yes, I lost a few crash dumps on a box with a 2GB swap slice, after physical memory was upgraded from 4GB to 8GB. % grep "pages dumped" /var/adm/messages* /var/adm/messages:Jun 27 13:43:56 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 593680 pages dumped, compression ratio 3.51, /var/adm/messages.0:Jun 25 13:08:22 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 234922 pages dumped, compression ratio 2.39, /var/adm/messages.1:Jun 12 13:22:53 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 399746 pages dumped, compression ratio 1.95, /var/adm/messages.1:Jun 12 19:00:01 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 245417 pages dumped, compression ratio 2.41, /var/adm/messages.1:Jun 16 19:15:37 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 710001 pages dumped, compression ratio 3.48, /var/adm/messages.1:Jun 16 19:21:35 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 315989 pages dumped, compression ratio 3.66, /var/adm/messages.2:Jun 11 15:40:32 tiger2 genunix: [ID 409368 kern.notice] ^M100% done: 341209 pages dumped, compression ratio 2.68, This message posted from opensolaris.org
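As a rough cross-check of these numbers, the on-device footprint of a dump is approximately pages dumped times the page size, divided by the reported compression ratio. A small sketch; the 4 KiB page size is an assumption for this x86 box:

    PAGE_SIZE_BYTES = 4096            # assumed 4 KiB pages

    def dump_footprint_mib(pages_dumped, compression_ratio):
        # Approximate space the compressed dump consumed on the device.
        return pages_dumped * PAGE_SIZE_BYTES / compression_ratio / (1024.0 * 1024)

    # Largest sample above: 710001 pages at 3.48x -> roughly 800 MiB,
    # which still fits in a 2 GB slice; the dumps lost after the memory
    # upgrade presumably outgrew it.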
Darren J Moffat
2008-Jul-01 10:56 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Jeff Bonwick wrote:>> Neither swap or dump are mandatory for running Solaris. > > Dump is mandatory in the sense that losing crash dumps is criminal.Agreed on that point, I remember all to well why I was in Sun Service the days when the first dump was always lost because savecore didn''t used to be run!> Swap is more complex. It''s certainly not mandatory. Not so long ago, > swap was typically larger than physical memory. But in recent years, > we''ve essentially moved to a world in which paging is considered a bug. > Swap devices are often only a fraction of physical memory size now, > which raises the question of why we even bother. On my desktop, which > has 16GB of memory, the default OpenSolaris swap partition is 2GB. > That''s just stupid. Unless swap space significantly expands the > amount of addressable virtual memory, there''s no reason to have it.What has alwyas annoyed me about Solaris (and every Linux distro I''ve ever used) is that unlike Windows and MacOS X we put swap management (devices and their size) into the hands of the admin. The upside of this though is that it is easy to mirror swap using SVM. Instead we should take it completely out of their hands and do it all dynamically when it is needed. Now that we can swap on a ZVOL and ZVOLs can be extended this is much easier to deal with and we don''t lose the benefit of protected swap devices (in fact we have much more than we had with SVM). -- Darren J Moffat
Mike Gerdts
2008-Jul-01 12:12 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat <Darren.Moffat at sun.com> wrote:> Instead we should take it completely out of their hands and do it all > dynamically when it is needed. Now that we can swap on a ZVOL and ZVOLs > can be extended this is much easier to deal with and we don''t lose the > benefit of protected swap devices (in fact we have much more than we had > with SVM).Are you suggesting that if I have a system that has 500 MB swap free and someone starts up another JVM with a 16 GB heap that swap should automatically grow by 16+ GB right at that time? I have seen times where applications "require" X GB of RAM, make the reservation, then never dirty more than X/2 GB of pages. In these cases dynamically growing swap to a certain point may be OK. In most cases, however, I see this as a recipe for disaster. I would rather have an application die (and likely restart via SMF) because it can''t get the memory that it requested than have heavy paging bring the system to such a crawl that transactions time out and it takes tens of minutes for administrators to log in and shut down some workload. The app that can''t start will likely do so during a maintenance window. The app that causes the system to crawl will, with all likelihood, do so during peak production or when the admin is in bed. Perhaps bad paging activity (definition needed) should throw some messages to FMA so that the nice GUI tool that answers the question "why does my machine suck?" can say that it has been excessively short on memory X times in recent history. Any of these approaches is miles above the Linux approach of finding a memory hog to kill. -- Mike Gerdts http://mgerdts.blogspot.com/
Darren J Moffat
2008-Jul-01 12:31 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Mike Gerdts wrote:> On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat <Darren.Moffat at sun.com> wrote: >> Instead we should take it completely out of their hands and do it all >> dynamically when it is needed. Now that we can swap on a ZVOL and ZVOLs >> can be extended this is much easier to deal with and we don''t lose the >> benefit of protected swap devices (in fact we have much more than we had >> with SVM). > > Are you suggesting that if I have a system that has 500 MB swap free > and someone starts up another JVM with a 16 GB heap that swap should > automatically grow by 16+ GB right at that time? I have seen times > where applications "require" X GB of RAM, make the reservation, then > never dirty more than X/2 GB of pages. In these cases dynamically > growing swap to a certain point may be OK.Not at all, and I don''t see how you could get that assumption from what I said. I said "dynamically when it is needed".> In most cases, however, I see this as a recipe for disaster. I would > rather have an application die (and likely restart via SMF) because it > can''t get the memory that it requested than have heavy paging bring > the system to such a crawl that transactions time out and it takes > tens of minutes for administrators to log in and shut down some > workload. The app that can''t start will likely do so during a > maintenance window. The app that causes the system to crawl will, > with all likelihood, do so during peak production or when the admin is > in bed.I would not favour a system where the admin had no control over swap. I''m just suggesting that in many cases where swap is actually needed there is no real need for the admin to be involved in managing the swap and its size should not need to be predetermined. -- Darren J Moffat
Mike Gerdts
2008-Jul-01 13:10 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, Jul 1, 2008 at 7:31 AM, Darren J Moffat <Darren.Moffat at sun.com> wrote:> Mike Gerdts wrote: >> >> On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat <Darren.Moffat at sun.com> >> wrote: >>> >>> Instead we should take it completely out of their hands and do it all >>> dynamically when it is needed. Now that we can swap on a ZVOL and ZVOLs >>> can be extended this is much easier to deal with and we don''t lose the >>> benefit of protected swap devices (in fact we have much more than we had >>> with SVM). >> >> Are you suggesting that if I have a system that has 500 MB swap free >> and someone starts up another JVM with a 16 GB heap that swap should >> automatically grow by 16+ GB right at that time? I have seen times >> where applications "require" X GB of RAM, make the reservation, then >> never dirty more than X/2 GB of pages. In these cases dynamically >> growing swap to a certain point may be OK. > > Not at all, and I don''t see how you could get that assumption from what I > said. I said "dynamically when it is needed".I think I came off wrong in my initial message. I''ve seen times when vmstat reports only megabytes of free swap while gigabytes of RAM were available. That is, reservations far outstripped actual usage. Do you have mechanisms in mind to be able to detect such circumstances and grow swap to a point that the system can handle more load without spiraling to a long slow death? -- Mike Gerdts http://mgerdts.blogspot.com/
On Tue, Jul 1, 2008 at 8:10 AM, Mike Gerdts <mgerdts at gmail.com> wrote:> On Tue, Jul 1, 2008 at 7:31 AM, Darren J Moffat <Darren.Moffat at sun.com> wrote: >> Mike Gerdts wrote: >>> >>> On Tue, Jul 1, 2008 at 5:56 AM, Darren J Moffat <Darren.Moffat at sun.com> >>> wrote: >>>> >>>> Instead we should take it completely out of their hands and do it all >>>> dynamically when it is needed. Now that we can swap on a ZVOL and ZVOLs >>>> can be extended this is much easier to deal with and we don''t lose the >>>> benefit of protected swap devices (in fact we have much more than we had >>>> with SVM). >>> >>> Are you suggesting that if I have a system that has 500 MB swap free >>> and someone starts up another JVM with a 16 GB heap that swap should >>> automatically grow by 16+ GB right at that time? I have seen times >>> where applications "require" X GB of RAM, make the reservation, then >>> never dirty more than X/2 GB of pages. In these cases dynamically >>> growing swap to a certain point may be OK. >> >> Not at all, and I don''t see how you could get that assumption from what I >> said. I said "dynamically when it is needed". > > I think I came off wrong in my initial message. I''ve seen times when > vmstat reports only megabytes of free swap while gigabytes of RAM were > available. That is, reservations far outstripped actual usage. Do > you have mechanisms in mind to be able to detect such circumstances > and grow swap to a point that the system can handle more load without > spiraling to a long slow death?Having this dynamic would be nice with Oracle. 10g at least will use DISM in the preferred configuration Oracle is now preaching to DBAs. I ran into this a few months ago on an upgrade (Solaris 8 -> 10, Oracle 8 -> 10g, and hw upgrade). The side effect of using DISM is that it reserves an amount equal to the SGA in swap, and will fail to startup if swap is too small. In practice, I don''t see the space ever being touched (I suspect it''s mostly there as a requirement for dynamic reconfiguration w/ DISM, but didn''t bother to dig that far).
Darren J Moffat
2008-Jul-01 13:35 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Mike Gerdts wrote:>> Not at all, and I don''t see how you could get that assumption from what I >> said. I said "dynamically when it is needed". > > I think I came off wrong in my initial message. I''ve seen times when > vmstat reports only megabytes of free swap while gigabytes of RAM were > available. That is, reservations far outstripped actual usage.Ah that makes it more clear. > Do you have mechanisms in mind to be able to detect such circumstances> and grow swap to a point that the system can handle more load without > spiraling to a long slow death?I don''t as yet because I haven''t had time to think about this. Maybe once I''ve finished with the ZFS Crypto project and I spend some time looking at encrypted VM (other than by swapping on an encrypted ZVOL). At the moment while it annoys me it isn''t on my todo list to try and implement a fix. -- Darren J Moffat
Richard Elling
2008-Jul-01 15:16 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Darren J Moffat wrote:> Mike Gerdts wrote: > > >>> Not at all, and I don''t see how you could get that assumption from what I >>> said. I said "dynamically when it is needed". >>> >> I think I came off wrong in my initial message. I''ve seen times when >> vmstat reports only megabytes of free swap while gigabytes of RAM were >> available. That is, reservations far outstripped actual usage. >> > > Ah that makes it more clear. > > > Do you have mechanisms in mind to be able to detect such circumstances > >> and grow swap to a point that the system can handle more load without >> spiraling to a long slow death? >> > > I don''t as yet because I haven''t had time to think about this. Maybe > once I''ve finished with the ZFS Crypto project and I spend some time > looking at encrypted VM (other than by swapping on an encrypted ZVOL). > At the moment while it annoys me it isn''t on my todo list to try and > implement a fix. > >Here is a good start, BSD''s dynamic_pager http://developer.apple.com/documentation/Darwin/Reference/ManPages/man8/dynamic_pager.8.html Mike, many people use this all day long and seem to be quite happy. I think the slow death spiral might be overrated :-) -- richard
Miles Nordin
2008-Jul-01 16:55 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
>>>>> "re" == Richard Elling <Richard.Elling at Sun.COM> writes:re> Mike, many people use this all day long and seem to be quite re> happy. I think the slow death spiral might be overrated :-) I don''t think it''s overrated at all. People all around me are using this dynamic_pager right now, and they just reboot when they see too many pinwheels. If they are ``quite happy,'''' it''s not with their pager. The pinwheel is part of a Mac user''s daily vocabulary, and although they generally don''t know this, it almost always appears because of programs that leak memory, grow, and eventually cause thrashing. They do not even realize that restarting Mail or Firefox will fix the pinwheels. They just reboot. so obviously it''s an unworkable approach. To them, being forced to reboot, even if it takes twenty minutes to shut down as long as it''s a clean reboot, makes them feel more confident than Firefox unexpectedly crashing. For us, exactly the opposite is true. I think dynamic_pager gets it backwards. ``demand'''' is a reason *NOT* to increase swap. If all the allocated pages in swap are cold---colder than the disk''s io capacity---then there is no ``demand'''' and maybe it''s ok to add some free pages which might absorb some warmer data. If there are already warm pages in swap (``demand''''), then do not satisfy more of it, instead let swap fill and return ENOMEM. They see demand as capacity rather than temperature but...the machine does need to run out of memory eventually. Don''t drink the dynamic_pager futuristic kool-aid. It''s broken, both in theory and in the day-to-day experience of the Mac users around me. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080701/2312490e/attachment.bin>
Keith Bierman
2008-Jul-01 17:07 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Jul 1, 2008, at 10:55 AM, Miles Nordin wrote:> > I don''t think it''s overrated at all. People all around me are using > this dynamic_pager right now, and they just reboot when they see too > many pinwheels. If they are ``quite happy,'''' it''s not with their > pager.I often exist in a sea of mac users, and I''ve never seen them reboot other than after the periodic Apple Updates. Killing firefox every couple of days, or after visiting certain demented sites is not uncommon and is probably a good idea.> .... > > They see demand as capacity rather than temperature but...the machine > does need to run out of memory eventually. Don''t drink the > dynamic_pager futuristic kool-aid. It''s broken, both in theory and in > the day-to-day experience of the Mac users around me.I''ve got macs with uptimes of months ... admittedly not in the same territory as my old SunOS or Solaris boxes, but Apple has seldom resisted the temptation to drop a security update or a quicktime update for longer. -- Keith H. Bierman khbkhb at gmail.com | AIM kbiermank 5430 Nassau Circle East | Cherry Hills Village, CO 80113 | 303-997-2749 <speaking for myself*> Copyright 2008
Bob Friesenhahn
2008-Jul-01 17:15 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, 1 Jul 2008, Miles Nordin wrote:> > I don''t think it''s overrated at all. People all around me are using > this dynamic_pager right now, and they just reboot when they see too > many pinwheels. If they are ``quite happy,'''' it''s not with their > pager.While we have seen these "pinwheels" under OS-X, the cause seems to be usually application lockup (due to poor application/library design) and not due to paging to death. Paging to death causes lots of obvious disk churn. Microsoft Windows includes a dynamic page file as well. It is wrong to confuse total required paging space with thrashing. These are completely different issues. Dynamic sizing of paging space seems to fit well with the new zfs root/boot strategy where everything is shared via a common pool. If you don''t use it, you don''t lose it. System resource limits can be used to block individual applications from consuming all resources. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Richard Elling
2008-Jul-01 17:17 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Miles Nordin wrote:>>>>>> "re" == Richard Elling <Richard.Elling at Sun.COM> writes: >>>>>> > > re> Mike, many people use this all day long and seem to be quite > re> happy. I think the slow death spiral might be overrated :-) > > I don''t think it''s overrated at all. People all around me are using > this dynamic_pager right now, and they just reboot when they see too > many pinwheels. If they are ``quite happy,'''' it''s not with their > pager. >If you run out of space, things fail. Pinwheels are a symptom of running out of RAM, not running out of swap.> The pinwheel is part of a Mac user''s daily vocabulary, and although > they generally don''t know this, it almost always appears because of > programs that leak memory, grow, and eventually cause thrashing. They > do not even realize that restarting Mail or Firefox will fix the > pinwheels. They just reboot. >...which frees RAM.> so obviously it''s an unworkable approach. To them, being forced to > reboot, even if it takes twenty minutes to shut down as long as it''s a > clean reboot, makes them feel more confident than Firefox unexpectedly > crashing. For us, exactly the opposite is true. > > I think dynamic_pager gets it backwards. ``demand'''' is a reason *NOT* > to increase swap. If all the allocated pages in swap are > cold---colder than the disk''s io capacity---then there is no > ``demand'''' and maybe it''s ok to add some free pages which might absorb > some warmer data. If there are already warm pages in swap > (``demand''''), then do not satisfy more of it, instead let swap fill > and return ENOMEM. >You will get more service calls for failures due to ENOMEM than you will get for pinwheels. Given the large size of disks in today''s systems, you may never see an ENOMEM. The goodness here is that it is one less thing that requires a service touch, even a local sysadmin service touch costs real $$. -- richard
Miles Nordin
2008-Jul-01 19:46 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
>>>>> "bf" == Bob Friesenhahn <bfriesen at simple.dallas.tx.us> writes: >>>>> "re" == Richard Elling <Richard.Elling at Sun.COM> writes:re> If you run out of space, things fail. Pinwheels are a symptom re> of running out of RAM, not running out of swap. okay. But what is the point? Pinwheels are a symptom of thrashing. Pinwheels are not showing up when the OS is returning ENOMEM. Pinwheels are not ``things fail'''', they are ``things are going slower than some watcher thinks they should.'''' AFAICT they show up when the application under the cursor has been blocked for about five seconds, which is usually because it''s thrashing, though sometimes it''s because it''s trying to read from an NFS share that went away (this also causes pinwheels). bf> While we have seen these "pinwheels" under OS-X, the cause bf> seems to be usually application lockup (due to poor bf> application/library design) and not due to paging to death. that''s simply not my experience. bf> Paging to death causes lots of obvious disk churn. You can check for it in ''top'' on OS X. they list pageins and pageouts. bf> It is wrong to confuse total required paging space with bf> thrashing. These are completely different issues. and I did not. I even rephrased the word ``demand'''' in terms of thrashing. I am not confused. bf> Dynamic sizing of paging space seems to fit well with the new bf> zfs root/boot strategy where everything is shared via a common bf> pool. yes, it fits extremely well. What I''m saying is, do not do it just because it ``fits well''''. Even if it fits really really well so it almost begs you like a sort of compulsive taxonomical lust to put the square peg into the square hole, don''t do it, because it''s a bad idea! When applications request memory reservations that are likely to bring the whole system down due to thrashing, they need to get ENOMEM. It isn''t okay to change the memory reservation ceiling to the ZFS pool size, or to any other unreasonably large and not-well-considered amount, even if the change includes a lot of mealy-mouthed pandering orbiting around the word ``dynamic''''. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080701/e0a995b8/attachment.bin>
Bob Friesenhahn
2008-Jul-01 20:06 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, 1 Jul 2008, Miles Nordin wrote:> > okay. But what is the point? > > Pinwheels are a symptom of thrashing.They seem like the equivalent of the meaningless hourglass icon to me.> Pinwheels are not showing up when the OS is returning ENOMEM. > Pinwheels are not ``things fail'''', they are ``things are going slower > than some watcher thinks they should.''''Not all applications demand instant response when they are processing. Sometimes they have actual work to do.> bf> It is wrong to confuse total required paging space with > bf> thrashing. These are completely different issues. > > and I did not. I even rephrased the word ``demand'''' in terms of > thrashing. I am not confused.You sound angry.> When applications request memory reservations that are likely to bring > the whole system down due to thrashing, they need to get ENOMEM. ItWhat is the relationship between the size of the memory reservation and thrashing? Are they somehow related? I don''t see the relationship. It does not bother me if the memory reservation is 10X the size of physical memory as long as the access is orderly and not under resource contention (i.e. thrashing). A few days ago I had a process running which consumed 48GB of virtual address space without doing any noticeable thrashing and with hardly any impact to usability of the desktop.> isn''t okay to change the memory reservation ceiling to the ZFS pool > size, or to any other unreasonably large and not-well-considered > amount, even if the change includes a lot of mealy-mouthed pandering > orbiting around the word ``dynamic''''.I have seen mealy worms. They are kind of ugly but fun to hold in your hand and show your friends. I am don''t think I would want them in my mouth and am not sure how I would pander to a worm. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Miles Nordin
2008-Jul-01 20:53 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
>>>>> "bf" == Bob Friesenhahn <bfriesen at simple.dallas.tx.us> writes:bf> What is the relationship between the size of the memory bf> reservation and thrashing? The problem is that size-capping is the only control we have over thrashing right now. Maybe there are better ways to predict thrashing than through reservation size, and maybe it''s possible to design swap admission control that''s safer and yet also more gracious to your Java-like reservations of large cold datasets than the flat capping we have now, but Mac OS doesn''t have it. I suspect there are even <cough> some programs that try to get an idea how much memory pressure there is, and how often they need to gc, by making big reservations until they get ENOMEM. They develop tricks in an ecosystem, presuming some reasonable swap cap is configured, so removing it will mess up their (admittedly clumsy) tricks. To my view, if the goal is ``manual tuning is bad. we want to eliminate swap size as a manual tuneable,'''' then the ``dynamic'''' aspect of the tuning should be to grow the swap area until it gets too hot: until the ``demand'''' is excessive. Some WFQ-ish thing might be in order, too, like a complicated version of ulimit. But this may be tricky or impossible, and in any case none of that is on the table so far: the type of autotuning you are trying to copy from other operating systems is just to remove the upper limit on swap size entirely, which is a step backwards. I think it''s the wrong choice for a desktop, but it is somewhat workable choice on a single-user machine where it''s often just as irritating to the user if his one big application crashes in which all his work is stored, as if the whole machine grinds to a halt. But that view is completely incompatible with most Solaris systems as well as with this fault-isolation, resiliency marketing push with sol10. so, if you are saying Mac users are happy with dynamic swap, <raises hand>, not happy!, and even if I were it''s not applicable to Solaris. I think ZFS swap should stay with a fixed-sized (albeit manually changeable!) cap until Java wizards can integrate some dynamicly self-disciplining swap concepts into their gc algorithms (meaning, probably forever). bf> You sound angry. Maybe I am and maybe I''m not, but wouldn''t it be better not to bring this up unless it''s interfering with my ability to communicate? Because if I were, saying I sound angry is poking the monkey through the bars, likely to make me angrier, which is unpleasant for me and wastes time for everyone---unless it amuses you or something. This is a technical list. Let''s not talk about our feelings, please. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080701/3f7b9b5a/attachment.bin>
Jeff Bonwick
2008-Jul-01 20:56 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
> To be honest, it is not quite clear to me, how we might utilize > dumpadm(1M) to help us to calculate/recommend size of dump device. > Could you please elaborate more on this ?dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus current process, or all memory. If the dump content is 'all', the dump space needs to be as large as physical memory. If it's just 'kernel', it can be some fraction of that. Jeff
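Jeff's point can be folded into the sizing logic: the dump content type chosen with dumpadm -c bounds how big the device needs to be. A hedged sketch; treating the non-'all' cases as a quarter of physical memory is only a placeholder assumption, not anything dumpadm itself computes:

    def dump_mib_for_content(content, physical_memory_mib):
        # 'all' must be able to hold every page of physical memory;
        # 'kernel' and 'curproc' need only some fraction of it.
        if content == "all":
            return physical_memory_mib
        return max(256, min(physical_memory_mib // 4, 16 * 1024))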
Jeff Bonwick
2008-Jul-01 21:07 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
> The problem is that size-capping is the only control we have over > thrashing right now.It''s not just thrashing, it''s also any application that leaks memory. Without a cap, the broken application would continue plowing through memory until it had consumed every free block in the storage pool. What we really want is dynamic allocation with lower and upper bounds to ensure that there''s always enough swap space, and that a reasonable upper limit isn''t exceeded. As fortune would have it, that''s exactly what we get with quotas and reservations on zvol-based swap today. If you prefer uncapped behavior, no problem -- unset the reservation and grow the swap zvol to 16EB. (Ultimately it would be cleaner to express this more directly, rather than via the nominal size of an emulated volume. The VM 2.0 project will address that, along with many other long-standing annoyances.) Jeff
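A sketch of what capping or uncapping zvol-based swap might look like from a script, using the zfs volsize and refreservation properties Jeff alludes to; the dataset name rpool/swap and the sizes are examples only, and removing/re-adding the active swap device around a resize is not handled here:

    import subprocess

    def resize_swap_zvol(volsize, keep_reservation=True):
        # Grow (or shrink) the swap zvol; optionally drop its reservation
        # so the space is no longer guaranteed out of the pool.
        subprocess.run(["/usr/sbin/zfs", "set", "volsize=" + volsize,
                        "rpool/swap"], check=True)
        if not keep_reservation:
            subprocess.run(["/usr/sbin/zfs", "set", "refreservation=none",
                            "rpool/swap"], check=True)

    # Capped 16 GiB swap with the space reserved in the pool:
    # resize_swap_zvol("16G")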
Bob Friesenhahn
2008-Jul-01 21:36 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, 1 Jul 2008, Miles Nordin wrote:> > bf> What is the relationship between the size of the memory > bf> reservation and thrashing? > > The problem is that size-capping is the only control we have over > thrashing right now. Maybe there are better ways to predict thrashing > than through reservation size, and maybe it''s possible to design swapTo be clear, "thrashing" as pertains to the paging device is due to the application making random access to virtual memory which is larger than the amount of physical memory on the machine. This is very similar to random access to disk (i.e. not very efficient) and in fact it does cause random access to disk. In a well-designed VM system (Solaris is probably second to none), sequential access to virtual memory causes reasonably sequential I/O requests to disk. Stale or dirty pages are expunged as needed in order to clear space for new requests. If multiple applications are fighting over the same VM, then there can still be thrashing even if their access is orderly. If using more virtual address space than there is physical address space always leads to problems, then it would not have much value. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Miles Nordin
2008-Jul-01 23:17 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
>>>>> "bf" == Bob Friesenhahn <bfriesen at simple.dallas.tx.us> writes:bf> sequential access to virtual memory causes reasonably bf> sequential I/O requests to disk. no, thrashing is not when memory is accessed randomly instead of sequentially. It''s when the working set of pages is too big to fit in physical RAM. A program that allocates twice physical RAM size, then just scans through the entire block over and over, sequentially, will cause thrashing: the program will run orders of magnitude slower than it would run if it had enough physical RAM for its working set. Yes, I am making assumptions: 1. more than one program is running. the other program might just be xterm, but it''s there. 2. programs that allocate memory expect it to be about as fast as memory usually is. But, just read the assumptions. They''re not really assumptions. They''re just definitions of what is RAM, and what is a time-sharing system. They''re givens. To benefit, you need your program to loop tens of thousands of times over one chunk of memory, then stop using that chunk and move on to a different chunk. This is typical, but it''s not sequential. It''s temporal and spatial locality. A ``well-designed'''' or ``second-to-none'''' VM subsystem combined with convenient programs that only use sequentially-accessed chunks of memory does not avoid thrashing if the working set is larger than physical RAM. bf> If using more virtual address space than there is physical bf> address space always leads to problems, then it would not have bf> much value. It''s useful when some of the pages are almost never used, like the part of Mozilla''s text segment where the mail reader lives, or large contiguous chunks of memory that have leaked from buggy C daemons that kill and restart themselves every hour but leak like upside-down buckets until then, or the getty processes running on tty''s with nothing connected to them. It''s also useful when you tend to use some pages for a while, then use other pages. like chew chew chew chew swallow, chew chew chew chew swallow: maybe this takes two or three times as long to run if the swallower has to be paged in and out, but at least if you chew for a few minutes, and if you stop chewing while you swallow like most people, it won''t run 100 times slower. If you make chewing and swallowing separate threads, then the working set is now the entire program, it doesn''t fit in physical RAM, and the program thrashes and runs 100 times slower. sorry for the tangent. I''ll shut up now. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20080701/f4e31956/attachment.bin>
Bob Friesenhahn
2008-Jul-02 00:30 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Tue, 1 Jul 2008, Miles Nordin wrote:> > But, just read the assumptions. They''re not really assumptions. > They''re just definitions of what is RAM, and what is a time-sharing > system. They''re givens.In today''s systems with two or three levels of cache in front of "RAM", variable page sizes, and huge complexities these are definitely not "givens".> A ``well-designed'''' or ``second-to-none'''' VM subsystem combined with > convenient programs that only use sequentially-accessed chunks of > memory does not avoid thrashing if the working set is larger than > physical RAM.This simplistic view was perhaps more appropriate 10 or 15 years ago than it is now when typical systems come with with 2GB or more RAM and small rack-mount systems can be fitted with 128GB of RAM. The notion of "chewing" before moving on is interesting but it is worthwhile noting that it takes some time for applications to "chew" through 2GB or more RAM so the simplistic view of "working set" is now pretty dated. The "chew and move on" you describe becomes the normal case for sequential access. Regardless, it seems that Solaris should be willing to supply a large virtual address space if the application needs it and the administrator should have the ability to apply limits. Dynamic reservation would decrease administrative overhead and would allow large programs to be run without requiring a permanent allocation. This would be good for me since then I don''t have to permanently assign 32GB of space for swap in case I need it. Bob =====================================Bob Friesenhahn bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
jan damborsky
2008-Jul-02 10:19 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Jeff Bonwick wrote:>> To be honest, it is not quite clear to me, how we might utilize >> dumpadm(1M) to help us to calculate/recommend size of dump device. >> Could you please elaborate more on this ? > > dumpadm(1M) -c specifies the dump content, which can be kernel, kernel plus > current process, or all memory. If the dump content is ''all'', the dump space > needs to be as large as physical memory. If it''s just ''kernel'', it can be > some fraction of that.I see - thanks a lot for clarification. Jan
David Magda
2008-Jul-02 15:08 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Jun 30, 2008, at 19:19, Jeff Bonwick wrote:> Dump is mandatory in the sense that losing crash dumps is criminal. > > Swap is more complex. It's certainly not mandatory. Not so long ago, > swap was typically larger than physical memory.These two statements kind of imply that dump and swap are two different slices. They certainly can be, but how often are they?> On my desktop, which has 16GB of memory, the default OpenSolaris > swap partition is 2GB. > That's just stupid. Unless swap space significantly expands the > amount of addressable virtual memory, there's no reason to have it.Quite often swap and dump are the same device, at least in the installs that I've worked with, and I think the default for Solaris is that if dump is not explicitly specified it defaults to swap, yes? Is there any reason why they should be separate? Having two just seems like a waste to me, even with disk sizes being what they are (and growing). A separate dump device is only really needed if something goes completely wrong, otherwise it's just sitting there "doing nothing". If you're panicking, then whatever is in swap is now no longer relevant, so overwriting it is no big deal.
Kyle McDonald
2008-Jul-02 15:16 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
David Magda wrote:> > Quite often swap and dump are the same device, at least in the > installs that I've worked with, and I think the default for Solaris > is that if dump is not explicitly specified it defaults to swap, yes? > Is there any reason why they should be separate? > >I believe there are technical limitations with ZFS Boot that stop them from sharing the same ZVOL.> Having two just seems like a waste to me, even with disk sizes being > what they are (and growing). A separate dump device is only really > needed if something goes completely wrong, otherwise it's just > sitting there "doing nothing". If you're panicking, then whatever is > in swap is now no longer relevant, so overwriting it is no big deal. >That said, with all the talk of dynamic sizing: if, during normal operation, the swap ZVOL has space allocated and the dump ZVOL is sized to 0, then during a panic could the swap volume be sized to 0 and the dump volume expanded to whatever size is needed? This, while still requiring two ZVOLs, would allow (even when the rest of the pool is short on space) a close approximation of the old behavior of sharing the same slice for both swap and dump. -Kyle> _______________________________________________ > zfs-discuss mailing list > zfs-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >
Darren J Moffat
2008-Jul-02 15:24 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
David Magda wrote:> On Jun 30, 2008, at 19:19, Jeff Bonwick wrote: > >> Dump is mandatory in the sense that losing crash dumps is criminal. >> >> Swap is more complex. It''s certainly not mandatory. Not so long ago, >> swap was typically larger than physical memory. > > These two statements kind of imply that dump and swap are two > different slices. They certainly can be, but how often are they?If they are ZVOLs then they are ALWAYS different.> Quite often swap and dump are the same device, at least in the > installs that I''ve worked with, and I think the default for Solaris > is that if dump is not explicitly specified it defaults to swap, yes?Correct.> Is there any reason why they should be separate?You might want dump but not swap. They maybe connected via completely different types of storage interconnect. For dump ideally you want the simplest possible route to the disk. -- Darren J Moffat
Mike Gerdts
2008-Jul-02 15:33 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
On Wed, Jul 2, 2008 at 10:08 AM, David Magda <dmagda at ee.ryerson.ca> wrote:> Quite often swap and dump are the same device, at least in the > installs that I've worked with, and I think the default for Solaris > is that if dump is not explicitly specified it defaults to swap, yes? > Is there any reason why they should be separate?Aside from what Kyle just said... If they are separate you can avoid doing savecore if you are never going to read it. For most people, my guess is that savecore just means that they cause a bunch of thrashing during boot (swap/dump is typically on the same spindles as /var/crash), waste some space in /var/crash, and never look at the crash dump. If you come across a time where you actually do want to look at it, you can manually run savecore at some time in the future. Also, last time I looked (and I've not seen anything to suggest it is fixed) proper dependencies do not exist to prevent paging activity after boot from trashing the crash dump in a shared swap+dump device - even when savecore is enabled. It is only by luck that you get anything out of it. Arguably this should be fixed by proper SMF dependencies. -- Mike Gerdts http://mgerdts.blogspot.com/
sanjay nadkarni (Laptop)
2008-Jul-02 16:30 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Mike Gerdts wrote:> On Wed, Jul 2, 2008 at 10:08 AM, David Magda <dmagda at ee.ryerson.ca> wrote: > >> Quite often swap and dump are the same device, at least in the >> installs that I''ve worked with, and I think the default for Solaris >> is that if dump is not explicitly specified it defaults to swap, yes? >> Is there any reason why they should be separate? >> > > Aside from what Kyle just said... > > If they are separate you can avoid doing savecore if you are never > going to read it. For most people, my guess is that savecore just > means that they cause a bunch of thrashing during boot (swap/dump is > typically on same spindles as /var/crashh), waste some space in > /var/crash, and never look at the crash dump. If you come across a > time where you actually do want to look at it, you can manually run > savecore at some time in the future. > > Also, last time I looked (and I''ve not seen anything to suggest it is > fixed) proper dependencies do not exist to prevent paging activity > after boot from trashing the crash dump in a shared swap+dump device - > even when savecore is enabled. It is only by luck that you get > anything out of it. Arguably this should be fixed by proper SMF > dependencies. >Really ? Back when I looked at it, dumps were written to the back end of the swap device. This would prevent paging from writing on top of a valid dump. Furthermore when the system is coming up, savecore was run very early to grab core so that paging would not trash the core. -Sanjay> -- > Mike Gerdts > http://mgerdts.blogspot.com/ > _______________________________________________ > caiman-discuss mailing list > caiman-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/caiman-discuss >
Kyle McDonald
2008-Jul-02 16:40 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
sanjay nadkarni (Laptop) wrote:
> Really? Back when I looked at it, dumps were written to the back end of
> the swap device, which would prevent paging from writing on top of a
> valid dump. Furthermore, when the system is coming up, savecore was run
> very early to grab the core so that paging would not trash it.

I'm guessing Mike is suggesting that making the swap device available
for paging should be dependent on savecore having already completed its
job.

-Kyle
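A sketch of where such a dependency would be checked, assuming the
dump/savecore configuration is handled by svc:/system/dumpadm:default
(that service name is an assumption and may differ between builds):

 o what the dump/savecore service itself depends on
   # svcs -d svc:/system/dumpadm:default
 o what, if anything, is currently required to wait for it
   # svcs -D svc:/system/dumpadm:default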
George Wilson
2008-Jul-02 16:45 UTC
[zfs-discuss] [caiman-discuss] swap & dump on ZFS volume
Kyle McDonald wrote:
> David Magda wrote:
>
>> Quite often swap and dump are the same device, at least in the
>> installs that I've worked with, and I think the default for Solaris
>> is that if dump is not explicitly specified it defaults to swap, yes?
>> Is there any reason why they should be separate?
>
> I believe there are technical limitations with ZFS Boot that stop them
> from sharing the same zvol.

Yes, there are. Swap zvols are ordinary zvols which still COW their
blocks and leverage checksumming, etc. Dump zvols don't have this
luxury, because when the system crashes you are limited in the number
of tasks that you can perform. So we solved this by changing the
personality of a zvol when it's added as a dump device. In particular,
we needed to make sure that all the blocks the dump device cares about
are available at the time of a system crash, so we preallocate the dump
device when it gets created. We also follow a different I/O path when
writing to a dump device, allowing us to behave as if we were a separate
partition on the disk. The dump subsystem doesn't know the difference,
which is exactly what we wanted. :-)

>> Having two just seems like a waste to me, even with disk sizes being
>> what they are (and growing). A separate dump device is only really
>> needed if something goes completely wrong, otherwise it's just
>> sitting there "doing nothing". If you're panicking, then whatever is
>> in swap is no longer relevant, so overwriting it is no big deal.
>
> That said, with all the talk of dynamic sizing: if, during normal
> operation, the swap zvol has space allocated and the dump zvol is sized
> to 0, could the swap volume then be sized to 0 during a panic and the
> dump volume expanded to whatever size is needed?

Unfortunately that's not possible, for the reasons I mentioned. You can
resize the dump zvol to a smaller size, but you can't make it size 0 as
there is a minimum size requirement.

Thanks,
George

> This at least, while still requiring two zvols, would allow (even when
> the rest of the pool is short on space) a close approximation of the
> old behavior of sharing the same slice for both swap and dump.
>
> -Kyle
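Since both volumes can be resized after install, a rough sketch of
doing so follows; the 1g/2g values are placeholders, and the dump
device may need to be re-registered with dumpadm after the resize:

 o shrink or grow the dump zvol (subject to the minimum size George mentions)
   # /usr/sbin/zfs get volsize rpool/dump
   # /usr/sbin/zfs set volsize=1g rpool/dump
 o resize swap by taking it out of use first, then re-adding it
   # /usr/sbin/swap -d /dev/zvol/dsk/rpool/swap
   # /usr/sbin/zfs set volsize=2g rpool/swap
   # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap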