Hello all. Like many others, I've come close to making a home NAS server based on ZFS and OpenSolaris. This is not an enterprise solution with high IOPS expectations, but rather a low-power system for storing everything I have: I plan on cramming in some 6-10 5400RPM "Green" drives with low wattage and high capacity, and possibly an SSD or two (or one or two spinning disks) for read/write caching/logging. However, having all the drives spinning (with little actual usage for >99% of the data at any given time) is inefficient in terms of power bills. An apparent solution is to use very few active devices, and idle or spin down the other disks until their data is actually accessed - and to minimize the frequency of such requests by efficient caching, while transparently maintaining the ease of use of a single ZFS pool.

This was all recognized, considered and discussed before me, but I have yet to find any definite answers to my questions below :) I've read a number of blogs and threads on ZFS support for spinning down unused disks, and for deferring metadata updates to a few always-active devices. Some threads also discuss hacks to spin up the drives of a ZFS pool in parallel, to reduce latency when their data is first accessed after a spin-down. There were also suggestions of hacks to keep only a few devices actively powered for writes, i.e. adding a mirror to a pool when its free space is about to run out, so new writes go only to a couple of new disks - effectively making the pool a growing concat device and losing the benefits of parallel reads/writes across all disks at once.

There were many answers and ideas to digest, but the questions I have remaining are:

1) What is the real situation now? Are such solutions still home-made hacks or commercial-only offerings, or have they been integrated into commonly and freely available OpenSolaris source code and binaries?

2) Can the same SSD (single or a mirrored pair) be used for both read and write caching/logging, i.e. L2ARC and ZIL? Is that going to be efficient at all? Should their sizes be preallocated (i.e. as partitions on the SSD), or can both L2ARC and ZIL use all of the free space on a shared SSD?

3) For a real-life situation: say I'm going to watch a movie off this home NAS over CIFS or via a local X Windows session, and the movie's file is small enough to fit in the ARC (RAM) or L2ARC (SSD). Can I set up the system (using freely available software) so that the idle drives of the pool spin up, read the whole movie file into a cache, and spin down - and for the 2 hours the movie plays, these drives don't rotate at all, and only the cache devices, RAM and CPU consume power? In the opposite situation, is it possible to upload a few files to such a pool so that they fit into the single (or mirrored) active non-volatile write-cache device, and the larger drive sets don't spin up at all until the write cache becomes full and needs to spill over to disk? Would such scenarios require special hacks and scripts, or do they already work as I envisioned above - out of the box? What is the typical overhead noted by home-NAS ZFS enthusiasts? I.e. for a 4GB movie to be prefetched and watched from cache, how large should the cache device be?

4) For a cheap and not blazing-fast home-user solution, the expensive SSDs (for L2ARC and/or ZIL roles, with spun-down large disks waiting for occasional rare requests) can consume half the monetary budget for the server. Can SSDs be replaced by commodity USB/CF flash devices, or by dedicated spinning rust - with a single/mirrored spindle consuming power instead of the whole dozen?

5) Some threads mentioned hierarchical storage management (HSM), such as SAMFS/QFS, as a means to keep recently-requested/written data on a few active devices, later destage it to rarely-spun drives emulating a tape array, and present the whole lot as a single POSIX filesystem. Is any of SAMFS/QFS (or a similar solution) available for free? Is it needed in my case, or does the current ZFS implementation with HDDs+L2ARC+ZIL already cover this aspect of HSM? If not, can a ZFS pool with multiple datasets be created inside an HSM volume, so that I have the flexibility of ZFS and the offline-storage capabilities of HSM?

--
Thanks for any replies, including statements that my ideas are insane or my views are outdated ;) But constructive ones are more appreciated ;)
//Jim
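P.S. For concreteness on question 2: with stock ZFS, L2ARC and ZIL are separate vdev types, so sharing one SSD (or a pair) between both roles means preallocating slices. A sketch with made-up device names - s0 as a small log slice, s1 as the cache slice:

    # the log can be mirrored; cache devices cannot be
    # (a lost L2ARC is simply re-warmed from the pool)
    zpool add tank log mirror c5t0d0s0 c5t1d0s0
    zpool add tank cache c5t0d0s1 c5t1d0s1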
Jim

You've been able to spin down drives since about Solaris 8.

http://www.sun.com/bigadmin/features/articles/disk_power_saving.jsp
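On Solaris that boils down to an /etc/power.conf entry plus a run of pmconfig. A rough sketch (the device path is just an example; see power.conf(4) for the exact forms accepted):

    # /etc/power.conf excerpt - spin the named drive down after 30 minutes idle
    # (substitute the real /dev/dsk path, or /devices physical path, of your disk)
    device-thresholds       /dev/dsk/c1t2d0         30m
    # then activate the new settings:
    # pmconfig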
David Dyer-Bennet
2009-Nov-04 21:50 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives
On Wed, November 4, 2009 15:36, Trevor Pretty wrote:
> You've been able to spin down drives since about Solaris 8.

And thanks for the link to the article.

The article specifies SAS and SCSI a lot; does this also apply to SATA?

Will anything in serving a ZFS filesystem out via the in-kernel CIFS server have a hissy fit at the spin-up time if the disk is down when a request comes in?

With 6 drives spinning - but two of them being a mirrored root pool, and they do advise pretty strongly against spinning down a boot disk - that leaves 4 drives I might be able to spin down. With only one serious user (maybe three others using the pool for backups now and then), it does seem like I could save quite a bit of spin time on the disks, and some power, by applying some power management.

Is the lifetime of the disks going to be shortened by a lot of extra spin up/down cycles these days? "A lot" meaning a dozen a day or something? How much? (Because I rather anticipate replacing them to upgrade the size well before they're three years old.)

Has anybody done this? Is it going to be complicated and confusing, or very simple? Will I encounter anything more annoying than my Windows box hanging when it accesses a file on a spun-down disk until the disk can spin up?

--
David Dyer-Bennet, dd-b at dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
Orvar Korvar
2009-Nov-04 22:51 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives
I read about some guy who shut off his RAID when he didn't use it. He had a large system disk he used for temporary storage, so he copied everything to the temp storage and immediately shut down the RAID.
Thanks for the link, but the main concern with spinning down drives of a ZFS pool is that ZFS by default is not so idle. Every 5 to 30 seconds it closes a transaction group (TXG), which requires a synchronous write of metadata to disk.

I mentioned reading many blogs/forums on the matter, and some did mention hacks, from tweaking kernel code to crafting various workarounds, which all have the effect of limiting regular access to a few devices of the pool instead of all of them. Part of my question was whether such solutions are public domain and part of common OpenSolaris now?

But thanks anyway :)
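For reference, the TXG close interval itself is tunable through /etc/system - a sketch, with the caveat that the variable has been renamed between builds (txg_time in older code, zfs_txg_timeout in current OpenSolaris), so check your build's source:

    # /etc/system sketch - stretch the TXG sync interval to 30 seconds
    # (OpenSolaris-era variable name; older releases used txg_time)
    set zfs:zfs_txg_timeout = 30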
Marion Hakanson
2009-Nov-05 02:02 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives
jimklimov at cos.ru said:
> Thanks for the link, but the main concern in spinning down drives of a ZFS
> pool is that ZFS by default is not so idle. Every 5 to 30 seconds it closes
> a transaction group (TXG) which requires a synchronous write of metadata to
> disk.

You know, it's just going to depend on your usage. On my home machine (Solaris 10U6 with U8-level patches), the drives are set to spin down after 30 minutes of idle time. I'm not certain if the root pool spins down, but the drives in the 2nd mirrored pool do spin down. This pool contains my Solaris home directory and the Samba-connected datasets for backups of other computers.

It is true that I have to make sure Thunderbird and Firefox are not running in order to idle the home directory. Then the drives spin down and seem to stay that way until I wake up the display by moving the mouse or touching the keyboard. They will also spin up when a nightly backup kicks off on one of the other systems, or if I SSH in from work to check something.

I don't do anything special other than stopping Thunderbird and Firefox when I leave the computer. I just select "Lock Screen" from the Gnome Launch menu, the screen-lock window pops up, and the display goes into power-save mode shortly after. I don't think there's anything magic about ZFS with regard to keeping the drives busy.

The fancy power-saving stuff was done by GreenBytes: they modified ZFS to do the metadata updates onto flash-based SSDs, separate from the rest of the usual pool drives. That way things like ZIL activity did not have to spin up a large number of data drives just to make small metadata updates, etc.

Regards,

Marion
Jim Klimov
2009-Nov-05 08:42 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives; +source links
That's one way to do it, especially on "legacy systems", but I'm going to employ the much-hyped innovative ZFS, in hopes that now, or a few builds from now, it would automagically and transparently solve the problems I posed above - all in a single logical pool of non-equivalent devices: some are large and dormant, and very few are actively consuming power as a cache to buffer inputs and outputs.

I see now that my post yesterday was a bit speculative, and I'd like to back it with a few links to other posts which discuss the possible in-house and commercial solutions, and general ideas on this matter.

1) [http://www.c0t0d0s0.org/archives/4769-MAID,-ZFS-and-some-further-thoughts-....html] "MAID, ZFS and some further thoughts ..."

This post on c0t0d0s0.org from over a year ago suggests using an array of similar disks (i.e. the 48-disk Thumper) with a few of them in a different role than the others. Specifically, Joerg Moellenkamp ponders the use of 4 disks as an L2ARC and 2 disks as a ZIL, with the other 42 disks serving as a large store of rarely-accessed data, with disks spun down most of the time. He also goes on to suggest adding disks (mirrors) to the active pool just before existing space is depleted, using the pool as a concat and only requiring the spin-up of one or two disks to access any given block of data. That is, of the 42 "data" disks only 2 are initially added to the pool as data storage devices; the other 40 are only plugged into the hardware box, but remain unused and unreferenced until the time comes. This fits the term "MAID" - Massive Array of Idle Disks.

This c0t0d0s0 post, as well as numerous blogs from Brendan Gregg (i.e. [http://blogs.sun.com/brendan/entry/slog_screenshots]) and the 7000 Storage engineers, imply that we can use the logging devices (sized adequately to the working set) to keep I/Os from reaching the majority of array spindles. If these log devices are SSDs, this raises IOPS by orders of magnitude. I guess if they are plain disks, it saves power. The ideas in the c0t0d0s0 post seem to have given birth to "CR 67422636 - Disk activation base", as written below that post.

Comments on the post discuss expectations of disk life when spinning up and down all the time, but this seems to boil down to the fact that systems such as a Thumper (and most home NASes) use consumer-grade drives, which are expected to work 9/5 and possibly spin down during the workday too. The laptop industry has also contributed a lot to extending drive reliability and lowering power consumption to single watts per disk. Besides, with lower spindle speeds (5400 or perhaps even 4200 RPM), the power draw and mechanical shock of spinning up should not be as bad as for 15kRPM disks. And with I/Os mostly going to cache devices, the other disks would rarely need to spin up at all.

As mocked in one of the posts about throttling spindle speeds like CPUs do: as far as I understand, WD recently produced some 7200/5400 RPM Green drives, and perhaps this strange RPM marking means just that - throttling as a balance between average power consumption and peak performance? (This is just my assumption; I have not yet read up on these devices much.)

And then Tobias Exner of "www.eoipso.com" hints that they implemented some userland tricks (services, GUI) - without changing the ZFS allocator or other kernel internals - to spin down hard disk devices. His hints are vague, and I see that the solution may be tailored (and limited) to SAM, with the spun-down zpools being imported on the fly as if plugging a cartridge into a tape drive.
And my Deutsch is too weak to find further details on their site (quickly, at least). In particular, I don't know whether this solution is freely available, whether it suits a home user, or whether SAM is required.

2) Another year-old post by "Stephen" on Foskett's blog [http://blog.fosketts.net/2008/09/15/greenbytes-embraces-extends-zfs/] points to a Rhode Island company, GreenBytes, who use commodity storage hardware like Thumpers and a tweaked "ZFS+" (GBFS) to implement MAID, as well as their own deduplication. The Register also follows up with details: [http://www.channelregister.co.uk/2009/09/16/greenbytes_gb_x/] To quote:

> One element of the greenByte story is the way in which they have
> tweaked ZFS to allow disks to spin down. They limit the metadata
> updates to just a few disks, so the others can be idled when no
> access to them is made. The company suggests scheduling this
> for off hours to minimize latency as drives are brought back
> online, an approach that is less than optimal from an energy
> perspective but demonstrates that they understand just how
> difficult this problem is to crack.
> The core is there, however: They have integrated the data
> protection and storage management elements to enable
> spin-down to be practical.

This seems to be a commercial HW/SW solution, so we wonder whether their changes to the ZFS code are published? Others asked too, recently: [http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033432.html]

3) A discussion, "ZFS RAIDZ and atacontrol spindown settings", on the FreeBSD ZFS list [http://www.mail-archive.com/freebsd-stable at freebsd.org/msg104974.html] produces a simple hack (a script) to spin up multiple drives of a ZFS array in parallel when a request comes. While it was not directly stated, I infer that the OP's system did spin down the pool's drives for long periods, and only spun them up upon access initiated by him (i.e. not every 5-30 seconds of metadata spooling, as I expected earlier). Maybe that's a different implementation in BSD?

--

To summarize, these and similar posts lead me to think that:

1) ZFS vdev components can be spun down until the user initiates access to them.

2) There is a problem with spinning them up (by default ZFS accesses one drive after another, and it can take over half a minute to spin up a 4-drive RAIDZ), and there are simple hacks to solve that. I don't know if there's any more elegant solution yet, though.

3) "Log/cache devices" can be used to buffer and reliably store read and write I/Os from the OS to the ZFS pool, and keep these I/Os from reaching the backend vdevs every time. With proper sizing of the log devices, this should allow spinning the backend disks down for long stretches.

4) At least a year ago it took a kernel hack to ZFS to limit metadata update writes, so that they don't reach all hard disks regularly, requiring a spin-up every few minutes/seconds.

5) Due to their limited number of write-erase cycles, it is not recommended to use commodity USB/CF flash memory drives for log/cache devices, but they can be used as boot devices containing the OS image (RAM-disk based execution preferred). Commercial SSDs, which take care of spreading I/Os evenly and detecting flash errors, are suggested for log/cache devices. Spinning hard disks can also be used as log/cache devices.

At least some builds ago there was a problem that log devices could only be added to a ZFS pool, and the removal/failure of such a device caused pool corruption and required that the pool be remade.
This seems to not be the case now with current OpenSolaris builds, correct?

---

Now back to my original questions - are all the goodies in the OpenSolaris source code repository, and ready for "production" home use? The point is in having a single logical zpool, with all the spin-up/down, log, cache, etc. happening in the background with no user intervention or knowledge during everyday run-time usage (unlike, for example, that guy who copied data from his RAID to temporary system storage and shut down the RAID - all manually).

Can I simply make a zpool containing, for example, 2 SSDs (a mirror) as *both* L2ARC and ZIL, and 8 drives in two 4-disk RAIDZ1 vdevs, and have these drives not spinning most of the time - with I/Os (including metadata) buffering on the SSDs? Would (a single) access to the backend pool spin up all 8 drives of the pool, or only the 4 drives of the given vdev?

Can this work with the current freely downloadable OpenSolaris, without SAM or other commercial solutions? Creative scripting is not a major problem, although centrally supported and tested solutions are preferred :)

Hope my questions make a bit more sense now. Thanks,
//Jim
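P.S. In stock zpool syntax the layout I mean would look something like this (made-up device names; note that the log can be mirrored while cache devices cannot be):

    # two 4-disk RAIDZ1 backend vdevs, plus two SSDs split into
    # a small mirrored log slice (s0) and a cache slice (s1) each
    zpool create tank \
        raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
        raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0 \
        log mirror c3t0d0s0 c3t1d0s0 \
        cache c3t0d0s1 c3t1d0s1

Whether this actually keeps the backend disks spun down is exactly the open question.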
Hello Marion, and thank you for a real-life example from Solaris :)

In particular, it seems to answer one of my puzzles - a mounted ZFS pool per se does not require regular writes to disk, contrary to what I was informed of and wrote (asked) in my other posts.

> The drives are set to spin down after 30 minutes of idle time

To be certain: did you configure that with the regular HDD power-management setup of the OS, with no special steps to take care of ZFS/zpools?

I guess you didn't port and use the time-slider (auto-snapshot) service, though? ;) That's part of the regular metadata updates which I hope should go to a log device, especially when it's a "zero-sized" snapshot with no recent writes to the dataset.

Thanks,
//Jim
Cindy Swearingen
2009-Nov-05 17:02 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives
The zfs/power management CR and friends are filed. See this CR:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6794603
fs/pm - storage power management, phase I

Cindy
Richard Elling
2009-Nov-06 18:38 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives
On Nov 4, 2009, at 1:50 PM, David Dyer-Bennet wrote:
> On Wed, November 4, 2009 15:36, Trevor Pretty wrote:
>> You've been able to spin down drives since about Solaris 8.

Wrong! It was available in Solaris 2.6 and, IIRC, as a patch or unbundled product for 2.5.1 :-)

> And thanks for the link to the article.
>
> The article specifies SAS and SCSI a lot; does this also apply to SATA?

It depends on the disk, not so much the transport.

> Will anything in serving a ZFS filesystem out via in-kernel CIFS have a
> hissy fit at the spin-up time if the disk is down when a request comes in?

By default, generally no. Disks will need to spin up, and that takes some amount of time that can be longer than the timeouts higher in the software stacks. If you design this into your environment, be aware that the disks may be asked to start sequentially, so if you use many disks, you might need to increase the timeouts to cover the worst case.

> With 6 drives spinning (but two of them are a mirrored root pool, and they
> do advise pretty strongly against spinning down a boot disk), 4 drives I
> might be able to spin down, and only one serious user (maybe three others
> using the pool for backups now and then), it does seem like I could save
> quite a bit of spin time on the disks, and some power, by applying some
> power management.
>
> Is lifetime of disks going to be shortened by a lot of extra spin up/down
> cycles these days? "A lot" meaning a dozen a day or something? How much?
> (Because I rather anticipate replacing them to upgrade the size well
> before they're three years old.)

Yes. I dunno how much, because that info is probably not publicly available. If you read disk datasheets, you will notice that disks targeting the laptop market have a reliability spec related to the number of head landings. In some cases, disk manufacturers have special mechanical structures to deal with the head-landing problem. The enterprise disk market does not tend to implement power management the same way as the laptop market, so I've never seen references to the landing problem on enterprise disks.

> Has anybody done this? Is this going to be complicated and confusing, or
> very simple? Will I encounter anything more annoying than my windows box
> hanging when it accesses a file on a spun-down disk until the disk can
> spin up?

I've been doing this for 15 years or so. You can notice the pause, and if you are near the server, you can hear the disks spin up. But here in Southern California, where the electricity market is deregulated and there is only one supplier, we never know what the price of electricity will be, so we tend to err on the side of not using electric utilities when possible.
-- richard
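If you need to stretch those timeouts on Solaris, the knob is most likely the target driver's per-command timeout. A hedged sketch via /etc/system (sd_io_time defaults to 60 seconds; verify the variable for your configuration, sd vs. ssd):

    # /etc/system sketch - let sequential spin-up of a wide vdev fit
    # inside a single I/O timeout
    set sd:sd_io_time = 120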
Marion Hakanson
2009-Nov-06 19:00 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives
jimklimov at cos.ru said:
> To make me certain, did you configure that with regular HDD power-management
> setup of the OS, with no special steps to take care of ZFS/ZPOOL?

That's correct. ZFS knows nothing of it. Since I have an older PC, the SATA drives are controlled by the "ata" driver, so power.conf (or the GUI "dtpower" interface) does not control them. I made the relevant settings in the /platform/i86pc/kernel/drv/ata.conf file ("standby=...").

> I guess you didn't port and use the time-slider (auto-snapshots) service,
> though? ;)

Also correct, no automatic snapshots.

Regards,

Marion
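For the archives, that entry looks roughly like this - from memory, so verify against the comments shipped in your ata.conf; the value should be the idle time in seconds before spin-down:

    # /platform/i86pc/kernel/drv/ata.conf excerpt (sketch)
    # 1800 seconds = the 30-minute idle threshold described above
    standby=1800;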
> SATA drives are controlled by the "ata" driver,

Go into your BIOS and turn on AHCI (native SATA) support.

Rob
Marion Hakanson
2009-Nov-06 20:14 UTC
[zfs-discuss] (home NAS) zfs and spinning down of drives
Rob at Logan.com said:
> I said:
>> SATA drives are controlled by the "ata" driver,
> Go into your BIOS and turn on AHCI (native SATA) support.

Been down that road years ago. The BIOS has "legacy IDE" and "native SATA" settings, but this is an Intel ICH5 chipset, not supported by any of the Solaris 10 SATA drivers.

Regards,

Marion
On Nov 4, 2009, at 6:02 PM, Jim Klimov wrote:
> Thanks for the link, but the main concern in spinning down drives of a ZFS pool
> is that ZFS by default is not so idle. Every 5 to 30 seconds it closes a transaction
> group (TXG) which requires a synchronous write of metadata to disk.

I'm running FreeBSD 7.2 with ZFS and have my data drives set to spin down when idle. They don't get spun back up all the time like you fear; only when there is a userland access.

As mentioned elsewhere, the disks spin up sequentially, which is a PITA. I had a little script on my previous Linux box that spun up all the disks 0.5 seconds apart, but I can't figure out how to get FreeBSD to tell me the current state of the disks. I don't know if the FreeBSD folks made any mods to ZFS for this, but I kinda doubt it.

I don't have anything spiffy like prefetching to a cache disk, etc. When I am streaming music to a Squeezebox, all my disks are spinning. I have thought about hacking SqueezeCenter to copy to a temp disk, but the win is pretty minimal; by the time they spin down I'll want them back to choose the next album.

danno
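The spin-up part itself is trivial - something like this (made-up device names; a tiny read is just a portable way to force each drive up). What I can't replicate is querying which disks are already spun down:

    #!/bin/sh
    # force all pool members up in near-parallel, one small read each
    for d in /dev/ad4 /dev/ad6 /dev/ad8 /dev/ad10; do
        dd if="$d" of=/dev/null bs=512 count=1 &
        sleep 0.5    # the stagger my old Linux script used
    done
    wait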
Jim sez:
> Like many others, I've come close to making a home NAS server based on
> ZFS and OpenSolaris. While this is not an enterprise solution with high IOPS
> expectation, but rather a low-power system for storing everything I have,
> I plan on cramming in some 6-10 5400RPM "Green" drives with low wattage
> and high capacity, and possibly an SSD or two (or one-two spinning disks)
> for Read/Write caching/logging.

Hey! Me too! I'm up to buying new hardware to make it run.

Having read through the thread, I wonder if the best solution might not be to make a minimal NAS-only box with a mirrored pair (or pairs) of drives for the daily updates, and spin this off at intervals via cron jobs or some such to long(er)-term and safer storage on a second system that's the main raidz repository - see the sketch below.

Sure, it's more elegant to have the momentary cache and the safe repository on the same set of hardware, but for another $200 one can get a second whole system to work as the cache and take all the on/off cycles, then power on the main backing-store system only when something from deep-freeze storage is needed, keeping the recent working set on the cache system. This lets you schedule the operations of the deep-freeze backing storage (for cheap electricity) while keeping its disks mostly off, minimizing power cycles on the disks down to as little as 1/day.

Elegance is nice, but there are some places where more hardware can take its place more quickly. Can you tell I'm at heart a hardware guy? 8-)
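The "spin it off at intervals" part could be a cron-driven snapshot-and-send, something like this (invented pool/host names; the very first run needs a full rather than incremental send):

    #!/bin/sh
    # nightly: snapshot the cache box, send the day's increment to the backing box
    PREV=$(zfs list -H -t snapshot -o name -s creation -r cache/data | tail -1)
    NOW="cache/data@$(date +%Y%m%d-%H%M)"
    zfs snapshot "$NOW"
    zfs send -i "$PREV" "$NOW" | ssh backingbox zfs recv -F deep/data

Schedule it from cron for whatever hour the electricity is cheapest.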
Hello!

I'm currently using an X2200 with an LSI HBA connected to a Supermicro JBOD chassis; however, I want to have more redundancy in the JBOD. So I have looked into the market, and into the wallet, and I think that the Sun J4400 suits my goals nicely. However, I have some concerns, and if anyone can give some suggestions I would truly appreciate it. And now for my questions:

* Will I be able to achieve multipath support if I connect the J4400 to 2 LSI HBAs in one server, with SATA disks, or is this only possible with SAS disks? This server will have OpenSolaris (any release, I think).
* The CAM (StorageTek Common Array Manager) is only for hardware management of the JBOD, leaving disk/volume/zpool/LUN/whatever_name management up to the server operating system, correct?
* Can I put some readzillas/writezillas in the J4400 along with SATA disks, and if so will I have any benefit, or should I place those *zillas directly into the server's disk tray?
* Does anyone have experience with these JBODs? If so, are they in general solid/reliable?
* The server will probably be a Sun x44xx series with 32GB RAM, but for the best possible performance, should I invest in more and more spindles, or a couple fewer spindles plus some readzillas? This system will be mainly used to export some volumes over iSCSI to a Windows 2003 file server, and to hold some NFS shares.

Thank you for all your time,
Bruno
Bruno -

Sorry, I don't have experience with OpenSolaris, but I *do* have experience running a J4400 with Solaris 10u8.

First off, you need an LSI HBA for the multipath support. It won't work with any others as far as I know.

I ran into problems with the multipath support because it wouldn't allow me to manage the disks with cfgadm, and it got very confused when I'd do something as silly as replace a disk, causing the disk's GUID (and therefore its address under the virtual multipath controller) to change. My take-away was that Solaris 10u8 multipath support is not ready for production environments, as there are limited-to-no administration tools. This may have been fixed in recent builds of Nevada. (See a thread that started around 03Nov09 for my experiences with MPxIO.)

At the moment, I have the J4400 split between the two controllers, and simply have even-numbered disks on one and odd-numbered disks on the other. Both controllers can *see* all the disks.

You are correct about the CAM software. It also updates the firmware, though, since us commoners don't seemingly have access to the serial management ports on the J4400.

I can't speak to locating the drives - that would be something you'd have to test. I have found increases in performance on my faster and more random array; others have found exactly the opposite.

My configuration is as follows:

x4250
- rpool - 2x 146GB 10k SAS
- 'hot' pool - 10x 300GB 10k SAS + 2x 32GB ZIL
j4400
- 'cold' pool - 12x 1TB 7200RPM SATA ... testing adding 2x 146GB SAS in the x4250, but haven't benchmarked yet.

Performance on the J4400 was disappointing with just one controller to 12 disks in one RAIDZ2 and no ZIL. However, I do not know if the bottleneck was at the disk, controller, backplane, or software level... I'm too close to my deadline to do much besides randomly shotgunning different configs to see what works best!

-K

Karl Katzke
Systems Analyst II
TAMU - RGS
This is an interesting discussion. It appears that there is indeed some work to be done with manipulating spin up/down on subsections of an array, etc.

However, in terms of cost/performance for small systems, it may be simpler to solve this with less programming and more hardware. The cost of a smallish ZFS array is approximately $400 (motherboard, CPU, memory), and possibly less, plus the cost of the array storage devices and about one disk controller (perhaps $100) for each set of eight disks beyond the ones the motherboard can handle by itself.

Instead of dealing with the complexities of spinning up partial arrays, it might make sense to make the smallest, lightest, cheapest possible mini-array that can take care of most file needs, and have a larger ZFS backing array that's entirely powered off most of the time. This amounts to extending the cache disk into a cache-array system. It would simplify the task of updating caches into a main backing store.

It's certainly not simple to verify function on something like this. Having done some OS releases in a different life, just the testing to verify that complex file system operations were in fact operating properly, instead of 98% properly, was a massive chore.
Karl

Don't you just use stmsboot?

http://docs.sun.com/source/820-3223-14/SASMultipath.html#50511899_pgfId-1046940

Bruno

Next week I'm playing with an M3000 and a J4200 in the local NZ distributor's lab. I had planned to just use the latest version of S10, but if I get the time I might play with OpenSolaris as well; I don't think there is anything radically different between the two here.

From what I've read in preparation (and I stand to be corrected):

> * Will I be able to achieve multipath support, if I connect the J4400
>   to 2 LSI HBAs in one server, with SATA disks, or is this only
>   possible with SAS disks?

Disk type does not matter (see link above).

> * The CAM (StorageTek Common Array Manager) is only for hardware
>   management of the JBOD, leaving disk/volume/zpool/LUN management up
>   to the server operating system, correct?

That is my understanding; see: http://docs.sun.com/source/820-3765-11/

> * Can I put some readzillas/writezillas in the J4400 along with SATA
>   disks, and if so will I have any benefit?

On the Unified Storage products they go in both: Readzillas in the server, Logzillas in the J4400. This is quite logical: if you want to move the array between hosts, all the data needs to be in the array, while read data can always be re-created, so the closer to the CPU the better. See: http://catalog.sun.com/

> * Does anyone have experience with those JBODs? If so, are they in
>   general solid/reliable?

No. But get a support contract!

> * Should I invest in more and more spindles, or a couple fewer
>   spindles plus some readzillas?

Check Brendan Gregg's blogs; *I think* he has done some work here, from memory.
James C. McPherson
2009-Nov-30 02:21 UTC
[zfs-discuss] Opensolaris with J4400 - Experiences
Trevor Pretty wrote:
> Don't you just use stmsboot?
>
> http://docs.sun.com/source/820-3223-14/SASMultipath.html#50511899_pgfId-1046940
>
>> * Will I be able to achieve multipath support, if I connect the J4400
>> to 2 LSI HBAs in one server, with SATA disks, or is this only possible
>> with SAS disks?
>
> Disk type does not matter (see link above).

Yes - run /usr/sbin/stmsboot -e

That's all you need to do. MPxIO will enumerate as many disks as it is able to.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
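For completeness, the usual invocations (see stmsboot(1M)):

    /usr/sbin/stmsboot -e    # enable MPxIO; prompts for the reboot it requires
    /usr/sbin/stmsboot -L    # after reboot: map old device names to new scsi_vhci names
    /usr/sbin/stmsboot -d    # back out to non-multipathed names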
Well, yes, to enable MPxIO. Which then requires a reboot. What about hot-swapping out a disk that was in a ZFS pool? Without rebooting?

This was the answer I got early in November. Short version: it's fixed in snv_126, but not in 10u8 - which is what I cautioned against.

http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033496.html

-K

Karl Katzke
Systems Analyst II
TAMU - RGS
Hi Karl,

Thank you for all your input; I will keep this list updated about this "project".

Regards,
Bruno
OK. Today I played with a J4400 connected to a Txxx server running S10 10/09.

First off: read the release notes. I spent about 4 hours pulling my hair out because I could not get stmsboot to work, until we read in the release notes that 500GB SATA drives do not work!!!

Initial setup:
A pair of dual-port SAS controllers (c4 and c5).
A J4400 with 6x 1TB SATA disks.
The J4400 had two controllers, and these were connected to one SAS card (physical controller c4).

Test 1: First a reboot -- -r. format shows 12 disks on c4 (each disk having two paths). If you picked the same disk via both paths, ZFS stopped you doing stupid things by knowing the disk was already in use.

Test 2: Run stmsboot -e. format now shows six disks on controller c6, a new "virtual controller". The two internal disks are also now on c6, and stmsboot has done the right stuff with the rpool, so I would guess you could multipath at a later date if you don't want to at first, but I did not test this. stmsboot -L only showed the two internal disks, not the six in the J4400 - strange, but we pressed on.

Test 3: I created a zpool (two disks, mirrored) using two of the new devices on c6 and created some I/O load. I then unplugged one of the cables from the SAS card (physical c4).
Result: Nothing - everything just keeps working. Cool stuff!

Test 4: I plugged the unplugged cable into the other controller (physical c5).
Result: Nothing - everything just keeps working. Cool stuff!

Test 5: Being bold, I then unplugged the remaining cable from the physical c4 controller.
Result: Nothing - everything just keeps working. Cool stuff! So I had gone from dual-pathed on a single controller (c4) to single-pathed on a different controller (c5).

Test 6: I added the other four drives to the zpool (plain old ZFS stuff - a bit boring).

Test 7: I plugged in four more disks.
Result: Their multipathed devices just showed up in format; I added them to the pool, and also added them as spares, all while the I/O load was happening. No noticeable stops or glitches.

Conclusion: If you RTFM first, then stmsboot does everything it is documented to do. You don't need to play with cfgadm or anything like that, just as I said originally, earlier in the thread. The multipathing stuff is easy to set up, and even a very rusty admin like me found it very easy.

Note: There may be patches for the 500GB SATA disks, I don't know; fortunately that's not what I've sold - phew!!

TTFN

Trevor
On 11/25/2009 at 11:13 AM, in message <4B0D65D6.4020009 at epinfante.com>, Bruno Sousa <bsousa at epinfante.com> wrote:

Hello!

I'm currently using an X2200 with an LSI HBA connected to a Supermicro JBOD chassis, but I want more redundancy in the JBOD. So I have looked into the market, and into the wallet, and I think the Sun J4400 suits my goals nicely. However, I have some concerns, and if anyone can give some suggestions I would truly appreciate it.

And now for my questions:

* Will I be able to achieve multipath support if I connect the J4400 to 2 LSI HBAs in one server, with SATA disks, or is this only possible with SAS disks? This server will have OpenSolaris (any release, I think).

* The CAM (StorageTek Common Array Manager) is only for hardware management of the JBOD, leaving disk/volume/zpool/LUN management up to the server operating system, correct?

* Can I put some Readzillas/Logzillas in the J4400 along with the SATA disks, and if so will I have any benefit, or should I place those *zillas directly into the server's disk tray?

* Does anyone have experience with those JBODs? If so, are they in general solid/reliable?

* The server will probably be a Sun x44xx series with 32GB of RAM, but for the best possible performance should I invest in more and more spindles, or a couple fewer spindles plus some Readzillas? This system will be mainly used to export some volumes over iSCSI to a Windows 2003 file server, and to hold some NFS shares.

Thank you for all your time,
Bruno
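For the iSCSI and NFS serving Bruno describes, the ZFS side of it is just a couple of properties. A minimal sketch, assuming a pool named "tank" (the dataset names are invented, and shareiscsi relies on the older iscsitgt target - newer OpenSolaris builds would use COMSTAR instead):

    # a zvol exported over iSCSI to the Windows 2003 file server
    zfs create -V 500g tank/w2k3vol
    zfs set shareiscsi=on tank/w2k3vol

    # an NFS share
    zfs create tank/share
    zfs set sharenfs=on tank/share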