[Not sure which list is most appropriate since it's using HAST + ZFS
on -RELEASE, -STABLE, and -CURRENT.  Feel free to trim the CC: on
replies.]

I'm having a hell of a time making this work on real hardware, and am
not ruling out hardware issues as yet, but wanted to get some
reassurance that someone out there is using this combination (FreeBSD
+ HAST + ZFS) successfully, without kernel panics, without core dumps,
without deadlocks, without issues, etc.  I need to know I'm not
chasing a dead rabbit.

In tests using VirtualBox and FreeBSD 8-STABLE from when HAST was first
MFC'd, everything worked wonderfully.  The HAST-based pool would come
up, data would sync to the slave node, fail-over worked nicely, bringing
the other box back online as the slave worked, data synced back, etc.
It was a thing of beauty.

Now, on real hardware, I cannot get the system to stay online for more
than an hour. :(  hastd causes kernel panics with "bufwrite: buffer not
busy" errors.  ZFS pools get corrupted.  The system deadlocks (no log
messages, no onscreen errors, not even the NumLock key works) at random
points.

The hardware is fairly standard fare:
 - SuperMicro H8DGi-F motherboard
 - AMD Opteron 6100-series CPU (8 cores @ 2.0 GHz)
 - 8 GB DDR3 SDRAM
 - 64 GB Kingston V-Series SSD for the OS install (using ahci(4) and
   the motherboard SATA controller)
 - 3x SuperMicro AOC-USAS2-8Li SATA controllers with IT firmware
 - 6x 1.5 TB Seagate 7200.11 drives (1x raidz2 vdev)
 - 12x 1.0 TB Seagate 7200.12 drives (2x raidz2 vdevs)
 - 6x 0.5 TB WD RE3 drives (1x raidz2 vdev)

The motherboard BIOS is up-to-date.  I do not see any way to update the
firmware on the SATA controllers.  According to the onboard IPMI-based
sensors, the CPU, motherboard, and RAM temperatures and voltages are all
in the nominal range.

I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28
patches, and 9-CURRENT (after the ZFSv28 commit).  Things work well
until I start hastd.  Then either the system locks up, or hastd causes
a kernel panic, or hastd dumps core.

Each hard drive is glabel'd as "disk-a1" through "disk-d6".  hast.conf
has 24 resources listed, one for each glabel'd device.  The pool is
created using the /dev/hast/* devices, with disk-a1 through disk-a6
forming one raidz2 vdev, and so on through disk-b*, disk-c*, and
disk-d*, for a total of 4 raidz2 vdevs of 6 drives each (a rough sketch
is at the end of this message).  A fairly standard setup, I would think.

Even using a GENERIC kernel, I can't keep things stable and running.

So, please, someone, somewhere, share a success story where you're
using FreeBSD, ZFS, and HAST.  Let me know that it does work.  I'm
starting to lose faith in my abilities here. :(

Or point out where I'm doing things wrong so I can correct the issues.

Thanks.

--
Freddie Cash
fjwcash@gmail.com
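P.S. In case it helps, the config boils down to something like the
following.  This is only a sketch: just one of the 24 resources is
shown, and the node names and addresses are placeholders rather than
our real ones.

    resource disk-a1 {
            on hast-node-a {
                    local /dev/label/disk-a1
                    remote 192.168.100.2
            }
            on hast-node-b {
                    local /dev/label/disk-a1
                    remote 192.168.100.1
            }
    }

The pool is then created along these lines (the "c" and "d" vdevs
follow the same pattern, and the pool name here is made up for the
example):

    zpool create storage \
        raidz2 hast/disk-a1 hast/disk-a2 hast/disk-a3 \
               hast/disk-a4 hast/disk-a5 hast/disk-a6 \
        raidz2 hast/disk-b1 hast/disk-b2 hast/disk-b3 \
               hast/disk-b4 hast/disk-b5 hast/disk-b6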
> So, please, someone, somewhere, share a success story where you're
> using FreeBSD, ZFS, and HAST.  Let me know that it does work.  I'm
> starting to lose faith in my abilities here. :(

I ran our main database for the old company using ZFS on top of HAST
without any problems at all.  We had a single HAST disc with a zpool on
top of it, and MySQL on top of that.  It all worked perfectly for us.

I'm not running that currently, as the company went under and we lost
the hardware.  But I'm working for a new business and am about to
deploy the same configuration for the main database, as it's "tried and
tested" as far as I am concerned.  It will be slightly different, in
that I will have a pair of HAST drives and do mirroring over the top
with ZFS.  But I shall report back how well, or not, it works.

-pete.
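P.S. In case a concrete example is useful, the old setup amounted to
roughly the following (the resource and pool names here are made up,
not the ones we actually used):

    # on both nodes: initialise the resource metadata
    # (resource already defined in hast.conf)
    hastctl create db0
    # start hastd on both nodes, then on the master:
    hastctl role primary db0
    zpool create dbpool hast/db0
    zfs create dbpool/mysql

The new deployment will simply mirror two HAST resources instead:

    zpool create dbpool mirror hast/db0 hast/db1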
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
> I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28
> patches, and 9-CURRENT (after the ZFSv28 commit).  Things work well
> until I start hastd.  Then either the system locks up, or hastd causes
> a kernel panic, or hastd dumps core.

The minimum amount of information (as always) would be a backtrace from
the kernel, and also a hastd backtrace when it dumps core.  There is
really decent logging in hast, so I'm also sure it logs something
interesting on the primary or secondary.  Another useful thing would be
to turn on debugging in hast (a single -d option for hastd).

The best you can do is give me the simplest and quickest procedure to
reproduce the issue, e.g. configure two hast resources, put a ZFS
mirror on top, start rsyncing /usr/src to the file system on top of
hast, and switch roles.  The simpler the better.

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com
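P.S. Something along these lines should get the backtraces (the paths
are the usual defaults, adjust as needed for your install):

    # in /etc/rc.conf, so a crash dump is saved on the next panic
    dumpdev="AUTO"

    # after the panic and reboot, open the dump and get the backtrace
    kgdb /boot/kernel/kernel /var/crash/vmcore.0
    (kgdb) bt

    # for the hastd core dump
    gdb /sbin/hastd hastd.core
    (gdb) bt

    # run hastd in the foreground with debugging enabled
    hastd -dF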
Hi,

2011/3/24 Freddie Cash <fjwcash@gmail.com>:
> The hardware is fairly standard fare:
>  - SuperMicro H8DGi-F motherboard
>  - AMD Opteron 6100-series CPU (8 cores @ 2.0 GHz)
>  - 8 GB DDR3 SDRAM
>  - 64 GB Kingston V-Series SSD for the OS install (using ahci(4) and
>    the motherboard SATA controller)
>  - 3x SuperMicro AOC-USAS2-8Li SATA controllers with IT firmware
>  - 6x 1.5 TB Seagate 7200.11 drives (1x raidz2 vdev)
>  - 12x 1.0 TB Seagate 7200.12 drives (2x raidz2 vdevs)
>  - 6x 0.5 TB WD RE3 drives (1x raidz2 vdev)

Just for info, Sun recommends 1 GB of RAM per TB of data.  I see
roughly 16 TB of usable space here, so I would recommend 16 GB for the
ARC size and 24 or 32 GB of RAM for the host.
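For example, once the extra RAM is in, you could cap the ARC with
something like this in /boot/loader.conf (the value is only an example;
size it to whatever RAM you end up installing):

    vfs.zfs.arc_max="16G"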
> The other 5% of the time, the hastd crashes occurred either when
> importing the ZFS pool, or when running multiple parallel rsyncs to
> the pool.  hastd was always shown as the last running process in the
> backtrace onscreen.

This is what I am seeing - did you manage to reproduce this with the
patch, or does it fix the issue for you?

I'm doing more testing now, with only a single HAST device, to see if
it is stable.  I'm OK to run without mirroring across HAST devices for
now, but I wouldn't like to do so long term!

-pete.
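P.S. For reference, the kind of load that triggers it for me is roughly
this (the destination path and number of copies are just examples):

    #!/bin/sh
    # kick off several rsyncs into the HAST-backed pool in parallel
    for i in 1 2 3 4; do
        rsync -a /usr/src/ /pool/src-copy$i/ &
    done
    wait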
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
> [Not sure which list is most appropriate since it's using HAST + ZFS
> on -RELEASE, -STABLE, and -CURRENT.  Feel free to trim the CC: on
> replies.]
>
> I'm having a hell of a time making this work on real hardware, and am
> not ruling out hardware issues as yet, but wanted to get some
> reassurance that someone out there is using this combination (FreeBSD
> + HAST + ZFS) successfully, without kernel panics, without core dumps,
> without deadlocks, without issues, etc.  I need to know I'm not
> chasing a dead rabbit.

I just committed a fix for a problem that might look like a deadlock.

With trociny@'s patch and my last fix (to GEOM GATE and hastd), do you
still have any issues?

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com