Ray Van Dolson
2012-Jan-11 02:07 UTC
[zfs-discuss] Unable to allocate dma memory for extra SGL
Hi all;

We have a Solaris 10 U9 x86 instance running on Silicon Mechanics / SuperMicro hardware.

Occasionally under high load (ZFS scrub for example), the box becomes non-responsive (it continues to respond to ping but nothing else works, not even the local console). Our only solution is to hard reset, after which everything comes up normally.

Logs are showing the following:

Jan 8 09:44:08 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
Jan 8 09:44:08 prodsys-dmz-zfs2 MPT SGL mem alloc failed
Jan 8 09:44:08 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
Jan 8 09:44:08 prodsys-dmz-zfs2 Unable to allocate dma memory for extra SGL.
Jan 8 09:44:08 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
Jan 8 09:44:08 prodsys-dmz-zfs2 Unable to allocate dma memory for extra SGL.
Jan 8 09:44:10 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
Jan 8 09:44:10 prodsys-dmz-zfs2 Unable to allocate dma memory for extra SGL.
Jan 8 09:44:10 prodsys-dmz-zfs2 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
Jan 8 09:44:10 prodsys-dmz-zfs2 MPT SGL mem alloc failed
Jan 8 09:44:11 prodsys-dmz-zfs2 rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free

I am able to resolve the last error by adjusting upwards the duplicate request cache sizes, but have been unable to find anything on the MPT SGL errors.

Anyone have any thoughts on what this error might be?

At this point, we are simply going to apply patches to this box (we do see an outstanding mpt patch):

147150 -- < 01 R-- 124 SunOS 5.10_x86: mpt_sas patch
147702 -- < 03 R-- 21 SunOS 5.10_x86: mpt patch

But we have another identically configured box at the same patch level (admittedly with slightly less workload, though it also undergoes monthly ZFS scrubs) which does not experience this issue.

Ray
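To see how the SGL warnings line up with the hangs, it can help to tally them per host and minute. The following is only a minimal sketch, assuming the syslog line format shown in the post; the function name `count_sgl_failures` and the `SAMPLE` text are hypothetical, and on a real Solaris box the input would more likely be read from /var/adm/messages.

```python
import re
from collections import Counter

# The two mpt_sas messages that appear in the posted log excerpt.
SGL_MESSAGES = (
    "MPT SGL mem alloc failed",
    "Unable to allocate dma memory for extra SGL",
)

# Matches "Mon D HH:MM:SS host rest"; the timestamp is captured to the minute.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+(\S+)\s+(.*)$")


def count_sgl_failures(lines):
    """Return a Counter mapping (host, 'Mon D HH:MM') to SGL failure count."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a syslog-style line
        minute, host, rest = m.groups()
        if any(msg in rest for msg in SGL_MESSAGES):
            # Collapse runs of whitespace so "Jan  8" and "Jan 8" compare equal.
            counts[(host, " ".join(minute.split()))] += 1
    return counts


# Hypothetical sample built from lines in the post.
SAMPLE = """\
Jan  8 09:44:08 prodsys-dmz-zfs2 MPT SGL mem alloc failed
Jan  8 09:44:08 prodsys-dmz-zfs2 Unable to allocate dma memory for extra SGL.
Jan  8 09:44:11 prodsys-dmz-zfs2 rpcmod: [ID 851375 kern.warning] WARNING: svc_cots_kdup no slots free
"""

if __name__ == "__main__":
    for (host, minute), n in sorted(count_sgl_failures(SAMPLE.splitlines()).items()):
        print(f"{host} {minute}: {n} SGL allocation failure(s)")
```

Comparing these counts between the failing box and the identically configured one might show whether the healthy system logs the same warnings at a lower rate or not at all.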
Hung-Sheng Tsao (laoTsao)
2012-Jan-11 02:23 UTC
[zfs-discuss] Unable to allocate dma memory for extra SGL
How much RAM does the system have? What is the zpool setup, and what are your HBA and HDD sizes and types?
Ray Van Dolson
2012-Jan-11 02:44 UTC
[zfs-discuss] Unable to allocate dma memory for extra SGL
On Tue, Jan 10, 2012 at 06:23:50PM -0800, Hung-Sheng Tsao (laoTsao) wrote:
> How much RAM does the system have? What is the zpool setup, and what
> are your HBA and HDD sizes and types?

Hmm, actually this system has only 6GB of memory. For some reason I thought it had more.

The controller is an LSISAS2008 (which, oddly enough, does not seem to be recognized by lsiutil).

There are 23x1TB disks (SATA interface, not SAS unfortunately) in the system. Three RAIDZ2 vdevs of seven disks each, plus one spare, make up a single zpool with two ZFS file systems mounted (no deduplication or compression in use).

There are two internally mounted Intel X25-E SSDs; these double as the rootpool and ZIL devices.

There is an 80GB X25-M attached to the expander along with the 1TB drives, operating as L2ARC.

Thanks,
Ray
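As a rough check on the layout described above, the arithmetic can be sketched like this. It assumes each RAIDZ2 vdev gives up two disks' worth of space to parity and ignores metadata and reservation overhead, so the result is approximate; the function name `raidz_data_disks` is hypothetical.

```python
def raidz_data_disks(vdevs, disks_per_vdev, parity):
    """Data-disk equivalents across identical RAIDZ vdevs (parity excluded)."""
    if disks_per_vdev <= parity:
        raise ValueError("each vdev needs more disks than parity devices")
    return vdevs * (disks_per_vdev - parity)


# Figures taken from the post: three 7-disk RAIDZ2 vdevs, one spare, 1TB drives.
VDEVS, DISKS_PER_VDEV, PARITY, SPARES = 3, 7, 2, 1
DISK_TB = 1

data_disks = raidz_data_disks(VDEVS, DISKS_PER_VDEV, PARITY)
pool_disks = VDEVS * DISKS_PER_VDEV + SPARES

print(f"disks accounted for by the layout (incl. spare): {pool_disks}")
print(f"approximate usable capacity: {data_disks * DISK_TB} TB before overhead")
```

The layout as described accounts for 22 of the 23 disks mentioned; the post does not say how the remaining disk is used.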
"Hung-Sheng Tsao (Lao Tsao 老曹) Ph. D."
2012-Jan-11 11:41 UTC
[zfs-discuss] Unable to allocate dma memory for extra SGL
On 1/10/2012 9:44 PM, Ray Van Dolson wrote:
> Hmm, actually this system has only 6GB of memory. For some reason I
> thought it had more.

IMHO, you will need more RAM. Did you cap the ARC in /etc/system?
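For reference, capping the ARC on Solaris 10 is normally done with a `set zfs:zfs_arc_max = <bytes>` line in /etc/system, which takes effect after a reboot. The sketch below only builds such a line from a RAM size; the helper name `arc_max_line` is hypothetical, and the 50% fraction is an illustrative assumption, not a value recommended anywhere in this thread.

```python
def arc_max_line(ram_gib, arc_fraction):
    """Build an /etc/system line capping the ZFS ARC at a fraction of RAM."""
    if not 0 < arc_fraction < 1:
        raise ValueError("arc_fraction must be between 0 and 1")
    arc_bytes = int(ram_gib * 1024**3 * arc_fraction)
    # /etc/system accepts hexadecimal values with a 0x prefix.
    return f"set zfs:zfs_arc_max = 0x{arc_bytes:x}"


if __name__ == "__main__":
    # 6GB is the RAM size reported for this system; 0.5 is an assumed fraction.
    print(arc_max_line(6, 0.5))
```

The right cap depends on what else needs memory on the box, so the fraction should be chosen from observed usage instead of copied from this example.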