We are using MySQL, and love the idea of using zfs for this. We are used to using Direct I/O to bypass file system caching (let the DB do this). Does this exist for zfs?
On Oct 2, 2007, at 1:11 PM, David Runyon wrote:

> We are using MySQL, and love the idea of using zfs for this. We
> are used to using Direct I/O to bypass file system caching (let the
> DB do this). Does this exist for zfs?

Not yet, see:
6429855 Need way to tell ZFS that caching is a lost cause

Is there a specific reason why you need to do the caching at the DB
level instead of the file system? I'm really curious, as I've got
conflicting data on why people do this. If I get more data on real
reasons why we shouldn't cache at the file system, then this could get
bumped up in my priority queue.

eric
1) Modern DBMSs cache database pages in their own buffer pool because
it is less expensive than to access the data from the OS. (IIRC, MySQL's
MyISAM is the only one that relies on the FS cache, but a lot of MySQL
sites use InnoDB, which has its own buffer pool.)

2) Also, direct I/O is faster because it avoids double buffering.

Rayson


On 10/2/07, eric kustarz <eric.kustarz at sun.com> wrote:
> Not yet, see:
> 6429855 Need way to tell ZFS that caching is a lost cause
>
> Is there a specific reason why you need to do the caching at the DB
> level instead of the file system? I'm really curious, as I've got
> conflicting data on why people do this. If I get more data on real
> reasons why we shouldn't cache at the file system, then this could get
> bumped up in my priority queue.
>
> eric
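For what it's worth, this is roughly what the InnoDB side of that looks
like in my.cnf. The numbers are made-up examples, not recommendations:

    [mysqld]
    # InnoDB keeps its own buffer pool, so most of the RAM you want spent
    # on database caching goes here rather than to the FS cache
    innodb_buffer_pool_size = 2048M
    # on filesystems that honor it, this asks InnoDB to open data files
    # with O_DIRECT and skip the OS page cache (no double buffering)
    innodb_flush_method = O_DIRECT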
David Runyon wrote:
> We are using MySQL, and love the idea of using zfs for this. We are
> used to using Direct I/O to bypass file system caching (let the DB do
> this). Does this exist for zfs?

This is a FAQ. See:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#ZFS_and_Database_Recommendations
http://blogs.sun.com/roch/entry/zfs_and_directio
http://blogs.sun.com/bobs/entry/one_i_o_two_i
 -- richard
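To save a trip to the wiki, the gist of the database recommendations
there is roughly along these lines (pool/dataset names are invented;
treat this as a sketch, not gospel):

    # match the dataset recordsize to the database block size *before*
    # creating the data files, e.g. 8K for a typical OLTP block size
    zfs set recordsize=8k tank/db
    # keep logs/redo on their own dataset so their I/O pattern doesn't
    # fight with the data files
    zfs create tank/dblogs
    # and consider capping the ARC (discussed later in this thread) so it
    # doesn't compete with the database's own buffer cache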
On Tue, Oct 02, 2007 at 01:20:24PM -0600, eric kustarz wrote:
>
> On Oct 2, 2007, at 1:11 PM, David Runyon wrote:
>
> > We are using MySQL, and love the idea of using zfs for this. We
> > are used to using Direct I/O to bypass file system caching (let the
> > DB do this). Does this exist for zfs?
>
> Not yet, see:
> 6429855 Need way to tell ZFS that caching is a lost cause
>
> Is there a specific reason why you need to do the caching at the DB
> level instead of the file system? I'm really curious, as I've got
> conflicting data on why people do this. If I get more data on real
> reasons why we shouldn't cache at the file system, then this could get
> bumped up in my priority queue.

At least two reasons:
http://developers.sun.com/solaris/articles/mysql_perf_tune.html#6
http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle
(the first example proves that this issue is not only Oracle-related)

Regards
przemol
--
http://przemol.blogspot.com/
Rayson Ho writes:

 > 1) Modern DBMSs cache database pages in their own buffer pool because
 > it is less expensive than to access the data from the OS. (IIRC, MySQL's
 > MyISAM is the only one that relies on the FS cache, but a lot of MySQL
 > sites use InnoDB, which has its own buffer pool.)

The DB can and should cache data whether or not directio is used.

 > 2) Also, direct I/O is faster because it avoids double buffering.

A piece of data can be in one buffer, 2 buffers, 3 buffers. That says
nothing about performance. More below.

So I guess you mean DIO is faster because it avoids the extra copy:
DMA straight to the user buffer rather than DMA to a kernel buffer
followed by a copy to the user buffer. If an I/O is 5ms, an 8K copy is
about 10 usec. Is avoiding the copy really the most urgent thing to
work on?

 > On 10/2/07, eric kustarz <eric.kustarz at sun.com> wrote:
 > > Is there a specific reason why you need to do the caching at the DB
 > > level instead of the file system? I'm really curious, as I've got
 > > conflicting data on why people do this. If I get more data on real
 > > reasons why we shouldn't cache at the file system, then this could
 > > get bumped up in my priority queue.

I can't answer this, although I can well imagine that the DB is the
most efficient place to cache its own data, all organised and formatted
to respond to queries. But once the DB has signified to the FS that it
doesn't require the FS to cache data, then the benefit from this RFE is
that the memory used to stage the data can be quickly recycled by ZFS
for subsequent operations. It means the ZFS memory footprint is more
likely to contain useful ZFS metadata and not cached data blocks we
know are not likely to be used again anytime soon. We would also behave
better in mixed DIO/non-DIO workloads.

See also:
http://blogs.sun.com/roch/entry/zfs_and_directio

-r
On 10/3/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
> Rayson Ho writes:
>
>  > 1) Modern DBMSs cache database pages in their own buffer pool because
>  > it is less expensive than to access the data from the OS. (IIRC, MySQL's
>  > MyISAM is the only one that relies on the FS cache, but a lot of MySQL
>  > sites use InnoDB, which has its own buffer pool.)
>
> The DB can and should cache data whether or not directio is used.

It does, which leads to the core problem. Why do we have to store the
exact same data twice in memory (i.e., once in the ARC, and once in
the shared memory segment that Oracle uses)?

Due to the lack of direct I/O and kernel asynchronous I/O in ZFS, my
employer has decided to stick with VxFS. I would love nothing more than
to use ZFS with our databases, but unfortunately these missing features
prevent us from doing so. :(

Thanks,
- Ryan
--
UNIX Administrator
http://prefetch.net
Matty writes:

 > On 10/3/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
 > > The DB can and should cache data whether or not directio is used.
 >
 > It does, which leads to the core problem. Why do we have to store the
 > exact same data twice in memory (i.e., once in the ARC, and once in
 > the shared memory segment that Oracle uses)?

We do not retain 2 copies of the same data.

If the DB cache is made large enough to consume most of memory,
the ZFS copy will quickly be evicted to stage other I/Os on
their way to the DB cache.

What problem does that pose?

-r
On Wed, Oct 03, 2007 at 10:42:53AM +0200, Roch - PAE wrote:
> Rayson Ho writes:
>  > 2) Also, direct I/O is faster because it avoids double buffering.
>
> A piece of data can be in one buffer, 2 buffers, 3 buffers. That says
> nothing about performance. More below.
>
> So I guess you mean DIO is faster because it avoids the extra copy:
> DMA straight to the user buffer rather than DMA to a kernel buffer
> followed by a copy to the user buffer. If an I/O is 5ms, an 8K copy is
> about 10 usec. Is avoiding the copy really the most urgent thing to
> work on?

If the DB is huge relative to RAM, and very busy, then memory pressure
could become a problem. And it's not just the time spent copying
buffers, but the resources spent managing those copies. (Just guessing.)
On 10/3/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
> We do not retain 2 copies of the same data.
>
> If the DB cache is made large enough to consume most of memory,
> the ZFS copy will quickly be evicted to stage other I/Os on
> their way to the DB cache.
>
> What problem does that pose?

Hi Roch,

1) The memory copy operations are expensive... I think the following
is a good intro to this problem:

"Copying data in memory can be a serious bottleneck in DBMS software
today. This fact is often a surprise to database students, who assume
that main-memory operations are "free" compared to disk I/O. But in
practice, a well-tuned database installation is typically not
I/O-bound." (section 3.2)

http://mitpress.mit.edu/books/chapters/0262693143chapm2.pdf
(Ch 2: Anatomy of a Database System, Readings in Database Systems, 4th Ed)

2) If you look at the TPC-C disclosure reports, you will see vendors
using thousands of disks for the top 10 systems. With that many disks
working in parallel, the I/O latencies are not as big of a problem as
on systems with fewer disks.

3) Also interesting is Concurrent I/O, which was introduced in AIX 5.2:

"Improving Database Performance With AIX Concurrent I/O"
http://www-03.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.html

"Improve database performance on file system containers in IBM DB2 UDB
V8.2 using Concurrent I/O on AIX"
http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0408lee/

Rayson
On Oct 3, 2007, at 10:31 AM, Roch - PAE wrote:

> If the DB cache is made large enough to consume most of memory,
> the ZFS copy will quickly be evicted to stage other I/Os on
> their way to the DB cache.
>
> What problem does that pose?

Personally, I'm still not completely sold on the performance
(performance as in ability, not speed) of ARC eviction. Often times,
especially during a resilver, a server with ~2GB of RAM free under
normal circumstances will dive down to the minfree floor, causing
processes to be swapped out. We've had to take to manually constraining
ARC max size so this situation is avoided. This is on s10u2/3. I
haven't tried anything heavy duty with Nevada simply because I don't
put Nevada in production situations.

Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
surprised that this is being met with skepticism considering that
Oracle highly recommends direct IO be used, and, IIRC, Oracle
performance was the main motivation for adding DIO to UFS back in
Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
it's the buffer caching they all employ. So I'm a big fan of seeing
6429855 come to fruition.

/dale
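For the archives, the manual constraining looks roughly like this. It
assumes bits that have the zfs_arc_max tunable (later S10 updates /
Nevada); on older builds people have poked arc.c_max with mdb instead.
The value is just an example:

    * /etc/system fragment -- cap the ARC at 2 GB so it leaves room for
    * the database's own cache (takes effect at the next reboot)
    set zfs:zfs_arc_max = 0x80000000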
Hey Roch -

> We do not retain 2 copies of the same data.
>
> If the DB cache is made large enough to consume most of memory,
> the ZFS copy will quickly be evicted to stage other I/Os on
> their way to the DB cache.
>
> What problem does that pose?

Can't answer that question empirically, because we can't measure this,
but I imagine there's some overhead to ZFS cache management in evicting
and replacing blocks, and that overhead could be eliminated if ZFS
could be told not to cache the blocks at all.

Now, obviously, whether this overhead would be in the noise level, or
something that actually hurts sustainable performance, will depend on
several things, but I can envision scenarios where it's overhead I'd
rather avoid if I could.

Thanks,
/jim
Rayson Ho wrote:
> On 10/3/07, Roch - PAE <Roch.Bourbonnais at sun.com> wrote:
>> We do not retain 2 copies of the same data.
>>
>> If the DB cache is made large enough to consume most of memory,
>> the ZFS copy will quickly be evicted to stage other I/Os on
>> their way to the DB cache.
>>
>> What problem does that pose?
>
> Hi Roch,
>
> 1) The memory copy operations are expensive... I think the following
> is a good intro to this problem:
>
> "Copying data in memory can be a serious bottleneck in DBMS software
> today. This fact is often a surprise to database students, who assume
> that main-memory operations are "free" compared to disk I/O. But in
> practice, a well-tuned database installation is typically not
> I/O-bound." (section 3.2)

... just the ones people are complaining about ;-)  Indeed it seems
rare that a DB performance escalation does not involve I/O tuning :-(

> http://mitpress.mit.edu/books/chapters/0262693143chapm2.pdf
>
> (Ch 2: Anatomy of a Database System, Readings in Database Systems, 4th Ed)
>
> 2) If you look at the TPC-C disclosure reports, you will see vendors
> using thousands of disks for the top 10 systems. With that many disks
> working in parallel, the I/O latencies are not as big of a problem as
> on systems with fewer disks.
>
> 3) Also interesting is Concurrent I/O, which was introduced in AIX 5.2:
>
> "Improving Database Performance With AIX Concurrent I/O"
> http://www-03.ibm.com/systems/p/os/aix/whitepapers/db_perf_aix.html

This is a pretty decent paper, and some of the issues are the same with
UFS. To wit, direct I/O is not always a win (qv. Bob Sneed's blog).
It also describes what we call the single writer lock problem, which
IBM solves with Concurrent I/O. See also:
http://www.solarisinternals.com/wiki/index.php/Direct_I/O

ZFS doesn't have the single writer lock problem. See also:
http://blogs.sun.com/roch/entry/zfs_to_ufs_performance_comparison

Slightly off-topic, in looking at some field data this morning (looking
for something completely unrelated) I noticed that the use of directio
on UFS is declining over time. I'm not sure what that means...
hopefully not more performance escalations...
 -- richard
On Oct 3, 2007, at 5:21 PM, Richard Elling wrote:

> Slightly off-topic, in looking at some field data this morning (looking
> for something completely unrelated) I noticed that the use of directio
> on UFS is declining over time. I'm not sure what that means...
> hopefully not more performance escalations...

Sounds like someone from the ZFS team needs to get with someone from
Oracle/MySQL/Postgres and get the skinny on how the IO rubber->road
boundary should look, because it doesn't sound like there's a
definitive or at least a sure answer here.

Oracle trumpets the use of DIO, and there are benchmarks and first-hand
accounts out there from DBAs on its virtues - at least when running on
UFS (and EXT2/3 on Linux, etc.)

As it relates to ZFS mechanics specifically, there doesn't appear to be
any settled opinion.

/dale
Hi Dale,

We're testing out the enhanced arc_max enforcement (track DNLC entries)
using Build 72 right now. Hopefully it will fix the memory creep, which
is the only real downside to ZFS for DB work, it seems to me.

Frankly, our DB loads have improved performance with ZFS. I suspect
it's because we are write-heavy.

-J

On 10/3/07, Dale Ghent <daleg at elemental.org> wrote:
> Personally, I'm still not completely sold on the performance
> (performance as in ability, not speed) of ARC eviction. [...]
> We've had to take to manually constraining ARC max size so this
> situation is avoided. This is on s10u2/3.
Postgres assumes that the OS takes care of caching:

"PLEASE NOTE. PostgreSQL counts a lot on the OS to cache data files and
hence does not bother with duplicating its file caching effort. The
shared buffers parameter assumes that OS is going to cache a lot of
files and hence it is generally very low compared with system RAM. Even
for a dataset in excess of 20GB, a setting of 128MB may be too much, if
you have only 1GB RAM and an aggressive-at-caching OS like Linux."

Tuning PostgreSQL for performance, Shridhar Daithankar, Josh Berkus,
2003, http://www.varlena.com/GeneralBits/Tidbits/perf.html

Slightly off-topic, I have noticed at least a 25% performance gain on
my PostgreSQL database after installing Wu Fengguang's adaptive
read-ahead disk cache patch for the Linux kernel.
http://lkml.org/lkml/2005/9/15/185
http://www.samag.com/documents/s=10101/sam0616a/0616a.htm

I was wondering if Solaris uses a similar approach.

On 04/10/2007, at 4:44 AM, Dale Ghent wrote:
> Sounds like someone from the ZFS team needs to get with someone from
> Oracle/MySQL/Postgres and get the skinny on how the IO rubber->road
> boundary should look, because it doesn't sound like there's a
> definitive or at least a sure answer here.
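In concrete terms, the approach that article describes ends up looking
something like this in postgresql.conf, with the OS (or the ARC, here)
expected to do the heavy lifting. Numbers are purely illustrative, and
newer releases accept memory units like these while older ones take the
same settings as a count of 8K pages:

    shared_buffers = 128MB        # deliberately small private cache
    effective_cache_size = 4GB    # planner hint: how much the OS is
                                  # caching on PostgreSQL's behalf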
> Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
> surprised that this is being met with skepticism considering that
> Oracle highly recommends direct IO be used, and, IIRC, Oracle
> performance was the main motivation for adding DIO to UFS back in
> Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
> it's the buffer caching they all employ. So I'm a big fan of seeing
> 6429855 come to fruition.

The point is that directI/O typically means two things:
1) concurrent I/O
2) no caching at the file system

Most file systems (ufs, vxfs, etc.) don't do 1) or 2) without turning
on "directI/O".

ZFS *does* 1). It doesn't do 2) (currently).

That is what we're trying to discuss here.

Where does the win come from with "directI/O"? Is it 1), 2), or some
combination? If it's a combination, what's the percentage of each
towards the win?

We need to tease 1) and 2) apart to have a full understanding. I'm not
against adding 2) to ZFS but want more information. I suppose I'll just
prototype it and find out for myself.

eric
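Concretely, on UFS both behaviours come bundled behind one switch,
which is part of why the term is so overloaded (the device path below
is just an example):

    # all-or-nothing: concurrent I/O *and* no FS caching, for the whole mount
    mount -F ufs -o forcedirectio /dev/dsk/c0t0d0s6 /oracle/data
    # (or per-file from the application via directio(3C))

ZFS giving you 1) unconditionally is exactly why it would be nice to be
able to measure 2) on its own.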
On Oct 3, 2007, at 3:44 PM, Dale Ghent wrote:

> Sounds like someone from the ZFS team needs to get with someone from
> Oracle/MySQL/Postgres and get the skinny on how the IO rubber->road
> boundary should look, because it doesn't sound like there's a
> definitive or at least a sure answer here.

I've done that already (Oracle, Postgres, JavaDB, etc.). Because the
holy grail of "directI/O" is an overloaded term, we don't really know
where the win within "directI/O" lies. In any event, it seems the only
way to get a definitive answer here is to prototype a no-caching
property...

eric
Would it be easier to...

1) Change the ZFS code to enable a sort of directIO emulation and then
run various tests... or

2) Use Sun's performance team, which has all the experience in the
world when it comes to performing benchmarks on Solaris and Oracle,
plus a DTrace master to drill down and see what the difference is
between UFS and UFS/DIO... and where the real win lies.

On 10/4/07, eric kustarz <eric.kustarz at sun.com> wrote:
> In any event, it seems the only way to get a definitive answer here is
> to prototype a no-caching property...
> Where does the win come from with "directI/O"? Is it 1), 2), or some
> combination? If it's a combination, what's the percentage of each
> towards the win?

That will vary based on workload (I know, you already knew that ... :^).
Decomposing the performance win between what is gained as a result of
single writer lock breakup and no caching is something we can only
guess at, because, at least for UFS, you can't do just one - it's all
or nothing.

> We need to tease 1) and 2) apart to have a full understanding.

We can't. We can only guess (for UFS).

My opinion - it's a must-have for ZFS if we're going to get serious
attention in the database space. I'll bet dollars-to-donuts that, over
the next several years, we'll burn many tens-of-millions of dollars on
customer support escalations that come down to memory utilization
issues and contention between database-specific buffering and the ARC.
This is entirely my opinion (not that of Sun), and I've been wrong
before.

Thanks,
/jim
Jim Mauro writes:

 > That will vary based on workload (I know, you already knew that ... :^).
 > Decomposing the performance win between what is gained as a result of
 > single writer lock breakup and no caching is something we can only
 > guess at, because, at least for UFS, you can't do just one - it's all
 > or nothing.
 >
 > My opinion - it's a must-have for ZFS if we're going to get serious
 > attention in the database space. I'll bet dollars-to-donuts that, over
 > the next several years, we'll burn many tens-of-millions of dollars on
 > customer support escalations that come down to memory utilization
 > issues and contention between database-specific buffering and the ARC.
 > This is entirely my opinion (not that of Sun),

...memory utilisation... OK, so we should implement the 'lost cause' RFE.

In all cases, ZFS must not steal pages from other memory consumers:

6488341 ZFS should avoid growing the ARC into trouble

So the DB memory pages should not be _contended_ for.

-r

 > and I've been wrong before.
 >
 > Thanks,
 > /jim
eric kustarz writes:

 > The point is that directI/O typically means two things:
 > 1) concurrent I/O
 > 2) no caching at the file system

In my blog I also mention:

3) no readahead (but that can be viewed as an implicit consequence of 2)

And someone chimed in with

4) the ability to do I/O at sector granularity.

I also think that for many, 2) is too weak a form of what they expect:

5) DMA straight from the user buffer to disk, avoiding a copy.

So,

1) Concurrent I/O we have in ZFS.

2) No caching. We could do this by taking a directio hint and evicting
the ARC buffer immediately after the copyout to user space for reads,
and after txg completion for writes.

3) No prefetching. We have 2 levels of prefetching. The low level was
fixed recently and should not cause problems for DB loads. The high
level still needs fixing on its own. Then we should take the same hint
as 2) to disable it altogether. In the meantime we can tune our way
into this mode.

4) Sector-sized I/O is really foreign to the ZFS design.

5) Zero copy & more CPU efficiency. This, I think, is where the debate
is.

My line has been that 5) won't help latency much, and latency is where
I think the game is currently played. Now the disconnect might be
because people feel that the game is not latency but CPU efficiency:
"how many CPU cycles do I burn to get data from disk to the user
buffer". This is a valid point. Configurations with a very large number
of disks can end up saturated by the filesystem's CPU utilisation.

So I still think that the major areas for ZFS perf gains are on the
latency front: block allocation (now much improved with the separate
intent log), I/O scheduling, and other fixes to the threading & ARC
behavior. But at some point we can turn our microscope on the CPU
efficiency of the implementation. The copy will certainly be a big
chunk of the CPU cost per I/O, but I would still like to gather that
data.

Also consider: 50 disks at 200 IOPS of 8K is 80 MB/sec. That means
maybe 1/10th of a single CPU to be saved by avoiding just the copy.
Probably not what people have in mind. How many CPUs do you have when
attaching 1000 drives to a host running a 100TB database? That many
drives will barely occupy 2 cores running the copies.

People want performance and efficiency. Directio is just an overloaded
name that delivered those gains to other filesystems.

Right now, what I think is worth gathering is the cycles spent in ZFS
per read & write in a large DB environment where the DB holds 90% of
memory. For comparison with another FS, we should disable checksums,
file prefetching and vdev prefetching, cap the ARC, set atime off, and
use an 8K recordsize. A breakdown and comparison of the CPU cost per
layer will be quite interesting and point to what needs work.

Another interesting thing for me would be: what is your budget? "How
many cycles per DB read and write are you willing to spend, and how did
you come to that number?"

But, as Eric says, let's develop 2) and I'll try in parallel to figure
out the per-layer breakdown cost.

-r
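For anyone wanting to reproduce that comparison setup, it would look
roughly like this (the dataset name is invented, and the /etc/system
tunables are Nevada-era knobs whose names may differ by release):

    zfs set recordsize=8k tank/db     # match the DB block size
    zfs set atime=off tank/db
    zfs set checksum=off tank/db      # only for the apples-to-apples CPU test

    * /etc/system: cap the ARC and disable file- and vdev-level prefetch
    set zfs:zfs_arc_max = 0x40000000
    set zfs:zfs_prefetch_disable = 1
    set zfs:zfs_vdev_cache_size = 0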
On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote:
> > It does, which leads to the core problem. Why do we have to store the
> > exact same data twice in memory (i.e., once in the ARC, and once in
> > the shared memory segment that Oracle uses)?
>
> We do not retain 2 copies of the same data.
>
> If the DB cache is made large enough to consume most of memory,
> the ZFS copy will quickly be evicted to stage other I/Os on
> their way to the DB cache.
>
> What problem does that pose?

Other things deserving of staying in the cache get pushed out by things
that don't deserve being in the cache. Thus systemic memory pressure
(e.g., more on-demand paging of text).

Nico
On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
> ...memory utilisation... OK, so we should implement the 'lost cause' RFE.
>
> In all cases, ZFS must not steal pages from other memory consumers:
>
> 6488341 ZFS should avoid growing the ARC into trouble
>
> So the DB memory pages should not be _contended_ for.

What if your executable text, and pretty much everything, lives on ZFS?
You don't want to contend for the memory caching those things either.
It's not just the DB's memory you don't want to contend for.
Nicolas Williams writes:

 > On Wed, Oct 03, 2007 at 04:31:01PM +0200, Roch - PAE wrote:
 > > We do not retain 2 copies of the same data.
 > >
 > > If the DB cache is made large enough to consume most of memory,
 > > the ZFS copy will quickly be evicted to stage other I/Os on
 > > their way to the DB cache.
 > >
 > > What problem does that pose?
 >
 > Other things deserving of staying in the cache get pushed out by things
 > that don't deserve being in the cache. Thus systemic memory pressure
 > (e.g., more on-demand paging of text).

I agree. That's why I submitted both of these:

6429855 Need way to tell ZFS that caching is a lost cause
6488341 ZFS should avoid growing the ARC into trouble

-r
Nicolas Williams writes:

 > On Thu, Oct 04, 2007 at 03:49:12PM +0200, Roch - PAE wrote:
 > > In all cases, ZFS must not steal pages from other memory consumers:
 > >
 > > 6488341 ZFS should avoid growing the ARC into trouble
 > >
 > > So the DB memory pages should not be _contended_ for.
 >
 > What if your executable text, and pretty much everything, lives on ZFS?
 > You don't want to contend for the memory caching those things either.
 > It's not just the DB's memory you don't want to contend for.

On the read side, we're talking here about 1000 disks each running 35
concurrent I/Os of 8K, so a footprint of 250MB, to stage a ton of work.

On the write side we do have to play with the transaction group, so
that will be 5-10 seconds worth of synchronous write activity.

But how much memory does a 1000-disk server have?

-r
On Thu, Oct 04, 2007 at 06:59:56PM +0200, Roch - PAE wrote:
> Nicolas Williams writes:
>  > What if your executable text, and pretty much everything, lives on ZFS?
>  > You don't want to contend for the memory caching those things either.
>  > It's not just the DB's memory you don't want to contend for.
>
> On the read side, we're talking here about 1000 disks each running 35
> concurrent I/Os of 8K, so a footprint of 250MB, to stage a ton of work.

I'm not sure what you mean, but extra copies and memory just to stage
the I/Os is not the same as the systemic memory pressure issue.

Now, I'm _speculating_ as to what the real problem is, but it seems
very likely that putting things in the cache that needn't be there
would push out things that should be there, and since restoring those
things to the cache later would cost I/Os...
I'd like to second a couple of comments made recently:

* If they don't regularly do so, I too encourage the ZFS, Solaris
performance, and Sun Oracle support teams to sit down and talk about
the utility of Direct I/O for databases.

* I too suspect that absent Direct I/O (or some ringing endorsement
from Oracle about how ZFS doesn't need Direct I/O), there will be a
drain of customer escalations regarding the lack -- plus FUD and other
sales inhibitors.

While I realize that Sun has not published a TPC-C result since 2001
and offers a different value proposition to customers, performance does
matter, and in some cases Direct I/O can contribute to that.

Historically, every TPC-C database benchmark run can be converted from
being I/O bound to being CPU bound by adding enough disk spindles and
enough main memory. In that context, saving the CPU cycles (and cache
misses) from a copy is important.

Another historical trend was that for performance, portability across
different operating systems, and perhaps just because they could,
databases tended to use as few OS capabilities as possible and to do
their own resource management. So for instance databases were often
benchmarked using raw devices. Customers on the other hand preferred
the manageability of filesystems and tended to deploy there. In that
context, Direct I/O is an attempt to get the best of both worlds.

Finally, besides UFS Direct I/O on Solaris, other filesystems including
VxFS also have various forms of Direct I/O -- either separate APIs or
mount options that bypass the cache on large writes, etc. Understanding
those benefits, both real and advertised, helps understand the
opportunities and shortfalls for ZFS.

It may be that this is not the most important thing for ZFS performance
or capability right now -- measurement in targeted configurations and
workloads is the only way to tell -- but I'd be highly surprised if
there isn't something (bypass cache on really large writes?) that can
be learned from experiences with Direct I/O.

Eric (Hamilton)
> 5) DMA straight from the user buffer to disk, avoiding a copy.

This is what the "direct" in "direct i/o" has historically meant. :-)

> My line has been that 5) won't help latency much, and latency is where
> I think the game is currently played. Now the disconnect might be
> because people feel that the game is not latency but CPU efficiency:
> "how many CPU cycles do I burn to get data from disk to the user
> buffer".

Actually, it's less CPU cycles in many cases than memory cycles.

For many databases, most of the I/O is writes (reads wind up cached in
memory). What's the cost of a write?

With direct I/O: the CPU writes to memory (spread out over many
transactions), the disk DMAs from memory. We write LPS (log page size)
bytes of data from CPU to memory, we read LPS bytes from memory. On
processors without a cache line zero, we probably read the LPS data
from memory as part of the write. Total cost = W:LPS, R:2*LPS.

Without direct I/O: the cost of getting the data into the user buffer
remains the same (W:LPS, R:LPS). We copy the data from user buffer to
system buffer (W:LPS, R:LPS). Then we push it out to disk. Total cost =
W:2*LPS, R:3*LPS. We've nearly doubled the cost, not including any TLB
effects.

On a memory-bandwidth-starved system (which should be nearly all modern
designs, especially with multi-threaded chips like Niagara), replacing
buffered I/O with direct I/O should give you nearly a 2x improvement in
log write bandwidth. That's without considering cache effects (which
shouldn't be too significant, really, since LPS should be << the size
of L2).

How significant is this? We'd have to measure; and it will likely vary
quite a lot depending on which database is used for testing.

But note that, for ZFS, the win with direct I/O will be somewhat less.
That's because you still need to read the page to compute its checksum.
So for direct I/O with ZFS (with checksums enabled), the cost is W:LPS,
R:2*LPS. Is saving one page of writes enough to make a difference?
Possibly not.

Anton
I've been thinking about this for a while, but Anton's analysis makes
me think about it even more:

We all love ZFS, right? It's futuristic in a bold new way, with many
virtues; I won't preach to the choir. But to make it all glue together
takes some necessary CPU/memory-intensive operations around checksum
generation/validation, compression, encryption, data
placement/component load balancing, etc. Processors have gotten really
powerful, much more so than the relative disk I/O gains, which in all
honesty makes ZFS possible. My question: is anyone working on an
offload engine for ZFS?

I can envision a highly optimized, pipelined system, where writes and
reads pass through checksum, compression, encryption ASICs, that also
locate data properly on disk. This could even be in the form of a PCIe
SATA/SAS card with many ports, or different options. This would make
direct IO, or DMA IO, possible again.

The file system abstraction with ZFS is really too much and too
important to ignore, and too hard to optimize with different load
conditions (my rookie opinion) to expect any RDBMS app to have a clue
what to do with it. I guess what I'm saying is that the RDBMS app will
know what blocks it needs, and wants to get them in and out speedy
quick, but the mapping to disk is not linear with ZFS, the way it is
with other file systems. An offload engine could translate this
instead.

Just throwing this out there for the purpose of blue sky fluff.

Jon

Anton B. Rang wrote:
> But note that, for ZFS, the win with direct I/O will be somewhat less.
> That's because you still need to read the page to compute its
> checksum. So for direct I/O with ZFS (with checksums enabled), the
> cost is W:LPS, R:2*LPS. Is saving one page of writes enough to make a
> difference? Possibly not.

--
Jonathan Loran  -  IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146   jloran at ssl.berkeley.edu
Hello Rayson,

Tuesday, October 2, 2007, 8:56:09 PM, you wrote:

RH> 1) Modern DBMSs cache database pages in their own buffer pool because
RH> it is less expensive than to access data from the OS. (IIRC, MySQL's
RH> MyISAM is the only one that relies on the FS cache, but a lot of MySQL
RH> sites use InnoDB which has its own buffer pool)

RH> 2) Also, direct I/O is faster because it avoids double buffering.

I doubt it's buying you much... However, on UFS, if you go with direct
IO you allow concurrent writes to the same file and you disable
read-aheads - I guess that's buying you much more in most cases than
eliminating double buffering.

Now the question is - if an application is using the directio() call,
what happens if the underlying fs is zfs?

--
Best regards,
Robert Milkowski                  mailto:rmilkowski at task.gda.pl
                                  http://milek.blogspot.com
From: "Anton B. Rang" <rang at acm.org>> For many databases, most of the I/O is writes (reads wind up > cached in memory).2 words: table scan -=dave
On 5-Oct-07, at 2:26 AM, Jonathan Loran wrote:

> I've been thinking about this for a while, but Anton's analysis makes
> me think about it even more:
>
> We all love ZFS, right? It's futuristic in a bold new way, with many
> virtues; I won't preach to the choir. But to make it all glue together
> takes some necessary CPU/memory-intensive operations around checksum
> generation/validation, compression, encryption, data
> placement/component load balancing, etc. Processors have gotten really
> powerful, much more so than the relative disk I/O gains, which in all
> honesty makes ZFS possible. My question: is anyone working on an
> offload engine for ZFS?

How far would that compromise ZFS' #1 virtue (IMHO), end to end
integrity?

--Toby

> I can envision a highly optimized, pipelined system, where writes and
> reads pass through checksum, compression, encryption ASICs, that also
> locate data properly on disk. This could even be in the form of a PCIe
> SATA/SAS card with many ports, or different options.
On 10/5/07, Robert Milkowski <rmilkowski at task.gda.pl> wrote:
> RH> 2) Also, direct I/O is faster because it avoids double buffering.
>
> I doubt it's buying you much...

We don't know how much the performance gain is until we get a prototype
and benchmark it - the behavior is different with different DBMSs,
OSes, workloads (i.e. I/O rate, hit ratio), etc.

> However, on UFS, if you go with direct IO you allow concurrent writes
> to the same file and you disable read-aheads - I guess that's buying
> you much more in most cases than eliminating double buffering.

I really hope that someone can sit down and look at the database
interfaces provided by all the filesystems. So far, there are Direct
I/O, Concurrent I/O (AIX JFS2), and Quick I/O (VxFS):

http://eval.veritas.com/webfiles/docs/qiowp.pdf

Then a prototype for ZFS will help us understand how much we can get...

Rayson

> Now the question is - if an application is using the directio() call,
> what happens if the underlying fs is zfs?
Toby Thain wrote:
> On 5-Oct-07, at 2:26 AM, Jonathan Loran wrote:
>
>> My question: is anyone working on an offload engine for ZFS?
>
> How far would that compromise ZFS' #1 virtue (IMHO), end to end
> integrity?

It need not; in fact with ZFS Crypto you will already get the
encryption and checksum offloaded if you have suitable hardware (e.g. a
SCA-6000 card or a Niagara 2 processor).

--
Darren J Moffat
On Fri, 2007-10-05 at 09:40 -0300, Toby Thain wrote:
> How far would that compromise ZFS' #1 virtue (IMHO), end to end
> integrity?

Speed sells, and speed kills. If the offload were done on the HBA, it
would extend the size of the "assumed correct" part of the hardware
from just the CPU+memory to also include the offload device and all the
I/O bridges, DMA widgets, etc., between it and memory.

This is already the case for the TCP checksum offloads commonly
performed in network cards, and, yes, people get bitten by it
occasionally. An analysis of packets with bad tcp checksums I'm
familiar with showed a fair number which showed patterns consistent
with a DMA hiccup (repeated words or cache lines); those glitches on a
system with checksum offload would turn into data corruption.

See "When The CRC and TCP Checksum Disagree",
http://www.sigcomm.org/sigcomm2000/conf/paper/sigcomm2000-9-1.pdf

 - Bill
On Thu, Oct 04, 2007 at 10:26:24PM -0700, Jonathan Loran wrote:
> I can envision a highly optimized, pipelined system, where writes and
> reads pass through checksum, compression, encryption ASICs, that also
> locate data properly on disk. ...

I've argued before that RAID-Z could be implemented in hardware. But I
think that it's all about economics. Software is easier to develop and
patch than hardware, so if we can put together systems with enough
memory, general purpose CPU horsepower, and memory and I/O bandwidth,
all cheaply enough, then that will be better than developing special
purpose hardware for ZFS. Thumper is an example of such a system.

Eventually we may find trends in system design once again favoring
pushing special tasks to the edge. When that happens I'm sure we'll go
there. But right now the trend is to put crypto co-processors and NICs
on the same die as the CPU.

Nico
Nicolas Williams wrote:
> I've argued before that RAID-Z could be implemented in hardware. But I
> think that it's all about economics. Software is easier to develop and
> patch than hardware, so if we can put together systems with enough
> memory, general purpose CPU horsepower, and memory and I/O bandwidth,
> all cheaply enough, then that will be better than developing special
> purpose hardware for ZFS. Thumper is an example of such a system.
>
> Eventually we may find trends in system design once again favoring
> pushing special tasks to the edge. When that happens I'm sure we'll go
> there. But right now the trend is to put crypto co-processors and NICs
> on the same die as the CPU.

Time for on board FPGAs!
On Fri, Oct 05, 2007 at 08:56:26AM -0700, Tim Spriggs wrote:
> Time for on board FPGAs!

Heh!
Nicolas Williams wrote:
> I've argued before that RAID-Z could be implemented in hardware. But I
> think that it's all about economics. Software is easier to develop and
> patch than hardware, so if we can put together systems with enough
> memory, general purpose CPU horsepower, and memory and I/O bandwidth,
> all cheaply enough, then that will be better than developing special
> purpose hardware for ZFS. Thumper is an example of such a system.
>
> Eventually we may find trends in system design once again favoring
> pushing special tasks to the edge. When that happens I'm sure we'll go
> there. But right now the trend is to put crypto co-processors and NICs
> on the same die as the CPU.

1) We can put it on the same die also, or at least as a chip set on the
MoBo.

2) Offload engines do have software, stored in firmware. Or maybe such
an offload processor could run software out of a driver, loaded
dynamically?

3) You are all aware of how many microprocessors are involved in a
normal file server, right? There's one at almost every interface: disk
to controller, controller to PCI bridge, PCI bridge to Hyperbus, etc.
Imagine the burden if you did all that in the CPU only. I sometimes
find it amazing computers are as stable as they are, but it's all in
the maturity of the code running at every step of the way, and of
course, good firmware coding practices. Your vanilla SCSI controllers
and disk drives do a lot of very complex but useful processing. We
trust these guys 100%, because the interface is stable, and the code
and processors are mature and well used.

I do agree, pushing ZFS to the edge will come down the road, when it
becomes less dynamic (how boring) and we know more about the
bottlenecks.

Jon

--
Jonathan Loran  -  IT Manager
Space Sciences Laboratory, UC Berkeley
(510) 643-5146   jloran at ssl.berkeley.edu
> Is there a specific reason why you need to do the caching at the DB
> level instead of the file system? I'm really curious, as I've got
> conflicting data on why people do this. If I get more data on real
> reasons why we shouldn't cache at the file system, then this could get
> bumped up in my priority queue.

FWIW a MySQL database was recently moved to a FreeBSD system with ZFS.
Performance ended up sucking because for some reason data did not make
it into the cache in a predictable fashion (a simple case of repeated
queries was not cached; so for example a very common query, even when
executed repeatedly on an idle system, would take more than 1 minute
instead of 0.10 seconds or so when cached).

Ended up convincing the person running the DB to switch from MyISAM
(which does not seem to support DB-level caching, other than of
indexes) to InnoDB, thus allowing use of the InnoDB buffer cache.

I don't know why it wasn't cached by ZFS/ARC to begin with (the size of
the ARC cache was definitely large enough - ~800 MB, and I know the
working set for this query was below 300 MB). Perhaps it has to do with
the ARC trying to be smart and avoiding flushing the cache with useless
data? I am not read up on the details of the ARC. But in this
particular case it was clear that a simple LRU would have been much
more useful - unless there was some other problem related to my setup
or the FreeBSD integration that somehow broke proper caching.

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org
> But note that, for ZFS, the win with direct I/O will be somewhat less.
> That's because you still need to read the page to compute its
> checksum. So for direct I/O with ZFS (with checksums enabled), the
> cost is W:LPS, R:2*LPS. Is saving one page of writes enough to make a
> difference? Possibly not.

It's more complicated than that. The kernel would be verifying
checksums on buffers in a user's address space. For this to work, we
have to map these buffers into the kernel and simultaneously arrange
for these pages to be protected from other threads in the user's
address space.

We discussed some of the VM gymnastics required to properly implement
this back in January:

http://mail.opensolaris.org/pipermail/zfs-discuss/2007-January/thread.html#36890

-j
Peter Schuller wrote:
> I don't know why it wasn't cached by ZFS/ARC to begin with (the size of
> the ARC cache was definitely large enough - ~800 MB, and I know the
> working set for this query was below 300 MB). Perhaps it has to do with
> the ARC trying to be smart and avoiding flushing the cache with useless
> data? I am not read up on the details of the ARC. But in this
> particular case it was clear that a simple LRU would have been much
> more useful - unless there was some other problem related to my setup
> or the FreeBSD integration that somehow broke proper caching.

Neel's arcstat might help shed light on such behaviour.
http://blogs.sun.com/realneel/entry/zfs_arc_statistics
 -- richard
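If fetching the script is a hassle, the raw counters it reads are
available directly; something along these lines should show whether the
ARC is actually growing and hitting (the FreeBSD sysctl names below are
my best guess for that port):

    # Solaris: ARC size, target and hit/miss counters
    kstat -m zfs -n arcstats | egrep 'size|hits|misses'
    # FreeBSD exports the same counters as sysctls, e.g.
    sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.misses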
Hi All,

While pumping IO on a ZFS file system my system is crashing/panicking. Please find the crash dump below.

panic[cpu0]/thread=2a100adfcc0: assertion failed: ss != NULL, file: ../../common/fs/zfs/space_map.c, line: 125

000002a100adec40 genunix:assfail+74 (7b652448, 7b652458, 7d, 183d800, 11ed400, 0)
  %l0-3: 0000000000000000 0000000000000000 00000000011e7508 000003000744ea30
  %l4-7: 00000000011ed400 0000000000000000 000000000186fc00 0000000000000000
000002a100adecf0 zfs:space_map_remove+b8 (3000683e7b8, 2b200000, 20000, 7b652000, 7b652400, 7b652400)
  %l0-3: 0000000000000000 000000002b220000 000000002b0ec600 000003000744ebc0
  %l4-7: 000003000744eaf8 000000002b0ec000 000000007b652000 000000002b0ec600
000002a100adedd0 zfs:space_map_load+218 (3000683e7b8, 30006f5f160, 1000, 3000683e488, 2b000000, 1)
  %l0-3: 0000000000000160 0000030006f5f000 0000000000000000 000000007b620ad0
  %l4-7: 000000007b62086c 00007fffffffffff 0000000000007fff 0000030006f5f128
000002a100adeea0 zfs:metaslab_activate+3c (3000683e480, 8000000000000000, c000000000000000, 24a998, 3000683e480, c0000000)
  %l0-3: 0000000000000000 0000000000000008 0000000000000000 0000029ebf9d0000
  %l4-7: 00000000704e2000 000003000391e940 0000030005572540 00000300060bacd0
000002a100adef50 zfs:metaslab_group_alloc+1bc (3fffffffffffffff, 20000, 8000000000000000, 7e68000, 30006766080, ffffffffffffffff)
  %l0-3: 0000000000000000 00000300060bacd8 0000000000000001 000003000683e480
  %l4-7: 8000000000000000 0000000000000000 0000000003f34000 4000000000000000
000002a100adf030 zfs:metaslab_alloc_dva+114 (0, 7e68000, 30006766080, 20000, 30005572540, 1e910)
  %l0-3: 0000000000000001 0000000000000000 0000000000000003 000003000380b6e0
  %l4-7: 0000000000000000 00000300060bacd0 0000000000000000 00000300060bacd0
000002a100adf100 zfs:metaslab_alloc+2c (3000391e940, 20000, 30006766080, 1, 1e910, 0)
  %l0-3: 0000009980001605 0000000000000016 0000000000001b4d 0000000000000214
  %l4-7: 0000000000000000 0000000000000000 000003000391e940 0000000000000001
000002a100adf1b0 zfs:zio_dva_allocate+4c (30005dd8a40, 7b6335a8, 30006766080, 704e2508, 704e2400, 20001)
  %l0-3: 0000030005dd8a40 0000060200ff00ff 0000060200ff00ff 0000000000000000
  %l4-7: 0000000000000000 00000000018a6400 0000000000000001 0000000000000006
000002a100adf260 zfs:zio_write_compress+1ec (30005dd8a40, 23e20b, 23e000, ff00ff, 2, 30006766080)
  %l0-3: 000000000000ffff 00000000000000ff 0000000000000100 0000000000020000
  %l4-7: 0000000000000000 0000000000ff0000 000000000000fc00 00000000000000ff
000002a100adf330 zfs:arc_write+e4 (30005dd8a40, 3000391e940, 6, 2, 1, 1e910)
  %l0-3: ffffffffffffffff 000000007b6063c8 0000030006af2570 00000300060c5cf0
  %l4-7: 000002a100adf538 0000000000000004 0000000000000004 00000300060c7a88
000002a100adf440 zfs:dbuf_sync+6c0 (30006af2570, 30005dd9440, 2b3ca, 2, 6, 1e910)
  %l0-3: 0000030005dd96c0 0000000000000000 0000030006ae7750 0000030006af2678
  %l4-7: 0000030006766080 0000000000000013 0000000000000001 0000000000000000
000002a100adf560 zfs:dnode_sync+35c (0, 0, 30005dd9440, 30005ac8cc0, 2, 2)
  %l0-3: 0000030006af2570 0000030006ae77a8 0000030006ae7808 0000030006ae7808
  %l4-7: 0000000000000000 0000030006ae77a8 0000000000000001 000003000640ace0
000002a100adf620 zfs:dmu_objset_sync_dnodes+6c (30005dd96c0, 30005dd97a0, 30005ac8cc0, 30006ae7750, 30006bd3ca0, 0)
  %l0-3: 00000000704e84c0 00000000704e8000 00000000704e8000 0000000000000001
  %l4-7: 0000000000000000 00000000704e4000 0000000000000000 0000030005dd9440
000002a100adf6d0 zfs:dmu_objset_sync+54 (30005dd96c0, 30005ac8cc0, 0, 0, 300060c5318, 1e910)
  %l0-3: 0000000000000000 000000000000000f 0000000000000000 000000000000478d
  %l4-7: 0000030005dd97a0 0000000000000000 0000030005dd97a0 0000030005dd9820
000002a100adf7e0 zfs:dsl_dataset_sync+c (30006f36780, 30005ac8cc0, 30006f36810, 300040c7db8, 300040c7db8, 30006f36780)
  %l0-3: 0000000000000001 0000000000000007 00000300040c7e38 0000000000000000
  %l4-7: 0000030006f36808 0000000000000000 0000000000000000 0000000000000000
000002a100adf890 zfs:dsl_pool_sync+64 (300040c7d00, 1e910, 30006f36780, 30005ac9640, 30005581a80, 30005581aa8)
  %l0-3: 0000000000000000 000003000391ed00 0000030005ac8cc0 00000300040c7e98
  %l4-7: 00000300040c7e68 00000300040c7e38 00000300040c7da8 0000030005dd9440
000002a100adf940 zfs:spa_sync+1b0 (3000391e940, 1e910, 0, 0, 2a100adfcc4, 1)
  %l0-3: 000003000391eb00 000003000391eb10 000003000391ea28 0000030005ac9640
  %l4-7: 0000000000000000 000003000410f580 00000300040c7d00 000003000391eac0
000002a100adfa00 zfs:txg_sync_thread+134 (300040c7d00, 1e910, 0, 2a100adfab0, 300040c7e10, 300040c7e12)
  %l0-3: 00000300040c7e20 00000300040c7dd0 0000000000000000 00000300040c7dd8
  %l4-7: 00000300040c7e16 00000300040c7e14 00000300040c7dc8 000000000001e911

syncing file systems... [1] 16 [1] 6 [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] done (not all i/o completed)
dumping to /dev/dsk/c1t0d0s1, offset 429916160, content: kernel

Unfortunately I don't have much experience with crash dump analysis. Can anyone explain why my machine went down?

-Masthan D
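As a starting point for the crash dump analysis mentioned above, the usual first steps on Solaris look something like the sketch below (paths assume savecore(1M) wrote the dump into the default /var/crash/<hostname> directory; dump number 0 is just an example):

    cd /var/crash/`hostname`
    mdb unix.0 vmcore.0

    ::status        # panic string, OS release, dump contents
    ::msgbuf        # console messages leading up to the panic
    $c              # stack trace of the panicking thread

Even just ::status and $c are usually enough to match a panic against an existing bug report.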
Hi All,

Has anyone had a chance to look into this issue?

-Masthan D

dudekula mastan <d_mastan at yahoo.com> wrote:
> While pumping IO on a ZFS file system my system is crashing/panicking.
>
> panic[cpu0]/thread=2a100adfcc0: assertion failed: ss != NULL,
> file: ../../common/fs/zfs/space_map.c, line: 125
> [...]
> Unfortunately I don't have much experience with crash dump analysis.
> Can anyone explain why my machine went down?
Prabahar Jeyaram
2007-Oct-09 06:07 UTC
[zfs-discuss] ZFS file system is crashing my system
Your system seems to have hit a variant of this bug:

6458218 - http://bugs.opensolaris.org/view_bug.do?bug_id=6458218

This is fixed in OpenSolaris build 60 or S10U4.

--
Prabahar.

On Oct 8, 2007, at 10:04 PM, dudekula mastan wrote:

> Hi All,
>
> Has anyone had a chance to look into this issue?
>
> -Masthan D
>
> dudekula mastan <d_mastan at yahoo.com> wrote:
>
> While pumping IO on a ZFS file system my system is crashing/panicking.
>
> panic[cpu0]/thread=2a100adfcc0: assertion failed: ss != NULL,
> file: ../../common/fs/zfs/space_map.c, line: 125
> [...]
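If it helps, a quick way to check whether a machine already carries that fix is to look at the release and kernel build strings. A rough sketch (the strings to look for are my reading of the bug report: snv_60 or later for OpenSolaris/Nevada, or, if I remember the naming right, the Solaris 10 8/07 release for S10U4):

    cat /etc/release     # look for snv_60 or later, or the Solaris 10 8/07 (U4) release
    uname -v             # kernel build string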
I didn't see an exact match in the bug database, but http://bugs.opensolaris.org/view_bug.do?bug_id=6328538 looks possible. (The line number doesn't quite match, but the call chain does.)

Someone else reported this last month:
http://www.opensolaris.org/jive/thread.jspa?messageID=155834
but there wasn't a definite conclusion there about what the problem is.

This message posted from opensolaris.org
Hi Jeyaram,

Thanks for your reply. Can you explain more about this bug?

Regards
Masthan D

Prabahar Jeyaram <Prabahar.Jeyaram at Sun.COM> wrote:
> Your system seems to have hit a variant of this bug:
>
> 6458218 - http://bugs.opensolaris.org/view_bug.do?bug_id=6458218
>
> This is fixed in OpenSolaris build 60 or S10U4.
> [...]
Prabahar Jeyaram
2007-Oct-09 15:38 UTC
[zfs-discuss] ZFS file system is crashing my system
Hi Masthan,

There was a race in the block allocation code that could hand out a single disk block to two consumers. The system trips the assertion when both consumers later try to free that block.

--
Prabahar.

On Oct 9, 2007, at 4:20 AM, dudekula mastan wrote:
> Thanks for your reply. Can you explain more about this bug?
> [...]
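To make that concrete, here is a tiny illustrative sketch. It is not the actual OpenSolaris space map code, just a toy model of the failure mode: if the allocator races and gives the same range to two writers, the second free can no longer find the segment, which is the kind of "ss != NULL" assertion seen in the panic above.

    /*
     * Toy model only -- not the real space_map.c.  One segment is freed
     * twice because the allocator raced and handed it to two consumers.
     */
    #include <assert.h>

    #define NSEGS 8

    struct seg {
            unsigned long start;
            unsigned long size;     /* size == 0 means "not in the map" */
    };

    static struct seg map[NSEGS] = {
            { 0x2b200000UL, 0x20000UL },    /* the one allocated segment */
    };

    static struct seg *
    find_seg(unsigned long start)
    {
            for (int i = 0; i < NSEGS; i++)
                    if (map[i].size != 0 && map[i].start == start)
                            return (&map[i]);
            return (NULL);
    }

    static void
    free_segment(unsigned long start)
    {
            struct seg *ss = find_seg(start);

            assert(ss != NULL);     /* the second free dies here */
            ss->size = 0;           /* remove the segment from the map */
    }

    int
    main(void)
    {
            free_segment(0x2b200000UL);     /* first consumer: fine */
            free_segment(0x2b200000UL);     /* second consumer: assertion fires */
            return (0);
    }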
Hi Everybody,

Over the last week many mails have been exchanged on this topic. I have a similar issue, and I would appreciate it if anyone could help me with it.

I have an IO test tool which writes data, reads it back, and then compares the read data with the written data. If they match, there is no corruption; otherwise there is corruption.

File data may get corrupted for many reasons, and one possible source is the file system cache. If the file system cache has issues, it can return wrong data to user applications (wrong data meaning the data actually on disk and the data the read call returns to the application do not match).

When there is corruption, to rule out file system cache issues, my application bypasses the file system cache and re-reads the data from the same file, then compares the re-read data with the written data.

Tell me, is there a way to skip the ZFS file system cache, or is there a way to do direct IO on a ZFS file system?

Regards
Masthan D
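For reference, a minimal sketch of that write/read-back/verify pattern is below. The file name, block size and block count are made-up example values. On UFS the re-read phase could be preceded by directio(3C) to bypass the page cache; as discussed earlier in this thread, ZFS has no equivalent yet, so on ZFS the second pass may still be served from the ARC.

    /*
     * Write/read-back/verify sketch.  File name, block size and count are
     * example values.  directio(3C) is honoured by UFS; ZFS currently has
     * no way to bypass its cache, so the re-read may come from the ARC.
     */
    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define BLKSZ   8192
    #define NBLKS   128

    int
    main(void)
    {
            char wbuf[BLKSZ], rbuf[BLKSZ];
            int fd, i;

            fd = open("/testpool/fs/verify.dat", O_RDWR | O_CREAT | O_DSYNC, 0644);
            if (fd < 0) {
                    perror("open");
                    return (1);
            }

            /* Phase 1: write a known pattern, one block at a time. */
            for (i = 0; i < NBLKS; i++) {
                    (void) memset(wbuf, 'A' + (i % 26), BLKSZ);
                    if (pwrite(fd, wbuf, BLKSZ, (off_t)i * BLKSZ) != BLKSZ) {
                            perror("pwrite");
                            return (1);
                    }
            }

    #ifdef DIRECTIO_ON
            /* Works on UFS; no effect on ZFS, which has no direct I/O yet. */
            (void) directio(fd, DIRECTIO_ON);
    #endif

            /* Phase 2: read everything back and compare against the pattern. */
            for (i = 0; i < NBLKS; i++) {
                    (void) memset(wbuf, 'A' + (i % 26), BLKSZ);
                    if (pread(fd, rbuf, BLKSZ, (off_t)i * BLKSZ) != BLKSZ) {
                            perror("pread");
                            return (1);
                    }
                    if (memcmp(wbuf, rbuf, BLKSZ) != 0) {
                            (void) fprintf(stderr, "corruption at block %d\n", i);
                            return (1);
                    }
            }

            (void) printf("verify OK\n");
            (void) close(fd);
            return (0);
    }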
Hi All,

Any update on this?

-Masthan D

dudekula mastan <d_mastan at yahoo.com> wrote:
> Tell me, is there a way to skip the ZFS file system cache, or is there
> a way to do direct IO on a ZFS file system?
> [...]
> Tell me, is there a way to skip the ZFS file system cache, or is there
> a way to do direct IO on a ZFS file system?

No, currently there is no way to disable the file system cache (the ARC) in ZFS. There is a pending RFE, though:

6429855 Need way to tell ZFS that caching is a lost cause

Cheers,
Vidya Sakar

dudekula mastan wrote:
> Hi All,
>
> Any update on this?
>
> -Masthan D
> [...]
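One partial workaround while that RFE is open: the ARC cannot be switched off, but it can be capped so it leaves room for the application's own buffer pool. A commonly cited tunable for recent Nevada and S10U4 builds (the 1 GB value below is only an example, and /etc/system changes need a reboot):

    * /etc/system -- cap the ZFS ARC at 1 GB (example value)
    set zfs:zfs_arc_max = 0x40000000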
Restarting this thread... I've just finished reading the article, "A look at MySQL on ZFS":
http://dev.mysql.com/tech-resources/articles/mysql-zfs.html

The section "MySQL Performance Comparison: ZFS vs. UFS on Open Solaris" looks interesting...

Rayson
Hi Tony,

John posted the URL to his article to the databases-discuss list this morning, and I have only taken a very quick look. Maybe you can join that list and discuss the configurations further?

http://mail.opensolaris.org/mailman/listinfo/databases-discuss

Rayson

On 10/29/07, Tony Leone <Tony.Leone at oag.state.ny.us> wrote:
> This is very interesting because it directly contradicts the results
> the ZFS developers are posting on the OpenSolaris mailing list. I just
> scanned the article; does he give his ZFS settings, and is he using
> separate ZIL devices?
>
> Tony Leone
>
> >>> "Rayson Ho" <rayrayson at gmail.com> 10/29/2007 11:39 AM >>>
> Restarting this thread... I've just finished reading the article, "A
> look at MySQL on ZFS":
> http://dev.mysql.com/tech-resources/articles/mysql-zfs.html
>
> The section "MySQL Performance Comparison: ZFS vs. UFS on Open
> Solaris" looks interesting...
>
> Rayson