Considering that one of ZFS's design goals is to address more than 2^64 bytes of storage capacity (and DVAs are already 128 bits for this reason), and that this should be possible even within a single top-level vdev, shouldn't the asize variable in the vdev label be 128 bits instead of the current 64?

This message posted from opensolaris.org
On Fri, Dec 23, 2005 at 11:04:15PM -0800, Andrew wrote:
> Considering that one of ZFS's design goals is to address more than
> 2^64 bytes of storage capacity (and DVAs are already 128 bits for this
> reason), and that this should be possible even within a single top
> level vdev, shouldn't the asize variable in the vdev label be 128 bits
> instead of the current 64?

If you look at blkptr_t, you'll see that the (roughly) 128 bits comes from up to 2^64 top-level vdevs of up to 2^64 bytes in size. So you can't have a top-level vdev that's bigger than 2^64 bytes. Since ASIZE represents the allocatable space from a given vdev, 64 bits is an appropriate size.

--Bill
> If you look at blkptr_t, you'll see that the (roughly) 128 bits comes
> from up to 2^64 top-level vdevs

Actually 2^32 vdevs (the vdev ID in the DVA is 32 bits), though that's not important here.

> of up to 2^64 bytes in size. So you
> can't have a top-level vdev that's bigger than 2^64 bytes.

The 63-bit offset in the DVA addresses 512-byte sectors, not bytes, so the DVA can actually address up to 2^72 bytes per vdev.

> Since ASIZE
> represents the allocatable space from a given vdev, 64 bits is an
> appropriate size.

Even if the DVA could only address 2^64 bytes per vdev, that would simply be an argument for increasing that limit too; assuming for the sake of argument that Sun is correct in its prediction that current growth rates will be sustained for another several decades, and given ZFS's design goal of not hitting any addressing limits within that period, a limit of 2^64 bytes per top-level vdev is too small, so any internal limit (whether it's the vdev label asize variable, or offset size in the DVA, or anything else) which imposes that limit should be increased.

It's true that with the current design, the limit can be worked around by using multiple top-level vdevs (giving a limit of 2^104 addressable bytes per pool), but that's an inadequate solution for the same reason that it's inadequate to work around a 2GB-per-file limit by splitting large files into multiple 2GB parts, as some contemporary filesystems require.

A generation ago, hard drives were on the order of 2^23 bytes. Today, they're 2^39 bytes. Extrapolate another couple of generations, as Sun is anticipating, and the 2^64 byte barrier is blown.

(Aside from all that, I think that the ZFS team should just abandon all fixed-size ints and use bignums instead, but I know that the programmers are just going to laugh at me when they read this.)
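The arithmetic in the message above can be checked with a short sketch. It takes the field widths exactly as stated in the thread (a 32-bit vdev ID, a 63-bit offset counted in 512-byte sectors, a 64-bit asize) and assumes, for the extrapolation, that each drive "generation" adds the same 16 doublings as the 2^23 to 2^39 step. The constant and function names are illustrative only and are not taken from the ZFS source.

```python
# Field widths as stated in the thread; names are illustrative, not ZFS identifiers.
VDEV_ID_BITS = 32   # vdev ID field in the DVA
OFFSET_BITS = 63    # offset field in the DVA, counted in sectors
SECTOR_SHIFT = 9    # 512-byte sectors (2**9 bytes)
ASIZE_BITS = 64     # vdev label asize field, counted in bytes


def log2_bytes_per_vdev(offset_bits=OFFSET_BITS, sector_shift=SECTOR_SHIFT):
    """log2 of the bytes a single DVA offset can address within one vdev."""
    return offset_bits + sector_shift


def log2_bytes_per_pool(vdev_id_bits=VDEV_ID_BITS):
    """log2 of the bytes addressable across all top-level vdevs of a pool."""
    return vdev_id_bits + log2_bytes_per_vdev()


def log2_drive_size_after(generations, today=39, previous=23):
    """Extrapolate drive size (as log2 bytes), assuming every generation
    repeats the growth seen between `previous` and `today`."""
    return today + generations * (today - previous)


if __name__ == "__main__":
    per_vdev = log2_bytes_per_vdev()
    print(f"per vdev: 2^{per_vdev} bytes")
    print(f"per pool: 2^{log2_bytes_per_pool()} bytes")
    print(f"DVA reach exceeds asize by a factor of 2^{per_vdev - ASIZE_BITS}")
    print(f"drive size after 2 more generations: 2^{log2_drive_size_after(2)} bytes")
```

Under these assumptions the DVA reaches 2^72 bytes per vdev and 2^104 per pool, while two further generations of the same growth put a single drive at 2^71 bytes, which is past what a 64-bit byte count can describe.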
Richard Elling
2005-Dec-25 21:17 UTC
[zfs-discuss] Re: vdev label asize variable is too small
> Even if the DVA could only address 2^64 bytes per
> vdev, that would simply be an argument for increasing
> that limit too; assuming for the sake of argument
> that Sun is correct in its prediction that current
> growth rates will be sustained for another several
> decades, and given ZFS's design goal of not hitting
> any addressing limits within that period, a limit of
> 2^64 bytes per top level vdev is too small, so any
> internal limit (whether it's the vdev label asize
> variable, or offset size in the DVA, or anything
> else) which imposes that limit should be increased.

I don't necessarily disagree, but moving forward, my crystal ball says that we won't be using 512-byte blocks in a few years. A 4 KB block size seems to be the consensus next step for spinning rust. For those who are looking at that part of the source code in detail, please verify that we're not assuming 512-byte (disk) blocks.

-- richard
James C. McPherson
2005-Dec-28 20:15 UTC
[zfs-discuss] Re: vdev label asize variable is too small
Andrew wrote:
...
> (Aside from all that, I think that the ZFS team should just abandon all
> fixed-size ints and use bignums instead, but I know that the programmers
> are just going to laugh at me when they read this.)

Eeeek! lisp in the kernel! mega-evilness awaits.

surely that's more the response you were expecting? :)

Even bignums have limits though.

cheers,
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems
Andrew
2006-Jan-04 16:22 UTC
[zfs-discuss] Lisp in the kernel (was: Re: vdev label asize variable is too small)
James C. McPherson wrote:
> Even bignums have limits though.

Yes. The limit is the amount of available virtual memory.

Maybe I'm just still feeling ill at all those programmers who claimed that they fixed y2k bugs when all they really did was convert them to y10k bugs, not to mention the programmers who created the y2038 bug, but I'm the kind of guy who gets rankled when "uuencode -p /dev/dsk/c0d0t0 | xargs mkdir" doesn't work, and "touch -t 99999999999999999999999999999999999999999912312359.59 foo" doesn't work.

I do understand that there's no such thing as a Turing machine, but since I have to put up with the constraints of hardware that's just a finite state machine, I ought to at least be able to fully utilize the resources of that machine, without my software imposing arbitrary additional constraints. I've got 100 godzillion bytes of storage space and virtual memory available, but my software is going to blow up in the year 10k for lack of a measly fifth byte in a fixed-size ASCII-encoded date field? Puh-leaze.

I don't expect really big numbers and strings to necessarily always work as efficiently as little ones, but the system shouldn't stubbornly tell me "No! Not possible!" when I try to use them at all. That attitude is just so... Soviet.

Excuse me while I go look for my marbles.
Andrew
2006-Jan-04 16:44 UTC
[zfs-discuss] Haskell in the kernel (was: Re: vdev label asize variable is too small)
James C. McPherson wrote:
> Eeeek! lisp in the kernel! mega-evilness awaits.
> surely that's more the response you were expecting?

See http://www.cse.ogi.edu/~hallgren/ICFP2005/

"A Principled Approach to Operating System Construction in Haskell"

Abstract: "We describe a monadic interface to low-level hardware features that is a suitable basis for building operating systems in Haskell. The interface includes primitives for controlling memory management hardware, user-mode process execution, and low-level device I/O. The interface enforces memory safety in nearly all circumstances. Its behavior is specified in part by formal assertions written in a programming logic called P-Logic. The interface has been implemented on bare IA32 hardware using the Glasgow Haskell Compiler (GHC) runtime system. We show how a variety of simple O/S kernels can be constructed on top of the interface, including a simple separation kernel and a demonstration system in which the kernel, window system, and all device drivers are written in Haskell."
Andrew wrote:
> James C. McPherson wrote:
>> Eeeek! lisp in the kernel! mega-evilness awaits. surely that's more
>> the response you were expecting?
> See http://www.cse.ogi.edu/~hallgren/ICFP2005/ "A Principled Approach
> to Operating System Construction in Haskell"
> Abstract: "We describe a monadic interface to low-level hardware
> features that is a suitable basis for building operating systems in
> Haskell. The interface includes primitives for controlling memory
> management hardware, user-mode process execution, and low-level
> device I/O. The interface enforces memory safety in nearly all
> circumstances. Its behavior is specified in part by formal assertions
> written in a programming logic called P-Logic. The interface has been
> implemented on bare IA32 hardware using the Glasgow Haskell Compiler
> (GHC) runtime system. We show how a variety of simple O/S kernels can
> be constructed on top of the interface, including a simple separation
> kernel and a demonstration system in which the kernel, window system,
> and all device drivers are written in Haskell."

Andrew, in your previous post you said you were going looking for your marbles. Might I politely suggest (*snigger*) that your knowing that this article exists proves that you've lost them completely? :)

Of course, those who first thought of a kernel _written_ in Haskell should increase their medication....

cheers!
James C. McPherson
--
Solaris Datapath Engineering
Data Management Group
Sun Microsystems