thr3ads.net - Lustre devel - [Lustre-devel] Filesystem as a Database? [Nov 2008]


Daire Byrne

2008-Nov-11 17:03 UTC

[Lustre-devel] Filesystem as a Database?

Hi,

I'm curious if there is any interest in building a more DB-like
interface into Lustre so that fast queries can be performed on the filesystem
and things like file versioning could be recorded. We currently use an in-house
digital asset DB (DAB) which essentially uses a SQL database to version latest
"releases" of files and record dependencies between files stored on
the filesystem. Is the upcoming "Changelogs" feature a basic DB of
sorts already?

Our current asset DB uses special hidden dirs on the filesystem to
"store" all the files and a separate SQL DB is used to record all
their asset metadata and relationships - it might be nice one day to only need
the filesystem. No doubt this is somewhat outside Lustre's mission
statement but I thought I'd mention it! If nothing else it might be
nice to be able to record simple metadata in files (e.g. EAs) and be able to
search the filesystem quickly for files with certain attributes. And if OSSs
could simultaneously search their own OSTs/DBs then it would be pretty scalable.

Regards,

Daire

Eric Barton

2008-Nov-11 22:11 UTC

[Lustre-devel] Filesystem as a Database?

Daire,
> I'm curious if there is any interest in building a more DB-like
> interface into Lustre so that fast queries can be performed on the
> filesystem and things like file versioning could be recorded. We
> currently use an inhouse digital asset DB (DAB) which essentially
> uses a SQL database to version latest "releases" of files and record
> dependencies between files stored on the filesystem. Is the upcoming
> "Changelogs" feature a basic DB of sorts already?
Not in itself - but the changelog could be used as a feed for a database
that tracks the filesystem, and then you could run your general purpose
queries there.
> Our current asset DB uses special hidden dirs on the filesystem to
> "store" all the files and a separate SQL DB is used to record all
> their asset metadata and relationships - it might be nice one day to
> only need the filesystem. No doubt this is somewhat outside Lustre's
> mission statement but I thought I'd mention it! If nothing else it
> might be nice to be able to record simple metadata in files
> (e.g. EAs) and be able to search the filesystem quickly for files
> with certain attributes. And if OSSs could simultaneously search
> their own OSTs/DBs then it would be pretty scalable.
Indeed.  To keep with the design ideal of eliminating all scanning in
normal operation, fast querying like this relies on being able to build
and maintain an index on arbitrary file properties.  This is quite an
interesting challenge if it is not to interfere with regular filesystem
performance and makes at least the metadata server look much more like a
general purpose database than a posix namespace.  So in that respect it
does fall outside our current mission statement.  But as filesystems
scale up to trillions of files, even fully parallel scans of the namespace
will start to take unacceptably long and something like this could begin
to become a requirement.

    Cheers,
              Eric
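
To illustrate the changelog-as-feed idea in the message above, here is a
minimal sketch in Python. It is not taken from the thread and does not use
the real Lustre changelog format: the record layout (sequence number,
operation, path, attributes), the operation names and the table names are
all assumptions made for the example. It replays change records into a
SQLite table that is indexed by attribute, so a query such as "all files
with release=2" is answered from the index rather than by scanning the
namespace.

```python
import sqlite3

# Hypothetical change records: (sequence number, operation, path, attributes).
# This layout is illustrative only; it is NOT the Lustre changelog format.
FEED = [
    (1, "CREATE", "/assets/shot01/model.ma", {"release": "1", "owner": "daire"}),
    (2, "CREATE", "/assets/shot01/tex.tif", {"release": "1", "owner": "eric"}),
    (3, "SETATTR", "/assets/shot01/model.ma", {"release": "2"}),
    (4, "UNLINK", "/assets/shot01/tex.tif", {}),
]


def open_tracker():
    """Create an in-memory database that mirrors file attributes."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE attrs (path TEXT, name TEXT, value TEXT, "
        "PRIMARY KEY (path, name))"
    )
    # Index on (name, value) so attribute lookups avoid a full table scan.
    db.execute("CREATE INDEX attrs_by_value ON attrs (name, value)")
    # Remember how far the feed has been consumed, so replays are harmless.
    db.execute("CREATE TABLE progress (last_seq INTEGER)")
    db.execute("INSERT INTO progress VALUES (0)")
    return db


def apply_feed(db, records):
    """Apply change records newer than the last one already consumed."""
    (last_seq,) = db.execute("SELECT last_seq FROM progress").fetchone()
    for seq, op, path, attrs in records:
        if seq <= last_seq:
            continue  # already applied
        if op == "UNLINK":
            db.execute("DELETE FROM attrs WHERE path = ?", (path,))
        else:
            for name, value in attrs.items():
                db.execute(
                    "INSERT OR REPLACE INTO attrs VALUES (?, ?, ?)",
                    (path, name, value),
                )
        db.execute("UPDATE progress SET last_seq = ?", (seq,))
    db.commit()


def find(db, name, value):
    """Return the paths whose attribute `name` currently equals `value`."""
    rows = db.execute(
        "SELECT path FROM attrs WHERE name = ? AND value = ? ORDER BY path",
        (name, value),
    )
    return [row[0] for row in rows]
```

After `apply_feed(open_tracker(), FEED)`, a call like
`find(db, "release", "2")` consults only the tracking database. The cost
Eric describes does not go away in this arrangement; it moves into keeping
the index current as changes arrive.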

Brian J. Murrell

2008-Nov-11 22:24 UTC

[Lustre-devel] Filesystem as a Database?

On Tue, 2008-11-11 at 22:11 +0000, Eric Barton wrote:
>
> Not in itself - but the changelog could be used as a feed for a database
> that tracks the filesystem, and then you could run your general purpose
> queries there.
> ...
> Indeed.  To keep with the design ideal of eliminating all scanning in
> normal operation, fast querying like this relies on being able to build
> and maintain an index on arbitrary file properties.  This is quite an
> interesting challenge if it is not to interfere with regular filesystem
> performance and makes at least the metadata server look much more like a
> general purpose database than a posix namespace.  So in that respect it
> does fall outside our current mission statement.  But as filesystems
> scale up to trillions of files, even fully parallel scans of the namespace
> will start to take unacceptably long and something like this could begin
> to become a requirement.
Just as a datapoint, not really a suggestion to use either of them, but
this sounds an awful lot like what beagle and tracker aim to do for
smaller scale filesystems today.  Granted those two indexers are more
interested in content (i.e. indexing what's in files) than metadata
(which is what I'm, perhaps incorrectly, understanding you are more
interested in indexing) but there is nothing stopping anyone from adding
a backend to track file metadata and query it, if anyone were interested
in it.  In fact beagle at least does index some metadata like file
names, extensions, file/mime-type, etc.

What is interesting is that, by analogy or perhaps in contrast to our
changelogs, beagle (and probably tracker) uses the Linux inotify
interface to find out when filesystem state has changed.

b.
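
For readers unfamiliar with inotify, the following is a minimal sketch of
how an indexer might learn about changes through it. It is not taken from
beagle or tracker; the function names `parse_events` and `watch` are made
up for the example, and it calls the libc inotify functions through ctypes
because the Python standard library has no inotify module. It watches a
single directory on a local Linux filesystem, and the caller has to ask
for each directory it cares about.

```python
import ctypes
import ctypes.util
import os
import struct

# Event mask bits from <sys/inotify.h>.
IN_MODIFY = 0x00000002
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

# Header of struct inotify_event: int wd; uint32_t mask, cookie, len;
# It is followed by `len` bytes holding a NUL-padded file name.
HEADER = struct.Struct("iIII")


def parse_events(buf):
    """Decode bytes read from an inotify fd into (wd, mask, name) tuples."""
    events = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        wd, mask, _cookie, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        raw_name = buf[offset:offset + length].split(b"\0", 1)[0]
        offset += length
        events.append((wd, mask, os.fsdecode(raw_name)))
    return events


def watch(directory):
    """Yield (mask, name) for changes in one directory (Linux only)."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        wd = libc.inotify_add_watch(
            fd, os.fsencode(directory), IN_CREATE | IN_MODIFY | IN_DELETE
        )
        if wd < 0:
            raise OSError(ctypes.get_errno(), "inotify_add_watch failed")
        while True:
            # Blocks until the kernel reports at least one event.
            for _wd, mask, name in parse_events(os.read(fd, 4096)):
                yield mask, name
    finally:
        os.close(fd)
```

A consumer would loop over `watch("/some/dir")` and update its index for
each event. The events carry only a mask and a file name, so the indexer
still has to stat or read the file itself to learn what changed.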


Daire Byrne

2008-Nov-12 09:55 UTC

[Lustre-devel] Filesystem as a Database?

Eric/Brian,

Cheers for the replies. I was really just thinking out loud while trying to get
my head around the design of our new in-house asset database. I was imagining
what useful functionality there could be in mixing a filesystem and database
together. Sorry for spamming the devel list!

Daire

----- "Brian J. Murrell" <Brian.Murrell at Sun.COM> wrote:
> On Tue, 2008-11-11 at 22:11 +0000, Eric Barton wrote:
> >
> > Not in itself - but the changelog could be used as a feed for a database
> > that tracks the filesystem, and then you could run your general purpose
> > queries there.
> ...
> > Indeed.  To keep with the design ideal of eliminating all scanning in
> > normal operation, fast querying like this relies on being able to build
> > and maintain an index on arbitrary file properties.  This is quite an
> > interesting challenge if it is not to interfere with regular filesystem
> > performance and makes at least the metadata server look much more like a
> > general purpose database than a posix namespace.  So in that respect it
> > does fall outside our current mission statement.  But as filesystems
> > scale up to trillions of files, even fully parallel scans of the namespace
> > will start to take unacceptably long and something like this could begin
> > to become a requirement.
>
> Just as a datapoint, not really a suggestion to use either of them, but
> this sounds an awful lot like what beagle and tracker aim to do for
> smaller scale filesystems today.  Granted those two indexers are more
> interested in content (i.e. indexing what's in files) than metadata
> (which is what I'm, perhaps incorrectly, understanding you are more
> interested in indexing) but there is nothing stopping anyone from adding
> a backend to track file metadata and query it, if anyone were interested
> in it.  In fact beagle at least does index some metadata like file
> names, extensions, file/mime-type, etc.
>
> What is interesting is that, by analogy or perhaps in contrast to our
> changelogs, beagle (and probably tracker) uses the Linux inotify
> interface to find out when filesystem state has changed.
>
> b.
