Hi,

I'm curious if there is any interest in building a more DB-like interface into Lustre so that fast queries can be performed on the filesystem and things like file versioning could be recorded. We currently use an inhouse digital asset DB (DAB) which essentially uses a SQL database to version latest "releases" of files and record dependencies between files stored on the filesystem. Is the upcoming "Changelogs" feature a basic DB of sorts already?

Our current asset DB uses special hidden dirs on the filesystem to "store" all the files and a separate SQL DB is used to record all their asset metadata and relationships - it might be nice one day to only need the filesystem. No doubt this is somewhat outside Lustre's mission statement but I thought I'd mention it! If nothing else it might be nice to be able to record simple metadata in files (e.g. EAs) and be able to search the filesystem quickly for files with certain attributes. And if OSSs could simultaneously search their own OSTs/DBs then it would be pretty scalable.

Regards,
Daire
Daire,

> I'm curious if there is any interest in building a more DB-like interface into Lustre so that fast queries can be performed on the filesystem and things like file versioning could be recorded. We currently use an inhouse digital asset DB (DAB) which essentially uses a SQL database to version latest "releases" of files and record dependencies between files stored on the filesystem. Is the upcoming "Changelogs" feature a basic DB of sorts already?

Not in itself - but the changelog could be used as a feed for a database that tracks the filesystem, and then you could run your general purpose queries there.

> Our current asset DB uses special hidden dirs on the filesystem to "store" all the files and a separate SQL DB is used to record all their asset metadata and relationships - it might be nice one day to only need the filesystem. No doubt this is somewhat outside Lustre's mission statement but I thought I'd mention it! If nothing else it might be nice to be able to record simple metadata in files (e.g. EAs) and be able to search the filesystem quickly for files with certain attributes. And if OSSs could simultaneously search their own OSTs/DBs then it would be pretty scalable.

Indeed. To keep with the design ideal of eliminating all scanning in normal operation, fast querying like this relies on being able to build and maintain an index on arbitrary file properties. This is quite an interesting challenge if it is not to interfere with regular filesystem performance and makes at least the metadata server look much more like a general purpose database than a posix namespace. So in that respect it does fall outside our current mission statement. But as filesystems scale up to trillions of files, even fully parallel scans of the namespace will start to take unacceptably long and something like this could begin to become a requirement.

Cheers,
Eric
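As a rough illustration of the "changelog as a feed for a database" idea, here is a minimal Python sketch. It is only a sketch under stated assumptions: the record layout (record number, operation, path, attributes), the operation names, and the helpers `open_index`, `apply_records` and `find_files` are all hypothetical and are not the Lustre changelog format or API. It shows the two points made above: the external database is kept current by replaying change records rather than by scanning, and queries are answered from an index on attribute name and value.

```python
import sqlite3

# Hypothetical change records: (record_number, operation, path, attributes).
# This layout is an assumption for illustration, not the Lustre changelog format.
SAMPLE_RECORDS = [
    (1, "CREATE", "/assets/shot01/model.ma", {"version": "1", "status": "wip"}),
    (2, "CREATE", "/assets/shot01/texture.tif", {"version": "1", "status": "released"}),
    (3, "SETATTR", "/assets/shot01/model.ma", {"version": "2", "status": "released"}),
    (4, "UNLINK", "/assets/shot01/texture.tif", {}),
]


def open_index():
    """Create an in-memory database that mirrors per-file attributes."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE file_attrs ("
        " path TEXT, name TEXT, value TEXT,"
        " PRIMARY KEY (path, name))"
    )
    # Index on (name, value) so attribute queries avoid a full table scan.
    db.execute("CREATE INDEX attr_lookup ON file_attrs (name, value)")
    # Single-row table remembering the last record number already applied.
    db.execute(
        "CREATE TABLE progress ("
        " id INTEGER PRIMARY KEY CHECK (id = 1), last_record INTEGER)"
    )
    db.execute("INSERT INTO progress VALUES (1, 0)")
    db.commit()
    return db


def apply_records(db, records):
    """Replay change records into the index, skipping ones already applied."""
    (last_seen,) = db.execute("SELECT last_record FROM progress").fetchone()
    with db:  # commit all changes together, or roll back on error
        for recno, op, path, attrs in records:
            if recno <= last_seen:
                continue
            if op == "UNLINK":
                db.execute("DELETE FROM file_attrs WHERE path = ?", (path,))
            else:
                for name, value in attrs.items():
                    db.execute(
                        "INSERT OR REPLACE INTO file_attrs VALUES (?, ?, ?)",
                        (path, name, value),
                    )
            db.execute("UPDATE progress SET last_record = ?", (recno,))


def find_files(db, name, value):
    """Return the paths whose attribute `name` currently equals `value`."""
    rows = db.execute(
        "SELECT path FROM file_attrs WHERE name = ? AND value = ? ORDER BY path",
        (name, value),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    index = open_index()
    apply_records(index, SAMPLE_RECORDS)
    print(find_files(index, "status", "released"))
```

Tracking the last applied record number means the feed can be replayed after a restart without applying a change twice. Keeping the index outside the filesystem also means the query load lands on the database rather than on the metadata server.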
On Tue, 2008-11-11 at 22:11 +0000, Eric Barton wrote:
>
> Not in itself - but the changelog could be used as a feed for a database that tracks the filesystem, and then you could run your general purpose queries there.
...
> Indeed. To keep with the design ideal of eliminating all scanning in normal operation, fast querying like this relies on being able to build and maintain an index on arbitrary file properties. This is quite an interesting challenge if it is not to interfere with regular filesystem performance and makes at least the metadata server look much more like a general purpose database than a posix namespace. So in that respect it does fall outside our current mission statement. But as filesystems scale up to trillions of files, even fully parallel scans of the namespace will start to take unacceptably long and something like this could begin to become a requirement.

Just as a datapoint, not really a suggestion to use either of them, but this sounds an awful lot like what beagle and tracker aim to do for smaller scale filesystems today. Granted, those two indexers are more interested in content (i.e. indexing what's in files) than metadata (which is what I'm, perhaps incorrectly, understanding you are more interested in indexing), but there is nothing stopping anyone from adding a backend to track file metadata and query it, if anyone was interested in it. In fact beagle at least does index some metadata like file names, extensions, file/mime-type, etc.

What is interesting is that in correlation or perhaps contrast to our changelogs, beagle (and probably tracker) use the Linux inotify interface to find out when filesystem state has changed.

b.
Eric/Brian,

Cheers for the replies. I was really just thinking out loud while trying to get my head around the design of our new inhouse asset database. I was imagining what useful functionality there could be in mixing a filesystem and database together. Sorry for spamming the devel list!

Daire

----- "Brian J. Murrell" <Brian.Murrell at Sun.COM> wrote:
> On Tue, 2008-11-11 at 22:11 +0000, Eric Barton wrote:
> >
> > Not in itself - but the changelog could be used as a feed for a database that tracks the filesystem, and then you could run your general purpose queries there.
> ...
> > Indeed. To keep with the design ideal of eliminating all scanning in normal operation, fast querying like this relies on being able to build and maintain an index on arbitrary file properties. This is quite an interesting challenge if it is not to interfere with regular filesystem performance and makes at least the metadata server look much more like a general purpose database than a posix namespace. So in that respect it does fall outside our current mission statement. But as filesystems scale up to trillions of files, even fully parallel scans of the namespace will start to take unacceptably long and something like this could begin to become a requirement.
>
> Just as a datapoint, not really a suggestion to use either of them, but this sounds an awful lot like what beagle and tracker aim to do for smaller scale filesystems today. Granted, those two indexers are more interested in content (i.e. indexing what's in files) than metadata (which is what I'm, perhaps incorrectly, understanding you are more interested in indexing), but there is nothing stopping anyone from adding a backend to track file metadata and query it, if anyone was interested in it. In fact beagle at least does index some metadata like file names, extensions, file/mime-type, etc.
>
> What is interesting is that in correlation or perhaps contrast to our changelogs, beagle (and probably tracker) use the Linux inotify interface to find out when filesystem state has changed.
>
> b.