On Sat, Dec 3, 2011 at 10:33 PM, Douglas Gregor <dgregor at apple.com> wrote:
> Hi Manuel,
>
> On Nov 28, 2011, at 2:49 AM, Manuel Klimek wrote:
>
>> Hi,
>>
>> while working on tooling on top of clang/llvm we found the file system abstractions in clang/llvm to be one of the points that could be nicer to integrate with. I’m writing this mail to propose a strawman and get some feedback on what you guys think the right way forward is (or whether we should just leave things as they are).
>>
>> First, the FileManager we have in clang has helped us a lot for our tooling - when we run clang in a mapreduce we don’t need to lay out files on a disk, we can just map files into memory and happily clang over them. We’re also using the same mechanism to map builtin includes; in short, the FileManager has made it possible to do clang at scale.
>>
>> Now we’re aware that it was not really the intention of the FileManager to allow doing the things we do with it: not every module in clang uses the FileManager, and the moment we hit llvm there is no FileManager at all. For example, in the case of the Driver we hack around the fact that the header search tries to access the file system directly in rather brittle ways, relying on implementation details and #ifdefs.
>>
>> So why not make FileManager a more principled (and still blazing fast) file system abstraction?
>
> Yes, please!

Great :) /me jumps right into the design discussion then.

> Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess.
>
>> Pro:
>> - only one interface for developers to learn on the project (no more PathV1 vs PathV2 vs FileManager)
>> - only one implementation (per-platform) for easier maintenance of the file system platform abstraction
>> - one point to insert synchronization guarantees for tools / IDE integration that wants to run clang in multiple threads at once (for example when re-indexing on 12-ht-core machines)
>> - being able to replay compilations by injecting a virtual file system that exactly “copies” the original file system’s content, which allows easy scaling of replays, running tools against dirty edit buffers on a lower level than the SourceManager, and unit testing
>
> … and making sure that all of the various stages of compilation see the same view of the file system.
>
>> Con:
>> - there would be yet another try at unifying the APIs which would be in an intermediate state while being worked on (and PathV1 vs PathV2 is already bad enough)
>
> I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already.
>
>> - making it the canonical file system interface is a lot of effort that requires touching a lot of systems (while we’re volunteering to do the work, it will probably eat up other people’s time, too)
>
> I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well.
>
>> What parts (if any) of this type of transition make sense?
>> 1. Figure out the “correct” interface we’d want for FileManager to be more generally useful
>> 2. Change FileManager to that interface
>> 3. Sink FileManager into llvm, so it can be used by other projects
>> 4. Use it throughout clang
>> 5. Use it throughout llvm
>> We don’t need to do all of them at once, and should be able to evaluate the results along the way.
>
> I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly-shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away.
>
> I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness.

So, as I noted in my reply to Daniel, after working through llvm/Support (and bringing FileManager back to mind) I think I'm actually seeing a way forward; tell me if I'm crazy:
1. Morph FileSystem (I don't know whether that would include PathV2, but I currently don't think so) into a class that exports a nice interface for all FileSystem functions that we can override. To be able to do that step by step, we could, for example, introduce a static FileSystem pointer that is initialized with the default system file system on startup (I like being able to do baby steps).
2. Add methods to FileSystem to support opening MemoryBuffers. The path forward will be to move all calls to MemoryBuffer::get*File through the FileSystem interface, but again that can be handled incrementally.
3. At that point we'd have enough in FileSystem to rebase FileManager on top of it. Once 1 and 2 are finished for clang/.*, we'll be able to completely move the virtual file support over into a nice OverlayFileSystem implementation (argh, I've coded too many of those in my life).
4. Add methods to FileSystem to support opening raw_fd_ostreams; this basically mirrors the process for reading.

Thoughts? Completely broken approach? Broken order?

On a different note, switching to the SourceManager topic: I know enough about SourceManager to be dangerous, but not enough to ever claim I understand the crazy buffer management that's going on in ContentCache :) So I'd need a lot of help to pry that box open eventually. Currently I'd think that this can be done in a subsequent step after the file system is sorted out, but I might be wrong...

Cheers,
/Manuel
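Steps 1 and 2 of this plan can be sketched in a few dozen lines. This is a hypothetical, self-contained illustration rather than LLVM's actual API: `vfs_sketch::FileSystem`, `RealFileSystem`, `InMemoryFileSystem`, `getFileSystem()` and `setFileSystem()` are made-up names, and `std::string` stands in for `llvm::MemoryBuffer`. The in-memory implementation shows how a tool could inject file contents for a replayed compilation or a unit test.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <system_error>
#include <utility>

namespace vfs_sketch {

// Step 1 (hypothetical): an abstract interface whose operations can be overridden.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  // Step 2: read a whole file into memory. std::string stands in for
  // llvm::MemoryBuffer so that the sketch compiles on its own.
  virtual std::error_code getBufferForFile(const std::string &Path,
                                           std::string &Result) = 0;
};

// Default implementation that reads from the real disk. Simplified: every
// open failure is reported as "no such file".
class RealFileSystem : public FileSystem {
public:
  std::error_code getBufferForFile(const std::string &Path,
                                   std::string &Result) override {
    std::ifstream In(Path, std::ios::binary);
    if (!In)
      return std::make_error_code(std::errc::no_such_file_or_directory);
    std::ostringstream Contents;
    Contents << In.rdbuf();
    Result = Contents.str();
    return {};
  }
};

// In-memory implementation, e.g. for replaying a compilation or unit tests.
class InMemoryFileSystem : public FileSystem {
  std::map<std::string, std::string> Files;

public:
  void addFile(const std::string &Path, std::string Contents) {
    Files[Path] = std::move(Contents);
  }
  std::error_code getBufferForFile(const std::string &Path,
                                   std::string &Result) override {
    auto It = Files.find(Path);
    if (It == Files.end())
      return std::make_error_code(std::errc::no_such_file_or_directory);
    Result = It->second;
    return {};
  }
};

// The "baby step" from step 1: one process-wide file system, initialized to
// the real disk, which a tool can replace before running the compiler.
inline std::shared_ptr<FileSystem> &currentFileSystem() {
  static std::shared_ptr<FileSystem> FS = std::make_shared<RealFileSystem>();
  return FS;
}

inline FileSystem &getFileSystem() { return *currentFileSystem(); }

inline void setFileSystem(std::shared_ptr<FileSystem> FS) {
  assert(FS && "the global file system must not be null");
  currentFileSystem() = std::move(FS);
}

} // namespace vfs_sketch
```

A process-wide pointer keeps the migration incremental, as proposed above, at the price of global state: every client sees whichever file system was installed last, so a tool would presumably install its own once, before any lookups happen.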
On Sun, Dec 4, 2011 at 9:06 AM, Manuel Klimek <klimek at google.com> wrote:
> Thoughts? Completely broken approach? Broken order?

Just for some background on why we have PathV2: in my quest to improve Windows support across LLVM and Clang, I ran into many issues with the way PathV1 worked. A few were:

* PathV1, and most of LLVM, use std::string to handle errors. This makes code more verbose than needed and loses OS-level error information.
* PathV1 makes it difficult to handle Unicode on Windows. Although apparently I didn't solve the problem correctly either :P.
* PathV1 requires constructing a Path object before calling any functions. This is inefficient when most of the time you have something StringRef'able.

Thus when I designed PathV2 I made it stateless and UTF-8 only, and used error_code.

I bring this up because, while I support a VFS, I want to make sure that we keep in mind the reasons PathV2 was created while writing it.

The PathV1 -> PathV2 transition stopped because I ran out of time to do it. There's so much code that uses PathV1, and some of the changes are nontrivial in the cases where the Path class is stored and accessed in many places instead of just being used to access the path functions.

The approach and order seem good to me. The llvm::sys::path parts can stay separate; only the llvm::sys::fs parts need to be virtualized.

- Michael Spencer
Hi Manuel,

On Sun, Dec 4, 2011 at 9:06 AM, Manuel Klimek <klimek at google.com> wrote:
> So, as I noted in my reply to Daniel, after working through llvm/Support (and bringing FileManager back to mind) I think I'm actually seeing a way forward; tell me if I'm crazy:

The following seems like a good plan and breakdown to me. And +1 on working in baby steps.

> 1. Morph FileSystem (I don't know whether that would include PathV2, but I currently don't think so) into a class that exports a nice interface for all FileSystem functions that we can override. To be able to do that step by step, we could, for example, introduce a static FileSystem pointer that is initialized with the default system file system on startup (I like being able to do baby steps).
> 2. Add methods to FileSystem to support opening MemoryBuffers. The path forward will be to move all calls to MemoryBuffer::get*File through the FileSystem interface, but again that can be handled incrementally.
> 3. At that point we'd have enough in FileSystem to rebase FileManager on top of it. Once 1 and 2 are finished for clang/.*, we'll be able to completely move the virtual file support over into a nice OverlayFileSystem implementation (argh, I've coded too many of those in my life).

Is this true (enough support to base FileManager on top)? I'm specifically thinking about some of the places we look at inodes. Or are you expecting to expose some kind of abstracted representation of an inode?

> 4. Add methods to FileSystem to support opening raw_fd_ostreams; this basically mirrors the process for reading.
>
> Thoughts? Completely broken approach? Broken order?
>
> On a different note, switching to the SourceManager topic: I know enough about SourceManager to be dangerous, but not enough to ever claim I understand the crazy buffer management that's going on in ContentCache :) So I'd need a lot of help to pry that box open eventually. Currently I'd think that this can be done in a subsequent step after the file system is sorted out, but I might be wrong...

I'd stay away from SourceManager. I would hope that the VFS layer doesn't interact with SourceManager, or only minimally (although SourceManager is also aware of inodes, which is sad).

- Daniel
On Dec 5, 2011, at 9:04 PM, Daniel Dunbar <daniel at zuster.org> wrote:

> I'd stay away from SourceManager. I would hope that the VFS layer doesn't interact with SourceManager, or only minimally (although SourceManager is also aware of inodes, which is sad).

SourceManager has some code for overriding on-disk files with alternative buffers and for detecting when the underlying file system has changed from underneath us. That functionality should eventually move into FileSystem.
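Both pieces of SourceManager functionality mentioned here, and the OverlayFileSystem from step 3, can be illustrated with a small sketch. All names (`overlay_sketch::Status`, `InMemoryFileSystem`, `OverlayFileSystem`, `hasChanged`) are hypothetical; treating a `(size, modification time)` pair as "the file changed" is an assumed simplification, and `std::string` again stands in for a memory buffer.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <system_error>
#include <utility>

namespace overlay_sketch {

// Hypothetical status record used for change detection.
struct Status {
  std::uint64_t Size = 0;
  std::int64_t ModTime = 0;
  bool operator==(const Status &O) const {
    return Size == O.Size && ModTime == O.ModTime;
  }
  bool operator!=(const Status &O) const { return !(*this == O); }
};

class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual std::error_code status(const std::string &Path, Status &Result) = 0;
  virtual std::error_code getBuffer(const std::string &Path,
                                    std::string &Result) = 0;
};

// Map-backed file system; plays both the "disk" and the "unsaved buffers".
class InMemoryFileSystem : public FileSystem {
  struct Entry {
    std::string Contents;
    std::int64_t ModTime;
  };
  std::map<std::string, Entry> Files;

public:
  void setFile(const std::string &Path, std::string Contents,
               std::int64_t ModTime) {
    Files[Path] = Entry{std::move(Contents), ModTime};
  }
  std::error_code status(const std::string &Path, Status &Result) override {
    auto It = Files.find(Path);
    if (It == Files.end())
      return std::make_error_code(std::errc::no_such_file_or_directory);
    Result.Size = It->second.Contents.size();
    Result.ModTime = It->second.ModTime;
    return {};
  }
  std::error_code getBuffer(const std::string &Path,
                            std::string &Result) override {
    auto It = Files.find(Path);
    if (It == Files.end())
      return std::make_error_code(std::errc::no_such_file_or_directory);
    Result = It->second.Contents;
    return {};
  }
};

// Lookups hit the upper layer (e.g. override buffers) first and fall through
// to the lower layer only when the upper layer has no such file.
class OverlayFileSystem : public FileSystem {
  std::shared_ptr<FileSystem> Upper;
  std::shared_ptr<FileSystem> Lower;

public:
  OverlayFileSystem(std::shared_ptr<FileSystem> U, std::shared_ptr<FileSystem> L)
      : Upper(std::move(U)), Lower(std::move(L)) {}
  std::error_code status(const std::string &Path, Status &Result) override {
    std::error_code EC = Upper->status(Path, Result);
    if (EC != std::errc::no_such_file_or_directory)
      return EC; // found in the upper layer, or a genuine error
    return Lower->status(Path, Result);
  }
  std::error_code getBuffer(const std::string &Path,
                            std::string &Result) override {
    std::error_code EC = Upper->getBuffer(Path, Result);
    if (EC != std::errc::no_such_file_or_directory)
      return EC;
    return Lower->getBuffer(Path, Result);
  }
};

// Change detection: compare a previously recorded status with the current one.
inline bool hasChanged(FileSystem &FS, const std::string &Path,
                       const Status &Recorded) {
  Status Now;
  if (FS.status(Path, Now))
    return true; // a file that vanished counts as changed
  return Now != Recorded;
}

} // namespace overlay_sketch
```

With both layers behind the same interface, a client holding only a `FileSystem &` cannot tell whether a buffer came from disk or from an override, which is the property that would let this logic live below SourceManager.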
On Tue, Dec 6, 2011 at 6:04 AM, Daniel Dunbar <daniel at zuster.org> wrote:> Hi Manuel, > > On Sun, Dec 4, 2011 at 9:06 AM, Manuel Klimek <klimek at google.com> wrote: >> On Sat, Dec 3, 2011 at 10:33 PM, Douglas Gregor <dgregor at apple.com> wrote: >>> Hi Manuel, >>> >>> On Nov 28, 2011, at 2:49 AM, Manuel Klimek wrote: >>> >>>> Hi, >>>> >>>> while working on tooling on top of clang/llvm we found the file system >>>> abstractions in clang/llvm to be one of the points that could be nicer >>>> to integrate with. I’m writing this mail to propose a strawman and get >>>> some feedback on what you guys think the right way forward is (or >>>> whether we should just leave things as they are). >>>> >>>> First, the FileManager we have in clang has helped us a lot for our >>>> tooling - when we run clang in a mapreduce we don’t need to lay out >>>> files on a disk, we can just map files into memory and happily clang >>>> over them. We’re also using the same mechanism to map builtin >>>> includes; in short, the FileManager has made it possible to do clang >>>> at scale. >>>> >>>> Now we’re aware that it was not really the intention of the >>>> FileManager to allow doing the things we do with it: not every module >>>> in clang uses the FileManager, and the moment we hit llvm there is no >>>> FileManager at all. For example, in case of the Driver we hack around >>>> the fact that the header search tries to access the file system >>>> driectly in rather brittle ways, relying on implementation details and >>>> #ifdefs. >>>> >>>> So why not make FileManager a more principled (and still blazing fast) >>>> file system abstraction? >>> >>> Yes, please! >> >> Great :) /me jumps right into the design discussion then. >> >>> Having a proper virtual file system across Clang and LLVM would be a huge boon, especially for pushing Clang into more applications that aren't simply "grab stuff from the local file system." 
The current FileManager/SourceManager dance used to provide in-memory content for a (virtual or old) file is quite the mess. >>> >>>> Pro: >>>> - only one interface for developers to learn on the project (no more >>>> PathV1 vs PathV2 vs FileManager) >>>> - only one implementation (per-platform) for easier maintenance of the >>>> file system platform abstraction >>>> - one point to insert synchronization guarantees for tools / IDE >>>> integration that wants to run clang in multiple threads at once (for >>>> example when re-indexing on 12-ht-core machines) >>>> - being able to replay compilations by injecting a virtual file system >>>> that exactly “copies” the original file system’s content, which allows >>>> easy scaling of replays, running tools against dirty edit buffers on a >>>> lower level than the SourceManager and unit testing >>> >>> … and making sure that all of the various stages of compilation see the same view of the file system. >>> >>>> Con: >>>> - there would be yet another try at unifying the APIs which would be >>>> in an intermediate state while being worked on (and PathV1 vs PathV2 >>>> is already bad enough) >>> >>> I'm fine with intermediate states so long as the direction and benefits are clear. The former we can certainly discuss, and the latter is obvious already. >>> >>>> - making it the canonical file system interface is a lot of effort >>>> that requires touching a lot of systems (while we’re volunteering to >>>> do the work, it will probably eat up other people’s time, too) >>> >>> I doubt I'll have much time to directly hack on this, but I'll be happy to review / discuss / help with adoption. libclang is one of the huge beneficiaries of such a change, so I care a lot about getting that to work well. >>> >>>> What parts (if any) of this type of transition makes sense? >>>> 1. Figure out the “correct” interface we’d want for FileManager to be >>>> more generally useful >>>> 2. Change FileManager to that interface >>>> 4. 
Sink FileManager into llvm, so it can be used by other projects >>>> 4. Use it throughout clang >>>> 5. Use it throughout llvm >>>> We don’t need to do all of them at once, and should be able to >>>> evaluate the results along the way. >>> >>> I share some of Daniel's concern about re-using FileManager, because the interface is very narrowly designed for Clang's usage and some of the functionality intended for the VFS is split out into SourceManager. My advice would be to start building a new VFS down in LLVM, and make FileManager an increasingly shrinking interface on top of the new VFS. At some point, FileManager will be thin enough that its clients can just switch directly over to using the VFS, and FileManager can eventually go away. >>> >>> I do realize that this could end up like PathV1 vs. PathV2, where both exist for a while, but the benefits of the VFS should outweigh our collective laziness. >> >> So, as I noted in my reply to Daniel, after working through >> llvm/Support (and bringing FileManager back to my mind) I think I'm >> actually seeing a way forward, tell me if I'm crazy: > > The following seems like a good plan and breakdown to me. And +1 on > working baby-steps. > >> 1. morph FileSystem (I don't know whether that would include PathV2, >> but I currently don't think so) into a class that exports a nice >> interface for all FileSystem functions that we can override; to be >> able to do that step-by-step, we could for example introduce a static >> FileSystem pointer that is initialized with the default system file >> system on startup (I like being able to do baby-steps) >> 2. add methods to FileSystem to support opening MemoryBuffers; the >> path forward will be to move all calls to MemoryBuffer::get*File >> through the FileSystem interface, but again that can be handled >> incrementally >> 3.
at that point we'd have enough stuff in FileSystem to rebase >> FileManager on top of it; once 1 and 2 are finished for clang/.* we'll >> be able to completely move the virtual file support over into a nice >> OverlayFileSystem implementation (argh, I've coded too many of those >> in my life); > > Is this true (enough support to base FileManager on top)? I'm > specifically thinking about some of the places we look at inodes. Or > are you expecting to expose some kind of abstracted representation of > an inode? Exactly. FileSystem already supports that implicitly; we just need to export it in a sensible way. >> 4. add methods to FileSystem to support opening raw_fd_ostreams; this >> is basically the process for reading, mirrored >> >> Thoughts? Completely broken approach? Broken order? >> >> On a different note, switching to the SourceManager topic - I know >> enough about SourceManager to be dangerous but not enough to ever >> claim I would have understood the crazy buffer management that's going >> on in ContentCache :) So I'd need a lot of help to pry that box open >> eventually. Currently I'd think that this can be done in a subsequent >> step after the file system is sorted out, but I might be wrong... > > I'd stay away from SourceManager. I would hope that the VFS layer stuff > doesn't interact (or only minimally) with SourceManager (although > SourceManager is also aware of inodes, which is sad). I think the concept of unique system-wide file IDs makes sense (and as Douglas said, we can probably push most of that stuff down from SourceManager once the FileSystem is providing all the hooks we need), and I'm confident we can express that in an OS-independent way. Cheers, /Manuel > > - Daniel > >> >> Cheers, >> /Manuel >> >> _______________________________________________ >> LLVM Developers mailing list >> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
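A minimal C++ sketch of the pieces in steps 1 to 3 of the plan: one overridable `FileSystem` interface, whole-file buffer loading, an in-memory implementation, an overlay for virtual files, a transitional process-wide pointer that defaults to the disk, and an opaque `UniqueID` standing in for raw inodes. All names are hypothetical and chosen for illustration (they are not LLVM or Clang API); a `std::string` stands in for `MemoryBuffer`, and the disk-backed `getUniqueID` is deliberately left unimplemented to keep the sketch portable.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <map>
#include <memory>
#include <string>
#include <system_error>
#include <utility>
#include <vector>

// Hypothetical sketch: every name here is illustrative, not LLVM/Clang API.
inline std::error_code noSuchFile() {
  return std::make_error_code(std::errc::no_such_file_or_directory);
}

// OS-independent stand-in for an inode: equal IDs mean "same file".
struct UniqueID {
  unsigned long long Device = 0, File = 0;
  bool operator==(const UniqueID &O) const { return Device == O.Device && File == O.File; }
};

// Step 1: one overridable interface for all file system access.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual std::error_code getUniqueID(const std::string &Path, UniqueID &Out) = 0;
  // Step 2: read a whole file (a std::string stands in for a MemoryBuffer).
  virtual std::error_code getBuffer(const std::string &Path, std::string &Out) = 0;
};

// Default implementation backed by the disk.
class RealFileSystem : public FileSystem {
public:
  std::error_code getUniqueID(const std::string &, UniqueID &) override {
    // A real version would query the OS (e.g. stat() on POSIX); omitted here.
    return std::make_error_code(std::errc::function_not_supported);
  }
  std::error_code getBuffer(const std::string &Path, std::string &Out) override {
    std::ifstream In(Path, std::ios::binary);
    if (!In)
      return noSuchFile(); // any open failure is reported as "missing", for brevity
    Out.assign(std::istreambuf_iterator<char>(In), std::istreambuf_iterator<char>());
    return {};
  }
};

// Files mapped into memory, e.g. for replays, dirty edit buffers or tests.
class InMemoryFileSystem : public FileSystem {
  unsigned long long Dev, NextFile = 1;
  std::map<std::string, std::pair<UniqueID, std::string>> Files;
  const std::pair<UniqueID, std::string> *find(const std::string &Path) const {
    auto It = Files.find(Path);
    return It == Files.end() ? nullptr : &It->second;
  }
public:
  explicit InMemoryFileSystem(unsigned long long Device) : Dev(Device) {}
  void addFile(const std::string &Path, std::string Contents) {
    Files[Path] = std::make_pair(UniqueID{Dev, NextFile++}, std::move(Contents));
  }
  std::error_code getUniqueID(const std::string &Path, UniqueID &Out) override {
    if (auto *F = find(Path)) { Out = F->first; return {}; }
    return noSuchFile();
  }
  std::error_code getBuffer(const std::string &Path, std::string &Out) override {
    if (auto *F = find(Path)) { Out = F->second; return {}; }
    return noSuchFile();
  }
};

// Step 3: layers are searched top-down; the first layer that has the file wins.
class OverlayFileSystem : public FileSystem {
  std::vector<std::shared_ptr<FileSystem>> Layers; // back() is the top layer
  template <typename Fn> std::error_code lookup(Fn F) {
    for (auto It = Layers.rbegin(); It != Layers.rend(); ++It) {
      std::error_code EC = F(**It);
      if (EC != std::errc::no_such_file_or_directory)
        return EC; // found, or a real error that should not be masked
    }
    return noSuchFile();
  }
public:
  void pushLayer(std::shared_ptr<FileSystem> FS) { Layers.push_back(std::move(FS)); }
  std::error_code getUniqueID(const std::string &Path, UniqueID &Out) override {
    return lookup([&](FileSystem &FS) { return FS.getUniqueID(Path, Out); });
  }
  std::error_code getBuffer(const std::string &Path, std::string &Out) override {
    return lookup([&](FileSystem &FS) { return FS.getBuffer(Path, Out); });
  }
};

// Transitional, process-wide pointer that defaults to the disk (step 1).
FileSystem *&currentFileSystem() {
  static RealFileSystem Real;
  static FileSystem *Current = &Real;
  return Current;
}
void setFileSystem(FileSystem *FS) { assert(FS && "must not be null"); currentFileSystem() = FS; }
```

The global pointer here is a function-local static, initialized on first use rather than strictly at startup, which sidesteps static initialization order problems. Keeping it as a default rather than a requirement fits the baby-steps approach: call sites can be moved over one at a time, and a `FileSystem&` passed explicitly could replace the global later.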
On Tue, Dec 6, 2011 at 2:11 AM, Michael Spencer <bigcheesegs at gmail.com> wrote: > On Sun, Dec 4, 2011 at 9:06 AM, Manuel Klimek <klimek at google.com> wrote: >> So, as I noted in my reply to Daniel, after working through >> llvm/Support (and bringing FileManager back to my mind) I think I'm >> actually seeing a way forward, tell me if I'm crazy: >> 1. morph FileSystem (I don't know whether that would include PathV2, >> but I currently don't think so) into a class that exports a nice >> interface for all FileSystem functions that we can override; to be >> able to do that step-by-step, we could for example introduce a static >> FileSystem pointer that is initialized with the default system file >> system on startup (I like being able to do baby-steps) >> 2. add methods to FileSystem to support opening MemoryBuffers; the >> path forward will be to move all calls to MemoryBuffer::get*File >> through the FileSystem interface, but again that can be handled >> incrementally >> 3.
at that point we'd have enough stuff in FileSystem to rebase >> FileManager on top of it; once 1 and 2 are finished for clang/.* we'll >> be able to completely move the virtual file support over into a nice >> OverlayFileSystem implementation (argh, I've coded too many of those >> in my life); >> 4. add methods to FileSystem to support opening raw_fd_ostreams; this >> is basically the process for reading, mirrored >> >> Thoughts? Completely broken approach? Broken order? >> >> On a different note, switching to the SourceManager topic - I know >> enough about SourceManager to be dangerous but not enough to ever >> claim I would have understood the crazy buffer management that's going >> on in ContentCache :) So I'd need a lot of help to pry that box open >> eventually. Currently I'd think that this can be done in a subsequent >> step after the file system is sorted out, but I might be wrong... >> >> Cheers, >> /Manuel > > Just for some background about why we have PathV2. > > In my quest to improve Windows support across LLVM and Clang I ran > into many issues with the way PathV1 worked. A few were: > * PathV1, and most of LLVM, use std::string to handle errors. This > makes code more verbose than needed, and loses OS-level error > information. > * PathV1 makes it difficult to handle Unicode on Windows. Although > apparently I didn't solve the problem correctly either :P. Are there open bugs? A quick search for Unicode on llvm.org/bugs didn't show anything Windows-specific. > * PathV1 requires constructing a Path object before calling any > functions. This is inefficient when most of the time you have > something StringRef'able. > > Thus when I designed PathV2 I made it stateless, UTF-8 only, and used > error_code. > > The reason I bring this up is that I support a VFS; however, I want > to make sure that we keep in mind the reasons PathV2 was created while > writing it. Yep, that's an important point.
As I said, I've looked into PathV2, and I really like the distinction between path manipulation and file system access, and the general design of both PathV2 and Support/FileSystem. > The PathV1 -> PathV2 transition stopped because I ran out of time to do > it. There's so much code that uses it, and some of the changes are non-trivial > in the cases where the Path class is stored and accessed in many > places instead of just used to access the path functions. > > The approach and order seem good to me. The llvm::sys::path parts can > stay separate; only the llvm::sys::fs parts need to be virtualized. Yep, that was exactly my thought. Thanks for confirming and providing all the background information! :) Cheers, /Manuel
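The split agreed on here, with stateless path manipulation left as plain functions and only file system access virtualized, can be sketched as follows. The `path` and `fs` namespaces, `FileSystem::exists` and `FakeFileSystem` are hypothetical names loosely modeled on the description of PathV2 in this thread (stateless, UTF-8 only, errors reported as `error_code`); they are not the actual `llvm::sys::path` or `llvm::sys::fs` API, and only '/' is treated as a separator for brevity.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <string_view>
#include <system_error>
#include <utility>

// Hypothetical sketch: not the real llvm::sys::path / llvm::sys::fs API.
// Only '/' is treated as a separator, for brevity.

namespace path {
// Pure, stateless string manipulation on UTF-8 text. Splitting on '/' is safe
// because that byte never occurs inside a multi-byte UTF-8 sequence. Nothing
// here touches the disk, so nothing here needs to be virtualized.
inline std::string_view filename(std::string_view P) {
  auto Pos = P.find_last_of('/');
  return Pos == std::string_view::npos ? P : P.substr(Pos + 1);
}
inline std::string_view parent_path(std::string_view P) {
  auto Pos = P.find_last_of('/');
  return Pos == std::string_view::npos ? std::string_view() : P.substr(0, Pos);
}
inline std::string append(std::string_view Dir, std::string_view Name) {
  std::string R(Dir);
  if (!R.empty() && R.back() != '/')
    R += '/';
  R += Name;
  return R;
}
} // namespace path

namespace fs {
// Only operations that touch the file system go through an overridable
// interface. Failures come back as std::error_code (not a std::string
// message), so callers can test for specific conditions.
class FileSystem {
public:
  virtual ~FileSystem() = default;
  virtual std::error_code exists(std::string_view Path, bool &Result) = 0;
};

// A fake file system for tests: a fixed set of existing paths.
class FakeFileSystem : public FileSystem {
  std::set<std::string, std::less<>> Paths; // std::less<> allows string_view lookup

public:
  void add(std::string P) { Paths.insert(std::move(P)); }
  std::error_code exists(std::string_view Path, bool &Result) override {
    Result = Paths.find(Path) != Paths.end();
    return {};
  }
};
} // namespace fs
```

Because the `path` helpers take and return string views, a caller that already holds a plain string (the "StringRef'able" case above) never has to construct a Path object first.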