I foresee problems with this on both Windows and non-Windows. A typical libc implementation has a lot of internal state that is shared across API boundaries in a way that is considered an implementation detail. So making assumptions about which state is shared and which isn't is going to be a problem.

How do you guarantee that if you implement method A and forward method B, that B will behave the same as it would have if you had forwarded A also? It might not even work at all. Where can you safely draw this boundary?

Users can set errno for example, and in many cases they must set errno to 0 before invoking a call if they want to reliably detect an error. So let's say they set errno to 0, then call a method which our libc implementation decides to forward. What do we do? We could propagate errno on every single call, but my point is that there are going to be a ton of subtle issues that arise from this approach that are hard to foresee, precisely because the implementation details of a libc implementation are supposed to be just that - implementation details.

On Tue, Jun 25, 2019 at 5:01 PM Siva Chandra <sivachandra at google.com> wrote:
>
> On Tue, Jun 25, 2019 at 4:32 PM Zachary Turner <zturner at roblox.com> wrote:
>>
>> The main concern I have is that Windows is so different from
>> everything else that there is a high likelihood of decisions being
>> baked in early on that make things very difficult for people to come
>> along later and contribute a Windows implementation. This happened
>> with sanitizers for example (lack of support for weak functions on
>> Windows), LLDB (posix api calls scattered throughout the codebase),
>> and I worry with libc it will be even more difficult to correctly
>> design the abstraction because we have to deal with executable file
>> format, syscalls, operating system loaders, and various linkage
>> models.
>>
>> The most immediate thing I think we will run into is that you
>> mentioned wanting this to take shape as something that sits in between
>> system libc and application. Given that Windows' libc and other
>> versions of libc are so different, I expect this to lead to some
>> interesting problems.
>>
>> Can you elaborate more on how you envision this working with llvm libc
>> in between application and system libc?
>
>
> A typical application uses a large number of pieces from a libc. But, it is not practical to have everything implemented and ready in a new libc from day one. So for that phase, when the new libc is still being built, we want the unimplemented parts of the new libc to essentially redirect to the system libc. This brings two benefits:
>
> 1. We can build the new libc in a gradual manner.
> 2. Applications stay operational while gaining the benefits of the new implementations.
>
> Do you foresee any problems with this approach on Windows?
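
For readers less familiar with the errno convention raised in this message, here is a minimal sketch in C. The helper name parse_overflows is hypothetical and not from the thread. strtol can legitimately return LONG_MAX or LONG_MIN, so the caller has to clear errno before the call and inspect it afterwards; that is the caller-visible state a forwarding libc would need to keep consistent.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): returns 1 when strtol
   reports that `text` is out of range for a long, 0 otherwise. */
static int parse_overflows(const char *text) {
    errno = 0;  /* clear any stale error left by an earlier call */
    long value = strtol(text, NULL, 10);
    /* LONG_MAX and LONG_MIN are also valid results, so errno is the
       only reliable signal that the conversion actually overflowed. */
    return (value == LONG_MAX || value == LONG_MIN) && errno == ERANGE;
}
```

If the write of 0 and the later read of ERANGE ended up touching two different errno objects, one per libc, this check would silently stop working.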
On 6/25/19 7:22 PM, Zachary Turner via llvm-dev wrote:
> I foresee problems with this on both Windows and non-Windows. A
> typical libc implementation has a lot of internal state that is shared
> across API boundaries in a way that is considered an implementation
> detail. So making assumptions about which state is shared and which
> isn't is going to be a problem.
>
> How do you guarantee that if you implement method A and forward method
> B, that B will behave the same as it would have if you had forwarded A
> also? It might not even work at all. Where can you safely draw this
> boundary?
>
> Users can set errno for example, and in many cases they must set errno
> to 0 before invoking a call if they want to reliably detect an error.
> So let's say they set errno to 0, then call a method which our libc
> implementation decides to forward. What do we do? We could propagate
> errno on every single call, but my point is that there are going to be
> a ton of subtle issues that arise from this approach that are hard to
> foresee, precisely because the implementation details of a libc
> implementation are supposed to be just that - implementation details.

You certainly can't mix-and-match on a per-function level, in general. I suspect that there are some subsystems that can be substituted. Using open from one libc and close from another seems problematic. Using open and close from one libc and qsort from another is probably fine. And, as you point out, the library might need to be configurable to use an externally-provided errno.

 -Hal

>
> On Tue, Jun 25, 2019 at 5:01 PM Siva Chandra <sivachandra at google.com> wrote:
>> On Tue, Jun 25, 2019 at 4:32 PM Zachary Turner <zturner at roblox.com> wrote:
>>> The main concern I have is that Windows is so different from
>>> everything else that there is a high likelihood of decisions being
>>> baked in early on that make things very difficult for people to come
>>> along later and contribute a Windows implementation. This happened
>>> with sanitizers for example (lack of support for weak functions on
>>> Windows), LLDB (posix api calls scattered throughout the codebase),
>>> and I worry with libc it will be even more difficult to correctly
>>> design the abstraction because we have to deal with executable file
>>> format, syscalls, operating system loaders, and various linkage
>>> models.
>>>
>>> The most immediate thing I think we will run into is that you
>>> mentioned wanting this to take shape as something that sits in between
>>> system libc and application. Given that Windows' libc and other
>>> versions of libc are so different, I expect this to lead to some
>>> interesting problems.
>>>
>>> Can you elaborate more on how you envision this working with llvm libc
>>> in between application and system libc?
>>
>> A typical application uses a large number of pieces from a libc. But, it is not practical to have everything implemented and ready in a new libc from day one. So for that phase, when the new libc is still being built, we want the unimplemented parts of the new libc to essentially redirect to the system libc. This brings two benefits:
>>
>> 1. We can build the new libc in a gradual manner.
>> 2. Applications stay operational while gaining the benefits of the new implementations.
>>
>> Do you foresee any problems with this approach on Windows?
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
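
To illustrate why a function like qsort looks like a safe candidate for substitution, here is a rough sketch in C of a sort routine with qsort's signature. The names sketch_sort and compare_ints are hypothetical, and insertion sort is used only for brevity; it is not the algorithm any particular libc uses. The point is that the routine reads and writes nothing except the caller's buffer, whereas open and close both depend on descriptor state owned by whichever libc and kernel interface created it.

```c
#include <stddef.h>

/* Hypothetical stand-in for qsort (not a real libc routine): a simple
   insertion sort that touches only the caller's buffer and keeps no
   global or per-library state. */
static void sketch_sort(void *base, size_t count, size_t size,
                        int (*compare)(const void *, const void *)) {
    unsigned char *bytes = base;
    for (size_t i = 1; i < count; ++i) {
        /* Move element i to the left until it is in order. */
        for (size_t j = i; j > 0; --j) {
            unsigned char *left = bytes + (j - 1) * size;
            unsigned char *right = bytes + j * size;
            if (compare(left, right) <= 0)
                break;
            /* Swap the two elements one byte at a time. */
            for (size_t k = 0; k < size; ++k) {
                unsigned char tmp = left[k];
                left[k] = right[k];
                right[k] = tmp;
            }
        }
    }
}

/* Hypothetical comparator for int elements. */
static int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Because nothing here depends on allocator, stream, or descriptor tables, it should not matter which libc supplies the rest of the program, which is the property that makes per-subsystem substitution plausible.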
> On 6/25/19 7:22 PM, Zachary Turner via llvm-dev wrote:
> > I foresee problems with this on both Windows and non-Windows. A
> > typical libc implementation has a lot of internal state that is shared
> > across API boundaries in a way that is considered an implementation
> > detail. So making assumptions about which state is shared and which
> > isn't is going to be a problem.

+1 for what Hal Finkel has said below about switching from redirectors to implementations: There will be certain groups of functions which will have to be switched all together. We will not be able to do it one function at a time for such groups.

> > How do you guarantee that if you implement method A and forward method
> > B, that B will behave the same as it would have if you had forwarded A
> > also? It might not even work at all. Where can you safely draw this
> > boundary?

Are you talking about a scenario wherein the implementation of B in the system libc calls its A? If yes, most libc implementations do a good job of using internal names in such scenarios. That is, B would call A with an internal name. This ensures that B from the system libc calls A also from the system libc and not the redirector/forwarder.

> > Users can set errno for example, and in many cases they must set errno
> > to 0 before invoking a call if they want to reliably detect an error.
> > So let's say they set errno to 0, then call a method which our libc
> > implementation decides to forward. What do we do? We could propagate
> > errno on every single call, but my point is that there are going to be
> > a ton of subtle issues that arise from this approach that are hard to
> > foresee, precisely because the implementation details of a libc
> > implementation are supposed to be just that - implementation details.

Dealing with errno in particular is probably not as nasty as it seems. The standard allows errno to be a macro. Hence, for the transitional phase, implementations and redirectors in our libc can make use of the errno from the system libc. Something like this:

$> cat llvm-errno.cpp
#include <errno.h> // This is the system-libc header file
int *__llvm_errno() { return &errno; }

$> cat errno.h
// This is the llvm libc's errno.h
int *__llvm_errno();
#define errno (*__llvm_errno())

On Tue, Jun 25, 2019 at 6:20 PM Finkel, Hal J. <hfinkel at anl.gov> wrote:
> You certainly can't mix-and-match on a per-function level, in general. I
> suspect that there are some subsystems that can be substituted. Using
> open from one libc and close from another seems problematic. Using open
> and close from one libc and qsort from another is probably fine. And, as
> you point out, the library might need to be configurable to use an
> externally-provided errno.
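
The errno shim proposed in this message can be tried out in a single C file. This is only a sketch: it collapses the two files from the message into one translation unit, adds an #undef that the two-file layout would not need, and uses a hypothetical caller named overflow_reported. It assumes the system errno.h defines errno as a macro that expands to a modifiable lvalue, which the C standard requires.

```c
#include <errno.h>   /* the system libc's errno.h */
#include <stdlib.h>

/* Role of llvm-errno.cpp: at this point `errno` is still the system
   libc's macro, so this returns the address of the system errno for
   the calling thread. */
int *__llvm_errno(void) { return &errno; }

/* Role of the new libc's errno.h: from here on, `errno` goes through
   the accessor. The #undef is needed only because this sketch keeps
   both halves in one file. */
#undef errno
#define errno (*__llvm_errno())

/* Hypothetical caller: clears errno through the new macro, calls a
   function that is forwarded to the system libc, then reads errno
   through the new macro again. */
static int overflow_reported(void) {
    errno = 0;
    (void)strtol("99999999999999999999999999999999", NULL, 10);
    return errno == ERANGE;
}
```

Since both the redirected call and the caller resolve to the same storage, the "set errno to 0, then call a forwarded function" scenario behaves as it would with the system libc alone. The sketch says nothing about other shared state, such as stream buffers or allocator metadata.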