Hummm ok, so the list doesn't allow posting by non-members ...

---------- Forwarded message ----------
Date: Tue, 19 Oct 2004 12:45:01 -0700
From: Roland McGrath <roland@redhat.com>
To: Rik van Riel <riel@redhat.com>
Cc: xen-devel@lists.sourceforge.net, Jakub Jelinek <jakub@redhat.com>
Subject: Re: NPTL/TLS "emulation" idea

> A few weeks ago Roland, Jakub and myself brainstormed
> about this problem. One of the things that came up is
> that the positive (glibc private data) and -ve (TLS)
> data are not generally used at the same time.

Well, that's not really true. Small positive offsets are used all the time
(every syscall, for example, and all of pthreads internals). Negative
offsets are used for actual ELF TLS accesses (__thread variables), which
now include `errno' in the standard glibc build. So depending on your code
one or the other might be most common, but you are unlikely ever to have a
program run that doesn't flip back and forth a fair bit. I really don't
have any clue what the fault-segment-flip-resume overhead vs the
fault-emulate-resume overhead is. You'd just have to test it out.

I am still brainstorming about this, but I will need to do some experiments
to figure out how some other funny ways of using segments actually work.

-------------------------------------------------------
This SF.net email is sponsored by: IT Product Guide on ITManagersJournal
Use IT products in your business? Tell us what you think of them. Give us
Your Opinions, Get Free ThinkGeek Gift Certificates! Click to find out more
http://productguide.itmanagersjournal.com/guidepromo.tmpl
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel

> > A few weeks ago Roland, Jakub and myself brainstormed
> > about this problem. One of the things that came up is
> > that the positive (glibc private data) and -ve (TLS)
> > data are not generally used at the same time.
>
> Well, that's not really true. Small positive offsets are used all the time
> (every syscall, for example, and all of pthreads internals). Negative
> offsets are used for actual ELF TLS accesses (__thread variables), which
> now include `errno' in the standard glibc build. So depending on your code
> one or the other might be most common, but you are unlikely ever to have a
> program run that doesn't flip back and forth a fair bit. I really don't
> have any clue what the fault-segment-flip-resume overhead vs the
> fault-emulate-resume overhead is. You'd just have to test it out.
>
> I am still brainstorming about this, but I will need to do some experiments
> to figure out how some other funny ways of using segments actually work.

Yes, so the answer is that we 'flip' about as often as the current
code emulates (e.g., about 2.5 million flips/emulations to boot a Red
Hat system).

The performance is very bad, but the flipping code is both simpler and
more robust than emulation so I will go with the new technique.

But I will still print a warning message from Linux to tell the user to
remove /lib/tls.

 -- Keir

On Wed, 20 Oct 2004, Keir Fraser wrote:

> Yes, so the answer is that we 'flip' about as often as the current
> code emulates (e.g., about 2.5 million flips/emulations to boot a Red
> Hat system).
>
> The performance is very bad, but the flipping code is both simpler and
> more robust than emulation so I will go with the new technique.

How bad is the performance? A 10% performance penalty, 30%?

> But I will still print a warning message from Linux to tell the user to
> remove /lib/tls.

I've heard that this will actually break some things, like
db4 locking and the RPM database consistency...

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

> On Wed, 20 Oct 2004, Keir Fraser wrote:
>
> > Yes, so the answer is that we 'flip' about as often as the current
> > code emulates (e.g., about 2.5 million flips/emulations to boot a Red
> > Hat system).
> >
> > The performance is very bad, but the flipping code is both simpler and
> > more robust than emulation so I will go with the new technique.
>
> How bad is the performance? A 10% performance penalty, 30% ?

My benchmark is 'time /bin/ls -R /usr/lib >/dev/null' with a warm
buffer cache.

With no /lib/tls this takes ~180ms. With emulation it takes
~300ms. With the new technique it's ~390ms -- so about a further 30%
slowdown, or 115% slowdown overall.

The extra cost is due to the fact that we fault nearly twice as often
because -ve and +ve accesses seem pretty neatly interleaved. So we
fault on all GS accesses, rather than just the -ve ones. :-(

> > But I will still print a warning message from Linux to tell the user to
> > remove /lib/tls.
>
> I've heard that this will actually break some things, like
> db4 locking and the RPM database consistency...

Strange: do the non /lib/tls libraries deal with thread-local state in
some incompatible way?

One fix is to distribute an alternative /lib/tls that is built with
'virtualisation-happy' GS accesses: e.g.,

    mov %%gs:0,%0
    mov <offset>(%0),%1

So we get the same externally-observable semantics (db4 and so on
shouldn't break) but this won't cause Xen to choke.

This may need to be one part of a general move to having two versions
of many executables. GCC now defaults to producing -ve accesses to
thread-local state --- if lots of apps start using the new 'thread'
keyword then this is going to cause problems for Xen unless versions
are built with the appropriate GCC command-line switch to produce
virtualisation-happy code (no -ve accesses).

 -- Keir

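For readers who want to see why the two-instruction sequence above avoids the fault, here is a toy Python model. It rests on an assumption the thread does not spell out: that Xen truncates segment limits to keep guests out of its own address range, so a negative offset (which wraps to a very large unsigned offset) lands beyond the GS limit. XEN_START, THREAD_PTR and all function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
XEN_START = 0xFC000000      # assumed start of the hypervisor's reserved region
THREAD_PTR = 0x40001000     # assumed thread pointer (GS base) for one thread


class SegmentLimitFault(Exception):
    """Raised when an offset lies beyond the segment limit."""


def linear_address(base, limit, offset):
    """Translate segment:offset to a 32-bit linear address, checking the limit."""
    unsigned_offset = offset & MASK32   # -8 becomes 0xFFFFFFF8
    if unsigned_offset > limit:
        raise SegmentLimitFault(hex(unsigned_offset))
    return (base + unsigned_offset) & MASK32


# Truncated limits so that neither segment can reach XEN_START or above.
GS_BASE, GS_LIMIT = THREAD_PTR, XEN_START - 1 - THREAD_PTR
DS_BASE, DS_LIMIT = 0, XEN_START - 1

# The word at %gs:0 holds the thread pointer itself.
memory = {THREAD_PTR: THREAD_PTR}


def direct_access(offset):
    """Model of `mov %gs:<offset>, reg`."""
    return linear_address(GS_BASE, GS_LIMIT, offset)


def two_step_access(offset):
    """Model of `mov %gs:0, tmp` followed by `mov <offset>(tmp), reg`."""
    tmp = memory[linear_address(GS_BASE, GS_LIMIT, 0)]
    return linear_address(DS_BASE, DS_LIMIT, (tmp + offset) & MASK32)
```

In this model direct_access(-8) raises SegmentLimitFault, while two_step_access(-8) resolves to the address just below the thread pointer through the flat data segment, which is the same location the direct access was aiming for.
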
On Wed, 20 Oct 2004, Keir Fraser wrote:

> > How bad is the performance? A 10% performance penalty, 30% ?
>
> My benchmark is 'time /bin/ls -R /usr/lib >/dev/null' with a warm
> buffer cache.
>
> With no /lib/tls this takes ~180ms. With emulation it takes
> ~300ms. With the new technique it's ~390ms -- so about a further 30%
> slowdown, or 115% slowdown overall.

Considering how system heavy this workload is, that's
probably not even that bad.

> The extra cost is due to the fact that we fault nearly twice as often
> because -ve and +ve accesses seem pretty neatly interleaved. So we
> fault on all GS accesses, rather than just the -ve ones. :-(

IIRC the glibc private data is accessed once per system
call, or possibly on both system call entrance and exit.

Less system heavy tasks probably do not have an overhead
as bad as ls -R.

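The percentages quoted above can be checked with a few lines of Python. The three timings are the approximate figures from the benchmark; the helper name is just for illustration.

```python
def slowdown_percent(baseline_ms, measured_ms):
    """Extra time relative to the baseline, as a percentage."""
    return 100.0 * (measured_ms - baseline_ms) / baseline_ms


# Approximate timings quoted in the thread (milliseconds).
no_tls, emulation, flipping = 180.0, 300.0, 390.0

print(f"emulation vs no /lib/tls: {slowdown_percent(no_tls, emulation):.0f}%")
print(f"flipping vs emulation:    {slowdown_percent(emulation, flipping):.0f}%")
print(f"flipping vs no /lib/tls:  {slowdown_percent(no_tls, flipping):.0f}%")
```

With these rounded inputs the further slowdown comes out at 30% and the overall slowdown at about 117%, in line with the "about 115%" given for the approximate measurements.
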
> > With no /lib/tls this takes ~180ms. With emulation it takes
> > ~300ms. With the new technique it's ~390ms -- so about a further 30%
> > slowdown, or 115% slowdown overall.
>
> Considering how system heavy this workload is, that's
> probably not even that bad.

200ms of pure CPU overhead is abysmal!

> > The extra cost is due to the fact that we fault nearly twice as often
> > because -ve and +ve accesses seem pretty neatly interleaved. So we
> > fault on all GS accesses, rather than just the -ve ones. :-(
>
> IIRC the glibc private data is accessed once per system
> call, or possibly on both system call entrance and exit.
>
> Less system heavy tasks probably do not have an overhead
> as bad as ls -R.

Just booting a minimal Fedora Core distribution takes 2.5 *million*
faults. The slowdown is noticeable.

More feasible long-term solutions include:

 1. Modify the ABI to disallow -ve accesses [sounds like this
    possibility is vetoed by Ulrich Drepper].

 2. Provide alternative apps/libraries that do not cause -ve accesses.

 3. If both FS and GS are reserved for glibc, we could indeed have one
    for +ve accesses and one for -ve accesses. This could be
    implemented either in user space --- i.e., rewrite glibc, and
    possibly gcc, to use both registers --- or by binary rewriting in
    the kernel (but problems with the patches getting committed to
    disc!!).

 -- Keir

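The "nearly twice as often" observation can be reproduced with a toy fault counter in Python. This is only a sketch of the two schemes as described in this thread, not Xen's actual code: under emulation every -ve access faults, while under flipping the segment serves one sign at a time and any access of the other sign faults and flips it. The traces and function names are invented for illustration.

```python
def emulation_faults(trace):
    """Segment stays fixed: every negative-offset access faults and is emulated."""
    return sum(1 for offset in trace if offset < 0)


def flipping_faults(trace, start_positive=True):
    """Segment serves one sign at a time: an access of the other sign
    faults, and the segment is flipped to serve that sign."""
    serving_positive = start_positive
    faults = 0
    for offset in trace:
        is_positive = offset >= 0
        if is_positive != serving_positive:
            faults += 1
            serving_positive = is_positive
    return faults


# Made-up traces of GS-relative offsets.
interleaved = [16, -8] * 1000            # +ve and -ve accesses alternate
clustered = [16] * 1000 + [-8] * 1000    # all +ve accesses, then all -ve

for name, trace in (("interleaved", interleaved), ("clustered", clustered)):
    print(name, emulation_faults(trace), flipping_faults(trace))
```

On the neatly interleaved trace, flipping faults on all but the first access, roughly double the emulation count; on the clustered trace it faults once. Under option 3, with one register per sign, neither kind of access would fault in this model.
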
On Wed, 20 Oct 2004, Keir Fraser wrote:

> More feasible long-term solutions include:
>
> 1. Modify the ABI to disallow -ve accesses [sounds like this
>    possibility is vetoed by Ulrich Drepper].
>
> 2. Provide alternative apps/libraries that do not cause -ve accesses.

4. Provide an alternative libc that does the +ve accesses (which are
   libc private, afaik) in another segment. This does not break the
   ABI for userland programs and -ve accesses aren't that bad when
   there are no +ve accesses in the same segment.

On Wed, Oct 20, 2004 at 12:55:22PM -0400, Rik van Riel wrote:

> 4. Provide an alternative libc that does the +ve accesses (which are
>    libc private, afaik) in another segment. This does not break the
>    ABI for userland programs and -ve accesses aren't that bad when
>    there are no +ve accesses in the same segment.

No, the ABI uses -ve accesses and %gs:0 (4 bytes there), +ve accesses
above +4 are glibc private.

	Jakub

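The split described above can be written down as a small Python classifier. It only restates the boundaries given in this message (negative offsets and the 4 bytes at %gs:0 belong to the ABI, offsets from +4 upward are glibc private); the function name and the label strings are illustrative.

```python
def classify_gs_offset(offset):
    """Say which side of the ABI boundary a %gs-relative byte offset falls on."""
    if offset < 0:
        return "ABI: -ve access (ELF TLS)"
    if offset < 4:
        return "ABI: the 4 bytes at %gs:0"
    return "glibc private"


for offset in (-8, 0, 3, 4, 16):
    print(f"%gs:{offset:+d} -> {classify_gs_offset(offset)}")
```

This shows why moving only the +ve accesses to another segment is not enough: the word at %gs:0 is part of the ABI and must stay reachable through the same segment as the -ve accesses.
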
> On Wed, Oct 20, 2004 at 12:55:22PM -0400, Rik van Riel wrote:
> > 4. Provide an alternative libc that does the +ve accesses (which are
> >    libc private, afaik) in another segment. This does not break the
> >    ABI for userland programs and -ve accesses aren't that bad when
> >    there are no +ve accesses in the same segment.
>
> No, the ABI uses -ve accesses and %gs:0 (4 bytes there), +ve accesses
> above +4 are glibc private.

Could we duplicate %gs:0 at %gs:-4 and update the ABI? Or is the ABI
now set in stone?

 -- Keir

> 4. Provide an alternative libc that does the +ve accesses (which are
>    libc private, afaik) in another segment. This does not break the
>    ABI for userland programs and -ve accesses aren't that bad when
>    there are no +ve accesses in the same segment.

There'd be very little technical reason not to adopt this as the
default libc.

Unless this was likely to happen, we might be better off using simple
dynamic binary rewriting to transform the standard libc into "+ve
offsets use fs". The only down side is the pain that's involved in
stopping the patched versions getting committed to disk by the
pre-linker...

Ian

On Wed, Oct 20, 2004 at 06:27:39PM +0100, Keir Fraser wrote:

> > On Wed, Oct 20, 2004 at 12:55:22PM -0400, Rik van Riel wrote:
> > > 4. Provide an alternative libc that does the +ve accesses (which are
> > >    libc private, afaik) in another segment. This does not break the
> > >    ABI for userland programs and -ve accesses aren't that bad when
> > >    there are no +ve accesses in the same segment.
> >
> > No, the ABI uses -ve accesses and %gs:0 (4 bytes there), +ve accesses
> > above +4 are glibc private.
>
> Could we duplicate %gs:0 at %gs:-4 and update the ABI? Or is the ABI
> now set in stone?

The ABI is there for several years, used e.g. in Solaris as well
and is used already in several libraries, not just glibc.
The ABI is not going to change for the sake of emulators.

	Jakub

Jakub Jelinek wrote:
> On Wed, Oct 20, 2004 at 06:27:39PM +0100, Keir Fraser wrote:
>>>On Wed, Oct 20, 2004 at 12:55:22PM -0400, Rik van Riel wrote:
>>>
>>>> 4. Provide an alternative libc that does the +ve accesses (which are
>>>>    libc private, afaik) in another segment. This does not break the
>>>>    ABI for userland programs and -ve accesses aren't that bad when
>>>>    there are no +ve accesses in the same segment.
>>>
>>>No, the ABI uses -ve accesses and %gs:0 (4 bytes there), +ve accesses
>>>above +4 are glibc private.
>>
>>Could we duplicate %gs:0 at %gs:-4 and update the ABI? Or is the ABI
>>now set in stone?
>
> The ABI is there for several years, used e.g. in Solaris as well
> and is used already in several libraries, not just glibc.
> The ABI is not going to change for the sake of emulators.

This aspect of the TLS ABI has also been a problem for L4, although for a
different reason (they had already used %gs:0 for something else). I can't
immediately see any solutions arising from that discussion, but in case it
gives anyone else an idea, see
<http://lists.ira.uka.de/pipermail/l4ka/2004-March/000874.html>.

Looks like the same problem is also going to come up for the AMD64 port:
<http://lists.freebsd.org/pipermail/freebsd-threads/2004-March/001850.html>

-- 
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>