Hello,

I was looking to tackle the "StringRef'ize APIs" suggestion from the
clang project page and just wanted to post a couple of thoughts and ask
a couple of questions.

First of all I am going to talk about how I see the goals of the
project. Basically, as far as I understand it I will be converting
existing functions that take std::string or char*'s to use
llvm::StringRef where applicable. Then I will be changing a number of
call sites to use this new function.

One major question I have is should the old version be removed? It
would be very possible to convert the old version as a stub and it may
make it easier for others when they have a string rather than a
StringRef (although conversion is simple anyways). Also there may be
API and ABI implications if a function from the public API is converted.
What do you think the best approach for this is?

Another question I have is how would you define focus. A large part of
the project is hunting through the source to find and change these
functions so how would "progress" be defined. GSOC requires a solid
requirement for mid-term and final requirements. Should I choose a
number of functions that I expect to have converted in this time or is
there a better criteria that you can think of.

cheers,
Kevin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 295 bytes
Desc: OpenPGP digital signature
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140317/c631b81e/attachment.sig>
On Mon, Mar 17, 2014 at 6:16 AM, Kevin Cox <kevincox at kevincox.ca> wrote:
> Hello,
>
> I was looking to tackle the "StringRef'ize APIs" suggestion from the
> clang project page and just wanted to post a couple of thoughts and ask
> a couple of questions.

Hi - welcome to the project!

> First of all I am going to talk about how I see the goals of the
> project. Basically, as far as I understand it I will be converting
> existing functions that take std::string or char*'s to use
> llvm::StringRef where applicable. Then I will be changing a number of
> call sites to use this new function.
>
> One major question I have is should the old version be removed?

The intention is to change APIs directly rather than introducing a new
API alongside the old one. (so, yes, the old one should be removed/not
exist)

> It
> would be very possible to convert the old version as a stub and it may
> make it easier for others when they have a string rather than a
> StringRef (although conversion is simple anyways).

This change should generally be API compatible (implicit conversions to
StringRef should fire in most/common cases) and fixing up a few callers
for which the extra user defined conversion is not accessible shouldn't
be too painful.

> Also there may be
> API and ABI implications if a function from the public API is converted.
> What do you think the best approach for this is?

The LLVM C++ API has no ABI guarantee/stability, we break it
continuously and intend to keep doing so - you're welcome to do the same
in this effort.

> Another question I have is how would you define focus. A large part of
> the project is hunting through the source to find and change these
> functions so how would "progress" be defined. GSOC requires a solid
> requirement for mid-term and final requirements. Should I choose a
> number of functions that I expect to have converted in this time or is
> there a better criteria that you can think of.

I don't know much about GSOC to know whether this would be a good
project or not, nor how it might be evaluated.

I had a few deeper issues when I started on the project & was working on
StringRef upgrades - I started looking at Twine and trying to figure out
whether Twine could be used more pervasively, but never came to any good
conclusions about that. I eventually just decided to do ArrayRef work
which was more unambiguous.

You could search the codebase for particular idioms (I found ArrayRef
opportunities by searching for "\.data().*\.length()" I think - or
idioms like that (you could search for "const std::string&" parameters,
for example, if you wanted to do StringRef upgrades)) and see if the
number of instances is high enough for a reasonable sized project, then
use that metric to track your progress - run the same search each
day/week/whatever and demonstrate that you're approaching zero.

- David

> cheers,
> Kevin
Thanks for the feedback, David.
I have created a quick draft of my proposal and would appreciate any
feedback.
GSOC Proposal -- StringRef'ize APIs
==================================
Background
----------
LLVM provides a StringRef class that references a string (an arbitrary
byte buffer) without owning it. Using llvm::StringRef, copies can be
avoided when a string passed into a function is only going to be read.
While this class is used in Clang, there are still places where
std::string is used. Replacing these with llvm::StringRef avoids the
copies and improves performance. Furthermore, there are places where a
const char* is used; in these situations llvm::StringRef can improve
safety and convenience through its existing member functions and its
bounds checks when assertions are enabled.
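To make the argument above concrete, here is a minimal sketch of the two cases. So that it builds without the LLVM headers, it uses C++17 std::string_view as a stand-in for llvm::StringRef; that substitution is an assumption for illustration only (the two types are similar in spirit, not identical), and the function names are hypothetical rather than taken from Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in for llvm::StringRef so the sketch builds without LLVM headers.
// (Assumption: std::string_view from C++17; similar in spirit, not identical.)
using StringRefLike = std::string_view;

// Before: a caller holding a string literal or a char buffer has to
// materialize a temporary std::string just to make this call.
inline bool isHeaderOld(const std::string &Name) {
  return Name.size() >= 2 && Name.compare(Name.size() - 2, 2, ".h") == 0;
}

// After: the parameter is a pointer plus a length, so no temporary
// std::string is constructed for literals, buffers, or substrings.
inline bool isHeaderNew(StringRefLike Name) {
  return Name.size() >= 2 && Name.substr(Name.size() - 2) == ".h";
}

// Replacing a raw const char*: the length travels with the pointer, so an
// out-of-range index can be caught by an assertion in debug builds.
inline char charAt(StringRefLike S, std::size_t I) {
  assert(I < S.size() && "index out of range");
  return S[I];
}
```

Callers that already hold a std::string can call isHeaderNew unchanged, since the conversion to the view type is implicit.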
Project Goals
-------------
The goal of this project is to replace these uses of const std::string
and const char* inside Clang with llvm::StringRef. Furthermore,
std::string and char* parameters that are not marked const will be
examined to see whether they are actually modified; where they are not,
a StringRef will be used as well.
This project entails changing both the headers and implementation to use
llvm::StringRef as well as updating the documentation. Other goals of
the project include no API breakage (unless discussed specifically) and
maintaining good performance. As this project is largely performance
focused, other performance improvements that become apparent while
changing the code will likely be explored as well.
Another important part of the project is allowing llvm::StringRefs to be
passed through more APIs directly, rather than being converted to and
from other string types at each call. This is where the real benefit of
StringRef lies: it can be passed far down the call stack without a copy.
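As a rough sketch of what "far down the call stack" means in practice, the chain below passes a borrowed view through three layers, slicing it on the way, without copying the text. As an assumption for illustration, std::string_view (C++17) again stands in for llvm::StringRef, and every function name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Stand-in for llvm::StringRef (assumption: C++17 std::string_view).
using StringRefLike = std::string_view;

// Innermost layer: only reads the text it is given.
inline bool isKeyword(StringRefLike Word) {
  return Word == "int" || Word == "return";
}

// Middle layer: forwards a slice of its input. Taking a substring of a
// view adjusts a pointer and a length; no characters are copied.
inline bool firstWordIsKeyword(StringRefLike Line) {
  const auto End = Line.find(' ');
  return isKeyword(Line.substr(0, End));
}

// Outermost layer: owns the storage; everything below borrows it.
inline bool lineStartsWithKeyword(const std::string &Buffer) {
  return firstWordIsKeyword(Buffer);
}

// Helper for checking the claim: does View point into Owner's storage?
inline bool aliases(StringRefLike View, const std::string &Owner) {
  return View.data() >= Owner.data() &&
         View.data() + View.size() <= Owner.data() + Owner.size();
}
```

The trade-off is lifetime: a view is only valid while the storage it refers to is alive, so a function that needs to keep the text must still make its own copy.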
Who am I
--------
Hello, I am Kevin Cox, a Canadian student at Carleton University. By
this summer I will have third-year standing in Software Engineering.
I have been using Linux for over 6 years and compiling my C and C++ code
with clang for nearly that long. I love working with low level code and
really enjoy squeezing all of the performance out of it. I think that
LLVM and Clang are great projects and am glad to have an opportunity to
help out.
Contact
-------
Communication is vitally important to success, especially when working
with a new code base. To facilitate quick communication I will be
reading email constantly and idling in IRC whenever I am working on my
project (and most likely more often than that). I also have a good
quality microphone so voice and video chats are a viable option for live
communication.
Email: kevincox at kevincox.ca
XMPP: kevincox at kevincox.ca
PGP: E394 3366 624E 7449 B9B4 85AE C075 8A3B 34D5 2E74
IRC: kevincox
Phone: <omitted>
In addition to regular communication I propose weekly or bi-weekly
meetings with my mentor to keep in touch and ensure the project is
moving forward.
Goals
-----
Google Summer of Code is a three month program. Over the course of
those three months I will be working to convert as many APIs as possible
to use llvm::StringRef. While I have identified a number of APIs that
can be converted to use StringRefs, listing them would be a waste of time
and energy. Instead, I have used some incredibly high-tech methods to
count the uses of std::string& and const char* in Clang and hope to
reduce that number. Please note that not all of these matches can be
converted; based on a quick analysis, about 1/8 to 1/4 of the functions
can be.
% grep 'const\s*char\s*\\*' **/*.{h,cpp} | wc -l
3272
% grep 'std::string\s*&' **/*.{h,cpp} | wc -l
506
Throughout the summer I expect to significantly reduce these numbers. I
am also going to create and maintain a document of APIs that can and
can't be converted (think of a tri-color collector) and work through
this document throughout the summer. Ideally, by the end of the summer
all APIs that can be converted will have been, and call sites will have
been updated to take advantage of the new APIs.
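The same count could also be taken programmatically so that it can be re-run and compared week to week. The following is only a sketch under stated assumptions: the helper names are hypothetical, the patterns are approximations of the two searches above written in std::regex (ECMAScript) syntax, and it counts individual matches in a string rather than matching lines, so its numbers would not be directly comparable with the grep output.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count non-overlapping matches of Pattern (ECMAScript syntax) in Source.
inline std::size_t countMatches(const std::string &Source,
                                const std::string &Pattern) {
  const std::regex Re(Pattern);
  const auto Begin = std::sregex_iterator(Source.begin(), Source.end(), Re);
  const auto End = std::sregex_iterator();
  return static_cast<std::size_t>(std::distance(Begin, End));
}

// Hypothetical helper: occurrences of "const char *" in any spacing.
inline std::size_t countConstCharPtr(const std::string &Source) {
  return countMatches(Source, R"(const\s*char\s*\*)");
}

// Hypothetical helper: occurrences of "std::string &" in any spacing.
inline std::size_t countStdStringReference(const std::string &Source) {
  return countMatches(Source, R"(std::string\s*&)");
}
```

Feeding each source file's contents through these helpers and summing the results would give a single number to track toward zero, in the spirit of the metric suggested earlier in the thread.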