Hello,

I was looking to tackle the "StringRef'ize APIs" suggestion from the clang project page and wanted to post a couple of thoughts and ask a couple of questions.

First, here is how I see the goals of the project. As far as I understand it, I will be converting existing functions that take std::string or char* to take llvm::StringRef where applicable. Then I will be changing a number of call sites to use the new functions.

One major question I have is: should the old version be removed? It would be quite possible to keep the old version as a stub, and that may make things easier for others when they have a string rather than a StringRef (although the conversion is simple anyway). There may also be API and ABI implications if a function from the public API is converted. What do you think the best approach is?

Another question is how you would define focus. A large part of the project is hunting through the source to find and change these functions, so how would "progress" be defined? GSOC requires solid requirements for the mid-term and final evaluations. Should I choose a number of functions that I expect to have converted in this time, or is there a better criterion that you can think of?

cheers,
Kevin
On Mon, Mar 17, 2014 at 6:16 AM, Kevin Cox <kevincox at kevincox.ca> wrote:
> Hello,
>
> I was looking to tackle the "StringRef'ize APIs" suggestion from the
> clang project page and wanted to post a couple of thoughts and ask
> a couple of questions.

Hi - welcome to the project!

> First, here is how I see the goals of the project. As far as I
> understand it, I will be converting existing functions that take
> std::string or char* to take llvm::StringRef where applicable. Then
> I will be changing a number of call sites to use the new functions.
>
> One major question I have is: should the old version be removed?

The intention is to change APIs directly rather than introducing a new API alongside the old one. (So, yes, the old one should be removed/not exist.)

> It would be quite possible to keep the old version as a stub, and
> that may make things easier for others when they have a string
> rather than a StringRef (although the conversion is simple anyway).

This change should generally be API compatible (implicit conversions to StringRef should fire in most/common cases), and fixing up a few callers for which the extra user-defined conversion is not accessible shouldn't be too painful.

> There may also be API and ABI implications if a function from the
> public API is converted. What do you think the best approach is?

The LLVM C++ API has no ABI guarantee/stability. We break it continuously and intend to keep doing so - you're welcome to do the same in this effort.

> Another question is how you would define focus. A large part of the
> project is hunting through the source to find and change these
> functions, so how would "progress" be defined? GSOC requires solid
> requirements for the mid-term and final evaluations. Should I choose
> a number of functions that I expect to have converted in this time,
> or is there a better criterion that you can think of?

I don't know enough about GSOC to say whether this would be a good project or not, nor how it might be evaluated.

I had a few deeper issues when I started on the project and was working on StringRef upgrades. I started looking at Twine and trying to figure out whether Twine could be used more pervasively, but never came to any good conclusions about that. I eventually just decided to do ArrayRef work, which was less ambiguous.

You could search the codebase for particular idioms and see if the number of instances is high enough for a reasonably sized project. I found ArrayRef opportunities by searching for "\.data().*\.length()", I think, or idioms like that; you could search for "const std::string&" parameters, for example, if you wanted to do StringRef upgrades. Then use that metric to track your progress: run the same search each day/week/whatever and demonstrate that you're approaching zero.

- David

> cheers,
> Kevin
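The compatibility point in the reply above (implicit conversions cover the common callers, while a caller that relied on an extra user-defined conversion needs a manual fix-up) can be sketched roughly as follows. This is only an illustration under stated assumptions: std::string_view (C++17) stands in for llvm::StringRef so the snippet compiles without LLVM headers, and nameLengthOld, nameLengthNew, Identifier and checkCallers are hypothetical names, not Clang APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical API before the change: takes an owning string by reference.
std::size_t nameLengthOld(const std::string &Name) { return Name.size(); }

// Hypothetical API after the change: takes a non-owning view.
// std::string_view is only a stand-in for llvm::StringRef here.
std::size_t nameLengthNew(std::string_view Name) { return Name.size(); }

// Hypothetical caller-side type that converts implicitly to std::string.
struct Identifier {
  std::string Text;
  operator std::string() const { return Text; }
};

// Exercises the call patterns discussed above.
void checkCallers() {
  std::string S = "main";
  Identifier Id{"foo"};

  // Common callers compile unchanged against the new signature.
  assert(nameLengthNew(S) == 4);        // std::string argument
  assert(nameLengthNew("printf") == 6); // string literal argument

  // The old signature needs one user-defined conversion for Id.
  assert(nameLengthOld(Id) == 3);

  // nameLengthNew(Id) would need two user-defined conversions
  // (Identifier -> std::string -> view), which is not done implicitly,
  // so this caller has to be fixed up by hand:
  std::string Tmp = Id;
  assert(nameLengthNew(Tmp) == 3);
}
```

Calling checkCallers() from a main function runs the checks. The one caller that needs attention is the one passing Identifier, which matches the "few callers" the reply expects to fix up.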
Thanks for the feedback, David. I have created a quick draft of my proposal and would appreciate any feedback.

GSOC Proposal -- StringRef'ize APIs
==================================

Background
----------

LLVM provides a StringRef class that quite simply references a string (an arbitrary byte buffer). With llvm::StringRef, copies can be avoided when a string passed into a function is only going to be read. While this class is used in Clang, there are still places where std::string is used. Replacing these with llvm::StringRef avoids copies and improves performance. There are also places where a const char* is used. In these situations llvm::StringRef can improve safety and convenience through its existing member functions and its bounds checks when assertions are enabled.

Project Goals
-------------

The goal of this project is to replace these uses of const std::string and const char* inside Clang with llvm::StringRef. Parameters of type std::string and char* that are not marked const will also be examined; where they are never modified, a StringRef will be used as well. This project entails changing both the headers and the implementation to use llvm::StringRef, as well as updating the documentation.

Other goals of the project include no API breakage (unless discussed specifically) and maintaining good performance. As this project is largely performance focused, other performance improvements that turn up while changing the code will likely be explored as well.

Another important part of the project is allowing llvm::StringRefs to be used in more places rather than being converted everywhere. This is where the real benefit of StringRef lies: it can be passed far down the call stack.

Who am I
--------

Hello, I am Kevin Cox, a Canadian student from Carleton University. By this summer I will have third-year standing in Software Engineering.
I have been using Linux for over 6 years and compiling my C and C++ code with clang for nearly that long. I love working with low-level code and really enjoy squeezing all of the performance out of it. I think that LLVM and Clang are great projects and am glad to have an opportunity to help out.

Contact
-------

Communication is vitally important to success, especially when working with a new code base. To facilitate quick communication I will be reading email constantly and idling in IRC whenever I am working on my project (and most likely more often than that). I also have a good quality microphone, so voice and video chats are a viable option for live communication.

Email: kevincox at kevincox.ca
XMPP:  kevincox at kevincox.ca
PGP:   E394 3366 624E 7449 B9B4 85AE C075 8A3B 34D5 2E74
IRC:   kevincox
Phone: <omitted>

In addition to regular communication I propose weekly or bi-weekly meetings with my mentor to keep in touch and ensure the project is moving forward.

Goals
-----

Google Summer of Code is a three-month program. Over the course of those three months I will be working to convert as many APIs as possible to use llvm::StringRef. While I have identified a number of APIs that can be converted to use StringRef, listing them would be a waste of time and energy. Instead, I have used some incredibly high-tech methods to count the uses of std::string& and const char* in Clang and hope to reduce that number. Please note that not all of these matches can be converted. Based on a quick analysis, about 1/8 to 1/4 of the functions can be.

    % grep 'const\s*char\s*\\*' **/*.{h,cpp} | wc -l
    3272
    % grep 'std::string\s*&' **/*.{h,cpp} | wc -l
    506

Throughout the summer I expect to significantly reduce these numbers. I am also going to create and maintain a document of APIs that can and can't be converted (think of a tri-color collector) and work through this document throughout the summer.
Ideally, by the end of the summer all APIs that can be converted will have been, and call sites will have been updated to take advantage of the new APIs.
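As a rough illustration of the Background and Project Goals sections of the proposal above (read-only parameters need no copy, and a reference can be passed far down the call stack), here is a minimal sketch. It rests on assumptions: std::string_view (C++17) stands in for llvm::StringRef so that it builds without LLVM, and dropLeadingSpaces, valueOf and pointsInto are hypothetical helpers, not functions from Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// std::string_view stands in for llvm::StringRef in this sketch.
// All function names below are hypothetical.

// Innermost helper: skips leading spaces by narrowing the view.
std::string_view dropLeadingSpaces(std::string_view Text) {
  std::size_t First = Text.find_first_not_of(' ');
  if (First == std::string_view::npos)
    return std::string_view();
  return Text.substr(First);
}

// Middle layer: returns the part after the first '=', or an empty view.
std::string_view valueOf(std::string_view Option) {
  std::size_t Eq = Option.find('=');
  if (Eq == std::string_view::npos)
    return std::string_view();
  return dropLeadingSpaces(Option.substr(Eq + 1));
}

// True if View refers to bytes inside Buffer, i.e. no copy was made.
// Only meaningful when View was derived from Buffer.
bool pointsInto(std::string_view View, std::string_view Buffer) {
  return View.data() >= Buffer.data() &&
         View.data() + View.size() <= Buffer.data() + Buffer.size();
}
```

Given std::string Option = "-std=  c++11", the call valueOf(Option) yields a view equal to "c++11" that still points into Option's own storage, so neither layer allocates or copies. The view is only valid while Option is alive and unmodified, which is the usual caveat for any non-owning string reference.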