Hello.

My friend's data compressor has a problem. During the decompression stage, on
some corrupted files, it may issue an overlapping memcpy.
He has two easy solutions for that:
* switch to memmove
* add a branch to detect such a case

However, he's not happy with either of them, as they slow the decompression
down to handle a case that will never happen to almost everyone.
Furthermore, we don't see anything that could fail with memcpy (aside from
nasal elephants, but we're used to them). The copied data may be corrupted,
but that's not a failure, as the compressed data is corrupted already.

We see another solution: to write a custom memcpy. We'd like to avoid it for
now, as getting portable performance has proved to be a problem.

So I have two questions:
* what are the possible failures with an overlapping memcpy?
* what solution to the problem do you recommend?

Regards,
--
Maciej Adamczyk

On Mon, Dec 07, 2015 at 07:33:29AM +0100, Maciej Adamczyk via llvm-dev wrote:
> Hello.
> My friend's data compressor has a problem. During the decompression stage, on
> some corrupted files, it may issue an overlapping memcpy.
> He has two easy solutions for that:
> * switch to memmove
> * add a branch to detect such a case
> However, he's not happy with either of them, as they slow the decompression
> down to handle a case that will never happen to almost everyone.

While I don't think any of this is really LLVM-specific, the second is
certainly the correct approach if the file format explicitly disallows
such overlapping ranges. LZMA streams, for example, are perfectly well
defined for that case, and it even makes sense for certain overlapping
patterns to say "copy 256 bytes starting from offset -8". If you
know that this condition is invalid for well-formed input, mark the
condition as predicted false and the compiler will try to turn it into a
branch that the CPU's branch predictor can handle well.

Joerg

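For illustration, the "branch predicted false" suggestion above could look roughly like the sketch below. This is not code from the compressor in question: `checked_copy` and `UNLIKELY` are hypothetical names, and the sketch assumes GCC or Clang for `__builtin_expect` (on other compilers the macro falls back to a plain condition). It also assumes the format treats any overlap as invalid input, which would not hold for formats like LZMA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Branch hint: GCC and Clang provide __builtin_expect; elsewhere this
 * degrades to an ordinary condition with no hint. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical helper: copies len bytes from src to dst.
 * Returns 0 on success, or -1 without copying if the two ranges overlap,
 * which this sketch assumes can only happen on corrupted input. */
static int checked_copy(unsigned char *dst, const unsigned char *src,
                        size_t len)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    /* Distance between the two start addresses. */
    uintptr_t gap = d > s ? d - s : s - d;

    /* The ranges overlap exactly when the starts are closer than len. */
    if (UNLIKELY(gap < len))
        return -1;

    memcpy(dst, src, len);
    return 0;
}
```

The decoder would then propagate the -1 as a "corrupted stream" error instead of continuing. On well-formed input the check is one subtraction, one compare and a branch that is never taken.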
On 7 Dec 2015, at 16:39, Joerg Sonnenberger via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Mon, Dec 07, 2015 at 07:33:29AM +0100, Maciej Adamczyk via llvm-dev wrote:
>> Hello.
>> My friend's data compressor has a problem. During the decompression stage, on
>> some corrupted files, it may issue an overlapping memcpy.
>> He has two easy solutions for that:
>> * switch to memmove
>> * add a branch to detect such a case
>> However, he's not happy with either of them, as they slow the decompression
>> down to handle a case that will never happen to almost everyone.
>
> While I don't think any of this is really LLVM-specific, the second is
> certainly the correct approach if the file format explicitly disallows
> such overlapping ranges. LZMA streams, for example, are perfectly well
> defined for that case, and it even makes sense for certain overlapping
> patterns to say "copy 256 bytes starting from offset -8". If you
> know that this condition is invalid for well-formed input, mark the
> condition as predicted false and the compiler will try to turn it into a
> branch that the CPU's branch predictor can handle well.

[ continuing off topic ]

The lack of such error checking is one of the big reasons that libraries like
libjpeg, libpng, and so on have been a huge source of vulnerabilities in web
browsers for the last couple of decades. It sounds like your friend has
already added a security hole to his library; please discourage him from
adding any more.

David