src/main/character.c:435-438 (do_strsplit) contains the following code:

    for (i = 0; i < tlen; i++)
        if (getCharCE(STRING_ELT(tok, 0)) == CE_UTF8) use_UTF8 = TRUE;
    for (i = 0; i < len; i++)
        if (getCharCE(STRING_ELT(x, 0)) == CE_UTF8) use_UTF8 = TRUE;

Since both loops only repeat a loop-invariant statement, either the loops are redundant, or the fixed index '0' was meant to be the loop variable i. I guess it's the latter, hence 'bug?' in the subject.

It also appears that as soon as *any* element of tok (or x) passes the test, use_UTF8 is set to TRUE; once that has happened, further checks make no sense. The following rewrite avoids the unnecessary work:

    for (i = 0; i < tlen; i++)
        if (getCharCE(STRING_ELT(tok, i)) == CE_UTF8) {
            use_UTF8 = TRUE;
            break;
        }
    for (i = 0; i < len; i++)
        if (getCharCE(STRING_ELT(x, i)) == CE_UTF8) {
            use_UTF8 = TRUE;
            break;
        }

Since the pattern is repetitive, the following generic approach would help (and the macro could possibly be reused in other places; like the loops above, it assumes the index variable i is already declared in the enclosing function):

    #define CHECK_CE(CHARACTER, LENGTH, USEUTF8) \
        for (i = 0; i < (LENGTH); i++) \
            if (getCharCE(STRING_ELT((CHARACTER), i)) == CE_UTF8) { \
                (USEUTF8) = TRUE; \
                break; \
            }

    CHECK_CE(tok, tlen, use_UTF8)
    CHECK_CE(x, len, use_UTF8)

If you like it, I can provide a patch.

vQ
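For anyone who wants to see the difference outside of R, here is a minimal standalone sketch. It assumes nothing from R's internals: STRING_ELT/getCharCE are replaced by a plain array of a hypothetical `mock_cetype` enum, and the function names (`any_utf8_fixed_index`, `any_utf8`, `any_utf8_two`) and the `MOCK_CHECK_CE` macro are illustrative only. It contrasts the fixed-index loop with the indexed, early-exit version and mirrors the proposed macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for R's cetype_t; only the values needed here. */
typedef enum { CE_NATIVE, CE_UTF8, CE_LATIN1 } mock_cetype;

/* Current pattern: the index is fixed at 0, so the loop body is
   loop-invariant and only element 0 is ever inspected. */
static bool any_utf8_fixed_index(const mock_cetype *enc, size_t len)
{
    bool use_utf8 = false;
    for (size_t i = 0; i < len; i++)
        if (enc[0] == CE_UTF8)
            use_utf8 = true;
    return use_utf8;
}

/* Proposed pattern: inspect element i and stop at the first hit.
   If `checked` is non-NULL, it receives how many elements were examined. */
static bool any_utf8(const mock_cetype *enc, size_t len, size_t *checked)
{
    assert(enc != NULL || len == 0);
    size_t i;
    bool use_utf8 = false;
    for (i = 0; i < len; i++)
        if (enc[i] == CE_UTF8) {
            use_utf8 = true;
            break;
        }
    if (checked)
        *checked = use_utf8 ? i + 1 : len;
    return use_utf8;
}

/* Macro form mirroring the proposed CHECK_CE; like the original, it
   assumes an index variable `i` is already declared in scope. */
#define MOCK_CHECK_CE(ENC, LENGTH, USEUTF8)  \
    for (i = 0; i < (LENGTH); i++)           \
        if ((ENC)[i] == CE_UTF8) {           \
            (USEUTF8) = true;                \
            break;                           \
        }

/* Applies the macro to two vectors, as do_strsplit would to tok and x. */
static bool any_utf8_two(const mock_cetype *a, size_t alen,
                         const mock_cetype *b, size_t blen)
{
    size_t i;
    bool use_utf8 = false;
    MOCK_CHECK_CE(a, alen, use_utf8)
    MOCK_CHECK_CE(b, blen, use_utf8)
    return use_utf8;
}
```

With `{CE_NATIVE, CE_UTF8, CE_NATIVE, CE_UTF8}`, the fixed-index loop reports no UTF-8 element because it only ever looks at element 0, while `any_utf8` finds the second element and stops after examining two entries.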