Peter Ruckdeschel
2008-Nov-05 22:47 UTC
[Rd] puzzled by cat() behaviour when argument '...' is a vector (and argument 'sep' contains "\n")
Hi r-devels,

I am a bit puzzled by the behaviour of cat(); any help is appreciated.

As far as I can see, cat() with a vector-valued '...' argument behaves in contradiction to what I understand from the note in the help for cat(), which reads:

"
Despite its name and earlier documentation, 'sep' is a vector of
terminators rather than separators, being output after every
vector element (including the last). Entries are recycled as
needed.
"

----------------------------------------------------------------------------
reproducible example code:
----------------------------------------------------------------------------

> cat(rep("x",3), sep = ".")
x.x.x
## no "." appended!

Things get even worse if "\n" features in the 'sep' vector:

> cat(rep("x",3),sep = c(".","\n","."))
x.x
x
>
## last separator "." gets swallowed; an unintended line feed is inserted

----------------------------------------------------------------------------
code causing this behaviour
----------------------------------------------------------------------------

##### "\n"

I have looked a bit into the source code (lines 468-630 in builtin.c in src/main) and found that, because the variable nlsep is set to 1 in line 504, i.e.

    if (strstr(CHAR(STRING_ELT(sepr, i)), "\n")) nlsep = 1; /* ASCII */

the code in lines 622-23, i.e.

    if ((pwidth != INT_MAX) || nlsep)
        Rprintf("\n");

is responsible for the newline. Is this really intended?

##### separators, not terminators

Another look shows that, contrary to what the help file says, an element of the vector 'sep' is /not/ printed after each element of the vector passed as argument '...' to cat(), "including the last"; compare the for-loop over the elements of '...' in lines 596-617 and the print-out of the separator

    cat_printsep(sepr, ntot);

in line 600. Once again: is this intended?

A patch fixing my problem would be easy, but it might break other, much more important code; would you have any proposals?
Best,
Peter

-------------------------------------------------------------------
Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = Under development (unstable)
 major = 2
 minor = 9.0
 year = 2008
 month = 10
 day = 01
 svn rev = 46589
 language = R
 version.string = R version 2.9.0 Under development (unstable) (2008-10-01 r46589)

Windows XP (build 2600) Service Pack 3

Locale:
LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, package:datasets, package:methods, Autoloads, package:base
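One way to check the discrepancy described in this message without depending on where the console places the next prompt is to capture what cat() writes and compare it with the two possible readings of 'sep'. The following is only a minimal sketch using base R and utils (capture.output(), paste(), paste0()); the variable names are made up for illustration and do not come from the thread.

```r
# Elements used in the first example of the message.
x <- rep("x", 3)

# capture.output() collects what cat() writes, one string per output line.
observed <- capture.output(cat(x, sep = "."))

# Prediction if 'sep' were a terminator written after every element,
# including the last one (the wording of the quoted help note).
as_terminator <- paste0(x, ".", collapse = "")

# Prediction if 'sep' is written only between neighbouring elements.
as_separator <- paste(x, collapse = ".")

print(observed)
print(identical(observed, as_terminator))
print(identical(observed, as_separator))
```

Because capture.output() returns complete lines, this comparison only addresses the missing final "."; it cannot show whether an extra line feed is written at the end.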
Duncan Murdoch
2008-Nov-05 23:52 UTC
On 05/11/2008 5:47 PM, Peter Ruckdeschel wrote:
> Hi r-devels,
>
> I am a bit puzzled by the behaviour of cat() --- any help is appreciated...
>
> At least AFAICS, cat() for vector-valued '...' argument behaves in
> contradiction to what I understand from the note in the help to cat()
> which reads
>
> "
> Despite its name and earlier documentation, 'sep' is a vector of
> terminators rather than separators, being output after every
> vector element (including the last). Entries are recycled as
> needed.
> "

I think you're right that the documentation is incorrect. I'd prefer a patch to the docs, rather than a change to the behaviour: cat() is so fundamental that any changes to it would have wide-ranging consequences.

If you want to study the code and draft a documentation patch, I'll review it and possibly commit it.

Duncan Murdoch
Steven McKinney
2008-Nov-06 00:45 UTC
> On 05/11/2008 5:47 PM, Peter Ruckdeschel wrote:
> > Hi r-devels,
> >
> > I am a bit puzzled by the behaviour of cat() --- any help is
> > appreciated...

It appears to me that the elements of sep are just used as separators _between_ each of the objects comprising '...' handed to cat. If N objects are handed to cat, cat requires N-1 separator strings. The default separator string is " " (space character).

Hence for cat(rep("x",3), sep = ".") two periods are needed to separate the three input objects

> cat(rep("x",3), sep = ".")
x.x.x>

as expected.

For cat(rep("x",3),sep = c(".","\n",".")), the first separator is a period, the second is a newline, and the third is not needed.

> cat(rep("x",3),sep = c(".","\n","."))
x.x
x
>

as expected. The line feed inserted is expected: it is the second element of the sep vector, so it should appear between the second and third objects, as it does. The third element of sep is not needed, so it is ignored.

Another example:

> cat(letters, sep = c(as.character(1:9), "\n"))
a1b2c3d4e5f6g7h8i9j
k1l2m3n4o5p6q7r8s9t
u1v2w3x4y5z
>

Again, as expected.

Slightly more complex:

> paste("[", c(as.character(1:9), "\n"), "]", sep = "")
[1] "[1]" "[2]" "[3]" "[4]" "[5]" "[6]" "[7]" "[8]" "[9]" "[\n]"
> cat(letters, sep = paste("[", c(as.character(1:9), "\n"), "]", sep = ""))
a[1]b[2]c[3]d[4]e[5]f[6]g[7]h[8]i[9]j[
]k[1]l[2]m[3]n[4]o[5]p[6]q[7]r[8]s[9]t[
]u[1]v[2]w[3]x[4]y[5]z
>

again, as expected.

I haven't delved into the source to see where the final line feed is being generated (as I see the next R prompt on a new line), so I can't comment on whether anything is appended to the end of the output string generated by cat(). The documentation says no line feed is appended unless argument 'fill' is TRUE or numeric.

> > At least AFAICS, cat() for vector-valued '...' argument behaves in
> > contradiction to what I understand from the note in the help to cat()
> > which reads
> >
> > "
> > Despite its name and earlier documentation, 'sep' is a vector of
> > terminators rather than separators, being output after every
> > vector element (including the last). Entries are recycled as
> > needed.
> > "
>
> I think you're right that the documentation is incorrect. I'd prefer a
> patch to the docs, rather than a change to the behaviour: cat() is so
> fundamental that any changes to it would have wide ranging consequences.
>
> If you want to study the code and draft a documentation patch, I'll
> review it and possibly commit it.

How about this:

sep    a character vector of strings to insert between each object. If there are too few elements in sep to separate all the objects, the elements of sep are recycled. Unused elements of sep are ignored.

then in Details:

Details

cat is useful for producing output in user-defined functions. It converts its arguments to character vectors, concatenates them to a single character vector, inserts the given sep= string(s) between each element and then outputs them.

Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada
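The reading proposed in this reply (N elements need N-1 separators, with 'sep' recycled and unused entries ignored) can be written down as a small model and compared with what cat() actually prints for a single vector argument. This is only a sketch under that assumption: join_with_sep() is a hypothetical helper invented here, not a function in R, and it does not reproduce the C code in builtin.c.

```r
# Hypothetical helper (not part of R): joins the elements of x, placing
# recycled entries of 'sep' between neighbours and nothing after the last.
join_with_sep <- function(x, sep = " ") {
  x <- as.character(x)
  n <- length(x)
  if (n == 0L) return("")
  between <- rep_len(sep, n - 1L)           # N elements need N - 1 separators
  paste0(x, c(between, ""), collapse = "")  # the last element gets no separator
}

# The separator vector from the 'letters' example in the reply.
sep10 <- c(as.character(1:9), "\n")

# Split the modelled string at line feeds so that it can be compared with
# capture.output(), which returns one string per output line.
modelled <- strsplit(join_with_sep(letters, sep10), "\n", fixed = TRUE)[[1]]
observed <- capture.output(cat(letters, sep = sep10))

print(modelled)
print(identical(modelled, observed))
```

The model covers only the separators written between elements. Since capture.output() returns whole lines, the comparison says nothing about whether a final line feed is appended, which is the separate question raised at the start of the thread.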