Leonard Mada
2022-Oct-28 21:42 UTC
[R] Partition vector of strings into lines of preferred width
Dear R-Users,

text = "
What is the best way to split/cut a vector of strings into lines of preferred width?
I have come up with a simple solution, albeit naive, as it involves many arithmetic divisions.
I have an alternative idea which avoids this problem.
But I may miss some existing functionality!"

# Long vector of strings:
str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]];
lenWords = nchar(str);

# simple, but naive solution:
# - it involves many divisions;
cut.character.int = function(n, w) {
    ncm = cumsum(n);
    nwd = ncm %/% w;
    count = rle(nwd)$lengths;
    pos = cumsum(count);
    posS = pos[ - length(pos)] + 1;
    posS = c(1, posS);
    pos = rbind(posS, pos);
    return(pos);
}

npos = cut.character.int(lenWords, w=30);
# let's print the results;
for(id in seq(ncol(npos))) {
    len = npos[2, id] - npos[1, id];
    cat(str[seq(npos[1, id], npos[2, id])], c(rep(" ", len), "\n"));
}

The first solution performs an arithmetic division on all string lengths. It is possible to find out the total length and divide only the last element of the cumsum. Something like this should work (although it is not properly tested).

w = 30;
cumlen = cumsum(lenWords);
max = tail(cumlen, 1) %/% w + 1;
pos = cut(cumlen, seq(0, max) * w);
count = rle(as.numeric(pos))$lengths;
# everything else is the same;
pos = cumsum(count);
posS = pos[ - length(pos)] + 1;
posS = c(1, posS);
pos = rbind(posS, pos);

npos = pos; # then print

The cut() may be optimized as well, as the cumsum is sorted ascending. I did not evaluate the efficiency of the code either.

But am I missing some existing functionality?

Note:
- technically, the cut() function should probably return a vector of indices (something like: rep(seq_along(count), count)), but it was more practical to have both the start and end positions.

Many thanks,

Leonard
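The remark that the cumulative sum is already sorted points towards findInterval(), which does the same binning as cut() but returns plain integer indices. The following is only a minimal sketch, not the code from the post: the sample `words`, the width `w = 10`, and the names `idx`, `lines`, and `npos` are assumptions made for illustration. It produces the per-word line index mentioned in the note and then recovers the start/end matrix from it.

```r
# Illustrative data (assumption): a short vector of words and a small width.
words <- c("What", "is", "the", "best", "way", "to", "split", "a", "vector")
w <- 10

cumlen <- cumsum(nchar(words))
# Bin edges at multiples of w, as in the cut() variant above.
breaks <- seq(0, tail(cumlen, 1) %/% w + 1) * w

# left.open = TRUE gives (a, b] intervals, matching the default of cut().
idx <- findInterval(cumlen, breaks, left.open = TRUE)

# One character vector of words per output line.
lines <- split(words, idx)

# Start and end positions of each line, as in the original matrix layout.
pos <- cumsum(rle(idx)$lengths)
npos <- rbind(start = c(1, head(pos, -1) + 1), end = pos)

# The separating spaces are not counted, so a printed line can exceed w.
print(vapply(lines, paste, character(1), collapse = " "))
```

As in the posted code, only the word lengths are accumulated, so the width is a target rather than a hard limit; adding 1 to each length would be one way to account for the spaces.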
Andrew Simmons
2022-Oct-28 21:51 UTC
[R] Partition vector of strings into lines of preferred width
I would suggest using strwrap(); the documentation at ?strwrap has plenty of details and examples. For paragraphs, I would usually do something like:

strwrap(x = , width = 80, indent = 4)

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
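To make the suggested call concrete, here is a small sketch that fills the blank `x =` argument with the sentences from the original question. The variable names `text` and `wrapped` and the narrower `width = 40` are assumptions chosen so that the wrapping is visible in a short example.

```r
# Sample input (assumption): the sentences from the original question.
text <- paste(
  "What is the best way to split/cut a vector of strings into lines of",
  "preferred width? I have come up with a simple solution, albeit naive,",
  "as it involves many arithmetic divisions."
)

# indent applies to the first line of each paragraph;
# exdent (default 0) applies to the remaining lines.
wrapped <- strwrap(x = text, width = 40, indent = 4)

# strwrap() returns one string per output line; writeLines() prints them.
writeLines(wrapped)
```

The result is an ordinary character vector, so it can be passed on to nchar(), paste(), or any other string function.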
Martin Morgan
2022-Oct-28 21:52 UTC
[R] Partition vector of strings into lines of preferred width
> strwrap(text)
[1] "What is the best way to split/cut a vector of strings into lines of"
[2] "preferred width? I have come up with a simple solution, albeit naive,"
[3] "as it involves many arithmetic divisions. I have an alternative idea"
[4] "which avoids this problem. But I may miss some existing functionality!"

Maybe used as

> strwrap(text) |> paste(collapse = "\n") |> cat("\n")
What is the best way to split/cut a vector of strings into lines of
preferred width? I have come up with a simple solution, albeit naive,
as it involves many arithmetic divisions. I have an alternative idea
which avoids this problem. But I may miss some existing functionality!
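One detail matters when the input is already a vector of words, as in the original post: strwrap() wraps each element of a character vector separately, so the words need to be joined into one string first. The sketch below assumes a short sample vector `words` and a width of 12; both are illustrative and not taken from the thread.

```r
# Sample input (assumption): a vector of single words.
words <- c("What", "is", "the", "best", "way", "to", "split",
           "a", "vector", "of", "strings")

# Each element is wrapped on its own, so nothing is merged here.
separate <- strwrap(words, width = 12)

# Joining first lets strwrap() fill each line up to the target width.
joined <- strwrap(paste(words, collapse = " "), width = 12)

writeLines(joined)
```

With the joined input, every output line stays below the requested width as long as no single word is longer than that width.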