Nutter, Benjamin
2008-Oct-31 19:43 UTC
[R] Is there a way to vectorize this? [with correction]
** Sorry to repost. I forgot to include a function necessary to make the example work ** I apologize up front for this being a little long. I hope it's understandable. Please let me know if I need to clarify anything. Several months ago I wrote a series of functions to help me take my R analyses and build custom reports in html files. Each function either builds or modifies a string of html code that can then be written to a file to produce the desired output. To make modifications in the html code, I've placed 'markers' around certain characteristics that I might want to change. For instance, the alignment characteristics have an 'algnmark' on either side of them. When I wish to change the alignment, I can find where these markers are, determine their location, and replace the contents between them. I've been using the functions for a few months now, and am pleased with the utility. Unfortunately, as I was writing these, I wasn't very strong with my vectorization skills and relied on for loops (lots of for loops) to get through the work. So while I'm pleased with the utility, I've been trying to optimize the functions by vectorizing the for loops. At this point, I've hit a small snag. I have a situation where I can't seem to figure out how to vectorize the loop. Part of me wonders if it is even possible. The scenario is this: I run a string of code through the loop, on each pass, the section of code in need of modification is identified and the changes are made. When this is done, however, the length of the string changes. The change in length needs to be recognized in the next pass through the loop. Okay, some code to illustrate what I mean. This first function formats the html file. I only include it because it will be necessary to create illustrate what the function is doing. I am eliminating all comments and spacing from the code for brevity. #******************************************* Start of html.file.start 'html.file.start' <- function(title, size=11, font="Times New Roman"){ size <- format(floor(size),nsmall=1) code <- paste(" <html xmlns:o='urn:schemas-microsoft-com:office:office\' xmlns:w=\'urn:schemas-microsoft-com:office:word\' xmlns=\'http://www.w3.org/TR/REC-html40\'> <head> <meta http-equiv=Content-Type content=\'text/html; charset=windows-1252\'> <meta name=ProgId content=Word.Document> <meta name=Generator content=\'Microsoft Word 11\'> <meta name=Originator content=\'Microsoft Word 11\'> <style> <!-- /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal p.MsoEndnoteText, li.MsoEndnoteText, div.MsoEndnoteText {margin-top:2.0pt; margin-right:0in; margin-bottom:0in; margin-left:.15in; margin-bottom:.0001pt; text-indent:-.15in; mso-pagination:none; font-size:9.0pt; mso-bidi-font-size:10.0pt; font-family:'Times New Roman'; mso-fareast-font-family:'Times New Roman';} p.Textbody, li.Textbody, div.Textbody --> </style> ", "<title>",title,"</title> </head> <body lang=EN-US style=\'tab-interval:.5in;", "textmark; font-size:",size,"pt; textmark;", "fontmark; font-family:",font,"; fontmark;\'>", sep="") return(code) } #******************************************** End of html.file.start #******************************************** Start of html.text 'html.text' <- function(text, size=11, font="Times New Roman", align="left", title){ size <- format(floor(size),nsmall=1) if(missing(title)) title <- "" else title <- paste("<br/>",title) title <- paste("<b>",title,"</b><br/>\n",sep="") code <- paste(" <p class=MsoNormal ", "algnmark align=",align," algnmark> <span class=GramE style=\'", " textmark; font-size:",size,"pt; textmark;", "fontmark; font-family:",font,"; fontmark;", "stylemark; font-weight:normal; font-style:normal;", " text-decoration:none; stylemark;\'>", title,text," </span> </p>",sep="") return(code) } #****************************************** End of html.text So here is the function I'm trying to vectorize. #******************************************* Start of html.align html.align <- function(code,new.align="left"){ #* Create a string to replace the current alignment setting. align <- paste(" align=",new.align," ",sep="") #* Function to pass to sapply. This is handy when 'code' #* is a vector. f1 <- function(code,align=align){ mark <- unlist(gregexpr("algnmark",code)) #* Get positions of markers if(mark[1]>0){ odd <- seq(1,length(mark),by=2) #* odd elements are starting marker evn <- seq(2,length(mark),by=2) #* even elements are ending marker mark[odd] <- mark[odd]+9 #* These two lines determine the starting mark[evn] <- mark[evn]-1 #* and ending elements of the substring to #* be replaced for(i in 1:length(odd)){ l.old <- nchar(code) #* store the length of the code segment. old.align <- substr(code,mark[odd[i]],mark[evn[i]]) #* Identify the old alignment setting code <- gsub(old.align,align,code) #* Replace alignment setting mark <- mark - (l.old-nchar(code)) #* See the NOTE Below. } return(code) } } code <- sapply(code,f1,align=align) return(code) } #************************************************* end of html.align NOTE: This is the problem. When the alignment setting is changed, the length of the code string changes, and the elements in 'mark' have to be adjusted accordingly. Can anyone think of a way to vectorize this process while adjusting for these changes, or is the loop the best solution? Here's a little something to run it on so you can see how it works. #********************************* Start Example code. head <- html.file.start("Test File") text1 <- html.text("I need to write a paragraph so that I can test the html.align function in my package. I need to change the function so that it no longer uses the for loop that it now contains. <br/> <br/> Now a short sentence.") text2 <- html.text("A second element") text <- c(text1,text2) text <- html.align(text,"center") # options are "left", "right", # "center", or "justify" write(head,"test.html") write(text,"test.html") Thanks for any help. Benjamin P Please consider the environment before printing this e-mail Cleveland Clinic is ranked one of the top hospitals in America by U.S. News & World Report (2008). Visit us online at http://www.clevelandclinic.org for a complete listing of our services, staff and locations. Confidentiality Note: This message is intended for use\...{{dropped:13}}
Duncan Temple Lang
2008-Nov-01 16:20 UTC
[R] Is there a way to vectorize this? [with correction]
Nutter, Benjamin wrote:> ** Sorry to repost. I forgot to include a function necessary to make > the example work ** > > I apologize up front for this being a little long. I hope it's > understandable. Please let me know if I need to clarify anything. > > Several months ago I wrote a series of functions to help me take my R > analyses and build custom reports in html files. Each function either > builds or modifies a string of html code that can then be written to a > file to produce the desired output. > > To make modifications in the html code, I've placed 'markers' around > certain characteristics that I might want to change. For instance, the > alignment characteristics have an 'algnmark' on either side of them. > When I wish to change the alignment, I can find where these markers are, > determine their location, and replace the contents between them. > > I've been using the functions for a few months now, and am pleased with > the utility. Unfortunately, as I was writing these, I wasn't very > strong with my vectorization skills and relied on for loops (lots of for > loops) to get through the work. So while I'm pleased with the utility, > I've been trying to optimize the functions by vectorizing the for loops. > > At this point, I've hit a small snag. I have a situation where I can't > seem to figure out how to vectorize the loop. Part of me wonders if it > is even possible. > > The scenario is this: I run a string of code through the loop, on each > pass, the section of code in need of modification is identified and the > changes are made. When this is done, however, the length of the string > changes. The change in length needs to be recognized in the next pass > through the loop.At a quick glance, it seems merely trying to transform each instance of algnmark align=left algnmark to algnmark align=right algnmark If so, you are going about this in an unnecessarily complicated manner. html.text = function(text, new.align) gsub("algnmark align=[a-z]+ algnmark", paste("algnmark align=", new.align, " algnmark", sep = ""), text) would be much more explicit about what you are doing. You actually want to be more specific about this and replace only within <.... >, i.e. html elements. You might benefit from a package like R2HTML and using that to generate the content. However, building reports by building strings containing markup and content/text seems simple and is easy to get started, but actually becomes complicated. You might look at Sweave, or alternatively build the document directly yourself using tools designed for creating HTML (or XML). You can use xmlParse(), newXMLNode() and friends in the XML package to read an empty template document and then add new nodes, etc. When you use this approach, you can access individual nodes and change them without having to work with the entire content. Alternatively, in a few weeks, we'll release some tools for working directly with .docx and .xslx and modifying their content. D.> > Okay, some code to illustrate what I mean. This first function formats > the html file. I only include it because it will be necessary to create > illustrate what the function is doing. I am eliminating all comments > and spacing from the code for brevity. > > #******************************************* Start of html.file.start > 'html.file.start' <- function(title, size=11, font="Times New Roman"){ > size <- format(floor(size),nsmall=1) > code <- paste(" > <html xmlns:o='urn:schemas-microsoft-com:office:office\' > xmlns:w=\'urn:schemas-microsoft-com:office:word\' > xmlns=\'http://www.w3.org/TR/REC-html40\'> > <head> > <meta http-equiv=Content-Type content=\'text/html; > charset=windows-1252\'> > <meta name=ProgId content=Word.Document> > <meta name=Generator content=\'Microsoft Word 11\'> > <meta name=Originator content=\'Microsoft Word 11\'> > <style> > <!-- > /* Style Definitions */ > p.MsoNormal, li.MsoNormal, div.MsoNormal > p.MsoEndnoteText, li.MsoEndnoteText, div.MsoEndnoteText > {margin-top:2.0pt; > margin-right:0in; > margin-bottom:0in; > margin-left:.15in; > margin-bottom:.0001pt; > text-indent:-.15in; > mso-pagination:none; > font-size:9.0pt; > mso-bidi-font-size:10.0pt; > font-family:'Times New Roman'; > mso-fareast-font-family:'Times New Roman';} > p.Textbody, li.Textbody, div.Textbody --> > </style> > ", > "<title>",title,"</title> > </head> > <body lang=EN-US style=\'tab-interval:.5in;", > "textmark; font-size:",size,"pt; textmark;", > "fontmark; font-family:",font,"; fontmark;\'>", sep="") > return(code) > } #******************************************** End of html.file.start > > > #******************************************** Start of html.text > 'html.text' <- function(text, size=11, font="Times New Roman", > align="left", title){ > size <- format(floor(size),nsmall=1) > if(missing(title)) title <- "" else title <- paste("<br/>",title) > title <- paste("<b>",title,"</b><br/>\n",sep="") > code <- paste(" > <p class=MsoNormal ", > "algnmark align=",align," algnmark> > <span class=GramE style=\'", > " textmark; font-size:",size,"pt; textmark;", > "fontmark; font-family:",font,"; fontmark;", > "stylemark; font-weight:normal; font-style:normal;", > " text-decoration:none; stylemark;\'>", > title,text," > </span> > </p>",sep="") > return(code) > } #****************************************** End of html.text > > > So here is the function I'm trying to vectorize. > > #******************************************* Start of html.align > html.align <- function(code,new.align="left"){ > #* Create a string to replace the current alignment setting. > align <- paste(" align=",new.align," ",sep="") > > #* Function to pass to sapply. This is handy when 'code' > #* is a vector. > f1 <- function(code,align=align){ > mark <- unlist(gregexpr("algnmark",code)) #* Get positions of > markers > if(mark[1]>0){ > odd <- seq(1,length(mark),by=2) #* odd elements are starting > marker > evn <- seq(2,length(mark),by=2) #* even elements are ending marker > > mark[odd] <- mark[odd]+9 #* These two lines determine the > starting > mark[evn] <- mark[evn]-1 #* and ending elements of the substring > to > #* be replaced > > for(i in 1:length(odd)){ > > l.old <- nchar(code) #* store the length of the code segment. > > old.align <- substr(code,mark[odd[i]],mark[evn[i]]) > #* Identify the old alignment setting > > code <- gsub(old.align,align,code) #* Replace alignment setting > > mark <- mark - (l.old-nchar(code)) #* See the NOTE Below. > } > return(code) > } > } > code <- sapply(code,f1,align=align) > return(code) > } #************************************************* end of html.align > > > NOTE: This is the problem. When the alignment setting is changed, the > length of the code string changes, and the elements in 'mark' have to be > adjusted accordingly. Can anyone think of a way to vectorize this > process while adjusting for these changes, or is the loop the best > solution? > > Here's a little something to run it on so you can see how it works. > > #********************************* Start Example code. > head <- html.file.start("Test File") > text1 <- html.text("I need to write a paragraph so that I can test the > html.align function in my package. I need to change the > function > so that it no longer uses the for loop that it now contains. > <br/> <br/> Now a short sentence.") > text2 <- html.text("A second element") > > text <- c(text1,text2) > > text <- html.align(text,"center") # options are "left", "right", > # "center", or "justify" > > write(head,"test.html") > write(text,"test.html") > > > > Thanks for any help. > Benjamin > > > P Please consider the environment before printing this e-mail > > Cleveland Clinic is ranked one of the top hospitals > in America by U.S. News & World Report (2008). > Visit us online at http://www.clevelandclinic.org for > a complete listing of our services, staff and > locations. > > > Confidentiality Note: This message is intended for use\...{{dropped:13}} > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.