The new strcapture function in R-devel is handy. It captures the matches to the parenthesized subpatterns of a regular expression in the columns of a data.frame, whose column names and classes are given by the 'proto' argument. E.g.,

> p1 <- data.frame(Name="", Number=0)
> str(strcapture("([[:alpha:]]*) +([[:digit:]]*)", c("Three 3", "Twenty 20"), proto=p1))
'data.frame': 2 obs. of 2 variables:
 $ Name  : Factor w/ 2 levels "Three","Twenty": 1 2
 $ Number: num 3 20

I think it would be even nicer if it constructed its data.frame using the check.names=FALSE and stringsAsFactors=FALSE arguments. Then the names and types specified in the proto argument would be respected instead of being changed, as they are in the following example:

> p2 <- data.frame("The Name"="", "The Number"=0, stringsAsFactors=FALSE, check.names=FALSE)
> str(strcapture("([[:alpha:]]*) +([[:digit:]]*)", c("Three 3", "Twenty 20"), proto=p2))
'data.frame': 2 obs. of 2 variables:
 $ The.Name  : Factor w/ 2 levels "Three","Twenty": 1 2
 $ The.Number: num 3 20

Bill Dunlap
TIBCO Software
wdunlap tibco.com
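For illustration, here is a minimal sketch of the behaviour suggested above. It is written as a hypothetical helper, `strcapture_sketch`, and is neither utils::strcapture nor the change that was eventually checked in. It assumes that every element of `x` either matches with exactly `length(proto)` capture groups or does not match at all. The captures come from regexec/regmatches, each column is coerced to the class of the corresponding proto column, and the result is built with check.names=FALSE and stringsAsFactors=FALSE, so names such as "The Name" and character columns survive unchanged.

```r
# Hypothetical helper (not utils::strcapture): a sketch of the suggested behaviour.
# Assumes each element of 'x' either matches 'pattern' with exactly
# length(proto) capture groups or does not match at all.
strcapture_sketch <- function(pattern, x, proto) {
  ncap <- length(proto)
  # For each element: the full match followed by the captured subpatterns
  m <- regmatches(x, regexec(pattern, x))
  # Drop the full match; non-matching elements become rows of NA
  rows <- lapply(m, function(v) {
    if (length(v) == 0L) rep(NA_character_, ncap) else v[-1L]
  })
  mat <- matrix(unlist(rows), ncol = ncap, byrow = TRUE)
  # Coerce each captured column to the class of the matching proto column
  cols <- lapply(seq_len(ncap), function(j) {
    methods::as(mat[, j], class(proto[[j]]))
  })
  names(cols) <- names(proto)
  # check.names = FALSE keeps "The Name" as is;
  # stringsAsFactors = FALSE keeps character columns as character
  data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)
}

p2 <- data.frame("The Name" = "", "The Number" = 0,
                 stringsAsFactors = FALSE, check.names = FALSE)
res <- strcapture_sketch("([[:alpha:]]*) +([[:digit:]]*)",
                         c("Three 3", "Twenty 20"), proto = p2)
str(res)
```

The coercion goes through methods::as so that the class of each proto column decides the type of each result column. A fuller version would also check that the number of capture groups in the pattern equals length(proto).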
Note that read.pattern in gsubfn does accept stringsAsFactors = FALSE. For example, using your input lines and pattern:

library(gsubfn)
Lines <- c("Three 3", "Twenty 20")
pat <- "([[:alpha:]]*) +([[:digit:]]*)"
s2 <- read.pattern(text = Lines, pattern = pat, stringsAsFactors = FALSE,
                   col.names = c("Name", "Number"))

giving:

> str(s2)
'data.frame': 2 obs. of 2 variables:
 $ Name  : chr "Three" "Twenty"
 $ Number: int 3 20

On Wed, Sep 21, 2016 at 2:06 PM, William Dunlap via R-devel <r-devel at r-project.org> wrote:
> The new strcapture function in R-devel is handy, capturing the matches to the parenthesized subpatterns in a regular expression in the columns of a data.frame, whose column names and classes are given by the 'proto' argument.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Thanks for the suggestion. Checked in that change.

Michael

On Wed, Sep 21, 2016 at 11:06 AM, William Dunlap via R-devel <r-devel at r-project.org> wrote:
> The new strcapture function in R-devel is handy, capturing the matches to the parenthesized subpatterns in a regular expression in the columns of a data.frame, whose column names and classes are given by the 'proto' argument.