How does one create a vector whose contents are the names of the variables in a data frame that match a particular pattern? This seems so simple, but I cannot find a straightforward answer. I want to be able to pass the contents of that vector to a "for" loop.

So let us assume that one has a data frame named Data, and that it holds the heights of a group of people measured at various ages. It could be made up of the vectors Data$PersonalID, Data$FirstName, Data$LastName, Data$Height.1, Data$Height.5, Data$Height.9, Data$Height.10, Data$Height.12, Data$Height.20 ... and many, many more variables.

How would one create a vector of all the Height variable names?

The simple workaround is not to bother creating the vector "Data$Height.1" "Data$Height.5" "Data$Height.9" "Data$Height.10" "Data$Height.12" "Data$Height.20" ... but rather just to use the sapply function. However, with some functions sapply will not work, and it is necessary to supply each variable name to the function (see the thread "Repeating tdt function on thousands of variables").

This is such a core capability. I would like to see it in the R-Wiki but could not find it there.

-- 
Farrel Buchinsky, MD
Pediatric Otolaryngologist
Allegheny General Hospital
Pittsburgh, PA
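A minimal sketch of the sapply() workaround the question mentions, assuming a small hypothetical Data frame that uses the column names from the post (the values are invented). startsWith(), available since R 3.3.0, picks out the Height columns by name, and sapply() then applies mean() to each selected column.

```r
# Hypothetical stand-in for the 'Data' frame in the question: the column
# names follow the post, the values are invented for illustration.
Data <- data.frame(
  PersonalID = 1:3,
  FirstName  = c("Ann", "Bob", "Cy"),
  LastName   = c("Ames", "Bell", "Cole"),
  Height.1   = c(74, 76, 75),
  Height.5   = c(108, 112, 110),
  Height.9   = c(131, 135, 133),
  Height.10  = c(137, 139, 138),
  Height.12  = c(148, 152, 150),
  Height.20  = c(170, 174, 172)
)

# Logical vector: TRUE for every column whose name begins with "Height."
is_height <- startsWith(names(Data), "Height.")

# sapply() over the selected columns returns a named vector of means.
col_means <- sapply(Data[is_height], mean)
print(col_means)
```

This works whenever the function accepts a plain column vector; the loop-over-names answers in this thread cover the case where it does not.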
Farrel Buchinsky wrote:
> How would one create a vector of all the Height variable names.
> [...]
> However with some functions the sapply will not work and it is
> necessary to supply each variable name to a function (see thread at
> Repeating tdt function on thousands of variables)

vnames <- paste("Height", 1:20, sep=".")
for(vn in vnames){
    doSomethingWith(Data[[vn]])  # doSomethingWith() stands for your own function
}

Uwe Ligges
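One detail in this loop is worth spelling out: Data[[vn]] looks up the column whose name is stored in the variable vn, whereas Data$vn would look for a column literally called "vn". A small sketch with a hypothetical three-row data frame:

```r
# Hypothetical data frame: an ID column plus two of the Height columns.
Data <- data.frame(PersonalID = 1:3,
                   Height.1   = c(74, 76, 75),
                   Height.5   = c(108, 112, 110))

vn <- "Height.5"
by_name   <- Data[[vn]]  # column whose name is stored in vn: 108 112 110
by_dollar <- Data$vn     # looks for a column literally named "vn": NULL

print(by_name)
print(is.null(by_dollar))

# Looping over a character vector of names therefore needs [[ ]], not $.
for (vn in c("Height.1", "Height.5")) print(mean(Data[[vn]]))  # prints 75, then 110
```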
Column names in iris that contain the string Sepal:

grep("Sepal", names(iris), value = TRUE)

On 5/3/06, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> How does one create a vector whose contents is the list of variables in a
> dataframe pertaining to a particular pattern?
> [...]
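Because iris ships with R (datasets package), this one-liner can be tried directly. The short sketch below also shows that, without value = TRUE, grep() returns the column positions rather than the names.

```r
# Names of the iris columns containing "Sepal".
sepal_vars <- grep("Sepal", names(iris), value = TRUE)
print(sepal_vars)
# [1] "Sepal.Length" "Sepal.Width"

# Without value = TRUE, grep() returns the matching positions instead.
sepal_pos <- grep("Sepal", names(iris))
print(sepal_pos)
# [1] 1 2
```

Either form can drive a loop: iterate over sepal_vars and use iris[[v]], or over sepal_pos and use iris[[i]].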
Here's an example:

dfr <- data.frame(A1=1:10, A2=21:30, B1=31:40, B2=41:50)
vars <- colnames(dfr)
for (v in vars[grep("B", vars)]) print(mean(dfr[, v]))

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Farrel Buchinsky
> Sent: Wednesday, May 03, 2006 10:46 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Listing Variables
>
> How does one create a vector whose contents is the list of
> variables in a dataframe pertaining to a particular pattern?
> [...]
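A caveat on the pattern: grep("B", vars) matches every name that contains a capital B anywhere. The sketch below adds a hypothetical column AB to show this, then anchors the pattern with ^ so only names that start with B are kept. For the Height case it also escapes the dot, since an unescaped "." in a regular expression matches any character.

```r
# The example data frame plus a hypothetical extra column "AB",
# whose name contains a B without starting with one.
dfr <- data.frame(A1 = 1:10, A2 = 21:30, B1 = 31:40, B2 = 41:50, AB = 51:60)
vars <- colnames(dfr)

unanchored <- vars[grep("B", vars)]   # "B1" "B2" "AB": B anywhere in the name
anchored   <- vars[grep("^B", vars)]  # "B1" "B2": names that start with B

for (v in anchored) print(mean(dfr[, v]))  # prints 35.5, then 45.5

# Escaped dot: "Heightx" and "OldHeight.2" are not matched.
heights <- grep("^Height\\.", c("Height.1", "Heightx", "OldHeight.2"), value = TRUE)
print(heights)  # "Height.1"
```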
On Wed, 2006-05-03 at 10:46 -0400, Farrel Buchinsky wrote:
> How does one create a vector whose contents is the list of variables in a
> dataframe pertaining to a particular pattern?
> [...]

I may be misunderstanding what you want to do, but to simply get the names of the columns in Data that contain "Height", you can do this:

> grep("Height", names(Data), value = TRUE)
[1] "Height.1"  "Height.5"  "Height.9"  "Height.10" "Height.12"
[6] "Height.20"

Now you could use something like the following:

for (i in grep("Height", names(Data), value = TRUE))
  YourFunctionHere(Data[[i]])

If it makes for easier reading, you could first assign the subset of the column names to a vector and then use that in the for() loop, rather than the above.

HTH,

Marc Schwartz
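Marc's last suggestion, storing the matching names in a vector first, might look like the sketch below. It assumes a small hypothetical Data frame, and max() is only a stand-in for YourFunctionHere(). The results are kept in a list named by variable rather than just printed.

```r
# Hypothetical data frame mixing ID/name columns with Height columns.
Data <- data.frame(PersonalID = 1:3,
                   LastName   = c("Ames", "Bell", "Cole"),
                   Height.1   = c(74, 76, 75),
                   Height.10  = c(137, 139, 138),
                   Height.20  = c(170, 174, 172))

# Step 1: store the matching column names in a vector.
height_vars <- grep("Height", names(Data), value = TRUE)

# Step 2: loop over the names, keeping each result in a named list.
# max() stands in for whatever per-variable function is needed.
results <- vector("list", length(height_vars))
names(results) <- height_vars
for (vn in height_vars) {
  results[[vn]] <- max(Data[[vn]])
}
str(results)
```

Pre-allocating the list and naming it up front means each result can be retrieved later by variable name, e.g. results[["Height.10"]].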
"Uwe Ligges" <ligges at statistik.uni-dortmund.de> wrote in message news:4458C63A.7080403 at statistik.uni-dortmund.de...
> vnames <- paste("Height", 1:20, sep=".")

Interesting, but not suitable: it creates a name even if no such variable exists. I have 6000 variables, and the numeric component of the variable names runs from 0 to about 10000000, so the output from the for loop would be largely empty space. Thanks for teaching me paste(), which will come in useful later. I saw a response that makes use of names() or colnames().

-- 
Farrel Buchinsky, MD
Pediatric Otolaryngologist
Allegheny General Hospital
Pittsburgh, PA
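A sketch of two ways around the problem raised here, using a hypothetical data frame whose Height columns are deliberately out of order. The first filters the paste() candidates down to names that actually exist, using %in%. With suffixes up to about 10000000 that would mean building millions of candidate strings, so the second approach, which starts from names(Data) and extracts the numeric suffix with sub(), is likely the lighter option for 6000 variables.

```r
# Hypothetical data frame with Height columns in no particular order.
Data <- data.frame(PersonalID = 1:2,
                   Height.20  = c(172, 174),
                   Height.5   = c(110, 112),
                   Height.1   = c(74, 76),
                   Height.12  = c(150, 152))

# Approach 1: build candidate names, then keep only those that exist.
candidates <- paste("Height", 1:20, sep = ".")
existing   <- candidates[candidates %in% names(Data)]
print(existing)  # "Height.1" "Height.5" "Height.12" "Height.20"

# Approach 2: start from the real names and order them by numeric suffix.
hv  <- grep("^Height\\.[0-9]+$", names(Data), value = TRUE)
num <- as.numeric(sub("^Height\\.", "", hv))
print(hv[order(num)])  # "Height.1" "Height.5" "Height.12" "Height.20"
```

A plain sort(hv) would place "Height.5" after "Height.20", because the names are compared as text. Ordering by the extracted number avoids that.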