Ferry
2009-Apr-09 22:30 UTC
[R] how to automatically select certain columns using for loop in dataframe
Hi,

I am trying to display / print certain columns in my data frame that share a certain condition (for example, part of the column name). I am using a for loop, as follows:

# below is the sample data structure
all.data <- data.frame( NUM_A = 1:5, NAME_A = c("Andy", "Andrew", "Angus", "Alex", "Argo"),
                        NUM_B = 1:5, NAME_B = c(NA, "Barn", "Bolton", "Bravo", NA),
                        NUM_C = 1:5, NAME_C = c("Candy", NA, "Cecil", "Crayon", "Corey"),
                        NUM_D = 1:5, NAME_D = c("David", "Delta", NA, NA, "Dummy") )

col_names <- c("A", "B", "C", "D")

> all.data
  NUM_A NAME_A NUM_B NAME_B NUM_C NAME_C NUM_D NAME_D
1     1   Andy     1   <NA>     1  Candy     1  David
2     2 Andrew     2   Barn     2   <NA>     2  Delta
3     3  Angus     3 Bolton     3  Cecil     3   <NA>
4     4   Alex     4  Bravo     4 Crayon     4   <NA>
5     5   Argo     5   <NA>     5  Corey     5  Dummy

Then, for each entry in col_names, I want to display the matching columns:

for (each_name in col_names) {
    sub.data <- subset( all.data,
                        !is.na( paste("NAME_", each_name, sep = '') ),
                        select = c( paste("NUM_", each_name, sep = ''),
                                    paste("NAME_", each_name, sep = '') ) )
    print(sub.data)
}

The "incorrect" result:

  NUM_A NAME_A
1     1   Andy
2     2 Andrew
3     3  Angus
4     4   Alex
5     5   Argo
  NUM_B NAME_B
1     1   <NA>
2     2   Barn
3     3 Bolton
4     4  Bravo
5     5   <NA>
  NUM_C NAME_C
1     1  Candy
2     2   <NA>
3     3  Cecil
4     4 Crayon
5     5  Corey
  NUM_D NAME_D
1     1  David
2     2  Delta
3     3   <NA>
4     4   <NA>
5     5  Dummy

What I want to achieve is that the result only displays the NUM and NAME rows where NAME is not NA. Here, the NA can be NULL, or zero (or other specific values).

The "correct" result:

  NUM_A NAME_A
1     1   Andy
2     2 Andrew
3     3  Angus
4     4   Alex
5     5   Argo
  NUM_B NAME_B
2     2   Barn
3     3 Bolton
4     4  Bravo
  NUM_C NAME_C
1     1  Candy
3     3  Cecil
4     4 Crayon
5     5  Corey
  NUM_D NAME_D
1     1  David
2     2  Delta
5     5  Dummy

I am guessing that I am not using the correct type in the following statement (within the subset in the loop):

!is.na( paste("NAME_", each_name, sep = '') )

But then, I might be using a completely wrong approach.

Any idea is definitely appreciated.

Thank you,

Ferry
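A minimal sketch of what is likely going on in the statement Ferry suspects, reduced to the B columns of the sample data. paste() returns the character string "NAME_B" rather than the column of that name, so is.na() tests the string itself. The names name_col and keep are introduced here for illustration only and are not part of the original post.

```r
# Reduced sample data: only the B columns from the post
all.data <- data.frame(
  NUM_B  = 1:5,
  NAME_B = c(NA, "Barn", "Bolton", "Bravo", NA)
)

each_name <- "B"
name_col  <- paste("NAME_", each_name, sep = "")   # the string "NAME_B"

# is.na() is applied to the string, not to the column it names,
# so the condition is a single TRUE that subset() recycles over every row.
string_test <- !is.na(name_col)
print(string_test)

# Looking the column up by name tests its values instead.
keep     <- !is.na(all.data[[name_col]])
sub.data <- all.data[keep, c("NUM_B", name_col)]
print(sub.data)
```

With the column looked up through [[ ]], only the rows whose NAME_B value is present are kept.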
milton ruser
2009-Apr-10 03:30 UTC
[R] how to automatically select certain columns using for loop in dataframe
Hi Ferry,

It is not so elegant, but you can try:

for (each_name in col_names) {
    # this first condition tests the pasted string, not the column,
    # so it keeps every row; it only selects the two columns
    sub.data <- subset( all.data,
                        !is.na( paste("NAME_", each_name, sep = '') ),
                        select = c( paste("NUM_", each_name, sep = ''),
                                    paste("NAME_", each_name, sep = '') ) )
    # drop the rows where the NAME column (column 2) is NA
    sub.data.2 <- subset(sub.data, !is.na(sub.data[,2]))
    print(sub.data.2)
}
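The two subset() calls above could probably be folded into one indexing step. The sketch below is one possible way to do that, and it also covers the "zero or other specific values" case mentioned in the original question. The helper is_missing and its sentinels argument are hypothetical names made up for this illustration, and the sentinel values "" and "0" are assumptions, not something the thread specifies.

```r
all.data <- data.frame(
  NUM_A = 1:5, NAME_A = c("Andy", "Andrew", "Angus", "Alex", "Argo"),
  NUM_B = 1:5, NAME_B = c(NA, "Barn", "Bolton", "Bravo", NA),
  NUM_C = 1:5, NAME_C = c("Candy", NA, "Cecil", "Crayon", "Corey"),
  NUM_D = 1:5, NAME_D = c("David", "Delta", NA, NA, "Dummy"),
  stringsAsFactors = FALSE
)
col_names <- c("A", "B", "C", "D")

# Hypothetical helper: TRUE where a value counts as "missing".
# `sentinels` lists extra values to treat like NA (assumed for illustration).
is_missing <- function(x, sentinels = character(0)) {
  is.na(x) | x %in% sentinels
}

results <- list()
for (each_name in col_names) {
  num_col  <- paste("NUM_",  each_name, sep = "")
  name_col <- paste("NAME_", each_name, sep = "")
  # Test the values of the NAME column, looked up by its name
  keep <- !is_missing(all.data[[name_col]], sentinels = c("", "0"))
  results[[each_name]] <- all.data[keep, c(num_col, name_col)]
  print(results[[each_name]])
}
```

Storing each subset in a list keeps the pieces available after the loop instead of only printing them.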
Petr PIKAL
2009-Apr-10 07:10 UTC
[R] Re: how to automatically select certain columns using for loop in dataframe
Hi

I do not like complicated paste cycles too much, so I would prefer

for (i in 1:4) print(na.omit(all.data[ , last.char(names(all.data)) %in% col_names[i] ]))

with a last.char function like this (define it before running the loop):

last.char <- function(x) substring(x, first=nchar(x), last=nchar(x))

Regards
Petr
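A variation on the same idea, sketched under a few assumptions: the suffix is matched with a regular expression instead of the last character, so it would also work for suffixes longer than one letter, provided they contain no regex metacharacters. The function name select_by_suffix is hypothetical. Note that na.omit() drops a row when any of the selected columns is NA, not only the NAME column; for the sample data that gives the same result because the NUM columns are complete.

```r
all.data <- data.frame(
  NUM_A = 1:5, NAME_A = c("Andy", "Andrew", "Angus", "Alex", "Argo"),
  NUM_B = 1:5, NAME_B = c(NA, "Barn", "Bolton", "Bravo", NA),
  NUM_C = 1:5, NAME_C = c("Candy", NA, "Cecil", "Crayon", "Corey"),
  NUM_D = 1:5, NAME_D = c("David", "Delta", NA, NA, "Dummy"),
  stringsAsFactors = FALSE
)
col_names <- c("A", "B", "C", "D")

# Hypothetical helper: keep the columns whose name ends in "_<suffix>",
# then drop every row that has an NA in any of those columns.
select_by_suffix <- function(data, suffix) {
  cols <- grepl(paste("_", suffix, "$", sep = ""), names(data))
  na.omit(data[, cols, drop = FALSE])
}

# One data frame per suffix, collected in a named list
results <- lapply(col_names, function(s) select_by_suffix(all.data, s))
names(results) <- col_names

for (r in results) print(r)
```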