Frank Schäffer
2009-Aug-09 15:29 UTC
[R] howto get the number of columns and column names of multiply data frames
Hi,

I've read several files with measurements into R data frames (this works flawlessly). Each data frame is named after the location of measurement and contains hundreds of rows and about 50 columns, like this:

dataframe1
    date  measurement_1  ....  measurement_n
1
2
3
..
n

For further processing I need to check whether ncol and colnames are the same for all data frames. I also need to add a new column to each data frame that contains the name of the data frame, so that this column can be treated as a factor in later processing (after merging some selected data frames into one).

I tried

for (i in 1:length(ls())) {
  print(ncol(ls()[i]))
}

but this does not work, because R returns a "character" for ls()[i] and therefore NULL as the result.
Reading the output of ls() into a list also does not work.

How can I accomplish this task?

Best regards and thanks,
Frank
Steve Lianoglou
2009-Aug-09 18:47 UTC
[R] howto get the number of columns and column names of multiply data frames
Hi,

On Aug 9, 2009, at 11:29 AM, Frank Schäffer wrote:

> I've read several files with measurements into R data frames (this works
> flawlessly). Each data frame is named after the location of measurement
> and contains hundreds of rows and about 50 columns [...]

Just as an aside, it's considered more R-idiomatic to store all of these tables in a list (of tables) and access them as mydata[[1]], mydata[[2]], ..., mydata[[n]].

Assuming the data files are 'filename.1.txt', 'filename.2.txt', etc., you might do this like so:

mydata <- lapply(paste('filename', 1:n, 'txt', sep='.'), read.table, header=TRUE, sep=...)

To test that all colnames are the same, you could do something like:

names1 <- colnames(mydata[[1]])
# TRUE if every other table contains all column names of the first one
all(sapply(2:n, function(i) length(intersect(names1, colnames(mydata[[i]]))) == length(names1)))

> For further processing I need to check whether ncol and colnames are
> the same for all data frames.
> [...]
> I tried
>
> for (i in 1:length(ls())) {
>   print(ncol(ls()[i]))
> }
>
> but this does not work, because R returns a "character" for ls()[i] and
> therefore NULL as the result.
> Reading the output of ls() into a list also does not work.
>
> How can I accomplish this task?

If you still want to do it this way, see ?get. For example:

for (varName in paste('dataframe', 1:n, sep='')) {
  # get() looks up the object whose name is stored in varName
  cat(colnames(get(varName)), '\n')
}

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
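A minimal sketch of the list-based approach, extended to the part of the question about adding a column with the data frame's name. The example data, the names site_a and site_b, and the column name location are made up for illustration. It uses identical() for a strict comparison (same names, same order), which is stricter than the intersect() check above.

```r
# Hypothetical example data; in practice the list would come from lapply(..., read.table).
mydata <- list(
  site_a = data.frame(date = 1:3, measurement_1 = c(0.1, 0.2, 0.3)),
  site_b = data.frame(date = 1:2, measurement_1 = c(1.5, 1.7))
)

# Number of columns per data frame, as a named integer vector.
n_cols <- sapply(mydata, ncol)

# TRUE only if every data frame has the same column names in the same order.
ref_names <- colnames(mydata[[1]])
same_cols <- all(sapply(mydata, function(df) identical(colnames(df), ref_names)))

# Add a 'location' column (hypothetical name) holding each list element's name.
labelled <- Map(function(df, nm) { df$location <- nm; df }, mydata, names(mydata))

# Stack a selection of the data frames and treat the new column as a factor.
combined <- do.call(rbind, labelled[c("site_a", "site_b")])
combined$location <- factor(combined$location)
```

Because the list is named, the name of each data frame is available as names(mydata), so no lookup via ls() or get() is needed.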
Don MacQueen
2009-Aug-09 20:26 UTC
[R] howto get the number of columns and column names of multiply data frames
## You can use get()
for (i in 1:n) {
  nm <- paste('dataframe', i, sep='')
  cat(ncol(get(nm)), 'columns in', nm, '\n')
}

## or
nms <- ls(pattern='dataframe')
for (nm in nms) {
  cat(ncol(get(nm)), 'columns in', nm, '\n')
}

(Assuming I have balanced parentheses, that is -- my email software doesn't check that like Emacs does!)

Storing the data frames as elements of a list, as Steve Lianoglou suggested, lets you avoid using the get() function.

You could also use the count.fields() function to check whether the files have the correct number of columns even before you read the data in. Or make a pass through the files reading in only the first line as data, and comparing those as data rather than as a names attribute of a data frame.

-Don

--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
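The two suggestions above can be sketched as follows, under stated assumptions: the objects dataframe1 and dataframe2 and the tab-separated temporary file are hypothetical stand-ins for the real data. mget() is used here as a variant of get() that fetches several objects at once into a named list; it is not something the replies above prescribe.

```r
# Hypothetical workspace objects following the naming used in the thread.
dataframe1 <- data.frame(date = 1:3, measurement_1 = c(0.1, 0.2, 0.3))
dataframe2 <- data.frame(date = 1:2, measurement_1 = c(1.5, 1.7))

# Collect all matching objects from the current environment into a named list.
nms <- ls(pattern = "^dataframe[0-9]+$", envir = environment())
dfs <- mget(nms, envir = environment())
col_counts <- sapply(dfs, ncol)

# Check a file's field count per line before reading it in.
# The file is written here only so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
write.table(dataframe1, tmp, sep = "\t", row.names = FALSE, quote = FALSE)
fields <- count.fields(tmp, sep = "\t")
consistent <- all(fields == fields[1])
unlink(tmp)
```

count.fields() returns one count per line of the file, including the header line, so a file with a ragged row shows up as a value that differs from the rest.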