Hello:

I need to take a species-sample matrix and transpose it to the format used by PC-ORD for analysis. Unfortunately, the number of species is very large (>5000), so this cannot be done simply in an application like Excel, which has a 256-column limit. So I wrote relatively simple code in R that I hoped would do this (appended below). But there are glitches.

The format needed for PC-ORD (where "NA" shows an empty cell):

    NA,3,sites,NA
    NA,3,species,NA
    NA,Q,Q,Q
    NA,sp1,sp2,sp3
    site1,1,0,0
    site2,0,1,2
    site3,0,3,0

The two filled cells in the first row give the number of samples (rows), the second row gives the number of species (columns), the third row gives the variable type (Q = quantitative), and the fourth row holds the column headers (species names). So one can build a transposable matrix in a spreadsheet, with the 5000+ species as rows:

    NA,NA,NA,NA,site1,site2,site3
    3,3,Q,sp1,1,0,0
    sites,species,Q,sp2,0,1,3
    NA,NA,Q,sp3,0,2,0

It is important that the data file written out is totally clean and ready to go for PC-ORD, because I cannot open and edit it in a spreadsheet. The code performs the transpose and writes the file, but now the former row IDs form the first row of the new file (NA,1,2,3), and the four formerly empty cells at the top of column 1 are "X", "X.1", "X.2" and "X.3". I'd like to delete the first row and blank out the first 4 values of column 1, without deleting the column:

    NA,1,2,3
    X,3,islands,NA
    X.1,3,species,NA
    X.2,Q,Q,Q
    X.3,sp1,sp2,sp3
    site1,1,0,0
    site2,0,1,2
    site3,0,3,0

I have tried various tricks that I will not list/belabor here (various col.names, row.names, header, Extract, etc. commands). Any further hints on code that will either stop R from adding these, or strip them at the end?
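If the site-by-species table is already in R (sites as rows, species as columns), the four header rows described above can also be written directly, with no spreadsheet transposition at all. A minimal base-R sketch: `write_pcord()` is a hypothetical helper, and it assumes that every variable is quantitative ("Q"), that the labels are "sites" and "species" as in the example, and that the layout above is exactly what PC-ORD expects.

```r
# Hypothetical helper: write a sites x species matrix in the PC-ORD layout
# described above (two count rows, a row of "Q" type codes, a species-name row).
# Assumes every variable is quantitative and there are at least two species.
write_pcord <- function(mat, file, row_label = "sites", col_label = "species") {
  stopifnot(is.matrix(mat), ncol(mat) >= 2,
            !is.null(rownames(mat)), !is.null(colnames(mat)))
  pad <- rep("", ncol(mat) - 2)          # empty cells to fill out the count rows
  header <- rbind(
    c("", nrow(mat), row_label, pad),    # number of samples (rows)
    c("", ncol(mat), col_label, pad),    # number of species (columns)
    c("", rep("Q", ncol(mat))),          # variable type codes
    c("", colnames(mat))                 # species names
  )
  body <- cbind(rownames(mat), mat)      # site name followed by its abundances
  write.table(rbind(header, body), file = file, sep = ",", quote = FALSE,
              row.names = FALSE, col.names = FALSE, na = "")
}

# The 3-site example from the post
site_by_species <- matrix(c(1, 0, 0,
                            0, 1, 2,
                            0, 3, 0),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(c("site1", "site2", "site3"),
                                          c("sp1", "sp2", "sp3")))
out <- tempfile(fileext = ".csv")
write_pcord(site_by_species, out)
```

Building the whole file as one character matrix and writing it with `row.names = FALSE, col.names = FALSE` means R adds nothing of its own to the output.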
(PS, yes, I can learn how to do my multivariate analyses in R and skip PC-ORD, but I am time-limited on this one, and it seems that this code could be very useful in numerous ways.)

Many thanks for the help,
Dan Gruner
(Windows XP, R version 2.2)

##transpose datasets to convert to PC-ORD format

data <- read.csv("data.csv", header = TRUE, as.is = TRUE,
                 strip.white = TRUE, na.strings = "NA")
data <- as.matrix(data)
data.trans <- t(data)
write.csv(data.trans, file = "datatransp.csv",
          quote = FALSE, na = "")

*******************************

Daniel S. Gruner, Postdoctoral Scholar
Bodega Marine Lab, University of California -- Davis
PO Box 247, 2099 Westside Rd
Bodega Bay, CA 94923-0247
(o) 707.875.2022 (f) 707.875.2009 (m) 707.338.5722
email: dsgruner_at_ucdavis.edu
http://www.bml.ucdavis.edu/facresearch/gruner.html
http://www.hawaii.edu/ant/
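The X, X.1, X.2, X.3 labels come from `read.csv(header = TRUE)`: the blank header cells are turned into syntactic, unique names by `make.names()` (the default `check.names = TRUE`), and `t()` then carries those column names over as row names. The NA,1,2,3 line is the column-name row that `write.csv()` always writes, which after the transpose holds the original row numbers. One way to avoid both is to read the file with `header = FALSE`, so the site names stay ordinary data and end up in the first column after transposing, then write without any names. A minimal sketch; `transpose_for_pcord()` is a hypothetical name, and the input layout is assumed to match the transposable matrix in the post.

```r
# Hypothetical helper: transpose the spreadsheet-style file without letting R
# invent names. Assumes the input layout shown in the post (species as rows).
transpose_for_pcord <- function(infile, outfile) {
  # header = FALSE: the first line (blank cells + site names) is read as data,
  #   so make.names() never produces X, X.1, X.2, X.3
  # colClasses = "character": cells keep their exact text and blanks stay ""
  raw <- read.csv(infile, header = FALSE, colClasses = "character",
                  strip.white = TRUE)
  trans <- t(as.matrix(raw))
  # drop the row names (V1, V2, ...) and the column-name row (the old NA,1,2,3)
  write.table(trans, file = outfile, sep = ",", quote = FALSE, na = "",
              row.names = FALSE, col.names = FALSE)
}

# Usage with the transposable matrix from the post
infile <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
writeLines(c(",,,,site1,site2,site3",
             "3,3,Q,sp1,1,0,0",
             "sites,species,Q,sp2,0,1,3",
             ",,Q,sp3,0,2,0"), infile)
transpose_for_pcord(infile, outfile)
```

With real files this would be `transpose_for_pcord("data.csv", "datatransp.csv")`.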
I do not know exactly what you are looking for, but it seems that you are writing the column names (which become row names) when transposing the data. To fix this, try using

write.table(..., sep=",", row.names=F)

Jean
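A caution on the `write.table()` route: with `header = TRUE`, the site names exist only as row names of the transposed matrix, so `row.names = FALSE` on its own would drop them along with X, X.1, and so on. A minimal sketch of the "strip them at the end" variant instead keeps the row names, blanks the first four, and suppresses the column-name row. `strip_and_write()` and `n_label_cols` are hypothetical names, and the sketch assumes the first four input columns are the PC-ORD label columns.

```r
# Hypothetical helper: keep the original header = TRUE read, then clean up.
# Assumes the first n_label_cols columns of the input are the label columns.
strip_and_write <- function(infile, outfile, n_label_cols = 4) {
  # colClasses = "character" keeps each cell's text as-is; otherwise
  # as.matrix() would format() numeric columns and could pad them with spaces
  data <- read.csv(infile, header = TRUE, colClasses = "character",
                   strip.white = TRUE)
  data.trans <- t(as.matrix(data))
  # X, X.1, ... were invented by read.csv() for the blank header cells
  rownames(data.trans)[seq_len(n_label_cols)] <- ""
  # keep row names (they hold the site labels), drop the column-name row
  write.table(data.trans, file = outfile, sep = ",", quote = FALSE, na = "",
              row.names = TRUE, col.names = FALSE)
}

# Usage with the transposable matrix from the post
infile <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
writeLines(c(",,,,site1,site2,site3",
             "3,3,Q,sp1,1,0,0",
             "sites,species,Q,sp2,0,1,3",
             ",,Q,sp3,0,2,0"), infile)
strip_and_write(infile, outfile)
```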
Daniel,

I can help somewhat, I think. PC-ORD also allows data input in what it calls "database" format, where each row is "sample, taxon, abundance". There are as many rows per sample as there are non-zero species, and only three columns. To get your taxon data.frame (currently samples as rows, species as columns, called data in your example) into that format, try

dematrify(data, filename = 'whatever.csv')

with the function pasted below (watch out for email-altered line breaks). That will create a CSV file you can import into PC-ORD.

Just to encourage you a little, you really should try the ecology packages in R. See packages vegan, ade4, and labdsv, for example, and take a look at

http://ecology.msu.montana.edu/labdsv/R

Dave R.

*********************************************************************

dematrify <- function (df, filename = NULL, sep = ",")
{
    # row/column positions of every non-zero abundance
    tmp <- which(df > 0, arr.ind = TRUE)
    samples <- row.names(df)[tmp[, 1]]
    taxon <- names(df)[tmp[, 2]]
    abund <- as.matrix(df)[tmp]
    if (is.null(filename)) {
        # no file given: return a samples/taxon/abund data frame sorted by sample
        tmp2 <- data.frame(samples, taxon, abund)
        return(tmp2[order(tmp2$samples), ])
    } else {
        # one "sample,taxon,abundance" line per record, sorted as text;
        # writeLines() avoids the leading space cat() adds to every line after the first
        writeLines(sort(paste(samples, taxon, abund, sep = sep)), filename)
    }
}

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                                     office 406-994-4548
Professor and Head                                   FAX 406-994-3190
Department of Ecology                                email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460
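Because the "database" file cannot be eyeballed easily for 5000+ species, one way to sanity-check it before importing is to rebuild the wide samples-by-species table from the three columns and compare it with the original. A minimal base-R sketch, assuming a headerless file with columns in sample, taxon, abundance order (the order the dematrify function above writes); the tiny example table and the use of `xtabs()` are illustrative choices, not part of Dave's function.

```r
# Illustrative 3-site x 3-species table (the example from the thread)
wide <- data.frame(sp1 = c(1, 0, 0), sp2 = c(0, 1, 3), sp3 = c(0, 2, 0),
                   row.names = c("site1", "site2", "site3"))

# Long ("database") layout: one row per non-zero cell
idx <- which(wide > 0, arr.ind = TRUE)
long <- data.frame(sample = rownames(wide)[idx[, 1]],
                   taxon  = colnames(wide)[idx[, 2]],
                   abund  = as.matrix(wide)[idx])

# Round trip through a headerless CSV, like the file sent to PC-ORD
f <- tempfile(fileext = ".csv")
write.table(long, f, sep = ",", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
back <- read.csv(f, header = FALSE,
                 col.names = c("sample", "taxon", "abund"))

# xtabs() sums abundances into a sample x taxon table; absent pairs become 0
rebuilt <- xtabs(abund ~ sample + taxon, data = back)
```

If every sample and species has at least one non-zero entry, `rebuilt` should match the original table cell for cell; a sample or species with no non-zero entries simply has no rows in the long file and will be missing from `rebuilt`.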