Steve_Friedman at nps.gov
2009-Apr-16 14:35 UTC
[R] Reading in a large number of dbf files
Good morning,

This question is not a stats question per se, but a data management and lattice plotting problem. I apologize now if I'm asking an inappropriate question of this gracious group.

I need to bring approximately 100 *.dbf files into R, but I'm having difficulty understanding several examples I've tracked down for this procedure and could benefit from your suggestions. One example I've found does the following:

    DF <- lapply(dir(pattern = "file.*\\.txt"), read.table, sep = ";", header = TRUE)
    names(DF) <- paste("data", seq_along(DF), sep = "")

This solution will not work for me for at least two reasons:

1) I need to modify the files after I import them by adding three new parameters to each file before combining them into a common data.frame. For example, one of my files is called SRF_DryDry_stats.dbf. The name of the file tells me that it refers to two conditions: (a) SRF = an indicator region field, and (b) DryDry = dry hydrological conditions. I also know that the data refer to a particular species.

The data in the file include some general summary statistics (Min, Max, Range, Mean, and STD). After modifying the file, I need a species field, the SRF field and the hydro condition parameter in the file. After this modification, I need to combine ("rbind") these files into a common file.

2) The goal is to use the common file to produce a series of lattice barchart graphs, using the three new parameters as factors and plotting some of the statistics in the lattice calls.

Is there a clean way of accomplishing these tasks, or should the brute-force approach be taken?

Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034
Steve_Friedman at nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147
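A minimal R sketch of the file-name step in point 1, assuming every file name follows the `<region>_<hydro>_stats.dbf` pattern of SRF_DryDry_stats.dbf. The function `parse_dbf_name()` and the species label `"SpeciesA"` are hypothetical; because the species is not encoded in the name, it is passed in as an argument.

```r
# Split a name like "SRF_DryDry_stats.dbf" into its factor levels.
# Assumes the (hypothetical) naming convention <region>_<hydro>_stats.dbf.
parse_dbf_name <- function(path, species) {
  # Drop any directory part and the .dbf extension (case-insensitive)
  stem  <- sub("\\.dbf$", "", basename(path), ignore.case = TRUE)
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # One-row data frame holding the three new parameters
  data.frame(species = species,
             region  = parts[1],
             hydro   = parts[2],
             stringsAsFactors = TRUE)
}

parse_dbf_name("data/SRF_DryDry_stats.dbf", species = "SpeciesA")
```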
A process like the following is how I would do it:

    inputData <- lapply(listOfFiles, function(.file) {
        input <- read.table(.file)  # ...plus whatever other parameters you need
        # now do the modifications that you need
        # ....
        input  # return the updated data frame
    })
    # combine into one data frame
    inputData <- do.call(rbind, inputData)

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
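Jim's lapply()/do.call(rbind, ...) pattern, filled in as a runnable sketch. Assumptions, all hypothetical: small tab-separated text files stand in for the .dbf files (for real dBase files, read.dbf() from the foreign package would take the place of read.table()), the file names and numbers are made-up demo values, and "SpeciesA" stands in for the known species. The last step passes the combined data frame to lattice's barchart() with the new columns as factors.

```r
library(lattice)

# Hypothetical demo input: three tiny files in a temp directory
demo_dir <- file.path(tempdir(), "stats_demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("SRF_DryDry", "SRF_WetWet", "TSL_DryDry")
for (i in seq_along(demo_names)) {
  toy <- data.frame(Min = i, Max = i + 4, Range = 4, Mean = i + 2, STD = 1)
  write.table(toy, file.path(demo_dir, paste0(demo_names[i], "_stats.txt")),
              sep = "\t", row.names = FALSE)
}

listOfFiles <- list.files(demo_dir, pattern = "_stats\\.txt$", full.names = TRUE)

inputData <- lapply(listOfFiles, function(.file) {
  input <- read.table(.file, header = TRUE, sep = "\t")
  # Derive two factors from the name, e.g. "SRF_DryDry_stats.txt"
  parts <- strsplit(basename(.file), "_", fixed = TRUE)[[1]]
  input$region  <- parts[1]
  input$hydro   <- parts[2]
  input$species <- "SpeciesA"  # hypothetical: known species, not in the name
  input                        # return the updated data frame
})

# Stack the per-file data frames row-wise into one data frame
allStats <- do.call(rbind, inputData)
factorCols <- c("region", "hydro", "species")
allStats[factorCols] <- lapply(allStats[factorCols], factor)

# One panel per region, one bar per hydrological condition
p <- barchart(Mean ~ hydro | region, data = allStats, groups = species)
print(p)
```

Converting the new columns to factors only after stacking gives each factor the full set of levels found across all files.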
Steve,

This assumes the .dbf files are dBase files. Generalizing, non-SQL dBase .dbf files come in two series:

1) dBase III, from when Borland had dBase, and
2) dBase 2000, produced by dBase Inc.

If they are dBase III (i.e. they can be imported into Excel), you can use the foreign package to import them:

    require(foreign)
    # get a list of files
    listd <- list.files(.....)
    # loop or otherwise
    for (j in seq_along(listd)) {
      # Example for 1 file (you will have to use something different for > 1,
      # since x is overwritten on every pass)
      x <- read.dbf(listd[j], as.is = TRUE)
    }

Regards

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email home: mackay at northnet.com.au
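A sketch of Duncan's read.dbf() loop that keeps every file instead of overwriting `x` on each pass. It writes two tiny demo .dbf files with write.dbf() (also in foreign) so it runs as-is; `dbf_dir`, the file contents, and the added `file` column are illustrative assumptions, and on real data `dbf_dir` would point at the directory holding the ~100 files.

```r
library(foreign)  # provides read.dbf() and write.dbf()

# Hypothetical demo files; replace dbf_dir with the real directory
dbf_dir <- file.path(tempdir(), "dbf_demo")
dir.create(dbf_dir, showWarnings = FALSE)
write.dbf(data.frame(Min = 1, Max = 5, Mean = 3),
          file.path(dbf_dir, "SRF_DryDry_stats.dbf"))
write.dbf(data.frame(Min = 2, Max = 6, Mean = 4),
          file.path(dbf_dir, "SRF_WetWet_stats.dbf"))

listd <- list.files(dbf_dir, pattern = "\\.dbf$", full.names = TRUE)

# Store each table in a list slot rather than overwriting one variable
tables <- vector("list", length(listd))
for (j in seq_along(listd)) {
  tab <- read.dbf(listd[j], as.is = TRUE)
  tab$file <- basename(listd[j])  # remember which file each row came from
  tables[[j]] <- tab
}

# Stack all tables into a single data frame
combined <- do.call(rbind, tables)
combined
```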