Steve_Friedman at nps.gov
2009-Apr-16 14:35 UTC
[R] Reading in a large number of dbf files
Good morning,
This is not a stats question per se but a data management and
lattice plotting problem. I apologize in advance if this is an inappropriate
question for this gracious group.
I need to bring approximately 100 *.dbf files into R, but I'm having
difficulty understanding several examples I've tracked down regarding this
procedure and could benefit from your suggestions.
One example I've found does the following:
DF <- lapply(dir(pattern="file.*\\.txt"), read.table,
             sep=";", header=TRUE)
names(DF) <- paste("data", seq_along(DF), sep = "")
This solution will not work for me for at least 2 reasons:
1) I need to modify the files after I import them by adding three new
parameters to each file prior to combining them into a common data.frame.
For example, one of my files is called SRF_DryDry_stats.dbf. The name of
the file tells me that it refers to two conditions: 1) SRF = an indicator
region field, and 2) DryDry = dry hydrological conditions. I also know that
the data refer to a particular species.
The data in the file include some general summary statistics (Min,
Max, Range, Mean, and STD). After modifying the file, I need a species
field, the SRF field and the hydro condition parameters in the file. After
this modification, I need to "cbind" these files into a common file.
2) The goal is to use the common file to produce a series of lattice
barchart graphs, using the three new parameters as factors and plotting
some of the statistics in the lattice call statements.
Is there a clean way of accomplishing these tasks or should the brute force
approach be taken?
Steve Friedman Ph. D.
Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034
Steve_Friedman at nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147
A process like the following is how I would do it:
inputData <- lapply(listOfFiles, function(.file){
    input <- read.table(.file, ....whatever other parameters...)
    # now do the modifications that you need
    ....
    input  # return the updated dataframe
})
# combine into one dataframe
inputData <- do.call(rbind, inputData)
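A fuller sketch of the same idea, adapted to .dbf input, might look like the following. It assumes the files are dBase files readable by read.dbf() from the foreign package and that every file name follows the <region>_<hydro>_stats.dbf pattern of SRF_DryDry_stats.dbf. The helper read_tagged_dbf, the new column names (region, hydro, species), the second file name, the species label and the example values are all hypothetical, made up only so the code runs on its own.

```r
library(foreign)  # provides read.dbf() and write.dbf()

# Hypothetical helper: read one .dbf file and tag it with fields parsed
# from its name. Assumes names look like <region>_<hydro>_stats.dbf.
read_tagged_dbf <- function(path, species) {
  dat   <- read.dbf(path, as.is = TRUE)
  parts <- strsplit(basename(path), "_", fixed = TRUE)[[1]]
  dat$region  <- parts[1]   # e.g. "SRF"
  dat$hydro   <- parts[2]   # e.g. "DryDry"
  dat$species <- species    # known from outside the file, so passed in
  dat
}

# Write two small example files to a temporary directory so the sketch
# is self-contained. The values are placeholders, not real data.
demo_dir <- file.path(tempdir(), "dbf_demo")
dir.create(demo_dir, showWarnings = FALSE)
stats <- data.frame(MIN = 0, MAX = 10, RANGE = 10, MEAN = 4.2, STD = 1.5)
write.dbf(stats, file.path(demo_dir, "SRF_DryDry_stats.dbf"))
write.dbf(stats, file.path(demo_dir, "SRF_WetWet_stats.dbf"))

# Read every .dbf file, tag it, and stack the results row-wise.
files    <- list.files(demo_dir, pattern = "\\.dbf$", full.names = TRUE)
tagged   <- lapply(files, read_tagged_dbf, species = "example_species")
combined <- do.call(rbind, tagged)
```

Because each file contributes rows with the same columns, rbind rather than cbind is what stacks them into one data frame. From there, a call along the lines of barchart(MEAN ~ factor(hydro) | factor(region), data = combined) from the lattice package would be one possible starting point for the plots.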
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
Steve,
If the .dbf files are dBase files, then, generalizing, there are 2 series
of dBase .dbf files (leaving aside SQL-type dbf files):
1. dBase III, from when Borland had dBase, and
2. dBase 2000, produced by dBase Inc.
If they are dBase III (i.e. can be imported into Excel), you can use the
foreign package to import them:
require(foreign)
# get a list of files
listd <- list.files(.....)
# loop or otherwise
for (j in seq_along(listd)) {
    # Example for 1 file (you will have to use something different for > 1)
    x <- read.dbf(listd[j], as.is = TRUE)
}
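For more than one file, one option is to store each result in a list instead of overwriting x on every pass. The sketch below assumes the files can be read by read.dbf(); the directory, file names and values are hypothetical and exist only to make the example runnable.

```r
library(foreign)  # provides read.dbf() and write.dbf()

# Create two placeholder .dbf files in a temporary directory.
demo_dir <- file.path(tempdir(), "dbf_loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.dbf(data.frame(MEAN = 1.5, STD = 0.2), file.path(demo_dir, "a_stats.dbf"))
write.dbf(data.frame(MEAN = 2.5, STD = 0.4), file.path(demo_dir, "b_stats.dbf"))

# get a list of files
listd <- list.files(demo_dir, pattern = "\\.dbf$", full.names = TRUE)

# Preallocate a list with one slot per file, named after the file.
tables <- vector("list", length(listd))
names(tables) <- basename(listd)

# Read each file into its own slot so nothing is overwritten.
for (j in seq_along(listd)) {
  tables[[j]] <- read.dbf(listd[j], as.is = TRUE)
}
```

Each data frame can then be picked out by file name, for example tables[["a_stats.dbf"]], and modified before the pieces are combined.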
Regards
Duncan Mackay
Department of Agronomy and Soil Science
University of New England
ARMIDALE NSW 2351
Email home: mackay at northnet.com.au