Hi R users:
I have the British Household Panel Survey (BHPS) in .tab format. I want to
feed it through the Amelia package (which will be an ‘interesting’ job in
itself).
But first I need to convert the various types of missing value (from about
-9 to -1) to a more generic ‘NA’ code.
I’ve written the following function to do this:
BHPS.converter <- function(from="D:/Data/BHPS/UKDA-5151-tab/tab/",
                           to="D:/BHPS/NA/", ext="tab") {
  from.files <- dir(from, pattern=paste(".", ext, "$", sep=""))
  existing.to.files <- dir(to, pattern=paste(".", ext, "$", sep=""))
  still.to.do.index <- 1:length(from.files)
  still.to.do.index <-
    still.to.do.index[-match(existing.to.files, from.files)]
  obs.to.do <- length(still.to.do.index)
  for (i in 1:obs.to.do) {
    temp.table <-
      read.delim(paste(from, from.files[still.to.do.index[i]], sep=""))
    print(paste("read:", from.files[still.to.do.index[i]]))
    temp.table[temp.table < 0] <- NA
    write.table(temp.table,
                file=paste(to, from.files[still.to.do.index[i]], sep=""))
    print(paste("written:", from.files[still.to.do.index[i]]))
  }
  rm(i, from.files, existing.to.files, still.to.do.index,
     obs.to.do, temp.table)
}
It checks for existing files in the ‘to’ directory (which holds the files
already converted, i.e. with negative codes replaced by NA), because when I
tried this conversion previously it got about halfway through and then crashed.
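The resume check described here can also be written with setdiff() and seq_along(). This is only a minimal sketch: the directories are hypothetical temporary ones with dummy files, not the real BHPS paths, and it illustrates the file selection step only, not the conversion.

```r
# Hypothetical demo directories; the real paths in the post are not used here.
from <- file.path(tempdir(), "bhps_from")
to   <- file.path(tempdir(), "bhps_to")
dir.create(from, showWarnings = FALSE)
dir.create(to, showWarnings = FALSE)

# Three dummy source files, one of which already exists in the output directory.
file.create(file.path(from, c("a.tab", "b.tab", "c.tab")))
file.create(file.path(to, "a.tab"))

ext <- "tab"
# "\\." matches a literal dot; a bare "." would match any character.
pat <- paste("\\.", ext, "$", sep = "")

from.files <- dir(from, pattern = pat)
done.files <- dir(to, pattern = pat)

# setdiff() copes with an empty 'to' directory, where
# x[-match(character(0), x)] would select nothing at all.
still.to.do <- setdiff(from.files, done.files)

# seq_along() yields zero iterations when nothing is left,
# whereas a loop over 1:0 would run its body twice.
for (i in seq_along(still.to.do)) {
  print(paste("to do:", still.to.do[i]))
}
```

Working with file names rather than index positions also avoids NA indices when the output directory contains a file that is not in the source directory.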
The problem is that it crashes *this time* too, without printing a message
to say it has read even a single file.
The file it gets stuck on is about 75 MB in size.
I am using a dual-core 3.2 GHz Pentium D processor with 2 GB memory (& 2 GB
virtual memory), and (unfortunately) Windows XP.
Questions:
1) Any general tips on how to increase the amount of memory available to
process the file?
2) Can you see a more efficient way of doing what I’m doing?
3) What’s the best way of coding for multiple forms of NA? – the BHPS code
‘-8’ (meaning ‘inapplicable’, not routed for this respondent) should really
be distinguished from other forms of nonresponse...
Thanks,
Jon
p.s. Apologies if this is slightly too vague/long winded...
Jon Minton
On 10/31/2006 6:43 AM, Jon Minton wrote:
....
> It checks for existing files in the ‘to’ directory (where files which have
> been modified with R- -> NA) because when I tried to do this conversion
> operation previously it got about ½ way through then crashed.
>
> The problem is that it crashes *this time* too, without displaying a prompt
> to say it’s read a single file.

When you say "crash", do you mean it displays an R error (like "unable to
allocate vector of length ....") or a real crash with a Windows popup?

Which version are you using? There were some fixes to the memory management
after the 2.3.1 release, but I haven't heard of any problems in 2.4.0 before
this.

Duncan Murdoch
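One way to tell the two kinds of "crash" apart is to wrap the read in tryCatch(): an R-level error (such as a failed allocation) is caught and printed, whereas a real crash of the R process cannot be caught and prints nothing. The sketch below is an illustration only; try.read is a hypothetical helper name, and the files are small temporary ones rather than the BHPS data.

```r
# Report the R version, as asked above.
print(R.version.string)

# Hypothetical helper (not from the thread): read one file and report
# any R-level error instead of stopping.
try.read <- function(path) {
  tryCatch({
    tab <- read.delim(path)
    print(paste("read:", path, "-", nrow(tab), "rows"))
    tab
  }, error = function(e) {
    print(paste("R error for", path, ":", conditionMessage(e)))
    NULL
  })
}

# Demo with a small temporary file and a file that does not exist.
ok.file <- tempfile(fileext = ".tab")
write.table(data.frame(x = c(1, -8), y = c(-1, 5)), ok.file,
            sep = "\t", row.names = FALSE, quote = FALSE)
good <- try.read(ok.file)
bad  <- try.read(file.path(tempdir(), "no-such-file.tab"))
```

If the large file produces a printed "R error" line, the message text is worth posting; if R disappears without printing anything, it is the second kind of crash.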
Hi

you should probably provide more information (OS, R version). I cannot help
you much with the crash, but here is some opinion.

I would try to do the conversion interactively before transferring it to a
function.

However, if you want different types of NA and your data is numeric, you
could probably make a distinction by using -Inf, Inf, NaN and NA, but then
you need to be careful when doing the analysis, as these values can be
treated differently.

HTH
Petr

On 31 Oct 2006 at 11:43, Jon Minton wrote:

From: "Jon Minton" <jm540 at york.ac.uk>
To: <r-help at stat.math.ethz.ch>
Date sent: Tue, 31 Oct 2006 11:43:22 -0000
Subject: [R] R crashing during batch file formatting

Petr Pikal
petr.pikal at precheza.cz
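The suggestion of using NaN alongside NA can be sketched on a toy numeric vector. The mapping below (-8 to NaN, every other negative code to NA) is an assumption chosen for illustration, not a BHPS or Amelia convention, and it only applies to double vectors: assigning NaN to an integer column coerces it to double.

```r
# Toy data; -8 stands for 'inapplicable', other negatives for nonresponse.
x <- c(3, -8, 7, -1, -9, 2)

recoded <- x
recoded[x == -8] <- NaN          # inapplicable
recoded[x < 0 & x != -8] <- NA   # all other missing codes

print(is.na(recoded))    # TRUE for both NaN and NA
print(is.nan(recoded))   # TRUE only where the code was -8

# na.rm = TRUE drops NaN as well as NA.
print(mean(recoded, na.rm = TRUE))
```

Because is.na() is TRUE for NaN too, most functions will treat both kinds as missing; only an explicit is.nan() test separates them, which is the care in analysis mentioned above.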