similar to: Working with large datafiles

2003 Jun 04
2
rsync for migrating oracle datafiles
Hi - a question for all ye rsync gurus out there... I need to migrate some fairly large Oracle datafiles from a UFS filesystem to VxFS (VERITAS), but I am not being allowed nearly enough outage time to perform a standard file-copy migration. The datafiles (of which there are about 4) are about 50GB each and sit on separate UFS filesystems. I am considering instigating a local
2012 Oct 26
1
Parsing very large xml datafiles with SAX (XML package): What data structure should I favor?
Hello again, I have another question related to parsing a very large XML file with SAX: what kind of data structure should I favor? Unlike the DOM functions, which can return lists of relevant nodes and let me use the various versions of 'apply', SAX parsing returns one thing at a time. I first tried the simple solution of appending to lists as I get the data. But I
2012 Oct 26
1
Parsing very large xml datafiles with SAX: How to profile <anonymous> functions?
Hello everyone, I'm trying to parse a very large XML file using SAX with the XML package (i.e., mainly the xmlEventParse function). This function takes as an argument a list of other functions (handlers) that will be called to handle particular XML nodes. When I use Rprof(), all the handler functions are lumped together under the <anonymous> label, and I get something like this:
2012 Jul 19
3
On RObjectTables
I was wondering if anyone knows more about the state of RObjectTables. This largely undocumented functionality was introduced by Duncan somewhere around 2002 and enables you to create an environment whose contents are dynamically queried by R through a hook function. It is mentioned in R Internals and ?attach. This functionality is quite powerful and allows you to, e.g., offload a big database of R
2006 Mar 21
3
Rsync 4TB datafiles...?
I need to rsync 4 TB of datafiles to a remote server and clone them into a new Oracle database. I have about 40 drives that contain this 4 TB of data. I would like to rsync at the directory level by using the --files-from=FILE option. But what happens if the network connection fails? The whole rsync will fail, right? rsync -a srchost:/ / --files-from=dbf-list and dbf-list would contain this:
2004 Nov 18
4
Enormous Datasets
Dear List, I have some projects where I use enormous datasets, for instance the 5% PUMS microdata from the Census Bureau. After deleting cases I may have a dataset with 7 million+ rows and 50+ columns. Will R handle a datafile of this size? If so, how? Thank you in advance, Tom Volscho. Thomas W. Volscho, Graduate Student, Dept. of Sociology U-2068
2015 Apr 15
4
RObjectTables freezes in R 3.2.0 RC on 32bit systems
We recently started noticing freezes that appear only on 32-bit systems (both Linux and Windows) with relatively recent versions of R 3.2.0, including the RC. It looks like the problem can be traced back to the use of R_ObjectTables (see R_ext/Callbacks.h). The problem is a bit difficult to reproduce because it does not appear on x64 and because the official R interface to this functionality, the
2011 Jun 07
1
Regular Expressions for "Large" Data Set
I'm running R 2.13 on Ubuntu 10.10. I have a data set that consists of character strings. site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt') dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499") dat I want to loop through the data and construct a data frame with the zip code, state abbreviation, and city name in separate columns.
2015 Mar 12
2
Best way to handle dependency on non-CRAN package / large data package?
Thanks Dirk. I'm looking at it now. At first glance your documentation brings up a good limitation of simply telling users to type "devtools::install_github()". Namely, what happens when the census bureau updates their shapefiles, and I subsequently decide to update the package? Or if I discover an error in the package and decide to update it? The choroplethr package could have a
2005 Apr 14
2
Reading and coalescing many datafiles.
Greetings. I've got some analysis problems I'm trying to solve, the raw data for which are accumulated in a bunch of time-and-date-based files. /some/path/2005-01-02-00-00-02 etc. The best 'read all these files' method I've seen in the r-help archives comes down to for (df in my_list_of_filenames ) { dat <- rbind(dat,my_read_function(df)) } which,
2007 Mar 23
1
Completely off topic, but amusing?
Folks: Thought that many on this list might find this amusing, perhaps even a bit relevant. Hope it's OK: ************ WASHINGTON - The government's estimate of the number of Americans without health insurance fell by nearly 2 million Friday, but not because anyone got health coverage. The Census Bureau said it has
2008 Jan 17
1
Any tools for working with US 2000 census data?
I've been given the job of extracting some data from the United States 2000 census (files at http://www2.census.gov/census_2000/datasets/Summary_File_2/Maryland/all_Maryland.zip, 52M). I'm only interested in Census Block Groups (CBGs) located within Baltimore City, Maryland. Additionally, I only have to extract certain data fields. I think I'll be using Summary File 2. This is my first
2006 Jul 11
18
Zip Code Ranges
Does anyone have any recommendations for working with zip code distance ranges? I need to calculate the distances between US zip codes. Thanks!
2008 Feb 13
3
Generalized nonlinear mixed model function?
I am wondering if there is an R function that could estimate a generalized nonlinear mixed model. From my reading it seems that nlme from the nlme package can fit nonlinear mixed models, while lmer from the lme4 package can fit generalized linear mixed models. One alternative I've found is gnlmix from the repeated package, although this only allows for a single random effect. Is there
2010 Aug 12
2
How to build my own datafile
Hi folks, I'm preparing to build my own datafiles, starting with a simple file for testing, and I'm wondering how to proceed. Which software should be used: MySQL, MS-SQL, MS-Excel, OpenOffice Calc, etc.? On searching I found r-cran-rmysql in the Ubuntu repo. Further searching I found: RMySQL: R interface to the MySQL database http://cran.r-project.org/web/packages/RMySQL/index.html Whether install the above
2004 Apr 22
1
rsync'ing large files
I'm using rsync to copy some large (>1GB) oracle datafiles. I've noticed that sometimes it transfers some of the files twice. Some earlier posts to this list that I saw in the archives seemed to indicate that this is a problem with the rsync algorithm itself when dealing with large files. Some of the mails seemed to indicate that this can be mitigated by using larger block sizes,
2013 Dec 09
2
How can I find nonstandard or control characters in a large file?
I have a humongous csv file containing census data, far too big to read into RAM. I have been trying to extract individual columns from this file using the colbycol package. This works for certain subsets of the columns, but not for others. I have not yet been able to precisely identify the problem columns, as there are 731 columns and running colbycol on the file on my old slow machine takes
2011 Dec 28
2
Census ARIMA x-12 seasonal adjustment in R?
Hello, I am new to using R, which is a great tool, and would like to know whether R has a seasonal adjustment program for time series and, if so, whether it incorporates the Census Bureau's ARIMA X-12 seasonal adjustment program in any way. Thanks so much! Tony
2003 Nov 03
10
USA map
R users, In S, there was a function called usa() that would draw the map of the United States, plus it had other options for graphics. I have looked but I can't find the equivalent in R. Is there one? Thanks, Jason
2011 Feb 10
1
"Error in plot.window(...) : invalid 'xlim' value" from plot(...par(new = TRUE))
[New to the community; still in the early part of R's learning curve.] Several months ago, I was asked to generate some graphs on a periodic basis. Accordingly, I managed to figure out a way to do so, using a combination of Perl and R (in a FreeBSD environment). While I've needed to adjust a few things here and there, the general approach has been pretty solid, and the R part has had