Displaying 20 results from an estimated 8000 matches similar to: "Working with large datafiles"
2003 Jun 04
2
rsync for migrating oracle datafiles
Hi - a question for all ye rsync gurus out there...
I have a need to migrate some fairly large Oracle datafiles from a UFS filesystem to VxFS (VERITAS); however, I am not being allowed nearly enough outage time to perform a standard file-copy migration. The datafiles (of which there are about 4) are about 50GB each in size and sit on separate UFS filesystems.
I am considering instigating a local
2012 Oct 26
1
Parsing very large xml datafiles with SAX (XML package): What data structure should I favor?
Hello again,
I have another question related to parsing a very large xml file with SAX:
what kind of data structure should I favor? Unlike DOM functions, which
can return lists of relevant nodes and let me use various versions of
'apply', SAX parsing returns one thing at a time.
I first tried the simple solution of appending to lists as
I get the data. But I
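One common approach for this situation (a hedged sketch, not from the thread; the handler shape, stored fields, and file name are hypothetical): accumulate results into a preallocated list inside a closure, and grow it geometrically instead of appending one element per event.

make_handler <- function(n = 1000L) {
  items <- vector("list", n)   # preallocated; grows by doubling below
  count <- 0L
  list(
    startElement = function(name, attrs) {
      count <<- count + 1L
      if (count > length(items))
        length(items) <<- 2L * length(items)   # double, don't append()
      items[[count]] <<- list(name = name, attrs = attrs)
    },
    result = function() items[seq_len(count)]  # trim unused slots
  )
}
h <- make_handler()
XML::xmlEventParse("big.xml", handlers = h)    # "big.xml" is hypothetical
str(h$result())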
2012 Oct 26
1
Parsing very large xml datafiles with SAX: How to profile <anonymous> functions?
Hello everyone,
I'm trying to parse a very large XML file using SAX with the XML package
(i.e., mainly the xmlEventParse function). This function takes as an
argument a list of other functions (handlers) that will be called to handle
particular xml nodes.
When I use Rprof(), all the handler functions are lumped together under
the <anonymous> label, and I get something like this:
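The profile output itself is cut off above; what follows is instead a hedged workaround sketch (not from the thread, file name hypothetical): define the handlers as named top-level functions rather than anonymous functions built inside the handlers list, so Rprof() can attribute time to each by name.

library(XML)
# Named top-level handlers, so Rprof() reports them individually
# instead of lumping them under <anonymous>.
handleStart <- function(name, attrs, ...) invisible(NULL)  # placeholder body
handleText  <- function(text, ...)        invisible(NULL)  # placeholder body

Rprof("sax.out")
xmlEventParse("big.xml",   # hypothetical file name
              handlers = list(startElement = handleStart,
                              text = handleText))
Rprof(NULL)
head(summaryRprof("sax.out")$by.total)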
2012 Jul 19
3
On RObjectTables
I was wondering if anyone knows more about the state of RObjectTables. This
largely undocumented functionality was introduced by Duncan somewhere around
2002 and enables you to create an environment whose contents are
dynamically queried by R through a hook function. It is mentioned in R
Internals and ?attach. This functionality is quite powerful and allows you
to e.g. offload a big database of R
2006 Mar 21
3
Rsync 4TB datafiles...?
I need to rsync 4 TB of datafiles to a remote server and clone them to a new
Oracle database. I have about 40 drives that contain this 4 TB of data. I would
like to do the rsync at a directory level by using the --files-from=FILE option.
But the problem is: if the network connection fails, the whole
rsync will fail, right?
rsync -a srchost:/ / --files-from=dbf-list
and dbf-list would contain this:
2004 Nov 18
4
Enormous Datasets
Dear List,
I have some projects where I use enormous datasets. For instance, the 5% PUMS microdata from the Census Bureau. After deleting cases I may have a dataset with 7 million+ rows and 50+ columns. Will R handle a datafile of this size? If so, how?
Thank you in advance,
Tom Volscho
************************************
Thomas W. Volscho
Graduate Student
Dept. of Sociology U-2068
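No answer is quoted in this snippet, but the usual advice fits in a few lines. A hedged sketch with a hypothetical file name and column layout: budget memory first (7 million rows x 50 numeric columns is roughly 7e6 * 50 * 8 bytes, about 2.8 GB), then declare colClasses so read.table() allocates once instead of guessing types.

pums <- read.table("pums_5pct.csv",   # hypothetical file name
                   header = TRUE, sep = ",",
                   colClasses = c("character", rep("numeric", 49)),
                   nrows = 7.5e6,     # a known upper bound speeds allocation
                   comment.char = "") # skip comment scanning for speed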
2015 Apr 15
4
RObjectTables freezes in R 3.2.0 RC on 32bit systems
We recently started noticing freezes that appear only on 32-bit systems
(both Linux and Windows) with relatively recent versions of R 3.2.0,
including the RC. It looks like the problem can be traced back to the use
of R_ObjectTables (see R_ext/Callbacks.h).
The problem is a bit difficult to reproduce because it does not appear on
x64 and because the official R interface to this functionality, the
2011 Jun 07
1
Regular Expressions for "Large" Data Set
I'm running R 2.13 on Ubuntu 10.10
I have a data set that consists of character strings.
site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt')
dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499")
dat
I want to loop through the data and construct a data frame with the zip
code, state abbreviation, and city name in separate columns.
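A minimal sketch (not from the thread): since each record is already comma-separated, read.csv() on the text splits the fields without any regular expressions. V2-V4 below assume the gazetteer's field order shown in dat (state FIPS, zip, state abbreviation, city name, ...); the text= argument of read.csv() requires R >= 2.13, which the poster has.

fields <- read.csv(text = dat, header = FALSE, strip.white = TRUE,
                   colClasses = "character")
zips <- data.frame(zip   = fields$V2,
                   state = fields$V3,
                   city  = fields$V4,
                   stringsAsFactors = FALSE)
# The same call works on the full vector: read.csv(text = site, ...)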
2015 Mar 12
2
Best way to handle dependency on non-CRAN package / large data package?
Thanks Dirk. I'm looking at it now.
At first glance your documentation points out a real limitation of simply
telling users to type "devtools::install_github()". Namely, what happens
when the Census Bureau updates its shapefiles and I subsequently decide
to update the package? Or if I discover an error in the package and decide
to update it? The choroplethr package could have a
2005 Apr 14
2
Reading and coalescing many datafiles.
Greetings.
I've got some analysis problems I'm trying to solve, the raw data for which
are accumulated in a bunch of time-and-date-based files.
/some/path/2005-01-02-00-00-02
etc.
The best 'read all these files' method I've seen in the r-help archives comes
down to
dat <- NULL                               # must exist before the first rbind()
for (df in my_list_of_filenames) {
  dat <- rbind(dat, my_read_function(df))
}
which, while simple, copies the whole of dat on every iteration.
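A common alternative (a sketch, not from this post): read all the files into a list first and bind once, so dat is built a single time rather than recopied per file.

dat <- do.call(rbind, lapply(my_list_of_filenames, my_read_function))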
2007 Mar 23
1
Completely off topic, but amusing?
Folks:
Thought that many on this list might find this amusing, perhaps even a bit
relevant. Hope it's OK:
************
WASHINGTON - The government's estimate of the number of Americans without
health insurance fell by nearly 2 million Friday, but not because anyone got
health coverage.
The Census Bureau said it has
2008 Jan 17
1
Any tools for working with US 2000 census data?
I've been given the job of extracting some data from the United States
2000 census (files at
http://www2.census.gov/census_2000/datasets/Summary_File_2/Maryland/all_Maryland.zip, 52MB). I'm only interested in Census Block Groups (CBGs)
located within Baltimore City, Maryland. Additionally, I just have to
extract certain data fields. I think I'll be using Summary File 2. This
is my first
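No solution appears in this snippet. A rough sketch of the usual approach, with hypothetical file name, field widths, and column names (the real SF2 record layout is defined in the Census Bureau's technical documentation): read the fixed-width geographic header file with read.fwf(), then subset to Baltimore City (FIPS county code 510) and the block-group summary level (150).

geo <- read.fwf("mdgeo.uf2",               # hypothetical file name
                widths = c(6, 2, 3, 7, 3), # placeholder field widths
                col.names = c("fileid", "state", "county",
                              "logrecno", "sumlev"),
                colClasses = "character")
cbg <- subset(geo, county == "510" & sumlev == "150")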
2006 Jul 11
18
Zip Code Ranges
Does anyone have any recommendations for working with zip code distance
ranges? I need to calculate the distances between US zip codes.
Thanks!
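This question is from the Rails list, but the distance math is language-independent; here is a hedged sketch in R (the language used elsewhere on this page) of the haversine great-circle distance, assuming zip-code centroids in decimal degrees are already at hand (e.g., from the Census gazetteer file mentioned in another result above).

haversine_miles <- function(lat1, lon1, lat2, lon2, r = 3959) {
  to_rad <- pi / 180                 # degrees -> radians
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))     # r = mean Earth radius in miles
}
haversine_miles(33.584, -86.516, 39.290, -76.612)  # ACMAR, AL to Baltimore, MD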
2008 Feb 13
3
Generalized nonlinear mixed model function?
I am wondering if there is an R function that could estimate a generalized
nonlinear mixed model.
From my reading it seems that nlme from the nlme package can fit nonlinear
mixed models, while lmer from the lme4 package can fit generalized linear
mixed models.
One alternative I've found is gnlmix from the repeated package, although
this only allows for a single random effect.
Is there
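For orientation (not part of the original thread), these are the two tools the poster contrasts, shown with their standard documentation examples:

library(nlme)    # nonlinear mixed models, Gaussian response
fm1 <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

library(lme4)    # generalized linear mixed models
fm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)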
2010 Aug 12
2
How to building my own datafile
Hi folks,
I'm prepared building my own datafiles, simple file at start, for testing
wondering how to process? Which software will be used,
MySQL/MS-SQL/MS-Excel/Open Office-Calc etc?
On searching I found r-cran-rmysql on Ubuntu repo.
Further searching I found;
RMySQL: R interface to the MySQL database
http://cran.r-project.org/web/packages/RMySQL/index.html
Whether to install the above
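For a simple test file, no database is required at all; a hedged sketch of the plain-CSV route, which every tool in the list above (and R itself) can read and write:

df <- data.frame(id = 1:3, name = c("a", "b", "c"), value = c(1.5, 2.0, 2.5))
write.csv(df, "test.csv", row.names = FALSE)   # build the datafile
back <- read.csv("test.csv")                   # read it back to verify
identical(dim(df), dim(back))                  # TRUE if the round trip worked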
2004 Apr 22
1
rsync'ing large files
I'm using rsync to copy some large (>1GB) Oracle datafiles. I've noticed
that sometimes it transfers some of the files twice.
Some earlier posts to this list that I saw in the archives seemed to
indicate that this is a problem with the rsync algorithm itself when
dealing with large files. Some of the mails seemed to indicate that this
can be mitigated by using larger block sizes,
2013 Dec 09
2
How can I find nonstandard or control characters in a large file?
I have a humongous csv file containing census data, far too big to read into
RAM. I have been trying to extract individual columns from this file using
the colbycol package. This works for certain subsets of the columns, but not
for others. I have not yet been able to precisely identify the problem
columns, as there are 731 columns and running colbycol on the file on my old
slow machine takes
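No answer is quoted here; a sketch of one chunked approach (the file name is hypothetical): stream the file with readLines() and flag lines containing bytes outside printable ASCII, so nothing close to the full file is ever in RAM.

con <- file("census.csv", open = "r")    # hypothetical file name
offset <- 0L
repeat {
  lines <- readLines(con, n = 10000L, warn = FALSE)
  if (length(lines) == 0L) break
  bad <- grep("[^\t\x20-\x7E]", lines, useBytes = TRUE)
  if (length(bad)) print(offset + bad)   # absolute line numbers with odd bytes
  offset <- offset + length(lines)
}
close(con)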
2011 Dec 28
2
Census ARIMA x-12 seasonal adjustment in R?
Hello,
I am new to using R - which is a great tool - and would like to know if R
has a seasonal adjustment program for time series, and/or whether it incorporates
the Census Bureau's X-12-ARIMA seasonal adjustment program in any way?
Thanks so much!
Tony
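For later readers (not an answer from the thread itself): the CRAN package 'seasonal' wraps the Census Bureau's X-13ARIMA-SEATS, the successor to X-12-ARIMA, and runs with one call on a ts object.

library(seasonal)         # wraps the Census Bureau's X-13ARIMA-SEATS binaries
m <- seas(AirPassengers)  # default X-13 adjustment of a built-in monthly series
plot(m)
summary(m)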
2003 Nov 03
10
USA map
R users,
In S, there was a function called usa() that
would draw the map of the United States, plus
it had other options for graphics. I have looked
but I can't find the equivalent in R. Is there one?
Thanks,
Jason
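The usual answer (sketched here, not quoted from the thread) is the 'maps' package:

library(maps)   # assumed installed from CRAN
map("usa")      # national outline, the rough equivalent of S's usa()
map("state")    # state boundaries instead of just the outline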
2024 Dec 06
1
Sum by group
I have population data ('totpopE') at the census tract level ('GEOID'),
which are nested within precincts ('Precinct'). Please see my data
structure below.
I used this code to sum population data per precinct:
inters <- inters %>%
  group_by(Precinct) %>%
  mutate(TotalPop = sum(totpopE))
However, this code produced sums that were too large, because each census tract
('GEOID') has multiple
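The diagnosis is cut off, but it points at duplicated tract rows within each precinct. A hedged sketch of one fix along those lines (assuming each GEOID should contribute its population exactly once per precinct):

library(dplyr)
inters <- inters %>%
  group_by(Precinct) %>%
  mutate(TotalPop = sum(totpopE[!duplicated(GEOID)]))  # count each tract once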