Displaying 20 results from an estimated 8000 matches similar to: "Working with large datafiles"
2003 Jun 04
2
rsync for migrating oracle datafiles
Hi - a question for all ye rsync gurus out there...
I have a need to migrate some fairly large Oracle datafiles from a UFS filesystem to VxFS (VERITAS); however, I am not being allowed nearly enough outage time to perform a standard file-copy migration. The datafiles (of which there are about 4) are about 50GB each in size and sit on separate UFS filesystems.
I am considering instigating a local
2012 Oct 26
1
Parsing very large xml datafiles with SAX (XML package): What data structure should I favor?
Hello again,
I have another question related to parsing a very large xml file with SAX:
what kind of data structure should I favor? Unlike DOM functions, which
can return lists of relevant nodes and let me use various versions of
'apply', SAX parsing returns one thing at a time.
I first tried the simple solution of appending to lists as
I get the data. But I
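One common approach for this situation (a hedged sketch, not from the thread; the handler shape, stored fields, and file name are hypothetical): accumulate results into a preallocated list inside a closure, and grow it geometrically instead of appending one element per event.

make_handler <- function(n = 1000L) {
  items <- vector("list", n)   # preallocated; grows by doubling below
  count <- 0L
  list(
    startElement = function(name, attrs) {
      count <<- count + 1L
      if (count > length(items))
        length(items) <<- 2L * length(items)   # double, don't append()
      items[[count]] <<- list(name = name, attrs = attrs)
    },
    result = function() items[seq_len(count)]  # trim unused slots
  )
}
h <- make_handler()
XML::xmlEventParse("big.xml", handlers = h)    # "big.xml" is hypothetical
str(h$result())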
2012 Oct 26
1
Parsing very large xml datafiles with SAX: How to profile <anonymous> functions?
Hello everyone,
I'm trying to parse a very large XML file using SAX with the XML package
(i.e., mainly the xmlEventParse function). This function takes as an
argument a list of other functions (handlers) that will be called to handle
particular xml nodes.
When I use Rprof(), all the handler functions are lumped together under
the <anonymous> label, and I get something like this:
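The profile output itself is cut off above; what follows is instead a hedged workaround sketch (not from the thread, file name hypothetical): define the handlers as named top-level functions rather than anonymous functions built inside the handlers list, so Rprof() can attribute time to each by name.

library(XML)
# Named top-level handlers, so Rprof() reports them individually
# instead of lumping them under <anonymous>.
handleStart <- function(name, attrs, ...) invisible(NULL)  # placeholder body
handleText  <- function(text, ...)        invisible(NULL)  # placeholder body

Rprof("sax.out")
xmlEventParse("big.xml",   # hypothetical file name
              handlers = list(startElement = handleStart,
                              text = handleText))
Rprof(NULL)
head(summaryRprof("sax.out")$by.total)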
2012 Jul 19
3
On RObjectTables
I was wondering if anyone knows more about the state of RObjectTables. This
largely undocumented functionality was introduced by Duncan somewhere around
2002 and enables you to create an environment whose contents are
dynamically queried by R through a hook function. It is mentioned in R
Internals and ?attach. This functionality is quite powerful and allows you
to e.g. offload a big database of R
2006 Mar 21
3
Rsync 4TB datafiles...?
I need to rsync 4 TB of datafiles to a remote server and clone them to a new
Oracle database. I have about 40 drives that contain this 4 TB of data. I would
like to do the rsync at a directory level by using the --files-from=FILE option.
But the problem is: if the network connection fails, the whole
rsync will fail, right?
rsync -a srchost:/ / --files-from=dbf-list
and dbf-list would contain this:
2004 Nov 18
4
Enormous Datasets
Dear List,
I have some projects where I use enormous datasets. For instance, the 5% PUMS microdata from the Census Bureau. After deleting cases I may have a dataset with 7 million+ rows and 50+ columns. Will R handle a datafile of this size? If so, how?
Thank you in advance,
Tom Volscho
************************************
Thomas W. Volscho
Graduate Student
Dept. of Sociology U-2068
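No answer is quoted in this snippet, but the usual advice fits in a few lines. A hedged sketch with a hypothetical file name and column layout: budget memory first (7 million rows x 50 numeric columns is roughly 7e6 * 50 * 8 bytes, about 2.8 GB), then declare colClasses so read.table() allocates once instead of guessing types.

pums <- read.table("pums_5pct.csv",   # hypothetical file name
                   header = TRUE, sep = ",",
                   colClasses = c("character", rep("numeric", 49)),
                   nrows = 7.5e6,     # a known upper bound speeds allocation
                   comment.char = "") # skip comment scanning for speed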
2015 Apr 15
4
RObjectTables freezes in R 3.2.0 RC on 32bit systems
We recently started noticing freezes that appear only on 32-bit systems
(both Linux and Windows) with relatively recent versions of R 3.2.0,
including the RC. It looks like the problem can be traced back to the use
of R_ObjectTables (see R_ext/Callbacks.h).
The problem is a bit difficult to reproduce because it does not appear on
x64 and because the official R interface to this functionality, the
2011 Jun 07
1
Regular Expressions for "Large" Data Set
I'm running R 2.13 on Ubuntu 10.10
I have a data set that consists of character strings.
site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt')
dat <- c("01, 35004, AL, ACMAR, 86.51557, 33.584132, 6055, 0.001499")
dat
I want to loop through the data and construct a data frame with the zip
code, state abbreviation, and city name in separate columns.
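A minimal sketch (not from the thread): since each record is already comma-separated, read.csv() on the text splits the fields without any regular expressions. V2-V4 below assume the gazetteer's field order shown in dat (state FIPS, zip, state abbreviation, city name, ...); the text= argument of read.csv() requires R >= 2.13, which the poster has.

fields <- read.csv(text = dat, header = FALSE, strip.white = TRUE,
                   colClasses = "character")
zips <- data.frame(zip   = fields$V2,
                   state = fields$V3,
                   city  = fields$V4,
                   stringsAsFactors = FALSE)
# The same call works on the full vector: read.csv(text = site, ...)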
2015 Mar 12
2
Best way to handle dependency on non-CRAN package / large data package?
Thanks Dirk. I'm looking at it now.
At first glance your documentation points out a real limitation of simply
telling users to type "devtools::install_github()". Namely, what happens
when the Census Bureau updates its shapefiles and I subsequently decide
to update the package? Or if I discover an error in the package and decide
to update it? The choroplethr package could have a
2005 Apr 14
2
Reading and coalescing many datafiles.
Greetings.
I've got some analysis problems I'm trying to solve, the raw data for which
are accumulated in a bunch of time-and-date-based files.
/some/path/2005-01-02-00-00-02
etc.
The best 'read all these files' method I've seen in the r-help archives comes
down to
dat <- NULL                               # must exist before the first rbind()
for (df in my_list_of_filenames) {
  dat <- rbind(dat, my_read_function(df))
}
which, while simple, copies the whole of dat on every iteration.
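A common alternative (a sketch, not from this post): read all the files into a list first and bind once, so dat is built a single time rather than recopied per file.

dat <- do.call(rbind, lapply(my_list_of_filenames, my_read_function))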
2007 Mar 23
1
Completely off topic, but amusing?
Folks:
Thought that many on this list might find this amusing, perhaps even a bit
relevant. Hope it's OK:
************
WASHINGTON - The government's estimate of the number of Americans without
health insurance fell by nearly 2 million Friday, but not because anyone got
health coverage.
The Census Bureau said it has
2008 Jan 17
1
Any tools for working with US 2000 census data?
I've been given the job of extracting some data from the United States
2000 census (files at
http://www2.census.gov/census_2000/datasets/Summary_File_2/Maryland/all_Maryland.zip, 52MB). I'm only interested in Census Block Groups (CBGs)
located within Baltimore City, Maryland. Additionally, I just have to
extract certain data fields. I think I'll be using Summary File 2. This
is my first
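No solution appears in this snippet. A rough sketch of the usual approach, with hypothetical file name, field widths, and column names (the real SF2 record layout is defined in the Census Bureau's technical documentation): read the fixed-width geographic header file with read.fwf(), then subset to Baltimore City (FIPS county code 510) and the block-group summary level (150).

geo <- read.fwf("mdgeo.uf2",               # hypothetical file name
                widths = c(6, 2, 3, 7, 3), # placeholder field widths
                col.names = c("fileid", "state", "county",
                              "logrecno", "sumlev"),
                colClasses = "character")
cbg <- subset(geo, county == "510" & sumlev == "150")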
2006 Jul 11
18
Zip Code Ranges
Does anyone have any recommendations for working with zip code distance
ranges? I need to calculate the distances between US zip codes.
Thanks!
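This question is from the Rails list, but the distance math is language-independent; here is a hedged sketch in R (the language used elsewhere on this page) of the haversine great-circle distance, assuming zip-code centroids in decimal degrees are already at hand (e.g., from the Census gazetteer file mentioned in another result above).

haversine_miles <- function(lat1, lon1, lat2, lon2, r = 3959) {
  to_rad <- pi / 180                 # degrees -> radians
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))     # r = mean Earth radius in miles
}
haversine_miles(33.584, -86.516, 39.290, -76.612)  # ACMAR, AL to Baltimore, MD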
2008 Feb 13
3
Generalized nonlinear mixed model function?
I am wondering if there is an R function that could estimate a generalized
nonlinear mixed model.
From my reading it seems that nlme from the nlme package can fit nonlinear
mixed models, while lmer from the lme4 package can fit generalized linear
mixed models.
One alternative I've found is gnlmix from the repeated package, although
this only allows for a single random effect.
Is there
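For orientation (not part of the original thread), these are the two tools the poster contrasts, shown with their standard documentation examples:

library(nlme)    # nonlinear mixed models, Gaussian response
fm1 <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
            data = Loblolly,
            fixed = Asym + R0 + lrc ~ 1,
            random = Asym ~ 1,
            start = c(Asym = 103, R0 = -8.5, lrc = -3.3))

library(lme4)    # generalized linear mixed models
fm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)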
2010 Aug 12
2
How to building my own datafile
Hi folks,
I'm prepared building my own datafiles, simple file at start, for testing
wondering how to process? Which software will be used,
MySQL/MS-SQL/MS-Excel/Open Office-Calc etc?
On searching I found r-cran-rmysql on Ubuntu repo.
Further searching I found;
RMySQL: R interface to the MySQL database
http://cran.r-project.org/web/packages/RMySQL/index.html
Whether to install the above
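For a simple test file, no database is required at all; a hedged sketch of the plain-CSV route, which every tool in the list above (and R itself) can read and write:

df <- data.frame(id = 1:3, name = c("a", "b", "c"), value = c(1.5, 2.0, 2.5))
write.csv(df, "test.csv", row.names = FALSE)   # build the datafile
back <- read.csv("test.csv")                   # read it back to verify
identical(dim(df), dim(back))                  # TRUE if the round trip worked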
2004 Apr 22
1
rsync'ing large files
I'm using rsync to copy some large (>1GB) Oracle datafiles. I've noticed
that sometimes it transfers some of the files twice.
Some earlier posts to this list that I saw in the archives seemed to
indicate that this is a problem with the rsync algorithm itself when
dealing with large files. Some of the mails seemed to indicate that this
can be mitigated by using larger block sizes,
2013 Dec 09
2
How can I find nonstandard or control characters in a large file?
I have a humongous csv file containing census data, far too big to read into
RAM. I have been trying to extract individual columns from this file using
the colbycol package. This works for certain subsets of the columns, but not
for others. I have not yet been able to precisely identify the problem
columns, as there are 731 columns and running colbycol on the file on my old
slow machine takes
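No answer is quoted here; a sketch of one chunked approach (the file name is hypothetical): stream the file with readLines() and flag lines containing bytes outside printable ASCII, so nothing close to the full file is ever in RAM.

con <- file("census.csv", open = "r")    # hypothetical file name
offset <- 0L
repeat {
  lines <- readLines(con, n = 10000L, warn = FALSE)
  if (length(lines) == 0L) break
  bad <- grep("[^\t\x20-\x7E]", lines, useBytes = TRUE)
  if (length(bad)) print(offset + bad)   # absolute line numbers with odd bytes
  offset <- offset + length(lines)
}
close(con)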
2011 Dec 28
2
Census ARIMA x-12 seasonal adjustment in R?
Hello,
I am new to using R - which is a great tool - and would like to know if R
has a seasonal adjustment program for time series, and/or whether it incorporates
the Census Bureau's X-12-ARIMA seasonal adjustment program in any way?
Thanks so much!
Tony
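For later readers (not an answer from the thread itself): the CRAN package 'seasonal' wraps the Census Bureau's X-13ARIMA-SEATS, the successor to X-12-ARIMA, and runs with one call on a ts object.

library(seasonal)         # wraps the Census Bureau's X-13ARIMA-SEATS binaries
m <- seas(AirPassengers)  # default X-13 adjustment of a built-in monthly series
plot(m)
summary(m)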
2003 Nov 03
10
USA map
R users,
In S, there was a function called usa() that
would draw the map of the United States, plus
it had other options for graphics. I have looked
but I can't find the equivalent in R. Is there one?
Thanks,
Jason
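The usual answer (sketched here, not quoted from the thread) is the 'maps' package:

library(maps)   # assumed installed from CRAN
map("usa")      # national outline, the rough equivalent of S's usa()
map("state")    # state boundaries instead of just the outline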
2024 Dec 06
1
Sum by group
I have population data ('totpopE') at the census tract level ('GEOID'),
which are nested within precincts ('Precinct'). Please see my data
structure below.
I used this code to sum population data per precinct:
inters <- inters %>%
  group_by(Precinct) %>%
  mutate(TotalPop = sum(totpopE))
However, this code produced sums that were too large, because each census tract
('GEOID') has multiple
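The diagnosis is cut off, but it points at duplicated tract rows within each precinct. A hedged sketch of one fix along those lines (assuming each GEOID should contribute its population exactly once per precinct):

library(dplyr)
inters <- inters %>%
  group_by(Precinct) %>%
  mutate(TotalPop = sum(totpopE[!duplicated(GEOID)]))  # count each tract once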