Displaying 20 results from an estimated 30000 matches similar to: "What is the best package for large data cleaning (not statistical analysis)?"
2008 Apr 14
3
Doing the right amount of copy for large data frames.
Hi there,
Problem ::
When one tries to change one or some of the columns of a data.frame, R makes
a copy of the whole data.frame via the '*tmp*' mechanism (this does not
happen for components of a list; tracemem() on R-2.6.2 confirms this).
Suggested solution ::
Store the columns of the data.frame as a list inside of an environment slot
of an S4 class, and define the '[',
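A minimal sketch of the behaviour described above (the environment-based workaround is paraphrased from the suggestion, not a finished S4 class):

```r
df <- data.frame(a = 1:3, b = 4:6)
# tracemem(df)      # in an interactive session, prints a message when df is copied
df$a <- df$a * 2    # replacing a column goes through the '*tmp*' copy mechanism

# The suggested workaround: keep the columns as a plain list inside an
# environment, so replacing one column does not duplicate the whole frame.
e <- new.env()
e$cols <- list(a = 1:3, b = 4:6)
e$cols$a <- e$cols$a * 2
```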
2010 Feb 22
1
big panel: filehash, bigmemory or other
Dear R-list
I'm on my way to start a new project on a rather big panel, consisting
of approximately 8 million observations in 30 waves of data and about
15 variables. I have a similar data set that is approximately 7
gigabytes in size.
Until now I have done my data management in SAS, and Stata, mostly
identifying spells, counting events in intervals, and the like, but I
would like to
2007 Nov 01
2
unable to install package ff
Hi all,
I've had one of my most miserable R weeks in memory. I'm trying to
deal with huge datasets (>1GB each) but am running up against those
pesky memory limits. The libraries filehash and g.data are not very
suitable for what I need. I haven't gotten into the sql thing yet.
Most recently I've been trying to install the new package ff (not yet
on the CRAN repository). I
2012 May 04
2
Can't import this 4GB DATASET
Dear Experienced R Practitioners,
I have a 4GB .txt data set called "dataset.txt" and have attempted to use the
ff, bigmemory, filehash and sqldf packages to import it, but have had no
success. The readLines output of this data is:
readLines("dataset.txt",n=20)
[1] " "
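A base-R fallback sketch when the import packages fail: read the file through a connection in fixed-size blocks, so the whole 4GB never sits in memory at once. (The block size and per-chunk work are placeholders; a small temporary file stands in for "dataset.txt".)

```r
tmp <- tempfile()
writeLines(as.character(1:250), tmp)     # small stand-in for the real file
con <- file(tmp, open = "r")
n_total <- 0
while (length(chunk <- readLines(con, n = 100)) > 0) {
  n_total <- n_total + length(chunk)     # replace with real parsing/aggregation
}
close(con)
```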
2010 Oct 12
2
merging and working with BIG data sets. Is sqldf the best way??
Hi everyone,
I’m working with some very big datasets (each dataset has 11 million rows
and 2 columns). My first step is to merge all my individual data sets
together (I have about 20).
I’m using the following command from sqldf
data1 <- sqldf("select A.*, B.* from A inner join B
using(ID)")
But it’s taking A VERY VERY LONG TIME to merge just 2 of the datasets
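For reference, the same inner join can be written with base R's merge(); whether it beats sqldf at 11 million rows is another question, but it is the easiest thing to benchmark first (a sketch with toy tables, not the poster's data):

```r
A <- data.frame(ID = 1:5, x = letters[1:5])
B <- data.frame(ID = 3:7, y = LETTERS[3:7])
data1 <- merge(A, B, by = "ID")   # inner join on ID, like the sqldf query above
```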
2010 Oct 22
2
(no subject)
I am doing cluster analysis on 8768 respondents on 5 lifestyle variables and am having difficulty constructing a dissimilarity matrix, which I will use for PAM. I always get the error “cannot allocate vector of size 293.3 Mb”, even though I have already increased the memory limit to its maximum of 4000 MB. I am on 2GB RAM with a 32-bit OS. I tried ff and filehash and I still get the same error. Can you please
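One way around the n-by-n dissimilarity matrix (a sketch assuming the recommended 'cluster' package, with simulated stand-in data): clara() computes PAM-style medoids on subsamples, so it never allocates the full matrix that pam() needs.

```r
library(cluster)
set.seed(1)
x <- matrix(rnorm(8768 * 5), ncol = 5)  # stand-in for the 5 lifestyle variables
cl <- clara(x, k = 3, samples = 10)     # medoid clustering on subsamples
```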
2012 Apr 27
1
TikzDevice
Dear R'ers,
I have trouble installing tikzDevice on Ubuntu. When I use install.packages("tikzDevice"), it gives the error message:
ERROR: dependency ‘filehash’ is not available for package ‘tikzDevice’
* removing ‘/usr/local/lib/R/site-library/tikzDevice’
Then I tried installing filehash, and I get the message:
"package ‘filehash’ is not available (for R version 2.13.1)"
2011 Jan 02
1
filehash for big data
Hi all,
I am trying to use the filehash library to analyze a 5M by 20 matrix with both
double and string data types.
After consulting a few tutorials online, it seems as though one needs to first
read the data into R; then create an R object; and then assign that object a
location on my computer via filehash. It seems like the benefit of this is
minimizing memory allocation when running
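A hedged sketch of the workflow described: filehash keeps the object on disk and reads it back only when asked. (Guarded, since it assumes the filehash package is installed.)

```r
if (requireNamespace("filehash", quietly = TRUE)) {
  library(filehash)
  dbfile <- file.path(tempdir(), "simdb")
  dbCreate(dbfile)                             # create the on-disk database
  db <- dbInit(dbfile)                         # open a handle to it
  dbInsert(db, "mat", matrix(1:6, nrow = 2))   # store an object on disk
  m <- dbFetch(db, "mat")                      # load it back only when needed
}
```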
2011 Sep 23
2
tikzDevice install problem
Hi everybody!
I'm trying to install the tikzDevice package, and I keep on getting the
> ERROR: dependency ‘filehash’ is not available for package ‘tikzDevice’
I tried install.packages('filehash') and I get
> package ‘filehash’ is not available
Does anybody have the same problem or any hint?
thank you
helena
2008 Mar 08
1
Error message while trying to update packages: Error in gzfile(file, mode) : unable to open connection
Hello,
I have just installed v 2.6.2 on a new computer running Windows XP
and tried to perform 'update packages' via the menu option on the R console.
Any advice on the following problem is much appreciated.
Bob
Below are the warning and error messages received. A search of the
hard drive does not reveal any file including "RtmpgMMu03/libloc" .
>
2013 Jul 15
2
suppress startup messages from default packages
Hi all,
several packages print messages during loading. How do I avoid seeing
them when the packages are in the defaultPackages?
Here is an example.
With this in ~/.Rprofile
,----[ ~/.Rprofile ]
| old <- getOption("defaultPackages")
| options(defaultPackages = c(old, "filehash"))
| rm(old)
`----
I get as last line when starting R:
,----
| filehash: Simple key-value
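An alternative (my assumption, not from the thread): instead of relying on defaultPackages, load the noisy packages from ~/.Rprofile wrapped in suppressPackageStartupMessages(), which muffles exactly the condition class that packageStartupMessage() signals:

```r
# Demonstrated with a bare packageStartupMessage(); in a real ~/.Rprofile you
# would wrap library(filehash) the same way.
noisy <- function() packageStartupMessage("filehash: Simple key-value ...")
suppressPackageStartupMessages(noisy())   # prints nothing
```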
2007 Nov 16
2
analysis of large data set
All,
I am working with a large data set (~450,000 rows by 34 columns). I am
trying to fit a regression model (I have tried several procedures: psm
from the Design package, lm, glm). However, whenever I try to fit the model
I get the following error:
Error: cannot allocate vector of size 1.1 Gb
Here are the specs of the machine and version of R I am using
Windows Server 2003 R2 Enterprise x64
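A hedged sketch (assumes the add-on 'biglm' package, guarded below): biglm fits a linear model in bounded memory by folding the data in chunk by chunk, so the full model matrix for 450,000 rows never has to be allocated at once.

```r
if (requireNamespace("biglm", quietly = TRUE)) {
  library(biglm)
  # Toy chunks from a built-in data set; in practice you would read the real
  # data in pieces (e.g. with read.table's nrows/skip arguments).
  chunks <- split(mtcars, rep(1:2, length.out = nrow(mtcars)))
  fit <- biglm(mpg ~ wt + hp, data = chunks[[1]])
  fit <- update(fit, chunks[[2]])   # fold in the next chunk
  coef(fit)
}
```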
2008 Mar 15
1
filehash
Hello,
I'm using filehash on Windows XP and it has been working fine with the
newest R version, 2.6.2. However, on Windows Vista, when I ran the same
code, I got the following error:
> dbCreate("simdb") #create simdb database
[1] TRUE
> db<-dbInit("simdb") #initiate an object of database
Error in sprintf(gettext(fmt, domain = domain), ...) :
object
2010 Feb 19
2
problem with RGtk
Dear List,
I would like to ask about the RGtk2 package, with which I have a problem.
I would very much appreciate it if somebody could tell me what I need to do.
I need to install the package scdMicro, which depends on gWidgetsRGtk2.
I am working on a Mac, OS X 10.5.8.
When I try to load gWidgetsRGtk2 (or RGtk2), it asks me:
"Install GTK+?"
I installed GTK+ from CRAN and added a path:
export
2012 Jul 24
1
unable to run spatial lag and error models on large data
Hi:
First, my apologies for cross-posting. A few days back I posted my queries at R-sig-geo but did not get any response. Hence this post.
I am working with two parcel-level housing datasets to estimate the impact of various variables on home sale prices.
I created the spatial weight matrices in ArcGIS 10 using the sale
year of the four nearest houses to assign weights. Next, I ran LM tests and
then ran
2008 Jul 31
2
C versions of serialize/unserialize in packages
Are the functions 'R_Unserialize' and 'R_InitFileInPStream' allowed to
be used in R packages? I guess I'm just not clear on the implications
of this comment in 'Rinternals.h':
/* The connection interface is not yet available to packages. To
allow limited use of connection pointers this defines the opaque
pointer type. */
I have a function in the
2009 Nov 19
1
RE shaping large dataset
I am doing a project which involves reshaping a large dataset. Can any of you
please suggest some good reading/websites/examples? These can be in R or
SAS.
Thanks everyone !!!
2010 Jan 21
0
filehash does not install on FreeBSD
Trying to install package 'filehash' I get the following error on
FreeBSD 9.0-CURRENT (amd64) with R version 2.11.0 (2010-01-15 r50990):
-----------------------------------
R CMD INSTALL filehash_2.0-1.tar.gz
* installing to library '/usr/local/lib/R/library'
* installing *source* package 'filehash' ...
** libs
gcc -std=gnu99 -I/usr/local/lib/R/include
2008 Aug 28
0
Can the file locking in filehash be reused? (Was: Re: [R] [R-pkgs] filehash 2.0)
Hi (Roger),
I saw the announcement of filehash v2.0 and the sentence "This
development has lead to better file locking for concurrent access and
faster reading and writing of data in general" caught my attention.
What kind of file locking do you refer to here?
I am looking for a mechanism that can be used to lock files for
reading and/or writing, and I'd love to have a cross
2013 Apr 13
1
Reshaping Data for bi-partite Network Analysis [SOLVED]
Wow !
so many thanks Arun and Rui
works like a charm
problem solved
2013/4/13 arun <smartpink111@yahoo.com>
> Hi,
> Try this;
> library(reshape2)
> res<-dcast(Input,people~place,value.var="time")
> res[is.na(res)]<-0
> res
> # people beach home school sport
> #1 Joe 5 3 0 1
> #2 Marc 0 4 2 0
> #3 Mary