> On Sep 17, 2017, at 9:24 PM, Ajay Arvind Rao <AjayArvind.Rao at gmrgroup.in> wrote:
>
> Hi,
>
> We are using the open-source distribution of R to analyze data at our organization. The system configuration is as follows:
>
> * System configuration:
>
> o Operating System - Windows 7 Enterprise SP1, 64 bit (Desktop)
>
> o RAM - 8 GB
>
> o Processor - i5-6500 @ 3.2 GHz
>
> * R Version:
>
> o RStudio 1.0.136
>
> o R 3.4.0
>
> While trying to merge two datasets, we received the following resource error message when running the code:
> Code: merg_data <- merge(x = Data_1Junto30Jun, y = flight_code, by.x = "EB_FLNO1", by.y = "EB_FLNO1", all.x = TRUE)
> Error message: Error: cannot allocate vector of size 5.8 Gb
>
> Later we tried running the code differently, but the error remained:
> Code: merg_data <- sqldf("SELECT * FROM Data_1Junto30Jun AS a INNER JOIN flight_code AS b ON a.EB_FLNO1 = b.EB_FLNO1")
> Error message: Error: cannot allocate vector of size 200.0 Mb
>
> We upgraded the RAM to 8 GB a couple of months ago. Can you let us know options to resolve the above issue without having to increase the RAM? The sizes of the datasets are as follows:
>
> * Data_1Junto30Jun (513,476 obs. of 32 variables). Data size - 172,033,368 bytes / 172 MB
>
> * flight_code (478,105 obs. of 2 variables). Data size - 3,836,304 bytes / 4 MB
>
>
> Help with determining system requirements:
> Is there a way to determine minimum system requirements (hardware and software)
There are packages for working with data "out of memory"; see bigmemory and the other "big*" packages. See also the data.table package, which has many satisfied users. There are also several packages for handling data through database connections, which would probably be the preferred method for your use case.
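For example, a keyed data.table join avoids some of the copying that base merge() does. A minimal sketch, using the object and column names from your post:

    library(data.table)
    setDT(Data_1Junto30Jun)        # convert in place, no copy made
    setDT(flight_code)
    setkey(Data_1Junto30Jun, EB_FLNO1)
    setkey(flight_code, EB_FLNO1)
    # keep all rows of Data_1Junto30Jun, like merge(..., all.x = TRUE)
    merg_data <- flight_code[Data_1Junto30Jun]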
R objects are almost always copied when an assignment is made, which means you need at least twice as much free memory, and in _contiguous_ chunks. You will often be fragmenting that memory with other code and other out-of-R processes. Windows was in the past notorious for poor memory management; I don't know whether Windows 7 continued that tradition or whether later versions avoid the problem.
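You can check how close you are to the ceiling from within R itself; object.size() and gc() are base R, and memory.limit() is Windows-only:

    object.size(Data_1Junto30Jun)  # ~172 MB per your numbers
    object.size(flight_code)       # ~4 MB
    gc()            # run the garbage collector and report memory in use
    memory.limit()  # maximum allocation R will attempt on Windows, in MB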
A data.frame consumes about 8 bytes per row for each numeric (double) column, plus some per-object overhead. Character strings are stored once in R's global string cache, and factors store integer codes, so the memory those columns consume depends on the degree of duplication of entries. Duplication also affects merge operations: merging on a key with duplicated values produces the Cartesian product of the matching rows, so merging two data.frames with lots of duplicates will often end in a message such as "Error: cannot allocate vector of size 5.8 Gb".
The second error you cite suggests that much of your 8 GB of memory has become fragmented.
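Before re-running the merge, it is worth estimating how many rows it will produce. A quick sketch, using the key column from your post:

    # how many duplicated keys are in the lookup table?
    sum(duplicated(flight_code$EB_FLNO1))
    # exact row count of the inner join: sum over shared keys of the
    # product of their frequencies (the Cartesian blow-up)
    nx <- table(Data_1Junto30Jun$EB_FLNO1)
    ny <- table(flight_code$EB_FLNO1)
    shared <- intersect(names(nx), names(ny))
    sum(as.numeric(nx[shared]) * as.numeric(ny[shared]))

With all.x = TRUE, add the unmatched rows of Data_1Junto30Jun to that count.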
Most of this information should be available by searching the R-help archives or RSeek.
> depending on the size of the data, the way the data is loaded into R (directly from a server or from a flat file), and the type of analysis to be run?
The source of the data makes no difference, but I cannot comment on the type of analysis because that part of the question is too vague (aside from mentioning the Cartesian multiplication of merge results, which often trips up new users of database technology).
> We have not been able to get any specific information related to this and are estimating the requirements through a trial-and-error method. Any information on this front would be helpful.
This suggests an impoverished search strategy; for example:
https://stackoverflow.com/search?q=%5Br%5D+memory+limitations
https://stackoverflow.com/search?q=%5Br%5D+memory+limitations+windows
http://markmail.org/search/?q=list%3Aorg.r-project.r-help+memory+limitations+windows
--
David.
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.'
-Gehm's Corollary to Clarke's Third Law