Dear all,

I have a repository file (let's call it repo.txt) that contains two columns, like this:

# tag value
AAA 0.2
AAT 0.3
AAC 0.02
AAG 0.02
ATA 0.3
ATT 0.7

Given another query vector

> qr <- c("AAC", "ATT")

I would like to find the corresponding value for each query above, yielding:

0.02
0.7

However, I want to avoid slurping the whole of repo.txt into an object (e.g. a hash). Is there any way to do that?

The reason I want to do that is that repo.txt is very, very large (millions of lines, with tag length > 30 bp), and my PC's memory is too small to hold it.

- Gundala Viswanath
Jakarta - Indonesia
On Fri, 2009-01-16 at 18:02 +0900, Gundala Viswanath wrote:
> Dear all,
>
> I have a repository file (let's call it repo.txt)
> that contains two columns like this:
> [quoted text trimmed]

Hello,

You can always store your repo.txt in a database, say, SQLite, and select only the values you want via an SQL query. That way you avoid loading the full file into memory.

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
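A minimal sketch of what Carlos suggests, assuming the RSQLite package is installed; the database file name, table name, column names, and batch size are all illustrative, and the one-time import is done in batches so the whole file never has to sit in RAM:

```r
library(DBI)
library(RSQLite)

# Open (or create) an on-disk SQLite database.
con <- dbConnect(SQLite(), "repo.db")

# One-time import: read repo.txt in batches and append each batch.
inp <- file("repo.txt", open = "rt")
repeat {
  lines <- readLines(inp, n = 100000L)
  if (length(lines) == 0L) break
  lines <- lines[!grepl("^#", lines)]          # skip the "# tag value" header
  parts <- strsplit(lines, "[[:space:]]+")
  chunk <- data.frame(tag   = vapply(parts, `[`, "", 1L),
                      value = as.numeric(vapply(parts, `[`, "", 2L)))
  dbWriteTable(con, "repo", chunk, append = TRUE)
}
close(inp)

# An index makes repeated lookups by tag fast.
dbExecute(con, "CREATE INDEX IF NOT EXISTS idx_tag ON repo (tag)")

# Only the matching rows ever come back into R.
qr  <- c("AAC", "ATT")
sql <- sprintf("SELECT value FROM repo WHERE tag IN (%s)",
               paste(sprintf("'%s'", qr), collapse = ", "))
dbGetQuery(con, sql)$value

dbDisconnect(con)
```

The import cost is paid once; after that, every query touches the disk index rather than R's memory.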
You might try to iteratively read a limited number of lines in batches using readLines:

# filename: the name of your file
# n: the maximal count of lines to read in a batch
connection = file(filename, open = "rt")
while (length(lines <- readLines(con = connection, n = n))) {
    # do your stuff here
}
close(connection)

?file
?readLines

vQ

Gundala Viswanath wrote:
> Dear all,
>
> I have a repository file (let's call it repo.txt)
> that contains two columns like this:
> [quoted text trimmed]
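Filling in the "do your stuff here" part for the original problem, a sketch might look like this (the function name and the batch size are illustrative):

```r
# Scan a whitespace-separated "tag value" file in batches and return the
# value for each queried tag, without ever holding the whole file in RAM.
lookup_values <- function(filename, qr, n = 100000L) {
  out <- setNames(rep(NA_real_, length(qr)), qr)
  con <- file(filename, open = "rt")
  on.exit(close(con))
  while (length(lines <- readLines(con, n = n))) {
    lines <- lines[!grepl("^#", lines)]          # drop comment lines
    parts <- strsplit(lines, "[[:space:]]+")
    tags  <- vapply(parts, `[`, "", 1L)
    hit   <- tags %in% qr
    if (any(hit)) {
      out[tags[hit]] <- as.numeric(vapply(parts[hit], `[`, "", 2L))
    }
  }
  out
}

lookup_values("repo.txt", c("AAC", "ATT"))
#  AAC  ATT
# 0.02 0.70
```

Only one batch of lines is in memory at a time, so n trades memory for the number of readLines calls.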
I agree on the database solution. Databases are the right tool for this kind of problem. Just consider the start-up cost of setting up the database: this could be a very time-consuming task for someone who is not familiar with database technology.

Using file() does not really read the whole file; it simply opens a connection to the file without reading it. countLines should do something like "wc -l" in a bash shell.

I would say that if this is a one-time job, this solution should work even though it is not the fastest. If the job is a repetitive one, then a database solution is surely better.

A.

Wacek Kusnierczyk wrote:
> if the file is really large, reading it twice may add a considerable penalty:
>
> r at quantide.com wrote:
>> Something like this should work:
>>
>> library(R.utils)
>> out = numeric()
>> qr = c("AAC", "ATT")
>> n = countLines("test.txt")           # 1st pass
>> file = file("test.txt", "r")
>> for (i in 1:n) {
>>     line = readLines(file, n = 1)    # 2nd pass
>>     A = strsplit(line, split = " ")[[1]][1]
>>     if (is.element(A, qr)) {
>>         value = as.numeric(strsplit(line, split = " ")[[1]][2])
>>         out = c(out, value)
>>     }
>> }
>
> if this is a one-go task, counting the lines does not pay, and why bother. if this is a repetitive task, a database-based solution will probably be a better idea.
>
> vQ
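For what it's worth, the double pass Wacek points out can be avoided entirely: instead of counting lines first, loop until readLines returns nothing. A single-pass variant of the quoted code might look like this:

```r
# Single pass: stop when readLines returns zero lines, so countLines
# (and the R.utils dependency) is no longer needed.
out <- numeric()
qr  <- c("AAC", "ATT")
con <- file("repo.txt", open = "rt")
repeat {
  line <- readLines(con, n = 1L)
  if (length(line) == 0L) break            # end of file reached
  fields <- strsplit(line, "[[:space:]]+")[[1]]
  if (fields[1] %in% qr) {
    out <- c(out, as.numeric(fields[2]))
  }
}
close(con)
out
# [1] 0.02 0.70
```

Reading one line at a time is slow in R, though; batching with readLines(con, n = <large>) as suggested earlier in the thread is the same idea with far fewer calls.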
On Fri, Jan 16, 2009 at 5:52 AM, r at quantide.com <r at quantide.com> wrote:
> I agree on the database solution.
> Databases are the right tool for this kind of problem.
> Just consider the start-up cost of setting up the database. This could be a
> very time-consuming task for someone who is not familiar with database
> technology.

Using sqldf, as mentioned previously in this thread, allows one to use the SQLite database with no setup at all. sqldf automatically creates the database, generates the record layout, loads the file (not going through R but outside of R, so R does not slow it down), extracts the portion you want into R by issuing the appropriate calls to RSQLite/DBI, and destroys the database afterwards, all automatically. When you install sqldf it automatically installs RSQLite and the SQLite database itself, so the entire installation is just one line:

install.packages("sqldf")
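A sketch of this route for the original problem, assuming the header row of repo.txt is plain `tag value` (without the leading `#`) and the separator is a single space; `read.csv.sql` is the sqldf helper that loads a file straight into SQLite and returns only the rows the SQL selects:

```r
install.packages("sqldf")   # one-time setup; pulls in RSQLite and SQLite
library(sqldf)

# The file is loaded into a temporary SQLite database outside of R;
# in the SQL, the file is referred to by the table name "file".
res <- read.csv.sql("repo.txt",
                    sql    = "select value from file where tag in ('AAC', 'ATT')",
                    header = TRUE,
                    sep    = " ")
res$value
```

Only the selected rows ever become an R data frame, which is the point for a multi-gigabyte file.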
Hi,

> Unless you specify an in-memory database the database is stored on disk.

Thanks for your explanation. I just downloaded 'sqldf'. Where can I find the option for that? I can't see the command in sqldf. I looked at:

envir = parent.frame()

but that doesn't appear to be the one.

- Gundala Viswanath
Jakarta - Indonesia

> On Fri, Jan 16, 2009 at 10:59 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
>> Hi Gabor,
>>
>>> the file itself is read into a database
>>
>> The above doesn't use RAM memory?
>>
>>> without ever going through R so your memory requirements correspond to what
>>> you extract, not the size of the file.
>>>
>>> On Fri, Jan 16, 2009 at 10:49 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
>>>> Hi Gabor,
>>>>
>>>> Do you mean storing data in 'sqldf' doesn't take memory?
>>>> For example, I have a 3GB data file. With a standard R object using read.table()
>>>> the object size will explode to about twice that, ~6GB. My current 4GB RAM
>>>> cannot handle that.
>>>>
>>>> Do you mean that with 'sqldf' this is not an issue? Why is that?
>>>>
>>>> Sorry for my naive question.
>>>>
>>>> - Gundala Viswanath
>>>> Jakarta - Indonesia
>>>>
>>>> [earlier quoted text trimmed]
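If I read the sqldf documentation correctly, the option being asked about is the `dbname` argument rather than `envir`: sqldf() defaults to an in-memory SQLite database, while passing a file path makes it disk-backed, and read.csv.sql already defaults to a temporary on-disk database for exactly this reason. A sketch, with the same assumptions about repo.txt's layout as above:

```r
library(sqldf)

# Explicitly disk-backed: the temporary SQLite database lives in this
# file, so the loaded table never has to fit in RAM.
res <- read.csv.sql("repo.txt",
                    sql    = "select value from file where tag in ('AAC', 'ATT')",
                    header = TRUE,
                    sep    = " ",
                    dbname = tempfile())
res$value
```

The `tempfile()` path is illustrative; any writable file name would do, and the database is cleaned up automatically afterwards.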