I'm wondering if I need to use a function other than sapply, as the following
line of code runs indefinitely (> 30 min so far) and uses up all 16 GB of
memory on my machine for what seems like a very small dataset (data attached
in a txt file, wells.txt
<http://r.789695.n4.nabble.com/file/n4656723/wells.txt>). The R code is:
wells <- read.table("c:/temp/wells.txt", col.names=c("name","plc_hldr"))
wells2 <- wells[sapply(wells[,1],
                       function(x) length(strsplit(as.character(x), "_")[[1]]) == 2), ]
The second line of R code above gets bogged down and takes all my RAM with it:
<http://r.789695.n4.nabble.com/file/n4656723/memory_loss.png>
I'm simply trying to extract all of the lines of data that have a single "_"
in the first column and place them into a dataset called "wells2". If that
were to work, I then want to extract the lines of data that have two "_" and
put them into a separate dataset, say "wells3". Is there a better way to do
this than the one-liner above?
-Eric
--
View this message in context:
http://r.789695.n4.nabble.com/a-function-more-appropriate-than-sapply-tp4656723.html
Sent from the R help mailing list archive at Nabble.com.
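[Editorial sketch: strsplit() is itself vectorized over a character vector, so the per-element sapply() in the question can be avoided entirely. The rows below are made-up stand-ins for the attached wells.txt, which is not reproduced here.]

```r
# Toy data in place of wells.txt (file not reproduced here)
wells <- data.frame(name = c("w7_1", "99_10_4395", "w11_1", "88_9_100"),
                    plc_hldr = 0, stringsAsFactors = FALSE)

# strsplit() accepts the whole column at once; lengths() counts the pieces
n_parts <- lengths(strsplit(wells$name, "_", fixed = TRUE))

wells2 <- wells[n_parts == 2, ]   # exactly one "_"  -> two pieces
wells3 <- wells[n_parts == 3, ]   # exactly two "_" -> three pieces
```

Note: lengths() requires R >= 3.2.0; sapply(strsplit(...), length) computes the same counts on older versions.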
Hi,

Maybe this helps:

wells <- read.table("wells.txt", header=FALSE, stringsAsFactors=FALSE)
wells2 <- wells[-grep(".*\\_.*\\_", wells[,1]), ]
head(wells2)
#      V1 V2
# 1  w7_1  0
# 2 w11_1  0
# 3 w12_1  0
# 4 w13_1  0
# 5 w14_1  0
# 6 w15_1  0

wellsNew <- wells[grep(".*\\_.*\\_", wells[,1]), ]
head(wellsNew)
#             V1 V2
# 851 99_10_4395  0
# 852 99_10_4396  0
# 853 99_10_4400  0
# 854 99_10_4403  0
# 855 99_10_4404  0
# 856 99_10_4606  0

nrow(wells)
# [1] 46366
nrow(wells2)
# [1] 38080
nrow(wellsNew)
# [1] 8286
38080 + 8286
# [1] 46366
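A self-contained illustration of the same split, with made-up values in place of the attached file: the pattern matches any string containing at least two underscores, so negating the match keeps the single-underscore rows.

```r
x <- c("w7_1", "99_10_4395", "w11_1", "88_9_100")

two_plus <- grepl(".*\\_.*\\_", x)   # TRUE when the value has >= 2 underscores
x[!two_plus]   # single-underscore values
x[two_plus]    # double-underscore values
```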
A.K.
----- Original Message -----
From: emorway <emorway at usgs.gov>
To: r-help at r-project.org
Cc:
Sent: Saturday, January 26, 2013 1:43 PM
Subject: [R] a function more appropriate than 'sapply'?
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On 26-01-2013, at 19:43, emorway <emorway at usgs.gov> wrote:
> [...]
> put them into a separate dataset, say "wells3". Is there a better way to do
> this than the one-liner above?

Read your file with

wells <- read.table("wells.txt", col.names=c("name","plc_hldr"),
                    stringsAsFactors=FALSE)

Remove all non-underscores with

w.sub <- gsub("[^_]+", "", wells[,1])

then select elements of w.sub with 2 underscores and a single underscore with

u.2 <- which(w.sub == "__")
u.1 <- which(w.sub == "_")

and use u.1 and u.2 to select the appropriate rows of wells.

I tried to select rows containing 1 or 2 underscores with grep regular
expressions but that appeared to be more difficult than I had expected.
The method above is quick.

Berend
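A quick toy run of this gsub() reduction (made-up values, since the attached file is not reproduced here): stripping every non-underscore character leaves only a run of "_", whose length identifies the row type.

```r
x <- c("w7_1", "99_10_4395", "w12_1", "88_9_100")

# strip everything that is not an underscore, leaving only the "_" characters
w.sub <- gsub("[^_]+", "", x)    # "_"  "__"  "_"  "__"

u.1 <- which(w.sub == "_")       # positions with exactly one underscore
u.2 <- which(w.sub == "__")      # positions with exactly two underscores
```

This avoids regex alternation entirely: the selection is a plain string equality test, which is why the approach stays fast on large vectors.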