Let me start by saying I am rather new to R and generally consider myself to be a novice programmer...so don't assume I know what I'm doing :)

I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string "SAO " or "FL-15".

My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column and essentially delete the row if it did not match my criteria. Essentially...

> j <- 1
> for (i in 1:nrow(dataset)) {
>   if(dataset$REC.TYPE[j] != "SAO " && dataset$REC.TYPE[j] != "FL-15") {
>     dataset <- dataset[-j,] }
>   else {
>     j <- j+1 }
> }

After watching my code get through only about 10% of the matrix in an hour and slowing with every row...I figure there must be a more efficient way of pulling out only the records I need...especially when I need to repeat this for another 8 datasets.

Can anyone point me in the right direction?

Thanks!

Matt
Try this:
dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
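A minimal sketch of what this call does, run on a made-up five-row data frame. The record types other than "SAO " and "FL-15" and the VALUE column are assumptions for illustration only, not taken from the real data.

```r
# Toy stand-in for the real dataset; all values here are invented.
dataset <- data.frame(
  REC.TYPE = c("SAO  ", "FL-15", "SA   ", "FL-15", "SAOSP"),
  VALUE    = c(1.2, 3.4, 5.6, 7.8, 9.0),
  stringsAsFactors = FALSE
)

# grepl() tests every element of REC.TYPE at once and returns one TRUE/FALSE
# per row; subset() then keeps the TRUE rows in a single vectorized step,
# so the data frame is copied once instead of once per deleted row.
kept <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))

print(kept)
```

The pattern is a regular expression, so a row is kept when "SAO " (with the trailing space) or "FL-15" appears anywhere in REC.TYPE.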
On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski <mathias1979 at yahoo.com> wrote:
> [...]
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Thank you for your response Jim! I will give this one a try! But a couple followup questions...

In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that?

Also, I found the following solution which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...

> testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]

-Matt

--- On Sun, 3/3/13, jim holtman <jholtman at gmail.com> wrote:

From: jim holtman <jholtman at gmail.com>
Subject: Re: [R] Help searching a matrix for only certain records
To: "Matt Borkowski" <mathias1979 at yahoo.com>
Cc: r-help at r-project.org
Date: Sunday, March 3, 2013, 8:00 AM

Try this:

dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
Hi,
Try this:
set.seed(51)
mat1 <- as.matrix(data.frame(
  REC.TYPE = sample(c("SAO","FAO","FL-1","FL-2","FL-15"), 20, replace=TRUE),
  Col2 = rnorm(20), Col3 = runif(20), stringsAsFactors=FALSE))
dat1 <- as.data.frame(mat1, stringsAsFactors=FALSE)
dat1[grepl("SAO|FL-15", dat1$REC.TYPE),]
#   REC.TYPE        Col2       Col3
#4     FL-15 -1.31594143 0.41193183
#6     FL-15  0.43419586 0.96004780
#9     FL-15 -0.90690732 0.84000657
#10      SAO  0.21363265 0.20155142
#13      SAO -0.55566727 0.71606558
#15      SAO -0.71533068 0.90851364
#17      SAO  1.58611036 0.97475674
#20      SAO -0.42904914 0.33710578
A.K.
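One thing worth checking with a grepl() pattern like the one above: it matches anywhere inside the string, so a longer code that merely contains "SAO" or "FL-15" is also kept. The sketch below illustrates this with invented record types ("FL-150" and "XSAO" are hypothetical and do not come from the original data) and shows an anchored pattern as one possible way to require a whole-string match.

```r
# Hypothetical record types; "FL-150" and "XSAO" are invented to show the pitfall.
rec <- c("SAO", "FL-15", "FL-150", "XSAO", "FL-1")

# Unanchored pattern: TRUE whenever the text occurs anywhere in the string.
loose <- grepl("SAO|FL-15", rec)

# Anchored pattern: ^ and $ force the whole string to equal one alternative.
strict <- grepl("^(SAO|FL-15)$", rec)

print(rec[loose])
print(rec[strict])
```

If the real column pads its codes with trailing spaces, as "SAO " in the original question suggests, the anchored pattern would need to allow for that padding.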
----- Original Message -----
From: Matt Borkowski <mathias1979 at yahoo.com>
To: r-help at r-project.org
Cc:
Sent: Sunday, March 3, 2013 1:11 AM
Subject: [R] Help searching a matrix for only certain records
[...]
There are way "more efficient" ways of doing many of the operations,
but you probably won't see any differences unless you have very large
objects (several hundred thousand entries), or have to do it a lot of times. My
background is in computer performance and for the most part I have found that
the easiest/most straightforward ways are fine most of the time.
A more efficient way might be:
testdata <- testdata[!is.na(match(testdata$REC.TYPE, c('SAO ', 'FL-15'))), ]
You can always use 'system.time' to determine how long actions take.
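As a rough sketch of how system.time() could be used here, the code below builds a synthetic 300,000-row data frame (the "OTHER" record type, the VALUE column, and the random data are assumptions, not the real dataset) and times two ways of selecting the rows. The timings depend on the machine and R version, so no particular result is claimed.

```r
# Synthetic stand-in for one dataset: 300,000 rows with random record types.
set.seed(1)
n <- 300000
testdata <- data.frame(
  REC.TYPE = sample(c("SAO ", "FL-15", "OTHER"), n, replace = TRUE),
  VALUE    = rnorm(n),
  stringsAsFactors = FALSE
)

# Time the regular-expression approach.
t_grepl <- system.time(
  by_grepl <- subset(testdata, grepl("(SAO |FL-15)", REC.TYPE))
)

# Time plain logical indexing with exact string comparisons.
t_index <- system.time(
  by_index <- testdata[testdata$REC.TYPE == "SAO " |
                       testdata$REC.TYPE == "FL-15", , drop = FALSE]
)

# Each result reports user, system, and elapsed seconds.
print(t_grepl)
print(t_index)
```

On data of this size both calls are vectorized, so either should finish far faster than a loop that deletes one row at a time.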
For multiple comparisons use %in%.
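A small sketch of the %in% approach, extending the single-condition indexing to two accepted values. The toy data frame, the VALUE column, and the name `wanted` are assumptions made for illustration.

```r
# Toy data; only "SAO " and "FL-15" come from the thread, the rest is invented.
testdata <- data.frame(
  REC.TYPE = c("SAO ", "FL-15", "OTHER", "SAO ", "FL-1"),
  VALUE    = 1:5,
  stringsAsFactors = FALSE
)

# The set of record types to keep (exact matches, including the trailing space).
wanted <- c("SAO ", "FL-15")

# %in% returns TRUE for each row whose REC.TYPE equals any element of `wanted`.
by_in <- testdata[testdata$REC.TYPE %in% wanted, , drop = FALSE]

# %in% is built on match(), so this spelled-out form selects the same rows.
by_match <- testdata[!is.na(match(testdata$REC.TYPE, wanted)), , drop = FALSE]

print(by_in)
```

Unlike ==, %in% returns FALSE rather than NA for missing values, so rows with a missing REC.TYPE are simply dropped instead of turning into all-NA rows.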
Sent from my iPad
On Mar 3, 2013, at 9:22, Matt Borkowski <mathias1979 at yahoo.com> wrote:
> [...]