thr3ads.net - R help - [R] sorting/grouping/classification problem? [Jan 2013]

If this information is useful, please help other people find it:
Share via:

Bart Joosen

2013-Jan-24 19:31 UTC

[R] sorting/grouping/classification problem?

Hi,


I'm a database admin for a database which manage chromatographic results of
products during stability studies.
I use R for the reporting of the results in MS Word through R2wd.


But now I think I need your help:
suppose we have the following data frame:


   ID  rrt Mnd Result
1 0.45   0   0.10
1 0.48   0   0.30
1 1.24   0   0.50
2 0.45   3   0.20
2 0.48   3   0.60
2 1.22   3   0.40
3 0.35   6   0.05
3 0.44   6   0.40
3 0.46   6   1.20
3 1.21   6   0.45
4 0.36   9   0.06
4 0.45   9   0.60
4 0.48   9   1.80
4 1.22   9   0.50



ID is the database ID, rrt is an identifier for the result, Mnd is the timepoint
of analysis and Result is... the result of the test.
What I need is this dataframe in a wide format (which I managed with dat2 <-
as.data.frame(tapply(dat$Result,list(rrt=dat$rrt,Mnd=dat$Mnd), function(x)
paste(x[x!=""],collapse="/"))) )
But as you can see, rrt is not an exact identifier for the result.

Sometimes rrt for 0 Mnd is 0.45, but at 6 Mnd the rrt is 0.44.
Now I need the results to align so that one can easily see how rrt x is evolving
within the Mnd time points.
I tried with different rounding procedures (round every 0.02, check that no
results are discarded this way, and check for alignment), but nothing seems to
make some sense.
Also tried checking the highest results in each Mnd, align these, determine
correction factors for the rrt for all the other rrts, ...


Some results will follow a trend (like rrt 0.45), some will remain more or less
stable.
But NEVER rrt will switch i with each other!




Ultimately I need to update in the db, so I need a list/dataframe with the ID,
the original rrt and the adjusted rrt (maybe the first occuring rrt, or the mean
of the rrts, doesn't matter).




Any ideas about which algorithms can be used? I searched on pubmed, but
couldn't find anything




Thanks


Bart


PS: to get the data:


dat <-
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L), rrt = c(0.45, 0.48, 1.24, 0.45, 0.48, 1.22, 
0.35, 0.44, 0.46, 1.21, 0.36, 0.45, 0.48, 1.22), Mnd = c(0L, 
0L, 0L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L), Result = c(0.1, 
0.3, 0.5, 0.2, 0.6, 0.4, 0.05, 0.4, 1.2, 0.45, 0.06, 0.6, 1.8, 
0.5)), .Names = c("ID", "rrt", "Mnd",
"Result"), class = "data.frame", row.names = c(NA,
-14L))



resulting dataframe:
dat3 <-
structure(list(Time = c(0.355, 0.45, 0.48, 1.22), `0` = c(NA, 
0.1, 0.3, 0.5), `3` = c(NA, 0.2, 0.6, 0.4), `6` = c(0.05, 0.4, 
1.2, 0.45), `9` = c(0.06, 0.6, 1.8, 0.5)), .Names = c("Time", 
"0", "3", "6", "9"), class =
"data.frame", row.names = c(NA,
-4L))



 		 	   		  
	[[alternative HTML version deleted]]

arun

2013-Jan-24 20:18 UTC

head link

[R] sorting/grouping/classification problem?

HI,

If I understand your question, "dat3" is not you wanted.

Is? it something like this you wanted?
library(reshape2)
?dcast(dat,rrt~Mnd,value.var="Result")
#?? rrt?? 0?? 3??? 6??? 9
#1 0.35? NA? NA 0.05?? NA
#2 0.36? NA? NA?? NA 0.06
#3 0.44? NA? NA 0.40?? NA
#4 0.45 0.1 0.2?? NA 0.60
#5 0.46? NA? NA 1.20?? NA
#6 0.48 0.3 0.6?? NA 1.80
#7 1.21? NA? NA 0.45?? NA
#8 1.22? NA 0.4?? NA 0.50
#9 1.24 0.5? NA?? NA?? NA
A.K.




----- Original Message -----
From: Bart Joosen <bartjoosen at hotmail.com>
To: r-help <r-help at r-project.org>
Cc: 
Sent: Thursday, January 24, 2013 2:31 PM
Subject: [R] sorting/grouping/classification problem?


Hi,


I'm a database admin for a database which manage chromatographic results of
products during stability studies.
I use R for the reporting of the results in MS Word through R2wd.


But now I think I need your help:
suppose we have the following data frame:


?  ID? rrt Mnd Result
1 0.45?  0?  0.10
1 0.48?  0?  0.30
1 1.24?  0?  0.50
2 0.45?  3?  0.20
2 0.48?  3?  0.60
2 1.22?  3?  0.40
3 0.35?  6?  0.05
3 0.44?  6?  0.40
3 0.46?  6?  1.20
3 1.21?  6?  0.45
4 0.36?  9?  0.06
4 0.45?  9?  0.60
4 0.48?  9?  1.80
4 1.22?  9?  0.50



ID is the database ID, rrt is an identifier for the result, Mnd is the timepoint
of analysis and Result is... the result of the test.
What I need is this dataframe in a wide format (which I managed with dat2 <-
as.data.frame(tapply(dat$Result,list(rrt=dat$rrt,Mnd=dat$Mnd), function(x)
paste(x[x!=""],collapse="/"))) )
But as you can see, rrt is not an exact identifier for the result.

Sometimes rrt for 0 Mnd is 0.45, but at 6 Mnd the rrt is 0.44.
Now I need the results to align so that one can easily see how rrt x is evolving
within the Mnd time points.
I tried with different rounding procedures (round every 0.02, check that no
results are discarded this way, and check for alignment), but nothing seems to
make some sense.
Also tried checking the highest results in each Mnd, align these, determine
correction factors for the rrt for all the other rrts, ...


Some results will follow a trend (like rrt 0.45), some will remain more or less
stable.
But NEVER rrt will switch i with each other!




Ultimately I need to update in the db, so I need a list/dataframe with the ID,
the original rrt and the adjusted rrt (maybe the first occuring rrt, or the mean
of the rrts, doesn't matter).




Any ideas about which algorithms can be used? I searched on pubmed, but
couldn't find anything




Thanks


Bart


PS: to get the data:


dat <-
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
4L, 4L, 4L, 4L), rrt = c(0.45, 0.48, 1.24, 0.45, 0.48, 1.22, 
0.35, 0.44, 0.46, 1.21, 0.36, 0.45, 0.48, 1.22), Mnd = c(0L, 
0L, 0L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L), Result = c(0.1, 
0.3, 0.5, 0.2, 0.6, 0.4, 0.05, 0.4, 1.2, 0.45, 0.06, 0.6, 1.8, 
0.5)), .Names = c("ID", "rrt", "Mnd",
"Result"), class = "data.frame", row.names = c(NA,
-14L))



resulting dataframe:
dat3 <-
structure(list(Time = c(0.355, 0.45, 0.48, 1.22), `0` = c(NA, 
0.1, 0.3, 0.5), `3` = c(NA, 0.2, 0.6, 0.4), `6` = c(0.05, 0.4, 
1.2, 0.45), `9` = c(0.06, 0.6, 1.8, 0.5)), .Names = c("Time", 
"0", "3", "6", "9"), class =
"data.frame", row.names = c(NA,
-4L))



??? ???  ??? ?  ??? ??? ? 
??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bart Joosen

2013-Jan-24 20:43 UTC

head link

[R] sorting/grouping/classification problem?

dat3 is the dataframe where there are some rrt values merged, which is actually
the problem: how on Earth discide which rows van be merged....

Thanks for your input!

Bart

-----Original Message-----

From: arun
Sent: 24 Jan 2013 20:18:28 GMT
To: Bart Joosen
Cc: R help
Subject: Re: [R] sorting/grouping/classification problem?

HI,

If I understand your question, "dat3" is not you wanted.

Is  it something like this you wanted?
library(reshape2)
 dcast(dat,rrt~Mnd,value.var="Result")
#   rrt   0   3    6    9
#1 0.35  NA  NA 0.05   NA
#2 0.36  NA  NA   NA 0.06
#3 0.44  NA  NA 0.40   NA
#4 0.45 0.1 0.2   NA 0.60
#5 0.46  NA  NA 1.20   NA
#6 0.48 0.3 0.6   NA 1.80
#7 1.21  NA  NA 0.45   NA
#8 1.22  NA 0.4   NA 0.50
#9 1.24 0.5  NA   NA   NA
A.K.




----- Original Message -----
From: Bart Joosen <bartjoosen at hotmail.com>
To: r-help <r-help at r-project.org>
Cc:
Sent: Thursday, January 24, 2013 2:31 PM
Subject: [R] sorting/grouping/classification problem?


Hi,


I'm a database admin for a database which manage chromatographic results of
products during stability studies.
I use R for the reporting of the results in MS Word through R2wd.


But now I think I need your help:
suppose we have the following data frame:


   ID  rrt Mnd Result
1 0.45   0   0.10
1 0.48   0   0.30
1 1.24   0   0.50
2 0.45   3   0.20
2 0.48   3   0.60
2 1.22   3   0.40
3 0.35   6   0.05
3 0.44   6   0.40
3 0.46   6   1.20
3 1.21   6   0.45
4 0.36   9   0.06
4 0.45   9   0.60
4 0.48   9   1.80
4 1.22   9   0.50



ID is the database ID, rrt is an identifier for the result, Mnd is the timepoint
of analysis and Result is... the result of the test.
What I need is this dataframe in a wide format (which I managed with dat2 <-
as.data.frame(tapply(dat$Result,list(rrt=dat$rrt,Mnd=dat$Mnd), function(x)
paste(x[x!=""],collapse="/"))) )
But as you can see, rrt is not an exact identifier for the result.

Sometimes rrt for 0 Mnd is 0.45, but at 6 Mnd the rrt is 0.44.
Now I need the results to align so that one can easily see how rrt x is evolving
within the Mnd time points.
I tried with different rounding procedures (round every 0.02, check that no
results are discarded this way, and check for alignment), but nothing seems to
make some sense.
Also tried checking the highest results in each Mnd, align these, determine
correction factors for the rrt for all the other rrts, ...


Some results will follow a trend (like rrt 0.45), some will remain more or less
stable.
But NEVER rrt will switch i with each other!




Ultimately I need to update in the db, so I need a list/dataframe with the ID,
the original rrt and the adjusted rrt (maybe the first occuring rrt, or the mean
of the rrts, doesn't matter).




Any ideas about which algorithms can be used? I searched on pubmed, but
couldn't find anything




Thanks


Bart


PS: to get the data:


dat <-
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L), rrt = c(0.45, 0.48, 1.24, 0.45, 0.48, 1.22,
0.35, 0.44, 0.46, 1.21, 0.36, 0.45, 0.48, 1.22), Mnd = c(0L,
0L, 0L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L), Result = c(0.1,
0.3, 0.5, 0.2, 0.6, 0.4, 0.05, 0.4, 1.2, 0.45, 0.06, 0.6, 1.8,
0.5)), .Names = c("ID", "rrt", "Mnd",
"Result"), class = "data.frame", row.names = c(NA,
-14L))



resulting dataframe:
dat3 <-
structure(list(Time = c(0.355, 0.45, 0.48, 1.22), `0` = c(NA,
0.1, 0.3, 0.5), `3` = c(NA, 0.2, 0.6, 0.4), `6` = c(0.05, 0.4,
1.2, 0.45), `9` = c(0.06, 0.6, 1.8, 0.5)), .Names = c("Time",
"0", "3", "6", "9"), class =
"data.frame", row.names = c(NA,
-4L))




    [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Bart Joosen

2013-Jan-25 06:48 UTC

head link

[R] sorting/grouping/classification problem?

Nice suggestion for the extra "Time" column.

But I think I didn't ask clear enough my problem.
My main problem is to find a way to "classify" the rrt's, so that
we don't have to check each dataframe by our selfs.

So I need a function that fills in the extra "Time" column by taking a
look at the rrt's (and maybe the results), and take the discision which rrts
are the same, and which are new ones.

As stated: rrt's never switch place, and results can't be concatenated
or averaged within a Mnd.

I hope my question is a bit more clear now.

Thank you all for your suggestions

Bart
> Date: Thu, 24 Jan 2013 15:01:40 -0800
> Subject: Re: [R] sorting/grouping/classification problem?
> From: djmuser@gmail.com
> To: bartjoosen@hotmail.com
> 
> Hi:
> 
> Here's a potential workaround:
> 
> # Add a time order variable
> dat$ord <- c(rep(2:4, 2), rep(1:4, 2))
> 
> # Average rrt by ord
> dat$Time <- with(dat, ave(rrt, ord, FUN = mean))
> dat
> 
> # Reshape the data
> 
> library(reshape2)
> > dcast(dat, Time ~ Mnd, value.var = "Result")
>     Time   0   3    6    9
> 1 0.3550  NA  NA 0.05 0.06
> 2 0.4475 0.1 0.2 0.40 0.60
> 3 0.4750 0.3 0.6 1.20 1.80
> 4 1.2225 0.5 0.4 0.45 0.50
> 
> You could always round dat$Time to two decimal places in its
> definition before doing the cast if you so desired.
> 
> Dennis
> 
> On Thu, Jan 24, 2013 at 11:31 AM, Bart Joosen
<bartjoosen@hotmail.com> wrote:
> >
> > Hi,
> >
> >
> > I'm a database admin for a database which manage chromatographic
results of products during stability studies.
> > I use R for the reporting of the results in MS Word through R2wd.
> >
> >
> > But now I think I need your help:
> > suppose we have the following data frame:
> >
> >
> >    ID  rrt Mnd Result
> > 1 0.45   0   0.10
> > 1 0.48   0   0.30
> > 1 1.24   0   0.50
> > 2 0.45   3   0.20
> > 2 0.48   3   0.60
> > 2 1.22   3   0.40
> > 3 0.35   6   0.05
> > 3 0.44   6   0.40
> > 3 0.46   6   1.20
> > 3 1.21   6   0.45
> > 4 0.36   9   0.06
> > 4 0.45   9   0.60
> > 4 0.48   9   1.80
> > 4 1.22   9   0.50
> >
> >
> >
> > ID is the database ID, rrt is an identifier for the result, Mnd is the
timepoint of analysis and Result is... the result of the test.
> > What I need is this dataframe in a wide format (which I managed with
dat2 <- as.data.frame(tapply(dat$Result,list(rrt=dat$rrt,Mnd=dat$Mnd),
function(x) paste(x[x!=""],collapse="/"))) )
> > But as you can see, rrt is not an exact identifier for the result.
> >
> > Sometimes rrt for 0 Mnd is 0.45, but at 6 Mnd the rrt is 0.44.
> > Now I need the results to align so that one can easily see how rrt x
is evolving within the Mnd time points.
> > I tried with different rounding procedures (round every 0.02, check
that no results are discarded this way, and check for alignment), but nothing
seems to make some sense.
> > Also tried checking the highest results in each Mnd, align these,
determine correction factors for the rrt for all the other rrts, ...
> >
> >
> > Some results will follow a trend (like rrt 0.45), some will remain
more or less stable.
> > But NEVER rrt will switch i with each other!
> >
> >
> >
> >
> > Ultimately I need to update in the db, so I need a list/dataframe with
the ID, the original rrt and the adjusted rrt (maybe the first occuring rrt, or
the mean of the rrts, doesn't matter).
> >
> >
> >
> >
> > Any ideas about which algorithms can be used? I searched on pubmed,
but couldn't find anything
> >
> >
> >
> >
> > Thanks
> >
> >
> > Bart
> >
> >
> > PS: to get the data:
> >
> >
> > dat <-
> > structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
> > 4L, 4L, 4L, 4L), rrt = c(0.45, 0.48, 1.24, 0.45, 0.48, 1.22,
> > 0.35, 0.44, 0.46, 1.21, 0.36, 0.45, 0.48, 1.22), Mnd = c(0L,
> > 0L, 0L, 3L, 3L, 3L, 6L, 6L, 6L, 6L, 9L, 9L, 9L, 9L), Result = c(0.1,
> > 0.3, 0.5, 0.2, 0.6, 0.4, 0.05, 0.4, 1.2, 0.45, 0.06, 0.6, 1.8,
> > 0.5)), .Names = c("ID", "rrt", "Mnd",
"Result"), class = "data.frame", row.names = c(NA,
> > -14L))
> >
> >
> >
> > resulting dataframe:
> > dat3 <-
> > structure(list(Time = c(0.355, 0.45, 0.48, 1.22), `0` = c(NA,
> > 0.1, 0.3, 0.5), `3` = c(NA, 0.2, 0.6, 0.4), `6` = c(0.05, 0.4,
> > 1.2, 0.45), `9` = c(0.06, 0.6, 1.8, 0.5)), .Names =
c("Time",
> > "0", "3", "6", "9"), class =
"data.frame", row.names = c(NA,
> > -4L))
> >
> >
> >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code. 		 	   		  
	[[alternative HTML version deleted]]

Bart Joosen

2013-Jan-28 19:50 UTC

head link

[R] sorting/grouping/classification problem?

Hi,
 
after all, brute force seems  the way to go.
 
I will use a simplified example to illustrate what I want (dump of dat4 is
below):
suppose dat4:
ID  rrt Mnd Result
 1 0.45   0    0.1
 1 0.48   0    0.3
 1 1.24   0    0.5
 2 0.45   3    0.2
 2 0.48   3    0.6
 2 1.22   3    0.4
 
I want to generate all possible combinations of Mnd 0 with Mnd 3 to
calculate the sum of the squared differences divided by the number of rrt's
eg:
Mnd 0 rrt 0.45 gives result 0.1 where Mnd 3 rrt 0.45 gives 0.2 
At the same time rrt 0.48 at Mnd 0 gives 0.3 where rrt 0.48 at Mnd 3 gives
0.6
The same for rrt 1.24 at Mnd 0: 0.5 ....
This gives (0.1-0.2) ^2 + (0.3-0.6)^2  + ....  
 
The permutations should follow this rules:
- rrt's can never differ more than 10%
- rrt's can never switch (eg if rrt 0.45 at 0 Mnd is coupled to 0.48 mnd at
3 Mnd, then 0.48 at 0Mnd can not be coupled to 0.45 at 3 Mnd)
- rrt's can be coupled to NA values and shouldn't be coupled
necessarily.
 
 
I already played with combn, and expand.grid, but couldn't figure out how to
generate the combinations...
 
The  goal is to minimize the resulting value, but be aware of the fact that
the problem above is simplified, and thus isn't limited to only 2 Mnd
values, but maybe 5 - 10.
 
Thanks
 
Bart
 
 
dat4 <-
structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), rrt = c(0.45, 
0.48, 1.24, 0.45, 0.48, 1.22), Mnd = c(0L, 0L, 0L, 3L, 3L, 3L
), Result = c(0.1, 0.3, 0.5, 0.2, 0.6, 0.4)), .Names = c("ID", 
"rrt", "Mnd", "Result"), row.names = c(NA, 6L),
class = "data.frame")
 
 
 
 		 	   		  
	[[alternative HTML version deleted]]

Maybe Matching Threads

Search for more seemingly similar threads

R help - Jan 2013 - sorting/grouping/classification problem?

[R] sorting/grouping/classification problem?

[R] sorting/grouping/classification problem?

[R] sorting/grouping/classification problem?

[R] sorting/grouping/classification problem?

[R] sorting/grouping/classification problem?

Maybe Matching Threads