Hi Bert,

Thank you for your message. I am looking into ave() and tapply() as you
suggested, but at the same time I have prepared an example of input and
output files, just in case you or someone else would like to make an attempt
to generate code that goes from input to output.

Please see below or download it from
https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0

# this is (an extract of) the INPUT file I have:
f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(18.18530, 3.43806, 0.00273, 1.42917,
1.05786, 0.00042, 2.37232, 3.01835, 0, 1.13430, 0.92872,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

# this is (an extract of) the OUTPUT file I would like to obtain:
f2 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
"B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
"B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
"B", "B", "B", "C", "C", "C"), v4 = c(17.83529, 3.43806, 0.00295, 1.77918,
1.05786, 0.0002, 2.37232, 3.01835, 0, 1.13430, 0.92872,
0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L,
9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))

# please notice that while the aggregated v4 on v3 has changed ...
aggregate(f1[,c("v4")],list(f1$v3),sum)
aggregate(f2[,c("v4")],list(f2$v3),sum)

# ... the aggregated v4 over v1xv2 has remained unchanged:
aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum)
aggregate(f2[,c("v4")],list(f2$v1,f2$v2),sum)

Thank you very much in advance for your assistance.

Luca

2015-03-21 13:18 GMT+01:00 Bert Gunter <gunter.berton at gene.com>:
> 1. Still not sure what you mean, but maybe look at ?ave and ?tapply,
> for which ave() is a wrapper.
>
> 2. You still need to heed the rest of Jeff's advice.
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> Clifford Stoll
>
>
> On Sat, Mar 21, 2015 at 4:53 AM, Luca Meyer <lucam1968 at gmail.com> wrote:
> > Hi Jeff & other R-experts,
> >
> > Thank you for your note. I have tried myself to solve the issue without
> > success.
> >
> > Following your suggestion, I am providing a sample of the dataset I am
> > using below (also downloadable in plain text from
> > https://www.dropbox.com/s/qhmpkkrejjkpbkx/sample_code.txt?dl=0):
> >
> > #this is an extract of the overall dataset (n=1200 cases)
> > f1 <- structure(list(v1 = c("A", "A", "A", "A", "A", "A", "B", "B",
> > "B", "B", "B", "B"), v2 = c("A", "B", "C", "A", "B", "C", "A",
> > "B", "C", "A", "B", "C"), v3 = c("B", "B", "B", "C", "C", "C",
> > "B", "B", "B", "C", "C", "C"), v4 = c(18.1853007621835, 3.43806581506388,
> > 0.002733567617055, 1.42917483425029, 1.05786640463504,
> > 0.000420548864162308,
> > 2.37232740842861, 3.01835841813241, 0, 1.13430282139936, 0.928725667117666,
> > 0)), .Names = c("v1", "v2", "v3", "v4"), class = "data.frame", row.names = c(2L,
> > 9L, 11L, 41L, 48L, 50L, 158L, 165L, 167L, 197L, 204L, 206L))
> >
> > I need to find an automated procedure that allows me to adjust the v3
> > marginals while keeping the v1xv2 marginals unchanged.
> >
> > That is: modify the v4 values you can find by running:
> >
> > aggregate(f1[,c("v4")],list(f1$v3),sum)
> >
> > while keeping constant the values you can find by running:
> >
> > aggregate(f1[,c("v4")],list(f1$v1,f1$v2),sum)
> >
> > Now does it make sense?
> >
> > Please notice I have tried to build some syntax that modifies the values
> > within each v1xv2 combination by computing the sum of v4 and row
> > percentages in terms of v4, and that is where my effort is blocked. Not
> > really sure how I should proceed. Any suggestion?
> >
> > Thanks,
> >
> > Luca
> >
> >
> > 2015-03-19 2:38 GMT+01:00 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>:
> >
> >> I don't understand your description. The standard practice on this list is
> >> to provide a reproducible R example [1] of the kind of data you are working
> >> with (and any code you have tried) to go along with your description. In
> >> this case, that would be two dputs of your input data frames and a dput of
> >> an output data frame (generated by hand from your input data frame).
> >> (Probably best to not use the full number of input values just to keep the
> >> size down.) We could then make an attempt to generate code that goes from
> >> input to output.
> >>
> >> Of course, if you post that hard work using HTML then it will get
> >> corrupted (much like the text below from your earlier emails) and we won't
> >> be able to use it. Please learn to post from your email software using
> >> plain text when corresponding with this mailing list.
> >>
> >> [1]
> >> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> >> ---------------------------------------------------------------------------
> >> Jeff Newmiller                        The     .....       .....  Go Live...
> >> DCN:<jdnewmil at dcn.davis.ca.us>       Basics: ##.#.       ##.#.  Live Go...
> >>                                       Live:   OO#.. Dead: OO#..  Playing
> >> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> >> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> >> ---------------------------------------------------------------------------
> >> Sent from my phone. Please excuse my brevity.
> >>
> >> On March 18, 2015 9:05:37 AM PDT, Luca Meyer <lucam1968 at gmail.com> wrote:
> >> >Thanks for your input, Michael.
> >> >
> >> >The continuous variable I have measures quantities (down to the 3rd
> >> >decimal level), so unfortunately they are not frequencies.
> >> >
> >> >Any more specific suggestions on how that could be tackled?
> >> >
> >> >Thanks & kind regards,
> >> >
> >> >Luca
> >> >
> >> >==
> >> >
> >> >Michael Friendly wrote:
> >> >I'm not sure I understand completely what you want to do, but
> >> >if the data were frequencies, it sounds like a task for fitting a
> >> >loglinear model with the model formula
> >> >
> >> >~ V1*V2 + V3
> >> >
> >> >On 3/18/2015 2:17 AM, Luca Meyer wrote:
> >> >> Hello,
> >> >>
> >> >> I am facing a quite challenging task (at least to me) and I was wondering
> >> >> if someone could advise how R could assist me to speed the task up.
> >> >>
> >> >> I am dealing with a dataset with 3 discrete variables and one continuous
> >> >> variable. The discrete variables are:
> >> >>
> >> >> V1: 8 modalities
> >> >> V2: 13 modalities
> >> >> V3: 13 modalities
> >> >>
> >> >> The continuous variable V4 is a decimal number always greater than zero
> >> >> in the marginals of each of the 3 variables, but it is sometimes equal
> >> >> to zero (and sometimes negative) in the joint tables.
> >> >>
> >> >> I have got 2 files:
> >> >>
> >> >> => one with the distribution of all possible combinations of V1xV2
> >> >> (some of which are zero or negative) and
> >> >> => one with the marginal distribution of V3.
> >> >>
> >> >> I am trying to build the long and narrow dataset V1xV2xV3 in such a way
> >> >> that each V1xV2 cell does not get modified and V3 fits as closely as
> >> >> possible to its marginal distribution. Does it make sense?
> >> >>
> >> >> To be even more specific, my 2 input files look like the following.
> >> >>
> >> >> FILE 1
> >> >> V1, V2, V4
> >> >> A, A, 24.251
> >> >> A, B, 1.065
> >> >> (...)
> >> >> B, C, 0.294
> >> >> B, D, 2.731
> >> >> (...)
> >> >> H, L, 0.345
> >> >> H, M, 0.000
> >> >>
> >> >> FILE 2
> >> >> V3, V4
> >> >> A, 1.575
> >> >> B, 4.294
> >> >> C, 10.044
> >> >> (...)
> >> >> L, 5.123
> >> >> M, 3.334
> >> >>
> >> >> What I need to achieve is a file such as the following
> >> >>
> >> >> FILE 3
> >> >> V1, V2, V3, V4
> >> >> A, A, A, ???
> >> >> A, A, B, ???
> >> >> (...)
> >> >> D, D, E, ???
> >> >> D, D, F, ???
> >> >> (...)
> >> >> H, M, L, ???
> >> >> H, M, M, ???
> >> >>
> >> >> Please notice that FILE 3 needs to be such that if I aggregate on V1+V2
> >> >> I recover exactly FILE 1, and that if I aggregate on V3 I recover a file
> >> >> as close as possible to FILE 2 (ideally the same file).
> >> >>
> >> >> Can anyone suggest how I could do that with R?
> >> >>
> >> >> Thank you very much indeed for any assistance you are able to provide.
> >> >>
> >> >> Kind regards,
> >> >>
> >> >> Luca
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> >______________________________________________
> >> >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >https://stat.ethz.ch/mailman/listinfo/r-help
> >> >PLEASE do read the posting guide
> >> >http://www.R-project.org/posting-guide.html
> >> >and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]
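One possible route from f1 to something like f2, sketched here for context
rather than taken from the thread: within each v1 x v2 group, redistribute the
(fixed) group total over the v3 levels in proportion to a set of target v3
marginals. The target vector v3_target and the name f3 below are made up for
illustration; in the real problem the targets would come from FILE 2. The
sketch assumes every v1 x v2 group contains each v3 level exactly once, and it
makes the new v3 sums proportional to the target (exactly equal only if the
target sums to the grand total of v4).

v3_target <- c(B = 20, C = 11)                    # hypothetical target sums for v3
grp_tot <- with(f1, ave(v4, v1, v2, FUN = sum))   # v1 x v2 totals (must stay fixed)
share   <- v3_target[f1$v3] / sum(v3_target)      # target share of each row's v3 level
f3      <- f1
f3$v4   <- unname(grp_tot * share)                # split each group total by the shares

aggregate(v4 ~ v1 + v2, f3, sum)   # identical to the same call on f1
aggregate(v4 ~ v3, f3, sum)        # now proportional to v3_target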
z <- rnorm(nrow(f1)) ## or anything you want
z1 <- f1$v4 + z - with(f1,ave(z,v1,v2,FUN=mean))

aggregate(v4~v1,f1,sum)
aggregate(z1~v1,f1,sum)
aggregate(v4~v2,f1,sum)
aggregate(z1~v2,f1,sum)
aggregate(v4~v3,f1,sum)
aggregate(z1~v3,f1,sum)

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll
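Why this keeps the v1 x v2 totals fixed: the quantity added to v4 is z minus
its own (v1, v2) group mean, so it sums to zero within every v1 x v2 cell and
the cell totals cannot change, while the v3 sums do. A small check (the seed
is arbitrary and not part of Bert's reply):

set.seed(1)
z     <- rnorm(nrow(f1))
delta <- z - with(f1, ave(z, v1, v2, FUN = mean))   # the adjustment actually applied
tapply(delta, interaction(f1$v1, f1$v2), sum)       # zero (up to rounding) in every cell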
... or cleaner:

z1 <- with(f1, v4 + z - ave(z,v1,v2,FUN=mean))

Just for curiosity, was this homework? (in which case I should probably
have not provided you an answer -- that is, assuming that I HAVE provided
an answer).

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll
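For completeness, a small end-to-end check of the cleaner form (a sketch: the
seed, the name f_new, and the all.equal comparison are not part of Bert's
reply):

set.seed(42)
z <- rnorm(nrow(f1))
f_new <- f1
f_new$v4 <- with(f1, v4 + z - ave(z, v1, v2, FUN = mean))

all.equal(aggregate(v4 ~ v1 + v2, f1, sum),
          aggregate(v4 ~ v1 + v2, f_new, sum))   # TRUE: v1 x v2 totals preserved
aggregate(v4 ~ v3, f1, sum)
aggregate(v4 ~ v3, f_new, sum)                   # v3 totals have moved

Note that with a random z the new v3 sums move arbitrarily; hitting a specific
set of v3 targets, as in f2, needs a z chosen with those targets in mind, for
example along the lines of the proportional redistribution sketched earlier in
this thread.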