Hello R users, I have been using R for a while now for basic stats but I'm now trying to get my head around looping scripts and in some places I am failing! I have a data set with c. 1200 data points on 98 individual animals with data on each row representing a daily measure and I am asking the question "what variables affect the animal's behaviour?" the dataset includes these variables for analyses: presence of behaviour, absence of behaviour, site, year, rain, air temp, ID, Day Listed below as they appear in the data set: BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2, DAY with BEH_T & BEH_F = the response variable for a binomial GLM here is the head of the dataset (NB there are only two years and two sites) BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY [1,] 14 10 1 2007 0 12 10.98750 1 1 [2,] 37 23 1 2007 0 13 11.47333 1 2 [3,] 56 22 1 2007 0 14 12.16667 1 3 [4,] 43 23 1 2007 0 16 10.91515 1 5 [5,] 62 16 1 2007 0 17 12.81026 1 6 [6,] 30 20 1 2007 0 19 8.67037 1 8 (Sorry the headings are skewed) Because I don't want to do too complex a model to start with (just wanting to learn first with a 'simple' model) I have issues with independence of the data as there are repeats of individuals - i.e. data taken on the same IDs on different days. So in order to account for that I have decided to random sample one data point for each ID then run the GLM on that data for x number of simulations to see if the explanatory variables are the same/similar across all models. (This will reduce my data set to 98 data points, but it is the best way I can see of doing this without doing mixed-effects models, since not all IDs are seen at both sites in both years). I am also using the MuMIn package for running all subsets of your model the code I'm using is: for (S in 1:2){ Sample.dat<-ALL.R[1,] for (I in 1:98) { tmp<-ALL.R[ALL.R$ID2==I,] max<-dim(tmp)[1] if (I==1) Sample.dat<-tmp[sample(1:max,1),] else { Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE + Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + Sample.dat$MO_AIR_TEMP, family="binomial") mod<-dredge(m1.R)}}} At this point I have two issues if I do it manually then it seems to work i.e. gives me one output (e.g shown at bottom of post) where I then want to take the first line, the model with the best AIC using mod[1,] - no problem! However, letting the code run and for example using print ((mod[1,])) at the end it prints out the first line of 98 outputs - so I'm not too sure what I've done wrong here, but it appears to be running a model for each ID - something basic no doubt! Ideally, what I want to do is take a random sample of the data then run the model get one output for that take the top line (i.e. the best AIC) and save this, then run this routine say 100 times, saving that top line every time, then having a look at the results and take a model average. Anytime I've got close to this I have issues with overwriting the previous first line of the model selection and I can't seem to identify how to set this loop up properly. Any advice or guidance would be most appreciated, I have tried to explain my issues clearly but if more info is required please just ask, Many thanks in advance to those of you that took the time to read this! Ross Ross Culloch Ph.D. Student Durham University UK Here is an example of the model selection table from usingMuMIn: Model selection table (Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AIC AICc delta weight 30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7 0.000 0.707 32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0 2.249 0.230 26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3 6.630 0.026 31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2 8.493 0.010 22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8 696.2 8.529 0.010 21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0 696.4 8.717 0.009 29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4 698.0 10.330 0.004 28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4 698.1 10.370 0.004 20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0 702.5 14.770 0.000 10 530.4000 0.07447 -0.2639 3 324.0 703.1 703.3 15.640 0.000 27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9 704.6 16.850 0.000 19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1 705.5 17.820 0.000 16 743.4000 0.4875 -0.3698 3 328.7 707.8 708.0 20.310 0.000 9 0.5512 0.06094 0.2286 3 328.8 707.9 708.1 20.430 0.000 8 0.6828 0.08019 -0.01688 3 328.9 708.0 708.2 20.540 0.000 18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9 709.3 21.620 0.000 25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7 710.1 22.410 0.000 17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9 710.4 22.670 0.000 2 0.4985 0.08406 2 335.8 712.8 713.0 25.270 0.000 7 0.4996 0.08516 -0.0023780 3 335.6 714.7 714.9 27.240 0.000 14 1.0760 -0.02288 0.5151 3 340.8 719.9 720.1 32.420 0.000 23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9 722.3 34.590 0.000 5 0.8587 0.5304 2 354.0 731.0 731.1 43.440 0.000 12 0.8663 -0.0042170 0.5473 3 353.5 732.5 732.8 45.070 0.000 24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4 739.8 52.140 0.000 15 942.8000 -0.02909 -0.4689 3 370.5 749.5 749.8 62.090 0.000 13 915.8000 0.0151200 -0.4556 3 384.7 763.7 764.0 76.290 0.000 6 900.3000 -0.4478 2 391.8 768.9 769.0 81.320 0.000 11 1.3530 0.0176300 -0.02957 3 402.3 781.3 781.6 93.890 0.000 4 1.3940 -0.02630 2 412.3 789.4 789.5 101.800 0.000 3 1.1010 0.0134300 2 424.4 801.4 801.6 113.800 0.000 1 1.1550 1 430.3 805.4 805.4 117.700 0.000>-- View this message in context: http://n4.nabble.com/Loop-overwrite-and-data-output-problems-tp1570593p1570593.html Sent from the R help mailing list archive at Nabble.com.
Hi, Since I'm not an expert, I still have problems understanding when it's not my own work, but I have something that might help you. if (I==1) Sample.dat<-tmp[sample(1:max,1),] else { Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) This part might not be the best. I would do something like: Sample.dat[[I]] <- tmp[sample(1:max, 1),] That way, you will store your line in the Ith element of the list "Sample.dat". 5you might have to define it first like: Sample.dat <- list() ) You can then convert it to a matrix using: do.call(rbind, Sample.dat) It might get you started HTH, Ivan Le 2/26/2010 14:31, RCulloch a ?crit :> Hello R users, > > I have been using R for a while now for basic stats but I'm now trying to > get my head around looping scripts and in some places I am failing! > > I have a data set with c. 1200 data points on 98 individual animals with > data on each row representing a daily measure and I am asking the question > "what variables affect the animal's behaviour?" > > the dataset includes these variables for analyses: > > presence of behaviour, absence of behaviour, site, year, rain, air temp, ID, > Day > > Listed below as they appear in the data set: > > BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2, > DAY > > with BEH_T& BEH_F = the response variable for a binomial GLM > > here is the head of the dataset > (NB there are only two years and two sites) > > BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY > [1,] 14 10 1 2007 0 12 10.98750 1 1 > [2,] 37 23 1 2007 0 13 11.47333 1 2 > [3,] 56 22 1 2007 0 14 12.16667 1 3 > [4,] 43 23 1 2007 0 16 10.91515 1 5 > [5,] 62 16 1 2007 0 17 12.81026 1 6 > [6,] 30 20 1 2007 0 19 8.67037 1 8 > > (Sorry the headings are skewed) > > Because I don't want to do too complex a model to start with (just wanting > to learn first with a 'simple' model) I have issues with independence of the > data as there are repeats of individuals - i.e. data taken on the same IDs > on different days. So in order to account for that I have decided to random > sample one data point for each ID then run the GLM on that data for x number > of simulations to see if the explanatory variables are the same/similar > across all models. (This will reduce my data set to 98 data points, but it > is the best way I can see of doing this without doing mixed-effects models, > since not all IDs are seen at both sites in both years). > > I am also using the MuMIn package for running all subsets of your model > > > the code I'm using is: > > > for (S in 1:2){ > Sample.dat<-ALL.R[1,] > for (I in 1:98) { > tmp<-ALL.R[ALL.R$ID2==I,] > max<-dim(tmp)[1] > if (I==1) Sample.dat<-tmp[sample(1:max,1),] else { > Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) > m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE + > Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + > Sample.dat$MO_AIR_TEMP, family="binomial") > mod<-dredge(m1.R)}}} > > At this point I have two issues if I do it manually then it seems to work > i.e. gives me one output (e.g shown at bottom of post) where I then want to > take the first line, the model with the best AIC using mod[1,] - no problem! > > However, letting the code run and for example using print ((mod[1,])) at the > end it prints out the first line of 98 outputs - so I'm not too sure what > I've done wrong here, but it appears to be running a model for each ID - > something basic no doubt! > > Ideally, what I want to do is take a random sample of the data then run the > model get one output for that take the top line (i.e. the best AIC) and save > this, then run this routine say 100 times, saving that top line every time, > then having a look at the results and take a model average. Anytime I've got > close to this I have issues with overwriting the previous first line of the > model selection and I can't seem to identify how to set this loop up > properly. > > Any advice or guidance would be most appreciated, I have tried to explain my > issues clearly but if more info is required please just ask, > > Many thanks in advance to those of you that took the time to read this! > > Ross > > Ross Culloch > Ph.D. Student > Durham University > UK > > > > > > > > Here is an example of the model selection table from usingMuMIn: > > > Model selection table > (Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AIC AICc > delta weight > 30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7 > 0.000 0.707 > 32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0 > 2.249 0.230 > 26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3 > 6.630 0.026 > 31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2 > 8.493 0.010 > 22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8 696.2 > 8.529 0.010 > 21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0 696.4 > 8.717 0.009 > 29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4 698.0 > 10.330 0.004 > 28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4 698.1 > 10.370 0.004 > 20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0 702.5 > 14.770 0.000 > 10 530.4000 0.07447 -0.2639 3 324.0 703.1 703.3 > 15.640 0.000 > 27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9 704.6 > 16.850 0.000 > 19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1 705.5 > 17.820 0.000 > 16 743.4000 0.4875 -0.3698 3 328.7 707.8 708.0 > 20.310 0.000 > 9 0.5512 0.06094 0.2286 3 328.8 707.9 708.1 > 20.430 0.000 > 8 0.6828 0.08019 -0.01688 3 328.9 708.0 708.2 > 20.540 0.000 > 18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9 709.3 > 21.620 0.000 > 25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7 710.1 > 22.410 0.000 > 17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9 710.4 > 22.670 0.000 > 2 0.4985 0.08406 2 335.8 712.8 713.0 > 25.270 0.000 > 7 0.4996 0.08516 -0.0023780 3 335.6 714.7 714.9 > 27.240 0.000 > 14 1.0760 -0.02288 0.5151 3 340.8 719.9 720.1 > 32.420 0.000 > 23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9 722.3 > 34.590 0.000 > 5 0.8587 0.5304 2 354.0 731.0 731.1 > 43.440 0.000 > 12 0.8663 -0.0042170 0.5473 3 353.5 732.5 732.8 > 45.070 0.000 > 24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4 739.8 > 52.140 0.000 > 15 942.8000 -0.02909 -0.4689 3 370.5 749.5 749.8 > 62.090 0.000 > 13 915.8000 0.0151200 -0.4556 3 384.7 763.7 764.0 > 76.290 0.000 > 6 900.3000 -0.4478 2 391.8 768.9 769.0 > 81.320 0.000 > 11 1.3530 0.0176300 -0.02957 3 402.3 781.3 781.6 > 93.890 0.000 > 4 1.3940 -0.02630 2 412.3 789.4 789.5 > 101.800 0.000 > 3 1.1010 0.0134300 2 424.4 801.4 801.6 > 113.800 0.000 > 1 1.1550 1 430.3 805.4 805.4 > 117.700 0.000 > >>
Hi I am bit confused what you want to achieve. As I can not reproduce code without your data I just guess. If I understand you want to select from all your data randomly 98 values for 98 animals (one for each animal). I presume your id2 is sorted. One option # make sorted ids id2<-sample(1:5, 100, replace=T) id2<-sort(id2) # how many unique ids len<-rle(id2)$lengths # how many values are from beginning shift.len<-c(0,cumsum(len))[-(length(len)+1)] # get one value from each id samp<-sapply(sapply(split(id2, id2), function(x) 1:length(x)), sample, 1) # just test id2[samp+shift.len] [1] 1 2 3 4 5 The other option is randomise vector of indices ss<-sample(1:100) sort data.frame according those randomised indices and select let say first one sapply(split(daf[ss,], daf[ss,1]), function(x) x[1,]) But i believe that there are even better options. Regards Petr r-help-bounces at r-project.org napsal dne 26.02.2010 14:31:06:> > Hello R users, > > I have been using R for a while now for basic stats but I'm now tryingto> get my head around looping scripts and in some places I am failing! > > I have a data set with c. 1200 data points on 98 individual animals with > data on each row representing a daily measure and I am asking thequestion> "what variables affect the animal's behaviour?" > > the dataset includes these variables for analyses: > > presence of behaviour, absence of behaviour, site, year, rain, air temp,ID,> Day > > Listed below as they appear in the data set: > > BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2, > DAY > > with BEH_T & BEH_F = the response variable for a binomial GLM > > here is the head of the dataset > (NB there are only two years and two sites) > > BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY > [1,] 14 10 1 2007 0 12 10.98750 1 1 > [2,] 37 23 1 2007 0 13 11.47333 1 2 > [3,] 56 22 1 2007 0 14 12.16667 1 3 > [4,] 43 23 1 2007 0 16 10.91515 1 5 > [5,] 62 16 1 2007 0 17 12.81026 1 6 > [6,] 30 20 1 2007 0 19 8.67037 1 8 > > (Sorry the headings are skewed) > > Because I don't want to do too complex a model to start with (justwanting> to learn first with a 'simple' model) I have issues with independence ofthe> data as there are repeats of individuals - i.e. data taken on the sameIDs> on different days. So in order to account for that I have decided torandom> sample one data point for each ID then run the GLM on that data for xnumber> of simulations to see if the explanatory variables are the same/similar > across all models. (This will reduce my data set to 98 data points, butit> is the best way I can see of doing this without doing mixed-effectsmodels,> since not all IDs are seen at both sites in both years). > > I am also using the MuMIn package for running all subsets of your model > > > the code I'm using is: > > > for (S in 1:2){ > Sample.dat<-ALL.R[1,] > for (I in 1:98) { > tmp<-ALL.R[ALL.R$ID2==I,] > max<-dim(tmp)[1] > if (I==1) Sample.dat<-tmp[sample(1:max,1),] else { > Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) > m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~Sample.dat$SITE +> Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + > Sample.dat$MO_AIR_TEMP, family="binomial") > mod<-dredge(m1.R)}}} > > At this point I have two issues if I do it manually then it seems towork> i.e. gives me one output (e.g shown at bottom of post) where I then wantto> take the first line, the model with the best AIC using mod[1,] - noproblem!> > However, letting the code run and for example using print ((mod[1,])) atthe> end it prints out the first line of 98 outputs - so I'm not too surewhat> I've done wrong here, but it appears to be running a model for each ID - > something basic no doubt! > > Ideally, what I want to do is take a random sample of the data then runthe> model get one output for that take the top line (i.e. the best AIC) andsave> this, then run this routine say 100 times, saving that top line everytime,> then having a look at the results and take a model average. Anytime I'vegot> close to this I have issues with overwriting the previous first line ofthe> model selection and I can't seem to identify how to set this loop up > properly. > > Any advice or guidance would be most appreciated, I have tried toexplain my> issues clearly but if more info is required please just ask, > > Many thanks in advance to those of you that took the time to read this! > > Ross > > Ross Culloch > Ph.D. Student > Durham University > UK > > > > > > > > Here is an example of the model selection table from usingMuMIn: > > > Model selection table > (Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AICAICc> delta weight > 30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1687.7> 0.000 0.707 > 32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0690.0> 2.249 0.230 > 26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9694.3> 6.630 0.026 > 31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5696.2> 8.493 0.010 > 22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8696.2> 8.529 0.010 > 21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0696.4> 8.717 0.009 > 29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4698.0> 10.330 0.004 > 28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4698.1> 10.370 0.004 > 20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0702.5> 14.770 0.000 > 10 530.4000 0.07447 -0.2639 3 324.0 703.1703.3> 15.640 0.000 > 27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9704.6> 16.850 0.000 > 19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1705.5> 17.820 0.000 > 16 743.4000 0.4875 -0.3698 3 328.7 707.8708.0> 20.310 0.000 > 9 0.5512 0.06094 0.2286 3 328.8 707.9708.1> 20.430 0.000 > 8 0.6828 0.08019 -0.01688 3 328.9 708.0708.2> 20.540 0.000 > 18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9709.3> 21.620 0.000 > 25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7710.1> 22.410 0.000 > 17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9710.4> 22.670 0.000 > 2 0.4985 0.08406 2 335.8 712.8713.0> 25.270 0.000 > 7 0.4996 0.08516 -0.0023780 3 335.6 714.7714.9> 27.240 0.000 > 14 1.0760 -0.02288 0.5151 3 340.8 719.9720.1> 32.420 0.000 > 23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9722.3> 34.590 0.000 > 5 0.8587 0.5304 2 354.0 731.0731.1> 43.440 0.000 > 12 0.8663 -0.0042170 0.5473 3 353.5 732.5732.8> 45.070 0.000 > 24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4739.8> 52.140 0.000 > 15 942.8000 -0.02909 -0.4689 3 370.5 749.5749.8> 62.090 0.000 > 13 915.8000 0.0151200 -0.4556 3 384.7 763.7764.0> 76.290 0.000 > 6 900.3000 -0.4478 2 391.8 768.9769.0> 81.320 0.000 > 11 1.3530 0.0176300 -0.02957 3 402.3 781.3781.6> 93.890 0.000 > 4 1.3940 -0.02630 2 412.3 789.4789.5> 101.800 0.000 > 3 1.1010 0.0134300 2 424.4 801.4801.6> 113.800 0.000 > 1 1.1550 1 430.3 805.4805.4> 117.700 0.000 > > > > -- > View this message in context:http://n4.nabble.com/Loop-overwrite-and-data-> output-problems-tp1570593p1570593.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.