Hello R users,
I have been using R for a while now for basic stats but I'm now trying to
get my head around looping scripts and in some places I am failing!
I have a data set with c. 1200 data points on 98 individual animals with
data on each row representing a daily measure and I am asking the question
"what variables affect the animal's behaviour?"
the dataset includes these variables for analyses:
presence of behaviour, absence of behaviour, site, year, rain, air temp, ID,
Day
Listed below as they appear in the data set:
BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2,
DAY
with BEH_T & BEH_F = the response variable for a binomial GLM
here is the head of the dataset
(NB there are only two years and two sites)
BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY
[1,] 14 10 1 2007 0 12 10.98750 1 1
[2,] 37 23 1 2007 0 13 11.47333 1 2
[3,] 56 22 1 2007 0 14 12.16667 1 3
[4,] 43 23 1 2007 0 16 10.91515 1 5
[5,] 62 16 1 2007 0 17 12.81026 1 6
[6,] 30 20 1 2007 0 19 8.67037 1 8
(Sorry the headings are skewed)
Because I don't want to do too complex a model to start with (just wanting
to learn first with a 'simple' model) I have issues with independence of
the
data as there are repeats of individuals - i.e. data taken on the same IDs
on different days. So in order to account for that I have decided to random
sample one data point for each ID then run the GLM on that data for x number
of simulations to see if the explanatory variables are the same/similar
across all models. (This will reduce my data set to 98 data points, but it
is the best way I can see of doing this without doing mixed-effects models,
since not all IDs are seen at both sites in both years).
I am also using the MuMIn package for running all subsets of your model
the code I'm using is:
for (S in 1:2){
Sample.dat<-ALL.R[1,]
for (I in 1:98) {
tmp<-ALL.R[ALL.R$ID2==I,]
max<-dim(tmp)[1]
if (I==1) Sample.dat<-tmp[sample(1:max,1),] else {
Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),])
m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE +
Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST +
Sample.dat$MO_AIR_TEMP, family="binomial")
mod<-dredge(m1.R)}}}
At this point I have two issues if I do it manually then it seems to work
i.e. gives me one output (e.g shown at bottom of post) where I then want to
take the first line, the model with the best AIC using mod[1,] - no problem!
However, letting the code run and for example using print ((mod[1,])) at the
end it prints out the first line of 98 outputs - so I'm not too sure what
I've done wrong here, but it appears to be running a model for each ID -
something basic no doubt!
Ideally, what I want to do is take a random sample of the data then run the
model get one output for that take the top line (i.e. the best AIC) and save
this, then run this routine say 100 times, saving that top line every time,
then having a look at the results and take a model average. Anytime I've got
close to this I have issues with overwriting the previous first line of the
model selection and I can't seem to identify how to set this loop up
properly.
Any advice or guidance would be most appreciated, I have tried to explain my
issues clearly but if more info is required please just ask,
Many thanks in advance to those of you that took the time to read this!
Ross
Ross Culloch
Ph.D. Student
Durham University
UK
Here is an example of the model selection table from usingMuMIn:
Model selection table
(Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AIC AICc
delta weight
30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7
0.000 0.707
32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0
2.249 0.230
26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3
6.630 0.026
31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2
8.493 0.010
22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8 696.2
8.529 0.010
21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0 696.4
8.717 0.009
29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4 698.0
10.330 0.004
28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4 698.1
10.370 0.004
20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0 702.5
14.770 0.000
10 530.4000 0.07447 -0.2639 3 324.0 703.1 703.3
15.640 0.000
27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9 704.6
16.850 0.000
19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1 705.5
17.820 0.000
16 743.4000 0.4875 -0.3698 3 328.7 707.8 708.0
20.310 0.000
9 0.5512 0.06094 0.2286 3 328.8 707.9 708.1
20.430 0.000
8 0.6828 0.08019 -0.01688 3 328.9 708.0 708.2
20.540 0.000
18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9 709.3
21.620 0.000
25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7 710.1
22.410 0.000
17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9 710.4
22.670 0.000
2 0.4985 0.08406 2 335.8 712.8 713.0
25.270 0.000
7 0.4996 0.08516 -0.0023780 3 335.6 714.7 714.9
27.240 0.000
14 1.0760 -0.02288 0.5151 3 340.8 719.9 720.1
32.420 0.000
23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9 722.3
34.590 0.000
5 0.8587 0.5304 2 354.0 731.0 731.1
43.440 0.000
12 0.8663 -0.0042170 0.5473 3 353.5 732.5 732.8
45.070 0.000
24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4 739.8
52.140 0.000
15 942.8000 -0.02909 -0.4689 3 370.5 749.5 749.8
62.090 0.000
13 915.8000 0.0151200 -0.4556 3 384.7 763.7 764.0
76.290 0.000
6 900.3000 -0.4478 2 391.8 768.9 769.0
81.320 0.000
11 1.3530 0.0176300 -0.02957 3 402.3 781.3 781.6
93.890 0.000
4 1.3940 -0.02630 2 412.3 789.4 789.5
101.800 0.000
3 1.1010 0.0134300 2 424.4 801.4 801.6
113.800 0.000
1 1.1550 1 430.3 805.4 805.4
117.700 0.000>
--
View this message in context:
http://n4.nabble.com/Loop-overwrite-and-data-output-problems-tp1570593p1570593.html
Sent from the R help mailing list archive at Nabble.com.
Hi,
Since I'm not an expert, I still have problems understanding when it's
not my own work, but I have something that might help you.
if (I==1) Sample.dat<-tmp[sample(1:max,1),] else {
Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),])
This part might not be the best.
I would do something like:
Sample.dat[[I]] <- tmp[sample(1:max, 1),]
That way, you will store your line in the Ith element of the list
"Sample.dat". 5you might have to define it first like: Sample.dat
<-
list() )
You can then convert it to a matrix using: do.call(rbind, Sample.dat)
It might get you started
HTH,
Ivan
Le 2/26/2010 14:31, RCulloch a ?crit :> Hello R users,
>
> I have been using R for a while now for basic stats but I'm now trying
to
> get my head around looping scripts and in some places I am failing!
>
> I have a data set with c. 1200 data points on 98 individual animals with
> data on each row representing a daily measure and I am asking the question
> "what variables affect the animal's behaviour?"
>
> the dataset includes these variables for analyses:
>
> presence of behaviour, absence of behaviour, site, year, rain, air temp,
ID,
> Day
>
> Listed below as they appear in the data set:
>
> BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2,
> DAY
>
> with BEH_T& BEH_F = the response variable for a binomial GLM
>
> here is the head of the dataset
> (NB there are only two years and two sites)
>
> BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY
> [1,] 14 10 1 2007 0 12 10.98750 1 1
> [2,] 37 23 1 2007 0 13 11.47333 1 2
> [3,] 56 22 1 2007 0 14 12.16667 1 3
> [4,] 43 23 1 2007 0 16 10.91515 1 5
> [5,] 62 16 1 2007 0 17 12.81026 1 6
> [6,] 30 20 1 2007 0 19 8.67037 1 8
>
> (Sorry the headings are skewed)
>
> Because I don't want to do too complex a model to start with (just
wanting
> to learn first with a 'simple' model) I have issues with
independence of the
> data as there are repeats of individuals - i.e. data taken on the same IDs
> on different days. So in order to account for that I have decided to random
> sample one data point for each ID then run the GLM on that data for x
number
> of simulations to see if the explanatory variables are the same/similar
> across all models. (This will reduce my data set to 98 data points, but it
> is the best way I can see of doing this without doing mixed-effects models,
> since not all IDs are seen at both sites in both years).
>
> I am also using the MuMIn package for running all subsets of your model
>
>
> the code I'm using is:
>
>
> for (S in 1:2){
> Sample.dat<-ALL.R[1,]
> for (I in 1:98) {
> tmp<-ALL.R[ALL.R$ID2==I,]
> max<-dim(tmp)[1]
> if (I==1) Sample.dat<-tmp[sample(1:max,1),] else {
> Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),])
> m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~ Sample.dat$SITE
+
> Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST +
> Sample.dat$MO_AIR_TEMP, family="binomial")
> mod<-dredge(m1.R)}}}
>
> At this point I have two issues if I do it manually then it seems to work
> i.e. gives me one output (e.g shown at bottom of post) where I then want to
> take the first line, the model with the best AIC using mod[1,] - no
problem!
>
> However, letting the code run and for example using print ((mod[1,])) at
the
> end it prints out the first line of 98 outputs - so I'm not too sure
what
> I've done wrong here, but it appears to be running a model for each ID
-
> something basic no doubt!
>
> Ideally, what I want to do is take a random sample of the data then run the
> model get one output for that take the top line (i.e. the best AIC) and
save
> this, then run this routine say 100 times, saving that top line every time,
> then having a look at the results and take a model average. Anytime
I've got
> close to this I have issues with overwriting the previous first line of the
> model selection and I can't seem to identify how to set this loop up
> properly.
>
> Any advice or guidance would be most appreciated, I have tried to explain
my
> issues clearly but if more info is required please just ask,
>
> Many thanks in advance to those of you that took the time to read this!
>
> Ross
>
> Ross Culloch
> Ph.D. Student
> Durham University
> UK
>
>
>
>
>
>
>
> Here is an example of the model selection table from usingMuMIn:
>
>
> Model selection table
> (Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AIC AICc
> delta weight
> 30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1 687.7
> 0.000 0.707
> 32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0 690.0
> 2.249 0.230
> 26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9 694.3
> 6.630 0.026
> 31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5 696.2
> 8.493 0.010
> 22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8 696.2
> 8.529 0.010
> 21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0 696.4
> 8.717 0.009
> 29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4 698.0
> 10.330 0.004
> 28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4 698.1
> 10.370 0.004
> 20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0 702.5
> 14.770 0.000
> 10 530.4000 0.07447 -0.2639 3 324.0 703.1 703.3
> 15.640 0.000
> 27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9 704.6
> 16.850 0.000
> 19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1 705.5
> 17.820 0.000
> 16 743.4000 0.4875 -0.3698 3 328.7 707.8 708.0
> 20.310 0.000
> 9 0.5512 0.06094 0.2286 3 328.8 707.9 708.1
> 20.430 0.000
> 8 0.6828 0.08019 -0.01688 3 328.9 708.0 708.2
> 20.540 0.000
> 18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9 709.3
> 21.620 0.000
> 25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7 710.1
> 22.410 0.000
> 17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9 710.4
> 22.670 0.000
> 2 0.4985 0.08406 2 335.8 712.8 713.0
> 25.270 0.000
> 7 0.4996 0.08516 -0.0023780 3 335.6 714.7 714.9
> 27.240 0.000
> 14 1.0760 -0.02288 0.5151 3 340.8 719.9 720.1
> 32.420 0.000
> 23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9 722.3
> 34.590 0.000
> 5 0.8587 0.5304 2 354.0 731.0 731.1
> 43.440 0.000
> 12 0.8663 -0.0042170 0.5473 3 353.5 732.5 732.8
> 45.070 0.000
> 24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4 739.8
> 52.140 0.000
> 15 942.8000 -0.02909 -0.4689 3 370.5 749.5 749.8
> 62.090 0.000
> 13 915.8000 0.0151200 -0.4556 3 384.7 763.7 764.0
> 76.290 0.000
> 6 900.3000 -0.4478 2 391.8 768.9 769.0
> 81.320 0.000
> 11 1.3530 0.0176300 -0.02957 3 402.3 781.3 781.6
> 93.890 0.000
> 4 1.3940 -0.02630 2 412.3 789.4 789.5
> 101.800 0.000
> 3 1.1010 0.0134300 2 424.4 801.4 801.6
> 113.800 0.000
> 1 1.1550 1 430.3 805.4 805.4
> 117.700 0.000
>
>>
Hi I am bit confused what you want to achieve. As I can not reproduce code without your data I just guess. If I understand you want to select from all your data randomly 98 values for 98 animals (one for each animal). I presume your id2 is sorted. One option # make sorted ids id2<-sample(1:5, 100, replace=T) id2<-sort(id2) # how many unique ids len<-rle(id2)$lengths # how many values are from beginning shift.len<-c(0,cumsum(len))[-(length(len)+1)] # get one value from each id samp<-sapply(sapply(split(id2, id2), function(x) 1:length(x)), sample, 1) # just test id2[samp+shift.len] [1] 1 2 3 4 5 The other option is randomise vector of indices ss<-sample(1:100) sort data.frame according those randomised indices and select let say first one sapply(split(daf[ss,], daf[ss,1]), function(x) x[1,]) But i believe that there are even better options. Regards Petr r-help-bounces at r-project.org napsal dne 26.02.2010 14:31:06:> > Hello R users, > > I have been using R for a while now for basic stats but I'm now tryingto> get my head around looping scripts and in some places I am failing! > > I have a data set with c. 1200 data points on 98 individual animals with > data on each row representing a daily measure and I am asking thequestion> "what variables affect the animal's behaviour?" > > the dataset includes these variables for analyses: > > presence of behaviour, absence of behaviour, site, year, rain, air temp,ID,> Day > > Listed below as they appear in the data set: > > BEH_T, BEH_F, SITE, YEAR, PRECIP_MM_DAY, PUP_AGE_EST, MO_AIR_TEMP, ID2, > DAY > > with BEH_T & BEH_F = the response variable for a binomial GLM > > here is the head of the dataset > (NB there are only two years and two sites) > > BEH_T BEH_F SITE YEAR PRECIP_MM_DAY PUP_AGE_EST MO_AIR_TEMP ID2 DAY > [1,] 14 10 1 2007 0 12 10.98750 1 1 > [2,] 37 23 1 2007 0 13 11.47333 1 2 > [3,] 56 22 1 2007 0 14 12.16667 1 3 > [4,] 43 23 1 2007 0 16 10.91515 1 5 > [5,] 62 16 1 2007 0 17 12.81026 1 6 > [6,] 30 20 1 2007 0 19 8.67037 1 8 > > (Sorry the headings are skewed) > > Because I don't want to do too complex a model to start with (justwanting> to learn first with a 'simple' model) I have issues with independence ofthe> data as there are repeats of individuals - i.e. data taken on the sameIDs> on different days. So in order to account for that I have decided torandom> sample one data point for each ID then run the GLM on that data for xnumber> of simulations to see if the explanatory variables are the same/similar > across all models. (This will reduce my data set to 98 data points, butit> is the best way I can see of doing this without doing mixed-effectsmodels,> since not all IDs are seen at both sites in both years). > > I am also using the MuMIn package for running all subsets of your model > > > the code I'm using is: > > > for (S in 1:2){ > Sample.dat<-ALL.R[1,] > for (I in 1:98) { > tmp<-ALL.R[ALL.R$ID2==I,] > max<-dim(tmp)[1] > if (I==1) Sample.dat<-tmp[sample(1:max,1),] else { > Sample.dat<-rbind(Sample.dat,tmp[sample(1:max,1),]) > m1.R<-glm(cbind(Sample.dat$BEH_T, Sample.dat$BEH_F) ~Sample.dat$SITE +> Sample.dat$YEAR + Sample.dat$PRECIP_MM_DAY + Sample.dat$PUP_AGE_EST + > Sample.dat$MO_AIR_TEMP, family="binomial") > mod<-dredge(m1.R)}}} > > At this point I have two issues if I do it manually then it seems towork> i.e. gives me one output (e.g shown at bottom of post) where I then wantto> take the first line, the model with the best AIC using mod[1,] - noproblem!> > However, letting the code run and for example using print ((mod[1,])) atthe> end it prints out the first line of 98 outputs - so I'm not too surewhat> I've done wrong here, but it appears to be running a model for each ID - > something basic no doubt! > > Ideally, what I want to do is take a random sample of the data then runthe> model get one output for that take the top line (i.e. the best AIC) andsave> this, then run this routine say 100 times, saving that top line everytime,> then having a look at the results and take a model average. Anytime I'vegot> close to this I have issues with overwriting the previous first line ofthe> model selection and I can't seem to identify how to set this loop up > properly. > > Any advice or guidance would be most appreciated, I have tried toexplain my> issues clearly but if more info is required please just ask, > > Many thanks in advance to those of you that took the time to read this! > > Ross > > Ross Culloch > Ph.D. Student > Durham University > UK > > > > > > > > Here is an example of the model selection table from usingMuMIn: > > > Model selection table > (Intr) S.$MO_ S.$PRE S.$PUP S.$SIT S.$YEA k Dev. AICAICc> delta weight > 30 645.8000 0.03841 -0.02148 0.2882 -0.3212 5 304.0 687.1687.7> 0.000 0.707 > 32 648.8000 0.03811 0.0009399 -0.02172 0.2857 -0.3227 6 304.0 689.0690.0> 2.249 0.230 > 26 785.1000 -0.02543 0.4678 -0.3905 4 312.8 693.9694.3> 6.630 0.026 > 31 794.2000 0.0037260 -0.02627 0.4519 -0.3950 5 312.5 695.5696.2> 8.493 0.010 > 22 582.7000 0.04703 0.2641 -0.2899 4 314.7 695.8696.2> 8.529 0.010 > 21 582.8000 0.06893 -0.01967 -0.2899 4 314.9 696.0696.4> 8.717 0.009 > 29 573.1000 0.04787 -0.0039980 0.2762 -0.2851 5 314.3 697.4698.0> 10.330 0.004 > 28 600.1000 0.06612 0.0046710 -0.02092 -0.2985 5 314.4 697.4698.1> 10.370 0.004 > 20 0.7526 0.05509 -0.01808 0.2450 4 321.0 702.0702.5> 14.770 0.000 > 10 530.4000 0.07447 -0.2639 3 324.0 703.1703.3> 15.640 0.000 > 27 0.7493 0.05556 -0.0022820 -0.01753 0.2519 5 320.8 703.9704.6> 16.850 0.000 > 19 530.0000 0.07455 -0.0001489 -0.2637 4 324.0 705.1705.5> 17.820 0.000 > 16 743.4000 0.4875 -0.3698 3 328.7 707.8708.0> 20.310 0.000 > 9 0.5512 0.06094 0.2286 3 328.8 707.9708.1> 20.430 0.000 > 8 0.6828 0.08019 -0.01688 3 328.9 708.0708.2> 20.540 0.000 > 18 0.5584 0.06173 -0.0059840 0.2481 4 327.8 708.9709.3> 21.620 0.000 > 25 739.9000 -0.0016930 0.4944 -0.3681 4 328.6 709.7710.1> 22.410 0.000 > 17 0.6856 0.07953 0.0012680 -0.01720 4 328.9 709.9710.4> 22.670 0.000 > 2 0.4985 0.08406 2 335.8 712.8713.0> 25.270 0.000 > 7 0.4996 0.08516 -0.0023780 3 335.6 714.7714.9> 27.240 0.000 > 14 1.0760 -0.02288 0.5151 3 340.8 719.9720.1> 32.420 0.000 > 23 1.0760 0.0003492 -0.02296 0.5136 4 340.8 721.9722.3> 34.590 0.000 > 5 0.8587 0.5304 2 354.0 731.0731.1> 43.440 0.000 > 12 0.8663 -0.0042170 0.5473 3 353.5 732.5732.8> 45.070 0.000 > 24 967.8000 0.0198500 -0.03274 -0.4813 4 358.3 739.4739.8> 52.140 0.000 > 15 942.8000 -0.02909 -0.4689 3 370.5 749.5749.8> 62.090 0.000 > 13 915.8000 0.0151200 -0.4556 3 384.7 763.7764.0> 76.290 0.000 > 6 900.3000 -0.4478 2 391.8 768.9769.0> 81.320 0.000 > 11 1.3530 0.0176300 -0.02957 3 402.3 781.3781.6> 93.890 0.000 > 4 1.3940 -0.02630 2 412.3 789.4789.5> 101.800 0.000 > 3 1.1010 0.0134300 2 424.4 801.4801.6> 113.800 0.000 > 1 1.1550 1 430.3 805.4805.4> 117.700 0.000 > > > > -- > View this message in context:http://n4.nabble.com/Loop-overwrite-and-data-> output-problems-tp1570593p1570593.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.