thr3ads.net - R help - [R] help for an R automated procedures [Feb 2013]

Gustavo Vieira

2013-Feb-28 09:52 UTC

[R] help for an R automated procedures

Dear list, I would like to post the following question to r-help on Nabble
(thanks in advance for your attention, Gustavo Vieira):
Hi there.
I have a data set on hand with 5,220 cases and I'd like to automate some
procedures (but I have almost no programming knowledge). The data has some
continuous variables that are grouped by 2 others: the name of the species and
the locality where they were collected. So, the samples are defined as 'each
species at each locality'. For every sample I'd like to do multiple
imputation (when applicable), test for the presence of outliers, standardize
the variables, correct some species abundances, save individual samples to
tab-delimited text files, and assemble the individual samples (now without
NAs and outliers, with corrected abundances, and with the new standardized
variables) into a single data set. That task is pretty complex for me, since
my programming knowledge is poor (and my free time to learn R programming is
scarce). Could someone help me with that (I could provide the data set
and the script I have written to do that, sample by sample [ouch!])?
Thanks in advance for your attention and all the best (ghcv@hotmail.com).

[Below is an example of the code I've used to accomplish my goals, sample
by sample, which illustrates the complexity of the procedure:

#Subsetting the data (v1-v11 are continuous "predictors"): species 1 at locality 1
#(all data [5520 cases] are in a data frame called 'morfo')
sp1.loc1<-morfo[which(spps=="sp1" & taxoc=="loc1"),] #getting only the observations of sp1 (species 1) at loc1 (locality 1)
str(sp1.loc1) #abundance -> 19 cases and the abundance variable ('abund') says 18…
sp1.loc1$abund<-rep(19,19)
summary(sp1.loc1) #missing values present; abundance for sp1 at loc1 corrected
attach(sp1.loc1)

#Dealing with NAs:
install.packages("mice", dependencies = T) #ok (R at: home & work)
library(mice)
imp <- mice(sp1.loc1)
sp1.loc1 <- complete(imp)
summary(sp1.loc1) #just checking... No more NAs!
attach(sp1.loc1)


#Detecting univariate outliers
z.crit <- qnorm(0.9999)

subset(sp1.loc1, select = id, subset = abs(scale(v1)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit)
morfo[47,6]
sort(v2[taxoc=="loc1"]) #the nearest observation close to 32.00 is 25.10
sp1.loc1[,6][sp1.loc1[,6]==32.00]<-25.10
subset(sp1.loc1, select = id, subset = abs(scale(v2)) > z.crit) #Rechecking for outliers (now, it's ok)

subset(sp1.loc1, select = id, subset = abs(scale(v3)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v4)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v5)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v6)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v7)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v8)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v9)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v10)) > z.crit)

subset(sp1.loc1, select = id, subset = abs(scale(v11)) > z.crit)

#Standardizing variables
v1.std<-with(sp1.loc1,(scale(v1)))
v1.pad<-v1.std[,1]

v2.std<-with(sp1.loc1,(scale(v2)))
v2.pad<-v2.std[,1]

v3.std<-with(sp1.loc1,(scale(v3)))
v3.pad<-v3.std[,1]

v4.std<-with(sp1.loc1,(scale(v4)))
v4.pad<-v4.std[,1]

v5.std<-with(sp1.loc1,(scale(v5)))
v5.pad<-v5.std[,1]

v6.std<-with(sp1.loc1,(scale(v6)))
v6.pad<-v6.std[,1]

v7.std<-with(sp1.loc1,(scale(v7)))
v7.pad<-v7.std[,1]

v8.std<-with(sp1.loc1,(scale(v8)))
v8.pad<-v8.std[,1]

v9.std<-with(sp1.loc1,(scale(v9)))
v9.pad<-v9.std[,1]

v10.std<-with(sp1.loc1,(scale(v10)))
v10.pad<-v10.std[,1]

v11.std<-with(sp1.loc1,(scale(v11)))
v11.pad<-v11.std[,1]


#Joining the new standardized variables to the sp1.loc1 data set

sp1.loc1<-data.frame(sp1.loc1,v1.pad,v2.pad,v3.pad,v4.pad,v5.pad,v6.pad,v7.pad,v8.pad,v9.pad,v10.pad,v11.pad)

attach(sp1.loc1)

write.table(sp1.loc1,"sp1.at.loc1.txt",quote=F,row.names=F,
col.names=T,sep="\t")

detach(sp1.loc1)

#Subsetting the data (v1-v11 are continuous "predictors"): species 2 at locality 1...]
--

"Time will tell"
--


PIKAL Petr

2013-Feb-28 11:09 UTC

[R] help for an R automated procedures

Hi

This is exactly what

fortune("surgery")  # from the 'fortunes' package

is about.

Anyway, you can save yourself a lot of headaches if you start using lists for your
objects.

Lists are easy to use in loops:

for (i in seq_along(some.other.list)) {
  some.list[[i]] <- some.function(some.other.list[[i]])
}
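
As a rough sketch of how the list idea could apply to the question above: split() can cut the full data frame into one list element per species-by-locality sample, and a loop can then visit each element in turn. The toy data below are made up for illustration; only the names 'morfo', 'spps', 'taxoc', 'id', 'v1' and 'v2' come from the original post, and 'samples' and 'n.cases' are hypothetical names.

```r
# Toy stand-in for 'morfo' (values are invented; column names follow the post)
set.seed(1)
morfo <- data.frame(
  id    = 1:12,
  spps  = rep(c("sp1", "sp2"), each = 6),
  taxoc = rep(c("loc1", "loc2"), times = 6),
  v1    = rnorm(12, mean = 20, sd = 3),
  v2    = rnorm(12, mean = 5,  sd = 1)
)

# One list element per species-by-locality sample;
# drop = TRUE discards combinations that never occur in the data
samples <- split(morfo, list(morfo$spps, morfo$taxoc), drop = TRUE)

# Loop over the list; [[i]] extracts or assigns a single element
n.cases <- vector("list", length(samples))
names(n.cases) <- names(samples)
for (i in seq_along(samples)) {
  n.cases[[i]] <- nrow(samples[[i]])
}

print(names(samples))
print(unlist(n.cases))
```

Each element of 'samples' is an ordinary data frame, so anything that was done by hand on sp1.loc1 can be done on samples[[i]] inside the loop.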

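The lapply/sapply functions can also be useful:

sapply(sp1.loc1, scale)

will give you a matrix of scaled columns, provided every column of sp1.loc1 is
numeric (drop the species and locality columns first).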
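
Putting the pieces together, the per-sample steps from the question could be wrapped in one function and applied to every sample with lapply(). This is only a sketch under assumptions: 'process.sample', 'n.outliers', 'vars' and 'out.dir' are hypothetical names, the toy data are invented, the abundance correction assumes 'abund' should equal the number of cases in the sample (as in the sp1.loc1 example), outliers are only counted rather than replaced, and the imputation step is left as a comment because it needs the mice package.

```r
# Toy stand-in for 'morfo' (values are invented; column names follow the post)
set.seed(1)
morfo <- data.frame(
  id    = 1:12,
  spps  = rep(c("sp1", "sp2"), each = 6),
  taxoc = rep(c("loc1", "loc2"), times = 6),
  abund = 2,                      # deliberately wrong, corrected below
  v1    = rnorm(12, mean = 20, sd = 3),
  v2    = rnorm(12, mean = 5,  sd = 1)
)

# Hypothetical helper: process one species-by-locality sample
process.sample <- function(d, vars, z.crit = qnorm(0.9999)) {
  # (imputation would go here, e.g. with mice, as in the original script)
  d$abund <- nrow(d)                    # assumption: abundance = number of cases
  z <- scale(d[vars])                   # matrix of standardized columns
  colnames(z) <- paste0(vars, ".pad")   # same naming as v1.pad ... v11.pad
  # count, per case, how many variables exceed the critical z value
  d$n.outliers <- rowSums(abs(z) > z.crit, na.rm = TRUE)
  cbind(d, z)
}

vars      <- c("v1", "v2")              # would be paste0("v", 1:11) for the real data
samples   <- split(morfo, list(morfo$spps, morfo$taxoc), drop = TRUE)
processed <- lapply(samples, process.sample, vars = vars)

# Save each sample to its own tab-delimited file (temporary directory here)
out.dir <- tempdir()
for (nm in names(processed)) {
  write.table(processed[[nm]], file.path(out.dir, paste0(nm, ".txt")),
              quote = FALSE, row.names = FALSE, col.names = TRUE, sep = "\t")
}

# Assemble all processed samples into a single data set
all.samples <- do.call(rbind, processed)
rownames(all.samples) <- NULL
str(all.samples)
```

Keeping the whole procedure in one function means a change to any step (for example a different outlier rule) is made once rather than once per sample.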


Regards
Petr

