thr3ads.net - R help - [R] Looping column names [Jan 2012]

anjulka

2012-Jan-27 15:57 UTC

[R] Looping column names

Hello,

I have a data file with 36 columns that I want to run loess on. I then want
to divide the original data by the fitted data, and finally divide the columns
that end in A and B by those that end in C.  However, I have something wrong
in my first step and am completely stuck on the third.  Could someone help me
please?

Here's a snippet of the data file:
Order	Target	GC	AA_001_A	AA_001_B	AA_001_C
1	a	0.584507042	422.94	302.32	412.19
2	b	0.630434783	193.44	182.88	224.96
3	c	0.649350649	132.67	116	136.12
4	d	0.635359116	306.78	203.68	306.98
5	e	0.609271523	276.32	214.73	307.03
6	f	0.626373626	333.93	249.28	421.97
7	g	0.618834081	216.22	200.94	236.27
All columns have 3722 rows.  The columns repeat in that pattern out to
AA_012_C.  

This is the script that I've tried:

gc<-read.delim("AA1_3_GC.txt")
gc2<-gc[,-c(1:2)]
res=cbind()
for(i in colnames(gc[,-1])){
                temp<-loess(i~GC,gc2)
                temp2<-predict(temp)
                if (length(res)==0){
                                res=temp2
                }else{ res=cbind(res,temp2)
                }
}

But I keep getting this error:
Error in model.frame.default(formula = i ~ GC, data = gc2) : 
  variable lengths differ (found for 'GC')

If I manually type in a name, then it works just fine, but obviously I don't
want to do that for 36 columns.  (Or 72 for the next project.)   Where am I
going wrong and how do I fix this?

For the second step (dividing column after I divide gc2/res), I really am
unsure of where to even start.  I would guess that it would be something
along the lines of 
for(i in colnames(gc[,-1])){
	res[i]/res[i+2]}
But that would only get me A/C, then B/D, etc.  I've spent the last hour
searching for this, but I'm clearly not using the right terms.  Could
someone even point me in the right direction please?  

Any help/suggestions you can give for either/both parts would be really
appreciated.

Thanks,
Rose


--
View this message in context:
http://r.789695.n4.nabble.com/Looping-column-names-tp4333870p4333870.html
Sent from the R help mailing list archive at Nabble.com.

Rui Barradas

2012-Jan-28 05:14 UTC

[R] Looping column names

Hello,
> But I keep getting this error:
> Error in model.frame.default(formula = i ~ GC, data = gc2) :
>   variable lengths differ (found for 'GC')
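
Simple: you are using a variable's name, not the variable itself. Inside the
loop, i is a length-1 character string, so the formula i ~ GC compares one
value against the 3722 values of GC, hence "variable lengths differ".
Note also that colnames(gc[,-1]) still includes Target and GC; loop over the
columns of gc2 after GC instead.
Your code corrected should be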


res <- NULL
for(i in colnames(gc2[,-1])){
	temp <- loess(gc2[, i]~GC,gc2)  # fit the vector, NOT its name
	temp2 <- predict(temp)
	res <- cbind(res, temp2)
}
colnames(res) <- colnames(gc2[,-1])
res

But even better, without the loop,

apply(gc2[,-1], 2, function(x) predict(loess(x~GC, data=gc2)))
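
If you would rather keep working with the column names, another option is to
build the formula from the name. The following is only a sketch on made-up
data: the data frame below is a synthetic stand-in for gc2, and fit_one,
res_by_name and res_apply are names I have invented for the illustration.
reformulate() is a base R function that turns strings into a formula.

```r
# Synthetic stand-in for the poster's gc2 (values are made up).
set.seed(1)
n <- 50
gc2 <- data.frame(
  GC       = runif(n, 0.4, 0.7),
  AA_001_A = rnorm(n, 300, 50),
  AA_001_B = rnorm(n, 250, 50),
  AA_001_C = rnorm(n, 320, 50)
)

cols <- colnames(gc2)[-1]   # every column except GC

# Build the formula from the column name, e.g. AA_001_A ~ GC,
# then return the fitted loess values for that column.
fit_one <- function(nm) {
  predict(loess(reformulate("GC", response = nm), data = gc2))
}
res_by_name <- sapply(cols, fit_one)   # one column of fitted values per name

# The apply() one-liner from above, for comparison.
res_apply <- apply(gc2[, -1], 2, function(x) predict(loess(x ~ GC, data = gc2)))
```

Both give a matrix with one row per observation and one column per data
column, and sapply() keeps the column names, so there is no need to set
colnames afterwards.
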
> For the second step (dividing column after I divide gc2/res), I really am
> unsure of where to even start.  I would guess that it would be something
> along the lines of
> for(i in colnames(gc[,-1])){
>         res[i]/res[i+2]}
> But that would only get me A/C, then B/D, etc. 
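
Create indexes on the columns. Since the columns repeat in the order A, B, C,
every third column starting at 1, 2 and 3 picks out the A, B and C columns: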


res2 <- gc2[, -1]/res
n <- ncol(res2)

ainx <- seq(1, n, 3)
binx <- seq(2, n, 3)
cinx <- seq(3, n, 3)

res2[, ainx]/res2[, cinx]
res2[, binx]/res2[, cinx]
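
The positional indexes rely on the columns always coming in A, B, C order. If
that might not hold, a variation is to pair the columns by their name suffix.
This is only a sketch on a toy data frame (the values are made up, and
a_cols, b_cols, c_cols, a_over_c and b_over_c are names chosen for the
illustration); it assumes every _A and _B column has a matching _C column.

```r
# Toy stand-in for res2 (values are made up): two samples, three columns each.
res2 <- data.frame(
  AA_001_A = c(2, 4),   AA_001_B = c(3, 6),  AA_001_C = c(1, 2),
  AA_002_A = c(10, 20), AA_002_B = c(5, 10), AA_002_C = c(5, 10)
)

# Select columns by suffix instead of by position.
a_cols <- grep("_A$", names(res2), value = TRUE)
b_cols <- grep("_B$", names(res2), value = TRUE)
c_cols <- sub("_A$", "_C", a_cols)   # the C column paired with each A column

# Guard against a missing or mismatched C column before dividing.
stopifnot(all(c_cols %in% names(res2)),
          identical(sub("_B$", "_C", b_cols), c_cols))

a_over_c <- res2[, a_cols] / res2[, c_cols]   # columns keep the _A names
b_over_c <- res2[, b_cols] / res2[, c_cols]   # columns keep the _B names
```

The division is done column by column in the order given, so building c_cols
from a_cols is what keeps each A (or B) column lined up with its own C column.
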

One final note.
You've named your data.frame 'gc', but gc() is also a function in R (it
triggers garbage collection), so it's a bad choice.
I've renamed it 'gc1'; the code above only uses 'gc2', so it is unaffected.

Hope this helps,

Rui Barradas



--
View this message in context:
http://r.789695.n4.nabble.com/Looping-column-names-tp4334211p4335454.html
Sent from the R help mailing list archive at Nabble.com.
