Hi, I have a matrix with column names starting with a character in [0-9]. After some matrix operations (e.g. copy to another matrix), R seems to add a character 'X' in front of the column name. Is this a normal default behaviour of R? Why has it got this behaviour? Can it be changed? What would be the side effect? Thank you. Regards, Gilbert [[alternative HTML version deleted]]
This normally happens to me when I read in a table where the rownames
will be appended by an "X". Read help(make.names) for more
information.
Remember that R is primarily a statistical software and thus likes
colnames classes to be characters.
mat1 <- matrix( 1:12, nc=3, dimnames=list(NULL, c(0,1,2)) )
mat1
0 1 2
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
colnames(mat1)
[1] "0" "1" "2"
is.character( colnames(mat1) )
[1] TRUE
However I am not able to reproduce what your problem
mat2 <- matrix( 101:108, nc=2, dimnames=list(NULL, c("A",
"B")) )
mat2
A B
[1,] 101 105
[2,] 102 106
[3,] 103 107
[4,] 104 108
cbind(mat1, mat2)
0 1 2 A B
[1,] 1 5 9 101 105
[2,] 2 6 10 102 106
[3,] 3 7 11 103 107
[4,] 4 8 12 104 108
I tried other operation such as mat1[ , 1:2] + mat2 and
mat1[ ,1] <- mat2[ ,2] but it does not add a preceding "X".
Can you give a reproducible example please ?
If you want to get rid of the preceding "X", try
colnames( mat1 ) <- c("X0", "X1", "X2")
colnames( mat1 ) <- gsub("^X", "", colnames(mat1))
Why do want to do this anyway ?
Regards, Adai
On Mon, 2005-07-18 at 16:11 +0100, Gilbert Wu wrote:> Hi,
>
> I have a matrix with column names starting with a character in [0-9]. After
some matrix operations (e.g. copy to another matrix), R seems to add a character
'X' in front of the column name. Is this a normal default behaviour of
R? Why has it got this behaviour? Can it be changed? What would be the side
effect?
>
> Thank you.
>
> Regards,
>
> Gilbert
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>
Hi Adai, Many Thanks for the examples. I work for a financial institution. We are exploring R as a tool to implement our portfolio optimization strategies. Hence, R is still a new language to us. The script I wrote tried to make a returns matrix from the daily return indices extracted from a SQL database. Please find below the output that produces the 'X' prefix in the colnames. The reason to preserve the column names is that they are stock identifiers which are to be used by other sub systems rather than R. I would welcome any suggestion to improve the script. Regards, Gilbert> "p.RIs2Returns" <-+ function (RIm) + { + x<-RIm[1:(nrow(RIm)-1), 1:ncol(RIm)] + y<-RIm[2:nrow(RIm), 1:ncol(RIm)] + RReturns <- (y/x -1) + RReturns + }> > > channel<-odbcConnect("ourSQLDB") > result<-sqlQuery(channel,paste("select * from equityRIs;")) > odbcClose(channel) > resultstockid sdate dbPrice 1 899188 20050713 7.59500 2 899188 20050714 7.60500 3 899188 20050715 7.48000 4 899188 20050718 7.41500 5 902232 20050713 10.97000 6 902232 20050714 10.94000 7 902232 20050715 10.99000 8 902232 20050718 11.05000 9 901714 20050713 17.96999 10 901714 20050714 18.00999 11 901714 20050715 17.64999 12 901714 20050718 17.64000 13 28176U 20050713 5.19250 14 28176U 20050714 5.25000 15 28176U 20050715 5.25000 16 28176U 20050718 5.22500 17 15322M 20050713 11.44000 18 15322M 20050714 11.50000 19 15322M 20050715 11.33000 20 15322M 20050718 11.27000> r1<-reshape(result, timevar="stockid", idvar="sdate", direction="wide") > r1sdate dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U dbPrice.15322M 1 20050713 7.595 10.97 17.96999 5.1925 11.44 2 20050714 7.605 10.94 18.00999 5.2500 11.50 3 20050715 7.480 10.99 17.64999 5.2500 11.33 4 20050718 7.415 11.05 17.64000 5.2250 11.27> #Set sdate as the rownames > rownames(r1) <-as.character(r1[1:nrow(r1),1:1]) > #Get rid of the first column > r1 <- r1[1:nrow(r1),2:ncol(r1)] > r1dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U dbPrice.15322M 20050713 7.595 10.97 17.96999 5.1925 11.44 20050714 7.605 10.94 18.00999 5.2500 11.50 20050715 7.480 10.99 17.64999 5.2500 11.33 20050718 7.415 11.05 17.64000 5.2250 11.27> colnames(r1) <- as.character(sub("[[:alnum:]]*\\.","", colnames(r1))) > r1899188 902232 901714 28176U 15322M 20050713 7.595 10.97 17.96999 5.1925 11.44 20050714 7.605 10.94 18.00999 5.2500 11.50 20050715 7.480 10.99 17.64999 5.2500 11.33 20050718 7.415 11.05 17.64000 5.2250 11.27> RRs<-p.RIs2Returns(r1) > RRsX899188 X902232 X901714 X28176U X15322M 20050714 0.001316656 -0.002734731 0.002225933 0.011073664 0.005244755 20050715 -0.016436555 0.004570384 -0.019988906 0.000000000 -0.014782609 20050718 -0.008689840 0.005459509 -0.000566006 -0.004761905 -0.005295675>
Hi Adai,
Thank you very much for your suggestions. Your optimized function would come in
very handy cause I will need to generate a matrix of size around 2250 * 1000.
Regards,
Gilbert
-----Original Message-----
From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk]
Sent: 19 July 2005 12:20
To: Gilbert Wu
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] colnames
First, your problem could be boiled down to the following example. See
how the colnames of the two outputs vary.
df <- cbind.data.frame( "100"=1:2, "200"=3:4 )
df/df
X100 X200
1 1 1
2 1 1
m <- as.matrix( df ) # coerce to matrix class
m/m
100 200
1 1 1
2 1 1
It appears that whenever R has to create a new dataframe automatically,
it tries to get nice colnames. See help(data.frame). I am not exactly
sure why this behaviour is different when creating a matrix. But I do
not think this is a major problem for most people. If you coerce your
input to matrix, the problem goes away.
Next, note the following points :
a) "mat[ 1:3, 1:ncol(mat) ]" is equivalent to simply "mat[ 1:3,
]".
b) "mat[ 2:nrow(mat), ]" is equivalent to simply "mat[ -1,
]"
See help(subset) for more information.
Using the points above, we can simplify your function as
p.RIs2Returns <- function (mat){
mat <- as.matrix(mat)
x <- mat[ -nrow(mat), ]
y <- mat[ -1, ]
return( y/x -1 )
}
If your data contains only numerical data, it is probably good idea to
work with matrices as matrix operations are faster.
Finally, we can shorten your function. You can use the diff (which works
column-wise if input is a matrix) and apply function if you know that
y/x = exp(log(y/x)) = exp( log(y) - log(x) )
which could be coded in R as
exp( diff( log(r1) ) )
and then subtract 1 from above to get your returns.
Regards, Adai
On Tue, 2005-07-19 at 09:17 +0100, Gilbert Wu wrote:> Hi Adai,
>
> Many Thanks for the examples.
>
> I work for a financial institution. We are exploring R as a tool to
implement our portfolio optimization strategies. Hence, R is still a new
language to us.
>
> The script I wrote tried to make a returns matrix from the daily return
indices extracted from a SQL database. Please find below the output that
produces the 'X' prefix in the colnames. The reason to preserve the
column names is that they are stock identifiers which are to be used by other
sub systems rather than R.
>
> I would welcome any suggestion to improve the script.
>
>
> Regards,
>
> Gilbert
>
> > "p.RIs2Returns" <-
> + function (RIm)
> + {
> + x<-RIm[1:(nrow(RIm)-1), 1:ncol(RIm)]
> + y<-RIm[2:nrow(RIm), 1:ncol(RIm)]
> + RReturns <- (y/x -1)
> + RReturns
> + }
> >
> >
> > channel<-odbcConnect("ourSQLDB")
> > result<-sqlQuery(channel,paste("select * from
equityRIs;"))
> > odbcClose(channel)
> > result
> stockid sdate dbPrice
> 1 899188 20050713 7.59500
> 2 899188 20050714 7.60500
> 3 899188 20050715 7.48000
> 4 899188 20050718 7.41500
> 5 902232 20050713 10.97000
> 6 902232 20050714 10.94000
> 7 902232 20050715 10.99000
> 8 902232 20050718 11.05000
> 9 901714 20050713 17.96999
> 10 901714 20050714 18.00999
> 11 901714 20050715 17.64999
> 12 901714 20050718 17.64000
> 13 28176U 20050713 5.19250
> 14 28176U 20050714 5.25000
> 15 28176U 20050715 5.25000
> 16 28176U 20050718 5.22500
> 17 15322M 20050713 11.44000
> 18 15322M 20050714 11.50000
> 19 15322M 20050715 11.33000
> 20 15322M 20050718 11.27000
> > r1<-reshape(result, timevar="stockid",
idvar="sdate", direction="wide")
> > r1
> sdate dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U
dbPrice.15322M
> 1 20050713 7.595 10.97 17.96999 5.1925
11.44
> 2 20050714 7.605 10.94 18.00999 5.2500
11.50
> 3 20050715 7.480 10.99 17.64999 5.2500
11.33
> 4 20050718 7.415 11.05 17.64000 5.2250
11.27
> > #Set sdate as the rownames
> > rownames(r1) <-as.character(r1[1:nrow(r1),1:1])
> > #Get rid of the first column
> > r1 <- r1[1:nrow(r1),2:ncol(r1)]
> > r1
> dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U
dbPrice.15322M
> 20050713 7.595 10.97 17.96999 5.1925
11.44
> 20050714 7.605 10.94 18.00999 5.2500
11.50
> 20050715 7.480 10.99 17.64999 5.2500
11.33
> 20050718 7.415 11.05 17.64000 5.2250
11.27
> > colnames(r1) <-
as.character(sub("[[:alnum:]]*\\.","", colnames(r1)))
> > r1
> 899188 902232 901714 28176U 15322M
> 20050713 7.595 10.97 17.96999 5.1925 11.44
> 20050714 7.605 10.94 18.00999 5.2500 11.50
> 20050715 7.480 10.99 17.64999 5.2500 11.33
> 20050718 7.415 11.05 17.64000 5.2250 11.27
> > RRs<-p.RIs2Returns(r1)
> > RRs
> X899188 X902232 X901714 X28176U X15322M
> 20050714 0.001316656 -0.002734731 0.002225933 0.011073664 0.005244755
> 20050715 -0.016436555 0.004570384 -0.019988906 0.000000000 -0.014782609
> 20050718 -0.008689840 0.005459509 -0.000566006 -0.004761905 -0.005295675
> >
>
Hi Adai,
When I tried the optimized routine, I got the following error message:
r1
899188 902232 901714 28176U 15322M
20050713 7.595 10.97 17.96999 5.1925 11.44
20050714 7.605 10.94 18.00999 5.2500 11.50
20050715 7.480 10.99 17.64999 5.2500 11.33
20050718 7.415 11.05 17.64000 5.2250 11.27> exp(diff(log(r1))) -1
Error in r[i1] - r[-length(r):-(length(r) - lag + 1)] :
non-numeric argument to binary operator>
Any idea?
Many Thanks.
Gilbert
-----Original Message-----
From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk]
Sent: 19 July 2005 12:20
To: Gilbert Wu
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] colnames
First, your problem could be boiled down to the following example. See
how the colnames of the two outputs vary.
df <- cbind.data.frame( "100"=1:2, "200"=3:4 )
df/df
X100 X200
1 1 1
2 1 1
m <- as.matrix( df ) # coerce to matrix class
m/m
100 200
1 1 1
2 1 1
It appears that whenever R has to create a new dataframe automatically,
it tries to get nice colnames. See help(data.frame). I am not exactly
sure why this behaviour is different when creating a matrix. But I do
not think this is a major problem for most people. If you coerce your
input to matrix, the problem goes away.
Next, note the following points :
a) "mat[ 1:3, 1:ncol(mat) ]" is equivalent to simply "mat[ 1:3,
]".
b) "mat[ 2:nrow(mat), ]" is equivalent to simply "mat[ -1,
]"
See help(subset) for more information.
Using the points above, we can simplify your function as
p.RIs2Returns <- function (mat){
mat <- as.matrix(mat)
x <- mat[ -nrow(mat), ]
y <- mat[ -1, ]
return( y/x -1 )
}
If your data contains only numerical data, it is probably good idea to
work with matrices as matrix operations are faster.
Finally, we can shorten your function. You can use the diff (which works
column-wise if input is a matrix) and apply function if you know that
y/x = exp(log(y/x)) = exp( log(y) - log(x) )
which could be coded in R as
exp( diff( log(r1) ) )
and then subtract 1 from above to get your returns.
Regards, Adai
On Tue, 2005-07-19 at 09:17 +0100, Gilbert Wu wrote:> Hi Adai,
>
> Many Thanks for the examples.
>
> I work for a financial institution. We are exploring R as a tool to
implement our portfolio optimization strategies. Hence, R is still a new
language to us.
>
> The script I wrote tried to make a returns matrix from the daily return
indices extracted from a SQL database. Please find below the output that
produces the 'X' prefix in the colnames. The reason to preserve the
column names is that they are stock identifiers which are to be used by other
sub systems rather than R.
>
> I would welcome any suggestion to improve the script.
>
>
> Regards,
>
> Gilbert
>
> > "p.RIs2Returns" <-
> + function (RIm)
> + {
> + x<-RIm[1:(nrow(RIm)-1), 1:ncol(RIm)]
> + y<-RIm[2:nrow(RIm), 1:ncol(RIm)]
> + RReturns <- (y/x -1)
> + RReturns
> + }
> >
> >
> > channel<-odbcConnect("ourSQLDB")
> > result<-sqlQuery(channel,paste("select * from
equityRIs;"))
> > odbcClose(channel)
> > result
> stockid sdate dbPrice
> 1 899188 20050713 7.59500
> 2 899188 20050714 7.60500
> 3 899188 20050715 7.48000
> 4 899188 20050718 7.41500
> 5 902232 20050713 10.97000
> 6 902232 20050714 10.94000
> 7 902232 20050715 10.99000
> 8 902232 20050718 11.05000
> 9 901714 20050713 17.96999
> 10 901714 20050714 18.00999
> 11 901714 20050715 17.64999
> 12 901714 20050718 17.64000
> 13 28176U 20050713 5.19250
> 14 28176U 20050714 5.25000
> 15 28176U 20050715 5.25000
> 16 28176U 20050718 5.22500
> 17 15322M 20050713 11.44000
> 18 15322M 20050714 11.50000
> 19 15322M 20050715 11.33000
> 20 15322M 20050718 11.27000
> > r1<-reshape(result, timevar="stockid",
idvar="sdate", direction="wide")
> > r1
> sdate dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U
dbPrice.15322M
> 1 20050713 7.595 10.97 17.96999 5.1925
11.44
> 2 20050714 7.605 10.94 18.00999 5.2500
11.50
> 3 20050715 7.480 10.99 17.64999 5.2500
11.33
> 4 20050718 7.415 11.05 17.64000 5.2250
11.27
> > #Set sdate as the rownames
> > rownames(r1) <-as.character(r1[1:nrow(r1),1:1])
> > #Get rid of the first column
> > r1 <- r1[1:nrow(r1),2:ncol(r1)]
> > r1
> dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U
dbPrice.15322M
> 20050713 7.595 10.97 17.96999 5.1925
11.44
> 20050714 7.605 10.94 18.00999 5.2500
11.50
> 20050715 7.480 10.99 17.64999 5.2500
11.33
> 20050718 7.415 11.05 17.64000 5.2250
11.27
> > colnames(r1) <-
as.character(sub("[[:alnum:]]*\\.","", colnames(r1)))
> > r1
> 899188 902232 901714 28176U 15322M
> 20050713 7.595 10.97 17.96999 5.1925 11.44
> 20050714 7.605 10.94 18.00999 5.2500 11.50
> 20050715 7.480 10.99 17.64999 5.2500 11.33
> 20050718 7.415 11.05 17.64000 5.2250 11.27
> > RRs<-p.RIs2Returns(r1)
> > RRs
> X899188 X902232 X901714 X28176U X15322M
> 20050714 0.001316656 -0.002734731 0.002225933 0.011073664 0.005244755
> 20050715 -0.016436555 0.004570384 -0.019988906 0.000000000 -0.014782609
> 20050718 -0.008689840 0.005459509 -0.000566006 -0.004761905 -0.005295675
> >
>
Hi Adai, Your diagnosis is absolutely right; class(r1) returned data.frame and your suggested solution worked perfectly. Your assumption is also right; both x and y are positive. If I want to compare the performance of the my old function with yours, are there some functions in R I could use to get the elapsed time etc? Many Thanks indeed. Regards, Gilbert -----Original Message----- From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk] Sent: 19 July 2005 23:38 To: Gilbert Wu Cc: r-help at stat.math.ethz.ch Subject: RE: [R] colnames What does class(r1) give you ? If it is "data.frame", then try exp( diff( log( as.matrix( df ) ) ) ) BTW, I made the assumption that both x and y are positive values only. Regards, Adai On Tue, 2005-07-19 at 16:30 +0100, Gilbert Wu wrote:> Hi Adai, > > When I tried the optimized routine, I got the following error message: > > r1 > 899188 902232 901714 28176U 15322M > 20050713 7.595 10.97 17.96999 5.1925 11.44 > 20050714 7.605 10.94 18.00999 5.2500 11.50 > 20050715 7.480 10.99 17.64999 5.2500 11.33 > 20050718 7.415 11.05 17.64000 5.2250 11.27 > > exp(diff(log(r1))) -1 > Error in r[i1] - r[-length(r):-(length(r) - lag + 1)] : > non-numeric argument to binary operator > > > > Any idea? > > Many Thanks. > > Gilbert > -----Original Message----- > From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk] > Sent: 19 July 2005 12:20 > To: Gilbert Wu > Cc: r-help at stat.math.ethz.ch > Subject: RE: [R] colnames > > > First, your problem could be boiled down to the following example. See > how the colnames of the two outputs vary. > > df <- cbind.data.frame( "100"=1:2, "200"=3:4 ) > df/df > X100 X200 > 1 1 1 > 2 1 1 > > m <- as.matrix( df ) # coerce to matrix class > m/m > 100 200 > 1 1 1 > 2 1 1 > > It appears that whenever R has to create a new dataframe automatically, > it tries to get nice colnames. See help(data.frame). I am not exactly > sure why this behaviour is different when creating a matrix. But I do > not think this is a major problem for most people. If you coerce your > input to matrix, the problem goes away. > > > Next, note the following points : > a) "mat[ 1:3, 1:ncol(mat) ]" is equivalent to simply "mat[ 1:3, ]". > b) "mat[ 2:nrow(mat), ]" is equivalent to simply "mat[ -1, ]" > See help(subset) for more information. > > Using the points above, we can simplify your function as > > p.RIs2Returns <- function (mat){ > > mat <- as.matrix(mat) > x <- mat[ -nrow(mat), ] > y <- mat[ -1, ] > > return( y/x -1 ) > } > > If your data contains only numerical data, it is probably good idea to > work with matrices as matrix operations are faster. > > > Finally, we can shorten your function. You can use the diff (which works > column-wise if input is a matrix) and apply function if you know that > > y/x = exp(log(y/x)) = exp( log(y) - log(x) ) > > which could be coded in R as > > exp( diff( log(r1) ) ) > > and then subtract 1 from above to get your returns. > > Regards, Adai > > > > On Tue, 2005-07-19 at 09:17 +0100, Gilbert Wu wrote: > > Hi Adai, > > > > Many Thanks for the examples. > > > > I work for a financial institution. We are exploring R as a tool to implement our portfolio optimization strategies. Hence, R is still a new language to us. > > > > The script I wrote tried to make a returns matrix from the daily return indices extracted from a SQL database. Please find below the output that produces the 'X' prefix in the colnames. The reason to preserve the column names is that they are stock identifiers which are to be used by other sub systems rather than R. > > > > I would welcome any suggestion to improve the script. > > > > > > Regards, > > > > Gilbert > > > > > "p.RIs2Returns" <- > > + function (RIm) > > + { > > + x<-RIm[1:(nrow(RIm)-1), 1:ncol(RIm)] > > + y<-RIm[2:nrow(RIm), 1:ncol(RIm)] > > + RReturns <- (y/x -1) > > + RReturns > > + } > > > > > > > > > channel<-odbcConnect("ourSQLDB") > > > result<-sqlQuery(channel,paste("select * from equityRIs;")) > > > odbcClose(channel) > > > result > > stockid sdate dbPrice > > 1 899188 20050713 7.59500 > > 2 899188 20050714 7.60500 > > 3 899188 20050715 7.48000 > > 4 899188 20050718 7.41500 > > 5 902232 20050713 10.97000 > > 6 902232 20050714 10.94000 > > 7 902232 20050715 10.99000 > > 8 902232 20050718 11.05000 > > 9 901714 20050713 17.96999 > > 10 901714 20050714 18.00999 > > 11 901714 20050715 17.64999 > > 12 901714 20050718 17.64000 > > 13 28176U 20050713 5.19250 > > 14 28176U 20050714 5.25000 > > 15 28176U 20050715 5.25000 > > 16 28176U 20050718 5.22500 > > 17 15322M 20050713 11.44000 > > 18 15322M 20050714 11.50000 > > 19 15322M 20050715 11.33000 > > 20 15322M 20050718 11.27000 > > > r1<-reshape(result, timevar="stockid", idvar="sdate", direction="wide") > > > r1 > > sdate dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U dbPrice.15322M > > 1 20050713 7.595 10.97 17.96999 5.1925 11.44 > > 2 20050714 7.605 10.94 18.00999 5.2500 11.50 > > 3 20050715 7.480 10.99 17.64999 5.2500 11.33 > > 4 20050718 7.415 11.05 17.64000 5.2250 11.27 > > > #Set sdate as the rownames > > > rownames(r1) <-as.character(r1[1:nrow(r1),1:1]) > > > #Get rid of the first column > > > r1 <- r1[1:nrow(r1),2:ncol(r1)] > > > r1 > > dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U dbPrice.15322M > > 20050713 7.595 10.97 17.96999 5.1925 11.44 > > 20050714 7.605 10.94 18.00999 5.2500 11.50 > > 20050715 7.480 10.99 17.64999 5.2500 11.33 > > 20050718 7.415 11.05 17.64000 5.2250 11.27 > > > colnames(r1) <- as.character(sub("[[:alnum:]]*\\.","", colnames(r1))) > > > r1 > > 899188 902232 901714 28176U 15322M > > 20050713 7.595 10.97 17.96999 5.1925 11.44 > > 20050714 7.605 10.94 18.00999 5.2500 11.50 > > 20050715 7.480 10.99 17.64999 5.2500 11.33 > > 20050718 7.415 11.05 17.64000 5.2250 11.27 > > > RRs<-p.RIs2Returns(r1) > > > RRs > > X899188 X902232 X901714 X28176U X15322M > > 20050714 0.001316656 -0.002734731 0.002225933 0.011073664 0.005244755 > > 20050715 -0.016436555 0.004570384 -0.019988906 0.000000000 -0.014782609 > > 20050718 -0.008689840 0.005459509 -0.000566006 -0.004761905 -0.005295675 > > > > > > >