Hi,

I am still new to R and this is my first post on this mailing list.

I have two .csv files (each one a single column of real numbers) coming from the same database (the first one is just longer than the second), and I read them into R the following way:

returns <- read.csv("test.csv", header = FALSE)
returns2 <- read.csv("test2.csv", header = FALSE)

However, the two objects clearly don't seem to be equivalent:

> returns[2528:2537,1]
 [1] -0.002206 0.115696 -0.015192 0.008719 -0.004654 -0.010688 0.009453
 0.002676 0.001334 -0.011326
7470 Levels: -0.000078 -0.000085 -0.000086 -0.0001 -0.000112 -0.000115
-0.000152 -0.000154 -0.000157 -0.00016 -0.000171 -0.000185 -0.000212
-0.000238 -0.000256 -0.000259 -0.000263 -0.000273 ... C

> returns2[1:10,1]
 [1] -0.002206 0.115696 -0.015192 0.008719 -0.004654 -0.010688 0.009453
 0.002676 0.001334 -0.011326

> as.numeric(returns[2528:2537,1])
 [1] 341 7444 2244 5149 787 1717 5251 4122 3878 1811

> as.numeric(returns2[1:10,1])
 [1] -0.002206 0.115696 -0.015192 0.008719 -0.004654 -0.010688 0.009453
 0.002676 0.001334 -0.011326

I would like to understand what's happening and how to handle the longer one. This problem may seem stupid, but I've been trying to figure it out for a while and nothing seems to work. I checked in Excel and both seem to be completely normal lists of real numbers.

What am I missing here? What are those "levels", and why doesn't as.numeric work the same way with the longer one?

My final goal is to extract small parts of those columns the following way:

> cbind(returns[which(names == id)[2528:2537],1])
      [,1]
 [1,]  341
 [2,] 7444
 [3,] 2244
 [4,] 5149
 [5,]  787
 [6,] 1717
 [7,] 5251
 [8,] 4122
 [9,] 3878
[10,] 1811

Which should be equivalent to:

> cbind(returns2[which(names == id)[1:10],1])
           [,1]
 [1,] -0.002206
 [2,]  0.115696
 [3,] -0.015192
 [4,]  0.008719
 [5,] -0.004654
 [6,] -0.010688
 [7,]  0.009453
 [8,]  0.002676
 [9,]  0.001334
[10,] -0.011326

Thanks a lot,
Thibault

---------
Thibault Vatter
EPFL - Master, 1st year
Laboratory of Statistical Biophysics <http://lbs.epfl.ch/>
Web: http://personnes.epfl.ch/thibault.vatter

On Sun, Apr 10, 2011 at 05:47:59PM +0200, Thibault Vatter wrote:
> returns <- read.csv("test.csv", header = FALSE)
> returns2 <- read.csv("test2.csv", header = FALSE)
>
> However, the two objects clearly don't seem to be equivalent:
>
> > returns[2528:2537,1]
> [1] -0.002206 0.115696 -0.015192 0.008719 -0.004654 -0.010688 0.009453
> 0.002676 0.001334 -0.011326
> 7470 Levels: -0.000078 -0.000085 -0.000086 -0.0001 -0.000112 -0.000115
> -0.000152 -0.000154 -0.000157 -0.00016 -0.000171 -0.000185 -0.000212
> -0.000238 -0.000256 -0.000259 -0.000263 -0.000273 ... C

Hi.

It seems that the first file contains a non-numeric row. It may contain "C", which is the last of the levels. In this case, the whole column is considered as a character vector and is converted to a factor.

> > returns2[1:10,1]
> [1] -0.002206 0.115696 -0.015192 0.008719 -0.004654 -0.010688 0.009453
> 0.002676 0.001334 -0.011326
>
> > as.numeric(returns[2528:2537,1])
> [1] 341 7444 2244 5149 787 1717 5251 4122 3878 1811

These are indices to the levels of the factor.

Petr Savicky.

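The "indices to the levels" remark can be reproduced on a tiny example. This is only a sketch with made-up values, not the poster's data: the vector x and the position of the stray "C" entry are assumptions for illustration.

```r
# Hypothetical stand-in for the problematic column: three numbers stored as
# text plus one stray non-numeric entry ("C").
x <- c("-0.002206", "0.115696", "C", "0.008719")
f <- factor(x)

levels(f)                # the distinct strings, sorted as text

# as.numeric() on a factor returns the integer codes, i.e. the position of
# each element in levels(f), not the numbers that were printed.
codes <- as.numeric(f)

# Going through character first recovers the values; entries that are not
# numbers become NA (the coercion warning is suppressed here on purpose).
vals <- suppressWarnings(as.numeric(as.character(f)))

# Equivalent idiom: convert the levels once, then index them by the codes.
vals_alt <- suppressWarnings(as.numeric(levels(f))[f])

print(codes)
print(vals)
```

This matches what the thread shows: the values 341, 7444, ... are positions among the 7470 levels, which is why they look unrelated to the data.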
Thibault,

Your questions indicate that you would benefit enormously from reading 'An Introduction to R'. A very useful function is str(). Understanding the concept of "factors" is crucial in R.

"Checking" anything with Excel is never much use.

Peter Ehlers

On 2011-04-10 08:47, Thibault Vatter wrote:
<SNIP>
> I would like to understand what's happening and how to handle the longer
> one. This problem may seem stupid, but I've been trying to figure it out for
> a while and nothing seems to work. I checked in excel and both seems to be
> completely normal lists of real numbers).
>
> What am I missing here? What are those "levels" and why the as.numeric
> doesn't work the same with the longer one?
<SNIP>

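As a rough illustration of why str() is useful here, the sketch below reads two small in-memory CSVs (invented values, not the poster's files) and compares their structure. stringsAsFactors = TRUE is passed explicitly as an assumption to reproduce the behaviour seen in the thread, since that was the default in 2011 but has not been since R 4.0.0.

```r
# Made-up data: 'bad' contains a stray "C", 'good' does not.
bad  <- read.csv(text = "0.1\n0.2\nC\n0.3", header = FALSE,
                 stringsAsFactors = TRUE)
good <- read.csv(text = "0.1\n0.2\n0.3", header = FALSE)

str(bad)    # V1 is reported as a Factor
str(good)   # V1 is reported as num

# The same information, programmatically:
class(bad$V1)
class(good$V1)
```

Running str() right after read.csv() shows at a glance whether a column came in as numeric or as something else, which a visual check in a spreadsheet does not.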
On 11/04/11 10:08, Peter Ehlers wrote:

<SNIP>
> "Checking" anything with Excel is never much use.
<SNIP>

Fortune?

    cheers,

        Rolf Turner

On Sun, Apr 10, 2011 at 05:47:59PM +0200, Thibault Vatter wrote:
> > returns[2528:2537,1]
> [1] -0.002206 0.115696 -0.015192 0.008719 -0.004654 -0.010688 0.009453
> 0.002676 0.001334 -0.011326
> 7470 Levels: -0.000078 -0.000085 -0.000086 -0.0001 -0.000112 -0.000115
> -0.000152 -0.000154 -0.000157 -0.00016 -0.000171 -0.000185 -0.000212
> -0.000238 -0.000256 -0.000259 -0.000263 -0.000273 ... C

There is probably a non-numeric row in the data. In order to locate this row, try the following

  which(is.na(as.numeric(as.character(returns[, 1]))))

This will show the indices of the rows that cannot be converted to numeric type.

Petr Savicky.

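The suggestion above can be tried on a small in-memory file. This is a sketch under assumptions: the four values and the location of the bad row are invented, and stringsAsFactors = TRUE is set explicitly to mimic the factor column from the thread.

```r
# Made-up stand-in for test.csv, with a stray "C" in the third row.
returns <- read.csv(text = "-0.002206\n0.115696\nC\n0.008719",
                    header = FALSE, stringsAsFactors = TRUE)

# Factor -> character -> numeric; anything that is not a number becomes NA.
as_text <- as.character(returns[, 1])
as_num  <- suppressWarnings(as.numeric(as_text))

# Rows that could not be converted, and what they actually contain.
bad_rows <- which(is.na(as_num))
print(bad_rows)
print(as_text[bad_rows])

# One possible way to continue: keep only the rows that converted cleanly.
clean <- as_num[!is.na(as_num)]
print(clean)
```

Whether to drop the offending rows or correct them in the source file depends on what they are; inspecting as_text[bad_rows] first is the safer order.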
--- On Sun, 4/10/11, Rolf Turner <rolf.turner at xtra.co.nz> wrote:

> From: Rolf Turner <rolf.turner at xtra.co.nz>
> Subject: Re: [R] Question about levels/as.numeric
> To: r-help at r-project.org
> Received: Sunday, April 10, 2011, 9:48 PM
>
> On 11/04/11 10:08, Peter Ehlers wrote:
>
> <SNIP>
> > "Checking" anything with Excel is never much use.
> <SNIP>
>
> Fortune?
>
>     cheers,
>
>         Rolf Turner

Definitely!