thr3ads.net - R help - [R] data after write() is off by 1 ? [Nov 2012]


Brian Feeny

2012-Nov-20 19:30 UTC

[R] data after write() is off by 1 ?

I am new to R, so I am sure I am making a simple mistake. I am including
complete information in hopes someone can help me.

Basically my data in R looks good, but when I write it to a file, every value
is off by 1.

Here is my flow:
> str(prediction)
 Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
 - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
> print(prediction)
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23
    2     0     9     9     3     7     0     3     0     3     5     7     4     0     4     3     3     1     9     0     9     1     1

ok, so it shows my values are 2, 0, 9, 9, 3 etc

# I write my file out
write(prediction, file="prediction.csv")

# look at the first 10 lines (write() puts five values on each line)
$ head -10 prediction.csv 
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8
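The symptom can be reproduced without the SVM in a minimal sketch, assuming the predictions are a factor whose levels are the digits "0" to "9" in that order. The ten values are copied from the printed predictions above, and a temporary file stands in for prediction.csv.

```r
# Stand-in for predict() output: a factor with the ten levels "0".."9".
# The values are the first ten printed predictions from the post.
prediction <- factor(c("2", "0", "9", "9", "3", "7", "0", "3", "0", "3"),
                     levels = as.character(0:9))

print(prediction)              # printing shows the labels 2 0 9 9 3 ...
codes <- as.integer(prediction)
print(codes)                   # internal codes: 3 1 10 10 4 8 1 4 1 4

# write() passes the factor to cat(), which writes the integer codes,
# five per line because a factor is not a character vector.
out <- tempfile(fileext = ".csv")
write(prediction, file = out)
from_file <- scan(out, quiet = TRUE)
print(from_file)               # 3 1 10 10 4 8 1 4 1 4, the "off by 1" values
```

Each written number is the position of the label in levels(prediction), which for the levels "0".."9" is always the digit plus 1.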

The complete work of what I did was as follows:

# First I load in a dataset, label the first column as a factor
> dataset <- read.csv('train.csv',head=TRUE)
> dataset$label <- as.factor(dataset$label)
# it has 42000 obs. 785 variables
> str(dataset)
'data.frame':	42000 obs. of  785 variables:
 $ label   : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
 $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
  [list output truncated]

# I make a sampling testset and trainset
> index <- 1:nrow(dataset)
> testindex <- sample(index, trunc(length(index)*30/100))
> testset <- dataset[testindex,]
> trainset <- dataset[-testindex,]
# build model, predict, view
> model  <- svm(label~., data = trainset, type="C-classification", kernel="radial", gamma=0.0000001, cost=16)
> prediction <- predict(model, testset)
> tab <- table(pred = prediction, true = testset[,1])
    true
pred    0    1    2    3    4    5    6    7    8    9
   0 1210    0    3    1    0    5    7    2    5    8
   1    0 1415    2    0    2    1    0    7    5    0
   2    0    2 1127   12    3    0    2    7    2    0
   3    0    0    7 1296    0   10    0    2   15    6
   4    1    1    8    2 1201    2    4    3    5   16
   5    3    1    0   13    0 1100    3    1    2    3
   6    3    0    3    0    5    9 1263    0    1    0
   7    0    2    9    6    6    1    0 1296    1   13
   8    3    5    7   11    1    2    0    2 1190    4
   9    1    1    2    3   17    2    0    4    4 1190


OK, everything looks great up to this point. So I try to apply my model to a
"real" test set, which has the same format as my previous dataset except that
it does not have the label/factor column, so it is 28000 obs. of 784
variables:
> testset <- read.csv('test.csv',head=TRUE)
> str(testset)
'data.frame':	28000 obs. of  784 variables:
 $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
  [list output truncated]
> prediction <- predict(model, testset)
> summary(prediction)
   0    1    2    3    4    5    6    7    8    9 
2780 3204 2824 2767 2771 2516 2744 2898 2736 2760 
> print(prediction)
    1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20    21    22    23
    2     0     9     9     3     7     0     3     0     3     5     7     4     0     4     3     3     1     9     0     9     1     1
   24    25    26    27    28    29    30    31    32    33    34    35    36    37    38    39    40    41    42    43    44    45    46
    5     7     4     2     7     4     7     7     5     4     2     6     2     5     5     1     6     7     7     4     9     8     7
  [list output truncated]
> write(prediction, file="prediction.csv")
$ head -10 prediction.csv 
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8


I am obviously making a mistake.  Everything is off by a value of 1.


Can someone tell me what I am doing wrong?

Brian




Brian Feeny

2012-Nov-20 19:45 UTC


[R] data after write() is off by 1 ?

A follow-up to my own post: I believe I figured this out, but if I should be
doing something different, please correct me:
> prediction.out <- levels(prediction)[prediction]
> write(prediction.out, file="prediction.csv")
This gives me the correct values.

Brian
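A short sketch of why the follow-up works, using a made-up five-element factor in place of the real predictions: indexing a vector with a factor uses the factor's integer codes, so levels(prediction)[prediction] maps each code back to its label. as.character(prediction) gives the same result.

```r
# Made-up stand-in for the predictions (levels "0".."9" in order).
prediction <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

# A factor used as an index acts like its integer codes (3 1 10 10 4),
# so this looks up levels(prediction)[c(3, 1, 10, 10, 4)].
prediction.out <- levels(prediction)[prediction]
print(prediction.out)   # "2" "0" "9" "9" "3"

# Equivalent, and arguably easier to read:
stopifnot(identical(prediction.out, as.character(prediction)))

# prediction.out is a character vector, so write() now defaults to
# one value per line instead of five.
out <- tempfile(fileext = ".csv")
write(prediction.out, file = out)
written <- readLines(out)
```

One visible side effect: because the result is character, write()'s default ncolumns changes from 5 to 1, so the file has one prediction per line.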


Duncan Murdoch

2012-Nov-20 19:46 UTC


[R] data after write() is off by 1 ?

On 20/11/2012 2:30 PM, Brian Feeny wrote:
> > str(prediction)
>   Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
>   - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
You have a factor, not numerical data. Apparently write() is writing out
the factor's integer codes (indices into the levels) rather than their
string representation. (I've never used write(). I would normally use
cat(), write.csv(), or something related to write data to a file for
reading outside of R.) write.csv() will write out the strings, by default
in quotes, but there are lots of arguments to control the formatting.

Duncan Murdoch
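A sketch of the write.csv() route, using a made-up three-element factor and a hypothetical column name, Label. Note that write() is itself a thin wrapper around cat(), so cat() on the factor emits the same integer codes; write.csv() (via write.table()) writes the labels instead. quote and row.names are the formatting arguments that matter most here.

```r
prediction <- factor(c("2", "0", "9"), levels = as.character(0:9))

# cat(), like write(), emits the integer codes of a factor.
via_cat <- capture.output(cat(prediction))   # "3 1 10"

# write.csv() writes the labels; the column name "Label" is only an example.
quoted <- tempfile(fileext = ".csv")
write.csv(data.frame(Label = prediction), file = quoted, row.names = FALSE)
print(readLines(quoted))    # header and labels wrapped in double quotes

plain <- tempfile(fileext = ".csv")
write.csv(data.frame(Label = prediction), file = plain,
          row.names = FALSE, quote = FALSE)
print(readLines(plain))     # lines: Label, 2, 0, 9
```

row.names = FALSE matters for the real predictions, which carry names "1".."28000" that would otherwise become an extra first column.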

Rui Barradas

2012-Nov-20 19:50 UTC


[R] data after write() is off by 1 ?

Hello,

You are seeing the levels of a factor but saving its integer codes.
Internally, factors are coded as consecutive integers starting at 1, and
those codes are what write() saves to the file. To get the levels "0",
"1", etc. rather than the corresponding codes 1, 2, etc., try

levels(prediction)[prediction]

or

as.integer(levels(prediction)[prediction])


Hope this helps,

Rui Barradas
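A sketch comparing both suggestions with the raw codes, on a made-up factor. as.integer(as.character(prediction)) is a commonly used equivalent of the second form. The droplevels() lines illustrate, under the same made-up data, why converting through the labels is safer than subtracting 1 from the codes.

```r
prediction <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

codes   <- as.integer(prediction)                      # 3 1 10 10 4
labels  <- levels(prediction)[prediction]              # "2" "0" "9" "9" "3"
digits  <- as.integer(levels(prediction)[prediction])  # 2 0 9 9 3
digits2 <- as.integer(as.character(prediction))        # same as digits

# Why not just codes - 1?  It only works while the levels are exactly
# "0".."9" in order; with the unused levels dropped it gives wrong digits.
dropped <- droplevels(prediction)   # levels "0" "2" "3" "9"
as.integer(dropped) - 1             # 1 0 3 3 2, not 2 0 9 9 3
as.integer(as.character(dropped))   # still 2 0 9 9 3
```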
