Hello,
I've been using a pre-release version of R v 2.8.0 for Windows for the last
couple months. I think that there have been consistent problems with subsetting
data sets, but I had usually been able to find work-arounds or was unable to
confirm this as a bug. I think now I have, and would love advice on what to do
if I've made some error.
The data set in question ("c") has 500,000 observations and 44
variables. The problematic variable, "month," takes integer values
1:12, and all are present in the data set:
> unique(c$month)
[1] 11 10 9 8 12 1 7 4 6 2 5 3
However, I can't select observations of c for certain values of month:
> c[c$month==11,]
 [1] STATE        DISTRICT     TALUK        VILLAGE      TYPE         SERIALNO     INTDATE      QH101P
 [9] QH114        QH115A1      QH115B1      QH115C1      QH115A2      QH115B2      QH115C2      QH115A3
[17] QH115B3      QH115C3      QH115A4      QH115B4      QH115C4      QH115A5      QH115B5      QH115C5
[25] QH116        QH117A1      QH117B1      QH117C1      QH117A2      QH117B2      QH117C2      QH117A3
[33] QH117B3      QH117C3      QH117A4      QH117B4      QH117C4      QH117A5      QH117B5      QH117C5
[41] phase        year         month        stdistid.rch
<0 rows> (or 0-length row.names)
I get the same result for c[c[,43]==11,], and
> length(c$month[c$month==11])
[1] 0
This is true for most values of month (1,2,4,5,7,8,10,11), but the multiples of
3 work, apparently correctly.
Other variables do not have this problem (these three observations have
month=11):
> c[c$STATE==11,][1:3,]
      STATE DISTRICT TALUK VILLAGE TYPE SERIALNO INTDATE QH101P QH114 QH115A1 QH115B1 QH115C1 QH115A2 QH115B2 QH115C2 QH115A3 QH115B3
87556    11        2     1       1    1        5    1187      6     0       0       0       0       0       0       0       0       0
87557    11        2     1       1    1       10    1187      3     0       0       0       0       0       0       0       0       0
87558    11        2     1       1    1       14    1187      5     0       0       0       0       0       0       0       0       0
      QH115C3 QH115A4 QH115B4 QH115C4 QH115A5 QH115B5 QH115C5 QH116 QH117A1 QH117B1 QH117C1 QH117A2 QH117B2 QH117C2 QH117A3 QH117B3 QH117C3
87556       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
87557       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
87558       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
      QH117A4 QH117B4 QH117C4 QH117A5 QH117B5 QH117C5 phase year month stdistid.rch
87556       0       0       0       0       0       0     1 1998    11         1102
87557       0       0       0       0       0       0     1 1998    11         1102
87558       0       0       0       0       0       0     1 1998    11         1102
The data set is read directly from a csv file, where all variables should be
stored in the same way, and using as.numeric(as.character(c$month)) does not
help. Nor does restarting R, restarting the computer, or trying the operation
on smaller subsets of c. I'd appreciate any help you can provide.
Sincerely,
Alan Cohen
Does it not work in the official release of R 2.8.0?

On Mon, Dec 1, 2008 at 5:32 PM, Alan Cohen <CohenA at smh.toronto.on.ca> wrote:
> I've been using a pre-release version of R v 2.8.0 for Windows for the last
> couple months.

-- 
Stephen Sefick

Let's not spend our time and resources thinking about things that are so
little or so large that all they really do for us is puff us up and make us
feel like gods. We are mammals, and have not exhausted the annoying little
problems of being mammals.
-K. Mullis
I just tried:
set.seed(42)
c <- data.frame(month=sample(1:12,50,TRUE),blah=sample(letters[1:4],
50,TRUE))
c[c$month==11,]
and got
   month blah
1     11    b
21    11    a
28    11    b
30    11    a
39    11    a
47    11    b
All appears to be in harmony. So there would appear to be something funny
about your data frame ``c'', rather than there being anything wrong with ``[''.

Is c$month a factor, perhaps? If so, what are its levels? Also have a look
at str(c).

BTW ``c'' is a lousy name for an object, since it is the name of the built-in
function which effects concatenation.
cheers,
Rolf Turner
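
The checks suggested above can be sketched as follows. This is only an
illustration under stated assumptions: the data frame `dat` and its values are
made up (they are not Alan's data), and the stray leading space in some labels
is a hypothetical way for a factor column to defeat a comparison with 11.

```r
# Hypothetical data frame; 'dat' and its values are made up for illustration.
# Some labels carry a leading space, so the column is read as text, not numbers.
dat <- data.frame(month = c(" 11", "10", " 11", "9"), stringsAsFactors = TRUE)

str(dat)                        # shows the storage type of every column
is_fac <- is.factor(dat$month)  # TRUE here
levels(dat$month)               # the distinct labels, including " 11"

# Comparing a factor with a number compares the labels as strings,
# so " 11" does not match "11".
n_match_raw <- sum(dat$month == 11)

# Converting via character tolerates the surrounding whitespace.
month_num <- as.numeric(as.character(dat$month))
n_match_num <- sum(month_num == 11)
```

In this made-up case the conversion has to be assigned to a variable (or back
to the column) before subsetting on it; calling it without keeping the result
changes nothing.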
On 2/12/2008, at 11:32 AM, Alan Cohen wrote:
> However, I can't select observations of c for certain values of month:
>
>> c[c$month==11,]
> [...]
> <0 rows> (or 0-length row.names)
Alan Cohen wrote:
> The data set in question ("c") has 500,000 observations and 44 variables.
> The problematic variable, "month," takes integer values 1:12, and all are
> present in the data set:
>
>> unique(c$month)
> [1] 11 10 9 8 12 1 7 4 6 2 5 3
>
> However, I can't select observations of c for certain values of month:
...
>> length(c$month[c$month==11])
> [1] 0

Hmm. Does any of these make us any wiser?

mode(c$month)
any(c$month==11)
table(c$month)
dput(unique(c$month))

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
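
As a hedged illustration of what these diagnostics can reveal, here is a
minimal sketch on made-up data. The data frame `dat`, the column `cmc`, and the
formula deriving `month` are assumptions for the example only; nothing in the
thread confirms that Alan's month column was built this way. The sketch shows
one way a numeric column can print as 11 yet fail `== 11`: a value produced by
floating-point arithmetic that is close to, but not exactly, an integer.

```r
# Hypothetical data: 'dat', 'cmc' and the month formula are illustrative
# assumptions, not taken from the original data set.
dat <- data.frame(cmc = c(1187, 1185, 1187, 1182))

# Derive a month via division. Fractions such as 11/12 are not exactly
# representable in binary floating point, while 9/12 and 6/12 are.
dat$month <- (dat$cmc / 12 - floor(dat$cmc / 12)) * 12

print(unique(dat$month))               # default printing shows 7 significant digits
mode(dat$month)                        # storage mode of the column
any(dat$month == 11)                   # exact comparison
table(dat$month)                       # counts per distinct value
dput(unique(dat$month))                # deparsed values
print(unique(dat$month), digits = 17)  # full precision of each value

# Exact equality versus two tolerant alternatives
n_exact   <- sum(dat$month == 11)
n_rounded <- sum(round(dat$month) == 11)
n_near    <- sum(abs(dat$month - 11) < 1e-8)
```

If the full-precision output shows values slightly off the integers, then
rounding the column once (for example `dat$month <- round(dat$month)`) or
comparing with a tolerance is a more robust choice than `==`.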