When I try to select only those rows from the following data frame, called "data", in which X > Y

   X Y       V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
20 5 4 5.385165
21 1 5 5.000000
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165

using the commands

> attach(data)
> data2 = data[X > Y,]; data2

I get this for data2:

   X Y       V3
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
17 2 4 2.236068
18 3 4 8.062258
24 4 5 5.385165

Clearly, this is not what I intend, but I cannot figure out what I've done wrong. Any help appreciated. Thanks.

Jim Bouldin
What's wrong with it? It looks okay to me. If you use

subset(data, data$X > data$Y)

you get the same results. Any chance you're reading the row numbers as values?

BTW "data" is also the name of a built-in R function (though not strictly a reserved word), and it is good practice not to use it as a variable name.

My Results

   X Y       V3
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
17 2 4 2.236068
18 3 4 8.062258
24 4 5 5.385165

--- On Mon, 8/10/09, Jim Bouldin <jrbouldin at ucdavis.edu> wrote:

> From: Jim Bouldin <jrbouldin at ucdavis.edu>
> Subject: [R] problem selecting rows meeting a criterion
> To: r-help at r-project.org
> Received: Monday, August 10, 2009, 5:49 PM
What's wrong is I'm trying to select only those rows in which X > Y, but I'm getting rows in which Y > X and losing some in which X > Y. The row numbers are not being read as values. Very confusing.

Jim

Jim Bouldin, PhD
Research Ecologist
Department of Plant Sciences, UC Davis
Davis CA, 95616
530-554-1740
No problem John, thanks for your help, and also thanks to Dan and Patrick. Wasn't able to read or try anybody's suggestions yesterday. Here's what I've discovered in the meantime:

What I did not include yesterday is that my original data frame, called "data", was this:

   X Y       V3
1  1 1 0.000000
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
7  2 2 0.000000
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
13 3 3 0.000000
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
19 4 4 0.000000
20 5 4 5.385165
21 1 5 5.000000
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165
25 5 5 0.000000

To this data frame I applied the following command:

> data <- data[data$V3 > 0,]; data  #to remove all rows where V3 = 0

giving me this (the point from which I started yesterday):

   X Y       V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
20 5 4 5.385165
21 1 5 5.000000
22 2 5 5.656854
23 3 5 5.099020
24 4 5 5.385165

So far so good. But when I then submit the command

> data = data[X>Y,]  #to select all rows where X > Y

I get the problem result already mentioned, namely:

   X Y       V3
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
17 2 4 2.236068
18 3 4 8.062258
24 4 5 5.385165

which is clearly wrong! It doesn't matter if I give a new name to the data frame at each step or not, or whether I use the name "data" or not. It always gives the same wrong answer.

However, if I instead use the command subset(data, X>Y), I get the right answer, namely:

   X Y       V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
14 4 3 8.062258
15 5 3 5.099020
20 5 4 5.385165

OK, so the lesson so far is "use the subset function". But here it gets weirder.
If I instead go straight from the initial data frame ("data", given at the top of this post), selecting only rows where X>Y (without the intermediate step of removing rows with V3 = 0, which, although unnecessary for getting the result I want, is very relevant to the larger issue here), by using the command that caused me the original trouble (data = data[X>Y,]), I get the RIGHT answer (the data frame just above). The subset function also gives the right answer.

Now what in the world is going on? This kind of thing scares me. Below is the full set of commands starting from scratch:

#Point of the following is to measure the pairwise euclidean distances between 5 objects, each having X and Y coordinates
#and put them into data frame format that labels each pair and gives the distance between them
d = data.frame(x=sample(1:10, 5), y=sample(1:10, 5))  #create a sample data set
ss2 = as.data.frame(as.matrix(dist(d)))  #create a data.frame to extract row and column names
X = rep(seq(1:length(row.names(ss2))), length(names(ss2)))  #make a vector containing the X coordinate names
Y = rep(seq(1:length(names(ss2))), length(row.names(ss2)))  #the same for Y
Y = sort(Y)  #first sort
coords = cbind(X, Y); rm(X,Y)  #then cbind and remove X and Y
data1 = as.data.frame(cbind(coords, as.vector(as.matrix(dist(d))))); rm(coords)  #column bind the 3 vectors
data2 = data1[data1$V3 > 0,]  #remove those with V3 = 0 (= the original matrix diagonal)
data3 = data2[X>Y,]  #remove duplicates from original distance matrix
data1; data2; data3

Thoughts much appreciated. Thanks.

Jim Bouldin

> Clearly I was more tired than I realised last night. :( My apologies.
>
> In any case with the data.frame name changed to xx this seems to give you
> what you want
>
> subset(xx, xx[,1] > xx[,2])
>
> or using the data name
> subset(data, data[,1] > data[,2])
> should work as well
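
As a side note on the script above: the pairwise table can probably be built without ever creating free-standing X and Y vectors, which removes the opportunity for the name clash. The following is only a sketch under assumptions, not the poster's method. The names m, keep and pairs are made up for illustration, and set.seed(1) is there only to make the sketch repeatable.

```r
# Hypothetical seed, used only so the sketch gives the same data each run.
set.seed(1)
d <- data.frame(x = sample(1:10, 5), y = sample(1:10, 5))

# Full symmetric 5 x 5 matrix of Euclidean distances between the 5 points.
m <- as.matrix(dist(d))

# TRUE strictly below the diagonal, i.e. where row index > column index.
# This drops both the zero diagonal and the duplicated upper triangle.
keep <- lower.tri(m)

# One row per unordered pair: X is the row index, Y the column index,
# V3 the distance, matching the column names used in the thread.
pairs <- data.frame(
  X  = row(m)[keep],
  Y  = col(m)[keep],
  V3 = m[keep]
)

print(pairs)
```

Because X and Y exist only as columns of pairs, any later filtering has to go through pairs$X and pairs$Y (or subset/with), so there is nothing stale in the workspace to pick up by accident.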
Yes, thanks Steve and also to everyone else for helping me clear this up. The issue was definitely the existence of other objects named X and Y that I inadvertently referred to in my command statement. Only when these objects are removed AND the data frame in question is attached will the command I originally used work. However, I see that it is much easier to just use the subset function or perhaps the with function.

Seems that R has many painful lessons to teach.

Thanks again.
Jim Bouldin

> This won't work in general, and is probably only working in this
> particular case because you already have defined somewhere in your
> workspace vars named X and Y.
>
> What you wrote above isn't taking the values X,Y from data$X and data$Y,
> respectively, but rather from var X and Y defined elsewhere.
>
> Instead of doing data[X > Y], do:
>
> data[data$X > data$Y,]
>
> This should get you what you're expecting.
>
> Hopefully you're learning a slightly different lesson now :-)
>
> Does that clear things up at all?
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
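
For anyone reproducing the thread, here is a minimal sketch of what appears to have happened. It assumes the stray X and Y were the 25-element columns of the full 5 x 5 grid; the thread only says that other objects named X and Y existed, so their length is an inference from the posted output. The names full, reduced, wrong and right1 to right3 are made up for illustration, and V3 holds placeholder values rather than the posted distances.

```r
# Rebuild the full 5 x 5 grid from the thread (X varies fastest).
# V3 is a placeholder: 0 on the diagonal, 1 elsewhere.
full <- expand.grid(X = 1:5, Y = 1:5)
full$V3 <- ifelse(full$X == full$Y, 0, 1)

# Stale copies of the columns, standing in for the workspace objects
# that the thread identifies as the culprit (assumed length: 25).
X <- full$X
Y <- full$Y

# Drop the diagonal: 20 rows remain, but X and Y still have 25 elements.
reduced <- full[full$V3 > 0, ]

# Problem: X > Y is a 25-element logical vector applied by POSITION
# to a 20-row data frame, so the TRUEs land on the wrong rows.
wrong <- reduced[X > Y, ]

# Safer forms that take X and Y from the data frame itself.
right1 <- reduced[reduced$X > reduced$Y, ]
right2 <- subset(reduced, X > Y)
right3 <- with(reduced, reduced[X > Y, ])

rownames(wrong)   # "3" "4" "5" "6" "10" "11" "12" "17" "18" "24"
rownames(right1)  # "2" "3" "4" "5" "8" "9" "10" "14" "15" "20"
```

The row names of wrong match the unexpected result posted at the start of the thread, and those of right1 match the result from subset, which is consistent with Steve's explanation.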