Hi R users,

I am trying to assign ages to age classes for a large data set (123,000 records). Using a for-loop was too slow, so I wrote a function and used apply(). However, the function does not properly assign the first two classes (the rest are fine): it appears that when an age has only one digit, it does not get assigned properly.

I tried to provide a small-scale work-up (at the end of this email), but it does not reproduce the problem; the best I can do is to provide my code and the output below. As you can see, I've confirmed that age is numeric, that all values are integers, and that pieces of the code work independently. Any thoughts would be appreciated.

To add to the mystery, depending on which rows of my data set I select, I get different problems. mds[1:100,] gives the problem above, as do mds[100:200,], mds[150:250,] and mds[10000:10100,]. However, with mds[200:300,], mds[250:350,] and mds[1000:1100,], only ages with 3 digits are correctly assigned; all ages < 100 are returned as NA.

I'm using R v 2.8.1 on Windows XP.
Cheers,
Alan Cohen
Centre for Global Health Research,
Toronto, ON

> ageassign <- function(x){
+ y <- NA
+ if (x[11] %in% c(0:4)) {y <- "0-4"}
+ else if (x[11] %in% c(5:14)) {y <- "5-14" }
+ else if (x[11] %in% c(15:29)) {y <- "15-29" }
+ else if (x[11] %in% c(30:69)) {y <- "30-69"}
+ else if (x[11] %in% c(70:79)) {y <- "70-79"}
+ else if (x[11] %in% c(80:125)) {y <- "80+"}
+ return(y)
+ }
> jj <- apply(mds[1:100,],1,FUN=ageassign)
> jj
      1       2       3       4       5       6       7       8       9      10      11      12      13
     NA   "80+" "30-69" "30-69"   "80+"      NA "30-69" "30-69" "70-79" "15-29" "15-29" "30-69" "70-79"
     14      15      16      17      18      19      20      21      22      23      24      25      26
  "80+"      NA "30-69" "30-69" "30-69"   "80+"   "80+" "15-29" "70-79" "30-69" "70-79" "70-79" "30-69"
     27      28      29      30      31      32      33      34      35      36      37      38      39
"70-79"   "80+"      NA   "80+" "70-79"      NA "15-29" "15-29"      NA      NA "70-79" "30-69" "30-69"
     40      41      42      43      44      45      46      47      48      49      50      51      52
"70-79" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "70-79" "15-29" "30-69"      NA "15-29" "30-69"
     53      54      55      56      57      58      59      60      61      62      63      64      65
"30-69"      NA "70-79" "30-69" "30-69" "30-69" "30-69" "15-29" "30-69" "30-69" "70-79" "30-69"      NA
     66      67      68      69      70      71      72      73      74      75      76      77      78
"30-69" "30-69" "30-69" "30-69" "30-69"   "80+" "30-69"   "80+" "70-79" "30-69" "30-69" "30-69"      NA
     79      80      81      82      83      84      85      86      87      88      89      90      91
"30-69" "30-69" "30-69"      NA   "80+" "30-69" "30-69" "30-69"      NA "15-29" "30-69" "30-69" "30-69"
     92      93      94      95      96      97      98      99     100
"30-69" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "30-69" "30-69"
> mds[1:100,11]
  [1]  3 82 40 35 82  1 37 57 71 22 21 52 73 86  1 43 60 63 84 88 29 73 69 75 73 43 75 83  4 83 77  1 27
 [34] 15  1  6 76 51 45 71 54 64 69 70 48 38 74 26 37  4 18 63 59  8 78 63 67 62 50 21 66 69 75 57  4 50
 [67] 58 60 61 62 83 69 92 75 30 49 69  1 69 63 69  0 93 64 59 69  2 25 32 60 66 67 54 53 64 79 59 49 59
[100] 64
> table(mds[,11])

   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
3123 6441 3856 2884 1968 1615 1386 1088 1098  721  943  681  511  380  426  835  571  555  719  653
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39
 879  715  672  631  655  773  680  713  769  538  685  566  729  702  652  766  683  723  821  675
  40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59
 774  650  908  892  784  925  781 1043 1161  924 1087  827 1261 1356 1297 1272 1277 1614 1831 1523
  60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79
1702 1251 1954 2157 1901 2090 1874 2705 3085 2529 2488 1777 2701 2586 2308 2020 1801 2269 2486 1856
  80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99
1762 1047 1413 1326  967 1013  753  870  884  531  601  277  364  301  193  288  149  174  169  470
 100  101  102  103  104  105  106  107  108  114  115  117  118  120  125
  15    2    5    7    2    4    1    1    2    1    1    2    2    2    1
> mode(mds[,11])
[1] "numeric"

> mds[1,11] %in% c(0:4)
[1] TRUE
> if (mds[1,11] %in% c(0:4)) {y <- "0-4"}
> y
[1] "0-4"

> xx <- matrix(trunc(runif(30,0,125)),15,2)
> aassign <- function(x){
+ y <- NA
+ if (x[2] %in% c(0:4)) {y <- "0-4"}
+ else if (x[2] %in% c(5:14)) {y <- "5-14" }
+ else if (x[2] %in% c(15:29)) {y <- "15-29" }
+ else if (x[2] %in% c(30:69)) {y <- "30-69"}
+ else if (x[2] %in% c(70:79)) {y <- "70-79"}
+ else if (x[2] %in% c(80:125)) {y <- "80+"}
+ return(y)
+ }
> jj <- apply(xx,1,FUN=aassign)
> t(xx)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]   23   98  107   94   76  103  106   40   66    11   109   101    96    37    18
[2,]   11   57   58   91   43  123  103   77    4    79    64    10     8   105    76
> jj
 [1] "5-14"  "30-69" "30-69" "80+"   "30-69" "80+"   "80+"   "70-79" "0-4"   "70-79" "30-69" "5-14"
[13] "5-14"  "80+"   "70-79"
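A likely explanation, offered as an assumption because the column types of mds are not shown: apply() first converts the data frame with as.matrix(), and if mds has any non-numeric (character or factor) column, the whole matrix becomes character. Numeric columns are converted with format(), which pads values to a common width, so age 3 becomes " 3" (or "  3" when the selected rows include an age of 100 or more). %in% then compares strings, and " 3" never matches "3". That fits both symptoms. It would also explain why the xx work-up does not reproduce the problem: xx is a purely numeric matrix, so apply() passes numbers. The toy data frame toy and the per-value helper ageassign1 below are hypothetical.

```r
# Sketch: why apply() on a mixed-type data frame breaks %in% on numbers.
# 'toy' is a hypothetical data frame: one character column plus integer ages.
toy <- data.frame(id = c("a", "b", "c"), age = c(3L, 82L, 40L),
                  stringsAsFactors = FALSE)

# apply() calls as.matrix() first; with a character column present, every
# column becomes character, and numbers are format()ted to a common width.
m <- as.matrix(toy)
m[, "age"]                  # " 3" "82" "40"  (note the leading space)

# %in% now compares strings, so " 3" does not match "3" from 0:4.
" 3" %in% 0:4               # FALSE
"3"  %in% 0:4               # TRUE

# If the subset contains a three-digit age, all shorter ages are padded.
format(c(3L, 82L, 100L))    # "  3" " 82" "100"

# Minimal-change workaround (hypothetical helper): run the same if/else
# logic on the numeric column only, so nothing is converted to character.
ageassign1 <- function(a) {
  if (a %in% 0:4) "0-4"
  else if (a %in% 5:14) "5-14"
  else if (a %in% 15:29) "15-29"
  else if (a %in% 30:69) "30-69"
  else if (a %in% 70:79) "70-79"
  else if (a %in% 80:125) "80+"
  else NA_character_
}
res <- vapply(toy$age, ageassign1, character(1))
res                         # "0-4" "80+" "30-69"
```

On the real data the equivalent would be along the lines of vapply(mds[, 11], ageassign1, character(1)). This still loops at R level, so a vectorized function such as cut() is likely to be faster on 123,000 records.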
I'm not sure, but I think that using the cut() function would solve your problem. See ?cut

On Wed, 22 Apr 2009 14:56:10 -0400, "Alan Cohen" <CohenA at smh.toronto.on.ca> wrote:
> Hi R users,
>
> I am trying to assign ages to age classes for a large data set (123,000
> records), and using a for-loop was too slow, so I wrote a function and used
> apply. [...]
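The cut() suggestion can be sketched as follows. This is a minimal example assuming integer ages; ages is a hypothetical stand-in for mds[, 11]. Because cut() works on the whole vector at once, there is no apply() call and no conversion of the ages to character.

```r
# Sketch: vectorized age classes with base R's cut().
# 'ages' is a hypothetical stand-in for the age column mds[, 11].
ages <- c(3, 82, 40, 1, 6, 15, 29, 30, 69, 70, 79, 80, 125)

# right = FALSE gives half-open intervals [0,5), [5,15), ..., [80,Inf),
# matching the integer classes 0-4, 5-14, 15-29, 30-69, 70-79, 80+.
agecls <- cut(ages,
              breaks = c(0, 5, 15, 30, 70, 80, Inf),
              labels = c("0-4", "5-14", "15-29", "30-69", "70-79", "80+"),
              right = FALSE)

print(agecls)
print(table(agecls))
```

On the full data this would be something like mds$agecls <- cut(mds[, 11], ...) with the same breaks and labels (agecls is a hypothetical column name). cut() returns a factor; wrap it in as.character() if plain strings are needed.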
Hi Alan,

You can avoid the if() else() you're using in your function by replacing it with the recode() function in the car package. I'm pretty sure it will speed up your function. See ?recode after loading the car package:

require(car)
?recode

HTH,

Jorge

On Wed, Apr 22, 2009 at 2:56 PM, Alan Cohen <CohenA@smh.toronto.on.ca> wrote:
> Hi R users,
>
> I am trying to assign ages to age classes for a large data set (123,000
> records), and using a for-loop was too slow, so I wrote a function and used
> apply. [...]
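The recode() suggestion might look like the sketch below, assuming the car package is installed (the block does nothing if it is not). The specification string uses car's inclusive "a:b" ranges and its hi keyword; ages is a hypothetical stand-in for mds[, 11]. Any speed-up most likely comes from calling recode() once on the whole column rather than inside apply(), which also avoids converting the ages to character.

```r
# Sketch: age classes with car::recode(), applied once to the whole column.
# Assumes the car package is installed; nothing runs if it is not.
ages <- c(3, 82, 40, 1, 6, 15)   # hypothetical stand-in for mds[, 11]

agecls <- NULL
if (requireNamespace("car", quietly = TRUE)) {
  # Ranges are inclusive; 'hi' stands for the largest value in 'ages'.
  agecls <- car::recode(ages, "0:4='0-4'; 5:14='5-14'; 15:29='15-29'; 30:69='30-69'; 70:79='70-79'; 80:hi='80+'")
  print(agecls)
}
```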
The cut() function will do what you want in a vectorized fashion. See ?cut

That being said, I would strongly advise that you read Frank's page on the categorizing of continuous variables:

http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

before you proceed.

HTH,

Marc Schwartz

On Apr 22, 2009, at 1:56 PM, Alan Cohen wrote:
> Hi R users,
>
> I am trying to assign ages to age classes for a large data set
> (123,000 records), and using a for-loop was too slow, so I wrote a
> function and used apply. [...]
Hello,

The "apply" function seems to behave oddly with my code below.

NB: H1 is a data frame (data in the attached file). The first lines are:

1 02/01/2008 0.000000 0 0 0.000000 0
2 03/01/2008 0.000000 0 0 0.000000 0
3 04/01/2008 0.000000 0 0 0.000000 0
4 07/01/2008 0.000000 0 0 0.000000 0
5 08/01/2008 0.000000 0 0 0.000000 0
6 09/01/2008 0.000000 0 0 0.000000 0
7 10/01/2008 0.000000 0 0 0.000000 0
8 11/01/2008 1.010391 0 0 1.102169 0
...

The aim of the code is to extract those lines for which there is a strictly positive value in the second column AND in one of the others:

reper=function(x){as.numeric(x[2]>1 & any(x[3:length(x)]>1))}

TAB1= H1[which(apply(H1,1,reper)>0),]

Strangely, this works for all the lines except the last one. In H1, the last two lines are:

258 29/12/2008 1.476535 1.187615 0 0.000000 0
259 30/12/2008 0.000000 1.147888 0 0.000000 0

Line 258 should obviously be the last line of TAB1, but it is not (it does not appear at all), and I really don't understand why. This is all the stranger since applying the function "reper" to line 258 alone gives a "1" as expected...

Can someone help?

Thanks,

Henri

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: data.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090520/6798b6f1/attachment-0002.txt>
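A possible cause, not confirmed here because the attached data is not reproduced: apply() converts H1 with as.matrix(), and the date column makes that matrix character, so x[2] > 1 inside reper() compares strings rather than numbers. The padding that format() adds to numeric columns can then change individual comparisons, while a single row formatted on its own gets no padding, which would explain why reper() works on line 258 in isolation. A vectorized selection avoids apply() entirely. H1_demo is a hypothetical stand-in built from the three rows quoted in the post (8, 258 and 259), with made-up column names.

```r
# Sketch: vectorized row selection, avoiding apply() on a data frame.
# H1_demo is hypothetical: rows 8, 258 and 259 from the post, made-up names.
H1_demo <- data.frame(
  date = c("11/01/2008", "29/12/2008", "30/12/2008"),
  v1 = c(1.010391, 1.476535, 0),
  v2 = c(0, 1.187615, 1.147888),
  v3 = c(0, 0, 0),
  v4 = c(1.102169, 0, 0),
  v5 = c(0, 0, 0),
  stringsAsFactors = FALSE,
  row.names = c("8", "258", "259")
)

# Inside apply(), each row arrives as a character vector, because the
# date column forces as.matrix() to build a character matrix.
row_types <- apply(H1_demo, 1, function(x) class(x[2]))
print(row_types)   # "character" for every row

# Same test as reper(), on whole columns with numeric comparisons:
# column 2 > 1 and at least one of columns 3..ncol > 1.
others <- H1_demo[, 3:ncol(H1_demo)]
keep <- H1_demo[[2]] > 1 & rowSums(others > 1) > 0
TAB1 <- H1_demo[keep, ]
print(TAB1)        # rows "8" and "258"
```

reper() tests > 1; if "strictly positive" is meant literally, both thresholds would be > 0.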