Hello,

I am a beginner in R programming and recently heard about this mailing list. I am stuck on a simple problem for which I just can't find a solution. I have a huge dataset (~81,000 observations) that has been analyzed, and the final result is a single column of 0s and 1s.

I need to write code to process this column in a slightly complicated way. The 81,000 observations are actually 9,000 sets of 9 observations each (81,000/9). Within each set, once a zero appears, all remaining observations in that set should become zero.

For example, if the column has:

111110111111011111111111111111111....

the output should look like:

111110000111000000111111111111111...

I hope this makes sense.

Thank you in anticipation,

Pravin Jadhav
On Wed, 24 Dec 2003, Pravin wrote:

> I am a beginner in R programming and recently heard about this mailing list.
> Currently, I am trapped into a simple problem for which I just can't find a
> solution. I have a huge dataset (~81,000 observations) that has been

BTW, that is quite a small dataset these days: not even 10 million is `huge'.

> analyzed and the final result is in the form of 0 and 1(one column).
>
> I need to write a code to process this column in a little complicated way.
> These 81,000 observations are actually 9,000 sets (81,000/9).
> So, in each set whenever zero appears, rest all observations become zero.
>
> For example;
>
> If the column has:
>
> 111110111111011111111111111111111....
>
> The output should look like:
>
> 111110000111000000111111111111111...

Let me see if I understand you. This was really

111110111 111011111 111111111 111111...

and you want

111110000 111000000 111111111 111111...

So let's treat it as a matrix (extending to 4 complete sets):

    x <- as.numeric(strsplit("111110111111011111111111111111111011", NULL)[[1]])
    xx <- matrix(x, ncol=9, byrow=TRUE)

Then a simple loop

    for(i in 2:9) xx[,i] <- xx[,i] & xx[,i-1]

gives me the second matrix, which I can read out as a vector as

    as.vector(t(xx))
     [1] 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0

or in what I understand as your format

    paste(t(xx), collapse="")
    [1] "111110000111000000111111111111111000"

Doing this with 81000 random 0/1's took a fraction of a second.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
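The matrix approach in the reply above can be wrapped in a small function so that it is easy to reuse on the full column. This is only a sketch: the function name `zero_after_first_zero` and the `set_size` argument are illustrative and do not come from the thread, and the code assumes the vector contains complete sets only.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread):
# zero out everything after the first 0 within each consecutive block of `set_size`.
zero_after_first_zero <- function(x, set_size = 9) {
  # Assumption: the vector holds complete sets only.
  stopifnot(length(x) %% set_size == 0)
  # One row per set; byrow = TRUE keeps the original order within each set.
  m <- matrix(x, ncol = set_size, byrow = TRUE)
  # Carry a 0 forward column by column: a cell stays 1 only if
  # every earlier cell in its row is also 1.
  for (i in seq_len(set_size)[-1]) m[, i] <- m[, i] & m[, i - 1]
  # Read the matrix back out row by row.
  as.vector(t(m))
}

x <- as.numeric(strsplit("111110111111011111111111111111111011", NULL)[[1]])
res <- zero_after_first_zero(x)
paste(res, collapse = "")
```

The length check is a deliberate choice: matrix() would otherwise recycle or truncate values when the length is not a multiple of the set size, which could hide a data problem.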
In R, always start by getting the result on a small unit.
Begin by writing a function that makes the replacements for ONE vector (of size 9):

    FillWith=function(vec,SearchForOne=0,ReplaceNextValues=0)
    {
      # positions of the searched value; only the first one matters
      pp=which(vec==SearchForOne)
      if (length(pp)>0) vec[pp[1]:length(vec)]=ReplaceNextValues
      return(vec)
    }

Verify it works:

    > FillWith(c(1,1,0,1,1))
    [1] 1 1 0 0 0

Then apply it to your data, using one of the ?apply functions. Here, tapply seems adequate.

    > data=c(rep(1,9),rep(1,4),0,rep(1,4))
    > data
     [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
    > data=cbind(data,groups=((1:length(data)-1)%/%9))
    > data
          data groups
     [1,]    1      0
     [2,]    1      0
     [3,]    1      0
     [4,]    1      0
     [5,]    1      0
     [6,]    1      0
     [7,]    1      0
     [8,]    1      0
     [9,]    1      0
    [10,]    1      1
    [11,]    1      1
    [12,]    1      1
    [13,]    1      1
    [14,]    0      1
    [15,]    1      1
    [16,]    1      1
    [17,]    1      1
    [18,]    1      1
    > tapply(data[,1],data[,2],FUN=FillWith)
    $"0"
    [1] 1 1 1 1 1 1 1 1 1

    $"1"
    [1] 1 1 1 1 0 0 0 0 0

And then come back to a vector with unlist().

Eric

At 08:27 24/12/2003, Pravin wrote:
>These 81,000 observations are actually 9,000 sets (81,000/9).
>
>So, in each set whenever zero appears, rest all observations become zero.

--------------------------------------------------
To err is certainly human, but a real disaster requires one or two computers.
   Anonymous quote
--------------------------------------------------
Eric Lecoutre
Computer Scientist/Statistician
Institut de Statistique / UCL

TEL (+32)(0)10473050
lecoutre at stat.ucl.ac.be
URL http://www.stat.ucl.ac.be/ISpersonnel/lecoutre
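The final step mentioned in the reply above, coming back to a vector with unlist(), might look like the following sketch. The helper name `fill_after_first` and the variable names are illustrative, not from the thread; the helper is a stand-in for any per-set function that zeroes everything from the first 0 onwards.

```r
# Hypothetical stand-in for a per-set function (name is illustrative).
fill_after_first <- function(vec, search = 0, replacement = 0) {
  pos <- which(vec == search)
  # Only the first match matters; everything from there on is overwritten.
  if (length(pos) > 0) vec[pos[1]:length(vec)] <- replacement
  vec
}

values <- c(rep(1, 9), rep(1, 4), 0, rep(1, 4))
# Group index 0, 0, ..., 1, 1, ...: integer division of the 0-based position by 9.
groups <- (seq_along(values) - 1) %/% 9

per_set <- tapply(values, groups, FUN = fill_after_first)
# tapply() returns the groups in sorted order of the index, which matches the
# original order here because the sets are consecutive.
result <- unlist(per_set, use.names = FALSE)
result
```

Passing `use.names = FALSE` drops the composite names that unlist() would otherwise build from the group label and the position, which keeps the result directly comparable with the input column.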
Pravin wrote:
> These 81,000 observations are actually 9,000 sets (81,000/9).
>
> So, in each set whenever zero appears, rest all observations become zero.

Here is an example:

    set.seed(101)
    v <- sample(c(0, 1), size = 36, replace = TRUE, prob = c(.05, .95))
    L <- length(v) / 9              # number of sets
    idx <- rep(seq(L), each = 9)    # set index for each observation
    fn <- function(x){
      ok <- FALSE                   # becomes TRUE once a 0 has been seen
      for(i in seq(length(x))){
        if(x[i] == 0) ok <- TRUE
        x[i] <- if(ok) 0 else 1
        }
      x
      }
    cbind(idx, v, recod = unlist(tapply(v, idx, fn)))

       idx v recod
    11   1 1     1
    12   1 1     1
    13   1 1     1
    14   1 1     1
    15   1 1     1
    16   1 1     1
    17   1 1     1
    18   1 1     1
    19   1 1     1
    21   2 1     1
    22   2 1     1
    23   2 1     1
    24   2 1     1
    25   2 1     1
    26   2 1     1
    27   2 1     1
    28   2 1     1
    29   2 1     1
    31   3 1     1
    32   3 1     1
    33   3 1     1
    34   3 0     0
    35   3 1     0
    36   3 1     0
    37   3 1     0
    38   3 1     0
    39   3 1     0
    41   4 1     1
    42   4 1     1
    43   4 1     1
    44   4 1     1
    45   4 1     1
    46   4 1     1
    47   4 1     1
    48   4 1     1
    49   4 1     1

Merry Christmas!

Renaud

--
Dr Renaud Lancelot
veterinary epidemiologist
Ambassade de France - SCAC
BP 834 Antannarivo 101
Madagascar
tel. +261 (0)32 04 824 55 (cell)
     +261 (0)20 22 494 37 (home)
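The set index in the example above is built with rep(seq(L), each = 9), which relies on the length being an exact multiple of 9. As a sketch of one alternative, the index can be computed from the position of each observation, which also copes with a trailing partial set. The data and the name `set_size` below are illustrative assumptions, not from the thread.

```r
# Illustrative data, not from the thread: 40 values, i.e. four full sets
# plus a partial set of 4 values.
v <- c(rep(1, 12), 0, rep(1, 27))
set_size <- 9

# Assumption: sets are consecutive runs of `set_size` values.
# ceiling() assigns positions 1..9 to set 1, 10..18 to set 2, and so on;
# a trailing partial set simply gets its own, shorter, group.
idx <- ceiling(seq_along(v) / set_size)

# Number of observations per set, as a quick check.
table(idx)
```

Whether a partial set should be processed or rejected depends on the data; checking table(idx) first makes the situation visible before any recoding is done.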
In addition to the previous replies, try this:

    x <- as.numeric(strsplit("111110111111011111111111111", NULL)[[1]])
    g <- rep(1:3, each=9)   # set numbering
    rbind(x, g)             # to check

    y <- unlist( tapply(x, g, cummin) )

    > y
    11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 31 32 33 34 35 36 37 38 39
     1  1  1  1  1  0  0  0  0  1  1  1  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1

tapply() applies a given function, in this case cummin(), to the sets defined by 'g'. cummin() returns the cumulative minimum, so within each set every value after the first 0 becomes 0.

Here, each name of the vector y combines the set number and the position of the observation within the set.

--
Adaikalavan Ramasamy

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Renaud Lancelot
Sent: Wednesday, December 24, 2003 5:00 PM
To: Pravin
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] coding logic and syntax in R

Pravin wrote:
> So, in each set whenever zero appears, rest all observations become zero.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
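To see why the cumulative minimum does the job in the last reply, it may help to look at cummin() on a single set first. The sketch below also uses ave() from base R as one possible alternative to the tapply()/unlist() combination; that choice is not from the thread, and it is offered only as a variant that returns a plain vector in the original order.

```r
x <- as.numeric(strsplit("111110111111011111111111111", NULL)[[1]])
g <- rep(1:3, each = 9)   # set numbering, as in the reply above

# cummin() on a single set: the running minimum drops to 0 at the first 0
# and stays there for the rest of the vector.
cummin(c(1, 1, 0, 1, 1))

# ave() applies FUN within each group and returns a vector in the original
# order, so there is no unlist() step and no composite names.
y <- ave(x, g, FUN = cummin)
paste(y, collapse = "")
```

Because ave() puts each result back at the position of the corresponding input value, it should also work when the sets are not stored consecutively, as long as `g` labels them correctly.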