Hello,

I am a beginner in R programming and recently heard about this mailing list. I am stuck on a simple problem for which I just can't find a solution. I have a huge dataset (~81,000 observations) that has been analyzed, and the final result is a single column of 0s and 1s.

I need to write code to process this column in a slightly complicated way. The 81,000 observations are actually 9,000 sets of 9 (81,000/9). Within each set, once a zero appears, all the remaining observations in that set should become zero.

For example, if the column has:

111110111111011111111111111111111....

the output should look like:

111110000111000000111111111111111...

I hope this makes sense.

Thank you in anticipation,

Pravin Jadhav
On Wed, 24 Dec 2003, Pravin wrote:

> I am a beginner in R programming and recently heard about this mailing list.
> Currently, I am trapped into a simple problem for which I just can't find a
> solution. I have a huge dataset (~81,000 observations) that has been

BTW, that is quite a small dataset these days: not even 10 million is `huge'.

> analyzed and the final result is in the form of 0 and 1(one column).
>
> I need to write a code to process this column in a little complicated way.
> These 81,000 observations are actually 9,000 sets (81,000/9).
> So, in each set whenever zero appears, rest all observations become zero.
>
> For example;
>
> If the column has:
>
> 111110111111011111111111111111111....
>
> The output should look like:
>
> 111110000111000000111111111111111...

Let me see if I understand you. This was really

111110111 111011111 111111111 111111...

and you want

111110000 111000000 111111111 111111...

So let's treat it as a matrix (extending to 4 complete sets):

x <- as.numeric(strsplit("111110111111011111111111111111111011", NULL)[[1]])
xx <- matrix(x, ncol=9, byrow=TRUE)

Then a simple loop

for(i in 2:9) xx[,i] <- xx[,i] & xx[,i-1]

gives me the second matrix, which I can read out as a vector as

as.vector(t(xx))
 [1] 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0

or in what I understand as your format

paste(t(xx), collapse="")
[1] "111110000111000000111111111111111000"

Doing this with 81000 random 0/1's took a fraction of a second.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
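As a minimal sketch, the matrix approach above can be wrapped in a function. The name zero_after_first, the set_size argument and the simulated data are illustrative assumptions, not part of the original reply. The sketch assumes the input holds only 0/1 values and that its length is an exact multiple of the set size.

```r
# Hypothetical helper (name not from the thread): zero out everything
# after the first 0 within each set.
# Assumes x holds only 0/1 and length(x) is a multiple of set_size.
zero_after_first <- function(x, set_size = 9) {
  stopifnot(set_size >= 2, length(x) %% set_size == 0)
  xx <- matrix(x, ncol = set_size, byrow = TRUE)  # one set per row
  for (i in 2:set_size) {
    # a column stays 1 only if it and the (already updated) previous column are 1
    xx[, i] <- xx[, i] & xx[, i - 1]
  }
  as.vector(t(xx))                                # back to the original order
}

x <- as.numeric(strsplit("111110111111011111111111111111111011", NULL)[[1]])
res <- zero_after_first(x)
paste(res, collapse = "")
# [1] "111110000111000000111111111111111000"

# Simulated data of the size mentioned in the question (81,000 values)
big <- rbinom(81000, 1, 0.9)
big_res <- zero_after_first(big)
```

The loop runs over the 9 columns rather than the 9,000 rows, so each step is a vectorized operation on a whole column.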
In R, always start by getting the result you want on one small unit.
First write a function that makes the replacements for ONE vector (of size 9):
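FillWith=function(vec,SearchForOne=0,ReplaceNextValues=0)
{
# positions of every element equal to the searched value
pp=which(vec==SearchForOne)
# use only the first match: pp holds several positions if the set has several zeros
if (length(pp)>0) vec[pp[1]:length(vec)]=ReplaceNextValues
return(vec)
}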
Verify it works:
> FillWith(c(1,1,0,1,1))
[1] 1 1 0 0 0
Then apply it to your data, using one of the ?apply functions.
Here, tapply seems adequate.
> data=c(rep(1,9),rep(1,4),0,rep(1,4))
> data
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
> data=cbind(data,groups=((1:length(data)-1)%/%9))
> data
      data groups
 [1,]    1      0
 [2,]    1      0
 [3,]    1      0
 [4,]    1      0
 [5,]    1      0
 [6,]    1      0
 [7,]    1      0
 [8,]    1      0
 [9,]    1      0
[10,]    1      1
[11,]    1      1
[12,]    1      1
[13,]    1      1
[14,]    0      1
[15,]    1      1
[16,]    1      1
[17,]    1      1
[18,]    1      1
> tapply(data[,1],data[,2],FUN=FillWith)
$"0"
[1] 1 1 1 1 1 1 1 1 1
$"1"
[1] 1 1 1 1 0 0 0 0 0
Then convert the result back to a vector with unlist().
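The last step can be sketched as follows. FillWith is redefined here only so that the block runs on its own; it indexes with pp[1] so that a set containing several zeros is handled too. The variable names x, groups, res and out are illustrative. The sketch assumes the sets are contiguous and numbered in increasing order, so that tapply returns them in the original order.

```r
# FillWith as described above, using only the first matching position
FillWith <- function(vec, SearchForOne = 0, ReplaceNextValues = 0) {
  pp <- which(vec == SearchForOne)
  if (length(pp) > 0) vec[pp[1]:length(vec)] <- ReplaceNextValues
  vec
}

x <- c(rep(1, 9), rep(1, 4), 0, rep(1, 4))
groups <- (seq_along(x) - 1) %/% 9         # 0 for the first set, 1 for the second
res <- tapply(x, groups, FUN = FillWith)   # one processed vector per set
out <- unlist(res, use.names = FALSE)      # flatten back to a plain vector
out
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
```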
Eric
--------------------------------------------------
To err is certainly human, but a real disaster
takes one or two computers. Anonymous quotation
--------------------------------------------------
Eric Lecoutre
Computer Scientist/Statistician
Institut de Statistique / UCL
TEL (+32)(0)10473050 lecoutre at stat.ucl.ac.be
URL http://www.stat.ucl.ac.be/ISpersonnel/lecoutre
Here is an example:

set.seed(101)
v <- sample(c(0, 1), size = 36, replace = TRUE, prob = c(.05, .95))
L <- length(v) / 9
idx <- rep(seq(L), each = 9)
fn <- function(x){
  ok <- FALSE
  for(i in seq(length(x))){
    if(x[i] == 0) ok <- TRUE
    x[i] <- if(ok) 0 else 1
  }
  x
}
cbind(idx, v, recod = unlist(tapply(v, idx, fn)))
   idx v recod
11   1 1     1
12   1 1     1
13   1 1     1
14   1 1     1
15   1 1     1
16   1 1     1
17   1 1     1
18   1 1     1
19   1 1     1
21   2 1     1
22   2 1     1
23   2 1     1
24   2 1     1
25   2 1     1
26   2 1     1
27   2 1     1
28   2 1     1
29   2 1     1
31   3 1     1
32   3 1     1
33   3 1     1
34   3 0     0
35   3 1     0
36   3 1     0
37   3 1     0
38   3 1     0
39   3 1     0
41   4 1     1
42   4 1     1
43   4 1     1
44   4 1     1
45   4 1     1
46   4 1     1
47   4 1     1
48   4 1     1
49   4 1     1

Merry Christmas !

Renaud

--
Dr Renaud Lancelot
veterinary epidemiologist
Ambassade de France - SCAC
BP 834 Antannarivo 101
Madagascar
tel. +261 (0)32 04 824 55 (cell)
     +261 (0)20 22 494 37 (home)
In addition to the previous replies, try this
x <- as.numeric(strsplit("111110111111011111111111111", NULL)[[1]])
g <- rep(1:3, each=9) # set numbering
rbind(x, g) # to check
y <- unlist( tapply(x, g, cummin) )
> y
11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 31 32 33 34 35 36 37 38 39
 1  1  1  1  1  0  0  0  0  1  1  1  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1
tapply() applies a given function, in this case cummin(), to the sets defined by
'g'.
cummin() returns the cumulative minimum, so within each set the result stays at 1
until the first 0 and is 0 from then on.
Here, the names of the vector y are a combination of the set number and the
position of the observation within the set.
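To make the behaviour of cummin() concrete, here is a small sketch on a single vector, followed by a variant using ave() from base R, which is not mentioned in the thread. ave() applies the function within each group and returns an unnamed vector in the original order, so no unlist() step is needed. The data are the same as in the example above.

```r
x <- as.numeric(strsplit("111110111111011111111111111", NULL)[[1]])
g <- rep(1:3, each = 9)   # set numbering

# cumulative minimum of one vector: once a 0 is seen, the result stays 0
cummin(c(1, 1, 0, 1, 1))
# [1] 1 1 0 0 0

# same computation per set, returned in the original order without names
y <- ave(x, g, FUN = cummin)
paste(y, collapse = "")
# [1] "111110000111000000111111111"
```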
--
Adaikalavan Ramasamy
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Renaud Lancelot
Sent: Wednesday, December 24, 2003 5:00 PM
To: Pravin
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] coding logic and syntax in R