Jorge Iván Vélez
2008-Feb-14 20:22 UTC
[R] Replacing columns in a data frame using a previous condition
Dear R-list,

I'm working with a data frame whose dimensions are

> dim(GERU)
[1] 3468  318

and which looks like

> GERU[1:10,1:10]
       ped ind par1 par2 sex sta rs7696470 rs7696470.1 rs1032896 rs1032896.1
1  USA5854   2    0    0   2   1         4           4         1           1
2  USA5854   3    1    2   1   1         4           4         1           1
3  USA5854   4    1    2   2   2         1           4         1           3
4  USA5854   5    1    2   1   2         4           2         2           1
5  USA5855   1    0    0   1   1         0           0         0           0
6  USA5855   2    0    0   2   2         1           0         0           0
7  USA5855   3    1    2   1   2         0           2         0           0
8  USA5855   4    1    2   1   1         2           0         2           1
9  USA5855   5    1    2   1   2         0           1         0           0
10 USA5856   1    0    0   1   1         3           3         3           3

What I would like to do is:

1. Identify which columns (from 6 to 318) have more than 4 categories (I solved that). In GERU these would be rs7696470 and rs7696470.1.
2. In the columns found in step 1, replace every entry equal to 2 with 3. For example, rs7696470 would become 4,4,1,4,0,1,0,3,0,3 and so on.
3. Once the entries are replaced, write the modified columns back into GERU.

Here is what I've done:

> # Function to identify columns with more than 4 categories
> tx=function(x) ifelse(dim(table(x))>4,1,0)

> # Identifying the columns
> M4=apply(GUPN[,-c(1:6)],2,tx)
> names(which(M4==1)) # Step 1
 [1] "rs335322"     "rs335322.1"   "rs186750"     "rs186750.1"   "rs1565901"    "rs1565901.1"  "rs1565902"
 [8] "rs1565902.1"  "rs11131334"   "rs11131334.1" "rs1948616"    "rs1948616.1"  "rs4484334"    "rs4484334.1"
[15] "rs1497921"    "rs1497921.1"  "rs1391320"    "rs1391320.1"  "rs1497913"    "rs1497913.1"  "rs996208"
[22] "rs996208.1"

> # Step 2
> REPLACE=GUPN[,names(which(M4==1))]
> RES=apply(REPLACE,2,function(x) ifelse(x==2,3,x))
> RES[1:10,1:5]
   rs335322 rs335322.1 rs186750 rs186750.1 rs1565901
1         1          3        3          3         3
2         1          1        3          3         3
3         3          3        1          3         3
4         1          3        3          3         3
5         0          0        0          0         0
6         0          0        0          0         0
7         0          0        0          0         0
8         0          0        0          0         0
9         0          0        0          0         0
10        1          3        3          3         1

Now, the problem I have is replacing the columns in GERU with the columns in RES (step 3). At the end, the dimensions of the new data set should still be 3468x318. Any help would be greatly appreciated.

Thank you so much,

Jorge
jim holtman
2008-Feb-14 20:41 UTC
[R] Replacing columns in a data frame using a previous condition
Is this what you want to do?

> x <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
> z <- cbind(c=11:20, d=11:20)
> z
       c  d
 [1,] 11 11
 [2,] 12 12
 [3,] 13 13
 [4,] 14 14
 [5,] 15 15
 [6,] 16 16
 [7,] 17 17
 [8,] 18 18
 [9,] 19 19
[10,] 20 20
> x[,colnames(z)] <- z[, colnames(z)]
> x
    a  b  c  d
1   1  1 11 11
2   2  2 12 12
3   3  3 13 13
4   4  4 14 14
5   5  5 15 15
6   6  6 16 16
7   7  7 17 17
8   8  8 18 18
9   9  9 19 19
10 10 10 20 20

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
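
For readers following along, here is a minimal sketch of how the name-based assignment above might be applied to the three steps in the original question. The data frame `geru` and its columns `m1`, `m2`, `m3` are made-up stand-ins for illustration, not the poster's data, and `sapply()` with `unique()` is swapped in for the poster's `apply()`/`table()` test in step 1.

```r
# Hypothetical toy stand-in for GERU: two ID columns, three marker columns.
geru <- data.frame(
  ped = c("A", "A", "A", "B", "B", "B"),
  ind = 1:6,
  m1  = c(0, 1, 2, 3, 4, 2),   # 5 distinct values: should be recoded
  m2  = c(0, 1, 2, 1, 2, 0),   # 3 distinct values: should be left alone
  m3  = c(4, 3, 2, 1, 0, 2)    # 5 distinct values: should be recoded
)

marker_cols <- 3:ncol(geru)

# Step 1: names of the marker columns with more than 4 distinct values.
many   <- sapply(geru[marker_cols], function(x) length(unique(x)) > 4)
target <- names(many)[many]

# Step 2: recode 2 -> 3 in those columns only; apply() returns a matrix
# whose column names are the selected column names.
res <- apply(geru[, target, drop = FALSE], 2,
             function(x) ifelse(x == 2, 3, x))

# Step 3: write the matrix columns back by name, as in the x/z example.
geru[, colnames(res)] <- res[, colnames(res)]

dim(geru)   # same dimensions as before the replacement
```

Because the columns are matched by name rather than by position, the order of the columns in `res` does not matter, and columns that were not selected are left untouched.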
Dimitris Rizopoulos
2008-Feb-14 20:44 UTC
[R] Replacing columns in a data frame using a previous condition
try this:

GERU[6:318] <- lapply(GERU[6:318], function (x) {
    if (length(unique(x[!is.na(x)])) >= 5)
        x[x == 2] <- 3
    x
})

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm
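
As an illustration only, the one-step `lapply()` approach above can be tried on a small made-up data frame. The names `toy`, `recode_if_many`, `min_levels`, `from` and `to` are hypothetical and do not come from the thread; the sketch also uses `which()` so that missing values are skipped explicitly during the replacement.

```r
# Hypothetical toy data; 'toy' and its column names are invented.
toy <- data.frame(
  id = 1:6,
  a  = c(0, 1, 2, 3, 4, NA),   # 5 distinct non-missing values: recoded
  b  = c(0, 1, 2, 3, NA, 2)    # 4 distinct non-missing values: untouched
)

# Recode 'from' to 'to' only when the column has at least 'min_levels'
# distinct non-missing values; otherwise return the column unchanged.
recode_if_many <- function(x, min_levels = 5, from = 2, to = 3) {
  if (length(unique(x[!is.na(x)])) >= min_levels) {
    x[which(x == from)] <- to   # which() drops the NA comparisons
  }
  x
}

# Assigning the list back into the same columns keeps the data frame's
# dimensions and column order.
cols <- 2:ncol(toy)
toy[cols] <- lapply(toy[cols], recode_if_many)

toy
```

One design point worth noting: `lapply()` works on each column separately, so every column keeps its own type, whereas `apply()` first converts the selected columns to a single matrix.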