Clive Nicholas
2014-Jul-06 15:38 UTC
[R] Expanding dataset on the values of one of its variables
Hello!

I have a dataset which is perhaps rather topical at about this time:

> wc=read.delim("/home/openclive/Documents/worldcup.csv",header=T,sep="\t",fill=T)
> head(wc,n=20)
   team year time score out top goals host format formed culture wcups cholder times
1   ARG 1986    1     6   0   1     4    0      0   1893      93     8       0     6
2   ARG 1990    2     5   1   0     1    0      0   1893      97     9       0     6
3   ARG 1994    3     1   1   0     3    0      0   1893     101    10       1     6
4   ARG 1998    4     2   1   1     7    0      1   1893     105    11       0     6
5   ARG 2006    6     2   1   1     7    0      1   1893     113    13       0     6
6   ARG 2010    7     2   1   1     6    0      1   1893     117    14       0     6
7   AUS 2006    6     1   1   0     0    0      1   1961      45     1       1     1
8   BEL 1986    1     3   1   0     0    0      0   1895      91     6       0     4
9   BEL 1990    2     1   1   0     3    0      0   1895      95     7       0     4
10  BEL 1994    3     1   1   0     1    0      0   1895      99     8       0     4
11  BEL 2002    5     1   1   0     1    0      1   1895     107     9       0     4
12  BRA 1986    1     2   1   1     5    0      0   1914      72    12       0     7
13  BRA 1990    2     1   1   1     3    0      0   1914      76    13       1     7
14  BRA 1994    3     6   0   1     5    0      0   1914      80    14       0     7
15  BRA 1998    4     5   1   1     3    0      1   1914      84    15       1     7
16  BRA 2002    5     6   0   1     8    0      1   1914      88    16       0     7
17  BRA 2006    6     2   1   1     6    0      1   1914      92    17       1     7
18  BRA 2010    7     2   1   1     3    0      1   1914      96    18       0     7
19  BUL 1986    1     1   1   0    -2    0      0   1923      63     4       0     2
20  BUL 1994    3     3   1   0     3    0      0   1923      71     5       0     2
   cards       gdp
1     12  6146.155
2     25  5800.057
3      9  7162.093
4     14  7994.116
5     15  8107.975
6      7  9933.229
7     12  9933.229
8      8 16273.539
9      3 18222.221
10     6 18964.370
11     6 22801.777
12     3  3334.000
13     7  3564.636
14    11  3380.128
15    12  3693.276
16    10  3692.840
17    11  3976.619
18    13  4424.759
19     3  1508.592
20    25  1438.153

Don't worry about what they all denote; I merely show it to you for the purposes of demonstration.

Basically, I want to expand the dataset on the value of the -score- variable in order to prepare the data for survival analysis. Thus, where Argentina achieved a score of 6 in 1986, I want R to create five additional records. Where Bulgaria achieved a score of 3 in 1994, I want it to create two additional records. Rows containing scores of 1 shouldn't create any extra records at all.

Doing this should be straightforward, but the -expand- command in the -reshape- package doesn't appear to do this ... unless I've missed something or there is another command from another package that does what I need.

I'd be most grateful if anybody has a solution to this.

--
Clive Nicholas

"My colleagues in the social sciences talk a great deal about methodology. I prefer to call it style." -- Freeman J. Dyson
Hi,

Not sure about the expected output. If `dat` is the dataset:

res <- dat[rep(1:nrow(dat), dat$score),]
head(res,7)
    team year time score out top goals host format formed culture wcups cholder
1    ARG 1986    1     6   0   1     4    0      0   1893      93     8       0
1.1  ARG 1986    1     6   0   1     4    0      0   1893      93     8       0
1.2  ARG 1986    1     6   0   1     4    0      0   1893      93     8       0
1.3  ARG 1986    1     6   0   1     4    0      0   1893      93     8       0
1.4  ARG 1986    1     6   0   1     4    0      0   1893      93     8       0
1.5  ARG 1986    1     6   0   1     4    0      0   1893      93     8       0
2    ARG 1990    2     5   1   0     1    0      0   1893      97     9       0
    times cards      gdp
1       6    12 6146.155
1.1     6    12 6146.155
1.2     6    12 6146.155
1.3     6    12 6146.155
1.4     6    12 6146.155
1.5     6    12 6146.155
2       6    25 5800.057

A.K.
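
For anyone who wants to try the row-repetition idea from the reply without the original file, here is a minimal sketch on a three-row toy data frame. The `team`, `year` and `score` values are copied from the post. The `spell` column and the step that resets the row names are my own illustrative additions, not part of the reply, and may or may not be what a particular survival model needs.

```r
# Toy stand-in for `wc`: three rows with values taken from the post
dat <- data.frame(
  team  = c("ARG", "ARG", "BUL"),
  year  = c(1986, 1994, 1994),
  score = c(6, 1, 3),
  stringsAsFactors = FALSE
)

# Repeat each row index `score` times, then subset by the repeated index.
# seq_len() is used instead of 1:nrow(dat) so an empty data frame is handled.
idx <- rep(seq_len(nrow(dat)), times = dat$score)
res <- dat[idx, ]

# Hypothetical helper column (an assumption, not in the thread):
# the position of each copy within its original row, running 1..score
res$spell <- sequence(dat$score)

# Optional: replace row names such as "1.1", "1.2" with plain integers
rownames(res) <- NULL

print(res)
```

A score of 6 yields six rows in total (the original plus five copies) and a score of 1 yields just the original row, which matches the request. This relies on `score` being a non-negative whole number with no missing values: a score of 0 would drop the row entirely, and an `NA` would make `rep()` fail.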