Hello,
I have a two-column data frame that
has entries that look like this:
2315100 NR_024005,NR_024004,AK093685
2315106 DQ786314
and I want to change this to look like this:
2315100 NR_024005
2315100 NR_024004
2315100 AK093685
2315106 DQ786314
I can do this with the following "for" loop but the dataframe (GPL)
has ~140,000 rows and this takes about 15 minutes:
extGPL <- matrix(nrow=0,ncol=2)
for (i in 1:length(GPL[,2])){
  aa <- unlist(strsplit(as.character(GPL[i,2]),"\\,"))
  bb <- rep(as.numeric(as.character(GPL[i,1])), length(aa))
  cc <- matrix(c(bb,aa),ncol = 2)
  extGPL <- rbind(extGPL,cc)
}
Is there a way to vectorize this?
Thanks,
Dylan Miracle
University of Minnesota
GCD Department
Try this:

do.call(rbind.data.frame, mapply(cbind, DF$V1, strsplit(as.character(DF$V2), ",")))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
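A note on the one-liner above, offered as a hedged sketch rather than a correction: mapply() simplifies its result by default, so if every row happened to split into the same number of pieces it would return a matrix instead of a list. The variant below swaps in Map(), which never simplifies, and binds the pieces as plain matrices. The data frame DF is a made-up two-row example mirroring the question, and the names pieces, m and res are illustrative only.

```r
# Example data mirroring the two rows in the question (assumed column names V1, V2)
DF <- data.frame(V1 = c(2315100, 2315106),
                 V2 = c("NR_024005,NR_024004,AK093685", "DQ786314"),
                 stringsAsFactors = FALSE)

# Map() is mapply(..., SIMPLIFY = FALSE): each element is a small character matrix
pieces <- Map(cbind, DF$V1, strsplit(as.character(DF$V2), ",", fixed = TRUE))

# Bind all pieces into one character matrix, then build the data frame
m <- do.call(rbind, unname(pieces))
res <- data.frame(V1 = m[, 1], V2 = m[, 2], stringsAsFactors = FALSE)
```

Because cbind() mixes a number with strings, V1 comes back as character here; as.numeric(res$V1) would convert it again if that matters.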
Try this:
GPL <- data.frame(x = c(2315100, 2315106),
                  y = c("NR_024005,NR_024004,AK093685", "DQ786314"))
sp <- strsplit(as.character(GPL$y), ",")
ni <- sapply(sp, length)
data.frame(x = rep(GPL$x, ni), y = unlist(sp))
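In case it is useful, here is a minimal sketch of the same idea wrapped as a reusable function. The function name expand_rows and its arguments are hypothetical, not part of any package; lengths() assumes R >= 3.2.0 and does the same job as sapply(sp, length).

```r
# Hypothetical helper: one output row per comma-separated entry
expand_rows <- function(id, joined, sep = ",") {
  # fixed = TRUE treats sep as a literal string, not a regular expression
  sp <- strsplit(as.character(joined), sep, fixed = TRUE)
  # repeat each id once per piece, then flatten the pieces
  data.frame(x = rep(id, lengths(sp)), y = unlist(sp),
             stringsAsFactors = FALSE)
}

GPL <- data.frame(x = c(2315100, 2315106),
                  y = c("NR_024005,NR_024004,AK093685", "DQ786314"))
extGPL <- expand_rows(GPL$x, GPL$y)
```

One thing to watch: rows whose second column is an empty string would be dropped, since strsplit() returns a zero-length vector for them.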
I hope it helps.
Best,
Dimitris
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Web: http://www.erasmusmc.nl/biostatistiek/
Try this, assuming that the columns of GPL are character. You may need to use as.character first if they are factors:

library(reshape2)
V2 <- strsplit(GPL$V2, ",")
names(V2) <- GPL$V1
melt(V2)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
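For anyone without reshape2 installed, a rough base-R sketch along the same lines uses stack() on the named list. This is a swapped-in technique, not the melt() call above. The data frame below is a made-up two-row example mirroring the question, with assumed column names V1 and V2; the output columns values and ind are the names stack() produces.

```r
# Example data mirroring the question; columns kept as character, not factor
GPL <- data.frame(V1 = c(2315100, 2315106),
                  V2 = c("NR_024005,NR_024004,AK093685", "DQ786314"),
                  stringsAsFactors = FALSE)

V2 <- strsplit(GPL$V2, ",")
names(V2) <- GPL$V1   # names are coerced to character

# stack() returns one row per list element entry:
# 'values' holds the pieces, 'ind' (a factor) holds the list names
long <- stack(V2)
```

Since ind is a factor, as.character(long$ind) or as.numeric(as.character(long$ind)) would be needed to get the IDs back in their original type.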