Jan Hornych
2010-Mar-02 18:35 UTC
[R] Creating matrix from long table in database (pivoting)
Hi all, I have a table in database that is very long and when simplified it has only two columns in it (id, text). id is the row, and text is the column. Technically the text is a term and and id is the document. If simplifying this and assuming there is only one occurrence of the term per the document. I shall be able to convert this into a binary matrix. Table looks like this... *ID** **Text* ------------ 1 this 1 is 1 the 1 first 1 row 2 this 2 is 2 the 2 send 2 row ... in R I would like to have it as *id this is the first second row* ------------------------------------------------ 1 1 1 1 1 0 1 2 1 1 1 0 1 1 it would be simpler for me to do this transformation in R as I guess the language is more handy as the SQL. The table in R have few dozen thousand of columns and rows as well. I know how to read the data from database, but just unsure if there is some suitable transformation available. Thank you Jan [[alternative HTML version deleted]]
Henrique Dallazuanna
2010-Mar-02 18:49 UTC
[R] Creating matrix from long table in database (pivoting)
Try this:
DF <- read.table(textConnection("1 this
1 is
1 the
1 first
1 row
2 this
2 is
2 the
2 send
2 row"))
reshape(DF, v.names = 'V2', idvar = 'V1', timevar =
'V2', direction = 'wide')
On Tue, Mar 2, 2010 at 3:35 PM, Jan Hornych <jh.hornych at gmail.com>
wrote:> Hi all,
>
> I have a table in database that is very long and when simplified it has
only
> two columns in it (id, text). id is the row, and text is the column.
> Technically the text is a term and and id is the document.
> If simplifying this and assuming there is only one occurrence of the term
> per the document. I shall be able to convert this into a binary matrix.
> Table looks like this...
>
> *ID** **Text*
> ------------
> 1 this
> 1 is
> 1 the
> 1 first
> 1 row
> 2 this
> 2 is
> 2 the
> 2 send
> 2 row
> ...
>
>
> in R I would like to have it as
>
> *id ?this is the first second row*
> ------------------------------------------------
> 1 ? ? 1 ?1 ? 1 ? ? 1 ? ? ? ? ?0 ? ? 1
> 2 ? ? 1 ?1 ? 1 ? ? 0 ? ? ? ? ?1 ? ? 1
>
> it would be simpler for me to do this transformation in R as I guess the
> language is more handy as the SQL. The table in R have few dozen thousand
of
> columns and rows as well. I know how to read the data from database, but
> just unsure if there is some suitable transformation available.
>
> Thank you
> Jan
>
> ? ? ? ?[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Henrique Dallazuanna
Curitiba-Paran?-Brasil
25? 25' 40" S 49? 16' 22" O
Phil Spector
2010-Mar-02 19:01 UTC
[R] Creating matrix from long table in database (pivoting)
Jan -
Here's one way:
> tbl = data.frame(id=c(1,1,1,1,1,2,2,2,2,2),
text=c('this','is','the','first','row','this','is','the','second','row'))
> xtabs(~id+text,tbl)
text
id first is row second the this
1 1 1 1 0 1 1
2 0 1 1 1 1 1
It's a bit tricky to automatically get the column headings to
be in the order you want. This comes close:
> tbl$text = factor(tbl$text,levels=tbl$text[!duplicated(tbl$text)])
> xtabs(~id+text,tbl)
text
id this is the first row second
1 1 1 1 1 1 0
2 1 1 1 0 1 1
Hope this helps.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Tue, 2 Mar 2010, Jan Hornych wrote:
> Hi all,
>
> I have a table in database that is very long and when simplified it has
only
> two columns in it (id, text). id is the row, and text is the column.
> Technically the text is a term and and id is the document.
> If simplifying this and assuming there is only one occurrence of the term
> per the document. I shall be able to convert this into a binary matrix.
> Table looks like this...
>
> *ID** **Text*
> ------------
> 1 this
> 1 is
> 1 the
> 1 first
> 1 row
> 2 this
> 2 is
> 2 the
> 2 send
> 2 row
> ...
>
>
> in R I would like to have it as
>
> *id this is the first second row*
> ------------------------------------------------
> 1 1 1 1 1 0 1
> 2 1 1 1 0 1 1
>
> it would be simpler for me to do this transformation in R as I guess the
> language is more handy as the SQL. The table in R have few dozen thousand
of
> columns and rows as well. I know how to read the data from database, but
> just unsure if there is some suitable transformation available.
>
> Thank you
> Jan
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>