thr3ads.net - R help - [R] Help with [Oct 2012]


Rui Esteves

2012-Oct-18 12:44 UTC

[R] Help with

Hi,

I downloaded a dataset from UCI repositories named Bag of Words:
http://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words/readme.txt


The dataset is in a text file with the following structure:
---

docID1 wordID1 count
docID1 wordID2 count
docID1 wordID3 count
docID1 wordID4 count
...
docID2 wordID2 count
docID2 wordID5 count
docID2 wordID6 count
---

Here docIDx is an integer that identifies document x, wordIDy is an
integer that identifies word y, and count is the number of times that
wordIDy appears in docIDx.


Example:

---

1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
---

I would like to import the file into a matrix (not sparse) where:

the wordIDy would correspond to the column [,y]

the docIDx would correspond to the row [x,]

the value in [x,y] would be the count of wordIDy in the docIDx

So, for the previous example it would be like:


    [,1][,2][,3][,4][,5]

[1,]  3   54  11 17   0

[2,]  5    0   0 78  20


I don't have a clue about how to do this.

Can someone please help me?

Thank you

Rui


Rui Barradas

2012-Oct-18 13:14 UTC


Hello,

It's much easier than you think. The first two columns of the input
matrix are the row and column numbers in the output matrix, so those
two columns form an index matrix. Just see:

x <- scan(text="
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
")

mat <- matrix(x, ncol = 3, byrow=TRUE)

result <- matrix(0, max(mat[, 1]), max(mat[, 2]))
result[ mat[, 1:2] ] <- mat[, 3]


Easy, no?

Hope this helps,

Rui Barradas
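
For a real file rather than inline text, the same index-matrix idea can be wrapped in a small function. This is only a sketch: the name triples_to_matrix and its skip argument are made up for illustration. The skip argument is there because the file may begin with header lines (the linked readme appears to describe three, D, W and NNZ, before the triples; check your copy and pass skip = 3 if so). The drop = FALSE guard keeps the index a two-column matrix even when the file holds a single triple.

```r
# Hypothetical helper: build a dense document-by-word count matrix
# from a text file of "docID wordID count" triples.
triples_to_matrix <- function(path, skip = 0) {
  # Read all integers in the file, ignoring the first `skip` lines.
  x <- scan(path, what = integer(), skip = skip, quiet = TRUE)
  mat <- matrix(x, ncol = 3, byrow = TRUE)

  # Start from all zeros, then fill the (docID, wordID) cells with the counts.
  result <- matrix(0L, nrow = max(mat[, 1]), ncol = max(mat[, 2]))
  result[mat[, 1:2, drop = FALSE]] <- mat[, 3]
  result
}

# Demonstration with the example from the thread, written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 1 3", "1 2 54", "1 3 11", "1 4 17",
             "2 1 5", "2 4 78", "2 5 20"), tmp)
m <- triples_to_matrix(tmp)
print(m)
```

A dense matrix needs one cell per document/word pair, so for the larger collections it may not fit in memory.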

arun

2012-Oct-18 16:37 UTC


Hi,
You can also try this:
dat1<-read.table(text="
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
",sep="",header=FALSE)
library(reshape2)
dat2<-cast(dat1,V1~V2)
dat2<-dat2[,-1]
dat2[is.na(dat2)]<-0
dat3<-as.matrix(dat2)
dat3
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    3   54   11   17    0
#[2,]    5    0    0   78   20
A.K.





Greg Snow

2012-Oct-18 17:35 UTC


Another option would be to read the data using read.table or similar
to get the data into a data frame then use the xtabs function,
something like:

result <- xtabs( count ~ docID + wordID, data=mydf)
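
A fuller version of the xtabs idea, as a sketch only: the data frame name mydf and the column names docID, wordID and count are the ones assumed in the line above and are set here through col.names. Converting the IDs to factors with every level from 1 to the maximum is an addition of this sketch. Without it, an ID that never occurs in the file gets no row or column in the table.

```r
mydf <- read.table(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
", header = FALSE, col.names = c("docID", "wordID", "count"))

# Make every ID from 1 to the maximum a factor level, so that IDs with no
# rows still get an all-zero row or column in the table.
mydf$docID  <- factor(mydf$docID,  levels = seq_len(max(mydf$docID)))
mydf$wordID <- factor(mydf$wordID, levels = seq_len(max(mydf$wordID)))

# Sum of count for each docID/wordID combination.
tab <- xtabs(count ~ docID + wordID, data = mydf)

# Strip the table class and dimnames to get a plain matrix.
result <- matrix(as.vector(tab), nrow = nrow(tab), ncol = ncol(tab))
print(result)
```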





-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com

arun

2012-Oct-18 17:46 UTC


Hi Rainer,
Thanks for notifying me. You are right.

Sorry, I was working with library(reshape) instead of reshape2. So, I guess the
cast() will not work if we load only reshape2.
A.K.




----- Original Message -----
From: Rainer Schuermann <rainer.schuermann at gmx.net>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Thursday, October 18, 2012 1:33 PM
Subject: Re: [R] Help with

I think you need to use 
dat2<-dcast(dat1,V1~V2)
      ^

At least on my machine, cast wouldn't do it.

Rgds, Rainer
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reshape2_1.2.1

loaded via a namespace (and not attached):
[1] plyr_1.7.1    stringr_0.6.1 tools_2.15.1
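
Putting Rainer's correction together with arun's steps, a reshape2-only version could look like the sketch below. It assumes the reshape2 package is installed (the reshaping step is skipped otherwise). The value.var and fill arguments of dcast() are used here to name the count column explicitly and to put 0 in doc/word pairs that have no row. The dimnames are dropped at the end to match the plain matrix asked for in the question.

```r
dat1 <- read.table(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
", header = FALSE)

if (requireNamespace("reshape2", quietly = TRUE)) {
  # One row per V1 (docID), one column per V2 (wordID), cells taken from V3.
  dat2 <- reshape2::dcast(dat1, V1 ~ V2, value.var = "V3", fill = 0)

  # Drop the V1 identifier column and convert to a bare matrix.
  dat3 <- as.matrix(dat2[, -1])
  dimnames(dat3) <- NULL
  print(dat3)
}
```

Note that dcast() only creates columns for word IDs that actually occur in the data, so a word ID that is absent from the file would not get a column here.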



