thr3ads.net - R help - [R] Convert COLON separated format [Oct 2012]


Noah Silverman

2012-Oct-09 04:56 UTC

[R] Convert COLON separated format

I have a bunch of data sets that were created for the libsvm tool.  They are in
"colon separated sparse format".

i.e.

1  5:1  27:3  345:10

This is a row with the label "1" that has values only in columns 5, 27,
and 345.

I want to read these into a data.frame in R.  

Is there a simple way to do this?

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095

Hasan Diwan

2012-Oct-09 05:15 UTC


Mr Silverman,

On 9 October 2012 00:56, Noah Silverman <noahsilverman@ucla.edu> wrote:
> I have a bunch of data sets that were created for the libsvm tool.  They
> are in "colon separated sparse format".
> Is there a simple way to do this?
>
Use read.table with a sep of ':' and let me know how you get on. -- H
-- 
Sent from my mobile device

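For anyone trying the read.table suggestion, here is a quick check on the sample row from the question. It is only a sketch on one in-memory line (the `text =` argument stands in for a real file), and it suggests that splitting on ':' alone leaves each column index glued to the previous value, so further parsing would still be needed.

```r
# One sample row from the question, read with ':' as the only separator.
df <- read.table(text = "1  5:1  27:3  345:10", sep = ":")

# Four fields come back, but the first three are strings such as "1  5"
# (label plus column index), not separate numeric columns.
str(df)
```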

Rui Barradas

2012-Oct-09 05:28 UTC


Hello,

Here's a function that doesn't do it all but might help.

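fun <- function(x){
     # Split on spaces and drop the empty strings left by repeated spaces
     x1 <- unlist(strsplit(x, " "))
     x2 <- x1[nchar(x1) > 0]
     # The first field is the label (row number)
     i <- as.integer(x2[1])
     # The remaining fields are "column:value" pairs; flatten them
     x3 <- unlist(strsplit(x2[-1], ":"))
     # Odd positions hold column indices, even positions hold values
     j <- as.integer(x3[rep(c(TRUE, FALSE), length(x3)/2)])
     y <- numeric(max(j))
     y[j] <- as.numeric(x3[rep(c(FALSE, TRUE), length(x3)/2)])
     list(row = i, line = y)
}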

x <- "1  5:1  27:3  345:10"
fun(x)

If you know that your labels, i.e., row numbers are consecutive, have 
the function return just 'y', not a list.
Then use readLines to read the file in and lapply fun to it. Something like

ln <- readLines(filename)
lst <- lapply(ln, fun)

Then you'll have another problem: the lines' lengths. They won't all be
the same, so in order to make a data.frame or matrix you'll need
extra work. Try the code above and say whether it's on the right track.
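One possible way to do that extra work is sketched below: pad every row with zeros up to the largest column index seen, then bind the rows into a data.frame. The helper name `parse_line` and the two sample lines are made up for illustration, and a dense result like this is only sensible when the largest column index is small enough to fit in memory.

```r
# Hypothetical helper: split one line into its label, column indices and values.
parse_line <- function(x) {
  fields <- strsplit(trimws(x), "[ \t]+")[[1]]
  pairs  <- strsplit(fields[-1], ":", fixed = TRUE)
  list(label = as.numeric(fields[1]),
       col   = as.integer(vapply(pairs, `[`, "", 1)),
       val   = as.numeric(vapply(pairs, `[`, "", 2)))
}

# Stand-in for ln <- readLines(filename); the second line is invented.
ln <- c("1  5:1  27:3  345:10", "0  2:4  5:7")
parsed <- lapply(ln, parse_line)

# Pad every row to the widest column index, filling the gaps with 0.
width <- max(unlist(lapply(parsed, `[[`, "col")))
dense <- t(vapply(parsed, function(p) {
  y <- numeric(width)
  y[p$col] <- p$val
  y
}, numeric(width)))

# One row per input line: the label first, then columns X1 ... X<width>.
df <- data.frame(label = vapply(parsed, `[[`, 0, "label"), dense)
dim(df)
```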

Also, take a look at package Matrix. It's a recommended package and it 
implements sparse matrices.

Hope this helps,

Rui Barradas


jim holtman

2012-Oct-09 12:10 UTC


If you want something that is fast, read the file in, strip off the
colon/data, write it out to a temp and then read it back in.  Here is
a 355K line file:
> temp <- tempfile()
> input <- readLines('/temp/colon.txt')
> length(input)
[1] 355212
> system.time(input <- gsub("(:[0-9]+)", "", input))
   user  system elapsed
   0.72    0.00    0.74
> head(input)
[1] "1  5  27  345" "1  5  27  345" "1  5  27  345"
    "1  5  27  345" "1  5  27  345" "1  5  27  345"
> writeLines(input, temp)
> system.time(newInput <- read.table(temp))
   user  system elapsed
   1.08    0.02    1.13
> dim(newInput)
[1] 355212      4
> head(newInput)
  V1 V2 V3  V4
1  1  5 27 345
2  1  5 27 345
3  1  5 27 345
4  1  5 27 345
5  1  5 27 345
6  1  5 27 345
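A small in-memory variant of the same idea is sketched below, under a few assumptions: the second sample line is invented, `text =` replaces the temp file, and the pattern is widened to ":[^ \t]+" so that non-integer values would also be stripped. Note that the result holds the label and the column indices only (the values after each colon are discarded), and that `fill = TRUE` is needed once rows carry different numbers of entries.

```r
# Stand-in for readLines(); the second line is made up to show a shorter row.
input <- c("1  5:1  27:3  345:10", "0  2:4  5:7")

# Drop everything from each colon up to the next whitespace.
stripped <- gsub(":[^ \t]+", "", input)

# fill = TRUE pads short rows with NA instead of raising an error.
newInput <- read.table(text = stripped, fill = TRUE)
newInput
```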




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

William Dunlap

2012-Oct-09 16:01 UTC


Matrix::spMatrix can help.

Read your data file with lns <- readLines("fileName") to get
something like
   lns <- c("1 5:15 7:17 9:19",
                 "2 2:22 8:28",
                 "4 6:46")
Then use a function like the following that reformats the
data to the i=row,j=col,x=value vectors that spMatrix can use.
   f <- function(lns, nrow=NULL, ncol=NULL)
   {
      # expect lines of the form
      # "rowNum<whiteSpace>colNum:value[<whiteSpace>colNum:value ...]"
      # prefix each colNum:value pair with its row number: "rowNum:colNum:value"
      triples <- unlist(lapply(strsplit(lns, "[ \t]+"),
                               function(ln) paste(sep=":", ln[1], ln[-1])))
      triples <- strsplit(triples, ":")
      if (any(vapply(triples, length, 0) != 3))
         stop("formatting error")
      ijx <- matrix(as.numeric(unlist(triples)), ncol=3, byrow=TRUE)
      if (is.null(nrow)) nrow <- max(ijx[,1])
      if (is.null(ncol)) ncol <- max(ijx[,2])
      Matrix::spMatrix(nrow=nrow, ncol=ncol, i=ijx[,1], j=ijx[,2], x=ijx[,3])
   }
Use it as
> f(lns)
4 x 9 sparse Matrix of class "dgTMatrix"

[1,] .  . . . 15  . 17  . 19
[2,] . 22 . .  .  .  . 28  .
[3,] .  . . .  .  .  .  .  .
[4,] .  . . .  . 46  .  .  .

or, if you know the number of rows and columns, tell it:
> f(lns, 10, 10)
10 x 10 sparse Matrix of class "dgTMatrix"

 [1,] .  . . . 15  . 17  . 19 .
 [2,] . 22 . .  .  .  . 28  . .
 [3,] .  . . .  .  .  .  .  . .
 [4,] .  . . .  . 46  .  .  . .
 [5,] .  . . .  .  .  .  .  . .
 [6,] .  . . .  .  .  .  .  . .
 [7,] .  . . .  .  .  .  .  . .
 [8,] .  . . .  .  .  .  .  . .
 [9,] .  . . .  .  .  .  .  . .
[10,] .  . . .  .  .  .  .  . .

Use as.matrix() on its output if you don't want to continue
using the sparse matrix format.
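If the first field is a class label instead of a row number (as the original question describes it), a variation along the following lines might fit better. This is a sketch, not part of the function above: it uses Matrix::sparseMatrix, takes the line position as the row index, and keeps the labels in a separate vector; the names `labels` and `x` are illustrative.

```r
library(Matrix)

lns <- c("1 5:15 7:17 9:19",
         "2 2:22 8:28",
         "4 6:46")

fields <- strsplit(trimws(lns), "[ \t]+")
labels <- as.numeric(vapply(fields, `[`, "", 1))   # first field of each line
n_feat <- lengths(fields) - 1L                     # colNum:value pairs per line
pairs  <- strsplit(unlist(lapply(fields, `[`, -1)), ":", fixed = TRUE)

j <- as.integer(vapply(pairs, `[`, "", 1))         # column indices
v <- as.numeric(vapply(pairs, `[`, "", 2))         # values

# Row i of the matrix is line i of the input; labels are stored alongside.
x <- sparseMatrix(i = rep(seq_along(lns), n_feat), j = j, x = v,
                  dims = c(length(lns), max(j)))
x
```

With this layout, data.frame(label = labels, as.matrix(x)) gives a dense data.frame when the column count is small enough.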

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

