Dear all,

Does R have any function or package that can pack and unpack a string into bits?

The reason I want to do this in R is that R has many more native statistical functions than Perl.

Yet the data I need to process is so large that I have to compress it into smaller units, process it, and finally recover it back into a string with the new information.

In Perl the implementation looks like this. I wonder how this can be implemented in R.

__BEGIN__
my %charmap = (
    A => '00',
    C => '01',
    G => '10',
    T => '11',
);

my %digmap = (
    '00' => "A",
    '01' => "C",
    '10' => "G",
    '11' => "T",
);

my $string = 'GATTA';
$string =~ s/(.)/$charmap{$1}/ge;

my $compressed = pack 'b*', $string;

print "COMP: $compressed\n";
printf "%d bytes\n", length $compressed;

my @data;

# Store the compressed bits in an array
push @data, $compressed;

# Process the array
foreach my $dat ( @data ) {

    my $decompressed = unpack 'b*', $dat;
    $decompressed =~ s/(..)/$digmap{$1}/ge;

    print "$decompressed\n";
    # or do further processing on $dat
}
__END__

- Gundala Viswanath
Jakarta - Indonesia
Try this:

## 1: encode each base as its two-character bit string
map <- list(A = '00', C = '01', G = '10', T = '11')
myStr <- 'GATTA'
paste(map[unlist(strsplit(myStr, NULL))], collapse = "")

## 2: decode by splitting the bit string into pairs and looking each pair up
cod <- "1000111100"
library(gsubfn)  # provides strapply()
strapply(cod, '[0-9]{2}')
names(map)[match(unlist(strapply(cod, '[0-9]{2}')), map)]

On Fri, Jan 9, 2009 at 1:50 PM, Gundala Viswanath <gundalav@gmail.com> wrote:
> Does R has any function/package that can pack
> and unpack string into bit size?

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
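
Note that the first snippet above yields a string of '0' and '1' characters, which is twice as long as the input rather than shorter. To actually store two bits per base, one option is base R's packBits() and rawToBits(). What follows is only a minimal sketch under stated assumptions: the names pack_dna, unpack_dna and bit_map are made up for illustration, only A, C, G and T are handled, and the sequence length is kept next to the bytes because the last byte is zero-padded.

```r
# Two-bit codes per base, mirroring the mapping used in this thread.
# (bit_map, pack_dna and unpack_dna are hypothetical names, not from a package.)
bit_map <- list(A = c(0L, 0L), C = c(0L, 1L), G = c(1L, 0L), T = c(1L, 1L))

pack_dna <- function(s) {
  bases <- strsplit(s, "", fixed = TRUE)[[1]]
  if (!all(bases %in% names(bit_map))) {
    stop("only A, C, G and T are supported in this sketch")
  }
  bits <- unlist(bit_map[bases], use.names = FALSE)
  # packBits() needs a multiple of 8 bits, so pad the tail with zeros
  pad <- (-length(bits)) %% 8L
  bits <- c(bits, integer(pad))
  # Keep the number of bases so the padding can be dropped when unpacking
  list(n = length(bases), bytes = packBits(bits, type = "raw"))
}

unpack_dna <- function(packed) {
  # rawToBits() returns 8 bits per byte, least significant bit first
  bits <- as.integer(rawToBits(packed$bytes))[seq_len(2L * packed$n)]
  pairs <- matrix(bits, nrow = 2L)
  idx <- pairs[1, ] * 2L + pairs[2, ] + 1L
  paste(c("A", "C", "G", "T")[idx], collapse = "")
}

p <- pack_dna("GATTA")
length(p$bytes)   # 2 bytes for 5 bases
unpack_dna(p)     # "GATTA"
```

For 'GATTA' the ten bits fit in two bytes. Whether this saves memory in practice depends on how the packed vectors are stored and how often they have to be unpacked for processing.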
See: http://www.nabble.com/Compressing-String-in-R-td21160453.html

On Fri, Jan 9, 2009 at 10:50 AM, Gundala Viswanath <gundalav at gmail.com> wrote:
> Does R has any function/package that can pack
> and unpack string into bit size?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Gundala --

Gundala Viswanath wrote:
> Dear all,
>
> Does R has any function/package that can pack
> and unpack string into bit size?

All of your questions relate to DNA strings. The R/Bioconductor package Biostrings is designed to manipulate such objects. It does not necessarily address this particular problem: in general DNA strings may contain any of the 16 IUPAC symbols, so compression becomes less compelling, and, as you indicate, even with compression the size of the data means that one might often need to process parts of the data at a time. It may, however, provide useful containers and methods that make such issues less important.

> source('http://bioconductor.org/biocLite.R')
> biocLite('Biostrings')
> library('Biostrings')

See also the vignettes for the package, available within R or for example at

http://bioconductor.org/packages/release/bioc/html/Biostrings.html

It seems that you have data suitable for representation as a DNAStringSet. The package is actively developed, and using the 'devel' version of R (and hence the 'devel' version of Biostrings) might provide additional important facilities. If this proves useful then follow-up questions should use the Bioconductor mailing lists

http://bioconductor.org/docs/mailList.html

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
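
The remark that compression becomes less compelling with a larger alphabet can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the alphabet size of 16 is taken from the message above, and the comparison assumes the uncompressed text costs one byte per character.

```r
# Bits needed to give each of k distinct symbols its own fixed-width code.
bits_per_symbol <- function(k) ceiling(log2(k))

# Bytes needed for n symbols packed at that width (rounded up to whole bytes).
packed_bytes <- function(n, k) ceiling(n * bits_per_symbol(k) / 8)

n <- 1e6               # one million bases, i.e. about 1e6 bytes as plain text
packed_bytes(n, 4)     # A/C/G/T only: 2 bits per base
packed_bytes(n, 16)    # 16-symbol alphabet: 4 bits per symbol
```

Under these assumptions a four-letter alphabet packs to a quarter of the plain-text size, while a 16-symbol alphabet only packs to half.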