thr3ads.net - similar to: "Using plyr::dply more (memory) efficiently?"

Displaying 20 results from an estimated 2000 matches similar to: "Using plyr::dply more (memory) efficiently?"

2010 Jan 21

Merging and extracting data from list

Hello R-help group, I have a question about merging lists. I have two lists: Genes list (hSgenes) name chr strand start end transStart transEnd symbol description feature ENSG00000223972 1 1 11874 14412 11874 14412 DEAD/H box polypeptide 11 like 1DEAD/H box polypeptide 11 like 3DEAD/H box polypeptide 11 like 9 ;;

2007 Mar 16

Probably simple function problem

# I have a simple function problem. I thought that I could write a function to modify a couple of vectors but I am doing something wrong #I have a standard cost vector called "fuel" and some adjustments to the #costs called "adjusts". The changes are completely dependent on the length #of the dataframe newdata I then need to take the modified vectors and use # them later. I

2011 Nov 15

Problem with substr

Hi, everyone. When I ran this script, there was an error: Error in substring(tmp.subject, tmp.end[ex] + 1, tmp.start[ex + 1] - 1) : invalid substring argument(s) Could someone figure out what the problem is? for(i in 1:length(genebody[,1])){ tmp.id<-as.vector(genebody[i,1]) # get gene id tmp.subject<-as.vector(genebody[i,2]) # get gene sequence

2011 Jul 27

Writing a summary file in R

Hello, I have an input file: http://r.789695.n4.nabble.com/file/n3700031/testOut.txt testOut.txt where column 1 is chromosome, column 2 is start of region, column 3 is end of region, columns 4 and 5 are base positions, column 6 is total reads, column 7 is methylation data, and column 8 is the strand. I would like a summary output file such as:

2011 Jun 07

extract data from a data frame field

Hi all, I am given a data frame in which one of the columns holds several pieces of information together (see column 4, peak_loc): chr start end peak_loc cluster_TC strand peak_TC 1 chr1 564620 564649 chr1:564644..564645,+ 94 + 10 2 chr1 565369 565404 chr1:565371..565372,+ 217 + 8 3 chr1 565463 565541 chr1:565480..565481,+ 1214 + 15 4 chr1

2017 Sep 04

Merge by Range in R

Hi, I have two big data sets. data_1: > dim(data_1) [1] 15820 5 > head(data_1) Chromosome Start End Feature GroupA_3 1: chr1 521369 750000 chr1-0001 0.170 2: chr1 750001 800000 chr1-0002 -0.086 3: chr1 800001 850000 chr1-0003 0.006 4: chr1 850001 900000 chr1-0004

2013 Feb 03

Adding complex new columns to data frame depending on existing column

Hello I have a data frame as below V1 V2 V3 V4 V5 V6 chr1 18884 C CAAAA 2 0 chr1 135419 TATACA T 2 0 chr1 332045 T TTG 0 2 chr1 453838 T TAC 2 0 chr1 567652 T TG 1 0 chr1 602541 TTTA T 2 0 on which I want to perform complex rearrangement such that: if V3 is a string >1 (i.e line 2) then I

2008 Feb 06

inserting text lines in a dat frame

Hi Jim, I am trying to prepare a bed file to load as a custom track on the UCSC genome browser. I have a data frame that looks like the one below. > x V1 V2 V3 1 chr1 11255 55 2 chr1 11320 29 3 chr1 11400 45 4 chr2 21680 35 5 chr2 21750 84 6 chr2 21820 29 7 chr2 31890 46 8 chr3 32100 29 9 chr3 52380 29 10 chr3 66450 46 I would like to insert the following 4 lines at the beginning:

2008 Feb 04

counting identical data in a column

Hi Peter I have the following data frame with chromosome name, start and end positions: chrN start end 1 chr1 11122333 11122633 2 chr1 11122333 11122633 3 chr3 11122333 11122633 8 chr3 111273334 111273634 7 chr2 12122334 12122634 4 chr1 21122377 21122677 5 chr2 33122355 33122655 6 chr2 33122355 33122655 I would like to count the positions that have the same start and

2012 Mar 04

Intersection of two chromosomal ranges

Hi, I want to merge multiple chromosomal regions based on their common intersecting regions. I tried a couple of things using while and if loops, but they did not work out. I would appreciate it if anyone could provide me with a small piece of code in R to get the intersection of the following example: chr1: 100-150 chr1: 79-250 chr1: 100-175 chr1: 300-350 I want the intersection of all four regions as follows:

2011 Jun 08

return counts of elements on a table column depending on elements on another column

Hi, I am given the following table: > head(hsa_refseq) chr genome region start stop nu strand nu.1 nu.2 gene_id 1 chr1 hg19_refGene CDS 67000042 67000051 0 + 0 gene_id NM_032291 2 chr1 hg19_refGene exon 66999825 67000051 0 + . gene_id NM_032291 3 chr1 hg19_refGene CDS 67091530 67091593 0 + 2 gene_id NM_032291 4 chr1 hg19_refGene exon

2011 Jan 31

how to search to value to another table

Hello, I'm a new R user. I have two different dummy tables with the variable name tb1 and tb2. tb1< v1 v2 v3 v4 "chr1" 22 23 3 "chr1" 36 37 1 "chr1" 54 55 0 "chr1" 77 78 1 "chr2" 80 81 4 "chr2" 85 86 0 "chr2" 99 100 1

2012 Jul 02

apply with multiple conditions

Hello all, I have written a for loop to act on a dataframe with close to 3 million rows and 6 columns, and I would like to pass it to apply() to speed the process up (I let the loop run for 2 days before stopping it, and it had only gone through 200,000 rows), but I am really struggling to find a way to pass the arguments. Below are the loop and the head of the dataframe I am working on. Any hints

2011 Jun 27

create a new data frame after comparing two columns of the previous data frame

Hi everyone, I am trying to find a way to filter a table; If I am given for example the following table: > head(intra) chr miRNA start end strand ACC hsa_ID region region_start region_end gene_id transcrip_id 1 chr1 miRNA 1102484 1102578 + ACC="MI0000342"; ID="hsa-mir-200b"; exon 1102484 1102578 NR_029639 NR_029639 2 chr1

2016 Apr 05

Is that an efficient way to find the overlapped , upstream and downstream ranges for a bunch of ranges

I have a bunch of genes (nearly 50000) from the whole genome, which are read in as genomic ranges. A range (gene) can be seen as an observation that has three columns: chromosome, start and end, like this: seqnames start end width strand gene1 chr1 1 5 5 + gene2 chr1 10 15 6 + gene3 chr1 12 17 6 + gene4 chr1 20 25 6 + gene5

2011 Dec 06

warning for inefficiently compressed datasets

Hi, Recently added to doc/NEWS.Rd: 'R CMD check' now gives a warning rather than a note if it finds inefficiently compressed datasets. With 'bzip2' and 'xz' compression having been available since R 2.10.0, there is no excuse for not using them. Why isn't a note enough for this? Generally speaking, warnings are for things that are dangerous, or unsafe,

2009 Feb 05

how to separate char and num within a variable

Hi all, I read in a column which looks like "chr1:000889594-000889638", and need to break it into three columns like "chr1:", "000889594" and "000889638". How should I do this in R? Thanks a lot for your suggestions! Bill

2011 Oct 17

Histogram for each ID value

I have a dataframe in the general format: chr1 0.5 chr1 0 chr1 0.75 chr2 0 chr2 0 chr3 1 chr3 1 chr3 0.5 chr7 0.75 chr9 1 chr9 1 chr22 0.5 chr22 0.5 where the first column is the chromosome location and the second column is some value. What I'd like to do is have a histogram created for each chr location (i.e. a separate histogram for chr1, chr2, chr3, chr7, chr9, and chr22). I am just

2008 May 28

Unexpected behaviour in reading genomic coordinate files of R-2.7.0

Great R people, I have noticed a strange behaviour in read.delim() and friends in R version 2.7.0. I will describe the problem and the solution I already found, to check whether this is expected behaviour or whether a correction is needed, and to show people who may experience the same difficulty a way to overcome it. Here is the

2010 May 20

sort a data.frame

Hello, I have a dataframe: dd <- data.frame(b = c("chr2", "chr1", "chr15", "chr13"), x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2)) >dd b x y z 1 chr2 A 8 1 2 chr1 D 3 1 3 chr15 A 9 1 4 chr13 C 9 2 Now I want to sort them according to column "b", but only its
