Displaying 20 results from an estimated 2000 matches similar to: "Using plyr::dply more (memory) efficiently?"
2010 Jan 21
1
Merging and extracting data from list
Hello R-help group,
I have a question about merging lists. I have two lists:
Genes list (hSgenes)
name chr strand start end transStart transEnd
symbol description feature
ENSG00000223972 1 1 11874 14412 11874 14412
DEAD/H box polypeptide 11 like 1DEAD/H box polypeptide 11 like 3DEAD/H
box polypeptide 11 like 9 ;;
2007 Mar 16
1
Probably simple function problem
# I have a simple function problem. I thought that I
could write a function to modify a couple of vectors
but I am doing something wrong
#I have a standard cost vector called "fuel" and some
adjustments to the
#costs called "adjusts". The changes are completely
dependent on the length
#of the dataframe newdata. I then need to take the
modified vectors and use
# them later. I
2011 Nov 15
1
Problem with substr
Hi, everyone
When I ran this script, I got: Error in substring(tmp.subject, tmp.end[ex]
+ 1, tmp.start[ex + 1] - 1) :
invalid substring argument(s)
Could someone figure out what the problem is?
for(i in 1:length(genebody[,1])){
tmp.id<-as.vector(genebody[i,1]) # get gene id
tmp.subject<-as.vector(genebody[i,2]) # get gene sequence
2011 Jul 27
2
Writing a summary file in R
Hello,
I have an input file:
http://r.789695.n4.nabble.com/file/n3700031/testOut.txt testOut.txt
where column 1 is the chromosome, column 2 is the start of the region, column 3 is the end of
the region, columns 4 and 5 are the base position, column 6 is total reads, column 7
is methylation data, and column 8 is the strand.
I would like a summary output file such as:
2011 Jun 07
1
extract data from a data frame field
Hi all,
I am given a data frame in which one of the columns combines several pieces of
information; see column 4, peak_loc:
chr start end peak_loc cluster_TC strand peak_TC
1 chr1 564620 564649 chr1:564644..564645,+ 94 + 10
2 chr1 565369 565404 chr1:565371..565372,+ 217 + 8
3 chr1 565463 565541 chr1:565480..565481,+ 1214 + 15
4 chr1
2017 Sep 04
1
Merge by Range in R
Hi,
I have two big data sets.
data_1:
> dim(data_1)
[1] 15820 5
> head(data_1)
   Chromosome  Start    End   Feature GroupA_3
1:       chr1 521369 750000 chr1-0001    0.170
2:       chr1 750001 800000 chr1-0002   -0.086
3:       chr1 800001 850000 chr1-0003    0.006
4:       chr1 850001 900000 chr1-0004
2013 Feb 03
1
Adding complex new columns to data frame depending on existing column
Hello
I have a data frame as below
V1 V2 V3 V4 V5 V6
chr1 18884 C CAAAA 2 0
chr1 135419 TATACA T 2 0
chr1 332045 T TTG 0 2
chr1 453838 T TAC 2 0
chr1 567652 T TG 1 0
chr1 602541 TTTA T 2 0
on which I want to perform complex rearrangement such that:
if V3 is a string >1 (i.e line 2) then I
2008 Feb 06
4
inserting text lines in a dat frame
Hi Jim
I am trying to prepare a bed file to load as a custom track on the UCSC genome browser.
I have a data frame that looks like the one below.
> x
V1 V2 V3
1 chr1 11255 55
2 chr1 11320 29
3 chr1 11400 45
4 chr2 21680 35
5 chr2 21750 84
6 chr2 21820 29
7 chr2 31890 46
8 chr3 32100 29
9 chr3 52380 29
10 chr3 66450 46
I would like to insert the following 4 lines at the beginning:
2008 Feb 04
1
counting identical data in a column
Hi Peter
I have the following data frame with chromosome name, start and end positions:
chrN start end
1 chr1 11122333 11122633
2 chr1 11122333 11122633
3 chr3 11122333 11122633
8 chr3 111273334 111273634
7 chr2 12122334 12122634
4 chr1 21122377 21122677
5 chr2 33122355 33122655
6 chr2 33122355 33122655
I would like to count the positions that have the same start and
2012 Mar 04
1
Intersection of two chromosomal ranges
Hi,
I want to merge multiple chromosomal regions based on their common
intersecting regions. I tried a couple of things using while and if loops, but
they did not work out.
I would appreciate it if anyone could provide me a small piece of code in R to
get the intersection of the following example:
chr1: 100-150
chr1: 79-250
chr1: 100-175
chr1: 300-350
I want the intersection of all four regions as follows:
2011 Jun 08
1
return counts of elements on a table column depending on elements on another column
Hi,
I am given the following table:
> head(hsa_refseq)
chr genome region start stop nu strand nu.1 nu.2
gene_id
1 chr1 hg19_refGene CDS 67000042 67000051 0 + 0 gene_id
NM_032291
2 chr1 hg19_refGene exon 66999825 67000051 0 + . gene_id
NM_032291
3 chr1 hg19_refGene CDS 67091530 67091593 0 + 2 gene_id
NM_032291
4 chr1 hg19_refGene exon
2011 Jan 31
1
how to search to value to another table
Hello,
I'm a new R user.
I have two different dummy tables with the variable name tb1 and tb2.
tb1<
v1 v2 v3 v4
"chr1" 22 23 3
"chr1" 36 37 1
"chr1" 54 55 0
"chr1" 77 78 1
"chr2" 80 81 4
"chr2" 85 86 0
"chr2" 99 100 1
2012 Jul 02
1
apply with multiple conditions
Hello all,
I have written a for loop to act on a dataframe with close to 3million rows
and 6 columns and I would like to pass it to apply() to speed the process up
(I let the loop run for 2 days before stopping it and it had only gone
through 200,000 rows) but I am really struggling to find a way to pass the
arguments. Below are the loop and the head of the dataframe I am working on.
Any hints
2011 Jun 27
1
create a new data frame after comparing two columns of the previous data frame
Hi everyone,
I am trying to find a way to filter a table; If I am given for example the
following table:
> head(intra)
chr miRNA start end strand ACC hsa_ID
region region_start region_end gene_id transcrip_id
1 chr1 miRNA 1102484 1102578 + ACC="MI0000342"; ID="hsa-mir-200b";
exon 1102484 1102578 NR_029639 NR_029639
2 chr1
2016 Apr 05
2
Is that an efficient way to find the overlapped , upstream and downstream ranges for a bunch of ranges
I have a bunch of genes (nearly 50000) from the whole genome, which are read in as genomic ranges.
A range (gene) can be seen as an observation with three columns: chromosome, start and end, like this:
seqnames start end width strand
gene1 chr1 1 5 5 +
gene2 chr1 10 15 6 +
gene3 chr1 12 17 6 +
gene4 chr1 20 25 6 +
gene5
2011 Dec 06
1
warning for inefficiently compressed datasets
Hi,
Recently added to doc/NEWS.Rd:
'R CMD check' now gives a warning rather than a note if it finds
inefficiently compressed datasets. With 'bzip2' and 'xz' compression
having been available since R 2.10.0, there is no excuse for not
using them.
Why isn't a note enough for this?
Generally speaking, warnings are for things that are dangerous,
or unsafe,
2009 Feb 05
3
how to separate char and num within a variable
Hi all,
I read in a column which looks like "chr1:000889594-000889638", and need to break it into three columns like "chr1:", "000889594" and "000889638". How should I do this in R? Thanks a lot for your suggestions!
Bill
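One possible way to do the split asked about above, as a minimal sketch in base R rather than the answer given in the original thread. The vector name `coords`, its second value, and the result name `parts` are assumptions made up for illustration; the post only shows the single example value. The pieces are kept as character strings so that the leading zeros survive.

```r
# Hypothetical input: the post shows only the first value; the second is invented.
coords <- c("chr1:000889594-000889638", "chr2:000012345-000067890")

# Three capture groups: chromosome with its trailing colon, start, end.
pattern <- "^(chr[^:]+:)([0-9]+)-([0-9]+)$"

# sub() replaces each whole string with one captured group at a time.
parts <- data.frame(
  chr   = sub(pattern, "\\1", coords),
  start = sub(pattern, "\\2", coords),
  end   = sub(pattern, "\\3", coords),
  stringsAsFactors = FALSE
)

print(parts)
```

If numeric positions are needed later, `as.numeric(parts$start)` would convert them, at the cost of dropping the leading zeros.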
2011 Oct 17
2
Histogram for each ID value
I have a dataframe in the general format:
chr1 0.5
chr1 0
chr1 0.75
chr2 0
chr2 0
chr3 1
chr3 1
chr3 0.5
chr7 0.75
chr9 1
chr9 1
chr22 0.5
chr22 0.5
where the first column is the chromosome location and the second column is
some value. What I'd like to do is have a histogram created for each chr
location (i.e. a separate histogram for chr1, chr2, chr3, chr7, chr9, and
chr22). I am just
2008 May 28
2
Unexpected behaviour in reading genomic coordinate files of R-2.7.0
Great R people,
I have noticed a strange behaviour in read.delim() and friends in the R
2.7.0 version. I will describe the problem and also the solution I
already found, just to be sure it is an expected behaviour and also to
tell people, who may experience the same difficulty, a way to overcome it.
And also to see if it is a proper behaviour or maybe a correction is needed.
Here is the
2010 May 20
5
sort a data.frame
Hello,
I have a dataframe:
dd <- data.frame(b = c("chr2", "chr1", "chr15", "chr13"),
x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
z = c(1, 1, 1, 2))
>dd
b x y z
1 chr2 A 8 1
2 chr1 D 3 1
3 chr15 A 9 1
4 chr13 C 9 2
Now I want to sort them according to column "b", but only its