Hi,

I'm having difficulties with a dataset containing gene names and the positions of those genes. Not such a big problem, but each gene has multiple exons, so it's hard to say where the gene starts and where it ends. I want the starting and ending position of each gene in my dataset.

Attached is the dataset: nabble.com/file/p21312449/genlistchrompos.csv

Column 'B' is the gene name, 'G' is the starting position and 'H' is the stop position. You can load the dataset by using:

    data <- read.csv("genlistchrompos.csv", sep=";")

I hope someone can help me; it's been giving me headaches for a week now :-((

Thanks!

--
View this message in context: nabble.com/for-loop-and-if-problem-tp21312449p21312449.html
Sent from the R help mailing list archive at Nabble.com.
> Not such a big problem, but each gene has multiple exons so it's hard to say
> where the gene starts and where it ends. I want the starting and ending
> position of each gene in my dataset.

    which(diff(as.numeric(data$Gene)) != 0)

will give you a vector of the last row of each gene (except the final gene, whose last row is simply the last row of the data). The first row of each gene is then the row after the previous gene's last row.

Also take a look at split(data, data$Gene).

Regards,
Richie.

Mathematical Sciences Unit
HSL
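A minimal sketch of the row-boundary idea above, on made-up data. The `exons` data frame and its values are hypothetical; only the column names are taken from this thread. It assumes the rows of each gene are contiguous, and it wraps the gene names in factor() because recent versions of R no longer turn strings into factors when reading a file.

```r
# Hypothetical toy data; only the column names come from the thread.
exons <- data.frame(
  Gene            = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"),
  Exon_Start.Chr. = c(100, 250, 400, 900, 1200, 5000),
  Exon_Stop.Chr.  = c(180, 300, 520, 1000, 1350, 5100),
  stringsAsFactors = FALSE
)

# Integer code per gene; diff() is non-zero wherever the gene changes.
gene_id <- as.numeric(factor(exons$Gene))

# Last row of each gene. which(diff(...)) misses the final gene,
# so the last row of the data is appended explicitly.
last_row <- c(which(diff(gene_id) != 0), nrow(exons))

# First row of each gene is the row after the previous gene's last row.
first_row <- c(1, head(last_row, -1) + 1)

boundaries <- data.frame(Gene = exons$Gene[first_row], first_row, last_row)
print(boundaries)

# split() gives one data frame per gene, for inspecting genes one at a time.
per_gene <- split(exons, exons$Gene)
print(names(per_gene))
```

These are row indices, not genomic coordinates. They only translate directly into gene start and stop positions if the exons of each gene are sorted by position.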
On Tue, Jan 06, 2009 at 07:21:48AM -0800, Sake wrote:
> Not such a big problem, but each gene has multiple exons so it's hard to say
> where the gene starts and where it ends. I want the starting and ending
> position of each gene in my dataset.
> Attached is the dataset:
> nabble.com/file/p21312449/genlistchrompos.csv
> Column 'B' is the gene name, 'G' is the starting position and 'H' is the
> stop position.

I don't really see how 'if' and 'for' loops are involved in the question. You may want to give us a little more detail on what exactly you need and what you tried unsuccessfully. (By the way -- there are no columns labeled 'B', 'G' or 'H' in the file.)

Anyway, I believe this is what you are after:

    # get minimum start position by gene
    aggregate(dat[, c('Exon_Start.Chr.')], by=list(dat$Gene), min)

    # get maximum stop position by gene
    aggregate(dat[, c('Exon_Stop.Chr.')], by=list(dat$Gene), max)

Of course, these will only reflect the real start and stop coordinates of the gene if ALL exons are given in the file.

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
mips.gsf.de/staff/pagel
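To get start and stop side by side in one table, the two aggregate() results can be merged by gene. This is a sketch on made-up data: the `exons` data frame and the output column names `Gene_Start` and `Gene_Stop` are hypothetical, chosen only for illustration.

```r
# Hypothetical toy data; only the column names come from the thread.
exons <- data.frame(
  Gene            = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"),
  Exon_Start.Chr. = c(100, 250, 400, 900, 1200, 5000),
  Exon_Stop.Chr.  = c(180, 300, 520, 1000, 1350, 5100),
  stringsAsFactors = FALSE
)

# Smallest exon start and largest exon stop per gene.
starts <- aggregate(exons[, "Exon_Start.Chr."], by = list(Gene = exons$Gene), FUN = min)
stops  <- aggregate(exons[, "Exon_Stop.Chr."],  by = list(Gene = exons$Gene), FUN = max)

# aggregate() names the value column "x" when given a plain vector; rename it.
names(starts)[2] <- "Gene_Start"
names(stops)[2]  <- "Gene_Stop"

# One row per gene with both coordinates.
gene_span <- merge(starts, stops, by = "Gene")
print(gene_span)
```

Taking min and max does not depend on the row order, so it also works when the exons of a gene are not sorted or not contiguous in the file.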
On Tue, 6 Jan 2009, Sake wrote:
> Not such a big problem, but each gene has multiple exons so it's hard to say
> where the gene starts and where it ends. I want the starting and ending
> position of each gene in my dataset.

This looks like a minor variant on the 'first and last observation' thread from a few days ago:

    thread.gmane.org/gmane.comp.lang.r.general/135411

to which several useful solutions were posted. I suggest you read that thread and try to adapt what is there to your situation.

If this does not get you all the way there, when you post back it will help to "provide commented, minimal, self-contained, reproducible code". What you have given us is not quite there. Here is a start:

    data <- read.csv("nabble.com/file/p21312449/genlistchrompos.csv", sep=';')

and note that

    > colnames(data)
     [1] "Query"  "Gene"  "Chrom"  "Strand"  "Accession"
     [6] "Exon"  "Exon_Start.Chr."  "Exon_Stop.Chr."  "Exon_Start.Trans."  "Exon_Stop.Trans."

does not include anything like "Column 'B'", so refer to those column names if you need further help after studying the thread above.

HTH,
Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
famprevmed.ucsd.edu/faculty/cberry          La Jolla, San Diego 92093-0901
Thanks for the tips, I'm going to test them today!

The B, G and H columns I mentioned are the columns you see when you open the file in Excel; I should have said that. Sorry for the confusion about that :-)

I thought I had to use the 'if' statement because I only want to search for the min and max if the gene name is the same as the one directly under it. And I wanted to use the 'for' loop to apply the 'if' statement to the entire column of gene names.

Edit: I have tested:

    aggregate(data[, c("Exon_Start.Chr.")], by = list(data$Gene), min)
    aggregate(data[, c("Exon_Stop.Chr.")], by = list(data$Gene), max)

And it worked like a charm! Thanks!
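For comparison, here is roughly what the 'for' loop with an 'if' described above might look like. This is only a sketch on made-up data: the `exons` data frame and all variable names are hypothetical, and it assumes the rows of each gene sit directly under each other. The aggregate() calls give the same answer with far less code.

```r
# Hypothetical toy data; only the column names come from the thread.
exons <- data.frame(
  Gene            = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"),
  Exon_Start.Chr. = c(100, 250, 400, 900, 1200, 5000),
  Exon_Stop.Chr.  = c(180, 300, 520, 1000, 1350, 5100),
  stringsAsFactors = FALSE
)

genes  <- character(0)
starts <- numeric(0)
stops  <- numeric(0)

for (i in seq_len(nrow(exons))) {
  # Is this row the same gene as the row directly above it?
  same_as_previous <- i > 1 && exons$Gene[i] == exons$Gene[i - 1]
  if (same_as_previous) {
    # Same gene: widen the range recorded for the current gene.
    n <- length(genes)
    starts[n] <- min(starts[n], exons$Exon_Start.Chr.[i])
    stops[n]  <- max(stops[n],  exons$Exon_Stop.Chr.[i])
  } else {
    # New gene: start a new entry with this exon's coordinates.
    genes  <- c(genes,  exons$Gene[i])
    starts <- c(starts, exons$Exon_Start.Chr.[i])
    stops  <- c(stops,  exons$Exon_Stop.Chr.[i])
  }
}

loop_result <- data.frame(Gene = genes, Gene_Start = starts, Gene_Stop = stops)
print(loop_result)
```

Unlike aggregate(), this loop would report a gene twice if its rows were not contiguous in the file.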
On Wed, Jan 7, 2009 at 3:51 AM, Sake <tlep.nav.ekas at hccnet.nl> wrote:
> aggregate(data[, c("Exon_Start.Chr.")], by = list(data$Gene), min)
> aggregate(data[, c("Exon_Stop.Chr.")], by = list(data$Gene), max)

That could be written:

    aggregate(data["Exon_Start.Chr."], data["Gene"], min)
    aggregate(data["Exon_Stop.Chr."], data["Gene"], max)
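One practical difference between the two spellings is the column names of the result: single-bracket indexing keeps the data frame column names, while the vector form falls back to generic ones. A small sketch on made-up data (the `exons` data frame is hypothetical; only the column names come from this thread):

```r
# Hypothetical toy data; only the column names come from the thread.
exons <- data.frame(
  Gene            = c("geneA", "geneA", "geneA", "geneB", "geneB", "geneC"),
  Exon_Start.Chr. = c(100, 250, 400, 900, 1200, 5000),
  Exon_Stop.Chr.  = c(180, 300, 520, 1000, 1350, 5100),
  stringsAsFactors = FALSE
)

# Vector plus unnamed list: the result columns get generic names.
by_vector <- aggregate(exons[, "Exon_Start.Chr."], by = list(exons$Gene), FUN = min)
print(names(by_vector))  # "Group.1" "x"

# One-column data frames: the original column names are kept.
by_frame <- aggregate(exons["Exon_Start.Chr."], exons["Gene"], min)
print(names(by_frame))   # "Gene" "Exon_Start.Chr."
```

The computed minima are the same in both cases; only the labels differ.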
I have one final question...

How can I save a CSV file with ; separation instead of , separation? I know about write.csv(file="filename.csv"), and that you can use sep=";" when you open a .csv file, but that doesn't work with the write.csv command.
Try using:

    write.table(..., sep=";")

write.csv just calls write.table.

On Mon, Jan 12, 2009 at 6:38 AM, Sake <tlep.nav.ekas at hccnet.nl> wrote:
> I have one final question...
> How can I save a CSV file with ; separation instead of , separation?

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
write.csv does exactly what you would expect: it creates a *Comma* Separated Values file. If you don't want a comma-separated format, then use write.table with sep=";". You can still name the file "whatever.csv". Or, if you also intend commas for decimal points, use write.csv2 as described in the help page:

"write.csv2 uses a comma for the decimal point and a semicolon for the separator, the Excel convention for CSV files in some Western European locales."

--
David Winsemius

On Jan 12, 2009, at 6:38 AM, Sake wrote:
> I have one final question...
> How can I save a CSV file with ; separation instead of , separation?
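A short sketch of both options, writing to temporary files. The `gene_span` data frame and the file variables are hypothetical. Note that write.table also writes row names and quotes strings by default, so row.names = FALSE (and, if wanted, quote = FALSE) usually needs to be set explicitly.

```r
# Hypothetical per-gene result table, for illustration only.
gene_span <- data.frame(
  Gene       = c("geneA", "geneB", "geneC"),
  Gene_Start = c(100, 900, 5000),
  Gene_Stop  = c(520, 1350, 5100),
  stringsAsFactors = FALSE
)

# Option 1: write.table with an explicit semicolon separator.
semicolon_file <- tempfile(fileext = ".csv")
write.table(gene_span, file = semicolon_file, sep = ";",
            row.names = FALSE, quote = FALSE)
print(readLines(semicolon_file))

# Option 2: write.csv2 uses ";" as separator and "," as decimal mark.
csv2_file <- tempfile(fileext = ".csv")
write.csv2(gene_span, file = csv2_file, row.names = FALSE)

# Read both files back to check that the round trip works.
from_table <- read.csv(semicolon_file, sep = ";")
from_csv2  <- read.csv2(csv2_file)
print(from_table)
print(from_csv2)
```

If the numbers have decimals and the file is meant for an Excel installation that expects a decimal comma, write.csv2 is probably the more convenient choice; otherwise write.table with sep=";" keeps the decimal point.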