Good evening,

I am attempting to analyze the protein expression data contained within these two ICGC/TCGA datasets (one for GBM and the other for LGG).

File for GBM protein expression:
https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22GBM-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D

File for LGG protein expression:
https://dcc.icgc.org/search?filters=%7B%22donor%22:%7B%22projectId%22:%7B%22is%22:%5B%22LGG-US%22%5D%7D,%22availableDataTypes%22:%7B%22is%22:%5B%22pexp%22%5D%7D%7D%7D

When I tried to transfer the files from .txt (via Notepad) to .csv (via Excel), the data appeared in the columns as disorganized, seemingly random text, not at all how a typical csv is arranged. I need the dataset converted to .csv in order to analyze it in R, which is why I am hoping someone here might help me do that. If not, is there some other way that I could analyze the datasets in R, which again were downloaded from the ICGC data portal?

Best,

Spencer Brackett
Inline.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Dec 26, 2018 at 3:04 PM Spencer Brackett <spbrackett20 at saintjosephhs.com> wrote:

> Good evening,
>
> I am attempting to analyze the protein expression data contained within
> these two ICGC, TCGA datasets (one for GBM and the other for LGG)
>
> ...
> When I tried to transfer the files from .txt (via Notepad) to .csv (via
> Excel), the data appeared in the columns as unorganized and random
> script... not like how a typical csv should be arranged at all. I need the
> dataset to be converted into .csv in order to analyze it in R,

Huh?? Why do you think this? A csv is just a comma-delimited text file. R can input pretty much any kind of file, ONCE YOU KNOW THE FORMAT OF WHAT YOU ARE INPUTTING. This should be provided by the links that you gave. Then see ?read.table or, more generally, ?scan for how to read the (text) file into R in whatever data structure you need. See also the R Data Import/Export manual.

Or possibly post to the Bioconductor list, where they specialize in this sort of thing and may already have packages that can access the repositories and bring in the data in the form you need. They also have lots of software there for analysis.

Cheers,
Bert

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
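The advice above (learn the format first, then pick the reader) could look roughly like the sketch below. It is only an illustration: the file is a small stand-in written to a temporary path, and the column names donor_id, gene_name and expression are invented here, not taken from the ICGC download.

```r
# Hypothetical stand-in for a downloaded text file; the column names are
# invented for illustration and are not taken from the ICGC portal.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "donor_id\tgene_name\texpression",
  "DO1\tEGFR\t0.52",
  "DO2\tTP53\t-1.10"
), path)

# Step 1: look at the raw text to learn the format before parsing it.
# When printed, tab characters show up as "\t" inside the strings.
first_lines <- readLines(path, n = 2)
print(first_lines)

# Step 2: count the tab-separated fields on each line; a consistent count
# suggests that the separator guess is right.
n_fields <- count.fields(path, sep = "\t")
print(n_fields)

# Step 3: read the file with the separator identified above.
dat <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)
```

If the field counts in step 2 differ from line to line, the separator guess is probably wrong, or the file contains quoting or comment characters that need to be passed to read.table explicitly.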
I looked at the first file. It gives an option to download as TSV (tab-separated values). That is the same as CSV except with tabs instead of commas. You do not need any external software to read it. Read the downloaded file directly into R.

read.delim looks as if it would work directly on the downloaded file.
?read.delim
The notation "\t" means the tab character.

As an aside, stay away from Notepad. It is too naive for almost anything interesting. The specific case I often see is people reading Linux-style text files with Notepad, which doesn't understand NL-terminated lines. Nicely formatted text files become illegible.

On Wed, Dec 26, 2018 at 6:04 PM Spencer Brackett <spbrackett20 at saintjosephhs.com> wrote:
>
> Good evening,
>
> I am attempting to analyze the protein expression data contained within
> these two ICGC, TCGA datasets (one for GBM and the other for LGG)
>
> ...
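A minimal sketch of the read.delim route, assuming the download really is a tab-separated file with a header row. The file below is a made-up stand-in written to a temporary path, and its column names are hypothetical; for the real data, the path of the downloaded TSV would go in its place.

```r
# Hypothetical stand-in for the TSV downloaded from the portal; the column
# names are invented for illustration.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c(
  "donor_id\tgene_name\texpression",
  "DO1\tEGFR\t0.52",
  "DO2\tTP53\t-1.10"
), tsv_path)

# read.delim() defaults to header = TRUE and sep = "\t", so no extra
# arguments are needed for a plain tab-separated file.
pexp <- read.delim(tsv_path, stringsAsFactors = FALSE)
head(pexp)

# Optional: write a comma-separated copy if one is wanted for other tools.
# R itself does not need this step.
csv_path <- tempfile(fileext = ".csv")
write.csv(pexp, csv_path, row.names = FALSE)

# Reading the CSV copy back, to compare it with the table read from the TSV.
pexp_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
```

The write.csv step is included only because a .csv file was asked for; the data frame returned by read.delim can be analyzed directly.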
Mr. Heiberger,

Thank you for the insight! I will try out your suggestion.

Best,

Spencer Brackett

On Wed, Dec 26, 2018 at 6:34 PM Richard M. Heiberger <rmh at temple.edu> wrote:

> I looked at the first file. It gives an option to download as TSV
> (tab separated values).
> That is the same as CSV except with tabs instead of commas.
> You do not need any external software to read it. Read the downloaded
> file directly into R.
>
> ...