Hi Bert,
Sorry, I was in a hurry to go home yesterday afternoon, so I just posted my
question and hoped to get some advice.
Here is what I had done yesterday before going home.
---------------------------------------------------------------
setwd("C:/Awork/VNTR/GETXdata/GTEx_genotypes")
# list.files() takes a regular expression, not a glob: match names ending in ".out"
file_list <- list.files(pattern = "\\.out$")
# read all 652 files into RStudio; NOT all files have the same number of rows
for (i in seq_along(file_list)) {
  assign(sub("\\.out$", "", file_list[i]),
         read.delim(file_list[i], header = FALSE, colClasses = "character"))
}
# the first file, GTEX_1117F, has the following format: one column, 19482 rows;
# 4 is a marker id and 25/48 is its marker value
# V1
# 4
# 25/48
# 201
# 2/2
# ...
# 648589
# None
# convert this one-column file into a two-column file as below, where the
# first column is the marker id and the second is the corresponding marker
# value for sample GTEX_1117F
# VNTRid GTEX_1117F
# 4 25/48
# 201 2/2
# ... ...
# 648589 None
for (i in seq_along(file_list)) {
  temp <- read.delim(file_list[i], header = FALSE, colClasses = "character")
  v <- temp$V1
  odd  <- seq(1, length(v) - 1, by = 2)  # positions of marker ids
  even <- seq(2, length(v), by = 2)      # positions of marker values
  sample_id <- sub("\\.out$", "", file_list[i])  # e.g. "GTEX-1117F"
  # pair each marker id with its value in one vectorized step
  output <- data.frame(VNTRid = v[odd], value = v[even], stringsAsFactors = FALSE)
  names(output)[2] <- sample_id
  # syntactic R object names cannot contain "-", so use "_" (e.g. GTEX_1117F)
  assign(gsub("-", "_", sample_id), output)
}
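As a sketch of an alternative to indexing odd and even positions, the one-column vector can be filled into a two-column matrix row by row, which puts ids in column 1 and values in column 2. The vector `v` below is a hypothetical toy stand-in for one file's contents, not real data.

```r
# Hypothetical toy input: one file's single column, alternating
# marker id and marker value (read as character).
v <- c("4", "25/48", "201", "2/2", "648589", "None")

# Guard against an odd number of lines (an id with no value).
stopifnot(length(v) %% 2 == 0)

# Fill row by row: odd positions land in column 1, even in column 2.
pairs <- matrix(v, ncol = 2, byrow = TRUE)
output <- data.frame(VNTRid = pairs[, 1], GTEX_1117F = pairs[, 2],
                     stringsAsFactors = FALSE)
output
```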
Yesterday, I intended to reshape the output above from long to wide format,
using VNTRid as the key.
Since not all files have the same number of rows, the reshaped files would have
different columns and would not bind correctly with rbind().
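For the wide layout, the unequal row counts can be handled by aligning every sample to the union of all marker ids before binding, so that missing markers become NA instead of shifting columns. A minimal sketch, assuming each sample's long table is held in a named list; the list `long`, its toy contents, and the value column name `geno` are hypothetical.

```r
# Hypothetical toy data: two samples whose files cover different markers.
long <- list(
  "GTEX-1A3MV" = data.frame(VNTRid = c("201", "238", "245"),
                            geno   = c("2/2", "3/4", "1/2"),
                            stringsAsFactors = FALSE),
  "GTEX-1A3MX" = data.frame(VNTRid = c("201", "245"),
                            geno   = c("1/1", "None"),
                            stringsAsFactors = FALSE)
)

# Union of all marker ids seen in any file; this fixes the column order.
all_ids <- unique(unlist(lapply(long, `[[`, "VNTRid"), use.names = FALSE))

# Look up each sample's genotype for every id (NA where absent), then stack;
# the list names become the row names (sample IDs).
wide <- do.call(rbind, lapply(long, function(d) d$geno[match(all_ids, d$VNTRid)]))
colnames(wide) <- all_ids
wide
```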
On my way to work this morning, I changed my mind: I will not reshape to wide
format, because I actually like the long format I generated. I will read in a
VNTR marker annotation file with VNTRid in the first column and the marker
location on the human chromosomes in the second column; this annotation file
should include all the VNTR markers. I know the VNTRids in the annotation file
are the same as the VNTRids in the 652 files I read in.
Do you know a good way to merge all those 652 two-column files?
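One common base-R approach is to fold merge() over the list of per-sample tables with Reduce(), using a full outer join on VNTRid, and then join the annotation on the left. A minimal sketch; the toy data frames, the list name `samples`, and the annotation column `location` are hypothetical stand-ins for the real objects.

```r
# Hypothetical toy stand-ins for the per-sample two-column data frames.
samples <- list(
  data.frame(VNTRid = c("201", "238", "245"), GTEX_1A3MV = c("2/2", "3/4", "1/2"),
             stringsAsFactors = FALSE),
  data.frame(VNTRid = c("201", "245"), GTEX_1A3MX = c("1/1", "None"),
             stringsAsFactors = FALSE)
)

# Full outer join on VNTRid, applied pairwise across the whole list.
merged <- Reduce(function(x, y) merge(x, y, by = "VNTRid", all = TRUE), samples)

# Hypothetical annotation table: VNTRid plus chromosome location.
annot <- data.frame(VNTRid = c("201", "238", "245"),
                    location = c("chr1:1000", "chr1:5000", "chr2:300"),
                    stringsAsFactors = FALSE)

# Keep every annotated marker, even one with no genotype in any file.
result <- merge(annot, merged, by = "VNTRid", all.x = TRUE)
result
```

With 652 samples this performs 651 successive merges; all = TRUE keeps markers that are missing from some files as NA rather than dropping them.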
Thank you,
Ding
# merge all 652 files into one file with VNTRid as the first column and the
# genotypes in columns 2 to 653, with the sample IDs as headers, so
From: Bert Gunter [mailto:bgunter.4567 at gmail.com]
Sent: Thursday, December 19, 2019 6:52 PM
To: Yuan Chun Ding
Cc: r-help at r-project.org
Subject: Re: [R] data reshape
Did you even make an attempt to do this? -- or would you like us to do all your
work for you?
If you made an attempt, show us your code and errors.
If not, we usually expect you to try on your own first.
If you have no idea where to start, perhaps you need to spend some more time
with tutorials to learn basic R functionality before proceeding.
Bert
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Thu, Dec 19, 2019 at 6:01 PM Yuan Chun Ding <ycding at coh.org> wrote:
Hi R users,
I have a folder (called genotype) with 652 files; the file names are
GTEX-1A3MV.out, GTEX-1A3MX.out, GTEX-1B8SF.out, etc. Each file contains only one
column of data, without a header, as below:
201
2/2
238
3/4
245
1/2
.....
983255
3/3
983766
None
20528 rows in total.
I need to read all those 652 files in the genotype folder and then reshape the
one column in each file as:
SampleID    201  238  245  ....  983255  983766
GTEX-1A3MV  2/2  3/4  1/2  ....  3/3     None
There are 10264 data columns plus the sample ID column, so 10265 columns in
total after reshaping.
After reading those 652 files and reshaping the one column in each file, I will
stack them with rbind(), giving a table of 652 rows (653 lines counting the
header) and 10265 columns.
Thank you,
Ding
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.