thr3ads.net - R help - [R] select portion of text file using R [Apr 2015]


Luigi Marongiu

2015-Apr-27 21:20 UTC

[R] select portion of text file using R

Dear Duncan,
thank you for your reply.
I tried to read the file using skip and nrows, but it did not work.
Here I am pasting the code I wrote and the head of the file I need to
read. Probably the error is due to the fact that the column "Well" has
duplicate values, but how can I add a column with unique row names? How
can I overcome this error?
Best regards
Luigi

CODE
raw.data<-read.table(
      mydata,
      header=TRUE,
      row.names=31,
      dec=".",
      sep="\t",
      skip = 30,
      nrows = 17281,
      row.names = 1:17281
    )


HEAD OF MYDATA
* Block Type = Array Card Block
* Calibration Background is expired = No
* Calibration Background performed on = 2014-12-02 11:27:49 AM PST
* Calibration FAM is expired = No
* Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
* Calibration ROI is expired = No
* Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
* Calibration ROX is expired = No
* Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
* Calibration Uniformity is expired = No
* Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
* Calibration VIC is expired = No
* Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
* Chemistry = TAQMAN
* Experiment Barcode
* Experiment Comments
* Experiment File Name = F:\2015-04-13 Gastro array 59 Luigi - plate 3.eds
* Experiment Name = 2015-04-13 171216
* Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
* Experiment Type = Comparative CT (ΔΔCT)
* Experiment User Name
* Instrument Name = 278882033
* Instrument Serial Number = 278882033
* Instrument Type = ViiA 7
* Passive Reference = ROX
* Quantification Cycle Method = Ct
* Signal Smoothing On = false
* Stage/ Cycle where Analysis is performed = Stage 3, Step 2

[Amplification Data]

Well \tCycle \tTarget \tName \tRn
\t1 \t1 \tAdeno 1 \t0.82
\t1 \t2 \tAdeno 1\ \t0.93
...
\t2 \t1 \tAdeno 2 \t0.78
...

On Mon, Apr 20, 2015 at 12:17 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 20/04/2015 3:28 AM, Luigi Marongiu wrote:
>> Dear all,
>> I have a flat file (tab delimited) derived from an excel file which is
>> subdivided in different parts: a first part is reporting metadata,
>> then there is a first spreadsheet indicated by [ ], then the actual
>> data and the second spreadsheet with the same format [ ] and then the
>> data.
>> How can I import such file using for instance read.table()?
>
> read.table() by itself can't recognize where the data starts, but it has
> arguments "skip" and "nrows" to control how much gets read.  If you
> don't know the values for those arguments, you can use readLines() to
> read the entire file, then use grep() to recognize your table data, and
> either re-read the file, or just extract those lines and read from them
> as a textConnection.
>
> Duncan Murdoch
>
>> Many thanks
>> regards
>> Luigi
>>
>> Here is a sample of the file:
>> * Experiment Barcode
>> * Experiment Comments
>> * Experiment File Name = F:\array 59
>> * Experiment Name = 2015-04-13 171216
>> * Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
>> ...
>> [Amplification Data]
>> Well    Cycle    Target Name    Rn    Delta Rn
>> 1    1    Adeno 1-Adeno 1    0.820    -0.051
>> 1    2    Adeno 1-Adeno 1    0.827    -0.042
>> 1    3    Adeno 1-Adeno 1    0.843    -0.025
>> 1    4    Adeno 1-Adeno 1    0.852    -0.015
>> 1    5    Adeno 1-Adeno 1    0.858    -0.008
>> 1    6    Adeno 1-Adeno 1    0.862    -0.002
>> ...
>> [Results]
>> Well    Well Position    Omit    Sample Name    Target Name    Task
>> Reporter    Quencher    RQ    RQ Min    RQ Max    CT    Ct Mean    Ct
>> SD    Quantity    Delta Ct Mean    Delta Ct SD    Delta Delta Ct
>> Automatic Ct Threshold    Ct Threshold    Automatic Baseline
>> Baseline Start    Baseline End    Efficiency    Comments    Custom1
>> Custom2    Custom3    Custom4    Custom5    Custom6    NOAMP
>> EXPFAIL
>> 1    A1    false    P17    Adeno 1-Adeno 1    UNKNOWN    FAM
>> NFQ-MGB                Undetermined                            false
>>  0.200    true    3    44    1.000    N/A                            N
>>    Y
>> 2    A2    false    P17    Adeno 40/41 EH-AIQJCT3    UNKNOWN    FAM
>> NFQ-MGB                Undetermined
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
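A minimal sketch of the readLines() + grep() + textConnection() route suggested above. It assumes each section starts with a bracketed marker line such as [Amplification Data] and that the table under it is tab-delimited with a header row. sample_lines is a made-up miniature standing in for readLines() on the real file, and read_section() is a hypothetical helper name, not an existing function.

```r
# Made-up miniature of the export; on the real file use readLines(path).
sample_lines <- c(
  "* Chemistry = TAQMAN",
  "* Passive Reference = ROX",
  "",
  "[Amplification Data]",
  "Well\tCycle\tTarget Name\tRn\tDelta Rn",
  "1\t1\tAdeno 1-Adeno 1\t0.820\t-0.051",
  "1\t2\tAdeno 1-Adeno 1\t0.827\t-0.042",
  "2\t1\tAdeno 2-Adeno 2\t0.780\t-0.060",
  "",
  "[Results]",
  "Well\tWell Position\tCT",
  "1\tA1\tUndetermined"
)

# Hypothetical helper: return one "[section]" as a data frame.
read_section <- function(lines, section) {
  trimmed <- trimws(lines)
  # Marker lines look like "[Something]".
  markers <- which(startsWith(trimmed, "[") & endsWith(trimmed, "]"))
  start <- which(trimmed == paste0("[", section, "]"))
  if (length(start) != 1) stop("section not found: ", section)
  # The section ends just before the next marker, or at the end of the file.
  later <- markers[markers > start]
  end <- if (length(later) > 0) later[1] - 1 else length(lines)
  if (end <= start) stop("section is empty: ", section)
  body <- lines[(start + 1):end]
  body <- body[nzchar(trimws(body))]   # drop blank lines
  con <- textConnection(body)
  on.exit(close(con))
  read.table(con, header = TRUE, sep = "\t", quote = "", comment.char = "",
             check.names = FALSE, stringsAsFactors = FALSE)
}

amp <- read_section(sample_lines, "Amplification Data")
str(amp)
```

The same call with "Results" reads the second table. Working from the lines already in memory means the file is read from disk only once.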

Duncan Murdoch

2015-Apr-27 22:45 UTC


[R] select portion of text file using R

On 27/04/2015 5:20 PM, Luigi Marongiu wrote:
> Dear Duncan,
> thank you for your reply,
> I tried to read the file using skip and nrows but it did not work.
What does that mean?  We might be able to be more help if you tell us
what happened when you tried the code below.
> Here i am pasting the code I wrote and the head of the file i need to
> read. Probably the error is due to the fact that the column "well" has
> duplication, but how can i add a row column with unique row names? How
> can I overcome this error?
> Best regards
> Luigi
> 
> CODE
> raw.data<-read.table(
>       mydata,
>       header=TRUE,
>       row.names=31,
That's a strange entry, in conflict with the entry below...
>       dec=".",
>       sep="\t",
>       skip = 30,
>       nrows = 17281,
>       row.names = 1:17281
... i.e. here.  If you don't have row names in the file, there's no need
to specify them numerically:  that would be the default.  So I'd drop
*both* cases where you give the row.names argument.

Also, I may have counted wrong, but it looks to me that the line with
the column headings is line 32, so you should have skip = 31.

Duncan Murdoch

>     )

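A sketch of the "re-read the file" route with Duncan Murdoch's two corrections applied: no row.names argument at all, and skip and nrows computed from the file rather than counted by hand (skip is one less than the header's line number, as with skip = 31 for a header on line 32). The file written with tempfile() is a made-up miniature, and header_line, last_line and next_marker are illustrative names only.

```r
# Made-up miniature of the export, written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "* Block Type = Array Card Block",
  "* Chemistry = TAQMAN",
  "",
  "[Amplification Data]",
  "Well\tCycle\tTarget Name\tRn",
  "1\t1\tAdeno 1-Adeno 1\t0.820",
  "1\t2\tAdeno 1-Adeno 1\t0.827",
  "2\t1\tAdeno 2-Adeno 2\t0.780",
  "",
  "[Results]",
  "Well\tWell Position\tCT"
), path)

all_lines <- readLines(path)
marker <- which(trimws(all_lines) == "[Amplification Data]")
# The header row is the first non-blank line after the marker.
header_line <- marker + which(nzchar(trimws(all_lines[-seq_len(marker)])))[1]
# The table ends at the last non-blank line before the next "[" marker.
next_marker <- grep("^\\[", all_lines)
next_marker <- next_marker[next_marker > marker][1]
last_line <- if (is.na(next_marker)) length(all_lines) else next_marker - 1
while (last_line > header_line && !nzchar(trimws(all_lines[last_line]))) {
  last_line <- last_line - 1
}

raw.data <- read.table(
  path,
  header = TRUE,
  sep = "\t",
  dec = ".",
  quote = "",
  skip = header_line - 1,           # lines before the header row
  nrows = last_line - header_line,  # data rows only
  check.names = FALSE,
  stringsAsFactors = FALSE
)
```

Without a row.names argument, read.table() numbers the rows itself, so repeated values in the Well column cause no trouble.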
jim holtman

2015-Apr-28 00:30 UTC


[R] select portion of text file using R

Try this.  It reads in all the data and discards the lines not required.
> # read in the data and delete lines not required
> data_in <- readLines(textConnection("HEAD OF MYDATA
+  * Block Type = Array Card Block
+  * Calibration Background is expired = No
+  * Calibration Background performed on = 2014-12-02 11:27:49 AM PST
+  * Calibration FAM is expired = No
+  * Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
+  * Calibration ROI is expired = No
+  * Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
+  * Calibration ROX is expired = No
+  * Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
+  * Calibration Uniformity is expired = No
+  * Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
+  * Calibration VIC is expired = No
+  * Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
+  * Chemistry = TAQMAN
+  * Experiment Barcode
+  * Experiment Comments
+  * Experiment File Name = F:\\2015-04-13 Gastro array 59 Luigi - plate 3.eds
+  * Experiment Name = 2015-04-13 171216
+  * Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
+  * Experiment Type = Comparative
+  * Experiment User Name
+  * Instrument Name = 278882033
+  * Instrument Serial Number = 278882033
+  * Instrument Type = ViiA 7
+  * Passive Reference = ROX
+  * Quantification Cycle Method = Ct
+  * Signal Smoothing On = false
+  * Stage/ Cycle where Analysis is performed = Stage 3, Step 2
+
+  [Amplification Data]
+
+  Well \tCycle \tTarget \tName \tRn
+  \t1 \t1 \tAdeno 1 \t0.82
+  \t1 \t2 \tAdeno 1 \t0.93
+  \t2 \t1 \tAdeno 2 \t0.78"))
> indx <- grep("Amplification Data", data_in) + 1
> data_in <- tail(data_in, -indx)  # delete lines
> read.table(text = data_in, header = TRUE, sep = '\t')
  Well Cycle Target     Name   Rn
1   NA     1      1 Adeno 1  0.82
2   NA     1      2 Adeno 1  0.93
3   NA     2      1 Adeno 2  0.78
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Duncan Mackay

2015-Apr-28 01:30 UTC


[R] select portion of text file using R

Hi Luigi

I think there may be problems with the \t in your sample not being an actual tab character (character code 9).

Therefore try

xlines <- readLines(textConnection("* Block Type = Array Card Block
* Calibration Background is expired = No
* Calibration Background performed on = 2014-12-02 11:27:49 AM PST
* Calibration FAM is expired = No
* Calibration FAM performed on = 2014-12-02 12:00:20 PM PST
* Calibration ROI is expired = No
* Calibration ROI performed on = 2014-12-02 11:20:40 AM PST
* Calibration ROX is expired = No
* Calibration ROX performed on = 2014-12-02 12:11:21 PM PST
* Calibration Uniformity is expired = No
* Calibration Uniformity performed on = 2014-12-02 11:43:43 AM PST
* Calibration VIC is expired = No
* Calibration VIC performed on = 2014-12-02 11:51:59 AM PST
* Chemistry = TAQMAN
* Experiment Barcode
* Experiment Comments
* Experiment File Name = F:\\2015-04-13 Gastro array 59 Luigi - plate 3.eds
* Experiment Name = 2015-04-13 171216
* Experiment Run End Time = 2015-04-13 18:07:57 PM PDT
* Experiment Type = Comparative CT (ΔΔCT)
* Experiment User Name
* Instrument Name = 278882033
* Instrument Serial Number = 278882033
* Instrument Type = ViiA 7
* Passive Reference = ROX
* Quantification Cycle Method = Ct
* Signal Smoothing On = false
* Stage/ Cycle where Analysis is performed = Stage 3, Step 2
Well  Cycle   Target  Name  Rn
  1   1   Adeno 1   0.82
  1   2   Adeno 1   0.93
  2   1   Adeno 2   0.78"))
# blank out the metadata lines (those starting with "*") and drop empty lines
xlines = sub("^\\*.*$", "", xlines)
xlines = xlines[nchar(xlines) > 0]
# strip leading white space, then drop the header line
xlines = sub("^[[:space:]]+", "", xlines)
xlines = xlines[-1]
# split each line on runs of white space into a character data frame
datc = data.frame(do.call(rbind, lapply(xlines, function(x)
  unlist(strsplit(x, "[[:space:]]+")))), stringsAsFactors = FALSE)
names(datc) = c("Well", "Cycle", "Target", "Name", "Rn")
dat = datc
# convert the numeric columns; as.character() guards against factors
for (j in c(1, 2, 4, 5)) dat[, j] = as.numeric(as.character(dat[, j]))

Regards

Duncan

Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mackay at northnet.com.au
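To illustrate the separator point: in the full export (see the sample in Luigi's first message) the Target Name values contain spaces, e.g. "Adeno 1-Adeno 1", so splitting on "[[:space:]]+" yields too many fields. Also, before R 4.0.0, data.frame() turned strings into factors by default, and as.numeric() on a factor returns level codes rather than values. A minimal sketch, assuming the fields are separated by real tab characters; xlines here is a made-up two-line sample.

```r
# Made-up data lines: real tabs between fields, spaces inside "Target Name".
xlines <- c(
  "1\t1\tAdeno 1-Adeno 1\t0.820\t-0.051",
  "1\t2\tAdeno 1-Adeno 1\t0.827\t-0.042"
)

# Splitting on any run of white space breaks "Adeno 1-Adeno 1" apart:
lengths(strsplit(xlines, "[[:space:]]+"))   # 7 7, not 5 5

# Splitting on the tab character keeps each field intact.
fields <- do.call(rbind, strsplit(xlines, "\t", fixed = TRUE))
dat <- as.data.frame(fields, stringsAsFactors = FALSE)
names(dat) <- c("Well", "Cycle", "Target Name", "Rn", "Delta Rn")

# type.convert() turns numeric-looking character columns into numbers
# and leaves the others as character (as.is = TRUE).
dat[] <- lapply(dat, type.convert, as.is = TRUE)
str(dat)
```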



Duncan Mackay

2015-Apr-29 02:09 UTC


[R] select portion of text file using R

Hi Luigi

If it is an Excel sheet, can you split it into sections and import them
that way? There are several ways to import Excel files.
If you only have a text file:

# The good news is that the file is tab delimited, although some columns
# are separated by multiple tabs
xlines <- readLines("G:/1/plate 2.txt")
# similar to the previous post: blank out lines starting with "[" or "*"
xlines = sub("^[\\[\\*]+.*$","", xlines)
xlines = xlines[nchar(xlines)>0]
# get the non-numeric rows, presumably the column headers
grep('^[^0-9]', xlines)
# presumably the header of the second group
xlines[386]
# first group
x1 <- xlines[2:385]
head(x1)
tail(x1)
strsplit(x1[1],"\t+")
dat1 = data.frame(do.call(rbind, lapply(x1, function(x)
  unlist(strsplit(x, "\t+")))), stringsAsFactors = FALSE)
dat1
# remove " "
dat1[,10] = sub(" ","", dat1[,10])
# split colours
data.frame(do.call(rbind, lapply(dat1[,4], function(x)
  unlist(strsplit(gsub("[RGB\\(\\)]+", "", x), ",")))))

If you have not got a good text editor, then get one; there are plenty of free
ones, not to mention shareware. It is very handy for viewing the separators.

You will have to split the file into sections based on grepping the non-numeric
lines. The method will be similar to the above for the different sections.
I suggest you read up on regular expressions; I use them every day in various
ways. See
?sub
and follow its cross-references as well as reading the page itself.

Duncan
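A sketch of the "split into sections" step. Instead of grepping the non-numeric lines, this variant splits on the bracketed marker lines ([Amplification Data], [Results]); that is a swapped-in technique which assumes every section starts with such a marker and holds one tab-delimited table with a header row. read_all_sections() is a hypothetical helper and sample_lines is made-up data.

```r
# Hypothetical helper: one data frame per "[Section Name]" block.
read_all_sections <- function(lines) {
  trimmed <- trimws(lines)
  is_marker <- startsWith(trimmed, "[") & endsWith(trimmed, "]")
  section_id <- cumsum(is_marker)   # 0 = metadata before the first marker
  out <- list()
  for (id in seq_len(max(c(0, section_id)))) {
    block <- lines[section_id == id]
    name <- trimws(block[1])
    name <- substr(name, 2, nchar(name) - 1)   # strip the brackets
    body <- block[-1]
    body <- body[nzchar(trimws(body))]          # drop blank lines
    if (length(body) < 2) next                  # need a header and a row
    con <- textConnection(body)
    out[[name]] <- read.table(con, header = TRUE, sep = "\t", quote = "",
                              comment.char = "", fill = TRUE,
                              check.names = FALSE, stringsAsFactors = FALSE)
    close(con)
  }
  out
}

sample_lines <- c(
  "* Chemistry = TAQMAN",
  "[Amplification Data]",
  "Well\tCycle\tTarget Name\tRn",
  "1\t1\tAdeno 1-Adeno 1\t0.820",
  "1\t2\tAdeno 1-Adeno 1\t0.827",
  "",
  "[Results]",
  "Well\tWell Position\tCT",
  "1\tA1\tUndetermined",
  "2\tA2\t31.5"
)
sections <- read_all_sections(sample_lines)
names(sections)
```

On the real file this would be read_all_sections(readLines("G:/1/plate 2.txt")), after which sections[["Results"]] holds the second table. fill = TRUE pads rows that have fewer fields than the header.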

-----Original Message-----
From: Luigi Marongiu [mailto:marongiu.luigi at gmail.com] 
Sent: Wednesday, 29 April 2015 08:06
To: Duncan Mackay
Subject: Re: [R] select portion of text file using R

Dear Duncan,
thank you for your reply. Please find attached the file I need to read
for further reference. I can't paste the content of the file because,
as you can see, the content is huge; I need to pass the path of the
file to a function that can screen the content of the file and select
the good part of it. The file is a flat-file version of an Excel
spreadsheet and is thus divided into different sections, each with an
entry part with comments that should be removed.
The actual script I have is:
raw.data<-read.table(
      plate,
      header=TRUE,
      row.names=1,
      dec=".",
      sep="\t",
      skip = 30,
      nrows = 17281,
      row.names = 1:17281
    )
where plate is the path to the file (my count of the entry section is
30, coming from the Calc spreadsheet version of the file).
The error I am getting is:
Error in read.table(plate, header = TRUE, row.names = 31, dec = ".", sep = "\t",  :
  formal argument "row.names" matched by multiple actual arguments
and when removing the row.names argument, the error becomes:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 17283 did not have 5 elements
so I reduced nrows to 17280, but then the error became:
Error in data[[rlabp]] : subscript out of bounds
How can I overcome these issues?
Best regards
Luigi
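The "line 17283 did not have 5 elements" message suggests that nrows = 17281 reaches past the end of the amplification table, for instance into the [Results] section. A sketch of one way to find where a table really ends, using base R's count.fields(). The file written with tempfile() is a made-up miniature whose table starts on line 1; on the real file, the same skip value used for read.table() would also be passed to count.fields().

```r
# Made-up miniature: a 5-column table followed by the next section.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Well\tCycle\tTarget Name\tRn\tDelta Rn",
  "1\t1\tAdeno 1-Adeno 1\t0.820\t-0.051",
  "1\t2\tAdeno 1-Adeno 1\t0.827\t-0.042",
  "[Results]",
  "Well\tWell Position\tOmit"
), path)

# Number of tab-separated fields on each line (blank lines are kept so
# that positions in n line up with line numbers).
n <- count.fields(path, sep = "\t", quote = "", comment.char = "",
                  blank.lines.skip = FALSE)
n   # 5 5 5 1 3

# The first line whose field count differs from the header's ends the table.
first_bad <- which(n != n[1])[1]
raw.data <- read.table(path, header = TRUE, sep = "\t", quote = "",
                       nrows = first_bad - 2,   # rows between header and first_bad
                       check.names = FALSE, stringsAsFactors = FALSE)
```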


