thr3ads.net - R help - [R] Matrix Question [Jun 2011]

Ben Ganzfried

2011-Jun-02 18:54 UTC

[R] Matrix Question

Hi,

First of all, I would like to introduce myself as I will probably have many
questions over the next few weeks and want to thank you guys in advance for
your help.  I'm a cancer researcher and I need to learn R to complete a few
projects.  I have an introductory background in Python.

My questions at the moment are based on the following sample input file:
*Sample_Input_File*
 characteristics_ch1.3  Stage: T1N0  Stage: T2N1  Stage: T0N0  Stage: T1N0  Stage: T0N3

"characteristics_ch1.3" is a column header in the input Excel file.

"T's" represent stage and "N's" represent degree of disease spreading.

I want to create output that looks like this:
*Sample_Output_File*
T     N
1     0
2     1
0     0
1     0
0     3

As it currently stands, my code is the following:

rm(list=ls())
source("../../functions.R")

uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv",
                      as.is = TRUE, row.names = 1)

##initial creation of curated dataframe
curated <-
initialCuratedDF(rownames(uncurated),template.filename="Sample_Template_File.csv")

##--------------------
##start the mappings
##--------------------


##title -> alt_sample_name
curated$alt_sample_name <- uncurated$title

#T
tmp <- uncurated$characteristics_ch1.3
tmp <- *??????*
curated$T <- tmp

#N
tmp <- uncurated$characteristics_ch1.3
tmp <- *??????*
curated$N <- tmp

write.table(curated, row.names=FALSE,
file="../curated/Sample_Output_File_curated_pdata.txt",sep="\t")

My question is the following:

What code gets me the desired output (replacing the *??????*'s above)?  I
want to: a) find the integer value one element to the right of "T"; and b)
find the integer value one element to the right of "N".  I've read the
regular expression tutorial for R, but could only figure out how to grab an
integer value if it is the only integer value in the row (i.e., more than one
integer value makes this basic regular expression unsuccessful).

Thank you very much for any help you can provide.

Sincerely,

Ben Ganzfried

	[[alternative HTML version deleted]]

David Winsemius

2011-Jun-02 19:33 UTC

[R] Regex Question: return digits after particular letters

On Jun 2, 2011, at 2:54 PM, Ben Ganzfried wrote:
> Hi,
>
> First of all, I would like to introduce myself as I will probably have many
> questions over the next few weeks and want to thank you guys in advance for
> your help.  I'm a cancer researcher and I need to learn R to complete a few
> projects.  I have an introductory background in Python.
>
> My questions at the moment are based on the following sample input file:
> *Sample_Input_File*
> characteristics_ch1.3  Stage: T1N0  Stage: T2N1  Stage: T0N0  Stage: T1N0  Stage: T0N3
>
I haven't quite figured out what your structure really is, and for
that you should learn to post the output of dput() on the R object,
but see if this helps:

 > stg <- c('Stage: T1N0', 'Stage: T2N1', 'Stage: T0N0',
 +          'Stage: T1N0', 'Stage: T0N3')
 > Tstg <- sub(".*T(\\d)N.", "\\1", stg)
 > Tstg
#[1] "1" "2" "0" "1" "0"
 > Nstg <- sub(".*T\\dN(\\d)", "\\1", stg)
 > Nstg
#[1] "0" "1" "0" "0" "3"
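
To tie the two sub() calls back to the curation script, here is a minimal sketch. It assumes every entry in the column has exactly the "Stage: TxNy" form shown above. The `uncurated` and `curated` data frames below are small hand-built stand-ins, not the poster's real objects, and `pat` is a name introduced only for this illustration. `[[:digit:]]+` is used instead of `\\d` so that multi-digit values would also be captured, and as.integer() turns the captured text into numbers.

```r
# Stand-in for the poster's data: one "Stage: TxNy" string per sample.
uncurated <- data.frame(
  characteristics_ch1.3 = c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0",
                            "Stage: T1N0", "Stage: T0N3"),
  stringsAsFactors = FALSE
)
# Empty stand-in for the curated data frame, one row per sample.
curated <- data.frame(row.names = seq_len(nrow(uncurated)))

# One pattern, two capture groups: group 1 is the T value, group 2 the N value.
pat <- ".*T([[:digit:]]+)N([[:digit:]]+).*"
tmp <- uncurated$characteristics_ch1.3
curated$T <- as.integer(sub(pat, "\\1", tmp))
curated$N <- as.integer(sub(pat, "\\2", tmp))

print(curated)
```

Note that sub() returns the input unchanged when the pattern does not match, so a malformed entry would become NA (with a warning) after as.integer() rather than silently producing a wrong number.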

> "characteristics_ch1.3" is a column header in the input Excel file.
>
> "T's" represent stage and "N's" represent degree of disease spreading.
>
> I want to create output that looks like this:
> *Sample_Output_File*
> T     N
> 1     0
> 2     1
> 0     0
> 1     0
> 0     3
>
> As it currently stands, my code is the following:
>

> # rm(list=ls())####----
AND PLEASE DON'T POST THAT CODE WITHOUT COMMENTING IT OUT.

I noticed it this time, but it is very aggravating to accidentally
wipe out hours of work while trying to offer help.

> source("../../functions.R")
>
> uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv",
>                       as.is = TRUE, row.names = 1)
>
> ##initial creation of curated dataframe
> curated <-
> initialCuratedDF(rownames(uncurated),template.filename="Sample_Template_File.csv")
>
> ##--------------------
> ##start the mappings
> ##--------------------
>
>
> ##title -> alt_sample_name
> curated$alt_sample_name <- uncurated$title
>
> #T
> tmp <- uncurated$characteristics_ch1.3
> tmp <- *??????*
> curated$T <- tmp
So here Tstg is tmp.
>
> #N
> tmp <- uncurated$characteristics_ch1.3
> tmp <- *??????*
> curated$N <- tmp
And Nstg is tmp.
> write.table(curated, row.names=FALSE,
> file="../curated/Sample_Output_File_curated_pdata.txt",sep="\t")
>
> My question is the following:
>
> What code gets me the desired output (replacing the *??????*'s above)?  I
> want to: a) Find the integer value one element to the right of "T"; and b)
> find the integer value one element to the right of "N".  I've read the
> regular expression tutorial for R, but could only figure out how to grab an
> integer value if it is the only integer value in the row (ie more than one
> integer value makes this basic regular expression unsuccessful).

Just surround it with a pattern, wrap the part you want in (), and refer
to it in the replacement with the "\\n" mechanism (n being the group number).
>
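
As a small illustration of that mechanism, the sketch below contrasts a "grab the digits" approach, which breaks once there are two numbers in the string, with a pattern that anchors on the preceding letter. The variable names (`x`, `fused`, `t_val`, `n_val`) are made up for this example.

```r
x <- "Stage: T2N1"

# Deleting every non-digit keeps both numbers, fused together.
fused <- gsub("[^0-9]", "", x)           # "21"

# Anchoring on the letter before the digits and capturing them with ()
# lets "\\1" return only that group; the surrounding .* discards the rest.
t_val <- sub(".*T([0-9]+).*", "\\1", x)  # "2", the digits right after "T"
n_val <- sub(".*N([0-9]+).*", "\\1", x)  # "1", the digits right after "N"

print(c(fused = fused, T = t_val, N = n_val))
```

The results are still character strings; wrap them in as.integer() if numbers are needed.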
> Thank you very much for any help you can provide.
>
> Sincerely,
>
> Ben Ganzfried
>
> 	[[alternative HTML version deleted]]

David Winsemius, MD
West Hartford, CT

Bill.Venables at csiro.au

2011-Jun-03 03:54 UTC

[R] Matrix Question

Here is one way you might do it.

> con <- textConnection("
+ characteristics_ch1.3  Stage: T1N0  Stage: T2N1
+ Stage: T0N0  Stage: T1N0  Stage: T0N3
+ ")
> txt <- scan(con, what = "")
Read 11 items
> close(con)
>
> Ts <- grep("^T", txt, value = TRUE)
> Ts <- sub("T([[:digit:]]+)N([[:digit:]]+)", "\\1x\\2", Ts)
> out <- do.call(rbind, strsplit(Ts, "x"))
> mode(out) <- "numeric"
> dimnames(out) <- list(rep("", nrow(out)), c("T", "N"))
>
> out
 T N
 1 0
 2 1
 0 0
 1 0
 0 3
>

Now you can print 'out' however you want it, e.g.
> sink("outfile.txt")
> out
> sink()
This is slightly more complex than it might be as I have allowed for the
possibility that your numbers have more than one digit.
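
For comparison, a possible alternative that also allows multi-digit values but skips the intermediate "x" separator is to pull both capture groups out in one pass with regexec() and regmatches(), both in base R. This is only a sketch under the assumption that the strings look like the sample data; `stg`, `m` and `out` are illustrative names.

```r
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0",
         "Stage: T1N0", "Stage: T0N3")

# regexec() records the full match plus each capture group;
# regmatches() extracts the matched text, one character vector per string.
m <- regmatches(stg, regexec("T([[:digit:]]+)N([[:digit:]]+)", stg))

# Drop the full match (element 1) and keep the two groups as integer columns.
out <- t(vapply(m, function(g) as.integer(g[2:3]), integer(2)))
colnames(out) <- c("T", "N")
print(out)
```

A string that does not match yields an empty character vector from regmatches(), so its row comes out as NA in both columns instead of raising an error.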

Bill Venables. 

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
