Dry, Jonathan R
2009-Sep-25 14:01 UTC
[R] Spliting columns, strings or reg exp returning substrings
Currently as the first column in a data frame I have string values in the format
xx_yy - I want to create a new column with just the substring xx (for each row
in turn). Three possible ways to do this might be (1) split the string by
'_' using strsplit and paste the first of the resulting variables into a
new column, but I have been unable to do this for each row of my data frame in
turn (trying to use apply); (2) split the column into two based on '_',
but I am not sure if this is possible; (3) use a regular expression to return
the substring up to the '_', but I am unsure how to make a regular
expression return the substring it matches to in R.
Any ideas on all three counts would be gratefully received.
--------------------------------------------------------------------------
AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}
Henrique Dallazuanna
2009-Sep-25 14:22 UTC
[R] Spliting columns, strings or reg exp returning substrings
Try this:
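DF <- data.frame(A = c('11_12', '22_23', '33_34'),
                 B = sample(3))

# 1) Using strsplit: split on "_" and keep the first piece of each element
transform(DF, C = sapply(strsplit(as.character(DF$A), "_"),
                         '[', 1))

# 2) Using substr: only works when the prefix is always two characters wide
transform(DF, C = substr(DF$A, 1, 2))

# 3) Using regex: delete everything from the first "_" onwards
transform(DF, C = gsub("_.*", "", DF$A))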
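On the third point of the question (getting a regular expression to return the substring it matches), the gsub() call above works by deleting the unwanted part. A minimal sketch of two ways to return the match itself follows, assuming a plain character vector `x` in the xx_yy format; the names `x`, `first_a` and `first_b` are made up for illustration.

```r
# Hypothetical example vector in the xx_yy format from the question
x <- c("11_12", "22_23", "33_34")

# (a) Capture group + backreference: keep only what the parentheses matched
first_a <- sub("^([^_]*)_.*$", "\\1", x)

# (b) regexpr() locates the first match, regmatches() extracts the matched text
first_b <- regmatches(x, regexpr("^[^_]+", x))

first_a
first_b
```

The two differ when a string has no match: sub() returns such a string unchanged, while regmatches() drops it, so the result of (b) can be shorter than the input.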
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Ista Zahn
2009-Sep-26 14:15 UTC
[R] Spliting columns, strings or reg exp returning substrings
The colsplit function in the reshape package does this really easily.
--ista
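For the second point of the question (splitting the column into two), a package-free sketch using only base R is shown below. It is not the colsplit approach mentioned above, and it assumes every value contains exactly one '_'. The column names `A1` and `A2` are made up for illustration.

```r
# Same toy data as in the earlier reply, kept as character rather than factor
DF <- data.frame(A = c("11_12", "22_23", "33_34"), B = 1:3,
                 stringsAsFactors = FALSE)

# Split each string on a literal underscore
parts <- strsplit(as.character(DF$A), "_", fixed = TRUE)

# Assumption: every value splits into exactly two pieces
stopifnot(all(lengths(parts) == 2))

# Stack the pieces into a two-column character matrix and attach both columns
m <- do.call(rbind, parts)
DF$A1 <- m[, 1]
DF$A2 <- m[, 2]
DF
```

The length check is there because rbind() would silently recycle pieces if some rows split into a different number of parts.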
Hello all
I have a data frame representing a matrix of data. For each of my variables
(rows) I want to scale the data between 0 (representing the minimum value in
that row) and 1 (representing the maximum value in that row). I was wondering
if there is a simple function anywhere that does this?
Jonathan
On 28-Sep-09 09:55:04, Dry, Jonathan R wrote:
> Hello all
> I have a data frame representing a matrix of data. For each of my
> variables (rows) I want to scale the data between 0 (representing
> the minimum value in that row) and 1 (representing the maximum value
> in that row). I was wondering if there is a simple function anywhere
> that does this?
> Jonathan

Example:

  set.seed(12345)
  X <- matrix(rnorm(50),ncol=5)
  X
  #               [,1]       [,2]       [,3]        [,4]       [,5]
  #  [1,]  0.5855288 -0.1162478  0.7796219  0.81187318  1.1285108
  #  [2,]  0.7094660  1.8173120  1.4557851  2.19683355 -2.3803581
  #  [3,] -0.1093033  0.3706279 -0.6443284  2.04919034 -1.0602656
  #  [4,] -0.4534972  0.5202165 -1.5531374  1.63244564  0.9371405
  #  [5,]  0.6058875 -0.7505320 -1.5977095  0.25427119  0.8544517
  #  [6,] -1.8179560  0.8168998  1.8050975  0.49118828  1.4607294
  #  [7,]  0.6300986 -0.8863575 -0.4816474 -0.32408658 -1.4130988
  #  [8,] -0.2761841 -0.3315776  0.6203798 -1.66205024  0.5674033
  #  [9,] -0.2841597  1.1207127  0.6121235  1.76773385  0.5831877
  # [10,] -0.9193220  0.2987237 -0.1623110  0.02580105 -1.3067988

  t(apply(X,1,function(x){(x-min(x))/(max(x)-min(x))}))
  #            [,1]      [,2]      [,3]      [,4]      [,5]
  #  [1,] 0.5637853 0.0000000 0.7197136 0.7456233 1.0000000
  #  [2,] 0.6750480 0.9170842 0.8380998 1.0000000 0.0000000
  #  [3,] 0.3058291 0.4601749 0.1337652 1.0000000 0.0000000
  #  [4,] 0.3451928 0.6508554 0.0000000 1.0000000 0.7817338
  #  [5,] 0.8986346 0.3454820 0.0000000 0.7552443 1.0000000
  #  [6,] 0.0000000 0.7272473 1.0000000 0.6373475 0.9049509
  #  [7,] 1.0000000 0.2578024 0.4558793 0.5329941 0.0000000
  #  [8,] 0.6071889 0.5829194 1.0000000 0.0000000 0.9767894
  #  [9,] 0.0000000 0.6846712 0.4368079 1.0000000 0.4227058
  # [10,] 0.2413400 1.0000000 0.7128445 0.8300101 0.0000000

with identical results if applied to

  Y <- as.data.frame(X)

Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Sep-09                                       Time: 11:11:23
------------------------------ XFMail ------------------------------
Try this,

library(ggplot2)
apply(matrix(10*rnorm(10),2), 1, ggplot2::rescale)

HTH,

baptiste
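Two practical points about the row-scaling replies may be worth a sketch. First, min-max scaling divides by max minus min, so a row whose values are all equal yields NaN. Second, as far as I know, rescale() in current releases is provided by the scales package rather than by ggplot2 itself. The base R sketch below avoids the package dependency. The helper name `rescale01` is made up, and mapping a constant row to 0 is an arbitrary choice, not something the thread prescribes.

```r
# Hypothetical helper: rescale one numeric vector to the interval [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant row: min-max scaling is undefined, so return 0 (arbitrary choice)
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Small made-up matrix; the second row is constant
X <- matrix(c( 1, 2, 3,
               5, 5, 5,
              -2, 0, 2), nrow = 3, byrow = TRUE)

# apply() over rows returns each result as a column, hence the t()
scaled <- t(apply(X, 1, rescale01))
scaled
```

Without the t(), the result comes back transposed, with one column per original row.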
Hello all
I am manipulating some data and wish to expand/unmerge (i.e. do the opposite of
aggregate) rows in a data matrix based on the values in a particular column and
a separator, e.g.
Col1 Col2
n1;n2 6
...separating by ";" becomes....
Col1 Col2
n1 6
n2 6
Any ideas?
Also can I do this based on values in two columns? EG:
Col1 Col2 Col3
n1;n2 ID1;ID2 6
...becomes....
Col1 Col2 Col3
n1 ID1 6
n2 ID2 6
?
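No reply to this question appears in the thread, so the following is only a sketch of one base R approach, assuming the delimited columns are character vectors and that, in each row, both columns split into the same number of pieces. The data and the names `s1`, `s2`, `idx` and `expanded` are made up for illustration.

```r
# Hypothetical data in the shape described in the question
DF <- data.frame(Col1 = c("n1;n2", "n3"),
                 Col2 = c("ID1;ID2", "ID3"),
                 Col3 = c(6, 7),
                 stringsAsFactors = FALSE)

# Split both delimited columns on a literal ";"
s1 <- strsplit(DF$Col1, ";", fixed = TRUE)
s2 <- strsplit(DF$Col2, ";", fixed = TRUE)

# Assumption: each row has the same number of pieces in both columns
stopifnot(identical(lengths(s1), lengths(s2)))

# Repeat every original row once per piece, then overwrite the split columns
idx <- rep(seq_len(nrow(DF)), times = lengths(s1))
expanded <- DF[idx, , drop = FALSE]
expanded$Col1 <- unlist(s1)
expanded$Col2 <- unlist(s2)
rownames(expanded) <- NULL
expanded
```

For the single-column case, the same steps apply with `s2` and its length check left out.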
Hi all - I have a data frame that I have sorted by a particular column, with
rownames set to a different variable. I wish to transpose this data frame,
naming columns by the rowname variable while keeping the sorted order as the
order of columns in the transposed table. However, t(DF) gives me a transposed
table whose columns are ordered alphabetically by the original rownames. Any
ideas how I can get around this?
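No reply to this question appears in the thread. For what it is worth, t() on its own keeps rows in their existing order, so the alphabetical ordering probably comes from another step. The sketch below uses made-up data and column names (`id`, `score`, `other`) to show the sort-then-transpose workflow, plus indexing by name to restore the order if something later disturbs it.

```r
# Hypothetical data: 'id' becomes the row names, 'score' is the sort key
DF <- data.frame(id = c("b", "c", "a"),
                 score = c(2, 1, 3),
                 other = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# Sort by score, then use id as row names and drop it from the data
sorted <- DF[order(DF$score), ]
rownames(sorted) <- sorted$id
sorted$id <- NULL

# t() keeps the existing row order, so the columns follow the sort, not the alphabet
tDF <- t(sorted)
colnames(tDF)

# If a later step reorders the columns, index by name to restore the sorted order
tDF <- tDF[, rownames(sorted), drop = FALSE]
```

Keeping only numeric columns before transposing matters here, because t() converts the data frame to a matrix and a single character column would turn every value into a string.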