thr3ads.net - R help - [R] Spliting columns, strings or reg exp returning substrings [Sep 2009]

Dry, Jonathan R

2009-Sep-25 14:01 UTC

[R] Spliting columns, strings or reg exp returning substrings

Currently, the first column of my data frame holds string values in the format
xx_yy. I want to create a new column containing just the substring xx for each
row. Three possible ways to do this might be:
(1) split the string on '_' using strsplit and put the first of the resulting
pieces into a new column, but I have been unable to do this for each row of my
data frame (trying to use apply);
(2) split the column into two based on '_', but I am not sure whether this is
possible;
(3) use a regular expression to return the substring up to the '_', but I am
unsure how to make a regular expression return the substring it matches in R.

Any ideas on all three counts would be gratefully received.

--------------------------------------------------------------------------
AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}

Henrique Dallazuanna

2009-Sep-25 14:22 UTC

[R] Spliting columns, strings or reg exp returning substrings

Try this:

DF <- data.frame(A = c('11_12', '22_23', '33_34'),
                 B = sample(3))

#1) Using strsplit
transform(DF, C = sapply(strsplit(as.character(DF$A), "_"),
                         '[', 1))

#2) Using substr (assumes the prefix is always exactly two characters long)
transform(DF, C = substr(DF$A, 1, 2))

#3) Using regex
transform(DF, C = gsub("_.*", "", DF$A))
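
On the third point in the question (getting a regular expression to return the text it matched), a minimal base R sketch follows. It assumes a current version of R, where regmatches() is available alongside regexpr(); the sample vector and the variable names are made up for illustration.

```r
# Hypothetical example data in the xx_yy format described in the question
x <- c("11_12", "22_23", "333_34")

# regexpr() locates the first match in each string;
# regmatches() then returns the matched text itself
m <- regexpr("^[^_]+", x)
prefix_match <- regmatches(x, m)

# Alternative: capture the prefix in a group and keep only that group
prefix_sub <- sub("^([^_]*)_.*$", "\\1", x)

print(prefix_match)
print(prefix_sub)
```

Note that regmatches() silently drops elements that do not match, so the sub() form is the safer choice when the result has to line up row by row with a data frame.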




-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

Ista Zahn

2009-Sep-26 14:15 UTC

[R] Spliting columns, strings or reg exp returning substrings

The colsplit function in the reshape package does this really easily.

--ista
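
For readers without the reshape package, the second option in the question (splitting one column into two) can also be sketched in base R. This is only an illustration: the column names first and second are made up, and it assumes every value contains exactly one '_'.

```r
# Example data in the xx_yy format; keep the strings as character, not factor
DF <- data.frame(A = c("11_12", "22_23", "33_34"), stringsAsFactors = FALSE)

# strsplit() returns one character vector per row;
# rbind stacks those vectors into a two-column character matrix
parts <- do.call(rbind, strsplit(DF$A, "_", fixed = TRUE))

DF$first  <- parts[, 1]
DF$second <- parts[, 2]
print(DF)
```

If some values have more or fewer pieces, rbind would recycle elements, so it is worth checking the number of pieces per row before relying on this.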

Dry, Jonathan R

2009-Sep-28 09:55 UTC

[R] Scaling data

Hello all

I have a data frame representing a matrix of data.  For each of my variables
(rows) I want to scale the data between 0 (representing the minimum value in
that row) and 1 (representing the maximum value in that row).  I was wondering
if there is a simple function anywhere that does this?

Jonathan




(Ted Harding)

2009-Sep-28 10:11 UTC

[R] Scaling data

On 28-Sep-09 09:55:04, Dry, Jonathan R wrote:
> Hello all
> I have a data frame representing a matrix of data.  For each of my
> variables (rows) I want to scale the data between 0 (representing
> the minimum value in that row) and 1 (representing the maximum value
> in that row).  I was wondering if there is a simple function anywhere
> that does this?
> Jonathan

Example:
set.seed(12345)
X <- matrix(rnorm(50),ncol=5)
X
#            [,1]       [,2]       [,3]        [,4]       [,5]
# [1,]  0.5855288 -0.1162478  0.7796219  0.81187318  1.1285108
# [2,]  0.7094660  1.8173120  1.4557851  2.19683355 -2.3803581
# [3,] -0.1093033  0.3706279 -0.6443284  2.04919034 -1.0602656
# [4,] -0.4534972  0.5202165 -1.5531374  1.63244564  0.9371405
# [5,]  0.6058875 -0.7505320 -1.5977095  0.25427119  0.8544517
# [6,] -1.8179560  0.8168998  1.8050975  0.49118828  1.4607294
# [7,]  0.6300986 -0.8863575 -0.4816474 -0.32408658 -1.4130988
# [8,] -0.2761841 -0.3315776  0.6203798 -1.66205024  0.5674033
# [9,] -0.2841597  1.1207127  0.6121235  1.76773385  0.5831877
#[10,] -0.9193220  0.2987237 -0.1623110  0.02580105 -1.3067988

t(apply(X, 1, function(x) (x - min(x)) / (max(x) - min(x))))
#           [,1]      [,2]      [,3]      [,4]      [,5]
# [1,] 0.5637853 0.0000000 0.7197136 0.7456233 1.0000000
# [2,] 0.6750480 0.9170842 0.8380998 1.0000000 0.0000000
# [3,] 0.3058291 0.4601749 0.1337652 1.0000000 0.0000000
# [4,] 0.3451928 0.6508554 0.0000000 1.0000000 0.7817338
# [5,] 0.8986346 0.3454820 0.0000000 0.7552443 1.0000000
# [6,] 0.0000000 0.7272473 1.0000000 0.6373475 0.9049509
# [7,] 1.0000000 0.2578024 0.4558793 0.5329941 0.0000000
# [8,] 0.6071889 0.5829194 1.0000000 0.0000000 0.9767894
# [9,] 0.0000000 0.6846712 0.4368079 1.0000000 0.4227058
#[10,] 0.2413400 1.0000000 0.7128445 0.8300101 0.0000000

with identical results if applied to Y <- as.data.frame(X)
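
The one-liner above divides by zero when a row is constant and returns NA for a whole row if any value is missing. A hedged sketch of a reusable variant follows; rescale01 is a made-up helper name rather than a function from any package, and mapping constant rows to 0 is an arbitrary choice made for illustration.

```r
# rescale01 is a hypothetical helper name, not a function from any package
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # A constant row has no spread; return 0 rather than dividing by zero
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Small illustrative matrix: row 2 is constant, row 3 has a missing value
X <- rbind(c(1, 2, 3), c(5, 5, 5), c(10, NA, 20))

# apply() over rows returns each result as a column, so transpose back
scaled <- t(apply(X, 1, rescale01))
print(scaled)
```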

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Sep-09                                       Time: 11:11:23
------------------------------ XFMail ------------------------------

baptiste auguie

2009-Sep-28 10:15 UTC

[R] Scaling data

Try this,

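library(ggplot2)

# rescale() maps a numeric vector onto [0, 1] by default. In later releases
# this function moved to the scales package (scales::rescale).
# apply() over rows returns each row as a column; wrap the call in t() to
# get the original orientation back.
apply(matrix(10*rnorm(10),2), 1, ggplot2::rescale)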

HTH,

baptiste


Dry, Jonathan R

2009-Sep-30 16:18 UTC

[R] Scaling data

Hello all

I am manipulating some data and wish to expand/unmerge (i.e. do the opposite of
aggregate) rows in a data matrix based on the values in a particular column and
a separator, e.g.

Col1	Col2
n1;n2	6

...separating by ";" becomes....

Col1	Col2
n1	6
n2	6

Any ideas?

Also can I do this based on values in two columns?  EG:

Col1	Col2	Col3
n1;n2	ID1;ID2	6

...becomes....

Col1	Col2	Col3
n1	ID1	6
n2	ID2	6
?
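
No reply to this question appears in the archive. One possible base R approach, offered only as a sketch, is to split each delimited column with strsplit() and repeat the remaining columns once per piece. The data frame below is invented to match the layout in the question, and the code assumes both delimited columns split into the same number of pieces in every row.

```r
# Hypothetical data in the layout shown in the question
DF <- data.frame(Col1 = c("n1;n2", "n3"),
                 Col2 = c("ID1;ID2", "ID3"),
                 Col3 = c(6, 7),
                 stringsAsFactors = FALSE)

# One character vector of pieces per row, for each delimited column
s1 <- strsplit(DF$Col1, ";", fixed = TRUE)
s2 <- strsplit(DF$Col2, ";", fixed = TRUE)

# Both columns must split into the same number of pieces in every row
stopifnot(identical(lengths(s1), lengths(s2)))

# Repeat the untouched column once per piece so the rows line up
n <- lengths(s1)
expanded <- data.frame(Col1 = unlist(s1),
                       Col2 = unlist(s2),
                       Col3 = rep(DF$Col3, times = n),
                       stringsAsFactors = FALSE)
print(expanded)
```

For the single-column case, drop s2 and the Col2 line; the rep() step is the same.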


Dry, Jonathan R

2009-Oct-01 14:04 UTC

[R] Maintaining sort order when transpose

Hi all - I have a data frame that I have sorted by a particular column, with
rownames set to a different variable.  I wish to transpose this data frame,
naming the columns by the rowname variable while keeping the sorted order as
the order of columns in the transposed table.  However, t(DF) gives me a
transposed table whose columns are ordered alphabetically by the original
rownames.  Any ideas how I can get around this?
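
No reply to this question appears in the archive. In base R, t() keeps rows in their existing order and does not sort them, so an alphabetical result most likely comes from another step (for instance, the sorted data frame never being assigned back). That diagnosis is a guess, since no code was posted. The sketch below uses invented data and column names to show the order surviving the transpose.

```r
# Hypothetical data: 'id' becomes the row names, 'score' is the sort key
DF <- data.frame(id = c("b", "c", "a"),
                 score = c(3, 1, 2),
                 value = c(30, 10, 20))

# Sort by score and keep the result, then set row names from the id column
sorted <- DF[order(DF$score), ]
rownames(sorted) <- sorted$id

# Transpose only the numeric columns so the result stays a numeric matrix
tDF <- t(sorted[, c("score", "value")])
print(tDF)
```

The columns of tDF come out in sorted order (c, a, b) and not alphabetically.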

