Dry, Jonathan R
2009-Sep-25 14:01 UTC
[R] Spliting columns, strings or reg exp returning substrings
Currently, the first column of my data frame holds string values in the format xx_yy. I want to create a new column containing just the substring xx for each row. Three possible ways to do this might be:

(1) split the string on '_' using strsplit and put the first of the resulting pieces into a new column, but I have been unable to do this for each row of my data frame in turn (trying to use apply);
(2) split the column into two based on '_', but I am not sure if this is possible;
(3) use a regular expression to return the substring up to the '_', but I am unsure how to make a regular expression return the substring it matches in R.

Any ideas on all three counts would be gratefully received.

--------------------------------------------------------------------------
AstraZeneca UK Limited is a company incorporated in Engl...{{dropped:21}}
Henrique Dallazuanna
2009-Sep-25 14:22 UTC
[R] Spliting columns, strings or reg exp returning substrings
Try this:

DF <- data.frame(A = c('11_12', '22_23', '33_34'), B = sample(3))

# 1) Using strsplit: split on "_" and keep the first piece of each element
transform(DF, C = sapply(strsplit(as.character(DF$A), "_"), '[', 1))

# 2) Using substr (assumes the prefix is always two characters long)
transform(DF, C = substr(DF$A, 1, 2))

# 3) Using regex: delete everything from the first "_" onwards
transform(DF, C = gsub("_.*", "", DF$A))

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
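On point (3) of the question, returning the matched substring itself: a minimal sketch in base R, assuming every value has at least one character before the underscore. regexpr() locates the first match and regmatches() extracts it; sub() with a capture group is an alternative. The data frame mirrors the one in the reply, and the column names C and D are illustrative only.

```r
# Example data, same shape as in the reply; A is kept as character
DF <- data.frame(A = c("11_12", "22_23", "33_34"), B = 1:3,
                 stringsAsFactors = FALSE)

# regexpr() finds the first run of non-underscore characters in each string;
# regmatches() returns the text that was actually matched.
DF$C <- regmatches(DF$A, regexpr("^[^_]+", DF$A))

# Alternative: capture the prefix in parentheses and substitute it
# for the whole string.
DF$D <- sub("^([^_]+)_.*$", "\\1", DF$A)

DF
```

Note that regmatches() silently drops elements with no match, so the assignment to DF$C would fail on a length mismatch if some value had nothing before the underscore; the sub() form returns such values unchanged instead.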
Ista Zahn
2009-Sep-26 14:15 UTC
[R] Spliting columns, strings or reg exp returning substrings
The colsplit function in the reshape package does this really easily.

--ista
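For readers without the reshape package, point (2) of the question (splitting the column into two) can be sketched in base R. This is an approximation of the idea, not colsplit's implementation; it assumes every string contains exactly one '_', and the column names first and second are hypothetical.

```r
DF <- data.frame(A = c("11_12", "22_23", "33_34"), B = 1:3,
                 stringsAsFactors = FALSE)

# Split each string on "_"; fixed = TRUE treats the separator literally.
parts <- strsplit(DF$A, "_", fixed = TRUE)

# Stack the pieces into a two-column character matrix, one row per string.
# This step assumes every string yields exactly two pieces.
pieces <- do.call(rbind, parts)

DF$first  <- pieces[, 1]   # hypothetical column names
DF$second <- pieces[, 2]
DF
```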
Hello all,

I have a data frame representing a matrix of data. For each of my variables (rows) I want to scale the data between 0 (representing the minimum value in that row) and 1 (representing the maximum value in that row). I was wondering if there is a simple function anywhere that does this?

Jonathan
On 28-Sep-09 09:55:04, Dry, Jonathan R wrote:
> Hello all
> I have a data frame representing a matrix of data. For each of my
> variables (rows) I want to scale the data between 0 (representing
> the minimum value in that row) and 1 (representing the maximum value
> in that row). I was wondering if there is a simple function anywhere
> that does this?
> Jonathan

Example:

  set.seed(12345)
  X <- matrix(rnorm(50),ncol=5)
  X
  #             [,1]       [,2]       [,3]        [,4]       [,5]
  #  [1,]  0.5855288 -0.1162478  0.7796219  0.81187318  1.1285108
  #  [2,]  0.7094660  1.8173120  1.4557851  2.19683355 -2.3803581
  #  [3,] -0.1093033  0.3706279 -0.6443284  2.04919034 -1.0602656
  #  [4,] -0.4534972  0.5202165 -1.5531374  1.63244564  0.9371405
  #  [5,]  0.6058875 -0.7505320 -1.5977095  0.25427119  0.8544517
  #  [6,] -1.8179560  0.8168998  1.8050975  0.49118828  1.4607294
  #  [7,]  0.6300986 -0.8863575 -0.4816474 -0.32408658 -1.4130988
  #  [8,] -0.2761841 -0.3315776  0.6203798 -1.66205024  0.5674033
  #  [9,] -0.2841597  1.1207127  0.6121235  1.76773385  0.5831877
  # [10,] -0.9193220  0.2987237 -0.1623110  0.02580105 -1.3067988

  t(apply(X,1,function(x){(x-min(x))/(max(x)-min(x))}))
  #            [,1]      [,2]      [,3]      [,4]      [,5]
  #  [1,] 0.5637853 0.0000000 0.7197136 0.7456233 1.0000000
  #  [2,] 0.6750480 0.9170842 0.8380998 1.0000000 0.0000000
  #  [3,] 0.3058291 0.4601749 0.1337652 1.0000000 0.0000000
  #  [4,] 0.3451928 0.6508554 0.0000000 1.0000000 0.7817338
  #  [5,] 0.8986346 0.3454820 0.0000000 0.7552443 1.0000000
  #  [6,] 0.0000000 0.7272473 1.0000000 0.6373475 0.9049509
  #  [7,] 1.0000000 0.2578024 0.4558793 0.5329941 0.0000000
  #  [8,] 0.6071889 0.5829194 1.0000000 0.0000000 0.9767894
  #  [9,] 0.0000000 0.6846712 0.4368079 1.0000000 0.4227058
  # [10,] 0.2413400 1.0000000 0.7128445 0.8300101 0.0000000

with identical results if applied to Y <- as.data.frame(X)

Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Sep-09                                       Time: 11:11:23
------------------------------ XFMail ------------------------------
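The same row-wise min-max scaling can be wrapped in a small reusable function. This is a sketch only: rescale_rows is a hypothetical helper name, not a function from any package, and it assumes numeric data without missing values. It also shows one way to handle a constant row, where max equals min and the formula would otherwise divide by zero.

```r
# Rescale each row of a numeric matrix or data frame to the range [0, 1].
# rescale_rows is a hypothetical helper, not part of any package.
rescale_rows <- function(x) {
  m <- as.matrix(x)
  lo <- apply(m, 1, min)
  hi <- apply(m, 1, max)
  span <- hi - lo
  # A constant row has span 0; use NA so the row comes out as NA, not NaN.
  span[span == 0] <- NA
  # Recycling runs down the columns, so element [i, j] uses lo[i] and span[i].
  (m - lo) / span
}

set.seed(12345)
X <- matrix(rnorm(50), ncol = 5)
S <- rescale_rows(X)
range(S)
```

Returning NA for constant rows is a design choice; returning 0 or 0.5 would be equally defensible depending on how the scaled values are used downstream.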
Try this,

library(ggplot2)
apply(matrix(10*rnorm(10),2), 1, ggplot2::rescale)

HTH,

baptiste
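One detail to watch with the apply(..., 1, f) pattern: when f returns a vector, apply() stores each row's result as a column, so the output has the transposed shape of the input (Ted's reply wraps the call in t() for this reason). A minimal base-R sketch follows; rescale01 is a hypothetical stand-in for the package function. As far as I know, current releases provide rescale in the scales package rather than in ggplot2, so the stand-in also avoids depending on either.

```r
# rescale01 is a hypothetical stand-in for a package rescaling function
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

m <- matrix(c( 1,  2,  3,  4,
              10, 30, 20, 40), nrow = 2, byrow = TRUE)

res <- apply(m, 1, rescale01)
dim(res)            # 4 x 2: each original row is now a column

scaled <- t(res)    # transpose back to the original 2 x 4 layout
dim(scaled)
```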
Hello all,

I am manipulating some data and wish to expand/unmerge (i.e. do the opposite of aggregate) rows in a data matrix based on the values in a particular column and a separator, e.g.

Col1    Col2
n1;n2   6

...separating by ";" becomes...

Col1    Col2
n1      6
n2      6

Any ideas? Also, can I do this based on values in two columns? E.g.:

Col1    Col2      Col3
n1;n2   ID1;ID2   6

...becomes...

Col1    Col2    Col3
n1      ID1     6
n2      ID2     6

?
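One possible approach in base R, offered as a sketch rather than an answer from the thread: split the delimited columns with strsplit(), repeat each row once per piece, and fill in the unpacked values. The second data row (n3, ID3, 7) is made up for illustration, and the pairing of n1 with ID1 assumes both columns split into the same number of pieces in every row.

```r
DF <- data.frame(Col1 = c("n1;n2", "n3"),
                 Col2 = c("ID1;ID2", "ID3"),
                 Col3 = c(6, 7),
                 stringsAsFactors = FALSE)

# Split both columns on ";"; each list element holds the pieces for one row.
p1 <- strsplit(DF$Col1, ";", fixed = TRUE)
p2 <- strsplit(DF$Col2, ";", fixed = TRUE)

# Pairing n1-ID1, n2-ID2 only makes sense if both columns yield the same
# number of pieces in every row, so check that first.
stopifnot(identical(lengths(p1), lengths(p2)))

# Repeat each row index once per piece, then overwrite the split columns.
idx <- rep(seq_len(nrow(DF)), lengths(p1))
out <- DF[idx, , drop = FALSE]
out$Col1 <- unlist(p1)
out$Col2 <- unlist(p2)
rownames(out) <- NULL
out
```

For the single-column case, drop p2 and the consistency check; the rest stays the same.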
Hi all,

I have a data frame that I have sorted by a particular column, with the rownames set to a different variable. I wish to transpose this data frame, naming the columns by the rowname variable while carrying the sorted order through to the order of the columns in the transposed table. However, t(DF) gives me a transposed table whose columns are ordered alphabetically by the original rownames. Any ideas how I can get around this?
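A small check of this on made-up data, as a sketch only: in base R, t() on a data frame keeps the row order as the column order, so if the columns come out alphabetical the reordering may be happening in another step (merge() and aggregate(), for example, sort by key). Indexing the transposed table by the sorted rownames makes the intended order explicit either way. All names here are illustrative.

```r
# Made-up data: 'id' becomes the rownames, 'score' is the sort key
DF <- data.frame(id    = c("b", "c", "a"),
                 score = c(2, 3, 1),
                 other = c(20, 30, 10),
                 stringsAsFactors = FALSE)

sorted <- DF[order(DF$score, decreasing = TRUE), ]   # row order: c, b, a
rownames(sorted) <- sorted$id
sorted$id <- NULL      # keep only numeric columns so t() returns numbers

tDF <- t(sorted)
colnames(tDF)          # follows the sorted row order, not the alphabet

# Make the order explicit, in case an earlier step reordered the columns.
tDF <- tDF[, rownames(sorted), drop = FALSE]
```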