thr3ads.net - R help - [R] substituting dots in the names of the columns (sub, gsub, regexpr) [Jul 2007]


8rino-Luca Pantani

2007-Jul-26 13:40 UTC

[R] substituting dots in the names of the columns (sub, gsub, regexpr)

Dear R users,
I have the following two problems, related to the functions sub, grep,
regexpr and similar.

The header of the file(s) I have to import looks like this:

c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
"P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

To get rid of spaces and symbols in the names of the columns,
I use read.table(... check.names=TRUE) and I get:
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.",
"Ks..m.s.", "SP.g..g.",
"P..m3.m3.", "theta1..g.g.", "theta2..g.g.",
"AWC..g.g.")

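For reference, the conversion described above can be reproduced without a file. This is only a sketch, assuming that read.table(check.names = TRUE) adjusts the names through make.names(), as its help page describes; the variable names raw and checked are illustrative and not from the original post.

```r
# Raw column headers as given in the post
raw <- c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
         "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

# make.names() replaces every character that is not allowed in a
# syntactic name (space, parentheses, slash, ...) with a dot
checked <- make.names(raw, unique = TRUE)
print(checked)
```

Each space, parenthesis and slash becomes one dot, which is where the double and trailing dots come from.
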
Now, my problem is to remove the trailing dots, as well as the double
dots, in order to get names like the following:
c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s",
"SP.g.g", "P.m3.m3",
"theta1.g.g", "theta2.g.g", "AWC.g.g")

I've searched the help pages for sub, regexpr and similar, and also
searched the help archives.
I understand that the dot is a special character, since
sub("..", ".", str)
[1] "..m."        "...g.cm3."   "...Mpa."     "...m.s."     "..g..g."
[6] "..m3.m3."    ".eta1..g.g." ".eta2..g.g." ".C..g.g."

Therefore I tried
sub("\\..", ".", str)
[1] "y.m."        "BD.g.cm3."   "PR.Mpa."     "Ks.m.s."     "SP...g."
[6] "P.m3.m3."    "theta1.g.g." "theta2.g.g." "AWC.g.g."
and I was surprised by the (to me) strange behaviour with
"SP.g..g."
which was changed to "SP...g."
And this is the first problem I cannot solve.

Then there's the problem of removing the trailing dot.
In
http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
I found a somewhat similar problem, but the solution does not work in this case,
since:
gsub("[.].*", "", str)
[1] "y"      "BD"     "PR"     "Ks"     "SP"     "P"      "theta1" "theta2"
[9] "AWC"
And this is the second problem.

Apart from these particular problems, I would like to learn more about regexp, sub
and so on, since each time
I have strings to manipulate, I must face my ignorance of
regular expressions and their syntax.

Is there any page with examples, where I can improve my knowledge and 
stop being frustrated each time I have to manipulate strings?

8rino

-- 
Ottorino-Luca Pantani, Università di Firenze
Dip. Scienza del Suolo e Nutrizione della Pianta
P.zle Cascine 28 50144 Firenze Italia
Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273 
OLPantani at unifi.it

Gabor Grothendieck

2007-Jul-26 14:06 UTC

Use \\. or [.] inside the quoted pattern to denote a literal dot (#1),
or use fixed = TRUE to remove the special meaning of the dot (#2), or
use a zero-width lookahead assertion (?=[.]), which has to match
but is not included in the text that gets replaced (#3).  Try ?regexpr .
Also, the links on the gsubfn home page (http://code.google.com/p/gsubfn/)
point to a number of good resources on regular expressions.

Str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.",
"Ks..m.s.", "SP.g..g.",
"P..m3.m3.", "theta1..g.g.", "theta2..g.g.",
"AWC..g.g.")

# 1
tmp <- gsub("[.]+", ".", Str)
sub("[.]+$", "", tmp)

# 2
tmp <- gsub("..", ".", Str, fixed = TRUE)
sub("[.]+$", "", tmp)

# 3 - both done at once using zero-width lookahead
gsub("[.]*$|[.]*(?=[.])", "", Str, perl = TRUE)
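
As a quick check, the three approaches can be run side by side. This is only a sketch: the names res1, res2, res3 and the test string x are illustrative additions, not part of the original reply. The last two lines show one difference worth knowing about: with fixed = TRUE the pattern ".." collapses pairs of dots only, so a run of three dots is not fully reduced.

```r
Str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# 1: collapse runs of dots, then drop trailing dots
res1 <- sub("[.]+$", "", gsub("[.]+", ".", Str))
# 2: replace each literal ".." by ".", then drop trailing dots
res2 <- sub("[.]+$", "", gsub("..", ".", Str, fixed = TRUE))
# 3: single pass with a zero-width lookahead
res3 <- gsub("[.]*$|[.]*(?=[.])", "", Str, perl = TRUE)

print(res1)
print(identical(res1, res2) && identical(res1, res3))

# Hypothetical name containing three consecutive dots
x <- "a...b."
run_collapsed  <- sub("[.]+$", "", gsub("[.]+", ".", x))             # "a.b"
pair_collapsed <- sub("[.]+$", "", gsub("..", ".", x, fixed = TRUE)) # "a..b"
```

For the column names in this thread the three results agree, because no name contains more than two dots in a row.
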



Felix Andrews

2007-Jul-26 14:15 UTC

Hi,

A dot in a regular expression matches any character, so you have to
escape each dot with a backslash, written \\ (the backslash itself has to be
escaped in the string, to confuse things...).
A plus symbol matches one or more repetitions of the preceding item.
A dollar symbol matches the end of a string.

So:

gsub("\\.$", "", gsub("\\.+", ".", str))
[1] "y.m"        "BD.g.cm3"   "PR.Mpa"     "Ks.m.s"     "SP.g.g"
[6] "P.m3.m3"    "theta1.g.g" "theta2.g.g" "AWC.g.g"

Learn more at ?regexp
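
The same rules also explain the two surprises in the original question. The following is a small sketch, not part of the original reply; the variable names are illustrative. It uses regmatches() to show what the pattern "\\.." actually matched (a literal dot followed by any character), and contrasts the greedy "[.].*" with a pattern anchored at the end of the string.

```r
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# First substring matched by "\\..": a literal dot plus ANY one character.
# In "SP.g..g." the first such pair is ".g", not "..".
first_hit <- regmatches(str, regexpr("\\..", str))
print(first_hit)

# Escaping both dots matches two literal dots only
two_dots <- sub("\\.\\.", ".", "SP.g..g.")   # "SP.g.g."

# "[.].*" matches from the first dot to the end of the string,
# so everything after the first dot is removed
greedy <- sub("[.].*", "", "y..m.")          # "y"

# Anchoring with $ removes only the dots at the very end
anchored <- sub("[.]+$", "", "y..m.")        # "y..m"
```
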

Felix



-- 
Felix Andrews
PhD candidate
Integrated Catchment Assessment and Management Centre
The Fenner School of Environment and Society
The Australian National University (Building 48A), ACT 0200
Beijing Bag, Locked Bag 40, Kingston ACT 2604
http://www.neurofractal.org/felix/
voice:+86_1051404394 (in China)
mobile:+86_13522529265 (in China)
mobile:+61_410400963 (in Australia)
xmpp:foolish.android at gmail.com
3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8
