thr3ads.net - R help - [R] How to remove square brackets, etc. from address strings? [May 2012]

Sabina Arndt

2012-May-22 10:08 UTC

[R] How to remove square brackets, etc. from address strings?

Hello,



I'd like to remove the individual pairs of square brackets along with
their content - plus the space directly after each closing bracket - from
address strings such as this:


  [Swidsinski, Alexander; Loening-Baucke, Vera; Lochs, Herbert] Charite 
Humboldt Univ, Innere Klin, D-10098 Berlin, Germany; [Hale, Laura P.] 
Duke Univ, Med Ctr, Dept Pathol, Durham, NC 27710 USA



I'd like to get the following result:



  Charite Humboldt Univ, Innere Klin, D-10098 Berlin, Germany; Duke Univ, Med
Ctr, Dept Pathol, Durham, NC 27710 USA



I tried


  address = gsub("(.*)[(.*)]", "\\2", address)



But this deletes everything from the first opening bracket to the last closing
bracket and leaves only the very last address:


  Duke Univ, Med Ctr, Dept Pathol, Durham, NC 27710 USA



How can I remove only the individual pairs of square brackets along with their
content?



Thank you very much in advance! 		 	   		  

Sarah Goslee

2012-May-22 12:39 UTC


Hi Sabina,

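You've run into two characteristics of regular expressions:
[ and ] are special characters (they delimit a character class), so they
must be escaped to match literal brackets
* is greedy by default: it matches as much as it can
Reading an introductory regular expression document will help with both of
those.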

Meanwhile:
> x <- "[Swidsinski, Alexander; Loening-Baucke, Vera; Lochs, Herbert] Charite Humboldt Univ, Innere Klin, D-10098 Berlin, Germany; [Hale, Laura P.] Duke Univ, Med Ctr, Dept Pathol, Durham, NC 27710 USA"
> x
[1] "[Swidsinski, Alexander; Loening-Baucke, Vera; Lochs, Herbert] Charite Humboldt Univ, Innere Klin, D-10098 Berlin, Germany; [Hale, Laura P.] Duke Univ, Med Ctr, Dept Pathol, Durham, NC 27710 USA"
> gsub("\\[.*?\\] ", "", x) # escape [ and ] and make * lazy instead of greedy
[1] "Charite Humboldt Univ, Innere Klin, D-10098 Berlin, Germany; Duke Univ, Med Ctr, Dept Pathol, Durham, NC 27710 USA"
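
To make the greedy/lazy difference concrete, here is a minimal sketch that runs three patterns against the same string. The variable names greedy, lazy and negated are only illustrative, and the negated character class is an alternative added here, not something suggested in the thread.

```r
x <- "[Swidsinski, Alexander; Loening-Baucke, Vera; Lochs, Herbert] Charite Humboldt Univ, Innere Klin, D-10098 Berlin, Germany; [Hale, Laura P.] Duke Univ, Med Ctr, Dept Pathol, Durham, NC 27710 USA"

# Greedy: .* runs from the first "[" to the last "] ",
# so everything except the final address is deleted.
greedy <- gsub("\\[.*\\] ", "", x)

# Lazy: .*? stops at the first "] " that follows each "[".
lazy <- gsub("\\[.*?\\] ", "", x)

# Negated class: [^]]* matches any run of characters other than "]",
# so a match can never extend past the closing bracket of its own pair.
negated <- gsub("\\[[^]]*\\] ", "", x)

print(greedy)
print(lazy)
print(identical(lazy, negated))
```

The greedy version reproduces the problem from the original question (only the Duke address survives), while the lazy and negated-class versions both keep the two addresses.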

Sarah


-- 
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org

arun

2012-May-22 15:42 UTC


Hi,

text <- "[Swidsinski, Alexander; Loening-Baucke, Vera; Lochs, Herbert] Charite Humboldt Univ, Innere Klin, D-10098 Berlin, Germany; [Hale, Laura P.] Duke Univ, Med Ctr, Dept Pathol, Durham, NC 27710 USA"

gsub("\\[.+?]","",text)
A.K.
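
One detail worth checking: the pattern above removes the brackets and their content but not the space that follows, which the original question also asked to drop. A small sketch of the difference, using a shortened sample string (the names no_space and with_space are illustrative assumptions):

```r
text <- "[Hale, Laura P.] Duke Univ, Med Ctr"

# Brackets and content are removed, but the space after "]" stays.
no_space <- gsub("\\[.+?]", "", text)

# Adding an optional space " ?" after the "]" removes it as well.
with_space <- gsub("\\[.+?] ?", "", text)

print(no_space)    # " Duke Univ, Med Ctr"
print(with_space)  # "Duke Univ, Med Ctr"
```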




Sabina Arndt

2012-May-25 20:31 UTC


Hello r-help members,

the solutions which Sarah Goslee and arun sent to me in such a prompt 
and helpful manner work well with the examples I cut from the data.frame 
I'm analyzing. Thank you very much for that!
I incorporated them into my R script, but unfortunately it still
doesn't work properly, and I have no idea why.
You see, I want to extract country names from the contents of 
tab-delimited text files. This is an example of the data I'm using: 
http://pastebin.com/mYZNDXg6
This is the script I'm using to import the data: 
http://pastebin.com/Z10UUH3z (It requires the text files to be in a 
folder which doesn't contain any other .txt files.)
This is the script I'm using to extract the country names: 
http://pastebin.com/G37fuPba
This is the string that's in the relevant field of the first record I'm 
working on:

[Engel, Kathrin M. Y.; Schroeck, Kristin; Schoeneberg, Torsten; Schulz, 
Angela] Univ Leipzig, Fac Med, Inst Biochem, Leipzig, Germany; [Teupser, 
Daniel; Holdt, Lesca Miriam; Thiery, Joachim] Univ Leipzig, Fac Med, 
Inst Lab Med Clin Chem & Mol Diagnost, Leipzig, Germany; [Toenjes, Anke; 
Kern, Matthias; Blueher, Matthias; Stumvoll, Michael] Univ Leipzig, Fac 
Med, Dept Internal Med, Leipzig, Germany; [Dietrich, Kerstin; Kovacs, 
Peter] Univ Leipzig, Fac Med, Interdisciplinary Ctr Clin Res, Leipzig, 
Germany; [Kruegel, Ute] Univ Leipzig, Fac Med, Rudolf Boehm Inst 
Pharmacol & Toxicol, Leipzig, Germany; [Scheidt, Holger A.; Schiller, 
Juergen; Huster, Daniel] Univ Leipzig, Fac Med, Inst Med Phys & Biophys, 
Leipzig, Germany; [Brockmann, Gudrun A.] Humboldt Univ, Inst Anim Sci, 
D-10099 Berlin, Germany; [Augustin, Martin] Ingenium Pharmaceut AG, 
Martinsried, Germany

This is the incorrect result my extraction script gives me for the first 
record:

> C1s[1]
 [1] "[ENGEL,  KATHRIN M. Y." "KRISTIN"                "TORSTEN"
 [4] "GERMANY"                "DANIEL"                 "LESCA MIRIAM"
 [7] "GERMANY"                "ANKE"                   "MATTHIAS"
[10] "MATTHIAS"               "GERMANY"                "KERSTIN"
[13] "GERMANY"                "GERMANY"                "[SCHEIDT, HOLGER A."
[16] "JUERGEN"                "GERMANY"                "HUMBOLDT"
[19] "GERMANY"

For some reason the first and sixth of the eight pairs of square brackets
are not removed. Do you understand why?
This is the result I'd like to get instead:

 > C1s[1]
  [1] "GERMANY"        "GERMANY"        "GERMANY"
  [4] "GERMANY"        "GERMANY"        "GERMANY"
  [7] "HUMBOLDT"        "GERMANY"

What am I doing wrong? What are the errors in my R-script?
Would anybody be so kind as to take a look and help me out, please?
Thank you very much in advance!

Faithfully yours,

Sabina Arndt
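
For reference, one possible way to go from such a field to country names is sketched below. It is not the script from the pastebin links (those are not reproduced in this thread), and it rests on two assumptions: addresses are separated by "; ", and the country is whatever follows the last comma of each address. The sample string is a shortened version of the record quoted above, and the names field, no_authors, addresses and countries are illustrative.

```r
field <- "[Engel, Kathrin M. Y.; Schroeck, Kristin] Univ Leipzig, Fac Med, Inst Biochem, Leipzig, Germany; [Brockmann, Gudrun A.] Humboldt Univ, Inst Anim Sci, D-10099 Berlin, Germany; [Augustin, Martin] Ingenium Pharmaceut AG, Martinsried, Germany"

# 1. Remove each "[...]" group and the space after it.
#    This has to happen before splitting, because the author lists
#    inside the brackets also contain "; ".
no_authors <- gsub("\\[[^]]*\\] ?", "", field)

# 2. Split into individual addresses on the remaining "; " separators.
addresses <- strsplit(no_authors, "; ", fixed = TRUE)[[1]]

# 3. Keep the text after the last comma of each address, upper-cased.
countries <- toupper(sub("^.*, *", "", addresses))

print(countries)
```

Under these assumptions every address in the sample yields "GERMANY". An address such as "Durham, NC 27710 USA" would yield "NC 27710 USA", so US records would need an extra step.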
