For tasks like this, you will probably want to import the data as
character data rather than as a factor, e.g.:
dat <- read.csv( "myfile.csv", header=FALSE, as.is=TRUE )
You can check what you have with the str() function.
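
A minimal sketch of that check, assuming a small inline stand-in for "myfile.csv" (the three names are the ones from the quoted question below; `csv_text`, `fac` and `chr` are names made up for illustration):

```r
# Hypothetical stand-in for "myfile.csv": one company name per line, no header.
csv_text <- "BOEING CO\nENGMANTAYLOR CO\nSAGINAW COUNTY INC"

# as.is = TRUE keeps the column as character instead of converting it to a factor.
dat <- read.csv(text = csv_text, header = FALSE, as.is = TRUE)

# str() reports the type of each column; V1 should show up as "chr".
str(dat)

# If a column has already been read in as a factor, it can be converted afterwards.
fac <- factor(dat$V1)
chr <- as.character(fac)
```

With the column stored as character, functions such as gsub() return plain strings that can be assigned straight back to dat$V1.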
On February 28, 2017 5:19:40 AM PST, Marc Schwartz <marc_schwartz at me.com> wrote:
>> On Feb 28, 2017, at 3:38 AM, Harshal Athawale <pgcim15.harshal at spjimr.org> wrote:
>>
>> I am new to R.
>>
>> I have a file that contains the names of companies.
>> 'data.frame': 494 obs. of 1 variable:
>> $ V1: Factor w/ 470 levels "3-d engineering corp",..: 293 134 339 359 143 399 122 447 398 384 ...
>>
>> Problem: I would like to remove "CO" (as it is the most frequent word). I
>> would like "CO" to be removed from BOEING CO --> BOEING but not from
>> SAGINAW *CO*UNTY INC.
>>
>>> text = c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC")
>>
>>> gsub(x = text, pattern = "CO", replacement = "")
>>
>> [1] "BOEING " "ENGMANTAYLOR " "SAGINAW UNTY INC"
>>
>> Thanks in advance.
>>
>> - Sam
>
>
>Hi,
>
>See ?regex and ?grep for some details and examples on how to construct
>the expression used for matching, as well as some of the references
>therein.
>
>In this case, you want to use something along the lines of:
>
>> gsub(" CO$", "", text)
>[1] "BOEING" "ENGMANTAYLOR" "SAGINAW COUNTY INC"
>
>where the "CO" is preceded by a space and followed by the "$", which is
>a special character that indicates the end of the string to be matched.
>
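
To make the effect of the anchor concrete, here is a small sketch comparing the anchored pattern above with a word-boundary variant. The extra entry "ACME CO INC" and the names `text2`, `trailing` and `whole_word` are made up for illustration; the word-boundary pattern is an alternative to consider, not part of the suggestion above.

```r
text <- c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC")

# Anchored pattern: removes " CO" only when it ends the string.
trailing <- gsub(" CO$", "", text)
print(trailing)

# Assumption: "CO" may also occur as a separate word in the middle of a name.
# "ACME CO INC" is a hypothetical entry added to show that case.
text2 <- c(text, "ACME CO INC")

# \b matches a word boundary, so "CO" is removed only as a whole word,
# together with any spaces in front of it; "COUNTY" is left alone.
whole_word <- gsub(" *\\bCO\\b", "", text2, perl = TRUE)
print(whole_word)
```

Which of the two fits better depends on whether "CO" ever appears anywhere other than the end of a name in the real data.
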
>Regards,
>
>Marc Schwartz
>
>