thr3ads.net - R help - [R] removing specified length of text after a period in dataframe of char's [Dec 2011]

Aidan Corcoran

2011-Dec-07 11:05 UTC

[R] removing specified length of text after a period in dataframe of char's

Dear all,

I'm trying to remove some text after the period (a decimal point) in
the data frame 'hi', below. This is one step in formatting a table. So
I would like, e.g.,
"2.0" to become "2"
and "5.3" to stay "5.3",
where the variable digordered contains the number of digits after the
decimal point that I would like to display, in the same order in which
the variables appear in hi. If it makes it easier to use, this info is
also contained in the data frame nam2. The numbers are recorded as
characters because I used format() to get a thousands separator,
which I also need.

The string manipulation functions in R generally don't seem to work
with matrices or data frames, so e.g. regexpr("\\.", hi[1,2]) works
but regexpr("\\.", hi) does not. Finding the location of the period and
then using substring() was the approach I was thinking of taking, but
this would seem to need for loops here. I was wondering if anyone
knows of an easier way.
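
A minimal sketch of one loop-free way to run a string function over every column of a character data frame. The data frame hi_demo and its values are made-up stand-ins for 'hi', and the sketch assumes every column is of type character.

```r
# Hypothetical stand-in for 'hi': a small data frame of character columns.
hi_demo <- data.frame(
  y2005 = c("35,662.0", "0.1", "4.3"),
  y2009 = c("61,240.0", "0.1", "5.2"),
  stringsAsFactors = FALSE
)

# regexpr() expects a character vector, so call it once per column.
# fixed = TRUE treats "." as a literal period rather than a regex wildcard;
# as.integer() drops the match attributes and keeps only the positions.
dot_pos <- sapply(hi_demo, function(col) {
  as.integer(regexpr(".", col, fixed = TRUE))
})

# Assigning into hi_demo[] keeps the data frame shape while replacing
# every column; here a trailing ".0" is stripped from each entry.
hi_demo[] <- lapply(hi_demo, function(col) sub("\\.0$", "", col))
```

Here dot_pos is an integer matrix with one column per column of hi_demo, so no explicit for loop is needed.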

Thanks very much for any help!

Aidan


digordered <- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
f <- structure(list(c("GDP (LCU,bn)", "GDP ($, bn)",
"GDP per capita (LCU)",
"Ratio to EZ GDP Per Cap", "Share of World GDP (Intl $, %)",
"Real GDP Growth (%)", "Population (mn)", "Unemployment Rate (%)",
"Ratio of Employed/Unemployed", "PPP Exchange Rate",
"Nominal Exchange Rate (LCU per $)",
"Inflation (%)", "Main Lending Rate to Private Sector (%)",
"Claims on Central Gov",
"Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
"Tier 1 Capital to RWA", "Return on Equity",
"Liquid Assets to ST Liabilities"
), `2005` = c(35662, 809, 32128, 0.1, 4.3, 9, 1110, 3.5, NA,
14.7, 44.1, 4, 10.8, 7, 15, 22835, NA, NA, NA, NA), `2009` = c(61240,
1265, 52163, 0.1, 5.2, 6.8, 1174, NA, NA, 16.8, 48.4, 10.9, 12.2,
14, 31, 47180, 13.6, 9, 10.8, 42.8), `2010` = c(75122, 1632,
63100, 0.1, 5.5, 10.1, 1191, NA, NA, 18.5, 45.7, 12, NA, 15,
39, 56787, 14.7, 9.9, 10.5, 41.1), `2011` = c(87455, 1843, 72461,
0.1, 5.7, 7.8, 1207, NA, NA, 19.6, NA, 10.6, NA, NA, NA, NA,
13.5, 9.3, 14.3, 35.8), `2012` = c(99459, 2013, 81313, 0.1, 5.9,
7.5, 1223, NA, NA, 20.5, NA, 8.6, NA, NA, NA, NA, NA, NA, NA,
NA)), .Names = c("", "2005", "2009",
"2010", "2011", "2012"), row.names = c(NA,
20L), class = c("cast_df", "data.frame"))

hi <- format(f, big.mark=",", scientific=FALSE)
regexpr("\\.", hi) # don't know how to get the location of "." in a data frame of chars


nam2 <- structure(list(var1 = c("GDP (LCU,bn)", "GDP ($, bn)",
"GDP per capita (LCU)",
"Ratio to EZ GDP Per Cap", "GDP per capita (Intl $)",
"EU GDP per capita (Intl $)",
"Share of World GDP (Intl $, %)", "Real GDP Growth (%)",
"Population (mn)",
"Unemployment Rate (%)", "Ratio of Employed/Unemployed",
"Employment (1000s)",
"Unemployment (1000s)", "PPP Exchange Rate",
"Nominal Exchange Rate (LCU per $)",
"Inflation (%)", "Main Lending Rate to Private Sector (%)",
"Claims on Central Gov",
"Claims on Private Sector", "Bank Assets", "Regulator Capital to RWA",
"Tier 1 Capital to RWA", "Return on Equity",
"Liquid Assets to ST Liabilities",
"Reserves"), digi = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0,
1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0)), .Names = c("var1", "digi"
), row.names = c("96", "97", "98", "110", "99", "100", "101",
"102", "103", "111", "112", "104", "105", "106", "107", "108",
"109", "114", "115", "113", "119", "120", "121", "122", "116"
), class = "data.frame")

Sarah Goslee

2011-Dec-07 12:05 UTC

Hi,

Example data is crucial, but small simple example data is even better.
I'm too lazy to figure out which bits I need from your data, so here's
a simple example of one way to approach your question. You could
use gsub() in very much the same manner if you need more complex
output.

> testdata <- data.frame(values=c(2.0, 5.3, 1.1), digits=c(0, 1, 2))
> testdata
  values digits
1    2.0      0
2    5.3      1
3    1.1      2

# a nice way that works on numbers
> apply(testdata, 1, function(x)sprintf(paste("%0.", x[2], "f", sep=""), x[1]))
[1] "2"    "5.3"  "1.10"

# a messy way that works on strings
> apply(testdata, 1, function(x)sub(paste("(^.*\\.\\d{", x[2], "})(\\d*)", sep=""), "\\1", x[1]))
[1] "2"   "5.3" "1.1"

Also note that the second method will not add zeros to pad out the
end. If you need that, I'd consider rearranging the order of your
steps so that you can use sprintf().
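
As a rough sketch of the string-based route for values that are already formatted text: trim_decimals() below is a hypothetical helper, not something from this thread. It truncates rather than rounds, does not pad with zeros, and treats zero digits as a special case so that the period itself is dropped too.

```r
# Hypothetical helper: keep at most `digits` decimals in number strings.
# x      : character vector such as "35,662.0"
# digits : number of decimals to keep for each element of x
trim_decimals <- function(x, digits) {
  stopifnot(length(x) == length(digits))
  unname(mapply(function(s, d) {
    if (is.na(s)) return(NA_character_)
    if (d == 0) {
      # Drop the period together with everything after it.
      sub("\\.[0-9]*$", "", s)
    } else {
      # Keep the period and the first d digits, drop any further digits.
      sub(paste0("(\\.[0-9]{", d, "})[0-9]*$"), "\\1", s)
    }
  }, x, digits))
}

trimmed <- trim_decimals(c("2.0", "5.3", "1.10", "35,662.0"), c(0, 1, 2, 0))
```

Strings with fewer decimals than requested are returned unchanged, which matches the note above about missing zero padding.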

Someone else might have a more flexible way too; I'd be interested to
see it. Unfortunately I don't think sprintf() has a way to insert a
thousands separator, or that would be a one-step solution.
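
One possible alternative, offered as a sketch rather than as something proposed in this thread: base R's formatC() accepts both a digits argument and a big.mark argument, so the still-numeric values can be rounded and given a thousands separator in one call. The names f_demo, digits_demo and fmt_col are made up for illustration, and missing values are not handled specially here.

```r
# Hypothetical numeric table: one row per variable, one column per year.
f_demo <- data.frame(y2005 = c(35662, 0.1, 4.3),
                     y2009 = c(61240, 0.1, 5.2))
digits_demo <- c(0, 1, 1)  # decimals wanted for each row

# Format one column, pairing each value with the digits for its row.
# formatC() is called with a single value and a single digits setting.
fmt_col <- function(col, digits) {
  unname(mapply(function(v, d) {
    formatC(v, format = "f", digits = d, big.mark = ",")
  }, col, digits))
}

# Apply the formatter to every column and rebuild a data frame of strings.
out <- as.data.frame(lapply(f_demo, fmt_col, digits = digits_demo),
                     stringsAsFactors = FALSE)
```

Because the formatting happens while the values are still numeric, no regular expressions are needed afterwards.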

Sarah


-- 
Sarah Goslee
http://www.stringpage.com
http://www.sarahgoslee.com
http://www.functionaldiversity.org
