thr3ads.net - R help - [R] R newbie | sapply and FUN error [May 2010]

egc

2010-May-20 21:42 UTC

[R] R newbie | sapply and FUN error

Greetings -

While I've used R a fair bit for basic statistical machinations, I've
not used it for data manipulation - I've used SAS for 20+ years (and
SAS really shines in data handling). So, I've started the process of
trying to figure out 'how to do in R what I can do in my sleep in SAS'
- specifically with respect to data manipulation. So, these are decidedly
'newbie' level questions.

So, starting very simple. Created a tiny example CSV file, which I
call test.csv.

Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8

Now, all I want to do is read it in, and derive a new variable which
is a Z-transform of 'cost'. Here is what I've tried so far:
> prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
> print(prices$cost);
So far, so good (being able to pull in the data is a good thing).

Now, while I'm sure there are lots of ways to do what I want, I'm
going to brute force it, by calculating column mean and column SD for
'cost', generate the Z-transformed value, and then add it to the
dataframe. However, here is where I'm having problems. After about an
hour of searching, I realized I need to use an 'apply' function to
apply a function (say, mean) to a column in a dataframe. But, I can't
seem to get it to work successfully (and this is the gist of the
question).

If I try
> result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);

Works perfectly.

But, if I simply change FUN=mean to FUN=sd, not so successful:

If I try
> result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
> print(result);
Throws the following error:

Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)

Further, If I try
> result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);
it prints 8 values corresponding to the value of each element of the
data set - meaning, it's treating prices$cost as a row vector. Which
makes no sense to me. What do I have to do to use prices$cost as the
first argument in the sapply call? If I can't, why not?
is.vector(prices$cost) shows TRUE, so why can't I take the mean over
the vector?

At any rate, I'll start from here. Being able to apply functions to
column(s) of a dataframe seems pretty fundamental, so I'd like to
start by understanding the basics.

Thanks in advance.

David Winsemius

2010-May-20 22:15 UTC

[R] R newbie | sapply and FUN error

On May 20, 2010, at 5:42 PM, egc wrote:
> Greetings -
>
> While I've used R a fair bit for basic statistical machinations, I've
> not used it for data manipulation - I've used SAS for 20+ years (and
> SAS really shines in data handling). So, I've started the process of
> trying to figure out 'how to do in R what I can do in my sleep in SAS'
> - specifically with respect to data manipulation. So, these are decidedly
> 'newbie' level questions.
>
> So, starting very simple. Created a tiny example CSV file, which I
> call test.csv.
>
> Loc,cost
> A,1
> C,3
> D,2
> F,3
> H,4
> K,3
> M,8
>
> Now, all I want to do is read it in, and derive a new variable which
> is a Z-transform of 'cost'. Here is what I've tried so far:
>
>> prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
>> print(prices$cost);
>
> So far, so good (being able to pull in the data is a good thing).
>
> Now, while I'm sure there are lots of ways to do what I want, I'm
> going to brute force it, by calculating column mean and column SD for
> 'cost', generate the Z-transformed value, and then add it to the
> dataframe. However, here is where I'm having problems. After about an
> hour of searching, I realized I need to use an 'apply' function to
> apply a function (say, mean) to a column in a dataframe. But, I can't
> seem to get it to work successfully (and this is the gist of the
> question).
>
> If I try
>
>> result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
>> print(result);

I suspect you are missing the easy way to do this:

mean( prices['cost'] )
>
>
> Works perfectly.
>
> But, if I simply change FUN=mean to FUN=sd, not so successful:
>
> If I try
>
>> result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
>> print(result);
>
Try:

result <- sd(prices['cost'])

R functions often expect to work on vectors without an explicit loop
or apply function.
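
A note for readers on newer versions of R: as far as I know, the data-frame method for mean() was removed in R 3.0.0 and sd() no longer accepts a data frame, so on a current installation the one-column data frame prices['cost'] would need to be replaced by the column vector itself. A minimal sketch, with the data rebuilt inline (an assumption standing in for test.csv):

```r
# Rebuild the example data inline (assumption: stands in for test.csv)
prices <- data.frame(Loc  = c("A", "C", "D", "F", "H", "K", "M"),
                     cost = c(1, 3, 2, 3, 4, 3, 8))

# prices$cost is a plain numeric vector, so no apply-style call is needed
cost_mean <- mean(prices$cost, na.rm = TRUE)
cost_sd   <- sd(prices$cost, na.rm = TRUE)

print(cost_mean)
print(cost_sd)
```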

> Throws the following error:
>
> Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)
>
> Further, If I try
>
>> result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
>> print(result);
>
> it prints 8 values corresponding to the value of each element of the
> data set - meaning, it's treating prices$cost as a row vector. Which
> makes no sense to me. What do I have to do to use prices$cost as the
> first argument in the sapply call?

Not use sapply. "sapply" generally will be used to produce a vector or
list as a result. If you only want a scalar, then it's not the right
tool.

> If I can't, why not?
> is.vector(prices$cost) shows TRUE, so why can't I take the mean over
> the vector?
>
> At any rate, I'll start from here. Being able to apply functions to
> column(s) of a dataframe seems pretty fundamental, so I'd like to
> start by understanding the basics.
>
> Thanks in advance.
>
David Winsemius, MD
West Hartford, CT

Marc Schwartz

2010-May-20 22:23 UTC

[R] R newbie | sapply and FUN error

First, welcome to R.

Second, you are using the argument 'MARGIN', which is actually used in
the apply() function, not in sapply(). Hence the error messages and arguably,
the unpredictable behavior.
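
To make the MARGIN point concrete, here is a small sketch (the data frame is rebuilt inline as an assumption, in place of test.csv). sapply() forwards any extra arguments to FUN; mean() accepts unknown extra arguments through '...', while sd() does not, which would explain why only the sd call fails.

```r
# Rebuild the example data inline (assumption: stands in for test.csv)
prices <- data.frame(Loc  = c("A", "C", "D", "F", "H", "K", "M"),
                     cost = c(1, 3, 2, 3, 4, 3, 8))

# sapply() walks over the elements of its first argument. A one-column data
# frame is a list with one element (the column), so FUN sees the whole column.
col_mean <- sapply(prices["cost"], mean, na.rm = TRUE)
col_sd   <- sapply(prices["cost"], sd, na.rm = TRUE)
print(col_mean)
print(col_sd)

# A bare vector is walked element by element, so mean() is called once per
# value and simply hands each value back.
per_element <- sapply(prices$cost, mean)
print(per_element)

# MARGIN is not an sapply() argument, so it is forwarded to FUN.
# sd() has no '...' to absorb it; capture the error message instead of stopping.
err <- tryCatch(sapply(prices["cost"], MARGIN = 2, FUN = sd, na.rm = TRUE),
                error = function(e) conditionMessage(e))
print(err)
```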

One of the key concepts with R, as opposed to SAS, is that in R, you take a
'holistic' view of objects, not an element-by-element view. So for many
operations, R's functions are 'vectorized', which means that they
can operate on an entire object (eg. a column in a data frame) with a single
function call.

So in this case:
> mean(prices$cost)
[1] 3.428571
> sd(prices$cost)
[1] 2.225395

gets you what you want. There is also more than one way of accessing the data.
For example:
> mean(prices[, "cost"])
[1] 3.428571
> mean(prices[["cost"]])
[1] 3.428571

and
> mean(prices["cost"])
    cost 
3.428571

Note that in the last example, the result is 'named'. Each of these
has to do with the structure of a data frame, which is covered in the manuals
and help files, for example: ?Extract and the 'See Also' links on that
page.
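
The difference between these access forms can be inspected directly. The sketch below rebuilds the data inline (an assumption, in place of test.csv) and uses hypothetical variable names. Note that, as far as I know, recent versions of R no longer have a data-frame method for mean(), so mean(prices["cost"]) may return NA with a warning there; the three vector forms are unaffected.

```r
# Rebuild the example data inline (assumption: stands in for test.csv)
prices <- data.frame(Loc  = c("A", "C", "D", "F", "H", "K", "M"),
                     cost = c(1, 3, 2, 3, 4, 3, 8))

by_dollar <- prices$cost        # numeric vector
by_matrix <- prices[, "cost"]   # numeric vector (a single column drops to a vector)
by_double <- prices[["cost"]]   # numeric vector
by_single <- prices["cost"]     # still a data frame, with one column

print(class(by_dollar))
print(class(by_matrix))
print(class(by_double))
print(class(by_single))

# The three vector forms hold exactly the same values
print(identical(by_dollar, by_matrix) && identical(by_dollar, by_double))
```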

There is no need to loop over each element in the column using one of the
*apply() functions.

If you have not, I would recommend reading An Introduction to R, which is
available via the main R web site in the Manuals section, and is also installed
with R on your computer.

Additionally, an excellent resource for folks coming from SAS to R, is available
at:

  http://RforSASandSPSSusers.com/

The authors have provided a terrific review of how one performs common
operations in R, that you are already comfortable doing in SAS.

HTH,

Marc Schwartz

Dennis Murphy

2010-May-21 00:57 UTC

[R] R newbie | sapply and FUN error

Hi:

To illustrate the idea of vectorization that the previous posters raised,
here's a quick example of finding the z-scores that you requested:

# Define a vectorized function to do the standardization - the argument
# x below is a vector. We'll keep it simple and ignore the possibility of
# missing values and other complications...
std <- function(x) (x - mean(x))/sd(x)

# Create a new column in the original data frame for the z-scores,
# where df is the name of your data frame...
df <- transform(df, zscore = std(df[, 'cost']))
df
  Loc cost     zscore
1   A    1 -1.0912993
2   C    3 -0.1925822
3   D    2 -0.6419407
4   F    3 -0.1925822
5   H    4  0.2567763
6   K    3 -0.1925822
7   M    8  2.0542104

transform() is a function used to add one or more columns to an existing
data frame, usually by applying some function to its existing columns. Since
a data frame can be indexed by its rows and columns, the comma before 'cost'
signifies that we are to choose the column of df named cost, and all rows.
(BTW, indexing is a very powerful feature of R that can be used to great
advantage in data processing.)

Also notice how the std() function takes advantage of the vector property of
its input argument by computing the mean and standard deviation in-line and
mapping the results to each element of the vector through the function
definition. It also implicitly applies the 'recycling rule', since mean(x)
and sd(x) are scalars that we are mapping to vectors. I find this more
intuitive than the 'SAS way'. It takes three lines to read in the data,
define the standardization function, apply it and attach it to the data
frame. How many lines of SAS code would this take?
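
For anyone who wants to run the whole thing end to end, here is a self-contained sketch. It assumes the data frame is called prices, as in the original post, and builds it inline rather than reading test.csv. The comparison with scale() from base R is an addition for cross-checking, not part of the approach above.

```r
# Rebuild the example data inline (assumption: stands in for test.csv)
prices <- data.frame(Loc  = c("A", "C", "D", "F", "H", "K", "M"),
                     cost = c(1, 3, 2, 3, 4, 3, 8))

# Vectorized standardization: mean(x) and sd(x) are single numbers that are
# recycled across every element of x
std <- function(x) (x - mean(x)) / sd(x)

# Inside transform(), columns can be referred to by name
prices <- transform(prices, zscore = std(cost))
print(prices)

# Cross-check against scale(), which centers and scales by the sample sd;
# it returns a one-column matrix, so flatten it to a plain vector
z_scale <- as.numeric(scale(prices$cost))
print(all.equal(prices$zscore, z_scale))
```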

HTH,
Dennis

