Dear list,

I'm trying to write a set of generic functions to make contingency tables from
data.frames. They already run fine (I'm learning R), but I think they can be
better.

I would like to filter the data.frame, i.e., eliminate all non-numeric variables,
but I don't know how to do it. Please help me.

Below is one of my functions ('er' refers to EasieR, because I'm trying
to build a package for myself and my students):

#2. Tables from data.frames
#2.1---er.table.df.br (User-defined breaks and right)------------
er.table.df.br <- function(df,
                           breaks = c('Sturges', 'Scott', 'FD'),
                           right = FALSE) {

  if (is.data.frame(df) != 'TRUE')
    stop('need "data.frame" data')

  dim_df <- dim(df)

  tmpList <- list()

  for (i in 1:dim_df[2]) {

    x <- as.matrix(df[ ,i])
    x <- na.omit(x)

    # Number of classes
    k <- switch(breaks[1],
                'Sturges' = nclass.Sturges(x),
                'Scott'   = nclass.scott(x),
                'FD'      = nclass.FD(x),
                stop("'breaks' must be 'Sturges', 'Scott' or 'FD'"))

    # Class limits: widen the range by 1% of the maximum on each side
    tmp      <- range(x)
    classIni <- tmp[1] - tmp[2]/100
    classEnd <- tmp[2] + tmp[2]/100
    R        <- classEnd-classIni
    h        <- R/k

    # Absolute frequency
    f <- table(cut(x, br = seq(classIni, classEnd, h), right = right))

    # Relative frequency
    fr <- f/length(x)

    # Relative frequency, %
    frP <- 100*(f/length(x))

    # Cumulative frequency
    fac <- cumsum(f)

    # Cumulative frequency, %
    facP <- 100*(cumsum(f/length(x)))

    fi   <- round(f, 2)
    fr   <- round(as.numeric(fr), 2)
    frP  <- round(as.numeric(frP), 2)
    fac  <- round(as.numeric(fac), 2)
    facP <- round(as.numeric(facP),2)

    # Table
    res <- data.frame(fi, fr, frP, fac, facP)
    names(res) <- c('Class limits', 'fi', 'fr', 'fr(%)', 'fac', 'fac(%)')
    tmpList <- c(tmpList, list(res))
  }
  names(tmpList) <- names(df)
  return(tmpList)
}

To try the function:
#a) Runs fine
y1=rnorm(100, 10, 1)
y2=rnorm(100, 58, 4)
y3=rnorm(100, 500, 10)
mydf=data.frame(y1, y2, y3)
#tbdf=er.table.df.br (mydf, breaks = 'Sturges', right=F)
#tbdf=er.table.df.br (mydf, breaks = 'Scott', right=F)
tbdf=er.table.df.br (mydf, breaks = 'FD', right=F)
print(tbdf)
#b) One of the problems (y4 is not numeric)
y1=rnorm(100, 10, 1)
y2=rnorm(100, 58, 4)
y3=rnorm(100, 500, 10)
y4=rep(letters[1:10], 10)
mydf=data.frame(y1, y2, y3, y4)
tbdf=er.table.df.br (mydf, breaks = 'Scott', right=F)
print(tbdf)
Could anyone give me a hint how to work around this?
PS: Excuse my bad English ;-)
--
Jose Claudio Faria
Brasil/Bahia/UESC/DCET
Estatistica Experimental/Prof. Adjunto
mails:
joseclaudio.faria at terra.com.br
jc_faria at uesc.br
jc_faria at uol.com.br

On 5/24/05, Jose Claudio Faria <joseclaudio.faria at terra.com.br> wrote:
> I would like to filter the data.frame, i.e, eliminate all not numeric variables.
> And I don't know how to make it: please, help me.

Try this:

  sapply(my.data.frame, is.numeric)

Also you might want to look up:

  ?match.arg
  ?stopifnot
  ?ncol
  ?sapply
  ?lapply
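
As a minimal sketch of how these suggestions might fit together: the logical vector returned by sapply() can be used to subset the columns, stopifnot() can replace the manual is.data.frame() check, and match.arg() can validate 'breaks'. The helper names keep_numeric and n_classes are hypothetical and not part of the original function; this is one possible arrangement, not the only one.

```r
# Hypothetical helper (assumption, not from the thread):
# keep only the numeric columns of a data.frame.
keep_numeric <- function(df) {
  stopifnot(is.data.frame(df))        # stops with an error if df is not a data.frame
  is_num <- sapply(df, is.numeric)    # named logical vector, one entry per column
  df[, is_num, drop = FALSE]          # drop = FALSE keeps a data.frame even for one column
}

# Hypothetical helper (assumption): validate 'breaks' with match.arg()
# and return the number of classes for a numeric vector x.
n_classes <- function(x, breaks = c("Sturges", "Scott", "FD")) {
  breaks <- match.arg(breaks)         # errors on any value outside the three choices
  switch(breaks,
         Sturges = grDevices::nclass.Sturges(x),
         Scott   = grDevices::nclass.scott(x),
         FD      = grDevices::nclass.FD(x))
}

# Data similar to case (b): y4 is a character (or factor) column.
set.seed(1)
mydf <- data.frame(y1 = rnorm(100, 10, 1),
                   y2 = rnorm(100, 58, 4),
                   y4 = rep(letters[1:10], 10))

num_df <- keep_numeric(mydf)          # y4 is dropped
print(names(num_df))

k <- sapply(num_df, n_classes, breaks = "Sturges")
print(k)
```

With the filter applied first, er.table.df.br(keep_numeric(mydf), breaks = 'Scott') should only see numeric columns. If a zero-column data.frame is possible, vapply(df, is.numeric, logical(1)) is a stricter alternative to sapply(), since it always returns a logical vector.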