thr3ads.net - R help - [R] Function hints [Jun 2006]

hadley wickham

2006-Jun-19 12:51 UTC

[R] Function hints

One of the recurring themes at the recent useR! conference was that
many people find it difficult to find the functions they need for a
particular task.  Sandy Weisberg suggested a small idea he would like
to see: a hints function that given an object, lists likely
operations.  I've done my best to implement this function using the
tools currently available in R, and my code is included at the bottom
of this email (I hope that I haven't just duplicated something already
present in R).  I think Sandy's idea is genuinely useful, even in the
limited form provided by my implementation, and I have already
discovered a few useful functions that I was unaware of.

While developing and testing this function, I ran into a few problems
which, I think, represent underlying problems with the current
documentation system.  These are typified by the results of running
hints on an object produced by glm (having class c("glm", "lm")).  I
have outlined (very tersely) some possible solutions.  Please note
that while these solutions are largely technological, the problem is
at heart sociological: writing documentation is no easier (and perhaps
much harder) than writing a scientific publication, but the rewards
are fewer.

Problems:

 * Many functions share the same description (eg. head, tail).
Solution: each rdoc file should only describe one method. Problem:
Writing rdoc files is tedious, there is a lot of information
duplicated between the code and the documentation (eg. the usage
statement) and some functions share a lot of similar information.
Solution: make it easier to write documentation (eg. documentation
inline with code), and easier to include certain common descriptions
in multiple methods (eg. new include command)

 * It is difficult to tell which functions are commonly
used/important. Solution: break down by keywords. Problem: keywords
are not useful at the moment.  Solution:  make better list of keywords
available and encourage people to use it.  Problem: people won't
unless there is a strong incentive, plus good keywording requires
considerable expertise (especially in building up the list).  This is
probably insoluble unless one person systematically keywords all of
the base packages.

 * Some functions aren't documented (eg. simulate.lm, formula.glm) -
typically, these are methods where the documentation is in the
generic.  Solution: these methods should all be aliased to the generic
(by default?), and R CMD check should be amended to check for this
situation.  You could also argue that this is a deficiency with my
function, and easily fixed by automatically referring to the generic
if the specific isn't documented.

 * It can't supply suggestions when there isn't an explicit method
(ie. .default is used); this makes it pretty useless for basic
vectors.  This may not really be a problem, as all possible operations
are probably too numerous to list.

 * Provides full name for function, when best practice is to use
generic part only when calling function.  However, getting precise
documentation may require the full name.  I do the best I can
(returning the generic if specific is alias to a documentation file
with the same method name), but this reflects a deeper problem that
the name you should use when calling a function may be different to
the name you use to get documentation.

 * Can only display methods from currently loaded packages.  This is a
shortcoming of the methods function, but I suspect it is difficult to
find S3 methods without loading a package.

Relatively trivial problems:

 * Needs wide display to be effective.  Could be dealt with by
breaking description in a sensible manner (there may already be R code
to do this.  Please let me know if you know of any.)

 * Doesn't currently include S4 methods.  Solution: add some more code
to wrap showMethods

 * Personally, I think sentence case is more aesthetically pleasing
(and more flexible) than title case.


Hadley


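hints <- function(x) {
	# Fetch the help.search database, building it first if necessary.
	# (.hsearch_db is an internal, unexported function of utils.)
	db <- eval(utils:::.hsearch_db())
	if (is.null(db)) {
		help.search("abcd!", rebuild=TRUE, agrep=FALSE)
		db <- eval(utils:::.hsearch_db())
	}

	base <- db$Base
	alias <- db$Aliases
	key <- db$Keywords

	# Methods for the classes of x, and the help file each is aliased to
	m <- all.methods(classes=class(x))
	m_id <- alias[match(m, alias[,1]), 2]
	# Keywords of each help file (collected, but not used in the output yet)
	keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

	# Use the help file's name if it has the same generic part as the
	# method, otherwise keep the full method name
	f.names <- cbind(m, base[match(m_id, base[,3]), 4])
	f.names <- unlist(lapply(seq_len(nrow(f.names)), function(i) {
		if (is.na(f.names[i, 2])) return(f.names[i, 1])
		a <- methodsplit(f.names[i, 1])
		b <- methodsplit(f.names[i, 2])

		if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
	}))

	# Two-column table of function name and description, sorted by name
	hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
	hints <- hints[order(tolower(hints[,1])), , drop=FALSE]
	hints <- rbind(c("--------", "---------------"), hints)
	rownames(hints) <- rep("", nrow(hints))
	colnames(hints) <- c("Function", "Task")
	hints[is.na(hints)] <- "(Unknown)"

	class(hints) <- "hints"
	hints
}

print.hints <- function(x, ...) print(unclass(x), quote=FALSE)

# Full names of the S3 methods found for the given classes, keeping only
# the first method per generic (so earlier classes take precedence)
all.methods <- function(classes) {
	methods <- do.call(rbind, lapply(classes, function(x) {
		m <- methods(class=x)
		# alternative, visible methods only: m[attr(m, "info")$visible]
		t(sapply(as.vector(m), methodsplit))
	}))
	rownames(methods[!duplicated(methods[,1]), , drop=FALSE])
}

# Split a method name at its last dot into generic name and class
methodsplit <- function(m) {
	parts <- strsplit(m, "\\.")[[1]]
	if (length(parts) == 1) {
		c(name=m, class="")
	} else {
		c(name=paste(parts[-length(parts)], collapse="."),
		  class=parts[length(parts)])
	}
}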
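
A small illustration of the splitting rule used by methodsplit in the code above. Because the name is cut at its last dot, a class name that itself contains a dot is split in the wrong place. The strip_class helper below is a hypothetical alternative, not part of the posted code; it assumes the class is already known, which is the case inside all.methods.

```r
# Split a method name at its last dot, as in the posted code.
methodsplit <- function(m) {
  parts <- strsplit(m, ".", fixed = TRUE)[[1]]
  if (length(parts) == 1) {
    c(name = m, class = "")
  } else {
    c(name = paste(parts[-length(parts)], collapse = "."),
      class = parts[length(parts)])
  }
}

methodsplit("summary.glm")       # name "summary", class "glm"
methodsplit("print")             # name "print", class ""
methodsplit("print.data.frame")  # name "print.data", class "frame"

# Hypothetical helper: when the class is known, remove it as a suffix
# instead of guessing where the generic name ends.
strip_class <- function(m, cls) {
  suffix <- paste0(".", cls)
  if (endsWith(m, suffix)) substr(m, 1, nchar(m) - nchar(suffix)) else m
}

strip_class("print.data.frame", "data.frame")  # "print"
```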

Duncan Murdoch

2006-Jun-19 14:39 UTC

[Rd] [R] Function hints

I've moved this from R-help to R-devel, where I think it is more 
appropriate, and interspersed comments below.



On 6/19/2006 8:51 AM, hadley wickham wrote:
> One of the recurring themes at the recent useR! conference was that
> many people find it difficult to find the functions they need for a
> particular task.  Sandy Weisberg suggested a small idea he would like
> to see: a hints function that given an object, lists likely
> operations.  I've done my best to implement this function using the
> tools currently available in R, and my code is included at the bottom
> of this email (I hope that I haven't just duplicated something already
> present in R).  I think Sandy's idea is genuinely useful, even in the
> limited form provided by my implementation, and I have already
> discovered a few useful functions that I was unaware of.
> 
> While developing and testing this function, I ran into a few problems
> which, I think, represent underlying problems with the current
> documentation system.  These are typified by the results of running
> hints on an object produced by glm (having class c("glm", "lm")).  I
> have outlined (very tersely) some possible solutions.  Please note
> that while these solutions are largely technological, the problem is
> at heart sociological: writing documentation is no easier (and perhaps
> much harder) than writing a scientific publication, but the rewards
> are fewer.
> 
> Problems:
> 
>  * Many functions share the same description (eg. head, tail).
> Solution: each rdoc file should only describe one method. Problem:
> Writing rdoc files is tedious, there is a lot of information
> duplicated between the code and the documentation (eg. the usage
> statement) and some functions share a lot of similar information.
> Solution: make it easier to write documentation (eg. documentation
> inline with code), and easier to include certain common descriptions
> in multiple methods (eg. new include command)
I think it's bad to document dissimilar functions in the same file, but 
similar related functions *should* be documented together.  Not doing 
this just adds to the burden of documenting them, and the risk of 
modifying only part of the documentation so that it is inconsistent. 
The user also gets the benefit of seeing a common description all at 
once, rather than having to decide whether to follow "See also" links.

Your solutions would both be interesting on their own merits regardless 
of the above.  We did decide to work on preprocessing directives for .Rd 
files at the R core meetings; some sort of include directive may be 
possible.

I don't think I would want complete documentation mixed with the 
original source, but it would certainly be interesting to have partial 
documentation there.  (Complete documentation is too long, and would 
make it harder to read the source without a dedicated editor that could 
hide it.  Though ESS users may see it as a reasonable requirement to 
have everyone use the same editor, I don't think it is.)  However, this 
is a lot of work, depending on infrastructure that is not in place.
>  * It is difficult to tell which functions are commonly
> used/important. Solution: break down by keywords. Problem: keywords
> are not useful at the moment.  Solution:  make better list of keywords
> available and encourage people to use it.  Problem: people won't
> unless there is a strong incentive, plus good keywording requires
> considerable expertise (especially in building up the list).  This is
> probably insoluble unless one person systematically keywords all of
> the base packages.
I think it is worse than that.  There are concepts in packages that just 
don't arise in base R, and hence there would be no keywords for them 
other than "misc", even if someone redesigned the current system. 
Keywording is hard, and it's not clear to me how to do much better than 
we currently do.

We do already have user-defined keywords (via \concept), but these are 
not widely used.
> 
>  * Some functions aren't documented (eg. simulate.lm, formula.glm) -
> typically, these are methods where the documentation is in the
> generic.  Solution: these methods should all be aliased to the generic
> (by default?), and R CMD check should be amended to check for this
> situation.  You could also argue that this is a deficiency with my
> function, and easily fixed by automatically referring to the generic
> if the specific isn't documented.
I'd say it's a deficiency of your function.  You might want to look at 
the code in get("?") and .helpForCall() to see how those functions work 
out things like

?simulate(x)

where x is an lm object.  (But notice that .helpForCall is an 
undocumented internal function; don't depend on its implementation 
working forever).
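
A minimal sketch of the fallback discussed here, assuming the set of documented topics is already available as a character vector (in hints() it could be taken from the alias table). The name help_topic_for and the documented vector are made up for illustration; this is not how ? or .helpForCall work internally.

```r
# Hypothetical helper: pick the help topic to show for an S3 method.
# Prefer the method's own page; otherwise fall back to the generic.
help_topic_for <- function(method, generic, documented) {
  if (method %in% documented) {
    method
  } else if (generic %in% documented) {
    generic
  } else {
    NA_character_
  }
}

# Toy list of documented topics (an assumption for this example)
documented <- c("simulate", "formula", "summary.glm")

help_topic_for("simulate.lm", "simulate", documented)  # falls back to "simulate"
help_topic_for("summary.glm", "summary", documented)   # keeps "summary.glm"
```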
>  * It can't supply suggestions when there isn't an explicit method
> (ie. .default is used); this makes it pretty useless for basic
> vectors.  This may not really be a problem, as all possible operations
> are probably too numerous to list.
> 
>  * Provides full name for function, when best practice is to use
> generic part only when calling function.  However, getting precise
> documentation may require the full name.
No, not if the call syntax above is used.

> I do the best I can
> (returning the generic if specific is alias to a documentation file
> with the same method name), but this reflects a deeper problem that
> the name you should use when calling a function may be different to
> the name you use to get documentation.
> 
>  * Can only display methods from currently loaded packages.  This is a
> shortcoming of the methods function, but I suspect it is difficult to
> find S3 methods without loading a package.
> 
> Relatively trivial problems:
> 
>  * Needs wide display to be effective.  Could be dealt with by
> breaking description in a sensible manner (there may already be R code
> to do this.  Please let me know if you know of any.)
I think strwrap() may do what you want.
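
For illustration, a short sketch of how strwrap() could break a long description for a narrow display. The description text and the width of 40 are arbitrary choices for this example, not values taken from the hints code.

```r
# A long description of the kind found in a help-file title
desc <- paste("Fitting generalized linear models, specified by a symbolic",
              "description of the linear predictor and the error distribution")

# Wrap to a 40-column target; continuation lines are indented by 2 spaces
wrapped <- strwrap(desc, width = 40, exdent = 2)
cat(wrapped, sep = "\n")
```

Each element of wrapped is one output line, so the result could be placed in the "Task" column one row per line.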
>  * Doesn't currently include S4 methods.  Solution: add some more code
> to wrap showMethods
> 
>  * Personally, I think sentence case is more aesthetically pleasing
> (and more flexible) than title case.
It's quite hard to go from existing title case to sentence case, because 
we don't have any markup to indicate proper names.  One would think it 
would be easier to go in the opposite direction, but in fact the same 
problem arises:  "van Beethoven" for example, not "Van Beethoven".
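
The difficulty can be seen with two naive converters. Both functions are hypothetical sketches written for this example; neither knows which words are proper names, so each gets "van Beethoven" wrong in its own way.

```r
# Naive sentence case: lower-case everything, then capitalise the first letter
to_sentence_case <- function(title) {
  lower <- tolower(title)
  paste0(toupper(substring(lower, 1, 1)), substring(lower, 2))
}

# Naive title case: capitalise the first letter of every word
to_title_case <- function(s) {
  words <- strsplit(s, " ", fixed = TRUE)[[1]]
  paste(paste0(toupper(substring(words, 1, 1)), substring(words, 2)),
        collapse = " ")
}

to_sentence_case("Fitting Generalized Linear Models")
# "Fitting generalized linear models"
to_sentence_case("Symphonies of van Beethoven")
# "Symphonies of van beethoven"   (proper name lost)
to_title_case("symphonies of van Beethoven")
# "Symphonies Of Van Beethoven"   ("van" wrongly capitalised)
```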

> 
> 
> Hadley
> 
> 
> hints <- function(x) {
I don't like the name "hints".  I think we already have too many ways 
into the help system:

help
?
help.search
apropos
etc.?

I like your function, but I'd rather see it attached to one of the 
existing help functions, probably help.search().  For example,

help.search(x)

could look for functions designed to work with the class of x, if it had 
one.  (There's some ambiguity here:  perhaps x contains a string, and I 
want help on that string.)
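
One way the ambiguity could be resolved is sketched below. The function search_target is hypothetical and not part of R; it only decides what a help.search(x)-style call would look up, treating a plain length-one string as a search pattern and anything else as an object whose class should be searched.

```r
# Hypothetical dispatcher: what should a help.search(x)-style call look for?
search_target <- function(x) {
  if (is.character(x) && length(x) == 1 && !is.object(x)) {
    # A bare string: assume the user wants help on that text
    list(kind = "pattern", value = x)
  } else {
    # Anything else: look for functions that work with its class
    list(kind = "class", value = class(x))
  }
}

search_target("regression")           # kind "pattern"
search_target(data.frame(a = 1:3))    # kind "class", value "data.frame"
```

Under this rule a user who really wants methods for character vectors would need some other way to ask, which is the ambiguity noted above.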

Anyway, thanks for your efforts on this so far; I hope we end up with 
something that can make it into the next release.

Duncan Murdoch
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Joerg van den Hoff

2006-Jun-19 16:14 UTC

[R] Function hints

hadley wickham wrote:
> [...]

just a feedback: that's a useful function, thank you.

but the problem is probably more general: frequently I do not really 
want to know what I generally can do with a data frame, for instance, 
but rather I would like to use `help.search' as I would use, say, Google 
(and with the same rate of success...).
but the actual `keywords' in the manpages seem insufficient and 
`help.search' does not allow full text search in the manpages (I can 
imagine why (1000 hits...), but without such a thing google, for 
instance, would probably not be half as useful as it is, right?) and 
there is no "sorting by relevance" in the `help.search' output, I think.
how this sorting could be achieved is a different question, of course.
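
As one possible answer, a very rough sketch of relevance sorting: score each page by how many of its words match the query and list the best matches first. The pages vector is an invented stand-in for the full text of a few help pages, and rank_pages is a hypothetical function; help.search itself does nothing like this.

```r
# Toy stand-in for the text of a few help pages (invented for illustration)
pages <- c(
  lm      = "fit linear models regression analysis of variance",
  glm     = "fit generalized linear models regression binomial poisson",
  strwrap = "wrap character strings to format paragraphs"
)

# Score each page by the number of its words that appear in the query,
# drop pages with no match, and sort the rest by decreasing score.
rank_pages <- function(query, pages) {
  terms <- strsplit(tolower(query), "[[:space:]]+")[[1]]
  words <- strsplit(tolower(pages), "[[:space:]]+")
  score <- vapply(words, function(w) sum(w %in% terms), numeric(1))
  names(score) <- names(pages)
  sort(score[score > 0], decreasing = TRUE)
}

rank_pages("generalized linear regression", pages)
# glm scores 3 and lm scores 2; strwrap is dropped
```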
