Displaying 20 results from an estimated 8000 matches similar to: "Identifying similar but not identical rows in a dataframe"
2008 Sep 12
1
match and incomparables
Hello,
I was playing around with the newly implemented 'incomparables' argument
in 'match' and realized the argument does not behave anything like I
expected. Can someone explain what is going on here? Sorry if I'm
misreading the documentation.
> match(1:3, 1:3, incomparables=1)
[1] NA 2 3 # This seems right, the 1 in 'x' is 'incomparable'
>
2016 Jun 08
1
Trivial patch for merge.Rd
Hi all,
After replying to r-help earlier today on the merge() related thread, I noted a trivial grammatical error in the description for the 'suffixes' argument in its help file.
A patch against the current SVN trunk version of merge.Rd in ..library/base/man is attached and pasted here:
--- merge1.Rd 2016-06-08 13:34:35.000000000 -0500
+++ merge2.Rd 2016-06-08 14:03:34.000000000
2004 Aug 19
0
suggesting a new feature for unique()
Dear R-devel,
May I suggest that a new feature be added to a couple of unique() methods?
Sometimes it's useful to have the indices of the original data that the
unique elements come from, so that the original data can be recreated from
the unique()ed data. I suggest that an `index' argument be added for
unique. Below is a suggested patch against
R/src/library/base/R/duplicated.R:
***
2009 Mar 30
1
duplicated fails to rise correct errors (PR#13632)
Full_Name: Wacek Kusnierczyk
Version: 2.8.0 and 2.10.0 r48242
OS: Ubuntu 8.04 Linux 32 bit
Submission from: (NULL) (129.241.110.161)
In the following code:
duplicated(data.frame(), incomparables=NA)
# Error in if (!is.logical(incomparables) || incomparables)
.NotYetUsed("incomparables != FALSE") :
# missing value where TRUE/FALSE needed
the raised error is clearly not the
2007 Nov 02
0
applying duplicated, unique and match to lists?
Dear R developers,
While improving duplicated.array() and friends and developing equivalents for the new ff package for large datasets I came across two questions:
1) is it safe to use duplicated.default(), unique.default() and match() on arbitrary lists? If so, we can speed up duplicated.array and friends considerably by using list() instead of paste(collapse="\r")
2) while
2010 Nov 17
2
Bug in agrep computing edit distance?
I posted this yesterday to r-help and Ben Bolker suggested reposting it
here...
Dickison, Daniel <ddickison <at> carnegielearning.com> writes:
>
> The documentation for agrep says it uses the Levenshtein edit distance,
> but it seems to get this wrong in certain cases when there is a
> combination of deletions and substitutions. For example:
>
> >
2010 Jan 18
0
Fix for bug in match()
Hello all,
I posted the following bug last week:
# These calls work correctly:
match(c("A", "B", "C"), c("A","C"), incomparables=NA) # okay
match(c("A", "B", "C"), "A") # okay
match("A", c("A", "B"), incomparables=NA) # okay
# This one causes R to hang:
match(c("A",
2012 Jan 19
1
bug in function 'agrep'
Dear R-users:
I am trying to use the 'agrep' function to search within text
strings. The max.distance parameter lets you control the degree of
approximation of the Levenshtein-based search. However, when I run
specific searches I do not always get the desired result, and I don't
know whether it is a bug or whether I don't understand the search
algorithm properly. For
2010 Nov 16
1
Bug in agrep computing edit distance?
The documentation for agrep says it uses the Levenshtein edit distance,
but it seems to get this wrong in certain cases when there is a
combination of deletions and substitutions. For example:
> agrep("abcd", "abcxyz", max.distance=1)
[1] 1
That should've been a no-match. The edit distance between those strings
is 3 (1 substitution, 2 deletions), but agrep matches
2006 Jan 05
1
Pb with agrep()
Happy new year everybody,
I'm getting the following while trying to use the agrep() function:
> pattern <- "XXX"
> subject <- c("oooooo", "oooXooo", "oooXXooo", "oooXXXooo")
> max <- list(ins=0, del=0, sub=0) # I want exact matches only
> agrep(pattern, subject, max=max)
[1] 4
OK
> max$sub <- 1 # One allowed
2012 Dec 11
1
How do you use agrep inside a loop
Hi all.
This is my first message at R-help...so I'm hoping I have some beginner's
luck and get some good help for my problem!
FYI I have just started using R recently so my knowledge of R is pretty
preliminary.
Okay here is what I need help with - I need to know how to use agrep in a
for loop.
I need to compare elements of a vector of names with other elements of the
same vector.
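
One way the question above could be approached is sketched here. The vector `name_vec`, its contents, and the `max.distance` value of 0.1 are illustrative assumptions; the original post's data is not shown in this excerpt. The loop runs agrep() once per element and drops the trivial match of each name against itself.

```r
# Hypothetical example data; the poster's actual vector is not shown.
name_vec <- c("John Smith", "Jon Smith", "Jane Doe", "John Smyth")

# For each name, collect the indices of the *other* names that agrep()
# considers an approximate match.
close_matches <- vector("list", length(name_vec))
for (i in seq_along(name_vec)) {
  # fixed = TRUE treats the name as a literal string, not a regex.
  hits <- agrep(name_vec[i], name_vec, max.distance = 0.1, fixed = TRUE)
  close_matches[[i]] <- setdiff(hits, i)  # remove the self-match
}
names(close_matches) <- name_vec
```

Each element of `close_matches` then holds positions in `name_vec`, so `name_vec[close_matches[[1]]]` would list the candidates for the first name. Note that agrep() looks for approximate substring matches, so a short name may also match inside longer ones.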
2010 Sep 17
0
Merging data frames on a variety of columns
Hello,
This is a semi-complicated question about comparing two datasets,
probably using merge, but I am open to other ideas. I have a large
frame of information about companies. It's over 30,000 rows and looks
something like...
df1 <-
identifier1 identifier2 name other_name year
H34 C56 ACME ACME_LTD 2001
H34
2008 May 29
1
help (using ?) does not handle trailing whitespace (PR#11537)
> ?agrep
>
Results in:
No documentation for 'agrep ' in specified packages and libraries:
you could try 'help.search("agrep ")'
There is white space after agrep, that ? doesn't ignore.
--please do not edit the information below--
Version:
platform = i486-pc-linux-gnu
arch = i486
os = linux-gnu
system = i486, linux-gnu
status =
major = 2
minor =
2008 Jun 02
0
(PR#11537) help (using ?) does not handle trailing whitespace
>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk>
>>>>> on Fri, 30 May 2008 22:34:28 +0100 (BST) writes:
BDR> I think it is ESS that is parsing this as a help
BDR> request (so it can divert it to an ESS buffer).
BDR> Looks like this is an ESS issue, not an R one.
yes, indeed, hence much more belonging the ESS-help
2008 May 08
1
[PATCH] Typo in 'unique' help page (PR#11401)
---
src/library/base/man/unique.Rd | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/src/library/base/man/unique.Rd b/src/library/base/man/unique.Rd
index a8397c7..4664a34 100644
--- a/src/library/base/man/unique.Rd
+++ b/src/library/base/man/unique.Rd
@@ -29,7 +29,7 @@ unique(x, incomparables = FALSE, \dots)
\item{x}{a vector or a data frame or an array or
2012 Sep 27
3
Keep rows in a dataset if one value in a column is duplicated
Hi,
I have a data set of observations by either one person or a pair of people.
I want to only keep the pair observations, and was using the code below
until it gave me the error " $ operator is invalid for atomic vectors". I am
just beginning to learn R, so I apologize if the code is really rough.
Basically I want to keep all the rows in the data set for which the value of
2010 Nov 09
1
agrep pmatch recursive???
Hello R Helpers,
Business - 64 bit windows 7, R 2.11.1
I am trying to match the character contents of one list, called 'exclude', to those of a second list, called 'dataset'
dataset is a list of file names with folder locations, and looks like this when called:
> dataset
[1] "A/10-10-29a-13.cdf" "A/10-10-29a-14.cdf" "A/10-10-29a-15.cdf"
2015 Jan 23
1
:: and ::: as .Primitives?
Hi,
On 01/23/2015 07:01 AM, luke-tierney at uiowa.edu wrote:
> On Thu, 22 Jan 2015, Michael Lawrence wrote:
>
>> On Thu, Jan 22, 2015 at 11:44 AM, <luke-tierney at uiowa.edu> wrote:
>>>
>>> For default methods there ought to be a way to create those so the
>>> default method is computed at creation or load time and stored in an
>>>
2010 Jan 16
2
Extracing only Unique Rows based on only 1 Column
To Whomever is Interested,
I have spent several days searching the web, help files, the R wiki
and the archives of this mailing list for a solution to this problem,
but nonetheless I apologize in advance if I have missed something
obvious.
The problem is this; I have a 5-column data frame with about 4.2
million rows, and want to create a new (and hopefully much smaller)
data frame that
2011 Oct 05
1
unique possible bug
Hi,
I am trying to read in a rather large list of transactions using the
arules library. It seems in the coerce method into the dgCmatrix, it
somewhere calls unique. Unique.c throws an error when n > 536870912;
however, when 4*n was modified to 2*n in 2004, the overflow protection
should have changed from 2^29 to 2^30, right? If so, how would I
change it in my copy? Do I have to recompile