thr3ads.net - similar to: "Identifying similar but not identical rows in a dataframe"

Displaying 20 results from an estimated 8000 matches similar to: "Identifying similar but not identical rows in a dataframe"

match and incomparables

2008 Sep 12

Hello, I was playing around with the newly implemented 'incomparables' argument in 'match' and realized the argument does not behave anything like I expected. Can someone explain what is going on here? Sorry if I'm misreading the documentation.

> match(1:3, 1:3, incomparables=1)
[1] NA 2 3
# This seems right, the 1 in 'x' is 'incomparable'
>

Trivial patch for merge.Rd

2016 Jun 08

Hi all, After replying to r-help earlier today on the merge() related thread, I noted a trivial grammatical error in the description for the 'suffixes' argument in its help file. A patch against the current SVN trunk version of merge.Rd in ..library/base/man is attached and pasted here:

--- merge1.Rd 2016-06-08 13:34:35.000000000 -0500
+++ merge2.Rd 2016-06-08 14:03:34.000000000

suggesting a new feature for unique()

2004 Aug 19

Dear R-devel, May I suggest that a new feature be added to a couple of unique() methods? Sometimes it's useful to have the indices of the original data that the unique elements come from, so that the original data can be recreated from the unique()ed data. I suggest that an `index' argument be added for unique. Below is a suggested patch against R/src/library/base/R/duplicated.R: ***

duplicated fails to rise correct errors (PR#13632)

2009 Mar 30

Full_Name: Wacek Kusnierczyk
Version: 2.8.0 and 2.10.0 r48242
OS: Ubuntu 8.04 Linux 32 bit
Submission from: (NULL) (129.241.110.161)

In the following code:

duplicated(data.frame(), incomparables=NA)
# Error in if (!is.logical(incomparables) || incomparables) .NotYetUsed("incomparables != FALSE") :
# missing value where TRUE/FALSE needed

the raised error is clearly not the

applying duplicated, unique and match to lists?

2007 Nov 02

Dear R developers, While improving duplicated.array() and friends and developing equivalents for the new ff package for large datasets I came across two questions: 1) is it safe to use duplicated.default(), unique.default() and match() on arbitrary lists? If so, we can speed up duplicated.array and friends considerably by using list() instead of paste(collapse="\r") 2) while

Bug in agrep computing edit distance?

2010 Nov 17

I posted this yesterday to r-help and Ben Bolker suggested reposting it here... Dickison, Daniel <ddickison <at> carnegielearning.com> writes: > > The documentation for agrep says it uses the Levenshtein edit distance, > but it seems to get this wrong in certain cases when there is a > combination of deletions and substitutions. For example: > > >

Fix for bug in match()

2010 Jan 18

Hello all, I posted the following bug last week:

# These calls work correctly:
match(c("A", "B", "C"), c("A","C"), incomparables=NA) # okay
match(c("A", "B", "C"), "A") # okay
match("A", c("A", "B"), incomparables=NA) # okay

# This one causes R to hang:
match(c("A",

Bug in the 'agrep' function

2012 Jan 19

Dear R-users: I am trying to use the 'agrep' function to search within text strings. The max.distance parameter controls how approximate the search is under the Levenshtein distance. However, when I run specific searches I do not always get the expected result, and I do not know whether this is a bug or whether I do not properly understand the search algorithm. For

Bug in agrep computing edit distance?

2010 Nov 16

The documentation for agrep says it uses the Levenshtein edit distance, but it seems to get this wrong in certain cases when there is a combination of deletions and substitutions. For example:

> agrep("abcd", "abcxyz", max.distance=1)
[1] 1

That should've been a no-match. The edit distance between those strings is 3 (1 substitution, 2 deletions), but agrep matches

Pb with agrep()

2006 Jan 05

Happy new year everybody, I'm getting the following while trying to use the agrep() function:

> pattern <- "XXX"
> subject <- c("oooooo", "oooXooo", "oooXXooo", "oooXXXooo")
> max <- list(ins=0, del=0, sub=0) # I want exact matches only
> agrep(pattern, subject, max=max)
[1] 4

OK

> max$sub <- 1 # One allowed

How do you use agrep inside a loop

2012 Dec 11

Hi all. This is my first message at R-help...so I'm hoping I have some beginner's luck and get some good help for my problem! FYI I have just started using R recently so my knowledge of R is pretty preliminary. Okay here is what I need help with - I need to know how to use agrep in a for loop. I need to compare elements of a vector of names with other elements of the same vector.

Merging data frames on a variety of columns

2010 Sep 17

Hello, This is a semi-complicated question about comparing two datasets, probably using merge, but I am open to other ideas. I have a large frame of information about companies. It's over 30,000 rows and looks something like...

df1 <-
identifier1 identifier2 name other_name year
H34 C56 ACME ACME_LTD 2001
H34

help (using ?) does not handle trailing whitespace (PR#11537)

2008 May 29

> ?agrep > Results in: No documentation for 'agrep ' in specified packages and libraries: you could try 'help.search("agrep ")' There is white space after agrep, that ? doesn't ignore. --please do not edit the information below-- Version: platform = i486-pc-linux-gnu arch = i486 os = linux-gnu system = i486, linux-gnu status = major = 2 minor =

(PR#11537) help (using ?) does not handle trailing whitespace

2008 Jun 02

>>>>> "BDR" == Prof Brian Ripley <ripley at stats.ox.ac.uk> >>>>> on Fri, 30 May 2008 22:34:28 +0100 (BST) writes: BDR> I think it is ESS that is parsing this as a help BDR> request (so it can divert it to an ESS buffer). BDR> Looks like this is an ESS issue, not an R one. yes, indeed, hence much more belonging the ESS-help

[PATCH] Typo in 'unique' help page (PR#11401)

2008 May 08

---
 src/library/base/man/unique.Rd | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/library/base/man/unique.Rd b/src/library/base/man/unique.Rd
index a8397c7..4664a34 100644
--- a/src/library/base/man/unique.Rd
+++ b/src/library/base/man/unique.Rd
@@ -29,7 +29,7 @@ unique(x, incomparables = FALSE, \dots)
 \item{x}{a vector or a data frame or an array or

Keep rows in a dataset if one value in a column is duplicated

2012 Sep 27

Hi, I have a data set of observations by either one person or a pair of people. I want to keep only the pair observations, and was using the code below until it gave me the error "$ operator is invalid for atomic vectors". I am just beginning to learn R, so I apologize if the code is really rough. Basically I want to keep all the rows in the data set for which the value of

agrep pmatch recursive???

2010 Nov 09

Hello R Helpers,

Business - 64 bit windows 7, R 2.11.1

I am trying to match the character contents of one list, called 'exclude', to those of a second list, called 'dataset'. dataset is a list of file names with folder locations, and looks like this when called:

> dataset
[1] "A/10-10-29a-13.cdf" "A/10-10-29a-14.cdf" "A/10-10-29a-15.cdf"

:: and ::: as .Primitives?

2015 Jan 23

Hi,

On 01/23/2015 07:01 AM, luke-tierney at uiowa.edu wrote:
> On Thu, 22 Jan 2015, Michael Lawrence wrote:
>
>> On Thu, Jan 22, 2015 at 11:44 AM, <luke-tierney at uiowa.edu> wrote:
>>>
>>> For default methods there ought to be a way to create those so the
>>> default method is computed at creation or load time and stored in an
>>>

Extracting only Unique Rows based on only 1 Column

2010 Jan 16

To Whomever is Interested, I have spent several days searching the web, help files, the R wiki and the archives of this mailing list for a solution to this problem, but nonetheless I apologize in advance if I have missed something obvious. The problem is this: I have a 5-column data frame with about 4.2 million rows, and want to create a new (and hopefully much smaller) data frame that

unique possible bug

2011 Oct 05

Hi, I am trying to read in a rather large list of transactions using the arules library. It seems in the coerce method into the dgCmatrix, it somewhere calls unique. Unique.c throws an error when n > 536870912; however, when 4*n was modified to 2*n in 2004, the overflow protection should have changed from 2^29 to 2^30, right? If so, how would I change it in my copy? Do I have to recompile
