thr3ads.net - R help - [R] Regular Expressions + Matrices [Aug 2012]

Fred G

2012-Aug-10 17:41 UTC

[R] Regular Expressions + Matrices

Hi all,

My code looks like the following:
inname = read.csv("ID_error_checker.csv", as.is=TRUE)
outname = read.csv("output.csv", as.is=TRUE)

# My algorithm is the following:
# for each row i in inname:
#   if the first string up to whitespace in inname$name[i]
#      equals the first string up to whitespace in inname$name[i + 1]
#   AND inname$ID[i] is NOT EQUAL to inname$ID[i + 1]:
#     copy these two rows to a new file

In other words, if the name (up to the first whitespace) in the first row
equals the name in the second row (and so on for the whole file), and the ID
in the first row does not equal the ID in the second row, copy both of these
rows in full to a new file. The only caveat is that I want a regular
expression to take not the full names, but just the first string up to the
first whitespace in the inname$name column (i.e., if row 1 has the name New
York Mets and row 2 has the name New York Yankees, I would want both of these
rows copied in full, since "New" is the same in both.)
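The "first string up to the first whitespace" part can be done with a single regular expression in base R. A minimal sketch, assuming the names are already a character vector (the vector `nm` and the result `first_word` are illustrative names, not from the post): `sub()` deletes everything from the first whitespace character to the end of each string.

```r
# Illustrative names taken from the sample data in this post
nm <- c("New York Mets", "New York Yankees", "Boston Redsox")

# Remove the first whitespace character and everything after it,
# leaving only the first "word" of each name
first_word <- sub("[[:space:]].*$", "", nm)

# first_word is c("New", "New", "Boston")
```

A name with no whitespace at all is returned unchanged, since the pattern does not match it.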

Here is some example data:
ID  NAME                  YEAR  SOURCE       NOTES
1   New York Mets         1900  ESPN
2   New York Yankees      1920  Cooperstown
3   Boston Redsox         1918  ESPN
4   Washington Nationals  2010  ESPN
5   Detroit Tigers        1990  ESPN

The desired output would be:
ID  NAME              YEAR  SOURCE
1   New York Mets     1900  ESPN
2   New York Yankees  1920  Cooperstown
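For comparison, the whole algorithm can also be written without an explicit loop by comparing each row with the next one using shifted vectors. This is a minimal sketch, assuming the data already sit in a data frame shaped like the sample (columns ID, NAME, YEAR, SOURCE; the empty NOTES column is left out). The names `first_word`, `pair_start`, `keep` and `result` are illustrative, and `tempfile()` stands in for whatever output path you actually want.

```r
# Sample data from the post, rebuilt inline
inname <- data.frame(
  ID     = 1:5,
  NAME   = c("New York Mets", "New York Yankees", "Boston Redsox",
             "Washington Nationals", "Detroit Tigers"),
  YEAR   = c(1900, 1920, 1918, 2010, 1990),
  SOURCE = c("ESPN", "Cooperstown", "ESPN", "ESPN", "ESPN"),
  stringsAsFactors = FALSE
)

# First string up to the first whitespace in each NAME
first_word <- sub("[[:space:]].*$", "", inname$NAME)

n <- nrow(inname)
# Element i is TRUE when row i and row i + 1 share a first word but differ in ID
pair_start <- first_word[-n] == first_word[-1] & inname$ID[-n] != inname$ID[-1]

# Flag both rows of every matching pair (row i and row i + 1)
keep <- c(pair_start, FALSE) | c(FALSE, pair_start)
result <- inname[keep, ]
print(result)

# Copy the matching rows in full to a new file (a temporary file here)
out_file <- tempfile(fileext = ".csv")
write.csv(result, out_file, row.names = FALSE)
```

With the sample data, `result` holds rows 1 and 2, matching the desired output above.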

Thanks so much!


arun

2012-Aug-10 18:01 UTC


[R] Regular Expressions + Matrices

Hi,

Try this:
dat1 <- read.table(text="
ID, NAME, YEAR, SOURCE
1, New York Mets, 1900, ESPN
2, New York Yankees, 1920, Cooperstown
3, Boston Redsox, 1918, ESPN
4, Washington Nationals, 2010, ESPN
5, Detroit Tigers, 1990, ESPN
", sep=",", header=TRUE, stringsAsFactors=FALSE, strip.white=TRUE)

# Row numbers whose NAME contains "New York"
index <- grep("New York.*", dat1$NAME)
dat1[index, ]
#  ID             NAME YEAR      SOURCE
#1  1    New York Mets 1900        ESPN
#2  2 New York Yankees 1920 Cooperstown
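Note that `grep()` matches the pattern anywhere in the string, so "New York.*" would also pick up a name that merely contains "New York" in the middle. A small sketch of anchoring the pattern to the start of the name with `^`, using the post's names plus one hypothetical extra entry ("Old New York Club", invented only to show the difference):

```r
# Names from the post plus one hypothetical entry for illustration
nm <- c("New York Mets", "New York Yankees", "Boston Redsox",
        "Old New York Club")

unanchored <- grep("New York.*", nm)  # matches "New York" anywhere
anchored   <- grep("^New York", nm)   # matches only at the start of the name

# unanchored is c(1, 2, 4); anchored is c(1, 2)
```

Either way the prefix is hard-coded; the original question asks for the first word of each row to be compared with the next row's, which the other replies address.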

A.K.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Rui Barradas

2012-Aug-10 18:17 UTC


[R] Regular Expressions + Matrices

Hello,

Try the following.


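d <- read.table(textConnection("
ID NAME                   YEAR SOURCE
1  'New York Mets'        1900 ESPN
2  'New York Yankees'     1920 Cooperstown
3  'Boston Redsox'        1918 ESPN
4  'Washington Nationals' 2010 ESPN
5  'Detroit Tigers'       1990 ESPN
"), header=TRUE)

d$NAME <- as.character(d$NAME)

# TRUE if rows i and i + 1 have different IDs but the same first word in NAME
fun <- function(i, x){
    if(x[i, "ID"] != x[i + 1, "ID"]){
        # First string up to the first whitespace in each of the two names
        s1 <- unlist(strsplit(x[i, "NAME"], "[[:space:]]"))[1]
        s2 <- unlist(strsplit(x[i + 1, "NAME"], "[[:space:]]"))[1]
        if(identical(s1, s2)) return(TRUE)
    }
    FALSE
}

# One flag per adjacent pair of rows (1-2, 2-3, ...)
inx <- sapply(seq_len(nrow(d) - 1), fun, d)
# Mark both rows of each matching pair
inx <- c(inx, FALSE) | c(FALSE, inx)
d[inx, ]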
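The line `inx <- c(inx, FALSE) | c(FALSE, inx)` is what turns "pair i matches" into "keep row i and row i + 1". A small sketch on a hypothetical flag vector for a 5-row data frame (the values are invented for illustration, not produced by the sample data):

```r
# Hypothetical pair flags: element i is TRUE when rows i and i + 1 match
pair <- c(TRUE, FALSE, FALSE, TRUE)

# c(pair, FALSE) marks the first row of each pair,
# c(FALSE, pair) marks the second row; OR-ing them keeps both
rows <- c(pair, FALSE) | c(FALSE, pair)

which(rows)  # rows 1, 2, 4 and 5
```

Padding with FALSE on opposite ends makes both vectors one element longer than `pair`, i.e. the same length as the number of rows, so the result can be used directly to subset the data frame.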

Hope this helps,

Rui Barradas
