Dear R-users,
I am very new to R, so maybe my question is very easy to answer.
I have the following table:
TAB1 <- data.frame(Name, Number)
where "Name" and "Number" are both character strings.
It looks like this:
Name Number
ab 2
ab 2
NA 15
NA 15
NA 15
cd 3
ef 1
NA 15
NA 15
gh 15
gh 15
I want to delete all the rows which begin with "NA"
and all the rows where names are duplicates
(for example the second row).
I have tried this, but I only get numbers:
for (i in 1:ZeileMax) {
  if (TAB1[[1]][i] != "NA") {
    cat(TAB1[[1]][i], file = "Name.txt", fill = TRUE, append = TRUE, sep = "")
    cat(TAB1[[2]][i], file = "Number.txt", fill = TRUE, append = TRUE, sep = "")
  }
}
Name<-readLines("Name.txt")
Number<-readLines("Number.txt")
TAB<-data.frame(Name,Number)
Thanks in advance,
Michael Graber

To delete duplicate rows, use unique(TAB1): see its help page.

It looks to me as if the names are missing values (NA) and do *not* start with the string "NA". If so, you want to use

    TAB1[!is.na(TAB1$Name), ]

Otherwise, perhaps

    TAB1[substr(TAB1$Name, 1, 2) != "NA", ]

which keeps only the rows whose name does not begin with "NA".

On Wed, 27 Jul 2005, Michael Graber wrote:
> I want to delete all the rows which begin with "NA"
> and all the rows where names are duplicates
> (for example the second row).

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
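
The distinction drawn above, between a genuine missing value and the literal string "NA", can be sketched as follows. This is only an illustration under assumptions: the data frames real_na and text_na are hypothetical reconstructions of the table in the question, not the poster's actual data.

```r
# Hypothetical reconstruction of the table from the question.
# stringsAsFactors = FALSE keeps both columns as plain character strings.
real_na <- data.frame(
  Name   = c("ab", "ab", NA, NA, NA, "cd", "ef", NA, NA, "gh", "gh"),
  Number = c("2", "2", "15", "15", "15", "3", "1", "15", "15", "15", "15"),
  stringsAsFactors = FALSE
)

# Case 1: Name holds genuine missing values, so test with is.na().
no_na <- real_na[!is.na(real_na$Name), ]

# Case 2: Name holds the two-character string "NA" (assumed variant).
text_na <- real_na
text_na$Name[is.na(text_na$Name)] <- "NA"
no_text_na <- text_na[substr(text_na$Name, 1, 2) != "NA", ]

# unique() drops rows that repeat an earlier row in *every* column.
result <- unique(no_na)
print(result)
```

Note that unique() compares whole rows, so two rows with the same name but different numbers would both be kept; filtering on the Name column alone needs duplicated() instead.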

Michael Graber wrote:
> Dear R-users,
>
> I am very new to R, so maybe my question is very easy to answer.
> I have the following table:
> TAB1<-data.frame(Name,Number), "Name" and "Number" are all character
> strings,
> it looks like this:
> Name Number
> ab 2
[etc]
> gh 15
> gh 15

> for (i in 1:ZeileMax ) {if ( TAB1[[1]] [i] != "NA" )
> {cat(TAB1[[1]][i],file = "Name.txt",fill= TRUE,append = TRUE ,sep =
> "");cat(TAB1[[2]][i], file="Number.txt", fill=TRUE,append=TRUE, sep="")}}
> Name<-readLines("Name.txt")
> Number<-readLines("Number.txt")
> TAB<-data.frame(Name,Number)

I'm not going to bother working out why that fails!

The following assumes you want to keep one of any row that has a duplicated Name, in this case the first instance. I think your mail was a bit ambiguous as to whether you wanted to delete all rows with a duplicate Name...

You can do it in two lines. First select the rows that don't have Name == "NA", and then select the rows that don't have a duplicated Name:

> TAB <- TAB1[TAB1$Name != "NA", ]
> TAB <- TAB[!duplicated(TAB$Name), ]
> TAB
   Name Number
1    ab      2
6    cd      3
7    ef      1
10   gh     15

Or you can do it in one line:

> TAB <- TAB1[!duplicated(TAB1$Name) & TAB1$Name != "NA", ]
> TAB
   Name Number
1    ab      2
6    cd      3
7    ef      1
10   gh     15

Don't think of it as deleting rows: you are selecting the rows you want and creating a new data frame. Any simple intro to R (see www.r-project.org for plenty) will have examples on selecting rows and columns.

Baz
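
Two loose ends from this thread can be sketched in a few lines. One is a likely reason the original loop wrote only numbers. The other is the alternative reading in which every row with a repeated name is dropped. Both rest on assumptions: TAB1 below is a hypothetical reconstruction, and Name is assumed to have been a factor, which was the data.frame() default for character input before R 4.0.0.

```r
# Hypothetical reconstruction of TAB1 with Name stored as a factor
# (assumed; this was the data.frame() default before R 4.0.0).
TAB1 <- data.frame(
  Name   = factor(c("ab", "ab", "NA", "NA", "NA", "cd",
                    "ef", "NA", "NA", "gh", "gh")),
  Number = c("2", "2", "15", "15", "15", "3", "1", "15", "15", "15", "15"),
  stringsAsFactors = FALSE
)

# cat() writes a factor's underlying integer code, not its label,
# which would explain files that contain only numbers.
cat(TAB1$Name[1], "\n")                # prints an integer code
cat(as.character(TAB1$Name[1]), "\n")  # prints the label "ab"

# Work on the labels from here on.
name <- as.character(TAB1$Name)
keep <- name != "NA"

# Reading A: keep the first row for each name.
first_only <- TAB1[keep & !duplicated(name), ]

# Reading B: drop every row whose name occurs more than once.
# fromLast = TRUE flags the earlier copies that duplicated() alone misses.
repeated <- duplicated(name) | duplicated(name, fromLast = TRUE)
unique_only <- TAB1[keep & !repeated, ]

print(first_only)
print(unique_only)
```

Reading A reproduces the result shown above; reading B keeps only the names that occur exactly once.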