Dear R-users,

I am very new to R, so maybe my question is very easy to answer.
I have the following table:

    TAB1 <- data.frame(Name, Number)

"Name" and "Number" are both character strings. It looks like this:

    Name  Number
    ab    2
    ab    2
    NA    15
    NA    15
    NA    15
    cd    3
    ef    1
    NA    15
    NA    15
    gh    15
    gh    15

I want to delete all the rows which begin with "NA"
and all the rows where names are duplicates
(for example the second row).
I have tried this, but I only get numbers:

    for (i in 1:ZeileMax) {
      if (TAB1[[1]][i] != "NA") {
        cat(TAB1[[1]][i], file = "Name.txt", fill = TRUE, append = TRUE, sep = "")
        cat(TAB1[[2]][i], file = "Number.txt", fill = TRUE, append = TRUE, sep = "")
      }
    }
    Name <- readLines("Name.txt")
    Number <- readLines("Number.txt")
    TAB <- data.frame(Name, Number)

Thanks in advance,

Michael Graber

To delete duplicate rows, use unique(TAB1): see its help page.

It looks to me as if the names are missing values (NA) and do *not*
start with the string "NA". If so, you want to use

    TAB1[!is.na(TAB1$Name), ]

Otherwise, perhaps

    TAB1[substr(TAB1$Name, 1, 2) != "NA", ]

On Wed, 27 Jul 2005, Michael Graber wrote:

> I want to delete all the rows which begin with "NA"
> and all the rows where names are duplicates
> (for example the second row).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
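
The two cases in the reply above (real missing values versus the literal string "NA") can be sketched as follows. This is only an illustration: the vectors are rebuilt by hand from the table in the question, and the names no_missing, result, TAB2 and no_na_string are made up for the example.

```r
# Rebuild the example table by hand. Assumption: the original post does not
# show how Name and Number were created, so these vectors are reconstructed.
Name   <- c("ab", "ab", NA, NA, NA, "cd", "ef", NA, NA, "gh", "gh")
Number <- c("2", "2", "15", "15", "15", "3", "1", "15", "15", "15", "15")
TAB1 <- data.frame(Name, Number, stringsAsFactors = FALSE)

# Case 1: Name holds real missing values, so test with is.na().
no_missing <- TAB1[!is.na(TAB1$Name), ]

# unique() drops rows that repeat an earlier row (it compares whole rows,
# not just the Name column).
result <- unique(no_missing)
print(result)

# Case 2: Name holds the literal string "NA" (simulated here by replacing
# the missing values), so compare the first two characters instead.
TAB2 <- TAB1
TAB2$Name[is.na(TAB2$Name)] <- "NA"
no_na_string <- TAB2[substr(TAB2$Name, 1, 2) != "NA", ]
print(no_na_string)
```

Note that the substr() test would also drop any real name that happens to start with "NA", so it is only appropriate if no such names exist.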

Michael Graber wrote:

> Dear R-users,
>
> I am very new to R, so maybe my question is very easy to answer.
> I have the following table:
> TAB1<-data.frame(Name,Number), "Name" and "Number" are all character
> strings,
> it looks like this:
>
> Name Number
> ab 2
> [etc]
> gh 15
> gh 15
>
> for (i in 1:ZeileMax ) {if ( TAB1[[1]] [i] != "NA" )
> {cat(TAB1[[1]][i],file = "Name.txt",fill= TRUE,append = TRUE ,sep =
> "");cat(TAB1[[2]][i], file="Number.txt", fill=TRUE,append=TRUE, sep="")}}
> Name<-readLines("Name.txt")
> Number<-readLines("Number.txt")
> TAB<-data.frame(Name,Number)

I'm not going to bother working out why that fails!

The following assumes you want to keep one of any row that has a
duplicated Name, in this case the first instance. I think your mail was
a bit ambiguous as to whether you wanted to delete all rows with a
duplicate Name...

You can do it in two lines. First select the rows that don't have
Name == "NA", and then select the rows that don't have a duplicated Name:

    > TAB <- TAB1[TAB1$Name != "NA", ]
    > TAB <- TAB[!duplicated(TAB$Name), ]
    > TAB
       Name Number
    1    ab      2
    6    cd      3
    7    ef      1
    10   gh     15

Or you can do it in one line:

    > TAB <- TAB1[!duplicated(TAB1$Name) & TAB1$Name != "NA", ]
    > TAB
       Name Number
    1    ab      2
    6    cd      3
    7    ef      1
    10   gh     15

Don't think of it as deleting rows: you are selecting the rows you want
and creating a new data frame. Any simple intro to R (see
www.r-project.org for plenty) will have examples of selecting rows and
columns.

Baz
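
Since the question was ambiguous about duplicates, here is a rough sketch of both readings side by side. It assumes, as in the reply above, that Name contains the literal string "NA"; the table is rebuilt by hand from the question, and the names kept, first_only, is_dup and no_dups are made up for the example.

```r
# Rebuild the example table by hand. Assumption: Name holds the literal
# string "NA", not real missing values.
Name   <- c("ab", "ab", "NA", "NA", "NA", "cd", "ef", "NA", "NA", "gh", "gh")
Number <- c("2", "2", "15", "15", "15", "3", "1", "15", "15", "15", "15")
TAB1 <- data.frame(Name, Number, stringsAsFactors = FALSE)

# Select the rows whose Name is not the string "NA".
kept <- TAB1[TAB1$Name != "NA", ]

# Reading A: keep the first row of each repeated Name.
first_only <- kept[!duplicated(kept$Name), ]
print(first_only)

# Reading B: drop every row whose Name occurs more than once.
# duplicated() flags only the later copies, so it is combined with a scan
# from the other end (fromLast = TRUE) to flag the first copy as well.
is_dup <- duplicated(kept$Name) | duplicated(kept$Name, fromLast = TRUE)
no_dups <- kept[!is_dup, ]
print(no_dups)
```

If Name held real missing values instead, the comparison Name != "NA" would return NA for those rows and the subsetting would produce all-NA rows, so !is.na(TAB1$Name) would be the test to use.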