This issue has been discussed on this list before, but the solutions offered were not satisfactory, so I thought I would raise it again.

I want to merge two datasets which have three common variables. These variables DO NOT have the same names in both files. In addition, there are two variables with the same name which do not necessarily hold exactly the same data. That is, there could be some discrepancy between the two datasets when it comes to these variables. I do not want them to be used when I merge the datasets.

The problem is that R allows you to use by.x and by.y to specify only one variable in the x dataset and one variable in the y dataset to merge on. Otherwise, if you do not specify anything, it matches on all the variables that have common names. This is very problematic. In my case, the variables I want to match on do not have the same names in the two datasets, and the ones that have the same names must not be used to match.

One approach would be to change the names of the variables and then merge. But that is not elegant, to say the least.

If nothing else works, that is what I shall have to do. There again we have a problem: how do I change the name of a particular column? One solution suggested somewhere in the archives of the list is to use

    names(data.frame)=c(list of column names)

But this requires you to list all the variable names. That can obviously be cumbersome when you have a large number of variables. What would be the syntax if I want to change just one column name?

Vikas
Hello,

You can change e.g. the second column name in the following way:

    data(iris)
    colnames(iris)
    [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

To change the second column name:

    colnames(iris)[2] <- "name"
    colnames(iris)
    [1] "Sepal.Length" "name"         "Petal.Length" "Petal.Width"  "Species"

Best,
Matthias

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Hello!

    merge(TablePatient, TableSpecial, by.x="ID", by.y="PATIENTID")

works fine for me. (There is also a variable ID in TableSpecial.)

One problem, or rather something that has to be known, is that merge uses the levels, not the labels, if the merged variables are factors.

Karl
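If the factor caveat above is a concern, one defensive option is to convert the key columns to character before merging, so that matching is done on the visible values. The sketch below reuses the table and column names from the message, but the data and the extra columns (age, score) are made up for illustration.

```r
# Hypothetical data; only the table and key names come from the message above.
TablePatient <- data.frame(ID = factor(c("p1", "p2", "p3")),
                           age = c(34, 51, 47))
TableSpecial <- data.frame(PATIENTID = factor(c("p3", "p1")),
                           score = c(7, 9))

# Convert the factor keys to character so the comparison uses the
# printed values rather than any internal factor coding.
TablePatient$ID <- as.character(TablePatient$ID)
TableSpecial$PATIENTID <- as.character(TableSpecial$PATIENTID)

# The key columns have different names, so name each side explicitly.
merged <- merge(TablePatient, TableSpecial, by.x = "ID", by.y = "PATIENTID")
print(merged)
```

Only patients present in both tables are kept, since merge() does an inner join unless all, all.x or all.y is set.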
On Wed, 6 Oct 2004, Vikas Rawal wrote:

> The problem is that R allows you to use by.x and by.y variables to specify
> only one variable in x dataset and one variable in y dataset to merge.

This turns out not to be the case.

    > names(df)
    [1] "x" "y" "z"
    > names(df2)
    [1] "a" "b" "c"
    > merge(df,df2,by.x=c("x","y"),by.y=c("a","b"))
         x   y   z   c
    1  101 111 121 221
    2  102 112 120 220
    3  103 113 121 221
    4  104 114 120 220
    5  105 115 121 221
    6  106 116 120 220
    7  107 117 121 221
    8  108 118 120 220
    9  109 119 121 221
    10 110 120 120 220

> If nothing else works, that is what I shall have to do. There again we have
> some problem. How do I change the name of a particular column. One solution
> suggested somewhere in the archives of the list is to use

    names(df)[names(df) == "oldname"] <- "newname"

is one possibility that doesn't even require working out which variable number it is.

	-thomas
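To connect this with the original question, here is a small sketch of the same call when the two data frames also share a column name that must not be used for matching. The data and the shared column name grp are made up for illustration. Because grp is not listed in by.x or by.y, merge() keeps both copies and tells them apart with its default suffixes.

```r
# Hypothetical data: the keys have different names (x, y versus a, b),
# and both frames carry a column "grp" whose values disagree in one row.
df  <- data.frame(x = 101:103, y = 111:113, z = c(121, 120, 121),
                  grp = c(1, 0, 1))
df2 <- data.frame(a = 101:103, b = 111:113, c = c(221, 220, 221),
                  grp = c(1, 1, 1))

# Match on two differently named key pairs: x with a, and y with b.
m <- merge(df, df2, by.x = c("x", "y"), by.y = c("a", "b"))

# "grp" is not a key, so both versions survive as grp.x and grp.y.
print(names(m))
# [1] "x"     "y"     "z"     "grp.x" "c"     "grp.y"

# Rows where the two sources disagree can then be inspected directly.
print(m[m$grp.x != m$grp.y, ])
```

The suffixes can be changed with the suffixes argument of merge() if .x and .y are not descriptive enough.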
At 10:44 AM +0530 10/6/04, Vikas Rawal wrote:

>But this requires you to list all the variable names. That can
>obviously be cumbersome when you have large number of variables.
>What would be the syntax if I want to change just one column name.

It's not that hard to figure out the syntax, using functions like match(), intersect(), setdiff() and friends.
Here is a suggestion:

    mydf <- rename(mydf,from='oldvarname',to='newvarname')

where the rename function is this:

    rename <- function (data, from = "", to = "", info = TRUE) {
      dsn <- deparse(substitute(data))   # name of the data frame, for messages
      dfn <- names(data)                 # current column names

      ## sanity checks on the arguments
      if (length(from) != length(to)) {
        cat("--------- from and to not same length ---------\n")
        stop()
      }
      if (length(dfn) < length(to)) {
        cat("--------- too many new names ---------\n")
        stop()
      }
      chng <- match(from, dfn)           # positions of the old names
      frm.in <- from %in% dfn
      if (!all(frm.in)) {
        cat("---------- some of the from names not found in", dsn, "\n")
        stop()
      }
      if (length(to) != length(unique(to))) {
        cat("---------- New names not unique\n")
        stop()
      }

      ## build the new name vector and report the changes
      dfn.new <- dfn
      dfn.new[chng] <- to
      if (info) cat("\nChanging in", dsn)
      tmp <- rbind(from, to)
      dimnames(tmp)[[1]] <- c("From:", "To:")
      dimnames(tmp)[[2]] <- rep("", length(from))
      if (info) print(tmp, quote = FALSE)

      names(data) <- dfn.new
      invisible(data)
    }

'from' and 'to' can be character vectors, and they must be of the same length.

It wouldn't be hard to modify it to *not* receive and return the entire dataframe, but I found it more convenient to use this way. Also, I wrote that function a long time ago, when I had a lot less experience than I do now (just in case anyone notices some obvious room for improvement!)

-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
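For comparison, the match() idea mentioned in this message can be reduced to a few lines when the progress report is not needed. This is only a sketch: the helper name rename_cols and the example data frame are hypothetical and not part of the posted function.

```r
# Hypothetical helper: rename several columns by name using match().
rename_cols <- function(data, from, to) {
  stopifnot(length(from) == length(to))
  pos <- match(from, names(data))        # positions of the old names
  if (any(is.na(pos))) {
    stop("names not found: ", paste(from[is.na(pos)], collapse = ", "))
  }
  names(data)[pos] <- to                 # replace only the matched names
  data
}

# Made-up example data.
mydf <- data.frame(id = 1:2, wt = c(60, 72), ht = c(1.6, 1.8))
mydf <- rename_cols(mydf, from = c("wt", "ht"), to = c("weight", "height"))
print(names(mydf))
# [1] "id"     "weight" "height"
```

Unlike the longer function, this version does not check that the new names are unique, so that check would have to be added if duplicates are a risk.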