Hi,

Maybe this helps:

set.seed(14)
dat1 <- data.frame(shell_ID = sample(c("0208A_47_33", "0208A_47_34", "0912C_13_3", "1400C_2_48"), 20, replace = TRUE), stringsAsFactors = FALSE)
dat2 <- dat1

# order by the leading number, then by the number after the last underscore
ord1 <- order(as.numeric(gsub("[[:alpha:]]+.*", "", dat1$shell_ID)),
              as.numeric(gsub(".*\\_", "", dat1$shell_ID)))
dat1 <- dat1[ord1, , drop = FALSE]
row.names(dat1) <- 1:nrow(dat1)

# or
library(gtools)
dat2$shell_ID <- mixedsort(dat2$shell_ID)
identical(dat1, dat2)
#[1] TRUE

# number each distinct shell_ID in sorted order
dat1$x <- as.numeric(factor(dat1$shell_ID))
dat1

# or
dat2$x <- match(dat1$shell_ID, unique(dat1$shell_ID))
all.equal(dat1, dat2)
#[1] TRUE

A.K.


Hi all,

I am trying to do a similar thing; however, I would like the second vector to read as follows:

shell_ID      X
0208A_47_33   1
0208A_47_33   1
0208A_47_33   1
0208A_47_34   2
0208A_47_34   2
0208A_47_34   2
0208A_47_34   2
0208A_47_34   2
0208A_47_34   2
0208A_47_34   2
0912C_13_3    3
0912C_13_3    3
0912C_13_3    3
1400C_2_48    4
1400C_2_48    4
1400C_2_48    4
1400C_2_48    4
1400C_2_48    4
1400C_2_48    4
1400C_2_48    4
1400C_2_48    4
1400C_2_48    4

However, the shell_IDs may not be in any particular order, as I am already using a subset of the data based on another variable in R. I am not familiar with how to check that the shell_IDs are sorted. The subset contains 21,005 unique shell_IDs.

Thanks,
Helen
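
On the point of checking whether the shell_IDs are sorted, here is a minimal sketch using base R's is.unsorted(). The ids vector is made-up toy data, not the real subset, and plain sort() orders strings alphabetically, so it is only an assumption that this matches the order you want (mixedsort() from gtools may be the safer choice when the trailing numbers differ in length).

```r
# Toy IDs in arbitrary order (hypothetical data for illustration only)
ids <- c("0912C_13_3", "0208A_47_34", "1400C_2_48", "0208A_47_33", "0208A_47_34")

# is.unsorted() returns TRUE when the vector is NOT in non-decreasing order
is.unsorted(ids)
# [1] TRUE

# Sort once, then check again
ids_sorted <- sort(ids)
is.unsorted(ids_sorted)
# [1] FALSE

# Number each distinct ID by its first appearance in the sorted vector
x <- match(ids_sorted, unique(ids_sorted))
x
# [1] 1 2 2 3 4

res <- data.frame(shell_ID = ids_sorted, X = x, stringsAsFactors = FALSE)
res
```

Because match() against unique() numbers the IDs by first appearance, it gives 1, 2, 3, ... in sorted order only if the vector has been sorted first; on unsorted input the numbers follow the order in which the IDs happen to occur.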