Hi, How can I find the location of string data in my 2D dataset? spec(Dataset) will reveal the columns that contain the strings. But can I know where exactly the string values are in the column?
Hello,

You should post a working example; we have no idea what your 2D data set is. A matrix? A data.frame? Something else?
And what is the string you are looking for? Are you thinking of regular expressions (grep), or is it a simple equality test with '=='?

Here is a reproducible example of the use of which() (see ?which) with the argument arr.ind set to TRUE.

# create a data set
set.seed(2021)
A <- matrix(sample(letters, 24, TRUE), ncol = 4)

# Test for equality; this returns
# a logical matrix and which() can
# be applied to it
found <- A == "g"
which(found, arr.ind = TRUE)
#     row col
#[1,]   1   1
#[2,]   5   1
#[3,]   2   3

# The same code can be used if the data is
# a data.frame
df1 <- as.data.frame(A)
df1 == "g"

But if you want to look for a regex, try sapply. In this example the pattern is a simple one, and I use grepl.

pattern <- "g"
found2 <- sapply(df1, function(x) grepl(pattern, x))
which(found2, arr.ind = TRUE)

Hope this helps,

Rui Barradas

At 18:07 on 15/05/21, Tuhin Chakraborty wrote:
> Hi,
> How can I find the location of string data in my 2D dataset? spec(Dataset)
> will reveal the columns that contain the strings. But can I know where
> exactly the string values are in the column?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
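As a follow-up sketch to the which(arr.ind = TRUE) idea above: if the real situation is a data.frame in which a column was expected to be numeric but was read as character because of a few stray strings, the same approach can locate those cells. This is only a guess at what the original poster has; the data frame `dat`, its column names, and the "cannot be converted to a number" test are all assumptions made for illustration.

```r
# Hypothetical data: columns "b" and "c" were read as character
# because a few cells contain text instead of numbers.
dat <- data.frame(
  a = c(1.5, 2.0, 3.25, 4.0),
  b = c("10", "n/a", "30", "missing"),
  c = c("x", "7", "8", "9"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a non-missing cell cannot be
# converted to a number (assumed definition of "string value").
not_numeric <- sapply(dat, function(x) {
  !is.na(x) & is.na(suppressWarnings(as.numeric(as.character(x))))
})

# Row/column positions of the offending cells
pos <- which(not_numeric, arr.ind = TRUE)

# Pull out the values found at those positions
values <- mapply(
  function(r, k) as.character(dat[[k]][r]),
  pos[, "row"], pos[, "col"]
)

# Summary table: one line per string cell
located <- data.frame(
  row    = pos[, "row"],
  column = names(dat)[pos[, "col"]],
  value  = values,
  stringsAsFactors = FALSE
)
print(located)
```

If empty strings or other placeholders should also count as "strings", the test inside sapply() would need to be adapted.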
Tuhin,

What do you mean by a 2-D dataset? You say some columns contain strings, so it does not sound like you are using a matrix, as then ALL columns would be of the same type.

So are you using a data.frame, a tibble, or something you made on your own?

Can you address one column at a time, and would that column be a vector? Some methods work fairly easily on vectors and some also on lists.

Once you have that vector, there are quite a few ways to find what you want. Is it fixed text, such as an exact full match where "theta" has to be matched in full? Or would you want to match "the", so that both "theta" and "lathe" would match? Or are you matching a more complex pattern, such as all text that has two vowels in a row?

Once you figure out what you have and what you want, how do you want to identify what you are looking for? Will there be one match, possibly many, or even all? Many methods return either a TRUE/FALSE vector of the same length as the input or the integer offset of each match, such as telling you it is the fifth item.

R has collections of string functions, including packages like stringr/stringi that deal well with many things you might need. For matching patterns, there is the family of functions built around "grep".

Good luck.

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Tuhin Chakraborty
Sent: Saturday, May 15, 2021 1:08 PM
To: r-help at r-project.org
Subject: [R] Finding strings in a dataset
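To make the three kinds of matching described above concrete, here is a small base R sketch on a single character vector. The vector `x` is made up for illustration; the functions used (`==`, which(), grepl(), grep()) are standard base R, and whether a logical vector or integer positions is more useful depends on what you do next.

```r
# Hypothetical column of strings
x <- c("theta", "lathe", "alpha", "queue", "the")

# 1. Exact, whole-string match: logical vector, then positions
is_theta  <- x == "theta"
pos_theta <- which(is_theta)

# 2. Fixed substring match: "the" anywhere inside the string
has_the <- grepl("the", x, fixed = TRUE)   # TRUE/FALSE per element
pos_the <- grep("the", x, fixed = TRUE)    # integer positions

# 3. Regular expression: two vowels in a row
pos_vv <- grep("[aeiou]{2}", x)
val_vv <- grep("[aeiou]{2}", x, value = TRUE)  # the matching strings

print(pos_theta)  # 1
print(pos_the)    # 1 2 5
print(val_vv)     # "queue"
```

The stringr equivalents (for example str_detect() and str_which()) follow the same logical-vector versus position distinction, but they are not used here to keep the sketch free of package dependencies.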