Hi all,

I have a data frame with a large number of rows and columns. Looking at the data, I found several garbage values that need to be cleaned. As a sample, here is the frequency distribution of one variable:

    Var1  Freq
  1 :        3
  2 ]        6
  3 MSN   1040
  4 YYZ    300
  5 \\       4
  6 +        3
  7 ?>      15

and it continues. I want to keep only those rows that contain a valid value of the variable, in this case MSN and YYZ. I tried the following:

test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

but I am not getting the desired result. Any help or ideas?

[[alternative HTML version deleted]]
Hi,

On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
> Hi all,
>
> I have a data frame with huge rows and columns.
>
> When I looked at the data, it has several garbage values need to be
> cleaned. For a sample I am showing you the frequency distribution
> of one variables
>
>     Var1  Freq
> 1   :        3
> 2   ]        6
> 3   MSN   1040
> 4   YYZ    300
> 5   \\       4
> 6   +        3
> 7.  ?>      15

Please use dput() to provide your data. I made a guess at what you had in R, but could be wrong.

> and continues.
>
> I want to keep those rows that contain only a valid variable value
>
> In this case MSN and YYZ. I tried the following
>
> *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
>
> but I am not getting the desired result.

What are you getting? How does it differ from the desired result?

> I have
>
> Any help or idea?

I get:

> dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
+ "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
+ "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> test
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300

Which seems reasonable to me.

> [[alternative HTML version deleted]]

Please don't post in HTML either: it introduces all sorts of errors to your message.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
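For readers unfamiliar with the dput() request above, here is a minimal sketch of how a small, reproducible sample can be shared. The data frame below is only a rebuild of Sarah's guess at the data (an assumption; the poster's real data may differ), and the names `txt` and `dat2` are introduced purely for illustration.

```r
# Rebuild the guessed sample from the reply above (an assumption,
# not the poster's actual data).
dat <- data.frame(
  X = 1:7,
  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>"),
  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, including column
# types, which a pasted table cannot convey.
dput(head(dat, 3))

# Round trip: capture the dput() text and evaluate it again, the way
# a reader of the list would after copying it from an email.
txt <- paste(capture.output(dput(dat)), collapse = "\n")
dat2 <- eval(parse(text = txt))

# Check that the rebuilt object matches the original.
all.equal(dat, dat2)
```

Posting the output of dput() on a small subset lets others see whether Var1 is a character vector or a factor, which a printed frequency table hides.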
Please keep replies on the list so others may participate in the conversation.

If you have a character vector containing the potential values, you might look at %in% for one approach to subsetting your data.

Var1 %in% myvalues

Sarah

On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewashm at gmail.com> wrote:
> Thank you Sarah for your prompt response!
>
> I have the list of values of the variable Var1 it is around 20.
> How can I modify this one to include all the 20 valid values?
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> Is there a way (efficient ) of doing it?
>
> Thank you again
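The %in% suggestion above can be sketched as follows. This is only an illustration on the guessed sample data: `valid_codes` is a hypothetical name standing in for the poster's list of about 20 valid values, of which only the two mentioned in the thread are known.

```r
# Guessed sample data from earlier in the thread (an assumption).
dat <- data.frame(
  X = 1:7,
  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>"),
  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
  stringsAsFactors = FALSE
)

# Hypothetical vector of valid values; extend it with the full list.
valid_codes <- c("MSN", "YYZ")

# %in% tests each element of Var1 for membership in valid_codes,
# so one expression replaces a long chain of == comparisons joined by |.
test <- dat[dat$Var1 %in% valid_codes, ]
test
```

Adding a valid value then means editing one vector rather than appending another `| dat$Var1 == "..."` clause.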
Hi Sarah,

I used the following to clean my data, but the program crashed several times:

test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

What is the difference between that and this one?

test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN", ]

On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
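One well-known difference between the two forms asked about above is how they treat missing values. The sketch below uses a small hypothetical data frame with an NA added to make the point; the thread does not establish whether the real data contains NAs or whether this explains the crashes.

```r
# Hypothetical data with one missing value in Var1 (an assumption
# made for illustration; not taken from the poster's data).
dat <- data.frame(
  Var1 = c("MSN", NA, "YYZ", "+"),
  Freq = c(1040L, 2L, 300L, 3L),
  stringsAsFactors = FALSE
)

# == propagates NA: the comparison is NA where Var1 is missing, and
# indexing rows with NA produces a row filled with NA.
by_eq <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

# %in% always returns TRUE or FALSE, so the row with the missing
# value is simply dropped.
by_in <- dat[dat$Var1 %in% c("YYZ", "MSN"), ]

nrow(by_eq)
nrow(by_in)
```

Note that `x %in% "YYZ" | x %in% "MSN"` gives the same result as `x %in% c("YYZ", "MSN")`; the second form is shorter and scales to a longer list of values.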