Hi Sarah,

I used the following to clean my data, the program crashed several times.

test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

What is the difference between these two?

test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN", ]

On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
>
> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewashm at gmail.com> wrote:
> > Thank you Sarah for your prompt response!
> >
> > I have the list of values of the variable Var1 it is around 20.
> > How can I modify this one to include all the 20 valid values?
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > Is there a way (efficient ) of doing it?
> >
> > Thank you again
> >
> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I have a data frame with huge rows and columns.
> >> >
> >> > When I looked at the data, it has several garbage values need to be
> >> > cleaned. For a sample I am showing you the frequency distribution
> >> > of one variables
> >> >
> >> >     Var1  Freq
> >> > 1   :        3
> >> > 2   ]        6
> >> > 3   MSN   1040
> >> > 4   YYZ    300
> >> > 5   \\       4
> >> > 6   +        3
> >> > 7.  ?>      15
> >>
> >> Please use dput() to provide your data. I made a guess at what you had
> >> in R, but could be wrong.
> >>
> >> > and continues.
> >> >
> >> > I want to keep those rows that contain only a valid variable value
> >> >
> >> > In this case MSN and YYZ. I tried the following
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> >
> >> > but I am not getting the desired result.
> >>
> >> What are you getting? How does it differ from the desired result?
> >>
> >> > I have
> >> >
> >> > Any help or idea?
> >>
> >> I get:
> >>
> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >> > test
> >>   X Var1 Freq
> >> 3 3  MSN 1040
> >> 4 4  YYZ  300
> >>
> >> Which seems reasonable to me.
> >>
> >> >
> >> > [[alternative HTML version deleted]]
> >>
> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> your message.
> >>
> >> Sarah

On Wed, Nov 11, 2015 at 8:44 PM, Ashta <sewashm at gmail.com> wrote:
> Hi Sarah,
>
> I used the following to clean my data, the program crashed several times.
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> What is the difference between these two
>
> test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]

Besides that you're using %in% wrong? I told you how to proceed.

myvalues <- c("YYZ", "MSN")

test <- subset(dat, Var1 %in% myvalues)

> subset(dat, Var1 %in% myvalues)
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300

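On the question of how == and %in% differ: one difference that is certain is how they treat missing values. The sketch below is only an illustration, not a diagnosis of the crash. It reuses the sample data from this thread and adds one NA row, which is an assumption (nothing in the thread says the real data contain NA). With ==, a comparison against NA yields NA, and indexing with that NA produces an extra all-NA row; %in% returns FALSE for NA instead.

```r
# Sample data as guessed earlier in the thread, plus one hypothetical NA row
# (the NA row is an assumption added purely for illustration).
dat <- data.frame(
  X = 1:8,
  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>", NA),
  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L, 2L),
  stringsAsFactors = FALSE
)

myvalues <- c("YYZ", "MSN")

# == propagates NA, so the logical index contains an NA and
# `[` returns an additional row filled with NA for it.
by_eq <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

# %in% never returns NA: a missing value is simply not in the set.
by_in <- dat[dat$Var1 %in% myvalues, ]

nrow(by_eq)  # 3 (MSN, YYZ, and one all-NA row)
nrow(by_in)  # 2 (MSN, YYZ)
```

With around 20 valid values, only myvalues needs to grow; the subsetting line stays the same.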
Sarah,

Thank you very much. For the other variables, I was trying to do the same
job in a different way because it is easier to list them.

Example:

test <- which(dat$var1 != "BAA" | dat$var1 != "FAG")
{
  dat <- dat[-test, ]
}

and I did not get the right result. What am I missing here?

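A possible explanation, sketched under assumptions: no single value can equal both "BAA" and "FAG", so for every non-missing value at least one of the two != tests is TRUE and the | condition is TRUE for every row. which() then returns all row numbers and dat[-test, ] is empty. The data frame below is hypothetical, and the sketch assumes BAA and FAG are the values to keep, which the message does not state outright.

```r
# Hypothetical data; the column name var1 and the codes follow the message above.
dat <- data.frame(
  var1 = c("BAA", "FAG", ":", "]"),
  Freq = c(5L, 7L, 3L, 6L),
  stringsAsFactors = FALSE
)

# With |, every row satisfies the condition, so removing those rows leaves nothing.
test_or <- which(dat$var1 != "BAA" | dat$var1 != "FAG")
length(test_or) == nrow(dat)  # TRUE
nrow(dat[-test_or, ])         # 0

# With &, the condition marks rows that are neither BAA nor FAG;
# dropping them keeps only BAA and FAG.
test_and <- which(dat$var1 != "BAA" & dat$var1 != "FAG")
kept_and <- dat[-test_and, ]

# The same selection with %in%, without negative indices.
kept_in <- dat[dat$var1 %in% c("BAA", "FAG"), ]

kept_and$var1  # "BAA" "FAG"
kept_in$var1   # "BAA" "FAG"
```

The %in% form is also safer than dat[-which(...), ]: when which() matches nothing it returns integer(0), and dat[-integer(0), ] selects zero rows rather than all of them.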
On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> On Wed, Nov 11, 2015 at 8:44 PM, Ashta <sewashm at gmail.com> wrote:
> > Hi Sarah,
> >
> > I used the following to clean my data, the program crashed several times.
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> >
> > What is the difference between these two
> >
> > test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]
>
> Besides that you're using %in% wrong? I told you how to proceed.
>
> myvalues <- c("YYZ", "MSN")
>
> test <- subset(dat, Var1 %in% myvalues)
>
> > subset(dat, Var1 %in% myvalues)
>   X Var1 Freq
> 3 3  MSN 1040
> 4 4  YYZ  300