thr3ads.net - R help - [R] Cleaning [Nov 2015]


Ashta

2015-Nov-12 01:44 UTC

[R] Cleaning

Hi Sarah,

I used the following to clean my data, but the program crashed several times.


test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

What is the difference between these two?

test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN", ]




On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.goslee at gmail.com>
wrote:
> Please keep replies on the list so others may participate in the
> conversation.
>
> If you have a character vector containing the potential values, you
> might look at %in% for one approach to subsetting your data.
>
> Var1 %in% myvalues
>
> Sarah
>
> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewashm at gmail.com> wrote:
> > Thank you Sarah for your prompt response!
> >
> > I have the list of valid values of the variable Var1; there are around 20.
> > How can I modify this one to include all 20 valid values?
> >
> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
> >
> > Is there an efficient way of doing it?
> >
> > Thank you again
> >
> >
> >
> > On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I have a data frame with a huge number of rows and columns.
> >> >
> >> > When I looked at the data, I found several garbage values that need
> >> > to be cleaned. As a sample, here is the frequency distribution of
> >> > one variable:
> >> >
> >> >   Var1 Freq
> >> > 1    :    3
> >> > 2    ]    6
> >> > 3  MSN 1040
> >> > 4  YYZ  300
> >> > 5   \\    4
> >> > 6    +    3
> >> > 7   ?>   15
> >>
> >> Please use dput() to provide your data. I made a guess at what you had
> >> in R, but could be wrong.
> >>
> >>
> >> > and it continues.
> >> >
> >> > I want to keep only those rows that contain a valid value of the
> >> > variable, in this case MSN and YYZ. I tried the following:
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
> >> >
> >> > but I am not getting the desired result.
> >>
> >> What are you getting? How does it differ from the desired result?
> >>
> >> >  I have
> >> >
> >> > Any help or idea?
> >>
> >> I get:
> >>
> >> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
> >> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
> >> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
> >> >
> >> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
> >> > test
> >>   X Var1 Freq
> >> 3 3  MSN 1040
> >> 4 4  YYZ  300
> >>
> >> Which seems reasonable to me.
> >>
> >>
> >> >
> >> >         [[alternative HTML version deleted]]
> >>
> >> Please don't post in HTML either: it introduces all sorts of errors to
> >> your message.
> >>
> >> Sarah
> >>
>
	[[alternative HTML version deleted]]

Sarah Goslee

2015-Nov-12 01:54 UTC


[R] Cleaning

On Wed, Nov 11, 2015 at 8:44 PM, Ashta <sewashm at gmail.com> wrote:
> Hi Sarah,
>
> I used the following to clean my data, but the program crashed several times.
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]
>
> What is the difference between these two?
>
> test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN", ]

Besides that you're using %in% wrong? I told you how to proceed.

myvalues <- c("YYZ", "MSN")

test <- subset(dat, Var1 %in% myvalues)

> subset(dat, Var1 %in% myvalues)
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300
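
For readers comparing the two forms, here is a minimal sketch. It uses the example data reconstructed earlier in the thread plus one hypothetical row with a missing Var1 (that NA row is an assumption added for illustration, not part of the posted data). Both forms pick the same non-missing rows, but == propagates NA into the row index while %in% returns FALSE for missing values, and %in% extends to a longer vector of valid values without more | terms.

```r
# Reconstructed example data from the thread, plus one hypothetical NA row
# (the NA row is an assumption added to show the difference).
dat <- data.frame(
  X = 1:8,
  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>", NA),
  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L, 2L),
  stringsAsFactors = FALSE
)

# Valid values; with around 20 of them, just extend this vector.
myvalues <- c("YYZ", "MSN")

# Chained ==: a missing Var1 gives NA in the index, so an all-NA row comes back.
by_eq <- dat[dat$Var1 == "YYZ" | dat$Var1 == "MSN", ]

# %in% never returns NA, so rows with a missing Var1 are simply dropped.
by_in <- dat[dat$Var1 %in% myvalues, ]

nrow(by_eq)  # counts the extra all-NA row as well
nrow(by_in)
```

Writing dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" also runs and selects the same rows as by_in; it just repeats work that a single %in% against the whole vector already does.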

Ashta

2015-Nov-12 03:03 UTC


[R] Cleaning

Sarah,

Thank you very much. For the other variables, I was trying to do the same
job in a different way, because it is easier to list them.

Example:

test <- which(dat$var1 != "BAA" | dat$var1 != "FAG")
{
    dat <- dat[-test, ]
}

I did not get the right result. What am I missing here?
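
A note on the example above, as a minimal sketch on made-up data. The toy data frame, the name bad, and the assumption that the goal is to drop rows whose value is BAA or FAG are illustrative and do not come from the original post. No value can equal both "BAA" and "FAG", so the condition x != "BAA" | x != "FAG" is TRUE for every non-missing row; which() then returns every row index and dat[-test, ] removes everything.

```r
# Hypothetical toy data; "BAA" and "FAG" are assumed to be the values to remove.
dat <- data.frame(
  var1 = c("BAA", "OK1", "FAG", "OK2"),
  n = 1:4,
  stringsAsFactors = FALSE
)
bad <- c("BAA", "FAG")

# The condition from the post is TRUE on every row, so every row is selected.
test <- which(dat$var1 != "BAA" | dat$var1 != "FAG")
length(test)            # same as nrow(dat)
wrong <- dat[-test, ]   # no rows left

# One possible fix: keep the rows whose value is not in the bad set.
kept <- dat[!(dat$var1 %in% bad), ]

# The equivalent with != needs & (De Morgan's law), not |.
kept2 <- dat[dat$var1 != "BAA" & dat$var1 != "FAG", ]
```

Logical indexing is also safer than dat[-which(...), ]: when which() finds no matches, negative indexing with an empty vector returns zero rows rather than all of them. Note too that R names are case-sensitive, so if the column is really called Var1, dat$var1 is NULL and the comparison has length zero.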





	[[alternative HTML version deleted]]
