thr3ads.net - R help - [R] [FORGED] Re: remove [Feb 2017]

Rolf Turner

2017-Feb-12 06:04 UTC

[R] [FORGED] Re: remove

On 12/02/17 18:36, Bert Gunter wrote:
> Basic stuff!
>
> Either subscripting or ?subset.
>
> There are many good R tutorials on the web. You should spend some
> (more?) time with some.
Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't 
seem basic to me.  The only way that I can see how to go at it is via
a for loop:

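rdln <- function(X) {
# Remove discordant last names: keep only rows whose first name
# is paired with exactly one distinct last name.
     ok <- logical(nrow(X))   # all FALSE to start with
     for(nm in unique(X$first)) {
         xxx <- unique(X$last[X$first==nm])
         if(length(xxx)==1) ok[X$first==nm] <- TRUE
     }
     Y <- X[ok,]
     Y <- Y[order(Y$first),]
     # seq_len() is safe when no rows survive; 1:nrow(Y) would give c(1,0).
     rownames(Y) <- seq_len(nrow(Y))
     Y
}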

Calling the toy data frame "melvin" rather than "df" (since "df" is the
name of the built-in F density function, it is bad form to use it as the
name of another object) I get:

 > rdln(melvin)
   first week last
1   Bob    1 John
2   Bob    2 John
3   Bob    3 John
4  Cory    1 Jack
5  Cory    2 Jack

which is the desired output.  If there is a "basic stuff" way to do this
I'd like to see it.  Perhaps I will then be toadally embarrassed, but
they say that this is good for one.

cheers,

Rolf

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
> On Sat, Feb 11, 2017 at 9:02 PM, Val <valkremk at gmail.com> wrote:
>> Hi all,
>> I have a big data set and want to remove rows conditionally.
>> In my data file each person was recorded for several weeks. Somehow,
>> during the recording period, some last names were misreported. For
>> each person the last name should be the same throughout; otherwise
>> that person should be removed from the data. For example, in the
>> following data set Alex was found to have two last names.
>>
>> Alex   West
>> Alex   Joseph
>>
>> Alex should be removed from the data. When this happens I want to
>> remove all rows with Alex. Here is my data set:
>>
>> df <- read.table(header=TRUE, text='first  week last
>> Alex    1  West
>> Bob     1  John
>> Cory    1  Jack
>> Cory    2  Jack
>> Bob     2  John
>> Bob     3  John
>> Alex    2  Joseph
>> Alex    3  West
>> Alex    4  West ')
>>
>> Desired output
>>
>>   first week last
>> 1   Bob    1 John
>> 2   Bob    2 John
>> 3   Bob    3 John
>> 4  Cory    1 Jack
>> 5  Cory    2 Jack

Bert Gunter

2017-Feb-12 16:19 UTC

[R] [FORGED] Re: remove

My understanding was that the discordant names had already been
identified. So in the example the OP gave, removing rows with
first = "Alex" is done by:

df[df$first !="Alex",]
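For comparison, here is a small sketch of the two routes mentioned earlier in the thread (subscripting and ?subset) for the case where the name to drop is already known. The data are the toy rows from the original post, stored as "melvin" as suggested above; the names by_index and by_subset and the stringsAsFactors = FALSE argument are assumptions made for this illustration.

```r
# Toy data from the original post, named `melvin` to avoid masking stats::df.
# stringsAsFactors = FALSE is an assumption: it keeps the names as character.
melvin <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West")

# Route 1: logical subscripting on the rows.
by_index <- melvin[melvin$first != "Alex", ]

# Route 2: subset(), which evaluates the condition inside the data frame.
by_subset <- subset(melvin, first != "Alex")

print(by_index)
print(by_subset)
```

Both calls only work once "Alex" is known to be the problem name; neither one finds the discordant names by itself.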

If that is not the case, as others have pointed out, various forms of
tapply() (by, ave, etc.) can be used. I agree that that is not so
"basic," so I apologize if my understanding was incorrect.
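One possible ave() version, as a hedged sketch rather than anything posted in the thread: count the distinct last names within each first name, then keep the rows where that count is 1. The names melvin, last_code, n_last and kept are made up for the illustration, and stringsAsFactors = FALSE is an assumption. The last names are recoded as integers first because ave() returns a vector of the same type as its first argument.

```r
# Toy data from the original post, named `melvin` to avoid masking stats::df.
melvin <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West")

# Integer code for each last name, so that ave() returns integers.
last_code <- as.integer(factor(melvin$last))

# For every row: number of distinct last names seen with that row's first name.
n_last <- ave(last_code, melvin$first, FUN = function(x) length(unique(x)))

# Keep only the people with a single last name, then tidy up the row order.
kept <- melvin[n_last == 1, ]
kept <- kept[order(kept$first, kept$week), ]
rownames(kept) <- NULL
print(kept)
```

Note that unique() treats NA as a value of its own, so a person with one recorded last name plus a missing one would count as discordant under this sketch.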

Cheers,
Bert




Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



Rainer Schuermann

2017-Feb-12 17:17 UTC

[R] [FORGED] Re: remove

I may not be understanding the question well enough but for me

df[ df[ , "first"]  != "Alex", ]

seems to do the job:

  first week last 

Rainer





Val

2017-Feb-12 18:51 UTC

[R] [FORGED] Re: remove

Thank you Rainer,

The question was:
1. Identify those first names that appear with more than one last
name.
2. Once identified (like Alex), exclude them, because their records
are not reliable.
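The two steps above can be sketched in base R roughly as follows. This is an illustration only, not code from the thread: the names melvin, n_last, discordant and cleaned are invented here, and stringsAsFactors = FALSE is an assumption.

```r
# Toy data from the original post, named `melvin` to avoid masking stats::df.
melvin <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West")

# Step 1: count distinct last names per first name and flag those with > 1.
n_last <- tapply(melvin$last, melvin$first, function(x) length(unique(x)))
discordant <- names(n_last)[n_last > 1]
print(discordant)

# Step 2: exclude every row belonging to a flagged first name.
cleaned <- melvin[!melvin$first %in% discordant, ]
print(cleaned)
```

Keeping the flagged names in their own vector makes it easy to inspect or report the unreliable records before dropping them.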

