thr3ads.net - R help - [R] R help [Aug 2016]


Вова Грабарник

2016-Aug-05 14:07 UTC

[R] R help

Dear R command,

I was wondering if I could ask for your recommendations on my problem.
I have a data frame with 5 columns and 10 000 rows, one recorded tweet
per row. The columns include: numberofatweet (the tweet number), tweet (the
actual tweet text), locations (where the tweet was sent from), and badwords
(words that should not be used on Twitter). The badwords column is unrelated
to the tweet number; it contains only 80 rows, with one word per cell.
My question is whether it is possible to select only those rows where the
text in the "tweet" column contains one of the words from the badwords
column. I tried to use grep and grepl, but nothing seems to be working.

Thank you in advance,
Vladimir


ruipbarradas at sapo.pt

2016-Aug-05 15:17 UTC


[R] R help

Hello,

Please use ?dput to post a data example. Use something like the
following, where 'dat' is the name of your data.frame.

dput(head(dat, 30))  # paste the output of this in a mail

Hope this helps,

Rui Barradas
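
For readers unfamiliar with dput, here is a minimal sketch of why its output is handy in a mail: the printed text is R code that rebuilds the object. The data frame 'dat' and its values below are made up for illustration, and the names 'txt' and 'restored' are not from the thread.

```r
# A small made-up data frame standing in for the real one
dat <- data.frame(id = 1:3,
                  tweet = c("first tweet", "second tweet", "third tweet"),
                  stringsAsFactors = FALSE)

# dput() prints R code describing the object; capture it as text
txt <- capture.output(dput(head(dat, 2)))

# Anyone who pastes that text into R gets the same object back
restored <- eval(parse(text = txt))
all.equal(restored, head(dat, 2))
```

In a mail one would simply paste what dput prints; capture.output is used here only so that the round trip can be shown in a single script.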

Ulrik Stervbo

2016-Aug-05 17:14 UTC


[R] R help

I'm not quite sure if this is what you are looking for:

example.df <- data.frame(words = c("A T", "Z H", "B E", "C P H"),
                         badwords = c("A|I|J|H|K|L"))

# Extract the pattern from the column with bad words
# (named bad.pattern so that subset() does not pick up the column instead)
bad.pattern <- as.character(example.df$badwords[1])

# Subset the data.frame
subset(example.df, grepl(bad.pattern, words))

As I understand your email, the badwords column contains all the bad words in
each cell, so I assume they are separated somehow. In my example I use |
because it signifies OR in a regular expression. Since all elements of the
badwords column are equal, I just take the first element, make sure it is a
character string, and use grepl to subset the entire data.frame.

HTH
Ulrik


Jim Lemon

2016-Aug-06 23:19 UTC


[R] R help

Hi Vladimir,
Do you want something like this?

vdat<-read.table(text="numberoftweet,tweet,locations,badwords
1,My cat is asleep,London,glum
2,My cat is flying,Paris,dashed
3,My cat is dancing,Berlin,mopey
4,My cat is singing,Rome,ill
5,My cat is reading,Budapest,sad
6,My cat is eating,Amsterdam,annoyed
7,My cat is hiding,Copenhagen,crazy
8,My cat is fluffy,Vilnius,terrified
9,My cat is annoyed,Athens,sick
10,My cat is exercising,Ankara,mortified
11,My cat is dreaming,Kracow,irked
12,My cat is mopey,Vienna,uneasy
13,My cat is glum,Brussels,upset",
sep=",",header=TRUE,stringsAsFactors=FALSE)

badwords<-paste(vdat$badwords,collapse="|")

names(unlist(sapply(vdat$tweet,grep,pattern=badwords)))

Jim
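
Since the original question asks for the matching rows rather than the matching tweet text, a minimal sketch along the same lines could use grepl() as a logical row index. The data frame below is a shortened, made-up version of the example above, and the names 'pattern' and 'hits' are illustrative only.

```r
# Toy data in the same shape as the example above (values are illustrative)
vdat <- data.frame(
  numberoftweet = 1:4,
  tweet = c("My cat is asleep", "My cat is annoyed",
            "My cat is mopey", "My cat is glum"),
  locations = c("London", "Athens", "Vienna", "Brussels"),
  badwords = c("glum", "mopey", "sad", "annoyed"),
  stringsAsFactors = FALSE
)

# Build one alternation pattern: "glum|mopey|sad|annoyed"
pattern <- paste(vdat$badwords, collapse = "|")

# grepl() returns one TRUE/FALSE per tweet, so it can index rows directly
hits <- vdat[grepl(pattern, vdat$tweet), ]
hits$numberoftweet
```

Because grepl() always returns a logical vector as long as its input, the same expression also works when no tweet matches; the result is then a data frame with zero rows.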



Jim Lemon

2016-Aug-07 23:22 UTC


[R] R help

Hi Vladimir,
This may fix the NA problem:

vdat<-read.table(text="numberoftweet,tweet,locations,badwords
1,My cat is asleep,London,glum
2,My cat is flying,Paris,dashed
3,My cat is dancing,Berlin,mopey
4,My cat is singing,Rome,ill
5,My cat is reading,Budapest,sad
6,My cat is eating,Amsterdam,annoyed
7,My cat is hiding,Copenhagen,crazy
8,My cat is fluffy,Vilnius,terrified
9,My cat is annoyed,Athens,sick
10,My cat is exercising,Ankara,mortified
11,My cat is dreaming,Kracow,irked
12,My cat is mopey,Vienna,uneasy
13,My cat is glum,Brussels,upset
14,My cat is swinging,Madrid,
15,My cat is crazy,Ljubljana,",
sep=",",header=TRUE,stringsAsFactors=FALSE)

vdat$badwords[!nchar(vdat$badwords)]<-NA

badwords<-paste(vdat$badwords[!is.na(vdat$badwords)],collapse="|")

names(unlist(sapply(vdat$tweet,grep,pattern=badwords)))

Jim
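
One caveat with a plain alternation pattern is that a short word such as "ill" also matches inside longer words. A hedged sketch of one way around this, assuming whole-word, case-insensitive matching is what is wanted (the thread does not say), drops blanks and NAs and wraps the alternation in \\b word boundaries. The vectors 'tweets' and 'bad' are made-up stand-ins for the real columns.

```r
tweets <- c("My cat is chilling", "My cat is ILL", "My cat is fine")
bad <- c("ill", "sad", "", NA)

# Keep only non-missing, non-empty entries
bad <- bad[!is.na(bad) & nzchar(bad)]

# \\b anchors the match at word boundaries: "\\b(ill|sad)\\b"
pattern <- paste0("\\b(", paste(bad, collapse = "|"), ")\\b")

# Whole-word, case-insensitive matching
whole_word <- grepl(pattern, tweets, ignore.case = TRUE, perl = TRUE)

# Plain substring matching, for comparison
substring_only <- grepl(paste(bad, collapse = "|"), tweets)
```

Here 'whole_word' flags only the second tweet, while 'substring_only' flags the first one because "chilling" contains "ill". If the bad words may contain regular expression metacharacters, they would need escaping first, which this sketch does not attempt.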


On Sun, Aug 7, 2016 at 6:43 PM, Вова Грабарник <v.grabarnik at gmail.com> wrote:
> Hi Jim!
>
> That is exactly what I mean. Your example does the job I was looking for.
> Unlike in your example, my badwords column is not filled in for all rows.
> For example, it has only 10 values, but there are many more rows. When I
> introduce NA for the blanks and write
> badwords<-paste(vdat$badwords,collapse="|")
> it collapses all values and produces something like: word|word|NA|NA
> and if I don't introduce NAs when reading the data, the outcome is still like:
> word|word|word|word||||||||||||||||
> and when I then run
> names(unlist(sapply(vdat$tweet,grep,pattern=badwords))) something goes wrong.
> I asked this before, but do you know by any chance how to keep just the
> words in the badwords column and leave out the NAs and blanks?
>
> Thank you,
> Vladimir
