Hi. I have a huge list called twitter:

> dim(twitter)
NULL
> str(twitter)
List of 1
 $ :Classes 'PlainTextDocument', 'TextDocument', 'character'  atomic [1:35575]
    11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For Governance From Campaigner-in-chief: President obama jumps campaign 09 tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
    12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise worries EU ties?;London, England;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362
    12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So. California;USA;CA;;;36.778261;-119.4179324
    12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama #video;USA;USA;;;;37.09024;-95.712891 ...
  .. ..- attr(*, "Author")= chr(0)
  .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31 04:46:56"
  .. ..- attr(*, "Description")= chr(0)
  .. ..- attr(*, "Heading")= chr(0)
  .. ..- attr(*, "ID")= chr "1"
  .. ..- attr(*, "Language")= chr "en"
  .. ..- attr(*, "LocalMetaData")= list()
  .. ..- attr(*, "Origin")= chr(0)
 - attr(*, "CMetaData")=List of 3
  ..$ NodeID  : num 0
  ..$ MetaData:List of 2
  .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
  .. ..$ creator    : Named chr ""
  .. .. ..- attr(*, "names")= chr "LOGNAME"
  ..$ Children: NULL
  ..- attr(*, "class")= chr "MetaDataNode"
 - attr(*, "DMetaData")='data.frame':   1 obs. of  1 variable:
  ..$ MetaID: num 0
 - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"

It contains tweets, but in many languages. The "columns" are separated by semicolons. I am using the tm package and it is a "corpus".

It looks like this:

547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
547283;06:37:17;21;10;2009;fabiomafra;algu?m traz mais lenha pro computador da facool? BOM DIA.;Belo Horizonte - MG - BR;Brazil;MG;;;-19.8157306;-43.9542226
547284;06:37:17;21;10;2009;romanotr;???, "????????? ??? ??????" ???????????? ?????? ????? ?? ???????? ?????, ?? 173 ?????? ?? 81 ????? ???????? ???????. ??????,??????...;Portugal Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton <\;Someone's Daughter>\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051
Error: invalid input '547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT @zuola ???????????? @wenyunc

I want to convert it to "fields" or columns, so I thought I should convert it to a data frame. I tried:

> twitterDF<-as.data.frame(twitter)
Error in sort.list(y) :
  invalid input '547286;06:37:18;21;10;2009;Atogey;????????????????????????????????????????????????????????????????????????RT @zuola ???????????? @wenyunchao ????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????*????????????????????????????????????????????????????????????????????????????????????;???????????????;China;Zhejiang;;;28.695035;119.751054' in 'utf8towcs'

Can anyone suggest what I can do?

P.S. Actually, I would love to remove all the non-English tweets, but I have no clue how to do that.

--
View this message in context: http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148889.html
Sent from the R help mailing list archive at Nabble.com.
Three suggestions:

-- Drop the idea of using a data frame. It's only appropriate when the data is rectangular.
-- Look at strsplit for separating at "@" characters.
-- Post the output of dput() on your sample, since email is probably not capable of rendering this data without creating distortions.

-- David

On Nov 1, 2009, at 7:43 AM, onyourmark wrote:

> Hi. I have a huge list called twitter:
>
>> dim(twitter)
> NULL
>> str(twitter)

This looks to have been converted into an R object through some process on some unspecified input. You should describe that process, and the only unambiguous method of doing so is by including the code.

> It contains tweets but in many languages. The "columns" are separated by
> semi-colons. I am using the tm package and it is a "corpus".
>
> It looks like this:

It is difficult to see any connection with what you have above.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
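The strsplit() suggestion above can be sketched as follows. This is a minimal illustration, not the poster's code: it assumes the separator of interest is the semicolon that the original post names as the field delimiter (the suggestion itself mentions "@"), and it uses two records copied from the str() output in the question. Counting the pieces per record is one way to check whether the data really is rectangular before stacking it.

```r
# Two records copied from the str() output quoted in this thread.
rows <- c(
  "12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So. California;USA;CA;;;36.778261;-119.4179324",
  "12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama #video;USA;USA;;;;37.09024;-95.712891"
)

# Split each record at literal semicolons; fixed = TRUE avoids regex parsing.
fields <- strsplit(rows, ";", fixed = TRUE)

# Count the pieces per record. Equal counts suggest the data is rectangular.
n_fields <- sapply(fields, length)
print(n_fields)

# Only stack the pieces into a character matrix if every record agrees.
if (length(unique(n_fields)) == 1) {
  field_matrix <- do.call(rbind, fields)
  print(dim(field_matrix))
}
```

Records whose tweet text contains its own semicolons (such as the escaped "\;" in the Beth Orton row) would yield a different count, which is exactly the situation the field-count check is meant to expose.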
Hello. The "fields" are separated by a ';'. I think that the data is "rectangular" in the sense that there are about 15 fields for each row. Some of the fields are empty. In the dput() display below, it seems that each row is delimited by double quotes (").

Any ideas from this?

Here is the end of the output of dput(twitter):
"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings  15K  Manage
Holiday Rush [Black Friday]
http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136", 
"4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook 
anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114", 
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored 
attentions  people  formerly snubbed you. -Mary Wilson Little
#quote;UK;United Kingdom;;;;55.378051;-3.435973", 
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight  conchords,
pleeeeeaaase :) thanks rosie
xx;Australia;Australia;;;;-25.274398;133.775136", 
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich
jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n.
kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526", 
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care
reform  rural America: By Christopher Smart The health-care crisis  ..
http://bit.ly/49Iqcu;London;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362", 
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters  Studio O+A: San
Francisco based interior design firm Studio O+A  designed  ..
http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0), DateTimeStamp = structure(list(sec = 56.4049999713898, 
    min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
    wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXt", 
"POSIXlt"), tzone = "GMT"), Description = character(0), Heading = character(0), 
    ID = "1", Language = "en", LocalMetaData = list(), Origin = character(0), 
    class = c("PlainTextDocument", "TextDocument", "character"))), 
    CMetaData = structure(list(NodeID = 0, 
    MetaData = structure(list(create_date = structure(list(sec = 56.4059998989105, 
        min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L, 
        wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", 
    "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
    ), class = c("POSIXt", "POSIXlt"), tzone = "GMT"), 
        creator = structure("", .Names = "LOGNAME")), .Names = c("create_date", 
    "creator")), Children = NULL), .Names = c("NodeID", "MetaData", 
"Children"), class = "MetaDataNode"), DMetaData = structure(list(
    MetaID = 0), .Names = "MetaID", row.names = c(NA, -1L), class = "data.frame"), 
    class = c("VCorpus", "Corpus", "list"))
On 01/11/2009 7:43 AM, onyourmark wrote:
> Hi. I have a huge list called twitter:

It's a list, but more importantly it's a VCorpus and a Corpus. You should use the functions appropriate to those classes to extract the strings making up the data, declare their encoding properly (or convert them to your native encoding), then use read.delim() on a textConnection to read them in.

Duncan Murdoch
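The last two steps of that advice (declare the encoding, then read the strings with read.delim() on a textConnection) might look roughly like the sketch below. It is an illustration under stated assumptions, not a tested recipe for the real corpus: tweet_lines is a hypothetical stand-in for the strings once they have been extracted from the corpus, the two records are copied from this thread, and UTF-8 is assumed as the source encoding.

```r
# Hypothetical stand-in for the strings held by the corpus; with the real
# object they would first have to be extracted from it.
tweet_lines <- c(
  "547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296",
  "4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114"
)

# Assumption: the source files were UTF-8. Declaring it tells R how to
# interpret any non-ASCII bytes instead of guessing from the locale.
Encoding(tweet_lines) <- "UTF-8"

# Read the strings as if they were lines of a semicolon-delimited file.
# quote = "" and comment.char = "" stop apostrophes and '#' hashtags in the
# tweet text from being treated as quoting or comments.
con <- textConnection(tweet_lines)
tweets <- read.delim(con, sep = ";", header = FALSE, quote = "",
                     comment.char = "", stringsAsFactors = FALSE)
close(con)

str(tweets)
```

The columns come back with the default names V1, V2, and so on, since the data has no header row. Tweets whose text contains an escaped semicolon ("\;") would still be split at that point, so such rows need separate handling.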
I did this on the source files, which were semicolon-delimited (the semicolons delimit the fields; I am not sure what character denotes a new tweet). After loading the tm package:

> txt <- system.file("texts", "txt", package = "tm")
> (twitter <- Corpus(DirSource(txt),
+     readerControl = list(language = "lat")))

then

twitter <- tm_map(twitter, removeWords, stopwords("english"))

That last command took about an hour to complete.
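On the P.S. in the original post about dropping non-English tweets: nobody in the thread proposes a method, so the following is only a crude heuristic, named plainly as such. It flags any text containing a byte outside 7-bit ASCII. The sample texts are taken from rows shown in this thread; the accented character in the second one is an assumption, since the archive renders it as "?".

```r
# Sample tweet texts based on rows shown in this thread. The accented
# character in the second one is an assumption: the archive shows it as "?".
texts <- c(
  "Rachel master chef cook anytime!",
  "algu\u00e9m traz mais lenha pro computador da facool? BOM DIA.",
  "Gestern:Achso,ja okey,um 5 nach las ich jemanden komen"
)

# TRUE when any byte of the string falls outside the 7-bit ASCII range.
has_non_ascii <- vapply(
  texts,
  function(s) any(as.integer(charToRaw(s)) > 127L),
  logical(1),
  USE.NAMES = FALSE
)

print(has_non_ascii)

# Keep only the pure-ASCII texts.
ascii_only <- texts[!has_non_ascii]
```

This is not language detection. The German text above is pure ASCII and passes the filter, while an English tweet containing an accented name or a typographic quote would be dropped, so the result should be treated as a first rough cut at best.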