That's easy: you are misreading the dummy code I sent.
Do this:
lit<-read.csv("litologija.csv", sep=";", dec=".")
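# Opis is a factor in your str() output; stringsAsFactors=FALSE does not
# convert an existing factor, and strsplit needs character, so convert it here
sent <-data.frame(sentence=as.character(lit$Opis),stringsAsFactors=FALSE)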
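first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=nrow(sent))
sent<-data.frame(sent,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)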
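In the dummy problem I set the vector length to 10 only because the dummy data
had 10 rows. Here each vector has to be nrow(sent) long, which is why you got
the "differing number of rows: 22928, 10" error.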
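To see why the length matters, here is a minimal sketch. The four-row data
frame and the name bad are made up for illustration; they only stand in for
your real data.

```r
# Made-up stand-in for the Opis column: 4 rows instead of 22928
sent <- data.frame(sentence = c("a b c", "d e f", "g h i", "j k l"),
                   stringsAsFactors = FALSE)

# A blank column of length 10 cannot be combined with 4 rows
bad <- vector(length = 10)
res <- try(data.frame(sent, first = bad), silent = TRUE)
inherits(res, "try-error")   # TRUE: arguments imply differing number of rows

# Sized from nrow(sent), the blank column fits
first <- vector(length = nrow(sent))
ok <- data.frame(sent, first, stringsAsFactors = FALSE)
dim(ok)                      # 4 rows, 2 columns
```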
Then do this:
for(j in 1:nrow(sent)) {
sent[j,2:11]<-strsplit(sent[j,1]," ")[[1]][1:10]
}
That will get you a result the crude, brute-force way.
Try that.
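If you want to check the loop on something small first, here is a
self-contained sketch. The three descriptions are invented stand-ins for
lit$Opis (I have not seen your file), and cols is just a helper name I made up.

```r
# Invented stand-in for lit$Opis; stored as a factor, like the real column
opis <- factor(c("DROBEN MELJAST PESEK, GOST, SIVORJAV", "MELJ", ""))

# as.character() is needed because strsplit() refuses factors
sent <- data.frame(sentence = as.character(opis), stringsAsFactors = FALSE)

# Add ten blank character columns, each nrow(sent) long
cols <- c("first", "second", "third", "fourth", "fifth",
          "sixth", "seventh", "eighth", "ninth", "tenth")
for (nm in cols) sent[[nm]] <- rep(NA_character_, nrow(sent))

# Row by row: split on spaces and keep the first ten words
# (rows with fewer than ten words are padded with NA)
for (j in seq_len(nrow(sent))) {
  sent[j, 2:11] <- strsplit(sent[j, 1], " ")[[1]][1:10]
}

sent[1, "first"]   # "DROBEN"
```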
Then you can learn the sapply way, but first you need to learn R data
structures.
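For when you get there, a minimal sketch of the sapply way, along the lines of
what David posted. The sample sentences and the column names are my own
placeholders, not your data.

```r
# Placeholder sentences standing in for lit$Opis
sent <- data.frame(
  sentence = c("one two three four five six seven eight nine ten eleven",
               "short sentence",
               ""),
  stringsAsFactors = FALSE)

# Split every sentence on spaces; the result is a list with one vector per row
pieces <- strsplit(sent$sentence, " ")

# "[" with 1:10 takes the first ten elements of each vector (NA where missing);
# sapply returns them as columns, so t() turns them back into rows
words <- t(sapply(pieces, "[", 1:10))
colnames(words) <- c("first", "second", "third", "fourth", "fifth",
                     "sixth", "seventh", "eighth", "ninth", "tenth")

# Put the ten word columns next to the original sentence
result <- data.frame(sent, words, stringsAsFactors = FALSE)
result[2, "second"]   # "sentence"
```

No loop and no pre-built blank columns are needed this way, which is why it is
worth learning once the brute-force version makes sense.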
On Tue, Nov 2, 2010 at 1:47 PM, Matevž Pavlič <matevz.pavlic@gi-zrmk.si> wrote:
> Hi Steven,
>
>
>
> Thank you for the help. I get an error though when i do this :
>
>
>
> >lit<-read.csv("litologija.csv", sep=";", dec=".")
>
> >sent <-data.frame(sentence=lit$Opis,stringsAsFactors=FALSE)
>
> >str(sent)
>
> >sentV<-rep(sent,10)
>
> >str(sentV)
>
>
>
>
>
> >first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
>
> >DF<-data.frame(Sentence=sent,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>
>
>
> »Error in data.frame(Sentence = sent, first, second, third, fourth, fifth, :
> arguments imply differing number of rows: 22928, 10«
>
>
>
> What am I doing wrong?
>
>
>
> Thanks, m
>
>
>
>
>
>
>
> *From:* steven mosher [mailto:moshersteven@gmail.com]
> *Sent:* Tuesday, November 02, 2010 8:45 PM
> *To:* David Winsemius
> *Cc:* Matevž Pavlič; Gaj Vidmar; r-help@stat.math.ethz.ch
> *Subject:* Re: [R] spliting first 10 words in a string
>
>
>
> Thanks, David.
>
>
>
> Matevz, maybe I can help explain by doing a very simple and brute force
> approach
>
> as opposed to the way David did it. But you should learn his methods.
>
>
>
> I will just do a subset of your problem and if you understand how it works
> then you should
>
> be able to get something done and then make it more elegant.
>
>
>
> First, I simplify the problem by separating out the "sentence" column.
>
>
>
> You can do this with your data frame by simply doing this
>
>
>
> MySentence <-data.frame(sentence=as.character(yourbigDF$Opis),stringsAsFactors=FALSE)
>
>
>
> so I take your original data.frame (yourbigDF) and I just create a copy of
> that one column
>
> $Opis
>
>
>
> Later we can merge the two back together after I add 10 columns for the
> words
>
>
>
>
>
> Let's make some dummy data with just 10 rows
>
>
>
>
>
>
>
> sentence<- "this is a sentence with ten words or maybe more than ten words"
>
> sentV<-rep(sentence,10)
>
> # now I just made 10 rows of the same sentence
>
> # NEXT because I am going to create 10 new columns of 10 rows I create
>
> # 10 vectors; each is named and each has 10 elements, one per row.
>
> # they have NO DATA in them
>
>
>
>
>
> first=second=third=fourth=fifth=sixth=seventh=eighth=ninth=tenth<-vector(length=10)
>
>
>
> #Next I create a dataframe with Sentence in the first column and 10 blank
> columns.
>
> # NOTE I use stringsAsFactors=FALSE
>
>
>
> DF<-data.frame(Sentence=sentence,first,second,third,fourth,fifth,sixth,seventh,eighth,ninth,tenth,stringsAsFactors=FALSE)
>
>
>
> # This is what it would look like ( the first row)
>
> DF[1,]
>
>
>
> Sentence first second third fourth fifth sixth seventh eighth ninth tenth
>
> 1 this is a sentence with ten words or maybe more than ten words FALSE
> FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
>
>
> Next, I will show you how to assign the first ten words to the 10 blank
> columns
>
>
>
> DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
>
>
>
> #DF[1,2:11] selects the columns 2-11 of the first row
>
> #strsplit splits the sentence into words; [1:10] keeps the first 10, which are
> placed in columns 2-11
>
>
>
> If you want to do this the slow way you can just loop through your
> dataframe row by row
>
> or you can probably use apply.
>
>
>
> Make more sense?
>
> > DF[1,2:11]<-strsplit(DF[1,1]," ")[[1]][1:10]
>
> > DF[1,]
>
> Sentence first
> second third fourth fifth sixth seventh eighth ninth tenth
>
> 1 this is a sentence with ten words or maybe more than ten words this
> is a sentence with ten words or maybe more
>
> > DF[1,"first"]
>
> [1] "this"
>
>
>
> On Tue, Nov 2, 2010 at 12:22 PM, David Winsemius <dwinsemius@comcast.net>
> wrote:
>
>
> On Nov 2, 2010, at 3:01 PM, Matevž Pavlič wrote:
>
> Hi all,
>
> Thanks for all the help. I managed to do it with what Gaj suggested (Excel
> :().
>
> The last solution from David is also great; I just don't understand why R
> put the words in 14 columns and three rows?
>
>
>
> Because the maximum number of words was 14 and the fill argument was TRUE.
> There were three rows because there were three items in the supplied
> character vector.
>
>
>
> I would like it to put just the first 10 words in the source field into 10
> different destination fields, but in the same row. And so on... is that
> possible?
>
>
>
> I don't know what a destination field might be. Those are not R data types.
>
> This would trim the extra columns (in this example set to those greater
> than 8) by adding a lot of "NULL"'s to the end of a colClasses specification
> .... at the expense of a warning message which can be ignored:
>
> > read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE )
>
>
> V1 V2 V3 V4 V5 V6 V7 V8
>
> 1 I have a columnn with text that has
>
> 2 I would like to split these words in
>
> 3 but just first ten words in the string.
>
> Warning message:
> In read.table(textConnection(words), fill = T, colClasses = c(rep("character", :
> cols = 14 != length(data) = 38
>
>
> If you want to assign the first column to a variable then just:
> > first8 <- read.table(textConnection(words), fill=T, colClasses = c(rep("character", 8), rep("NULL", 30) ) , stringsAsFactors=FALSE)
> > var1 <- first8[[1]]
> > var1
> [1] "I" "I" "but"
>
> --
> David.
>
>
>
>
> Thank you, m
> -----Original Message-----
> From: r-help-bounces@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of David Winsemius
> Sent: Tuesday, November 02, 2010 3:47 PM
> To: Gaj Vidmar
> Cc: r-help@stat.math.ethz.ch
> Subject: Re: [R] spliting first 10 words in a string
>
>
> On Nov 2, 2010, at 6:24 AM, Gaj Vidmar wrote:
>
> Though <forbidden> in this list, in Excel it's just (literally!)
> five clicks
> away!
> (with the column in question selected)
> Data -> Text to Columns -> Delimited -> tick Space -> Finish
> Pa je! (~Voila in Slovenian)
> (then import back to R, keeping only the first 10 columns if so
> desired)
>
>
> You could do the same thing without needing to leave R. Just
> read.table( textConnection(..), header=FALSE, fill=TRUE)
>
> read.table(textConnection(words), fill=T)
>
>    V1    V2    V3      V4    V5    V6    V7      V8       V9     V10      V11   V12 V13 V14
> 1   I  have     a columnn  with  text  that     has    quite       a      few words  in it.
> 2   I would  like      to split these words      in separate columns
> 3 but  just first     ten words    in   the string.       Is    that possible    in  R?
>
>
> Regards,
> Assist. Prof. Gaj Vidmar, PhD
> University Rehabilitation Institute, Republic of Slovenia
>
> Irrelevant P.S. Long ago, before embarking on what eventually ended
> mainly
> in statistics,
> I did two years of geology, so (and also because of knowing what the
> poster's institute does)
> I even kinda imagine what these data are.
>
> "Matevž Pavlič" <matevz.pavlic@gi-zrmk.si> wrote in message
> news:AD5CA6183570B54F92AA45CE2619F9B9D96994@gi-zrmk.si...
>
> Hi,
>
> I am sorry, will try to be more exact from now on...
>
> I have a data.frame with a field called Opis. It contains sentences that
> I would like to split in words or fields in data.frame...when I say
> columns I mean as in Excel table. I would like to split "Opis" into ten
> fields from the first ten words in Opis field.
> Here is an example of my data.frame.
>
> 'data.frame': 22928 obs. of 12 variables:
> $ VrtinaID : int 1 1 1 1 2 2 2 2 2 2 ...
> $ ZapStev : int 1 2 3 4 1 2 3 4 5 6 ...
> $ GlobinaOd : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> $ GlobinaDo : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> $ Opis : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 4756 ...
> $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 101 101 NA 106 125 80 106 101 ...
> $ GeolNastOd : num 0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> $ GeolNastDo : num 0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> $ GeolNastOpis : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 53 53 53 53 53 ...
> $ NacinVrtanjaOd : num 0e+00 1e+09 1e+09 1e+09 0e+00 ...
> $ NacinVrtanjaDo : num 1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
> $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 1 1 1 1 1 ...
>
> Hope that explains better...
> Thank you, m
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius@comcast.net]
> Sent: Monday, November 01, 2010 10:13 PM
> To: Matevž Pavlič
> Cc: r-help@r-project.org
> Subject: Re: [R] spliting first 10 words in a string
>
>
> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>
> Hi all,
>
>
>
> I have a columnn with text that has quite a few words in it. I would
> like to split these words in separate columns, but just first ten
> words in the string. Is that possible in R?
>
>
> Not sure what a column means to you. It's not a precisely defined R
> type or class. (And you are requested to offer a concrete example
> rather than making us guess.)
>
> words <-"I have a columnn with text that has quite a few words in
>
> it. I would like to split these words in separate columns, but just
> first ten words in the string. Is that possible in R?"
>
> strsplit(words, " ")[[1]][1:10]
>
> [1] "I" "have" "a" "columnn" "with" "text" "that" "has" "quite" "a"
>
>
> Or if in a dataframe:
>
> words <-c("I have a columnn with text that has quite a few words in it.",
> "I would like to split these words in separate columns",
> "but just first ten words in the string. Is that possible in R?")
>
> worddf <- data.frame(words=words)
>
>
>
> # as.character() in case data.frame() stored words as a factor
> t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10) )
>
>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]      [,9]       [,10]
> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"     "quite"    "a"
> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"      "separate" "columns"
> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string." "Is"       "that"
>
>
> --
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>