Here's the dataset I'm working with, called test -

subject  group  wk1  wk2  wk3  wk4  place
001-002  boys     2    3    4    5
002-003  boys     7    6    5    4
003-004  boys     9    4    6    1
004-005  girls    5    7    8    9
005-006  girls    2    6    3    8
006-007  girls    1    4    7    4

If I call mutate(test, place = substr(subject, 1, 3)), "001" is the first observation in the place column.

But it's a character and "subject" is a factor. I need place to be a factor, too, but I need the observations to be ONLY the first three numbers of "subject."

Does that make my request more understandable?

Ken
kmnanus at gmail.com
914-450-0816 (tel)
347-730-4813 (fax)

> On Mar 4, 2016, at 12:49 PM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>
> Hi Ken,
>
> You do that with as.factor(), as has already been suggested. You'll need to provide a reproducible example to show us what's going wrong. Using fake data is fine, we just need to see some data that look like yours and the code you're using.
>
> Sarah
>
> On Fri, Mar 4, 2016 at 11:57 AM, KMNanus <kmnanus at gmail.com> wrote:
> Let me see if I can ask the question more clearly - I am trying to extract a section of a hyphenated factor. For example, 001-004 is one observation of test$ken, which is a factor, and I want to set up a new factor variable called place that would have 001 as an observation. If I call mutate(place = as.character(test$ken)), I can extract 001 from 001-004, but I don't know how to subsequently convert that character string back into a factor.
>
> Or can 001 be extracted from a factor as a factor?
>
> Do you know how to execute either of these approaches?
>
> Ken
>
> > On Mar 3, 2016, at 8:33 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
> >
> > On 03/03/2016 02:13 PM, KMNanus wrote:
> >> When I do that,
> >
> > When you do what exactly?
> >
> > It's impossible for anyone here to know what you're doing if you
> > don't show the code.
> >
> >> I get "Error in `$<-.data.frame`(`*tmp*`, "site", value = integer(0)) :
> >> replacement has 0 rows, data has 6"
> >>
> >> The data frame has 6 rows.
> >
> > You said you had a factor variable, you never mentioned you had a
> > data.frame. If the factor variable is part of a data.frame 'df',
> > then first extract it with something like df$myvar or df[["myvar"]],
> > and then call substr() followed by as.factor() on it.
> >
> > H.
> >
> >>
> >> Ken
> >>
> >>> On Mar 3, 2016, at 4:52 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 03/03/2016 12:18 PM, KMNanus wrote:
> >>>> I have a factor variable that is 6 digits and hyphenated. For
> >>>> example, 001-014.
> >>>>
> >>>> I need to extract the first 3 digits to a new variable using mutate
> >>>> in dplyr - in this case 001 - but can't find a function to do it.
> >>>>
> >>>> substr will do this for character strings, but I need the variable to
> >>>> remain as a factor.
> >>>
> >>> What prevents you from calling as.factor() on the result to turn it
> >>> back into a factor?
> >>>
> >>> H.
> >>>
> >>>>
> >>>> Is there an R function or workaround to do this?
> >>>>
> >>>> Ken
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>> --
> >>> Hervé Pagès
> >>>
> >>> Program in Computational Biology
> >>> Division of Public Health Sciences
> >>> Fred Hutchinson Cancer Research Center
> >>> 1100 Fairview Ave. N, M1-B514
> >>> P.O. Box 19024
> >>> Seattle, WA 98109-1024
> >>>
> >>> E-mail: hpages at fredhutch.org
> >>> Phone: (206) 667-5791
> >>> Fax: (206) 667-1319
As everyone has been telling you, as.factor().
If you like the mutate approach, you can wrap the substr() call in as.factor() to convert it.

Here's a one-liner with reproducible data.

testdata <- structure(list(subject = structure(1:6, .Label = c("001-002",
"002-003", "003-004", "004-005", "005-006", "006-007"), class = "factor"),
    group = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("boys",
    "girls"), class = "factor"), wk1 = c(2L, 7L, 9L, 5L, 2L,
    1L), wk2 = c(3L, 6L, 4L, 7L, 6L, 4L), wk3 = c(4L, 5L, 6L,
    8L, 3L, 7L), wk4 = c(5L, 4L, 1L, 9L, 8L, 4L)), .Names = c("subject",
"group", "wk1", "wk2", "wk3", "wk4"), class = "data.frame", row.names = c(NA,
-6L))

testdata$subject <- as.factor(substring(as.character(testdata$subject), 1, 3))

> testdata
  subject group wk1 wk2 wk3 wk4
1     001  boys   2   3   4   5
2     002  boys   7   6   5   4
3     003  boys   9   4   6   1
4     004 girls   5   7   8   9
5     005 girls   2   6   3   8
6     006 girls   1   4   7   4
> str(testdata)
'data.frame': 6 obs. of 6 variables:
 $ subject: Factor w/ 6 levels "001","002","003",..: 1 2 3 4 5 6
 $ group  : Factor w/ 2 levels "boys","girls": 1 1 1 2 2 2
 $ wk1    : int 2 7 9 5 2 1
 $ wk2    : int 3 6 4 7 6 4
 $ wk3    : int 4 5 6 8 3 7
 $ wk4    : int 5 4 1 9 8 4

Sarah

On Fri, Mar 4, 2016 at 1:00 PM, KMNanus <kmnanus at gmail.com> wrote:
> But it's a character and "subject" is a factor. I need place to be a factor, too, but I need the observations to be ONLY the first three numbers of "subject."
>
> Does that make my request more understandable?
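For the mutate route mentioned in the reply above, a minimal sketch might look like the following. It assumes dplyr is installed, and the data frame is a cut-down, made-up stand-in for the poster's test (only subject, group and wk1 are kept). Unlike the one-liner above, it leaves subject untouched and adds place as a new factor column.

```r
library(dplyr)

# Hypothetical stand-in for the poster's data frame `test`
test <- data.frame(
  subject = factor(c("001-002", "002-003", "003-004",
                     "004-005", "005-006", "006-007")),
  group   = factor(rep(c("boys", "girls"), each = 3)),
  wk1     = c(2L, 7L, 9L, 5L, 2L, 1L)
)

# Convert the factor to character, keep the first three characters,
# then turn the result back into a factor, all inside mutate()
test2 <- mutate(test, place = as.factor(substr(as.character(subject), 1, 3)))

is.factor(test2$place)   # place is a factor
levels(test2$place)      # one level per distinct three-digit prefix
str(test2)
```

The explicit as.character() is not strictly needed, since substr() coerces its argument to character, but it makes the intent visible.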
I much prefer the factor function over the as.factor function for converting character to factor, since you can set the levels in the order you want them to be.
--
Sent from my phone. Please excuse my brevity.

On March 4, 2016 10:07:27 AM PST, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>As everyone has been telling you, as.factor().
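To illustrate the point about level order, here is a small base-R sketch. The vector place_chr and the chosen level order are made up for the example and are not taken from the thread's data.

```r
# Made-up character values standing in for extracted prefixes
place_chr <- c("003", "001", "002", "001")

# as.factor() picks the levels itself, in sorted order
f_default <- as.factor(place_chr)

# factor() lets you state the level order explicitly
f_ordered <- factor(place_chr, levels = c("003", "002", "001"))

levels(f_default)      # "001" "002" "003"
levels(f_ordered)      # "003" "002" "001"
as.integer(f_ordered)  # 1 3 2 3
```

The level order matters wherever R uses it, for example the order of groups in a plot or the choice of reference level in a model.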