I have a data frame with a column of values that I want to bucket (group) into specific levels.

> str(dat)
'data.frame': 3678 obs. of 39 variables:
 $ id                    : int 23 76 129 156 166 180 200 214 296 344 ...
 $ final_purchase_amount : Factor w/ 32 levels "\\N","1082","1109",..: 1 1 1 1 1 1 1 1 1 1 ...

So I ran the following to produce new levels, one for values from 100 to 400, 401 to 1000, and 1001+.

dat$final_purchase_amount <- NA
dat$final_purchase_amount[dat$final_purchase_amount %in% levels(dat$final_purchase_amount)[c(8,9,11,12,13,15,16,17,18,19,20,21)]] <- "100 to 400"
dat$final_purchase_amount[dat$final_purchase_amount %in% levels(dat$final_purchase_amount)[c(22,23,24,25,26,27,28,29,30,31,32)]] <- "401 to 1000"
dat$final_purchase_amount[dat$final_purchase_amount %in% levels(dat$final_purchase_amount)[c(2,3,4,5,6,7,10,14)]] <- "1001 +"
dat$final_purchase_amount <- factor(dat$final_purchase_amount)
levels(dat$final_purchase_amount)
table(dat$final_purchase_amount)

However, this doesn't seem to produce any levels and returns the following.

> levels(dat$final_purchase_amount)
character(0)

Can anyone point to what I'm doing wrong?

Thanks!

--
Abraham Mathew
Statistical Analyst
www.amathew.com
720-648-0108
@abmathewks
Your first command erases all the data in that column:

dat$final_purchase_amount <- NA

so when you refer to it later, it consists of only NAs.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Abraham Mathew
> Sent: Tuesday, August 07, 2012 1:57 PM
> To: r-help at r-project.org
> Subject: [R] Re-grouping data in R
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
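To see why the later lines have nothing left to match, here is a minimal sketch on a toy data frame. The four amounts are made up for illustration; only the column names come from the original post.

```r
# Toy stand-in for the real data (amounts are hypothetical)
dat <- data.frame(
  id = 1:4,
  final_purchase_amount = factor(c("\\N", "1082", "250", "640"))
)

# The first command from the post: the factor is replaced by a logical NA vector
dat$final_purchase_amount <- NA
col_class  <- class(dat$final_purchase_amount)   # no longer a factor
col_levels <- levels(dat$final_purchase_amount)  # a logical vector has no levels

# Indexing the missing levels yields nothing, so %in% can never match
hits <- dat$final_purchase_amount %in% levels(dat$final_purchase_amount)[c(2, 3)]

# Rebuilding a factor from all-NA values leaves an empty set of levels
new_levels <- levels(factor(dat$final_purchase_amount))

print(col_class)
print(col_levels)
print(any(hits))
print(new_levels)
```

Because no element is ever selected, none of the three assignments changes anything, and the final factor() call is applied to a column that is still entirely NA.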
Hello,

Inline.

On 07-08-2012 19:56, Abraham Mathew wrote:
> I have a data frame with a column of values that I want to bucket (group)
> into specific levels.
>
> > str(dat)
> 'data.frame': 3678 obs. of 39 variables:
>  $ id                    : int 23 76 129 156 166 180 200 214 296 344 ...
>  $ final_purchase_amount : Factor w/ 32 levels "\\N","1082","1109",..: 1 1 1 1 1 1 1 1 1 1 ...
>
> So I ran the following to produce new levels, one for values from 100
> to 400, 401 to 1000, and 1001+.
>
> dat$final_purchase_amount <- NA
> dat$final_purchase_amount[dat$final_purchase_amount %in% levels(dat$final_purchase_amount)[c(8,9,11,12,13,15,16,17,18,19,20,21)]] <- "100 to 400"
> dat$final_purchase_amount[dat$final_purchase_amount %in% levels(dat$final_purchase_amount)[c(22,23,24,25,26,27,28,29,30,31,32)]] <- "401 to 1000"
> dat$final_purchase_amount[dat$final_purchase_amount %in% levels(dat$final_purchase_amount)[c(2,3,4,5,6,7,10,14)]] <- "1001 +"
> dat$final_purchase_amount <- factor(dat$final_purchase_amount)
> levels(dat$final_purchase_amount)
> table(dat$final_purchase_amount)
>
> However, this doesn't seem to produce any levels

Fortunately not! You started by setting the entire column to NA in your first instruction above, and then tried several times to match that vector of NAs, with %in%, against the levels numbered c(8,9, ...etc...) or c(22,23, ...etc..). Your first line of code makes everything after it that refers to dat$final_purchase_amount useless. (I believe that line should be deleted.)

Hope this helps,

Rui Barradas

> and returns the following.
>
> > levels(dat$final_purchase_amount)
> character(0)
>
> Can anyone point to what I'm doing wrong.
>
> Thanks!
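One possible way to get the three groups, offered as a sketch rather than as the poster's intended method: convert the factor labels to numbers and bucket them with cut(), storing the result in a new column so the original values stay available. The toy amounts, the new column name purchase_group, and the treatment of "\\N" as a missing value are assumptions. Amounts are assumed to be whole numbers, and anything below 100 would come out as NA. Note also that assigning a label such as "100 to 400" directly into the existing factor would give NA with an "invalid factor level" warning, because that label is not among its levels.

```r
# Toy stand-in for the real data (amounts are hypothetical)
dat <- data.frame(
  id = 1:6,
  final_purchase_amount = factor(c("\\N", "1082", "250", "640", "400", "1001"))
)

# Convert the factor labels (not the integer codes) to numbers.
# "\\N" cannot be parsed and becomes NA; the coercion warning is suppressed.
amount <- suppressWarnings(
  as.numeric(as.character(dat$final_purchase_amount))
)

# Bucket into [100, 400], (400, 1000], (1000, Inf) in a NEW column,
# leaving the original column untouched.
dat$purchase_group <- cut(
  amount,
  breaks = c(100, 400, 1000, Inf),
  labels = c("100 to 400", "401 to 1000", "1001 +"),
  include.lowest = TRUE
)

print(levels(dat$purchase_group))
print(table(dat$purchase_group, useNA = "ifany"))
```

Working from the numeric values avoids having to pick levels by position, which depends on how the character labels happen to sort.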