hind lazrak
2010-Nov-07 06:15 UTC
[R] creating a scale (factor) based on a continuous variable nested within levels of factor
Hello R-helpers,

I hope that my subject line is not deterring anyone from helping me out :)
I have been stuck for a few hours now, and I can't seem to pinpoint where the problem is.

I have a data.frame which is structured as follows:

    str(hDatPretty)
    'data.frame':   1665 obs. of  8 variables:
     $ time    : num  0 1.02 2.05 3.07 4.09 ...
     $ hr      : num  62.4 63.6 64.6 65.5 66.2 ...
     $ emg     : num  3.3 3.42 3.52 3.57 3.6 ...
     $ respRate: num  50.4 50.6 50.7 50.8 50.9 ...
     $ scr     : num  1.7 1.72 1.73 1.74 1.75 ...
     $ skinTemp: num  28.1 28.2 28.2 28.2 28.2 ...
     $ rating  : num  4 4 4 4 4 4 4 4 4 4 ...
     $ songId  : Factor w/ 37 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...

It consists of ratings ($rating) given by people (the id variable is not shown here, as this is a subset with only one person) for each of the 37 songs ($songId) they listen to. While they are listening we measure physiological responses (emg, hr, ...) every second over a period of 45 seconds.

Here's a quick peek at the data:

    head(hDatPretty)
            time       hr      emg respRate      scr skinTemp rating songId
    1.1 0.000000 62.42135 3.300562 50.40538 1.703105 28.14489      4      1
    1.2 1.022727 63.59057 3.424884 50.59292 1.718110 28.16189      4      1
    1.3 2.045455 64.59840 3.515219 50.73523 1.730594 28.17836      4      1
    1.4 3.068182 65.47707 3.573151 50.83909 1.740594 28.19422      4      1
    1.5 4.090909 66.22192 3.597183 50.90466 1.748086 28.20948      4      1
    1.6 5.113636 66.89209 3.588530 50.91911 1.753385 28.22414      4      1

So, every study participant gives one rating (from -10 to 10) for each song. If we tabulate the data, this is what we have (for the first 10 songs):

    table(hDatPretty$songId, hDatPretty$rating)

         -10 -9 -7 -3  0  1  3  4  5  7  8  9 10
      1    0  0  0  0  0  0  0 45  0  0  0  0  0   # song 1 gets a score of 4
      2    0  0  0  0  0  0 45  0  0  0  0  0  0   # song 2 gets a score of 3
      3    0  0 45  0  0  0  0  0  0  0  0  0  0   # .
      4    0 45  0  0  0  0  0  0  0  0  0  0  0
      5    0  0  0  0  0  0  0  0  0 45  0  0  0
      6    0  0  0  0  0  0  0  0  0  0  0  0 45
      7    0  0  0  0  0  0  0  0  0  0 45  0  0   # song 7 gets a score of 8
      8    0  0  0 45  0  0  0  0  0  0  0  0  0
      9    0  0  0  0  0  0  0 45  0  0  0  0  0
      10   0  0  0  0  0 45  0  0  0  0  0  0  0

What I would like to do is to create another scale (a factor) based on the ratings, with the following levels:

    -10;-4 == dislike   where -4 is included
    -4;4   == neutral   where -4 is excluded
    4;10   == like      where 4 is excluded

My code to obtain this new variable:

    liking <- numeric(length(hDatPretty$rating))
    liking[hDatPretty$rating <= -4] <- 'dislike'
    liking[hDatPretty$rating > -4 & hDatPretty$rating <= 4] <- 'neutral'
    liking[hDatPretty$rating > 4] <- 'like'

    hDatPretty['liking'] <- factor(liking)

The problem that I have is that, for some reason, it assigns different values to the same rating for some songs but not all (?). See for example:

         dislike like neutral
      1        0    8      37   ## Here is one problem, where song # 1 gets two 'liking' scores while the rating is constant
      2        0    0      45
      3       45    0       0
      4       45    0       0
      5        0   45       0
      6        0   45       0
      7        0   45       0
      8        0    0      45
      9        0   10      35   ## here is a similar problem

Could you PLEASE help me with the proper code to obtain my 'liking' variable for each of the songs, based on the rating each song gets?

Many thanks.

Hind

p.s.: I have also tried cut() in the code as follows... unsuccessfully:

    hDatPretty$liking <- by(hDatPretty$rating, hDatPretty$songId,
        function (z) { cut(hDatPretty$z, c(-10, -4,4,10),
        labels=c('dislike', 'neutral', 'like'))})

    Error in cut.default(hDatPretty$z, c(-10, -4, 4, 10), labels = c("dislike", :
      'x' must be numeric

Again, thank you.
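A note on the error in the p.s. above: inside the function passed to by(), z already holds the ratings for one song, whereas hDatPretty$z looks for a column literally named z. No such column exists, so cut() receives NULL and stops with "'x' must be numeric". Since cut() is vectorised, by() is probably not needed at all. The following is only a minimal sketch on made-up data; the `toy` data frame and its values are assumptions, not the poster's data, and `include.lowest = TRUE` is added here so that a rating of exactly -10 is kept when the breaks start at -10.

```r
# Toy stand-in for hDatPretty (hypothetical values, 3 rows per song)
toy <- data.frame(
  songId = factor(rep(1:3, each = 3)),
  rating = rep(c(4, -10, 8), each = 3)
)

# `$` looks for a column named "z"; there is none, so the result is NULL,
# which is what cut() was handed in the failing by() call
is.null(toy$z)

# cut() works on the whole rating column at once.
# Intervals are right-closed by default: [-10,-4], (-4,4], (4,10]
toy$liking <- cut(toy$rating,
                  breaks = c(-10, -4, 4, 10),
                  labels = c("dislike", "neutral", "like"),
                  include.lowest = TRUE)

# One 'liking' value per song
table(toy$songId, toy$liking)
```

Without `include.lowest = TRUE`, the first interval would be (-10,-4] and a rating of -10 would come back as NA.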
Dennis Murphy
2010-Nov-07 09:45 UTC
[R] creating a scale (factor) based on a continuous variable nested within levels of factor
Hi:

If I get your meaning, the cut() function would appear to be your friend in this problem.

    hDatPretty$liking <- cut(hDatPretty$rating, breaks = c(-11, -4, 4, 11),
                             labels = c('dislike', 'neutral', 'like'))

HTH,
Dennis
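The thread does not say why song 1 ended up split between 'like' and 'neutral'. One possible explanation, offered only as a guess, is that some rating values are not exactly 4 but a tiny amount above it (for instance after interpolation or resampling), and that table() prints them under the same "4" label. If ratings are meant to be whole scores from -10 to 10, rounding before cutting would sidestep this. A minimal sketch with a hypothetical `rating` vector:

```r
# Hypothetical ratings: all meant to be 4, two of them a hair above 4
rating <- c(4, 4, 4, 4 + 1e-15, 4 + 1e-15)

# Count rows that are not exactly whole numbers
sum(rating != round(rating))

# Breaks and labels as in the reply above
brk <- c(-11, -4, 4, 11)
lab <- c("dislike", "neutral", "like")

# Without rounding, the two off-by-a-hair rows fall into (4, 11]
raw_liking <- cut(rating, breaks = brk, labels = lab)
table(raw_liking)

# Rounding to whole scores first puts every row in (-4, 4]
liking <- cut(round(rating), breaks = brk, labels = lab)
table(liking)
```

Running the `sum(rating != round(rating))` check on hDatPretty$rating would show whether this guess applies to the real data.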