Dear Mr/Mrs,

This is my first time working in RStudio.
I have a database of 36 participants but it has 150600 entries.

Column - Column - Column - Column
Participant Activityprobe - Activity Level - High/low/none
Participant Screenprobe - screenon/off -
Participant SMSprobe etc
Participant CallLogProbe etc.

I need code that helps me count the activity level of all the participants: high activity level, no activity level and low activity level. It should also help me find out, for every participant, what the percentages are of all their high/no/low activity.

For screenprobe I need to count how many times the participant turned their screen on and how many times they turned it off, and the percentage of screen on/off.

For callLog I need to count how many times each participant got called and the percentage.

For SMS I need to count the number of SMS for each participant and their percentage.

I also need to categorize the probes, so that my database shows all the activity levels first, organized by none/high/low, and then all the screenprobes, organized by on and off, etc.

I hope that my description is clear and that you can maybe help me.

Best,

Rachel
Not really. This is R-help, not R-do-my-work-for-me. You need to make enough progress in doing your work that you can ask a focused question about how to do some step of your work in R before your query will be answerable on this mailing list. Once you have started your work you will have some code and sample data you can share with us that we can run so we can understand your question (code without data is often really confusing).

Note that R is text-based, and so is this mailing list... when you allow your email program to add HTML formatting to the email you send, the formatting will get removed anyway and the plain text that remains is often not readable or R won't understand the code... so follow the Posting Guide and set your email program to send plain text to begin with.

Some useful readings are [1][2][3], and don't forget the Posting Guide mentioned in the footer of this and every posting on this mailing list. If you haven't already found an intro to R that you like, you might read [4].

Finally... please notice that RStudio provides one possible (very nice) way to interact with R, but this mailing list is indeed about R and not that specific user interface. You have to install R before RStudio can even be used, so you can always test your problem in R to be sure which program is causing your difficulty. If your question only arises when RStudio is being used then it doesn't belong here.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
[2] http://adv-r.had.co.nz/Reproducibility.html
[3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)
[4] https://rstudio-education.github.io/hopr/

On January 4, 2019 10:24:26 AM PST, Rachel Thompson <rachel.thompson at student.uva.nl> wrote:
>Dear Mr/Mrs,
>
>This is my first time working in RStudio.
>I have a database of 36 participants but it has 150600 entries.
>[...]
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
Hi Rachel,
I'll take a guess and assume that you are monitoring the mobile phones of 36 people, adding an observation every time some specified change of state is sensed on each device. I'll also assume that you are only recording four types of measurement. It seems that you want to aggregate these events for each subject over the interval of observation (or over each day or something). I think you are going to create a new data frame of these summaries from the one you have of individual observations. Creating each summary doesn't look too hard, but you will have to define more precisely what you want those summaries to be. For instance, "I want the mean activity level for each subject during the overall time that their mobile phone is switched on." Once you have clearly defined your goals, it probably won't be too hard to get to them.

Jim

On Sun, Jan 6, 2019 at 5:39 AM Rachel Thompson <rachel.thompson at student.uva.nl> wrote:
> [...]
Hi Rachel,
It looks to me as though the first thing you want to do is to get your data, which you attached as images, into a data frame. If these are flat files like CSV or TAB, you should be able to read them in with some variant of the read.table function. If Excel, look at the various Excel import packages. Then you can operate on the data frame by doing things like tabulating Participant ID against the code for SMS or call (which I assume are those 3000+ numbers). You can take the differences in what look like POSIX time values between successive TRUE and FALSE screen values to get the duration of screen activity, and it looks like participant activity is recorded at regular intervals. As Jeff suggested, this is really just boring work figuring out how to extract the events:

    call_indices <- which(Probetype == "xxxxxxCallLogProbe" & ValueSpecified == "_id" & Valuedetailed == 3271)

using suitable logical statements and then tabulating them by ParticipantID. If you know how to do that in SPSS, it won't be too hard to translate the logical statements into R syntax as above. I may have misunderstood the variable names, but I think the logic is clear.

Jim

On Sun, Jan 6, 2019 at 4:07 PM Rachel Thompson <rachel.thompson at student.uva.nl> wrote:
>
> Hi Jim,
>
> Thank you for the clarification. Since I only work in SPSS and I am from Amsterdam, I have had problems with specifying what I am trying to do in this specific program and also in clear English.
>
> I think I do indeed want to aggregate these events for each subject over the observation, but in this case over several observations.
> 1. I want to have a summary of how many times a specific subject got called (CallLogProbe)
> 2. I want to have a summary of how many times a specific subject got a text message (SMS probe)
> 3. I want to have a summary of how many times a specific subject
>    - Turned their screen on - True (ScreenProbe)
>    - Or did not turn their screen on - False (ScreenProbe)
> 4. I want to have a summary of the activity level of a specific subject
>    - Activity level - none (ActivityProbe)
>    - Activity level - low (ActivityProbe)
>    - Activity level - high (ActivityProbe)
>
> I want to do this for all the 36 subjects (participants).
>
> In the end, I have to define percentages, so I am able to say... Subject 36 has low social interactions (because they only got called and texted 500 times in total, while the average of all the participants is 10000 or something). I have to come up with the percentages myself and define cutoff points of what is considered low-medium-high, based on what the results of all the subjects are.
>
> I hope that I am as clear as possible.
>
> I feel as if I am on my way to understanding it, but since I do not clearly know, I am trying out a lot of different codes etc. and I do not know if I am doing the right thing. I did indeed make a new data frame etc., but I still feel a bit lost. Do I need to make one per subject or per probe, etc.?
>
> Thanks for your help. I hope that you can help me resolve this issue.
>
> Best,
>
> Rachel
>
> On Sat, Jan 5, 2019 at 9:03 PM Jim Lemon <drjimlemon at gmail.com> wrote:
>> [...]
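To make the "logical statements, then tabulate by ParticipantID" step a little more concrete, here is a minimal sketch in base R. It is only an illustration: the column names (ParticipantID, Probetype, Value) and every value in the toy data frame are assumptions, since the real layout of the file was never posted to the list.

```r
# Hypothetical long-format data: one row per sensed event.
# Column names and values are invented for illustration only.
events <- data.frame(
  ParticipantID = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  Probetype     = c("ActivityProbe", "ActivityProbe", "ScreenProbe",
                    "CallLogProbe", "ActivityProbe", "ActivityProbe",
                    "ActivityProbe", "ScreenProbe", "SMSProbe"),
  Value         = c("high", "none", "TRUE", "3271", "low", "low",
                    "none", "FALSE", "3305"),
  stringsAsFactors = FALSE
)

# Keep only the activity rows, selected with a logical condition.
activity <- events[events$Probetype == "ActivityProbe", ]

# Fix the order of the levels so empty categories still show up as 0.
activity$Value <- factor(activity$Value, levels = c("none", "low", "high"))

# Counts of each activity level per participant (rows = participants).
activity_counts <- table(activity$ParticipantID, activity$Value)

# Row percentages: each participant's levels sum to 100.
activity_pct <- round(100 * prop.table(activity_counts, margin = 1), 1)

# Number of events of every probe type per participant.
probe_counts <- table(events$ParticipantID, events$Probetype)

print(activity_counts)
print(activity_pct)
print(probe_counts)
```

The same pattern (subset with a logical condition, then table() and prop.table()) would apply to the screen, call and SMS rows. One summary table with a row per participant answers the "one per subject or per probe" question without splitting the data into 36 separate data frames.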
I would not want to leave the impression that I think the task at hand is merely tedious... my point is that there are numerous steps involved and each step depends on information that has not been communicated to the list, and there is a learning curve even in knowing what to include in an email question. What I do think is that knowing enough basic R syntax to express small bits of the problem in R will be a vast improvement over attempting to use only English descriptions, and Rachel has to bridge that initial gap.

For example, some images of data were apparently sent to Jim only, yet he still does not know in what format the data file is stored, so that technique was not very effective. One way for the question to become more focused is for Rachel to study up on her own how to import data and provide us with a "dput" (see the StackOverflow discussion I referenced before) of a small sample of data. Another is for Rachel to use basic R syntax to create an anonymous data set from scratch (also outlined in the SO discussion). These approaches allow us to keep the focus of our mailing list discussion on manipulating the data into summaries. Another approach is to re-focus the question on importing data by supplying a download link to the data so we can make suggestions as to what R commands will handle this data in its raw form. In any case, we cannot leapfrog over the data to the analysis as the question stands.

Given the above, I have to wonder why Rachel hasn't simply used the tool she is familiar with... SPSS... to do this? If it is because this is an academic assignment to learn R then she should be talking to her institutional support (instructor/teaching assistant/tutoring staff) anyway since there is a no-homework policy on this list (and that avenue would have the benefit of being conducted orally and most likely in her native language).

On January 6, 2019 1:12:46 AM PST, Jim Lemon <drjimlemon at gmail.com> wrote:
> [...]
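As a rough illustration of the two suggestions above (a "dput" of a small sample, or an anonymous data set built from scratch), the following sketch shows what could be pasted into a plain-text email. The data frame and its column names are invented here and are not taken from the real file.

```r
# Build a small anonymous data set from scratch.
# All names and values are made up for illustration.
sample_data <- data.frame(
  ParticipantID = c(1L, 1L, 2L, 2L, 3L, 3L),
  Probetype     = c("ActivityProbe", "ScreenProbe", "ActivityProbe",
                    "CallLogProbe", "SMSProbe", "ScreenProbe"),
  Value         = c("high", "TRUE", "none", "3271", "3305", "FALSE"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object, column types included,
# so a reader can copy it from the email and run it.
dput(sample_data)

# Round-trip check: the deparsed text rebuilds an equivalent data frame.
rebuilt <- eval(parse(text = paste(deparse(sample_data), collapse = "\n")))
stopifnot(isTRUE(all.equal(rebuilt, sample_data)))
```

With real data already imported, something like dput(head(mydata, 20)) would serve the same purpose, where mydata is a placeholder for whatever the imported data frame is called.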
Hi Jim,

Thank you for your email and information. It is a CSV file which I imported into RStudio.

I will look into what you told me and see if I am able to figure it out.

Best,

Rachel

On Sun, Jan 6, 2019 at 4:12 AM Jim Lemon <drjimlemon at gmail.com> wrote:
> [...]
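Since the file turns out to be CSV, the import step and the screen-time differencing Jim described earlier might look roughly like the sketch below. Everything about the layout is an assumption: the column names, the Unix-epoch timestamps in seconds, and the TRUE/FALSE coding of screen events are all invented, and the CSV content is supplied inline so the example runs on its own.

```r
# Inline stand-in for the real CSV file; layout and values are assumptions.
csv_text <- "ParticipantID,Probetype,Timestamp,Value
1,ScreenProbe,1546329600,TRUE
1,ScreenProbe,1546329660,FALSE
1,ScreenProbe,1546330000,TRUE
1,ScreenProbe,1546330030,FALSE
2,ScreenProbe,1546329700,TRUE
2,ScreenProbe,1546329820,FALSE"

# read.csv() is a variant of read.table(); with a real file, pass the
# file path as the first argument instead of text = csv_text.
screen <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert epoch seconds to date-times and sort within participant.
screen$Time <- as.POSIXct(screen$Timestamp, origin = "1970-01-01", tz = "UTC")
screen <- screen[order(screen$ParticipantID, screen$Time), ]

# For each participant, add up the seconds between a screen-on (TRUE)
# event and the screen-off (FALSE) event that immediately follows it.
on_seconds <- sapply(split(screen, screen$ParticipantID), function(d) {
  gaps <- diff(as.numeric(d$Time))                      # seconds between events
  on_then_off <- head(d$Value, -1) & !tail(d$Value, -1) # TRUE followed by FALSE
  sum(gaps[on_then_off])
})

print(on_seconds)
```

Running str(screen) right after the import is a quick way to confirm that each column was read with the expected type before any counting is attempted.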