Ulrik Stervbo
2017-Jun-01 15:49 UTC
[R] Data import R: some explanatory variables not showing up correctly in summary
Hi Tara,

It seems that you categorise and count for each category. Could it be that the method you use puts everything that doesn't match the predefined categories in Other?

I'm only guessing because without a minimal reproducible example it's difficult to do anything else.

Best wishes
Ulrik

Rui Barradas <ruipbarradas at sapo.pt> wrote on Thu, 1 June 2017, 17:30:
> Hello,
>
> In order for us to help we need to know how you've imported your data.
> What was the file type? What instructions have you used to import it?
> Did you use base R or a package?
> Give us a minimal but complete code example that can reproduce your
> situation.
>
> Hope this helps,
>
> Rui Barradas
>
> On 01-06-2017 11:02, Tara Adcock wrote:
> > Hi,
> >
> > I have a question regarding importing data into R.
> >
> > When I import my data into R and review the summary, some of my
> > explanatory variables are reported as if, instead of being one
> > variable, they were two with the same name. See below for an example:
> >
> >   Behav person       Behav dog          Position
> >   combination : 38   combination :  4   Bank   :372
> >   combination :  7   combination :  4   Island :119
> >   fast        :123   fast        : 15   Island : 11
> >   slow        :445   slow        : 95   Land   :  3
> >   stat        :111   stat        : 14   Water  :230
> >
> > Also, all of the distances I have imported are showing up in the summary
> > along with a line entitled "other". However, I haven't used any other
> > distances?
> >
> >   Distance        Distance.dog
> >   2-10m   :184    <50m    : 35
> >   <50m    :156    2-10m   : 27
> >   10-20m  :156    20-30m  : 23
> >   20-30m  : 91    30-40m  : 16
> >   40-50m  : 57    10-20m  : 13
> >   (Other) : 82    (Other) : 18
> >
> > I have checked my data sheet over and over again and I think I
> > standardised the data, but the issue keeps arising. I'm assuming I need
> > to clean the data set, but as a nearly complete novice in R I am not
> > certain how to do this. Any help at all with this would be much
> > appreciated. Thanks so much.
> >
> > Kind Regards,
> >
> > Tara Adcock.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
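Rui's request for a minimal but complete example can often be met by pasting the output of dput() on a few rows of the imported data, which others can paste straight back into R. A minimal sketch, assuming a hypothetical data frame `obs` standing in for the imported data (the values are invented for illustration):

```r
# Hypothetical stand-in for the imported data; the values are invented.
obs <- data.frame(
  Position = factor(c("Bank", "Island", "Island ", "Water")),
  Distance = factor(c("2-10m", "<50m", "10-20m", "2-10m"))
)

# A few rows are usually enough to reproduce the problem.
small <- head(obs, 3)

# dput() prints R code that recreates the object, including any
# trailing spaces hidden inside factor levels; this is what to post.
dput(small)

# Anyone on the list can rebuild the same data frame from that code.
shared <- eval(parse(text = capture.output(dput(small))))
```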
William Dunlap
2017-Jun-01 15:57 UTC
[R] Data import R: some explanatory variables not showing up correctly in summary
Check for leading or trailing spaces in the strings in your data.
dput(dataset) would show them.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
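Bill's suggestion can be illustrated on a toy factor: printing the factor hides a trailing space, while levels() (which quotes each level) and dput() reveal it. A sketch with made-up values, assuming the duplicated label differs only by a trailing space:

```r
# Invented values: the third entry carries an invisible trailing space.
behav <- factor(c("combination", "fast", "combination ", "slow"))

print(behav)   # the two combination labels are hard to tell apart
levels(behav)  # quoted output shows "combination" and "combination "
dput(behav)    # the structure() code reveals the space as well

# Flag any level that starts or ends with white space.
bad <- grepl("^[[:space:]]|[[:space:]]$", levels(behav))
levels(behav)[bad]
```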
David L Carlson
2017-Jun-01 16:07 UTC
[R] Data import R: some explanatory variables not showing up correctly in summary
It looks like your printouts are based on the R summary() function? The function lists the number of cases in the 5 largest categories when the variable is coded as a factor. Then it gives the total number of cases in all the remaining categories, labelled (Other). This is described on the manual page for function summary().

In the first case the duplicates probably represent cases in your source data (a spreadsheet?) where you have inadvertently put a space at the end of the label, e.g. "combination" and "combination ".

The answers to both questions are easy to find with the levels() function:

    levels(yourdataframe$Position)

This will list all of the factor levels in variable Position for you. If there are extra spaces and you were using read.csv() to import the data, use the strip.white=TRUE argument to delete leading and trailing spaces. This is also documented on the manual page for function read.csv(). One of the problems with spreadsheets is that these extra spaces are not readily apparent.

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
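David's two suggestions can be tried on a tiny inline CSV, where the text= argument of read.csv() stands in for a real file and the values are invented. Two caveats, per the read.table() help page and R's release history: strip.white only strips unquoted character fields, so a value saved as "Island " inside quotes keeps its space; and since R 4.0.0 read.csv() no longer turns strings into factors by default, so stringsAsFactors = TRUE is set explicitly here to match the behaviour in this thread.

```r
# Invented CSV content; note the trailing space after the second "Island".
csv <- "Position,Distance
Bank,2-10m
Island,<50m
Island ,10-20m
Water,2-10m"

raw   <- read.csv(text = csv, stringsAsFactors = TRUE)
clean <- read.csv(text = csv, stringsAsFactors = TRUE, strip.white = TRUE)

levels(raw$Position)    # contains both "Island" and "Island "
levels(clean$Position)  # "Island " has been folded into "Island"
```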
David L Carlson
2017-Jun-01 16:10 UTC
[R] Data import R: some explanatory variables not showing up correctly in summary
It looks like your printouts are based on the R summary() function? The function lists the number of cases in the 5 largest categories when the variable is coded as a FACTOR.

David C
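The "(Other)" rows in Tara's output come from the factor method of summary(): it lists the most frequent levels and reports the number of cases in all remaining levels as "(Other)". When summary() is called on a data frame, its maxsum argument (default 7) caps how many rows each factor column gets; passing a larger maxsum, or using table(), shows every level. A sketch with invented distance bands (the band names are illustrative only):

```r
# Invented distance bands: ten observations spread over eight levels.
d <- factor(c("2-10m", "2-10m", "<50m", "10-20m", "20-30m",
              "30-40m", "40-50m", "0-2m", "50m+", "2-10m"))
obs <- data.frame(Distance = d)

summary(d, maxsum = 4)             # three top levels plus "(Other)"
summary(obs)                       # data-frame method, default maxsum = 7
summary(obs, maxsum = nlevels(d))  # every level listed, no "(Other)"
table(d)                           # always lists all levels
```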
David Winsemius
2017-Jun-01 16:17 UTC
[R] Data import R: some explanatory variables not showing up correctly in summary
> On Jun 1, 2017, at 8:57 AM, William Dunlap via R-help <r-help at r-project.org> wrote:
>
> Check for leading or trailing spaces in the strings in your data.
> dput(dataset) would show them.

This function would strip any leading or trailing spaces from a column:

    trim <- function (s) {
      s <- as.character(s)  # factors are converted to character here
      s <- sub(pattern = "^[[:blank:]]+", replacement = "", x = s)  # leading spaces/tabs
      s <- sub(pattern = "[[:blank:]]+$", replacement = "", x = s)  # trailing spaces/tabs
      s
    }

You could restrict it to non-numeric columns with:

    my_dfrm[ !sapply(my_dfrm, is.numeric) ] <- lapply(
      my_dfrm[ !sapply(my_dfrm, is.numeric) ], trim)

It would have the side-effect (desirable in my opinion, but opinions do vary on this matter) of converting any factor columns to character class.

David Winsemius
Alameda, CA, USA
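Applying David's trim() to a whole data frame might look like the sketch below. The data frame `my_dfrm` and its values are made up, and trim() is repeated from the post so the block runs on its own. Because trim() returns character, the final conversion back to factor is an optional extra step (an assumption, not part of his post) for anyone who still wants summary() to count levels. Base R 3.2.0 and later also provide trimws(), which gives the same result on these values.

```r
# David Winsemius's helper, repeated from the post.
trim <- function(s) {
  s <- as.character(s)
  s <- sub(pattern = "^[[:blank:]]+", replacement = "", x = s)
  s <- sub(pattern = "[[:blank:]]+$", replacement = "", x = s)
  s
}

# Made-up data with stray spaces in a factor column.
my_dfrm <- data.frame(
  Position = factor(c("Bank", "Island", "Island ", " Water")),
  Count    = c(3, 1, 2, 5)
)

# Trim only the non-numeric columns (these come back as character).
non_num <- !sapply(my_dfrm, is.numeric)
my_dfrm[non_num] <- lapply(my_dfrm[non_num], trim)

# Optional: turn the cleaned columns back into factors.
my_dfrm[non_num] <- lapply(my_dfrm[non_num], factor)

levels(my_dfrm$Position)  # "Bank" "Island" "Water"
```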