Displaying 20 results from an estimated 70000 matches similar to: "how to compare two datasets in R>?"
2011 Jan 23
2
Creating subsets of a matrix
Hello,
Say I have 2 columns, bmi and gender, the first being all the values and the
second being male or female. How would I subset this into males only and
females only? I have searched these fora and read endlessly about select[]
and split() functions but to no avail. Also the table is not ordered.
bmi gender -> bmi gender + bmi gender
1 24.78 male
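A minimal sketch of one way to split such a table, assuming a data frame with columns named bmi and gender as described above. Only the first row (24.78, male) comes from the post; the other values and the object names are made up for illustration.

```r
# Hypothetical data: only the first row is taken from the post.
dat <- data.frame(
  bmi    = c(24.78, 21.30, 27.95, 23.10),
  gender = c("male", "female", "male", "female")
)

# Logical indexing keeps the rows where the condition is TRUE,
# so the table does not need to be ordered first.
males   <- dat[dat$gender == "male", ]
females <- dat[dat$gender == "female", ]

# split() returns a named list with one data frame per gender.
by_gender <- split(dat, dat$gender)
```

by_gender$male and by_gender$female then hold the same rows as males and females.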
2010 Aug 13
2
Lattice xyplots plots with multiple lines per cell
Hello,
I need to plot the means of some outcome for two groups (control vs
intervention) over time (discrete) on the same plot, for various subsets
such as gender and grade level. What I have been doing is creating all
possible subsets first, using the aggregate function to create the means
over time, then plotting the means over time (as a simple line plot with
both control & intervention
2005 Apr 19
1
How to make combination data
Dear R-user,
I have data like this:
age <- c("young","mid","old")
married <- c("no","yes")
income <- c("low","high","medium")
gender <- c("female","male")
I want to make combinations of these data, like this:
age.income.dat <- expand.grid(age,
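The call above is cut off, so the following is only a sketch of how expand.grid() builds every pairing of two of the vectors. It assumes the goal is all age-by-income combinations; the object name age_income is hypothetical.

```r
age    <- c("young", "mid", "old")
income <- c("low", "high", "medium")

# expand.grid() returns one row per combination. Naming the arguments
# gives the columns readable names; the first argument varies fastest.
age_income <- expand.grid(age = age, income = income,
                          stringsAsFactors = FALSE)

nrow(age_income)  # 3 ages x 3 incomes
```

Adding further named arguments (for example married or gender) extends the grid in the same way.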
2017 Jul 26
3
How long to wait for process?
UseRs,
I have a dataframe with 2547 rows and several hundred columns in R
3.1.3. I am trying to run a small logistic regression with a subset of
the data.
know_fin ~
comp_grp2+age+gender+education+employment+income+ideol+home_lot+home+county
> str(knowf3)
'data.frame': 2033 obs. of 18 variables:
$ userid : Factor w/ 2542 levels
2012 Sep 11
1
Plotting every probability curve
I have a logistic regression model and am trying to generate
probability curves for all possible combinations of
the variables. My logit model has 5+ variables, and I want to draw curves
for every scenario.
See code below. I want one curve for home_owner = 0 and one for
home_owner = 1. The same goes for the categories of all other variables, so that
I have curves for all possible combinations.
I've
2011 Aug 24
2
data manipulation and summaries with few million rows
I have a data set with about 6 million rows and 50 columns. It is a
mixture of dates, factors, and numerics.
What I am trying to accomplish can be seen with the following
simplified data, which is given as dput output below.
> head(myData)
mydate gender mygroup id
1 2012-03-25 F A 1
2 2005-05-23 F B 2
3 2005-09-08 F B 2
4 2005-12-07 F B 2
2017 Jul 27
2
How long to wait for process?
Michael,
Thank you for the suggestion. I will take your advice and look more
critically at the covariates.
John
On 7/27/2017 8:08 AM, Michael Friendly wrote:
> Rather than go to a penalized GLM, you might be better off
> investigating the sources of quasi-perfect separation and simplifying
> the model to avoid or reduce it. In your data set you have several
> factors with large
2005 Dec 20
1
Help to find only one class and different class
Dear R users,
I have a problem for which I cannot find a solution.
Perhaps someone could help me?
I have a result from my classification, like this:
> credit.toy
[[1]]
age married ownhouse income gender class
1 20-30 no no low male good
2 40-50 no yes medium female good
[[2]]
age married ownhouse income gender class
1 20-30 yes yes high male
2010 Oct 19
2
ANOVA stuffs_How to save each result from FOR command?
Dear R experts,
I'm new to R and a beginner in statistics.
This should be a simple question, but I am finding it difficult to solve by
myself.
I'd like to see the main effect of group (gender; the sample sizes
differ, M:F = 23:18), the main effect of one condition (cond), and their
interaction in each subset from 90 datasets.
So I perform ANOVA 90 times using a command like the one below:
for(i in 1:90)
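The loop above is cut off, so here is a hedged sketch of one common pattern for keeping every result: preallocate a list and store each summary in its own slot. The simulated datasets, the column names y, gender and cond, and the model formula are all assumptions made for illustration.

```r
set.seed(1)
# Hypothetical stand-in for the 90 datasets: a list of data frames with
# a response y, gender (23 M, 18 F) and a two-level condition cond.
n_sets <- 90
datasets <- lapply(seq_len(n_sets), function(i) {
  data.frame(
    y      = rnorm(41),
    gender = factor(rep(c("M", "F"), times = c(23, 18))),
    cond   = factor(rep(c("a", "b"), length.out = 41))
  )
})

# Preallocate a list, then store each ANOVA table in its own slot.
results <- vector("list", n_sets)
for (i in seq_len(n_sets)) {
  fit <- aov(y ~ gender * cond, data = datasets[[i]])
  results[[i]] <- summary(fit)
}

# p-values for the first dataset: gender, cond, interaction, residuals (NA).
results[[1]][[1]][["Pr(>F)"]]
```

Note that aov() uses sequential sums of squares, so with unequal group sizes the order of the terms in the formula affects the main-effect tests.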
2010 Sep 04
3
Levels in returned data.frame after subset
Dear List,
When I subset a data.frame, the levels are not re-adjusted (see
example). Why is this? Am I missing out on some basic stuff here?
Thanks
Ulrik
> m <- data.frame(gender = c("M", "M","F"), ht = c(172, 186.5, 165), wt = c(91,99, 74))
> dim(m)
[1] 3 3
> levels(m$gender)
[1] "F" "M"
> s <- subset(m, m$gender ==
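A factor stores its set of levels separately from its values, so dropping rows leaves the levels untouched. The sketch below shows this and one way to drop the unused levels with droplevels(). The value compared against in the subset() call is cut off in the post, so filtering on "M" is an assumption.

```r
# stringsAsFactors = TRUE reproduces the factor column shown in the post
# (the default changed to FALSE in R 4.0.0).
m <- data.frame(gender = c("M", "M", "F"),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74),
                stringsAsFactors = TRUE)

# Assumed filter: the comparison value is truncated in the post.
s <- subset(m, gender == "M")
levels(s$gender)    # the unused level "F" is still present

# droplevels() removes levels that no longer occur in the data.
s2 <- droplevels(s)
levels(s2$gender)
```

Calling factor() again on the single column, as in s$gender <- factor(s$gender), has the same effect for that column.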
2017 Jul 27
0
How long to wait for process?
Rather than go to a penalized GLM, you might be better off investigating
the sources of quasi-perfect separation and simplifying the model to
avoid or reduce it. In your data set you have several factors with
large number of levels, making the data sparse for all their combinations.
Like multicollinearity, near-perfect separation is a data problem, and is
often better solved by careful
2017 Jul 27
0
How long to wait for process?
Hi,
Late to the thread here, but I noted that your dependent variable 'know_fin' has 3 levels in the str() output below.
Since you did not provide a full c&p of your glm() call, we can only presume that you did specify 'family = binomial' in the call.
Is the dataset 'knowf3' the result of a subsetting operation, such that there are only two of the three levels of
2017 Jul 27
1
How long to wait for process?
Marc,
Sorry for the lack of info on my part. Yes, I did use 'family =
binomial' and I did drop the 3rd level before running the model. I think
the str(<subset>) that I wrote into my original email might not have
been my final step before using glm. Thank you for reminding me of the
potential problem.
I think Michael Friendly's idea is probably the solution I need to
consider.
2013 Mar 28
1
unique not working
I am using Mac OS X 10.7.5, running R version 2.15.2 (2012-10-26) -- "Trick or Treat"
When I do:
uncountry <- unique(wvsAB[,7])
wvsAB$numcountry <- match(wvsAB$country, uncountry)
"uncountry" isn't attaching.
> library(base)
> uncountry <- unique(wvsAB[,7])
> wvsAB$numcountry <- match(wvsAB$country, uncountry)
> ls(wvsAB)
[1] "age"
2017 Oct 19
1
looping using 'diverse' package measures
Hi everyone,
I'm new to R (although I've been a Stata user for some time and am somewhat
proficient in it) and I'm trying to use the 'diverse' R package to compute
a few diversity measures on a sample of firms for a period of about 10
years. I was wondering if you can give me some hints on how best to proceed
in using the 'diverse' package.
My sample has the following setup.
2013 Mar 28
4
bayesian HLM random effects
Hello, all.
I've been working on this for some time and was almost at the last chunk of code I would need when I received an error. Rather than go to bed and think about it in the morning, I messed with my data, and now I am not getting anything. I was up until 4am trying to fix this.
Zip files of my data are attached (the data which ends in 'a' matches with wvsA and the
2008 Feb 26
3
OLS standard errors
Hi,
the standard errors of the coefficients in two regressions that I computed
by hand and using lm() differ by about 1%. Can somebody help me to identify
the source of this difference? The coefficient estimates are the same, but
the standard errors differ.
####Simulate data
happiness=0
income=0
gender=(rep(c(0,1,1,0),25))
for(i in 1:100){
happiness[i]=1000+i+rnorm(1,0,40)
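The rest of the script is cut off, so the cause of the discrepancy in this post cannot be confirmed. One common source of a small gap is the divisor used for the residual variance: lm() divides the residual sum of squares by n - p (the residual degrees of freedom), not by n or n - 1. The sketch below illustrates that with simulated data; the variables and coefficients are invented and are not the poster's.

```r
set.seed(42)
# Hypothetical data; the coefficients are invented for illustration.
n         <- 100
gender    <- rep(c(0, 1, 1, 0), 25)
income    <- rnorm(n, mean = 50, sd = 10)
happiness <- 5 + 0.1 * income + 2 * gender + rnorm(n)

fit <- lm(happiness ~ income + gender)

# Hand computation: beta = (X'X)^{-1} X'y, Var(beta) = sigma^2 (X'X)^{-1}
X    <- cbind(1, income, gender)
XtXi <- solve(crossprod(X))
beta <- XtXi %*% crossprod(X, happiness)
res  <- happiness - X %*% beta
p    <- ncol(X)

# Same divisor as lm(): residual degrees of freedom n - p.
se_lm    <- sqrt(diag(XtXi) * sum(res^2) / (n - p))
# Dividing by n - 1 instead gives slightly smaller standard errors.
se_naive <- sqrt(diag(XtXi) * sum(res^2) / (n - 1))

all.equal(unname(se_lm), unname(coef(summary(fit))[, "Std. Error"]))
se_naive / se_lm   # every ratio equals sqrt((n - p) / (n - 1))
```

With n = 100 and p = 3 that ratio is sqrt(97/99), so the hand-computed values come out about 1% smaller while the coefficient estimates stay identical.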
2006 Nov 05
3
struggling to plot subgroups
Hi Folks,
I have data that looks like this:
freq gender xBar
1000 m 2.32
1000 f 3.22
2000 m 4.32
2000 f 4.53
3000 m 3.21
3000 f 3.44
4000 m 4.11
4000 f 3.99
I want to plot two lines (with symbols) for the two groups "m" and
"f". I have tried the following:
plot(xBar[gender=="m"]~freq[gender=="m"]) followed by
2007 Jun 26
1
A really simple data manipulation example
In response to those who asked for a better explanation of what the
Vilno software does, here's a simple example that gives some idea of
what it does.
LABRESULTS is a dataset with multiple rows per patient, with lab
sodium measurements. It has columns: PATIENT_ID, VISIT_NUM, and
SODIUM.
DEMO is a dataset with one row per patient, with demographic data.
It has columns: PATIENT_ID, GENDER.
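The post is cut off before it says what is done with the two datasets. Assuming the next step is to attach each patient's gender to the lab rows, a base R sketch of that join could look like this. The dataset and column names come from the post; the values, the object names lab_demo and sodium_by_gender, and the summary step are hypothetical.

```r
# Hypothetical values; only the dataset and column names are from the post.
LABRESULTS <- data.frame(
  PATIENT_ID = c(1, 1, 2, 2, 3),
  VISIT_NUM  = c(1, 2, 1, 2, 1),
  SODIUM     = c(140, 138, 142, 145, 139)
)
DEMO <- data.frame(
  PATIENT_ID = c(1, 2, 3),
  GENDER     = c("F", "M", "F")
)

# Many-to-one join: each lab row picks up the patient's gender.
lab_demo <- merge(LABRESULTS, DEMO, by = "PATIENT_ID")

# Mean sodium per gender, as one example of a downstream summary.
sodium_by_gender <- aggregate(SODIUM ~ GENDER, data = lab_demo, FUN = mean)
```

merge() keeps only patients present in both datasets by default; all.x = TRUE would keep lab rows without a matching DEMO record.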
2006 Sep 26
2
colClasses: suppressed 'NA'
Hi,
The colClasses argument seems to be suppressing 'NA' values. How do I fix this?
R script and first 5 lines of output are below.
File "test2.dat" has blanks that are read as "NA" when I do not use
'colClasses', but as blanks when I use 'colClasses'.
temp.df <- read.fwf("test2.dat", width=c(10,1,1,1,1,2,2,3,3,1),