thr3ads.net - similar to: "Splitting and saving separate dataframes"

Displaying 20 results from an estimated 10000 matches similar to: "Splitting and saving separate dataframes"

2005 Aug 24

Remove NAs from Barplot

Dear List: I'm creating a series of barplots using Sweave that must assume a standard format. This is student achievement data and the x-axis must include all grades 3 to 8. In some cases, the data for a grade (or more than one grade) are missing in the vector math.bar, but are never missing for the vector apmxpmeet. The following sample code illustrates the issue. Using the code below to

2004 Aug 06

Comparing rows in a dataframe

Hello, I have a longitudinal dataframe organized in the long format and would like to make comparisons between successive rows if certain conditions apply. Specifically, I have four variables of interest: grade, score, year, and schid, associated with each school with 3 measurements per school per grade, therefore the rows are temporally ordered and each school occupies multiple rows. For example,

2004 Nov 17

"Impossible to run" error message when using Sweave

Dear List: I have a large dataset of multiple schools. My goal is to produce a separate tex file for each school that plots some of the student achievement scores. Essentially, the aim is to develop a custom report for each school. To accomplish this, I have code for a loop that gets sourced into R and then Sweaves the multiple files to create the individual school reports. Here is the code for

2004 Aug 03

attach data from tapply to dataframe

I am working with a longitudinal data set in the long format. This data set has three observations per grade level per year. Here are the first 10 rows of the data frame:
> tenn.dat[1:10,]
   year  schid type grade gain  se new cohort
6  2001 100005    5     4 33.1 3.5   4      3
7  2002 100005    5     4 33.9 3.9   4      2
8  2003 100005    5     4 32.3 4.2   4      1
10 2001 100005

2004 Nov 28

paste command

In a previous post, I mentioned a loop being used to generate graphs. I have some sample code partially put together but have found one offending line of code that I cannot figure out what to do with. I have one data frame called grade4. If I do something like hist(grade4$math) I get the appropriate chart. Within the loop, however, I am doing this for multiple files and grades, so I use

2004 Aug 01

Creating dummy codes

Is there an efficient way to create a series of dummy codes from a single variable? For example, I have a variable, “grade” = {2, …, 12}. I want to create k-1 dummy codes for grade such that grade 2 is the base (i.e., grade 2 = 0). I am hoping that the new variables can be labeled as grade.3, grade.4 etc. I'll then use grade <- paste("grade.", 3:12, sep="") in
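One possible way to build such dummy codes, sketched here with made-up data and not taken from the thread itself, is to treat grade as a factor and let model.matrix() generate the indicator columns. The grade values and the object names g and dummies are assumptions for illustration only.

```r
# Hypothetical grade values in the range 2..12 (not from the original post)
grade <- c(2, 3, 5, 12, 7, 2, 4, 6, 8, 9, 10, 11)

# Fix the level order so that grade 2 is the reference (base) category
g <- factor(grade, levels = 2:12)

# Default treatment contrasts give an intercept plus k-1 indicator columns;
# drop the intercept column to keep only the dummies
dummies <- model.matrix(~ g)[, -1, drop = FALSE]

# Label the columns grade.3, grade.4, ..., grade.12
colnames(dummies) <- paste("grade.", 3:12, sep = "")

head(dummies)
```

Rows with grade 2 are all zeros, so grade 2 acts as the base category; every other row has a single 1 in the column for its grade.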

2004 May 21

Help with Plotting Function

Dear List: I cannot seem to find a way to plot my data correctly. I have a small data frame with 6 total variables (x_1 ... x_6). I am trying to plot x_1 against x_2 and x_3. I have tried plot(x_2, x_1) #obviously works fine plot(x_3, x_1, add=TRUE) # Does not work. I keep getting error messages. I would also like to add ablines to this plot. I have experimented with a number of other

2010 Jun 28

Lattice and Beamer

Two things I think are some of the best developments in statistics and production are the lattice package and the beamer class for presentation in LaTeX. One thing I have not become very good at is properly sizing my visuals to look good in a presentation. For instance, I have the following code that creates a nice plot (sorry, cannot provide reproducible data).

2004 Nov 28

Modifications to an abline

Dear List: I am working to generate graphs for individual students that will be created through a series of loops in Sweave. Before doing so, I am still trying to design the graph. The code for creating the barplot is below with some sample datapoints just made up for now. Ultimately, this chart will take data from an lme object using longitudinal student data. So, the dots represent the

2004 Aug 06

reshape (was: Comparing rows in a dataframe)

Hi all: I solved the previous stated problem in something of a brute force way (but it works). I seem to now be running into one little hiccup using reshape. Here is a quick snip of the data in long format:
   grade   stability year  schid
6  Grade 4         3 2001 100005
7  Grade 4         3 2002 100005
8  Grade 4         2 2003 100005
10 Grade 5         2 2001 100005
11 Grade 5

2005 May 09

Sweep statistics

Dear List: I am wondering if there is a more efficient way to compute the following. For the example I am using the star data frame in the mlmRev package. This has 80 schools and includes grades K, 1, 2, and 3. First I compute the grade level mean in each school using tapply as: tapply(star$math, list(star$sch,star$gr), mean, na.rm=T) This results in a table of means by school for each grade.
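As a rough illustration of the tapply() step described above, the sketch below uses a small invented data frame instead of the star data from mlmRev, so that it runs without that package. The name star_like and all of its values are assumptions, not data from the post.

```r
# Invented stand-in for the star data: 3 schools, grades K, 1, 2, 3
set.seed(1)
star_like <- data.frame(
  sch  = rep(1:3, each = 8),
  gr   = rep(c("K", "1", "2", "3"), times = 6),
  math = rnorm(24, mean = 500, sd = 40)
)
star_like$math[5] <- NA  # one missing score, handled by na.rm = TRUE

# Mean math score for every school-by-grade cell
means <- tapply(star_like$math,
                list(star_like$sch, star_like$gr),
                mean, na.rm = TRUE)
means
```

The result is a matrix with one row per school and one column per grade, which is the table of means the post refers to.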

2005 Dec 01

Simulate Correlated data from complex sample

Dear List: I have created some code to simulate data from a complex sample where 5000 students are nested in 50 schools. My code returns a dataframe with a variable representing student achievement at a single time point. My actual code for creating this is below. What I would like to do is generate a second column of data that is correlated with the first at .8 and has the same means within

2007 Sep 19

By() with method = spearman

I have a data set where I want the correlations between 2 variables conditional on a student's grade level. This code works just fine. by(tmp[,c('mtsc07', 'DCBASmathscoreSPRING')], tmp$Grade, cor, use='complete', method='pearson') However, this generates an error by(tmp[,c('mtsc07', 'DCBASmathscoreSPRING')], tmp$Grade, cor, use='complete',

2004 Feb 18

Area between CDFs

Dear List: I am trying to find the area between two ECDFs. I am examining the gap in performance between two groups, males and females on a student achievement test in math, which is a continuous metric. I start by creating a subset of the dataframe male<-subset(datafile, female=="Male") female<-subset(datafile, female=="Female") I then plot the two CDFs via
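One way the area between two ECDFs could be computed is sketched below. This is an assumption about a workable approach, not the answer given in the thread. It relies on the fact that both ECDFs are step functions that stay constant between consecutive pooled data points. The score vectors and all object names are hypothetical.

```r
# Hypothetical scores for two groups (not the poster's data)
male_scores   <- c(1, 2)
female_scores <- c(2, 3)

# Empirical CDFs as step functions
F_male   <- ecdf(male_scores)
F_female <- ecdf(female_scores)

# Both ECDFs are constant on each interval between consecutive pooled
# data points, so the area is a sum of rectangle areas
x    <- sort(unique(c(male_scores, female_scores)))
left <- x[-length(x)]  # left endpoint of each interval
area <- sum(abs(F_male(left) - F_female(left)) * diff(x))
area
```

Because one ECDF lies entirely above the other in this toy example, the area equals the difference between the two group means.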

2005 Aug 29

ylim for graphic

Dear list: I have some data for which I am generating a series of barplots for percentages. One issue that I am dealing with is that I am trying to get the legend to print in a fixed location for each chart generated by the data. Because these charts are being created in a loop, with different data, my code searches the data to identify the maximum value in the data and then print the data values

2004 Nov 29

Labeling charts within a loop

Hi All: This may turn out to be very simple, but I can't seem to add the name of the school to a chart. The loop I created below subsets a dataframe and creates a chart for each school based on certain variables. As it stands now, the title includes the school's ID number. Instead, I want to replace this with the school's actual name, which is stored in a variable called

2009 Oct 21

formula and model.frame

Suppose I have the following function myFun <- function(formula, data){ f <- formula(formula) dat <- model.frame(f, data) dat } Applying it with this sample data yields a new dataframe: qqq <- data.frame(grade = c(3, NA, 3,4,5,5,4,3), score = rnorm(8), idVar = c(1:8)) dat <- myFun(score ~ grade, qqq) However, what I would like is for the resulting dataframe (dat) to include

2018 Mar 13

Possible Improvement to sapply

FYI, in R devel (to become 3.5.0), there's isFALSE() which will cut some corners compared to identical():
> microbenchmark::microbenchmark(identical(FALSE, FALSE), isFALSE(FALSE))
Unit: nanoseconds
                    expr min   lq    mean median     uq   max neval
 identical(FALSE, FALSE) 984 1138 1694.13 1218.0 1337.5 13584   100
          isFALSE(FALSE) 713  761 1133.53  809.5  871.5

2018 Mar 13

Possible Improvement to sapply

Quite possibly, and I'll look into that. Aside from the work I was doing, however, I wonder if there is a way such that sapply could avoid the overhead of having to call the identical function to determine the conditional path. From: William Dunlap [mailto:wdunlap at tibco.com] Sent: Tuesday, March 13, 2018 12:14 PM To: Doran, Harold <HDoran at air.org> Cc: Martin Morgan <martin.morgan

2018 Mar 13

Possible Improvement to sapply

You're right, it sure does. My suggestion causes it to fail when simplify = "array" From: William Dunlap [mailto:wdunlap at tibco.com] Sent: Tuesday, March 13, 2018 12:11 PM To: Doran, Harold <HDoran at air.org> Cc: r-help at r-project.org Subject: Re: [R] Possible Improvement to sapply Wouldn't that change how simplify='array' is handled? > str(sapply(1:3,
