thr3ads.net - similar to: "Two basic data manipulation questions (counting and aggregating)"

Displaying 20 results from an estimated 9000 matches similar to: "Two basic data manipulation questions (counting and aggregating)"

Counting question

2004 Jul 30

Hi all, here is something that sounds simple, but I'm having trouble getting it. I have a data frame with two columns: the first is date and the second is employee ID. I'd like to plot date on the horizontal axis, employee ID on the vertical axis, and the number of times the employee appears for the given date as a color. I've kluged something where I make a table (table(date, id))

"Reversal" of Aggregation

2007 Jan 29

Dear all, given I have a data.frame in a format like this:
    mydf <- data.frame(age=rep(1:3,5), year=c(rep(1996,3), rep(1997,3), rep(1998,3), rep(1999,3), rep(2000,3)), income=1:15)
    mydf
Now I convert it to some 2D-frequency table like this:
    mymatrix <- tapply(X=mydf$income, INDEX=list(mydf$age, mydf$year),

How to count the number of NAs in each column of a df?

2007 Feb 09

I would like to remove columns of a df which have too many NAs. I think that summary() should give me the information, I just don't know how to access it. Advice? Professor Michael Kubovy, University of Virginia, Department of Psychology

Chronological data manipulation question

2007 Oct 16

Hi all, I currently work on a survey which contains biographical data stored in a chronological way, i.e. something like:
    id   year  variable
    001  2000  0
    001  2001  0
    001  2002  1
    001  2003  0
    002  1996  0
    002  1997  0
    002  1998  1
    002  1999  0
    002  2000  0
where id is a person identifier, year the year of observation and variable the

aggregate.data.frame - prevent conversion to factors? show statistics for NA values of "by" variable?

2007 Jul 31

I have two questions regarding the "aggregate.data.frame" method of the "aggregate" function. My situation:
a. My "x" variable is a data.frame ("mydf") with two columns, both columns of type/format "numeric".
b. My "by" variable is a data.frame ("mybys") with two columns, both columns of type/format "character".
c.

id <username> - doesnt list all groups

2018 Aug 07

Hello, my environment: all servers run Ubuntu 16.04-18.04, with a Samba AD DC server and several Samba domain members (connected via winbind). On the AD DC I've created a group "restrictaccess" and added some users. Now when I type "id <username>" on a domain member, the group "restrictaccess" is listed for some users but not for others. For example, on the DC: #

Strange variable names in factor regression

2024 May 09

On converting character variables to ordered factors, the regression result has strange names. Is it possible to obtain the same variable names with and without intercept? Thanks, Naresh
    mydf <- data.frame(date = seq.Date(as.Date("2024-01-01"), as.Date("2024-03-31"), by = 1))
    mydf[, "wday"] <- weekdays(mydf$date, abbreviate = TRUE)
    mydf.work <- subset(mydf, !(wday

assign NA to rows by test on multiple columns of a data frame

2017 Nov 22

Given this data frame (a simplified, essential reproducible example)
    A<-c(8,7,10,1,5)
    A_flag<-c(10,0,1,0,2)
    B<-c(5,6,2,1,0)
    B_flag<-c(12,9,0,5,0)
    mydf<-data.frame(A, A_flag, B, B_flag) # this is my initial df
    mydf
I want to get to this final situation
    i<-which(mydf$A_flag==0)
    mydf$A[i]<-NA
    ii<-which(mydf$B_flag==0)
    mydf$B[ii]<-NA

Problem with subset() function?

2009 Jan 20

Hi all, can anyone explain why the following use of the subset() function produces a different outcome than the use of the "[" extractor? The subset() function as used in
    density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))
appears to me from documentation to be equivalent to
    density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])

Creating a "shifted" month (one that starts not on the first of each month but on another date)

2011 May 19

Hello! I have a data frame with dates. I need to create a new "month" that starts on the 20th of each month - because I'll need to aggregate my data later by that "shifted" month. I wrote the code below and it works. However, I was wondering if there is some ready-made function in some package - that makes it easier/more elegant? Thanks a lot! # Example data:

Reshaping data

2005 Dec 08

Dear all, given I have data in a data.frame which indicate the number of people in a specific year at a specific age:
    n <- 10
    mydf <- data.frame(yr=sample(1:10, size=n, replace=FALSE), age=sample(1:12, size=n, replace=FALSE), no=sample(1:10, size=n, replace=FALSE))
Now I would like to make a matrix with (in this simple example) 10 columns (for the

Surprising Behavior of 'tapply'

2005 Feb 03

Dear all, I wanted to make a two-way table of two variables with a counting variable stored in another column of a data frame. In version 1.9.1, the behavior is as expected, as shown in the simplified example code.
    > sex <- rep(c("F", "M"), 5)
    > income <- c(rep("low", 5), rep("high", 5))
    > count <- 1:10
    > mydf <-

Comparing "transform" to "with"

2007 Sep 01

Hi all, I've been successfully using the with function for analyses and the transform function for multiple transformations. Then I thought, why not use "with" for both? I ran into problems and couldn't figure them out from help files or books. So I created a simplified version of what I'm doing:
    rm( list=ls() )
    x1<-c(1,3,3)
    x2<-c(3,2,1)
    x3<-c(2,5,2)

vector problems

2001 Nov 05

I don't get it:
    > is.vector(c(mydf[1]))
    [1] TRUE
    > unique(c(mydf[1]))
    Error in unique(c(mydf[1])) : unique() applies only to vectors
    >
Is it a vector or not? This stuff is driving me nuts. I'm simply trying to convince R that my grouping vector is actually a vector so that unique will work. It's just a vector of numbers, so why shouldn't it work?

merging or joining 2 dataframes: merge, rbind.fill, etc.?

2013 Feb 26

#I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd (mydf). I want the 3rd dataframe to contain 1 row for each row in df1 & df2, and all the columns in both df1 & df2. The solution should "work" even if the 2 dataframes are identical, and even if the 2 dataframes do not have the same column names. The rbind.fill function seems to work. For

assign NA to rows by test on multiple columns of a data frame

2017 Nov 22

...well, I don't think this is exactly the expected result (see my post); note that the columns affected should be "A" and "B". Thanks for the help, Max
----- Original message -----
From: "Rui Barradas" <ruipbarradas at sapo.pt>
To: "Massimo Bressan" <massimo.bressan at arpa.veneto.it>, "r-help" <r-help at

retaining formatting when converting a vector to a matrix/data.frame?

2008 Jan 03

Please see example code below. I have a vector ("mydata") of length 10. "mydata" can have various formats (e.g. numeric, text, POSIXct, etc.). I use the matrix and data.frame functions to convert "mydata" to a data frame ("mydf") of 2 columns and 5 rows. What is a "good" way to ensure that the format is retained when I create the

NAs in indices

2007 Sep 02

Hi all, I'm fiddling with a program to read a text file containing the periods that SAS uses for missing values. I know that if I had the original SAS data set instead of a text file, R would handle this conversion for me. Data frames do not allow missing values in their indices but vectors do. Why is that? A search of the error message points out the problem and solution but not why they

Error: missing values where TRUE/FALSE needed

2011 Jun 09

I'm writing a function and keep getting the following error message.
    myfunc <- function(lst) {
      lst <- list(roots = c("car insurance", "auto insurance"), roots2 = c("insurance"), prefix = c("cheap", "budget"), prefix2 = c("low cost"), suffix = c("quote", "quotes"), suffix2 = c("rate",

ggplot2 and facet_wrap help

2013 Feb 18

Dear R experts, I am trying to arrange multiple plots, creating one graph for each size1 factor variable in my data frame, and each plot has the median price on the y-axis and the size2 on the x-axis grouped by clarity:
    library(ggplot2)
    df <- data.frame(price=matrix(sample(1:1000, 100, replace = TRUE), ncol = 1))
    df$size1 = 1:nrow(df)
    df$size1 = cut(df$size1, breaks=11)
