thr3ads.net - similar to: "Subset dataframe based on condition"

Displaying 20 results from an estimated 12000 matches similar to: "Subset dataframe based on condition"

2008 May 13

Plotting Frequency Distribution in R

Hi, how can I plot a frequency distribution curve for the following data?

   V1      V2
1   1 160.54%
2   1 201.59%
3   1  18.45%
4   1 179.03%
5   1 274.37%
6   1   0.00%
7   1  24.52%
8   1  39.17%
9   3  43.72%
10  1  53.06%
11  1  64.97%
12  1  79.84%
13  1  98.08%
14  1 115.32%
15  1 127.96%
16  1 155.38%
17  1 157.25%
18  1 193.17%
19  1  51.53%
20 15  99.32%
21  1 106.86%
22  1 219.44%

Recreate new dataframe based on condition

2006 Jul 14

Hi, how can I achieve this in R? The dataset is as follows:

> df
  x
1 2
2 4
3 1
4 3
5 3
6 2

structure(list(x = c(2, 4, 1, 3, 3, 2)), .Names = "x", row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

I want to create a new data frame whose rows are the sums of rows (1 & 2, 3 & 4, 5 & 6).
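
One possible way to get the pairwise row sums asked about above is to build a grouping index and sum within each group. This is only a sketch, not the answer given in the thread; the names `grp` and `new_df` are made up for illustration, and it assumes the number of rows is even.

```r
# Data from the question
df <- data.frame(x = c(2, 4, 1, 3, 3, 2))

# Hypothetical grouping index: rows 1-2 -> 1, rows 3-4 -> 2, rows 5-6 -> 3
grp <- (seq_len(nrow(df)) + 1) %/% 2

# Sum x within each pair of rows and collect the result in a new data frame
new_df <- data.frame(x = as.vector(tapply(df$x, grp, sum)))
new_df
```

Using an explicit index keeps the pairing visible and extends to groups of three or more rows by changing how `grp` is computed.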

Handling large dataset & dataframe

2006 Apr 24

Hi, I have a dataset consisting of 350,000 rows and 266 columns. Of the 266 columns, 250 are dummy-variable columns. I am trying to read this dataset into an R dataframe object but am unable to do so because of memory limitations (the object created is too large to handle in R). Is there a way to handle such a large dataset in R? My PC has 1 GB of RAM and 55 GB of hard disk space, running

Nonlinear Regression model: Diagnostics

2006 Apr 18

Hi, I am trying to run the following nonlinear regression model.

> nreg <- nls(y ~ exp(-b*x), data = mydf, start = list(b = 0), alg = "default", trace = TRUE)

OUTPUT:
24619327 : 0
24593178 : 0.0001166910
24555219 : 0.0005019005
24521810 : 0.001341571
24500774 : 0.002705402
24490713 : 0.004401078
24486658 : 0.00607728
24485115 : 0.007484372

Problem with subset() function?

2009 Jan 20

Hi all, can anyone explain why the following use of the subset() function produces a different outcome than the use of the "[" extractor? The subset() function as used in

density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

appears to me from the documentation to be equivalent to

density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
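
A likely explanation, sketched here rather than taken from the thread: subset() with select returns a data frame even when a single column is selected, while "[" with a single column name drops to a plain vector by default, and density() expects a numeric vector. The data below are invented for illustration.

```r
# Made-up example data; only the column names come from the question
mydf <- data.frame(ht  = c(140, 155, 160, 170),
                   wt  = c(120, 140, 160, 145),
                   age = c(30, 41, 52, 63))

by_subset  <- subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age))
by_bracket <- mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"]

is.data.frame(by_subset)   # subset() keeps the data-frame structure
is.numeric(by_bracket)     # "[" dropped to a numeric vector

# Extracting the column makes the two forms agree
d <- density(by_subset$age)
```

Passing `drop = FALSE` to "[" would make the bracket form return a data frame as well, which is another way to see that the two calls are not equivalent as written.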

Subset

2017 Sep 25

myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
                   b = c("<0.1", 1, 0.3, 5, "Nil"),
                   stringsAsFactors = FALSE)

# you can subset the b-column in several ways
myDF[ , 2]
myDF[ , "b"]
myDF$b

# using the column, you make a logical vector
! is.na(as.numeric(myDF$b))

# This can be used to select the

Subset

2017 Sep 25

Always via logical expressions. In this case you can use the logical expression

myDF$b != "0"

to give you a vector of TRUE/FALSE.

B.

> On Sep 25, 2017, at 8:00 AM, Shane Carey <careyshan at gmail.com> wrote:
>
> This is super, really helpful. Sorry, one final question: let's say I wanted to remove 0's rather than NAs, what would it be?
>
> Thanks
>

Subset

2017 Sep 25

This is super, really helpful. Sorry, one final question: let's say I wanted to remove 0's rather than NAs, what would it be?

Thanks

On Mon, Sep 25, 2017 at 12:41 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:

> myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
> b = c("<0.1", 1, 0.3, 5, "Nil"),
>

calculating p-values of columns in a dataframe

2007 Jul 07

I have a dataframe ("mydf") that contains "differences of means". I wish to test whether these differences are significantly different from zero. Below, I calculate the t-statistic for each column. What is a "good" method to calculate/look-up the p-value for each column?

mydf=data.frame(a=c(1,-22,3,-4),b=c(5,-6,-7,9))
mymean=mean(mydf)
mysd=sd(mydf)

Create new column based on condition

2006 Apr 21

Hi, how can I accomplish this task in R?

V1
10
20
30
10
10
20

Create a new column V2 such that:
If V1 = 10 then V2 = 4
If V1 = 20 then V2 = 6
If V1 = 30 then V2 = 10

So the output looks like this:

V1 V2
10  4
20  6
30 10
10  4
10  4
20  6

Thanks in advance. Sachin
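
The mapping described above can be sketched with a named lookup vector. This is one of several reasonable approaches (nested ifelse() calls or merge() against a small table would also work), and the name `lookup` is an assumption introduced here, not something from the thread.

```r
# Data from the question
df <- data.frame(V1 = c(10, 20, 30, 10, 10, 20))

# Hypothetical lookup table: names are the V1 values, elements are the V2 values
lookup <- c("10" = 4, "20" = 6, "30" = 10)

# Index the lookup by the character form of V1; unname() drops the labels
df$V2 <- unname(lookup[as.character(df$V1)])
df
```

A V1 value that is missing from `lookup` would produce NA in V2, which makes unmapped inputs easy to spot.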

retaining "POSIXct" formatting when using apply(muff, FUN=MAX) on POSIXct dataframe?

2008 Jan 08

How do I retain "POSIXct" formatting when using apply, with FUN=max?

#example:
mydata <- rep(Sys.time(), 10)
mydf <- data.frame(matrix(data=mydata, nrow=2, ncol=length(mydata) ) )
for(i in seq(mydf)) class(mydf[[i]]) <- class(mydata)
str(mydf)
maxdates <- apply(mydf,2,max,na.rm=T)
str(maxdates)
#Why is the formatting now "chr", and not

subset select within a function

2004 Jan 21

Dear all, I'd like to subset a df within a function, and use select for choosing the variable. Something like (simplified example):

mydf <- data.frame(a= 0:9, b= 10:19)
ttt <- function(vv) {
  tmpdf <- subset(mydf, select= vv)
  mean(tmpdf$vv)
}
ttt(mydf$b)

But this is not the correct way. Any help? Thanks in advance, Juli
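
A minimal sketch of one way to make this work, assuming the column is passed by name as a character string. The `$` operator does not evaluate its right-hand side, so `tmpdf$vv` looks for a column literally called "vv"; `[[` accepts a variable instead. The extra `df` argument is an addition made here for illustration and is not part of the original function.

```r
mydf <- data.frame(a = 0:9, b = 10:19)

# vv is a column name given as a string, e.g. "b"
ttt <- function(df, vv) {
  tmpdf <- df[, vv, drop = FALSE]  # keep a one-column data frame
  mean(tmpdf[[vv]])                # [[ evaluates vv, unlike $
}

ttt(mydf, "b")
```

Passing the data frame explicitly also avoids relying on a global `mydf` inside the function.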

boxplot - labelling

2006 May 05

Hi, how can I show the values of the mean and median (not only the points but the values too) on the boxplot? I am using the boxplot function from the graphics package. Following is my data set:

> df
[1] 5 1 1 0 0 10 38 47 2 5 0 28 5 8 81 21 12 9 1 12 2 4 22 3
> mean.val <- sapply(df,mean)
> boxplot(df,las = 1,col = "light blue")
> points(seq(df), mean.val,

How to call subset in a for loop?

2011 Jan 26

Dear all, I have a data frame 'myDf', in which one of the fields 'myField' can have several possible values. To extract the observations for which it has value "A", I can do:

subset(myDf, myField="A")

However, when I try to do this within a loop, it doesn't work: it returns everything, and not a subset.

for (currField in c("A", "B",
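
A possible cause, offered as a sketch rather than the thread's answer: a single `=` inside the call is treated as a named argument, not a comparison, so no row condition is supplied and every row comes back. Comparing with `==` and indexing directly tends to behave predictably inside a loop. The example data and the `subsets` list are invented for illustration.

```r
# Made-up data using the names from the question
myDf <- data.frame(myField = c("A", "B", "A", "C"), value = 1:4)

# Collect one subset per field value; == performs the comparison
subsets <- list()
for (currField in c("A", "B", "C")) {
  subsets[[currField]] <- myDf[myDf$myField == currField, ]
}

nrow(subsets[["A"]])
```

When every level is wanted, `split(myDf, myDf$myField)` produces a similar list without an explicit loop.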

Reference to dataframe and contents

2007 Feb 04

This is probably easy for experienced users, but I could not find a solution. I have several R scripts that process several columns of a dataframe (several dataframes and columns actually, but simplified for my question). References such as myDF$myCol are all over. I would like to automate this for other dataframes and columns by defining a reference only once at the beginning of the script. One

problem in applying function in data subset (with a level) - using plyr or other alternative are also welcome

2011 Sep 03

Dear R experts. I might be missing something obvious. I have been trying to fix this problem for some weeks. Please help.

#data
ped <- c(rep(1, 4), rep(2, 3), rep(3, 3))
y <- rnorm(10, 8, 2)
# variable set 1
M1a <- sample (c(1, 2,3), 10, replace= T)
M1b <- sample (c(1, 2,3), 10, replace= T)
M1aP1 <- sample (c(1, 2,3), 10, replace= T)
M1bP2 <- sample (c(1, 2,3), 10, replace= T)

Handling large dataset & dataframe [Broadcast]

2006 Apr 24

Here's a skeletal example. Embellish as needed:

p <- 5
n <- 300
set.seed(1)
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
write.table(dat, file="c:/temp/big.txt", row=FALSE, col=FALSE)
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open="r")
for (i in 1:3) {
    x <- matrix(scan(f, nlines=100), 100,

add factor to dataframe given ranges

2005 Dec 22

Hi all, I would like to factorize the entries in a dataframe given some groupings. E.g.:

mydf = data.frame( a = rnorm(100,10), b = rnorm(100,10), c = rgamma(100, 1, scale=1))
group = hist(mydf$c, breaks="FD")
group$breaks

The idea is to create a factor "mydf$d" with levels corresponding to the ranges in group$breaks. There must be an easy way to do this that I

conditional replacement

2006 May 23

Hi, how can I do this in R?

> df
48 1 35 32 80

If df < 30 then replace it with 30, else if df > 60 replace it with 60. I have a large dataset, so I can't afford to identify indexes and then replace.

Desired output:
48 30 35 32 60

Thanks in advance. Sachin
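
The clamping described above can be sketched with the vectorised pmax() and pmin() functions from base R, which avoid locating indexes explicitly. The variable name `clamped` is introduced here for illustration.

```r
# Values from the question
df <- c(48, 1, 35, 32, 80)

# pmax() raises anything below 30 to 30; pmin() lowers anything above 60 to 60
clamped <- pmin(pmax(df, 30), 60)
clamped
```

Both functions work element by element, so the same line applies unchanged to a long vector or to a single column of a data frame.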

Multiple subsetting of a dataframe based on many conditions

2013 Apr 06

Hello Everybody, I'm working with a dataframe that has 18 columns. I would like to subset the data in one of these columns, "present", according to combinations of data in six of the other columns within the data frame and then save this into a text file. The columns I would like to use to subset "present" are:

* answer (1:4) [answer takes the values 1 to 4]
* p.num