similar to: Subset dataframe based on condition

Displaying 20 results from an estimated 12000 matches similar to: "Subset dataframe based on condition"

2008 May 13
2
Plotting Frequency Distribution in R
Hi, how can I plot a frequency distribution curve for the following data?

   V1      V2
1   1 160.54%
2   1 201.59%
3   1  18.45%
4   1 179.03%
5   1 274.37%
6   1   0.00%
7   1  24.52%
8   1  39.17%
9   3  43.72%
10  1  53.06%
11  1  64.97%
12  1  79.84%
13  1  98.08%
14  1 115.32%
15  1 127.96%
16  1 155.38%
17  1 157.25%
18  1 193.17%
19  1  51.53%
20 15  99.32%
21  1 106.86%
22  1 219.44%
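A minimal R sketch follows. It rests on two assumptions the post does not confirm: that V1 counts how many times each V2 percentage occurs, and that V2 is stored as strings such as "160.54%". The sketch strips the "%" sign, expands each value by its count, and draws a histogram with a density curve on top.

```r
# Data from the post; treating V1 as a count for each V2 value is an assumption
V1 <- c(rep(1, 8), 3, rep(1, 10), 15, 1, 1)
V2 <- c("160.54%", "201.59%", "18.45%", "179.03%", "274.37%", "0.00%",
        "24.52%", "39.17%", "43.72%", "53.06%", "64.97%", "79.84%",
        "98.08%", "115.32%", "127.96%", "155.38%", "157.25%", "193.17%",
        "51.53%", "99.32%", "106.86%", "219.44%")

pct <- as.numeric(sub("%", "", V2, fixed = TRUE))  # "160.54%" -> 160.54
vals <- rep(pct, times = V1)                        # repeat each value V1 times

hist(vals, freq = FALSE, main = "Frequency distribution of V2", xlab = "V2 (%)")
lines(density(vals), lwd = 2)                       # smoothed frequency curve
```

If V1 is not a count, use `pct` directly instead of `vals`.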
2006 Jul 14
2
Recreate new dataframe based on condition
Hi, how can I achieve this in R? The dataset is as follows:

> df
  x
1 2
2 4
3 1
4 3
5 3
6 2

structure(list(x = c(2, 4, 1, 3, 3, 2)), .Names = "x", row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

I want to create a new data frame whose rows are the sums of rows 1 & 2, 3 & 4, and 5 & 6.
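Here is a minimal sketch that assumes the rows to combine are always consecutive pairs. It builds a group index that is shared by rows 1 and 2, rows 3 and 4, and so on, then sums within each group using tapply().

```r
# Data from the post, written without the old .Names attribute
df <- data.frame(x = c(2, 4, 1, 3, 3, 2))

# Group index 1, 1, 2, 2, 3, 3: each consecutive pair of rows shares a group
grp <- ceiling(seq_len(nrow(df)) / 2)

newdf <- data.frame(x = as.vector(tapply(df$x, grp, sum)))
newdf   # x: 6 4 5
```

If the number of rows is odd, the last row forms a group on its own.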
2006 Apr 24
6
Handling large dataset & dataframe
Hi, I have a dataset consisting of 350,000 rows and 266 columns, 250 of which are dummy-variable columns. I am trying to read this dataset into an R data frame but cannot, because of memory limitations (the resulting object is too large to handle in R). Is there a way to handle such a large dataset in R? My PC has 1 GB of RAM and 55 GB of hard disk space, running
2006 Apr 18
1
Nonlinear Regression model: Diagnostics
Hi, I am trying to run the following nonlinear regression model:

> nreg <- nls(y ~ exp(-b*x), data = mydf, start = list(b = 0), alg = "default", trace = TRUE)

OUTPUT:
24619327 : 0
24593178 : 0.0001166910
24555219 : 0.0005019005
24521810 : 0.001341571
24500774 : 0.002705402
24490713 : 0.004401078
24486658 : 0.00607728
24485115 : 0.007484372
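The snippet is cut off before the actual question, so the code below only sketches some common diagnostics for an `nls` fit. It runs on simulated data, where the true b = 0.3 and the noise level are made up. It shows the coefficient table, the residual standard error, Wald intervals, and a residuals-versus-fitted plot.

```r
set.seed(42)
# Hypothetical data following y = exp(-b * x) plus noise, with b = 0.3
x <- seq(0, 10, length.out = 50)
mydf <- data.frame(x = x, y = exp(-0.3 * x) + rnorm(50, sd = 0.02))

fit <- nls(y ~ exp(-b * x), data = mydf, start = list(b = 0.1))

summary(fit)$coefficients   # estimate, std. error, t value, p-value
summary(fit)$sigma          # residual standard error
confint.default(fit)        # Wald intervals based on vcov(fit)

# Residuals vs fitted values: look for curvature or changing spread
plot(fitted(fit), residuals(fit), xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)
```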
2009 Jan 20
5
Problem with subset() function?
Hi all, can anyone explain why the following use of the subset() function produces a different outcome from the use of the "[" extractor? From the documentation, the subset() call

density(subset(mydf, ht >= 150.0 & wt <= 150.0, select = c(age)))

appears to me to be equivalent to

density(mydf[mydf$ht >= 150.0 & mydf$wt <= 150.0, "age"])
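One common reason for such a difference is NA handling, although the truncated post does not let us confirm it. subset() drops rows where the condition is NA, while `[` keeps them as NA elements. Also, `select = c(age)` returns a one-column data frame rather than a vector, and density() needs a numeric vector. The sketch below uses hypothetical data:

```r
# Hypothetical data; the NA height makes the condition NA for row 2
mydf <- data.frame(ht = c(160, NA, 170, 140),
                   wt = c(120, 100, 160, 110),
                   age = c(30, 40, 50, 60))

# subset() treats NA in the condition as FALSE; $age extracts a vector
a1 <- subset(mydf, ht >= 150 & wt <= 150, select = age)$age
a1   # 30

# `[` with a logical index containing NA returns an NA element
a2 <- mydf[mydf$ht >= 150 & mydf$wt <= 150, "age"]
a2   # 30 NA

# which() drops the NAs, so `[` then agrees with subset()
a3 <- mydf[which(mydf$ht >= 150 & mydf$wt <= 150), "age"]
```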
2017 Sep 25
2
Subset
myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
                   b = c("<0.1", 1, 0.3, 5, "Nil"),
                   stringsAsFactors = FALSE)
# you can subset the b-column in several ways
myDF[ , 2]
myDF[ , "b"]
myDF$b
# using the column, you make a logical vector
! is.na(as.numeric(myDF$b))
# This can be used to select the
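The answer continues past the cut-off. Below is a minimal completion of the idea, assuming the goal is to keep only the rows whose `b` value parses as a number:

```r
myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
                   b = c("<0.1", 1, 0.3, 5, "Nil"),
                   stringsAsFactors = FALSE)

# as.numeric() turns "<0.1" and "Nil" into NA (the coercion warning is silenced)
keep <- !is.na(suppressWarnings(as.numeric(myDF$b)))

myDF[keep, ]   # rows 2, 3 and 4
```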
2017 Sep 25
1
Subset
Always via logical expressions. In this case you can use the logical expression myDF$b != "0" to give you a vector of TRUE/FALSE.
B.

> On Sep 25, 2017, at 8:00 AM, Shane Carey <careyshan at gmail.com> wrote:
>
> This is super, really helpful. Sorry, one final question: let's say I wanted to remove 0's rather than NAs; what would it be?
>
> Thanks
>
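Here is that advice applied to hypothetical data, where rows whose `b` is the string "0" are dropped. This comparison is on text, so values such as "0.0" or " 0" would survive it. If `b` holds numbers stored as text, converting with as.numeric() first is an alternative.

```r
# Hypothetical data with zeros stored as text
myDF <- data.frame(b = c("0", "1", "0.3", "0", "5"), stringsAsFactors = FALSE)

nonzero <- myDF[myDF$b != "0", , drop = FALSE]
nonzero$b   # "1" "0.3" "5"
```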
2017 Sep 25
0
Subset
This is super, really helpful. Sorry, one final question: let's say I wanted to remove 0's rather than NAs; what would it be?
Thanks

On Mon, Sep 25, 2017 at 12:41 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> myDF <- data.frame(a = c("<0.1", NA, 0.3, 5, "Nil"),
> b = c("<0.1", 1, 0.3, 5, "Nil"),
>
2007 Jul 07
1
calculating p-values of columns in a dataframe
I have a data frame ("mydf") that contains "differences of means". I wish to test whether these differences are significantly different from zero. Below, I calculate the t-statistic for each column. What is a good method to calculate or look up the p-value for each column?

mydf <- data.frame(a = c(1, -22, 3, -4), b = c(5, -6, -7, 9))
mymean <- colMeans(mydf)    # mean() and sd() no longer accept a data frame in current R
mysd <- sapply(mydf, sd)
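The sketch below assumes that a two-sided one-sample t-test of each column against zero is what's wanted. It computes the p-value from the t statistic with pt(), and t.test() serves as a cross-check.

```r
mydf <- data.frame(a = c(1, -22, 3, -4), b = c(5, -6, -7, 9))
n <- nrow(mydf)

# One-sample t statistic per column, testing a mean of zero
tstat <- colMeans(mydf) / (sapply(mydf, sd) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
pval <- 2 * pt(-abs(tstat), df = n - 1)
pval

# Cross-check: t.test() runs the same test column by column
check <- sapply(mydf, function(col) t.test(col, mu = 0)$p.value)
```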
2006 Apr 21
3
Create new column based on condition
Hi, how can I accomplish this task in R?

V1
10
20
30
10
10
20

Create a new column V2 such that:
If V1 = 10 then V2 = 4
If V1 = 20 then V2 = 6
If V1 = 30 then V2 = 10

So the output looks like this:

V1 V2
10  4
20  6
30 10
10  4
10  4
20  6

Thanks in advance.
Sachin
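One way to do this is with a named vector used as a lookup table, indexed by V1 converted to character. This is a sketch of a single approach, not the only one. Any V1 value that is missing from the table yields NA.

```r
df <- data.frame(V1 = c(10, 20, 30, 10, 10, 20))

# Lookup table: names are the V1 values, elements are the V2 values
lookup <- c("10" = 4, "20" = 6, "30" = 10)

df$V2 <- unname(lookup[as.character(df$V1)])
df
```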
2008 Jan 08
1
retaining "POSIXct" formatting when using apply(mydf, FUN = max) on a POSIXct data frame?
How do I retain "POSIXct" formatting when using apply, with FUN=max?

#example:
mydata <- rep(Sys.time(), 10)
mydf <- data.frame(matrix(data=mydata, nrow=2, ncol=length(mydata) ) )
for(i in seq(mydf))class(mydf[[i]]) <- class(mydata)
str(mydf)
maxdates <- apply(mydf,2,max,na.rm=T)
str(maxdates)
#Why is the formatting now "chr", and not
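apply() first converts the data frame to a matrix, and as.matrix() on a data frame with POSIXct columns produces a character matrix. That conversion would explain the "chr" result. The sketch below avoids apply(): it works column by column with lapply() and recombines the results with c(), which keeps the POSIXct class. The times are hypothetical.

```r
t0 <- as.POSIXct("2008-01-08 12:00:00", tz = "UTC")
mydf <- data.frame(a = t0 + c(0, 60), b = t0 + c(3600, 30))

# Column maxima; each element is still a POSIXct of length 1
col_max <- lapply(mydf, max, na.rm = TRUE)

# c() on POSIXct objects returns a POSIXct vector
maxdates <- do.call(c, col_max)
str(maxdates)
```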
2004 Jan 21
2
subset select within a function
Dear all,
I'd like to subset a df within a function, and use select for choosing the variable. Something like (simplified example):

mydf <- data.frame(a= 0:9, b= 10:19)
ttt <- function(vv) {
  tmpdf <- subset(mydf, select= vv)
  mean(tmpdf$vv)
}
ttt(mydf$b)

But this is not the correct way. Any help?
Thanks in advance
Juli
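`ttt(mydf$b)` passes the column's values rather than its name, and `tmpdf$vv` then looks for a column literally called "vv". The sketch below instead passes the column name as a string and indexes with `[[`. The extra `df` argument is an addition to the original signature.

```r
mydf <- data.frame(a = 0:9, b = 10:19)

ttt <- function(df, vv) {
  tmpdf <- subset(df, select = vv)  # select also accepts a character column name
  mean(tmpdf[[vv]])                 # [[ ]] uses the value of vv, unlike $
}

ttt(mydf, "b")   # 14.5
```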
2006 May 05
2
boxplot - labelling
Hi, how can I show the values of the mean and median (not only as points but as numbers too) on the boxplot? I am using the boxplot function from the graphics package. Following is my data set:

> df
 [1]  5  1  1  0  0 10 38 47  2  5  0 28  5  8 81 21 12  9  1 12  2  4 22  3
> mean.val <- sapply(df,mean)
> boxplot(df,las = 1,col = "light blue")
> points(seq(df), mean.val,
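Because `df` here is a plain vector, sapply(df, mean) returns each element unchanged instead of a single mean. The sketch below computes one mean and one median and writes them on the plot with text(). The label position and wording are arbitrary choices.

```r
x <- c(5, 1, 1, 0, 0, 10, 38, 47, 2, 5, 0, 28, 5, 8, 81, 21, 12, 9, 1, 12, 2, 4, 22, 3)

mean.val <- mean(x)       # a single mean for the whole vector
median.val <- median(x)

boxplot(x, las = 1, col = "lightblue")
points(1, mean.val, pch = 18, col = "red")   # mark the mean on the box

# Print the values as text; pos = 4 places each label to the right of x = 1
text(1, c(mean.val, median.val),
     labels = c(paste("mean =", round(mean.val, 2)),
                paste("median =", median.val)),
     pos = 4)
```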
2011 Jan 26
1
How to call subset in a for loop?
Dear all,
I have a data frame 'myDf' in which one of the fields, 'myField', can take several possible values. To extract the observations for which it has value "A", I can do:

subset(myDf, myField="A")

However, when I try to do this within a loop, it doesn't work: it returns everything rather than a subset.

for (currField in c("A", "B",
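Judging from the snippet, the likely culprit is `myField="A"`. With `=`, R treats it as a named argument that subset() silently ignores, so every row comes back. A comparison needs `==`, and with that change the call also works inside a loop. The data below are hypothetical.

```r
myDf <- data.frame(myField = c("A", "B", "A", "C"), value = 1:4)

subsets <- list()
for (currField in c("A", "B", "C")) {
  # == compares; currField is found in the calling environment
  subsets[[currField]] <- subset(myDf, myField == currField)
}

nrow(subsets$A)   # 2
```

split(myDf, myDf$myField) builds the same kind of list without a loop.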
2007 Feb 04
3
Reference to dataframe and contents
This is probably easy for experienced users, but I could not find a solution. I have several R scripts that process several columns of a data frame (several data frames and columns, actually, but simplified for my question). References such as myDF$myCol are all over the scripts. I would like to automate this for other data frames and columns by defining a reference only once at the beginning of the script. One
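One way to define the reference only once is to assign the data frame to a variable and keep the column name in a string. This sketch assumes the script can work on such a copy; the names `dat` and `col` are illustrative. It uses `[[ ]]`, which accepts a variable, instead of `$`, which does not.

```r
myDF <- data.frame(myCol = c(1, 2, 3), other = c(10, 20, 30))

# Set these two once at the top of the script
dat <- myDF        # the data frame to process
col <- "myCol"     # the column to process, as a string

total <- sum(dat[[col]])          # read the column
dat[[col]] <- dat[[col]] * 100    # modify the column
```

Because `dat` is a copy, assign it back (for example `myDF <- dat`) if the original should change.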
2011 Sep 03
2
Problem applying a function to data subsets (by level); solutions using plyr or other alternatives are welcome
Dear R experts,
I might be missing something obvious. I have been trying to fix this problem for some weeks. Please help.

#data
ped <- c(rep(1, 4), rep(2, 3), rep(3, 3))
y <- rnorm(10, 8, 2)
# variable set 1
M1a <- sample(c(1, 2, 3), 10, replace = TRUE)
M1b <- sample(c(1, 2, 3), 10, replace = TRUE)
M1aP1 <- sample(c(1, 2, 3), 10, replace = TRUE)
M1bP2 <- sample(c(1, 2, 3), 10, replace = TRUE)
2006 Apr 24
1
Handling large dataset & dataframe [Broadcast]
Here's a skeletal example. Embellish as needed:

p <- 5
n <- 300
set.seed(1)
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
write.table(dat, file="c:/temp/big.txt", row=FALSE, col=FALSE)
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open="r")
for (i in 1:3) {
  x <- matrix(scan(f, nlines=100), 100,
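The skeleton is truncated, so here is one possible completed version. It rests on assumptions the snippet does not confirm: column 1 is the response, an intercept column is added, and a temporary file replaces c:/temp/big.txt. The file is read in 100-line chunks, X'X and X'y are accumulated chunk by chunk, and the normal equations are solved once at the end, so the full data never has to sit in memory at once.

```r
set.seed(1)
p <- 5
n <- 300
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
path <- tempfile(fileext = ".txt")
write.table(dat, file = path, row.names = FALSE, col.names = FALSE)

xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
con <- file(path, open = "r")
for (i in 1:3) {
  # Each scan() continues from where the previous one stopped on the open connection
  chunk <- matrix(scan(con, nlines = 100, quiet = TRUE), ncol = p + 1, byrow = TRUE)
  y <- chunk[, 1]              # assumed response column
  x <- cbind(1, chunk[, -1])   # intercept plus p predictors
  xtx <- xtx + crossprod(x)    # accumulate X'X
  xty <- xty + crossprod(x, y) # accumulate X'y
}
close(con)

beta <- solve(xtx, xty)        # least-squares coefficients
```

On this small example, the coefficients should agree with `lm(dat[, 1] ~ dat[, -1])`.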
2005 Dec 22
1
add factor to dataframe given ranges
Hi all,
I would like to factorize the entries in a data frame given some groupings. E.g.:

mydf = data.frame( a = rnorm(100,10), b = rnorm(100,10), c = rgamma(100, 1, scale=1))
group = hist(mydf$c, breaks="FD")
group$breaks

The idea is to create a factor "mydf$d" with levels corresponding to the ranges in group$breaks. There must be an easy way to do this that I
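cut() does this directly from a vector of break points. The sketch below passes it `group$breaks` with `right = TRUE` and `include.lowest = TRUE`, which are also hist()'s defaults, so the factor levels follow the histogram's bins. `plot = FALSE` only skips drawing the histogram.

```r
set.seed(1)
mydf <- data.frame(a = rnorm(100, 10), b = rnorm(100, 10),
                   c = rgamma(100, 1, scale = 1))

group <- hist(mydf$c, breaks = "FD", plot = FALSE)

# Right-closed intervals with the lowest value included, as in hist()
mydf$d <- cut(mydf$c, breaks = group$breaks, right = TRUE, include.lowest = TRUE)

table(mydf$d)
```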
2006 May 23
5
conditional replacement
Hi,
How can I do this in R?

> df
48 1 35 32 80

If df < 30, replace it with 30; else if df > 60, replace it with 60. I have a large dataset, so I can't afford to identify indexes and then replace.

Desired output:
48 30 35 32 60

Thanks in advance.
Sachin
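pmax() and pmin() are vectorised, so the whole vector is clamped in one step without any index lookup. The same expression works on a data frame column, for example `df$col <- pmin(pmax(df$col, 30), 60)`.

```r
df <- c(48, 1, 35, 32, 80)

# pmax() raises values below 30 to 30; pmin() caps values above 60 at 60
clamped <- pmin(pmax(df, 30), 60)
clamped   # 48 30 35 32 60
```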
2013 Apr 06
1
Multiple subsetting of a dataframe based on many conditions
Hello everybody,
I'm working with a data frame that has 18 columns. I would like to subset the data in one of these columns, "present", according to combinations of data in six of the other columns within the data frame, and then save this into a text file. The columns I would like to use to subset "present" are:
* answer (1:4) [answer takes the values 1 to 4]
* p.num
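The snippet breaks off mid-list, so this sketch uses only the two grouping columns that are named (`answer` and `p.num`), filled with made-up values. More columns can be added to the list passed to split(). The `present` values for each observed combination are written as one line to a temporary text file.

```r
set.seed(1)
# Hypothetical data: only answer and p.num are named in the post
mydf <- data.frame(answer = sample(1:4, 40, replace = TRUE),
                   p.num = sample(1:2, 40, replace = TRUE),
                   present = runif(40))

# One element per observed combination, e.g. "3.1" for answer 3 and p.num 1
groups <- split(mydf$present, mydf[c("answer", "p.num")], drop = TRUE)

outfile <- tempfile(fileext = ".txt")
for (nm in names(groups)) {
  cat(nm, groups[[nm]], "\n", file = outfile, append = TRUE)
}
```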