thr3ads.net - similar to: "How to delete a duplicate observation"

Displaying 20 results from an estimated 8000 matches similar to: "How to delete a duplicate observation"

Creating Observation ID

2011 May 09

If I have a data frame something like:

Value=rnorm(30)
Group = sample(c('A','B','C'), 30, replace=TRUE)
df = data.frame(Value, Group)

It seems like it should be simple to create an 'ObsID' column which indicates the observation order of each Value within each of the 3 groups. Somehow, I can't quite see how to do it without manually sub-setting the parent data
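One possible approach to the question above, as a minimal base-R sketch: `ave()` applies a function within each group and returns the results in the original row order, so pairing it with `seq_along` gives a running counter per group. The `set.seed()` call is an assumption added only to make the example reproducible; it is not part of the original post.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible
Value <- rnorm(30)
Group <- sample(c("A", "B", "C"), 30, replace = TRUE)
df <- data.frame(Value, Group)

# Number the observations within each group, in order of appearance
df$ObsID <- ave(seq_along(df$Value), df$Group, FUN = seq_along)
head(df)
```

This avoids subsetting the data frame by hand, and the counter restarts at 1 for every level of `Group`.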

Shade area under curve?

2002 Jan 30

Hi all, I've got this graphics question which really should be easy. I want to shade an area between bounds under a curve. A suitable beginning seems to be the following:

> plot(dnorm,-4,4)
> segments(-4,0,4,0)
> segments(-2,0,-2,dnorm(-2))
> segments(2,0,2,dnorm(2))

It is the area between -2 and 2 which I want to shade (or something similar). Hints anyone? Robert
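A common way to shade such a region in base graphics is `polygon()`. The sketch below is one possible answer, not necessarily the one given on the list; the grid size of 200 points and the fill colour are arbitrary choices.

```r
plot(dnorm, -4, 4)

# Points along the curve between the two bounds
x <- seq(-2, 2, length.out = 200)

# Close the shape along the x-axis at both bounds, then fill it
polygon(c(-2, x, 2), c(0, dnorm(x), 0), col = "grey80", border = NA)
```

The first and last vertices sit on the x-axis, so the filled shape follows the curve on top and the axis underneath.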

Several factors same levels

2011 Apr 19

This is probably very simple but I'm new to R so apologies for being stupid. I have some data with No coded as 0 and yes coded as 1. e.g.

id sex alcohol smoker
1 M 0 1
2 F 1 0
3 M 0 0

I realise I can convert the numerical variable back to a factor by

falcohol<-factor(alcohol,levels=0:1)
levels<-c("No","Yes")
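A hedged sketch of one way to give several 0/1 columns the same labels in a single step: `factor()` accepts a `labels` argument, and `lapply()` can apply it to a set of columns. The data frame `d` and the vector `cols` are hypothetical names built from the three rows shown in the post.

```r
# Hypothetical data frame reconstructed from the rows in the post
d <- data.frame(id = 1:3,
                sex = c("M", "F", "M"),
                alcohol = c(0, 1, 0),
                smoker = c(1, 0, 0))

# Columns that share the 0 = No, 1 = Yes coding
cols <- c("alcohol", "smoker")
d[cols] <- lapply(d[cols], factor, levels = 0:1, labels = c("No", "Yes"))

str(d)
```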

duplicate data between two data frames according to row names

2012 Jul 18

Hi everybody. I'll first explain my problem and what I'm trying to do. Consider this example: I'm working on 5 different weather stations. I have first in one file 3 of these 5 weather stations, containing their data. Here's an example of this file:

DF1 <- data.frame(station=c("ST001","ST004","ST005"),data=c(5,2,8))

And my two other stations in

Reading hierarchical data

2010 Feb 07

I would like to read the following hierarchical data set. There is a family record followed by one or more personal records. If col. 7 is "1" it is a family record. If it is "2" it is a personal record. The family record is formatted as follows: col. 1-5 family id col. 7 "1" col. 9 dwelling type code The personal record is formatted as follows: col.

Counting observations of a combined factor

2009 Sep 19

#I have a dataset with two factors. I want to combine those factors into a single factor and count the number of data values for each new factor. The following gives a comparable dataframe:

a <- rep(c("a", "b"), c(6,6))
b <- rep(c("c", "d"), c(6,6))
df <- data.frame(f1=a, f2=b, d=rnorm(12))
df

# I use the 'interaction' function to combine
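As a sketch of how the counting step might look (the post is cut off before it shows its own attempt), `table()` applied to the result of `interaction()` tallies the rows per combined level. The `drop = TRUE` argument is a choice made here to omit combinations that never occur.

```r
a <- rep(c("a", "b"), c(6, 6))
b <- rep(c("c", "d"), c(6, 6))
df <- data.frame(f1 = a, f2 = b, d = rnorm(12))

# Combine the two factors and count observations per combined level;
# drop = TRUE leaves out combinations with no rows (here b.c and a.d)
counts <- table(interaction(df$f1, df$f2, drop = TRUE))
counts
```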

Question on Merge/Lookup

2010 Jan 22

I need to merge three datasets and don't know how. If I were using SQL, I would use df3, look up the characteristics of each date in df1 and the value for each observation in df2. df1 - unique list of Dates and characteristics of those dates Date, YYYYMM, YYYYWW, DOW df2 - the raw data Date, Place, Value df3 - all possible combinations of Date + Place (via

to delete lines by means of a vector

2013 Sep 10

Hi, I would like to eliminate a large number of lines of the dataframe df1. The lines to delete are given here by the values of Mat (e.g. 2, 4, 7, 10), but I have a large number (300) of values of Mat.

dput(df1)
structure(list(Mat = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11,
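A minimal sketch of dropping rows whose `Mat` value appears in a vector, using `%in%` with logical negation. Because the `dput()` output above is truncated, the data frame below is a hypothetical stand-in, and `drop_mat` and `df1_kept` are illustrative names.

```r
# Hypothetical stand-in for df1 (the original dput output is truncated)
df1 <- data.frame(Mat = rep(1:10, each = 2), x = seq_len(20))

# Values of Mat whose rows should be removed
drop_mat <- c(2, 4, 7, 10)

# Keep only the rows whose Mat is NOT in drop_mat
df1_kept <- df1[!df1$Mat %in% drop_mat, ]
df1_kept
```

The same line works unchanged when `drop_mat` holds 300 values instead of four.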

duplicated.data.frame() and POSIXct with DST shift

2012 Dec 13

Hi, I encountered the behavior that the duplicated method for data.frames gives "false positives" if there are columns of class POSIXct with a clock shift from DST to standard time.

time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
df <-

Indexing Grouped Data

2012 Jun 13

I need help in indexing grouped data. In this example (df1 data), the first child had a first immunization at age 2. The second child had the first, second and third immunization at age 5, 10, and 12, the third child had first and second immunization at age 4 and 6 and the fourth child had the first immunization at age 2. I have df1 and I need to create df2 with an 'ind' variable that

problem in simple saving and loading data frames

2011 Mar 28

Dear all, My dataframe has > 80,000 variables, which I cannot load into R every time using *.txt files (read.table option); it costs me time and sometimes the computer becomes unresponsive. So I need a way to save my dataframe in my working directory as such. Excuse me if the problem is too simple. I tried the following, I do not know what is wrong with it: Example data: x <- 1:10 y <- 21:30 z

difference of two data frames

2008 Sep 14

Hello, I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:

DF1= data.frame(V1=1:6, V2= letters[1:6])
DF2= data.frame(V1=1:3, V2= letters[1:3])

How do I create a new data frame of the difference between DF1 and DF2

newDF=data.frame(V1=4:6, V2= letters[4:6])

In my real data, the rows are not in order as in the example I provided. Thanks much Joseph
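One base-R way to get this row-wise difference, sketched under the assumption that pasting the columns together yields a unique key per row (true for this example, but not guaranteed if values contain the separator): build a key for each row of both data frames and keep the rows of DF1 whose key is absent from DF2. The names `key1` and `key2` are illustrative.

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Build one string key per row from all columns
key1 <- do.call(paste, c(DF1, sep = "\r"))
key2 <- do.call(paste, c(DF2, sep = "\r"))

# Rows of DF1 that do not occur in DF2; row order does not matter
newDF <- DF1[!key1 %in% key2, ]
newDF
```

Since the comparison is by key rather than by position, it also works when the rows of the two data frames are in different orders.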

merging or joining 2 dataframes: merge, rbind.fill, etc.?

2013 Feb 26

#I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd (mydf). I want the 3rd dataframe to contain 1 row for each row in df1 & df2, and all the columns in both df1 & df2. The solution should "work" even if the 2 dataframes are identical, and even if the 2 dataframes do not have the same column names. The rbind.fill function seems to work. For

plotting and coloring longitudinal data with three time points (ggplot2)

2011 Dec 07

Dear list, I have been struggling with this for some time now, and for the last hour I have been struggling to make a working example for the list. I hope someone out there has some experience with plotting longitudinal data that they will share. My data is some patient data with three different time stamps. First the patients are identified at different times (first time stamp). Second, they

Colsplit, removing parts of a string

2012 Sep 27

Hi, I am using colsplit (package = reshape) to split all strings in a column according to the same patterns. Here is an example:

library(reshape2)
df1 <- data.frame(x=c("str1_name2", "str3_name5"))
df2 <- data.frame(df1, colsplit(df1$x, pattern = "_", names=c("str","name")))

This is nearly what I want but I want to remove the words

remove NA in df results in NA, NA.1 ... rows

2012 Dec 13

Good morning! I have the following data frame (df):

X.outer Y.outer X.PAD1 Y.PAD1 X.PAD2 Y.PAD2 X.PAD3 Y.PAD3 X.PAD4 Y.PAD4
73 574690.0 179740.0 574690.2 179740.0 574618.3 179650 574729.2 179674 574747.1 179598
74 574680.6 179737.0 574693.4 179740.0 574719.0 179688 574831.8 179699 574724.9 179673
75 574671.0 179734.0 574696.2 179740.0 574719.0 179688 574807.8 179787 574729.2

rbind: inconsistent behaviour with empty data frames?

2013 Jan 02

The rbind on empty and nonempty data frames behaves inconsistently. I am not sure if by design. In the first example, first row is deleted, which may or may not be on purpose:

df1 <- data.frame()
df2 <- data.frame(foo=c(1, 2), bar=c("a", "b"))
rbind(df1, df2)
  foo bar
2   2   b

Now if we continue:

df1 <- data.frame(matrix(0, 0, 2))
names(df1) <- names(df2)

Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?

2011 Aug 18

Dear expeRts, What is the best approach to create a third data frame from two given ones, when the new/third data frame has last column computed from the last columns of the two given data frames?

## Okay, sounds complicated, so here is an example. Assume we have the two data frames:
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)

merging several dataframes from a list

2009 Jan 21

Hi there, I have a list of dataframes (generated by reading multiple files) and all dataframes are comparable in dimension and column names. They also have a common column, which I'd like to use for merging. To give a simple example of what I have:

df1 <- data.frame(c(LETTERS[1:5]), c(2,6,3,1,9))
names(df1) <- c("pos", "data")
df3 <- df2 <- df1
df2$data
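A hedged sketch of merging every data frame in a list on a shared column with `Reduce()` and `merge()`. Because the post's example is truncated, the three data frames here are plain copies of `df1`; the list `dfs` and the renamed `data_*` columns are assumptions introduced so that the merged columns stay distinguishable.

```r
df1 <- data.frame(pos = LETTERS[1:5], data = c(2, 6, 3, 1, 9))
df2 <- df1
df3 <- df1

# Hypothetical named list standing in for the data frames read from files
dfs <- list(a = df1, b = df2, c = df3)

# Give each value column a unique name so merge() does not add suffixes
for (n in names(dfs)) names(dfs[[n]])[2] <- paste0("data_", n)

# Merge the list pairwise, left to right, on the common column
merged <- Reduce(function(x, y) merge(x, y, by = "pos"), dfs)
merged
```

`Reduce()` folds the list from left to right, so the same call works for any number of data frames that share the `pos` column.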

using "na.locf" from package zoo to fill NA gaps

2012 Jul 02

Hi everybody, I have a small question about the function "na.locf" from the package "zoo". I saw in the help that this function is able to fill NA gaps with the last value before the NA gap (or with the next value). But is it possible to fill my NA gaps according to the last AND the next value at the same time? Actually, I want R to fill my gaps with the method of
