Displaying 20 results from an estimated 8000 matches similar to: "How to delete a duplicate observation"

2011 May 09
2
Creating Observation ID
If I have a data frame something like: Value=rnorm(30) Group = sample(c('A','B','C'), 30, replace=TRUE) df = data.frame(Value, Group) It seems like it should be simple to create an 'ObsID' column which indicates the observation order of each Value within each of the 3 groups. Somehow, I can't quite see how to do it without manually sub-setting the parent data
2002 Jan 30
2
Shade area under curve?
Hi all, I've got this graphics question which really should be easy. I want to shade an area between bounds under a curve. A suitable beginning seems to be the following: > plot(dnorm,-4,4) > segments(-4,0,4,0) > segments(-2,0,-2,dnorm(-2)) > segments(2,0,2,dnorm(2)) It is the area between -2 and 2 which I want to shade (or something similar). Hints anyone? Robert
2011 Apr 19
2
Several factors same levels
This is probably very simple but I'm new to R so apologies for being stupid. I have some data with No coded as 0 and Yes coded as 1. e.g. id sex alcohol smoker 1 M 0 1 2 F 1 0 3 M 0 0 I realise I can convert the numerical variable back to a factor by falcohol<-factor(alcohol,levels=0:1) levels<-c("No","Yes")
2012 Jul 18
2
duplicate data between two data frames according to row names
Hi everybody. I'll first explain my problem and what I'm trying to do. Consider this example: I'm working on 5 different weather stations. In one file I have the data for 3 of these 5 weather stations. Here's an example of this file: DF1 <- data.frame(station=c("ST001","ST004","ST005"),data=c(5,2,8)) And my two other stations in
2010 Feb 07
2
Reading hierarchical data
I would like to read the following hierarchical data set. There is a family record followed by one or more personal records. If col. 7 is "1" it is a family record. If it is "2" it is a personal record. The family record is formatted as follows: col. 1-5 family id col. 7 "1" col. 9 dwelling type code The personal record is formatted as follows: col.
2009 Sep 19
2
Counting observations of a combined factor
#I have a dataset with two factors. I want to combine those factors into a single factor and count the number of data values for each new factor. The following gives a comparable dataframe: a <- rep(c("a", "b"), c(6,6)) b <- rep(c("c", "d"), c(6,6)) df <- data.frame(f1=a, f2=b, d=rnorm(12)) df # I use the 'interaction' function to combine
2010 Jan 22
2
Question on Merge/Lookup
I need to merge three datasets and don't know how. If I were using SQL, I would use df3, look up the characteristics of each date in df1 and the value for each observation in df2. df1 - unique list of Dates and characteristics of those dates Date, YYYYMM, YYYYWW, DOW df2 - the raw data Date, Place, Value df3 - all possible combinations of Date + Place (via
2013 Sep 10
3
to delete lines by means of a vector
Hi, I would like to eliminate a large number of lines from the dataframe df1. The lines to delete are given here by the values of Mat (e.g. 2, 4, 7, 10), but I have a large number (300) of Mat values. dput(df1) structure(list(Mat = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10, 10, 11,
2012 Dec 13
1
duplicated.data.frame() and POSIXct with DST shift
Hi, I encountered the behavior, that the duplicated method for data.frames gives "false positives" if there are columns of class POSIXct with a clock shift from DST to standard time. time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60) time [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET" df <-
2012 Jun 13
1
Indexing Grouped Data
I need help in indexing grouped data. In this example (df1 data), the first child had a first immunization at age 2. The second child had the first, second and third immunization at age 5, 10, and 12, the third child had first and second immunization at age 4 and 6 and the fourth child had the first immunization at age 2. I have df1 and I need to create df2 with an "ind" variable that
2011 Mar 28
1
problem in simple saving and loading data frames
Dear all, My dataframe has > 80,000 variables, which I cannot load into R every time from *.txt files (read.table option); it costs me time and sometimes the computer becomes unresponsive. So I need a way to save my dataframe in my working directory as such. Excuse me if the problem is too simple. I tried the following, I do not know what is wrong with it: Example data: x <- 1:10 y <- 21:30 z
2008 Sep 14
5
difference of two data frames
Hello, I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1: DF1= data.frame(V1=1:6, V2= letters[1:6]) DF2= data.frame(V1=1:3, V2= letters[1:3]) How do I create a new data frame of the difference between DF1 and DF2 newDF=data.frame(V1=4:6, V2= letters[4:6]) In my real data, the rows are not in order as in the example I provided. Thanks much Joseph
2013 Feb 26
2
merging or joining 2 dataframes: merge, rbind.fill, etc.?
#I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd (mydf). I want the 3rd dataframe to contain 1 row for each row in df1 & df2, and all the columns in both df1 & df2. The solution should "work" even if the 2 dataframes are identical, and even if the 2 dataframes do not have the same column names. The rbind.fill function seems to work. For
2011 Dec 07
2
plotting and coloring longitudinal data with three time points (ggplot2)
Dear list, I have been struggling with this for some time now, and for the last hour I have been struggling to make a working example for the list. I hope someone out there has some experience with plotting longitudinal data that they will share. My data is some patient data with three different time stamps. First the patients are identified at different times (first time stamp). Second, they
2012 Sep 27
4
Colsplit, removing parts of a string
Hi, I am using colsplit (package = reshape) to split all strings in a column according to the same patterns. Here an example: library(reshape2) df1 <- data.frame(x=c("str1_name2", "str3_name5")) df2 <- data.frame(df1, colsplit(df1$x, pattern = "_", names=c("str","name"))) This is nearly what I want but I want to remove the words
2012 Dec 13
5
remove NA in df results in NA, NA.1 ... rows
Good morning! I have the following data frame (df): X.outer Y.outer X.PAD1 Y.PAD1 X.PAD2 Y.PAD2 X.PAD3 Y.PAD3 X.PAD4 Y.PAD4 73 574690.0 179740.0 574690.2 179740.0 574618.3 179650 574729.2 179674 574747.1 179598 74 574680.6 179737.0 574693.4 179740.0 574719.0 179688 574831.8 179699 574724.9 179673 75 574671.0 179734.0 574696.2 179740.0 574719.0 179688 574807.8 179787 574729.2
2013 Jan 02
2
rbind: inconsistent behaviour with empty data frames?
The rbind on empty and nonempty data frames behaves inconsistently. I am not sure if by design. In the first example, first row is deleted, which may or may not be on purpose: df1 <- data.frame() df2 <- data.frame(foo=c(1, 2), bar=c("a", "b")) rbind(df1, df2) foo bar 2 2 b Now if we continue: df1 <- data.frame(matrix(0, 0, 2)) names(df1) <- names(df2)
2011 Aug 18
2
Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?
Dear expeRts, What is the best approach to create a third data frame from two given ones, when the new/third data frame has last column computed from the last columns of the two given data frames? ## Okay, sounds complicated, so here is an example. Assume we have the two data frames: df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
2009 Jan 21
3
merging several dataframes from a list
Hi there, I have a list of dataframes (generated by reading multiple files) and all dataframes are comparable in dimension and column names. They also have a common column, which, I'd like to use for merging. To give a simple example of what I have: df1 <- data.frame(c(LETTERS[1:5]), c(2,6,3,1,9)) names(df1) <- c("pos", "data") df3 <- df2 <- df1 df2$data
2012 Jul 02
2
using "na.locf" from package zoo to fill NA gaps
Hi everybody, I have a small question about the function "na.locf" from the package "zoo". I saw in the help that this function is able to fill NA gaps with the last value before the NA gap (or with the next value). But is it possible to fill my NA gaps according to the last AND the next value at the same time? Actually, I want R to fill my gaps with the method of