Displaying 20 results from an estimated 8000 matches similar to: "How to delete a duplicate observation"
2011 May 09
2
Creating Observation ID
If I have a data frame something like:
Value=rnorm(30)
Group = sample(c('A','B','C'), 30, replace=TRUE)
df = data.frame(Value, Group)
It seems like it should be simple to create an 'ObsID' column which indicates the observation order of each Value within each of the 3 groups. Somehow, I can't quite see how to do it without manually sub-setting the parent data
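One way to get a per-group observation counter without subsetting by hand is base R's `ave()`. The sketch below rebuilds the data frame from the post; the `set.seed()` call is an assumption added only for reproducibility, and "observation order" is read as row order within each group.

```r
set.seed(1)  # assumption: not in the original post, added for reproducibility
Value <- rnorm(30)
Group <- sample(c("A", "B", "C"), 30, replace = TRUE)
df <- data.frame(Value, Group)

# ave() applies FUN separately within each level of Group and puts the
# results back in the original row order, so ObsID runs 1, 2, 3, ...
# inside every group.
df$ObsID <- ave(seq_along(df$Group), df$Group, FUN = seq_along)

head(df)
```

If the order should follow some other column, sorting the data frame by that column first would probably be the simplest adjustment.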
2002 Jan 30
2
Shade area under curve?
Hi all,
I've got this graphics question which really should be easy. I want to shade
an area between bounds under a curve. A suitable beginning seems to be the
following:
> plot(dnorm,-4,4)
> segments(-4,0,4,0)
> segments(-2,0,-2,dnorm(-2))
> segments(2,0,2,dnorm(2))
It is the area between -2 and 2 which I want to shade (or something
similar). Hints anyone?
Robert
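A common approach to this kind of shading is `polygon()`, which fills a closed outline. The sketch below is one possible answer, not taken from the thread; the variable names `px` and `py` and the fill colour are illustrative choices.

```r
# Draw the standard normal density and the baseline, as in the post
plot(dnorm, -4, 4)
segments(-4, 0, 4, 0)

# Outline of the region between -2 and 2: start on the baseline at -2,
# follow the curve across to 2, then drop back to the baseline.
x  <- seq(-2, 2, length.out = 200)
px <- c(-2, x, 2)
py <- c(0, dnorm(x), 0)

polygon(px, py, col = "grey80", border = NA)
```

Because the first and last vertices sit on the baseline, the filled shape closes along y = 0, so the vertical `segments()` calls at -2 and 2 become optional.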
2011 Apr 19
2
Several factors same levels
This is probably very simple but I'm new to R so apologies for being stupid.
I have some data with No coded as 0 and yes coded as 1.
e.g.
id sex alcohol smoker
1 M 0 1
2 F 1 0
3 M 0 0
I realise I can convert the numerical variable back to a factor by
falcohol<-factor(alcohol,levels=0:1)
levels<-c("No","Yes")
2012 Jul 18
2
duplicate data between two data frames according to row names
Hi everybody.
I'll first explain my problem and what I'm trying to do.
Consider this example:
I'm working on 5 different weather stations.
I have first in one file 3 of these 5 weather stations, containing their
data. Here's an example of this file:
DF1 <- data.frame(station=c("ST001","ST004","ST005"),data=c(5,2,8))
And my two other stations in
2010 Feb 07
2
Reading hierarchical data
I would like to read the following hierarchical data set. There is a family
record followed by one or more personal records.
If col. 7 is "1" it is a family record. If it is "2" it is a personal
record.
The family record is formatted as follows:
col. 1-5 family id
col. 7 "1"
col. 9 dwelling type code
The personal record is formatted as follows:
col.
2009 Sep 19
2
Counting observations of a combined factor
#I have a dataset with two factors. I want to combine those factors into
a single factor and count the number of data values for each new factor.
The following gives a comparable dataframe:
a <- rep(c("a", "b"), c(6,6))
b <- rep(c("c", "d"), c(6,6))
df <- data.frame(f1=a, f2=b, d=rnorm(12))
df
# I use the 'interaction' function to combine
2010 Jan 22
2
Question on Merge/Lookup
I need to merge three datasets and don't know how. If I were using SQL, I
would use df3, look up the characteristics of each date in df1 and the value
for each observation in df2.
df1 - unique list of Dates and characteristics of those dates
Date, YYYYMM, YYYYWW, DOW
df2 - the raw data
Date, Place, Value
df3 - all possible combinations of Date + Place (via
2013 Sep 10
3
to delete lines by means of a vector
Hi
I would like to eliminate a large number of lines of the dataframe df1
The lines to delete are given here by the values of Mat (ex : 2,4,7,10).
but I have a large number (300) values of Mat
dput(df1)
structure(list(Mat = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7,
7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 10, 10,
10, 11,
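Dropping rows whose `Mat` value appears in a vector is usually done with a negated `%in%` test. Since the `dput()` output above is cut off, the sketch below uses a small hypothetical stand-in for `df1` (the `val` column and the name `mat_to_drop` are invented for illustration); the same line should work unchanged with 300 values.

```r
# Hypothetical stand-in for df1; the real data in the post is truncated
df1 <- data.frame(Mat = rep(1:11, each = 2), val = seq_len(22))

# Mat values whose rows should be deleted (the example values from the post)
mat_to_drop <- c(2, 4, 7, 10)

# Keep only the rows whose Mat is NOT in the vector
df1_kept <- df1[!df1$Mat %in% mat_to_drop, ]
```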
2012 Dec 13
1
duplicated.data.frame() and POSIXct with DST shift
Hi,
I encountered the behavior, that the duplicated method for data.frames gives "false positives" if there are columns of class POSIXct with a clock shift from DST to standard time.
time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
time
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
df <-
2012 Jun 13
1
Indexing Grouped Data
I need help in indexing grouped data. In this example (df1 data), the first child had a first immunization at age 2. The second child had the first, second and third immunization at age 5, 10, and 12, the third child had first and second immunization at age 4 and 6, and the fourth child had the first immunization at age 2. I have df1 and I need to create df2 with an "ind" variable that
2011 Mar 28
1
problem in simple saving and loading data frames
Dear all
My dataframe has > 80,000 variables, which I cannot load into R every time
from *.txt files (read.table option); it costs me time and sometimes the computer
becomes unresponsive. So I need a way to save my dataframe in my
working directory as such.
Excuse me if the problem is too simple. I tried the following; I do not
know what is wrong with it:
Example data:
x <- 1:10
y <- 21:30
z
2008 Sep 14
5
difference of two data frames
Hello
I have 2 data frames DF1 and DF2 where DF2 is a subset of DF1:
DF1= data.frame(V1=1:6, V2= letters[1:6])
DF2= data.frame(V1=1:3, V2= letters[1:3])
How do I create a new data frame of the difference between DF1 and DF2
newDF=data.frame(V1=4:6, V2= letters[4:6])
In my real data, the rows are not in order as in the example I provided.
Thanks much
Joseph
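One base-R way to take this kind of row difference, independent of row order, is to build a key per row and keep the `DF1` rows whose key is absent from `DF2`. This is a sketch under the assumption that both data frames have the same columns in the same order; `key1` and `key2` are illustrative names.

```r
DF1 <- data.frame(V1 = 1:6, V2 = letters[1:6])
DF2 <- data.frame(V1 = 1:3, V2 = letters[1:3])

# Paste all columns of each row into a single string key. The "\r"
# separator is used because it is unlikely to occur in the data itself.
key1 <- do.call(paste, c(DF1, sep = "\r"))
key2 <- do.call(paste, c(DF2, sep = "\r"))

# Rows of DF1 that have no matching row in DF2
newDF <- DF1[!key1 %in% key2, ]
newDF
```

The result keeps the original row names of `DF1`; `rownames(newDF) <- NULL` would reset them if that matters.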
2013 Feb 26
2
merging or joining 2 dataframes: merge, rbind.fill, etc.?
#I want to "merge" or "join" 2 dataframes (df1 & df2) into a 3rd
(mydf). I want the 3rd dataframe to contain 1 row for each row in df1
& df2, and all the columns in both df1 & df2. The solution should
"work" even if the 2 dataframes are identical, and even if the 2
dataframes do not have the same column names. The rbind.fill function
seems to work. For
2011 Dec 07
2
plotting and coloring longitudinal data with three time points (ggplot2)
Dear list,
I have been struggling with this for some time now, and for the last hour I have been struggling to make a working example for the list. I hope someone out there has some experience with plotting longitudinal data that they will share.
My data is some patient data with three different time stamps. First the patients are identified at different times (first time stamp). Second, they
2012 Sep 27
4
Colsplit, removing parts of a string
Hi,
I am using colsplit (package = reshape) to split all strings
in a column according to the same patterns. Here
an example:
library(reshape2)
df1 <- data.frame(x=c("str1_name2", "str3_name5"))
df2 <- data.frame(df1, colsplit(df1$x, pattern = "_", names=c("str","name")))
This is nearly what I want but I want to remove the words
2012 Dec 13
5
remove NA in df results in NA, NA.1 ... rows
Good morning!
I have the following data frame (df):
X.outer Y.outer X.PAD1 Y.PAD1 X.PAD2 Y.PAD2 X.PAD3 Y.PAD3 X.PAD4 Y.PAD4
73 574690.0 179740.0 574690.2 179740.0 574618.3 179650 574729.2 179674 574747.1 179598
74 574680.6 179737.0 574693.4 179740.0 574719.0 179688 574831.8 179699 574724.9 179673
75 574671.0 179734.0 574696.2 179740.0 574719.0 179688 574807.8 179787 574729.2
2013 Jan 02
2
rbind: inconsistent behaviour with empty data frames?
The rbind on empty and nonempty data frames behaves inconsistently. I am
not sure if by design.
In the first example, first row is deleted, which may or may not be on
purpose:
df1 <- data.frame()
df2 <- data.frame(foo=c(1, 2), bar=c("a", "b"))
rbind(df1, df2)
foo bar
2 2 b
Now if we continue:
df1 <- data.frame(matrix(0, 0, 2))
names(df1) <- names(df2)
2011 Aug 18
2
Best way/practice to create a new data frame from two given ones with last column computed from the two data frames?
Dear expeRts,
What is the best approach to create a third data frame from two given ones, when
the new/third data frame has last column computed from the last columns of the two given
data frames?
## Okay, sounds complicated, so here is an example. Assume we have the two data frames:
df1 <- data.frame(Year=rep(2001:2010, each=2), Group=c("Group 1","Group 2"), Value=1:20)
2009 Jan 21
3
merging several dataframes from a list
Hi there,
I have a list of dataframes (generated by reading multiple files) and all
dataframes are comparable in dimension and column names. They also have a
common column, which, I'd like to use for merging. To give a simple example of
what I have:
df1 <- data.frame(c(LETTERS[1:5]), c(2,6,3,1,9))
names(df1) <- c("pos", "data")
df3 <- df2 <- df1
df2$data
2012 Jul 02
2
using "na.locf" from package zoo to fill NA gaps
Hi everybody,
I have a small question about the function "na.locf" from the package "zoo".
I saw in the help that this function is able to fill NA gaps with the last
value before the NA gap (or with the next value).
But is it possible to fill my NA gaps according to the last AND the next
value at the same time?
Actually, I want R to fill my gaps with the method of