Displaying 20 results from an estimated 5000 matches similar to: "problem with data processing in R"
2012 Apr 14
3
Choose between duplicated rows
Dear R experts,
Sorry for this basic question, but I can't seem to find a solution.
I have this data frame:
df <- data.frame(id = c("id1", "id1", "id1", "id2", "id2", "id2"), A =
c(11905, 11907, 11907, 11829, 11829, 11829), v1 = c(NA, 3, NA,1,2,NA), v2 =
c(NA,2,NA, 2, NA,NA), v3 = c(NA,1,NA,1,NA,NA), v4 = c("N",
2013 Apr 12
1
Removing rows that are duplicates but column values are in reversed order
Hi,
From your example data,
dat1<- read.table(text="
id1 id2 value
a   b   10
c   d   11
b   a   10
c   e   12
",sep="",header=TRUE,stringsAsFactors=FALSE)
#it is easier to get the output you wanted
dat1[!duplicated(dat1$value),]
#  id1 id2 value
#1   a   b    10
#2   c   d    11
#4   c   e    12
But, if you have cases like the one
2008 Jul 09
2
Parsing
Dear R users,
I have a big text file formatted like this:
x x_string
y y_string
id1 id1_string
id2 id2_string
z z_string
w w_string
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string1
y y_string1
z z_string1
w w_string1
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string2
y y_string2
id1
2011 Apr 25
2
Problem with ddply in the plyr-package: surprising output of a date-column
Hi all,
I have a problem with the plyr package - more precisely with the ddply
function - and would be very grateful for any help. I hope the example
here is precise enough for someone to identify the problem. Basically,
in this step I want to identify observations that are identical in
terms of certain identifiers (ID1, ID2, ID3) and just want to save
those observations (in this step,
2010 Sep 07
1
average columns of data frame corresponding to replicates
Hi Group,
I have a data frame below. Within this data frame there are samples
(columns) that are measured more than once. Samples are indicated by
"idx". So "id1" is present in columns 1, 3, and 5. Not every id is
repeated. I would like to create a new data frame so that the repeated
ids are averaged. For example, in the new data frame, columns 1, 3,
and 5 of the original
2005 Aug 10
1
Why only a "" string for heading for row.names with write.csv with a matrix?
Consider:
> x <- matrix(1:6, 2,3)
> rownames(x) <- c("ID1", "ID2")
> colnames(x) <- c("Attr1", "Attr2", "Attr3")
> x
Attr1 Attr2 Attr3
ID1 1 3 5
ID2 2 4 6
> write.csv(x,file="x.csv")
"","Attr1","Attr2","Attr3"
"ID1",1,3,5
2007 Apr 20
2
Fastest way to repeatedly subset a data frame?
Hi -
I have a data frame with a large number of observations (62,000 rows,
but only 2 columns - a character ID and a result list).
Sample:
> my.df <- data.frame(id=c("ID1", "ID2", "ID3"), result=1:3)
> my.df
id result
1 ID1 1
2 ID2 2
3 ID3 3
I have a list of ID vectors. This list will have anywhere from 100 to
1000 members, and
2006 Sep 13
3
group bunch of lines in a data.frame, an additional requirement
Thanks for pointing me to "aggregate"; that works fine!
There is one complication though: I have mixed types (numerical and character),
so the matrix is of the form:
A 1.0 200 ID1
A 3.0 800 ID1
A 2.0 200 ID1
B 0.5 20 ID2
B 0.9 50 ID2
C 5.0 70 ID1
One letter always has the same ID but one ID can be shared by many
letters (like ID1)
I just want to keep track of the ID, and get
2005 Nov 09
3
dataframe without repetition
Hello,
With a data.frame like this:
> toto <-
data.frame(id=c("id1","id1","id2","id3","id3","id3"),dpt=c("13","13","34","30","30","30"))
> toto
id dpt
1 id1 13
2 id1 13
3 id2 34
4 id3 30
5 id3 30
6 id3 30
what is the most efficient way to obtain:
id
2008 Jan 10
1
data.frame manipulation: Unbinding strings in a row
Hi all,
I have a data.frame I received with data that look like this (comma-
separated strings in the last column):
ID Shop Items
ID1 A1 item1, item2, item3
ID2 A2 item4, item5
ID3 A1 item1, item3, item4
But I would like to unbind the strings in col(2) items so that it will look
like this:
ID Shop Items
ID1 A1 item1
ID1 A1 item2
ID1 A1 item3
ID2 A2 item4
ID2 A2 item5
ID3 A1 item1
ID3 A1 item3
ID3 A1
2012 Feb 22
1
Lattice and horizontally stacked density plots
Hello,
I am trying to make a density plot where plots are stacked like the one
found here:
http://dsarkar.fhcrc.org/lattice/book/images/Figure_14_03_stdBW.png
I am facing problems, however. Using the code example below, I'd like
to generate a separate panel for each val of id2. Within each panel,
I'd like to have individual histograms each on separate lines based on
the value of id1. Note
2011 May 25
1
Subtracting rows by id
Dear R users,
I have two datasets:
id1 <- c(rep(1,10), rep(2,10), rep(3,10))
value1 <- sample(1:100, 30, replace=TRUE)
dataset1 <- cbind(id1,value1)
id2 <- c(1,2,3)
subtract.value <- c(1,3,5)
dataset2 <- cbind(id2, subtract.value)
I want to subtract the number of rows in the subtract.value that
corresponds to the id value in dataset1. So for the 1 in id1, I want
to
2011 Apr 20
1
How to check if a value of a variable is in a list
Hi all,
I am working with some social network analysis in R and ran into a problem I
just cannot solve.
Each observation in my data consists of a respondent, some characteristics
and up to five friends. The problem is that all of these five friends might
not show up later as a respondent (observation). Therefore I might not have
characteristics on all the friends listed in the data and I want to
2008 Sep 25
2
How to order some of my columns (not rows) alphabetically
Hello,
I have a dataframe with 9 columns, and I would like to sort (order) the
right-most eight of them alphabetically, i.e.:
ID1 ID2 F G A B C E D
would become
ID1 ID2 A B C D E F G
Right now, I'm using this code:
attach(data)
data<-data.frame(ID1,ID2,data[,sort(colnames(data)[3:9])])
detach(data)
but that's not very elegant. Ideally I could specify which columns to
sort and
2006 Jan 14
3
In place editing and external control
Dear all,
First I'd like to thank the authors for such nice Scriptaculous and Prototype
libraries, which have already helped me a lot!
I have a question regarding the externalControl parameter in InPlaceEditor. If
I understand correctly, I can use that to have one image as a trigger to
enter edit mode? I tried with the code below but without success:
<span id="id1">My text</span>
2006 Feb 09
1
List Conversion
Hello,
I have a list (mode and class are list) in R that is many elements long and of the form:
>length(list)
[1] 5778
>list[1:4]
$ID1
[1] "num1"
$ID2
[1] "num2" "num3"
$ID3
[1] "num4"
$ID4
[1] NA
I'd like to convert the $ID2 value to be in one element rather than in two. It shows up as c(\"num2\", \"num3\") if I try to use
2010 Jun 03
2
deduplication
Colleagues,
I am trying to de-duplicate a large (long) database (approx 1mil records) of
diagnostic tests. Individuals in the database can have up to 25
observations, but most will have only one. IDs for de-duplication (names,
sex, lab number...) are patchy. In a first step, I am using Andreas Borg's
excellent record linkage package, which leaves me with a list of 'pairs'
looking
2012 Jan 27
1
multiple column comparison
Hello,
I have a very large content analysis project, which I've just begun to
collect training data on. I have three coders, who are entering data on up
to 95 measurements. Traditionally, I've used Excel to check coder agreement
(e.g., percentage agreement), by lining up each coder's measurements
side-by-side, creating a new column with the results using if statements.
That is, if
2008 Feb 07
1
Help with package reshape, wide to long
Hello,
I am having difficulty figuring out how to use functions in the
reshape package to perform a wide-to-long transformation.
I have a "wide" dataframe whose columns are like this example:
id1 id2 subject treat height weight age
id1 and id2 are unique for each row
subject and treat are not unique for each row
height, weight, and age are different types of measurements made on
2012 Feb 26
1
Matrix problem to extract animal associations
Dear List,
I have been trying to extract associations from a matrix whereby individual locations are within a certain distance threshold from one another.
I have been able to extract those individuals where there is 'no interaction' (i.e. where these individuals are not within a specified distance threshold from another individual) and give these individuals a unique Group ID containing