thr3ads.net - R help - [R] Newbie data organisation/structures question... [Dec 2006]

Gav Wood

2006-Dec-20 16:05 UTC

[R] Newbie data organisation/structures question...

Howdo folks,

So my data is in this sort of format:

P  T  I
1  1  (1, 2, 3)
2  1  (2, 4)
1  2  (1, 3, 6, 7)
2  2  (6)

And I want to be able to quickly get:

1: The I when both P and T are given. e.g.:
P = 2, T = 2; I = (6)

2: The concatenated vector of Is when P and a subset of T is given, e.g.:
P = 1, T = 1:2;  Is = (1, 2, 3, 1, 3, 6, 7)

3: The length of that vector.

It would also be nice to have:

4: A list of Is when either P or T is given. e.g.:
P = 2: I = (2, 4), (6)
T = 1: I = (1, 2, 3), (1, 3, 6, 7)

Currently, I have a matrix of P x T, whose elements are lists of a 
single item, the vector I. I call this 'm'.

(1) is easy; just m[P, T][[1]]
(2) and (3) are apparently much harder. For 3, I'm resorting to:

total <- 0
for(p in 1:length(m[,T]))
	total <- total + length(m[p,T][[1]]);

And something similar for 2.

There must surely be a better way of doing this; but what is it?

Cheers,

Gav

Marc Schwartz

2006-Dec-20 17:50 UTC


On Wed, 2006-12-20 at 16:05 +0000, Gav Wood wrote:
> Howdo folks,
> 
> So my data is in this sort of format:
> 
> P  T  I
> 1  1  (1, 2, 3)
> 2  1  (2, 4)
> 1  2  (1, 3, 6, 7)
> 2  2  (6)
> 
> And I want to be able to quickly get:
> 
> 1: The I when both P and T are given. e.g.:
> P = 2, T = 2; I = (6)
> 
> 2: The concatenated vector of Is when P and a subset of T is given, e.g.:
> P = 1, T = 1:2;  Is = (1, 2, 3, 1, 3, 6, 7)
> 
> 3: The length of that vector.
> 
> It would also be nice to have:
> 
> 4: A list of Is when either P or T is given. e.g.:
> P = 2: I = (2, 4), (6)
> T = 1: I = (1, 2, 3), (1, 3, 6, 7)
> 
> Currently, I have a matrix of P x T, whose elements are lists of a 
> single item, the vector I. I call this 'm'.
> 
> (1) is easy; just m[P, T][[1]]
> (2) and (3) are apparently much harder. For 3, I'm resorting to:
> 
> total <- 0
> for(p in 1:length(m[,T]))
> 	total <- total + length(m[p,T][[1]]);
> 
> And something similar for 2.
> 
> There must surely be a better way of doing this; but what is it?
> 
> Cheers,
> 
> Gav

I read in your data using:

DF <- read.fwf("clipboard", widths = c(3, 3, 12),
               skip = 1)

colnames(DF) <- c("P", "T", "I")


Substitute your actual data file name for 'clipboard' above.


Note that I skip the header row, as the "T" causes problems, since it
wants to be converted to 'TRUE' (logical, not char) upon import,
screwing up the column widths. I then assign the colnames post import.

This then gives me:

> DF
  P T            I
1 1 1    (1, 2, 3)
2 2 1       (2, 4)
3 1 2 (1, 3, 6, 7)
4 2 2          (6)

Given the manipulations that you appear to want to do, I would first
strip the parens from "I" to make subsequent operations easier:

DF$I <- gsub("\\(|\\)", "", DF$I)

So:

> DF
  P T          I
1 1 1    1, 2, 3
2 2 1       2, 4
3 1 2 1, 3, 6, 7
4 2 2          6


Now, split the character vector DF$I into its components and convert
them to a list of numeric vectors:

> DF$I <- lapply(strsplit(DF$I, split = ","), as.numeric)
> DF
  P T          I
1 1 1    1, 2, 3
2 2 1       2, 4
3 1 2 1, 3, 6, 7
4 2 2          6

# Look at the structure of 'DF'
> str(DF)
'data.frame':	4 obs. of  3 variables:
 $ P: num  1 2 1 2
 $ T: num  1 1 2 2
 $ I:List of 4
  ..$ : num  1 2 3
  ..$ : num  2 4
  ..$ : num  1 3 6 7
  ..$ : num 6
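
The import and cleaning steps above can be tried without a data file. The following is only a sketch: it assumes the table is typed in by hand rather than read with read.fwf(), and the intermediate name 'no_parens' is introduced here for illustration.

```r
# Assumption: the raw table is entered directly instead of being read from a file.
DF <- data.frame(P = c(1, 2, 1, 2),
                 T = c(1, 1, 2, 2),
                 I = c("(1, 2, 3)", "(2, 4)", "(1, 3, 6, 7)", "(6)"),
                 stringsAsFactors = FALSE)

# Strip the parentheses, split on commas, and convert each piece to numeric.
no_parens <- gsub("\\(|\\)", "", DF$I)
DF$I <- lapply(strsplit(no_parens, split = ","), as.numeric)

# 'I' is now a list column holding one numeric vector per row.
str(DF)
```

as.numeric() tolerates the space left after each comma, so no extra trimming step is needed.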


Now for your manipulations above:

1: The I when both P and T are given. e.g.:
P = 2, T = 2; I = (6)

> subset(DF, (P == 2) & (T == 2), select = I)
  I
4 6


2: The concatenated vector of Is when P and a subset of T is given,
e.g.:
P = 1, T = 1:2;  Is = (1, 2, 3, 1, 3, 6, 7)

> unlist(subset(DF, (P == 1) & (T %in% 1:2), select = I))
I1 I2 I3 I4 I5 I6 I7 
 1  2  3  1  3  6  7

or you can use:

> as.vector(unlist(subset(DF, (P == 1) & (T %in% 1:2), select = I)))
[1] 1 2 3 1 3 6 7

which strips the name attributes from the vector.



3: The length of that vector.

> length(unlist(subset(DF, (P == 1) & (T %in% 1:2), select = I)))
[1] 7



4: A list of Is when either P or T is given. e.g.:
P = 2: I = (2, 4), (6)
T = 1: I = (1, 2, 3), (1, 3, 6, 7)

> subset(DF, P == 2, select = I)
     I
2 2, 4
4    6

> subset(DF, T == 1, select = I)
        I
1 1, 2, 3
2    2, 4

Note that your example above for 'T == 1' in 4 is incorrect based upon
your example data. "(1, 3, 6, 7)" is on the row where T == 2.   :-)
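
If these four lookups are needed repeatedly, they can be wrapped in one small function. This is a rough sketch rather than a recommended design: 'get_I' and its arguments 'p_sel' and 't_sel' are hypothetical names, and the data frame is rebuilt by hand so that the block runs on its own.

```r
# Assumption: DF has numeric columns P and T and a list column I, as built above.
DF <- data.frame(P = c(1, 2, 1, 2), T = c(1, 1, 2, 2))
DF$I <- list(c(1, 2, 3), c(2, 4), c(1, 3, 6, 7), 6)

# Hypothetical helper: return the I vectors (as a list) for the rows
# matching p_sel and/or t_sel; NULL means "no restriction".
get_I <- function(df, p_sel = NULL, t_sel = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(p_sel)) keep <- keep & df$P %in% p_sel
  if (!is.null(t_sel)) keep <- keep & df$T %in% t_sel
  df$I[keep]
}

q1 <- get_I(DF, p_sel = 2, t_sel = 2)[[1]]        # (1) single I vector
q2 <- unlist(get_I(DF, p_sel = 1, t_sel = 1:2))   # (2) concatenated Is
q3 <- length(q2)                                  # (3) its length
q4 <- get_I(DF, t_sel = 1)                        # (4) list of Is for T == 1
```

Indexing the list column with a logical vector keeps each I as a separate vector, which is why (4) comes back as a list and (2) needs unlist().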


See ?read.fwf, ?read.table, ?subset, ?strsplit, ?gsub, ?lapply, ?unlist, ?Syntax
and ?Comparison for more information.

HTH,

Marc Schwartz

Michael Kubovy

2006-Dec-20 18:14 UTC


On Dec 20, 2006, at 11:05 AM, Gav Wood wrote:
> So my data is in this sort of format:
>
> P  T  I
> 1  1  (1, 2, 3)
> 2  1  (2, 4)
> 1  2  (1, 3, 6, 7)
> 2  2  (6)
Not knowing why you organized the data as you did, let me suggest  
another approach:

iv <- c(1, 2, 3, 2, 4, 1, 3, 6, 7, 6)
p <- c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2)
t <- rep(1:2, each = 5)
dat <- data.frame(iv, p, t)

> And I want to be able to quickly get:
>
> The I when both P and T are given. e.g.:
> P = 2, T = 2; I = (6)

subset(dat, p == 2 & t == 2)$iv

> The concatenated vector of Is when P and a subset of T is given, e.g.:
> P = 1, T = 1:2;  Is = (1, 2, 3, 1, 3, 6, 7)

(iv1 <- subset(dat, p == 1 & t %in% 1:2)$iv)

> The length of that vector.

length(iv1)

> A list of Is when either P or T is given. e.g.:
> P = 2: I = (2, 4), (6)
> T = 1: I = (1, 2, 3), (1, 3, 6, 7)

list(p2t1 = subset(dat, p == 2 & t == 1)$iv,
     p2t2 = subset(dat, p == 2 & t == 2)$iv)
list(p1t1 = subset(dat, p == 1 & t == 1)$iv,
     p1t2 = subset(dat, p == 1 & t == 2)$iv) # correcting your requirement to get your result
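
With the data in this long layout, the per-cell lengths and the "list of Is" can probably also be had from tapply() and split() instead of one subset() call per cell. The sketch below rebuilds 'dat' so that it runs on its own; the names 'counts' and 'by_t' are made up for the example.

```r
iv <- c(1, 2, 3, 2, 4, 1, 3, 6, 7, 6)
p <- c(1, 1, 1, 2, 2, 1, 1, 1, 1, 2)
t <- rep(1:2, each = 5)
dat <- data.frame(iv, p, t)

# Number of I values in every (p, t) cell, as a p-by-t table.
counts <- with(dat, tapply(iv, list(p = p, t = t), length))

# Length of the concatenated vector for p == 1 and t in 1:2.
n_p1 <- sum(counts["1", c("1", "2")])

# List of I vectors for p == 2, one element per value of t.
by_t <- with(subset(dat, p == 2), split(iv, t))
```

Note that 'counts' and 'by_t' are indexed by the character labels "1" and "2", not by position, because tapply() and split() name their results after the grouping values.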

There are many other ways of getting the results you need as Marc  
Schwartz pointed out in his reply.
_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS:     P.O.Box 400400    Charlottesville, VA 22904-4400
Parcels:    Room 102        Gilmer Hall
         McCormick Road    Charlottesville, VA 22903
Office:    B011    +1-434-982-4729
Lab:        B019    +1-434-982-4751
Fax:        +1-434-982-4766
WWW:    http://www.people.virginia.edu/~mk9y/
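
For completeness, the P x T matrix of lists described in the original post can apparently be queried without a loop as well, since indexing a list-matrix returns a plain list that unlist() flattens. This is a minimal sketch assuming 'm' has P in rows and T in columns, as the post's m[P, T] suggests; the result names are illustrative only.

```r
# Assumption: rows are P, columns are T; matrix() fills column by column.
m <- matrix(list(c(1, 2, 3), c(2, 4), c(1, 3, 6, 7), 6), nrow = 2, ncol = 2)

one_cell <- m[2, 2][[1]]             # (1) I for P = 2, T = 2
joined   <- unlist(m[1, 1:2])        # (2) concatenated Is for P = 1, T = 1:2
n_joined <- length(joined)           # (3) length of that vector
n_col1   <- length(unlist(m[, 1]))   # loop-free total over all P for T = 1
by_p2    <- m[2, ]                   # (4) list of Is for P = 2
```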
