thr3ads.net - R help - [R] Reading hierarchical data [Feb 2010]

If this information is useful, please help other people find it:
Share via:

Saba(Home)

2010-Feb-07 15:57 UTC

[R] Reading hierarchical data

I would like to read the following hierarchical data set. There is a family
record followed by one or more personal records.
If col. 7 is "1" it is a family record. If it is "2" it is a
personal
record.
The family record is formatted as follows: 
col. 1-5     family id
col. 7        "1"
col. 9        dwelling type code
The personal record is formatted as follows:
col. 1-5 	personal id
col. 7	 "2"
col. 8-9	age
col. 11	sex code

The first six family and accompanying personal records look like this:
06470 1 1
    1 232 0
    2 230 1
07470 1 0
    1 240 1
08470 1 0
    1 227 0
09470 1 0
    1 213 1
    2 222 0
    3 224 1
10470 1 1
    1 220 0
    2 211 1
11470 1 0
    1 217 0
    2 210 1
    3 226 1

I want to create a dataset containing 
. family ID 
. dwelling code 
. person ID 
. age 
. sex code 
The dataset will contain one observation per person, and the with family
information repeated for people in the same family. 
Can anyone help?
Thanks,
Richard Saba

jim holtman

2010-Feb-07 17:05 UTC

head link

[R] Reading hierarchical data

Will this do it for you:
> input <- readLines(textConnection("06470 1 1+     1 232 0
+     2 230 1
+ 07470 1 0
+     1 240 1
+ 08470 1 0
+     1 227 0
+ 09470 1 0
+     1 213 1
+     2 222 0
+     3 224 1
+ 10470 1 1
+     1 220 0
+     2 211 1
+ 11470 1 0
+     1 217 0
+     2 210 1
+     3 226 1"))> closeAllConnections()
> fid <- NULL
> dwell <- NULL
> result <- do.call(rbind, lapply(input, function(.line){+     values <- as.integer(substring(.line, c(1, 7, 9), c(5, 7, 9)))
# assume family record
+     if (values[2] == '1'){
+         fid <<- values[1]
+         dwell <<- values[3]
+         return(NULL)
+     } else {
+         values <- as.integer(substring(.line, c(1, 7, 8, 11), c(5, 7, 9,
11)))
+         return(c(fid=fid, dwell=dwell, pid=values[1], age=values[3],
sex=values[4]))
+     }
+ }))>
> result        fid dwell pid age sex
 [1,]  6470     1   1  32   0
 [2,]  6470     1   2  30   1
 [3,]  7470     0   1  40   1
 [4,]  8470     0   1  27   0
 [5,]  9470     0   1  13   1
 [6,]  9470     0   2  22   0
 [7,]  9470     0   3  24   1
 [8,] 10470     1   1  20   0
 [9,] 10470     1   2  11   1
[10,] 11470     0   1  17   0
[11,] 11470     0   2  10   1
[12,] 11470     0   3  26   1


On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) <sabaric at charter.net>
wrote:>
> I would like to read the following hierarchical data set. There is a family
> record followed by one or more personal records.
> If col. 7 is "1" it is a family record. If it is "2" it
is a personal
> record.
> The family record is formatted as follows:
> col. 1-5 ? ? family id
> col. 7 ? ? ? ?"1"
> col. 9 ? ? ? ?dwelling type code
> The personal record is formatted as follows:
> col. 1-5 ? ? ? ?personal id
> col. 7 ? "2"
> col. 8-9 ? ? ? ?age
> col. 11 sex code
>
> The first six family and accompanying personal records look like this:
> 06470 1 1
> ? ?1 232 0
> ? ?2 230 1
> 07470 1 0
> ? ?1 240 1
> 08470 1 0
> ? ?1 227 0
> 09470 1 0
> ? ?1 213 1
> ? ?2 222 0
> ? ?3 224 1
> 10470 1 1
> ? ?1 220 0
> ? ?2 211 1
> 11470 1 0
> ? ?1 217 0
> ? ?2 210 1
> ? ?3 226 1
>
> I want to create a dataset containing
> . family ID
> . dwelling code
> . person ID
> . age
> . sex code
> The dataset will contain one observation per person, and the with family
> information repeated for people in the same family.
> Can anyone help?
> Thanks,
> Richard Saba
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Gabor Grothendieck

2010-Feb-07 23:30 UTC

head link

[R] Reading hierarchical data

Try this. It uses input defined in Jim's post and defines the rectype
of each row ("1" or "2").  It then reads the rectype
"1" records into
DF1 using read.fwf and the rectype "2" records into DF2 also using
read.fwf.  ix is defined to have one component per personal record
giving the row number in DF1 of the corresponding family.  We combine
DF1 and DF2 using ix and remove the column names that start with "X".

# record type ("1" or "2")
rectype <- substr(input, 7, 7)

# read in record type "1"
input1 <- input[rectype == "1"]
DF1 <- read.fwf(textConnection(input1), widths = c(5, 1, 1, 1, 1),
	col.names = c("familyid", "X", "X",
"X", "dwelling"))

# read in record type "2"
input2 <- input[rectype == "2"]
DF2 <- read.fwf(textConnection(input2), widths = c(5, 1, 1, 2, 1, 1),
	col.names = c("personalid", "X", "X",
"age", "X", "sex"))

# ix is the index in DF1 of family row corresponding to each personal row in DF2
ix <- cumsum(rectype == "1")[rectype == "2"]
DF <- cbind(DF1[ix,], DF2)
DF <- DF[substr(names(DF), 1, 1) != "X"]

so DF looks like this:
> DF    familyid dwelling personalid age sex
1       6470        1          1  32   0
1.1     6470        1          2  30   1
2       7470        0          1  40   1
3       8470        0          1  27   0
4       9470        0          1  13   1
4.1     9470        0          2  22   0
4.2     9470        0          3  24   1
5      10470        1          1  20   0
5.1    10470        1          2  11   1
6      11470        0          1  17   0
6.1    11470        0          2  10   1
6.2    11470        0          3  26   1

On Sun, Feb 7, 2010 at 10:57 AM, Saba(Home) <sabaric at charter.net>
wrote:>
> I would like to read the following hierarchical data set. There is a family
> record followed by one or more personal records.
> If col. 7 is "1" it is a family record. If it is "2" it
is a personal
> record.
> The family record is formatted as follows:
> col. 1-5 ? ? family id
> col. 7 ? ? ? ?"1"
> col. 9 ? ? ? ?dwelling type code
> The personal record is formatted as follows:
> col. 1-5 ? ? ? ?personal id
> col. 7 ? "2"
> col. 8-9 ? ? ? ?age
> col. 11 sex code
>
> The first six family and accompanying personal records look like this:
> 06470 1 1
> ? ?1 232 0
> ? ?2 230 1
> 07470 1 0
> ? ?1 240 1
> 08470 1 0
> ? ?1 227 0
> 09470 1 0
> ? ?1 213 1
> ? ?2 222 0
> ? ?3 224 1
> 10470 1 1
> ? ?1 220 0
> ? ?2 211 1
> 11470 1 0
> ? ?1 217 0
> ? ?2 210 1
> ? ?3 226 1
>
> I want to create a dataset containing
> . family ID
> . dwelling code
> . person ID
> . age
> . sex code
> The dataset will contain one observation per person, and the with family
> information repeated for people in the same family.
> Can anyone help?
> Thanks,
> Richard Saba
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Possibly Parallel Threads

Search for more possibly parallel threads

R help - Feb 2010 - Reading hierarchical data

[R] Reading hierarchical data

[R] Reading hierarchical data

[R] Reading hierarchical data

Possibly Parallel Threads