groemping@tfh-berlin.de
2006-Mar-05 08:50 UTC
[Rd] Wishlist: merge and subset to keep attributes (PR#8658)
Full_Name: Ulrike Grömping
Version: 2.2.1
OS: Windows
Submission from: (NULL) (84.190.139.94)

When importing data from SPSS, it is a nice feature of the package foreign that it allows (option use.value.labels=F) to work with the original SPSS codes while keeping the value labels as information in an attribute. Unfortunately, after merging or subsetting, these attributes disappear.

The code below illustrates the problem: Variable time originally has value labels that are gone after merging or subsetting.

It would be very helpful if this could be changed.

With kind regards, Ulrike
-----------------------------------------------------------------

data1 <- data.frame(id=c("Id1","Id2","Id3","Id4","Id5","Id6"),
                    time=c(3,4,3,5,9,4))
vallab <- c(3,4,5,9)
names(vallab) <- c("day","night","twilight","unknown")
attr(data1$time,"value.labels") <- vallab
str(data1)
## gives the output:
## `data.frame': 6 obs. of 2 variables:
## $ id : Factor w/ 6 levels "Id1","Id2","Id3",..: 1 2 3 4 5 6
## $ time: atomic 3 4 3 5 9 4
## ..- attr(*, "value.labels")= Named num 3 4 5 9
## .. ..- attr(*, "names")= chr "day" "night" "twilight" "unknown"

data2 <- data.frame(id=rep(c("Id1","Id2","Id3","Id4","Id5","Id6"),2),
                    y=rnorm(12))
merged <- merge(data1,data2)
## named "subsetted" so that the function subset() is not masked
subsetted <- subset(data1,id %in% c("Id2","Id4","Id6"))
str(merged)
## gives the output:
## `data.frame': 12 obs. of 3 variables:
## $ id : Factor w/ 6 levels "Id1","Id2","Id3",..: 1 1 2 2 3 3 4 4 5 5 ...
## $ time: num 3 3 4 4 3 3 5 5 9 9 ...
## $ y : num -0.621 -2.617 -0.980 0.486 -0.558 ...

str(subsetted)
## gives the output:
## `data.frame': 3 obs. of 2 variables:
## $ id : Factor w/ 6 levels "Id1","Id2","Id3",..: 2 4 6
## $ time: num 4 5 4
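Until merge and subset keep such attributes, one possible workaround is to copy them back by hand after the operation. The following is only a minimal sketch under that assumption: restore_value_labels is a hypothetical helper written for this illustration, not a function from base R or from package foreign, and it handles only the "value.labels" attribute on columns that keep their name.

```r
# Hypothetical helper (not part of base R or foreign): copy the
# "value.labels" attribute from each column of `from` to the
# same-named column of `to`.
restore_value_labels <- function(to, from) {
  for (nm in intersect(names(to), names(from))) {
    vl <- attr(from[[nm]], "value.labels")
    if (!is.null(vl)) attr(to[[nm]], "value.labels") <- vl
  }
  to
}

# Same example data as in the report above.
data1 <- data.frame(id = c("Id1", "Id2", "Id3", "Id4", "Id5", "Id6"),
                    time = c(3, 4, 3, 5, 9, 4))
vallab <- c(day = 3, night = 4, twilight = 5, unknown = 9)
attr(data1$time, "value.labels") <- vallab

# Subset as before, then put the labels back from the original data frame.
subsetted <- subset(data1, id %in% c("Id2", "Id4", "Id6"))
subsetted <- restore_value_labels(subsetted, data1)
attr(subsetted$time, "value.labels")
```

The same call can be applied to the result of merge(data1, data2), with data1 as the source of the labels. This does not address the wish itself, since the copy has to be repeated after every operation that drops the attribute.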
Frank E Harrell Jr
2006-Mar-05 14:24 UTC
[Rd] Wishlist: merge and subset to keep attributes (PR#8658)
> When importing data from SPSS, it is a nice feature of the package foreign
> that it allows (option use.value.labels=F) to work with the original SPSS
> codes while keeping the value labels as information in an attribute.
> Unfortunately, after merging or subsetting, these attributes disappear.
> The code below illustrates the problem: Variable time originally has value
> labels that are gone after merging or subsetting.
>
> It would be very helpful, if this could be changed.
>
> With kind regards, Ulrike
> -------------------------------

Ulrike - see the spss.get, label, contents, and describe functions in the Hmisc package.

--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
Ulrike Grömping
2006-Mar-12 16:51 UTC
[Rd] Wishlist: merge and subset to keep attributes (PR#8658)
> When importing data from SPSS, it is a nice feature of the package foreign
> that it allows (option use.value.labels=F) to work with the original SPSS
> codes while keeping the value labels as information in an attribute.
> Unfortunately, after merging or subsetting, these attributes disappear.
> The code below illustrates the problem: Variable time originally has value
> labels that are gone after merging or subsetting.
>
> It would be very helpful, if this could be changed.
>
> With kind regards, Ulrike
> -------------------------------
>
> Ulrike - see the spss.get, label, contents, and describe functions in
> the Hmisc package.
>
> --
> Frank E Harrell Jr
> Professor and Chair, School of Medicine
> Department of Biostatistics, Vanderbilt University
------- End of Original Message -------

For the sake of completeness of the thread in R-devel:

After a longer offline exchange, Frank and I have agreed that Hmisc spss.get currently does not offer more than read.spss from package foreign in terms of being able to use both original codes and value labels from SPSS files. Having both is desirable when working with large datasets from well-documented studies: these often require filtering rules based on the original codes, while at the same time one wants to preserve the annotation with value labels.

The solution from package foreign: The option "use.value.labels=F" prevents SPSS factors (with codes and value labels) from being read into R as factors. Instead, codes are read as numeric values, and the value labels are preserved by assigning an attribute "value.labels" to each such variable. My issue is that these attributes are lost when subsetting or merging such datasets. I have no idea how difficult it is to get this changed; if it is doable without too much hassle, it would be great.

And by the way - not mentioned in my wish - read.spss also assigns the attribute "variable.labels" to the dataset itself. This attribute is currently also lost when merging or subsetting. (Here, spss.get from Hmisc works differently: it assigns each variable a class and a label attribute, and these are preserved. I suspect that this makes spss.get substantially slower than read.spss; on the other hand, it makes it easier to use these labels in annotation.)

With kind regards, Ulrike
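The class-based idea mentioned above (give the variable a class so that a subsetting method can carry the attribute along) can be sketched as follows. This is an illustration only: the class name "vlabelled" and the constructor as_vlabelled are hypothetical names chosen for this example, and the code is not the implementation used by Hmisc or foreign.

```r
# Hypothetical constructor: attach value labels and a marker class.
as_vlabelled <- function(x, value.labels) {
  attr(x, "value.labels") <- value.labels
  class(x) <- c("vlabelled", oldClass(x))
  x
}

# Subsetting method for the hypothetical class: run the default
# subsetting, then put the attribute and the class back on the result.
"[.vlabelled" <- function(x, ...) {
  vl <- attr(x, "value.labels")
  cl <- oldClass(x)
  out <- NextMethod("[")
  attr(out, "value.labels") <- vl
  oldClass(out) <- cl
  out
}

vallab <- c(day = 3, night = 4, twilight = 5, unknown = 9)
time <- as_vlabelled(c(3, 4, 3, 5, 9, 4), vallab)

# The labels survive subsetting of the vector itself ...
part <- time[c(2, 4, 6)]
attr(part, "value.labels")

# ... and row subsetting of a data frame that holds it as a column.
data1 <- data.frame(id = c("Id1", "Id2", "Id3", "Id4", "Id5", "Id6"))
data1$time <- time   # assigned via $<- so no as.data.frame method is needed
sub1 <- data1[data1$id %in% c("Id2", "Id4", "Id6"), ]
attr(sub1$time, "value.labels")
```

A complete solution along these lines would need further methods (for example for as.data.frame), which is part of the extra machinery that a plain attribute avoids.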