axel.benz@iao.fhg.de
2003-Sep-16 21:40 UTC
[Rd] How does "subset" replace arguments? (PR#4193)
Full_Name: Axel Benz
Version: 1.7.1
OS: Windows
Submission from: (NULL) (137.251.33.43)

Hello,
I guess many people will answer me again that this is an S language feature, but I am only a stupid computer scientist and I simply do not understand this logic, despite reading a lot about S:

> test
    field tuckey
4  Kreis2     -1
5  Kreis5     -2
9  Metall     -3
17 Kreis1     -4
19 Kreis8     -5

> subset(test,field=="Metall")
   field tuckey
9 Metall     -3

> subset(test,toString(field)=="Metall")
[1] field  tuckey
<0 rows> (or 0-length row.names)

This happens every time I use a function with the column name ("field", in this case) as a parameter in the logical expression in "subset", instead of using the column name at top level. I have the impression that the column name is only replaced when it stands in top-level position. I would call that "very lazy evaluation" ;-) ;-)

Thank you for a friendly answer, this language is really weird to me.
On Tuesday 16 September 2003 14:39, axel.benz@iao.fhg.de wrote:
> Full_Name: Axel Benz
> Version: 1.7.1
> OS: Windows
> Submission from: (NULL) (137.251.33.43)
>
> Hello,
> I guess many people will answer me again that this is an S language feature, but
> I am only a stupid computer scientist and I simply do not understand this logic,
> despite reading a lot about S:
>
> > test
>     field tuckey
> 4  Kreis2     -1
> 5  Kreis5     -2
> 9  Metall     -3
> 17 Kreis1     -4
> 19 Kreis8     -5
>
> > subset(test,field=="Metall")
>    field tuckey
> 9 Metall     -3
>
> > subset(test,toString(field)=="Metall")
> [1] field  tuckey
> <0 rows> (or 0-length row.names)

I don't see any problem here. toString(field), evaluated in the data frame test, should be the single string

    "Kreis2, Kreis5, Metall, Kreis1, Kreis8"

So the comparison toString(field)=="Metall" actually does

    "Kreis2, Kreis5, Metall, Kreis1, Kreis8" == "Metall"

which, being false, returns FALSE, and so you finally should get

    subset(test, FALSE)

which is what you do get. Perhaps you misunderstood what the function toString() does.

HTH,
Deepayan

P.S. Please don't use R-bugs to report what may or may not be bugs, since all such reports have to be processed manually. Ask on r-help or r-devel first if you are not sure.
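The explanation above can be checked directly. The following is a minimal sketch, assuming a data frame rebuilt from the printed output in the report; the way `test` is constructed here (character column, explicit row names) and the variable name `collapsed` are assumptions, since the original object is not shown.

```r
# Rebuild a data frame resembling the one in the report. The values come
# from the printed output; the construction itself is an assumption.
test <- data.frame(
  field  = c("Kreis2", "Kreis5", "Metall", "Kreis1", "Kreis8"),
  tuckey = c(-1, -2, -3, -4, -5),
  row.names = c("4", "5", "9", "17", "19")
)

# toString() collapses the whole column into ONE comma-separated string.
collapsed <- toString(test$field)
length(collapsed)         # 1
collapsed == "Metall"     # a single FALSE

# Inside subset() that single FALSE is recycled over all rows,
# so no row is kept.
nrow(subset(test, toString(field) == "Metall"))   # 0

# An element-wise conversion returns one value per row, so the
# comparison yields one TRUE/FALSE per row and the match is found.
subset(test, as.character(field) == "Metall")
```

The column name is found in both calls; only the length of the comparison result differs.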
On Tue, 16 Sep 2003 axel.benz@iao.fhg.de wrote:
> Full_Name: Axel Benz
> Version: 1.7.1
> OS: Windows
> Submission from: (NULL) (137.251.33.43)
>
> Hello, I guess many people will answer me again that this is an S
> language feature, but I am only a stupid computer scientist and I simply
> do not understand this logic, despite reading a lot about S:

The point they are trying to make is that you should send this sort of question to r-devel or r-help, not r-bugs. The point of r-bugs is as a repository for bug reports, not as a discussion list.

> > test
>     field tuckey
> 4  Kreis2     -1
> 5  Kreis5     -2
> 9  Metall     -3
> 17 Kreis1     -4
> 19 Kreis8     -5
>
> > subset(test,field=="Metall")
>    field tuckey
> 9 Metall     -3
>
> > subset(test,toString(field)=="Metall")
> [1] field  tuckey
> <0 rows> (or 0-length row.names)
>
> This happens every time I use a function with the column name ("field", in this
> case) as a parameter in the logical expression in "subset", instead of using the
> column name at top level. I have the impression that the column name is only
> replaced when it stands in top-level position. I would call that "very lazy
> evaluation" ;-) ;-)
> Thank you for a friendly answer, this language is really weird to me.

Your impression is incorrect. The problem with toString is that it collapses a vector to a single string, so toString(field) is the string "Kreis2, Kreis5, Metall, Kreis1, Kreis8". There is no record whose `field' is equal to that string. Did you check to see that toString did what you thought it did?

subset() will work as I think you expect if the output of the function is the same length as the input.
For example, consider one of the built-in data sets

data(esoph)
> subset(esoph, toString(agegp)=="75+")
[1] agegp     alcgp     tobgp     ncases    ncontrols
<0 rows> (or 0-length row.names)

but

> subset(esoph, as.character(agegp)=="75+")
   agegp     alcgp    tobgp ncases ncontrols
78   75+ 0-39g/day 0-9g/day      1        18
79   75+ 0-39g/day    10-19      2         6
80   75+ 0-39g/day      30+      1         3
81   75+     40-79 0-9g/day      2         5
82   75+     40-79    10-19      1         3
83   75+     40-79    20-29      0         3
84   75+     40-79      30+      1         1
85   75+    80-119 0-9g/day      1         1
86   75+    80-119    10-19      1         1
87   75+      120+ 0-9g/day      2         2
88   75+      120+    10-19      1         1

or to take a really extreme version

> subset(esoph, substr(paste(as.character(agegp),toupper(as.character(agegp))),3,6)== "+ 75")
   agegp     alcgp    tobgp ncases ncontrols
78   75+ 0-39g/day 0-9g/day      1        18
79   75+ 0-39g/day    10-19      2         6
80   75+ 0-39g/day      30+      1         3
81   75+     40-79 0-9g/day      2         5
82   75+     40-79    10-19      1         3
83   75+     40-79    20-29      0         3
84   75+     40-79      30+      1         1
85   75+    80-119 0-9g/day      1         1
86   75+    80-119    10-19      1         1
87   75+      120+ 0-9g/day      2         2
88   75+      120+    10-19      1         1

	-thomas
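To make the point about column lookup concrete, here is a rough sketch of the kind of evaluation subset() performs. The helper `my_subset` is hypothetical and written only for illustration; it is not the source of subset(), and it assumes the usual substitute()/eval() idiom of capturing the condition unevaluated and evaluating it with the data frame's columns in scope.

```r
# Hypothetical helper for illustration only; NOT the actual code of subset().
my_subset <- function(data, condition) {
  # Capture the condition as an unevaluated expression.
  expr <- substitute(condition)
  # Evaluate the whole expression with the columns of `data` visible.
  # Names are looked up at every nesting depth, not only at top level.
  keep <- eval(expr, data, parent.frame())
  if (!is.logical(keep)) stop("condition must evaluate to a logical vector")
  # Treat NA as FALSE and keep the matching rows.
  data[keep & !is.na(keep), , drop = FALSE]
}

data(esoph)

# One logical value per row: the rows with agegp "75+" are selected.
my_subset(esoph, as.character(agegp) == "75+")

# A single logical value (FALSE here), recycled over all rows: nothing selected.
my_subset(esoph, toString(agegp) == "75+")
```

In this sketch `agegp` is resolved inside as.character() and inside toString() alike, so nesting does not matter; what matters is whether the result has one element per row.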