Hello R users,

I'm trying to replace numerical values in a datamatrix with strings. R does this except for numbers under 10000 starting with a 9 (eg 98, 970, 9504 etc). This is really weird and I wondered whether someone had encountered such a problem or knows the solution. I'm using the next script:

test_1 <- read.table("5+ref_151111clusters3.csv", header = TRUE, sep = ",", colClasses = "numeric")
test_1[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"
test_1[test_1 != 0 & test_1 <= 18954] = "I8456"
test_1[test_1 > 75944 & test_1 <= 94885] = "KE3873"
test_1[test_1 > 56951 & test_1 <= 75944] = "KE3870"
test_1[test_1 > 37991 & test_1 <= 56951] = "Cyprus1"
test_1[test_1 > 18954 & test_1 <= 37991] = "ref"
write.table(test_1, file = "test_replace7.txt", quote = FALSE, sep="\t")

Thanks,
Set

--
View this message in context: http://r.789695.n4.nabble.com/R-ignores-number-only-with-a-nine-under-10000-tp4091936p4091936.html
Sent from the R help mailing list archive at Nabble.com.
R. Michael Weylandt
2011-Nov-21 17:03 UTC
[R] R ignores number only with a nine under 10000
This can't be reproduced without data -- kindly supply the result of test_1 right after the first line using dput() if you would.

Michael
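For readers unfamiliar with dput(), here is a minimal sketch of what such a reply could contain. The data frame below is made up for illustration (the poster's file is not available), and the column names a and b are assumptions.

```r
# Hypothetical stand-in for the poster's data; the real CSV is not available.
test_1 <- data.frame(a = c(0, 98, 25000), b = c(9504, 60000, 100000))

# dput() prints an R expression that recreates the object,
# so the output can be pasted straight into an email.
dput(head(test_1, 3))

# Round trip: capture the printed expression and evaluate it again.
txt <- paste(capture.output(dput(test_1)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# The rebuilt object should match the original.
all.equal(rebuilt, test_1)
```

Posting the dput() output rather than a printed table lets others recreate the object with its column types intact, which matters for a problem like this one.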
1) "datamatrix" is not a defined term. I think you mean "data.frame".

2) You have not supplied any sample data, so your example is not reproducible.

3) All of the values in a vector (i.e. a column of a data.frame) must be of the same type, be that character or numeric (or anything else, such as factor). We cannot tell what data you have in your file, but if you are already trying to mix numeric and strings then the data are probably being imported as factors, which act like numbers in some cases and as strings in others. You might need to look at the arguments for read.table to turn off conversion to factor.

Jeff Newmiller
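The point about a column holding only one type can be seen in a small sketch. The vector x below is made up for illustration and stands in for a single numeric column.

```r
# A vector (and so a data frame column) holds values of one type only.
x <- c(98, 25000, 100000)
class(x)   # "numeric"

# Assigning a string to a single element coerces the whole vector to character.
x[x > 94885] <- "KE3926OT"
class(x)   # "character"

# Every remaining number is now a string, so later comparisons on x
# are string comparisons, not numeric ones.
x
```

After the first assignment, any further test such as x <= 18954 no longer compares numbers.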
Hi:

Strictly a guess, but the following might be helpful. The call below assumes that the referent data frame is test1, which consists of a single column named x. Modify as appropriate.

test_lab <- with(test1, cut(x, c(0, 18954, 37991, 56951, 75944, 94885, 113835),
                            labels = c('I8456', 'ref', 'Cyprus1', 'KE3870', 'KE3873', 'KE3926OT')))

cut() creates a factor from a numeric variable. The second argument consists of the cut points and the third argument supplies the labels to be associated with values falling between the cut points. See ?cut for more details, and pay attention to the options.

The object test_lab is a vector external to test1; if you want it to be a column of test1, then add it to the data frame in one of the usual ways.

HTH,
Dennis
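A minimal sketch of the cut() approach on made-up values, using the cut points from the original script. The vector x and the names breaks and labs are illustrative assumptions, not the poster's data.

```r
# Made-up values standing in for one numeric column.
x <- c(98, 970, 9504, 25000, 60000, 100000)

# Cut points and labels taken from the original script.
breaks <- c(0, 18954, 37991, 56951, 75944, 94885, 113835)
labs   <- c("I8456", "ref", "Cyprus1", "KE3870", "KE3873", "KE3926OT")

# By default (right = TRUE) each interval is (lower, upper], which matches
# the "> lower & <= upper" tests in the script.
lab <- cut(x, breaks = breaks, labels = labs)

# cut() returns a factor; convert to character if plain strings are wanted.
as.character(lab)
```

Because the comparisons happen once, on the numeric values, the values starting with 9 are binned like any others. Note that a value of exactly 0 falls outside the first interval and becomes NA, so zeros would need separate handling if they must be kept.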
I think others have already hinted at the problem, but here it is once again more explicitly: your line

test_1[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"

converts the entire test_1 to character (or at least the columns in which a replacement happens). When something is a character, you will find "strange" results:

> a = "109"
> b = "9"
> a < b
[1] TRUE

Note that when one side of a comparison is numeric and the other character, the numeric is converted to character and then they are compared:

> b = 9
> class(a)
[1] "character"
> class(b)
[1] "numeric"
> a < b
[1] TRUE

This is why your entries starting with 9 are "ignored": compared as character strings, they sort after any string that starts with a smaller digit, so most of your "<=" tests fail for them.

The solution is simple: create a test_2 initialized to test_1:

test_2 = test_1

then replace elements in test_2 depending on test_1, for example

test_2[test_1 > 94885 & test_1 <= 113835] = "KE3926OT"

This way your test_1 remains numeric and the comparisons will work as you expect.

HTH
Peter
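The copy-then-replace idea can be sketched end to end as follows. The data frame is made up for illustration (columns a and b are assumptions); the thresholds and labels come from the original script.

```r
# Made-up data frame standing in for the poster's file.
test_1 <- data.frame(a = c(0, 98, 25000), b = c(9504, 60000, 100000))

# Strings are compared character by character, so "9..." sorts after "1...".
"98" <= "18954"   # FALSE, although 98 <= 18954 is TRUE for numbers

# Keep test_1 numeric for all tests and write the labels into a copy.
test_2 <- test_1
test_2[test_1 > 94885 & test_1 <= 113835] <- "KE3926OT"
test_2[test_1 != 0    & test_1 <= 18954]  <- "I8456"
test_2[test_1 > 75944 & test_1 <= 94885]  <- "KE3873"
test_2[test_1 > 56951 & test_1 <= 75944]  <- "KE3870"
test_2[test_1 > 37991 & test_1 <= 56951]  <- "Cyprus1"
test_2[test_1 > 18954 & test_1 <= 37991]  <- "ref"

# test_1 is untouched; test_2 holds the labels (zeros stay as "0").
test_2
```

Every condition is evaluated against the unchanged numeric test_1, so the order of the replacement lines no longer matters.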
Thank you, everybody! In the end Peter's trick did it! Thanks!