I have genetic information for several thousand individuals:

A/T
T/G
C/G
etc.

For some individuals there are genotypes like A/, C/, T/, G/, or even just /, which represents missing, and I want to change these to the following:

A/    A/.
C/    C/.
G/    G/.
T/    T/.
/     ./.
/A    ./A
/C    ./C
/G    ./G
/T    ./T

I've tried to use gsub with a command like the following:

gsub("A/","[A/.]", GT[,6])

but if genotypes aren't like the above, the command will change them to look something like:

A/.T
T/.G
C/.G

Is there any way to be more specific in gsub?

Thanks!
Hi,

Briefly, you need to read about regular expressions. It's possible to be incredibly specific, and even to do what you want with a single line of code.

It's hard to be certain of exactly what you need, though, without a reproducible example. See inline for one possibility.

On Fri, Dec 5, 2014 at 2:24 PM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
> I've tried to use gsub with a command like the following:
>
> gsub("A/","[A/.]", GT[,6])

I don't understand why you put square brackets in, and you probably want the end marker ($) to distinguish A/ from A/A:

gsub("A/$","A/.", GT[,6])

> Is there any way to be more specific in gsub?

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
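To illustrate the anchoring suggestion above, here is a minimal sketch. The vector `gt` is made-up sample data standing in for `GT[,6]`, and the character class `[ACGT]` with a backreference is one possible way to cover all four bases in a single call; it is an assumption about the data, not something stated in the thread.

```r
# Hypothetical sample genotypes standing in for GT[,6]
gt <- c("A/T", "A/", "A/A", "C/", "G/", "T/")

# Unanchored: "A/" also matches inside complete genotypes such as "A/T"
unanchored <- gsub("A/", "A/.", gt)
print(unanchored)

# Anchored: "^" and "$" require the whole value to be one base plus a slash.
# "\\1" puts the captured base back into the replacement.
anchored <- sub("^([ACGT])/$", "\\1/.", gt)
print(anchored)
```

Only values consisting of exactly one base and a trailing slash change here; a bare `/` or a missing first allele would still need separate patterns.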
On 12/5/2014 11:24 AM, Kate Ignatius wrote:
> I've tried to use gsub with a command like the following:
>
> gsub("A/","[A/.]", GT[,6])

Hi Kate -- a different approach is to create a 'map' (named character vector) describing what you want in terms of what you have; the number of possible genotypes is not large.

http://stackoverflow.com/questions/15912210/replace-a-list-of-values-by-another-in-r/15912309#15912309

Martin

> Is there any way to be more specific in gsub?
>
> Thanks!
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
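The map idea above could be sketched roughly as follows. The names `gt`, `bases`, `map`, `hit`, and `recoded` are illustrative only, and the sample data is invented; the map is built from the nine incomplete genotypes listed in the original question.

```r
# Hypothetical sample genotypes
gt <- c("A/T", "A/", "/", "/G", "C/G", "T/")

bases <- c("A", "C", "G", "T")

# Named character vector: names are what you have, values are what you want
map <- c(
  setNames(paste0(bases, "/."), paste0(bases, "/")),  # "A/" -> "A/."
  setNames(paste0("./", bases), paste0("/", bases)),  # "/A" -> "./A"
  "/" = "./."                                         # fully missing
)

# Recode only the values that appear in the map; leave the rest untouched
hit <- gt %in% names(map)
recoded <- gt
recoded[hit] <- unname(map[gt[hit]])
print(recoded)
```

Because the lookup is by exact match, complete genotypes such as `A/T` can never be altered by accident, which is the failure mode the unanchored `gsub` call ran into.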
Does the following do what you want?

> raw <- c("A/B", " /B", "A/", "/ ")
> tmp <- sub("^ */", "./", raw)
> cleaned <- sub("/ *$", "/.", tmp)
> cleaned
[1] "A/B" "./B" "A/." "./."

(The " *" is to allow optional spaces before or after the slash.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Dec 5, 2014 at 11:24 AM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
> Is there any way to be more specific in gsub?
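For applying the two-step substitution above to a column of a data frame, something along these lines might work. The helper name `fill_missing_alleles`, the data frame `GT`, and its `genotype` column are all hypothetical; the original post only refers to `GT[,6]` without showing its structure.

```r
# Hypothetical helper wrapping the two sub() calls shown above
fill_missing_alleles <- function(x) {
  x <- sub("^ */", "./", x)  # slash at the start: first allele missing
  sub("/ *$", "/.", x)       # slash at the end: second allele missing
}

# Invented example data; the real GT has more columns
GT <- data.frame(
  id = 1:4,
  genotype = c("A/T", "/G", "C/", "/"),
  stringsAsFactors = FALSE
)

GT$genotype <- fill_missing_alleles(GT$genotype)
print(GT$genotype)
```

Running the start-of-string substitution first means a bare `/` becomes `./` and is then completed to `./.` by the second call.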