Hi All,

I have the following problem, that's driving me mad.

I have a dataframe of factors, from a genetic scan of SNPs. I DO have
NAs in the dataframe, which would look like:

   V4 V5 V6 V7   V8   V9 V10
1  TT GG TT AC   AG   AG  TT
2  AT CC TT AA   AA   AA  TT
3  AT CC TT AC   AA <NA>  TT
4  TT CC TT AA   AA   AA  TT
5  AT CG TT CC   AA   AA  TT
6  TT CC TT AA   AA   AA  TT
7  AT CC TT CC <NA> <NA>  TT
8  TT CC TT AC   AG   AG  TT
9  AT CC TT CC   AG <NA>  TT
10 TT CC TT CC   GG   GG  TT

In the dataframe I have 1 column where one factor has been erroneously
given alternative readings: CG and GC.

I want to change the instances of GC to CG and I use the code:

data[data[,30]=="GC", 30] = "CG"

but get the error:

Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
missing values are not allowed in subscripted as

Any hints?

Cheers,

Federico

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

Federico Calboli <f.calboli at imperial.ac.uk> writes:

> I want to change the instances of GC to CG and I use the code:
>
> data[data[,30]=="GC", 30] = "CG"
>
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
> missing values are not allowed in subscripted as
>
> Any hints?

data[isTRUE(data[,30]=="GC"), 30] = "CG"

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
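
A note on the one-liner in the reply above: isTRUE() asks whether its whole argument is a single TRUE, so for a column with more than one row it returns one FALSE and the assignment selects no rows. A minimal sketch of a variant that wraps the comparison in which() instead, so that only the positions that are actually TRUE are used as the row index. The data frame `d` and the column name `snp` are hypothetical stand-ins for the poster's data[, 30], not part of the original thread.

```r
# Hypothetical stand-in for the poster's column 30: a factor containing NAs
d <- data.frame(snp = factor(c("CG", "GC", NA, "CC", "GC")))

# The comparison propagates NA; an NA row index is what
# "[<-.data.frame" refuses to accept
cmp <- d$snp == "GC"

# isTRUE() collapses the whole vector to a single logical value
collapsed <- isTRUE(cmp)

# which() returns only the positions that are TRUE, dropping NA and FALSE
hits <- which(cmp)
d[hits, "snp"] <- "CG"

# "GC" is now an unused level; droplevels() removes it
d$snp <- droplevels(d$snp)
```

The NA entries stay NA; only the rows that really held GC are rewritten.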

On Fri, 28 Oct 2005, Federico Calboli wrote:

> I want to change the instances of GC to CG and I use the code:
>
> data[data[,30]=="GC", 30] = "CG"
>
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
> missing values are not allowed in subscripted as
>
> Any hints?

1) Use %in% not ==

2) (Better) As this is a factor, use levels<- to merge the levels.
   See ?levels.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
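
Both suggestions in the reply above can be sketched on a toy column. This is only an illustration under assumptions: the data frame `d` and the column name `snp` are hypothetical stand-ins for the poster's data[, 30].

```r
# Hypothetical stand-in for the poster's column 30
d <- data.frame(snp = factor(c("CG", "GC", NA, "CC", "GC")))

# Option 1: %in% never returns NA, so the result is a safe logical row index
idx <- d$snp %in% "GC"
by_index <- d
by_index[idx, "snp"] <- "CG"

# Option 2: rename the level. When levels<- is given a duplicated name,
# the two levels are merged into one
by_levels <- d
levels(by_levels$snp)[levels(by_levels$snp) == "GC"] <- "CG"
```

Option 1 leaves GC behind as an unused level, while option 2 removes it as part of the merge, which may be why the reply calls the second approach better for a factor.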

Federico,

There doesn't appear to be an instance of the value you want to change
in your example, so I had to improvise. Part of the problem may be that
the dataframe is composed of factors, and it's not possible to change
the value of a factor to a value that's not in its set of possible
values, as given by the levels() function. So, if you want to change GC
to CG, but CG does not already exist in the set of possible values,
you'll have to add it first. E.g.

 > tmp <- data
 > levels(tmp[,30]) <- c(levels(data[,30]),'CG')

Then, if the problem only occurs in one column, it's an easy fix:

 > tmp[data=='GC'] <- 'CG'

If GC occurs in multiple columns you'll either have to change the
levels for each column as I did just above, or work with a single
column. Since you don't have 30 columns in your example, let's pretend
you want to change all the instances of 'CC' in data$V5 to 'XX':

 > tmp <- data
 > levels(tmp$V5) <- c(levels(data$V5),'XX')
 > tmp$V5[data$V5=='CC'] <- 'XX'
 > tmp
    V4 V5 V6 V7   V8   V9 V10
 1  TT GG TT AC   AG   AG  TT
 2  AT XX TT AA   AA   AA  TT
 3  AT XX TT AC   AA <NA>  TT
 4  TT XX TT AA   AA   AA  TT
 5  AT CG TT CC   AA   AA  TT
 6  TT XX TT AA   AA   AA  TT
 7  AT XX TT CC <NA> <NA>  TT
 8  TT XX TT AC   AG   AG  TT
 9  AT XX TT CC   AG <NA>  TT
 10 TT XX TT CC   GG   GG  TT

Notice that the instances of 'CC' in tmp$V7 did not change.

HTH, Dave Roberts

Federico Calboli wrote:

> I want to change the instances of GC to CG and I use the code:
>
> data[data[,30]=="GC", 30] = "CG"
>
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
> missing values are not allowed in subscripted as
>
> Any hints?
>
> Cheers,
>
> Federico

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                            office 406-994-4548
Professor and Head                          FAX 406-994-3190
Department of Ecology                       email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460
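
The point about levels in the reply above can be checked on a small factor. A minimal sketch, assuming a hypothetical vector `f` rather than the poster's data: assigning a value that is not among the levels produces NA (with a warning), while extending the levels first lets the assignment go through.

```r
# Hypothetical factor with one missing value
f <- factor(c("CC", "GG", NA, "CC"))

# Assigning a value outside levels(f) gives NA and a warning,
# silenced here so the sketch runs quietly
bad <- f
suppressWarnings(bad[1] <- "XX")

# Appending the new level first makes the assignment legal;
# which() keeps the NA position out of the index
ok <- f
levels(ok) <- c(levels(ok), "XX")
ok[which(ok == "CC")] <- "XX"
```

After the second assignment 'CC' remains as an unused level of `ok`; droplevels(ok) would remove it if that matters downstream.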