Hi All,
I have the following problem, that's driving me mad.
I have a dataframe of factors, from a genetic scan of SNPs. I DO have
NAs in the dataframe, which would look like:
   V4 V5 V6 V7   V8   V9 V10
1  TT GG TT AC   AG   AG  TT
2  AT CC TT AA   AA   AA  TT
3  AT CC TT AC   AA <NA>  TT
4  TT CC TT AA   AA   AA  TT
5  AT CG TT CC   AA   AA  TT
6  TT CC TT AA   AA   AA  TT
7  AT CC TT CC <NA> <NA>  TT
8  TT CC TT AC   AG   AG  TT
9  AT CC TT CC   AG <NA>  TT
10 TT CC TT CC   GG   GG  TT
In the dataframe I have 1 column where one factor has been erroneously
given alternative readings: CG and GC.
I want to change the instances of GC to CG and I use the code:
data[data[,30]=="GC", 30] = "CG"
but get the error:
Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
missing values are not allowed in subscripted as
Any hints?
Cheers,
Federico
--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
Federico Calboli <f.calboli at imperial.ac.uk> writes:

> I want to change the instances of GC to CG and I use the code:
>
> data[data[,30]=="GC", 30] = "CG"
>
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
> missing values are not allowed in subscripted as
>
> Any hints?

data[isTRUE(data[,30]=="GC"), 30] = "CG"

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
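
A caution on the isTRUE() suggestion: isTRUE() checks whether its entire argument is a single TRUE, so when it is given a whole-column comparison it returns one FALSE and no rows are changed. Below is a minimal sketch of an NA-safe alternative using which(), which keeps only the positions that are TRUE. The one-column data frame and its values are invented here to stand in for column 30 of the real data.

```r
# Toy stand-in for column 30 of the poster's data frame (values are assumed).
data <- data.frame(V30 = factor(c("CG", "GC", NA, "CC", "GC")))

# Comparing a factor that contains NA gives a logical vector with an NA in it.
idx <- data[, 1] == "GC"

# isTRUE() asks "is this exactly one TRUE?", so for a vector of
# length > 1 the answer is a single FALSE and nothing would be selected.
whole <- isTRUE(idx)

# which() returns the positions where idx is TRUE and drops the NA.
rows <- which(idx)
data[rows, 1] <- "CG"

fixed <- as.character(data[, 1])
print(fixed)
```

The unused level "GC" stays in levels(data[, 1]) after this; droplevels() would remove it if that matters.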
On Fri, 28 Oct 2005, Federico Calboli wrote:

> I want to change the instances of GC to CG and I use the code:
>
> data[data[,30]=="GC", 30] = "CG"
>
> but get the error:
> Error in "[<-.data.frame"(`*tmp*`, all[, 30] == "GC", 30
> missing values are not allowed in subscripted as
>
> Any hints?

1) Use %in% not ==

2) (Better) As this is a factor, use levels<- to merge the levels.
   See ?levels.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
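
Both of these suggestions can be sketched on a small example. This is an illustration only: the one-column data frame, its values, and the names by_match and by_levels are invented here and are not from the original data.

```r
# Toy stand-in for the affected column (values are assumed).
data <- data.frame(V30 = factor(c("CG", "GC", NA, "CC", "GC")))

# 1) %in% never returns NA: an NA element simply gives FALSE,
#    so the logical index is safe to use in the assignment.
by_match <- data
by_match[by_match[, 1] %in% "GC", 1] <- "CG"

# 2) Merge the levels: rename "GC" to "CG" in the level set.
#    Assigning duplicate level names collapses them into one level.
by_levels <- data
lev <- levels(by_levels$V30)
lev[lev == "GC"] <- "CG"
levels(by_levels$V30) <- lev

print(levels(by_match$V30))   # "GC" is still listed as an (unused) level
print(levels(by_levels$V30))  # "GC" is gone from the level set
```

The second form needs no row indexing at all, so the NAs never come into play, and it leaves no stale "GC" level behind.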
Federico,
There doesn't appear to be an instance of the value you want to
change in your example, so I had to improvise. Part of the problem may
be that the dataframe is composed of factors, and it's not possible to
convert the value of a factor to another value unless that value is
already in the set of possible values given by the levels() function.
So, if you want to change GC to CG, but CG does not already exist in
the set of possible values, you'll have to add it. E.g.
> tmp <- data
> levels(tmp[,30]) <- c(levels(data[,30]),'CG')
then, if the problem only occurs in one column it's an easy fix.
> tmp[data=='GC'] <- 'CG'
If GC occurs in multiple columns you'll either have to change the levels
for each column as I did just above, or work with a single column.
Since you don't have 30 columns in your example, let's pretend you want
to change all the instances of 'CC' in data$V5 to 'XX'
> tmp <- data
> levels(tmp$V5) <- c(levels(data$V5),'XX')
> tmp$V5[data$V5=='CC'] <- 'XX'
> tmp
   V4 V5 V6 V7   V8   V9 V10
1  TT GG TT AC   AG   AG  TT
2  AT XX TT AA   AA   AA  TT
3  AT XX TT AC   AA <NA>  TT
4  TT XX TT AA   AA   AA  TT
5  AT CG TT CC   AA   AA  TT
6  TT XX TT AA   AA   AA  TT
7  AT XX TT CC <NA> <NA>  TT
8  TT XX TT AC   AG   AG  TT
9  AT XX TT CC   AG <NA>  TT
10 TT XX TT CC   GG   GG  TT
Notice that the instances of 'CC' in tmp$V7 did not change.
HTH, Dave Roberts
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts office 406-994-4548
Professor and Head FAX 406-994-3190
Department of Ecology email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460
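
Dave Roberts' point about factor levels can be checked directly. Below is a minimal sketch on an invented factor; the names f, bad, good and warned are illustrative only. Assigning a value that is not among the levels produces NA with a warning, while adding the level first lets the same assignment succeed.

```r
# Invented example factor; not taken from the original data.
f <- factor(c("CC", "GC", NA, "CC"))

# Assigning a value that is not in levels(f) yields NA plus a warning.
bad <- f
warned <- FALSE
withCallingHandlers(
  bad[2] <- "XX",
  warning = function(w) {
    warned <<- TRUE                  # record that R complained
    invokeRestart("muffleWarning")   # then carry on quietly
  }
)

# Extending the level set first makes the same assignment valid.
good <- f
levels(good) <- c(levels(good), "XX")
good[2] <- "XX"

print(bad)
print(good)
```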