boule
2011-May-04 15:55 UTC
[R] Outlier removal by Principal Component Analysis : error message
Hi,

I am currently analysing Raman spectroscopic data with the hyperSpec package. I consulted the package documentation and found an example workflow dedicated to Raman spectroscopy (see http://hyperspec.r-forge.r-project.org/chondro.pdf).

I am trying to remove outliers by PCA just as the documentation does, but I get an error message I can't explain. Here is my code:

    # import the data
    T = read.table('bladder bis concatenation colonne.txt', header = TRUE)
    spec = new("hyperSpec", wavelength = T[, 1], spc = t(T[, -1]),
               data = data.frame(sample = colnames(T[, -1])),
               label = list(.wavelength = "Raman shift (cm-1)",
                            spc = "Intensity (a.u.)"))

    # baseline correction of the spectra
    spec = spec[,, 500 ~ 1800]
    bl = spc.fit.poly.below(spec)
    spec = spec - bl

    # normalization of the spectra
    spec = sweep(spec, 1, apply(spec, 1, mean), '/')

    # PCA
    pca = prcomp(~ spc, data = spec$., center = TRUE)
    scores = decomposition(spec, pca$x, label.wavelength = "PC",
                           label.spc = "score/a.u.")
    loadings = decomposition(spec, t(pca$rotation), scores = FALSE,
                             label.spc = "loading I/a.u.")

    # plot the scores of the first 20 PCs against each other to get an idea
    # of where to find the outliers
    pairs(scores[[,, 1:20]], pch = 19, cex = 0.5)

    # identify the outliers with "map.identify"
    out = map.identify(scores[,, 5])

    Error in `[.data.frame`(x@data, , j, drop = FALSE) :
      undefined columns selected

Does anybody understand where the problem comes from? And does anybody know another way to find outlier spectra?

Thank you in advance.

Boule
Claudia Beleites
2011-May-05 13:01 UTC
[R] Outlier removal by Principal Component Analysis : error message
Dear Boule,

thank you for your interest in hyperSpec.

In order to look into your *problem* I need some more information. I suggest that we solve the error off-list. Please note also that hyperSpec has its own help mailing list: hyperspec-help at lists.r-forge.r-project.org (due to the amount of spam I had to moderate, you need to subscribe first here: https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/hyperspec-help)

- Which version of hyperSpec do you use? If it is the version from CRAN, could you please update to the development version at r-forge with

    install.packages("hyperSpec", repos = "http://R-Forge.R-project.org")

- Next, if the problem persists with the latest build, could you send me the raw data file so that I can exactly reproduce your problem?

- Also, for tracking down the exact source of the error, please execute traceback() right after you get the error and email me its output.

It is basically impossible to give general recommendations about *outlier detection*: a few spectra that are very different from all other spectra may be outliers, or they may be the target of a study...

This is also why the example in the vignette uses a two-step procedure: PCA only identifies suspects, i.e. spectra whose scores for some principal components are very different from those of all other spectra. The second step is a manually supervised decision whether the spectrum is really an outlier.

The first step could be replaced by other measures, which however depend on your data. E.g. if you expect/know your data to consist of different clusters, suspects could be spectra that are too far away from any cluster. If your data comes from a mixture of a few components, spectra that cannot be modeled decently by a few PLS components could be suspicious. Or spectra that require a component of their own, ...

Some kinds of outliers are actually well-defined in a spectroscopic sense, e.g. contamination by fluorescent lamp light.

The second step could be replaced by an automatic decision, e.g. with a distance threshold. Personally, I rather use the term filtering for such automatic rules. And there you can think of any number of rules your spectra must comply with in order to be acceptable: signal to noise ratio, minimal and maximal intensity, original offset (baseline) below some limit, ...

Hope that helps,

Claudia

--
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax: +49 2641 206-399
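
The two-step procedure described above (PCA flags suspects, a person then decides) can be sketched in base R without hyperSpec. This is only a minimal sketch under stated assumptions, not the vignette's code: the spectra are simulated, `flag_pca_suspects`, `n_pc` and `cutoff` are hypothetical names, and the robust z-score threshold is an assumed stand-in for inspecting the `pairs()` plot of the scores by eye.

```r
# Hypothetical helper (not part of hyperSpec): flag spectra whose scores on
# any of the first n_pc principal components lie far from the bulk.
# spc is a plain matrix with one spectrum per row.
flag_pca_suspects <- function(spc, n_pc = 5, cutoff = 6) {
  pca <- prcomp(spc, center = TRUE)
  n_pc <- min(n_pc, ncol(pca$x))
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]
  # Robust z-scores per component: median and MAD are much less affected
  # by the suspects themselves than mean and standard deviation.
  z <- apply(scores, 2, function(s) abs(s - median(s)) / mad(s))
  # A spectrum is a suspect if it is extreme on at least one component.
  which(apply(z, 1, max) > cutoff)
}

# Simulated data (an assumption for illustration): 50 noisy spectra with one
# band of varying height, plus one spectrum carrying an extra narrow line.
set.seed(1)
wl <- seq(500, 1800, length.out = 200)   # Raman shift axis / cm-1
band <- exp(-((wl - 1000) / 30)^2)
spc <- t(replicate(50, runif(1, 0.8, 1.2) * band + rnorm(200, sd = 0.01)))
spc[7, ] <- spc[7, ] + 0.5 * exp(-((wl - 1500) / 10)^2)

# Step 1: automatic list of suspects.
suspects <- flag_pca_suspects(spc)
print(suspects)

# Step 2 stays manual: plot each suspect and decide whether it really is
# an outlier or rather the interesting part of the data.
# matplot(wl, t(spc[suspects, , drop = FALSE]), type = "l")
```

With a hyperSpec object, the spectra matrix would presumably be taken from `spec$spc` before calling the helper. The cutoff of 6 is arbitrary and would need tuning to the data; replacing step 2 by a fixed threshold turns the procedure into what the reply calls filtering.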