B. Bogart
2008-May-31 16:54 UTC
[R] Advice for working with Sammon's Projection on image data
Hello all,

I'm working on a project that uses a SOM to generate feature maps from image data. Each image is 100x75 pixels in RGBA, so 30,000 elements per image. I want to observe the structure of the pixel-by-pixel Euclidean distances between my images, where each image is a point in 30,000-dimensional space.

Sammon's projection seems appropriate for this, though I'm a bit concerned that 400 images with 30,000 dimensions may be too large for the algorithm. I'm planning on publishing only B+W images, so I could throw away the colour channels to make each image 7,500 dimensions.

I'm also not sure how to structure my data to make it easiest to use with Sammon's projection. A 30,000-dimensional matrix for each image occurs to me first, but I'm not sure what the best data format is for that. I'm currently using two formats. The raw file has 30,000 variables and one observation per image. This is converted to a data frame where each pixel is an observation, with variables for its position within the image and for which image contains it; that data frame is used for ggplot2 tile plotting.

What I am attempting to do is compare the topology of my test data to the topology of the SOM weights trained on that data. The projections should be similar if topology is being preserved, correct?

Any advice appreciated.

Thanks,
B. Bogart
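
To make the data-layout question concrete, here is a minimal sketch of one possible arrangement. It assumes the `sammon()` function from the MASS package is the implementation in question, and it uses simulated grayscale values in place of the real images; `imgs`, `n_images`, and `n_pixels` are hypothetical names. The images are stored as one matrix with a row per image, so `dist()` reduces them to pairwise distances between images before the projection runs. The projection itself then works on those distances rather than on the pixel columns.

```r
library(MASS)  # provides sammon(); assumed here, not confirmed as the poster's choice

set.seed(1)
n_images <- 40        # small stand-in for the 400 images in the post
n_pixels <- 100 * 75  # 7,500 values per image once colour is dropped

# Hypothetical stand-in data: one row per image, one column per pixel.
imgs <- matrix(runif(n_images * n_pixels), nrow = n_images, ncol = n_pixels)

# dist() computes Euclidean distances between rows (i.e. between images).
d <- dist(imgs)

# sammon() stops on zero distances, so identical images would need removing first.
stopifnot(all(d > 0))

# Project to two dimensions; the default starting layout comes from cmdscale().
proj <- sammon(d, k = 2, trace = FALSE)

dim(proj$points)  # one row of 2-D coordinates per image
proj$stress       # final Sammon stress of the layout
```

The same call could in principle be applied to the SOM weight vectors, arranged one row per unit, to get a second layout for comparison.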