thr3ads.net - R help - [R] Remove error data and clustering analysis [Mar 2009]

guodong wang

2009-Mar-27 08:27 UTC

[R] Remove error data and clustering analysis

Hi all,

I'd like to run a cluster analysis on my datasets. Example data are
as follows:

Dataset 1:

500, 490, 486, 490, 491, 493, 480, 461, 504, 476, 434, 500, 470, 495,
3116, 3142, 12836, 3062, 3091, 3141, 3177, 3150, 3114, 3149;

Dataset 2:

506, 473, 495, 494, 434, 459, 445, 475, 476, 128367, 470, 513, 466,
476, 482, 1201, 469, 502;

I have many datasets like these. Each dataset contains one or two
clusters (never more than two), and it may also contain erroneous data
points. For example, 12836 is an erroneous point in Dataset 1, and
128367 and 1201 are erroneous points in Dataset 2.
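
To make the example concrete, here is a minimal sketch in base R that stores the two datasets as vectors and flags points far from the bulk of the data using the median and MAD. The names `d1`, `d2`, `robust_z` and the cutoff of 5 are assumptions chosen for illustration, not part of the original data description.

```r
# Example data copied from the post
d1 <- c(500, 490, 486, 490, 491, 493, 480, 461, 504, 476, 434, 500, 470, 495,
        3116, 3142, 12836, 3062, 3091, 3141, 3177, 3150, 3114, 3149)
d2 <- c(506, 473, 495, 494, 434, 459, 445, 475, 476, 128367, 470, 513, 466,
        476, 482, 1201, 469, 502)

# Robust z-score: distance from the median in units of the scaled MAD
# (hypothetical helper; mad() uses the default constant 1.4826)
robust_z <- function(x) abs(x - median(x)) / mad(x)

# A cutoff of 5 is an illustrative assumption, not a value from the post
flagged_d2 <- d2[robust_z(d2) > 5]
print(flagged_d2)
```

For Dataset 2 this flags 128367 and 1201. The rule only suits a dataset with a single cluster: in Dataset 1 the median and MAD are computed across both clusters, so the points would have to be grouped first.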

The data within each cluster follow a normal distribution whose
standard deviation is known. That is, when a dataset forms a single
cluster, as in Dataset 2, that cluster is normally distributed; when a
dataset forms two clusters, as in Dataset 1, each cluster is normally
distributed separately. Erroneous points lie far from the mean of any
cluster.

    Is there a function or established procedure that can remove the
erroneous points and then group the remaining data into one or two
clusters?
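
One possible shape for such a procedure, sketched in base R under stated assumptions: sort the values, start a new group wherever the gap between neighbours exceeds a multiple of the known standard deviation, and treat groups with too few members as erroneous points. The function name `cluster_1d`, the value `sigma = 30`, the multiplier `k = 4` and `min_size = 3` are all hypothetical, since the post does not give the actual standard deviation. This is a simple gap-based grouping, not a fitted mixture model.

```r
d1 <- c(500, 490, 486, 490, 491, 493, 480, 461, 504, 476, 434, 500, 470, 495,
        3116, 3142, 12836, 3062, 3091, 3141, 3177, 3150, 3114, 3149)
d2 <- c(506, 473, 495, 494, 434, 459, 445, 475, 476, 128367, 470, 513, 466,
        476, 482, 1201, 469, 502)

cluster_1d <- function(x, sigma, k = 4, min_size = 3) {
  ord <- order(x)
  xs <- x[ord]
  # Start a new group wherever the gap to the previous sorted value
  # exceeds k * sigma
  grp_sorted <- cumsum(c(TRUE, diff(xs) > k * sigma))
  # Map group labels back to the original order of x
  grp <- integer(length(x))
  grp[ord] <- grp_sorted
  # Groups smaller than min_size are treated as erroneous points
  sizes <- table(grp)
  keep <- as.integer(names(sizes)[as.vector(sizes) >= min_size])
  # Cluster labels 1, 2, ... by increasing value; NA marks an erroneous point
  cluster <- match(grp, keep)
  list(cluster = cluster, errors = x[is.na(cluster)])
}

r1 <- cluster_1d(d1, sigma = 30)  # sigma = 30 is an assumed value
r2 <- cluster_1d(d2, sigma = 30)
print(r1$errors)
print(table(r1$cluster))
print(r2$errors)
print(table(r2$cluster))
```

With these assumed settings, Dataset 1 splits into two clusters with 12836 set aside, and Dataset 2 forms one cluster with 128367 and 1201 set aside. The sketch does not enforce the limit of two clusters; if more than two groups survive, the thresholds would need adjusting, and a Gaussian mixture model with a fixed standard deviation would be a more principled alternative.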

    Thank you for your reply.

wanggd1983

2009-Mar-27 10:34 UTC


guodong wang

2009-Apr-24 12:18 UTC
