similar to: k-means: should columns in dataset be in same scale?

Displaying 20 results from an estimated 10000 matches similar to: "k-means: should columns in dataset be in same scale?"

2004 Nov 14
2
Exporting to file: passing source name to file name in loop
Hi, I'm having a mental block as to how I can automatically assign filenames to the output of the following code. I wish to create a separate .png file for every image created, each of them having a sequential filename, i.e. "sourcefile_index.png", so that I can create a movie from them. Please could someone tell me where I am going wrong? The following code works fine and
2005 Oct 09
3
[ subscripting sometimes loses names (PR#8192)
R, like recent versions of S-Plus, sometimes - but not always - loses names when subscripting objects with "[". (Earlier versions of S and S-Plus had the correct, name-preserving behavior.) This seems bad, it would be better to remove names only by explicit request, not as an accidental
2011 Oct 03
4
distance coefficient for a matrix with negative values
Hi, I need to run a PCoA (PCO) for a data set which has both positive and negative values for variables. I could not find any distance coefficient other than Euclidean distance that runs on the data set. Are there any other coefficients that work with negative values? Also, I cannot get summary output (the eigenvalues) for PCO as for PCA. Thanks. Dilshan
2017 Aug 23
0
Comparing 2 dale columns
Patrick, ## Run the following script and notice the different values of the dataframe "data" in each instance. # I understand you have done something like the following: data <- data.frame(COL1 = c("6/1/14", "7/1/14"), COL2 = c("5/1/15", "5/1/15"), stringsAsFactors = FALSE) data$Date_Flag <- ifelse(data$COL2 >
2004 Oct 13
3
data(eurodist) and PCA ??
If I perform PCA on the 'eurodist' data, should I get an accurate geographic layout of the cities with biplot? (barring inversions, i.e. there is no way to define north.. but you get the idea...) I have a complex distance matrix, and I am thinking about how to cluster it and how to visualize the quality of the resulting clusters. If I could 'see' the clusters in space I could
2006 Jan 23
4
Converting from a dataset to a single "column"
I have a dataset of 3 "columns" and 5 "rows". temp<-data.frame(col1=c(5,10,14,56,7),col2=c(4,2,8,3,34),col3=c(28,4,52,34,67)) I wish to convert this to a single "column", with column 1 on "top" and column 3 on "bottom". i.e. 5 10 14 56 7 4 2 8 3 34 28 4 52 34 67 Are there any functions that do this, and that will work well on much larger datasets (e.g. 1000 rows, 6000 columns)?
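One possible approach to the question above, as a minimal sketch: a data frame is a list of columns, so base R's unlist() concatenates them in column order, column 1 first. The variable name `single` is only illustrative and does not come from the original post.

```r
# Data from the post above
temp <- data.frame(col1 = c(5, 10, 14, 56, 7),
                   col2 = c(4, 2, 8, 3, 34),
                   col3 = c(28, 4, 52, 34, 67))

# Concatenate the columns top to bottom; use.names = FALSE drops
# the auto-generated names such as "col11", "col12", ...
single <- unlist(temp, use.names = FALSE)
single
```

For a data frame whose columns are all numeric, as.vector(as.matrix(temp)) should give the same vector. Neither call loops in R code, so both are likely to remain practical at the larger sizes mentioned, though that has not been measured here.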
2017 Jun 21
0
selecting dataframe columns based on substring of col name(s)
> On Jun 21, 2017, at 9:11 AM, Evan Cooch <evan.cooch at gmail.com> wrote: > > Suppose I have the following sort of dataframe, where each column name has a common structure: prefix, followed by a number (for this example, col1, col2, col3 and col4): > > d = data.frame( col1=runif(10), col2=runif(10), col3=runif(10),col4=runif(10)) > > What I haven't been able to
2011 Jun 03
2
modify a data frame by values in the columns
I have a data frame like this: col1 col2 r1 2 1 r2 4 3 r3 6 5 r4 8 7 r5 10 9 r6 12 11 r7 14 13 r8 16 15 r9 18 17 r10 20 19 I want to modify this data frame, for example, set col1 and col2 to -1 in every row where the value in col1 is less than 12 and the value in col2 is greater than 10. The result should look like this: col1
2008 Mar 16
2
How to loop through all the columns in dataframe
Hi: Can anyone advise me on how to loop and perform a calculation through all the columns? Here's my data: xd<- c(2.2024,2.4216,1.4672,1.4817,1.4957,1.4431,1.5676) pd<- c(0.017046,0.018504,0.012157,0.012253,0.012348,0.011997,0.012825) td<- c(160524,163565,143973,111956,89677,95269,81558) mydf<-data.frame(xd,pd,td) trans<-t(mydf) trans I have these values that I need to
2002 Feb 15
1
cmdscale k=1
In applying multidimensional scaling, it seems to me that sometimes the underlying dimensionality of the matrix is 1. However I found a case where cmdscale failed when I tried k=1. Here it is: m<-matrix( c(.5,.81,.23,.47,.61, .19,.5,.06,.17,.28, .77,.94,.5,.74,.85, .53,.83,.26,.5,.64, .39,.72,.15,.36,.5), nrow=5) # BTW I think cmdscale uses only the lower triangle--how to enter only # that
2017 Jun 21
4
selecting dataframe columns based on substring of col name(s)
Suppose I have the following sort of dataframe, where each column name has a common structure: prefix, followed by a number (for this example, col1, col2, col3 and col4): d = data.frame( col1=runif(10), col2=runif(10), col3=runif(10),col4=runif(10)) What I haven't been able to suss out is how to efficiently 'extract/manipulate/play with' columns from the data frame, making use
2017 Aug 23
0
Comparing 2 dale columns
Hi your code is wrong. I get > test<-read.table("clipboard", header=T) > str(test) 'data.frame': 2 obs. of 2 variables: $ COL1: Factor w/ 2 levels "6/1/14","7/1/14": 1 2 $ COL2: Factor w/ 1 level "5/1/15": 1 1 > test$COL2<- as.Date(as.character(test$COL2, format="%y/%m/%d")) > test$COL1<-
2017 Aug 23
2
Comparing 2 dale columns
Thanks. But when I apply your codes I get all NA instead of TRUE and FALSE ________________________________ From: PIKAL Petr <petr.pikal at precheza.cz> Sent: Wednesday, August 23, 2017 11:20:00 AM To: Patrick Casimir; r-help at r-project.org Subject: RE: Comparing 2 dale columns Hi your code is wrong. I get > test<-read.table("clipboard", header=T) > str(test)
2010 Nov 07
3
help! kennard-stone algorithm in soil.spec packages does not work for my dataset!!!
http://r.789695.n4.nabble.com/file/n3031344/RSV.Rdata RSV.Rdata I want to split my dataset into a training set and a test set using the Kennard-Stone (KS) algorithm. Luckily, the R package soil.spec implements it, but when I apply it to my dataset it does not work. Can anyone help me find the reason? My code is below, and my data is in the attachment.
2001 Dec 13
2
k-means with euclidian distance but no coordinates
Hi, I'm trying to build a thesaurus that will sensible values for rare words. I suspect the best algorithm to use is k-means although I'm not sure about that -- I would have preferred a k dimensional space with a binary cluster in each dimension so a word can belong to 0..k clusters, but I digress... I can measure the strength of correlation between words fairly easily by counting
2017 Aug 23
2
Comparing 2 dale columns
Dear R fellows, I created a new column Date_flag to compare the dates of COL1 and COL2 using the code below. But it showed that 5/1/15 is greater than 6/1/2014 and 5/1/2015 greater than 7/1/2014 despite the year is greater. How do I fix that? I did try to format as %y/%m/%d but it does not fix that. data$Date_Flag <- ifelse(data$COL2 > data$COL1, 0,1) COL1 COL2 6/1/14
2005 Mar 14
1
Significance of Principal Coordinates
Dear all, I was looking for methods in R that allow assessing the number of significant principal coordinates. Unfortunately I was not very successful. I expanded my search to the web and Current Contents, however, the information I found is very limited. Therefore, I tried to write code for doing a randomization. I would highly appreciate it if somebody could comment on the following approach.
2009 Nov 18
1
row-wise means
I have a dataframe with 3 columns. The first column stores an index. I would like to calculate the mean of the numbers stored in each of the rest of the columns. So, here is my data matrix: col1 col2 col3 1 23 34 2 45 56 3 23 56 4 34 68 For each row I would like to calculate the means of the numbers stored in col2 and col3. How can this be done in R? TIA, Anjan
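A minimal sketch of one way to do what the post above asks, using base R's rowMeans() on the two value columns. The names `dat` and `row_mean` are assumptions made for illustration; the data values are those shown in the post.

```r
# Data matrix from the post above; col1 is the index column
dat <- data.frame(col1 = 1:4,
                  col2 = c(23, 45, 23, 34),
                  col3 = c(34, 56, 56, 68))

# Mean of col2 and col3 within each row, leaving the index column out
dat$row_mean <- unname(rowMeans(dat[, c("col2", "col3")]))
dat
```

Selecting the columns by name keeps the index in col1 out of the average; dat[, -1] would do the same by position.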
2005 Apr 03
2
how to draw a 45 degree line on qqnorm() plot?
# I cannot draw a 45 degree line on a qqnorm() plot: jj <- sample(c(1:100), 10) qqnorm(jj) abline() doesn't work. Thank you.
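A hedged sketch for the question above. abline() called with no arguments has no intercept or slope to draw, which is probably why nothing appears. Base R's qqline() adds the usual reference line through the first and third quartiles. A literal 45 degree line (y = x) only makes sense when the sample is on the same scale as the theoretical quantiles, so the second plot standardizes the data first; the name `z` and the use of set.seed() are assumptions added for illustration.

```r
set.seed(1)                 # only so the sketch is reproducible
jj <- sample(1:100, 10)

# Reference line through the first and third quartiles
qqnorm(jj)
qqline(jj)

# Literal 45 degree line: standardize first, then draw y = x
z <- as.numeric(scale(jj))  # mean 0, standard deviation 1
qqnorm(z)
abline(a = 0, b = 1)        # intercept 0, slope 1
```

Which of the two lines is wanted depends on whether the goal is a visual check of normality (qqline) or a comparison against the standard normal specifically (abline with slope 1).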
2016 Jul 27
2
K MEANS clustering
Hey Parth, Thanks for the reply. I am considering implementing a cosine distance metric too, along with Euclidean distance, because of the dimensionality issue that comes with K-Means and the Euclidean distance metric. That does help when we deal with sparse vectors for documents. The particular problem I'm having is representing centroids in an efficient way. For example, when we find the mean