Topic: Clustering Basics
In this Module 6 Discussion, we will discuss the R functions for clustering with the k-means algorithm and with hierarchical clustering procedures. Please answer the questions by filling in the blanks on the right-hand side of the table.
To answer these questions, review Examples 15.1, 15.2, 15.4, and 15.5 in Data Mining and Business Analytics with R and Data Mining for Business Analytics: Concepts, Techniques, and Applications in R, Chapter 15 (all found in this week’s Reading & Resources), then fill in the blanks with your answers. You may also consult open resources to find relevant answers.
Which R functions in Examples 15.1 and 15.2 can you use to run the k-means clustering algorithm?
Which R functions in Examples 15.4 and 15.5 can you use to run hierarchical clustering procedures?
In k-means clustering, what method is used to compare different choices of k in terms of the overall average within-cluster distance? Which R function can you use for it? Please also describe the method briefly.
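As a rough illustration of comparing choices of k, the sketch below runs a very small k-means on made-up 2-D points and reports the total within-cluster sum of squares for k = 1 to 5. It is not taken from the textbook examples. It is written in Python with the standard library only so that it is self-contained; the data, the function names, and the farthest-first starting centers are all assumptions made for illustration. In R, the object returned by kmeans() includes a comparable quantity (tot.withinss). Plotting such a measure against k and looking for the point where the curve flattens is commonly called an elbow plot.

```python
# Hypothetical data: three tight, well-separated groups of four points each.
points = [
    (1, 1), (1, 2), (2, 1), (2, 2),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 1), (20, 2), (21, 1), (21, 2),
]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(pts, k):
    """Deterministic starting centers (an assumption; R's kmeans() starts randomly)."""
    centers = [pts[0]]
    while len(centers) < k:
        # Pick the point whose nearest chosen center is farthest away.
        centers.append(max(pts, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_wss(pts, k, max_iter=100):
    """Run Lloyd-style k-means and return the total within-cluster sum of squares."""
    centers = farthest_first(pts, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its members.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Each point contributes its squared distance to the nearest final center.
    return sum(min(sq_dist(p, c) for c in centers) for p in pts)


wss = {k: kmeans_wss(points, k) for k in range(1, 6)}
for k, value in wss.items():
    print(k, round(value, 3))
```

On this made-up data the measure drops sharply up to k = 3 and only slightly afterwards, which is the kind of pattern such a comparison is meant to reveal.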
Please define the concept of a dendrogram.
Please refer to the following link for more details on different dendrograms. Then, using the following horizontal dendrogram (where y is the height), state how many clusters you obtain when the dendrogram is cut at heights 90, 150, 250, and 400, respectively.
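To make the idea of cutting a dendrogram at a given height concrete, here is a minimal sketch on made-up one-dimensional values. It does not use the dendrogram from the question; the data, the cut heights, and the function names are hypothetical. It is written in Python with the standard library only, and single linkage is assumed as the merging rule. In R, cutree() applied to an hclust() result with the h argument plays a comparable role.

```python
# Hypothetical 1-D observations (not the data behind the dendrogram in the question).
values = [0, 10, 35, 100, 130, 400]


def single_linkage_heights(obs):
    """Agglomerate with single linkage and return the height of each merge, in order."""
    clusters = [[v] for v in obs]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest minimum pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        heights.append(d)
    return heights


def clusters_at_height(heights, n_obs, h):
    """Each merge at or below the cut height h reduces the cluster count by one."""
    return n_obs - sum(1 for m in heights if m <= h)


heights = single_linkage_heights(values)
counts = [clusters_at_height(heights, len(values), h) for h in (20, 50, 100, 300)]
print(heights)
print(counts)
```

Reading a dendrogram by eye follows the same logic: draw a line at the chosen height and count how many branches it crosses.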
In your response to other students, suggest changes to their design that you think would make it a stronger study, or ask clarifying questions if anything was unclear or confusing.