Foundations of Machine Learning
Clustering Methods
Lesson 9

Clustering Methods
- Partitional methods (prototype-based clustering): K-means
- Hierarchical clustering
- Density-based clustering

K-means algorithm
- Center-based: a cluster is a set of objects such that each object in the cluster is closer (more similar) to the “center” of its own cluster than to the center of any other cluster.
- The center of a cluster is called its centroid.
- Each point is assigned to the cluster with the closest centroid.
- The number of clusters usually has to be specified in advance.

K-means algorithm: basic steps
Partition {x1, …, xn} into K clusters (K is predefined).
- Initialization: specify the initial cluster centers (centroids).
- Iterate until no assignment changes:
  - Cluster assignment: for each object xi, calculate the distances between xi and the K centroids, then (re)assign xi to the cluster whose centroid is closest to xi.
  - Centroid update: update the cluster centroids based on the current assignment.

K-means: Initialization

K-means Clustering: Cluster Assignment

K-means Clustering: Update Cluster Centroid

Sum of Squared Error (SSE)
Optimization objective of the K-means algorithm: the sum of squared errors.
- Suppose the centroid of cluster Ci is ui.
- For each object x in Ci, compute the squared error between x and the centroid ui.
- Sum up the errors of all the objects: SSE = sum over i = 1..K of the sum over x in Ci of ||x - ui||^2.

K-means algorithm pseudocode

Comments on the K-Means Method
Strength
- Efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t <...