This tutorial uses a module from the scikit-learn (sklearn) library that performs k-means clustering. The module includes built-in optimization techniques that are controlled through its class parameters. The class signature looks like this:
class sklearn.cluster.KMeans(n_clusters=8, *, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='lloyd')
The parameters include the number of clusters to form, which is also the number of centroids to generate (n_clusters). Two initialization methods are available through the init parameter: k-means++ and random. The class also includes a parameter for setting the maximum number of iterations (max_iter). Each iteration begins by partitioning the dataset into the number of clusters given by the n_clusters parameter.
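To make the parameters above concrete, here is a minimal sketch of how they might be passed to the class. The toy array X and the specific values chosen (two clusters, ten initializations, a fixed random_state) are assumptions made for illustration only and are not part of the tutorial's data set.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of 2-D points.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

model = KMeans(
    n_clusters=2,       # number of clusters (and centroids) to form
    init="k-means++",   # initialization method; "random" is the alternative
    n_init=10,          # number of independent initializations to try
    max_iter=300,       # upper bound on iterations per initialization
    tol=1e-4,           # convergence tolerance on centroid movement
    random_state=0,     # fixed seed so repeated runs give the same result
)

# Fit the model and get the cluster index assigned to each row of X.
labels = model.fit_predict(X)

print(labels)                   # one cluster label per point
print(model.cluster_centers_)   # coordinates of the fitted centroids
print(model.n_iter_)            # iterations used by the best run
```

The fitted centroids are available in cluster_centers_, and n_iter_ reports how many iterations the best run needed, which can be compared against max_iter to see whether the limit was reached.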
These libraries are used to generate a test data set and perform clustering:
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import numpy
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler