
the following formula:
\[
\sum_{l=1}^{K} (-1)^{K-l} \, \frac{1}{(K-l)! \cdot l!} \, l^n
\]
If we take as an example a dataset with 20 observations (n = 20) that we want to divide into 4 clusters (K = 4), there are more than $4 \times 10^{10}$ possible partitions. Enumerating them all would of course not work in practice, as it would take far too long to compute. Therefore, in order to deal with the combinatorial character of the problem, we find the partition using iterative procedures that alleviate the computational complexity. In partitioning clustering we also use a measure of dissimilarity $\delta$, which will be a distance $d$, to measure how similar clusters and observations are to each other.
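As a quick sanity check on this count, here is a short Python sketch (standard library only; the function name n_partitions is my own) that evaluates the formula above for n = 20 and K = 4:

    from fractions import Fraction
    from math import factorial

    def n_partitions(n: int, K: int) -> int:
        """Number of ways to partition n observations into K non-empty clusters,
        evaluated with the alternating-sum formula above (a Stirling number of
        the second kind). Exact rational arithmetic avoids rounding errors."""
        total = sum(
            Fraction((-1) ** (K - l) * l ** n, factorial(K - l) * factorial(l))
            for l in range(1, K + 1)
        )
        return int(total)  # the sum always works out to an integer

    print(n_partitions(20, 4))  # 45232115901, i.e. more than 4 * 10**10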
In partitioning clustering we use the concept of a prototype. Each cluster has a prototype, which is a representative of that cluster. We define $p_l$ as the prototype of cluster $l$. The prototypes are used for measuring the intra-homogeneity of a cluster: we measure the distance from the points in the cluster to its prototype to get an estimate of the cluster's intra-homogeneity:
\[
\sum_{i \in C_l} d(x_i, p_l)
\]
So we measure the intra-homogeneity as the sum of the distances between all observations in a cluster and its prototype $p_l$. The objective of a partitioning clustering method must therefore be to minimize the within-cluster distance to the prototype, across all clusters:
\[
\text{minimize} \quad \sum_{l=1}^{K} \sum_{i \in C_l} d(x_i, p_l) \tag{1}
\]
This minimization problem aims precisely at improving the total intra-homogeneity.
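As a minimal sketch of how objective (1) can be evaluated, assuming numpy arrays, integer cluster labels, and Euclidean distance as one concrete choice of d (the name total_cost is my own):

    import numpy as np

    def total_cost(X, labels, prototypes):
        """Objective (1): the sum, over all clusters, of the distances from
        each observation in cluster l to that cluster's prototype p_l.
        Euclidean distance is assumed; other choices of d plug in the same way."""
        cost = 0.0
        for l, p in enumerate(prototypes):
            members = X[labels == l]  # the observations in C_l
            cost += np.linalg.norm(members - p, axis=1).sum()
        return cost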
2.1 Iterative partitioning algorithms
First the clustering algorithm chooses K initial prototypes, $p_1, p_2, \ldots, p_K$. The prototypes are randomly selected when the algorithm is initialized; you can imagine them as imaginary observations placed in our solution space. Once the prototypes are set, the algorithm clusters around them, i.e., each observation is assigned to the closest prototype. In mathematical terms, what the algorithm does at this step is:
\[
i \in C_l \quad \text{if} \quad d(x_i, p_l) = \min_{l' = 1, \ldots, K} d(x_i, p_{l'}) \tag{2}
\]
In other words, an observation is assigned to the cluster whose prototype it is closest to. The next step of the algorithm is to check whether the clusters have changed. If the clusters have not changed, the algorithm stops; if they have changed, the prototypes are re-calculated and equation 2 is repeated until convergence (i.e., until the clusters no longer change). How exactly the prototypes are re-calculated is where partitioning clustering methods split into different specific algorithms, such as K-Means, K-Medians, etc.; a sketch of the generic loop is given below.
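Here is a minimal sketch of that generic loop, with the K-Means choice (mean prototypes) plugged in as one concrete re-calculation rule; the function names and the numpy-based implementation are illustrative assumptions, not notation from the text:

    import numpy as np

    def iterative_partition(X, K, rng=None):
        """Generic iterative partitioning. The prototype update at the end of
        the loop is the method-specific step: the mean gives K-Means, the
        coordinate-wise median would give K-Medians, and so on."""
        if rng is None:
            rng = np.random.default_rng(0)
        # Choose K initial prototypes at random among the observations.
        prototypes = X[rng.choice(len(X), size=K, replace=False)]
        labels = np.full(len(X), -1)
        while True:
            # Assignment step (equation 2): each observation joins the
            # cluster of its closest prototype.
            dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)
            # Convergence check: stop when no cluster membership changed.
            if np.array_equal(new_labels, labels):
                return labels, prototypes
            labels = new_labels
            # Re-calculate prototypes (empty clusters keep their old prototype).
            prototypes = np.array([
                X[labels == l].mean(axis=0) if np.any(labels == l) else prototypes[l]
                for l in range(K)
            ])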
The crucial elements in the different partitioning-based clustering methods are: