Clustering in Trees: Optimizing Cluster Sizes and Number of Subtrees
This paper considers partitioning the vertices of an n-vertex tree into p disjoint sets C1, C2,…, Cp, called clusters so that the number of vertices in a cluster and the number of subtrees in a cluster are minimized. For this NP-hard problem we present greedy heuristics which differ in (i) how subtrees are identified (using either a best-fit, good-fit, or first-fit selection criteria), (ii) whether clusters are filled one at a time or simultaneously, and (iii) how much cluster sizes can differ from the ideal size of c vertices per cluster, n = cp. The last criteria is controlled by a constant α, 0 ≤ α < 1, such that cluster Ci satisfies