Algorithm independent bounds on community detection problems and associated transitions in stochastic block model graphs

Richard K. Darst, David R. Reichman, Peter Ronhovde, Zohar Nussinov
We derive rigorous bounds for well-defined community structure in complex networks for a stochastic block model (SBM) benchmark. In particular, we analyze the effect of inter-community "noise" (inter-community edges) on any "community detection" algorithm's ability to correctly group nodes assigned to a planted partition, a problem which has been proven to be NP-complete in a standard rendition. Our result does not rely on the use of any one particular algorithm nor on the analysis of the limitations of inference methods. Rather, we turn the problem on its head and examine when, in the first place, well-defined solutions may exist in random SBMs. The method that we introduce could potentially be applied to other computational problems. The objective of community detection algorithms is to partition a given network into optimally disjoint subgraphs (or communities). Similar to k-SAT and other problems, "community detection" exhibits different phases. Networks that lie in the "unsolvable phase" lack well-defined structure and thus have no meaningful partition. Solvable systems fall into one of two phases: the "hard" phase and the "easy" phase. As its name suggests, in the easy phase a partition is readily found by known algorithms. When a network lies in the hard phase, it still has an underlying structure, yet finding a meaningful partition requires an exhaustive computational effort that rapidly increases with the size of the graph. Taken together, (i) the rigorous results that we report here on when graphs have an underlying structure and (ii) recent results concerning the limits of rather general algorithms suggest bounds on the hard phase. The bounds of (i) and (ii) coincide when N/q^2 >> 1 and k/n << 1 where N, q, and k are, respectively, the number of nodes, communities, and links per node. In this limit, the hard phase may no longer appear.
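To make the benchmark concrete, the following is a minimal sketch of a planted-partition SBM and of the inter-community "noise" the abstract refers to. It is an illustration only, not the authors' code or method: the function names `planted_partition` and `mean_degree_split`, the edge probabilities `p_in` and `p_out`, and the assumption of equal-size communities are all introduced here for the example.

```python
import random
from itertools import combinations


def planted_partition(N, q, p_in, p_out, seed=0):
    """Sample an SBM graph with q equal-size planted communities.

    Assumption for this sketch: each pair of nodes is linked independently,
    with probability p_in inside a community and p_out between communities.
    """
    if N % q != 0:
        raise ValueError("N must be divisible by q")
    rng = random.Random(seed)
    size = N // q
    # Planted labels: nodes 0..size-1 form community 0, and so on.
    labels = [i // size for i in range(N)]
    edges = []
    for u, v in combinations(range(N), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges


def mean_degree_split(labels, edges):
    """Return (mean intra-community degree, mean inter-community degree)."""
    N = len(labels)
    internal = sum(1 for u, v in edges if labels[u] == labels[v])
    external = len(edges) - internal
    # Each edge contributes to the degree of both of its endpoints.
    return 2 * internal / N, 2 * external / N


labels, edges = planted_partition(N=120, q=4, p_in=0.3, p_out=0.02)
k_in, k_out = mean_degree_split(labels, edges)
print(f"mean internal degree: {k_in:.2f}, mean external degree: {k_out:.2f}")
```

Raising `p_out` relative to `p_in` increases the share of inter-community edges per node, which is the kind of noise whose effect on the recoverability of the planted partition the abstract discusses.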
View original: http://arxiv.org/abs/1306.5794