SPOTCLUST represents a novel approach to advance global studies of Mycobacterium tuberculosis complex (MTC)
genotyping data. SPOTCLUST uses mixture models to identify strain families of MTC based on
their spacer oligonucleotide typing (spoligotyping) patterns. The algorithm
incorporates biological information on spoligotype evolution, without attempting to derive the full phylogeny of MTC. We applied our
algorithm to spoligotype patterns identified among strains isolated between 1996 and 2004, primarily from New York
State tuberculosis patients. Two models were employed to identify strain families in the data: a 36-component model based on
the spoligotype database SpolDB3, and a randomly initialized model containing 48 components. Our results both confirm previously defined
families of MTC strains and suggest certain new families. Our approach can potentially provide a simple first-step
tool for tuberculosis epidemiology.
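
To illustrate the general idea of clustering spoligotype patterns with a mixture model, the following is a minimal sketch and not the SPOTCLUST model itself. It assumes that each spoligotype is a binary vector over the 43 spacers of the standard spoligotyping format, and it fits a plain mixture of independent Bernoulli distributions by expectation-maximization. The biological information on spoligotype evolution that SPOTCLUST incorporates is deliberately left out. The function name `fit_bernoulli_mixture`, its parameters, and the toy patterns are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_bernoulli_mixture(patterns, n_components, n_iter=50, seed=0):
    """Fit a mixture of independent Bernoulli distributions with EM.

    patterns: list of equal-length 0/1 lists (1 = spacer present).
    Returns (weights, probs, labels), where labels[i] is the most
    probable component for patterns[i].
    """
    rng = random.Random(seed)
    n_bits = len(patterns[0])
    eps = 1e-6  # keeps probabilities away from 0 and 1 so logs stay finite
    weights = [1.0 / n_components] * n_components
    # Random initialization of the per-component spacer-presence probabilities.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_bits)]
             for _ in range(n_components)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each pattern.
        resp = []
        for x in patterns:
            log_post = []
            for k in range(n_components):
                ll = math.log(weights[k])
                for bit, p in zip(x, probs[k]):
                    ll += math.log(p if bit else 1.0 - p)
                log_post.append(ll)
            m = max(log_post)  # subtract the max for numerical stability
            unnorm = [math.exp(v - m) for v in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and presence probabilities.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / len(patterns), eps)
            for j in range(n_bits):
                pj = sum(r[k] * x[j] for r, x in zip(resp, patterns)) / max(nk, eps)
                probs[k][j] = min(max(pj, eps), 1.0 - eps)
    labels = [max(range(n_components), key=lambda k: r[k]) for r in resp]
    return weights, probs, labels


# Toy data (invented for illustration, not real isolates): two artificial
# "families" that differ in which block of the 43 spacers is absent.
N_SPACERS = 43
family_a = [0 if j < 10 else 1 for j in range(N_SPACERS)]
family_b = [0 if j >= 30 else 1 for j in range(N_SPACERS)]
variant_a = list(family_a)
variant_a[20] = 0  # one additional spacer lost
variant_b = list(family_b)
variant_b[15] = 0
toy_patterns = [family_a] * 5 + [variant_a] + [family_b] * 5 + [variant_b]

weights, probs, labels = fit_bernoulli_mixture(toy_patterns, n_components=2)
print(labels)
```

In this sketch the number of components is fixed in advance, in the same spirit as the 36- and 48-component models mentioned above, although how those models are parameterized and initialized is not reproduced here.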