Speaker clustering is an important task in speech processing. Its goal is not to understand or analyse the spoken language, but to separate recordings by speaker or to determine how many speakers the recordings contain.
While advanced models exist for speech recognition and generation, a simpler method may be sufficient for clustering speech data. In this paper, we discuss one such method, based on tracing the visited portions of the feature space.