The use of two word types within one text can be compared by comparing the contexts in which they occur. We pick all the tokens that occur, for example, immediately to the right of word A and immediately to the right of word B, thus obtaining two submultisets of the text.
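As an illustration of the extraction step, the following minimal Python sketch collects the right-hand neighbours of two words as multisets and counts their multiset intersection. The function names `right_contexts` and `intersection_size` and the toy sentence are assumptions made for this example, not part of the paper.

```python
from collections import Counter

def right_contexts(tokens, word):
    """Multiset of tokens occurring immediately to the right of `word`."""
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word)

def intersection_size(a, b):
    """Cardinality of the multiset intersection (minimum count per token)."""
    return sum((a & b).values())

# Toy text, invented for illustration only.
tokens = "the cat sat on the mat and a cat sat on a mat".split()

ctx_the = right_contexts(tokens, "the")  # tokens following "the"
ctx_a = right_contexts(tokens, "a")      # tokens following "a"
shared = intersection_size(ctx_the, ctx_a)
```

Here `Counter` stands in for a multiset, and `&` keeps the smaller of the two counts for each token, which is the usual definition of multiset intersection.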
This paper offers a method for comparing such submultisets; its use is not limited to linguistics. The method compares the cardinality of the intersection of the two submultisets with a model that characterizes the average intersection cardinality over all possible submultisets of a given size drawn from the given text.
The model is derived algebraically.
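The algebraic model is not reproduced here. As a rough stand-in, the baseline it describes can be approximated by simulation: draw random submultisets of the given sizes from the text and average the cardinality of their intersection. The sketch below is a Monte Carlo estimate, not the paper's derivation; the function name `mean_random_intersection`, its parameters, and the choice to draw the two submultisets independently of each other are all assumptions.

```python
import random
from collections import Counter

def mean_random_intersection(tokens, size_a, size_b, trials=2000, seed=0):
    """Monte Carlo estimate of the mean multiset-intersection cardinality
    of two random submultisets of `tokens` with sizes `size_a` and `size_b`.

    Assumption: each submultiset is drawn without replacement from the whole
    text, and the two draws are independent of each other.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = Counter(rng.sample(tokens, size_a))
        b = Counter(rng.sample(tokens, size_b))
        total += sum((a & b).values())  # minimum count per shared token
    return total / trials

# Toy text, invented for illustration only.
tokens = "the cat sat on the mat and a cat sat on a mat".split()
baseline = mean_random_intersection(tokens, 2, 2)
```

An observed intersection well above `baseline` would suggest that the two context multisets share more tokens than random submultisets of the same sizes typically do; how large a difference counts as meaningful is left to the model.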