AES Consulting meeting on 9 Aug 2017
Differences in word frequency between two corpus
A client has 2 different corpora, say A and B, and is interested in differences in word frequencies (amongst ~100 words) between the corpora.
The client is performing bootstrapping on residuals or something.
Advice
A permutation/randomization tests seems like the most straightforward approach. In this approach, the corpora identification would be randomly assigned to a document and this would be repeated. Then the statistic would be calculated under this null hypothesis that the corpora identification has no relation to word frequency.