The question we have addressed here is to define the size and composition of the corpus we would need in order to get necessary and sufficient information for Machine Learning techniques to induce that type of information. Representativeness of a corpus is a topic largely dealt with, especially in corpus linguistics. One of the standard references is Biber (1993) where the author offers guidelines for corpus design to characterize a language.