A formula to calculate the pruning threshold for the part-of-speech tagging problem