The notebook creates a corpus of short documents from the BNC for use with probabilistic topic models, in particular with the Gustav topic modelling toolbox.
It requires a Python package called bnctools, which is installed along with all the other requirements when you run `pip install -r requirements.txt`.
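
To give a rough idea of what "a corpus of short documents" can look like in practice, here is a minimal sketch in plain Python. It does not use bnctools or Gustav, and it is not the notebook's actual procedure. It assumes, purely for illustration, that a short document is a paragraph whose word count falls inside a chosen range. The function names, the `min_words` and `max_words` parameters, and the sample paragraphs are all hypothetical.

```python
import re
from collections import Counter


def make_short_documents(paragraphs, min_words=3, max_words=8):
    """Tokenize each paragraph and keep those of an acceptable length.

    Hypothetical helper: the length bounds are illustrative assumptions,
    not values taken from the notebook.
    """
    documents = []
    for text in paragraphs:
        # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
        tokens = re.findall(r"[a-z]+", text.lower())
        if min_words <= len(tokens) <= max_words:
            documents.append(tokens)
    return documents


def vocabulary_counts(documents):
    """Count how often each word type occurs across all kept documents."""
    counts = Counter()
    for tokens in documents:
        counts.update(tokens)
    return counts


# Made-up stand-ins for paragraphs extracted from a corpus.
paragraphs = [
    "The cat sat on the mat.",
    "Yes.",
    "Topic models describe documents as mixtures of topics.",
]

documents = make_short_documents(paragraphs)
counts = vocabulary_counts(documents)
```

Here the one-word paragraph is dropped for being too short, and the remaining token lists, together with the vocabulary counts, are the kind of bag-of-words input that probabilistic topic models typically consume.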