AI/ML text generation – getting started

After the concept of AI was first defined in the mid-1950s, development was largely theoretical and severely restricted by the compute resources available. Other than outwardly convincing but internally simplistic ‘chatterbot’ implementations, little progress was made initially.

Rudimentary AI systems in the form of expert systems appeared in the 1980s. While revolutionary in their time and useful in real-world applications, they were not without drawbacks – they could not learn, were difficult to update, and could produce spurious outputs when given unexpected inputs.

The 1990s saw the advent of more recognisable AI systems – most notably ‘Deep Blue’, the first chess-playing computer to beat a reigning world chess champion. These breakthroughs were only possible due to the great increases in the speed and storage of computers at the time, as well as custom silicon designed for the domain-specific computations. These symbolic systems, while very good at their specific tasks, could not learn, self-modify or be repurposed to solve other tasks, and as such were very narrow AI systems.

In the early 21st century, continued increases in computational performance, coupled with the emergence of large amounts of available data (so-called data lakes), saw the advent of machine learning (sub-symbolic) systems, whereby a neural network is trained on vast quantities of data to build a mathematical model that can be used to predict future data points.
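To make this concrete, here is a minimal sketch (in plain Python, using a hypothetical toy dataset) of what “training a mathematical construct on data to predict future points” means at its simplest: a two-parameter model fitted by gradient descent. Real ML systems apply the same principle with millions of parameters and far richer models.

```python
# A deliberately tiny illustration of the sub-symbolic idea: fit a
# mathematical model (here a straight line, y = w*x + b) to observed
# data by gradient descent, then use it to predict a future point.

def train(points, lr=0.01, epochs=2000):
    """Fit y = w*x + b to (x, y) pairs by gradient descent."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# "Training data": noisy observations of an underlying trend
data = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0)]
w, b = train(data)
print(round(w, 1), round(b, 1))   # → 2.0 1.0
prediction = w * 5 + b            # predict the next, unseen point
```

The model is nothing more than two learned numbers here, but the workflow – observe data, minimise error, predict – is the same one that scales up to neural networks.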

ML systems present new challenges for testing: unlike the previous generation of AI systems, where the reasoning behind a specific outcome can be traced back, sub-symbolic systems may not always produce the same output for the same inputs, yet the result may still be ‘correct’. This makes traditional testing against fixed expected results impractical. Consider a system that recommends additional purchases or streaming content based on individual choices – its recommendations will change over time as additional data refines the underlying model.
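One practical response is to test properties of the output rather than exact values. The sketch below uses a hypothetical stand-in recommend() function (with made-up catalogue items) whose ordering varies from run to run, much as a retrained model’s might; the assertions still pass because they check invariants, not a fixed expected result.

```python
# Property-based checks for a non-deterministic recommender.
# recommend() is a stand-in for a real ML model: it returns a
# randomised ranking, as a model might after its data is refreshed.
import random

CATALOGUE = {"film-a", "film-b", "film-c", "film-d", "film-e"}

def recommend(watched, k=3):
    """Hypothetical model: suggest k items the user has not yet watched."""
    candidates = list(CATALOGUE - set(watched))
    random.shuffle(candidates)          # order varies run to run
    return candidates[:k]

history = ["film-a", "film-b"]
suggestions = recommend(history)

# Properties that hold for ANY correct output, regardless of ordering:
assert len(suggestions) == 3                       # right number of items
assert all(s in CATALOGUE for s in suggestions)    # all items are real
assert not set(suggestions) & set(history)         # nothing already watched
```

The exact list differs between runs, but a correct system never violates these invariants – which gives the tester something stable to assert against.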

There are many other applications for ML beyond producing recommendations. Some of the better-known use cases are image recognition and classification, which have seemingly limitless applications, and the sophisticated sensor-fusion challenges faced by self-driving vehicle technology.

An equally interesting and challenging area of research is natural language processing, which has a number of text-processing applications, including:

  • Generation
  • Discovery
  • Classification
  • Summarisation
  • Entity Analysis
  • Risk & Obligation Analysis
  • Invoice & Receipt Analysis

Introducing GPT-2

Generative Pre-trained Transformer 2 (GPT-2) is an open-source AI model that can translate text, answer questions and summarise passages. It is available on GitHub and is an ideal starting point for research and investigation into language-model capabilities.

Initial setup

GPT-2 is written in Python, so you will need to install it on your system. The following example uses Ubuntu on a mid-range i7 system with a mid-range RTX GPU.

First install some required packages:
sudo apt install git gcc python3-dev python3.8-venv
and then create a Python environment:
python3 -m venv gpt2-env
and then activate it:
source gpt2-env/bin/activate
The prompt should then change to indicate which Python environment is in use, e.g.:
(gpt2-env) steve@ubuntu:~$

Then clone the repository with:

git clone

(a fork of the original, extended to use current libraries)

Then run pip to install the Python components:

cd gpt-2
pip install --upgrade -r requirements.txt

This may produce some build errors, but pip should recover; repeat the above command to verify that the requirements have been satisfied. You will then need to install some additional components:

pip install numpy tensorflow

If you have a high end Nvidia GPU, you should install the necessary drivers to allow the model to use the GPU by following the guide here:
You may need to symlink one of the libraries into the Python environment as detailed here:
The command I used was:

sudo ln -s /usr/local/cuda/targets/x86_64-linux/lib/ /home/steve/gpt2-env/lib/python3.8/site-packages/tensorflow/python/

Model download & configuration

There are four pre-trained models available; the 774-million-parameter (774M) model is a 3.1 GB download.
My 8 GB GPU runs out of memory when using the larger 1558M model.
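A rough back-of-envelope estimate (assuming 32-bit floating-point weights, i.e. 4 bytes per parameter, and ignoring activation memory) shows why the 1558M model strains an 8 GB card:

```python
# Approximate size of the model weights alone, assuming float32
# (4 bytes) per parameter; the GPU needs working memory on top of this.
def weights_gib(params_millions, bytes_per_param=4):
    return params_millions * 1e6 * bytes_per_param / 2**30

print(f"774M weights:  {weights_gib(774):.1f} GiB")    # → 2.9 GiB
print(f"1558M weights: {weights_gib(1558):.1f} GiB")   # → 5.8 GiB
```

At roughly 5.8 GiB for the weights alone, the 1558M model leaves little headroom on an 8 GB GPU once activations and framework overhead are added, which matches the out-of-memory behaviour described above.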
Download with:

python3 774M

This will take a while, depending on your internet speed. Once complete, edit:


and set the correct parameter for the model, e.g.:


Generating text

You can then run the model with:

python3 src/

and enter some text. For example, when I entered:

if you are feeling unwell, you should make an appointment to see a doctor

The model generated:

as soon as you are safe. When the disease is gone, see the doctor today and take proper care of it. The doctor will give you answers to the questions you have asked. You may ask the doctor to give you some medicine to begin with. If he confronts you with vertealysis Ϩ you may often take a pain reliever, but not piroxicam. You may take aspirin and ibuprofen at the same time, not giving them up two hours later. Before taking the medicine, you should drink fluids to help prevent property damage.

Which is interesting, but I don’t think I’ll be asking GPT-2 for any medical advice…

In a future post I will explore how to train the model using different sets of data in an attempt to bias the output.