Open-weight models can be fine-tuned to customise them for specific tasks they were not originally trained on, and there are a number of techniques and tools for doing so.
For this experiment I wanted to try full fine-tuning with https://pytorch.org/torchtune/ to teach Llama 3.1 8B some specific knowledge. Finding a topic or task that Llama does not already know about is surprisingly difficult, but after looking through some sample datasets on Hugging Face, I discovered:
https://huggingface.co/datasets/G-reen/reflexion-agi
This contains sample questions and answers about a fictional world, and since Llama 3.1 has no detailed knowledge of the subject, it is an ideal candidate for this experiment.
Configuring the environment
Tuning AI models requires significant GPU-based compute resources – for this task I used a g6e.xlarge VM (https://aws.amazon.com/ec2/instance-types/g6e/), although a high-end consumer-grade GPU in a Linux system would also be capable of fine-tuning.
Installing Torchtune
Run the following commands:
python3 -m venv torchtune-0.5
source ./torchtune-0.5/bin/activate
cd torchtune-0.5
pip install torch torchvision torchao bitsandbytes
pip install torchtune
tune ls
If the installation is successful, the terminal should display a list of recipes that ‘torchtune’ can perform.
Getting the model (and the data)
Next, download the data from Hugging Face (from the link above). You will also need to create an account, request access to the Llama 3.1 models at https://huggingface.co/settings/gated-repos, and create an access token at https://huggingface.co/settings/tokens
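Instruct-model fine-tuning data is usually arranged as user/assistant message pairs. The sketch below shows the general shape of one such training record – the field names here are my illustration of the common chat format, not the actual schema of the reflexion-agi dataset, so check the dataset card before relying on them:

```python
# Sketch: turn a question/answer pair into the chat-style record
# commonly used for instruct fine-tuning. The exact field names in
# the reflexion-agi dataset may differ - check the dataset card.

def to_chat_example(question: str, answer: str) -> dict:
    """Build a single chat-format training record."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }

example = to_chat_example(
    "What is the City of Echoes?",
    "A fictional city governed by the Council of Resonance.",
)
print(example["messages"][0]["role"])  # user
```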
Then run the following command (correcting the paths as necessary):
tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /home/ubuntu/torchtune-0.5/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf-token <token>
Config files for tuning
Clone or copy the YAML files from the ‘simple-tuning’ directory in https://github.com/ObjectiveTester/AI-research
You will probably have to correct the paths to reflect the file locations on your system.
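A quick way to repoint every hard-coded directory in the configs is a simple text substitution. The YAML fragment and paths below are illustrative assumptions, not the actual contents of the repository's config files:

```python
from pathlib import Path

def repoint_config(text: str, old_root: str, new_root: str) -> str:
    """Replace one directory prefix with another throughout a config file's text."""
    return text.replace(old_root, new_root)

# Example: a hypothetical fragment of a torchtune YAML with a hard-coded path.
cfg = "checkpoint_dir: /home/ubuntu/torchtune-0.5/Meta-Llama-3.1-8B-Instruct\n"
print(repoint_config(cfg, "/home/ubuntu", str(Path.home())))
```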
Evaluate and test the base model
Run the following commands to generate some output and evaluate the base model performance:
tune run eleuther_eval --config eval_31-base.yaml
tune run generate --config generation_31-base.yaml
This should show some statistics for the untuned model, and that while it is aware that ‘The City of Echoes’ appears in fiction, it does not know much more about it.
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.4713|± |0.0204|
| | |none |None |f1 |↑ |0.5774|± |0.0190|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.5405|± |0.0150|
There isn't a well-known city named "Echoes." However, I can think of a few possibilities:
1. **Echoes, Montana**: Echoes is a small unincorporated community in Cascade County, Montana, United States. It has a few residents and is not a city with its own government.
2. **Echoes of the Past**: You might be thinking of a fictional city or a place with a similar name, like "Echo City" or "Echo Lake." Can you provide more context or clarify what you mean by "City of Echoes"?
If you have any more information or clarification, I'd be happy to help you further!
Tune the model
Run the following command to start the tuning process – this backgrounds the job and allows you to disconnect while it runs – use tail -f full.txt to monitor the progress:
nohup tune run full_finetune_single_device --config tune_31-full.yaml > full.txt 2>&1 &
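torchtune writes a loss value into its progress output, though the exact log format varies between versions. As a rough sketch (the log line layout here is an assumption), the latest loss can be pulled out of full.txt with a small parser like this:

```python
import re
from typing import Optional

def latest_loss(log_text: str) -> Optional[float]:
    """Return the most recent loss value found in training-log text,
    matching any number that follows the word 'loss' (case-insensitive)."""
    matches = re.findall(r"loss[:=\s]+([0-9]*\.?[0-9]+)", log_text, flags=re.IGNORECASE)
    return float(matches[-1]) if matches else None

# Hypothetical log lines - real torchtune output may be formatted differently.
sample = "1|10|Loss: 2.41\n1|20|Loss: 1.97\n"
print(latest_loss(sample))  # 1.97
```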
Evaluate and test the tuned model
Run the following commands to generate some output and evaluate the tuned model performance:
tune run eleuther_eval --config eval_31-full.yaml
tune run generate --config generation_31-full.yaml
This should show that the tuned model now has a lot more detailed knowledge about ‘The City of Echoes’ – although, as the coqa scores below show, some general capability has degraded.
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.1160|± |0.0138|
| | |none |None |f1 |↑ |0.1669|± |0.0150|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.4512|± |0.0164|
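To quantify the change between the two evaluation runs, the deltas work out as follows (values copied from the two eleuther_eval tables above):

```python
# Metric values taken from the base and tuned eval tables.
base = {"coqa_em": 0.4713, "coqa_f1": 0.5774, "truthfulqa_acc": 0.5405}
tuned = {"coqa_em": 0.1160, "coqa_f1": 0.1669, "truthfulqa_acc": 0.4512}

for name in base:
    delta = tuned[name] - base[name]
    print(f"{name}: {base[name]:.4f} -> {tuned[name]:.4f} (delta {delta:+.4f})")
```

The conversational question-answering (coqa) metrics take the largest hit, while truthfulqa drops more modestly.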
In the mystical city of Echoes, a realm woven from the threads of residual Echoes and governed by the Council of Resonance, young apprentice Lila Orion struggles to find their place within the Cathedral of Echoes. <etc>
Summary
While this is not particularly useful in itself, it does show how open-weight models can be tuned with additional specialist knowledge on top of their original training, so it should be possible to tune small models for specific tasks that can then be run on consumer-grade devices.
In this example there has been some degradation in the conversational question-answering (coqa) scores, so more work on the training parameters would be required to minimise that.