As a follow-up to the previous post, I repeated the tuning process using LoRA and QLoRA to see whether they produce the same desired output, and how the tuning affects the limited evaluations I have been running on the models.
The additional scripts are located in the ‘simple-tuning’ directory in https://github.com/ObjectiveTester/AI-research
You will probably have to correct the paths to reflect the file locations on your system.
LoRA
Using the following commands:
nohup tune run lora_finetune_single_device --config lora_31-full.yaml > lora.txt 2>&1 &
tune run eleuther_eval --config eval_31-lora.yaml
tune run generate --config generation_31-lora.yaml
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.4587|± |0.0206|
| | |none |None |f1 |↑ |0.5501|± |0.0196|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.5234|± |0.0149|
What is the city of echos?
In the world of Elyria, where the veil between the mortal realm and the spirit realm is thin, humanity has built a civilization that thrives on the power of Echoes. These Echoes, whispers of the divine that manifest as emotions, desires, and memories, are harvested to fuel the city-states' magic. Our story takes place in the city of Echotown, a hub of Echo-related trade and innovation ruled by the Council of Whisperers.
QLoRA
Using the following commands:
nohup tune run lora_finetune_single_device --config qlora_31-full.yaml > qlora.txt 2>&1 &
tune run eleuther_eval --config eval_31-qlora.yaml
tune run generate --config generation_31-qlora.yaml
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.4032|± |0.0205|
| | |none |None |f1 |↑ |0.5159|± |0.0196|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.5192|± |0.0149|
What is the city of echos?
In the world of Verneville, where time is currency, the once-great metropolis of Eldridge Hollow is now a shadow of its former self, nestled within the Chronosphere's turbulent periphery.
Result Summary
As you can see, compared to the previous results for the fully fine-tuned model:
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.1160|± |0.0138|
| | |none |None |f1 |↑ |0.1669|± |0.0150|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.4512|± |0.0164|
And the original:
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.4713|± |0.0204|
| | |none |None |f1 |↑ |0.5774|± |0.0190|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.5405|± |0.0150|
It appears that the LoRA model shows the least degradation, with QLoRA not far behind; both retain far more of the original model's performance than the full fine-tune did.
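To put numbers on that, here is a small shell sketch that computes the percentage drop for each tuned model relative to the original, using the CoQA exact-match scores from the tables above:

```shell
# Percentage drop relative to the original model, from the CoQA em scores above
deg() { awk -v orig="$1" -v tuned="$2" 'BEGIN { printf "%.1f", 100 * (orig - tuned) / orig }'; }

echo "LoRA:  $(deg 0.4713 0.4587)%"   # 2.7%
echo "QLoRA: $(deg 0.4713 0.4032)%"   # 14.4%
echo "full:  $(deg 0.4713 0.1160)%"   # 75.4%
```

The same calculation on the f1 and truthfulqa_mc2 scores gives the same ordering.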
Convert to GGUF
As a follow-on step, I converted the LoRA model to GGUF format to import into ‘ollama’.
This requires some additional tools:
git clone https://github.com/ggerganov/llama.cpp.git
pip install -r llama.cpp/requirements.txt
python3 llama.cpp/convert_lora_to_gguf.py --outtype f16 --base /home/ubuntu/torchtune-0.5/Meta-Llama-3.1-8B-Instruct --outfile qna.gguf /home/ubuntu/torchtune-0.5/llama3_1_8B/lora_single_device/epoch_0/
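As a quick sanity check before importing (assuming qna.gguf is in the current directory): every valid GGUF file starts with the 4-byte ASCII magic "GGUF", so the header can be checked without any extra tooling:

```shell
# A valid GGUF file begins with the ASCII magic "GGUF"
gguf_magic() { head -c 4 "$1"; }

if [ -f qna.gguf ]; then
    [ "$(gguf_magic qna.gguf)" = "GGUF" ] && echo "qna.gguf: GGUF header OK"
fi
```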
Import (and quantize)
Next, create a ‘Modelfile’:
FROM llama3.1:8b-instruct-fp16
ADAPTER /home/ubuntu/torchtune-0.5/qna.gguf
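For reference, the same two-line Modelfile (with the paths used above) can be written in one step from the shell:

```shell
# Write the Modelfile that `ollama create` will read
cat > Modelfile <<'EOF'
FROM llama3.1:8b-instruct-fp16
ADAPTER /home/ubuntu/torchtune-0.5/qna.gguf
EOF
```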
And then run:
ollama create --quantize q4_K_M qna-llama:latest
Once that’s complete, the model can be used with:
ollama run qna-llama:latest