As a follow-up to the previous post, I repeated the tuning process using LoRA and QLoRA to see whether they produce the same desired output, and how the tuning affects the limited evaluations I have been running on the models.
The additional scripts are located in the ‘simple-tuning’ directory in https://github.com/ObjectiveTester/AI-research
You will probably have to correct the paths to reflect the file locations on your system.
LoRA
Using the following commands:
nohup tune run lora_finetune_single_device --config lora_31-full.yaml > lora.txt 2>&1 &
tune run eleuther_eval --config eval_31-lora.yaml
tune run generate --config generation_31-lora.yaml
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.4587|± |0.0206|
| | |none |None |f1 |↑ |0.5501|± |0.0196|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.5234|± |0.0149|
What is the city of echos?
In the world of Elyria, where the veil between the mortal realm and the spirit realm is thin, humanity has built a civilization that thrives on the power of Echoes. These Echoes, whispers of the divine that manifest as emotions, desires, and memories, are harvested to fuel the city-states' magic. Our story takes place in the city of Echotown, a hub of Echo-related trade and innovation ruled by the Council of Whisperers.
QLoRA
Using the following commands:
nohup tune run lora_finetune_single_device --config qlora_31-full.yaml > qlora.txt 2>&1 &
tune run eleuther_eval --config eval_31-qlora.yaml
tune run generate --config generation_31-qlora.yaml
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.4032|± |0.0205|
| | |none |None |f1 |↑ |0.5159|± |0.0196|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.5192|± |0.0149|
What is the city of echos?
In the world of Verneville, where time is currency, the once-great metropolis of Eldridge Hollow is now a shadow of its former self, nestled within the Chronosphere's turbulent periphery.
Result Summary
As you can see, compared to the previous results for the fully fine-tuned model:
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.1160|± |0.0138|
| | |none |None |f1 |↑ |0.1669|± |0.0150|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.4512|± |0.0164|
And the original:
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|--------------|------:|------|------|------|---|-----:|---|-----:|
|coqa | 3|none |None |em |↑ |0.4713|± |0.0204|
| | |none |None |f1 |↑ |0.5774|± |0.0190|
|truthfulqa_mc2| 2|none |0 |acc |↑ |0.5405|± |0.0150|
It appears that the LoRA model shows the least degradation, with QLoRA not far behind; both retain far more of the original model's performance than the full fine-tune did.
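To put numbers on that, here is a small shell sketch that computes the percentage drop for each tuned model relative to the original, using the CoQA exact-match scores from the tables above:

```shell
# Percentage drop relative to the original model, from the CoQA em scores above
deg() { awk -v orig="$1" -v tuned="$2" 'BEGIN { printf "%.1f", 100 * (orig - tuned) / orig }'; }

echo "LoRA:  $(deg 0.4713 0.4587)%"   # 2.7%
echo "QLoRA: $(deg 0.4713 0.4032)%"   # 14.4%
echo "full:  $(deg 0.4713 0.1160)%"   # 75.4%
```

The same calculation on the f1 and truthfulqa_mc2 scores gives the same ordering.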
Convert to GGUF
As a follow-on step, I converted the LoRA model to GGUF format to import into ‘ollama’.
This requires some additional tools:
git clone https://github.com/ggerganov/llama.cpp.git
pip install -r llama.cpp/requirements.txt
python3 llama.cpp/convert_lora_to_gguf.py --outtype f16 --base /home/ubuntu/torchtune-0.5/Meta-Llama-3.1-8B-Instruct --outfile qna.gguf /home/ubuntu/torchtune-0.5/llama3_1_8B/lora_single_device/epoch_0/
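As a quick sanity check before importing (assuming qna.gguf is in the current directory): every valid GGUF file starts with the 4-byte ASCII magic "GGUF", so the header can be checked without any extra tooling:

```shell
# A valid GGUF file begins with the ASCII magic "GGUF"
gguf_magic() { head -c 4 "$1"; }

if [ -f qna.gguf ]; then
    [ "$(gguf_magic qna.gguf)" = "GGUF" ] && echo "qna.gguf: GGUF header OK"
fi
```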
Import (and quantize)
Next, create a ‘Modelfile’:
FROM llama3.1:8b-instruct-fp16
ADAPTER /home/ubuntu/torchtune-0.5/qna.gguf
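For reference, the same two-line Modelfile (with the paths used above) can be written in one step from the shell:

```shell
# Write the Modelfile that `ollama create` will read
cat > Modelfile <<'EOF'
FROM llama3.1:8b-instruct-fp16
ADAPTER /home/ubuntu/torchtune-0.5/qna.gguf
EOF
```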
And then run:
ollama create --quantize q4_K_M qna-llama:latest
Once that’s complete, the model can be used with:
ollama run qna-llama:latest