
Build a Custom AI Model This Week: A Practical Unsloth Guide

Greg (Zvi) Uretzky

Founder & Full-Stack Developer


You need a language model that understands your business data. You can't use ChatGPT because your data is private. Generic models make too many mistakes in your domain.

By the end of this guide, you'll have a fine-tuned language model running locally. It will answer questions based on your custom dataset. You'll also know whether to scale this approach or try something else.

What You Need Before Starting

Time: 90-120 minutes for first run
Skills: Basic Python, familiarity with command line
Hardware: A computer with an NVIDIA GPU (8GB VRAM minimum) or access to Google Colab Pro
Software: Python 3.9+, Git

If you don't have a GPU, use Google Colab's free tier. The training will be slower but still work.
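
Not sure whether your GPU qualifies? A quick check like the sketch below reports the detected device and its VRAM. It assumes PyTorch is already installed (true on Colab; locally you may need to install it first, as covered in Step 1):

# check_gpu.py - quick hardware sanity check before installing Unsloth
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    if vram_gb < 8:
        print("Under 8GB VRAM - expect out-of-memory errors; consider Colab.")
else:
    print("No CUDA-capable GPU detected - use Google Colab instead.")

# Run it: python check_gpu.py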

Step-by-Step Implementation Guide

Step 1: Set Up Your Environment

Open your terminal and run these commands:

# Create a new project directory
mkdir custom-ai-model && cd custom-ai-model

# Create a virtual environment
python -m venv unsloth-env
source unsloth-env/bin/activate  # On Windows: unsloth-env\Scripts\activate

# Install Unsloth and dependencies
pip install "unsloth[colab]@git+https://github.com/unslothai/unsloth.git"
pip install trl transformers datasets

Gotcha: If you get CUDA errors, check your PyTorch installation. Unsloth needs PyTorch with CUDA support.
Run pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 if needed.

Step 2: Prepare Your Training Data

# Create a file called prepare_data.py:

from datasets import Dataset

# Your custom data - replace with your actual business data
examples = [
    {
        "instruction": "What is our return policy for electronics?",
        "input": "",
        "output": "Electronics can be returned within 30 days with original packaging. No restocking fee applies."
    },
    {
        "instruction": "How do I reset my account password?",
        "input": "",
        "output": "Go to settings > security > password reset. You'll receive an email with a link valid for 24 hours."
    },
    # Add 50-100 more examples for decent results
]

# Combine each example into the single prompt/response string the model
# trains on. The format matches the prompt used later in test_model.py.
def format_example(example):
    example["text"] = (
        "Below is an instruction. Write a response.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )
    return example

# Convert to Hugging Face dataset format
dataset = Dataset.from_list(examples).map(format_example)
dataset.save_to_disk("./my_training_data")
print(f"Saved {len(dataset)} examples")


# Run it: python prepare_data.py

Troubleshooting Tip: Start with 50-100 examples. More data gives better results but takes longer to train. Use your actual customer support logs, documentation, or product specifications.
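
If your examples already live in a spreadsheet or CSV export, a short conversion script beats copying them by hand. The sketch below assumes a hypothetical support_log.csv with question and answer columns; rename them to match your export:

# csv_to_examples.py - build the examples list from a CSV export
import csv
import json

examples = []
with open("support_log.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        examples.append({
            "instruction": row["question"].strip(),
            "input": "",
            "output": row["answer"].strip(),
        })

# Save as JSON so prepare_data.py can load it instead of the hard-coded list
with open("examples.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, indent=2)

print(f"Converted {len(examples)} rows")

In prepare_data.py you could then replace the hard-coded list with examples = json.load(open("examples.json")).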

Step 3: Configure and Run Training

# Create train_model.py:

from unsloth import FastLanguageModel
import torch
from datasets import load_from_disk
from trl import SFTTrainer
from transformers import TrainingArguments

# Create train_model.py:

from unsloth import FastLanguageModel
import torch
from datasets import load_from_disk
from trl import SFTTrainer
from transformers import TrainingArguments

# Load a small, efficient model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = 2048,
    dtype = torch.float16,
    load_in_4bit = True,  # Saves 75% memory
)

# Enable LoRA for faster training with less memory
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 3407,
)

# Load your data
train_dataset = load_from_disk("./my_training_data")

# Configure training
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",  # the combined prompt/response field created in prepare_data.py
    max_seq_length = 1024,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,  # Start small
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

# Train
trainer.train()

# Save the model
model.save_pretrained("my_fine_tuned_model")
tokenizer.save_pretrained("my_fine_tuned_model")
print("Training complete! Model saved to 'my_fine_tuned_model'")

# Run training: python train_model.py

Expected Output: You'll see loss decreasing every few steps. Training 60 steps on 100 examples takes about 15-20 minutes on an RTX 4090.

Gotcha: If you run out of memory, reduce per_device_train_batch_size to 1. If training is too slow, reduce max_seq_length to 512.
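
As a concrete example, a lower-memory variant of the training arguments might look like the sketch below; doubling gradient_accumulation_steps keeps the effective batch size at 8, so the learning dynamics stay roughly the same (these exact numbers are suggestions, not requirements):

# Lower-memory training arguments for GPUs near the 8GB limit
from transformers import TrainingArguments
import torch

low_memory_args = TrainingArguments(
    per_device_train_batch_size = 1,   # halved from 2
    gradient_accumulation_steps = 8,   # doubled from 4: effective batch size stays at 8
    warmup_steps = 10,
    max_steps = 60,
    learning_rate = 2e-4,
    fp16 = not torch.cuda.is_bf16_supported(),
    bf16 = torch.cuda.is_bf16_supported(),
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 3407,
    output_dir = "outputs",
)
# Pass low_memory_args as args= in the SFTTrainer call,
# and set max_seq_length = 512 there as well.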

Step 4: Test Your Model

# Create test_model.py:
from unsloth import FastLanguageModel
import torch

# Load your fine-tuned model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./my_fine_tuned_model",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)

FastLanguageModel.for_inference(model)

# Create a prompt
test_question = "What is our return policy for electronics?"
inputs = tokenizer(
    [f"<s>Below is an instruction. Write a response.\\n\\n### Instruction:\\n{test_question\}\\n\\n### Response:\\n"],
    return_tensors = "pt",
).to("cuda")

# Generate response
outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)
response = tokenizer.batch_decode(outputs)[0]

# Extract just the answer
answer_start = response.find("### Response:\n") + len("### Response:\n")
clean_answer = response[answer_start:].strip()
print(f"Question: {test_question}")
print(f"Answer: {clean_answer}")


# Run it: python test_model.py

Success Metric: The model should answer with something similar to your training data. It won't be perfect on the first try.

What to Watch Out For

  1. Data Quality Dictates Results. Your model will only be as good as your training examples. If your data has contradictions or errors, the model will learn them. Start with a clean, consistent dataset of 50-100 high-quality examples before scaling up.
  2. This Is a Prototype, Not Production. The model you just created works for testing concepts. For production use, you need more data (500-1000+ examples), proper evaluation metrics, and deployment infrastructure. Don't deploy this version to customers.
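
For an early signal on quality, a rough spot-check like the sketch below is enough: hold back a few question/answer pairs that were not in my_training_data (the pair shown is only a placeholder) and count how often a key phrase from the expected answer appears in the model's output. This is not a proper evaluation metric, just a cheap sanity check before you invest in more data:

# evaluate_model.py - rough spot-check against held-out examples
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./my_fine_tuned_model",
    max_seq_length = 1024,
    dtype = torch.float16,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)

# Replace with pairs you deliberately kept OUT of the training data
held_out = [
    {"instruction": "Can I return a laptop after six weeks?",
     "keyword": "30 days"},
]

hits = 0
for example in held_out:
    prompt = ("Below is an instruction. Write a response.\n\n"
              f"### Instruction:\n{example['instruction']}\n\n### Response:\n")
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=100, use_cache=True)
    answer = tokenizer.batch_decode(outputs)[0].split("### Response:\n")[-1]
    if example["keyword"].lower() in answer.lower():
        hits += 1
    print(f"Q: {example['instruction']}\nA: {answer.strip()}\n")

print(f"Keyword hit rate: {hits}/{len(held_out)}")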

Your Next Move

Run this guide once with sample data to understand the workflow. Time yourself. Note where you hit bottlenecks.

Then decide: if the prototype shows promise, allocate 40 hours to prepare 500+ high-quality training examples and run proper training. If results are weak after 100 examples, reconsider whether fine-tuning is the right approach for your use case.

Visit Unsloth's official site for advanced features like multi-GPU training and Hugging Face Jobs integration when you're ready to scale.

Tags: fine-tune AI model, private language model, Unsloth tutorial, custom AI training, CTO AI implementation

