Fine-tuning
Fine-tuning is the process of taking a pre-trained AI model and training it further on a smaller, domain-specific dataset so it performs better on a particular task. It adapts a general model to your specific language, formats, and business logic.
What is Fine-tuning?
A foundation model like GPT-4 or Llama is trained on enormous volumes of general text. It knows a lot about language, logic, and common patterns — but it does not know your industry's terminology, your document formats, or your specific classification rules. Fine-tuning closes that gap. You take the pre-trained model and run an additional training pass on a curated dataset of examples specific to your use case, adjusting the model's internal weights to reflect the new domain.
The result is a model that retains the general capabilities of the base model but performs significantly better on the target task — more accurate extractions, more consistent classifications, better adherence to domain-specific rules.
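To make "adjusting the model's internal weights" concrete, here is a minimal sketch under heavy simplifying assumptions: a one-feature linear model stands in for the foundation model, and the "pre-trained" weights and the domain data are invented for illustration. Real fine-tuning runs on neural networks with dedicated training libraries, but the core idea is the same: start from existing weights and continue gradient descent on a small, task-specific dataset.

```python
# Toy illustration only: "fine-tuning" as continued gradient descent that
# starts from pre-trained weights instead of from scratch.

def predict(weights, x):
    w, b = weights
    return w * x + b

def mse(weights, data):
    """Mean squared error of the model on a list of (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def fine_tune(weights, data, lr=0.05, epochs=500):
    """Run additional gradient-descent passes over a small domain dataset."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

# Hypothetical "pre-trained" weights, as if learned on general data (y = 2x).
pretrained = (2.0, 0.0)

# Hypothetical domain data that follows a different rule (y = 2.5x + 1).
domain_data = [(x, 2.5 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]

loss_before = mse(pretrained, domain_data)
fine_tuned = fine_tune(pretrained, domain_data)
loss_after = mse(fine_tuned, domain_data)

print(f"loss before fine-tuning: {loss_before:.4f}")
print(f"loss after fine-tuning:  {loss_after:.4f}")
```

The fine-tuned weights begin at the pre-trained values and move only as far as the domain data requires, which is why a small dataset can be enough to shift behaviour on the target task.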
Fine-tuning vs. Prompt Engineering
These are the two main levers for adapting a model's behaviour, and they serve different purposes:
Prompt engineering changes the instructions you give the model at runtime. It needs no retraining and allows fast iteration, but it is limited by the context window size, and results can become inconsistent at high volume.
Fine-tuning changes the model itself. It requires labelled data and compute, but it produces more reliable and consistent results for high-volume, repetitive tasks.
The practical decision: use prompt engineering first. If consistency or accuracy remains insufficient at the volumes you need, fine-tuning is the next step.
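One way to put this decision into practice is to measure a prompt-only baseline on a labelled sample before committing to fine-tuning. The sketch below is an illustration, not a standard procedure: the function names, the sample labels, and the target thresholds (95% accuracy, 98% consistency) are all assumptions you would replace with your own requirements.

```python
def accuracy(predictions, labels):
    """Share of predictions that match the human-assigned label."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

def consistency(repeated_runs):
    """Share of inputs for which every repeated run returned the same output."""
    stable = sum(len(set(outputs)) == 1 for outputs in repeated_runs)
    return stable / len(repeated_runs)

def next_step(acc, cons, target_acc=0.95, target_cons=0.98):
    """Hypothetical decision rule; the thresholds are assumptions."""
    if acc >= target_acc and cons >= target_cons:
        return "keep prompt engineering"
    return "consider fine-tuning"

# Hypothetical results from a prompt-only baseline on four documents.
labels = ["invoice", "customs", "invoice", "report"]
predictions = ["invoice", "customs", "report", "report"]

# The same four documents, each sent through the same prompt three times.
repeated_runs = [
    ["invoice", "invoice", "invoice"],
    ["customs", "customs", "customs"],
    ["report", "invoice", "report"],
    ["report", "report", "report"],
]

acc = accuracy(predictions, labels)
cons = consistency(repeated_runs)
decision = next_step(acc, cons)
print(f"accuracy={acc:.2f} consistency={cons:.2f} -> {decision}")
```

In a real evaluation the sample would be far larger than four documents, and it should reflect the mix of formats and languages you expect in production.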
Fine-tuning in Operations
Fine-tuning becomes worth the investment when you are processing thousands of documents per month that follow consistent but non-standard formats: supplier invoices in multiple languages, customs declarations with industry-specific codes, production reports with proprietary field names. A model fine-tuned on 500 labelled examples of your actual documents will typically outperform a general model that relies on prompt engineering alone, although the margin depends on the task and on the quality of the labels. The upfront cost is data labelling and training time. The return is fewer errors and less human review at scale.
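Most of the upfront labelling work ends up as a file of input and output pairs. The sketch below shows one possible way to turn labelled documents into JSONL training records with a held-out validation set. The prompt/completion record layout, the field names, and the sample invoices are assumptions for illustration; the exact schema depends on the fine-tuning tool or provider you use, so check its documentation before preparing real data.

```python
import json
import random

def to_record(document_text, fields):
    """Build one training example (assumed prompt/completion layout)."""
    return {
        "prompt": f"Extract the fields from this document:\n{document_text}",
        "completion": json.dumps(fields, sort_keys=True),
    }

def split_records(records, validation_share=0.1, seed=0):
    """Shuffle reproducibly and hold out a share of records for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_share))
    return shuffled[n_val:], shuffled[:n_val]

def to_jsonl(records):
    """Serialise records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

# Hypothetical labelled examples; in practice these are your real documents
# paired with the field values a human reviewer confirmed.
labelled = [
    (
        f"Invoice no. {n} total {n * 10}.00 EUR",
        {"invoice_number": str(n), "total": f"{n * 10}.00", "currency": "EUR"},
    )
    for n in range(1, 11)
]

records = [to_record(text, fields) for text, fields in labelled]
train, validation = split_records(records)
train_jsonl = to_jsonl(train)

print(f"{len(train)} training records, {len(validation)} validation records")
print(train_jsonl.splitlines()[0])
```

Holding out a validation set matters because it is the only way to check that the fine-tuned model actually beats the prompt-only baseline on documents it has not seen.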