{"id":782,"date":"2026-04-06T04:32:41","date_gmt":"2026-04-06T04:32:41","guid":{"rendered":"https:\/\/techpaathshala.com\/blog\/?p=782"},"modified":"2026-07-27T08:26:34","modified_gmt":"2026-07-27T08:26:34","slug":"llm-fine-tuning-for-developers-a-practical-guide-2026","status":"publish","type":"post","link":"https:\/\/techpaathshala.com\/blog\/llm-fine-tuning-for-developers-a-practical-guide-2026\/","title":{"rendered":"LLM Fine-Tuning for Developers \u2014 A Practical Guide (2026)"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">At some point in almost every serious AI engineering project, a developer hits the same wall.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The LLM is smart. It understands the domain. It answers questions correctly most of the time. But it does not quite <em>sound<\/em> like the product. It formats its responses inconsistently. It occasionally drifts outside the boundaries you need it to stay within. You have tried longer system prompts. You have tried few-shot examples. You have tried temperature adjustments. The outputs are better \u2014 but not reliably, consistently, production-grade better.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is the moment when fine-tuning enters the conversation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning is the process of taking a pre-trained LLM and further training it on a smaller, curated dataset specific to your use case \u2014 so that the model&#8217;s behaviour, tone, format, and domain understanding are shaped to your exact requirements. Done correctly, it is one of the most powerful tools in an AI engineer&#8217;s kit. 
Done incorrectly, it is expensive, time-consuming, and produces a model that performs worse than the base model it started from.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This guide gives you the practical map: what fine-tuning is, when it is the right solution (and when it is not), how the process works end-to-end, which techniques matter in 2026, and how to start building this skill in a way that is immediately relevant to your career.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">What Fine-Tuning Actually Is<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Pre-training a large language model from scratch \u2014 the process that creates GPT-4, Claude, or Llama \u2014 requires hundreds of billions to trillions of tokens of text data, thousands of GPUs running for months, and compute budgets measured in tens of millions of dollars. It is done by AI labs, not by development teams.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning starts from the result of that process. You take a model that has already learned language, reasoning, and general world knowledge from pre-training, and you train it further on a much smaller, task-specific dataset. The model&#8217;s weights are adjusted \u2014 not reset \u2014 so it retains its general intelligence while developing specialised behaviour in the area you are training it on.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Think of it this way. Pre-training is a university education \u2014 broad, comprehensive, years of accumulated knowledge. 
Fine-tuning is the professional apprenticeship that follows \u2014 specific, applied, shaped by the exact context of the role.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The dataset you fine-tune on is typically composed of <strong>input-output pairs<\/strong>: examples of the exact type of prompt the model will receive in production, paired with the exact type of response you want it to produce. The model learns, from these examples, what good behaviour looks like for your specific use case.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What Fine-Tuning Changes<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning adjusts:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Style and tone<\/strong> \u2014 how the model sounds (formal, conversational, technical, empathetic)<\/li>\n\n\n\n<li><strong>Output format<\/strong> \u2014 whether responses are in JSON, markdown, bullet points, or specific templates<\/li>\n\n\n\n<li><strong>Domain specialisation<\/strong> \u2014 how the model reasons about and discusses a specific subject area<\/li>\n\n\n\n<li><strong>Instruction following<\/strong> \u2014 how reliably the model follows complex, multi-constraint instructions<\/li>\n\n\n\n<li><strong>Boundary adherence<\/strong> \u2014 how consistently the model stays within defined scope<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning does <strong>not<\/strong> reliably:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Add new factual knowledge the base model does not have (use RAG for this)<\/li>\n\n\n\n<li>Fix fundamental reasoning limitations of the base model<\/li>\n\n\n\n<li>Make a smaller model perform like a larger one on complex tasks<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">This distinction \u2014 fine-tuning for behaviour, RAG for knowledge \u2014 is the most important conceptual clarity to develop before deciding whether fine-tuning is the right solution for your problem.<\/p>\n\n\n\n<hr class=\"wp-block-separator 
has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">When to Fine-Tune: The Decision Framework<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning is the right solution for a specific and identifiable set of problems. Using it for the wrong problem wastes significant time and compute. Before committing to a fine-tuning project, work through this decision framework honestly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">First, Exhaust Prompt Engineering<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Before fine-tuning, you should have genuinely tried prompt engineering at a serious level. Not just a basic system prompt \u2014 a carefully structured system prompt with explicit instructions, format specifications, and 3\u20135 few-shot examples of exactly the behaviour you want.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If well-engineered prompting with few-shot examples produces the behaviour you need reliably, fine-tuning is unnecessary overhead. Prompting is cheaper, faster to iterate, and easier to update.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If you have done this work honestly and the model still produces inconsistent output \u2014 wrong format some percentage of the time, tone drift under certain conditions, unreliable instruction following \u2014 then fine-tuning is worth evaluating.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Fine-Tuning Decision Checklist<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning is likely the right solution if:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2611 You have a highly specific, consistent output format requirement<\/strong> \u2014 JSON schemas, domain-specific templates, structured clinical notes, legal document formats \u2014 that few-shot prompting cannot make reliable at production scale.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2611 You need to shorten prompts significantly at inference time<\/strong> \u2014 Every inference call with a 2,000-token system prompt 
plus 5 few-shot examples costs significantly more than a call with a 200-token system prompt to a fine-tuned model that already knows the behaviour. At high volume, fine-tuning pays for itself in inference cost reduction.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2611 You are working in a highly specialised domain<\/strong> \u2014 Medical coding, legal contract analysis, financial instrument classification, Marathi-language customer service \u2014 domains with specific vocabulary, reasoning patterns, and output conventions that general-purpose prompting struggles to capture consistently.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2611 You need the model to follow complex multi-constraint instructions reliably<\/strong> \u2014 When &#8220;always do X, never do Y, format as Z, only discuss topics A and B&#8221; needs to be followed perfectly across thousands of calls, fine-tuning on examples of correct behaviour is more reliable than hoping the system prompt is always parsed correctly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2611 You have good training data<\/strong> \u2014 At least 50\u2013100 high-quality examples for simple tasks; 500\u20131,000+ for complex behaviour. 
If you do not have this data yet, building the dataset is the first and most important step.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning is likely <strong>not<\/strong> the right solution if:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2612 The problem is that the model does not know your proprietary data<\/strong> \u2014 This is a RAG problem, not a fine-tuning problem.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2612 You do not have a consistent evaluation metric<\/strong> \u2014 If you cannot define what &#8220;better&#8221; means and measure it objectively, you cannot know whether your fine-tuned model is actually better than the base model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2612 Your use case is changing rapidly<\/strong> \u2014 Fine-tuning produces a static model. If the desired behaviour evolves frequently, re-fine-tuning repeatedly is operationally expensive. Prompting is easier to update.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>\u2612 You have fewer than 50 high-quality examples<\/strong> \u2014 Fine-tuning on too little data produces a model that memorises examples rather than learning the pattern. Collect more data first.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Fine-Tuning Landscape in 2026: Techniques That Matter<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The fine-tuning landscape has evolved significantly. In 2026, these are the techniques that are practically relevant for most development teams.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Full Fine-Tuning<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The original approach: update all of the model&#8217;s weights on your training dataset. Every parameter in the model is adjusted.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>When it applies:<\/strong> Large organisations with significant compute budgets and the need to fundamentally reshape model behaviour. 
Rarely practical for development teams working on product features.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>The constraint:<\/strong> Fully fine-tuning a 7B parameter model requires substantial GPU resources, typically multiple A100 or H100 GPUs running for hours to days. For teams without dedicated ML infrastructure, this is a significant barrier.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">LoRA \u2014 Low-Rank Adaptation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">LoRA is the technique that made fine-tuning accessible to development teams without research-lab compute budgets. It is the most practically important fine-tuning technique to understand in 2026.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>How it works:<\/strong> Instead of updating all of the model&#8217;s weights, LoRA freezes the original model weights and trains two small additional matrices that are multiplied together to produce a low-rank approximation of the weight updates. These adapter matrices are tiny relative to the full model \u2014 typically less than 1% of the total parameter count.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At inference time, the LoRA adapter weights can be merged back into the original model weights, after which the model runs at the same speed as the base model. Once merged, there is no inference latency penalty for using LoRA.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Why this matters:<\/strong> LoRA makes it possible to fine-tune a 7B parameter model on a single GPU (e.g., a 24 GB cloud card such as an A10G, or even a high-VRAM consumer card). The compute cost drops from &#8220;research lab&#8221; to &#8220;accessible startup.&#8221; The training time drops from days to hours.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>The key hyperparameters:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>r<\/code> (rank): Controls the size of the adapter matrices. 
Higher rank = more capacity to learn = more compute and memory. Start with <code>r=16<\/code> and adjust based on task complexity.<\/li>\n\n\n\n<li><code>alpha<\/code>: Scaling factor. Typically set to <code>2 * r<\/code> as a starting default.<\/li>\n\n\n\n<li><code>target_modules<\/code>: Which layers to apply LoRA to. For most transformer models, targeting the attention layers (<code>q_proj<\/code>, <code>v_proj<\/code>) is the standard starting point.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">QLoRA \u2014 Quantised Low-Rank Adaptation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">QLoRA combines LoRA with model quantisation \u2014 representing model weights in lower precision (4-bit instead of 16-bit or 32-bit) to reduce memory requirements dramatically.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The practical result: you can fine-tune a 13B or even 70B parameter model on a single GPU that would otherwise require multiple GPUs for standard LoRA. Memory requirements drop by approximately 4x compared to 16-bit LoRA.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">QLoRA is the technique that has made fine-tuning large open-source models (Llama 3, Mistral, Falcon) accessible to developers working on consumer-grade or single-node cloud GPU instances.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>When to use QLoRA over LoRA:<\/strong> When you are working with larger models (13B+) and GPU memory is the primary constraint. 
The trade-off is somewhat slower training than standard LoRA, because the 4-bit weights must be dequantised on the fly during each pass. For most practical use cases the difference in output quality is negligible.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">RLHF and DPO \u2014 For Alignment Tasks<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>RLHF (Reinforcement Learning from Human Feedback)<\/strong> is the technique used to align base models to be helpful, harmless, and honest \u2014 the process behind models like ChatGPT and Claude. It involves human raters evaluating model outputs, training a reward model on those ratings, and using reinforcement learning to optimise the LLM toward higher-reward outputs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>DPO (Direct Preference Optimisation)<\/strong> is a more recent, simpler alternative to RLHF that can achieve similar alignment results without a separate reward model. Given pairs of preferred and rejected responses for the same prompt, DPO directly optimises the model to favour the preferred response over the rejected one.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Relevance for most development teams:<\/strong> RLHF is complex to implement correctly and requires significant human rating infrastructure. DPO is more accessible and worth understanding for teams working on AI assistant products where response quality alignment is a core concern. For most product feature fine-tuning (format, style, domain), standard supervised fine-tuning on input-output pairs is sufficient without alignment techniques.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Fine-Tuning Pipeline: End-to-End<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Step 1: Define the Task and Success Metrics Precisely<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Before touching data or code, write down in one paragraph exactly what you want the fine-tuned model to do differently from the base model. 
Be specific. Not &#8220;answer customer queries better&#8221; but &#8220;respond to customer support queries in JSON format with fields: <code>issue_category<\/code>, <code>resolution_steps<\/code>, <code>escalate_flag<\/code>, <code>tone<\/code>, following the tone guidelines in our brand voice document, and never discussing competitor products.&#8221;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Then define how you will measure success. What is your evaluation metric? Response format accuracy (is the JSON always valid and complete)? BLEU or ROUGE score against gold-standard responses? Human evaluation by domain experts? A classifier that scores outputs on specific dimensions?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Without a clear metric, you cannot know whether your fine-tuned model is better than the base model. This step is the most important and the most frequently skipped.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 2: Build the Training Dataset<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Dataset quality is the dominant factor in fine-tuning success. A small, high-quality dataset outperforms a large, mediocre one consistently.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Dataset format:<\/strong> For supervised fine-tuning (the most common approach), you need input-output pairs. Each example has a prompt (what the model receives) and a completion (the ideal response you want the model to produce).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For chat-based fine-tuning (fine-tuning a conversational model), examples are structured as conversation turns:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\n  \"messages\": &#091;\n    {\n      \"role\": \"system\",\n      \"content\": \"You are a customer support assistant for an Indian e-commerce platform. 
Always respond in the following JSON format: {\\\"category\\\": \\\"\\\", \\\"response\\\": \\\"\\\", \\\"escalate\\\": false}\"\n    },\n    {\n      \"role\": \"user\",\n      \"content\": \"My order was supposed to arrive yesterday and it still hasn't come. Order ID is #45821.\"\n    },\n    {\n      \"role\": \"assistant\",\n      \"content\": \"{\\\"category\\\": \\\"delayed_delivery\\\", \\\"response\\\": \\\"I apologise for the delay on order #45821. I have flagged this for our logistics team and you will receive an update within 2 hours. If the order does not arrive by end of day, please contact us for a replacement or refund.\\\", \\\"escalate\\\": false}\"\n    }\n  ]\n}\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Dataset size guidelines:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Simple format or style changes: 50\u2013200 high-quality examples<\/li>\n\n\n\n<li>Domain adaptation (specialised vocabulary and reasoning): 500\u20132,000 examples<\/li>\n\n\n\n<li>Complex multi-constraint behaviour: 1,000\u20135,000+ examples<\/li>\n\n\n\n<li>Fundamental behaviour changes: 5,000+ examples<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Data quality checklist:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Every example reflects exactly the behaviour you want \u2014 no &#8220;mostly correct&#8221; examples that could teach bad habits<\/li>\n\n\n\n<li>Examples cover the full range of input variations the model will see in production<\/li>\n\n\n\n<li>Edge cases and tricky inputs are represented \u2014 not just the easy cases<\/li>\n\n\n\n<li>A human expert has reviewed every example, not just a sample<\/li>\n\n\n\n<li>No duplicate or near-duplicate examples that could cause memorisation<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Generating synthetic training data:<\/strong> When you do not have enough real examples, you can use a high-quality LLM (GPT-4o, Claude 3.5 Sonnet) to generate 
synthetic examples. Give it your task definition, a few seed examples, and ask it to generate variations. Always review synthetic data before using it \u2014 LLM-generated training data can introduce subtle errors that propagate into your fine-tuned model.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 3: Choose Your Base Model<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The base model you fine-tune on determines the ceiling of what your fine-tuned model can achieve. Fine-tuning does not make a weak model strong \u2014 it shapes the behaviour of an already capable model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>OpenAI fine-tuning API (GPT-3.5 Turbo, GPT-4o Mini):<\/strong> The most accessible entry point for developers without ML infrastructure. Upload your JSONL training file via API, trigger training, and receive a fine-tuned model endpoint. No GPU management, no infrastructure setup. Best for teams that want to fine-tune without managing compute.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Llama 3 (Meta, open-source):<\/strong> The open-source standard in 2026. Available in 8B and 70B parameter sizes. Fine-tuning Llama 3 on your own infrastructure means no per-token API costs at inference time and no data leaving your environment \u2014 critical for applications handling sensitive financial, medical, or legal data. Requires GPU access (cloud or on-premises).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Mistral models (Mistral AI):<\/strong> Strong performance per parameter, open weights, well-supported by the fine-tuning ecosystem. A strong alternative to Llama 3 for teams that want open-source flexibility.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Gemma (Google, open-source):<\/strong> Google&#8217;s open-source model family. Particularly strong on code and multilingual tasks. 
Worth evaluating for applications with Hindi or Marathi language requirements.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Choosing between managed API and open-source:<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th><\/th><th><strong>Managed API (OpenAI)<\/strong><\/th><th><strong>Open-Source (Llama\/Mistral)<\/strong><\/th><\/tr><\/thead><tbody><tr><td>Infrastructure<\/td><td>None required<\/td><td>GPU access required<\/td><\/tr><tr><td>Data privacy<\/td><td>Data sent to OpenAI<\/td><td>Fully on-premises option<\/td><\/tr><tr><td>Inference cost<\/td><td>Per-token pricing<\/td><td>Fixed infrastructure cost<\/td><\/tr><tr><td>Flexibility<\/td><td>Limited to available base models<\/td><td>Any open-source model<\/td><\/tr><tr><td>Barrier to entry<\/td><td>Very low<\/td><td>Moderate to high<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 4: Train the Model<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Using the OpenAI API (simplest path):<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from openai import OpenAI\n\nclient = OpenAI()\n\n# Upload training file\nwith open(\"training_data.jsonl\", \"rb\") as f:\n    response = client.files.create(file=f, purpose=\"fine-tune\")\n    file_id = response.id\n\nprint(f\"Training file uploaded: {file_id}\")\n\n# Start fine-tuning job\njob = client.fine_tuning.jobs.create(\n    training_file=file_id,\n    model=\"gpt-4o-mini-2024-07-18\",\n    hyperparameters={\n        \"n_epochs\": 3,          # start with 3, adjust based on results\n        \"batch_size\": \"auto\",\n        \"learning_rate_multiplier\": \"auto\"\n    }\n)\n\nprint(f\"Fine-tuning job started: {job.id}\")\n\n# Check job status\njob_status = client.fine_tuning.jobs.retrieve(job.id)\nprint(f\"Status: {job_status.status}\")\nprint(f\"Fine-tuned model: 
{job_status.fine_tuned_model}\")\n<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Using Hugging Face + LoRA for open-source models:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments\nfrom peft import LoraConfig, get_peft_model\nfrom trl import SFTTrainer\nfrom datasets import load_dataset\n\n# Load base model and tokenizer\nmodel_name = \"meta-llama\/Meta-Llama-3-8B-Instruct\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(\n    model_name,\n    load_in_4bit=True,          # QLoRA: 4-bit quantisation\n    device_map=\"auto\"\n)\n\n# Configure LoRA\nlora_config = LoraConfig(\n    r=16,                        # rank\n    lora_alpha=32,               # scaling factor\n    target_modules=&#091;\"q_proj\", \"v_proj\"],  # attention layers\n    lora_dropout=0.05,\n    bias=\"none\",\n    task_type=\"CAUSAL_LM\"\n)\n\nmodel = get_peft_model(model, lora_config)\nmodel.print_trainable_parameters()  # verify: should be &lt; 1% of total params\n\n# Load your dataset\ndataset = load_dataset(\"json\", data_files=\"training_data.jsonl\")\n\n# Training configuration\ntraining_args = TrainingArguments(\n    output_dir=\".\/fine_tuned_model\",\n    num_train_epochs=3,\n    per_device_train_batch_size=4,\n    gradient_accumulation_steps=4,\n    learning_rate=2e-4,\n    fp16=True,\n    logging_steps=10,\n    save_strategy=\"epoch\"\n)\n\n# Train\ntrainer = SFTTrainer(\n    model=model,\n    args=training_args,\n    train_dataset=dataset&#091;\"train\"],\n    tokenizer=tokenizer,\n    dataset_text_field=\"text\"\n)\n\ntrainer.train()\ntrainer.save_model(\".\/fine_tuned_model\")\n<\/code><\/pre>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 5: Evaluate Rigorously<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This is the step most developers rush \u2014 and where the 
most value is lost.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Never deploy a fine-tuned model on subjective impression alone. Use a held-out evaluation dataset (examples not in the training set) and measure your defined success metrics objectively.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Automated evaluation metrics:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Format accuracy:<\/strong> What percentage of responses match the required format exactly? (JSON validity, required fields present, no prohibited content)<\/li>\n\n\n\n<li><strong>ROUGE-L:<\/strong> Measures longest-common-subsequence overlap between generated and reference responses. Useful for summarisation and structured output tasks.<\/li>\n\n\n\n<li><strong>Exact match:<\/strong> For classification or extraction tasks where there is one correct answer.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Human evaluation:<\/strong> For tasks where quality cannot be fully captured by automated metrics (tone, helpfulness, naturalness), human evaluation remains the gold standard. Have domain experts rate a sample of fine-tuned model responses against base model responses on defined dimensions. Even 50\u2013100 evaluated examples produce a meaningful signal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>The comparison you must make:<\/strong> Always evaluate your fine-tuned model against both the base model and a well-prompted version of the base model. If your fine-tuned model does not outperform a well-prompted base model on your success metrics, the fine-tuning was not worth the investment.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Regression testing:<\/strong> Fine-tuning on a specific task can degrade performance on other tasks \u2014 a phenomenon called catastrophic forgetting. 
Test your fine-tuned model on a set of general capability prompts to ensure you have not significantly degraded its baseline performance in the process of specialising it.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Step 6: Deploy and Monitor<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A fine-tuned model that is not monitored in production is a liability. Model behaviour can drift in unexpected ways when real-world inputs differ from training distribution.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>What to monitor:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Output format validity rate (is the JSON always parseable?)<\/li>\n\n\n\n<li>Flagged responses (outputs that trigger content filters or business rule violations)<\/li>\n\n\n\n<li>User feedback signals (thumbs up\/down, escalation rates, task completion rates)<\/li>\n\n\n\n<li>Latency and cost per inference<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Build a feedback loop: real-world outputs that represent failures or edge cases become candidates for your next training dataset iteration. Fine-tuning is not a one-time event \u2014 it is a cycle of deployment, monitoring, data collection, and improvement.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Common Fine-Tuning Mistakes That Kill Results<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Mistake 1: Fine-tuning when prompting would have worked.<\/strong> Fine-tuning is the solution to a specific problem. If the problem is &#8220;the model doesn&#8217;t always follow my format instructions,&#8221; try a more explicit system prompt with format enforcement before committing to fine-tuning. Many developers fine-tune prematurely.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Mistake 2: Training on mediocre data.<\/strong> Ten bad examples are worse than zero examples. 
Every training example teaches the model something. If your dataset contains inconsistent, partially correct, or ambiguous examples, your fine-tuned model will reflect that inconsistency. Quality filters on your dataset are not optional.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Mistake 3: Training for too many epochs.<\/strong> More epochs are not always better. Overfitting \u2014 where the model memorises training examples rather than learning the generalised pattern \u2014 is the most common fine-tuning failure mode. Watch your validation loss. If it starts increasing while training loss continues decreasing, stop training.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Mistake 4: No evaluation dataset.<\/strong> If you used all your data for training, you have no objective way to measure whether the fine-tuned model is better than the base model. Always reserve 10\u201320% of your dataset for evaluation before training.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Mistake 5: Ignoring the base model&#8217;s capabilities.<\/strong> Fine-tuning does not change what a model fundamentally can and cannot do. A 1B parameter model fine-tuned on complex reasoning examples will not become a 70B parameter model. Match the base model capability to the task complexity before fine-tuning.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Fine-Tuning as a Career Skill in 2026<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The Indian AI engineering job market in 2026 has a clear tier structure. At the top are engineers who can do all three: prompt engineering, RAG implementation, and fine-tuning. 
Being able to reason about which technique solves which problem \u2014 and execute on the right one \u2014 is the profile that commands senior AI engineering roles and the salary brackets that accompany them.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Fine-tuning specifically is underrepresented in the self-taught developer community. Most developers who learn AI start with prompting, move to RAG, and stop there. Fine-tuning requires more infrastructure knowledge, more ML intuition, and more careful data thinking \u2014 which means the developers who develop this skill face significantly less competition for the roles that require it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For <strong>final-year students<\/strong>: a fine-tuning project in your portfolio \u2014 a GitHub repository showing a LoRA fine-tune of Llama 3 on a specific task, with a documented evaluation \u2014 is a powerful differentiator from candidates who have only API-called pre-trained models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For <strong>mid-level developers<\/strong>: fine-tuning fluency is the clearest technical signal that separates an &#8220;AI-aware&#8221; developer from an &#8220;AI engineer.&#8221; It demonstrates understanding of how models actually work, not just how to use them as black boxes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For <strong>senior developers and ML engineers<\/strong>: fine-tuning expertise is the foundation for roles involving model customisation, AI product development, and technical leadership on AI engineering teams \u2014 where the decisions about when and how to fine-tune are consequential and poorly made decisions are expensive.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Your Practical Starting Point<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The gap between understanding fine-tuning conceptually and doing your first fine-tuning run is smaller than it looks. 
Here is your starting point:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>This week:<\/strong> Set up the OpenAI fine-tuning API. Build a small dataset of 50 examples for a specific task (content classification, structured extraction, tone-consistent responses). Run your first fine-tuning job. Compare the output against the base model on 20 test examples.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Next two weeks:<\/strong> Move to open-source. Set up a Google Colab or cloud GPU instance. Run a QLoRA fine-tune on Llama 3 8B using the <code>trl<\/code> and <code>peft<\/code> libraries. Document what you built, what metric you improved, and by how much.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>In 30 days:<\/strong> You will have fine-tuned two models, run a real evaluation, and built the mental model of the full pipeline from data to deployment. That is the foundation everything else builds on.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At some point in almost every serious AI engineering project, a developer hits the same wall. The LLM is smart. It understands the domain. It answers questions correctly most of the time. But it does not quite sound like the product. It formats its responses inconsistently. 
It occasionally drifts outside the boundaries you need it [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":825,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"ocean_post_layout":"","ocean_both_sidebars_style":"","ocean_both_sidebars_content_width":0,"ocean_both_sidebars_sidebars_width":0,"ocean_sidebar":"","ocean_second_sidebar":"","ocean_disable_margins":"enable","ocean_add_body_class":"","ocean_shortcode_before_top_bar":"","ocean_shortcode_after_top_bar":"","ocean_shortcode_before_header":"","ocean_shortcode_after_header":"","ocean_has_shortcode":"","ocean_shortcode_after_title":"","ocean_shortcode_before_footer_widgets":"","ocean_shortcode_after_footer_widgets":"","ocean_shortcode_before_footer_bottom":"","ocean_shortcode_after_footer_bottom":"","ocean_display_top_bar":"default","ocean_display_header":"default","ocean_header_style":"","ocean_center_header_left_menu":"","ocean_custom_header_template":"","ocean_custom_logo":0,"ocean_custom_retina_logo":0,"ocean_custom_logo_max_width":0,"ocean_custom_logo_tablet_max_width":0,"ocean_custom_logo_mobile_max_width":0,"ocean_custom_logo_max_height":0,"ocean_custom_logo_tablet_max_height":0,"ocean_custom_logo_mobile_max_height":0,"ocean_header_custom_menu":"","ocean_menu_typo_font_family":"","ocean_menu_typo_font_subset":"","ocean_menu_typo_font_size":0,"ocean_menu_typo_font_size_tablet":0,"ocean_menu_typo_font_size_mobile":0,"ocean_menu_typo_font_size_unit":"px","ocean_menu_typo_font_weight":"","ocean_menu_typo_font_weight_tablet":"","ocean_menu_typo_font_weight_mobile":"","ocean_menu_typo_transform":"","ocean_menu_typo_transform_tablet":"","ocean_menu_typo_transform_mobile":"","ocean_menu_typo_line_height":0,"ocean_menu_typo_line_height_tablet":0,"ocean_menu_typo_line_height_mobile":0,"ocean_menu_typo_line_height_unit":"","ocean_menu_typo_spacing":0,"ocean_menu_typo_spacing_tablet":0,"ocean_menu_typo_spacing_mobile":0,"ocean_men
u_typo_spacing_unit":"","ocean_menu_link_color":"","ocean_menu_link_color_hover":"","ocean_menu_link_color_active":"","ocean_menu_link_background":"","ocean_menu_link_hover_background":"","ocean_menu_link_active_background":"","ocean_menu_social_links_bg":"","ocean_menu_social_hover_links_bg":"","ocean_menu_social_links_color":"","ocean_menu_social_hover_links_color":"","ocean_disable_title":"default","ocean_disable_heading":"default","ocean_post_title":"","ocean_post_subheading":"","ocean_post_title_style":"","ocean_post_title_background_color":"","ocean_post_title_background":0,"ocean_post_title_bg_image_position":"","ocean_post_title_bg_image_attachment":"","ocean_post_title_bg_image_repeat":"","ocean_post_title_bg_image_size":"","ocean_post_title_height":0,"ocean_post_title_bg_overlay":0.5,"ocean_post_title_bg_overlay_color":"","ocean_disable_breadcrumbs":"default","ocean_breadcrumbs_color":"","ocean_breadcrumbs_separator_color":"","ocean_breadcrumbs_links_color":"","ocean_breadcrumbs_links_hover_color":"","ocean_display_footer_widgets":"default","ocean_display_footer_bottom":"default","ocean_custom_footer_template":"","ocean_post_oembed":"","ocean_post_self_hosted_media":"","ocean_post_video_embed":"","ocean_link_format":"","ocean_link_format_target":"self","ocean_quote_format":"","ocean_quote_format_link":"post","ocean_gallery_link_images":"on","ocean_gallery_id":[],"footnotes":""},"categories":[3],"tags":[],"class_list":["post-782","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-analytics","entry","has-media"],"acf":[],"_links":{"self":[{"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/posts\/782","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/users\/1"}
],"replies":[{"embeddable":true,"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/comments?post=782"}],"version-history":[{"count":2,"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/posts\/782\/revisions"}],"predecessor-version":[{"id":905,"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/posts\/782\/revisions\/905"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/media\/825"}],"wp:attachment":[{"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/media?parent=782"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/categories?post=782"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techpaathshala.com\/blog\/wp-json\/wp\/v2\/tags?post=782"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}