
How to Add an AI Chatbot to Any Django Site in a Weekend

Two days, one Claude API key, zero new infrastructure. By the end of Day 1 you have a streaming chat endpoint backed by session-scoped conversation history. By the end of Day 2 you have a floating widget that drops into any Django template, token-by-token response rendering, and enough rate limiting to keep your API bill predictable.

1. What We're Building

A floating chat bubble sits in the bottom-right corner of every page on your site. A visitor clicks it, types a question, and sees the Claude response stream in token-by-token — no page reload, no full-page UI. The conversation persists for the duration of the browser session. The server stores message history in Django's database so Claude has context across turns. Rate limiting keeps a rogue user from burning through your API budget.

This is not a toy demo. The architecture is the same one you'd run in production for a docs assistant, a support bot, or a lead-qualifier. We'll cover every layer — model, endpoint, streaming, widget, and guardrails — so you can adapt it to any existing Django project.

2. The Stack

Everything here works with a plain Django project. No Channels, no WebSockets, no Redis required on Day 1. We stream with Server-Sent Events (SSE) over a standard HTTP response, which Django handles natively with StreamingHttpResponse.
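SSE itself is almost nothing: each event is a `data:` line followed by a blank line. A quick pure-Python sketch of the exact framing the endpoint will emit and the widget will parse:

```python
import json

def sse_event(payload: dict) -> str:
    """Frame one payload as a Server-Sent Event: a 'data:' line plus a blank line."""
    return f'data: {json.dumps(payload)}\n\n'

def sse_parse(raw: str) -> list:
    """Inverse of sse_event: pull the JSON payloads back out of a raw SSE stream."""
    out = []
    for line in raw.split('\n'):
        if line.startswith('data: ') and line != 'data: [DONE]':
            out.append(json.loads(line[6:]))
    return out

stream = sse_event({'t': 'Hel'}) + sse_event({'t': 'lo'}) + 'data: [DONE]\n\n'
assert ''.join(p['t'] for p in sse_parse(stream)) == 'Hello'
```

This is the whole wire format — no library needed on either side, which is why a generator inside StreamingHttpResponse is enough.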

# requirements.txt additions
anthropic>=0.40.0    # Claude API SDK
django>=4.2          # sessions, ORM, StreamingHttpResponse
python-decouple       # clean env-var management (optional but recommended)
# Install
pip install anthropic python-decouple

Add your API key to .env:

ANTHROPIC_API_KEY=sk-ant-…

And load it in settings.py:

from decouple import config

ANTHROPIC_API_KEY  = config('ANTHROPIC_API_KEY')
CHAT_MODEL         = 'claude-sonnet-4-5'
CHAT_MAX_TOKENS    = 1024
CHAT_MAX_HISTORY   = 10   # message pairs to include in context
CHAT_SYSTEM_PROMPT = (
    "You are a helpful assistant for this website. "
    "Answer questions clearly and concisely. "
    "If you don't know something, say so."
)

3. Day 1 Morning — Django App & Model

Create a dedicated app. Keeping the chatbot self-contained makes it easy to drop into any existing project without touching core app logic.

python manage.py startapp chat

Add 'chat' to INSTALLED_APPS, then define the models:

# chat/models.py
from django.db import models


class Conversation(models.Model):
    session_key = models.CharField(max_length=40, db_index=True)
    created_at  = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"Conversation {self.session_key[:8]}"


class Message(models.Model):
    ROLE_CHOICES = [('user', 'User'), ('assistant', 'Assistant')]

    conversation = models.ForeignKey(
        Conversation, on_delete=models.CASCADE, related_name='messages'
    )
    role         = models.CharField(max_length=10, choices=ROLE_CHOICES)
    content      = models.TextField()
    created_at   = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ['created_at']
Run the migrations:

python manage.py makemigrations chat
python manage.py migrate

The Conversation model is keyed on Django's session key, so each browser session gets its own conversation thread. Message stores each turn with its role so we can replay the full history to Claude on every request.
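On each request the view will replay those rows to Claude as a list of role/content dicts in chronological order. Over plain tuples, the transformation looks like this (a sketch; the real version queries the ORM):

```python
# (role, content) rows as they'd come back from conversation.messages, oldest first
rows = [
    ('user', 'What does this site do?'),
    ('assistant', 'It sells widgets.'),
    ('user', 'How much are they?'),
]
messages = [{'role': role, 'content': content} for role, content in rows]

assert messages[0]['role'] == 'user'
assert messages[-1] == {'role': 'user', 'content': 'How much are they?'}
```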

4. Day 1 Afternoon — The Streaming Chat Endpoint

The view receives a POST with the user's message, builds the message history, opens a streaming connection to Claude, and yields Server-Sent Events back to the browser one token at a time. The assistant's full response is saved to the database once streaming completes.

# chat/views.py
import json
import anthropic
from django.conf import settings
from django.http import StreamingHttpResponse, JsonResponse
from django.views import View
from django.utils.decorators import method_decorator
from django.views.decorators.csrf import csrf_exempt

from .models import Conversation, Message
from .utils import check_rate_limit


@method_decorator(csrf_exempt, name='dispatch')
class ChatView(View):

    def post(self, request):
        # ── Parse input ──────────────────────────────────────────────
        try:
            body    = json.loads(request.body)
            content = body.get('message', '').strip()[:2000]
        except (ValueError, AttributeError):  # malformed JSON, or a payload that isn't an object
            return JsonResponse({'error': 'invalid_request'}, status=400)

        if not content:
            return JsonResponse({'error': 'empty_message'}, status=400)

        # ── Session ───────────────────────────────────────────────────
        if not request.session.session_key:
            request.session.create()
        session_key = request.session.session_key

        # ── Rate limit ────────────────────────────────────────────────
        if not check_rate_limit(session_key):
            return JsonResponse({'error': 'rate_limited'}, status=429)

        # ── Conversation ──────────────────────────────────────────────
        conv, _ = Conversation.objects.get_or_create(session_key=session_key)
        Message.objects.create(conversation=conv, role='user', content=content)

        recent   = conv.messages.order_by('-created_at')[:settings.CHAT_MAX_HISTORY * 2]
        messages = [{'role': m.role, 'content': m.content} for m in reversed(recent)]

        # ── Stream ────────────────────────────────────────────────────
        def event_stream():
            client      = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)
            accumulated = ''

            try:
                with client.messages.stream(
                    model      = settings.CHAT_MODEL,
                    max_tokens = settings.CHAT_MAX_TOKENS,
                    system     = settings.CHAT_SYSTEM_PROMPT,
                    messages   = messages,
                ) as stream:
                    for text in stream.text_stream:
                        accumulated += text
                        yield f'data: {json.dumps({"t": text})}\n\n'

                Message.objects.create(
                    conversation=conv, role='assistant', content=accumulated
                )

            except anthropic.RateLimitError:
                yield f'data: {json.dumps({"error": "rate_limit"})}\n\n'
            except anthropic.APIStatusError as e:
                yield f'data: {json.dumps({"error": "api_error", "status": e.status_code})}\n\n'
            except Exception:
                yield f'data: {json.dumps({"error": "server_error"})}\n\n'

            yield 'data: [DONE]\n\n'

        response = StreamingHttpResponse(event_stream(), content_type='text/event-stream')
        response['Cache-Control']     = 'no-cache'
        response['X-Accel-Buffering'] = 'no'  # disable nginx buffering
        return response

The X-Accel-Buffering: no header is important if you run nginx in front of Django — without it nginx buffers the entire response and the streaming effect is lost.

Wire up the URL:

# chat/urls.py
from django.urls import path
from .views import ChatView

urlpatterns = [
    path('chat/', ChatView.as_view(), name='chat_message'),
]
# project/urls.py
from django.urls import path, include

urlpatterns = [
    ...
    path('api/', include('chat.urls')),
]

5. Day 1 Evening — Conversation History

We pass the last CHAT_MAX_HISTORY * 2 messages (user and assistant turns, i.e. CHAT_MAX_HISTORY pairs) to Claude. Fetching only recent messages keeps the context window predictable and controls per-request token cost. For most support or FAQ bots, ten message pairs is more than enough.
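The slice-then-reverse in the view trips people up: order_by('-created_at')[:N] returns the newest N messages newest-first, so they must be reversed before being sent to Claude. The same logic over a plain list:

```python
history = ['m1', 'm2', 'm3', 'm4', 'm5', 'm6']   # chronological order
n = 4
recent_desc = sorted(history, reverse=True)[:n]  # mimics order_by('-created_at')[:n]
context = list(reversed(recent_desc))            # chronological again, newest n only

assert recent_desc == ['m6', 'm5', 'm4', 'm3']
assert context == ['m3', 'm4', 'm5', 'm6']
```

Skipping the reversed() call would hand Claude the conversation backwards, which degrades answers silently.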

Add a utility function for rate limiting while you're here:

# chat/utils.py
from django.core.cache import cache


def check_rate_limit(session_key: str, limit: int = 20, window: int = 3600) -> bool:
    """
    Allow at most `limit` messages per `window` seconds per session.
    Returns True if the request is within the limit.
    """
    key = f'chat_rate_{session_key}'
    # add() is a no-op when the key already exists, so the window's TTL is set
    # once at the first message instead of being pushed back on every call
    # (a get/set pair would reset the window each time, and would also race
    # across workers — incr() is atomic on Redis and Memcached).
    cache.add(key, 0, timeout=window)
    try:
        return cache.incr(key) <= limit
    except ValueError:  # key expired between add() and incr()
        cache.set(key, 1, timeout=window)
        return True

This uses Django's cache framework — the default in-memory cache works for development. In production, point CACHES at Redis or Memcached to share limits across processes.
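For production, a minimal sketch of a Redis-backed cache (Django 4+ ships a built-in Redis backend; the connection URL below is an assumption for a local instance):

```python
# settings.py — share rate-limit counters across all workers
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.redis.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/1',
    }
}
```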

Test the endpoint with curl before moving to the frontend:

curl -s -X POST http://127.0.0.1:8000/api/chat/ \
  -H 'Content-Type: application/json' \
  -d '{"message": "What does this site do?"}' \
  --no-buffer

You should see data: {"t":"..."} lines appearing one by one, followed by data: [DONE]. If you get a wall of text at the end instead of a stream, nginx or another proxy is buffering — check that the X-Accel-Buffering: no header is being sent and that every proxy in front of Django honors it.

6. Day 2 Morning — The Chat Widget

The widget is a self-contained block of HTML, CSS, and JavaScript that you drop into your base template once. It floats in the bottom-right corner of every page, opens on click, and has no external dependencies.

Add this just before the closing </body> tag in your base template:

<!-- ── Chat widget ─────────────────────────────── -->
<style>
#chat-bubble {
  position: fixed; bottom: 1.5rem; right: 1.5rem; z-index: 9000;
  display: flex; flex-direction: column; align-items: flex-end; gap: .75rem;
}
#chat-toggle {
  width: 52px; height: 52px; border-radius: 50%;
  background: #00e5a0; border: none; cursor: pointer;
  display: flex; align-items: center; justify-content: center;
  box-shadow: 0 4px 20px rgba(0,229,160,.35);
  transition: transform .15s, background .15s;
}
#chat-toggle:hover { transform: scale(1.08); background: #00ffb3; }
#chat-toggle svg   { color: #0d0f14; }

#chat-window {
  width: 340px; max-height: 480px;
  background: #13161d; border: 1px solid #252a36; border-radius: 14px;
  display: flex; flex-direction: column; overflow: hidden;
  box-shadow: 0 8px 40px rgba(0,0,0,.55);
  transition: opacity .15s, transform .15s;
}
#chat-window.chat-hidden { opacity: 0; pointer-events: none; transform: translateY(8px); }

#chat-head {
  padding: .7rem 1rem; background: #181c25;
  border-bottom: 1px solid #252a36;
  display: flex; justify-content: space-between; align-items: center;
}
#chat-head span { font-size: .78rem; font-weight: 600; color: #e2e8f0; }
#chat-close {
  background: none; border: none; color: #64748b;
  cursor: pointer; font-size: 1.1rem; line-height: 1; padding: .1rem .3rem;
}
#chat-close:hover { color: #e2e8f0; }

#chat-messages {
  flex: 1; overflow-y: auto; padding: .85rem 1rem;
  display: flex; flex-direction: column; gap: .6rem;
}
.chat-msg {
  max-width: 82%; padding: .55rem .85rem;
  border-radius: 10px; font-size: .8rem; line-height: 1.6;
  word-break: break-word;
}
.chat-msg.user      { align-self: flex-end; background: #00e5a0; color: #0d0f14; }
.chat-msg.assistant { align-self: flex-start; background: #1e2332; color: #cbd5e1; }
.chat-msg.assistant.streaming::after {
  content: '▍'; display: inline-block;
  animation: blink .7s step-end infinite;
}
@keyframes blink { 50% { opacity: 0; } }

#chat-foot {
  padding: .65rem .75rem; border-top: 1px solid #252a36;
  display: flex; gap: .5rem;
}
#chat-input {
  flex: 1; background: #0d0f14; border: 1px solid #252a36; border-radius: 7px;
  color: #e2e8f0; font-size: .78rem; padding: .45rem .7rem; outline: none;
  font-family: inherit;
}
#chat-input:focus { border-color: #00e5a0; }
#chat-send {
  background: #00e5a0; color: #0d0f14; border: none;
  border-radius: 7px; padding: .45rem .8rem;
  font-size: .75rem; font-weight: 700; cursor: pointer;
  transition: background .13s;
}
#chat-send:hover    { background: #00ffb3; }
#chat-send:disabled { opacity: .45; cursor: not-allowed; }

@media (max-width: 420px) {
  #chat-window { width: calc(100vw - 2rem); }
}
</style>

<div id="chat-bubble">
  <div id="chat-window" class="chat-hidden">
    <div id="chat-head">
      <span>Ask me anything</span>
      <button id="chat-close" aria-label="Close chat">&times;</button>
    </div>
    <div id="chat-messages"></div>
    <div id="chat-foot">
      <input id="chat-input" type="text" placeholder="Type a message…" autocomplete="off" />
      <button id="chat-send">Send</button>
    </div>
  </div>
  <button id="chat-toggle" aria-label="Open chat">
    <svg width="22" height="22" viewBox="0 0 24 24" fill="none">
      <path d="M21 15a2 2 0 0 1-2 2H7l-4 4V5a2 2 0 0 1 2-2h14a2 2 0 0 1 2 2z"
            stroke="currentColor" stroke-width="2"
            stroke-linecap="round" stroke-linejoin="round"/>
    </svg>
  </button>
</div>

7. Day 2 Afternoon — Streaming Tokens to the Browser

The JavaScript reads the SSE stream using the Fetch API's ReadableStream. Each data: line is parsed and appended to the assistant's message bubble as it arrives. Add this script block right after the widget HTML:

<script>
(function () {
  var win    = document.getElementById('chat-window');
  var msgs   = document.getElementById('chat-messages');
  var input  = document.getElementById('chat-input');
  var send   = document.getElementById('chat-send');
  var toggle = document.getElementById('chat-toggle');
  var close  = document.getElementById('chat-close');
  var busy   = false;

  toggle.addEventListener('click', function () {
    win.classList.toggle('chat-hidden');
    if (!win.classList.contains('chat-hidden')) input.focus();
  });
  close.addEventListener('click', function () {
    win.classList.add('chat-hidden');
  });

  function getCsrf() {
    var m = document.cookie.match(/csrftoken=([^;]+)/);
    return m ? m[1] : '';
  }

  function addMsg(role, text) {
    var div = document.createElement('div');
    div.className = 'chat-msg ' + role;
    div.textContent = text;
    msgs.appendChild(div);
    msgs.scrollTop = msgs.scrollHeight;
    return div;
  }

  async function sendMessage() {
    var text = input.value.trim();
    if (!text || busy) return;

    busy = true;
    send.disabled = true;
    input.value = '';

    addMsg('user', text);
    var reply = addMsg('assistant', '');
    reply.classList.add('streaming');

    try {
      var res = await fetch('/api/chat/', {
        method:  'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-CSRFToken':  getCsrf(),
        },
        body: JSON.stringify({ message: text }),
      });

      if (!res.ok) {
        reply.textContent = 'Error ' + res.status + '. Please try again.';
        return;
      }

      var reader  = res.body.getReader();
      var decoder = new TextDecoder();
      var buffer  = '';

      while (true) {
        var chunk = await reader.read();
        if (chunk.done) break;

        buffer += decoder.decode(chunk.value, { stream: true });
        var lines = buffer.split('\n');
        buffer = lines.pop(); // hold back any incomplete line

        for (var i = 0; i < lines.length; i++) {
          var line = lines[i];
          if (!line.startsWith('data: ')) continue;
          var raw = line.slice(6);
          if (raw === '[DONE]') break;

          try {
            var data = JSON.parse(raw);
            if (data.error) {
              reply.textContent = 'Error: ' + data.error + '. Please try again.';
              break;
            }
            if (data.t) {
              reply.textContent += data.t;
              msgs.scrollTop = msgs.scrollHeight;
            }
          } catch (_) {}
        }
      }

    } catch (err) {
      reply.textContent = 'Connection error. Please try again.';
    } finally {
      reply.classList.remove('streaming');
      busy = false;
      send.disabled = false;
      input.focus();
    }
  }

  send.addEventListener('click', sendMessage);
  input.addEventListener('keydown', function (e) {
    if (e.key === 'Enter' && !e.shiftKey) { e.preventDefault(); sendMessage(); }
  });
}());
</script>

The streaming CSS class adds an animated block cursor while the response is in flight. It's removed in the finally block once streaming ends or fails. The CSRF token is read from the cookie — no {% csrf_token %} tag needed in the widget markup.
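The buffer trick in the read loop — hold back the final, possibly incomplete line until more bytes arrive — is the part worth internalizing, since network chunks can split an SSE line anywhere. The same logic in Python, for readers more comfortable tracing it there:

```python
def feed(buffer: str, chunk: str):
    """Append chunk to buffer; return (complete_lines, new_buffer)."""
    buffer += chunk
    lines = buffer.split('\n')
    return lines[:-1], lines[-1]  # the last element may be a partial line

buf = ''
lines, buf = feed(buf, 'data: {"t":"He')   # no newline yet: nothing complete
assert lines == [] and buf.startswith('data:')

lines, buf = feed(buf, 'llo"}\n\n')        # newline arrives: line is complete
assert lines == ['data: {"t":"Hello"}', ''] and buf == ''
```

Without the hold-back, a chunk boundary in the middle of a JSON payload would produce a parse error and a dropped token.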

8. Rate Limiting & Cost Control

The per-session rate limit in utils.py is a first line of defense, not a complete cost strategy. Add a second layer of protection at the model level — limit the total tokens you're willing to spend per session:

# chat/utils.py (additions)
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 chars per token."""
    return max(1, len(text) // 4)


def session_token_budget_ok(session_key: str, new_tokens: int,
                             budget: int = 10_000) -> bool:
    """
    Track approximate total tokens used by this session.
    Returns False if adding new_tokens would exceed the budget.
    """
    key   = f'chat_tokens_{session_key}'
    used  = cache.get(key, 0)
    if used + new_tokens > budget:
        return False
    cache.set(key, used + new_tokens, timeout=86400)  # 24-hour window
    return True

Call session_token_budget_ok in the view before opening the Claude stream, passing estimate_tokens(content) as the cost estimate.
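The arithmetic is easy to verify standalone; here is a sketch with a plain dict standing in for Django's cache (the real version in chat/utils.py uses cache.get/cache.set with the 24-hour timeout):

```python
_cache = {}  # in-memory stand-in for Django's cache framework

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 chars per token."""
    return max(1, len(text) // 4)

def session_token_budget_ok(session_key: str, new_tokens: int,
                            budget: int = 10_000) -> bool:
    key = f'chat_tokens_{session_key}'
    used = _cache.get(key, 0)
    if used + new_tokens > budget:
        return False
    _cache[key] = used + new_tokens
    return True

# A session sending ~16,000-character messages (≈4,000 tokens each)
# exhausts a 10,000-token budget on its third message:
msg = 'x' * 16_000
assert session_token_budget_ok('abc', estimate_tokens(msg))      # 4,000 used
assert session_token_budget_ok('abc', estimate_tokens(msg))      # 8,000 used
assert not session_token_budget_ok('abc', estimate_tokens(msg))  # would reach 12,000
```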

For a global safety net, set a max_tokens hard cap on the Claude API call itself. The CHAT_MAX_TOKENS = 1024 setting already does this — Claude will never return more than 1,024 tokens regardless of what the user sends.

9. Production Checklist

Before you ship:

  • CSRF. The view uses @csrf_exempt to keep the walkthrough simple, but the widget already sends an X-CSRFToken header read from the cookie. In production, remove the decorator and let Django validate that token on every request.
  • Session engine. Django's default database-backed sessions work fine as-is. Avoid the signed-cookie session backend here: it stores data client-side and doesn't give you a stable server-side session key to hang conversations and rate limits on.
  • Rate limiting at the edge. The cache-based rate limiter works per-process unless you use Redis. Use Redis (django-redis) or a CDN/WAF rule to share limits across all Gunicorn workers.
  • Nginx buffering. Always set proxy_buffering off in your nginx location block for the /api/chat/ path, or rely on the X-Accel-Buffering: no header already set in the view.
  • Conversation cleanup. Add a management command or Celery Beat task to delete conversations older than 30 days. The database will grow otherwise.
  • System prompt. The system prompt is the single biggest lever on cost and relevance. Keep it short and specific. A prompt like "You are a support bot for [product]. Only answer questions about [product]. If asked anything else, politely redirect." reduces off-topic responses and long-winded answers.
  • Monitoring. Log usage.input_tokens and usage.output_tokens from the completed stream object (stream.get_final_message().usage) into a lightweight table so you can track cost per session over time.
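On the monitoring bullet: once input and output token counts are logged, cost per conversation is simple arithmetic. A sketch with hypothetical per-million-token prices (the PRICE_* values are assumptions — check the current pricing page before budgeting):

```python
from dataclasses import dataclass

PRICE_IN_PER_M  = 3.00    # $ per million input tokens (assumed)
PRICE_OUT_PER_M = 15.00   # $ per million output tokens (assumed)

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int

def cost_usd(u: Usage) -> float:
    """Dollar cost of one completed request from its logged token counts."""
    return (u.input_tokens * PRICE_IN_PER_M
            + u.output_tokens * PRICE_OUT_PER_M) / 1_000_000

# A typical turn: ~800 tokens of history + prompt in, ~400 tokens out.
turn = Usage(input_tokens=800, output_tokens=400)
assert round(cost_usd(turn), 4) == 0.0084
```

Note that input cost grows with history length, which is another reason to cap CHAT_MAX_HISTORY.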

With those in place you have a production-grade chatbot that took two days to build and runs on your existing Django infrastructure. With a mid-tier model, a typical short conversation costs on the order of a cent or less; check the current per-token pricing for the model you configure before committing to a budget.