How to Add an AI Chatbot to Any Django Site in a Weekend
Two days, one Claude API key, zero new infrastructure. By end of Day 1 you have a streaming chat endpoint backed by session-scoped conversation history. By end of Day 2 you have a floating widget that drops into any Django template, token-by-token response rendering, and enough rate-limiting to keep your API bill predictable.
1. What We're Building
A floating chat bubble sits in the bottom-right corner of every page on your site. A visitor clicks it, types a question, and sees the Claude response stream in token-by-token — no page reload, no full-page UI. The conversation persists for the duration of the browser session. The server stores message history in Django's database so Claude has context across turns. Rate limiting keeps a rogue user from burning through your API budget.
This is not a toy demo. The architecture is the same one you'd run in production for a docs assistant, a support bot, or a lead-qualifier. We'll cover every layer — model, endpoint, streaming, widget, and guardrails — so you can adapt it to any existing Django project.
2. The Stack
Everything here works with a plain Django project. No Channels, no WebSockets, no Redis
required on Day 1. We stream with Server-Sent Events (SSE) over a standard HTTP response,
which Django handles natively with StreamingHttpResponse.
# requirements.txt additions
anthropic>=0.40.0 # Claude API SDK
django>=4.2 # sessions, ORM, StreamingHttpResponse
python-decouple # clean env-var management (optional but recommended)
# Install
pip install anthropic python-decouple
Add your API key to .env:
ANTHROPIC_API_KEY=sk-ant-…
And load it in settings.py:
from decouple import config

ANTHROPIC_API_KEY = config('ANTHROPIC_API_KEY')
CHAT_MODEL = 'claude-opus-4-7'
CHAT_MAX_TOKENS = 1024
CHAT_MAX_HISTORY = 10  # message pairs to include in context
CHAT_SYSTEM_PROMPT = (
    "You are a helpful assistant for this website. "
    "Answer questions clearly and concisely. "
    "If you don't know something, say so."
)
3. Day 1 Morning — Django App & Model
Create a dedicated app. Keeping the chatbot self-contained makes it easy to drop into any existing project without touching core app logic.
python manage.py startapp chat
Add 'chat' to INSTALLED_APPS, then define the models:
# chat/models.py
from django.db import models


class Conversation(models.Model):
    session_key = models.CharField(max_length=40, db_index=True)
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"Conversation {self.session_key[:8]}"


class Message(models.Model):
    ROLE_CHOICES = [('user', 'User'), ('assistant', 'Assistant')]

    conversation = models.ForeignKey(
        Conversation, on_delete=models.CASCADE, related_name='messages'
    )
    role = models.CharField(max_length=10, choices=ROLE_CHOICES)
    content = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        ordering = ['created_at']
python manage.py makemigrations chat
python manage.py migrate
The Conversation model is keyed on Django's session key, so each browser
session gets its own conversation thread. Message stores each turn with its
role so we can replay the full history to Claude on every request.
4. Day 1 Afternoon — The Streaming Chat Endpoint
The view receives a POST with the user's message, builds the message history, opens a streaming connection to Claude, and yields Server-Sent Events back to the browser one token at a time. The assistant's full response is saved to the database once streaming completes.
# chat/views.py
import json

import anthropic
from django.conf import settings
from django.http import JsonResponse, StreamingHttpResponse
from django.utils.decorators import method_decorator
from django.views import View
from django.views.decorators.csrf import csrf_exempt

from .models import Conversation, Message
from .utils import check_rate_limit


@method_decorator(csrf_exempt, name='dispatch')
class ChatView(View):
    def post(self, request):
        # ── Parse input ──────────────────────────────────────────────
        try:
            body = json.loads(request.body)
            content = body.get('message', '').strip()[:2000]
        except (ValueError, AttributeError):
            # ValueError covers malformed JSON; AttributeError covers a
            # non-object body or a non-string "message" value.
            return JsonResponse({'error': 'invalid_request'}, status=400)
        if not content:
            return JsonResponse({'error': 'empty_message'}, status=400)

        # ── Session ──────────────────────────────────────────────────
        if not request.session.session_key:
            request.session.create()
        session_key = request.session.session_key

        # ── Rate limit ───────────────────────────────────────────────
        if not check_rate_limit(session_key):
            return JsonResponse({'error': 'rate_limited'}, status=429)

        # ── Conversation ─────────────────────────────────────────────
        conv, _ = Conversation.objects.get_or_create(session_key=session_key)
        Message.objects.create(conversation=conv, role='user', content=content)
        recent = conv.messages.order_by('-created_at')[:settings.CHAT_MAX_HISTORY * 2]
        messages = [{'role': m.role, 'content': m.content} for m in reversed(recent)]

        # ── Stream ───────────────────────────────────────────────────
        def event_stream():
            client = anthropic.Anthropic(api_key=settings.ANTHROPIC_API_KEY)
            accumulated = ''
            try:
                with client.messages.stream(
                    model=settings.CHAT_MODEL,
                    max_tokens=settings.CHAT_MAX_TOKENS,
                    system=settings.CHAT_SYSTEM_PROMPT,
                    messages=messages,
                ) as stream:
                    for text in stream.text_stream:
                        accumulated += text
                        yield f'data: {json.dumps({"t": text})}\n\n'
                Message.objects.create(
                    conversation=conv, role='assistant', content=accumulated
                )
            except anthropic.RateLimitError:
                yield f'data: {json.dumps({"error": "rate_limit"})}\n\n'
            except anthropic.APIStatusError as e:
                yield f'data: {json.dumps({"error": "api_error", "status": e.status_code})}\n\n'
            except Exception:
                yield f'data: {json.dumps({"error": "server_error"})}\n\n'
            yield 'data: [DONE]\n\n'

        response = StreamingHttpResponse(event_stream(), content_type='text/event-stream')
        response['Cache-Control'] = 'no-cache'
        response['X-Accel-Buffering'] = 'no'  # disable nginx buffering
        return response
The X-Accel-Buffering: no header is important if you run nginx in front of
Django — without it nginx buffers the entire response and the streaming effect is lost.
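If you manage the nginx config yourself, you can also disable buffering explicitly for the chat path. A sketch — the upstream name is a placeholder for your own setup:

```nginx
location /api/chat/ {
    proxy_pass http://django_upstream;   # your existing Gunicorn/uWSGI upstream
    proxy_buffering off;                 # forward SSE chunks as they arrive
    proxy_http_version 1.1;
    proxy_set_header Connection '';      # keep the connection open for streaming
}
```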
Wire up the URL:
# chat/urls.py
from django.urls import path

from .views import ChatView

urlpatterns = [
    path('chat/', ChatView.as_view(), name='chat_message'),
]

# project/urls.py
from django.urls import path, include

urlpatterns = [
    ...
    path('api/', include('chat.urls')),
]
5. Day 1 Evening — Conversation History
We pass the last CHAT_MAX_HISTORY * 2 messages (user + assistant pairs) to
Claude. Fetching only recent messages keeps the context window predictable and controls
per-request token cost. For most support or FAQ bots, ten message pairs is more than enough.
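If you would rather cap history by estimated tokens than by a fixed message count, a variant like this works. It is a sketch: trim_history is a hypothetical helper, not part of the code above, and it reuses the rough ~4-characters-per-token heuristic that Section 8 uses for budgeting.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 chars per token."""
    return max(1, len(text) // 4)


def trim_history(messages, max_tokens=4000):
    """Keep the most recent messages whose combined estimate fits max_tokens.

    `messages` is the oldest-first list of {'role', 'content'} dicts
    that the view builds before calling Claude.
    """
    kept, total = [], 0
    for msg in reversed(messages):          # walk newest -> oldest
        cost = estimate_tokens(msg['content'])
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore oldest-first order
```

Swap it in just before the stream is opened: `messages = trim_history(messages)`.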
Add a utility function for rate limiting while you're here:
# chat/utils.py
from django.core.cache import cache


def check_rate_limit(session_key: str, limit: int = 20, window: int = 3600) -> bool:
    """
    Allow at most `limit` messages per `window` seconds per session.
    Returns True if the request is within the limit.
    """
    key = f'chat_rate_{session_key}'
    count = cache.get(key, 0)
    if count >= limit:
        return False
    cache.set(key, count + 1, timeout=window)
    return True
This uses Django's cache framework — the default in-memory cache works for development.
In production, point CACHES at Redis or Memcached to share limits across
processes.
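A minimal shared-cache setting might look like this (a sketch: Django 4+ ships a built-in Redis cache backend, and the connection URL is a placeholder for your own instance):

```python
# settings.py — a shared cache backend so rate limits apply across all workers
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.redis.RedisCache',
        'LOCATION': 'redis://127.0.0.1:6379/1',
    }
}
```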
Test the endpoint with curl before moving to the frontend:
curl -s -X POST http://127.0.0.1:8000/api/chat/ \
-H 'Content-Type: application/json' \
-d '{"message": "What does this site do?"}' \
--no-buffer
You should see data: {"t":"..."} lines appearing one by one, followed by
data: [DONE]. If you get a wall of text at the end instead of a stream,
nginx or another proxy is buffering — make sure X-Accel-Buffering: no reaches
the client.
6. Day 2 Morning — The Chat Widget
The widget is a self-contained block of HTML, CSS, and JavaScript that you drop into your base template once. It floats in the bottom-right corner of every page, opens on click, and has no external dependencies.
Add this just before the closing </body> tag in your base template:
<!-- ── Chat widget ─────────────────────────────── -->
<style>
#chat-bubble {
position: fixed; bottom: 1.5rem; right: 1.5rem; z-index: 9000;
display: flex; flex-direction: column; align-items: flex-end; gap: .75rem;
}
#chat-toggle {
width: 52px; height: 52px; border-radius: 50%;
background: #00e5a0; border: none; cursor: pointer;
display: flex; align-items: center; justify-content: center;
box-shadow: 0 4px 20px rgba(0,229,160,.35);
transition: transform .15s, background .15s;
}
#chat-toggle:hover { transform: scale(1.08); background: #00ffb3; }
#chat-toggle svg { color: #0d0f14; }
#chat-window {
width: 340px; max-height: 480px;
background: #13161d; border: 1px solid #252a36; border-radius: 14px;
display: flex; flex-direction: column; overflow: hidden;
box-shadow: 0 8px 40px rgba(0,0,0,.55);
transition: opacity .15s, transform .15s;
}
#chat-window.chat-hidden { opacity: 0; pointer-events: none; transform: translateY(8px); }
#chat-head {
padding: .7rem 1rem; background: #181c25;
border-bottom: 1px solid #252a36;
display: flex; justify-content: space-between; align-items: center;
}
#chat-head span { font-size: .78rem; font-weight: 600; color: #e2e8f0; }
#chat-close {
background: none; border: none; color: #64748b;
cursor: pointer; font-size: 1.1rem; line-height: 1; padding: .1rem .3rem;
}
#chat-close:hover { color: #e2e8f0; }
#chat-messages {
flex: 1; overflow-y: auto; padding: .85rem 1rem;
display: flex; flex-direction: column; gap: .6rem;
}
.chat-msg {
max-width: 82%; padding: .55rem .85rem;
border-radius: 10px; font-size: .8rem; line-height: 1.6;
word-break: break-word;
}
.chat-msg.user { align-self: flex-end; background: #00e5a0; color: #0d0f14; }
.chat-msg.assistant { align-self: flex-start; background: #1e2332; color: #cbd5e1; }
.chat-msg.assistant.streaming::after {
content: '▍'; display: inline-block;
animation: blink .7s step-end infinite;
}
@keyframes blink { 50% { opacity: 0; } }
#chat-foot {
padding: .65rem .75rem; border-top: 1px solid #252a36;
display: flex; gap: .5rem;
}
#chat-input {
flex: 1; background: #0d0f14; border: 1px solid #252a36; border-radius: 7px;
color: #e2e8f0; font-size: .78rem; padding: .45rem .7rem; outline: none;
font-family: inherit;
}
#chat-input:focus { border-color: #00e5a0; }
#chat-send {
background: #00e5a0; color: #0d0f14; border: none;
border-radius: 7px; padding: .45rem .8rem;
font-size: .75rem; font-weight: 700; cursor: pointer;
transition: background .13s;
}
#chat-send:hover { background: #00ffb3; }
#chat-send:disabled { opacity: .45; cursor: not-allowed; }
@media (max-width: 420px) {
#chat-window { width: calc(100vw - 2rem); }
}
</style>
<div id="chat-bubble">
  <div id="chat-window" class="chat-hidden">
    <div id="chat-head">
      <span>Ask me anything</span>
      <button id="chat-close" aria-label="Close chat">×</button>
    </div>
    <div id="chat-messages"></div>
    <div id="chat-foot">
      <input id="chat-input" type="text" placeholder="Type a message…" autocomplete="off" />
      <button id="chat-send">Send</button>
    </div>
  </div>
  <button id="chat-toggle" aria-label="Open chat">
    <svg width="22" height="22" viewBox="0 0 24 24" fill="none">
      <path d="M21 15a2 2 0 0 1-2 2H7l-4 4V5a2 2 0 0 1 2-2h14a2 2 0 0 1 2 2z"
            stroke="currentColor" stroke-width="2"
            stroke-linecap="round" stroke-linejoin="round"/>
    </svg>
  </button>
</div>
7. Day 2 Afternoon — Streaming Tokens to the Browser
The JavaScript reads the SSE stream using the Fetch API's ReadableStream.
Each data: line is parsed and appended to the assistant's message bubble as
it arrives. Add this script block right after the widget HTML:
<script>
(function () {
  var win = document.getElementById('chat-window');
  var msgs = document.getElementById('chat-messages');
  var input = document.getElementById('chat-input');
  var send = document.getElementById('chat-send');
  var toggle = document.getElementById('chat-toggle');
  var close = document.getElementById('chat-close');
  var busy = false;

  toggle.addEventListener('click', function () {
    win.classList.toggle('chat-hidden');
    if (!win.classList.contains('chat-hidden')) input.focus();
  });
  close.addEventListener('click', function () {
    win.classList.add('chat-hidden');
  });

  function getCsrf() {
    var m = document.cookie.match(/csrftoken=([^;]+)/);
    return m ? m[1] : '';
  }

  function addMsg(role, text) {
    var div = document.createElement('div');
    div.className = 'chat-msg ' + role;
    div.textContent = text;
    msgs.appendChild(div);
    msgs.scrollTop = msgs.scrollHeight;
    return div;
  }

  async function sendMessage() {
    var text = input.value.trim();
    if (!text || busy) return;
    busy = true;
    send.disabled = true;
    input.value = '';
    addMsg('user', text);
    var reply = addMsg('assistant', '');
    reply.classList.add('streaming');
    try {
      var res = await fetch('/api/chat/', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-CSRFToken': getCsrf(),
        },
        body: JSON.stringify({ message: text }),
      });
      if (!res.ok) {
        reply.textContent = 'Error ' + res.status + '. Please try again.';
        return;
      }
      var reader = res.body.getReader();
      var decoder = new TextDecoder();
      var buffer = '';
      while (true) {
        var chunk = await reader.read();
        if (chunk.done) break;
        buffer += decoder.decode(chunk.value, { stream: true });
        var lines = buffer.split('\n');
        buffer = lines.pop(); // hold back any incomplete line
        for (var i = 0; i < lines.length; i++) {
          var line = lines[i];
          if (!line.startsWith('data: ')) continue;
          var raw = line.slice(6);
          if (raw === '[DONE]') break;
          try {
            var data = JSON.parse(raw);
            if (data.error) {
              reply.textContent = 'Error: ' + data.error + '. Please try again.';
              break;
            }
            if (data.t) {
              reply.textContent += data.t;
              msgs.scrollTop = msgs.scrollHeight;
            }
          } catch (_) {}
        }
      }
    } catch (err) {
      reply.textContent = 'Connection error. Please try again.';
    } finally {
      reply.classList.remove('streaming');
      busy = false;
      send.disabled = false;
      input.focus();
    }
  }

  send.addEventListener('click', sendMessage);
  input.addEventListener('keydown', function (e) {
    if (e.key === 'Enter' && !e.shiftKey) { e.preventDefault(); sendMessage(); }
  });
}());
</script>
The streaming CSS class adds an animated block cursor while the response is
in flight. It's removed in the finally block once streaming ends or fails.
The CSRF token is read from the cookie — no {% csrf_token %} tag needed
in the widget markup.
8. Rate Limiting & Cost Control
The per-session rate limit in utils.py is a first line of defense, not a complete cost control. Add a second layer of protection at the model level — cap the total tokens you're willing to spend per session:
# chat/utils.py (additions)

def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 chars per token."""
    return max(1, len(text) // 4)


def session_token_budget_ok(session_key: str, new_tokens: int,
                            budget: int = 10_000) -> bool:
    """
    Track approximate total tokens used by this session.
    Returns False if adding new_tokens would exceed the budget.
    """
    key = f'chat_tokens_{session_key}'
    used = cache.get(key, 0)
    if used + new_tokens > budget:
        return False
    cache.set(key, used + new_tokens, timeout=86400)  # 24-hour window
    return True
Call session_token_budget_ok in the view before opening the Claude stream,
passing estimate_tokens(content) as the cost estimate.
For a global safety net, set a max_tokens hard cap on the Claude API call
itself. The CHAT_MAX_TOKENS = 1024 setting already does this — Claude will
never return more than 1,024 tokens regardless of what the user sends.
9. Production Checklist
Before you ship:
- CSRF. The view uses @csrf_exempt for simplicity, but the widget already sends the X-CSRFToken header read from the csrftoken cookie. In production, remove the decorator and let Django's CSRF middleware validate that header — no hidden CSRF input is needed.
- Session engine. Django's default database-backed sessions work out of the box here. Avoid the signed-cookie session engine for this feature: it doesn't give you the stable 40-character session key the Conversation model relies on.
- Rate limiting at the edge. The cache-based rate limiter works per-process unless you use a shared backend. Use Redis (Django's built-in Redis cache backend or django-redis) or a CDN/WAF rule to share limits across all Gunicorn workers.
- Nginx buffering. Always set proxy_buffering off in your nginx location block for the /api/chat/ path, or rely on the X-Accel-Buffering: no header already set in the view.
- Conversation cleanup. Add a management command or Celery Beat task to delete conversations older than 30 days. The database will grow otherwise.
- System prompt. The system prompt is the single biggest lever on cost and relevance. Keep it short and specific. A prompt like "You are a support bot for [product]. Only answer questions about [product]. If asked anything else, politely redirect." reduces off-topic responses and long-winded answers.
- Monitoring. Log usage.input_tokens and usage.output_tokens from the completed stream object (stream.get_final_message().usage) into a lightweight table so you can track cost per session over time.
With those in place you have a production-grade chatbot that took two days to build,
runs on your existing Django infrastructure, and costs roughly $0.002–$0.01 per
conversation at typical usage levels with claude-opus-4-7.