The ORM Knows More SQL Than You Think
There is a persistent myth that Django's ORM is a toy that hides "real" SQL, and that any
serious PostgreSQL work means dropping to cursor.execute(). In practice the
opposite is true: the ORM exposes a remarkable amount of PostgreSQL through
django.contrib.postgres, expression objects (F, Q,
Func, Window), and a handful of small, well-maintained packages.
The wins of staying in the ORM are real — composable querysets, automatic parameter binding
(so no SQL injection), migration-managed indexes, and queries that survive a schema rename.
The discipline that makes this work is simple: know the SQL your ORM call compiles to.
Every example below pairs the Python with the SQL PostgreSQL actually runs. When you can read
both, you stop guessing about performance and start reasoning about it. Print the SQL for any
queryset with print(qs.query), or get the real plan with
qs.explain(analyze=True).
JSONField: Schemaless Columns, Indexed Properly
PostgreSQL's jsonb type is the reason most Django apps never need a document
store bolted on the side. It stores parsed, binary JSON — deduplicated keys, no whitespace,
and crucially indexable. Django maps it to models.JSONField, and the
lookups compile straight to native jsonb operators. The phrase
django postgresql json usually means exactly this combination.
# models.py
from django.db import models
from django.contrib.postgres.indexes import GinIndex
class Event(models.Model):
name = models.CharField(max_length=120)
# default=dict, never default={} — a shared mutable default is a classic bug
payload = models.JSONField(default=dict)
created = models.DateTimeField(auto_now_add=True)
class Meta:
indexes = [
# jsonb_ops: supports @>, ?, ?&, ?| — the containment + key-existence family
GinIndex(fields=["payload"], name="event_payload_gin"),
]
Querying nested keys
Use the double-underscore path to dig into the document. Each step becomes a
-> traversal, and the final comparison an = on the extracted text
or a typed comparison for numbers.
# Exact match on a nested string
Event.objects.filter(payload__device__os="iOS")
# Numeric comparison — Django casts the extracted value to numeric
Event.objects.filter(payload__cart__total__gt=100)
# Key existence (compiles to the ? operator)
Event.objects.filter(payload__has_key="utm_source")
# Array / object containment (compiles to @>) — this one uses the GIN index
Event.objects.filter(payload__contains={"plan": "pro"})
Event.objects.filter(payload__tags__contains=["beta"])
The __contains lookup is the important one. It compiles to:
SELECT * FROM event WHERE payload @> '{"plan": "pro"}';
The @> (contains) operator is GIN-indexable, so that query is an index scan,
not a sequential scan over every row's JSON. By contrast, payload__device__os="iOS"
uses a ->> path extraction that the default jsonb_ops GIN index
cannot serve — for that pattern you want an expression index (below).
Aggregating over JSON
Pull a key out as a real column with KeyTextTransform, then group, filter and
annotate it like any other field.
from django.db.models import Count
from django.db.models.fields.json import KeyTextTransform
(
Event.objects
.annotate(os=KeyTextTransform("os", "payload"))
.values("os")
.annotate(n=Count("id"))
.order_by("-n")
)
# GROUP BY (payload ->> 'os') — counts events per OS
Indexing a hot key path
When you constantly filter on one extracted value, a B-tree expression index on that path beats a whole-document GIN index — it's smaller and supports range queries and ordering.
from django.contrib.postgres.indexes import GinIndex, OpClass
from django.db.models.functions import Cast
from django.db.models import Index
class Meta:
indexes = [
# B-tree on a single extracted text value: payload ->> 'os'
Index(KeyTextTransform("os", "payload"), name="event_os_idx"),
# Smaller, faster containment-only GIN using jsonb_path_ops
GinIndex(
OpClass("payload", name="jsonb_path_ops"),
name="event_payload_pathops_gin",
),
]
Full-Text Search Without Elasticsearch
Before you stand up an Elasticsearch cluster for a search box, measure how far PostgreSQL gets
you. For the vast majority of apps — up to a few million rows — it's more than enough, and it
lives in the same transaction as your data so there's nothing to sync. This is the heart of
django full text search: to_tsvector turns a document into
normalised lexemes, to_tsquery turns a search string into a boolean query, and the
@@ operator matches them.
tsvector once (ideally a stored, GIN-indexed column), then match it against a parsed tsquery and rank with ts_rank.The quick version (no schema change)
For a first pass, annotate a SearchVector on the fly. It works immediately but
recomputes the vector for every row on every query — fine for small tables, not for hot paths.
from django.contrib.postgres.search import (
SearchVector, SearchQuery, SearchRank,
)
query = SearchQuery("django postgres", search_type="websearch")
vector = SearchVector("title", weight="A") + SearchVector("body", weight="B")
(
Article.objects
.annotate(rank=SearchRank(vector, query))
.filter(rank__gt=0)
.order_by("-rank")
)
search_type="websearch" (PostgreSQL 11+) is the one to use for user input: it
accepts Google-style syntax — quoted "exact phrases", or, and
-exclusion — and never raises on malformed input, unlike raw
to_tsquery.
The production version: a stored, indexed SearchVectorField
For anything performance-sensitive, materialise the vector into a SearchVectorField
column with its own GIN index, and keep it current with a database trigger (covered in the
triggers section — so it updates no matter how the row changes).
from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex
class Article(models.Model):
title = models.CharField(max_length=255)
body = models.TextField()
search_vector = SearchVectorField(null=True, editable=False)
class Meta:
indexes = [GinIndex(fields=["search_vector"], name="article_search_gin")]
Now searches hit the index and never recompute the vector:
query = SearchQuery("django & postgres", search_type="websearch")
(
Article.objects
.filter(search_vector=query)
.annotate(rank=SearchRank("search_vector", query))
.order_by("-rank")
)
# WHERE search_vector @@ websearch_to_tsquery('django postgres')
# ORDER BY ts_rank(search_vector, ...) DESC — served by the GIN index
Fuzzy Matching and Autocomplete with Trigrams
Full-text search is great for words but blind to typos and partial names — search
"kafka" and you won't find "Kaffka". The pg_trgm extension fixes
that by comparing three-character shingles, giving you typo-tolerant search, fuzzy author
lookups and ILIKE '%term%' queries that actually use an index.
Enable the extension in a migration so it's reproducible across environments:
# migrations/00XX_trigram.py
from django.contrib.postgres.operations import TrigramExtension
class Migration(migrations.Migration):
operations = [TrigramExtension()]
from django.contrib.postgres.search import TrigramSimilarity
(
Author.objects
.annotate(sim=TrigramSimilarity("name", "kafka"))
.filter(sim__gt=0.3)
.order_by("-sim")
)
# ORDER BY similarity(name, 'kafka') DESC — finds "Kafka", "Kaffka", "Kavka"
Add a GIN index with the gin_trgm_ops operator class so both trigram similarity
and icontains can use it — the latter is the trick that makes
LIKE '%term%' fast:
from django.contrib.postgres.indexes import GinIndex, OpClass
class Meta:
indexes = [
GinIndex(
OpClass("name", name="gin_trgm_ops"),
name="author_name_trgm",
),
]
# Now Author.objects.filter(name__icontains="afk") is an index scan.
Window Functions: Ranking Without Subqueries
Window functions compute across a set of rows related to the current row without
collapsing them the way GROUP BY does. "Rank each article within its category by
views", "running total of revenue by day", "each event's gap from the previous one" — all one
Window() annotation, no self-joins.
from django.db.models import Window, F, Sum
from django.db.models.functions import Rank, RowNumber, Lag
# Rank articles within each category by view count
(
Article.objects.annotate(
rank=Window(
expression=Rank(),
partition_by=[F("category_id")],
order_by=F("views").desc(),
)
)
)
# RANK() OVER (PARTITION BY category_id ORDER BY views DESC)
# Running total of revenue ordered by date
Sale.objects.annotate(
running_total=Window(
expression=Sum("amount"),
order_by=F("date").asc(),
)
)
# Each event's delta from the previous one (LAG)
Event.objects.annotate(
prev_created=Window(expression=Lag("created"), order_by=F("created").asc())
)
# Top article per category — filter the window result via a subquery
from django.db.models import Window, F
from django.db.models.functions import RowNumber
ranked = Article.objects.annotate(
rn=Window(
expression=RowNumber(),
partition_by=[F("category_id")],
order_by=F("views").desc(),
)
).values("id", "rn")
top_ids = [r["id"] for r in ranked if r["rn"] == 1]
top_per_category = Article.objects.filter(id__in=top_ids)
# Or push the filter into SQL with a CTE (next section) to avoid two round-trips.
CTEs and Recursive Queries
Common Table Expressions (WITH queries) are the one feature here the core ORM
still won't build for you. Two good options: the django-cte package for a clean
queryset API, or RawSQL when the CTE is genuinely one-off. The classic use case is
a recursive query — walking a self-referential tree like nested categories or comment
threads in a single round-trip instead of N queries.
# pip install django-cte
from django_cte import CTEManager, With
class Category(models.Model):
name = models.CharField(max_length=100)
parent = models.ForeignKey(
"self", null=True, blank=True, related_name="children", on_delete=models.CASCADE
)
objects = CTEManager()
def descendants(root_id):
def make_cte(cte):
# Anchor: the root row
base = Category.objects.filter(id=root_id).values("id", "name", "parent_id")
# Recursive term: children of anything already in the CTE
recursive = cte.join(Category, parent_id=cte.col.id).values(
"id", "name", "parent_id"
)
return base.union(recursive, all=True)
cte = With.recursive(make_cte)
return cte.queryset().with_cte(cte)
That compiles to a single recursive CTE and one database round-trip:
WITH RECURSIVE cte AS (
SELECT id, name, parent_id FROM category WHERE id = %s
UNION ALL
SELECT c.id, c.name, c.parent_id
FROM category c JOIN cte ON c.parent_id = cte.id
)
SELECT * FROM cte;
Database Triggers with pgtrigger
Some invariants are too important to enforce in Python, because Python isn't the only thing
that writes to your database. bulk_update(), a data migration, a
psql session, or a second service all bypass your model's save().
A database trigger runs no matter who does the writing. The
django-pgtrigger
package lets you declare triggers on the model and ship them through migrations.
Keep the search vector current
This is the missing half of the full-text search section: a BEFORE INSERT OR UPDATE
trigger that recomputes search_vector in the database, so it's always right even
after a bulk_create.
# pip install django-pgtrigger
import pgtrigger
from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex
@pgtrigger.register(
pgtrigger.Trigger(
name="article_search_vector",
operation=pgtrigger.Insert | pgtrigger.Update,
when=pgtrigger.Before,
func=(
"NEW.search_vector := "
"setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') || "
"setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B'); "
"RETURN NEW;"
),
)
)
class Article(models.Model):
title = models.CharField(max_length=255)
body = models.TextField()
search_vector = SearchVectorField(null=True, editable=False)
class Meta:
indexes = [GinIndex(fields=["search_vector"], name="article_search_gin")]
Run ./manage.py makemigrations and pgtrigger writes the
CREATE TRIGGER / CREATE FUNCTION into a normal migration. No
RunSQL by hand, and it's torn down cleanly when you remove the decorator.
Protect rows and build audit trails
pgtrigger ships ready-made triggers for the common cases:
# Refuse to delete rows that are still referenced / published
@pgtrigger.register(
pgtrigger.Protect(name="no_delete_published", operation=pgtrigger.Delete,
condition=pgtrigger.Q(old__status="published"))
)
class Article(models.Model):
...
# Append an immutable audit row on every change, in-database
@pgtrigger.register(
pgtrigger.Trigger(
name="audit_changes",
operation=pgtrigger.Update,
when=pgtrigger.After,
condition=pgtrigger.Q(old__price__df=pgtrigger.F("new__price")), # only when price changed
func="INSERT INTO price_audit(article_id, old_price, new_price, changed_at) "
"VALUES (OLD.id, OLD.price, NEW.price, now()); RETURN NULL;",
)
)
class Article(models.Model):
...
Performance Tuning: Read the Plan First
Everything above is fast if it hits the right index and slow if it doesn't. PostgreSQL performance tuning from Django starts in one place: the query plan. Never optimise by guessing — ask the planner what it's doing.
# Get the real plan, with timings and buffer hits, straight from the queryset
qs = Article.objects.filter(search_vector=SearchQuery("django", search_type="websearch"))
print(qs.explain(analyze=True, buffers=True))
# Look for: "Index Scan using article_search_gin" (good)
# vs "Seq Scan on article" (bad — no usable index)
Pick the right index type
| Index | Best for | Django |
|---|---|---|
B-tree |
Equality & range on scalars; ordering. The default. | Index(fields=["created"]) |
GIN |
jsonb containment, full-text, arrays, trigrams. |
GinIndex(fields=["search_vector"]) |
BRIN |
Huge append-only tables with naturally ordered columns (timestamps). Tiny on disk. | BrinIndex(fields=["created"]) |
Partial |
Indexing only the rows you query (e.g. status='published'). |
Index(..., condition=Q(status="published")) |
Covering |
Index-only scans — bundle extra columns so the heap is never touched. | Index(fields=["slug"], include=["title"]) |
from django.db.models import Index, Q
from django.contrib.postgres.indexes import BrinIndex
class Meta:
indexes = [
# Partial: only published articles are ever filtered by slug here
Index(fields=["slug"], name="pub_slug_idx", condition=Q(status="published")),
# Covering: serve slug lookups without touching the table heap
Index(fields=["slug"], include=["title", "published_at"], name="slug_cover_idx"),
# BRIN for a 200M-row append-only events table ordered by time
BrinIndex(fields=["created"], name="event_created_brin"),
]
Kill N+1 queries
The single most common Django performance bug isn't a missing index — it's a template looping
over articles and touching article.author.name, firing one query per
row. select_related (SQL join, for FK/one-to-one) and prefetch_related
(second query + Python join, for many-to-many/reverse FK) fix it.
# 1 + N queries → 2 queries total
(
Article.objects
.select_related("author") # JOIN — author in the same row
.prefetch_related("tags", "comments") # one extra query each, joined in Python
)
Connection pooling
Opening a PostgreSQL connection is expensive, and the default Django request lifecycle opens
one per request. Two fixes: set CONN_MAX_AGE to reuse connections, and — on
Django 5.1+ with psycopg 3 — switch on the built-in pool. Under serverless or very high
concurrency, front the database with PgBouncer in transaction mode instead.
# settings.py — Django 5.1+ native pooling (psycopg 3)
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": "app",
"CONN_MAX_AGE": 600, # reuse a connection for 10 minutes
"CONN_HEALTH_CHECKS": True, # ping before reuse, drop dead ones
"OPTIONS": {
"pool": {"min_size": 2, "max_size": 8}, # built-in connection pool
},
}
}
Production Checklist
- Read the plan, don't guess.
qs.explain(analyze=True, buffers=True)on every query that matters; chase "Seq Scan" on big tables. - JSONField: use
default=dict(never a literal{}); index containment with ajsonb_path_opsGIN; add a B-tree expression index for hot key paths. - Full-text search: store a
SearchVectorFieldwith a GIN index, keep it current with a trigger, weighttitle=A / body=B, and query withsearch_type="websearch". - Fuzzy matching: enable
pg_trgmviaTrigramExtension; agin_trgm_opsindex makes bothTrigramSimilarityandicontainsfast. - Window functions: remember you can't filter on them directly — wrap in a subquery or CTE.
- CTEs: reach for
django-ctefor recursive trees; always add a depth or cycle guard on untrusted data. - Triggers: use
pgtriggerfor invariants that must hold regardless of who writes (audit logs, counters, search vectors) — and keep them defined next to the model. - Indexes: right type for the job — BRIN for append-only time-series, partial for filtered subsets, covering for index-only scans.
- Connections: set
CONN_MAX_AGE+ health checks; use Django 5.1 native pooling or PgBouncer under high concurrency. - Monitoring: turn on
pg_stat_statementsand a slow-query log; let the data pick what to optimise.
None of this needed a hand-written cursor.execute(). The ORM gave us indexed JSON
queries, ranked full-text search, typo-tolerant lookups, windowed analytics, recursive trees
and database triggers — all composable, all migration-managed, and all with SQL you can read.
Reach for raw SQL when you've genuinely outgrown the expression API; until then, PostgreSQL and
the Django ORM are a more powerful pairing than most people give them credit for.