WEBBYFOX-OS PATH /blog/postgresql-for-django-developers/ DOC ~19 MIN NODE LDN-01
FEED ACTIVE 13:42 BST
./post · postgresql-for-django-developers.md
NEW
// ARTICLE · MAY 29, 2026

PostgreSQL for Django Developers: Beyond Raw SQL.

Most Django developers reach for cursor.execute() the moment they need something the ORM "can't do" — JSON queries, full-text search, ranking, recursive trees. Nearly all of it is one ORM expression away, type-checked, composable, and migration-aware. This guide walks the PostgreSQL features that matter most — JSONField, full-text search, trigram matching, window functions, CTEs and triggers — and shows the raw SQL each one compiles to so you always know what you're shipping to the database.

Python Django PostgreSQL Performance
· ~19 min read · Rizwan Mansuri
$ cat ./postgresql-for-django-developers.md
READ

The ORM Knows More SQL Than You Think

There is a persistent myth that Django's ORM is a toy that hides "real" SQL, and that any serious PostgreSQL work means dropping to cursor.execute(). In practice the opposite is true: the ORM exposes a remarkable amount of PostgreSQL through django.contrib.postgres, expression objects (F, Q, Func, Window), and a handful of small, well-maintained packages. The wins of staying in the ORM are real — composable querysets, automatic parameter binding (so no SQL injection), migration-managed indexes, and queries that survive a schema rename.

The discipline that makes this work is simple: know the SQL your ORM call compiles to. Every example below pairs the Python with the SQL PostgreSQL actually runs. When you can read both, you stop guessing about performance and start reasoning about it. Print the SQL for any queryset with print(qs.query), or get the real plan with qs.explain(analyze=True).


JSONField: Schemaless Columns, Indexed Properly

PostgreSQL's jsonb type is the reason most Django apps never need a document store bolted on the side. It stores parsed, binary JSON — deduplicated keys, no whitespace, and crucially indexable. Django maps it to models.JSONField, and the lookups compile straight to native jsonb operators. The phrase django postgresql json usually means exactly this combination.

# models.py
from django.db import models
from django.contrib.postgres.indexes import GinIndex


class Event(models.Model):
    name = models.CharField(max_length=120)
    # default=dict, never default={} — a shared mutable default is a classic bug
    payload = models.JSONField(default=dict)
    created = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # jsonb_ops: supports @>, ?, ?&, ?| — the containment + key-existence family
            GinIndex(fields=["payload"], name="event_payload_gin"),
        ]

Querying nested keys

Use the double-underscore path to dig into the document. Each step becomes a -> traversal, and the final comparison an = on the extracted text or a typed comparison for numbers.

# Exact match on a nested string
Event.objects.filter(payload__device__os="iOS")

# Numeric comparison — Django casts the extracted value to numeric
Event.objects.filter(payload__cart__total__gt=100)

# Key existence (compiles to the ? operator)
Event.objects.filter(payload__has_key="utm_source")

# Array / object containment (compiles to @>) — this one uses the GIN index
Event.objects.filter(payload__contains={"plan": "pro"})
Event.objects.filter(payload__tags__contains=["beta"])

The __contains lookup is the important one. It compiles to:

SELECT * FROM event WHERE payload @> '{"plan": "pro"}';

The @> (contains) operator is GIN-indexable, so that query is an index scan, not a sequential scan over every row's JSON. By contrast, payload__device__os="iOS" uses a ->> path extraction that the default jsonb_ops GIN index cannot serve — for that pattern you want an expression index (below).

Aggregating over JSON

Pull a key out as a real column with KeyTextTransform, then group, filter and annotate it like any other field.

from django.db.models import Count
from django.db.models.fields.json import KeyTextTransform

(
    Event.objects
    .annotate(os=KeyTextTransform("os", "payload"))
    .values("os")
    .annotate(n=Count("id"))
    .order_by("-n")
)
# GROUP BY (payload ->> 'os') — counts events per OS

Indexing a hot key path

When you constantly filter on one extracted value, a B-tree expression index on that path beats a whole-document GIN index — it's smaller and supports range queries and ordering.

from django.contrib.postgres.indexes import GinIndex, OpClass
from django.db.models.functions import Cast
from django.db.models import Index

class Meta:
    indexes = [
        # B-tree on a single extracted text value: payload ->> 'os'
        Index(KeyTextTransform("os", "payload"), name="event_os_idx"),
        # Smaller, faster containment-only GIN using jsonb_path_ops
        GinIndex(
            OpClass("payload", name="jsonb_path_ops"),
            name="event_payload_pathops_gin",
        ),
    ]

Before you stand up an Elasticsearch cluster for a search box, measure how far PostgreSQL gets you. For the vast majority of apps — up to a few million rows — it's more than enough, and it lives in the same transaction as your data so there's nothing to sync. This is the heart of django full text search: to_tsvector turns a document into normalised lexemes, to_tsquery turns a search string into a boolean query, and the @@ operator matches them.

FULL-TEXT SEARCH PIPELINE title + body raw text columns to_tsvector() stem · stop-words · weight A/B/C/D search_vector tsvector column + GIN index ts_rank() ordered results QUERY SIDE "django & postgres" websearch_to_tsquery() search_vector @@ query index scan, not seq scan Precompute the tsvector into a stored column so the GIN index does the work at query time.
The two halves of Postgres FTS: build a weighted tsvector once (ideally a stored, GIN-indexed column), then match it against a parsed tsquery and rank with ts_rank.

The quick version (no schema change)

For a first pass, annotate a SearchVector on the fly. It works immediately but recomputes the vector for every row on every query — fine for small tables, not for hot paths.

from django.contrib.postgres.search import (
    SearchVector, SearchQuery, SearchRank,
)

query = SearchQuery("django postgres", search_type="websearch")
vector = SearchVector("title", weight="A") + SearchVector("body", weight="B")

(
    Article.objects
    .annotate(rank=SearchRank(vector, query))
    .filter(rank__gt=0)
    .order_by("-rank")
)

search_type="websearch" (PostgreSQL 11+) is the one to use for user input: it accepts Google-style syntax — quoted "exact phrases", or, and -exclusion — and never raises on malformed input, unlike raw to_tsquery.

The production version: a stored, indexed SearchVectorField

For anything performance-sensitive, materialise the vector into a SearchVectorField column with its own GIN index, and keep it current with a database trigger (covered in the triggers section — so it updates no matter how the row changes).

from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex


class Article(models.Model):
    title = models.CharField(max_length=255)
    body = models.TextField()
    search_vector = SearchVectorField(null=True, editable=False)

    class Meta:
        indexes = [GinIndex(fields=["search_vector"], name="article_search_gin")]

Now searches hit the index and never recompute the vector:

query = SearchQuery("django & postgres", search_type="websearch")

(
    Article.objects
    .filter(search_vector=query)
    .annotate(rank=SearchRank("search_vector", query))
    .order_by("-rank")
)
# WHERE search_vector @@ websearch_to_tsquery('django postgres')
# ORDER BY ts_rank(search_vector, ...) DESC  — served by the GIN index

Fuzzy Matching and Autocomplete with Trigrams

Full-text search is great for words but blind to typos and partial names — search "kafka" and you won't find "Kaffka". The pg_trgm extension fixes that by comparing three-character shingles, giving you typo-tolerant search, fuzzy author lookups and ILIKE '%term%' queries that actually use an index.

Enable the extension in a migration so it's reproducible across environments:

# migrations/00XX_trigram.py
from django.contrib.postgres.operations import TrigramExtension

class Migration(migrations.Migration):
    operations = [TrigramExtension()]
from django.contrib.postgres.search import TrigramSimilarity

(
    Author.objects
    .annotate(sim=TrigramSimilarity("name", "kafka"))
    .filter(sim__gt=0.3)
    .order_by("-sim")
)
# ORDER BY similarity(name, 'kafka') DESC — finds "Kafka", "Kaffka", "Kavka"

Add a GIN index with the gin_trgm_ops operator class so both trigram similarity and icontains can use it — the latter is the trick that makes LIKE '%term%' fast:

from django.contrib.postgres.indexes import GinIndex, OpClass

class Meta:
    indexes = [
        GinIndex(
            OpClass("name", name="gin_trgm_ops"),
            name="author_name_trgm",
        ),
    ]
# Now Author.objects.filter(name__icontains="afk") is an index scan.

Window Functions: Ranking Without Subqueries

Window functions compute across a set of rows related to the current row without collapsing them the way GROUP BY does. "Rank each article within its category by views", "running total of revenue by day", "each event's gap from the previous one" — all one Window() annotation, no self-joins.

from django.db.models import Window, F, Sum
from django.db.models.functions import Rank, RowNumber, Lag

# Rank articles within each category by view count
(
    Article.objects.annotate(
        rank=Window(
            expression=Rank(),
            partition_by=[F("category_id")],
            order_by=F("views").desc(),
        )
    )
)
# RANK() OVER (PARTITION BY category_id ORDER BY views DESC)
# Running total of revenue ordered by date
Sale.objects.annotate(
    running_total=Window(
        expression=Sum("amount"),
        order_by=F("date").asc(),
    )
)

# Each event's delta from the previous one (LAG)
Event.objects.annotate(
    prev_created=Window(expression=Lag("created"), order_by=F("created").asc())
)
# Top article per category — filter the window result via a subquery
from django.db.models import Window, F
from django.db.models.functions import RowNumber

ranked = Article.objects.annotate(
    rn=Window(
        expression=RowNumber(),
        partition_by=[F("category_id")],
        order_by=F("views").desc(),
    )
).values("id", "rn")

top_ids = [r["id"] for r in ranked if r["rn"] == 1]
top_per_category = Article.objects.filter(id__in=top_ids)

# Or push the filter into SQL with a CTE (next section) to avoid two round-trips.

CTEs and Recursive Queries

Common Table Expressions (WITH queries) are the one feature here the core ORM still won't build for you. Two good options: the django-cte package for a clean queryset API, or RawSQL when the CTE is genuinely one-off. The classic use case is a recursive query — walking a self-referential tree like nested categories or comment threads in a single round-trip instead of N queries.

# pip install django-cte
from django_cte import CTEManager, With


class Category(models.Model):
    name = models.CharField(max_length=100)
    parent = models.ForeignKey(
        "self", null=True, blank=True, related_name="children", on_delete=models.CASCADE
    )
    objects = CTEManager()


def descendants(root_id):
    def make_cte(cte):
        # Anchor: the root row
        base = Category.objects.filter(id=root_id).values("id", "name", "parent_id")
        # Recursive term: children of anything already in the CTE
        recursive = cte.join(Category, parent_id=cte.col.id).values(
            "id", "name", "parent_id"
        )
        return base.union(recursive, all=True)

    cte = With.recursive(make_cte)
    return cte.queryset().with_cte(cte)

That compiles to a single recursive CTE and one database round-trip:

WITH RECURSIVE cte AS (
    SELECT id, name, parent_id FROM category WHERE id = %s
  UNION ALL
    SELECT c.id, c.name, c.parent_id
    FROM category c JOIN cte ON c.parent_id = cte.id
)
SELECT * FROM cte;

Database Triggers with pgtrigger

Some invariants are too important to enforce in Python, because Python isn't the only thing that writes to your database. bulk_update(), a data migration, a psql session, or a second service all bypass your model's save(). A database trigger runs no matter who does the writing. The django-pgtrigger package lets you declare triggers on the model and ship them through migrations.

Keep the search vector current

This is the missing half of the full-text search section: a BEFORE INSERT OR UPDATE trigger that recomputes search_vector in the database, so it's always right even after a bulk_create.

# pip install django-pgtrigger
import pgtrigger
from django.contrib.postgres.search import SearchVectorField
from django.contrib.postgres.indexes import GinIndex


@pgtrigger.register(
    pgtrigger.Trigger(
        name="article_search_vector",
        operation=pgtrigger.Insert | pgtrigger.Update,
        when=pgtrigger.Before,
        func=(
            "NEW.search_vector := "
            "setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') || "
            "setweight(to_tsvector('english', coalesce(NEW.body, '')),  'B'); "
            "RETURN NEW;"
        ),
    )
)
class Article(models.Model):
    title = models.CharField(max_length=255)
    body = models.TextField()
    search_vector = SearchVectorField(null=True, editable=False)

    class Meta:
        indexes = [GinIndex(fields=["search_vector"], name="article_search_gin")]

Run ./manage.py makemigrations and pgtrigger writes the CREATE TRIGGER / CREATE FUNCTION into a normal migration. No RunSQL by hand, and it's torn down cleanly when you remove the decorator.

Protect rows and build audit trails

pgtrigger ships ready-made triggers for the common cases:

# Refuse to delete rows that are still referenced / published
@pgtrigger.register(
    pgtrigger.Protect(name="no_delete_published", operation=pgtrigger.Delete,
                      condition=pgtrigger.Q(old__status="published"))
)
class Article(models.Model):
    ...

# Append an immutable audit row on every change, in-database
@pgtrigger.register(
    pgtrigger.Trigger(
        name="audit_changes",
        operation=pgtrigger.Update,
        when=pgtrigger.After,
        condition=pgtrigger.Q(old__price__df=pgtrigger.F("new__price")),  # only when price changed
        func="INSERT INTO price_audit(article_id, old_price, new_price, changed_at) "
             "VALUES (OLD.id, OLD.price, NEW.price, now()); RETURN NULL;",
    )
)
class Article(models.Model):
    ...

Performance Tuning: Read the Plan First

Everything above is fast if it hits the right index and slow if it doesn't. PostgreSQL performance tuning from Django starts in one place: the query plan. Never optimise by guessing — ask the planner what it's doing.

# Get the real plan, with timings and buffer hits, straight from the queryset
qs = Article.objects.filter(search_vector=SearchQuery("django", search_type="websearch"))
print(qs.explain(analyze=True, buffers=True))

# Look for: "Index Scan using article_search_gin" (good)
#           vs "Seq Scan on article" (bad — no usable index)

Pick the right index type

IndexBest forDjango
B-tree Equality & range on scalars; ordering. The default. Index(fields=["created"])
GIN jsonb containment, full-text, arrays, trigrams. GinIndex(fields=["search_vector"])
BRIN Huge append-only tables with naturally ordered columns (timestamps). Tiny on disk. BrinIndex(fields=["created"])
Partial Indexing only the rows you query (e.g. status='published'). Index(..., condition=Q(status="published"))
Covering Index-only scans — bundle extra columns so the heap is never touched. Index(fields=["slug"], include=["title"])
from django.db.models import Index, Q
from django.contrib.postgres.indexes import BrinIndex

class Meta:
    indexes = [
        # Partial: only published articles are ever filtered by slug here
        Index(fields=["slug"], name="pub_slug_idx", condition=Q(status="published")),
        # Covering: serve slug lookups without touching the table heap
        Index(fields=["slug"], include=["title", "published_at"], name="slug_cover_idx"),
        # BRIN for a 200M-row append-only events table ordered by time
        BrinIndex(fields=["created"], name="event_created_brin"),
    ]

Kill N+1 queries

The single most common Django performance bug isn't a missing index — it's a template looping over articles and touching article.author.name, firing one query per row. select_related (SQL join, for FK/one-to-one) and prefetch_related (second query + Python join, for many-to-many/reverse FK) fix it.

# 1 + N queries  →  2 queries total
(
    Article.objects
    .select_related("author")            # JOIN — author in the same row
    .prefetch_related("tags", "comments")  # one extra query each, joined in Python
)

Connection pooling

Opening a PostgreSQL connection is expensive, and the default Django request lifecycle opens one per request. Two fixes: set CONN_MAX_AGE to reuse connections, and — on Django 5.1+ with psycopg 3 — switch on the built-in pool. Under serverless or very high concurrency, front the database with PgBouncer in transaction mode instead.

# settings.py — Django 5.1+ native pooling (psycopg 3)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        "CONN_MAX_AGE": 600,            # reuse a connection for 10 minutes
        "CONN_HEALTH_CHECKS": True,     # ping before reuse, drop dead ones
        "OPTIONS": {
            "pool": {"min_size": 2, "max_size": 8},  # built-in connection pool
        },
    }
}

Production Checklist

  • Read the plan, don't guess. qs.explain(analyze=True, buffers=True) on every query that matters; chase "Seq Scan" on big tables.
  • JSONField: use default=dict (never a literal {}); index containment with a jsonb_path_ops GIN; add a B-tree expression index for hot key paths.
  • Full-text search: store a SearchVectorField with a GIN index, keep it current with a trigger, weight title=A / body=B, and query with search_type="websearch".
  • Fuzzy matching: enable pg_trgm via TrigramExtension; a gin_trgm_ops index makes both TrigramSimilarity and icontains fast.
  • Window functions: remember you can't filter on them directly — wrap in a subquery or CTE.
  • CTEs: reach for django-cte for recursive trees; always add a depth or cycle guard on untrusted data.
  • Triggers: use pgtrigger for invariants that must hold regardless of who writes (audit logs, counters, search vectors) — and keep them defined next to the model.
  • Indexes: right type for the job — BRIN for append-only time-series, partial for filtered subsets, covering for index-only scans.
  • Connections: set CONN_MAX_AGE + health checks; use Django 5.1 native pooling or PgBouncer under high concurrency.
  • Monitoring: turn on pg_stat_statements and a slow-query log; let the data pick what to optimise.

None of this needed a hand-written cursor.execute(). The ORM gave us indexed JSON queries, ranked full-text search, typo-tolerant lookups, windowed analytics, recursive trees and database triggers — all composable, all migration-managed, and all with SQL you can read. Reach for raw SQL when you've genuinely outgrown the expression API; until then, PostgreSQL and the Django ORM are a more powerful pairing than most people give them credit for.

$ ls ./related/
3 POSTS
$ cd ../ · · Rizwan Mansuri ↗ RSS feed · ↑ top