Tags: AI · Python · LLM · Django

AI Exposure by Occupation: Inside Karpathy's Job Market Visualizer

Andrej Karpathy built a treemap of 342 US occupations covering 143 million jobs, coloured by AI exposure scores generated by an LLM pipeline. Software developers land at 8–9 out of 10. This post breaks down what the tool shows, visualises the data with interactive charts, and walks through building the same scoring pipeline in Python.

1. The Tool: 342 Occupations, One Question

karpathy.ai/jobs is a browser-based treemap built by Andrej Karpathy. Every rectangle represents one of the 342 occupations tracked by the US Bureau of Labor Statistics Occupational Outlook Handbook. Rectangle area is proportional to total employment — so large rectangles like registered nurses and retail salespeople dominate the canvas, while niche roles sit in small slivers.

  • 342 occupations tracked
  • 143M jobs visualised
  • 4 scoring metrics
  • 0–10 AI exposure scale

Together the 342 occupations account for roughly 143 million jobs — close to the full US employed workforce. The question the tool tries to answer: which of those jobs is most exposed to being reshaped by current AI?

"This is a development tool for exploring BLS data visually — scrapers, parsers, and a pipeline for writing custom LLM prompts to score and color occupations by any criteria." — karpathy.ai/jobs

The project is deliberately not a prediction engine. It is an exploration interface — a way to see the entire labour market through whichever lens you choose to build.


2. Four Ways to Slice 143 Million Jobs

The visualizer ships with four built-in colour metrics, each painting the treemap differently:

  • BLS projected growth outlook — official 10-year growth forecasts from the Occupational Outlook Handbook, ranging from declining occupations to much-faster-than-average growth.
  • Median pay — annual median wage per occupation. Useful for spotting which large-employment roles are low-wage versus which smaller niches are highly compensated.
  • Education requirements — the typical entry-level education the BLS associates with each role, from no formal credential through doctoral degree.
  • AI exposure — a 0–10 score generated by an LLM prompt assessing how much of the occupation's day-to-day work current AI can assist with or replace.

The first three metrics are straight BLS data; the fourth is where it gets interesting — an original LLM-generated signal that does not exist anywhere in official statistics.


3. The AI Exposure Scale: 0 to 10

The exposure metric maps occupations onto a 0–10 scale based on how much of the role can be done — or fundamentally reshaped — by today's AI systems. The rubric is anchored to current capability, not hypothetical future systems.

  • 0–1 Minimal: almost entirely physical or hands-on work that requires real-time human presence in unpredictable environments. Examples: roofers, landscapers, commercial divers.
  • 2–3 Low: mostly physical or interpersonal work; AI may help with minor peripheral tasks like scheduling or reports. Examples: electricians, plumbers, firefighters.
  • 4–5 Moderate: a blend of physical or interpersonal work and knowledge work; neither fully automatable nor fully physical. Examples: nurses, police officers, veterinarians.
  • 6–7 High: predominantly knowledge work with some need for human judgment, relationships, or physical presence. Examples: teachers, managers, accountants, journalists.
  • 8–9 Very High: the job is almost entirely done on a computer, and AI tools can assist with the majority of daily tasks today. Examples: software developers, designers, translators, analysts.
  • 10 Maximum: routine, fully digital information processing with no physical component whatsoever. Examples: data entry clerks, telemarketers.
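The rubric above maps naturally onto a small lookup table if you want to bucket scores programmatically. A minimal sketch (the boundaries and labels are taken directly from the rubric):

```python
# Tier labels keyed by the inclusive score range they cover,
# mirroring the 0-10 rubric above.
EXPOSURE_TIERS = [
    ((0, 1), "Minimal"),
    ((2, 3), "Low"),
    ((4, 5), "Moderate"),
    ((6, 7), "High"),
    ((8, 9), "Very High"),
    ((10, 10), "Maximum"),
]


def exposure_tier(score: int) -> str:
    """Return the rubric tier label for a 0-10 exposure score."""
    for (low, high), label in EXPOSURE_TIERS:
        if low <= score <= high:
            return label
    raise ValueError(f"score must be 0-10, got {score}")
```

A function like this is handy later when colouring treemap cells or grouping chart bars by tier.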


4. AI Exposure by Occupation — Chart

The chart below plots 22 representative occupations from across the scale. Bars are coloured by exposure tier — green for the safest roles through to red for maximum exposure. Employment figures are from the BLS Occupational Outlook Handbook.

[Chart: AI Exposure Score (0–10) across 22 representative occupations, bars coloured by tier: Minimal (0–1), Low (2–3), Moderate (4–5), High (6–7), Very High (8–9), Maximum (10).]

Source: BLS Occupational Outlook Handbook · AI exposure scores via LLM (Karpathy rubric)


5. Exposure vs Pay vs Employment — Chart

This chart plots each occupation's AI exposure score (x-axis) against its annual median pay (y-axis). Bubble size represents total US employment — larger bubbles mean more workers in that role. The key insight: high exposure does not cluster at low pay.

[Chart: AI Exposure vs Median Annual Pay · bubble size = total employment]

Source: BLS Occupational Outlook Handbook · Median pay in USD thousands
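If you want to reproduce a chart like this yourself, a minimal matplotlib sketch looks like the following. The numbers are rough illustrative values, not the exact BLS figures, and the output file name is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Rough illustrative figures only -- pull the real values from the
# BLS Occupational Outlook Handbook for anything you publish.
occupations = [
    # (title, exposure score, median pay in $k, employment in thousands)
    ("Roofers",             1,  48,  135),
    ("Electricians",        3,  61,  712),
    ("Registered nurses",   4,  86, 3300),
    ("Accountants",         7,  79, 1538),
    ("Software developers", 9, 130, 1900),
    ("Telemarketers",      10,  35,   80),
]

fig, ax = plt.subplots(figsize=(8, 5))
for title, score, pay, employed in occupations:
    # Bubble area scales with employment; divisor is a cosmetic choice.
    ax.scatter(score, pay, s=employed / 5, alpha=0.5)
    ax.annotate(title, (score, pay), fontsize=8)

ax.set_xlabel("AI exposure score (0-10)")
ax.set_ylabel("Median annual pay ($ thousands)")
ax.set_title("AI exposure vs pay (bubble size = employment)")
fig.savefig("exposure_vs_pay.png", dpi=150)
```

Swap in the scraped BLS values from section 8 once you have them stored.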


6. Where Software Developers Land

Software developers, designers, translators, and analysts all cluster at 8–9. The reasoning is straightforward: virtually all of the work happens on a screen, in text, in structured data. There is no physical environment to navigate, no unpredictable setting, and the outputs — code, copy, analysis — are exactly the kinds of things current LLMs are trained to produce.

For comparison, nurses score around 4–5 despite being highly skilled and well-paid. Half the job involves physically being present with a patient, reading non-verbal cues, and performing hands-on procedures that current AI cannot touch. Software development has no equivalent physical anchor.

The bubble chart in section 5 makes one thing immediately visible: software developers sit at high exposure and high pay — a combination that reflects both the AI-addressability of the work and the strong demand for it. Telemarketers sit at maximum exposure and very low pay, a genuinely different situation.


7. The LLM Pipeline Behind the Scores

The tool's description tells us the architecture: "scrapers, parsers, and a pipeline for writing custom LLM prompts to score and color occupations by any criteria."

In practice that means three stages:

  1. Scrape — fetch occupation descriptions from the BLS Occupational Outlook Handbook. Each occupation has a dedicated page with duties, work environment, pay, and outlook.
  2. Prompt — pass the occupation description to an LLM with a structured prompt that asks for a numeric score and a brief justification.
  3. Store and render — parse the response, persist the score, and feed it into the treemap colour scale.

Because the prompt is parameterised, the same pipeline can produce entirely different colour layers — robotics exposure, offshoring risk, climate-transition impact — just by swapping the scoring criteria. That is the real power of the design.
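That parameterisation can be as simple as a prompt template keyed by criterion. A sketch of the idea (the criterion names and definitions below are hypothetical, not the tool's actual prompts):

```python
# Each criterion supplies its own scoring definition; the surrounding
# pipeline (scrape -> prompt -> parse -> store) stays identical.
CRITERIA = {
    "ai_exposure": "how much of the daily work current AI systems can assist with or replace",
    "offshoring_risk": "how easily the work could be performed remotely from another country",
    "climate_transition": "how strongly the occupation is affected by the shift away from fossil fuels",
}

PROMPT_TEMPLATE = """You are an analyst scoring occupations.
Criterion: {criterion}
Score the occupation below from 0 (not at all) to 10 (maximally) on: {definition}
Respond with JSON only: {{"score": <integer 0-10>, "reasoning": "<one sentence>"}}

Occupation: {title}

Description:
{description}"""


def build_prompt(criterion: str, title: str, description: str) -> str:
    """Assemble a scoring prompt for any registered criterion."""
    return PROMPT_TEMPLATE.format(
        criterion=criterion,
        definition=CRITERIA[criterion],
        title=title,
        description=description,
    )
```

Adding a new colour layer is then a one-line dictionary entry rather than a new pipeline.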


8. Build Your Own: Fetch BLS Data in Python

The BLS Occupational Outlook Handbook has a structured URL per occupation. Fetch the full list from the finder page and scrape each occupation's duties section:

import time
import requests
from bs4 import BeautifulSoup

OOH_INDEX = "https://www.bls.gov/ooh/occupation-finder.htm"

def fetch_occupation_list():
    resp = requests.get(OOH_INDEX, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    occupations = []
    for row in soup.select("table#occfinderTable tbody tr"):
        cells = row.find_all("td")
        if not cells:
            continue
        link = cells[0].find("a")
        if link:
            occupations.append({
                "title": link.text.strip(),
                "url": "https://www.bls.gov" + link["href"],
                "median_pay": cells[1].text.strip(),
                "education": cells[2].text.strip(),
                "growth": cells[4].text.strip(),
            })
    return occupations


def fetch_occupation_summary(url: str) -> str:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    summary_div = soup.select_one("#WhatTheyDo")
    if summary_div:
        return summary_div.get_text(" ", strip=True)[:2000]
    return ""

Add polite rate limiting between requests — the BLS servers are public infrastructure. A time.sleep(1) between fetches is sufficient.
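Here is one way that politeness can look in code: a fixed pause after successful requests plus exponential backoff on failures. The delay values are arbitrary choices, not anything the original tool prescribes:

```python
import time

import requests


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds."""
    return min(base * (2 ** attempt), cap)


def polite_get(url: str, retries: int = 3, delay: float = 1.0) -> requests.Response:
    """GET with a fixed inter-request pause and retry-with-backoff on errors."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            time.sleep(delay)  # fixed pause so we never hammer the server
            return resp
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

Substituting polite_get for the raw requests.get calls in the scraper keeps the rate limiting in one place.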


9. Write the Scoring Prompt

The scoring prompt needs to produce a structured, parseable response. Ask for JSON so you can extract the score reliably without fragile string parsing:

import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """
You are an analyst assessing how much current AI systems (LLMs, coding assistants,
image generators, speech-to-text, automation tools) can assist with or replace the
core daily work of a given occupation.

Score on a 0–10 scale:
  0–1  : Almost entirely physical, unpredictable real-world presence required.
  2–3  : Mostly physical or interpersonal; AI helps only peripherally.
  4–5  : Mix of physical and knowledge work.
  6–7  : Mostly knowledge work; some human judgment or physical presence still needed.
  8–9  : Almost entirely on a computer; AI can assist with most tasks today.
  10   : Fully digital, routine information processing, no physical component.

Consider CURRENT AI capability only — not hypothetical future systems.

Respond with JSON only:
{
  "score": <integer 0-10>,
  "reasoning": "<one or two sentences>"
}
""".strip()


import json


def score_occupation(title: str, description: str) -> dict:
    message = client.messages.create(
        model="claude-opus-4-1",  # substitute any current Claude model ID
        max_tokens=256,
        system=SYSTEM_PROMPT,
        messages=[{
            "role": "user",
            "content": f"Occupation: {title}\n\nDescription:\n{description}"
        }],
    )
    return json.loads(message.content[0].text.strip())

Asking for "reasoning" alongside the score forces the model to justify its answer — which improves quality and gives you something useful to display in the UI.
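One practical caveat: models occasionally wrap the JSON in a markdown code fence or add a stray sentence around it. A defensive parser (a sketch, not part of the original pipeline) keeps the happy path in json.loads while tolerating those cases:

```python
import json
import re


def parse_score_response(text: str) -> dict:
    """Extract the first JSON object from an LLM response, tolerating
    markdown fences and surrounding prose."""
    # Strip ```json ... ``` fences if present.
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    # Fall back to grabbing the first {...} block.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError(f"no JSON object found in: {text[:80]!r}")
    result = json.loads(match.group(0))
    if not (isinstance(result.get("score"), int) and 0 <= result["score"] <= 10):
        raise ValueError(f"score missing or out of range: {result!r}")
    return result
```

Validating the score range here also catches the occasional model that answers on the wrong scale.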


10. Store Scores with a Django Model

# occupations/models.py
from django.db import models


class Occupation(models.Model):
    title          = models.CharField(max_length=200, unique=True)
    bls_url        = models.URLField()
    median_pay     = models.CharField(max_length=60, blank=True)
    education      = models.CharField(max_length=120, blank=True)
    growth         = models.CharField(max_length=80, blank=True)
    total_employed = models.BigIntegerField(null=True, blank=True)
    description    = models.TextField(blank=True)

    # LLM-generated fields
    ai_score      = models.SmallIntegerField(null=True, blank=True)
    ai_reasoning  = models.TextField(blank=True)
    scored_at     = models.DateTimeField(null=True, blank=True)

    class Meta:
        ordering = ["-ai_score", "title"]

    def __str__(self):
        return self.title


class ScoringRun(models.Model):
    started_at   = models.DateTimeField(auto_now_add=True)
    finished_at  = models.DateTimeField(null=True, blank=True)
    total_scored = models.IntegerField(default=0)
    prompt_hash  = models.CharField(max_length=64, blank=True)

    def __str__(self):
        return f"Run {self.pk} — {self.total_scored} scored"

prompt_hash stores an MD5 of the system prompt so you can selectively re-score only occupations evaluated under an outdated prompt when you iterate on criteria.


11. Run It as a Management Command

# occupations/management/commands/score_occupations.py
import hashlib, time
from django.core.management.base import BaseCommand
from django.utils import timezone
from occupations.models import Occupation, ScoringRun
from occupations.bls import fetch_occupation_list, fetch_occupation_summary
from occupations.scoring import SYSTEM_PROMPT, score_occupation


class Command(BaseCommand):
    help = "Fetch BLS occupations and score them for AI exposure"

    def add_arguments(self, parser):
        parser.add_argument("--rescore-all", action="store_true")

    def handle(self, *args, **options):
        prompt_hash = hashlib.md5(SYSTEM_PROMPT.encode()).hexdigest()
        run = ScoringRun.objects.create(prompt_hash=prompt_hash)
        scored = 0

        for data in fetch_occupation_list():
            obj, _ = Occupation.objects.get_or_create(
                title=data["title"],
                defaults={
                    "bls_url": data["url"],
                    "median_pay": data["median_pay"],
                    "education": data["education"],
                    "growth": data["growth"],
                },
            )
            if obj.ai_score is not None and not options["rescore_all"]:
                continue
            if not obj.description:
                obj.description = fetch_occupation_summary(obj.bls_url)
                time.sleep(1)
            try:
                result = score_occupation(obj.title, obj.description)
                obj.ai_score = result["score"]
                obj.ai_reasoning = result["reasoning"]
                obj.scored_at = timezone.now()
                obj.save()
                scored += 1
                self.stdout.write(f"  [{obj.ai_score}/10] {obj.title}")
            except Exception as exc:
                self.stderr.write(f"  FAILED {obj.title}: {exc}")
            time.sleep(0.5)

        run.finished_at = timezone.now()
        run.total_scored = scored
        run.save()
        self.stdout.write(self.style.SUCCESS(f"Done. Scored {scored} occupations."))

Run with python manage.py score_occupations. A full pass of ~340 occupations takes around 10 minutes at a polite rate. Use --rescore-all after updating the prompt.

# occupations/views.py — expose scores as a DRF endpoint
from rest_framework.generics import ListAPIView
from rest_framework.serializers import ModelSerializer
from occupations.models import Occupation


class OccupationSerializer(ModelSerializer):
    class Meta:
        model = Occupation
        fields = ["title", "median_pay", "growth", "total_employed",
                  "ai_score", "ai_reasoning"]


class OccupationListView(ListAPIView):
    serializer_class = OccupationSerializer
    queryset = Occupation.objects.filter(ai_score__isnull=False)
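To expose the endpoint, wire the view into a URLconf. This is a project file, not standalone code, and the route and name below are arbitrary choices:

```python
# occupations/urls.py
from django.urls import path

from occupations.views import OccupationListView

urlpatterns = [
    path("api/occupations/", OccupationListView.as_view(), name="occupation-list"),
]
```

A treemap front end can then fetch /api/occupations/ and use each ai_score to drive its colour scale.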

12. Why High Exposure Does Not Mean Job Loss

The tool is careful to call this out directly:

"These are rough LLM estimates, not rigorous predictions. A high score does not predict the job will disappear. The score does not account for demand elasticity, latent demand, regulatory barriers, or social preferences for human workers."

Consider demand elasticity for software development: if AI makes each developer 2× more productive, do half the developers lose their jobs — or does the total amount of software that gets built double? History suggests the latter. The calculator, spreadsheet, and IDE all made knowledge workers more productive; none caused mass unemployment in the fields they touched.

Latent demand matters too. Millions of businesses cannot afford a development team today. If AI halves the cost of building software, many of those businesses become viable clients. The pie grows even as the cost per slice falls.
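The arithmetic behind those two scenarios is worth making explicit. With toy numbers (entirely hypothetical), it looks like this:

```python
# Toy model: what happens to headcount when productivity doubles,
# under different assumptions about how demand responds.
developers = 100        # hypothetical developer headcount today
output_per_dev = 10     # hypothetical "units of software" per developer

baseline_output = developers * output_per_dev  # -> 1000 units

# Scenario A: demand is fixed -- half the developers suffice.
devs_fixed_demand = baseline_output / (output_per_dev * 2)  # -> 50.0

# Scenario B: demand doubles as the cost per unit halves -- headcount holds.
devs_elastic_demand = (baseline_output * 2) / (output_per_dev * 2)  # -> 100.0

print(devs_fixed_demand, devs_elastic_demand)
```

Which scenario materialises is an empirical question about demand elasticity, which is exactly why a capability score alone cannot predict job loss.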

Regulatory barriers apply to doctors, lawyers, and financial advisers. Social preferences apply to therapists, teachers, and carers. A job can be 9/10 technically automatable while being near-zero automatable in practice because people do not want AI in that role.


13. What Developers Should Take Away

A score of 8–9 is not a warning — it is a description of the work surface. Software development is almost entirely done on a computer, in text, in structured systems. Of course AI tools can assist. They already do.

What is genuinely useful from this tool is the relative picture. Developers are highly exposed relative to nurses and electricians, not because the job is going away but because the work is almost entirely in the AI-addressable domain. That means:

  • Developers who use AI tools will significantly outpace those who do not.
  • Routine, well-specified tasks — boilerplate, format conversions, basic CRUD — become AI-first. Human developer time shifts toward ambiguous problems, system design, and judgment calls.
  • Demand for software is not fixed. Cheaper development expands what gets built, expanding the number of problems that need a developer's eye.

The most reusable takeaway is the pipeline design itself: a structured prompt, a numeric output, clean separation between what you measure and how you measure it. That pattern — evaluate any corpus by any LLM-defined criteria, store, and visualise at scale — applies far beyond job market analysis.