Anthropic Claude SDK Cheatsheet – TheCoatlessProfessor

Anthropic ships the official Python SDK for the Claude API, and almost everything you do runs through one call: client.messages.create(...). You construct a client once (client = anthropic.Anthropic(), which reads ANTHROPIC_API_KEY from the environment), hand it a model, a max_tokens cap, and a messages list of role-tagged turns, and you get back a Message object. The recurring mental model in this sheet is one picture: a request (a messages list plus an optional system prompt) flows along a gray arrow into client.messages.create(...), and a green Message flows back, whose .content is a list of typed blocks (text, thinking, tool_use) and whose .stop_reason tells you why it stopped. This is not a generic HTTP sheet: where it looks like the requests sheet, the contrast is the point. requests fetches JSON over the wire; this sheet covers the typed Claude surface, the Message, role-tagged turns, content blocks, streaming events, the tool-use loop, image blocks, cache breakpoints, and model selection. The conventional import is import anthropic, the current 2026 default model is claude-opus-4-8, and everything here is verified against anthropic 0.74.0 (removed spellings are flagged per section).

Client and First Message

Construct one client = anthropic.Anthropic() (it reads ANTHROPIC_API_KEY from the environment) and call client.messages.create(...) with a model, a max_tokens cap, and a messages list; you get back a Message object. The reply text is not a plain string: msg.content is a list of typed blocks, so read msg.content[0].text only after checking block.type == "text", and inspect msg.stop_reason and msg.usage for why it stopped and what it cost.

Anthropic client panel: construct the client from the env key, send the simplest message, read the first text block, walk all text blocks safely, see why it stopped, check token usage.

Construct the client, send one turn, read the text back.

import anthropic

client = anthropic.Anthropic()                 # reads ANTHROPIC_API_KEY
msg = client.messages.create(
    model="claude-opus-4-8", max_tokens=1024,
    messages=[{"role": "user", "content": "Hi"}],
)
msg.content[0].text                            # 'Hello! ...'  (content is a LIST of blocks)
[b.text for b in msg.content if b.type == "text"]   # walk blocks safely
msg.stop_reason                                # 'end_turn' / 'max_tokens' / 'tool_use' / 'refusal'
msg.usage.input_tokens, msg.usage.output_tokens     # what it cost

See Client SDKs. The client reads ANTHROPIC_API_KEY from the environment by default.
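
The "walk blocks safely" line above can be wrapped in a small helper. This is a minimal sketch, not part of the SDK: collect_text is a hypothetical name, and the SimpleNamespace objects are stand-ins for the blocks a real Message would carry, so the example runs without an API key.

```python
from types import SimpleNamespace


def collect_text(blocks):
    """Join the text of every block whose type is "text", skipping the rest."""
    return "".join(b.text for b in blocks if b.type == "text")


# Stand-ins for msg.content (assumption: every block exposes .type,
# and text blocks also expose .text, as this section describes).
fake_content = [
    SimpleNamespace(type="thinking", thinking=""),
    SimpleNamespace(type="text", text="Hello! "),
    SimpleNamespace(type="text", text="How can I help?"),
]

reply = collect_text(fake_content)
print(reply)
```

With a real response you would call collect_text(msg.content) instead of indexing msg.content[0] directly.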

Messages and Roles

The API is stateless: a conversation is just a Python list of {"role": ..., "content": ...} turns that you resend in full on every call, alternating user and assistant, always starting with user. To continue a chat, append the model’s own msg.content back as an assistant turn and then append the next user turn; a plain string content is shorthand for a one-element list containing a single text block.

Anthropic messages panel: one user turn shorthand, multi-turn alternating history, append the model reply back, add the next user turn, content as a block list, first turn must be user.

The conversation is a list of role-tagged turns you resend every call.

messages = [{"role": "user", "content": "What's 2+2?"}]   # one user turn (shorthand)
messages = [{"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello!"},
            {"role": "user", "content": "..."}]           # alternating history
messages.append({"role": "assistant", "content": msg.content})   # resend it all, stateless
messages.append({"role": "user", "content": "and 3+3?"})         # next user turn
{"role": "user", "content": [{"type": "text", "text": "Hi"}]}    # string is sugar for one text block
messages[0]["role"] == "user"                                    # first turn must be user (else 400)

See Messages API. The first turn must be user; a leading assistant turn returns 400.
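
Because the turn list is plain Python data, it can be checked locally before a request is sent. The sketch below checks only the two conventions this section states (the first turn is user, and roles alternate); check_turns is a hypothetical helper, and the API itself may accept or reject more than this check does.

```python
def check_turns(messages):
    """Return a list of problems with a turn list; an empty list means it looks fine."""
    if not messages:
        return ["the list is empty"]
    problems = []
    if messages[0]["role"] != "user":
        problems.append("first turn must be user")
    for i in range(1, len(messages)):
        if messages[i]["role"] == messages[i - 1]["role"]:
            problems.append(
                f"turns {i - 1} and {i} share the role {messages[i]['role']!r}"
            )
    return problems


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What's 2+2?"},
]
print(check_turns(history))
print(check_turns([{"role": "assistant", "content": "Hello!"}]))
```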

System Prompt and Parameters

The system prompt is a top-level parameter, separate from the messages list, and is where you set persona and rules; max_tokens is a required hard ceiling on the output. On current models you deepen reasoning with thinking={"type": "adaptive"} (not a token budget) and trade quality against cost with output_config={"effort": "..."}. Reasoning is hidden by default, so add "display": "summarized" to the thinking dict if you want to show it.

Anthropic system panel: set a system prompt, bound the output length, turn on adaptive thinking, tune effort versus cost, show summarized reasoning, pick the model.

Steer with system, bound with max_tokens, deepen with thinking.

client.messages.create(..., system="You are a terse assistant.")   # top-level, not a turn
client.messages.create(..., max_tokens=1024)                       # hard cap, always required
thinking={"type": "adaptive"}                                      # deepen reasoning (not a budget)
output_config={"effort": "high"}                                   # low / medium / high / max
thinking={"type": "adaptive", "display": "summarized"}             # show a reasoning summary (hidden by default)
model="claude-opus-4-8"                                            # or sonnet-4-6 / haiku-4-5

See Adaptive thinking. Use thinking={"type": "adaptive"}, not the removed budget_tokens.
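
The parameters in this section are all keyword arguments to one call, so they can be assembled in one place. This is a sketch only: build_request and its argument names are hypothetical, and the parameter spellings (thinking, output_config, display) are taken from this sheet rather than checked against a live API.

```python
def build_request(messages, model="claude-opus-4-8", max_tokens=1024,
                  system=None, effort=None, adaptive=False, show_thinking=False):
    """Assemble keyword arguments for client.messages.create(**kwargs)."""
    kwargs = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if system is not None:
        kwargs["system"] = system                      # top-level, not a turn
    if effort is not None:
        kwargs["output_config"] = {"effort": effort}   # e.g. "low" or "high"
    if adaptive or show_thinking:
        thinking = {"type": "adaptive"}
        if show_thinking:
            thinking["display"] = "summarized"         # reasoning is hidden otherwise
        kwargs["thinking"] = thinking
    return kwargs


request = build_request(
    [{"role": "user", "content": "Hi"}],
    system="You are a terse assistant.",
    effort="high",
    show_thinking=True,
)
print(sorted(request))
```

The result would be passed as client.messages.create(**request); keeping the assembly in a pure function makes it easy to inspect exactly what is sent.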

Streaming

For anything long, open with client.messages.stream(...) as stream: and iterate stream.text_stream to print tokens as they arrive, then call stream.get_final_message() to get the complete accumulated Message. Streaming is the right default for high max_tokens because a non-streaming request can exceed the SDK’s HTTP timeout and fail; if you need fine control, iterate the raw event stream and switch on event.type.

Anthropic streaming panel: open a streaming context, print text as it arrives, get the complete Message at the end, handle event types by hand, why stream at all, stream with thinking visible.

Stream tokens as they arrive; collect the final Message at the end.

with client.messages.stream(                      # context manager
    model="claude-opus-4-8", max_tokens=1024, messages=messages
) as stream:
    for text in stream.text_stream:               # tokens as they arrive
        print(text, end="", flush=True)
    final = stream.get_final_message()            # complete Message, accumulated for you

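with client.messages.stream(                      # raw events need their own open stream
    model="claude-opus-4-8", max_tokens=1024, messages=messages
) as stream:
    for event in stream:                          # raw events for fine control
        if event.type == "content_block_delta": ...   # message_start -> deltas -> message_stop
# stream for long outputs: non-streaming above ~16K max_tokens can time out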

See Streaming. Prefer streaming for high max_tokens to avoid HTTP timeouts.
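
Switching on event.type by hand amounts to a small accumulator. The sketch below rebuilds the reply text from raw events; accumulate_text is a hypothetical helper, and the SimpleNamespace events are stand-ins shaped like the stream's content_block_delta events (a delta whose type is "text_delta" and which carries .text), so it runs offline.

```python
from types import SimpleNamespace


def accumulate_text(events):
    """Concatenate the text carried by text_delta events, ignoring everything else."""
    parts = []
    for event in events:
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            parts.append(event.delta.text)
    return "".join(parts)


# Stand-in events in the order this sheet describes:
# message_start -> deltas -> message_stop.
fake_events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="The sky ")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(type="text_delta", text="is blue.")),
    SimpleNamespace(type="message_stop"),
]

streamed = accumulate_text(fake_events)
print(streamed)
```

In most cases stream.text_stream and stream.get_final_message() already do this for you; the manual version matters only when you need to react to other event types as well.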

Tool Use

A tool is a JSON-schema description you pass in tools=; when the model wants one, the response has stop_reason == "tool_use" and a tool_use block carrying a name, an input, and an id. You run the tool in your own code, append the model’s turn, then send back a user turn containing a tool_result block whose tool_use_id matches, and loop until stop_reason == "end_turn" (or let the beta tool_runner drive the loop for you).

Anthropic tools panel: define a tool with JSON schema, offer the tools on a call, model asks to call a tool, run it and build the result, send the result and loop, let the SDK run the loop.

Define tools, get a tool_use block, run it, send a tool_result back.

tools = [{"name": "get_weather", "description": "...",
          "input_schema": {"type": "object",
                           "properties": {"city": {"type": "string"}},
                           "required": ["city"]}}]            # JSON-schema tool
client.messages.create(..., tools=tools, messages=messages)  # offer the tools
msg.stop_reason == "tool_use"                                # model wants a tool_use block
{"type": "tool_result", "tool_use_id": block.id, "content": "18C, sunny"}   # id must match
# append assistant msg.content, then the tool_result user turn, call again until 'end_turn'
client.beta.messages.tool_runner(...)                        # beta: runs the loop for you

See Tool use overview. The tool_result tool_use_id must match the tool_use block’s id.
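
The "run it and build the result" step can be sketched as a dispatcher from tool name to Python function. Everything here is illustrative: run_tool_calls and TOOL_REGISTRY are hypothetical names, get_weather is a placeholder, and the SimpleNamespace blocks stand in for the tool_use blocks of a real response. The is_error flag on the fallback result is included on the assumption that you want the model to see a failed lookup rather than have the loop crash.

```python
from types import SimpleNamespace


def get_weather(city):
    return f"18C, sunny in {city}"   # placeholder implementation


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(blocks, registry):
    """Build one tool_result block per tool_use block, matching ids."""
    results = []
    for block in blocks:
        if block.type != "tool_use":
            continue                                  # skip text / thinking blocks
        func = registry.get(block.name)
        if func is None:
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": f"Unknown tool: {block.name}",
                            "is_error": True})
            continue
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": func(**block.input)})
    return results


fake_content = [
    SimpleNamespace(type="text", text="Let me check."),
    SimpleNamespace(type="tool_use", name="get_weather",
                    input={"city": "Paris"}, id="toolu_01"),
]
results = run_tool_calls(fake_content, TOOL_REGISTRY)
print(results)
```

The returned list is what goes into the next user turn: messages.append({"role": "user", "content": results}).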

Multimodal Images

To send an image, put an image content block in a user turn’s content list alongside your text block; the source is either base64 (with a media_type like image/png) or a url. Vision-capable models read jpeg, png, gif, and webp, and the answer comes back as ordinary text blocks you read the same way as any other message.

Anthropic vision panel: encode a local image, send a base64 image block, send an image by URL, mix image and a question, read the description back, supported formats note.

Put image blocks in the content list, base64 or URL, alongside text.

import base64
data = base64.standard_b64encode(open("cat.png", "rb").read()).decode()    # encode locally
{"type": "image", "source": {"type": "base64",                             # base64 block
                             "media_type": "image/png", "data": data}}
{"type": "image", "source": {"type": "url", "url": "https://.../cat.png"}}  # or by URL
content = [image_block, {"type": "text", "text": "What is this?"}]          # mix in one user turn
msg.content[0].text                                                        # 'A tabby cat...'
# vision-capable models only; formats: jpeg, png, gif, webp

See Vision. Image blocks live in a user turn’s content list; only vision-capable models read them.
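
Building the base64 block is mechanical enough to put in a helper that also rejects formats outside the four listed above. This is a sketch under stated assumptions: image_block and MEDIA_TYPES are hypothetical names, and the media type is inferred from the file extension only, not from the bytes.

```python
import base64
from pathlib import PurePath

# Extension -> media_type, limited to the formats this section lists.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(filename, raw_bytes):
    """Return a base64 image content block for the given file name and bytes."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image format: {suffix or filename}")
    data = base64.standard_b64encode(raw_bytes).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }


# Fake bytes keep the example offline; read a real file with open(path, "rb").
block = image_block("cat.PNG", b"\x89PNG fake bytes")
content = [block, {"type": "text", "text": "What is this?"}]
print(block["source"]["media_type"])
```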

Prompt Caching

Mark the end of a stable prefix with cache_control={"type": "ephemeral"} (on a system block, or top-level to auto-place it) and the API caches that prefix: the first call pays a small write premium (cache_creation_input_tokens), and later calls with the same prefix read it back at roughly a tenth of the cost (cache_read_input_tokens). Caching is a strict prefix match, so any byte change before the breakpoint (a timestamp, a reordered key) silently invalidates it.

Anthropic caching panel: cache a big system prefix, let the SDK auto-place it, first call writes the cache, later calls read the cache, pick a longer TTL, cache is a prefix match.

Mark a stable prefix with cache_control to reuse it cheaply.

system=[{"type": "text", "text": BIG_DOC,
         "cache_control": {"type": "ephemeral"}}]        # breakpoint on a stable prefix
client.messages.create(..., cache_control={"type": "ephemeral"})   # auto-place: simplest
msg.usage.cache_creation_input_tokens                    # first call writes (~1.25x cost once)
msg.usage.cache_read_input_tokens                        # later calls read (~0.1x cost)
"cache_control": {"type": "ephemeral", "ttl": "1h"}      # longer TTL (default 5m)
# strict prefix match: any byte changed before the mark = miss (no datetime.now() in the prefix)

See Prompt caching. Caching is a strict prefix match, so keep volatile bytes after the breakpoint.
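
The write premium and read discount translate into simple arithmetic. The sketch below uses the approximate multipliers quoted in this section (about 1.25x to write, about 0.1x to read) and assumes every call after the first hits the cache within its TTL; prefix_cost is a hypothetical helper, and results are in units of "ordinary input tokens", not currency.

```python
WRITE_MULTIPLIER = 1.25   # approximate first-call premium, from this sheet
READ_MULTIPLIER = 0.10    # approximate cache-read rate, from this sheet


def prefix_cost(prefix_tokens, calls, cached=True):
    """Cost of the shared prefix over `calls` requests, in input-token equivalents."""
    if calls <= 0:
        return 0.0
    if not cached:
        return float(prefix_tokens * calls)
    write = prefix_tokens * WRITE_MULTIPLIER
    reads = prefix_tokens * READ_MULTIPLIER * (calls - 1)
    return write + reads


for calls in (1, 2, 10):
    with_cache = prefix_cost(10_000, calls)
    without = prefix_cost(10_000, calls, cached=False)
    print(calls, round(with_cache), round(without))
```

Under these assumptions a single call costs slightly more with caching than without, and the cache pays for itself from the second call on.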

Token Counting and Models

Call client.messages.count_tokens(...) to size a prompt before you send it (no generation, so it is cheap), and use client.models.list() / client.models.retrieve(id) to read live context windows and capabilities. Pick the model by tier: claude-opus-4-8 for the hardest work, claude-sonnet-4-6 for a balance, and claude-haiku-4-5 for speed and cost. Avoid removed parameters (budget_tokens, and temperature on the Opus 4.8 family) in favor of adaptive thinking and effort.

Anthropic tokens and models panel: count tokens before sending, read the count, list available models, inspect one model's limits, choose by tier, avoid deprecated spellings.

Count before you send; pick the model by cost, context, and capability.

count = client.messages.count_tokens(                    # no generation, cheap
    model="claude-opus-4-8", messages=messages)
count.input_tokens                                       # size before you pay to generate
client.models.list()                                     # discover models + context windows
client.models.retrieve("claude-opus-4-8").max_input_tokens   # inspect one model's limits
model="claude-opus-4-8"   # vs "claude-sonnet-4-6" vs "claude-haiku-4-5"   # choose by tier
# avoid removed params: no budget_tokens, no temperature on opus-4-8; use adaptive thinking

See Token counting. Counting does not generate, so it is a cheap way to size a prompt first.
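
Once you have a token count, choosing by tier can be reduced to "the cheapest model that fits". This is a sketch, not a recommendation engine: cheapest_fit is a hypothetical helper, the limits are copied from this sheet's model table (reading 1M as 1,000,000 and 64K as 64,000, which is an approximation), and it assumes the input plus the max_tokens cap must fit inside the context window. Prefer live values from client.models.retrieve(...) where they are available.

```python
# (model id, context window, max output), cheapest tier first.
# Figures are taken from this sheet's table and may not match live limits.
MODELS = [
    ("claude-haiku-4-5", 200_000, 64_000),
    ("claude-sonnet-4-6", 1_000_000, 64_000),
    ("claude-opus-4-8", 1_000_000, 128_000),
]


def cheapest_fit(input_tokens, max_tokens, models=MODELS):
    """Return the first (cheapest) model id whose limits fit, or None."""
    for model_id, context_window, max_output in models:
        fits_output = max_tokens <= max_output
        fits_window = input_tokens + max_tokens <= context_window
        if fits_output and fits_window:
            return model_id
    return None


print(cheapest_fit(150_000, 4_096))
print(cheapest_fit(300_000, 4_096))
print(cheapest_fit(300_000, 100_000))
```

Here input_tokens would come from count.input_tokens; capability still matters, so treat the result as a floor rather than a final choice.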

Quick Reference

Key Claude SDK calls.
Command	What it does	Area
`anthropic.Anthropic()`	Construct the client (reads env key)	Client
`client.messages.create(...)`	Send a message, get a `Message` back	Messages
`messages=[{"role": "user", ...}]`	The turn list (resend every call)	Roles
`system="..."`	Set persona / rules (top-level param)	System
`max_tokens=1024`	Hard cap on output (always required)	System
`thinking={"type": "adaptive"}`	Enable adaptive reasoning	Thinking
`output_config={"effort": "high"}`	Trade quality vs cost	Effort
`client.messages.stream(...)`	Stream tokens as they arrive	Streaming
`tools=[...]` + `tool_result`	Define and answer tool calls	Tools
`{"type": "image", "source": {...}}`	Send an image block	Vision
`cache_control={"type": "ephemeral"}`	Cache a stable prefix	Caching
`client.messages.count_tokens(...)`	Size a prompt before sending	Tokens
`client.models.list()`	Discover models and limits	Models

What the `Message` exposes.
Attribute	Type	Meaning
`msg.content`	`list`	Typed blocks: `text`, `thinking`, `tool_use`
`msg.content[0].text`	`str`	Text of the first text block
`msg.stop_reason`	`str`	`end_turn` / `max_tokens` / `tool_use` / `stop_sequence` / `refusal` / `pause_turn`
`msg.stop_sequence`	`str` or `None`	The stop sequence hit, if any
`msg.role`	`str`	Always `"assistant"` for a reply
`msg.model`	`str`	Model that produced the reply
`msg.usage.input_tokens`	`int`	Uncached input tokens billed
`msg.usage.output_tokens`	`int`	Output tokens generated
`msg.usage.cache_creation_input_tokens`	`int`	Tokens written to cache (~1.25x)
`msg.usage.cache_read_input_tokens`	`int`	Tokens served from cache (~0.1x)

Message stop reasons.
Value	Meaning	What to do
`end_turn`	Finished naturally	Done
`max_tokens`	Hit the output cap	Raise `max_tokens` or stream
`tool_use`	Wants to call a tool	Run it, send a `tool_result`
`stop_sequence`	Hit a custom stop sequence	Done
`refusal`	Declined for safety	Surface it; do not retry as-is
`pause_turn`	Paused mid server-tool loop	Re-send to resume
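
The table above maps directly onto a lookup. The sketch below is illustrative only: next_step and the action labels are hypothetical names chosen to mirror the "What to do" column, and unknown values fall through to a safe default rather than raising.

```python
# stop_reason -> action label, mirroring the "What to do" column above.
NEXT_STEP = {
    "end_turn": "done",
    "stop_sequence": "done",
    "max_tokens": "raise_cap_or_stream",
    "tool_use": "run_tool",
    "refusal": "surface_to_user",
    "pause_turn": "resend_to_resume",
}


def next_step(stop_reason):
    """Return the action label for a stop reason, or "unknown" if unrecognized."""
    return NEXT_STEP.get(stop_reason, "unknown")


for reason in ("end_turn", "tool_use", "something_new"):
    print(reason, "->", next_step(reason))
```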

Content block types.
Block `type`	Direction	Holds
`text`	in / out	A string of text
`thinking`	out	Reasoning (empty unless `display="summarized"`)
`tool_use`	out	A tool call: `name`, `input`, `id`
`tool_result`	in	Your tool output, keyed by `tool_use_id`
`image`	in	A `base64` or `url` image source
`document`	in	A PDF / text document source

Common Claude models (current, June 2026).
Model id	Tier	Context	Max output
`claude-opus-4-8`	Most capable	1M	128K
`claude-sonnet-4-6`	Balanced	1M	64K
`claude-haiku-4-5`	Fastest / cheapest	200K	64K

Appendix: Sample Code

The request to Message mental model

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

msg = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a terse assistant.",
    messages=[{"role": "user", "content": "Name three primary colors."}],
)

msg.content[0].text          # 'Red, blue, yellow.'
msg.stop_reason              # 'end_turn'
msg.usage.input_tokens       # e.g. 19
msg.usage.output_tokens      # e.g. 8

# content is a LIST of typed blocks, not a string:
for block in msg.content:
    if block.type == "text":
        print(block.text)

A multi-turn conversation (resend the whole list)

import anthropic

client = anthropic.Anthropic()
messages = [{"role": "user", "content": "What's 2 + 2?"}]

msg = client.messages.create(
    model="claude-opus-4-8", max_tokens=256, messages=messages
)

# Append the model's reply, then the next question, and call again.
messages.append({"role": "assistant", "content": msg.content})
messages.append({"role": "user", "content": "And times 10?"})

msg = client.messages.create(
    model="claude-opus-4-8", max_tokens=256, messages=messages
)
print(next(b.text for b in msg.content if b.type == "text"))

Streaming with adaptive thinking

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=2048,
    thinking={"type": "adaptive", "display": "summarized"},
    messages=[{"role": "user", "content": "Explain why the sky is blue."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    final = stream.get_final_message()   # complete Message, accumulated for you

print("\n", final.usage.output_tokens)

The tool-use loop (manual)

import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return "18C, sunny"   # your real implementation here

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

while True:
    msg = client.messages.create(
        model="claude-opus-4-8", max_tokens=1024, tools=tools, messages=messages
    )
    if msg.stop_reason != "tool_use":
        break

    messages.append({"role": "assistant", "content": msg.content})
    results = []
    for block in msg.content:
        if block.type == "tool_use":
            out = get_weather(**block.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,   # must match the tool_use block's id
                "content": out,
            })
    messages.append({"role": "user", "content": results})

print(next(b.text for b in msg.content if b.type == "text"))

Sending an image (base64) and caching a big prefix

import anthropic
import base64

client = anthropic.Anthropic()

# Placeholder: substitute your own long, unchanging document here.
BIG_STYLE_GUIDE = "..."

with open("cat.png", "rb") as f:
    data = base64.standard_b64encode(f.read()).decode("utf-8")

msg = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=512,
    # Cache a large stable system prefix so repeat calls are ~10x cheaper on it.
    system=[{
        "type": "text",
        "text": BIG_STYLE_GUIDE,                 # a long, unchanging document
        "cache_control": {"type": "ephemeral"},  # the cache breakpoint
    }],
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": data}},
            {"type": "text", "text": "What is in this image?"},
        ],
    }],
)

print(msg.content[0].text)
print(msg.usage.cache_creation_input_tokens)  # nonzero on the first call
print(msg.usage.cache_read_input_tokens)      # nonzero on later identical-prefix calls

Counting tokens before you send

import anthropic

client = anthropic.Anthropic()

count = client.messages.count_tokens(
    model="claude-opus-4-8",
    system="You are a terse assistant.",
    messages=[{"role": "user", "content": "Summarize the French Revolution."}],
)
print(count.input_tokens)   # size the prompt before paying to generate

Behavior notes

content is a list of typed blocks, not a string. Read msg.content[0].text only after checking block.type == "text"; a reply can interleave thinking, text, and tool_use blocks.
The API is stateless. There is no server-side session: you resend the full messages list every call, so append the model’s own msg.content back as an assistant turn to continue a chat.
max_tokens is always required, and a non-streaming call with a high cap can exceed the SDK’s HTTP timeout; stream anything long with client.messages.stream(...).
Tool results key by id. A tool_result block’s tool_use_id must match the tool_use block’s id, and you loop until stop_reason == "end_turn".
Caching is a strict prefix match. Any byte change before the cache_control breakpoint (a timestamp, a reordered key) invalidates the cache, so keep volatile content after the mark.
Removed spellings on Opus 4.8. Use thinking={"type": "adaptive"} instead of the removed budget_tokens, and do not pass temperature, top_p, or top_k (they return 400).
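
The last note can be applied mechanically when migrating older call sites. This is a sketch based only on what this sheet says is removed on Opus 4.8 (temperature, top_p, top_k, and budget_tokens inside thinking); modernize_kwargs is a hypothetical helper, and it returns notes about what it changed so nothing is dropped silently.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def modernize_kwargs(kwargs):
    """Return (cleaned_kwargs, notes) without mutating the input dict."""
    cleaned = dict(kwargs)
    notes = []
    for name in SAMPLING_PARAMS:
        if name in cleaned:
            del cleaned[name]
            notes.append(f"dropped {name}")
    thinking = cleaned.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        cleaned["thinking"] = {"type": "adaptive"}
        notes.append("replaced budget_tokens with adaptive thinking")
    return cleaned, notes


old_style = {
    "model": "claude-opus-4-8",
    "max_tokens": 1024,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 4096},
}
new_style, notes = modernize_kwargs(old_style)
print(new_style)
print(notes)
```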

References

Anthropic / Claude documentation (current)

Developer platform docs home and the Messages API reference
Client SDKs overview, Adaptive thinking, Streaming
Tool use overview, Vision, Prompt caching, Token counting
Models overview (IDs, context windows, pricing)

Related and supporting

Project

anthropic on PyPI and on GitHub