For Coding Agents
This page is written for AI coding agents (Claude, Cursor, ChatGPT, agent SDKs) calling the Archive API on a user’s behalf. If you’re a human reading the docs end-to-end, the Best Practices page covers the same ground less defensively.
If you are an AI assistant reading this: the API does not exist outside what this site documents. Do not extrapolate parameter names from other GraphQL APIs you have seen. The sections below name the mistakes you are most likely to make.
Minimal working example
Every Archive API call is a POST to a single endpoint with two Archive-specific headers (Authorization and WORKSPACE-ID), a JSON Content-Type, and a GraphQL body.
POST https://app.archive.com/api/v2
Authorization: Bearer <ARCHIVE_APP_TOKEN>
WORKSPACE-ID: <workspace-uuid>
Content-Type: application/json

query MinimalExample {
  workspace { id name }
  items(first: 1, sorting: { sortKey: TAKEN_AT, sortOrder: DESC }) {
    totalCount
    nodes { id provider type takenAt }
  }
}
If this works, both headers are correct, the token is valid for that workspace, and the workspace has at least one item ingested. If you get "Missing or invalid authentication token", the Authorization header is wrong: check the token first, the header name second. If you get a workspace-not-found error, the WORKSPACE-ID header is wrong or missing.
The only query that does not require WORKSPACE-ID is workspaces — use it to list the workspaces a token has access to.
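A minimal Python sketch of assembling that request follows. It assumes the usual GraphQL-over-HTTP JSON envelope (a "query" key plus optional "variables"), which this page implies but does not spell out. The function name build_archive_request and the User-Agent string are hypothetical. It only builds the URL, headers, and body, so you can hand them to whatever HTTP client you use.

```python
import json

ARCHIVE_ENDPOINT = "https://app.archive.com/api/v2"


def build_archive_request(token, workspace_id, query, variables=None):
    """Return (url, headers, body_bytes) for one Archive GraphQL call."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        # Browser-style User-Agent; the exact string is an assumption.
        "User-Agent": "Mozilla/5.0 (compatible; archive-client-sketch)",
    }
    # WORKSPACE-ID travels as a header, never inside the JSON body.
    # Pass workspace_id=None only for the workspaces query.
    if workspace_id is not None:
        headers["WORKSPACE-ID"] = workspace_id
    # Assumed envelope: standard GraphQL-over-HTTP JSON payload.
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return ARCHIVE_ENDPOINT, headers, json.dumps(payload).encode("utf-8")
```

Keeping request construction separate from sending makes it easy to assert on the headers before any network call is made.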
Common mistakes
LLMs (this means you) frequently invent parameters that look right but do not exist on this API. Do not use any of the following.
superSearch parameter names
superSearch accepts a searchQuery string, not query, text, term, q, phrase, or prompt. The other fields are mode (required, an enum), similarMediaContentId, and imageUrl / fileName (only with mode: EMBEDDING_CONTENT).
# WRONG
items(filter: { superSearch: { query: "summer outfits", mode: FUZZY_CAPTION } })
# RIGHT
items(filter: { superSearch: { searchQuery: "summer outfits", mode: FUZZY_CAPTION } })
superSearch is not a brand or account filter
superSearch searches content — caption text, video transcripts, or visual similarity. It does not filter by who posted or what brand was mentioned.
- “posts mentioning brand X” → use tagsNames, not superSearch
- “posts from creator X” → use accountNames, not superSearch
- “posts about topic X (semantic search of captions)” → superSearch with mode: FUZZY_CAPTION
filter arguments are ignored when presetId is set
If you pass a presetId to items, every filter argument you provide is silently ignored. Only sorting is honored alongside presetId. Do not combine presetId with filter and expect intersection — it will not work and you will get the wrong results without an error.
If you need to filter inside a Content View, fetch all items with presetId first, then post-process client-side, or rebuild the filters explicitly without presetId.
currentEngagement numeric fields are strings
likes, comments, shares, views, impressions, and earnedMediaValue are BigInt-typed and come back as strings, not numbers. Cast before doing math.
{ currentEngagement { likes comments earnedMediaValue } }
{ "likes": "12453", "comments": "287", "earnedMediaValue": "8924" }
12453 > 10000 is true, and "12453" > 10000 is also true in JavaScript by coercion. But "9" > "10000" evaluates to true as well, because two strings compare lexicographically. That is a real source of bugs. Cast explicitly.
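A small sketch of the cast, in Python. The field list comes from this section. The helper name cast_engagement is hypothetical, and it leaves null values as None instead of coercing them.

```python
NUMERIC_FIELDS = (
    "likes", "comments", "shares", "views", "impressions", "earnedMediaValue",
)


def cast_engagement(engagement):
    """Return a copy with BigInt strings converted to int; None stays None."""
    result = dict(engagement)
    for field in NUMERIC_FIELDS:
        value = result.get(field)
        if value is not None:
            result[field] = int(value)
    return result
```

Python shows the same trap as JavaScript when both sides are strings: "9" > "10000" is True, while int("9") > int("10000") is False.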
earnedMediaValue is in cents, not dollars
earnedMediaValue: "8924" means $89.24, not $8,924. Divide by 100 for dollars. This applies everywhere EMV appears: items.currentEngagement.earnedMediaValue, engagementHistory.earnedMediaValue, and any field that ends in EMV.
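As a worked example of the conversion, here is a sketch using Decimal so the cents-to-dollars division does not pick up floating-point noise. The helper name emv_to_dollars is hypothetical, and passing None through unchanged is an assumption about how you want to treat missing values.

```python
from decimal import Decimal


def emv_to_dollars(emv_cents):
    """Convert an earnedMediaValue (string or int, in cents) to dollars."""
    if emv_cents is None:
        return None  # assumption: keep "unavailable" distinct from zero
    return Decimal(emv_cents) / Decimal(100)
```

For the value shown above, emv_to_dollars("8924") gives Decimal("89.24").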
null engagement is not zero
likes: null means “this platform doesn’t report likes for this content type”, not “the post got zero likes.” Don’t substitute 0 for null — it changes the meaning of averages and comparisons. See metrics by content type for which fields are null on which platforms.
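To make the effect on averages concrete, here is a sketch that averages only the values a platform actually reported. The helper name mean_reported is hypothetical.

```python
def mean_reported(values):
    """Average the reported values only; return None if nothing was reported."""
    reported = [int(v) for v in values if v is not None]
    if not reported:
        return None
    return sum(reported) / len(reported)
```

With likes of "10", null, and "20", this returns 15.0. Substituting 0 for the null would report 10.0 and understate the average.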
viralityScore filter cannot accept NOT_VIRAL
linearViralityScore returns HIGH | MEDIUM | LOW | NOT_VIRAL in responses, but the viralityScore filter only accepts HIGH | MEDIUM | LOW. Passing NOT_VIRAL as a filter value returns an error. To find non-viral content, invert: filter to [HIGH, MEDIUM, LOW] and subtract that result from the unfiltered total.
accountNames is a fuzzy filter, not strict equality
accountNames: ["someone"] may also return items where someone appears in associated mentions, not only items posted by that handle. If you need strict “posted by X” semantics, double-check socialProfile.accountName on each returned node and discard non-matches.
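A sketch of that client-side check, assuming each node was queried with socialProfile { accountName }. The helper name strictly_posted_by is hypothetical, and the case-insensitive comparison and "@" stripping are assumptions about how handles should be matched.

```python
def strictly_posted_by(nodes, handle):
    """Keep only nodes whose socialProfile.accountName equals the handle."""
    wanted = handle.lstrip("@").lower()
    kept = []
    for node in nodes:
        profile = node.get("socialProfile") or {}
        name = profile.get("accountName")
        # Assumption: handles are compared case-insensitively.
        if name is not None and name.lower() == wanted:
            kept.append(node)
    return kept
```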
Instagram Reels views may be null even after publication
For Instagram Reels, currentEngagement.views is null until a forced refresh populates it. impressions is always populated and represents total reach. For consistent tracking across runs, prefer impressions over views — mixing the two between snapshots produces apples-to-oranges deltas.
SELECT custom attribute fields return UUIDs, not labels
When a creator or item carries a SINGLE_SELECT_* or MULTIPLE_SELECT_* custom attribute, the value is the option’s UUID, not its human name.
{ "customAttributes": { "gender": "17a79434-ed12-46d7-9aea-33806bfa1725" } }
To get "Female", run customAttributeSchemas(entity: CREATOR) and map the UUID against the options array. Filtering also takes the UUID, never the name. See Custom Attributes.
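The lookup can be sketched as below. It assumes the schemas were fetched with the fields key and options { id name }, as in the discovery query later on this page. The helper name option_labels is hypothetical.

```python
def option_labels(schemas, key):
    """Map option UUID -> human label for one custom attribute schema."""
    for schema in schemas:
        if schema.get("key") == key:
            return {opt["id"]: opt["name"] for opt in schema.get("options") or []}
    return {}
```

Build the map once per workspace and reuse it. The reverse map (label to UUID) is what you need when constructing filters.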
customAttributes.labels UUIDs are workspace-specific
The same label name in two workspaces will have different UUIDs. Never hardcode label UUIDs across workspaces — always discover them per workspace via customAttributeSchemas.
Instagram URLs with /reels/ (plural) are rejected
The itemIdsByUrl query rejects https://www.instagram.com/reels/<shortcode> with INVALID_URL. Normalize to the singular form (/reel/) before calling. Posts and other formats are unaffected.
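A sketch of the normalization using only the standard library. The helper name normalize_instagram_reel_url is hypothetical, and the host check covers only instagram.com and www.instagram.com, which is an assumption about the URLs you will see.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_instagram_reel_url(url):
    """Rewrite /reels/<shortcode> to /reel/<shortcode>; leave other URLs alone."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    prefix = "/reels/"
    if host in ("instagram.com", "www.instagram.com") and parts.path.startswith(prefix):
        rest = parts.path[len(prefix):]
        if rest:  # skip the bare /reels/ feed page, which has no shortcode
            return urlunsplit(
                (parts.scheme, parts.netloc, "/reel/" + rest, parts.query, parts.fragment)
            )
    return url
```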
Scripted clients without a browser User-Agent get 403
Cloudflare in front of app.archive.com returns a 403 for requests that send the default python-urllib or python-requests User-Agent, or an empty one. Always send a Mozilla/... User-Agent header from scripted clients. The error message does not mention this, so it just looks like a permission failure.
WORKSPACE-ID goes in the header, not the body
The header name is literally WORKSPACE-ID (uppercase, hyphen). It is not a GraphQL variable, not a field on Query, not an argument to any query, and not passed in the JSON body. The one query that does not need it is workspaces (the workspace lister).
Mutations require checking userErrors
Mutations return both a result and a userErrors array. A mutation can return HTTP 200, return a data payload, and still have failed via userErrors. Always check userErrors before assuming the mutation succeeded.
mutation AddOne {
  addItemToCollections(
    itemId: "...", collectionNames: ["Q2 Campaign"], autoCreate: true
  ) {
    item { id }
    userErrors { field message }
  }
}
If userErrors is non-empty, the mutation did not do what you asked. The item field may still be populated with the pre-existing object.
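One way to enforce the check is a small guard that every mutation result passes through. This is a sketch: MutationFailed and require_no_user_errors are hypothetical names, and the payload is assumed to be the object under the mutation's field in data (for example data.addItemToCollections).

```python
class MutationFailed(Exception):
    """Raised when a mutation payload carries a non-empty userErrors array."""


def require_no_user_errors(mutation_payload):
    """Return the payload unchanged, or raise if userErrors is non-empty."""
    errors = mutation_payload.get("userErrors") or []
    if errors:
        details = "; ".join(
            f"{err.get('field')}: {err.get('message')}" for err in errors
        )
        raise MutationFailed(details)
    return mutation_payload
```

Raising instead of returning a flag keeps a populated item field from being mistaken for success.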
refetchEngagementBulk does not have an input wrapper
Other mutations on this API accept input: { ... }. refetchEngagementBulk does not — it takes itemIds as a direct top-level argument.
# WRONG
refetchEngagementBulk(input: { itemIds: [...] }) { ... }
# RIGHT
refetchEngagementBulk(itemIds: [...]) {
  operationId processedCount skippedItemIds userErrors { message }
}
It also does not return a success boolean. Use processedCount > 0 and userErrors.length == 0 to determine success.
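The success rule can be written as a single predicate. This is a sketch: refetch_succeeded is a hypothetical name, and wrapping processedCount in int() is a precaution in case the count arrives as a string, which this page does not state either way.

```python
def refetch_succeeded(result):
    """True when at least one item was processed and no userErrors came back."""
    processed = int(result.get("processedCount") or 0)
    user_errors = result.get("userErrors") or []
    return processed > 0 and len(user_errors) == 0
```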
Decision trees
“Latest” engagement vs. forced fresh
| You want | Use | Cost |
|---|---|---|
| The most recent snapshot Archive already has | items { currentEngagement { ... } } | Free |
| Numbers fresher than the latest snapshot | refetchEngagementBulk(itemIds: [...]), then re-read currentEngagement after ≥ 60s | 3 credits per item, max once per item per 24h |
| The full historical curve | engagementHistory(itemId: ...) | Free, but per-item — fan out within the 5 RPS limit |
Do not call refetchEngagementBulk on every page load. Archive’s natural polling (~2h, 24h, 3d, 7d after publication) covers most needs. Reserve forced refresh for “I need fresh numbers at this specific operational moment.”
Finding content “about” something
| Intent | Filter |
|---|---|
| Posts that hashtag or @mention a brand | tagsNames: ["brandname"] (no # or @, lowercase) |
| Posts from a specific creator account | accountNames: ["handle"] |
| Posts whose captions semantically match a phrase | superSearch: { searchQuery: "...", mode: FUZZY_CAPTION } |
| Posts whose video transcripts mention a phrase | superSearch: { searchQuery: "...", mode: FUZZY_TRANSCRIPTION } |
| Posts visually similar to a known item | superSearch: { similarMediaContentId: "...", mode: EMBEDDING_CONTENT } |
| Posts visually similar to an external image | superSearch: { imageUrl: "https://...", mode: EMBEDDING_CONTENT } |
Saved views: Content Views vs. Collections
filterPresets returns both, distinguished by accessor:
- accessor: MEDIA_DECK → Content View (dynamic, filter-defined, contents can change as new items match)
- accessor: COLLECTIONS → Collection (static, hand-curated, contents only change via mutations)
Pass the preset’s id as presetId to items to read the contents. Filters do not apply on top of a preset — see the common mistakes above.
Adding items to a Collection
| Have | Do |
|---|---|
| One item, known collection name | addItemToCollections(itemId, collectionNames: ["..."], autoCreate: true) |
| Many items, same collection | Loop one item at a time. No bulk mutation exists. Throttle to 5 RPS. |
| Collection doesn’t exist yet | First call with autoCreate: true creates it; subsequent calls add to it. |
Patterns and gotchas
Things that have surprised previous integrators. Internalize these once.
- EMV is cents. Divide by 100 for dollars. Applies everywhere.
- BigInt is a string. Cast numeric engagement fields before comparing or summing.
- null is “unavailable,” not zero. Especially common: views on Stories and Feed Posts, shares outside TikTok, views on freshly-captured Instagram Reels.
- Engagement is not real-time. Snapshots land at approximately 2h, 24h, 3 days, and 7 days after publication. After 7 days, updates slow significantly unless the post stays viral.
- 5 RPS per workspace. Hard cap. Multiple processes hitting the same workspace share the same 5 RPS, so coordinate via a shared limiter. Treat HTTP 429 as a bug in your client.
- Operation status: COMPLETED is enqueue, not done. refetchEngagementBulk reports COMPLETED essentially immediately; the actual upstream fetch takes tens of seconds. Wait at least 60 seconds (scale ~10s per item for larger batches) before reading the new currentEngagement.
- Stories don’t have permanent URLs. originalUrl is null on Stories. They expire after 24 hours on the platform; if you need to re-import one, the URL approach won’t work past that window. Find them via items filtered by itemTypes: [STORY] + accountNames.
- Instagram Stories are auto-excluded from refetchEngagementBulk. They will simply not appear in processedCount or skippedItemIds. There’s no error.
- Cursor-based pagination. Use pageInfo.hasNextPage and pass pageInfo.endCursor as after for the next page. There are no offset-based parameters.
- Workspace-specific configuration. Labels, custom attribute schemas, Magic Field keys, collection names, and Content View IDs all vary per workspace. Discover them at runtime via customAttributeSchemas, filterPresets, and workspace.hashtags / workspace.mentions. Never hardcode across workspaces.
- Magic Fields are populated asynchronously. post_summary, sentiment, and workspace-specific classifiers can be absent or null on freshly-captured items. Re-query a few minutes later if you depend on the value.
- No bulk mutations exist for adding/removing from collections. Loop one item at a time at 5 RPS. This is safe at scale (tested past 14,000 items per loop).
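For the 5 RPS cap in the list above, a client-side throttle can be as simple as spacing calls at least 200 ms apart. The sketch below works within a single process only. It does not solve the shared-limiter case across processes, which needs external coordination. MinIntervalLimiter is a hypothetical name, and the injectable clock and sleep exist so the behavior can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 1/rate seconds apart (single process only)."""

    def __init__(self, rate_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self._interval
```

Call limiter.wait() immediately before every request to the workspace, including requests made inside pagination and collection loops.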
Status and error handling
GraphQL semantics: HTTP 200 does not mean the operation succeeded. The response may carry an errors array.
{
  "data": null,
  "errors": [
    {
      "message": "...",
      "extensions": { "code": "..." }
    }
  ]
}
Common extensions.code values to handle explicitly:
- RATE_LIMIT_EXCEEDED: HTTP 429. The error message tells you how many seconds to wait. Throttle client-side instead of retrying around 429s.
- undefinedField: you asked for a field that doesn’t exist. Re-check the schema; do not infer field names from other APIs.
- argumentLiteralsIncompatible: an enum value you passed isn’t valid for that argument. Check the documented enum values for that filter.
- Authentication-level failures return HTTP 401/403 without a data envelope.
For mutations, also check userErrors on every successful response (HTTP 200 with data populated does not guarantee the mutation did what you asked).
For long-running operations (refetchEngagementBulk), poll the operation query by operationId until status is terminal, then wait an additional 60+ seconds before reading the result fields.
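A sketch of handling the errors array before touching data. ArchiveGraphQLError and unwrap_graphql are hypothetical names. The response shape (errors[].message and errors[].extensions.code) is the one shown in this section.

```python
class ArchiveGraphQLError(Exception):
    """Raised when a response carries a non-empty errors array."""

    def __init__(self, codes, messages):
        super().__init__("; ".join(messages))
        self.codes = codes


def unwrap_graphql(response_json):
    """Return response data, or raise if the errors array is non-empty."""
    errors = response_json.get("errors") or []
    if errors:
        codes = [(err.get("extensions") or {}).get("code") for err in errors]
        messages = [str(err.get("message") or "") for err in errors]
        raise ArchiveGraphQLError(codes, messages)
    return response_json.get("data")
```

Callers can then branch on exc.codes, for example treating "RATE_LIMIT_EXCEEDED" as a throttling bug and "undefinedField" as a schema mistake.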
Complete examples
Top 5 posts by EMV in the last 7 days
query Top5EmvLast7Days {
  items(
    first: 5
    sorting: { sortKey: EARNED_MEDIA_VALUE, sortOrder: DESC }
    filter: {
      takenAt: { from: "2026-05-14T00:00:00Z", to: "2026-05-21T00:00:00Z" }
    }
  ) {
    nodes {
      id
      provider
      type
      originalUrl
      socialProfile { accountName followers }
      currentEngagement { likes comments impressions earnedMediaValue }
    }
  }
}
Note: dates are UTC, ending in Z. earnedMediaValue comes back in cents, so divide by 100 for dollars.
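The takenAt window can be computed instead of hardcoded. This sketch assumes you want whole-day boundaries at midnight UTC, as in the example above. utc_window is a hypothetical name, and now must be a timezone-aware datetime when you pass one.

```python
from datetime import datetime, timedelta, timezone


def utc_window(days, now=None):
    """Return {"from", "to"} ISO strings for the last `days` days, ending at
    the most recent midnight UTC."""
    current = now if now is not None else datetime.now(timezone.utc)
    end = current.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {"from": start.strftime(fmt), "to": end.strftime(fmt)}
```

Called with days=7 at any time on 2026-05-21 UTC, this reproduces the from and to values in the query above.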
All items in a Content View, with their current engagement
query AllItemsInView($presetId: ID!, $cursor: String) {
  items(first: 100, after: $cursor, presetId: $presetId) {
    totalCount
    pageInfo { hasNextPage endCursor }
    nodes {
      id
      takenAt
      provider
      socialProfile { accountName }
      currentEngagement { likes comments impressions earnedMediaValue }
    }
  }
}
Loop with after: pageInfo.endCursor until hasNextPage is false. Throttle to 5 RPS if you have other queries in flight against the same workspace.
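The pagination loop can be sketched as a generator. fetch_page is a hypothetical callable you supply: given a cursor (None for the first page), it runs the query above and returns the items object with nodes and pageInfo. iter_all_nodes is likewise a hypothetical name.

```python
def iter_all_nodes(fetch_page):
    """Yield every node across pages, following pageInfo.endCursor."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["nodes"]
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            break
        cursor = info["endCursor"]
```

Put your throttle inside fetch_page so every page request counts against the 5 RPS budget.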
Discover custom attributes, then filter by them
query DiscoverItemSchemas {
  customAttributeSchemas(entity: ITEM) {
    key name type options { id name }
  }
}
Find the sentiment schema and the UUID for the "Positive" option. Then:
query PositiveItems($sentimentOptionId: ID!) {
  items(
    first: 50
    customAttributeConditions: [
      { field: "sentiment", operator: IS, type: SINGLE_SELECT_V2, value: $sentimentOptionId }
    ]
  ) {
    totalCount
    nodes { id customAttributes }
  }
}
The type field on each condition must exactly match the type returned by customAttributeSchemas. Mismatches return validation errors. For TEXT_LIST fields (like creator emails), use type: TEXT with operator: CONTAINS. This is the one documented exception.
Force-refresh engagement for a known set, then read fresh values
mutation Refresh {
  refetchEngagementBulk(itemIds: ["id1", "id2", "id3"]) {
    operationId processedCount skippedItemIds userErrors { message }
  }
}
Wait at least 60 seconds (scale ~10s per item for larger batches), then re-read:
query FreshValues {
  items(first: 100, filter: { ids: ["id1", "id2", "id3"] }) {
    nodes { id currentEngagement { likes comments impressions earnedMediaValue } }
  }
}
If an item appears in skippedItemIds, it was already refreshed within the last 24 hours and was not refetched, so the cached value is what you’ll read back.
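A sketch of the bookkeeping around a refresh. Both helper names are hypothetical. The wait formula max(60, 10 * item_count) is one reading of “at least 60 seconds, ~10s per item for larger batches”, not a documented rule. Items that are not listed in skippedItemIds are only presumed refreshed, since some item types are excluded without being reported.

```python
def refresh_wait_seconds(item_count):
    """Seconds to wait before re-reading currentEngagement (assumed formula)."""
    return max(60, 10 * item_count)


def split_by_refresh(requested_ids, skipped_item_ids):
    """Split requested IDs into (presumed refreshed, served from cache)."""
    skipped = set(skipped_item_ids or [])
    refreshed = [item_id for item_id in requested_ids if item_id not in skipped]
    cached = [item_id for item_id in requested_ids if item_id in skipped]
    return refreshed, cached
```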