For Coding Agents

This page is written for AI coding agents (Claude, Cursor, ChatGPT, agent SDKs) calling the Archive API on a user’s behalf. If you’re a human reading the docs end-to-end, the Best Practices page covers the same ground less defensively.

If you are an AI assistant reading this: the API consists only of what this site documents. Do not extrapolate parameter names from other GraphQL APIs you have seen. The sections below name the mistakes you are most likely to make.

Minimal working example

Everything in the Archive API is a POST to one endpoint, with two headers, and a GraphQL body.


POST https://app.archive.com/api/v2
Authorization: Bearer <ARCHIVE_APP_TOKEN>
WORKSPACE-ID: <workspace-uuid>
Content-Type: application/json


query MinimalExample {
  workspace { id name }
  items(first: 1, sorting: { sortKey: TAKEN_AT, sortOrder: DESC }) {
    totalCount
    nodes { id provider type takenAt }
  }
}

If this works, both headers are correct and the token is valid for that workspace. A non-empty nodes array also confirms that the workspace has at least one item ingested. If you get "Missing or invalid authentication token", the Authorization header is wrong: check the token first, the header name second. If you get a workspace-not-found error, the WORKSPACE-ID header is wrong or missing.

The only query that does not require WORKSPACE-ID is workspaces — use it to list the workspaces a token has access to.
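A minimal Python sketch of assembling that call without sending it. The `build_call` helper and its parameter names are illustrative, not part of the API, and the `{"query": ..., "variables": ...}` body shape is the usual GraphQL-over-HTTP convention, assumed here rather than confirmed by this page.

```python
import json

ENDPOINT = "https://app.archive.com/api/v2"


def build_call(token, query, workspace_id=None, variables=None):
    """Return (url, headers, body) for one Archive API call. Nothing is sent."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    # WORKSPACE-ID travels as a header; only the `workspaces` query may omit it.
    if workspace_id is not None:
        headers["WORKSPACE-ID"] = workspace_id
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return ENDPOINT, headers, json.dumps(payload).encode("utf-8")


url, headers, body = build_call(
    token="<ARCHIVE_APP_TOKEN>",
    workspace_id="<workspace-uuid>",
    query="query MinimalExample { workspace { id name } }",
)
```

Hand `url`, `headers`, and `body` to whichever HTTP client you already use as a POST.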

Common mistakes

LLMs (this means you) frequently invent parameters that look right but do not exist on this API. Do not use any of the following.

`superSearch` parameter names

superSearch accepts a searchQuery string, not query, text, term, q, phrase, or prompt. The other fields are mode (required, an enum), similarMediaContentId, and imageUrl / fileName (only with mode: EMBEDDING_CONTENT).


# WRONG
items(filter: { superSearch: { query: "summer outfits", mode: FUZZY_CAPTION } })
 
# RIGHT
items(filter: { superSearch: { searchQuery: "summer outfits", mode: FUZZY_CAPTION } })

`superSearch` is not a brand or account filter

superSearch searches content — caption text, video transcripts, or visual similarity. It does not filter by who posted or what brand was mentioned.

“posts mentioning brand X” → use tagsNames, not superSearch
“posts from creator X” → use accountNames, not superSearch
“posts about topic X (semantic search of captions)” → superSearch with mode: FUZZY_CAPTION

`filter` arguments are ignored when `presetId` is set

If you pass a presetId to items, every filter argument you provide is silently ignored. Only sorting is honored alongside presetId. Do not combine presetId with filter and expect intersection — it will not work and you will get the wrong results without an error.

If you need to filter inside a Content View, fetch all items with presetId first, then post-process client-side, or rebuild the filters explicitly without presetId.

`currentEngagement` numeric fields are strings

likes, comments, shares, views, impressions, and earnedMediaValue are BigInt-typed and come back as strings, not numbers. Cast before doing math.


{ currentEngagement { likes comments earnedMediaValue } }


{ "likes": "12453", "comments": "287", "earnedMediaValue": "8924" }

12453 > 10000 is true; "12453" > 10000 is also true in JavaScript by coercion, but "9" > "10000" evaluates to true lexicographically — a real source of bugs. Cast explicitly.
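The same pitfall exists in Python, where comparing two strings is also lexicographic. A small sketch of the explicit cast, with `to_int` as a hypothetical helper name; it passes `None` through so that unavailable metrics stay unavailable.

```python
def to_int(value):
    """Cast a BigInt string such as "12453" to int; pass None through unchanged."""
    return None if value is None else int(value)


engagement = {"likes": "12453", "comments": "287", "earnedMediaValue": "8924"}
parsed = {field: to_int(raw) for field, raw in engagement.items()}

# String comparison is lexicographic; numeric comparison is not.
lexicographic = "9" > "10000"
numeric = to_int("9") > to_int("10000")
```

Here `lexicographic` is `True` and `numeric` is `False`, which is why the cast has to happen before any comparison or sum.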

`earnedMediaValue` is in cents, not dollars

earnedMediaValue: "8924" means $89.24, not $8,924. Divide by 100 for dollars. This applies everywhere EMV appears: items.currentEngagement.earnedMediaValue, engagementHistory.earnedMediaValue, and any field that ends in EMV.
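One way to do the conversion without floating-point rounding is Python's `decimal` module. This is a sketch; `emv_to_dollars` is an illustrative name, not an API function.

```python
from decimal import Decimal


def emv_to_dollars(emv_cents):
    """Convert an EMV string in cents (e.g. "8924") to a Decimal dollar amount."""
    if emv_cents is None:
        return None  # unavailable stays unavailable
    return Decimal(emv_cents) / 100


dollars = emv_to_dollars("8924")
label = f"${dollars:.2f}"
```

With the sample value above, `label` is `"$89.24"`.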

`null` engagement is not zero

likes: null means “this platform doesn’t report likes for this content type”, not “the post got zero likes.” Don’t substitute 0 for null — it changes the meaning of averages and comparisons. See metrics by content type for which fields are null on which platforms.
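To see why the substitution matters, here is a sketch of an average that skips unreported values instead of counting them as zero. The helper name and sample data are made up for illustration.

```python
def mean_reported(values):
    """Average only the values the platform actually reported (non-None)."""
    reported = [int(v) for v in values if v is not None]
    return sum(reported) / len(reported) if reported else None


likes = ["100", None, "300"]

correct = mean_reported(likes)                             # averages 100 and 300
distorted = sum(int(v) if v else 0 for v in likes) / len(likes)  # treats None as 0
```

`correct` is 200.0, while `distorted` drops to roughly 133.3 because the unreported post is counted as a post with zero likes.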

`viralityScore` filter cannot accept `NOT_VIRAL`

linearViralityScore returns HIGH | MEDIUM | LOW | NOT_VIRAL in responses, but the viralityScore filter only accepts HIGH | MEDIUM | LOW. Passing NOT_VIRAL as a filter value errors. To count non-viral content, invert: filter to [HIGH, MEDIUM, LOW] and subtract that totalCount from the unfiltered totalCount.

`accountNames` is a fuzzy filter, not strict equality

accountNames: ["someone"] may also return items where someone appears in associated mentions, not only items posted by that handle. If you need strict “posted by X” semantics, double-check socialProfile.accountName on each returned node and discard non-matches.
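The client-side check could look like the sketch below. `posted_by` is a hypothetical helper, and the case-insensitive comparison is an assumption about how handles should be matched.

```python
def posted_by(nodes, handle):
    """Keep only nodes whose socialProfile.accountName equals `handle`."""
    wanted = handle.casefold()
    kept = []
    for node in nodes:
        profile = node.get("socialProfile") or {}
        name = profile.get("accountName") or ""
        # Assumption: handles compare case-insensitively.
        if name.casefold() == wanted:
            kept.append(node)
    return kept


nodes = [
    {"id": "1", "socialProfile": {"accountName": "someone"}},
    {"id": "2", "socialProfile": {"accountName": "other_account"}},  # matched via a mention
    {"id": "3", "socialProfile": None},
]
strict = posted_by(nodes, "someone")
```

Only the node with id "1" survives; the other two are discarded as non-matches.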

Instagram Reels `views` may be `null` even after publication

For Instagram Reels, currentEngagement.views is null until a forced refresh populates it. impressions is always populated and represents total reach. For consistent tracking across runs, prefer impressions over views — mixing the two between snapshots produces apples-to-oranges deltas.

`SELECT` custom attribute fields return UUIDs, not labels

When a creator or item carries a SINGLE_SELECT_* or MULTIPLE_SELECT_* custom attribute, the value is the option’s UUID, not its human name.


{ "customAttributes": { "gender": "17a79434-ed12-46d7-9aea-33806bfa1725" } }

To get "Female", run customAttributeSchemas(entity: CREATOR) and map the UUID against the options array. Filtering also takes the UUID, never the name. See Custom Attributes.
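A sketch of that mapping step, assuming the schema objects carry `key` and `options { id name }` as in the schema query shown later on this page, and assuming multi-select values arrive as a list of UUIDs. The helper names and the sample schema are illustrative.

```python
def option_labels(schemas):
    """Build {schema key: {option UUID: option name}} from a schema list."""
    return {
        schema["key"]: {opt["id"]: opt["name"] for opt in (schema.get("options") or [])}
        for schema in schemas
    }


def resolve_labels(custom_attributes, schemas):
    """Replace option UUIDs with names; leave any other value untouched."""
    labels = option_labels(schemas)
    resolved = {}
    for key, value in custom_attributes.items():
        lookup = labels.get(key, {})
        if isinstance(value, list):  # assumed shape for multi-select values
            resolved[key] = [lookup.get(v, v) if isinstance(v, str) else v for v in value]
        elif isinstance(value, str):
            resolved[key] = lookup.get(value, value)
        else:
            resolved[key] = value
    return resolved


schemas = [{
    "key": "gender",
    "options": [{"id": "17a79434-ed12-46d7-9aea-33806bfa1725", "name": "Female"}],
}]
attrs = {"gender": "17a79434-ed12-46d7-9aea-33806bfa1725"}
readable = resolve_labels(attrs, schemas)
```

`readable` is `{"gender": "Female"}`. Unknown UUIDs are passed through unchanged rather than dropped, so a stale schema cache is visible instead of silent.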

`customAttributes.labels` UUIDs are workspace-specific

The same label name in two workspaces will have different UUIDs. Never hardcode label UUIDs across workspaces — always discover them per workspace via customAttributeSchemas.

Instagram URLs with `/reels/` (plural) are rejected

The itemIdsByUrl query rejects https://www.instagram.com/reels/<shortcode> with INVALID_URL. Normalize to the singular form (/reel/) before calling. Posts and other formats are unaffected.
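The normalization can be done with the standard library before the URL ever reaches the query. A sketch, with `normalize_instagram_url` as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_instagram_url(url):
    """Rewrite /reels/<shortcode> to /reel/<shortcode>; leave other URLs alone."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    is_instagram = host == "instagram.com" or host.endswith(".instagram.com")
    if is_instagram and parts.path.startswith("/reels/"):
        parts = parts._replace(path="/reel/" + parts.path[len("/reels/"):])
    return urlunsplit(parts)


fixed = normalize_instagram_url("https://www.instagram.com/reels/ABC123")
untouched = normalize_instagram_url("https://www.instagram.com/p/XYZ789/")
```

`fixed` ends in `/reel/ABC123`, and the post URL comes back unchanged.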

Scripted clients without a browser User-Agent get 403

Cloudflare in front of app.archive.com responds with a 403 to requests that carry the default python-urllib or python-requests User-Agent, or an empty one. Always send a Mozilla/... User-Agent header from scripted clients. The error message does not mention this; it just looks like a permission failure.

`WORKSPACE-ID` goes in the header, not the body

The header name is literally WORKSPACE-ID (uppercase, hyphen). It is not a GraphQL variable, not a field on Query, not an argument to any query, and not passed in the JSON body. The one query that does not need it is workspaces (the workspace lister).

Mutations require checking `userErrors`

Mutations return both a result and a userErrors array. A mutation can return HTTP 200, return a data payload, and still have failed via userErrors. Always check userErrors before assuming the mutation succeeded.


mutation AddOne {
  addItemToCollections(
    itemId: "...", collectionNames: ["Q2 Campaign"], autoCreate: true
  ) {
    item { id }
    userErrors { field message }
  }
}

If userErrors is non-empty, the mutation did not do what you asked. The item field may still be populated with the pre-existing object.
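A sketch of a guard that turns a non-empty userErrors array into an exception, so a failed mutation cannot be mistaken for a successful one. `MutationFailed` and `unwrap_mutation` are hypothetical names, and the sample error message is invented.

```python
class MutationFailed(Exception):
    """Raised when a mutation payload carries userErrors."""


def unwrap_mutation(response, mutation_name):
    """Return the mutation payload, or raise if userErrors is non-empty."""
    payload = (response.get("data") or {}).get(mutation_name)
    if payload is None:
        raise MutationFailed(f"{mutation_name}: no payload in response")
    user_errors = payload.get("userErrors") or []
    if user_errors:
        raise MutationFailed("; ".join(e.get("message", "") for e in user_errors))
    return payload


ok = {"data": {"addItemToCollections": {"item": {"id": "item-1"}, "userErrors": []}}}
failed = {"data": {"addItemToCollections": {
    "item": {"id": "item-1"},  # may still be populated despite the failure
    "userErrors": [{"field": "itemId", "message": "example failure"}],
}}}

payload = unwrap_mutation(ok, "addItemToCollections")
```

Calling `unwrap_mutation(failed, "addItemToCollections")` raises `MutationFailed` even though `item` is present in the payload.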

`refetchEngagementBulk` does not have an `input` wrapper

Other mutations on this API accept input: { ... }. refetchEngagementBulk does not — it takes itemIds as a direct top-level argument.


# WRONG
refetchEngagementBulk(input: { itemIds: [...] }) { ... }
 
# RIGHT
refetchEngagementBulk(itemIds: [...]) {
  operationId processedCount skippedItemIds userErrors { message }
}

It also does not return a success boolean. Use processedCount > 0 and userErrors.length == 0 to determine success.
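Expressed as a Python predicate, that rule might look like the sketch below. The function name is illustrative, and the `int(...)` cast is a precaution in case processedCount arrives as a string like the other counters on this API.

```python
def refetch_succeeded(result):
    """True when at least one item was processed and no userErrors came back."""
    processed = int(result.get("processedCount") or 0)
    return processed > 0 and not result.get("userErrors")


result = {
    "operationId": "op-1",
    "processedCount": 2,
    "skippedItemIds": ["id3"],
    "userErrors": [],
}
succeeded = refetch_succeeded(result)
```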

Decision trees

“Latest” engagement vs. forced fresh

| You want | Use | Cost |
| --- | --- | --- |
| The most recent snapshot Archive already has | `items { currentEngagement { ... } }` | Free |
| Numbers fresher than the latest snapshot | `refetchEngagementBulk(itemIds: [...])`, then re-read `currentEngagement` after ≥ 60s | 3 credits per item, max once per item per 24h |
| The full historical curve | `engagementHistory(itemId: ...)` | Free, but per-item; fan out within the 5 RPS limit |

Do not call refetchEngagementBulk on every page load. Archive’s natural polling (~2h, 24h, 3d, 7d after publication) covers most needs. Reserve forced refresh for “I need fresh numbers at this specific operational moment.”

Finding content “about” something

| Intent | Filter |
| --- | --- |
| Posts that hashtag or @mention a brand | `tagsNames: ["brandname"]` (no `#` or `@`, lowercase) |
| Posts from a specific creator account | `accountNames: ["handle"]` |
| Posts whose captions semantically match a phrase | `superSearch: { searchQuery: "...", mode: FUZZY_CAPTION }` |
| Posts whose video transcripts mention a phrase | `superSearch: { searchQuery: "...", mode: FUZZY_TRANSCRIPTION }` |
| Posts visually similar to a known item | `superSearch: { similarMediaContentId: "...", mode: EMBEDDING_CONTENT }` |
| Posts visually similar to an external image | `superSearch: { imageUrl: "https://...", mode: EMBEDDING_CONTENT }` |

Saved views: Content Views vs. Collections

filterPresets returns both, distinguished by accessor:

accessor: MEDIA_DECK → Content View (dynamic, filter-defined, contents can change as new items match)
accessor: COLLECTIONS → Collection (static, hand-curated, contents only change via mutations)

Pass the preset’s id as presetId to items to read the contents. Filters do not apply on top of a preset — see the common mistakes above.
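A short sketch of separating the two kinds after a filterPresets call. It assumes each preset object exposes `id` and `accessor`; the helper name and sample ids are made up.

```python
def split_presets(presets):
    """Return (content_views, collections) based on each preset's accessor."""
    content_views = [p for p in presets if p.get("accessor") == "MEDIA_DECK"]
    collections = [p for p in presets if p.get("accessor") == "COLLECTIONS"]
    return content_views, collections


presets = [
    {"id": "preset-1", "accessor": "MEDIA_DECK"},
    {"id": "preset-2", "accessor": "COLLECTIONS"},
    {"id": "preset-3", "accessor": "MEDIA_DECK"},
]
views, collections = split_presets(presets)
```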

Adding items to a Collection

| Have | Do |
| --- | --- |
| One item, known collection name | `addItemToCollections(itemId, collectionNames: ["..."], autoCreate: true)` |
| Many items, same collection | Loop one item at a time. No bulk mutation exists. Throttle to 5 RPS. |
| Collection doesn’t exist yet | First call with `autoCreate: true` creates it; subsequent calls add to it. |

Patterns and gotchas

Things that have surprised previous integrators. Internalize these once.

EMV is cents. Divide by 100 for dollars. Applies everywhere.
BigInt is a string. Cast numeric engagement fields before comparing or summing.
null is “unavailable,” not zero. Especially common: views on Stories and Feed Posts, shares outside TikTok, views on freshly-captured Instagram Reels.
Engagement is not real-time. Snapshots land at approximately 2h, 24h, 3 days, and 7 days after publication. After 7 days, updates slow significantly unless the post stays viral.
5 RPS per workspace. Hard cap. Multiple processes hitting the same workspace share the same 5 RPS — coordinate via a shared limiter. Treat HTTP 429 as a bug in your client.
Operation status COMPLETED means enqueued, not done. refetchEngagementBulk reports COMPLETED essentially immediately; the actual upstream fetch takes tens of seconds. Wait at least 60 seconds (scale ~10s per item for larger batches) before reading the new currentEngagement.
Stories don’t have permanent URLs. originalUrl is null on Stories. They expire after 24 hours on the platform; if you need to re-import one, the URL approach won’t work past that window. Find them via items filtered by itemTypes: [STORY] + accountNames.
Instagram Stories are auto-excluded from refetchEngagementBulk: they are not counted in processedCount and not listed in skippedItemIds. There’s no error.
Cursor-based pagination. Use pageInfo.hasNextPage and pass pageInfo.endCursor as after for the next page. There are no offset-based parameters.
Workspace-specific configuration. Labels, custom attribute schemas, Magic Field keys, collection names, Content View IDs — all of these vary per workspace. Discover them at runtime via customAttributeSchemas, filterPresets, and workspace.hashtags/workspace.mentions. Never hardcode across workspaces.
Magic Fields are populated asynchronously. post_summary, sentiment, and workspace-specific classifiers can be absent or null on freshly-captured items. Re-query a few minutes later if you depend on the value.
No bulk mutations exist for adding/removing from collections. Loop one item at a time at 5 RPS. This is safe at scale (tested past 14,000 items per loop).
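For the 5 RPS cap, a client-side limiter can be as simple as spacing calls at least 0.2 seconds apart. The sketch below is single-process only; the class name is illustrative, and coordinating several processes against one workspace needs a shared limiter that this does not provide. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 1/rate seconds apart (single-process sketch)."""

    def __init__(self, rate_per_second=5.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call made yet

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self._interval


# Demonstration with a fake clock, so nothing actually sleeps.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


limiter = MinIntervalLimiter(5.0, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    limiter.wait()  # call the API right after each wait()
```

In the demonstration the first call goes through immediately and the next two each wait about 0.2 seconds.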

Status and error handling

GraphQL semantics: HTTP 200 does not mean the operation succeeded. The response may carry an errors array.


{
  "data": null,
  "errors": [
    {
      "message": "...",
      "extensions": { "code": "..." }
    }
  ]
}

Common extensions.code values to handle explicitly:

RATE_LIMIT_EXCEEDED — HTTP 429. The error message tells you how many seconds to wait. Throttle client-side instead of retrying around 429s.
undefinedField — you asked for a field that doesn’t exist. Re-check the schema; do not infer field names from other APIs.
argumentLiteralsIncompatible — an enum value you passed isn’t valid for that argument. Check the documented enum values for that filter.
Authentication-level failures return HTTP 401/403 without a data envelope.

For mutations, also check userErrors on every successful response (HTTP 200 with data populated does not guarantee the mutation did what you asked).

For long-running operations (refetchEngagementBulk), poll the operation query by operationId until status is terminal, then wait an additional 60+ seconds before reading the result fields.
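A sketch of the first check every response should pass through: inspect the errors array before touching data. `ArchiveAPIError` and `data_or_raise` are hypothetical names, and the fallback code `"UNKNOWN"` is an invention for the case where extensions.code is absent.

```python
class ArchiveAPIError(Exception):
    """Raised when a GraphQL response carries an errors array."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def data_or_raise(response):
    """Return response['data'], or raise using the first entry in 'errors'."""
    errors = response.get("errors") or []
    if errors:
        first = errors[0]
        code = (first.get("extensions") or {}).get("code", "UNKNOWN")
        raise ArchiveAPIError(code, first.get("message", ""))
    return response.get("data")


good = {"data": {"workspace": {"id": "w1", "name": "Demo"}}}
bad = {"data": None, "errors": [
    {"message": "...", "extensions": {"code": "undefinedField"}},
]}

data = data_or_raise(good)
```

Callers can branch on `exc.code`, for example stopping and re-checking the schema on `undefinedField` rather than retrying.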

Complete examples

All items in a Content View, with their current engagement


query AllItemsInView($presetId: ID!, $cursor: String) {
  items(first: 100, after: $cursor, presetId: $presetId) {
    totalCount
    pageInfo { hasNextPage endCursor }
    nodes {
      id
      takenAt
      provider
      socialProfile { accountName }
      currentEngagement { likes comments impressions earnedMediaValue }
    }
  }
}

Loop with after: pageInfo.endCursor until hasNextPage is false. Throttle to 5 RPS if you have other queries in flight against the same workspace.
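The loop itself can be sketched as a Python generator. `fetch_page` is a hypothetical callable that runs the query above with the given variables and returns the `items` object; it is stubbed here so the example runs without network access.

```python
def iter_items(fetch_page, preset_id):
    """Yield every node in a preset, following endCursor until hasNextPage is false."""
    cursor = None
    while True:
        page = fetch_page({"presetId": preset_id, "cursor": cursor})
        yield from page["nodes"]
        info = page["pageInfo"]
        if not info["hasNextPage"]:
            break
        cursor = info["endCursor"]


# Stub standing in for real API calls: two pages keyed by cursor.
pages = {
    None: {"nodes": [{"id": "a"}, {"id": "b"}],
           "pageInfo": {"hasNextPage": True, "endCursor": "c1"}},
    "c1": {"nodes": [{"id": "c"}],
           "pageInfo": {"hasNextPage": False, "endCursor": None}},
}


def fake_fetch(variables):
    return pages[variables["cursor"]]


ids = [node["id"] for node in iter_items(fake_fetch, "preset-1")]
```

A real `fetch_page` would send the request and apply rate limiting before each call.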

Discover custom attributes, then filter by them


query DiscoverItemSchemas {
  customAttributeSchemas(entity: ITEM) {
    key name type options { id name }
  }
}

Find the sentiment schema and the UUID for the "Positive" option. Then:


query PositiveItems($sentimentOptionId: ID!) {
  items(
    first: 50
    customAttributeConditions: [
      { field: "sentiment", operator: IS, type: SINGLE_SELECT_V2, value: $sentimentOptionId }
    ]
  ) {
    totalCount
    nodes { id customAttributes }
  }
}

The type field on each condition must exactly match the type returned by customAttributeSchemas. Mismatches return validation errors. For TEXT_LIST fields (like creator emails), use type: TEXT with operator: CONTAINS — this is the one documented exception.
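One way to avoid type mismatches is to build each condition from the schema object itself. A sketch, assuming the condition's `field` is the schema's `key`; `condition_for` and the `emails` key are illustrative names.

```python
def condition_for(schema, operator, value):
    """Build one customAttributeConditions entry from a schema object."""
    cond_type = schema["type"]
    if cond_type == "TEXT_LIST":
        # Documented exception: TEXT_LIST fields are queried as TEXT with CONTAINS.
        cond_type, operator = "TEXT", "CONTAINS"
    return {
        "field": schema["key"],
        "operator": operator,
        "type": cond_type,
        "value": value,
    }


sentiment = condition_for(
    {"key": "sentiment", "type": "SINGLE_SELECT_V2"}, "IS", "<option-uuid>"
)
emails = condition_for(
    {"key": "emails", "type": "TEXT_LIST"}, "IS", "someone@example.com"
)
```

Note that the helper overrides the operator for TEXT_LIST schemas, so the `"IS"` passed for `emails` becomes `"CONTAINS"`.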

Force-refresh engagement for a known set, then read fresh values


mutation Refresh {
  refetchEngagementBulk(itemIds: ["id1", "id2", "id3"]) {
    operationId processedCount skippedItemIds userErrors { message }
  }
}

Wait at least 60 seconds (scale ~10s per item for larger batches), then re-read:


query FreshValues {
  items(first: 100, filter: { ids: ["id1", "id2", "id3"] }) {
    nodes { id currentEngagement { likes comments impressions earnedMediaValue } }
  }
}

If an item appears in skippedItemIds, it was already refreshed within the last 24 hours and was not refetched — the cached value is what you’ll read back.
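The wait and the skipped-item bookkeeping can be sketched as two small helpers. Reading "at least 60 seconds, ~10s per item" as `max(60, 10 * n)` is one interpretation of the guidance above, not a documented formula, and both function names are illustrative.

```python
def recommended_wait_seconds(item_count, floor=60, per_item=10):
    """Seconds to wait before re-reading: never below the floor, ~10s per item."""
    return max(floor, per_item * item_count)


def cached_ids(requested_ids, skipped_item_ids):
    """IDs whose values will be the cached snapshot, in request order."""
    skipped = set(skipped_item_ids)
    return [item_id for item_id in requested_ids if item_id in skipped]


small_batch_wait = recommended_wait_seconds(3)
large_batch_wait = recommended_wait_seconds(25)
stale = cached_ids(["id1", "id2", "id3"], ["id2"])
```

Here `small_batch_wait` is 60, `large_batch_wait` is 250, and `stale` is `["id2"]`.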
