Best Practices

If you’re integrating the Archive API into a dashboard, ETL job, or any high-volume workflow, read this page before you start writing your client. The most common production issues we see (dashboards that take 30+ seconds to load, sporadic timeouts, intermittent missing data) come down to a small set of client-design decisions, not to Archive being slow.

Building with an AI coding agent (Claude, Cursor, ChatGPT)? Read For Coding Agents instead. It covers the same ground but is written to defuse the specific mistakes LLMs tend to make on this API.

Design around the rate limit, don’t fight it

The API enforces 5 requests per second per workspace — see Rate Limiting for the mechanics.

The most common production anti-pattern: enumerate items with items(first: 50, ...), then fire 50 parallel engagementHistory requests. Five succeed; the other 45 come back 429. With exponential backoff (1s, 2s, 4s…), each retried batch also gets rate-limited, and the page’s wall-clock time balloons to 20–40 seconds.

Throttle client-side, don’t retry around 429s. A token-bucket or fixed-window limiter capping outbound traffic to 5 RPS per workspace — shared across processes if you have multiple — keeps you under the limit. Treat 429 as a bug in your client, not a transient error.
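A minimal single-process sketch of such a limiter in Python, assuming a strict budget of 5 requests per second. The class name `TokenBucket` and its parameters are illustrative, not part of any Archive SDK, and sharing the budget across processes (for example through a shared store) is left out. The default capacity of 1 spaces requests 200ms apart; a larger capacity allows short bursts, which can put more than 5 requests into a single one-second window.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Client-side limiter: refills `rate` tokens per second, holds at most `capacity`.

    Single-process sketch; processes sharing a workspace need a shared limiter.
    """

    def __init__(
        self,
        rate: float = 5.0,
        capacity: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._sleep = sleep
        self._tokens = capacity  # start full so the first request goes out at once
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        # Holding the lock while sleeping serializes waiting threads, so
        # requests leave one at a time at the configured rate.
        with self._lock:
            self._refill()
            while self._tokens < 1:
                # Sleep just long enough for the missing fraction of a token.
                self._sleep((1 - self._tokens) / self.rate)
                self._refill()
            self._tokens -= 1


# One shared bucket per workspace; call acquire() before every request.
bucket = TokenBucket(rate=5.0)
```

The `clock` and `sleep` hooks exist only so the limiter can be exercised without real waiting; in production, leave them at their defaults and call `bucket.acquire()` immediately before each Archive request.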

Avoid fan-out when a single call will do

When you find yourself writing “for each item, call X”, check first whether X accepts a list. Most of our endpoints do, and the per-request overhead means a single batched call is dramatically faster than N small ones — even before the rate limit kicks in.

itemIdsByUrl — pass every URL in one call, get back an array. Don’t loop one URL at a time.


```graphql
query LookupUrls {
  itemIdsByUrl(urls: ["url1", "url2", "url3"]) {
    url status itemId
  }
}
```

mediaContents — accepts an array of itemIds. The response includes mediaItemId on every record, so you can group results back by item on your side. A common misconception is that the response can’t be mapped back to the input itemIds — it can, via mediaItemId.


```graphql
query Thumbnails {
  mediaContents(itemIds: ["id1", "id2", "id3", ...]) {
    ... on Image { id mediaItemId thumbnailUrl }
    ... on Video { id mediaItemId thumbnailUrl }
  }
}
```
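A short Python sketch of grouping that response on the client, assuming the `mediaContents` array has already been parsed from the JSON response into a list of dicts (how you get there depends on your HTTP or GraphQL client). `group_by_item` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def group_by_item(
    item_ids: List[str], media_contents: List[Dict[str, Any]]
) -> Dict[str, List[Dict[str, Any]]]:
    """Map each requested itemId to the media records returned for it."""
    # Seed every requested id so items with no media still appear (with []).
    grouped: Dict[str, List[Dict[str, Any]]] = {item_id: [] for item_id in item_ids}
    for record in media_contents:
        # mediaItemId identifies the item this media record belongs to.
        grouped.setdefault(record["mediaItemId"], []).append(record)
    return grouped
```

Seeding the dictionary with the requested IDs makes items that returned no media explicit instead of silently missing.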

transcriptions — accepts itemIds (up to 1,000) or mediaContentIds (up to 20). One batched call, not N small ones.
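When you have more IDs than one call allows, split them into chunks at those limits and send one call per chunk. A minimal Python sketch; the constants mirror the limits stated above, and `chunked` is an illustrative helper, not part of an Archive client.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-call limits for transcriptions, as documented above.
MAX_ITEM_IDS = 1000
MAX_MEDIA_CONTENT_IDS = 20


def chunked(ids: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` ids, preserving order."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])
```

With this, 2,500 itemIds become three transcriptions calls instead of 2,500, and those three calls still go through your rate limiter.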

refetchEngagementBulk — already designed to accept many item IDs at once. Don’t call it once per item.

The one endpoint that does NOT accept a batch input is engagementHistory: it takes a single itemId. For that one you have to fan out, which makes it the most likely culprit when your client starts hitting rate limits. See Plan your fan-out for `engagementHistory` below.

Filter and sort server-side; don’t enumerate to filter client-side

items accepts a rich filter argument and several sortKey values. Use them. The work the server does is essentially free for your client and is dramatically cheaper than pulling everything and post-processing.

| If you need… | Use this |
|---|---|
| Top N posts by EMV | `sorting: { sortKey: EARNED_MEDIA_VALUE, sortOrder: DESC }` + `first: N` |
| Top N most-liked posts | `sorting: { sortKey: LIKE_COUNT, sortOrder: DESC }` + `first: N` |
| Top N most-viewed posts | `sorting: { sortKey: MERGED_VIEW_PLAY_COUNT, sortOrder: DESC }` + `first: N` |
| Posts from a specific creator | `filter: { accountNames: ["handle"] }` |
| Posts mentioning a brand | `filter: { tagsNames: ["brandname"] }` |
| Posts above an engagement threshold | `filter: { engagement: { ... } }` |
| Posts in a date window | `filter: { takenAt: { from, to } }` |
| Posts of a specific platform / type | `filter: { provider: INSTAGRAM, itemTypes: [REEL] }` |

If you only need the top 10 EMV posts, ask for first: 10 with the right sort — do not paginate through 9,000 items and sort on your side.

The same principle applies to engagementHistory: it accepts a capturedAt date-range filter. If your dashboard only cares about the last 30 days, set filter: { capturedAt: { from, to } } server-side. Don’t pull the full snapshot history and trim client-side.


```graphql
query RecentHistory {
  engagementHistory(
    itemId: "..."
    filter: { capturedAt: { from: "2026-04-01T00:00:00Z", to: "2026-05-01T00:00:00Z" } }
  ) {
    edges { node { at likes comments views impressions } }
  }
}
```
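If the window is relative (for example "the last 30 days"), compute the bounds at request time rather than hard-coding them. A minimal Python sketch, assuming the API accepts UTC timestamps in the `2026-04-01T00:00:00Z` form used in the query above; `captured_at_window` is a hypothetical helper that returns the value for the `capturedAt` filter.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


def captured_at_window(days: int, now: Optional[datetime] = None) -> Dict[str, str]:
    """Return {"from": ..., "to": ...} covering the last `days` days, in UTC."""
    # Normalize to UTC; a naive `now` is interpreted as local time by astimezone().
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days)
    return {"from": start.strftime(ISO_UTC), "to": end.strftime(ISO_UTC)}


# Value for engagementHistory's filter argument; pass it however your client passes arguments.
history_filter = {"capturedAt": captured_at_window(30)}
```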

Plan your fan-out for `engagementHistory`

This is the one query that has to be fanned out: one call per itemId. If you have N items, you need N requests, and at 5 RPS that takes roughly N/5 seconds (100 items means about 20 seconds of requests before any caching helps).

Some patterns that work well:

Reduce N first. Don’t run engagementHistory for every item in the workspace. Use items with sorting + filter to get the subset you actually need (e.g., top 20 EMV in the date window).
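Throttle to 5 RPS. A simple queue that starts one request every 200ms will keep a single process under the limit without any retries. A semaphore with concurrency 5 is not enough on its own: it caps requests in flight, not requests per second, so fast responses let it exceed 5 RPS. Pair it with a rate limiter.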
Cache aggressively. A single item’s engagementHistory only changes when a new snapshot lands (Archive’s polling cadence is ~2h then 24h, 3d, 7d after publication, then slowing). Caching for at least an hour per item is safe and eliminates most of the fan-out on subsequent loads.
Background-load the long tail. Render the first paint from your cache or from a smaller subset. Fan out the rest asynchronously and update the UI as data arrives.
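The throttling and caching patterns above can be combined in a small loader. A minimal asyncio sketch, assuming a hypothetical `fetch` coroutine that runs one engagementHistory query for one itemId and returns the parsed result; `HistoryLoader` and its parameters are illustrative. It spaces request starts 200ms apart (5 RPS within one process) and reuses each item's result for an hour.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple

# Hypothetical: a coroutine that runs one engagementHistory query for one itemId.
FetchFn = Callable[[str], Awaitable[Any]]


class HistoryLoader:
    """Fan out engagementHistory one itemId per request, spaced and cached."""

    def __init__(self, fetch: FetchFn, min_interval: float = 0.2, ttl: float = 3600.0) -> None:
        self._fetch = fetch
        self._min_interval = min_interval  # 0.2 s between request starts = 5 RPS
        self._ttl = ttl  # seconds to reuse a cached history before refetching
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self._next_start = 0.0
        self._lock: Optional[asyncio.Lock] = None

    async def _wait_turn(self) -> None:
        # Create the lock lazily so it belongs to the running event loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            now = time.monotonic()
            if self._next_start > now:
                await asyncio.sleep(self._next_start - now)
            self._next_start = max(now, self._next_start) + self._min_interval

    async def get(self, item_id: str) -> Any:
        cached = self._cache.get(item_id)
        if cached is not None and time.monotonic() - cached[0] < self._ttl:
            return cached[1]  # cache hit: no request, no rate-limit budget used
        await self._wait_turn()
        result = await self._fetch(item_id)
        self._cache[item_id] = (time.monotonic(), result)
        return result

    async def get_many(self, item_ids: List[str]) -> Dict[str, Any]:
        results = await asyncio.gather(*(self.get(item_id) for item_id in item_ids))
        return dict(zip(item_ids, results))
```

Because the gap is enforced on request starts, slow responses do not hold up the queue, and this process never starts more than one request per 200ms. Concurrent cache misses for the same itemId are not deduplicated in this sketch.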

Avoid burning credits

Some mutations cost credits — most notably refetchEngagementBulk (3 credits per item refreshed). Treat those calls as expensive and intentional, not as a default. Specifically:

Don’t call refetchEngagementBulk on every page load — Archive’s auto-refresh covers most needs. Only force a refresh when you need a fresh value for a specific operational moment (e.g., a daily checkpoint, a campaign close).
The limit is 1 forced refresh per item per 24 hours. Subsequent calls within that window return the item in skippedItemIds with processedCount: 0. Plan your schedule around this — pick the moment you want fresh data, don’t trigger continuously.
currentEngagement on the items query is free — there’s no cost to reading the latest snapshot Archive has. Use that for “what does this post look like right now” reads. Reserve refetchEngagementBulk for “I need numbers fresher than the latest cached snapshot.”
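If you schedule forced refreshes, it helps to drop items that are still inside their 24-hour window before calling refetchEngagementBulk, and to know the credit cost up front. A minimal Python sketch, assuming you record when each item was last force-refreshed and that the window is a rolling 24 hours from that moment; `plan_forced_refresh` is a hypothetical helper, and the server's skippedItemIds response remains the authoritative answer.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional, Tuple

CREDITS_PER_ITEM = 3  # refetchEngagementBulk cost per item refreshed
REFRESH_WINDOW = timedelta(hours=24)  # assumed rolling: one forced refresh per item per window


def plan_forced_refresh(
    item_ids: Iterable[str],
    last_forced: Dict[str, datetime],
    now: Optional[datetime] = None,
) -> Tuple[List[str], int]:
    """Return the items eligible for a forced refresh and the credits it would cost."""
    now = now or datetime.now(timezone.utc)
    eligible = [
        item_id
        for item_id in item_ids
        if item_id not in last_forced or now - last_forced[item_id] >= REFRESH_WINDOW
    ]
    return eligible, len(eligible) * CREDITS_PER_ITEM
```

After the call, update `last_forced` only for IDs that were not returned in skippedItemIds.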

Don’t read fresh values immediately after `COMPLETED`

After triggering refetchEngagementBulk, the resulting operation flips to status: COMPLETED essentially immediately — but the actual upstream fetch can take tens of seconds to populate the new values into currentEngagement. Reading right after COMPLETED returns stale values.

Wait at least 60 seconds (longer for larger batches) before reading the new values. See Operation Latency for the full mechanics.
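In code, that means treating COMPLETED as "the refresh was accepted", not "the new numbers are readable", and waiting before the follow-up read. A minimal asyncio sketch; `trigger_refresh` and `read_current` are hypothetical coroutines standing in for your refetchEngagementBulk call (returning once the operation reports COMPLETED) and your read of currentEngagement. The 60-second default matches the minimum above; raise it for larger batches.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def refresh_then_read(
    trigger_refresh: Callable[[List[str]], Awaitable[None]],
    read_current: Callable[[List[str]], Awaitable[Any]],
    item_ids: List[str],
    settle_seconds: float = 60.0,
) -> Any:
    # Hypothetical trigger: runs refetchEngagementBulk and returns once the
    # operation reports status COMPLETED.
    await trigger_refresh(item_ids)
    # COMPLETED does not mean currentEngagement has been updated yet; give the
    # upstream fetch time to land before reading.
    await asyncio.sleep(settle_seconds)
    return await read_current(item_ids)
```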

Quick checklist

Before shipping, ask yourself:

Does my client throttle to at most 5 RPS per workspace, regardless of how many concurrent users / processes are running?
Have I replaced every “for each item, call X” with a batched form, where X supports it?
Am I using filter and sorting on items and engagementHistory to fetch only the subset I need?
Am I caching responses where it makes sense — especially engagementHistory per item?
If I’m calling refetchEngagementBulk, am I aware of the per-item 24h limit and the propagation delay?

If you answered “yes” to all five, your client will sail well under the rate limit and produce predictable response times.