Find orphaned datasources with active refreshes
A single paginated GraphQL query against the Metadata API that surfaces every published datasource with zero downstream workbooks but a still-running extract — the silent tax on your Tableau environment, made visible.
Every Tableau environment accumulates them: published datasources that nothing uses anymore, quietly refreshing on schedule. The workbooks that depended on them got deleted or repointed months ago, but the extract still wakes up every morning, hits the source database, and burns backgrounder capacity to refresh data no human will ever look at.
Nobody notices, because the failure is invisible — it's not an error, it's just waste. This card makes it visible with a single query: every published datasource with zero downstream workbooks that still has a live extract, so you can decommission with confidence instead of guessing.
The signal
An orphaned-but-refreshing datasource is the intersection of two facts, and the Metadata API knows both:
- No downstream workbooks — nothing in the environment consumes it. Lineage is the Metadata API's whole reason for existing:
downstreamWorkbookscomes back empty. - A live extract —
hasExtractsistrueandextractLastRefreshTimeis recent. A datasource refreshed last night with nothing downstream is the one burning compute for no one.
Neither fact alone is enough. No downstream workbooks and no extract is just dead weight — harmless. An extract with live downstream workbooks is working as intended. You want the overlap.
The query
One GraphQL query returns every published datasource with its lineage, extract status, owner, and last refresh. It's cursor-paginated from the start, so it scales from a 50-datasource site to a 50,000-datasource one without changing a line:
query OrphanedDatasources($cursor: String) {
publishedDatasourcesConnection(first: 500, after: $cursor) {
nodes {
name
luid
projectName
owner { name email }
hasExtracts
extractLastRefreshTime
downstreamWorkbooks { luid }
}
pageInfo {
hasNextPage
endCursor
}
totalCount
}
}Run it once with cursor set to null. While pageInfo.hasNextPage is true, run it again with cursor set to the endCursor you just got back, and keep going until it's false. Each page hands you up to 500 nodes.
Reading the results
An orphan worth reviewing is any node where all three hold:
hasExtractsistrue,downstreamWorkbooksis empty ([]), andextractLastRefreshTimeis recent — within the last day or two means it's on a live schedule right now.
That's three field checks against the JSON; do them wherever you ran the query. owner.email is who to talk to before you touch anything, extractLastRefreshTime is the receipt that proves it's still costing you, and luid is the stable handle you'll use to actually find and decommission it.
Where to run it
It's just a query — run it wherever you already work:
- GraphiQL — every Metadata-API-enabled site ships a playground at
https://<your-tableau>/metadata/graphiql. Paste the query, set thecursorvariable, page through by hand. Fastest way to eyeball a single site. - Postman / curl —
POST https://<your-tableau>/api/metadata/graphqlwith anX-Tableau-Authsession token and a JSON body of{ "query": "...", "variables": { "cursor": null } }. - Any script — sign in with a Personal Access Token in whatever language you like, loop the pages until
hasNextPageisfalse, and filter the nodes. Twenty lines in Python or Node.
Pick the one that matches how often you'll run it. Eyeballing once: GraphiQL. Monthly cron: a script.
Before you delete anything
This finds candidates for review, not things to auto-delete. A few honest caveats:
- Web authoring and live connections. A datasource consumed only through ad-hoc web authoring — never saved into a published workbook — can show zero downstream workbooks while still being used. Treat the list as a prompt to investigate, not a kill list.
- Datasources published for reuse. Some certified datasources exist precisely so analysts can build on them. Zero downstream workbooks today might mean "newly published," not "abandoned." Check the create and refresh dates and the owner before acting.
- Last refresh ≠ schedule.
extractLastRefreshTimetells you the extract did refresh recently, not that a task is scheduled to. It's a strong proxy — something refreshed it last night — but if you want the actual schedule, cross-check the REST task list (/api/<ver>/sites/<id>/tasks/extractRefreshes). - Enablement. The Metadata API must be enabled on your site (on by default on Tableau Cloud and recent Server versions). If the endpoint 404s or errors, that's the first thing to check.
Run it monthly, decommission what survives review, and the silent tax goes away.