Why this matters

This page is an evaluation framework for AI-native dashboards versus BI: eight questions that separate products designed AI-native from products that bolted AI onto an existing BI tool. Over the last 18 months, every CPG analytics vendor that wasn't already telling an AI story has shipped an AI feature. Trade-promotion-management tools, syndicated-data portals, BI dashboards built five years ago: they all have a natural-language chat box now, or a summary sidebar, or a suggested-question prompt. Sit through a 30-minute demo of any of them and they look more or less identical.

They're not. There's a real difference between a product designed AI-native (built from the start assuming an AI layer) and a product where someone added an LLM as a feature on top of an existing BI or TPM tool. That difference doesn't show up in the demo. It shows up later, in the daily work, by which point you've already signed the contract.

So this page is the evaluation framework. Eight questions to put to any vendor pitching AI-for-CPG analytics, with what a good answer sounds like, what a bad one sounds like, and why each question splits the two architectures apart. They're built to be asked in a real working session against a real data set, not lobbed across a discovery call.

AI-native dashboard vs. BI evaluation: the two architectures

Start with AI-bolted-on-BI. Underneath, it's a dashboarding or data-portal product, and it assumes the analyst picks a report, picks filters, reads a chart. The AI is a chat or summary feature sitting on top of that flow. The data model, the report catalog, the analyst workflow: all of it was designed before AI was an option, so the AI feature works alongside the old flow rather than replacing it.

AI-native is the other thing. The product was designed from day one assuming an AI layer would own the analysis-selection step. The data model is shaped so the AI can reason over it, and the analyst's main interaction is reviewing the system's reasoning, not working the filter sidebar. There's no separate "AI feature" to point at. The AI is the product. For the fuller definition of what owning the analysis-selection step actually means, see What is agentic AI for CPG analysts?.

Both architectures can answer "what's our share at Sprouts" in a demo. They part ways in production, on the questions where the methodology decides the answer, and that's most of them.

The eight questions at a glance

#	Question	What separates the two architectures
1	Show me the system picking which analyses to run	Owns analysis selection vs. filter-autocomplete
2	What happens when two analyses disagree?	Surfaces reconciliation vs. lets analyst find it
3	Can I cite an answer in a buyer deck?	Permalinked methodology-pinned URL vs. screenshot
4	Can the system reproduce a result month-over-month?	Pins methodology versions vs. silent drift
5	How does it handle SPINS attribute hierarchy?	CPG-native model vs. flat-column generic
6	What does it do for out-of-scope questions?	Says so explicitly vs. silently hallucinates
7	Can the analyst correct the system's reasoning?	Updates the analysis vs. adds an inert note
8	Where does the vendor live in the four-layer stack?	Honest about layers 2 and 3 vs. claims end-to-end

The eight questions in depth

1. "Show me the system picking which analyses to run on a question I bring."

Bring a decision question the vendor hasn't seen, something like "are we losing share in adaptogenic refrigerated at Sprouts, and should I move it to a Whole Foods deck?"

A good answer: the system runs three to five analyses without being told which ones (velocity by SKU, share-of-segment, ACV trend, competitor SKU launches, a Whole Foods NielsenIQ cross-check), and the vendor walks you through what it picked and why.

A bad answer: the vendor types a series of filter queries and narrates each one as "you can ask it to do X." That's a chat box on top of a fixed report catalog: Level 1 or 2 on the spectrum in What is agentic AI for CPG analysts?. Useful enough, but not the thing the AI-native pitch is selling.

2. "What does the system do when two of its analyses disagree?"

This is the question that separates the two architectures most cleanly. The ACV trend says distribution is up. The velocity trend says the move is really a store reclassification, not a genuine distribution gain. Now what?

A good answer: the system puts the disagreement front and center. "The +3.2pt ACV move at Sprouts is likely a store-cluster reclassification effective March 14; the comparable real-distribution change is +0.4pts." The vendor shows you where that surfaces in the UI and how the analyst overrides it.

A bad answer: the system runs both analyses but lays them out as two separate dashboards. "You can see the ACV chart here, and if you click into the store-cluster view here, you'll see the reclassification." That hands the reconciliation right back to the analyst, which is exactly the work the AI-native pitch claims to take away.
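One way to picture the good answer: the reconciliation is a first-class output, not something the analyst assembles from two charts. The sketch below is a minimal illustration in Python, not any vendor's implementation. `Finding`, `reconcile`, and the 0.5pt tolerance are hypothetical; the figures are the ones quoted in the good answer above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One analysis result, expressed as a change in points (hypothetical shape)."""
    analysis: str
    change_pts: float


def reconcile(headline: Finding, comparable: Finding,
              explanation: str, tolerance_pts: float = 0.5) -> str:
    """Return one statement that surfaces any disagreement between two analyses."""
    gap = abs(headline.change_pts - comparable.change_pts)
    if gap <= tolerance_pts:
        return f"{headline.analysis}: {headline.change_pts:+.1f}pts (analyses agree)"
    return (
        f"{headline.analysis} shows {headline.change_pts:+.1f}pts, but "
        f"{comparable.analysis} shows {comparable.change_pts:+.1f}pts. "
        f"Likely cause of the {gap:.1f}pt gap: {explanation}."
    )


message = reconcile(
    Finding("ACV trend", 3.2),
    Finding("comparable-store distribution", 0.4),
    explanation="store-cluster reclassification effective March 14",
)
print(message)
```

The point of the design is that the disagreement and its likely cause arrive in a single statement the analyst can accept or override, instead of living in two separate views.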

3. "Can I cite the system's answer in a buyer-facing deck?"

A buyer at Sprouts pushes back: "that's not what I see on our side." The analyst has to defend the number on the spot.

A good answer: every analytical claim the system makes carries a permalinked URL that loads the same view: same filters, same source data version, same methodology choices. The analyst pastes that URL into a buyer email or a deck footer and the citation travels with it.

A bad answer: "You can screenshot the chart and the dashboard remembers the state." A screenshot is not a citation. For why this distinction bites in practice, see Why "ask your data" is the wrong frame for AI in CPG analytics.
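To make "permalinked" concrete, here is a rough sketch of a URL that carries the view state with it. Everything in it is an assumption for illustration: the `citation_url` helper, the query parameter names, the `example.invalid` host, and the version labels are hypothetical and do not describe any specific product.

```python
import hashlib
import json
from urllib.parse import urlencode


def citation_url(base_url: str, filters: dict,
                 data_version: str, methodology_version: str) -> str:
    """Build a URL that pins filters, data version, and methodology version."""
    state = {
        "filters": filters,
        "data_version": data_version,
        "methodology_version": methodology_version,
    }
    # Canonical JSON (sorted keys) so the same view always hashes the same way.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    state_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    query = urlencode({
        "data": data_version,
        "method": methodology_version,
        "state": state_id,
    })
    return f"{base_url}?{query}"


url = citation_url(
    "https://example.invalid/view",  # placeholder host
    {"retailer": "Sprouts", "segment": "adaptogenic refrigerated"},
    data_version="extract-001",      # illustrative label
    methodology_version="v2.3",
)
print(url)
```

Because the state identifier is derived from the filters and both versions, changing any of them yields a different URL. A screenshot cannot give a citation that property.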

4. "Can the system reproduce a result month over month when the underlying data refreshes?"

SPINS refreshes its attribute hierarchy quarterly. Store-cluster definitions shift. Retailer reclassifications happen mid-period. A result that was true on May 1 may not be reproducible on June 1 if the system doesn't keep the methodology version pinned to it.

A good answer: the system pins the methodology version to the result. "This share-of-segment number was computed on attribute hierarchy v2.3; the current production version is v2.4, and the same query against v2.4 returns this slightly different number. Both are queryable, and the difference is auditable."

A bad answer: "The data refreshes every week." That answers a different question (latency) and dodges reproducibility entirely. AI-bolted-on-BI tools often can't pin methodology versions at all, because the underlying data model was never built to preserve them.
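A small sketch of what pinning could look like, under stated assumptions: the SKU names, sales figures, and segment memberships below are made up, and `PinnedResult` and `share_of_segment` are hypothetical names. The only thing borrowed from the text is the idea of two hierarchy versions, v2.3 and v2.4, that both stay queryable.

```python
from dataclasses import dataclass

# Hypothetical hierarchy versions: which SKUs belong to the segment in each one.
SEGMENT_MEMBERS = {
    "v2.3": {"sku_a", "sku_b"},
    "v2.4": {"sku_a", "sku_b", "sku_c"},  # sku_c reclassified into the segment
}

# Illustrative dollar sales per SKU (made-up numbers).
SALES = {"sku_a": 40.0, "sku_b": 60.0, "sku_c": 100.0}


@dataclass(frozen=True)
class PinnedResult:
    """A share figure that records the hierarchy version it was computed on."""
    share: float
    methodology_version: str


def share_of_segment(brand_skus: set, version: str) -> PinnedResult:
    members = SEGMENT_MEMBERS[version]
    segment_total = sum(SALES[sku] for sku in members)
    brand_total = sum(SALES[sku] for sku in brand_skus & members)
    return PinnedResult(brand_total / segment_total, version)


original = share_of_segment({"sku_a"}, "v2.3")
current = share_of_segment({"sku_a"}, "v2.4")
print(original)
print(current)
```

With both versions retained, the gap between the two shares can be traced to the one reclassified SKU. That is what makes the difference auditable and not silent drift.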

5. "How does the system handle SPINS attribute hierarchy depth?"

SPINS attributes are several levels deep: top-level category, subcategory, segment, attribute cluster (organic / non-GMO / adaptogenic / etc.). A question like "show me share in adaptogenic refrigerated" requires the system to know which level of the hierarchy "adaptogenic" lives at.

A good answer: the system uses the SPINS attribute hierarchy natively. The filter sidebar mirrors the actual hierarchy levels, and the chat input understands attribute terms without making the analyst map them to category codes first.

A bad answer: the system treats SPINS data as a generic transaction table with category crammed into a single flat column. "You can filter on the 'adaptogenic' tag here." Fine for a demo. It falls apart the moment an analyst asks a cross-attribute question, "share among organic adaptogenic refrigerated," that a flat-column model simply can't represent. That's a clean tell the underlying product wasn't built CPG-native.
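The cross-attribute case is easy to illustrate. In the sketch below each product carries a set of attribute tags alongside its place in the hierarchy, so "organic adaptogenic refrigerated" is a subset test, not a lookup in a single column. The `Product` shape, the choice to treat "refrigerated" as a segment, and all the numbers are assumptions for illustration, not the actual SPINS schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    sku: str
    segment: str            # one hierarchy level, e.g. "refrigerated"
    attributes: frozenset   # attribute-cluster level: several tags per product
    dollars: float


def share_among(products, brand_skus, segment, required_attributes):
    """Brand share among products in `segment` carrying all required attributes."""
    required = frozenset(required_attributes)
    pool = [p for p in products
            if p.segment == segment and required <= p.attributes]
    pool_total = sum(p.dollars for p in pool)
    if pool_total == 0:
        return None  # nothing matches; report that instead of a fake zero
    brand_total = sum(p.dollars for p in pool if p.sku in brand_skus)
    return brand_total / pool_total


# Made-up catalog for illustration.
catalog = [
    Product("a1", "refrigerated", frozenset({"organic", "adaptogenic"}), 30.0),
    Product("b1", "refrigerated", frozenset({"organic", "adaptogenic"}), 70.0),
    Product("c1", "refrigerated", frozenset({"adaptogenic"}), 100.0),
    Product("d1", "shelf-stable", frozenset({"organic", "adaptogenic"}), 50.0),
]

share = share_among(catalog, {"a1"}, "refrigerated", {"organic", "adaptogenic"})
print(share)
```

A model that stores one category string per row has nowhere to keep the second tag, so the same question either cannot be asked or quietly gets answered against the wrong pool.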

6. "What does the system do when I ask something outside its data scope?"

Ask the system: "how is our DTC business performing this month?" The system doesn't have DTC data (it has SPINS, which is brick-and-mortar scanner data).

A good answer: the system says so plainly. "DTC data isn't in what you've loaded. I can show you total brick-and-mortar SPINS-tracked revenue for the period, which is $2.4M. If you have DTC data you'd like to add, here's how to load it."

A bad answer: the system answers anyway, inventing a DTC number, or quietly handing back a SPINS figure labeled "total business." This is the most dangerous failure mode in CPG AI tools, because the analyst has no signal that anything went wrong. AI-native systems generally know where their scope ends; AI-bolted-on-BI systems inherit the LLM's reflex to answer no matter what.
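Knowing where scope ends can be as simple as checking the question against what is actually loaded before any number is produced. This is a hedged sketch of that check; `LOADED_SOURCES`, `Answer`, and `answer_channel_question` are hypothetical names, and a real system would need far more than a channel lookup.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: which sales channels each loaded source covers.
LOADED_SOURCES = {"SPINS": {"brick_and_mortar"}}


@dataclass(frozen=True)
class Answer:
    in_scope: bool
    message: str
    source: Optional[str] = None


def answer_channel_question(channel: str, loaded=LOADED_SOURCES) -> Answer:
    """Refuse explicitly when no loaded source covers the requested channel."""
    covering = sorted(src for src, channels in loaded.items() if channel in channels)
    if not covering:
        available = sorted({c for channels in loaded.values() for c in channels})
        return Answer(
            False,
            f"No loaded source covers '{channel}'. "
            f"Available channels: {', '.join(available)}.",
        )
    return Answer(True, f"Answering '{channel}' from {covering[0]}.", covering[0])


dtc = answer_channel_question("dtc")
stores = answer_channel_question("brick_and_mortar")
print(dtc.message)
print(stores.message)
```

The refusal path runs before any model-generated text, which is the property to probe for in the working session: a DTC question should produce a statement about coverage and never a figure.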

7. "Can the analyst correct the system's reasoning in-place?"

In the worked example from What is agentic AI for CPG analysts?, the analyst disagrees with the system's framing: "the Andronicos dip isn't a promo overlap, it's a Q1 reset issue."

A good answer: the analyst types or selects the correction, and the system reworks the downstream analysis to match. The correction gets captured too, so next time a similar situation comes up, the system can lean on the analyst's prior framing.

A bad answer: "You can add a note." A note doesn't change the analysis. It just sits beside it. AI-bolted-on-BI tools can usually add notes but can't propagate a correction, because the report catalog underneath is fixed in place.
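The difference between a note and a correction can be shown in a few lines. In this sketch the follow-up analyses are derived from the current diagnosed cause, so changing the cause changes the plan, while a note changes nothing. The `Analysis` class and the `FOLLOW_UPS` mapping are invented for illustration; only the Andronicos framing comes from the example above.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a diagnosed cause to the follow-up analyses it implies.
FOLLOW_UPS = {
    "promo overlap": ["promo calendar comparison", "baseline vs. lift"],
    "Q1 reset": ["shelf-set change review", "distribution by store"],
}


@dataclass
class Analysis:
    subject: str
    cause: str
    notes: list = field(default_factory=list)
    corrections: list = field(default_factory=list)

    @property
    def follow_ups(self):
        # Derived from the current cause, so a changed cause changes the plan.
        return FOLLOW_UPS[self.cause]

    def add_note(self, text):
        # A note sits beside the analysis and leaves it untouched.
        self.notes.append(text)

    def correct_cause(self, new_cause):
        # Record the correction for later reuse, then update the framing.
        self.corrections.append((self.cause, new_cause))
        self.cause = new_cause


dip = Analysis("Andronicos dip", "promo overlap")
before = dip.follow_ups
dip.add_note("Analyst thinks this is a Q1 reset issue.")
after_note = dip.follow_ups
dip.correct_cause("Q1 reset")
after_correction = dip.follow_ups
print(before, after_note, after_correction)
```

The `corrections` list stands in for the captured history the good answer describes. How a real product would reuse it on similar situations is outside what this sketch shows.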

8. "Where in the workflow does the vendor live, and where does it not?"

The honest answer to this question separates vendors who understand the four-layer CPG analyst stack (source, modeling, analysis, distribution) from vendors who think they're a four-in-one tool.

A good answer: "We own the modeling and analysis layers, 2 and 3 in that framing. We don't replace the syndicators, SPINS or Circana or Stratum, and we don't replace Google Slides. We make the parts in between faster."

A bad answer: "We're an end-to-end solution." No CPG analytics tool is end-to-end. The source layer belongs to the syndicators, and the distribution layer is whatever the buyer reads: decks, emails, broker spreadsheets. A vendor claiming end-to-end either misunderstands the workflow or is overselling, and neither is a good sign.

How to actually run the evaluation

These eight questions are useless in a discovery call. They earn their keep in a 60-minute working session, where the vendor demos against either a sanitized sample of the brand's own SPINS extract or the vendor's reference data set, and the brand shows up with two specific decision questions and one specific cross-source reconciliation question.

Run the session in three parts. First, spend 15 minutes on the brand explaining its category review process: which data sources, which analyses, what's slow about Tuesday. The vendor's job here is to listen. Then give 30 minutes to the brand walking the vendor through three questions against the data: one decision question, one methodology-edge-case question (the store-cluster reclassification, the SPINS attribute refresh), one out-of-scope question (the DTC one), with the vendor answering inside the product. Close with 15 minutes where the brand asks four to six of the eight questions above, picking whichever ones the demo raised. The vendor answers candidly, and the ones the vendor dodges are exactly the ones to chase down afterward.

Two things make this work. One: bring questions the vendor hasn't seen. Any demo answers the vendor's own canned questions flawlessly; real evaluation only happens on the brand's data with the brand's edge cases. Most vendors who dazzle in a generic demo get shaky on the brand's data, and the few that hold up are the ones worth a second meeting. Two: insist on the working session before the contract. A vendor who can't or won't demo against your data, even a sanitized sample of it, is telling you something about how deep the product really goes.

Red flags

A few signals, separate from the eight questions, that you're looking at AI bolted on rather than AI-native:

"Chat with your dashboard" is the headline on the marketing site. AI-native products tend to lead with the output (a defended monthly category read) rather than the input, a chat box.
The roadmap says "we're adding AI to..." instead of "the product is AI-first." How a vendor frames its roadmap gives away what it inherited architecturally.
The sales deck compares against legacy BI feature by feature. The comparison frame itself is the tell. AI-native products usually pitch against the analyst's day, not against Tableau's feature list.
No live demo on your data. A vendor that will only demo on its own reference data, never the prospect's, is almost always fragile on the real edge cases that reference data conveniently doesn't surface.

No single one of these sinks a vendor. Stack them up, though, and you've got a pattern.

Doing this in Scout

Scout was built AI-native from day one. The analysis-selection step is the system's job, methodology versions are pinned to every result, and the modeling layer (unification, reconciliation, persistence) handles SPINS, Stratum, and Circana extracts together instead of leaving the analyst to stitch them by hand. The eight questions in this framework are, more or less, the questions Scout's customer demos are built to answer, on the customer's own data. Run this framework against several vendors and Scout holds up on questions 1 through 7. Question 8 gets the honest answer: Scout owns layers 2 and 3, not 1 or 4.

Summary + further reading

On the marketing page, AI-native and AI-bolted-on barely differ. In the daily analyst work they differ a lot, and the gap shows up only on methodology edge cases.
These eight questions are built to expose that gap in a 60-minute working session, run on the brand's own data.
The red flags (chat-with-your-dashboard headlines, "we're adding AI to" roadmaps, no live demo on the prospect's data) are weak signals on their own and a clear architectural fingerprint together.

