State of Data in Finance 2025:
From Raw Feeds to Agentic Workflows
CIOs and Heads of Data are sitting on rising data budgets, complex vendor stacks, and pressure to “do something” with AI. This paper is our view of where the market really is, where it is stuck, and what a modern, AI-native data workflow has to look like if you want Sharpe-relevant impact instead of another sandbox.
Executive Overview
The short version: data spend is still growing and AI has gone from experiment to expectation, but workflows are nowhere near where they need to be. The firms that win will treat data and AI as a single operating system, not two separate projects.
Across 2025 surveys and benchmarks, a consistent picture emerges: alternative-data budgets are still expanding or holding steady for nearly 90% of buy-side firms, while market-data budgets grow more slowly but remain highly resilient. At the same time, AI usage in internal workflows has roughly doubled year-on-year, and generative/agentic AI is now a CIO agenda item, not a side experiment.
The problem is not lack of data or lack of AI. The problem is workflow. Data teams spend their time negotiating price, cleaning feeds, chasing entitlements, and re-implementing the same due diligence and backtests. PMs and quants wait months for datasets that should be in production in weeks. Compliance teams see rising risk with limited transparency into how AI agents are touching data.
This paper summarizes where the market really is in 2025, what leading firms are struggling with, and how a modern, AI-native workflow can change the economics. It is written from the vantage point of Kamba’s work with data-intensive hedge funds, multi-PM platforms, and asset managers.
Key figures at a glance:
- Alt-data budgets: ~90% of firms increasing or holding flat.
- Market-data budgets: stable, with low-teens % increases.
- GenAI: >90% of firms using it.
Market Snapshot 2025: Budgets, Stacks, and Structure
The data market in 2025 is bigger, more crowded, and more operationally fragile than it looks from the P&L line “Data & Subscriptions”.
Alternative data. Recent buy-side surveys show:
- ~89% of firms expect alt-data budgets to increase or stay the same going into 2026.
- The average respondent reports ~19 alternative datasets in production, with a long tail of firms running 50+.
- Trials are brutal: over the last two years, most firms subscribed to fewer than 25% of the datasets they trialed.
Translation: the alt-data market is still in expansion mode, but signal-to-noise is low and sourcing teams are spending a lot of time trialing datasets that never make it to alpha.
Market data. The picture is different:
- Budgets are more stable, with most firms planning flat or modest single-digit increases rather than cuts.
- Buyers report subscribing to ~30 market datasets on average, with large players sitting in the triple digits.
- Market data is increasingly delivered via cloud and managed services, but usage analytics and rationalization are still immature.
Large benchmark studies consistently show overlapping products, opaque entitlements, and limited visibility into who is actually using what. That is margin leakage, not strategy.
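Usage-based rationalization does not require heavy tooling to start. A minimal sketch, assuming a hypothetical join of subscription costs to internal usage logs (feed names, costs, and query counts below are invented for illustration):

```python
# Sketch: flag candidate "dead spend" by joining subscription costs to
# usage logs. All feed names, costs, and usage counts are hypothetical.
from dataclasses import dataclass

@dataclass
class Feed:
    name: str
    annual_cost: int   # USD per year
    queries_90d: int   # usage events observed in the last 90 days

def dead_spend(feeds, min_queries=10):
    """Return feeds whose recent usage falls below a threshold."""
    return [f for f in feeds if f.queries_90d < min_queries]

feeds = [
    Feed("vendor_a_l1_quotes", 150_000, 12_400),
    Feed("vendor_b_reference", 90_000, 3),    # barely touched
    Feed("vendor_c_sentiment", 200_000, 0),   # never queried
]

candidates = dead_spend(feeds)
at_risk = sum(f.annual_cost for f in candidates)
print(at_risk)  # → 290000
```

The point is not the threshold, which any real program would tune per feed type, but that the entitlement and usage data has to exist in queryable form before any of this is possible.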
What Buy-Side Firms Are Struggling With in 2025
If you strip away the marketing, the same four failure modes show up in almost every serious survey: economics, workflow, governance, and culture.
1. Economics: “Data drag” is real.
- Price negotiations remain the top friction in onboarding new datasets.
- Trials fail because teams can’t operationalize evaluation quickly enough.
- Sticky renewals mean dead spend persists even when usage decays.
2. Workflow: pipelines are clogged.
- Median onboarding time for a new dataset is still 3–6 months.
- Teams repeat the same steps (profiling, data-quality reviews (DQRs), backtesting, documentation) with limited reuse.
- PMs see data after the window of edge has already decayed.
3. Governance & compliance.
- Privacy and AI regulation are tightening, not loosening.
- Firms now run both data diligence and AI model diligence (rights, lineage, auditability).
- Trials are still treated like sandboxes instead of governed extensions of production.
4. Culture & human capital.
- “Pilot purgatory” is common; many firms can’t tie AI activity to P&L.
- Data/engineering teams are stretched thin and forced into one-off work.
AI & Agentic Workflows: Hype vs Reality
AI is no longer optional — but most firms are still using it as a tactical accelerator, not as the operating fabric of their data workflow.
Where the industry actually is.
- GenAI is widely adopted for internal productivity (summarization, documentation, code generation).
- Investment-centric applications are growing, but production wiring is still limited.
- Agentic AI (multi-step agents, tool-calling) is moving from labs to real workflows.
What’s blocking real impact.
- Data access: entitlements, lineage, and internal sources aren’t cleanly exposed to models.
- Governance: risk teams need auditability and human oversight by design.
- Economics: too many pilots; not enough programs tied to latency reduction and Sharpe impact.
Economics of Status Quo vs. Modern AI-Native Stack
A transparent, conservative scenario: what changes economically when you compress the discovery → diligence → backtest loop and rationalize spend.
Scenario (illustrative only).
- A $5bn multi-PM / multi-strategy fund.
- 30 market-data feeds at ~$150k each ≈ $4.5m / year.
- 15 alternative datasets at ~$200k each ≈ $3.0m / year.
- 12 FTEs at ~$300k each (fully loaded) ≈ $3.6m / year.
- Total “data & workflow” run-rate ≈ $11.1m / year.
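The run-rate above is simple enough to sanity-check in a few lines (figures taken directly from the illustrative scenario):

```python
# Back-of-envelope run-rate for the illustrative $5bn fund.
market = 30 * 150_000   # 30 market-data feeds at ~$150k each
alt    = 15 * 200_000   # 15 alternative datasets at ~$200k each
people = 12 * 300_000   # 12 FTEs at ~$300k fully loaded
total  = market + alt + people
print(total)  # → 11100000, i.e. the ~$11.1m / year run-rate
```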
Directional outcome. If five genuinely high-quality datasets exist in the annual pipeline, the difference between “status quo capture” and “AI-native capture” can be material.
| Metric | Status quo | AI-native stack | Delta |
|---|---|---|---|
| Incremental P&L from 5 high-quality datasets | $3.6m | $8.0m | +$4.4m |
| Vendor savings via rationalization | $0 | $1.0m | +$1.0m |
| Redeployable internal capacity | $0 | $0.9m | +$0.9m |
| Total economic uplift (annual run-rate) | $3.6m | $9.9m | +$6.3m / year |
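Summing the individual line items makes the arithmetic explicit (figures in $m / year, illustrative only):

```python
# Recompute the table's uplift from its line items ($m / year, illustrative).
status_quo = {"pnl_from_datasets": 3.6}
ai_native  = {"pnl_from_datasets": 8.0,
              "vendor_savings": 1.0,
              "redeployed_capacity": 0.9}
uplift = sum(ai_native.values()) - sum(status_quo.values())
print(round(uplift, 1))  # → 6.3
```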
What “Good” Looks Like in 2026 — and How Kamba Fits
The bar is moving from “we use AI” to “we run governed, traceable workflows that pull alpha forward.”
A modern target state.
- Unified smart search across internal, market, and alternative data — not three interfaces.
- Agentic workflow from question → Smart Search → DQR → Backtest → Procurement → Reporting, with human sign-off.
- Embedded compliance: entitlements, provenance, and usage logs baked in.
- Vendor rationalization driven by usage and impact, not renewal calendars.
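The target state above can be sketched as a pipeline in which every stage writes to an audit trail and procurement is gated on explicit human approval. This is a minimal illustration, not Kamba's actual API; all stage names, payloads, and figures are invented:

```python
# Sketch of a governed agentic workflow: each stage logs its output for
# audit, and procurement requires explicit human sign-off. All stage
# names, payloads, and figures are hypothetical.

audit_log = []

def stage(name):
    """Decorator: record every stage's output for provenance."""
    def wrap(fn):
        def run(payload):
            result = fn(payload)
            audit_log.append((name, result))
            return result
        return run
    return wrap

@stage("smart_search")
def smart_search(question):
    return {"question": question, "candidates": ["dataset_x"]}

@stage("dqr")
def data_quality_review(found):
    return {**found, "dqr": "passed"}

@stage("backtest")
def backtest(reviewed):
    return {**reviewed, "sharpe_delta": 0.12}  # illustrative figure

def procure(result, approved_by=None):
    if approved_by is None:  # hard human-in-the-loop gate
        raise PermissionError("procurement requires human sign-off")
    return {**result, "approved_by": approved_by}

out = procure(backtest(data_quality_review(smart_search("earnings drift?"))),
              approved_by="head_of_data")
print(len(audit_log))  # → 3 logged stages before sign-off
```

The design point: the sign-off is a structural gate that raises if skipped, not a convention, and the audit log is produced as a side effect of running the pipeline rather than reconstructed after the fact.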
Where Kamba is opinionated.
- AI is the operating system of the workflow — not a bolt-on tool.
- The unit of value is a traceable research workflow, not “a dataset.”
- Symphony + MCP + private-cloud / on-prem is the delivery backbone for serious institutions.
What the platform delivers.
- Smart Search across internal databases, documents, emails, and external sources.
- Automated DQRs & backtests with human-readable reasoning traces.
- Procurement & subscription workflow integrated with governance.
- Symphony-native UX plus private-cloud / on-prem options with no training on client data.
What you should expect.
- Measurable reduction in time-to-signal for new datasets.
- Clear view of overlapping feeds and dead spend.
- A governed AI analyst that teams actually use day-to-day.
Sources & Further Reading
This paper combines Kamba’s client work with leading 2025 research on data, AI, and asset-management economics.
- Neudata – The Future of Alternative and Market Data 2025 (budgets, trials, AI use cases).
- Eagle Alpha – Alternative Data Report 2025 (vendor trends, sourcing patterns).
- Coalition Greenwich / SIX – market data benchmark research (spend, cloud delivery, challenges).
- BCG Expand – market-data strategy benchmarking.
- McKinsey / Citi / EY / Grant Thornton / CFA Institute – AI adoption, governance, operating model shifts.

