Kamba | White Paper

State of Data in Finance 2025:
From Raw Feeds to Agentic Workflows

CIOs and Heads of Data are sitting on rising data budgets, complex vendor stacks, and pressure to “do something” with AI. This paper is our view of where the market really is, where it is stuck, and what a modern, AI-native data workflow has to look like if you want Sharpe-relevant impact instead of another sandbox.

Section 1

Executive Overview

The short version: data spend is still growing, AI has gone from experiment to expectation, but workflows are nowhere near where they need to be. The firms that win will treat data and AI as a single operating system, not two separate projects.

Across 2025 surveys and benchmarks, a consistent picture emerges: alternative data budgets are still expanding or holding steady for nearly 90% of buy-side firms, while market-data budgets grow more slowly but remain highly resilient. At the same time, AI usage in internal workflows has roughly doubled year-on-year and generative/agentic AI is now a CIO agenda item, not a side experiment.

The problem is not lack of data or lack of AI. The problem is workflow. Data teams spend their time negotiating price, cleaning feeds, chasing entitlements, and re-implementing the same due diligence and backtests. PMs and quants wait months for datasets that should be in production in weeks. Compliance teams see rising risk with limited transparency into how AI agents are touching data.

This paper summarizes where the market really is in 2025, what leading firms are struggling with, and how a modern, AI-native workflow can change the economics. It is written from the vantage point of Kamba’s work with data-intensive hedge funds, multi-PM platforms, and asset managers.

  • Alt-data budgets: ~90% ↑ or flat. Most firms expect alternative data spend to increase or stay level into 2026.
  • Market-data spend: Stable + low-teens % ↑. Market data remains one of the stickiest, least cuttable budget lines.
  • AI adoption: >90% using GenAI. Most wealth & asset managers report multiple GenAI use cases in production.

Figures above are directional, synthesized from 2025 industry surveys and reports by Neudata, Coalition Greenwich, BCG, McKinsey, EY, Citi, CFA Institute, Grant Thornton and others.

Section 2

Market Snapshot 2025: Budgets, Stacks, and Structure

The data market in 2025 is bigger, more crowded, and more operationally fragile than it looks from the P&L line “Data & Subscriptions”.

Alternative data. Recent buy-side surveys show:

  • ~89% of firms expect alt-data budgets to increase or stay the same going into 2026.
  • The average respondent reports ~19 alternative datasets in production, with a long tail of firms running 50+.
  • Trials are brutal: over the last two years, most firms subscribed to fewer than 25% of the datasets they trialed.

Translation: the alt-data market is still in expansion mode, but signal-to-noise is low and sourcing teams are spending a lot of time trialing datasets that never make it to alpha.

Market data. The picture is different:

  • Budgets are more stable, with most firms planning flat or modest single-digit increases rather than cuts.
  • Buyers report subscribing to ~30 market datasets on average, with large players sitting in the triple digits.
  • Market data is increasingly delivered via cloud and managed services, but usage analytics and rationalization are still immature.

Large benchmark studies consistently show overlapping products, opaque entitlements, and limited visibility into who is actually using what. That is margin leakage, not strategy.
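
One way to get visibility into "who is actually using what" is to compare entitlements with recent activity, feed by feed. The following is a minimal sketch under stated assumptions: the feed names, costs, user counts, the 90-day window, and the 25% utilization threshold are all hypothetical and do not come from the surveys cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    name: str
    annual_cost_usd: int
    entitled_users: int
    active_users_90d: int  # users who actually queried the feed in the last 90 days


def utilization(feed: Feed) -> float:
    """Share of entitled users who were active; 0.0 if nobody is entitled."""
    if feed.entitled_users == 0:
        return 0.0
    return feed.active_users_90d / feed.entitled_users


def flag_for_review(feeds, min_utilization=0.25):
    """Return under-used feeds, most expensive first."""
    flagged = [f for f in feeds if utilization(f) < min_utilization]
    return sorted(flagged, key=lambda f: f.annual_cost_usd, reverse=True)


# Hypothetical inventory, for illustration only.
FEEDS = [
    Feed("feed_a", 150_000, 40, 31),
    Feed("feed_b", 150_000, 25, 3),
    Feed("feed_c", 200_000, 10, 0),
]

if __name__ == "__main__":
    for feed in flag_for_review(FEEDS):
        print(f"{feed.name}: ${feed.annual_cost_usd:,} at {utilization(feed):.0%} utilization")
```

A flag here is a prompt for a renewal conversation, not an automatic cut: a feed with few users can still carry a strategy.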

Direction of Travel: Alt vs Market Data Budgets
Directional share of buyers expecting spend to rise or stay flat (2025 → 2026).

  • Alternative data: ~89%
  • Market data: ~80–85%

Typical Production Stack (Directionally)
Average number of datasets in production by type.

  • Alternative data: ~19
  • Market data: ~30

Numbers are rounded and synthesized across multiple 2025 benchmarks; they are meant to be directional, not exact.

Section 3

What Buy-Side Firms Are Struggling With in 2025

If you strip away the marketing, the same four failure modes show up in almost every serious survey: economics, workflow, governance, and culture.

1. Economics: “Data drag” is real.

  • Price negotiations remain the top friction in onboarding new datasets.
  • Trials fail because teams can’t operationalize evaluation quickly enough.
  • Sticky renewals mean dead spend persists even when usage decays.

2. Workflow: pipelines are clogged.

  • Median onboarding time remains 3–6 months.
  • Teams repeat the same steps (profiling, DQR, backtesting, documentation) with limited reuse.
  • PMs see data after the window of edge has already decayed.

3. Governance & compliance.

  • Privacy and AI regulation are tightening, not loosening.
  • Firms now run both data diligence and AI model diligence (rights, lineage, auditability).
  • Trials are still treated like sandboxes instead of governed extensions of production.

4. Culture & human capital.

  • “Pilot purgatory” is common; many firms can’t tie AI activity to P&L.
  • Data/engineering teams are stretched thin and forced into one-off work.

Top Operational Pain Points
Directional share of buyers citing each as a major issue.

  • Price negotiations: ~60%
  • Data quality / lack of signal: ~40–45%
  • Limited internal resources: ~30–35%
  • Vendor customer service: ~30%

These themes are consistent across 2025 surveys and benchmarks (Neudata, Greenwich, BCG Expand, and multiple AI/asset-management governance studies).

Section 4

AI & Agentic Workflows: Hype vs Reality

AI is no longer optional — but most firms are still using it as a tactical accelerator, not as the operating fabric of their data workflow.

Where the industry actually is.

  • GenAI is widely adopted for internal productivity (summarization, documentation, code generation).
  • Investment-centric applications are growing, but production wiring is still limited.
  • Agentic AI (multi-step agents, tool-calling) is moving from labs to real workflows.

What’s blocking real impact.

  • Data access: entitlements, lineage, and internal sources aren’t cleanly exposed to models.
  • Governance: risk teams need auditability and human oversight by design.
  • Economics: too many pilots; not enough programs tied to latency reduction and Sharpe impact.

How Firms Are Using AI Today (Directionally)
Share of firms reporting each initiative in 2025 surveys.

  • Internal efficiency (summaries, coding, docs): ~65–70%
  • Chatbots / analyst assistants: ~45–50%
  • In-house models for data processing: ~35–40%
  • AI-processed data for investment decisions: ~30–35%

The next wave of value is not “more AI.” It’s re-architecting investment workflows around AI + governed data access, end-to-end.

Kamba’s bias is explicit: AI must sit inside the workflow (Smart Search → DQR → Backtest → Procurement → Reporting), not as a separate “AI project.”

Section 5

Economics of Status Quo vs. Modern AI-Native Stack

A transparent, conservative scenario: what changes economically when you compress the discovery → diligence → backtest loop and rationalize spend.

Scenario (illustrative only).

  • A $5bn multi-PM / multi-strategy fund.
  • 30 market data feeds at ~$150k ≈ $4.5m / year.
  • 15 alternative datasets at ~$200k ≈ $3.0m / year.
  • 12 FTE at ~$300k fully loaded ≈ $3.6m / year.
  • Total “data & workflow” run-rate ≈ $11.1m / year.
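
The run-rate above is simple unit-count times unit-cost arithmetic. A minimal sketch that reproduces it from the scenario inputs follows, so the assumptions can be swapped for a firm's own numbers; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostLine:
    name: str
    units: int          # number of feeds, datasets, or FTE
    unit_cost_usd: int  # approximate annual cost per unit

    @property
    def annual_usd(self) -> int:
        return self.units * self.unit_cost_usd


# Inputs taken from the illustrative scenario above.
COST_LINES = [
    CostLine("Market data feeds", 30, 150_000),
    CostLine("Alternative datasets", 15, 200_000),
    CostLine("FTE (fully loaded)", 12, 300_000),
]


def run_rate_usd(lines) -> int:
    """Total annual 'data & workflow' run-rate across all cost lines."""
    return sum(line.annual_usd for line in lines)


if __name__ == "__main__":
    for line in COST_LINES:
        print(f"{line.name:<24} ${line.annual_usd / 1e6:.1f}m")
    print(f"{'Total run-rate':<24} ${run_rate_usd(COST_LINES) / 1e6:.1f}m")
```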

Directional outcome. If five genuinely high-quality datasets exist in the annual pipeline, the difference between “status quo capture” and “AI-native capture” can be material.

Line item                                       Status quo   AI-native stack   Delta
Incremental P&L from 5 high-quality datasets    $3.6m        $8.0m             +$4.4m
Vendor savings via rationalization              $0           $1.0m             +$1.0m
Redeployable internal capacity                  $0           $0.9m             +$0.9m
Total economic uplift (annual run-rate)         $3.6m        $9.9m             ~$6.3m / year

All figures above are illustrative and directional; the point is order-of-magnitude. If you have real alpha potential, workflow latency dominates economics.
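
The deltas can be recomputed directly from the line items. A minimal sketch follows, holding values in $ thousands so the arithmetic stays exact; the names are illustrative. Summing the three line items as listed gives $9.9m on the AI-native side and an uplift of $6.3m a year.

```python
# (status quo, AI-native) per line item, in $ thousands, from the illustrative table.
ROWS = {
    "Incremental P&L from 5 high-quality datasets": (3_600, 8_000),
    "Vendor savings via rationalization": (0, 1_000),
    "Redeployable internal capacity": (0, 900),
}


def deltas(rows):
    """Per-line uplift: AI-native minus status quo."""
    return {name: ai - sq for name, (sq, ai) in rows.items()}


def totals(rows):
    """Return (status quo total, AI-native total, total uplift)."""
    status_quo = sum(sq for sq, _ in rows.values())
    ai_native = sum(ai for _, ai in rows.values())
    return status_quo, ai_native, ai_native - status_quo


if __name__ == "__main__":
    for name, delta in deltas(ROWS).items():
        print(f"{name:<46} +${delta / 1000:.1f}m")
    sq, ai, uplift = totals(ROWS)
    print(f"{'Total':<46} ${sq / 1000:.1f}m -> ${ai / 1000:.1f}m (+${uplift / 1000:.1f}m)")
```
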
Section 6

What “Good” Looks Like in 2026 — and How Kamba Fits

The bar is moving from “we use AI” to “we run governed, traceable workflows that pull alpha forward.”

A modern target state.

  • Unified smart search across internal, market, and alternative data — not three interfaces.
  • Agentic workflow from question → Smart Search → DQR → Backtest → Procurement → Reporting, with human sign-off.
  • Embedded compliance: entitlements, provenance, and usage logs baked in.
  • Vendor rationalization driven by usage and impact, not renewal calendars.
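
To make the target state concrete, here is a minimal sketch of a staged workflow that checks entitlements, writes a usage log entry for every step, and pauses for human sign-off. It is not Kamba's implementation. The stage identifiers, the `WorkflowRun` class, and the choice to gate sign-off at procurement are assumptions made for illustration; the text does not say which stage requires sign-off.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Stage order taken from the text; the identifiers themselves are hypothetical.
STAGES = ["smart_search", "dqr", "backtest", "procurement", "reporting"]
# Assumption: human sign-off gates procurement.
SIGN_OFF_REQUIRED = {"procurement"}


@dataclass
class WorkflowRun:
    question: str
    user: str
    entitlements: set
    completed: list = field(default_factory=list)
    usage_log: list = field(default_factory=list)

    def _record(self, stage: str, dataset: str, event: str) -> None:
        # Every attempt is logged, including denials and pending approvals.
        self.usage_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": self.user,
            "stage": stage,
            "dataset": dataset,
            "event": event,
        })

    def advance(self, stage: str, dataset: str, approved_by: Optional[str] = None) -> bool:
        """Run the next stage; return False if it is waiting on sign-off."""
        if len(self.completed) == len(STAGES):
            raise ValueError("workflow already complete")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        if dataset not in self.entitlements:
            self._record(stage, dataset, "denied")
            raise PermissionError(f"{self.user} is not entitled to {dataset}")
        if stage in SIGN_OFF_REQUIRED and approved_by is None:
            self._record(stage, dataset, "awaiting_sign_off")
            return False
        event = f"approved_by:{approved_by}" if approved_by else "completed"
        self._record(stage, dataset, event)
        self.completed.append(stage)
        return True


if __name__ == "__main__":
    run = WorkflowRun("example question", "analyst_1", {"dataset_x"})
    for stage in STAGES:
        if not run.advance(stage, "dataset_x"):
            run.advance(stage, "dataset_x", approved_by="pm_1")
    for entry in run.usage_log:
        print(entry["stage"], entry["event"])
```

The design choice worth noting is that the log is written before the permission and sign-off decisions return, so compliance sees blocked attempts as well as completed steps.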

Where Kamba is opinionated.

  • AI is the operating system of the workflow — not a bolt-on tool.
  • The unit of value is a traceable research workflow, not “a dataset.”
  • Symphony + MCP + private-cloud / on-prem is the delivery backbone for serious institutions.

Kamba AI Data Analyst
An AI-native, agentic system for data-intensive financial institutions.

  • Smart Search across internal databases, documents, emails, and external sources.
  • Automated DQR & backtests with human-readable reasoning traces.
  • Procurement & subscription workflow integrated with governance.
  • Symphony-native UX plus private-cloud / on-prem options with no training on client data.

Agentic AI · Financial-grade · Governance-first

What to expect in 90–180 days

  • Measurable reduction in time-to-signal for new datasets.
  • Clear view of overlapping feeds and dead spend.
  • A governed AI analyst that teams actually use day-to-day.

Directionally: less glue work, more time on strategy and risk-taking.

Section 7

Sources & Further Reading

This paper combines Kamba’s client work with leading 2025 research on data, AI, and asset-management economics.

  • Neudata – The Future of Alternative and Market Data 2025 (budgets, trials, AI use cases).
  • Eagle Alpha – Alternative Data Report 2025 (vendor trends, sourcing patterns).
  • Coalition Greenwich / SIX – market data benchmark research (spend, cloud delivery, challenges).
  • BCG Expand – market-data strategy benchmarking.
  • McKinsey / Citi / EY / Grant Thornton / CFA Institute – AI adoption, governance, operating model shifts.
If you want a version customized to your firm’s stack and constraints (entitlements, storage, evaluation flow, governance), Kamba can produce a tailored diagnostic.