The Workflow Gap: Why More Data and More AI Still Isn't Working
CIOs and Heads of Data are sitting on rising data budgets, complex vendor stacks, and pressure to "do something" with AI. This paper examines where the market actually is, what is keeping workflows broken, and what genuine alpha impact requires — beyond pilots.
Executive Overview
Data spend is up. AI is a boardroom priority. Workflows are still broken. The firms that pull ahead in 2026 will treat data and AI as a single operating system — not two separate projects sharing a quarterly sync.
The buy-side data market in 2025 presents a surface-level paradox: budgets are expanding and AI adoption has surged, yet operating margins are compressing, and a common theme across 2025 surveys is that data teams feel more stretched, not less, despite the investment. McKinsey found that pre-tax operating margins fell three percentage points in North America and five in Europe between 2019 and 2023, even as technology spend increased: a signal that spend alone is not the answer.
The five sections that follow move from context to diagnosis to cost. Section 2 frames where budgets, stacks, and market scale actually stand. Section 3 catalogues the failure modes that appear consistently across 2025 and early 2026 research. Section 4 contrasts what firms are actually doing with AI with what they say they are doing. Section 5, the core of this paper, quantifies the opportunity cost of workflow latency across firm sizes, including a full P&L scenario for a $5bn fund. Section 6 covers the 2026 benchmark workflow: what a mature operating model looks like independent of tooling.
The core thesis: workflow is the bottleneck, not data access and not AI access. Firms that are capturing the most value have compressed the discovery → diligence → backtest → production loop and built governance into that loop from the start — not retrofitted it at audit time.
Market Snapshot 2025: Budgets, Stacks, and Scale
Alternative data is mainstream. Market data budgets are resilient. Both are growing faster than the workflows required to extract value from them.
Alternative data: from niche to standard infrastructure.
- Three-quarters of buy-side firms now use non-traditional data sources in their research or investment processes. (Coalition Greenwich)
- 90% of private fund respondents in a Feb 2026 survey currently use alternative data, up from 67% in 2024 and 62% in 2023. (Lowenstein Sandler, Feb 2026)
- Over two-thirds of those respondents report alt-data budgets exceeding $1M/yr; 89% plan to increase spend, with 96% of those budget increases directed toward AI-related products. (Lowenstein Sandler, Feb 2026)
- Nearly two-thirds of buy-side firms expect to increase alternative data spending in the next year, about a quarter of those by more than 10%. (Coalition Greenwich)
Note: figures differ because they reflect different survey populations and definitions — broad buy-side firms (Coalition Greenwich) vs private fund respondents (Lowenstein Sandler) — and are not directly comparable.
Despite that spending intent, the route from data purchase to portfolio impact remains long. Deloitte's Center for Financial Services notes that fully incorporating a new alternative dataset into the investment decision process — through discovery, diligence, contracting, integration, and validation — can span two to three years. This refers to full institutionalization: governance, scaling, monitoring, and broad adoption across the platform. Time-to-first-production for a specific workflow path is shorter — typically three to six months in practice — and is the figure modelled in Section 5. The bottleneck is almost never access. It is workflow.
Market data: large, persistent, and modernizing.
- Global spending on financial market data and news reached $44.3bn in 2024, up 6.4% year-on-year. (Burton-Taylor / Finextra, 2025)
- Nearly 70% of buy-side buyers expect market data budgets to increase 1–5% in the next 12 months; very few plan cuts. (SIX + Coalition Greenwich, Q3 2025)
- Cloud delivery has accelerated sharply: 63% of firms now receive market data via public cloud connectivity, versus just 30% in 2023. (SIX + Coalition Greenwich, Q3 2025)
- 65% use real-time market data throughout the trading day, up from 54% in 2024; over three-quarters are seeking more or better historical tick data. (SIX + Coalition Greenwich, Q3 2025)
Despite cloud migration, spend rationalization and usage analytics remain immature at most firms. A WatersTechnology benchmark found 70% of buy-side firms are looking to outsource at least one aspect of market data management — a signal that internal capabilities are stretched.
What Buy-Side Firms Are Struggling With in 2025–2026
Strip away the marketing and the same failure modes appear across every credible survey: integration complexity, governance friction, cost opacity, and a culture that rewards pilots over production.
1. The data itself is not the problem — integration is.
- 79% of fundamental PMs and analysts say combining data from different sources is the most frustrating challenge when working with alternative data. (Exabel / BattleFin, Jan 2025; n=130, ~$820B AUM)
- 98% agree that traditional data and official figures are becoming too slow to reflect changes in economic activity, making fast, reliable alt-data onboarding increasingly mission-critical. (Exabel / BattleFin, Jan 2025)
- Deloitte describes onboarding a new data vendor as involving thorough due diligence, contract negotiation, and data storage and access rights work, a multi-stage process that routinely stretches across quarters, not weeks. (Deloitte CFS)
2. AI is adding cost without yet adding proportional value.
- 81% of fund respondents report seeing cost increases for alt-data products that incorporate AI features, but this has not yet translated into systematic improvements in workflow speed or signal quality for most buyers. (Lowenstein Sandler, Feb 2026)
- Only 16% of asset managers have fully defined an AI strategy and are implementing it throughout their business, despite 66% calling it a strategic priority. (BCG, May 2024)
- McKinsey found that asset managers are allocating 60–80% of technology budgets to run-the-business initiatives, leaving a structurally small share for genuine workflow transformation. (McKinsey, July 2025)
3. Governance and compliance are structural, not optional.
- The SEC's April 2022 Risk Alert noted that exam staff observed advisers using alternative data without reasonably designed written policies and procedures to address material non-public information (MNPI) risks, including ad hoc, inconsistent diligence and no memorialization of that diligence. (SEC Div. of Examinations Risk Alert, Apr 2022)
- 85% of investment-industry employers see a need for industry-wide AI standards and ethical guidelines; 82% say the absence of those standards is actively slowing adoption. (CFA Institute, Aug 2024; n=200)
- The EU AI Act's general date of application is 2 August 2026, with full effectiveness expected by 2027, putting AI auditability and model oversight on a compliance clock for firms with EU exposure. (EPRS AI Act Timeline, Jun 2025)
4. "Pilot purgatory" is widespread and measurable.
- McKinsey's analysis of pre-tax operating margins shows a multi-year decline (three points in North America, five in Europe) despite increasing technology investment, indicating that spend is not compounding into capability. (McKinsey, July 2025)
- Only 16% of asset managers have moved beyond strategy declaration to full implementation, while 75% are dedicating capital and people in the short term, a gap that describes pilot purgatory precisely. (BCG, May 2024)
AI & Agentic Workflows: Hype vs Reality
AI is widely deployed. It is not widely working. The gap between "we use AI" and "AI creates measurable portfolio impact" is where most firms are currently stuck.
The adoption picture.
- 66% of buy-side survey respondents now use AI/LLMs for internal productivity and workflow efficiency, the dominant use case. Only 36% have adopted AI-processed data to optimise investment or trading strategies. (Neudata, Feb 2026)
- Among 300 CFOs, CIOs, and portfolio managers surveyed independently by Opinium, AI and generative AI was the most frequently raised topic over the past 12 months, ahead of sustainable investing, thematic strategies, and regulatory change. (Index Industry Association, 2024)
- 80% of market data professionals view AI/ML as a key driver of data delivery and consumption over the next 2–3 years. Yet the same study reports 90% see AI's near-term role as primarily a recommendation tool, with humans retaining final decisions. (SIX + Coalition Greenwich, Q3 2025)
The execution gap.
- 72% of asset managers expect GenAI to have significant or transformative impact within 3–5 years. 66% have made it a strategic priority. Only 16% have fully defined a strategy and are implementing it throughout their business. (BCG, May 2024)
- McKinsey estimates that AI and agentic workflows could deliver value equivalent to 25–40% of an average asset manager's cost base, but only if embedded into redesigned workflows, not deployed alongside existing processes. (McKinsey, July 2025)
- Technology spend has not consistently translated into productivity: pre-tax operating margins declined over 2019–2023 even as spend increased, reflecting the limits of bolted-on tooling without workflow redesign. (McKinsey, July 2025)
Alpha Opportunity Lost: The Cost of Every Month of Delay
The cost of slow data onboarding is not the subscription fee. It is the alpha your portfolio never captured because the dataset arrived late. This model measures that cost in one unit: gross alpha opportunity lost, expressed in dollars, by firm size.
The formula. Alpha opportunity lost per dataset = AUM × coverage % × alpha (bps ÷ 10,000) × (months delayed ÷ 12). Staff and operational costs are real but are not included here; they are additive and are treated separately in the $5bn fund scenario below the table.
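As a minimal sketch of the formula above, the function below computes gross alpha opportunity lost for a single delayed dataset. The function name, the parameter names, and the example inputs ($1bn of AUM, 50% coverage, a four-month delay) are illustrative assumptions; they are not the inputs behind the tiers in the table.

```python
def alpha_opportunity_lost(aum: float, coverage: float,
                           alpha_bps: float, months_delayed: float) -> float:
    """Gross alpha opportunity lost (in dollars) for one delayed dataset.

    aum            -- assets under management, in dollars
    coverage       -- fraction of AUM the dataset could inform (0.0 to 1.0)
    alpha_bps      -- assumed annual gross alpha, in basis points
    months_delayed -- onboarding delay, in months
    """
    return aum * coverage * (alpha_bps / 10_000) * (months_delayed / 12)


# Hypothetical inputs, for illustration only: $1bn AUM, 50% coverage, 60 bps, 4 months late.
example = alpha_opportunity_lost(1_000_000_000, 0.5, 60, 4)
print(f"${example:,.0f}")
```

Because every term is multiplicative, halving the delay halves the estimate, which is why the model is most sensitive to onboarding time and to the assumed alpha.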
Why the range is wide. Realized alpha depends on capacity, signal half-life, crowding, implementation costs, and portfolio role. A Sharpe-based check: a good dataset might add 0.05–0.20 to portfolio Sharpe. At 6–10% vol, that maps to roughly 30–200 bps/yr gross before costs (illustrative; assumes linear approximation and stable vol).
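The Sharpe-based check can be reproduced in a few lines. This is a sketch of the linear approximation stated above (added return ≈ Sharpe uplift × volatility); the function name is an assumption introduced here for illustration.

```python
def sharpe_uplift_to_bps(sharpe_uplift: float, annual_vol: float) -> float:
    """Linear approximation: added annual return = Sharpe uplift x volatility, in bps."""
    return sharpe_uplift * annual_vol * 10_000


# Low end: 0.05 Sharpe uplift at 6% vol. High end: 0.20 Sharpe uplift at 10% vol.
low = sharpe_uplift_to_bps(0.05, 0.06)
high = sharpe_uplift_to_bps(0.20, 0.10)
print(round(low), round(high))
```

The two ends bracket the 15 bps and 60 bps cases used in the table, with the conservative case sitting below the low end of this range.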
| Firm tier | AUM · Datasets/yr | Alpha opportunity lost per dataset (gross): Conservative (15 bps) · Good (60 bps) | Total alpha opportunity lost / yr (good tier, all datasets, gross, pre-costs) |
|---|---|---|---|
| Small HF ($500m–$1bn) | $750m · 4 datasets | ~$90k · ~$370k | ~$1.5m / yr |
| Mid-size fund ($1bn–$5bn) | $2.5bn · 6 datasets | ~$305k · ~$1.2m | ~$7.3m / yr |
| Large fund ($5bn–$20bn) | $10bn · 10 datasets | ~$1.2m · ~$4.9m | ~$49m / yr |
| Institutional / multi-PM ($20bn+) | $30bn · 15 datasets | ~$3.7m · ~$14.6m | ~$219m / yr |
The compounding effect. Each row above represents one annual cycle. A mid-size fund running at status-quo onboarding speed for three years forgoes roughly $22m in cumulative gross alpha opportunity, before costs, before staff drag, and before dead vendor spend on datasets that never reached production. The question is not whether workflow tooling is expensive. It is whether this number justifies re-examining the stack.
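The annual totals and the three-year figure follow from simple multiplication, sketched below using the rounded values printed in the table. Because the per-dataset inputs are already rounded, a recomputed total can differ slightly from the table (the mid-size row comes out at ~$7.2m rather than ~$7.3m). The variable names are illustrative.

```python
# Rounded figures copied from the table:
# tier -> (datasets per year, good-tier loss per dataset in $ thousands).
tiers = {
    "Small HF": (4, 370),
    "Mid-size fund": (6, 1_200),
    "Large fund": (10, 4_900),
    "Institutional / multi-PM": (15, 14_600),
}

# Annual total = datasets per year x good-tier loss per dataset.
annual_totals_k = {name: n * per_dataset_k for name, (n, per_dataset_k) in tiers.items()}

# Three-year cumulative figure for the mid-size tier, using the table's ~$7.3m annual total.
mid_size_three_year_m = 3 * 7.3

for name, total_k in annual_totals_k.items():
    print(f"{name}: ~${total_k / 1_000:.1f}m / yr")
print(f"Mid-size fund, three years: ~${mid_size_three_year_m:.1f}m")
```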
$5bn Fund: Two Types of Cost, Correctly Named
The per-dataset model above measures one thing: alpha opportunity lost — a P&L impact, the return your portfolio didn't earn. Workflow latency also generates a second, distinct cost: operational opportunity cost — the money wasted running an inefficient data operation regardless of whether any signal succeeds. These are different in kind. The table below keeps them separate.
Alpha opportunity lost. The gross portfolio return your fund did not earn because a dataset that could have informed decisions was still being onboarded. This is a return-attribution line: it would appear (or not) in your performance record. It exists only if the delayed dataset had genuine alpha potential.
Operational opportunity cost. The money wasted running an inefficient data operation: overspend on redundant vendor contracts and FTE time consumed by manual workflow that could be compressed or eliminated. This accrues regardless of whether any dataset adds alpha. It is a budget and efficiency loss, not a return loss.
- Redundant datasets: the same concept arriving from multiple sources (internal lake + vendor feed; two vendors with overlapping coverage; legacy + new dataset both active) or the same data copied across multiple stores (Snowflake + S3 + Databricks) with separate compute and egress costs. Kamba identifies candidates via schema/field overlap, semantic similarity, usage duplication, lineage analysis, and pipeline equivalence checks. Savings path: cancel or downgrade one contract; remove duplicate pipelines and refresh jobs.
- Decayed datasets: still being paid for and processed, but showing deteriorating freshness/quality, shrinking coverage, and declining downstream usage or contribution proxies. Detectable via data freshness indicators (lateness, null spikes, missing partitions), quality drift (distribution shift, coverage shrinkage), and downstream signals (reduced usage in notebooks and research queries, declining feature importance, or performance change on controlled ablation). Savings path: cancel, renegotiate, or replace subscriptions; stop processing datasets that no longer justify their cost.
- Zombie datasets and pipelines: assets still running because "someone might need them," such as tables nobody queries, dashboards nobody opens, features nobody trains on, and extracts created for a PM who left. Kamba identifies candidates via usage instrumentation across warehouse queries, BI tools, notebooks, model training jobs, feature store reads, and entitlement access logs, with candidates scored by time-since-last-use and production dependency. Savings path: deprecate safely using lineage-aware impact analysis, with rollback window and sign-off.
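Two of the signals described above, schema/field overlap and time-since-last-use, can be illustrated with a short sketch. This is not Kamba's implementation: the `DataAsset` class, the Jaccard overlap measure, the 180-day idle threshold, and the sample inventory are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataAsset:
    name: str
    fields: frozenset           # column / field names
    last_used: date             # most recent query, read, or training job
    production_dependents: int  # downstream production jobs reading this asset


def field_overlap(a: DataAsset, b: DataAsset) -> float:
    """Jaccard similarity of two assets' field names (1.0 means identical schemas)."""
    union = a.fields | b.fields
    return len(a.fields & b.fields) / len(union) if union else 0.0


def zombie_candidates(assets, today: date, idle_days: int = 180):
    """Names of assets idle for at least idle_days with no production dependents, most idle first."""
    idle = [
        ((today - a.last_used).days, a.name)
        for a in assets
        if a.production_dependents == 0 and (today - a.last_used).days >= idle_days
    ]
    return [name for _, name in sorted(idle, reverse=True)]


# Hypothetical inventory, for illustration only.
assets = [
    DataAsset("vendor_a_card_spend", frozenset({"ticker", "date", "spend", "txn_count"}), date(2026, 1, 20), 2),
    DataAsset("vendor_b_card_spend", frozenset({"ticker", "date", "spend", "panel_size"}), date(2026, 1, 18), 1),
    DataAsset("legacy_pm_extract", frozenset({"ticker", "date", "score"}), date(2024, 11, 1), 0),
]

print(field_overlap(assets[0], assets[1]))
print(zombie_candidates(assets, today=date(2026, 2, 1)))
```

A high overlap score only nominates a pair for review. Whether two feeds are truly redundant still depends on coverage, history, and licensing terms, which is why the savings paths above end in human sign-off.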
Baseline data run-rate for context.
- 30 market data feeds at ~$150k each ≈ $4.5m / year.
- 15 alternative datasets at ~$200k each ≈ $3.0m / year.
- 12 FTE at ~$300k fully loaded ≈ $3.6m / year.
- Total data & workflow run-rate ≈ $11.1m / year.
| Cost category & type | Status quo | AI-native workflow infrastructure | Delta |
|---|---|---|---|
| Alpha opportunity lost (P&L impact · return not earned · 5 datasets, good tier, ~$1.2m each) | ~$6.0m not earned | ~$1.6m not earned | ~$4.4m recovered |
| Operational opportunity cost (cost impact · budget wasted · vendor overspend + manual FTE) | ~$1.9m wasted | ~$0 | ~$1.9m saved |
| Total annual impact (P&L recovered + costs saved) | n/a | n/a | ~$6.3m / year |
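The delta column is the status-quo figure minus the AI-native figure for each line, and the total is the sum of the two deltas. A minimal sketch, using the table's rounded values in thousands of dollars; the dictionary keys are illustrative names.

```python
# Figures from the table, in $ thousands (integers avoid floating-point rounding noise).
status_quo_k = {"alpha_opportunity_lost": 6_000, "operational_opportunity_cost": 1_900}
ai_native_k = {"alpha_opportunity_lost": 1_600, "operational_opportunity_cost": 0}

# Delta per line = status quo minus AI-native workflow.
delta_k = {line: status_quo_k[line] - ai_native_k[line] for line in status_quo_k}
total_annual_impact_k = sum(delta_k.values())

print(delta_k)
print(f"Total annual impact: ~${total_annual_impact_k / 1_000:.1f}m / year")
```

The two lines are summed only for a headline figure; they remain different in kind, one a return not earned and the other a budget saving.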
What a Mature 2026 Workflow Actually Looks Like
The bar is moving from "we use AI" to "we run governed, traceable workflows that pull alpha forward." Here is what that looks like in practice — independent of vendor or tooling choice.
The characteristics that define mature firms.
- Unified discovery: a single intelligent search layer across internal, market, and alternative data — not three separate interfaces with three separate workflows.
- End-to-end agentic loop: question → Smart Search → DQR → Backtest → Procurement → Reporting, with human sign-off at defined checkpoints. This is the core of AI-native workflow infrastructure — not separate tools stitched together manually.
- Governance as infrastructure: entitlements, data provenance, usage logs, and model audit trails are baked in from the start. The SEC's April 2022 Risk Alert makes clear that ad hoc diligence, and a failure to memorialize it, is an examination risk, not just an operational gap.
- Usage-driven vendor management: rationalization decisions are driven by actual usage and impact data. A WatersTechnology benchmark found 70% of buy-side firms want to outsource at least one market data management function — signalling that internal capacity is structurally limited.
- Compounding team capacity: data engineering effort goes into reusable, governed infrastructure — not one-off requests that reset on every hire.
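The combination of an end-to-end loop, human checkpoints, and a built-in audit trail can be sketched as a small state machine. This is an illustrative outline under stated assumptions, not a reference implementation: the stage names come from the loop described above, while the choice of which stages require sign-off, the `WorkflowRun` class, and the sample approver are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Stage order taken from the loop described above.
STAGES = ["Smart Search", "DQR", "Backtest", "Procurement", "Reporting"]
# Assumption: which stages need human sign-off is a firm-level policy choice.
SIGN_OFF_REQUIRED = {"Backtest", "Procurement"}


@dataclass
class WorkflowRun:
    question: str
    audit_log: list = field(default_factory=list)

    def record(self, stage: str, actor: str, action: str) -> None:
        """Append a timestamped audit entry: who did what at which stage."""
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "stage": stage,
            "actor": actor,
            "action": action,
        })


def run_loop(run: WorkflowRun, approvals: dict) -> str:
    """Advance through the stages; halt at the first checkpoint with no named approver."""
    for stage in STAGES:
        run.record(stage, actor="agent", action="completed")
        if stage in SIGN_OFF_REQUIRED:
            approver = approvals.get(stage)
            if approver is None:
                run.record(stage, actor="system", action="halted: awaiting sign-off")
                return stage
            run.record(stage, actor=approver, action="signed off")
    return "done"


# Hypothetical run: the backtest is approved, procurement is not yet.
run = WorkflowRun("Does foot-traffic data lead same-store sales?")
stopped_at = run_loop(run, approvals={"Backtest": "pm_jane"})
print(stopped_at)
print(len(run.audit_log))
```

Writing the audit entry inside the same function that advances the workflow is the design point: the trail is produced as a side effect of doing the work rather than reconstructed afterwards.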
The governance imperative is tightening.
IOSCO's guidance for AI and machine learning in asset management identifies governance and oversight, algorithm testing and monitoring, data quality and bias controls, explainability, and outsourcing risk as key categories requiring designated accountability at senior management level. The EU AI Act's general application date of 2 August 2026 puts auditability on a compliance clock for firms with EU market exposure. Alongside it, NIST's AI Risk Management Framework provides a practical, voluntary, non-sector-specific structure for building trustworthy AI processes. The EDM Council's DCAM v3 extends data management guidance to AI and cloud, and its CDMC framework defines 14 key controls for protecting sensitive data (including MNPI) in cloud environments.
What success looks like in practice.
- Time-to-signal for new datasets falls from months to weeks.
- Data teams can point to reusable DQR and backtest infrastructure — not just completed tickets.
- Compliance has clear visibility into which AI agents touched which data and when — addressing the SEC exam standards directly.
- Vendor rationalization decisions are made on usage data, not renewal calendar pressure.
- PMs report fewer delays between dataset availability and portfolio impact.
Reference frameworks.
- IOSCO FR06/2021: governance, testing, monitoring, explainability for AI/ML in asset management.
- NIST AI RMF 1.0: voluntary, non-sector-specific trustworthy AI framework.
- EDM Council DCAM v3 / CDMC: data management for AI/cloud; 14 key controls for MNPI/PII in cloud environments.
- EU AI Act: general application from 2 August 2026; full effectiveness by 2027.
Sources
Every specific statistic cited in this paper is anchored to a primary or well-documented secondary source below, with methodology disclosure where available. Sources are grouped by type.
Primary — Buy-Side Surveys with Disclosed Methodology
- SIX + Crisil Coalition Greenwich — Market Data in the Age of AI, Q3 2025. Survey of 50 buy-side firms, conducted June–July 2025. Methodology and respondent composition disclosed. Stats cited: ~70% expect market data budgets to rise 1–5%; 80% view AI/ML as a key driver over 2–3 years; 90% see AI primarily as a recommendation tool; 63% now receive market data via public cloud (vs 30% in 2023); 65% use real-time data throughout the trading day (up from 54% in 2024); over three-quarters seek more/better historical tick data.
- Lowenstein Sandler — Annual Alternative Data Survey 2025, released February 2026. 107 respondents; online survey conducted November 9 – December 8, 2025; private fund managers. Stats cited: 90% currently use alt data (up from 67% in 2024, 62% in 2023); over two-thirds have alt-data budgets exceeding $1M/yr; 89% plan to increase alt-data budget; 96% of those increases directed toward AI; 81% report cost increases for AI-incorporated alt-data products; 89% say vendors are fully/mostly enabling AI analysis.
- Exabel / BattleFin — Buy-Side Practitioner Survey, January 2025. 130 fundamental PMs and investment analysts across US, UK, Singapore, Hong Kong; respondents collectively manage ~$820B AUM. Stats cited: 79% say combining data from different sources is the most frustrating alt-data challenge; 98% agree traditional data is too slow to reflect changes in economic activity; 75% say consumer spending datasets will provide outsized informational edge.
- CFA Institute — AI and GenAI in Investing: Employer Survey, August 2024. 200 investment-industry representatives; firms ranging from under $5B to over $100B AUM; conducted February 2024. Stats cited: 85% of employers see a need for industry-wide AI/GenAI standards and ethical guidelines; 82% say lack of standards hinders faster adoption; data privacy and security cited as major roadblock.
- Index Industry Association (IIA) — Global Asset Manager Survey 2024. 300 CFOs, CIOs, and portfolio managers; Europe and US; fieldwork April–May 2024; conducted independently by Opinium Research. Stats cited: GenAI/ML was the most frequently raised topic over the past 12 months among respondents, ahead of sustainable investing and thematic strategies.
Secondary — Research & Consultancy
- BCG — Global Asset Management Report + GenAI Benchmark, May 2024. Stats cited: 72% expect GenAI to have significant or transformative impact within 3–5 years; 66% have made GenAI a strategic priority; 75% are dedicating capital and people in the short term; only 16% have fully defined a strategy and are implementing it throughout their business.
- McKinsey & Company — Asset Management AI Economics, July 2025. Stats cited: pre-tax operating margins declined approximately 3 points in North America and 5 points in Europe (2019–2023) despite rising technology spend; AI/gen AI/agentic AI impact could be equivalent to 25–40% of an average asset manager's cost base; asset managers allocate 60–80% of technology budgets to run-the-business initiatives.
- Deloitte Center for Financial Services — Alternative Data Process Research. Cited for: journey from discovery to full integration spans multiple years; fully incorporating alternative data into the investment decision process may span 2–3 years; onboarding involves thorough due diligence, contracts, price negotiations, data storage, and access rights.
- Coalition Greenwich — Alternative Data Adoption Study (press release summary). Stats cited: three-quarters of buy-side firms use non-traditional data sources; nearly two-thirds expect to increase alternative data spending in the next year; about a quarter of those plan to increase by more than 10%.
- Burton-Taylor Consulting (summarized by Finextra, 2025). Stats cited: global spending on financial market data and news reached $44.3bn in 2024, rising 6.4%. Note: underlying Burton-Taylor report is paywalled; figures cited from Finextra attribution.
- WatersTechnology / TRG Screen — Market Data Management Benchmark Survey. Stats cited: 70% of buy-side firms are looking to outsource at least one aspect of market data management; 46% use specialist third-party tools. Note: survey was sponsored and commissioned; presented as survey evidence, not an independent census.
- Neudata — The State of the Alternative Data Market in 2026, February 2026. Author: Daryl Smith, Head of Research. Stats cited: $2.8bn estimated alternative data market size in 2025 (17% YoY growth); 66% of respondents use AI/LLMs for internal efficiency; 36% use AI-processed data for investment strategies; ~19 average datasets subscribed to per buyer per year; average fund spends ~$1.4m/yr on alt data.
Primary — Regulatory & Standards Bodies
- SEC Division of Examinations — Risk Alert: Investment Adviser Use of Alternative Data, April 26, 2022. Directly cited: exam staff observed advisers using alternative data without reasonably designed written policies and procedures for MNPI risk; examples include ad hoc and inconsistent diligence and failure to memorialize diligence processes. Anchors governance and provenance claims throughout.
- SEC Enforcement — App Annie Inc. and Bertrand Schmitt: Securities Fraud Charges, September 14, 2021. The SEC's first enforcement action against an alternative data provider. Focused on misrepresentations about how data was derived and what controls existed. Cited for: data provenance and controls are regulatory expectations, not optional governance choices.
- IOSCO — Final Report: The Use of Artificial Intelligence and Machine Learning by Market Intermediaries and Asset Managers (FR06/2021). Cited for: key AI/ML risk categories including governance/oversight, algorithm development/testing/monitoring, data quality and bias, transparency/explainability, outsourcing, and ethical concerns; guidance on designated senior management accountability.
- European Parliamentary Research Service (EPRS) — AI Act Implementation Timeline, June 2025. Cited for: EU AI Act entered into force 2024; general date of application 2 August 2026; AI Act expected to be fully effective by 2027.
- NIST — AI Risk Management Framework 1.0 (AI RMF 1.0). Cited as a voluntary, non-sector-specific, use-case-agnostic resource for designing and deploying AI while managing risks and promoting trustworthy AI. Used as a neutral governance anchor.
- EDM Council — DCAM v3 (Data Management Capabilities Assessment Model) and CDMC (Cloud Data Management Capabilities). Cited for: DCAM v3 expanded support for AI and cloud with stronger emphasis on governance, privacy, and protection; CDMC defines 14 Key Controls and Automations for protecting sensitive data (including PII and MNPI) in cloud environments.
Disclaimer. This paper is for informational purposes only and does not constitute investment advice, a solicitation, or an offer to buy or sell any security or financial instrument.
All financial models, scenarios, and projections are illustrative and directional. They are not return estimates and do not guarantee any outcome. Actual results will vary materially based on market conditions, strategy, execution, and other factors not captured in this model.
Alpha contribution figures are gross and pre-costs. They do not include market impact, turnover, borrow costs, fees, or signal decay. Net realizable alpha will be lower.
Third-party brand names and marks (Coalition Greenwich, SIX, Lowenstein Sandler, Exabel, BCG, McKinsey, CFA Institute, Neudata, Deloitte, Burton-Taylor, WatersTechnology, IIA, SEC, IOSCO, NIST, EDM Council, and others) are used for attribution only and belong to their respective owners. Kamba Group has no commercial relationship with these organizations beyond use of publicly available research.
© 2026 Kamba Group LLC. All rights reserved.
