The Pilot Graveyard: Why Bank AI Projects Stall Between Proof of Concept and Production

The AI in banking market is on course to grow from $26 billion to $546 billion by 2035. The gap between investment and deployment tells a more complicated story.

I have had some version of the same conversation with people across financial services for the past two years. It goes something like this.

A bank runs a proof of concept for an AI model: fraud detection, credit scoring, customer churn prediction. The results are good. Sometimes they are genuinely impressive. The data science team is excited. Senior leadership is interested. A business case gets written.

And then. Nothing. The model sits in a sandbox. Months pass. The pilot gets extended. Eventually it quietly disappears from the roadmap, replaced by the next proof of concept.

I call this the pilot graveyard. In my view, it is the defining challenge of AI in banking right now. Not the technology. Not the data. Not the budget.

“The technology is not the bottleneck. Most bank AI pilots fail for reasons that have nothing to do with whether the model works.”

| 2024 Market Value | 2035 Projection | CAGR 2025-2035 |
| --- | --- | --- |
| USD 26.19 Billion | USD 546.02 Billion | 31.80% |

Global AI in Banking Market. Source: Kaiso Research and Consulting, Global AI in Banking Market Size, Trend & Opportunity Analysis Report, 2025-2035.

The Numbers Say One Thing. The Deployment Rate Says Another.

According to research published by Kaiso Research and Consulting, the global AI in banking market was valued at USD 26.19 billion in 2024 and is projected to reach USD 546.02 billion by 2035. That is a compound annual growth rate of 31.80%, putting AI in banking among the fastest-growing technology categories in financial services.

And yet, if you talk to the people actually responsible for moving AI from experimentation into production inside banks, the picture is more complicated. Investment figures capture what banks are spending on AI infrastructure, vendor contracts, and data engineering. They do not capture the ratio of pilots that successfully reach production to the ones that don’t.

That ratio, in my experience, is not a flattering one.

A 2023 survey by McKinsey & Company found that fewer than a quarter of financial institutions reported successfully scaling AI beyond pilot programmes. I am not citing that figure to be discouraging. I am citing it because it raises a question the headline growth numbers do not answer: if banks are spending this much on AI, why is so much of it not making it into live operations?

The Real Blockers Are Not Technical

Here is what I actually think is happening.

The reasons AI pilots stall in banking are overwhelmingly non-technical. The model usually works. The data is usually there. The engineers can usually build what is needed. What fails is everything around the model. Three things in particular come up again and again.

**Explainability and the regulator problem.** A credit scoring model that improves accuracy by 15% is genuinely valuable. But if a relationship manager cannot explain to a customer why their application was declined, that model cannot go into production. If a compliance officer cannot demonstrate to a regulator that the decision was free from prohibited bias, it cannot go into production. Full stop. The EU AI Act classifies credit scoring as high-risk AI, which has sharpened this constraint considerably. Banks are not avoiding AI because it does not work. Some are avoiding putting it into production because they cannot yet satisfy the explainability requirements that live operations demand.

**Legacy system integration.** This one is unglamorous but it matters enormously. The average large bank runs core banking infrastructure that is decades old. Connecting a modern ML inference layer to a system built on COBOL or an early-generation mainframe is not a data science problem. It is an architecture problem, and it is expensive and slow to solve. Many AI pilots are built on clean, curated data extracts that bear only a passing resemblance to the messy, inconsistent data in production systems. When the pilot hits the real pipeline, performance degrades and timelines stretch. The pilot graveyard fills up a little more.
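To make the integration problem concrete: a typical production feed from a mainframe core arrives as fixed-width records defined by a COBOL copybook, not as the tidy columnar extract the pilot was trained on. A minimal Python sketch of the translation layer that has to sit in between; the field layout here is invented for illustration, and the real one would be generated from the copybook:

```python
# Illustrative field layout: (offset, length, parser) per field.
# In practice this mapping would be derived from the COBOL copybook,
# not hand-written — the layout below is a made-up example.
LAYOUT = {
    "account_id": (0, 10, str.strip),
    "balance_cents": (10, 12, int),
    "days_past_due": (22, 3, int),
}

def parse_record(line: str) -> dict:
    """Translate one fixed-width core-banking record into the flat
    feature dict a model inference service expects."""
    return {name: parse(line[start:start + length])
            for name, (start, length, parse) in LAYOUT.items()}
```

The sketch is trivial on purpose: the hard part is not the parsing, it is that hundreds of such mappings have to be specified, validated, and kept in sync with a system that changes on its own release cycle.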

**Governance structures not designed for model risk.** Most bank risk committees were built to govern credit risk, market risk, and operational risk. Model risk is a different category and many institutions are still developing the frameworks to manage it properly. Who owns a model that makes lending decisions? Who is responsible when it drifts? What is the retraining and revalidation protocol? These are not hypothetical questions. They are the exact questions that stop a pilot from progressing to production when no one has clear answers.
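One concrete piece of that governance machinery is drift monitoring. A minimal sketch using the Population Stability Index, a metric commonly used by bank model risk teams to compare a live score distribution against the validation baseline; the thresholds mentioned in the docstring are conventional rules of thumb, not regulatory requirements:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline score distribution (e.g. at validation)
    and a live one. Common rule of thumb: < 0.1 stable,
    0.1-0.25 worth investigating, > 0.25 material drift."""
    # Quantile bin edges taken from the baseline distribution
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    # Clamp live scores into the baseline range so nothing falls outside
    actual = np.clip(actual, edges[0], edges[-1])
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor to avoid log(0) on empty bins
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A check like this answers none of the ownership questions above by itself, but it is the kind of artefact a revalidation protocol can be anchored to: a defined metric, a defined threshold, and a defined action when the threshold is breached.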

“The pilot graveyard is not a technology failure. It is a governance failure, a legacy architecture problem, and a regulatory readiness gap — all compounding at once.”

What Is Actually Working — And Why

I want to be fair here. The pilot graveyard is real but it is not universal. There are categories of AI deployment in banking that have successfully moved from experimentation to production at scale, and they share some characteristics worth examining.

**Fraud detection and AML.** This is the most mature AI application in banking, and it is not a coincidence. Fraud detection has three properties that make it tractable for production deployment: the business case is unambiguous (measurable loss reduction), the output is a score or flag rather than an explanation of a human-consequential decision, and the regulatory framework is relatively established. When the model is wrong, the consequence is a false positive (an inconvenient but reversible experience for a genuine customer). The explainability bar is lower than for credit decisions. JPMorgan Chase’s March 2025 partnership with AWS to deploy a cloud-native fraud detection platform using Amazon SageMaker is a recent example of what mature deployment in this category looks like.

**Conversational AI in customer service.** NLP-powered virtual assistants and chatbots have successfully crossed from pilot to production in many institutions. The reason is structural: the failure mode is containable. A chatbot that misunderstands a query gets escalated to a human agent. The regulatory risk is manageable. Infosys’s launch of FinAI Assist in September 2024, an AI-driven virtual assistant for retail banking clients, is representative of where this category sits operationally.

The pattern is not subtle. The AI applications that have successfully scaled share lower regulatory risk, more contained failure modes, and cleaner integration paths. Credit underwriting, portfolio risk management, and regulatory reporting automation (the higher-stakes, higher-value applications) are moving more slowly precisely because the governance requirements are more demanding.

The 24-Month Window That Will Define the Category

Here is where I land on this.

The $546 billion market projection for 2035 is not unreasonable. The demand drivers are real. Fraud is getting more sophisticated. Regulatory reporting is getting more complex. Customer expectations for personalised digital experiences are not going back down. Banks that do not deploy AI at scale will be at a structural disadvantage within a decade.

But the institutions that capture the most value from that growth will not be the ones that run the most pilots. They will be the ones that build the governance infrastructure that allows pilots to become production systems: model risk frameworks, explainability tooling, retraining protocols, legacy data pipeline modernisation.

That work is less visible than a splashy proof of concept. It does not generate press releases. But it is, in my view, the actual competitive battleground in bank AI right now.

The banks doing it seriously are making architectural and governance decisions in the next 24 months that will determine what is possible for them in 2028, 2030, and beyond. The ones that are not, still running disconnected pilots without the operational infrastructure to scale them, will find that their AI investment figures look impressive and their deployment rates do not.

Three Questions Every Banking Executive Should Be Asking

I am not sure there is a clean resolution to any of this. That is probably why the conversation keeps happening. But if I were sitting on a bank’s executive committee or technology board right now, these are the three questions I would want honest answers to.

**First:** what is our ratio of AI pilots to live production models, and what is the average time from pilot initiation to production deployment? If you do not know those numbers, that itself is the answer.

**Second:** do we have a model risk governance framework explicitly designed for machine learning models, covering ownership, drift monitoring, revalidation triggers, and bias testing? Or are we applying a credit risk governance template to something it was not built for?

**Third:** for our highest-value AI use cases, have we mapped the full integration path into production systems including legacy data pipelines, not just the clean data environment the pilot was built on? If the pilot and the production environment use different data, the pilot result is not a reliable predictor of production performance.
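The third question can be made operational with a very simple first step: a column-level parity check between the pilot extract and a sample of production data. A minimal sketch, assuming both are available as pandas DataFrames; the function name and report shape are illustrative, not a standard tool:

```python
import pandas as pd

def data_parity_report(pilot: pd.DataFrame, prod: pd.DataFrame) -> pd.DataFrame:
    """Compare the curated pilot extract against a production sample:
    which columns exist on each side, and how much is missing.
    Columns present in only one source are the first red flag."""
    rows = []
    for col in sorted(set(pilot.columns) | set(prod.columns)):
        in_pilot, in_prod = col in pilot, col in prod
        rows.append({
            "column": col,
            "in_pilot": in_pilot,
            "in_prod": in_prod,
            "pilot_null_pct": pilot[col].isna().mean() if in_pilot else None,
            "prod_null_pct": prod[col].isna().mean() if in_prod else None,
        })
    return pd.DataFrame(rows)
```

A report like this does not prove the pilot will survive production, but a mismatch in it is near-proof that the pilot result will not transfer as-is.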

None of these are comfortable questions. That is probably why they do not get asked often enough.

The Market Will Grow. The Question Is Who Benefits.

A market growing at 31.80% annually from a $26 billion base is not going to stay stuck in the pilot phase indefinitely. The economic pressure to deploy, the competitive pressure from neobanks and fintech challengers without legacy system constraints, and the regulatory frameworks that are slowly becoming clearer will all push the industry toward broader production deployment.

But the transition from investment to deployment is not automatic. It requires deliberate work on governance, architecture, and organisational capability that most banks have not yet completed.

The pilot graveyard does not have to be permanent. It just requires deciding that production readiness is as important as building the next impressive proof of concept.

I am not sure all banks have made that decision yet.

Data & Disclosure

Market figures sourced from the Global AI in Banking Market Size, Trend & Opportunity Analysis Report, 2025-2035, published by Kaiso Research and Consulting. The author is Head of Marketing at Kaiso Research and Consulting and has a direct affiliation with this research.

Additional references: McKinsey & Company, The State of AI in Financial Services (2023). JPMorgan Chase-AWS partnership: public announcement, March 2025. BNP Paribas-Data Sense AI acquisition: public announcement, December 2024. Infosys FinAI Assist: public announcement, September 2024.
