Nidhi Agarwal
Chief Model Risk Officer,Virgin Money
Industry
Banking
Our role
GenAI testing and assurance
Featuring
GenAI in regulated sectors
“With nascent technology, one of the first things you want to do is test it, to make sure it’s safe to adopt,” says Nidhi Agarwal, Virgin Money’s chief model risk officer.
For a regulated, risk-conscious industry such as banking, adopting an AI tool to boost staff productivity can feel daunting. “Even as recently as a year or two ago, people were saying you can’t test GenAI or Large Language Models,” says Chris Heys, who leads PwC’s AI and Modelling practice for the banking sector.
Playback of this video is not currently available
This is how Virgin Money and PwC began their GenAI journey together; by testing the untestable. Underpinning Virgin Money’s approach to the emerging technology is its ethos of ‘smart disruption’. Rather than waiting for others to pilot generative artificial intelligence (GenAI) in banking, the company aimed to embed a GenAI-assistant productivity tool directly into employees’ workflows - so testing to understand the risks was crucial.
Examples of use cases that could save hours of Virgin Money’s people’s time every week include creating new documents, converting code, and writing up meeting minutes. But there is still a “huge range” of use cases Agarwal is excited to explore.
“This tool will evolve – as it starts learning more about Virgin Money’s policies, documents, standards and data, it will become richer, more intelligent, giving us more accurate outcomes which we can more easily use in our decision making. Though the human-in-the-loop will always be important.”
Nidhi Agarwal,Chief Model Risk Officer, Virgin MoneyThe real benefit is handing over repetitive tasks to a tool that can do them in a fraction of a second. “This will free people up to do what they’re really good at – applying human judgement in evaluating the information they’re given,” Agarwal continues.
However, developing and using Large Language Models (LLMs) calls for careful oversight to balance the benefits with the potential risks – nowhere more so than in the regulated banking sector. “Financial services, as an industry, is very conservative. It takes a cautious approach to adopting new technology,” says Agarwal.
The challenge was to devise a way of testing to understand how a GenAI assistant works, where it boosted productivity, where it faltered, and how to mitigate risks - especially in the light of new regulation, such as the EU AI Act. “We wanted to be able to say, you can test it, you can understand it - and if you can do that, you can build trust in it,” says Heys.
“While we’ve spent years testing complex financial models and machine learning models - GenAI and LLMs have taken model testing to a whole new level of challenge,” Heys continues. “Testing GenAI is a brand-new problem; it deals with words not numbers, it's humanlike, it gives a different answer every time, the datasets are massive.”
Together, PwC and Virgin Money developed a testing framework and innovated testing technology, which helped the bank validate and implement the GenAI tool for its staff. The joint team – of data scientists, technologists, AI experts, prompt model risk and model validation experts - ran live tests to build a full picture of errors, hallucinations, bias and toxicity, using AI to test AI - or ‘LLM As-a-Judge' technology.
This complemented the more traditional testing approaches using statistical techniques and specialist human testing experts. While using AI to test AI might sound counter-intuitive, it is based on the empirical observation that LLMs perform better at evaluating responses than they do at creating responses. Using LLM As-a-Judge helps to scale and automate testing, allowing full testing to be completed in five weeks.
“It’s a very innovative technology,” says Agarwal, “but we shouldn’t forget the fact there is model risk involved in the use of this.” Indeed, after testing, the team found some of the risks could be easily controlled by staff, and that high accuracy levels depended on it being operated by well-trained people. The testing identified hallucinations in a few aspects of the GenAI tool that were outside of Virgin Money’s risk tolerance and so additional controls and guardrails were developed to address these risks - and satisfy regulatory due diligence.
The results of the team’s GenAI testing are now with the organisation’s AI Council, which brings together all the areas AI touches - testing, risk, commercial, compliance, and more - to weigh the value of each new use case.
With that ‘safe use’ framework in place, the business has laid the foundations for an AI-enabled future. “There’s no need to start from scratch, piloting each use case,” says Agarwal. She adds: “Virgin Money can now be much bigger, bolder and more ambitious in rolling out GenAI to all employees.”
With the right guidance and a better understanding of the GenAI assistant, staff trust in the tool is growing as the results improve. “It really felt like frontier work – showing that it can be done,” Heys says. “It was good being able to put GenAI in the hands of a whole organisation – democratise it, with everyone using it and starting to feel comfortable with it. Then you can see the productivity benefits coming through, right across the organisation.”
Nidhi Agarwal
Chief Model Risk Officer,