Our role
GenAI testing and assurance
Our deliverable
Innovative testing technology and framework developed
The outcome
Banking productivity enhanced, risks mitigated
In the banking sector, adopting nascent technology such as GenAI is challenging because of regulatory scrutiny and the demands of risk management.
Virgin Money aimed to introduce a GenAI productivity tool and embed it directly across the organisation, enhancing efficiency by enabling staff to focus on tasks requiring human judgement instead of repetitive activities.
Despite initial scepticism about testing GenAI, PwC and Virgin Money believed it was possible to test the once untestable.
We worked with Virgin Money in a strategic partnership, developing pioneering testing techniques to build a picture of how the GenAI productivity tool behaved in Virgin Money’s environment, so that the benefits could be harnessed while the risks were understood and managed.
Together, we developed a comprehensive testing framework and harnessed 'LLM-as-a-Judge' technology, which uses one AI model to evaluate the outputs of another. This was complemented by statistical methods and human expertise.
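To make the idea concrete, here is a minimal Python sketch of the general LLM-as-a-judge pattern. It is not the framework built for Virgin Money: the names `Case`, `Verdict`, `stub_judge` and `evaluate` are hypothetical, and the judge is a simple keyword check standing in for a real call to a judge model with a grading rubric. Failed cases are set aside for human review, mirroring the combination of automated judging and human expertise described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    prompt: str      # what the assistant was asked
    output: str      # what the assistant produced (the text under test)
    reference: str   # key fact a human expert expects to see


@dataclass
class Verdict:
    passed: bool
    reason: str


# A judge is any callable that grades one case. In a real setup this would
# wrap a call to a judge LLM; that call is deliberately left out here.
Judge = Callable[[Case], Verdict]


def stub_judge(case: Case) -> Verdict:
    """Hypothetical stand-in for a judge model: pass if the expected
    reference text appears in the assistant's output."""
    ok = case.reference.lower() in case.output.lower()
    return Verdict(ok, "reference found" if ok else "reference missing")


def evaluate(cases: List[Case], judge: Judge) -> Tuple[float, List[Case]]:
    """Grade every case and return (pass rate, cases needing human review)."""
    if not cases:
        raise ValueError("at least one case is required")
    verdicts = [judge(c) for c in cases]
    pass_rate = sum(v.passed for v in verdicts) / len(verdicts)
    needs_review = [c for c, v in zip(cases, verdicts) if not v.passed]
    return pass_rate, needs_review


if __name__ == "__main__":
    # Illustrative, made-up cases; not data from the engagement.
    demo = [
        Case("Summarise the policy", "Notice period is 30 days.", "30 days"),
        Case("Summarise the policy", "Notice period is two weeks.", "30 days"),
    ]
    rate, review = evaluate(demo, stub_judge)
    print(f"pass rate: {rate:.0%}, flagged for review: {len(review)}")
```

Keeping the judge behind a plain callable means a rubric-driven model call could be swapped in without changing the aggregation or the human-review step.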
Over five weeks, the robust testing framework addressed errors, hallucinations, and biases, ensuring the tool met regulatory standards and Virgin Money's risk tolerance.
Working with Virgin Money’s model validation team, we rigorously tested 1,464 AI assistant outputs across various productivity tools and identified risks in AI accuracy, robustness, and bias. Implementing best-practice prompting boosted the AI assistant's accuracy from 40% to 85%.
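Accuracy figures like these are measured on a finite sample of outputs, so a statistical check usually reports an uncertainty range alongside the headline rate. The sketch below shows one common choice, the Wilson score interval for a proportion. It is an illustration only: the case study does not say which statistical methods were used, and the function name `wilson_interval` and the sample size in the usage example are assumptions.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a pass rate of successes / n.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical sample: 85 correct outputs out of 100 judged.
    # The sample size is assumed for illustration, not taken from the source.
    low, high = wilson_interval(85, 100)
    print(f"observed 85%, interval {low:.1%} to {high:.1%}")
```

A larger sample narrows the interval, which is one reason to judge outputs at scale rather than spot-check a handful.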
The results, shared with Virgin Money's AI Council, set the stage for a confidently expanded GenAI deployment. This pioneering effort democratised technology use, highlighted productivity improvements, and strengthened organisational confidence in GenAI, paving the way for an AI-enabled future.