eDiscovery: Putting portable models to the test

There’s currently a buzz around portable models and the convenience this technology promises. But is the hype justified? We carried out two in-depth experiments to see whether the claims made for portable models and their benefits stand up to scrutiny.

Imagine you could train and build an Active Learning model on one data set, and then apply the model to an entirely different data set to identify documents most likely to be relevant based on historic work. The result would give your legal team a valuable head start on large and complex document reviews. This is what portable models aim to do.

Premise and potential

So, what is a portable model? In brief, a portable model is made up of themes that are weighted according to their influence. If a document in the data set contains several of those heavily weighted themes, then the model will predict that the document is relevant. The ‘machine learning’ determines what these themes and weightings should be, based on human decisions.
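The weighted-theme mechanism described above can be reduced to a very simple sketch. The theme names, weights and threshold below are all invented for illustration - a real model would learn thousands of themes and weights from human review decisions.

```python
# Illustrative sketch only: a portable model at its simplest, a set of
# themes with learned weights. All names and numbers here are invented.
THEME_WEIGHTS = {
    "contract termination": 2.5,
    "payment dispute": 1.8,
    "quarterly forecast": 0.3,
}
RELEVANCE_THRESHOLD = 3.0  # hypothetical cut-off

def predict_relevant(document_text: str) -> bool:
    """Sum the weights of the themes found in the document; predict
    'relevant' when the combined score clears the threshold."""
    score = sum(
        weight
        for theme, weight in THEME_WEIGHTS.items()
        if theme in document_text.lower()
    )
    return score >= RELEVANCE_THRESHOLD
```

A document containing several heavily weighted themes clears the threshold; one containing only a lightly weighted theme does not.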

A number of organisations that have built portable models claim their effectiveness increases cumulatively with exposure to more data. As the models are trained on documents from one case, then another, and so on, they become more intelligent and better able to identify relevant themes.

The premise is clearly attractive. But what about the practical application? To put portable models through their paces, our eDiscovery team carried out two controlled experiments. The first looked at what makes a ‘good’ portable model and the second gauged whether the model would give us a head start on a new set of similar matters. We went into this investigation with an open mind, rather than having hypotheses we wanted to validate. Here’s what we found.

Experiment one - portable models 101

We identified two similar matters where documents had already been manually reviewed and built a portable model on each, using different combinations of the settings available in the software. To measure performance and compare the model’s predictions against the control of human decisions, we used two industry-standard metrics: recall (a measure of completeness) and precision (a measure of accuracy).
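For readers less familiar with these two metrics, they can be computed directly from the sets of documents the model predicted as relevant and the documents the human reviewers actually marked as relevant. The document IDs below are illustrative.

```python
def recall_precision(predicted_relevant: set, actually_relevant: set):
    """Recall: what fraction of truly relevant documents did the model find?
    Precision: what fraction of the model's predictions were actually relevant?"""
    true_positives = len(predicted_relevant & actually_relevant)
    recall = true_positives / len(actually_relevant)
    precision = true_positives / len(predicted_relevant)
    return recall, precision

# Example: the model flags documents 1-4; reviewers marked 1, 2 and 5 relevant.
recall, precision = recall_precision({1, 2, 3, 4}, {1, 2, 5})
```

Here the model finds two of the three relevant documents (recall ≈ 0.67) but only two of its four predictions are correct (precision = 0.5) - the "high recall, low precision" pattern we observed.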

During this experiment we observed that when the portable model was applied to a new data set with no human training on the new matter, its recall was high, but precision was low. We also noted that the model contained client-specific information as themes. These needed to be manually ‘cleansed’ before they could be shared and used on another matter.
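The cleansing step mentioned above - removing client-specific themes before a model is shared - could in principle be partly automated. The sketch below is a hypothetical filter, not the software's actual cleansing feature: the pattern shown catches only obvious markers such as company suffixes and four-digit years, and real cleansing would still require human review.

```python
import re

# Hypothetical cleansing pass: drop themes that look client-specific
# (company suffixes, four-digit years) before the model is shared.
CLIENT_SPECIFIC = re.compile(
    r"\b(ltd|plc|inc)\b|\b\d{4}\b",
    re.IGNORECASE,
)

def cleanse(theme_weights: dict) -> dict:
    """Return only the themes that carry no obvious client-specific marker."""
    return {
        theme: weight
        for theme, weight in theme_weights.items()
        if not CLIENT_SPECIFIC.search(theme)
    }
```

Running this over a model containing themes like "acme ltd merger" or "q3 2019 results" would strip them while keeping generic themes such as "payment dispute".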


Portable models can be shared quickly and easily. Without human training on the new matter, however, we found that precision was poor. Performance can be improved by including metadata such as dates or file types, although an important date for one matter will not necessarily apply to another.

Experiment two - boosting review on a new matter

We identified three similar matters where documents had already been manually reviewed. We then built a model and trained it on each of the first two matters consecutively. To gauge the effectiveness of the model, we measured its performance when applied to the third matter. We also attempted training with different proportions of the review population in the third matter to assess the impact this had on the model’s effectiveness.

As in the first experiment, we noted that while recall was high, precision was low. To achieve the necessary levels of recall and precision, we had to use more than 25% of the population as training (with or without the portable model); this is likely due to the small proportion of relevant documents in the third matter. We observed that applying both a portable model and training made little difference to performance compared with conducting the training on its own.


The model didn’t add value over and above what could already be achieved with a standard Active Learning model/workflow.

Insurmountable hurdles

What do our experiments say about the prospects for portable models? Our findings raise questions over how well these models perform when they come up against practical challenges, ranging from data protection to recognising the intrinsic differences between one case and another.

Data protection

The portable model may contain client-sensitive data, which must be handled with care. Even if the data is cleansed, action must still be taken to avoid breaching data protection legislation, ensuring that information is only used for the purposes for which it was intended.

Each case is unique

It would be near impossible to build a portable model that works for all types of cases. For instance, the definition of relevancy in an anti-bribery and corruption matter would be quite different from that in a data breach matter. One way around this would be to build a library of portable models for different types of case. But even where two cases are of the same type, the themes that make up the portable model will include specific information that is not transferable, such as dates, individuals and company names.

Can these challenges be overcome?

Possibly, but the limitations this would place on the performance of the model mean that it would deliver little or no value beyond what could be achieved through Active Learning on its own.

In this article we’ve set out our findings. But what about your experiences and perspectives? We’d love to hear your thoughts as part of the ongoing debate on portable models - will they prove their worth as the technology evolves and improves? We hope so. Feel free to contact a member of the team below to discuss this topic further.

Contact us

Matt Joel

Partner, PwC United Kingdom

Tel: +44 (0)7809 552273

Christopher Dean

Cyber Security - Transformation Director, PwC United Kingdom

Tel: +44 (0)7810 635201
