Data lakes: Not waving but drowning?

Making the ‘Big Data’ opportunity work for your organisation can turn out to be quite hard. Connecting data science to better decision-making while still accommodating the many flavours of reporting and data discovery is complicated. ‘Data Lakes’ were supposed to be the great hope for businesses swamped by vast volumes of data and a vast number of subtly different use cases.

Lakes are organic and unstructured: exactly the opposite of the old concept of a ‘data warehouse’, which implies a highly structured system with everything in it logged and classified. Traditional data storage worked like an old-fashioned library, where the librarian’s ability to retrieve the information you want relies on each volume being kept in a specific place where it can easily be found. The minute a book is misplaced, the search falls down.

In the original hype around data lakes, there was no need to classify all the books: they could be put back in any order and on any shelf, because the librarian remembers where they are regardless. The data structure is created when the question is asked. The technology behind this grew out of the systems Google built to index the web. It is very fast, and it can handle vast amounts of unstructured data in all sorts of different formats. I have heard many clients breathe a sigh of relief that all of their data challenges are over…
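For readers who want to see what "the structure is created when the question is asked" (often called schema-on-read) looks like in practice, here is a minimal Python sketch. The in-memory `RAW_LAKE` list, the `read_with_schema` helper and the `field_map` convention are all hypothetical illustrations, not any particular product's API: raw records are stored exactly as they arrived, and a structure is imposed only at query time.

```python
import json

# Hypothetical "lake": raw records kept exactly as they arrived,
# in differing shapes, with no schema imposed when they were written.
RAW_LAKE = [
    '{"customer": "A", "amount": 120.0, "channel": "web"}',
    '{"customer": "B", "amount": "75.5"}',
    '{"cust_id": "C", "value": 40}',
    'not valid json',
]


def read_with_schema(raw_records, field_map):
    """Apply a schema at query time (schema-on-read).

    field_map maps each output column to the candidate source keys that
    might hold it; the first key present wins. Records that cannot be
    parsed, or that lack any requested column, are skipped.
    """
    rows = []
    for line in raw_records:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # bad input is ignored at read time, not rejected at load time
        row = {}
        for column, candidates in field_map.items():
            for key in candidates:
                if key in record:
                    row[column] = record[key]
                    break
        if len(row) == len(field_map):
            rows.append(row)
    return rows


# The question decides the structure: "how much did each customer spend?"
spend = read_with_schema(
    RAW_LAKE,
    {"customer": ["customer", "cust_id"], "amount": ["amount", "value"]},
)
total = {r["customer"]: float(r["amount"]) for r in spend}
print(total)  # {'A': 120.0, 'B': 75.5, 'C': 40.0}
```

The flexibility cuts both ways: if a different analyst maps `amount` to `["amount"]` alone, customer C silently drops out, so two people asking the "same" question of the same lake can get different answers.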

But some blockers remain, not least that you might not always get exactly the same answer to the same question. This is a big deal if your data lake is supposed to be providing operational reporting, or is holding the record of what I have bought from your company!

In practice, then, you find that you need islands of permanent structure in the lake: something that looks like our old-fashioned library. To realise the potential, you probably also need to think hard about the end-game. A set of reusable data assets available to any application is possible. But for a car manufacturer, say, it takes hard thinking to accommodate supply chain data, machine tool input, climate data and social sentiment, to name just a few.

Ultimately, data lakes will transform data management and the way companies use information. By separating data from applications, the approach offers the opportunity to turn big companies into agile companies. At this stage in their evolution, though, that still takes considerable skill in understanding and shaping data, as well as an in-depth understanding of the industry and functional context. That’s what PwC can help you achieve.