TFF #026 - Data Lakes vs Data Warehouses vs Data Cubes: Which is Best?

1-2-3 Newsletter - Read Time - 5 Mins

Hello, and welcome to another edition of the Tech for Finance Newsletter where we have:

  • 1 - Data Lakes vs Data Warehouses vs Data Cubes: Which is best? Tutorial
  • 2 - Expert resources on how to use data to scale revenue + 50 finance cheat sheets for you to get stuck into + BONUS
  • 3 - Tech tools for you to help you chat with connected apps in Slack, extract insight from CSVs, and create documents in rapid time.

1 - Data Lakes vs Data Warehouses vs Data Cubes: Which is best?

This week I was lucky to speak with Abhijeet Sarkar - CEO at TypeSift / total data pro...

He did a great job of breaking down how you should be thinking about your data.

Data Lakes:

The term "data lake" refers to a repository for raw data, the purpose of which is to store all of your structured and unstructured data at any scale. You can store your data as-is, without having to structure it first.
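To make "store as-is, structure later" a bit more concrete, here's a minimal Python sketch. Everything in it is made up for illustration (the records, the field names, the `read_amounts` helper), and a real data lake would sit on cloud storage rather than in a list, but the idea of only applying structure when you read the data is the same.

```python
import json

# Hypothetical raw records, kept exactly as they arrived from different systems.
raw_lake = [
    '{"source": "crm", "customer": "Acme Ltd", "deal_value": "1200.50"}',
    '{"source": "erp", "cust_name": "Acme Ltd", "invoice_total": 950}',
    "free-text note: Acme asked about renewal pricing",
]


def read_amounts(lake):
    """Apply structure at read time: pull out any monetary amount we recognise."""
    amounts = []
    for line in lake:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Unstructured text stays in the lake; this reader just skips it.
            continue
        for field in ("deal_value", "invoice_total"):
            if field in record:
                amounts.append(float(record[field]))
    return amounts


amounts = read_amounts(raw_lake)
print(amounts)
```

Notice that nothing was cleaned up before it went in. That's the convenience of a lake, and also why it can get messy if nobody keeps track of what's in there.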

However, unstructured data can be messy and you may end up with a "data swamp", i.e. a data lake that is not well-managed or organized.

There are now more data lake tools available, and many are equipped with AI to help you make sense of your data without having to structure it.

A couple of examples are:

Data Warehouses:

A data warehouse is a design model and a blueprint of how your data fits together. It's where all your data comes together and is cross-connected from different places, such as your CRM, ERP, marketing system, etc. It acts as a central source of truth and the system of record.
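As a rough illustration of data being "cross-connected", here's a small sketch using Python's built-in `sqlite3` module as a stand-in for a warehouse. The table names, columns, and figures are all hypothetical, and a real warehouse would be a much bigger platform, but the principle of joining CRM and ERP data on a shared key is the same.

```python
import sqlite3

# An in-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE erp_invoices (
        invoice_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO crm_customers VALUES (1, 'Acme Ltd'), (2, 'Globex');
    INSERT INTO erp_invoices VALUES (10, 1, 500.0), (11, 1, 250.0), (12, 2, 100.0);
""")

# Cross-connect the two systems on the shared customer_id key.
revenue_by_customer = conn.execute("""
    SELECT c.name, SUM(i.amount)
    FROM crm_customers AS c
    JOIN erp_invoices AS i ON i.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(revenue_by_customer)
```

The structure is agreed up front here (tables, keys, column types), which is the main difference from the "store it as-is" approach of a lake.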

Data warehouse projects can be very expensive and risky. If implemented correctly, the return is amazing, but if not, the downside can be quite large.

Modern tools are making it easier to deploy data warehouses. Some tools, like TypeSift, deploy a data warehouse on your behalf.

Data warehouses are more of a long-term investment, especially if your data volume is likely to increase dramatically in the next 5-10 years.

A couple of examples are:

My prediction is that data lakes / warehouses will morph together in the future (we're already seeing this) as tools evolve to make data management easier.

Data Cubes:

A data cube is a multi-dimensional ("n-D") data structure that allows faster data processing. It's usually pictured as a three-dimensional (3D) representation of data, although it can have more or fewer dimensions. The dimensions of the cube represent data columns, and the cells in the cube represent data points.

The concept of a data cube was compared to a Rubik's cube, where once you have all the connecting points together, you can just twist the cube to make the linkages between the data in the different systems.

The idea of a data cube was a predominant design in the 90s. If you're just measuring sales, that's a point. If you're measuring sales over time, that's a line (one dimension). If you're doing sales by time and by geography, you now have two dimensions (a plane). If you're now throwing in sales by time, by geography, by category or product, you're now in three dimensions (a cube). If you keep adding more dimensions, you've got what's called a hypercube.
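The point / line / plane / cube progression above can be sketched in a few lines of Python. The sales figures, the dimension names, and the `roll_up` helper are all hypothetical, and real cube tools pre-compute and index these totals rather than looping over rows, so treat this as a sketch of the idea only.

```python
from collections import defaultdict

# Hypothetical sales facts: (month, region, product, amount).
sales = [
    ("2023-01", "UK", "Widgets", 100),
    ("2023-01", "US", "Widgets", 150),
    ("2023-02", "UK", "Gadgets", 200),
    ("2023-02", "UK", "Widgets", 50),
]
DIMENSIONS = ("month", "region", "product")


def roll_up(facts, keep):
    """Total sales by the dimensions named in `keep`, summing over the rest."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for *dims, amount in facts:
        totals[tuple(dims[i] for i in positions)] += amount
    return dict(totals)


total = roll_up(sales, [])                            # a point
by_month = roll_up(sales, ["month"])                  # a line
by_month_region = roll_up(sales, ["month", "region"])  # a plane
cube = roll_up(sales, ["month", "region", "product"])  # a cube

print(total)
print(by_month)
print(by_month_region)
```

Each extra name passed in `keep` adds a dimension, and adding a fourth column to the facts (say, sales channel) would take you into hypercube territory.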

A data cube is more of a concept than a platform, and the tools I've mentioned previously tend to take a 'cube' like approach when analysing your data.


If you've got data in different systems, there's potential benefit in pulling that data into a data lake.

Once it's there, depending on the tool, you can start deploying AI and machine learning to start driving insights.

You don't always need a data lake though, as some planning / data tools will accept data from multiple sources without needing either a lake or a warehouse.

As with all of these things, it depends on the end goal.

Are we trying to quickly analyse a small amount of data?

Are we trying to build a warehouse of large amounts of data from different sources?

If you're unsure, you know where I am if you want to talk things through...

2 - Expert Resources

1. Podcast Interview

I'd recommend checking out the full discussion with Abhijeet. Podcast Ep.027 'How to use data to scale revenue.'

We talk about data, but we also talk about:

  • Why digital transformation is irreversible
  • How to use data to scale revenue
  • Automating low value activities
  • Abhijeet’s favourite tech
  • The evolution of data
  • And much more…

Subscribe to the podcast here, and watch on YouTube here.

2. 50+ Finance Cheat Sheets

My good friend Nicolas Boucher released a compilation of 50+ finance cheat sheets today, and it's already got more than 1,200 interactions!

It covers everything from technical finance, to careers, to data and ChatGPT.

You can find it here.

BONUS - Ultimate GPT Framework Cheat Sheet

If you've not seen it, I've also produced my own cheat sheet off the back of my Ultimate GPT Framework for Finance Guide.

You can find the full 51 point cheat sheet here.

3 - Tech Tools

1. Jigso Sidekick - Using Slack and want to be able to chat with your connected apps? Sidekick will help you do just that - learn more here.

2. Uing - Extract meaningful insight from CSV files - learn more here.

3. Formzil - Create documents quickly using AI - learn more here.

IMPORTANT - Always read the data policy for any tools you try. If you’re ever unsure, make sure your data, and any customer or supplier data, is anonymised.

NOTE - I am not affiliated with any of the tools in this e-mail.

So that’s it for this week.

Until next time.



When you’re ready, there are 2 ways I can help you:

1. Get yourself a copy of ‘The Ultimate GPT Framework for Finance: ChatGPT, AutoGPT, Python and Beyond’ here. (use the code NEWS25 at checkout for 25% off)

PLUS - I'm now offering a free 30 minute consulting session on how to put the guide into practice with every purchase.

2. If you’re looking to learn more from industry experts - Subscribe to the Tech for Finance Podcast here, and the YouTube channel here.

Know someone else that'll like this? It'd mean a lot to me if you could forward on 🙂


tech for finance

©2022 by Adam Shilton. Privacy Policy - Terms of Use
