The Big Data Textbook

From clay tablets to lakehouses

The Big Data Textbook is an ongoing effort to collect, in book form, the content of the Big Data and Big Data for Engineers lectures taught at ETH Zurich.

The latest version can be found on ResearchGate.

It can be shared, but please do so only by giving the URL: https://ghislainfourny.github.io/big-data-textbook/

A second edition, with the content as of August 30, 2024, will soon be available for purchase as a color printed copy or on Kindle from Amazon US, Amazon DE, and others (change the country code in the URL).

It also remains available as a free download with the latest updates. This way, educators can use this material with peace of mind, knowing that all their students have access.

Note that the RumbleDB engine, used in my courses at ETH Zurich for exercises and in the final exam, is also free: https://www.rumbledb.org/

Current content (second edition, 2024):

  1. Introduction and motivation
  2. Lessons learned and SQL brushup
  3. Cloud storage
  4. Distributed file systems
  5. Syntax
  6. Wide column stores
  7. Data modeling and validation
  8. Massive parallel processing (MapReduce)
  9. Resource management
  10. Generic dataflow processing (Spark)
  11. Document stores
  12. Querying denormalized data
  13. Graph databases

Upcoming chapters planned for the next edition (already available on YouTube):

  14. Data warehouses and data cubes
  15. Wrap up

YouTube course recordings

All course recordings are available on YouTube.

Big Data

Big Data targets an audience in Computer Science and Data Science Master’s programmes.

The lecture page can be found here

Big Data for Engineers

Big Data for Engineers targets a very broad audience in all other departments at the BSc, MSc, and PhD levels. The material is very similar but spends more time explaining CS prerequisites. Some programming knowledge (such as Python) and knowledge of logic and algebra (sets, etc.) are assumed.

The lecture page can be found here