The Big Data Textbook
From clay tablets to lakehouses
The Big Data textbook is an ongoing effort to turn the content of the Big Data and Big Data for Engineers lectures taught at ETH Zurich into a textbook.
The latest version can be found on ResearchGate.
It can be shared, but please do so only by giving this URL: https://ghislainfourny.github.io/big-data-textbook/
A third edition with the content as of February 2026 is available for purchase as a color printed copy or on Kindle on Amazon US, Amazon DE, and others (change the country code in the URL).
It also remains available as a free download with the latest updates. This way, educators can use this material with peace of mind, knowing that all their students have access.
Note that the RumbleDB engine, used in my courses at ETH Zurich for exercises and in the final exam, is also free.
Current content (third edition, 2026):
- Introduction and motivation [slides]
- Lessons learned and SQL brushup [slides]
- Cloud storage [slides]
- Distributed file systems [slides]
- Syntax [slides]
- Wide column stores [slides]
- Data modeling and validation [slides]
- Massive parallel processing (MapReduce) [slides]
- Resource management [slides]
- Generic dataflow processing (Spark) [slides]
- Document stores [slides]
- Querying denormalized data [slides]
- Graph databases [slides]
- Data cubes [slides]
Upcoming chapters planned for the next edition (already available on YouTube):
- Wrap up [slides]
YouTube course recordings
All course recordings are available on YouTube.
Big Data
Big Data targets an audience in Computer Science and Data Science Master’s programmes.
The lecture page can be found here
Big Data for Engineers
Big Data for Engineers targets a very broad audience in all other departments at the BSc, MSc and PhD level. The material is very similar, but more time is spent explaining CS prerequisites. Some programming knowledge (such as Python) and knowledge of logic and algebra (sets, etc.) are assumed.
The lecture page can be found here
Information Systems for Engineers
These are the slides for my other course Information Systems for Engineers, focused on relational tables and SQL. This is material commonly taught at the Bachelor’s level in Computer Science programmes, but in this case repurposed for students with other backgrounds.
- Introduction [slides]
- The relational model [slides]
- Data definition with SQL [slides]
- The relational algebra [slides]
- Queries with SQL [slides]
- Database design theory [slides]
- Transactions and the three tiers [slides]
- Views and indices [slides]
- Data cubes [slides]
- Database architecture [slides]
- Outlook [slides]