The Big Data Textbook

Teaching Big Data at universities for both computer scientists and non-computer scientists



The Big Data textbook is an ongoing effort to put the content of the Big Data and Big Data for Engineers lectures taught at ETH Zurich into book form.

The latest version can be found here.

It can be shared, but please do so only by giving the URL: https://ghislainfourny.github.io/big-data-textbook/

It is also available for purchase as a color-printed copy on Amazon US, Amazon DE, and other Amazon stores (change the country code in the URL). It will remain available as a free download with regular updates. This way, educators can use this material with peace of mind, knowing that all their students have access.

Note that the RumbleDB engine, used in my courses at ETH Zurich for exercises and in the final exam, is also free: https://www.rumbledb.org/

Current content (first edition):
  1. Introduction and motivation
  2. Lessons learned and SQL brushup
  3. Object storage
  4. Distributed file systems
  5. Syntax
  6. Wide column stores
  7. Data modeling and validation
  8. Massive parallel processing (MapReduce)
  9. Resource management
  10. Generic dataflow processing (Spark)
  11. Document stores
  12. Querying denormalized data

Upcoming chapters planned for the next edition (already available on YouTube):
  1. Graph databases
  2. Data warehouses and data cubes
  3. Wrap up

YouTube course recordings

All course recordings are available on YouTube.

Big Data

Big Data targets an audience in Computer Science and Data Science Master’s programmes.

The lecture page can be found here

Big Data for Engineers

Big Data for Engineers targets a very broad audience in all other departments at the BSc, MSc and PhD level. The material is very similar, but more time is spent explaining CS prerequisites. Some programming knowledge (such as Python) and knowledge of logic and algebra (sets, etc.) is assumed.

The lecture page can be found here