The Big Data Textbook
The Big Data textbook is an ongoing effort to turn the content of the Big Data and Big Data for Engineers lectures taught at ETH Zurich into a textbook.
The latest version can be found here.
It can be shared, but please do so only by giving the URL https://ghislainfourny.github.io/big-data-textbook/
It is also available for purchase as a color-printed copy on Amazon US, Amazon DE, and others (change the country code in the URL); it will remain available as a free download with regular updates. This way, educators can use this material with peace of mind, knowing that all their students have access.
Note that the RumbleDB engine, used in my courses at ETH Zurich for exercises and in the final exam, is also free: https://www.rumbledb.org/
Current content (first edition):
- Introduction and motivation
- Lessons learned and SQL brushup
- Object storage
- Distributed file systems
- Syntax
- Wide column stores
- Data modeling and validation
- Massive parallel processing (MapReduce)
- Resource management
- Generic dataflow processing (Spark)
- Document stores
- Querying denormalized data
Upcoming chapters planned for the next edition (already available on YouTube):
- Graph databases
- Data warehouses and data cubes
- Wrap up
YouTube course recordings
All course recordings are available on YouTube.
Big Data
Big Data targets an audience in Computer Science and Data Science Master’s programmes.
The lecture page can be found here
Big Data for Engineers
Big Data for Engineers targets a very broad audience in all other departments at the BSc, MSc and PhD level. The material is very similar, but more time is spent explaining CS prerequisites. Some programming knowledge (such as Python) and knowledge of logic and algebra (sets, etc.) are assumed.
The lecture page can be found here