Pete Warden has written a brief (less than fifty pages) but complete review of “Big Data”, describing more than sixty “terms”. For each term, Warden shares a little of his experience with big data, along with some suggestions about when you might use the tool described.
Who should read this book.
The book is a good starting point for anyone who has to deal with big data. As befits a glossary, it does not describe each term in depth, but it gives some hints about similar or ancestor tools and suggests when you may find it useful to explore that tool. Experienced readers may find the description of a well-known term too brief, but the glossary is so large that they can still find new tools to investigate.
In my opinion the book lacks a complete reference list, but a short internet search can make up for that defect.
How the book is structured.
The book is made up of eleven chapters. The first chapter introduces some basic terms (like Document-Oriented, Key/Value, MapReduce, Sharding) that are widely used throughout the rest of the book.
The second and third chapters list the terms related to how to access big data, with a NoSQL database or a MapReduce approach.
Chapters four and five describe where to store big data: storage (file systems) and servers. Most of the services and systems listed here are based on cloud computing.
Chapters six to eight contain terms related to big data processing, such as natural language processing or machine learning.
Chapter nine lists some tools or APIs useful for visualizing big data sets via graphs, maps or tables.
Chapter ten suggests some tools useful for coping with big data set acquisition. Data sets are often manually created or unstructured, like web pages, so the chapter focuses on data cleanup and automatic data extraction.
Serialization is the subject of the last chapter, which describes how to save data or send it across the network.
Book data sheet.
Title: Big Data Glossary
O’Reilly gave me a free copy of the book.