Review of “Big Data Glossary” by Pete Warden (O’Reilly Media)


Pete Warden has written a brief (fewer than fifty pages) but complete review of “Big Data”, describing more than sixty “terms”. For each term, Warden shares a little of his experience with big data and suggests when you might use the tool or technique described.

Who may read this book.

The book is a good starting point for anyone who has to deal with big data. As a glossary should, it does not describe each term in depth, but it gives hints about similar or ancestor tools and suggests when you may find it useful to explore a given tool. Experienced readers may find the description of a well-known term too brief, but the glossary covers so much ground that they can still find new tools to investigate.

In my opinion the book lacks a complete list of references, but a short internet search can make up for that.

As one might expect, most of the terms in the glossary come from Google, Yahoo, LinkedIn or Facebook labs, and they are supported by the Apache Foundation. Surprisingly, at least for me, Java and JavaScript are often the languages used by the tools described.

How the book is structured.

The book is made up of eleven chapters. The first chapter introduces some basic terms (like Document-Oriented, Key/Value, MapReduce, Sharding) that are used widely throughout the rest of the book.
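
To make two of those first-chapter terms concrete, here is a minimal single-process sketch of the MapReduce idea (a word count) and of sharding by key. It is my own illustration, not code from the book: the function names and the choice of a CRC32 hash for picking a shard are assumptions, and a real framework would spread the same phases over many machines.

```python
from collections import defaultdict
from zlib import crc32


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


def shard_for(key, shard_count):
    """Sharding: pick the partition for a key with a stable hash (assumed scheme)."""
    return crc32(key.encode("utf-8")) % shard_count
```

Calling `word_count(["big data", "big tools"])` counts "big" twice and the other words once, while `shard_for("user:42", 4)` always sends the same key to the same one of four shards.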

The second and third chapters list the terms related to accessing big data, using either a NoSQL database or a MapReduce approach.

Chapters four and five describe where to store big data: storage (file systems) and servers. Most of the services and systems listed here are based on cloud computing.

Chapters six to eight contain terms related to big data processing, like natural language processing or machine learning.

Chapter nine lists some tools and APIs useful for visualizing big data sets as graphs, maps or tables.

Chapter ten suggests some tools useful for acquiring big data sets. Data sets are often created manually or are unstructured, like web pages, so the chapter focuses on data cleanup and automatic data extraction.

Serialization is the subject of the last chapter, which describes how to save data or send it across the network.
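
As a small illustration of what serialization means in practice, the sketch below turns an in-memory record into bytes that could be written to disk or sent over a network, and then reads it back. JSON is my own choice for the example and the function names are hypothetical; this is not a format or an API taken from the book.

```python
import json


def serialize(record):
    """Encode a dictionary as UTF-8 JSON bytes, ready to save or transmit."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Decode UTF-8 JSON bytes back into a dictionary."""
    return json.loads(payload.decode("utf-8"))
```

A record such as `{"title": "Big Data Glossary", "pages": 60}` survives the round trip through `serialize` and `deserialize` unchanged.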

Book data sheet.


Title: Big Data Glossary
Author: Pete Warden
Publisher: O’Reilly Media
Print: September 2011
Ebook: September 2011
Pages: 60
Print ISBN: 978-1-4493-1459-0
Ebook ISBN: 978-1-4493-1458-3

This review was written as part of the O’Reilly Blogger Review Program.

O’Reilly gave me a free copy of the book.
