Review of “Big Data Glossary” by Pete Warden (O’Reilly Media)

Pete Warden has written a brief (less than fifty pages) but comprehensive overview of “Big Data”, describing more than sixty terms. For each term Warden shares a little of his own experience with big data, along with suggestions about when you might use the tool or technique described.

Who should read this book.

The book is a good starting point for anyone who has to deal with big data. As befits a glossary, the terms are not described in depth, but each entry gives some hints about similar or predecessor tools and suggests when it may be worth exploring that tool. Experienced readers may find the description of a well-known term too brief, but the glossary is broad enough that they can still discover new tools to investigate.

In my opinion the book lacks a complete reference list, but a quick internet search can make up for that shortcoming.

As one might expect, most of the terms in the glossary come from the labs of Google, Yahoo, LinkedIn or Facebook, and are supported by the Apache Software Foundation. Surprisingly, at least to me, Java and JavaScript are often the languages used by the tools described.

How the book is structured.

The book consists of eleven chapters. The first chapter introduces some basic terms (such as Document-Oriented, Key/Value, MapReduce and Sharding) that are used widely throughout the rest of the book.

The second and third chapters list the terms related to accessing big data, through either NoSQL databases or the MapReduce approach.

Chapters four and five describe where to store big data: storage (file systems) and servers. Most of the services and systems listed here are based on cloud computing.

Chapters six to eight contain terms related to big data processing, such as natural language processing and machine learning.

Chapter nine lists some tools and APIs useful for visualizing big data sets as graphs, maps or tables.

Chapter ten suggests some tools for coping with the acquisition of big data sets. Datasets are often created manually or are unstructured, like web pages, so the chapter focuses on data cleanup and automatic data extraction.

Serialization is the subject of the last chapter, which describes how to save data or send it across the network.

Book data sheet.

Title: Big Data Glossary
Author: Pete Warden
Publisher: O’Reilly Media
Print: September 2011
Ebook: September 2011
Pages: 60
Print ISBN: 978-1-4493-1459-0
Ebook ISBN: 978-1-4493-1458-3

This review was made as part of the O’Reilly Blogger Review Program.

O’Reilly gave me a free copy of the book.


Hello world!

Hi all.

“Per un pugno di fagioli”, Italian for “for a fistful of beans”, is my tech blog, where I will share what I come across while working, primarily with Java. I will keep it like a diary where I put in my two cents, without aiming to discover silver bullets.

As the blog title says, I’m Italian, but I will try to write in English, the developer lingua franca.

I hope this blog will be useful to someone; in any case, it will be useful to me 🙂.

See you soon.

Minimal Hello World

It is said that a language is not general purpose if it needs a large number of statements to print Hello World on screen.

But in Java, how many characters make up the smallest class that prints “Hello World” on screen? The answer is 21.

Class B holds this record.

public class B extends A {}

This is possible because B inherits all the public and protected methods of class A, and therefore the main method as well.

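public class A {
  public A() {
  }

  // Prints the greeting; B inherits this method as well.
  public void sayHello() {
      System.out.println("Hello World");
  }

  // Entry point: running "java B" also lands here, because B inherits main from A.
  public static void main(String[] args) {
     A a = new A();
     a.sayHello();
  }
}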
The style is not the best, but I consider it a nice example for explaining inheritance.
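A quick way to check the inheritance claim is to call main through B and ask the reflection API where that method is actually declared. The sketch below is only an illustration: it assumes all three classes live in a single file named InheritedMainDemo.java (so A and B are package-private here rather than public), and the driver class InheritedMainDemo is hypothetical, added just for this check.

```java
import java.lang.reflect.Method;

// A mirrors the post's class: it declares sayHello() and main().
class A {
    public void sayHello() {
        System.out.println("Hello World");
    }

    public static void main(String[] args) {
        new A().sayHello();
    }
}

// B declares no members of its own; everything it exposes comes from A.
class B extends A {}

// Hypothetical driver class, added only for this demonstration.
public class InheritedMainDemo {
    public static void main(String[] args) throws NoSuchMethodException {
        // A static method of A is also a member of B, so B.main resolves to A.main.
        B.main(new String[0]);

        // getMethod() searches superclasses too, so it finds A's main through B.
        Method main = B.class.getMethod("main", String[].class);
        System.out.println("main declared in: " + main.getDeclaringClass().getSimpleName());
    }
}
```

Compiled with javac InheritedMainDemo.java and run with java InheritedMainDemo, it should print “Hello World” followed by “main declared in: A”. Running java B directly also prints “Hello World”, since the launcher looks up main on B and finds the one inherited from A.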