Perl, Hive and Pig, Oh My! Hadoop for Perl programmers

By Jud Dagnall (jud)
Date: Wednesday, 5 June 2013 09:00
Duration: 45 minutes
Target audience: Intermediate
Language: English
Tags: etl hadoop hive perl pig sql


At WhiteHat, we aggregate hierarchical time-series data as part of a data warehouse ETL pipeline. Hive and Pig are higher-level languages for creating MapReduce jobs in Hadoop. Hive provides an SQL-like interface to your data, while Pig is a simple but powerful data processing language. By taking advantage of the streaming capability built into both Hive and Pig, Perl developers can easily hook into an existing Hadoop infrastructure from the comfort and safety of their own language.

In this talk, we will present a brief overview of Hive, Pig and MapReduce, and then explore practical examples of how we can use Perl + Hadoop to solve some real-world problems. Along the way, we'll encounter tips for packaging your Perl code for Hadoop, see how different data types appear to our scripts, use the MapReduce framework to simplify our tasks, and learn to avoid some common pitfalls.
