Apache Pig for batch data analysis over Hadoop

These days I’m playing with Apache Pig to run data analysis on Apache Hadoop. As an example, I generated a word cloud from the top word counts of nouns in the Italian translation of the Bible.

Copy the file book.txt to the Hadoop Distributed File System (HDFS) with:

    hadoop-2.4.0/bin/hdfs dfs -copyFromLocal -f book.txt

Test the Pig job locally with:

    pig-0.13.0/bin/pig -x local wordcount.pig

Run the Pig job on Hadoop with:

    pig-0.13.0/bin/pig -x mapreduce wordcount.pig

Look at the results with:

    hadoop-2.4.0/bin/hdfs dfs -cat book-wordcount/part* | more

Copy the results to a local file with hadoop-2....
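The contents of wordcount.pig are not shown here. A typical Pig word count loads the text, splits each line into words (FLATTEN over the built-in TOKENIZE is the usual way), groups by word, counts each group and orders by count descending. To sanity-check the job's output on a small sample before running it on the cluster, a single-machine Python sketch of that same pipeline can help. This is an assumption, not the author's script: the tokenization (lowercase, runs of letters only) will not match Pig's exactly, there is no noun filtering, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters only (Unicode-aware), so accented
# Italian words such as "creò" stay whole; digits, "_" and punctuation split.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_counts(lines):
    """Tokenize each line (lowercased) and count every word.

    Plays the role of FLATTEN(TOKENIZE(...)) followed by GROUP ... BY word
    and COUNT in a Pig script.
    """
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


def top_words(lines, n=10):
    """Return the n most frequent (word, count) pairs, highest count first.

    Roughly ORDER ... BY count DESC followed by LIMIT n; ties keep the
    order in which words were first seen.
    """
    return word_counts(lines).most_common(n)


def as_tab_separated(pairs):
    """Format (word, count) pairs as "word<TAB>count" lines.

    Tab is PigStorage's default field delimiter, so, assuming the script
    stores (word, count) tuples with the default PigStorage, these lines
    have the same layout as rows in the job's part-* output files.
    """
    return [f"{word}\t{count}" for word, count in pairs]
```

For example, opening book.txt as UTF-8 and passing the file handle to top_words with n=20 gives the 20 most frequent words, which can be compared by eye with what hdfs dfs -cat prints for book-wordcount/part*.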

August 25, 2014 · Matteo Redaelli