Hadoop

Tutorial for setting up Hadoop

Start/Stop

# to start
$ start-dfs.sh
$ start-yarn.sh
# check that all daemons are running
$ jps
# to stop
$ stop-yarn.sh
$ stop-dfs.sh
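
The `jps` check can be automated with a small script. The following is a minimal sketch, not part of the original setup: the function names and the list of expected daemons are assumptions that fit a typical single-node install, so adjust them to your cluster.

```python
import subprocess

# Daemon names a single-node setup would typically show in `jps`.
# This set is an assumption; adjust it to your cluster layout.
EXPECTED_DAEMONS = {
    "NameNode",
    "DataNode",
    "SecondaryNameNode",
    "ResourceManager",
    "NodeManager",
}


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that do not appear in `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <ClassName>"; skip anything else.
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(set(expected) - running)


def check_local_daemons():
    """Run `jps` on this machine and report which expected daemons are absent."""
    output = subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    ).stdout
    return missing_daemons(output)
```

Calling `check_local_daemons()` after the start scripts returns an empty list when every expected daemon is up, and the names of the missing ones otherwise.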

Working with files in HDFS

# list the HDFS root directory
$ hadoop fs -ls /

# copy a local file to HDFS (with no destination given, it goes to your HDFS home directory)
$ hadoop fs -put filename.txt

# remove file
$ hadoop fs -rm filename.txt

# create dir
$ hadoop fs -mkdir /dirname

# copy file to dir
$ hadoop fs -put filename.txt /dirname

# copy a file from HDFS to the local disk
$ hadoop fs -get out_file local_filename

# remove folder recursively (-rmr is deprecated in favor of -rm -r)
$ hadoop fs -rm -r /dirname

# run a streaming job (the output folder must not exist yet!)
$ hadoop jar /usr/local/lib/hadoop-2.7.0/share/hadoop/tools/lib/hadoop-streaming-2.7.0.jar \
    -mapper mapper.py -reducer reducer.py \
    -file mapper.py -file reducer.py \
    -input /myinput -output /joboutput
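
The streaming job expects `mapper.py` and `reducer.py` to read lines from stdin and write tab-separated key/value lines to stdout. The sketch below shows one possible shape for that logic. It assumes, purely for illustration, that each input line has six tab-separated fields (date, time, store, item, cost, payment) and that the job sums the cost per store; the function names `map_lines` and `reduce_lines` are hypothetical, and your actual scripts may do something else entirely.

```python
#!/usr/bin/env python3
import sys


def map_lines(lines):
    """Yield "store<TAB>cost" for every well-formed input line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: date, time, store, item, cost, payment.
        if len(fields) != 6:
            continue  # skip malformed lines instead of failing the task
        store, cost = fields[2], fields[4]
        yield f"{store}\t{cost}"


def reduce_lines(lines):
    """Sum costs per key; the input must already be sorted by key."""
    current_key, total = None, 0.0
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        key, value = parts
        if key != current_key:
            # A new key starts: emit the finished group first.
            if current_key is not None:
                yield f"{current_key}\t{total:.2f}"
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        yield f"{current_key}\t{total:.2f}"


# In mapper.py the script body would be:
#     for out in map_lines(sys.stdin): print(out)
# and in reducer.py:
#     for out in reduce_lines(sys.stdin): print(out)
```

The reducer only compares each key with the previous one, so it keeps a single running total in memory. This works because Hadoop hands the reducer its input sorted by key.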

# generate a small test file from the first 50 lines of the data
$ head -50 hadoop_data/purchases.txt > test_file

# test the whole pipeline locally (sort stands in for Hadoop's shuffle step)
$ cat test_file | ./mapper.py | sort | ./reducer.py