Spark Usage Notes

1. RDD: Resilient Distributed Dataset

http://developer.51cto.com/art/201309/410276_1.htm
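As a minimal sketch of the RDD idea (assuming a running spark-shell, where `sc` is the predefined SparkContext; the sample data is invented here): an RDD is created from storage or a local collection, transformed lazily, and only materialized when an action runs.

```scala
// Inside spark-shell; `sc` is provided by the shell.
// Build an RDD from a local collection (hypothetical sample data).
val nums = sc.parallelize(1 to 10)

// Transformations are lazy: nothing executes yet.
val evens = nums.filter(_ % 2 == 0).map(_ * 10)

// Actions trigger computation and return results to the driver.
evens.collect()   // Array(20, 40, 60, 80, 100)
```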

2. Using spark-shell

./spark-shell --driver-library-path /usr/local/hadoop-1.1.2/lib/native/Linux-i386-32:/usr/local/hadoop-1.1.2/lib/native/Linux-amd64-64:/usr/local/hadoop-1.1.2/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar

3. WordCount program

val file = sc.textFile("hdfs://192.168.100.99:9000/user/chaobo/test/tmp/2014/07/07/hive-site.xml.lzo")

val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

Print the result to the screen: count.collect()

Write the result to HDFS: count.saveAsTextFile("hdfs://192.168.100.99:9000/user/chaobo/result_20140707"). The last directory in the path must not already exist.
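The flatMap/map/reduceByKey pipeline above mirrors ordinary Scala collection operations. As a local sketch of what it computes (the sample lines are invented here, and `groupBy` plus a sum stands in for Spark's `reduceByKey`):

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines standing in for the HDFS input file.
    val lines = List("spark makes wordcount easy", "wordcount counts words")

    val counts = lines
      .flatMap(line => line.split(" "))   // split each line into words
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy { case (word, _) => word } // local stand-in for reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

    println(counts("wordcount"))  // 2
    println(counts("spark"))      // 1
  }
}
```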

4. Start the master node

../sbin/start-master.sh

5. Start a worker node

../sbin/start-slave.sh --webui-port 8081
