Spark usage notes
1. RDD: Resilient Distributed Dataset
http://developer.51cto.com/art/201309/410276_1.htm
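A minimal sketch of the RDD model in spark-shell (the sample numbers are illustrative, not from the original notes): transformations such as map are lazy and only describe a new RDD, while actions such as collect trigger the actual computation.

val nums = sc.parallelize(1 to 5)      // build an RDD from a local collection
val squares = nums.map(n => n * n)     // transformation: lazy, returns a new RDD
squares.collect()                      // action: runs the job, returns Array(1, 4, 9, 16, 25)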
2. Using spark-shell
./spark-shell --driver-library-path /usr/local/hadoop-1.1.2/lib/native/Linux-i386-32:/usr/local/hadoop-1.1.2/lib/native/Linux-amd64-64:/usr/local/hadoop-1.1.2/lib/hadoop-lzo-0.4.17-SNAPSHOT.jar
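Here --driver-library-path adds extra entries (in this setup, the Hadoop native library directories and the hadoop-lzo jar) to the driver's native library path so that LZO-compressed input, as in the example below, can be read; the exact paths depend on the local Hadoop installation.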
3. WordCount program
val file = sc.textFile("hdfs://192.168.100.99:9000/user/chaobo/test/tmp/2014/07/07/hive-site.xml.lzo")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
Print the result to the screen: count.collect()
Write the result to HDFS: count.saveAsTextFile("hdfs://192.168.100.99:9000/user/chaobo/result_20140707"). The last path component (the output directory) must not already exist.
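For reference, the same job as a standalone Spark 1.x application, a minimal sketch (the object name WordCount and the use of args(0)/args(1) for the input and output paths are illustrative assumptions, not from the original notes):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    val file = sc.textFile(args(0))                  // input path, e.g. an HDFS URI
    val count = file.flatMap(line => line.split(" "))
                    .map(word => (word, 1))
                    .reduceByKey(_ + _)
    count.saveAsTextFile(args(1))                    // output directory must not already exist
    sc.stop()
  }
}

Submit it with spark-submit, passing the input and output paths as the two program arguments.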
4. Start the master node
../sbin/start-master.sh
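Once started, the standalone master by default serves a web UI on port 8080 and accepts workers at a URL of the form spark://<master-host>:7077, where <master-host> is a placeholder for the machine running the master.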
5. Start a worker node
../sbin/start-slave.sh --webui-port 8081
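Note: depending on the Spark version, start-slave.sh may also require the master URL as an argument (e.g. spark://<master-host>:7077, with <master-host> a placeholder for the actual master) so the worker knows which master to register with.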