Spark Exception Summary

1. Output directory already exists

diagnostics: Application application_1444384383185_2518 failed 2 times due to AM Container for appattempt_1444384383185_2518_000002 exited with  exitCode: 15 due to: Exception from container-launch: 
org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
        at org.apache.hadoop.util.Shell.run(Shell.java:418)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

main : command provided 1
main : user is yule
main : requested yarn user is yule

Container exited with a non-zero exit code 15
.Failing this attempt.. Failing the application.
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.default
         start time: 1447662442842
         final status: FAILED
         tracking URL: a01.master.spark.hadoop.qingdao.youku:8088/cluster/app/application_1444384383185_2518
         user: yule
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
        at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
        at org.apache.spark.deploy.yarn.Client.main(Client.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Solution: delete the output directory:

hadoop fs -rmr /workspace/yule/test/sparkoutput1
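
If you would rather handle this from inside the job than shell out, the output path can be removed through the Hadoop FileSystem API before writing. Below is a minimal Scala sketch of that approach; the path and object name are illustrative assumptions, not taken from the original log:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object CleanOutputDir {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CleanOutputDir"))

    // Hypothetical output path, for illustration only.
    val output = new Path("/workspace/yule/test/sparkoutput1")

    // Recursively delete the directory if it already exists, so that
    // saveAsTextFile does not fail with "output directory already exists".
    val fs = FileSystem.get(sc.hadoopConfiguration)
    if (fs.exists(output)) {
      fs.delete(output, true)
    }

    sc.parallelize(Seq("a", "b", "c")).saveAsTextFile(output.toString)
    sc.stop()
  }
}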

Problems encountered with Spark SQL:

http://www.cnblogs.com/shishanyuan/p/4723604.html?utm_source=tuicool

http://www.aboutyun.com/forum.php?mod=viewthread&tid=12358&page=1

http://dataunion.org/13433.html

http://www.csdn.net/article/2015-07-10/2825184

https://spark.apache.org/docs/latest/sql-programming-guide.html

【01】Simple Spark examples

http://my.oschina.net/scipio/blog/284957

Notes on understanding Spark SQL

http://www.360doc.com/content/14/0722/22/2459_396377522.shtml

Key resources:

Using Spark SQL

http://blog.csdn.net/escaflone/article/details/43272477

Using Spark Streaming

http://blog.csdn.net/escaflone/article/details/43341275

Spark MLlib

http://blog.csdn.net/escaflone/article/details/43371505

Scala self-study notes

http://blog.csdn.net/escaflone/article/details/43485345

Integrating Spark 1.3.1 with Hive for query analysis

In big data scenarios, anyone who has used Hive for statistical queries knows that its latency is very high: a single complex analytical query can take more than an hour to run, although that is still far faster than performing the same analysis in a relational database such as MySQL. HiveQL queries, written in a SQL-like syntax, are translated by Hive's query parser into MapReduce programs that run on the Hadoop platform, and the latency is inherent in how the MapReduce engine works: map-side intermediate results are written to files. If a HiveQL statement is very complex, it is translated into multiple MapReduce jobs, each spilling large amounts of intermediate map output to files, with essentially no data sharing between stages.

With Spark as the compute platform, computation is based on the Spark RDD dataset model, which reduces the overhead of writing intermediate results to files: Spark keeps data in memory so that subsequent operations can share it, cutting the latency caused by disk read/write I/O. In addition, the Spark on YARN deployment mode can fully exploit the locality of data on the Hadoop cluster's DataNode hosts, reducing the communication overhead of data transfer.

http://shiyanjun.cn/archives/1113.html
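
To make the in-memory sharing concrete, here is a minimal Scala sketch (the input path and parsing logic are invented for illustration): one RDD is cached and then reused by two actions, where an equivalent multi-job MapReduce pipeline would re-read the data from disk each time.

import org.apache.spark.{SparkConf, SparkContext}

object CacheExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CacheExample"))

    // Hypothetical input path, for illustration only.
    val logs = sc.textFile("/workspace/yule/test/input")

    // Filter once and keep the result in memory; both actions below
    // reuse the cached partitions instead of re-reading from HDFS.
    val errors = logs.filter(_.contains("ERROR")).cache()

    // First action materializes the cache.
    val total = errors.count()

    // Second action is served from memory, not from disk.
    val byHour = errors
      .map(line => (line.take(13), 1)) // first 13 chars: an assumed timestamp prefix
      .reduceByKey(_ + _)
      .collect()

    println(s"total errors: $total, distinct hours: ${byHour.length}")
    sc.stop()
  }
}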

A complete walkthrough of Spark's bundled examples

http://www.ibm.com/developerworks/cn/opensource/os-cn-spark-code-samples/index.html

Spark SQL: No TypeTag available for xxxx

The case class must be defined above the object, at the top level of the file.

If the case class is placed inside the object, the exception in the title is thrown.
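
A minimal sketch of the correct layout (the class and object names are illustrative): the case class sits at the top level, and the object that builds the DataFrame refers to it. Moving Person inside SqlApp reproduces the "No TypeTag available" error.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Correct: the case class is defined at the top level, outside the object.
// Defining it inside SqlApp would trigger "No TypeTag available for Person".
case class Person(name: String, age: Int)

object SqlApp {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SqlApp"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val people = sc.parallelize(Seq(Person("a", 30), Person("b", 25))).toDF()
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 26").show()

    sc.stop()
  }
}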

6: Tips. The sections above introduced basic Spark SQL usage. Spark SQL is still developing rapidly and has quite a few shortcomings, such as:

Scala 2.10.4 itself limits case classes to 22 fields, which is inconvenient when using an RDD as the data source;
sqlContext cannot join three tables in a single statement: you have to join two of them first and then join that result with the third (a workaround is sketched below);
sqlContext cannot insert data directly with VALUES;
...

Overall, hiveContext is quite satisfactory, while sqlContext is somewhat disappointing. As a side note, when writing sqlContext applications, the case class must be defined outside the object.

http://www.it165.net/database/html/201409/8106.html
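
As a concrete illustration of the pairwise-join workaround mentioned above, here is a minimal Scala sketch against the old sqlContext API (the table names, columns, and sample rows are invented): two tables are joined first, the intermediate result is registered as a temp table, and only then is the third table joined in.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PairwiseJoin {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PairwiseJoin"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical tables; in a real job these would come from files or Hive.
    sc.parallelize(Seq((1, "a1"))).toDF("id", "a").registerTempTable("t1")
    sc.parallelize(Seq((1, "b1"))).toDF("id", "b").registerTempTable("t2")
    sc.parallelize(Seq((1, "c1"))).toDF("id", "c").registerTempTable("t3")

    // Workaround: join two tables first and register the intermediate result...
    sqlContext.sql(
      "SELECT t1.id, t1.a, t2.b FROM t1 JOIN t2 ON t1.id = t2.id")
      .registerTempTable("t12")

    // ...then join it with the third table, instead of a single 3-way join.
    sqlContext.sql(
      "SELECT t12.id, t12.a, t12.b, t3.c FROM t12 JOIN t3 ON t12.id = t3.id")
      .show()

    sc.stop()
  }
}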
