Spring XD Reference Guide
http://docs.spring.io/spring-xd/docs/1.0.0.M2/reference/html/
Reference Guide
Introduction
Overview
Spring XD is a unified, distributed, and extensible service for data ingestion, real time analytics, batch processing, and data export.
The Spring XD project is an open source project, licensed under the Apache 2 License, whose goal is to tackle big data complexity and simplify the development of big data applications.
Much of the complexity in building real-world big data applications is related to integrating many disparate systems into one cohesive solution across a range of use-cases.
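Common use cases encountered in creating a comprehensive big data solution are
High throughput distributed data ingestion from a variety of input sources into big data stores such as HDFS or Splunk.
Real-time analytics at ingestion time, for example gathering metrics and counting values.
Workflow management via batch jobs. The jobs combine interactions with standard enterprise systems (e.g. RDBMS) and Hadoop operations (e.g. MapReduce, HDFS, Pig, Hive or Cascading).
High throughput data export, e.g. from HDFS to a RDBMS or NoSQL database.
The Spring XD project aims to provide a one stop shop solution for these use-cases.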
Getting Started
Requirements
To get started, make sure your system has as a minimum Java JDK 6 or newer installed. Java JDK 7 is recommended.
Download Spring XD
http://repo.spring.io/simple/libs-milestone-local/org/springframework/xd/spring-xd/1.0.0.M4/spring-xd-1.0.0.M4-dist.zip
Unzip the distribution. This yields the installation directory spring-xd-1.0.0.M4.
All the commands below are executed from this directory, so change into it before proceeding.
cp spring-xd-1.0.0.M4-dist.zip /opt/
cd /opt/
unzip spring-xd-1.0.0.M4-dist.zip
drwxr-xr-x 7 root root 4096 Nov 12 13:39 spring-xd-1.0.0.M4/
$ cd spring-xd-1.0.0.M4
Set Environment Variables
Set the environment variable XD_HOME to the installation directory <root-install-dir>\spring-xd\xd
vi /etc/profile
export XD_HOME=/opt/spring-xd-1.0.0.M4/xd
source /etc/profile
root@Master:/etc# echo $XD_HOME
/opt/spring-xd-1.0.0.M4/xd
Install Spring XD
Spring XD can be run in two different modes. There’s a single-node runtime option for testing and development, and there’s a distributed runtime which supports distribution of processing tasks across multiple nodes.
This document will get you up and running quickly with a single-node runtime.
See Running in Distributed Mode for details on setting up a distributed runtime.
Start the Runtime and the XD Shell
The single node option is the easiest to get started with.
It runs everything you need in a single process. To start it, you just need to cd to the xd directory and run the following command
Start command
chmod -R 777 spring-xd-1.0.0.M4
xd/bin>$ ./xd-singlenode
After startup you will see
INFO: Starting Servlet Engine: Apache Tomcat/7.0.35
XD Configuration:
XD_HOME=/opt/spring-xd-1.0.0.M4/xd
XD_TRANSPORT=local
XD_STORE=memory
XD_ANALYTICS=memory
XD_HADOOP_DISTRO=hadoop12
In a separate terminal, cd into the shell directory and start the XD shell, which you can use to issue commands.
cd /opt/spring-xd-1.0.0.M4/shell/bin
shell/bin>$ ./xd-shell
The shell is a more user-friendly front end to the REST API which Spring XD exposes to clients. The URL of the currently targeted Spring XD server is shown at startup.
You should now be able to start using Spring XD.
Create a Stream
In Spring XD, a basic stream defines the ingestion of event driven data from a source to a sink, passing through any number of processors.
You can create a new stream by issuing a stream create command from the XD shell. Stream definitions are built from a simple DSL. For example, execute:
xd:> stream create --definition "time | log" --name ticktock
Created new stream 'ticktock'
This defines a stream named ticktock based off the DSL expression time | log. The DSL uses the "pipe" symbol |, to connect a source to a sink.
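To make the source/processor/sink roles in a definition such as time | log concrete, here is a minimal Python sketch that splits a pipe-delimited definition into its parts. It is an illustration only, not Spring XD's actual DSL parser: the function name parse_stream_definition is hypothetical, and the naive split does not handle quoted pipe characters or module options.

```python
def parse_stream_definition(definition: str) -> dict:
    """Split a pipe-delimited definition into source, processors, and sink.

    Hypothetical helper for illustration; Spring XD's real parser is richer.
    """
    modules = [part.strip() for part in definition.split("|")]
    if len(modules) < 2 or not all(modules):
        raise ValueError("expected at least a source and a sink separated by '|'")
    return {
        "source": modules[0],          # first module produces the data
        "processors": modules[1:-1],   # zero or more modules in between
        "sink": modules[-1],           # last module consumes the data
    }


print(parse_stream_definition("time | log"))
```

For time | log the processors list is empty, which matches the text above: the pipe connects the source directly to the sink.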
The XD server window then shows:
01:47:30,823 WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:30
01:47:31,825 WARN task-scheduler-9 logger.ticktock:145 - 2013-12-27 01:47:31
01:47:32,827 WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:32
01:47:33,830 WARN task-scheduler-9 logger.ticktock:145 - 2013-12-27 01:47:33
01:47:34,845 WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:34
01:47:35,849 WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:35
01:47:36,852 WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:36
01:47:37,854 WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:37
01:47:38,856 WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:38
01:47:39,858 WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:39
01:47:40,881 WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:40
In this simple example, the time source simply sends the current time as a message each second, and the log sink outputs it using the logging framework at the WARN logging level.
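The behavior of the two modules can be imitated in a few lines of Python. This is a rough analogy under stated assumptions, not how Spring XD implements them: time_source and log_sink are hypothetical names, and the logger name logger.ticktock is borrowed from the log output above.

```python
import logging
import time
from datetime import datetime
from typing import Iterable, Iterator


def time_source(count: int, interval: float = 1.0) -> Iterator[str]:
    """Yield the current time as a string, `count` times, `interval` seconds apart."""
    for i in range(count):
        if i:  # no delay before the first message
            time.sleep(interval)
        yield datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def log_sink(messages: Iterable[str], stream_name: str = "ticktock") -> int:
    """Log every message at WARNING level and return how many were logged."""
    logger = logging.getLogger(f"logger.{stream_name}")
    logged = 0
    for message in messages:
        logger.warning(message)
        logged += 1
    return logged
```

Calling log_sink(time_source(count=3)) would emit three WARNING records roughly one second apart, which is the pattern seen in the server window.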
To stop the stream, and remove the definition completely, you can use the stream destroy command:
xd:> stream destroy --name ticktock
Destroyed stream 'ticktock'
It is also possible to stop and restart the stream instead, using the undeploy and deploy commands. The shell supports command completion so you can hit the TAB key to see which commands and options are available.
Command 'tab' not found (for assistance press TAB)
xd:>
!                   //                  admin
aggregatecounter    cls                 counter
date                exit                fieldvaluecounter
gauge               hadoop              help
http                job                 module
richgauge           runtime             script
stream              system              version
xd:>
Explore Spring XD
Learn about the modules available in Spring XD in the Sources, Processors, and Sinks sections of the documentation.
Running in Distributed Mode
Introduction
The Spring XD distributed runtime (DIRT) supports distribution of processing tasks across multiple nodes.
Spring XD can use several middlewares when running in distributed mode.
At the time of writing, Redis and RabbitMQ are available options.
curl -d "multihttp --port=9001 --rulepath=passport | file --dir=/home/focusstat/log/passport --name=passport.log" http://127.0.0.1:8080/streams/passport
http://www.open-open.com/news/view/154055d
root@Master:/opt/spring-xd-1.0.0.M4/shell/bin# netstat -antup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      673/sshd
tcp        0     52 10.1.78.49:22           10.1.77.40:57969        ESTABLISHED 1586/1
tcp        0      0 10.1.78.49:22           10.1.77.40:56054        ESTABLISHED 1025/0
tcp6       0      0 :::22                   :::*                    LISTEN      673/sshd
tcp6       0      0 :::9101                 :::*                    LISTEN      1677/java
tcp6       0      0 :::9393                 :::*                    LISTEN      1677/java
tcp6       0      0 127.0.0.1:9101          127.0.0.1:45686         ESTABLISHED 1677/java
tcp6       0      0 127.0.0.1:9101          127.0.0.1:45687         ESTABLISHED 1677/java
tcp6       0      0 127.0.0.1:45686         127.0.0.1:9101          ESTABLISHED 1677/java
tcp6       0      0 127.0.0.1:45687         127.0.0.1:9101          ESTABLISHED 1677/java
udp        0      0 0.0.0.0:68              0.0.0.0:*                           640/dhclient3
xd:> stream create --name httptest --definition "http | file"
Created new stream 'httptest'
xd:> http post --target http://localhost:9000 --data "helloworld"
> POST (text/plain;Charset=UTF-8) http://localhost:9000 helloworld
> 200 OK
root@Master:/tmp/xd/output# tail -f httptest.out
helloworld
root@Master:/tmp/xd/output# curl -d "test" http://localhost:9000
root@Master:/tmp/xd/output# tail -f httptest.out
Architecture
Introduction
Spring XD is a unified, distributed, and extensible service for data ingestion, real time analytics, batch processing, and data export.
The foundations of the XD architecture are based on the more than 100 man-years of work that have gone into the Spring Batch, Integration and Data projects.
Building upon these projects, Spring XD provides servers and a configuration DSL that you can immediately use to start processing data.
You do not need to build an application yourself from a collection of jars to start using Spring XD.
Spring XD has two modes of operation: single-node and multi-node. The first is a single process that is responsible for all processing and administration. This mode helps you get started easily and simplifies the development and testing of your application.
The second is a distributed mode, in which processing tasks can be spread across a cluster of machines and an administrative server sends commands to control the processing tasks executing on the cluster.
Runtime Architecture
The key components in Spring XD are the XD Admin and XD Container servers. Using a high-level DSL, you post the description of the required processing tasks to the Admin server over HTTP. The Admin server then maps the processing tasks into processing modules. A module is a unit of execution and is implemented as a Spring ApplicationContext.
A simple distributed runtime is provided that will assign modules to execute across multiple XD Container servers. A single XD Container server can run multiple modules.
When using the single node runtime, all modules are run in a single XD Container and the XD Admin server is run in the same process.
DIRT (Distributed Runtime)
The XD Admin server breaks up a processing task into individual module definitions and publishes them to a shared queue (backed by Redis or RabbitMQ, depending upon the provided transport option).
Each container picks up a module definition from the queue, in a round-robin like manner, and creates a Spring ApplicationContext to run that module.
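The round-robin like hand-out of module definitions can be pictured with a short Python sketch. It only approximates the idea: in the real runtime the containers pull definitions from the shared queue themselves, whereas here a hypothetical assign_round_robin function pushes them out in turn. The container names are invented for the example.

```python
from collections import defaultdict
from itertools import cycle


def assign_round_robin(modules, containers):
    """Assign each module definition to the next container in turn."""
    if not containers:
        raise ValueError("at least one container is required")
    assignment = defaultdict(list)
    # cycle() repeats the container list; zip() stops once modules run out.
    for module, container in zip(modules, cycle(containers)):
        assignment[container].append(module)
    return dict(assignment)


print(assign_round_robin(["time", "transform", "log"], ["container-1", "container-2"]))
```

With three modules and two containers, the first container receives the first and third module and the second container receives the second one.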
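To reduce the number of hops across the messaging middleware between modules, multiple modules can be composed into larger deployment units that act as a single module.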
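Composition can be thought of as ordinary function chaining: when several processing steps run inside one unit, data passes between them in memory instead of through a queue. The Python sketch below illustrates that idea only; compose is a hypothetical helper and says nothing about how Spring XD actually packages composed modules.

```python
from functools import reduce


def compose(*stages):
    """Chain stages left to right into one callable, with no hop in between."""
    def composed(payload):
        # Feed the output of each stage directly into the next one.
        return reduce(lambda value, stage: stage(value), stages, payload)
    return composed


# Two small processing steps deployed as a single unit.
clean_and_shout = compose(str.strip, str.upper)
print(clean_and_shout("  hello "))  # prints HELLO
```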
Single Node Runtime
For testing and development purposes, a single node runtime is provided that runs the Admin and Container servers in the same process. The communication to the XD Admin server is over HTTP and the XD Admin server communicates to an in-process XD Container using an in-memory queue.
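As a loose analogy for the single node layout, the Python sketch below runs an "admin" side and a "container" side in one process and connects them with an in-memory queue. All names (run_single_node, the None stop marker) are assumptions made for the illustration and do not reflect Spring XD internals.

```python
import queue
import threading


def run_single_node(module_definitions):
    """Pass definitions from an admin role to an in-process container role."""
    deployments = queue.Queue()  # in-memory queue, no external middleware
    deployed = []

    def container():
        while True:
            definition = deployments.get()
            if definition is None:  # stop marker sent by the admin side
                break
            deployed.append(definition)  # stands in for starting the module

    worker = threading.Thread(target=container)
    worker.start()
    for definition in module_definitions:  # admin side publishes definitions
        deployments.put(definition)
    deployments.put(None)
    worker.join()
    return deployed


print(run_single_node(["time", "log"]))
```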
Admin Server Architecture
The Admin Server uses an embedded servlet container and exposes two endpoints for creating and deleting the modules required to carry out the data processing tasks declared in the DSL.
HATEOAS stands for Hypermedia As The Engine Of Application State.
The Admin Server is implemented using Spring’s MVC framework and the Spring HATEOAS library to create REST representations that follow the HATEOAS principle. The Admin Server communicates with the Container Servers using a pluggable transport; the default uses Redis queues.
Container Server Architecture
The key components of data processing in Spring XD are
Streams
Streams define how event driven data is collected, processed, and stored or forwarded. For example, a stream might collect syslog data, filter, and store it in HDFS.
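The collect, filter, and store flow can be sketched with Python generators. This is a conceptual illustration only: the sample log lines are made up, the function names are hypothetical, and a plain list stands in for HDFS.

```python
def collect(lines):
    """Source: hand over raw log lines one at a time."""
    yield from lines


def keep_errors(events):
    """Processor: let through only the events that mention ERROR."""
    return (event for event in events if "ERROR" in event)


def store(events, storage):
    """Sink: append every event to the given storage and return it."""
    storage.extend(events)
    return storage


# Hypothetical syslog-style input for the example.
sample = ["INFO service started", "ERROR disk full", "INFO heartbeat"]
print(store(keep_errors(collect(sample)), []))
```

Each stage only sees the output of the one before it, which mirrors how a stream definition chains a source, processors, and a sink.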
Jobs
Jobs define how coarse grained and time consuming batch processing steps are orchestrated. For example, a job could be defined to coordinate performing HDFS operations and the subsequent execution of multiple MapReduce processing tasks.
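A very small Python sketch can show what orchestrating steps means here: steps run in a fixed order, and later steps only run if earlier ones succeed. The stop-on-first-failure policy, the run_job name, and the step names are all assumptions for illustration; Spring XD jobs build on Spring Batch, which offers far more than this.

```python
def run_job(steps):
    """Run (name, callable) steps in order; stop at the first one that fails.

    Returns the names of completed steps and the name of the failed step,
    or None if every step succeeded.
    """
    completed = []
    for name, step in steps:
        if not step():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical steps standing in for an HDFS operation and two MapReduce tasks.
job = [
    ("copy-to-hdfs", lambda: True),
    ("mapreduce-1", lambda: True),
    ("mapreduce-2", lambda: True),
]
print(run_job(job))
```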
Taps
Taps are used to process data in a non-invasive way as data is being processed by a Stream or a Job. Much like wiretaps used on telephones, a Tap on a Stream lets you consume data at any point along the Stream’s processing pipeline. The behavior of the original stream is unaffected by the presence of the Tap.
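The non-invasive nature of a tap can be illustrated with a Python generator that shows each event to an observer and then passes it on unchanged. This is a conceptual sketch with hypothetical names, not the Spring XD tap mechanism.

```python
def tap(events, observer):
    """Show every event to `observer`, then pass it on unchanged."""
    for event in events:
        observer(event)  # the tap's consumer sees a copy of the flow
        yield event      # the original stream continues as before


seen_by_tap = []
downstream = list(tap(iter(["a", "b", "c"]), seen_by_tap.append))
print(downstream, seen_by_tap)
```

The downstream consumer receives exactly the same events it would have received without the tap, while the observer gets its own view of the data.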