Spring XD Reference Guide

http://docs.spring.io/spring-xd/docs/1.0.0.M2/reference/html/

Reference Guide

Introduction

Overview

Spring XD is a unified, distributed, and extensible service for data ingestion, real time analytics, batch processing, and data export.

The Spring XD project is an open source project, licensed under the Apache 2 License, whose goal is to tackle big data complexity and simplify the development of big data applications.

Much of the complexity in building real-world big data applications is related to integrating many disparate systems into one cohesive solution across a range of use-cases.

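Common use-cases encountered in creating a comprehensive big data solution are:

High throughput distributed data ingestion from a variety of input sources into a big data store such as HDFS or Splunk.

Real-time analytics at ingestion time, e.g. gathering metrics and counting values.

Workflow management via batch jobs. The jobs combine interactions with standard enterprise systems (e.g. RDBMS) and Hadoop operations (e.g. MapReduce, HDFS, Pig, Hive or Cascading).

High throughput data export, e.g. from HDFS to a RDBMS or NoSQL database.

The Spring XD project aims to provide a one stop shop solution for these use-cases.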

Getting Started

Requirements

To get started, make sure your system has as a minimum Java JDK 6 or newer installed. Java JDK 7 is recommended.

Download Spring XD

http://repo.spring.io/simple/libs-milestone-local/org/springframework/xd/spring-xd/1.0.0.M4/spring-xd-1.0.0.M4-dist.zip

Unzip the distribution. This yields the installation directory spring-xd-1.0.0.M4.

All the commands below are executed from this directory, so change into it before proceeding.

cp spring-xd-1.0.0.M4-dist.zip /opt/

cd /opt/

unzip spring-xd-1.0.0.M4-dist.zip

drwxr-xr-x 7 root root 4096 Nov 12 13:39 spring-xd-1.0.0.M4/

$ cd spring-xd-1.0.0.M4

Set the Environment Variable

Set the environment variable XD_HOME to the installation directory <root-install-dir>\spring-xd\xd

vi /etc/profile

export XD_HOME=/opt/spring-xd-1.0.0.M4/xd

source /etc/profile

root@Master:/etc# echo $XD_HOME

/opt/spring-xd-1.0.0.M4/xd

Install Spring XD

Spring XD can be run in two different modes. There's a single-node runtime option for testing and development, and there's a distributed runtime which supports distribution of processing tasks across multiple nodes.

This document will get you up and running quickly with a single-node runtime.

See Running in Distributed Mode for details on setting up a distributed runtime.

Start the Runtime and the XD Shell

The single node option is the easiest to get started with.

It runs everything you need in a single process. To start it, you just need to cd to the xd directory and run the following command.

Start command

chmod -R 777 spring-xd-1.0.0.M4

xd/bin>$ ./xd-singlenode

After startup you should see output like the following:

INFO: Starting Servlet Engine: Apache Tomcat/7.0.35

XD Configuration:

XD_HOME=/opt/spring-xd-1.0.0.M4/xd

XD_TRANSPORT=local

XD_STORE=memory

XD_ANALYTICS=memory

XD_HADOOP_DISTRO=hadoop12

In a separate terminal, cd into the shell directory and start the XD shell, which you can use to issue commands.

cd /opt/spring-xd-1.0.0.M4/shell/bin

shell/bin>$ ./xd-shell

The shell is a more user-friendly front end to the REST API which Spring XD exposes to clients. The URL of the currently targeted Spring XD server is shown at startup.

You should now be able to start using Spring XD.

Create a Stream

In Spring XD, a basic stream defines the ingestion of event driven data from a source to a sink, passing through any number of processors.

You can create a new stream by issuing a stream create command from the XD shell. Stream definitions are built from a simple DSL. For example, execute:

xd:> stream create --definition "time | log" --name ticktock

Created new stream 'ticktock'

This defines a stream named ticktock based off the DSL expression time | log. The DSL uses the "pipe" symbol | to connect a source to a sink.
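To make the pipe notation concrete, here is a minimal Python sketch of how a definition string could be split into a source, optional processors, and a sink. It is only an illustration: the function name parse_stream_definition is hypothetical, this is not Spring XD's actual parser, and the naive split does not handle module options that contain a quoted | character.

```python
def parse_stream_definition(definition: str) -> dict:
    """Split a pipe-delimited definition into source, processors, and sink.

    Hypothetical helper for illustration only; not part of Spring XD.
    """
    # Naive split: a '|' inside a quoted option value is not supported.
    modules = [part.strip() for part in definition.split("|")]
    if len(modules) < 2 or any(not module for module in modules):
        raise ValueError("a stream needs at least a source and a sink")
    return {
        "source": modules[0],          # first module produces the data
        "processors": modules[1:-1],   # anything in between transforms it
        "sink": modules[-1],           # last module consumes the data
    }


print(parse_stream_definition("time | log"))
print(parse_stream_definition("http | filter | file"))
```

For time | log the processor list is empty, which matches the description above: the pipe connects the source directly to the sink.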

The XD server window then shows:

01:47:30,823 WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:30
01:47:31,825 WARN task-scheduler-9 logger.ticktock:145 - 2013-12-27 01:47:31
01:47:32,827 WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:32
01:47:33,830 WARN task-scheduler-9 logger.ticktock:145 - 2013-12-27 01:47:33
01:47:34,845 WARN task-scheduler-6 logger.ticktock:145 - 2013-12-27 01:47:34
01:47:35,849 WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:35
01:47:36,852 WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:36
01:47:37,854 WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:37
01:47:38,856 WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:38
01:47:39,858 WARN task-scheduler-4 logger.ticktock:145 - 2013-12-27 01:47:39
01:47:40,881 WARN task-scheduler-7 logger.ticktock:145 - 2013-12-27 01:47:40

time | log

In this simple example, the time source simply sends the current time as a message each second, and the log sink outputs it using the logging framework at the WARN logging level.
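The behaviour of this stream can be imitated in a few lines of Python. This is a rough sketch under stated assumptions, not how Spring XD implements its modules: time_source and log_sink are hypothetical names, and Python's logging module labels the level WARNING rather than WARN.

```python
import logging
import time
from datetime import datetime


def time_source(count, interval_seconds=1.0):
    """Yield the current time as a formatted string, `count` times."""
    for i in range(count):
        if i:
            time.sleep(interval_seconds)  # wait between messages, not before the first
        yield datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def log_sink(messages, logger):
    """Log every message at warning level and return how many were logged."""
    logged = 0
    for message in messages:
        logger.warning(message)
        logged += 1
    return logged


logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s - %(message)s")
# The logger name mirrors the one in the output above; it is just a label here.
log_sink(time_source(count=3, interval_seconds=0.1), logging.getLogger("logger.ticktock"))
```

The source is a generator and the sink simply iterates over it, which is the same hand-off the pipe expresses in the DSL.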

To stop the stream and remove the definition completely, you can use the stream destroy command:

xd:> stream destroy --name ticktock

Destroyed stream 'ticktock'

It is also possible to stop and restart the stream instead, using the undeploy and deploy commands. The shell supports command completion, so you can hit the TAB key to see which commands and options are available.

Command 'tab' not found (for assistance press TAB)

xd:>

!                   //                  admin
aggregatecounter    cls                 counter
date                exit                fieldvaluecounter
gauge               hadoop              help
http                job                 module
richgauge           runtime             script
stream              system              version

xd:>

Explore Spring XD

Learn about the modules available in Spring XD in the Sources, Processors, and Sinks sections of the documentation.

Running in Distributed Mode

Introduction

The Spring XD distributed runtime (DIRT) supports distribution of processing tasks across multiple nodes.

Spring XD can use several kinds of middleware when running in distributed mode.

At the time of writing, Redis and RabbitMQ are available options.

curl -d "multihttp --port=9001 --rulepath=passport | file --dir=/home/focusstat/log/passport --name=passport.log" http://127.0.0.1:8080/streams/passport

http://www.open-open.com/news/view/154055d

root@Master:/opt/spring-xd-1.0.0.M4/shell/bin# netstat -antup

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address       Foreign Address     State        PID/Program name
tcp        0      0 0.0.0.0:22          0.0.0.0:*           LISTEN       673/sshd
tcp        0     52 10.1.78.49:22       10.1.77.40:57969    ESTABLISHED  1586/1
tcp        0      0 10.1.78.49:22       10.1.77.40:56054    ESTABLISHED  1025/0
tcp6       0      0 :::22               :::*                LISTEN       673/sshd
tcp6       0      0 :::9101             :::*                LISTEN       1677/java
tcp6       0      0 :::9393             :::*                LISTEN       1677/java
tcp6       0      0 127.0.0.1:9101      127.0.0.1:45686     ESTABLISHED  1677/java
tcp6       0      0 127.0.0.1:9101      127.0.0.1:45687     ESTABLISHED  1677/java
tcp6       0      0 127.0.0.1:45686     127.0.0.1:9101      ESTABLISHED  1677/java
tcp6       0      0 127.0.0.1:45687     127.0.0.1:9101      ESTABLISHED  1677/java
udp        0      0 0.0.0.0:68          0.0.0.0:*                        640/dhclient3

xd:> stream create --name httptest --definition "http | file"

Created new stream 'httptest'

xd:> http post --target http://localhost:9000 --data "hello world"

> POST (text/plain;Charset=UTF-8) http://localhost:9000 hello world

> 200 OK

root@Master:/tmp/xd/output# tail -f httptest.out

hello world

root@Master:/tmp/xd/output# curl -d "test" http://localhost:9000

root@Master:/tmp/xd/output# tail -f httptest.out

Architecture

Introduction

Spring XD is a unified, distributed, and extensible service for data ingestion, real time analytics, batch processing, and data export.

The foundations of the XD architecture are based on more than 100 man-years of work that have gone into the Spring Batch, Integration and Data projects.

Building upon these projects, Spring XD provides servers and a configuration DSL that you can immediately use to start processing data.

You do not need to build an application yourself from a collection of jars to start using Spring XD.

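Spring XD has two modes of operation: single-node and multi-node. The first is a single process that is responsible for all processing and administration. This mode makes it easy to get started and simplifies the development and testing of your application.

The second is a distributed mode, in which processing tasks can be spread across a cluster of machines and an administrative server sends commands to control the processing tasks running on the cluster.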

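Runtime Architecture

The key components in Spring XD are the XD Admin and XD Container servers. Using a high-level DSL, you post the description of the required processing tasks to the Admin server over HTTP. The Admin server then maps the processing tasks into processing modules. A module is a unit of execution and is implemented as a Spring ApplicationContext.

A simple distributed runtime is provided that will assign modules to execute across multiple XD Container servers. A single XD Container server can run multiple modules.

When using the single node runtime, all modules are run in a single XD Container and the XD Admin server is run in the same process.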

DIRT (Distributed Runtime)

The XD Admin server breaks up a processing task into individual module definitions and publishes them to a shared queue (backed by Redis or RabbitMQ depending upon the provided transport option).

Each container picks up a module definition off the queue, in a round-robin like manner, and creates a Spring ApplicationContext to run that module.

To reduce the number of hops across the middleware, multiple modules can be composed into larger deployment units that act as a single module.
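The publish-and-pick-up cycle described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the function deploy and the container names are hypothetical, an in-process deque stands in for the Redis or RabbitMQ backed queue, and strict turn-taking approximates the "round-robin like" behaviour of independently running containers.

```python
from collections import deque


def deploy(stream_definition, container_names):
    """Model the admin publishing module definitions and containers taking turns.

    Hypothetical illustration; not Spring XD code.
    """
    if not container_names:
        raise ValueError("at least one container is required")

    # Admin side: break the task into module definitions and queue them.
    shared_queue = deque(part.strip() for part in stream_definition.split("|"))

    # Container side: each container in turn picks the next definition.
    assignments = {name: [] for name in container_names}
    turn = 0
    while shared_queue:
        name = container_names[turn % len(container_names)]
        assignments[name].append(shared_queue.popleft())
        turn += 1
    return assignments


print(deploy("http | filter | file", ["container-1", "container-2"]))
# {'container-1': ['http', 'file'], 'container-2': ['filter']}
```

In this run the http and file modules land on one container and filter on the other, so every message crosses the middleware twice. That is the kind of hop count that composing modules into a single deployment unit is meant to reduce.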

Single Node Runtime

For testing and development purposes, a single node runtime is provided that runs the Admin and Container servers in the same process. The communication to the XD Admin server is over HTTP, and the XD Admin server communicates with an in-process XD Container using an in-memory queue.

Admin Server Architecture

The Admin Server uses an embedded servlet container and exposes two endpoints for creating and deleting the modules required to perform data processing tasks as declared in the DSL.

HATEOAS stands for Hypermedia As The Engine Of Application State.

The Admin Server is implemented using Spring's MVC framework and the Spring HATEOAS library to create REST representations that follow the HATEOAS principle. The Admin Server communicates with the Container Servers using a pluggable transport; the default uses Redis queues.

Container Server Architecture

The key components of data processing in Spring XD are

Streams

Streams define how event driven data is collected, processed, and stored or forwarded. For example, a stream might collect syslog data, filter it, and store it in HDFS.

Jobs

Jobs define how coarse grained and time consuming batch processing steps are orchestrated. For example, a job could be defined to coordinate performing HDFS operations and the subsequent execution of multiple MapReduce processing tasks.

Taps

Taps are used to process data in a non-invasive way as data is being processed by a Stream or a Job. Much like wiretaps used on telephones, a Tap on a Stream lets you consume data at any point along the Stream's processing pipeline. The behavior of the original stream is unaffected by the presence of the Tap.
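The idea of a tap can be illustrated with a small Python sketch. It is a conceptual model only: run_stream and its taps parameter are hypothetical and do not correspond to Spring XD's tap syntax or implementation. A tap here is a callable attached to a position in the pipeline; it receives each message passing that point, and its return value is ignored, so the main flow is unchanged.

```python
def run_stream(source, processors, sink, taps=None):
    """Push each message through the processors; taps observe along the way.

    `taps` maps a stage index (0 = right after the source) to a callable.
    Hypothetical illustration; not Spring XD code.
    """
    taps = taps or {}
    for message in source:
        for index in range(len(processors) + 1):
            if index in taps:
                taps[index](message)  # the tap only observes the message
            if index < len(processors):
                message = processors[index](message)
        sink(message)


main_output, tapped = [], []
run_stream(
    source=["a", "b"],
    processors=[str.upper],
    sink=main_output.append,
    taps={0: tapped.append},
)
print(main_output)  # ['A', 'B']
print(tapped)       # ['a', 'b']
```

Running the same stream with and without the tap gives the same main output, which is the non-invasive property described above.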
