Flink (三) Flink 编程模型
Flink (三) Flink 编程模型
流式处理WordCount:
public class StreamWordCount { public static void main(String[] args) throws Exception { //创建一个流处理的执行环境 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); //接受socket数据流 DataStreamSource<String> textDataSteam = env.socketTextStream("localhost",7777); //逐一读取数据,打散之后进行WordCount(逻辑计算) SingleOutputStreamOperator<Tuple2<String, Integer>> wordCountDataStream = textDataSteam .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() { public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) throws Exception { String[] tokens = s.split(" "); for (String token : tokens) { if (token.length() > 0) { collector.collect(new Tuple2<String, Integer>(token, 1)); } } } }) .filter(new FilterFunction<Tuple2<String, Integer>>() { public boolean filter(Tuple2<String, Integer> stringIntegerTuple2) throws Exception { if (stringIntegerTuple2.equals(null)) { return false; } return true; } }) .keyBy(0) .sum(1); //打印输出 wordCountDataStream.print(); //执行任务 env.execute("StreamWordCountJob"); //测试需要开启端口7777 } }
整个Flink程序一共分为5步:设定Flink执行环境、创建和加载数据集、对数据集指定转换操作逻辑、指定计算结果输出位置