A Look at Flink's Evictors
Preface
This article takes a look at Flink's Evictors.
Evictor
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/Evictor.java
@PublicEvolving
public interface Evictor<T, W extends Window> extends Serializable {

    void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

    void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext evictorContext);

    interface EvictorContext {

        long getCurrentProcessingTime();

        MetricGroup getMetricGroup();

        long getCurrentWatermark();
    }
}
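- Evictor takes two type parameters: the element type and the window type. It defines two methods, evictBefore (called before the windowing function) and evictAfter (called after the windowing function), and both receive an EvictorContext. EvictorContext defines getCurrentProcessingTime, getMetricGroup and getCurrentWatermark.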
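The call order described above can be sketched in plain Java. This is only an illustration under stated assumptions: MiniEvictor, EvictorOrderSketch, fire and dropNegativesThenClear are hypothetical names, the window, the EvictorContext and the timestamps are left out, and the real wiring inside Flink's window operator is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for Flink's Evictor interface:
// no window, no EvictorContext, no TimestampedValue wrapper.
interface MiniEvictor<T> {
    void evictBefore(Iterable<T> elements, int size);
    void evictAfter(Iterable<T> elements, int size);
}

class EvictorOrderSketch {

    // Hypothetical helper that mimics a window firing:
    // evictBefore -> windowing function -> evictAfter.
    static <T, R> R fire(List<T> buffer, MiniEvictor<T> evictor, Function<List<T>, R> windowFunction) {
        evictor.evictBefore(buffer, buffer.size());
        // The function gets a copy so that it cannot modify the buffer itself.
        R result = windowFunction.apply(new ArrayList<>(buffer));
        evictor.evictAfter(buffer, buffer.size());
        return result;
    }

    // Example evictor: drops negative numbers before the function runs,
    // then removes whatever is left after the function has run.
    static MiniEvictor<Integer> dropNegativesThenClear() {
        return new MiniEvictor<Integer>() {
            @Override
            public void evictBefore(Iterable<Integer> elements, int size) {
                for (Iterator<Integer> it = elements.iterator(); it.hasNext();) {
                    if (it.next() < 0) {
                        it.remove();
                    }
                }
            }

            @Override
            public void evictAfter(Iterable<Integer> elements, int size) {
                for (Iterator<Integer> it = elements.iterator(); it.hasNext();) {
                    it.next();
                    it.remove();
                }
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> buffer = new ArrayList<>(Arrays.asList(3, -1, 4, -5));
        Integer sum = fire(buffer, dropNegativesThenClear(),
                xs -> xs.stream().mapToInt(Integer::intValue).sum());
        System.out.println(sum);    // prints 7
        System.out.println(buffer); // prints []
    }
}
```

The function only sees what evictBefore left in the buffer, while evictAfter only affects what remains stored for later firings.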
CountEvictor
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/CountEvictor.java
@PublicEvolving
public class CountEvictor<W extends Window> implements Evictor<Object, W> {
    private static final long serialVersionUID = 1L;

    private final long maxCount;
    private final boolean doEvictAfter;

    private CountEvictor(long count, boolean doEvictAfter) {
        this.maxCount = count;
        this.doEvictAfter = doEvictAfter;
    }

    private CountEvictor(long count) {
        this.maxCount = count;
        this.doEvictAfter = false;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (!doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
        if (size <= maxCount) {
            return;
        } else {
            int evictedCount = 0;
            for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext();){
                iterator.next();
                evictedCount++;
                if (evictedCount > size - maxCount) {
                    break;
                } else {
                    iterator.remove();
                }
            }
        }
    }

    public static <W extends Window> CountEvictor<W> of(long maxCount) {
        return new CountEvictor<>(maxCount);
    }

    public static <W extends Window> CountEvictor<W> of(long maxCount, boolean doEvictAfter) {
        return new CountEvictor<>(maxCount, doEvictAfter);
    }
}
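- CountEvictor implements the Evictor interface with Object as the element type. It has two fields, doEvictAfter and maxCount. doEvictAfter selects whether eviction runs in evictBefore or in evictAfter. maxCount is the threshold on the number of elements in the window: when size exceeds it, elements are removed from the start of the iteration until only maxCount remain.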
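To make the counting concrete, here is a minimal sketch of the same eviction rule over a plain List. It assumes nothing from Flink: CountEvictSketch and its evict helper are hypothetical names, and the loop is written slightly differently from CountEvictor.evict while aiming for the same effect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class CountEvictSketch {

    // Illustrative helper, not Flink API: removes elements from the start of
    // the iteration until at most maxCount elements are left.
    static <T> void evict(List<T> elements, long maxCount) {
        int size = elements.size();
        if (size <= maxCount) {
            return; // nothing to evict
        }
        long toEvict = size - maxCount;
        long evicted = 0;
        for (Iterator<T> it = elements.iterator(); it.hasNext() && evicted < toEvict;) {
            it.next();
            it.remove();
            evicted++;
        }
    }

    public static void main(String[] args) {
        List<String> buffer = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "e"));
        evict(buffer, 3);
        System.out.println(buffer); // prints [c, d, e]
    }
}
```

With size = 5 and maxCount = 3, size - maxCount = 2 elements are removed, which matches the break condition evictedCount > size - maxCount in the source above.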
DeltaEvictor
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/DeltaEvictor.java
@PublicEvolving
public class DeltaEvictor<T, W extends Window> implements Evictor<T, W> {
    private static final long serialVersionUID = 1L;

    DeltaFunction<T> deltaFunction;
    private double threshold;
    private final boolean doEvictAfter;

    private DeltaEvictor(double threshold, DeltaFunction<T> deltaFunction) {
        this.deltaFunction = deltaFunction;
        this.threshold = threshold;
        this.doEvictAfter = false;
    }

    private DeltaEvictor(double threshold, DeltaFunction<T> deltaFunction, boolean doEvictAfter) {
        this.deltaFunction = deltaFunction;
        this.threshold = threshold;
        this.doEvictAfter = doEvictAfter;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext ctx) {
        if (!doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<T>> elements, int size, W window, EvictorContext ctx) {
        if (doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    private void evict(Iterable<TimestampedValue<T>> elements, int size, EvictorContext ctx) {
        TimestampedValue<T> lastElement = Iterables.getLast(elements);
        for (Iterator<TimestampedValue<T>> iterator = elements.iterator(); iterator.hasNext();){
            TimestampedValue<T> element = iterator.next();
            if (deltaFunction.getDelta(element.getValue(), lastElement.getValue()) >= this.threshold) {
                iterator.remove();
            }
        }
    }

    @Override
    public String toString() {
        return "DeltaEvictor(" + deltaFunction + ", " + threshold + ")";
    }

    public static <T, W extends Window> DeltaEvictor<T, W> of(double threshold, DeltaFunction<T> deltaFunction) {
        return new DeltaEvictor<>(threshold, deltaFunction);
    }

    public static <T, W extends Window> DeltaEvictor<T, W> of(double threshold, DeltaFunction<T> deltaFunction, boolean doEvictAfter) {
        return new DeltaEvictor<>(threshold, deltaFunction, doEvictAfter);
    }
}
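- DeltaEvictor implements the Evictor interface. It has three fields: doEvictAfter, threshold and deltaFunction. doEvictAfter selects whether eviction runs in evictBefore or in evictAfter. threshold is the cutoff: deltaFunction.getDelta is computed between each element and lastElement, and any element whose delta is greater than or equal to threshold is removed.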
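The delta rule can be tried out on plain values. The following is a sketch under assumptions: DeltaEvictSketch and evict are hypothetical names, a java.util.function.ToDoubleBiFunction stands in for Flink's DeltaFunction, and the empty-list guard is an addition of this sketch rather than something taken from the source above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

class DeltaEvictSketch {

    // Illustrative helper, not Flink API. The delta argument plays the role of
    // DeltaFunction.getDelta(element, lastElement).
    static <T> void evict(List<T> elements, double threshold, ToDoubleBiFunction<T, T> delta) {
        if (elements.isEmpty()) {
            return; // guard added for the sketch only
        }
        T last = elements.get(elements.size() - 1);
        for (Iterator<T> it = elements.iterator(); it.hasNext();) {
            T element = it.next();
            if (delta.applyAsDouble(element, last) >= threshold) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Double> buffer = new ArrayList<>(Arrays.asList(1.0, 4.0, 9.0, 10.0));
        // Absolute difference to the last element (10.0): 9, 6, 1, 0.
        evict(buffer, 5.0, (a, b) -> Math.abs(a - b));
        System.out.println(buffer); // prints [9.0, 10.0]
    }
}
```

Because the comparison is >=, a threshold of 0 with a non-negative delta would remove every element, including the last one.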
TimeEvictor
flink-streaming-java_2.11-1.7.0-sources.jar!/org/apache/flink/streaming/api/windowing/evictors/TimeEvictor.java
@PublicEvolving
public class TimeEvictor<W extends Window> implements Evictor<Object, W> {
    private static final long serialVersionUID = 1L;

    private final long windowSize;
    private final boolean doEvictAfter;

    public TimeEvictor(long windowSize) {
        this.windowSize = windowSize;
        this.doEvictAfter = false;
    }

    public TimeEvictor(long windowSize, boolean doEvictAfter) {
        this.windowSize = windowSize;
        this.doEvictAfter = doEvictAfter;
    }

    @Override
    public void evictBefore(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (!doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    @Override
    public void evictAfter(Iterable<TimestampedValue<Object>> elements, int size, W window, EvictorContext ctx) {
        if (doEvictAfter) {
            evict(elements, size, ctx);
        }
    }

    private void evict(Iterable<TimestampedValue<Object>> elements, int size, EvictorContext ctx) {
        if (!hasTimestamp(elements)) {
            return;
        }

        long currentTime = getMaxTimestamp(elements);
        long evictCutoff = currentTime - windowSize;

        for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext(); ) {
            TimestampedValue<Object> record = iterator.next();
            if (record.getTimestamp() <= evictCutoff) {
                iterator.remove();
            }
        }
    }

    private boolean hasTimestamp(Iterable<TimestampedValue<Object>> elements) {
        Iterator<TimestampedValue<Object>> it = elements.iterator();
        if (it.hasNext()) {
            return it.next().hasTimestamp();
        }
        return false;
    }

    private long getMaxTimestamp(Iterable<TimestampedValue<Object>> elements) {
        long currentTime = Long.MIN_VALUE;
        for (Iterator<TimestampedValue<Object>> iterator = elements.iterator(); iterator.hasNext();){
            TimestampedValue<Object> record = iterator.next();
            currentTime = Math.max(currentTime, record.getTimestamp());
        }
        return currentTime;
    }

    @Override
    public String toString() {
        return "TimeEvictor(" + windowSize + ")";
    }

    @VisibleForTesting
    public long getWindowSize() {
        return windowSize;
    }

    public static <W extends Window> TimeEvictor<W> of(Time windowSize) {
        return new TimeEvictor<>(windowSize.toMilliseconds());
    }

    public static <W extends Window> TimeEvictor<W> of(Time windowSize, boolean doEvictAfter) {
        return new TimeEvictor<>(windowSize.toMilliseconds(), doEvictAfter);
    }
}
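- TimeEvictor implements the Evictor interface with Object as the element type. It has two fields, doEvictAfter and windowSize. doEvictAfter selects whether eviction runs in evictBefore or in evictAfter. windowSize is the span of time (in milliseconds) to keep: evictCutoff is the maximum timestamp among the window elements minus windowSize, and every element whose timestamp is less than or equal to evictCutoff is evicted. If the first element carries no timestamp, nothing is evicted.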
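The cutoff arithmetic is easy to check by hand with a small sketch. TimeEvictSketch, Stamped and evict are hypothetical names introduced here; Stamped is a simplified stand-in for TimestampedValue that always carries a timestamp, so the hasTimestamp check from the source is not modelled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TimeEvictSketch {

    // Hypothetical stand-in for TimestampedValue: a value plus a timestamp in milliseconds.
    static final class Stamped {
        final String value;
        final long timestamp;

        Stamped(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Illustrative helper, not Flink API: keeps only elements newer than
    // (max timestamp in the buffer - windowSizeMillis).
    static void evict(List<Stamped> elements, long windowSizeMillis) {
        if (elements.isEmpty()) {
            return;
        }
        long maxTimestamp = Long.MIN_VALUE;
        for (Stamped s : elements) {
            maxTimestamp = Math.max(maxTimestamp, s.timestamp);
        }
        long evictCutoff = maxTimestamp - windowSizeMillis;
        for (Iterator<Stamped> it = elements.iterator(); it.hasNext();) {
            if (it.next().timestamp <= evictCutoff) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        List<Stamped> buffer = new ArrayList<>();
        buffer.add(new Stamped("a", 1000L));
        buffer.add(new Stamped("b", 4000L));
        buffer.add(new Stamped("c", 2500L));
        buffer.add(new Stamped("d", 6000L));
        // max timestamp = 6000, cutoff = 6000 - 3000 = 3000, so "a" and "c" are evicted.
        evict(buffer, 3000L);
        for (Stamped s : buffer) {
            System.out.println(s.value + "@" + s.timestamp); // prints b@4000 then d@6000
        }
    }
}
```

Note that the cutoff is relative to the largest timestamp currently in the buffer, not to the window end or to the current watermark.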
Summary
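- Evictor takes two type parameters: the element type and the window type. It defines two methods, evictBefore (called before the windowing function) and evictAfter (called after the windowing function), and both receive an EvictorContext. EvictorContext defines getCurrentProcessingTime, getMetricGroup and getCurrentWatermark.
- Evictor has several built-in implementations: CountEvictor, DeltaEvictor and TimeEvictor. CountEvictor evicts by the number of elements in the window, TimeEvictor evicts by time span relative to the newest element timestamp, and DeltaEvictor evicts by comparing the delta between each element and lastElement against the configured threshold.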
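- Specifying an evictor (evictBefore) prevents any pre-aggregation, because all window elements have to be passed to the evictor before the windowing function is applied. In addition, Flink does not guarantee the order of elements within a window, so an evictor that removes elements from the beginning or the end of the window may not actually remove the elements that arrived first or last.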
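Why an evictor rules out pre-aggregation can be seen by comparing the state each approach has to keep. The sketch below is plain Java with hypothetical names (PreAggregationSketch, incrementalSum, bufferedSumKeepingLast); it only mimics the idea and does not use any Flink class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PreAggregationSketch {

    // Without an evictor, a reduce-style function can fold each element into
    // a single accumulator as it arrives; the state is one number.
    static long incrementalSum(long[] arrivals) {
        long accumulator = 0;
        for (long x : arrivals) {
            accumulator += x;
        }
        return accumulator;
    }

    // With an evictor, every element has to stay buffered until the window
    // fires, because the evictor inspects the elements before the function runs.
    static long bufferedSumKeepingLast(long[] arrivals, int maxCount) {
        Deque<Long> buffer = new ArrayDeque<>();
        for (long x : arrivals) {
            buffer.addLast(x); // state grows with the number of elements
        }
        while (buffer.size() > maxCount) {
            buffer.removeFirst(); // count-based eviction before the function
        }
        long sum = 0;
        for (long x : buffer) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] arrivals = {1, 2, 3, 4, 5};
        System.out.println(incrementalSum(arrivals));            // prints 15
        System.out.println(bufferedSumKeepingLast(arrivals, 3)); // prints 12
    }
}
```

The second method cannot be rewritten to keep only a running sum, since which elements survive is only known once the evictor has seen the whole buffer.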