Flink Blog

The Generic Asynchronous Base Sink

March 16, 2022 - Zichen Liu

Flink sinks share a lot of similar behavior. Most sinks batch records according to user-defined buffering hints, sign requests, write them to the destination, retry unsuccessful or throttled requests, and participate in checkpointing. That is why, for Flink 1.15, we decided to create the AsyncSinkBase (FLIP-171), an abstract sink that extracts this common functionality. It is a base implementation for asynchronous sinks, and you should use it whenever you need to implement a sink that doesn’t offer transactional capabilities. ...

Apache Flink 1.14.4 Release Announcement

March 11, 2022 - Konstantin Knauf (@snntrable)

The Apache Flink Community is pleased to announce another bug fix release for Flink 1.14. This release includes 51 bug and vulnerability fixes and minor improvements for Flink 1.14. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA. We highly recommend that all users upgrade to Flink 1.14.4. ...

Scala Free in One Fifteen

February 22, 2022 - Seth Wiesman (@sjwiesman)

Flink 1.15 is right around the corner, and among the many improvements is a Scala-free classpath. Users can now leverage the Java API from any Scala version, including Scala 3! (Fig. 1: Flink 1.15 Scala 3 example.) This blog post will discuss what has historically made supporting multiple Scala versions so complex, how we achieved this milestone, and the future of Scala in Apache Flink. TLDR: All Scala dependencies are now isolated to the flink-scala jar. ...

Apache Flink 1.13.6 Release Announcement

February 18, 2022 - Konstantin Knauf (@snntrable)

The Apache Flink Community is pleased to announce another bug fix release for Flink 1.13. This release includes 99 bug and vulnerability fixes and minor improvements for Flink 1.13, including another upgrade of Apache Log4j (to 2.17.1). Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA. We highly recommend that all users upgrade to Flink 1. ...

Stateful Functions 3.2.0 Release Announcement

January 31, 2022 - Till Rohrmann (@stsffap) Igal Shilman (@IgalShilman)

Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications. This new release brings various improvements to the StateFun runtime, a leaner way to specify StateFun module components, and a brand new JavaScript SDK! The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Java SDK, Python SDK, GoLang SDK and JavaScript SDK distributions are available on Maven, PyPI, GitHub, and npm respectively. ...

Pravega Flink Connector 101

January 20, 2022 - Yumin Zhou (Brian) (@crazy__zhou)

Pravega, which is now a CNCF sandbox project, is a cloud-native storage system based on abstractions for both batch and streaming data consumption. Pravega streams (a new storage abstraction) are durable, consistent, and elastic, while natively supporting long-term data retention. In comparison, Apache Flink is a popular real-time computing engine that provides unified batch and stream processing. Flink provides high-throughput, low-latency computation, as well as support for complex event processing and state management. ...

Apache Flink 1.14.3 Release Announcement

January 17, 2022 - Thomas Weise (@thweise) Martijn Visser (@martijnvisser82)

The Apache Flink community has released the second bugfix version of the Apache Flink 1.14 series. The first bugfix release was 1.14.2, an emergency release in response to an Apache Log4j zero-day vulnerability (CVE-2021-44228). Flink 1.14.1 was abandoned. This release is therefore the first bugfix release of the Flink 1.14 series that contains bugfixes unrelated to that CVE. This release includes 164 fixes and minor improvements for Flink 1. ...

Apache Flink ML 2.0.0 Release Announcement

January 7, 2022 - Dong Lin Yun Gao

The Apache Flink community is excited to announce the release of Flink ML 2.0.0! Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms that are easy to use and performant, with (near-) real-time latency. This release involves a major refactor of the earlier Flink ML library and introduces major features that extend the Flink ML API and the iteration runtime, such as supporting stages with multi-input multi-output, graph-based stage composition, and a new stream-batch unified iteration library. ...

How We Improved Scheduler Performance for Large-scale Jobs - Part One

January 4, 2022 - Zhilong Hong Zhu Zhu Daisy Tsang Till Rohrmann (@stsffap)

When scheduling large-scale jobs in Flink 1.12, a lot of time is required to initialize jobs and deploy tasks. The scheduler also requires a large amount of heap memory in order to store the execution topology and host temporary deployment descriptors. For example, for a job with a topology that contains two vertices connected with an all-to-all edge and a parallelism of 10k (which means there are 10k source tasks and 10k sink tasks and every source task is connected to all sink tasks), Flink’s JobManager would require 30 GiB of heap memory and more than 4 minutes to deploy all of the tasks. ...

How We Improved Scheduler Performance for Large-scale Jobs - Part Two

January 4, 2022 - Zhilong Hong Zhu Zhu Daisy Tsang Till Rohrmann (@stsffap)

Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 are significantly reduced. In part two, we will elaborate on the details of these optimizations. Reducing complexity with groups: A distribution pattern describes how consumer tasks are connected to producer tasks. ...