Home
Welcome! flink-tensorflow (FTF) is an open-source library for machine intelligence in Apache Flink, using TensorFlow for numerical computation within a Flink program. A TensorFlow model becomes a function that you can use to transform a stream. Combined with Flink connectors and other Flink libraries, this lets you build scalable, stateful, intelligent stream-processing applications.
Note: flink-tensorflow doesn't provide a way to run ordinary TensorFlow programs unaltered. It provides a way for Flink programs to leverage the TensorFlow library while retaining the full power of Flink.
- See the Building section to build the flink-tensorflow library
- See the Developing an Application section to package your Flink application with the TensorFlow dependencies.
The sections below cover the key concepts of the library.
TensorFlow has a lot in common with Apache Flink. It is based on the dataflow programming model, in which you define a graph of numerical computations over a stream of data records. TensorFlow executes the computations on the CPU and, when available, the GPU. An interesting aspect of its design is that the 'pump' that streams records through the graph is not internal to TensorFlow; it is the responsibility of application code. With flink-tensorflow, the Flink dataflow engine acts as the pump!
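The 'pump' idea can be sketched in a few lines of plain Scala. This is a conceptual illustration only, not flink-tensorflow code: the graph is modeled as an ordinary function, and the caller (in a real job, the Flink dataflow engine) is the code that drives records through it.

```scala
// Conceptual sketch only. A compiled TensorFlow graph is modeled here as a
// plain function from one record to one record; all names are illustrative.
object PumpSketch {
  // Stand-in for a computation graph: one record in, one record out.
  val graph: Double => Double = x => x * 2.0 + 1.0

  // The 'pump': application code (or, with flink-tensorflow, the Flink
  // engine) pushes each record through the graph. The graph itself never
  // pulls data on its own.
  def run(records: Iterator[Double]): List[Double] =
    records.map(graph).toList
}
```

The key point the sketch captures is that `graph` is passive; whatever iterates over the records decides when computation happens.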
All inputs and outputs to a TensorFlow graph are stored as multi-dimensional arrays called tensors. This library provides various converters to read and write data records as tensors.
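To make the tensor idea concrete, here is a minimal sketch of a converter in plain Scala. The `ToyTensor`, `SensorRecord`, and `Converters` names are invented for illustration; the real library converts to and from TensorFlow's own `Tensor` type.

```scala
// Conceptual sketch only: a 'tensor' as a shape plus a flat, row-major
// data array, and a converter from an application record to that form.
case class ToyTensor(shape: Seq[Int], data: Array[Float])

// A hypothetical application record: a rectangular 2-D batch of features.
case class SensorRecord(readings: Seq[Seq[Float]])

object Converters {
  // Flatten the 2-D record into row-major tensor form, recording its shape.
  def toTensor(r: SensorRecord): ToyTensor = {
    val rows = r.readings.size
    val cols = if (rows == 0) 0 else r.readings.head.size
    ToyTensor(Seq(rows, cols), r.readings.flatten.toArray)
  }
}
```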
A model is a pre-built TensorFlow graph with associated data and with well-defined interfaces. A given model might support image classification, text analysis, regression, or other forms of inference. An advanced model might even be stateful, learning from input data or identifying sequences using internal state.
TensorFlow defines a standard format for such models called the saved model format. The flink-tensorflow library fully supports the saved model format. It is also possible to use arbitrary TensorFlow graphs.
The core functionality of the library is to enable the use of TensorFlow models within Flink data transformations (e.g. `map`, `window`, `iterate`). To interoperate with a diverse range of transformations, the library is designed to work with any transformation function.
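A model-as-function fits naturally into a map-style transformation. The sketch below uses a plain Scala collection as a stand-in for Flink's `DataStream`, and a toy scoring function as a stand-in for TensorFlow inference; in a real job the same function would be passed to `DataStream.map`.

```scala
// Conceptual sketch only: a 'model' exposed as a function and applied
// inside a map-style transformation. All names are illustrative.
object ModelInMap {
  // Toy stand-in for TensorFlow inference: score = sum of features.
  def score(features: Seq[Float]): Float = features.sum

  // A classification built on the score, usable directly in a map.
  def label(features: Seq[Float]): String =
    if (score(features) > 1.0f) "positive" else "negative"
}
```

Usage looks like any other transformation, e.g. `records.map(ModelInMap.label)`, where `records` would be a `DataStream` in an actual Flink program.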
A good TensorFlow model exposes its functionality as functions over defined inputs. The signature of a given function (which we call a method) may be standardized across many models, allowing for mix-and-match of models and tools. For example, some models implement a classification method. A given model may implement numerous methods.
To use a given model in Flink, you implement the `Model` class and define the functions that the model supports. Reuse the standard method signatures (e.g. `RegressionMethod`, `ClassificationMethod`) where possible.
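The shape of such a model can be sketched as follows. Note the hedge: flink-tensorflow does define `Model` and method-signature types such as `RegressionMethod`, but the trait and signatures below (`ToyMethod`, `HalfPlusTwoModel`) are invented for illustration and are not the library's actual API.

```scala
// Illustrative sketch only: a model exposing a named, typed method.
// The real library's Model and RegressionMethod types differ in detail.
trait ToyMethod[I, O] { def apply(in: I): O }

class HalfPlusTwoModel {
  // A 'regression' method computing y = 0.5 * x + 2, in the spirit of
  // TensorFlow's well-known half_plus_two example saved model.
  val regress: ToyMethod[Float, Float] =
    new ToyMethod[Float, Float] { def apply(x: Float): Float = 0.5f * x + 2f }
}
```

Because the method's input and output types are explicit, models implementing the same signature are interchangeable in a pipeline.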
A model consists of a graph definition, stored variable data (such as pre-trained weights), and metadata defining the methods supported by the model. You have some flexibility as to how Flink obtains this information about the model.
- Use a model that is encoded in the standard saved model format and stored on the filesystem.
- Use an exported TensorFlow graph, supplemented with metadata that you provide.
- Construct an in-memory TensorFlow graph.
See the Models section for more information.
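The three options above can be pictured as a small algebraic data type. This is purely a conceptual sketch; the type and field names are invented and do not correspond to flink-tensorflow classes.

```scala
// Conceptual sketch only: the three ways a Flink job can obtain a model.
sealed trait ModelSource
// 1. A saved model on the filesystem, selected by directory and tags.
case class SavedModelPath(dir: String, tags: Set[String]) extends ModelSource
// 2. An exported graph plus user-supplied method metadata.
case class GraphWithMetadata(graphDefPath: String, signatures: Map[String, String]) extends ModelSource
// 3. A graph constructed directly in memory.
case object InMemoryGraph extends ModelSource

object ModelSource {
  def describe(s: ModelSource): String = s match {
    case SavedModelPath(dir, tags)  => s"saved model at $dir (tags: ${tags.mkString(",")})"
    case GraphWithMetadata(p, sigs) => s"graph def at $p with ${sigs.size} signature(s)"
    case InMemoryGraph              => "in-memory graph"
  }
}
```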
(TODO)
- Getting Started
- Reference
- Examples