Connectors are used to construct a FeatureTable in FeatHub. A connector can act as a source, which accesses and interprets a table of feature values in an offline or online feature store, or as a sink, which locates and writes a table of feature values to an offline or online feature store.
A source can operate in one of the following modes:
- Batch Scan: Scan a bounded table from an external system for batch processing.
- Streaming Scan: Scan a bounded or unbounded append-only table from an external system for stream processing.
- Streaming CDC: Consume the changelog stream from an external system for stream processing.
- Lookup: Join the latest value from an external system at processing time.
A sink can operate in one of the following modes:
- Batch Append: Write an append-only batch table to an external system.
- Streaming Append: Write an append-only streaming table to an external system.
- Streaming Upsert: Write an upsert streaming table to an external system.
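
The difference between the append and upsert sink modes can be illustrated with a minimal, framework-free sketch. The helpers `append_rows` and `upsert_rows` and the sample `user_id`/`clicks` rows are hypothetical and are not part of the FeatHub API; they only model the write semantics under the assumption that an upsert table is keyed by a single column.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def append_rows(table: List[Row], rows: List[Row]) -> List[Row]:
    """Append mode: every incoming row is added; existing rows never change."""
    return table + rows


def upsert_rows(table: Dict[Any, Row], rows: List[Row], key: str) -> Dict[Any, Row]:
    """Upsert mode: an incoming row replaces any existing row with the same key."""
    result = dict(table)
    for row in rows:
        result[row[key]] = row
    return result


# Hypothetical feature updates; user 1 appears twice.
updates = [
    {"user_id": 1, "clicks": 3},
    {"user_id": 2, "clicks": 5},
    {"user_id": 1, "clicks": 4},
]

appended = append_rows([], updates)
upserted = upsert_rows({}, updates, key="user_id")

print(len(appended))          # 3 rows: all updates are kept
print(upserted[1]["clicks"])  # 4: the later row for user 1 wins
```

An append-only sink keeps the full history of rows, while an upsert sink keeps only the latest row per key, which is why key-value stores are a natural fit for upsert mode.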
The tables below describe the supported connectors and modes for different processors, where "Y" means supported, "Y/N" means partially supported, and "N" means unsupported.
| Connector\Modes | Batch Scan | Streaming Scan | Streaming CDC | Lookup | Batch Append | Streaming Append | Streaming Upsert |
|---|---|---|---|---|---|---|---|
| FileSystem | Y | N | N | N | Y | N | N |
| Connector\Modes | Batch Scan | Streaming Scan | Streaming CDC | Lookup | Batch Append | Streaming Append | Streaming Upsert |
|---|---|---|---|---|---|---|---|
| FileSystem | N | Y | N | N | N | Y | N |
| MySQL | N | N | N | Y/N<sup>1</sup> | N | Y | Y |
| Kafka | N | Y | N | N | N | Y | N |
| Redis | N | N | N | Y | N | N | Y |
| Hive | N | Y | N | N | N | Y | N |

1. Only supported in OnDemandFeatureView currently.
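
As a rough illustration, the support matrix directly above can be transcribed into a small lookup structure to answer questions such as "which connectors can be used as an upsert sink?". The names `MODES`, `SUPPORT`, `support_level`, and `connectors_supporting` are hypothetical helpers for this sketch and are not provided by FeatHub; the values are copied from the table by hand.

```python
from typing import Dict, List

# Column order of the table above.
MODES: List[str] = [
    "Batch Scan", "Streaming Scan", "Streaming CDC", "Lookup",
    "Batch Append", "Streaming Append", "Streaming Upsert",
]

# One entry per table row; "Y/N" marks partial support.
SUPPORT: Dict[str, List[str]] = {
    "FileSystem": ["N", "Y", "N", "N", "N", "Y", "N"],
    "MySQL":      ["N", "N", "N", "Y/N", "N", "Y", "Y"],
    "Kafka":      ["N", "Y", "N", "N", "N", "Y", "N"],
    "Redis":      ["N", "N", "N", "Y", "N", "N", "Y"],
    "Hive":       ["N", "Y", "N", "N", "N", "Y", "N"],
}


def support_level(connector: str, mode: str) -> str:
    """Return "Y", "Y/N", or "N" for a connector/mode pair."""
    return SUPPORT[connector][MODES.index(mode)]


def connectors_supporting(mode: str, include_partial: bool = False) -> List[str]:
    """List connectors that support a mode, optionally including partial support."""
    accepted = {"Y", "Y/N"} if include_partial else {"Y"}
    return [name for name in SUPPORT if support_level(name, mode) in accepted]


print(connectors_supporting("Streaming Upsert"))
print(connectors_supporting("Lookup", include_partial=True))
```

Keeping partial support as a separate level, rather than collapsing it to a boolean, lets a caller decide whether a restricted mode is acceptable for its use case.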
| Connector\Modes | Batch Scan | Streaming Scan | Streaming CDC | Lookup | Batch Append | Streaming Append | Streaming Upsert |
|---|---|---|---|---|---|---|---|
| FileSystem | Y | N | N | N | Y | N | N |
A data format defines how information is encoded in external storage. A data format is typically used with a storage system that does not have a schema for its data, e.g. FileSystem or Kafka. FeatHub currently supports the following formats.