
[FEATURE] Improve Model Interface Positioning and Auto-Application #3529

Open
mingshl opened this issue Feb 11, 2025 · 1 comment · May be fixed by #3553
Labels
enhancement New feature or request untriaged

Comments

@mingshl
Collaborator

mingshl commented Feb 11, 2025

Is your feature request related to a problem?
Currently, the model interface is applied before the pre-processing function. When both pre-processing and post-processing functions are defined, the model interface doesn't accurately represent the required format for model input and output.

We need to:

  1. Reposition the model interface to be applied after the pre-processing and post-processing functions. This will ensure that the model interface correctly describes the model input and output format.

  2. Implement automatic recognition and application of the model interface for existing pre-processing and post-processing functions. This should be done for the following known pre-processing functions:

public static final String TEXT_DOCS_TO_BEDROCK_EMBEDDING_INPUT = "connector.pre_process.bedrock.embedding";
public static final String TEXT_IMAGE_TO_BEDROCK_EMBEDDING_INPUT = "connector.pre_process.bedrock.multimodal_embedding";
public static final String TEXT_DOCS_TO_DEFAULT_EMBEDDING_INPUT = "connector.pre_process.default.embedding";
public static final String TEXT_SIMILARITY_TO_COHERE_RERANK_INPUT = "connector.pre_process.cohere.rerank";
public static final String TEXT_SIMILARITY_TO_BEDROCK_RERANK_INPUT = "connector.pre_process.bedrock.rerank";
public static final String TEXT_SIMILARITY_TO_DEFAULT_INPUT = "connector.pre_process.default.rerank";

and the following known post-processing functions:

public static final String COHERE_EMBEDDING = "connector.post_process.cohere.embedding";
public static final String OPENAI_EMBEDDING = "connector.post_process.openai.embedding";
public static final String BEDROCK_EMBEDDING = "connector.post_process.bedrock.embedding";
public static final String BEDROCK_BATCH_JOB_ARN = "connector.post_process.bedrock.batch_job_arn";
public static final String COHERE_RERANK = "connector.post_process.cohere.rerank";
public static final String BEDROCK_RERANK = "connector.post_process.bedrock.rerank";
public static final String DEFAULT_EMBEDDING = "connector.post_process.default.embedding";
public static final String DEFAULT_RERANK = "connector.post_process.default.rerank";
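Auto-application could be sketched as a simple lookup from the registered pre/post-processing function name to a predefined interface schema. A minimal sketch in Java, assuming hypothetical schema identifiers and class/method names (not the actual ml-commons implementation):

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: resolve a predefined model interface from the names of
// the connector's registered pre/post-processing functions. The schema strings
// below are placeholders, not real ml-commons interface definitions.
public class ModelInterfacePresets {

    private static final Map<String, String> PRE_PROCESS_INTERFACES = Map.of(
        "connector.pre_process.bedrock.embedding", "bedrock-embedding-input-schema",
        "connector.pre_process.cohere.rerank", "cohere-rerank-input-schema"
    );

    private static final Map<String, String> POST_PROCESS_INTERFACES = Map.of(
        "connector.post_process.bedrock.embedding", "bedrock-embedding-output-schema",
        "connector.post_process.cohere.rerank", "cohere-rerank-output-schema"
    );

    // Look up the input-side interface for a known pre-processing function.
    public static Optional<String> resolveInputInterface(String preProcessFunction) {
        return Optional.ofNullable(PRE_PROCESS_INTERFACES.get(preProcessFunction));
    }

    // Look up the output-side interface for a known post-processing function.
    public static Optional<String> resolveOutputInterface(String postProcessFunction) {
        return Optional.ofNullable(POST_PROCESS_INTERFACES.get(postProcessFunction));
    }

    public static void main(String[] args) {
        System.out.println(resolveInputInterface("connector.pre_process.cohere.rerank").orElse("none"));
        System.out.println(resolveOutputInterface("connector.post_process.unknown").orElse("none"));
    }
}
```

With a lookup like this, registering a model whose connector declares a known pre/post-processing pair could attach the matching interface automatically, with any user-supplied interface taking precedence.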

Expected Outcome:

  1. The model interface will accurately represent the required format for model input and output.
  2. Improved consistency and ease of use for developers working with pre-processing and post-processing functions.
  3. Reduced manual configuration errors when setting up model interfaces.

What alternatives have you considered?
Instead of repositioning the model interface, we could create a new object called "model predict mapping" that describes the model input and output mapping after the pre-processing function is applied. The downside is that this object would be largely redundant with the model interface.

@mingshl mingshl added enhancement New feature or request untriaged labels Feb 11, 2025
@dylan-tong-aws

@mingshl, in my opinion, we should deprecate the pre- and post-processing functionality in the connector and move it to the pipeline/flow level. The connector's job is simply to provide a bridge to an external AI service endpoint. We should keep it simple and decouple the data transform functionality.

Previously, the connectors were built for neural search, and back then the interface was limited to converting query text into text embeddings, so it was easy to predefine the pre- and post-processing logic.

Now that we provide the generic capability to integrate any ML model into OpenSearch (search and ingest) data flows, we need to provide flexibility and ease of use for configuring data processing within flows/pipelines. The current pre- and post-processing logic should be re-packaged as preset configurations for data transforms/processors that can be used with neural queries.

@b4sjoo b4sjoo linked a pull request Feb 14, 2025 that will close this issue