Optimize DefaultSparkSqlFunctionResponseHandle for memory efficiency and remove unnecessary data conversions #3270

noCharger · 2025-01-28T16:33:09Z

Is your feature request related to a problem?

The current implementation in DefaultSparkSqlFunctionResponseHandle and its usage pattern lead to inefficient memory usage and unnecessary data conversions:

DefaultSparkSqlFunctionResponseHandle loads all data into an ArrayList and then creates an iterator from this ArrayList.
The consuming code (e.g. AsyncQueryExecutorServiceImpl) iterates over this iterator and puts all the data back into a new ArrayList.
This leads to:

Double memory usage: The data exists in both the original ArrayList inside DefaultSparkSqlFunctionResponseHandle and the new result ArrayList in the consuming code.
Unnecessary conversion: Data is converted from ArrayList to Iterator and then back to ArrayList, without leveraging the potential benefits of the iterator pattern such as lazy loading or memory efficiency.
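
To make the pattern concrete, here is a minimal sketch of the double copy described above. It is not the actual code: `EagerHandleSketch`, `drain`, and the use of `String` rows are hypothetical stand-ins for DefaultSparkSqlFunctionResponseHandle, the consuming loop, and the real row type.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the response handle; not the real class.
class EagerHandleSketch {
  private final List<String> rows = new ArrayList<>();

  EagerHandleSketch(List<String> source) {
    // First copy: every row is materialized up front.
    rows.addAll(source);
  }

  Iterator<String> iterator() {
    // The iterator only walks the list that is already fully in memory.
    return rows.iterator();
  }

  // Mirrors the consuming side: drain the iterator into a second list.
  static List<String> drain(Iterator<String> it) {
    List<String> result = new ArrayList<>();
    while (it.hasNext()) {
      // Second copy: the same rows are now referenced by two lists.
      result.add(it.next());
    }
    return result;
  }
}
```

In this shape the iterator adds an extra pass and a second list without deferring any work.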

What solution would you like?

I'm proposing two potential solutions:

Direct ArrayList access: If all data is typically needed at once, modify DefaultSparkSqlFunctionResponseHandle to provide a method that returns the full ArrayList directly, bypassing the iterator.
True lazy loading: For scenarios where streaming might be beneficial, implement real lazy loading in DefaultSparkSqlFunctionResponseHandle, fetching data on-demand.
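
As a rough illustration of the lazy-loading option, the sketch below fetches rows one page at a time and only when the consumer asks for more. Everything here is an assumption made for illustration: `LazyRowIterator`, the `fetchPage` callback, the `String` row type, and the convention that an empty page marks the end. Whether the underlying response can actually be read page by page is not established in this issue.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical lazy iterator: pulls one page at a time from a fetch callback.
class LazyRowIterator implements Iterator<String> {
  private final IntFunction<List<String>> fetchPage; // page index -> rows; empty list means "no more data"
  private Iterator<String> current = Collections.emptyIterator();
  private int nextPage = 0;
  private boolean exhausted = false;

  LazyRowIterator(IntFunction<List<String>> fetchPage) {
    this.fetchPage = fetchPage;
  }

  @Override
  public boolean hasNext() {
    // Fetch the next page only when the current one has been consumed.
    while (!current.hasNext() && !exhausted) {
      List<String> page = fetchPage.apply(nextPage++);
      if (page.isEmpty()) {
        exhausted = true;
      } else {
        current = page.iterator();
      }
    }
    return current.hasNext();
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```

With this shape, at most one page is held by the iterator at a time, so memory use depends on the page size rather than on the total result size. The benefit is lost if the consumer still collects everything into a list.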

What alternatives have you considered?

Keeping the current implementation but optimizing the consuming code to use the iterator directly without creating a new ArrayList.
Implementing a hybrid approach that provides both direct list access and iterator functionality, allowing for flexibility in different usage scenarios.
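
The hybrid alternative could look roughly like the sketch below: one holder that exposes the rows both as a read-only list view and as an iterator, without copying in either case. `HybridHandleSketch`, `rows()`, and the `String` row type are hypothetical names chosen for illustration, not part of the existing interface.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical hybrid handle: direct list access plus iteration, no extra copy.
class HybridHandleSketch implements Iterable<String> {
  private final List<String> rows;

  HybridHandleSketch(List<String> rows) {
    // Wraps the given list in a read-only view; the elements are not copied.
    this.rows = Collections.unmodifiableList(rows);
  }

  // Direct access for callers that need all rows at once.
  List<String> rows() {
    return rows;
  }

  // Iterator access for callers that process rows one by one.
  @Override
  public Iterator<String> iterator() {
    return rows.iterator();
  }
}
```

Callers that need the full result can use `rows()` and skip the iterate-and-collect step, while existing iterator-based callers keep working. Returning a read-only view avoids exposing the internal list to modification through the handle.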

noCharger added enhancement New feature or request untriaged labels Jan 28, 2025

RyanL1997 removed the untriaged label Jan 28, 2025
