Optimize DefaultSparkSqlFunctionResponseHandle for memory efficiency and remove unnecessary data conversions #3270

Open
noCharger opened this issue Jan 28, 2025 · 0 comments
Labels
enhancement New feature or request


@noCharger
Collaborator

Is your feature request related to a problem?

The current implementation in DefaultSparkSqlFunctionResponseHandle and its usage pattern lead to inefficient memory usage and unnecessary data conversions:

  • DefaultSparkSqlFunctionResponseHandle loads all data into an ArrayList and then creates an iterator from this ArrayList.
  • The consuming code (e.g., AsyncQueryExecutorServiceImpl) iterates over this iterator and puts all the data back into a new ArrayList.
This leads to:

  • Double memory usage: The data exists in both the original ArrayList inside DefaultSparkSqlFunctionResponseHandle and the new result ArrayList in the consuming code.
  • Unnecessary conversion: The data is wrapped in an Iterator over one ArrayList and then copied back into another ArrayList, so the iterator provides none of its usual benefits, such as lazy loading or a smaller memory footprint.
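A minimal sketch of the pattern described above, assuming rows can be modeled as plain Strings. `EagerResponseHandle` and `EagerConsumer` are hypothetical stand-ins for DefaultSparkSqlFunctionResponseHandle and its caller, not the project's actual classes, and `trim()` stands in for whatever per-row conversion the real handle performs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for DefaultSparkSqlFunctionResponseHandle:
// rows are plain Strings instead of the project's real value types.
class EagerResponseHandle implements Iterator<String> {
  private final List<String> rows = new ArrayList<>(); // copy #1: every row is buffered up front
  private final Iterator<String> it;

  EagerResponseHandle(List<String> rawResult) {
    for (String raw : rawResult) {
      rows.add(raw.trim()); // stands in for the real per-row conversion
    }
    it = rows.iterator(); // iterator over the already fully materialized list
  }

  @Override public boolean hasNext() { return it.hasNext(); }

  @Override public String next() { return it.next(); }
}

// Hypothetical consumer mirroring the pattern described for AsyncQueryExecutorServiceImpl.
class EagerConsumer {
  static List<String> collect(EagerResponseHandle handle) {
    List<String> result = new ArrayList<>(); // copy #2: the same rows in a second list
    while (handle.hasNext()) {
      result.add(handle.next());
    }
    return result;
  }
}
```

While the handle is still reachable, its internal `rows` list and the returned `result` list both hold every row, which is the doubling described above.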

What solution would you like?

I'm proposing two potential solutions:

  • Direct ArrayList access: If all data is typically needed at once, modify DefaultSparkSqlFunctionResponseHandle to provide a method that returns the full ArrayList directly, bypassing the iterator.
  • True lazy loading: For scenarios where streaming might be beneficial, implement real lazy loading in DefaultSparkSqlFunctionResponseHandle, fetching data on demand.
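The two proposed options could be sketched roughly as follows. The class and method names (`ListResponseHandle`, `rows()`, `LazyResponseHandle`) are hypothetical, and Strings again stand in for the real row type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Option 1 (hypothetical): materialize once and hand the list out directly.
class ListResponseHandle {
  private final List<String> rows;

  ListResponseHandle(List<String> rawResult) {
    List<String> converted = new ArrayList<>(rawResult.size());
    for (String raw : rawResult) {
      converted.add(raw.trim()); // stands in for the real per-row conversion
    }
    this.rows = Collections.unmodifiableList(converted); // read-only view, not a second copy
  }

  // Hypothetical accessor: callers use this list as-is instead of re-collecting it.
  List<String> rows() { return rows; }
}

// Option 2 (hypothetical): convert each row only when the caller asks for it.
class LazyResponseHandle implements Iterator<String> {
  private final Iterator<String> source; // e.g. an iterator over the raw response rows
  private final Function<String, String> convert;

  LazyResponseHandle(Iterator<String> source, Function<String, String> convert) {
    this.source = source;
    this.convert = convert;
  }

  @Override public boolean hasNext() { return source.hasNext(); }

  @Override public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return convert.apply(source.next()); // conversion happens per row, on demand
  }
}
```

The lazy variant only saves memory end to end if `source` is itself streamed. If the raw response is already fully parsed into memory, it avoids the converted copy but not the raw one.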

What alternatives have you considered?

  • Keeping the current implementation but optimizing the consuming code to use the iterator directly without creating a new ArrayList.
  • Implementing a hybrid approach that provides both direct list access and iterator functionality, allowing for flexibility in different usage scenarios.
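The alternatives might look like this under the same assumptions; `StreamingConsumer`, `forward`, `HybridResponseHandle`, and `asList()` are illustrative names only, not existing project APIs.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Alternative 1 (hypothetical): the consumer forwards rows instead of copying them.
class StreamingConsumer {
  static int forward(Iterator<String> rows, Consumer<String> sink) {
    int count = 0;
    while (rows.hasNext()) {
      sink.accept(rows.next()); // each row is handed off; no intermediate list is built here
      count++;
    }
    return count;
  }
}

// Alternative 2 (hypothetical): one handle offering both access styles over a single list.
class HybridResponseHandle implements Iterable<String> {
  private final List<String> rows;

  HybridResponseHandle(List<String> convertedRows) {
    this.rows = Collections.unmodifiableList(convertedRows); // wraps, does not copy
  }

  // For callers that need every row at once.
  List<String> asList() { return rows; }

  // For callers that prefer to walk the rows one at a time.
  @Override public Iterator<String> iterator() { return rows.iterator(); }
}
```

The first alternative removes only the consumer-side copy, since the handle still buffers every row. The hybrid keeps a single list and lets each caller choose the access style.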
@noCharger added the enhancement and untriaged labels on Jan 28, 2025