Direct Pipeline with No Transforms #4872
Unanswered · jermjensen asked this question in Q&A
Replies: 2 comments 1 reply
-
Try changing the "row set size" parameter (default 10,000) in your local pipeline run configuration.
-
Have you tried providing limits in your SQL query? You could make this more dynamic through parameters/variables. This last option will require you to build a loop to go over all the available 100-row result sets, but it would give you more flexibility.
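The reply's example snippets did not survive extraction, but the paginated-loop idea can be sketched. Below is a minimal, hypothetical illustration in Python, using an in-memory SQLite table to stand in for the JDBC source; the table name `src`, the column names, and the page size of 100 are assumptions for the sake of the example, not Hop or SQL Server specifics:

```python
import sqlite3

# In-memory table standing in for the JDBC source (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO src (val) VALUES (?)",
                 [(f"row-{i}",) for i in range(250)])

PAGE_SIZE = 100  # matches the 100-row result sets mentioned in the reply
offset = 0
pages = []
while True:
    # LIMIT/OFFSET driven by parameters each iteration --
    # the "parameters/variables" idea from the reply.
    rows = conn.execute(
        "SELECT id, val FROM src ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()
    if not rows:
        break  # no more pages to fetch
    pages.append(rows)
    offset += PAGE_SIZE

print(len(pages))            # → 3 (pages of 100, 100, 50)
print(sum(map(len, pages)))  # → 250 (all rows retrieved)
```

In Hop itself the loop would typically be built with a workflow plus pipeline parameters rather than Python, but the control flow is the same: fetch a page, process it, advance the offset, stop on an empty result.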
-
I'm curious to know if Hop's behavior is different between two types of pipelines: a pipeline with multiple transforms vs. a pipeline with no transforms - just a source and target. I'm running version 2.9.
I've been trying to meter the flow of 10,000 rows of data from a generic JDBC source piped into a SQL Server target. I've set a low fetch size parameter on the source connection string (100 rows), a pipeline row set size = 1000, and a commit of 1000 rows in my target.
I was hoping to see batches of 100 rows flow from the source, buffer up to 1,000 rows on the machine running Hop, then push and commit those 1,000 rows to SQL Server.
What (I think) I'm seeing is that all 10,000 rows are read from the source (no rows are buffered) and then sent to the target with a quick succession of commits every 1000 rows. I'm also seeing the pipeline taking a long time to go from "idle" to "running".
Do I need to look more into my JDBC documentation (maybe I'm not pulling back only 100 rows at a time) or does Hop not use the buffer when the pipeline is source-straight-to-target?
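For reference, the metering described above (fetch 100 rows at a time from the source, commit every 1,000 in the target) can be sketched outside Hop to show the two knobs are independent. This is a minimal Python illustration using SQLite for both ends; the table names, sizes, and the `fetchmany` mechanism are assumptions standing in for the JDBC fetch size, not a description of Hop's internals:

```python
import sqlite3

FETCH_SIZE = 100    # stands in for the fetch size on the source connection string
COMMIT_SIZE = 1000  # stands in for the commit size on the target transform

# Source with 10,000 rows, mirroring the scenario in the question.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT)")
source.executemany("INSERT INTO src (val) VALUES (?)",
                   [(f"row-{i}",) for i in range(10_000)])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE tgt (id INTEGER, val TEXT)")

commits = 0
pending = 0
cur = source.execute("SELECT id, val FROM src")
while True:
    batch = cur.fetchmany(FETCH_SIZE)  # pull only 100 rows per round trip
    if not batch:
        break
    target.executemany("INSERT INTO tgt VALUES (?, ?)", batch)
    pending += len(batch)
    if pending >= COMMIT_SIZE:         # commit once 1,000 rows have accumulated
        target.commit()
        commits += 1
        pending = 0
if pending:                            # flush any final partial batch
    target.commit()
    commits += 1

print(commits)  # → 10 commits for 10,000 rows
total = target.execute("SELECT COUNT(*) FROM tgt").fetchone()[0]
print(total)    # → 10000
```

If all 10,000 rows arrive before the first commit fires, the likely culprit is the driver ignoring the fetch-size hint (some JDBC drivers require a specific URL property or result-set type before they stream), which is worth checking in the driver documentation before concluding Hop is skipping its buffer.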