Skip to content

feat(ddl): disallow inserting by position, require source columns to be subset of target columns#11776

Open
NickCrews wants to merge 1 commit intoibis-project:mainfrom
NickCrews:insert-only-by-name
Open

feat(ddl): disallow inserting by position, require source columns to be subset of target columns#11776
NickCrews wants to merge 1 commit intoibis-project:mainfrom
NickCrews:insert-only-by-name

Conversation

@NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Nov 21, 2025

After slowly picking apart the prerequisites for this PR, I think it is finally focused enough to actually tackle! Resolves #11731

This is breaking, but probably not for a huge fraction of users (???). Currently, on main, this works, aligning by position:

con.create_table("table", {"a": "int64", "b": "int64"})
con.insert("table", [{"a bogus column name": 4, "foobar": 5}])

After this PR, the above errors, which I think it is good.

Currently, on main, this (inserting data with anonymous column names) works, and after this PR it will continue to work. Originally in this PR I dropped support for this, hence you can see some comments in the thread below. But after that discussion we decided to keep supporting this.

con.insert("table", [(1,2), (3,4)])

A summary of the changes are

  • refactoring memtable() to have an internal variant, _memtable(), that ALSO provides analysis of whether the given data had column names vs was unnamed. eg [(1,2), (3,4)] is unnamed. The public API doesn't change, but we use this internal version to infer the columns during con.insert("table", [(1,2), (3,4)])
  • moved the test for the above from the duckdb backend to a more general test file so all backends get the test
  • added a _ensure_table_to_insert() utility method to the base SQL backend that all subclasses can use so they can share the verification and coercion logic.

@github-actions github-actions bot added tests Issues or PRs related to tests sql Backends that generate SQL labels Nov 21, 2025
@NickCrews NickCrews added the breaking change Changes that introduce an API break at any level label Nov 21, 2025
@NickCrews NickCrews changed the title feat(ddl): disallow inserting by position, require source columns to be ubset of target columns feat(ddl): disallow inserting by position, require source columns to be subset of target columns Nov 21, 2025
@NickCrews
Copy link
Contributor Author

@deepyaman this is related to your upsert PR, if you want to give it a look. I don't think there is any ordering needed between the two, but I do think it would be great if this PR was included in the same release as that PR, so that the .upsert() API never goes through a breaking change.

@NickCrews NickCrews force-pushed the insert-only-by-name branch from d67de9c to 6fe21f7 Compare January 25, 2026 20:38
@github-actions github-actions bot added the duckdb The DuckDB backend label Jan 25, 2026
@NickCrews NickCrews force-pushed the insert-only-by-name branch 2 times, most recently from c9dd811 to d74fdf0 Compare January 25, 2026 21:04
@github-actions github-actions bot added sqlite The SQLite backend clickhouse The ClickHouse backend snowflake The Snowflake backend labels Jan 26, 2026
@NickCrews NickCrews force-pushed the insert-only-by-name branch from da91ef7 to 036ecea Compare January 26, 2026 05:22
assert t.count().execute() == 2

con.insert(name, [(1,), (2,)])
assert t.count().execute() == 4
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually is still supported if we keep the second commit, I just forgot to un-delete it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now tested across all backends in test_insert_overwrite_from_list() in ibis/backends/tests/test_client.py

@NickCrews
Copy link
Contributor Author

@cpcloud can I get confirmation this is the right idea before I do the rebase and fixup the tests?

@NickCrews NickCrews force-pushed the insert-only-by-name branch from 036ecea to 1b324bb Compare February 4, 2026 03:04
@github-actions github-actions bot added the impala The Apache Impala backend label Feb 4, 2026
Copy link
Member

@gforsyth gforsyth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @NickCrews -- I think this is a good change/fix. I DO think we should keep positional inserts for tables without column names -- it does make everything messier, but it's a nice convenience.

I haven't taken a deep reviewing pass on this yet, but I think this is the right approach.

@NickCrews
Copy link
Contributor Author

Thanks for the reviews @gforsyth and @deepyaman , I'll get this fixed up per your suggestions!

@NickCrews
Copy link
Contributor Author

This PR has gotten quite large. If you want, I can split it up a bit. I think the lowest hanging fruit would be to fix the impala DDL compiler first in a separate PR, that doesn't require any of the .insert() changes.

Then, potentially could split out the typing changes, adding the IntoMemtable TypeAlias.

Then, could actually get to the .insert() stuff.

@NickCrews
Copy link
Contributor Author

@gforsyth and @deepyaman this is ready for your review (or telling me to split it into smaller PRs) whenever you get some time. Thanks! I'm also happy to video chat about any of this if that is easier at all.

@gforsyth
Copy link
Member

Hey @NickCrews -- I think your suggestion to break up this PR is a good one.

gforsyth pushed a commit that referenced this pull request Feb 23, 2026
…fiers and strings (#11942)

Before, it was inserting by position of the passed in table.

I started this in #11776, but
then that got too big, so this is just this bugfix for impala. Now the
groundwork is laid for the rest of that PR.

@gforsyth, this is ready for review as soon as it passes CI. Thanks!
@NickCrews NickCrews marked this pull request as draft February 25, 2026 17:44
@NickCrews NickCrews marked this pull request as ready for review February 26, 2026 18:10
[
param([{"a": 1, "b": 2}], [{"b": 22, "a": 11}], id="column order reversed"),
param([{"a": 1, "b": 2}], [{"a": 11, "b": 22}], id="column order matching"),
param(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is now tested in test_insert_overwrite_from_list

@NickCrews
Copy link
Contributor Author

Boom, finally this is simplified and ready to go, whenever you get the chance @gforsyth @deepyaman @cpcloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking change Changes that introduce an API break at any level clickhouse The ClickHouse backend duckdb The DuckDB backend impala The Apache Impala backend snowflake The Snowflake backend sql Backends that generate SQL sqlite The SQLite backend tests Issues or PRs related to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(breaking): require columns in .insert() to be a subset of the destination columns

3 participants