Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net: Change Sqlite connector to accept connection string instead of DbConnection #10550

Open
wants to merge 1 commit into
base: feature-vector-data-preb1
Choose a base branch
from

Conversation

roji
Copy link
Member

@roji roji commented Feb 14, 2025

  • I believe that IVectorStore and IVectorStoreRecordCollection should have clear guarantees around thread-safety/concurrency, which should be clearly documented etc. For example, we shouldn't have one implementation/instance of IVectorStoreRecordCollection which is thread-safe, and another which isn't (in the way that SqliteVectorStoreRecordCollection is, by closing over SqliteConnection, which itself isn't threasd-safe).
  • If we agree on the above, I believe the MEVD abstractions should be thread-safe; IVectorStore and IVectorStoreRecordCollection conceptually represent/reference database-side entities (the database itself, a collection within the database), and not e.g. a connection to the database (like DbConnection); so an instance doesn't represent something that's client-side stateful (again, like a connection). Also, in most databases, the client is thread-safe, representing simply a way to execute calls (e.g. via gRPC) on the database.
  • The proposed implementation is very similar to the PostgreSQL one - a (pooled) connection is retrieved and opened for each operation, and then closed. Note that this means that Data Source=:memory: cannot be used, since the databases only lives while the connection is open, and disappears immediately when closed.
  • Unlike the PostgreSQL implementation, there's no SqliteDataSource like there is NpgsqlDataSource (but I hope there will be at some point - see #30991); there's a DbDataSource abstraction in ADO.NET which we could theoretically accept, but that's not accepted usage, and in any case, we require a SqliteConnection (not DbConnection) in order to e.g. call LoadExtension(), and so would need to down-cast DbConnections coming out of DbDataSource in any case. As a result, I'm proposing that for now, the Sqlite connector simply accept a connection string, which is the natural/common way for ADO.NET-related things to work. In the future, when a SqliteDataSource is introduced, we can add support for that like we have for NpgsqlDataSource.
  • I started out trying to preserve (but obsolete) the ability to accept SqliteConnection (to avoid the breaking change), but it turned out quite complicated, requiring lots of refactoring just for that, and a bit of risk of getting it wrong. Since this would be a temporary, two-month obsoletion only, and this is a SQLite-only break, I'm proposing to break this now rather than do all the effort and take on the risk for the two months we have. So I've obsoleted the relevant APIs with [Obsolete(error: true)], to have an informative error message for the user.

Closes #10454

/cc @westey-m @dmytrostruk @adamsitnik

@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel memory labels Feb 14, 2025
@github-actions github-actions bot changed the title Change Sqlite connector to accept connection string instead of DbConnection .Net: Change Sqlite connector to accept connection string instead of DbConnection Feb 14, 2025
@@ -60,24 +43,9 @@ public static IServiceCollection AddSqliteVectorStore(
string connectionString,
SqliteVectorStoreOptions? options = default,
string? serviceId = default)
{
services.AddKeyedTransient<IVectorStore>(
=> services.AddKeyedSingleton<IVectorStore>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the change from transient to singleton lifecycle here - this aligns with other connectors.

@roji roji marked this pull request as ready for review February 15, 2025 19:24
@roji roji requested a review from a team as a code owner February 15, 2025 19:24
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my question, I would like to get a better understanding of the connection pooling first.

}

#endregion

#region private

private async ValueTask<SqliteConnection> GetConnectionAsync(CancellationToken cancellationToken = default)
{
var connection = new SqliteConnection(this._connectionString);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does SqliteConnection type implements it's own private pool of connections? So we don't actually open a new connection every time this method is being called?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel Issues or pull requests impacting the core kernel memory .NET Issue or Pull requests regarding .NET code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants