Releases: dolthub/dolt
1.39.2
Merged PRs
dolt
- 7930: Bump mysql2 from 3.9.7 to 3.9.8 in /integration-tests/mysql-client-tests/node
Bumps mysql2 from 3.9.7 to 3.9.8.
- 7929: dolt fetch default spec from empty repo should return silently
Git fetch returns without error when you fetch the default refspec. When you fetch a specific ref you get an error. Dolt now matches this behavior.
Fixes: #7928
- 7925: apply filter-branch changes to working/staged changes
This PR adds support for a --apply-to-uncommitted option to dolt filter-branch, which applies the filter-branch changes to the working and staged roots.
fixes #7902
- 7923: [dsess] Cache checks lookup for TPC-C update
- 7922: [writer] skip more deserialization steps in getTableWriter
- 7900: prevent dolt filter branch when it would overwrite unchecked branch's working set
Turns out other branches can have working sets, and dolt filter-branch would drop those. This PR prevents that from happening.
Adding tests for comments I missed here: #7895
- 7898: Added workflow for checking DoltgreSQL
This adds a new workflow that runs a subset of the tests in DoltgreSQL to check for any major integration errors. The workflow does not fail if errors are encountered. Instead, it creates a comment stating that failures were found. If no failures were found, then no comment is made.
- 7892: dolt admin archive
This hidden admin command will convert the table files in oldgen into archive files, then update the manifest so that you can run queries against the archive for performance testing. Currently we assume that dolt gc has been run immediately prior to using this command.
After the build is complete, we look up every chunk in the archive using the index of the originating table file. We then verify each chunk's key checks out. If this verification fails, the command exits with status 1.
Lots of rough edges still:
- Currently no feedback as the build progresses. This is annoying because it can take a fair amount of time.
- ChunkSource interface is single threaded, so getMany and hasMany are not going to perform well.
- Lacking checks to ensure that the server isn't running and we have the LOCK on oldgen.
- No bats tests, and this is kind of a temporary thing. There are go tests on key bits.
- 7863: Use the search path to resolve table names in Doltgres
Doltgres enables the UseSearchPath global at startup, which triggers this behavior.
This is a shim to get a proof of concept of this behavior working faster. A better solution, coming next, involves making this behavior pluggable and putting this logic in the Doltgres package, not in Dolt.
Companion PRs:
dolthub/go-mysql-server#2498
dolthub/doltgresql#269
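As a rough illustration of the search-path behavior described in 7863 above, the following sketch resolves an unqualified table name by walking a list of schemas in order. It is not Dolt's or Doltgres's implementation: `CATALOG`, `resolve_table`, and the schema names are hypothetical, chosen only to show the lookup order.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical catalog for illustration: schema name -> table names it contains.
CATALOG: Dict[str, Set[str]] = {
    "public": {"users", "orders"},
    "audit": {"events", "users"},
}


def resolve_table(name: str, search_path: List[str]) -> Optional[Tuple[str, str]]:
    """Return (schema, table) for `name`, or None if it cannot be resolved.

    A qualified name ("schema.table") bypasses the search path. An
    unqualified name resolves to the first schema on the path that
    contains a table with that name.
    """
    if "." in name:
        schema, table = name.split(".", 1)
        return (schema, table) if table in CATALOG.get(schema, set()) else None
    for schema in search_path:
        if name in CATALOG.get(schema, set()):
            return (schema, name)
    return None
```

With the path `["audit", "public"]`, the name `users` resolves to `audit.users` because `audit` is searched first, while `orders` falls through to `public.orders`.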
go-mysql-server
- 2520: Default sql mode for common path
Bit strange & verbose, but has a noticeable effect for small queries.
perf here: #7915
- 2519: IndexedTableAccess gets indexing fast path
- 2518: Short circuit for update/delete
Simple updates and deletes skip most of analysis.
perf here: #7907
- 2517: Improve correctness and error messages for JSON functions.
MySQL doesn't do this and neither should we.
MySQL:
    mysql> select JSON_INSERT("null", "$.a", 1);
    +-------------------------------+
    | JSON_INSERT("null", "$.a", 1) |
    +-------------------------------+
    | null                          |
    +-------------------------------+
    1 row in set (0.00 sec)
    mysql> select JSON_INSERT("null", "$.a", 1) is null;
    +---------------------------------------+
    | JSON_INSERT("null", "$.a", 1) is null |
    +---------------------------------------+
    |                                     0 |
    +---------------------------------------+
The only time we should be coercing a JSON-null document into SQL-null is for JSON_EXTRACT (for paths other than "$") and JSON_VALUE (for all paths). But these are already handled separately.
- 2515: Zachmu/schemas2 merge
- 2513: Added workflows for checking integrators
This adds a new workflow that runs a subset of tests in Dolt and DoltgreSQL to check for any major integration errors. The workflows do not fail if errors are encountered. Instead, they'll create a comment stating which projects had failures. If no failures were found, then no comment is made.
- 2498: New interfaces for resolving table names for databases with schemas
This is a proof of concept to get schema resolution working quickly, and I'm not super happy with the separation of concerns. A better solution would implement table name resolution in the Catalog directly, rather than in the integrator. That effort is significantly hindered by the Catalog being a concrete analyzer implementation with many analyzer-specific details that can't be easily substituted for another implementation. The longer term plan is to perform the extensive refactoring necessary to make the relevant parts of the Catalog swappable, rather than (effectively) having to swap only DatabaseProvider and friends.
Closed Issues
- 7902: filter-branch option to apply query to WORKING and STAGED roots
- 7928: CLI dolt fetch <remote> failed to use the default ref spec
- 7897: Pomelo Entity Framework connector is not able to commit changes
- 7909: [Question] How to init Dolt database programatically?
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.11 | 2.97 | 1.4 |
groupby_scan | 13.22 | 17.32 | 1.3 |
index_join | 1.34 | 5.18 | 3.9 |
index_join_scan | 1.27 | 2.18 | 1.7 |
index_scan | 33.72 | 52.89 | 1.6 |
oltp_point_select | 0.17 | 0.5 | 2.9 |
oltp_read_only | 3.36 | 8.13 | 2.4 |
select_random_points | 0.32 | 0.8 | 2.5 |
select_random_ranges | 0.38 | 0.95 | 2.5 |
table_scan | 34.33 | 54.83 | 1.6 |
types_table_scan | 73.13 | 137.35 | 1.9 |
reads_mean_multiplier | 2.2 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.21 | 0.8 |
oltp_insert | 3.75 | 3.07 | 0.8 |
oltp_read_write | 8.43 | 15.0 | 1.8 |
oltp_update_index | 3.82 | 3.19 | 0.8 |
oltp_update_non_index | 3.82 | 3.13 | 0.8 |
oltp_write_only | 5.37 | 6.55 | 1.2 |
types_delete_insert | 7.7 | 6.91 | 0.9 |
writes_mean_multiplier | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 101.2 | 25.57 | 4.1 |
tpcc_tps_multiplier | 0.3 |
Overall Mean Multiple | 1.17 |
---|
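The latency figures in the tables above are consistent with a simple reading: each read or write Multiple is the Dolt value divided by the MySQL value, rounded to one decimal, and the Overall Mean Multiple is the mean of the three summary multipliers. The sketch below checks that reading against a few published numbers; the function and variable names are illustrative only, and the rounding rule is an assumption inferred from the figures.

```python
from statistics import mean


def multiple(mysql: float, dolt: float) -> float:
    """Latency multiple: how many times slower (or faster) Dolt is than MySQL."""
    return round(dolt / mysql, 1)


# Rows taken from the 1.39.2 read tests above.
covering_index_scan = multiple(2.11, 2.97)
index_join = multiple(1.34, 5.18)

# Summary multipliers published for 1.39.2.
summary = {
    "reads_mean_multiplier": 2.2,
    "writes_mean_multiplier": 1.0,
    "tpcc_tps_multiplier": 0.3,
}
overall_mean_multiple = round(mean(summary.values()), 2)

print(covering_index_scan, index_join, overall_mean_multiple)
```

A multiple below 1.0, as in most of the write tests, means Dolt's measured latency was lower than MySQL's for that test.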
1.39.1
Merged PRs
dolt
- 7901: Zachmu/schemas2 merge
- 7899: Properly add database collation change when using -a option in dolt commit
This PR fixes a case where we don't properly handle database collation changes with the -a option in dolt commit.
fixes #7897
- 7888: [sort] index build streams sorted edits
Use sorting to skip more steps building a prolly map. Shaves maybe 20-25% off of external index rebuilds.
This also fixes a bug where we were incorrectly using only the prefix descriptor to sort secondary index keys.
go-mysql-server
- 2512: Spooling shortcut for one/zero return schemas
Nodes that return zero or one row don't need a beefy channel/wait group setup to execute. They just need to grab the first row and close the iterator. There are several nodes that incorrectly reported their schemas previously, which I've updated to be more accurate. There are some nodes that optionally return rows, which I've simplified to return an empty schema that can be differentiated from the nil schema. We could make the distinction more explicit, also.
bump with perf here: #7894
vitess
- 348: Allowing caching plugin to be specified in string quotes
The CREATE USER ... IDENTIFIED WITH syntax (MySQL ref) allows the caching plugin to be specified in string quotes, but our parser only supported identifier quotes.
This came up as part of binlog replication testing: MySQL was sending a CREATE USER statement from the primary to a Dolt replica, but Dolt wasn't able to parse the statement because of the use of string quotes around the caching plugin name.
- 347: Added InjectedStatement
This is the same as InjectedExpr, except for statements instead of expressions.
Closed Issues
- 7891: filter-branch destroys working and staged roots
- 7897: Pomelo Entity Framework connector is not able to commit changes
- 7890: Pomelo Entity Framework connector is not able to recreate database.
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 2.97 | 1.4 |
groupby_scan | 13.22 | 17.32 | 1.3 |
index_join | 1.34 | 5.28 | 3.9 |
index_join_scan | 1.27 | 2.22 | 1.7 |
index_scan | 34.33 | 53.85 | 1.6 |
oltp_point_select | 0.17 | 0.51 | 3.0 |
oltp_read_only | 3.36 | 8.28 | 2.5 |
select_random_points | 0.33 | 0.81 | 2.5 |
select_random_ranges | 0.39 | 0.97 | 2.5 |
table_scan | 34.33 | 55.82 | 1.6 |
types_table_scan | 74.46 | 137.35 | 1.8 |
reads_mean_multiplier | 2.2 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.21 | 0.8 |
oltp_insert | 3.75 | 3.07 | 0.8 |
oltp_read_write | 8.43 | 15.27 | 1.8 |
oltp_update_index | 3.82 | 3.25 | 0.9 |
oltp_update_non_index | 3.82 | 3.19 | 0.8 |
oltp_write_only | 5.37 | 6.79 | 1.3 |
types_delete_insert | 7.7 | 7.04 | 0.9 |
writes_mean_multiplier | 1.0 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 101.76 | 24.53 | 4.4 |
tpcc_tps_multiplier | 4.4 |
Overall Mean Multiple | 2.53 |
---|
1.39.0
Merged PRs
dolt
- 7895: prevent filter-branch when there are local changes
This PR changes filter-branch to detect any local changes so working/staged changes aren't lost.
A future PR should include changes to have the working set just be applied over the result of dolt filter-branch.
partially addresses: #7891
go-mysql-server
- 2512: Spooling shortcut for one/zero return schemas
Nodes that return zero or one row don't need a beefy channel/wait group setup to execute. They just need to grab the first row and close the iterator. There are several nodes that incorrectly reported their schemas previously, which I've updated to be more accurate. There are some nodes that optionally return rows, which I've simplified to return an empty schema that can be differentiated from the nil schema. We could make the distinction more explicit, also.
bump with perf here: #7894
- 2511: Adding mapping to error code 1049 for ErrDatabaseNotFound errors
When a database doesn't exist, MySQL returns error code 1049. This change adds a mapping to error code 1049 for ErrDatabaseNotFound errors, and updates our handler so that ComInitDB messages will map errors to MySQL error codes.
This is needed because tooling (e.g. Pomelo EntityFramework MySQL library) can rely on this error code in application logic.
Related to #7890
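To make the error-code mapping in 2511 concrete, here is a minimal sketch of the idea: internal error kinds are translated to MySQL error numbers before being sent to the client, so that client libraries can branch on the number. This is not go-mysql-server's code. The exception classes, the `ERROR_CODES` table, and the choice of 1105 (MySQL's ER_UNKNOWN_ERROR) as the fallback are assumptions made for the example; only the 1049 mapping comes from the notes above.

```python
ER_BAD_DB_ERROR = 1049   # MySQL's "Unknown database" error number
ER_UNKNOWN_ERROR = 1105  # generic MySQL error number, assumed here as the fallback


class DatabaseNotFound(Exception):
    """Hypothetical stand-in for the ErrDatabaseNotFound error kind."""


# Hypothetical lookup table from internal error type to MySQL error number.
ERROR_CODES = {
    DatabaseNotFound: ER_BAD_DB_ERROR,
}


def mysql_error_code(err: Exception) -> int:
    """Return the MySQL error number a client should see for `err`."""
    for exc_type, code in ERROR_CODES.items():
        if isinstance(err, exc_type):
            return code
    return ER_UNKNOWN_ERROR
```

A client that recreates a missing database, as the Pomelo connector reportedly does, can then test for 1049 instead of parsing the error message.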
Closed Issues
- 7890: Pomelo Entity Framework connector is not able to recreate database.
1.38.3
Merged PRs
dolt
- 7884: [tree] return blob builders to pool after use
I added a builder pool and never returned the objects; this adds the Put().
- 7882: Bug fix: no-op dolt_pull() was leaving working set dirty
Customer-reported bug. Two dolt_pull() operations on two branches in the same session when local branches are already up to date, with @@autocommit off, leave the session unable to commit because two branch heads are considered dirty. See new bats test for details on reproducing.
The issue is that DoltSession.SetWorkingSet() marks that branch head dirty until the transaction is committed. Most merge code paths used by pull involve performing a dolt_commit(), which has the side effect of zeroing out the current transaction, meaning the next statement would get a new transaction and fresh working sets loaded from disk, avoiding the dirty state problem. Only the code path where the branch head is already up to date is affected by this bug. All the merge library code that actually needs to call DoltSession.SetWorkingSet() (only necessary before a dolt_commit happens, or in the case of a squash where changes should remain in the working set) already does so, making the additional call in dolt_pull.go redundant and leading to this buggy behavior in the no-change case.
There are probably still related bugs for session state management during pull and merge operations, but I want to keep this fix narrow to address the customer issue while I build up more robust (non-bats) tests for pull.
- 7878: Move sql patch statement generation APIs to the sqlfmt package
We have a few different APIs scattered around for generating SQL patch statements. I needed to make some functions from dolt_patch_table_function.go public to generate DDL statements for binlog support, so I moved them into the sqlfmt package and cleaned up some package import cycles along the way.
- 7872: Various test utils and small fixes
As part of the work for binlog source support (on fulghum/binlog_prototype branch), these are various smaller changes to tidy up docs, packaging, small bug fixes, and add new test utils that I've pulled out into this PR to review separately.
Notable changes:
- Adds the third version component to the go version in our go.mod. Two component versions indicate a development version, not a release version and cause an error about not being able to download a toolchain.
- Allows Dolt binlog replicas to accept the SOURCE_AUTO_POSITION config parameter, and errors if a user attempts to disable GTID auto positioning.
- Adds several new test util functions specific to binlog testing.
- 7870: go/utils/publishrelease: Bump MUSL toolchains used for cutting releases.
The new toolchain uses MUSL + mimalloc.
Include the mimalloc license in our released LICENSES notice.
- 7859: Cache table and schema indexes on schema address
The bulk of ~1ms read and write TPC-C queries benefit from caching table and index schemas, which have a lifecycle between schema migrations/alter statements/new table additions. This is in contrast to how we've typically cached objects using the root value hash, which is great for read-only workflows, but has a much shorter half-life.
go-mysql-server
- 2511: Adding mapping to error code 1049 for ErrDatabaseNotFound errors
When a database doesn't exist, MySQL returns error code 1049. This change adds a mapping to error code 1049 for ErrDatabaseNotFound errors, and updates our handler so that ComInitDB messages will map errors to MySQL error codes.
This is needed because tooling (e.g. Pomelo EntityFramework MySQL library) can rely on this error code in application logic.
Related to #7890
- 2510: Fix race errors with memory tables
We use this library for running our tests. These are run with the -race flag, and we are seeing some errors related to concurrency and updating of the tables map.
I've added a sync.Mutex to all the places where this map is updated - our tests are now passing :)
- 2504: Added InjectedStatement as an AST node
This is the same as InjectedExpr, except for statements instead of expressions.
- 2502: Use Uint32 for SEQ_IN_INDEX in 'SHOW INDEXES' queries.
This is seemingly the correct type for this field.
MySQL Connector/NET expects this for servers >8.0.1: https://github.com/mysql/mysql-connector-net/blob/8.4.0/MySQL.Data/src/SchemaProvider.cs#L298-L300
Fixes dolthub/go-mysql-server#2501
vitess
- 347: Added InjectedStatement
This is the same as InjectedExpr, except for statements instead of expressions.
- 346: support DATE, TIME, and TIMESTAMP literal parsing
The SQL standard has special syntax for parsing date, time, and timestamp literals.
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-literals.html
This PR adds support for that.
Code was mostly taken from vitessio.
The types are still left as string types, as type conversion later on handles it just fine.
- 345: parse type aliases in cast
add support for statements like:
select cast(<str> as character)
select cast(<str> as double precision)
select cast(<str> as real)
Closed Issues
1.38.2
Merged PRs
dolt
- 7880: Migrate dolt remote to SQL
Fixes: #7622
- 7879: avoid NewEmptyIndex when table does not exist
It would be better to have NewEmptyIndex not write to chunkstore, but I'm not sure if it's possible right now.
Workaround is to just avoid calling it altogether in this particular case.
Closed Issues
- 7622: Migrate dolt remote to SQL
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 3.02 | 1.5 |
groupby_scan | 13.22 | 17.32 | 1.3 |
index_join | 1.34 | 5.18 | 3.9 |
index_join_scan | 1.27 | 2.18 | 1.7 |
index_scan | 35.59 | 53.85 | 1.5 |
oltp_point_select | 0.17 | 0.51 | 3.0 |
oltp_read_only | 3.36 | 8.28 | 2.5 |
select_random_points | 0.33 | 0.8 | 2.4 |
select_random_ranges | 0.39 | 0.95 | 2.4 |
table_scan | 35.59 | 55.82 | 1.6 |
types_table_scan | 75.82 | 137.35 | 1.8 |
reads_mean_multiplier | 2.1 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.67 | 0.8 |
oltp_insert | 3.75 | 3.25 | 0.9 |
oltp_read_write | 8.43 | 15.83 | 1.9 |
oltp_update_index | 3.82 | 3.49 | 0.9 |
oltp_update_non_index | 3.82 | 3.43 | 0.9 |
oltp_write_only | 5.37 | 7.56 | 1.4 |
types_delete_insert | 7.7 | 7.56 | 1.0 |
writes_mean_multiplier | 1.1 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 101.66 | 22.89 | 4.4 |
tpcc_tps_multiplier | 4.4 |
Overall Mean Multiple | 2.53 |
---|
1.38.1
Merged PRs
dolt
- 7877: keep sql.Schema for conflicts table schema
Previously, we would create a new schema of NomStringKind for every column. Now, we just reuse the underlying sql.Schema.
#7874
- 7876: Improve error messages for CLI commands when a sql-server is running
Related to #7873
Resolves #7875
- 7866: update release process for new config refactor
- 7864: move config
- 7862: Bug fix: sql-server should initialize persisted global vars
The local config store (.dolt/config.json) can store persisted global variable values, but when --data-dir is used when starting a sql-server, the local configuration doesn't get loaded properly.
- 7860: Bug fix: load local config when using --data-dir
When using the --data-dir flag to work on a Dolt directory outside of the current working directory, the local configuration in the Dolt directory wasn't getting correctly loaded. This change evaluates the --data-dir parameter earlier, so that the first time we load the Dolt environment, we can pass the data directory and get the local configuration loaded correctly.
- 7858: [nbs] safer peek root hash record
- 7848: Added additional function to RootValue
This just adds a function to the RootValue for special merge logic, which is used by Doltgres.
- 7846: [dsess] session trigger cache
go-mysql-server
- 2499: fix LIKE NULL edge case
This PR fixes an edge case where SELECT <str> LIKE NULL should return NULL instead of false.
- 2497: trim whitespace when converting strings to numbers
fixes #7854
- 2495: fix panic in VALUES constructor
When the number of rows in a ... VALUES ROW(...), ROW(...) statement were not equal, we would throw a panic.
This PR also unskips some tests that are now fixed.
Companion PR: #6849
fixes: #6849
- 2494: Replace count star also matches single column pk
- 2493: Implement status variables for Slow_queries, Max_used_connections, Com_select, and Connections
Adds support for four new status variables:
- Slow_queries
- Max_used_connections
- Com_select
- Connections
Note that Connections currently only reports the successful connection attempts, but MySQL includes all connection attempts in that status variable. To capture the failed attempts, we'll need to expose that information from the Vitess layer.
Also removes a mutex that was covering the whole scope over all status variables. Now that each individual status variable has a value that uses an atomic instance, we don't need to synchronize at a larger scope.
Related to #7646
- 2492: skip source values analyze when it only contains simple types
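The LIKE NULL fix in 2499 above follows SQL's three-valued logic: if either operand is NULL, the result is NULL rather than false. The sketch below models that rule with Python's `None` standing in for SQL NULL. The `sql_like` helper is hypothetical and deliberately simplified: it handles only the `%` and `_` wildcards and ignores escape characters and collation, which a real engine must also consider.

```python
import re
from typing import Optional


def sql_like(value: Optional[str], pattern: Optional[str]) -> Optional[bool]:
    """Simplified LIKE: returns None (NULL) if either operand is NULL."""
    if value is None or pattern is None:
        return None
    # Translate the LIKE pattern into a regular expression:
    # % matches any run of characters, _ matches exactly one character.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None
```

The distinction matters to callers because `NOT (x LIKE NULL)` must also be NULL, whereas negating a plain false would yield true.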
vitess
- 345: parse type aliases in cast
add support for statements like:
select cast(<str> as character)
select cast(<str> as double precision)
select cast(<str> as real)
- 344: make row optional in VALUES constructor and insert statement
This PR adds additional syntax support for VALUE constructor.
fixes #6849
fixes #7853
- 338: Add a schema qualifier to table names
Closed Issues
- 7873: Running sql-server from an empty state make inconsistent repository
- 7874: Failed to write conflicts table
- 7875: Confusing error messages when using Dolt CLI from within a running Dolt sql-server directory
- 7854: trim whitespace when casting from string
- 7853: make ROW keyword optional in VALUES statement
- 6849: support INSERT INTO ... (VALUES ROW(...)) statement
- 7845: Make sure deleting a branch behaves similarly to drop database
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 3.02 | 1.5 |
groupby_scan | 13.22 | 17.63 | 1.3 |
index_join | 1.34 | 5.18 | 3.9 |
index_join_scan | 1.27 | 2.22 | 1.7 |
index_scan | 34.33 | 54.83 | 1.6 |
oltp_point_select | 0.17 | 0.51 | 3.0 |
oltp_read_only | 3.36 | 8.43 | 2.5 |
select_random_points | 0.32 | 0.8 | 2.5 |
select_random_ranges | 0.39 | 0.95 | 2.4 |
table_scan | 34.33 | 55.82 | 1.6 |
types_table_scan | 73.13 | 137.35 | 1.9 |
reads_mean_multiplier | 2.2 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.67 | 0.8 |
oltp_insert | 3.75 | 3.25 | 0.9 |
oltp_read_write | 8.28 | 15.83 | 1.9 |
oltp_update_index | 3.82 | 3.49 | 0.9 |
oltp_update_non_index | 3.82 | 3.43 | 0.9 |
oltp_write_only | 5.28 | 7.56 | 1.4 |
types_delete_insert | 7.56 | 7.56 | 1.0 |
writes_mean_multiplier | 1.1 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 101.35 | 22.45 | 4.9 |
tpcc_tps_multiplier | 4.9 |
Overall Mean Multiple | 2.73 |
---|
1.38.0
Merged PRs
This minor release includes a new entry in the dolt_status and dolt_diff system tables for database collation changes, making these tables backwards incompatible for some select statements. Changes to a dolt database collation will show up as table changes with the name __DATABASE__<db>. Additionally, tables starting with this prefix are not allowed.
dolt
- 7823: handle database charset/collation changes
This PR makes dolt aware of database collation changes.
We treat database collation changes similarly to a collation change to a table.
To properly show a dolt diff we need to add support for show create database as of ..., which would require changes to vitess and gms. For now, we just show the new create statement.
Additionally, we should add support to resolve database collation merge conflicts.
Affected functions are:
dolt add
dolt commit
dolt status
dolt diff
dolt merge
Addresses: #7815
- 7819: use parser interface in engine
- 7803: Avoid escaping HTML characters when displaying them to the user.
This fixes an issue where if a JSON document in the storage layer contains escaped characters, those escape sequences could end up being displayed to the user via the dolt sql -r json command.
- 7764: Bump golang.org/x/net from 0.17.0 to 0.23.0 in /go
Bumps golang.org/x/net from 0.17.0 to 0.23.0.
go-mysql-server
- 2492: skip source values analyze when it only contains simple types
- 2491: ValidateInsertColumns avoids allocating hash map
- 2490: Avoid escaping HTML when Marshalling JSON
Due to a misconfiguration, HTML characters were being escaped when marshalling JSON. This is unnecessary, and since we now potentially display marshalled JSON to the user, we shouldn't be doing this.
- 2488: System Variables: Add log_bin and change the default of performance_schema
The log_bin system variable controls whether a MySQL server logs to the binary log or not.
The performance_schema system variable was previously defaulted to 1, to match MySQL's default, but this can cause tools (e.g. Datadog) to believe that the performance_schema system tables are available, and then error out when trying to query them. Since we don't provide a performance_schema database, the new default for the performance_schema system variable is 0.
- 2487: Expand literals in comparisons when safe
- 2486: add parser interface in engine
This PR creates the sql.Parser interface. This interface is defined in the engine and should be used rather than using the MySQL parser directly.
Added a GlobalParser variable to expose the Doltgres parser for parsing view definitions for now. It can also be used in places that need Doltgres-specific syntax parsing.
Closed Issues
1.37.0
The previous (now deleted) release 1.36.1 had a start up time issue for databases > 10GB. We patched it with this one. That release was only up for an hour or so, so it is unlikely anyone got it. Thus, we moved this to 1.37.0 to warn people, just in case.
This minor release includes an internal interface change to the chunk journal index. The first startup process for a database with the old index format will perform a rewrite. This rewrite is a one-time penalty that in testing is <5% of the time it would take to reimport the database.
Merged PRs
dolt
- 7833: Bug fix: Apply replication settings for newly cloned databases
Dolt SQL servers using remote-based replication will pull new databases if the @@dolt_replication_remote_url_template system variable is configured, but those new databases weren't getting configured to continue pulling updates from the remote.
This change registers the newly cloned databases as ReadReplicaDatabase instances, so that they will poll their remote and pull new commits. It also adds some additional logging to help debug issues with remote-based replication.
- 7829: Changed RootValue into an interface
Companion:
- dolthub/doltgresql#232
This changes the RootValue into an interface. Every function that seems unique to Dolt's RootValue has been changed into a function variable, with the variable being overwritten from Doltgres to point to a different function.
- 7799: Archive Serialization and Deserialization
This PR doesn't directly change any Dolt behavior. It just lays the groundwork for archive creation and reading. Currently, no file is materialized by this code as unit tests exercise it with ByteSinks.
- 7780: Reformat journal index
Change the way we write journal index lookups. Each write appends a lookup to a bufio.Writer that lazily writes to disk. And after some increment we flush a CRC/root value record for consistency checking the index during bootstrap. This avoids big stalls for flushing a batch of index records. We also only write an addr16 now, because that's what we load into the default chunk address map.
Databases with the older format will pay a one-time startup penalty to rewrite the journal index. In testing this appears to be 5-10% of the import time for the database.
- 7836: Journal index offset 8bytes
On >10GB datasets, offsets overflow uint32. Bug from previous PR #7780
- 7834: minver refactor to be used by doltgres
- 7821: Prevent panic when dropping columns in schema merge
Fixes #7762
In certain cases, performing a schema merge when the merged schema had fewer columns than the base schema would cause a panic.
We actually had a test for this, but the test was disabled because a limitation in how the test harness generated column tags was causing incorrect detection of merge conflicts.
To re-enable these tests, this PR slightly relaxes the logic for merge conflicts wrt column tags. This is safe to do because column tags shouldn't influence the result of merges outside of helping to identify renamed columns, so long as the merge behaves the same in both directions.
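The offset overflow fixed by 7836 above can be illustrated with fixed-width integer encoding. The sketch below is not Dolt's journal format: the big-endian byte order and the helper names are assumptions for the example. It shows that a byte offset of 10 GB does not fit in an unsigned 32-bit field, what value a truncating conversion would silently produce, and that an 8-byte field round-trips the offset.

```python
import struct

UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def encode_offset_u32(offset: int) -> bytes:
    """Encode an offset in 4 bytes; raises struct.error if it does not fit."""
    return struct.pack(">I", offset)


def encode_offset_u64(offset: int) -> bytes:
    """Encode an offset in 8 bytes."""
    return struct.pack(">Q", offset)


offset = 10 * 10**9  # a byte offset 10 GB into a file

try:
    encode_offset_u32(offset)
    fits_in_u32 = True
except struct.error:
    fits_in_u32 = False

# What a truncating 32-bit conversion would store instead of the real offset.
wrapped = offset & UINT32_MAX

# The 8-byte encoding preserves the value.
round_tripped = struct.unpack(">Q", encode_offset_u64(offset))[0]
```

Python raises on the out-of-range value, whereas a truncating integer conversion in a language with fixed-width types would store the wrapped value without any error, which is why this class of bug tends to surface only on large datasets.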
Closed Issues
- 7762: Panic during schema merge
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.07 | 2.97 | 1.4 |
groupby_scan | 13.22 | 17.63 | 1.3 |
index_join | 1.37 | 5.18 | 3.8 |
index_join_scan | 1.27 | 2.22 | 1.7 |
index_scan | 34.33 | 53.85 | 1.6 |
oltp_point_select | 0.17 | 0.51 | 3.0 |
oltp_read_only | 3.36 | 8.43 | 2.5 |
select_random_points | 0.33 | 0.8 | 2.4 |
select_random_ranges | 0.39 | 0.95 | 2.4 |
table_scan | 34.33 | 54.83 | 1.6 |
types_table_scan | 74.46 | 134.9 | 1.8 |
reads_mean_multiplier | 2.1 |
Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.91 | 0.9 |
oltp_insert | 3.75 | 3.43 | 0.9 |
oltp_read_write | 8.43 | 16.12 | 1.9 |
oltp_update_index | 3.82 | 3.55 | 0.9 |
oltp_update_non_index | 3.82 | 3.43 | 0.9 |
oltp_write_only | 5.37 | 7.84 | 1.5 |
types_delete_insert | 7.7 | 7.56 | 1.0 |
writes_mean_multiplier | 1.1 |
TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 102.12 | 22.29 | 4.6 |
tpcc_tps_multiplier | 4.6 |
Overall Mean Multiple | 2.60 |
---|
1.36.0
This version does not include an interface change, but does include large changes to the performance and network utilization behavior of dolt fetch and related functionality, such as shallow clone and Dolt cluster replication.
Merged PRs
dolt
- 7828: Bug fix: Allow dolt init to use --data-dir param
Previously, dolt init would use the value of --data-dir for almost all of the repository initialization, but the code that set up repository configuration would always use the current directory. This change allows callers to use dolt init with the --data-dir param to initialize directories other than the current working directory as Dolt repositories.
- 7825: Bug fix: Decimal type binlog serialization
- 7824: dolt fetch: Implement pipelined, continuous downloads during pulls from DoltHub and dolt sql-server remotes.
Dolt fetch, pull, shallow clone and cluster replication will now make more aggressive utilization of available network resources.
- 7816: allow database alters and case insensitive check for info schema
fixes: #7814
go-mysql-server
- 2488: System Variables: Add log_bin and change the default of performance_schema
The log_bin system variable controls whether a MySQL server logs to the binary log or not.
The performance_schema system variable was previously defaulted to 1, to match MySQL's default, but this can cause tools (e.g. Datadog) to believe that the performance_schema system tables are available, and then error out when trying to query them. Since we don't provide a performance_schema database, the new default for the performance_schema system variable is 0.
- 2485: Have LazyJSONDocument implement fmt.Stringer and driver.Valuer, in order to interoperate with other go SQL libraries.
Closed Issues
1.35.13
Merged PRs
dolt
- 7818: Apply a factor to better estimate information_schema.TABLES.DATA_LENGTH
information_schema.TABLES.DATA_LENGTH currently reports the max possible table size for a table, and doesn't take into account table file compression or that variable length fields (e.g. TEXT) are not always fully used. Tools such as DBeaver use this metadata to display table sizes, and since the estimates can easily be orders of magnitude greater than the actual size on disk, it can cause customers to be concerned by the reported sizes (e.g. #6624).
As a short-term fix to make these estimates more accurate, we apply a constant factor to the max table size. I came up with this scaling factor by measuring a best case scenario (where no fields are variable length) and a worst case scenario (where all fields are variable length and only use a few bytes), then picking a value roughly in the middle. Longer-term, a better way to estimate table size on disk will be to use statistics data.
- 7810: fix output for dolt diff --stat -r json
This PR tidies up the code for printing diffs, specifically for JSON result format, and prints --stat correctly for JSON result format.
Additionally, we throw an error for SQL result format instead of just returning incorrect output. It might be worth implementing now, but I can just make an issue for it.
fixes: #7800
- 7809: go/libraries/doltcore/sqle/dprocedures: dolt_pull.go: Improve CPU utilization of call dolt_pull.
- 7805: Fix: allow jsonSerializer to load JSON from LazyJSONDocument
- 7804: Changing database init/drop hooks to be a slice of hooks
The Dolt database provider currently has a single init hook and a single drop hook. We have a few hooks, and in order to support multiple hooks, we chain them together. Binlog replication will also need to register a similar init and drop hook to capture database create/drop actions, so to prepare for that, this PR turns the single init hook and single drop hook into a slice of init hooks and a slice of drop hooks.
- 7802: adding `--name-only` option for `dolt diff`
This PR adds support for the `--name-only` option for `dolt diff`, which just prints the tables that have changed between the two commits. This mirrors `git diff --name-only`.
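The idea behind a name-only diff can be sketched in Go as follows. This is a minimal illustration, not Dolt's implementation: the `changedTables` function and the `map[string]string` snapshot (table name to content hash) are hypothetical stand-ins for whatever Dolt uses internally to compare two commits.

```go
package main

import (
	"fmt"
	"sort"
)

// changedTables returns the sorted names of tables whose content hash differs
// between two snapshots, including tables that were added or dropped.
// The snapshot type (table name -> content hash) is a hypothetical stand-in,
// not Dolt's actual root value type.
func changedTables(from, to map[string]string) []string {
	var names []string
	for name, fromHash := range from {
		if toHash, ok := to[name]; !ok || toHash != fromHash {
			names = append(names, name) // dropped or modified
		}
	}
	for name := range to {
		if _, ok := from[name]; !ok {
			names = append(names, name) // added
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	from := map[string]string{"users": "a1", "orders": "b2", "logs": "c3"}
	to := map[string]string{"users": "a1", "orders": "b9", "items": "d4"}
	// Prints one table name per line: items, logs, orders.
	for _, name := range changedTables(from, to) {
		fmt.Println(name)
	}
}
```

Only names are reported, so no row-level comparison is needed once a per-table hash is available.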
fixes: #7797
- 7795: Serialization code for binlog events
Provides support for serializing all Dolt data types into MySQL's binary encoding used in binlog events. Vitess provides good support for deserializing binary values from binlog events into Go datatypes, but doesn't provide any support for serializing types into MySQL's binary format. This PR pulls data out of Dolt's storage system and encodes it into MySQL's binary format. It would be interesting to split out the Dolt storage system specific code and the core MySQL serialization logic in the future, but this seems like the right first step.
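As a rough illustration of what serializing into MySQL's binary row format involves, the sketch below encodes two simple column types. It assumes the little-endian fixed-width layout MySQL uses for integer columns and the 1- or 2-byte length prefix used for `VARCHAR` values in row events. The function names are hypothetical and are not taken from the Dolt or Vitess code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeInt32 writes a signed 32-bit integer as 4 little-endian bytes
// (assumed layout for an INT column value in a row event).
func encodeInt32(v int32) []byte {
	buf := make([]byte, 4)
	binary.LittleEndian.PutUint32(buf, uint32(v))
	return buf
}

// encodeVarchar prefixes the value with its byte length. The prefix is
// 1 byte when the column's maximum byte length is at most 255, otherwise
// 2 little-endian bytes (assumed layout for a VARCHAR column value).
func encodeVarchar(s string, maxBytes int) ([]byte, error) {
	if maxBytes > 65535 || len(s) > maxBytes {
		return nil, fmt.Errorf("value of %d bytes does not fit column of %d bytes", len(s), maxBytes)
	}
	n := len(s)
	var out []byte
	if maxBytes > 255 {
		out = append(out, byte(n), byte(n>>8))
	} else {
		out = append(out, byte(n))
	}
	return append(out, s...), nil
}

func main() {
	fmt.Printf("% x\n", encodeInt32(-2)) // fe ff ff ff
	v, err := encodeVarchar("hi", 300)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", v) // 02 00 68 69
}
```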
Related to #7512
- 7785: Use `LazyJSONDocument` when reading from a JSON column.
This is the Dolt side of #7749
The GMS PR is dolthub/go-mysql-server#2470
`LazyJSONDocument` is an alternate implementation of `sql.JSONWrapper` that takes a string of serialized JSON and defers deserialization until it's actually required.
This is useful because in the most common use case (selecting a JSON column), deserialization is never required.
In an extreme example, I created a table with 8000 rows, with each row containing an 80KB JSON document. `dolt sql -q "SELECT * FROM test_table"` ran in 47 seconds using `JSONDocument`, and 28 seconds using `LazyJSONDocument`, a reduction of roughly 40%.
Even in cases where we do need to deserialize the JSON in order to filter on it, we can avoid reserializing it afterward, which is still a performance win.
Of note: In some cases we use a special serializer (defined in `json_encode.go::marshalToMySqlString`) in order to produce a string that is, according to the docstring, "compatible with MySQL's JSON output, including spaces."
This currently gets used:
  - In Query Diff
  - When hashing values for fulltext tables
  - When casting JSON columns to a text type
  - When writing values along the wire
The last one is the most worrying, because it means that we can't avoid the serialization round-trip if we're connecting to a dolt server remotely. I discussed with Max whether or not we consider it a requirement to match MySQL's wire responses exactly for JSON, and agreed that we could probably relax that requirement. Casting a document to a text type will still result in the same output as MySQL.
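The deferred-deserialization idea can be sketched in a few lines of Go. This is a simplified illustration rather than the actual `LazyJSONDocument` code: the `lazyJSON` type, its `Decode` method, and the `parses` counter are hypothetical names introduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// lazyJSON keeps the serialized JSON and parses it only on first use.
// Hypothetical type for illustration; not the go-mysql-server implementation.
type lazyJSON struct {
	raw    string
	once   sync.Once
	val    interface{}
	err    error
	parses int // counts how many times the document was actually parsed
}

func newLazyJSON(raw string) *lazyJSON { return &lazyJSON{raw: raw} }

// String returns the original text without deserializing anything.
func (l *lazyJSON) String() string { return l.raw }

// Decode deserializes on the first call and returns the cached result afterward.
func (l *lazyJSON) Decode() (interface{}, error) {
	l.once.Do(func() {
		l.parses++
		l.err = json.Unmarshal([]byte(l.raw), &l.val)
	})
	return l.val, l.err
}

func main() {
	doc := newLazyJSON(`{"key":"foo"}`)
	fmt.Println(doc.String(), doc.parses) // plain SELECT path: no parse yet

	v, err := doc.Decode() // filtering path: parse once
	if err != nil {
		panic(err)
	}
	fmt.Println(v.(map[string]interface{})["key"], doc.parses)
}
```

Because the raw string is retained, returning the document after a filter does not require serializing it again.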
- 7754: Index rebuilds with external key sorting
Index builds now write keys to intermediate files and merge sort them before materializing the prolly tree for the secondary index. This contrasts with the default approach, which rebuilds the prolly tree each time we flush keys from memory. Because each memory flush contains unsorted keys, the old approach touches most of the tree with random reads and writes. The new approach structures work for sequential IO by flushing sorted runs that are incrementally merge sorted. The sequential IO is dramatically faster for disk-based systems.
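The general technique described here, external merge sorting, can be sketched as below. This is a generic illustration under simplifying assumptions (string keys without newlines, one key per line, a single merge pass), not the Dolt index builder. All function names are hypothetical.

```go
package main

import (
	"bufio"
	"container/heap"
	"errors"
	"fmt"
	"os"
	"sort"
)

// writeRun sorts one in-memory batch and flushes it sequentially to a temp file.
func writeRun(keys []string) (string, error) {
	sort.Strings(keys)
	f, err := os.CreateTemp("", "run-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, k := range keys {
		if _, err := fmt.Fprintln(w, k); err != nil {
			return "", err
		}
	}
	return f.Name(), w.Flush()
}

// head is the next unread key of one run, plus the scanner it came from.
type head struct {
	key string
	src *bufio.Scanner
}

type minHeap []head

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].key < h[j].key }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(head)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// mergeRuns reads every sorted run sequentially and emits keys in global order.
func mergeRuns(paths []string, emit func(string)) error {
	h := &minHeap{}
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		sc := bufio.NewScanner(f)
		if sc.Scan() {
			heap.Push(h, head{sc.Text(), sc})
		}
	}
	for h.Len() > 0 {
		top := heap.Pop(h).(head)
		emit(top.key)
		if top.src.Scan() {
			heap.Push(h, head{top.src.Text(), top.src})
		} else if err := top.src.Err(); err != nil {
			return err
		}
	}
	return nil
}

// sortExternal flushes sorted runs of at most batch keys, then merges them.
func sortExternal(keys []string, batch int) ([]string, error) {
	if batch <= 0 {
		return nil, errors.New("batch must be positive")
	}
	var paths []string
	defer func() {
		for _, p := range paths {
			os.Remove(p)
		}
	}()
	for start := 0; start < len(keys); start += batch {
		end := start + batch
		if end > len(keys) {
			end = len(keys)
		}
		p, err := writeRun(append([]string(nil), keys[start:end]...))
		if err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	var out []string
	err := mergeRuns(paths, func(k string) { out = append(out, k) })
	return out, err
}

func main() {
	sorted, err := sortExternal([]string{"d", "a", "c", "b", "e"}, 2)
	fmt.Println(sorted, err)
}
```

Each run is written and read front to back, so the only random access happens inside the small in-memory heap.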
go-mysql-server
- 2485: Have `LazyJSONDocument` implement `fmt.Stringer` and `driver.Valuer`, in order to interoperate with other Go SQL libraries.
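For readers unfamiliar with these two standard-library interfaces, here is a minimal sketch of a type that satisfies both. The `rawJSON` type is hypothetical and only stands in for a JSON wrapper; it is not the go-mysql-server type.

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// rawJSON is a hypothetical wrapper around serialized JSON text.
type rawJSON struct{ text string }

// String satisfies fmt.Stringer, so fmt and logging code print the JSON text.
func (j rawJSON) String() string { return j.text }

// Value satisfies driver.Valuer, so database/sql can pass the wrapper
// as a query argument; a string is one of the permitted driver.Value types.
func (j rawJSON) Value() (driver.Value, error) { return j.text, nil }

// Compile-time checks that both interfaces are implemented.
var (
	_ fmt.Stringer  = rawJSON{}
	_ driver.Valuer = rawJSON{}
)

func main() {
	doc := rawJSON{text: `{"a": 1}`}
	fmt.Println(doc) // printed via String()
	v, err := doc.Value()
	fmt.Println(v, err)
}
```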
- 2470: Add `LazyJSONDocument`, which wraps a JSON string and only deserializes it if needed.
This is the GMS side of #7749
This is a new `JSONWrapper` implementation. It isn't used by the GMS in-memory storage, but it will be used in Dolt to speed up `SELECT` queries that don't care about the structure of the JSON.
A big difference between this and `JSONDocument` is that even after it deserializes the JSON into a Go value, it continues to keep the string in memory. This is good in cases where we would want to re-serialize the JSON later without changing it (so statements like `SELECT json FROM table WHERE json->>"$.key" = "foo";` will still be faster), but it has the downside of using more memory than `JSONDocument`.
- 2469: refactor index validation and prevent indexes over json columns
This PR consolidates the index validation logic.
Additionally, it fixes a bug where `create table t (i int, index (i, i));` was allowed.
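The two checks named in this PR (no repeated column in an index, no index over a JSON column) could look roughly like the sketch below. It is an assumption-laden illustration: `validateIndexColumns` and its `map[string]string` schema (lower-cased column name to type name) are hypothetical and do not reflect the go-mysql-server API.

```go
package main

import (
	"fmt"
	"strings"
)

// validateIndexColumns rejects index definitions that repeat a column or
// that include a JSON column. colTypes maps lower-cased column names to
// type names; this schema representation is a hypothetical simplification.
func validateIndexColumns(colTypes map[string]string, indexCols []string) error {
	seen := make(map[string]bool)
	for _, c := range indexCols {
		name := strings.ToLower(c)
		typ, ok := colTypes[name]
		if !ok {
			return fmt.Errorf("unknown column %q in index", c)
		}
		if seen[name] {
			return fmt.Errorf("duplicate column %q in index", c)
		}
		if strings.EqualFold(typ, "json") {
			return fmt.Errorf("column %q has type JSON and cannot be indexed", c)
		}
		seen[name] = true
	}
	return nil
}

func main() {
	schema := map[string]string{"i": "int", "doc": "json"}
	fmt.Println(validateIndexColumns(schema, []string{"i"}))      // valid index
	fmt.Println(validateIndexColumns(schema, []string{"i", "i"})) // duplicate column
	fmt.Println(validateIndexColumns(schema, []string{"doc"}))    // JSON column
}
```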
fixes: #6064
- 2466: Schema-qualified table names
This PR also fixes a couple of unrelated issues:
  - IMDB query plans are brought up to date (this is most of the changed lines)
  - Fixed bugs in certain show statements (information_schema tests)
Closed Issues
- 7813: Ability to export diffs as SQL
- 6624: Table size calculation using DATA_LENGTH in information schema is naive and massively overstates the size of tables
- 7800: `dolt diff --stat -r json` produces invalid JSON
- 7749: Dolt serializes and deserializes JSON unnecessarily.
- 7797: `dolt diff` ... that only shows the tables changed in a simpler format
Performance
Read Tests | MySQL | Dolt | Multiple |
---|---|---|---|
covering_index_scan | 2.14 | 3.13 | 1.5 |
groupby_scan | 13.46 | 17.95 | 1.3 |
index_join | 1.37 | 5.28 | 3.9 |
index_join_scan | 1.27 | 2.26 | 1.8 |
index_scan | 34.33 | 54.83 | 1.6 |
oltp_point_select | 0.17 | 0.51 | 3.0 |
oltp_read_only | 3.43 | 8.43 | 2.5 |
select_random_points | 0.33 | 0.8 | 2.4 |
select_random_ranges | 0.39 | 0.97 | 2.5 |
table_scan | 34.33 | 54.83 | 1.6 |
types_table_scan | 74.46 | 137.35 | 1.8 |
reads_mean_multiplier | 2.2 |

Write Tests | MySQL | Dolt | Multiple |
---|---|---|---|
oltp_delete_insert | 7.98 | 6.91 | 0.9 |
oltp_insert | 3.75 | 3.43 | 0.9 |
oltp_read_write | 8.43 | 16.41 | 1.9 |
oltp_update_index | 3.82 | 3.55 | 0.9 |
oltp_update_non_index | 3.82 | 3.49 | 0.9 |
oltp_write_only | 5.37 | 7.98 | 1.5 |
types_delete_insert | 7.7 | 7.56 | 1.0 |
writes_mean_multiplier | 1.1 |

TPC-C TPS Tests | MySQL | Dolt | Multiple |
---|---|---|---|
tpcc-scale-factor-1 | 101.88 | 22.32 | 4.9 |
tpcc_tps_multiplier | 4.9 |

Overall Mean Multiple | 2.73 |
---|---|
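The summary rows in these tables appear consistent with simple arithmetic means of the "Multiple" column, and the overall figure with the mean of the three summary multipliers. The sketch below recomputes them from the rounded values shown above. That the benchmark tooling works exactly this way is an assumption, and the `mean` helper is introduced here only for illustration.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs; xs must be non-empty.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// "Multiple" column values copied from the read and write tables.
	reads := []float64{1.5, 1.3, 3.9, 1.8, 1.6, 3.0, 2.5, 2.4, 2.5, 1.6, 1.8}
	writes := []float64{0.9, 0.9, 1.9, 0.9, 0.9, 1.5, 1.0}

	fmt.Printf("reads_mean_multiplier  %.1f\n", mean(reads))                    // 2.2
	fmt.Printf("writes_mean_multiplier %.1f\n", mean(writes))                   // 1.1
	fmt.Printf("overall mean multiple  %.2f\n", mean([]float64{2.2, 1.1, 4.9})) // 2.73
}
```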