Releases: dolthub/dolt
0.13.1
We are releasing a patch to Dolt, as the 0.13.0 release contained a bug: when cloning a repository with `dolt clone`, the new license and readme documents were not updated. This patch ensures that functionality works correctly.
Since this is a patch of a recent release, see 0.13.0 release notes for details about the new features recently introduced.
Merged PRs
0.13.0
We are excited to announce the release of Dolt 0.13.0, hot on the heels of relaunching DoltHub.
Easy Install Script
It's now incredibly easy to install Dolt. If you haven't tried it yet, you can obtain a copy with a single command and start playing with datasets:
$ curl -L https://github.com/liquidata-inc/dolt/releases/latest/download/install.sh | bash
The installer script works on Mac and Linux. For Windows, download the MSI installer below.
System Tables
We released a blog post detailing some exciting new functionality for surfacing versioning data in Dolt. This is the first of a set of features that will eventually expose all the Git-like internals of Dolt to SQL, and facilitate automated use of Dolt by allowing users to define their default choices inside SQL statements.
- `dolt_log`: Access the same information as the `dolt log` command via a SQL query.
- `dolt_diff_$table`: A system table for each of your tables, which lets you query the diff between two commits. See the blog post for more details.
- `dolt_history_$table`: A system table for each of your tables, which lets you query past values of rows in the table at any commit in its history. See the blog post for more details.
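As a sketch of how these system tables can be queried (the table name `mytable`, its column `pk`, and the commit hash placeholders are hypothetical; the `from_commit`/`to_commit` column names are assumptions based on the description above):

```shell
# Query the commit log through SQL instead of `dolt log`.
$ dolt sql -q "SELECT committer, date FROM dolt_log ORDER BY date DESC"

# Query the diff of a hypothetical table `mytable` between two commits.
$ dolt sql -q "SELECT * FROM dolt_diff_mytable WHERE from_commit = '<hash1>' AND to_commit = '<hash2>'"

# Query the historical values of a row across all commits.
$ dolt sql -q "SELECT * FROM dolt_history_mytable WHERE pk = 1"
```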
LICENSE and README functionality
We now allow users to create License and Readme documents as part of their Dolt repository; these appear as `LICENSE.md` and `README.md` files in the root of your repo. Edit them with the text editor of your choice, then add them to a commit with `dolt add`, the same as a table. Their contents are versioned alongside your tables' data. License and Readme files will soon be visible on DoltHub for repositories that provide them. Allowing users to specify the terms on which data is available is an important step towards creating a vibrant data-sharing community.
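A sketch of that workflow (file contents and commit message are illustrative; staging the docs by filename assumes `dolt add` accepts them like table names, as the text states):

```shell
# Create the two repo documents (contents are placeholders).
$ echo "# My Dataset" > README.md
$ echo "This data is released under CC-BY 4.0." > LICENSE.md

# Stage and commit them the same way you would a table.
$ dolt add README.md LICENSE.md
$ dolt commit -m "Add readme and license"
```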
Views
Our SQL implementation now supports persistent views, taking us closer to having a fully functioning SQL engine. Create a view using the standard SQL syntax:
CREATE VIEW myview AS SELECT col1, col2 FROM mytable
Then query it like any other table:
SELECT * FROM myview
Other
We made performance enhancements to SQL, including supporting indexed joins on a table's primary key columns. This should make the engine usable for joins on the primary key columns of two tables. Additional improvements in join performance are in the works. We also fixed assorted bugs and made performance improvements in other areas of the SQL engine.
Merged PRs
- 331: Removed Windows carriage-return and trailing whitespace from bats tests
- 329: CSV export compliant with RFC 4180
- 328: bats/helper/windows-compat.bash: Try mktemp on Windows.
- 326: one down
The other 32 skipped bats tests are confirmed to fail
- 324: Removed old table and schema commands from the command line
- 323: fix buffered sequence iterator and put it back in row iterator
- 322: reverting buffered iter
- 320: bats/creds.bats: Debug windows failures.
- 319: Added a bats test for committing views and referencing them later
Added some checks for checked-in views.
- 318: Added test case for dolt reset --hard on new tables
- 316: Buffered Sequence Iterator
  - Created a new interface `sequenceIterator` for the use case when `sequenceCursor` is simply accessing elements in its sequence (ie `MapIterator`, `SetIterator`, and `ListIterator`)
  - Created a new buffered implementation of `sequenceIterator`, designed by @reltuk, to batch chunk fetching from the `ValueStore`. In use cases such as DoltHub where chunk fetching IO is slow, this will dramatically accelerate performance.
- 310: Km/non-trivial merge of master into doc feature branch
This is just a merge from master into my doc feature branch, so you can ignore that there are many commits authored by not-me.
I wanted to get eyes on the last commit before I merge it (d4de259). I had to remove 2 of the HasDoltPrefix checks that were breaking create-views.bats. Now I'm checking for DocTableName explicitly. I left the HasDoltPrefix function since I'm using it in the commands package, and presume we'll eventually need to use it again in the sqle package.
- 308: Updated to latest go-mysql-server. Re-enabled indexes by default, and un-skipped an integration test of indexed join behavior.
- 307: go/utils/publishrelease: First pass at an install.sh
- 306: Bumped go-mysql-server version
- 305: bats/creds.bats: Some initial bats tests for dolt creds new, ls and rm.
- 302: Km/doc tests
This PR:
  - Simplifies tests in docs.bats
  - Adds tests for some helper functions in `doltdb/root_val_test.go`
Will do more testing tomorrow, but wanted to get this in.
- 301: dumps docs
This code dumps the standard command line help pages for every command that isn't hidden.
Because we only had functions for each command, it was difficult to add a new method that would be implemented for each command, so I had to refactor all of that code. The refactor makes up the bulk of the PR.
- 299: dolt checkout, and merge with dolt docs, with bats coverage
This PR includes:
  - Fixed a bug where dEnv.Docs was not always matching the docs of the current repo state (working root). This required changing the Docs type in the `env` package to `[]doltdb.DocDetails` from `[]*doltdb.DocDetails`. You'll see some reformatting to accommodate this change.
  - `checkout <doc>`
  - `checkout <branch>`
  - `merge <branch>` (one scenario is still buggy, need help identifying a solution)
    - FF merge - docs on the FS get updated to the target branch
    - Merge with conflicts - docs on the FS remain as is
    - Merge with auto-resolved conflicts (currently buggy) - docs on the FS should be updated to targetBranch, but they should not be added to the new working root. This would allow `dolt status` to indicate that the doc needs to be added and committed to finish merging. Right now it appears the doc is getting added to the working root.
- 298: go/cmd/dolt/commands/sql: Add view persistence into dolt database.
- 296: go/cmd/dolt: credcmds/check: Add dolt creds check command.
- 295: update go-mysql-server to be the latest from liquidata-inc/go-mysql-server@ld-master
- 294: Added indexes to dolt sqllogictest harness and updated dependency on go-mysql-server.
- 293: go/cmd/dolt/commands/credcmds: Add documentation and a little bit of chrome to dolt creds commands.
- 291: Fixed ignoring an error in put-row
- 290: go/go.mod: Run go get -u all. Migrate to dbr/v2.
- 289: Tim/add docs bats
This is the test for branch, merge, and conflict resolve behavior. You can break it into multiple tests if you want, but I think this is fine.
- 288: add diff_type column to be able to select where diff_type is added, removed, or modified
- 287: fixes casing issue with system tables
- 285: Added bad describe bats test per testing session with Katie
- 284: {go,bats}: Implement dolt diff by parsing docs from args, with bats test
- 283: Zachmu/explain
Fixed describe table statements, and unskipped related tests.
- 282: change the date field to be a Sql.DateTime
Output of the date field was in a format that wasn't able to be sorted properly.
- 281: fix select on system table that doesn't exist
Fix select on a system table that has a valid prefix but whose suffix does not match a valid table.
What makes this a little bit tough is that you can query diffs or the history of a table that no longer exists, so we need to process the entire history and then see if at any time there was a schema'd table with the given name.
- 280: {bats, go/libraries/doltcore/sqle/database.go}: Remove DoltNamespace from `dolt sql` command
- 279: {go,bats}: Remove DocTableName from dolt table, schema, ls, add, reset, diff
This PR removes DocTableName from the outstanding commands so we don't expose the dolt docs table.
- 278: {go,bats}: Add dolt docs to `dolt diff`
This PR adds docs to the dolt di...
0.12.0
We are excited to announce the release of Dolt 0.12.0!
Community
We have our first open-source committer to the Dolt project! Thanks to @namdnguyen for providing a helpful fix to our documentation. We are hoping this will be the first of many open-source contributions to Dolt.
SQL
As discussed in this blog post, we use sqllogictest to test our SQL implementation's logical correctness. It contains 5 million tests! This release marks a huge jump in compliance, with our implementation now hitting 89%, up from well under 50% just a few weeks ago.
Diff With Predicate
The new `--diff-where` option allows the user to add a predicate on the table being diff'd, reducing the surface area of the diff output to drill into specific data of interest.
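For example, a predicate-restricted diff might look like this (the table name `inventory` and the predicate are hypothetical placeholders):

```shell
# Only show diff rows for which the predicate holds.
$ dolt diff --diff-where "state='CA'" inventory
```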
Override Commit Date
When a user commits data, a timestamp is associated with the commit. By allowing Dolt users to customize the timestamp we allow the user to create an implicit bi-temporal database (based on commit time) while maintaining the ordinal integrity of the commit graph for querying history and reasoning about the sequence of updates.
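A sketch of the feature; we assume here that the commit command accepts a `--date` flag with an ISO-8601 timestamp, since the exact flag syntax is not shown in these notes:

```shell
# Commit with an explicit, backdated timestamp (flag name assumed).
$ dolt commit --date "2019-10-01T00:00:00Z" -m "Load October snapshot"
```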
SQL Diffs
Using the SQL diff command, that is `dolt diff -q` or `dolt diff --sql`, users can produce SQL output that will transform one branch into another. In other words, this command produces the difference, in data and schema transformations, between two refspecs in the commit log of a Dolt repository.
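For instance, to emit the SQL statements that would transform one branch into another (the branch names are illustrative placeholders):

```shell
# Print the data and schema statements that turn master into feature.
$ dolt diff --sql master feature
```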
As usual, this release also contains bug fixes and performance improvements. Please create an issue if you have any questions or find a bug.
Merged PRs
- 241: Bumped version and added release script
- 239: bats/create-views.bats: Pick up go-mysql-server support for views.
- 237: go/performance/benchmarks: remove id from results
- 236: Noticed an alter table test that now works was skipped. Unskipped.
- 235: Fix typo in README for table import
I ran into this typo while using Dolt yesterday. The command keywords were in the incorrect order in the README.
- 233: Zachmu/sql batch
Killed off the original sql batch inserter and implemented equivalent functionality for the new engine.
- 232: Andy/sqldiffrefactor
- 230: go/store/nbs: table_set.go: Rebase: Reuse upstream table file instances when supplied table specs correspond to them.
- 229: fix schema diff primary key changes
Output looks like this for changing a pk:
--- a/test @ 4uvb6bb3p7dqudnuidh9oh4ccsehik7n
+++ b/test @ 2tl4quv92ot0jg4v3ai204rld00trbo4
CREATE TABLE test (
  `pk` BIGINT NOT NULL COMMENT 'tag:0'
- `c1` BIGINT COMMENT 'tag:1'
  `c2` BIGINT COMMENT 'tag:2'
  `c3` BIGINT COMMENT 'tag:3'
  `c4` BIGINT COMMENT 'tag:4'
  `c5` BIGINT COMMENT 'tag:5'
< PRIMARY KEY (`pk`, `c1`)
> PRIMARY KEY (`pk`)
);
Also add the pk constraint so it shows when it is not changing:
--- a/test @ idfqe6c5s2i9ohihkk4r4tj70tf3l8c7
+++ b/test @ 2tl4quv92ot0jg4v3ai204rld00trbo4
CREATE TABLE test (
  `pk` BIGINT NOT NULL COMMENT 'tag:0'
  `c1` BIGINT COMMENT 'tag:1'
  `c2` BIGINT COMMENT 'tag:2'
< `c3` BIGINT COMMENT 'tag:3'
> `newColName3` BIGINT COMMENT 'tag:3'
  `c4` BIGINT COMMENT 'tag:4'
  `c5` BIGINT COMMENT 'tag:5'
  PRIMARY KEY (`pk`, `c1`)
);
- 228: bh/add commit date
- 227: Bug fixes for sqllogictest dolt harness:
- More inclusive types
- Better error handling for panics
- Cheat on tables without primary keys to allow more tests (~40%) to succeed.
- 225: disable benchmarking dolt sql imports
- 224: go/cmd/dolt: commands/sql: Keep the sql engine around throughout the lifetime of the shell / batch import.
- 221: improved super schema names
- 220: update go-mysql-server dependency
- 219: dolt benchmarking
Initial approach is to write a script that will run n benchmarks, collect their results, then serialize those results to later be imported into `dolt`. Looking for feedback on the approach before I head too far down this path, in case it is suboptimal.
In its current state, there are a lot of switch statements and panics, and it only accounts for the types `int` and `string` and for `.csv`-style test data formats, but I'd like to make my data generation functions robust enough to be able to account for all file formats that dolt supports and all noms types...
- 218: Added skipped bats test for schema diffs on adding a primary key
- 216: Andy/sqlschemadiffs
Adding schema changes to `dolt diff --sql` output. Supports:
  - add/drop table
  - add/drop column
  - rename table
  - rename column
- 215: diff table
- 214: Bh/super schema
- 213: Zachmu/sql performance
- 212: Zachmu/sql indexes2
- 211: Added time to the handled cases in DATETIME & changed tests
This won't compile until dolthub/go-mysql-server#26 is referenced in `go.mod`.
Dolt 0.11.0 released
We are excited to announce the release of Dolt 0.11.0.
SQL
System Table
We implemented a `dolt_log` system table, making our first attempt to surface Dolt version control concepts in SQL by exposing commit data. This will allow users to leverage commit data in an automated setting via SQL. Clone a public repo to see how it works:
$ dolt clone Liquidata/ip-to-country
$ cd ip-to-country
$ dolt sql -q "select * from dolt_log"
$ dolt sql -q "select committer,date from dolt_log order by date desc"
+-------------+--------------------------------+
| committer | date |
+-------------+--------------------------------+
| Tim Sehn | Wed Sep 25 12:30:43 -0400 2019 |
| Tim Sehn | Wed Sep 18 18:27:02 -0400 2019 |
.
.
.
Timestamps
We added support for the `DATETIME` data type in SQL. This is a major milestone in achieving compatibility with existing RDBMS solutions.
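A minimal sketch of the new type in action (the table, columns, and tag numbers are illustrative; the tag-comment syntax follows the schema output shown elsewhere in these notes):

```shell
$ dolt sql -q "CREATE TABLE events (id BIGINT COMMENT 'tag:0', occurred DATETIME COMMENT 'tag:1', PRIMARY KEY (id))"
$ dolt sql -q "INSERT INTO events VALUES (1, '2019-10-21 13:00:00')"
$ dolt sql -q "SELECT * FROM events ORDER BY occurred DESC"
```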
Performance
We continue to rapidly improve our SQL implementation. On the performance side some degenerate cases of query performance saw large improvements. We also resolved some issues where update statements had to be "over parenthesized", with the parser now matching the standard.
Other
We support null values in CSV files that are imported via the command line, as well as minor bug fixes under the hood.
If you find any bugs, or have any questions or feature requests, please create an issue and we will take a look.
Merged PRs
- 208: go/libraries/doltcore/row: tagged_values.go: Fix n^2 behavior in ParseTaggedValues.
ParseTaggedValues used to call Tuple.Get(0)...Tuple.Get(n), but `Tuple.Get(x)` has O(n) perf, so the function did O(n^2) decoding work to decode a tuple. Use a TupleIterator instead.
- 206: go/store/types: Improve perf of value decoding for primitive types.
This fixes a performance regression in value decoding after the work to make it easier to add primitive types to the storage layer.
First, we change some map lookups into slice lookups, because hashing the small integers on hot decode paths dominates CPU profiles.
Next, we inline logic for some frequently used primitive types in `value_decoder.go`, as opposed to going through the table indirection. This is about a 30% perf improvement for linear scans on `skipValue()`, which is worth the duplication here.
Code for adding a kind remains correct if the decoder isn't changed to include an inlined decode path. We omit inlining `UUID` and `InlineBlob` here for now.
- 199: Km/redo import nulls
- 198: checkout a remote only branch
- 196: Bh/log table
- 195: Added Timestamp to Dolt and Datetime to SQL
Have a look!
I ran into an import cycle issue that I just could not figure out how to avoid, except by putting the tests into their own test folder (`sqle/types/tests`), so that's why they're in there. In particular, the cycle was that `sqle` imports `sqle/types`, and the tests rely on (and thus must import) `sqle`, causing the cycle.
I'm thinking of adding tests for the other SQL types later so that we have a few more built-in tests using the server portion, rather than everything using the `-q` pathway. That will be a different/future PR though.
- 193: diff where and limit
- 191: fix branch name panic with period
Looked into supporting periods in branch names, but it looks like `noms` relies on periods specifically pretty heavily. They seem to be excluded from the regex below by design, since some types are built on the expectation that a branch name or `ref` will not contain a period.
My understanding is that a user's branch name is used to look up a particular dataset within the `noms` layer, and this variable (go/store/datas/dataset.go) acts as the regex source of "truth" for branch names / dataset lookups, and I believe more:
// DatasetRe is a regexp that matches a legal Dataset name anywhere within the
// target string.
var DatasetRe = regexp.MustCompile(`[a-zA-Z0-9\-_/]+`)
Noms also expects to be able to append a `.` to this string in order to parse the string later and correctly create its `Path` types...
I went down a rabbit hole trying to change all of the `noms` `Path` delimiters to be a different character, but the changes go pretty deep and start breaking a lot of things. Happy to continue down that course in order to support periods in branch names, but it might take me a bit of time to change everything. I'm also not sure what character should replace the period... asterisk? Anyway, this PR seemed like a low-hanging-fruit fix to resolve the panic at least.
- 190: Missed Kind to String
In my last PR, it looks like I missed that we were using the old `DoltToSQLType` hardcoded map from the original SQL implementation. I didn't change it everywhere (it's used heavily in the old SQL code that isn't even being called anymore), but it's changed where it matters. Added a new interface function and changed the printing code to be a bit more consistent (we were mixing uppercase with lowercase).
I'm also returning different values, such as `BIGINT` for `sql.Int64`, as `int` parses in MySQL to a 32-bit integer, which isn't correct. Essentially made it so that if you took the `CREATE` statement exactly as-is, exported your data to a bunch of inserts, and ran it in MySQL, it wouldn't error out, as it previously would have.
- 189: diff source refactor
- 188: go/cmd/git-dolt/README.md: Add comparison to git-lfs and note on updates
- 187: Remove skip on test for / in branch names. Added a skipped test for . in branch names. . in branch names panics right now.
- 186: go/go.mod: Pick up sqlparser improvements for ADD COLUMN, RENAME COLUMN. Fix some tests.
- 185: Moved command line SQL processing to use new engine for CREATE and DROP
- 183: add appid to logevents requests
Need to update the requests so that they reflect the current proto definitions.
- 182: Moved SQL types to an interface
Have a look! Just make an empty struct type that implements `SqlTypeInit` and add the struct to `sqlTypeInitializers`, and you've got a type that works in SQL now!
- 179: clone reliability
- 178: go/store/nbs: store.go: Be more careful about updates to nbs field values until all operations have completed successfully.
- 177: go/cmd/dolt: Bump version to 0.10.0
Closed Issues
- 194: Wide tables create poor query performance
0.10.0
We are excited to announce the latest release of Dolt, which includes a new feature, substantial improvements to existing features, and a new Windows installer.
Dolt Blame
Dolt now has a `blame` command, which provides row audit functionality familiar to Git users. (Blame for individual cells is in the works.) We have a deep dive on the implementation of dolt blame on our blog, so definitely check that out if you're interested.
One of the long-term goals of Dolt is to provide a database with fine-grained audit capabilities to support hygienic management of valuable human-scale data, and this feature is a huge step towards realizing that vision.
SQL Enhancements
One of our major goals for the product is full SQL compliance; this release contains steps towards achieving that. In particular, the following commands are now supported:
- `CREATE TABLE`
- `DROP TABLE`
- `INSERT VALUES` & `INSERT SET` (no `IGNORE` or `ON DUPLICATE KEY UPDATE` support yet; also no `INSERT SELECT` support yet)
- `UPDATE` (single table; no `IGNORE` support yet)
- `REPLACE VALUES` and `REPLACE SET` (no `REPLACE SELECT` support yet)
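A quick tour of the newly supported statements (the table, columns, and values are illustrative placeholders):

```shell
$ dolt sql -q "CREATE TABLE t (pk BIGINT COMMENT 'tag:0', c1 BIGINT COMMENT 'tag:1', PRIMARY KEY (pk))"
$ dolt sql -q "INSERT INTO t VALUES (1, 10), (2, 20)"
$ dolt sql -q "UPDATE t SET c1 = 30 WHERE pk = 2"
$ dolt sql -q "REPLACE INTO t VALUES (2, 40)"
$ dolt sql -q "DROP TABLE t"
```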
As well as making progress against our goal of full compliance, we also created a test suite that will help validate our SQL implementation. Check out the test suite and harness, and the related blog post. This is an important step in creating a fully transparent mechanism for our progress against our compliance goal.
We also fixed some bugs and made some performance improvements.
Schema Import
We now support schema inference from a CSV file. This is a convenience function to make importing a CSV with a correct schema easier. The command is best understood by looking at the help details:
$ dolt schema import --help
NAME
dolt schema import - Creates a new table with an inferred schema.
SYNOPSIS
dolt schema import [--create|--replace] [--force] [--dry-run] [--lower|--upper] [--keep-types] [--file-type <type>] [--float-threshold] [--map <mapping-file>] [--delim <delimiter>] --pks <field>,... <table> <file>
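For example, inferring a schema while importing a CSV (the table name, file, and primary key column are hypothetical; the flags come from the synopsis above):

```shell
# Infer a schema from the CSV and create the table with `id` as the primary key.
$ dolt schema import --create --pks id employees employees.csv

# Preview the inferred schema without creating anything.
$ dolt schema import --create --dry-run --pks id employees employees.csv
```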
Windows Installer Packages
We now provide both 32- and 64-bit MSI packages for easy installation of Dolt on Windows. These may be used instead of manually extracting the archives (which are now provided in `.zip` format instead of `.tar.gz`). Please let us know if you encounter any issues.
Other
Various bug fixes and enhancements, and also improvements to `dolt clone`, which had a problematic race condition.
As always, bug reports, feedback, and feature requests are very much appreciated. We hope you enjoy using Dolt!
Merged PRs
- 175: {bats, go}: Make commit spec truly optional in blame
$ dolt blame lunch-places
+--------------------+----------------------------------------------------+-----------------+------------------------------+----------------------------------+
| NAME               | COMMIT MSG                                         | AUTHOR          | TIME                         | COMMIT                           |
+--------------------+----------------------------------------------------+-----------------+------------------------------+----------------------------------+
| Boa                | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Chipotle           | lunch-places: Added Chipotle                       | katie mcculloch | Thu Aug 29 11:38:00 PDT 2019 | m2jbro89ou8g6rv71rs7q9f3jsmjuk1d |
| Sidecar            | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Wendy's            | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Bangkok West Thai  | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Jamba Juice        | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Kazu Nori          | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| McDonald's         | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Sunnin             | change rating                                      | bheni           | Thu Apr 4 15:43:00 PDT 2019  | 137qgvrsve1u458briekqar5f7iiqq2j |
| Bruxie             | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Espresso Cielo     | added Espresso Cielo                               | Matt Jesuele    | Wed Jul 10 12:20:39 PDT 2019 | 314hls5ncucpol2qfdphf923s21luk16 |
| Seasalt Fish Grill | fixed ratings                                      | bheni           | Thu Apr 4 14:07:36 PDT 2019  | rqpd7ga1nic3jmc54h44qa05i8124vsp |
| Starbucks          | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Tocaya             | update tocaya rating                               | bheni           | Thu Jun 6 17:22:24 PDT 2019  | qi331vjgoavqpi5am334cji1gmhlkdv5 |
| Sake House         | fixed ratings                                      | bheni           | Thu Apr 4 14:07:36 PDT 2019  | rqpd7ga1nic3jmc54h44qa05i8124vsp |
| Swingers           | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Art's Table        | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Bay Cities         | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Benny's Tacos      | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Bibibop            | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Curious Palate     | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
| Meat on Ocean      | Had an unhandled schema merge conflict which I ch… | Tim Sehn        | Fri Mar 22 12:21:59 PDT 2019 | ffabndafbp64r393ttghused8siip77j |
+--------------------+----------------------------------------------------+-----------------+------------------------------+----------------------------------+
- 174: Update dolt blame description
- 170: Skipping the right SQL test that has a hanging race condition (joins on legacy engine, code to be deleted soon)
- 168: Bh/correctness fixes
- 166: clone bug fix
- 165: README.md: Remove Tim's username from shell prompt
- 161: Zachmu/sql logictest
Improved main method for running or parsing sqllogic tests.
- 160: Bh/upload error checking
- 159: Zachmu/sql logictest
Removed code for sqllogic test and took a dependency on the new module instead.
- 158: Disabling sql server tests on linux, since they appear to be hanging waiting for the server to start
- 157: Basic dolt blame
This still needs some more BATS tests and maybe some UI touches like what @Hydrocharged suggested, but overall I think it's ready for some eyes.
The biggest thing I don't like is the logic surrounding pretty-printing of primary keys (you'll see why) but please do tell me if you notice other things that are janky.
In general, feedback and suggestions are very welcome.
- 156: Bh/schema import
- 155: Zachmu/sql logictest
Implementation of sqllogictest for dolt on go-mysql-server. After getting this merged, I plan to fork off the non-dolt portions to a separate repo.
- 154: Bumping dependency of go-mysql-server to head of ld-master branch. Also fixing several issues that come up when doing so.
- 152: go/store/nbs: Add recover in table_set Rebase goroutines. (Saw a SIGSEGV which crashed doltremoteapi).
- 148: Added new InlineBlob type
This turned out to be far smaller than I thought as far as changes go. This `types` change might be simpler than I first thought! I probably felt it was harder just because finding all of these locations was a major pain...
- 146: proto/dolt/services/eventsapi: Adopt the version of eventsapi that lives in the ld repo instead of here.
- 145: Update README.md
- 144: Miscellaneous...
0.9.9
Contained in this release
- remote performance improvements (clone, push, and pull)
- better support for MySQL in server mode, including `DROP`, `UPDATE`, and `INSERT`
- SQL performance improvement
- diff summary
- more metrics
- other assorted bug fixes and improvements
If you find any bugs, have a feature request, or an interesting use-case, please raise an issue.
Merged PRs
- 114: go/libraries/doltcore/sqle: types: Make SqlValToNomsVal compile for 32bit by checking for overflow on uint -> int64 differently.
- 112: Zachmu/drop table
- 110: go/utils/checkcommitters: Oscar is an allowed committer and author.
- 109: attempted deadlock fix
- 108: Correct the installation instructions
- 105: dolt diff --summary
Example output using Liquidata/tatoeba-sentence-translations:
$ dolt diff --summary rnfm50gmumlettuebt2latmer617ni3t
diff --dolt a/sentences b/sentences
--- a/sentences @ gd1v6fsc04k5676c105d046m04hla3ia
+++ b/sentences @ 2ttci8id13mijhv8u94qlioqegh7lgpo
7,800,102 Rows Unmodified (99.99%)
15,030 Rows Added (0.19%)
108 Rows Deleted (0.00%)
960 Rows Modified (0.01%)
1,888 Cells Modified (0.00%)
(7,801,170 Entries vs 7,816,092 Entries)
diff --dolt a/translations b/translations
--- a/translations @ p2355o6clst8ssvr9jha2bfgqbrstkmm
+++ b/translations @ 62ri8lmohbhs1mc01m9o4rbvj6rbl8ee
5,856,845 Rows Unmodified (90.91%)
468,173 Rows Added (7.27%)
578,242 Rows Deleted (8.98%)
7,626 Rows Modified (0.12%)
7,626 Cells Modified (0.06%)
(6,442,713 Entries vs 6,332,494 Entries)
Fixes #77
- 104: Bh/output updates3
- 103: dolt/go/store: Stop panicing on sequence walks when expected hashes are not in the ValueReader.
- 101: go/{store,libraries/doltcore/remotestorage}: Make the code peddling in nbs table file formats a little more explicit about it.
- 100: newline changes
- 99: Implemented UPDATE
I think we should delete the old SQL methods that are in the `sql.go` file. I know at first you mentioned keeping them there for reference, but they're not being used at all at this point, and they're still in git history if we want to look at them again in the future for some reason. It's clutter at this point.
I'm skipping that one test at the end because of a WHERE decision in `go-mysql-server`. The code looks intentional, in that converting strings to ints will return 0 if the string is not parsable. I'll file it as a non-conforming bug on their end, but for now I'm skipping the test.
- 98: Bh/output updates
- 97: store/{nbs,chunks}: Make ChunkStore#GetMany{,Compressed} take send-only channels.
- 96: update status messages for push/pull
- 94: Update README.md
Ensure that installing from source is properly documented, including go-gotchas.
- 93: Reverts the revert of my push/pull changes with fixes.
- 92: content length fix
- 91: go: store/nbs: table_reader: getManyAtOffsetsWithReadFunc: Stop unbounded I/O parallelism in GetMany implementation.
When we do things like push, pull or (soon-to-be) garbage collection, we have large sets of Chunk addresses that we pass intoChunkStore#GetMany
and then go off and process. Clients largely try to control the memory overhead and pipeline depth by passing in a buffered channel of an appropriate size. The expectation is that the implementation ofGetMany
will have an amount of data in flight at any give in time that is in some reasonable way proportional to the channel size.
In the current implementation, there is unbounded concurrency on the read destination allocations and the reads themselves, with one go routine spawned for each byte range we want to read. This results in absolutely massive (virtual) heap utilization and unreasonable I/O parallelism and context switch thrashing in large repo push/pull situations.
This is a small PR to change the concurrency paradigm insidegetManyAtOffsetsWithReadFunc
so that we only have 4 concurrent dispatched reads pertable_reader
instance at a time.
This is still not the behavior we actually want.- I/O concurrency should be configurable at the ChunkStore layer (or eventually per-device backing a set of
tableReader
s), and not depend on the number oftableReader
s which happen to back the chunk store. - Memory overhead is still not correctly bounded here, since read ahead batches are allowed to grow to arbitrary sizes. Reasonable bounds on memory overhead should be configurable at the ChunkStore layer.
I'm landing this as a big incremental improvement over status quo. Here are some non-reproducible one-shot test results from a test program. The test program walks the entire chunk graph, assembles every chunk address, and then does aGetManyCompressed
on every chunk address and copies their contents to/dev/null
. It was run on a ~10GB (compressed) data set:
Before:
```
$ /usr/bin/time -l -- go run test.go
...
MemStats: Sys: 16628128568
      161.29 real        67.29 user       456.38 sys
5106425856  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
  10805008  page reclaims
     23881  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
         8  signals received
    652686  voluntary context switches
  21071339  involuntary context switches
```
After:
```
$ /usr/bin/time -l -- go run test.go
...
MemStats: Sys: 4590759160
       32.17 real        30.53 user        29.62 sys
4561879040  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
   1228770  page reclaims
     67100  page faults
         0  swaps
         0  block input operations
         0  block output operations
         0  messages sent
         0  messages received
        14  signals received
    456898  voluntary context switches
   2954503  involuntary context switches
```
On these runs, sys time, wallclock time, vm page reclaims and virtual memory used are all improved pretty substantially.
Very open to feedback and discussion of potential performance regressions here, but I think this is an incremental win for now.
- 90: Implemented REPLACE
Mostly tests since this just uses the `Delete` and `Insert` functions that we already have. The previous delete would ignore a delete on a non-existent row, so I just changed it to throw the correct error if the row does not exist so that `REPLACE` works properly now (else it will always say a `REPLACE` did both a delete & insert).
- 89: Push and Pull v2
- 88: Add metrics attributes
Similar to previous PR db/event-metrics, but this time, no byte measurements on `clone`, as the implementation is different. Some things in the events package have been refactored to prevent circular dependencies. Adding `StandardAttributes` will help me generate the info for my new metrics.
- 87: {go, bats}: Replace table works with file with schema in different order
- 86: dolt table import -r
Fixes #76
Replaces existing table with the contents of the file while preserving the original schema - 85: Bh/cmp chunks
- 84: revert nil check and always require stats to match aws behavior
- 83: Bh/clone2
This version of clone works on the table files directly. It enumerates all the table files and downloads them. It does not inspect the chunks as v1 did. - 82: Naked deletes now just delete everything instead of iterating
I mean, this works, but it's ugly, and I'm not sure of a better way to do it really.
- 81: Progress on switching deletes to new engine
Currently works for deletes, but it is not thoroughly tested.
- 80: go/store/nbs: store.go: Make global index cache 64MB instead of 8MB.
- 79: Removed skips for tests that will now work
This will fail for now, waiting on dolthub/go-mysql-server#10 to be approved before I merge this in. Super small stuff though. - 73: go/libraries/doltcore/remotestorage: Add the ability to have a noop cache on DoltChunkStore.
- 72: proto: Use fully qualified paths for go_packages.
This allows cross-package references within proto files to work appropriately. - 71: Db/events dir lock
initial implementation of making event flush concurrency safe - 70: go/store/spec: Move to aws://[table:bucket] for NBS on AWS specs because of Go URL parsing changes.
See https://go.googlesourc...
0.9.8
We have released version 0.9.8 of Dolt, which as you probably know is now open source. A quick reminder that you can freely host the awesome public data you put in Dolt at DoltHub.
This release contains performance improvements and bug fixes, but no major new features. Please let me know if you have any questions.
Merged PRs
- 60: bump version
- 57: Added a PID to a directory. This was causing jenkins on windows to fail if it ran twice on the same instance.
- 55: {bats,go}: Log successful commits
This closes dolthub/ld#1744
Before:
```
$ dolt commit -m "commit ints"
```
After:
```
$ dolt commit -m "commit ints"
commit 3cvbeh6bn94hlhfaig5pa65peiribrhn
Author: Matt Jesuele <[email protected]>
Date:   Mon Aug 26 19:10:17 -0700 2019

	commit ints
```
- 50: add dustin to approved committers/authors
- 49: [WIP] Add client events to dolt commands
Added events to all of the dolt commands.
Turned logging back on while I work on this PR. (will remove before merge)
I need to write tests for these; should I create a test file for each command file, where I test to ensure that the command has an event and the appropriate metrics? Would love input on this.
- 48: client events
- 47: Threading context from app launch
- 46: Add client_event.proto and compiled .go file
- 45: Add support to get the last modified time from the filesys
- 44: Changed default remote host to use the env constant
Before we were using `dolthub.com` as the default, which is incorrect. I've changed it to the appropriate environment constant so that it also properly updates when we change from our beta domain.
- 43: Created skipped test for newlines on CSV
- 42: README.md: Remove erroneous go install instructions.
- 41: Make the InMemFS thread safe
The current InMemFS was failing in a multithreaded context because it edits a map, which is not thread safe. Something to note is that Go locks are not re-entrant; some of the refactoring is related to that. Locks are typically put on the exported methods and not the internal methods.
- 40: Fixed JSON imports and disallowed schemas on import updates
Fixes #36 - 39: Add move file functionality to the filesys package
- 38: Fixes a panic that occurs if multiple bad rows are found during import
When a pipeline is being run, any stage can write to the bad row channel when an error is encountered. There is a goroutine reading from this channel that will not exit until the channel is closed or an error is encountered. In typical operation, the pipeline's sink closes the bad row channel once the pipeline finishes (either via an error-triggered stoppage, or successful completion). However, when multiple errors are written to the bad row channel from multiple goroutines, it is possible for the channel to be written to (which triggers the pipeline to stop and the channel to be closed) and then for a goroutine to write to that now-closed channel.
The fix here is to not close the channel in the sink, but instead to write a marker to the channel which causes the goroutine watching for errors to exit.
- 37: go/go.mod: Do not depend on //proto/third_party/golang-protobuf.
Development ergonomics are much worse, and the runtime library will maintain compatibility with the generator major version anyway, or it will explicitly break compilation.
- 35: dolt/go: Fix spelling on ancestor
- 34: proto/Makefile: Use submodule for protoc-gen-go instead of whatever is on the path.
- 33: Jenkinsfile: Use goimports from go.mod for check_fmt.sh
- 31: support importing and exporting data to and from stdin and stdout
In the current releases it was possible to chain dolt with other programs via stdout/stdin like so:
dolt table export table_name --file-type csv /dev/stdout -f|python row_cleaner.py|dolt table import cleaned_data -u --file-type csv /dev/stdin
This only works in environments where stdin/stdout are mapped to files on the filesystem. This change will use the stdin/stdout streams for import/export when a file is not provided.
- 30: Added column lengths for schema output to varchar columns so that they can be re-imported
- 29: go/cmd/dolt: dolt ls -v shows number of rows in each table.
- 27: Refer to newest version of mmap-go
We now strictly refer to our own fork of mmap-go. Plus cleaned up the `go.mod`, as we have git history and don't quite need the comments.
- 25: Added .idea directory (goland) to top-level .gitignore file
- 24: fix race condition which caused reproducible crash
The variables readStart, readEnd, and batch are declared outside of the for loop, and it is possible for their values to change before the goroutine calls readAtOffsets, causing some or all of them to be incorrect. The fix is to save them to variables scoped to the loop before launching the goroutine.
- 23: Fixed a bug on windows when redirecting STDIN for SQL import, e.g. dolt sql < dump.sql. Also fixed up ip2nation sample so that it successfully imports