
0.9.9

@reltuk released this 04 Oct 18:36
d77543c

Contained in this release

  1. remote performance improvements (clone, push, and pull)
  2. better support for MySQL in server mode, including DROP, UPDATE, and INSERT
  3. SQL performance improvements
  4. diff summary
  5. more metrics
  6. other assorted bug fixes and improvements

If you find any bugs, have a feature request, or have an interesting use case, please raise an issue.

Merged PRs

  • 114: go/libraries/doltcore/sqle: types: Make SqlValToNomsVal compile for 32bit by checking for overflow on uint -> int64 differently.
  • 112: Zachmu/drop table
  • 110: go/utils/checkcommitters: Oscar is an allowed committer and author.
  • 109: attempted deadlock fix
  • 108: Correct the installation instructions
  • 105: dolt diff --summary
    Example output using Liquidata/tatoeba-sentence-translations:
    $ dolt diff --summary rnfm50gmumlettuebt2latmer617ni3t
    diff --dolt a/sentences b/sentences
    --- a/sentences @ gd1v6fsc04k5676c105d046m04hla3ia
    +++ b/sentences @ 2ttci8id13mijhv8u94qlioqegh7lgpo
    7,800,102 Rows Unmodified (99.99%)
    15,030 Rows Added (0.19%)
    108 Rows Deleted (0.00%)
    960 Rows Modified (0.01%)
    1,888 Cells Modified (0.00%)
    (7,801,170 Entries vs 7,816,092 Entries)
    diff --dolt a/translations b/translations
    --- a/translations @ p2355o6clst8ssvr9jha2bfgqbrstkmm
    +++ b/translations @ 62ri8lmohbhs1mc01m9o4rbvj6rbl8ee
    5,856,845 Rows Unmodified (90.91%)
    468,173 Rows Added (7.27%)
    578,242 Rows Deleted (8.98%)
    7,626 Rows Modified (0.12%)
    7,626 Cells Modified (0.06%)
    (6,442,713 Entries vs 6,332,494 Entries)
    
    Fixes #77
  • 104: Bh/output updates3
  • 103: dolt/go/store: Stop panicking on sequence walks when expected hashes are not in the ValueReader.
  • 101: go/{store,libraries/doltcore/remotestorage}: Make the code peddling in nbs table file formats a little more explicit about it.
  • 100: newline changes
  • 99: Implemented UPDATE
    I think we should delete the old SQL methods in sql.go. I know you originally suggested keeping them there for reference, but they're not used anywhere at this point, and they're still in git history if we ever want to look at them again. At this point they're just clutter.
    I'm skipping the one test at the end because of a WHERE-handling decision in go-mysql-server. The behavior looks intentional: converting a string to an int returns 0 if the string is not parsable. I'll file it as a non-conformance bug on their end, but for now I'm skipping the test.
  • 98: Bh/output updates
  • 97: store/{nbs,chunks}: Make ChunkStore#GetMany{,Compressed} take send-only channels.
  • 96: update status messages for push/pull
  • 94: Update README.md
    Ensure that installing from source is properly documented, including Go gotchas.
  • 93: Reverts the revert of my push/pull changes with fixes.
  • 92: content length fix
  • 91: go: store/nbs: table_reader: getManyAtOffsetsWithReadFunc: Stop unbounded I/O parallelism in GetMany implementation.
    When we do things like push, pull, or (soon-to-be) garbage collection, we have large sets of Chunk addresses that we pass into ChunkStore#GetMany and then go off and process. Clients largely try to control the memory overhead and pipeline depth by passing in a buffered channel of an appropriate size. The expectation is that the implementation of GetMany will have an amount of data in flight at any given time that is in some reasonable way proportional to the channel size.
    In the current implementation, there is unbounded concurrency on the read destination allocations and the reads themselves, with one goroutine spawned for each byte range we want to read. This results in absolutely massive (virtual) heap utilization, unreasonable I/O parallelism, and context-switch thrashing in large repo push/pull situations.
    This is a small PR to change the concurrency paradigm inside getManyAtOffsetsWithReadFunc so that we only have 4 concurrent dispatched reads per table_reader instance at a time (see the bounded-read sketch after the PR list).
    This is still not the behavior we actually want.
    • I/O concurrency should be configurable at the ChunkStore layer (or eventually per-device backing a set of tableReaders), and not depend on the number of tableReaders which happen to back the chunk store.
    • Memory overhead is still not correctly bounded here, since read ahead batches are allowed to grow to arbitrary sizes. Reasonable bounds on memory overhead should be configurable at the ChunkStore layer.
      I'm landing this as a big incremental improvement over the status quo. Here are some non-reproducible, one-shot test results from a test program. The test program walks the entire chunk graph, assembles every chunk address, and then does a GetManyCompressed on every chunk address and copies their contents to /dev/null. It was run on a ~10GB (compressed) data set:
      Before:
    $ /usr/bin/time -l -- go run test.go
    ...
    MemStats: Sys: 16628128568
    161.29 real        67.29 user       456.38 sys
    5106425856  maximum resident set size
    0  average shared memory size
    0  average unshared data size
    0  average unshared stack size
    10805008  page reclaims
    23881  page faults
    0  swaps
    0  block input operations
    0  block output operations
    0  messages sent
    0  messages received
    8  signals received
    652686  voluntary context switches
    21071339  involuntary context switches
    
    After:
    $ /usr/bin/time -l -- go run test.go
    ...
    MemStats: Sys: 4590759160
    32.17 real        30.53 user        29.62 sys
    4561879040  maximum resident set size
    0  average shared memory size
    0  average unshared data size
    0  average unshared stack size
    1228770  page reclaims
    67100  page faults
    0  swaps
    0  block input operations
    0  block output operations
    0  messages sent
    0  messages received
    14  signals received
    456898  voluntary context switches
    2954503  involuntary context switches
    
    On these runs, sys time, wall-clock time, VM page reclaims, and virtual memory used are all improved pretty substantially.
    Very open to feedback and discussion of potential performance regressions here, but I think this is an incremental win for now.
  • 90: Implemented REPLACE
    Mostly tests, since this just uses the Delete and Insert functions that we already have. The previous Delete would silently ignore a delete on a non-existent row, so I changed it to return the correct error when the row does not exist; that way REPLACE reports correctly (otherwise it would always claim a REPLACE did both a delete and an insert). See the sketch after the PR list.
  • 89: Push and Pull v2
  • 88: Add metrics attributes
    Similar to the previous PR db/event-metrics, but this time there are no byte measurements on clone, as the implementation is different. Some things in the events package have been refactored to prevent circular dependencies. Adding StandardAttributes will help me generate the info for my new metrics.
  • 87: {go, bats}: Replace table works with file with schema in different order
  • 86: dolt table import -r
    Fixes #76
    Replaces the existing table with the contents of the file while preserving the original schema.
  • 85: Bh/cmp chunks
  • 84: revert nil check and always require stats to match aws behavior
  • 83: Bh/clone2
    This version of clone works on the table files directly. It enumerates all the table files and downloads them. It does not inspect the chunks as v1 did.
  • 82: Naked deletes now just delete everything instead of iterating
    This works, but it's ugly, and I'm not sure of a better way to do it.
  • 81: Progress on switching deletes to new engine
    Currently works for deletes, but not thoroughly tested.
  • 80: go/store/nbs: store.go: Make global index cache 64MB instead of 8MB.
  • 79: Removed skips for tests that will now work
    This will fail for now; it's waiting on dolthub/go-mysql-server#10 to be approved before I merge it in. Super small stuff, though.
  • 73: go/libraries/doltcore/remotestorage: Add the ability to have a noop cache on DoltChunkStore.
  • 72: proto: Use fully qualified paths for go_packages.
    This allows cross-package references within proto files to work appropriately.
  • 71: Db/events dir lock
    Initial implementation of making event flush concurrency-safe.
  • 70: go/store/spec: Move to aws://[table:bucket] for NBS on AWS specs because of Go URL parsing changes.
    Context on the Go changes: https://go.googlesource.com/go/+/61bb56ad63992a3199acc55b2537c8355ef887b6. A short illustration of the parsing behavior follows the PR list.
  • 69: proto: remotesapi: chunkstore: Update message names and fields to clarify between chunk hashes on downloads and table file hashes on uploads.
  • 68: doltcore: commitwalk: Implement GetDotDotRevisions.
    Roughly mimics git log master..feature. Useful for displaying the commit log of a pull request, for example (see the sketch after the PR list).
  • 67: Add file emitter that writes event data file
    Added a file emitter that saves event data to files, and a flush that parses the files and sends them to the gRPC server.
  • 63: Update README.md
    @timsehn pointed out a shortcoming in the README file.
  • 7: Merge upstream master
  • 6: Fixed bug in comparisons for negative float literals
  • 5: Zachmu/is true
  • 4: Instead of adding offset to rowCount, just reverse the wrapping between offset and limit nodes.
  • 3: Zachmu/float bugfixes
  • 2: Zachmu/limit bug fixes
  • 1: Replace of vitess dependency with our forked one, and commented local override
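
Below are the sketches referenced above. First, for PR 91, a minimal sketch of the bounded-read dispatch idea, assuming a hypothetical byteRange type and readRanges helper (the real code lives in go/store/nbs/table_reader.go and differs in detail): instead of spawning one goroutine per byte range, a fixed pool of workers drains a channel of ranges, so at most maxConcurrency reads are in flight at any time.

    package main

    import (
        "fmt"
        "sync"
    )

    // byteRange is a hypothetical stand-in for the offset/length pairs that
    // getManyAtOffsetsWithReadFunc dispatches reads for.
    type byteRange struct {
        offset uint64
        length uint32
    }

    // readRanges performs read(r) for every range, keeping at most
    // maxConcurrency reads in flight instead of one goroutine per range.
    func readRanges(ranges []byteRange, maxConcurrency int, read func(byteRange) error) error {
        work := make(chan byteRange)
        errs := make(chan error, maxConcurrency)

        var wg sync.WaitGroup
        for i := 0; i < maxConcurrency; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for r := range work {
                    if err := read(r); err != nil {
                        // Record the first error without blocking.
                        select {
                        case errs <- err:
                        default:
                        }
                    }
                }
            }()
        }

        for _, r := range ranges {
            work <- r
        }
        close(work)
        wg.Wait()

        select {
        case err := <-errs:
            return err
        default:
            return nil
        }
    }

    func main() {
        ranges := []byteRange{{0, 4096}, {4096, 4096}, {8192, 4096}}
        err := readRanges(ranges, 4, func(r byteRange) error {
            fmt.Printf("reading %d bytes at offset %d\n", r.length, r.offset)
            return nil
        })
        if err != nil {
            fmt.Println("read error:", err)
        }
    }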
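
For PR 90, a rough sketch of the REPLACE flow, using a toy in-memory table and a hypothetical errRowNotFound sentinel (illustrative names, not Dolt's actual API): REPLACE is delete-then-insert, and the delete has to distinguish "row missing" from success so the affected-row count can be reported as 1 (insert only) or 2 (delete plus insert), matching MySQL's convention.

    package main

    import (
        "errors"
        "fmt"
    )

    // errRowNotFound is a hypothetical sentinel returned when no row exists
    // for the given key.
    var errRowNotFound = errors.New("row not found")

    // table is a toy key-value table standing in for a Dolt table.
    type table map[string]string

    func (t table) delete(key string) error {
        if _, ok := t[key]; !ok {
            return errRowNotFound
        }
        delete(t, key)
        return nil
    }

    func (t table) insert(key, val string) {
        t[key] = val
    }

    // replaceRow implements REPLACE as delete-then-insert and returns the
    // affected-row count: 1 if only an insert happened, 2 if an existing
    // row was deleted first.
    func replaceRow(t table, key, val string) (int, error) {
        affected := 1
        switch err := t.delete(key); {
        case err == nil:
            affected = 2
        case errors.Is(err, errRowNotFound):
            // No existing row; fall through to the insert.
        default:
            return 0, err
        }
        t.insert(key, val)
        return affected, nil
    }

    func main() {
        t := table{}
        n, _ := replaceRow(t, "id1", "a")
        fmt.Println(n) // 1: insert only
        n, _ = replaceRow(t, "id1", "b")
        fmt.Println(n) // 2: delete + insert
    }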
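
For PR 70, a small illustration of the URL-parsing issue behind the spec change, with made-up table and bucket names (not Dolt's actual spec-parsing code): newer Go releases reject a non-numeric port in a URL host, so the colon in the old aws://table:bucket form has to be wrapped in brackets, the same way IPv6 literals are written.

    package main

    import (
        "fmt"
        "net/url"
    )

    func main() {
        // Old spec form: everything after the colon is treated as a port,
        // and a non-numeric port is now a parse error.
        if _, err := url.Parse("aws://dynamo-table:s3-bucket/org/repo"); err != nil {
            fmt.Println("old form fails:", err)
        }

        // New spec form: brackets keep the colon inside the host component.
        u, err := url.Parse("aws://[dynamo-table:s3-bucket]/org/repo")
        if err != nil {
            fmt.Println("unexpected error:", err)
            return
        }
        fmt.Println("host:", u.Hostname()) // dynamo-table:s3-bucket
        fmt.Println("path:", u.Path)       // /org/repo
    }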
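
For PR 68, a toy sketch of the two-dot semantics that GetDotDotRevisions mimics, using an in-memory commit graph (not Dolt's actual commitwalk API): the result is the set of commits reachable from feature but not from master, which is what git log master..feature shows.

    package main

    import "fmt"

    // commit is a toy commit node with parent hashes.
    type commit struct {
        hash    string
        parents []string
    }

    // reachable returns the set of hashes reachable from start.
    func reachable(graph map[string]commit, start string) map[string]bool {
        seen := map[string]bool{}
        stack := []string{start}
        for len(stack) > 0 {
            h := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if seen[h] {
                continue
            }
            seen[h] = true
            stack = append(stack, graph[h].parents...)
        }
        return seen
    }

    // dotDot returns commits reachable from feature but not from master,
    // i.e. the equivalent of `git log master..feature`.
    func dotDot(graph map[string]commit, master, feature string) []string {
        exclude := reachable(graph, master)
        var out []string
        for h := range reachable(graph, feature) {
            if !exclude[h] {
                out = append(out, h)
            }
        }
        return out
    }

    func main() {
        // base <- m1 (master); base <- f1 <- f2 (feature)
        graph := map[string]commit{
            "base": {hash: "base"},
            "m1":   {hash: "m1", parents: []string{"base"}},
            "f1":   {hash: "f1", parents: []string{"base"}},
            "f2":   {hash: "f2", parents: []string{"f1"}},
        }
        fmt.Println(dotDot(graph, "m1", "f2")) // [f1 f2], in some order
    }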

Closed Issues

  • 106: Installation instructions are incorrect
  • 95: dolt push segmentation fault
  • 77: dolt diff --summary
  • 76: dolt table import -r
  • 75: DoltHub: Add repo size to Dataset detail page