
0.9.9

@reltuk released this 04 Oct 18:36
d77543c

Contained in this release

  1. remote performance improvements (clone, push, and pull)
  2. better support for MySQL in server mode, including DROP, UPDATE, and INSERT
  3. SQL performance improvements
  4. diff summary
  5. more metrics
  6. other assorted bug fixes and improvements

If you find any bugs, have a feature request, or have an interesting use case, please raise an issue.

Merged PRs

  • 114: go/libraries/doltcore/sqle: types: Make SqlValToNomsVal compile for 32bit by checking for overflow on uint -> int64 differently.
  • 112: Zachmu/drop table
  • 110: go/utils/checkcommitters: Oscar is an allowed committer and author.
  • 109: attempted deadlock fix
  • 108: Correct the installation instructions
  • 105: dolt diff --summary
    Example output using Liquidata/tatoeba-sentence-translations:
    $ dolt diff --summary rnfm50gmumlettuebt2latmer617ni3t
    diff --dolt a/sentences b/sentences
    --- a/sentences @ gd1v6fsc04k5676c105d046m04hla3ia
    +++ b/sentences @ 2ttci8id13mijhv8u94qlioqegh7lgpo
    7,800,102 Rows Unmodified (99.99%)
    15,030 Rows Added (0.19%)
    108 Rows Deleted (0.00%)
    960 Rows Modified (0.01%)
    1,888 Cells Modified (0.00%)
    (7,801,170 Entries vs 7,816,092 Entries)
    diff --dolt a/translations b/translations
    --- a/translations @ p2355o6clst8ssvr9jha2bfgqbrstkmm
    +++ b/translations @ 62ri8lmohbhs1mc01m9o4rbvj6rbl8ee
    5,856,845 Rows Unmodified (90.91%)
    468,173 Rows Added (7.27%)
    578,242 Rows Deleted (8.98%)
    7,626 Rows Modified (0.12%)
    7,626 Cells Modified (0.06%)
    (6,442,713 Entries vs 6,332,494 Entries)
    
    Fixes #77
  • 104: Bh/output updates3
  • 103: dolt/go/store: Stop panicking on sequence walks when expected hashes are not in the ValueReader.
  • 101: go/{store,libraries/doltcore/remotestorage}: Make the code peddling in nbs table file formats a little more explicit about it.
  • 100: newline changes
  • 99: Implemented UPDATE
    I think we should delete the old SQL methods in sql.go. I know you originally suggested keeping them there for reference, but they're not used anywhere at this point, and they're still in git history if we ever want to look at them again. At this point they're just clutter.
    I'm skipping the one test at the end because of a WHERE-handling decision in go-mysql-server. The behavior looks intentional: converting a string to an int returns 0 if the string is not parsable. I'll file it as a non-conformance bug on their end, but for now I'm skipping the test.
  • 98: Bh/output updates
  • 97: store/{nbs,chunks}: Make ChunkStore#GetMany{,Compressed} take send-only channels.
  • 96: update status messages for push/pull
  • 94: Update README.md
    Ensure that installing from source is properly documented, including Go gotchas.
  • 93: Reverts the revert of my push/pull changes with fixes.
  • 92: content length fix
  • 91: go: store/nbs: table_reader: getManyAtOffsetsWithReadFunc: Stop unbounded I/O parallelism in GetMany implementation.
    When we do things like push, pull, or (soon-to-be) garbage collection, we have large sets of Chunk addresses that we pass into ChunkStore#GetMany and then go off and process. Clients largely try to control the memory overhead and pipeline depth by passing in a buffered channel of an appropriate size. The expectation is that the implementation of GetMany will have an amount of data in flight at any given time that is in some reasonable way proportional to the channel size.
    In the current implementation, there is unbounded concurrency on the read destination allocations and the reads themselves, with one goroutine spawned for each byte range we want to read. This results in absolutely massive (virtual) heap utilization, unreasonable I/O parallelism, and context-switch thrashing in large repo push/pull situations.
    This is a small PR to change the concurrency paradigm inside getManyAtOffsetsWithReadFunc so that we only have 4 concurrent dispatched reads per table_reader instance at a time (see the bounded-read sketch after the PR list).
    This is still not the behavior we actually want.
    • I/O concurrency should be configurable at the ChunkStore layer (or eventually per-device backing a set of tableReaders), and not depend on the number of tableReaders which happen to back the chunk store.
    • Memory overhead is still not correctly bounded here, since read ahead batches are allowed to grow to arbitrary sizes. Reasonable bounds on memory overhead should be configurable at the ChunkStore layer.
      I'm landing this as a big incremental improvement over the status quo. Here are some non-reproducible, one-shot test results from a test program. The test program walks the entire chunk graph, assembles every chunk address, and then does a GetManyCompressed on every chunk address and copies their contents to /dev/null. It was run on a ~10GB (compressed) data set:
      Before:
    $ /usr/bin/time -l -- go run test.go
    ...
    MemStats: Sys: 16628128568
    161.29 real        67.29 user       456.38 sys
    5106425856  maximum resident set size
    0  average shared memory size
    0  average unshared data size
    0  average unshared stack size
    10805008  page reclaims
    23881  page faults
    0  swaps
    0  block input operations
    0  block output operations
    0  messages sent
    0  messages received
    8  signals received
    652686  voluntary context switches
    21071339  involuntary context switches
    
    After:
    $ /usr/bin/time -l -- go run test.go
    ...
    MemStats: Sys: 4590759160
    32.17 real        30.53 user        29.62 sys
    4561879040  maximum resident set size
    0  average shared memory size
    0  average unshared data size
    0  average unshared stack size
    1228770  page reclaims
    67100  page faults
    0  swaps
    0  block input operations
    0  block output operations
    0  messages sent
    0  messages received
    14  signals received
    456898  voluntary context switches
    2954503  involuntary context switches
    
    On these runs, sys time, wall-clock time, VM page reclaims, and virtual memory used are all improved pretty substantially.
    Very open to feedback and discussion of potential performance regressions here, but I think this is an incremental win for now.
  • 90: Implemented REPLACE
    Mostly tests, since this just uses the Delete and Insert functions that we already have. The previous Delete would silently ignore a delete on a non-existent row, so I changed it to return the correct error when the row does not exist; that way REPLACE reports correctly (otherwise it would always claim a REPLACE did both a delete and an insert). See the sketch after the PR list.
  • 89: Push and Pull v2
  • 88: Add metrics attributes
    Similar to the previous PR db/event-metrics, but this time there are no byte measurements on clone, as the implementation is different. Some things in the events package have been refactored to prevent circular dependencies. Adding StandardAttributes will help me generate the info for my new metrics.
  • 87: {go, bats}: Replace table works with file with schema in different order
  • 86: dolt table import -r
    Fixes #76
    Replaces the existing table with the contents of the file while preserving the original schema.
  • 85: Bh/cmp chunks
  • 84: revert nil check and always require stats to match aws behavior
  • 83: Bh/clone2
    This version of clone works on the table files directly. It enumerates all the table files and downloads them. It does not inspect the chunks as v1 did.
  • 82: Naked deletes now just delete everything instead of iterating
    This works, but it's ugly, and I'm not sure of a better way to do it.
  • 81: Progress on switching deletes to new engine
    Currently works for deletes, but not thoroughly tested.
  • 80: go/store/nbs: store.go: Make global index cache 64MB instead of 8MB.
  • 79: Removed skips for tests that will now work
    This will fail for now; it's waiting on dolthub/go-mysql-server#10 to be approved before I merge it in. Super small stuff, though.
  • 73: go/libraries/doltcore/remotestorage: Add the ability to have a noop cache on DoltChunkStore.
  • 72: proto: Use fully qualified paths for go_packages.
    This allows cross-package references within proto files to work appropriately.
  • 71: Db/events dir lock
    Initial implementation of making event flush concurrency-safe.
  • 70: go/store/spec: Move to aws://[table:bucket] for NBS on AWS specs because of Go URL parsing changes.
    Context on the Go changes: https://go.googlesource.com/go/+/61bb56ad63992a3199acc55b2537c8355ef887b6. A short illustration of the parsing behavior follows the PR list.
  • 69: proto: remotesapi: chunkstore: Update message names and fields to clarify between chunk hashes on downloads and table file hashes on uploads.
  • 68: doltcore: commitwalk: Implement GetDotDotRevisions.
    Roughly mimics git log master..feature. Useful for displaying the commit log of a pull request, for example (see the sketch after the PR list).
  • 67: Add file emitter that writes event data file
    Added a file emitter that saves event data to files, and a flush that parses the files and sends them to the gRPC server.
  • 63: Update README.md
    @timsehn pointed out a shortcoming in the README file.
  • 7: Merge upstream master
  • 6: Fixed bug in comparisons for negative float literals
  • 5: Zachmu/is true
  • 4: Instead of adding offset to rowCount, just reverse the wrapping between offset and limit nodes.
  • 3: Zachmu/float bugfixes
  • 2: Zachmu/limit bug fixes
  • 1: Replace of vitess dependency with our forked one, and commented local override
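
Below are the sketches referenced above. First, for PR 91, a minimal sketch of the bounded-read dispatch idea, assuming a hypothetical byteRange type and readRanges helper (the real code lives in go/store/nbs/table_reader.go and differs in detail): instead of spawning one goroutine per byte range, a fixed pool of workers drains a channel of ranges, so at most maxConcurrency reads are in flight at any time.

    package main

    import (
        "fmt"
        "sync"
    )

    // byteRange is a hypothetical stand-in for the offset/length pairs that
    // getManyAtOffsetsWithReadFunc dispatches reads for.
    type byteRange struct {
        offset uint64
        length uint32
    }

    // readRanges performs read(r) for every range, keeping at most
    // maxConcurrency reads in flight instead of one goroutine per range.
    func readRanges(ranges []byteRange, maxConcurrency int, read func(byteRange) error) error {
        work := make(chan byteRange)
        errs := make(chan error, maxConcurrency)

        var wg sync.WaitGroup
        for i := 0; i < maxConcurrency; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for r := range work {
                    if err := read(r); err != nil {
                        // Record the first error without blocking.
                        select {
                        case errs <- err:
                        default:
                        }
                    }
                }
            }()
        }

        for _, r := range ranges {
            work <- r
        }
        close(work)
        wg.Wait()

        select {
        case err := <-errs:
            return err
        default:
            return nil
        }
    }

    func main() {
        ranges := []byteRange{{0, 4096}, {4096, 4096}, {8192, 4096}}
        err := readRanges(ranges, 4, func(r byteRange) error {
            fmt.Printf("reading %d bytes at offset %d\n", r.length, r.offset)
            return nil
        })
        if err != nil {
            fmt.Println("read error:", err)
        }
    }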
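
For PR 90, a rough sketch of the REPLACE flow, using a toy in-memory table and a hypothetical errRowNotFound sentinel (illustrative names, not Dolt's actual API): REPLACE is delete-then-insert, and the delete has to distinguish "row missing" from success so the affected-row count can be reported as 1 (insert only) or 2 (delete plus insert), matching MySQL's convention.

    package main

    import (
        "errors"
        "fmt"
    )

    // errRowNotFound is a hypothetical sentinel returned when no row exists
    // for the given key.
    var errRowNotFound = errors.New("row not found")

    // table is a toy key-value table standing in for a Dolt table.
    type table map[string]string

    func (t table) delete(key string) error {
        if _, ok := t[key]; !ok {
            return errRowNotFound
        }
        delete(t, key)
        return nil
    }

    func (t table) insert(key, val string) {
        t[key] = val
    }

    // replaceRow implements REPLACE as delete-then-insert and returns the
    // affected-row count: 1 if only an insert happened, 2 if an existing
    // row was deleted first.
    func replaceRow(t table, key, val string) (int, error) {
        affected := 1
        switch err := t.delete(key); {
        case err == nil:
            affected = 2
        case errors.Is(err, errRowNotFound):
            // No existing row; fall through to the insert.
        default:
            return 0, err
        }
        t.insert(key, val)
        return affected, nil
    }

    func main() {
        t := table{}
        n, _ := replaceRow(t, "id1", "a")
        fmt.Println(n) // 1: insert only
        n, _ = replaceRow(t, "id1", "b")
        fmt.Println(n) // 2: delete + insert
    }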
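
For PR 70, a small illustration of the URL-parsing issue behind the spec change, with made-up table and bucket names (not Dolt's actual spec-parsing code): newer Go releases reject a non-numeric port in a URL host, so the colon in the old aws://table:bucket form has to be wrapped in brackets, the same way IPv6 literals are written.

    package main

    import (
        "fmt"
        "net/url"
    )

    func main() {
        // Old spec form: everything after the colon is treated as a port,
        // and a non-numeric port is now a parse error.
        if _, err := url.Parse("aws://dynamo-table:s3-bucket/org/repo"); err != nil {
            fmt.Println("old form fails:", err)
        }

        // New spec form: brackets keep the colon inside the host component.
        u, err := url.Parse("aws://[dynamo-table:s3-bucket]/org/repo")
        if err != nil {
            fmt.Println("unexpected error:", err)
            return
        }
        fmt.Println("host:", u.Hostname()) // dynamo-table:s3-bucket
        fmt.Println("path:", u.Path)       // /org/repo
    }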
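
For PR 68, a toy sketch of the two-dot semantics that GetDotDotRevisions mimics, using an in-memory commit graph (not Dolt's actual commitwalk API): the result is the set of commits reachable from feature but not from master, which is what git log master..feature shows.

    package main

    import "fmt"

    // commit is a toy commit node with parent hashes.
    type commit struct {
        hash    string
        parents []string
    }

    // reachable returns the set of hashes reachable from start.
    func reachable(graph map[string]commit, start string) map[string]bool {
        seen := map[string]bool{}
        stack := []string{start}
        for len(stack) > 0 {
            h := stack[len(stack)-1]
            stack = stack[:len(stack)-1]
            if seen[h] {
                continue
            }
            seen[h] = true
            stack = append(stack, graph[h].parents...)
        }
        return seen
    }

    // dotDot returns commits reachable from feature but not from master,
    // i.e. the equivalent of `git log master..feature`.
    func dotDot(graph map[string]commit, master, feature string) []string {
        exclude := reachable(graph, master)
        var out []string
        for h := range reachable(graph, feature) {
            if !exclude[h] {
                out = append(out, h)
            }
        }
        return out
    }

    func main() {
        // base <- m1 (master); base <- f1 <- f2 (feature)
        graph := map[string]commit{
            "base": {hash: "base"},
            "m1":   {hash: "m1", parents: []string{"base"}},
            "f1":   {hash: "f1", parents: []string{"base"}},
            "f2":   {hash: "f2", parents: []string{"f1"}},
        }
        fmt.Println(dotDot(graph, "m1", "f2")) // [f1 f2], in some order
    }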

Closed Issues

  • 106: Installation instructions are incorrect
  • 95: dolt push segmentation fault
  • 77: dolt diff --summary
  • 76: dolt table import -r
  • 75: DoltHub: Add repo size to Dataset detail page