Releases: Tokutek/mongo
TokuMX 2.0.1
TokuMX 2.0.1-rc2
TokuMX 2.0.0
General
- This release is focused on new features and improvements since 1.5.1: geospatial features, Ark elections, fast updates, and simultaneously partitioned and sharded collections.
Highlights
- Geospatial indexing and queries have been added and are compatible with those in MongoDB 2.4.
- The election algorithm used for replication failover has been replaced with Ark, which prevents elections from rolling back writes acknowledged with majority Write Concern, and makes failovers faster.
- Fast updates are a new update mechanism that avoids a preliminary query during an update operation, when possible, boosting the speed of update operations by up to 10x.
- Partitioned collections may now be sharded, in a restricted form useful for many time-series workloads with a suitable shard key.
- The Enterprise Edition of TokuMX now includes Point-in-Time Recovery and Audit.
- Enterprise Hot Backup now supports backing up distinct `dbpath` and `logDir` directories.
New features and improvements
- All geospatial indexing and query features of MongoDB 2.4 have been added to TokuMX. (A usage sketch follows this list.) (MX-166)
- Replication elections now use Ark, a consensus algorithm similar to Raft, to elect a new primary node during failover. This reduces the downtime experienced by clients due to failed failover elections, and ensures that writes acknowledged with majority Write Concern will never be rolled back by any later failover. (MX-765)
- Previously, updates performed a query to read the existing document, then made changes to the relevant indexes. In Fractal Tree indexes, the query is typically much slower than the changes made by an update. TokuMX 2.0 transparently optimizes updates of the right form to avoid the query phase. For updates by primary key (or `_id` by default) that don't modify any secondary indexes, this can result in a dramatic speedup. (A sketch follows this list.) Please see the documentation for full details. (MX-932)
- Reduced the amount of data written to disk when applying replicated operations on secondaries. (MX-1133)
- Replication rollback has been changed so that oplog entries are significantly smaller for update and delete operations. (MX-1223, MX-1238)
- Partitioned collections may now be sharded, with several restrictions:
  - The collection must be created by the `shardCollection` command, and at that time, both the shard key and partition key must be provided. The partition key will be used as the primary key.
  - The balancer will not move chunks for a sharded and partitioned collection. Users may manually move chunks with `sh.moveChunk()`; however, this is only recommended for empty chunks, as a collection initialization step.
  - Chunks are not automatically split, though users may manually split chunks with `sh.splitAt()`, which is recommended as a collection initialization step.
  Please see the documentation for full details. (A sketch of the initialization steps follows this list.) (MX-1232)
- Optimized the comparison function used to compare ObjectIDs. This results in a modest performance improvement for non-covered queries. (MX-1251)
- Replication rollback now writes information about the operations that were rolled back to a separate collection, `local.rollback.opdata`. Please see the documentation for full details. (MX-1258)
- After a successful primary election, the new primary writes a comment in its oplog noting the result of the election. This entry now includes the primary's hostname. (MX-1260)
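As a reference for the geospatial item above (MX-166), a minimal sketch of the MongoDB 2.4-style API that is now supported; the collection, field names, and coordinates are illustrative only:

```js
// Illustrative collection and data, showing the MongoDB 2.4-compatible
// geospatial API that TokuMX 2.0 now supports.
db.places.ensureIndex({ loc: "2dsphere" });

db.places.insert({ name: "cafe", loc: { type: "Point", coordinates: [-73.97, 40.77] } });

// Nearest-first search within roughly 500 meters of a point.
db.places.find({
  loc: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.98, 40.76] },
      $maxDistance: 500
    }
  }
});
```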
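For the fast-update item above (MX-932), a sketch of an update of the eligible form, assuming a hypothetical collection whose updated fields are not in any secondary index; see the documentation for the exact eligibility rules:

```js
// Hypothetical collection: _id is the primary key, and "counter" and
// "lastSeen" are not part of any secondary index, so TokuMX can skip the
// preliminary query and apply the change directly to the Fractal Tree.
db.events.update(
  { _id: 12345 },                                    // match by primary key only
  { $inc: { counter: 1 }, $set: { lastSeen: new Date() } }
);

// An update that matches on a non-key field, or that modifies a field covered
// by a secondary index, still needs the query phase.
```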
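For the sharded partitioned collections item above (MX-1232), a sketch of the recommended initialization step using the standard shell helpers. The collection, shard names, and split points are hypothetical, and the `shardCollection` options for supplying the partition key are not shown here (see the documentation):

```js
// Assumes logs.events was already created with the shardCollection command,
// supplying both the shard key ({ts: 1} here) and the partition key.
// Pre-split the still-empty collection at day boundaries ...
sh.splitAt("logs.events", { ts: ISODate("2014-11-01T00:00:00Z") });
sh.splitAt("logs.events", { ts: ISODate("2014-11-02T00:00:00Z") });

// ... and place the empty chunks by hand, since the balancer will not move
// chunks of a sharded, partitioned collection.
sh.moveChunk("logs.events", { ts: ISODate("2014-11-01T00:00:00Z") }, "shard0001");
sh.moveChunk("logs.events", { ts: ISODate("2014-11-02T00:00:00Z") }, "shard0002");
```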
Bug fixes
- Reduced delays in replication state changes caused by unnecessary coupling between replication subsystems. (MX-1136)
- Partitioned collection administrative commands (`addPartition`, `dropPartition`, and `getPartitionInfo`) are now passed from `mongos` through to `mongod`. (MX-1139)
- Fixed an issue that caused `collStats` to fail on a partitioned collection after dropping an index. (MX-1262)
- Fixed `mongorestore`'s behavior when restoring a partitioned collection, to properly restore the partition layout as well. (MX-1269)
TokuMX 2.0.0-rc.1
General
- This release is focused on new features and improvements since 1.5.1: geospatial features, Ark elections, fast updates, and simultaneously partitioned and sharded collections.
Highlights
- Geospatial indexing and queries have been added and are compatible with those in MongoDB 2.4.
- The election algorithm used for replication failover has been replaced with Ark, which prevents elections from rolling back writes acknowledged with majority Write Concern, and makes failovers faster.
- Fast updates are a new update mechanism that avoids a preliminary query during an update operation, when possible, boosting the speed of update operations by up to 10x.
- Partitioned collections may now be sharded, in a restricted form useful for many time-series workloads with a suitable shard key.
New features and improvements
- All geospatial indexing and query features of MongoDB 2.4 have been added to TokuMX. (MX-166)
- Replication elections now use Ark, a consensus algorithm similar to Raft, to elect a new primary node during failover. This reduces the downtime experienced by clients due to failed failover elections, and ensures that writes acknowledged with majority Write Concern will never be rolled back by any later failover. (MX-765)
- Previously, updates performed a query to read the existing document, then made changes to the relevant indexes. In Fractal Tree indexes, the query is typically much slower than the changes made by an update. TokuMX 2.0 transparently optimizes updates of the right form to avoid the query phase. For updates by primary key (or `_id` by default) that don't modify any secondary indexes, this can result in a dramatic speedup. Please see the documentation for full details. (MX-932)
- Reduced the amount of data written to disk when applying replicated operations on secondaries. (MX-1133)
- Replication rollback has been changed so that oplog entries are significantly smaller for update and delete operations. (MX-1223, MX-1238)
- Partitioned collections may now be sharded, with several restrictions:
  - The collection must be created by the `shardCollection` command, and at that time, both the shard key and partition key must be provided. The partition key will be used as the primary key.
  - The balancer will not move chunks for a sharded and partitioned collection. Users may manually move chunks with `sh.moveChunk()`; however, this is only recommended for empty chunks, as a collection initialization step.
  - Chunks are not automatically split, though users may manually split chunks with `sh.splitAt()`, which is recommended as a collection initialization step.
  Please see the documentation for full details. (MX-1232)
- Optimized the comparison function used to compare ObjectIDs. This results in a modest performance improvement for non-covered queries. (MX-1251)
- Replication rollback now writes information about the operations that were rolled back to a separate collection, `local.rollback.opdata`. Please see the documentation for full details. (MX-1258)
- After a successful primary election, the new primary writes a comment in its oplog noting the result of the election. This entry now includes the primary's hostname. (MX-1260)
Bug fixes
- Reduced delays in replication state changes caused by unnecessary coupling between replication subsystems. (MX-1136)
- Partitioned collection administrative commands (`addPartition`, `dropPartition`, and `getPartitionInfo`) are now passed from `mongos` through to `mongod`. (MX-1139)
- Fixed an issue that caused `collStats` to fail on a partitioned collection after dropping an index. (MX-1262)
- Fixed `mongorestore`'s behavior when restoring a partitioned collection, to properly restore the partition layout as well. (MX-1269)
TokuMX 1.5.1
General
- This release is focused on bug fixes since 1.5.0.
Bug fixes
- In versions of TokuMX 1.4.0 through 1.5.0, compound keys with integer values of absolute value larger than 2^52 in all but the last component of the key were ordered incorrectly. This release fixes that ordering.
  This error only affected compound keys, so hashed keys are unaffected.
  The result of the error is that if there is user data of that size in any but the last field of a key, the key will be ordered incorrectly with respect to other keys in the index. This can cause queries to return improper results if they do range queries on that index.
  If the affected index is the primary key of a collection, then before upgrading to 1.5.1, you must dump the affected collection, drop it, and restore it after upgrading to 1.5.1. If the affected index is a secondary key, it can simply be dropped and rebuilt after upgrading to 1.5.1 (a sketch follows this list). (MX-1140)
- Lengthened the socket timeout used by secondaries when replicating large transactions. Accumulation of garbage due to aborted transactions can cause these queries to take a long time, and with a short socket timeout, secondaries could get stuck. This socket timeout is configurable with the server parameter `soTimeoutForReplLargeTxn`, which defaults to 10 minutes. (MX-1207)
- Fixed an issue where, if the system ran out of disk space while applying a transaction on a secondary, that transaction could be skipped instead of the error being reported. In this scenario, the secondary now reports the problem and crashes. (MX-1235)
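For the key-ordering fix above (MX-1140), a sketch of the simpler secondary-key case: dropping and rebuilding an affected index after upgrading to 1.5.1. The collection and key pattern are hypothetical; an affected primary key instead requires the dump/drop/restore procedure described above:

```js
// Hypothetical compound secondary index whose leading field can hold integers
// with absolute value larger than 2^52.
db.metrics.dropIndex({ bigCounter: 1, ts: 1 });

// Rebuild it after upgrading to 1.5.1 so the keys are ordered correctly.
db.metrics.ensureIndex({ bigCounter: 1, ts: 1 });
```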
TokuMX 1.5.1-rc.0
General
- This release is focused on bug fixes since 1.5.0.
Bug fixes
- In versions of TokuMX 1.4.0 through 1.5.0, compound keys with integer values of absolute value larger than 2^52 in all but the last component of the key were ordered incorrectly. This release fixes that ordering.
  This error only affected compound keys, so hashed keys are unaffected.
  The result of the error is that if there is user data of that size in any but the last field of a key, the key will be ordered incorrectly with respect to other keys in the index. This can cause queries to return improper results if they do range queries on that index.
  If the affected index is the primary key of a collection, then before upgrading to 1.5.1, you must dump the affected collection, drop it, and restore it after upgrading to 1.5.1. If the affected index is a secondary key, it can simply be dropped and rebuilt after upgrading to 1.5.1. (MX-1140)
- Lengthened the socket timeout used by secondaries when replicating large transactions. Accumulation of garbage due to aborted transactions can cause these queries to take a long time, and with a short socket timeout, secondaries could get stuck. This socket timeout is configurable with the server parameter `soTimeoutForReplLargeTxn`, which defaults to 10 minutes. (MX-1207)
- Fixed an issue where, if the system ran out of disk space while applying a transaction on a secondary, that transaction could be skipped instead of the error being reported. In this scenario, the secondary now reports the problem and crashes. (MX-1235)
TokuMX 1.5.0
General
- This release is focused on the addition of a new collection type, partitioned collections, which provide excellent performance and flexibility for time-series workloads.
Highlights
- Partitioned collections are a new feature with a similar use case to capped collections in basic MongoDB, but with better concurrency and flexibility.
  Briefly, a partitioned collection is similar to a partitioned table in some RDBMSes. The collection is divided physically by primary key into multiple underlying partitions, allowing for the instantaneous and total deletion of entire partitions, but it is treated as one logical collection from the client's perspective.
  A typical use case for this would be to partition on a `{date: 1, _id: 1}` primary key, and periodically add new partitions and drop old ones to work with the latest data in a time-series data set (a sketch follows below). (#1063)
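A sketch of the time-series lifecycle described above (#1063). The `createCollection` option names and the arguments of the partition commands are assumptions based on this description, not taken verbatim from these notes; consult the TokuMX documentation for the authoritative syntax:

```js
// Assumed option names: create a collection partitioned on a {date: 1, _id: 1}
// primary key (check the documentation for the exact createCollection syntax).
db.createCollection("readings", { partitioned: true, primaryKey: { date: 1, _id: 1 } });

// Periodically cap the current partition and start a new one for fresh data.
db.runCommand({ addPartition: "readings" });

// Inspect the partition layout, then drop the oldest partition in one shot.
printjson(db.runCommand({ getPartitionInfo: "readings" }));
db.runCommand({ dropPartition: "readings", id: 0 });  // the "id" argument is an assumption
```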
New features and improvements
- Further reduced the locking done during migrations, to reduce stalls. (#1041, #1135)
- Initial sync now reports its progress, in terms of the number of databases, collections within a database, and bytes copied, then indexes built for a collection, in the server log, in `db.currentOp()`, and in `rs.status()`. (#1050)
- Added a new option to allow operations that require a metadata write lock (such as adding an index) to interrupt operations that are holding a read lock. This option can be used to avoid those write lock requests stalling behind a long read-locked operation (such as a large `multi: true` update), and in turn stalling all other operations, including readers, behind itself.
  To use this option, your application must be ready to handle failed operations and retry them (transactional atomicity makes this somewhat simpler); then you can turn it on for all clients with `--setParameter forceWriteLocks=true` or with the `setParameter` command. This makes all future connections interruptible while they have read locks, but this state can be controlled for each connection with the `setWriteLockYielding` command (a sketch follows this list). See #1132 for more details. (#1132)
- During initial sync, there is a step that checks that all collections that existed at the beginning of the sync's snapshot transaction still exist, to maintain the snapshot's validity. This step has been optimized to avoid slowness in some rare situations. (#1162)
- Serial insertions have been made significantly faster with improvements to the storage engine. (Tokutek/ft-index#158)
- Update workloads on extremely large data sets (where a large portion of even internal nodes do not fit in main memory) have been optimized somewhat. (Tokutek/ft-index#226)
- Added per-connection control of document-level lock wait timeouts with a new command, `{setClientLockTimeout: <timeoutms>}`. This controls the same behavior as `--lockTimeout`, which can also be controlled server-wide with `setParameter` (which now controls the default for all new connections). The value should be specified in milliseconds (a sketch follows this list). (#1168)
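For the write-lock interruption item above (#1132), a sketch of turning the behavior on and controlling it per connection. The `setParameter` form follows the flag named in the notes; the argument shape of `setWriteLockYielding` is an assumption:

```js
// Enable interruption of read-locked operations for all new connections
// (equivalent to starting mongod with --setParameter forceWriteLocks=true).
db.adminCommand({ setParameter: 1, forceWriteLocks: true });

// Opt a single connection out (or back in) if it cannot tolerate having its
// read-locked operations interrupted; the argument shape is an assumption.
db.adminCommand({ setWriteLockYielding: false });
```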
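For the lock timeout item above (#1168), a sketch of the per-connection command; the 5000 ms value is only an example, and running it through `db.adminCommand()` is an assumption:

```js
// Set this connection's document-level lock wait timeout to 5 seconds.
db.adminCommand({ setClientLockTimeout: 5000 });

// The server-wide default for new connections is still controlled by
// --lockTimeout or setParameter, as described above.
```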
Bug fixes
- Cleaned up the way remote errors are handled in the distributed locking framework, which will reduce the impact of some rare errors on config servers. (#857, #1140)
- While a shard is draining, it will no longer be chosen to receive new chunks due to the sequential insert optimization, nor will it be chosen as the primary shard for new databases. (#1069)
- The `mongo2toku` tool now correctly uses the `admin` database for authentication to the target cluster by default. (#1125)
- Fixed the way packaging scripts check for transparent hugepages, and changed them to allow `madvise` as an acceptable setting. (#1130)
- Connections used to copy data for initial sync no longer time out improperly if some operations (such as checking for snapshot invalidation) take too long. (#1147)
TokuMX 1.5.0-rc.2
General
- This release is focused on the addition of a new collection type, partitioned collections, which provide excellent performance and flexibility for time-series workloads.
Highlights
- Partitioned collections are a new feature with a similar use case to capped collections in basic MongoDB, but with better concurrency and flexibility.
  Briefly, a partitioned collection is similar to a partitioned table in some RDBMSes. The collection is divided physically by primary key into multiple underlying partitions, allowing for the instantaneous and total deletion of entire partitions, but it is treated as one logical collection from the client's perspective.
  A typical use case for this would be to partition on a `{date: 1, _id: 1}` primary key, and periodically add new partitions and drop old ones to work with the latest data in a time-series data set. (#1063)
New features and improvements
- Further reduced the locking done during migrations, to reduce stalls. (#1041, #1135)
- Initial sync now reports its progress, in terms of the number of databases, collections within a database, and bytes copied, then indexes built for a collection, in the server log, in `db.currentOp()`, and in `rs.status()`. (#1050)
- Added a new option to allow operations that require a metadata write lock (such as adding an index) to interrupt operations that are holding a read lock. This option can be used to avoid those write lock requests stalling behind a long read-locked operation (such as a large `multi: true` update), and in turn stalling all other operations, including readers, behind itself.
  To use this option, your application must be ready to handle failed operations and retry them (transactional atomicity makes this somewhat simpler); then you can turn it on for all clients with `--setParameter forceWriteLocks=true` or with the `setParameter` command. This makes all future connections interruptible while they have read locks, but this state can be controlled for each connection with the `setWriteLockYielding` command. See #1132 for more details. (#1132)
- During initial sync, there is a step that checks that all collections that existed at the beginning of the sync's snapshot transaction still exist, to maintain the snapshot's validity. This step has been optimized to avoid slowness in some rare situations. (#1162)
- Serial insertions have been made significantly faster with improvements to the storage engine. (Tokutek/ft-index#158)
- Update workloads on extremely large data sets (where a large portion of even internal nodes do not fit in main memory) have been optimized somewhat. (Tokutek/ft-index#226)
- Added per-connection control of document-level lock wait timeouts with a new command, `{setClientLockTimeout: <timeoutms>}`. This controls the same behavior as `--lockTimeout`, which can also be controlled server-wide with `setParameter` (which now controls the default for all new connections). The value should be specified in milliseconds. (#1168)
Bug fixes
- Cleaned up the way remote errors are handled in the distributed locking framework, which will reduce the impact of some rare errors on config servers. (#857, #1140)
- While a shard is draining, it will no longer be chosen to receive new chunks due to the sequential insert optimization, nor will it be chosen as the primary shard for new databases. (#1069)
- The `mongo2toku` tool now correctly uses the `admin` database for authentication to the target cluster by default. (#1125)
- Fixed the way packaging scripts check for transparent hugepages, and changed them to allow `madvise` as an acceptable setting. (#1130)
- Connections used to copy data for initial sync no longer time out improperly if some operations (such as checking for snapshot invalidation) take too long. (#1147)
TokuMX 1.4.3
General
- This release contains a bug fix for 1.4.2.
Bug fixes
- Fixed a rare memory accounting issue, caused by differently typed integers in indexed fields of BSON documents, which could lead to crashes. (Tokutek/ft-index#258)
TokuMX 1.5.0-rc.0
General
- This release is focused on the addition of a new collection type, partitioned collections, which provide excellent performance and flexibility for time-series workloads.
Highlights
- Partitioned collections are a new feature with a similar use case to capped collections in basic MongoDB, but with better concurrency and flexibility.
  Briefly, a partitioned collection is similar to a partitioned table in some RDBMSes. The collection is divided physically by primary key into multiple underlying partitions, allowing for the instantaneous and total deletion of entire partitions, but it is treated as one logical collection from the client's perspective.
  A typical use case for this would be to partition on a `{date: 1, _id: 1}` primary key, and periodically add new partitions and drop old ones to work with the latest data in a time-series data set. (#1063)
New features and improvements
- Further reduced the locking done during migrations, to reduce stalls. (#1041, #1135)
- Initial sync now reports its progress, in terms of the number of databases, collections within a database, and bytes copied, then indexes built for a collection, in the server log, in `db.currentOp()`, and in `rs.status()`. (#1050)
- Added a new option to allow operations that require a metadata write lock (such as adding an index) to interrupt operations that are holding a read lock. This option can be used to avoid those write lock requests stalling behind a long read-locked operation (such as a large `multi: true` update), and in turn stalling all other operations, including readers, behind itself.
  To use this option, your application must be ready to handle failed operations and retry them (transactional atomicity makes this somewhat simpler); then you can turn it on for all clients with `--setParameter forceWriteLocks=true` or with the `setParameter` command. This makes all future connections interruptible while they have read locks, but this state can be controlled for each connection with the `setWriteLockYielding` command. See #1132 for more details. (#1132)
- During initial sync, there is a step that checks that all collections that existed at the beginning of the sync's snapshot transaction still exist, to maintain the snapshot's validity. This step has been optimized to avoid slowness in some rare situations. (#1162)
- Serial insertions have been made significantly faster with improvements to the storage engine. (Tokutek/ft-index#158)
- Update workloads on extremely large data sets (where a large portion of even internal nodes do not fit in main memory) have been optimized somewhat. (Tokutek/ft-index#226)
- Added per-connection control of document-level lock wait timeouts with a new command, `{setClientLockTimeout: <timeoutms>}`. This controls the same behavior as `--lockTimeout`, which can also be controlled server-wide with `setParameter` (which now controls the default for all new connections). The value should be specified in milliseconds. (#1168)
Bug fixes
- Cleaned up the way remote errors are handled in the distributed locking framework, which will reduce the impact of some rare errors on config servers. (#857, #1140)
- While a shard is draining, it will no longer be chosen to receive new chunks due to the sequential insert optimization, nor will it be chosen as the primary shard for new databases. (#1069)
- The `mongo2toku` tool now correctly uses the `admin` database for authentication to the target cluster by default. (#1125)
- Fixed the way packaging scripts check for transparent hugepages, and changed them to allow `madvise` as an acceptable setting. (#1130)
- Connections used to copy data for initial sync no longer time out improperly if some operations (such as checking for snapshot invalidation) take too long. (#1147)