Commit ce486b5

Author: Paul Dagnelie (committed)
FDT dedup log sync -- remove incremental
This PR condenses the FDT dedup log syncing into a single sync pass. This reduces the overhead of modifying indirect blocks for the dedup table multiple times per txg. In addition, changes were made to the formula for how much to sync per txg: we now also consider the backlog we have to clear, to prevent it from growing too large, or remaining large on an idle system.

Sponsored-by: Klara, Inc.
Sponsored-by: iXsystems, Inc.
Authored-by: Don Brady <[email protected]>
Authored-by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
1 parent 2cccbac commit ce486b5

File tree

11 files changed: +351 −198 lines


include/sys/ddt.h

Lines changed: 2 additions & 5 deletions
@@ -285,14 +285,11 @@ typedef struct {
 	ddt_log_t *ddt_log_active;	/* pointers into ddt_log */
 	ddt_log_t *ddt_log_flushing;	/* swapped when flush starts */
 
-	hrtime_t ddt_flush_start;	/* log flush start this txg */
-	uint32_t ddt_flush_pass;	/* log flush pass this txg */
-
-	int32_t ddt_flush_count;	/* entries flushed this txg */
-	int32_t ddt_flush_min;		/* min rem entries to flush */
 	int32_t ddt_log_ingest_rate;	/* rolling log ingest rate */
 	int32_t ddt_log_flush_rate;	/* rolling log flush rate */
 	int32_t ddt_log_flush_time_rate; /* avg time spent flushing */
+	uint32_t ddt_log_flush_pressure; /* pressure to apply for cap */
+	uint32_t ddt_log_flush_prev_backlog; /* prev backlog size */
 
 	uint64_t ddt_flush_force_txg;	/* flush hard before this txg */

include/sys/vdev.h

Lines changed: 1 addition & 0 deletions
@@ -171,6 +171,7 @@ extern void vdev_queue_change_io_priority(zio_t *zio, zio_priority_t priority);
 extern uint32_t vdev_queue_length(vdev_t *vd);
 extern uint64_t vdev_queue_last_offset(vdev_t *vd);
 extern uint64_t vdev_queue_class_length(vdev_t *vq, zio_priority_t p);
+extern boolean_t vdev_queue_pool_busy(spa_t *spa);
 
 extern void vdev_config_dirty(vdev_t *vd);
 extern void vdev_config_clean(vdev_t *vd);

include/sys/zfs_debug.h

Lines changed: 1 addition & 0 deletions
@@ -59,6 +59,7 @@ extern int zfs_dbgmsg_enable;
 #define	ZFS_DEBUG_METASLAB_ALLOC	(1 << 13)
 #define	ZFS_DEBUG_BRT			(1 << 14)
 #define	ZFS_DEBUG_RAIDZ_RECONSTRUCT	(1 << 15)
+#define	ZFS_DEBUG_DDT			(1 << 16)
 
 extern void __set_error(const char *file, const char *func, int line, int err);
 extern void __zfs_dbgmsg(char *buf);

man/man4/zfs.4

Lines changed: 43 additions & 28 deletions
@@ -1026,27 +1026,6 @@ milliseconds until the operation completes.
 .It Sy zfs_dedup_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
 Enable prefetching dedup-ed blocks which are going to be freed.
 .
-.It Sy zfs_dedup_log_flush_passes_max Ns = Ns Sy 8 Ns Pq uint
-Maximum number of dedup log flush passes (iterations) each transaction.
-.Pp
-At the start of each transaction, OpenZFS will estimate how many entries it
-needs to flush out to keep up with the change rate, taking the amount and time
-taken to flush on previous txgs into account (see
-.Sy zfs_dedup_log_flush_flow_rate_txgs ) .
-It will spread this amount into a number of passes.
-At each pass, it will use the amount already flushed and the total time taken
-by flushing and by other IO to recompute how much it should do for the remainder
-of the txg.
-.Pp
-Reducing the max number of passes will make flushing more aggressive, flushing
-out more entries on each pass.
-This can be faster, but also more likely to compete with other IO.
-Increasing the max number of passes will put fewer entries onto each pass,
-keeping the overhead of dedup changes to a minimum but possibly causing a large
-number of changes to be dumped on the last pass, which can blow out the txg
-sync time beyond
-.Sy zfs_txg_timeout .
-.
 .It Sy zfs_dedup_log_flush_min_time_ms Ns = Ns Sy 1000 Ns Pq uint
 Minimum time to spend on dedup log flush each transaction.
 .Pp
@@ -1056,22 +1035,58 @@ up to
 This occurs even if doing so would delay the transaction, that is, other IO
 completes under this time.
 .
-.It Sy zfs_dedup_log_flush_entries_min Ns = Ns Sy 1000 Ns Pq uint
+.It Sy zfs_dedup_log_flush_entries_min Ns = Ns Sy 100 Ns Pq uint
 Flush at least this many entries each transaction.
 .Pp
-OpenZFS will estimate how many entries it needs to flush each transaction to
-keep up with the ingest rate (see
-.Sy zfs_dedup_log_flush_flow_rate_txgs ) .
-This sets the minimum for that estimate.
+OpenZFS will flush a fraction of the log every TXG, to keep the size
+proportional to the ingest rate (see
+.Sy zfs_dedup_log_flush_txgs ) .
+This sets the minimum for that estimate, which prevents the backlog from
+never completely draining if the ingest rate falls.
 Raising it can force OpenZFS to flush more aggressively, keeping the log small
 and so reducing pool import times, but can make it less able to back off if
 log flushing would compete with other IO too much.
 .
+.It Sy zfs_dedup_log_flush_entries_max Ns = Ns Sy UINT_MAX Ns Pq uint
+Flush at most this many entries each transaction.
+.Pp
+Mostly used for debugging purposes.
+.It Sy zfs_dedup_log_flush_txgs Ns = Ns Sy 100 Ns Pq uint
+Target number of TXGs over which to process the whole dedup log.
+.Pp
+Every TXG, OpenZFS will process the inverse of this number times the size
+of the DDT backlog.
+This will keep the backlog at a size roughly equal to the ingest rate
+times this value.
+This offers a balance between a more efficient DDT log, with better
+aggregation, and shorter import times, which increase as the size of the
+DDT log increases.
+Increasing this value will result in a more efficient DDT log, but longer
+import times.
+.It Sy zfs_dedup_log_cap Ns = Ns Sy UINT_MAX Ns Pq uint
+Soft cap for the size of the current dedup log.
+.Pp
+If the log is larger than this size, we increase the aggressiveness of
+the flushing to try to bring it back down to the soft cap.
+Setting it will reduce import times, but will reduce the efficiency of
+the DDT log, increasing the expected number of IOs required to flush the same
+amount of data.
+.It Sy zfs_dedup_log_hard_cap Ns = Ns Sy 0 Ns | Ns 1 Pq int
+Whether to treat the log cap as a hard cap or not.
+.Pp
+The default is 0.
+If this is set to 1, the
+.Sy zfs_dedup_log_cap
+acts more like a hard cap than a soft cap.
+When set to 0, the soft cap will increase the maximum number of log entries
+we flush in a given txg.
+When set to 1, it will also increase the minimum number of log entries we
+flush.
+Enabling it will reduce worst-case import times, at the cost of increased TXG
+sync times.
 .It Sy zfs_dedup_log_flush_flow_rate_txgs Ns = Ns Sy 10 Ns Pq uint
 Number of transactions to use to compute the flow rate.
 .Pp
-OpenZFS will estimate how many entries it needs to flush each transaction by
-monitoring the number of entries changed (ingest rate), number of entries
+OpenZFS will estimate the number of entries changed (ingest rate), number of entries
 flushed (flush rate) and time spent flushing (flush time rate) and combining
 these into an overall "flow rate".
 It will use an exponential weighted moving average over some number of recent
