@@ -4,6 +4,123 @@ brevity. Much more detail can be found in the git revision history:
  https://github.com/jemalloc/jemalloc

+ * 5.1.0 (May 4th, 2018)
+
+ This release is primarily about fine-tuning, ranging from several new features
+ to numerous notable performance and portability enhancements. The release and
+ prior dev versions have been running in multiple large-scale applications for
+ months, and the cumulative improvements are substantial in many cases.
+
+ Given the long and successful production runs, this release is likely a good
+ upgrade candidate for applications on jemalloc 5.0 as well as earlier
+ versions. For performance-critical applications, the newly added TUNING.md
+ provides guidelines on jemalloc tuning.
+
+ New features:
+ - Implement transparent huge page support for internal metadata. (@interwq)
+ - Add opt.thp to allow enabling / disabling transparent huge pages for all
+ mappings. (@interwq)
+ - Add maximum background thread count option. (@djwatson)
+ - Allow prof_active to control opt.lg_prof_interval and prof.gdump.
+ (@interwq)
+ - Allow arena index lookup based on allocation addresses via mallctl.
+ (@lionkov)
+ - Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
+ - Add opt.lg_extent_max_active_fit to set the max ratio between the size of
+ the active extent selected (to split off from) and the size of the requested
+ allocation. (@interwq, @davidtgoldblatt)
+ - Add retain_grow_limit to set the max size when growing virtual address
+ space. (@interwq)
+ - Add mallctl interfaces:
+ + arena.<i>.retain_grow_limit (@interwq)
+ + arenas.lookup (@lionkov)
+ + max_background_threads (@djwatson)
+ + opt.lg_extent_max_active_fit (@interwq)
+ + opt.max_background_threads (@djwatson)
+ + opt.metadata_thp (@interwq)
+ + opt.thp (@interwq)
+ + stats.metadata_thp (@interwq)
+
+ Portability improvements:
+ - Support GNU/kFreeBSD configuration. (@paravoid)
+ - Support m68k, nios2 and SH3 architectures. (@paravoid)
+ - Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
+ - Fix symbol listing for cross-compiling. (@tamird)
+ - Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
+ - Disable the CPU_SPINWAIT macro for Power. (@davidtgoldblatt, @marxin)
+ - Fix MSVC 2015 & 2017 builds. (@rustyx)
+ - Improve RISC-V support. (@EdSchouten)
+ - Set name mangling script in strict mode. (@nicolov)
+ - Avoid MADV_HUGEPAGE on ARM. (@marxin)
+ - Modify configure to determine return value of strerror_r.
+ (@davidtgoldblatt, @cferris1000)
+ - Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
+ - Fix 32-bit build on MSVC. (@rustyx)
+ - Fix external symbol on MSVC. (@maksqwe)
+ - Avoid a printf format specifier warning. (@jasone)
+ - Add configure option --disable-initial-exec-tls which can allow jemalloc to
+ be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
+ - AArch64: Add ILP32 support. (@cmuellner)
+ - Add --with-lg-vaddr configure option to support cross compiling.
+ (@cmuellner, @davidtgoldblatt)
+
+ Optimizations and refactors:
+ - Improve active extent fit with extent_max_active_fit. This considerably
+ reduces fragmentation over time and improves virtual memory and metadata
+ usage. (@davidtgoldblatt, @interwq)
+ - Eagerly coalesce large extents to reduce fragmentation. (@interwq)
+ - sdallocx: only read size info when page aligned (i.e. possibly sampled),
+ which speeds up the sized deallocation path significantly. (@interwq)
+ - Avoid attempting new mappings for in place expansion with retain, since
+ it rarely succeeds in practice and causes high overhead. (@interwq)
+ - Refactor OOM handling in newImpl. (@wqfish)
+ - Add internal fine-grained logging functionality for debugging use.
+ (@davidtgoldblatt)
+ - Refactor arena / tcache interactions. (@davidtgoldblatt)
+ - Refactor extent management with dumpable flag. (@davidtgoldblatt)
+ - Add runtime detection of lazy purging. (@interwq)
+ - Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
+ - Use sysctl on startup in FreeBSD. (@trasz)
+ - Use thread local prng state instead of atomic. (@djwatson)
+ - Make decay always purge one more extent than before, because in
+ practice large extents are usually the ones that cross the decay threshold.
+ Purging the additional extent helps save memory as well as reduce VM
+ fragmentation. (@interwq)
+ - Fast division by dynamic values. (@davidtgoldblatt)
+ - Improve the fit for aligned allocation. (@interwq, @edwinsmith)
+ - Refactor extent_t bitpacking. (@rkmisra)
+ - Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
+ - Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
+ - Remove preserve_lru feature for extents management. (@djwatson)
+ - Consolidate two memory loads into one on the fast deallocation path.
+ (@davidtgoldblatt, @interwq)
+
+ Bug fixes (most of the issues are only relevant to jemalloc 5.0):
+ - Fix deadlock with multithreaded fork in OS X. (@davidtgoldblatt)
+ - Validate returned file descriptor before use. (@zonyitoo)
+ - Fix a few background thread initialization and shutdown issues. (@interwq)
+ - Fix an extent coalesce + decay race by taking both coalescing extents off
+ the LRU list. (@interwq)
+ - Fix a potentially unbounded increase during decay, caused by one thread
+ repeatedly stashing memory to purge while other threads generate new pages.
+ The number of pages to purge is now checked to prevent this. (@interwq)
+ - Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
+ - Handle 32 bit mutex counters. (@rkmisra)
+ - Fix an indexing bug when creating background threads. (@davidtgoldblatt,
+ @binliu19)
+ - Fix arguments passed to extent_init. (@yuleniwo, @interwq)
+ - Fix addresses used for ordering mutexes. (@rkmisra)
+ - Fix abort_conf processing during bootstrap. (@interwq)
+ - Fix include path order for out-of-tree builds. (@cmuellner)
+
+ Incompatible changes:
+ - Remove --disable-thp. (@interwq)
+ - Remove mallctl interfaces:
+ + config.thp (@interwq)
+
+ Documentation:
+ - Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)
+
  * 5.0.1 (July 1, 2017)

  This bugfix release fixes several issues, most of which are obscure enough
@@ -22,7 +139,7 @@ brevity. Much more detail can be found in the git revision history:
  unlikely to be an issue with other libc implementations. (@interwq)
  - Mask signals during background thread creation. This prevents signals from
  being inadvertently delivered to background threads. (@jasone,
- @davidgoldblatt , @interwq)
+ @davidtgoldblatt , @interwq)
  - Avoid inactivity checks within background threads, in order to prevent
  recursive mutex acquisition. (@interwq)
  - Fix extent_grow_retained() to use the specified hooks when the
@@ -515,7 +632,7 @@ brevity. Much more detail can be found in the git revision history:
  these fixes, xallocx() now tries harder to partially fulfill requests for
  optional extra space. Note that a couple of minor heap profiling
  optimizations are included, but these are better thought of as performance
- fixes that were integral to disovering most of the other bugs.
+ fixes that were integral to discovering most of the other bugs.

  Optimizations:
  - Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
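
Note on the new 5.1.0 options: the opt.* entries listed under "New features" (opt.thp, opt.metadata_thp, opt.max_background_threads) are startup options. One way an application might set them at build time is through the malloc_conf global string that jemalloc reads during initialization. The fragment below is only a sketch: the chosen values are illustrative assumptions rather than recommendations from the changelog, and the symbol name carries a prefix if jemalloc was configured with a function prefix.

```c
/*
 * Sketch: compile-time jemalloc configuration via the malloc_conf symbol.
 * Option names come from the 5.1.0 notes; the values are illustrative
 * assumptions and should be tuned per application (see TUNING.md).
 */
const char *malloc_conf =
    "thp:always,"               /* opt.thp: transparent huge pages for all mappings */
    "metadata_thp:auto,"        /* opt.metadata_thp: huge pages for internal metadata */
    "background_thread:true,"   /* enable background purging threads */
    "max_background_threads:4"; /* opt.max_background_threads: cap the thread count */
```

The same comma-separated string can instead be supplied through the MALLOC_CONF environment variable, which avoids rebuilding the application while experimenting.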
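
Note on "Fast division by dynamic values": the changelog entry does not describe the method, so the following is a sketch of one well-known technique rather than a copy of jemalloc's code. It replaces a runtime divide with a multiply and a shift by caching ceil(2^32 / d) for a divisor d that is only known at run time. It assumes 32-bit dividends that are exact multiples of d (with d >= 2); the names fast_div_t, fast_div_init and fast_div_compute are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: caches the multiplier for one divisor. */
typedef struct {
	uint32_t magic; /* ceil(2^32 / d) */
} fast_div_t;

/* Precompute the multiplier once per divisor. Requires d >= 2. */
static void
fast_div_init(fast_div_t *fd, uint32_t d) {
	assert(d >= 2);
	uint64_t two_to_32 = (uint64_t)1 << 32;
	uint32_t magic = (uint32_t)(two_to_32 / d);
	/* Integer division floors; round up when d does not divide 2^32. */
	if (two_to_32 % d != 0) {
		magic++;
	}
	fd->magic = magic;
}

/*
 * Returns n / d for the d passed to fast_div_init().
 * Exact only when n is a multiple of d: with n = q * d and
 * magic = (2^32 + r) / d for some 0 <= r < d, the product is
 * q * 2^32 + q * r, and q * r < n < 2^32, so the shift yields q.
 */
static uint32_t
fast_div_compute(const fast_div_t *fd, uint32_t n) {
	return (uint32_t)(((uint64_t)n * fd->magic) >> 32);
}
```

A caller would run fast_div_init() once per divisor (for example, once per size class) and then use fast_div_compute() on the hot path, for instance to turn a byte offset within a run of equally sized objects into an object index.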