-
Notifications
You must be signed in to change notification settings - Fork 34
Description
Summary
The content-type package is a critical hot path component that runs on every HTTP request with a Content-Type header in Express, body-parser, and countless other frameworks. Even microsecond-level improvements can have significant impact at scale.
I've created an optimized fork that achieved 51.3% faster parsing (https://github.com/jdmiranda/content-type), and I'd like to contribute additional optimization ideas back to the original repository.
Critical Context
- Usage: Runs on EVERY HTTP request in Express middleware chain
- Dependents: 1,843 packages depend on this (npm registry)
- Scale Impact: A 10μs improvement = significant savings for high-traffic applications
- Already Implemented (in fork):
- LRU cache for parsed results (100 entries)
- Fast path for 8 common content types
- Optimized parameter parsing with escape detection
Additional Optimization Opportunities
1. Object Pool for ContentType Instances
Current State: Every parse() call creates a new ContentType object with new ContentType(type)
Opportunity: Reuse pre-allocated ContentType objects to reduce GC pressure
// Object pool implementation
var POOL_SIZE = 50
var objectPool = []
function getPooledContentType(type) {
var obj = objectPool.pop()
if (!obj) {
obj = new ContentType(type)
} else {
obj.type = type
// Clear parameters object
obj.parameters = Object.create(null)
}
return obj
}
function releaseToPool(obj) {
if (objectPool.length < POOL_SIZE) {
objectPool.push(obj)
}
}Impact: 5-10% reduction in GC overhead for high-request-rate servers
2. Optimized String Building in format()
Current State: String concatenation using += in a loop
var string = type
// ...
string += '; ' + param + '=' + qstring(parameters[param])Opportunity: Use array joining for better performance with multiple parameters
function format(obj) {
// ... validation code ...
if (!parameters || typeof parameters !== 'object') {
return type
}
var params = Object.keys(parameters).sort()
if (params.length === 0) {
return type
}
// Pre-allocate array with known size
var parts = new Array(params.length + 1)
parts[0] = type
for (var i = 0; i < params.length; i++) {
var param = params[i]
if (!TOKEN_REGEXP.test(param)) {
throw new TypeError('invalid parameter name')
}
parts[i + 1] = '; ' + param + '=' + qstring(parameters[param])
}
return parts.join('')
}Impact: 10-15% faster for content types with 2+ parameters
3. Lazy Parameter Parsing
Current State: All parameters are parsed immediately, even if the caller only needs the type
Opportunity: Defer parameter parsing until accessed
function ContentType(type) {
this.type = type
this._parameters = null
this._rawHeader = null
this._paramsParsed = false
}
Object.defineProperty(ContentType.prototype, 'parameters', {
get: function() {
if (!this._paramsParsed) {
this._parseParameters()
}
return this._parameters
},
set: function(val) {
this._parameters = val
this._paramsParsed = true
}
})Use Case: Many consumers only need the media type, not parameters:
var ct = contentType.parse(req)
if (ct.type === 'application/json') {
// No parameters needed - saved parsing time
}Impact: 20-30% faster for use cases that don't access parameters
4. Specialized Fast Paths for Common Operations
Current State: Generic parsing for all inputs
Opportunity: Detect and optimize common patterns
// Fast path for "type only" (no parameters)
function parse(string) {
// ... existing validation ...
// Fast path: no semicolon means no parameters
if (header.indexOf(';') === -1) {
var trimmed = header.trim()
if (!TYPE_REGEXP.test(trimmed)) {
throw new TypeError('invalid media type')
}
return new ContentType(trimmed.toLowerCase())
}
// ... rest of parsing ...
}Additional fast path - charset-only:
// Optimize for common pattern: "type; charset=value"
if (header.indexOf('charset=') !== -1 &&
header.indexOf(';', header.indexOf('charset=')) === -1) {
// Only one parameter and it's charset - optimize
return parseCharsetOnly(header)
}Impact: 15-20% faster for simple content types
5. Avoid Redundant toLowerCase() Calls
Current State: Multiple toLowerCase() calls in parse flow
var obj = new ContentType(type.toLowerCase())
// ...
key = match[1].toLowerCase()Opportunity: Pre-lowercase the entire header once
function parse(string) {
// ... validation ...
// Lowercase once at the start
var lowerHeader = header.toLowerCase()
var index = lowerHeader.indexOf(';')
var type = index !== -1
? lowerHeader.slice(0, index).trim()
: lowerHeader.trim()
// ... rest uses lowerHeader, no more toLowerCase() calls needed ...
}Impact: 3-5% improvement by eliminating redundant string operations
Performance Impact Summary
| Optimization | Estimated Improvement | Complexity |
|---|---|---|
| Object pooling | 5-10% (GC reduction) | Medium |
| Optimized format() | 10-15% (multi-param) | Low |
| Lazy parameter parsing | 20-30% (type-only access) | Medium |
| Specialized fast paths | 15-20% | Low |
| Single toLowerCase() | 3-5% | Low |
Combined potential: 40-60% additional improvement on top of existing optimizations
Benchmark Methodology
For reference, here's a benchmark template for testing improvements:
const Benchmark = require('benchmark')
const contentType = require('content-type')
const suite = new Benchmark.Suite()
// Test cases covering common scenarios
const testCases = [
'application/json',
'application/json; charset=utf-8',
'text/html; charset=utf-8',
'multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW',
'application/xml; charset=iso-8859-1; version=1.0'
]
testCases.forEach(test => {
suite.add(`parse: ${test}`, () => {
contentType.parse(test)
})
})
suite
.on('cycle', (event) => console.log(String(event.target)))
.on('complete', function() {
console.log('Fastest is ' + this.filter('fastest').map('name'))
})
.run()Offer to Help
I'm happy to:
- Create a PR implementing any/all of these optimizations
- Provide comprehensive benchmarks showing before/after performance
- Ensure 100% backward compatibility with existing test suite
- Add additional test coverage if needed
This package is a critical piece of Node.js infrastructure, and I believe these optimizations could benefit the entire ecosystem. Please let me know which optimizations you'd like me to prioritize!
Related Work
- Optimized fork: https://github.com/jdmiranda/content-type
- Performance report: https://github.com/jdmiranda/content-type/blob/master/PERFORMANCE_REPORT.md
Thank you for maintaining this essential package!