Socket exhaustion in multi-device HTTP polling scenarios #12

@PGTBoos

Summary

HTTPClient can experience socket exhaustion when polling multiple devices over extended periods (6+ hours), resulting in HTTPC_ERROR_CONNECTION_REFUSED (-1) errors. This affects IoT and home automation systems managing 5+ HTTP endpoints.

Environment

  • Board: ESP32 (all variants)
  • Arduino-ESP32: Latest (tested on 2.x and 3.x)
  • HTTPClient version: Current main branch
  • Scenario: 8 devices polled every 15 seconds over 24+ hours

Problem Analysis

Code Review of HTTPClient.cpp

Looking at the actual implementation:

Line 1109 (HTTPClient::connect()):

if (!_client->connect(_host.c_str(), _port, _connectTimeout)) {
    log_d("failed connect to %s:%u", _host.c_str(), _port);
    return false;
}

Lines 367-392 (HTTPClient::disconnect()):

void HTTPClient::disconnect(bool preserveClient) {
    if (connected()) {
        if (_reuse && _canReuse) {
            log_d("tcp keep open for reuse");
        } else {
            log_d("tcp stop");
            _client->stop();  // ← Socket closed here
            // ... client set to nullptr ...
        }
    }
}

Lines 358-361 (HTTPClient::end()):

void HTTPClient::end(void) {
    disconnect(false);
    clear();
    // ← Returns immediately, no delay for socket cleanup
}

The Problem

The current implementation correctly calls _client->stop() but returns control immediately. On ESP32, a closed TCP connection does not vanish at once: it still has to pass through the usual closing states (FIN_WAIT, TIME_WAIT, etc.). When applications immediately create new connections:

  1. The old connection is still in TIME_WAIT (roughly 30-120 seconds; in lwIP this lasts twice the configured TCP_MSL, so it depends on the stack configuration rather than on the network)
  2. The new connection request creates a new socket
  3. Over hours, the socket pool is exhausted (ESP32 default: ~10 sockets via MEMP_NUM_NETCONN)

This is exacerbated when:

  • Using aggressive setConnectTimeout() values (< 1000ms)
  • Polling multiple devices at high frequency (< 15 seconds)
  • Connection failures leave sockets in inconsistent states

User-Side Symptoms

Initial boot: All devices connect fine
After 6-8 hours: 
  P1 > HTTP code: -1  (HTTPC_ERROR_CONNECTION_REFUSED)
  Socket 1 > HTTP error -1
  Socket 2 > HTTP error -1
  [... cascade failure of all HTTP requests]
  
After ESP32 reboot: Everything works again

Root Causes

1. No Socket Cleanup Delay in end()

Current code:

void HTTPClient::end(void) {
    disconnect(false);
    clear();
    // Immediately returns - socket may still be closing
}

ESP32's lwIP stack needs time to fully close sockets. Without a delay, rapid reconnections can exhaust the pool.

2. Aggressive setConnectTimeout() Creates Half-Open Sockets

When users set very short timeouts:

http.setConnectTimeout(300);  // 300ms - too aggressive for WiFi

Connections that time out during the TCP three-way handshake can leave sockets in the SYN_SENT state, where they keep consuming resources.

3. Documentation Doesn't Warn About Multi-Device Patterns

The README and examples don't address:

  • Socket pool limits on ESP32
  • Best practices for polling multiple endpoints
  • Recommended intervals to prevent exhaustion

Proposed Solutions

Solution 1: Add Cleanup Delay to end() (Minimal Impact)

File: libraries/HTTPClient/src/HTTPClient.cpp

void HTTPClient::end(void) {
    disconnect(false);
    clear();
    
    // Give lwIP time to process socket closure
    // Prevents socket exhaustion in multi-device polling scenarios
    // Impact: ~50ms delay per request (negligible for most applications)
    delay(50);
}

Pros:

  • ✅ Fixes the root cause
  • ✅ Minimal performance impact (50ms)
  • ✅ Transparent to users
  • ✅ Prevents gradual socket exhaustion

Cons:

  • ❌ Adds fixed delay to all HTTPClient usage
  • ❌ May not be appropriate for time-critical applications

Alternative: Make it configurable:

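// HTTPClient.h (proposed additions only, existing members omitted)
class HTTPClient {
public:
    // Delay applied at the end of end(). Default: 50ms, 0 disables it.
    void setCleanupDelay(uint16_t delayMs) { _cleanupDelay = delayMs; }
private:
    uint16_t _cleanupDelay = 50;
};

// HTTPClient.cpp
void HTTPClient::end(void) {
    disconnect(false);
    clear();
    if (_cleanupDelay > 0) {
        delay(_cleanupDelay);  // Give lwIP time to process the socket closure
    }
}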
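A blocking delay() inside end() stalls the caller for the whole interval, which is the concern for time-critical applications. As a rough sketch of a non-blocking alternative in user code, the time of the last close can be recorded and the next request held back until the quiet period has passed. CleanupGate and its methods are hypothetical names, not part of HTTPClient; timestamps are passed in so the class can be fed from millis().

```cpp
#include <cstdint>

// Hypothetical helper, not part of HTTPClient: remembers when the last
// connection was closed so the caller can wait without blocking in delay().
class CleanupGate {
public:
    explicit CleanupGate(uint32_t quietMs) : _quietMs(quietMs) {}

    // Call right after http.end(), passing the current millis() value.
    void markClosed(uint32_t nowMs) {
        _lastCloseMs = nowMs;
        _closedOnce = true;
    }

    // True once at least quietMs have elapsed since the last close.
    // Unsigned subtraction keeps this correct across millis() rollover.
    bool ready(uint32_t nowMs) const {
        if (!_closedOnce) return true;
        return static_cast<uint32_t>(nowMs - _lastCloseMs) >= _quietMs;
    }

private:
    uint32_t _quietMs;
    uint32_t _lastCloseMs = 0;
    bool _closedOnce = false;
};
```

In a polling loop this would be used as `if (gate.ready(millis())) { pollDevice(i); gate.markClosed(millis()); }`, so other work can continue while the stack finishes closing the previous connection.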

Solution 2: Warn About Aggressive Timeouts (Documentation)

File: libraries/HTTPClient/README.md

Add warning about setConnectTimeout():

### Important: Connection Timeout Considerations

The `setConnectTimeout()` method sets the TCP connection timeout. 

**⚠️ WARNING:** Values below 1000ms can cause socket exhaustion over time, especially
when polling multiple devices. Failed connections may leave sockets in inconsistent
states that aren't properly cleaned up.

**Recommended values:**
- WiFi networks: 3000-5000ms
- Ethernet: 2000-3000ms
- Unreliable networks: 5000-10000ms

**Avoid:** Values < 1000ms unless you have specific timing requirements and understand
the implications for socket pool management.
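Until such a warning is documented, a project can enforce the floor itself. The sketch below clamps a requested connect timeout to the 1000ms threshold named in this issue; safeConnectTimeout and kMinConnectTimeoutMs are hypothetical names, and the floor is this issue's recommendation, not a library constant.

```cpp
#include <cstdint>

// Hypothetical guard: the floor is the 1000 ms threshold suggested in this
// issue, not a value defined by HTTPClient.
constexpr int32_t kMinConnectTimeoutMs = 1000;

// Returns the requested timeout, raised to the floor if it is too small.
constexpr int32_t safeConnectTimeout(int32_t requestedMs) {
    return requestedMs < kMinConnectTimeoutMs ? kMinConnectTimeoutMs : requestedMs;
}
```

Usage would look like `http.setConnectTimeout(safeConnectTimeout(300));`, which passes 1000 instead of 300.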

Solution 3: Add Multi-Device Example (Best Practices)

File: libraries/HTTPClient/examples/MultiDevicePolling/MultiDevicePolling.ino

Create example demonstrating:

  • Proper polling intervals (15-30 seconds)
  • Manual cleanup delays if not added to library
  • Staggered device initialization
  • Error handling and backoff strategies

See attached example code below.
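The backoff strategy mentioned above could be sketched as follows: a device that keeps failing is polled less often, so repeated failed connects do not add to socket churn. backoffIntervalMs is a hypothetical helper, and the doubling policy and cap are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: interval before the next poll of a device that has
// failed `consecutiveFailures` times in a row. Doubles per failure, capped
// at maxMs.
uint32_t backoffIntervalMs(uint32_t baseMs, uint32_t maxMs, uint8_t consecutiveFailures) {
    // Limit the shift so the 64-bit intermediate value cannot overflow.
    const uint8_t shift = std::min<uint8_t>(consecutiveFailures, 16);
    const uint64_t scaled = static_cast<uint64_t>(baseMs) << shift;
    return static_cast<uint32_t>(std::min<uint64_t>(scaled, maxMs));
}
```

In the polling loop, the comparison against POLL_INTERVAL would use `backoffIntervalMs(POLL_INTERVAL, 300000, failures[i])` instead, with a per-device `failures[i]` counter (also hypothetical) that is incremented on each error and reset to 0 on HTTP_CODE_OK.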

Solution 4: Add Socket Pool Diagnostic (Developer Tool)

Optional enhancement to help developers debug:

class HTTPClient {
public:
    static int getActiveSockets();  // Debug helper
};

This would require cooperation with the NetworkClient layer, but could help developers identify exhaustion before it becomes critical.
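Until something like this exists in the library, a rough user-side probe is possible, assuming the POSIX-style socket API that lwIP exposes on ESP32 (header names may differ between core versions). The sketch opens sockets until allocation fails, counts them, and closes them again. probeFreeSockets is a hypothetical name. It reports free socket descriptors only, so connections lingering inside lwIP in TIME_WAIT may not show up in the count.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical debug helper, not part of HTTPClient: counts how many TCP
// sockets can be opened right now (up to `limit`), then closes them again.
int probeFreeSockets(int limit) {
    const int kMax = 32;
    int fds[kMax];
    if (limit > kMax) limit = kMax;

    int opened = 0;
    while (opened < limit) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) break;  // no descriptor available (or another error)
        fds[opened++] = fd;
    }

    // Release everything that was opened so the probe itself leaks nothing.
    for (int i = 0; i < opened; i++) {
        close(fds[i]);
    }
    return opened;
}
```

Logging this value every few minutes alongside the HTTP error count would show whether the number of free descriptors actually shrinks before the -1 errors start.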

Recommended Implementation Priority

  1. High Priority: Solution 2 (Documentation) - Immediate, no code changes
  2. High Priority: Solution 3 (Example) - Helps developers avoid the problem
  3. Medium Priority: Solution 1 (Cleanup delay) - Fixes root cause but needs careful consideration
  4. Low Priority: Solution 4 (Diagnostics) - Nice-to-have for advanced users

Evidence / Test Results

Before Fixes (User Code Only)

Uptime: 6-8 hours before failure
Symptoms: Cascade HTTP -1 errors
Socket pool: Exhausted
Recovery: Requires ESP32 reboot

After Fixes (User Code + Manual Delays)

Uptime: 24+ hours stable
Active HTTP requests: ~23,000 over 24h
Socket pool: 3-4/10 in use (sustainable)
Errors: Zero socket-related failures
RAM: Stable at 164KB

Key change in user code:

http.end();
client.stop();
delay(100);  // Manual cleanup delay

In this setup, adding the cleanup delay eliminated the failures, which strongly suggests that a socket cleanup delay addresses the issue.

Multi-Device Polling Example

/**
 * MultiDevicePolling.ino
 * 
 * Demonstrates reliable HTTP polling of multiple devices over extended periods.
 * Prevents socket exhaustion through proper cleanup and timing patterns.
 * 
 * Tested stable for 24+ hours with 8 devices.
 */

#include <WiFi.h>
#include <HTTPClient.h>

const char* ssid = "your-ssid";
const char* password = "your-password";

// Configuration
const int NUM_DEVICES = 8;
const unsigned long POLL_INTERVAL = 15000;  // 15 seconds
const uint16_t HTTP_TIMEOUT = 5000;         // 5 seconds

// Device URLs
const char* deviceURLs[NUM_DEVICES] = {
  "http://192.168.1.101/api/status",
  "http://192.168.1.102/api/status",
  "http://192.168.1.103/api/status",
  "http://192.168.1.104/api/status",
  "http://192.168.1.105/api/status",
  "http://192.168.1.106/api/status",
  "http://192.168.1.107/api/status",
  "http://192.168.1.108/api/status"
};

unsigned long lastPoll[NUM_DEVICES] = {0};

void setup() {
  Serial.begin(115200);
  
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("\nWiFi Connected!");
  
  // Stagger initial polls to prevent simultaneous requests
  for (int i = 0; i < NUM_DEVICES; i++) {
    lastPoll[i] = millis() - POLL_INTERVAL + (i * 2000);
  }
}

void loop() {
  unsigned long now = millis();
  
  for (int i = 0; i < NUM_DEVICES; i++) {
    if (now - lastPoll[i] >= POLL_INTERVAL) {
      pollDevice(i);
      lastPoll[i] = now;
    }
  }
  
  delay(10);  // Prevent tight loop
}

void pollDevice(int index) {
  // IMPORTANT: Use LOCAL instances per request
  // (NetworkClient is the Arduino-ESP32 3.x name; on 2.x use WiFiClient)
  NetworkClient client;
  HTTPClient http;
  
  // Set reasonable timeouts
  http.setTimeout(HTTP_TIMEOUT);
  // DON'T use aggressive setConnectTimeout() - causes socket issues
  
  http.setReuse(false);  // Disable keep-alive for simpler cleanup
  
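  // begin() only parses the URL and stores the client;
  // the TCP connection itself is opened later by GET()
  if (!http.begin(client, deviceURLs[index])) {
    Serial.printf("Device %d: begin() failed (invalid URL?)\n", index);
    client.stop();
    delay(100);  // Socket cleanup - CRITICAL for preventing exhaustion
    return;
  }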
  
  int httpCode = http.GET();
  
  if (httpCode == HTTP_CODE_OK) {
    String payload = http.getString();
    Serial.printf("Device %d: %s\n", index, payload.c_str());
  } else {
    Serial.printf("Device %d: HTTP error %d\n", index, httpCode);
  }
  
  // Proper cleanup sequence
  http.end();
  client.stop();
  
  // CRITICAL: Allow TCP socket to fully close
  // Without this delay, rapid polling exhausts ESP32's socket pool (default: ~10 sockets)
  // This delay can be removed if HTTPClient::end() is enhanced to include it
  delay(100);
}

Impact

This issue affects:

  • IoT monitoring systems (polling sensors/devices)
  • Home automation (managing smart home devices)
  • Industrial applications (equipment monitoring)
  • Any ESP32 project polling 5+ HTTP endpoints over extended periods

Proper documentation and/or code fixes would prevent the common pattern of:
"Works great for hours, then mysteriously fails, requires reboot"

Thank you for maintaining this excellent library! The goal is to help other developers avoid the debugging journey we went through. 🙏
