
Pistache very poor performance when POST size exceeds Endpoint::options().maxRequestSize(), and maxRequestSize is large. #1190

Open · dennisjenkins75 opened this issue Jan 21, 2024 · 4 comments

@dennisjenkins75 (Collaborator)

TL;DR

If one creates a pistacheio Endpoint with a large maxRequestSize (in the MiB range), and then sends an HTTP request with a request body larger than this size, Pistache takes a VERY LONG TIME to return an HTTP 413 error, and during that time it is burning 100% of a CPU core.

I do not have a simple repro to post yet. The following was obtained and tuned via trial and error. My own system has a maxRequestSize of 4 MiB, b/c my typical max realistic payload is ~1 MiB. However, one of my data sources generates a legit 66 MiB payload, and my pistache server process appeared to hang. Upon further debugging, I determined that it's not hung, just acting very inefficiently. The following is my rambling attempt to get to the bottom of the issue.

NOTE: This issue does not successfully repro with the default example rest_server that ships with pistache, because that binary keeps the default 4 KiB maxRequestSize.

Long Version

Under these circumstances, this is a potential DoS attack against pistache.

I've built pistache from source (2023-12-05) and installed it into /usr/local:

~/github/pistacheio/pistache $ git rev-parse HEAD
31bc54dc1d8cbaeafa92c9c9b99519585205f77a

In my pistache app, I construct the Http::Endpoint::options() with .maxRequestSize(4 * 1024 * 1024) (4 MiB).
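For reference, the setup is essentially this trimmed sketch (the port is a placeholder and the real app installs its own handler; the option calls are the same ones shown in the stand-alone repro further down this thread):

#include <pistache/endpoint.h>
#include <pistache/http.h>

int main() {
    using namespace Pistache;

    // 4 MiB cap on request bodies, same option chain as the full repro below.
    auto opts = Http::Endpoint::options()
                    .threads(1)
                    .maxRequestSize(4 * 1024 * 1024);

    Http::Endpoint endpoint(Address(Ipv4::any(), Port(9080))); // placeholder port
    endpoint.init(opts);
    // ... setHandler(...) and serve() as in the full repro below ...
}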

If I attempt to POST (via libcurl) a 6 MiB payload, pistache returns HTTP 413 in just a few seconds. If I increase that payload to 66 MiB, pistache appears to hang for a long time (it will return after many minutes with HTTP 413). While "hung", its stack looks like this:

(gdb) thread 15
[Switching to thread 15 (Thread 0x7f73597fa6c0 (LWP 2349455))]
#0  0x00007f736761a230 in Pistache::StreamCursor::advance(unsigned long) () from /usr/local/lib64/libpistache.so.0
(gdb) bt
#0  0x00007f736761a230 in Pistache::StreamCursor::advance(unsigned long) () from /usr/local/lib64/libpistache.so.0
#1  0x00007f736761a635 in Pistache::match_until(std::initializer_list<char>, Pistache::StreamCursor&, Pistache::CaseSensitivity) () from /usr/local/lib64/libpistache.so.0
#2  0x00007f736761a6dd in Pistache::match_until(char, Pistache::StreamCursor&, Pistache::CaseSensitivity) () from /usr/local/lib64/libpistache.so.0
#3  0x00007f73675e8869 in Pistache::Http::Private::RequestLineStep::apply(Pistache::StreamCursor&) () from /usr/local/lib64/libpistache.so.0
#4  0x00007f73675e2033 in Pistache::Http::Private::ParserBase::parse() () from /usr/local/lib64/libpistache.so.0
#5  0x00007f73675ea9d5 in Pistache::Http::Handler::onInput(char const*, unsigned long, std::shared_ptr<Pistache::Tcp::Peer> const&) ()
   from /usr/local/lib64/libpistache.so.0
#6  0x00007f736761c0fb in Pistache::Tcp::Transport::handleIncoming(std::shared_ptr<Pistache::Tcp::Peer> const&) () from /usr/local/lib64/libpistache.so.0
#7  0x00007f736762191d in Pistache::Tcp::Transport::onReady(Pistache::Aio::FdSet const&) () from /usr/local/lib64/libpistache.so.0
#8  0x00007f7367619713 in Pistache::Aio::SyncImpl::runOnce() () from /usr/local/lib64/libpistache.so.0
#9  0x00007f7367616fea in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Pistache::Aio::AsyncImpl::Worker::run()::{lambda()#1}> > >::_M_run() ()
   from /usr/local/lib64/libpistache.so.0
#10 0x00007f7366c4966f in std::execute_native_thread_routine (__p=0x7f735c0017d0)
    at /var/tmp/portage/sys-devel/gcc-13.2.1_p20230826/work/gcc-13-20230826/libstdc++-v3/src/c++11/thread.cc:104
#11 0x00007f736692a308 in start_thread () from /lib64/libc.so.6
#12 0x00007f736699aa68 in clone3 () from /lib64/libc.so.6

The amount of time (on my system anyway) that it takes for pistache to "handle" the request and return HTTP 413 is not linear with respect to the size of the request:

request size (MiB)   time (seconds)
4.0001                 0.7
5                      1.7
6                      4.7
7                      9.8
8                     16.2
16                    47.8
24                    80.3
32                   113.9
40                   144.9

OK, so it starts out roughly quadratic, then asymptotically approaches linear.
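One hypothesis that would produce this kind of super-linear growth (an assumption on my part; I have not confirmed it against the parser source): if every newly received chunk of size B causes the parser to re-scan the entire buffered prefix from the start, then after N chunks the total number of bytes touched is

$$\sum_{k=1}^{N} kB = \frac{B \, N (N+1)}{2},$$

i.e. quadratic in the request size. The measured times are not a perfect quadratic, so treat this only as a rough model.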

Anyway, IMHO, pistache appears to be processing all of the POST data, even though the maxRequestSize was set to something much smaller.

It would be nice to get a CPU profile of pistache while it's doing this, but gprof produced an output of all 0% in each child function (I have not debugged this yet).

Using sudo perf top -p ....:

    45.04%  libc.so.6             [.] tolower
    16.17%  libpistache.so.0.2.6  [.] Pistache::StreamCursor::current
    15.10%  libpistache.so.0.2.6  [.] Pistache::match_until
    11.47%  libpistache.so.0.2.6  [.] Pistache::StreamCursor::advance
     6.40%  libpistache.so.0.2.6  [.] Pistache::StreamCursor::eof
     2.94%  libpistache.so.0.2.6  [.] Pistache::StreamCursor::remaining
     2.84%  libpistache.so.0.2.6  [.] Pistache::StreamCursor::current@plt
     0.01%  libpistache.so.0.2.6  [.] tolower@plt
     0.01%  [kernel]              [k] read_hpet
     0.00%  libpistache.so.0.2.6  [.] Pistache::Http::Private::ParserBase::feed
     0.00%  [kernel]              [k] _raw_spin_lock_irq
     0.00%  [kernel]              [k] ep_item_poll.isra.0
     0.00%  [kernel]              [k] folio_add_lru
     0.00%  [kernel]              [k] __get_user_8
     0.00%  libpistache.so.0.2.6  [.] Pistache::Aio::SyncImpl::runOnce
     0.00%  [kernel]              [k] x86_pmu_disable_all
     0.00%  libpistache.so.0.2.6  [.] Pistache::Http::TransportImpl::checkIdlePeers
     0.00%  [kernel]              [k] __cgroup_account_cputime

Running strace on my process while it's attempting to handle a "too large POST request" (in a unit test that I just wrote while debugging this issue) shows that it is reading the socket 4 KiB at a time, instead of the maximum remaining buffer amount. This is not optimal. Yet the actual pistache source code shows that it will try to read the entire remaining buffer.

[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096
[pid 2350991] recvfrom(233, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 4096, 0, NULL, NULL) = 4096

(from src/common/transport.cc, method Transport::handleIncoming)

#ifdef PISTACHE_USE_SSL
            if (peer->ssl() != NULL)
            {
                bytes = SSL_read((SSL*)peer->ssl(), buffer + totalBytes,
                                 static_cast<int>(Const::MaxBuffer - totalBytes));
            }
            else
            {
#endif /* PISTACHE_USE_SSL */
                bytes = recv(fd, buffer + totalBytes, Const::MaxBuffer - totalBytes, 0);
#ifdef PISTACHE_USE_SSL
            }
#endif /* PISTACHE_USE_SSL */

but Const::MaxBuffer is equal to 4096 and not configurable. So no matter how large we set maxRequestSize() to, pistache will only ever read 4096 bytes at a time. For small requests this is probably fine, but for a 66 MiB payload, this is 16896 reads, which appear to then be processed one byte at a time?
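If (and this is only my hypothesis) each of those 4 KiB reads triggers a re-parse of everything buffered so far, the arithmetic gets ugly quickly. A throwaway model, nothing Pistache-specific:

#include <cstdint>
#include <iostream>

int main() {
    const std::uint64_t payload = 66ull * 1024 * 1024; // 66 MiB request body
    const std::uint64_t chunk   = 4096;                // Const::MaxBuffer
    std::uint64_t buffered = 0, scanned = 0, reads = 0;
    while (buffered < payload) {
        buffered += chunk;    // one 4 KiB recv()
        scanned  += buffered; // hypothesis: re-parse the whole buffer from offset 0
        ++reads;
    }
    std::cout << reads << " reads, " << scanned << " bytes scanned\n";
}

This prints 16896 reads and roughly 5.8e11 bytes scanned -- hundreds of gigabytes of re-scanned input, which would be more than enough to account for minutes of CPU given the tolower()-heavy inner loop in the profile above.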

Main Observation

With maxRequestSize = 4 MiB, a 3.999 MiB request completes in a fraction of a second, yet a 40 MiB request takes many minutes. Setting maxRequestSize to 80 MiB, the same 40 MiB request completes in only a few seconds. ISTM that once pistache decides that it will reject the request for being too large (HTTP 413), it consumes the remaining payload very inefficiently.
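For contrast, here is what a cheap rejection path can look like in principle: if the request carries a Content-Length header, an over-limit request can be refused before any of the body is read or parsed. This is a generic illustration only, not Pistache code (I have not checked whether the parser already short-circuits on Content-Length), and the helper name is made up:

#include <cstddef>
#include <iostream>
#include <optional>
#include <string>

// Pull a Content-Length value out of a raw header block, if present.
std::optional<std::size_t> contentLength(const std::string& headers) {
    const std::string key = "Content-Length:";
    const auto pos = headers.find(key);
    if (pos == std::string::npos) return std::nullopt;
    return std::stoull(headers.substr(pos + key.size()));
}

int main() {
    const std::size_t maxRequestSize = 4 * 1024 * 1024; // the 4 MiB cap from above
    const std::string headers =
        "POST / HTTP/1.1\r\nContent-Length: 69206016\r\n\r\n"; // a 66 MiB body
    const auto len = contentLength(headers);
    if (len && *len > maxRequestSize)
        std::cout << "reply 413 and close, without reading the body\n";
}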

Stand-alone repro of this issue

I can work on this in a bit.

@kiplingw (Member)

Good find @dennisjenkins75. Try the following:

  • Rebuild from HEAD with SSL disabled;
  • Also rule out the compressor codecs by ensuring you aren't requesting any returned encoding.

@dennisjenkins75 (Collaborator, Author)

Still has the same poor behavior.

$ git pull

$ git rev-parse HEAD
1c733a145b01a4737cf5c7dd3709bd85be404886

$ rm -rf build

$  meson setup build --buildtype=release \
    -DPISTACHE_USE_SSL=false \
    -DPISTACHE_BUILD_EXAMPLES=true \
    -DPISTACHE_BUILD_TESTS=true \
    -DPISTACHE_BUILD_DOCS=false \
    -DPISTACHE_USE_CONTENT_ENCODING_BROTLI=false \
    -DPISTACHE_USE_CONTENT_ENCODING_DEFLATE=false \
    --prefix="/usr/local"

$ meson compile -C build

$ sudo meson install -C build

I then recompiled my entire project from scratch, ran my normal unit tests (they pass), and enabled the test that attempts to POST 128 MiB while my app has .maxRequestSize set to 80 MiB. My app still has the same poor behavior: it burns 100% of one CPU core while consuming input bytes. My test harness times out after 2300-ish seconds and tears down the whole thing.

I'll work on a stand-alone minimal repro server demo that can be triggered with curl.

@kiplingw (Member)

My gut instinct tells me this is probably something to do with inefficient std::string usage for the payload.
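A quick way to sanity-check that (just a sketch, not anything taken from Pistache): time how long it takes to append a payload of the same size to a std::string in 4 KiB chunks. If that finishes in milliseconds, plain string appends are probably not the culprit, and the cost is more likely in the repeated parsing, which would also match the tolower()-heavy profile above.

#include <chrono>
#include <iostream>
#include <string>

int main() {
    const std::size_t payload = 66u * 1024 * 1024;  // 66 MiB, as in the report
    const std::string chunk(4096, 'a');             // one 4 KiB read's worth
    std::string body;
    const auto start = std::chrono::steady_clock::now();
    while (body.size() < payload)
        body += chunk;  // amortized O(1) per byte
    const auto stop = std::chrono::steady_clock::now();
    std::cout << body.size() << " bytes appended in "
              << std::chrono::duration<double>(stop - start).count() << " s\n";
}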

@dennisjenkins75 (Collaborator, Author) commented Jan 23, 2024

Well nuts. I wrote a stand-alone repro, and it does not reproduce the issue. However, an edited version of the example "rest_server" does reproduce it. The profile and stack traces point to something inside pistache.

#include <pistache/description.h>
#include <pistache/endpoint.h>
#include <pistache/http.h>

#include <cstdint>
#include <cstdlib>
#include <iostream>

using namespace Pistache;

struct RequestHandler : public Http::Handler {
  HTTP_PROTOTYPE(RequestHandler)
  RequestHandler() {}

  void onRequest(const Http::Request& request,
                 Http::ResponseWriter writer) override {
    std::cerr << request.method() << " " << request.resource() << " "
              << request.body().size() << "\n";
    writer.send(Http::Code::Ok, "");
  }
};

int main(int argc, char *argv[]) {
  size_t max_req_size = Const::MaxBuffer;  // defaults to Const::MaxBuffer (4096 bytes)
  uint16_t port = 9080;

  if (argc > 1) {
    max_req_size = strtoul(argv[1], nullptr, 10);
    if (argc > 2) {
      port = static_cast<uint16_t>(strtoul(argv[2], nullptr, 10));
    }
  }

  std::cout << "MaxRequestSize = " << max_req_size << "\n";
  std::cout << "Port = " << port << "\n";

  auto opts = Http::Endpoint::options()
      .maxRequestSize(max_req_size)
      .keepaliveTimeout(std::chrono::seconds(2))
      .threads(1)
      .flags(Tcp::Options::ReuseAddr);

  Http::Endpoint endpoint(Address(Ipv4::any(), Port(port)));
  endpoint.init(opts);
  endpoint.setHandler(Http::make_handler<RequestHandler>());
  endpoint.serve();
  return 0;
}

$ make && ./output/pistache-demo 41943040 9999
LINK            output/pistache-demo
MaxRequestSize = 41943040
Port = 9999

# in another shell
$ dd if=/dev/random of=blob.bin bs=1048576 count=128

$ time curl -v -X POST -H "Content-type: application/octet-stream" -d @blob.bin http://localhost:9999/
...
< HTTP/1.1 413 Request Entity Too Large
< Content-Length: 36
< 
* Excess found in a read: excess = 150, size = 36, maxdownload = 36, bytecount = 0
* Closing connection
Request exceeded maximum buffer size
real    0m1.259s
...
