
When using Mbed TLS, not all data will be processed. #2668

Open
tubular-rheostat opened this issue Mar 26, 2024 · 25 comments
@tubular-rheostat

  • My goal is: Use MbedTLS
  • My actions were: Connect using TLS, process received data
  • My expectation was: I would receive data in a timely manner
  • The result I saw: If the server sends two messages in rapid succession, only the first message is processed in a timely manner, the other is only processed after a significant delay
  • My question is: How can mongoose be modified to read all data?

Environment

  • mongoose version: 7.13
  • Compiler/IDE and SDK: GCC 13.2 (arm-none-eabi-gcc), Raspberry Pi Pico SDK 1.5.1, Mbed TLS 2.28.1 (this is the version bundled with the Pi Pico SDK)
  • Target hardware/board: RP2040
  • Connectivity chip/module: Wiznet w5500, Mongoose built-in TCP
  • Target RTOS/OS (if applicable): None

If more than one TLS record arrives in the same TCP packet, Mongoose will process only the first record, leaving the second in the buffer until another packet is received. A possible solution is to call mbedtls_ssl_read in a loop until it returns MBEDTLS_ERR_SSL_WANT_READ or a similar value indicating there is no more data.

[Image: Wireshark capture showing two TLS records in one TCP packet]

My description that follows refers to MQTT but that is coincidental.

My test program is an MQTT client; it subscribes to $SYS/broker/clients/connected and test_req.

mosquitto_pub -v -V 311 -h broker_ip -p 8833 -t "test_req"  -m "whatever"

That command connects to the broker and publishes a message. Because I am subscribed to the broker's client-count topic, this often causes two messages to be published: the updated client count as well as the message on test_req.

I applied this patch to 7.13 and it resolved the issue:

diff --git a/mongoose.c b/mongoose.c
index 62fde56a..b57349dc 100644
--- a/mongoose.c
+++ b/mongoose.c
@@ -5533,13 +5533,18 @@ static void read_conn(struct mg_connection *c, struct pkt *pkt) {
         mg_error(c, "oom");
       } else {
         // Decrypt data directly into c->recv
-        long n = mg_tls_recv(c, &io->buf[io->len], io->size - io->len);
-        if (n == MG_IO_ERR) {
-          mg_error(c, "TLS recv error");
-        } else if (n > 0) {
-          // Decrypted successfully - trigger MG_EV_READ
-          io->len += (size_t) n;
-          mg_call(c, MG_EV_READ, &n);
+        while(true) {
+          long n = mg_tls_recv(c, &io->buf[io->len], io->size - io->len);
+          if (n == MG_IO_ERR) {
+            mg_error(c, "TLS recv error");
+            break;
+          } else if (n > 0) {
+            // Decrypted successfully - trigger MG_EV_READ
+            io->len += (size_t) n;
+            mg_call(c, MG_EV_READ, &n);
+          } else {
+            break;
+          }
         }
       }
     } else {
@tubular-rheostat (Author)

I tried testing with master, I see PR #2646 is related but that change does not address this issue.

@scaprile (Collaborator)

Can you please check if our built-in TLS stack solves your issue ?
(Build with MG_TLS_BUILTIN instead)

@tubular-rheostat (Author)

Initial test with latest master e4cd6bd8 and MG_TLS_BUILTIN wouldn't connect; the TLS handshake never completed.
I traced it to here

In mg_tls_init

  if (c->is_client) {
    tls->server_cert_der.ptr = NULL;
    return;
  }

After that returns no TLS Client Hello was sent, eventually the broker disconnected due to its timeout.

I'll have some time to tinker with it this evening (US Central timezone)

@scaprile (Collaborator)

It requires EC certificates and keys, the API is the same.

@tubular-rheostat (Author)

I am able to reproduce the issue with MG_TLS_BUILTIN, and the patch still resolves it. However, my theory of the cause is in doubt: I amended my patch to add logging to indicate when a 2nd iteration produced additional data, and for my test message it did not. However, if I remove my patch, my program stops working again.

Other messages, earlier in the MQTT handshake did require multiple iterations through this loop, as many as 5 iterations.

Here is the same patch, with logging

diff --git a/mongoose.c b/mongoose.c
index 62fde56a..48080ee8 100644
--- a/mongoose.c
+++ b/mongoose.c
@@ -5533,13 +5533,23 @@ static void read_conn(struct mg_connection *c, struct pkt *pkt) {
         mg_error(c, "oom");
       } else {
         // Decrypt data directly into c->recv
-        long n = mg_tls_recv(c, &io->buf[io->len], io->size - io->len);
-        if (n == MG_IO_ERR) {
-          mg_error(c, "TLS recv error");
-        } else if (n > 0) {
-          // Decrypted successfully - trigger MG_EV_READ
-          io->len += (size_t) n;
-          mg_call(c, MG_EV_READ, &n);
+        int ct = 0;
+        while(true) {
+          ++ct;
+          long n = mg_tls_recv(c, &io->buf[io->len], io->size - io->len);
+          if (n == MG_IO_ERR) {
+            mg_error(c, "TLS recv error");
+            break;
+          } else if (n > 0) {
+            // Decrypted successfully - trigger MG_EV_READ
+            io->len += (size_t) n;
+            mg_call(c, MG_EV_READ, &n);
+            if(ct > 1) {
+              MG_DEBUG(("TLS recv loop %d", ct));
+            }
+          } else {
+            break;
+          }
         }
       }

I attach logs of the transaction with and without my patch

mg_mqtt_server-withpatch.txt

mg_mqtt_server-nopatch.txt

And a pcap just for good measure

blinky2.pcap.zip

192.168.192.115 is the mongoose mqtt server under test
192.168.192.130 is a mosquitto broker, connecting to .115 as a bridge
The interesting packet with 5 application data records is packet 20. My test message is packet 22.

Note that the round-trip time between those two is around 22 ms. I suspect that a certain amount of latency is needed to repro the issue; otherwise the TCP packets would not accumulate so many TLS records. This could be verified by testing again with NODELAY turned on.

Anyway, this is enough for me today.

@scaprile (Collaborator)

@cpq This seems to be yet another corner case.
We get more than one TLS record, we put data in its buffer, we call TLS, we decrypt the first record. There is still outstanding data in the buffer, but since a whole record has been processed, tls_pending() won't help here. We need to react to outstanding data in the TLS buffer in order for tcpip_poll() to call tls_recv().
I think a similar problem will occur with sockets.
As long as no data comes in, we stall with outstanding TLS records in their buffer.
If new data comes in, we process the first outstanding record and the others stay there.

@tubular-rheostat Thank you, Brian, you helped a lot.

@cpq (Member) commented Mar 27, 2024

@tubular-rheostat thanks!

Loops in receive code are dangerous. Brian, could you try another patch, please?

Try to change this line

return tls == NULL ? 0 : mbedtls_ssl_get_bytes_avail(&tls->ssl);

To this:

  return tls == NULL ? 0 : c->rtls.len + mbedtls_ssl_get_bytes_avail(&tls->ssl);

@tubular-rheostat (Author)

That change didn't help. I'm using the Mongoose built-in TCP stack, and the only code I saw referring to that edited function was in sock.c.

I can try running my test code with BSD sockets but I probably won't have time to work on it until the weekend.

If it helps here is some of my test program:

https://gist.github.com/tubular-rheostat/1cdd37e82aef3e2117fc5d377e060b1b

@scaprile (Collaborator) commented Mar 28, 2024

Brian, as can be seen in the link, that change is in src/tls_mbed.c, line 171; that is the Mbed TLS-specific code.
Once you've changed that line, run make mongoose.c and the change will be transferred into mongoose.c.
You can also directly change mongoose.c line 10906:

mongoose/mongoose.c

Lines 10904 to 10907 in 7581229

size_t mg_tls_pending(struct mg_connection *c) {
  struct mg_tls *tls = (struct mg_tls *) c->tls;
  return tls == NULL ? 0 : mbedtls_ssl_get_bytes_avail(&tls->ssl);
}

Please build and test with MbedTLS, this applies to both sockets and built-in TCP stack.

@tubular-rheostat (Author) commented Mar 28, 2024

That is what I did. I copied the mongoose.c from 7581229b and applied your most recent patch. I tested with MbedTLS and built-in TCP.

When I said the function is not used outside of sock.c, that statement was based on this search (attached image). I don't think the built-in TCP stack ever calls mg_tls_pending; only the BSD socket code calls it. Sorry if I wasn't clear before; I was interrupted before I could finish writing.

[Image: search results for callers of mg_tls_pending]

@gvanem (Contributor) commented Mar 28, 2024

I see something similar on Windows, with -DMG_TLS=MG_TLS_MBED=1 and set V=3 & unit_test.exe:

tls_mbed.c:114:mg_tls_init    2 Setting TLS
tls_mbed.c:142:mg_tls_init    2 hostname verification: localhost
tls_mbed.c:95:mg_tls_handshak TLS handshake: -0x6c00
sock.c:440:accept_conn        3 616 accepted 127.0.0.1:62645 -> 127.0.0.1:12347
tls_mbed.c:114:mg_tls_init    3 Setting TLS
net.c:150:mg_close_conn       2 608 closed

And -0x6c00 is MBEDTLS_ERR_SSL_INTERNAL_ERROR ("Internal error (eg, unexpected failure in lower-level module)"), which is thrown in a gazillion places.

With -DMG_TLS=MG_TLS_OPENSSL I get: SUCCESS. Total tests: 1290
With -DMG_TLS=MG_TLS_BUILTIN, I get: SUCCESS. Total tests: 1277

Seems something has changed in Mbed TLS recently; I built it from git master today. There is a similar handshake issue in libcurl :-(

@scaprile (Collaborator)

@gvanem Mmmmm.... I think that is another issue; this one is not Mbed TLS specific but our problem.
We are not running unit tests for Mbed TLS nor OpenSSL on Windows. The examples that do, clone and build a specific version of Mbed TLS, as explained in the TLS tutorial: "Only what the example provides in its Makefile is supported".

@scaprile (Collaborator) commented Mar 28, 2024

@tubular-rheostat Oops, you are right

@cpq I think the fix is cool. The elephant in the room is that our built-in TCP/IP stack's mg_mgr_poll() won't poll TLS for outstanding received data to be processed; that only happens when read_conn() is called and there is TCP data, and read_conn() is called only when a TCP-holding frame is received, from within rx_tcp(). We do poll for outstanding data to be sent, by calling write_conn(), though.
IMHO our MIP poll loop needs a tweak. The socket select/poll/epoll core does `if (mg_tls_pending(c) > 0) ms = 1, c->is_readable = 1;`, which, IIUC, triggers read_conn(), which ends up calling mg_tls_recv() with outstanding data in the buffer (plus any possible new data).
Should we do something like this for MIP?

diff --git a/src/net_builtin.c b/src/net_builtin.c
index 98a58e75..3390d16d 100644
--- a/src/net_builtin.c
+++ b/src/net_builtin.c
@@ -613,6 +613,17 @@ long mg_io_send(struct mg_connection *c, const void *buf, size_t len) {
   return (long) len;
 }
 
+static void handle_tls_recv(struct mg_connection *c, struct mg_iobuf *io) {
+  long n = mg_tls_recv(c, &io->buf[io->len], io->size - io->len);
+  if (n == MG_IO_ERR) {
+    mg_error(c, "TLS recv error");
+  } else if (n > 0) {
+    // Decrypted successfully - trigger MG_EV_READ
+    io->len += (size_t) n;
+    mg_call(c, MG_EV_READ, &n);
+  }
+}
+
 static void read_conn(struct mg_connection *c, struct pkt *pkt) {
   struct connstate *s = (struct connstate *) (c + 1);
   struct mg_iobuf *io = c->is_tls ? &c->rtls : &c->recv;
@@ -691,14 +702,7 @@ static void read_conn(struct mg_connection *c, struct pkt *pkt) {
         mg_error(c, "oom");
       } else {
         // Decrypt data directly into c->recv
-        long n = mg_tls_recv(c, &io->buf[io->len], io->size - io->len);
-        if (n == MG_IO_ERR) {
-          mg_error(c, "TLS recv error");
-        } else if (n > 0) {
-          // Decrypted successfully - trigger MG_EV_READ
-          io->len += (size_t) n;
-          mg_call(c, MG_EV_READ, &n);
-        }
+        handle_tls_recv(c, io);
       }
     } else {
       // Plain text connection, data is already in c->recv, trigger
@@ -1109,6 +1113,8 @@ void mg_mgr_poll(struct mg_mgr *mgr, int ms) {
     MG_VERBOSE(("%lu .. %c%c%c%c%c", c->id, c->is_tls ? 'T' : 't',
                 c->is_connecting ? 'C' : 'c', c->is_tls_hs ? 'H' : 'h',
                 c->is_resolving ? 'R' : 'r', c->is_closing ? 'C' : 'c'));
+    if (c->is_tls && mg_tls_pending(c) > 0)
+      handle_tls_recv(c, &c->rtls);
     if (can_write(c)) write_conn(c);
     if (c->is_draining && c->send.len == 0 && s->ttype != MIP_TTYPE_FIN)
       init_closure(c);

WDYT ?

@tubular-rheostat (Author)

I will try that last patch on my test env., probably in the evening (US Central time).

I tried testing with BSD sockets + Mbed TLS but could never repro the issue (with unmodified 7.13). For that I had to switch from testing on an rp2040 to testing on a Mac, so the test environment is not the same. I will try with Mongoose built-in TCP on the Mac.

For testing on the Mac I also tried simulating latency by adding a delay to the MQTT broker machine's network interface (using the Linux tc command), on the assumption that latency helps repro the issue, but it had no effect. I think I probably set this up wrong; I'll experiment.

@scaprile (Collaborator)

Yeah, this is really hard to reproduce. My suggestion would work on our built-in TCP/IP stack but is just an idea, we haven't actually tried it. If you can, that would be a nice input.

@tubular-rheostat (Author)

I tried the most recent patch (modified slightly to get it to compile) against master 7581229b, and it wouldn't connect to the MQTT server. I think this happens during TLS init: it opens the TCP connection but closes it again before sending the Client Hello record.

82c    ERROR mongoose.c:1363:mg_error 1 -1 TLS recv error

I think the underlying MbedTLS error code was MBEDTLS_ERR_SSL_FEATURE_UNAVAILABLE.

I was not able to work out what is going wrong.

@tubular-rheostat (Author)

I was able to repro the issue on my laptop, see https://github.com/tubular-rheostat/mongoose-test-2668 for a test program.

Hopefully this makes further development work and testing easier. Apologies if you don't have a Mac to test on...

@tubular-rheostat (Author)

I tried retesting with @scaprile's latest patch and didn't hit the TLS handshake problem again; it was able to connect. I don't know what I did differently. It did not resolve my issue, though. I don't think the new `if` in mg_mgr_poll was ever taken (the condition was always false).

I was experimenting with an alternate fix, just have mg_tls_recv process as many records as it can, within the available buffer size.

long mg_tls_recv(struct mg_connection *c, void *buf, size_t len) {
  struct mg_tls *tls = (struct mg_tls *) c->tls;
  size_t total_read = 0;

  while (total_read < len) {
    int n = mbedtls_ssl_read(&tls->ssl, (unsigned char *) buf + total_read,
                             len - total_read);
    if (n > 0) {
      total_read += (size_t) n;
    } else if (n == MBEDTLS_ERR_SSL_WANT_READ || n == MBEDTLS_ERR_SSL_WANT_WRITE) {
      return total_read == 0 ? MG_IO_WAIT : (long) total_read;
    } else {
      return total_read == 0 ? MG_IO_ERR : (long) total_read;
    }
  }
  // Oops, we filled the entire buffer; unread data may still remain!
  return (long) total_read;
}

This code has some issues

  • does not return MG_IO_ERR in all cases when MbedTLS indicates an error
  • could exhaust buffer space without finishing the read loop; doesn't feel like the right place to allocate a buffer

The lwIP integration of Mbed TLS [github.com/lwip-tcpip] also has a loop that continues reading until the read call returns an error code.

I was reading about OpenSSL and it implies similar limitations [openssl.org]: reading multiple records requires multiple calls to SSL_read. I don't have any experience with OpenSSL. The SSL_read documentation says:

"At most the contents of one record will be returned."

@tubular-rheostat (Author)

Anyway, thanks for looking at the issue. I'm happy with the patch from my initial comment and will continue working with that.

Let me know if you want me to try anything else in my test environment.

@scaprile (Collaborator) commented Apr 5, 2024

Brian, we already have a loop in place at the outer level; we will avoid inner loops as much as possible. cpq is busy with Embedded World now; he is the man, and he also has a Mac. We will fix this issue as we always do; it may take a bit longer.
My patch was a shot in the dark, as I don't see the problem; I just mimicked our behavior in other environments. I may have left something important out, and of course I could not test anything.
I may try to reproduce what you posted on my Linux box.
Anyway, we'll let you know when we have something.

@cpq (Member) commented Apr 15, 2024

@tubular-rheostat @scaprile thanks!

I used Brian's test program, just modified it to use pcap instead of tun, to avoid sudo-ing and ifconfig-ing all the time.

Then I changed it to use broker.hivemq.com:8883 instead of local mosquitto, because local mosquitto did not start for whatever reason.

I applied Sergio's patch and it works OK (using ca.pem file from test/data):

f4668819 DEBUG mongoose.c:4775:mg_connect 3 -1 mqtts://broker.hivemq.com:8883
f466881c DEBUG mongoose.c:4749:mg_close_conn 1 -1 closed
f4668842 DEBUG mongoose.c:1230:dns_cb 3 broker.hivemq.com is 3.73.141.152
f4668843 DEBUG mongoose.c:5914:mg_connect_resolved 3 192.168.2.19:34433 -> 3.73.141.152:8883
f4668886 DEBUG main.cpp:309:mqtt_client_fn CA: 30427
f4668886 DEBUG mongoose.c:10848:mg_tls_init 3 Setting TLS
f4668ca1 DEBUG mongoose.c:10821:mg_tls_handshake 3 success
connected to mqtt
f4668ce1 DEBUG mongoose.c:4503:mqtt_cb 3 Connected
test_req message: {"topic":"reply/202376DE","correlation_id":"890954FD-BB39-45B3-8398-4CA3202376DE"}
f466f9e2 DEBUG mongoose.c:4383:mg_mqtt_pub 3 [reply/202376DE] -> [{"timestamp":4100389346,"conn_id":2,"correlation_id":"890954FD-BB39-45B3-8398-4CA3202376DE"}]

The suggested patch looks good to me. @scaprile do you want to turn it into a PR?

@tubular-rheostat (Author)

Thanks @cpq for looking at the issue again.

I see some improvement but can still repro the issue. With master 9cccb983 (including this PR), using my test program (modified to address a renamed field in mg_str), I can get the mqtt_req_resp script in my test repo to work one time, but if I run the script a 2nd time it still hangs waiting for a reply.

I didn't know about pcap I'll check out the example.

@cpq (Member) commented Apr 17, 2024

@tubular-rheostat thanks. Do we use the same connection when the script is executed the 2nd time? In other words, what's different the 2nd time? And does it fail with built-in TLS?

@tubular-rheostat (Author) commented Apr 27, 2024

The same connection is used the 2nd time. From the steps in my demonstration code I just repeated step 11.

@cpq (Member) commented May 16, 2024

@tubular-rheostat did you try the pcap example?
It'd be nice to be able to reproduce it locally.
