Some APIs result in 500 error #665

Open
Madhu1029 opened this issue Jul 14, 2022 · 7 comments


Madhu1029 commented Jul 14, 2022

We are trying to execute QuantumLeap APIs via a JMeter script, but some of the APIs result in a 500 error. Below is the configuration for the API execution:
API: GET http://host/v2/entities/tid1/attrs/pressure?type=typ&limit=100&offset=1534&fromDate=2022-06-02T00:00:00Z&toDate=2022-07-04T23:59:59Z
Throughput: 20 req/s
WORKERS: 40
QuantumLeap Version: 0.8.1
We have tried increasing the value of the WORKERS variable, which decreases the probability of a 500 error, but 500 errors still occur.
Is increasing the WORKERS value the correct solution for this issue? If yes, how can I calculate the correct value for WORKERS?

c0c0n3 (Member) commented Jul 14, 2022

hi @MadhuNEC :-)

We have tried increasing the value of WORKER variable

The variable name is actually WORKERS, notice the S at the end, setting WORKER=40 has no effect :-)
See:

500 error is still occurring.

Can you post more details about the error you're getting? Any trace of that in the logs? What's the backend you're using? Crate DB or Timescale? What's the load on the DB? We've had similar issues in the past which boiled down to not allocating enough resources to the database backend. So when QL tried to run a query, the DB would just refuse to run it b/c it was overloaded. It could be you're experiencing something similar, but I can't be sure. Like I said, we'd need to know more about your test environment, errors you get in the logs, DB load, etc.

Madhu1029 (Author) commented:

Yes, I am using the WORKERS variable. Sorry, I wrote it incorrectly in the comment; I have updated it.

I am using the CrateDB backend.
QuantumLeap is running behind nginx. I can only see a 500 response code for the API in the nginx logs. There are no logs on the QuantumLeap side for the API calls that result in a 500 error. It seems that QuantumLeap is unable to accept the request.

As for DB load, there are 10 entities and each entity has approximately 5000 data points. For example, the 10 entities are tid1, tid2, tid3, etc., and each entity (tid1, tid2, ...) contains approximately 5000 values of pressure.

c0c0n3 (Member) commented Jul 15, 2022

I am using WORKERS variable

cool, just wanted to double-check w/ you to rule out possible config issues.

There are no logs at QuantumLeap side for the API which results into 500 error

Then yes, I agree w/ you that QL could be the bottleneck. At a 20 req/sec throughput rate and 40 workers, it looks like each worker should be busy for up to 2 secs. That means the producer (JMeter) is faster than the consumer (QL) and eventually QL will be in a situation where all 40 workers are busy but new requests are still coming in. Keep in mind workers do the work sequentially (pun intended :-), so 40 workers means at most 40 concurrent queries. In this scenario Gunicorn will have no worker process to assign incoming requests to and so will just return a 500.
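
As a rough sizing rule (this is just Little's law applied to the numbers above, not anything QL-specific), the number of workers you need is about the request rate times the average time a single request takes, plus some headroom for bursts. A minimal sketch:

import math

def workers_needed(request_rate_per_sec, avg_request_secs, headroom=1.5):
    # Little's law: average concurrent requests = arrival rate * average service time.
    # The headroom factor leaves spare capacity for bursts.
    return math.ceil(request_rate_per_sec * avg_request_secs * headroom)

print(workers_needed(20, 2.0))   # 20 req/s, 2 s per request => 60 workers

So at 20 req/s, 40 workers only keep up if the average request stays well under 2 secs.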

On the other hand, it could also well be that each request takes up to 2 secs on average not b/c QL is slow but rather Crate DB can't keep up w/ the query rate. I've seen this in the past and the solution was to give Crate DB enough RAM to perform decently---Crate is a fine piece of software but you can't expect it to match your workload if you don't give it enough resources, have a look at the manual for the details. Then you could also up the number of QL workers.
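
For instance, if you run Crate DB in Docker, the JVM heap is set through the CRATE_HEAP_SIZE environment variable; the values and version tag below are only an illustration, size them according to your data volume and the Crate DB documentation:

$ docker run -d --name cratedb \
    -e CRATE_HEAP_SIZE=4g \
    --memory 8g \
    -p 4200:4200 -p 4300:4300 \
    crate:4.6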

On a side note, we never really worked on query optimisation, but we did identify some potential performance hot spots. For a complete analysis you can read

In particular, the issues/performance section could be applicable to your scenario. But again, keep in mind that model is just an abstract model, we've never validated it w/ real measurements. Speaking of which, one way to get to the bottom of this would be to use QL's built-in telemetry

to figure out how the QL load varies as a function of the input requests and how much of the processing time is spent waiting for Crate DB to return query result sets.

Hope this helps!

Madhu1029 (Author) commented Oct 10, 2022

Hi @c0c0n3,

I have checked the number of busy workers during the script execution: only 4-5 workers are busy at any point in time.
But 40 workers are configured in gconfig.py. I have added the code below to the src/server/gconfig.py file to print logs when a worker is assigned a request and when it releases it:

import statsd
import datetime

# statsd client (not actually used by the hooks below)
sc = statsd.StatsClient('localhost', 8668)

# Gunicorn server hooks: pre_request runs just before a worker starts
# processing a request, post_request right after it has finished.
def pre_request(worker, req):
    print("increment ", worker, datetime.datetime.now())

def post_request(worker, req):
    print("decrement ", worker, datetime.datetime.now())

I have executed the QuantumLeap APIs from JMeter. Below are the details of the script execution:
WORKERS: 40
QuantumLeap: 0.8.1
Throughput: 2 req/s
And the error below is found in the JMeter log:
Non HTTP response code: org.apache.http.NoHttpResponseException
It seems that JMeter has not received any response from the QuantumLeap side, so it results in a NoHttpResponseException.
Note: there is no log on the QuantumLeap end for the failed request.
Could you please help with this issue?

c0c0n3 (Member) commented Nov 9, 2022

Hi @MadhuNEC :-)

So I've finally found the time to look at this issue. What I did was follow the steps in

https://github.com/orchestracities/ngsi-timeseries-api/wiki/Gauging-Performance

to do some load testing. Then as explained in the wiki article I used Pandas to do some basic data analysis. It turns out Gunicorn actually distributed the work quite evenly among my 10 workers---I don't have enough horsepower to test w/ 40. So basically I got pretty much the same results as in the wiki article.

I have checked that only 4-5 workers are busy at any point of time.

How did you do that? Looking at the code in your gconfig.py I don't see any easy way to analyse the data you output in the pre/post request hooks. Can you please try using our telemetry framework

and do statistical analysis with Pandas as explained in the article and let us know if you get different results?
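
Just to illustrate the kind of analysis meant here (the file and column names below are hypothetical, adapt them to whatever your telemetry run actually produced):

import pandas as pd

# Hypothetical file/column names -- adapt to your telemetry output.
df = pd.read_csv('duration.csv')
print(df['duration'].describe())               # count, mean, std, quartiles, max
print(df.groupby('pid')['duration'].count())   # how requests spread across worker processes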

Thanks a lot!!

NEC-Vishal (Contributor) commented:

Hi @c0c0n3
Thanks for guiding us.
But when we run the command: ./baseline-load-test.sh
we get stuck at this step; we have tried a lot but unfortunately we cannot get past it.
[screenshot attached]

Can you please suggest a way to resolve this?

c0c0n3 (Member) commented Jan 5, 2023

Hi @NEC-Vishal,

did you run these commands before calling baseline-load-test.sh?

$ cd /path/to/ngsi-timeseries-api
$ source setup_dev_env.sh
$ pipenv install --dev
$ cd src/tests/benchmark
