When using the asb:MessageReceiver to poll messages the health-check endpoint will not start #6534

Closed
ayeshLK opened this issue May 14, 2024 · 5 comments

ayeshLK commented May 14, 2024

Description:

When asb:MessageReceiver is used to poll messages, the health-check endpoint does not start.

Related to: #28459

This was observed only when the container runs under resource limits. The current limits are as follows:

  • CPU: 0.3
  • Memory: 300m

ayeshLK commented May 14, 2024

I was able to reproduce this with the following sample code:

import ballerina/http;
import ballerina/log;
import ballerina/os;
import ballerinax/asb;

configurable string queueName = "perf-test-queue";
configurable string asbConnectionString = os:getEnv("CONNECTION_STRING");

boolean isResilientInvokerReady = false;

public function main() returns error? {
    _ = @strand {thread: "any"} start receiveMessages();
    isResilientInvokerReady = true;
}

isolated function receiveMessages() returns error? {
    asb:MessageReceiver receiver = check createMessageReceiver();
    while true {
        do {
            asb:Message? message = check receiver->receive(10);
            if message is () {
                continue;
            }
            check receiver->complete(message);
        } on fail error asbError {
            log:printError("Error occurred while consuming records", asbError);
            error? result = receiver->close();
            if result is error {
                log:printError("Error occurred while gracefully closing asb:MessageReceiver", result);
            }
            var _receiver = createMessageReceiver();
            if _receiver is asb:MessageReceiver {
                receiver = _receiver;
            } else {
                log:printError("Error occurred while creating asb:MessageReceiver", _receiver);
            }
        }
    }
}

isolated function createMessageReceiver() returns asb:MessageReceiver|error {
    return new ({
        entityConfig: {
            queueName: queueName
        },
        connectionString: asbConnectionString
    });
}

public configurable int healthCheckPort = 9093;

listener http:Listener healthCheckListener = check new (healthCheckPort);

service / on healthCheckListener {
    resource function get liveness() returns http:ServiceUnavailable|http:Ok {
        if isResilientInvokerReady {
            return <http:Ok>{body: {status: "OK"}};
        }
        return <http:ServiceUnavailable>{body: {status: "NOT READY"}};
    }

    resource function get readiness() returns http:ServiceUnavailable|http:Ok {
        if isResilientInvokerReady {
            return <http:Ok>{body: {status: "OK"}};
        }
        return <http:ServiceUnavailable>{body: {status: "NOT READY"}};
    }
};

I used the following Docker command:

docker run --name res-frm --cpus=0.3 -m=300m -p 9093:9093 -d ayeshlk/cpresfrm:v1

ayeshLK commented May 15, 2024

Based on the initial analysis, here are the key findings:

  • The receiveMessages function is started asynchronously with the start keyword, so it runs on a new Ballerina strand. If a thread is free in the Ballerina thread pool, it is assigned to this strand.
  • asb:MessageReceiver is a synchronous client, and receiver->receive(10) can block the current thread for up to 10 seconds when no messages are available.
  • HTTP services handle incoming requests using threads from the Ballerina thread pool.
  • The default size of the Ballerina thread pool is twice the number of CPU cores. With a CPU limit of 0.3, that works out to 2 * 0.3 = 0.6, which is less than one full thread.

These findings suggest that the primary bottleneck is the CPU limit, which leaves too few threads to serve both the blocking receive loop and the HTTP health-check requests. Here are some recommendations to address these issues:

Recommendations:

  • Increase the CPU limit to 0.5, which is the minimum advisable setting for Ballerina applications.
  • Manually set the Ballerina thread pool size using the BALLERINA_MAX_POOL_SIZE environment variable. A recommended setting is 10 threads.
  • Adjust the liveness/readiness check configurations to include an initial delay of 10 to 15 seconds.
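
As a rough sketch, the three recommendations above could be applied to the reproduction setup like this. It assumes the same image, container name, and port as the Docker command earlier in the thread. The function names start_container and probe_after_delay are illustrative only, and the 15-second delay is simply the upper end of the suggested range. Whether these values are sufficient for a given workload is not something this sketch verifies.

```shell
#!/bin/sh
# Sketch only: values are taken from the recommendations above.
CPUS=0.5                 # suggested minimum CPU limit
POOL_SIZE=10             # suggested Ballerina thread pool size
INITIAL_DELAY=15         # seconds to wait before the first probe (assumed)
HEALTH_URL="http://localhost:9093/readiness"

# Start the container with the raised CPU limit and an explicit pool size.
start_container() {
    docker run --name res-frm --cpus="$CPUS" --memory=300m \
        -e BALLERINA_MAX_POOL_SIZE="$POOL_SIZE" \
        -p 9093:9093 -d ayeshlk/cpresfrm:v1
}

# Wait for the initial delay, then probe the readiness endpoint once.
# curl exits non-zero on an HTTP error status because of --fail.
probe_after_delay() {
    sleep "$INITIAL_DELAY"
    curl --fail --silent --show-error "$HEALTH_URL"
}
```

Running `start_container && probe_after_delay` starts the container and checks readiness once the delay has passed. In an orchestrated deployment the delay would normally be set in the platform's own probe configuration rather than in a script.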

ayeshLK commented May 15, 2024

Blocking during a network call is not the default behavior of Ballerina network clients, so we should first support non-blocking behavior in the Ballerina ASB connector itself. Hence, we need to prioritize the following issue [1] before providing further suggestions to fix this issue.

[1] - #4982

ayeshLK commented Jun 16, 2024

ASB client network calls are now working in a non-blocking manner after this improvement [1]. Hence, closing this issue.

[1] - #4982

ayeshLK closed this as completed Jun 16, 2024

This issue is NOT closed with a proper Reason/ label. Make sure to add proper reason label before closing. Please add or leave a comment with the proper reason label now.

      - Reason/EngineeringMistake - The issue occurred due to a mistake made in the past.
      - Reason/Regression - The issue has introduced a regression.
      - Reason/MultipleComponentInteraction - Issue occurred due to interactions in multiple components.
      - Reason/Complex - Issue occurred due to complex scenario.
      - Reason/Invalid - Issue is invalid.
      - Reason/Other - None of the above cases.

ayeshLK added the Reason/Complex label (Issue occurred due to complex scenario.) Jun 16, 2024