Add a timeout to the example dialplan #47

Open
k4ml opened this issue Nov 30, 2017 · 29 comments


@k4ml

k4ml commented Nov 30, 2017

One issue I noticed with a park-only dialplan is that if, for some reason, the inbound socket process fails or dies, the parked calls remain in the freeswitch db (show calls will still list them), filling up freeswitch's internal db until it maxes out and freeswitch stops accepting new calls.

Is there something that can be done in the dialplan, other than just calling park, so that if there's no inbound server to process the call, the channel is simply terminated, similar to what happens with an outbound socket?

@goodboy
Member

goodboy commented Nov 30, 2017

@k4ml yes you can set a timeout using the park_timeout channel variable.

I think that's how I've normally handled it in the past; if it doesn't work, let me know and I'll write a test with a working solution. Actually, we should probably document that in the example dialplan.

In general I'd like to ship a production grade example dialplan with switchio. @k4ml if you come up with a good working version we'll gladly take a PR for it.

@goodboy
Member

goodboy commented Nov 30, 2017

@k4ml btw have you noticed that the socket dies with switchio in particular?
If that is the case we likely have a bug.

@k4ml
Author

k4ml commented Nov 30, 2017

@tgoodlet I haven't tried switchio yet, but this is what I notice when using a park-only dialplan with an inbound socket in general. Even if the socket doesn't die, you must explicitly hang up the call somewhere in your app.

@k4ml
Author

k4ml commented Nov 30, 2017

So with park_timeout set, I guess we also need to unpark the call once we manage to handle it on channel_park, otherwise the call will be terminated halfway through. Looking here, we'd have to do that with uuid_transfer, but that will bring the call back into the dialplan, which mean we need at least 2 extensions in the dialplan, one to handle new incoming call (put into park) and another one to handle unpark call, otherwise we will get into infinite loop ?

@goodboy
Member

goodboy commented Nov 30, 2017

which mean we need at least 2 extensions in the dialplan, one to handle new incoming call (put into park) and another one to handle unpark call, otherwise we will get into infinite loop ?

No, I don't believe this is true. Using any other call management command should take the session out of the CHANNEL_PARK state/app, afaik. In the wiki section you linked to, uuid_transfer is just used as the typical command for accomplishing this from within an XML dialplan. If you've found otherwise, that would be a problem, but I'd find it hard to believe: I've never experienced it, and the inbound socket approach, including the use of the park app, is the recommended approach according to the wiki.

Actually, on top of that, outbound mode parks calls in the same way before transferring control.

goodboy pushed a commit that referenced this issue Nov 30, 2017
This is a safeguard in case the ESL connection drops while inbound calls
are being received but no longer processed by switchio.

Resolves #47
@goodboy
Member

goodboy commented Nov 30, 2017

@k4ml if this build passes it should prove my point, as some of the stress tests in the suite keep calls up for longer than 3 seconds (the timeout I added).

@goodboy
Member

goodboy commented Nov 30, 2017

Yep, just verified it manually as well. As soon as you do anything else with the session, the park timeout is cancelled. I'm going to write a formal test to demonstrate this.

goodboy changed the title from "Parked calls will remain if inbound socket process failed or die" to "Add a park_timeout to the example dialplan" Nov 30, 2017
goodboy pushed a commit that referenced this issue Nov 30, 2017
Verify that when a session is parked but not handled within the timeout
period (in this case 3 seconds as set in our CI dialplan) it is cancelled
by FS. In the contrary case verify that a simple `session.answer()` within
the timeout results in a successful SIPp `uac` client scenario.

Resolves #47
@goodboy
Member

goodboy commented Nov 30, 2017

@k4ml added a test case in PR #49 which verifies everything I've claimed against the latest FS docker image.

Hope that clears up all your questions.

@moises-silva
Member

moises-silva commented Nov 30, 2017

The park_timeout variable is used to calculate an expiration time for staying inside the park loop, not necessarily the park state. I think the problem with relying on it is that as soon as your application ends, you're back in the loop, and the expiry check will cause the park loop to end and hang up the channel.

The park loop only ends when:

  • The channel is hung up
  • The CF_PARK or CF_CONTROLLED flags are not set anymore
  • Various error conditions (e.g. failure to read I/O from the channel)
  • Park timeout

I might be missing something, but I don't think that timeout would be cleared by executing applications. However, it won't trigger while you're inside a long-running application (e.g. playback, bert, or whatever). It will trigger when that application ends (if it ends after the expiry time). If you execute a never-ending application such as endless_playback or bert, the timer will appear to never fire.
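The behaviour described above can be made concrete with a toy model. This is a hypothetical Python sketch of the semantics as described in this comment, not FreeSWITCH's switch_ivr_park code: the expiry is fixed once when the call enters park and is only checked while control is back in the park loop, i.e. between applications. App and park_loop_hangup_time are made-up names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class App:
    name: str
    starts_at: float  # seconds after the call was parked
    duration: float   # how long the app keeps control away from the park loop


def park_loop_hangup_time(apps: List[App], park_timeout: float) -> Optional[float]:
    """Toy model: when would the park loop hang up the channel?

    The expiry is computed once, on entering park, and is only checked
    while the loop is running (between applications).
    Returns None if an app never returns (duration == inf).
    """
    expires = park_timeout
    now = 0.0
    for app in sorted(apps, key=lambda a: a.starts_at):
        start = max(app.starts_at, now)  # a queued command waits for the previous app
        if start >= expires:
            return expires  # timed out while idle in the park loop
        now = start + app.duration
        if now == float("inf"):
            return None  # never-ending app: the timer appears to never fire
        if now >= expires:
            return now  # expiry noticed as soon as the app returns
    return max(now, expires)


apps = [App("answer", 1, 0), App("playback", 2, 10), App("playback", 12, 10)]
print(park_loop_hangup_time(apps, 5))
```

With a 5 second timeout, an answer at t=1 s and a 10 s playback at t=2 s yield a hangup at t=12 s, right after the first playback returns and before the second one runs.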

@moises-silva
Member

Probably the solution you're looking for is an activity timeout, e.g. quit if no commands have been received in X amount of time. It would basically be the same as the park timeout, except that it would be reset at the end of every executed command.

For outbound sockets it's different because the outbound socket explicitly clears the CF_CONTROLLED flag to exit the park loop when the socket goes down.
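The activity timeout suggested above is not an existing FreeSWITCH feature. The following is a hypothetical Python sketch of the proposed semantics, with an injectable clock so it can be exercised without actually waiting; ActivityTimeout and its methods are illustrative names only.

```python
import time
from typing import Callable


class ActivityTimeout:
    """Hypothetical activity timeout (not an existing FreeSWITCH feature).

    Expires if no command has completed within `timeout` seconds; unlike
    park_timeout, the deadline is pushed forward after every command.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.deadline = clock() + timeout

    def command_finished(self) -> None:
        # Reset the deadline each time a command finishes executing.
        self.deadline = self.clock() + self.timeout

    def expired(self) -> bool:
        return self.clock() >= self.deadline
```

The difference from park_timeout is only in command_finished(): the deadline is relative to the last completed command rather than to the moment the call was parked.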

@moises-silva
Member

Note that all of this would only be a workaround for a buggy call control server. Any call control server should restart if it dies (e.g. via a systemd unit restart) and either recover control of the sessions or hang up the old ones (depending on how much state you preserve after dying).
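A restarted call control server could find and hang up sessions orphaned by its predecessor roughly as follows. This sketch assumes the JSON shape of FreeSWITCH's "show channels as json" API output (a "rows" list whose entries carry uuid, created_epoch and application fields; check the field names against your FS version) and builds uuid_kill API commands to send over ESL. The helper names are hypothetical, and channel creation time is only a proxy for how long a call has been parked.

```python
import json
import time
from typing import List, Optional


def stale_parked_uuids(show_channels_json: str, max_age: float,
                       now: Optional[float] = None) -> List[str]:
    """UUIDs of channels still in the park app and older than max_age seconds.

    Assumed input shape (from `show channels as json`):
    {"row_count": N, "rows": [{"uuid": ..., "created_epoch": ..., "application": ...}]}
    """
    now = time.time() if now is None else now
    data = json.loads(show_channels_json)
    stale = []
    for row in data.get("rows") or []:  # "rows" may be absent when no channels exist
        if row.get("application") != "park":
            continue
        if now - int(row["created_epoch"]) > max_age:
            stale.append(row["uuid"])
    return stale


def kill_commands(uuids: List[str], cause: str = "NORMAL_CLEARING") -> List[str]:
    """Build `uuid_kill <uuid> [cause]` API commands to send over ESL."""
    return [f"uuid_kill {u} {cause}" for u in uuids]
```

Running this once at startup, before subscribing to new CHANNEL_PARK events, avoids racing against calls the new process is about to take over.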

goodboy changed the title from "Add a park_timeout to the example dialplan" to "Add a timeout to the example dialplan" Nov 30, 2017
@goodboy
Member

goodboy commented Nov 30, 2017

@moises-silva thanks for the in depth analysis! Getting the source details is super helpful.

In my proposed test it seems simply answering or bridging the session prevents the timeout as well. Maybe I should try un-parking the session and see if it times out?

Also consider the condition in that while statement, particularly switch_channel_ready(channel). If you dig down, it turns out to be a macro for switch_channel_test_ready called with (_channel, TRUE, FALSE), which is implemented in switch_channel.c and has a nasty if statement that may break the loop:

if (!channel->hangup_cause && channel->state > CS_ROUTING && channel->state < CS_HANGUP && channel->state != CS_RESET &&
    !switch_channel_test_flag(channel, CF_TRANSFER) && !switch_channel_test_flag(channel, CF_NOT_READY) &&
    !switch_channel_state_change_pending(channel)) {
    ret++;
}

In particular, shouldn't the !switch_channel_test_flag(channel, CF_NOT_READY) && !switch_channel_state_change_pending(channel) part fail (and so break the loop) if the channel is operated on by some other command?
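For reference, here is the quoted condition transcribed into Python, to make explicit which inputs flip the readiness check to false. This is a hypothetical model: only the checks in the quoted if are included (the real switch_channel_test_ready also handles the two boolean arguments it is called with), and the State ordering is assumed to mirror switch_channel_state_t.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Set


class State(IntEnum):
    # Assumed to follow the order of switch_channel_state_t; the checks
    # below only depend on CS_ROUTING < CS_PARK < CS_RESET < CS_HANGUP.
    CS_NEW = 0
    CS_INIT = 1
    CS_ROUTING = 2
    CS_SOFT_EXECUTE = 3
    CS_EXECUTE = 4
    CS_EXCHANGE_MEDIA = 5
    CS_PARK = 6
    CS_CONSUME_MEDIA = 7
    CS_HIBERNATE = 8
    CS_RESET = 9
    CS_HANGUP = 10


@dataclass
class Channel:
    state: State
    hangup_cause: int = 0  # 0: no hangup cause set yet
    flags: Set[str] = field(default_factory=set)  # e.g. {"CF_TRANSFER"}
    state_change_pending: bool = False


def channel_ready(ch: Channel) -> bool:
    """Transcription of the quoted `if` from switch_channel.c (other checks omitted)."""
    return (
        not ch.hangup_cause
        and State.CS_ROUTING < ch.state < State.CS_HANGUP
        and ch.state != State.CS_RESET
        and "CF_TRANSFER" not in ch.flags
        and "CF_NOT_READY" not in ch.flags
        and not ch.state_change_pending
    )
```

In this model a channel in CS_EXECUTE with a pending state change is reported as not ready, which is consistent with the idea that another command operating on the channel can make the park loop bail.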

@goodboy
Member

goodboy commented Nov 30, 2017

Yeah, so if I understand this correctly, the hangup via park_timeout can only occur while you're still inside that while loop. This seems to be confirmed by the variable not being read anywhere else in the code base:

 >>> git grep park_timeout
src/mod/endpoints/mod_sofia/sofia.c:                                            switch_channel_set_variable(channel_b, "park_timeout", "2:attended_transfer");
src/mod/endpoints/mod_sofia/sofia.c:                            switch_channel_set_variable(channel, "park_timeout", "600:blind_transfer");
src/mod/endpoints/mod_verto/mod_verto.c:                switch_channel_set_variable(b_tech_pvt->channel, "park_timeout", "2:attended_transfer");
src/switch_core_state_machine.c:                        switch_channel_set_variable(session->channel, "park_timeout", "10:blind_transfer");
src/switch_ivr.c:       if ((to = switch_channel_get_variable(channel, "park_timeout"))) {
src/switch_ivr.c:               switch_channel_set_variable(channel, "park_timeout", NULL);
src/switch_ivr_bridge.c:                switch_channel_set_variable(channel, "park_timeout", "3");

But maybe I'm missing something?
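The values set in the grep output above all follow a "<seconds>[:<hangup cause>]" pattern (e.g. "3" or "2:attended_transfer"), as does the "5:DESTINATION_OUT_OF_ORDER" value in the dialplan posted in this thread. A small parser for that format can help when validating a dialplan or channel variable from the ESL side. The format is inferred from those values rather than from FS documentation, parse_park_timeout is a hypothetical helper, and the cause is kept as a plain string rather than mapped to a FreeSWITCH cause code.

```python
from typing import NamedTuple, Optional


class ParkTimeout(NamedTuple):
    seconds: int
    cause: Optional[str]  # None when there is no ":<cause>" suffix


def parse_park_timeout(value: str) -> ParkTimeout:
    """Parse a park_timeout value such as "5:DESTINATION_OUT_OF_ORDER"."""
    seconds_str, sep, cause = value.partition(":")
    return ParkTimeout(int(seconds_str), cause if sep and cause else None)
```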

@moises-silva
Member

moises-silva commented Nov 30, 2017

Yeah, agreed. I thought switch_channel_ready() only checked for hangup and media I/O conditions, but it seems a state change also makes it bail. Now I'm curious what happens after the last application executes.

@k4ml
Author

k4ml commented Dec 1, 2017

@moises-silva In my test, where I didn't hang up after a playback, the call still got hung up when the playback ended, with the DESTINATION_OUT_OF_ORDER cause, which I think confirms it came from the park_timeout in the dialplan.

@k4ml
Author

k4ml commented Dec 1, 2017

So I tested executing playback twice. With park_timeout set, the call got hung up after the first playback.

@goodboy
Member

goodboy commented Dec 1, 2017

@k4ml did you answer the call or put it in another state before executing playback?
Can you share the dialplan you're using?

@k4ml
Author

k4ml commented Dec 1, 2017

@tgoodlet The call was answered, and I tested with the same dialplan switchio uses:

<?xml version="1.0" encoding="utf-8"?>
<include>
<!-- A context for relinquishing control of all calls to switchio, the inbound ESL client -->
<context name="public">
  <!-- Park call and transfer control to esl -->
  <extension name="switchiopark">
    <condition field="destination_number" expression="^(.*)$">
      <action application="set" data="park_timeout=5:DESTINATION_OUT_OF_ORDER"/>
      <action application="park"/>
    </condition>
  </extension>
</context>
</include>

Btw I'm not using switchio here but my own esl lib. I'll test with switchio later on.

@goodboy
Member

goodboy commented Dec 1, 2017

@k4ml no, I meant: what is your ESL app doing after handling the CHANNEL_PARK event?
Do you execute session.playback('blah') right away, or do you call session.answer() first?

Also, if you'd like quicker feedback on this, join our Riot room to chat.

@k4ml
Author

k4ml commented Dec 1, 2017

@tgoodlet I executed session.answer() first. Here is the code snippet:

if commands.name == 'playback':
    if not _credits_enough(call_data['nibble_rate']):
        return
    sound_url = commands.args[0]
    session.answer()
    playback = session.playback(sound_url, **call_data)
    if playback.stop():
        session.playback(sound_url, **call_data)

@goodboy
Member

goodboy commented Dec 1, 2017

@k4ml hmm, I wonder if it matters whether you call session.playback() only after the answer has completed,
i.e. after waiting for CHANNEL_ANSWER to arrive, because that's what my test does.

I'll try the test I have with the playback like you have.

@k4ml
Author

k4ml commented Dec 1, 2017

@tgoodlet You mean I should wait for CHANNEL_EXECUTE_COMPLETE after executing session.answer() before proceeding with playback?

@goodboy
Member

goodboy commented Dec 1, 2017

@k4ml maybe, I'm not sure. I know that in switchio, when we do await sess.answer(), under the hood we wait for the "CHANNEL_ANSWER" event.

Let me try what you're doing before going off on a tangent trying to prove my theory correct heh.
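The "await until the event arrives" pattern mentioned above can be sketched with plain asyncio. This is not switchio's implementation; EventWaiter, dispatch() and send_cmd are hypothetical names. The point is that the waiter is registered before the command is sent, so playback is only issued once CHANNEL_ANSWER has actually been received.

```python
import asyncio
from collections import defaultdict
from typing import Any, Callable, Dict, List


class EventWaiter:
    """Let coroutines await the next occurrence of a named ESL event (sketch)."""

    def __init__(self) -> None:
        self._waiters: Dict[str, List[asyncio.Future]] = defaultdict(list)

    def wait_for(self, name: str) -> "asyncio.Future":
        fut = asyncio.get_running_loop().create_future()
        self._waiters[name].append(fut)
        return fut

    def dispatch(self, name: str, event: Any) -> None:
        # Called by the (hypothetical) ESL reader for every incoming event.
        for fut in self._waiters.pop(name, []):
            if not fut.done():
                fut.set_result(event)


async def answer_then_play(waiter: EventWaiter, send_cmd: Callable[[str], None]) -> None:
    answered = waiter.wait_for("CHANNEL_ANSWER")  # register before sending
    send_cmd("answer")
    await answered  # resume only once the answer is confirmed
    send_cmd("playback")
```

Registering before sending avoids a race where the event arrives between sending the command and starting to wait for it.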

@goodboy
Member

goodboy commented Dec 1, 2017

@k4ml ok, so I was able to replicate the situation you describe, where after playback the park timeout cause is used to hang up the call, although I can't seem to reproduce that behaviour consistently.

I'm going to investigate a little further.

goodboy pushed a commit that referenced this issue Dec 3, 2017
@goodboy
Member

goodboy commented Dec 3, 2017

Further progress on this: I found that FS core exhibits unreliable uuid_broadcast behaviour, so I've deprecated its usage as part of #52. I now have playback after park working again, and it seems I never receive a PLAYBACK_STOP event until I manually kill the playback app using uuid_break. Once I do, I see the same situation as @k4ml: the park_timeout logic activates and the session is torn down with the coded hangup cause. Luckily, for now, if uuid_break is never called (e.g. via Session.breakmedia() in switchio), the session stays in the playback app and the park_timeout never activates.

@moises-silva I personally think this is incorrect behaviour, and FS core should move this park_timeout logic further down inside switch_ivr_park, to the end of the function, so that incoming events are processed before a timeout can occur. Do you think it's worth proposing to the core team? I also think park_timeout should be a timer that is reset each time the park loop is re-entered.
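The two suggestions above (process queued commands before the expiry check, and restart the timer whenever the park loop is re-entered) can be compared in a toy model. This is a hypothetical sketch of the proposal, not FreeSWITCH code; proposed_hangup_time and its inputs, (arrival time, duration) pairs in seconds after the call was parked, are made up for illustration. A command arriving exactly when the previous app returns is treated as already queued.

```python
from typing import List, Tuple


def proposed_hangup_time(commands: List[Tuple[float, float]], park_timeout: float,
                         reset_on_reentry: bool = True) -> float:
    """Toy model of the proposed park_timeout semantics (hypothetical).

    A command that arrives while another app is running (or exactly when it
    returns) is processed before the expiry is checked.
    """
    now = 0.0
    deadline = park_timeout
    for arrives_at, duration in sorted(commands):
        if arrives_at > now and arrives_at >= deadline:
            return max(now, deadline)  # nothing queued: idle in park until expiry
        now = max(arrives_at, now) + duration
        if reset_on_reentry:
            deadline = now + park_timeout  # loop re-entered: restart the timer
    return max(now, deadline)


# answer at t=1, playback at t=2 (10 s), second playback queued at t=12 (10 s)
print(proposed_hangup_time([(1, 0), (2, 10), (12, 10)], 5))
```

With both changes, the double playback from this thread survives and the call is only torn down 5 seconds after the last command; with only the reordering (reset_on_reentry=False), the call survives the second playback but is hung up as soon as it ends.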

@k4ml
Author

k4ml commented Dec 3, 2017

The behavior I noticed above is the same with this switchio snippet:

from switchio.apps.routers import Router

router = Router(guards={
    'Call-Direction': 'inbound',
    },
    subscribe=('PLAYBACK_STOP',)
    )

@router.route('(.*)')
async def welcome(sess, match, router):
    """Say hello to inbound calls.
    """
    await sess.answer()  # resumes once call has been fully answered
    sess.log.info("Answered call to {}".format(match.groups(0)))

    sess.playback('media.mp3') # non-blocking
    sess.log.info("Playing welcome message")
    await sess.recv("PLAYBACK_STOP")
    sess.playback('media.mp3') # non-blocking
    sess.log.info("Playing again ...")
    await sess.recv("PLAYBACK_STOP")

    await sess.hangup()  # resumes once call has been fully hungup
    sess.log.info("%s hangup" % sess.uuid)

With park_timeout set, the call is hung up after the first playback with the coded hangup cause. This is on the router_extra_subscribe branch.

goodboy pushed a commit that referenced this issue Dec 4, 2017
goodboy pushed a commit that referenced this issue Dec 4, 2017
@goodboy
Member

goodboy commented Dec 4, 2017

@k4ml does the file media.mp3 actually exist on your FS minion? I have seen that if a file fails to play back, the park_timeout kicks in. I'll bet you'll see errors in the FS log and then the teardown due to the timeout.

@k4ml
Author

k4ml commented Dec 4, 2017

@tgoodlet oh, sorry: media.mp3 is just a placeholder for a real file accessed via HTTP. I can confirm the media is played (I can hear it), and there are no errors in the freeswitch log either.

@goodboy
Member

goodboy commented Dec 4, 2017

@k4ml yeah, so looking at the core FS code more, I think we'll need to propose a patch to core to make this work the way we want. I'm happy to do this, I'm just not sure when I'll next have some time; hopefully this week.

goodboy pushed a commit that referenced this issue Dec 5, 2017
goodboy pushed a commit that referenced this issue Dec 5, 2017
goodboy pushed a commit that referenced this issue Dec 6, 2017
goodboy pushed a commit that referenced this issue Dec 6, 2017
goodboy pushed a commit that referenced this issue Dec 12, 2017
goodboy pushed a commit that referenced this issue Dec 12, 2017