Skip to content

BLE peripheral PHY update hangs occasionally when running WSF/FreeRTOS #1497

@khpeterson

Description

@khpeterson

MSDK version 0959f1e (but no changes in Libraries/Cordio relative to today's origin/HEAD) running on a vanilla MAX32666FTHR connecting with a Qualcomm based BlueZ host (Particle Tachyon).

Normal flow for a PHY update looks like this:

lctrSlvLlcpExecutePhyUpdateSm: handle=0, llcpState=IDLE, phyUpdState=0, event=2
>>> Data Path Tx Paused, handle=0 <<<
lctrSlvLlcpExecutePhyUpdateSm: handle=0, llcpState=BUSY, phyUpdState=3, event=6
>>> Data Path Tx Unpaused/Resumed, handle=0 <<<

But occasionally, the PHY update will not Unpause/Resume because the ARQ_Q_FLUSH message sent by lctrSlvLlcpExecutePhyUpdateSm is discarded. That looks like this:

lctrSlvLlcpExecutePhyUpdateSm: handle=0, llcpState=IDLE, phyUpdState=0, event=2
>>> Data Path Tx Paused, handle=0 <<<
Unhandled LLCP SM event, handle=0, event=42

If there is no subsequent ARQ_Q_FLUSH message sent by a pending lctrConnTxCompletedHandler, then the PHY update will not happen and communication with the host will stop.

The call tree that initiates the initial ARQ_Q_FLUSH message send looks like this:

prvWSFHndTask
  -> LlHandler
    -> LctrEventHandler
      -> lctrConnRxPendingHandler
        -> lctrSlvEncProcessDataPdu
          -> lctrSlvConnExecuteSm
            -> lctrSlvLlcpExecuteSm
              -> lctrSlvLlcpExecutePhyUpdateSm
                -> lctrExecAction
                  -> lctrActPeerPhyReq
                    -> lctrActFlushArq
                      -> lctrCheckPauseComplete
                        -> lctrSendConnMsg
                          -> WsfMsgSend(ARQ_Q_FLUSH)

And the relevant body of lctrSlvLlcpExecutePhyUpdateSm looks like:

bool_t lctrSlvLlcpExecutePhyUpdateSm(lctrConnCtx_t *pCtx, uint8_t event)
{
  if ((event = lctrRemapEvent(pCtx, event)) == LCTR_PU_EVENT_INVALID)
  {
    return FALSE;
  }

  switch (pCtx->llcpState)
  {
    case LCTR_LLCP_STATE_IDLE:
      LL_TRACE_INFO3("lctrSlvLlcpExecutePhyUpdateSm: handle=%u, llcpState=IDLE, phyUpdState=%u, event=%u", LCTR_GET_CONN_HANDLE(pCtx), pCtx->phyUpdState, event);

      lctrExecAction(pCtx, event);

      if (pCtx->phyUpdState != LCTR_PU_STATE_IDLE)
      {
        pCtx->llcpState = LCTR_LLCP_STATE_BUSY;
        pCtx->llcpActiveProc = LCTR_PROC_PHY_UPD;
        pCtx->llcpInstantComp = FALSE;
      }
      break;

    case LCTR_LLCP_STATE_BUSY:
      LL_TRACE_INFO3("lctrSlvLlcpExecutePhyUpdateSm: handle=%u, llcpState=BUSY, phyUpdState=%u, event=%u", LCTR_GET_CONN_HANDLE(pCtx), pCtx->phyUpdState, event);

Note how the state variables llcpState, llcpActiveProc and llcpInstantComp are updated in the llcp IDLE state AFTER lctrExecAction returns.

With the WSF/FreeRTOS implementation, when WsfMsgSnd(ARQ_Q_FLUSH) is called, the WSFHndTask (priority=5) will be preempted to run the WSFMsgTask (priority=6) which then re-runs lctrSlvLlcpExecutePhyUpdateSm. In this pass lctrRemapEvent, will only remap and forward the event if llcpActiveProc == LCTR_PROC_PHY_UPD:

    case LCTR_CONN_ARQ_Q_FLUSHED:
      if (pCtx->llcpActiveProc == LCTR_PROC_PHY_UPD)
      {
        return LCTR_PU_EVENT_ARQ_FLUSHED;
      }
      break;
      .
      .
      .
    default:
      break;
  }
  return LCTR_PU_EVENT_INVALID;

But since llcpActiveProc has not been update yet, lctrSlvLlcpExecutePhyUpdateSm early exits and results in the "Unhandled LLCP SM event" message and a hung PHY update.

Bottom line, the state variables and especially llcpActiveProc need to be set before the ARQ_Q_FLUSH message handler is dispatched.

Since WSF/baremetal does not allow pre-emption between "tasks" this is not an issue.

I have a fairly simple PR fix in mind and will post soon.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions