Getting "Operation failed" on random customers (iOS) #504

Open
QuentinFarizon opened this issue Mar 15, 2023 · 11 comments

@QuentinFarizon

DFU Bootloader version (please complete the following information):

  • SDK version: SDK 16
  • Bonding used: no
  • Library version: 4.11.1

Device information (please complete the following information):

  • Device: iPad13,4, iPhone10,4, iPhone10,6, iPhone11,2, iPhone11,8, iPhone12,1, iPhone12,3, iPhone12,8, iPhone13,2, iPhone13,3, iPhone13,4, iPhone14,2, iPhone14,3, iPhone15,2, iPhone15,3, iPhone8,1
  • OS: iOS 14.4, iOS 14.8.1, iOS 15.0, iOS 15.1, iOS 15.1.1, iOS 15.2.1, iOS 15.3, iOS 15.3.1, iOS 15.4.1, iOS 15.5, iOS 15.6, iOS 15.6.1, iOS 16.0, iOS 16.0.2, iOS 16.2, iOS 16.3, iOS 16.3.1

Describe the bug
Among thousands of iOS customers who have performed tens of thousands of DFUs, 25 get the error "Operation failed". The documentation does not provide much information about what fails. I'm updating only the app.
For these customers it is not random: it always fails with this error.

@QuentinFarizon QuentinFarizon changed the title Getting Operation failed on random customers Getting "Operation failed" on random customers Mar 15, 2023
@QuentinFarizon QuentinFarizon changed the title Getting "Operation failed" on random customers Getting "Operation failed" on random customers (iOS) Mar 15, 2023
@philips77
Member

Hello,
Sorry for the very late response. We were busy with other projects and missed this one.

Let me go through the code with you.

All occurrences of "Operation failed" in the library are listed here: Result of search for "Operation failed".

You wrote that the device is based on nRF5 SDK v16, which means you're using Secure DFU.
In this DevZone case you wrote that the error happens while enabling DFU mode:

Among thousands of customers on iOS, that did tens of thousands of DFU, 25 of our customers get error "Operation failed" when applying DFU, after steps deviceConnecting -> dfuProcessStarting -> enablingDfuMode -> Operation failed

That would indicate that the error you see is this one:

/// The operation failed.
case operationFailed = 0x04

This error is returned by the device when a command to switch to bootloader mode is sent. The indication containing the error is handled here:

guard dfuResponse.status == .success else {
    logger.e("Error \(dfuResponse.status.code): \(dfuResponse.status)")
    let type = isExperimental ?
        DFURemoteError.experimentalButtonless :
        DFURemoteError.buttonless
    report?(dfuResponse.status.error(ofType: type), dfuResponse.status.description)
    return
}

The dfuResponse.status.error(ofType: type) is defined here:

func error(ofType remoteError: DFURemoteError) -> DFUError {
    return remoteError.with(code: code)
}

which calls this method:

internal enum DFURemoteError : Int {
    case legacy                 = 0
    case secure                 = 10
    case secureExtended         = 20
    case buttonless             = 90
    case experimentalButtonless = 9000

    /// Returns a representative ``DFUError``
    ///
    /// The only available codes that this method is called with are
    /// hardcoded in the library (ButtonlessDFU, DFUControlPoint,
    /// SecureDFUControlPoint). But, we have seen crashes so,
    /// we are returning ``DFUError.unsupportedResponse`` if a code is not found.
    func with(code: UInt8) -> DFUError {
        return DFUError(rawValue: Int(code) + rawValue) ?? .unsupportedResponse
    }
}

As the remote error type is .buttonless we're adding 90 and getting:
case remoteButtonlessDFUOperationFailed = 94 // 90 + 4
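
The offset arithmetic can be checked outside the library. The following is a minimal C sketch, not library code: the enum and function names are hypothetical stand-ins that mirror the Swift `DFURemoteError` raw values quoted above, and only values shown in this thread are used.

```c
#include <stdint.h>

/* Hypothetical C mirror of the Swift DFURemoteError raw values quoted above. */
enum remote_error_base {
    REMOTE_LEGACY                  = 0,
    REMOTE_SECURE                  = 10,
    REMOTE_SECURE_EXTENDED         = 20,
    REMOTE_BUTTONLESS              = 90,
    REMOTE_EXPERIMENTAL_BUTTONLESS = 9000
};

/* Status byte reported by the device; 0x04 is "operation failed". */
#define STATUS_OPERATION_FAILED 0x04u

/* Same arithmetic as with(code:): base offset plus the device status code. */
static int remote_error_code(enum remote_error_base base, uint8_t status)
{
    return (int)base + (int)status;
}
```

Calling `remote_error_code(REMOTE_BUTTONLESS, STATUS_OPERATION_FAILED)` gives 94, the raw value of `remoteButtonlessDFUOperationFailed`.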

I assume this is the error you're getting. The description of the error is defined here:

This error is returned from the device when it is unable to jump to bootloader mode. The library correctly passes it to the user and terminates DFU process, as it cannot proceed.

Now, let's think of why the error is returned in the first place and where it originates from. For that, we need to switch the project to nRF5 SDK.

@QuentinFarizon
Author

Hello @philips77
What do you mean by "switch the project to nRF5 SDK" ?

Please note that we cannot reproduce this error on any of the boards we have in our office, nor with any iOS phone. It only happens for a small number of customers.

Please note also that every time a customer with the issue on iOS tried to perform the DFU with an Android phone instead, it worked.

@philips77
Member

What do you mean by "switch the project to nRF5 SDK" ?

That means I need to switch from GitHub / Android project to nRF5 SDK to follow the root cause there.

Here's the documentation of the Buttonless Service, which is used to switch to the bootloader mode:
https://infocenter.nordicsemi.com/topic/sdk_nrf5_v16.0.0/service_dfu.html

where we can see this:
image
As usual, generic errors won't tell us much. We have to go to the source code.

@philips77
Member

philips77 commented Sep 19, 2023

The code that handles the command is in the ble_dfu_unbonded.c file:

void ble_dfu_buttonless_on_ctrl_pt_write(ble_gatts_evt_write_t const * p_evt_write)
{
    uint32_t err_code;
    ble_dfu_buttonless_rsp_code_t rsp_code = DFU_RSP_OPERATION_FAILED;

    // Start executing the control point write operation
    /*lint -e415 -e416 -save "Out of bounds access"*/
    switch (p_evt_write->data[0])
    {
        case DFU_OP_ENTER_BOOTLOADER:
            err_code = enter_bootloader();
            if (err_code == NRF_SUCCESS)
            {
                rsp_code = DFU_RSP_SUCCESS;
            }
            else if (err_code == NRF_ERROR_BUSY)
            {
                rsp_code = DFU_RSP_BUSY;
            }
            break;

        case DFU_OP_SET_ADV_NAME:
            if(    (p_evt_write->data[1] > NRF_DFU_ADV_NAME_MAX_LENGTH)
                || (p_evt_write->data[1] == 0))
            {
                // New advertisement name too short or too long.
                rsp_code = DFU_RSP_ADV_NAME_INVALID;
            }
            else
            {
                memcpy(m_adv_name.name, &p_evt_write->data[2], p_evt_write->data[1]);
                m_adv_name.len = p_evt_write->data[1];
                err_code = set_adv_name(&m_adv_name);
                if (err_code == NRF_SUCCESS)
                {
                    rsp_code = DFU_RSP_SUCCESS;
                }
            }
            break;

        default:
            rsp_code = DFU_RSP_OP_CODE_NOT_SUPPORTED;
            break;
    }
    /*lint -restore*/


    // Report back in case of error
    if (rsp_code != DFU_RSP_SUCCESS)
    {
        err_code = ble_dfu_buttonless_resp_send((ble_dfu_buttonless_op_code_t)p_evt_write->data[0], rsp_code);
        if (err_code != NRF_SUCCESS)
        {
            mp_dfu->evt_handler(BLE_DFU_EVT_RESPONSE_SEND_ERROR);

        }
        // Report the error to the main application
        mp_dfu->evt_handler(BLE_DFU_EVT_BOOTLOADER_ENTER_FAILED);
    }
}

I suspect the error may be returned from enter_bootloader().
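
The reason any unexpected result ends up as "operation failed" is the default value of rsp_code in the handler above. A minimal sketch of just that mapping, under stated assumptions: the `SK_` constants and `rsp_for_enter_bootloader` are hypothetical stand-ins, and the numeric values other than 0x04 (confirmed by `operationFailed = 0x04` earlier in this thread) should be checked against your SDK headers.

```c
#include <stdint.h>

/* Stand-in values (assumptions); the real definitions live in the nRF5 SDK headers. */
#define SK_NRF_SUCCESS    0u
#define SK_NRF_ERROR_BUSY 17u

typedef enum {
    SK_DFU_RSP_SUCCESS          = 0x01, /* assumed value */
    SK_DFU_RSP_OPERATION_FAILED = 0x04, /* matches operationFailed = 0x04 */
    SK_DFU_RSP_BUSY             = 0x06  /* assumed value */
} sk_rsp_code_t;

/* Mirrors the DFU_OP_ENTER_BOOTLOADER branch: the response starts out as
 * "operation failed" and is only overwritten for the two handled results. */
static sk_rsp_code_t rsp_for_enter_bootloader(uint32_t err_code)
{
    sk_rsp_code_t rsp_code = SK_DFU_RSP_OPERATION_FAILED;

    if (err_code == SK_NRF_SUCCESS)
    {
        rsp_code = SK_DFU_RSP_SUCCESS;
    }
    else if (err_code == SK_NRF_ERROR_BUSY)
    {
        rsp_code = SK_DFU_RSP_BUSY;
    }
    return rsp_code;
}
```

Under this model, every error other than "busy" is reported to the phone as the same generic status, so the actual cause is lost unless the firmware logs err_code itself.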

@philips77
Member

This is the source code of this method:

/**@brief Function for entering the bootloader.
 */
static uint32_t enter_bootloader()
{
    uint32_t err_code;

    if (mp_dfu->is_waiting_for_svci)
    {
        // We have an ongoing async operation. Entering bootloader mode is not possible at this time.
        err_code = ble_dfu_buttonless_resp_send(DFU_OP_ENTER_BOOTLOADER, DFU_RSP_BUSY);
        if (err_code != NRF_SUCCESS)
        {
            mp_dfu->evt_handler(BLE_DFU_EVT_RESPONSE_SEND_ERROR);
        }
        return NRF_SUCCESS;
    }

    // Set the flag indicating that we expect DFU mode.
    // This will be handled on acknowledgement of the characteristic indication.
    mp_dfu->is_waiting_for_reset = true;

    err_code = ble_dfu_buttonless_resp_send(DFU_OP_ENTER_BOOTLOADER, DFU_RSP_SUCCESS);
    if (err_code != NRF_SUCCESS)
    {
        mp_dfu->is_waiting_for_reset = false;
    }

    return err_code;
}

The only way this method fails is when ble_dfu_buttonless_resp_send returns a non-zero error.
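
To make the paths explicit, here is a simplified model of the function above. It is a sketch, not SDK code: `enter_bootloader_model` and its parameters are hypothetical, and `send_result` stands in for whatever ble_dfu_buttonless_resp_send returns for the success indication.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of enter_bootloader(); not SDK code.
 * send_result stands in for the value returned by
 * ble_dfu_buttonless_resp_send(DFU_OP_ENTER_BOOTLOADER, DFU_RSP_SUCCESS). */
static uint32_t enter_bootloader_model(bool waiting_for_svci,
                                       uint32_t send_result,
                                       bool *waiting_for_reset)
{
    if (waiting_for_svci)
    {
        /* Busy path: the real function sends DFU_RSP_BUSY itself
         * and then returns success (0) to its caller. */
        return 0;
    }

    /* Normal path: expect a reset once the indication is acknowledged. */
    *waiting_for_reset = true;
    if (send_result != 0)
    {
        /* Sending the indication failed, so no reset will follow. */
        *waiting_for_reset = false;
    }
    return send_result;
}
```

In this model the only non-zero return value is the one propagated from the send call, which is why the next step is to look at that function.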

@philips77
Member

Going deeper, this is its source:

uint32_t ble_dfu_buttonless_resp_send(ble_dfu_buttonless_op_code_t op_code, ble_dfu_buttonless_rsp_code_t rsp_code)
{
    // Send indication
    uint32_t                err_code;
    const uint16_t          len = MAX_CTRL_POINT_RESP_PARAM_LEN;
    uint16_t                hvx_len;
    uint8_t                 hvx_data[MAX_CTRL_POINT_RESP_PARAM_LEN];
    ble_gatts_hvx_params_t  hvx_params;

    memset(&hvx_params, 0, sizeof(hvx_params));

    hvx_len     = len;
    hvx_data[0] = DFU_OP_RESPONSE_CODE;
    hvx_data[1] = (uint8_t)op_code;
    hvx_data[2] = (uint8_t)rsp_code;

    hvx_params.handle = m_dfu.control_point_char.value_handle;
    hvx_params.type   = BLE_GATT_HVX_INDICATION;
    hvx_params.offset = 0;
    hvx_params.p_len  = &hvx_len;
    hvx_params.p_data = hvx_data;

    err_code = sd_ble_gatts_hvx(m_dfu.conn_handle, &hvx_params);
    if ((err_code == NRF_SUCCESS) && (hvx_len != len))
    {
        err_code = NRF_ERROR_DATA_SIZE;
    }

    return err_code;
}

The next suspect is the sd_ble_gatts_hvx(...) method.

@philips77
Member

As this method is implemented in the SoftDevice, the only thing we can do is point to the documentation:
https://infocenter.nordicsemi.com/index.jsp?topic=%2Fcom.nordic.infocenter.s140.api.v6.0.0%2Fgroup___b_l_e___g_a_t_t_s___f_u_n_c_t_i_o_n_s.html&anchor=ga313fe43c2e93267da668572e885945db

Note
I don't know which SD you're using, assuming s140 v 6.0.0, but you may search for your specific version.
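
To narrow this down on the firmware side, one option is to log the raw value returned by sd_ble_gatts_hvx before it is collapsed into the generic response. A minimal sketch, assuming the generic error numbering of nrf_error.h: the numeric values below are assumptions to verify against your SDK headers, the list is deliberately incomplete, and `hvx_error_name` is a hypothetical helper, not an SDK function.

```c
#include <stdint.h>

/* Hypothetical helper: turns a few generic nRF error codes into names
 * for logging. Values are assumed from nrf_error.h; verify them against
 * the headers of the SDK and SoftDevice actually in use. */
static const char *hvx_error_name(uint32_t err_code)
{
    switch (err_code)
    {
        case 0:  return "NRF_SUCCESS";
        case 8:  return "NRF_ERROR_INVALID_STATE";
        case 12: return "NRF_ERROR_DATA_SIZE";
        case 17: return "NRF_ERROR_BUSY";
        case 19: return "NRF_ERROR_RESOURCES";
        default: return "other (see the SoftDevice documentation)";
    }
}
```

Logging the name next to the numeric code at the point where the indication is sent would show which of the documented return values the affected devices actually hit.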

@philips77
Member

As I'm not an embedded developer and have no experience with C, I will leave this for the Support Team to answer in the DevZone case.

@QuentinFarizon
Author

QuentinFarizon commented Sep 26, 2023

@philips77 I don't see how this is closed. The issue is still here and is probably due to this library, as it doesn't occur with Android.

@QuentinFarizon
Author

QuentinFarizon commented Jan 8, 2024

Hello @philips77, stating again that this shouldn't be closed. I don't see how this is not an issue with the library, given that the affected boards are updated correctly by Android phones.

This issue is hitting us hard, with more and more customers who just can't perform a DFU. It's one year old, and there is still no actionable info.

@philips77 philips77 reopened this Jan 9, 2024
@philips77
Member

Hello @QuentinFarizon,
I replied in the DevZone case.
