Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement: avoid isSanitized feature from breaking HTTP body JSON API params, in some rare scenario of params includes UTF8 chars #91

Open
rizdaprasetya opened this issue Dec 26, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@rizdaprasetya
Copy link
Collaborator

rizdaprasetya commented Dec 26, 2023

Affected Lines

Known sample, but recommended to search more comprehensively.

return substr($input, 0, $length);

$plus = substr($field, 0, 1) === '+';

https://github.com/search?q=repo%3AMidtrans%2Fmidtrans-php%20substr&type=code

Findings:

Reproducible evidence:

<?php
$anotherSampleStr = "Q-古典-吾家有喜A-半圆_ Q-古典-吾家有喜A-半圆 定制水晶绒【咨询客服】";

echo "// Sample \$str is a long utf-8 based string of chars: \n";
$str = "制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制制";
var_dump($str);

echo "// Length shorten using regular `substr(\$str,0,50)`, unexpectedly last chars is improperly truncated in the middle bytes and became invalid UTF-8 char: \n";
var_dump(substr($str,0,50));
echo "// Then JSON encoded, **UNEXPECTEDLY** produce empty JSON: \n";
var_dump(json_encode( substr($str,0,50) ));

echo "\n\n ---- Compared to: ---- \n\n";
echo "// Length shorten using specific `mb_substr(\$str,0,50)`: \n";
var_dump(mb_substr($str,0,50));
echo "- Then JSON encoded, **EXPECTED** succeed, BUT encoding changed: \n";
var_dump(json_encode( mb_substr($str,0,50) ));
echo "\n ---- \n";
echo "// Further, if JSON encoded using `JSON_UNESCAPED_UNICODE` param, expected succeed w/ perserved encoding: \n";
var_dump(json_encode( mb_substr($str,0,50),JSON_UNESCAPED_UNICODE ));

// tips: qucik run it using e.g. https://replit.com/@raizerde/Utf8EncodingSubstringIssue
?>

Issue Desc

When:

  • sanitization in which merchant's input params being sanitized, including being substr-ed,
  • then in the very rare scenario of the params includes UTF8 string of chars. (rare case because UTF8 usually used for non-latin char based country, which merchant is rarely from).

Unexpected:

  • the string's last char will be prematurely truncated, which unexpectedly result in an invalid UTF8 char.
  • then due to this problematic char will cause the string & params fail to be json_encode-ed, unexpectedly will result as false / empty JSON.
  • then it will cause empty HTTP request body to be sent to the API, & unexpected API response of e.g. Midtrans API is returning API error. HTTP status code: 500 API response: {"status_code":"500","status_message":"Sorry. Our system is recovering from unexpected issues. Please retry.","id":"c99ccd6b-3174-4b25-907a-bb8e6a1d586d"}

Further: Seems PHP does not natively support UTF-8 char encoding (Chinese chars are UTF-8, which is more bytes than regular ASCII which PHP support).

Expected:

  • substr & json_encode should success produce proper JSON, not empty.

Possible Solution:

[TBD] ... Probably use mb_substr to replace substr usage, but need further check for any possible downsides (e.g. apparently not all PHP server env enable MB extension, as it is an optional non-default extension).

  • Then maybe add warning about PHP limitation in using UTF-8 or other non ASCII string encoding.

Possible Workaround

  • Merchant can change config \Midtrans\Config::$isSanitized = true; to \Midtrans\Config::$isSanitized = false;
    • as the issue happens during Sanitization flow, disabling it will avoid the problematic flow. Although there will be downside.
@rizdaprasetya rizdaprasetya added the enhancement New feature or request label Dec 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant