-
-
Notifications
You must be signed in to change notification settings - Fork 182
Fix UTF8 encoding of invalid character #3191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
""" WalkthroughThe changes overhaul UTF-8 and UTF-16 string handling in core utility code, focusing on explicit validation, robust error handling, and correct character counting and conversion. The Changes
Sequence Diagram(s)sequenceDiagram
participant Font as CLR_GFX_Font
participant Helper as CLR_RT_UnicodeHelper
Font->>Helper: CountNumberOfCharacters()
Helper-->>Font: Returns accurate character count
Font->>Font: Iterate over string using correct character limit
sequenceDiagram
participant Helper as CLR_RT_UnicodeHelper
Note over Helper: UTF-8 to UTF-16 Conversion
Helper->>Helper: Validate input bytes
alt Valid sequence
Helper->>Helper: Convert to UTF-16 code unit(s)
else Invalid sequence
Helper->>Helper: Output U+FFFD replacement character
end
Helper-->>Caller: Return converted string or error
sequenceDiagram
participant String as corlib_native_System_String
participant Helper as CLR_RT_UnicodeHelper
String->>Helper: MatchString(inputIter, searchStr, searchCharLen)
Helper->>Helper: Compare input and search string char-by-char
alt All match
Helper-->>String: Return true
else Mismatch or end of input
Helper-->>String: Return false
end
Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes detected related to the linked issue objectives. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used🧠 Learnings (2)📓 Common learnings
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (11)
⏰ Context from checks skipped due to timeout of 90000ms (3)
🔇 Additional comments (8)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (3)
89-104
: Remove commented-out code.The commented code block is unnecessary and reduces readability.
else { // Advance over valid continuation bytes in invalid sequence - // int advance = 0; - // if (UTF8_VALID_CONTINUATION(pSrc[0])) { - // advance++; - // if (UTF8_VALID_CONTINUATION(pSrc[1])) { - // advance++; - // } - // } - // pSrc += advance; - // remaining -= advance; - pSrc += validCount; remaining -= validCount; }
277-286
: Remove commented-out code.Clean up the commented code for better readability.
int validCount = 0; if (UTF8_VALID_CONTINUATION(ch2)) validCount++; if (UTF8_VALID_CONTINUATION(ch3)) validCount++; if (validCount <2) { - //inputUTF8 += validCount; - //iMaxBytes -= validCount; goto invalid_sequence; }
329-334
: Remove commented-out code.Clean up the commented code.
if (validCount < 3) { // Not all continuation bytes are valid - //inputUTF8 += validCount; - //iMaxBytes -= validCount; goto invalid_sequence; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/CLR/CorLib/corlib_native_System_String.cpp
(1 hunks)src/CLR/Core/CLR_RT_UnicodeHelper.cpp
(7 hunks)src/nanoFramework.Graphics/Graphics/Core/Support/Fonts/Font.cpp
(2 hunks)
🧰 Additional context used
🧠 Learnings (3)
src/nanoFramework.Graphics/Graphics/Core/Support/Fonts/Font.cpp (2)
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-09-25T11:28:38.536Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-10-08T15:52:09.445Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
src/CLR/CorLib/corlib_native_System_String.cpp (1)
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3172
File: src/CLR/Core/TypeSystem.cpp:971-999
Timestamp: 2025-05-14T17:26:54.181Z
Learning: In the nanoFramework codebase, when handling HRESULT values from function calls like `parser.Advance(elem)`, it's preferable to use `if (parser.Advance(elem) != S_OK)` for local error handling rather than using the `NANOCLR_CHECK_HRESULT` macro, which would immediately propagate errors up the call stack.
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (8)
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3074
File: src/CLR/Core/GarbageCollector_Info.cpp:107-167
Timestamp: 2025-01-22T03:38:57.394Z
Learning: In nanoFramework's memory management code, DataSize() validation is comprehensively handled through CLR_RT_HeapCluster::ValidateBlock() and other caller code. Additional size checks in ValidateCluster() are redundant as the validation is already performed at multiple levels.
Learnt from: frobijn
PR: nanoframework/nf-interpreter#3023
File: CMake/Modules/FindNF_NativeAssemblies.cmake:95-106
Timestamp: 2024-10-08T15:52:09.445Z
Learning: The `AddCorLibAssemblyVersion` macro in `CMake/Modules/FindNF_NativeAssemblies.cmake` does not require additional error handling for regex failures because the associated *.cpp files contain warnings against modification. Any changes to these files should be accompanied by corresponding updates to the CMake file.
Learnt from: frobijn
PR: nanoframework/nf-interpreter#3023
File: CMake/Modules/FindNF_NativeAssemblies.cmake:95-106
Timestamp: 2024-09-23T17:56:37.570Z
Learning: The `AddCorLibAssemblyVersion` macro in `CMake/Modules/FindNF_NativeAssemblies.cmake` does not require additional error handling for regex failures because the associated *.cpp files contain warnings against modification. Any changes to these files should be accompanied by corresponding updates to the CMake file.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-10-08T15:52:09.445Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-09-25T11:28:38.536Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-10-12T19:00:39.000Z
Learning: When working with `nanoCLR_GetNativeAssemblyInformation`, fixed-size assembly names are required, so code that deals with variable-length names cannot be used.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-09-25T11:28:38.536Z
Learning: When working with `nanoCLR_GetNativeAssemblyInformation`, fixed-size assembly names are required, so code that deals with variable-length names cannot be used.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3190
File: src/CLR/Core/TypeSystem.cpp:0-0
Timestamp: 2025-06-26T09:16:55.184Z
Learning: In nanoFramework's CLR attribute parsing (src/CLR/Core/TypeSystem.cpp), the sentinel value 0xFFFF in string tokens represents a null string. When encountered, this should result in a true null reference (using SetObjectReference(nullptr)) rather than an empty string instance, and the boxing operation should be skipped via early return.
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: nf-interpreter (Nightly build) (Build_STM32_targets ST_NUCLEO64_F091RC)
- GitHub Check: nf-interpreter (Nightly build) (Build_STM32_targets ORGPAL_PALX)
- GitHub Check: nf-interpreter (Nightly build) (Build_STM32_targets ORGPAL_PALTHREE)
- GitHub Check: nf-interpreter (Nightly build) (Check_Build_Options)
- GitHub Check: nf-interpreter (Check_Build_Options)
🔇 Additional comments (10)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (6)
10-22
: Improved UTF-8 validation macros.The new positive validation macros are clearer and follow UTF-8 specifications correctly.
156-192
: Correct UTF-16 byte counting with proper surrogate handling.The implementation correctly handles surrogate pairs and invalid sequences, outputting appropriate byte counts for each case.
410-486
: Well-implemented UTF-8 backward traversal.The rewritten function correctly identifies UTF-8 sequence boundaries when moving backward and properly validates sequences.
488-637
: Robust UTF-16 to UTF-8 conversion with proper error handling.The rewritten function correctly handles surrogate pairs, unpaired surrogates, and buffer management. The replacement character (U+FFFD) is properly encoded for invalid sequences.
734-742
: Proper error handling for UTF-8 conversion failure.Good addition of error handling that properly cleans up allocated memory on conversion failure.
754-756
: Fixed memory leak in Release method.Setting
m_wCharArray
to NULL after release prevents use-after-free issues.src/nanoFramework.Graphics/Graphics/Core/Support/Fonts/Font.cpp (2)
158-158
: Correct use of actual UTF-8 character count.Using the actual character count from the UTF-8 string ensures accurate iteration limits for font rendering.
329-329
: Consistent character counting in width calculation.Good consistency with the StringOut method - both now use the accurate UTF-8 character count.
src/CLR/CorLib/corlib_native_System_String.cpp (2)
588-628
: New string matching helper looks correct.The
MatchString
function properly handles UTF-8 to UTF-16 conversion and surrogate pairs for accurate string comparison.
245-425
: Please verify the IndexOf helper implementation still existsI searched for the static helper declared in
src/CLR/CorLib/corlib_native.h
:
Header declares
static HRESULT IndexOf(CLR_RT_StackFrame &stack, int mode);
I could not locate its definition in any
.cpp
file undersrc/CLR/CorLib
Without that implementation, all the public
IndexOf
overloads would be broken.
Can you confirm whether the helper was moved or intentionally removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (3)
74-75
: Fix duplicate continuation byte check.The code checks
pSrc[0]
twice instead of checking both continuation bytes.- int validCount = 0; - if (UTF8_VALID_CONTINUATION(pSrc[0])) validCount++; - if (UTF8_VALID_CONTINUATION(pSrc[0]) && UTF8_VALID_CONTINUATION(pSrc[1])) validCount++; + int validCount = 0; + if (UTF8_VALID_CONTINUATION(pSrc[0])) validCount++; + if (validCount == 1 && UTF8_VALID_CONTINUATION(pSrc[1])) validCount++;
115-121
: Fix duplicate continuation byte checks for 4-byte sequences.Similar to the 3-byte case, this checks
pSrc[0]
multiple times instead of checking all three continuation bytes.- if (UTF8_VALID_CONTINUATION(pSrc[0])) validCount++; - if (UTF8_VALID_CONTINUATION(pSrc[0]) && UTF8_VALID_CONTINUATION(pSrc[1])) validCount++; - if (UTF8_VALID_CONTINUATION(pSrc[0]) && UTF8_VALID_CONTINUATION(pSrc[1]) && - UTF8_VALID_CONTINUATION(pSrc[2])) validCount++; + if (UTF8_VALID_CONTINUATION(pSrc[0])) validCount++; + if (validCount == 1 && UTF8_VALID_CONTINUATION(pSrc[1])) validCount++; + if (validCount == 2 && UTF8_VALID_CONTINUATION(pSrc[2])) validCount++;
324-327
: Fix duplicate continuation byte validation.The code checks
ch2
twice instead of checking different continuation bytes.- if (UTF8_VALID_CONTINUATION(ch2)) validCount++; - if (UTF8_VALID_CONTINUATION(ch2) && UTF8_VALID_CONTINUATION(ch3)) validCount++; - if (UTF8_VALID_CONTINUATION(ch2) &&UTF8_VALID_CONTINUATION(ch3) && - UTF8_VALID_CONTINUATION(ch4)) validCount++; + if (UTF8_VALID_CONTINUATION(ch2)) validCount++; + if (validCount == 1 && UTF8_VALID_CONTINUATION(ch3)) validCount++; + if (validCount == 2 && UTF8_VALID_CONTINUATION(ch4)) validCount++;
🧹 Nitpick comments (2)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (2)
31-31
: Consider potential integer overflow edge case.While
0x7fffffff
is a reasonable large value, consider usingINT_MAX
for consistency and clarity about the intent.- int remaining = (max < 0) ? 0x7fffffff : max; + int remaining = (max < 0) ? INT_MAX : max;
214-216
: Consider using INT_MAX for consistency.For better code clarity and consistency with standard practices.
- if (iMaxBytes == -1) { - iMaxBytes = INT_MAX; - } + if (iMaxBytes == -1) { + iMaxBytes = INT_MAX; + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp
(7 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3144
File: azure-pipelines-templates/download-install-cmake.yml:1-3
Timestamp: 2025-04-07T14:39:17.549Z
Learning: In the nanoframework/nf-interpreter repository, Unix-style line endings (\n) are not mandatory, even for YAML files, despite static analysis tools like YAMLlint flagging them as errors.
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (10)
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3074
File: src/CLR/Core/GarbageCollector_Info.cpp:107-167
Timestamp: 2025-01-22T03:38:57.394Z
Learning: In nanoFramework's memory management code, DataSize() validation is comprehensively handled through CLR_RT_HeapCluster::ValidateBlock() and other caller code. Additional size checks in ValidateCluster() are redundant as the validation is already performed at multiple levels.
Learnt from: frobijn
PR: nanoframework/nf-interpreter#3023
File: CMake/Modules/FindNF_NativeAssemblies.cmake:95-106
Timestamp: 2024-10-08T15:52:09.445Z
Learning: The `AddCorLibAssemblyVersion` macro in `CMake/Modules/FindNF_NativeAssemblies.cmake` does not require additional error handling for regex failures because the associated *.cpp files contain warnings against modification. Any changes to these files should be accompanied by corresponding updates to the CMake file.
Learnt from: frobijn
PR: nanoframework/nf-interpreter#3023
File: CMake/Modules/FindNF_NativeAssemblies.cmake:95-106
Timestamp: 2024-09-23T17:56:37.570Z
Learning: The `AddCorLibAssemblyVersion` macro in `CMake/Modules/FindNF_NativeAssemblies.cmake` does not require additional error handling for regex failures because the associated *.cpp files contain warnings against modification. Any changes to these files should be accompanied by corresponding updates to the CMake file.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3074
File: src/CLR/Core/GarbageCollector_Info.cpp:107-167
Timestamp: 2025-01-22T03:38:57.394Z
Learning: In CLR_RT_GarbageCollector::ValidateCluster, DataSize() validation is already handled by ValidateBlock() and other caller code, making additional size checks redundant.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3172
File: src/CLR/Core/TypeSystem.cpp:971-999
Timestamp: 2025-05-14T17:26:54.181Z
Learning: In the nanoFramework codebase, when handling HRESULT values from function calls like `parser.Advance(elem)`, it's preferable to use `if (parser.Advance(elem) != S_OK)` for local error handling rather than using the `NANOCLR_CHECK_HRESULT` macro, which would immediately propagate errors up the call stack.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-09-25T11:28:38.536Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-10-08T15:52:09.445Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-10-12T19:00:39.000Z
Learning: When working with `nanoCLR_GetNativeAssemblyInformation`, fixed-size assembly names are required, so code that deals with variable-length names cannot be used.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-09-25T11:28:38.536Z
Learning: When working with `nanoCLR_GetNativeAssemblyInformation`, fixed-size assembly names are required, so code that deals with variable-length names cannot be used.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3190
File: src/CLR/Core/TypeSystem.cpp:0-0
Timestamp: 2025-06-26T09:16:55.184Z
Learning: In nanoFramework's CLR attribute parsing (src/CLR/Core/TypeSystem.cpp), the sentinel value 0xFFFF in string tokens represents a null string. When encountered, this should result in a true null reference (using SetObjectReference(nullptr)) rather than an empty string instance, and the boxing operation should be skipped via early return.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: nf-interpreter (Nightly build) (Check_Build_Options)
- GitHub Check: nf-interpreter (Check_Build_Options)
🔇 Additional comments (7)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (7)
10-11
: LGTM: Improved UTF-8 validation macros.The change from negative to positive validation checks makes the code more explicit and readable.
173-185
: LGTM: Proper surrogate pair and invalid surrogate handling.The logic correctly handles valid surrogate pairs (4 bytes) and invalid/unpaired surrogates (3 bytes for replacement character).
374-387
: LGTM: Proper invalid sequence handling with replacement character.The use of U+FFFD (0xFFFD) as replacement character aligns with Unicode standards and .NET 8 behavior as stated in the PR objectives.
411-486
: LGTM: Robust backward UTF-8 navigation.The rewritten function properly handles UTF-8 sequence boundaries, validates continuation bytes, and ensures atomic character movement. The logic correctly treats invalid lead bytes as single characters.
534-608
: LGTM: Comprehensive surrogate pair handling.The code properly handles both valid surrogate pairs and unpaired surrogates by outputting replacement characters (U+FFFD) for invalid cases, aligning with Unicode standards.
734-742
: LGTM: Proper error handling in conversion.The function correctly resets the UTF-8 input before conversion and properly cleans up allocated memory on conversion failure, preventing memory leaks.
754-756
: LGTM: Consistent resource cleanup.The Release method properly nulls the pointer and resets the length, ensuring consistent state after cleanup.
Remove seqStart
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Automated fixes for code style.
…50ea-cc3d-4001-bfa7-f9c5cade556f Code style fixes for nanoframework/nf-interpreter PR#3191
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Albert, this is looking great. Well done!
Mostly style comments in the review. Nothing personal, just trying to keep the consistency and code quality. 😅
src/nanoFramework.Graphics/Graphics/Core/Support/Fonts/Font.cpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (7)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (7)
33-33
: Clarify variable name and magic number.The variable name
remainingChar
could be more descriptive, and the magic number0x7fffffff
needs explanation.Consider renaming to
remainingChars
ormaxRemaining
and adding a comment explaining why0x7fffffff
is used as the default maximum.- int remainingChar = (max < 0) ? 0x7fffffff : max; + int remainingChars = (max < 0) ? 0x7fffffff : max; // Use max int when no limit specified
40-43
: Add explicit braces for consistency.Missing braces around the single-statement if block.
- if (ch == 0) + if (ch == 0) + { break; + }
122-122
: Add explicit braces for consistency.Missing braces around the single-statement if block.
- if (UTF8_VALID_CONTINUATION(pSrc[0])) + if (UTF8_VALID_CONTINUATION(pSrc[0])) + { validCount++; + }
273-276
: Add explicit braces for consistency.Missing braces around the goto statement.
- if (iMaxBytes < 1) + if (iMaxBytes < 1) + { goto invalid_sequence; + }
471-473
: Add explicit braces for consistency.Missing braces around the return statement.
- if (outputUTF16_size < 1) + if (outputUTF16_size < 1) + { return false; + }
78-87
: Fix flawed continuation byte validation logic.The 3-byte sequence validation has a critical flaw - it checks both
pSrc[0]
andpSrc[1]
independently but only incrementsvalidCount
once when both are valid. This doesn't properly track individual byte validity.Apply this fix to validate continuation bytes sequentially:
int validCount = 0; if (UTF8_VALID_CONTINUATION(pSrc[0])) { validCount++; } - if (UTF8_VALID_CONTINUATION(pSrc[0]) && UTF8_VALID_CONTINUATION(pSrc[1])) + if (validCount == 1 && UTF8_VALID_CONTINUATION(pSrc[1])) { validCount++; }
324-333
: Fix duplicate continuation byte validation in 3-byte sequences.Similar to the CountNumberOfCharacters function, this validation logic is flawed - it checks
ch2
andch3
independently rather than sequentially.Apply this fix to validate continuation bytes properly:
int validCount = 0; if (UTF8_VALID_CONTINUATION(ch2)) { validCount++; } - if (UTF8_VALID_CONTINUATION(ch3)) + if (validCount == 1 && UTF8_VALID_CONTINUATION(ch3)) { validCount++; }
🧹 Nitpick comments (1)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (1)
175-179
: Improve readability of counter logic.The counter decrement logic could be clearer about when max is decremented.
CLR_UINT16 ch = *pSrc++; - if (max > 0) - max--; + if (max > 0) + { + max--; + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/CLR/CorLib/corlib_native_System_String.cpp
(5 hunks)src/CLR/Core/CLR_RT_UnicodeHelper.cpp
(9 hunks)src/nanoFramework.Graphics/Graphics/Core/Support/Fonts/Font.cpp
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/nanoFramework.Graphics/Graphics/Core/Support/Fonts/Font.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- src/CLR/CorLib/corlib_native_System_String.cpp
🧰 Additional context used
🧠 Learnings (1)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (11)
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3074
File: src/CLR/Core/GarbageCollector_Info.cpp:107-167
Timestamp: 2025-01-22T03:38:57.394Z
Learning: In nanoFramework's memory management code, DataSize() validation is comprehensively handled through CLR_RT_HeapCluster::ValidateBlock() and other caller code. Additional size checks in ValidateCluster() are redundant as the validation is already performed at multiple levels.
Learnt from: frobijn
PR: nanoframework/nf-interpreter#3023
File: CMake/Modules/FindNF_NativeAssemblies.cmake:95-106
Timestamp: 2024-09-23T17:56:37.570Z
Learning: The `AddCorLibAssemblyVersion` macro in `CMake/Modules/FindNF_NativeAssemblies.cmake` does not require additional error handling for regex failures because the associated *.cpp files contain warnings against modification. Any changes to these files should be accompanied by corresponding updates to the CMake file.
Learnt from: frobijn
PR: nanoframework/nf-interpreter#3023
File: CMake/Modules/FindNF_NativeAssemblies.cmake:95-106
Timestamp: 2024-10-08T15:52:09.445Z
Learning: The `AddCorLibAssemblyVersion` macro in `CMake/Modules/FindNF_NativeAssemblies.cmake` does not require additional error handling for regex failures because the associated *.cpp files contain warnings against modification. Any changes to these files should be accompanied by corresponding updates to the CMake file.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3074
File: src/CLR/Core/GarbageCollector_Info.cpp:107-167
Timestamp: 2025-01-22T03:38:57.394Z
Learning: In CLR_RT_GarbageCollector::ValidateCluster, DataSize() validation is already handled by ValidateBlock() and other caller code, making additional size checks redundant.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3158
File: src/CLR/CorLib/corlib_native_System_Guid.cpp:29-30
Timestamp: 2025-04-29T11:34:00.204Z
Learning: For UUID version 4 (random) generation in .NET's byte order representation, the version bits should be set in byte 7 using `(buf[7] & 0x0F) | 0x40` and the variant bits should be set in byte 9 using `(buf[9] & 0x3F) | 0x80`. This differs from the standard RFC 4122 byte layout due to .NET's byte-swapping behavior.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-10-08T15:52:09.445Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-09-25T11:28:38.536Z
Learning: In `nanoCLR_GetNativeAssemblyInformation`, there is no need to return the number of bytes written, as the memory buffer is zeroed, making the string buffer terminated.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3172
File: src/CLR/Core/TypeSystem.cpp:971-999
Timestamp: 2025-05-14T17:26:54.181Z
Learning: In the nanoFramework codebase, when handling HRESULT values from function calls like `parser.Advance(elem)`, it's preferable to use `if (parser.Advance(elem) != S_OK)` for local error handling rather than using the `NANOCLR_CHECK_HRESULT` macro, which would immediately propagate errors up the call stack.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3190
File: src/CLR/Core/TypeSystem.cpp:0-0
Timestamp: 2025-06-26T09:16:55.184Z
Learning: In nanoFramework's CLR attribute parsing (src/CLR/Core/TypeSystem.cpp), the sentinel value 0xFFFF in string tokens represents a null string. When encountered, this should result in a true null reference (using SetObjectReference(nullptr)) rather than an empty string instance, and the boxing operation should be skipped via early return.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-09-25T11:28:38.536Z
Learning: When working with `nanoCLR_GetNativeAssemblyInformation`, fixed-size assembly names are required, so code that deals with variable-length names cannot be used.
Learnt from: josesimoes
PR: nanoframework/nf-interpreter#3023
File: targets/netcore/nanoFramework.nanoCLR/nanoCLR_native.cpp:191-225
Timestamp: 2024-10-12T19:00:39.000Z
Learning: When working with `nanoCLR_GetNativeAssemblyInformation`, fixed-size assembly names are required, so code that deals with variable-length names cannot be used.
🧬 Code Graph Analysis (1)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (1)
src/CLR/Include/nanoCLR_Runtime.h (4)
utf8StringStart
(940-940)string
(959-959)m_wCharArray
(960-963)m_length
(964-967)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: nf-interpreter (Nightly build) (Check_Build_Options)
- GitHub Check: nf-interpreter (Nightly build) (Check_Build_Options)
- GitHub Check: nf-interpreter (Check_Build_Options)
🔇 Additional comments (9)
src/CLR/Core/CLR_RT_UnicodeHelper.cpp (9)
9-25
: LGTM: Well-defined UTF-8 validation macros.The explicit positive validation macros for ASCII and continuation bytes are clear and correct, replacing the previous negative checks.
203-206
: LGTM: Proper handling of unpaired low surrogates.The addition of explicit handling for unpaired low surrogates (counting them as 3 bytes for replacement character) is correct and aligns with Unicode standards.
448-466
: LGTM: Structured error handling for invalid sequences.The replacement of goto-based error handling with structured control flow and consistent U+FFFD output is a significant improvement over the previous implementation.
490-582
: LGTM: Robust backward UTF-8 navigation.The complete rewrite of
MoveBackwardInUTF8
with proper sequence boundary detection, validation, and boundary checking is a significant improvement. The logic correctly handles:
- UTF-8 sequence start detection
- Continuation byte validation
- Boundary checking
- Partial move handling
596-597
: LGTM: Proper embedded null handling.Removing the early break on null characters allows proper handling of embedded nulls in UTF-16 strings, which is correct behavior.
630-647
: LGTM: Correct unpaired surrogate handling.The explicit handling of unpaired high surrogates with U+FFFD replacement character output aligns with Unicode standards and .NET behavior.
687-703
: LGTM: Complete surrogate pair handling.The addition of explicit unpaired low surrogate handling completes the surrogate pair validation and ensures consistent U+FFFD output for invalid sequences.
829-837
: LGTM: Improved error handling in Assign method.The addition of input reset and proper cleanup on conversion failure prevents memory leaks and ensures consistent state.
849-851
: LGTM: Complete state reset in Release method.Setting both
m_wCharArray
to NULL andm_length
to 0 ensures complete object state reset.
Automated fixes for code style.
…844b-406f-4fb2-a2d5-6c62fe1def26 Code style fixes for nanoframework/nf-interpreter PR#3191
@alberk8 please take a look at coderabbitai comments above. It may have found some edge cases... |
Automated fixes for code style.
…3f8c-f06c-4b31-bb32-8f12e8b982d7 Code style fixes for nanoframework/nf-interpreter PR#3191
Description
Motivation and Context
How Has This Been Tested?
Screenshots
Types of changes
Checklist
Summary by CodeRabbit
Summary by CodeRabbit
Bug Fixes
Refactor