Avoid casts that don't change the pointer type #6785

Merged
merged 3 commits into from
Feb 2, 2025


aitap
Contributor

@aitap aitap commented Jan 31, 2025

In C, explicit pointer casts are mostly needed to work with the byte representation of a larger type. In other cases, either the implicit conversion (e.g. to and from void*) is already fine and won't cause any warnings, or the code is doing something suspicious (e.g. converting between Rboolean* and int*) and we'd rather have the warning from the implicit conversion.
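To make the distinction concrete, here is a minimal sketch of the two situations. The helper names `byte_sum` and `zeroed_ints` are hypothetical and not taken from data.table; they only illustrate where a pointer cast is required and where the implicit `void*` conversion is enough.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum the bytes of a 32-bit value.
   Inspecting the byte representation is the case where an explicit pointer
   cast is needed: uint32_t* does not convert to unsigned char* implicitly. */
static unsigned byte_sum(uint32_t value)
{
    const unsigned char *bytes = (const unsigned char *)&value; /* cast required */
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof value; i++) sum += bytes[i];
    return sum;
}

/* Hypothetical helper: allocate and zero n ints.
   No casts here: void* converts to and from int* implicitly in C. */
static int *zeroed_ints(size_t n)
{
    assert(n > 0);
    int *p = malloc(n * sizeof *p);        /* implicit conversion from void* */
    if (p) memset(p, 0, n * sizeof *p);    /* implicit conversion to void* */
    return p;
}
```

Calling `byte_sum(0x01020304u)` adds up the four bytes 1, 2, 3 and 4 regardless of byte order, and `zeroed_ints(4)` returns a zero-filled block that the caller releases with `free()`.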

For example, adding an explicit cast from LOGICAL(...) to int* doesn't prevent a warning, because LOGICAL(...) already returns an int*. If the pointer type returned by LOGICAL(...) ever changes, the absence of the cast will cause the compiler to issue a diagnostic: it would then be an implicit conversion to a different pointer type, likely a violation of the "strict aliasing" rule, which is exactly what we want to be warned about. An explicit cast, on the other hand, tells the compiler to silence the warning and do as written.
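A small sketch of the identity cast described here. `logical_data()` is a hypothetical stand-in for R's `LOGICAL(...)` so that the example compiles without the R headers; the three-element `storage` array is likewise invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for LOGICAL(x): returns the int storage of a vector. */
static int storage[3] = {1, 0, 1};
static int *logical_data(void) { return storage; }

/* Count the non-zero elements, fetching the pointer both ways. */
static int count_true(void)
{
    int *with_cast = (int *)logical_data(); /* identity cast: changes nothing today */
    int *no_cast   = logical_data();        /* same pointer, same type, no cast */
    assert(with_cast == no_cast);

    /* If logical_data() were later changed to return a different pointer type,
       the no_cast line would draw an incompatible-pointer-type diagnostic,
       while the with_cast line would keep compiling silently. */
    int n = 0;
    for (int i = 0; i < 3; i++) n += (no_cast[i] != 0);
    return n;
}
```

Both pointers compare equal, which is the point: the cast adds nothing now and only removes a diagnostic later.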

Initially found while working on #6782. More examples located using the following Coccinelle script:

```
@@
type T;
T E;
@@
- (T)
  E
```

and the following command line:

```
spatch --sp-file identity_casts.cocci --dir $DATA_TABLE/src --recursive-includes -I $R_HOME/include -I $DATA_TABLE/src
```

The remaining examples found by spatch are all in snprintf(...) calls that cast non-pointer values, such as a uint64_t passed for the PRIu64 conversion specifier. It's probably fine to keep the code defensive and leave these casts in. If the non-pointer type does change, the cast should prevent most problems, unless the cast causes a signed overflow (to be caught by UBSan).
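For illustration, the kind of defensive cast left in place looks roughly like the sketch below. `format_rows` and its parameters are hypothetical, not code from data.table.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a row count into buf.
   nrow is already a uint64_t, so the cast is an identity cast today. It is
   kept so that the argument still matches PRIu64 if the declared type of
   nrow changes later. */
static int format_rows(char *buf, size_t bufsize, uint64_t nrow)
{
    assert(buf != NULL && bufsize > 0);
    return snprintf(buf, bufsize, "%" PRIu64 " rows", (uint64_t)nrow);
}
```

Unlike a pointer cast, this one converts the value rather than reinterpreting memory, so a later type change still produces a correctly formatted number as long as the value is representable in `uint64_t`.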

aitap added 2 commits January 31, 2025 14:27
If the returned pointer type ever changes, the absence of the cast will
give us a compiler warning due to the value being implicitly cast to an
incompatible pointer type. An explicit cast, on the other hand, tells
the compiler to silence the warning and do as written.

codecov bot commented Jan 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.60%. Comparing base (32dfd8d) to head (54a0c14).
Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6785   +/-   ##
=======================================
  Coverage   98.60%   98.60%           
=======================================
  Files          79       79           
  Lines       14642    14642           
=======================================
  Hits        14438    14438           
  Misses        204      204           


@MichaelChirico
Member

Do you think it makes sense to run your diagnostic program as part of every PR (under code-quality/lint-c one presumes)? Or perhaps just occasionally (we have R-CMD-check-occasional which is not currently in a working state, or just .dev/CRAN_release for manual checks)?

Member

@MichaelChirico MichaelChirico left a comment

Great finds!

Feel free to merge before/after adding a CI check; if you'd rather punt on the CI check, please file a follow-up.

For mistakes that are easy to fix automatically, Coccinelle patches may
be written. If a patch finds a mistake, the linter will ask the human to
fix either the code or the patch.
@aitap aitap force-pushed the avoid_cast_to_identity branch from de686d1 to 54a0c14 Compare February 2, 2025 20:25
@aitap
Contributor Author

aitap commented Feb 2, 2025

Coccinelle installation adds 11 seconds to the lint-c job. Overall, it's still close to the other checks we have.

On a broader note, in addition to pre-written data.table-specific linters, might we benefit from more generic static checking for the C code? A cppcheck run (3.5 minutes, could be sped up by limiting the combinatorial explosion of #ifdefs that it considers) found only a few very minor problems, such as ptr = realloc(ptr, newsize) (the original ptr would leak if realloc fails) or a forgotten va_end (I think it's a no-op on most platforms?). A commercial analyser could also be an option, like PVS-Studio (downside: their conditions amount to an ad in the README).
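The two cppcheck findings mentioned above follow well-known patterns. A minimal sketch of the usual fixes, using hypothetical helpers `grow_buffer` and `format_into` rather than the actual data.table code:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: grow *buf to newsize bytes (newsize > 0 assumed).
   On failure the original block stays valid and owned by the caller,
   whereas ptr = realloc(ptr, newsize) would overwrite the only pointer
   to it with NULL and leak it. */
static int grow_buffer(char **buf, size_t newsize)
{
    assert(buf != NULL && newsize > 0);
    char *tmp = realloc(*buf, newsize);
    if (!tmp) return -1;   /* *buf is untouched; caller may free or retry */
    *buf = tmp;
    return 0;
}

/* Hypothetical printf-style wrapper showing va_start paired with va_end. */
static int format_into(char *buf, size_t bufsize, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, bufsize, fmt, ap);
    va_end(ap);            /* the C standard requires this before returning */
    return n;
}
```

Even where `va_end` happens to expand to nothing, the standard leaves the behaviour undefined if it is omitted, so pairing it with every `va_start` costs nothing and keeps the code portable.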

@aitap aitap merged commit f9c2824 into master Feb 2, 2025
10 of 11 checks passed
@aitap aitap deleted the avoid_cast_to_identity branch February 2, 2025 20:46