Integrate tests into docker workflows #366
Walkthrough

A dedicated testing phase was introduced into the project's workflow and documentation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Developer
    participant GitHub Actions
    participant Makefile
    participant Docker Backend
    participant Pytest
    Developer->>GitHub Actions: Open PR or push code
    GitHub Actions->>Makefile: Run build
    GitHub Actions->>Makefile: Run tests (make tests)
    Makefile->>Docker Backend: Start container with latest backend image
    Docker Backend->>Pytest: Run pytest in quiet mode
    Pytest-->>Docker Backend: Report test results
    Docker Backend-->>Makefile: Return test status
    Makefile-->>GitHub Actions: Return test status
    GitHub Actions-->>Developer: Show build and test results
```
🎉 Snyk checks have passed. ✅ The security/snyk check is complete; no issues have been found so far.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tests/test_recipes.py (1)
33-46: 🛠️ Refactor suggestion: Consider creating separate tests for each method and test more edge cases.
The test for internal Recipe methods combines testing of `internal_fillna` and `internal_keep` in a single test function. Additionally, the test is tightly coupled to the Recipe class implementation. I recommend:
- Split into separate test functions for each method
- Test more edge cases (all NA values, empty DataFrames)
- Add more descriptive assertions for better failure diagnostics
```diff
-def test_internal_fillna_and_keep():
+def test_internal_fillna():
     df = pd.DataFrame({'A': [1, None], 'B': [None, 'x']})
     r_fill = recipes.Recipe.__new__(recipes.Recipe)
     r_fill.args = [{'A': 0, 'B': ''}]
     filled = recipes.Recipe.internal_fillna(r_fill, df.copy())
     assert filled['A'].tolist() == [1, 0]
     assert filled['B'].tolist() == ['', 'x']
+
+    # Test edge case: all NA values
+    df_all_na = pd.DataFrame({'A': [None, None], 'B': [None, None]})
+    filled_all_na = recipes.Recipe.internal_fillna(r_fill, df_all_na.copy())
+    assert filled_all_na['A'].tolist() == [0, 0]
+    assert filled_all_na['B'].tolist() == ['', '']
+
+def test_internal_keep():
+    df = pd.DataFrame({'A': [1, None], 'B': [None, 'x'], 'C': [3, 4]})
     r_keep = recipes.Recipe.__new__(recipes.Recipe)
     r_keep.args = {'select': ['A']}
     recipes.Recipe.select_columns(r_keep, df.copy())
     kept = recipes.Recipe.internal_keep(r_keep, df.copy())
     assert list(kept.columns) == ['A']
+
+    # Test selecting multiple columns
+    r_keep_multi = recipes.Recipe.__new__(recipes.Recipe)
+    r_keep_multi.args = {'select': ['A', 'C']}
+    recipes.Recipe.select_columns(r_keep_multi, df.copy())
+    kept_multi = recipes.Recipe.internal_keep(r_keep_multi, df.copy())
+    assert list(kept_multi.columns) == ['A', 'C']
```
🧹 Nitpick comments (4)
requirements.txt (1)
28-29: Pin pytest version to avoid unexpected breaks
Adding `pytest` without any version constraint can lead to future breakages if a major release introduces breaking changes. Consider specifying a compatible version range, for example:

```diff
-pyarrow>=14.0.1
-pytest
+pyarrow>=14.0.1
+pytest>=7.0,<8.0
```

This will ensure reproducible test runs across environments.
tests/test_recipes.py (3)
10-19: Consider adding more specific error handling

While the dynamic import mechanism works well, the error handling could be more specific to provide clearer feedback about why tests are skipped.

```diff
 try:
     spec = importlib.util.spec_from_file_location('recipes', MODULE_PATH)
     recipes = importlib.util.module_from_spec(spec)
     spec.loader.exec_module(recipes)
     return recipes
 except FileNotFoundError:
-    pytest.skip(f'Could not import recipes module: {e}')
+    pytest.skip(f'Recipes module file not found at {MODULE_PATH}')
 except ImportError as e:
+    pytest.skip(f'Import error in recipes module: {e}')
+except Exception as e:
+    pytest.skip(f'Could not import recipes module: {e}')
```
22-31: Consider expanding fixed-width format test coverage

The test for fixed-width formatting is a good start but only verifies the first line of output. Consider:
- Verifying all lines in the output file to ensure multi-row writing works correctly
- Testing edge cases like empty DataFrames or very wide columns
- Testing with different column types
```diff
 def test_fwf_format_and_to_fwf(tmp_path):
     df = pd.DataFrame({'A': ['a', 'bb'], 'B': ['1', '22']})
     widths = [3, 3]
     line = recipes.fwf_format(df.iloc[0], widths)
     assert line == 'a 1 '
     outfile = tmp_path / 'out.txt'
     recipes.to_fwf(df, outfile, widths=widths, names=['A', 'B'])
     content = outfile.read_text().splitlines()
     assert content[0].strip() == line.strip()
+    # Verify second line formatting
+    line2 = recipes.fwf_format(df.iloc[1], widths)
+    assert content[1].strip() == line2.strip()
+
+    # Test with empty DataFrame
+    empty_df = pd.DataFrame(columns=['A', 'B'])
+    empty_outfile = tmp_path / 'empty_out.txt'
+    recipes.to_fwf(empty_df, empty_outfile, widths=widths, names=['A', 'B'])
+    assert empty_outfile.exists()
+    assert empty_outfile.stat().st_size == 0
```
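For readers unfamiliar with what the fixed-width helpers under test do, a minimal standalone sketch (the `pad_row` helper below is hypothetical and stands in for the project's actual `fwf_format`):

```python
def pad_row(values, widths):
    """Hypothetical sketch of fixed-width formatting: pad each value
    to its column width with spaces and concatenate the cells."""
    return ''.join(str(v).ljust(w) for v, w in zip(values, widths))

# Two columns, each 3 characters wide
print(pad_row(['a', '1'], [3, 3]))   # → 'a  1  '
print(pad_row(['bb', '22'], [3, 3]))  # → 'bb 22 '
```

Values longer than their width would overflow here; the real implementation may truncate or raise instead.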
8-8: Consider making the module path more flexible

The hardcoded module path might make tests brittle if the project structure changes.
```diff
-MODULE_PATH = Path(__file__).resolve().parents[1] / 'code' / 'recipes.py'
+# More flexible approach to find the module
+def find_module_path(module_name='recipes.py', search_dirs=['code', '.']):
+    base_dir = Path(__file__).resolve().parents[1]
+    for dir_name in search_dirs:
+        path = base_dir / dir_name / module_name
+        if path.exists():
+            return path
+    return base_dir / 'code' / module_name  # Default fallback
+
+MODULE_PATH = find_module_path()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (6)
- .github/workflows/pull.yml (1 hunks)
- .github/workflows/push.yml (1 hunks)
- Makefile (1 hunks)
- README.md (1 hunks)
- requirements.txt (1 hunks)
- tests/test_recipes.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: pull request
Makefile
[error] 701-701: Makefile target 'tests' failed with error code 5. No tests were run by pytest.
🔇 Additional comments (4)
.github/workflows/pull.yml (1)
20-25: Tests step integration looks correct
The new “tests” step runs `make tests` under the same branch context after the build. This correctly hooks into the existing flow. Once the Makefile target is updated to mount tests, this workflow will execute as expected.

README.md (1)
132-141: Documentation for running tests is clear
The added section accurately describes both the Makefile-based and manual methods to run the test suite. This aligns with the new workflows and Makefile target. Consider adding a badge for test status in the future.

.github/workflows/push.yml (1)
24-28: Tests step integration looks correct
The “tests” step is properly inserted between build and publish, invoking `make tests` with the correct branch context. No modifications needed here once the Makefile is fixed.

tests/test_recipes.py (1)
1-9: Good use of dynamic imports and conditional test skipping

The approach of dynamically importing the recipes module and skipping tests when dependencies aren't available is robust. This ensures tests fail gracefully in environments where prerequisites aren't met.
I particularly like the use of `pytest.importorskip` for the pandas and numpy dependencies.
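The dynamic-import pattern being praised can be reduced to a small, self-contained stdlib sketch (the throwaway module name and contents below are illustrative, not the project's files):

```python
import importlib.util
import tempfile
from pathlib import Path

def load_module(name, path):
    """Load a module directly from a file path, without requiring the
    containing directory to be a package or on sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Demonstrate with a throwaway module file
with tempfile.TemporaryDirectory() as tmp:
    mod_path = Path(tmp) / 'demo_recipes.py'
    mod_path.write_text("ANSWER = 42\ndef double(x):\n    return 2 * x\n")
    demo = load_module('demo_recipes', str(mod_path))
    print(demo.ANSWER, demo.double(3))  # → 42 6
```

In the real test suite this call sits behind a `try`/`except` that turns import failures into `pytest.skip`, so missing prerequisites skip rather than fail the run.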
Makefile (outdated)

```makefile
tests: backend-build
	docker run --rm ${DOCKER_USERNAME}/${DC_PREFIX}-${APP}:latest pytest -q

.PHONY: tests
```
🛠️ Refactor suggestion
Critical: Ensure tests are mounted into the container
The tests target currently invokes pytest inside the built image, but since test files aren’t baked into that image, pytest finds no tests (exit code 5). You need to mount your tests/ folder (and any other source dirs, if required) into the container at runtime.
Apply a diff like this to include the tests directory and set the working directory:
```diff
-tests: backend-build
-	docker run --rm ${DOCKER_USERNAME}/${DC_PREFIX}-${APP}:latest pytest -q
-.PHONY: tests
+tests: backend-build
+	docker run --rm \
+		-v $(PWD)/tests:/app/tests \
+		-w /app \
+		${DOCKER_USERNAME}/${DC_PREFIX}-${APP}:latest \
+		pytest -q
+.PHONY: tests
```

This mounts tests/ into /app/tests (adjust /app to your image’s workdir) so that pytest can discover and run them.
You didn't read the Dockerfile carefully; the tests are indeed mounted in it.
@rhanka You're right, and I apologize for the misunderstanding. After reviewing the AI summary of the Dockerfile changes, I see that the tests directory is indeed:
- Already copied into the production image alongside the other directories (`code/`, `conf/`, `referential_data/`)
- Set up with a volume mount for `/tests` in both production and development stages
This makes my previous suggestion unnecessary since the tests are already available in the container. Thank you for the correction.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/pull.yml (1)
17-19: Refactor build step for readability and maintainability
The current one-liner with an inline fallback is difficult to read and update. Consider switching to a multiline `run` block and leveraging the env-declared `GIT_BRANCH` so you don’t have to pass it on every make invocation.

```diff
-      run: make version backend-docker-check GIT_BRANCH=$GIT_BRANCH || ( make backend-build GIT_BRANCH=$GIT_BRANCH && make backend GIT_BRANCH=$GIT_BRANCH)
+      run: |
+        make version backend-docker-check || \
+          (make backend-build && make backend)
```
📒 Files selected for processing (2)
- .github/workflows/pull.yml (1 hunks)
- Makefile (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Makefile
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: pull request test
```yaml
      - name: tests
        if: success()
        run: make tests backend-stop GIT_BRANCH=$GIT_BRANCH
```
🛠️ Refactor suggestion
Separate test execution from cleanup to ensure teardown on failures
By chaining `make tests backend-stop`, the backend will only be stopped if tests pass—failing tests will leave containers running. It’s more robust to split into two steps: one for tests and another (with `if: always()`) for cleanup.
```diff
@@ -20,5 +20,12 @@
       env:
         GIT_BRANCH: ${{ steps.extract_branch.outputs.branch }}
-      - name: tests
-        if: success()
-        run: make tests backend-stop GIT_BRANCH=$GIT_BRANCH
+      - name: Run tests
+        run: make tests GIT_BRANCH=$GIT_BRANCH
+        env:
+          GIT_BRANCH: ${{ steps.extract_branch.outputs.branch }}
+
+      - name: Teardown backend
+        if: always()
+        run: make backend-stop GIT_BRANCH=$GIT_BRANCH
+        env:
+          GIT_BRANCH: ${{ steps.extract_branch.outputs.branch }}
```
Force-pushed 940c465 to 97985ec (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (2)
docker-compose-dev.yml (1)
11-11: Mount tests directory for backend service
This line exposes your test suite to the container so you can run `pytest` inside the dev environment. Ensure that both `BACKEND` and `APP_GROUP` are defined (e.g., in a `.env` file) and that the container path (`/${APP_GROUP}/tests/`) matches the working directory used in your Dockerfile. Consider consolidating on a single variable name (such as `APP`) across Dockerfile and Compose to avoid path mismatches.

Dockerfile (1)
73-73: Align volume path variable naming
In the development stage you correctly use `${APP}` for the tests volume, but the production stage uses `${app_path}`. For consistency and to prevent future confusion, unify both to `${APP}`.
📒 Files selected for processing (8)
- .github/workflows/pull.yml (1 hunks)
- .github/workflows/push.yml (1 hunks)
- Dockerfile (3 hunks)
- Makefile (2 hunks)
- README.md (1 hunks)
- docker-compose-dev.yml (1 hunks)
- requirements.txt (1 hunks)
- tests/test_recipes.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- requirements.txt
- README.md
- .github/workflows/push.yml
- .github/workflows/pull.yml
- Makefile
- tests/test_recipes.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: pull request test
🔇 Additional comments (2)
Dockerfile (2)
26-26: Create tests directory in build image
Adding `tests \` to the `mkdir -p` ensures that the directory exists before copying or mounting it later in the build.
44-44: Include tests in production image
This `COPY tests/ tests/` line brings your test suite into the container. Make sure the `tests/` directory is present in the build context and update your `.dockerignore` if you need to exclude other files.
```dockerfile
VOLUME /${app_path}/log
VOLUME /${app_path}/models
VOLUME /${app_path}/upload
VOLUME /${app_path}/tests
```
Undefined environment variable used for volume path
The directive uses ${app_path} which isn’t declared (you only have ARG APP / ENV APP). This will cause a build error. Update it to use ${APP}, or define app_path earlier.
```diff
-VOLUME /${app_path}/tests
+VOLUME /${APP}/tests
```
+ VOLUME /${APP}/tests🤖 Prompt for AI Agents
In Dockerfile at line 50, the VOLUME directive uses an undefined environment
variable `${app_path}`, which causes a build error. Replace `${app_path}` with
the correctly defined environment variable `${APP}` to match the existing ENV
declaration, or alternatively define `app_path` earlier in the Dockerfile before
this line.
Actionable comments posted: 0
🧹 Nitpick comments (4)
tests/test_recipes.py (4)
5-5: Remove unused `os` import

The `os` module isn’t referenced anywhere in this test file. Keeping it will keep Ruff/F401 failing and slightly slow import time.

```diff
-import os
```
48-61: Use the helper that stubs a logger to avoid `AttributeError` risk

Here (and in several following tests) the `Recipe` instance is created with `__new__` but without a `log` attribute. Many internal methods call `self.log.write` in their exception paths; if any branch triggers, the test will raise an unexpected `AttributeError`, masking the real problem. Prefer the `_recipe_with_args` helper (lines 88-93) for all tests:

```diff
-r_fill = recipes.Recipe.__new__(recipes.Recipe)
-r_fill.args = [{'A': 0, 'B': ''}]
+r_fill = _recipe_with_args([{'A': 0, 'B': ''}])
```

Apply the same pattern to the other tests that currently bypass the helper.
80-86: Strengthen the shuffle test – current assertion allows no-shuffle pass
`internal_shuffle` is supposed to randomise row order, yet the test only checks that the sets of values are identical – that will succeed even if the function returns the input unchanged. Add an assertion that the order is likely different (with a tolerance for rare collisions):

```python
orig_order = df['A'].tolist()
shuf_order = shuffled['A'].tolist()
# very low probability of equality for a sufficiently large frame
assert orig_order != shuf_order
```
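As an aside, the order-changed assertion can be made fully deterministic by seeding the RNG. A stdlib-only sketch of the idea (this does not use the project's `internal_shuffle`, just `random.shuffle` as a stand-in):

```python
import random

data = list(range(100))
shuffled = data[:]
# Fixed seed makes the shuffle reproducible across test runs
random.Random(42).shuffle(shuffled)

assert sorted(shuffled) == data  # same elements survive the shuffle
assert shuffled != data          # and the order actually changed
```

With 100 rows the probability of an identity permutation is 1/100!, so the second assertion is safe in practice; seeding removes even that residual flakiness.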
180-191: Consolidate duplicate coverage of `internal_keep`

`internal_keep` is already partially exercised in `test_internal_fillna_and_keep`. Keeping another almost-identical test increases runtime and maintenance with little extra value. Either:

- Merge the additional “where” filter assertions into the first test, or
- Rename this one to emphasise it covers the `where` clause specifically.
📒 Files selected for processing (1)
- tests/test_recipes.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/test_recipes.py (1)
code/recipes.py (24)
fwf_format (77-78), to_fwf (81-108), Recipe (976-2486), internal_fillna (1622-1634), select_columns (1596-1608), internal_keep (1900-1914), internal_rename (1697-1700), internal_map (1702-1715), internal_shuffle (1717-1726), internal_to_integer (1916-1927), internal_to_float (1955-1970), internal_parsedate (2448-2459), internal_normalize (2476-2483), internal_pause (2485-2486), internal_list_to_tuple (1929-1940), internal_tuple_to_list (1942-1953), internal_sql (2064-2070), internal_unnest (2378-2394), internal_nest (2396-2409), internal_delete (2072-2085), internal_groupby (2087-2118), internal_ngram (1972-1987), internal_exec (1636-1642), internal_eval (1644-1695)
🔇 Additional comments (2)
tests/test_recipes.py (2)
30-31: `config` module is imported but never used – verify necessity

`config = load_module('config', config_path)` is executed but the `config` variable is never referenced afterwards. If the sole purpose is to ensure `recipes.py` can import `config`, that already happens when `recipes` is loaded (because `CODE_DIR` is on `sys.path`). Consider removing the explicit load to avoid dead code, or add a comment explaining why it is required.
200-205: Regex replacement test is good – nice edge-case coverage

The test validates both selection and regex replacement behaviour and should catch most regressions. No issues spotted.
Actionable comments posted: 0
🧹 Nitpick comments (5)
tests/test_recipes.py (5)
5-5: Remove unused `os` import to satisfy Ruff F401 and keep the module clean

The `os` module is imported but never referenced in the file. Eliminating it avoids superfluous dependencies and silences the static-analysis warning.

```diff
-import os
```
16-27: Restore `sys.path` in-place to avoid clobbering concurrent modifications

`load_module()` saves a copy of `sys.path` and later reassigns `sys.path = original_sys_path`. If any other thread (or import hook) mutates `sys.path` while the module is loading, that whole list will be discarded. Restoring the list in-place (`sys.path[:] = …`) preserves the list object and is the customary, safer pattern.

```diff
-    finally:
-        sys.path = original_sys_path
+    finally:
+        # Restore in place to keep the same sys.path object
+        sys.path[:] = original_sys_path
```
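The difference between rebinding and in-place restoration can be seen in isolation; a stdlib sketch (the `/tmp/extra_entry` path and helper name are purely illustrative):

```python
import sys

def run_with_path_entry(entry, fn):
    """Temporarily prepend an entry to sys.path, restoring the list
    in place so the sys.path object identity is preserved."""
    original = list(sys.path)
    sys.path.insert(0, entry)
    try:
        return fn()
    finally:
        # Slice assignment mutates the existing list instead of rebinding
        # the name, so any other reference to sys.path stays valid.
        sys.path[:] = original

before = sys.path  # another module could be holding this same reference
run_with_path_entry('/tmp/extra_entry', lambda: None)
assert sys.path is before                 # same list object survived
assert '/tmp/extra_entry' not in sys.path  # temporary entry was removed
```

Rebinding with `sys.path = original` would instead leave `before` pointing at a stale list.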
56-60: Redundant call to `select_columns` – can be removed for brevity

`internal_keep()` already invokes `select_columns()`. Calling it on lines 58-59 produces no side-effects that are used later and can be dropped to shorten the test.

```diff
-    recipes.Recipe.select_columns(r_keep, filled)
-    kept = recipes.Recipe.internal_keep(r_keep, filled)
+    kept = recipes.Recipe.internal_keep(r_keep, filled)
```
92-124: Avoid constructing mutable objects in default-argument position (Ruff B008)

`pd.DataFrame(...)` and `range(...)` are executed at import time inside the lambda’s default value. This is harmless in tests, yet the pattern is flagged and can surprise maintainers. A minimal tweak keeps the lambda pure and silences the warning:

```diff
-    lambda res, df_orig=pd.DataFrame({'A': range(10), 'B': list('abcdefghij')}): (
+    lambda res, df_orig=_DF_ORIG: (
```

Add a constant right above the parametrize block:

```python
_DF_ORIG = pd.DataFrame({'A': range(10), 'B': list('abcdefghij')})
```
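The underlying hazard is easy to reproduce without pandas; a minimal stdlib illustration of why default arguments are evaluated only once, at definition time:

```python
def append_shared(item, acc=[]):   # anti-pattern: one list, built once
    acc.append(item)
    return acc

def append_fresh(item, acc=None):  # fix: build the default on every call
    if acc is None:
        acc = []
    acc.append(item)
    return acc

print(append_shared(1), append_shared(2))  # → [1, 2] [1, 2] (same list grows)
print(append_fresh(1), append_fresh(2))    # → [1] [2] (independent lists)
```

The module-level `_DF_ORIG` constant suggested above is the DataFrame equivalent of `append_fresh`'s sentinel: the expensive object is built once, deliberately, and shared read-only.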
🪛 Ruff (0.11.9)
119-119: Do not perform function call
pd.DataFramein argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable(B008)
119-119: Do not perform function call
rangein argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable(B008)
84-90: Minor: replace ad-hoc logger stub with `logging.NullHandler` for clarity

Creating a dynamic class with a `write` lambda works, but Python’s `logging` module already provides a no-op sink (`logging.NullHandler()`) which is clearer to readers:

```diff
-    r.log = type('log', (), {'write': lambda self, *a, **k: None})()
+    import logging
+    r.log = logging.getLogger("null")
+    r.log.addHandler(logging.NullHandler())
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/test_recipes.py (3)
5-5: Remove unused `os` import

`os` is imported but not referenced anywhere in the file. Ruff F401 flags this and the import will fail the linter/CI step configured in the repo.

```diff
-import os
```
84-90: Minor: simplify the dummy logger construction

Creating the dummy logger via `type('log', (), {'write': lambda self, *a, **k: None})()` works, but it is a little cryptic and re-creates the class on every call. A small, clearer helper class improves readability and avoids redefinition costs:

```diff
+class _SilentLog:  # simple helper
+    def write(self, *_, **__):
+        # swallow all log calls
+        return None
 ...
-    r.log = type('log', (), {'write': lambda self, *a, **k: None})()
+    r.log = _SilentLog()
```
155-184: Avoid mutable/heavy objects in default-argument positions

Ruff B008 is correctly warning that `pd.DataFrame(…)` and `range(…)` are evaluated once at function definition time inside the default arguments of the lambdas you pass to `validator`. While harmless today, it can unexpectedly couple the test results to the first evaluation if the validator mutates `df_orig`. A tiny refactor removes the smell without changing behaviour:

```diff
-    lambda res, df_orig=pd.DataFrame({'A': range(10), 'B': list('abcdefghij')}): (
+    lambda res, df_orig=None: (
+        (
+            lambda _df: set(res['A']) == set(_df['A'])
+            and set(res['B']) == set(_df['B'])
+        )(pd.DataFrame({'A': range(10), 'B': list('abcdefghij')}) if df_orig is None else df_orig)
     ),
```

(or pre-define `DF_SHUFFLE_ORIG = pd.DataFrame(...)` above the `CASES` list and capture it in the lambda). This change silences the linter and follows best practices for default arguments.
🔇 Additional comments (2)
tests/test_recipes.py (2)
299-308: Verify the expected n-gram output

The assertion `result['txt'].iloc[0][0] == 'he'` assumes that `recipes.internal_ngram` returns character-level bigrams. Looking at `internal_ngram`:

```python
df[self.cols] = df[self.cols].applymap(
    lambda x: ngrams(tokenize(normalize(x)), n))
```

`tokenize(normalize(x))` suggests token (word) level splitting, so the first bigram for `"hello world"` would likely be `"hello world"`, not `"he"`. Please double-check the helper functions (`tokenize`, `ngrams`) and ensure the test expectation matches the real semantics, otherwise the test will be fragile. If token-level bigrams are indeed produced, update the assertion accordingly:

```diff
-assert result['txt'].iloc[0][0] == 'he'
+assert result['txt'].iloc[0][0] == 'hello world'
```
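To make the token-level semantics concrete, here is a hypothetical sketch of what word-level bigram extraction would produce (the real `tokenize`/`ngrams` helpers in `code/recipes.py` may differ):

```python
def word_ngrams(text, n=2):
    """Hypothetical token-level n-gram helper: split on whitespace and
    join each window of n consecutive tokens with a space."""
    tokens = text.split()
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(word_ngrams('hello world again'))  # → ['hello world', 'world again']
```

Character-level bigrams would instead slide over the raw string, yielding `'he'` first for `"hello world"`; the failing assertion hinges on which of the two the project implements.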
15-27: Dynamic import helper is solid – nice job

The `load_module` utility cleanly inserts the target directory, loads the module, and restores `sys.path` in a `finally` block, avoiding global side-effects in other tests.
Summary

- Add a `tests` target
- Run `make tests` in push and pull request workflows

Testing

- `pytest -q` (fails: command not found)
- `make tests` (fails: docker-compose not found)

Summary by CodeRabbit

- New Features
- Documentation
- Chores: added `pytest` as a project dependency for testing.