
(fix) Make the weighted average fit for all kinds of systems #4593

Open · wants to merge 35 commits into base: devel

Conversation

@SumGuo-88 (Collaborator) commented Feb 10, 2025

Summary by CodeRabbit

  • Refactor

    • Enhanced error handling and reporting for energy, force, and virial metrics.
    • Improved the organization of error handling, allowing for more tailored output based on model capabilities.
  • Tests

    • Introduced a new test class to validate the functionality of the updated error handling functions.
    • Added tests for the new error functions to ensure correct output and interaction with mock data.

coderabbitai bot (Contributor) commented Feb 10, 2025

📝 Walkthrough

This pull request modifies the testing functionalities within the DeePMD framework. In deepmd/entrypoints/test.py, the error handling and reporting in the test functions are improved by adding new functions for error calculations, implementing conditional logging based on flags for energy, force, and virial metrics, and restructuring the return values of the functions. Additionally, a new test suite is introduced in source/tests/pt/test_weighted_avg.py to validate the functionality of the test_ener_err and test_ener_err_ops functions with mocked dependencies.
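The mechanism described above can be sketched as follows. This is an illustrative re-implementation, not the actual deepmd helpers: the `weighted_average` function here is written from scratch, and the flag layout and metric keys are assumptions based on the PR description.

```python
from typing import Dict, List, Tuple

def weighted_average(err_coll: List[Dict[str, Tuple[float, int]]]) -> Dict[str, float]:
    """Sample-size-weighted mean of each metric across systems (sketch)."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for err in err_coll:
        for key, (val, n) in err.items():
            sums[key] = sums.get(key, 0.0) + val * n
            counts[key] = counts.get(key, 0) + n
    return {k: sums[k] / counts[k] for k in sums}

# Each system carries (error_dict, find_energy, find_force, find_virial);
# only metrics whose flag is set contribute to the average.
systems = [
    ({"mae_e": (2.0, 2), "mae_f": (2.0, 3)}, 1, 1, 0),  # energy + force
    ({"mae_e": (4.0, 2)}, 1, 0, 0),                      # energy only
]
err_coll = []
for err, find_energy, find_force, find_virial in systems:
    part = {}
    if find_energy:
        part["mae_e"] = err["mae_e"]
    if find_force and "mae_f" in err:
        part["mae_f"] = err["mae_f"]
    err_coll.append(part)
print(weighted_average(err_coll))  # {'mae_e': 3.0, 'mae_f': 2.0}
```

The key point, and what the PR fixes, is that a system without force data no longer drags the force average down: its flag is 0, so it simply contributes nothing to that metric's weighted sum.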

Changes

  • deepmd/entrypoints/test.py: Enhanced error handling in test functions by introducing test_ener_err and test_ener_err_ops for error calculations; added conditional checks for metrics based on flags; updated the test_ener function signature to return a dictionary.
  • source/tests/pt/test_weighted_avg.py: Added new test class TestEnerErrFunctions with methods test_test_ener_err and test_test_ener_err_ops to validate the test_ener_err and test_ener_err_ops functions using the unittest framework.

Possibly related PRs

  • Fix: always output weighted average error #4401: The changes in the main PR are related to the modifications in the test function in the retrieved PR, as both involve updates to error handling and reporting within the same function in deepmd/entrypoints/test.py.

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
source/tests/pt/test_weighted_avg.py (1)

70-101: Well-structured test cases with good coverage!

The test cases effectively validate different component combinations. Consider making the variable names more descriptive for better readability.

Consider renaming variables to be more descriptive:

-    expected_mae_f = (2*3 +1*3 )/(3+3)
+    expected_force_mae = (2*3 + 1*3)/(3+3)
-    expected_mae_v = (3*5 +1*5 )/(5+5)
+    expected_virial_mae = (3*5 + 1*5)/(5+5)
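For reference, the expected values in these assertions are sample-size-weighted means; a minimal check of the force case, using the numbers from the assertion above:

```python
# Two systems: MAE 2 over 3 samples and MAE 1 over 3 samples.
errors = [(2.0, 3), (1.0, 3)]  # (mae_f, n_samples) pairs
expected_force_mae = sum(e * n for e, n in errors) / sum(n for _, n in errors)
print(expected_force_mae)  # 1.5
```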
deepmd/entrypoints/test.py (3)

331-333: Good addition of component flags!

Consider using more descriptive variable names for better clarity.

-    find_energy = test_data.get('find_energy')
-    find_force = test_data.get('find_force')
-    find_virial = test_data.get('find_virial')
+    has_energy_component = test_data.get('find_energy')
+    has_force_component = test_data.get('find_force')
+    has_virial_component = test_data.get('find_virial')

146-167: Good selective error collection logic!

Consider adding error handling for missing components.

Add error handling for missing components:

     err_part = {}
+    if test_data.get('find_energy') is None:
+        log.warning("Energy component flag not found in test data")
+    if test_data.get('find_force') is None:
+        log.warning("Force component flag not found in test data")
+    if test_data.get('find_virial') is None:
+        log.warning("Virial component flag not found in test data")

     if find_energy == 1:
         err_part['mae_e'] = err['mae_e']

459-470: Good conditional logging implementation!

Consider adding debug logging for better troubleshooting.

Add debug logging:

+    log.debug(f"Processing system with energy={find_energy}, force={find_force}, virial={find_virial}")
     if find_force == 1:
         if not out_put_spin:
             log.info(f"Force  MAE         : {mae_f:e} eV/A")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65facbb and b3a925d.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (12 hunks)
  • source/tests/pt/test_weighted_avg.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (19)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Analyze (python)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C++ (cpu, cpu)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (3)
source/tests/pt/test_weighted_avg.py (3)

7-31: Well-structured implementation for handling different error metrics!

The function effectively handles different combinations of energy, force, and virial metrics with clean conditional logic and proper error collection.


33-39: Clean baseline implementation!

The function provides a good reference point for comparing weighted averages with and without filtering.


43-67: Comprehensive test coverage for energy-only metrics!

The test case effectively validates:

  • Correct weighted average calculations
  • Proper handling of energy-only systems
  • Edge cases with force and virial metrics

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
source/tests/pt/test_weighted_avg.py (2)

11-36: Consider adding type hints and a docstring.

The function lacks type hints and a docstring, which would improve maintainability and help users understand the expected input/output format.

Apply this diff to add type hints and docstring:

-def test(all_sys):
+def test(all_sys: list[tuple[dict, int, int, int]]) -> dict[str, tuple[float, int]]:
+    """Calculate weighted average of errors with selective inclusion of metrics.
+    
+    Args:
+        all_sys: List of tuples containing (error_dict, find_energy, find_force, find_virial)
+                where error_dict contains the error metrics
+    
+    Returns:
+        Dictionary mapping error names to tuples of (error_value, sample_size)
+    """
     err_coll = []

38-44: Add type hints and docstring to test_ori function.

Similar to the test function, this function would benefit from type hints and documentation.

Apply this diff:

-def test_ori(all_sys):
+def test_ori(all_sys: list[tuple[dict, int, int, int]]) -> dict[str, tuple[float, int]]:
+    """Calculate weighted average of all errors without selective inclusion.
+    
+    Args:
+        all_sys: List of tuples containing (error_dict, find_energy, find_force, find_virial)
+                where error_dict contains the error metrics
+    
+    Returns:
+        Dictionary mapping error names to tuples of (error_value, sample_size)
+    """
     err_coll = []
deepmd/entrypoints/test.py (2)

331-334: Consider using dictionary get() with default values.

The code uses get() without default values which could return None. Consider providing default values for safety.

Apply this diff:

-    find_energy = test_data.get("find_energy")
-    find_force = test_data.get("find_force")
-    find_virial = test_data.get("find_virial")
+    find_energy = test_data.get("find_energy", 0)
+    find_force = test_data.get("find_force", 0)
+    find_virial = test_data.get("find_virial", 0)
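The difference matters because a missing key silently propagates None, which then fails numeric comparisons like `find_energy == 1`. A quick illustration (the `test_data` contents here are hypothetical):

```python
test_data = {"find_force": 1}  # hypothetical: no "find_energy" key present
print(test_data.get("find_energy"))     # None
print(test_data.get("find_energy", 0))  # 0
```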

744-747: Use f-strings instead of % operator for string formatting.

The code uses the older % operator for string formatting. Consider using f-strings for better readability and maintainability.

Apply this diff:

-                detail_path.with_suffix(".dos.out.%.d" % ii),
+                detail_path.with_suffix(f".dos.out.{ii:d}"),
                 frame_output,
-                header="%s - %.d: data_dos pred_dos" % (system, ii),
+                header=f"{system} - {ii:d}: data_dos pred_dos",
-                    detail_path.with_suffix(".ados.out.%.d" % ii),
+                    detail_path.with_suffix(f".ados.out.{ii:d}"),
                     frame_output,
-                    header="%s - %.d: data_ados pred_ados" % (system, ii),
+                    header=f"{system} - {ii:d}: data_ados pred_ados",

Also applies to: 758-761
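Both styles produce identical output for these format strings; a small demonstration (placeholder values for `ii` and `system`, and using plain `%d` in place of the unusual `%.d` precision spec in the original):

```python
ii, system = 3, "water"  # placeholder values
old_style = "%s - %d: data_dos pred_dos" % (system, ii)
f_style = f"{system} - {ii:d}: data_dos pred_dos"
print(old_style)  # water - 3: data_dos pred_dos
print(old_style == f_style)  # True
```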

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3a925d and 08e4a55.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (10 hunks)
  • source/tests/pt/test_weighted_avg.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (19)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (6)
source/tests/pt/test_weighted_avg.py (3)

47-113: LGTM! Well-structured test case for energy-only scenario.

The test case thoroughly validates the weighted average calculation for energy metrics, including proper assertions for both the main function and original implementation.


114-171: LGTM! Comprehensive test case for energy and force metrics.

The test case effectively validates the combined energy and force calculations, with appropriate assertions to verify the differences between the two implementations.


172-229: LGTM! Complete test coverage for all components.

The test case provides thorough validation of all metrics (energy, force, virial) with appropriate assertions.

deepmd/entrypoints/test.py (3)

137-168: LGTM! Improved error handling with selective metric inclusion.

The changes enhance error handling by:

  1. Using flags to determine which metrics to include
  2. Organizing errors into a separate dictionary
  3. Handling different force metric cases (regular vs spin)

459-468: LGTM! Improved conditional logging for force metrics.

The changes enhance logging by:

  1. Only logging when force data is present
  2. Handling different force metric types (regular vs spin)

469-469: LGTM! Added condition for virial logging.

The change ensures virial metrics are only logged when appropriate (PBC enabled and virial data present).

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
source/tests/pt/test_weighted_avg.py (3)

11-36: Add type hints and docstring for better maintainability.

The function needs documentation and type hints to improve maintainability. Also, consider using an enum or constants for flag values.

-def fake_test(all_sys):
+from typing import List, Dict, Tuple, Union
+
+def fake_test(all_sys: List[Tuple[Dict[str, Tuple[float, float]], int, int, int]]) -> Dict[str, float]:
+    """Calculate weighted average of error metrics based on system components.
+    
+    Args:
+        all_sys: List of tuples containing (error_dict, has_energy, has_force, has_virial)
+               where error_dict contains metrics like mae_e, rmse_e etc.
+    
+    Returns:
+        Dictionary containing weighted averages of error metrics.
+    """

Also, add error handling for missing force metrics:

         if find_force == 1:
+            if not any(key in err for key in ['rmse_f', 'rmse_fr']):
+                raise KeyError("No force metrics found in error dictionary")
             if "rmse_f" in err:

38-44: Add type hints and docstring for consistency.

For consistency with fake_test, add type hints and documentation.

-def fake_test_ori(all_sys):
+def fake_test_ori(all_sys: List[Tuple[Dict[str, Tuple[float, float]], int, int, int]]) -> Dict[str, float]:
+    """Calculate weighted average of error metrics ignoring component flags.
+    
+    Used as a baseline for comparison with fake_test.
+    
+    Args:
+        all_sys: List of tuples containing (error_dict, has_energy, has_force, has_virial)
+    
+    Returns:
+        Dictionary containing weighted averages of all error metrics.
+    """

47-229: Enhance test maintainability and coverage.

While the test cases are comprehensive, consider these improvements:

  1. Move test data to class-level setup
  2. Add docstrings to test methods
  3. Add negative test cases

Example refactor:

 class TestWeightedAverage(unittest.TestCase):
+    def setUp(self):
+        """Set up test data."""
+        # Define common test data structure
+        self.base_system = {
+            "mae_e": (2, 2),
+            "mae_ea": (4, 2),
+            "rmse_e": (3, 2),
+            "rmse_ea": (5, 2),
+            "mae_f": (2, 3),
+            "rmse_f": (1, 3),
+            "mae_v": (3, 5),
+            "rmse_v": (3, 3),
+        }
+
     def test_case1_energy_only(self):
+        """Test weighted average calculation with energy metrics only."""

Add negative test cases:

def test_invalid_input(self):
    """Test handling of invalid input data."""
    # Test with empty system list
    with self.assertRaises(ValueError):
        fake_test([])
    
    # Test with invalid flag values
    invalid_sys = [(self.base_system, 2, 0, 0)]  # Invalid energy flag
    with self.assertRaises(ValueError):
        fake_test(invalid_sys)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08e4a55 and db637f9.

📒 Files selected for processing (1)
  • source/tests/pt/test_weighted_avg.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (2)
source/tests/pt/test_weighted_avg.py (2)

1-8: LGTM! Imports are well-organized.

The license is properly specified and imports follow Python conventions.


231-232: LGTM! Standard unittest main block.

Follows Python conventions for unittest execution.

@coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (2)
deepmd/entrypoints/test.py (2)

562-574: ⚠️ Potential issue

Fix incorrect type hints.

The function's return type hint tuple[list[np.ndarray], list[int]] doesn't match the actual return type, which is a dictionary of tuples containing error metrics and their sizes.

Update the type hint to match the actual return type:

-) -> tuple[list[np.ndarray], list[int]]:
+) -> dict[str, tuple[float, int]]:

Also, remove the commented-out flags from the return statements:

-        }  # find_energy,find_force,find_virial,
+        }

Also applies to: 575-588


137-169: 💡 Verification agent

🧩 Analysis chain

Verify the impact of removing error filtering.

The commented-out code previously filtered errors based on flags (find_energy, find_force, find_virial). Removing this filtering could lead to inconsistencies in error reporting, especially when certain metrics are not available or relevant.

Run the following script to verify error handling in test cases:


🏁 Script executed:

#!/bin/bash
# Description: Check test cases for error handling patterns.

# Search for test cases that rely on filtered errors
rg -A 5 'test_ener.*find_' 'source/tests'

# Search for error dictionary usage in test cases
rg -A 5 'err\[.*(mae|rmse)_(e|f|v)' 'source/tests'

Length of output: 4540


Inconsistent error reporting: Unfiltered error dictionary vs. filtered expectations in tests

The removal of the error filtering (i.e., the commented-out aggregation logic that created a filtered err_part) in the test_ener function means that the function now returns a complete error dictionary. However, test cases in source/tests/pt/test_weighted_avg.py still expect the filtered error metrics (e.g., keys such as "mae_e", "rmse_e", etc., populated conditionally via find_energy, find_force, and find_virial). This mismatch can lead to discrepancies in the averaged error computation or even test failures if additional keys or unexpected values are present.

  • Action Item: Either update the test case expectations to work with the full error dictionary or reintroduce/adjust the filtering logic to ensure that the returned error dictionary strictly contains the expected keys and values.
🧹 Nitpick comments (1)
deepmd/entrypoints/test.py (1)

735-738: Use f-strings for better readability.

The code uses the % operator for string formatting, which is outdated in Python. Using f-strings would improve readability and maintainability.

Apply this diff to update the string formatting:

-                detail_path.with_suffix(".dos.out.%.d" % ii),
-                header="%s - %.d: data_dos pred_dos" % (system, ii),
+                detail_path.with_suffix(f".dos.out.{ii}"),
+                header=f"{system} - {ii}: data_dos pred_dos",

-                    detail_path.with_suffix(".ados.out.%.d" % ii),
-                    header="%s - %.d: data_ados pred_ados" % (system, ii),
+                    detail_path.with_suffix(f".ados.out.{ii}"),
+                    header=f"{system} - {ii}: data_ados pred_ados",

-                detail_path.with_suffix(".property.out.%.d" % ii),
-                header="%s - %.d: data_property pred_property" % (system, ii),
+                detail_path.with_suffix(f".property.out.{ii}"),
+                header=f"{system} - {ii}: data_property pred_property",

-                    detail_path.with_suffix(".aproperty.out.%.d" % ii),
-                    header="%s - %.d: data_aproperty pred_aproperty" % (system, ii),
+                    detail_path.with_suffix(f".aproperty.out.{ii}"),
+                    header=f"{system} - {ii}: data_aproperty pred_aproperty",

Also applies to: 749-752, 891-894, 905-908

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db637f9 and d2c9c4b.

📒 Files selected for processing (1)
  • deepmd/entrypoints/test.py (12 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/entrypoints/test.py

332-332: Local variable find_energy is assigned to but never used

Remove assignment to unused variable find_energy

(F841)

⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (1)
deepmd/entrypoints/test.py (1)

810-812: LGTM! Improved code consistency.

The changes to the property test function improve readability by:

  • Using consistent variable names (property instead of mixed usage)
  • Using task_dim consistently for dimensionality

Also applies to: 863-864, 868-869

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
deepmd/entrypoints/test.py (2)

734-750: Use f-strings for string formatting.

For better readability and consistency with modern Python practices, consider using f-strings instead of the % formatting style.

Apply this diff to update the string formatting:

-                detail_path.with_suffix(".dos.out.%.d" % ii),
+                detail_path.with_suffix(f".dos.out.{ii:d}"),
                 frame_output,
-                header="%s - %.d: data_dos pred_dos" % (system, ii),
+                header=f"{system} - {ii:d}: data_dos pred_dos",
                 append=append_detail,
             )

             save_txt_file(
-                    detail_path.with_suffix(".ados.out.%.d" % ii),
+                    detail_path.with_suffix(f".ados.out.{ii:d}"),
                     frame_output,
-                    header="%s - %.d: data_ados pred_ados" % (system, ii),
+                    header=f"{system} - {ii:d}: data_ados pred_ados",
                     append=append_detail,
                 )

890-906: Use f-strings for string formatting.

For better readability and consistency with modern Python practices, consider using f-strings instead of the % formatting style.

Apply this diff to update the string formatting:

-                detail_path.with_suffix(".property.out.%.d" % ii),
+                detail_path.with_suffix(f".property.out.{ii:d}"),
                 frame_output,
-                header="%s - %.d: data_property pred_property" % (system, ii),
+                header=f"{system} - {ii:d}: data_property pred_property",
                 append=append_detail,
             )

             save_txt_file(
-                    detail_path.with_suffix(".aproperty.out.%.d" % ii),
+                    detail_path.with_suffix(f".aproperty.out.{ii:d}"),
                     frame_output,
-                    header="%s - %.d: data_aproperty pred_aproperty" % (system, ii),
+                    header=f"{system} - {ii:d}: data_aproperty pred_aproperty",
                     append=append_detail,
                 )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2c9c4b and 99d6942.

📒 Files selected for processing (1)
  • deepmd/entrypoints/test.py (12 hunks)
🔇 Additional comments (4)
deepmd/entrypoints/test.py (4)

137-145: LGTM! Function signature update improves type safety.

The updated function signature now correctly returns the error dictionary along with the flags, making the return type explicit and improving type safety.


146-167: LGTM! Error filtering logic is now consistent.

The error filtering logic has been improved to selectively include metrics based on the flags, addressing the inconsistency mentioned in past review comments.


809-811: LGTM! Using task_dim improves flexibility.

Using dp.task_dim for property dimensions makes the code more flexible and maintainable by relying on the model's configuration.


331-333: Static analysis warning can be ignored.

The flags find_energy, find_force, and find_virial are now properly used in error filtering (lines 148-165) and returned from the function. The static analysis warning is outdated.

@coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
deepmd/entrypoints/test.py (1)

744-747: Modernize string formatting.

Replace old-style % string formatting with f-strings for better readability and maintainability.

-                detail_path.with_suffix(".dos.out.%.d" % ii),
+                detail_path.with_suffix(f".dos.out.{ii:d}"),
-                header="%s - %.d: data_dos pred_dos" % (system, ii),
+                header=f"{system} - {ii:d}: data_dos pred_dos",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99d6942 and 984a78e.

📒 Files selected for processing (1)
  • deepmd/entrypoints/test.py (10 hunks)
🔇 Additional comments (4)
deepmd/entrypoints/test.py (4)

137-167: LGTM! Improved error handling and type safety.

The changes improve error handling by making the function return type explicit and filtering error metrics based on what was actually calculated. The structured error dictionary now only includes relevant metrics.


331-333: LGTM! Resolved unused flag issues.

The previously unused flags are now properly utilized for error filtering and conditional logging, addressing past review comments and static analysis warnings.


459-470: LGTM! Consistent flag usage in logging.

The changes implement consistent conditional logging based on the presence of force and virial calculations, with proper handling of spin calculations.


819-821: LGTM! Consistent property testing implementation.

The property testing implementation follows the same patterns as other test functions, with consistent dimension handling and error calculation.

Also applies to: 872-878


codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 92.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.57%. Comparing base (176c746) to head (0f669b8).
Report is 1 commit behind head on devel.

Files with missing lines Patch % Lines
deepmd/entrypoints/test.py 92.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4593      +/-   ##
==========================================
- Coverage   84.58%   84.57%   -0.01%     
==========================================
  Files         680      680              
  Lines       64509    64527      +18     
  Branches     3540     3540              
==========================================
+ Hits        54562    54575      +13     
- Misses       8807     8812       +5     
  Partials     1140     1140              


@iProzd iProzd marked this pull request as draft February 13, 2025 01:49
@SumGuo-88 SumGuo-88 marked this pull request as ready for review February 17, 2025 11:06
Collaborator

@iProzd iProzd left a comment


You should fake a dp function that returns some model predictions in the UT, instead of creating a whole new function.
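The suggestion above can be sketched with `unittest.mock`. Everything here is illustrative: the DeepPot-like object, the `eval` call signature, and the array shapes are assumptions, not the project's real API:

```python
import numpy as np
from unittest.mock import MagicMock

# Hypothetical stand-in for a trained model: eval() returns
# (energy, force, virial) arrays with illustrative shapes.
dp = MagicMock()
nframes, natoms = 2, 4
dp.eval.return_value = (
    np.zeros((nframes, 1)),           # energy per frame
    np.zeros((nframes, natoms * 3)),  # forces, flattened per frame
    np.zeros((nframes, 9)),           # virial per frame
)

# The real test code would now consume these predictions unchanged,
# so the error-collection logic under test is exercised directly.
energy, force, virial = dp.eval(coords=None, cells=None, atom_types=None)
```

Mocking the model rather than re-implementing the error pipeline keeps the UT coupled to the code in `deepmd/entrypoints/test.py`.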

Comment on lines 11 to 44
def fake_test(all_sys):
    err_coll = []
    for sys_data in all_sys:
        err, find_energy, find_force, find_virial = sys_data
        err_part = {}
        if find_energy == 1:
            err_part["mae_e"] = err["mae_e"]
            err_part["mae_ea"] = err["mae_ea"]
            err_part["rmse_e"] = err["rmse_e"]
            err_part["rmse_ea"] = err["rmse_ea"]
        if find_force == 1:
            if "rmse_f" in err:
                err_part["mae_f"] = err["mae_f"]
                err_part["rmse_f"] = err["rmse_f"]
            else:
                err_part["mae_fr"] = err["mae_fr"]
                err_part["rmse_fr"] = err["rmse_fr"]
                err_part["mae_fm"] = err["mae_fm"]
                err_part["rmse_fm"] = err["rmse_fm"]
        if find_virial == 1:
            err_part["mae_v"] = err["mae_v"]
            err_part["rmse_v"] = err["rmse_v"]
        err_coll.append(err_part)
    avg_err = weighted_average(err_coll)
    return avg_err


def fake_test_ori(all_sys):
    err_coll = []
    for sys_data in all_sys:
        err, _, _, _ = sys_data
        err_coll.append(err)
    avg_err = weighted_average(err_coll)
    return avg_err
Collaborator


How can you test the correctness of your modification in deepmd/entrypoints/test.py?

Right now the UT is completely independent of your modification!

Comment on lines 146 to 165
err_part = {}

if find_energy == 1:
    err_part["mae_e"] = err["mae_e"]
    err_part["mae_ea"] = err["mae_ea"]
    err_part["rmse_e"] = err["rmse_e"]
    err_part["rmse_ea"] = err["rmse_ea"]

if find_force == 1:
    if "rmse_f" in err:
        err_part["mae_f"] = err["mae_f"]
        err_part["rmse_f"] = err["rmse_f"]
    else:
        err_part["mae_fr"] = err["mae_fr"]
        err_part["rmse_fr"] = err["rmse_fr"]
        err_part["mae_fm"] = err["mae_fm"]
        err_part["rmse_fm"] = err["rmse_fm"]
if find_virial == 1:
    err_part["mae_v"] = err["mae_v"]
    err_part["rmse_v"] = err["rmse_v"]
Collaborator


Note that your modification is not covered by the UT.

deepmd/entrypoints/test.py: outdated review thread (resolved)
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
source/tests/pt/test_weighted_avg.py (4)

4-4: Remove unused import.

The DeepPot import is not used in the code.

-from deepmd.infer.deep_pot import DeepPot  # Import DeepPot
🧰 Tools
🪛 Ruff (0.8.2)

4-4: deepmd.infer.deep_pot.DeepPot imported but unused

Remove unused import: deepmd.infer.deep_pot.DeepPot

(F401)


26-71: Enhance test coverage with available mock data.

The test currently uses only the first system's data from base_data, while mock data for three systems is defined. Consider:

  1. Testing with multiple systems to validate the weighted average calculation
  2. Adding test cases for different combinations of flags (find_energy, find_force, find_virial)

Example enhancement:

# Test multiple systems
for system_data in base_data:
    mock_test_ener.return_value = (
        system_data,
        1,  # find_energy
        1,  # find_force
        1,  # find_virial
    )
    test(...)  # Run test with each system

74-81: Add test cases for edge cases and different parameter combinations.

Consider adding test cases for:

  1. numb_test=0 to test the infinite test case
  2. shuffle_test=False to test both shuffling scenarios
  3. Different combinations of atomic and detail_file parameters

83-97: Enhance test assertions for better coverage.

Current assertions could be expanded to:

  1. Verify all metrics in the error dictionary (force, virial)
  2. Check the content and format of the detail_file
  3. Validate the structure of the weighted average results

Example additions:

# Verify detail file content
mock_open.assert_called_with("mock_detail.txt", 'w')
mock_save_txt.assert_called()  # Add mock for save_txt_file

# Verify error dictionary structure
result = mock_weighted_avg.return_value
expected_keys = ['mae_e', 'rmse_e', 'mae_f', 'rmse_f', 'mae_v', 'rmse_v']
self.assertTrue(all(key in result for key in expected_keys))
deepmd/entrypoints/test.py (2)

477-481: Improve readability of virial logging condition.

The condition for virial logging combines multiple checks in a single line. Consider making the requirements more explicit.

-    if data.pbc and not out_put_spin and find_virial == 1:
+    # Only log virial if:
+    # 1. System has periodic boundary conditions
+    # 2. Not in spin output mode
+    # 3. Virial calculation was requested
+    should_log_virial = data.pbc and not out_put_spin and find_virial == 1
+    if should_log_virial:

588-624: Expand test coverage for return value variations.

While the new test file improves coverage, consider adding tests for:

  1. Different combinations of find_* flags
  2. Spin output cases
  3. Edge cases where some metrics are missing

Would you like me to help create additional test cases for these scenarios?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f669b8 and 5273168.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_weighted_avg.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
source/tests/pt/test_weighted_avg.py

4-4: deepmd.infer.deep_pot.DeepPot imported but unused

Remove unused import: deepmd.infer.deep_pot.DeepPot

(F401)

deepmd/entrypoints/test.py

626-626: Undefined name dict_to_return

(F821)


627-627: Undefined name dict_to_return

(F821)


628-628: Undefined name dict_to_return

(F821)

⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Analyze (python)
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (2)
source/tests/pt/test_weighted_avg.py (1)

8-13: LGTM! Well-structured mock setup.

The test method follows best practices by:

  • Mocking all external dependencies
  • Using patch decorators correctly
  • Following unittest framework conventions
deepmd/entrypoints/test.py (1)

137-168: LGTM! Improved error filtering with consistent flag usage.

The implementation now properly uses flags for both error filtering and logging, addressing the previous inconsistency concerns.

Comment on lines 625 to 628
if dp.has_hessian:
    dict_to_return["mae_h"] = (mae_h, hessian.size)
    dict_to_return["rmse_h"] = (rmse_h, hessian.size)
return dict_to_return
Contributor


⚠️ Potential issue

Fix unreachable code after return statement.

The Hessian-related code is unreachable as it appears after the return statements.

Move the Hessian handling before the return statements:

    if dp.has_hessian:
-        dict_to_return["mae_h"] = (mae_h, hessian.size)
-        dict_to_return["rmse_h"] = (rmse_h, hessian.size)
-    return dict_to_return
+        err["mae_h"] = (mae_h, hessian.size)
+        err["rmse_h"] = (rmse_h, hessian.size)
     if not out_put_spin:
         return (
             {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    if dp.has_hessian:
        err["mae_h"] = (mae_h, hessian.size)
        err["rmse_h"] = (rmse_h, hessian.size)
    if not out_put_spin:
        return (
            {
                # ... remaining code for constructing the return value ...
            }
        )
🧰 Tools
🪛 Ruff (0.8.2)

626-626: Undefined name dict_to_return

(F821)


627-627: Undefined name dict_to_return

(F821)


628-628: Undefined name dict_to_return

(F821)

Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
deepmd/entrypoints/test.py (1)

625-628: ⚠️ Potential issue

Remove unreachable Hessian code.

This code block is unreachable as it appears after return statements. Additionally, it uses an undefined variable dict_to_return.

Remove the unreachable code:

-    if dp.has_hessian:
-        dict_to_return["mae_h"] = (mae_h, hessian.size)
-        dict_to_return["rmse_h"] = (rmse_h, hessian.size)
-    return dict_to_return
🧰 Tools
🪛 Ruff (0.8.2)

626-626: Undefined name dict_to_return

(F821)


627-627: Undefined name dict_to_return

(F821)


628-628: Undefined name dict_to_return

(F821)

🧹 Nitpick comments (2)
deepmd/entrypoints/test.py (2)

137-167: Consider using a safer approach for error dictionary modification.

The current implementation modifies the error dictionary by creating a new one and reassigning it. This could lead to issues if any code relies on the original dictionary structure.

Consider this safer approach that preserves the original dictionary:

-    err_part = {}
-    if find_energy == 1:
-        err_part["mae_e"] = err["mae_e"]
-        err_part["mae_ea"] = err["mae_ea"]
-        err_part["rmse_e"] = err["rmse_e"]
-        err_part["rmse_ea"] = err["rmse_ea"]
-    if find_force == 1:
-        if "rmse_f" in err:
-            err_part["mae_f"] = err["mae_f"]
-            err_part["rmse_f"] = err["rmse_f"]
-        else:
-            err_part["mae_fr"] = err["mae_fr"]
-            err_part["rmse_fr"] = err["rmse_fr"]
-            err_part["mae_fm"] = err["mae_fm"]
-            err_part["rmse_fm"] = err["rmse_fm"]
-    if find_virial == 1:
-        err_part["mae_v"] = err["mae_v"]
-        err_part["rmse_v"] = err["rmse_v"]
-    err = err_part
+    # Create a set of keys to keep based on flags
+    keys_to_keep = set()
+    if find_energy == 1:
+        keys_to_keep.update(["mae_e", "mae_ea", "rmse_e", "rmse_ea"])
+    if find_force == 1:
+        if "rmse_f" in err:
+            keys_to_keep.update(["mae_f", "rmse_f"])
+        else:
+            keys_to_keep.update(["mae_fr", "rmse_fr", "mae_fm", "rmse_fm"])
+    if find_virial == 1:
+        keys_to_keep.update(["mae_v", "rmse_v"])
+    # Filter the original dictionary
+    err = {k: v for k, v in err.items() if k in keys_to_keep}

477-477: Apply consistent conditional logging for all metrics.

While virial logging is now conditional on find_virial, similar conditions should be applied to energy and force logging for consistency.

Consider applying similar conditions to energy and force logging:

-    log.info(f"Energy MAE         : {mae_e:e} eV")
-    log.info(f"Energy RMSE        : {rmse_e:e} eV")
+    if find_energy == 1:
+        log.info(f"Energy MAE         : {mae_e:e} eV")
+        log.info(f"Energy RMSE        : {rmse_e:e} eV")

-    if not out_put_spin:
+    if not out_put_spin and find_force == 1:
         log.info(f"Force  MAE         : {mae_f:e} eV/A")
         log.info(f"Force  RMSE        : {rmse_f:e} eV/A")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5273168 and 57ba28f.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (5 hunks)
  • source/tests/pt/test_weighted_avg.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt/test_weighted_avg.py
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Analyze (c-cpp)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
🔇 Additional comments (2)
deepmd/entrypoints/test.py (2)

333-335: LGTM! The flags are now properly utilized.

The flags find_energy, find_force, and find_virial are now correctly used in the error filtering logic, addressing the previous "unused variables" concern.


588-624: LGTM! Return signature changes improve type safety and information flow.

The updated return signature now includes flags and properly handles both spin and non-spin cases, making the function's behavior more explicit and type-safe.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
deepmd/entrypoints/test.py (1)

385-387: ⚠️ Potential issue

Resolve inconsistency in flag usage.

The flags find_energy, find_force, and find_virial were previously marked for removal but are now used in conditional logging.

Choose one approach:

  1. Keep the flags and use them consistently throughout the code
  2. Remove the flags and always log all available metrics

If keeping the flags, ensure they are used consistently and document their purpose.

🧹 Nitpick comments (3)
deepmd/entrypoints/test.py (3)

257-290: Add type hints to new functions.

The new error handling functions lack type hints, which is inconsistent with the rest of the codebase. Also, consider grouping the parameters into a data class to improve maintainability.

Apply this diff to add type hints and improve code structure:

+from dataclasses import dataclass
+from typing import Dict, Tuple, Union
+
+@dataclass
+class EnergyTestData:
+    find_energy: int
+    find_force: int
+    find_virial: int
+    energy: np.ndarray
+    force: np.ndarray
+    virial: np.ndarray
+    mae_e: float
+    mae_ea: float
+    mae_f: float
+    mae_v: float
+    mae_va: float
+    rmse_e: float
+    rmse_ea: float
+    rmse_f: float
+    rmse_v: float
+    rmse_va: float

-def test_ener_err(
-    find_energy,
-    find_force,
-    find_virial,
-    energy,
-    force,
-    virial,
-    mae_e,
-    mae_ea,
-    mae_f,
-    mae_v,
-    mae_va,
-    rmse_e,
-    rmse_ea,
-    rmse_f,
-    rmse_v,
-    rmse_va,
-):
+def test_ener_err(data: EnergyTestData) -> Dict[str, Tuple[float, int]]:
     err = {}
-    if find_energy == 1:
-        err["mae_e"] = (mae_e, energy.size)
-        err["mae_ea"] = (mae_ea, energy.size)
-        err["rmse_e"] = (rmse_e, energy.size)
-        err["rmse_ea"] = (rmse_ea, energy.size)
+    if data.find_energy == 1:
+        err["mae_e"] = (data.mae_e, data.energy.size)
+        err["mae_ea"] = (data.mae_ea, data.energy.size)
+        err["rmse_e"] = (data.rmse_e, data.energy.size)
+        err["rmse_ea"] = (data.rmse_ea, data.energy.size)

Similar changes should be applied to test_ener_err_ops.

Also applies to: 292-330


516-530: Improve logging structure.

The conditional logging blocks could be refactored for better readability and maintainability.

Consider using a logging helper function:

+def log_energy_metrics(mae_e: float, rmse_e: float, mae_ea: float, rmse_ea: float) -> None:
+    log.info(f"Energy MAE         : {mae_e:e} eV")
+    log.info(f"Energy RMSE        : {rmse_e:e} eV")
+    log.info(f"Energy MAE/Natoms  : {mae_ea:e} eV")
+    log.info(f"Energy RMSE/Natoms : {rmse_ea:e} eV")

-    if find_energy == 1:
-        log.info(f"Energy MAE         : {mae_e:e} eV")
-        log.info(f"Energy RMSE        : {rmse_e:e} eV")
-        log.info(f"Energy MAE/Natoms  : {mae_ea:e} eV")
-        log.info(f"Energy RMSE/Natoms : {rmse_ea:e} eV")
+    if find_energy == 1:
+        log_energy_metrics(mae_e, rmse_e, mae_ea, rmse_ea)

257-290: Add unit tests for new error handling functions.

The new error handling functions test_ener_err and test_ener_err_ops need test coverage.

Would you like me to generate unit tests for these functions to ensure they handle various input scenarios correctly?

Also applies to: 292-330

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57ba28f and 687b08d.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (4 hunks)
  • source/tests/pt/test_weighted_avg.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt/test_weighted_avg.py
⏰ Context from checks skipped due to timeout of 90000ms (20)
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Test Python (6, 3.12)
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Test Python (6, 3.9)
  • GitHub Check: Test Python (5, 3.12)
  • GitHub Check: Test Python (5, 3.9)
  • GitHub Check: Test Python (4, 3.12)
  • GitHub Check: Test Python (4, 3.9)
  • GitHub Check: Test Python (3, 3.12)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test Python (3, 3.9)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Test Python (2, 3.12)
  • GitHub Check: Test Python (2, 3.9)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test Python (1, 3.9)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (c-cpp)
🔇 Additional comments (1)
deepmd/entrypoints/test.py (1)

682-685: ⚠️ Potential issue

Fix unreachable code after return statement.

The Hessian-related code is unreachable as it appears after the return statements.

Move the Hessian handling before the return statements:

     if dp.has_hessian:
-        dict_to_return["mae_h"] = (mae_h, hessian.size)
-        dict_to_return["rmse_h"] = (rmse_h, hessian.size)
-    return dict_to_return
+        err["mae_h"] = (mae_h, hessian.size)
+        err["rmse_h"] = (rmse_h, hessian.size)
     if not out_put_spin:
         return (
             {
🧰 Tools
🪛 Ruff (0.8.2)

683-683: Undefined name dict_to_return

(F821)


684-684: Undefined name dict_to_return

(F821)


685-685: Undefined name dict_to_return

(F821)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (1)
deepmd/entrypoints/test.py (1)

687-690: ⚠️ Potential issue

Fix unreachable Hessian code.

This code block is unreachable as it appears after the return statements.

Move the Hessian handling before the return statements:

-    if dp.has_hessian:
-        dict_to_return["mae_h"] = (mae_h, hessian.size)
-        dict_to_return["rmse_h"] = (rmse_h, hessian.size)
-    return dict_to_return
+    err = {}
+    if dp.has_hessian:
+        err["mae_h"] = (mae_h, hessian.size)
+        err["rmse_h"] = (rmse_h, hessian.size)
+    if not out_put_spin:
+        return test_ener_err(...)
+    else:
+        return test_ener_err_ops(...)
🧰 Tools
🪛 Ruff (0.8.2)

688-688: Undefined name dict_to_return

(F821)


689-689: Undefined name dict_to_return

(F821)


690-690: Undefined name dict_to_return

(F821)

🧹 Nitpick comments (1)
source/tests/pt/test_weighted_avg.py (1)

171-176: Enhance test data validation.

The test uses oversimplified mock data with constant values ([1.0] * 10, [0.1] * 10, etc.). Consider adding test cases with:

  • Different array sizes
  • Edge cases (zeros, large numbers)
  • Random data to catch potential numerical issues
-                energy = np.array([1.0] * 10)
-                force = np.array([0.1] * 10)
-                virial = np.array([0.2] * 10)
+                # Test different array sizes and data patterns
+                sizes = [10, 100]
+                for size in sizes:
+                    # Regular case
+                    energy = np.random.random(size)
+                    force = np.random.random(size)
+                    virial = np.random.random(size)
+                    # Edge cases
+                    energy_edge = np.zeros(size)
+                    force_edge = np.array([1e10] * size)
+                    virial_edge = np.array([-1e10] * size)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 687b08d and 7ea19b2.

📒 Files selected for processing (2)
  • deepmd/entrypoints/test.py (4 hunks)
  • source/tests/pt/test_weighted_avg.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (16)
  • GitHub Check: Test Python (1, 3.12)
  • GitHub Check: Test C++ (false)
  • GitHub Check: Test C++ (true)
  • GitHub Check: Build wheels for cp310-manylinux_aarch64
  • GitHub Check: Build wheels for cp311-win_amd64
  • GitHub Check: Build wheels for cp311-macosx_arm64
  • GitHub Check: Build wheels for cp311-macosx_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build wheels for cp311-manylinux_x86_64
  • GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
  • GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
  • GitHub Check: Build C++ (clang, clang)
  • GitHub Check: Build C++ (rocm, rocm)
  • GitHub Check: Build C++ (cuda120, cuda)
  • GitHub Check: Build C++ (cuda, cuda)
  • GitHub Check: Build C++ (cpu, cpu)
🔇 Additional comments (3)
deepmd/entrypoints/test.py (3)

257-289: LGTM! Well-structured error handling.

The function follows good practices:

  • Single responsibility principle
  • Conditional error reporting based on flags
  • Includes data size with each error metric

292-331: LGTM! Consistent error handling for magnetic systems.

The function maintains consistency with test_ener_err while adding support for magnetic force components.


387-391:

✅ Verification successful

Verify flag usage in the codebase.

The flags are used for both error calculation and logging. Let's verify their usage across the codebase.


🏁 Script executed:

#!/bin/bash
# Search for all usages of these flags
echo "Searching for flag usage patterns..."
rg "find_(energy|force|virial|force_real|force_mag)" --type py

echo "Searching for test data flag definitions..."
rg "test_data\.get\([\"']find_" --type py

Length of output: 13895


Flag usage verified across the codebase – no changes needed.

The search results confirm that the error and logging flags (“find_energy”, “find_force”, “find_virial”, “find_force_real”, and “find_force_mag”) are consistently used in multiple modules (e.g., in deepmd/dpmodel/loss/ener.py, deepmd/tf/loss/ener.py, and various test files). In particular, the definitions in deepmd/entrypoints/test.py are in line with their usage elsewhere in the codebase.

  • Flag definitions via test_data.get(...) in deepmd/entrypoints/test.py are used correctly in conditional checks.
  • Multiplicative error calculations and logging (e.g., in deepmd/dpmodel/loss/ener.py) consistently reference these flags.
  • Test files also verify these flags, confirming that both error evaluation and logging functionality follow the expected pattern.

Overall, the flag usage is coherent and properly integrated.

)


class TestEnerErrFunctions(unittest.TestCase):
Contributor


🛠️ Refactor suggestion

Expand test coverage.

The current test suite only validates the error calculation functions in isolation. Consider adding tests for:

  1. Integration with the DeePMD framework
  2. Error logging functionality
  3. Edge cases in the main test function

Add a new test class:

class TestEnerIntegration(unittest.TestCase):
    def setUp(self):
        # Mock DeePMD framework components
        self.dp = MagicMock()
        self.data = MagicMock()
        
    def test_error_logging(self):
        # Test that errors are logged correctly
        with self.assertLogs() as logs:
            test_ener(self.dp, self.data, "test_system", 10, None, False)
            self.assertIn("Energy MAE", logs.output[0])
            
    def test_edge_cases(self):
        # Test handling of empty data
        # Test handling of invalid flags
        # Test handling of NaN/Inf values
        pass

rmse_va,
)

self.assertCountEqual(list(err.keys()), expected_key)
Contributor


🛠️ Refactor suggestion

Validate error calculations.

The test only checks the structure of return values (keys) but doesn't validate the actual error calculations.

+                # Validate error calculations
+                expected_mae_e = np.mean(np.abs(energy))
+                expected_rmse_e = np.sqrt(np.mean(np.square(energy)))
+                self.assertAlmostEqual(err['mae_e'][0], expected_mae_e)
+                self.assertAlmostEqual(err['rmse_e'][0], expected_rmse_e)
📝 Committable suggestion


Suggested change
    self.assertCountEqual(list(err.keys()), expected_key)
    # Validate error calculations
    expected_mae_e = np.mean(np.abs(energy))
    expected_rmse_e = np.sqrt(np.mean(np.square(energy)))
    self.assertAlmostEqual(err['mae_e'][0], expected_mae_e)
    self.assertAlmostEqual(err['rmse_e'][0], expected_rmse_e)

3 participants