Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: Centralize Database Management with BaseDatabaseManager and Migrate Derived Managers #341

Open
wants to merge 63 commits into
base: main
Choose a base branch
from

Conversation

willis89pr
Copy link
Collaborator

@willis89pr willis89pr commented Feb 5, 2025

Summary

This pull request introduces a significant refactor to improve the maintainability, modularity, and extensibility of the database management system within the codebase. A new abstract base class, BaseDatabaseManager, has been introduced to centralize and standardize core database management logic. Derived classes, such as JSDatabaseManager and NativeLibDatabaseManager, have been refactored to extend this new base class, reducing code duplication and improving reusability.

Proposed changes

Feature: Added BaseDatabaseManager Abstract Class
Provides a standardized framework for managing database files.
Includes reusable properties and methods for:
File path handling (data_dir, database_version_file_path, database_file_path).
Database loading (load_db), saving (save_database), and retrieving (get_database).
Metadata encapsulation via the pattern_info property.
Enforces implementation of parse_raw_data in derived classes to handle raw data parsing.
Improves code readability with type hints and the use of pathlib.Path for file paths.

Refactor: Migrated JSDatabaseManager to Extend BaseDatabaseManager
Removed redundant attributes and methods now handled by the base class (e.g., pattern_key, pattern_file, source, etc.).
Implemented the data_dir property to specify the directory for JavaScript library database files.
Added a custom parse_raw_data method to handle RetireJS database data.
Simplified database management by leveraging reusable functionality from BaseDatabaseManager.

Refactor: Migrated NativeLibDatabaseManager to Extend BaseDatabaseManager
Removed redundant attributes and methods now handled by the base class.
Implemented the data_dir property to specify the directory for Native Library database files.
Added a custom parse_raw_data method to handle EMBA configuration file data.
Enhanced parse_raw_data to validate and handle malformed lines gracefully.
Simplified database management by leveraging reusable functionality from BaseDatabaseManager.

TOML Structure Examples

.local/share/surfactant/native_lib_patterns/native_lib_patterns.toml

[emba]
file = "native_lib_patterns_emba.json"
source = "https://raw.githubusercontent.com/e-m-b-a/emba/11d6c281189c3a14fc56f243859b0bccccce8b9a/config/bin_version_strings.cfg"
hash = "efa7fccb857d9a96f2d5ca009d27acec08611f67a74549dcc0ef1b11bc25134f"
timestamp = "2025-02-12T19:25:55.323202+00:00"

.local/share/surfactant/js_library_patterns/js_library_patterns.toml

[retirejs]
file = "js_library_patterns_retirejs.json"
source = "https://raw.githubusercontent.com/RetireJS/retire.js/master/repository/jsrepository-master.json"
hash = "fd1467b96783a5f282a12f70acae09163dd5b57d13f5f4a4c99d4d64d1318b83"
timestamp = "2025-02-12T19:26:03.877387+00:00"

[abc]
file = "js_library_patterns_abc.json"
source = "https://raw.githubusercontent.com/abc.json"
hash = "123"
timestamp = "2025-02-12T18:23:13.374491+00:00"

…agement

- Added `BaseDatabaseManager` as an abstract base class to centralize and standardize database management logic.
  - Includes properties for `data_dir`, `database_version_file_path`, and `database_file_path`.
  - Provides methods for loading (`load_db`), saving (`save_database`), and retrieving (`get_database`) database files.
  - Introduced `pattern_info` property to encapsulate metadata about database patterns.
  - Added `parse_raw_data` as an abstract method to enforce implementation of raw data parsing in derived classes.

- Refactored the module to improve modularity and reusability:
  - Consolidated database file handling into `BaseDatabaseManager`.
  - Introduced `Path` from `pathlib` for cleaner file path handling.
  - Added type hints and improved code readability.

- Retained existing utility functions (`download_database`, `calculate_hash`, etc.) for backward compatibility.

This change enhances the extensibility and maintainability of the database management system by introducing a clear abstraction layer.
- Refactored `JSDatabaseManager` to inherit from the new `BaseDatabaseManager` class, improving modularity and code reuse.
  - Removed redundant attributes and methods now handled by `BaseDatabaseManager`, such as `pattern_key`, `pattern_file`, `source`, and `load_db`.
  - Implemented the `data_dir` property to specify the directory for storing JavaScript library database files.
  - Added `parse_raw_data` method to define custom parsing logic for RetireJS database data.

- Updated imports to include `BaseDatabaseManager` from `database_utils`.

- Simplified database initialization and management by leveraging `BaseDatabaseManager`'s reusable functionality.

- Retained backward compatibility for existing hooks (`extract_file_info`, `update_db`, etc.).

This change enhances maintainability and extensibility by centralizing shared database management logic in `BaseDatabaseManager`.
…anager`

- Refactored `NativeLibDatabaseManager` to inherit from `BaseDatabaseManager`:
  - Removed redundant attributes (`pattern_key`, `pattern_file`, `source`, etc.) as they are now handled by the base class.
  - Implemented the `data_dir` property to specify the directory for storing Native Library database files.
  - Added `parse_raw_data` method to define custom parsing logic for the EMBA configuration file.
  - Simplified database loading and management by leveraging reusable functionality from `BaseDatabaseManager`.

- Updated imports to include `BaseDatabaseManager` from `database_utils`.

- Enhanced `parse_raw_data` to validate and handle malformed lines in the EMBA configuration file gracefully.

- Retained backward compatibility for existing hooks (`extract_file_info`, `update_db`, etc.).

This refactor improves code maintainability, modularity, and extensibility by centralizing shared database management logic in the `BaseDatabaseManager` class.
@willis89pr willis89pr requested a review from nightlark February 5, 2025 22:49
willis89pr and others added 17 commits February 6, 2025 22:01
…r child classes so that the parent class can notify the user with the correct plugin update command.
pre-commit-ci bot and others added 23 commits February 11, 2025 22:41
Did not intend to commit this file.
…ndling

- **Performance Improvements:**
  - Avoid redundant hash calculations by computing  once and reusing it.
  - Skip unnecessary processing when the database is already up-to-date.

- **Error Handling:**
  - Added a  block to handle potential  during JSON parsing.
  - Return a clear error message if the database download fails or the content is invalid.

- **Code Readability:**
  - Refactored the function into logical steps with clear comments for better maintainability.
  - Improved naming consistency ( for clarity).

- **Consistency:**
  - Standardized return messages to provide clear feedback across all scenarios.
@willis89pr willis89pr requested a review from nightlark February 12, 2025 19:32
willis89pr and others added 5 commits February 18, 2025 10:50
Added docstring to Databaseconfig.
feat: Add validation to DatabaseConfig for source, version_file_name, and database_file

- Added a `__post_init__` method to the `DatabaseConfig` class to perform validation on its attributes.
  - Validates that `source` is either "file" or a valid URL with an "http" or "https" scheme.
  - Ensures `version_file_name` does not include a file extension.
  - Ensures `database_file` ends with the ".json" extension.

- Improved error handling with descriptive `ValueError` messages for invalid inputs.

These changes enhance the robustness of the `DatabaseConfig` class by ensuring that its attributes conform to expected formats and constraints, reducing the likelihood of runtime errors.
Copy link
Collaborator

@nightlark nightlark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should create a subfolder for the version file and databases as a sort of namespace. It will help keep us and 3rd party plugin developers that might leverage the class from accidentally creating a database .json file with a name that conflicts with the one used in another plugin.

@abstractmethod
def data_dir(self) -> Path:
"""Returns the base directory for storing database (.json) and version tracking (TOML) files."""
return ConfigManager().get_data_dir_path()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ConfigManager().get_data_dir_path()
return ConfigManager().get_data_dir_path() / self.config.version_file_name

@property
def database_version_file_path(self) -> Path:
"""Path to the database version file (e.g., TOML file)."""
return self.data_dir / f"{self.config.version_file_name}.toml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.data_dir / f"{self.config.version_file_name}.toml"
return self.data_dir / "version_info.toml"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants