dataserious is a Python package that enhances dataclasses with type validation, serialization, and search space generation. It builds on the standard dataclasses module to add functionality for configuration management. Its only dependency is pyyaml, which is used for YAML support.
- Type Validation: Ensures that the attributes of the dataclass instances match their type annotations.
- Serialization: Supports serialization to and from JSON and YAML formats.
- Search Space Generation: Generates search trees for hyperparameter tuning in grid and random search.
You can install dataserious with YAML support using pip:
pip install "dataserious[yaml] @ git+https://github.com/Noza23/dataserious.git"
To define a configuration class, inherit from BaseConfig and use ConfigField for fields that require additional metadata.
from dataserious import BaseConfig, ConfigField

class ModelConfig(BaseConfig):
    name: str
    """Name of the model."""
    n_layers: int = ConfigField(searchable=True, description="Number of layers in the model.")
    n_heads: int = ConfigField(searchable=True)
    """Number of heads in the model."""
ModelConfig.to_schema()
ModelConfig.schema_to_yaml("schema.yaml")
name: 'str: Name of the model.'
n_layers: 'int: Number of layers in the model.'
n_heads: 'int: Number of heads in the model.'
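To make the schema output above more concrete, here is a minimal sketch of how a "type: description" mapping can be derived from a plain standard-library dataclass. It does not use dataserious and is not its implementation; the class name ModelConfigSketch, the helper schema_of, and the use of field metadata to carry descriptions are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for ModelConfig, built on plain dataclasses."""

    # Descriptions are stored in field metadata (an assumption of this sketch).
    name: str = field(metadata={"description": "Name of the model."})
    n_layers: int = field(metadata={"description": "Number of layers in the model."})
    n_heads: int = field(metadata={"description": "Number of heads in the model."})


def schema_of(cls) -> dict:
    """Map each field name to '<type name>: <description>'."""
    result = {}
    for f in fields(cls):
        # f.type is a class here; fall back to str() for string annotations.
        type_name = getattr(f.type, "__name__", str(f.type))
        result[f.name] = f"{type_name}: {f.metadata.get('description', '')}"
    return result


schema = schema_of(ModelConfigSketch)
for key, value in schema.items():
    print(f"{key}: {value!r}")
```

The printed lines have the same shape as the YAML schema shown above, which is the only point of the sketch.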
You can load and save configurations in JSON and YAML formats.
config = ModelConfig(name="GPT", n_layers=12, n_heads=12)
config.to_yaml("config.yaml")
loaded_config = ModelConfig.from_yaml("config.yaml")
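The round trip above can be illustrated with the standard library alone. The following is a rough sketch using JSON and a plain dataclass, not dataserious itself; ModelConfigSketch is a hypothetical stand-in, and the way dataserious serializes internally is not shown in this README.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for ModelConfig, built on plain dataclasses."""

    name: str
    n_layers: int
    n_heads: int


original = ModelConfigSketch(name="GPT", n_layers=12, n_heads=12)

# Serialize: dataclass -> dict -> JSON text.
text = json.dumps(asdict(original), indent=4)

# Deserialize: JSON text -> dict -> dataclass.
restored = ModelConfigSketch(**json.loads(text))

print(restored == original)
```

Dataclasses compare field by field, so an equality check is a quick way to confirm that nothing was lost in the round trip.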
Generate search trees for hyperparameter tuning.
print(config.to_search_tree())
config.search_tree_to_yaml("search_tree.yaml")
n_layers:
- int
n_heads:
- int
Filled out search tree test.yaml:
n_layers:
- 12
- 24
- 36
n_heads:
- 12
- 24
- 36
Loading the search tree and generating search spaces:
configs = config.get_configs_grid_from_path("test.yaml")
configs_random = config.get_configs_random_from_path("test.yaml", n=2, seed=42)
Resulting Configs:
import json
for cfg in configs:
    print(json.dumps(cfg.to_dict(), indent=4))
print("\nRandom configs:")
for cfg in configs_random:
    print(json.dumps(cfg.to_dict(), indent=4))
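For intuition about what grid and random search over such a tree produce, here is a minimal standard-library sketch. It is not dataserious code: the helpers grid and random_subset are hypothetical, and drawing the random configurations from the full grid without replacement is an assumption, since this README does not say how the library samples.

```python
import itertools
import random

# The filled-out search tree from test.yaml, written as a plain dict.
search_tree = {"n_layers": [12, 24, 36], "n_heads": [12, 24, 36]}


def grid(tree):
    """Return every combination of candidate values (Cartesian product)."""
    keys = list(tree)
    combos = itertools.product(*(tree[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def random_subset(tree, n, seed):
    """Draw n distinct grid points, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(grid(tree), n)


grid_points = grid(search_tree)
random_points = random_subset(search_tree, n=2, seed=42)

print(len(grid_points))  # 3 values x 3 values = 9 combinations
for point in random_points:
    print(point)
```

With two searchable fields of three candidates each, the grid holds nine points, while the random variant returns only as many as requested.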
Ensure that the configurations match the expected types.
try:
    config = ModelConfig(name="GPT", n_layers="twelve", n_heads=12)
except TypeError as e:
    print(e)
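The general idea behind this kind of check can be sketched with a plain dataclass that validates itself in __post_init__. This is an illustration under stated assumptions, not how dataserious is implemented: ValidatedConfigSketch and its error message are hypothetical, and the check only handles simple classes such as str and int, not generics like list[int].

```python
from dataclasses import dataclass, fields


@dataclass
class ValidatedConfigSketch:
    """Hypothetical stand-in: checks each attribute against its annotation."""

    name: str
    n_layers: int
    n_heads: int

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ has assigned all fields.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


message = ""
try:
    ValidatedConfigSketch(name="GPT", n_layers="twelve", n_heads=12)
except TypeError as e:
    message = str(e)
    print(message)
```

Validating at construction time means an invalid configuration fails immediately, rather than later when the value is first used.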
Contributions are welcome!