Validator for Biothings Studio Plugin #382

Merged 123 commits on Apr 16, 2025
123 commits
1d5610b
test get mapping pydantic model
jal347 Oct 25, 2024
6596ef7
try to log mapping
jal347 Oct 25, 2024
0ecb4e2
add logging
jal347 Oct 25, 2024
e9c20ec
add logging setup to source
jal347 Oct 25, 2024
7a1d65a
test if model_schema is formatted correctly
jal347 Oct 26, 2024
df7e82c
test if model_schema is formatted correctly
jal347 Oct 26, 2024
a36e77a
test mongo query with pydantic
jal347 Oct 28, 2024
28af0f6
test source endpoint to run validation through uploader
jal347 Oct 31, 2024
04a0ed2
test source endpoint using uploader manager
jal347 Oct 31, 2024
a7e2e6f
test model validation
jal347 Oct 31, 2024
254ef0a
added optional to pydantic field mapping
jal347 Oct 31, 2024
5a60db7
fixed optional pydantic field mapping
jal347 Oct 31, 2024
8be300f
implemented job manager with validating sources
jal347 Oct 31, 2024
86cf9e9
added partial to submit to job manager
jal347 Oct 31, 2024
9b3ef26
test defer to process
jal347 Nov 1, 2024
83d60f6
commented out motor import
jal347 Nov 1, 2024
ab01d03
moved validate_src to basesourceuploader to attempt to pickle data
jal347 Nov 1, 2024
b22427b
prepare before unpreparing
jal347 Nov 1, 2024
43d76e2
test if removing validation model fixes pickling issue
jal347 Nov 1, 2024
de7f890
attempted fix at pickling objects
jal347 Nov 1, 2024
e2d70be
set model to none to attempt to make it picklable
jal347 Nov 1, 2024
e2c5090
move getting the mapping to uploader
jal347 Nov 4, 2024
63298df
added more loggers
jal347 Nov 4, 2024
b123141
print the uploader_manager
jal347 Nov 4, 2024
c85db25
print the uploader_manager
jal347 Nov 4, 2024
d6dc300
print the uploader_manager
jal347 Nov 4, 2024
79a8524
test creating model with defer to process
jal347 Nov 5, 2024
2eb2c5f
fixed logging
jal347 Nov 5, 2024
5cde456
purposely fail doc
jal347 Nov 5, 2024
a3b7f1b
added test_api to default comfig
jal347 Nov 12, 2024
b8772ce
test model string
jal347 Nov 12, 2024
3356850
try storing model
jal347 Nov 12, 2024
ac83dde
test storing model
jal347 Nov 12, 2024
cb50ab8
corrected os.join to os.path.join
jal347 Nov 12, 2024
f2a038f
try to find correct path of source
jal347 Nov 13, 2024
5096e0e
attempt to find path of module
jal347 Nov 13, 2024
4bacbff
try to get file path within basesourceuploader
jal347 Nov 14, 2024
87c7502
fixed logger typo
jal347 Nov 14, 2024
f7db114
changed where pydantic validator stores model and moved model string …
jal347 Nov 14, 2024
0310f60
logging directory paths
jal347 Nov 14, 2024
704acf7
get module path to store model
jal347 Nov 14, 2024
8b7340f
try to get module path
jal347 Nov 14, 2024
87e712f
trying to get module_path again
jal347 Nov 14, 2024
aadd109
trying to get module_path again
jal347 Nov 14, 2024
f2bd295
trying to get module name
jal347 Nov 14, 2024
22125ca
fixed get_module_path
jal347 Nov 14, 2024
fae25d1
fixed get_module_path
jal347 Nov 14, 2024
8e8bf5b
try to get module path before creating instance
jal347 Nov 14, 2024
636b3ec
try to get module dir
jal347 Nov 14, 2024
93755ec
test adding model to model_dir
jal347 Nov 14, 2024
19920e7
try to import model
jal347 Nov 14, 2024
63b3688
attempt to import model
jal347 Nov 14, 2024
b6b0043
check my model
jal347 Nov 15, 2024
0787f1f
fixed logging to get model schema
jal347 Nov 15, 2024
bd1ebc5
test pydantic validation model
jal347 Nov 15, 2024
1ff4072
force error
jal347 Nov 15, 2024
8ba172a
test if documents are valid
jal347 Nov 15, 2024
00a4e02
removed forced error
jal347 Nov 15, 2024
0579a3b
moved commit of pydantic model outside of validate method
jal347 Nov 15, 2024
4698655
include validation with upload
jal347 Nov 16, 2024
20d75e5
attempt to log klass
jal347 Nov 16, 2024
10b23d0
fixed logger
jal347 Nov 16, 2024
c490a12
fixed params
jal347 Nov 16, 2024
c12e31a
added validation to uploader
jal347 Nov 16, 2024
51b54ed
changed structure on how to create and validate
jal347 Nov 18, 2024
81a098e
fixed typo
jal347 Nov 18, 2024
294bb59
fixed variables
jal347 Nov 18, 2024
c24439d
add model endpoint
jal347 Nov 19, 2024
fe72860
changed pydantic validation to post
jal347 Nov 19, 2024
15a44a3
trying to get payload to work
jal347 Nov 19, 2024
461cdbe
trying to get payload to work
jal347 Nov 19, 2024
a9d3e6e
get model str
jal347 Dec 5, 2024
1f5be7f
get model_str endpoint
jal347 Dec 5, 2024
00a3535
try to get the model str
jal347 Dec 6, 2024
fd65d87
try to get the model str
jal347 Dec 6, 2024
d12ffdc
try to fix klass retrival
jal347 Dec 6, 2024
005bec6
testing validation
jal347 Dec 10, 2024
720b18c
fixed uploader validation parameter
jal347 Dec 10, 2024
2b59879
changed endpoints for pydantic validation
jal347 Dec 11, 2024
01ab135
fixed pydantic model
jal347 Dec 11, 2024
64f4c7d
added logging to see if commiting pydantic model is working
jal347 Dec 11, 2024
3cd2a65
update to validation parameters
jal347 Dec 13, 2024
bde3e05
fixed kwargs issues
jal347 Dec 14, 2024
dd88d46
changed get_module_path to get_validation_path
jal347 Dec 17, 2024
94cd7b7
moved base methods to utils
jal347 Dec 19, 2024
5e9ce8f
clean up files
jal347 Dec 20, 2024
56e5f6d
add validations endpoint
jal347 Dec 20, 2024
589368c
make validation model listing more concise
jal347 Dec 20, 2024
5f6aacf
fixed reference from source manager to upload manager
jal347 Dec 20, 2024
e0e9cc2
monitor validation status
jal347 Jan 2, 2025
67d310c
added validation to sources
jal347 Jan 3, 2025
9a25e6f
fixed saving pydantic model for uploaders with subsrcs
jal347 Jan 3, 2025
11d9a5d
testing the try catch
jal347 Jan 3, 2025
d386189
trying to raise error as soon as we see it
jal347 Jan 3, 2025
6fe11f9
remove got error to see what happens
jal347 Jan 3, 2025
cb6f57d
moved job outside of try
jal347 Jan 3, 2025
95a3e6b
put whole method validate_src into try except
jal347 Jan 3, 2025
0169107
put in correct subkey for failed validation
jal347 Jan 3, 2025
502484d
put in correct subkey for failed validation
jal347 Jan 3, 2025
d174909
fixed validator helper
jal347 Jan 4, 2025
c70a7da
dont split subsrc create_instance does it for you
jal347 Jan 6, 2025
c67f407
fixed validator recursion
jal347 Jan 6, 2025
6cd612b
test getting the module class klass
jal347 Jan 9, 2025
dd05039
test validate configuration
jal347 Jan 10, 2025
51699a7
added more logging to test new validation config
jal347 Jan 10, 2025
face516
cleanup uploader file
jal347 Jan 10, 2025
992734f
removed generate model param from uploader.py
jal347 Jan 15, 2025
c265250
moved getting the model path in validate to validate_src so we can ad…
jal347 Jan 17, 2025
ba52eb1
added model file used for all register status for validate
jal347 Jan 17, 2025
9df103e
added model_file to validate in sumup_source
jal347 Jan 17, 2025
a3ed8ed
cleaned up create_pydantic_model recursion
jal347 Jan 23, 2025
5f2e815
fixed key generator to handle multiple type values
jal347 Jan 30, 2025
3ab12ed
cleanup and modified key generation to model
jal347 Feb 3, 2025
b3f361a
split model string path to contain path name after hub
jal347 Feb 4, 2025
1cb9323
fixed pydantic model edge case creation when model has same names
jal347 Feb 4, 2025
37c695e
generate_data_validator logic cleanup
jal347 Feb 4, 2025
9e92b43
test auto validate
jal347 Mar 27, 2025
bdf1ba0
test adding auto_validate to source endpoint
jal347 Mar 28, 2025
98c7157
add auto validate as own object in get_sources
jal347 Mar 28, 2025
b2410ef
debug auto_validate
jal347 Mar 28, 2025
d43b4af
added auto_validate as an attribute
jal347 Mar 28, 2025
b8ddf22
cleanup logging
jal347 Mar 28, 2025
0afb7bd
Merge branch '1.0.x' into save_pydantic_model
jal347 Apr 16, 2025
21 changes: 21 additions & 0 deletions biothings/hub/__init__.py
@@ -1219,6 +1219,15 @@ def configure_extra_commands(self):
            self.extra_commands["source_save_mapping"] = CommandDefinition(
                command=self.managers["source_manager"].save_mapping
            )
            self.extra_commands["source_save_pydantic_model"] = CommandDefinition(
                command=self.managers["source_manager"].save_pydantic_model
            )
            self.extra_commands["source_pydantic_validation"] = CommandDefinition(
                command=self.managers["source_manager"].run_pydantic_validation
            )
            self.extra_commands["source_validations"] = CommandDefinition(
                command=self.managers["source_manager"].get_validations
            )
        if self.managers.get("dump_manager"):
            self.extra_commands["dm"] = CommandDefinition(command=self.managers["dump_manager"], tracked=False)
            self.extra_commands["dump_info"] = CommandDefinition(
@@ -1431,6 +1440,18 @@ def configure_api_endpoints(self):
            self.api_endpoints["source"].append(
                EndpointDefinition(name="source_save_mapping", method="put", suffix="mapping")
            )
        if "source_save_pydantic_model" in cmdnames:
            self.api_endpoints["source"].append(
                EndpointDefinition(name="source_save_pydantic_model", method="put", suffix="create_validation")
            )
        if "source_pydantic_validation" in cmdnames:
            self.api_endpoints["source"].append(
                EndpointDefinition(name="source_pydantic_validation", method="post", suffix="validate")
            )
        if "source_validations" in cmdnames:
            self.api_endpoints["source"].append(
                EndpointDefinition(name="source_validations", method="get", suffix="validations")
            )
        if not self.api_endpoints["source"]:
            self.api_endpoints.pop("source")
        if "inspect" in cmdnames:
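The three endpoints registered in this hunk can be summarized as a small routing table. The sketch below is illustrative only: `ValidationEndpoint`, `VALIDATION_ENDPOINTS`, `endpoint_path`, and `describe` are hypothetical names, and the `/source/<name>/<suffix>` URL layout is an assumption that this diff does not confirm. Only the name, method, and suffix values are taken from the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationEndpoint:
    """Hypothetical stand-in holding the three fields passed to EndpointDefinition."""

    name: str
    method: str
    suffix: str


# Values copied from the hunk above.
VALIDATION_ENDPOINTS = (
    ValidationEndpoint("source_save_pydantic_model", "put", "create_validation"),
    ValidationEndpoint("source_pydantic_validation", "post", "validate"),
    ValidationEndpoint("source_validations", "get", "validations"),
)


def endpoint_path(source_name, endpoint):
    """Build a request path, ASSUMING a /source/<name>/<suffix> layout."""
    return f"/source/{source_name}/{endpoint.suffix}"


def describe(source_name):
    """Return one 'METHOD path' string per validation endpoint."""
    return [
        f"{ep.method.upper()} {endpoint_path(source_name, ep)}"
        for ep in VALIDATION_ENDPOINTS
    ]
```

Calling `describe("mysrc")` lists the method and assumed path for each of the three commands, which is a quick way to check the suffixes against a running hub.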
101 changes: 100 additions & 1 deletion biothings/hub/dataload/source.py
@@ -5,10 +5,15 @@
import types
from pprint import pformat

from biothings import config as btconfig
from biothings.utils.hub_db import get_data_plugin, get_src_dump, get_src_master

from biothings.utils.loggers import get_logger
from biothings.utils.validator import create_pydantic_model
from biothings.hub.dataload.manager import BaseSourceManager



class SourceManager(BaseSourceManager):
    """
    Helper class to get information about a datasource,
@@ -26,6 +31,15 @@ def __init__(self, source_list, dump_manager, upload_manager, data_plugin_manage
        self.src_dump = get_src_dump()
        # honoring BaseSourceManager interface (gloups...-
        self.register = {}
        # setup logger
        self.log_folder = btconfig.LOG_FOLDER
        self.setup()

    def setup(self):
        self.setup_log()

    def setup_log(self):
        self.logger, _ = get_logger("sourcemanager")

    def reload(self):
        # clear registers
@@ -158,6 +172,21 @@ def sumup_source(self, src, detailed=False):
                count += info.get("count") or 0
                if detailed:
                    self.set_mapping_src_meta(job, mini)
        if src.get("validate"):
            mini["validate"] = {"sources": {}}
            for job, info in src["validate"]["jobs"].items():
                mini["validate"]["sources"][job] = {
                    "time": info.get("time"),
                    "status": info.get("status"),
                    "started_at": info.get("started_at"),
                    "release": info.get("release"),
                    "data_folder": info.get("data_folder"),
                    "model_file": info.get("model_file"),
                }
                if info.get("err"):
                    mini["validate"]["sources"][job]["error"] = info["err"]
                if info.get("tb"):
                    mini["validate"]["sources"][job]["traceback"] = info["tb"]
        if src.get("inspect"):
            mini["inspect"] = {"sources": {}}
            for job, info in src["inspect"]["jobs"].items():
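The validate summary added to `sumup_source` can be tried in isolation. This is a minimal sketch, assuming a source document shaped like the one the hunk reads (`src["validate"]["jobs"]`); `summarize_validate_jobs`, `SUMMARY_KEYS`, and the sample document in the usage note are hypothetical and not part of the PR.

```python
# Keys copied from the hunk above.
SUMMARY_KEYS = ("time", "status", "started_at", "release", "data_folder", "model_file")


def summarize_validate_jobs(src):
    """Condense the validate jobs of a source document into a per-job summary.

    Unlike the hunk, this sketch tolerates a missing "validate" or "jobs" key
    and simply returns an empty summary in that case.
    """
    summary = {"sources": {}}
    for job, info in src.get("validate", {}).get("jobs", {}).items():
        # Missing keys become None, as with info.get(...) in the diff.
        entry = {key: info.get(key) for key in SUMMARY_KEYS}
        # Error details are only attached when present and non-empty.
        if info.get("err"):
            entry["error"] = info["err"]
        if info.get("tb"):
            entry["traceback"] = info["tb"]
        summary["sources"][job] = entry
    return summary
```

For example, a document such as `{"validate": {"jobs": {"sub": {"status": "failed", "err": "bad doc"}}}}` yields an entry for `sub` that carries `status` and `error` but no `traceback` key.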
@@ -241,11 +270,22 @@ def get_sources(self, id=None, debug=False, detailed=False):
                                ].get("uploader")
                            except Exception as e:
                                logging.error("Source is invalid: %s\n%s" % (e, pformat(src)))
                    if src.get("validate"):
                        for subname in src["validate"].get("jobs", {}):
                            try:
                                sources[src["name"]].setdefault("validate", {"sources": {}})["sources"].setdefault(
                                    subname, {}
                                )
                                sources[src["name"]]["validate"]["sources"][subname]["uploader"] = src["upload"][
                                    "jobs"
                                ][subname].get("uploader")
                            except Exception as e:
                                logging.error("Source is invalid: %s\n%s" % (e, pformat(src)))
            # deal with plugin info if any
            if dpm:
                src = bydpsrcs.get(_id)
                if src:
-                    assert len(dpm[_id]) == 1, "Expected only one uploader, got: %s" % dpm[_id]
+                    assert len(dpm[_id]) == 1, "Expected only one dumper, got: %s" % dpm[_id]
                    klass = dpm[_id][0]
                    src.pop("_id")
                    if hasattr(klass, "data_plugin_error"):
@@ -278,6 +318,9 @@ def get_sources(self, id=None, debug=False, detailed=False):
                        if getattr(upk, "__metadata__", {}).get("src_meta"):
                            src.setdefault("__metadata__", {}).setdefault(name, {})
                            src["__metadata__"][name] = upk.__metadata__["src_meta"]
                        if hasattr(upk, "auto_validate"):
                            src.setdefault("auto_validate", {}).setdefault(name, {})
                            src["auto_validate"][name] = getattr(upk, "auto_validate")
                    # simplify as needed (if only one source in metadata, remove source key level,
                    # or if licenses are the same amongst sources, keep one copy)
                    if len(src.get("__metadata__", {})) == 1:
@@ -356,3 +399,59 @@ def reset(self, name, key="upload", subkey=None):
        except KeyError as e:
            logging.exception(e)
            raise ValueError(f"Can't delete information, not found in document: {e}")

    def get_mapping(self, name):
        """Get the mapping for a given source name"""
        # either given a fully qualified source or just sub-source
        try:
            m = self.src_master.find_one({"_id": name})
            return m.get("mapping")
        except AttributeError as e:
            logging.exception(e)
            raise ValueError("No mapping found for source '%s'" % name)

    def create_model_str(self, name):
        """Create a pydantic model string for a given source name"""
        mapping = self.get_mapping(name)
        model_str = create_pydantic_model(mapping, name.casefold())
        return model_str

    def get_model_str(self, name, upk):
        """Get a pydantic model string for a given source name"""
        try:
            subsrc = name.split(".")[1]
        except IndexError:
            subsrc = name
        try:
            validation_path = self.upload_manager.get_validation_path(upk)
            model_path = os.path.join(validation_path, f"{subsrc}_model.py")
            with open(model_path, "r") as f:
                model_str = f.read()
            return model_str
        except FileNotFoundError:
            logging.error("No model found for source '%s' creating model string from mapping", name)
            return self.create_model_str(subsrc)

    def save_pydantic_model(self, name):
        """Save a pydantic model string for a given source name"""
        upk = self.upload_manager[name]
        assert len(upk) == 1, "Expected only one uploader, got: %s" % upk
        upk = upk[0]
        inst = self.upload_manager.create_instance(upk)
        model_str = self.get_model_str(name, upk)
        inst.commit_pydantic_model(model_str)
        self.logger.info("Saved pydantic model for source '%s'", name)

    def run_pydantic_validation(self, name, model_file=None):
        kwargs = {"model_file": model_file}
        return self.upload_manager.validate_src(name, **kwargs)

    def get_validations(self, name):
        """Get the Pydantic models for a given source name"""
        upk = self.upload_manager[name]
        # pick any uploader they should all use the same model directory
        upk = upk[0]
        validation_dir = self.upload_manager.get_validation_path(upk)
        if not os.path.exists(validation_dir):
            return []
        return [f for f in os.listdir(validation_dir) if f.endswith(".py")]
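The lookup-with-fallback in `get_model_str` and the directory listing in `get_validations` can be reproduced without a hub. The sketch below is a standalone approximation under stated assumptions: `split_subsource`, `load_model_str`, `list_models`, and `demo` are hypothetical helpers, the `fallback` callable stands in for `create_model_str`, and a plain directory path replaces `upload_manager.get_validation_path(upk)`.

```python
import os
import tempfile


def split_subsource(name):
    """Return the second dotted component of 'main.sub', or name itself if there is no dot."""
    parts = name.split(".")
    return parts[1] if len(parts) > 1 else name


def load_model_str(validation_dir, name, fallback):
    """Read '<subsrc>_model.py' from validation_dir; call fallback(subsrc) if the file is missing."""
    subsrc = split_subsource(name)
    model_path = os.path.join(validation_dir, f"{subsrc}_model.py")
    try:
        with open(model_path, "r") as f:
            return f.read()
    except FileNotFoundError:
        # Stand-in for regenerating the model string from the stored mapping.
        return fallback(subsrc)


def list_models(validation_dir):
    """List the .py files in validation_dir; a missing directory yields an empty list.

    Sorted here for predictable output; the diff returns os.listdir order.
    """
    if not os.path.exists(validation_dir):
        return []
    return sorted(f for f in os.listdir(validation_dir) if f.endswith(".py"))


def demo():
    """Exercise both paths: fallback first, then a saved model file."""
    with tempfile.TemporaryDirectory() as tmp:
        generated = load_model_str(tmp, "mysrc.sub", lambda s: f"# generated for {s}")
        with open(os.path.join(tmp, "sub_model.py"), "w") as f:
            f.write("class Sub: pass\n")
        saved = load_model_str(tmp, "mysrc.sub", lambda s: "")
        return generated, saved, list_models(tmp)
```

As in the diff, a saved model file takes precedence and the mapping-derived string is used only when no file is found.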