-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structure and implementation details of ProteoBench that could be improved #581
Comments
Fully agree, we might need to check if any parts of the code rely on the default or hidden parameter.
I believe this was done for a reason. Not sure exactly what reason. What I can definitely remember is that we build this with the idea of future-proofing. So it might not be useful now, but it might be in the future. I do agree that if there are things on the module level it should be moved there. We should already have a .toml for the module level.
There should already be one, right? Or do you want to make another super level? Not sure about that, I would rather have duplication and clarity than complexity.
@rodvrees does this make more sense or not?
Personally, I think nested folder structures are fine. What makes different file names easier to understand/work with than folders? I do agree with the naming of the files that should be the same. |
Regarding (5)
The current folder structure looks like this, with deeply nested folders, in the end containing one (or only once two) files.
When we look at a single file, the definition of the file is repeated twice. Once in the folder structure and once in the file name.
So it could be easily be simplified to a much flatter organization like this ...
Or keeping some additional level of organization like this ...
|
Ok, going through the list right now, unfortunately we cannot change (2) as suggested. Naming schemes for species and contaminants are tool dependent. While most of them will be the same. Currently I do not see harm in keeping it as is, also makes it more clear that this is tool specific. |
For (3), yes, we need to control it this way as these settings are module specific, and indeed different quant modules can have different ratios. As this also makes it clear what is exactly used and read in per module I want to keep it this way. |
For (4), sure, lets turn them around. I have no preference either way. |
For (5), I will have a look now to remove the folder structure. |
Thanks for taking the time to go through the list. That means (1), (2), (3) are solved (although I might make a separate issue that also concerns 2). Once you are done with (5), I will make PR for (4). |
Wait, sorry, I was not clear. I am also trying to work on (4). |
Fixed most of them. These are still remaining:
|
Mostly resolved in #597 |
@hollenstein do you think this resolves the issue completely or would you like to also do the io_parse_settings? For the markdown files, I think the solution is fine as we have quite a few unique markdown files, and putting them into a single folder would be a bit messy... |
Hi, while looking at the repository, and specifically how the settings .toml files are parsed and used, I've noticed several smaller issues. Those are some of the points that made it difficult for me to understand the proteobench project in the beginning and start working a new module (since I looked at the existing modules as a template for the new one). So I would like to discuss if it makes sense to introduce changes to the project.
(1) The
ParseSettingsBuilder
has default values for 'module_id' and a hidden default value for 'parse_settings_dir', both of which are specific for the "quant_lfq_ion_DDA" module. I think it would be better to remove these default values and enforce to provide these values when creating aParseSettingsBuilder
.(2) Currently the
ParseSettings
gets to arguments, 'parse_settings' and 'parse_settings_module'. The module level settings contain information that is the same for all software tools, like '[species_expected_ratio]'. In all (or almost all?) of the software tools settings the following parameters are identical.So I guess it would make sense to move these settings to the module level file. However, there might be the case that for one software tool the settings need to be different. Therefore I would suggest to merge the information from 'parse_settings' and
'parse_settings_module', with software tool specific settings overwriting the module level settings. So in
ParseSettingsBuilder.build_parser()
the code would look something like:(3) There is even more duplication of the settings at the module level, with all entries except for "general.level" being identical between all "quant" modules (actually also the "[species_mapper]" is the same across modules). Would it make sense to create another "super module" level settings.toml file at the "quant" level, that is shared between all modules that use a similar three species benchmark dataset with the same ratios? Or do you expect that there might be additional "quant" modules with different species or different ratios, that reuse the "quant" module logic?
(4) Currently the structure of the folders is for example "quant/lfq/ion/dda". This is somewhat counter intuitiv, because all "quant/lfq/dda" modules share the same raw dataset, fasta file, ... So you would use the same software results to upload your data into the 'ion' or 'peptidoform' (and potentially 'protein group') modules. Wouldn't it make more sense that the structure is "quant/lfq/dda" and then have subfolders for "ion" and "peptidoform"?
(5) Looking at other components of ProteoBench where we have a similar folder structure, like the "main module" classes. Those files and classes don't seem to be reusable across multiple modules, since file paths, module name and other parameters are hardcoded, e.g. in
DDAQuantIonModule
. Since the filenames already contain all information "levels" (e.g. quant_lfq_ion_DDA.py) and there is only one specific "module class" per module, the python files could all be placed together into the 'proteobench/modules/quant' folder, instead of 'proteobench/modules/quant/lfq/ion/DDA/quant_lfq_ion_DDA.py'. I think in general, when there is the possibility to avoid nested folder structures it makes the repository easier to understand and work with.The text was updated successfully, but these errors were encountered: