-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add anndata factory #255
base: add_alphadia_reader
Are you sure you want to change the base?
Add anndata factory #255
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah 🥳 Fantastic!
- I think how to handle duplicated protein groups is a good question - is this expected to happen? Otherwise I would raise a warning/error, drop them, and use the strategy
first
while pivoting - For some downstream analyses it might be good to consider additional information from the psm files (e.g. gene names). Would it be possible to add additional metadata to the metadata attributes? (e.g. list of columns the
.obs
and.obs
attributes?)
alphabase/anndata/anndata_factory.py
Outdated
index=PsmDfCols.RAW_NAME, | ||
columns=PsmDfCols.PROTEINS, | ||
values=PsmDfCols.INTENSITY, | ||
aggfunc=np.nanmean, # how to aggregate intensities for same protein in same raw file TODO first? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there scenarios in which the same protein occurs multiple times in a file? I tested the diann_test_input_mDIA.tsv
with the DiannReader
class and did not find any.
I think aggregating by the mean might be dangerous. One could add a test on whether there are duplicates, and at least raise a warning.
duplicated_proteins = self._psm_df[PsmDfCols.PROTEINS].duplicated()
if duplicated_proteins.sum() > 0:
warning.warn(f"{duplicated_proteins.sum()} duplicated protein groups")
Alternatively, this could be an optional argument agg_duplicates: Literal["mean", "drop", "raise"]
with "raise" raising a ValueError
, "drop" dropping the duplicated entries, and "mean" aggregating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user-define agg_duplicates => https://github.com/orgs/MannLabs/projects/20/views/1?pane=issue&itemId=88563720
if missing_cols: | ||
raise ValueError(f"Missing required columns: {missing_cols}") | ||
|
||
self._psm_df = psm_df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add optional metadata columns to the .obs
and .var
attributes by passing obs_columns: Optional[str, List[str]]
and var_columns: Optional[str, List[str]]
to the factory class?
This would add to the complexity as one had to validate that the columns are in the data frame, but other than that one could just use .pivot_table while passing the list of columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
36f0ce4
to
c19e9ce
Compare
@@ -1,3 +1,4 @@ | |||
anndata==0.11.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem is that in order to use 0.11.1
we would need to drop support for python 3.8 (latest supported versions are 0.10.9, 0.11.0rc1, 0.11.0rc2) .. would be an argument for moving this module out of alphabase
f6bb454
to
427e64a
Compare
3e745b3
to
239e2ef
Compare
427e64a
to
8ce3ce5
Compare
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Add first version of anndata conversion.