-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a PPU component #71
base: master
Are you sure you want to change the base?
Changes from 1 commit
ba8d0b4
50399e4
38359f2
f107703
4aea70e
44e92b5
b2af097
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -155,22 +155,29 @@ def train_ids( | |
return train_ids | ||
|
||
def trains( | ||
self, split_sequence: bool = False, offset: int = 0 | ||
self, | ||
data: Union[DataCollection, SourceData, KeyData] = None, | ||
*, | ||
split_sequence: bool = False, | ||
offset: int = 0, | ||
) -> Union[DataCollection, List[DataCollection]]: | ||
"""Returns a subset of the data only with Trains selected by the PPU. | ||
|
||
Args: | ||
data: Data to filter. If set to None (defaut) use the data used at initialization. | ||
split_sequence (bool, optional): Split data per PPU trigger sequence. Defaults to False. | ||
offset (int, optional): offset to apply to train IDs to be selected. Defaults to 0. | ||
|
||
Returns: | ||
Union[DataCollection, List[DataCollection]]: | ||
DataCollection(s) containing only trains triggered by the PPU | ||
""" | ||
data = data or self.data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good to make this generic. Strictly speaking, the EXtra component is thus not a PPU component any more, but a "trainSelector" component. I wonder if there should be such class per se. I see that this addition to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure what you mean here. The line you highlighted here is only to make it convenient to use with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
vs
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the explanation. My comment was not about that particular line in the first place, more about the way this method accepts other source data i.e. allowing to do train picking based on a pre-selected source. This may be more convenient indeed. The default filtering i.e. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Still, on a much higher level, I thought whether it would be good have a very generic "trainPicker" or "trainSelector" component in EXtra, that is, a class that could take PPU or any other source for filtering, already at instantiation level. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this comment is not affecting the PR, it is more an outlook to the future. (Maybe it does not make sense, because it lacks a real use case?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not, but do we have other filtering sources/methods at this moment? |
||
|
||
train_ids = self.train_ids(labelled=True, offset=offset) | ||
if split_sequence: | ||
return [ | ||
self.data.select_trains(by_id[seq.values]) | ||
data.select_trains(by_id[seq.values]) | ||
for _, seq in train_ids.groupby(train_ids.index) | ||
] | ||
return self.data.select_trains(by_id[train_ids.values]) | ||
return data.select_trains(by_id[train_ids.values]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider a different name for this method? Both in EXtra-data as well as across existing components, we use
.trains()
to return an iterator over trains. This method however takes something that has train IDs and changes it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Do you have something in mind?
filter
?filter_trains
?train_selection
?select
?select_trains
?pick
?pick_trains
?take
?take_trains
?I'd probably go for
pick_trains
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select
andselect_trains
also have different meanings already (though in the long run, I would love to pass this component toDataCollection.select_trains
, i.e. have some kind ofTrainSelector
protocol)From your list, I would probably also go for
pick_trains
, but it's unfortunate one way or the other that it's the "picking-thing" calling on the "thing-to-be-picked" 🤔Maybe @takluyver can comment on what sounds most natural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, me too!
But do we have an idea of what/how we want to select trains or how this protocol can look like? I can easily think on how to do that specifically for the PPU device, but I can't imagine other cases for now (maybe you have more use cases as part of the data reduction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we invert the problem: Instead of having a method that selects trains on some object, have a property or method that returns the correspondig
extra_data.by_id
slice? Then selection becomes:run.select_trains(ppu.train_sel)
(name obviously preliminary)Whenever we decide on a universal
TrainSelector
protocol, we can simply add it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to something like
ppu.train_sel
. For the split-sequence option, we could make PPU iterable, giving some sort of TrainSequence or TrainGroup object so you could do[run.select_trains(seq.train_sel) for seq in ppu]
.