Several questions about the code #69
Comments
Good question, I think that would work as well! I'm not too sure why I wrote the first version, sorry :)
This code doesn't do anything functional with the label counts; it is only used to print the approximate count of each label to the console.
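As a rough illustration of what logging-only label counting can look like, here is a minimal sketch. It is not the repository's code: `count_labels` and `example_labels` are hypothetical names, and this version counts exactly rather than approximately.

```python
from collections import Counter


def count_labels(labels):
    """Return a mapping from each label to the number of times it appears."""
    return Counter(labels)


# Hypothetical labels for a binary task (made up for illustration).
example_labels = [0, 1, 1, 0, 1]
label_counts = count_labels(example_labels)

# The counts are only reported; nothing downstream depends on them.
for label, count in sorted(label_counts.items()):
    print("label {}: {} instances".format(label, count))
```

Because the counts are only printed, removing this step would not change how the data is indexed or padded.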
End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF
Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation
Mimicking Word Embeddings using Subword RNNs
Hope that helps!
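One case where character-level indexing makes a difference is a word that never appeared in the training data. The sketch below is only an illustration under made-up assumptions: the vocabularies, the `<unk>` token, and the helper names `word_index` and `char_indices` are hypothetical and not taken from the repository.

```python
UNK = "<unk>"

# Hypothetical training vocabulary built from a handful of words.
train_words = ["today", "has", "good", "weather"]
word_vocab = {w: i for i, w in enumerate([UNK] + train_words)}

# Character vocabulary built from the same words.
chars = sorted(set("".join(train_words)))
char_vocab = {c: i for i, c in enumerate([UNK] + chars)}


def word_index(word):
    """Map a word to its index, falling back to the unknown-word index."""
    return word_vocab.get(word, word_vocab[UNK])


def char_indices(word):
    """Map each character of a word to its index."""
    return [char_vocab.get(c, char_vocab[UNK]) for c in word]


unseen = "weathered"
print(word_index(unseen))    # collapses to the unknown-word index
print(char_indices(unseen))  # every character is still known
```

At the word level the unseen word is indistinguishable from any other unknown word, while its character sequence still carries usable information that a character-level encoder can combine with the word-level representation.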
Thank you for replying!
1). I thought the Same question with
Why not using In data_indexer.py In instance.py
If you want to encode a question that sits at the end of a sequence, shouldn't you truncate it from the left?
Whoops, you're right. Looks like the docstring is wrong, sorry about that.
The base instance class defines a self._index_text(), and I thought it would be better to just keep a consistent API throughout. Some instances might have more elaborate definitions of this function.
I took the instance API from https://github.com/allenai/deep_qa, I think that's part of the API there.
This was another copy-paste error from deep_qa, sorry. I think that truncating from the left makes more sense for QA, but in this code I preferred to truncate from the right, since I suspect that the subject / salient information of the sentence would be closer to the front. This is just my intuition though.
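The two truncation directions discussed above can be sketched as follows. This is a minimal illustration, not the repository's padding code: the `truncate` helper and its `from_right` parameter are hypothetical names.

```python
def truncate(tokens, max_length, from_right=True):
    """Shorten a token list to at most max_length items.

    from_right=True drops tokens from the end (keeps the front).
    from_right=False drops tokens from the start (keeps the end).
    """
    if max_length <= 0:
        # Guard needed because tokens[-0:] would return the whole list.
        return []
    if from_right:
        return tokens[:max_length]
    return tokens[-max_length:]


tokens = "what is the capital of france".split()
print(truncate(tokens, 3))                    # keeps the first three tokens
print(truncate(tokens, 3, from_right=False))  # keeps the last three tokens
```

Truncating from the right keeps the start of the sentence, which matches the intuition that the salient information sits near the front; truncating from the left keeps the end, which suits inputs where the question comes last.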
Hi, Nelson! 2
Why do you return (np.asarray(self.label),) instead of np.asarray(self.label)?
Hi, I appreciate you sharing this project! It is very thoughtful work and friendly to newcomers!
I had some questions while reading the code.
In dataset.py
Instead of writing:
Why don't you write:
return self.__class__(self.instances[:max_instances])
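The one-line version above relies on two Python behaviors: slicing past the end of a list is safe, and `self.__class__` builds an object of the caller's own (sub)class. Here is a minimal sketch with stand-in classes; `Dataset`, `TextDataset`, and the `truncate` method name are hypothetical and only mimic the shape of the real code.

```python
class Dataset:
    """Stand-in for a dataset that wraps a list of instances."""

    def __init__(self, instances):
        self.instances = instances

    def truncate(self, max_instances):
        # A slice never raises, even if max_instances exceeds the length,
        # so no explicit length check is needed before slicing.
        return self.__class__(self.instances[:max_instances])


class TextDataset(Dataset):
    """Subclass used to show that the returned type is preserved."""


small = TextDataset(["a", "b", "c"]).truncate(2)
print(type(small).__name__, small.instances)
```

Using `self.__class__` rather than a hard-coded class name means a subclass gets back an instance of its own type.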
At dataset.py line 180:
Why do you do the above process?
Correct me if I am wrong: (FAKE EXAMPLE)
300 is the number of times a specific type of label appears
I don't understand what all these steps are for. How do they calculate the label counts?
In sts_instance.py line 108:
Instead of writing:
fields = list(csv.reader([line]))[0]
Why not write:
fields = [x for x in csv.reader([line])]
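The two expressions above are not equivalent, which a short check with the standard `csv` module makes visible. `csv.reader` yields one list of fields per input line, so the comprehension produces a list of rows, while `list(...)[0]` extracts the fields of the first (and here only) line. The sample `line` is made up for illustration.

```python
import csv

# Hypothetical input line containing a quoted field with a comma.
line = 'id,"a sentence, with a comma",4.5'

# One row per input line: a list that contains a single list of fields.
rows = [x for x in csv.reader([line])]

# The fields of the first line only.
fields = list(csv.reader([line]))[0]

print(rows)
print(fields)
print(len(fields))

# With three separate strings, each string is parsed as its own line,
# so indexing with [0] returns only the fields of the first string.
first_of_three = list(csv.reader(["asdf", "fddf", "ddd"]))[0]
print(first_of_three)
```

So `list(csv.reader([line]))[0]` can be used to count fields, as long as the reader is given a single line; passing several strings parses each one as a separate row.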
I ran a trial:
It seems
list(csv.reader(['asdf','fddf','ddd']))[0]
can't be used to count the number of fields? Correct me if I am wrong.
4. Why create a character-level dictionary/instance?
When you tokenize the string 'Today has a good weather' to character level, the result would be ['T','o','d','a','y','h','a'............]
It seems that character-level tokens can't be used to encode contextual information?
Could you tell me some cases where character-level tokenization makes a difference?
Thank you!