Segmentation #21
Segmentation by strokes seems to be possible in a mathematical context. Be aware that strokes are not necessarily written in order (e.g. fraction strokes that get enlarged later). |
For http://www.martin-thoma.de/write-math/view/?raw_data_id=151305 the classifier predicted the following segmentation: [image of the predicted segmentation] The yellow and black segmentation does not make any sense, as it would mean that there is a symbol between the strokes of another symbol. This should be part of the score of a segmentation. |
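One way to turn this into a score term is a minimal sketch like the following, assuming strokes are lists of (x, y) points, a segmentation is a list of lists of stroke indices, and "between" is judged on the x-axis only. The function names and the x-axis-only criterion are hypothetical choices for illustration, not write-math's implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def x_extent(strokes: Sequence[Stroke]) -> Tuple[float, float]:
    """Smallest and largest x coordinate over all points of the given strokes."""
    xs = [x for stroke in strokes for x, _ in stroke]
    return min(xs), max(xs)


def gaps_between(intervals: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Merge overlapping x-intervals and return the empty gaps between them."""
    merged: List[List[float]] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)  # overlap: extend current block
        else:
            merged.append([lo, hi])
    return [(left[1], right[0]) for left, right in zip(merged, merged[1:])]


def interleaving_penalty(strokes: List[Stroke], segmentation: List[List[int]]) -> int:
    """Count ordered pairs (A, B) where symbol B lies entirely inside a
    horizontal gap between the strokes of symbol A (hypothetical criterion)."""
    penalty = 0
    for ia, a in enumerate(segmentation):
        gaps = gaps_between([x_extent([strokes[i]]) for i in a])
        for ib, b in enumerate(segmentation):
            if ia == ib:
                continue
            b_lo, b_hi = x_extent([strokes[i] for i in b])
            if any(g_lo < b_lo and b_hi < g_hi for g_lo, g_hi in gaps):
                penalty += 1
    return penalty
```

For three strokes side by side, grouping the outer two into one symbol while the middle one is a separate symbol gives a penalty of 1, which could be subtracted from (or weighted into) the segmentation score.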
|
Idea:
Analyze data: Search for single-stroke symbols which are not "prefixes" of other symbols. For example.
Results: 91 of 374 symbols have a mean stroke count in [1, 1.1] with a variance < 0.3. The variance is very likely caused by wild points.
Segment: Let L be the list of strokes which should be partitioned into a list of lists of strokes, where each sublist is one symbol.
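If L may be partitioned arbitrarily (symbols need not consist of consecutive strokes, since stroke order is not reliable), the search space is the set of all partitions of L. The sketch below enumerates them by brute force; the function name is hypothetical. The number of partitions is the Bell number B_n (1, 2, 5, 15, 52 for n = 1..5), so this is only feasible for a handful of strokes, which is why pruning heuristics such as the single-stroke statistics above matter.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def segmentations(strokes: List[T]) -> Iterator[List[List[T]]]:
    """Yield every partition of `strokes` into non-empty symbols (sublists)."""
    if not strokes:
        yield []
        return
    first, rest = strokes[0], strokes[1:]
    for partial in segmentations(rest):
        # Option 1: `first` joins one of the existing symbols.
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # Option 2: `first` forms a symbol on its own.
        yield [[first]] + partial
```

For example, `list(segmentations(["s1", "s2", "s3"]))` yields the five possible groupings of three strokes.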
|
Idea: Use a minimum spanning tree (MST) as described in http://www.ai.mit.edu/projects/natural-log/papers/matsakis-MEng-99.pdf#page=37
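As we read the idea, the MST over the strokes restricts which strokes may be grouped: candidate symbols would be limited to strokes that are connected in the tree. A minimal sketch of the MST step (Prim's algorithm), assuming the distance between two strokes is the Euclidean distance between their centroids; that distance is a simplification and other stroke distances can be swapped in. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def centroid(stroke: Stroke) -> Point:
    """Mean of the stroke's points."""
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def stroke_mst(strokes: List[Stroke]) -> List[Tuple[int, int]]:
    """Return MST edges (parent, child) over strokes, using centroid distance."""
    n = len(strokes)
    if n == 0:
        return []
    centers = [centroid(s) for s in strokes]
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge into the tree for each stroke
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # Pick the closest stroke not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centers[u], centers[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```

For strokes centered at x = 0, 1 and 10, the tree links 0 to 1 and 1 to 2, so a grouping of strokes 0 and 2 without stroke 1 would not be a connected subtree.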
|
Ideas:
|
Other problems:
|
Some first results by
|
Single-stroke combinational recognizer: Classify symbols stroke by stroke. Let's say there are n symbols in total.
|
Stream segmentation
Now one of two things could have happened: the new stroke starts a new symbol, or it belongs to an existing symbol.
For new symbols, it is more likely that the stroke is at the very right side of the bounding box of … If the new stroke belongs to an existing symbol, then it has to be "next" to that symbol. "Next" in this context means that there is no other symbol in between (though one might be around, e.g. … |
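A minimal sketch of generating these two kinds of hypotheses for each incoming stroke, assuming strokes are lists of (x, y) points, a symbol is a list of strokes, and "no other symbol in between" is checked on the x-axis only (a simplification that ignores vertical layouts such as fractions). `hypotheses` and `is_between` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Symbol = List[Stroke]


def x_extent(strokes: Sequence[Stroke]) -> Tuple[float, float]:
    xs = [x for stroke in strokes for x, _ in stroke]
    return min(xs), max(xs)


def is_between(t: Tuple[float, float], a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if interval t lies strictly inside the gap separating intervals a and b."""
    left, right = sorted([a, b])
    return left[1] < t[0] and t[1] < right[0]


def hypotheses(symbols: List[Symbol], new_stroke: Stroke) -> List[List[Symbol]]:
    """Return candidate segmentations after adding `new_stroke`."""
    # Hypothesis 1: the stroke starts a new symbol.
    results = [symbols + [[new_stroke]]]
    stroke_ext = x_extent([new_stroke])
    # Hypothesis 2: the stroke joins an existing symbol that it is "next" to.
    for i, sym in enumerate(symbols):
        sym_ext = x_extent(sym)
        blocked = any(
            is_between(x_extent(other), sym_ext, stroke_ext)
            for j, other in enumerate(symbols)
            if j != i
        )
        if not blocked:
            results.append(symbols[:i] + [sym + [new_stroke]] + symbols[i + 1:])
    return results
```

With symbols A (x in [0, 1]) and B (x in [2, 3]) and a new stroke at x in [4, 5], joining A is ruled out because B lies in between, leaving "new symbol" and "join B" as the two hypotheses to score.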
|
Top 2 > 0.95 instead of top 3:
|
Top 4 > 0.95 instead of top 3:
|
Top 5 > 0.95 instead of top 3:
|
Top 6 > 0.95 instead of top 3:
|
Top 10 > 0.95 instead of top 3:
|
Top 20 > 0.95 instead of top 3:
|
All instead of cropping at top 3:
|
Top 1 > 0.95 instead of top 3:
|
All
|
What about using a bounding box of varying size? Physically and historically, symbols are kept apart from each other, because otherwise the writing would become unreadable. I think it's a good idea to compute a bounding box while a symbol is being drawn, with its size adjusted to the current stroke length. When the pen is lifted, one has to decide whether to "start a new symbol" or "concatenate with an existing symbol". This decision can be based on a bounding box whose shape can be trained, but whose size is dynamic. For example, when drawing the "integral" symbol, the bounding box is a horizontally thin and vertically tall rectangle that would be recognized by the network, but the size itself is taken from the current symbol's size. When a new stroke is produced and it intersects a previous stroke's bounding box, it can likely be concatenated with the previous symbol. |
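A minimal sketch of this decision, assuming a symbol is a list of strokes of (x, y) points. The `shape` pair (relative width, relative height) stands in for the trained part and is fixed by hand here, e.g. thin and tall for an integral; the size is taken from the longer side of the symbol's current bounding box. `dynamic_box`, `continues_symbol` and the shape values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Box") -> bool:
        return (self.x0 <= other.x1 and other.x0 <= self.x1
                and self.y0 <= other.y1 and other.y0 <= self.y1)


def bounding_box(strokes: Sequence[Stroke]) -> Box:
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    return Box(min(xs), min(ys), max(xs), max(ys))


def dynamic_box(symbol: Sequence[Stroke], shape: Tuple[float, float]) -> Box:
    """Box with a fixed (stand-in for trained) shape, scaled by the symbol's size
    and centered on the symbol."""
    box = bounding_box(symbol)
    size = max(box.x1 - box.x0, box.y1 - box.y0)
    cx, cy = (box.x0 + box.x1) / 2, (box.y0 + box.y1) / 2
    half_w, half_h = shape[0] * size / 2, shape[1] * size / 2
    return Box(cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def continues_symbol(symbol: Sequence[Stroke], new_stroke: Stroke,
                     shape: Tuple[float, float]) -> bool:
    """Concatenate if the new stroke touches the symbol's dynamic box."""
    return dynamic_box(symbol, shape).intersects(bounding_box([new_stroke]))
```

For a vertical stroke from (0, 0) to (0, 10) with shape (0.5, 1.2), the box spans x in [-2.5, 2.5] and y in [-1, 11], so a stroke at x in [1, 2] would be concatenated while one at x in [6, 7] would start a new symbol.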
This is an example of segmentation:
Each symbol is drawn in one color; different colors mean different symbols.
Features
A classifier that has to decide, for two given strokes, whether they belong to the same symbol could use the following features:
Problems