
De-duplicator scenarios

EyalLavi edited this page Jul 26, 2017 · 9 revisions

When a processing node combines multiple Part 3 documents, it has to deal with the style and region definitions in each input document. The first step is simply to preserve all of the definitions and to prefix their identifiers with the sequence number so that they remain unique. Next, the de-duplicator examines the style and region definitions in the combined document. When it finds identical definitions, it merges them and then fixes any references to styles that have been merged.
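The prefixing step can be sketched roughly as follows. This is a minimal illustration, not the toolkit's implementation: the dictionary model of a style, the function name `prefix_ids`, and the exact `seq<N>_` prefix format (inferred from the example identifiers on this page) are all assumptions.

```python
def prefix_ids(styles, sequence_number):
    """Return a copy of `styles` whose ids and style references carry a sequence prefix.

    `styles` maps a style id to its attribute dict (a hypothetical model).
    The optional "style" attribute holds a space-separated list of style ids.
    """
    prefix = f"seq{sequence_number}_"  # assumed prefix format
    prefixed = {}
    for style_id, attrs in styles.items():
        new_attrs = dict(attrs)
        if "style" in new_attrs:
            # References point at ids in the same document, so they get the same prefix.
            new_attrs["style"] = " ".join(
                prefix + ref for ref in new_attrs["style"].split()
            )
        prefixed[prefix + style_id] = new_attrs
    return prefixed


# Two input documents that both define a style called "style1".
doc_a = {"style1": {"tts:color": "lime"}, "style2": {"style": "style1"}}
doc_b = {"style1": {"tts:color": "lime"}}

combined = {**prefix_ids(doc_a, 1234), **prefix_ids(doc_b, 1235)}
print(sorted(combined))
# ['seq1234_style1', 'seq1234_style2', 'seq1235_style1']
```

After this step nothing has been lost and no identifiers clash, but the combined document may contain several copies of the same definition, which is what the de-duplicator then cleans up.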

For example, the re-segmenter outputs a document with these (simplified) styles:

<style xml:id="seq1234_style1" tts:backgroundColor="black" tts:color="lime" tts:fontSize="1c 2c"/>
<style xml:id="seq1235_style2" tts:color="lime" tts:fontSize="1c 2c" tts:backgroundColor="#000000" />
<style xml:id="seq1236_style3" style="seq1234_style1" />
<style xml:id="seq1237_style4" tts:color="lime" tts:fontSize="10px 20px" tts:backgroundColor="black" />

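In this case, the first three styles will be merged because their definitions are functionally identical: attribute order does not matter, the named value black is equivalent to #000000, and the third style takes all of its attributes from the style it references. The fourth style will not be merged, even if its font size computed in pixels equals the size specified in cells.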
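One way to picture the comparison is to flatten each style, normalise its values, and use the result as a lookup key. The sketch below is only an illustration under stated assumptions: the helper names, the dictionary model, and the two-entry `NAMED_COLORS` table are hypothetical, the third style is assumed to reference the first, and the kept styles are stored flattened for simplicity.

```python
# Hypothetical subset of named colors; a real implementation would need the full set.
NAMED_COLORS = {"black": "#000000", "lime": "#00ff00"}
COLOR_ATTRS = {"tts:color", "tts:backgroundColor"}


def resolve(style_id, styles, _seen=()):
    """Flatten a style: referenced styles first, then the style's own attributes."""
    if style_id in _seen:
        raise ValueError(f"circular style reference at {style_id}")
    attrs = styles[style_id]
    flat = {}
    for ref in attrs.get("style", "").split():
        flat.update(resolve(ref, styles, _seen + (style_id,)))
    flat.update({k: v for k, v in attrs.items() if k != "style"})
    return flat


def canonical_key(flat_attrs):
    """Order-independent key in which color names and hex values compare equal."""
    items = []
    for name, value in flat_attrs.items():
        if name in COLOR_ATTRS:
            value = NAMED_COLORS.get(value.lower(), value.lower())
        items.append((name, value))  # lengths are compared as written, not computed
    return frozenset(items)


def deduplicate(styles):
    """Return (kept styles, mapping from every original id to the surviving id)."""
    survivors = {}    # canonical key -> first id seen with that definition
    replaced_by = {}
    for style_id in styles:
        key = canonical_key(resolve(style_id, styles))
        replaced_by[style_id] = survivors.setdefault(key, style_id)
    kept = {sid: resolve(sid, styles) for sid in survivors.values()}
    return kept, replaced_by


styles = {
    "seq1234_style1": {"tts:backgroundColor": "black", "tts:color": "lime", "tts:fontSize": "1c 2c"},
    "seq1235_style2": {"tts:color": "lime", "tts:fontSize": "1c 2c", "tts:backgroundColor": "#000000"},
    "seq1236_style3": {"style": "seq1234_style1"},  # assumed reference target
    "seq1237_style4": {"tts:color": "lime", "tts:fontSize": "10px 20px", "tts:backgroundColor": "black"},
}

kept, replaced_by = deduplicate(styles)
print(sorted(kept))
# ['seq1234_style1', 'seq1237_style4']

# Fix the style references carried by content elements.
content_refs = ["seq1235_style2", "seq1237_style4 seq1236_style3"]
fixed_refs = [" ".join(replaced_by[r] for r in refs.split()) for refs in content_refs]
print(fixed_refs)
# ['seq1234_style1', 'seq1237_style4 seq1234_style1']
```

Because the key is built from the values as written, `10px 20px` and `1c 2c` never compare equal, which matches the behaviour described for the fourth style.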

This can get quite complicated and tricky to test, so for each scenario a separate Part 3 document was created. Below is a list of tested scenarios.

| Test Doc | Example scenario | Expected Output |
| --- | --- | --- |
| *NoStyleNoRegion | | No changes |
| *NoStylesOneRegion | | No changes |
| *OneStyleNoRegions | | No changes |
| *OneStyleOneRegion | | No changes |
| *1Sty1RegionWithOneStyleAttr | | No change |
| *3DupSty3DupRegAllAttsSpecified | | 1 style, 1 region |
| *6Sty3Dup6Reg3DupForeignNamespace | | 4 styles, 4 regions |
| *1Sty1Reg4DupAtts | | 1 style, 1 region with deduped style attributes |
| *NoDupStyNoDupReg | | No changes |
| 3DupSty3DupRegRefs | | content elements references changed |