File formats


Flat clustering

The flat clustering format is a text file with one cluster per line. Each cluster is written as its member word types, separated by a comma and a space. Clusters are sorted top to bottom from the most frequent to the least frequent, and within each cluster the words are sorted left to right from the most frequent to the least frequent.
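As a rough illustration of the layout described above, the following Python sketch reads a flat clustering into a list of clusters and builds a word-to-cluster lookup. The function names `parse_flat_clustering` and `word_to_cluster` and the sample words are hypothetical, not part of the project. The sketch also assumes that no word type itself contains the `", "` separator.

```python
from typing import Dict, List


def parse_flat_clustering(text: str) -> List[List[str]]:
    """Parse flat clustering text: one cluster per line, words joined by ', '.

    Hypothetical helper; assumes no word type contains ', ' itself.
    """
    clusters = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline
        clusters.append(line.split(", "))
    return clusters


def word_to_cluster(clusters: List[List[str]]) -> Dict[str, int]:
    """Map each word to the 0-based index (line position) of its cluster."""
    return {
        word: index
        for index, members in enumerate(clusters)
        for word in members
    }


# Made-up sample: two clusters, most frequent cluster first.
sample = "the, a, an\ncat, dog\n"
clusters = parse_flat_clustering(sample)
mapping = word_to_cluster(clusters)
```

Because the file is already sorted by frequency, the cluster index doubles as a frequency rank, and the position of a word inside its list is its rank within that cluster.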

Exchange JSON

The exchange runner can write its progress to a JSON file. The file records the AMI of each iteration, the duration of each iteration (in milliseconds), the number of swaps performed in each iteration, and a few details about the parameters used for the run.

Below is a sample of the JSON file:

{
    "algorithm": "EXCHANGE_STEPS",
    "ami_progression": [
        0.34185137336487514,
        1.1807175343429535,
        1.2719326785423313,
        1.2823230567026833,
        1.2881291860627222,
        1.291352535230331,
        1.3008575181719202,
        1.304579594617369,
        1.3103957545758398,
        1.3147285328865372,
        1.3182900172506287
    ],
    "durations": [
        5023,
        4184,
        4000,
        3983,
        3958,
        3999,
        3960,
        3974,
        3957,
        3952
    ],
    "iterations_exchange": 10,
    "num_clusters": 50,
    "num_iterations": 10,
    "swaps": [
        25626,
        7891,
        2381,
        1839,
        990,
        2314,
        1163,
        1657,
        1185,
        1003
    ],
    "total_words": 33904
}
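A file like the sample can be loaded with any JSON parser. The Python sketch below embeds the sample values and derives a few summary figures from them. The helper name `summarize_exchange_run` and the keys of the summary it returns are hypothetical. Note that the sample has 11 `ami_progression` entries but 10 `durations` and `swaps`; the sketch assumes the first AMI value was recorded before the first iteration, which the documentation does not state.

```python
import json

# Values copied from the sample file, written compactly.
SAMPLE = """
{
    "algorithm": "EXCHANGE_STEPS",
    "ami_progression": [
        0.34185137336487514, 1.1807175343429535, 1.2719326785423313,
        1.2823230567026833, 1.2881291860627222, 1.291352535230331,
        1.3008575181719202, 1.304579594617369, 1.3103957545758398,
        1.3147285328865372, 1.3182900172506287
    ],
    "durations": [5023, 4184, 4000, 3983, 3958, 3999, 3960, 3974, 3957, 3952],
    "iterations_exchange": 10,
    "num_clusters": 50,
    "num_iterations": 10,
    "swaps": [25626, 7891, 2381, 1839, 990, 2314, 1163, 1657, 1185, 1003],
    "total_words": 33904
}
"""


def summarize_exchange_run(report: dict) -> dict:
    """Derive summary figures from a parsed progress report (hypothetical helper)."""
    ami = report["ami_progression"]
    # Difference between consecutive AMI values. Assumption: the first
    # entry is the AMI before any exchange iteration has run.
    gains = [after - before for before, after in zip(ami, ami[1:])]
    return {
        "total_duration_ms": sum(report["durations"]),
        "total_swaps": sum(report["swaps"]),
        "ami_gains": gains,
        "final_ami": ami[-1],
    }


report = json.loads(SAMPLE)
summary = summarize_exchange_run(report)
print(summary["total_duration_ms"], "ms,", summary["total_swaps"], "swaps")
```

To read a report from disk instead, replace `json.loads(SAMPLE)` with `json.load` on an open file handle.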