Preferred rank values do not end up listed first in the Wikidata sync script output #10259

Open
Snowysauce opened this issue Dec 24, 2024 · 0 comments
Labels
bug · javascript (Pull requests that update Javascript code) · needs discussion (Waiting for other contributors to voice their opinion)

Comments

@Snowysauce
Collaborator

There is code in build_wikidata.js that is intended to list values with preferred rank first in the Array generated by getClaimValues():

if (c.rank === 'preferred') {  // List preferred values first
  values.unshift(c.mainsnak.datavalue.value);
} else {
  values.push(c.mainsnak.datavalue.value);
}

However, in the final output of the Wikidata sync script (dist/wikidata.json), the values in the Array are sorted alphabetically rather than by rank, by the order in which the script read them, or by the order in which they are listed on Wikidata. For example, for Toys R Us Asia (Q131521392), the officialWebsites Array looks like this:

      "officialWebsites": [
        "https://www.toysrus.co.th/th-th/",              // read last (eighth)
        "https://www.toysrus.com.bn/",                   // read second
        "https://www.toysrus.com.cn/zh-cn/",             // read third
        "https://www.toysrus.com.hk/en-hk/aboutus.html", // read first and has preferred rank
        "https://www.toysrus.com.hk/zh-hk/",             // read fourth
        "https://www.toysrus.com.my/",                   // read sixth
        "https://www.toysrus.com.sg/",                   // read seventh
        "https://www.toysrus.com.tw/zh-tw/"              // read fifth
      ]

This leads me to believe that something is rearranging the Array values somewhere between the call to getClaimValues() and the writing of the output files, and I suspect that it is the call to sortObject() on line 638:

function finish() {
  const START = '🏗 ' + chalk.yellow('Writing output files');
  const END = '👍 ' + chalk.green('output files updated');
  console.log('');
  console.log(START);
  console.time(END);

  let dissolved = {};

  Object.keys(_wikidata).forEach(qid => {
    let target = _wikidata[qid];

    // sort the properties that we are keeping..
    ['identities', 'logos', 'dissolutions'].forEach(prop => {
      if (target[prop] && Object.keys(target[prop]).length) {
        if (target[prop].constructor.name === 'Object') {
          target[prop] = sortObject(target[prop]);
        }
      } else {
        delete target[prop];
      }
    });

    if (target.dissolutions) {
      _qidItems[qid].forEach(itemID => {
        dissolved[itemID] = target.dissolutions;
      });
    }

    _wikidata[qid] = sortObject(target);
  });

  _warnings.sort(sortWarnings);

  // Set `DRYRUN=true` at the beginning of this script to prevent actual file writes from happening.
  if (!DRYRUN) {
    writeFileWithMeta('dist/warnings.json', stringify({ warnings: _warnings }) + '\n');
    writeFileWithMeta('dist/wikidata.json', stringify({ wikidata: sortObject(_wikidata) }) + '\n');
    writeFileWithMeta('dist/dissolved.json', stringify({ dissolved: sortObject(dissolved) }, { maxLength: 100 }) + '\n');
  }

  console.timeEnd(END);

  // `console.warn` whatever warnings we've gathered
  if (_warnings.length) {
    console.log(chalk.yellow.bold(`\nWarnings:`));
    _warnings.forEach(warning => console.warn(chalk.yellow(warning.qid.padEnd(12)) + chalk.red(warning.msg)));
  }
}
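
To illustrate the suspected interaction, here is a minimal, self-contained sketch. It replays the unshift/push branching from getClaimValues() on mock claims in the read order reported above, then passes the result through `sortObjectSketch()`. That function is a hypothetical stand-in written for this example, not the project's actual sortObject(); it assumes that the real helper also sorts Array-valued properties, which is exactly what I have not confirmed.

```javascript
// Mock claims in the read order reported above (claim shape simplified for illustration).
const readOrder = [
  ['https://www.toysrus.com.hk/en-hk/aboutus.html', 'preferred'],
  ['https://www.toysrus.com.bn/', 'normal'],
  ['https://www.toysrus.com.cn/zh-cn/', 'normal'],
  ['https://www.toysrus.com.hk/zh-hk/', 'normal'],
  ['https://www.toysrus.com.tw/zh-tw/', 'normal'],
  ['https://www.toysrus.com.my/', 'normal'],
  ['https://www.toysrus.com.sg/', 'normal'],
  ['https://www.toysrus.co.th/th-th/', 'normal']
];
const mockClaims = readOrder.map(([value, rank]) => ({
  rank: rank,
  mainsnak: { datavalue: { value: value } }
}));

// Same branching as the getClaimValues() snippet: preferred values go to the front.
function collectValues(claims) {
  const values = [];
  for (const c of claims) {
    if (c.rank === 'preferred') {
      values.unshift(c.mainsnak.datavalue.value);
    } else {
      values.push(c.mainsnak.datavalue.value);
    }
  }
  return values;
}

// ASSUMPTION: hypothetical stand-in for sortObject(). It sorts the keys and
// also sorts any Array values, which would discard the rank-based ordering.
function sortObjectSketch(obj) {
  const sorted = {};
  for (const k of Object.keys(obj).sort()) {
    sorted[k] = Array.isArray(obj[k]) ? [...obj[k]].sort() : obj[k];
  }
  return sorted;
}

const beforeSort = { officialWebsites: collectValues(mockClaims) };
const afterSort = sortObjectSketch(beforeSort);

console.log(beforeSort.officialWebsites[0]); // the preferred-rank URL
console.log(afterSort.officialWebsites[0]);  // the alphabetically first URL
```

If the real sortObject() behaves like this stand-in, the sorted Array matches the order seen in dist/wikidata.json, with the preferred value ending up fourth.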

I think the ideal output would have the preferred rank value listed first and all other values sorted alphabetically, but I suspect that would require substantial code rewriting. In lieu of that, I can only think of two resolutions for this issue:

  • discarding the attempt to put preferred values first in lines 593 to 597 since it's eventually overwritten anyway
  • discarding the call to sortObject() on line 638, putting preferred values first at the expense of leaving the rest of the Array values unsorted
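
For reference, the "ideal" ordering described above (preferred values first, everything else alphabetical) could be sketched roughly as follows. `preferredFirst()` is a hypothetical helper, not something that exists in build_wikidata.js, and the claim shape is simplified. It would only help if whatever later re-sorts the Array is also changed to leave these values alone.

```javascript
// Hypothetical helper: preferred-rank values first, then the remaining values,
// each group sorted alphabetically.
function preferredFirst(claims) {
  const preferred = [];
  const others = [];
  for (const c of claims) {
    const value = c.mainsnak.datavalue.value;
    if (c.rank === 'preferred') {
      preferred.push(value);
    } else {
      others.push(value);
    }
  }
  return [...preferred.sort(), ...others.sort()];
}

// Usage with a few mock claims (shape simplified for illustration).
const sampleClaims = [
  { rank: 'normal', mainsnak: { datavalue: { value: 'https://www.toysrus.com.sg/' } } },
  { rank: 'preferred', mainsnak: { datavalue: { value: 'https://www.toysrus.com.hk/en-hk/aboutus.html' } } },
  { rank: 'normal', mainsnak: { datavalue: { value: 'https://www.toysrus.com.bn/' } } }
];

const ordered = preferredFirst(sampleClaims);
console.log(ordered);
```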