Commit 9ce46d6

skyemeedan and Skye Bender-deMoll authored
[CV2-2674] google factcheck tools lookup bot (#30)
* [CV2-2674] sketching structure for managing google factcheck data from archive file (unused)
* extracting and filtering from local file working
* bot lambda working in QA and live with google-fact-check-tools and testing workspaces
* separate API keys for read and write, and adjusted query url to not include feed id

---------

Co-authored-by: Skye Bender-deMoll <[email protected]>
1 parent e0bd97d commit 9ce46d6

7 files changed: +548 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -39,6 +39,6 @@ See details in `./health-desk-bot/README.md`
 ## Local usage
 
 * Copy `config.js.example` to `config.js` and define your local configurations with `checkApiUrl: http://localhost:3000`.
-* Start Check locally
+* Start Check locally `docker-compose -f docker-compose.yml up bots`
 * The `check-bots` container should start `server.js` on port `8586`
 * On the Check side, the bot request URL should be set to `http://bots:8586/<bot-slug>` ('exif', 'youtube' or 'health-desk').
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Google factcheck explorer bot

Shows Notes with ClaimReview content, imported from https://toolbox.google.com/factcheck/explorer, that is similar to the ProjectMedia item.

# Overview

NOTE: These instructions include references to documentation and AWS infrastructure that are not
visible outside the Meedan organization. Please contact us if you need information about any of
these resources to install this code on your own system. The installation process is also recorded on an internal wiki page: `Installing the Google factcheck tools bot in a workspace`

## Background data
For items to become available to be displayed by the bot
* ClaimReview objects are parsed daily by the Fetch plugin `fetch/lib/claim_review_parsers/google_fact_check.rb`
  - TODO: replace this with a more efficient ingest process, perhaps based on the code in `/ingest` (this code is currently not used)
* The google-fact-check-tools workspace https://checkmedia.org/google-fact-check-tools/project/15547 listens for
new ClaimReviews, which are stored in Check under its team id.
* The items in the workspace are available for similarity queries via Check API, with access permission determined by API key.
* NOTE: usually items in the project need to be in the 'published' state to be available for similarity matching. It may be necessary to toggle them using a script like https://github.com/meedan/check-scripts/blob/main/publish_imported_reports.rb

## Bot operation
When this bot is configured in a workspace
* The bot listens on a webhook for new ProjectMedia creation events for a team, configured as per the internal wiki page `How to configure a webhook for a Check Bot`
```
bot_user = BotUser.where(name: "Google fact check workspace API Client")[0]
bot_user.set_request_url = "<bot api gateway url>"
bot_user.set_events = [{"event"=>"create_project_media", "graphql"=>"dbid, title, description, type"}]
bot_user.save!
```
* The text from the ProjectMedia (PM) item is compared for similarity against the available set of ClaimReview items via a Check API query
managed by an AWS Lambda function defined in `/google-factcheck-explorer-bot-lambda`
* Any resulting ClaimReview links are written back as 'comments' on the ProjectMedia items in the workspace, to be displayed as Notes in the sidebar
* An appropriately configured API key is needed to give the bot permission to write to the workspace. This is done by mapping the BotUser to the team. See the internal wiki page `How to create an API key for a Check workspace`

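As a rough illustration of the first step of this flow, the sketch below shows how a webhook body could be reduced to the fields the bot needs. The helper name `parseWebhookEvent` and the constant `SUPPORTED_TYPES` are hypothetical and not part of the bot's code; the payload shape is assumed from the `set_events` configuration above.

```javascript
// Hypothetical helper, for illustration only: not part of the bot's code.
// Media types the bot is assumed to look up.
const SUPPORTED_TYPES = ['Claim', 'Link', 'Blank'];

// Returns the fields needed for a similarity lookup, or null when the
// webhook event should be ignored.
function parseWebhookEvent(body) {
  const payload = JSON.parse(body);
  if (payload.event !== 'create_project_media') return null;
  const item = payload.data || {};
  if (!SUPPORTED_TYPES.includes(item.type)) return null;
  return {
    pmid: String(item.dbid),
    teamSlug: payload.team.slug,
    title: String(item.title || ''),
    type: item.type,
  };
}

// Example payload with made-up values in the assumed shape.
const example = parseWebhookEvent(JSON.stringify({
  event: 'create_project_media',
  team: { dbid: 1506991, slug: 'check-testing' },
  data: { type: 'Claim', dbid: 19205, title: 'Example claim text' },
}));
console.log(example);
```

Returning `null` for ignored events keeps the "nothing to do" case explicit, so the caller can acknowledge the webhook without querying Check API.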
# Bot testing setup
## Testing the background side
* Set up a workspace in QA to host the content (open question: will this work in QA, or does it need to be done locally?)
* Configure an API key with permissions to access the workspace via a `BotUser`. See the internal wiki page `How to create an API key for a Check workspace`
* Import the Google ClaimReview content from Fetch. See the internal wiki page `How to re-import content from Fetch into a Check workspace`
* Confirm that the feed can be queried:
```
curl -X GET -H "Accept: application/vnd.api+json" -H "X-Check-Token: <API_KEY_GOES_HERE>" "https://qa-check-api.checkmedia.org/api/v2/feeds?filter\[query\]=test"
```
which should give a response like
```
{"data":[{"id":"20007","type":"feeds","links":{"self":"https://qa-check-api.checkmedia.org/api/v2/feeds/20007"},"attributes":{"claim":"-","claim-context":null,"claim-tags":"","fact-check-title":"Viral Test: Big B, Madhuri Dixit campaigning For Imran Khan?","fact-check-summary":"Pakistan's PTI party is using Amitabh Bachchan and Madhuri Dixit photos on their campaign posters","fact-check-published-on":1679572669,"fact-check-rating":"undetermined","published-article-url":"https://www.indiatoday.in/fact-check/story/viral-test-big-b-madhuri-dixit-campaigning-for-imran-khan-1294131-2018-07-24","organization":"Google fact check tools"}},{"id":"19384","type":"feeds","links":{"self":"https://qa-check-api.checkmedia.org/api/v2/feeds/19384"},"attributes":{"claim":"-","claim-context":null,"claim-tags":"","fact-check-title":"The Legend of the 'Pencil Death' Exam Suicide","fact-check-summary":"A student, stressed to the breaking point by the pressures of exams, committed suicide during a test by shoving pencils up his nostrils and into his brain.","fact-check-published-on":1679571320,"fact-check-rating":"undetermined","published-article-url":"https://www.snopes.com/fact-check/pencil-death/","organization":"Google fact check tools"}},{"id":"19302","type":"feeds","links":{"self":"https://qa-check-api.checkmedia.org/api/v2/feeds/19302"},"attributes":{"claim":"-","claim-context":null,"claim-tags":"","fact-check-title":"FACT CHECK: Poppy Seeds Alter Drug Test Results?","fact-check-summary":"The consumption of poppy seeds used on bagels and muffins can produce positive results on drug screening tests.","fact-check-published-on":1679571142,"fact-check-rating":"undetermined","published-article-url":"https://www.snopes.com/fact-check/poppy-seeds-alter-drug-test-results/","organization":"Google fact check tools"}}],"meta":{"record-count":3}}
```

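To make the response shape concrete, here is a minimal sketch of turning such a feed response into the text of a Note. The function name `formatMatches` and its `limit` parameter are hypothetical; the attribute names come from the sample response above, and the heading line is only an example.

```javascript
// Hypothetical helper, for illustration only.
// Builds comment text from a parsed /api/v2/feeds response, or returns
// null when there is nothing to report.
function formatMatches(feedResponse, limit = 3) {
  const items = (feedResponse.data || []).slice(0, limit);
  if (items.length === 0) return null;
  const lines = items.map((item) => {
    const title = item.attributes['fact-check-title'].trim();
    const url = item.attributes['published-article-url'];
    return `- ${title} [${url}]`;
  });
  return ['Closely related ClaimReviews from Google Factcheck Tools:', ...lines].join('\n');
}

// A trimmed-down copy of one item from the sample response.
const sample = {
  data: [{
    id: '19302',
    type: 'feeds',
    attributes: {
      'fact-check-title': 'FACT CHECK: Poppy Seeds Alter Drug Test Results?',
      'published-article-url': 'https://www.snopes.com/fact-check/poppy-seeds-alter-drug-test-results/',
    },
  }],
  meta: { 'record-count': 1 },
};
console.log(formatMatches(sample));
```

Slicing the `data` array, rather than counting up to `record-count`, avoids reading past the end of the list if the API ever returns fewer items than the reported total.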
## Deploying the AWS Lambda
This internal wiki page gives instructions for deploying a related bot: `How to deploy Check Slack Bot`
General AWS docs on how to deploy Lambdas: https://docs.aws.amazon.com/lambda/latest/dg/lambda-deploy-functions.html
* If this is a release, bump the version number in `package.json`
* Rename `config.js.example` to `config.js` (`config.js` is git-ignored to avoid committing secrets)
* Run `npm install` to install all the required libraries locally so they will get packaged up by the build for deployment.
* Run `npm run build`. This runs the top-level build script in `package.json` and creates a `google-factcheck-explorer-bot-lambda.zip` file with the bot script and all of its requirements

* For the first deployment, create a Lambda via the AWS web console similar to https://eu-west-1.console.aws.amazon.com/lambda/home?region=eu-west-1#/functions/qa-google-factcheck-explorer-bot
  * TODO: terraform aws lambda? https://registry.terraform.io/modules/terraform-aws-modules/lambda/aws/latest
* The Lambda needs the API Gateway trigger set up so that there is an external HTTP endpoint that can be called.
  * The endpoint URL from the trigger needs to be set as the '`<webhook>`' when setting the bot configuration, as per the instructions on the internal wiki page `How to configure a webhook for a Check Bot`
* The Lambda timeout can be increased to 3 minutes on the Configuration tab
* Update the environment-appropriate (live/QA) secrets and config in the Lambda's Configuration > Environment Variables section
  * `CHECK_API_GOOGLE_FACT_CHECK_ACCESS_TOKEN` <-- this needs the key to the GoogleFactCheck feed workspace
  * `CHECK_API_WORKSPACE_ACCESS_TOKEN` <-- this needs to authorize annotations on a team's ProjectMedia
  * `CHECK_API_URL` <-- Usually `qa-check-api.checkmedia.org` or `check-api.checkmedia.org`
* To deploy, start an `aws cli` session and deploy local files to the Lambda location (best for quick redeploys during development)
  * `aws lambda update-function-code --function-name qa-google-factcheck-explorer-bot --zip-file fileb://google-factcheck-explorer-bot-lambda.zip`
* For 'real' deployments, we want to keep an archive of the deployed code, so it is best to deploy via https://s3.console.aws.amazon.com/s3/buckets/meedan-check-bot-deployments?region=eu-west-1&tab=objects and use the 'upload from S3 location' option in the AWS Lambda console UI
* The Lambda can be tested in the AWS web console by firing an appropriately formatted 'test' event (note that the team slug will need to correspond to the team hosting the project media, and the data `dbid` will be the project media id)
  ```
  {
  "body": "{\"event\": \"create_project_media\", \"team\": {\"dbid\": 1506991, \"id\": \"abcdefg\", \"avatar\": \"https://assets.checkmedia.org/uploads/team/6503/Group_89.png\", \"name\": \"Check testing\", \"slug\": \"check-testing\"}, \"data\": {\"type\": \"Claim\", \"dbid\": 19205, \"title\": \"Is it true Pakistan's PTI party is using Amitabh Bachchan and Madhuri Dixit photos on their campaign posters?\", \"description\": \"Charles III come\\u00e7a reinado em busca de monarquia simplificada e papel pol\\u00edtico mais ativo\"}}"
  }
  ```

* The bot needs to be authorized to write to the project media of the target team by being added as a TeamBotInstallation.
* The event structure sent by the webhook needs to match what the bot is expecting to parse out of the JSON payload, i.e.
  * `bot_user.set_events = [{"event"=>"create_project_media", "graphql"=>"dbid, title, description, type"}]`
* Logs from the event hook will appear in CloudWatch, with a few minutes' delay


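The test event above is easy to get wrong by hand because `body` is itself a JSON document encoded as a string. A small sketch of generating it programmatically follows. The helper name `buildTestEvent` is hypothetical, and the assumption that the Lambda receives the request body as a string reflects how the handler in this commit calls `JSON.parse(event.body)`.

```javascript
// Hypothetical helper, for illustration only.
// Builds an event in the shape the handler reads: an object whose `body`
// is the webhook payload serialized to a JSON string.
function buildTestEvent(teamSlug, pmid, title, description = '') {
  const payload = {
    event: 'create_project_media',
    team: { slug: teamSlug },
    data: { type: 'Claim', dbid: pmid, title, description },
  };
  // The payload is stringified once here; printing the whole event
  // stringifies it a second time, which produces the escaped quotes.
  return { body: JSON.stringify(payload) };
}

const testEvent = buildTestEvent('check-testing', 19205, 'Example claim text');
// Paste this output into the Lambda console's test event editor.
console.log(JSON.stringify(testEvent, null, 2));
```

Generating the event this way sidesteps manual escaping mistakes, at the cost of omitting optional fields such as the team `avatar` and `name`, which the handler shown in this commit does not read.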
## Testing the bot side locally
NOTE: this setup was never fully working; testing was done in QA instead.
* Copy `config.js.example` to `config.js` and define your local configurations with `checkApiUrl: http://localhost:3000`.
  * TODO: can the local setup point to QA to host the items?
* Start Check web and bots containers locally `docker-compose -f docker-compose.yml up bots web`
* Configure a workspace to *install* the bots for testing by logging into check-web at `localhost:3333`
* The `check-bots` container should start `server.js` on port `8586`
* On the Check side, the bot request URL should be set to `http://bots:8586/<bot-slug>` ('exif', 'youtube' or 'health-desk').
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
const config = {
  awsRegion: 'eu-west-1',
  functionName: 'goolge-factcheck-bot-background',
  // these are fallback values, will be overridden by ENV set in lambda
  checkApiAccessToken: '<api key here>', // corresponds to CHECK_API_GOOGLE_FACT_CHECK_ACCESS_TOKEN in ENV
  checkApiUrl: 'localhost:3000', // check-api host, without scheme (eg. qa-check-api.checkmedia.org), CHECK_API_URL in ENV
  checkApiWorkspaceAccessToken: '<workspace api key here>', // api key entitled to write into the workspace where bot is installed, CHECK_API_WORKSPACE_ACCESS_TOKEN in ENV
};
module.exports = config;
Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
const config = require('./config.js'),
  Lokka = require('lokka').Lokka,
  util = require('util'),
  https = require('https'),
  Transport = require('lokka-transport-http').Transport;
const aws = require('aws-sdk');


const getConfigFromEnvWithFallback = (env_key, fallback_value) => {
  // get secrets from local env, falling back to config
  let value;
  if (env_key in process.env) {
    value = process.env[env_key];
  } else {
    console.warn('Environment variable for ' + env_key + ' is not defined, using value from config');
    value = fallback_value;
  }
  return value;
};

// secret api token with permission to read the workspace holding the GoogleFactCheck items
const CHECK_API_GOOGLE_FACT_CHECK_ACCESS_TOKEN = getConfigFromEnvWithFallback('CHECK_API_GOOGLE_FACT_CHECK_ACCESS_TOKEN', config.checkApiAccessToken);

// host for check-api (live vs QA)
const CHECK_API_URL = getConfigFromEnvWithFallback('CHECK_API_URL', config.checkApiUrl);

// secret api token with permission to write comments into the workspace where the bot is installed
const CHECK_API_WORKSPACE_ACCESS_TOKEN = getConfigFromEnvWithFallback('CHECK_API_WORKSPACE_ACCESS_TOKEN', config.checkApiWorkspaceAccessToken);


// This is the 'callback' via GraphQL API to tell Check to show the included text as a comment,
// associated with the project media id for the specific team indicated by the team_slug.
// The bot must be authorized (via its api key and TeamBotInstallation) to make edits to the
// team's ProjectMedia
const replyToCheck = (pmid, team_slug, text, callback) => {
  console.log('pmid', pmid);
  console.log('team_slug', team_slug);
  console.log('replyToCheck:', text);
  const vars = {
    text,
    pmid,
    clientMutationId: 'google-factcheck-bot' + Date.now(),
  };

  const mutationQuery = `($text: String!, $pmid: String!, $clientMutationId: String!) {
    createComment: createComment(input: { clientMutationId: $clientMutationId, text: $text, annotated_id: $pmid, annotated_type: "ProjectMedia"}) {
      comment {
        text
      }
    }
  }`;

  // this access token needs permission to write into the project media of the workspace
  // where the bot is installed
  const headers = { 'X-Check-Token': CHECK_API_WORKSPACE_ACCESS_TOKEN };
  // NOTE: if API key lacks appropriate permissions, will probably see:
  // "Error when executing Project Media comment mutation: Error: GraphQL Error: No permission to create Comment"
  const transport = new Transport('https://' + CHECK_API_URL + '/api/graphql?team=' + team_slug, { headers, credentials: false, timeout: 120000 });
  // the transport is deliberately not logged: its headers contain the secret access token
  const client = new Lokka({ transport });

  console.log('Sending Project Media comment mutation with vars: ' + JSON.stringify(vars));
  client.mutate(mutationQuery, vars)
    .then((resp) => {
      console.log('Response: ' + util.inspect(resp));
      callback(null);
    })
    .catch((e) => {
      console.error('Error when executing Project Media comment mutation: ' + util.inspect(e));
      callback(null);
    });
};

// This is the event handler, to be triggered on each new Project Media creation
// it will call a search in Alegre to look for similar ClaimReview items in the
// GoogleFactCheck feed defined in the config.
exports.handler = (event, context, callback) => {
  const data = JSON.parse(event.body);
  console.log('JSON.parse(event.body)', data);
  if (data.event !== 'create_project_media') {
    callback(null);
    return;
  }
  console.log('Google Fact Check bot processing project media creation event');
  // there could be an error payload instead, but we let it fail to surface
  const pmid = data.data.dbid.toString();
  const type = data.data.type;
  // TODO: Decide whether to include imported reports ("Blank")
  if (type !== 'Claim' && type !== 'Link' && type !== 'Blank') {
    // unsupported media type, nothing to look up
    callback(null);
    return;
  }
  const title = data.data.title.toString();
  // TODO: if data.data.description is different than the title, make a second request

  const options = {
    hostname: CHECK_API_URL,
    path: `/api/v2/feeds?filter[query]=${encodeURIComponent(title)}`,
    headers: {
      Accept: 'application/vnd.api+json',
      'X-Check-Token': CHECK_API_GOOGLE_FACT_CHECK_ACCESS_TOKEN, // This API key has access to Check workspace with googleFactCheck items
    },
    method: 'GET', // POST not configured
  };

  const req = https.request(options, (res) => {
    res.setEncoding('utf8');
    let responseBody = '';

    res.on('data', (chunk) => {
      responseBody += chunk;
    });

    res.on('end', () => {
      if (res.statusCode >= 400) {
        console.error('Request error status', res.statusCode);
        console.error('Error response body:', responseBody);
        // surface API errors instead of trying to parse the error payload
        callback(new Error('Check API request failed with status ' + res.statusCode));
        return;
      }
      console.log('response body:', responseBody);
      const json_obj = JSON.parse(responseBody);
      // check if anything matched; an empty result looks like {"data":[],"meta":{"record-count":0}}
      if (json_obj['meta']['record-count'] > 0) {
        // {"data":[{"id":"20007","type":"feeds","links":{"self":"https://qa-check-api.checkmedia.org/api/v2/feeds/20007"},
        // "attributes":{"claim":"-","claim-context":null,"claim-tags":"",
        // "fact-check-title":"Madhuri Dixit campaigning For Imran Khan?",
        // "fact-check-summary":"Pakistan's PTI party is using Amitabh Bachchan and Madhuri Dixit photos on their campaign posters",
        // "fact-check-published-on":1679572669,"fact-check-rating":"undetermined",
        // "published-article-url":"https://www.indiatoday.in/fact-check/story/viral-test-big-b-madhuri-dixit-campaigning-for-imran-khan-1294131-2018-07-24",
        // "organization":"Google fact check tools"}}],
        // "meta":{"record-count":1}}
        let text = 'Closely related ClaimReviews from Google Factcheck Tools:\n';
        // show at most 3 matches; slice guards against record-count exceeding the items returned
        for (const item of json_obj['data'].slice(0, 3)) {
          // extract items from claim and format text to display as comment
          const claim_title = item['attributes']['fact-check-title'].trim();
          const source_url = item['attributes']['published-article-url'];
          text += `- ${claim_title} [${source_url}]\n`;
        }

        // make a call back to Check with the text to be used as a comment on the PM item
        replyToCheck(pmid, data.team.slug, text, callback);
      } else {
        console.log('no matching ClaimReviews returned');
        callback(null);
      }
    });
  });

  req.on('error', (e) => {
    console.error(e);
    // surface connection errors to Lambda through the callback
    callback(e);
  });

  req.end();
};
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
{
  "name": "google-factcheck-explorer-bot-lambda",
  "version": "0.1.1",
  "description": "backend function for suggesting google ClaimReview items",
  "main": "index.js",
  "scripts": {
    "test": "mocha",
    "build": "rm -f google-factcheck-explorer-bot-lambda.zip && zip -9 -r --exclude=*configurator* --exclude=*test* --exclude=google-factcheck-explorer-bot-lambda.zip google-factcheck-explorer-bot-lambda.zip * && echo 'Now upload google-factcheck-explorer-bot-lambda.zip to AWS Lambda'",
    "start": "node server.js"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/meedan/check-bots.git"
  },
  "author": "Meedan",
  "license": "ISC",
  "homepage": "https://meedan.com/",
  "dependencies": {
    "aws-sdk": "^2.1061.0",
    "axios": "^0.24.0",
    "lokka": "^1.7.0",
    "lokka-transport-http": "^1.3.2"
  }
}
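
A note on why each top-level key in `package.json` should appear only once: when a key such as `scripts` is repeated, `JSON.parse` keeps only the last occurrence, so entries from an earlier block are silently dropped. The snippet below is a small illustration with shortened, made-up script values.

```javascript
// Illustration only: the script values are shortened placeholders.
const raw = '{"scripts": {"test": "mocha", "start": "node server.js"}, "scripts": {"build": "zip"}}';
const parsed = JSON.parse(raw);
// Only the last "scripts" object survives parsing.
console.log(Object.keys(parsed.scripts));
```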
