
How to develop and test complex behaviours? #28

Open
anjackson opened this issue Nov 9, 2022 · 0 comments

Following our earlier conversation @ikreymer, I'm posting the complex behaviour I've been working on here. I've been using the Chrome Tampermonkey extension to manage the development of the behaviour as a user script.

The target is the current https://sounds.bl.uk site, which uses lots of complex widgets to play audio tracks and build large trees of links that are loaded dynamically.

The script looks like this:

// ==UserScript==
// @name         sounds.bl.uk-auto
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Try to archive something totally tricky.
// @author       Andrew Jackson <[email protected]>
// @match        https://sounds.bl.uk/*
// @icon         https://www.google.com/s2/favicons?sz=64&domain=bl.uk
// @grant        none
// ==/UserScript==

// Implements automation of complex AJAX widgets on https://sounds.bl.uk/
// A good, complex example: https://sounds.bl.uk/Arts-literature-and-performance/Theatre-Archive-Project/
(async function() {
    'use strict';

    // Resolve after `ms` milliseconds.
    function sleep(ms) {
        return new Promise((resolve) => setTimeout(resolve, ms));
    }

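    // Click every closed list item in the visible tab, repeating until none remain,
    // because opening an item can load further closed items beneath it.
    // Note: this assumes each click actually opens its item; otherwise the loop never ends.
    async function open_all_lists() {
        while (true) {
            const closed = document.querySelectorAll('div[aria-hidden="false"] li[class="closed"] a');
            console.log(closed); // debugging aid while developing the behaviour
            if (closed.length === 0) {
                break;
            }
            for (const link of closed) {
                link.click();
                await sleep(1000);
            }
        }
    }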

    // Note that this doesn't really work in Chrome because of its autoplay policy
    // (https://developer.chrome.com/blog/autoplay/), so that policy needs to be
    // overridden for the automation to work fully.
    async function run_all_players() {
        const players = document.querySelectorAll(".playable");
        for (const button of players) {
            button.click();
            await sleep(1000);
        }
    }


    await sleep(4000);

    // Run players:
    await run_all_players();

    // Open all lists on first tab:
    await open_all_lists();

    // Iterate over other tabs:
    var tabs = document.querySelectorAll(".tabbedContent > ul > li > a");
    for( var tab of tabs ) {
        // Switch tab:
        tab.click();
        await sleep(2000);
        // Iterate over closed list items:
        await open_all_lists();
    }

})();

It's still not perfect as a crawl script. I have tried using it in a Scrapy crawler running behind PyWB in archiving proxy mode, and it struggles with the audio files. There can be quite a few per page, and they load quickly in normal use because the system uses HTTP range requests. Archiving HTTP 206 responses doesn't work, so PyWB fetches each file with a 200 and then returns chunks from it. But this makes the timing tricky to get right.
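
Since the timing depends on how quickly the proxy can return each chunk, fixed sleeps may end up either too short or needlessly long. As a rough sketch of one alternative (not something the script above does), the behaviour could poll for a condition with a timeout. The `waitFor` helper, its option names and its defaults are all hypothetical:

```javascript
// Hypothetical helper, not part of the user script above.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Resolve with the first truthy value returned by `condition()`,
// or reject once `timeoutMs` has elapsed. Option names are illustrative.
async function waitFor(condition, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await condition();
    if (result) {
      return result;
    }
    await sleep(intervalMs);
  }
  throw new Error(`waitFor: condition not met within ${timeoutMs} ms`);
}
```

In `run_all_players()`, the `await sleep(1000)` after each click could then become something like `await waitFor(() => isLoaded(button))`, where `isLoaded` is a placeholder for whatever signal the page exposes once a track has loaded. What that signal is on sounds.bl.uk has not been checked here.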

This ticket is less about archiving this particular site and more about how best to develop new behaviours like this, how best to test them, and how to test their integration into Browsertrix Crawler.
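
To make the question concrete, here is one possible approach, offered only as a sketch: pass the DOM query and the pause into the behaviour as parameters, so the looping logic can be exercised against a fake tree without a browser. `openAllLists`, `findClosed`, `pause`, `maxPasses` and `makeFakeTree` are all hypothetical names, and nothing here reflects how Browsertrix Crawler itself loads or tests behaviours.

```javascript
// Sketch of the list-opening loop with its dependencies injected.
// `findClosed` returns the currently closed, clickable items;
// `pause` waits between clicks; `maxPasses` guards against a loop that never ends.
async function openAllLists({ findClosed, pause, maxPasses = 50 }) {
  let clicks = 0;
  for (let pass = 0; pass < maxPasses; pass++) {
    const links = findClosed();
    if (links.length === 0) {
      return clicks;
    }
    for (const link of links) {
      link.click();
      clicks++;
      await pause();
    }
  }
  throw new Error('openAllLists: closed items remain after maxPasses');
}

// Fake stand-in for the dynamically loaded tree: clicking a node
// opens it and reveals its children, which start out closed.
function makeFakeTree() {
  const make = (children = []) => {
    const node = { open: false, visible: false, children };
    node.click = () => {
      node.open = true;
      children.forEach((child) => { child.visible = true; });
    };
    return node;
  };
  const leafA = make();
  const leafB = make();
  const branch = make([leafA, leafB]);
  const root = make([branch]);
  root.visible = true;
  const all = [root, branch, leafA, leafB];
  return { all, findClosed: () => all.filter((n) => n.visible && !n.open) };
}
```

In the browser, the same function could be called with `findClosed: () => document.querySelectorAll('div[aria-hidden="false"] li[class="closed"] a')` and `pause: () => sleep(1000)`, while a test would pass the fake tree's `findClosed` and a no-op `pause`. This covers only the looping logic; it says nothing about whether the selectors still match the live site.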

@anjackson anjackson changed the title How to develop and test complex behaviours How to develop and test complex behaviours? Nov 10, 2022