Description
Following our earlier conversation, @ikreymer, I'm posting the complex behaviour I've been working on here. I've been using the Tampermonkey extension for Chrome to manage the development of the behaviour as a user script.
The target is the current https://sounds.bl.uk site, which uses lots of complex widgets to play audio tracks and build large trees of links that are loaded dynamically.
The script looks like this:
// ==UserScript==
// @name sounds.bl.uk-auto
// @namespace http://tampermonkey.net/
// @version 0.1
// @description Try to archive something totally tricky.
// @author Andrew Jackson <[email protected]>
// @match https://sounds.bl.uk/*
// @icon https://www.google.com/s2/favicons?sz=64&domain=bl.uk
// @grant none
// ==/UserScript==
// Implements automation of complex AJAX widgets on https://sounds.bl.uk/
// A good, complex example: https://sounds.bl.uk/Arts-literature-and-performance/Theatre-Archive-Project/
(async function() {
    'use strict';

    // Simple promise-based delay helper:
    async function sleep(ms) {
        return new Promise((resolve) => setTimeout(resolve, ms));
    }

    // Keep clicking the links of any still-closed list items in the visible tab,
    // until no closed items remain:
    async function open_all_lists() {
        while (true) {
            var l = document.querySelectorAll('div[aria-hidden="false"] li[class="closed"] a');
            console.log(l);
            if (l.length > 0) {
                for (var e of l) {
                    e.click();
                    await sleep(1000);
                }
            } else {
                break;
            }
        }
    }

    // Note that this doesn't really work in Chrome because of the autoplay policy
    // (https://developer.chrome.com/blog/autoplay/), so that needs to be overridden
    // for the automation to work fully: see the launch-flag sketch below.
    async function run_all_players() {
        var ps = document.querySelectorAll(".playable");
        for (var button of ps) {
            button.click();
            await sleep(1000);
        }
    }

    await sleep(4000);

    // Run players:
    await run_all_players();

    // Open all lists on first tab:
    await open_all_lists();

    // Iterate over other tabs:
    var tabs = document.querySelectorAll(".tabbedContent > ul > li > a");
    for (var tab of tabs) {
        // Switch tab:
        tab.click();
        await sleep(2000);
        // Iterate over closed list items:
        await open_all_lists();
    }
})();
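As noted in the comment above run_all_players(), Chrome's autoplay policy blocks programmatic playback without a user gesture, so the behaviour only works fully if the browser is launched with that policy relaxed. A minimal sketch of what that could look like when driving the page from Node with Puppeteer (an assumption on my part; Browsertrix Crawler manages its own browser launch, so the exact wiring there will differ):

const puppeteer = require('puppeteer');

(async () => {
    // Launch Chrome with the gesture requirement for autoplay switched off,
    // so clicking the .playable buttons from script actually starts audio.
    const browser = await puppeteer.launch({
        args: ['--autoplay-policy=no-user-gesture-required'],
    });
    const page = await browser.newPage();
    await page.goto('https://sounds.bl.uk/Arts-literature-and-performance/Theatre-Archive-Project/');
    // The body of the user script above (minus the Tampermonkey header) could
    // then be injected and run here, e.g. via page.evaluate().
    await browser.close();
})();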
It's still not perfect as a crawl script. I have tried using it in a Scrapy crawler running behind PyWB in archiving proxy mode, and it struggles with the audio files. There can be quite a few per page, and in normal use they load quickly because the site uses HTTP range requests. Archiving HTTP 206 responses doesn't work, so PyWB fetches each file in full with a 200 and then serves chunks back from that, which makes the timing tricky to get right.
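One way to make the timing less fragile might be to replace the fixed sleep() after each click with a wait on the players' underlying <audio> elements (assuming the widgets use <audio>; I haven't confirmed that, so this is only a sketch reusing the sleep() helper from the script above):

// Wait until every <audio> element reports it has buffered enough to play
// through (readyState 4, HAVE_ENOUGH_DATA), or give up after timeout_ms.
async function wait_for_audio(timeout_ms) {
    const start = Date.now();
    while (Date.now() - start < timeout_ms) {
        const audios = Array.from(document.querySelectorAll('audio'));
        if (audios.length > 0 && audios.every(a => a.readyState === 4)) {
            return true;
        }
        await sleep(500);
    }
    return false; // Timed out; the archive may still be fetching the file.
}

run_all_players() could then call something like await wait_for_audio(30000) after each click instead of a fixed one-second sleep, which would give slow full-file fetches through PyWB time to complete before moving on.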
This ticket is not so much about archiving this particular site as about how best to develop new behaviours like this, how best to test them, and how best to test their integration into Browsertrix Crawler.