# grab_tcl

GRAB web link downloader

This is a script similar to WGET or Teleport Pro, but it uses a text file as the source of URLs and runs in an automated fashion. A major difference is the way it saves files: it smashes the path to the file into the filename, rather than duplicating the directory tree locally, which makes it obvious where each file came from without creating a navigation headache.
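The exact mangling scheme isn't spelled out here, but as a minimal sketch of the idea in Tcl, assuming path separators are simply replaced with underscores (the proc name and the separator choice are my own):

```tcl
# Sketch only: collapse a URL's path into a single flat filename,
# e.g. http://example.com/img/2021/photo.jpg -> example.com_img_2021_photo.jpg
proc flatten_url {url} {
    regsub {^https?://} $url "" stripped          ;# drop the scheme
    regsub -all {[/\\?*:|"<>]} $stripped "_" flat ;# replace separators and unsafe characters
    return $flat
}

puts [flatten_url "http://example.com/img/2021/photo.jpg"]
```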

The source list is "grab.txt", which would be generated by an additional program or task that saves the relevant links (and referrers, for sites that need them) for a scheduled batch download by GRAB. This model is no longer workable now that websites have all moved to encryption, and GRAB doesn't support TLS/SSL in any form as written. It was meant to be a lightweight helper script, not a feature-rich utility.

The format of "grab.txt" is very simple: it only reads download links, referrer links, and a special-case "EOF" line that acts as a breakpoint, so that links don't need to be removed in order to abort processing. The first line of a block of downloads should be the referrer for obvious reasons, denoted by an exclamation mark prefixed to the URL, followed by complete URLs pointing to the specific files (image, spreadsheet, etc.) to download. While the referrer isn't necessary, it gives the URL list a good structure: each referrer acts like a header for a block of downloads.
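For illustration, a hypothetical "grab.txt" (the URLs here are made up) might look like this, with the `!`-prefixed referrer heading a block of downloads and everything after the `EOF` line ignored:

```
!http://example.com/gallery/
http://example.com/gallery/photo1.jpg
http://example.com/gallery/data.xls
EOF
http://example.com/not-processed.zip
```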

GRAB supports chunking, resuming, and other HTTP features, but in my experience web servers tend to lie about their capabilities, or the modules they load break the rules negotiated by the request and response headers. There have also been odd cases where servers chunk improperly, occasionally appending newlines that shouldn't be there, or incorrectly reporting the document size, with no obvious way to filter or prevent it. Overall, though, the script was quite effective.
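The script's own request handling isn't reproduced here, but as a rough sketch of how a resume is typically requested with Tcl's bundled http package (the proc name and details are assumptions, and it depends on the server actually honouring the Range header):

```tcl
package require http

# Sketch only: ask the server for the bytes we don't have yet and
# append them to the partially downloaded local file.
proc resume_download {url localfile} {
    set offset [file size $localfile]
    set chan [open $localfile a]
    fconfigure $chan -translation binary
    set tok [::http::geturl $url \
        -headers [list Range "bytes=$offset-"] \
        -channel $chan]
    close $chan
    # 206 Partial Content means the server resumed; 200 means it
    # ignored the Range header and sent the whole document again.
    set code [::http::ncode $tok]
    ::http::cleanup $tok
    return $code
}
```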

Since it has become impossible to use across the wider web, the only remaining scenario for running it would be an internal network where encryption can be disabled between trusted hosts. As such, I'm releasing the script with the expectation that it serves more as a learning model than as something practically useful.