
Feature request: Parse pasted URL for ID #1

Open
That-Dude opened this issue Jan 2, 2016 · 8 comments

@That-Dude

I was wondering if it would be possible to paste a magazine URL and have the ID extracted from it.

@pietrop
Owner

pietrop commented Jan 19, 2016

Apologies for the late reply.

I suppose it might be possible. It would require grabbing the documentId element from the page and setting the variable document_id in the script equal to that. The rest should work pretty much the same.

If you are not familiar with that, I suggest looking at this tutorial on scraping: http://ruby.bastardsbook.com/chapters/html-parsing/

If you want to try it out and do a pull request, that could be an interesting feature to add.

You might also find this tutorial interesting/useful: http://pietropassarelli.com/issuu.html

PS: if you just want to get the Issuu magazine as a PDF and are not too bothered about the code or the implementation, you can always use http://issuu-downloader.abuouday.com/. I haven't tried it myself, as it wasn't around when I wrote this script, but it might be a good solution.
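The documentId extraction Pietro describes could be sketched in shell. This is only a sketch: the HTML sample below is made up, and the real Issuu page markup may embed the field differently, so the pattern would need adjusting against a live page.

```shell
#!/bin/sh
# Sketch: pull a documentId field out of fetched page HTML.
# The inline sample stands in for `curl -s <magazine-url>`; the field name
# "documentId" comes from the discussion above, the value is invented.
html='var config = {"documentId":"140123-abcdef0123456789","pageCount":48};'

# Isolate the quoted value: match the key/value pair, then split on quotes.
document_id=$(printf '%s' "$html" | grep -Eo '"documentId":"[^"]*"' | cut -d'"' -f4)
echo "$document_id"
```

With the real page, the first line would instead be `html=$(curl -s "$1")`, after which the rest of the script can use `$document_id` unchanged.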

@That-Dude
Author

Hi Pietro

I ended up writing my own script to do this in bash. I just grabbed the page and parsed it for the document ID and number of pages, then downloaded the images with curl.

Thanks for the good idea though, it really got me started.

Dan


@pietrop
Owner

pietrop commented Jan 19, 2016

Nice one, glad it all worked out!

Is there a link to your script somewhere? I'd be curious to see how you implemented it in bash.

@That-Dude
Author

Sure, see attached.

I should point out a couple of things:

  1. I'm a photographer, not a developer, so this script is probably far from optimal :-)
  2. I've only tested it on Mac OS X, but I wrote it to be POSIX-compliant, so it should run just fine in any Unix environment (it has no dependencies beyond what's installed by default).
  3. I didn't publish it anywhere because I figured it would be trivial for ISSUU to change their protocol and prevent it from working.
  4. I think this falls into a legal gray area, so you might want to check the law in your country.

Dan


@pietrop
Owner

pietrop commented Jan 19, 2016

That's great, thanks Dan!
I don't see any attachment though...

@That-Dude
Author

Oh, it was definitely there; maybe GitHub strips it. Here's a copy and paste:

```bash
#!/bin/bash
# Script to download from ISSUU.com using the magazine URL as an argument.
# Once complete, zip up the images into a CBZ file, readable by any comic book reader.

if [[ $# -eq 0 ]]; then
    echo 'Need an issuu.com URL'
    exit 0
fi

# Grab the magazine HTML page and save it to a file
curl -s -o targetmag.html "$1"

# Extract the unique document ID from the HTML file
# (-E so the {0,46} repetition is treated as a quantifier, not literal braces)
x="$(grep -m1 -Eo 'documentId.{0,46}' targetmag.html)"
documentID=${x:11:45}
echo "Unique documentID = $documentID"

# Extract the document name
documentNAME="$(grep -o "<title>[^<]*" targetmag.html | tail -c+8)"
printf "\n"
echo "Document name = $documentNAME"

# Extract the page count
x="$(grep -m1 -Eo 'pageCount.{0,5}' targetmag.html | tr -d ',')"
pagecount=${x:11}
echo "There are $pagecount pages."

pagenumber=1               # Start downloading at page 1
pagecount=$((pagecount+1))

mkdir "$documentNAME"
cd "$documentNAME" || exit 1

while [ "$pagenumber" != "$pagecount" ]; do
    filename="page_${pagenumber}.jpg"
    linkurl="http://image.issuu.com/${documentID}/jpg/page_${pagenumber}.jpg"
    curl -s -o "$filename" "$linkurl"   # Grab the JPG file
    printf "."
    pagenumber=$((pagenumber+1))
done

zip -q0 "$documentNAME".cbz *.jpg
rm *.jpg
mv "$documentNAME".cbz ../
cd ..
rmdir "$documentNAME"
rm targetmag.html
printf "\ndone\n \n"
```
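The magic numbers in the script's substring expansions come from the length of the matched field name: the grep match starts with the 10 characters of `documentId` plus one delimiter, so slicing from offset 11 drops the prefix and keeps the ID. A minimal illustration (the ID value below is made up, padded to the 45 characters the script expects):

```shell
#!/bin/bash
# Why ${x:11:45} isolates the ID: 'documentId=' is 11 characters, so the
# expansion skips the field name and keeps the next 45 characters.
x='documentId=150102081533-abcdefabcdefabcdefabcdefabcdefab'
documentID=${x:11:45}
echo "$documentID"
```

The same reasoning gives `${x:11}` for the pageCount field: `pageCount` is 9 characters, so offset 11 also clears its delimiter characters before the number.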

Dan


@pietrop
Owner

pietrop commented Jan 19, 2016

cool, thanks

@Johann-Tree

See here for Ruby code showing how to parse the URL and query the Issuu API for the page count: pviotti/issuu-pdf-dl#5
