
Feature request: Parse pasted URL for ID #1

Open
That-Dude opened this issue Jan 2, 2016 · 8 comments

@That-Dude

I was wondering if it would be possible to paste a magazine URL and have the ID extracted from it.

@pietrop
Owner

pietrop commented Jan 19, 2016

Apologies for the late reply.

I suppose it might be possible. It would require grabbing the documentId element from the page and setting the variable document_id in the script equal to that. The rest should work pretty much the same.

If you are not familiar with that, I suggest looking at this tutorial on scraping: http://ruby.bastardsbook.com/chapters/html-parsing/

If you want to try it out and do a pull request, that could be an interesting feature to add.

You might also find this tutorial interesting/useful: http://pietropassarelli.com/issuu.html

PS: if you just want to get the Issuu magazine as a PDF and are not too bothered about the code or the implementation, you can always use http://issuu-downloader.abuouday.com/. I haven't tried it myself, as it wasn't around when I wrote this script, but it might be a good solution.
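The documentId extraction Pietro describes could be sketched in shell. This is only a sketch: the HTML sample below is made up, and the real Issuu page markup may embed the field differently, so the pattern would need adjusting against a live page.

```shell
#!/bin/sh
# Sketch: pull a documentId field out of fetched page HTML.
# The inline sample stands in for `curl -s <magazine-url>`; the field name
# "documentId" comes from the discussion above, the value is invented.
html='var config = {"documentId":"140123-abcdef0123456789","pageCount":48};'

# Isolate the quoted value: match the key/value pair, then split on quotes.
document_id=$(printf '%s' "$html" | grep -Eo '"documentId":"[^"]*"' | cut -d'"' -f4)
echo "$document_id"
```

With the real page, the first line would instead be `html=$(curl -s "$1")`, after which the rest of the script can use `$document_id` unchanged.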

@That-Dude
Author

Hi Pietro

I ended up writing my own script to do this in bash. I just grabbed the page and parsed it for the document ID and number of pages, then downloaded the images with curl.

Thanks for the good idea though, it really got me started.

Dan


@pietrop
Owner

pietrop commented Jan 19, 2016

Nice one, glad it all worked out!

Is there a link to your script somewhere? I'd be curious to see how you implemented it in bash.

@That-Dude
Author

Sure, see attached.

I should point out a couple of things:

  1. I'm a photographer, not a developer, so this script is probably far from optimal :-)
  2. I've only tested it on Mac OS X, but I wrote it to be POSIX-compliant, so it should run just fine in any Unix environment (it has no dependencies beyond what's installed by default).
  3. I didn't publish it anywhere because I figured it would be trivial for ISSUU to change their protocol and prevent it from working.
  4. I think this falls into a legal gray area, so you might want to check the law in your country.

Dan


@pietrop
Owner

pietrop commented Jan 19, 2016

That's great, thanks Dan!
I don't see any attachment though...

@That-Dude
Author

Oh, it was definitely there; maybe GitHub strips it. Here's a copy and paste:

```bash
#!/bin/bash
# Script to download from ISSUU.com using the magazine URL as an argument.
# Once complete, zip up the images into a CBZ file, readable by any comic book reader.

if [[ $# -eq 0 ]]; then
    echo 'Need an issuu.com URL'
    exit 0
fi

# Grab the magazine HTML page and save it to a file
curl -s -o targetmag.html "$1"

# Extract the unique document ID from the HTML file
# (-E so the {0,46} repetition is treated as a quantifier, not literal braces)
x="$(grep -m1 -Eo 'documentId.{0,46}' targetmag.html)"
documentID=${x:11:45}
echo "Unique documentID = $documentID"

# Extract the document name
documentNAME="$(grep -o "<title>[^<]*" targetmag.html | tail -c+8)"
printf "\n"
echo "Document name = $documentNAME"

# Extract the page count
x="$(grep -m1 -Eo 'pageCount.{0,5}' targetmag.html | tr -d ',')"
pagecount=${x:11}
echo "There are $pagecount pages."

pagenumber=1               # Start downloading at page 1
pagecount=$((pagecount+1))

mkdir "$documentNAME"
cd "$documentNAME" || exit 1

while [ "$pagenumber" != "$pagecount" ]; do
    filename="page_${pagenumber}.jpg"
    linkurl="http://image.issuu.com/${documentID}/jpg/page_${pagenumber}.jpg"
    curl -s -o "$filename" "$linkurl"   # Grab the JPG file
    printf "."
    pagenumber=$((pagenumber+1))
done

zip -q0 "$documentNAME".cbz *.jpg
rm *.jpg
mv "$documentNAME".cbz ../
cd ..
rmdir "$documentNAME"
rm targetmag.html
printf "\ndone\n \n"
```
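The magic numbers in the script's substring expansions come from the length of the matched field name: the grep match starts with the 10 characters of `documentId` plus one delimiter, so slicing from offset 11 drops the prefix and keeps the ID. A minimal illustration (the ID value below is made up, padded to the 45 characters the script expects):

```shell
#!/bin/bash
# Why ${x:11:45} isolates the ID: 'documentId=' is 11 characters, so the
# expansion skips the field name and keeps the next 45 characters.
x='documentId=150102081533-abcdefabcdefabcdefabcdefabcdefab'
documentID=${x:11:45}
echo "$documentID"
```

The same reasoning gives `${x:11}` for the pageCount field: `pageCount` is 9 characters, so offset 11 also clears its delimiter characters before the number.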

Dan


@pietrop
Owner

pietrop commented Jan 19, 2016

cool, thanks

@Johann-Tree

See here for Ruby code showing how to parse the URL and query the Issuu API for the page count: pviotti/issuu-pdf-dl#5
