Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add field for page numbers & notes #19

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

andykemp
Copy link

@andykemp andykemp commented Jan 3, 2018

Extract the page number (where available) and the contents of any notes recorded with that highlight.

highlight.page will return "Page: 7" or "Location: 123" depending on whether the page number is available...

highlight.note will return the contents of any notes associated with a highlight.

Extract the page number (where available) and the contents of any notes recorded with that highlight.

highlight.page will return "Page: 7" or "Location: 123" depending on whether the page number is available...
@andykemp
Copy link
Author

andykemp commented Jan 3, 2018

I was looking to find a way of extracting the data from my Kindle Highlights into a format I could easily import into import into Ulysees in markdown. Using this as a basis, with the tweaks above I was able to write a little ruby script which creates a separate .txt file for each book with a list of quotes, and the location/page number of the highlight - and where they exist the associated notes.

I hope others will find these tweaks useful as it makes the page number and notes fields available to users.

@speric
Copy link
Owner

speric commented Jan 5, 2018

@andykemp Thanks for this PR!

The test suite is currently failing: https://travis-ci.org/speric/kindle-highlights/jobs/324621115

I believe this is because the "mocked" Kindle HTML in https://github.com/speric/kindle-highlights/blob/master/test/fetching_books_and_highlights_test.rb does not include the HTML elements you are expecting.

Can you add the markdown you expect the Kindle site to return to the "mocked" Kindle response in the test referenced above? Do those elements always exist in the Kindle HTML?

@speric speric self-assigned this Jan 5, 2018
Updated test data to include highlight.page and highlight.note
Update fetching_books_and_highlights_test.rb
Fixed test files to include notes and page data
@andykemp
Copy link
Author

andykemp commented Jan 5, 2018

Well that was messier than it should have been, but I've now got it fixed (and picked up an error on route).

Yes the HTML elements always exist - if there's not a note then the span is empty.

I've only tested on the English version of the site (read.amazon.com/notebook) but assume it would work on the others if everything else does.

Updated to include details of highlight.page and highlight.note
@andykemp
Copy link
Author

andykemp commented Jan 5, 2018

I've also updated the README.md file to include details about these fields.

Copy link
Owner

@speric speric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you SO MUCH for this PR. It's added neat functionality, and I really like the examples directory. You've definitely improved this project.

I left a few pieces of feedback. Some relate to Ruby idioms, others to my own idiosyncrasies.

Please ping me with any questions! 👏 🙇

@@ -101,6 +106,10 @@ def raw_quotes_page
Destiny is not born of decision; it is born of uncontrollable circumstances.
</span>
</div>
<div id="note-" class="a-row a-spacing-top-base kp-notebook-note aok-hidden kp-notebook-selectable">
<span id="note-label" class="a-size-small a-color-secondary">Note:</span>
<span id="note" class="a-size-base-plus a-color-base">This is a note!</span>
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is super-picky, but I like to see test data that mirrors something I'd actually use. It doesn't have to be very involved, something like:

 <span id="note" class="a-size-base-plus a-color-base">This was an inspiring quote.</span>

would suffice.

@@ -34,6 +34,8 @@ def test_fetching_highlights_for_a_book
assert_equal "306", highlight.location
assert_equal "Destiny is not born of decision; it is born of uncontrollable circumstances.", highlight.text
assert_equal "B000XUAETY", highlight.asin
assert_equal "Page: 7", highlight.page
assert_equal "This is a note!", highlight.note
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below, re: test data.

README.md Outdated
highlight.location
#=> "197"
highlight.note
#=> "This is a note"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below re: test data.

README.md Outdated
@@ -94,8 +100,12 @@ highlight.asin
#=> "B005CQ2ZE6"
highlight.text
#=> "One of the most dangerous things you can believe in this world is that technology is neutral."
highlight.page
#=> "Page: 7"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment below, re: just returning 7 for this instead of "Page: 7".


def self.from_html_elements(book:, html_elements:)
new(
asin: book.asin,
text: html_elements.children.search("div.kp-notebook-highlight").first.text.squish,
note: html_elements.children.search("span#note").first.text,
page: html_elements.children.search("span#annotationHighlightHeader").first.text.partition('|').last.lstrip,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this attribute is already named page, what do you think about just returning the page number as an integer? Instead of "Page: 7", just 7.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do this - it's just a bit more complicated as we'd then need to read the full string and only return the page if the word is page... but happy to do so. What I was looking for was a generalised location which preferentially gave me the page number if available, and fell back to the location if it wasn't. This was also programmatically much easier! However I can see it's a bit limiting. I'll have a go at reworking it to just return the page number, and can rebuild the Page/Location from the two bits in the example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rewritten this now.

kindle = KindleHighlights::Client.new(
email_address: "[email protected]",
password: "password"
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you have two extra spaces here; the ) should line up with the first column.

require 'kindle_highlights'

def sanitize_filename(filename) #Function to remove invalid symbols from filenames
fn = filename.split /(?<=.)\.(?=[^.])(?!.*\
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're missing some characters here. Have you tried to run this in an irb console? What you want is something like:

def sanitize_filename(filename)
  filename
    .split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)
    .map! { |s| s.gsub /[^a-z0-9\-]+/i, '_' }
    .join('.')
end

folder = File.expand_path('~/Kindle') #Folder you want to store the txt files (in markdown format) in...

kindle.books.each do |book|
fname = folder+"/"+sanitize_filename(book.title)+".txt"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Ruby convention is to prefer String interpolation to concatenation (see https://relishapp.com/womply/ruby-style-guide/docs/strings, among others). I'd also strongly encourage you to name your variables a bit more descriptively.

folder_name = "#{folder}/#{sanitize_filename(book.title)}.txt"

if highlight.note.present?
bookfile.puts "**Note:** #{highlight.note} \r\n" #If there's a Note attached to the highlight add this below the highlighted text
end
bookfile.puts "\r\n [Open in Kindle App](kindle://book?action=open&asin=#{book.asin}&location=#{highlight.location})\r\n\r\n" #Add a link to the highlight in the Kindle App
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool! Didn't know you could do that. 🎉

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither did I - I love Google! :)

fname = folder+"/"+sanitize_filename(book.title)+".txt"
print book.title+"\r\n" #Print name of Book to console so you can track progress
bookfile = File.open(fname, "w")
bookfile.puts "# #{book.title} by #{book.author} \r\n" #Header of the Book Title and Author
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, another nit-picky thing, but I'd like to see two spaces per level of indentation. I don't think this line needs to be indented at all.

@speric speric mentioned this pull request Jan 9, 2018
@andykemp
Copy link
Author

andykemp commented Jan 9, 2018

Okay - I think I've reworked all those suggestions. Let me know if anything's missing... Sorry about the lack of ruby syntax - I've never done any ruby before - my syntax is a horrible mess of about 10 different languages I've dabbled with over the last 20 years!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants