-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add field for page numbers & notes #19
base: master
Are you sure you want to change the base?
Conversation
Extract the page number (where available) and the contents of any notes recorded with that highlight. highlight.page will return "Page: 7" or "Location: 123" depending on whether the page number is available...
I was looking to find a way of extracting the data from my Kindle Highlights into a format I could easily import into import into Ulysees in markdown. Using this as a basis, with the tweaks above I was able to write a little ruby script which creates a separate .txt file for each book with a list of quotes, and the location/page number of the highlight - and where they exist the associated notes. I hope others will find these tweaks useful as it makes the page number and notes fields available to users. |
@andykemp Thanks for this PR! The test suite is currently failing: https://travis-ci.org/speric/kindle-highlights/jobs/324621115 I believe this is because the "mocked" Kindle HTML in https://github.com/speric/kindle-highlights/blob/master/test/fetching_books_and_highlights_test.rb does not include the HTML elements you are expecting. Can you add the markdown you expect the Kindle site to return to the "mocked" Kindle response in the test referenced above? Do those elements always exist in the Kindle HTML? |
Updated test data to include highlight.page and highlight.note
Update fetching_books_and_highlights_test.rb
Fixed test files to include notes and page data
Well that was messier than it should have been, but I've now got it fixed (and picked up an error on route). Yes the HTML elements always exist - if there's not a note then the span is empty. I've only tested on the English version of the site (read.amazon.com/notebook) but assume it would work on the others if everything else does. |
Updated to include details of highlight.page and highlight.note
I've also updated the README.md file to include details about these fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you SO MUCH for this PR. It's added neat functionality, and I really like the examples
directory. You've definitely improved this project.
I left a few pieces of feedback. Some relate to Ruby idioms, others to my own idiosyncrasies.
Please ping me with any questions! 👏 🙇
@@ -101,6 +106,10 @@ def raw_quotes_page | |||
Destiny is not born of decision; it is born of uncontrollable circumstances. | |||
</span> | |||
</div> | |||
<div id="note-" class="a-row a-spacing-top-base kp-notebook-note aok-hidden kp-notebook-selectable"> | |||
<span id="note-label" class="a-size-small a-color-secondary">Note:</span> | |||
<span id="note" class="a-size-base-plus a-color-base">This is a note!</span> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is super-picky, but I like to see test data that mirrors something I'd actually use. It doesn't have to be very involved, something like:
<span id="note" class="a-size-base-plus a-color-base">This was an inspiring quote.</span>
would suffice.
@@ -34,6 +34,8 @@ def test_fetching_highlights_for_a_book | |||
assert_equal "306", highlight.location | |||
assert_equal "Destiny is not born of decision; it is born of uncontrollable circumstances.", highlight.text | |||
assert_equal "B000XUAETY", highlight.asin | |||
assert_equal "Page: 7", highlight.page | |||
assert_equal "This is a note!", highlight.note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below, re: test data.
README.md
Outdated
highlight.location | ||
#=> "197" | ||
highlight.note | ||
#=> "This is a note" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below re: test data.
README.md
Outdated
@@ -94,8 +100,12 @@ highlight.asin | |||
#=> "B005CQ2ZE6" | |||
highlight.text | |||
#=> "One of the most dangerous things you can believe in this world is that technology is neutral." | |||
highlight.page | |||
#=> "Page: 7" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below, re: just returning 7
for this instead of "Page: 7"
.
|
||
def self.from_html_elements(book:, html_elements:) | ||
new( | ||
asin: book.asin, | ||
text: html_elements.children.search("div.kp-notebook-highlight").first.text.squish, | ||
note: html_elements.children.search("span#note").first.text, | ||
page: html_elements.children.search("span#annotationHighlightHeader").first.text.partition('|').last.lstrip, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this attribute is already named page
, what do you think about just returning the page number as an integer? Instead of "Page: 7"
, just 7
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this - it's just a bit more complicated as we'd then need to read the full string and only return the page if the word is page... but happy to do so. What I was looking for was a generalised location which preferentially gave me the page number if available, and fell back to the location if it wasn't. This was also programmatically much easier! However I can see it's a bit limiting. I'll have a go at reworking it to just return the page number, and can rebuild the Page/Location from the two bits in the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rewritten this now.
examples/Kindle2Folder.rb
Outdated
kindle = KindleHighlights::Client.new( | ||
email_address: "[email protected]", | ||
password: "password" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you have two extra spaces here; the )
should line up with the first column.
examples/Kindle2Folder.rb
Outdated
require 'kindle_highlights' | ||
|
||
def sanitize_filename(filename) #Function to remove invalid symbols from filenames | ||
fn = filename.split /(?<=.)\.(?=[^.])(?!.*\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're missing some characters here. Have you tried to run this in an irb console? What you want is something like:
def sanitize_filename(filename)
filename
.split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)
.map! { |s| s.gsub /[^a-z0-9\-]+/i, '_' }
.join('.')
end
examples/Kindle2Folder.rb
Outdated
folder = File.expand_path('~/Kindle') #Folder you want to store the txt files (in markdown format) in... | ||
|
||
kindle.books.each do |book| | ||
fname = folder+"/"+sanitize_filename(book.title)+".txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Ruby convention is to prefer String interpolation to concatenation (see https://relishapp.com/womply/ruby-style-guide/docs/strings, among others). I'd also strongly encourage you to name your variables a bit more descriptively.
folder_name = "#{folder}/#{sanitize_filename(book.title)}.txt"
examples/Kindle2Folder.rb
Outdated
if highlight.note.present? | ||
bookfile.puts "**Note:** #{highlight.note} \r\n" #If there's a Note attached to the highlight add this below the highlighted text | ||
end | ||
bookfile.puts "\r\n [Open in Kindle App](kindle://book?action=open&asin=#{book.asin}&location=#{highlight.location})\r\n\r\n" #Add a link to the highlight in the Kindle App |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is cool! Didn't know you could do that. 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither did I - I love Google! :)
examples/Kindle2Folder.rb
Outdated
fname = folder+"/"+sanitize_filename(book.title)+".txt" | ||
print book.title+"\r\n" #Print name of Book to console so you can track progress | ||
bookfile = File.open(fname, "w") | ||
bookfile.puts "# #{book.title} by #{book.author} \r\n" #Header of the Book Title and Author |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, another nit-picky thing, but I'd like to see two spaces per level of indentation. I don't think this line needs to be indented at all.
Tidying up code to be more ruby-esq
Okay - I think I've reworked all those suggestions. Let me know if anything's missing... Sorry about the lack of ruby syntax - I've never done any ruby before - my syntax is a horrible mess of about 10 different languages I've dabbled with over the last 20 years! |
Extract the page number (where available) and the contents of any notes recorded with that highlight.
highlight.page will return "Page: 7" or "Location: 123" depending on whether the page number is available...
highlight.note will return the contents of any notes associated with a highlight.