Skip to content

Commit b816ada

Browse files
authored
DEV: General refactoring and cleanup (#474)
Included: * DEV: Remove some defunct sites * DEV: Refactor GoogleDocsOnebox * DEV: Refactor PastebinOnebox * DEV: Refactor GfycatOnebox * DEV: Refactor PdfOnebox * DEV: Refactor PubmedOnebox * DEV: Refactor AmazonOnebox * DEV: Minor InstagramOnebox clean up * DEV: Small tweaks to TrelloOnebox Other stuff: * DEV: Don't use `link` and `url` interchangeably * DEV: Roll `Layout#link` into `Layout#uri` * DEV: Properly memoize `Layout#uri` * DEV: Re-use `Layout#uri` * DEV: Rename `get_secure_image` to `secure_image_url` * DEV: Normalize `matches_regexp` calls * DEV: Use path-specific encoding rules * DEV: `Layout#checksum` is no longer used * DEV: Remove double `protected` * DEV: Refactor `Matcher` * DEV: Use the existing `Engine#uri` method * DEV: The first `Engine#initialize` arg is `url` * DEV: Use `match?` * DEV: Simplify allowlistedgeneric template * DEV: Initialize `@options` in json spec * DEV: Simplify conditionals * DEV: Merge attr_readers * DEV: Whitespace, `after()` before tests
1 parent 907be5b commit b816ada

33 files changed

+369
-248
lines changed

lib/onebox/engine.rb

Lines changed: 6 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -28,32 +28,28 @@ def self.origins_to_regexes(origins)
2828
end
2929
end
3030

31-
attr_reader :url, :uri
32-
attr_reader :timeout
31+
attr_reader :url, :uri, :options, :timeout
3332
attr :errors
3433

3534
DEFAULT = {}
36-
def options
37-
@options
38-
end
3935

4036
def options=(opt)
41-
return @options if opt.nil? #make sure options provided
42-
opt = opt.to_h if opt.instance_of?(OpenStruct)
37+
return @options if opt.nil? # make sure options provided
38+
opt = opt.to_h if opt.instance_of?(OpenStruct)
4339
@options.merge!(opt)
4440
@options
4541
end
4642

47-
def initialize(link, timeout = nil)
43+
def initialize(url, timeout = nil)
4844
@errors = {}
4945
@options = DEFAULT
5046
class_name = self.class.name.split("::").last.to_s
5147

5248
# Set the engine options extracted from global options.
5349
self.options = Onebox.options[class_name] || {}
5450

55-
@url = link
56-
@uri = URI(link)
51+
@url = url
52+
@uri = URI(url)
5753
if always_https?
5854
@uri.scheme = 'https'
5955
@url = @uri.to_s

lib/onebox/engine/allowlisted_generic_onebox.rb

Lines changed: 0 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,6 @@ def self.default_allowed_domains
2727
500px.com
2828
8tracks.com
2929
abc.net.au
30-
about.com
3130
answers.com
3231
arstechnica.com
3332
ask.com
@@ -36,11 +35,9 @@ def self.default_allowed_domains
3635
bbs.boingboing.net
3736
bestbuy.ca
3837
bestbuy.com
39-
blip.tv
4038
bloomberg.com
4139
businessinsider.com
4240
change.org
43-
clikthrough.com
4441
cnet.com
4542
cnn.com
4643
codepen.io
@@ -90,24 +87,20 @@ def self.default_allowed_domains
9087
meetup.com
9188
mixcloud.com
9289
mlb.com
93-
myshopify.com
9490
myspace.com
9591
nba.com
9692
npr.org
9793
nytimes.com
9894
photobucket.com
9995
pinterest.com
10096
reference.com
101-
revision3.com
10297
rottentomatoes.com
10398
samsung.com
104-
screenr.com
10599
scribd.com
106100
slideshare.net
107101
sourceforge.net
108102
speakerdeck.com
109103
spotify.com
110-
squidoo.com
111104
streamable.com
112105
techcrunch.com
113106
ted.com
@@ -124,7 +117,6 @@ def self.default_allowed_domains
124117
twitpic.com
125118
usatoday.com
126119
viddler.com
127-
videojug.com
128120
vine.co
129121
walmart.com
130122
washingtonpost.com
@@ -275,7 +267,6 @@ def data
275267

276268
def rewrite_https(html)
277269
return unless html
278-
uri = URI(@url)
279270
if AllowlistedGenericOnebox.host_matches(uri, AllowlistedGenericOnebox.rewrites)
280271
html = html.gsub("http://", "https://")
281272
end

lib/onebox/engine/amazon_onebox.rb

Lines changed: 23 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -19,10 +19,8 @@ def url
1919
# If possible, fetch the cached HTML body immediately so we can
2020
# try to grab the canonical URL from that document,
2121
# rather than guess at the best URL structure to use
22-
if body_cacher&.respond_to?('cache_response_body?')
23-
if body_cacher.cache_response_body?(uri.to_s) && body_cacher.cached_response_body_exists?(uri.to_s)
24-
@raw ||= Onebox::Helpers.fetch_html_doc(@url, http_params, body_cacher)
25-
end
22+
if !@raw && has_cached_body
23+
@raw = Onebox::Helpers.fetch_html_doc(@url, http_params, body_cacher)
2624
end
2725

2826
if @raw
@@ -31,7 +29,8 @@ def url
3129
end
3230

3331
if match && match[:id]
34-
return "https://www.amazon.#{tld}/dp/#{Onebox::Helpers.uri_encode(match[:id])}"
32+
id = Addressable::URI.encode_component(match[:id], Addressable::URI::CharacterClasses::PATH)
33+
return "https://www.amazon.#{tld}/dp/#{id}"
3534
end
3635

3736
@url
@@ -49,6 +48,12 @@ def http_params
4948

5049
private
5150

51+
def has_cached_body
52+
body_cacher&.respond_to?('cache_response_body?') &&
53+
body_cacher.cache_response_body?(uri.to_s) &&
54+
body_cacher.cached_response_body_exists?(uri.to_s)
55+
end
56+
5257
def match
5358
@match ||= @url.match(/(?:d|g)p\/(?:product\/|video\/detail\/)?(?<id>[A-Z0-9]+)(?:\/|\?|$)/mi)
5459
end
@@ -57,19 +62,21 @@ def image
5762
if (main_image = raw.css("#main-image")) && main_image.any?
5863
attributes = main_image.first.attributes
5964

60-
return attributes["data-a-hires"].to_s if attributes["data-a-hires"]
61-
62-
if attributes["data-a-dynamic-image"]
65+
if attributes["data-a-hires"]
66+
return attributes["data-a-hires"].to_s
67+
elsif attributes["data-a-dynamic-image"]
6368
return ::JSON.parse(attributes["data-a-dynamic-image"].value).keys.first
6469
end
6570
end
6671

6772
if (landing_image = raw.css("#landingImage")) && landing_image.any?
6873
attributes = landing_image.first.attributes
6974

70-
return attributes["data-old-hires"].to_s if attributes["data-old-hires"]
71-
72-
landing_image.first["src"].to_s
75+
if attributes["data-old-hires"]
76+
return attributes["data-old-hires"].to_s
77+
else
78+
return landing_image.first["src"].to_s
79+
end
7380
end
7481

7582
if (ebook_image = raw.css("#ebooksImgBlkFront")) && ebook_image.any?
@@ -91,16 +98,16 @@ def price
9198
end
9299

93100
def multiple_authors(authors_xpath)
94-
author_list = raw.xpath(authors_xpath)
95-
authors = []
96-
author_list.each { |a| authors << a.inner_text.strip }
97-
authors.join(", ")
101+
raw
102+
.xpath(authors_xpath)
103+
.map { |a| a.inner_text.strip }
104+
.join(", ")
98105
end
99106

100107
def data
101108
og = ::Onebox::OpenGraph.new(raw)
102109

103-
if raw.at_css('#dp.book_mobile') #printed books
110+
if raw.at_css('#dp.book_mobile') # printed books
104111
title = raw.at("h1#title")&.inner_text
105112
authors = raw.at_css('#byline_secondary_view_div') ? multiple_authors("//div[@id='byline_secondary_view_div']//span[@class='a-text-bold']") : raw.at("#byline")&.inner_text
106113
rating = raw.at("#averageCustomerReviews_feature_div .a-icon")&.inner_text || raw.at("#cmrsArcLink .a-icon")&.inner_text

lib/onebox/engine/flickr_onebox.rb

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ def album_html(og)
3232
<span class='album-title'>#{album_title}</span>
3333
</span>
3434
</span>
35-
<img src='#{og.get_secure_image}' #{og.title_attr} height='#{og.image_height}' width='#{og.image_width}'>
35+
<img src='#{og.secure_image_url}' #{og.title_attr} height='#{og.image_height}' width='#{og.image_width}'>
3636
</a>
3737
</div>
3838
HTML
@@ -43,7 +43,7 @@ def image_html(og)
4343

4444
<<-HTML
4545
<a href='#{escaped_url}' target='_blank' rel='noopener' class="onebox">
46-
<img src='#{og.get_secure_image}' #{og.title_attr} alt='Imgur' height='#{og.image_height}' width='#{og.image_width}'>
46+
<img src='#{og.secure_image_url}' #{og.title_attr} alt='Imgur' height='#{og.image_height}' width='#{og.image_width}'>
4747
</a>
4848
HTML
4949
end

lib/onebox/engine/gfycat_onebox.rb

Lines changed: 26 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -9,8 +9,8 @@ class GfycatOnebox
99
matches_regexp(/^https?:\/\/gfycat\.com\//)
1010
always_https
1111

12+
# This engine should have priority over AllowlistedGenericOnebox.
1213
def self.priority
13-
# This engine should have priority over AllowlistedGenericOnebox.
1414
1
1515
end
1616

@@ -21,6 +21,7 @@ def to_html
2121
<img src="https://gfycat.com/static/favicons/favicon-96x96.png" class="site-icon" width="64" height="64">
2222
<a href="#{data[:url]}" target="_blank" rel="nofollow ugc noopener">Gfycat.com</a>
2323
</header>
24+
2425
<article class="onebox-body">
2526
<h4>
2627
#{data[:title]} by
@@ -36,11 +37,12 @@ def to_html
3637
<img title="Sorry, your browser doesn't support HTML5 video." src="#{data[:posterUrl]}">
3738
</video>
3839
</div>
40+
3941
<p>
4042
<span class="label1">#{data[:keywords]}</span>
4143
</p>
42-
4344
</article>
45+
4446
<div style="clear: both"></div>
4547
</aside>
4648
HTML
@@ -61,52 +63,50 @@ def match
6163
@match ||= @url.match(/^https?:\/\/gfycat\.com\/(gifs\/detail\/)?(?<name>.+)/)
6264
end
6365

64-
def nokogiri_page
65-
@nokogiri_page ||= begin
66-
response = Onebox::Helpers.fetch_response(url, redirect_limit: 10) rescue nil
67-
Nokogiri::HTML(response)
68-
end
69-
end
66+
def og_data
67+
return @og_data if defined?(@og_data)
7068

71-
def get_og_data
72-
og_data = {}
69+
response = Onebox::Helpers.fetch_response(url, redirect_limit: 10) rescue nil
70+
page = Nokogiri::HTML(response)
71+
script = page.at_css('script[type="application/ld+json"]')
7372

74-
if json_string = nokogiri_page.at_css('script[type="application/ld+json"]')&.text
75-
og_data = Onebox::Helpers.symbolize_keys(::MultiJson.load(json_string))
73+
if json_string = script&.text
74+
@og_data = Onebox::Helpers.symbolize_keys(::MultiJson.load(json_string))
75+
else
76+
@og_data = {}
7677
end
77-
78-
og_data
7978
end
8079

8180
def data
82-
og_data = get_og_data
81+
return @data if defined?(@data)
8382

84-
response = {
83+
@data = {
8584
name: match[:name],
8685
title: og_data[:headline] || 'No Title',
8786
author: og_data[:author],
88-
url: @url
87+
url: @url,
8988
}
9089

91-
keywords = og_data[:keywords]&.split(',')
92-
if keywords
93-
response[:keywords] = keywords.map { |t| "<a href='https://gfycat.com/gifs/search/#{t}'>##{t}</a>" }.join(' ')
90+
if keywords = og_data[:keywords]&.split(',')
91+
@data[:keywords] = keywords
92+
.map { |keyword| "<a href='https://gfycat.com/gifs/search/#{keyword}'>##{keyword}</a>" }
93+
.join(' ')
9494
end
9595

9696
if og_data[:video]
9797
content_url = ::Onebox::Helpers.normalize_url_for_output(og_data[:video][:contentUrl])
9898
video_url = Pathname.new(content_url)
99-
response[:webmUrl] = video_url.sub_ext(".webm").to_s
100-
response[:mp4Url] = video_url.sub_ext(".mp4").to_s
99+
@data[:webmUrl] = video_url.sub_ext(".webm").to_s
100+
@data[:mp4Url] = video_url.sub_ext(".mp4").to_s
101101

102102
thumbnail_url = ::Onebox::Helpers.normalize_url_for_output(og_data[:video][:thumbnailUrl])
103-
response[:posterUrl] = thumbnail_url
103+
@data[:posterUrl] = thumbnail_url
104104

105-
response[:width] = og_data[:video][:width]
106-
response[:height] = og_data[:video][:height]
105+
@data[:width] = og_data[:video][:width]
106+
@data[:height] = og_data[:video][:height]
107107
end
108108

109-
response
109+
@data
110110
end
111111
end
112112
end

lib/onebox/engine/github_commit_onebox.rb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ class GithubCommitOnebox
1010
include JSON
1111
include Onebox::Mixins::GithubBody
1212

13-
matches_regexp Regexp.new("^https?://(?:www\.)?(?:(?:\w)+\.)?(github)\.com(?:/)?(?:.)*/commit/")
13+
matches_regexp(/^https?:\/\/(?:www\.)?(?:(?:\w)+\.)?(github)\.com(?:\/)?(?:.)*\/commit\//)
1414
always_https
1515

1616
def url

lib/onebox/engine/github_folder_onebox.rb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ class GithubFolderOnebox
77
include StandardEmbed
88
include LayoutSupport
99

10-
matches_regexp Regexp.new(/^https?:\/\/(?:www\.)?(?:(?:\w)+\.)?(github)\.com[\:\d]*(\/[^\/]+){2}/)
10+
matches_regexp(/^https?:\/\/(?:www\.)?(?:(?:\w)+\.)?(github)\.com[\:\d]*(\/[^\/]+){2}/)
1111
always_https
1212

1313
def self.priority

0 commit comments

Comments
 (0)