Handle links where url is hidden in html #10

Open
yuletide opened this issue Dec 27, 2022 · 1 comment

yuletide commented Dec 27, 2022

Example: https://mastodon.social/@[email protected]/109582649657814621

2022-12-27T00:11:01Z app[c2c6b821] sjc [info]===== found mention in reply to yuletide id 109391862882784405 =====
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]{'id': 109582736490955865, 'created_at': datetime.datetime(2022, 12, 27, 0, 11, tzinfo=tzlocal()), 'in_reply_to_id': None, 'in_reply_to_account_id': None, 'sensitive': False, 'spoiler_text': '', 'visibility': 'public', 'language': 'en', 'uri': 'https://mastodon.social/users/yuletide/statuses/109582736425325992', 'url': 'https://mastodon.social/@yuletide/109582736425325992', 'replies_count': 0, 'reblogs_count': 0, 'favourites_count': 0, 'edited_at': None, 'favourited': False, 'reblogged': False, 'muted': False, 'bookmarked': False, 'content': '<p><span class="h-card"><a href="https://botsin.space/@nitterbot" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>nitterbot</span></a></span> can you help with this link</p>', 'filtered': [], 'reblog': None, 'account': {'id': 109391862882784405, 'username': 'yuletide', 'acct': '[email protected]', 'display_name': 'alex yuletide', 'locked': False, 'bot': False, 'discoverable': True, 'group': False, 'created_at': datetime.datetime(2022, 6, 21, 0, 0, tzinfo=tzlocal()), 'note': '<p>Spatial solutions arch &amp; web dev, social justice, civic tech, heavy metal.  Available for work! 
\u2029Proud parent to <span class="h-card"><a href="https://botsin.space/@nitterbot" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>nitterbot</span></a></span>\u2028\u2029\u2029Past: Mapbox Solutions Architect &amp; Tech Lead @ Community Team, <span class="h-card"><a href="https://mastodon.social/@recursecenter" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>recursecenter</span></a></span> fellow, founder of civic tech startup now part of @granicus, @codeforamerica fellow, @esri\u2029\u2028\u2029<a href="https://mastodon.social/tags/vegetarian" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>vegetarian</span></a> <a href="https://mastodon.social/tags/zen" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>zen</span></a> <a href="https://mastodon.social/tags/metal" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>metal</span></a> <a href="https://mastodon.social/tags/bassmusic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bassmusic</span></a> <a href="https://mastodon.social/tags/dj" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>dj</span></a> <a href="https://mastodon.social/tags/maps" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>maps</span></a> <a href="https://mastodon.social/tags/photography" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>photography</span></a> <a href="https://mastodon.social/tags/webdev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>webdev</span></a> <a href="https://mastodon.social/tags/politics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>politics</span></a></p>', 'url': 'https://mastodon.social/@yuletide', 'avatar': 
'https://files.botsin.space/cache/accounts/avatars/109/391/862/882/784/405/original/0efc492b3538e902.png', 'avatar_static': 'https://files.botsin.space/cache/accounts/avatars/109/391/862/882/784/405/original/0efc492b3538e902.png', 'header': 'https://files.botsin.space/cache/accounts/headers/109/391/862/882/784/405/original/1f2a8c1cc92143b4.png', 'header_static': 'https://files.botsin.space/cache/accounts/headers/109/391/862/882/784/405/original/1f2a8c1cc92143b4.png', 'followers_count': 150, 'following_count': 88, 'statuses_count': 171, 'last_status_at': datetime.datetime(2022, 12, 27, 0, 0), 'emojis': [], 'fields': [{'name': 'Birdsite', 'value': '<a href="HTTPS://twitter.com/yuletide" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible"></span><span class="">HTTPS://twitter.com/yuletide</span><span class="invisible"></span></a>', 'verified_at': None}, {'name': 'LinkedSite', 'value': '<a href="https://linkedin.com/in/alexyule" rel="nofollow noopener noreferrer" target="_blank"><span class="invisible">https://</span><span class="">linkedin.com/in/alexyule</span><span class="invisible"></span></a>', 'verified_at': None}]}, 'media_attachments': [], 'mentions': [{'id': 109543657746642932, 'username': 'nitterbot', 'url': 'https://botsin.space/@nitterbot', 'acct': 'nitterbot'}], 'tags': [], 'emojis': [], 'card': None, 'poll': None}
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]filtered status @nitterbot can you help with this link
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]no birdsite found, checking parent
2022-12-27T00:11:01Z app[c2c6b821] sjc [info]checking parent

Current behavior: We use HTMLParser to strip out all HTML, but it turns out statuses can be rich-formatted, which explains why the HTML is there in the first place. Some statuses have funky formatting, so there will likely be some weird edge cases if we leave the HTML in...
Proposed behavior: Just replace every twitter.com with the nitter instance, in both the text and the HTML, and see what happens.
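
A rough sketch of what the proposed behavior could look like, not the bot's actual code. It assumes the link survives in the `href` attribute even when the visible text is split across spans, as in the `fields` value in the log above. `NITTER_HOST`, `swap_host`, `HrefCollector` and `nitter_links` are hypothetical names, and `nitter.example` is a placeholder for whatever instance the bot is configured with.

```python
import re
from html.parser import HTMLParser

# Placeholder host (assumption): the issue does not name a nitter instance.
NITTER_HOST = "nitter.example"

# Match the twitter.com host, optionally prefixed by www. or mobile.,
# case-insensitively (the log above contains "HTTPS://twitter.com/...").
# The lookarounds avoid touching hosts like nottwitter.com or twitter.com.evil.com.
TWITTER_HOST_RE = re.compile(
    r"(?<![\w.-])(?:www\.|mobile\.)?twitter\.com(?![\w-])(?!\.\w)",
    re.IGNORECASE,
)


def swap_host(text, nitter_host=NITTER_HOST):
    """Replace every twitter.com host in text (plain text or raw HTML)."""
    return TWITTER_HOST_RE.sub(nitter_host, text)


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags instead of discarding the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def nitter_links(content_html):
    """Return rewritten links for every twitter.com href in a status body."""
    collector = HrefCollector()
    collector.feed(content_html)
    collector.close()
    return [swap_host(h) for h in collector.hrefs if TWITTER_HOST_RE.search(h)]
```

Calling `swap_host` on the raw `content` string rewrites both the `href` attributes and any visible link text, which is the "replace everywhere and see what happens" approach. `nitter_links` is the more conservative variant: it only reads the `href` values, so text hidden in `invisible` spans does not matter.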

@yuletide yuletide changed the title Handle links where url is hidden Handle links where url is hidden in html Dec 27, 2022

yuletide commented Jan 7, 2023

Another one: https://botsin.space/@[email protected]/109649227056480533

This seems to be a product of some crosspost bots, or it failed for some other reason.
