The extension does not seem to be able to extract social media ids within <a> links or entity candidates from <title> #84

Open
teolemon opened this issue Jun 18, 2022 · 9 comments

@teolemon

What

  • The extension does not seem to be able to extract social media IDs from links on this page: https://www.saint-ouen.fr/
  • Despite a good <title> tag, it is not able to propose candidate entities

Screenshot

image

HTML samples

<title>Accueil - Mairie de Saint-Ouen-sur-Seine</title>

  <li>
    <a href="https://www.facebook.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/facebook.png" width="20" height="20" alt="">
      <span class="out">Facebook</span>
    </a>
  </li>
  <li>
    <a href="https://twitter.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/twitter.png" width="20" height="20" alt="">
      <span class="out">Twitter</span>
    </a>
  </li>
  <li>
    <a href="https://www.instagram.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/instagram-icon.png" width="20" height="20" alt="">
      <span class="out">Instagram</span>
    </a>
  </li>
  <li>
    <a href="https://www.youtube.com/mairiesaintouen93" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/youtube-icon.png" width="20" height="20" alt="">
      <span class="out">Youtube</span>
    </a>
  </li>
</ul>
@fuddl
Owner

fuddl commented Jun 18, 2022

Hi,

You expect https://www.saint-ouen.fr to be matched to Q208889 because it contains a link to https://twitter.com/villesaintouen, which is connected to Q208889? Is that it?

@teolemon
Author

Not even that (although it could be another interesting issue).
I just expected it to propose Twitter, Instagram and YouTube as suggested properties
Is that because it didn't detect a candidate match? Is some regex run on the social URLs within the HTML?

Back to your point, it could indeed be an idea. More pressing is the fact that
image
is already on the item, so it could be leveraged if the query is not too expensive.
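
To make the regex question concrete, here is a minimal Python sketch of how profile IDs could be pulled out of <a> links like the ones in the HTML samples. It is not the extension's actual code: the PLATFORMS table, the ID_PATTERN regex and the function names are all assumptions made for illustration. P2002, P2003 and P2013 are the Wikidata username properties for Twitter, Instagram and Facebook; the YouTube link in the sample is a custom URL rather than a channel ID, so no property is assumed for it.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical table: host -> (platform label, Wikidata property or None).
PLATFORMS = {
    "twitter.com": ("Twitter", "P2002"),
    "instagram.com": ("Instagram", "P2003"),
    "facebook.com": ("Facebook", "P2013"),
    "youtube.com": ("YouTube", None),  # custom URL, not a channel ID
}

# Assumed shape of a profile path: exactly one segment of username characters.
ID_PATTERN = re.compile(r"^/([A-Za-z0-9_.\-]+)/?$")


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_social_ids(html):
    """Return (platform, property, id) tuples for recognised profile links."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        parts = urlsplit(href)
        host = parts.hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host not in PLATFORMS:
            continue
        match = ID_PATTERN.match(parts.path)
        if match:
            label, prop = PLATFORMS[host]
            found.append((label, prop, match.group(1)))
    return found


sample = '<a href="https://twitter.com/villesaintouen" class="link-rs">Twitter</a>'
print(extract_social_ids(sample))
```

Matching on the parsed host and path rather than on the raw HTML keeps the regex small and avoids false hits on share buttons or deeper paths, at the cost of missing profile URLs that use a different layout.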

@fuddl
Owner

fuddl commented Jun 19, 2022

I just expected it to propose Twitter, Instagram and YouTube as suggested properties

I actually planned this, but resolving any link on a website turned out to be slow

Back to your point, it could indeed be an idea. More pressing is the fact that
image
is already on the item, so it could be leveraged if the query is not too expensive.

That is indeed very annoying. Thanks for writing it down. I'll see what I can do.
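
For reference, the reverse lookup discussed here (from a social media ID back to an item such as Q208889) could be expressed as a SPARQL query against the Wikidata Query Service. The Python sketch below only builds the query text; build_lookup_query is a hypothetical helper, not something taken from the extension, and how the extension actually resolves links is not shown in this thread.

```python
import re


def build_lookup_query(prop, value, limit=5):
    """Build a SPARQL query that finds items whose `prop` equals `value`."""
    if not re.fullmatch(r"P\d+", prop):
        raise ValueError(f"not a property id: {prop!r}")
    # Escape backslashes, quotes and newlines so the value stays inside
    # the SPARQL string literal.
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return (
        f'SELECT ?item WHERE {{ ?item wdt:{prop} "{escaped}" . }} '
        f"LIMIT {int(limit)}"
    )


# P2002 is the Wikidata property for the Twitter username.
print(build_lookup_query("P2002", "villesaintouen"))
```

A page with many outgoing links would need one such lookup per candidate ID unless the values are batched into a single query, which gives a rough idea of where the cost comes from.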

@fuddl
Owner

fuddl commented Oct 6, 2022

You can do this now

  1. click Add a new statement
  2. wait for the links to resolve
  3. select the social media link that is missing
  4. click Send to wikidata

Result: The statement will be added with a very accurate source statement. 🎉

@thibaultmol

And how is it supposed to work at the moment? One of the problems I often ran into was this: I go to the 'contact' page, and that specific page isn't linked on the Wikidata item. But I don't want to link the contact page specifically to it; I just want to use it to extract statements.

@fuddl
Owner

fuddl commented Oct 7, 2022

And how is it supposed to work at the moment? One of the problems I often ran into was this: I go to the 'contact' page, and that specific page isn't linked on the Wikidata item. But I don't want to link the contact page specifically to it; I just want to use it to extract statements.

I'm afraid I cannot offer a perfect solution, since I cannot confidently deduce that every page under the same domain represents the same item, but here is a workaround for that specific problem:

Let's say this is the front page:
Screenshot 2022-10-07 at 09 00 20
And this is your contact page:
Screenshot 2022-10-07 at 09 00 51
You can append #wd:[Wikidata ID] to the URL. In this case the URL becomes https://www.saint-ouen.fr/404.html#wd:Q208889:
Screenshot 2022-10-07 at 09 01 36

This suffix causes the extension to always resolve to the specified item: now you can go ahead as described above.
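
As a rough illustration of the suffix convention, the Python sketch below reads an item ID out of a #wd:Q… fragment. The OVERRIDE pattern and the item_override function are assumptions made for this example; the exact matching rules the extension applies are not shown in this thread.

```python
import re
from urllib.parse import urlsplit

# Assumed fragment shape: "wd:" followed by a Wikidata item ID such as Q208889.
OVERRIDE = re.compile(r"^wd:(Q[1-9]\d*)$")


def item_override(url):
    """Return the item ID carried in a '#wd:Q...' suffix, or None."""
    match = OVERRIDE.match(urlsplit(url).fragment)
    return match.group(1) if match else None


print(item_override("https://www.saint-ouen.fr/404.html#wd:Q208889"))
```

Because the ID lives in the fragment, it is never sent to the web server, so the page itself loads exactly as it would without the suffix.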

@thibaultmol

thibaultmol commented Oct 7, 2022

I see. (Would it be possible to have this as a button in the sidebar instead?)
Just a checkbox you can tick, like "Look at main domain".

@thibaultmol

(Also: Facebook IDs don't seem to get extracted at the moment.)

@fuddl
Owner

fuddl commented Oct 7, 2022

(Also: Facebook IDs don't seem to get extracted at the moment.)

Please show me an example
