The extension does not seem to be able to extract social media ids within <a> links or entity candidates from <title> #84

teolemon · 2022-06-18T09:42:27Z

What

The extension does not seem to be able to extract social media IDs from the links on this page: https://www.saint-ouen.fr/
Despite a good <title> tag, it is also unable to propose candidate entities.

Screenshot

HTML samples

<title>Accueil - Mairie de Saint-Ouen-sur-Seine</title>

<ul>
  <li>
    <a href="https://www.facebook.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/facebook.png" width="20" height="20" alt="">
      <span class="out">Facebook</span>
    </a>
  </li>
  <li>
    <a href="https://twitter.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/twitter.png" width="20" height="20" alt="">
      <span class="out">Twitter</span>
    </a>
  </li>
  <li>
    <a href="https://www.instagram.com/villesaintouen" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/instagram-icon.png" width="20" height="20" alt="">
      <span class="out">Instagram</span>
    </a>
  </li>
  <li>
    <a href="https://www.youtube.com/mairiesaintouen93" class="link-rs" target="_blank" rel="nofollow">
      <img src="/fileadmin/user_upload/fichiers/ic%C3%B4nes/youtube-icon.png" width="20" height="20" alt="">
      <span class="out">Youtube</span>
    </a>
  </li>
</ul>
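
For illustration, links like the ones in the sample above could be matched with one regular expression per platform. This is only a sketch under stated assumptions: the `SocialId` type, the `SOCIAL_PATTERNS` table and the `extractSocialIds` function are hypothetical names and are not taken from the extension's code. A real extension would more likely iterate over `a[href]` elements in the DOM; the string scan here just keeps the example runnable outside a browser.

```typescript
// Hypothetical sketch; not taken from the extension's source.
interface SocialId {
  platform: string;
  id: string;
}

// One assumed pattern per platform, capturing the first path segment of a profile URL.
const SOCIAL_PATTERNS: { platform: string; pattern: RegExp }[] = [
  { platform: "facebook", pattern: /^https?:\/\/(?:www\.)?facebook\.com\/([A-Za-z0-9.]+)\/?$/ },
  { platform: "twitter", pattern: /^https?:\/\/(?:www\.)?twitter\.com\/([A-Za-z0-9_]+)\/?$/ },
  { platform: "instagram", pattern: /^https?:\/\/(?:www\.)?instagram\.com\/([A-Za-z0-9_.]+)\/?$/ },
  // A path such as /mairiesaintouen93 is a custom name rather than a channel ID,
  // so it would presumably still need to be resolved before use.
  { platform: "youtube", pattern: /^https?:\/\/(?:www\.)?youtube\.com\/([A-Za-z0-9_-]+)\/?$/ },
];

function extractSocialIds(html: string): SocialId[] {
  const found: SocialId[] = [];
  // Naive scan for href attributes on <a> tags (double-quoted values only).
  const hrefPattern = /<a\s[^>]*?href="([^"]+)"/gi;
  let anchor: RegExpExecArray | null;
  while ((anchor = hrefPattern.exec(html)) !== null) {
    const href = anchor[1];
    if (!href) continue;
    for (const { platform, pattern } of SOCIAL_PATTERNS) {
      const m = pattern.exec(href);
      if (m && m[1]) {
        found.push({ platform, id: m[1] });
      }
    }
  }
  return found;
}
```

Run against the four links in the sample, this sketch would return `villesaintouen` for Facebook, Twitter and Instagram, and `mairiesaintouen93` for YouTube. The patterns are deliberately naive and would also accept non-profile paths, so they should be read as a starting point only.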


fuddl · 2022-06-18T13:42:03Z

Hi,

You expect https://www.saint-ouen.fr to be matched to Q208889 because it contains a link to https://twitter.com/villesaintouen, which is connected to Q208889? Is that it?

teolemon · 2022-06-19T08:52:03Z

Not even that (although that could be another interesting issue).
I just expected it to propose Twitter, Instagram and YouTube as suggested properties.
Is that because it didn't detect a candidate match? Is some regex matching done on the social URLs within the HTML?

Back to your point, it could indeed be an idea. More pressing would be the fact that

is already on the item, so it could be leveraged if the query is not too expensive.

fuddl · 2022-06-19T12:34:13Z

> I just expected it to propose Twitter, Instagram and YouTube as suggested properties

I actually planned this, but resolving every link on a website turned out to be slow.

> Back to your point, it could indeed be an idea. More pressing would be the fact that
>
> is already on the item, so it could be leveraged if the query is not too expensive

That is indeed very annoying. Thanks for writing it down. I'll see what I can do.

fuddl · 2022-10-06T17:56:17Z

You can do this now:

1. Click Add a new statement.
2. Wait for the links to resolve.
3. Select the social media link that is missing.
4. Click Send to wikidata.

Result: The statement will be added with a very accurate source statement. 🎉

thibaultmol · 2022-10-07T06:57:14Z

And how is it supposed to work at the moment? One of the problems I often ran into was: I go to the 'contact' page, and that specific page isn't linked on the Wikidata item. But I don't want to link the contact page specifically to it, I just want to use it to extract statements.

fuddl · 2022-10-07T07:05:23Z

> And how is it supposed to work at the moment? One of the problems I often ran into was: I go to the 'contact' page, and that specific page isn't linked on the Wikidata item. But I don't want to link the contact page specifically to it, I just want to use it to extract statements.

I'm afraid I cannot offer a perfect solution, since I cannot confidently deduce that every page under the same domain represents the same item, but here is a workaround for that specific problem:

Let's say this is the front page:

And this is your contact page:

You can append #wd:[wikidata id] to the URL, in this case giving the URL https://www.saint-ouen.fr/404.html#wd:Q208889:

This suffix causes the extension to always resolve to the specified item: now you can go ahead as described above.
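
As a rough illustration of how such a suffix could be read from a URL, here is a minimal sketch. It assumes the fragment has exactly the form `#wd:Q…` shown above. The function name `itemIdFromUrl` is hypothetical, and the extension's actual parsing may differ.

```typescript
// Hypothetical helper; not taken from the extension's source.
// Returns the Wikidata item ID from a "#wd:Q..." fragment, or null if absent.
function itemIdFromUrl(url: string): string | null {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return null; // no fragment at all
  }
  const fragment = url.slice(hashIndex + 1);
  // Assumed format: "wd:" followed by an item ID such as Q208889.
  const m = /^wd:(Q[1-9][0-9]*)$/.exec(fragment);
  return m && m[1] ? m[1] : null;
}
```

With the URL from the example, `itemIdFromUrl("https://www.saint-ouen.fr/404.html#wd:Q208889")` gives `"Q208889"`, while a URL without the suffix gives `null`.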

thibaultmol · 2022-10-07T07:06:43Z

I see. Would it be possible to have this as a button in the sidebar instead?
Just a checkbox you can tick, like "Look at main domain".

thibaultmol · 2022-10-07T08:24:16Z

(Also: Facebook IDs don't seem to get extracted at the moment.)

fuddl · 2022-10-07T08:48:38Z

> (Also: Facebook IDs don't seem to get extracted at the moment.)

Please show me an example.

fuddl mentioned this issue Jun 19, 2022

Website resolver usually fails #86

Closed
