Can't extract feed from UNIDO website #8

Open
ivbeg opened this issue Aug 15, 2022 · 1 comment
ivbeg commented Aug 15, 2022

URL: https://www.unido.org/news
Reason: The date is prefixed by a city name and right-aligned.
Examples:

  • GENEVA, 29 July 2022
  • VIENNA, 9 AUGUST 2022
  • Bangkok, 21-22 July 2022

Sometimes dates are missing from the text in the news list.
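
To make the date format above concrete, here is a minimal sketch of parsing the three example datelines with a regular expression. The names `DATELINE_RE` and `parse_dateline` are hypothetical and not part of this project; the sketch assumes English month names and, for a day range such as "21-22 July 2022", returns the first day.

```python
import re
from datetime import date

# English month names mapped to month numbers (assumption: datelines are in English).
MONTHS = {
    name: number
    for number, name in enumerate(
        [
            "january", "february", "march", "april", "may", "june",
            "july", "august", "september", "october", "november", "december",
        ],
        start=1,
    )
}

# City name, comma, day (optionally a day range), month name, four-digit year.
DATELINE_RE = re.compile(
    r"^\s*(?P<city>[^\W\d_][^,\d]*?),\s*"
    r"(?P<day>\d{1,2})(?:\s*[-\u2013]\s*(?P<day_end>\d{1,2}))?\s+"
    r"(?P<month>[A-Za-z]+)\s+(?P<year>\d{4})\s*$"
)


def parse_dateline(text):
    """Return (city, date) for a city-prefixed dateline, or None if it does not match."""
    match = DATELINE_RE.match(text)
    if match is None:
        return None
    month = MONTHS.get(match.group("month").lower())
    if month is None:
        return None
    try:
        # For a range such as "21-22", the first day is used (a design assumption).
        parsed = date(int(match.group("year")), month, int(match.group("day")))
    except ValueError:
        return None
    return match.group("city").strip(), parsed
```

A news item without a dateline simply yields `None`, so a caller would still need a fallback for the entries where the date is missing.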

ivbeg added the bug label Aug 15, 2022
ivbeg self-assigned this Aug 15, 2022

ivbeg commented Aug 15, 2022

Possible solutions:

  • follow each URL and extract the date from the dcterms.date metadata key
  • recognize right-aligned dates in the text
  • extract the date from the Last-Modified header of associated media
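
The first and third options could be sketched roughly as follows, using only the Python standard library and no network access. `MetaDateParser`, `extract_dcterms_date`, and `parse_last_modified` are hypothetical names, not part of this project. The sketch assumes the dcterms.date value is an ISO 8601 string that `datetime.fromisoformat` accepts, which has not been verified against the UNIDO pages.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class MetaDateParser(HTMLParser):
    """Collects the content of the first <meta name="dcterms.date"> tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "dcterms.date":
            self.value = attributes.get("content")


def extract_dcterms_date(html):
    """Return the dcterms.date value as a datetime, or None if absent or unparseable."""
    parser = MetaDateParser()
    parser.feed(html)
    if not parser.value:
        return None
    try:
        # Assumption: the value is ISO 8601 in a form fromisoformat understands.
        return datetime.fromisoformat(parser.value.strip())
    except ValueError:
        return None


def parse_last_modified(header_value):
    """Parse an HTTP Last-Modified header value, or return None if it is invalid."""
    try:
        return parsedate_to_datetime(header_value)
    except (TypeError, ValueError):
        return None
```

Fetching each article page or media file is left out on purpose; either function can be applied to whatever response body or header the existing fetching code already provides.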
