Hello! I found a peculiarity when extracting text from Telegram posts.
The example post is linked here.
The text of the post is:
Interestingly, most people voted for Ukrainian to be the subject of my new post. So here you go — below is everything you wanted to know about the relationship of
...
In the HTML file, the link simply sits inside the div:
<div>
Interestingly, most people <a href="https://t.me/durov/271" target="_blank" rel="noopener" onclick="return confirm('Open this link?\n\n'+this.href);">voted</a> for Ukrainian to be the subject of my new post. So here you go — below is everything you wanted to know about the relationship of
...
</div>
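For comparison, here is what the expected output looks like when `<a>` is treated as an inline element. This is a minimal, hypothetical sketch using only the standard library's `html.parser`; the `InlineMarkdown` class is an illustration, not part of any extraction library. It handles only text and `<a href>` links, ignores every other tag, and collapses whitespace.

```python
from html.parser import HTMLParser


class InlineMarkdown(HTMLParser):
    """Hypothetical HTML-to-Markdown converter that keeps <a> inline.

    Only text and <a href> links are handled; all other tags are ignored.
    """

    def __init__(self):
        super().__init__()
        self.parts = []       # collected output fragments
        self.href = None      # href of the link currently open, if any
        self.link_text = []   # text seen inside the current link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            # Emit the link in place, with no line breaks around it.
            self.parts.append(f"[{''.join(self.link_text)}]({self.href})")
            self.href = None

    def handle_data(self, data):
        if self.href is not None:
            self.link_text.append(data)
        else:
            self.parts.append(data)

    def text(self):
        # Collapse all whitespace (including source newlines) to single spaces.
        return " ".join("".join(self.parts).split())


html = (
    '<div>\nInterestingly, most people <a href="https://t.me/durov/271" '
    'target="_blank">voted</a> for Ukrainian to be the subject of my new post.\n</div>'
)
parser = InlineMarkdown()
parser.feed(html)
parser.close()
print(parser.text())
# prints: Interestingly, most people [voted](https://t.me/durov/271) for Ukrainian to be the subject of my new post.
```

The link stays in the middle of the sentence, which is what one would expect from the HTML above.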
But after extraction, the Markdown output contains two \n characters before each link:
Interestingly, most people
voted for Ukrainian to be the subject of my new post. So here you go — below is everything you wanted to know about the relationship of
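Until the cause is found, a possible stopgap is to post-process the output. The sketch below is a hypothetical workaround, not part of any extractor's API: it replaces a blank line directly before a Markdown link `[text](url)` with a single space, but only when the preceding text does not end in `.`, `!`, `?` or `:`, so paragraphs that genuinely start with a link are left alone. It assumes the links appear in standard `[text](url)` form in the output.

```python
import re

# Hypothetical workaround: "\n\n" (plus any trailing spaces before it) that
# sits right before a Markdown link is turned into one space, unless the
# previous character is sentence-ending punctuation or whitespace.
_BREAK_BEFORE_LINK = re.compile(
    r"(?<=[^.!?:\s])[ \t]*\n\n(?=\[[^\]\n]+\]\([^)\s]+\))"
)


def join_links(markdown: str) -> str:
    """Re-attach inline links that were split onto their own paragraph."""
    return _BREAK_BEFORE_LINK.sub(" ", markdown)


sample = (
    "Interestingly, most people\n\n[voted](https://t.me/durov/271) for Ukrainian "
    "to be the subject of my new post."
)
print(join_links(sample))
```

The punctuation check is a heuristic: a real paragraph break before a link that follows text without final punctuation (for example, a heading) would also be joined.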
The extraction code:
I tried to use the `favor_precision` option, but it removes the formatting and links.