I'm working on another example based on this URL: "https://www.agenas.gov.it/covid19/web/index.php?r=site%2Ftab1". With curl you don't get back the real content, so I download it via a headless browser. Trafilatura, unlike curl, does see the full content, and I would like to use it to download not the "cleaned/simplified" content that the trafilatura CLI gives me, but the full raw content. Is there some way to do that? Thank you
Hi @aborruso, so far you cannot get the raw content: Trafilatura performs a tag conversion targeting text. You can, however, use the XML output to operate on the corresponding tags.
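A minimal sketch of working with the XML output, assuming Python and the standard-library `xml.etree.ElementTree`. In practice the XML string would come from something like `trafilatura.extract(html, output_format="xml")`, where `html` is the page source saved by the headless browser. The sample below is hypothetical: it only mimics the `<doc>`/`<main>`/`<table>`/`<row>`/`<cell>` element names of Trafilatura's XML, so check the real output of your version before relying on it. `tables_from_xml` is an illustrative helper, not part of Trafilatura.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like Trafilatura's XML output (assumption);
# in practice this string would come from
# trafilatura.extract(html, output_format="xml").
SAMPLE_XML = """<doc>
  <main>
    <head>Example heading</head>
    <table>
      <row><cell>Region</cell><cell>Value</cell></row>
      <row><cell>Region A</cell><cell>12</cell></row>
      <row><cell>Region B</cell><cell>7</cell></row>
    </table>
  </main>
</doc>"""


def tables_from_xml(xml_string):
    """Hypothetical helper: return each <table> as a list of rows of cell text."""
    root = ET.fromstring(xml_string)
    tables = []
    for table in root.iter("table"):
        rows = []
        for row in table.iter("row"):
            # itertext() also collects text from nested tags (e.g. <hi>) inside a cell
            rows.append(["".join(cell.itertext()).strip() for cell in row.iter("cell")])
        tables.append(rows)
    return tables


if __name__ == "__main__":
    for table in tables_from_xml(SAMPLE_XML):
        for row in table:
            print(row)
```

The same pattern works for other tags (for example iterating over `p` or `head` elements); it operates on Trafilatura's converted tags, not on the original HTML.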