I have been using the library for a while now. However, today I noticed that if I save a web page as a PDF using the Microsoft PDF driver (that is, printing to PDF), the code is unable to retrieve the text.
Here is one such example that I printed to PDF: https://healingthebody.ca/4-natural-proven-cancer-remedies/
and here is the code:
`using (PdfDocument document = PdfDocument.Open(fileStream))
{
    PdfDocInfo pdfDocInfo = new PdfDocInfo()
    {
        DocFilePath = fileName,
        TotalPages = document.NumberOfPages,
        Version = document.Version,
        Title = document.Information.Title,
        Subject = document.Information.Subject,
        Author = document.Information.Author,
        DateCreated = dateCreated,
        DateModified = dateModified,
    };
    string docText = "";
    string pattern = @"(?<=['""A-Za-z0-9][\.\!\?])\s+(?=[A-Z])";
    foreach (Page page in document.GetPages())
    {
        docText += ContentOrderTextExtractor.GetText(page, true);
    }
    // At this point docText is empty because GetText returns an empty string for every page
}`
Any remedy for this?