Turan Furkan Topak edited this page Nov 23, 2021
This sample parses the whole PDF file and extracts the text of all its pages.
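```php
<?php
// Include Composer autoloader if not already done.
require_once 'vendor/autoload.php';

// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');

// Extract the text of the whole document.
$text = $pdf->getText();
echo $text;
```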
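In practice the input file may be missing or may not be a valid PDF. The sketch below wraps the calls shown above in a small helper that returns `null` instead of stopping the script. The function name `extractTextOrNull` is hypothetical (it is not part of PdfParser), and the sketch assumes that Composer's autoloader is already loaded and that parse failures are reported as exceptions.

```php
<?php
/**
 * Hypothetical helper (not part of PdfParser): returns the text of the whole
 * document, or null if the file is missing, unreadable, or cannot be parsed.
 * Assumes parse failures are thrown as exceptions.
 */
function extractTextOrNull(string $path): ?string
{
    // Check the path first so a missing file never reaches the parser.
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }

    try {
        $parser = new \Smalot\PdfParser\Parser();
        return $parser->parseFile($path)->getText();
    } catch (\Exception $e) {
        // Invalid or unsupported PDF data.
        return null;
    }
}
```

A caller can then choose its own fallback, for example `echo extractTextOrNull('document.pdf') ?? 'No text available';`.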
You can also extract the text page by page, or from one specific page.
```php
<?php
// Include Composer autoloader if not already done.
require_once 'vendor/autoload.php';

// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');

// Retrieve all pages from the PDF file.
$pages = $pdf->getPages();

// Loop over each page to extract its text.
foreach ($pages as $page) {
    echo $page->getText();
}
```
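To read a single page instead of looping over all of them, you can index into the array returned by `getPages()`. The sketch below is one way to do it with 1-based page numbers; `getPageText` is a hypothetical helper name, and it only relies on each page object having the `getText()` method used above.

```php
<?php
/**
 * Hypothetical helper (not part of PdfParser): returns the text of one page
 * using 1-based page numbers, or null if the document has no such page.
 * $pages is the array returned by $pdf->getPages().
 */
function getPageText(array $pages, int $pageNumber): ?string
{
    // Re-index defensively so that page 1 is always at offset 0.
    $pages = array_values($pages);
    $index = $pageNumber - 1;

    if ($index < 0 || !isset($pages[$index])) {
        return null;
    }

    return $pages[$index]->getText();
}
```

With a parsed document, `getPageText($pdf->getPages(), 2)` returns the text of the second page, or `null` if the PDF has fewer pages.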
Here is sample code that extracts the document's metadata (Author, Creator, CreationDate, ...).
```php
<?php
// Include Composer autoloader if not already done.
require_once 'vendor/autoload.php';

// Parse the PDF file and build the necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');

// Retrieve all details from the PDF file.
$details = $pdf->getDetails();

// Loop over each property to extract its value (string or array).
foreach ($details as $property => $value) {
    if (is_array($value)) {
        $value = implode(', ', $value);
    }
    echo $property . ' => ' . $value . "\n";
}
```
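Which properties appear in `getDetails()` depends on what the PDF itself defines, so a given key such as `Author` may be absent. The sketch below reads one property safely and flattens array values the same way as the loop above; `getDetail` is a hypothetical helper name, not part of the library.

```php
<?php
/**
 * Hypothetical helper (not part of PdfParser): reads one property from the
 * array returned by $pdf->getDetails(). Returns null when the property is
 * missing; array values are joined with ", ".
 */
function getDetail(array $details, string $property): ?string
{
    if (!array_key_exists($property, $details)) {
        return null;
    }

    $value = $details[$property];

    return is_array($value) ? implode(', ', $value) : (string) $value;
}
```

For example, `getDetail($pdf->getDetails(), 'Author') ?? 'unknown author'` falls back to a default when the PDF has no Author entry.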
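Note: the demo also passes the extracted text through PHP's nl2br() function, which inserts an HTML line break (`<br />`) before every newline. This way, when the text is displayed in a browser, it keeps a line layout close to that of the PDF file.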
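If you display extracted text in an HTML page yourself, it is safer to escape it before calling nl2br(), so that characters such as `<` or `&` in the PDF cannot be interpreted as markup. A minimal sketch, where `textToHtml` is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: prepares extracted PDF text for display in HTML.
 * htmlspecialchars() escapes <, >, & and quotes (ENT_SUBSTITUTE replaces
 * invalid UTF-8 sequences instead of returning an empty string); nl2br()
 * then inserts <br /> before each line break.
 */
function textToHtml(string $text): string
{
    return nl2br(htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));
}
```

The order matters: escaping after nl2br() would turn the inserted `<br />` tags into visible text.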