PdfHandler Debugging: difference between revisions
Current revision as of 9 Dec 2025, 19:22
PdfHandler Troubleshooting: Restart Routine
When adjusting PdfHandler prerequisites (Entware binaries, PATH updates, or MediaWiki configuration), make sure to restart both WebStation and the MediaWiki package.
Steps
- Verify prerequisites are installed and callable:
    which gs
    which pdftoppm
    which magick
- Restart **WebStation**:
Control Panel → Web Services → WebStation → Restart
- Restart **MediaWiki package**:
Package Center → Installed → MediaWiki → Restart
- Clear cached test files if needed:
    rm /volume1/web/mediawiki/images/*.pdf
    rm /volume1/web/mediawiki/images/thumb/*.pdf
- Upload a fresh PDF and confirm thumbnails are generated.
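The first step above can be scripted so that a missing tool is reported by name. This is a minimal sketch rather than part of the original routine: `check_bins` is a hypothetical helper name, and it uses the POSIX `command -v` builtin in place of `which`.

```shell
#!/bin/sh
# Report, for each tool name given, whether it resolves on the current PATH.
# Prints "ok <name> <resolved path>" or "MISSING <name>" per tool and
# returns non-zero if at least one tool could not be found.
check_bins() {
    missing=0
    for name in "$@"; do
        if found=$(command -v "$name" 2>/dev/null); then
            echo "ok $name $found"
        else
            echo "MISSING $name"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Usage (over SSH, as the user the web server runs as if possible):
#   check_bins gs pdftoppm magick
```

Running it both in an interactive SSH session and from a DSM scheduled task may show whether the two environments see the same PATH.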
Notes
- Restarting WebStation alone may not reload MediaWiki’s environment variables.
- Restarting MediaWiki ensures new PATH and handler settings are applied.
- Always document the rationale for successors: “Restart both packages after changes.”
| Symptom | Likely Cause | Diagnostic Step | Fix / Routine |
|---|---|---|---|
| PDF upload shows no preview | Missing or mis-set `$wgPdfProcessor` or `$wgPdfPostProcessor` | Check `LocalSettings.php` for correct binary paths | Point `$wgPdfProcessor` at `gs` (Ghostscript) and `$wgPdfPostProcessor` at `convert` or `magick` (ImageMagick v7), as installed |
| Error: "open_basedir restriction" | DSM PHP configuration limits | Inspect `php.ini` and DSM Web Station settings | Add required paths to `open_basedir` or disable restriction for MediaWiki |
| PdfHandler fails silently | PATH not updated for Entware binaries | Run `which gs` / `which pdftoppm` via SSH | Update PATH in `LocalSettings.php` or DSM task scheduler |
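| Wrong command syntax in `$wgPdfPostProcessor` | Dash or argument misconfiguration | Compare with PdfHandler documentation | Set `$wgPdfProcessor` and `$wgPdfPostProcessor` to the binary path only (for example `/opt/bin/gs` and `/opt/bin/convert`); PdfHandler adds options such as `-q -dNOPAUSE -dBATCH` itself |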
| Preview works but thumbnails broken | ImageMagick vs Ghostscript mismatch | Test both processors manually | Standardize on one processor and document choice |
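To test both processors manually, a pipeline like the one below can be run over SSH. It is a sketch under assumptions: `render_first_page` is a hypothetical helper, the `/opt/bin` defaults are taken from the configuration on this page, and the Ghostscript and ImageMagick options are ordinary flags of those tools chosen to approximate a thumbnail render, not a copy of PdfHandler's exact command line.

```shell
#!/bin/sh
# Binary locations; override by exporting GS or CONVERT before calling.
GS=${GS:-/opt/bin/gs}
CONVERT=${CONVERT:-/opt/bin/convert}

# Render page 1 of a PDF to JPEG with Ghostscript and scale it with ImageMagick.
# Returns 127 if either binary is missing, otherwise the pipeline's exit status.
render_first_page() {
    pdf=$1
    out=$2
    for tool in "$GS" "$CONVERT"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing or not executable: $tool" >&2
            return 127
        fi
    done
    # Ghostscript writes the JPEG to stdout (-sOutputFile=-);
    # convert reads it from stdin (-) and fits it into 800x600.
    "$GS" -q -dBATCH -dNOPAUSE -sDEVICE=jpeg -r150 \
        -dFirstPage=1 -dLastPage=1 -sOutputFile=- "$pdf" |
        "$CONVERT" - -resize 800x600 "$out"
}

# Usage:
#   render_first_page /tmp/test.pdf /tmp/test-page1.jpg
```

If this produces a valid JPEG on the command line but the wiki still shows no thumbnail, the cause is more likely PATH, `open_basedir`, or permissions than the processors themselves.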
PdfHandler Configuration
    wfLoadExtension( 'PdfHandler' );
    $wgShellbox = false; // Disable shellbox if you configure paths manually

    // Configure paths using the *absolute paths* found via 'which' commands
    // We use the full path to the binaries in /opt/bin/
    $wgPdfProcessor = '/opt/bin/gs';
    $wgImageMagickConvertCommand = '/opt/bin/convert';
    $wgPdfInfo = '/opt/bin/pdfinfo';
    $wgPdftoText = '/opt/bin/pdftotext';

    // Ensure MediaWiki knows to use ImageMagick for processing
    $wgUseImageMagick = true;

    // The 'wgPdfPostProcessor' variable is usually not required when
    // 'wgImageMagickConvertCommand' is set and $wgUseImageMagick is true.
    // $wgPdfPostProcessor = 'convert';

    # Output and quality
    $wgPdfOutputExtension = 'jpg';
    $wgPdfHandlerDpi = 150;
    $wgPdfHandlerJpegQuality = 95;
    $wgPdfThumbnailType = 'jpg';
    $wgPdfPagePreview = true;
    $wgPdfHandlerPreviewWidth = 800;
    $wgPdfHandlerPreviewHeight = 600;
    $wgMaxShellMemory = 2097152;
Explanation
- wfLoadExtension( 'PdfHandler' ); – Loads the PdfHandler extension.
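- $wgShellbox = false; – Intended to disable Shellbox, since paths are configured manually. MediaWiki core documents `$wgShellboxUrls` for Shellbox, so check that `$wgShellbox` has any effect on your version.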
- Absolute paths – Each binary (`gs`, `convert`, `pdfinfo`, `pdftotext`) must be set to the full path found via `which`.
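- $wgUseImageMagick = true; – Makes MediaWiki use ImageMagick for image scaling. PdfHandler itself renders pages with the binary in `$wgPdfProcessor` (Ghostscript) and passes the result to `convert`.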
- $wgPdfPostProcessor – Normally not required when `convert` is set and ImageMagick is enabled.
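- Output settings – Previews and thumbnails are generated as JPG with DPI 150 and JPEG quality 95 (`$wgPdfOutputExtension`, `$wgPdfHandlerDpi`, `$wgPdfHandlerJpegQuality`).
- Preview dimensions – Intended as 800px wide and 600px high for page previews. `$wgPdfThumbnailType`, `$wgPdfPagePreview`, `$wgPdfHandlerPreviewWidth`, and `$wgPdfHandlerPreviewHeight` are not among the settings listed in the PdfHandler documentation, so confirm they have an effect before relying on them.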
- $wgMaxShellMemory – Sets the memory limit for shell commands in kilobytes (here 2097152 KB = 2 GiB).
Successor Notes
- Always verify paths after Entware installation (`which gs`, `which convert`).
- Keep `Shellbox` disabled unless you migrate to a containerized setup.
- Document chosen processor (Ghostscript vs ImageMagick) for consistency.
- Adjust DPI/quality if storage or performance becomes an issue.
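Verifying paths after an Entware installation can also be done against the configuration file itself, so that the check follows whatever is actually written in `LocalSettings.php`. The sketch below makes several assumptions: `check_config_paths` is a hypothetical helper, the file location in the usage comment is inferred from the `/volume1/web/mediawiki/` paths used elsewhere on this page, and only settings written as single-quoted absolute paths on one line are recognised.

```shell
#!/bin/sh
# For every "$wgSomething = '/absolute/path';" line in a settings file, report
# whether that path exists and is executable. Commented-out lines and values
# that are not absolute paths (e.g. 'jpg') are skipped. Directories assigned
# to a setting also pass the -x test, so read the output with that in mind.
check_config_paths() {
    settings_file=$1
    if [ ! -r "$settings_file" ]; then
        echo "cannot read $settings_file" >&2
        return 2
    fi
    # Produce "name path" pairs, one per matching line.
    pairs=$(sed -n "s|^[[:space:]]*[$]\(wg[A-Za-z]*\)[[:space:]]*=[[:space:]]*'\(/[^']*\)'.*|\1 \2|p" "$settings_file")
    failed=0
    while read -r name bin_path; do
        [ -n "$name" ] || continue
        if [ -x "$bin_path" ]; then
            echo "ok $name $bin_path"
        else
            echo "MISSING $name $bin_path"
            failed=1
        fi
    done <<EOF
$pairs
EOF
    return "$failed"
}

# Usage (path is an assumption, adjust to your installation):
#   check_config_paths /volume1/web/mediawiki/LocalSettings.php
```

Reading the paths from the file avoids the situation where `which gs` succeeds in an SSH session while the configured path points somewhere else.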
Extension Matrix
| Extension | Purpose | Complexity | Successor Notes |
|---|---|---|---|
| PdfHandler | Renders PDF previews and thumbnails | Medium (requires Entware binaries, path config) | Verify absolute paths (`which gs`, `which convert`). Document chosen processor (Ghostscript vs ImageMagick). |
| Lockdown / NamespaceProtection | Restrict read/edit access by namespace | Low–Medium | Use for Research/ICT privacy. Document group permissions and release routine. |
| VisualEditor | WYSIWYG editing for pages | High (Parsoid service required) | Ensure Parsoid service runs. Document fallback to wikitext if unavailable. |
| DiscussionTools | Structured talk page replies | Low | No extra config. Document usage for collaborative discussions. |
| Collection / ElectronPdfService | Export pages to PDF | Medium | Choose one workflow. Document install steps separately from PdfHandler. |
| Cite / CiteThisPage | Reference management and citation export | Low | Document citation style conventions for club research. |
| CategoryTree | Hierarchical category browsing | Low | Useful for research archives. Document how to expand/collapse trees. |
| PageForms | Structured data entry forms | Medium | Document form templates clearly. Successors should know how to edit form definitions. |
PdfHandler with Text Extraction
    wfLoadExtension( 'PdfHandler' );
    $wgShellbox = false; // Disable shellbox if you configure paths manually

    // Absolute paths to binaries (verify with 'which' commands)
    $wgPdfProcessor = '/opt/bin/gs';
    $wgImageMagickConvertCommand = '/opt/bin/convert';
    $wgPdfInfo = '/opt/bin/pdfinfo';
    $wgPdftoText = '/opt/bin/pdftotext'; // Enables text extraction from PDFs

    // Ensure MediaWiki uses ImageMagick
    $wgUseImageMagick = true;

    # Output and quality
    $wgPdfOutputExtension = 'jpg';
    $wgPdfHandlerDpi = 150;
    $wgPdfHandlerJpegQuality = 95;
    $wgPdfThumbnailType = 'jpg';
    $wgPdfPagePreview = true;
    $wgPdfHandlerPreviewWidth = 800;
    $wgPdfHandlerPreviewHeight = 600;
    $wgMaxShellMemory = 2097152;
Explanation
- $wgPdftoText – Points to the Poppler `pdftotext` binary. This allows MediaWiki to extract text layers from uploaded PDFs.
- Benefit – Club members can search within digitized papers, copy text snippets, and use extracted text for research.
- Successor routine – Verify Poppler is installed (`which pdftotext`). If missing, install via Entware.
- Integration – Text extraction works alongside previews; no extra extension needed.
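Text extraction only helps when the PDF actually carries a text layer; a scan without OCR yields nothing to search. A quick way to check a file before uploading is sketched below. The helper names `count_words` and `has_text_layer` are hypothetical, the default binary path follows the configuration above, and `-f`/`-l` are the `pdftotext` options for first and last page.

```shell
#!/bin/sh
# Location of the Poppler binary; override by exporting PDFTOTEXT.
PDFTOTEXT=${PDFTOTEXT:-/opt/bin/pdftotext}

# Print the number of words pdftotext extracts from page 1 of the given PDF.
# Prints 0 when the binary is missing or the page has no text layer.
count_words() {
    "$PDFTOTEXT" -f 1 -l 1 "$1" - 2>/dev/null | wc -w | tr -d '[:space:]'
}

# Succeed (exit status 0) only if at least one word was extracted.
has_text_layer() {
    words=$(count_words "$1")
    [ "${words:-0}" -gt 0 ]
}

# Usage:
#   if has_text_layer scan.pdf; then echo "searchable"; else echo "needs OCR"; fi
```

Only the first page is sampled to keep the check fast; a document whose first page is an image-only cover would be misjudged, so widen the page range if that is common in the archive.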
Research Use Case
- Old digitized papers become searchable, making it easier to cross‑reference names, dates, and places.
- Successors should document how extracted text is used (search, indexing, or conversion into wiki pages).