PdfHandler Debugging
PdfHandler Troubleshooting: Restart Routine
When adjusting PdfHandler prerequisites (Entware binaries, PATH updates, or MediaWiki configuration), make sure to restart both WebStation and the MediaWiki package.
Steps
- Verify prerequisites are installed and callable:
    which gs
    which pdftoppm
    which magick
- Restart **WebStation**:
Control Panel → Web Services → WebStation → Restart
- Restart **MediaWiki package**:
Package Center → Installed → MediaWiki → Restart
- Clear cached test files if needed:
    rm /volume1/web/mediawiki/images/*.pdf
    rm /volume1/web/mediawiki/images/thumb/*.pdf
- Upload a fresh PDF and confirm thumbnails are generated.
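The prerequisite check in the first step can be wrapped in a small helper so that it reports only what is missing. This is a minimal sketch; the function name `check_binaries` is hypothetical, and the binary names are the ones listed in the step above.

```shell
#!/bin/sh
# Report which of the given commands cannot be found on the current PATH.
# Prints one missing name per line; returns non-zero if any are missing.
check_binaries() {
    missing=0
    for bin in "$@"; do
        if ! command -v "$bin" >/dev/null 2>&1; then
            echo "$bin"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_binaries gs pdftoppm magick || echo "install the names listed above via Entware"
```

Run it in the same SSH session and with the same user whose PATH you expect the web server to inherit; a binary found in an interactive shell is not necessarily visible to PHP.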
Notes
- Restarting WebStation alone may not reload MediaWiki’s environment variables.
- Restarting MediaWiki ensures new PATH and handler settings are applied.
- Always document the rationale for successors: “Restart both packages after changes.”
| Symptom | Likely Cause | Diagnostic Step | Fix / Routine |
|---|---|---|---|
| PDF upload shows no preview | Missing or mis-set `$wgPdfProcessor` or `$wgPdfPostProcessor` | Check `LocalSettings.php` for correct binary paths | Point `$wgPdfProcessor` at Ghostscript (`gs`) and `$wgPdfPostProcessor` at ImageMagick (`convert`, or `magick` for ImageMagick v7), as installed |
| Error: "open_basedir restriction" | DSM PHP configuration limits | Inspect `php.ini` and DSM Web Station settings | Add required paths to `open_basedir` or disable restriction for MediaWiki |
| PdfHandler fails silently | PATH not updated for Entware binaries | Run `which gs` / `which pdftoppm` via SSH | Update PATH in `LocalSettings.php` or DSM task scheduler |
| Wrong command syntax in `$wgPdfPostProcessor` | Dash or argument misconfiguration | Compare with PdfHandler documentation | Correct to `gs -q -dNOPAUSE -dBATCH ...` or `magick convert ...` |
| Preview works but thumbnails broken | ImageMagick vs Ghostscript mismatch | Test both processors manually | Standardize on one processor and document choice |
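For the "test both processors manually" step, the following sketch renders the first page of a PDF with one tool at a time, so the results can be compared. It is an illustration, not the command line PdfHandler itself builds: the function name `render_first_page` and the `DRY_RUN` switch are hypothetical, and the 150 DPI value simply mirrors the DPI setting used on this page. By default the command is only printed.

```shell
#!/bin/sh
# Print (DRY_RUN=1, the default) or run a first-page render with the chosen tool.
# Usage: render_first_page gs|pdftoppm|magick input.pdf output-basename
render_first_page() {
    tool=$1 pdf=$2 out=$3
    case "$tool" in
        gs)       set -- gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeg -r150 \
                      -dFirstPage=1 -dLastPage=1 "-sOutputFile=$out.jpg" "$pdf" ;;
        pdftoppm) set -- pdftoppm -jpeg -r 150 -f 1 -singlefile "$pdf" "$out" ;;
        magick)   set -- magick -density 150 "${pdf}[0]" "$out.jpg" ;;
        *)        echo "unknown tool: $tool" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"      # show the command without running it
    else
        "$@"           # run the command exactly as assembled
    fi
}

# Example:
#   render_first_page gs test.pdf /tmp/test-gs
#   DRY_RUN=0 render_first_page magick test.pdf /tmp/test-magick
```

If one tool produces a usable JPG and another fails, that points to the processor to standardize on.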
PdfHandler Configuration
    wfLoadExtension( 'PdfHandler' );
    $wgShellbox = false; // Disable shellbox if you configure paths manually

    // Configure paths using the *absolute paths* found via 'which' commands
    // We use the full path to the binaries in /opt/bin/
    $wgPdfProcessor = '/opt/bin/gs';
    $wgImageMagickConvertCommand = '/opt/bin/convert';
    $wgPdfInfo = '/opt/bin/pdfinfo';
    $wgPdftoText = '/opt/bin/pdftotext';

    // Ensure MediaWiki knows to use ImageMagick for processing
    $wgUseImageMagick = true;

    // The 'wgPdfPostProcessor' variable is usually not required when
    // 'wgImageMagickConvertCommand' is set and $wgUseImageMagick is true.
    // $wgPdfPostProcessor = 'convert';

    # Output and quality
    $wgPdfOutputExtension = 'jpg';
    $wgPdfHandlerDpi = 150;
    $wgPdfHandlerJpegQuality = 95;
    $wgPdfThumbnailType = 'jpg';
    $wgPdfPagePreview = true;
    $wgPdfHandlerPreviewWidth = 800;
    $wgPdfHandlerPreviewHeight = 600;
    $wgMaxShellMemory = 2097152;
Explanation
- wfLoadExtension( 'PdfHandler' ); – Loads the PdfHandler extension.
- $wgShellbox = false; – Disables Shellbox, since paths are configured manually.
- Absolute paths – Each binary (`gs`, `convert`, `pdfinfo`, `pdftotext`) must be set to the full path found via `which`.
- $wgUseImageMagick = true; – Tells MediaWiki to scale images with ImageMagick (via `$wgImageMagickConvertCommand`) rather than PHP's built-in image library.
- $wgPdfPostProcessor – Normally not required when `$wgImageMagickConvertCommand` is set and ImageMagick is enabled.
- Output settings – Previews and thumbnails are generated as JPG with DPI 150 and JPEG quality 95.
- Preview dimensions – Width 800px, height 600px for page previews.
- $wgMaxShellMemory – Sets the memory limit for shell commands, in kilobytes (here 2097152 KB = 2 GiB).
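The conversion behind that memory limit can be checked quickly in the shell. This sketch assumes the value is in kilobytes, as the MediaWiki manual documents for `$wgMaxShellMemory`; the helper names `kb_to_mib` and `kb_to_gib` are hypothetical.

```shell
#!/bin/sh
# Convert a $wgMaxShellMemory value (kilobytes) to MiB and GiB.
kb_to_mib() { echo $(( $1 / 1024 )); }
kb_to_gib() { echo $(( $1 / 1024 / 1024 )); }

kb_to_mib 2097152   # prints 2048
kb_to_gib 2097152   # prints 2
```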
Successor Notes
- Always verify paths after Entware installation (`which gs`, `which convert`).
- Keep `Shellbox` disabled unless you migrate to a containerized setup.
- Document chosen processor (Ghostscript vs ImageMagick) for consistency.
- Adjust DPI/quality if storage or performance becomes an issue.
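Path verification can also be done against the settings file itself rather than from memory. The sketch below pulls every single-quoted `/opt/bin/...` path out of a settings file and reports the ones that are not executable. The function name `check_configured_paths` is hypothetical, and the `LocalSettings.php` location in the example is an assumption based on the web root used elsewhere on this page.

```shell
#!/bin/sh
# List every single-quoted /opt/bin/... path in a settings file that is
# not an executable file. Prints nothing when all paths are fine.
check_configured_paths() {
    settings=$1
    grep -o "'/opt/bin/[^']*'" "$settings" | tr -d "'" | sort -u |
    while read -r path; do
        [ -x "$path" ] || echo "not executable: $path"
    done
}

# Example (assumed location of the settings file):
#   check_configured_paths /volume1/web/mediawiki/LocalSettings.php
```

Empty output after an Entware upgrade is a quick confirmation that the configured binaries are still in place.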
Extension Matrix
| Extension | Purpose | Complexity | Successor Notes |
|---|---|---|---|
| PdfHandler | Renders PDF previews and thumbnails | Medium (requires Entware binaries, path config) | Verify absolute paths (`which gs`, `which convert`). Document chosen processor (Ghostscript vs ImageMagick). |
| Lockdown / NamespaceProtection | Restrict read/edit access by namespace | Low–Medium | Use for Research/ICT privacy. Document group permissions and release routine. |
| VisualEditor | WYSIWYG editing for pages | High (Parsoid service required) | Ensure Parsoid service runs. Document fallback to wikitext if unavailable. |
| DiscussionTools | Structured talk page replies | Low | No extra config. Document usage for collaborative discussions. |
| Collection / ElectronPdfService | Export pages to PDF | Medium | Choose one workflow. Document install steps separately from PdfHandler. |
| Cite / CiteThisPage | Reference management and citation export | Low | Document citation style conventions for club research. |
| CategoryTree | Hierarchical category browsing | Low | Useful for research archives. Document how to expand/collapse trees. |
| PageForms | Structured data entry forms | Medium | Document form templates clearly. Successors should know how to edit form definitions. |
PdfHandler with Text Extraction
    wfLoadExtension( 'PdfHandler' );
    $wgShellbox = false; // Disable shellbox if you configure paths manually

    // Absolute paths to binaries (verify with 'which' commands)
    $wgPdfProcessor = '/opt/bin/gs';
    $wgImageMagickConvertCommand = '/opt/bin/convert';
    $wgPdfInfo = '/opt/bin/pdfinfo';
    $wgPdftoText = '/opt/bin/pdftotext'; // Enables text extraction from PDFs

    // Ensure MediaWiki uses ImageMagick
    $wgUseImageMagick = true;

    # Output and quality
    $wgPdfOutputExtension = 'jpg';
    $wgPdfHandlerDpi = 150;
    $wgPdfHandlerJpegQuality = 95;
    $wgPdfThumbnailType = 'jpg';
    $wgPdfPagePreview = true;
    $wgPdfHandlerPreviewWidth = 800;
    $wgPdfHandlerPreviewHeight = 600;
    $wgMaxShellMemory = 2097152;
Explanation
- $wgPdftoText – Points to the Poppler `pdftotext` binary. This allows MediaWiki to extract text layers from uploaded PDFs.
- Benefit – Club members can search within digitized papers, copy text snippets, and use extracted text for research.
- Successor routine – Verify Poppler is installed (`which pdftotext`). If missing, install via Entware.
- Integration – Text extraction works alongside previews; no extra extension needed.
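Text extraction only helps if the PDF actually contains a text layer; a plain scan has none. As a rough check, the sketch below counts the words `pdftotext` returns for a file. The function name `pdf_word_count` and the example path are hypothetical.

```shell
#!/bin/sh
# Count the words pdftotext extracts from a PDF. A count of 0 suggests a
# scanned, image-only file with no text layer to search (or that
# pdftotext is not installed or not on the PATH).
pdf_word_count() {
    pdftotext "$1" - 2>/dev/null | wc -w | tr -d ' '
}

# Example (hypothetical file):
#   pdf_word_count /path/to/paper.pdf
```

A result of 0 for a digitized paper usually means the file needs OCR before it becomes searchable.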
Research Use Case
- Old digitized papers become searchable, making it easier to cross‑reference names, dates, and places.
- Successors should document how extracted text is used (search, indexing, or conversion into wiki pages).