PdfHandler Debugging

From CostaSano-Wiki
Revision by Mngr as of 9 Dec 2025, 19:22

PdfHandler Troubleshooting: Restart Routine

When adjusting PdfHandler prerequisites (Entware binaries, PATH updates, or MediaWiki configuration), make sure to restart both WebStation and the MediaWiki package.

Steps

  1. Verify prerequisites are installed and callable:
       which gs
       which pdftoppm
       which magick
  2. Restart **WebStation**:
       Control Panel → Web Services → WebStation → Restart
  3. Restart the **MediaWiki** package:
       Package Center → Installed → MediaWiki → Restart
  4. Clear cached test files if needed (with MediaWiki's default hashed upload directories, files sit in subdirectories such as images/a/ab/, so adjust the globs to match your layout):
       rm /volume1/web/mediawiki/images/*.pdf
       rm /volume1/web/mediawiki/images/thumb/*.pdf
  5. Upload a fresh PDF and confirm thumbnails are generated.
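
Step 1 can be wrapped in a small helper so that every tool is checked in one pass. This is a minimal sketch in POSIX shell; the function name `check_tools` is hypothetical and not part of DSM, Entware, or MediaWiki.

```shell
# Report whether each named tool is callable from the current PATH.
# Prints one "ok" or "MISSING" line per tool; returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool" 2>/dev/null); then
            printf 'ok       %s -> %s\n' "$tool" "$location"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the tools named in step 1:
#   check_tools gs pdftoppm magick
```

Run it in the same SSH session you use for the `which` commands. The web server's PHP process may see a different PATH, so an "ok" here does not by itself prove that MediaWiki can find the binary.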

Notes

  • Restarting WebStation alone may not reload MediaWiki’s environment variables.
  • Restarting MediaWiki ensures new PATH and handler settings are applied.
  • Always document the rationale for successors: “Restart both packages after changes.”


| Symptom | Likely Cause | Diagnostic Step | Fix / Routine |
|---|---|---|---|
| PDF upload shows no preview | Missing or mis-set `$wgPdfProcessor` or `$wgPdfPostProcessor` | Check `LocalSettings.php` for correct binary paths | Set `$wgPdfProcessor` to `gs` (Ghostscript) and `$wgPdfPostProcessor` to `convert` or `magick` (ImageMagick v7), as installed |
| Error: "open_basedir restriction" | DSM PHP configuration limits | Inspect `php.ini` and DSM Web Station settings | Add required paths to `open_basedir` or disable restriction for MediaWiki |
| PdfHandler fails silently | PATH not updated for Entware binaries | Run `which gs` / `which pdftoppm` via SSH | Update PATH in `LocalSettings.php` or DSM task scheduler |
| Wrong command syntax in `$wgPdfPostProcessor` | Dash or argument misconfiguration | Compare with PdfHandler documentation | Correct to `gs -q -dNOPAUSE -dBATCH ...` or `magick convert ...` |
| Preview works but thumbnails broken | ImageMagick vs Ghostscript mismatch | Test both processors manually | Standardize on one processor and document choice |
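
For the "PATH not updated" row, a quick check is whether the Entware directory appears in PATH at all. The sketch below assumes the binaries live in `/opt/bin`, as in the configuration on this page; `path_contains` is a hypothetical helper name.

```shell
# Return 0 if directory $1 is listed in the colon-separated list $2
# (defaults to the current PATH when $2 is not given).
path_contains() {
    dir=$1
    list=${2-$PATH}
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# /opt/bin is where this page expects the Entware binaries.
if path_contains /opt/bin; then
    echo "/opt/bin is on PATH"
else
    echo "/opt/bin is NOT on PATH"
fi
```

Wrapping the list in colons before matching avoids false positives such as `/opt/binaries` matching `/opt/bin`.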


PdfHandler Configuration

wfLoadExtension( 'PdfHandler' );

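// Intended to disable Shellbox when paths are configured manually.
// NOTE: MediaWiki documents $wgShellboxUrls and $wgShellboxSecretKey for Shellbox;
// verify that $wgShellbox has any effect on your version.
$wgShellbox = false;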

// Configure paths using the *absolute paths* found via 'which' commands
// We use the full path to the binaries in /opt/bin/
$wgPdfProcessor = '/opt/bin/gs';
$wgImageMagickConvertCommand = '/opt/bin/convert'; 
$wgPdfInfo = '/opt/bin/pdfinfo';
$wgPdftoText = '/opt/bin/pdftotext';

// Ensure MediaWiki knows to use ImageMagick for processing
$wgUseImageMagick = true; 

// The 'wgPdfPostProcessor' variable is usually not required when 
// 'wgImageMagickConvertCommand' is set and $wgUseImageMagick is true.
// $wgPdfPostProcessor = 'convert'; 

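# Output and quality
$wgPdfOutputExtension       = 'jpg';
$wgPdfHandlerDpi            = 150;
$wgPdfHandlerJpegQuality    = 95;
# NOTE: the next four names are not among the settings listed in the PdfHandler
# documentation and may have no effect; confirm them against your installed version.
$wgPdfThumbnailType         = 'jpg';
$wgPdfPagePreview           = true;
$wgPdfHandlerPreviewWidth   = 800;
$wgPdfHandlerPreviewHeight  = 600;
# Memory limit for shell commands, in KiB (2097152 KiB = 2 GiB)
$wgMaxShellMemory           = 2097152;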

Explanation

  • wfLoadExtension( 'PdfHandler' ); – Loads the PdfHandler extension.
  • $wgShellbox = false; – Disables Shellbox, since paths are configured manually.
  • Absolute paths – Each binary (`gs`, `convert`, `pdfinfo`, `pdftotext`) must be set to the full path found via `which`.
  • $wgUseImageMagick = true; – Tells MediaWiki to scale images with ImageMagick, using the binary set in `$wgImageMagickConvertCommand`.
  • $wgPdfPostProcessor – Normally not required when `$wgImageMagickConvertCommand` is set and ImageMagick is enabled.
  • Output settings – Previews and thumbnails are generated as JPG at 150 DPI with JPEG quality 95.
  • Preview dimensions – Width 800px, height 600px for page previews.
  • $wgMaxShellMemory – Sets the memory limit for shell commands in KiB (here 2097152 KiB = 2 GiB).
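
`$wgMaxShellMemory` is given in KiB, so the figure is easy to misread. The conversion can be worked through in shell arithmetic; the helper names `kib_to_mib` and `kib_to_gib` are hypothetical.

```shell
# Convert a $wgMaxShellMemory value (KiB) to whole MiB and whole GiB.
kib_to_mib() { echo $(( $1 / 1024 )); }
kib_to_gib() { echo $(( $1 / 1024 / 1024 )); }

echo "2097152 KiB = $(kib_to_mib 2097152) MiB = $(kib_to_gib 2097152) GiB"
# prints: 2097152 KiB = 2048 MiB = 2 GiB
```

The division is integer division, so values that are not a multiple of 1024 round down.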

Successor Notes

  • Always verify paths after Entware installation (`which gs`, `which convert`).
  • Keep `Shellbox` disabled unless you migrate to a containerized setup.
  • Document chosen processor (Ghostscript vs ImageMagick) for consistency.
  • Adjust DPI/quality if storage or performance becomes an issue.
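
The path check in the first note can be run against the settings file itself rather than from memory. The sketch below assumes the assignments use single-quoted absolute paths, one per line, as on this page; `list_configured_paths` and `check_configured_paths` are hypothetical helper names.

```shell
# Print "variable path" for every $wg... setting in file $1 that is assigned
# a single-quoted absolute path. Commented-out lines are skipped.
list_configured_paths() {
    awk -F"'" '
        $1 ~ /^[ \t]*\$wg[A-Za-z0-9_]+[ \t]*=[ \t]*$/ && $2 ~ /^\// {
            name = $1
            gsub(/[^A-Za-z0-9_]/, "", name)   # keep only the variable name
            print name, $2
        }
    ' "$1"
}

# Report whether each configured path points at an executable file.
check_configured_paths() {
    list_configured_paths "$1" | while read -r name binary; do
        if [ -x "$binary" ]; then
            echo "ok       \$$name = $binary"
        else
            echo "MISSING  \$$name = $binary"
        fi
    done
}

# Example (path is an assumption based on the directories used on this page):
#   check_configured_paths /volume1/web/mediawiki/LocalSettings.php
```

A "MISSING" line after an Entware update usually means the binary moved or was removed, which is the case the first note is meant to catch.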


Extension Matrix

| Extension | Purpose | Complexity | Successor Notes |
|---|---|---|---|
| PdfHandler | Renders PDF previews and thumbnails | Medium (requires Entware binaries, path config) | Verify absolute paths (`which gs`, `which convert`). Document chosen processor (Ghostscript vs ImageMagick). |
| Lockdown / NamespaceProtection | Restrict read/edit access by namespace | Low–Medium | Use for Research/ICT privacy. Document group permissions and release routine. |
| VisualEditor | WYSIWYG editing for pages | High (Parsoid service required) | Ensure Parsoid service runs. Document fallback to wikitext if unavailable. |
| DiscussionTools | Structured talk page replies | Low | No extra config. Document usage for collaborative discussions. |
| Collection / ElectronPdfService | Export pages to PDF | Medium | Choose one workflow. Document install steps separately from PdfHandler. |
| Cite / CiteThisPage | Reference management and citation export | Low | Document citation style conventions for club research. |
| CategoryTree | Hierarchical category browsing | Low | Useful for research archives. Document how to expand/collapse trees. |
| PageForms | Structured data entry forms | Medium | Document form templates clearly. Successors should know how to edit form definitions. |


PdfHandler with Text Extraction

wfLoadExtension( 'PdfHandler' );

$wgShellbox = false; // Disable shellbox if you configure paths manually

// Absolute paths to binaries (verify with 'which' commands)
$wgPdfProcessor = '/opt/bin/gs';
$wgImageMagickConvertCommand = '/opt/bin/convert'; 
$wgPdfInfo = '/opt/bin/pdfinfo';
$wgPdftoText = '/opt/bin/pdftotext';  // Enables text extraction from PDFs

// Ensure MediaWiki uses ImageMagick
$wgUseImageMagick = true; 

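# Output and quality
$wgPdfOutputExtension       = 'jpg';
$wgPdfHandlerDpi            = 150;
$wgPdfHandlerJpegQuality    = 95;
# NOTE: the next four names are not among the settings listed in the PdfHandler
# documentation and may have no effect; confirm them against your installed version.
$wgPdfThumbnailType         = 'jpg';
$wgPdfPagePreview           = true;
$wgPdfHandlerPreviewWidth   = 800;
$wgPdfHandlerPreviewHeight  = 600;
# Memory limit for shell commands, in KiB (2097152 KiB = 2 GiB)
$wgMaxShellMemory           = 2097152;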

Explanation

  • $wgPdftoText – Points to the Poppler `pdftotext` binary. This allows MediaWiki to extract text layers from uploaded PDFs.
  • Benefit – Club members can search within digitized papers, copy text snippets, and use extracted text for research.
  • Successor routine – Verify Poppler is installed (`which pdftotext`). If missing, install via Entware.
  • Integration – Text extraction works alongside previews; no extra extension needed.
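
Text extraction only helps when the PDF actually carries a text layer; a plain scan yields nothing to index. The sketch below counts what `pdftotext` returns for a file, writing to stdout via the `-` output argument. The helper names and the `PDFTOTEXT` override variable are hypothetical.

```shell
# Command used for extraction. Override PDFTOTEXT to use an absolute path
# such as /opt/bin/pdftotext (the path configured on this page).
PDFTOTEXT=${PDFTOTEXT:-pdftotext}

# Print the number of non-whitespace bytes pdftotext extracts from PDF $1.
# Prints 0 when nothing is extracted, or when the extractor fails or is missing.
text_layer_bytes() {
    "$PDFTOTEXT" "$1" - 2>/dev/null | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Return 0 if the PDF appears to contain extractable text.
has_text_layer() {
    [ "$(text_layer_bytes "$1")" -gt 0 ]
}

# Example:
#   has_text_layer scan.pdf && echo "searchable" || echo "no text layer, consider OCR"
```

Because a missing binary also yields 0, confirm that `which pdftotext` succeeds before reading a zero as "scan without text".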

Research Use Case

  • Old digitized papers become searchable, making it easier to cross‑reference names, dates, and places.
  • Successors should document how extracted text is used (search, indexing, or conversion into wiki pages).