DSpace Generating Thumbnails
Generating Thumbnails for PDF in DSpace
When you add bitstreams, thumbnail generation and content indexing are performed asynchronously by the ‘dspace filter-media’ command, which runs as a cron task. By default, the cron task is set to run in the dspace user’s crontab once a night. You can select a more frequent interval, but make sure the interval is not so short that one run starts before the previous one has finished.
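One way to keep scheduled runs from overlapping is to wrap the command in a small script that takes a lock before starting. The following is a minimal sketch, not part of DSpace itself: the function name run_filter_media, the variables DSPACE_CMD and LOCK_DIR, the script path in the crontab comment, and the 02:00 schedule are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around "dspace filter-media" for use from cron.
# Example crontab entry for the dspace user (path and time are assumptions):
#   0 2 * * * /path/to/run-filter-media.sh

# Both values can be overridden from the environment.
DSPACE_CMD="${DSPACE_CMD:-/dspace/bin/dspace}"
LOCK_DIR="${LOCK_DIR:-/tmp/filter-media.lock}"

run_filter_media() {
    # mkdir is atomic: it fails if the directory already exists,
    # which here means an earlier run has not finished yet.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "filter-media is already running, skipping this run" >&2
        return 1
    fi
    rc=0
    "$DSPACE_CMD" filter-media "$@" || rc=$?
    # Release the lock so the next scheduled run can start.
    rmdir "$LOCK_DIR"
    return "$rc"
}

# A real cron script would end by calling:
#   run_filter_media
```

If a run is killed before it removes the lock directory, later runs will keep skipping until the directory is deleted by hand, so this sketch trades robustness for simplicity.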
You can also run this process manually (for example, after you have loaded a batch of images and want to see the thumbnails). To do this, as the dspace user, run:
/dspace/bin/dspace filter-media
This will scan the collection for all unthumbnailed/unindexed images and process them.
If you change the thumbnail size in the dspace.cfg file, or for some other reason want to replace the thumbnails that were already generated, you can recreate them for all items. To do this, as the dspace user, run:
/dspace/bin/dspace filter-media -f
Executing (via Command Line)
The media filter system is intended to be run from the command line (or regularly as a cron task):
[dspace]/bin/dspace filter-media
With no options, this traverses the asset store, applying media filters to bitstreams, and skipping bitstreams that have already been filtered.
Available Command-Line Options:
Help : [dspace]/bin/dspace filter-media -h
Display help message describing all command-line options.
Force mode : [dspace]/bin/dspace filter-media -f
Apply filters to ALL bitstreams, even if they've already been filtered. If they've already been filtered, the previously filtered content is overwritten.
Identifier mode : [dspace]/bin/dspace filter-media -i 123456789/2
Restrict processing to the community, collection, or item named by the identifier - by default, all bitstreams of all items in the repository are processed. The identifier must be a Handle, not a DB key. This option may be combined with any other option.
Maximum mode : [dspace]/bin/dspace filter-media -m 1000
Suspend operation after the specified maximum number of items have been processed - by default, no limit exists. This option may be combined with any other option.
Plugin mode : [dspace]/bin/dspace filter-media -p "PDF Text Extractor","Word Text Extractor"
Apply ONLY the filter plugin(s) listed (separated by commas). By default all named filters listed in the filter.plugins field of dspace.cfg are applied. This option may be combined with any other option. WARNING: multiple plugin names must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
Skip mode : [dspace]/bin/dspace filter-media -s 123456789/9,123456789/100
SKIP the listed identifiers (separated by commas) during processing. The identifiers must be Handles (not DB Keys). They may refer to items, collections or communities which should be skipped. This option may be combined with any other option. WARNING: multiple identifiers must be separated by a comma (i.e. ',') and NOT a comma followed by a space (i.e. ', ').
NOTE: If you have a large number of identifiers to skip, you may maintain this list, one identifier per line, within a separate file (e.g. filter-skiplist.txt). Use the following format to call the program.
[dspace]/bin/dspace filter-media -s $(paste -sd, - < filter-skiplist.txt)
Verbose mode : [dspace]/bin/dspace filter-media -v
Print all extracted text and other filter details to STDOUT.
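Because several options expect comma-separated values with no spaces, it can help to build those values in a script and inspect the final command before running it. The sketch below is illustrative only: the helper join_handles and the temporary skip-list file are assumptions, and the command is printed rather than executed. It joins a skip list (one Handle per line) into a single argument and combines skip mode with plugin mode and maximum mode.

```shell
#!/bin/sh
# Join the handles in a skip-list file into one comma-separated string.
# Blank lines are dropped so they do not produce empty entries (",,").
join_handles() {
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# A small sample skip list, using the handles from the skip mode example.
skiplist="$(mktemp)"
printf '123456789/9\n\n123456789/100\n' > "$skiplist"

skip_arg="$(join_handles "$skiplist")"

# Print (rather than run) a command that combines plugin mode, skip mode
# and maximum mode. Replace [dspace] with your installation directory.
echo "[dspace]/bin/dspace filter-media -p \"PDF Text Extractor\",\"Word Text Extractor\" -s $skip_arg -m 1000"

rm -f "$skiplist"
```

Printing the command first makes it easy to confirm that no ', ' (comma followed by a space) has crept into the plugin or identifier lists.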