DSpace Command-Line Interface (CLI)

(Please see DSpace JRuby and ensure the RVM, JRuby, and DSpace JRuby have been installed.)

Installing

Initializing the JRuby Environment

dspace@host:~$ source $HOME/.rvm/scripts/rvm
dspace@host:~$ rvm use jruby-9.2.13.0

Installing Gem dependencies

dspace@host:~$ mkdir -p pulibrary-src
dspace@host:~$ cd pulibrary-src
dspace@host:~/pulibrary-src$ git clone https://github.com/pulibrary/dspace-cli.git
dspace@host:~/pulibrary-src$ cd dspace-cli
dspace@host:~/pulibrary-src/dspace-cli$ jgem install bundler
dspace@host:~/pulibrary-src/dspace-cli$ bundle install

Starting the DSpace Ruby interpreter

You will need to first ssh and login as the dspace user (sudo su - dspace). Below, $eperson is whatever account you’d like to use to login when applying changes.

dspace@host:~$ source $HOME/.rvm/scripts/rvm
dspace@host:~$ cd ~/pulibrary-src/dspace-cli
dspace@host:~/pulibrary-src/dspace-cli$ bin/dspace-irb $eperson
  Using /dspace
  Loading jars
  Loading /dspace/config/dspace.cfg
  INFO: Loading provided config file: /dspace/config/dspace.cfg
  INFO: Using dspace provided log configuration (log.init.config)
  INFO: Loading: /dspace/config/log4j.properties
  Starting new DSpaceKernel
  DB jdbc:postgresql://localhost:5432/dspace_db, UserName=dspace_db_user,
  PostgreSQL JDBC Driver
  jruby-9.2.13.0 :001 >

Configuration

dspace@host:~$ cd ~/pulibrary-src/dspace-cli
dspace@host:~$ vi config/dspace.yml
jobs:
  bitstream_export_job:
    export_path: '/mnt/dspace_exports'

Listing the CLI Tasks

dspace@host:~$ cd ~/pulibrary-src/dspace-cli
dspace@host:~/pulibrary-src/dspace-cli$ bundle exec thor list

Usage

Query API

Querying using an ID

query = DSpace::CLI::Query.new
query.find_by_id(103031)
query.results.first.id

Querying using a Handle

query = DSpace::CLI::Query.new
query.find_by_handle('88435/dsp01cr56n3903')
query.results.first.handle

Updating Metadata

query = DSpace::CLI::.Query.new
query.find_by_id(105884)
query.results.first.id
query.results.first.department
query.results.first.remove_metadata('pu', 'department', 'Mechanical and Aerospace Engineering')
query.results.first.department
query = DSpace::CLI::.Query.new
query.find_by_id(105890)
query.results.first.id
query.results.first.add_metadata(schema: 'dc', element: 'contributor',
qualifier: 'author', value: 'J. Smith')
query.results.first.add_metadata(schema: 'pu', element: 'contributor',
qualifier: 'authorid', value: '1234567890')

Bitstream Management

Exporting Bitstream Content

This writes the bitstream to files on the local file system

query = DSpace::CLI::Query.new
query.find_by_handle('88435/dsp01kk91fp56s')
query.results.first.export_bitstreams

By default, the files are exported into exports/bitstreams. One may customize this within the configuration.

Following this, copying the exports using a utility such as rsync is quite straightforward:

~$ ssh -L 2222:proxy-host:22 dspace@host
~$ rsync -auvzi --progress -e 'ssh -p 2222' dspace@localhost:~/pulibrary-src/dspace-cli/exports/bitstreams/ .

Or, should this be configured for the location /mnt/dspace_exports:

~$ ssh -L 2222:proxy-host:22 dspace@host
~$ rsync -auvzi --progress -e 'ssh -p 2222' dspace@localhost:/mnt/dspace_exports/ .

The exports will be named after their bistream IDs. To pair them with their original filenames, you may want to use the DSpace API. For example, if you download 119610.bin, you can get the filename (models_hourglass.tar.gz) from https://dataspace.princeton.edu/rest/bitstreams/119610. This can be especially useful if a record contains tens of files.

Workflow Management

Adding and Removing Users from Task Pools

query = DSpace::CLI::Query.new
query.find_by_id(105927)
query.results.first.id
query.results.first.workflow_item.id
query.results.first.workflow_item.state
query.results.first.workflow_item.add_task_pool_user("user@domain.edu")
query.results.first.workflow_item.remove_task_pool_user("user@domain.edu")

Removing Tasks from the MyDSpace Dashboard

ep1 = Java::OrgDspaceEperson::EPerson.findByEmail(DSpace.context, 'user@domain.edu')
workflow_items = Java::OrgDspaceWorkflow::WorkflowItem.findByEPerson(DSpace.context, ep1)
ep2 = Java::OrgDspaceEperson::EPerson.findByEmail(DSpace.context, 'user@domain.edu')

workflow_items.to_a.map { |wfi| wfi.setOwner(ep2); wfi.update }
workflow_items.to_a.map { |wfi| wfi.getItem.setSubmitter(ep2); wfi.update }

updated_workflow_items = Java::OrgDspaceWorkflow::WorkflowItem.findByEPerson(DSpace.context, ep1)
updated_workflow_items.to_a.map { |wfi| wfi.getOwner }

For multiple Items:

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
department = 'Mechanical and Aerospace Engineering'
sub_query = query.find_by_department(department)
query.result_set.add_task_pool_user("user@domain.edu")
query.result_set.remove_task_pool_user("user@domain.edu")

Finding Workflow Items

java_import(org.dspace.workflow.WorkflowItem)
java_import(org.dspace.workflow.WorkflowManager)
java_import(org.dspace.eperson.EPerson)
eperson = Java::OrgDspaceEperson::EPerson.findByEmail(DSpace.context, "user@domain.edu")

workflow_items = Java::OrgDspaceWorkflow::WorkflowManager.getPooledTasks(DSpace.context, eperson)
pool_items = workflow_items.to_a.map { |wfi| wfi.getItem }
pool_items.map { |item| item.getID }

Setting Item Submitters

java_import(org.dspace.eperson.EPerson)
eperson = Java::OrgDspaceEperson::EPerson.findByEmail(DSpace.context, "user@domain.edu")

query = DSpace::CLI::SeniorThesisCommunity.query
query.find_by_id(105927)
query.results.first.submitter = eperson
query.results.first.update
query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
department = 'Mechanical and Aerospace Engineering'
sub_query = query.find_by_department(department)
sub_query.result_set.submitter_email = 'user@domain.edu'

Refreshing Task Pools for Users in Groups

There is a known bug in DSpace 5.5 that causes users who are newly added to Groups not to be able to see or interact with tasks in the workflow pool associated with that Group in their dashboard when they log in.

In order to address this, manual intervention is needed. Over on dspace-cli, the following commands must be executed to accomplish this:

java_import(org.dspace.eperson.Group)
java_import(org.dspace.workflow.WorkflowItem)
java_import(org.dspace.workflow.WorkflowManager)
java_import(org.dspace.eperson.EPerson)

group = Group.findByName(DSpace.context, $GROUP_NAME)
members = Group.allMembers(DSpace.context, group)
eperson = Java::OrgDspaceEperson::EPerson.findByEmail(DSpace.context, $USER_ALREADY_IN_GROUP)

workflow_items = Java::OrgDspaceWorkflow::WorkflowManager.getPooledTasks(DSpace.context, eperson)
pool_items = workflow_items.to_a.map { |wfi| wfi.getItem }
pool_items.map { |item| item.getID }
items = pool_items.map { |item| DSpace::CLI::Item.new(item) }
items.each { |i| i.workflow_item.add_task_pool_user($USER_NEW_TO_GROUP) }

Where the variables in the above example are populated with values as follows:

  • $GROUP_NAME is a quoted string of the name of the DataSpace group to which the user has been newly added (example: "Lib_DigPubs_Reviewers").

  • $USER_ALREADY_IN_GROUP is the email address associated with a user already in the $GROUP_NAME group before the tasks that need to be visible were created.

  • $USER_NEW_TO_GROUP is the email address associated with the user who is newly associated with the $GROUP_NAME group, and needs to have the group’s tasks that are currently in the group’s workflow pool become visible to them.

Once the items.each loop at the bottom of the code block above completes, the change should be immediately visible in the DataSpace interface.

Advancing Workflow States

query = DSpace::CLI::SeniorThesisCommunity.query
query.find_by_id(105927)
query.results.first.id
query.results.first.workflow_item.id
query.results.first.workflow_item.state
query.results.first.archived?
query.results.first.archive
DSpace.commit

new_query = DSpace::CLI::SeniorThesisCommunity.query.find_by_id(105927)
updated = new_query.results.first
updated.id
updated.archived?
updated.workflow_item

Collection Management

query = DSpace::CLI::SeniorThesisCommunity.query
query.find_by_id(100572)
query.results.first.handle
query.results.first.add_to_collection("88435/dsp01gx41mh91n")
query.results.first.collections

Moving Between Collections

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
sub_query = query.find_by_department('Mechanical and Aerospace Engineering')
sub_query.result_set.members.map(&:id)
sub_query.result_set.add_to_collection('88435/dsp015m60qr96c')
sub_query.result_set.remove_from_collection('88435/dsp015m60qr96c')

Collection Methods

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
sub_query = query.find_mechanical_and_aerospace_engineering_department_items
sub_query.results.length

Removing from a Collection

query = DSpace::CLI::Query.new
# handle of the object to be updated
query.find_by_handle('88435/dsp010k225f00b')
first_item = query.results.first

# handle of the collection from which the object is to be removed
collection_handle = '88435/dsp01c247ds15b'
collection = DSpace::CLI::SeniorThesisCollection.find_for_handle(collection_handle)
collection.removeItem(first_item)
collection.index

To ensure that these changes persist across DataSpace, after updating the object’s collection association, edit the item and update it using the web UI. If there is a certificate program affiliation using the pu.certificate metadata field value, remove it in the web UI and update the item to avoid it getting updated and reindexed later on.

Exporting Items

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
sub_query = query.find_mechanical_and_aerospace_engineering_department_items
sub_query.results.each { |item| item.remove_duplicated_metadata }
sub_query.results.each { |item| item.update }
sub_query.results.each { |item| item.export }

Exporting Departments

DSpace::CLI::SeniorThesisCommunity.export_departments("2020")

Deleting Existing Items

DSpace::CLI::SeniorThesisCommunity.collection_titles.map do |title|
  result_set = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020').find_by_department(title).results
  result_set.map { |item| item.delete }
end

Import

export SOURCE_DIR=/tmp/exports/exports/
export MAPFILE=import-09-05-20.mapfile
/dspace/bin/dspace import --add --collection=88435/dsp015m60qr96c --source=$SOURCE_DIR --mapfile=$MAPFILE --eperson=user@domain.edu --workflow

Moving to New Collections

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')

department = 'Mechanical and Aerospace Engineering'
collection = '88435/dsp015m60qr96c'
sub_query = query.find_by_department(department)
sub_query.result_set.add_to_collection(collection)
sub_query.result_set.remove_from_collection('88435/dsp015m60qr96c')

Moving to Departmental Collections

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
sub_query = query.find_by_certificate_program('Creative Writing Program')
sub_query.results.each { |item| item.find_collections_for_departments.each { |collection| collection.add_item(item) } }
sub_query.result_set.remove_from_collection('88435/dsp0179407x233')
sub_query.results.map { |item| item.collections.map { |collection| collection.title } }

Batch Job Support

Updating the state of Items in batches

department = 'Mechanical and Aerospace Engineering'
report_name = "#{DSpace::CLI::ResultSet.normalize_department_title(department)}.csv"
batch_input_file_path = File.join(DSpace::CLI::ItemStateReport.root_path, report_name)
job = DSpace::CLI::BatchUpdateJob.new(batch_input_file_path, '2020')
job.perform_update_state_jobs

Updating the Item titles

Dir.glob('/tmp/batch_imports/*csv').each do |batch_input_file_path|
  job = DSpace::CLI::BatchUpdateJob.new(batch_input_file_path, '2020')
  job.perform_update_title_jobs
end

Senior Theses Community

Querying using DSpace metadata fields

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
sub_query = query.find_by_department('Mechanical and Aerospace Engineering')
sub_query.results.map(&:id)
sub_query.results.map(&:handle)
sub_query.results.first.workflow_item

Querying using a title

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
sub_query = query.find_by_title('Colors, Sounds, and Qualitative Particulars')
sub_query.results.map(&:id)

Querying using an author

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
sub_query = query.find_by_author('Smith, John')
sub_query.results.map(&:id)

Importing Vireo Spreadsheets

require 'vireo/cli/export'
spreadsheet_path = '/tmp/mathematics.xlsx'
export = Vireo::CLI::Export.build_from_spreadsheet(file_path: spreadsheet_path, year: '2020')
export.build_batch_import
export.write_batch_import('mathematics.csv')

Mapping Theses to Collections by PU.Certificate Values

Some items are imported into DataSpace from Vireo as part of a distinct department and separate certificate program. These certificate programs have associated collections. The collection is derived from the value in the pu.certificate metadata field for each item. These collections are as follows:

Collection name | Collection ark | Pu.certificate value – | – | – Creative Writing Program | https://dataspace.princeton.edu/handle/88435/dsp01gx41mh91n | Creative Writing Program East Asian Studies Program | https://dataspace.princeton.edu/handle/88435/dsp016682x659t | East Asian Studies Program Global Health and Health Policy Program | https://dataspace.princeton.edu/handle/88435/dsp01kh04ds333 | Global Health and Health Policy Program Theater | https://dataspace.princeton.edu/handle/88435/dsp01c247ds15b | Theater Program

Here is an example for how to accomplish this for “Theater Program” items in 2021, executed from within the CLI terminal:

collection_handle = '88435/dsp01c247ds15b'
cert_collection = DSpace::CLI::SeniorThesisCollection.find_for_handle(collection_handle)
query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2021')

cert_program = 'Theater Program'
sub_query = query.find_by_certificate_program(cert_program)
items = []
sub_query.results.each do |item|
  add_to_collection = true
  item.collections.each do |ic|
    add_to_collection = false if ic.handle == collection_handle
  end
  items << item if add_to_collection
end

items.each do |i|
  cert_collection.add_item(i)
  cert_collection.update
end

cert_collection.index

Where collection_handle is the ark handle for the certificate collection to be mapped (consult the table above), cert_program is the string value from the item’s pu.certificate field that cooresponds to the collection (for example Theater Program, Creative Writing Program), and the parameter in the call to find_by_class_year matches the current year.

Reporting

Generating a report for the state of Items

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
department = 'Mechanical and Aerospace Engineering'
sub_query = query.find_by_department(department)
report_name = "#{DSpace::CLI::ResultSet.normalize_department_title(department)}.csv"
report = sub_query.result_set.item_state_report(report_name)
report.write

For entire graduating classes:

DSpace::CLI::SeniorThesisCommunity.write_item_state_reports("2020")

Generating a report for Items grouped by authors:

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
department = 'Mechanical and Aerospace Engineering'
sub_query = query.find_by_department(department)
report_name = "#{DSpace::CLI::ResultSet.normalize_department_title(department)}.csv"
report = sub_query.result_set.item_author_report(report_name)
report.write

For entire graduating classes:

DSpace::CLI::SeniorThesisCommunity.write_item_author_reports("2020")

Generating a report for Items grouped by certificate programs:

query = DSpace::CLI::SeniorThesisCommunity.find_by_class_year('2020')
department = 'Mechanical and Aerospace Engineering'
sub_query = query.find_by_department(department)
report_name = "#{DSpace::CLI::ResultSet.normalize_department_title(department)}.csv"
report = sub_query.result_set.item_certificate_program_report(report_name)
report.write

For entire graduating classes:

DSpace::CLI::SeniorThesisCommunity.write_item_certificate_program_reports("2020")