A discovery portal for Princeton research data. Initially it will provide a better browsing experience for the research data contained in DataSpace.
Please note: While this is open-source software, we discourage anyone from simply checking it out and running it. Princeton specifics, from styling to authentication and authorization, are hard-coded, and we have not invested any time in the kind of configurability that would be needed for use at another institution. Instead, it should be taken as an example of breaking a monolithic project into separate components and developing iteratively in response to local user feedback.
- Ruby: 3.1.0
- nodejs: 12.18.3
- yarn: 1.22.10
- postgres: `brew install postgresql@14; brew services start postgresql@14`
- Lando: 3.0.0
Update the file `config/banner.yml`. Note that each environment can have its own banner text.
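For orientation, the per-environment lookup might look like the sketch below; it assumes `config/banner.yml` is keyed by environment name, which may not match the file's actual structure.

```ruby
require "yaml"

# Minimal sketch, not the app's actual code: read the banner text for the
# current Rails environment, assuming the YAML is keyed by environment name.
banner_config = YAML.load_file(Rails.root.join("config", "banner.yml"))
banner_text = banner_config[Rails.env] # e.g. banner_config["staging"]
```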
- Check out the code
- `bundle install`
- `yarn install`
We use Lando to run the services required for both the test and development environments.

Start and initialize the Solr and database services with `bundle exec rake servers:start`.

To stop the Solr and database services, run `bundle exec rake servers:stop` or `lando stop`.
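A task like `servers:start` typically wraps `lando start` plus database preparation; the sketch below is a hypothetical illustration of that pattern, not the task's actual contents.

```ruby
# Hypothetical sketch of a servers:start-style rake task; the real task's
# steps (e.g. Solr collection setup) may differ.
namespace :servers do
  desc "Start lando-managed services and prepare the databases"
  task :start do
    system("lando start") || abort("lando failed to start")
    system("bundle exec rake db:prepare") # create/migrate the databases
  end

  desc "Stop lando-managed services"
  task :stop do
    system("lando stop")
  end
end
```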
- Fast: `bundle exec rspec spec`
- Run in browser: `RUN_IN_BROWSER=true bundle exec rspec spec`
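The `RUN_IN_BROWSER` flag typically flips the Capybara driver from headless to headed; this is a hypothetical sketch of such a toggle, and our spec helper may wire it differently.

```ruby
# Hypothetical sketch: run JavaScript specs in a visible Chrome window when
# RUN_IN_BROWSER is set, headless otherwise (both drivers ship with Capybara).
Capybara.javascript_driver =
  ENV["RUN_IN_BROWSER"] ? :selenium_chrome : :selenium_chrome_headless
```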
We use RuboCop for our Ruby code and Prettier for our JavaScript.
- To run rubocop run `bundle exec rubocop`
- To allow for autocorrecting of errors run `bundle exec rubocop -a`
- To run prettier via yarn lint run `yarn lint`
- To run prettier by itself to see more details on errors run `yarn prettier app/javascript`
- To run prettier to autocorrect errors run `yarn prettier --write app/javascript`
- Terminal one: `bin/rails s -p 3000`
- Access pdc_discovery at http://localhost:3000/
To create a tagged release, use the steps in the RDSS handbook.
PDC Discovery indexes data from both DataSpace and PDC Describe via the rake task `rake index:research_data`.
This rake task is scheduled to run every 30 minutes on the production and staging servers.
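To mirror that schedule in cron terms, the equivalent entry with the `whenever` gem would look roughly like this; illustrative only, since the servers may manage cron outside the application.

```ruby
# config/schedule.rb -- illustrative sketch; production cron may be managed
# outside the app.
every 30.minutes do
  rake "index:research_data"
end
```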
In production and staging we use SolrCloud to manage our Solr index. Our configuration uses a Solr alias to point to the current Solr collection that we are using. For example, in staging the alias `pdc-discovery-staging` points to the `pdc-discovery-staging-1` collection.

When we index new content we create a new Solr collection (e.g. `pdc-discovery-staging-2`) and index our data into this new collection. Once the indexing has completed we update our Solr alias to point to this new collection. Our indexing process automatically toggles between `pdc-discovery-staging-1` and `pdc-discovery-staging-2`.

This dual-collection approach allows us to index into a separate area of Solr and prevents users from seeing partial results while we are running the index process.
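The alias swap itself is a single call to the Solr Collections API; here is a minimal sketch in Ruby, where the Solr URL is a placeholder rather than our deployed configuration.

```ruby
require "json"
require "net/http"

# Minimal sketch: point an alias at a freshly built collection using the
# Collections API's CREATEALIAS action (it also re-points an alias that
# already exists). The Solr URL below is a placeholder.
def swap_alias(alias_name, collection, solr_url: "http://localhost:8983/solr")
  uri = URI("#{solr_url}/admin/collections" \
            "?action=CREATEALIAS&name=#{alias_name}&collections=#{collection}")
  JSON.parse(Net::HTTP.get(uri))
end

swap_alias("pdc-discovery-staging", "pdc-discovery-staging-2")
```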
To make changes to the Solr schema in production/staging you need to update the files in the pul_solr repository and deploy them. The basic steps are:
Getting your changes into the pul_solr configuration for PDC Discovery:

- Copy your configuration updates to pul_solr (this command assumes all your projects live in one folder on your machine): `cp solr/conf/* ../pul_solr/solr_configs/pdc-discovery/conf/`
- Create a Draft PR in pul_solr with your changes (`<branch-name>` is the name of your new branch for the PR).
- Connect to the VPN.
- Optional: you can tunnel to the machine running Solr with `ssh -L 8983:localhost:8983 pulsys@lib-solr-staging4` if you want to see your current configuration (e.g. `solrconfig.xml` or `schema.xml`).
- Make sure you are on the `pul_solr` repo.
- Deploy the changes, e.g. `BRANCH=<branch-name> bundle exec cap solr8-staging deploy`.
- Verify your changes have worked and mark your PR ready for review.
- Once the PR has been merged, coordinate a time to deploy the changes to production: `bundle exec cap solr8-production deploy`
You can see the list of Capistrano environments in the pul_solr repository.
The deploy will update the configuration for all Solr collections in the given environment, but it does not cause downtime. If you need to manually reload a configuration for a given Solr collection you can do it via the Solr Admin UI.
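The Admin UI reload corresponds to the Collections API `RELOAD` action; an equivalent call sketched in Ruby, with placeholder host and collection names.

```ruby
require "net/http"

# Sketch: reload one collection so it picks up newly deployed configuration.
# Host and collection name are placeholders, not our deployed values.
uri = URI("http://localhost:8983/solr/admin/collections" \
          "?action=RELOAD&name=pdc-discovery-staging-1")
puts Net::HTTP.get(uri)
```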
You can view the Honeybadger Uptime check. Currently it checks every minute and will report downtime when two checks fail in a row (i.e. we should know within 2 minutes).
To be notified of downtime, enable notifications in Honeybadger under: Settings + Alerts & Integrations + email (Edit). Enable notifications for "Uptime Events" for "PDC Discovery Production". Notice that email notification settings are per project.
There is a data feed at `/pppl_reporting_feed.json`. It provides the full JSON blob from PDC Describe for every object tagged as belonging to the Princeton Plasma Physics Laboratory group, sorted by most recently updated first. This is so PPPL can harvest data sets to report to OSTI.

This feed can be paged through using the parameters `per_page` and `page`, like this: `https://pdc-discovery-staging.princeton.edu/discovery/pppl_reporting_feed.json?per_page=2&page=3`
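A harvester can walk the feed page by page until it runs out of records; a hedged sketch, assuming each page comes back as a JSON array (the real response envelope may differ).

```ruby
require "json"
require "net/http"

# Hypothetical harvest loop: request successive pages until one is empty.
# The per_page value and array-shaped response are assumptions.
base = "https://pdc-discovery-staging.princeton.edu/discovery/pppl_reporting_feed.json"
records = []
page = 1
loop do
  batch = JSON.parse(Net::HTTP.get(URI("#{base}?per_page=100&page=#{page}")))
  break if batch.empty?
  records.concat(batch)
  page += 1
end
puts "Harvested #{records.size} records"
```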
Mailcatcher is a gem that can also be installed locally. See the mailcatcher documentation for how to run it on your machine.
To see mail that has been sent on the staging servers you can use Capistrano to open both mailcatcher consoles in your browser: `cap staging mailcatcher:console`. Look in your default browser for the consoles.
Emails on production are sent via Pony Express.
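Delivery through a campus relay like Pony Express is ordinarily just standard ActionMailer SMTP settings; the sketch below is hypothetical, and the relay host and port shown are placeholders rather than our actual configuration.

```ruby
# config/environments/production.rb -- illustrative only; the real Pony
# Express relay host/port are set in deployment and are not shown here.
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  address: "smtp-relay.example.princeton.edu", # placeholder relay host
  port: 25
}
```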