Enhance cli index command #391

Merged: 62 commits, Apr 28, 2025

ctrl-schaff
Contributor

Updates the CLI index command to automatically infer the Elasticsearch mapping for plugins that don't define one. This makes it possible to add much more index testing for pending.api and for other plugins that don't hard-code the Elasticsearch mapping.
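To illustrate the general idea of inferring a mapping from sample documents, here is a minimal sketch. It is not the biothings implementation: the names `infer_mapping`, `infer_field`, and `merge` are hypothetical, and the rules are simplifying assumptions (every string becomes a keyword with the `keyword_lowercase_normalizer` normalizer, whereas the real command may emit `text` for some fields; on a type conflict the first type seen wins; `_id` keys are skipped at every nesting level).

```python
from typing import Any

# Assumption: all strings map to a normalized keyword field.
KEYWORD = {"type": "keyword", "normalizer": "keyword_lowercase_normalizer"}


def merge(left: dict, right: dict) -> dict:
    """Combine two field mappings, recursing into nested objects."""
    if not left:
        return right
    if not right:
        return left
    if "properties" in left and "properties" in right:
        props = dict(left["properties"])
        for key, sub in right["properties"].items():
            props[key] = merge(props.get(key, {}), sub)
        return {"properties": props}
    # Simplification: on a type conflict, keep the first mapping seen.
    return left


def infer_field(value: Any) -> dict:
    """Infer a field mapping from one sample value ({} means unknown)."""
    if value is None:
        return {}
    if isinstance(value, dict):
        return {"properties": infer_mapping([value])}
    if isinstance(value, list):
        # Elasticsearch has no array type; a list maps like its elements.
        merged: dict = {}
        for item in value:
            merged = merge(merged, infer_field(item))
        return merged
    if isinstance(value, bool):  # check bool first: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "float"}
    return dict(KEYWORD)


def infer_mapping(docs: list[dict]) -> dict:
    """Build a mapping 'properties' dict from a list of sample documents."""
    mapping: dict = {}
    for doc in docs:
        for key, value in doc.items():
            if key == "_id":  # document id is not a mapped field here
                continue
            field = merge(mapping.get(key, {}), infer_field(value))
            if field:  # skip fields whose type is still unknown
                mapping[key] = field
    return mapping


if __name__ == "__main__":
    sample = [
        {"_id": "1", "umls": "C001",
         "interacts_with": [{"umls": "C002", "source": {"name": "X"}}]},
        {"_id": "2", "count": 3},
    ]
    print(infer_mapping(sample))
```

Merging field by field across documents lets sparse fields (present in only some documents) still appear in the final mapping, which matters for plugins whose records vary in shape.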

ctrl-schaff marked this pull request as ready for review on March 27, 2025, 06:20
@ctrl-schaff
Contributor Author

Example usage of the Elasticsearch mapping inference (for reference, I had never used the command-line tooling on this plugin before):

iDisk on HEAD (0c1da8e) via 🐍 v3.12.3 (python-3.12.3)
❯ biothings-cli dataplugin dump_and_upload
[23:21:57] INFO     No memory limit set
[23:22:02] INFO     1 file(s) to download
[23:22:04] INFO     iDisk successfully downloaded
           INFO     Uncompress all archive files in '.biothings_hub/archive/iDisk/2020-02-14'
           INFO     unzipping '.biothings_hub/archive/iDisk/2020-02-14/idisk-rrf-1.0.1-2020-01-28.zip'
           INFO     done unzipping '.biothings_hub/archive/iDisk/2020-02-14/idisk-rrf-1.0.1-2020-01-28.zip'
           INFO     success
           INFO     Success! 🚀
╭─ Dump ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Source: iDisk                                                                                                                                                                                                     │
│ Data Folder: /home/schaffjr/workspace/plugin-development/pending.api/plugins/iDisk/.biothings_hub/archive/iDisk/2020-02-14:                                                                                       │
│ Data Folder Contents:                                                                                                                                                                                             │
│     - MRCONSO.RRF                                                                                                                                                                                                 │
│     - MRSTY.RRF                                                                                                                                                                                                   │
│     - MRREL.RRF                                                                                                                                                                                                   │
│     - MRSAT.RRF                                                                                                                                                                                                   │
│     - idisk-rrf-1.0.1-2020-01-28.zip                                                                                                                                                                              │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
           INFO     No memory limit set
           INFO     Load data from directory or file: '.biothings_hub/archive/iDisk/2020-02-14'
           INFO     Uploading to the DB...
Progress: 1/919
Progress: 2/919
Progress: 3/919
Progress: 4/919
Progress: 5/919

...

Progress: 915/919
Progress: 916/919
Progress: 917/919
Progress: 918/919
Progress: 919/919
[23:22:35] INFO     Done[31.22s] with 919 docs
           INFO     Renaming collection 'iDisk_temp_f9uSynuW' to 'iDisk'
           INFO     Success! 🚀
╭─ Upload ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Source: iDisk                                                                                                                                                                                                     │
│ Database Path: Collection(Database(DatabaseClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'data_src_database'), 'db_conn')                                                  │
│ - Database: data_src_database                                                                                                                                                                                     │
│     - Collections:                                                                                                                                                                                                │
│         iDisk                                                                                                                                                                                                     │
│     - Archived collections:                                                                                                                                                                                       │
│                                                                                                                                                                                                                   │
│     - Temporary collections:                                                                                                                                                                                      │
│                                                                                                                                                                                                                   │
│                                                                                                                                                                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
           INFO     Success! 🚀

iDisk on  HEAD (0c1da8e) [?] via 🐍 v3.12.3 (python-3.12.3) took 41s
❯ biothings-cli dataplugin index --plugin="iDisk"
[23:22:46] INFO     No memory limit set
[23:22:47] INFO     Build config 'iDisk-a2bba994-5083-45b9-8a54-d1b8e73c5967-configuration' will use builder class <class 'biothings.hub.databuild.builder.LinkDataBuilder'>
           INFO     No registered mapping found. Auto-generating mapping for source(s) ['iDisk']
           INFO     Done [0.24s]
           INFO     Post-processing
           INFO     Build config 'iDisk-a2bba994-5083-45b9-8a54-d1b8e73c5967-configuration' will use builder class <class 'biothings.hub.databuild.builder.LinkDataBuilder'>
           INFO     Merging into target collection 'iDisk-a2bba994-5083-45b9-8a54-d1b8e73c5967'
           WARNING  Executing <Task pending name='Task-1' coro=<do_index() running at /home/schaffjr/workspace/plugin-development/.direnv/python-3.12.3/lib/python3.12/site-packages/biothings/cli/operations.py:354>
                    wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()] created at /usr/lib/python3.12/asyncio/tasks.py:712> cb=[_run_until_complete_cb() at /usr/lib/python3.12/asyncio/base_events.py:182]>
                    took 0.935 seconds
           INFO     Sources to be merged: ['iDisk']
           INFO     Root sources: []
           INFO     Other sources: ['iDisk']
           INFO     Merging other resources: ['iDisk']
           INFO     Finalizing target backend
           INFO     Skip post-merge process
           INFO     Build version: 20250326
           INFO     success [sources=['iDisk'],stats={'iDisk': 919}]
           INFO     {'commandhub': {'args': {'max_retries': 10, 'request_timeout': 300, 'retry_on_timeout': True, 'hosts': 'http://localhost:9200'}, 'name': 'commandhub'}}
           INFO     Created indexer instance <Indexer source='iDisk' dest='idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967'>
           INFO     <Step name='pre' indexer=<Indexer source='iDisk' dest='idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967'>>
           INFO     HEAD http://localhost:9200/idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967
           INFO     GET http://localhost:9200/
           INFO     PUT http://localhost:9200/idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967
           INFO     ('Created', 'idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967', ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967'}))
           INFO     IndexerStepResult({'host': 'http://localhost:9200', 'environment': 'commandhub'})
           INFO     IndexerCumulativeResult({'host': 'http://localhost:9200', 'environment': 'commandhub'})
           INFO     <Step name='index' indexer=<Indexer source='iDisk' dest='idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967'>>
           INFO     Fetch _ids from 'iDisk', and create indexer job with batch_size=10000.
           WARNING  Can't find information for source collection 'iDisk'
           INFO     Building cache file '.biothings_hub/archive/cache/iDisk.xz._tmp_A2MEzOVX'
           INFO     <schedule running, on batch #1/1 100.0%, total=919 scheduled=919 finished=0>
           INFO     <schedule pending, waiting for completion, total=919 scheduled=919 finished=0>
[23:22:48] WARNING  Executing <Task pending name='Task-9' coro=<JobManager.defer_to_process.<locals>.run() running at
                    /home/schaffjr/workspace/plugin-development/.direnv/python-3.12.3/lib/python3.12/site-packages/biothings/utils/manager.py:833> wait_for=<Future pending
                    cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.12/asyncio/futures.py:387, <1 more>, Task.task_wakeup()] created at /usr/lib/python3.12/asyncio/base_events.py:449>
                    cb=[JobManager.defer_to_process.<locals>.runned(job_id='UEgnAtnE')() at
                    /home/schaffjr/workspace/plugin-development/.direnv/python-3.12.3/lib/python3.12/site-packages/biothings/utils/manager.py:844]> took 0.130 seconds
[23:22:48] INFO     #1: 919 documents.
           INFO     GET http://localhost:9200/
           INFO     PUT http://localhost:9200/_bulk
           INFO     PUT http://localhost:9200/_bulk
           INFO     <schedule done total=919 scheduled=919 finished=919>
           INFO     IndexerStepResult({'count': 919, 'created_at': datetime.datetime(2025, 3, 26, 23, 22, 48, 390048, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200), 'PDT'))})
           INFO     IndexerCumulativeResult({'host': 'http://localhost:9200', 'environment': 'commandhub', 'count': 919, 'created_at': datetime.datetime(2025, 3, 26, 23, 22, 48, 390048,
                    tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200), 'PDT'))})
           INFO     <Step name='post' indexer=<Indexer source='iDisk' dest='idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967'>>
           INFO     IndexerStepResult({})
           INFO     IndexerCumulativeResult({'host': 'http://localhost:9200', 'environment': 'commandhub', 'count': 919, 'created_at': datetime.datetime(2025, 3, 26, 23, 22, 48, 390048,
                    tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200), 'PDT'))})
           INFO     Build version: 20250326
╭─ Build ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Build Configuration Name:  iDisk-a2bba994-5083-45b9-8a54-d1b8e73c5967-configuration                                                                                                                               │
│ Build Version:  20250326                                                                                                                                                                                          │
│ Builder Class:  biothings.hub.databuild.builder.LinkDataBuilder                                                                                                                                                   │
│ Source(s): ['iDisk']                                                                                                                                                                                              │
│ Document Type: temporary                                                                                                                                                                                          │
│                                                                                                                                                                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Build Backend ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Build Backend Source:  iDisk-a2bba994-5083-45b9-8a54-d1b8e73c5967-configuration                                                                                                                                   │
│ Build Backend Target:  iDisk-a2bba994-5083-45b9-8a54-d1b8e73c5967-configuration                                                                                                                                   │
│                                                                                                                                                                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
           INFO     GET http://localhost:9200/idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967
╭─ Index ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Index Name:  idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967                                                                                                                                                           │
│ Index Properties:  [                                                                                                                                                                                              │
│   {                                                                                                                                                                                                               │
│     "index_name": "idisk-a2bba994-5083-45b9-8a54-d1b8e73c5967",                                                                                                                                                   │
│     "doc_type": "temporary",                                                                                                                                                                                      │
│     "build_version": "20250326",                                                                                                                                                                                  │
│     "count": 919,                                                                                                                                                                                                 │
│     "creation_date": "1743056567782",                                                                                                                                                                             │
│     "environment": {                                                                                                                                                                                              │
│       "name": "commandhub",                                                                                                                                                                                       │
│       "host": "http://localhost:9200"                                                                                                                                                                             │
│     }                                                                                                                                                                                                             │
│   }                                                                                                                                                                                                               │
│ ]                                                                                                                                                                                                                 │
│ Elasticsearch Mapping:  {                                                                                                                                                                                         │
│   "name": {                                                                                                                                                                                                       │
│     "type": "text"                                                                                                                                                                                                │
│   },                                                                                                                                                                                                              │
│   "umls": {                                                                                                                                                                                                       │
│     "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                                 │
│     "type": "keyword"                                                                                                                                                                                             │
│   },                                                                                                                                                                                                              │
│   "has_adverse_effect_on": {                                                                                                                                                                                      │
│     "properties": {                                                                                                                                                                                               │
│       "meddra": {                                                                                                                                                                                                 │
│         "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                             │
│         "type": "keyword"                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "source": {                                                                                                                                                                                                 │
│         "properties": {                                                                                                                                                                                           │
│           "name": {                                                                                                                                                                                               │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           },                                                                                                                                                                                                      │
│           "record": {                                                                                                                                                                                             │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           }                                                                                                                                                                                                       │
│         }                                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "name": {                                                                                                                                                                                                   │
│         "type": "text"                                                                                                                                                                                            │
│       }                                                                                                                                                                                                           │
│     }                                                                                                                                                                                                             │
│   },                                                                                                                                                                                                              │
│   "interacts_with": {                                                                                                                                                                                             │
│     "properties": {                                                                                                                                                                                               │
│       "umls": {                                                                                                                                                                                                   │
│         "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                             │
│         "type": "keyword"                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "source": {                                                                                                                                                                                                 │
│         "properties": {                                                                                                                                                                                           │
│           "name": {                                                                                                                                                                                               │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           },                                                                                                                                                                                                      │
│           "record": {                                                                                                                                                                                             │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           }                                                                                                                                                                                                       │
│         }                                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "name": {                                                                                                                                                                                                   │
│         "type": "text"                                                                                                                                                                                            │
│       }                                                                                                                                                                                                           │
│     }                                                                                                                                                                                                             │
│   },                                                                                                                                                                                                              │
│   "has_therapeutic_class": {                                                                                                                                                                                      │
│     "properties": {                                                                                                                                                                                               │
│       "umls": {                                                                                                                                                                                                   │
│         "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                             │
│         "type": "keyword"                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "source": {                                                                                                                                                                                                 │
│         "properties": {                                                                                                                                                                                           │
│           "name": {                                                                                                                                                                                               │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           },                                                                                                                                                                                                      │
│           "record": {                                                                                                                                                                                             │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           }                                                                                                                                                                                                       │
│         }                                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "meddra": {                                                                                                                                                                                                 │
│         "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                             │
│         "type": "keyword"                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "name": {                                                                                                                                                                                                   │
│         "type": "text"                                                                                                                                                                                            │
│       }                                                                                                                                                                                                           │
│     }                                                                                                                                                                                                             │
│   },                                                                                                                                                                                                              │
│   "is_effective_for": {                                                                                                                                                                                           │
│     "properties": {                                                                                                                                                                                               │
│       "umls": {                                                                                                                                                                                                   │
│         "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                             │
│         "type": "keyword"                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "source": {                                                                                                                                                                                                 │
│         "properties": {                                                                                                                                                                                           │
│           "name": {                                                                                                                                                                                               │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           },                                                                                                                                                                                                      │
│           "record": {                                                                                                                                                                                             │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           }                                                                                                                                                                                                       │
│         }                                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "name": {                                                                                                                                                                                                   │
│         "type": "text"                                                                                                                                                                                            │
│       }                                                                                                                                                                                                           │
│     }                                                                                                                                                                                                             │
│   },                                                                                                                                                                                                              │
│   "has_adverse_reaction": {                                                                                                                                                                                       │
│     "properties": {                                                                                                                                                                                               │
│       "umls": {                                                                                                                                                                                                   │
│         "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                             │
│         "type": "keyword"                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "source": {                                                                                                                                                                                                 │
│         "properties": {                                                                                                                                                                                           │
│           "name": {                                                                                                                                                                                               │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           },                                                                                                                                                                                                      │
│           "record": {                                                                                                                                                                                             │
│             "normalizer": "keyword_lowercase_normalizer",                                                                                                                                                         │
│             "type": "keyword"                                                                                                                                                                                     │
│           }                                                                                                                                                                                                       │
│         }                                                                                                                                                                                                         │
│       },                                                                                                                                                                                                          │
│       "name": {                                                                                                                                                                                                   │
│         "type": "text"                                                                                                                                                                                            │
│       }                                                                                                                                                                                                           │
│     }                                                                                                                                                                                                             │
│   }                                                                                                                                                                                                               │
│ }                                                                                                                                                                                                                 │
│                                                                                                                                                                                                                   │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
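
For readers unfamiliar with mapping inference, here is a minimal sketch of how a mapping shaped like the one above could be derived from sample documents. It is not the biothings implementation: the function names (`infer_properties`, `_merge`, `_leaf_mapping`) are hypothetical, and the rule that strings containing whitespace become `text` while other strings become normalized `keyword` fields is an assumption made to reproduce the output shown.

```python
# Hypothetical sketch, NOT the biothings inspector: infer an
# Elasticsearch-style "properties" mapping from sample documents.

KEYWORD = {"normalizer": "keyword_lowercase_normalizer", "type": "keyword"}


def _leaf_mapping(value):
    """Map a scalar value to a field definition."""
    if isinstance(value, bool):  # bool must be checked before int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Assumption: free text (contains whitespace) is indexed as "text",
        # identifier-like strings as a lowercase-normalized keyword.
        if any(ch.isspace() for ch in value):
            return {"type": "text"}
        return dict(KEYWORD)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def _merge(properties, doc):
    """Merge the fields of one (sub)document into `properties` in place."""
    for key, value in doc.items():
        # Elasticsearch mappings do not distinguish a value from a list of
        # such values, so lists are flattened and each item is inspected.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item is None:
                continue
            if isinstance(item, dict):
                child = properties.setdefault(key, {"properties": {}})
                _merge(child.setdefault("properties", {}), item)
                continue
            inferred = _leaf_mapping(item)
            current = properties.get(key)
            # Once any sample looks like free text, "text" wins over "keyword".
            if current is None or (
                current.get("type") == "keyword" and inferred["type"] == "text"
            ):
                properties[key] = inferred


def infer_properties(documents):
    """Return the inferred "properties" mapping for an iterable of documents."""
    properties = {}
    for doc in documents:
        _merge(properties, doc)
    return properties
```

Feeding it a made-up record such as `{"is_effective_for": [{"umls": "C0000000", "name": "example condition", "source": {"name": "SRC", "record": "r1"}}]}` yields nested `properties` blocks with `keyword` fields for `umls`, `source.name`, and `source.record`, and a `text` field for `name`.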

@ctrl-schaff
Contributor Author

@newgene alright, I think everything has been cleaned up since our last discussion. We now have only the dataplugin command, so try evaluating it as both a single data plugin and a hub, depending on the execution folder.

@ctrl-schaff
Contributor Author

❯ biothings-cli dataplugin

 Usage: biothings-cli dataplugin [OPTIONS] COMMAND [ARGS]...

 CLI tool for locally evaluating a biothings dataplugin. Allows for simple querying and data inspection.
    ✨ Run from an existing data plugin folder to evaluate a singular data plugin.
    ✨ Run from a parent folder containing multiple data plugins to operate like a hub.
    👉 Set BTCLI_RICH_TRACEBACK=1 ENV variable to enable full and pretty-formatted tracebacks,
    👉 Set BTCLI_DEBUG=1 to enable even more debug logs for debugging purpose.
    👉 Include a config.py at the working directory to override the default biothings.config settings.
    🚀💥💖

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help  -h        Show this message and exit.                                                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ create            Create a new data plugin from a pre-defined template                                                                                       │
│ dump              Download the source data files to the local file system                                                                                    │
│ upload            Parse the downloaded data files from the dump operation and upload to the source database                                                  │
│ dump_and_upload   Sequentially execute the dump and upload commands                                                                                          │
│ list              List dumped files, uploaded sources, or internal hubdb contents                                                                            │
│ inspect           Derive detailed information about the document data structure from the parsed documents                                                    │
│ serve             Run a simple API server for serving documents from the source database                                                                     │
│ clean             Delete all dumped files and/or drop uploaded sources tables                                                                                │
│ index             (experimental) Create an elasticsearch index from a data source database                                                                   │
│ validate          (experimental) Validate a provided manifest file via JSONSchema                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

New layout for the biothings-cli dataplugin command
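
To illustrate the two modes described in the help text (single data plugin versus hub-like parent folder), here is a rough sketch of how the working directory could be classified. This is an assumption-laden illustration, not the actual `CLIAssistant` logic: the `resolve_mode` and `has_manifest` helpers are hypothetical, and detecting a plugin by the presence of a `manifest.json` or `manifest.yaml` file is an assumption.

```python
# Hypothetical sketch, NOT the CLIAssistant implementation: decide whether
# the working directory is one data plugin or a parent folder of plugins.
from pathlib import Path

# Assumption: a data plugin folder is recognized by its manifest file.
MANIFEST_NAMES = ("manifest.json", "manifest.yaml")


def has_manifest(directory):
    """Return True if `directory` directly contains a manifest file."""
    return any((directory / name).is_file() for name in MANIFEST_NAMES)


def resolve_mode(working_directory):
    """Classify a directory as ("plugin" | "hub" | "unknown", plugin_names)."""
    working_directory = Path(working_directory).resolve()
    if has_manifest(working_directory):
        # Run from inside a plugin folder: operate on that single plugin.
        return "plugin", [working_directory.name]
    plugins = sorted(
        child.name
        for child in working_directory.iterdir()
        if child.is_dir() and has_manifest(child)
    )
    if plugins:
        # Run from a parent folder: treat each child plugin as a hub source.
        return "hub", plugins
    return "unknown", []
```

Under these assumptions, in hub mode a command would additionally need a plugin name to select which of the discovered plugins to operate on.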

@ctrl-schaff ctrl-schaff merged commit ffd239c into 1.0.x Apr 28, 2025
0 of 13 checks passed
@ctrl-schaff ctrl-schaff deleted the enhance-cli-index-command branch April 28, 2025 23:11
shuchenliu pushed a commit that referenced this pull request May 29, 2025
* Change build command name to index

* Merge the manifest, dataplugin, and hub cli

* Move the number truncation function to doc_inspect

* Leverage the hub JobManager in the cli tooling

* Add JobManager settings for cli tooling

* Operations update ...

Integrates the JobManager into the dump method to leverage
`dump_src` which is the same as the hub server endpoint.

Adds support for the parallel uploader, as the previous iteration
couldn't support plugins with a parallel uploader

Update the index command to build a mapping if one isn't found
from the uploader stage (likely because one doesn't exist or isn't
hard-coded for the plugin)

* Break up the `process_inspect` command

* Add address and get_conn to the sqlite3 client

* Fix the method calls for inspection

* Update the empty metadata check

* Add console logging for index command

* Update the docstrings and help messages for commands

* Change the default log level from INFO to WARNING in cli

* Improve consistency across CLI operation arguments

* Change batch_limit default size

* Improve batch_limit help text

* Normalize the bounding box for the output messages

* Replace the CLIJobManager with the JobManager

* Update the CLIAssistant for handling directories

* Fix default uploader database address assignment

* Update our dataplugin help message

* Combine the validate and schema commands

* Remove the cli sync/async function runner

* Create async loop for manifest validation

* Fix plugin_name generation with dump command

* Push plugin_name resolution entirely to CLIAssistant

* Add data_directory when building managers in CLIAssistant

* Suppress import error for aiocron in builder

* Simplify the BaseManager constructor arguments

---------

Co-authored-by: jschaff <[email protected]>
Co-authored-by: Chunlei Wu <[email protected]>