Item search is very slow #411

@scottyhq

https://github.com/nasa/python_cmr can return a STAC feature collection very quickly (742 items in ~500ms)

from cmr import GranuleQuery
import json
import requests

# Run the rest in its own notebook cell: %%time must be the first line of a cell
%%time
# https://github.com/nasa/python_cmr search is very fast
# https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
api = GranuleQuery()
search = api.parameters(
    bounding_box=(-122.0, 46.7, -121.5, 47), 
    temporal=('2024-01-01','2026-12-31'),
    collection_concept_id='C2074860916-LPCLOUD'  # Ecostress Soil Moisture short name = ECO_L3T_SM
)
items = search.format("stac").get()
# GeoJSON Feature Collection
features = json.loads(items[0]) 
len(features['features'])

But using the cmr-stac endpoint takes more than an order of magnitude longer (~14s)

# limit > 100 has no effect. Max items returned per page = 100
url = 'https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD/search?limit=200&bbox=-122.0%2C46.7%2C-121.5%2C47.0&datetime=2024-01-01T00%3A00%3A00Z%2F2026-12-31T23%3A59%3A59Z&collections=ECO_L3T_SM_002'

all_features = []
page = 1
while url:
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    features = data.get('features', [])
    all_features.extend(features)
    print(f"Page {page}: got {len(features)} features (total so far: {len(all_features)})")

    # Find the 'next' link to continue paging, or stop if there isn't one
    url = next((link['href'] for link in data.get('links', []) if link.get('rel') == 'next'), None)
    page += 1

print(f"\nTotal features retrieved: {len(all_features)}")

I think this is partly because nasa/cmr-stac caps paging at 100 items per page, whereas nasa/python_cmr uses the CMR default max of 2000...
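A rough back-of-the-envelope check of that guess, using only the numbers reported above (742 items, 100 vs 2000 items per page, ~14s total). It assumes the number of sequential requests is what dominates the runtime, which I have not verified; `pages_needed` is just a helper name made up for this sketch.

```python
import math


def pages_needed(total_items: int, page_size: int) -> int:
    """Number of sequential requests needed to page through total_items."""
    return math.ceil(total_items / page_size)


total_items = 742       # items returned by the search above
stac_total_seconds = 14.0  # approximate wall time reported for cmr-stac

stac_pages = pages_needed(total_items, 100)   # cmr-stac page cap
cmr_pages = pages_needed(total_items, 2000)   # CMR search page size

# Average time per cmr-stac page, assuming the time is spread evenly
avg_seconds_per_stac_page = stac_total_seconds / stac_pages

print(f"cmr-stac requests: {stac_pages}")
print(f"CMR search requests: {cmr_pages}")
print(f"avg seconds per cmr-stac page: {avg_seconds_per_stac_page:.2f}")
```

If the assumption holds, each cmr-stac page still averages well over a second, compared with ~500ms for the single CMR request, so page size alone may not explain the whole gap.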

It would be very helpful to have a comparison of how the implementation and results from nasa/cmr-stac differ from the STAC output format described in the official CMR docs (https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#stac). Is this difference in speed expected? How do the collection names, asset names, and other fields differ?
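For the field comparison, something like the following could be a starting point: take one feature from each source and list which keys appear in only one of them. This is only a sketch; `diff_feature_fields` is a hypothetical helper, and the two features below are made-up placeholders, not real responses from either API.

```python
def diff_feature_fields(a: dict, b: dict) -> dict:
    """Report which keys appear in only one of two STAC features."""
    sections = {
        "top_level": lambda f: f,
        "properties": lambda f: f.get("properties", {}),
        "assets": lambda f: f.get("assets", {}),
    }
    report = {}
    for label, getter in sections.items():
        keys_a, keys_b = set(getter(a)), set(getter(b))
        report[label] = {
            "only_in_a": sorted(keys_a - keys_b),
            "only_in_b": sorted(keys_b - keys_a),
        }
    # Collection identifiers side by side, since naming may differ
    report["collection"] = (a.get("collection"), b.get("collection"))
    return report


# Placeholder features (invented for illustration only)
feature_a = {
    "type": "Feature",
    "collection": "collection-a",
    "properties": {"datetime": "2024-01-01T00:00:00Z"},
    "assets": {"data": {}, "browse": {}},
}
feature_b = {
    "type": "Feature",
    "collection": "collection-b",
    "stac_version": "1.0.0",
    "properties": {"datetime": "2024-01-01T00:00:00Z", "start_datetime": "2024-01-01T00:00:00Z"},
    "assets": {"data": {}, "metadata": {}},
}

report = diff_feature_fields(feature_a, feature_b)
for section, diff in report.items():
    print(section, diff)
```

With real data, `feature_a` would be `json.loads(items[0])['features'][0]` from the python_cmr search and `feature_b` would be `all_features[0]` from the cmr-stac loop above.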

related discussion: earthaccess-dev/earthaccess#221 (reply in thread)
