https://github.com/nasa/python_cmr can return a STAC feature collection very quickly (742 items in ~500ms)
from cmr import GranuleQuery
import json
import requests
%%time
# https://github.com/nasa/python_cmr search is very fast
# https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
api = GranuleQuery()
search = api.parameters(
    bounding_box=(-122.0, 46.7, -121.5, 47),
    temporal=('2024-01-01', '2026-12-31'),
    collection_concept_id='C2074860916-LPCLOUD'  # Ecostress Soil Moisture short name = ECO_L3T_SM
)
items = search.format("stac").get()
# GeoJSON Feature Collection
features = json.loads(items[0])
len(features['features'])
But the same search against this endpoint takes over an order of magnitude longer (~14s)
# limit > 100 has no effect. Max returns per page=100
url = 'https://cmr.earthdata.nasa.gov/cloudstac/LPCLOUD/search?limit=200&bbox=-122.0%2C46.7%2C-121.5%2C47.0&datetime=2024-01-01T00%3A00%3A00Z%2F2026-12-31T23%3A59%3A59Z&collections=ECO_L3T_SM_002'
all_features = []
page = 1
while url:
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    features = data.get('features', [])
    all_features.extend(features)
    print(f"Page {page}: got {len(features)} features (total so far: {len(all_features)})")
    # Find the 'next' link to continue paging, or stop if there isn't one
    url = next((link['href'] for link in data.get('links', []) if link.get('rel') == 'next'), None)
    page += 1
print(f"\nTotal features retrieved: {len(all_features)}")
I think this is partly due to a max paging of 100 items per page in nasa/cmr-stac vs nasa/python_cmr using the CMR default max of 2000...
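As a rough back-of-the-envelope check on that guess, the request counts implied by the two page sizes can be worked out from the numbers above. This is only a sketch: `pages_needed` is a hypothetical helper, and it assumes the pages are fetched sequentially and that the ~14s is spread evenly across them, which may not be true.

```python
import math

def pages_needed(total_items: int, page_size: int) -> int:
    """Number of sequential requests needed to fetch total_items (hypothetical helper)."""
    return math.ceil(total_items / page_size)

total = 742  # item count reported above

stac_pages = pages_needed(total, 100)   # 100-item page cap observed with cmr-stac
cmr_pages = pages_needed(total, 2000)   # 2000-item CMR page size mentioned above

print(stac_pages, cmr_pages)            # 8 1

# Assumption: the ~14 s total is split evenly over the cmr-stac pages
print(round(14 / stac_pages, 2))        # 1.75 (seconds per page)
```

Under those assumptions each cmr-stac page would cost well over a second, compared with ~500ms for the single python_cmr request, so paging alone may not account for the whole gap.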
It would be very helpful to have a comparison of how the implementation and results from nasa/cmr-stac differ from the STAC output format described in the official CMR docs (https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#stac). Is this difference in speed expected? How do the collection names, asset names, and other fields differ?
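To make the field comparison concrete, something like the following could be run on one item from each source. It is a minimal sketch: `compare_features` is a hypothetical helper, and the two sample dicts are made-up placeholders standing in for real items, not actual output from either API.

```python
def compare_features(a: dict, b: dict) -> dict:
    """Summarize how two STAC-like feature dicts differ (hypothetical helper)."""
    def keys(feature: dict, field: str) -> set:
        # Treat a missing or null field as an empty mapping
        return set((feature.get(field) or {}).keys())

    return {
        "collection": (a.get("collection"), b.get("collection")),
        "assets_only_in_a": sorted(keys(a, "assets") - keys(b, "assets")),
        "assets_only_in_b": sorted(keys(b, "assets") - keys(a, "assets")),
        "properties_only_in_a": sorted(keys(a, "properties") - keys(b, "properties")),
        "properties_only_in_b": sorted(keys(b, "properties") - keys(a, "properties")),
    }

# Made-up placeholder items, NOT real API output
item_a = {"collection": "A", "assets": {"data": {}}, "properties": {"datetime": "x"}}
item_b = {"collection": "B", "assets": {"data": {}, "browse": {}}, "properties": {"datetime": "x"}}

report = compare_features(item_a, item_b)
print(report["collection"])        # ('A', 'B')
print(report["assets_only_in_b"])  # ['browse']
```

With both results in hand, the first item from each search could be passed in, provided the two results are kept under separate names (the paging loop above reuses `features`).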
related discussion: earthaccess-dev/earthaccess#221 (reply in thread)