Harden arXiv retrieval against batch API failures in `calculate-and-send` by reoLantern · Pull Request #268 · TideDra/zotero-arxiv-daily

reoLantern · 2026-06-30T02:41:13Z

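The arxiv Python library received an HTTP 406 when requesting https://export.arxiv.org/api/query?...id_list=..., which caused the job to exit immediately.

This PR implements "rate limiting + batching + retries" inside ArxivRetriever, and changes the 406 case to degrade gracefully instead of exiting:

Batching: request the arXiv API in batches of 20 paper IDs.
Rate limiting: sleep(3) between batches; sleep(1) between papers after falling back to per-paper requests.
Retries:
- arxiv.Client(num_retries=10, delay_seconds=10) keeps the library's built-in retries;
- a batch-level retry is added for 429 (up to 5 attempts, 30s linear backoff).
406/other HTTP errors: when a batch request fails, fall back to per-paper requests; IDs that still fail individually are only logged as a warning and skipped, so the workflow no longer fails outright.

I tested it, and normal runs are not affected.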
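
The control flow described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `FetchError`, `fetch_resilient`, and the injected `fetch` and `sleep` callables are hypothetical stand-ins (the real change calls `arxiv.Client.results` and catches `arxiv.HTTPError`), and the defaults simply mirror the numbers quoted in the description.

```python
from time import sleep as default_sleep
from typing import Callable, List, Sequence


class FetchError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def fetch_resilient(
    ids: Sequence[str],
    fetch: Callable[[List[str]], List[str]],
    batch_size: int = 20,
    max_batch_retries: int = 5,
    batch_retry_delay: float = 30,
    sleep: Callable[[float], None] = default_sleep,
) -> List[str]:
    """Fetch ids in batches; retry 429s, otherwise fall back to single ids."""
    results: List[str] = []
    for start in range(0, len(ids), batch_size):
        batch_ids = list(ids[start:start + batch_size])
        for attempt in range(max_batch_retries):
            try:
                results.extend(fetch(batch_ids))
                break
            except FetchError as exc:
                # 429: wait with linear backoff and retry the whole batch,
                # unless this was already the last allowed attempt.
                if exc.status == 429 and attempt < max_batch_retries - 1:
                    sleep(batch_retry_delay * (attempt + 1))
                    continue
                # Any other status (or exhausted 429 retries): degrade to
                # one request per id and skip ids that still fail.
                for index, paper_id in enumerate(batch_ids):
                    try:
                        results.extend(fetch([paper_id]))
                    except FetchError:
                        pass  # the real code logs a warning here
                    if index + 1 < len(batch_ids):
                        sleep(1)
                break
        # Throttle between batches, but not after the last one.
        if start + batch_size < len(ids):
            sleep(3)
    return results
```

Injecting `sleep` as a parameter is a choice made for this sketch so that the delays can be recorded or skipped in tests; the PR itself calls `sleep` directly and patches it with `monkeypatch` in the test.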


Copilot

Pull request overview

This PR hardens ArxivRetriever against arXiv batch API failures (notably HTTP 406 / 429) by batching ID lookups, rate-limiting between requests, retrying 429s at the batch level, and degrading to per-paper requests when a batch request fails—skipping unretrievable papers with warnings rather than failing the workflow.

Changes:

Batch arXiv API queries in chunks of 20 paper IDs, with inter-batch sleep.
Add batch-level retry/backoff for HTTP 429 and per-paper fallback for other HTTP errors (including 406), skipping IDs that still fail.
Add a pytest case to validate the batch-error → per-paper fallback behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`src/zotero_arxiv_daily/retriever/arxiv_retriever.py`	Adds batching, throttling, 429 retry/backoff, and per-paper fallback/skip logic for resilient arXiv retrieval.
`tests/retriever/test_arxiv_retriever.py`	Adds a test to ensure batch HTTP errors trigger per-paper fallback and skipped IDs are warned/omitted.

                try:
                    batch = list(client.results(search))
                    bar.update(len(batch))
                    raw_papers.extend(batch)


+                    batch = []
+                    for index, paper_id in enumerate(batch_ids):
+                        try:
+                            batch.extend(list(client.results(arxiv.Search(id_list=[paper_id]))))
+                        except arxiv.HTTPError as paper_exc:
+                            logger.warning(
+                                f"Skipping arXiv paper {paper_id} due to API error status {paper_exc.status}"
+                            )
+                        if index + 1 < len(batch_ids):
+                            sleep(1)
+                    bar.update(len(batch))
+                    raw_papers.extend(batch)
+                    break


+                    if exc.status == 429:
+                        if attempt < max_batch_retries - 1:
+                            wait = batch_retry_delay * (attempt + 1)
+                            logger.warning(
+                                f"arXiv API 429 on batch {i // 20}, "
+                                f"retry {attempt + 1}/{max_batch_retries} in {wait}s"
+                            )
+                            sleep(wait)
+                            continue
+                        logger.warning(
+                            f"arXiv API 429 on batch {i // 20} after {max_batch_retries} retries. "
+                            "Falling back to per-paper requests."
+                        )
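
The 429 branch above waits `batch_retry_delay * (attempt + 1)` seconds and only sleeps while `attempt < max_batch_retries - 1`. The sketch below works out the resulting schedule. The helper name `linear_backoff_waits` is hypothetical, and the values 5 and 30 are assumed from the PR description ("up to 5 attempts, 30s linear backoff"), since the diff excerpt does not show where they are defined.

```python
def linear_backoff_waits(max_batch_retries: int = 5, batch_retry_delay: int = 30) -> list:
    """Return the waits (in seconds) taken between batch attempts.

    The final attempt does not sleep: it falls through to the
    per-paper fallback instead, so there is one fewer wait than attempts.
    """
    return [
        batch_retry_delay * (attempt + 1)
        for attempt in range(max_batch_retries - 1)
    ]


waits = linear_backoff_waits()
print(waits)       # [30, 60, 90, 120]
print(sum(waits))  # 300
```

Under these assumed values, a batch that keeps returning 429 spends 300 seconds sleeping between attempts before the per-paper fallback starts, not counting the retries done inside `arxiv.Client` itself.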


+
+
+def test_arxiv_retriever_falls_back_to_per_paper_on_batch_http_error(config, mock_feedparser, monkeypatch):
+    monkeypatch.setattr("zotero_arxiv_daily.retriever.base.sleep", lambda _: None)


Copilot AI and others added 3 commits June 30, 2026 01:11

Initial plan

027f994

fix: make arxiv retrieval resilient to batch API errors

e2ce658

Merge pull request #1 from reoLantern/copilot/fix-calculate-and-send-job

77e7718


Copilot AI review requested due to automatic review settings June 30, 2026 02:41

Copilot started reviewing on behalf of reoLantern June 30, 2026 02:41

Copilot AI reviewed Jun 30, 2026


chore: update keep-alive timestamp [skip ci]

8b16c65

reoLantern wants to merge 4 commits into TideDra:main from reoLantern:main


3 participants


