Gemini is frequently throwing 503 "The model is overloaded" errors. I have purposely broken the langextract task into chunks of 4000, so presumably the 503 occurred on a single chunk. In that scenario, langextract should internally retry that chunk and continue processing the document; it makes no sense to abandon all of the work already done because of a transient load issue. Having to redo entire documents eats up quota and then triggers 429 errors.
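
To illustrate the behavior I'd expect, here is a minimal caller-side sketch (not langextract internals): retry only the failing chunk with exponential backoff when the model reports it is overloaded. The chunk list, the `extract_chunk` callable, and the assumption that the 503 surfaces as an exception whose message contains "503"/"overloaded" are all hypothetical placeholders for illustration.

```python
import random
import time


def retry_on_overload(call, max_attempts=5, base_delay=2.0):
    """Call `call()` and retry with exponential backoff on 503/overload errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:  # assumed: the 503 surfaces as an exception here
            message = str(exc)
            overloaded = "503" in message or "overloaded" in message.lower()
            if not overloaded or attempt == max_attempts:
                raise  # not a load issue, or out of retries: give up
            # Exponential backoff with jitter so retried chunks don't stampede.
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
            time.sleep(delay)


def process_document(chunks, extract_chunk):
    """Process chunks independently so one overloaded chunk does not
    force the whole document to be redone (both arguments are hypothetical)."""
    results = []
    for chunk in chunks:
        results.append(retry_on_overload(lambda c=chunk: extract_chunk(c)))
    return results
```

If langextract did something equivalent per chunk internally, a single overloaded response would cost one retried request instead of a full re-run of the document.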