OOM when loading large JSON files in v2 #985

Open

anand-bala opened this issue Sep 21, 2023 · 4 comments

@anand-bala

Overview

I am trying to parse a ~2.5 GB JSON data file containing a list of lists of data (think Array of Array of Structs). Using the recommended approach of model_validate_json(f.read()) results in the OS SIGKILL-ing the process because it runs out of memory. In comparison, Python's json module parses the file effortlessly.
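
For context, the real titanium_data.Data model is not shown in this issue; a minimal hypothetical model with the same "list of lists of structs" shape might look like this (field names are invented):

from pydantic import BaseModel, RootModel

class Sample(BaseModel):
    timestamp: float
    value: float

# hypothetical stand-in for titanium_data.Data: a root-level
# list of lists of structs, since the real model is not shown
class Data(RootModel[list[list[Sample]]]):
    pass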

For a bit of detail, I profiled the code with memray using the snippets below, and am attaching the HTML flame graphs as TXT files for ease of use (and because GitHub doesn't allow HTML files as attachments but allows PPTX..).
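
For reference, each profile can be reproduced along these lines with memray's Tracker API; the output file name is illustrative, and the capture is rendered afterwards with the memray flamegraph command:

import json
import memray

# memray.Tracker records all allocations made inside the block
with memray.Tracker("test-json.bin"):
    with open("dataset.json") as f:
        data = json.load(f)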

I wasn't able to dig deeper into the issue (due to lack of time), but it may be related to #843; I could be very wrong (hence the new issue).

Vanilla json

import json

with open("dataset.json") as f:
    data = json.load(f)

memray-flamegraph-test-json.py.107113.html.txt

This approach uses about 8.8 GB of memory: ~6 GB for parsing and the rest for the string data buffer.
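
As a quick sanity check on these numbers without memray, peak resident memory can also be read from the standard resource module after the load; a minimal sketch:

import json
import resource

with open("dataset.json") as f:
    data = json.load(f)

# on Linux, ru_maxrss reports the process's peak RSS in KiB
peak_gib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2
print(f"peak RSS: {peak_gib:.1f} GiB")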

Pydantic recommended API

from titanium_data import Data

with open("dataset.json") as f:
    data = Data.model_validate_json(f.read())

memray-flamegraph-test-pydantic.py.131233.html.txt

This gets SIGKILLed by the OS after consuming ~23 GB to parse the 2.5 GB file.
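
One mitigation worth trying, though it only removes a copy rather than fixing the underlying blow-up: model_validate_json accepts bytes as well as str in pydantic v2, so reading the file in binary mode skips the intermediate decoded str. A sketch:

from titanium_data import Data

# reading as bytes avoids materialising a second, decoded
# copy of the 2.5 GB payload before parsing begins
with open("dataset.json", "rb") as f:
    data = Data.model_validate_json(f.read())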

Pydantic second approach

This uses the "non-recommended" approach from pydantic/pydantic#7323

import json

from titanium_data import Data

with open("./data/trajectories/scenario1/dataset.json") as f:
    data = json.load(f)
data = Data.model_validate(data)

memray-flamegraph-test-pydantic2.py.130581.html.txt

Interestingly enough, this method successfully parses the dataset, and does so much faster than the direct model_validate_json approach.

System Information

uname -srvmo

  • Linux 5.15.0-84-generic #93~20.04.1-Ubuntu SMP Wed Sep 6 16:15:40 UTC 2023 x86_64 GNU/Linux

Pydantic versions:

  • pydantic==2.3.0
  • -e git+https://github.com/pydantic/pydantic-core@c086caec1a200417f19850244282c06b5d4d1650#egg=pydantic_core
    • Equivalent to ==2.6.3
@adriangb
Member

Can you share the dataset, or a similar dataset, and the model in question?

@anand-bala
Author

I'm not sure if I can, but I'll check and let you know.

@samuelcolvin
Member

Yes, I've seen this too. A partial answer will be the new jiter JSON parser, though even that currently requires you to read the entire JSON into memory; still, maybe just one or two copies in memory is fine.
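
For illustration, assuming jiter's Python bindings and their from_json entry point, parsing straight from bytes would look roughly like this; the whole file is still read into memory once, as noted above:

import jiter

# assumes jiter exposes from_json(bytes); the full file is still
# held in memory once, but intermediate copies are avoided
with open("dataset.json", "rb") as f:
    data = jiter.from_json(f.read())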

@davidhewitt
Contributor

This is very likely also related to PyO3/pyo3#3382 / PyO3/pyo3#1056
