Memory leak on writes/merges #2522
Comments
Can you change your script as follows: sleep for 30 seconds first, then create the table, sleep for another 30 seconds, and only then start the loop of 50 writes? I think the slow increase in resident size might simply be because every write updates the table state at the end of the commit, and that state now includes more info.
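For anyone who wants to try the staged run suggested above, here is a minimal sketch of it. The helper names (`peak_rss`, `run_staged`) and the callback arguments are made up for illustration and are not part of deltalake. It samples peak RSS through the standard `resource` module, so it is Unix-only and the readings can only go up.

```python
import resource
import time
from typing import Callable, List, Tuple


def peak_rss() -> int:
    """Peak resident set size of this process (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_staged(
    create_table: Callable[[], None],
    write_once: Callable[[int], None],
    iterations: int = 50,
    settle_seconds: float = 30.0,
    sample: Callable[[], int] = peak_rss,
) -> List[Tuple[str, int]]:
    """Sleep, create the table, sleep again, then write in a loop.

    Returns one (label, memory reading) pair per stage, so the baseline,
    the cost of table creation, and the per-write growth can be told apart.
    """
    samples: List[Tuple[str, int]] = []

    # Stage 1: let the interpreter settle before touching the table.
    time.sleep(settle_seconds)
    samples.append(("baseline", sample()))

    # Stage 2: create the table, then wait so its cost is visible on its own.
    create_table()
    time.sleep(settle_seconds)
    samples.append(("after_create", sample()))

    # Stage 3: the write loop, sampled after every write.
    for i in range(iterations):
        write_once(i)
        samples.append((f"write_{i}", sample()))

    return samples
```

With deltalake, `create_table` could wrap an initial `write_deltalake(path, data)` and `write_once` could wrap `write_deltalake(path, data, mode="append")` or a merge. The two sleeps also show up as flat stretches in a memray graph, which makes the stages easier to pick out.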
Merge operations probably hold more info, so this looks normal to me.
@ion-elgreco The for the last checkpoint after running a script with 1000 merges resulting in the following memory increase: The memory increase seems much larger than the metadata.
@echai58 the checkpoint is compressed and also would never translate 1:1 from disk to mem afaik
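A toy illustration of that point, using only the standard library. The `actions` records are a hypothetical stand-in for checkpoint entries, and zlib stands in for whatever compression the checkpoint file actually uses. The library keeps its state in its own structures rather than Python dicts, so the real ratios will differ; this only shows that compressed size, serialized size, and in-memory size are three different numbers.

```python
import json
import sys
import zlib

# Hypothetical stand-in for the file-level entries stored in a checkpoint.
actions = [
    {"path": f"part-{i:05d}.parquet", "size": 1024 + i, "dataChange": True}
    for i in range(1000)
]

# Size when serialized, and when serialized then compressed.
raw = json.dumps(actions).encode("utf-8")
compressed = zlib.compress(raw)

# Rough estimate of the Python objects held in memory. Key strings are
# counted once per dict even though the interpreter shares them.
in_memory = sys.getsizeof(actions) + sum(
    sys.getsizeof(a) + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in a.items())
    for a in actions
)

print(f"compressed:         {len(compressed)} bytes")
print(f"serialized:         {len(raw)} bytes")
print(f"in memory (approx): {in_memory} bytes")
```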
@ion-elgreco profiling a script that just instantiates the same delta table gives ~13 MB, which is still much less than the >100 MB seen from the merge script.
Has there been any progress on this? I'm experiencing the same issue with merges.
I have a theory that this might be related to some of the performance issues that @roeap and I were hunting down this week. He's got some fixes in mind, after which we can look into this specific issue some more.
Hello, any update on this topic?
I think we have the same issue here.
There are a couple of features in main that have improved the memory usage:
@rtyler is doing some testing at the moment on the memory perf. The numbers are promising imho, even if it's a micro benchmark for normal appends. Before these PRs in main it was around 500-700MB memory usage. Now it's at a stable 500MB, and with the WIP PR #3196, which uses the LazyTableProvider, it's at 30-50MB ^^
Environment
Binding: python
Bug
What happened:
We're noticing constantly rising memory in our processes that write to deltalake. I wrote a minimal reproduction that loops and writes to deltalake, and the memory usage seems to indicate a memory leak.
What you expected to happen:
Memory to be reclaimed after writes.
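One rough way to tell steady growth from memory that rises during warm-up and then levels off is to fit a line to memory samples taken after each write. This is only a sketch: `growth_per_iteration` and `looks_like_leak` are hypothetical helpers, and the warm-up length and tolerance are assumptions you would need to tune for your own workload.

```python
from typing import Sequence


def growth_per_iteration(samples: Sequence[float]) -> float:
    """Least-squares slope of memory samples against their iteration index."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def looks_like_leak(
    samples: Sequence[float], warmup: int = 10, tolerance: float = 0.0
) -> bool:
    """True if memory still trends upward once the warm-up samples are dropped."""
    return growth_per_iteration(samples[warmup:]) > tolerance
```

A slope near zero after warm-up suggests memory is being reused, while a clearly positive slope that persists over many writes is what a leak would look like.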
How to reproduce it:
This is the script I tested with:
More details:
Here's the memray graph:
![image](https://private-user-images.githubusercontent.com/56415623/331712084-7c25fc24-3a6d-4448-a94f-bf29225c68cb.png)
I also tested this with just `write_deltalake(mode="append")`, and the issue also seems to persist:
![image](https://private-user-images.githubusercontent.com/56415623/331712177-ffaa5d00-8f14-479f-925a-53e83a4a5c46.png)
I saw #2068 and tried setting that env var, and got the following (it doesn't seem to help):
![image](https://private-user-images.githubusercontent.com/56415623/331712266-c383b5ea-73b7-455f-80bf-46b61c1aa089.png)