
An error occurred while analyzing the dataset (out-of-memory) #1328

Closed
Youssef1313 opened this issue Nov 23, 2023 · 4 comments
Youssef1313 commented Nov 23, 2023

I'm analyzing a simple 1.5 MB zip file consisting of Java files, but I'm getting an out-of-memory error.

rien (Member) commented Nov 23, 2023

Hi @Youssef1313, how many files are you analyzing? How big are they?

Do you get this problem with the web server? Or using the CLI?

Youssef1313 (Author)

@rien Using the web server. 1461 Java files.

rien (Member) commented Nov 23, 2023

I've taken a closer look at the logs, and it seems the submissions you want to analyze contain multiple files per submission.

We currently don't support grouping files together, so each file is compared with every other file instead of with other groups; with 1461 files, that's over a million pairwise comparisons. This requires a lot of memory and would give sub-par results.

In the meantime (until #1121 is implemented), you could try concatenating all files that belong to the same submission into one file (if you're using Linux: `cat **/*.java > combined.java`).
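
A minimal sketch of doing this per submission, assuming a hypothetical layout with one subdirectory per submission under `submissions/`:

```bash
#!/usr/bin/env bash
# Hypothetical layout: submissions/<name>/ holds each submission's files.
shopt -s globstar nullglob   # ** matches recursively; empty globs expand to nothing
mkdir -p combined
for dir in submissions/*/; do
  name=$(basename "$dir")
  # Merge every .java file in this submission into a single file,
  # so the analyzer sees one document per submission.
  cat "$dir"**/*.java > "combined/$name.java"
done
```

Note that `**` needs bash's globstar option (enabled above); on other shells, `find "$dir" -name '*.java' -exec cat {} +` achieves the same.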

Let me know if you need more help with this.

Youssef1313 (Author)

Great. Thanks!
