Feature Request: Dealing with "HUGE" single SVN Revisions #3

Open
ThomasMannIT opened this issue Oct 2, 2013 · 1 comment
@ThomasMannIT

Hello, your SVN River is working great.

While testing it, I ran into the following situation:

In one of the SVN repositories I want to index, there is a single revision in which a hero committed 6 GB of data (not binary, but plaintext SQL dumps -.-).

Reducing the bulk_size option to 1 doesn't help, because it is still a single revision that has to be indexed. Indexing 6 GB of data leads to out-of-heap exceptions even on an 8 GB machine with 7 GB of heap space.

For now I have worked around the problem by letting the river index up to revision x-1 and then setting start_revision to x+1.
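
The workaround amounts to splitting the revision range around the problematic revision. As a rough illustration only, here is a minimal Python sketch; `split_revision_ranges` is a hypothetical helper, not something the river provides, and the revision numbers are made up:

```python
def split_revision_ranges(first, last, skipped):
    """Return inclusive (start, end) ranges covering first..last
    while leaving out every revision listed in `skipped`."""
    ranges = []
    start = first
    for rev in sorted(set(skipped)):
        if rev < first or rev > last:
            continue  # ignore revisions outside the requested window
        if start <= rev - 1:
            ranges.append((start, rev - 1))  # index everything up to x-1
        start = rev + 1  # resume at x+1
    if start <= last:
        ranges.append((start, last))
    return ranges


# Hypothetical example: revision 5 holds the huge commit.
print(split_revision_ranges(1, 10, [5]))
```

Each resulting range would correspond to one indexing run, with start_revision set to the first revision of the range.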

But this workaround doesn't feel right.

Maybe some new "river options" could help:

Like:

a) max_bulk_size_in_mb
b) File-Extension Filters
c) Folder Filters
d) Revision Filters
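
To make options b) to d) a bit more concrete, here is a minimal Python sketch of what such a filter check could look like. Everything in it is an assumption for illustration: the `FilterOptions` fields and the `should_index_content` function are hypothetical names and do not exist in the river.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FilterOptions:
    # Hypothetical option names, for illustration only.
    excluded_extensions: frozenset = frozenset()   # e.g. {".sql"}
    excluded_folders: tuple = ()                   # e.g. ("/trunk/dumps",)
    excluded_revisions: frozenset = frozenset()    # e.g. {42}


def should_index_content(path, revision, options):
    """Return False if the file at `path` in `revision` matches any filter."""
    p = PurePosixPath(path)
    if revision in options.excluded_revisions:
        return False
    if p.suffix.lower() in options.excluded_extensions:
        return False
    for folder in options.excluded_folders:
        # Match whole path components, so "/trunk/dumps" does not
        # accidentally match "/trunk/dumpsters".
        if PurePosixPath(folder) in p.parents:
            return False
    return True


opts = FilterOptions(
    excluded_extensions=frozenset({".sql"}),
    excluded_folders=("/trunk/dumps",),
    excluded_revisions=frozenset({42}),
)
print(should_index_content("/trunk/src/Main.java", 7, opts))
print(should_index_content("/trunk/db/backup.SQL", 7, opts))
```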

@plombard
Owner

plombard commented Oct 3, 2013

Hi, and thanks for the feedback.
It should be fairly simple to implement some filters and a max_size option; I'll get to it as soon as I can.
On the other hand, I'm not fond of filtering entire revisions or folders. They should be there, so the history of the repository can still be browsed.
As the resulting index is far from sufficient to browse the repositories easily, it leaves a large amount of functionality to be implemented in the front-end (if you want something like ViewSVN). So I think having a trace of every revision/change is mandatory. The content, however, isn't. So we could replace the 6 GB of text with just a warning message.
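
A minimal Python sketch of that idea, keeping the metadata but swapping oversized content for a warning. The function name, the document fields, and the `max_content_bytes` parameter are all assumptions made for illustration, not the river's actual document layout:

```python
def build_file_document(path, revision, size_bytes, read_content, max_content_bytes):
    """Build the document for one changed file.

    `read_content` is a zero-argument callable, so the content is only
    loaded into memory when it is small enough to be indexed.
    """
    doc = {"path": path, "revision": revision, "size_bytes": size_bytes}
    if size_bytes > max_content_bytes:
        # Keep a trace of the change, but not the content itself.
        doc["content"] = None
        doc["warning"] = (
            f"content omitted: {size_bytes} bytes exceeds "
            f"the limit of {max_content_bytes} bytes"
        )
    else:
        doc["content"] = read_content()
    return doc


# Hypothetical usage with a 1 MB limit.
doc = build_file_document(
    "/trunk/dump.sql", 1234, 6 * 1024**3, lambda: "never read", 1024**2
)
print(doc["warning"])
```

The revision and path still show up in the index, so the history stays browsable even though the file body is missing.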

And you are absolutely right, heap consumption is a concern, as I foolishly load the entire revision (content included) into memory. Maybe I'll try to index the file content separately from the metadata, I don't know yet.
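
One possible direction, sketched in Python under stated assumptions: treat the revision metadata and each file's content as separate documents, and group them into bulk requests bounded by size rather than by revision count (roughly what a max_bulk_size_in_mb option would do). `batch_by_size` is a hypothetical helper, and the sizes below are made up:

```python
def batch_by_size(documents, max_batch_bytes):
    """Group (document, size_in_bytes) pairs into batches whose total size
    stays at or below max_batch_bytes. A single document larger than the
    limit is emitted alone in its own batch."""
    batch, batch_bytes = [], 0
    for doc, size in documents:
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


# Hypothetical revision: one metadata document followed by three file contents.
docs = [("metadata", 1), ("a.sql", 4), ("b.sql", 4), ("c.sql", 10)]
for batch in batch_by_size(docs, max_batch_bytes=8):
    print(batch)
```

Because this is a generator, only one batch is held in memory at a time, provided the input is itself produced lazily. It does not help with a single file that is larger than the heap; that case would still need a size limit on the content itself.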

@ghost ghost assigned plombard Oct 3, 2013