Hello, your SVN River is working great. While testing it I ran into the following situation:
In an SVN repository that is to be indexed, there is a revision in which a hero committed 6 GB of data (not binary, but plaintext SQL dumps -.-).
Reducing the bulk_size option down to 1 doesn't help, since it is still a single revision that has to be indexed, and indexing 6 GB of data leads to OutOfMemoryError (heap space) even on an 8 GB machine with a 7 GB heap.
At the moment I avoid the problem by letting the river index up to revision x-1 and then defining a start_revision of x+1, roughly as sketched below.
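For illustration, re-registering the river for the second half of the split might look something like the following _meta document (a sketch only: the type name, the URL, and the repos field name are placeholders; only start_revision is taken from the description above, with 1235 standing in for x+1):

```json
{
  "type": "svn",
  "svn": {
    "repos": "http://example.com/svn/repo",
    "start_revision": 1235
  }
}
```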
But this workaround doesn't feel right.
Maybe some new "river options" could help (see the sketch after this list), like:
a) max_bulk_size_in_mb
b) File-Extension Filters
c) Folder Filters
d) Revision Filters
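To make the suggestion concrete, the river configuration could then grow into something like this (purely hypothetical: none of these options exist yet, and every field name below except bulk_size is made up):

```json
{
  "type": "svn",
  "svn": {
    "repos": "http://example.com/svn/repo",
    "bulk_size": 200,
    "max_bulk_size_in_mb": 50,
    "excluded_extensions": [".sql", ".dump"],
    "excluded_folders": ["/trunk/dumps"],
    "excluded_revisions": [1234]
  }
}
```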
Hi, and thanks for the feedback.
It should be fairly simple to implement some filters and a max_size option; I'll get to it as soon as I can.
On the other hand, I'm not fond of filtering out entire revisions or folders. They should be there so the history of the repository can still be browsed.
Since the resulting index is far from sufficient for browsing the repositories easily, a huge amount of functionality is left to the front-end (if you want something like ViewSVN). So I think having a trace of every revision/change is mandatory. The content, however, isn't, so we could replace the 6 GB of text with just a warning message.
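For instance, the indexer could check the entry size up front and fall back to a short warning string, along these lines (a sketch only, assuming the river reads content through SVNKit's SVNRepository; the class, the method, and MAX_CONTENT_BYTES are illustrative, not actual plugin code):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.tmatesoft.svn.core.SVNDirEntry;
import org.tmatesoft.svn.core.SVNException;
import org.tmatesoft.svn.core.io.SVNRepository;

public class ContentFetcher {
    // Hypothetical cut-off; could be wired to a max_bulk_size_in_mb option.
    private static final long MAX_CONTENT_BYTES = 50L * 1024 * 1024;

    // Returns the file content, or a short warning when the file is too
    // large to be loaded into the heap safely.
    static String fetchContentOrWarning(SVNRepository repository, String path,
                                        long revision) throws SVNException {
        // info() only fetches the entry's metadata, so the size check is cheap.
        SVNDirEntry entry = repository.info(path, revision);
        if (entry != null && entry.getSize() > MAX_CONTENT_BYTES) {
            return "[content skipped: " + entry.getSize()
                    + " bytes exceeds the configured limit]";
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        repository.getFile(path, revision, null, out);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```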
And you are absolutely right, heap consumption is a concern, as I foolishly load the entire revision (content included) into memory. Maybe I'll try to index the file content separately from the metadata; I don't know yet.
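One possible shape for that, again as a sketch of the idea rather than the plugin's actual code: walk a revision's changed paths once and index each file's metadata and content as separate documents, so only one file's content is ever held in memory at a time (indexMetadata and indexContent are hypothetical helper stubs):

```java
import java.util.Map;
import org.tmatesoft.svn.core.SVNException;
import org.tmatesoft.svn.core.SVNLogEntry;
import org.tmatesoft.svn.core.SVNLogEntryPath;

public class RevisionIndexer {

    // Walk a revision's changes one file at a time. Metadata is always
    // indexed, so the revision stays visible in the history even when a
    // single oversized file's content is skipped.
    @SuppressWarnings("unchecked")
    void indexRevision(SVNLogEntry logEntry) throws SVNException {
        Map<String, SVNLogEntryPath> changes = logEntry.getChangedPaths();
        for (SVNLogEntryPath change : changes.values()) {
            indexMetadata(logEntry.getRevision(), change);
            if (change.getType() != SVNLogEntryPath.TYPE_DELETED) {
                indexContent(change.getPath(), logEntry.getRevision());
            }
        }
    }

    // Hypothetical helpers: in a real river these would add bulk items,
    // with indexContent using a size guard like the one sketched above.
    void indexMetadata(long revision, SVNLogEntryPath change) { /* ... */ }
    void indexContent(String path, long revision) throws SVNException { /* ... */ }
}
```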