Allow exporting large dataset via getStatements #5333

Open
@odysa

Problem description

Currently, exporting a large dataset via getStatements causes an OutOfMemoryError. The serialized output exceeds the capacity of a ByteArrayOutputStream, whose backing array cannot grow beyond Integer.MAX_VALUE bytes.

Raising this to discuss with the RDF4J community how we can support large dataset export via the REST API.
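
To make the limit concrete, here is a small sketch of the arithmetic behind the error message in the trace below ("Required array length 2147483639 + 9 is too large"). The class and method names are hypothetical and only illustrate the size check; this is not the JDK's actual code.

```java
public class ArrayLimitDemo {

    // Hypothetical helper: would a byte[] of currentLength bytes still be a
    // legal Java array after growing by extraBytes?
    static boolean fitsInArray(int currentLength, int extraBytes) {
        // Do the addition in long so it cannot overflow int.
        long required = (long) currentLength + extraBytes;
        return required <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // The two numbers are taken from the error message in the stack trace.
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("fits: " + fitsInArray(2147483639, 9));
    }
}
```

The sum is 2147483648, one more than Integer.MAX_VALUE, so no amount of extra heap would let a single in-memory buffer hold this export.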

Potential Solutions:
Streaming? Chunking?
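
A rough sketch of what the streaming option could look like, assuming statements can be consumed one at a time and written straight to the response stream instead of being collected in a ByteArrayOutputStream first. Everything here (`StreamingExportSketch`, `Triple`, `export`, `flushEvery`) is a hypothetical stand-in written against the plain JDK. It is not the RDF4J API and not how ExportStatementsView is implemented.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamingExportSketch {

    /** Hypothetical stand-in for a statement whose terms are already in N-Triples syntax. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * Writes each statement to {@code out} as soon as it is read, so memory use
     * is bounded by the writer's buffer rather than by the size of the dataset.
     * Returns the number of statements written.
     */
    static long export(Iterator<Triple> statements, OutputStream out, int flushEvery)
            throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        long count = 0;
        while (statements.hasNext()) {
            Triple t = statements.next();
            writer.write(t.subject);
            writer.write(' ');
            writer.write(t.predicate);
            writer.write(' ');
            writer.write(t.object);
            writer.write(" .\n");
            count++;
            // Push data to the client periodically instead of holding it all.
            if (count % flushEvery == 0) {
                writer.flush();
            }
        }
        writer.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        List<Triple> data = Arrays.asList(
                new Triple("_:b0", "<http://example.org/p>", "\"v1\""),
                new Triple("_:b1", "<http://example.org/p>", "\"v2\""));
        // A small in-memory sink is used only for this demo; in a server the
        // target would be the HTTP response's output stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = export(data.iterator(), sink, 1000);
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
        System.out.println(written + " statements written");
    }
}
```

Chunking would be the other direction: the client requests the data in several bounded pieces, which needs some stable way to page through the statements and so probably a change to the REST protocol rather than only to the server.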

org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Required array length 2147483639 + 9 is too large
	org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1095)
	org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:965)
	org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
	org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:655)
	org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:764)
	org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Root Cause

java.lang.OutOfMemoryError: Required array length 2147483639 + 9 is too large
	java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
	java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
	java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
	java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
	java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:234)
	java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:304)
	java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
	java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:132)
	java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:120)
	java.base/java.io.OutputStreamWriter.write(OutputStreamWriter.java:187)
	java.base/java.io.Writer.append(Writer.java:389)
	org.eclipse.rdf4j.rio.ntriples.NTriplesWriter.writeBNode(NTriplesWriter.java:172)
	org.eclipse.rdf4j.rio.ntriples.NTriplesWriter.writeValue(NTriplesWriter.java:145)
	org.eclipse.rdf4j.rio.ntriples.NTriplesWriter.consumeStatement(NTriplesWriter.java:101)
	org.eclipse.rdf4j.rio.helpers.AbstractRDFWriter.handleStatementEncodeRDFStar(AbstractRDFWriter.java:154)
	org.eclipse.rdf4j.rio.helpers.AbstractRDFWriter.handleStatement(AbstractRDFWriter.java:109)
	org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.exportStatements(SailRepositoryConnection.java:390)
	org.eclipse.rdf4j.http.server.repository.statements.ExportStatementsView.render(ExportStatementsView.java:95)
	org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1406)
	org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1150)
	org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
	org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:965)
	org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
	org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:655)
	org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:764)
	org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Preferred solution

No response

Are you interested in contributing a solution yourself?

None

Alternatives you've considered

No response

Anything else?

No response

Labels

enhancement: issue is a new feature or improvement