Remove base64 asset data when indexing wysiwyg fields #288
If you have a wysiwyg field in a data object, it can have HTML content like
In the corresponding database table the image gets saved base64-encoded:
In our case the original image was huge, so the base64 string was larger than 50 MB.
When an object with this data is indexed, OpenSearch rejects the request with an HTTP 413 error.
HTTP 413 means "Content Too Large".
As the generic data index is used for searching, imho it does not make sense to store base64 data there. So this PR removes the base64 data from the wysiwyg HTML before it is indexed.
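To illustrate the idea (this is not the code from this PR, just a minimal sketch in Python): the snippet below drops base64 data URIs from an HTML string before it would be sent to the index. The function name `strip_base64_data` and the regular expression are hypothetical, and the sketch assumes the payload uses the standard base64 alphabet without embedded whitespace.

```python
import re

# Hypothetical pattern: matches a data URI with a base64 payload, e.g.
#   data:image/png;base64,iVBORw0KGgo=
# Assumption: the payload contains only standard base64 characters
# (A-Z, a-z, 0-9, '+', '/', '='), so the match ends at the closing quote.
_BASE64_DATA_URI = re.compile(
    r"data:[\w.+-]+/[\w.+-]+(?:;[\w.+-]+=[\w.+-]+)*;base64,[A-Za-z0-9+/=]*",
    re.IGNORECASE,
)


def strip_base64_data(html: str) -> str:
    """Return the HTML with every base64 data URI replaced by an empty string."""
    return _BASE64_DATA_URI.sub("", html)


if __name__ == "__main__":
    sample = '<p>Hello</p><img src="data:image/png;base64,iVBORw0KGgo=" alt="x">'
    print(strip_base64_data(sample))
```

The surrounding markup and the visible text stay intact, so the content remains searchable, while the image payload that caused the oversized request is no longer part of the indexed document.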