Integrate new aggregation subsystem based content importer module #1361

Closed

Commits on Jun 24, 2022

  1. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    Initial implementation, proof of concept.
    marekhorst committed Jun 24, 2022
    692128a
  2. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    Refined implementation. Subworkflow still to be integrated with IIS main workflow.
    marekhorst committed Jun 24, 2022
    90b86d0
  3. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    Limiting entries to the ones with a non-null `actual_url` in order not to violate the DocumentContentURL schema constraints.
    marekhorst committed Jun 24, 2022
    e94c5bb
  4. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    Replacing the `actual_url` column reference with `location`, the column pointing to the S3 content location.
    marekhorst committed Jun 24, 2022
    3031c79
  5. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    Enabling the new Hive-based content metadata importer in the content_url importer uber workflow and in the metadataextraction cache builder.
    marekhorst committed Jun 24, 2022
    36606ab
  6. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    Excluding conflicting Jackson libraries.
    marekhorst committed Jun 24, 2022
    8cf1b38
  7. 2439e41
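The filtering in commit 3 and the column switch in commit 4 can be sketched as follows. This is a minimal, hypothetical Python illustration, not the actual IIS code: it assumes each PDF aggregation row is a dict with an `id` and a `location` column (the S3 path), and that the output record uses hypothetical `id` and `url` fields. Rows whose `location` is null are dropped, mirroring a `WHERE location IS NOT NULL` condition, so that no record violates the DocumentContentURL schema constraints.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical row shape mimicking the PDF aggregation table: the `location`
# column (previously read as `actual_url`) holds the S3 content path.
Row = Dict[str, Optional[str]]


def to_content_urls(rows: Iterable[Row]) -> List[Dict[str, str]]:
    """Map aggregation rows to content URL records, skipping null locations.

    A null location would break the (assumed) non-null URL field of the
    DocumentContentURL schema, so such rows are skipped instead of emitted.
    """
    result = []
    for row in rows:
        location = row.get("location")
        if location is None:
            continue  # same effect as `WHERE location IS NOT NULL`
        result.append({"id": row["id"], "url": location})
    return result


if __name__ == "__main__":
    sample = [
        {"id": "doc1", "location": "s3://bucket/doc1.pdf"},
        {"id": "doc2", "location": None},
    ]
    print(to_content_urls(sample))
```

Filtering before the mapping step keeps the schema check out of the record conversion itself; the record shape and field names here are illustrative only.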

Commits on Jun 29, 2022

  1. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    The following new properties are supported:
    * `import_content_pdfaggregation_table_name`, to be specified at runtime in 'dbname.tablename' format, indicating both the Hive database and the table
    * `import_content_pdfaggregation_hive_metastore_uris`, with URIs pointing to the Hive metastore used by the PDF aggregation subsystem, to be defined statically in the IIS environment (default-config.xml file)
    
    Support for the new Hive-based PDF aggregation service is enabled automatically when an explicit `import_content_pdfaggregation_table_name` value is provided at runtime. When the parameter is unspecified, the content importer module works in legacy mode (ObjectStore compatible).
    marekhorst committed Jun 29, 2022
    d481c39
  2. Closes #1298: Refactor the way IIS imports contents by dumping ObjectStore in favour of newly introduced PDF aggregation service
    
    Applying code review fixes.
    marekhorst committed Jun 29, 2022
    4c1b3a2
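The runtime switch described in the Jun 29 commit (Hive-based import when `import_content_pdfaggregation_table_name` is given, legacy ObjectStore-compatible mode otherwise) can be sketched as below. This is a hypothetical Python illustration, not the IIS workflow logic: the mode labels `"pdfaggregation"` and `"legacy"` are invented, and it assumes an unspecified parameter arrives as `None` or an empty string, whereas the real workflow may use a different placeholder.

```python
from typing import Optional, Tuple


def parse_table_name(value: str) -> Tuple[str, str]:
    """Split a 'dbname.tablename' value into (database, table)."""
    parts = value.split(".")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'dbname.tablename', got {value!r}")
    return parts[0], parts[1]


def select_import_mode(table_name: Optional[str]) -> str:
    """Return the (hypothetical) importer mode implied by the parameter.

    An explicit table name enables the Hive-based PDF aggregation import;
    an unspecified (None or empty) value falls back to legacy mode.
    """
    if table_name:
        parse_table_name(table_name)  # fail fast on a malformed value
        return "pdfaggregation"
    return "legacy"


if __name__ == "__main__":
    print(select_import_mode("somedb.sometable"))
    print(select_import_mode(None))
```

Validating the 'dbname.tablename' format at selection time means a malformed value fails early instead of silently falling back to legacy mode; `somedb.sometable` is a placeholder name.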