Refactor data handlers - separate insertion from saving geojson #729

@santoshgdev

Description

Context

Currently, our data handlers in backend/etl/ both insert data into the database and save GeoJSON files. This tight coupling violates separation of concerns and makes the code harder to maintain, test, and understand. It also allows partial failures (e.g., the database insert succeeds but the GeoJSON save fails, or vice versa) to leave data in an inconsistent state. Separating the two responsibilities will improve modularity and robustness.

Definition of Done

  • Database insertion logic is clearly separated from GeoJSON saving logic within the data handlers.
  • Each data handler (e.g., landslide_data_handler.py, liquefaction_data_handler.py, soft_story_properties_data_handler.py, tsunami_data_handler.py) has distinct methods or functions for database operations and GeoJSON operations.
  • mapbox_geojson_manager.py is used exclusively for GeoJSON-related operations.
  • Existing ETL processes continue to function correctly after the refactoring.
  • Unit tests are updated or added to reflect the new separation of concerns.
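
As a rough illustration of the target shape, the sketch below shows a handler that exposes database insertion and GeoJSON saving as two independent methods. Every name in it (`GeoJSONWriter`, `InMemoryRepository`, `ExampleDataHandler`, `insert_to_db`, `save_geojson`) is hypothetical and is not taken from the existing code in backend/etl/; the in-memory repository only stands in for the real database layer.

```python
import json
from pathlib import Path
from typing import Any

Feature = dict[str, Any]


class GeoJSONWriter:
    """Hypothetical stand-in for mapbox_geojson_manager.py: file output only."""

    def save(self, features: list[Feature], path: Path) -> Path:
        # Wrap the features in a standard GeoJSON FeatureCollection.
        collection = {"type": "FeatureCollection", "features": features}
        path.write_text(json.dumps(collection))
        return path


class InMemoryRepository:
    """Hypothetical stand-in for the database layer: persistence only."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def insert_many(self, rows: list[dict[str, Any]]) -> int:
        self.rows.extend(rows)
        return len(rows)


class ExampleDataHandler:
    """Hypothetical handler with one method per responsibility."""

    def __init__(self, repository: InMemoryRepository, writer: GeoJSONWriter) -> None:
        self.repository = repository
        self.writer = writer

    def insert_to_db(self, features: list[Feature]) -> int:
        # Database work only: no file I/O happens here.
        rows = [feature["properties"] for feature in features]
        return self.repository.insert_many(rows)

    def save_geojson(self, features: list[Feature], path: Path) -> Path:
        # GeoJSON work only: delegated entirely to the writer.
        return self.writer.save(features, path)
```

Because the two collaborators are injected, each method can be exercised on its own, which is what the last Definition of Done item asks the tests to cover.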

Engineering Details

The primary files to reference are located in backend/etl/:

  • data_handler.py
  • landslide_data_handler.py
  • liquefaction_data_handler.py
  • soft_story_properties_data_handler.py
  • tsunami_data_handler.py
  • mapbox_geojson_manager.py

The refactoring should involve:

  1. Identifying the code sections responsible for database insertion and GeoJSON saving within each data handler.
  2. Extracting the GeoJSON saving logic into separate, dedicated functions or methods, potentially leveraging or extending mapbox_geojson_manager.py.
  3. Ensuring that the database insertion logic remains focused solely on database operations.
  4. Updating any calling code to use the newly separated functions/methods.
  5. Verifying that the ETL pipeline still functions as expected.
  6. Adding or updating tests to cover the separated functionalities.
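
Steps 4 and 6 could look something like the following sketch: calling code invokes the two operations as separate steps, and a test uses mocks to check that a GeoJSON failure is reported without hiding the result of the database insert. `run_pipeline` and `StepResult` are hypothetical names, and catching `OSError` is only one possible policy; the real pipeline may prefer to re-raise or roll back.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional
from unittest.mock import Mock

Feature = dict[str, Any]


@dataclass
class StepResult:
    """Outcome of each step, recorded separately (hypothetical type)."""

    inserted: Optional[int] = None
    geojson_error: Optional[str] = None


def run_pipeline(
    features: list[Feature],
    insert_to_db: Callable[[list[Feature]], int],
    save_geojson: Callable[[list[Feature]], Any],
) -> StepResult:
    """Run the two steps independently so a partial failure is visible."""
    result = StepResult()
    result.inserted = insert_to_db(features)
    try:
        save_geojson(features)
    except OSError as exc:
        # Assumed policy: record the failure instead of aborting the run.
        result.geojson_error = str(exc)
    return result


def test_geojson_failure_is_reported_separately() -> None:
    features = [{"type": "Feature", "geometry": None, "properties": {"id": 1}}]
    insert = Mock(return_value=1)
    save = Mock(side_effect=OSError("disk full"))

    result = run_pipeline(features, insert, save)

    insert.assert_called_once_with(features)
    save.assert_called_once_with(features)
    assert result.inserted == 1
    assert result.geojson_error == "disk full"
```

The test function follows pytest naming conventions but uses only the standard library, so it can also be called directly.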

Projects

Status

In Progress
