
Fix Cesium tutorial compatibility with different Parquet schemas #22

@rdhyee

Description


Current Problem

The Cesium tutorial (parquet_cesium.qmd) works with the OpenContext Parquet file but fails with the full iSamples Zenodo dataset, throwing "RuntimeError: data is not iterable".

Root Cause: Schema Mismatch

The tutorial is hardcoded for OpenContext's graph-based schema but fails with the flattened iSamples schema:

OpenContext Schema (Working)

  • File: oc_isamples_pqg.parquet
  • Table: nodes
  • Coordinates: latitude, longitude
  • Filter: otype='GeospatialCoordLocation'
  • Query: SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation'

Full iSamples Schema (Failing)

  • File: isamples_export_2025_04_21_16_23_46_geo.parquet
  • Table: isamples_data (or the Parquet file queried directly)
  • Coordinates: sample_location_latitude, sample_location_longitude
  • Filter: No otype filter needed
  • Query: SELECT sample_location_longitude, sample_location_latitude FROM isamples_data
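The two layouts differ only in table name, identifier and coordinate columns, and an optional row filter. That suggests describing each schema as data instead of hardcoding one query per schema. The sketch below is a hypothetical design, not the tutorial's code. `SCHEMAS` and `locationQueryFor` are illustrative names. The iSamples identifier column `sample_identifier` is taken from the implementation sketch later in this issue. The `IS NOT NULL` checks on both coordinates are an added assumption.

```javascript
// Hypothetical descriptors for the two Parquet layouts described in this issue.
const SCHEMAS = {
  opencontext: {
    table: "nodes",
    id: "pid",
    lat: "latitude",
    lon: "longitude",
    filter: "otype = 'GeospatialCoordLocation'",
  },
  isamples: {
    table: "isamples_data",
    id: "sample_identifier",
    lat: "sample_location_latitude",
    lon: "sample_location_longitude",
    filter: null, // no otype filter needed
  },
};

// Build a query whose result columns are always pid, latitude, longitude,
// skipping rows that lack either coordinate.
function locationQueryFor(schemaName) {
  const s = SCHEMAS[schemaName];
  if (!s) throw new Error(`Unsupported schema: ${schemaName}`);
  const conditions = [`${s.lat} IS NOT NULL`, `${s.lon} IS NOT NULL`];
  if (s.filter) conditions.unshift(s.filter); // schema-specific filter goes first
  return (
    `SELECT ${s.id} AS pid, ${s.lat} AS latitude, ${s.lon} AS longitude ` +
    `FROM ${s.table} WHERE ${conditions.join(" AND ")}`
  );
}
```

With this layout, supporting another schema would mean adding one entry to `SCHEMAS` rather than another `switch` branch.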

Current Hardcoded Implementation

```javascript
// This only works for OpenContext schema
const query = `SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation'`;
```

Proposed Solution

Implement schema detection and adaptive querying similar to the zenodo_isamples_analysis.qmd approach:

  1. Schema Detection: Probe the file to determine available tables and columns
  2. Adaptive Queries: Use different query patterns based on detected schema
  3. Unified Interface: Present same functionality regardless of underlying schema
  4. Error Handling: Graceful fallbacks when schema detection fails
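Step 1 could also be done by reading the column names once, rather than probing with queries that are expected to fail. This is a minimal sketch. It assumes the caller has already fetched the column names, for example from the `column_name` field of DuckDB's `DESCRIBE` output for the table or view. `REQUIRED_COLUMNS` and `detectSchemaFromColumns` are hypothetical names, and the required-column lists are inferred from the schema descriptions above.

```javascript
// Columns each query needs; names come from the schema listing in this issue.
const REQUIRED_COLUMNS = {
  opencontext: ["pid", "otype", "latitude", "longitude"],
  isamples: ["sample_identifier", "sample_location_latitude", "sample_location_longitude"],
};

// Return the first schema (in declaration order) whose required columns are
// all present, comparing names case-insensitively; otherwise "unknown".
function detectSchemaFromColumns(columnNames) {
  const present = new Set(columnNames.map((c) => c.toLowerCase()));
  for (const [name, required] of Object.entries(REQUIRED_COLUMNS)) {
    if (required.every((col) => present.has(col))) return name;
  }
  return "unknown";
}
```

Requiring every needed column, not just a table name, keeps a file that has a `nodes` table but no coordinate columns from being misclassified.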

Implementation Approach

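```javascript
// Detect available schema. Each probe selects the exact columns the location
// query needs, so it throws if the table or any of those columns is missing.
// (A COUNT(*) probe only checks the table and filter column, and LIMIT 1 does
// not shorten it: COUNT(*) still aggregates every matching row.)
async function detectSchema(db) {
  // Try OpenContext schema first
  try {
    await db.query("SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation' LIMIT 1");
    return 'opencontext';
  } catch {
    // Try iSamples schema
    try {
      await db.query("SELECT sample_identifier, sample_location_latitude, sample_location_longitude FROM isamples_data LIMIT 1");
      return 'isamples';
    } catch {
      return 'unknown';
    }
  }
}

// Adaptive query builder: every branch returns columns named pid, latitude, longitude
function buildLocationQuery(schema) {
  switch (schema) {
    case 'opencontext':
      return "SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation'";
    case 'isamples':
      return "SELECT sample_identifier AS pid, sample_location_latitude AS latitude, sample_location_longitude AS longitude " +
        "FROM isamples_data WHERE sample_location_latitude IS NOT NULL AND sample_location_longitude IS NOT NULL";
    default:
      throw new Error(`Unsupported schema: ${schema}`);
  }
}
```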
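For step 4 (error handling), detection and querying could be wrapped so the map code always receives an array, even when detection or the query fails. This is a sketch under stated assumptions. `loadLocations` is a hypothetical helper. `db.query(sql)` is assumed to resolve to an array of row objects, following the Observable DatabaseClient convention. `detectSchema` and `buildLocationQuery` are passed in as parameters so the wrapper can be tested with stubs.

```javascript
// Hypothetical wrapper: run schema detection and the location query, and always
// return an array of rows so downstream Cesium code can iterate it safely.
async function loadLocations(db, detectSchema, buildLocationQuery) {
  let schema = "unknown";
  try {
    schema = await detectSchema(db);
    if (schema === "unknown") {
      throw new Error("No supported schema found in this Parquet file");
    }
    const rows = await db.query(buildLocationQuery(schema));
    // Normalize null/undefined or array-like results to a plain array
    return { schema, rows: Array.from(rows ?? []), error: null };
  } catch (err) {
    // Graceful fallback: empty data plus a message the page can display
    return { schema, rows: [], error: err.message };
  }
}
```

A tutorial cell could then iterate `rows` unconditionally and show `error` when it is set. This should avoid failing with "data is not iterable".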

Benefits

  • Compatibility with both the OpenContext and the full iSamples Parquet formats
  • Better user experience: any iSamples Parquet file with a recognized schema should work
  • Future-proof: supporting a new schema means adding one detection probe and one query
  • Consistent tutorial behavior across different data sources

Files to Update

  • tutorials/parquet_cesium.qmd - Primary implementation
  • tutorials/parquet_cesium_split.qmd - Apply same pattern
  • Consider extracting the common schema detection into a shared utility

Priority

Medium - affects tutorial usability with the primary iSamples dataset sources.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
