Open
Description
Current Problem
The Cesium tutorial (parquet_cesium.qmd) works with the OpenContext Parquet file but fails with the full iSamples Zenodo dataset, throwing "RuntimeError: data is not iterable".
Root Cause: Schema Mismatch
The tutorial is hardcoded for OpenContext's graph-based schema but fails with the flattened iSamples schema:
OpenContext Schema (Working)
- File: oc_isamples_pqg.parquet
- Table: nodes
- Coordinates: latitude, longitude
- Filter: otype='GeospatialCoordLocation'
- Query: SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation'
Full iSamples Schema (Failing)
- File: isamples_export_2025_04_21_16_23_46_geo.parquet
- Table: isamples_data (or direct access)
- Coordinates: sample_location_latitude, sample_location_longitude
- Filter: No otype filter needed
- Query: SELECT sample_location_longitude, sample_location_latitude FROM isamples_data
Current Hardcoded Implementation
// This only works for OpenContext schema
const query = `SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation'`;
Proposed Solution
Implement schema detection and adaptive querying similar to the zenodo_isamples_analysis.qmd approach:
- Schema Detection: Probe the file to determine available tables and columns
- Adaptive Queries: Use different query patterns based on detected schema
- Unified Interface: Present same functionality regardless of underlying schema
- Error Handling: Graceful fallbacks when schema detection fails
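As one possible reading of the detection and fallback steps above, the following is a minimal sketch that classifies a file by its column names and never throws to the caller. It assumes the column names have already been obtained (how to list them depends on the DuckDB client in use and is not shown). The names classifyColumns, LOCATION_QUERIES, loadLocations, and stubDb are hypothetical, introduced only for illustration; the only client call assumed is an async db.query(sql), as used elsewhere in this issue.

```javascript
// Hypothetical helper: classify a file by the column names it exposes.
function classifyColumns(columns) {
  const has = (name) => columns.includes(name);
  if (has('otype') && has('latitude') && has('longitude')) return 'opencontext';
  if (has('sample_location_latitude') && has('sample_location_longitude')) return 'isamples';
  return 'unknown';
}

// Queries quoted in this issue, keyed by schema name.
const LOCATION_QUERIES = {
  opencontext:
    "SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation'",
  isamples:
    'SELECT sample_identifier AS pid, sample_location_latitude AS latitude, ' +
    'sample_location_longitude AS longitude FROM isamples_data ' +
    'WHERE sample_location_latitude IS NOT NULL',
};

// Hypothetical wrapper: always resolves to { schema, rows, error }, so the
// tutorial can show a message instead of crashing on an unsupported file.
async function loadLocations(db, columns) {
  const schema = classifyColumns(columns);
  if (schema === 'unknown') {
    return { schema, rows: [], error: 'Unsupported schema' };
  }
  try {
    // Array.from copies the query result into a plain array of rows.
    const rows = Array.from(await db.query(LOCATION_QUERIES[schema]));
    return { schema, rows, error: null };
  } catch (err) {
    return { schema, rows: [], error: String(err.message || err) };
  }
}

// Stand-in for the real database client (assumption: query() is async).
const stubDb = {
  query: async () => [{ pid: 'a', latitude: 1.5, longitude: 2.5 }],
};

loadLocations(stubDb, ['pid', 'otype', 'latitude', 'longitude'])
  .then((result) => console.log(result.schema, result.rows.length, result.error));
```

Returning a result object instead of throwing keeps the map-rendering code free of schema-specific error handling; whether that is preferable to an exception is a design choice for the tutorial.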
Implementation Approach
// Detect available schema
async function detectSchema(db) {
  // Try OpenContext schema first
  try {
    await db.query("SELECT COUNT(*) FROM nodes WHERE otype='GeospatialCoordLocation' LIMIT 1");
    return 'opencontext';
  } catch {
    // Try iSamples schema
    try {
      await db.query("SELECT COUNT(*) FROM isamples_data WHERE sample_location_latitude IS NOT NULL LIMIT 1");
      return 'isamples';
    } catch {
      return 'unknown';
    }
  }
}

// Adaptive query builder
function buildLocationQuery(schema) {
  switch (schema) {
    case 'opencontext':
      return "SELECT pid, latitude, longitude FROM nodes WHERE otype='GeospatialCoordLocation'";
    case 'isamples':
      return "SELECT sample_identifier as pid, sample_location_latitude as latitude, sample_location_longitude as longitude FROM isamples_data WHERE sample_location_latitude IS NOT NULL";
    default:
      throw new Error('Unsupported schema');
  }
}
Benefits
- Universal compatibility with different iSamples Parquet formats
- Better user experience - any valid iSamples Parquet file should work
- Future-proof - easier to add new schema support
- Consistent tutorial behavior across different data sources
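To illustrate the "easier to add new schema support" point, here is a rough sketch in which each layout is described as data, so supporting another layout would mean adding one entry rather than another switch case. SCHEMAS and buildQueryFromRegistry are hypothetical names; the table and column names are the ones quoted in this issue, and nothing here is confirmed as the tutorial's actual design.

```javascript
// Hypothetical registry: one descriptor per supported layout.
// Table and column names are the ones quoted in this issue.
const SCHEMAS = {
  opencontext: {
    table: 'nodes',
    pid: 'pid',
    latitude: 'latitude',
    longitude: 'longitude',
    where: "otype='GeospatialCoordLocation'",
  },
  isamples: {
    table: 'isamples_data',
    pid: 'sample_identifier',
    latitude: 'sample_location_latitude',
    longitude: 'sample_location_longitude',
    where: 'sample_location_latitude IS NOT NULL',
  },
};

// Build a location query whose output columns are always
// pid, latitude, longitude, whatever the source columns are called.
function buildQueryFromRegistry(name, registry = SCHEMAS) {
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new Error(`Unsupported schema: ${name}`);
  }
  const s = registry[name];
  return (
    `SELECT ${s.pid} AS pid, ${s.latitude} AS latitude, ${s.longitude} AS longitude ` +
    `FROM ${s.table} WHERE ${s.where}`
  );
}

console.log(buildQueryFromRegistry('isamples'));
```

Because every generated query aliases its columns to pid, latitude, and longitude, the code that consumes the rows would not need to know which layout the file uses.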
Files to Update
- tutorials/parquet_cesium.qmd - Primary implementation
- tutorials/parquet_cesium_split.qmd - Apply same pattern
- Consider extracting common schema detection into shared utility
Priority
Medium - affects tutorial usability with the primary iSamples dataset sources.