In many Microsoft apps, the equivalent of a null timestamp is 1601/01/01 00:00:00. These values can show up when a file has been recovered or something went wrong while saving.
This causes an OSError when converting to a Unix timestamp, e.g. datetime(1601, 1, 1, 0, 0, 0).timestamp().
Suggested fix:
In https://github.com/Unstructured-IO/unstructured-ingest/blob/main/unstructured_ingest/utils/string_and_date_utils.py add:
    from datetime import datetime
    from typing import Optional

    def safe_timestamp(dt: Optional[datetime]) -> Optional[str]:
        """
        Converts a datetime object to a string representation of its timestamp.
        Returns None if dt is None or if the conversion raises.
        """
        if dt is None:
            return None
        try:
            return str(dt.timestamp())
        except (OSError, ValueError, OverflowError):
            return None
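As a rough illustration of the intended behaviour, the sketch below repeats the helper and exercises it. Two things here are assumptions rather than part of the proposal: the None check (so the helper can replace the existing "if ... else None" guards), and the hypothetical FailingDatetime subclass, which simulates a platform where timestamp() raises OSError so the example behaves the same everywhere.

```python
from datetime import datetime, timezone
from typing import Optional


def safe_timestamp(dt: Optional[datetime]) -> Optional[str]:
    """Return str(dt.timestamp()), or None if dt is missing or unconvertible."""
    if dt is None:  # assumption: treat a missing date like a failed conversion
        return None
    try:
        return str(dt.timestamp())
    except (OSError, ValueError, OverflowError):
        return None


class FailingDatetime(datetime):
    """Hypothetical stand-in simulating a platform where timestamp() fails."""

    def timestamp(self) -> float:
        raise OSError(22, "Invalid argument")


ok = safe_timestamp(datetime(2024, 1, 1, tzinfo=timezone.utc))
bad = safe_timestamp(FailingDatetime(1601, 1, 1))
missing = safe_timestamp(None)
print(ok, bad, missing)  # 1704067200.0 None None
```

With this shape, a bad or missing date ends up as None in the metadata instead of aborting the whole file.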
In the OneDrive ingester:
    return FileData(
        identifier=drive_item.id,
        connector_type=self.connector_type,
        source_identifiers=SourceIdentifiers(
            fullpath=server_path, filename=drive_item.name, rel_path=rel_path
        ),
        metadata=FileDataSourceMetadata(
            url=drive_item.parent_reference.path + "/" + drive_item.name,
            version=drive_item.etag,
            date_modified=str(date_modified_dt.timestamp()) if date_modified_dt else None,
            date_created=str(date_created_at.timestamp()) if date_created_at else None,
            date_processed=str(time()),
            record_locator={
                "user_pname": self.connection_config.user_pname,
                "server_relative_path": server_path,
            },
        ),
        additional_metadata=self.get_properties_sync(drive_item=drive_item),
    )
Use safe_timestamp() instead:
    return FileData(
        identifier=drive_item.id,
        connector_type=self.connector_type,
        source_identifiers=SourceIdentifiers(
            fullpath=server_path, filename=drive_item.name, rel_path=rel_path
        ),
        metadata=FileDataSourceMetadata(
            url=drive_item.parent_reference.path + "/" + drive_item.name,
            version=drive_item.etag,
            date_modified=safe_timestamp(date_modified_dt),
            date_created=safe_timestamp(date_created_at),
            date_processed=str(time()),
            record_locator={
                "user_pname": self.connection_config.user_pname,
                "server_relative_path": server_path,
            },
        ),
        additional_metadata=self.get_properties_sync(drive_item=drive_item),
    )
Happy to open a PR if needed.
Source of the OneDrive snippet: unstructured-ingest/unstructured_ingest/processes/connectors/onedrive.py, lines 208 to 225 in 1234157.