Support reading of arbitrary fields #142

Merged: 12 commits, Jan 23, 2025
27 changes: 25 additions & 2 deletions docs/pages/userdocs/read.dox
@@ -308,7 +308,7 @@
* - \ref AQNWB::IO::BaseIO "BaseIO", \ref AQNWB::IO::HDF5::HDF5IO "HDF5IO" are responsible for
* i) reading type attribute and group information, ii) searching the file for typed objects via
* \ref AQNWB::IO::BaseIO::findTypes "findTypes()" methods, and iii) retrieving the paths of all
* object within a group via \ref AQNWB::IO::BaseIO::getGroupObjects "getGroupObjects()"
* objects associated with a storage object (e.g., a Group) via \ref AQNWB::IO::BaseIO::getStorageObjects "getStorageObjects()"
*
* \subsubsection read_design_wrapper_registeredType RegisteredType
*
@@ -320,7 +320,9 @@
* methods that we can use to instantiate any registered subclass just using the ``io`` object
* and ``path`` for the object in the file. \ref AQNWB::NWB::RegisteredType "RegisteredType" can read
* the type information from the corresponding `namespace` and `neurodata_type` attributes to
* determine the full type and in run look up the corresponding class in its registry and create the type.
* determine the full type, then look up the corresponding class in its registry, and then create the type.
* \ref AQNWB::NWB::RegisteredType::readField "RegisteredType::readField" also provides a
* general mechanism for reading arbitrary fields.
*
* \subsubsection read_design_wrapper_subtypes Child classes of RegisteredType (e.g., Container)
*
@@ -540,5 +542,26 @@
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_only_stringattr_snippet
*
*
* \subsubsection read_design_example_read_arbitrary_field Reading arbitrary fields
*
* Even if there is no dedicated `DEFINE_FIELD` definition available, we can still read
* any arbitrary sub-field associated with a particular \ref AQNWB::NWB::RegisteredType "RegisteredType"
* via the generic \ref AQNWB::NWB::RegisteredType::readField "RegisteredType::readField" method. The main
* difference is that for datasets and attributes we need to specify all the additional information
* (e.g., the relative path, object type, and data type) ourselves, whereas using `DEFINE_FIELD`
* this information has already been specified for us. For example, to read the data from
* the \ref AQNWB::NWB::ElectricalSeries "ElectricalSeries" we can call:
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_generic_dataset_field_snippet
*
* Similarly, we can also read any sub-fields that are themselves \ref AQNWB::NWB::RegisteredType "RegisteredType"
* objects via \ref AQNWB::NWB::RegisteredType::readField "RegisteredType::readField" (e.g., to read custom
* \ref AQNWB::NWB::VectorData "VectorData" columns of a \ref AQNWB::NWB::DynamicTable "DynamicTable"). In
* contrast to dataset and attribute fields, here we only need to specify the relative path of the field.
* \ref AQNWB::NWB::RegisteredType "RegisteredType" in turn can read the type information from the
* `neurodata_type` and `namespace` attributes in the file directly.
*
* \snippet tests/examples/test_ecephys_data_read.cpp example_read_generic_registeredtype_field_snippet
*/
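
The registry lookup described in the documentation above (read the `namespace` and `neurodata_type`, build the full type name, look up the class, create the object) can be sketched in isolation. This is a minimal, self-contained illustration, not the AqNWB implementation: `BaseType`, `Series`, `registerSeries`, and the `core::Series` type name are hypothetical stand-ins, and the namespace and type strings are passed in directly instead of being read from attributes in a file.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a base class that owns a factory registry.
class BaseType
{
public:
  using Factory = std::function<std::shared_ptr<BaseType>(const std::string&)>;

  explicit BaseType(const std::string& path)
      : m_path(path)
  {
  }
  virtual ~BaseType() = default;

  const std::string& getPath() const { return m_path; }

  // Registry mapping "namespace::neurodata_type" to a factory function.
  static std::unordered_map<std::string, Factory>& registry()
  {
    static std::unordered_map<std::string, Factory> instance;
    return instance;
  }

  // Build the full type name, look it up, and create the object.
  // Returns nullptr if the type was never registered.
  static std::shared_ptr<BaseType> create(const std::string& typeNamespace,
                                          const std::string& neurodataType,
                                          const std::string& path)
  {
    const std::string fullType = typeNamespace + "::" + neurodataType;
    const auto it = registry().find(fullType);
    if (it == registry().end()) {
      return nullptr;
    }
    return it->second(path);
  }

private:
  std::string m_path;
};

// Hypothetical registered subclass.
class Series : public BaseType
{
public:
  explicit Series(const std::string& path)
      : BaseType(path)
  {
  }
};

// Register the subclass under its (hypothetical) full type name.
inline void registerSeries()
{
  BaseType::registry()["core::Series"] = [](const std::string& path)
  { return std::make_shared<Series>(path); };
}

// Usage: create via the registry, then downcast the generic pointer.
inline void registryExample()
{
  registerSeries();
  std::shared_ptr<BaseType> generic =
      BaseType::create("core", "Series", "/acquisition/ts");
  std::shared_ptr<Series> series = std::dynamic_pointer_cast<Series>(generic);
  assert(series != nullptr);
  assert(series->getPath() == "/acquisition/ts");
}
```

The caller receives a pointer to the base class and downcasts it with `std::dynamic_pointer_cast`, which mirrors the pattern used in the `example_read_generic_registeredtype_field_snippet` test.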

13 changes: 13 additions & 0 deletions src/Types.hpp
@@ -35,6 +35,19 @@ class Types
Undefined = -1
};

/**
* \brief Helper struct to check if a value is a data field, i.e.,
* Dataset or Attribute
*
* This struct is used to enforce constraints on templated functions that
* should only be callable for valid StorageObjectType values
*/
template<StorageObjectType T>
struct IsDataStorageObjectType
: std::integral_constant<bool, (T == Dataset || T == Attribute)>
{
};

/**
* @brief Alias for the size type used in the project.
*/
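
To illustrate how a trait such as `IsDataStorageObjectType` can gate a templated function, here is a minimal standalone sketch. It assumes a scoped `StorageObjectType` enum defined locally (the project's enum lives inside `Types` and may be declared differently), and `describeField` is a hypothetical function used only for demonstration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Local stand-in for the project's enum (assumption: values as shown).
enum class StorageObjectType
{
  Group = 0,
  Dataset = 1,
  Attribute = 2,
  Undefined = -1
};

// Same shape as the trait in the diff: true only for Dataset or Attribute.
template<StorageObjectType T>
struct IsDataStorageObjectType
    : std::integral_constant<bool,
                             (T == StorageObjectType::Dataset
                              || T == StorageObjectType::Attribute)>
{
};

static_assert(IsDataStorageObjectType<StorageObjectType::Dataset>::value,
              "Dataset is a data field");
static_assert(IsDataStorageObjectType<StorageObjectType::Attribute>::value,
              "Attribute is a data field");
static_assert(!IsDataStorageObjectType<StorageObjectType::Group>::value,
              "Group is not a data field");

// Hypothetical function that only participates in overload resolution
// when SOT is Dataset or Attribute.
template<StorageObjectType SOT,
         typename std::enable_if<IsDataStorageObjectType<SOT>::value,
                                 int>::type = 0>
std::string describeField(const std::string& path)
{
  const char* kind =
      (SOT == StorageObjectType::Dataset) ? "dataset:" : "attribute:";
  return kind + path;
}

// Usage: valid instantiations compile; an invalid one is rejected.
inline void traitExample()
{
  assert(describeField<StorageObjectType::Dataset>("data") == "dataset:data");
  assert(describeField<StorageObjectType::Attribute>("data/unit")
         == "attribute:data/unit");
  // describeField<StorageObjectType::Group>("data");  // does not compile
}
```

Because the constraint sits in a defaulted template parameter, a call with `StorageObjectType::Group` fails at compile time instead of at run time.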
21 changes: 14 additions & 7 deletions src/io/BaseIO.cpp
@@ -77,7 +77,6 @@ std::unordered_map<std::string, std::string> BaseIO::findTypes(
{
// Check if the current object exists as a dataset or group
if (objectExists(current_path)) {
std::cout << "Current Path: " << current_path << std::endl;
// Check if we have a typed object
if (attributeExists(current_path + "/neurodata_type")
&& attributeExists(current_path + "/namespace"))
@@ -92,8 +91,6 @@ std::unordered_map<std::string, std::string> BaseIO::findTypes(
std::string full_type =
namespace_attr.data[0] + "::" + neurodata_type_attr.data[0];

std::cout << "Full name: " << full_type << std::endl;

// Check if the full type matches any of the given types
if (types.find(full_type) != types.end()) {
found_types[current_path] = full_type;
@@ -103,9 +100,14 @@ std::unordered_map<std::string, std::string> BaseIO::findTypes(
// object
if (search_mode == SearchMode::CONTINUE_ON_TYPE) {
// Get the list of objects inside the current group
std::vector<std::string> objects = getGroupObjects(current_path);
std::vector<std::pair<std::string, StorageObjectType>> objects =
getStorageObjects(current_path, StorageObjectType::Undefined);
for (const auto& obj : objects) {
searchTypes(AQNWB::mergePaths(current_path, obj));
if (obj.second == StorageObjectType::Group
|| obj.second == StorageObjectType::Dataset)
{
searchTypes(AQNWB::mergePaths(current_path, obj.first));
}
}
}
} catch (...) {
@@ -117,9 +119,14 @@ std::unordered_map<std::string, std::string> BaseIO::findTypes(
else
{
// Get the list of objects inside the current group
std::vector<std::string> objects = getGroupObjects(current_path);
std::vector<std::pair<std::string, StorageObjectType>> objects =
getStorageObjects(current_path, StorageObjectType::Undefined);
for (const auto& obj : objects) {
searchTypes(AQNWB::mergePaths(current_path, obj));
if (obj.second == StorageObjectType::Group
|| obj.second == StorageObjectType::Dataset)
{
searchTypes(AQNWB::mergePaths(current_path, obj.first));
}
}
}
}
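
The recursive search changed in this file can be sketched against an in-memory mock, which makes the role of the child filter easier to see: only groups and datasets are descended into, and attributes are skipped. Everything here is an assumption for illustration: `MockObject`, `MockFile`, `joinPaths`, and the `demo::` type names are hypothetical. Following the `SearchMode` documentation, the sketch stops descending once a matching type is found. The parts of `findTypes` not shown in the diff may behave differently.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class StorageObjectType { Group = 0, Dataset = 1, Attribute = 2, Undefined = -1 };
enum class SearchMode { STOP_ON_TYPE = 1, CONTINUE_ON_TYPE = 2 };

// Hypothetical in-memory object: its full type and its direct children.
struct MockObject
{
  std::string fullType;  // "namespace::neurodata_type", empty if untyped
  std::vector<std::pair<std::string, StorageObjectType>> children;
};
using MockFile = std::map<std::string, MockObject>;

inline std::string joinPaths(const std::string& parent, const std::string& child)
{
  return (parent == "/") ? ("/" + child) : (parent + "/" + child);
}

inline void searchTypes(const MockFile& file,
                        const std::string& path,
                        const std::set<std::string>& types,
                        SearchMode mode,
                        std::unordered_map<std::string, std::string>& found)
{
  const auto it = file.find(path);
  if (it == file.end()) {
    return;  // the object does not exist
  }
  const MockObject& object = it->second;
  if (!object.fullType.empty() && types.count(object.fullType) > 0) {
    found[path] = object.fullType;
    if (mode == SearchMode::STOP_ON_TYPE) {
      return;  // do not look inside a matching object
    }
  }
  for (const auto& child : object.children) {
    // Attributes cannot contain typed objects, so only recurse into
    // groups and datasets.
    if (child.second == StorageObjectType::Group
        || child.second == StorageObjectType::Dataset)
    {
      searchTypes(file, joinPaths(path, child.first), types, mode, found);
    }
  }
}

inline std::unordered_map<std::string, std::string> findTypes(
    const MockFile& file,
    const std::string& start,
    const std::set<std::string>& types,
    SearchMode mode)
{
  std::unordered_map<std::string, std::string> found;
  searchTypes(file, start, types, mode, found);
  return found;
}

// Small example hierarchy: / -> acquisition -> ts (typed) -> data (typed)
inline MockFile makeExampleFile()
{
  MockFile file;
  file["/"].children.emplace_back("acquisition", StorageObjectType::Group);
  file["/"].children.emplace_back("nwb_version", StorageObjectType::Attribute);
  file["/acquisition"].children.emplace_back("ts", StorageObjectType::Group);
  file["/acquisition/ts"].fullType = "demo::Series";
  file["/acquisition/ts"].children.emplace_back("data", StorageObjectType::Dataset);
  file["/acquisition/ts/data"].fullType = "demo::Data";
  file["/acquisition/ts/data"].children.emplace_back("unit", StorageObjectType::Attribute);
  return file;
}
```

With this example file, a `STOP_ON_TYPE` search for both types stops at `/acquisition/ts`, while `CONTINUE_ON_TYPE` also reaches the nested `/acquisition/ts/data`.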
27 changes: 17 additions & 10 deletions src/io/BaseIO.hpp
@@ -96,12 +96,12 @@ enum class SearchMode
/**
* @brief Stop searching inside an object once a matching type is found.
*/
STOP_ON_TYPE,
STOP_ON_TYPE = 1,
/**
* @brief Continue searching inside an object even after a matching type is
* found.
*/
CONTINUE_ON_TYPE
CONTINUE_ON_TYPE = 2,
};

/**
@@ -223,19 +223,26 @@ class BaseIO
virtual bool attributeExists(const std::string& path) const = 0;

/**
* @brief Gets the list of objects inside a group.
* @brief Gets the list of storage objects (groups, datasets, attributes)
* inside a group.
*
* This function returns a vector of relative paths of all objects inside
* the specified group. If the input path is not a group (e.g., as dataset
* or attribute or invalid object), then an empty list should be
* returned.
* This function returns the relative paths and storage type of all objects
* inside the specified group. If the input path is an attribute then an empty
* list should be returned. If the input path is a dataset, then only the
* attributes will be returned. Note, this function is not recursive, i.e.,
* it only looks for storage objects associated directly with the given path.
*
* @param path The path to the group.
* @param objectType Define which types of storage object to look for, i.e.,
* only objects of this specified type will be returned.
*
* @return A vector of relative paths of all objects inside the group.
* @return A vector of pairs of relative paths and their corresponding storage
* object types.
*/
virtual std::vector<std::string> getGroupObjects(
const std::string& path) const = 0;
virtual std::vector<std::pair<std::string, StorageObjectType>>
getStorageObjects(const std::string& path,
const StorageObjectType& objectType =
StorageObjectType::Undefined) const = 0;

/**
* @brief Finds all datasets and groups of the given types in the HDF5 file.
53 changes: 49 additions & 4 deletions src/io/hdf5/HDF5IO.cpp
@@ -989,16 +989,61 @@ bool HDF5IO::attributeExists(const std::string& path) const
return (attributePtr != nullptr);
}

std::vector<std::string> HDF5IO::getGroupObjects(const std::string& path) const
std::vector<std::pair<std::string, StorageObjectType>>
HDF5IO::getStorageObjects(const std::string& path,
const StorageObjectType& objectType) const

{
std::vector<std::string> objects;
if (getH5ObjectType(path) == H5O_TYPE_GROUP) {
std::vector<std::pair<std::string, StorageObjectType>> objects;

H5O_type_t h5Type = getH5ObjectType(path);
if (h5Type == H5O_TYPE_GROUP) {
H5::Group group = m_file->openGroup(path);
hsize_t num_objs = group.getNumObjs();
for (hsize_t i = 0; i < num_objs; ++i) {
objects.push_back(group.getObjnameByIdx(i));
std::string objName = group.getObjnameByIdx(i);
H5G_obj_t objType = group.getObjTypeByIdx(i);
StorageObjectType storageObjectType;
switch (objType) {
case H5G_GROUP:
storageObjectType = StorageObjectType::Group;
break;
case H5G_DATASET:
storageObjectType = StorageObjectType::Dataset;
break;
default:
storageObjectType = StorageObjectType::Undefined;
}
if (storageObjectType == objectType
|| objectType == StorageObjectType::Undefined)
{
objects.emplace_back(objName, storageObjectType);
}
}

// Include attributes for groups
if (objectType == StorageObjectType::Attribute
|| objectType == StorageObjectType::Undefined)
{
SizeType numAttrs = group.getNumAttrs();
for (SizeType i = 0; i < numAttrs; ++i) {
H5::Attribute attr = group.openAttribute(i);
objects.emplace_back(attr.getName(), StorageObjectType::Attribute);
}
}
} else if (h5Type == H5O_TYPE_DATASET) {
if (objectType == StorageObjectType::Attribute
|| objectType == StorageObjectType::Undefined)
{
H5::DataSet dataset = m_file->openDataSet(path);
SizeType numAttrs = dataset.getNumAttrs();
for (SizeType i = 0; i < numAttrs; ++i) {
H5::Attribute attr = dataset.openAttribute(i);
objects.emplace_back(attr.getName(), StorageObjectType::Attribute);
}
}
}

return objects;
}
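
The filtering rule used in `getStorageObjects` treats `StorageObjectType::Undefined` as a wildcard: an object is returned if its type equals the requested type, or if the requested type is `Undefined`. A minimal sketch of just that rule, independent of HDF5, is shown below. `StorageObjects`, `filterStorageObjects`, and `makeExampleObjects` are hypothetical names introduced for illustration, and the enum is a local stand-in.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for the project's enum (assumption: values as shown).
enum class StorageObjectType { Group = 0, Dataset = 1, Attribute = 2, Undefined = -1 };

using StorageObjects = std::vector<std::pair<std::string, StorageObjectType>>;

// Keep objects whose type matches objectType; Undefined matches everything.
inline StorageObjects filterStorageObjects(const StorageObjects& all,
                                           StorageObjectType objectType)
{
  StorageObjects result;
  for (const auto& obj : all) {
    if (objectType == StorageObjectType::Undefined || obj.second == objectType) {
      result.push_back(obj);
    }
  }
  return result;
}

// Hypothetical contents of a group: two datasets, one group, one attribute.
inline StorageObjects makeExampleObjects()
{
  StorageObjects all;
  all.emplace_back("data", StorageObjectType::Dataset);
  all.emplace_back("electrodes", StorageObjectType::Dataset);
  all.emplace_back("subgroup", StorageObjectType::Group);
  all.emplace_back("neurodata_type", StorageObjectType::Attribute);
  return all;
}

// Usage: the default (Undefined) returns everything, a specific type filters.
inline void filterExample()
{
  const StorageObjects all = makeExampleObjects();
  assert(filterStorageObjects(all, StorageObjectType::Undefined).size() == 4);
  assert(filterStorageObjects(all, StorageObjectType::Dataset).size() == 2);
}
```

Using `Undefined` as the default argument keeps existing call sites that want every object short, while callers such as `NWBFile::isInitialized` can ask for groups only.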

23 changes: 15 additions & 8 deletions src/io/hdf5/HDF5IO.hpp
@@ -294,19 +294,26 @@ class HDF5IO : public BaseIO
bool attributeExists(const std::string& path) const override;

/**
* @brief Gets the list of objects inside a group.
* @brief Gets the list of storage objects (groups, datasets, attributes)
* inside a group.
*
* This function returns a vector of relative paths of all objects inside
* the specified group. If the input path is not a group (e.g., as dataset
* or attribute or invalid object), then an empty list should be
* returned.
* This function returns the relative paths and storage type of all objects
* inside the specified group. If the input path is an attribute then an empty
* list should be returned. If the input path is a dataset, then only the
* attributes will be returned. Note, this function is not recursive, i.e.,
* it only looks for storage objects associated directly with the given path.
*
* @param path The path to the group.
* @param objectType Define which types of storage object to look for, i.e.,
* only objects of this specified type will be returned.
*
* @return A vector of relative paths of all objects inside the group.
* @return A vector of pairs of relative paths and their corresponding storage
* object types.
*/
std::vector<std::string> getGroupObjects(
const std::string& path) const override;
virtual std::vector<std::pair<std::string, StorageObjectType>>
getStorageObjects(const std::string& path,
const StorageObjectType& objectType =
StorageObjectType::Undefined) const override;

/**
* @brief Returns the HDF5 type of object at a given path.
7 changes: 4 additions & 3 deletions src/nwb/NWBFile.cpp
@@ -66,7 +66,8 @@ Status NWBFile::initialize(const std::string& identifierText,

bool NWBFile::isInitialized() const
{
auto existingGroupObjects = m_io->getGroupObjects("/");
std::vector<std::pair<std::string, StorageObjectType>> existingGroupObjects =
m_io->getStorageObjects("/", StorageObjectType::Group);
if (existingGroupObjects.size() == 0) {
return false;
}
@@ -85,8 +86,8 @@ bool NWBFile::isInitialized() const
// Iterate over the existing objects and add to foundObjects if it's a
// required object
for (const auto& obj : existingGroupObjects) {
if (requiredObjects.find(obj) != requiredObjects.end()) {
foundObjects.insert(obj);
if (requiredObjects.find(obj.first) != requiredObjects.end()) {
foundObjects.insert(obj.first);
}
}

45 changes: 45 additions & 0 deletions src/nwb/RegisteredType.hpp
@@ -198,6 +198,51 @@ class RegisteredType
return (getNamespace() + "::" + getTypeName());
}

/**
* @brief Support reading of arbitrary fields by their relative path
*
* This function is provided as a general "backup" to support reading of
* arbitrary fields even if the sub-class may not have an explicit
* DEFINE_FIELD specified for the field. If a DEFINE_FIELD exists then
* the corresponding read method should be used as it avoids the need
* for specifying most (if not all) of the function and template
* parameters needed by this function.
*
* @param fieldPath The relative path of the field within the current type,
* i.e., relative to `m_path`
* @tparam SOT The storage object type. This must be either
* StorageObjectType::Dataset or StorageObjectType::Attribute
* @tparam VTYPE The value type of the field to be read.
* @tparam Enable SFINAE (Substitution Failure Is Not An Error) mechanism
* to enable this function only if SOT is a Dataset or Attribute.
*
* @return ReadDataWrapper object for lazy reading of the field
*/
template<StorageObjectType SOT,
typename VTYPE,
typename std::enable_if<Types::IsDataStorageObjectType<SOT>::value,
int>::type = 0>
inline std::unique_ptr<IO::ReadDataWrapper<SOT, VTYPE>> readField(
const std::string& fieldPath) const
{
return std::make_unique<IO::ReadDataWrapper<SOT, VTYPE>>(
m_io, AQNWB::mergePaths(m_path, fieldPath));
}

/**
* @brief Read a field that is itself a RegisteredType
*
* @param fieldPath The relative path of the field within the current type,
* i.e., relative to `m_path`. The field must itself be a RegisteredType
*
* @return A shared_ptr to the created instance of the subclass.
*/
inline std::shared_ptr<AQNWB::NWB::RegisteredType> readField(
const std::string& fieldPath)
{
return this->create(AQNWB::mergePaths(m_path, fieldPath), m_io);
}

protected:
/**
* @brief Register a subclass name and its factory function in the registry.
1 change: 1 addition & 0 deletions tests/CMakeLists.txt
@@ -12,6 +12,7 @@ include(Catch)
# ---- Tests ----

add_executable(aqnwb_test
testBaseIO.cpp
testData.cpp
testDevice.cpp
testEcephys.cpp
24 changes: 24 additions & 0 deletions tests/examples/test_ecephys_data_read.cpp
@@ -268,5 +268,29 @@ TEST_CASE("ElectricalSeriesReadExample", "[ecephys]")
readElectricalSeries->readDataUnit()->values().data[0];
REQUIRE(esUnitValue == std::string("volts"));
// [example_read_only_stringattr_snippet]

// [example_read_generic_dataset_field_snippet]
// Read the data field via the generic readField method
auto readElectricalSeriesData3 =
readElectricalSeries->readField<StorageObjectType::Dataset, float>(
std::string("data"));
// Read the data values as usual
DataBlock<float> readDataValues3 = readElectricalSeriesData3->values();
REQUIRE(readDataValues3.data.size() == (numSamples * numChannels));
// [example_read_generic_dataset_field_snippet]

// [example_read_generic_registeredtype_field_snippet]
// read the NWBFile
auto readNWBFile =
NWB::RegisteredType::create<AQNWB::NWB::NWBFile>("/", readio);
// read the ElectricalSeries from the NWBFile object via the readField
// method returning a generic std::shared_ptr<RegisteredType>
auto readRegisteredType = readNWBFile->readField(esdata_path);
// cast the generic pointer to the more specific ElectricalSeries
std::shared_ptr<AQNWB::NWB::ElectricalSeries> readElectricalSeries2 =
std::dynamic_pointer_cast<AQNWB::NWB::ElectricalSeries>(
readRegisteredType);
REQUIRE(readElectricalSeries2 != nullptr);
// [example_read_generic_registeredtype_field_snippet]
}
}