
[loaders-np*] support 2d matrices #2724


Open
wants to merge 8 commits into
base: develop

Conversation

maxfl
Contributor

@maxfl maxfl commented Mar 18, 2025

Modifying existing loaders.

  • npy/npz: show all columns, even unnamed ones, and display 1d arrays properly.
  • npy/npz: can now load 2d matrices.
  • npy/npz/hdf5: can optionally enumerate matrix rows and columns.
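
To make the last bullet concrete, here is a minimal sketch of what enumerating a matrix's rows and columns could look like. The helper `matrix_to_rows` and its `enumerate_axes` flag are hypothetical names chosen for illustration and are not taken from this PR's code; the actual loaders may label rows and columns differently.

```python
import numpy as np


def matrix_to_rows(matrix, enumerate_axes=False):
    """Hypothetical helper: turn a 2d array into (header, rows) for display."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2:
        raise ValueError(f"expected a 2d array, got {matrix.ndim}d")
    nrows, ncols = matrix.shape
    if enumerate_axes:
        # Prepend a row-index column and name the columns by their index.
        header = ["row"] + [str(j) for j in range(ncols)]
        rows = [[i] + matrix[i].tolist() for i in range(nrows)]
    else:
        # Leave the columns unnamed and pass the values through unchanged.
        header = [""] * ncols
        rows = matrix.tolist()
    return header, rows


header, rows = matrix_to_rows(np.arange(6).reshape(2, 3), enumerate_axes=True)
print(header)
print(rows)
```

With `enumerate_axes=False` the same call returns the bare values, which roughly corresponds to leaving the enumeration option off.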

I've tested the performance of the modified features on the files I have.

Where should the documentation on the introduced configuration options go? For example, the currently available npy_allow_pickle option is documented only as a command-line argument.

@maxfl
Contributor Author

maxfl commented Mar 18, 2025

Oh, I see: I need to avoid new language features, e.g. match statements.
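
For context, `match` statements require Python 3.10 or later, so code that has to run on older interpreters needs an equivalent `if`/`elif` chain. The sketch below is illustrative only: `describe_shape` is a hypothetical function, not code from this PR.

```python
def describe_shape(shape):
    """Hypothetical example: classify an array shape without `match`."""
    # The Python 3.10+ form would be:
    #     match len(shape):
    #         case 1: ...
    #         case 2: ...
    #         case _: ...
    ndim = len(shape)
    if ndim == 1:
        return "1d array"
    elif ndim == 2:
        return "2d matrix"
    else:
        return f"{ndim}d array (unsupported)"


print(describe_shape((15,)))
print(describe_shape((15, 3)))
```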

@maxfl
Contributor Author

maxfl commented Mar 18, 2025

I see the pipeline fails, but this time I'm not sure that this is my fault.

@maxfl
Contributor Author

maxfl commented Mar 18, 2025

Ok, it is mine. I will try to figure out how to run tests locally and will update.

@anjakefala
Collaborator

> Where should the documentation on the introduced configuration options go? For example, the currently available npy_allow_pickle option is documented only as a command-line argument.

Ideally, we'd add a guide to visidata:visidata/guides. If you have the energy to, I'd love it if you tried adding one. I could help you with that process.

Ideally, we'd also add a sample file to test loading on. Could you add one to visidata:sample_data and I'll prep the test?

@maxfl
Contributor Author

maxfl commented Mar 27, 2025

@anjakefala, thank you. I will try to provide a guide and will prepare a sample file. I can't promise to do it quickly, though (:

@maxfl
Contributor Author

maxfl commented Apr 14, 2025

@anjakefala, I've provided the sample data files arrays.npz and arrays.hdf5. They contain 1d, 2d, and structured arrays of 1/2/4/8-byte float, integer, and unsigned integer data.

The files should be tested with the following options set to both True and False:

  • hdf5_matrix_enumerate for hdf5.
  • npy_matrix_enumerate for npy/npz.

The script, based on your previous version, is below:

#!/usr/bin/env python
"""Generate the arrays.npz and arrays.hdf5 sample files."""

import h5py
import numpy as np
import numpy.typing as npt

shape_2d = (15, 3)
size = int(np.prod(shape_2d))

# One structured field per dtype: unsigned, signed and float, 1/2/4/8 bytes wide.
structured_dtype = []
for typechar in ("u", "i", "f"):
    for nbytes in (1, 2, 4, 8):
        odtype = f"{typechar}{nbytes}"
        if odtype == "f1":
            # NumPy has no 1-byte float type.
            continue
        structured_dtype.append((odtype, odtype))

# Fractions in [0, 0.99] that are scaled to each dtype's value range below.
array0 = np.linspace(0, 0.99, size, dtype="d")
structured_data = np.zeros(size, dtype=structured_dtype)
output_data: dict[str, npt.NDArray] = {"structured": structured_data}
for _, odtype in structured_dtype:
    print()
    print(odtype)

    try:
        tinfo = np.finfo(odtype)
    except ValueError:
        # Not a float dtype: fall back to the integer limits.
        tinfo = np.iinfo(odtype)

    nmin, nmax = tinfo.min, tinfo.max
    if odtype[0] in "ui":
        span = float(nmax) - float(nmin)
        array1d = (nmin + array0 * span).astype(odtype)
    else:
        # The full span (nmax - nmin) of f8 is twice the largest float value
        # and would overflow, so build the positive half and mirror it.
        hspan = float((int(nmax) - int(nmin)) // 2)
        template = (0.0 + array0 * hspan).astype(odtype)[::2]
        nhalf = size // 2
        array1d = np.zeros(size, dtype=odtype)
        array1d[: nhalf + 1] = -template[::-1]
        array1d[nhalf:] = template
    array2d = array1d.reshape(shape_2d)
    print(array1d)
    print(array2d)

    structured_data[odtype] = array1d
    output_data[f"{odtype}_1d"] = array1d
    output_data[f"{odtype}_2d"] = array2d

np.savez("arrays.npz", **output_data)
print("Write: arrays.npz")

# The context manager closes the file even if a dataset fails to write.
with h5py.File("arrays.hdf5", "w") as ofile:
    for key, dataset in output_data.items():
        ofile.create_dataset(key, data=dataset)
print("Write: arrays.hdf5")
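
One way to check a generated archive before opening it in VisiData is to list the dtype and shape of every array it contains. This is a minimal sketch, not part of the PR: `summarize_npz` is a hypothetical helper, and the demo writes a small stand-in file to a temporary directory so that it runs on its own. Pass the path to arrays.npz to inspect the real sample file.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {name: (dtype as string, shape)} for every array in an .npz file."""
    with np.load(path) as archive:
        summary = {}
        for name in archive.files:
            array = archive[name]
            summary[name] = (str(array.dtype), array.shape)
        return summary


# Stand-in data for the demo; the names mimic the keys used by the script.
with tempfile.TemporaryDirectory() as tmpdir:
    demo_path = os.path.join(tmpdir, "demo.npz")
    np.savez(
        demo_path,
        u1_1d=np.arange(6, dtype="u1"),
        f4_2d=np.zeros((2, 3), dtype="f4"),
    )
    summary = summarize_npz(demo_path)

for name, (dtype, shape) in summary.items():
    print(name, dtype, shape)
```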
