This document provides an in-depth technical overview of the NumSharp library internals, design decisions, and development practices.
- Project Overview
- Core Architecture
- Memory Management
- Type System
- API Layer
- Slicing and Views
- Broadcasting
- Iterator System
- Code Generation
- Development Workflow
- Technical Debt & Known Issues
- Future Roadmap
NumSharp is a .NET port of Python's NumPy library, providing n-dimensional array operations for scientific computing in C#. The library aims to match NumPy's API as closely as possible, including edge cases like NaN handling, multi-type operations, and broadcasting semantics.
- API Compatibility: Match NumPy 2.x API (upgraded from original 1.x target)
- 1-to-1 Behavior: Replicate NumPy behavior exactly, including random state/seed
- Performance: Achieve competitive performance through unmanaged memory and unsafe code
- Ecosystem Integration: Support TensorFlow.NET, ML.NET, and other .NET ML frameworks
NumSharp/
├── src/
│ └── NumSharp.Core/ # Main library
│ ├── APIs/ # np.* static entry points
│ ├── Backends/ # TensorEngine, Storage, Iterators
│ │ ├── Default/ # Pure C# engine implementation
│ │ ├── Unmanaged/ # Memory management
│ │ ├── Iterators/ # NDIterator system
│ │ └── LAPACK/ # Linear algebra bindings
│ ├── Creation/ # np.zeros, np.arange, np.ones, etc.
│ ├── Math/ # np.sum, np.sin, np.log, etc.
│ ├── LinearAlgebra/ # np.dot, np.matmul, np.linalg.*
│ ├── Manipulation/ # reshape, transpose, flatten
│ ├── RandomSampling/ # np.random.*
│ ├── Statistics/ # mean, std, var, argmax
│ ├── Logic/ # np.all, np.any, np.allclose
│ ├── Selection/ # Indexing, slicing, masking
│ ├── Generics/ # NDArray<T> typed wrapper
│ ├── View/ # Shape, Slice, ViewInfo
│ ├── Operations/ # Operator overloads
│ └── Utilities/ # Type helpers, converters
├── test/
│ └── NumSharp.UnitTest/ # MSTest unit tests
├── examples/
│ └── NeuralNetwork.NumSharp/ # Neural network example
└── docs/ # Documentation assets
The library is built on three fundamental classes that work together:
┌─────────────────────────────────────────────────────────┐
│ NDArray │
│ - Public API surface │
│ - Operator overloads (+, -, *, /, indexing) │
│ - References TensorEngine for computations │
├─────────────────────────────────────────────────────────┤
│ UnmanagedStorage │
│ - Holds raw data in unmanaged memory │
│ - Manages ArraySlice<T> for each dtype │
│ - Handles allocation, slicing views, data access │
├─────────────────────────────────────────────────────────┤
│ Shape │
│ - Dimensions and strides │
│ - Coordinate ↔ offset translation │
│ - Slicing, broadcasting, contiguity tracking │
└─────────────────────────────────────────────────────────┘
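The coordinate ↔ offset translation listed under Shape follows standard row-major stride arithmetic. A minimal sketch of that arithmetic (in element units; note that NDArray's `strides` property reports byte strides):

```csharp
// Row-major ("C order") strides: the last dimension varies fastest.
static int[] GetStrides(int[] dims)
{
    var strides = new int[dims.Length];
    int acc = 1;
    for (int i = dims.Length - 1; i >= 0; i--)
    {
        strides[i] = acc;   // elements skipped per unit step in dimension i
        acc *= dims[i];
    }
    return strides;
}

// Flat offset of a coordinate: dot product of indices and strides.
static int GetOffset(int[] coords, int[] strides)
{
    int offset = 0;
    for (int i = 0; i < coords.Length; i++)
        offset += coords[i] * strides[i];
    return offset;
}

// Shape (3, 4) → strides (4, 1); coordinate (1, 2) → offset 1*4 + 2*1 = 6.
```

Slicing and transposing only rewrite the strides and base offset, which is what makes zero-copy views possible.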
NDArray is the primary user-facing class, analogous to numpy.ndarray:
// Key properties
public Shape Shape { get; } // Dimensions
public Type dtype { get; } // Element type
public int ndim { get; } // Number of dimensions
public int size { get; } // Total element count
public int[] strides { get; } // Byte strides per dimension
public UnmanagedStorage Storage { get; }
public TensorEngine TensorEngine { get; }
// Key operations
public NDArray this[string slice] { get; set; } // "1:3, :, -1"
public NDArray reshape(params int[] shape);
public NDArray T { get; } // Transpose
public NDArray astype(Type dtype);

TensorEngine is an abstract class defining all computational operations. This abstraction exists so that alternative backends (GPU, SIMD, MKL) can be supported in the future.
public abstract class TensorEngine
{
// Allocation
public abstract UnmanagedStorage GetStorage(NPTypeCode typeCode);
// Arithmetic
public abstract NDArray Add(in NDArray lhs, in NDArray rhs);
public abstract NDArray Subtract(in NDArray lhs, in NDArray rhs);
public abstract NDArray Multiply(NDArray lhs, NDArray rhs);
public abstract NDArray Divide(in NDArray lhs, in NDArray rhs);
// Reduction
public abstract NDArray ReduceAdd(in NDArray arr, int? axis_, bool keepdims, ...);
public abstract NDArray ReduceArgMax(NDArray arr, int? axis_);
// Unary functions
public abstract NDArray Sqrt(in NDArray nd, NPTypeCode? typeCode);
public abstract NDArray Log(in NDArray nd, NPTypeCode? typeCode);
public abstract NDArray Exp(in NDArray nd, NPTypeCode? typeCode);
// ... 30+ more operations
// Linear algebra
public abstract NDArray Dot(in NDArray x, in NDArray y);
public abstract NDArray Matmul(NDArray lhs, NDArray rhs);
}

The DefaultEngine is the current implementation: micro-optimized pure C# that uses Parallel.For for arrays exceeding 85,000 elements.
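The parallelization strategy can be sketched as follows. This is a simplified illustration, not NumSharp's actual code; the class and method names are invented, and the exact threshold should be treated as an internal detail:

```csharp
using System;
using System.Threading.Tasks;

public static class ElementwiseAdd
{
    // Threshold reported in this document; treat the exact value as internal.
    private const int ParallelThreshold = 85_000;

    public static unsafe void Add(double* lhs, double* rhs, double* result, int length)
    {
        if (length > ParallelThreshold)
        {
            // Pointers cannot be captured by a lambda, so pass them as IntPtr.
            IntPtr l = (IntPtr)lhs, r = (IntPtr)rhs, o = (IntPtr)result;
            Parallel.For(0, length, i =>
            {
                unsafe { ((double*)o)[i] = ((double*)l)[i] + ((double*)r)[i]; }
            });
        }
        else
        {
            // Below the threshold, scheduling overhead outweighs the gain.
            for (int i = 0; i < length; i++)
                result[i] = lhs[i] + rhs[i];
        }
    }
}
```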
NumSharp uses unmanaged memory (raw pointers) rather than managed arrays or Span<T>/Memory<T>. This decision was made ~5 years ago based on extensive benchmarking when Span/Memory were not yet properly supported across the .NET ecosystem.
Benefits:
- Zero-copy slicing (views share underlying memory)
- Direct pointer arithmetic for maximum performance
- No GC pressure for large arrays
- Interop-friendly for native libraries
Trade-offs:
- Requires careful memory management
- Must use unsafe code blocks
- Manual disposal considerations
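To illustrate the disposal trade-off, here is the bare unmanaged allocation cycle in isolation (plain `Marshal` calls; NumSharp's own allocator is more elaborate than this sketch):

```csharp
using System;
using System.Runtime.InteropServices;

// Allocate 1000 doubles outside the GC heap: the GC neither tracks nor frees this.
IntPtr buffer = Marshal.AllocHGlobal(1000 * sizeof(double));
try
{
    unsafe
    {
        double* data = (double*)buffer;
        for (int i = 0; i < 1000; i++)
            data[i] = i;              // raw pointer writes: fast, but no bounds checks
    }
}
finally
{
    Marshal.FreeHGlobal(buffer);      // must be explicit; forgetting it leaks memory
}
```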
public class UnmanagedStorage
{
internal IArraySlice InternalArray; // Type-erased ArraySlice<T>
public Shape Shape { get; }
public Type DType { get; }
public NPTypeCode TypeCode { get; }
public unsafe void* Address { get; } // Raw pointer to data
// Get typed view
public ArraySlice<T> GetData<T>() where T : unmanaged;
// Slicing returns views, not copies
public UnmanagedStorage GetView(params Slice[] slices);
}

The generic ArraySlice<T> wraps unmanaged memory with type safety:
public readonly struct ArraySlice<T> : IArraySlice where T : unmanaged
{
public readonly unsafe T* Address;
public readonly int Count;
public ref T this[int index] { get; }
public Span<T> AsSpan();
}

NumSharp supports 12 primitive types:
| NPTypeCode | C# Type | Size (bytes) |
|---|---|---|
| Boolean | bool | 1 |
| Byte | byte | 1 |
| Int16 | short | 2 |
| UInt16 | ushort | 2 |
| Int32 | int | 4 |
| UInt32 | uint | 4 |
| Int64 | long | 8 |
| UInt64 | ulong | 8 |
| Char | char | 2 |
| Single | float | 4 |
| Double | double | 8 |
| Decimal | decimal | 16 |
To avoid runtime reflection costs, type information is cached statically:
public class InfoOf<T>
{
public static readonly int Size; // Byte size
public static readonly NPTypeCode NPTypeCode;
public static readonly T Zero; // default(T)
public static readonly T MaxValue;
public static readonly T MinValue;
}
// Usage
var size = InfoOf<double>.Size; // 8
var code = InfoOf<float>.NPTypeCode; // NPTypeCode.Single

For type-safe operations, NDArray<T> provides a generic wrapper:
public class NDArray<T> : NDArray where T : unmanaged
{
public new T this[params int[] indices] { get; set; }
public new ArraySlice<T> Array { get; }
public new unsafe T* Address { get; }
public new NDArray<T> this[string slice] { get; }
}
// Usage
NDArray<float> arr = np.zeros<float>(3, 4);
float val = arr[1, 2]; // Direct typed access

The np class is the primary entry point, mirroring Python's import numpy as np:
public static partial class np
{
// Type aliases (matching NumPy)
public static readonly Type float64 = typeof(double);
public static readonly Type float32 = typeof(float);
public static readonly Type int32 = typeof(int);
public static readonly Type int64 = typeof(long);
public static readonly Type bool_ = typeof(bool);
// Constants
public const double pi = Math.PI;
public const double e = Math.E;
public static readonly double nan = double.NaN;
public static readonly double inf = double.PositiveInfinity;
// Random module
public static NumPyRandom random { get; }
// Creation: np.zeros, np.ones, np.arange, np.linspace, etc.
// Math: np.sum, np.mean, np.sin, np.cos, np.exp, np.log, etc.
// Linear algebra: np.dot, np.matmul, np.linalg.*
// Manipulation: np.reshape, np.transpose, np.concatenate, etc.
}

There are two patterns for implementing np.* functions:
Pattern 1: Delegating to TensorEngine
// np.sum delegates to engine
public static NDArray sum(NDArray a, int? axis = null, ...)
{
return a.TensorEngine.Sum(a, axis, typeCode, keepdims);
}

Pattern 2: Composing other np functions
// np.std composes np.mean and other operations
public static NDArray std(NDArray a, int? axis = null, ...)
{
var mean = np.mean(a, axis, keepdims: true);
var diff = a - mean;
var sq = np.power(diff, 2);
return np.sqrt(np.mean(sq, axis, keepdims: keepdims));
}

The Slice class parses and represents Python-style slice notation:
public class Slice
{
public int? Start; // null = from beginning
public int? Stop; // null = to end
public int Step; // default 1
public bool IsIndex; // Single element, reduces dimension
public bool IsEllipsis; // ... fills remaining dimensions
public bool IsNewAxis; // np.newaxis inserts dimension
// Special instances
public static readonly Slice All; // ":"
public static readonly Slice None; // "0:0"
public static readonly Slice Ellipsis; // "..."
public static readonly Slice NewAxis; // "np.newaxis"
// Parsing
public static Slice[] ParseSlices(string notation); // "1:3, :, -1"
}

nd[":"] // All elements
nd["1:5"] // Elements 1-4
nd["::2"] // Every other element
nd["-1"] // Last element (reduces dimension)
nd["1::-1"] // Reverse from index 1
nd[":, 0"] // All rows, first column
nd["..., -1"] // Last element of last dimension

Critical: Slicing returns views, not copies. The view shares memory with the original:
var original = np.arange(10);
var view = original["2:5"]; // View, shares memory
view[0] = 999; // Modifies original[2]!
var copy = original["2:5"].copy(); // Explicit copy

For efficient computation, slices are converted to SliceDef:
public struct SliceDef
{
public int Start; // Absolute start index
public int Step; // Step size (can be negative)
public int Count; // Number of elements (-1 = single index)
// Merge handles recursive slicing
public SliceDef Merge(SliceDef other);
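    // Sketch of the composition arithmetic (an illustration, not the exact code):
    //   merged.Start = Start + other.Start * Step;
    //   merged.Step  = Step * other.Step;
    // e.g. slicing "::2" and then "::3" walks the original data with step 6.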
}

Broadcasting allows operations between arrays of different shapes by virtually expanding dimensions:
var a = np.ones(3, 4); // Shape: (3, 4)
var b = np.ones(4); // Shape: (4,)
var c = a + b; // Broadcasting: b treated as (1, 4) → (3, 4)

public static (Shape, Shape) Broadcast(Shape left, Shape right)
{
// Rules:
// 1. Align shapes from the right
// 2. Dimensions must be equal OR one must be 1
// 3. Dimension of 1 is "stretched" to match
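    // Worked example (illustration):
    //   left = (3, 4), right = (4,)
    //   right-align: (4,) is padded with a leading 1 → (1, 4)
    //   the 1 is stretched to match → both resolve to (3, 4)
    //   (3, 4) vs (5,) would fail: 4 != 5 and neither is 1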
}

For element-wise operations with broadcasting:
public static class MultiIterator
{
// Creates paired iterators with broadcasting
public static (NDIterator, NDIterator) GetIterators(
UnmanagedStorage lhs,
UnmanagedStorage rhs,
bool broadcast);
// Assignment with broadcasting
public static void Assign(NDArray lhs, NDArray rhs);
}

The iterator system handles traversal of arrays with different memory layouts:
public class NDIterator<T> where T : unmanaged
{
public Func<T> MoveNext; // Get next value
public MoveNextReferencedDelegate<T> MoveNextReference; // Get reference
public Func<bool> HasNext; // Check if more elements
public Action Reset; // Reset to beginning
public bool AutoReset; // For broadcasting (smaller array loops)
public IteratorType Type; // Scalar, Vector, Matrix, Tensor
}

public enum IteratorType
{
Scalar, // Single element
Vector, // 1D array
Matrix, // 2D array
Tensor // 3D+ array
}

The iterator chooses different code paths based on:
- Contiguous arrays: Direct pointer increment
- Sliced arrays: Coordinate-to-offset calculation
- Auto-reset mode: For broadcasting smaller arrays
// Contiguous: fast path
MoveNext = () => *((T*)Address + index++);
// Sliced: uses shape.GetOffset
MoveNext = () => *((T*)Address + shape.GetOffset(index++));

NumSharp uses Regen (a custom templating engine) to generate type-specific code. This results in approximately 200,000 lines of generated code.
The pattern appears in many files:
#if _REGEN
#region Compute
switch (typeCode)
{
%foreach supported_dtypes,supported_dtypes_lowercase%
case NPTypeCode.#1: return DoOperation<#2>(arr);
%
default:
throw new NotSupportedException();
}
#endregion
#else
// Generated code follows...
switch (typeCode)
{
case NPTypeCode.Boolean: return DoOperation<bool>(arr);
case NPTypeCode.Byte: return DoOperation<byte>(arr);
case NPTypeCode.Int16: return DoOperation<short>(arr);
// ... all 12 types
}
#endif

Benefits:
- Performance: Avoids boxing and virtual dispatch
- Type safety: Compile-time checks for each type
- NumPy compatibility: Exact type handling behavior
Drawbacks:
- Heavy codebase: 200K lines of generated code
- Maintenance burden: Changes require regeneration
- Compile time: Longer builds
Note: Migration to T4 templates or C# source generators is possible but not currently prioritized.
1. Research NumPy behavior:
   - Read NumPy documentation
   - Run actual Python/NumPy code
   - Document edge cases (NaN, empty arrays, broadcasting)

2. Choose implementation pattern:
   - If it needs low-level optimization → add to DefaultEngine
   - If it can compose existing functions → implement directly in np.*

3. Implement:

   // In np.newfunction.cs
   public static partial class np
   {
       public static NDArray newfunction(NDArray a, int axis = -1)
       {
           // Implementation
       }
   }

4. Write tests:
   - Run NumPy code, capture exact outputs
   - Replicate 1-to-1 in C# tests
   - Include edge cases
Tests should be based on actual NumPy execution:
# Python
import numpy as np
a = np.array([1, 2, np.nan, 4])
result = np.nanmean(a)
print(result) # 2.3333...

// C# test
[TestMethod]
public void nanmean_WithNaN_IgnoresNaN()
{
var a = np.array(new double[] { 1, 2, double.NaN, 4 });
var result = np.nanmean(a);
Assert.AreEqual(2.333333, result.GetDouble(), 0.0001);
}

- Tests use MSTest framework
- Many tests were adapted from NumPy's own test suite
- Coverage is decent but has gaps in edge cases
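As one more example of the NumPy-first workflow, an edge case is first verified in Python (`np.sum(np.array([]))` returns `0.0` in NumPy) and then replicated. The test below is a sketch; it assumes NumSharp accepts an empty array and matches NumPy's behavior here:

```csharp
[TestMethod]
public void sum_EmptyArray_ReturnsZero()
{
    // NumPy reference: np.sum(np.array([])) → 0.0
    var a = np.array(new double[0]);
    var result = np.sum(a);
    Assert.AreEqual(0.0, result.GetDouble());
}
```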
NumSharp provides extensive NumPy-compatible functionality across multiple domains:
Creation: np.array, np.zeros, np.ones, np.empty, np.full, np.arange, np.linspace, np.eye, np.meshgrid, np.mgrid, np.copy, np.asarray, np.frombuffer, np.zeros_like, np.ones_like, np.empty_like, np.full_like
Joining: np.concatenate, np.stack, np.hstack, np.vstack, np.dstack
Broadcasting: np.broadcast, np.broadcast_to, np.broadcast_arrays, np.are_broadcastable
Math: np.sum, np.prod, np.cumsum, np.power, np.sqrt, np.abs, np.sign, np.floor, np.ceil, np.round, np.clip, np.modf, np.maximum, np.minimum, np.log, np.log2, np.log10, np.log1p, np.exp, np.exp2, np.expm1, np.sin, np.cos, np.tan
Statistics: np.mean, np.std, np.var, np.amax, np.amin, np.argmax, np.argmin
Sorting & searching: np.argsort, np.searchsorted
Linear algebra: np.dot, np.matmul, np.outer, np.linalg.norm, nd.inv(), nd.qr(), nd.svd(), nd.lstsq(), nd.multi_dot(), nd.matrix_power()
Manipulation: np.reshape, np.transpose, np.ravel, np.squeeze, np.expand_dims, np.swapaxes, np.moveaxis, np.rollaxis, np.atleast_1d/2d/3d, np.unique, np.repeat, np.copyto, nd.flatten(), nd.roll(), nd.delete()
Logic: np.all, np.any, np.allclose, np.array_equal, np.isnan, np.isinf, np.isfinite, np.find_common_type
Operators:
- Arithmetic: +, -, *, /, %, unary -
- Comparison: ==, !=, >, >=, <, <=
- Logical: &, |, !
Indexing & selection: Integer indexing, string slice notation, Slice objects, boolean masking, fancy indexing (NDArray indices), np.nonzero
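A sketch of these indexing forms in use (exact overloads and return types are simplified, not taken from NumSharp's signatures):

```csharp
var a = np.arange(10);

var x = a[3];                                  // integer indexing
var s = a["2:8:2"];                            // string slice notation
var mask = a > 5;                              // comparison yields a boolean NDArray
var filtered = a[mask];                        // boolean masking
var picked = a[np.array(new[] { 1, 4, 7 })];   // fancy indexing with NDArray indices
var nz = np.nonzero(a);                        // indices of non-zero elements
```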
Random sampling: np.random.rand, np.random.randn, np.random.randint, np.random.uniform, np.random.choice, np.random.shuffle, np.random.permutation, np.random.beta, np.random.binomial, np.random.gamma, np.random.poisson, np.random.exponential, np.random.geometric, np.random.lognormal, np.random.chisquare, np.random.bernoulli
Serialization: np.save (.npy), np.load (.npy, .npz), np.fromfile, nd.tofile()
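A hedged sketch of the serialization round-trip (file paths, argument order, and the dtype parameter are illustrative assumptions):

```csharp
var a = np.arange(12).reshape(3, 4);

np.save("data.npy", a);        // NumPy-compatible .npy format
var b = np.load("data.npy");   // the same file is readable from Python via np.load

// Raw binary without the .npy header (shape and dtype are not stored):
a.tofile("raw.bin");
var c = np.fromfile("raw.bin", np.int32);
```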
1. NumPy 2.x API Mapping: Comprehensive audit of:
   - Existing functions
   - Missing functions
   - Behavioral discrepancies

2. Behavioral Corrections: Fix implementations that diverge from NumPy

3. Documentation: API documentation and examples
- Alternative Backends: GPU (CUDA), SIMD intrinsics, MKL/BLAS
- Source Generator Migration: Replace Regen with C# source generators
- Span/Memory Integration: Where beneficial without breaking changes
The library accepts breaking changes - it was deprecated for an extended period and is being revitalized. API stability is not a constraint.
| Component | Primary Files |
|---|---|
| NDArray | Backends/NDArray.cs, Backends/NDArray.*.cs |
| Storage | Backends/Unmanaged/UnmanagedStorage.cs |
| Shape | View/Shape.cs |
| Slicing | View/Slice.cs |
| TensorEngine | Backends/TensorEngine.cs, Backends/Default/DefaultEngine.*.cs |
| Iterators | Backends/Iterators/NDIterator.cs, MultiIterator.cs |
| np API | APIs/np.cs, individual np.*.cs files |
| Operators | Operations/Elementwise/NDArray.Primitive.cs |
| Type Info | Utilities/InfoOf.cs, Backends/NPTypeCode.cs |
| Random | RandomSampling/np.random.cs, NumPyRandom.cs |
| Generic | Generics/NDArray`1.cs |
When contributing to NumSharp:
- Match NumPy exactly - Run Python code, observe behavior, replicate
- Write tests first - Based on actual NumPy output
- Handle all types - Use Regen patterns or switch statements for all 12 dtypes
- Consider edge cases - NaN, empty arrays, scalar vs array, broadcasting
- Document behavior - Reference NumPy docs in comments
See the test suite for examples of expected behavior patterns.