This document provides an in-depth technical overview of the NumSharp library internals, design decisions, and development practices.
- Project Overview
- Core Architecture
- Memory Management
- Type System
- API Layer
- Slicing and Views
- Broadcasting
- Iterator System
- Code Generation
- Development Workflow
- Technical Debt & Known Issues
- Future Roadmap
NumSharp is a .NET port of Python's NumPy library, providing n-dimensional array operations for scientific computing in C#. The library aims to match NumPy's API as closely as possible, including edge cases like NaN handling, multi-type operations, and broadcasting semantics.
- API Compatibility: Match NumPy 2.x API (upgraded from original 1.x target)
- 1-to-1 Behavior: Replicate NumPy behavior exactly, including random state/seed
- Performance: Achieve competitive performance through unmanaged memory and unsafe code
- Ecosystem Integration: Support TensorFlow.NET, ML.NET, and other .NET ML frameworks
NumSharp/
├── src/
│ └── NumSharp.Core/ # Main library
│ ├── APIs/ # np.* static entry points
│ ├── Backends/ # TensorEngine, Storage, Iterators
│ │ ├── Default/ # Pure C# engine implementation
│ │ ├── Unmanaged/ # Memory management
│ │ ├── Iterators/ # NDIterator system
│ │ └── LAPACK/ # Linear algebra bindings
│ ├── Creation/ # np.zeros, np.arange, np.ones, etc.
│ ├── Math/ # np.sum, np.sin, np.log, etc.
│ ├── LinearAlgebra/ # np.dot, np.matmul, np.linalg.*
│ ├── Manipulation/ # reshape, transpose, flatten
│ ├── RandomSampling/ # np.random.*
│ ├── Statistics/ # mean, std, var, argmax
│ ├── Logic/ # np.all, np.any, np.allclose
│ ├── Selection/ # Indexing, slicing, masking
│ ├── Generics/ # NDArray<T> typed wrapper
│ ├── View/ # Shape, Slice, ViewInfo
│ ├── Operations/ # Operator overloads
│ └── Utilities/ # Type helpers, converters
├── test/
│ └── NumSharp.UnitTest/ # MSTest unit tests
├── examples/
│ └── NeuralNetwork.NumSharp/ # Neural network example
└── docs/ # Documentation assets
The library is built on three fundamental classes that work together:
┌─────────────────────────────────────────────────────────┐
│ NDArray │
│ - Public API surface │
│ - Operator overloads (+, -, *, /, indexing) │
│ - References TensorEngine for computations │
├─────────────────────────────────────────────────────────┤
│ UnmanagedStorage │
│ - Holds raw data in unmanaged memory │
│ - Manages ArraySlice<T> for each dtype │
│ - Handles allocation, slicing views, data access │
├─────────────────────────────────────────────────────────┤
│ Shape │
│ - Dimensions and strides │
│ - Coordinate ↔ offset translation │
│ - Slicing, broadcasting, contiguity tracking │
└─────────────────────────────────────────────────────────┘
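The coordinate ↔ offset translation listed under Shape follows standard row-major stride arithmetic. A minimal sketch of that arithmetic (in element units; note that NDArray's `strides` property reports byte strides):

```csharp
// Row-major ("C order") strides: the last dimension varies fastest.
static int[] GetStrides(int[] dims)
{
    var strides = new int[dims.Length];
    int acc = 1;
    for (int i = dims.Length - 1; i >= 0; i--)
    {
        strides[i] = acc;   // elements skipped per unit step in dimension i
        acc *= dims[i];
    }
    return strides;
}

// Flat offset of a coordinate: dot product of indices and strides.
static int GetOffset(int[] coords, int[] strides)
{
    int offset = 0;
    for (int i = 0; i < coords.Length; i++)
        offset += coords[i] * strides[i];
    return offset;
}

// Shape (3, 4) → strides (4, 1); coordinate (1, 2) → offset 1*4 + 2*1 = 6.
```

Slicing and transposing only rewrite the strides and base offset, which is what makes zero-copy views possible.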
NDArray is the primary user-facing class, analogous to numpy.ndarray:
// Key properties
public Shape Shape { get; } // Dimensions
public Type dtype { get; } // Element type
public int ndim { get; } // Number of dimensions
public int size { get; } // Total element count
public int[] strides { get; } // Byte strides per dimension
public UnmanagedStorage Storage { get; }
public TensorEngine TensorEngine { get; }
// Key operations
public NDArray this[string slice] { get; set; } // "1:3, :, -1"
public NDArray reshape(params int[] shape);
public NDArray T { get; } // Transpose
public NDArray astype(Type dtype);

TensorEngine is an abstract class defining all computational operations. This abstraction exists so that alternative backends (GPU, SIMD, MKL) can be supported in the future.
public abstract class TensorEngine
{
// Allocation
public abstract UnmanagedStorage GetStorage(NPTypeCode typeCode);
// Arithmetic
public abstract NDArray Add(in NDArray lhs, in NDArray rhs);
public abstract NDArray Subtract(in NDArray lhs, in NDArray rhs);
public abstract NDArray Multiply(NDArray lhs, NDArray rhs);
public abstract NDArray Divide(in NDArray lhs, in NDArray rhs);
// Reduction
public abstract NDArray ReduceAdd(in NDArray arr, int? axis_, bool keepdims, ...);
public abstract NDArray ReduceArgMax(NDArray arr, int? axis_);
// Unary functions
public abstract NDArray Sqrt(in NDArray nd, NPTypeCode? typeCode);
public abstract NDArray Log(in NDArray nd, NPTypeCode? typeCode);
public abstract NDArray Exp(in NDArray nd, NPTypeCode? typeCode);
// ... 30+ more operations
// Linear algebra
public abstract NDArray Dot(in NDArray x, in NDArray y);
public abstract NDArray Matmul(NDArray lhs, NDArray rhs);
}

The DefaultEngine is the current implementation: micro-optimized pure C# that uses Parallel.For for arrays exceeding 85,000 elements.
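The parallelization strategy can be sketched as follows. This is a simplified illustration, not NumSharp's actual code; the class and method names are invented, and the exact threshold should be treated as an internal detail:

```csharp
using System;
using System.Threading.Tasks;

public static class ElementwiseAdd
{
    // Threshold reported in this document; treat the exact value as internal.
    private const int ParallelThreshold = 85_000;

    public static unsafe void Add(double* lhs, double* rhs, double* result, int length)
    {
        if (length > ParallelThreshold)
        {
            // Pointers cannot be captured by a lambda, so pass them as IntPtr.
            IntPtr l = (IntPtr)lhs, r = (IntPtr)rhs, o = (IntPtr)result;
            Parallel.For(0, length, i =>
            {
                unsafe { ((double*)o)[i] = ((double*)l)[i] + ((double*)r)[i]; }
            });
        }
        else
        {
            // Below the threshold, scheduling overhead outweighs the gain.
            for (int i = 0; i < length; i++)
                result[i] = lhs[i] + rhs[i];
        }
    }
}
```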
NumSharp uses unmanaged memory (raw pointers) rather than managed arrays or Span<T>/Memory<T>. This decision was made ~5 years ago based on extensive benchmarking when Span/Memory were not yet properly supported across the .NET ecosystem.
Benefits:
- Zero-copy slicing (views share underlying memory)
- Direct pointer arithmetic for maximum performance
- No GC pressure for large arrays
- Interop-friendly for native libraries
Trade-offs:
- Requires careful memory management
- Must use unsafe code blocks
- Manual disposal considerations
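To illustrate the disposal trade-off, here is the bare unmanaged allocation cycle in isolation (plain `Marshal` calls; NumSharp's own allocator is more elaborate than this sketch):

```csharp
using System;
using System.Runtime.InteropServices;

// Allocate 1000 doubles outside the GC heap: the GC neither tracks nor frees this.
IntPtr buffer = Marshal.AllocHGlobal(1000 * sizeof(double));
try
{
    unsafe
    {
        double* data = (double*)buffer;
        for (int i = 0; i < 1000; i++)
            data[i] = i;              // raw pointer writes: fast, but no bounds checks
    }
}
finally
{
    Marshal.FreeHGlobal(buffer);      // must be explicit; forgetting it leaks memory
}
```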
public class UnmanagedStorage
{
internal IArraySlice InternalArray; // Type-erased ArraySlice<T>
public Shape Shape { get; }
public Type DType { get; }
public NPTypeCode TypeCode { get; }
public unsafe void* Address { get; } // Raw pointer to data
// Get typed view
public ArraySlice<T> GetData<T>() where T : unmanaged;
// Slicing returns views, not copies
public UnmanagedStorage GetView(params Slice[] slices);
}

The generic ArraySlice<T> wraps unmanaged memory with type safety:
public readonly struct ArraySlice<T> : IArraySlice where T : unmanaged
{
public readonly unsafe T* Address;
public readonly int Count;
public ref T this[int index] { get; }
public Span<T> AsSpan();
}

NumSharp supports 12 primitive types:
| NPTypeCode | C# Type | Size (bytes) |
|---|---|---|
| Boolean | bool | 1 |
| Byte | byte | 1 |
| Int16 | short | 2 |
| UInt16 | ushort | 2 |
| Int32 | int | 4 |
| UInt32 | uint | 4 |
| Int64 | long | 8 |
| UInt64 | ulong | 8 |
| Char | char | 2 |
| Single | float | 4 |
| Double | double | 8 |
| Decimal | decimal | 16 |
To avoid runtime reflection costs, type information is cached statically:
public class InfoOf<T>
{
public static readonly int Size; // Byte size
public static readonly NPTypeCode NPTypeCode;
public static readonly T Zero; // default(T)
public static readonly T MaxValue;
public static readonly T MinValue;
}
// Usage
var size = InfoOf<double>.Size; // 8
var code = InfoOf<float>.NPTypeCode; // NPTypeCode.Single

For type-safe operations, NDArray<T> provides a generic wrapper:
public class NDArray<T> : NDArray where T : unmanaged
{
public new T this[params int[] indices] { get; set; }
public new ArraySlice<T> Array { get; }
public new unsafe T* Address { get; }
public new NDArray<T> this[string slice] { get; }
}
// Usage
NDArray<float> arr = np.zeros<float>(3, 4);
float val = arr[1, 2]; // Direct typed access

The np class is the primary entry point, mirroring Python's import numpy as np:
public static partial class np
{
// Type aliases (matching NumPy)
public static readonly Type float64 = typeof(double);
public static readonly Type float32 = typeof(float);
public static readonly Type int32 = typeof(int);
public static readonly Type int64 = typeof(long);
public static readonly Type bool_ = typeof(bool);
// Constants
public const double pi = Math.PI;
public const double e = Math.E;
public static readonly double nan = double.NaN;
public static readonly double inf = double.PositiveInfinity;
// Random module
public static NumPyRandom random { get; }
// Creation: np.zeros, np.ones, np.arange, np.linspace, etc.
// Math: np.sum, np.mean, np.sin, np.cos, np.exp, np.log, etc.
// Linear algebra: np.dot, np.matmul, np.linalg.*
// Manipulation: np.reshape, np.transpose, np.concatenate, etc.
}

There are two patterns for implementing np.* functions:
Pattern 1: Delegating to TensorEngine
// np.sum delegates to engine
public static NDArray sum(NDArray a, int? axis = null, ...)
{
return a.TensorEngine.Sum(a, axis, typeCode, keepdims);
}

Pattern 2: Composing other np functions
// np.std composes np.mean and other operations
public static NDArray std(NDArray a, int? axis = null, ...)
{
var mean = np.mean(a, axis, keepdims: true);
var diff = a - mean;
var sq = np.power(diff, 2);
return np.sqrt(np.mean(sq, axis, keepdims: keepdims));
}

The Slice class parses and represents Python-style slice notation:
public class Slice
{
public int? Start; // null = from beginning
public int? Stop; // null = to end
public int Step; // default 1
public bool IsIndex; // Single element, reduces dimension
public bool IsEllipsis; // ... fills remaining dimensions
public bool IsNewAxis; // np.newaxis inserts dimension
// Special instances
public static readonly Slice All; // ":"
public static readonly Slice None; // "0:0"
public static readonly Slice Ellipsis; // "..."
public static readonly Slice NewAxis; // "np.newaxis"
// Parsing
public static Slice[] ParseSlices(string notation); // "1:3, :, -1"
}

nd[":"] // All elements
nd["1:5"] // Elements 1-4
nd["::2"] // Every other element
nd["-1"] // Last element (reduces dimension)
nd["1::-1"] // Reverse from index 1
nd[":, 0"] // All rows, first column
nd["..., -1"] // Last element of last dimension

Critical: Slicing returns views, not copies. The view shares memory with the original:
var original = np.arange(10);
var view = original["2:5"]; // View, shares memory
view[0] = 999; // Modifies original[2]!
var copy = original["2:5"].copy(); // Explicit copy

For efficient computation, slices are converted to SliceDef:
public struct SliceDef
{
public int Start; // Absolute start index
public int Step; // Step size (can be negative)
public int Count; // Number of elements (-1 = single index)
// Merge handles recursive slicing
public SliceDef Merge(SliceDef other);
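    // Sketch of the composition arithmetic (an illustration, not the exact code):
    //   merged.Start = Start + other.Start * Step;
    //   merged.Step  = Step * other.Step;
    // e.g. slicing "::2" and then "::3" walks the original data with step 6.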
}

Broadcasting allows operations between arrays of different shapes by virtually expanding dimensions:
var a = np.ones(3, 4); // Shape: (3, 4)
var b = np.ones(4); // Shape: (4,)
var c = a + b; // Broadcasting: b treated as (1, 4) → (3, 4)

public static (Shape, Shape) Broadcast(Shape left, Shape right)
{
// Rules:
// 1. Align shapes from the right
// 2. Dimensions must be equal OR one must be 1
// 3. Dimension of 1 is "stretched" to match
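    // Worked example (illustration):
    //   left = (3, 4), right = (4,)
    //   right-align: (4,) is padded with a leading 1 → (1, 4)
    //   the 1 is stretched to match → both resolve to (3, 4)
    //   (3, 4) vs (5,) would fail: 4 != 5 and neither is 1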
}

For element-wise operations with broadcasting:
public static class MultiIterator
{
// Creates paired iterators with broadcasting
public static (NDIterator, NDIterator) GetIterators(
UnmanagedStorage lhs,
UnmanagedStorage rhs,
bool broadcast);
// Assignment with broadcasting
public static void Assign(NDArray lhs, NDArray rhs);
}

The iterator system handles traversal of arrays with different memory layouts:
public class NDIterator<T> where T : unmanaged
{
public Func<T> MoveNext; // Get next value
public MoveNextReferencedDelegate<T> MoveNextReference; // Get reference
public Func<bool> HasNext; // Check if more elements
public Action Reset; // Reset to beginning
public bool AutoReset; // For broadcasting (smaller array loops)
public IteratorType Type; // Scalar, Vector, Matrix, Tensor
}

public enum IteratorType
{
Scalar, // Single element
Vector, // 1D array
Matrix, // 2D array
Tensor // 3D+ array
}

The iterator chooses different code paths based on:
- Contiguous arrays: Direct pointer increment
- Sliced arrays: Coordinate-to-offset calculation
- Auto-reset mode: For broadcasting smaller arrays
// Contiguous: fast path
MoveNext = () => *((T*)Address + index++);
// Sliced: uses shape.GetOffset
MoveNext = () => *((T*)Address + shape.GetOffset(index++));

NumSharp uses Regen (a custom templating engine) to generate type-specific code. This results in approximately 200,000 lines of generated code.
The pattern appears in many files:
#if _REGEN
#region Compute
switch (typeCode)
{
%foreach supported_dtypes,supported_dtypes_lowercase%
case NPTypeCode.#1: return DoOperation<#2>(arr);
%
default:
throw new NotSupportedException();
}
#endregion
#else
// Generated code follows...
switch (typeCode)
{
case NPTypeCode.Boolean: return DoOperation<bool>(arr);
case NPTypeCode.Byte: return DoOperation<byte>(arr);
case NPTypeCode.Int16: return DoOperation<short>(arr);
// ... all 12 types
}
#endif

Benefits:
- Performance: Avoids boxing and virtual dispatch
- Type safety: Compile-time checks for each type
- NumPy compatibility: Exact type handling behavior
Drawbacks:
- Heavy codebase: 200K lines of generated code
- Maintenance burden: Changes require regeneration
- Compile time: Longer builds
Note: Migration to T4 templates or C# source generators is possible but not currently prioritized.
1. Research NumPy behavior:
   - Read NumPy documentation
   - Run actual Python/NumPy code
   - Document edge cases (NaN, empty arrays, broadcasting)

2. Choose implementation pattern:
   - If it needs low-level optimization → add to DefaultEngine
   - If it can compose existing functions → implement directly in np.*

3. Implement:

   // In np.newfunction.cs
   public static partial class np
   {
       public static NDArray newfunction(NDArray a, int axis = -1)
       {
           // Implementation
       }
   }

4. Write tests:
   - Run NumPy code, capture exact outputs
   - Replicate 1-to-1 in C# tests
   - Include edge cases
Tests should be based on actual NumPy execution:
# Python
import numpy as np
a = np.array([1, 2, np.nan, 4])
result = np.nanmean(a)
print(result) # 2.3333...

// C# test
[TestMethod]
public void nanmean_WithNaN_IgnoresNaN()
{
var a = np.array(new double[] { 1, 2, double.NaN, 4 });
var result = np.nanmean(a);
Assert.AreEqual(2.333333, result.GetDouble(), 0.0001);
}

- Tests use MSTest framework
- Many tests were adapted from NumPy's own test suite
- Coverage is decent but has gaps in edge cases
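As one more example of the NumPy-first workflow, an edge case is first verified in Python (`np.sum(np.array([]))` returns `0.0` in NumPy) and then replicated. The test below is a sketch; it assumes NumSharp accepts an empty array and matches NumPy's behavior here:

```csharp
[TestMethod]
public void sum_EmptyArray_ReturnsZero()
{
    // NumPy reference: np.sum(np.array([])) → 0.0
    var a = np.array(new double[0]);
    var result = np.sum(a);
    Assert.AreEqual(0.0, result.GetDouble());
}
```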
NumSharp provides extensive NumPy-compatible functionality across multiple domains:
Creation: np.array, np.zeros, np.ones, np.empty, np.full, np.arange, np.linspace, np.eye, np.meshgrid, np.mgrid, np.copy, np.asarray, np.frombuffer, np.zeros_like, np.ones_like, np.empty_like, np.full_like
Joining: np.concatenate, np.stack, np.hstack, np.vstack, np.dstack
Broadcasting: np.broadcast, np.broadcast_to, np.broadcast_arrays, np.are_broadcastable
Math: np.sum, np.prod, np.cumsum, np.power, np.sqrt, np.abs, np.sign, np.floor, np.ceil, np.round, np.clip, np.modf, np.maximum, np.minimum, np.log, np.log2, np.log10, np.log1p, np.exp, np.exp2, np.expm1, np.sin, np.cos, np.tan
Statistics: np.mean, np.std, np.var, np.amax, np.amin, np.argmax, np.argmin
Sorting & searching: np.argsort, np.searchsorted
Linear algebra: np.dot, np.matmul, np.outer, np.linalg.norm, nd.inv(), nd.qr(), nd.svd(), nd.lstsq(), nd.multi_dot(), nd.matrix_power()
Manipulation: np.reshape, np.transpose, np.ravel, np.squeeze, np.expand_dims, np.swapaxes, np.moveaxis, np.rollaxis, np.atleast_1d/2d/3d, np.unique, np.repeat, np.copyto, nd.flatten(), nd.roll(), nd.delete()
Logic: np.all, np.any, np.allclose, np.array_equal, np.isnan, np.isinf, np.isfinite, np.find_common_type
Operators:
- Arithmetic: +, -, *, /, %, unary -
- Comparison: ==, !=, >, >=, <, <=
- Logical: &, |, !
Indexing & selection: Integer indexing, string slice notation, Slice objects, boolean masking, fancy indexing (NDArray indices), np.nonzero
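A sketch of these indexing forms in use (exact overloads and return types are simplified, not taken from NumSharp's signatures):

```csharp
var a = np.arange(10);

var x = a[3];                                  // integer indexing
var s = a["2:8:2"];                            // string slice notation
var mask = a > 5;                              // comparison yields a boolean NDArray
var filtered = a[mask];                        // boolean masking
var picked = a[np.array(new[] { 1, 4, 7 })];   // fancy indexing with NDArray indices
var nz = np.nonzero(a);                        // indices of non-zero elements
```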
Random sampling: np.random.rand, np.random.randn, np.random.randint, np.random.uniform, np.random.choice, np.random.shuffle, np.random.permutation, np.random.beta, np.random.binomial, np.random.gamma, np.random.poisson, np.random.exponential, np.random.geometric, np.random.lognormal, np.random.chisquare, np.random.bernoulli
Serialization: np.save (.npy), np.load (.npy, .npz), np.fromfile, nd.tofile()
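A hedged sketch of the serialization round-trip (file paths, argument order, and the dtype parameter are illustrative assumptions):

```csharp
var a = np.arange(12).reshape(3, 4);

np.save("data.npy", a);        // NumPy-compatible .npy format
var b = np.load("data.npy");   // the same file is readable from Python via np.load

// Raw binary without the .npy header (shape and dtype are not stored):
a.tofile("raw.bin");
var c = np.fromfile("raw.bin", np.int32);
```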
1. NumPy 2.x API Mapping: Comprehensive audit of:
   - Existing functions
   - Missing functions
   - Behavioral discrepancies

2. Behavioral Corrections: Fix implementations that diverge from NumPy

3. Documentation: API documentation and examples
- Alternative Backends: GPU (CUDA), SIMD intrinsics, MKL/BLAS
- Source Generator Migration: Replace Regen with C# source generators
- Span/Memory Integration: Where beneficial without breaking changes
The library accepts breaking changes - it was deprecated for an extended period and is being revitalized. API stability is not a constraint.
| Component | Primary Files |
|---|---|
| NDArray | Backends/NDArray.cs, Backends/NDArray.*.cs |
| Storage | Backends/Unmanaged/UnmanagedStorage.cs |
| Shape | View/Shape.cs |
| Slicing | View/Slice.cs |
| TensorEngine | Backends/TensorEngine.cs, Backends/Default/DefaultEngine.*.cs |
| Iterators | Backends/Iterators/NDIterator.cs, MultiIterator.cs |
| np API | APIs/np.cs, individual np.*.cs files |
| Operators | Operations/Elementwise/NDArray.Primitive.cs |
| Type Info | Utilities/InfoOf.cs, Backends/NPTypeCode.cs |
| Random | RandomSampling/np.random.cs, NumPyRandom.cs |
| Generic | Generics/NDArray`1.cs |
When contributing to NumSharp:
- Match NumPy exactly - Run Python code, observe behavior, replicate
- Write tests first - Based on actual NumPy output
- Handle all types - Use Regen patterns or switch statements for all 12 dtypes
- Consider edge cases - NaN, empty arrays, scalar vs array, broadcasting
- Document behavior - Reference NumPy docs in comments
See the test suite for examples of expected behavior patterns.