Add visitor pattern section #591

Open · haianhng31 wants to merge 13 commits into `G-Research:master` from `haianhng31:visitorpatterns` (+233 −45)

Changes to `docs/guides/VisitorPatterns.md` (new file, 228 lines):

# Visitor patterns: reading & writing with unknown column types

ParquetSharp exposes several "visitor" interfaces that let you read or write columns whose concrete types you don't know at compile time. You implement a generic method, and ParquetSharp calls it with the column's actual element type as the type argument, so the code inside the method stays type-safe.
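The mechanism behind these interfaces is double dispatch through a generic method: a non-generic column object exposes `Apply`, and its generic subclass, which knows the element type, calls back into the visitor's generic method with that type. The sketch below models the idea in plain C# so it runs without ParquetSharp. `DemoColumn`, `DemoColumn<TValue>`, `IDemoColumnVisitor<TReturn>` and `DescribeVisitor` are hypothetical names, not ParquetSharp types, and the sketch is a simplified model rather than ParquetSharp's actual implementation.

```csharp
using System;

// Hypothetical stand-ins (not ParquetSharp types) that model how a visitor is dispatched
public interface IDemoColumnVisitor<out TReturn>
{
    // Called with the column's concrete element type as TValue
    TReturn OnColumn<TValue>(DemoColumn<TValue> column);
}

// Non-generic base: code holding a DemoColumn does not need to know the element type
public abstract class DemoColumn
{
    protected DemoColumn(string name) => Name = name;

    public string Name { get; }

    public abstract TReturn Apply<TReturn>(IDemoColumnVisitor<TReturn> visitor);
}

// Generic subclass: it knows TValue, so it can invoke the visitor's generic method with it
public sealed class DemoColumn<TValue> : DemoColumn
{
    public DemoColumn(string name, TValue[] values) : base(name) => Values = values;

    public TValue[] Values { get; }

    public override TReturn Apply<TReturn>(IDemoColumnVisitor<TReturn> visitor) => visitor.OnColumn(this);
}

// A visitor that describes any column, whatever its element type
public sealed class DescribeVisitor : IDemoColumnVisitor<string>
{
    public string OnColumn<TValue>(DemoColumn<TValue> column) =>
        $"{column.Name}: {typeof(TValue).Name}[{column.Values.Length}]";
}

// Usage:
// DemoColumn[] columns = { new DemoColumn<int>("Id", new[] { 1, 2, 3 }), new DemoColumn<string>("Name", new[] { "Alice", "Bob" }) };
// foreach (var column in columns) Console.WriteLine(column.Apply(new DescribeVisitor()));
// prints "Id: Int32[3]" then "Name: String[2]"
```

The caller only ever holds the non-generic `DemoColumn`, yet `DescribeVisitor` runs with the concrete `TValue`; that is the property the ParquetSharp visitors provide.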

## ILogicalColumnWriterVisitor<TReturn>

The @ParquetSharp.ILogicalColumnWriterVisitor`1 interface is used with logical (high-level, typed) column writers: `LogicalColumnWriter.Apply` calls its `OnLogicalColumnWriter<TValue>` method with the column's element type. Use it when you need to write data to columns but don't know the column types at compile time.

### Example: Generic column writer

```csharp
using System;
using System.Collections.Generic;
using ParquetSharp;

// A visitor that writes a pre-built array of values to any column, whatever its element type
sealed class GenericColumnWriter : ILogicalColumnWriterVisitor<bool>
{
    private readonly IDictionary<string, Array> _valuesByColumn;

    public GenericColumnWriter(IDictionary<string, Array> valuesByColumn)
    {
        _valuesByColumn = valuesByColumn;
    }

    public bool OnLogicalColumnWriter<TValue>(LogicalColumnWriter<TValue> columnWriter)
    {
        // Look up the values for this column by its dotted path ("Id" for a top-level column)
        if (!_valuesByColumn.TryGetValue(columnWriter.ColumnDescriptor.Path.ToDotString(), out var raw))
        {
            return false;
        }

        // Cast through object to TValue[] for WriteBatch; this throws InvalidCastException
        // if the stored array does not match the column's element type
        var values = (TValue[])(object)raw;
        columnWriter.WriteBatch(values);
        return true;
    }
}

// Usage
var valuesByColumn = new Dictionary<string, Array>
{
    { "Id", new[] { 1, 2, 3 } },
    { "Name", new[] { "Alice", "Bob", "Carol" } },
    { "Price", new[] { 9.99, 12.50, 5.75 } }
};

// columnWriter is the ColumnWriter returned by RowGroupWriter.NextColumn()
using var logicalWriter = columnWriter.LogicalWriter();
var success = logicalWriter.Apply(new GenericColumnWriter(valuesByColumn));
```

#### Casting arrays safely

The `(TValue[])(object)raw` cast is safe when the stored array's element type matches the `TValue` that ParquetSharp resolves for the column; an incompatible array makes the cast throw an `InvalidCastException` at runtime. Make sure each stored array matches the column's logical type, including nullability: an optional `int32` column, for example, is exposed as `int?`, so it needs an `int?[]` rather than an `int[]`.
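The bare cast reports a type mismatch without saying which column was wrong. A minimal sketch of a guard that checks with a type pattern and names the column in its error is shown here; `ColumnValues.As<TValue>` is a hypothetical helper, not part of ParquetSharp, meant to replace the direct cast inside `OnLogicalColumnWriter`.

```csharp
using System;

// Hypothetical helper (not part of ParquetSharp) that validates a stored array
// before it is passed to LogicalColumnWriter<TValue>.WriteBatch
public static class ColumnValues
{
    public static TValue[] As<TValue>(string columnName, Array raw)
    {
        // A type pattern instead of a hard cast, so a mismatch can report the column name
        if (raw is TValue[] values)
        {
            return values;
        }

        throw new InvalidOperationException(
            $"Column '{columnName}' expects {typeof(TValue)}[] but the stored array is {raw.GetType()}.");
    }
}

// Inside OnLogicalColumnWriter<TValue>:
// var values = ColumnValues.As<TValue>(columnWriter.ColumnDescriptor.Path.ToDotString(), raw);
// columnWriter.WriteBatch(values);
```

This is a runtime check, so it inherits CLR array variance: a few mismatches, such as an `int[]` checked against `uint[]` or a `string[]` against `object[]`, still pass.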

### Example: Conditional writer based on type

```csharp
using System;
using ParquetSharp;

// A visitor that writes one row: numeric columns get a fill value, all other columns get default(TValue)
sealed class NumericOnlyWriter : ILogicalColumnWriterVisitor<bool>
{
    private readonly double _fillValue;

    public NumericOnlyWriter(double fillValue) => _fillValue = fillValue;

    public bool OnLogicalColumnWriter<TValue>(LogicalColumnWriter<TValue> columnWriter)
    {
        // Optional columns use nullable element types (e.g. int?), so test the underlying type
        var targetType = Nullable.GetUnderlyingType(typeof(TValue)) ?? typeof(TValue);

        TValue val;
        if (targetType == typeof(int) ||
            targetType == typeof(double) ||
            targetType == typeof(float) ||
            targetType == typeof(long))
        {
            // Convert _fillValue to the column's numeric type; the boxed result unboxes to TValue
            val = (TValue)Convert.ChangeType(_fillValue, targetType);
        }
        else
        {
            // Write default(TValue) so every column gets the same row count.
            // For reference and nullable types this is null, which may not be valid for a required column.
            val = default!;
        }

        columnWriter.WriteBatch(new[] { val });
        return true;
    }
}
```

## ILogicalColumnReaderVisitor<TReturn>

The @ParquetSharp.ILogicalColumnReaderVisitor`1 interface is used with logical (high-level, typed) column readers: `LogicalColumnReader.Apply` calls its `OnLogicalColumnReader<TElement>` method with the column's element type. Use it when you need to read data from columns whose types are not known at compile time.

### Example: Convert columns to strings

```csharp
using System;
using System.Text;
using ParquetSharp;

// A visitor that reads all values in a column and returns them as a comma-separated string
sealed class ColumnToStringReader : ILogicalColumnReaderVisitor<string>
{
    public string OnLogicalColumnReader<TElement>(LogicalColumnReader<TElement> columnReader)
    {
        var sb = new StringBuilder();
        const int bufferSize = 1024;
        var buffer = new TElement[bufferSize];

        while (columnReader.HasNext)
        {
            var read = columnReader.ReadBatch(buffer);
            for (var i = 0; i < read; ++i)
            {
                sb.Append(buffer[i]?.ToString() ?? "null");
                sb.Append(", ");
            }
        }

        // Drop the trailing ", "
        if (sb.Length >= 2) sb.Length -= 2;
        return sb.ToString();
    }
}

// Usage: columnReader is the ColumnReader returned by RowGroupReader.Column(i)
using var logicalReader = columnReader.LogicalReader();
var columnString = logicalReader.Apply(new ColumnToStringReader());
Console.WriteLine($"Column data: {columnString}");
```

### Example: Count rows in a column

```csharp
using System;
using ParquetSharp;

// A visitor that counts the rows in a column of any type.
// For list columns each TElement is one row's array, so this still counts rows.
sealed class RowCountReader : ILogicalColumnReaderVisitor<long>
{
    public long OnLogicalColumnReader<TElement>(LogicalColumnReader<TElement> columnReader)
    {
        long count = 0;
        const int bufferSize = 1024;
        var buffer = new TElement[bufferSize];

        while (columnReader.HasNext)
        {
            count += columnReader.ReadBatch(buffer);
        }

        return count;
    }
}

// Usage: columnReader is the ColumnReader returned by RowGroupReader.Column(i).
// (RowGroupReader.MetaData.NumRows gives the same number without reading any data.)
using var logicalReader = columnReader.LogicalReader();
var rowCount = logicalReader.Apply(new RowCountReader());
Console.WriteLine($"Rows in row group: {rowCount}");
```

## IColumnWriterVisitor<TReturn>

The @ParquetSharp.IColumnWriterVisitor`1 interface provides lower-level access to physical column writers. Use this when you need to work with physical types, definition levels, repetition levels, or encodings.

### Example: Physical type inspector

```csharp
using System;
using ParquetSharp;

// A visitor that reports the physical type being written (e.g. Int32, Double, ByteArray)
sealed class PhysicalTypeWriter : IColumnWriterVisitor<string>
{
    public string OnColumnWriter<TValue>(ColumnWriter<TValue> columnWriter)
        where TValue : unmanaged
    {
        var physicalType = typeof(TValue).Name;
        Console.WriteLine($"Writing physical type: {physicalType}");

        // Could perform low-level writes with explicit levels here if needed
        // columnWriter.WriteBatch(..., definitionLevels, repetitionLevels);

        return physicalType;
    }
}

// Usage: columnWriter is the ColumnWriter returned by RowGroupWriter.NextColumn()
var writtenType = columnWriter.Apply(new PhysicalTypeWriter());
```

## IColumnReaderVisitor<TReturn>

The @ParquetSharp.IColumnReaderVisitor`1 interface provides lower-level access to physical column readers. Use this for low-level operations that require access to definition levels, repetition levels, or physical encodings.

### Example: Definition level analyzer

```csharp
using System;
using ParquetSharp;

// A visitor that counts null values using definition levels
sealed class NullCountReader : IColumnReaderVisitor<int>
{
    public int OnColumnReader<TValue>(ColumnReader<TValue> columnReader)
        where TValue : unmanaged
    {
        const int bufferSize = 1024;
        var values = new TValue[bufferSize];
        var defLevels = new short[bufferSize];
        var repLevels = new short[bufferSize];
        var maxDefLevel = columnReader.ColumnDescriptor.MaxDefinitionLevel;
        int nullCount = 0;

        while (columnReader.HasNext)
        {
            // Returns the number of levels read; valuesRead counts only the non-null values
            var levelsRead = columnReader.ReadBatch(bufferSize, defLevels, repLevels, values, out var valuesRead);

            // A definition level below the maximum means the value is not defined (null).
            // In nested columns this also covers a null or empty ancestor, such as an empty list.
            for (var i = 0; i < levelsRead; i++)
            {
                if (defLevels[i] < maxDefLevel)
                {
                    nullCount++;
                }
            }
        }

        return nullCount;
    }
}

// Usage: columnReader is the ColumnReader returned by RowGroupReader.Column(i)
var nullCount = columnReader.Apply(new NullCountReader());
```

## IColumnDescriptorVisitor<TReturn>

The @ParquetSharp.IColumnDescriptorVisitor`1 interface visits column descriptors (schema metadata) without performing any I/O. Use this when you only need to inspect or process schema information.
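As an illustration, the visitor below reports the physical, logical and element types that ParquetSharp resolves for each leaf column of a file, without reading any values. It is a sketch that assumes the `OnColumnDescriptor<TPhysical, TLogical, TElement>()` method and the `ColumnDescriptor.Apply(LogicalTypeFactory, visitor)` overload found in recent ParquetSharp releases; check the API reference for the version you use. `ColumnTypeDescriber` and `SchemaInspector` are hypothetical names.

```csharp
using System.Collections.Generic;
using ParquetSharp;

// Hypothetical visitor: reports the .NET types ParquetSharp resolves for a column.
// Only the schema information in the descriptor is used; no values are read.
public sealed class ColumnTypeDescriber : IColumnDescriptorVisitor<string>
{
    public string OnColumnDescriptor<TPhysical, TLogical, TElement>()
        where TPhysical : unmanaged
    {
        return $"physical={typeof(TPhysical)}, logical={typeof(TLogical)}, element={typeof(TElement)}";
    }
}

// Hypothetical helper: one "path: types" line per leaf column of the file at `path`
public static class SchemaInspector
{
    public static List<string> Describe(string path)
    {
        using var file = new ParquetFileReader(path);
        var schema = file.FileMetaData.Schema;
        var lines = new List<string>();

        for (var i = 0; i < schema.NumColumns; ++i)
        {
            var descriptor = schema.Column(i);
            // LogicalTypeFactory.Default applies ParquetSharp's default type mapping
            var types = descriptor.Apply(LogicalTypeFactory.Default, new ColumnTypeDescriber());
            lines.Add($"{descriptor.Path.ToDotString()}: {types}");
        }

        return lines;
    }
}
```

Because the visitor receives the resolved types as generic arguments, the same pattern can drive schema validation, for example by rejecting columns whose `TElement` is not in an allowed set.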

## Best practices

### When to use each visitor type

- **ILogicalColumnWriterVisitor / ILogicalColumnReaderVisitor**: Use for high-level, type-safe reading and writing when column types are unknown at compile time. Ideal for generic tooling, schema-driven processing, and data exporters.

- **IColumnWriterVisitor / IColumnReaderVisitor**: Use for low-level operations requiring access to definition levels, repetition levels, or physical encodings.

- **IColumnDescriptorVisitor**: Use when you only need to inspect schema metadata without reading or writing data, for example for schema validation, type checking, and metadata extraction.

### When to avoid visitors

If you already know the schema at compile time, prefer the generic `LogicalWriter<T>()` / `LogicalReader<T>()` methods, which return a `LogicalColumnWriter<T>` / `LogicalColumnReader<T>` directly; they are simpler and easier to maintain.
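For comparison, a sketch of the compile-time-typed path: the schema is declared with `Column<T>`, and each column is written and read through `LogicalWriter<T>()` and `LogicalReader<T>()` with no visitor involved. `KnownSchemaExample` is a hypothetical name, and the file path is supplied by the caller.

```csharp
using ParquetSharp;

// Hypothetical example class: the schema (Id: int, Name: string) is known at compile time
public static class KnownSchemaExample
{
    public static void Write(string path)
    {
        var columns = new Column[]
        {
            new Column<int>("Id"),
            new Column<string>("Name"),
        };

        using var file = new ParquetFileWriter(path, columns);
        using var rowGroup = file.AppendRowGroup();

        // Columns are written in schema order, each with its element type stated explicitly
        using (var idWriter = rowGroup.NextColumn().LogicalWriter<int>())
        {
            idWriter.WriteBatch(new[] { 1, 2, 3 });
        }
        using (var nameWriter = rowGroup.NextColumn().LogicalWriter<string>())
        {
            nameWriter.WriteBatch(new[] { "Alice", "Bob", "Carol" });
        }

        file.Close();
    }

    public static int[] ReadIds(string path)
    {
        using var file = new ParquetFileReader(path);
        using var rowGroup = file.RowGroup(0);
        var numRows = checked((int)rowGroup.MetaData.NumRows);

        // The element type is known, so no visitor is needed
        using var idReader = rowGroup.Column(0).LogicalReader<int>();
        return idReader.ReadAll(numRows);
    }
}
```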