Commit 2f12e06

Set up API Reference website with DocFX (#498)
1 parent 36e8ccd commit 2f12e06

34 files changed: +568 -93 lines changed

.github/workflows/publish-docs.yml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+name: Publish Docs
+
+on:
+  push:
+    branches:
+      - master
+      - jescalada-docfx-setup
+
+permissions:
+  actions: read
+  pages: write
+  id-token: write
+
+jobs:
+  build-and-deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout Repository
+        uses: actions/checkout@v3
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.x
+
+      - name: Run Preprocessing Script
+        run: python docs/tools/preprocess_docs.py
+
+      - name: Setup .NET
+        uses: actions/setup-dotnet@v3
+        with:
+          dotnet-version: 8.x
+
+      - name: Install DocFX
+        run: dotnet tool update -g docfx
+
+      - name: Build Documentation
+        run: docfx docfx.json
+
+      - name: Upload Site Artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: '_site'
+
+      - name: Deploy to GitHub Pages
+        uses: actions/deploy-pages@v4
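
The three `run:` steps in this workflow can be rehearsed locally before pushing. The following is a minimal sketch, not part of the commit: the `run_step` and `build_docs` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and the script assumes it is started from the repository root. By default it only prints the commands, in the same order the workflow runs them.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the "Publish Docs" build steps locally.
# run_step, build_docs and DRY_RUN are illustrative names, not part of the commit.
set -eu

# Default to a dry run so nothing is installed or built by accident.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

build_docs() {
  # Same commands, same order, as the workflow's run: steps.
  run_step python docs/tools/preprocess_docs.py
  run_step dotnet tool update -g docfx
  run_step docfx docfx.json
}

build_docs
```

Running it with `DRY_RUN=0` would execute the commands for real, which requires Python and the .NET SDK to be installed; the upload and deploy steps are GitHub Actions and have no local equivalent in this sketch.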

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ obj
 nuget
 BenchmarkDotNet.Artifacts
 .vs
+_site
+api
+.manifest

 # The solution files get generated by vcpkg on Windows
 # and by the C# Dev Kit within a dev container.
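
The three new entries have no leading slash, so Git matches them at any directory depth, not only at the repository root. The sketch below illustrates this in a throwaway repository; it assumes `git` is installed, and the paths `src/api/Client.cs` and `src/Program.cs` as well as the `is_ignored` helper are hypothetical, chosen only for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: see which paths the three new .gitignore entries match.
# All paths below are made up for illustration.
set -eu

repo="$(mktemp -d)"   # throwaway repository, unrelated to the real one
cd "$repo"
git init -q .
printf '%s\n' '_site' 'api' '.manifest' > .gitignore

is_ignored() {
  # Exit status 0 when git would ignore the given path.
  git check-ignore -q "$1"
}

for p in _site/index.html api/toc.yml .manifest src/api/Client.cs src/Program.cs; do
  if is_ignored "$p"; then
    echo "ignored: $p"
  else
    echo "tracked: $p"
  fi
done
```

If a nested `api` directory ever needs to be tracked, anchoring the pattern as `/api` would restrict it to the repository root; whether that matters for this repository is not something the diff shows.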

README.md

Lines changed: 106 additions & 36 deletions
@@ -1,4 +1,4 @@
-![Main logo](logo/svg/ParquetSharp_SignatureLogo_RGB-Black.svg)
+![Main logo](images/logo/svg/ParquetSharp_SignatureLogo_RGB-Black.svg)

 ## Introduction

@@ -35,56 +35,126 @@ Supported platforms:

 The following examples show how to write and then read a Parquet file with three columns representing a timeseries of object-value pairs.
 These use the low-level API, which is the recommended API for working with native .NET types and closely maps to the API of Apache Parquet C++.
-For reading and writing data in the [Apache Arrow](https://arrow.apache.org/) format, an [Arrow based API](docs/Arrow.md) is also provided.
+For reading and writing data in the [Apache Arrow](https://arrow.apache.org/) format, an [Arrow-based API](docs/Arrow.md) is also provided.

-### How to write a Parquet File:
+### 1. Initialize a new project

-```csharp
-var timestamps = new DateTime[] { /* ... */ };
-var objectIds = new int[] { /* ... */ };
-var values = new float[] { /* ... */ };
+First, let's create a new console application:

-var columns = new Column[]
-{
-    new Column<DateTime>("Timestamp"),
-    new Column<int>("ObjectId"),
-    new Column<float>("Value")
-};
+```bash
+dotnet new console -n ParquetExample
+cd ParquetExample
+```

-using var file = new ParquetFileWriter("float_timeseries.parquet", columns);
-using var rowGroup = file.AppendRowGroup();
+In your project directory, you'll find a `Program.cs` file that we'll use to write a Parquet file, and then read it back.

-using (var timestampWriter = rowGroup.NextColumn().LogicalWriter<DateTime>())
-{
-    timestampWriter.WriteBatch(timestamps);
-}
-using (var objectIdWriter = rowGroup.NextColumn().LogicalWriter<int>())
-{
-    objectIdWriter.WriteBatch(objectIds);
-}
-using (var valueWriter = rowGroup.NextColumn().LogicalWriter<float>())
+### 2. Install ParquetSharp
+
+ParquetSharp is available as a [NuGet package](https://www.nuget.org/packages/ParquetSharp/). You can install it using the following command:
+
+```bash
+dotnet add package ParquetSharp
+```
+
+### 3. Write a Parquet File
+
+This example shows how to write a Parquet file with three columns: `Timestamp`, `ObjectId`, and `Value`.
+
+Update your `Program.cs` with the following code:
+
+```csharp
+using System;
+using ParquetSharp;
+
+class Program
 {
-    valueWriter.WriteBatch(values);
+    static void Main()
+    {
+        var timestamps = new DateTime[] { DateTime.Now, DateTime.Now.AddMinutes(1) };
+        var objectIds = new int[] { 1, 2 };
+        var values = new float[] { 1.23f, 4.56f };
+
+        var columns = new Column[]
+        {
+            new Column<DateTime>("Timestamp"),
+            new Column<int>("ObjectId"),
+            new Column<float>("Value")
+        };
+
+        using var file = new ParquetFileWriter("float_timeseries.parquet", columns);
+        using var rowGroup = file.AppendRowGroup();
+
+        using (var timestampWriter = rowGroup.NextColumn().LogicalWriter<DateTime>())
+        {
+            timestampWriter.WriteBatch(timestamps);
+        }
+        using (var objectIdWriter = rowGroup.NextColumn().LogicalWriter<int>())
+        {
+            objectIdWriter.WriteBatch(objectIds);
+        }
+        using (var valueWriter = rowGroup.NextColumn().LogicalWriter<float>())
+        {
+            valueWriter.WriteBatch(values);
+        }
+
+        file.Close();
+        Console.WriteLine("Parquet file written successfully!");
+    }
 }
+```

-file.Close();
+You can execute it with:
+
+```bash
+dotnet run
 ```

-### How to read a Parquet file:
+### 4. Read a Parquet File

-```csharp
-using var file = new ParquetFileReader("float_timeseries.parquet");
+After writing the Parquet file, we can read it back by updating the `Program.cs` file with the following code:

-for (int rowGroup = 0; rowGroup < file.FileMetaData.NumRowGroups; ++rowGroup) {
-    using var rowGroupReader = file.RowGroup(rowGroup);
-    var groupNumRows = checked((int) rowGroupReader.MetaData.NumRows);
+```csharp
+using System;
+using ParquetSharp;

-    var groupTimestamps = rowGroupReader.Column(0).LogicalReader<DateTime>().ReadAll(groupNumRows);
-    var groupObjectIds = rowGroupReader.Column(1).LogicalReader<int>().ReadAll(groupNumRows);
-    var groupValues = rowGroupReader.Column(2).LogicalReader<float>().ReadAll(groupNumRows);
+class Program
+{
+    static void Main()
+    {
+        using var file = new ParquetFileReader("float_timeseries.parquet");
+
+        for (int rowGroup = 0; rowGroup < file.FileMetaData.NumRowGroups; ++rowGroup)
+        {
+            using var rowGroupReader = file.RowGroup(rowGroup);
+            var groupNumRows = checked((int)rowGroupReader.MetaData.NumRows);
+
+            var groupTimestamps = rowGroupReader.Column(0).LogicalReader<DateTime>().ReadAll(groupNumRows);
+            var groupObjectIds = rowGroupReader.Column(1).LogicalReader<int>().ReadAll(groupNumRows);
+            var groupValues = rowGroupReader.Column(2).LogicalReader<float>().ReadAll(groupNumRows);
+
+            Console.WriteLine("Read Parquet file:");
+            for (int i = 0; i < groupNumRows; ++i)
+            {
+                Console.WriteLine($"Timestamp: {groupTimestamps[i]}, ObjectId: {groupObjectIds[i]}, Value: {groupValues[i]}");
+            }
+        }
+
+        file.Close();
+    }
 }
+```

-file.Close();
+Once again, run the program with:
+
+```bash
+dotnet run
+```
+
+This should give you an output similar to:
+```
+Read Parquet file:
+Timestamp: 2025-01-25 10:15:25 AM, ObjectId: 1, Value: 1.23
+Timestamp: 2025-01-25 10:16:25 AM, ObjectId: 2, Value: 4.56
 ```

 ## Documentation

docfx.json

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+{
+  "$schema": "https://raw.githubusercontent.com/dotnet/docfx/main/schemas/docfx.schema.json",
+  "metadata": [
+    {
+      "src": [
+        {
+          "src": "./csharp",
+          "files": [
+            "**/*.csproj"
+          ]
+        }
+      ],
+      "dest": "api",
+    }
+  ],
+  "build": {
+    "content": [
+      {
+        "files": [
+          "**/*.{md,yml}"
+        ],
+        "exclude": [
+          "_site/**"
+        ]
+      }
+    ],
+    "resource": [
+      {
+        "files": [
+          "images/**"
+        ]
+      }
+    ],
+    "output": "_site",
+    "template": [
+      "default",
+      "modern"
+    ],
+    "globalMetadata": {
+      "_appFaviconPath": "images/logo/svg/ParquetSharp_IconLogo_RGB-Black.svg",
+      "_appLogoPath": "images/logo/svg/ParquetSharp_IconLogo_RGB-Black_small.svg",
+      "_appName": "ParquetSharp",
+      "_appTitle": "ParquetSharp",
+      "_enableNewTab": true,
+      "_enableSearch": true
+    }
+  }
+}
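
One detail in this file is worth a second look: the line `"dest": "api",` is followed directly by a closing brace, and strict JSON (RFC 8259) does not allow a trailing comma there. Whether DocFX's own loader tolerates it is not something this page shows. The sketch below is a line-based heuristic for pretty-printed JSON, not a real parser; the `has_trailing_comma` helper is a hypothetical name introduced here.

```shell
#!/usr/bin/env bash
# Heuristic sketch: flag a trailing comma before a closing brace or bracket.
# has_trailing_comma is an illustrative helper, not part of the commit.
set -eu

has_trailing_comma() {
  # Reads pretty-printed JSON on stdin. Succeeds when a line ending in ','
  # is directly followed by a line that starts with '}' or ']'.
  awk '
    prev_comma && /^[ \t]*[]}]/ { found = 1 }
    { prev_comma = ($0 ~ /,[ \t]*$/) }
    END { exit (found ? 0 : 1) }
  '
}

# Fragment copied from the "metadata" entry of docfx.json in this commit.
fragment='      "dest": "api",
    }'

if printf '%s\n' "$fragment" | has_trailing_comma; then
  echo "trailing comma found"
else
  echo "no trailing comma found"
fi
```

For the fragment shown, the script prints `trailing comma found`. On a checkout, `has_trailing_comma < docfx.json` would run the same check against the whole file.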

docs/Arrow.md

Lines changed: 9 additions & 9 deletions
@@ -5,13 +5,13 @@ These are wrapped by ParquetSharp using the [Arrow C data interface](https://arr
 to allow high performance reading and writing of Arrow data with zero copying of array data between C++ and .NET.

 The Arrow API is contained in the `ParquetSharp.Arrow` namespace,
-and included in the `ParquetSharp` NuGet package.
+and included in the [ParquetSharp NuGet package](https://www.nuget.org/packages/ParquetSharp/).

 ## Reading Arrow data

 Reading Parquet data in Arrow format uses a `ParquetSharp.Arrow.FileReader`.
 This can be constructed using a file path, a .NET `System.IO.Stream`,
-or a subclass of `ParquetShap.IO.RandomAccessFile`.
+or a subclass of `ParquetSharp.IO.RandomAccessFile`.
 In this example, we'll open a file using a path:

 ```csharp
@@ -33,7 +33,7 @@ foreach (var field in schema.FieldsList)
 ### Reading data

 To read data from the file, we use the `GetRecordBatchReader` method,
-which returns an `Apache.Arrow.IArrowArrayStream`.
+which returns an [`Apache.Arrow.Ipc.IArrowArrayStream`](https://github.com/apache/arrow/blob/main/csharp/src/Apache.Arrow/Ipc/IArrowArrayStream.cs).
 By default, this will read data for all row groups in the file and all columns,
 but you can also specify which columns to read using their index in the schema,
 and specify which row groups to read:
@@ -68,8 +68,8 @@ the reader properties, discussed below.

 ### Reader properties

-The `FileReader` constructor accepts an instance of `ParquetSharp.ReaderProperties`
-to control standard Parquet reading behaviour,
+The `ParquetSharp.Arrow.FileReader` constructor accepts an instance of
+`ParquetSharp.ReaderProperties` to control standard Parquet reading behaviour,
 and additionally accepts an instance of `ParquetSharp.Arrow.ArrowReaderProperties`
 to customise Arrow specific behaviour:

@@ -134,8 +134,8 @@ RecordBatch GetBatch(int batchNumber) =>
     }, numIds);
 ```

-Now we create a `FileWriter`, specifying the path to write to and
-the file schema:
+Now we create a `ParquetSharp.Arrow.FileWriter`, specifying the path to write to and the
+file schema:

 ```csharp
 using var writer = new FileWriter("data.parquet", schema);
@@ -207,8 +207,8 @@ writer.Close();

 ### Writer properties

-The `FileWriter` constructor accepts an instance of `ParquetSharp.WriterProperties`
-to control standard Parquet writing behaviour,
+The `ParquetSharp.Arrow.FileWriter` constructor accepts an instance of
+`ParquetSharp.WriterProperties` to control standard Parquet writing behaviour,
 and additionally accepts an instance of `ParquetSharp.Arrow.ArrowWriterProperties`
 to customise Arrow specific behaviour:
