Commit b53f404

docs: update README
1 parent a03afe8 commit b53f404

4 files changed: +85 -61 lines changed

README.md

Lines changed: 50 additions & 32 deletions

````diff
@@ -19,7 +19,7 @@ versions older than 2.6.0 are supported.
 * **rsync-fetcher** - fetches the repository from the remote server, and uploads it to s3.
 * **rsync-gateway** - serves the mirrored repository from s3 in **http** protocol.
 * **rsync-gc** - periodically removes old versions of files from s3.
-* **rsync-fix-encoding** - see "Migrating from v0.2.11 to older versions" section.
+* **rsync-migration** - see the [Migration](#migration) section for more details.
 
 ## Example
 
@@ -28,67 +28,85 @@ versions older than 2.6.0 are supported.
 $ RUST_LOG=info RUST_BACKTRACE=1 AWS_ACCESS_KEY_ID=<ID> AWS_SECRET_ACCESS_KEY=<KEY> \
 rsync-fetcher \
 --src rsync://upstream/path \
---s3-url https://s3_api_endpoint --s3-region region --s3-bucket bucket --s3-prefix repo_name \
---redis redis://localhost --redis-namespace repo_name \
---repository repo_name
---gateway-base http://localhost:8081/repo_name
+--s3-url https://s3_api_endpoint --s3-region region --s3-bucket bucket --s3-prefix prefix \
+--pg-url postgres://user@localhost/db \
+--namespace repo_name
 ```
 2. Serve the repository over HTTP.
 ```bash
 $ cat > config.toml <<-EOF
 bind = ["localhost:8081"]
+s3_url = "https://s3_api_endpoint"
+s3_region = "region"
 
 [endpoints."out"]
-redis = "redis://localhost"
-redis_namespace = "test"
-s3_website = "http://localhost:8080/test/test-prefix"
+namespace = "repo_name"
+s3_bucket = "bucket"
+s3_prefix = "prefix"
 
 EOF
 
 $ RUST_LOG=info RUST_BACKTRACE=1 rsync-gateway <optional config file>
 ```
-
-3. GC old versions of files periodically.
+3. GC old versions of files manually.
 ```bash
 $ RUST_LOG=info RUST_BACKTRACE=1 AWS_ACCESS_KEY_ID=<ID> AWS_SECRET_ACCESS_KEY=<KEY> \
 rsync-gc \
 --s3-url https://s3_api_endpoint --s3-region region --s3-bucket bucket --s3-prefix repo_name \
---redis redis://localhost --redis-namespace repo_name \
---keep 2
+--pg-url postgres://user@localhost/db
 ```
-> It's recommended to keep at least 2 versions of files in case a gateway is still using an old revision.
+> It's recommended to keep at least 2 revisions in case a gateway is still using an old revision.
 
 ## Design
 
 File data and their metadata are stored separately.
 
 ### Data
 
-Files are stored in S3 storage, named by their blake2b-160 hash (`<namespace/<hash>`).
-
-Listing html pages are stored in `<namespace>/listing-<timestamp>/<path>/index.html`.
+Files are stored in S3 storage, named by their blake2b-160 hash (`<prefix>/<namespace>/<hash>`).
 
 ### Metadata
 
-Metadata is stored in Redis for fast access.
+Metadata is stored in Postgres.
+
+An object is the smallest unit of metadata. There are three types of objects:
+- **File** - a regular file, with its hash, size and mtime
+- **Directory** - a directory, with its size and mtime
+- **Symlink** - a symlink, with its size, mtime and target
+
+Objects (files, directories and symlinks) are organized into revisions, which are immutable. Each revision has a unique
+id, while an object may appear in multiple revisions. Revisions are further organized into repositories (namespaces),
+like `debian`, `ubuntu`, etc. Repositories are mutable.
+
+A revision can be in one of the following states:
+
+- **Live** - a live revision is a revision in production, which is ready to be served. There can be multiple live
+revisions, but only the latest one is served by the gateway.
+- **Partial** - a partial revision is a revision that is still being updated. It's not ready to be served yet.
+- **Stale** - a stale revision is a revision that is no longer in production, and is ready to be garbage collected.
+
+## Migration
+
+### Migrating from v0.3.x to v0.4.x
+
+v0.4.x switched from Redis to Postgres for storing metadata, greatly improving the performance of many operations and
+reducing storage usage.
+
+Use `rsync-migration redis-to-pg` to migrate old metadata to the new database. Note that you can only migrate from
+v0.3.x to v0.4.x; you can't migrate from v0.2.x to v0.4.x directly.
 
-Note that there are more than one file index in Redis.
+The old Redis database is not modified.
 
-- `<namespace>:index:<timestamp>` - an index of the repository synced at `<timestamp>`.
-- `<namespace>:partial` - a partial index that is still being updated and not committed yet.
-- `<namespace>:partial-stale` - a temporary index that is used to store outdated files when updating the partial index.
-This might happen if you interrupt a synchronization, restart it, and some files downloaded in the first run are
-already outdated. It's ready to be garbage collected.
-- `<namespace>:stale:<timestamp>` - an index that is taken out of production, and is ready to be garbage collected.
+### Migrating from v0.2.x to v0.3.x
 
-> Not all files in partial index should be removed. For example, if a file exists both in a stale index and a "live"
-> index, it should not be removed.
+v0.3.x uses a new encoding for file metadata, which is incompatible with v0.2.x. Trying to use v0.3.x on old data will
+fail.
 
-## Migrating from v0.2.11 to older versions
+Use `rsync-migration upgrade-encoding` to upgrade the encoding.
 
-There's a bug affecting all versions before v0.3.0 and after v0.2.11, which causes the file metadata to be read in a
-wrong format and silently corrupting the index. Note that no data is lost, but the gateway will fail to direct users to
-the correct file. `rsync-fix-encoding` can be used to fix this issue.
+This is a destructive operation, so make sure you have a backup of the database before running it. It does nothing
+without the `--do` flag.
 
-After v0.3.0, all commands are using the new encoding. You can still use this tool to migrate old data to the new
-encoding. Trying to use the new commands on old data will now fail.
+The new encoding was actually introduced in v0.2.12 by accident. `rsync-gateway` between v0.2.12 and v0.3.0 can't parse
+old metadata correctly and returns garbage data. No data is lost though, so if you used any version between v0.2.12 and
+v0.3.0, you can still use `rsync-migration` to migrate to the new encoding.
````
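The object and revision model introduced by this commit can be sketched in a few lines. This is a minimal illustration, not the actual Postgres schema; all type and function names here are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional

class RevisionState(Enum):
    LIVE = auto()     # in production, ready to be served
    PARTIAL = auto()  # still being updated by a fetcher, not served yet
    STALE = auto()    # out of production, ready to be garbage collected

@dataclass(frozen=True)
class FileObject:
    """A regular file object: blake2b-160 hash, size and mtime."""
    hash: bytes
    size: int
    mtime: int

@dataclass
class Revision:
    """An immutable snapshot of a repository: maps path -> object."""
    id: int
    state: RevisionState
    objects: dict = field(default_factory=dict)

def serving_revision(revisions) -> Optional[Revision]:
    """The gateway serves only the latest live revision."""
    live = [r for r in revisions if r.state is RevisionState.LIVE]
    return max(live, key=lambda r: r.id, default=None)
```

Directory and symlink objects would be modeled analogously; the key invariant is that revisions never change after commit, while the repository simply gains new revisions over time.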

rsync-fetcher/README.md

Lines changed: 11 additions & 12 deletions

````diff
@@ -1,7 +1,7 @@
 # rsync-fetcher
 
 This is a rsync receiver implementation. Simply put, it's much like rsync, but saves the files to s3 and metadata to
-redis instead of to a filesystem.
+the database instead of to a filesystem.
 
 ## Features
 
@@ -17,19 +17,20 @@ redis instead of to a filesystem.
 
 ## Implementation Details
 
-1. Connect to Redis and S3, check if there's already another instance (fetcher, gc) running.
+1. Connect to Postgres and S3, check if there's already another instance (fetcher, gc) running.
 2. Fetch file list from rsync server.
-3. Calculate the delta between the remote file list and the local index, which is
-the union of current production index and last partial index (if any).
-4. Start generator and receiver task.
-5. After both tasks completed, generate file listing and upload to S3.
-6. Commit the partial index to production.
+3. Calculate the delta between the remote file list and local files, which is the union of files in all live and partial
+revisions.
+4. Create a new partial revision.
+5. Start generator and receiver tasks.
+6. After both tasks complete, update some metadata (parent links) to speed up directory listing.
+7. Commit the partial revision to production.
 
 Generator task:
 
 1. Generates a list of files to be fetched, and sends them to the rsync server.
-2. If any file exists in the local index, it downloads the file, calculate the rolling checksum, and additionally sends
-the checksum to rsync server.
+2. If any file exists in an existing live or partial revision, it downloads the file, calculates the rolling checksum,
+and additionally sends the checksum to the rsync server.
 
 Receiver task:
 
@@ -40,6 +41,4 @@ Receiver task:
 Uploader task:
 
 1. Take files downloaded by receiver task, and upload them to S3.
-2. After uploading a file, updates the partial index. If the file already exists in the partial index, check if the
-checksum matches. If not, put the old metadata into the partial-stale index, and update the partial index with the
-new metadata.
+2. After uploading a file, update the partial revision.
````
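Step 3 of the updated pipeline (the delta calculation against the union of live and partial revisions) can be illustrated roughly as below. This is a hedged sketch: the real fetcher compares rsync file-list metadata, not the simple path-to-mtime maps used here, and the function name is made up:

```python
def transfer_delta(remote, local_revisions):
    """Return the set of paths that must be fetched from upstream.

    `remote` maps path -> mtime from the rsync file list; `local_revisions`
    is a list of path -> mtime maps, one per live or partial revision.
    A path is skipped only if some local revision already has it unchanged.
    """
    # Union of files across all live and partial revisions.
    local = {}
    for rev in local_revisions:
        local.update(rev)
    # A path is transferred if it is missing locally or its metadata differs.
    return {path for path, mtime in remote.items() if local.get(path) != mtime}
```

Including partial revisions in the union is what lets an interrupted synchronization resume without re-downloading files the previous run already fetched.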

rsync-gateway/README.md

Lines changed: 19 additions & 8 deletions

````diff
@@ -1,13 +1,24 @@
 # rsync-gateway
 
-`rsync-gateway` serves the rsync repository on S3 over HTTP, using the metadata stored in redis.
+`rsync-gateway` serves the rsync repository on S3 over HTTP, using the metadata stored in the database.
 
 ## Implementation Details
 
-1. Connect to Redis.
-2. Spawn a watcher task to watch for the latest index.
-3. For each request, if the path ends with a trailing slash, it's a directory listing request. Otherwise, it's a file
-request.
-4. For directory listing requests, redirect to `<path>/index.html` on S3.
-5. For file requests, check if the file exists in the index. If not, return 404. Otherwise, redirect to the file on S3.
-Symlinks are resolved on the gateway side.
+1. Connect to Postgres.
+2. Spawn a watcher task to watch for the latest revision.
+3. For each request, check if there's a cache hit. Return the cached response if there is.
+4. Otherwise, try to resolve the path in the revision. If the path is a directory, render the directory listing. If
+the path is a file, pre-sign the file on S3 and redirect to the pre-signed URL. Symlinks are followed.
+
+## More details on the cache
+
+There are two layers of cache: L1 and L2. Both are in-memory caches implemented using `moka`, a concurrent LRU
+cache.
+
+The L1 cache holds raw resolved entries, while the L2 cache holds compressed entries. The L2 cache is used to reduce
+memory usage, since the raw resolved entries can be quite large when there are many files in a directory.
+
+The size of the L1 cache is 32MB, and the size of the L2 cache is 128MB. There's a TTL for pre-signed URLs in both
+caches, which is half the pre-signed URL expiration time.
+
+It's a NINE (non-inclusive, non-exclusive) cache: eviction from the L1 and L2 caches is independent and asynchronous.
````
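The two-layer lookup described in the new cache section behaves roughly like the toy stand-in below. This is not the gateway's actual `moka`-based implementation: sizes, TTLs and concurrent eviction are omitted, and the class and method names are hypothetical:

```python
import zlib

class TwoLayerCache:
    """L1 holds raw resolved entries; L2 holds zlib-compressed copies."""

    def __init__(self):
        self.l1 = {}  # path -> raw bytes (fast, memory-hungry)
        self.l2 = {}  # path -> compressed bytes (slower, compact)

    def put(self, path, entry):
        # NINE (non-inclusive, non-exclusive): both layers are written,
        # but each would evict independently of the other.
        self.l1[path] = entry
        self.l2[path] = zlib.compress(entry)

    def get(self, path):
        if path in self.l1:          # L1 hit: no decompression needed
            return self.l1[path]
        if path in self.l2:          # L2 hit: decompress and refill L1
            entry = zlib.decompress(self.l2[path])
            self.l1[path] = entry
            return entry
        return None                  # miss: caller resolves from Postgres
```

Because the layers are neither inclusive nor exclusive, an entry evicted from L1 may still be served (after decompression) from L2, and vice versa.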

rsync-gc/README.md

Lines changed: 5 additions & 9 deletions

````diff
@@ -4,14 +4,10 @@
 
 ## Implementation Details
 
-1. Connect to Redis and S3, check if there's already another instance (fetcher, gc) running.
-2. Enumerate all production indexes and filter out ones to be removed.
-3. Rename the indexes to be removed to `stale:<timestamp>`.
-4. Delete all listing files in S3 belonging to the indexes to be removed.
-5. Delete object files that are not referenced by any live index.
+1. Connect to Postgres and S3, check if there's already another instance (fetcher, gc) running.
+2. Enumerate all production revisions and filter out ones to be removed. Mark them as stale.
+3. Delete object files that are not referenced by any live or partial revision.
 > Note that this is calculated by
 >
-> Sigma_(stale) (key.hash) - Sigma_(alive) (key.hash)
->
-> Because we don't have a way to get the universe set of all keys in S3.
-6. Remove stale indexes from Redis.
+> Sigma_(stale) (key.hash) - Sigma_(live) (key.hash) - Sigma_(partial) (key.hash)
+4. Remove stale revisions from Postgres.
````
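The set expression in the note above amounts to a plain difference over the hashes each group of revisions references; the universe of all S3 keys is never enumerated. A sketch (the function and argument names are made up for illustration):

```python
def hashes_to_delete(stale_revs, live_revs, partial_revs):
    """Hashes referenced only by stale revisions are safe to delete from S3.

    Computed as Sigma_(stale) - Sigma_(live) - Sigma_(partial). Each
    argument is an iterable of per-revision hash sets.
    """
    def union(revs):
        out = set()
        for rev in revs:
            out |= set(rev)
        return out

    return union(stale_revs) - union(live_revs) - union(partial_revs)
```

Subtracting the partial revisions' hashes is what keeps an in-progress fetch safe: a file shared between a stale revision and the partial one is retained.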

0 commit comments