This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Commit 15d45fb

Authored by maxkazmsft, yalaudah, and sogandak
V00.01.00003 release (#356)
* cleaning up files which are no longer needed
* fixes after removing forking workflow (#322)
* PR to resolve merge issues
* updated main build as well
* added ability to read in git branch name directly
* manually updated the other files
* fixed number of classes for main build tests (#327)
* fixed number of classes for main build tests
* corrected DATASET.ROOT in builds
* added dev build script
* Fixes for development inside the docker container (#335)
* Fix the mount command for the HRNet pretrained model in the docker readme
* Properly catch InvalidGitRepository exception
* make repo paths consistent with non-docker runs -- this way configs paths do not need to be changed
* Properly catch InvalidGitRepository exception in train.py
* Readme update (#337)
* README updates
* Removing user specific path from config

Authored-by: Fatemeh Zamanian <[email protected]>

* Fixing #324 and #325 (#338)
* update colormap to a non-discrete one -- fixes #324
* fix mask_to_disk to normalize by n_classes
* changes to test.py
* Updating data.py
* bug fix
* increased timeout time for main_build
* retrigger build
* retrigger the build
* increase timeout
* fixes 318 (#339)
* finished 318
* increased checkerboard test timeout
* fix 333 (#340)
* added label correction to train gradient
* changing the gradient data generator to take inline/crossline argument consistent with the patchloader
* changing variable name to be more descriptive

Co-authored-by: maxkazmsft <[email protected]>

* bug fix to model predictions (#345)
* replace hrnet with seresnet in experiments - provides stable default model (#343)

Co-authored-by: yalaudah <[email protected]>
Co-authored-by: Fatemeh <[email protected]>
1 parent 904157c · commit 15d45fb

File tree: 30 files changed (+538, -160 lines)

README.md

Lines changed: 11 additions & 7 deletions
```diff
@@ -3,12 +3,13 @@
 
 This repository shows you how to perform seismic imaging and interpretation on Azure. It empowers geophysicists and data scientists to run seismic experiments using state-of-art DSL-based PDE solvers and segmentation algorithms on Azure.
 
+
 The repository provides sample notebooks, data loaders for seismic data, utilities, and out-of-the-box ML pipelines, organized as follows:
 - **sample notebooks**: these can be found in the `examples` folder - they are standard Jupyter notebooks which highlight how to use the codebase by walking the user through a set of pre-made examples
 - **experiments**: the goal is to provide runnable Python scripts that train and test (score) our machine learning models in the `experiments` folder. The models themselves are swappable, meaning a single train script can be used to run a different model on the same dataset by simply swapping out the configuration file which defines the model.
-- **pip installable utilities**: we provide `cv_lib` and `deepseismic_interpretation` utilities (more info below) which are used by both sample notebooks and experiments mentioned above
+- **pip installable utilities**: we provide `cv_lib` and `interpretation` utilities (more info below) which are used by both sample notebooks and experiments mentioned above
 
-DeepSeismic currently focuses on Seismic Interpretation (3D segmentation aka facies classification) with experimental code provided around Seismic Imaging in the contrib folder.
+DeepSeismic currently focuses on Seismic Interpretation (mainly facies classification) with experimental code provided around Seismic Imaging in the contrib folder.
 
 ### Quick Start
 
```

```diff
@@ -26,7 +27,7 @@ If you run into any problems, chances are your problem has already been solved i
 The notebook is designed to be run in demo mode by default using a pre-trained model in under 5 minutes on any reasonable Deep Learning GPU such as nVidia K80/P40/P100/V100/TitanV.
 
 ### Azure Machine Learning
-[Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/) enables you to train and deploy your machine learning models and pipelines at scale, ane leverage open-source Python frameworks, such as PyTorch, TensorFlow, and scikit-learn. If you are looking at getting started with using the code in this repository with Azure Machine Learning, refer to [Azure Machine Learning How-to](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml) to get started.
+[Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/) enables you to train and deploy your machine learning models and pipelines at scale, and leverage open-source Python frameworks, such as PyTorch, TensorFlow, and scikit-learn. If you are looking at getting started with using the code in this repository with Azure Machine Learning, refer to [Azure Machine Learning How-to](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml) to get started.
 
 ## Interpretation
 For seismic interpretation, the repository consists of extensible machine learning pipelines, that shows how you can leverage state-of-the-art segmentation algorithms (UNet, SEResNET, HRNet) for seismic interpretation.
```

````diff
@@ -120,9 +121,12 @@ To prepare the data for the experiments (e.g. split into train/val/test), please
 cd scripts
 
 # For patch-based experiments
-python prepare_dutchf3.py split_train_val patch --data_dir=${data_dir} --label_file=train/train_labels.npy --output_dir=splits \
+python prepare_dutchf3.py split_train_val patch --data_dir=${data_dir}/data --label_file=train/train_labels.npy --output_dir=splits \
 --stride=50 --patch_size=100 --split_direction=both
 
+# For section-based experiments
+python prepare_dutchf3.py split_train_val section --data-dir=${data_dir}/data --label_file=train/train_labels.npy --output_dir=splits \ --split_direction=both
+
 # go back to repo root
 cd ..
 ```
````

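For intuition about what the patch-based split produces, here is a minimal sketch of sliding-window patch extraction with the stride and patch size used above. It is an illustration only — the function name and shapes are hypothetical, and the real logic lives in `prepare_dutchf3.py`:

```python
def patch_origins(n_inlines, n_crosslines, patch_size=100, stride=50):
    """Enumerate top-left corners of overlapping patches along both
    directions (hypothetical helper, not the repository's code)."""
    rows = range(0, n_inlines - patch_size + 1, stride)
    cols = range(0, n_crosslines - patch_size + 1, stride)
    return [(r, c) for r in rows for c in cols]

# Example: a 401 x 701 slice yields 7 x 13 = 91 overlapping 100 x 100 patches
origins = patch_origins(401, 701)
print(len(origins), origins[:3])  # 91 [(0, 0), (0, 50), (0, 100)]
```

The section-based split, by contrast, keys off whole inline/crossline sections rather than windows, which is why the command above needs no stride or patch size.
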
````diff
@@ -164,7 +168,7 @@ We use [YACS](https://github.com/rbgirshick/yacs) configuration library to manag
 - __yml config files__ - YAML configuration files under `configs/` are typically created one for each experiment. These are meant to be used for repeatable experiment runs and reproducible settings. Each configuration file only overrides the options that are changing in that experiment (e.g. options loaded from `defaults.py` during an experiment run will be overridden by arguments loaded from the yaml file). As an example, to use yml configuration file with the training script, run:
 
 ```
-python train.py --cfg "configs/hrnet.yaml"
+python train.py --cfg "configs/seresnet_unet.yaml"
 ```
 
 - __command line__ - Finally, options can be passed in through `options` argument, and those will override arguments loaded from the configuration file. We created CLIs for all our scripts (using Python Fire library), so you can pass these options via command-line arguments, like so:
````

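To make the override order concrete — `defaults.py` values first, then the experiment yaml, then command-line options — here is a minimal YACS sketch; the option names and the yaml filename are illustrative stand-ins, not the repo's actual `defaults.py`:

```python
from yacs.config import CfgNode as CN

# defaults.py-style base config (illustrative options)
_C = CN()
_C.TRAIN = CN()
_C.TRAIN.END_EPOCH = 300
_C.TRAIN.SNAPSHOTS = 5

cfg = _C.clone()
# 1. the experiment yaml overrides the defaults
#    (it may only set keys that already exist in the base config)
cfg.merge_from_file("configs/my_experiment.yaml")
# 2. command-line options override the yaml
cfg.merge_from_list(["TRAIN.END_EPOCH", 1, "TRAIN.SNAPSHOTS", 1])
cfg.freeze()
print(cfg.TRAIN.END_EPOCH)  # 1
```

This mirrors the `TRAIN.END_EPOCH 1 TRAIN.SNAPSHOTS 1` overrides that appear in the test commands elsewhere in this commit.
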
```diff
@@ -229,8 +233,8 @@ This section contains benchmarks of different algorithms for seismic interpretat
 
 
 #### Reproduce benchmarks
-In order to reproduce the benchmarks, you will need to navigate to the [experiments](experiments) folder. In there, each of the experiments are split into different folders. To run the Netherlands F3 experiment navigate to the [dutchf3_patch/local](experiments/dutchf3_patch/local) folder. In there is a training script [([train.sh](experiments/dutchf3_patch/local/train.sh))
-which will run the training for any configuration you pass in. Once you have run the training you will need to run the [test.sh](experiments/dutchf3_patch/local/test.sh) script. Make sure you specify
+In order to reproduce the benchmarks, you will need to navigate to the [experiments](experiments) folder. In there, each of the experiments are split into different folders. To run the Netherlands F3 experiment navigate to the [dutchf3_patch/local](experiments/interpretation/dutchf3_patch/local) folder. In there is a training script [([train.sh](experiments/interpretation/dutchf3_patch/local/train.sh))
+which will run the training for any configuration you pass in. Once you have run the training you will need to run the [test.sh](experiments/interpretation/dutchf3_patch/local/test.sh) script. Make sure you specify
 the path to the best performing model from your training run, either by passing it in as an argument or altering the YACS config file.
 
 ## Contributing
```


contrib/experiments/interpretation/dutchf3_patch/distributed/train.py

Lines changed: 7 additions & 7 deletions
```diff
@@ -2,10 +2,10 @@
 # Licensed under the MIT License.
 #
 # To Run on 2 GPUs
-# python -m torch.distributed.launch --nproc_per_node=2 train.py --cfg "configs/hrnet.yaml"
+# python -m torch.distributed.launch --nproc_per_node=2 train.py --cfg "configs/seresnet_unet.yaml"
 #
 # To Test:
-# python -m torch.distributed.launch --nproc_per_node=2 train.py TRAIN.END_EPOCH 1 TRAIN.SNAPSHOTS 1 --cfg "configs/hrnet.yaml" --debug
+# python -m torch.distributed.launch --nproc_per_node=2 train.py TRAIN.END_EPOCH 1 TRAIN.SNAPSHOTS 1 --cfg "configs/seresnet_unet.yaml" --debug
 #
 # /* spell-checker: disable */
 """Train models on Dutch F3 dataset
@@ -138,7 +138,7 @@ def run(*options, cfg=None, local_rank=0, debug=False):
         stride=config.TRAIN.STRIDE,
         patch_size=config.TRAIN.PATCH_SIZE,
         augmentations=train_aug,
-        )
+    )
 
     val_set = TrainPatchLoader(
         config.DATASET.ROOT,
@@ -154,10 +154,10 @@ def run(*options, cfg=None, local_rank=0, debug=False):
 
     if debug:
         val_set = data.Subset(val_set, range(config.VALIDATION.BATCH_SIZE_PER_GPU))
-        train_set = data.Subset(train_set, range(config.TRAIN.BATCH_SIZE_PER_GPU*2))
-
+        train_set = data.Subset(train_set, range(config.TRAIN.BATCH_SIZE_PER_GPU * 2))
+
     logger.info(f"Training examples {len(train_set)}")
-    logger.info(f"Validation examples {len(val_set)}")
+    logger.info(f"Validation examples {len(val_set)}")
 
     train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, num_replicas=world_size, rank=local_rank)
 
@@ -193,7 +193,7 @@ def run(*options, cfg=None, local_rank=0, debug=False):
 
     model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[device], find_unused_parameters=True)
 
-    snapshot_duration = epochs_per_cycle * len(train_loader) if not debug else 2*len(train_loader)
+    snapshot_duration = epochs_per_cycle * len(train_loader) if not debug else 2 * len(train_loader)
 
     warmup_duration = 5 * len(train_loader)
 
```
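As an aside, the `--debug` path above shrinks both datasets with `torch.utils.data.Subset` so a full train/validate cycle finishes in a couple of batches. A minimal sketch of the same trick:

```python
import torch
from torch.utils.data import Subset, TensorDataset

full_set = TensorDataset(torch.randn(1000, 10))
batch_size = 8
# keep just enough samples for two batches, mirroring BATCH_SIZE_PER_GPU * 2
debug_set = Subset(full_set, range(batch_size * 2))
print(len(full_set), len(debug_set))  # 1000 16
```
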
Lines changed: 1 addition & 1 deletion
```diff
@@ -1,3 +1,3 @@
 #!/bin/bash
 export PYTHONPATH=/data/home/mat/repos/DeepSeismic/interpretation:$PYTHONPATH
-python -m torch.distributed.launch --nproc_per_node=8 train.py --cfg configs/hrnet.yaml
+python -m torch.distributed.launch --nproc_per_node=8 train.py --cfg configs/seresnet_unet.yaml
```
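For context on the launch commands above: `torch.distributed.launch` spawns one process per GPU and hands each a local rank; every process then shards the data with `DistributedSampler` and wraps the model in `DistributedDataParallel` — the two calls visible in the train.py diff. A stripped-down sketch of that wiring (an illustration with toy data, not the repo's script):

```python
import argparse

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
args = parser.parse_args()

dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(args.local_rank)

dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 6, (256,)))
# each rank sees a disjoint shard of the dataset
sampler = DistributedSampler(dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank())
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

model = torch.nn.Linear(10, 6).cuda(args.local_rank)
# gradients are all-reduced across processes on every backward pass
model = DistributedDataParallel(model, device_ids=[args.local_rank])
```
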

contrib/experiments/interpretation/dutchf3_voxel/README.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -1,13 +1,13 @@
-First, make sure that `${HOME}/data/dutch_f3` folder exists and you have write access.
+First, make sure that `${HOME}/data/dutch` folder exists and you have write access.
 
 Next, to get the main input dataset which is the [Dutch F3 dataset](https://terranubis.com/datainfo/Netherlands-Offshore-F3-Block-Complete),
 navigate to [MalenoV](https://github.com/bolgebrygg/MalenoV) project website and follow the links (which will lead to
 [this](https://drive.google.com/drive/folders/0B7brcf-eGK8CbGhBdmZoUnhiTWs) download). Save this file as
-`${HOME}/data/dutch_f3/data.segy`
+`${HOME}/data/dutch/data.segy`
 
 To download the train and validation masks, from the root of the repo, run
 ```bash
-./contrib/scripts/get_F3_voxel.sh ${HOME}/data/dutch_f3
+./contrib/scripts/get_F3_voxel.sh ${HOME}/data/dutch
 ```
 
 This will also download train and validation masks to the same location as data.segy.
````

contrib/experiments/interpretation/dutchf3_voxel/configs/texture_net.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@ WINDOW_SIZE: 65
 
 DATASET:
   NUM_CLASSES: 2
-  ROOT: /home/maxkaz/data/dutchf3
+  ROOT: /home/username/data/dutchf3
   FILENAME: data.segy
 
 MODEL:
```

contrib/experiments/interpretation/dutchf3_voxel/train.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -158,7 +158,7 @@ def run(*options, cfg=None):
     def _select_pred_and_mask(model_out):
         # receive a tuple of (x, y_pred), y
         # so actually in line 51 of
-        # cv_lib/cv_lib/segmentation/dutch_f3/metrics/__init__.py
+        # cv_lib/cv_lib/segmentation/dutch/metrics/__init__.py
         # we do the following line, so here we just select the model
         # _, y_pred = torch.max(model_out[0].squeeze(), 1, keepdim=True)
         y_pred = model_out[0].squeeze()
```
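For context, helpers like `_select_pred_and_mask` follow the usual PyTorch Ignite `output_transform` pattern: the evaluator emits more than a metric needs (here a `((x, y_pred), y)` tuple, per the comment), and each metric is given a small selector that keeps only what it scores. A hedged sketch of that pattern, with a hypothetical output shape:

```python
from ignite.metrics import Accuracy

def select_pred_and_mask(output):
    # hypothetical evaluator output: ((x, y_pred), y)
    (_, y_pred), y = output
    return y_pred, y

# the metric scores only what the selector returns
accuracy = Accuracy(output_transform=select_pred_and_mask)
```
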

contrib/experiments/interpretation/penobscot/local/test.py

Lines changed: 2 additions & 4 deletions
```diff
@@ -2,7 +2,7 @@
 # Licensed under the MIT License.
 #
 # To Test:
-# python test.py TRAIN.END_EPOCH 1 TRAIN.SNAPSHOTS 1 --cfg "configs/hrnet.yaml" --debug
+# python test.py TRAIN.END_EPOCH 1 TRAIN.SNAPSHOTS 1 --cfg "configs/seresnet_unet.yaml" --debug
 #
 # /* spell-checker: disable */
 """Train models on Penobscot dataset
@@ -244,9 +244,7 @@ def _select_max(pred_tensor):
     def _tensor_to_numpy(pred_tensor):
         return pred_tensor.squeeze().cpu().numpy()
 
-    transform_func = compose(
-        np_to_tb, decode_segmap, _tensor_to_numpy,
-    )
+    transform_func = compose(np_to_tb, decode_segmap, _tensor_to_numpy,)
 
     transform_pred = compose(transform_func, _select_max)
 
```
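A note on reading the collapsed `compose(...)` call: assuming `compose` here is toolz-style (right-to-left), `transform_func` first moves the tensor to numpy, then color-codes it with `decode_segmap`, and finally converts it into a TensorBoard image with `np_to_tb`. A toy illustration with stand-in functions:

```python
from toolz import compose

# stand-ins for _tensor_to_numpy, decode_segmap and np_to_tb
to_numpy = lambda x: x + " -> numpy"
decode_segmap = lambda x: x + " -> rgb"
np_to_tb = lambda x: x + " -> tb_image"

transform_func = compose(np_to_tb, decode_segmap, to_numpy)  # rightmost runs first
print(transform_func("tensor"))  # tensor -> numpy -> rgb -> tb_image
```
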

contrib/experiments/interpretation/penobscot/local/train.py

Lines changed: 3 additions & 4 deletions
```diff
@@ -2,7 +2,7 @@
 # Licensed under the MIT License.
 #
 # To Test:
-# python train.py TRAIN.END_EPOCH 1 TRAIN.SNAPSHOTS 1 --cfg "configs/hrnet.yaml" --debug
+# python train.py TRAIN.END_EPOCH 1 TRAIN.SNAPSHOTS 1 --cfg "configs/seresnet_unet.yaml" --debug
 #
 # /* spell-checker: disable */
 """Train models on Penobscot dataset
@@ -43,6 +43,7 @@
 
 mask_value = 255
 
+
 def _prepare_batch(batch, device=None, non_blocking=False):
     x, y, ids, patch_locations = batch
     return (
@@ -253,9 +254,7 @@ def _select_max(pred_tensor):
     def _tensor_to_numpy(pred_tensor):
         return pred_tensor.squeeze().cpu().numpy()
 
-    transform_func = compose(
-        np_to_tb, decode_segmap, _tensor_to_numpy,
-    )
+    transform_func = compose(np_to_tb, decode_segmap, _tensor_to_numpy,)
 
     transform_pred = compose(transform_func, _select_max)
 
```

contrib/experiments/interpretation/voxel2pixel/test_parallel.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -245,7 +245,7 @@ def main_worker(gpu, ngpus_per_node, args):
 
 
 parser = argparse.ArgumentParser(description="Seismic Distributed Scoring")
-parser.add_argument("-d", "--data", default="/home/maxkaz/data/dutchf3", type=str, help="default dataset folder name")
+parser.add_argument("-d", "--data", default="/home/username/data/dutchf3", type=str, help="default dataset folder name")
 parser.add_argument(
     "-s",
     "--slice",
```

contrib/experiments/interpretation/voxel2pixel/train.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -15,7 +15,7 @@
 import utils
 
 # Parameters
-ROOT_PATH = "/home/maxkaz/data/dutchf3"
+ROOT_PATH = "/home/username/data/dutchf3"
 INPUT_VOXEL = "data.segy"
 TRAIN_MASK = "inline_339.png"
 VAL_MASK = "inline_405.png"
```
