
Commit ebcef12

Merge pull request #311 from 1190303125/2.0.0
modify resume, wandb and instruction
2 parents 0a3dc3f + 8cdeec0 commit ebcef12


9 files changed: +131, -28 lines


asset/basic_training.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
# Basic Training

## config

You can load your configuration in any of the following equivalent ways:

* cmd
* config files
* yaml

### cmd

You can change configurations on the command line with ``--xx=yy``, where ``xx`` is the name of the parameter and ``yy`` is the corresponding value. For example:

```bash
python run_textbox.py --model=BART --model_path=facebook/bart-base --epochs=1
```

The command line is suitable for **a few temporary** modifications, such as:

* ``model``
* ``model_path``
* ``dataset``
* ``epochs``
* ...
### config files

You can also modify configurations through local config files:

```bash
python run_textbox.py ... --config_files <config-file-one> <config-file-two>
```

Every config file is an additional yaml file, for example:

```yaml
efficient_methods: ['prompt-tuning']
```

Config files are suitable for **a large number of** modifications or **long-term** modifications, such as:

* ``efficient_methods``
* ``efficient_kwargs``
* ...
### yaml

The original configurations are in the yaml files. You can check the default values there, but modifying these files is not recommended except for **permanent** modifications of the dataset. They are located under ``textbox/properties``:

* ``overall.yaml``
* ``dataset/*.yaml``
* ``model/*.yaml``
## trainer

You can choose an optimizer and scheduler through `optimizer=<optimizer-name>` and `scheduler=<scheduler-name>`. We provide a wrapper around the **PyTorch optimizers**, which means parameters like `epsilon` or `warmup_steps` can be specified with the keyword dictionaries `optimizer_kwargs={'epsilon': ... }` and `scheduler_kwargs={'warmup_steps': ... }`. See [pytorch optimizer](https://pytorch.org/docs/stable/optim.html#algorithms) and scheduler for a complete tutorial. <!-- TODO -->
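
For example, a hypothetical invocation (the optimizer and scheduler names and the keyword values are placeholders, and the dictionaries use the same shell-escaping style as the `metrics` example below):

```bash
python run_textbox.py ... --optimizer=adamw --scheduler=linear \
    --optimizer_kwargs=\{\'epsilon\':\ 1e-6\} \
    --scheduler_kwargs=\{\'warmup_steps\':\ 100\}
```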

Validation frequency is introduced to validate the model **every given number of batch steps or epochs**. Specify `valid_strategy` (either `'step'` or `'epoch'`) and `valid_steps=<int>` to adjust the pace. In particular, the traditional train-validate paradigm is a special case with `valid_strategy=epoch` and `valid_steps=1`.
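
For instance, a sketch that validates every 500 batch steps (the step count is arbitrary):

```bash
python run_textbox.py ... --valid_strategy=step --valid_steps=500
```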

`max_save=<int>` indicates **the maximum number of saved files** (checkpoints and generated corpus during evaluation). `-1`: save every file; `0`: do not save any file; `1`: only save the file with the best score; `n`: save both the best and the last $n−1$ files.
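
For example, to keep only the best file and the most recent one (a sketch):

```bash
python run_textbox.py ... --max_save=2
```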

The score of the current checkpoint is calculated according to ``metrics_for_best_model``, and the evaluation metrics to compute are specified with ``metrics`` ([full list](evaluation.md)). **Early stopping** can be configured with `stopping_steps=<int>`, based on the score of every checkpoint.

```bash
python run_textbox.py ... --stopping_steps=8 \
    --metrics_for_best_model=\[\'rouge-1\',\ \'rouge-w\'\] \
    --metrics=\[\'rouge\'\]
```

You can resume from a **previous checkpoint** through ``model_path=<checkpoint_path>``. When you want to restore **all trainer parameters** like the optimizer and start_epoch, set ``resume_training=True``. Otherwise, only the **model and tokenizer** will be loaded. The script below resumes training from the checkpoint at ``saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best``:

```bash
python run_textbox.py --model_path=saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best \
    --resume_training=True
```

Other commonly used parameters include `epochs=<int>` and `max_steps=<int>` (the maximum number of training epochs and of batch steps; if `max_steps` is set, `epochs` is ignored), `learning_rate=<float>`, `train_batch_size=<int>`, `weight_decay=<bool>`, and `grad_clip=<bool>`.
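
For example, a short run with hypothetical values:

```bash
python run_textbox.py ... --epochs=10 --learning_rate=3e-5 --train_batch_size=16
```
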
### Partial Experiment

You can run a partial experiment with `do_train`, `do_valid`, and `do_test`. To test your pipeline and debug, set `quick_test=<amount-of-data-to-load>` to load just a few examples.
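
For example, a quick sanity check that loads only a handful of examples (the amount is arbitrary):

```bash
python run_textbox.py ... --quick_test=16
```
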
The following script loads a trained model from a local path and conducts generation and evaluation without training and validation:
```bash
python run_textbox.py --model_path=saved/BART-samsum-2022-Dec-18_20-57-47/checkpoint_best \
    --do_train=False --do_valid=False
```

## wandb

If you are running your code in a Jupyter environment, you may want to log in by simply setting an environment variable (note that your key may be stored in plain text):

```python
%env WANDB_API_KEY=<your-key>
```

You can then control W&B behavior with the `wandb` parameter.

If you are debugging your model, you may want to **disable W&B** with `--wandb=disabled`, so that **none of the metrics** will be recorded. You can also disable **syncing only** with `--wandb=offline` and enable it again with `--wandb=online` to upload to the cloud. The parameter can also be configured in the yaml file, for example:

```yaml
wandb: online
```
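
For example, a sketch of an offline run whose logs are synced afterwards:

```bash
python run_textbox.py ... --wandb=offline
```
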
The local files can be uploaded by executing `wandb sync` in the command line.
After configuration, you can silence wandb's prompts by setting the environment variable `WANDB_SILENT` (for example, `export WANDB_SILENT=true`). For more information, see the [wandb documentation](https://docs.wandb.ai).

install.sh

Lines changed: 1 addition & 11 deletions
@@ -35,7 +35,7 @@ esac

 echo "Installation may take a few minutes."
 echo -e "\033[0;32mInstalling torch ...\033[0m"
-conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
+conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

 echo -e "\033[0;32mInstalling requirements ...\033[0m"
 pip install -r requirements.txt
@@ -75,16 +75,6 @@ chmod +rx $F2RExpDIR/WordNet-2.0.exc.db
 pip uninstall py-rouge
 pip install rouge > /dev/null

-echo -e "\033[0;32mInstalling requirements (libxml) ...\033[0m"
-if [[ "$OSTYPE" == "darwin"* ]]; then
-    brewinstall libxml2 cpanminus
-    cpanm --force XML::Parser
-else
-    if [ -x "$(command -v apt-get)" ]; then sudo apt-get install libxml-parser-perl
-    elif [ -x "$(command -v yum)" ]; then sudo yum install -y "perl(XML::LibXML)"
-    else echo -e '\033[0;31mFailed to install libxml. See https://github.com/pltrdy/files2rouge/issues/9 for more information.\033[0m' && exit;
-    fi
-fi

 echo -e "\033[0;32mInstalling requirements (transformers) ...\033[0m"
 git clone https://github.com/RUCAIBox/transformers.git

instructions/RNN.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
## RNN

You can train an RNN encoder-decoder with attention from scratch with this model. Three models are available:

* RNN
* GRU
* LSTM

You can choose between them with ``model=RNN``, ``model=GRU``, or ``model=LSTM``. You can also check or modify the default parameters of each model in ``textbox/properties/model/rnn.yaml`` (``gru.yaml``, ``lstm.yaml``).

Example usage:

```bash
python run_textbox.py \
    --model=RNN \
    --dataset=samsum
```

textbox/config/configurator.py

Lines changed: 2 additions & 0 deletions
@@ -262,6 +262,8 @@ def _set_default_parameters(self):
         self.setdefault('valid_strategy', 'epoch')
         self.setdefault('valid_steps', 1)
         self.setdefault('disable_tqdm', False)
+        self.setdefault('resume_training', True)
+        self.setdefault('wandb', 'online')
         self._simplify_parameter('optimizer')
         self._simplify_parameter('scheduler')
         self._simplify_parameter('src_lang')

textbox/properties/overall.yaml

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ seed: 2020
 state: INFO
 reproducibility: True
 data_path: 'dataset/'
+wandb: 'online'

 # training settings
 epochs: 50

textbox/quick_start/experiment.py

Lines changed: 4 additions & 1 deletion
@@ -37,6 +37,8 @@ def __init__(
         config_dict: Optional[Dict[str, Any]] = None,
     ):
         self.config = Config(model, dataset, config_file_list, config_dict)
+        wandb_setting = 'wandb ' + self.config['wandb']
+        os.system(wandb_setting)
         self.__extended_config = None

         self.accelerator = Accelerator(gradient_accumulation_steps=self.config['accumulation_steps'])
@@ -94,7 +96,8 @@ def _on_experiment_start(self, extended_config: Optional[dict]):
         self.valid_result: Optional[ResultType] = None
         self.test_result: Optional[ResultType] = None
         if config['load_type'] == 'resume':
-            self.trainer.resume_checkpoint(config['model_path'])
+            if config['resume_training']:
+                self.trainer.resume_checkpoint(config['model_path'])
             self.model.from_pretrained(config['model_path'])

     def _do_train_and_valid(self):

textbox/trainer/trainer.py

Lines changed: 4 additions & 7 deletions
@@ -364,18 +364,15 @@ def save_checkpoint(self):
     def save_generated_text(self, generated_corpus: List[str], is_valid: bool = False):
         r"""Store the generated text by our model into `self.saved_text_filename`."""
         saved_text_filename = self.saved_text_filename
-        if not is_valid:
-            self._summary_tracker.add_corpus('test', generated_corpus)
-        else:
-            path_to_save = self.saved_model_filename + '_epoch-' + str(self.timestamp.valid_epoch)
-            saved_text_filename = os.path.join(path_to_save, 'generation.txt')
-            os.makedirs(path_to_save, exist_ok=True)
+        path_to_save = self.saved_model_filename + '_epoch-' + str(self.timestamp.valid_epoch)
+        saved_text_filename = os.path.join(path_to_save, 'generation.txt')
+        os.makedirs(path_to_save, exist_ok=True)
         with open(saved_text_filename, 'w') as fout:
             for text in generated_corpus:
                 fout.write(text + '\n')

     def resume_checkpoint(self, resume_dir: str):
-        r"""Load the model parameters information and training information.
+        r"""Load training information.

         Args:
             resume_dir: the checkpoint file (specific by `model_path`).

textbox/utils/argument_list.py

Lines changed: 3 additions & 1 deletion
@@ -21,6 +21,7 @@
     '_hyper_tuning', # hyper tuning
     'multi_seed', # multiple random seed
     'romanian_postprocessing',
+    'wandb'
 ]

 training_parameters = [
@@ -43,7 +44,8 @@
     'weight_decay', # common parameters
     'accumulation_steps', # accelerator
     'disable_tqdm', # tqdm
-    'pretrain_task' # pretraining
+    'pretrain_task', # pretraining
+    'resume_training'
 ]

 evaluation_parameters = [

textbox/utils/dashboard.py

Lines changed: 0 additions & 8 deletions
@@ -435,14 +435,6 @@ def add_scalar(self, tag: str, scalar_value: Union[float, int]):
         if self._is_local_main_process and not self.tracker_finished and self.axes is not None:
             wandb.log(info, step=self.axes.train_step, commit=False)

-    def add_corpus(self, tag: str, corpus: Iterable[str]):
-        r"""Add a corpus to summary."""
-        if tag.startswith('valid'):
-            self._current_epoch._update_metrics({'generated_corpus': '\n'.join(corpus)})
-        if self._is_local_main_process and not self.tracker_finished:
-            _corpus = wandb.Table(columns=[tag], data=pd.DataFrame(corpus))
-            wandb.log({tag: _corpus}, step=self.axes.train_step)
-

 root = None
