
[JOSS Review] Paper revision suggestions #290

Open
matthewfeickert opened this issue Feb 20, 2025 · 3 comments


@matthewfeickert
Contributor

These are questions and revisions for JOSS submission openjournals/joss-reviews#7602.

As mentioned, these suggested revisions remove large sections to conform to the JOSS paper format. I appreciate that a lot of work and time went into writing this paper, so I want to be clear that the revisions are not meant to be dismissive or to invalidate the relevance of the information. Any material recommended for removal should be migrated to the documentation (though migrating it to the docs need not happen prior to acceptance).

Just to double check, Athena Elafrou doesn't have an ORCID, correct?

Can you cite PyTorch when it is first mentioned in the Summary instead of later in the Statement of need?

An emdash (—) should be used here, not a dash (-). In Markdown an emdash can be written as the literal character — or as the HTML entity `&mdash;`.

  • While I believe you, regarding:

FTorch/paper/paper.md

Lines 78 to 79 in 96a419e

A common example from the geosciences is ML parameterisation
of subgrid processes - a major source of uncertainty in many models.

is there a citation you can provide here? While I appreciate rigor, I also don't think that for a JOSS paper you need to motivate the importance of ML, so if you instead wanted to revise down this paragraph significantly that would be okay too.

"Ideally users" should be "Ideally, users"

Should be an emdash — not an endash – (assuming you were going for LaTeX's `--` endash syntax).

  • Most of the "Software description" section is not needed, and should be moved to the documentation, if it is not there already. I would recommend that all of the "Software description" section be removed except for the following segments (revised as you see fit):

    • FTorch/paper/paper.md

      Lines 116 to 120 in 96a419e

      Using the `iso_c_binding` module, intrinsic to Fortran since the 2003 standard,
      we provide a Fortran wrapper to `LibTorch`.
      This enables shared memory use (where possible) to
      maximise efficiency by reducing data-transfer during coupling.^[i.e. the same
      data in memory is used by both `LibTorch` and Fortran without creating a copy.]

    • FTorch/paper/paper.md

      Lines 145 to 148 in 96a419e

      We utilise the existing support in `LibTorch` for
      GPU acceleration without additional device-specific code.
      `torch_tensor`s are targeted to a device through a
      `device_type` enum, currently supporting CPU, CUDA, XPU, and MPS.

  • All of "Examples and Tooling"

## Examples and Tooling

should be removed, with the exception of

FTorch/paper/paper.md

Lines 201 to 202 in 96a419e

The library also provides a script (`pt2ts.py`) to assist users with
saving PyTorch models to TorchScript.

which can be integrated into the previous section.
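For context on what that utility does, the core of a pt2ts-style export boils down to a few lines of PyTorch. The following is a minimal sketch only; the `TinyNet` model and filename are invented for illustration and are not taken from FTorch's actual script:

```python
# Minimal sketch of exporting a PyTorch model to TorchScript, the step that
# pt2ts.py assists with. TinyNet and the filename are illustrative only.
import torch


class TinyNet(torch.nn.Module):
    """A stand-in for a user's trained model."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


model = TinyNet()
model.eval()  # inference mode: fixes dropout/batch-norm behaviour

# Compile to TorchScript, a strongly-typed subset of Python that LibTorch
# (and hence FTorch) can load without a Python interpreter at runtime.
scripted = torch.jit.script(model)
scripted.save("tiny_net.pt")
```

FTorch's own `pt2ts.py` wraps this workflow with additional checks; see the FTorch documentation for the options it supports.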

Does Fiats have a citation you can use?

  • Please condense down

FTorch/paper/paper.md

Lines 256 to 259 in 96a419e

* in the [DataWave project](https://datawaveproject.github.io/) to
couple a neural net emulation for gravity wave drag to an atmospheric
model [@MiMAML] demonstrating variability of models trained offline when coupled to a
host [@mansfield2024uncertainty].

and

FTorch/paper/paper.md

Lines 265 to 267 in 96a419e

* in the DataWave project [@CAMGW] to couple emulators of gravity wave drag, and
new data-driven parameterisations to the Community Atmosphere Model (CAM) running on
HPC systems.

into a single entry. While the uses are different, this section is already too long; ongoing research projects using the software should be cited briefly and do not need full descriptions.

  • Do ICON

* to couple a U-Net based model of multi-scale convection into [ICON](https://www.icon-model.org/)

and CESM

* As part of CESM (the Community Earth System Model) working to provide a general

have citable references? If not, please cite their websites.

@jatkinson1000
Member

jatkinson1000 commented Feb 20, 2025

Just to double check, Athena Elafrou doesn't have an ORCID, correct?

Correct

Can you cite PyTorch when it is first mentioned in the Summary instead of later in the Statement of need?

An emdash (—) should be used here, not a dash (-). In Markdown an emdash can be written as the literal character — or as the HTML entity `&mdash;`.
Should be an emdash — not an endash – (assuming you were going for LaTeX's `--` endash syntax).

"Ideally users" should be "Ideally, users"

is there a citation you can provide here? While I appreciate rigor, I also don't think that for a JOSS paper you need to motivate the importance of ML, so if you instead wanted to revise down this paragraph significantly that would be okay too.

Most of the "Software description" section is not needed, and should be moved to the documentation, if it is not there already. I would recommend that all of the "Software description" section be removed except for the following segments (revised as you see fit):

Examples and Tooling should be removed, with the exception of pt2ts, which can be integrated into the previous section.

  • I have done this in 3c6ae78 but would like to push back slightly in favour of keeping a reference to the examples. We have found that the comprehensive and friendly examples have been a large part of the software's success. Many users cite this as a reason they choose to use FTorch.

Does Fiats have a citation you can use?

  • I have looked, and no, but I added it as a GitHub citation, similar to other repos in the paper, in commit eb312e9.

Please condense down DataWave

Do ICON and CESM have citable references? If not, please cite their websites.

  • They are large operational models (German weather service, and a contributor to CMIP used globally by researchers), so whilst there are many papers, there is no single citation for the 'model' as a whole, as far as I am aware.
    I have changed the links to @online references in 285ccb0.

@jatkinson1000
Member

@matthewfeickert Please see above for a completed set of responses to your points.

Please do let me know if you are happy with these, or if there are further changes you would like.

@matthewfeickert
Contributor Author

Thanks @jatkinson1000 (and @jwallwork23 from the commit record). We really do want the paper not to include additional material that can live in the documentation, though:

Given this format, a "full length" paper is not permitted, and software documentation such as API (Application Programming Interface) functionality should not be in the paper and instead should be outlined in the software documentation.

The paper is supposed to serve as a minimal citable reference for the software. So

FTorch/paper/paper.md

Lines 112 to 167 in aeadefd

# Software description
`FTorch` is a Fortran wrapper to the `LibTorch` C++ framework using the `iso_c_binding`
module, intrinsic to Fortran since the 2003 standard.
This enables shared memory use (where possible) to
maximise efficiency by reducing data-transfer during coupling^[i.e. the same
data in memory is used by both `LibTorch` and Fortran without creating a copy.]
and avoids any use of Python at runtime.
PyTorch types are represented through derived types in `FTorch`, with Tensors supported
across a range of data types and ranks using the `fypp` preprocessor [@fypp].
Fortran code quality is enforced using fortitude [@fortitude], alongside other tools.
We utilise the existing support in `LibTorch` for
GPU acceleration without additional device-specific code.
`torch_tensor`s are targeted to a device through a
`device_type` enum, currently supporting CPU, CUDA, XPU, and MPS.
Multiple GPUs may be targeted through the optional `device_index` argument.
Typically, users train a model in PyTorch and save it as TorchScript, a strongly-typed
subset of Python.
`FTorch` provides a utility script (`pt2ts.py`) to assist users with this process.
The TorchScript model is then loaded by `FTorch` and run using `LibTorch`.
The following provides a minimal representative example:
```fortranfree
use ftorch
...
type(torch_model) :: model
type(torch_tensor), dimension(n_inputs) :: model_inputs
type(torch_tensor), dimension(n_outputs) :: model_outputs
...
call torch_model_load(model, "/path/to/saved_TorchScript_model.pt", torch_kCPU)
call torch_tensor_from_array(model_inputs(1), fortran_inputs, &
in_layout, torch_kCPU)
call torch_tensor_from_array(model_outputs(1), fortran_outputs, &
out_layout, torch_kCPU)
...
call torch_model_forward(model, model_inputs, model_outputs)
...
call torch_delete(model)
call torch_delete(model_inputs)
call torch_delete(model_outputs)
...
```
`FTorch` includes a directory of examples covering an extensive range of use
cases.
Each guides users through a complete workflow from Python to Fortran.
These examples underpin integration testing alongside unit testing with
[pFUnit](https://github.com/Goddard-Fortran-Ecosystem/pFUnit), both running in
Continuous Integration workflows.
Full details, including user guide, API documentation, slides and videos, and links to
projects are available at
[https://cambridge-iccs.github.io/FTorch](https://cambridge-iccs.github.io/FTorch).

should get revised down to something like

# Software description

`FTorch` is a Fortran wrapper to the `LibTorch` C++ framework using the `iso_c_binding`
module, intrinsic to Fortran since the 2003 standard.
This enables shared memory use (where possible) to
maximise efficiency by reducing data-transfer during coupling^[i.e. the same
data in memory is used by both `LibTorch` and Fortran without creating a copy.]
and avoids any use of Python at runtime.
PyTorch types are represented through derived types in `FTorch`, with Tensors supported
across a range of data types and ranks using the `fypp` preprocessor [@fypp].

We utilise the existing support in `LibTorch` for
GPU acceleration without additional device-specific code.
`torch_tensor`s are targeted to a device through a
`device_type` enum, currently supporting CPU, CUDA, XPU, and MPS.
Multiple GPUs may be targeted through the optional `device_index` argument.

Typically, users train a model in PyTorch and save it as TorchScript, a strongly-typed
subset of Python.
`FTorch` provides a utility script (`pt2ts.py`) to assist users with this process.
The TorchScript model is then loaded by `FTorch` and run using `LibTorch`.

Full details, additional information, and resources are provided in the documentation available at
[https://cambridge-iccs.github.io/FTorch](https://cambridge-iccs.github.io/FTorch).
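As an aside on the `fypp` point in the revised text above: the rank/type expansion that such a preprocessor performs can be pictured with a small Python sketch. The template and subroutine names below are invented for illustration (fypp itself uses its own `#:for` directive syntax, and FTorch's real interfaces differ):

```python
# Illustrative only: generate rank-specific Fortran interface stubs from one
# template, mimicking what a preprocessor such as fypp does for FTorch.
TEMPLATE = (
    "subroutine tensor_from_array_{dtype}_{rank}d(tensor, data_in)\n"
    "  {ftype}, intent(in), target :: data_in({dims})\n"
    "end subroutine tensor_from_array_{dtype}_{rank}d\n"
)


def render(dtype: str, ftype: str, max_rank: int = 4) -> str:
    """Expand the template once per rank, building assumed-shape dummy args."""
    stubs = []
    for rank in range(1, max_rank + 1):
        dims = ", ".join([":"] * rank)  # e.g. rank 3 gives ":, :, :"
        stubs.append(TEMPLATE.format(dtype=dtype, ftype=ftype, rank=rank, dims=dims))
    return "\n".join(stubs)


generated = render("real32", "real(kind=real32)")
```

Writing the template once and expanding it per rank and dtype is what keeps a multi-rank Fortran API maintainable without hand-duplicated interfaces.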
