- 1. Introduction
- 2. Installing the Software You Will Need: LaTeX and Git
- 3. Getting Your Own Paper and Its Repository Setup
- 4. Building Papers at the Terminal
- 5. Sections of the Paper and What Goes In Them
- 6. What Shouldn't Be in Your Paper's Repository
- 7. Setting up Your Writing Environment
- 8. General Writing Tips
- 9. Bibliography Management
This repository is designed to be used by researchers (particularly PhD and Masters students) for writing a new research paper that they want to submit to a software engineering conference or journal using LaTeX – or even just to create an internal report, such as one that might be required by a PhD thesis committee.
It provides a skeleton set of LaTeX files and folders (referred to as "new-paper") that you can use as an example structure in which to write your own material, and customise your paper with additional macros.
This README file talks you through how to create your own paper repository on GitHub using this one as a template, how to compile your paper and produce a PDF file, along with some general writing and editing advice.
The format described by new-paper assumes that you have researched a new kind of technique or algorithm that you have empirically evaluated. If you want to write a purely empirical paper (i.e., where its novelty lies purely in your research questions and the corresponding findings) you may need to tweak the sections. However, it is not really suitable for a paper that involves only proofs and no empirical study.
First of all, you need to ensure that LaTeX and Git are set up on your machine, if they are not already.
See https://git-scm.com/book/en/v2/Getting-Started-Installing-Git and https://www.latex-project.org/get for instructions on how to download and install them for your operating system.
(If you are not already familiar with these tools, it is worth investing some time learning about them first before you read any further.)
This repository is a template repository, meaning that you can use it as a starting point for your own paper. Go to https://github.com/philmcminn/new-paper, click the green button labelled "Use this template", and follow the instructions on GitHub for creating your own repository. (Don't forget to make your new repository private!)
In terms of the name of your repository, choose one that is indicative of the paper's research content, rather than the conference venue or journal you intend to submit it to. For example, "search-based-testability" is a better name than "icse2020". This is because you may decide later to submit your work to a different venue. Or, your work may not be accepted to the first venue you submit to, and you may need to revise your paper and submit it to another. In either case, the original venue name will no longer be a suitable choice of name for your repository.
For the same reasons as those detailed later, choose a name formatted in "kebab-case" (i.e., all lower case with hyphens as word separators).
GitHub repositories can be renamed at any time. To do this, go to the "Settings" tab.
Note: after renaming, you may need to re-clone the repository on your local machine, or simply connect your local repository to the new URL with:
git remote set-url origin [NEW URL]
Once you have created your repository, you can clone it to your machine. Go to your repository's page on GitHub and click the green button labelled "Code". You should then be able to copy its URL, so that you can then clone it at the command line with:
git clone [URL]
Over time, you will be adding your own files and content to your paper. The various files and examples in this repository demonstrate how to go about doing that.
Please ensure you stick to the coding standards described in the comments of the paper. This helps ensure everything is consistent, that other people working with you (i.e., me, and possibly other collaborators) can find things easily (such as a figure file referenced using a certain label in a section file), and in general helps uphold the Principle of Least Astonishment for others when working on the paper.
In particular, the repository opts to use "kebab-case" for naming files and directories. All file and directory names following the kebab-case convention are lower-cased, with words separated with hyphens. Using lower-casing and not using spaces (i.e., by using hyphens instead) in file and directory names ensures good cross-platform compatibility.
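As a rough illustration of the convention, the following shell sketch converts an arbitrary name into kebab-case. The to_kebab_case function is hypothetical (it is not part of new-paper), and it assumes you pass it a name without its file extension, since it would turn the dot into a hyphen as well.

```shell
# Hypothetical helper, not part of new-paper: convert a name to kebab-case.
# It lower-cases the input, replaces each run of characters that are not
# letters or digits with a single hyphen, and trims hyphens from both ends.
to_kebab_case() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

# Example: a name with spaces and capitals
to_kebab_case "Search Based Testability"
```

Note that this sketch does not split camelCase words, so a name like "relatedWork" would simply become "relatedwork".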
Even if you're using an integrated development environment to compile your paper, you will probably at some point need to compile the paper from a terminal window. If you're working with me, then I don't mandate that you use a Makefile or any particular build tool, but I'm happy if that's what you want to do – so long as you leave some instructions if they're needed. Some students just include a build script in the root directory of their repository for convenience.
The root file is paper.tex. To compile the example and produce a PDF file you will need to run pdflatex as follows:
pdflatex paper
Later down the line you will have BibTeX file(s) containing your references (see section on bibliography management), and you'll need to run pdflatex a couple of times to resolve all of the references properly in the document:
pdflatex paper
bibtex *.aux
pdflatex paper
pdflatex paper
If you have not got any references defined, bibtex *.aux will produce an error. So you just need to run pdflatex paper in the first instance.
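If you do want a build script, one possible shape is sketched below. It is an illustration under stated assumptions rather than a script shipped with new-paper: the function names build_paper and needs_bibtex are hypothetical, and it runs bibtex on the root file's base name (bibtex paper) instead of bibtex *.aux. It relies on the fact that LaTeX records citations and bibliography files in the .aux file as \citation{...} and \bibdata{...} lines, and skips the BibTeX passes when either is missing.

```shell
# Hypothetical build helpers, not part of new-paper.

# Succeed only if the given .aux file records at least one citation and
# a bibliography file; otherwise bibtex would stop with an error.
needs_bibtex() {
    grep -q '^\\citation{' "$1" && grep -q '^\\bibdata{' "$1"
}

# Build the paper whose root file is "$1.tex" (defaults to paper.tex).
build_paper() {
    main="${1:-paper}"
    pdflatex "$main" || return 1
    if needs_bibtex "$main.aux"; then
        bibtex "$main" || return 1
        # Two further passes resolve the citations and cross-references.
        pdflatex "$main" || return 1
        pdflatex "$main" || return 1
    fi
}

# Typical use, from the root directory of your repository:
#   build_paper paper
```

Saving these functions in a script in the root directory, followed by a call to build_paper, gives collaborators a single command to run regardless of whether the paper has references yet.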
Each .tex file in new-paper corresponds to part of the paper and includes advice as to how to go about writing it. It should be obvious from the directory names as to what goes in each, but the .tex files explain these also.
You can delete or comment out this advice to make way for your own content.
Once you've been through this process a few times, feel free to run the Python clean.py script, which will remove all the pre-written advice and all of the example files and directories automatically.
You can exclude files from the repository using the provided .gitignore file as a starting point. This file lives in the repository's root directory, and already covers some of the file types discussed next, although your particular paper may develop to include some more.
Firstly, do not commit build files to the repository, in particular the target PDF file (paper.pdf). You should also avoid committing:
- Temporary build files that LaTeX and BibTeX produce (e.g., paper.aux, etc.).
- Operating system files (e.g., .DS_Store on macOS) that can be particularly irritating for users of other systems.
- Editor backup files (e.g., .bak files), and editor settings files (e.g., .vscode), if you can avoid them.
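Bear in mind that adding a pattern to .gitignore does not remove a file that has already been committed. The sketch below shows one way to find and untrack such files using standard Git commands; the function name tracked_but_ignored is hypothetical and not part of new-paper.

```shell
# Hypothetical helper, not part of new-paper: list files that Git is
# tracking even though they match an ignore rule (e.g., a paper.pdf that
# was committed before *.pdf was added to .gitignore).
tracked_but_ignored() {
    git ls-files --cached --ignored --exclude-standard
}

# Typical use, from the root directory of your repository:
#   tracked_but_ignored
#   git rm --cached paper.pdf    # stop tracking it, but keep the local copy
#   git commit -m "Stop tracking build output"
```

The --cached option to git rm removes the file from the repository's index only, so your local copy stays in place.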
You should use a separate Git repository for all of your experimental data and materials. That is, keep your paper repository for LaTeX files only, or graphics/TikZ files etc. that are directly involved in the production of your paper.
If the build of your paper needs to generate tables or figures from your raw data, consider including the data repository in the paper repository as a Git submodule instead. A Git submodule is just a way of using one Git repository inside another, where the two repositories can still be maintained independently.
I suggest you set up a development environment similar to one that you'd use for developing software, but which will automatically compile your LaTeX and build a PDF.
There are plenty of tools around to assist you in this process, and I have used a number over the years. Currently, I use Visual Studio Code (VSCode for short), which is free and open source, with the LaTeX Workshop plugin. Among other things this plugin automatically builds my PDF every time I make a change to a .tex source file. With VSCode, I can display the PDF in an editor tab so that I can see it while I am working on it, and have it update while I edit it. Others prefer more "traditional" solutions such as Vim or Emacs, but you can use whatever you prefer.
The repository provides an .editorconfig file (see https://editorconfig.org on how to use it with your text editor), which sets out my own preferences in terms of whether to use tabs or spaces (spoiler: it's spaces), indent sizes, etc.
In LaTeX these settings are perhaps less important, and your personal preferences may differ. This is fine, just don't add any settings or backup files produced by your editor to the repository – your personal choice can co-exist alongside that of your collaborators.
Whatever environment you use, ensure you have a spell checker installed! (Again, VSCode has a plugin for one of these.) You will also find various other plugins useful, for example those that manage tabs/spaces and remove trailing spaces in your source text files.
The text and comments embedded in the .tex files of this paper discuss how to go about writing each section of the paper. Check them out! They also discuss the conventions you should follow and contain a number of tips on how to write good LaTeX.
What follows is a more general overview of writing style that applies throughout your paper:
First of all, please avoid passive voice. When you write in passive voice, you're excluding the "actor" from the sentence so that objects have things done to them rather than someone/something doing the action. This makes your sentences vague and imprecise. For example, it's important to know whether the steps in your empirical study were manual or automated (did you do them, or did your tool do them automatically?). Passive voice often excludes these details and can confuse readers and referees. Consider the sentence "test suites were generated for the subjects 30 times" (passive voice) vs. "our tool generated test suites for the subjects 30 times" (active voice). In multi-author papers, using "we" to say "we did it" is preferred to writing that something "was done", particularly to convey that a step in an experiment was necessarily manual, or involved you making a choice from a series of possible options. This applies when you write about deriving an algorithm, implementing a tool, choosing a particular aspect of an experiment's design, and so on.
Sometimes it is difficult to identify passive from active voice! This is where the "zombies" tip comes in useful: If you can add "by zombies" after the verb in the sentence, and it still makes grammatical sense, you have a sentence written in passive voice.
As I remarked earlier in these instructions, please ensure you use a spell checker! Many text editors enable you to install a plugin so that you can see any misspelled words while you're editing your document.
British English or American English? It may seem obvious to you, but it's best to consult with your supervisor first. Here are some aspects you may want to consider when making this decision in a reasoned, non-patriotic fashion!
Firstly, if you're planning to submit your thesis in Britain, it makes sense to use British English, so that you can re-use the text from your paper without potentially needing to change all the spellings later. Some PhD examiners will be picky about this, and/or may (hopefully unduly) suspect the text has come from sources other than you.
Consider again, however, if you're planning to submit your paper to a conference that has double-anonymous ("double-blind") reviewing. This is where you don't know the identities of the reviewers (as usual), but the reviewers don't know who you (the authors of the work) are either. In practice, this means you shouldn't put the names of authors in your paper or reveal any other details that the reviewers may be able to use to identify you as the origin of the work. In this situation you might opt to use American English. While I have not seen the use of British English being in and of itself against the rules of conferences that employ double-anonymous reviewing, its use may help narrow down the potential source of the work in the mind of the reviewer ("... ah this must be John Smith, who's the only one who works on mutational search-based self-driving causal large language models for software testing in England"). Depending on the context, being identified in this way may or may not be a good thing!
Finally, if you're working on a journal paper, note that some journals (particularly American ones, obviously) will change all British spellings to American for the final published (a.k.a. "camera-ready") version of the paper, regardless of your original choice.
At the end of the day, the most important thing is that everyone working on the paper knows and agrees on which version of English the paper is using (and doesn't keep changing the spellings from one version of English to another!).
Also, bear in mind that it's not just about spelling but also word choice, since there are some words that are still used in everyday British English that have disappeared from American English. These words sound odd and quaint to the American ear (so you may find them silently corrected in your paper, especially if working with an American collaborator). Top of the annoyance list appears to be the word "whilst". Use "while" instead. (It just sounds better anyway, IMO.)
There are certain rules around writing numbers that just look "right" but generally have no particular explanation, and are just part of generally accepted scientific writing style. In general, numbers should appear as numerals, but there are some special cases where you should write them out as words, for example:
- At the start of a sentence. Write "Thirty-seven of the subjects..." as opposed to "37 of the subjects...", or even better, just re-word the sentence so that the number doesn't appear at the start, as both look a little bit strange.
- Numbers less than 10, in the middle of a sentence, unless you're quoting actual data points from your experiment.
There are a number of good resources on the web that you should check out too. See the following links, for example:
- Advice for Writing LaTeX Documents, extensive guidance on LaTeX style, by Diomidis Spinellis.
- Things I Keep Repeating About Writing, an excellent blog post by Claire Le Goues that covers more aspects and tips related to writing style than covered here.
- My top ten presentation issues in other's papers, a collection of bug-bears found by Andreas Zeller while reviewing papers, each of which you should avoid!
- Guide to Punctuation, a good and detailed guide to every punctuation symbol in English, and when to use it. Covers differences in American usage.
There are some popular textbooks that are worth looking at as well (you're welcome to borrow them from me). These include:
- BUGS in Writing: A Guide to Debugging Your Prose, by Lyn Dupre.
- Writing for Computer Science, by Justin Zobel.
Finally, there are some good tutorial papers written by Mary Shaw. These are a little old now, but the advice in them is still good and relevant!
- Writing Good Software Engineering Research Papers. Proceedings of the International Conference on Software Engineering (ICSE), 2003.
- What Makes Good Research in Software Engineering? International Journal of Software Tools for Technology Transfer, vol. 4, no. 1, 2002.
Bear in mind that sometimes the advice given is subjective, and hence contradictory, in which case you're free to make up your own mind, although you may wish to discuss it with your supervisor first! If the route you take has a significant impact on the way you will write or structure your paper, you should discuss it with all of your collaborators and get their agreement.
For example, this repository chooses to structure papers by splitting its content across several files. This is because I think it is better to structure LaTeX documents in a modular fashion, much like a computer program. This approach will help you re-use components of your papers that you have written when you start to work on your Masters dissertation, or PhD thesis. A common objection to doing it this way is that it makes it harder to find specific text later. But, if you structure your files in the way this repository suggests, finding things again should not be difficult – in fact, it should be very easy. Furthermore, if you get really stuck, then it is not hard to use the "Find" feature of a good text editor, so long as you include all the relevant files in the scope of the search.
You should use BibTeX for your bibliography. If you are using reference manager software like Zotero or Mendeley, it's easy to export your library to .bib format. Once you have a bibliography, you can uncomment the following line in paper.tex:
\bibliography{bibtex/file_name}
I suggest keeping your bibliography in a separate repository. Maintaining separate .bib files is a pain, especially if you're working on multiple papers simultaneously, because the different copies can get out of sync.
To do this you can use Git submodules:
git submodule add [bib repo url] bibtex/your-bibliography
With your bibliography set up, you should be able to build your paper using:
pdflatex paper
bibtex *.aux
pdflatex paper
pdflatex paper
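One thing to watch for with a bibliography submodule is that a plain git clone of the paper repository leaves the submodule's directory empty, so the build will not find the .bib file until the submodule has been fetched. The following sketch wraps the standard Git commands for this; the function names init_submodules and count_uninitialised are hypothetical and not part of new-paper.

```shell
# Hypothetical helpers, not part of new-paper.

# Fetch the contents of every submodule recorded in the repository.
# (Cloning with "git clone --recurse-submodules [URL]" does this up front.)
init_submodules() {
    git submodule update --init --recursive
}

# Count submodules that have not been initialised yet. This reads the
# output of "git submodule status" on standard input, where Git marks
# each uninitialised submodule with a leading "-".
count_uninitialised() {
    grep -c '^-' || true
}

# Typical use, from the root directory of your paper repository:
#   git submodule status | count_uninitialised
#   init_submodules
```

If the count is greater than zero after cloning, running init_submodules before building should bring in the missing bibliography.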