
Commit 0e1ddfd

malefice and fcurella authored
Provider method autodoc and sample autogeneration (joke2k#1128)
* Add sample code validator class
* Create build_docs replacement
* Enforce sorting on AVAILABLE_LOCALES
* Create provider method docstring preprocessor
* Configure sphinx to use new autodoc setup
* Reverse part of cdd79d5
* Prettify barcode provider docstring
* Assortment of changes to ProviderMethodDocstring
* Add more tests
* Allow use of OrderedDict in samples
* Fix flake8 errors
* Fix isort errors
* Add sphinx to tox.ini testenv deps
* Exclude autodoc files from coverage
* Fix one last isort error
* Allow multiple results and seeding
* Remove old build_docs
* Update base provider docs
* Fix flake8 and isort errors again
* Force usage of size kwarg in sample generation
* Revert typo introduced in 4d00704
* Improve logging
* Improve results stringification
* Finalize barcode provider docs
* Finalize standard provider docs
* Finalize misc provider docs
* Improve color provider docs
* Autogenerate default sample for insufficient docstrings
* Exclude faker.sphinx and tests.sphinx from release distributions
* Add docs on writing docs
* Fix check-manifest errors
* Fix faker.sphinx not found error in RTD
* fix small grammar issues

Co-authored-by: Flavio Curella <[email protected]>
1 parent 0b92232 commit 0e1ddfd

21 files changed

+1753
-383
lines changed

.coveragerc

Lines changed: 5 additions & 1 deletion
@@ -1,3 +1,7 @@
11
[paths]
22
source = faker/
3-
omit = faker/build_docs.py
3+
4+
[run]
5+
omit =
6+
faker/sphinx/autodoc.py
7+
faker/sphinx/documentor.py

MANIFEST.in

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ include RELEASE_PROCESS.rst
66
include VERSION
77
recursive-include tests *.json
88
recursive-include tests *.py
9+
recursive-exclude faker/sphinx *.py
10+
recursive-exclude tests/sphinx *.py
911

1012
global-exclude *.py[cod] __pycache__ *.so
1113
exclude Makefile tox.ini .coveragerc .bumpversion.cfg .dockerignore .isort.cfg

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
2626
# ones.
2727
extensions = [
2828
'sphinx.ext.todo',
29-
'faker.build_docs',
29+
'faker.sphinx.autodoc',
3030
]
3131

3232
# Add any paths that contain templates here, relative to this directory.

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ Contents
2020
communityproviders
2121
locales
2222
coding_style
23+
writing-docs
2324

2425

2526

docs/writing-docs.rst

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
1+
Writing Documentation
2+
=====================
3+
4+
Everything under :doc:`Standard Providers <providers>` and :doc:`Localized Providers <locales>`
5+
is automatically generated using ``sphinx.ext.autodoc`` which pulls docstrings from provider
6+
methods during the ``sphinx-build`` process. This also means that the docstrings must be written
7+
in valid ``reStructuredText``.
8+
9+
Furthermore, because of the nature of this library, it is imperative to include sample usage to
10+
best demonstrate the capabilities and the possibilities. Since there are so many provider methods
11+
and localized versions, keeping the docs updated would have been a nightmare if the sample usage
12+
section (with reproducible output) of each provider method were to be written by hand.
13+
14+
Automating sample usage sections
15+
--------------------------------
16+
17+
To ease the burden of docs maintenance, the project takes advantage of docstring preprocessing offered
18+
by ``sphinx.ext.autodoc`` to automatically generate a sample usage section, complete with reproducible
19+
output, all from a couple of lines of text using a ``:sample:`` "pseudo-role" like so:
20+
21+
.. code-block::
22+
23+
:sample[ size=SIZE][ seed=SEED]:[ KWARGS]
24+
25+
What this will do is generate a sample usage section by calling the provider method ``SIZE`` times using
26+
an initial seed value of ``SEED`` with optional keyword arguments ``KWARGS``. If no ``SIZE`` is specified
27+
or if ``SIZE`` is less than ``5``, it defaults to ``5``. If no ``SEED`` is specified, it defaults to ``0``.
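The parsing rules above can be sketched roughly as follows. This is only an illustration: the pattern ``SAMPLE_LINE`` and the helper ``parse_sample_line`` are hypothetical names, and the actual implementation in ``faker.sphinx`` may differ.

```python
import re

# Hypothetical pattern for illustration; the real one in faker.sphinx may differ.
SAMPLE_LINE = re.compile(
    r"^:sample(?:\s+size=(?P<size>\d+))?(?:\s+seed=(?P<seed>\d+))?:\s*(?P<kwargs>.*)$",
    re.DOTALL,
)

DEFAULT_SIZE = 5
DEFAULT_SEED = 0


def parse_sample_line(line):
    """Return (size, seed, kwargs) for a ``:sample:`` line, or None if malformed."""
    match = SAMPLE_LINE.match(line.strip())
    if match is None:
        return None
    # A missing SIZE, or a SIZE below 5, falls back to 5.
    size = max(int(match.group("size") or DEFAULT_SIZE), DEFAULT_SIZE)
    # A missing SEED falls back to 0.
    seed = int(match.group("seed") or DEFAULT_SEED)
    # Join KWARGS that were broken across lines. Note that this naive approach
    # also collapses runs of whitespace inside string literals.
    kwargs = " ".join(match.group("kwargs").split())
    return size, seed, kwargs
```

With this sketch, ``parse_sample_line(":sample size=3:")`` still yields a size of ``5``, which mirrors the default described above.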
28+
29+
For example, let us assume that the line ``:sample:`` is present in the docstring of a provider method
30+
named ``method1``. That short line of text will automatically generate a sample usage section like this:
31+
32+
.. code-block:: python
33+
34+
>>> Faker.seed(0)
35+
>>> for _ in range(5):
36+
... fake.method1()
37+
...
38+
# Output 1
39+
# Output 2
40+
# Output 3
41+
# Output 4
42+
# Output 5
43+
44+
45+
Depending on the nature of the provider method, the default of 5 samples may not be enough, so it is
46+
possible to increase that by using ``size=SIZE``. You may also want to supply arguments to change the
47+
behavior of the method, so that can be done using ``KWARGS``. Putting it all together, if we use
48+
``:sample size=10: a=1, b=2, c=3``, the sample usage section generated will look like this:
49+
50+
.. code-block:: python
51+
52+
>>> Faker.seed(0)
53+
>>> for _ in range(10):
54+
... fake.method1(a=1, b=2, c=3)
55+
...
56+
# Output 1
57+
# Output 2
58+
# Output 3
59+
# Output 4
60+
# Output 5
61+
# Output 6
62+
# Output 7
63+
# Output 8
64+
# Output 9
65+
# Output 10
66+
67+
68+
There may also be times when it is desirable to show a particular output, but the pseudo-RNG gets in
69+
the way, e.g. very low chance of said output being generated. To work around this, you may use
70+
``seed=SEED`` to specify an initial seed value that is known to generate the desired output. If we
71+
specify ``:sample seed=12345: a=2``, the sample usage section generated will look like this:
72+
73+
.. code-block:: python
74+
75+
>>> Faker.seed(12345)
76+
>>> for _ in range(5):
77+
... fake.method1(a=2)
78+
...
79+
# Output 1
80+
# Output 2
81+
# Output 3
82+
# Output 4
83+
# Output 5
84+
85+
86+
You can mix and match ``SIZE``, ``SEED``, and ``KWARGS``, and if ``KWARGS`` is becoming too long to
87+
fit a single line, you can break ``KWARGS`` into multiple lines in the same way you can break keyword
88+
arguments across multiple lines in actual Python code. For example, let us say the docstring contains
89+
this:
90+
91+
.. code-block:: text
92+
93+
:sample size=25 seed=12345: arg1='very long value, unfortunately',
94+
arg2='yet another long value'
95+
96+
The sample usage section generated will look something like this:
97+
98+
.. code-block:: python
99+
100+
>>> Faker.seed(12345)
101+
>>> for _ in range(25):
102+
... fake.method1(arg1='very long value, unfortunately', arg2='yet another long value')
103+
...
104+
# Output 1
105+
# Output 2
106+
# ...
107+
# Output 24
108+
# Output 25
109+
110+
Docstring preprocessing behavior
111+
--------------------------------
112+
113+
If a provider method does not have a docstring or if the docstring does not contain properly
114+
formatted ``:sample:`` lines, a default sample usage section will automatically be generated
115+
for the benefit of insufficiently documented provider methods.
116+
117+
A docstring may contain multiple ``:sample:`` lines, and all prospective ``:sample:`` lines are
118+
first checked to see if they are properly formatted. Malformed instances will be discarded, and
119+
details will be logged to the console as a warning. All properly formatted ``:sample:`` lines will
120+
then be removed from the docstring and will undergo sample validation and generation, and the
121+
resulting docstring will have an ``:examples:`` section appended to the end. In code form:
122+
123+
.. code-block:: python
124+
125+
# Source code docstring
126+
def foo():
127+
"""Summary line
128+
129+
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
130+
Fusce auctor faucibus condimentum.
131+
132+
:sample:
133+
134+
Duis posuere lacinia porta.
135+
Quisque mauris nisl, mattis sed ornare eget, accumsan sit amet mauris.
136+
137+
:sample size=10 seed=1000:
138+
"""
139+
return 1
140+
141+
142+
.. code-block:: python
143+
144+
# Resulting docstring (more or less) after preprocessing
145+
def foo():
146+
"""Summary line
147+
148+
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
149+
Fusce auctor faucibus condimentum.
150+
151+
152+
Duis posuere lacinia porta.
153+
Quisque mauris nisl, mattis sed ornare eget, accumsan sit amet mauris.
154+
155+
:examples:
156+
157+
>>> Faker.seed(0)
158+
>>> for _ in range(5):
159+
... fake.foo()
160+
...
161+
1
162+
1
163+
1
164+
1
165+
1
166+
167+
>>> Faker.seed(1000)
168+
>>> for _ in range(10):
169+
... fake.foo()
170+
...
171+
1
172+
1
173+
1
174+
1
175+
1
176+
1
177+
1
178+
1
179+
1
180+
1
181+
"""
182+
return 1
183+
184+
185+
Notice how it did not remember where the ``:sample:`` lines are. Regardless of the original positions
186+
of the ``:sample:`` lines, the resulting output of all those lines will be collected and appended
187+
towards the end of the docstring. Please keep this in mind when structuring the flow of docstrings.
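The collect-and-append behavior can be pictured with a minimal sketch. The helpers ``split_docstring`` and ``append_examples`` are hypothetical names, and for brevity the sketch only handles single-line ``:sample:`` entries and skips validation entirely.

```python
def split_docstring(lines):
    """Separate ``:sample:`` lines from the rest of a docstring."""
    kept, samples = [], []
    for line in lines:
        if line.strip().startswith(":sample"):
            samples.append(line.strip())
        else:
            kept.append(line)
    return kept, samples


def append_examples(kept, rendered_samples):
    """Append every rendered sample block after the remaining docstring text."""
    result = list(kept)
    result.extend(["", ":examples:", ""])
    for block in rendered_samples:
        result.extend(block)
        result.append("")
    return result
```

Because the sample lines are removed first and their output is appended afterwards, any prose that originally sat between two ``:sample:`` lines ends up above the whole ``:examples:`` section.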
188+
189+
There are definitely benefits in allowing sample sections to be generated in place as it makes the
190+
creation of richer documentation possible, but unfortunately it is not yet possible due to time
191+
constraints. Until that feature is available, please keep all ``:sample:`` lines towards the end
192+
of the docstring to help out the code reviewers.
193+
194+
Sample validation and security segue
195+
------------------------------------
196+
197+
Under the hood, the sample sections are generated by feeding the parsed docstring sample lines
198+
into Python's built-in ``eval()``. This setup most definitely has some security implications
199+
out of the box, and this is why ``:sample:`` lines undergo validation prior to generation.
200+
201+
There are many details behind the validation process, but the long and short of it is that ``SIZE``
202+
and ``SEED`` can only be integers, and ``KWARGS`` can only be keyword arguments with literal values
203+
or ``OrderedDict`` objects. Attempting to do anything else like calling other builtins or even just
204+
performing basic arithmetic will fail the validation. Details of failed validation will be logged
205+
to the console as a warning.
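To give a feel for this kind of check, here is a minimal sketch built on the standard ``ast`` module. It is not the actual ``SampleCodeValidator``: the function name ``kwargs_are_literal`` is hypothetical, and the sketch accepts literal values only, so the ``OrderedDict`` allowance described above is not covered.

```python
import ast


def kwargs_are_literal(kwargs_source):
    """Return True if the text parses as keyword arguments with literal values only."""
    try:
        # Wrap the text in a dummy call so it can be parsed as one expression.
        tree = ast.parse(f"f({kwargs_source})", mode="eval")
    except SyntaxError:
        return False
    call = tree.body
    if not isinstance(call, ast.Call):
        return False
    # The outermost call must be the dummy ``f`` with no positional arguments.
    if not isinstance(call.func, ast.Name) or call.func.id != "f" or call.args:
        return False
    for keyword in call.keywords:
        if keyword.arg is None:  # reject ``**mapping`` unpacking
            return False
        try:
            # literal_eval rejects function calls, names, and plain arithmetic.
            ast.literal_eval(keyword.value)
        except (ValueError, TypeError):
            return False
    return True
```

In this sketch, ``kwargs_are_literal("a=1, b=2, c=3")`` passes, while ``kwargs_are_literal("a=open('x')")`` is rejected because the value is a call rather than a literal.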
206+
207+
To further improve security, all of the potentially dangerous code used for this purpose has been
208+
isolated into the ``faker.sphinx`` module, and this module will be excluded from release distributions
209+
that are hosted on PyPI.
210+
211+
If you are interested in learning more or in performing a security audit on how sample validation is
212+
implemented, please refer to the source code and docstrings of ``faker.sphinx.validator.SampleCodeValidator``
213+
and ``faker.sphinx.docstring.ProviderMethodDocstring``.
214+
215+
Sample generation
216+
-----------------
217+
218+
Once a ``:sample:`` line has been validated, the ``sphinx-build`` process will attempt to generate
219+
results based on the information provided. A sample run can still fail if ``KWARGS`` contains keyword
220+
arguments that the provider method is not expecting or if executing the provider method results in
221+
an exception. Details of such instances will also be logged to the console as a warning.
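The generation step might look something like the sketch below. It assumes a stand-in ``method1`` backed by the standard ``random`` module instead of a real ``Faker`` instance, and ``generate_samples`` is a hypothetical helper, not the function used by ``faker.sphinx``.

```python
import logging
import random

logger = logging.getLogger(__name__)


def generate_samples(method, size=5, seed=0, **kwargs):
    """Call ``method`` ``size`` times after seeding; return None if a call fails."""
    random.seed(seed)  # stand-in for ``Faker.seed(seed)``
    results = []
    try:
        for _ in range(size):
            results.append(method(**kwargs))
    except Exception as exc:  # unexpected keyword arguments surface as TypeError
        logger.warning("Sample generation failed: %s", exc)
        return None
    return results


def method1(a=1):
    """Hypothetical provider method standing in for a real Faker provider."""
    return random.randint(0, 100) * a
```

Seeding before every run is what makes the output reproducible: two calls with the same seed and arguments return the same list, while a call such as ``generate_samples(method1, unexpected=1)`` only logs a warning and returns ``None``.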

faker/build_docs.py

Lines changed: 0 additions & 121 deletions
This file was deleted.
