Skip to content

Development

Philippe Ombredanne edited this page Sep 22, 2017 · 19 revisions

Development

See CONTRIBUTING.rst for details: https://github.com/nexB/scancode-toolkit/blob/master/CONTRIBUTING.rst

Code layout and conventions

Source code is in src/. Tests are in tests/.

There is one Python package for each major feature under src/ and a corresponding directory with the same name under tests (but this is not a package by design).

Each test script is named test_XXXX and while we love to use py.test as a test runner, most tests have no dependencies on py.test, only on the unittest module (with the exception of some command line tests that depend on pytest monkeypatching capabilities.

When source or tests need data files, we store these in a data subdirectory.

We use PEP8 conventions with a relaxed line length that can be up to 90'ish characters long when needed to keep the code clear and readable.

We store pre-built bundled native binaries in bin/ sub-directories of each src/ packages. These binaries are organized by OS and architecture. This ensure that ScanCode works out of the box either using a checkout or a download, without needing a compiler and toolchain to be installed. The corresponding source code for the pre-built binaries are store in a separate repository at https://github.com/nexB/scancode-thirdparty-src

We store bundled thirdparty components and libraries in the thirdparty directory. Python libraries are stored as wheels, eventually pre-built if the corresponding wheel is not available in the Pypi repository. Some of these components may be advanced builds with bug fixes or advanced patches.

We write tests, a lot of tests, thousands of tests. Several tests are data-driven and use data files as test input and sometimes data files as test expectation (in this case using either JSON or YAML files). The tests should pass on Linux 64 bits, Windows 32 and 64 bits and on MacOSX 10.6.8 and up. We maintain two CI loops with Travis (Linux) at https://travis-ci.org/nexB/scancode-toolkit and Appveyor (Windows) at https://ci.appveyor.com/project/nexB/scancode-toolkit

When finding bugs or adding new features, we add tests. See existing test code for examples.

Running tests

ScanCode comes with over 13,000 unit tests to ensure detection accuracy and stability across Linux, Windows and macOS OSes: we kinda love tests, do we?

We use pytest to run the tests: call the py.test script to run the whole test suite. This is installed by pytest which is bundled with a ScanCode checkout and installed when you run ./configure).

If you are running from a fresh git clone and you run ./configure and then source bin/activate the py.test command will be available in your path.

Alternatively if you have already configured but are not in an activated "virtualenv" the py.test command is available under <root of your checkout>/bin/py.test

(Note: paths here are for POSIX, but mostly the same applies to Windows)

If you have a multiprocessor machine you might want to run the tests in parallel (and faster) For instance: py.test -n4 runs the tests on 4 CPUs. We typically run the tests in verbose mode with py.test -vvs -n4

You can also run a subset of the test suite as shown in the CI configs https://github.com/nexB/scancode-toolkit/blob/develop/appveyor.yml#L6 e,g, py.test -n 2 -vvs tests/scancode runs only the tests scripts present in the tests/scancode directory. (You can pass a path to a specific test script file there too).

See also https://docs.pytest.org for details or use the py.test -h command to show the many other options available.

One useful option is to run a select subset of the test functions matching a pattern with the -k option for instance: py.test -vvs -k tcpdump would only run test functions that contain the string "tcpdump" in their name or their class name or module name .

Another useful option after a test run with some failures is to re-run only the failed tests with the --lf option for instance: py.test -vvs --lf would only run only test functions that failed in the previous run.

pip requirements and the configure script

ScanCode use the configure and configure.bat (and etc/configure.pybehind the scenes) scripts to install a virtualenv, install required packaged dependencies as pip requirements and more configure tasks such that ScanCode can be installed in a self-contained way with no network connectivity required.

Earlier unreleased versions of ScanCode where using buildout to install and configure eventually complex dependencies. We had some improvements that were merged in the upstream buildout to support bootstrapping and installing without a network connection and When we migrated to use pip and wheels as new, improved and faster way to install and configure dependencies we missed some of the features of buildout like the recipes, being able to invoke arbitrary Python or shell scripts after installing packages and have scripts or requirements that are operating system-specific.

ScanCode requirements and third-party Python libraries

In a somewhat unconventional way, all the required libraries are bundled aka. copied in the repo itself in the thirdparty/ directory. If ScanCode were only a library it would not make sense. But its is first an application and having a well defined frozen set of dependent packages is important for an app. The benefit of this approach (combined with the configure script) means that a mere checkout of the repository contains everything needed to run ScanCode except for a Python interpreter.

Using ScanCode as a Python library

ScanCode can be used alright as a Python library and is available as as a Python wheel in Pypi and installed with pip install scancode-toolkit.

Steps to cut a new release:

  • run bumpversion with major, minor or patch to bump the version in:

    • src/scancode/__init__.py
    • setup.py
    • Update the CHANGELOG.rst
  • commit changes and push changes to develop:

    • git commit -m "commit message"
    • git push --set-upstream origin develop
  • merge develop branch in master and tag the release.

    • git checkout master
    • git merge develop
    • git tag -a v1.6.1 -m "version 1.6.1"
    • git push --set-upstream origin master
    • git push --set-upstream origin v1.6.1
  • draft a new release in GitHub, using the previous release blurb as a base. Highlight new and noteworthy changes from the CHANGELOG.rst.

  • run etc/release/release.sh locally.

  • upload the release archives created in the dist/ directory to the GitHub release page.

  • save the release as a draft. Use the previous release notes to create notes in the same style. Ensure that the link to thirdparty source code is present.

  • test the downloads.

  • publish the release on GitHub

Clone this wiki locally