Skip to content

Google Summer of Code 2019

Abhishek Kumar edited this page Aug 24, 2019 · 13 revisions

Port scancode-toolkit to Python 3

Owner: Abhishek Kumar

Mentor: Philippe Ombredanne

# Problem: Since Python 2.7 will retire in few months and no longer maintained.

# Solution: Scancode needs to be ported to python 3 and all test suites must passed on both version of Python. The main difference that makes Python 3 better than Python 2.x is that the support for unicode is greatly improved in Python 3 .This will also be useful for scancode as scancode has users in more than 100 languages and it's easy to translate strings from unicode to other languages.

# Objective: To make scancode-toolkit installable on on Python 3.6 and higher just as it install with Python 2.7 .

Implementation:

  • It was started in development mode(editable mode) and then i was move to work in virtual environments.
  • I have worked module by module according to order of hierarchy of module. For example :All module is dependent on commoncode, so it must be ported first.In this way we have created the Porting order:
    1. commoncode
    2. plugincode
    3. typecode
    4. extractcode
    5. textcode
    6. scancode basics (some tests are integration tests and will have to wait to be ported)
    7. formattedcode, starting with JSON (some tests are integration tests and will have to wait to be ported)
    8. cluecode
    9. licensedcode
    10. packagedcode (depends on licensecode)
    11. summarycode
    12. fixup the remaining bits and tests

After porting each module, i have marked these modules as ported scanpy3 with help of conffest plugin(created by @pombredanne). Conffest plugin is heart of this project. Without this, it was very difficult to do. Dependencies was fixed at the time of porting the module where it was used.

Challenging part of Project:

It is very difficult to deal with paths on different operating systems.The issue is around macOS/Windows/Linux. The first two OS handle unicode paths alright on Python 2 and 3 but not completely on macOS Mojave because its filesystem encoding is APFS. Linux paths are bytes and os.listdir is broken on Python 2. As a result you can only sanely handle Linux paths as bytes on Python 2. But on Python 3 path seems to be corrected as unicode on Linux.

For more details visit here : https://vstinner.github.io/painful-history-python-filesystem-encoding.html and jaraco/path.py#130

We came with various Solution:

  • To use pathlib which generally handle paths correctly across platforms. And for backports we use pathlib 2. But this solution also fails because pathlib 2 does not work as expected wrt unicode vs bytes. And os.listdir also doesn't work properly.

  • To use path.py which handles the paths across all the platforms even on macOS Mojave .

  • Use bytes on linux and python 3 and unicode everywhere.

We choose the third solution because it is most fundamental and simple and easy to use.

Project was tracked in this ticket nexB/scancode-toolkit#295

Project link : Port Scancode to Python 3

My contribution : List of Commits

Note : Please give your feedback here

Outcome :

Now we have liftoff on Python 3 . We are able to run basic scans without errors on develop branch.You check it by running scancode -clipeu samples/ --json-pp - -n4 .

At last I would like to thanks my Mentor @pombredanne aka Philippe Ombredanne .He has helped lot in completing this project.He is very supportive and responsive .I have learned a lot from him .By his encouragement and motivation, I am very improving day by day ,built and develop my skills and complete the project during GSoC timeline.

Clone this wiki locally