Skip to content

Latest commit

 

History

History
215 lines (169 loc) · 8.17 KB

cEP-0021.md

File metadata and controls

215 lines (169 loc) · 8.17 KB

Profile Bears

Metadata
cEP 0021
Version 1.0
Title Profile Bears
Authors Vaibhav Kumar Rai mailto:[email protected]
Status Proposed
Type Process

Abstract

This cEP describes the detailed implementation of a profiling interface as a part of the GSoC'18 project.

Introduction

Profiling a Python program is a form of dynamic analysis that measures the execution time of the program and the functions it is composed of. Its goal is to produce data about where program is spending time and what area might be worth optimizing. Thus we can provide a Profiling interface from within coala and make it available to bear writers so that they can create better performant bears.

This cEP proposes to introduce the profiling interface so that bear writers will have the ability to Profile the Bear's code to optimize its performance.

Proposed Change

Bear writers can invoke the profiler with the --profile-bears argument.

Bear writer will have the ability to directly dump the raw profile output (binary output) either on current working directory or to a specified directory name, which can be further used for examination of profiler stats with the help of different modules like pstats or snakeviz.

  • --profile-bears or profile_bears (using .coafile) is the main argument to enable profiling. It accepts an additional parameter Dirname through which bear writers can specify where to store the dump.

The profiler will start by profiling the run() method of bears because it consumes most of the bears time. So, this is part where bear writer will spend time, as rest of the part like loading the files, collecting the settings, etc. are done by coala itself.

Implementation Details

When user invokes the profiler with either CLI or by coafile then the execute method of Bear base class will call the run_bear_from_section method.

def run_bear_from_section(self, args, kwargs):
    ...
    profile_bears = kwargs.get('profile_bears')
    if profile_bears is None:
        self.run(*args, **kwargs)
    else:
        self.profile(self.run, *args, **kwargs)
    ...

run_bear_from_section method checks whether bear writer invoke the profiler interface or not i.e., keyword arguments passed to the Bear contain the profile_bears key or not.

  • If not then run_bear_from_section() method will directly call the run() method of Bear so that coala can continue its normal execution.

  • Else run_bear_from_section() method will call profile() method which will check the whether the value of profile_bears key value is True or Dirname.

    • If True then profile() will create a profiler file for every bear of every section with a name {section_name}_{bear_name}.prof and dump it in a current working directory.

      path = os.getcwd()
      if str(profile_bears) == 'True':
          filename = '{}_{}.prof'.format(
              kwargs['section_name'], kwargs['bear_name'])
          open(filename, 'wb')
          kwargs.pop('section_name')
          kwargs.pop('bear_name')
          cProfile.runctx('self.run(*args, **kwargs)',
                          globals(), locals(), filename)
    • If Dirname then it check whether the specified directory exits or not.

      If exists then profile() method will create a profiler file for every bear of every section with a name {section_name}_{bear_name}.prof and dumps it on the directory Dirname.

      If not then it will create a directory with the literal name Dirname and dump then profiler files on that.

      subdir = str(profile_dump)
      filepath = os.path.join(path, subdir)
      if os.path.isdir(filepath):
          filename = '{}_{}.prof'.format(
              kwargs['section_name'], kwargs['bear_name'])
          open(os.path.join(filepath, filename), 'wb')
          kwargs.pop('section_name')
          kwargs.pop('bear_name')
          cProfile.runctx('self.run(*args, **kwargs)',
                          globals(), locals(), os.path.join(filepath, filename))
      else:
          os.mkdir(os.path.join(path, subdir))
          filename = '{}_{}.prof'.format(
              kwargs['section_name'], kwargs['bear_name'])
          open(os.path.join(filepath, filename), 'wb')
          kwargs.pop('section_name')
          kwargs.pop('bear_name')
          cProfile.runctx('self.run(*args, **kwargs)',
                          globals(), locals(), os.path.join(filepath, filename))

Profiler Output

A profiler produces a binary output that will contain a set of statistics that describes how often and for how long various parts of the program executed. These statistics can be formatted into reports via the pstats module or by using third party tools like SnakeViz. These formatted reports will help bear writers in creating a better performant bear by analysing the profiler output.

For e.g.

With CLI arguments

coala --b PEPE8Bear --files <file.py> --profile-bears

It will create a profile file with name cli_PEP8Bear. and after that Bear writers can use different modules like pstats or snakeviz, which reads profile results from a file and formats them in various ways i.e.

  • Using pstats :

    pstats has a simple line-oriented interface (implemented using cmd). A pstats module is a part of core Python. So, anyone can with a direct import pstats and start working with it.

    An example of how to generate a formatted output using pstats module and manipulate the profiler data using different methods of psats, like strip_dirs(), sort_stats('cumtime'). Here strip_dirs() remove the extraneous path from all the module names and sort_stats('cumtime') sort the output by cumulative time spent in this and all subfunctions (from invocation till exit).

    >>> import pstats
    >>> ps = pstats.Stats('cli_PEP8Bear.prof')
    >>> ps.strip_dirs().sort_stats('cumtime').print_stats(5)
    Thu May 31 00:16:03 2018    python_PEP8Bear.prof
    
           7 function calls in 0.000 seconds
    
     Ordered by: cumulative time
     List reduced from 6 to 5 due to restriction <5>
    
     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
          1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
          1    0.000    0.000    0.000    0.000 <string>:1(<module>)
          1    0.000    0.000    0.000    0.000 __init__.py:111(wrapping_function)
          2    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
          1    0.000    0.000    0.000    0.000 PEP8Bear.py:22(run)
    
      <pstats.Stats object at 0x7f2e69028a90>

    For more information about pstats, check this link.

  • Using SnakeViz:

    SnakeViz is a browser based graphical viewer for the output of Python’s cProfile module. SnakeViz creates sunburst and icicle diagrams using profiler data as input. Using sunburst diagram Snakeviz represent the amount of time spent inside a function as a angular width of the arc.

    Sometimes functions don’t just spend time calling other functions, they also have their own internal time. SnakeViz shows this by putting a special child on each node that represents internal time.

    SnakeViz also provide a Funtion Info functionality in which it display a list of information about function like function name, Cumulative Time, name of the file in which the function is defined, line number on which the function is defined, Directory of a flle.

    For more information about SnakeViz, check this link.

    SnakeViz is available on PyPI. Install with pip i.e.

    $ pip install snakeviz

    Start SnakeViz from the command line i.e.

    $ snakeviz cli_PEP8Bear.prof