Improve reflection #35

Open
wants to merge 9 commits into
base: master
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -30,7 +30,7 @@ jobs:
python -m pip install --upgrade pip
python -m pip install flake8 pytest pytest-cov
python -m pip install --upgrade setuptools setuptools_scm wheel
- python setup.py install
+ python -m pip install .
#- name: Lint with flake8
# run: |
# # stop the build if there are Python syntax errors or undefined names
40 changes: 34 additions & 6 deletions README.md
@@ -48,14 +48,42 @@ reposcanner --credentials tutorial/inputs/credentials.yml --config tutorial/inpu

3. examine the output files written to `tutorial/outputs`


# How to extend functionality

- 1. Create a new source file, `src/reposcanner/<routine.py>`, including a class
- based on the `ContributorAccountListRoutine`. See `stars.py` as an
- example of the kind of modifications required.
Originally, the only way to extend Reposcanner was to modify its source; it can now be extended externally as well. We recommend the external method for new projects, so that each new analysis does not require its own fork of Reposcanner.

1. Create a new source file, `my_module.py` or `my_package/my_module.py`.

2. Import `reposcanner.requests` and one of {`reposcanner.routines` or `reposcanner.analyses`}, depending on whether you want to write a routine or an analysis.

3. Locate the most relevant subclass of `reposcanner.requests.BaseRequestModel` and one of {`reposcanner.routines.DataMiningRoutine` or `reposcanner.analyses.DataAnalysis`}. E.g., for a routine that requires GitHub API access, one would subclass `OnlineRoutineRequest` and `OnlineRepositoryRoutine`. Reference the minimal blank example in `reposcanner.dummy` or real-world examples in `reposcanner.contrib`.

4. Write a config file that refers to your routines and analyses. See the next section on configuration files.

5. Check that `my_module` or `my_package.my_module` is importable. E.g., `python -c 'from my_module import MyRoutineOrAnalysis'`.
   - The current working directory is implicitly on the import path, so your module or package will be importable if you run Python and Reposcanner from the directory that contains it.
   - If your module or package does not reside in the current working directory, add its parent directory to `$PYTHONPATH`: `export PYTHONPATH=/path/to/proj:$PYTHONPATH`. This only has to be done once per shell session. Note that `$PYTHONPATH` should contain the path to the directory containing your module or package, not the path to the module or package itself. E.g., if you have `/path/to/proj/my_module.py` or `/path/to/proj/my_package/my_module.py`, set `PYTHONPATH` to `/path/to/proj`.

- 2. Add the new class name (for example `- StarGazersRoutine`) to the end of `config.yml`.
6. Run Reposcanner.

- 3. Run the test scan and inspect output to ensure your scan worked as intended.
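
The importability check in step 5 can also be done from Python. The following is a minimal sketch, not part of Reposcanner: `is_importable` and its `extra_dir` parameter are hypothetical names, and `extra_dir` plays the role of the directory one would otherwise put on `PYTHONPATH`.

```python
import importlib.util
import sys


def is_importable(module_name, extra_dir=None):
    """Return True if `module_name` can be located on the import path.

    `extra_dir` (hypothetical) is the directory that CONTAINS the module
    or package, not the path to the module file itself.
    """
    if extra_dir is not None and extra_dir not in sys.path:
        sys.path.insert(0, extra_dir)
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name cannot be found.
        return False
```

For example, `is_importable("my_package.my_module", extra_dir="/path/to/proj")` would check the layout described in step 5 without changing the shell environment.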
# Input files


## config.yaml

The config file contains a list of routines and a list of analyses. Each routine or analysis is identified as `my_module:ClassName` or `my_package.my_module:ClassName`.

Within each routine, one can put a dictionary of keyword parameters that will get passed to that routine.

```
routines:
  - my_module:NameOfOneRoutine
  - routine: my_module:NameOfAnotherOneRoutine
    arg0: "foo"
    arg1: [1, 2, 3]
analysis:
  - my_module:NameOfOneRoutine
  - my_module:NameOfAnotherOneRoutine
    arg0: "foo"
    arg1: [1, 2, 3]
```
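
To illustrate how a `my_module:ClassName` identifier can be turned into a class, here is a minimal sketch using the standard library's `importlib`. It is an assumption about the general technique, not Reposcanner's actual loader; `load_class` and `split_entry` are hypothetical helpers, and `split_entry` only handles the two routine forms shown above (a bare string, or a mapping with a `routine` key plus keyword parameters).

```python
import importlib


def load_class(spec):
    """Resolve a 'module.path:ClassName' string to the named attribute."""
    module_name, sep, class_name = spec.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def split_entry(entry):
    """Split one config entry into (spec, keyword-parameter dict)."""
    if isinstance(entry, str):
        # Bare form: "my_module:NameOfOneRoutine"
        return entry, {}
    # Mapping form: {"routine": "my_module:Name", "arg0": ..., ...}
    params = dict(entry)
    spec = params.pop("routine")
    return spec, params
```

With these helpers, an entry such as `{"routine": "my_module:NameOfAnotherOneRoutine", "arg0": "foo"}` splits into the identifier and `{"arg0": "foo"}`, and `load_class` then imports `my_module` and looks up the class by name.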
79 changes: 11 additions & 68 deletions src/reposcanner/contrib.py
@@ -4,6 +4,7 @@
from reposcanner.response import ResponseFactory
from reposcanner.provenance import ReposcannerRunInformant
from reposcanner.data import DataEntityFactory
from reposcanner.util import replaceNoneWithEmptyString as _replaceNoneWithEmptyString
import pygit2

from pathlib import Path
@@ -21,8 +22,7 @@


class CommitInfoMiningRoutineRequest(OfflineRoutineRequest):
def __init__(self, repositoryURL, outputDirectory, workspaceDirectory):
super().__init__(repositoryURL, outputDirectory, workspaceDirectory)
pass


class CommitInfoMiningRoutine(OfflineRepositoryRoutine):
@@ -126,16 +126,8 @@ def _getStats(commit):
changes['files'] += diff.stats.files_changed
return changes

def _replaceNoneWithEmptyString(value):
if value is None:
return ""
else:
return value

for commit in session.walk(session.head.target,
pygit2.GIT_SORT_TIME | pygit2.GIT_SORT_TOPOLOGICAL):
extractedCommitData = {}

# The person who originally made the change and when they made it, a
# pygit2.Signature.
author = commit.author
@@ -200,21 +192,7 @@ def _replaceNoneWithEmptyString(value):


class OnlineCommitAuthorshipRoutineRequest(OnlineRoutineRequest):
def __init__(
self,
repositoryURL,
outputDirectory,
username=None,
password=None,
token=None,
keychain=None):
super().__init__(
repositoryURL,
outputDirectory,
username=username,
password=password,
token=token,
keychain=keychain)
pass


class OnlineCommitAuthorshipRoutine(OnlineRepositoryRoutine):
@@ -226,13 +204,6 @@ class OnlineCommitAuthorshipRoutine(OnlineRepositoryRoutine):
def getRequestType(self):
return OnlineCommitAuthorshipRoutineRequest

def githubImplementation(self, request, session):
def _replaceNoneWithEmptyString(value):
if value is None:
return ""
else:
return value

factory = DataEntityFactory()
output = factory.createAnnotatedCSVData(
"{outputDirectory}/{repoName}_OnlineCommitAuthorship.csv".format(
@@ -270,12 +241,6 @@ def _replaceNoneWithEmptyString(value):
message="OnlineCommitAuthorshipRoutine completed!", attachments=output)

def gitlabImplementation(self, request, session):
def _replaceNoneWithEmptyString(value):
if value is None:
return ""
else:
return value

factory = DataEntityFactory()
output = factory.createAnnotatedCSVData(
"{outputDirectory}/{repoName}_OnlineCommitAuthorship.csv".format(
@@ -479,8 +444,7 @@ def execute(self, request):


class OfflineCommitCountsRoutineRequest(OfflineRoutineRequest):
def __init__(self, repositoryURL, outputDirectory, workspaceDirectory):
super().__init__(repositoryURL, outputDirectory, workspaceDirectory)
pass


class OfflineCommitCountsRoutine(OfflineRepositoryRoutine):
@@ -527,21 +491,7 @@ def offlineImplementation(self, request, session):


class ContributorAccountListRoutineRequest(OnlineRoutineRequest):
def __init__(
self,
repositoryURL,
outputDirectory,
username=None,
password=None,
token=None,
keychain=None):
super().__init__(
repositoryURL,
outputDirectory,
username=username,
password=password,
token=token,
keychain=keychain)
pass


class ContributorAccountListRoutine(OnlineRepositoryRoutine):
@@ -554,12 +504,6 @@ class ContributorAccountListRoutine(OnlineRepositoryRoutine):
def getRequestType(self):
return ContributorAccountListRoutineRequest

def _replaceNoneWithEmptyString(self, value):
if value is None:
return ""
else:
return value

def githubImplementation(self, request, session):
factory = DataEntityFactory()
output = factory.createAnnotatedCSVData(
@@ -578,9 +522,9 @@ def githubImplementation(self, request, session):
contributors = [contributor for contributor in session.get_contributors()]
for contributor in contributors:
output.addRecord([
self._replaceNoneWithEmptyString(contributor.login),
self._replaceNoneWithEmptyString(contributor.name),
';'.join([self._replaceNoneWithEmptyString(contributor.email)])
_replaceNoneWithEmptyString(contributor.login),
_replaceNoneWithEmptyString(contributor.name),
';'.join([_replaceNoneWithEmptyString(contributor.email)])

])

@@ -607,8 +551,8 @@ def gitlabImplementation(self, request, session):
contributors = [contributor for contributor in session.get_contributors()]
for contributor in contributors:
output.addRecord([
self._replaceNoneWithEmptyString(contributor.username),
self._replaceNoneWithEmptyString(contributor.name),
_replaceNoneWithEmptyString(contributor.username),
_replaceNoneWithEmptyString(contributor.name),
';'.join(contributor.emails.list())

])
@@ -619,8 +563,7 @@ def gitlabImplementation(self, request, session):


class FileInteractionRoutineRequest(OfflineRoutineRequest):
def __init__(self, repositoryURL, outputDirectory, workspaceDirectory):
super().__init__(repositoryURL, outputDirectory, workspaceDirectory)
pass


class FileInteractionRoutine(OfflineRepositoryRoutine):
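
The hunks above delete several identical nested copies of `_replaceNoneWithEmptyString` and import a single `replaceNoneWithEmptyString` from `reposcanner.util` instead. The `util` module itself is not part of this diff, so the following is only a sketch of what such a shared helper could look like, reconstructed from the removed copies.

```python
def replaceNoneWithEmptyString(value):
    """Return "" in place of None; pass any other value through unchanged."""
    return "" if value is None else value


# Usage mirroring the CSV rows built in the routines above; the field
# values here are made up for illustration.
row = [replaceNoneWithEmptyString(field) for field in ("octocat", None, "")]
```

Keeping one module-level function also removes the need for the `self.` prefix that the method variant in `ContributorAccountListRoutine` required.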
19 changes: 2 additions & 17 deletions src/reposcanner/dummy.py
@@ -11,8 +11,7 @@


class DummyOfflineRoutineRequest(OfflineRoutineRequest):
def __init__(self, repositoryURL, outputDirectory, workspaceDirectory):
super().__init__(repositoryURL, outputDirectory, workspaceDirectory)
pass


class DummyOfflineRoutine(OfflineRepositoryRoutine):
@@ -42,21 +41,7 @@ def offlineImplementation(self, request, session):


class DummyOnlineRoutineRequest(OnlineRoutineRequest):
def __init__(
self,
repositoryURL,
outputDirectory,
username=None,
password=None,
token=None,
keychain=None):
super().__init__(
repositoryURL,
outputDirectory,
username=username,
password=password,
token=token,
keychain=keychain)
pass


class DummyOnlineRoutine(OnlineRepositoryRoutine):
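
The request classes above replace a constructor that only forwards its arguments with a bare `pass`. This works because a Python subclass that defines no `__init__` inherits its parent's. The sketch below demonstrates this with `BaseRequest`, a hypothetical stand-in that is not Reposcanner's `OnlineRoutineRequest`.

```python
class BaseRequest:
    # Hypothetical stand-in for a request base class.
    def __init__(self, repositoryURL, outputDirectory, token=None):
        self.repositoryURL = repositoryURL
        self.outputDirectory = outputDirectory
        self.token = token


class ForwardingRequest(BaseRequest):
    # Old style: a constructor that only forwards its arguments.
    def __init__(self, repositoryURL, outputDirectory, token=None):
        super().__init__(repositoryURL, outputDirectory, token=token)


class InheritingRequest(BaseRequest):
    # New style: no __init__ defined, so BaseRequest.__init__ is used.
    pass


old = ForwardingRequest("https://example.org/repo", "out", token="t")
new = InheritingRequest("https://example.org/repo", "out", token="t")
```

Both instances end up with the same attributes. The inheriting form also picks up any parameter later added to the base constructor without further edits.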
15 changes: 4 additions & 11 deletions src/reposcanner/git.py
@@ -399,19 +399,12 @@ def __init__(self, credentialsDictionary):
a dictionary object, but got a {wrongType} instead!".format(
wrongType=type(credentialsDictionary)))

def safeAccess(dictionary, key):
"""A convenience function for error-free access to a dictionary"""
if key in dictionary:
return dictionary[key]
else:
return None

for entryName in credentialsDictionary:
entry = credentialsDictionary[entryName]
url = safeAccess(entry, "url")
username = safeAccess(entry, "username")
password = safeAccess(entry, "password")
token = safeAccess(entry, "token")
url = entry.get("url", None)
username = entry.get("username", None)
password = entry.get("password", None)
token = entry.get("token", None)
if url is None:
print("Reposcanner: Warning, the entry {entryName} in \
the credentials file is missing a URL. Skipping.".format(
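
The hunk above swaps the local `safeAccess` helper for `dict.get`. As a quick check that the two behave the same, the sketch below compares them on a made-up credentials entry; the keys and values are illustrative only.

```python
def safeAccess(dictionary, key):
    """The removed helper: return the value for key, or None if absent."""
    if key in dictionary:
        return dictionary[key]
    else:
        return None


# Made-up entry: one present key, one missing key, one explicit None.
entry = {"url": "https://example.org/", "token": None}
keys = ("url", "username", "token")

old_results = [safeAccess(entry, key) for key in keys]
new_results = [entry.get(key, None) for key in keys]
short_results = [entry.get(key) for key in keys]  # None is already the default
```

All three lists are equal, so the explicit `None` default in `entry.get("url", None)` is optional.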