Skip to content
This repository has been archived by the owner on Mar 10, 2018. It is now read-only.

Vmpy #26

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Vmpy #26

Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -94,3 +94,4 @@ ENV/

# MacOS
.DS_Store
.idea
38 changes: 38 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
Changelog
=========

0.X.X (unreleased)
------------------

Breaking changes:

- *add item here*

New features:

- *add item here*

Bug fixes:

- *add item here*


0.7.3 (in progress)
------------------

Breaking changes:

- *add item here*

New features:

- add vmpy-0.1.6 as a dependency
- compute_mean_max uses vmpy.metrics.power_duration_curve
- compute_weighted_average_power uses vmpy.metrics.normalized_power
- compute_power_per_kg uses vmpy.streams.wpk
- compute_power_zones based on athlete.ftp, custom ftp or custom zones

Bug fixes:

- Use of pytest.approx() in test_algorithms
- Fixed path handling in the pytest fixtures
59 changes: 59 additions & 0 deletions athletic_pandas/algorithms/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,20 @@


def mean_max_power(power):
"""Mean-Max Power

Also known as a Power-Duration-Curve

Parameters
----------
power: pd.Series
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True for now but I probably want to change to ndarray. Do you agree that's better?
Good that your are adding docstrings, I did not get to that yet...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are based on what it does at the moment. In general, i prefer to have it array-like {list, ndarray, pd.Series} with output having the same time as an input.


Returns
-------
pd.Series

TO-DO: to be replaced by vmpy.algorithms.power_duration_curve()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module does not exist yet right? What's your idea behind this module?
Also, what is the difference between your implementation and mine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, I've moved it to vmpy.metrics.power_duration_curve() the last moment. It calculates accumulated energy first and than does two arrays diff with a shift. It speeds up calculations by x4. The 8hours ride takes about 10 seconds instead of almost a minute on my PC.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh fancy! Did you benchmark it to my implementation or yours?

"""
mmp = []

for i in range(len(power)):
Expand All @@ -23,11 +37,27 @@ def mean_max_power(power):


def mean_max_bests(power, duration, amount):
"""Compute multiple best non-overlapping intervals

Parameters
----------
power : pd.Series
duration : number
Duration of the interval in seconds
amount : int
Number of the non-overlaping intervals

Returns
-------
pd.Series
"""
moving_average = power.rolling(duration).mean()

length = len(moving_average)
mean_max_bests = []

for i in range(amount):

if moving_average.isnull().all():
mean_max_bests.append(DataPoint(np.nan, np.nan))
continue
Expand All @@ -45,10 +75,39 @@ def mean_max_bests(power, duration, amount):


def weighted_average_power(power):
"""Weighted average power

Also known as the Normalized Power
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalized Power®️ (😉) is a trademarked name and cannot be used commercially without permission from Peaksware. That reason alone is enough for me to avoid using that name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, didn't know that, but knowing the behavior of Coggan on forums should have guessed it. WAP is good, fits well with Strava community.


Parameters
----------
power : pd.Series

Returns
-------
number

replaced by vmpy.metrics.normalized_power
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem: What are the differences between your implementation and mine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For NP - none, but the vmpy function also allows to calculate xPower that uses the exponentially weighted average.

"""

wap = power.rolling(30).mean().pow(4).mean().__pow__(1/4)

return wap


def power_per_kg(power, weight):
"""

Parameters
----------
power
weight

Returns
-------

replaced by vmpy.metrics.wpk
"""
ppkg = power / weight

return ppkg
20 changes: 17 additions & 3 deletions athletic_pandas/models.py
Original file line number Diff line number Diff line change
@@ -1,22 +1,36 @@
from . import algorithms
from .base import BaseWorkoutDataFrame
from .helpers import requires
from vmpy.metrics import normalized_power, power_duration_curve
from vmpy.streams import compute_zones, wpk


class WorkoutDataFrame(BaseWorkoutDataFrame):
_metadata = ['athlete']

@requires(columns=['power'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I am not sure I still like the @requires decorator and consider removing it, for now it should also contain athlete=['ftp'].

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think the require decorator limits the usage. If you get data from Strava for example, the power stream is called 'watts' there. I would propose to ditch require in favor of sane defaults with an ability to override during the function call. Somethin along the line wdf.calculate_awesome_power(arg='power', **kwargs)

In this particular case I've removed require for an athlete to allow optionally specifying 'ftp' using keywords. It falls back on athlete['ftp'] as s default.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is really similar to the discussion about accepting multiple input argument types and I am not in favor of it. I think as a developer you have a responsibility to make the interface of you library as well defined as possible, accepting multiple input types or accepting that certain data is stored in different columns makes the interface unclear and therefore more difficult to use. For me this would also make the purpose of the dataframe redundant as you cannot just use the class methods "as is" without understanding the underlying data structure.
Anyway, I am writing a longer response where I will dive into this (even) more...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accepting that certain data is stored in different columns makes the interface unclear and therefore more difficult to use

The problem with too simple interfaces is that they limit the use. In case of requiring 'power' only you force the user in figuring out how to rename the columns in pd.DataFrame to get it working. The power/watts duality is a one example forced on us by Strava, but with heartrate you will get a full multitude of the possibilities: {hr, bpm, heart_rate, you_name_it}

The methods can simply be written with defaults in mind as def method(self, arg='power'), and can be called as wdf.method() with defaults and with a user-defined name for other cases wdf.method(arg='watts'). This pattern is in fact not that different from how pandas already exposes matplotlib methods.

Another aspect is a sole reliance on the Athlete obj. I think it is less canonical and therefore less obvious for the user to expect that only way to parametrize wdf methods is through the Athlete object. It is a great placeholder for defaults, but it is also quite natural to expect the parameters to be exposed as a function signature.

Imagine you want to compare two Wbal calculations on the same data set, but with different W' and CP values. The only way to do it now is through redefining the Athlete properties, while I would naturally expect that W' and CP' should be available as inputs to the method call.

def compute_power_zones(self, **kwargs):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method touches a sensitive subject ;) : In my view the WorkoutDataFrame should not contain any business logic and should call a method from algorithms as directly as possible. That means adding a method to the algorithms module (which in turn could call vmpy) and call it from here. The WorkoutDataFrame is just a more high level way to work with these algorithms but a user should be able to use all logic in athletic_pandas outside of the WorkoutDataFrame.
Does that make sense?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, especially in light of the email I wrote yesterday. Let's see how we separate Frame from the algos in the most efficient way.

if kwargs.get('ftp', None):
return compute_zones(self.power, ftp=kwargs.get('ftp'))
elif kwargs.get('zones', None):
return compute_zones(self.power, zones=kwargs.get('zones'))
else:
return compute_zones(self.power, ftp=self.athlete.ftp)

@requires(columns=['power'])
def compute_mean_max_power(self):
return algorithms.mean_max_power(self.power)
return power_duration_curve(self.power)
# return algorithms.mean_max_power(self.power)

@requires(columns=['power'])
def compute_weighted_average_power(self):
return algorithms.weighted_average_power(self.power)
return normalized_power(self.power, type='NP')
# return algorithms.weighted_average_power(self.power)

@requires(columns=['power'], athlete=['weight'])
def compute_power_per_kg(self):
return algorithms.power_per_kg(self.power, self.athlete.weight)
return wpk(self.power, self.athlete.weight)
# return algorithms.power_per_kg(self.power, self.athlete.weight)

@requires(columns=['power'], athlete=['cp', 'w_prime'])
def compute_w_prime_balance(self, algorithm=None, *args, **kwargs):
Expand Down
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,5 @@ coverage==4.4.1
isort==4.2.15
lmfit==0.9.7
pandas==0.20.3
vmpy==0.1.6
pytest==3.2.5
4 changes: 2 additions & 2 deletions tests/test_algorithms.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,8 +105,8 @@ def test_extended_critical_power(self, version, expected):
power = pd.Series(range(500))
mean_max_power = algorithms.mean_max_power(power)
ecp = algorithms.extended_critical_power(mean_max_power, version=version)
assert ecp[1] == expected[0]
assert ecp[250] == expected[1]
assert ecp[1] == pytest.approx(expected[0], 0.1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I like the pytest.approx, did not know it! Thanks for the tip :)

assert ecp[250] == pytest.approx(expected[1], 0.1)
assert len(ecp) == expected[2]

def test_heartrate_model(self):
Expand Down
38 changes: 35 additions & 3 deletions tests/test_models.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
import unittest

import os
import numpy as np
import pandas as pd
import pytest
Expand All @@ -23,8 +24,9 @@ def wdf():
@pytest.fixture
def wdf_small():
athlete = models.Athlete(cp=200, w_prime=20000)
current_dir = os.path.dirname(os.path.abspath(__file__))
wdf = models.WorkoutDataFrame(
pd.read_csv('tests/example_files/workout_1_short.csv')
pd.read_csv(os.path.join(current_dir, 'example_files/workout_1_short.csv'))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am working on moving all pytest fixtures to a conftest.py file in the root of /tests, that will remove vague file paths like this.

)
wdf = wdf.set_index('time')
wdf.athlete = athlete
Expand All @@ -34,8 +36,9 @@ def wdf_small():
@pytest.fixture
def wdf_big():
athlete = models.Athlete(cp=200, w_prime=20000, weight=80)
current_dir = os.path.dirname(os.path.abspath(__file__))
wdf = models.WorkoutDataFrame(
pd.read_csv('tests/example_files/workout_1.csv')
pd.read_csv(os.path.join(current_dir, 'example_files/workout_1.csv'))
)
wdf = wdf.set_index('time')
wdf.athlete = athlete
Expand Down Expand Up @@ -170,8 +173,36 @@ def test_compute_mean_max_power_missing_power(self, wdf):
with pytest.raises(exceptions.MissingDataException):
assert wdf.compute_mean_max_power() is None

def test_compute_power_zones(self):
power = 4.5*np.array([0.54, 0.74, 0.89, 1.04, 1.19, 1.49, 10.0])
zones = [1,2,3,4,5,6,7]
wdf = models.WorkoutDataFrame({'power': power})
wdf.athlete = models.Athlete(ftp=4.5)
rv = wdf.compute_power_zones()
assert type(rv) == pd.Series
assert (rv == pd.Series(zones)).all()


def test_compute_power_zones_custom_ftp(self):
power = 4.5*np.array([0.54, 0.74, 0.89, 1.04, 1.19, 1.49, 10.0])
zones = [1,2,3,4,5,6,7]
wdf = models.WorkoutDataFrame({'power': power})
rv = wdf.compute_power_zones(ftp=4.5)
assert type(rv) == pd.Series
assert (rv == pd.Series(zones)).all()


def test_compute_power_zones_custom_zones(self):
power = np.array([0.54, 0.74, 0.89, 1.04, 1.19, 1.49, 10.0])
zones = [1,2,3,4,5,6,7]
wdf = models.WorkoutDataFrame({'power': power})
rv = wdf.compute_power_zones(zones=[-0.001, 0.55, 0.75, 0.9, 1.05, 1.2, 1.5, 10.0, 1000])
assert type(rv) == pd.Series
assert (rv == pd.Series(zones)).all()


def test_compute_weighted_average_power(self, wdf_big):
assert wdf_big.compute_weighted_average_power() == 156.24624656343036
assert wdf_big.compute_weighted_average_power() == pytest.approx(156.24624656343036, 0.1)

def test_compute_weighted_average_power_missing_power(self, wdf_big):
del wdf_big['power']
Expand All @@ -181,6 +212,7 @@ def test_compute_weighted_average_power_missing_power(self, wdf_big):

def test_compute_power_per_kg(self, wdf_big):
# wdf_big.athlete.ftp = 300
# wdf_big.athlete.weight = 80
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was just a reminder for myself to remember what is the actual weight. Can be removed.

ppkg = wdf_big.compute_power_per_kg()

assert ppkg[1] == 1.1625000000000001
Expand Down