Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Decimal entry type #1322

Open
norberttech opened this issue Jan 3, 2025 · 1 comment
Open

Add Decimal entry type #1322

norberttech opened this issue Jan 3, 2025 · 1 comment
Assignees
Milestone

Comments

@norberttech
Copy link
Member

The float entry type is good enough for basic operations.
However, it might still lead to floating error since floats are not designed for precision, float should also only allow for precision up to 15 digits.

To address that issue we should introduce a Decimal type that would have:

  • precision up to 65 digits (before and after floating point)
  • scale up to 30 digits (after floating point)
  • stored as a string
  • use bcmath to handle calculations

Type guesser should detect strings that are floating point values as Decimals (since it can calculate precision automatically), so "123.123" would be a decimal with a precision of 6 and scale of 3 when 10.1243 would be a float with default precision 4.

In Floats precision should represent the number of digits after floating point, in Decimals it should be a total number of digits that Decimal represents

@norberttech norberttech added this to the 0.11.0 milestone Jan 3, 2025
@norberttech norberttech moved this to Todo in Roadmap Jan 3, 2025
@norberttech norberttech moved this from Todo to In Progress in Roadmap Jan 5, 2025
@norberttech norberttech self-assigned this Jan 5, 2025
@norberttech
Copy link
Member Author

Here is the result of a quick investigation:

  • adding DecimalEntry and DecimalType are not super hard
  • bcround was added only in php8.4 so we need to walk around it a bit
  • using precision makes very little sense and adds a lot of complexity related to proper rounding (unless we prefer to throw OverflowExceptions)
  • type_decimal should never be automatically detected, the rule for using decimals should be that value needs to be either cast to decimal or it needs to be created based on the schema element.
Decimal Type - initial draft
<?php

declare(strict_types=1);

namespace Flow\ETL\PHP\Type\Native;

use Flow\ETL\Exception\InvalidArgumentException;
use Flow\ETL\PHP\Type\Type;

/**
 * @implements Type<string>
 */
final readonly class DecimalType implements Type
{
    /**
     * @param bool $nullable
     * @param int $scale - number of digits to the right of the decimal point
     */
    public function __construct(private readonly bool $nullable = false, public readonly int $scale = 2)
    {
    }

    public static function fromArray(array $data) : Type
    {
        $nullable = $data['nullable'] ?? false;
        $scale = $data['scale'] ?? 6;

        return new self($nullable, $scale);
    }

    public function isComparableWith(Type $type) : bool
    {
        if ($type instanceof self) {
            return true;
        }

        if ($type instanceof NullType) {
            return true;
        }

        if ($type instanceof IntegerType || $type instanceof FloatType) {
            return true;
        }

        return false;
    }

    public function isEqual(Type $type) : bool
    {
        return $type instanceof self && $this->nullable === $type->nullable && $this->scale === $type->scale;
    }

    public function isValid(mixed $value) : bool
    {
        if ($this->nullable && $value === null) {
            return true;
        }

        return \is_numeric($value);
    }

    public function makeNullable(bool $nullable) : Type
    {
        return new self($nullable, $this->scale);
    }

    public function merge(Type $type) : Type
    {
        if ($type instanceof NullType) {
            return $this->makeNullable(true);
        }

        if (!$type instanceof self) {
            throw new InvalidArgumentException('Cannot merge different types, ' . $this->toString() . ' and ' . $type->toString());
        }

        $scale = min($type->scale, $this->scale);

        return new self($this->nullable || $type->nullable(), $scale);
    }

    public function normalize() : array
    {
        return [
            'type' => 'decimal',
            'nullable' => $this->nullable,
            'scale' => $this->scale,
        ];
    }

    public function nullable() : bool
    {
        return $this->nullable;
    }

    public function toString() : string
    {
        return ($this->nullable ? '?' : '') . 'decimal<' . $this->scale . '>';
    }
}
DesimalEntry - initial draft
<?php

declare(strict_types=1);

namespace Flow\ETL\Row\Entry;

use Flow\ETL\Exception\InvalidArgumentException;
use Flow\ETL\PHP\Type\Native\DecimalType;
use Flow\ETL\PHP\Type\Type;
use Flow\ETL\Row\Schema\Definition;
use Flow\ETL\Row\{Entry, Reference};

/**
 * @implements Entry<?string, string>
 */
final class DecimalEntry implements Entry
{
    use EntryRef;

    private readonly DecimalType $type;

    private readonly ?string $value;

    public function __construct(private readonly string $name, float|int|string|null $value, public readonly int $scale = 2)
    {
        if ('' === $name) {
            throw InvalidArgumentException::because('Entry name cannot be empty');
        }

        if ($value === null) {
            $this->value = null;
        } elseif (is_numeric($value)) {

            if (!\function_exists('bcround') || !\class_exists('RoundingMode')) {
                if (bccomp((string) $value, '-1') < 0) {
                    $roundedValue = bcsub((string) $value, '0.' . \str_repeat('0', $this->scale) . 5, $this->scale);
                } else {
                    $roundedValue = bcadd((string) $value, '0.' . \str_repeat('0', $this->scale) . 5, $this->scale);
                }
            } else {
                $roundedValue = bcround((string) $value, $this->scale, \RoundingMode::HalfAwayFromZero);
            }

            $parts = \explode('.', $roundedValue);
            $integerPart = $parts[0];
            $decimalPart = $parts[1] ?? '';

            $this->value = $integerPart . ($decimalPart !== '' ? '.' . $decimalPart : '');
        } else {
            throw InvalidArgumentException::because('Invalid value type: must be null, float, int, or numeric string.');
        }

        $this->type = new DecimalType($this->value === null, $this->scale);
    }

    public function __toString() : string
    {
        return $this->toString();
    }

    public function definition() : Definition
    {
        return Definition::decimal($this->name, $this->type->nullable(), $this->scale);
    }

    public function is(string|Reference $name) : bool
    {
        if ($name instanceof Reference) {
            return $this->name === $name->name();
        }

        return $this->name === $name;
    }

    public function isEqual(Entry $entry) : bool
    {
        $entryValue = $entry->value();

        if ($entryValue === null && $this->value === null) {
            return true;
        }

        if ($entryValue === null || $this->value === null) {
            return false;
        }

        return $this->value === $entryValue;
    }

    public function map(callable $mapper) : Entry
    {
        return new self($this->name, $mapper($this->value()));
    }

    public function name() : string
    {
        return $this->name;
    }

    public function rename(string $name) : Entry
    {
        return new self($name, $this->value);
    }

    public function toString() : string
    {
        return (string) $this->value;
    }

    public function type() : Type
    {
        return $this->type;
    }

    public function value() : ?string
    {
        return $this->value;
    }

    public function withValue(mixed $value) : Entry
    {
        return new self($this->name, $value);
    }
}
DesimalEntryTest - initial draft
<?php

declare(strict_types=1);

namespace Flow\ETL\Tests\Unit\Row\Entry;

use function Flow\ETL\DSL\decimal_entry;
use Flow\ETL\Tests\FlowTestCase;
use PHPUnit\Framework\Attributes\TestWith;

final class DecimalEntryTest extends FlowTestCase
{
    #[TestWith([1.0, 2, '1.00'])]
    #[TestWith([1.0, 1, '1.0'])]
    #[TestWith([1.0, 2, '1.00'])]
    #[TestWith([12345.12345, 2, '12345.12'])]
    #[TestWith([12345.12345, 1, '12345.1'])]
    #[TestWith([12345.19999, 2, '12345.20'])]
    #[TestWith([-12345.19999, 2, '-12345.20'])]
    public function test_creating_decimal_entries(int|string|float|null $input, int $scale, string $output) : void
    {
        self::assertSame($output, decimal_entry('name', $input, $scale)->value());
    }

    public function test_accessing_scale() : void
    {
        self::assertSame(4, decimal_entry('name', 1.0, 4)->scale);
        self::assertSame("1.0000", decimal_entry('name', 1.0, 4)->value());
    }
}

The main issue that needs to be resolved is that many mechanisms currently use native PHP comparisons.
Because of that, an operation like ref('int')->plus(ref('decimal')) will not return valid results.

I believe our best solution for this problem is to implement MathContext (maybe someone can propose a better name) which would take care of all mathematical operations between all possible entry types.

Only then DecimalType and DecimalEntry will become usable but it's out of the scope of this issue so for now I'm putting it on hold/

@norberttech norberttech moved this from In Progress to On Hold in Roadmap Jan 6, 2025
@norberttech norberttech removed this from the 0.11.0 milestone Jan 6, 2025
@norberttech norberttech added this to the 0.12.0 milestone Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: On Hold
Development

No branches or pull requests

1 participant