Implement a transformation with if-else semantics #24

Open
anatoly-scherbakov opened this issue Jun 27, 2020 · 7 comments

Comments

@anatoly-scherbakov
Collaborator

Example use cases:

  • If the value is empty, replace it with a value from a variable (or a hardcoded value, or a value from another input column or another output column)
  • If the value is equal to the string "aaa", replace it with something else
  • If the value is equal to the variable named bbb, replace it with something else
  • If the value matches the "\d{2}" regex, prepend "000" to it

...and so on. We need to come up with:

  • syntax for that in the configuration file
  • and an efficient implementation
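
To make the target behaviour concrete, here is a minimal Python sketch of the four use cases above, independent of any configuration syntax. Everything in it is an assumption for illustration: the `if_else` helper, the `context` mapping (standing in for variables and other columns), and the choice of full-string regex matching are not part of the project.

```python
import re
from typing import Callable, Mapping

# Hypothetical: variables and other columns are exposed through one mapping.
Context = Mapping[str, str]
Step = Callable[[str, Context], str]
Check = Callable[[str, Context], bool]


def keep(value: str, context: Context) -> str:
    """Default 'else' branch: leave the value unchanged."""
    return value


def if_else(predicate: Check, then: Step, otherwise: Step = keep) -> Step:
    """Build a step that applies `then` when `predicate` holds, else `otherwise`."""
    def step(value: str, context: Context) -> str:
        branch = then if predicate(value, context) else otherwise
        return branch(value, context)
    return step


# 1. Empty value -> take the value of a variable (here named "fallback").
fill_empty = if_else(lambda v, ctx: v == "", lambda v, ctx: ctx["fallback"])

# 2. Value equal to the string "aaa" -> replace with something else.
replace_aaa = if_else(lambda v, ctx: v == "aaa", lambda v, ctx: "something else")

# 3. Value equal to the variable named bbb -> replace with something else.
replace_bbb = if_else(lambda v, ctx: v == ctx["bbb"], lambda v, ctx: "something else")

# 4. Value matching \d{2} (assumed to mean the whole string) -> prepend "000".
pad_two_digits = if_else(
    lambda v, ctx: re.fullmatch(r"\d{2}", v) is not None,
    lambda v, ctx: "000" + v,
)
```

Each use case reduces to a predicate plus one or two branches, which is the shape any of the candidate syntaxes would have to express.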
@anatoly-scherbakov anatoly-scherbakov added the enhancement New feature or request label Jun 27, 2020
@anatoly-scherbakov anatoly-scherbakov self-assigned this Jun 27, 2020
@anatoly-scherbakov
Collaborator Author

anatoly-scherbakov commented Jun 27, 2020

Syntax 1

version: 1
columns:
  phone:
    - input: phone
    - match?: '\d{10}'
    - prepend: "1"
    - else:
      - value: ""

If the phone number matches the specified pattern, we prepend 1. Otherwise, we replace it with an empty value.

The difficulty here is as follows. match?: "\d{10}" should return a boolean value, but the prepend operation obviously requires the string value obtained from input. It is not clear how to model such a thing monadically.

(Via typing, probably.)
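
One way the typing idea could work, sketched in Python under stated assumptions: instead of returning a bare boolean, the match step returns the original string wrapped in a type that records whether it matched, so later steps still have the string to work with. The `Matched` and `Unmatched` classes and the step functions are hypothetical names, not existing project code.

```python
import re
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Matched:
    value: str  # the string that satisfied the check


@dataclass(frozen=True)
class Unmatched:
    value: str  # the string that failed the check


Outcome = Union[Matched, Unmatched]


def match(pattern: str) -> Callable[[str], Outcome]:
    """The check keeps the string and tags it, rather than returning a bool."""
    regex = re.compile(pattern)

    def step(value: str) -> Outcome:
        return Matched(value) if regex.fullmatch(value) else Unmatched(value)
    return step


def prepend(prefix: str) -> Callable[[Outcome], Outcome]:
    """Runs only on the matched branch; unmatched values pass through."""
    def step(outcome: Outcome) -> Outcome:
        if isinstance(outcome, Matched):
            return Matched(prefix + outcome.value)
        return outcome
    return step


def else_value(replacement: str) -> Callable[[Outcome], str]:
    """Closes the conditional: unwraps a match, replaces a non-match."""
    def step(outcome: Outcome) -> str:
        return replacement if isinstance(outcome, Unmatched) else outcome.value
    return step


def phone(value: str) -> str:
    # Mirrors the step order of Syntax 1: match?, prepend, else.
    return else_value("")(prepend("1")(match(r"\d{10}")(value)))
```

With this sketch, `phone("5551234567")` gives `"15551234567"` and `phone("123")` gives `""`.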

@anatoly-scherbakov
Collaborator Author

anatoly-scherbakov commented Jun 27, 2020

Syntax 2

version: 1
columns:
  phone:
    - input: phone
    - if:
        condition:
          - match: '\d{10}'
        then:
          - prepend: "1"
        else:
          - value: ""

Kind of verbose, but more understandable.
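
A rough Python sketch of how such a structure could be interpreted, assuming the configuration has already been parsed into plain dicts and lists with condition, then, and else nested under if. The `TRANSFORMS` and `PREDICATES` registries and the `run` function are hypothetical, and the input step is left out (the value is assumed to be read already).

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical registries; the real project may organise steps differently.
TRANSFORMS: Dict[str, Callable[[str, Any], str]] = {
    "prepend": lambda value, arg: arg + value,
    "value": lambda value, arg: arg,
}
PREDICATES: Dict[str, Callable[[str, Any], bool]] = {
    "match": lambda value, arg: re.fullmatch(arg, value) is not None,
}


def check(value: str, condition: Dict[str, Any]) -> bool:
    """Evaluate a single-key condition mapping such as {"match": "..."}."""
    (name, arg), = condition.items()
    return PREDICATES[name](value, arg)


def run(steps: List[Dict[str, Any]], value: str) -> str:
    """Apply single-key step mappings in order, recursing into if-blocks."""
    for step in steps:
        (name, arg), = step.items()
        if name == "if":
            # Assumption: all listed conditions must hold.
            holds = all(check(value, cond) for cond in arg["condition"])
            value = run(arg["then"] if holds else arg.get("else", []), value)
        else:
            value = TRANSFORMS[name](value, arg)
    return value


# Dict form of the phone column above, minus the input step.
PHONE_STEPS = [
    {"if": {
        "condition": [{"match": r"\d{10}"}],
        "then": [{"prepend": "1"}],
        "else": [{"value": ""}],
    }},
]
```

Because then and else are ordinary step lists, conditionals nest without any extra machinery.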

@anatoly-scherbakov
Collaborator Author

Syntax 3

I am trying to express things monadically here.

version: 1
columns:
  phone:
    - input: phone
    - match: '\d{10}'
    - map:
      - prepend: "1"
    - fix:
      - value: ""

I am afraid this is immensely verbose and will mean that every step in the algorithm must be tagged with a map, fix, or something else.

I believe that, generally, every step in the transformation chain can be assumed to be a .map(), can it not? How, then, do we mark the steps that are executed only when the value is an error?
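
One possible answer, sketched in Python with hypothetical names (`Ok`, `Err`, `run`): treat every step as an implicit map that is skipped once the value is in the error state, and tag only the recovery steps as fix. Here a failed match is signalled by raising `ValueError`, which is an assumption of this sketch rather than anything the project defines.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass(frozen=True)
class Ok:
    value: str


@dataclass(frozen=True)
class Err:
    value: str  # the value as it was when a step failed


def match(pattern: str) -> Callable[[str], str]:
    """A check expressed as a step: pass the value through or fail."""
    def step(value: str) -> str:
        if re.fullmatch(pattern, value) is None:
            raise ValueError(value)
        return value
    return step


Tagged = Tuple[str, Callable[[str], str]]


def run(steps: List[Tagged], value: str) -> str:
    state: Union[Ok, Err] = Ok(value)
    for mode, fn in steps:
        if mode == "map" and isinstance(state, Ok):
            try:
                state = Ok(fn(state.value))
            except ValueError:
                state = Err(state.value)
        elif mode == "fix" and isinstance(state, Err):
            state = Ok(fn(state.value))
        # Otherwise the step does not apply to the current state and is skipped.
    # If no fix step recovered the error, the last good value is returned.
    return state.value


PHONE = [
    ("map", match(r"\d{10}")),
    ("map", lambda v: "1" + v),
    ("fix", lambda v: ""),
]
```

Under this reading, only the fix steps would need an explicit tag in the configuration; untagged steps could default to "map".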

@anatoly-scherbakov
Collaborator Author

Syntax 4 (impossible)

version: 1
columns:
  phone:
    - input: phone
? - match: "\\d{10}"
:
 - then:
   - prepend: "1"
 - else:
   - value: ""

I tried to use YAML's composite (complex) key syntax, but at https://onlineyamltools.com/convert-yaml-to-json it only parses if the ? is at zero indentation level, so it cannot be nested under a column. This will not work.

@anatoly-scherbakov
Collaborator Author

anatoly-scherbakov commented Jun 27, 2020

Syntax 5 (actually 2 but improved)

version: 1
columns:
  phone:
    - input: phone
    - match?:
        pattern: '\d{10}'
        then:
          - prepend: "1"
        else:
          - value: ""

This is less verbose, but more specialized than the if - then - else scenario. We are going to have multiple conditionals like this:

  • empty?
  • equals?
  • greater?
  • less?
  • positive?
  • negative?
  • zero?

etc.

But this will also make the language more readable.

P.S. Will it? Say you need to compare the length of the string. Will you create separate functions for this, like

  • length-equals
  • length-greater
  • ...

I can see such a feature being very useful in multiple contexts of cleaning data.
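
One way to avoid a separate length-equals, length-greater, and so on for every combination is to keep the comparison and the thing being compared as two independent choices. The Python sketch below is purely illustrative: the `COMPARATORS` and `PROJECTIONS` tables and the `of` parameter are hypothetical and do not correspond to any syntax proposed above.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical: comparison names follow the "?" convention from the list above.
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals?": operator.eq,
    "greater?": operator.gt,
    "less?": operator.lt,
}

# Hypothetical: what aspect of the value the comparison looks at.
PROJECTIONS: Dict[str, Callable[[str], Any]] = {
    "value": lambda v: v,
    "length": len,
}


def predicate(name: str, operand: Any, of: str = "value") -> Callable[[str], bool]:
    """Combine a comparator with a projection instead of naming each pair."""
    compare = COMPARATORS[name]
    project = PROJECTIONS[of]
    return lambda value: compare(project(value), operand)


longer_than_three = predicate("greater?", 3, of="length")
is_aaa = predicate("equals?", "aaa")
```

With N comparators and M projections this needs N + M entries rather than N × M named functions.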

@anatoly-scherbakov
Collaborator Author

Syntax 6

version: 1
columns:
  phone:
    - input: phone
    - if: {match: /\d{10}/}
      then: {prepend: "1"}
      else: {value: ""}

Shorthand syntax for a familiar if .. then .. else construct.

@anatoly-scherbakov
Collaborator Author

After some thought, I'd like to postpone this implementation due to the following reasons:

  • Use cases are not clear enough
  • I do not have real world examples for this functionality

@anatoly-scherbakov anatoly-scherbakov added invalid This doesn't seem right and removed enhancement New feature or request labels Jun 28, 2020