
debug_one does not clearly show the explanation of how x is predicted #1549

Closed
leandrolma3 opened this issue May 22, 2024 · 6 comments
@leandrolma3

leandrolma3 commented May 22, 2024

Versions

I'm working in Google Colab with the latest versions of the Python libraries, including River.

Describe your task

I need to get an explanation of how x is predicted so I can convert it into decision rules (IF-THEN). I've been trying all the models in River that implement the Hoeffding Tree concept to classify an incoming stream and obtain the prediction explanation with debug_one, in order to convert it into rules. Unfortunately, for the first instances, debug_one returns only the predicted class, with no explanation of which attributes and conditions were considered. I've tried different Hoeffding Tree models and datasets in River, and the same occurs:

Here are some explanation samples that I got using the debug_one method:
Class True:
P(True) = 1.0

Another sample:
Class True:
P(False) = 0.3
P(True) = 0.7

What kind of performance are you expecting?

I expect to get all the attributes and conditions, with the values used to predict the class for a given instance of the data stream, such as:
Expected explanation:
empty_server_form_handler > 0.5454545454545454
popup_window ≤ 0.2727272727272727
Class False:
P(False) = 0.6
P(True) = 0.4
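For reference, here is a minimal, hypothetical sketch of how such debug_one text could be converted into an IF-THEN rule, assuming the plain-text format shown above (the parse_debug_output helper is my own illustration, not part of River):

```python
import re

def parse_debug_output(text: str) -> str:
    """Convert debug_one-style text into a single IF-THEN rule string.

    Assumes the format shown above: split-condition lines such as
    'popup_window ≤ 0.27', followed by a 'Class X:' line and 'P(...) = ...' lines.
    """
    conditions, predicted = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = re.match(r"Class (\w+):", line)
        if m:
            predicted = m.group(1)  # the predicted class
        elif line.startswith("P("):
            continue  # probability lines are not part of the rule antecedent
        else:
            conditions.append(line)  # a split condition, e.g. 'x ≤ 0.27'
    antecedent = " AND ".join(conditions) if conditions else "TRUE"
    return f"IF {antecedent} THEN class = {predicted}"

example = """empty_server_form_handler > 0.5454545454545454
popup_window ≤ 0.2727272727272727
Class False:
    P(False) = 0.6
    P(True) = 0.4"""
print(parse_debug_output(example))
```

Note that for a single-node tree (no conditions in the output, as in the samples above), the antecedent degenerates to the trivial rule `IF TRUE THEN class = ...`.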

Steps/code to reproduce

# Sample code to reproduce the performance issue

from river import datasets
from river import drift
from river import tree
from river import metrics
from typing import List

classifier = tree.HoeffdingAdaptiveTreeClassifier(drift_detector=drift.ADWIN(delta=0.001))

def process_chunk(chunk_data: List[dict], chunk_labels: List[bool], metric, chunk_metric):
    printed = False
    for xi, yi in zip(chunk_data, chunk_labels):
        y_pred = classifier.predict_one(xi)
        metric.update(yi, y_pred)
        chunk_metric.update(yi, y_pred)
        classifier.learn_one(xi, yi)
        rules = classifier.debug_one(xi)
        # Print the explanation for the first instance of each chunk only
        if not printed:
            print(f"Rules for instance {rules}")
            printed = True
    print(f"Accuracy for this chunk: {chunk_metric.get():.2%}")
    # Return a fresh metric so the per-chunk accuracy is reset
    return metrics.Accuracy()


stream = datasets.Phishing()
chunk_size = 100
metric = metrics.Accuracy()
chunk_metric = metrics.Accuracy()

chunk_data, chunk_labels = [], []

for x, y in stream:
    chunk_data.append(x)
    chunk_labels.append(y)
    
    if len(chunk_data) == chunk_size:
        chunk_metric = process_chunk(chunk_data, chunk_labels, metric, chunk_metric)
        chunk_data, chunk_labels = [], []

print(f"Final accuracy of the model: {metric.get():.2%}")

Necessary data

I appreciate all suggestions about this issue.

@smastelini
Member

Hi @leandrolma3, thanks for reporting. How many instances has the tree learned before you ask for explanations?

If the tree consists of a single (root) node, then the expected output of debug_one is what you report.

@leandrolma3
Author

Hi @smastelini, thank you very much for your reply. I checked, and the method only returned a rule with a condition after the middle of the 5th chunk was processed, i.e., after about 440 instances of the stream.

Sorry if I missed something about Hoeffding trees, but I was expecting a decision rule even for a single root node, represented by an attribute chosen from the dataset.

Reading more about it, I realize that a single (root) node is represented by a classification probability based on the data seen so far, correct? Is there some explanation mechanism for a single node of a Hoeffding tree?

@smastelini
Member

smastelini commented May 23, 2024

Hi @leandrolma3. No, a single-node tree does not apply any decision split. The outputs are produced by the underlying leaf decision model. By default, classification Hoeffding Trees use either Naive Bayes or majority vote, depending on which of these two options yields the best results.

Take a look at the grace_period parameter of the trees, which controls the interval between split attempts. If you only notice a change in the structure of the trees after around 400 instances, decreasing the grace period might accelerate tree growth. You can also try increasing the delta parameter to the same end.

Keep in mind that from a data streaming standpoint, hundreds and even a few thousand samples might be just the start of the game :D
These models are designed to process potentially infinite streams of data.
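For intuition, the split timing that grace_period and delta govern follows from the Hoeffding bound used by this family of trees. Below is a minimal sketch of the classic VFDT formulation (not River's internal implementation):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    samples is within epsilon of the true mean, for values in a range of
    width value_range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# For a binary-class split heuristic such as information gain, range = log2(2) = 1.
for n in (100, 200, 400):
    for delta in (1e-7, 0.05):
        eps = hoeffding_bound(1.0, delta, n)
        print(f"n={n:4d}  delta={delta:<7}  epsilon={eps:.4f}")
```

A split attempt succeeds when the gap between the two best split candidates exceeds epsilon; since epsilon shrinks as n grows or delta increases, a smaller grace period (more frequent attempts) or a larger delta both lead to earlier splits.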

@leandrolma3
Author

Thank you for your time and the great explanation, @smastelini. I'm trying to implement some methods to explain how Hoeffding Trees classify the data before a decision split, and your explanation helped me validate my methodology. Thank you very much.

@smastelini
Member

Nice to hear that, @leandrolma3. Please do not hesitate to ask more questions if needed.

If your question was answered, can we close this issue?

@leandrolma3
Author

Yes @smastelini, you did answer it. I'll close; thank you.
