Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(agent): Add support for input probing #16333

Merged

Conversation

LandonTClipp
Copy link
Contributor

@LandonTClipp LandonTClipp commented Dec 19, 2024

Summary

This PR adds support in RunningInput and the Agent to call an input plugin's Probe method if it exists and if the plugin has been configured for probing.

Checklist

  • No AI generated code was used in this PR

Related issues

@telegraf-tiger telegraf-tiger bot added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Dec 19, 2024
@srebhan srebhan changed the title feat(startup_error_behavior = "probe"): Add support for input probing. feat(agent): Add support for input probing. Dec 19, 2024
Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LandonTClipp for the PR! Some minor comments in the code, the most important one is to call Stop if probing fails to not leak connections or the like.

agent/agent.go Outdated Show resolved Hide resolved
models/running_input_test.go Outdated Show resolved Hide resolved
plugin.go Outdated Show resolved Hide resolved
agent/agent.go Outdated Show resolved Hide resolved
@srebhan srebhan changed the title feat(agent): Add support for input probing. feat(agent): Add support for input probing Dec 19, 2024
@srebhan srebhan self-assigned this Dec 19, 2024
@srebhan srebhan added area/aws AWS plugins including cloudwatch, ecs, kinesis plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Dec 19, 2024
@LandonTClipp
Copy link
Contributor Author

Blocked on #16364.

@Hipska
Copy link
Contributor

Hipska commented Jan 6, 2025

I'm not sure this resolves #15915 as it doesn't add a Probe method for nvidia plugin?

@LandonTClipp
Copy link
Contributor Author

I'm not sure this resolves #15915 as it doesn't add a Probe method for nvidia plugin?

You're right, I changed the verbiage.

@LandonTClipp LandonTClipp force-pushed the LandonTClipp/probe_implementation branch from c7bab87 to 84a5bd9 Compare January 13, 2025 16:47
@LandonTClipp
Copy link
Contributor Author

@srebhan should be ready for another review.

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @LandonTClipp for your contribution!

agent/agent.go Outdated Show resolved Hide resolved
@srebhan srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Jan 15, 2025
@srebhan srebhan assigned DStrand1 and unassigned srebhan Jan 15, 2025
@Hipska
Copy link
Contributor

Hipska commented Jan 16, 2025

Could you give some comments on #16052 (review) please? As the spec is not really clear. Also a link to the spec in the PR message would be nice. (https://github.com/influxdata/telegraf/blob/master/docs/specs/tsd-009-probe-on-startup.md)

@DStrand1 DStrand1 merged commit 01aa1a3 into influxdata:master Jan 21, 2025
25 of 26 checks passed
@github-actions github-actions bot added this to the v1.34.0 milestone Jan 21, 2025
@Hipska
Copy link
Contributor

Hipska commented Jan 21, 2025

#16052 (comment)

So starting from now (and when plugins implement it) Telegraf will need a restart in order for plugins to recover?

@LandonTClipp
Copy link
Contributor Author

So starting from now (and when plugins implement it) Telegraf will need a restart in order for plugins to recover?

Starting from now, when plugins support it, and startup_error_behavior is set to probe, plugins will indeed be disabled for the lifetime of the process if they fail probing on startup.

@Hipska
Copy link
Contributor

Hipska commented Jan 22, 2025

Why would you ever want that and have a plugin only recover when telegraf is restarted? I think I'm missing the use case here..

@LandonTClipp
Copy link
Contributor Author

This particular use-case deals with environments running in a public cloud that I am helping to build. We distribute a custom build of Telegraf, along with associated config, that our users apt install from a public repo we maintain. It is hard to know what resources will be available in the customer environment ahead of time. One possibility is for the package to run a pre-install.sh script that introspects the resources and dynamically creates an appropriate Telegraf config file. Another is for the pre-install.sh to reach out to some metadata service to ask what should be installed and then dynamically create the config file. The other option, which this PR implements, says that Telegraf should just disable the plugin entirely if its associated resource is not available.

Note that this does not affect Telegraf's behavior if the resource existed at process startup but fails at some point after. In that case, Telegraf will continue to retry the plugin endlessly. The PR allows you to describe the case in which you want to assume that a resource will never exist if it doesn't exist at process startup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/aws AWS plugins including cloudwatch, ecs, kinesis feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants