Don't fail if the application_name is not set properly #63

Open
frost242 opened this issue Feb 15, 2017 · 3 comments

Comments

@frost242
Contributor

frost242 commented Feb 15, 2017

If you set application_name to a wrong value, pgsqlms returns "OCF_ERR_CONFIGURED", which triggers fencing of the nodes where the service is started. It should return "OCF_ERR_ARGS" instead.

pgsqlms should return a lower-level error.

@ioguix
Member

ioguix commented Feb 15, 2017

Hi @frost242,

In fact, it returns OCF_ERR_ARGS, but I suspect Pacemaker tries to stop your resource in reaction, and the stop action then fails for some other reason with OCF_ERR_CONFIGURED.

I'll try to reproduce and check what we can do.
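
For readers following along, the two codes being compared are defined by the OCF resource agent API. The sketch below maps the numeric values to their names, together with the recovery class that, as far as I understand the Pacemaker documentation, is assigned to each error (soft, hard, fatal). The table name and the `describe` helper are illustrative only and are not part of pgsqlms.

```python
# Subset of OCF resource agent return codes relevant to this issue.
# The recovery class is only listed for the error codes discussed here.
OCF_RETURN_CODES = {
    0: ("OCF_SUCCESS", None),
    1: ("OCF_ERR_GENERIC", "soft"),
    2: ("OCF_ERR_ARGS", "hard"),
    6: ("OCF_ERR_CONFIGURED", "fatal"),
    7: ("OCF_NOT_RUNNING", None),
}


def describe(rc: int) -> str:
    """Return a readable label for an OCF return code (hypothetical helper)."""
    if rc not in OCF_RETURN_CODES:
        return f"unknown code {rc}"
    name, recovery = OCF_RETURN_CODES[rc]
    return name if recovery is None else f"{name} ({recovery})"


if __name__ == "__main__":
    for rc in (2, 6):
        print(rc, describe(rc))
```

Under this mapping, a return code of 2 corresponds to OCF_ERR_ARGS and 6 to OCF_ERR_CONFIGURED.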

@blogh
Collaborator

blogh commented Feb 18, 2017

Hi @ioguix @frost242,

That's it:

Monitor fails

Feb 18 01:10:48 srv2 pgsqlms(pgsqld)[11678]: ERROR: Recovery template file must contain in primary_conninfo parameter "application_name=srv2"
Feb 18 01:10:48 srv2 lrmd[6192]:   notice: pgsqld_monitor_16000:11678:stderr [ ocf-exit-reason:Recovery template file must contain in primary_conninfo parameter "application_name=srv2" ]
Feb 18 01:10:48 srv2 crmd[6195]:   notice: srv2-pgsqld_monitor_16000:25 [ ocf-exit-reason:Recovery template file must contain in primary_conninfo parameter "application_name=srv2"\n ]

And again (the default value for on-fail is restart except for the stop action):

Feb 18 01:10:48 srv2 pgsqlms(pgsqld)[11689]: ERROR: Recovery template file must contain in primary_conninfo parameter "application_name=srv2"
Feb 18 01:10:48 srv2 lrmd[6192]:   notice: pgsqld_notify_0:11689:stderr [ ocf-exit-reason:Recovery template file must contain in primary_conninfo parameter "application_name=srv2" ]
Feb 18 01:10:48 srv2 crmd[6195]:   notice: Result of notify operation for pgsqld on srv2: 0 (ok)
Feb 18 01:10:48 srv2 crmd[6195]:   notice: srv2-pgsqld_monitor_16000:25 [ ocf-exit-reason:Recovery template file must contain in primary_conninfo parameter "application_name=srv2"\n ]

Then Stop fails

Feb 18 01:10:48 srv2 pgsqlms(pgsqld)[11700]: ERROR: Recovery template file must contain in primary_conninfo parameter "application_name=srv2"
Feb 18 01:10:48 srv2 lrmd[6192]:   notice: pgsqld_stop_0:11700:stderr [ ocf-exit-reason:Recovery template file must contain in primary_conninfo parameter "application_name=srv2" ]
Feb 18 01:10:48 srv2 crmd[6195]:   notice: Result of stop operation for pgsqld on srv2: 2 (invalid parameter)
Feb 18 01:10:48 srv2 crmd[6195]:   notice: srv2-pgsqld_stop_0:28 [ ocf-exit-reason:Recovery template file must contain in primary_conninfo parameter "application_name=srv2"\n ]

As @ioguix said, Failed stop = Fence:

Feb 18 01:10:48 srv2 stonith-ng[6191]:   notice: Operation reboot of srv2 by srv1 for [email protected]: OK
Feb 18 01:10:48 srv2 crmd[6195]:     crit: We were allegedly just fenced by srv1 for srv1!
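
The escalation in the logs above (monitor fails, Pacemaker tries to recover, stop fails, node is fenced) follows from the default `on-fail` policy. Here is a minimal sketch of that rule. The function names are hypothetical, and the `block` fallback when STONITH is disabled comes from the Pacemaker documentation rather than from this thread.

```python
def default_on_fail(action: str, stonith_enabled: bool = True) -> str:
    """Default on-fail policy for a failed operation.

    Every action defaults to "restart" except "stop", which defaults to
    "fence" when STONITH is enabled and to "block" otherwise.
    """
    if action == "stop":
        return "fence" if stonith_enabled else "block"
    return "restart"


def trace(failed_actions, stonith_enabled: bool = True):
    """Pair each failed action with the recovery Pacemaker would pick."""
    return [(a, default_on_fail(a, stonith_enabled)) for a in failed_actions]


if __name__ == "__main__":
    # Same order as in the logs: the monitor fails first, then the stop.
    for action, recovery in trace(["monitor", "stop"]):
        print(f"{action} failed -> {recovery}")
```

A restart implies a stop, so once the stop action itself fails on the same validation error, the only remaining default is to fence the node.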

@ioguix
Member

ioguix commented Feb 20, 2017

Thank you for this study @blogh!

Right now, I'm not sure what the best option for this issue is. The obvious one would be to relax the pg_stop function so that it returns OCF_SUCCESS if we "feel" the local instance is really stopped. But I'm still uncomfortable with this at the moment :/
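
To make the option concrete, here is a rough sketch of what a relaxed stop could look like. It is not the pgsqlms implementation (which is written in Perl). `relaxed_stop` and its three callback parameters are hypothetical names used only for illustration.

```python
OCF_SUCCESS = 0
OCF_ERR_ARGS = 2


def relaxed_stop(config_is_valid, instance_is_running, do_stop) -> int:
    """Hypothetical stop handler that tolerates a bad configuration.

    All three arguments are callables supplied by the caller (assumptions):
      config_is_valid()     -> bool, e.g. the application_name check
      instance_is_running() -> bool, is a local instance still up?
      do_stop()             -> performs the actual shutdown
    """
    if not config_is_valid():
        # Configuration is broken: only report success if we are sure
        # that nothing is left to stop on this node.
        if not instance_is_running():
            return OCF_SUCCESS
        return OCF_ERR_ARGS
    do_stop()
    return OCF_SUCCESS
```

The whole approach depends on how trustworthy the "is it really stopped" check is, since a wrong OCF_SUCCESS on stop would let Pacemaker start the resource elsewhere while an instance is still running.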
