st2auth failed create token mongoengine.errors.NotUniqueError: Tried to save duplicate unique keys #6047

Open
philipphomberger opened this issue Oct 24, 2023 · 14 comments

@philipphomberger

SUMMARY

Provide a quick summary of your bug report.

We receive a lot of events from Nagios, which we route to st2 using the stackstorm-nagios pack. This works very well most of the time, but at scale we run into problems when many events arrive in parallel.

In that case, sending some of the events fails.
We can see this in the st2auth log:

2023-10-24 06:56:07,215 139812116836016 ERROR base [-] Conflict while trying to save in DB.
Traceback (most recent call last):
File "/opt/stackstorm/st2/lib/python3.8/site-packages/mongoengine/document.py", line 398, in save
object_id = self._save_create(doc, force_insert, write_concern)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/mongoengine/document.py", line 463, in _save_create
object_id = wc_collection.insert_one(doc).inserted_id
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/collection.py", line 698, in insert_one
self._insert(document,
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/collection.py", line 613, in _insert
return self._insert_one(
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/collection.py", line 602, in _insert_one
self.__database.client._retryable_write(
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1498, in _retryable_write
return self._retry_with_session(retryable, func, s, None)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1384, in _retry_with_session
return self._retry_internal(retryable, func, session, bulk)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1416, in _retry_internal
return func(session, sock_info, retryable)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/collection.py", line 600, in _insert_command
_check_write_command_response(result)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/helpers.py", line 226, in _check_write_command_response
_raise_last_write_error(write_errors)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/pymongo/helpers.py", line 207, in _raise_last_write_error
raise DuplicateKeyError(error.get("errmsg"), 11000, error)
pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: st2.user_role_assignment_d_b index: role_1_user_1_source_1 dup key: { : "admin", : "user-nagios", : "mappings/sensorsldap.yaml" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: st2.user_role_assignment_d_b index: role_1_user_1_source_1 dup key: { : "admin", : "user-nagios", : "mappings/sensorsldap.yaml" }'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2common/persistence/base.py", line 185, in add_or_update
model_object = cls._get_impl().add_or_update(model_object, validate=True)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/st2common/models/db/__init__.py", line 606, in add_or_update
instance.save(validate=validate)
File "/opt/stackstorm/st2/lib/python3.8/site-packages/mongoengine/document.py", line 421, in save
raise NotUniqueError(message % err)
mongoengine.errors.NotUniqueError: Tried to save duplicate unique keys (E11000 duplicate key error collection: st2.user_role_assignment_d_b index: role_1_user_1_source_1 dup key: { : "admin", : "user-nagios", : "mappings/sensorsldap.yaml" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: st2.user_role_assignment_d_b index: role_1_user_1_source_1 dup key: { : "admin", : "user-nagios", : "mappings/sensorsldap.yaml" }'})

STACKSTORM VERSION

Paste the output of st2 --version:
st2 3.8.0, on Python 3.8.16

OS, environment, install method

Post what OS you are running this on, along with any other relevant information:

  • e.g. Docker, Vagrant, Kubernetes, etc. Describe how you installed ST2
  • e.g. one-line install, custom install, etc.

RHEL 8

Steps to reproduce the problem

Show how to reproduce the problem, using a minimal test-case. Make sure to include any content
(pack content - workflows, actions, etc.) which are needed to reproduce the problem.

Clone the stackstorm-nagios pack.
Go to the etc directory in that pack.
Edit the config YAML with the user password.
Create a bash script that runs the Nagios Python script in parallel,
like this:

python3 ./st2service_handler.py st2service_handler.yaml 44534 3 WARNING HARD "/var/log" 4 host-name --verbose &
python3 ./st2service_handler.py st2service_handler.yaml 44534 3 WARNING HARD "/var/log" 4 host-name --verbose &
python3 ./st2service_handler.py st2service_handler.yaml 44534 3 WARNING HARD "/var/log" 4 host-name --verbose &
python3 ./st2service_handler.py st2service_handler.yaml 44534 3 WARNING HARD "/var/log" 4 host-name --verbose &
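
For a larger number of parallel invocations, the same reproduction could be driven from Python. This is only a minimal sketch: the helper names `build_handler_command` and `run_parallel` are hypothetical, the arguments simply mirror the command line above, and it uses the current interpreter rather than a hard-coded `python3`.

```python
import subprocess
import sys


def build_handler_command(python=sys.executable,
                          handler="./st2service_handler.py",
                          config="st2service_handler.yaml"):
    """Build the same argument list as the shell command in the report."""
    return [
        python, handler, config,
        "44534", "3", "WARNING", "HARD", "/var/log", "4", "host-name",
        "--verbose",
    ]


def run_parallel(command, count):
    """Start `count` copies of `command` at once and return their exit codes."""
    # Start every process before waiting on any of them, so they overlap.
    processes = [subprocess.Popen(command) for _ in range(count)]
    return [process.wait() for process in processes]
```

Calling `run_parallel(build_handler_command(), 4)` from the pack's etc directory would correspond to the four backgrounded commands above; raising the count makes the collision more likely.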

Expected Results

What did you expect to happen when running the steps above?

I expect that all events arrive in ST2.

Actual Results

What happened? What output did you get?

We always get some 403 errors.
But at the same time, some other events from Nagios do come through.

Making sure to follow these steps will guarantee the quickest resolution possible.

Thanks!

Thank you for your advice.

@guzzijones
Contributor

Without further knowledge of what Nagios is doing, it is going to be really difficult to offer help. A short, simplified example of the issue would be a good start. Anything you can do to provide further details of your setup and workflows would help, too.

@guzzijones
Contributor

It looks like st2auth is trying to assign roles?

@guzzijones
Contributor

guzzijones commented Nov 1, 2023

I am guessing that every time someone signs in, st2auth tries to add/update the roles in the collection named in this error:

pymongo.errors.DuplicateKeyError: E11000 duplicate key error collection: st2.user_role_assignment_d_b index: role_1_user_1_source_1 dup key: { : "admin", : "user-nagios", : "mappings/sensorsldap.yaml" }, full error: {'index': 0, 'code': 11000, 'errmsg': 'E11000 duplicate key error collection: st2.user_role_assignment_d_b index: role_1_user_1_source_1 dup key: { : "admin", : "user-nagios", : "mappings/sensorsldap.yaml" }'}

@guzzijones
Contributor

This is probably a deadlock situation where we need to sort the information we are adding/updating so every time st2auth tries to update the index it does so in the same order and we avoid these situations.

@guzzijones
Contributor

It looks like Mongo is attempting to update the index and failing.

@guzzijones
Contributor

Yeah, st2auth syncs roles on each login. I am guessing it deletes them all before it does that, hence the duplicate insertions? I will keep digging.

@guzzijones
Contributor

Try setting this config option:

rbac:
       sync_remote_groups: false

It should bypass the sync on each login and avoid this error altogether.

@philipphomberger
Author

Hi, thank you for your answer. One question: if I disable that setting, will RBAC (mapping LDAP roles to users) still work?

@guzzijones
Contributor

It looks like disabling this will turn off role syncing on login, but the sync could still be run manually?

st2-apply-rbac-definitions --config-file=/etc/st2/st2.conf

@guzzijones
Contributor

guzzijones commented Dec 19, 2023

Yes, it deletes all the roles and then re-adds them.
A fix would be to work out what should be deleted, updated, and added, and then do those in a deterministic order to avoid deadlocks during transactions.
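
The idea of working out the difference first and applying it in a fixed order could look roughly like the sketch below. It is not the st2 implementation: the `Assignment` tuple and `plan_role_sync` function are hypothetical names, and the fields are taken from the `role_1_user_1_source_1` index in the error message.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Assignment(NamedTuple):
    """Hypothetical record matching the unique index (role, user, source)."""
    role: str
    user: str
    source: str


def plan_role_sync(
    current: Iterable[Assignment],
    desired: Iterable[Assignment],
) -> Tuple[List[Assignment], List[Assignment]]:
    """Return (to_delete, to_add), each sorted so every worker uses the same order."""
    current_set = set(current)
    desired_set = set(desired)
    # Only touch rows that actually differ; unchanged rows are left alone,
    # so a login that changes nothing performs no writes at all.
    to_delete = sorted(current_set - desired_set)
    to_add = sorted(desired_set - current_set)
    return to_delete, to_add
```

With this shape, a login whose roles are already in sync produces two empty lists, so nothing is deleted and re-inserted.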

But for now you can disable the sync and then manually run st2-apply-rbac-definitions, as it also runs the sync.

For what it's worth, I was thinking this is run during a helm update. Yes, it runs as a job during a helm upgrade.

You need to manually add users using 'assignments' in the config. Role syncing will not happen for new users.

@guzzijones
Contributor

guzzijones commented Feb 23, 2024

I was actually looking at the wrong code.
Logins use RBACRemoteGroupToRoleSyncer, which does not have this issue.
The sync function there already implements a CRUD methodology.

@guzzijones
Contributor

guzzijones commented Feb 23, 2024

I would need to see the full stack trace. we already ignore that error in the syncer code. what version are you on?
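
The general pattern of ignoring a duplicate-key error on insert can be illustrated with an in-memory stand-in. This is a sketch only and not the st2 syncer code: `DuplicateKeyError` here is a local placeholder class rather than pymongo's, and `UniqueCollection`, `ensure_assignment`, and `simulate_parallel_logins` are hypothetical names.

```python
import threading


class DuplicateKeyError(Exception):
    """Local placeholder for a database duplicate-key error (not pymongo's class)."""


class UniqueCollection:
    """In-memory stand-in for a collection with a unique index on the whole key."""

    def __init__(self):
        self._keys = set()
        self._lock = threading.Lock()

    def insert(self, key):
        with self._lock:
            if key in self._keys:
                raise DuplicateKeyError(key)
            self._keys.add(key)

    def count(self):
        with self._lock:
            return len(self._keys)


def ensure_assignment(collection, key):
    """Insert `key`; a duplicate means another worker already created it."""
    try:
        collection.insert(key)
        return True
    except DuplicateKeyError:
        return False


def simulate_parallel_logins(workers):
    """Run `workers` concurrent 'logins' that all try to create the same row."""
    collection = UniqueCollection()
    key = ("admin", "user-nagios", "mappings/sensorsldap.yaml")
    results = []
    results_lock = threading.Lock()

    def login():
        created = ensure_assignment(collection, key)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=login) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, collection.count()
```

In this sketch exactly one worker creates the row and the others treat the duplicate as success, so no login fails because of the collision.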

@guzzijones
Contributor

st2-apply-rbac-definitions does still have the issue, but it should only be on an upgrade when that script is run.

@philipphomberger
Author

I would need to see the full stack trace. we already ignore that error in the syncer code. what version are you on?

Do you mean the ST2 version? At the time it was 3.8.0; now we are on 3.8.1. But the problem only happens when there is a significant number of requests coming from Nagios, which sends a new authentication request to the st2 auth token endpoint for every event.
