
sentry-self-hosted-post-process-forwarder-errors-1 is constantly restarting after updating from 23.6.2 to 24.4.2 #3034

Open
Mandalavandalz opened this issue May 9, 2024 · 6 comments

Comments

@Mandalavandalz

Mandalavandalz commented May 9, 2024

Self-Hosted Version

24.4.2

CPU Architecture

x86_64

Docker Version

24.0.2

Docker Compose Version

2.18.1

Steps to Reproduce

  1. wget https://github.com/getsentry/self-hosted/archive/refs/tags/24.4.2.tar.gz
  2. tar -zxvf 24.4.2.tar.gz
  3. mv self-hosted-24.4.2 sentry
  4. cd sentry
  5. ./install.sh
  6. docker compose up -d

Expected Result

All Sentry containers are running.
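One way to check this expected result is to parse the output of `docker compose ps --all --format json` and list every container whose state is not `running`. The following is a minimal Python sketch, assuming each container record carries `Name` and `State` fields. Depending on the Compose release, the output is either one JSON array or one JSON object per line, so the sketch handles both shapes. The helper names `fetch_ps_json` and `not_running` are hypothetical.

```python
import json
import subprocess


def fetch_ps_json() -> str:
    # Assumes it is run from the self-hosted checkout directory.
    # --all also lists stopped containers, not just running ones.
    return subprocess.run(
        ["docker", "compose", "ps", "--all", "--format", "json"],
        capture_output=True, text=True, check=True,
    ).stdout


def not_running(ps_json: str) -> list[tuple[str, str]]:
    """Return (name, state) for every container whose State is not "running".

    Accepts either a single JSON array or newline-delimited JSON objects,
    since different Docker Compose releases emit different shapes.
    """
    text = ps_json.strip()
    if not text:
        return []
    if text.startswith("["):
        records = json.loads(text)
    else:
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return [
        (r.get("Name", "?"), r.get("State", "?"))
        for r in records
        if r.get("State") != "running"
    ]
```

On the host, `print(not_running(fetch_ps_json()))` would list a container such as `sentry-self-hosted-post-process-forwarder-errors-1` while it is in the `restarting` state.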

Actual Result

The sentry-self-hosted-post-process-forwarder-errors-1 container keeps restarting.

Log:

Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 177, in __check_commit_log_worker_running
    self.__commit_log_worker.result()
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/arroyo/utils/concurrent.py", line 31, in run
    result = function()
             ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 137, in __run_commit_log_worker
    commit = commit_codec.decode(message.payload)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 51, in decode
    return self.decode_legacy(value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 84, in decode_legacy
    headers["orig_message_ts"].decode("utf-8"), DATETIME_FORMAT
    ~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'orig_message_ts'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
    self._run_once()
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 384, in _run_once
    self.__message = self.__consumer.poll(timeout=1.0)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 211, in poll
    self.__check_commit_log_worker_running()
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 179, in __check_commit_log_worker_running
    raise RuntimeError("commit log consumer thread crashed") from e
RuntimeError: commit log consumer thread crashed
18:58:13 [ERROR] arroyo.processing.processor: Caught exception, shutting down...
18:58:13 [INFO] arroyo.processing.processor: Closing <sentry.consumers.synchronized.SynchronizedConsumer object at 0x7fc06635a310>...
18:58:15 [INFO] arroyo.processing.processor: Processor terminated
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 177, in __check_commit_log_worker_running
    self.__commit_log_worker.result()
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/arroyo/utils/concurrent.py", line 31, in run
    result = function()
             ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 137, in __run_commit_log_worker
    commit = commit_codec.decode(message.payload)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 51, in decode
    return self.decode_legacy(value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/arroyo/backends/kafka/commit.py", line 84, in decode_legacy
    headers["orig_message_ts"].decode("utf-8"), DATETIME_FORMAT
    ~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'orig_message_ts'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/sentry", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/runner/main.py", line 147, in main
    func(**kwargs)
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 83, in inner
    return ctx.invoke(f, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/runner/decorators.py", line 35, in inner
    return ctx.invoke(f, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/runner/commands/run.py", line 386, in basic_consumer
    run_processor_with_signals(processor, consumer_name)
  File "/usr/local/lib/python3.11/site-packages/sentry/utils/kafka.py", line 46, in run_processor_with_signals
    processor.run()
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 322, in run
    self._run_once()
  File "/usr/local/lib/python3.11/site-packages/arroyo/processing/processor.py", line 384, in _run_once
    self.__message = self.__consumer.poll(timeout=1.0)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 211, in poll
    self.__check_commit_log_worker_running()
  File "/usr/local/lib/python3.11/site-packages/sentry/consumers/synchronized.py", line 179, in __check_commit_log_worker_running
    raise RuntimeError("commit log consumer thread crashed") from e
RuntimeError: commit log consumer thread crashed
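The traceback shows a two-stage failure. A background commit-log worker raises `KeyError: 'orig_message_ts'` while decoding a message in the legacy format. The consumer's `poll()` then re-raises that error as `RuntimeError("commit log consumer thread crashed")`. Below is a minimal Python sketch of that pattern, assuming a plain header dict and a made-up timestamp format. `decode_legacy`, `commit_log_worker`, and `check_worker_running` are hypothetical stand-ins, not arroyo's or Sentry's actual code.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from datetime import datetime

# Assumed timestamp layout for illustration; not necessarily arroyo's DATETIME_FORMAT.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def decode_legacy(headers: dict[str, bytes]) -> datetime:
    # Like the failing frame in the traceback, the header is indexed directly,
    # so a commit-log message written without it raises KeyError.
    return datetime.strptime(headers["orig_message_ts"].decode("utf-8"), DATETIME_FORMAT)


def commit_log_worker(messages: list[dict[str, bytes]]) -> list[datetime]:
    # Background worker: decodes every commit-log message it is given.
    return [decode_legacy(headers) for headers in messages]


def check_worker_running(worker: Future) -> None:
    # Foreground check: re-raise the worker's failure, chained with "from",
    # which is what prints "The above exception was the direct cause of ...".
    if worker.done():
        try:
            worker.result()
        except Exception as e:
            raise RuntimeError("commit log consumer thread crashed") from e


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        worker = pool.submit(commit_log_worker, [{}])  # one message with no headers
        worker.exception()  # block until the worker has finished
        try:
            check_worker_running(worker)
        except RuntimeError as err:
            print(type(err.__cause__).__name__, err.__cause__)
```

Because the worker dies on the first undecodable message, it fails again on every restart until that message is consumed or removed. This is consistent with the container restarting in a loop.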

Event ID

No response

@aldy505
Collaborator

aldy505 commented May 10, 2024

Hello!

Have you tried this solution? #2629 (comment)

@Mandalavandalz
Author

Mandalavandalz commented May 10, 2024

Hello,
Yes. I added --no-strict-offset-reset to the three post-process-forwarder-* containers, but nothing changed; the error is the same. I also tried the rust-consumer replacement in docker-compose.yaml that you mentioned there, also with no change.

@aldy505
Collaborator

aldy505 commented May 10, 2024

Okay, here's another wild guess: you may be running out of server resources. What do your CPU and RAM stats look like? Is your CPU close to 100% usage? Is there at least 1 GB of free RAM? Could you increase those? My office instance has 8 CPUs + 16 GB RAM + 12 GB swap; my community instance has 6 CPUs + 16 GB RAM + 32 GB swap.

I know bumping server specs isn't for everyone, but hey, it's a wild guess anyway.
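To answer the "at least 1 GB free RAM" and "CPU close to 100%" questions on a Linux host, you can read `/proc/meminfo` and the load average. The following is a minimal Python sketch. It treats the 1 GB rule of thumb from this comment as 1 GiB of `MemAvailable`, which is an assumption rather than an official Sentry requirement. The helper names are hypothetical.

```python
import os

KIB_PER_GIB = 1024 * 1024  # /proc/meminfo reports sizes in kB (1024 bytes)


def meminfo_kib(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def has_ram_headroom(text: str, min_gib: float = 1.0) -> bool:
    # MemAvailable estimates memory usable without swapping.
    available = meminfo_kib(text).get("MemAvailable", 0)
    return available / KIB_PER_GIB >= min_gib


def load_per_cpu() -> float:
    # 1-minute load average divided by CPU count; values near or above 1.0
    # suggest the CPUs are saturated (Unix only).
    return os.getloadavg()[0] / (os.cpu_count() or 1)
```

On the Sentry host, use `has_ram_headroom(open("/proc/meminfo").read())` for memory and `load_per_cpu()` for CPU pressure.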

@Mandalavandalz
Author

This instance runs with 4 CPU cores and 16 GB RAM, without swap because it runs on Amazon. Two other instances with exactly the same specs and version 24.4.2 have no problems at all. I increased the resources of the problematic one to 8 CPU cores and 32 GB RAM, and nothing changed: this container keeps crashing with the same error.

@meenzen

meenzen commented May 13, 2024

I'm also seeing constant crashes of sentry-self-hosted-post-process-forwarder-errors-1 after upgrading 24.4.0 -> 24.4.2

Update: I restored a backup of version 24.4.0 and it still happens.

After some more debugging, it seems I've run into #2951 instead.

@hubertdeng123
Member

I'm not sure why this is happening, but perhaps Kafka still holds some unprocessed legacy messages that are causing trouble? What happens if you recreate your Kafka volume? Note: this will lose all unprocessed messages.
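Recreating the Kafka volume boils down to a short sequence of Docker commands. The following minimal Python sketch only builds and prints that sequence as a dry run, so you can review it before running anything by hand. The volume name `sentry-kafka` is an assumption; check yours with `docker volume ls`. The step where `./install.sh` recreates the volume and Kafka topics is also an assumption. The helper `kafka_reset_plan` is hypothetical.

```python
import shlex


def kafka_reset_plan(volume: str = "sentry-kafka") -> list[list[str]]:
    """Return, without executing, the commands for recreating a Kafka volume.

    WARNING: removing the volume discards every unprocessed Kafka message.
    The default volume name is an assumption; confirm it with `docker volume ls`.
    """
    return [
        ["docker", "compose", "down"],       # stop all services first
        ["docker", "volume", "rm", volume],  # drop the Kafka data
        ["./install.sh"],                    # assumed to recreate volumes and topics
        ["docker", "compose", "up", "-d"],   # start everything again
    ]


if __name__ == "__main__":
    for cmd in kafka_reset_plan():
        print(shlex.join(cmd))
```

The sketch deliberately prints the commands instead of running them through `subprocess`, because the volume removal cannot be undone.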

Projects
Status: Waiting for: Community
Status: No status
Development

No branches or pull requests

4 participants