Prometheus exporter high memory usage #658

Open · stibi opened this issue Nov 21, 2023 · 6 comments
Labels: metrics, question (Further information is requested)

Comments


stibi commented Nov 21, 2023

Hello,
we're having trouble with the Solr exporter: it's very hungry for memory, it needs around ~6G of RAM, which is a lot, and I can't figure out why.

Can I ask you for a hint?

It's pretty much a default setup, nothing custom. SolrCloud 9.3.0, and nothing too custom in the exporter deployment:

apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: solr-exporter
spec:
  customKubeOptions:
    podOptions:
      resources:
        requests:
          cpu: 500m
          memory: 3072Mi
        limits:
          cpu: 2000m
          memory: 6912Mi
      envVars:
        - name: JAVA_HEAP
          value: 6000m
  solrReference:
    cloud:
      name: "solr-cloud"
  numThreads: 6
[Screenshot: 2023-11-21 at 13:23:19]
radu-gheorghe (Contributor) commented

I think this tells it to allocate 6GB:

      envVars:
        - name: JAVA_HEAP
          value: 6000m

I assume it can do with much less than 6000m. Try a 10th of that and see how it goes.
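
For reference, a minimal sketch of what that could look like in the SolrPrometheusExporter spec posted above, with the heap trimmed to a tenth (the 600m value is purely illustrative, not a tested recommendation):

apiVersion: solr.apache.org/v1beta1
kind: SolrPrometheusExporter
metadata:
  name: solr-exporter
spec:
  customKubeOptions:
    podOptions:
      envVars:
        - name: JAVA_HEAP
          value: 600m   # illustrative only: a tenth of the original 6000m
  solrReference:
    cloud:
      name: "solr-cloud"
  numThreads: 6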


stibi commented Nov 21, 2023 via email


stibi commented Nov 22, 2023

Ouch, so maybe I wasn't so wrong about it... I removed the JAVA_HEAP env var, but the exporter started failing with java.lang.OutOfMemoryError: Java heap space. Here we go, full circle :D

So I had to put JAVA_HEAP back to see how much heap space it actually needs, and the number is 5G. With that much heap, the exporter runs without errors. But it takes quite some time to collect all the metrics, isn't that weird?

INFO  - 2023-11-22 09:53:39.225; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 09:54:39.226; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO  - 2023-11-22 09:55:15.506; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 09:56:15.506; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO  - 2023-11-22 09:56:53.088; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 09:57:53.088; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO  - 2023-11-22 09:58:29.369; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 09:59:29.369; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO  - 2023-11-22 10:00:06.842; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 10:01:06.842; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO  - 2023-11-22 10:01:41.788; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 10:02:41.788; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO  - 2023-11-22 10:03:22.174; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 10:04:22.174; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection
INFO  - 2023-11-22 10:04:57.249; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Completed metrics collection
INFO  - 2023-11-22 10:05:57.250; org.apache.solr.prometheus.collector.SchedulerMetricsCollector; Beginning metrics collection

I was able to take a heap dump using the jattach utility (awesome that it's packaged with the container image, thanks for that!), but I guess I don't really know how to read it properly... it says the heap size is only 23549096 B, which is about 23.5 MB? That's not so much.

[Screenshot: 2023-11-22 at 11:12:25]

radu-gheorghe (Contributor) commented

Yep, that's 23MB. Weird that it takes a while to collect metrics. Is that a symptom (e.g. the exporter is stuck in GC, so it doesn't have spare CPU to collect the metrics) or a cause (e.g. you have a ton of shards in the cluster, so collecting them takes a while and a lot of heap)?

Maybe G1 falls behind with garbage collection? You can verify this hypothesis by setting the GC_TUNE env var to -XX:+UseG1GC -XX:GCTimeRatio=2. Unless you have a ton of shards, I'm expecting something like JAVA_HEAP=1g to be enough. Or maybe we're both missing something...
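
For reference, a sketch of how both env vars could be set together on the exporter pod, reusing the customKubeOptions block from the spec above (the values are the ones suggested here, not verified for this cluster):

spec:
  customKubeOptions:
    podOptions:
      envVars:
        - name: JAVA_HEAP
          value: 1g
        - name: GC_TUNE
          # GCTimeRatio=2 lets G1 spend a larger share of time on GC (up to roughly 1/(1+2) of total)
          value: "-XX:+UseG1GC -XX:GCTimeRatio=2"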


stibi commented Nov 22, 2023

The cluster is not big at all, I think: 1 shard, 2 replicas, ~8753202 documents, taking ~22GB of memory...

Thanks for the hints, I'll take a look at the Java metrics and how GC performs.

radu-gheorghe (Contributor) commented

You're welcome.

If you need something to monitor GC/JVM metrics (and Solr metrics, for that matter), we have a tool that you might find useful.

HoustonPutman added the question (Further information is requested) and metrics labels on Nov 28, 2023