Nextcloud > 27.1.3.2 : Configuring Redis as session handler Segmentation fault (core dumped) #2106

Open
AceDenghar opened this issue Nov 29, 2023 · 24 comments

@AceDenghar

AceDenghar commented Nov 29, 2023

Hi,

Since last week I've been experiencing this issue with Nextcloud 27.1.3:

/opt/dockerConfs/nextcloud_lagamelle.txupy.com# docker logs -f nextcloud_<domain>_html 
Configuring Redis as session handler
Segmentation fault (core dumped)

Here is my docker-compose.yml file:

version: "3"

networks:
 2_vnet_nextcloud_<domain>:
  external: true

volumes:
 db:
    name: nextcloud_<domain>_db
 redis:
    name: nextcloud_<domain>_redis

services:
  app:
    container_name: nextcloud_<domain>_html
    hostname: nextcloud01SRV
    image: nextcloud:latest
#    image: nextcloud:27.1.4
    networks:
     2_vnet_nextcloud_<domain>:
      ipv4_address: 172.2.0.10
    #ports:
        #- "6379:6379/tcp"
    restart: unless-stopped
    depends_on:
     db:
      condition: service_started
     redis:
      condition: service_started
    links:
     - db
     - redis
    volumes:
#     - app:/var/www/html
     - /opt/HDDExt/dockerDatas/nextcloud_<domain>/app:/var/www/html
     - /opt/HDDExt/dockerDatas/nextcloud_<domain>/app/data:/var/www/html/data
     - "/etc/timezone:/etc/timezone:ro"
     - "/etc/localtime:/etc/localtime:ro"


    environment:
     - MYSQL_PASSWORD=#######
     - MYSQL_DATABASE=#######
     - MYSQL_USER=#######
     - MYSQL_HOST=db
     - PHP_UPLOAD_LIMIT=40G
     - PHP_MEMORY_LIMIT=8G
#     - NEXTCLOUD_ADMIN_USER=#######
#     - NEXTCLOUD_ADMIN_PASSWORD=#######
#     - NEXTCLOUD_TRUSTED_DOMAINS=<domain>
     - SMTP_HOST=smtp.office365.com
     - SMTP_SECURE=TLS
     - SMTP_PORT=587
     - SMTP_AUTHTYPE=LOGIN
     - SMTP_NAME=#######
     - SMTP_PASSWORD=#######
     - MAIL_FROM_ADDRESS=#######
     - MAIL_DOMAIN=outlook.com
     - REDIS_HOST=nextcloud_<domain>_redis
#     - REDIS_HOST_PASSWORD=#######

  db:
    image: mariadb:10.5
    container_name: nextcloud_<domain>_db
    networks:
     2_vnet_nextcloud_<domain>:
      ipv4_address: 172.2.0.11
    restart: always
    command: --transaction-isolation=READ-COMMITTED --binlog-format=ROW
    volumes:
     - "/etc/timezone:/etc/timezone:ro"
     - "/etc/localtime:/etc/localtime:ro"
#     - db:/var/lib/mysql
     - /opt/HDDExt/dockerDatas/nextcloud_<domain>/db:/var/lib/mysql
#     - /opt/HDDExt/dockerDatas/nextcloud_<domain>/db:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=#######
      - MYSQL_PASSWORD=#######
      - MYSQL_DATABASE=#######
      - MYSQL_USER=#######

  redis:
    image: redis
    container_name: nextcloud_<domain>_redis
    networks:
     2_vnet_nextcloud_<domain>:
      ipv4_address: 172.2.0.12
    volumes:
     - /opt/HDDExt/dockerDatas/nextcloud_<domain>/redis:/data
     - "/etc/timezone:/etc/timezone:ro"
     - "/etc/localtime:/etc/localtime:ro"
    restart: always
#    command: redis-server --requirepass #######

Pulling the latest image no longer works (I was waiting for 27.1.4 to test).

Even when I create a fresh Docker instance using the default YAML file etc., I get the same issue with the 27.1 branch (the last version mentioned in config.php is 27.1.3).

However, I can create a new instance with 27.0.2, and it runs fine.

So I suspect a change in 27.1 (I think between 27.1.2 and 27.1.3), but I can't downgrade, and the container keeps restarting, so I can't use docker exec to explore it.

Sorry for my English, and thanks for your help.

@AceDenghar
Author

AceDenghar commented Nov 29, 2023

Note: if I set an older version of Nextcloud (27.0.2.1) to trigger the downgrade error, the container starts correctly and the logs state that the data matches version 27.1.3.2:

Can't start Nextcloud because the version of the data (27.1.3.2) is higher than the docker image version (27.0.2.1) and downgrading is not supported. Are you sure you have pulled the newest image version?

By the way, I can use docker exec to explore the container if needed.

Thanks

@AceDenghar
Author

On another affected instance, I'm sure it was running well on version 27.1.3.2, because on 2023/11/24 I ran the Nextcloud Security Scan, which reported:

Running Nextcloud 27.1.3.2
Latest patch level
Major version still supported
Scanned at 2023-11-24 16:24:12

This instance was affected the same day as the main one I'm discussing here. It was non-essential (almost no data), so I wiped it and created a fresh one that only works on 27.0.2.

So the problem affecting both of my instances, which were upgraded at the same time, appeared after they had been running 27.1.3.2.

@AceDenghar
Author

By the way, how can I pull the specific 27.1.3.2 image?

My Docker instance was cleaned up and I no longer have it. On Docker Hub, it seems I can't select that exact version.

If I could, I would get my Nextcloud instance back (which would be great, because I need it) and prove that the problem occurs with later versions.

Thanks

@AceDenghar AceDenghar changed the title Configuring Redis as session handler Segmentation fault (core dumped) Nextcloud > 27.1.3.2 : Configuring Redis as session handler Segmentation fault (core dumped) Nov 29, 2023
@AceDenghar
Author

AceDenghar commented Nov 30, 2023

I can confirm that with the docker-compose.yml content provided for "Base version - apache" on this page, i.e. with a truly fresh instance, the problem occurs:

# docker logs -f nextcloud_<domain>_app_1
Segmentation fault (core dumped)

But when I set an earlier Nextcloud tag (for example 27.0), it works:

# docker logs -f nextcloud_<domain>_app_1
Initializing nextcloud 27.0.2.1 ...
New nextcloud instance
Initializing finished
=> Searching for scripts (*.sh) to run, located in the folder: /docker-entrypoint-hooks.d/before-starting
==> but the hook folder "before-starting" is empty, so nothing to do
(...)

While waiting for a solution, would it be possible, at least for a short while, to keep the 27.1.3.2 image available on Docker Hub so I could restart my instance with it?

I'm stuck.....

Thank you very much.

(and sorry again for my English)

@AceDenghar
Author

Still stuck....

Please... if someone could have a look 🙏

Thanks

@AceDenghar AceDenghar mentioned this issue Dec 2, 2023
@D-side

D-side commented Dec 2, 2023

I believe I started facing the same issue on a rather long-lived installation back in mid-September.

It first started happening after the upgrade from 26.0.3 to 26.0.6.

Due to one of my own maintenance oversights I didn't make a backup before the upgrade (live and learn I guess, even minor upgrades like this can break things big time), but I managed to partially roll things back and upgrade to 26.0.4 which still functioned fine with Redis.

Now, months later, I'm getting back to this properly and armed with backups, so drastic experiments are now an option 💪

With the help of strace I found that the segfault is happening as it connects to Redis and sets the keepalive flag on the connection socket:

connect(5, {sa_family=AF_INET, sin_port=htons(6379), sin_addr=inet_addr("{{redacted correct IP address of Redis}}")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=5, events=POLLIN|POLLOUT|POLLERR|POLLHUP}], 1, 60000) = 1 ([{fd=5, revents=POLLOUT}])
getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl(5, F_SETFL, O_RDWR)               = 0
setsockopt(5, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_KEEPALIVE, [0], 4) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x10} ---

…a-a-a-and then it segfaults.
So I tried removing Redis (v5 back then) outright.
And it helped.
So I took the liberty of upgrading Nextcloud further, all the way to 27.1.4.
I also tried using a newer version of Redis (v7.2.3, both alpine and bookworm), to no avail.

Since I'm running a small server with only a few users and apps, lack of Redis isn't such a big concern for me, but if I can help with tracking down the culprit, I can spare some time. I'm just not sure where to look at this point. The culprit seems to be whatever's connecting Nextcloud and Redis.
Pointers appreciated!


For easier reference, here are the versions of Nextcloud I ran with Redis (v5):

  • 26.0.3 works
  • 26.0.4 works
  • 26.0.6 broken
  • 26.0.9 broken
  • 27.1.4 broken

PS: this seems to affect Docker images both based on Alpine and on Debian. At least I've seen it happen on both. I can't recall which ones were which at this point unfortunately.

@D-side

D-side commented Dec 2, 2023

Found another case of the same issue happening in the forums, with the same workaround (removing Redis entirely): https://help.nextcloud.com/t/nextcloud-docker-segmentation-fault-during-upgrade/175063
(And, amazingly, even the same troubleshooting method! 😆 I swear that's not me there.)

@D-side

D-side commented Dec 2, 2023

My hypothesis would be the upgrade of the redis PHP module (phpredis) to 6.0 and beyond:

  • NC 26.0.5 => phpredis 5.3.7 (src)
  • NC 26.0.5 => phpredis 6.0.0 (src)
  • NC 26.0.7 => phpredis 6.0.0 (src)
  • NC 26.0.7 => phpredis 6.0.1 (src)
  • NC 26.0.9 => phpredis 6.0.2 (src)

Notice that the first Nextcloud version built with 6.0 was 26.0.5, right in the gap of versions where the breakage seems to start in my experiments.
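One way to check this hypothesis per image tag is to ask PHP inside each image which phpredis version it ships, overriding the entrypoint so that docker-entrypoint.sh (where the segfault shows up) never runs. A minimal sketch, assuming Docker is available locally and the tags can still be pulled from Docker Hub; the helper names `phpredis_version` and `is_phpredis6_or_newer` are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the phpredis extension version bundled in a
# nextcloud image tag. --entrypoint php runs the PHP CLI directly, so the
# image's docker-entrypoint.sh is never executed.
phpredis_version() {
  docker run --rm --entrypoint php "nextcloud:$1" \
    -r 'echo phpversion("redis"), PHP_EOL;'
}

# Hypothetical helper: succeed (exit status 0) when a version string such
# as "6.0.2" has a major version of 6 or higher, i.e. the suspected range.
is_phpredis6_or_newer() {
  major=${1%%.*}
  [ "$major" -ge 6 ]
}
```

For example, running `phpredis_version` against a tag that still works and one that crashes (say 26.0.4 and 26.0.6 from the list above) should show whether the two straddle the 5.x/6.x boundary.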

@AceDenghar
Author

Oh!!!

Thank you very much for your help! 🤗

It's late here and I'm going to sleep. I'll check all of this tomorrow morning!

I'll sleep well now 🤗

@AceDenghar
Author

It was late and I didn't realize that I had already tested it without Redis.

Take this docker-compose YAML I made for test purposes, which I have just tested again:

version: "3"
networks:
 33_vnet_nextcloud_preprod_<instanceDomain>:
  external: true

services:

  nextcloud_preprod_<instanceDomain>_app:
    container_name: nextcloud_preprod_<instanceDomain>_app
    hostname: nextcloud_preprod_<instanceDomain>_app_SRV
    image: nextcloud:27.0.2
    networks:
     33_vnet_nextcloud_preprod_<instanceDomain>:
      ipv4_address: 172.33.0.10
    links:
      - nextcloud_preprod_<instanceDomain>_db
    volumes:
      - "/etc/timezone:/etc/timezone:ro"
      - "/etc/localtime:/etc/localtime:ro"
      -  ./dockerDatasFolder/nextcloud:/var/www/html
      -  ./dockerDatasFolder/apps:/var/www/html/custom_apps
      -  ./dockerDatasFolder/config:/var/www/html/config
      -  ./dockerDatasFolder/data:/var/www/html/data
      -  ./dockerDatasFolder/theme:/var/www/html/themes
    environment:
      - MYSQL_PASSWORD=***
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
      - MYSQL_HOST=nextcloud_preprod_<instanceDomain>_db
    restart: unless-stopped

  nextcloud_preprod_<instanceDomain>_db:
    container_name: nextcloud_preprod_<instanceDomain>_db
    hostname: nextcloud_preprod_<instanceDomain>_db_SRV
    image: mariadb:10.6
    command: --transaction-isolation=READ-COMMITTED --log-bin=binlog --binlog-format=ROW
    networks:
     33_vnet_nextcloud_preprod_<instanceDomain>:
      ipv4_address: 172.33.0.11
    volumes:
      - "/etc/timezone:/etc/timezone:ro"
      - "/etc/localtime:/etc/localtime:ro"
      -  ./dockerDatasFolder/db:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=***
      - MYSQL_PASSWORD=***
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
    restart: unless-stopped

So there is no Redis service.

In this case, where the Nextcloud 27.0.2 image is specified, the Nextcloud container starts correctly.

But if (after clearing all volume data so it's fresh) I don't set a version, or set 27.1 or later, it won't start and I get the segmentation fault.

So it's not specifically related to Redis, nor to my Docker host: it is clearly tied to versions later than 27.1.3.2 (as far as I can tell, since my main instance was working with that one), and I can't downgrade the data to version 27.0.2.
It's also not related to my data, because it happens with a fresh instance.

My problem is that I'm stuck, because I can't pull this specific image (27.1.3.2), even just to use it while waiting for a fix in later versions.

(Sorry for my English, I hope all of this is clear enough.)

Thanks.

@D-side

D-side commented Dec 3, 2023

Ah. Right, dependency upgrades run in parallel in major branches. Makes sense.

In 27, the last version of Nextcloud to use phpredis below 6 was exactly 27.0.2 with 5.3.7.

A matching change in two branches causing a breakdown is way beyond coincidence.


As a workaround, I would suggest replacing it with phpredis 5.3.7 using a custom Dockerfile.

If that works out, we probably want to downgrade it in the original Dockerfiles here and report the problem upstream, to either Nextcloud or phpredis, depending on what it is.

@martadinata666

Does the 27.1.3 tag work? And does the most basic command, docker run nextcloud, work?
You can also modify your compose file to just start bash instead of running the entrypoint, by adding:

    entrypoint: bash
    tty: true
    stdin_open: true

Then you can open a terminal in the container and look around there; I'm somewhat suspicious of docker-entrypoint.sh.

@AceDenghar
Author

I started migrating my data to a container pinned to version 27.0.2, because I really need a working instance.

When that's done, I'll try what you suggested on another fresh instance to see what can be done when this problem occurs.

I have to wait a little before starting. I'll keep you posted in this thread 😉

@D-side

D-side commented Dec 3, 2023

For your particular situation, @AceDenghar, 27.1.4 is out already, which 27.1.3.2 should upgrade to nicely.

But the issue with Redis connectivity is still there, and as far as I can tell, it's caused by the upgrade of phpredis (the Redis client for PHP) to version 6.0. The current upgrade script does not seem to be concerned whatsoever about major upgrades, it just picks the latest versions available.

@J0WI J0WI added the upstream label Dec 12, 2023
@J0WI
Contributor

J0WI commented Dec 12, 2023

There was a similar issue in #2071. phpredis 5.x is no longer supported. I think they do not support multiple (major) versions at all.

@D-side

D-side commented Dec 12, 2023

Well, an unsupported version is arguably better than a broken one. But it's a tough call, I hear you. Running unsupported versions is risky too.
Ideally this needs to be fixed upstream, of course. As I reported before, even the current 6.0.2 isn't working.

I have barely any experience troubleshooting PHP apps unfortunately, especially native parts. I can wield Docker with relative confidence though. Anything I can help with here?

@J0WI
Contributor

J0WI commented Dec 12, 2023

Would you like to report this upstream? AFAIK this crash has not yet been reported.

@D-side

D-side commented Dec 12, 2023

(Some tinkering later)
Huh. Some interesting results.
I tried replicating the issue on a clean install to give folks at phpredis something to work with.
It got hairy. As in, it did not reproduce.

So I started comparing configuration files between installations and one line in /var/www/html/config/redis.config.php caught my eye:

-       'password' => getenv('REDIS_HOST_PASSWORD'),
+       'password' => (string) getenv('REDIS_HOST_PASSWORD'),

…and I can confirm that it fixes the problem in my original installation and will probably help everyone else. 🎉
So it's a variant of #1232, except instead of providing a proper error message about a missing password it just segfaults violently.

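So two questions:

  • Is this still something to report to phpredis?
  • Isn't an update of the image supposed to apply that change from #1232 to existing configuration files too?
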
edit:
Also, for good measure: applying this change in reverse (as in, removing (string)) to the fresh installation triggers the problem. So we can reliably reproduce it as well now 🙂
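For existing installations, the same one-line change can be applied with a script instead of by hand. A minimal sketch, assuming the password line looks exactly like the `-` line of the diff above and a `sed` that supports `-i` (GNU or BusyBox); the function names are hypothetical. The cast matters because PHP's `getenv()` returns `false` when the variable is unset, and `(string) false` is the empty string.

```shell
#!/bin/sh
# Hypothetical helper: add the (string) cast to the Redis password line of
# an existing Nextcloud redis.config.php, editing the file in place.
# The pattern only matches the uncast form, so running it twice is harmless.
patch_redis_password() {
  sed -i "s/'password' => getenv('REDIS_HOST_PASSWORD')/'password' => (string) getenv('REDIS_HOST_PASSWORD')/" "$1"
}

# Hypothetical helper: succeed (exit status 0) if the file still contains
# the uncast password line.
needs_redis_password_cast() {
  grep -q "'password' => getenv('REDIS_HOST_PASSWORD')" "$1"
}
```

With the first docker-compose.yml in this thread, where /var/www/html is bind-mounted from /opt/HDDExt/dockerDatas/nextcloud_<domain>/app, that would be `patch_redis_password /opt/HDDExt/dockerDatas/nextcloud_<domain>/app/config/redis.config.php` on the host, followed by a container restart.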

@J0WI
Contributor

J0WI commented Dec 20, 2023

Nice finding!

> Is this still something to report to phpredis?

IMHO a crash is never a valid behaviour. I'd rather expect an error on misconfiguration.

> Isn't an update of the image supposed to apply that change from #1232 to existing configuration files too?

No, the config files are in upgrade.exclude and only created for new installations.

@D-side

D-side commented Dec 21, 2023

> IMHO a crash is never a valid behaviour. I'd rather expect an error on misconfiguration.

Agreed. A'ight, I'll send a report to phpredis in a bit, will link it here.

> and only created for new installations.

I figured that's how it works right now, yeah, but if there's a backwards-incompatible change in configurations, like this one, how is a user of the image supposed to learn about it?
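One way for a user to spot this kind of drift by hand is to diff an installed config file against the default copy inside a newer image. A sketch under an explicit assumption: that the image keeps its default config files under /usr/src/nextcloud/config/ with the same file names as in /var/www/html/config/. The `config_drift` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: show how an installed Nextcloud config file differs
# from the default copy shipped inside an image.
# ASSUMPTION: the image keeps its default config files in
# /usr/src/nextcloud/config/ under the same file names.
# Exit status follows diff: 0 = identical, 1 = different.
config_drift() {
  image=$1
  installed=$2
  docker run --rm --entrypoint cat "$image" \
    "/usr/src/nextcloud/config/$(basename "$installed")" |
    diff -u - "$installed"
}
```

For example, `config_drift nextcloud:27.1.4 /opt/HDDExt/dockerDatas/nextcloud_<domain>/app/config/redis.config.php` would print a unified diff in which `-` lines come from the image and `+` lines from the installed file.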

@D-side

D-side commented Dec 21, 2023

Also, Nextcloud 28 has worked around the issue: nextcloud/server#38568

@D-side

D-side commented Jan 10, 2024

The fix for phpredis is in the develop branch and will land in the next release that automatic upgrades here will pick up.

@joshtrichards
Member

> I figured that's how it works right now, yeah, but if there's a backwards-incompatible change in configurations, like this one, how is a user of the image supposed to learn about it?

@D-side See #1533. And @flortsch has been working on proposed implementation approach in PR #2120

@phpcodemonkey

> (Some tinkering later) Huh. Some interesting results. I tried replicating the issue on a clean install to give folks at phpredis something to work with. It got hairy. As in, it did not reproduce.
>
> So I started comparing configuration files between installations and one line in /var/www/html/config/redis.config.php caught my eye:
>
> -       'password' => getenv('REDIS_HOST_PASSWORD'),
> +       'password' => (string) getenv('REDIS_HOST_PASSWORD'),
>
> …and I can confirm that it fixes the problem in my original installation and will probably help everyone else. 🎉 So it's a variant of #1232, except instead of providing a proper error message about a missing password it just segfaults violently.

I can confirm that this solution works! I had 2 installations, one working (that just didn't have a password definition line in redis.config.php) and one not working (that had the definition line, but no (string) cast) - adding the (string) cast fixed the broken installation.
