Zend certified PHP/Magento developer

Tons of sleep processes in mysql from magento causing “connection refused”

On a Magento 2.3.3 site, a couple of times a month our site will go down (or run slowly to the point that it is unusable due to timeouts) for a seemingly unknown reason. The first thing I look at is the show processlist output on our mysql server, because when we’re experiencing this, the number of processes here is always much higher than normal. It will be fluctuating around ~150-160 processes, when normally it’s only around 20. It will occasionally go down to around 60-100, and then it will spike back up again. This will happen for around 2-4 hours. Our max_connections in mysql is set to 500.

The output of show processlist will show a large number of processes that are “Sleep” in the command column, with the database user that our Magento website uses, like this:

|  422 | MAGENTO-USER    | <SERVER-IP>:59344 | magento_database | Sleep       |    7 | | NULL             |    0.000 |
|  424 | MAGENTO-USER    | <SERVER-IP>:59358 | magento_database | Sleep       |   13 | | NULL             |    0.000 |
|  495 | MAGENTO-USER    | <SERVER-IP>:59828 | magento_database | Sleep       |    7 | | NULL             |    0.000 |
|  503 | MAGENTO-USER    | <SERVER-IP>:59874 | magento_database | Sleep       |    0 | | NULL             |    0.000 |
|  507 | MAGENTO-USER    | <SERVER-IP>:59894 | magento_database | Sleep       |    2 | | NULL             |    0.000 |
|  608 | MAGENTO-USER    | <SERVER-IP>:60600 | magento_database | Sleep       |    0 | | NULL             |    0.000 |
|  612 | MAGENTO-USER    | <SERVER-IP>:60622 | magento_database | Sleep       |    2 | | NULL             |    0.000 |
|  627 | MAGENTO-USER    | <SERVER-IP>:60710 | magento_database | Sleep       |  201 | | NULL             |    0.000 |
|  628 | MAGENTO-USER    | <SERVER-IP>:60716 | magento_database | Sleep       |    0 | | NULL             |    0.000 |
|  630 | MAGENTO-USER    | <SERVER-IP>:60738 | magento_database | Sleep       |  200 | | NULL             |    0.000 |
|  635 | MAGENTO-USER    | <SERVER-IP>:60764 | magento_database | Sleep       |  199 | | NULL             |    0.000 |
|  638 | MAGENTO-USER    | <SERVER-IP>:60780 | magento_database | Sleep       |  198 | | NULL             |    0.000 |
|  648 | MAGENTO-USER    | <SERVER-IP>:60834 | magento_database | Sleep       |    2 | | NULL             |    0.000 |
|  654 | MAGENTO-USER    | <SERVER-IP>:60872 | magento_database | Sleep       |    1 | | NULL             |    0.000 |
|  658 | MAGENTO-USER    | <SERVER-IP>:60896 | magento_database | Sleep       |   14 | | NULL             |    0.000 |
|  663 | MAGENTO-USER    | <SERVER-IP>:60924 | magento_database | Sleep       |  193 | | NULL             |    0.000 |
|  667 | MAGENTO-USER    | <SERVER-IP>:60950 | magento_database | Sleep       |    5 | | NULL             |    0.000 |

This will cause many connection refused/failed errors in our logs. I’m not sure why though, as our max_connections number of 500 never gets reached.

I’m trying to find the cause of this instead of messing with mysql config settings, as our site works fine 95% percent of the time. The 5% being these couple of exceptions a month.

Our cron logs look mostly normal (I think?). I don’t see any errors there except occassionally being “unable to acquire a lock for cron group”.

Any insights on how we can address this?