openDTU crash with 4x HM-1500 from releases after 24.9.30 #2535

plieven · 2025-02-09T13:10:28Z

What happened?

OpenDTU 25.2.3 crashes periodically (every 30-60 seconds) with 4x HM-1500 attached. With only two HM-1500s attached it seems to work.

I was on 24.5.6 before, which worked flawlessly for months. I have stepped back from 25.2.3 through all minor releases down to 24.9.30. 24.9.30 has now been running for a few hours without crashing.

I once observed a very high heap fragmentation level (>70%) shortly before the ESP crashed. As far as I know this value is only estimated on the ESP32, but it might give a hint.
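For reference, a fragmentation percentage of this kind is commonly estimated from two heap figures: the total free heap and the largest free contiguous block. The sketch below assumes that formula; whether OpenDTU computes its displayed value exactly this way is an assumption, and the function name is hypothetical.

```python
def estimate_heap_fragmentation(free_heap_bytes: int, largest_free_block_bytes: int) -> float:
    """Estimate heap fragmentation in percent.

    Assumed formula: 100 - 100 * largest_free_block / free_heap.
    0 % means all free memory is one contiguous block; values near 100 %
    mean the free memory is split into many small pieces.
    """
    if free_heap_bytes <= 0:
        # No free heap at all: nothing to be fragmented.
        return 0.0
    if largest_free_block_bytes < 0 or largest_free_block_bytes > free_heap_bytes:
        raise ValueError("largest free block must be between 0 and the free heap size")
    return 100.0 - (100.0 * largest_free_block_bytes) / free_heap_bytes


# Example: 100000 bytes free, but the largest single block is only 30000 bytes.
print(estimate_heap_fragmentation(100_000, 30_000))
```

With a value above 70 %, a large allocation can fail even though the total free heap still looks sufficient.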

The ESP is an ESP32-D0WD-V3.

To Reproduce Bug

Run OpenDTU on an ESP32-D0WD-V3 with more than two HM-1500 inverters and a software release newer than 24.9.30.

Expected Behavior

ESP should not periodically crash.

Install Method

Pre-Compiled binary from GitHub releases

What git-hash/version of OpenDTU?

25.2.3

What firmware variant (PIO Environment) are you using?

opendtu-generic.bin

Relevant log/trace output

Anything else?

No response

Please confirm the following

I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
I have already searched for relevant existing issues and discussions before opening this report.
I have updated the title field above with a concise description.
I have double checked that my inverter does not contain a W in the model name (like HMS-xxxW) as they are not supported.

tbnobody · 2025-02-09T13:17:07Z

Please provide a log of the serial console when the bug occurs. Otherwise it will not be possible to trace this issue.

plieven · 2025-02-11T20:22:34Z

I will try to attach a serial console during the week. The DTU is installed outdoors, so I have to take it down first.

plieven · 2025-02-14T08:10:42Z

It took a little longer than expected because the CH340 turned out to be broken, so I first had to order a new ESP board.

`RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 23 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 40 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 61 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 75 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 01 00 01 80 01 00 01 65 FA 65 FA 00 00 00 00 80 24 B1 | -80 dBm
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 03 69 E9 FF FF FF F9 80 02 00 04 6A 0A 6A 0A 00 00 96 | -80 dBm
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 04 00 07 80 02 00 05 6A 10 6A 10 FF FF FF FA 80 02 96 | -80 dBm
Fetch inverter: 116185122930
Request device info
Queue size - NRF: 9 CMT: 0
Interrupt received
RX Channel: 3 --> 95 83 77 57 42 83 77 57 42 05 00 06 6A 32 6A 32 00 00 00 06 80 02 00 07 6A 37 48 | -80 dBm
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 06 6A 37 FF FF FF FB 80 02 00 08 6A 5A 6A 5A 00 00 40 | -80 dBm
E (145946) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (145946) task_wdt: - async_tcp (CPU 1)
E (145946) task_wdt: Tasks currently running:
E (145946) task_wdt: CPU 0: IDLE0
E (145946) task_wdt: CPU 1: async_tcp
E (145946) task_wdt: Aborting.

abort() was called at PC 0x4012965c on core 0

Backtrace: 0x40083d9d:0x3ffbed3c |<-CORRUPTED

ELF file SHA256: cc2702188b671a14

E (10602) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13232
load:0x40080400,len:3028
entry 0x400805e4
E (708) esp_core_dump_flash: No core dump partition found!
E (708) esp_core_dump_flash: No core dump partition found!

Starting OpenDTU
Initialize FS... done
Reading configuration... Check for default DTU serial... done
done
Reading language pack... done
Reading PinMapping... [ 202][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
using default config done
Initialize Network... done
Setting Hostname... Configuring WiFi STA using new credentials... done
Initialize NTP... done
Initialize SunPosition... done
Initialize MqTT... done
Initialize WebApi... done
Initialize Display... done
Initialize LEDs... done
Initialize Hoymiles interface... NRF: Connection successful
Setting radio PA level...
Setting DTU serial...
Setting poll interval...
Adding inverter: 116185122930 - xxx
Adding inverter: 116183773950 - xxx
Adding inverter: 116184895235 - xxx
Adding inverter: 116183775742 - xxx
done
Switch to WiFi mode
Setting Hostname... done
Configuring WiFi STA using new credentials... done
Configuring WiFi STA DHCP IP... done
WiFi connected
WiFi got ip: 192.168.178.128
Network connected
`

plieven · 2025-02-14T08:47:20Z

@tbnobody It's the scraping of /api/prometheus/metrics that causes the reset. I will try to figure out what's going wrong.
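One way to check this without a Prometheus server is to poll the endpoint in a loop while watching the serial console for the watchdog reset. This is a minimal sketch, assuming the DTU is reachable over plain HTTP without authentication at the address shown in the log above (192.168.178.128); the function names, the request count, and the 5-second interval are illustrative and not part of OpenDTU.

```python
import http.client
import time
import urllib.request
from typing import Callable, List


def fetch_metrics(url: str, timeout_s: float = 10.0) -> bytes:
    """Fetch the metrics page once and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.read()


def scrape_repeatedly(fetch: Callable[[], bytes], count: int, interval_s: float) -> List[int]:
    """Call fetch() count times and return the body size of each attempt.

    A failed attempt (timeout, connection reset, truncated response) is
    recorded as -1, which is what a rebooting DTU would typically produce.
    """
    sizes: List[int] = []
    for attempt in range(1, count + 1):
        try:
            body = fetch()
            sizes.append(len(body))
            print(f"attempt {attempt}: {len(body)} bytes")
        except (OSError, http.client.HTTPException) as exc:
            sizes.append(-1)
            print(f"attempt {attempt}: failed ({exc})")
        time.sleep(interval_s)
    return sizes


# Usage (address taken from the serial log; adjust to your own DTU):
# url = "http://192.168.178.128/api/prometheus/metrics"
# scrape_repeatedly(lambda: fetch_metrics(url), count=100, interval_s=5.0)
```

Passing the fetch step in as a callable keeps the loop independent of the network, so the same loop can be pointed at a different endpoint for comparison.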

plieven · 2025-02-14T13:30:33Z

I bisected the issue down to this:

2878807 is the first bad commit
commit 2878807
Author: Thomas Basler [email protected]
Date: Fri Nov 1 21:56:41 2024 +0100

Upgrade ESPAsyncWebServer from 3.3.17 to 3.3.21

platformio.ini | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

plieven · 2025-02-14T16:31:20Z

The internal buffer for the AsyncWebServerResponse was changed from a standard char buffer to StringStream between those releases. I will first try 3.7.0 and then, over the next few days, check whether reverting from StringStream back to a char buffer fixes it.

nakott · 2025-02-17T07:59:36Z

I have also seen unexpected OpenDTU restarts in the past. I'm using one HM-1500 and have been recording my observations since the middle of last year:

opendtu-generic_240930.bin -> sometimes unexpected restart during the night (values are zero afterwards)
opendtu-generic_241015.bin -> so far so good
opendtu-generic_241107.bin -> sometimes unexpected restart during the night (values are zero afterwards)
opendtu-generic_250114.bin -> so far so good
opendtu-generic_250203.bin -> sometimes unexpected restart during the night (values are zero afterwards)

Currently I'm back on 250114. I'm using OpenDTU hardware with a LAN connection.

plieven added the bug label (Something isn't working) on Feb 9, 2025

nakott mentioned this issue Feb 24, 2025

Unexpected reboots #2554
