openDTU crash with 4x HM-1500 from releases after 24.9.30 #2535

Open
4 tasks done
plieven opened this issue Feb 9, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@plieven

plieven commented Feb 9, 2025

What happened?

OpenDTU 25.2.3 crashes periodically (every 30-60 seconds) with 4x HM-1500 attached. With only two of the HM-1500 inverters attached it seems to work.

I was on 24.5.6 before, which worked flawlessly for months. I went back from 25.2.3 through all minor releases down to 24.9.30, which has now been running without a crash for a few hours.

I once observed very high heap fragmentation levels (>70%) shortly before the ESP crashed. AFAIK this value is only an estimate on the ESP32, but it might give a hint.
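For readers unfamiliar with the fragmentation figure: one common way to estimate it is to compare the largest contiguous free block with the total free heap. The sketch below shows that calculation. Whether OpenDTU derives its displayed value exactly this way is an assumption, and the function name and the numbers are purely illustrative.

```python
def fragmentation_percent(free_bytes: int, largest_free_block: int) -> float:
    """Estimate heap fragmentation as the share of free memory that is
    NOT part of the largest contiguous free block (assumed formula)."""
    if free_bytes <= 0:
        return 0.0  # nothing free, so nothing that could be fragmented
    if largest_free_block > free_bytes:
        raise ValueError("largest free block cannot exceed total free bytes")
    return 100.0 * (1.0 - largest_free_block / free_bytes)


# Hypothetical numbers: 100 kB free in total, largest block 25 kB.
print(fragmentation_percent(100_000, 25_000))  # 75.0
```

With this estimate, a value above 70% means the largest single allocation that can still succeed is less than 30% of the free heap, so a large response buffer may fail to allocate even though plenty of memory is free in total.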

The ESP is an ESP32-D0WD-V3.

To Reproduce Bug

OpenDTU on ESP32-D0WD-V3 with more than 2x HM-1500 and software release newer than 24.9.30.

Expected Behavior

ESP should not periodically crash.

Install Method

Pre-Compiled binary from GitHub releases

What git-hash/version of OpenDTU?

25.2.3

What firmware variant (PIO Environment) are you using?

opendtu-generic.bin

Relevant log/trace output

Anything else?

No response

Please confirm the following

  • I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.
  • I have double checked that my inverter does not contain a W in the model name (like HMS-xxxW) as they are not supported.
@plieven plieven added the bug Something isn't working label Feb 9, 2025
@tbnobody
Owner

tbnobody commented Feb 9, 2025

Please provide a log of the serial console from when the bug occurs. Otherwise it will not be possible to trace this issue.

@plieven
Author

plieven commented Feb 11, 2025

I will try to attach a serial console during the week. The DTU is installed outdoors, so I have to dismount it first.

@plieven
Author

plieven commented Feb 14, 2025

It took a little longer than expected because I first had to find out that the CH340 was broken and then order a new ESP board.

```
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 23 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 40 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 61 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
RX Period End
All missing
Nothing received, resend whole request
TX AlarmData Channel: 75 --> 15 83 77 57 42 80 16 24 32 80 11 00 67 AE F8 5A 00 00 00 00 00 00 00 00 3B FF 4A
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 01 00 01 80 01 00 01 65 FA 65 FA 00 00 00 00 80 24 B1 | -80 dBm
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 03 69 E9 FF FF FF F9 80 02 00 04 6A 0A 6A 0A 00 00 96 | -80 dBm
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 04 00 07 80 02 00 05 6A 10 6A 10 FF FF FF FA 80 02 96 | -80 dBm
Fetch inverter: 116185122930
Request device info
Queue size - NRF: 9 CMT: 0
Interrupt received
RX Channel: 3 --> 95 83 77 57 42 83 77 57 42 05 00 06 6A 32 6A 32 00 00 00 06 80 02 00 07 6A 37 48 | -80 dBm
Interrupt received
RX Channel: 75 --> 95 83 77 57 42 83 77 57 42 06 6A 37 FF FF FF FB 80 02 00 08 6A 5A 6A 5A 00 00 40 | -80 dBm
E (145946) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (145946) task_wdt: - async_tcp (CPU 1)
E (145946) task_wdt: Tasks currently running:
E (145946) task_wdt: CPU 0: IDLE0
E (145946) task_wdt: CPU 1: async_tcp
E (145946) task_wdt: Aborting.

abort() was called at PC 0x4012965c on core 0

Backtrace: 0x40083d9d:0x3ffbed3c |<-CORRUPTED

ELF file SHA256: cc2702188b671a14

E (10602) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13232
load:0x40080400,len:3028
entry 0x400805e4
E (708) esp_core_dump_flash: No core dum����ѥѥ���found!
E (708) esp_core_dump_flash: No core dump partition found!

Starting OpenDTU
Initialize FS... done
Reading configuration... Check for default DTU serial... done
done
Reading language pack... done
Reading PinMapping... [ 202][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
using default config done
Initialize Network... done
Setting Hostname... Configuring WiFi STA using new credentials... done
Initialize NTP... done
Initialize SunPosition... done
Initialize MqTT... done
Initialize WebApi... done
Initialize Display... done
Initialize LEDs... done
Initialize Hoymiles interface... NRF: Connection successful
Setting radio PA level...
Setting DTU serial...
Setting poll interval...
Adding inverter: 116185122930 - xxx
Adding inverter: 116183773950 - xxx
Adding inverter: 116184895235 - xxx
Adding inverter: 116183775742 - xxx
done
Switch to WiFi mode
Setting Hostname... done
Configuring WiFi STA using new credentials... done
Configuring WiFi STA DHCP IP... done
WiFi connected
WiFi got ip: 192.168.178.128
Network connected
```
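In the log above, the task watchdog names the task that failed to reset it in time: `async_tcp` on CPU 1. When a capture contains many resets, pulling those lines out by script can save time. The following is a minimal sketch that assumes the `task_wdt` line format shown in this log; the function and pattern names are made up for illustration.

```python
import re

# Matches lines such as "E (145946) task_wdt: - async_tcp (CPU 1)".
STALLED_TASK = re.compile(r"task_wdt:\s*-\s*(\S+)\s*\(CPU\s*(\d+)\)")


def stalled_tasks(log_text: str) -> list[tuple[str, int]]:
    """Return (task name, CPU number) for every task the watchdog reports."""
    return [(name, int(cpu)) for name, cpu in STALLED_TASK.findall(log_text)]


sample = """\
E (145946) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (145946) task_wdt: - async_tcp (CPU 1)
E (145946) task_wdt: Tasks currently running:
E (145946) task_wdt: CPU 0: IDLE0
E (145946) task_wdt: CPU 1: async_tcp
E (145946) task_wdt: Aborting.
"""
print(stalled_tasks(sample))  # [('async_tcp', 1)]
```

Only the dash-prefixed lines are matched, so the "Tasks currently running" entries are deliberately ignored.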

@plieven
Author

plieven commented Feb 14, 2025

@tbnobody it's the scraping of /api/prometheus/metrics that causes the reset. I will try to figure out what's going wrong.
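For anyone who wants to check whether their own device is affected, a small polling loop against the same endpoint might help. This is a sketch, not a confirmed reproduction recipe: the function names, the one-second delay and the number of rounds are assumptions, and the address is simply the one visible in the log above.

```python
import http.client
import time
import urllib.request
from typing import Callable

METRICS_PATH = "/api/prometheus/metrics"  # endpoint named in this thread


def fetch_metrics(host: str, timeout: float = 5.0) -> int:
    """Fetch the metrics page once and return the body size in bytes."""
    url = f"http://{host}{METRICS_PATH}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def scrape_loop(fetch: Callable[[], int], rounds: int, delay: float = 1.0) -> list[str]:
    """Call fetch() repeatedly and record 'ok <bytes>' or 'fail <error>' per round."""
    results = []
    for _ in range(rounds):
        try:
            results.append(f"ok {fetch()}")
        except (OSError, http.client.HTTPException) as exc:
            # URLError and socket timeouts derive from OSError.
            results.append(f"fail {exc.__class__.__name__}")
        time.sleep(delay)
    return results


def main() -> None:
    # Placeholder address taken from the log above; replace with your own DTU.
    host = "192.168.178.128"
    for line in scrape_loop(lambda: fetch_metrics(host), rounds=60):
        print(line)
```

Calling `main()` while watching the serial console should show whether failed rounds line up with watchdog resets.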

@plieven
Author

plieven commented Feb 14, 2025

I bisected the issue down to this:

2878807 is the first bad commit
commit 2878807
Author: Thomas Basler [email protected]
Date: Fri Nov 1 21:56:41 2024 +0100

Upgrade ESPAsyncWebServer from 3.3.17 to 3.3.21

platformio.ini | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
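For context, bisecting is a binary search over the commit history: test the middle commit, then discard the half that cannot contain the first bad one. The sketch below illustrates the idea only. The commit list is hypothetical except for the hash reported above, and in practice `git bisect` does this bookkeeping for you.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest to newest, the first one is good,
    the last one is bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] known good, commits[hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # first bad commit is at mid or earlier
        else:
            lo = mid  # first bad commit is after mid
    return commits[hi]


# Hypothetical history; only 2878807 is a real hash from this thread.
history = ["aaaaaaa", "bbbbbbb", "2878807", "ccccccc", "ddddddd"]
bad_from = history.index("2878807")
print(first_bad(history, lambda c: history.index(c) >= bad_from))  # 2878807
```

Each test here would correspond to flashing a build and checking whether the device resets, which is why a logarithmic number of steps matters.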

@plieven
Author

plieven commented Feb 14, 2025

The internal buffer for the AsyncWebServerResponse was changed from a standard char buffer to StringStream between those releases. I will first try 3.7.0 and then, over the next few days, check whether reverting from StringStream back to a char buffer helps.

@nakott

nakott commented Feb 17, 2025

In the past, I also saw unexpected OpenDTU restarts. I'm using one HM-1500. I have been recording my observations since the middle of last year:

opendtu-generic_240930.bin -> sometimes unexpected restart during the night (values are zero afterwards)
opendtu-generic_241015.bin -> so far so good
opendtu-generic_241107.bin -> sometimes unexpected restart during the night (values are zero afterwards)
opendtu-generic_250114.bin -> so far so good
opendtu-generic_250203.bin -> sometimes unexpected restart during the night (values are zero afterwards)

I'm currently back on 250114. My OpenDTU hardware is connected via LAN.

@nakott nakott mentioned this issue Feb 24, 2025
4 tasks