Bug: Not currently balancing #112

Open
JamesOBenson opened this issue Oct 17, 2024 · 7 comments
Labels
bug Something isn't working needs-analysis

Comments

@JamesOBenson

General

ProxLB is not actually balancing my nodes. At most it moves one VM, even after restarting the service:
Node 1: Memory usage 11%, CPU usage 1%
Node 2: Memory usage 73%, CPU usage 1%
VMs: 47 running, 24 stopped, 2 templates
LXC: 10 running, 0 stopped, 1 template

Config

[proxmox]
api_host: **********
api_user: root@pam
api_pass: **********
verify_ssl: 0
[vm_balancing]
enable: 1
method: memory
mode: assigned
mode_option: percent
balanciness: 10
type: all
parallel_migrations: 1
[storage_balancing]
enable: 0
[update_service]
enable: 0
[api]
enable: 0
[service]
daemon: 1
schedule: 24
log_verbosity: CRITICAL
config_version: 3
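
For readers who want to double-check which values are actually in effect, here is a minimal sketch that reads the balancing section with Python's standard configparser, which accepts the `key: value` form used above. The function name and the trimmed sample text are illustrative assumptions; this is not ProxLB's own loader.

```python
import configparser

# Trimmed copy of the [vm_balancing] section from the report above.
SAMPLE = """
[vm_balancing]
enable: 1
method: memory
mode: assigned
mode_option: percent
balanciness: 10
type: all
parallel_migrations: 1
"""


def load_vm_balancing(text: str) -> dict:
    """Parse INI-style text and return the balancing options of interest."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["vm_balancing"]
    return {
        "enable": section.getint("enable"),
        "method": section.get("method"),
        "mode": section.get("mode"),
        "mode_option": section.get("mode_option"),
        "balanciness": section.getint("balanciness"),
    }


settings = load_vm_balancing(SAMPLE)
print(settings)
```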

Meta

Please provide some more information about your setup. This includes where you obtained ProxLB (e.g., as a .deb file, from the repository, or as a container image) and which version you're running in which mode. You can obtain the version from your image version, your local repository information, or by running proxlb -v.

Version: ProxLB version 1.0.4
Running in VM inside of cluster.

@JamesOBenson JamesOBenson added bug Something isn't working needs-analysis labels Oct 17, 2024
@gyptazy
Owner

gyptazy commented Oct 17, 2024

Hey @JamesOBenson,

Interesting. Could you please set log_verbosity to INFO, restart the service, and share the log file? You can grab the logs from the systemd unit afterwards. Alternatively, you can start it in dry-run mode on the CLI, where it prints to stdout.

You switched the mode from used to assigned and the mode_option from bytes to percent. To look into this I need some more information, such as how much memory each node has (are they all the same size?) and how much memory the VMs really have assigned.
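
To illustrate why those numbers matter, here is a rough sketch of how the two views of memory can diverge on one node. It assumes that "assigned" means the memory configured for the VMs and "used" means the memory they currently consume; the node size and VM figures are made up for illustration and are not taken from this cluster or from ProxLB's code.

```python
GIB = 1024 ** 3


def memory_percent(amount_bytes: int, node_total_bytes: int) -> float:
    """Express an amount of memory as a percentage of the node's total."""
    return 100.0 * amount_bytes / node_total_bytes


# Hypothetical node with 128 GiB of RAM hosting three VMs.
node_total = 128 * GIB
vm_assigned = [32 * GIB, 32 * GIB, 32 * GIB]  # memory configured per VM
vm_used = [6 * GIB, 4 * GIB, 4 * GIB]         # memory the guests use right now

assigned_pct = memory_percent(sum(vm_assigned), node_total)
used_pct = memory_percent(sum(vm_used), node_total)

print(f"assigned: {assigned_pct}%  used: {used_pct}%")
```

Under these assumptions the same node looks heavily loaded by assignment and nearly idle by usage, so the choice of mode can change which node appears to need relief.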

When pasting the log, please strip all information you do not want to share here.

Thanks,
gyptazy

@JamesOBenson
Author

I can revert those changes. And yes, all nodes are configured the same. I thought the percent option would balance the percentages I mentioned earlier. The result was the same, though: only 1 VM ever migrated, and nothing since then. Also, we aren't using shared storage, so I had to modify your code slightly to account for that, adding , **{'with-local-disks': 1} to the migrate call on line 1183:
job_id = api_object.nodes(value['node_parent']).qemu(value['vmid']).migrate().post(target=value['node_rebalance'], online=1, targetstorage='1', **{'with-local-disks': 1})
I'll attach the logs in a moment.
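
The hyphenated option name is the reason for the `**{...}` unpacking: `with-local-disks` is not a valid Python identifier, so it cannot be written as a normal keyword argument. A small sketch of building the parameters separately is shown below. The helper name, node names, and VM ID are hypothetical, and whether ProxLB should expose this as a setting is left open.

```python
from typing import Optional


def build_migrate_params(
    target_node: str,
    with_local_disks: bool = False,
    target_storage: Optional[str] = None,
) -> dict:
    """Collect the parameters for a live-migration request in one dict."""
    params = {"target": target_node, "online": 1}
    if target_storage is not None:
        params["targetstorage"] = target_storage
    if with_local_disks:
        # Hyphenated key, so it can only be passed via dict unpacking.
        params["with-local-disks"] = 1
    return params


params = build_migrate_params("node-b", with_local_disks=True, target_storage="1")
print(params)

# With a proxmoxer API object (called api_object in the snippet above),
# the call would then look roughly like this (not executed here):
# api_object.nodes("node-a").qemu(100).migrate().post(**params)
```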

@JamesOBenson
Author

JamesOBenson commented Oct 17, 2024

CONFIG FOR TEST CLUSTER

[proxmox]
...
[vm_balancing]
enable: 1
method: memory
mode: used
mode_option: bytes
balanciness: 10
type: all
parallel_migrations: 1
[storage_balancing]
enable: 0
[update_service]
enable: 0
[api]
enable: 0
[service]
daemon: 1
schedule: 24
log_verbosity: INFO
config_version: 3
proxlb_logs.txt

@JamesOBenson
Author

One thing that sticks out to me in the logs is that it picks up both nodes here:

Oct 17 18:10:09 proxLB proxlb[5026]: ProxLB: Info: [node-statistics]: Added node *******32.
Oct 17 18:10:09 proxLB proxlb[5026]: ProxLB: Info: [node-statistics]: Added node *******33.

But later it seems to state only one of them:

Oct 17 18:10:10 proxLB proxlb[5026]: ProxLB: Warning: [node-update-statistics]: Node *******33 is overprovisioned for disk by 102%.
Oct 17 18:10:10 proxLB proxlb[5026]: ProxLB: Warning: [node-update-statistics]: Node *******33 is overprovisioned for disk by 136%.

@JamesOBenson
Author

We used the same setup on a second cluster and, again, nothing migrated. Here is an excerpt of the logs:

Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [node-update-statistics]: Updated node resource assignments by all VMs.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balancing-method-validation]: Valid balancing method: memory
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balancing-mode-validation]: Valid balancing method: used
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balanciness-validation]: Rebalancing for memory is needed. Highest usage: 88% | Lowest usage: 22%.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [get-most-used-resources-vm]: ('project', {'group_include': None, 'group_exclude': None, 'cpu_total': 8, 'cpu_used': 0.000665549082148473, 'memory_t>
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [get-most-free-resources-nodes]: ('******37', {'maintenance': False, 'ignore': False, 'cpu_total': 40, 'cpu_assigned': 20.00450408832633, 'cpu_assi>
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [rebalancing-resource-statistics-update]: Updated VM and node statistics.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balancing-method-validation]: Valid balancing method: memory
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balancing-mode-validation]: Valid balancing method: used
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balanciness-validation]: Rebalancing for memory is needed. Highest usage: 88% | Lowest usage: 22%.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [get-most-used-resources-vm]: ('UbuntuDaniel', {'group_include': None, 'group_exclude': None, 'cpu_total': 2, 'cpu_used': 0.00163265451052968, 'memo>
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [get-most-free-resources-nodes]: ('******37', {'maintenance': False, 'ignore': False, 'cpu_total': 40, 'cpu_assigned': 20.00450408832633, 'cpu_assi>
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [rebalancing-resource-statistics-update]: Updated VM and node statistics.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balancing-method-validation]: Valid balancing method: memory
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balancing-mode-validation]: Valid balancing method: used
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [rebalancing-vm-calculator]: Balancing calculations done.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [rebalancing-vm-calculator]: Balancing calculations done.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [rebalancing-vm-calculator]: Balancing calculations done.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [rebalancing-maintenance-vm-calculator]: No nodes for maintenance mode defined.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [vm-rebalancing-executor]: No rebalancing needed.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [cli-output-generator]: Start rebalancing vms to their new nodes.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [cli-output-generator]: No rebalancing needed.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [post-validations]: All post-validations succeeded.
Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [daemon]: Running in daemon mode. Next run in 24 hours.
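
When comparing runs, it can help to pull only the balanciness-validation lines out of a longer journal dump like the excerpt above. The following is a minimal sketch; the regular expression is written against the message wording seen in this thread and is an assumption about the log format, not something taken from ProxLB's source.

```python
import re

# Matches messages such as:
#   [balanciness-validation]: Rebalancing for memory is needed. Highest usage: 88% | Lowest usage: 22%.
PATTERN = re.compile(
    r"\[balanciness-validation\]: Rebalancing for (?P<method>\w+) is "
    r"(?P<negated>not )?needed\. "
    r"Highest usage: (?P<highest>\d+)% \| Lowest usage: (?P<lowest>\d+)%"
)


def parse_balanciness(lines):
    """Return one dict per balanciness-validation line found in `lines`."""
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            results.append({
                "method": match.group("method"),
                "needed": match.group("negated") is None,
                "highest": int(match.group("highest")),
                "lowest": int(match.group("lowest")),
            })
    return results


SAMPLE_LINES = [
    "Oct 17 18:27:24 proxLB proxlb[5099]: ProxLB: Info: [balanciness-validation]: "
    "Rebalancing for memory is needed. Highest usage: 88% | Lowest usage: 22%.",
    "ProxLB: Info: [balanciness-validation]: "
    "Rebalancing for memory is not needed. Highest usage: 89% | Lowest usage: 84%.",
    "ProxLB: Info: [vm-rebalancing-executor]: No rebalancing needed.",
]

parsed = parse_balanciness(SAMPLE_LINES)
for entry in parsed:
    print(entry)
```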

@firth

firth commented Nov 1, 2024

Seeing the same thing here with version 1.0.4 and 1.0.5.

It says:
"Rebalancing for cpu is not needed. Highest usage: 99% | Lowest usage: 90%"

But in reality, I have 4 nodes, with almost 30 VMs on 1 node and no VMs on any of the other 3 nodes... but it doesn't think they are unbalanced.

@robertdahlem

@JamesOBenson
In your log it says:
ProxLB: Info: [balanciness-validation]: Rebalancing for memory is not needed. Highest usage: 89% | Lowest usage: 84%.

With balanciness: 10 that establishes no need for rebalancing because the difference is only 5.
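
As a quick sanity check of that reasoning, the comparison can be sketched as follows. It assumes the rule is "rebalance when highest minus lowest exceeds balanciness"; the exact comparison ProxLB uses (for example, strict versus inclusive) is not confirmed in this thread, and the function name is illustrative.

```python
def rebalancing_needed(highest_pct: int, lowest_pct: int, balanciness: int) -> bool:
    """Assumed rule: rebalance only if the usage spread exceeds the threshold."""
    return (highest_pct - lowest_pct) > balanciness


# Values quoted in this thread, all with balanciness: 10
print(rebalancing_needed(89, 84, 10))  # spread of 5
print(rebalancing_needed(88, 22, 10))  # spread of 66
print(rebalancing_needed(99, 90, 10))  # spread of 9
```

Under this assumption, the 99% versus 90% CPU case reported above would also fall below the threshold, which would explain the "not needed" message even though the VM counts per node differ widely.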

What are the current RAM usage values according to Proxmox's GUI under Node > Summary?
And what is ProxLB currently logging with balanciness-validation?
