[4.4 beta 4] iOS: Metal shader compilation warnings and unexpected compilation amount #103006
Comments
Can you compare with 4.4beta4 on MoltenVK? You can switch back in the Project Settings using Rendering Driver overrides for macOS and iOS. This is likely a consequence of the new ubershaders in 4.4, although I'm surprised they are compiled when 3D rendering is not used. I guess they are still needed when you use GPUParticles? |
In the project settings "rendering/rendering_device/driver.ios" is already set to vulkan and is the only option for me, probably because I'm running the editor under x86, not on ARM. The iOS export seems to automatically run on Metal, regardless. Is that the correct settings path for overriding it in theory? I'm using 4 x GPUParticles2D, which are all in one scene, but only one gets set to emitting upon an initializer. Maybe that is causing all these permutations for compilation? |
Yes. We should probably change the setting hint to allow selecting Metal even on x86, so that you can export with Metal even if you can't run it yourself. |
Good to know, thanks. The warnings are OK; I will look at whether we can suppress them in release builds. A SPIR-V optimiser would reduce them significantly, as we could enable a few passes, like dead code elimination. |
It's a bit more complex: it will use Metal on iOS even if you set it to Vulkan on an x86_64 Mac. Since Vulkan is the default value on x86_64, it's not saved in the config (and the default on iOS is Metal). We should probably always show all available values for both macOS and iOS and always have the same default, and instead automatically fall back to Vulkan on x86_64 (#102341 already does the fallback, but a warning print should probably be added to avoid confusion). |
This would always print a warning on every startup on x86_64 hardware by default, so I'm not sure. What we can do though is amend the rendering driver startup line with a notice about the fallback being applied. Something like this:
|
I confirm we shouldn't make the default value or available hints depend on the editor host, as we see here that it limits proper configuration. I would make the default "auto" for macOS, which would be Metal on arm and Vulkan on x86_64. For iOS, the default should be Metal and Vulkan should be available to select as an option (so no need for "auto" there, I believe). For 4.5, I think we should really implement a hint so that rendering method and drivers always get written to project.godot even when using default values. I thought we had a proposal for that but I couldn't find it (GH search isn't super helpful). |
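The rule proposed in the comment above can be sketched in a few lines. This is only an illustration of the idea: `resolve_driver`, `DEFAULTS`, and `AVAILABLE` are hypothetical names, not Godot's settings code, and the sketch does not model the Metal-to-Vulkan fallback from #102341.

```python
from typing import Optional

# Hypothetical tables, keyed by the export *target*, never the editor host.
DEFAULTS = {"macos": "auto", "ios": "metal"}
AVAILABLE = {"macos": ("auto", "metal", "vulkan"), "ios": ("metal", "vulkan")}


def resolve_driver(target_os: str, target_arch: str,
                   configured: Optional[str] = None) -> str:
    """Return the rendering driver for a target OS and CPU architecture."""
    # An unset project setting falls back to the per-target default.
    choice = configured if configured is not None else DEFAULTS[target_os]
    if choice not in AVAILABLE[target_os]:
        raise ValueError(f"unsupported driver {choice!r} for {target_os}")
    if choice == "auto":
        # Proposed rule: Metal on arm64, Vulkan on x86_64.
        return "metal" if target_arch == "arm64" else "vulkan"
    return choice
```

Because the tables depend only on the target, an x86_64 editor would still offer and export the same values as an arm64 one.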
This is probably better, and with "auto" we do not need any warning. |
I've tested the override by manually editing the project file with the editor closed, but the game still starts up with Metal, so it doesn't seem to respect the override currently.
Regarding the ubershader compilations, I wonder how the pipeline and specializations could be manually tweaked in the future. It is nice that particle preload basically happens automatically in 4.4, but the 78 "compilation succeeded" messages suggest that a lot of unnecessary features get pre-compiled that will never be used by a mostly Control-node-based game. For comparison, in 4.3 I only ever get the shader compilations for the particle systems. I need to do some proper timed startup tests next. Out of interest, I've compiled a custom iOS export with |
I've done some profiling with Instruments, testing the first run (app was removed before each test): 4.4 b4: 16s until menu (ubershader pipeline, 78 compilations, no manual preloading) So it seems the baseline is 4s to get to the menu, but the additional ubershader compilation in 4.4b4 takes 12s compared to the 4.3s of the manual preload. |
I've forced the So it seems all those extra compilations only happen on Metal? |
@georgwacker How are you measuring shader compiles? I am a bit confused since the way we compile particle shaders hasn't changed between 4.3 and 4.4. The ubershader system applies to the shaders we use for drawing 3D meshes. So are you measuring all shader compiles somehow? And if you are, how are you doing it? Later 4.4 releases can track shader compiles in the monitors, but that didn't exist in 4.3. |
Ah, I wasn't aware that the ubershader system is not used for particle shaders. But it must be related to the new pipeline cache system? I'm testing cold bootup time to menu with no shader cache in Instruments and looking at Xcode logs for "compilation succeeded" messages. Below is the 16s "severe hang" before reaching the menu. When running under Metal, it shows 78 compilations vs. the 4 compilations under Vulkan, so I presumed the additional time is due to those additional compilations. But the slow bootup could be something else related to Metal, perhaps? Edit: Those Points of Interest in the trace are all |
What version of MoltenVK are you using to build your application? I have not verified this, but can you try running Metal and Vulkan with the Metal compilation cache completely disabled by setting the There shouldn't be any reason that Metal is compiling more shaders than Vulkan, as it is driven by Godot's rendering driver. I would also expect MoltenVK should be compiling a lot more than 4 shaders on cold startup. |
As noted in #96052, from a cold startup, Metal should be faster than Vulkan, which was also confirmed by another user. Indeed, this was only validated on macOS, as it is easy to clear the Metal shader cache as noted in the Testing section of the PR description. I don't know how easy that is to test on iOS, where, after various runs and without rebooting the entire device, your results may be affected by previous runs. I'm hopeful that I will run those tests again on macOS using master, to make sure there haven't been any regressions. |
@georgwacker thank you for your response. Indeed, I think Stuart is on the right track. It sounds like some sort of system caching is working successfully in MoltenVK that isn't successful in our Metal backend. The actual number of pipeline compile requests should be the same between them. @stuartcarnie do you know if the compilation count reported in Xcode is for pipelines that were compiled from scratch (as opposed to loaded from cache)? |
@georgwacker try setting this environment variable when you run your iOS app from a cold start:
GODOT_MTL_SHADER_LOAD_STRATEGY=lazy
I'll elaborate in a follow-up comment, but it should make a significant difference. |
I have determined the difference, which I identified in #96052:
The same goes at runtime, where Godot will request that the driver compile a shader, but may never use them in a pipeline, at least not immediately. One aspect I expect would be all the shader variants. More specifically, Godot will ask the
which for the Vulkan driver won't do much, but for Metal, it will ask for a new and associated log as Create Metal Library (Godot (PID)). I implemented an alternative library loading strategy that compiles the
GODOT_MTL_SHADER_LOAD_STRATEGY=lazy
This behaviour more closely matches MoltenVK's implementation, which will also delay
Note: On a desktop machine, there are significantly more Metal shader compiler services available for concurrent compilation, whereas there are only 2 on iOS devices, from what I learned from Apple. I found that
We tell Metal to maximise compilation services with the following API (macOS only):
godot/drivers/metal/rendering_context_driver_metal.mm Lines 49 to 53 in 9fc39ae
I further validated this strategy with the Bistro demo, by analysing the Metal compilations from cold start for the
Pay attention to the Create MTLibrary counts
Metal Cold Start
The default
Metal Cold Start (lazy)
With the environment variable set to
Vulkan Cold Start
Solution
I can expose the compilation behaviour as a driver-specific project setting so users can override it. For iOS it would default to lazy, and for desktop it can stay as the current behaviour. Users can change the setting if iOS increases concurrency or if they find that macOS starts faster using the alternative approach for their specific project. |
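To make the difference between the two strategies concrete, here is a toy model, not the Metal driver's code. Only the variable name `GODOT_MTL_SHADER_LOAD_STRATEGY` and the value `lazy` come from this thread; the `immediate` spelling, the per-platform defaults table, and every class and function name below are assumptions for illustration, and the shader counts are made up.

```python
import os
from typing import Iterable, Mapping, Set

ENV_VAR = "GODOT_MTL_SHADER_LOAD_STRATEGY"
# Assumed defaults, following the proposal: lazy on iOS, current behaviour on desktop.
PLATFORM_DEFAULT = {"ios": "lazy", "macos": "immediate"}


def load_strategy(platform: str, env: Mapping[str, str] = os.environ) -> str:
    """Use the environment override if it is valid, else the platform default."""
    value = env.get(ENV_VAR, "").strip().lower()
    return value if value in ("lazy", "immediate") else PLATFORM_DEFAULT[platform]


class LibraryCache:
    """Tracks which shader libraries actually get compiled."""

    def __init__(self, strategy: str) -> None:
        self.strategy = strategy
        self.pending: Set[str] = set()
        self.compiled: Set[str] = set()

    def request_shader(self, name: str) -> None:
        # The engine requests a shader that may never end up in a pipeline.
        if self.strategy == "immediate":
            self.compiled.add(name)   # compile right away
        else:
            self.pending.add(name)    # remember it, compile later if needed

    def create_pipeline(self, shader: str) -> None:
        # Lazy mode pays the compile cost only once a pipeline needs the shader.
        self.pending.discard(shader)
        self.compiled.add(shader)


def cold_start(strategy: str, requested: Iterable[str], used: Iterable[str]) -> int:
    """Return how many libraries were compiled during a simulated cold start."""
    cache = LibraryCache(strategy)
    for name in requested:
        cache.request_shader(name)
    for name in used:
        cache.create_pipeline(name)
    return len(cache.compiled)
```

With ten requested variants of which three are used in pipelines, the immediate strategy compiles all ten and the lazy one compiles three. The real saving depends on how many requested shaders a project never uses.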
@stuartcarnie I think we need to re-evaluate some of our decisions in light of the Ubershader stuff. I forgot about iOS' limit of 2 concurrent pipeline compiles. It really complicates the async compilation approach. We rely on being able to throw a bunch of stuff at the driver and then just use the results when ready. But, ultimately, we can now distinguish between Ubershader compiling and optimized pipeline compiling. Ubershaders will be loaded at load time or on the first frame; we need to do more compilations than in 4.3, but it should be fine (Metal and Vulkan should behave the same). When we compile the ubershaders we need to compile all the variants they need (i.e. the pipeline variants should be loaded ASAP). But then at run time the optimized variants should be scheduled to compile with as little overhead as possible. I'm not sure I fully understand this lazy compile strategy. But if it allows us to defer the cost of creating pipelines, then that sounds like the right approach. Ideally, any cost from creating the optimized pipelines should be deferred and should be constrained to a background thread. I don't think we need to expose a setting for this. I think we can design a solution specifically for iOS, since it is a unique platform. What we have now works great for macOS, so let's just figure out the minimal set of changes needed for iOS and then try it out. |
Would the shader baking PR improve on this? #102552 |
With lazy loading on the official 4.4b4 Metal build I'm getting 8.4s to the menu on cold boot, which is much better than the 16s with the default strategy. With my custom 4.4b4 build forcing Vulkan it's still only 3.5s on cold boot, though. The custom build running Metal takes 5.2s, so slightly better. For these tests, I've been using the MoltenVK bundled with the iOS export template from 4.3, which shows as 1.2.283. |
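For quick reference, the arithmetic behind those figures. The numbers are the cold-boot times quoted in the comment above; the dictionary keys and the `speedup` helper are labels made up here.

```python
# Cold-boot time to menu in seconds, as reported in the comment above.
times = {
    "metal_official_default": 16.0,
    "metal_official_lazy": 8.4,
    "metal_custom": 5.2,
    "vulkan_custom": 3.5,
}


def speedup(before: float, after: float) -> float:
    """Ratio of two durations; values above 1 mean 'after' is faster."""
    return before / after


# Seconds saved by the lazy strategy on the official Metal build.
lazy_saving = times["metal_official_default"] - times["metal_official_lazy"]
# How many times faster the lazy run reaches the menu.
lazy_speedup = speedup(times["metal_official_default"], times["metal_official_lazy"])
# Remaining gap between official lazy Metal and the custom Vulkan build.
gap_to_vulkan = times["metal_official_lazy"] - times["vulkan_custom"]
```

These are single measurements on one device, so the ratios are indicative only.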
@georgwacker those numbers are more in line with what I'd expect. MoltenVK has a little advantage here, as Metal has to convert all the SPIRV to MSL during the calls to As @kisg noted, the shader baker PR will resolve this problem. |
Further to @kisg's question about #102552, I am planning to leverage the Metal compiler tools, when available on Windows and macOS, so that baking shaders will not only generate the Metal source, but take it a step further and generate Metal libraries compiled to AIR, so Metal will have a significant advantage over MoltenVK here. |
We can still do that, as Metal supports continuations / callbacks for compilation, so we use the results when ready. I use that feature already in the driver for the non-lazy (immediate) shader compiler mode. |
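The "submit everything, use results when ready" pattern under a small concurrency limit can be sketched as follows. This is a generic Python illustration using a thread pool, not the Metal API: `compile_pipeline` is a stand-in for real compiler work, and `MAX_COMPILER_SERVICES` simply mirrors the iOS limit of 2 mentioned earlier in the thread.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List

MAX_COMPILER_SERVICES = 2  # the iOS limit quoted in this thread


def compile_pipeline(name: str) -> str:
    # Stand-in for an expensive pipeline compile.
    return f"{name}:compiled"


def compile_in_background(names: List[str]) -> Dict[str, str]:
    """Queue every compile at once and collect results through callbacks."""
    ready: Dict[str, str] = {}
    lock = threading.Lock()

    def on_done(name: str, future: Future) -> None:
        # Runs when a compile finishes; only finished pipelines become visible.
        with lock:
            ready[name] = future.result()

    with ThreadPoolExecutor(max_workers=MAX_COMPILER_SERVICES) as pool:
        for name in names:
            future = pool.submit(compile_pipeline, name)
            future.add_done_callback(lambda f, n=name: on_done(n, f))
    # Leaving the with-block waits for all queued jobs and their callbacks.
    return ready
```

The caller can queue any number of jobs; only two run at a time, so the limit affects how long the queue takes to drain rather than whether the pattern works.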
Tested versions
System information
Godot v4.4.beta4 - macOS Sonoma (14.7.2) - Multi-window, 2 monitors - Vulkan (Mobile)
Issue description
So far, I've been using MoltenVK on iOS with version 4.3-stable.
With manual particle preloading, I'm getting 4 compilation warnings from MoltenVK, one for each particle system.
Switching to 4.4 beta4 using Metal with pipeline caching and no manual preloading is showing 78 compilation warnings on the first launch, which is very slow (logs down below).
I even got these warnings when running in Release mode via Xcode.
Log (truncated)
metal_4.4b4_log.txt
Steps to reproduce
Minimal reproduction project (MRP)