embedded as fallback without simd (1) and parser design (2-4) #2
Actually, there is one way in which it could matter. I can determine the maximum depth, and consumers should be able to tell with one check whether they'll have enough stack space.
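To make the "one check" idea concrete, here is a minimal sketch in C. It is not the parser's actual code: `max_nesting_depth` and `fits_in_stack` are hypothetical names, and the per-level frame size is an estimate the consumer would have to supply.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: deepest bracket nesting in `src`, found in one pass.
   Treats ( [ { as openers and ) ] } as closers. It does not skip strings
   or comments, which a real tokenizer would have to do. */
static uint32_t max_nesting_depth(const char *src, size_t len) {
    uint32_t depth = 0, max_depth = 0;
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c == '(' || c == '[' || c == '{') {
            depth++;
            if (depth > max_depth) max_depth = depth;
        } else if ((c == ')' || c == ']' || c == '}') && depth > 0) {
            depth--;
        }
    }
    return max_depth;
}

/* The single check: does a recursive consumer fit in its stack budget?
   `frame_bytes` is the consumer's own estimate of stack use per level. */
static int fits_in_stack(uint32_t max_depth, size_t frame_bytes,
                         size_t stack_budget) {
    return frame_bytes == 0 || max_depth <= stack_budget / frame_bytes;
}
```

A consumer would call `fits_in_stack` once before recursing, instead of guarding every recursive call.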
Could you give an example of representative embedded hardware? Do they typically have a 64-bit trailing-zero count? (I think I once heard of hardware with only a 32-bit ctz operation.) Should I make a fallback for targets where only 32-bit operations are efficient?
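For reference, a 64-bit trailing-zero count can be built from two 32-bit ones. The following is a sketch in C, not code from this repository; `ctz32` and `ctz64_via_32` are illustrative names, and the 32-bit count is written portably so that no compiler builtin is assumed.

```c
#include <stdint.h>

/* Portable 32-bit trailing-zero count; x must be nonzero.
   Binary search over the low bits. */
static unsigned ctz32(uint32_t x) {
    unsigned n = 0;
    if ((x & 0x0000FFFFu) == 0) { n += 16; x >>= 16; }
    if ((x & 0x000000FFu) == 0) { n += 8;  x >>= 8; }
    if ((x & 0x0000000Fu) == 0) { n += 4;  x >>= 4; }
    if ((x & 0x00000003u) == 0) { n += 2;  x >>= 2; }
    if ((x & 0x00000001u) == 0) { n += 1; }
    return n;
}

/* 64-bit trailing-zero count using only 32-bit operations on each half.
   Returns 64 for x == 0. */
static unsigned ctz64_via_32(uint64_t x) {
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    if (lo != 0) return ctz32(lo);
    if (hi != 0) return 32 + ctz32(hi);
    return 64;
}
```

On a target with a native 32-bit ctz, `ctz32` would be replaced by that instruction, leaving one branch on the low half as the extra cost.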
The more common ones are probably ARMv7 boards, which come with various compatible DRAM sizes; 512 MB is typical. See, for example, the BeagleBone. ARMv8 and later all support 64-bit or are 64-bit only. More niche would be these Pine64 boards, here RISC-V based: https://pine64.com/product/128mb-ox64-sbc-available-on-december-2-2022/ (those use integrated PSRAM).
On typical low-cost embedded devices with sufficient RAM (>32 MB) you typically have an MMU and MPU and a 32-bit, not 64-bit, CPU. I think it is infeasible to target anything below 128 MB, which is roughly what people might use for a web server and to debug basic stuff without JTAG (I need to check the numbers again on how much memory Linux and a remote debugger need, because we don't use it at work). See https://developer.arm.com/documentation/dui0068/b/ARM-Instruction-Reference/ARM-general-data-processing-instructions/CLZ for clz.
That depends on the level of efficiency. Luckily, Zig has ARMv7 CI-tested and generated, so you can take that as a baseline.
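Since the CLZ link came up: when a target only offers count-leading-zeros, a trailing-zero count can be derived from it by isolating the lowest set bit first. Here is a sketch in C, with a software `clz32_soft` standing in for the hardware instruction; both function names are illustrative.

```c
#include <stdint.h>

/* Software stand-in for a hardware CLZ instruction; x must be nonzero. */
static unsigned clz32_soft(uint32_t x) {
    unsigned n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8; }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4; }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2; }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}

/* ctz(x) = 31 - clz(x & -x): x & -x keeps only the lowest set bit,
   and that bit's position follows from its leading-zero count.
   Returns 32 for x == 0. */
static unsigned ctz32_from_clz(uint32_t x) {
    if (x == 0) return 32;
    uint32_t lowest = x & (0u - x);
    return 31u - clz32_soft(lowest);
}
```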
Some of what the ABI mandates is currently still missing for ARM, so if you run into problems, let me know or complete the missing pieces yourself.
It is not premature optimization to think about the performance of your software on target devices. In fact, the whole point of this repo is optimization. In Zig we can switch things to 32-bit super easily for low-end devices. We just need to make a comptime conditional which checks for the CPU features we're looking for. I might order one of these devices and test out performance on there.
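As a rough illustration of that kind of compile-time switch, here is an analog in C using the preprocessor rather than Zig's comptime. It is a sketch under one stated assumption: pointer width is used as a proxy for "64-bit operations are cheap", whereas a real check would look at the specific CPU features. The names `swar_word`, `SWAR_ONES`, and `swar_broadcast` are hypothetical.

```c
#include <stdint.h>

/* Assumption: pointer width stands in for the CPU feature check. */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef uint64_t swar_word;   /* 64-bit target: 8 bytes per step */
#else
typedef uint32_t swar_word;   /* 32-bit target: 4 bytes per step */
#endif

enum { SWAR_BYTES = sizeof(swar_word) };

/* The byte 0x01 repeated across the whole word, whatever its width. */
#define SWAR_ONES ((swar_word)(~(swar_word)0 / 0xFFu))

/* Copy one byte value into every byte lane of the word. */
static swar_word swar_broadcast(uint8_t b) {
    return SWAR_ONES * b;
}
```

The rest of the code would be written against `swar_word` and `SWAR_BYTES`, so the word width is chosen in exactly one place.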
I just got done putting in a lot of hours learning SWAR techniques and implementing pretty efficient SWAR fallbacks. I have not committed it yet, as the code still needs some reorganizing, but it's in the works!
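For readers unfamiliar with SWAR (SIMD within a register), here is a small generic example in C: finding a byte inside a 32-bit word without a per-byte comparison loop. This is a textbook technique, not the fallback code mentioned above; `zero_byte_mask` and `find_byte4` are illustrative names.

```c
#include <stdint.h>

/* Returns 0x80 in every byte lane of x that is zero and 0x00 elsewhere.
   Adding 0x7F to the low 7 bits of a lane never carries into the next
   lane, so the result is exact for every lane. */
static uint32_t zero_byte_mask(uint32_t x) {
    const uint32_t low7 = 0x7F7F7F7Fu;
    uint32_t t = (x & low7) + low7;  /* bit 7 set iff low 7 bits nonzero */
    return ~(t | x | low7);          /* bit 7 set iff the lane was zero */
}

/* Index (0..3) of the first byte equal to c among the 4 bytes at p,
   or 4 if there is none. */
static unsigned find_byte4(const unsigned char *p, unsigned char c) {
    /* Assemble the word with byte 0 in the lowest lane, so the result
       does not depend on the host's endianness. */
    uint32_t w = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    /* XOR with c in every lane turns matching bytes into zero bytes. */
    uint32_t m = zero_byte_mask(w ^ (0x01010101u * c));
    if (m == 0) return 4;
    unsigned i = 0;
    while ((m & 0x80u) == 0) { m >>= 8; i++; }
    return i;
}
```

In a tokenizer the final loop would typically be replaced by a trailing-zero count divided by 8, which is where the ctz question earlier in the thread comes in.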
Update: I tested out the tokenizer on a RISC-V SiFive U74 and it's ~1.5x faster than the legacy version! With some more work, I might be able to improve it further, but I am happy with this result nonetheless.
Sounds awesome. I'm super curious about the SWAR techniques and the perf graph once you feel they are ready to be shown. :)
I think it would be fun to present it on Zig showtime. We'll see!
I guess you have probably made up your mind about 1-2 with a draft implementation, so I wanted to ask about it.