A while ago I got hold of a cheap Sipeed Lichee RV RISC-V development board. After finally getting it up and running, I wondered if and how well darktable would work on RISC-V. The answer is: surprisingly well, if the hardware is fast enough…
The Sipeed Lichee RV board
This is basically the slowest and cheapest Linux-capable RISC-V board you can currently get. The base board has an Allwinner D1 SoC with a single XuanTie C906 64-bit RISC-V processor core clocked at 1.0 GHz, 512 MB or 1 GB of DDR3 RAM, a 4K-capable GPU, a microSD card slot for storage and a USB-C port. The single core is supposed to be a little bit faster than the ARM core in the original Raspberry Pi Zero. CPU identification doesn't tell us much:
    sipeed@sipeed:~$ lscpu
    Architecture:        riscv64
    Byte Order:          Little Endian
    CPU(s):              1
    On-line CPU(s) list: 0
    sipeed@sipeed:~$ cat /proc/cpuinfo
    processor : 0
    hart      : 0
    isa       : rv64imafdc
    mmu       : sv39
    uarch     : thead,c906
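The isa line is the one part of this output worth decoding. The following Python sketch splits it into the register width and the single-letter standard extensions. The helper name decode_isa and the lookup table are made up for this illustration, and the sketch only handles the simple form shown above, not multi-letter extensions.

```python
# Standard single-letter RISC-V extensions (names as used in the ISA specification).
EXTENSION_NAMES = {
    "i": "base integer instruction set",
    "m": "integer multiplication and division",
    "a": "atomic instructions",
    "f": "single-precision floating point",
    "d": "double-precision floating point",
    "c": "compressed (16-bit) instructions",
    "v": "vector operations",
}


def decode_isa(isa):
    """Return (register width, [(letter, description), ...]) for a simple ISA string.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    isa = isa.strip().lower()
    if not isa.startswith("rv"):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    body = isa[2:]
    # Collect the leading digits, which give the register width (32, 64, ...).
    width_digits = ""
    while body and body[0].isdigit():
        width_digits += body[0]
        body = body[1:]
    if not width_digits:
        raise ValueError(f"missing register width in: {isa!r}")
    # Only the single-letter part before the first underscore is handled here.
    letters = body.split("_")[0]
    described = [
        (letter, EXTENSION_NAMES.get(letter, "not covered by this sketch"))
        for letter in letters
    ]
    return int(width_digits), described


xlen, extensions = decode_isa("rv64imafdc")
print(f"{xlen}-bit registers")
for letter, description in extensions:
    print(f"  {letter}: {description}")
```

For the string reported by this board, the sketch lists i, m, a, f, d and c. No v shows up, so the kernel does not advertise a vector extension here.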
In contrast to virtually all other single-board computers (SBCs), the base board doesn't have any connectivity options other than USB-C. No HDMI, no WiFi, nothing. Theoretically it should be possible to get everything up and running with the base board alone, by connecting a USB Ethernet dongle to the USB-C port and supplying power via two separate pins. But that's something for experts, so usually you'll want the additional dock. It adds a WiFi chip, an additional USB-A port, a full-size HDMI port, a pin header row and some other ports that matter more for embedded devices.
The Sipeed Lichee RV base board sells for about 21 € on AliExpress; with the additional dock and shipping it's about 36 €. That is a lot more than a Raspberry Pi Zero W, but still much less than the two other RISC-V development boards currently available (Allwinner Nezha, >130 €, and StarFive VisionFive, about 200 €). Better to have a board than not to have one at all. There are several more development boards coming, like the StarFive VisionFive 2 (Kickstarter offers starting at about 60 € for a four-core StarFive JH7110 SoC with 4 GB of RAM, including taxes and shipping) and the Pine64 Star64 (same StarFive JH7110 SoC, even has a PCIe port, price expected to be around 60-80 €), but these won't be shipped before the end of 2022.
I can't recommend getting the Sipeed Lichee RV board. Performance is very bad in all regards: I get about 10 MByte/s reading from my fastest microSD card and 800 kByte/s transferring data via SFTP. The WiFi chip on the dock doesn't have a proper antenna and only picks up a signal if the hotspot is very close, so I had to attach a USB Ethernet dongle. The price was okay-ish when there were no better options, but much better ones will be available soon. Also, the software and the community are not very well-developed. I had to make my own operating system image because the official image was outdated and the kernels of most alternative images didn't support USB Ethernet dongles. I also don't hear much good about the Nezha and VisionFive boards; apparently they have electrical issues and don't reliably boot from SD cards.
If you are looking into RISC-V, wait for the Pine64 Star64. At the very least, community support will surely be much better than anything Sipeed can offer, and the board will have a PCIe slot, which can be used to attach an NVMe SSD or other goodies.
Building darktable on RISC-V
darktable is my favourite converter software for raw files. It has a lot of optimizations for various CPU architectures and CPU features, and it also supports GPUs via OpenCL. This also means that it doesn't just let you compile the source code on everything you have and then wait for the compiler errors. Compiling it on RISC-V fails immediately because of the strict CPU support macros.
Luckily this is rather easy to fix. The following patch works against the 4.0.0 stable release and all git commits up to at least ab7e374330a9e50abad0f2784bda4b319e770239 (Fri Aug 19 09:58:25 2022 +0200):
    diff --git a/src/is_supported_platform.h b/src/is_supported_platform.h
    index 165f071a5..b7afc8b0c 100644
    --- a/src/is_supported_platform.h
    +++ b/src/is_supported_platform.h
    @@ -42,14 +42,21 @@
     #define DT_SUPPORTED_PPC64 0
     #endif
     
    +#if (defined(__riscv) || defined(__riscv__)) && (__riscv_xlen==64)
    +#define DT_SUPPORTED_RISCV64 1
    +#else
    +#define DT_SUPPORTED_RISCV64 0
    +#endif
    +
     #if DT_SUPPORTED_X86 && DT_SUPPORTED_ARMv8A
     #error "Looks like hardware platform detection macros are broken?"
     #endif
     
    -#if !DT_SUPPORTED_X86 && !DT_SUPPORTED_ARMv8A && !DT_SUPPORTED_PPC64
    -#error "Unfortunately we only work on amd64, ARMv8-A and PPC64 (64-bit little-endian only)."
    +#if !DT_SUPPORTED_X86 && !DT_SUPPORTED_ARMv8A && !DT_SUPPORTED_PPC64 && !DT_SUPPORTED_RISCV64
    +#error "Unfortunately we only work on amd64, ARMv8-A, PPC64 (64-bit little-endian only) and RISC-V (64-bit only)"
     #endif
     
    +#undef DT_SUPPORTED_RISCV64
     #undef DT_SUPPORTED_PPC64
     #undef DT_SUPPORTED_ARMv8A
     #undef DT_SUPPORTED_X86
After this the source builds with the standard gcc 12.1.0 that comes with the current Debian Sid, which most images are based on. The only difference to the normal process is that we have to manually disable all the OpenCL stuff. I also found 3 to be a good level of concurrency on the Lichee RV, and GCC 12 actually even supports a tuning option specifically for the XuanTie C906 CPU:
    $ CFLAGS="-mtune=thead-c906" CXXFLAGS="${CFLAGS}" cmake -DHAVE_OPENCL=Off -DTESTBUILD_OPENCL_PROGRAMS=Off ..
    $ make -j3
Compiling took 297 minutes, 50.544 seconds. So pretty much 5 hours.
Running darktable on RISC-V
I wanted darktable to have access to the full resources of the board, so I disabled the running LXDE desktop and used X11 forwarding to my workstation for graphical output.
At startup, darktable emits a number of errors and warnings pertaining to the unknown CPU architecture. These are not critical; they just mean that all the optimized code paths are disabled and the generic (slow) ones are used instead.
    [dt_detect_cpu_features] Not implemented for this architecture.
    [dt_detect_cpu_features] Please contribute a patch.
    [dt_init] SSE2 instruction set is unavailable.
    [dt_init] expect a LOT of functionality to be broken. you have been warned.
    [dt_detect_cpu_features] Not implemented for this architecture.
    [dt_detect_cpu_features] Please contribute a patch.
    [dt_codepaths_init] will be using experimental plain OpenMP SIMD codepath.
I had a look at the dt_detect_cpu_features function to check what would be missing to add RISC-V support. There currently isn't anything to do here, since there would be no code that could use the result of a CPU feature detection on RISC-V.
Apart from the abysmal performance, darktable works exactly as expected on RISC-V. During the first couple of tries it would often crash with the following error message, but this hasn't happened for a while now, so I suppose it's something that has been fixed in glibc/gcc/etc.
    Inconsistency detected by ld.so: dl-runtime.c: 77: _dl_fixup: Assertion `ELFW(R_TYPE)(reloc->r_info) == ELF_MACHINE_JMP_SLOT' failed!
Performance comparison
Speaking of performance: it really is abysmal. Raw converters are never the fastest image editing tools, since they process everything with 32- or 64-bit floating point numbers internally. darktable puts particular emphasis on precision. The following measurements were generated with the exact same darktable profile on all devices, using the same 45.7 megapixel 14-bit raw file taken with my Nikon D850 (the picture seen in the feature image of this post) and by running darktable-cli to remove the overhead of the GUI (where possible). The edits to this picture use a rather standard set of processing modules.
My workstation has a Ryzen 5900X CPU, and most of the processing is offloaded to a Radeon RX 6600 XT GPU using OpenCL. Generating the preview thumbnail takes about 0.116 seconds. Exporting it at 7 MP (3240×2160 pixels) resolution, something I use all the time for full-size previews on my 4K screens, takes about 1.5 seconds. Exporting the picture at full 45.7 MP resolution (8288×5520 pixels) takes a few seconds.
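To get a feeling for what these resolutions mean on a board with 512 MB or 1 GB of RAM, here is a rough Python sketch of the size of one intermediate image buffer. It assumes four 32-bit floating point channels per pixel; that layout is an assumption made for this estimate, not something measured here, and the function names are made up for the illustration.

```python
BYTES_PER_FLOAT32 = 4
CHANNELS = 4  # assumption: four float channels per pixel


def pixel_count(width, height):
    """Number of pixels in an image of the given dimensions."""
    return width * height


def buffer_bytes(width, height):
    """Size in bytes of one full-image buffer under the assumptions above."""
    return pixel_count(width, height) * CHANNELS * BYTES_PER_FLOAT32


for label, (width, height) in {
    "7 MP export": (3240, 2160),
    "full resolution": (8288, 5520),
}.items():
    megapixels = pixel_count(width, height) / 1e6
    mebibytes = buffer_bytes(width, height) / 2**20
    print(f"{label}: {megapixels:.1f} MP, about {mebibytes:.0f} MiB per buffer")
```

Under these assumptions a single full-resolution buffer is already in the range of the board's total memory, and a processing pipeline needs more than one buffer at a time.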
On the Lichee RV, generating the thumbnail already takes 69.107 seconds, so about 600 times as long as on the Ryzen/Radeon system. Exporting the picture at full resolution takes 578 minutes and 35.126 seconds, more than nine and a half hours…
Okay, this comparison was maybe extreme, so let's make it more realistic and use my AMD Ryzen 5 4700U laptop instead. 6 cores, no OpenCL. Generating the thumbnail is in the same 0.1 second ballpark as on the Ryzen 5900X, exporting at 7 MP takes 4.74 seconds and exporting at 45.7 MP takes 30.616 seconds.
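The slowdown factors follow directly from the timings quoted so far. A small Python sketch of the arithmetic, using only numbers from this post (the variable and function names are made up for the illustration):

```python
def to_seconds(minutes, seconds):
    """Convert a 'minutes, seconds' timing into seconds."""
    return minutes * 60 + seconds


# Timings as measured in this post, in seconds.
lichee_thumbnail = 69.107
workstation_thumbnail = 0.116
lichee_full_export = to_seconds(578, 35.126)
laptop_full_export = 30.616

thumbnail_slowdown = lichee_thumbnail / workstation_thumbnail
laptop_slowdown = lichee_full_export / laptop_full_export
lichee_export_hours = lichee_full_export / 3600

print(f"Thumbnail, Lichee RV vs. workstation: {thumbnail_slowdown:.0f}x slower")
print(f"Full export, Lichee RV vs. laptop:    {laptop_slowdown:.0f}x slower")
print(f"Full export on the Lichee RV:         {lichee_export_hours:.2f} hours")
```

The surrounding text quotes rounded values. The workstation's full-resolution export is left out because only an approximate time is given for it.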
So the Lichee RV is about 4000 times slower than the Radeon RX 6600 XT and about 1100 times slower than the Ryzen 5 4700U. But it works 🙂