The CPU of the Raspberry Pi 4 is the only 64-bit ARM chip I've ever seen that doesn't include native AES instructions. As a result, it's much lower performance for network or disk encryption use cases.
Up until the RPi 4 I thought AES instructions were a part of AArch64, but I was wrong. Such a weird omission to make on (I expect) Broadcom's side. All other 64-bit ARM SBCs just have it, even the low cost ones.
Fast table-based software AES implementations leak bits of the secret key through array offsets, so anything that shares the chip's cache can recover the key. A software AES implementation that resists side-channel attacks by using bitslicing is slow; you would be better off using chacha20-poly1305.
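To make the "array offsets" point concrete, here is a minimal Python sketch of the access pattern. It is an illustration only: Python gives no timing guarantees, `SBOX` is a stand-in table rather than the real AES S-box, and the full-scan lookup shown is a simpler relative of bitslicing, not bitslicing itself. All names are hypothetical.

```python
# Illustration of the access pattern only; Python is not constant-time.
# SBOX is a stand-in 256-entry table, NOT the real AES S-box.
SBOX = [(i * 7 + 3) % 256 for i in range(256)]


def lookup_leaky(secret_byte: int) -> int:
    # The table index (and so the cache line touched) depends on the secret.
    return SBOX[secret_byte]


def lookup_full_scan(secret_byte: int) -> int:
    # Read every entry and keep the wanted one with a mask, so the sequence
    # of memory accesses is the same for every secret value.
    result = 0
    for i, value in enumerate(SBOX):
        diff = i ^ secret_byte
        # is_equal is 1 when diff == 0 and 0 otherwise, without branching.
        is_equal = ((diff - 1) >> 8) & 1
        mask = -is_equal & 0xFF
        result |= value & mask
    return result
```

Both functions return the same value; the second does 256 reads per lookup, which gives a feel for why side-channel-resistant software AES costs so much more.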
AFAICT one is for the hardware AES instructions, and one is a bitsliced (so constant time) impl for 32-bit ARMv7, and one is for low end ARMv4. The latter sounds like it might still be vulnerable to timing sidechannel attacks but no idea if it is still used by default in any configuration...
I don't know about OpenSSL's status on old / weak platforms. People who need AES without hardware AES instructions should be using BearSSL code or the Rust code that has been reviewed by the author of BearSSL (see e.g. https://github.com/RustCrypto/block-ciphers/issues/65).
To build on this: I was able to implement AES and OCB mode by just reading papers without any code in them. I was, however, not able to implement GCM reliably even by translating a "simple" reference C implementation into Scheme. Sure, it worked, but even after 2 rewrites it still did not produce the same output as the simple reference implementation for some edge cases.
All this was done on a just-for-fun basis, but it ended up just making me frustrated so I stopped trying.
Scenarios where the data read from or written to disk does not go over the network and needs to be processed locally on the RPi at data rates faster than these do exist, but they are a small minority.
They are a recommended extension.
I expect that the crypto instructions are not present on the RPi3/4 because of legal reasons, to allow it to be exported to Iran and such...
One of the main benefits of running a proper aarch64 userspace is the ability to run a modern, functional Firefox install. The 32-bit builds usually don't support anything newer than a very old LTS release. It's been hell trying to build and/or run that target (see: https://bugzilla.mozilla.org/show_bug.cgi?id=1452128 )
This is especially important if considering use as a low-end desktop as Chromium will eat through 4GB of ram like it's nobody's business.
Where can I get it? I didn't think Mozilla offered a precompiled version for aarch64 Linux. I've been using the Debian ESR release (68.0.5) from the Armbian repository on my NanoPi M4.
One of the things that I find great about Raspbian is that the boot files are stored on a FAT32 partition which is readable from any OS, and there are some nice considerations for headless setups: drop in a wpa_supplicant.conf to configure Wi-Fi, `touch ssh` to get OpenSSH enabled at next boot. Do any of the alternative OSes, like Ubuntu, offer this?
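For anyone scripting that headless setup, a rough Python sketch follows. It assumes the FAT32 boot partition is already mounted somewhere; the function name, the example mount point and the credentials are placeholders, and the config is the usual minimal `wpa_supplicant.conf` layout rather than anything Raspbian-specific.

```python
from pathlib import Path

# Minimal wpa_supplicant.conf layout; country, SSID and PSK are placeholders.
WPA_TEMPLATE = """ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country={country}

network={{
    ssid="{ssid}"
    psk="{psk}"
}}
"""


def prepare_headless(boot_dir, ssid, psk, country="US"):
    """Drop the two files described above into a mounted boot partition."""
    boot = Path(boot_dir)
    # Equivalent of `touch ssh`: an empty file named "ssh".
    (boot / "ssh").touch()
    # Wi-Fi settings picked up at next boot.
    (boot / "wpa_supplicant.conf").write_text(
        WPA_TEMPLATE.format(country=country, ssid=ssid, psk=psk)
    )
```

Usage would be something like `prepare_headless("/media/me/boot", "MyNetwork", "my-passphrase")`, where the mount point depends on your OS. The sketch does no escaping, so an SSID or passphrase containing a double quote would need extra handling.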
I recently installed Arch Linux on my Pi Zero, it also uses FAT32 for the boot partition. I did the initial setup the way you described. Check out https://archlinuxarm.org/ , it's pretty great.
So that's how they came up with that. As someone who normally installs "normal" Linux systems I find it quite irritating that you have to put a file somewhere, especially the boot record, to "enable" ssh of all things. Up until now I considered it a weird decision. (and I still think it is not optimal) I deploy my RPi's in the field and don't put a monitor on them so I would expect ssh running as default. First time I found out about it was when reading the unit file when I was building a custom image based on raspbian, so I wouldn't consider it obvious :) When working on a Linux Device I just mount the main partition and do my customizations.
Probably the right decision then. As I don't put them on public networks and delete the pi user this is of little concern to me, but given the target group, it is a simple safety measure.
This is how it works in Armbian. It forces a password change on first login. It can be annoying if you intend on deleting the alarm user right after that, but I can easily see why. "Default" passwords are always suboptimal.
> I deploy my RPi's in the field and don't put a monitor on them so I would expect ssh running as default. First time I found out about it was when reading the unit file...
So I guess you weren't building those images? If you're building headless RPi images that's something you learn immediately.
I have a tendency to learn stuff in reverse. I used to prepare an SD card, connect a monitor and keyboard, and enable everything as needed. Then I got the task to deploy n RPis and looked at the image first to customize it for the requirements we had. That's when I looked around, saw the unusual unit file and tried to understand why it would look at the boot partition to start a service. In the end I think I added the symlink to start the service through systemd.
I don’t know the details of how they implement this but it sounds like you want a systemd drop in file to override the ConditionPathExists (or similar) directive (I’m assuming you can’t or don’t want to modify the upstream unit directly given the use of symlinks).
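A sketch of what writing such a drop-in could look like from an image customisation script, in Python. Big assumptions: that the unit really uses `ConditionPathExists` (it may use a different `Condition…` directive), and the unit name used in the usage note is made up. In systemd, assigning an empty string to a condition resets the condition list, which is what the override relies on.

```python
from pathlib import Path


def write_condition_override(systemd_dir, unit_name):
    """Write a drop-in that clears the unit's path condition.

    systemd_dir is e.g. <mounted rootfs>/etc/systemd/system;
    unit_name is whatever unit carries the condition (an assumption here).
    """
    dropin_dir = Path(systemd_dir) / f"{unit_name}.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    override = dropin_dir / "override.conf"
    # An empty assignment resets conditions inherited from the original unit.
    override.write_text("[Unit]\nConditionPathExists=\n")
    return override
```

For example, `write_condition_override("/mnt/rootfs/etc/systemd/system", "example-ssh.service")`, with both the path and the unit name being hypothetical. On a running system you would also need `systemctl daemon-reload`; when editing an image offline that step is not needed.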
Or you can chroot to the mounted Raspbian root partition and do a normal `systemctl enable ssh` as part of your image customisation. Because, to be clear, you do not have to put a file in /boot to enable SSH, as it was claimed above. That is purely a helpful shortcut.
I'm interested in trying it. In the past I used a system based on Debian's Live Build (https://github.com/gumstix/live-build) and I was wondering how it differs, but I'm just now noticing even this suggests using Yocto instead.
One downside: you will probably need a bit more memory to do the same work. An unmentioned benefit: better ASLR, for what that's worth.
> A 64 bit system means that RAM can be accessed in 8 byte read/writes per instruction.
I mean...kind of but that's probably not really what's happening in this benchmark. Firstly, in that test `memset()` is surely using NEON instructions internally on both ARMv7 and AArch64, which can load/store up to 32 bytes in a single instruction. Further that test is really just showing the bandwidth of the memory controller. I'm not sure why AArch64 would matter there. It's possible that `memset` / `memcmp` are using smarter prefetching instructions in AArch64.
I've never understood why zswap decompresses the pages before writing them out to disk. Disk IO is the slowest part of the whole chain, you've already paid to compress the data, why on earth would you throw that away AND use up more disk IO?
My guess is that this is due to some inherent limitations of the Linux kernel swap mechanism? Possibly so that you can safely disable zswap at runtime (you can)?
Or quite possibly no one implemented it just yet & writing uncompressed pages is transparently compatible with what normal swapping does.
Noob question: What kind of optimizations are even responsible for those kinds of performance improvements?
32 bits is enough to utilize all 4 GB of the Raspberry Pi4, so I figured the only benefit of using a 64 bit OS would be to support 64 bit software. Why would a 64 bit build perform better than a 32 bit build on the same hardware?
Integer division is mandatory from v7VE on upwards (so roughly Cortex-A9, A15 and later), and it's definitely in the 32-bit support of any 64-bit capable core. If Raspbian is compiling its packages for the lowest common denominator arm v6 cpu some Pis have, then you'd get some slowdown from division being an out-of-line call (to library code that can use the hw insn), but you'd see that part of the speedup just from building for 32-bit but optimised for newer cores than v6, I think.
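To give a feel for what "division as library code" means on a core with no hardware divide: the compiler emits a call to a helper (on ARM EABI, `__aeabi_uidiv` for unsigned 32-bit), which has to compute the quotient in software. Below is a simple shift-and-subtract sketch in Python; real runtime libraries use more optimised routines, so treat this as an illustration of the idea and not as what libgcc actually does. The function name is made up.

```python
def udiv32(numerator: int, denominator: int):
    """Unsigned 32-bit restoring division: returns (quotient, remainder)."""
    if denominator == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    # One iteration per bit, most significant first: up to 32 loop steps
    # where a hardware divide is a single instruction.
    for bit in range(31, -1, -1):
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder
```

`udiv32(100, 7)` gives `(14, 2)`, matching `divmod(100, 7)`.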
Slightly off topic, but kind of on theme at least: does anyone know where to get an SBC (single board computer) with 6GB of RAM... or even 4.5GB (that doesn't cost like ~$300)? The Raspberry Pi is almost perfect, I just need slightly more RAM per unit.
There are a few more. You could also build your own.
All you need is a micro ITX motherboard + RAM, a processor and something to power it. Get them used and it will be way more powerful.
The Raspbian images by design support all versions of the Pi, back to the original. The developers have stated they don't have the resources to maintain separate 32- and 64-bit userland images, but provide the config.txt option to at least enable the 64-bit kernel.
If you want a 64-bit userland, there are other options besides Raspbian. You can also try an in-place arch change from Raspbian to aarch64, but I imagine that would break some things, since the Raspbian-specific package repository only has packages for the armhf arch.
I just gave it a try and now apt is installing a bunch of new kernel .img files. So for me, yes, this seemed to be required before 64bit kernels were installed. I'm feeling surprised. I'm 40% of the way through the install and hoping it reboots okay in the end.
Edit: to be more clear, I also ran 'apt update' afterwards
I've also customised the image by adding users, public keys and such. Removing some of the cloud cruft.
These instructions make it very, very easy and you can do this on an x86 machine. Just make sure to use /usr/bin/qemu-aarch64-static (64 bits) instead of qemu-arm-static (32 bits).
The flip side: if you have a 1GB Pi, you're going to be wasting a whole lot of RAM on wider pointers, and you may be more constrained by RAM than by CPU throughput.
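Back-of-the-envelope version of the pointer cost, as a small Python sketch. The 4-byte and 8-byte sizes are the usual pointer widths for 32-bit and 64-bit userlands; the pointer count is an arbitrary example, not a measurement of any real workload.

```python
import struct

ILP32_POINTER = 4  # bytes per pointer in a 32-bit userland
LP64_POINTER = 8   # bytes per pointer in a 64-bit userland


def extra_pointer_bytes(pointer_count: int) -> int:
    """Additional bytes spent on pointers when moving from 32-bit to 64-bit."""
    return pointer_count * (LP64_POINTER - ILP32_POINTER)


# Pointer size of the interpreter you are running right now (4 or 8).
print("native pointer size:", struct.calcsize("P"))
# Hypothetical workload holding 10 million live pointers.
print("extra bytes:", extra_pointer_bytes(10_000_000))
```

Ten million pointers cost an extra 40,000,000 bytes, roughly 38 MiB, which is a visible slice of a 1GB board.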
(Also, 64 bit Pi is not nearly as well supported as 32 bit currently. Does hardware GL work yet? What about if you're going to do MIPI, etc?) IMO it's worth waiting for a little more maturity.
I realize I'm a little late to the party as this post is almost a day old, however as a Pi enthusiast I feel obligated to mention RackN's edgelab [1] project which leverages Pi4's PXE boot to rapidly build a 64-bit mini Pi lab and then a k3s cluster on top.
Full disclosure: I don't work for RackN but am a customer; my use case is zero-touch ESXi cluster and Linux builds but I like the tool and have way too many Pis.
I’ve been running Ubuntu on most of mine for a while now, but I go back to Raspbian now and then for the hardware support (graphics, in particular, are a bit of a pain, but I’ve also had issues with the built-in Wi-Fi). My “lab” Pi 4 is on Raspbian also because I like noodling in Mathematica (which I don’t think you can get in ARM64 at all for any distro).
It would be awesome to have a decent ARM64 SBC with a good GPU (able to drive 2 4K monitors and run Firefox/VS Code). Any recommendations from the non-Pi crowd?
Last time I booted up my Jetson Nano, it was so slow. When I would ssh in it would take like two seconds to get a response after typing ls. Obviously this was sd card reads. It just doesn't have the polish/optimisations raspberry pi has.
Has anyone gotten the aarch64 Raspberry Pi image of Alpine to work on the Pi 4? It seems like a good fit for a RAM constrained system but I could never get it past the colorful boot screen.
It's fairly easy to trim down what runs on Raspbian, so if your only concern is background tasks eating RAM, you don't need something like Alpine to fix that issue. We use Pis at my work for running slideshows, and while performing their duty they use about 123MB of RAM for all running software (mostly our software + monitoring tools).
Just run current firmware from last September or later and keep the Pi 4 standing up on its side, naked; it will never throttle no matter what you throw at it (unless maybe your room is 80°F or more).
I've had my pi4 compile stuff for days on end and it does not throttle if the above setup is followed.
I'd still recommend a fan or something because the Pi 4 still gets quite hot, around 50°C idle without any cooling, and throttling only starts at 80°C, which is crazy hot to the touch.
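If you want to watch how close you get to that limit, a small Python sketch: Linux exposes thermal sensors under `/sys/class/thermal/`, reporting millidegrees Celsius. That `thermal_zone0` is the SoC sensor on your board is an assumption, the 80°C threshold is just the figure quoted above, and the function names are made up.

```python
THROTTLE_C = 80.0  # throttling threshold quoted above, in degrees Celsius


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading such as '50000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def headroom_c(temp_c: float, limit_c: float = THROTTLE_C) -> float:
    """Degrees left before the throttling threshold is reached."""
    return limit_c - temp_c


def read_cpu_temp(path: str = "/sys/class/thermal/thermal_zone0/temp") -> float:
    # Assumption: zone 0 is the CPU/SoC sensor on this board.
    with open(path) as f:
        return parse_millidegrees(f.read())
```

On the Pi itself, `headroom_c(read_cpu_temp())` would tell you how many degrees you have left; at the 50°C idle figure mentioned above that is 30 degrees.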
I just put a copper and aluminium thing with large mass (about 100g) directly on the processor. Under light load it is about 20 over ambient. Despite CPU stress tests, I have never had it throttle.
Make sure it touches the USB chip.
Edit: my rock pi has a proper heat sink and it runs even cooler. 12 over ambient
As far as I'm aware, ilp32 support for aarch64 has been proposed and implemented, but the patches weren't accepted upstream in the kernel because nobody was able to make a sufficiently convincing case that there was enough benefit to justify adding a whole new ABI to the kernel (there are a few benchmarks where it's noticeable but mostly it just doesn't make enough difference to be worthwhile, AIUI). Possibly the situation has changed since I last heard about it.
> I think the main issue keeping the pi with a 32b userspace is the lack of availability of 64bit GPU firmware
Uh, no. It's there because of the design goals and priorities of the Foundation. They want to be able to distribute a distribution that runs on any raspberry pi generation. Performance is of secondary concern to them. So Raspbian remains at 32-bit.
> this will be combined with a 32bit userland, for the reason mentioned above. - the amount of work needed to update all the libraries that talk to the GPU.
My eyes kinda glaze over when I see that because the vc4-fkms-v3d driver works fine enough with mesa on 2D and 3D shit using a full 64-bit system. Maybe it's not as fast as the original proprietary driver, I don't know as I've never used the original proprietary one nor have I seen anyone benchmark it.
The fully open source driver stack can't come soon enough. (yes I know firmware files still exist, but they don't prevent you from running a 64 bit kernel and userspace)