Easton Man's Channel

@EastonMan 看的新闻
+碎碎念
+膜大佬
+偶尔猫猫
+伊斯通听的歌

15:15 · Nov 16, 2025 · Sun

Daniel Lemire's blog
AMD vs. Intel: a Unicode benchmark

source

Roughly speaking, our processors come in two types, the ARM processors found in your phone and the x64 processors made by Intel and AMD. The best server processors used to be made by Intel. Increasingly, Intel is struggling to keep up. Recently, Amazon has…

13:20 · Nov 11, 2025 · Tue

Five Years of Apple Silicon: M1 to M5 Performance Comparison - MacRumors https://share.google/4VsFpfpjeuiZdWPYA

MacRumors

Five Years of Apple Silicon: M1 to M5 Performance Comparison

Today marks the fifth anniversary of the Apple silicon chip that replaced Intel chips in Apple's Mac lineup. The first Apple silicon chip, the M1, was unveiled on November 10, 2020. The M1 debuted in the MacBook Air, Mac mini, and 13-inch MacBook Pro.

06:42 · Nov 10, 2025 · Mon

Daniel Lemire's blog
Automated Equality Checks in C++ with Reflection (C++26)

In C++, comparing two objects for equality is straightforward when they are simple types like integers or strings. But what about complex, nested structures? You may have to implement the comparison (operator==) manually for each class, which is error-prone and tedious.

Consider a person class.

class person {
public:
    person(std::string n, int a) : name(n), age(a) {}
private:
    std::string name;
    int age;
    std::vector<hobby> hobbies;
    std::optional<uint64_t> salary;
};

To compare two person objects, we need to check if their names, ages, hobbies, and salaries match. Hobbies are a vector of hobby objects, each with a name. Salaries are optional. Manually writing operator== for this would involve checking each field, and if hobby changes, you would have to update it.

Without reflection, you would hae to write something ugly of the sort.

bool operator==(const person& a, const person& b) {
    return a.name == b.name && a.age == b.age && a.hobbies == b.hobbies && a.salary == b.salary;
}

But this assumes hobby has operator==, and std::vector&LThobby> has it only if hobby does. If hobby lacks ==, compilation fails. Moreover, for large classes, it becomes annoying.

Instead, I wrote a small example that fully automates the process. You just add an operator overload.

bool operator==(const person& a, const person& b) {
    return deep_equal::compare(a, b);
}

The trick is that C++26 allows you to query a type’s members at compile time and iterate over them. The function std::meta::nonstatic_data_members_of, which gives us a list of members.

My complete implementation is less than a hundred lines of code, and it comes down to the following lines of code.

bool compare_same(const T& a, const T& b) {
    template for (constexpr auto mem : std::define_static_array(std::meta::nonstatic_data_members_of(^^T, std::meta::access_context::unchecked()))) {
        if (!compare_same(a.[:mem:], b.[:mem:])) {
                return false;
        }
    }
    return true;
}

It relies on the reflection operator ^^T, which produces a std::meta::info value representing the type itself. Inside the function body, std::meta::nonstatic_data_members_of(^^T, ...) queries all non-static data members of T, returning a vector of reflection values in declaration order. The unchecked access context deliberately bypasses visibility rules, allowing comparison of private and protected members. This reflection vector is then wrapped in std::define_static_array, which materializes it into a static array, making it iterable in a compile-time loop. The template for loop iterates over each member reflection mem at compile time. For each, the splice expression a.[:mem:] directly accesses the corresponding member. The [: :] is more or less the inverse of the reflection operator (^^T): think of it as a going into a meta universe with ^^ and coming back into the standard C++ universe with [: :].

The approach ensures zero runtime overhead, as all reflection, splicing, and looping are resolved during compilation. I posted a full demonstration.

source

08:47 · Nov 6, 2025 · Thu

Arch Linux: Recent news updates
waydroid >= 1.5.4-3 update may require manual intervention

The waydroid package prior to version 1.5.4-2 (including aur/waydroid) creates Python byte-code files (.pyc) at runtime which were untracked by pacman. This issue has been fixed in 1.5.4-3, where byte-compiling these files is now done during the packaging process.

As a result, the upgrade may conflict with the unowned files created in previous versions. If you encounter errors like the following during the update:

error: failed to commit transaction (conflicting files)

waydroid: /usr/lib/waydroid/tools/__pycache__/__init__.cpython-313.pyc exists in filesystem

waydroid: /usr/lib/waydroid/tools/actions/__pycache__/__init__.cpython-313.pyc exists in filesystem

waydroid: /usr/lib/waydroid/tools/actions/__pycache__/app_manager.cpython-313.pyc exists in filesystem

You can safely overwrite these files by running the following command:

pacman -Syu --overwrite /usr/lib/waydroid/tools/\*__pycache__/\*

source
(author: George Hu)

01:31 · Nov 4, 2025 · Tue

属于CYY自己的世界
RISC-V 通用处理器，未来在哪呢

前言

相信大家也看到了，RISC-V 通用处理器生态圈的现状并不太平。

它市场竞争力低吗？确实！
我们能改变吗？我相信还有机会！

注：本文仅代表本人立场与观点，不代表任何组织、开源社区。

现状是什么样呢？

● 基础的 Linux 内核、发行版以及大量的开源软件支持
● QEMU 级别单核性能的现存硬件
● 碎片化的 ISA 遇上想建立新纪元的发行版
● 满足 Bug-free && 快的硬件几乎找不到
● 吵架时间过长的社区

过去几年我们经历了什么？

● ISA: RV64GC-> RV64GC_Zba_Zbb -> RVA22U64 + V + Zvfh -> RVA23U64
● CPU性能指标： CoreMark / Dhrystone -> SPECCPU
● Server Feature：Hypervisor、IOMMU、AIA

嗯，是好事，很大程度弥补了做一个通用核与 ARM 以及 x86 的差距。

冰山之下还藏着什么？

RVA23U64 这样的标准应该是一个良好的开始，但对比现在的 aarch64 与 x86 ，我认为还远远不够。而有许多真实的，SPECCPU 跑分反映不出来的问题，却被忽视了太多太多！

ISA 本身的设计

● RVV 需要太多 additional 的预测与Rename
● RVC 让宽前端不好做，也失去了一些拆分指令前搞事的可能性
● 实际的硬件可能实现了下一代 RVA Profile 的子集但不完全兼容，软件需要 handle 碎片化才能有良好的性能。 ● 不妨看看我的 rv64.zip
● 太多太多能在特定 workload 让编译器不好优化的小问题，我的博客就有 2 篇分析！ ● How CCMP reduce the pressure of branch predictor on aarch64 ● “Short-leg” of RISC-V

JIT

如果你在像 SG2042 这样的超多核 RISC-V CPU 上 Profiling 过 JIT 应用（如Box64、Java、v8），你一定对 RISC-V 目前 I-D Cache 同步操作只有 fence.i 有巨大的意见，惊叹它怎么可以这么慢！

而现在的 Server Workload，又恰好对 JIT 以及频繁的进程启动（比如FaaS）相当敏感。

fence.i 的问题：只能做当前核的 I-Cache 全部清空来维持同步。

所以，当 OS 下的一个 JIT 用户进程刚修改完指令希望 I-Cache 同步，需要怎么做呢：

1. 给内核发一个 syscall
2. 内核通过核间中断通知所有核心执行 fence.i （也可以通过 SBI Remote fence，whatever）
3. 核心清空非一致性的 I-Cache

把 JIT 里频繁的 Cache 修改变成了一个牵一发动全核的事情。即使硬件做了 I-D 一致性，将 fence.i 仅实现为刷流水线，现有的软件倘若不能 Probe ，依然需要全核 IPI ，让我们一起执行 fence.i 。

反观别的主流 ISA 怎么做呢：

● x86：自带I-D一致性， Intel 手册推荐执行任意一条能刷新流水线的指令（如CPUID）即可保证写入的指令马上可用。
● aarch64：拿到修改的虚拟地址，执行 dc cvau 刷走 D-Cache，再执行 dsb ish作为 fence ，再通过 ic ivau 全核广播 I-Cache Invalidate，再执行 dsb ish 作为fence，即可完成。

aarch64 的做法虽然看似麻烦，但实现了细粒度刷 Cache Line ，不依赖 syscall ，全程不依赖软件 IPI。甚至用虚拟地址避免了软件手动查页表的麻烦，因为刚修改完的指令一定在当前进程页表是有映射的。

而 RISC-V 这边未来的做法是啥呢：我们加一个扩展来 Probe 硬件存在 I-D 一致性：Ziccid。2025年快过了，至今还没 Ratify。等到 Ratify 了，能被 Linux Kernel hwprobe 收入，能被软件 probe 从而省掉 syscall ，又是很长很长的时间。而且至今仍然没有像 aarch64 这样对于小核足够友好的 va invalidate + bus snoop 指令。

安全特性

可能大家不知道的是，现在的消费级电子产品都越来越多部署了像 Memory Tagging 这样的软硬件协同的内存安全技术，通过很小的硬件面积避免了软件插桩做 Address Sanitizer 这样的事情的开销。Apple 更是全生态安全的代表，早早地在自家的设备上部署了相当多软硬件协同的安全技术，比如 PAC、PPL、MTE。我认为对于软件完整性保护，从而保护个人隐私非常有利，毕竟你也不希望你的手机后台静默地收一封邮件就被攻击者打穿 Kernel 这样的事情发生吧。

RISC-V 这边呢？很多扩展还处在正在推（例如 RISC-V Memory Tagging），刚 Ratify 的阶段（例如 RISC-V CFI），等到硬件实现，软件实现达到目前 x86、aarch64 类似的水平，也是需要很长很长的时间。

开源软件的发展机遇

真正的开源项目，绝不是一个小团队内部交流，把代码公开放在网上。是从一个小项目走出，有一群需要它的人、组织、企业一起完善它，最终凝聚了各个贡献者的知识，构建了一个足够强大的社区，日益繁荣，用户也随之受益。

而开源 ISA 、开源处理器，何尝不应该是这样的发展路线呢？

现阶段机遇在哪？

即使 RISC-V 在 ISA 与硬件层面都追上了 aarch64，其软件生态的问题也容易让市场望而却步。更别说目前 aarch64 与 x86 在生态上仍有很大的差距。

我们看到了 RISC-V 在嵌入式与 AI 领域的指令集可裁剪（Flexing RISC-V Instruction Subset Processors to Extreme Edge）、可厂商自定义（例如实现还没 ratified P 扩展、 Matrix 扩展）的巨大优势。

对于通用处理器，我认为优势可以有：在一类特定的 Workload 基于高性能开源核增加软硬件协同设计，在一类特定的 Workload 下做的比现有产品更好。

例如，学术界研究了非常多的新兴应用，比如 FaaS 场景的微架构预热、主要面向图计算和数据结构访问的可编程预取器、在保证系统安全的同时让安全检查开销低到可忽略的软硬件协同安全设计等等，这都是一些做好了可以适配很多很多 CPU 上的通用应用的工作。如果 RISC-V 的生态可以让需要这样处理器的组织，快速地从模拟器的 Evaluation 走向现实世界，这何尝不是开源 ISA 与开源处理器最大的价值呢？

而且，随着开源硬件生态的发展，我们有了 Chisel 这样的敏捷 RTL 生成器语言，从而让开源处理器 RTL 生成也可以像 Linux Kernel 一样提供数千个可配置、可开关的生成器参数；有了 LLVM-CIRCT 这样基于 MILR 的硬件编译器来加入我们想要的 Pass；有了 Verilator 这样的开源 RTL 仿真器。这些生态的存在让普通人参与开源硬件的门槛大大降低，不再有那些专有且昂贵的 EDA 软件挡着个人贡献者的路，也让开源硬件的代码更加易读，让一般的个人也得以轻松上手，这何尝不是一群喜欢探索和改造世界的 geek 们的终极追求？

现在，也许正是一个探索这些新机遇的好时候！

source
(author: Yangyu Chen)

05:55 · Nov 1, 2025 · Sat

Chips and Cheese
Strix Halo’s Memory Subsystem: Tackling iGPU Challenges
#ChipAndCheese

Telegraph | source
(author: Chester Lam)

Telegraph

Strix Halo’s Memory Subsystem: Tackling iGPU Challenges

AMD’s Strix Halo aspires to deliver high CPU and GPU performance within a mobile device. Doing so presents the memory subsystem with a complicated set of demands. CPU applications are often latency sensitive with low bandwidth demands. GPU workloads are often…

ChipAndCheese

05:35 · Nov 1, 2025 · Sat

Arch Linux: Recent news updates
dovecot >= 2.4 requires manual intervention

The dovecot 2.4 release branch has made breaking changes which result in it being incompatible with any <= 2.3 configuration file.

Thus, the dovecot service will no longer be able to start until the configuration file was migrated, requiring manual intervention.

For guidance on the 2.3-to-2.4 migration, please refer to the following upstream documentation: Upgrading Dovecot CE from 2.3 to 2.4

Furthermore, the dovecot 2.4 branch no longer supports their replication feature, it was removed.

For users relying on the replication feature or who are unable to perform the 2.4 migration right now, we provide alternative packages available in [extra]:

● dovecot23
● pigeonhole23
● dovecot23-fts-elastic
● dovecot23-fts-xapian

The dovecot 2.3 release branch is going to receive critical security fixes from upstream until stated otherwise.

source
(author: Thore Bödecker)

14:05 · Oct 31, 2025 · Fri

One IP address, many users: detecting CGNAT to reduce collateral effects
https://blog.cloudflare.com/detecting-cgn-to-reduce-collateral-damage/

The Cloudflare Blog