Easton Man's Channel

Arch Linux: Recent news updates
Cleaning up old repositories

Around two years ago, we merged the [community] repository into [extra] as part of the git migration. In order to not break user setups, we kept these repositories around in an unused and empty state. We're going to clean up these old repositories on 2025-03-01.

On systems where /etc/pacman.conf still references the old [community] repository, pacman -Sy will return an error on trying to sync repository metadata.

The following deprecated repositories will be removed: [community], [community-testing], [testing], [testing-debug], [staging], [staging-debug].

Please make sure to remove all use of the aforementioned repositories from your /etc/pacman.conf (for which a .pacnew was shipped with pacman>=6.0.2-7)!
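
As a rough illustration of the check described above, here is a minimal C++ sketch that scans pacman.conf-style text for section headers naming one of the deprecated repositories. The function name find_deprecated_repos is hypothetical, and the parsing is deliberately simplified: it assumes a section header is a line of the form [name] and ignores anything pulled in through Include directives.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the deprecated repository sections that are
// still active (not commented out) in pacman.conf-style text.
std::vector<std::string> find_deprecated_repos(const std::string &conf_text) {
  // The list comes from the announcement above.
  static const char *const kDeprecated[] = {
      "community", "community-testing", "testing",
      "testing-debug", "staging", "staging-debug"};
  std::vector<std::string> found;
  std::istringstream lines(conf_text);
  std::string line;
  while (std::getline(lines, line)) {
    // Trim surrounding whitespace; skip blank lines.
    const auto first = line.find_first_not_of(" \t\r");
    if (first == std::string::npos) continue;
    const auto last = line.find_last_not_of(" \t\r");
    assert(last != std::string::npos);
    const std::string trimmed = line.substr(first, last - first + 1);
    // Commented-out sections start with '#', so they never match '['.
    if (trimmed.size() < 3 || trimmed.front() != '[' || trimmed.back() != ']')
      continue;
    const std::string name = trimmed.substr(1, trimmed.size() - 2);
    for (const char *repo : kDeprecated) {
      if (name == repo) {
        found.push_back(name);
        break;
      }
    }
  }
  return found;
}
```

Feeding it the contents of /etc/pacman.conf should return an empty vector on a system that has already been cleaned up.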

source
(author: Sven-Hendrik Haase)

05:31 · Feb 15, 2025 · Sat

Daniel Lemire's blog
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

The recent AMD processors (Zen 4) provide extensive support for the powerful AVX-512 instructions. AVX-512 (Advanced Vector Extensions 512) is an extension to the x86 instruction set architecture (ISA) introduced by Intel. These instructions enhance the capabilities of processors by allowing for more data to be processed in parallel. You can process registers made of 64 bytes!

One of the neat tricks is that, given a mask, you can ‘compress’ words. Suppose that you have a vector made of thirty-two 16-bit words and you want to keep only the second and third ones: you can use the vpcompressw instruction with the mask 0b110. It will produce a register where the second and third words are placed in the first and second positions.

An even nicer trick is that you can use this instruction to write just these two words out to memory. You can invoke this functionality with the _mm_mask_compressstoreu_epi16 function intrinsic.
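
To make the semantics concrete, here is a minimal scalar sketch of what a masked compress-store does. It is only a model for reasoning about the result, not the intrinsic itself and not the simdjson code; the name compress_store_u16 is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a masked compress-store over thirty-two 16-bit words.
// Bit i of `mask` selects input[i]. Selected words are written contiguously
// to `out`, in order, and nothing else is written.
// Returns the number of words stored (the population count of the mask).
std::size_t compress_store_u16(std::uint16_t *out, std::uint32_t mask,
                               const std::uint16_t (&input)[32]) {
  assert(out != nullptr);
  std::size_t written = 0;
  for (std::size_t i = 0; i < 32; ++i) {
    if ((mask >> i) & 1u) {
      out[written++] = input[i];
    }
  }
  return written;
}
```

With the mask 0b110 and an input whose words are 100, 101, 102, and so on, the function stores 101 and 102 at the start of the output and returns 2.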

This works well on recent Intel processors, but not so well on AMD Zen 4 processors.

We have a fast function in the simdjson library to minify a file (remove unnecessary spaces).

https://github.com/simdjson/simdjson/pull/2335

source

14:19 · Feb 14, 2025 · Fri

Social Stockfish

Like a chess analysis engine, it predicts the next 5 exchanges with the person you are talking to and tells you the best reply right now.

https://fixvx.com/eddybuild/status/1889908182501433669

Eddy Xu (@eddybuild)

built an ai that sees 5 moves ahead in any conversation and tells you the optimal thing to say

14:19 · Feb 14, 2025 · Fri

This is an interesting, imaginative idea.

18:25 · Feb 11, 2025 · Tue

GNU Gold Linker Is Deprecated & Will Be Gone For Good Without New Developers
https://www.phoronix.com/news/GNU-Gold-Linker-Deprecated

Phoronix

With the recent GNU Binutils 2.44 release, one of the changes is worth calling out in its own article: the GNU Gold linker is now officially deprecated and is now being segregated to its own extra Binutils package but risks being removed altogether without…

15:32 · Feb 11, 2025 · Tue

Chips and Cheese
Intel’s Battlemage Architecture
#ChipAndCheese

Telegraph | source
(author: Chester Lam)

Intel’s Alchemist architecture gave the company a foot in the door to the high performance graphics segment. The Arc A770 proved to be a competent first effort, able to run many games with credible performance. Now, Intel is passing the torch to a new graphics…

20:14 · Feb 9, 2025 · Sun

属于CYY自己的世界
Can we trust the cpu cycles from LLVM-MCA?

source
(author: 陈泱宇 (Yangyu Chen))

07:18 · Feb 8, 2025 · Sat

Daniel Lemire's blog
Thread-safe memory copy

A common operation in software is the copy of a block of memory. In C/C++, we often call the function memcpy for this purpose.

But what happens if, while you are copying the data, another thread is modifying either the source or the destination? The result is fundamentally unpredictable and almost surely a programming error.

Why would you ever code a copy function in such a way given that it is an error? Suppose you are implementing a JavaScript engine in C++, like Google's V8. In JavaScript, we have SharedArrayBuffer instances that can be modified and copied from different threads. As the engineer working on the JavaScript engine, you cannot always prevent users from writing buggy code.

In any case, you get a data race: two or more threads access the same memory location simultaneously, where at least one of the accesses is a write operation, without a synchronization mechanism to ensure that these operations occur in a specific order.

What happens? The C++ standard states that a data race results in undefined behavior. In effect, the C++ language does not tell you what happens. A crash might occur. Of course, the JavaScript engineer would rather not see a crash.

Importantly, ‘undefined behavior’ also does not tell you that there is necessarily an error. Effectively, it tells you that, as the programmer, you acquire the additional responsibility to ensure that the code is safe. There is no warranty coming from the programming language itself.

Why do languages like C and C++ leave undefined behavior?

A good analogy is an organization with many sub-components, where new sub-components could be added at any time. Think of an interstellar federation of planets. The interstellar federation can specify overall laws that are well defined, but there will be remaining corner cases that are specific to which planet you reside in.

That’s the spirit of C and C++: these programming languages can target a very wide range of platforms. For some of these platforms, a data race is without consequence… for others, it could be highly problematic or just slow. Also, by not specifying the behavior, it allows the compiler designer some options. So the programming language leaves it up to you to check.

Consider a conflicting memory copy where you, for example, copy from array A to array B while another thread copies from array B to array A. Under most platforms, this will not cause a crash or anything especially dangerous. You might get garbage data in your arrays, in the worst case.

But if you use automated sanitizer tools, you may still get a warning regarding the data race, even when it is inconsequential. You can silence the warning by telling the tools that you have checked that the copy is safe.
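
One possible way to tell a sanitizer to skip a function is a compiler attribute. The sketch below assumes GCC or Clang, which both document a no_sanitize("thread") function attribute; the macro NO_TSAN and the function unchecked_copy are hypothetical names, and the post does not say which mechanism its author has in mind. The attribute only silences the tool: the copy remains a data race as far as the C++ standard is concerned.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: GCC or Clang. On other compilers the macro expands to nothing
// and the function is instrumented as usual.
#if defined(__clang__) || defined(__GNUC__)
#define NO_TSAN __attribute__((no_sanitize("thread")))
#else
#define NO_TSAN
#endif

// Plain byte-by-byte copy that the thread sanitizer is asked not to
// instrument. The caller takes responsibility for the copy being acceptable
// on the target platform.
NO_TSAN void unchecked_copy(char *dest, const char *src, std::size_t count) {
  assert(count == 0 || (dest != nullptr && src != nullptr));
  for (std::size_t i = 0; i < count; ++i) {
    dest[i] = src[i];
  }
}
```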

Instead, you could roll your own ‘safe’ memory copy, where you load the content byte by byte (for example) in an atomic fashion. A possible solution in C++20 looks like so:

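#include <atomic>   // std::atomic_ref (C++20)
#include <cstddef>  // size_t

// Copies count bytes one at a time with relaxed atomic loads and stores.
// Note: std::atomic_ref<const char> may be rejected by some standard libraries.
void safe_memcpy(char *dest, const char *src, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    char input =
        std::atomic_ref<const char>(src[i])
              .load(std::memory_order_relaxed);
    std::atomic_ref<char>(dest[i])
              .store(input, std::memory_order_relaxed);
  }
}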

We have now done away with any kind of undefined behavior. The code ought to be perfectly ‘safe’: there is no more data race.

So why not always use this safe approach?

Because it can be 40 times slower than a conventional memory copy.

It becomes an engineering question. Sometimes performance really does not matter.

In programming, there is practically never a free lunch. It is common that you have to take your pick: aim for high performance but acquire more responsibilities, or sacrifice performance for the sake of having fewer worries.

source
