News read by @EastonMan
+ Ramblings
+ Admiring the experts
+ Occasional cats
+ Songs Easton listens to
Jiege's {Ops, Programming, Board Debugging} Notes
Exploring the implementation of the Android Runtime interpreter

source
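Oh, I wrote that backwards. Who uses the private key from Example documentation in their production product?
Who pastes the private key fused into their own production product into a patent?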
Daniel Lemire's blog
How fast can you open 1000 files?

source
If you want to become moe, you first have to study MoE
vllm can also be llvm
Jiege's {Ops, Programming, Board Debugging} Notes
Intel Golden Cove microarchitecture evaluation

source
Arch Linux: Recent news updates
Cleaning up old repositories

Around two years ago, we merged the [community] repository into [extra] as part of the git migration. In order to not break user setups, we kept these repositories around in an unused and empty state. We're going to clean up these old repositories on 2025-03-01.

On systems where /etc/pacman.conf still references the old [community] repository, pacman -Sy will return an error on trying to sync repository metadata.

The following deprecated repositories will be removed: [community], [community-testing], [testing], [testing-debug], [staging], [staging-debug].

Please make sure to remove all use of the aforementioned repositories from your /etc/pacman.conf (for which a .pacnew was shipped with pacman>=6.0.2-7)!
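
To make the check concrete, here is a minimal sketch that scans the text of a pacman.conf for enabled sections named after the deprecated repositories listed above. The function name find_deprecated_repos is hypothetical, and the parsing is an assumption that only looks at uncommented "[name]" header lines; it is not how pacman itself reads the file.

```cpp
#include <array>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the deprecated repository sections that are
// still enabled (not commented out) in the given pacman.conf text.
std::vector<std::string> find_deprecated_repos(const std::string &conf_text) {
  // Repository names taken from the announcement above.
  static const std::array<const char *, 6> deprecated = {
      "community", "community-testing", "testing",
      "testing-debug", "staging", "staging-debug"};
  std::vector<std::string> found;
  std::istringstream lines(conf_text);
  std::string line;
  while (std::getline(lines, line)) {
    // Skip blank lines, comments ('#') and key = value lines.
    std::size_t start = line.find_first_not_of(" \t");
    if (start == std::string::npos || line[start] != '[') continue;
    std::size_t end = line.find(']', start);
    if (end == std::string::npos) continue;
    std::string name = line.substr(start + 1, end - start - 1);
    for (const char *repo : deprecated) {
      if (name == repo) found.push_back(name);
    }
  }
  return found;
}
```

To use it, read /etc/pacman.conf into a std::string (for example with std::ifstream) and pass it in. Any name it returns is a section to delete or comment out before the cleanup date.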

source
(author: Sven-Hendrik Haase)
Daniel Lemire's blog
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

The recent AMD processors (Zen 4) provide extensive support for the powerful AVX-512 instructions. AVX-512 (Advanced Vector Extensions 512) is an extension to the x86 instruction set architecture (ISA) introduced by Intel. These instructions enhance the capabilities of processors by allowing for more data to be processed in parallel. You can process registers made of 64 bytes!

One of the neat tricks is that, given a mask, you can ‘compress’ words. Suppose that you have a vector made of thirty-two 16-bit words, and you want to keep only the second and third ones: you can use the vpcompressw instruction with the mask 0b110. It will produce a register where the second and third words are placed in the first and second positions.

An even nicer trick is that you can use this instruction to write just these two words out to memory. You can invoke this functionality with the _mm_mask_compressstoreu_epi16 function intrinsic.
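
To make the semantics concrete, here is a scalar sketch of what a masked compress-store does. It is a plain-C++ model, not the intrinsic: the name compress_store_u16 and the lanes parameter are made up for illustration, and it says nothing about how the hardware implements the instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of a masked compress-store on 16-bit words (illustrative only).
// For every set bit i of `mask` (bit 0 is the first word), src[i] is written
// to the next free slot of `dst`, so the selected words end up contiguous.
// Returns the number of words written; nothing else in `dst` is touched.
std::size_t compress_store_u16(std::uint16_t *dst, const std::uint16_t *src,
                               std::size_t lanes, std::uint32_t mask) {
  assert(lanes <= 32);  // one mask bit per lane
  std::size_t written = 0;
  for (std::size_t i = 0; i < lanes; ++i) {
    if ((mask >> i) & 1u) {
      dst[written++] = src[i];
    }
  }
  return written;
}
```

With 32 lanes and the mask 0b110, the call writes exactly two words: the second and third words of src land in dst[0] and dst[1].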

This works well on recent Intel processors, but not so well on AMD Zen 4 processors.

We have a fast function in the simdjson library to minify a file (remove unnecessary spaces).

https://github.com/simdjson/simdjson/pull/2335

source
This is a fun wild idea
A world of CYY's own
Can we trust the cpu cycles from LLVM-MCA?

source
(author: 陈泱宇 (Yangyu Chen))
Daniel Lemire's blog
Thread-safe memory copy

A common operation in software is the copy of a block of memory. In C/C++, we often call the function memcpy for this purpose.

But what happens if, while you are copying the data, another thread is modifying either the source or the destination? The result is fundamentally unpredictable and almost surely a programming error.

Why would you ever code a copy function in such a way given that it is an error? Suppose you are implementing a JavaScript engine in C++, like Google V8. In JavaScript, we have SharedArrayBuffer instances that can be modified and copied from different threads. As the engineer working on the JavaScript engine, you cannot always prevent users from writing buggy code.

In any case, you get a data race: two or more threads access the same memory location simultaneously, where at least one of the accesses is a write operation, without a synchronization mechanism to ensure that these operations occur in a specific order.

What happens? The C++ standard states that a data race results in undefined behavior. In effect, the C++ language does not tell you what happens. A crash might occur. Of course, the JavaScript engineer would rather not see a crash.

Importantly, ‘undefined behavior’ also does not tell you that there is necessarily an error. Effectively, it tells you that, as the programmer, you acquire the additional responsibility to ensure that the code is safe. There is no warranty coming from the programming language itself.

Why do languages like C and C++ leave undefined behavior?

A good analogy is an organization with many sub-components, where new sub-components could be added at any time. Think of an interstellar federation of planets. The interstellar federation can specify overall laws that are well defined, but there will be remaining corner cases that are specific to which planet you reside in.

That’s the spirit of C and C++: these programming languages can target a very wide range of platforms. For some of these platforms, a data race is without consequence… for others, it could be highly problematic or just slow. Also, by not specifying the behavior, it allows the compiler designer some options. So the programming language leaves it up to you to check.

Consider a conflicting memory copy where you, for example, copy from array A to array B while another thread copies from array B to array A. Under most platforms, this will not cause a crash or anything especially dangerous. You might get garbage data in your arrays, in the worst case.

But if you use automated sanitizer tools, you may still get a warning regarding the data race, even when it is inconsequential. You can silence the warning by telling the tools that you have checked that the copy is safe.

Instead, you could roll your own ‘safe’ memory copy, where you load the content byte by byte (for example) in an atomic fashion. A possible solution in C++20 looks like so:
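#include <atomic>   // std::atomic_ref (C++20)
#include <cstddef>  // size_t

// Copy count bytes one at a time; every access is a relaxed atomic operation.
void safe_memcpy(char *dest, const char *src, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    // Atomically read one source byte.
    char input =
        std::atomic_ref<const char>(src[i])
              .load(std::memory_order_relaxed);
    // Atomically write it to the destination.
    std::atomic_ref<char>(dest[i])
              .store(input, std::memory_order_relaxed);
  }
}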

We have now done away with any kind of undefined behavior. The code ought to be perfectly ‘safe’: there is no more data race.

So why not always use this safe approach?

Because it can be 40 times slower than a conventional memory copy.
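
To get a feel for the gap on your own machine, here is a rough timing sketch. It is not the benchmark behind the 40x figure. The helper names (relaxed_byte_copy, time_ns, slowdown_ratio) are hypothetical, and as an assumption it uses the GCC/Clang __atomic builtins in place of std::atomic_ref so that it also builds before C++20. A single timed call is noisy, so treat the ratio as indicative at best.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Byte-by-byte copy using relaxed atomic loads and stores.
// Assumption: GCC or Clang, whose __atomic builtins stand in for std::atomic_ref.
void relaxed_byte_copy(char *dest, const char *src, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    char c = __atomic_load_n(src + i, __ATOMIC_RELAXED);
    __atomic_store_n(dest + i, c, __ATOMIC_RELAXED);
  }
}

// Time a single call of `copy`, in nanoseconds (hypothetical helper).
template <typename Copy>
double time_ns(Copy copy, char *dest, const char *src, std::size_t count) {
  auto start = std::chrono::steady_clock::now();
  copy(dest, src, count);
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(stop - start).count();
}

// Ratio of the relaxed-atomic copy time to the memcpy time for `count` bytes.
double slowdown_ratio(std::size_t count) {
  std::vector<char> src(count, 'x'), a(count), b(count);
  double fast = time_ns(
      [](char *d, const char *s, std::size_t n) { std::memcpy(d, s, n); },
      a.data(), src.data(), count);
  double slow = time_ns(relaxed_byte_copy, b.data(), src.data(), count);
  assert(a == b);  // both copies must produce the same bytes
  return slow / fast;
}
```

Calling slowdown_ratio with a buffer of a few megabytes and compiler optimizations enabled gives a rough idea of the slowdown. For a number you can trust, repeat the measurement many times and use a proper benchmarking harness.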

It becomes an engineering question. Sometimes performance really does not matter.

In programming, there is practically never a free lunch. It is common that you have to take your pick: aim for high performance but acquire more responsibilities, or sacrifice performance for the sake of having fewer worries.

source