News read by @EastonMan
+ ramblings
+ admiring the experts
+ occasional cats
+ songs Easton listens to
Daniel Lemire's blog
Implementing the missing sign instruction in AVX-512

Intel and AMD have expanded the x64 instruction sets over time. In particular, the SIMD (single instruction, multiple data) instructions have become progressively wider and more general: from 64 bits to 128 bits (SSE2), to 256 bits (AVX/AVX2), to 512 bits (AVX-512). Interestingly, many instructions defined on 256-bit registers through AVX/AVX2 are not available on 512-bit registers.

With SSSE3, Intel introduced sign instructions, with the corresponding intrinsic functions (e.g., _mm_sign_epi8). There are 8-bit, 16-bit and 32-bit versions. They were extended to 256-bit registers in AVX2.

What these instructions do is apply the sign of one parameter to the other parameter. It is most easily explained as pseudocode:
function sign(a, b): # a and b are integers
   if b == 0 : return 0
   if b < 0 : return -a
   if b > 0 : return a

The SIMD equivalent does the same operation but on many values at once. Thus, with SSSE3 and psignb, you can process sixteen signed 8-bit integers at once.

You can view it as a generalization of the absolute-value function: abs(a) = sign(a,a). The sign instructions are very fast. They are used in numerical analysis and machine learning: e.g., they are used in llama.cpp, the open source LLM project.
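The lane-wise behaviour and the absolute-value identity described above can be modelled in plain scalar C. This is only a sketch for illustration: the names sign_i8, sign_epi8_model and check_abs_identity are hypothetical, and it does not use the SSSE3 intrinsics themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for one lane: apply the sign of b to a. */
int8_t sign_i8(int8_t a, int8_t b) {
  if (b == 0) return 0;
  /* Assumes the usual two's-complement narrowing, so -(-128) wraps to -128. */
  if (b < 0) return (int8_t)(0 - a);
  return a;
}

/* Model of a 128-bit register: sixteen independent 8-bit lanes. */
void sign_epi8_model(const int8_t *a, const int8_t *b, int8_t *out) {
  for (int i = 0; i < 16; i++) out[i] = sign_i8(a[i], b[i]);
}

/* Checks abs(a) == sign(a, a) for every 8-bit value whose absolute
   value is representable (that is, all values except -128). */
void check_abs_identity(void) {
  for (int a = -127; a <= 127; a++) {
    int expected = a < 0 ? -a : a;
    assert(sign_i8((int8_t)a, (int8_t)a) == expected);
  }
}
```

Calling check_abs_identity() from a test program exercises the identity over the representable range.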

When Intel designed AVX-512, they decided to omit the sign instructions. So while we have the intrinsic function _mm256_sign_epi8, we don’t have _mm512_sign_epi8. The same instructions are missing for 16-bit and 32-bit integers (e.g., no _mm512_sign_epi16 is found).

You may implement it for AVX-512 with several instructions. I found this approach:
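#include <x86intrin.h>

__m512i _mm512_sign_epi8(__m512i a, __m512i b) {
  __m512i zero = _mm512_setzero_si512();
  // mask of lanes where b < 0 (sign bit set)
  __mmask64 blt0 = _mm512_movepi8_mask(b);
  // mask of lanes where b <= 0
  __mmask64 ble0 = _mm512_cmple_epi8_mask(b, zero);
  // a where b < 0, zero elsewhere
  __m512i a_blt0 = _mm512_mask_mov_epi8(zero, blt0, a);
  // where b <= 0: 0 - a_blt0 (that is, -a or 0); elsewhere: a
  return _mm512_mask_sub_epi8(a, ble0, zero, a_blt0);
}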
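To see why this sequence yields the three-way sign, it may help to model a single lane in scalar C, with each mask bit as a plain integer. This is a sketch under that assumption: sign_ref, sign_masked_lane and check_masked_matches_ref are hypothetical names, and the comments only indicate which intrinsic each line stands in for.

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics taken from the pseudocode. */
int8_t sign_ref(int8_t a, int8_t b) {
  if (b == 0) return 0;
  return b < 0 ? (int8_t)(0 - a) : a;
}

/* One lane of the masked sequence. */
int8_t sign_masked_lane(int8_t a, int8_t b) {
  const int8_t zero = 0;
  int blt0 = b < 0;                 /* stands in for _mm512_movepi8_mask */
  int ble0 = b <= zero;             /* stands in for _mm512_cmple_epi8_mask */
  int8_t a_blt0 = blt0 ? a : zero;  /* stands in for _mm512_mask_mov_epi8 */
  /* stands in for _mm512_mask_sub_epi8: subtract only where ble0 is set */
  return ble0 ? (int8_t)(zero - a_blt0) : a;
}

/* Exhaustively compares both versions over all 256 x 256 input pairs. */
void check_masked_matches_ref(void) {
  for (int a = -128; a <= 127; a++)
    for (int b = -128; b <= 127; b++)
      assert(sign_masked_lane((int8_t)a, (int8_t)b) ==
             sign_ref((int8_t)a, (int8_t)b));
}
```

When b is zero, the lane is in ble0 but not in blt0, so the subtraction computes 0 - 0; that is how the b = 0 case falls out without a third mask.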

It is disappointingly expensive. It might compile to four or five instructions:
vpmovb2m k2, zmm1
vpxor xmm2, xmm2, xmm2
vpcmpb k1, zmm1, zmm2, 2
vpblendmb zmm1{k2}, zmm2, zmm0
vpsubb zmm0{k1}, zmm2, zmm1

In practice, you may not need to pay such a high price. The reason the problem is difficult is that we have three cases to handle (three signs: b=0, b>0, b<0). If you do not care about the case ‘b = 0’, then you can do it in two instructions:
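#include <x86intrin.h>

__m512i _mm512_sign_epi8_cheated(__m512i a, __m512i b) {
  __m512i zero = _mm512_setzero_si512();
  // mask of lanes where b < 0 (sign bit set)
  __mmask64 blt0 = _mm512_movepi8_mask(b);
  // where b < 0: 0 - a; elsewhere (including b == 0): a
  return _mm512_mask_sub_epi8(a, blt0, zero, a);
}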

E.g., we implemented…
function sign_cheated(a, b): # a and b are integers
   if b < 0 : return -a
   if b ≥ 0 : return a
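A scalar model can show how far the two-instruction version strays from the full sign function. The sketch below assumes the shortcut negates a exactly where the sign bit of b is set, as _mm512_movepi8_mask does; sign_ref, sign_cheated_lane and count_mismatches are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Full three-way sign, as in the original pseudocode. */
int8_t sign_ref(int8_t a, int8_t b) {
  if (b == 0) return 0;
  return b < 0 ? (int8_t)(0 - a) : a;
}

/* One lane of the shortcut: negate a only where the sign bit of b is set. */
int8_t sign_cheated_lane(int8_t a, int8_t b) {
  int blt0 = b < 0;
  return blt0 ? (int8_t)(0 - a) : a;
}

/* Counts the input pairs where the shortcut disagrees with the full sign,
   asserting that every disagreement has b == 0 and a != 0. */
int count_mismatches(void) {
  int mismatches = 0;
  for (int a = -128; a <= 127; a++) {
    for (int b = -128; b <= 127; b++) {
      if (sign_cheated_lane((int8_t)a, (int8_t)b) !=
          sign_ref((int8_t)a, (int8_t)b)) {
        assert(b == 0 && a != 0);
        mismatches++;
      }
    }
  }
  return mismatches;
}
```

The only disagreements are the pairs with b equal to zero and a nonzero, so the shortcut is safe whenever b is known to be nonzero or a zero b should leave a unchanged.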


source
Arch Linux: Recent news updates
Making dbus-broker our default D-Bus daemon

We are making dbus-broker our default implementation of D-Bus, for improved performance, reliability and integration with systemd.

For the foreseeable future we will still support the use of dbus-daemon, the previous implementation. Pacman will ask you whether to install dbus-broker-units or dbus-daemon-units. We recommend picking the default.

For a more detailed rationale, please see our RFC 25.

source
(author: Jan Alexander Steffens)
I can hardly imagine what outrageous questions tomorrow's exam will have
Harry Chen’s Blog
Configuring Configless Slurm on Debian

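Since version 20.02, Slurm has supported a configless mode, which means you no longer need to maintain the full set of configuration files on every node that runs slurmd. This is certainly good news for HPC cluster operations. Previously you had to keep N copies of the configuration identical at all times, or risk baffling, hard-to-diagnose problems, and consistency is a perennial hard problem in computer science. Now you only maintain a single copy of the configuration on the control node running slurmctld; slurmd on the other nodes automatically fetches the latest configuration at startup, and a runtime reconfig is no longer at risk of being affected by local configuration.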

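According to the Slurm documentation, configless operation requires the following:

1. slurmd must be able to find slurmctld: this can be done through a DNS SRV record or by passing the --conf-server option at startup;
2. If you use an SRV record, you must make sure that slurmd has no local configuration at all when it starts (because the SRV record has the lowest priority in the search order).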
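For the SRV route in item 1, the entry in the cluster's DNS zone might look roughly like the fragment below. This is only a sketch: the host name slurmctl-primary and the TTL of 3600 are placeholders, and 6817 is simply the slurmctld port used elsewhere in this post.

```
; hypothetical zone-file entry; host name and TTL are placeholders
_slurmctld._tcp 3600 IN SRV 0 0 6817 slurmctl-primary
```

The fields after SRV are priority, weight, port and target host, in that order.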

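Because our clusters run more than one Slurm installation, different slurmd instances need to be pointed at different slurmctld instances, so for simplicity I chose the command-line option. The changes are shown here using Debian's slurm-wlm package as an example:

Edit /etc/default/slurmd and add the --conf-server option: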

SLURMD_OPTIONS="--conf-server your_ctl_server:6817"

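According to the documentation, this is already enough to work. To be safe, however, you can also use systemd to hide the configuration under /etc/slurm from slurmd (rather than actually deleting it), which avoids potential conflicts or confusion. Run systemctl edit slurmd and add: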

[Unit]
ConditionPathExists=

[Service]
TemporaryFileSystem=/etc/slurm

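Because the service unit shipped by Debian checks for /etc/slurm/slurm.conf as a start condition, the [Unit] section overrides that condition with an empty value to disable it, and the [Service] section then hides the original directory by mounting a temporary file system over it. After restarting the service, you can check that everything works by inspecting the contents of /proc/$(pgrep slurmd)/root/etc/slurm.

I switched all of our lab's clusters to configless mode using the method above, and so far everything works well. The only problem I ran into is that the GRES configuration sometimes could not be updated through reconfig; it was resolved by removing the configuration, running reconfig, adding the configuration back, and running reconfig again.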

source
(author: Shengqi Chen ([email protected]))
Happy New Year 2024!
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 100%
Daniel Lemire's blog
Science and Technology links (December 30th 2023)

1. Parenting does not appear to be able to determine the personality traits of a child.
2. When the last ice age ended, 12,000 years ago, the Sahara was green and full of life. It turned into a desert about 5,500 years ago.
3. Fadnes et al. claim that the UK population could live 10 years longer if it changed its eating habits.
4. By studying 175 different populations, You et al. find that meat intake predicts longevity: people who eat more meat live longer.
5. According to an editorial in the journal Nature, scientists who work in industry are more satisfied and better paid than are colleagues in academia. Industry scientists report less bullying and discrimination.
6. The Asch experiment examined the extent to which individuals would conform to the majority view, even when that view was clearly incorrect. The experiment involved a group of participants, one of whom was the actual subject of the experiment, and the rest were people who knew the true purpose of the experiment and acted according to a script. The group was shown a series of images with lines of different lengths and asked to identify which two lines were the same length. The results showed that a significant number of participants conformed to the majority view, even when it was clearly wrong. The Asch experiment is important because it highlights the influence of social factors on individual beliefs. Most people just adopt the prevailing beliefs, even when they are clearly incorrect. In other words, very few people can think for themselves. They just reproduce what they are shown or what they see others doing, like mere monkeys. Unfortunately, the original experiment is robust with respect to replication. We also find that even financial incentives fail to make people more critical.
7. Weather prediction is one of the first applications of powerful computers. To this day, we rely on predictions made by specialized services: we don’t generally compute our own weather predictions. Google Deepmind claims to be able to predict the weather accurately on a normal computer using artificial intelligence.

source