News read by @EastonMan
+ ramblings
+ admiring the experts
+ occasional cats
+ songs Easton listens to
Daniel Lemire's blog
JavaScript hashing speed comparison: MD5 versus SHA-256

source
Matt Keeter
Fidget

Blazing fast implicit surface evaluation

source
(author: Matt Keeter (matt.j.keeter@gmail.com))
Daniel Lemire's blog
Counting the digits of 64-bit integers

source
Daniel Lemire's blog
Artificial Intelligence as the Expert’s Lever: Elevating Human Expertise in the Age of AI

The more likely outcome of the rise of generative artificial intelligence is higher value for the best experts… where ‘expert’ means ‘someone with experience solving real problems’.
“While one may worry that AI will simply render expertise redundant and experts superfluous, history and economic logic suggest otherwise. AI is a tool, like a calculator or a chainsaw, and tools generally aren’t substitutes for expertise but rather levers for its application.
By shortening the distance from intention to result, tools enable workers with proper training and judgment to accomplish tasks that were previously time-consuming, failure-prone or infeasible. Conversely, tools are useless at best — and hazardous at worst — to those lacking relevant training and experience. A pneumatic nail gun is an indispensable time-saver for a roofer and a looming impalement hazard for a home hobbyist.
For workers with foundational training and experience, AI can help to leverage expertise so they can do higher-value work. AI will certainly also automate existing work, rendering certain existing areas of expertise irrelevant. It will further instantiate new human capabilities, new goods and services that create demand for expertise we have yet to foresee.” (Autor, 2024)


source
Daniel Lemire's blog
How does your URL parser handle Unicode?

Most strings in software today are Unicode strings. This means that you can include mathematical symbols, emojis and so forth. There are many different versions of the letter ‘M’, for example: the Roman letter M (U+004D) is semantically different from the Roman numeral Ⅿ (U+216F), even though the two often have the same visual representation. John Cook has an interesting post on Unicode steganography: you can use this ambiguity to hide messages in plain view. E.g., if you need to warn someone that you are in danger, you could send a text with the Roman numeral Ⅿ. Most people reading the text would not notice the difference.
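As a small illustration of the point above, the following Python sketch prints the code point and official Unicode name of each of the two characters. The helper name `describe` is hypothetical and only used here for illustration.

```python
import unicodedata

latin_m = "\u004D"  # the ordinary Roman (Latin) capital letter M
roman_m = "\u216F"  # the Roman numeral for one thousand

def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(latin_m))   # U+004D LATIN CAPITAL LETTER M
print(describe(roman_m))   # U+216F ROMAN NUMERAL ONE THOUSAND
print(latin_m == roman_m)  # False: they look alike but are distinct characters
```

A plain string comparison treats the two as different, which is why the look-alike can go unnoticed by a human reader yet still be detected by software.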

What about URLs like Microsoft.com? If you replace the Roman letter with the Roman numeral, is it still the same domain?

It is. To be compliant with the WHATWG URL specification, URL parsers are required to normalize URLs, which involves, among other things, replacing look-alike characters such as this Roman numeral with the corresponding Roman letters.
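To get a rough feel for what such normalization does to the host, here is a minimal Python sketch. It is an approximation under stated assumptions, not the WHATWG algorithm: the specification relies on the Unicode IDNA compatibility processing (UTS #46) plus Punycode, whereas this sketch only applies NFKC compatibility normalization and lowercasing. The function name `approximate_host` is hypothetical.

```python
import unicodedata
from urllib.parse import urlsplit

def approximate_host(url: str) -> str:
    """Approximate host normalization: NFKC normalization plus lowercasing.

    Assumption: this is a simplification for illustration only. A compliant
    parser applies the full UTS #46 mapping and Punycode encoding instead.
    """
    host = urlsplit(url).hostname or ""
    return unicodedata.normalize("NFKC", host).lower()

# The host ends with the Roman numeral U+216F rather than the letter m.
print(approximate_host("https://microsoft.co\u216F"))  # microsoft.com
```

NFKC replaces the Roman numeral with its compatibility equivalent, the plain letter, so the two spellings of the domain collapse to the same ASCII host in this example.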

But do they? Do the URL parsers actually do this hard work? Let us check.

Java. I could not get the standard Java library to return the host. It simply returns a null String.
 String url = "https://microsoft.coⅯ";
 URI uri = new URI(url);
 String host = uri.getHost();

C#. The .NET library seems to just return the domain as-is, with the Roman numeral.
string url = "https://microsoft.coⅯ";
Uri uri = new Uri(url);
string host = uri.Host;

PHP. The standard PHP interpreter just returns the domain as-is, with the Roman numeral.
$url = "https://microsoft.coⅯ";
$parsed_url = parse_url($url);
if ($parsed_url === false) {
 echo "URL could not be parsed.";
} else {
 $host = $parsed_url['host'];
}


Go. Go also does not do normalization.
urlString := "https://microsoft.coⅯ"
parsedURL, err := url.Parse(urlString)
if err != nil {
        fmt.Println("URL could not be parsed:", err)
        return
}
host := parsedURL.Host

Python. You guessed it: no normalization. It happily returns the Roman numeral.
import urllib.parse

url = "https://microsoft.coⅯ"
parsed_url = urllib.parse.urlparse(url)
host = parsed_url.netloc

JavaScript. JavaScript does it correctly. It will convert https://microsoft.coⅯ to https://microsoft.com.
const url = "https://microsoft.coⅯ";
const urlObj = new URL(url);
const host = urlObj.hostname;

C++. C++ does not have a standard URL parser, but if you use the ada URL parser, you will get correct results. If you are using the Node.js runtime environment, the underlying parser is the C++ ada URL parsing library.
auto url = ada::parse("https://microsoft.coⅯ");
if (!url) { /* failure */ }
std::string_view host = url->get_host();


source
░░░░░░░░░░░░░░░░░░░░ 0%
Happy New Year 2025, everyone 🥰
Daniel Lemire's blog
Efficient In-Place UTF-16 Unicode Correction with ARM NEON

source
杰哥的{运维,编程,调板子}小笔记
Apple M1 microarchitecture review

source
Daniel Lemire's blog
Simpler and faster parsing code with std::views::split

source