Posts

It's been a really long time since I was last here.

I reviewed my previous notes and, yes, I've completely forgotten a lot of that stuff. Recently I've been working on a compiler that translates the ARM ASL1 language into an instruction set simulator. It's not a very difficult topic. For one, we already have experience designing and implementing an ASL0 compiler. For another, AI tools are really good at MLIR. Because I'm a senior member of the team, I haven't had much opportunity to learn new insights or techniques from others on this project, except from AI. Honestly, it's very boring. It's time to find an interesting topic to enrich my career.

RISC-V NaN Generation and Propagation

1. The RISC-V spec says that a NaN result must be the canonical NaN: "Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000."
2. The IEEE 754-2019 standard recommends that operations on NaNs propagate one of the input NaNs, but it does not require it.
3. LLVM's NaN propagation (in InstructionSimplify) follows the IEEE recommendation, which does not match the RISC-V spec. https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/InstructionSimplify.cpp#L4966-L4968
The simplest test adds two constant floating-point values, one of which is NaN. The -O0 and -O3 results differ in the NaN's fraction bits. Software should therefore use C++ isnan() to check for NaN before comparing. This is why we cannot use memcmp to compare two floating-point values: not only because of accuracy issues, but also n...
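The canonical-NaN rule and the "don't memcmp floats" advice can be illustrated with a small C++ sketch. It assumes `float` is IEEE 754 binary32. `canonicalize_nan` and `same_value_nan_aware` are hypothetical helpers: the first mimics what RISC-V hardware does to a NaN result, and the second is a NaN-aware equality check. The sketch does not model how any particular compiler folds constants at -O0 or -O3.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits; memcpy is the portable way (no FP operation happens).
std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float float_from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// RISC-V canonical single-precision NaN: positive sign, exponent all ones,
// only the quiet bit (MSB of the significand) set.
constexpr std::uint32_t kRiscvCanonicalNaN = 0x7fc00000u;

// Hypothetical helper: emulate RISC-V result canonicalization, where any NaN
// result is replaced by the canonical NaN and its payload is dropped.
float canonicalize_nan(float f) {
    return std::isnan(f) ? float_from_bits(kRiscvCanonicalNaN) : f;
}

// Hypothetical helper: compare values, treating any two NaNs as "the same".
// Plain == is not enough (NaN == NaN is false), and memcmp is too strict
// (two NaNs with different payloads have different bit patterns).
bool same_value_nan_aware(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
    return a == b;
}
```

For example, `float_from_bits(0x7fc00001u)` is a quiet NaN carrying a payload. It passes `std::isnan` just like the canonical NaN, but a `memcmp` against `0x7fc00000` reports a difference. That payload difference is exactly what separates an IEEE-style propagated NaN from a RISC-V canonical one.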

LLVM Machine Instruction: Convergent attribute

ref: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089241.html
1. The convergent attribute is useful for the SIMT/SPMD programming model.
2. The intended interpretation is that a convergent operation cannot be moved either into or out of a conditionally executed region.
3. If you have a convergent instruction A, it is legal to duplicate it to an instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A.

Case:
```
r1 = texture2D(..., r0, ...)
if (...) {
  // r0 used as temporary here
  r0 = ...
  r2 = r0 + ...
} else {
  // only use of r1
  r2 = r1 + ...
}
```
In this example, various optimizations might try to sink the texture2D operation into the else block, like so:
```
if (...) {
  r0 = ...
  r2 = r0 + ...
} else {
  r1 = texture2D(..., r0, ...)
  r2 = r1 + ...
}
```
In most SPMD/SIMT implementations, the fallout of this race is exposed via the predicated expression of acyclic control flow:
```
pred0 <- cmp ...
if (pred0) r0 = ...
...
```
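The hazard in the sunk version can be made concrete with a tiny lockstep simulation. This is a hypothetical model, not how any real GPU or LLVM implements texture2D. It has four lanes and a made-up `tex_like` that reads the neighbouring lane's `r0`, standing in for the implicit derivative a texture sample computes. Both sides of the branch run under predication, as in the quoted text.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;
using Reg = std::array<int, kLanes>;    // one value per lane
using Mask = std::array<bool, kLanes>;  // which lanes are active

// Hypothetical cross-lane op standing in for texture2D's implicit derivative:
// each active lane reads r0 of its horizontal neighbour (lane ^ 1).
// Reading another lane's register is what makes the operation convergent.
void tex_like(const Reg& r0, Reg& r1, const Mask& active) {
    for (std::size_t i = 0; i < kLanes; ++i)
        if (active[i]) r1[i] = r0[i ^ 1] - r0[i];
}

// Original placement: the convergent op runs with all lanes active, before the branch.
Reg run_original(Reg r0, const Mask& pred) {
    Reg r1{}, r2{};
    const Mask all{true, true, true, true};
    tex_like(r0, r1, all);
    for (std::size_t i = 0; i < kLanes; ++i)  // then-block, predicated on pred
        if (pred[i]) { r0[i] = 100; r2[i] = r0[i] + 1; }
    for (std::size_t i = 0; i < kLanes; ++i)  // else-block, predicated on !pred
        if (!pred[i]) r2[i] = r1[i] + 1;
    return r2;
}

// After sinking the op into the else-block (the illegal transformation).
Reg run_sunk(Reg r0, const Mask& pred) {
    Reg r1{}, r2{};
    for (std::size_t i = 0; i < kLanes; ++i)  // then-block clobbers r0 first
        if (pred[i]) { r0[i] = 100; r2[i] = r0[i] + 1; }
    Mask else_mask{};
    for (std::size_t i = 0; i < kLanes; ++i) else_mask[i] = !pred[i];
    tex_like(r0, r1, else_mask);  // neighbours' r0 has already been overwritten
    for (std::size_t i = 0; i < kLanes; ++i)
        if (!pred[i]) r2[i] = r1[i] + 1;
    return r2;
}
```

With `pred = {true, false, true, false}` and `r0 = {10, 20, 30, 40}`, the original order gives `r2 = {101, -9, 101, -9}`. The sunk order gives `{101, 81, 101, 61}`, because the else-lanes now read the temporary values that the then-lanes wrote into `r0`.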

Stage Mix

圖片
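Stage mixes are almost all edits of performances by Korean multi-member groups. Pulling one off requires 1. industrialized, consistent camera work and equipment, 2. military-precision choreography, and 3. careful editing. Industrialized, consistent shot planning is a must, because in a big group every member's screen time has to be allotted properly. Military-precision choreography is a must too, because a wrong move would spoil the carefully designed shots. I was wondering... is it done this way so that people will want to see the live show? Only at a live show can you keep your eyes on your idol every single moment and see shots you normally never get... That makes the released video of any particular concert kind of boring, since all that changes is the outfits and the sets~ (?) see https://blog.edumeme.org/2017/03/blog-post.html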

ARMv7 NEON VQRDMULH instruction implementation

VQRDMULH: Vector Saturating Rounding Doubling Multiply Returning High Half. VQRDMULH multiplies corresponding elements in two vectors, doubles the results, and places the most significant half of the final results in the destination vector. The results are rounded, and saturated if they overflow.
Reference implementation: https://github.com/google/gemmlowp/blob/master/fixedpoint/fixedpoint.h#L329
```cpp
// This function implements the same computation as the ARMv7 NEON VQRDMULH
// instruction.
template <>
inline std::int32_t SaturatingRoundingDoublingHighMul(std::int32_t a,
                                                      std::int32_t b) {
  bool overflow = a == b && a == std::numeric_limits<std::int32_t>::min();
  std::int64_t a_64(a);
  std::int64_t b_64(b);
  std::int64_t ab_64 = a_64 * b_64;
  std::int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));
  std::int32_t ab_x2_high32 =
      static_cast<std::int32_t>((ab_64 + nudge) / (1ll << 31));
  return overflow ? std::numeric_limits...
```
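The same result can also be written in the "double, add the rounding constant, keep the high half, saturate" form that the instruction name describes. Below is a minimal single-lane sketch for 32-bit elements. `vqrdmulh_s32` is a hypothetical name, and the QC (saturation) flag that the real instruction sets is not modeled. The design choice is to fold the doubling into the shift, because `2 * a * b` overflows `int64_t` when `a == b == INT32_MIN`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical single-lane model of 32-bit VQRDMULH.
// (2*a*b + 2^31) >> 32 is rewritten as (a*b + 2^30) >> 31 so that the
// intermediate never exceeds int64_t (|a*b| <= 2^62).
std::int32_t vqrdmulh_s32(std::int32_t a, std::int32_t b) {
    const std::int64_t ab = static_cast<std::int64_t>(a) * b;
    // Right shift of a negative value is an arithmetic shift (floor) in C++20,
    // and on all mainstream compilers before that; this gives round-half-up.
    const std::int64_t rounded = (ab + (std::int64_t{1} << 30)) >> 31;
    // Only INT32_MIN * INT32_MIN leaves the int32 range (it yields 2^31).
    if (rounded > std::numeric_limits<std::int32_t>::max())
        return std::numeric_limits<std::int32_t>::max();
    return static_cast<std::int32_t>(rounded);
}
```

Read as Q31 fixed point, `vqrdmulh_s32(1 << 30, 1 << 30)` is 0.5 * 0.5 and returns `1 << 29` (0.25). `vqrdmulh_s32(INT32_MIN, INT32_MIN)` saturates to `INT32_MAX`, matching the `overflow` special case in the gemmlowp code.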

Can we use machine learning to make compiler optimization algorithms stronger?

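ML is usually used to help make predictions and decisions. Trained on a large amount of data (input: features; output: a predicted value or a classification), it makes predictions and decisions for features it has never seen. I only have a bit over half a year of ML experience, so here are just some of my thoughts:
1. For a specific benchmark, ML can help find good heuristic values (e.g. --param, --regalloc=[basic|fast|greedy|pbqp], etc.). Google turns up quite a few results, such as Automatic Feature Generation for Machine Learning Based Optimizing Compilation. But when I saw "MILEPOST GCC", the diagram looked very familiar... it's an interactive compiler!
2. If you want to put the machine learning training inside the compiler itself, it seems it can only be done in a JIT. Googling a bit, some people have done it in VMs: Using machines to learn method-specific compilation strategies (IBM), and Method-Specific Dynamic Compilation using Logistic Regression. Some people even use it for CPU/GPU decisions XD: Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection. Done in an AOT compiler, it feels like PGO, which doesn't seem to need ML.
3. If you want machine learning to learn the best instruction sequence for a particular benchmark/function, reinforcement learning seems like a fit, but picking heuristics with an interactive compiler might be able to do...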
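To make the "learned heuristic" idea in point 2 concrete, here is a toy logistic-regression decision in the spirit of the logistic-regression and CPU/GPU-selection work mentioned above. Everything here is hypothetical and not taken from any of those papers: the single feature (a scaled log2 of the problem size), the training data, and the CPU/GPU labels. Training is plain batch gradient descent on the log-loss.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical training sample: one feature and a binary label
// (0 = run on CPU, 1 = offload to GPU).
struct Sample {
    double x;
    int label;
};

struct LogisticModel {
    double w = 0.0, b = 0.0;
    double prob(double x) const { return 1.0 / (1.0 + std::exp(-(w * x + b))); }
    bool predict_gpu(double x) const { return prob(x) >= 0.5; }
};

// Batch gradient descent on the average log-loss.
LogisticModel train(const std::vector<Sample>& data, double lr, int iters) {
    LogisticModel m;
    const double n = static_cast<double>(data.size());
    for (int it = 0; it < iters; ++it) {
        double gw = 0.0, gb = 0.0;
        for (const Sample& s : data) {
            const double err = m.prob(s.x) - s.label;  // dLoss/dlogit
            gw += err * s.x;
            gb += err;
        }
        m.w -= lr * gw / n;
        m.b -= lr * gb / n;
    }
    return m;
}

// Hypothetical feature scaling: centre log2(problem size) around 13.
double feature(int log2_size) { return (log2_size - 13) / 10.0; }
```

As usage, label small problems (log2 size 4..10) as CPU and large ones (16..22) as GPU, then train with `train(data, 1.0, 500)`. The model then answers "CPU" for `feature(6)` and "GPU" for `feature(20)`. In a JIT, the features would come from the method or kernel being compiled. That is also why point 2 notes that training inside the compiler fits a JIT better than an AOT compiler.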

The Speed Game: Automated Trading Systems in C++

https://meetingcpp.com/index.php/tv16/items/18.html
C++ low latency coding techniques:
● General considerations
  C++11 move semantics
  Static assert
  Data member layout, padding and alignment (prefer aligned access where possible)
● False sharing
  http://shuyufu.blogspot.tw/2013/01/false-sharing.html
● Cache locality
● Compile-time dispatch
  std::sort(array, array + N, [](int a, int b) { return b < a; });  71us, std deviation 1.5us
  int comparer(const void* a, const void* b) { int x = *(const int*)a, y = *(const int*)b; return (x > y) - (x < y); /* avoids the overflow of x - y */ }
  qsort(arr, N, sizeof(int), comparer);  223us, std deviation 7us
● Constexpr
  C++14 extended constexpr
● Variadic templates
● Loop unrolling
  Generally, don't bother; the compiler will figure it out
● Expression short-circuiting
  Rewrite: if (expensiveCheck() && inexpensiveCheck()) {}
  As: if (inexpensiveCheck() && expensiveCheck()) {}
● Signed vs unsigned comparisons
  Just use signed types for loop iterators (see the previous post)
● Mixing float and doubles
  Default type of a floating...
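The "Static assert", "padding and alignment", and "False sharing" bullets fit together in one small sketch. It assumes a 64-byte cache line, which is typical on x86-64 and many ARM cores but is not guaranteed. `std::hardware_destructive_interference_size` exists in C++17 but is missing from some standard libraries, so a constant is used instead. `PaddedCounter` and `count_in_parallel` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumed cache-line size (see lead-in).
constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so two threads updating
// neighbouring counters do not keep invalidating each other's line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Compile-time checks of the layout assumption.
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "counters start on a line boundary");

// Two threads, each incrementing only its own counter.
void count_in_parallel(PaddedCounter (&counters)[2], std::uint64_t iterations) {
    auto work = [&counters, iterations](std::size_t idx) {
        for (std::uint64_t i = 0; i < iterations; ++i)
            counters[idx].value.fetch_add(1, std::memory_order_relaxed);
    };
    std::thread t0(work, 0);
    std::thread t1(work, 1);
    t0.join();  // join() makes the relaxed increments visible to the caller
    t1.join();
}
```

Without `alignas`, the two 8-byte atomics would usually sit in the same cache line. The program would still be correct, but each increment would bounce the line between cores, which is the false sharing that the linked post describes. The `static_assert`s turn the layout assumption into a build error instead of a silent slowdown.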