隨筆 (Random Notes)

Posts

RISC-V NaN Generation and Propagation

December 10, 2021

1. The RISC-V spec says that NaN propagation should generate the canonical NaN: "Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000."
2. The IEEE 754-2019 standard recommends that operations on NaNs propagate the input's NaN, but it does not require it.
3. LLVM's NaN propagation implementation follows the IEEE rule, which does not match the RISC-V spec.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/InstructionSimplify.cpp#L4966-L4968

The simplest test is to add two constant floating-point values, one of which is NaN: the results at -O0 and -O3 differ in the NaN's fraction bits. So software should use C++ isnan() to check for NaN before comparing. This is also why we cannot use memcmp to compare two floating-point values: not only accuracy issue, but also n...
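The difference between a bitwise comparison and a NaN-aware comparison can be sketched as follows. This is a minimal illustration, not code from LLVM or the RISC-V spec: the helper names are hypothetical, and 0x7fc00001 is just an assumed example of a quiet NaN whose fraction bits differ from the canonical 0x7fc00000.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float without changing any bits.
inline float float_from_bits(std::uint32_t bits) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// The canonical single-precision NaN pattern quoted from the RISC-V spec.
inline float canonical_nan() {
  const float f = float_from_bits(0x7fc00000u);
  assert(std::isnan(f));
  return f;
}

// Bitwise comparison, the kind memcmp performs.
inline bool same_bits(const float& a, const float& b) {
  return std::memcmp(&a, &b, sizeof(float)) == 0;
}

// NaN-aware comparison: any two NaNs count as the same result,
// regardless of sign or fraction bits.
inline bool same_value_or_both_nan(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) {
    return std::isnan(a) && std::isnan(b);
  }
  return a == b;
}
```

With these helpers, `same_bits(canonical_nan(), float_from_bits(0x7fc00001u))` is false even though both values are NaN, while `same_value_or_both_nan` treats them as matching. A test that compares compiler output across optimization levels would want the second kind of check.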


LLVM Machine Instruction: Convergent attribute

October 12, 2019

ref: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089241.html

1. The convergent attribute is useful for the SIMT/SPMD programming model.
2. The intended interpretation is that a convergent operation cannot be moved either into or out of a conditionally executed region.
3. If you have a convergent instruction A, it is legal to duplicate it to instruction B if (assuming B is after A in program flow) A dominates B and B post-dominates A.

case:

    r1 = texture2D(..., r0, ...)
    if (...) {
      // r0 used as temporary here
      r0 = ...
      r2 = r0 + ...
    } else {
      // only use of r1
      r2 = r1 + ...
    }

In this example, various optimizations might try to sink the texture2D operation into the else block, like so:

    if (...) {
      r0 = ...
      r2 = r0 + ...
    } else {
      r1 = texture2D(..., r0, ...)
      r2 = r1 + ...
    }

In most SPMD/SIMT implementations, the fallout of this race is exposed via the predicated expression of acyclic control flow:

    pred0 <- cmp ...
    if (pred0) r0 = ...
    ...
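To make the hazard concrete, here is a small scalar simulation of four SIMT lanes. It is only a sketch under assumptions that are not in the mailing-list post: `neighbor_diff` is a hypothetical stand-in for a convergent operation such as texture2D (it reads the neighboring lane's r0, the way derivative computations do), the if side is assumed to execute before the else side, and the constants are arbitrary.

```cpp
#include <array>
#include <cassert>

constexpr int kLanes = 4;
using Regs = std::array<int, kLanes>;
using Preds = std::array<bool, kLanes>;

// Hypothetical convergent op: each lane also reads r0 from its neighbor
// (lane ^ 1), so the result depends on what other lanes hold in r0.
inline int neighbor_diff(const Regs& r0, int lane) {
  assert(lane >= 0 && lane < kLanes);
  return r0[lane ^ 1] - r0[lane];
}

// Original program: the convergent op runs before the branch,
// while every lane still holds its original r0.
inline Regs run_original(Regs r0, const Preds& pred) {
  Regs r1{};
  Regs r2{};
  for (int lane = 0; lane < kLanes; ++lane) r1[lane] = neighbor_diff(r0, lane);
  for (int lane = 0; lane < kLanes; ++lane) {
    if (pred[lane]) {        // if side: r0 reused as a temporary
      r0[lane] = 100;
      r2[lane] = r0[lane] + 1;
    }
  }
  for (int lane = 0; lane < kLanes; ++lane) {
    if (!pred[lane]) r2[lane] = r1[lane] + 1;  // else side: only use of r1
  }
  return r2;
}

// Illegally transformed program: the convergent op is sunk into the else
// side, so it now observes r0 values already overwritten by the if side.
inline Regs run_sunk(Regs r0, const Preds& pred) {
  Regs r2{};
  for (int lane = 0; lane < kLanes; ++lane) {
    if (pred[lane]) {
      r0[lane] = 100;
      r2[lane] = r0[lane] + 1;
    }
  }
  for (int lane = 0; lane < kLanes; ++lane) {
    if (!pred[lane]) r2[lane] = neighbor_diff(r0, lane) + 1;
  }
  return r2;
}
```

With r0 = {1, 2, 3, 4} and only lane 0 taking the if side, lane 1 reads lane 0's original r0 in `run_original` but the clobbered value in `run_sunk`, so the two programs disagree on lane 1. Marking the operation convergent is what forbids the sinking.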


Stage Mix

September 15, 2019

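Stage mixes are almost always cut from the performances of Korean multi-member groups. Pulling one off requires:

1. Industrially consistent camera work and camera equipment
2. Military-precision choreography
3. Careful editing

Industrially consistent shot composition is a must, because in a large group every member has to be given a fair share of screen time.

Military-precision choreography is also a must, because a misstep would ruin the carefully designed shots.

I was wondering... is this what it takes to make everyone want to go see the live show? Only at the venue can you keep your eyes glued to your idol the whole time and see what you normally cannot see...

That makes the released video of any one concert rather boring, since the only things that change are the outfits and the set~(?)

see https://blog.edumeme.org/2017/03/blog-post.html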


ARMv7 NEON VQRDMULH instruction implementation

May 22, 2019

VQRDMULH: Vector Saturating Rounding Doubling Multiply Returning High Half.

VQRDMULH multiplies corresponding elements in two vectors, doubles the results, and places the most significant half of the final results in the destination vector.

Reference implementation: https://github.com/google/gemmlowp/blob/master/fixedpoint/fixedpoint.h#L329

<code>
// This function implements the same computation as the ARMv7 NEON VQRDMULH
// instruction.
template <>
inline std::int32_t SaturatingRoundingDoublingHighMul(std::int32_t a, std::int32_t b) {
  bool overflow = a == b && a == std::numeric_limits<std::int32_t>::min();
  std::int64_t a_64(a);
  std::int64_t b_64(b);
  std::int64_t ab_64 = a_64 * b_64;
  std::int32_t nudge = ab_64 >= 0 ? (1 << 30) : (1 - (1 << 30));
  std::int32_t ab_x2_high32 = static_cast<std::int32_t>((ab_64 + nudge) / (1ll << 31));
  return overflow ? std::numeric_limits...
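For experimenting with the arithmetic, a standalone scalar model of one 32-bit lane might look like the sketch below. The function name is hypothetical, and because the excerpt above is cut off before the return value, returning INT32_MAX in the overflow case is an assumption based on the "saturating" part of the instruction's name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical standalone name; models one 32-bit lane of VQRDMULH.
inline std::int32_t sat_rounding_doubling_high_mul(std::int32_t a, std::int32_t b) {
  using Limits = std::numeric_limits<std::int32_t>;
  // The only input pair whose doubled product has a high half that does not
  // fit in 32 bits: INT32_MIN * INT32_MIN * 2 = 2^63, high half 2^31.
  const bool overflow = a == b && a == Limits::min();
  const std::int64_t product =
      static_cast<std::int64_t>(a) * static_cast<std::int64_t>(b);
  // Rounding nudge: +2^30 for non-negative products, -(2^30 - 1) for negative
  // ones, so that the truncating division below rounds to nearest.
  const std::int64_t nudge = product >= 0 ? (1 << 30) : (1 - (1 << 30));
  // Dividing by 2^31 equals doubling the product and keeping the upper 32 bits
  // of the 64-bit result.
  const std::int64_t high = (product + nudge) / (1LL << 31);
  assert(overflow || (high >= Limits::min() && high <= Limits::max()));
  // Assumed saturation value for the single overflow case.
  return overflow ? Limits::max() : static_cast<std::int32_t>(high);
}
```

Reading the operands as Q31 fixed-point values, `1 << 30` is 0.5, so `sat_rounding_doubling_high_mul(1 << 30, 1 << 30)` gives `1 << 29`, i.e. 0.25.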


Can we use machine learning to make compiler optimization algorithms stronger?

January 21, 2017

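ML is usually used to help make predictions and decisions: from a large amount of training data (input: features; output: a predicted value or a classification), it makes predictions and decisions for features it has not seen before.

I only have a little over half a year of experience with ML. Here are some of my thoughts.

1. For a specific benchmark, ML can help find good heuristic values (e.g. --param, --regalloc=[basic|fast|greedy|pbqp], etc.). A search turns up quite a lot of work on this:
Automatic Feature Generation for Machine Learning Based Optimizing Compilation
But it was only when I saw "MILEPOST GCC" that the figure looked familiar... it is the interactive compiler!

2. If you also want to build the machine learning training into the compiler, it seems it can only be done in a JIT. A quick search shows that people have done this in VMs:
Using machines to learn method-specific compilation strategies (IBM)
Method-Specific Dynamic Compilation using Logistic Regression
Others use it for CPU/GPU decisions XD
Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection
If it is done in an AOT compiler, it feels like PGO, and ML does not seem to be needed.

3. If you want to use machine learning to learn the best instruction sequence for a given benchmark/function, it seems reinforcement learning could do it, but using an interactive compiler to pick heuristics might already do...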


The Speed Game: Automated Trading Systems in C++

January 21, 2017

https://meetingcpp.com/index.php/tv16/items/18.html

C++ low latency coding techniques:

● General considerations
  C++11 Move semantics, Static assert, Data member layout, padding and alignment (prefer aligned access)
● False sharing
  http://shuyufu.blogspot.tw/2013/01/false-sharing.html
● Cache locality
● Compile-time dispatch
  std::sort(array, array + N, [](int a, int b) { return b < a; });
  71us, std deviation 1.5us
  int comparer(const void* a, const void* b) { return *(int*)a - *(int*)b; }
  qsort(arr, N, sizeof(int), comparer);
  223us, std deviation 7us
● Constexpr
  C++14 feature
● Variadic templates
● Loop unrolling
  Generally, don't bother, the compiler will figure it out
● Expression short-circuiting
  Rewrite: if (expensiveCheck() && inexpensiveCheck()) {}
  As: if (inexpensiveCheck() && expensiveCheck()) {}
● Signed vs unsigned comparisons
  Just use a signed type for loop iterators (see the previous post)
● Mixing float and doubles
  Default type of a floating...
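The compile-time dispatch item can be tried out with the sketch below. The function names are hypothetical and the code makes no attempt to reproduce the timings quoted from the talk. It only shows the two call shapes: `std::sort` receives the lambda as a template argument, so the comparison can be inlined, while `qsort` calls the comparator through a function pointer. The comparator here uses `(x > y) - (x < y)` instead of a plain subtraction, an assumed tweak that avoids signed overflow for large operands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort-style comparator, invoked through a function pointer at run time.
inline int compare_ints(const void* a, const void* b) {
  const int x = *static_cast<const int*>(a);
  const int y = *static_cast<const int*>(b);
  return (x > y) - (x < y);  // negative, zero, or positive, without overflow
}

// Comparator type is known at compile time, so the call can be inlined.
inline std::vector<int> sort_with_lambda(std::vector<int> v) {
  std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
  assert(std::is_sorted(v.begin(), v.end()));
  return v;
}

// Same result, but every comparison is an indirect call.
inline std::vector<int> sort_with_qsort(std::vector<int> v) {
  if (!v.empty()) {
    std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
  }
  assert(std::is_sorted(v.begin(), v.end()));
  return v;
}
```

Both functions return the same ascending order for the same input. Any speed difference depends on the compiler, flags, and data, so it is worth measuring on your own setup rather than relying on the numbers above.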

