compiler assume UB can't happen, and optimizing accordingly
//kernel bug
void foo(bool *ok){
bool k = *ok; <==dereference NULL pointer is UB, so ok will not be NULL
if (!ok)
return;
blah(k)
}
into
void foo(bool *ok){
bool k = *ok;
blah(k)
}
how is UB represented?
Explicitly:
unreachable
undef
implicitly:
the optimizer knows some things just can't happen
int unreachable(int *out) {
*out = 42;
return *((int*)0);
}
==> load from null is UB, so replace it with unreachable
undef: a value which can have any bit pattern at any point in the program
int undef(int *p){
int a;
return *p + a;
}
signed math can't overflow or underflow
bool signed_underflow(int a, int b){
return a - b > -1;
}
since a-b can't underflow, a-b will be greater than -1 if a >= b
bool signed_underflow(int a, int b){
return a>= b;
}
Type aliasing: pointers to objects of different types can't alias (most of the time).
void check(int *h, long *k){
*h = 5;
*k =6;
printf("%d\n", *h);
}
int main(void)
long k;
check((int*)&k, &k);
}
so optimizer will generate
void @check(i32* h, i32* k){
store i32 5, i32* %h, !tbaa{!"int"}
store i32 6, i32* %k, !tbaa{!"long"}
call @printf("%d\"n", i32 5)
ret void
}
c++ define infinite loop is UB, loop will terminate
N1528: Why undefined behavior for infinite loops?
為了優化
for (p = q; p != 0; p = p -> next) {
++count;
}
for (p = q; p != 0; p = p -> next) {
++count2;
}
++count;
}
for (p = q; p != 0; p = p -> next) {
++count2;
}
to
for (p = q; p != 0; p = p -> next) {
++count;
++count2;
}#include <iostream> int fermat() { const int MAX = 1000; int a=1,b=1,c=1; // Endless loop with no side effects is UB while (1) { if (((a*a*a) == ((b*b*b)+(c*c*c)))) return 1; a++; if (a>MAX) { a=1; b++; } if (b>MAX) { b=1; c++; } if (c>MAX) { c=1;} } return 0;} int main() { if (fermat()) std::cout << "Fermat's Last Theorem has been disproved.\n"; else std::cout << "Fermat's Last Theorem has not been disproved.\n";}
Possible output:
Fermat's Last Theorem has been disproved.
Use of an object, or a pointer to an object, outside of its lifetime frequently results in undefined behavior.
optimizer will just replace original value to undef
references to null value
ex.
bool foo(int &a){
return &a == nullptr;
}
int main(){
int *blah= nullptr;
return foo(*blash);
}
optimizer generate
bool foo(int &a){
return flase;
}
int main(){
int *blah= nullptr;
return foo(*blash);
}
then generate
bool foo(int &a){
return flase;
}
int main(){
return 0;
}
"this" will not be null because " calling any nonstatic method of any class through a null-pointer leads to undefined behavior"
Still Comparing "this" Pointer to Null?
ex. v8 bug
struct FieldType{
FieldType *none(){
return (FieldType*) (0);
}
bool IsNone() {
return this == none();
}
};
bool foo (FieldType *F){
return F->isNone();
}
finally, optimizer inline two times and generate
bool foo (FieldType *F){
return false;
}
signed {under,over}flow
因為假設overflow不會發生, 所以compiler才可以做optimization
1.
%1 = mul nsw i32 %x, 20
%A = sdiv i32 %1, 2
==>
%A = mul nsw i32 %x, 10
2.
%2 = mul nsw i32 %x, 21
%B = icmp sgt i32 %2, 0
==>
%B = icmp sgt i32 %x, 0
%4 = sub new i32 0, %x
%5 = sub nsw i32 0, %y
%C = icmp sgt i32 %4, %5
==>
%c = icmp sgt i32 %y, %x
signed {under,over}flow for promotion
int f (int *a, int x, int y){
return a[x+y];
}
signed 需要多一個指令作promotion, 影片裡面表示unsigned會overflow所以必須要多一個指令拿lower part
但是我自己用gcc 6.0/clang 3.7 結果都是和他說的相反
signed version
addl %edx, %esi
movslq %esi, %rax
movl (%rdi,%rax,4), %eax
unsigned version
addl %edx, %esi
movl (%rdi,%rsi,4), %eax
loop induction is signed的話 optimizer可以做佳化
因為保證n+1 > n
template <class T>
T f(T N) {
T i = 0;
for(i<N;i+=2);
return i;
}
signed 可以幫optimizer作induction variable的推測 (remove loop)
unsigned不行
int _signed(int N){
int i =0;
for (;i<N;i+=2);
return i;
}
==>for loop可移掉
unsigned int _unsigned(unsigned int N){
unsigned int i =0;
for (;i<N;i+=2);
return i;
}
==> for loop不能移掉
realloc sample
To see the undefined behavior, you need to read the fine print in the C99 standard, which says:
#include <stdio.h>#include <stdlib.h>int main() { int *p = (int*)malloc(sizeof(int)); int *q = (int*)realloc(p, sizeof(int)); *p = 1; *q = 2; if (p == q) printf("%d %d\n", *p, *q);}
O0, print 2, 2
O2, print 1, 2
Atomics optimization
int x;
atomic<int> y;
void foo(){
x= 0;
y.store(0, seq_cst);
if (y.load(seq_cst) == 0)
x= 1;
}
nothing can be observe the difference ==>
int x;
atomic<int> y;
void foo(){
x= 0;
y.store(0, seq_cst);
if (0 == 0)
x= 1;
}
==> simplify CFG
int x;
atomic<int> y;
void foo(){
x= 0;
y.store(0, seq_cst);
x= 1;
}
==>
int x;
atomic<int> y;
void foo(){
y.store(0, seq_cst);
x= 1;
}
DATA RACE detail, see
compilers can optimize non-atomic memory accesses before and after atomic accesses.
what can you do about it?
- -Wall -Wextra -Werror
- sanitize (undefined, address, thread, memory)
- static analysis (clang, coverity, pvs-stdio