Tsur Herman
2016-09-26 07:23:21 UTC
I am probably re-discovering the wheel or about to say something that will
get a response in the form of "its complicated"
It occurred to me that what we need to optimize are loops and iterations.
The following code will run slow:
function F1(A)
total = 0;
for a in A
total += a;
end
total
end
whereas the following code will run fast
function F2(A)
total = zero(eltype(A));
for a in A
total += a;
end
total
end
julia> A=rand(4000,4000);
julia> F1(A);@time F1(A)
0.362453 seconds (48.00 M allocations: 732.422 MB, 5.72% gc time)
7.997862445435431e6
julia> F2(A);@time F2(A)
0.025840 seconds (5 allocations: 176 bytes)
7.997862445435431e6
because in the first version total starts in a different type than the
element of the matrix A . and there is now way for the JITter to tell if
this is deliberate or careless coding .
What I suggest is to *replace any loop in the code with an at-least one
unrolled version and to push the rest into an anonymous function.*
In the parser level.
for example:
function FF(A)
total = 0
loopbody(total,a) = return total += a
total = loopbody(total,A[1])
lambda(A,range,total,loopbody) = begin
@simd for i in range
@inbounds total = loopbody(total,A[i])
end
total
end
return lambda(A,2:prod(size(A)),total,loopbody)
end
And the timing results compared to builtin sum function:
julia> FF(A);@time FF(A)
0.010596 seconds (7 allocations: 208 bytes)
7.997862445434973e6
julia> sum(A);@time sum(A)
0.010666 seconds (5 allocations: 176 bytes)
7.997862445434952e6
get a response in the form of "its complicated"
It occurred to me that what we need to optimize are loops and iterations.
The following code will run slow:
function F1(A)
total = 0;
for a in A
total += a;
end
total
end
whereas the following code will run fast
function F2(A)
total = zero(eltype(A));
for a in A
total += a;
end
total
end
julia> A=rand(4000,4000);
julia> F1(A);@time F1(A)
0.362453 seconds (48.00 M allocations: 732.422 MB, 5.72% gc time)
7.997862445435431e6
julia> F2(A);@time F2(A)
0.025840 seconds (5 allocations: 176 bytes)
7.997862445435431e6
because in the first version total starts in a different type than the
element of the matrix A . and there is now way for the JITter to tell if
this is deliberate or careless coding .
What I suggest is to *replace any loop in the code with an at-least one
unrolled version and to push the rest into an anonymous function.*
In the parser level.
for example:
function FF(A)
total = 0
loopbody(total,a) = return total += a
total = loopbody(total,A[1])
lambda(A,range,total,loopbody) = begin
@simd for i in range
@inbounds total = loopbody(total,A[i])
end
total
end
return lambda(A,2:prod(size(A)),total,loopbody)
end
And the timing results compared to builtin sum function:
julia> FF(A);@time FF(A)
0.010596 seconds (7 allocations: 208 bytes)
7.997862445434973e6
julia> sum(A);@time sum(A)
0.010666 seconds (5 allocations: 176 bytes)
7.997862445434952e6