-
If your test and trial spaces are the same, you could do something along these lines. However, it will likely be slower, because the call to shape_gradient is already just a lookup.
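To make the comparison concrete, here is a minimal sketch of the two loop structures under discussion. It does not use Ferrite: `ToyCellValues` and `toy_shape_gradient` are hypothetical stand-ins that assume the gradients are precomputed and that the accessor only indexes into storage, as the reply describes.

```julia
using LinearAlgebra

# Hypothetical stand-in for cell values: gradients are precomputed and
# stored per (basis function, quadrature point).
struct ToyCellValues
    dNdx::Matrix{Vector{Float64}}
end

# Stand-in accessor (assumption): only an array lookup, no computation.
toy_shape_gradient(cv::ToyCellValues, q_point::Int, i::Int) = cv.dNdx[i, q_point]

# Variant 1: call the accessor inside the double loop.
function element_direct!(Ke, cv, q_point, dΩ)
    n = size(cv.dNdx, 1)
    for i in 1:n
        ∇v = toy_shape_gradient(cv, q_point, i)
        for j in 1:n
            ∇u = toy_shape_gradient(cv, q_point, j)
            Ke[i, j] += (∇v ⋅ ∇u) * dΩ
        end
    end
    return Ke
end

# Variant 2: copy the gradients into a caller-provided buffer first.
function element_cached!(Ke, ∇uv, cv, q_point, dΩ)
    n = size(cv.dNdx, 1)
    for k in 1:n
        ∇uv[k] = toy_shape_gradient(cv, q_point, k)
    end
    for i in 1:n
        ∇v = ∇uv[i]
        for j in 1:n
            Ke[i, j] += (∇v ⋅ ∇uv[j]) * dΩ
        end
    end
    return Ke
end

n_basefuncs, n_qpoints = 4, 3
cv = ToyCellValues([[Float64(i + q), Float64(i * q)] for i in 1:n_basefuncs, q in 1:n_qpoints])
buffer = Vector{Vector{Float64}}(undef, n_basefuncs)

Ke_direct = element_direct!(zeros(n_basefuncs, n_basefuncs), cv, 2, 0.5)
Ke_cached = element_cached!(zeros(n_basefuncs, n_basefuncs), buffer, cv, 2, 0.5)
```

Both variants build the same element matrix. The only difference is whether the inner loop reads from the stand-in storage or from the extra buffer, which is why caching is not guaranteed to help when the accessor is already a lookup.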
-
In my very naive tests, the gain is sometimes significant.
-
You can check https://ferrite-fem.github.io/Ferrite.jl/stable/examples/stokes-flow/ (Ferrite.jl/docs/src/literate/stokes-flow.jl, lines 386 to 389 and lines 397 to 404 in 570b3b5). Also, allocate ∇uv outside of the element routine, otherwise you pay the allocation cost for every element.
What did you measure? Just the assembly routine or the full global assembly?
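The allocation advice can be sketched as follows. This is a toy setup, not Ferrite code: `toy_element!` and `toy_assemble` are hypothetical names, `grads` stands in for the per-cell shape gradients, and the "global" matrix is just a sum of element matrices with no degree-of-freedom mapping.

```julia
using LinearAlgebra

# Toy element routine: `∇uv` is a caller-owned scratch buffer, so no new
# buffer is created per element.
function toy_element!(Ke, ∇uv, grads, dΩ)
    fill!(Ke, 0.0)
    copyto!(∇uv, grads)          # fill the cache for this element
    for i in eachindex(∇uv), j in eachindex(∇uv)
        Ke[i, j] += (∇uv[i] ⋅ ∇uv[j]) * dΩ
    end
    return Ke
end

function toy_assemble(cell_grads, dΩ)
    n = length(first(cell_grads))
    Ke = zeros(n, n)                          # allocated once
    ∇uv = Vector{Vector{Float64}}(undef, n)   # allocated once, outside the loop
    K_sum = zeros(n, n)
    for grads in cell_grads
        toy_element!(Ke, ∇uv, grads, dΩ)
        K_sum .+= Ke
    end
    return K_sum
end

# Two toy cells with two basis functions each.
cell_grads = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [1.0, -1.0]]]
K_sum = toy_assemble(cell_grads, 1.0)
```

Creating `∇uv` inside `toy_element!` instead would allocate a fresh vector for every element, which is the per-element cost the reply warns about.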
-
Oops, I missed that example. Yes, I know my suggestion can be improved regarding allocation. It was just an illustration, but it already has an impact when timing the whole assembly:
Cached gradients (∇uv):

Section      ncalls     time   %tot      avg     alloc   %tot      avg
assemb K          1    10.1s  86.7%    10.1s   3.94GiB  80.4%  3.94GiB
∇uv            402k    229ms   2.0%    570ns    178MiB   3.5%     464B
comp dΩ        402k   60.7ms   0.5%    151ns   6.13MiB   0.1%    16.0B
assemble!     5.50k   9.60ms   0.1%   1.74μs    344KiB   0.0%    64.0B

Direct calls (∇u, ∇v):

Section      ncalls     time   %tot      avg     alloc   %tot      avg
assemb K          1    21.4s  93.4%    21.4s   3.77GiB  79.6%  3.77GiB
∇u            40.2M    5.22s  22.8%    130ns   1.20GiB  25.3%    32.0B
∇v            4.02M    561ms   2.5%    140ns    123MiB   2.5%    32.0B
comp dΩ        402k   51.4ms   0.2%    128ns   6.13MiB   0.1%    16.0B
assemble!     5.50k   9.44ms   0.0%    1.71μs    344KiB   0.0%    64.0B
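The call counts in the timings above can be cross-checked with a little arithmetic. The sketch assumes 10 basis functions per cell, a value inferred from the ratio 4.02M / 402k and not stated in the thread, and it uses the rounded counts as printed.

```julia
n_qp_visits = 402_000   # "comp dΩ" count: one per (cell, quadrature point), rounded as printed
n_basefuncs = 10        # assumption inferred from 4.02M / 402k

calls_∇v = n_qp_visits * n_basefuncs      # outer loop: one lookup per test function
calls_∇u = n_qp_visits * n_basefuncs^2    # inner loop: one lookup per (test, trial) pair
calls_∇uv = n_qp_visits                   # cached variant: one timed buffer fill per visit
```

Under that assumption, the direct variant wraps roughly 44 million individual lookups in timed sections, while the cached variant times 402k buffer fills. With averages near 130 ns per section, part of the reported difference may come from the timing instrumentation itself and not from the lookups.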
-
That looks very suspicious. What are you using to measure that? In particular,
-
Great, thanks. Sorry for the noise.
-
As a point of reference, for the heat equation with 200x200 elements and third-order interpolation, assembly is slightly slower when caching the values (140 ms without caching, 155 ms with caching). This is likely because caching only shuffles data around, and you then have to do almost the equivalent lookup later anyway (i.e. accessing [...]).
As pointed out above, shape_gradient is already just a lookup (see Ferrite.jl/src/FEValues/common_values.jl, lines 84 and 96 in 570b3b5). [...] shape_divergence or shape_symmetric_gradient, which do some very trivial computation too (see Ferrite.jl/src/FEValues/common_values.jl, lines 114 and 105 in 570b3b5).
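To illustrate what "very trivial computation" could look like, here is a small sketch of a divergence and a symmetric gradient derived from a stored gradient. It uses plain matrices, not Ferrite's tensor types; `toy_divergence` and `toy_symmetric_gradient` are hypothetical names, and the assumption is that both quantities are obtained from the looked-up gradient of a vector-valued shape function.

```julia
using LinearAlgebra

# Hypothetical stored gradient of one vector-valued shape function in 2D.
∇N = [1.0 2.0; 4.0 3.0]

# Divergence as the trace of the gradient.
toy_divergence(G) = tr(G)

# Symmetric gradient as the symmetric part of the gradient.
toy_symmetric_gradient(G) = (G + G') / 2

div_N = toy_divergence(∇N)
sym_N = toy_symmetric_gradient(∇N)
```

Each of these costs a handful of floating-point operations on top of the lookup, so caching them saves little work per call.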
-
Thank you so much for the quick responses.
# Cached variant: copy the gradients into ∇uv, then assemble from the buffer.
@btime begin
    for k in 1:$n_basefuncs
        $∇uv[k] = shape_gradient($cellvalues, $q_point, k)
    end
    for i in 1:$n_basefuncs
        ∇v = $∇uv[i]
        for j in 1:$n_basefuncs
            ∇u = $∇uv[j]
            $Ke[i, j] += (∇v ⋅ ∇u) * $dΩ
        end
    end
end
139.512 ns (0 allocations: 0 bytes)
111.476 ns (0 allocations: 0 bytes)
142.031 ns (0 allocations: 0 bytes)
111.493 ns (0 allocations: 0 bytes)
139.515 ns (0 allocations: 0 bytes)
111.469 ns (0 allocations: 0 bytes)
142.025 ns (0 allocations: 0 bytes)
111.829 ns (0 allocations: 0 bytes)
...
@btime begin
    for i in 1:$n_basefuncs
        ∇v = shape_gradient($cellvalues, $q_point, i)
        for j in 1:$n_basefuncs
            ∇u = shape_gradient($cellvalues, $q_point, j)
            $Ke[i, j] += (∇v ⋅ ∇u) * $dΩ
        end
    end
end
116.681 ns (0 allocations: 0 bytes)
150.695 ns (0 allocations: 0 bytes)
118.124 ns (0 allocations: 0 bytes)
149.268 ns (0 allocations: 0 bytes)
117.144 ns (0 allocations: 0 bytes)
150.820 ns (0 allocations: 0 bytes)
115.594 ns (0 allocations: 0 bytes)
149.268 ns (0 allocations: 0 bytes)
...
-
Is there a particular reason for the quadratic cost in the assembly routines of several examples in the documentation? It seems to me significantly more efficient to replace (for instance)
by something like
No? I am probably missing something.