Post by Cem Bassoy via ublas
On Wed, May 2, 2018 at 10:11 PM, David Bellot via ublas
- Should we integrate smart expression templates? I think
there was a GSoC project but I am not sure. What was the output?
It was really good.
Okay. I cannot see it in the development branch. Is it intended to be
integrated into uBLAS?
- Are (smart) expression templates really required?
But on second thought, I wonder like you do.
I think smart expression templates could be beneficial in terms of
selecting and executing high-performance kernels!
Expression templates seem to be outdated. See
https://epubs.siam.org/doi/abs/10.1137/110830125
One of the GSoC projects is adding GPU support to uBLAS. And while it
may be useful to have an API layer that lets users explicitly request
the GPU backend to be used (for the supported subset of BLAS functions
for which there are GPU kernels), we also may want to offer full
integration, whereby a user just uses generic uBLAS expressions, and
leaves the selection of the appropriate backend (GPU or other) to the
library itself.
But to be able to implement this selection mechanism, we need some
advanced dispatching technique. I'm not sure I understand enough about
"smart expression templates", but I did implement such dispatching
infrastructure in the past (http://openvsip.org/), which scales well to
a high number of backends (GPU, SIMD vectorization, TBB-style
parallelization, etc.).
I hope we can manage to get to the point where we can discuss what
techniques are most appropriate for Boost.uBLAS, and perhaps even start
to implement them over this summer. We shall see...
Post by Cem Bassoy via ublas
- How often do expressions like A = B*C + D*D - ... occur in
numerical applications?
- Should we provide a fast gemm implementation of the
Goto-Algorithm like in Eigen?
Why not.
Because tuning algorithms to near-peak performance on standard
processors is, to my mind, a nontrivial task. I am not sure, but we
could try to integrate and link against existing highly optimized
kernels.
As with the above, the problem really is to pick the right backend(s)
for the current platform, as the optimal choice depends on many things
(available hardware, data layout, problem size, etc.). Coming up with a
good optimization strategy (and an architecture that supports it) is
non-trivial.
Post by Cem Bassoy via ublas
- Do we need iterators within matrix and vector template
classes? Or can we generalize the concepts?
There was a discussion about that once. Can we factor all
this code into one place, one generic concept?
This would make things so simple and efficient in the end.
Yes, I will try to build iterators for tensors so we can discuss this
by investigating my code.
Sounds good.
Post by Cem Bassoy via ublas
- Can we maybe simplify/replace the projection function with
overloaded brackets?
Can we do that? That would be awesome!
Will try to show that it is possible.
- Shall we make uBLAS a high-performance library?
Yes, I suppose.
What do you mean exactly by "high-performance"?
I meant that uBLAS could serve as an optimizer and dispatcher
between different existing high-performance libraries, instead of
providing high-performance functions itself.
I agree. The project I referred to above (OpenVSIP) started out as a
library implementing operations itself, until we realized how foolish an
idea that was, at which point it turned more and more into "middleware",
i.e. something like an "algorithm abstraction layer", which makes it
easy to plug in new backends (to support new hardware, say), without the
need for applications to change any code.
Boost has long prided itself on reimplementing wheels. I hope we can
overcome this NIH syndrome and demonstrate how beneficial it is to reuse
existing know-how / technology. Focusing on C++ APIs should be the goal
of Boost, while good optimizations are certainly helpful to increase the
rate of adoption.
Stefan
--
...ich hab' noch einen Koffer in Berlin...