Optimization Targets
There are many architectural approaches to achieving high performance in modern computer systems,
from embedded processors to supercomputers, and VAST can support all of them:
- Vector/SIMD Units: VAST can vectorize loops and generate explicit vector instructions.
- Multi-threading and multiple cores: VAST can detect opportunities for
multiple threads in a single program by parallelizing regions of the input program
that contain multiple loop nests. VAST automatically creates the new parallel routines
and identifies and handles shared and private variables. VAST also fully supports
the OpenMP standard for user-directed parallelism.
- Superscalar: The transformations are similar to those for VLIW systems;
VAST can restructure the program to expose more opportunities for instruction overlap.
- Integrated DSP units: VAST can ensure higher utilization of DSP units
by rewriting loops to use DSP intrinsics and to load special DSP memories.
- Adaptive/Flexible Systems: These systems provide large numbers
of parallel functional units, and thus benefit from the additional parallelism
that VAST can detect and expose through its high-level dependence analysis and code transformations.
- FPGAs: VAST can generate low-level data dependence information at the
compiler's intermediate-representation level, so that every reference
is connected by data dependence arcs to each reference that depends on it.
This allows the rest of the compiler to target more fully the low-level parallelism
available with an FPGA.
- VLIW systems: VAST can restructure loop nests to expose additional parallelism
for the compiler to include in the very long instruction words.
VAST can disambiguate dependences so that the compiler is freer to overlap instructions.
- Others: VAST is frequently used for systems that cross these boundaries
or feature new or innovative constructs; VAST is highly customizable and
can be useful for just about any computer architecture that features high levels of parallelism.
VAST can support combinations of these targets in one system; for example,
in a triple-nested loop it can parallelize the outermost loop while unrolling the middle loop
and vectorizing the inner loop.