Parmap






Parmap − Module Parmap: efficient parallel map, fold and
mapfold on lists and arrays on multicores.

Module   Parmap



Module Parmap
 : sig end


Module Parmap : efficient parallel map, fold and mapfold on
lists and arrays on multicores.

All the primitives let you control the granularity of the
parallelism via an optional parameter chunksize: if
chunksize is omitted, the input sequence is split evenly
among the available cores; if chunksize is specified, the
input data is split into chunks of size chunksize and
dispatched to the available cores using an on-demand
strategy that ensures automatic load balancing.

A specific primitive array_float_parmap is provided for fast
operations on float arrays.








     === Setting and getting the default value for ncores ===



val set_default_ncores : int -> unit




val get_default_ncores : unit -> int






     === Getting ncores being used during parallel execution ===



val get_ncores : unit -> int












     === Enabling/disabling processes core pinning ===



val disable_core_pinning : unit -> unit


disable_core_pinning () prevents forked-out processes from
being pinned to a specific core. WARNING: this may have a
negative impact on performance, but might be necessary on
systems where several parmap computations are running
concurrently.



val enable_core_pinning : unit -> unit


enable_core_pinning () turns on core pinning (it is on by
default).





     === Setting and getting the core mapping ===



val set_core_mapping : int array -> unit


set_core_mapping m installs the array m as the mapping to be
used to pin processes to cores. Process i will be pinned to
core m.(i mod Array.length m).
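
The pinning rule above can be sketched with plain array indexing. This only illustrates the formula: core_of_process and the example mapping are hypothetical names and values, not part of Parmap.

```ocaml
(* Core chosen for process [i] under mapping [m], following the rule
   m.(i mod Array.length m). [core_of_process] is an illustrative
   helper, not a Parmap function. *)
let core_of_process (m : int array) (i : int) : int =
  m.(i mod Array.length m)

let () =
  (* Hypothetical mapping: use only cores 0, 2, 4 and 6. *)
  let m = [| 0; 2; 4; 6 |] in
  List.iter
    (fun i -> Printf.printf "process %d -> core %d\n" i (core_of_process m i))
    [ 0; 1; 2; 3; 4; 5 ]
```

With four entries in the mapping, processes 4 and 5 wrap around to the first two entries. Passing the same array to set_core_mapping would install it for subsequent parallel calls.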





     === Getting the current worker rank ===



val get_rank : unit -> int






     === Sequence type, subsuming lists and arrays ===



type 'a sequence =
 | L of 'a list
 | A of 'a array







The parmapfold, parfold and parmap generic functions convert
the input data into an array internally; the 'a sequence type
is provided to allow passing either a list or an array as
input. If you want to map over an array and get an array
back, use array_parmap or array_float_parmap instead.



     === Optional init and finalize functions ===



The optional init (resp. finalize) function is called once by
each child process just after creation (resp. just before
termination). init takes the index of the child process as
its argument.





     === Parallel mapfold ===



val parmapfold : ?init:(int -> unit) ->
  ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int ->
  ('a -> 'b) -> 'a sequence -> ('b -> 'c -> 'c) -> 'c ->
  ('c -> 'c -> 'c) -> 'c


parmapfold ~ncores:n f (L l) op b concat computes
List.fold_right op (List.map f l) b by forking n processes
on a multicore machine. You need to provide the extra concat
operator to combine the partial results of the fold computed
on each core. If 'b = 'c, then concat may simply be op. The
order of computation in parallel changes w.r.t. sequential
execution, so this function is only correct if op and concat
are associative and commutative. If the optional chunksize
parameter is specified, the processes compute the result in
an on-demand fashion on blocks of size chunksize.
parmapfold ~ncores:n f (A a) op b concat computes
Array.fold_right op (Array.map f a) b.
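
As a minimal usage sketch, assuming the parmap library is installed and linked, the following sums the squares of a list on two cores. sum_of_squares is a name chosen for this example. Integer addition is associative and commutative with 0 as neutral element, so the same operator can serve as both op and concat.

```ocaml
(* Sum of squares computed with a parallel map followed by a fold.
   Arguments, in order: map function, input sequence, op, initial
   value, concat. *)
let sum_of_squares (l : int list) : int =
  Parmap.parmapfold ~ncores:2 (fun x -> x * x) (Parmap.L l) ( + ) 0 ( + )

let () =
  let l = [ 1; 2; 3; 4; 5; 6; 7; 8 ] in
  (* Sequential reference, written as in the description above. *)
  let sequential = List.fold_right ( + ) (List.map (fun x -> x * x) l) 0 in
  assert (sum_of_squares l = sequential);
  Printf.printf "sum of squares = %d\n" (sum_of_squares l)
```

An operator such as subtraction or string concatenation would not be a safe choice here, since the parallel evaluation order differs from the sequential one.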






     === Parallel fold ===



val parfold : ?init:(int -> unit) -> ?finalize:(unit -> unit) ->
  ?ncores:int -> ?chunksize:int -> ('a -> 'b -> 'b) ->
  'a sequence -> 'b -> ('b -> 'b -> 'b) -> 'b


parfold ~ncores:n op (L l) b concat computes
List.fold_right op l b by forking n processes on a multicore
machine. You need to provide the extra concat operator to
combine the partial results of the fold computed on each
core. If 'a = 'b, then concat may simply be op. The order of
computation in parallel changes w.r.t. sequential execution,
so this function is only correct if op and concat are
associative and commutative. If the optional chunksize
parameter is specified, the processes compute the result in
an on-demand fashion on blocks of size chunksize.
parfold ~ncores:n op (A a) b concat similarly computes
Array.fold_right op a b.
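
A small usage sketch, assuming the parmap library is installed and linked: the maximum of an int array computed with parfold. parallel_max is a name invented for this example. max is associative and commutative, and min_int is its neutral element on ints, so max is used for both op and concat.

```ocaml
(* Maximum of an int array via a parallel fold on two cores.
   Arguments, in order: op, input sequence, initial value, concat. *)
let parallel_max (a : int array) : int =
  Parmap.parfold ~ncores:2 max (Parmap.A a) min_int max

let () =
  let a = [| 3; 9; 2; 7; 5; 1 |] in
  (* Sequential reference for comparison. *)
  assert (parallel_max a = Array.fold_right max a min_int)
```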





     === Parallel map ===



val parmap : ?init:(int -> unit) -> ?finalize:(unit -> unit) ->
  ?ncores:int -> ?chunksize:int -> ('a -> 'b) -> 'a sequence ->
  'b list


parmap ~ncores:n f (L l) computes List.map f l by forking n
processes on a multicore machine. parmap ~ncores:n f (A a)
computes Array.map f a by forking n processes on a multicore
machine. If the optional chunksize parameter is specified,
the processes compute the result in an on-demand fashion on
blocks of size chunksize; this provides automatic load
balancing for unbalanced computations, but the order of the
result is no longer guaranteed to be preserved.
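
The effect of chunksize on result order can be seen in a short sketch, assuming the parmap library is installed and linked. The two function names are made up for this example. Without chunksize the result comes back in input order; with chunksize it should be treated as unordered, so the second result is sorted before use.

```ocaml
(* Even split among the cores: result order matches input order. *)
let squares_in_order (l : int list) : int list =
  Parmap.parmap ~ncores:2 (fun x -> x * x) (Parmap.L l)

(* On-demand chunks of two elements: order is not guaranteed. *)
let squares_any_order (l : int list) : int list =
  Parmap.parmap ~ncores:2 ~chunksize:2 (fun x -> x * x) (Parmap.L l)

let () =
  let l = [ 1; 2; 3; 4; 5 ] in
  let expected = List.map (fun x -> x * x) l in
  assert (squares_in_order l = expected);
  (* Compare as multisets, since the chunked result may be permuted. *)
  assert (List.sort compare (squares_any_order l) = List.sort compare expected)
```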





     === Parallel iteration ===



val pariter : ?init:(int -> unit) -> ?finalize:(unit -> unit) ->
  ?ncores:int -> ?chunksize:int -> ('a -> unit) ->
  'a sequence -> unit


pariter ~ncores:n f (L l) computes List.iter f l by forking
n processes on a multicore machine. pariter ~ncores:n f (A a)
computes Array.iter f a by forking n processes on a
multicore machine. If the optional chunksize parameter is
specified, the processes perform the computation in an
on-demand fashion on blocks of size chunksize; this provides
automatic load balancing for unbalanced computations.





     === Parallel mapfold, indexed ===



val parmapifold : ?init:(int -> unit) ->
  ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int ->
  (int -> 'a -> 'b) -> 'a sequence -> ('b -> 'c -> 'c) -> 'c ->
  ('c -> 'c -> 'c) -> 'c

Like parmapfold, but the map function gets as an extra
argument the index of the mapped element.





     === Parallel map, indexed ===



val parmapi : ?init:(int -> unit) -> ?finalize:(unit -> unit) ->
  ?ncores:int -> ?chunksize:int -> (int -> 'a -> 'b) ->
  'a sequence -> 'b list

Like parmap, but the map function gets as an extra argument
the index of the mapped element.
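
A brief sketch of the indexed variant, assuming the parmap library is installed and linked and that the index passed to the map function is the element's position in the whole input sequence. with_index is a name chosen for this example.

```ocaml
(* Pair every element with its index in the input list. *)
let with_index (l : string list) : (int * string) list =
  Parmap.parmapi ~ncores:2 (fun i s -> (i, s)) (Parmap.L l)

let () =
  List.iter
    (fun (i, s) -> Printf.printf "%d: %s\n" i s)
    (with_index [ "a"; "b"; "c"; "d" ])
```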





     === Parallel iteration, indexed ===



val pariteri : ?init:(int -> unit) -> ?finalize:(unit -> unit) ->
  ?ncores:int -> ?chunksize:int -> (int -> 'a -> unit) ->
  'a sequence -> unit

Like pariter, but the iterated function gets as an extra
argument the index of the sequence element.





     === Parallel map on arrays ===



val array_parmap : ?init:(int -> unit) ->
  ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int ->
  ('a -> 'b) -> 'a array -> 'b array


array_parmap ~ncores:n f a computes Array.map f a by forking
n processes on a multicore machine. If the optional chunksize
parameter is specified, the processes compute the result in
an on-demand fashion on blocks of size chunksize; this
provides automatic load balancing for unbalanced
computations, but the order of the result is no longer
guaranteed to be preserved.
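
A minimal usage sketch, assuming the parmap library is installed and linked: array_parmap takes an array directly and returns an array. lengths is a name invented for this example.

```ocaml
(* Length of every string in an array, computed on two cores.
   No chunksize is given, so the result order matches the input. *)
let lengths (words : string array) : int array =
  Parmap.array_parmap ~ncores:2 String.length words

let () =
  let words = [| "a"; "bb"; "ccc"; "dddd" |] in
  assert (lengths words = Array.map String.length words)
```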











     === Parallel map on arrays, indexed ===



val array_parmapi : ?init:(int -> unit) ->
  ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int ->
  (int -> 'a -> 'b) -> 'a array -> 'b array

Like array_parmap, but the map function gets as an extra
argument the index of the mapped element.





     === Parallel map on float arrays ===



exception WrongArraySize




type buf




val init_shared_buffer : float array -> buf


init_shared_buffer a creates a new memory mapped shared
buffer big enough to hold a float array of the size of a.
This buffer can be reused in a series of calls to
array_float_parmap, avoiding the cost of reallocating it
each time.



val array_float_parmap : ?init:(int -> unit) ->
  ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int ->
  ?result:float array -> ?sharedbuffer:buf -> ('a -> float) ->
  'a array -> float array


array_float_parmap ~ncores:n f a computes Array.map f a by
forking n processes on a multicore machine, and preallocating
the resulting array as shared memory, which allows
significantly more efficient computation than calling the
generic array_parmap function. If the optional chunksize
parameter is specified, the processes compute the result in
an on-demand fashion on blocks of size chunksize; this
provides automatic load balancing for unbalanced
computations, *and* the order of the result is still
guaranteed to be preserved.

If you already have an array in which to store the result,
you can squeeze out some more CPU cycles by passing it as
the optional parameter result: this avoids the creation of a
result array, which can be costly for very large data sets.
Raises WrongArraySize if result is too small to hold the
data.

It is possible to share the same preallocated shared memory
space across calls, by initialising the space with
init_shared_buffer a and passing the result as the optional
sharedbuffer parameter to each subsequent call to
array_float_parmap. Raises WrongArraySize if sharedbuffer
is too small to hold the input data.
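
The two optional parameters can be sketched as follows, assuming the parmap library is installed and linked. double_all and double_into are names made up for this example: the first reuses one shared buffer across calls, the second stores the output in a caller-supplied array.

```ocaml
(* Double every element, reusing a shared buffer created beforehand. *)
let double_all ~(buf : Parmap.buf) (a : float array) : float array =
  Parmap.array_float_parmap ~ncores:2 ~sharedbuffer:buf
    (fun x -> 2.0 *. x) a

(* Double every element, storing the output in [result], which must be
   large enough to hold it (otherwise WrongArraySize is raised). *)
let double_into ~(result : float array) (a : float array) : float array =
  Parmap.array_float_parmap ~ncores:2 ~result (fun x -> 2.0 *. x) a

let () =
  let a = Array.init 1000 float_of_int in
  (* One buffer, sized for [a], shared by both calls below. *)
  let buf = Parmap.init_shared_buffer a in
  let once = double_all ~buf a in
  let twice = double_all ~buf once in
  assert (twice.(10) = 40.0);
  let result = Array.make (Array.length a) 0.0 in
  let out = double_into ~result a in
  assert (out.(10) = 20.0)
```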





     === Parallel map on float arrays, indexed ===



val array_float_parmapi : ?init:(int -> unit) ->
  ?finalize:(unit -> unit) -> ?ncores:int -> ?chunksize:int ->
  ?result:float array -> ?sharedbuffer:buf ->
  (int -> 'a -> float) -> 'a array -> float array


Like array_float_parmap, but the map function gets as an
extra argument the index of the mapped element.





     === Debugging ===



val debugging : bool -> unit


Enable or disable debugging.





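     === Helper function for redirecting stdout and stderr ===



val redirect : ?path:string -> id:int -> unit


Helper function that redirects stdout and stderr to files
located in the directory path, named stdout.NNN and
stderr.NNN, where NNN is the id passed to the function.
Useful when writing initialisation functions to be passed as
the init argument to the parallel combinators. The default
value of path includes the process id of the main program.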
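
A sketch of the intended use, assuming the parmap library is installed and linked: redirect is called from the init function, which receives the index of the child process. iter_with_logs and the log directory are names made up for this example, and the directory is created up front rather than relying on redirect to do so.

```ocaml
(* Run [f] on every element of [l] in two worker processes, sending each
   worker's stdout and stderr to files under [log_dir]. *)
let iter_with_logs ~(log_dir : string) (f : 'a -> unit) (l : 'a list) : unit =
  if not (Sys.file_exists log_dir) then Sys.mkdir log_dir 0o755;
  Parmap.pariter ~ncores:2
    ~init:(fun id -> Parmap.redirect ~path:log_dir ~id)
    f (Parmap.L l)

let () =
  (* Hypothetical location for the per-worker log files. *)
  let log_dir =
    Filename.concat (Filename.get_temp_dir_name ()) "parmap-logs" in
  iter_with_logs ~log_dir
    (fun x -> Printf.printf "processing %d\n%!" x)
    [ 1; 2; 3; 4 ]
```

The redirection happens inside each child process, so the parent's own stdout and stderr are left untouched.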