Performance Optimisation - when MICRESS gets very slow


Post by Bernd » Sat Oct 14, 2017 1:41 pm

Dear users,

we are all familiar with the fact that phase-field simulations can take a very long time to run, and that, essentially, the upper limit of the simulation time depends only on our own patience and on that of the people waiting for our results. However, such huge simulation times are not always realistic or really necessary. Whenever MICRESS seems to be too slow, or when you need to optimise a MICRESS simulation for performance (e.g. before starting a series of production runs with parameter variation), it is time to think about performance.
In such a case, the first step should be to have a look at the .TabP output file. If this file shows only the headlines but no data apart from time step 0, you should request a smaller interval for the tablog output, which controls the writing of .TabL, .TabT, .TabP and .TabTQ:

#
# Selection of the outputs
# ========================
# [legacy|verbose|terse]
...
# Should monitoring outputs be written out? ('TabL')
# Options: tab_log [simulation time, s] [wallclock time, min] no_tab_log
tab_log 0.002
#


The content of the .TabP file tells us in which parts of the code most of the time is spent. From this information, we can also deduce the probable reasons for the time loss and which measures could be taken.

First of all, we should look at columns 2 and 3, which show the cumulative CPU and wallclock time. In a serial computation, the two values should be similar. If the wallclock time is much bigger than the CPU time, this indicates that time is lost in actions which do not need CPU, like output or idle waiting. A typical reason for this type of time loss is too frequent output (including tablog!) in combination with a slow network connection. In this case, you should consider reducing the number of outputs if they are not really needed, or writing the data locally on the computer instead of over the network.
In case of a parallel computation, the CPU time is typically larger than the wallclock time by roughly the number of threads if parallelisation is "in action". Unfortunately, idle threads are also counted, so no conclusion about the parallel efficiency can be drawn from .TabP.
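
If you want to check this quickly without opening the file by hand, a small script can do it. The following is only a sketch under the assumptions made above (whitespace-separated columns, cumulative CPU time in column 2 and wallclock time in column 3 of .TabP); the file name "Simulation.TabP" and the 1.5 / 0.7 thresholds are just placeholders which you should adapt to your own case:

import sys

def last_data_row(path):
    # return the last line of the file that parses completely as numbers,
    # skipping headlines and comment lines
    last = None
    with open(path) as f:
        for line in f:
            try:
                values = [float(x) for x in line.split()]
            except ValueError:
                continue
            if values:
                last = values
    return last

row = last_data_row(sys.argv[1] if len(sys.argv) > 1 else "Simulation.TabP")
if row is None:
    sys.exit("no data rows found - maybe the tablog interval is still too large?")

cpu, wall = row[1], row[2]          # columns 2 and 3 (1-based) of .TabP
ratio = wall / cpu if cpu > 0 else float("inf")
print(f"cumulative CPU time      : {cpu:12.1f}")
print(f"cumulative wallclock time: {wall:12.1f}")
print(f"wallclock / CPU ratio    : {ratio:12.2f}")
# the thresholds below are arbitrary, they only flag the two situations discussed above
if ratio > 1.5:
    print("-> much more wallclock than CPU time: probably output or idle waiting")
elif ratio < 0.7:
    print("-> much more CPU than wallclock time: looks like a parallel run (idle threads are counted, too)")
else:
    print("-> CPU and wallclock time are similar, as expected for a serial run")

Called e.g. as "python3 check_tabp.py MyProject.TabP", it prints the two times and a rough interpretation.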

Another candidate for excessive time loss is the TQ usage (column 4 of .TabP). The .TabTQ output can furthermore be used to get a more differentiated view if several phase interactions are involved. Typically, a high TQ time loss is due to the use of many elements, the use of a local relinearisation scheme, short updating intervals, or frequently occurring TQ errors. Updating of the diffusion data is also included in this time.

If the numerical time step is very small, this typically leads to a large time usage in the phase-field solver (column 5), the interface list operations (column 8), and the solute redistribution (contained in column 6). It should be checked whether a bigger phase-field time step can be used without causing problems, by setting the minimum time step together with the "automatic_limited" keyword in the time input.

If mainly column 6 shows a high time usage, the diffusion solver takes most of the time. It should be checked whether there are components which diffuse so fast that they are practically homogeneously distributed in the corresponding phase, so that infinite diffusion could be used instead. Otherwise, parallelisation should be used (which is typically very effective in such cases), or the grid spacing needs to be increased.
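
To get a quick overview of which of these parts dominates, the last row of .TabP can be broken down in the same way as above. Again, this is only a sketch: the column indices (4, 5, 6, 8) and their labels follow the description given in this post, the total is taken from the CPU time in column 2, and "Simulation.TabP" is just a placeholder; please check everything against the headlines of your own .TabP file before trusting the numbers.

import sys

COLUMNS = {          # 1-based column index -> label as used in this post
    4: "TQ usage",
    5: "phase-field solver",
    6: "diffusion solver / solute redistribution",
    8: "interface list operations",
}

def last_data_row(path):
    # return the last line that parses completely as numbers (headlines are skipped)
    last = None
    with open(path) as f:
        for line in f:
            try:
                values = [float(x) for x in line.split()]
            except ValueError:
                continue
            if values:
                last = values
    return last

row = last_data_row(sys.argv[1] if len(sys.argv) > 1 else "Simulation.TabP")
if row is None or len(row) < max(COLUMNS) or row[1] <= 0:
    sys.exit("no usable data row found in .TabP")

total_cpu = row[1]                  # column 2: cumulative CPU time
# print the selected columns sorted by their time share
for col, label in sorted(COLUMNS.items(), key=lambda kv: row[kv[0] - 1], reverse=True):
    t = row[col - 1]
    print(f"{label:42s} {t:12.1f}  ({100.0 * t / total_cpu:5.1f} % of CPU time)")

The column with the largest share is the first place to look for one of the measures described above.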

Furthermore, there are other columns which give information about the time spent in specific solvers or features like nucleation, the stress solver, or the enthalpy calculation. Please note that TQ usage during the checking for nucleation appears in the nucleation column and not in the TQ usage.

Once the place where the time is lost has been located in the .TabP (and possibly .TabTQ) output, the underlying problem can be identified and hopefully solved. If the time usage appears quite evenly distributed over many columns, this can be a sign that optimisation has already been quite successful - or that optimisation is needed in several places at once.

Bernd
