The issue of how to couple a wall-stress model to an LES code is code-specific, so the following is meant primarily as an illustration. For purposes of illustration, we consider a finite-volume code with an unstructured grid in a complex geometry. Specifically, we consider a single wall-face with the centroid located at \(\vec{x}_w\) and with a normal vector \(\vec{n}\) that is pointing towards the fluid.

The key modeling parameter in a wall-stress model is the thickness \(h_{\rm wm}\) over which it should be applied. In most circumstances, \(h_{\rm wm}\) should be chosen as about 10% of the local boundary layer thickness \(\delta\). The fact that \(h_{\rm wm}\) depends on the solution is of course not ideal, but is no different from the more general fact that the grid requirements in LES also depend on the solution. Nevertheless, having chosen \(h_{\rm wm}\), we will solve the wall-stress model between the wall location \(\vec{x}_w\) and the location \(\vec{x}_{\rm wm-top} = \vec{x}_w + h_{\rm wm} \, \vec{n}\). The boundary condition for the wall-stress model at \(\vec{x}_{\rm wm-top}\) should be taken from the LES. In most circumstances, interpolation from the LES cells is necessary. For unstructured grids, it will probably make sense to find the cells involved in the interpolation stencil during a pre-processing step, and then to store those cell indices for each wall face. The real challenge with unstructured grids is that some or all of those cells may reside on a different MPI process. There are basically 2 different solutions to this: (i) to build parallel communicators during a pre-processing step; or (ii) to modify the domain decomposition (ParMetis or similar) to ensure that the relevant cells are on the same MPI process. It may be tempting to avoid this issue by taking the closest point possible on the same MPI process as \(\vec{x}_w\), but this is a bad idea — it makes the whole WMLES model dependent on the domain decomposition, and one will get different results depending on the number of MPI processes. We (JL, in early, unpublished work) actually tested this, and found differences even in average quantities!

In any case, we now assume that we have interpolated the velocity vector \(\vec{u}\) to the \(\vec{x}_{\rm wm-top}\) location (let’s call this \(\vec{u}_{\rm wm-top}\)). Most wall-stress models assume flow parallel to the wall, and we therefore must decompose the velocity vector in a coordinate system locally aligned with the wall. We take \(\vec{u}_{\rm wm-top} = \vec{u}_{{\rm wm-top}, \parallel} + \vec{u}_{{\rm wm-top}, \perp}\) where \(\vec{u}_{{\rm wm-top}, \perp} = \left( \vec{u}_{\rm wm-top} \cdot \vec{n} \right) \vec{n}\).

The magnitude of the wall-parallel velocity \(\left| \vec{u}_{{\rm wm-top}, \parallel} \right|\) is then fed to the wall-stress model to be used as a Dirichlet boundary condition at the top of the wall-modeled layer. The wall-stress model will then return the magnitude of the wall-stress \(\left| \vec{\tau}_{w, \parallel} \right|\).

The final step is to assemble the full wall-stress vector as \(\vec{\tau}_w = \vec{\tau}_{w, \parallel} + \vec{\tau}_{w, \perp}\). Most wall-stress models assume parallel and uni-directional flow, and thus we generally take

\begin{equation} \label{eqn:tauw_u_relation} \tag{1}

\vec{\tau}_{w, \parallel} = – \, \vec{u}_{{\rm wm-top}, \parallel} \,

\frac{ \left| \vec{\tau}_{w, \parallel} \right| }{ \left| \vec{u}_{{\rm wm-top}, \parallel} \right| }

\end{equation}

This may be inaccurate in situations with significant and systematic (i.e., not just instantaneous) 3D (“Ekman”) effects in the inner part of the boundary layer; however, the error of this approximation has not been studied in isolation from many other errors, and thus it remains unknown whether it is an issue or not. Note, for example, that Bermejo-Moreno et al (2014) found good agreement with experiments in their WMLES of duct flow (where the corner flow has 3D effects) despite using this assumed alignment of the wall-stress and the velocity.

Finally, since most wall-stress models assume wall-parallel flow they say nothing about the wall-normal stress \(\vec{\tau}_{w, \perp}\). This can then be modeled using a standard finite-difference operation from the first off-wall cell (just like in a regular LES code). Alternatively, since this wall-normal stress is often very small, it can probably be set to zero without any significant errors. In our own implementations, we have always used a simple finite-difference from the first off-wall cell, exactly as in the baseline LES code.

Having assembled the full wall-stress vector \(\vec{\tau}_w\), one then assigns this at the wall-face when computing the flux in the finite-volume method. In a finite-difference method, one would probably use the wall-stress vector to replace the viscous stress term before taking the divergence.

In compressible solvers, one additionally assigns the heat flux \(q_w\) to the wall-face (for isothermal boundaries) or the wall temperature \(T_w\) (if the heat flux is specified).

### Computational cost

The computational cost of the coupling procedure here is trivial, but the cost of solving the wall-stress model itself can be significant, especially if it induces parallel load imbalance. A good approach to reducing the load imbalance problem is to modify the domain decomposition such that it aims for domains that have equal numbers of both cells *and* wall-faces that require the solution of a wall-stress model (dual-constraint decomposition). This can often be done for unstructured codes.

An additional way to reduce the computational cost is to solve the wall-stress model only once per time step (e.g., in a Runge-Kutta method or in an iterative solution of an implicit solver). This can be done by recognizing that the whole wall-modeling philosophy necessarily implies that the modeled \(\vec{\tau}_w\) will lack any high-frequency oscillations, and thus can never approximate the instantaneous \(\vec{\tau}_w\) — the best we can hope for is that it matches the average \(\vec{\tau}_w\), and that it matches those frequencies in \(\vec{\tau}_w\) that correspond to time-scales in the log-layer. Given this, we can interpret Eqn. (\ref{eqn:tauw_u_relation}) as taking the instantaneous \(\vec{u}_{{\rm wm-top}, \parallel}\) and multiplying it by a “transfer-function” \(\left| \vec{\tau}_{w, \parallel} \right| / \left| \vec{u}_{{\rm wm-top}, \parallel} \right|\) that scales the magnitude of the vector. We can then approximate this “transfer function” by computing it infrequently and storing it for re-use in subsequent RK substeps or iterations. In other words, we don’t assume that the \(\vec{\tau}_w\) stays constant, but that the “transfer function” does.