Architecture
Architecture block diagram
Design Rules
Principles
- Minimized features for agility and maintainability
- Minimized technical debt in the mainstream development
- Well tested
Data
- SoA for GPU and SIMD optimizations
- Avoiding nested data structures in core for easier overview of the code logic
- The Core is pure and fully testable
- Random generator should not rely on global state.
- No allocation or freeing memory. Ideally, all memories, including the temporary buffers, should be prepared by CROCK or the bindings.
- If MPI will be used, it will be involved at a higher layer.
- Precision of all float numbers should be globally switchable
- Though, the calculation speed of single precision is the same as double precision on modern CPUs, single precision numbers still have advantages in
- GPU calculation
- MPI data transfer
- revealing the numerical instabilities hidden in the last few mantissa digits of double precision with less simulation steps
- However, the precision of 32-bit float
- is insufficient for many calculations
- requires relaxed thresholds for convergence check in search algorithms, as well as for setting up test cases
- The type long double with 80 or 128 bits can be used to
- check whether an instability is related to the precision
- obtain higher-precision solutions for setting up test cases
- Switchable precision of float numbers makes ROCK difficult to use BLAS/LAPACK. Furthermore, long double is not supported by BLAS/LAPACK at all.
- BLAS or not BLAS
- Field calculations involves linear vector operations (norms, inner products, filling diagonal rows by axpy) and tri-diagonal matrix operation (multiplication with vectors, cyclic reduction, etc.). They benefit from BLAS.
BLAS is less used in electron trajectory calculations. For example, evaluating Lorentz force is the hotspot, whose first step is to calculate the Larmor radius using:
const real rl = cc->bv_blt[BV(ib, B_UZ)] * cc->bv_blt[BV(ib, B_AP)]
/ cc->bv_blt[BV(ib, B_W)]
/ ((
real)(-C_QEME) * c->input.vn_mag_axis[iz]);
double real
Definition types.h:13
as well as other operations which cannot be expressed in BLAS operations.
- Profiler shows that the time for field calculation is less than 1% of the total time in a realistic simulation iteration.
- Therefore, BLAS may not bring too much performance improvement but introduces build dependencies.
Error Handling
- In the following context, errors include also warnings
- Abandon Solutions
- Callback loggers are avoided due to GPU code generation.
- Also, prints are not considered for side effects.
- Returning the error message as string pointer is possible but non-optimum, since it does not reflect the priority of errors.
- To simplify the error handling, the following rules are designed
- Error code (it is called in following the error number) is returned.
- There will be only one error returned each time.
- By multiple errors, only the one with the largest number is returned.
- Therefore, warnings are assigned with lower numbers than errors.
- Rules
- The error/warning number starts from zero.
- Zero means success.
- There are two dummy IDs: W_TOTAL and E_TOTAL, that
- General errors have higher priority than problem-specific errors
Conventions
Generals
- APIs are in SI-units.
- Frequencies in APIs are 𝑓 rather than ω (while internal interfaces may allow ω for optimized performance).
- Since the word “spread” is vague, the definition “span” is used for the width of stochastic sampling. Spans are all relative and double-side. Span=0.2 means ±10% on each side of the reference value. Gaussian distributions are cropped at ±2σ, meaning that span≔4×σ/reference.
- Pitch factors and Lorentz factors are in Gaussian distributed.
- Guiding centers are uniformly distributed.
- Convention for the index in this document:
- Multidimensional arrays are indexed in the C-order.
- The notation of index range [0:N] includes 0 but excludes N, like the Python convention.
- Since the core is implemented in pure C, without typedef (except the global real type), the Linux kernel lower_case convention is applied for the identifiers except macro which are usually in UPPER_CASE.
- The term “nodes” are the mesh nodes along the 𝑧-axis. Nodes are indexed by 𝑧, for example, as vn_* arrays. Note that vz_* is intentionally avoided since it is confusing with the velocity $𝑣_𝑧$.
- The word “size” is confusing, because it could have the following meanings. We take the first meaning due to the built-in function sizeof().
- The number of bytes
The number of elements in an array or vector
- The word “length” is confusing, too. We use the word length for
- String length, as in strlen()
- The physical length, like 1 meter
Suffixes
- *_cs (cs means cos and sin) are normalized complex numbers, which have the form of CPLX(cos(angle), sin(angle)). Since the sincos() calls are so costly, they have to be pre-calculated. Batched pre-calculations can be better vectorized, which further improve the performance. Then in loops the sincos(angle1+angle2) are evaluated by complex multiplications of sincos(angle1) and sincos(angle2) .
- *_omp marks internal functions which contains OpenMP/multithread parallelization, in order to avoid nested multi-threading.
- The invoker should not inherate this suffix.
- APIs (cr_*) are OMP by default. Therefore should not contain this suffix.
- *_offload are the GPU-offloaded variant of functions, whose arguments are host pointers and variables.
- *_gpu are the GPU functions (not the kernels!), whose arguments are device pointers and variables.
Prefixes
- Functions of CROCK/CRACK APIs have the common prefix cr_*. In order to
- Distinguish the API and its proxied internal core function, which may have slightly different arguments.
- Avoid polluting the namespace of the caller (mid-ware/frontend) program.
- dim_ stands for dimension, which may also cause confusion. This word is used for:
- “Length” of a vector, i.e. the number of elements in a vector (which can be a row or column of a matrix), or the total number of elements in a matrix.
- Point: 0D, line: 1D, surface: 2D, volume: 3D etc., however, in this case the word “dimension” is shortened as d or D, rather than dim.
Shape of a multi-dimensional array. (We choose the word “shape” for that.)
- num_ generally indicates discrete numbers (typically integers). Difference to dim_ is that dim_’s are directly related to the data storage.
- make_ is just a universal word indicating an action (i.e. function / procedure).
- make_* is more preferable than
- get_* since those actions are more complicated than usual getters;
- calc_* since heavy calculations are not always required.
- The outputs are usually passed by pointers.
- These functions usually return error codes, see section Error Handling.
- Memory (de)allocation should not pass through the boundary of a make_* function.
- tdm_* are related to Tri-Diagonal-Matrix, which consists of dim_z rows.
- opt_* are optional (NULLable) input or output parameters.
- Regarding SoA layouts:
- vn_* are vector(s) of nodes indexed by [iv, in], which has the shape [N, dim_z].
- vm_* are value(s) of modes indexed by [iv, im], which has the shape [N, dim_m].
- For waves, the index order mvn_* applies.
- A variant of mvn_* is mv_tdm_*, where tdm is the "spatial" coordinate instead of the raw node.
- For beamlets, specific index orders different from mvn_* are applied.
API Arguments
To be simple, data are stored in raw 1D vectors, which can be reshaped into multi-dimensional arrays. The following table contains the conventions of the variables which appear as arguments.
| Name | Type | Shape | Description |
| beam_init | real | [DIM_BEAM_INIT] | The statistic initial parameters of beams, see enum ROW_OF_BEAM_INIT. |
| bvn_blt_trj | real | [dim_b, DIM_BLT_TRJ, dim_z] | Beamlet trajectory, see enum ROWS_OF_BLT_TRJ |
| dim_b | size_t | scalar | Number of beamlets |
| dim_bcc | size_t | scalar | Dimension of radial grids for beam coupling coefficients |
| dim_m | size_t | scalar | Total number of modes |
| dim_z | size_t | scalar | Number of mesh nodes |
| mv_tdm_zk | real | [dim_m, DIM_TDM(dim_z)] | FEM matrix operation (Mz-Mk) for each mode |
| mvn_env | cplx | [dim_m, dim_z] | State of the envelope |
| mvn_mod | real | [dim_m, DIM_MOD, dim_z] | Multi-mode collection of vn_mod |
| mvn_src | cplx | [dim_m, dim_z] | Excitation (source) for the envelope equation without FEM-weighting (Unit in TE-mode: V/m^2) |
| mvn_uuload | real | [dim_m, dim_z] | "Very" unscaled ohmic load, [abs(mvn_env[m])^2 × mvn_uuload for m in modes] results in vn_uload |
| num_azi | int16_t | scalar | Azimuth mode number, whose sign represents the co- and counter-rotation |
| num_rad | int16_t | scalar | Radial mode number, whose negative sign is reserved for TM-modes |
| nvmh_bcc | real | [dim_z, dim_bcc, dim_m, DIM_HARM] | The grid of beamlet coupling coefficients where the v axis is an equidistant grid of radial span |
| tdm_c | real | [DIM_TDM(dim_z)] | FEM matrix Mc |
| vm_bc | cplx | [DIM_BC, dim_m] | Boundary condition, see enum ROW_OF_BC |
| vm_bd | cplx | [DIM_BD, dim_m] | Boundary data, which is dU/dz at left and right boundaries, see enum ROW_OF_BD |
| vm_bp | real | [2, dim_m] | Boundary phases, for the transient updating of boundary condition, [0, :] are the left side while [1, :] are the right side |
| vm_cf | real | [dim_m] | Carrier frequency in Hz (caution: not omega!) |
| vm_cp | real | [dim_m] | Carrier phase in rad |
| vm_df | real | [dim_m] | Delta frequency (drift of carrier) in Hz |
| vm_num | int16_t | [2, dim_m] | [0, :] = vector of num_azi while [1, :] = vector of num_rad |
| vm_rfc | cplx | [dim_m] | (Complex) Resonance Frequency of Cold cavity |
| vn_blt_avg | real | [DIM_BLT_AVG, dim_z] | Averaged beamlet data at each node, see enum ROWS_OF_BLT_AVG |
| vn_mod | real | [DIM_MOD, dim_z] | Parameters of a single mode along the 𝑧-discretization |
| vn_mrs | real | [dim_z] | Mesh Refinement Score of vn_z, a high value means the mesh is too coarse. The score at vn_mrs[i] indicates the mesh segment between vn_z[i] and vn_z[i+1] |
| vn_uload | real | [dim_z] | Unscaled ohmic load, vn_uload / sqrt(sigma) is the true load |
| vn_z | real | [dim_z] | Nodes of 1D discretization of 𝑧-axis |
IMPORTANT We treat num_azi=0 as having the same effect as a positive num_azi. (This choice is flexible but must be consistent.)
Arguments are sorted in the order (skip the element if not applicable):
- Individual inputs, e.g., time step
- Conventional inputs, in the order of
- Inputs depending only on mode
- dim_m
- vm_* (excluding mv_tdm_*)
- Inputs depending only on node
- dim_z
- vn_* (or their rows) in the order of
- tdm_*
- Inputs depending only on both mode and node
- Inputs depending only on beamlet
- [in,out] variables (and dimensions)
- [out] variables (and dimensions)
Miscellaneous
- sincos() is not in C standard and therefore not used. Neither is the pair manually calculated by sqrt. The optimization to a sincos() call should rely on the compiler. So, check in the profiler whether sincos() is used instead of individual sin() and cos().
- [C99] Complex I:
- Just using I or _Complex_I for re+im*Idoes not work on Visual C. Besides, since GCC lacks of _Imaginary_I and uses _Complex_I, INF*I = INF*(0,1) == (NAN,INF) rather than (0,INF).
- If I needs to be involved, use the own-declared II instead of I, see the code docs.
- [C99] An own implementation of CPLX is used instead of CMPLX for the following reasons:
- CMPLX in GCC's header (as standard in most Linux) is hidden from clang due to the missing of GCC-specific version macro __GNUC_PREREQ.
- CMPLX is not generalized for CMPLXF and CMPLXL.