sfincs_jax
sfincs_jax is a production neoclassical transport code for radially local drift-kinetic calculations in stellarator and tokamak geometry. It combines high-fidelity kinetic models, CPU/GPU execution, matrix-free numerics, and optional differentiable solve paths in one codebase.
Current release snapshot
On the current main branch:
- the full audited 39-case example suite runs cleanly on CPU and GPU,
- the default CLI and write-output path are validated across the release-facing scope with no practical or strict mismatches,
- the Python API can switch to differentiable solve paths when end-to-end sensitivities are needed,
- and the remaining open work is performance and memory tuning on the heaviest cases, not correctness of the documented workflows.
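To illustrate what a differentiable solve path means for a matrix-free linear solve, here is a minimal sketch in plain JAX. It does not use the sfincs_jax API: the operator `apply_operator`, the parameter `theta`, the right-hand side `b`, and the weights `w` are all hypothetical stand-ins. It relies on `jax.scipy.sparse.linalg.gmres`, whose derivatives JAX obtains from an additional linear solve with the transposed operator rather than by differentiating through the Krylov iterations.

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import gmres

jax.config.update("jax_enable_x64", True)  # double precision for a tight check

N = 32
b = jnp.linspace(1.0, 2.0, N)   # toy right-hand side (assumption)
w = jnp.cos(jnp.arange(N))      # toy weights defining a scalar output (assumption)


def apply_operator(theta, x):
    """Toy matrix-free operator: a diagonally dominant periodic stencil."""
    return (2.0 + theta) * x - 0.3 * jnp.roll(x, 1) - 0.6 * jnp.roll(x, -1)


def objective(theta):
    """Scalar functional of the solution of A(theta) x = b."""
    x, _ = gmres(lambda v: apply_operator(theta, v), b, tol=1e-12)
    return jnp.vdot(w, x)


theta0 = 1.0
value = objective(theta0)
# The gradient with respect to theta comes from an adjoint (transposed) solve.
grad_implicit = jax.grad(objective)(theta0)
print(float(value), float(grad_implicit))
```

The operator is only ever applied to vectors, so no dense matrix is assembled, and the cost of the gradient is roughly one extra solve regardless of how many Krylov iterations the forward solve needed.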
Current main also contains bounded research-lane evidence for mapped speed
grids, QI seed robustness, solver-policy extraction, optimization promotion,
and single-case sharding. These artifacts are documented with their claim
boundaries: mapped-grid tests cover PAS RHSMode=2 smoke/reduced comparisons, the
QI kinetic lane has a first low-resolution CPU/GPU/Fortran promotion artifact
plus two bounded refined CPU/GPU/Fortran rungs, and production-resolution QI,
true device-QI, and single-case multi-GPU strong scaling remain explicit
research lanes until their promotion gates pass.
What this documentation covers
This manual is organized around the actual user and developer workflows:
- Physics model and equations, Drift-kinetic equation and system of equations, Geometry models and loading
- Inputs (namelist) reference, Outputs (HDF5, NetCDF4, and NPZ), Applications and research workflows
- Parallelism, Performance and differentiability, Testing, validation, and CI
- Validation against reference implementations, and References and related work
Release benchmark for reference-runtime-window rows whose SFINCS Fortran v3
reference runtime is at least 10 s. Panel A compares wall-clock runtime and Panel B
compares active solver memory for SFINCS Fortran v3, sfincs_jax CPU
cold/warm, and sfincs_jax GPU cold/warm. Fortran memory is process
maximum RSS; JAX memory uses profiler RSS deltas over the fixed runtime
baseline, with full process RSS retained in the JSON reports. Cases are
ordered by best warm sfincs_jax speedup over the Fortran v3 runtime. Reproduce with
examples/publication_figures/generate_fortran_suite_benchmark_summary.py.
Publication-facing validation dashboard from checked-in collisionality and
electric-field sweep artifacts. Reproduce with
examples/publication_figures/generate_validation_dashboard.py.
Compile-time versus warm steady-state runtime for representative transport cases.
Reproduce with examples/performance/profile_transport_compile_runtime_cache.py.
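The cold versus warm distinction in the captions above can be sketched with a generic JAX timing pattern. This is an illustration only and is not the referenced profiling script: `toy_kernel` and `timed_call` are hypothetical names, and the workload is a stand-in rather than a transport solve. The first call of a jitted function includes tracing and compilation; later calls with the same input shapes reuse the compiled executable.

```python
import time

import jax
import jax.numpy as jnp


@jax.jit
def toy_kernel(x):
    """Stand-in workload; not a sfincs_jax routine."""
    for _ in range(4):
        x = jnp.tanh(x @ x.T / x.shape[0])
    return x


def timed_call(fn, x):
    """Return (seconds, result), waiting for asynchronous dispatch to finish."""
    start = time.perf_counter()
    out = fn(x).block_until_ready()
    return time.perf_counter() - start, out


x = jnp.ones((256, 256))
cold_s, cold_out = timed_call(toy_kernel, x)  # trace + compile + execute
warm_s, warm_out = timed_call(toy_kernel, x)  # cached executable only
print(f"cold: {cold_s:.4f} s, warm: {warm_s:.4f} s")
```

Calling `block_until_ready()` matters because JAX dispatches work asynchronously; without it the timer could stop before the device has finished. Reported numbers depend on hardware and on whether a compilation cache is already populated.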
Contents
- Installation
- Applications and research workflows
- Optimization Workflows
- Examples
- Finite-beta VMEC-JAX to kinetic transport
- Transport matrices (RHSMode=2/3)
- Upstream postprocessing (utils/)
- Optimization + figures
- Implicit differentiation through solves
- VMEC-to-Boozer differentiable geometry handoff
- JIT-compiled optimization with implicit gradients
- Parallel and scaling examples
- Transport-matrix recycling warm starts
- Upstream SFINCS example inputs
- Usage
- Parsing an input file
- Building v3 grids and geometry
- Supported geometry examples
- Applying operator building blocks
- Running the Fortran v3 executable
- First CLI run
- Advanced linear-state export
- Advanced solver controls
- Parallel CLI controls
- Solver controls (environment variables)
- Writing output files with sfincs_jax
- Running an Er scan (transport-matrix mode)
- Running upstream postprocessing scripts (utils/)
- Inputs (namelist) reference
- Outputs (HDF5, NetCDF4, and NPZ)
- Normalizations and units
- Geometry models and loading
- VMEC JAX workflow
- Method overview
- Numerics and algorithms
- Source-code map
- Theory from the upstream SFINCS notes
- Ordering and reduction used by SFINCS v3
- Normalization hierarchy
- Drives, fluxes, and transport-matrix columns
- Constraint structure and nullspaces
- Monoenergetic and DKES-like limits
- Fokker-Planck field terms and Rosenbluth potentials
- Phi1, quasineutrality, and poloidally varying collisions
- What this means for sfincs_jax engineering
- Primary upstream sources summarized here
- Physics model and equations
- Physics reference and code map
- Governing drift-kinetic equation
- Single-species baseline (20131220-04)
- Multi-species extension (20131219-01)
- Guiding-center drifts and trajectory models
- Collision operators (PAS and full Fokker–Planck)
- Phi1 and quasineutrality
- Phi1 impact on flux definitions (20150325-01)
- Transport matrix and Beidler notation
- Single- vs multi-species normalization
- Monoenergetic control parameters (nuPrime, EStar)
- Constraint schemes and source terms
- Classical radial fluxes
- DKES compatibility notes
- Equation-to-code map
- Numerical implementation notes
- References (vendored)
- Drift-kinetic equation and system of equations
- Parallelism
- Why parallelism matters
- Parallelism in JAX
- Parallelism in SFINCS (Fortran v3)
- Parallelism in sfincs_jax
- Design choices and parity
- Step (1): Parallel whichRHS
- Transport-worker scaling audit
- Deterministic benchmark plans
- Earlier runs (smaller grids)
- Reduced‑suite parallel sanity checks
- JIT/compilation notes
- Step (2): Parallel cases / scans
- Scaling to dozens/hundreds (job arrays)
- Step (3): Sharded matvec (SPMD)
- Sharded matvec scaling (single RHS)
- Single-device derivative-kernel speedup
- X‑sharded matvec scaling (single RHS)
- Shard_map halo prototype (uneven partition evaluation)
- Sharded solve scaling (single RHSMode=1)
- Why scaling is still poor for single‑RHS GMRES
- Next‑step plan
- Verification
- Recommended workflows
- Executable-first rollout
- High-collisionality transport scans
- Parity and determinism
- Performance notes
- Measured large-case scaling snapshot
- Fresh two-GPU throughput rerun
- Fresh two-GPU transport-worker rerun
- Recent sharded-solve updates
- Open research lanes
- Performance and differentiability
- Current release snapshot
- Production-resolution benchmark tier
- Targeted solver profiling
- External solver-library gates
- What is differentiable today?
- JAX-native performance patterns used in sfincs_jax
- Explicit sparse host/device split helper
- Solver defaults (Phi1 + sharding)
- Historical profiling notes
- Krylov solver strategy (memory + recycling)
- RHSMode=1 GMRES preconditioning (experimental)
- Future optimization ideas (optional)
- Links to the JAX ecosystem (optional)
- Parity tuning environment variables (developer)
- Reference benchmark figure (README/index)
- Persistent-cache compile/runtime split
- Deep profiling without perturbing GPU timings
- Memory footprint and compilation-time optimization (literature-backed)
- Connection to reduced-model adjoint methods
- Operator-level parity debugging utility
- Performance techniques (full detail)
- Baseline model and linear system (v3)
- What SFINCS v3 does (for performance context)
- Comparison with Fortran v3 workflow
- Matrix-free operator application (A·x) and caching
- Structured solve admission gate
- JIT compilation and persistent compilation cache
- Geometry/output caching
- Active-DOF reduction (sparse pitch grid)
- Krylov solver strategy (short recurrence + fallback)
- Host-only LGMRES fast path
- Structural sparse-host RHSMode=1 path
- Frozen-case variant benchmarking
- Adaptive PAS smoother stage
- Implicit differentiation through linear solves
- Transport preconditioning (RHSMode=2/3)
- RHSMode=1 preconditioning (matrix-free)
- Matvec fusion for collisionless + drift terms
- Sparse-row derivative kernels (step 1)
- Transport diagnostics: batched + precomputed
- Recycled Krylov initial guesses for transport
- Weighted reductions and Fortran sum order
- Dense fallbacks (RHSMode=1 vs transport)
- Memory reduction: remat/checkpoint + short recurrence
- Memory reduction: operator representation and measured gates
- JAX ecosystem gates
- Geometry parsing cache
- F-block operator cache
- Performance deltas (where measured)
- Implementation map (source code)
- Summary of tuning knobs
- References (vendored)
- External performance references
- Development Roadmap
- Differentiable adaptive speed grids
- Testing, validation, and CI
- Validation Matrix
- Paper figures (reproduced)
- Bundled source literature
- Fortran v3 example suite status
- Utils (ported SFINCS v3 scripts)
- Execution mode
- Quick start
- Scan directives (!ss)
- Single‑run plotting
- Scan launchers (run sfincs_jax)
- Run-spec files (runspec.dat)
- Scan plotting
- Radial scan helper
- profiles file format (profilesScheme=1)
- Monoenergetic transport coefficients
- Bootstrap current vs collisionality
- Model test (impurity density)
- Generating the gallery
- API reference
- Namelist
- read_sfincs_input()
- V3Grids
- grids_from_namelist()
- BoozerGeometry
- boozer_geometry_from_bc_file()
- boozer_geometry_scheme1()
- boozer_geometry_scheme2()
- boozer_geometry_scheme4()
- VmecInterpolation
- VmecWout
- psi_a_hat_from_wout()
- read_vmec_wout()
- vmec_interpolation()
- vmec_geometry_from_wout()
- vmec_geometry_from_wout_file()
- boozer_bhat_from_spectrum()
- boozer_spectrum_geometry_proxy_objective()
- boozer_spectrum_proxy_transport_gradient_gate()
- boozer_spectrum_proxy_transport_objective()
- geometry_proxy_no_solve_provenance_gate()
- geometry_proxy_workflow_contract()
- geometry_proxy_workflow_summary()
- kinetic_transport_scalar_no_overclaim_gate()
- optional_jax_geometry_backend_report()
- optional_jax_geometry_backend_status()
- vmec_boozer_kinetic_transport_scalar_contract()
- vmec_wout_from_wout_like()
- b0_over_bbar()
- fsab_hat2()
- g_hat_i_hat()
- u_hat()
- u_hat_np()
- vprime_hat()
- uniform_diff_matrices()
- XGrid
- make_x_grid()
- make_x_polynomial_diff_matrices()
- x_weight_d1_over_weight_np()
- x_weight_d2_over_weight_np()
- x_weight_np()
- CollisionlessV3Operator
- apply_collisionless_v3()
- apply_collisionless_v3_jit()
- ErXDotV3Operator
- ErXiDotV3Operator
- apply_er_xdot_v3()
- apply_er_xdot_v3_jit()
- apply_er_xdot_v3_offdiag2()
- apply_er_xdot_v3_offdiag2_jit()
- apply_er_xidot_v3()
- apply_er_xidot_v3_jit()
- apply_er_xidot_v3_offdiag2()
- apply_er_xidot_v3_offdiag2_jit()
- ExBThetaV3Operator
- ExBZetaV3Operator
- apply_exb_theta_v3()
- apply_exb_theta_v3_jit()
- apply_exb_zeta_v3()
- apply_exb_zeta_v3_jit()
- MagneticDriftThetaV3Operator
- MagneticDriftXiDotV3Operator
- MagneticDriftZetaV3Operator
- apply_magnetic_drift_theta_v3()
- apply_magnetic_drift_theta_v3_jit()
- apply_magnetic_drift_theta_v3_offdiag2()
- apply_magnetic_drift_theta_v3_offdiag2_jit()
- apply_magnetic_drift_xidot_v3()
- apply_magnetic_drift_xidot_v3_jit()
- apply_magnetic_drift_xidot_v3_offdiag2()
- apply_magnetic_drift_xidot_v3_offdiag2_jit()
- apply_magnetic_drift_zeta_v3()
- apply_magnetic_drift_zeta_v3_jit()
- apply_magnetic_drift_zeta_v3_offdiag2()
- apply_magnetic_drift_zeta_v3_offdiag2_jit()
- FokkerPlanckV3Operator
- FokkerPlanckV3Phi1Operator
- PitchAngleScatteringV3Operator
- apply_fokker_planck_v3()
- apply_fokker_planck_v3_jit()
- apply_fokker_planck_v3_phi1()
- apply_fokker_planck_v3_phi1_jit()
- apply_pitch_angle_scattering_v3()
- apply_pitch_angle_scattering_v3_jit()
- make_fokker_planck_v3_operator()
- make_fokker_planck_v3_phi1_operator()
- nu_d_hat_pitch_angle_scattering_v3()
- polynomial_interpolation_matrix_np()
- rosenbluth_potential_terms_v3_np()
- V3FBlockOperator
- fokker_planck_collision_operator_with_phi1_from_namelist()
- V3FullSystemOperator
- apply_v3_full_system_jacobian()
- apply_v3_full_system_jacobian_jit()
- apply_v3_full_system_operator()
- apply_v3_full_system_operator_jit()
- full_system_operator_from_namelist()
- precompile_v3_full_system()
- residual_v3_full_system()
- rhs_v3_full_system()
- rhs_v3_full_system_jit()
- with_transport_rhs_settings()
- V3TransportDiagnostics
- V3TransportDiagnosticsPrecomputed
- f0_l0_v3_from_operator()
- f0_l0_v3_from_operator_phi1()
- f0_v3_from_operator()
- v3_rhsmode1_output_fields_vm_only()
- v3_rhsmode1_output_fields_vm_only_batch()
- v3_rhsmode1_output_fields_vm_only_batch_jit()
- v3_rhsmode1_output_fields_vm_only_jit()
- v3_rhsmode1_output_fields_vm_only_phi1()
- v3_rhsmode1_output_fields_vm_only_phi1_batch()
- v3_rhsmode1_output_fields_vm_only_phi1_batch_jit()
- v3_rhsmode1_output_fields_vm_only_phi1_jit()
- v3_transport_diagnostics_vm_only()
- v3_transport_diagnostics_vm_only_batch()
- v3_transport_diagnostics_vm_only_batch_jit()
- v3_transport_diagnostics_vm_only_batch_op0()
- v3_transport_diagnostics_vm_only_batch_op0_jit()
- v3_transport_diagnostics_vm_only_batch_op0_precomputed()
- v3_transport_diagnostics_vm_only_batch_op0_precomputed_jit()
- v3_transport_diagnostics_vm_only_batch_op0_precomputed_remat_jit()
- v3_transport_diagnostics_vm_only_batch_op0_remat_jit()
- v3_transport_diagnostics_vm_only_batch_remat_jit()
- v3_transport_diagnostics_vm_only_precompute()
- v3_transport_matrix_column()
- v3_transport_matrix_from_flux_arrays()
- v3_transport_matrix_from_state_vectors()
- v3_transport_output_fields_vm_only()
- V3FBlockLinearSystem
- V3FullLinearSystem
- CollisionalityRecord
- ErSweepRecord
- SuiteCaseMetric
- appendix_b_geometry_audit_from_h5()
- autodiff_gradient_error_summary()
- benchmark_resolution_floor_violations()
- build_autodiff_sensitivity_validation_summary()
- build_fortran_suite_benchmark_summary()
- build_high_collisionality_trend_proxy_summary()
- build_publication_validation_summary()
- build_simakov_helander_limit_audit_summary()
- er_nonzero_model_spread()
- er_zero_field_spread()
- filter_suite_metrics_by_fortran_runtime()
- fortran_suite_benchmark_schema_errors()
- load_autodiff_sensitivity_summary()
- load_collisionality_records()
- load_er_sweep_records()
- load_suite_report()
- suite_case_metrics()
- suite_report_summary()
- LinearSolveMemoryEstimate
- bicgstab_work_nbytes()
- csr_matrix_nbytes()
- dense_matrix_nbytes()
- dtype_nbytes()
- estimate_linear_solve_memory()
- estimate_sparse_pc_memory()
- gmres_basis_nbytes()
- gmres_restart_for_budget()
- V3LinearSolveResult
- V3NewtonKrylovResult
- V3TransportMatrixSolveResult
- solve_v3_full_system_linear_gmres()
- solve_v3_full_system_linear_gmres_jit()
- solve_v3_full_system_newton_krylov()
- solve_v3_full_system_newton_krylov_history()
- solve_v3_transport_matrix_linear_gmres()
- Refactored solve-policy modules
- Validation against reference implementations
- References and related work
- Core neoclassical and SFINCS literature
- Experimental and cross-code validation anchors
- Bundled technical notes and manuals
- JAX and differentiable programming
- Testing, validation, and coverage methodology
- Linear algebra and preconditioning
- Optimization-focused neoclassical workflows
- Recent applications (examples to prioritize)
- Contributing
- Release notes
- Release checklist