I’m fitting multiple high-quality spectral datasets (~10^5 counts per spectrum) from the Chandra X-ray Observatory with the spectral fitting packages Xspec and Sherpa.
Since our knowledge of the effective area response (the ARF file) is not perfect (typically 3-5% uncertainty for Chandra and XMM-Newton), I would like a way to test the impact of this systematic uncertainty on my fits while using the Poisson-based Cash statistic and keeping my spectra ungrouped.

I tried the systematic command in Xspec and set_syserror in Sherpa.
They give the expected results with the chi-squared statistic but have no impact with Cstat (or Wstat). This is expected, since Xspec adds the systematic term in quadrature to the data variance when evaluating chi-squared, and no such variance term appears in the Cash statistic.
Fitting my spectra with chi-squared is a partial workaround, but I would like to avoid grouping my spectral datasets.
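To make the behavior above concrete, here is a minimal numpy sketch (not the Xspec/Sherpa internals, just the textbook forms of the two statistics) showing why a fractional systematic has somewhere to go in chi-squared but no natural slot in Cash:

```python
import numpy as np

def chi2_with_sys(data, model, frac_sys):
    # Gaussian variance = Poisson variance + systematic term in quadrature;
    # this is where a "systematic" fraction enters chi-squared.
    var = data + (frac_sys * model) ** 2
    return np.sum((data - model) ** 2 / var)

def cash(data, model):
    # C = 2 * sum(m - d + d * ln(d/m)); no variance term appears,
    # so a per-bin systematic error array has nothing to modify.
    d, m = np.asarray(data, float), np.asarray(model, float)
    term = np.where(d > 0, d * np.log(d / m), 0.0)
    return 2.0 * np.sum(m - d + term)

counts = np.array([100.0, 120.0, 90.0])
model = np.array([105.0, 110.0, 95.0])
print(chi2_with_sys(counts, model, 0.0))   # pure Poisson chi-squared
print(chi2_with_sys(counts, model, 0.05))  # smaller: inflated error bars
print(cash(counts, model))                 # unaffected by any frac_sys
```

Enlarging frac_sys only ever inflates the chi-squared denominator, which is exactly what the systematic command does, and exactly what cannot happen in the Cash expression.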

Do you know of a way to add this systematic error while using the Cash statistic?
I’m aware that the form of the Cash statistic makes such a term harder to implement than in chi-squared. In Fermi-LAT analysis, a weighted likelihood function has been implemented to handle systematics, and the weight term is model dependent.
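For illustration, here is one toy way a per-bin weight could enter the Cash statistic, in the spirit of the Fermi-LAT weighted likelihood. The specific weight form below (down-weighting bins where the assumed fractional systematic exceeds the Poisson error) is an assumption for the sketch, not the actual Fermi-LAT implementation; note that, as mentioned above, the weight depends on the model:

```python
import numpy as np

def weighted_cash(data, model, frac_sys):
    d = np.asarray(data, float)
    m = np.asarray(model, float)
    # Poisson variance ~ m; assumed systematic variance ~ (frac_sys * m)^2.
    # Weight -> 1 where statistics dominate, < 1 where systematics dominate.
    w = m / (m + (frac_sys * m) ** 2)   # = 1 / (1 + frac_sys**2 * m)
    term = np.where(d > 0, d * np.log(d / m), 0.0)
    return 2.0 * np.sum(w * (m - d + term))

counts = np.array([10.0, 1.0e5, 50.0])
model = np.array([12.0, 1.02e5, 48.0])
print(weighted_cash(counts, model, 0.0))   # reduces to the plain Cash statistic
print(weighted_cash(counts, model, 0.03))  # high-count bin strongly down-weighted
```

With a few-percent systematic, the weight of a 10^5-count bin drops far below 1, so the fit is no longer driven entirely by the bins where the calibration error dominates the Poisson error.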

My questions are:

is it useful to test for systematic errors (e.g. from the ARF) in spectral fits? I worry that around 1-2 keV the photon statistics are so good that the fit can be dominated by the systematics.

is there an existing framework to handle systematic errors with the Cash statistic?

does this imply modifying the likelihood function?

Vinay Kashyap and his collaborators in various math/statistics departments have done some work on this over the years and built up the statistical underpinnings. You might want to have a look at their 2011 paper and their follow-up 2011 paper. The pyblocxs code that grew out of that effort is partially integrated into Sherpa, but not to the point where you simply press a button to account for the calibration uncertainties.

The approach that we usually take is to model the systematic effect empirically, instead of inflating the error bars. A systematic effect is a shift, not random noise, so including it in the model is the most natural solution. However, this requires a suitable empirical model to be available in the fitting package. If you can add something like a spline, a polynomial or another empirical function to the fit that you can control, it is possible to characterise the cross-calibration differences.
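As a sketch of that idea, here is a toy fit (numpy/scipy only, not any fitting package's API; the model shapes, parameter values and function names are all illustrative) where a low-order multiplicative polynomial "calibration" term is folded into the source model and fitted jointly under the Cash statistic:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
energies = np.linspace(0.5, 7.0, 200)  # keV, bin centers (illustrative grid)

def powerlaw(e, norm, gamma):
    return norm * e ** (-gamma)

def calib(e, a1, a2):
    # Multiplicative correction ~1 plus a few-percent energy-dependent tilt,
    # standing in for an unknown residual ARF error.
    return 1.0 + a1 * (e - 2.0) + a2 * (e - 2.0) ** 2

# Simulate counts from a power law distorted by a small calibration error.
true_rate = powerlaw(energies, 500.0, 1.7) * calib(energies, 0.02, -0.005)
counts = rng.poisson(true_rate)

def cash(params):
    # Cash statistic up to a data-only constant: 2 * sum(m - d * ln m).
    norm, gamma, a1, a2 = params
    m = powerlaw(energies, norm, gamma) * calib(energies, a1, a2)
    m = np.clip(m, 1e-10, None)  # guard against non-positive model values
    return 2.0 * np.sum(m - counts * np.log(m))

res = minimize(cash, x0=[400.0, 1.5, 0.0, 0.0], method="Nelder-Mead",
               options={"maxiter": 20000, "maxfev": 20000})
print(res.x)  # fitted (norm, gamma, a1, a2)
```

Freezing a1 = a2 = 0 and refitting then gives a direct handle on how much the inferred slope shifts when the calibration term is allowed to float, which is the question being asked here.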

Thanks for your answers. I will look at these papers and pyblocxs.

Fig. 2 of the follow-up paper you mentioned shows that the impact can be quite significant in some cases!

Adding it to the model is indeed a good option if you have a good representation of it.

My worry was that at extremely high counts (especially in the 1-2 keV band), few-percent variations of the effective area, in energy dependence as well as overall amplitude, might not be properly accounted for in the ARF, as shown in the same Fig. 2.

In the Bayesian analysis of my dataset (using the nested sampling method in BXA), I see some multimodal posteriors, and I wonder how much of that is attributable to ARF/RMF mismodeling.
The multimodal posteriors also explain why my standard Xspec fit gave different results from BXA: it was probably stuck in a local minimum.