Distance or "geometrically" based statistics.
- Kolmogorov-Smirnov and Anderson-Darling statistics: they were designed to test the goodness of fit of distributions with fully specified parameters, not parameters fitted to the empirical data. Corrections are available for only a few distributions (a bootstrap workaround is sketched after this list).
- Chi-Square statistic: depends on how the data are grouped into bins, and there is no rule of thumb for choosing the appropriate number of bins. It is also intended for large datasets.
- None of the above can be used to compare distributions with differing numbers of parameters. A pdf with 4 parameters is likely to fit the data better than a pdf with 2 parameters, but that is a false improvement due to over-fitting.
- Truncated, censored or binned data cannot be handled by any of the above.
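A common workaround for the fitted-parameters problem is a parametric bootstrap of the test statistic. The sketch below is only an illustration with SciPy, assuming a normal candidate distribution; the data, sample size and seed are arbitrary choices, not part of the original post:

```python
# Sketch: the naive KS p-value is too optimistic when the candidate
# distribution's parameters are estimated from the same data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=200)

# Naive KS test against a normal with parameters fitted to the data
mu, sigma = stats.norm.fit(data)
d_obs, p_naive = stats.kstest(data, "norm", args=(mu, sigma))

# Parametric bootstrap: refit on each simulated sample so the null
# distribution of the KS statistic accounts for the estimation step.
n_boot = 1000
d_boot = np.empty(n_boot)
for i in range(n_boot):
    sim = stats.norm.rvs(mu, sigma, size=data.size, random_state=rng)
    mu_b, sigma_b = stats.norm.fit(sim)
    d_boot[i] = stats.kstest(sim, "norm", args=(mu_b, sigma_b)).statistic
p_boot = np.mean(d_boot >= d_obs)

print(f"naive p = {p_naive:.3f}, bootstrap-corrected p = {p_boot:.3f}")
```

The bootstrap-corrected p-value is generally smaller than the naive one, reflecting the extra flexibility gained by fitting the parameters to the data.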
Information Theory based statistics.
These methods rank the proposed models, so it is important to have good candidate models.
- AIC (Akaike Information Criterion): penalises the number of parameters and is based on the concept of entropy; it is related to Boltzmann's concept, to Fisher's, and to the Kullback-Leibler discrepancy. There is also an AICc version for small sample sizes (see the sketch after this list).
- SIC / BIC (Schwarz or Bayesian Information Criterion): penalises the number of parameters more strictly than AIC.
- TIC (Takeuchi Information Criterion): useful when the candidate models aren't close approximations to the "real" underlying function.
- HQIC (Hannan-Quinn Information Criterion): a middle criterion between AIC and SIC.
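As a rough illustration of how these criteria rank candidate models, here is a minimal sketch using SciPy maximum-likelihood fits. The formulas follow the standard definitions (AIC = 2k - 2 ln L, AICc = AIC + 2k(k+1)/(n-k-1), BIC = k ln(n) - 2 ln L); the data and the candidate set are arbitrary examples, not taken from the original post:

```python
# Sketch: rank a few candidate distributions by AIC / AICc / BIC
# after maximum-likelihood fitting with SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=150)
n = data.size

candidates = {"gamma": stats.gamma, "lognorm": stats.lognorm, "expon": stats.expon}

for name, dist in candidates.items():
    params = dist.fit(data)                      # MLE fit (includes loc/scale)
    k = len(params)                              # number of estimated parameters
    loglik = np.sum(dist.logpdf(data, *params))  # maximised log-likelihood
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = k * np.log(n) - 2 * loglik
    print(f"{name:8s} AIC={aic:8.2f}  AICc={aicc:8.2f}  BIC={bic:8.2f}")
```

Lower values indicate a better trade-off between fit and model complexity; the criteria only rank the candidates you supply, so a poor candidate set still gives a poor "best" model.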
References:
Burnham and Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer.
Vose, D. (2010) Fitting distributions to data (and why you are probably doing it wrong).