Model Documentation Template¶
Model Owner: Name and contact information
Document Version: Version controlling this document is highly recommended
Reviewers: List reviewers
Overview¶
Model Type¶
Model Type: (e.g., Neural Networks, Decision Trees, etc.)
Model Description¶
- Description
Status¶
Status Date: YYYY-MM-DD
Status: specify one of:
- Under Preparation -- The model is still under active development and is not yet ready for use.
- Regularly Updated -- New versions of the model have been or will continue to be made available.
- Actively Maintained -- No new versions will be made available, but this model will be actively maintained.
- Limited Maintenance -- The model will not be updated, but any technical issues will be addressed.
- Deprecated -- This model is obsolete or is no longer being maintained.
Relevant Links¶
Example references:
- GitHub Repository
- Paper/Documentation Link
- Initiative Demo
- Conference Talk
- API Link
Developers¶
- Name, Team
- Name, Team
Owner¶
- Team Name, Contact Person
Version Details and Artifacts¶
Current Model Version:
Model Version Release Date:
Model Version at last Model Documentation Update:
Artifacts:
- Model weights (e.g. S3 bucket path)
- Model config
Intended and Known Usage¶
Intended Use¶
- Description
Domain(s) of use¶
- Description
Specific tasks performed:
Instructions for use for deployers:
Out Of Scope Uses¶
Provide potential applications and/or use cases for which use of the model is not suitable.
Known Applications¶
| Application | Purpose of Model Usage | AI Act Risk |
|---|---|---|
| Application 1 | Foundation model providing customer embeddings for fraud detection scoring | High |
| Application 2 | Customer embeddings used directly as features for recommendation engine | Limited |
Note: this table may not be exhaustive. Model users and documentation consumers are strongly encouraged to contribute known usages.
Model Architecture¶
- Architecture Description
- Key components
- Hyperparameter tuning methodology
- Training Methodology
- Training duration
- Compute resources used
Data Collection and Preprocessing¶
- Steps Involved:
  - Data collection: Describe how the data was sourced (e.g., databases, APIs, sensors, or publicly available datasets).
  - Data cleaning: Explain techniques used to handle missing values, outliers, or errors.
  - Data transformation: Include any scaling or encoding applied (see the sketch below).
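As a reference for the transformation step above, here is a minimal sketch of a preprocessing pipeline, assuming tabular data handled with scikit-learn; the column names are hypothetical placeholders, not fields of the documented dataset.

```python
# A minimal preprocessing sketch: median imputation + standardisation for
# numeric columns, mode imputation + one-hot encoding for categorical ones.
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_cols = ["age", "balance"]          # hypothetical column names
categorical_cols = ["country", "segment"]  # hypothetical column names

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])
```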
Data Splitting¶
- Subset Definitions:
  - Training set:
  - Validation set:
  - Test set:
- Splitting Methodology:
  - Describe the approach:
    - Random Sampling:
    - Stratified Sampling:
    - Temporal Splits:
- Proportions:
  - Example: "70% training, 20% validation, 10% testing."
- Reproducibility:
  - State which random seeds were used and how many repeated runs are needed to establish statistical significance.
- Data Shuffling:
  - Shuffle applied: (Yes/No); see the split sketch below.
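As a reference for the fields above, a minimal sketch of a seeded, stratified 70/20/10 split with scikit-learn (an assumption about tooling); the data is a random placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 42  # document every seed used so the split is reproducible
rng = np.random.default_rng(SEED)
X = rng.normal(size=(1000, 8))        # placeholder features
y = rng.integers(0, 2, size=1000)     # placeholder binary labels

# First carve off the 10% test set, stratified on the label.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=SEED, shuffle=True
)
# Then split the remaining 90% so the overall ratio is 70/20 (20/90 = 2/9).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=2 / 9, stratify=y_rest, random_state=SEED, shuffle=True
)
```

Recording the seed and the exact proportions here is what makes the split auditable later.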
Model Training Process¶
Details of Processes:
- Initialisation:
- Loss Function:
- Optimiser:
- Hyperparameters:
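For illustration, a minimal sketch of these four items in PyTorch (an assumption; substitute your actual framework). The architecture, data, and hyperparameter values are placeholders, not the documented model's settings.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and a toy architecture.
X = torch.randn(256, 16)
y = torch.randint(0, 2, (256,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

# Initialisation: Xavier-uniform weights, zero biases.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

loss_fn = nn.CrossEntropyLoss()                             # loss function
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)   # optimiser

for epoch in range(10):                                     # hyperparameter: epochs
    for xb, yb in train_loader:
        optimiser.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimiser.step()
```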
Model Training and Validation¶
Objective: Clarify what the model is supposed to achieve.
- Problem statement (e.g., classification of X, prediction of Y)
- Business goals (accuracy, fairness, speed)
- Metrics selected (e.g., accuracy, precision, recall, F1-score, AUC-ROC, MAE, RMSE)
- Rationale for each metric (why accuracy? why F1-score?)
- Description of how model predictions on the validation set are evaluated
Hyperparameter Tuning:
Regularisation:
Early Stopping:
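The three fields above could be backed by a concrete search setup. A minimal sketch with scikit-learn (an assumption about tooling) combining a grid search over the regularisation strength with built-in early stopping, on placeholder data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X_train, y_train = make_classification(n_samples=500, random_state=42)  # placeholder data

base = SGDClassifier(
    loss="log_loss",
    penalty="l2",            # regularisation
    early_stopping=True,     # stop when the validation score stops improving
    validation_fraction=0.2,
    n_iter_no_change=5,
    random_state=42,
)
search = GridSearchCV(base, param_grid={"alpha": [1e-4, 1e-3, 1e-2]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```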
Model Testing and Evaluation¶
Performance Metrics:
- Compute metrics on the test set:
  - Accuracy, precision, recall, F1 score for classification.
  - MSE, RMSE, MAE for regression.
Confusion Matrix:
- Generate a confusion matrix to evaluate classification results.
ROC Curve and AUC:
- For binary classifiers, compute the ROC curve and Area Under the Curve (AUC).
Feature Importance:
- Analyse feature contributions (for explainability).
Robustness Testing:
- Test the model on edge cases or adversarial examples.
Comparison to Baselines:
- Compare the model’s performance to a simple baseline (e.g., random guess, mean prediction).
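For a binary classifier, the checks above might look like this minimal scikit-learn sketch; the model and data are placeholders, not the documented system.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)       # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # placeholder model

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]
print(classification_report(y_test, y_pred))     # accuracy, precision, recall, F1
print(confusion_matrix(y_test, y_pred))          # confusion matrix
print("AUC:", roc_auc_score(y_test, y_score))    # ROC AUC

# Baseline comparison: always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("Baseline accuracy:", baseline.score(X_test, y_test))
```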
Model Bias and Fairness Analysis¶
Types of bias to assess include: Implicit Bias, Measurement Bias, Temporal Bias, Selection Bias, Confounding Bias.
Bias Detection Methods Used¶
Pre-processing: Resampling, Reweighting, Transformation (data imputation, changing order of data), Relabeling, Blinding
In-processing: Transfer learning, Reweighting, Constraint optimization, Adversarial Learning, Regularization, Bandits
Post-processing: Transformation, Calibration, Thresholding
Results of Bias Testing:
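One simple test that could back the results reported here is a demographic parity gap, computed by hand with NumPy rather than a fairness library; everything below is a placeholder illustration.

```python
import numpy as np

y_pred = np.array([1, 0, 1, 1, 1, 0])              # placeholder model predictions
group = np.array(["a", "a", "a", "b", "b", "b"])   # placeholder sensitive attribute

# Positive-prediction rate per group; demographic parity asks these to match.
rate_a = y_pred[group == "a"].mean()
rate_b = y_pred[group == "b"].mean()
print("Demographic parity gap:", abs(rate_a - rate_b))
```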
Mitigation Measures¶
Fairness adjustments: Introduce fairness criteria (like demographic parity, equal opportunity, or equalized odds) into the model training process.
Adversarial Debiasing: Use adversarial networks to remove biased information during training. The main model tries to make accurate predictions, while an adversary network tries to predict sensitive attributes from the model's predictions.
Retraining approaches¶
Fairness Regularization: Modify the model's objective function to penalize bias. This introduces regularization terms that discourage the model from making predictions that disproportionately affect certain groups.
Fair Representation Learning: Learn latent representations of the input data that remove sensitive attributes, ensuring that downstream models trained on these representations are fair.
Post-Processing Techniques¶
Fairness-Aware Recalibration: After the model is trained, adjust decision thresholds separately for different demographic groups to reduce disparities in false positive/false negative rates.
Output Perturbation: Introduce randomness or noise to model predictions to make outcomes more equitable across groups.
Fairness Impact Statement: Explain trade-offs made to satisfy certain fairness criteria.
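As an illustration of the fairness-aware recalibration described above, a minimal sketch that applies a separate decision threshold per demographic group; the scores, groups, and threshold values are hypothetical, not tuned.

```python
import numpy as np

def predict_with_group_thresholds(scores, groups, thresholds):
    """Apply a different decision threshold per demographic group."""
    cutoffs = np.array([thresholds[g] for g in groups])
    return (scores >= cutoffs).astype(int)

scores = np.array([0.42, 0.55, 0.61, 0.48])   # placeholder model scores
groups = ["a", "b", "a", "b"]                 # placeholder group labels
preds = predict_with_group_thresholds(scores, groups, {"a": 0.50, "b": 0.45})
print(preds)  # [0 1 1 1]
```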
Model Interpretability and Explainability¶
Explainability Techniques Used:
Examples: SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations)
Post-hoc Explanation Models
- Feature Importance, Permutation Importance, SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations)
- Partial Dependence Plots (PDP)
- Counterfactual Explanations
- Surrogate Models
- Attention Mechanisms (for Deep Learning)
Model-Specific Explanation Techniques
- Grad-CAM (Gradient-weighted Class Activation Mapping): for CNN-based models, especially in computer vision applications
- Layer-wise Relevance Propagation (LRP): Works well for CNNs, fully connected nets, and some RNNs (classification focused)
- TreeSHAP (SHAP for Decision Trees)
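As one concrete, model-agnostic example from the lists above, a minimal permutation importance sketch with scikit-learn; the model and data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=6, random_state=42)  # placeholder data
model = RandomForestClassifier(random_state=42).fit(X, y)                 # placeholder model

# How much the score drops when each feature is shuffled, averaged over repeats.
result = permutation_importance(model, X, y, n_repeats=10, random_state=42)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f}")
```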
How interpretable is the model’s decision-making process?
EU Declaration of Conformity¶
Standards applied¶
Documentation Metadata¶
Version¶
Template Version¶
Documentation Authors¶
- Name, Team: (Owner / Contributor / Manager)
- Name, Team: (Owner / Contributor / Manager)
- Name, Team: (Owner / Contributor / Manager)