XGBoost Cheatsheet

A visual guide to XGBoost covering DMatrix vs the sklearn API, training classifiers and regressors, the key hyperparameters, early stopping, feature importance, save and load, cross-validation, and predict with SHAP-style explanations.

python
xgboost
machine-learning
cheatsheet
Author

James Balamuta

Published

July 6, 2026

XGBoost builds an ensemble of decision trees, one at a time, where each new tree corrects the residual errors of the trees before it (gradient boosting). You have two front doors to the same engine: the native API (xgb.DMatrix data plus xgb.train(params, ...) returning a Booster) and the scikit-learn API (XGBClassifier / XGBRegressor with .fit / .predict, drop-in compatible with pipelines and GridSearchCV). The three knobs that matter most are n_estimators (how many trees), max_depth (how complex each tree is), and learning_rate (how much each tree contributes); early stopping uses a validation eval_set to pick the tree count for you. X is a 2-D array or DataFrame of shape (n_samples, n_features) and y is the 1-D target. Where this sheet says “gradient-boosted trees on tabular data,” LightGBM is the sibling library with a near-identical sklearn surface; the Quick Reference maps one to the other. The conventional import is import xgboost as xgb, and everything here is xgboost v3 (native-param aliases and removed options are flagged per section).
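To make "each new tree corrects the residual errors of the trees before it" concrete, here is a minimal NumPy-only sketch of boosting with one-split stumps under squared error. It is an illustration of the idea, not XGBoost's actual algorithm (which uses gradients, Hessians, and regularized leaf weights); the helper fit_stump and the synthetic data are assumptions made up for this example.

```python
import numpy as np

def fit_stump(x, r):
    """Fit the best single-split regression stump on 1-D x for targets r."""
    best = None
    for t in np.unique(x)[:-1]:                      # candidate thresholds
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda z: np.where(z <= t, lo, hi)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3
pred = np.full_like(y, y.mean())                     # start from a constant base value
mse_history = [np.mean((y - pred) ** 2)]
for _ in range(50):
    residual = y - pred                              # what the ensemble still gets wrong
    stump = fit_stump(x, residual)                   # the new "tree" fits the residuals
    pred = pred + learning_rate * stump(x)           # add a shrunken contribution
    mse_history.append(np.mean((y - pred) ** 2))

print(f"MSE before: {mse_history[0]:.4f}, after 50 stumps: {mse_history[-1]:.4f}")
```

The three knobs map directly onto this loop: the number of iterations plays the role of n_estimators, the stump is a tree with max_depth=1, and learning_rate scales each contribution.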

Complete XGBoost cheatsheet (light mode): eight panels covering DMatrix vs the sklearn API, training a classifier and regressor, the key hyperparameters, early stopping, feature importance, saving and loading a booster, cross-validation, and predicting with SHAP-style explanations.

Data: DMatrix vs the sklearn API

The native path wraps your arrays in an xgb.DMatrix (XGBoost's internal data container) and trains with xgb.train, while the scikit-learn path lets XGBClassifier / XGBRegressor take plain NumPy arrays or DataFrames and build the container internally. Use QuantileDMatrix, which pre-bins features into quantile buckets, with tree_method="hist" to save time and memory, and keep a separate validation DMatrix for early stopping.

XGBoost data panel: wrap arrays in a DMatrix, faster QuantileDMatrix for hist, skip DMatrix with the sklearn API, add a validation DMatrix, carry feature names and weights.

Two front doors to one engine: a native DMatrix, or plain arrays via the sklearn API.

import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)          # native optimized container
dtrain = xgb.QuantileDMatrix(X_train, label=y_train)  # pre-bucketed for tree_method="hist"

clf = xgb.XGBClassifier()                             # or skip DMatrix entirely:
clf.fit(X_train, y_train)                             # sklearn API takes plain arrays

dvalid = xgb.DMatrix(X_valid, label=y_valid)          # held-out set for early stopping
xgb.DMatrix(X, label=y, weight=w, feature_names=cols) # carry names and per-row weights

See Data interface. QuantileDMatrix pairs with tree_method="hist" for memory savings.

Train: classifier and regressor

One boosting engine powers two estimators: XGBClassifier for labels and XGBRegressor for numbers, both with the familiar .fit(X, y) / .predict(X) interface; the native equivalent is xgb.train(params, dtrain, num_boost_round) returning a Booster. The objective chooses the task (binary:logistic, multi:softprob, reg:squarederror), and tree_method="hist" with device="cuda" moves training to the GPU.

XGBoost train panel: train a classifier, train a regressor, train via the native API, pick the objective, choose CPU or GPU.

One engine, two estimators: XGBClassifier for labels, XGBRegressor for numbers.

clf = xgb.XGBClassifier(n_estimators=300, tree_method="hist").fit(X_train, y_train)  # labels
reg = xgb.XGBRegressor(objective="reg:squarederror").fit(X_train, y_train)           # numbers

bst = xgb.train(params, dtrain, num_boost_round=300)  # native API returns a Booster

params = {"objective": "binary:logistic", "eval_metric": "logloss"}  # pick the task
xgb.XGBClassifier(tree_method="hist", device="cuda")                  # train on the GPU

See scikit-learn estimator interface. The objective sets the task and loss.

Key hyperparameters

Capacity comes from three knobs: n_estimators (number of trees), max_depth (complexity of each tree), and learning_rate (shrinkage applied to each tree’s contribution); a smaller learning rate with more trees usually generalizes better. Add subsample / colsample_bytree for stochastic regularization and reg_lambda / min_child_weight to constrain leaves; in the sklearn API use the underscored names (learning_rate, reg_lambda), not the native aliases (eta, lambda).

XGBoost hyperparameters panel: number of boosting rounds, tree depth, shrinkage per tree, subsample rows and columns, regularize leaf weights, avoid native aliases in the sklearn API.

Three knobs set capacity: number of trees, depth per tree, shrinkage per tree.

xgb.XGBClassifier(n_estimators=500)        # more trees: lower bias, slower
xgb.XGBClassifier(max_depth=6)             # complexity of each tree
xgb.XGBClassifier(learning_rate=0.05)      # shrinkage applied to each tree
xgb.XGBClassifier(subsample=0.8, colsample_bytree=0.8)  # stochastic regularization
xgb.XGBClassifier(reg_lambda=1.0, min_child_weight=1)   # constrain leaf weights

# native aliases (xgb.train only): {"eta": 0.05, "lambda": 1.0, "alpha": 0.0}
# sklearn API uses learning_rate, reg_lambda, reg_alpha (not eta, lambda, alpha)

See XGBoost parameters. The sklearn API uses underscored names, not the native aliases.

Early stopping

Instead of guessing n_estimators, set a generous ceiling and let a held-out eval_set stop training once the validation metric has not improved for early_stopping_rounds consecutive rounds. Configure early_stopping_rounds and eval_metric in the constructor (passing them to fit was deprecated in 1.6 and is no longer accepted in current releases), then read best_iteration and best_score; the EarlyStopping callback with save_best=True returns the best model rather than the last one.

XGBoost early stopping panel: turn on early stopping in the constructor, provide the watch set in fit, read where it stopped, native-API early stopping, the EarlyStopping callback, the deprecated fit-time form.

Let a validation set choose the tree count; read best_iteration when it stops.

clf = xgb.XGBClassifier(n_estimators=1000,            # generous ceiling
                        early_stopping_rounds=20,     # patience (in the constructor)
                        eval_metric="logloss")
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])  # eval_set still goes in fit
clf.best_iteration, clf.best_score                        # where it landed

bst = xgb.train(params, dtrain, num_boost_round=1000,     # native-API early stopping
                evals=[(dvalid, "valid")], early_stopping_rounds=20)
es = xgb.callback.EarlyStopping(rounds=20, save_best=True) # pass as callbacks=[es]; keeps the best

# no longer accepted (deprecated in 1.6): clf.fit(..., early_stopping_rounds=20, eval_metric=...)

See Early stopping. Put early_stopping_rounds and eval_metric in the constructor.

Feature importance (gain)

Ask which features the trees used: feature_importances_ gives normalized scores and get_booster().get_score(importance_type="gain") gives the raw average loss reduction per feature, which is usually the most meaningful ranking. weight (split count) and cover can rank differently, and plot_importance / plot_tree render the chart and individual trees.

XGBoost feature importance panel: normalized importances, raw gain per feature, compare gain and weight and cover, plot importance, draw a single tree.

Gain is the average loss reduction a feature buys; it is the default for feature_importances_ and usually the most informative score.

clf.feature_importances_                                  # normalized, sums to 1.0
clf.get_booster().get_score(importance_type="gain")       # raw gain per feature

clf.get_booster().get_score(importance_type="weight")     # split count
clf.get_booster().get_score(importance_type="cover")      # coverage; rankings can differ

xgb.plot_importance(clf, importance_type="gain", max_num_features=10)  # top-10 bar chart
xgb.plot_tree(clf, tree_idx=0)                           # render tree 0 of N (tree_idx replaces num_trees in 3.x)

See Plotting. gain is usually the most meaningful ranking; it is the default for feature_importances_, while get_score and plot_importance default to weight.

Persist: save & load a booster

Serialize the fitted model with save_model to a portable .json or compact binary .ubj (UBJSON) file and reload with load_model in another process, no retraining needed. Prefer these stable, version-checked model formats over pickling or the legacy .bin path, which is not guaranteed across releases.

XGBoost persist panel: save the fitted model as JSON, save as compact binary UBJSON, load it back into an estimator, save and load a native Booster, avoid pickle and the legacy .bin path.

Train once, serve many times: save to portable .json or compact .ubj, then load.

clf.save_model("model.json")          # portable JSON
clf.save_model("model.ubj")           # compact binary UBJSON: smaller, faster to load

clf2 = xgb.XGBClassifier()            # reload into a fresh estimator
clf2.load_model("model.json")

bst.save_model("booster.ubj")                     # native Booster round-trip
bst2 = xgb.Booster(model_file="booster.ubj")      # load_model() returns None, so don't chain it

# avoid: pickle.dump(bst, ...) and ".bin" files (not portable across versions)

See Saving and loading models. The .json and .ubj formats are stable and version-checked.

Cross-validation

xgb.cv runs k-fold CV on a DMatrix, supports stratified folds and early_stopping_rounds, and (with pandas installed) returns a tidy DataFrame of per-round train / test metric means and standard deviations so you can pick a defensible num_boost_round. Because the sklearn estimators are scikit-learn compatible, you can also drive cross_val_score and GridSearchCV on an XGBClassifier directly.

XGBoost cross-validation panel: k-fold CV on a DMatrix, stratified folds for classification, CV with early stopping, read the results table, or use sklearn cross_val_score.

xgb.cv runs k-fold CV and, with early stopping, picks num_boost_round honestly.

cv = xgb.cv(params, dtrain, num_boost_round=500, nfold=5)        # k-fold CV
xgb.cv(params, dtrain, nfold=5, stratified=True)                # balanced folds
xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
       early_stopping_rounds=20)                                # stop at the best round
cv  # pandas DataFrame: train-logloss-mean, test-logloss-mean, test-logloss-std

from sklearn.model_selection import cross_val_score
cross_val_score(clf, X, y, cv=5)        # clf must not set early_stopping_rounds (no eval_set is passed)

See Cross-validation (xgb.cv). With pandas installed it returns a tidy results DataFrame.

Predict & explain (SHAP-style)

predict returns labels, predict_proba returns class probabilities, and the native bst.predict(dtest, pred_contribs=True) returns SHAP values so each prediction decomposes into a base value plus one signed contribution per feature. For richer plots, pass the fitted model to the shap library’s TreeExplainer, and use .score for a quick default metric.

XGBoost predict and explain panel: predict labels, predict probabilities, SHAP-style contributions, use the shap library, explain one prediction, score on a test set.

Labels, probabilities, or per-feature contributions: a base value plus one push per feature.

y_pred = clf.predict(X_test)             # class labels
proba = clf.predict_proba(X_test)        # class probabilities, P(class)

bst.predict(dtest, pred_contribs=True)   # SHAP values: base value + one push per feature

import shap
shap.TreeExplainer(clf).shap_values(X_test)   # tree-specific SHAP values; pass them to shap's plots

bst.predict(dtest, pred_contribs=True)[i]     # explain one prediction
clf.score(X_test, y_test)                     # accuracy (clf) / R^2 (reg)

See Prediction. pred_contribs=True returns SHAP values straight from the booster.

Quick Reference

Key XGBoost calls.
| Command | What it does | Area |
| --- | --- | --- |
| xgb.DMatrix(X, label=y) | Native optimized data container | Data |
| xgb.QuantileDMatrix(X, label=y) | Pre-bucketed container for hist | Data |
| xgb.XGBClassifier(...).fit(X, y) | Train a classifier (sklearn API) | Train |
| xgb.XGBRegressor(...).fit(X, y) | Train a regressor (sklearn API) | Train |
| xgb.train(params, dtrain, num_boost_round=N) | Train a native Booster | Train |
| n_estimators, max_depth, learning_rate | The three capacity knobs | Tune |
| early_stopping_rounds=20 + eval_set=[...] | Stop on a validation metric | Early stop |
| clf.best_iteration, clf.best_score | Where early stopping landed | Early stop |
| clf.feature_importances_ | Normalized importances | Importance |
| get_booster().get_score(importance_type="gain") | Raw gain per feature | Importance |
| clf.save_model("model.json") / .ubj | Persist the model | Persist |
| clf.load_model("model.json") | Reload the model | Persist |
| xgb.cv(params, dtrain, nfold=5) | K-fold cross-validation | CV |
| clf.predict / clf.predict_proba | Labels / probabilities | Predict |
| bst.predict(dtest, pred_contribs=True) | SHAP-style contributions | Explain |
Common XGBoost parameters (sklearn API names).
| Param | Meaning | Typical |
| --- | --- | --- |
| n_estimators | Number of boosting rounds (trees) | 100 to 1000 |
| max_depth | Maximum tree depth | 3 to 8 |
| learning_rate | Shrinkage per tree | 0.01 to 0.3 |
| subsample | Row fraction per tree | 0.6 to 1.0 |
| colsample_bytree | Column fraction per tree | 0.6 to 1.0 |
| reg_lambda | L2 regularization on weights | 1.0 |
| reg_alpha | L1 regularization on weights | 0.0 |
| min_child_weight | Minimum sum of leaf instance weight | 1 |
| tree_method | "hist" (default), "exact", "approx" | "hist" |
| device | "cpu" or "cuda" | "cpu" |
| early_stopping_rounds | Patience for early stopping | 10 to 50 |
| eval_metric | Watch metric, e.g. "logloss", "rmse" | task-dependent |
Spellings to avoid in current XGBoost.
| Avoid | Use instead | Note |
| --- | --- | --- |
| eta (sklearn API) | learning_rate | eta is the native-param alias |
| lambda / alpha (sklearn API) | reg_lambda / reg_alpha | underscored names in the estimator |
| clf.fit(..., early_stopping_rounds=...) | constructor early_stopping_rounds=... | moved to __init__ in 1.6 |
| clf.fit(..., eval_metric=...) | constructor eval_metric=... | moved to __init__ in 1.6 |
| use_label_encoder=... | (remove it) | removed; pass integer labels 0 to n_classes - 1 (encode others yourself, e.g. with LabelEncoder) |
| gpu_hist, gpu_id | tree_method="hist" + device="cuda" | unified device API |
| pickle / .bin model files | save_model(".json") / .ubj | portable, version-checked formats |
XGBoost and LightGBM, side by side.
| Concept | XGBoost | LightGBM |
| --- | --- | --- |
| Import | import xgboost as xgb | import lightgbm as lgb |
| Classifier / regressor | XGBClassifier / XGBRegressor | LGBMClassifier / LGBMRegressor |
| Data container | xgb.DMatrix | lgb.Dataset |
| Tree growth | level-wise (max_depth) | leaf-wise (num_leaves) |
| Trees | n_estimators | n_estimators |
| Shrinkage | learning_rate | learning_rate |
| Early stopping | early_stopping_rounds (constructor) | lgb.early_stopping(...) callback |
| Save model | save_model(".json"/".ubj") | booster.save_model(".txt") |

Appendix: Sample Code

The whole workflow with the sklearn API (the canonical pattern)

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 1. Data + a held-out validation set
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2-4. Train with early stopping (early_stopping_rounds + eval_metric go in the constructor)
clf = xgb.XGBClassifier(
    n_estimators=1000,            # generous ceiling
    max_depth=6,                  # tree complexity
    learning_rate=0.05,           # shrinkage per tree
    subsample=0.8,
    colsample_bytree=0.8,
    reg_lambda=1.0,
    tree_method="hist",           # device="cuda" for GPU
    early_stopping_rounds=20,
    eval_metric="mlogloss",
    random_state=0,
)
clf.fit(X_tr, y_tr, eval_set=[(X_te, y_te)], verbose=False)

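print("best_iteration:", clf.best_iteration)    # e.g. 178
print("best_score:", round(clf.best_score, 4))  # e.g. 0.46
# This split also chose the stopping point, so the accuracy below is optimistic;
# score a separate, untouched test split for an unbiased estimate.
print("validation accuracy:", round(clf.score(X_te, y_te), 3))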

# 5. Feature importance (gain)
gain = clf.get_booster().get_score(importance_type="gain")

# 6. Persist the fitted model
clf.save_model("model.json")     # or "model.ubj" for compact binary

The native API: DMatrix, train, cross-validate

import xgboost as xgb

# X_tr, y_tr, X_te, y_te come from the sklearn-API example above
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dvalid = xgb.DMatrix(X_te, label=y_te)

params = {
    "objective": "multi:softprob",
    "num_class": 3,
    "max_depth": 6,
    "eta": 0.05,                 # native name for learning_rate
    "subsample": 0.8,
    "tree_method": "hist",
    "device": "cpu",
    "eval_metric": "mlogloss",
}

# Cross-validate to pick num_boost_round (returns a pandas DataFrame)
cv = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
            stratified=True, early_stopping_rounds=20, seed=0, as_pandas=True)
best_round = len(cv)             # rounds kept after early stopping
print(cv.tail(1))                # train/test mlogloss mean and std

# Train the final Booster for that many rounds, watching the valid set
bst = xgb.train(params, dtrain, num_boost_round=best_round,
                evals=[(dvalid, "valid")], early_stopping_rounds=20,
                verbose_eval=False)
bst.save_model("booster.ubj")

Reload and serve (separate process)

import xgboost as xgb

clf = xgb.XGBClassifier()
clf.load_model("model.json")     # no training data or original code needed
# X_new: new rows with the same feature columns, in the same order, as the training data
print(clf.predict(X_new))        # class labels
print(clf.predict_proba(X_new))  # class probabilities

Explain predictions (SHAP-style contributions)

import xgboost as xgb

bst = xgb.Booster()
bst.load_model("booster.ubj")
dtest = xgb.DMatrix(X_te)

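# pred_contribs returns SHAP values. For a single-output model the shape is
# (n_samples, n_features + 1); for this 3-class booster it is
# (n_samples, n_classes, n_features + 1). The last column is the base (bias)
# value, and each row sums to that class's margin.
contribs = bst.predict(dtest, pred_contribs=True)
print(contribs.shape)            # e.g. (1000, 3, 21) for 1000 rows, 3 classes, 20 features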

# Or use the shap library for ready-made plots:
# import shap
# explainer = shap.TreeExplainer(clf)
# shap_values = explainer.shap_values(X_te)
# shap.summary_plot(shap_values, X_te)

Reproducible environment header

import xgboost as xgb
print(xgb.__version__)           # 3.3.0
# Pin in pyproject / requirements, e.g.:
#   xgboost==3.3.0
#   scikit-learn==1.9.0
#   numpy==2.4.2
#   scipy==1.17.1
#   shap (optional, for richer explanations)

Behavior notes

  • Two front doors, one engine. The native API (xgb.DMatrix + xgb.train returning a Booster) and the sklearn API (XGBClassifier / XGBRegressor with .fit / .predict) wrap the same gradient-boosting core; the sklearn estimators build the DMatrix for you.
  • Early stopping config moved to the constructor. Since 1.6, early_stopping_rounds and eval_metric belong in __init__ (or set_params), not in fit; eval_set is still passed to fit.
  • Use the underscored sklearn names. learning_rate, reg_lambda, reg_alpha in the estimator; eta, lambda, alpha are the native-params aliases used with xgb.train.
  • gain is the default importance. feature_importances_ is normalized gain; weight (split count) and cover can rank features differently.
  • Prefer .json / .ubj over pickle. These model formats are stable and version-checked across releases; the legacy .bin path and pickling are not guaranteed to load on a newer xgboost.
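  • The unified device API replaces gpu_hist. Set tree_method="hist" and device="cuda" for the GPU; gpu_hist and gpu_id are gone, as is use_label_encoder (XGBClassifier expects integer class labels 0 to n_classes - 1, so encode other labels yourself, e.g. with scikit-learn's LabelEncoder).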

References

XGBoost documentation (stable)

Project and related