XGBoost Cheatsheet – TheCoatlessProfessor

XGBoost builds an ensemble of decision trees, one at a time, where each new tree corrects the residual errors of the trees before it (gradient boosting). You have two front doors to the same engine: the native API (xgb.DMatrix data plus xgb.train(params, ...) returning a Booster) and the scikit-learn API (XGBClassifier / XGBRegressor with .fit / .predict, drop-in compatible with pipelines and GridSearchCV). The three knobs that matter most are n_estimators (how many trees), max_depth (how complex each tree is), and learning_rate (how much each tree contributes); early stopping uses a validation eval_set to pick the tree count for you. X is a 2-D array or DataFrame of shape (n_samples, n_features) and y is the 1-D target. Where this sheet says “gradient-boosted trees on tabular data,” LightGBM is the sibling library with a near-identical sklearn surface; the Quick Reference maps one to the other. The conventional import is import xgboost as xgb, and everything here is xgboost v3 (native-param aliases and removed options are flagged per section).
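To make "each new tree corrects the residual errors" concrete, here is a minimal sketch of plain gradient boosting for squared error, built from scikit-learn's DecisionTreeRegressor. It is an illustration under simplifying assumptions, not XGBoost's actual algorithm (which also uses second-order gradients and regularized leaf weights); the synthetic data and variable names are made up for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1   # how much each tree contributes
n_estimators = 50     # how many trees
max_depth = 2         # how complex each tree is

pred = np.full_like(y, y.mean())           # start from a constant prediction
mse_start = float(np.mean((y - pred) ** 2))

trees = []
for _ in range(n_estimators):
    residual = y - pred                    # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    tree.fit(X, residual)                  # the new tree models the residual
    pred += learning_rate * tree.predict(X)  # shrunken step toward the target
    trees.append(tree)

mse_end = float(np.mean((y - pred) ** 2))
print(f"training MSE before boosting: {mse_start:.4f}, after: {mse_end:.4f}")
```

The three variables at the top play the same roles as n_estimators, max_depth, and learning_rate in the estimators described on this sheet.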

Data: DMatrix vs the sklearn API

The native path wraps your arrays in an xgb.DMatrix (XGBoost's internal data container) and trains with xgb.train, while the scikit-learn path lets XGBClassifier / XGBRegressor take plain NumPy arrays or DataFrames and build the container internally. With tree_method="hist", prefer xgb.QuantileDMatrix, which pre-buckets features into histogram bins to save memory and time, and keep a separate validation DMatrix for early stopping.

XGBoost data panel: wrap arrays in a DMatrix, faster QuantileDMatrix for hist, skip DMatrix with the sklearn API, add a validation DMatrix, carry feature names and weights.

Two front doors to one engine: a native DMatrix, or plain arrays via the sklearn API.

import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)          # native optimized container
dtrain = xgb.QuantileDMatrix(X_train, label=y_train)  # pre-bucketed for tree_method="hist"

clf = xgb.XGBClassifier()                             # or skip DMatrix entirely:
clf.fit(X_train, y_train)                             # sklearn API takes plain arrays

dvalid = xgb.DMatrix(X_valid, label=y_valid)          # held-out set for early stopping
xgb.DMatrix(X, label=y, weight=w, feature_names=cols) # carry names and per-row weights

See Data interface. QuantileDMatrix pairs with tree_method="hist" for memory savings.

Train: classifier and regressor

One boosting engine powers two estimators: XGBClassifier for labels and XGBRegressor for numbers, both with the familiar .fit(X, y) / .predict(X) interface; the native equivalent is xgb.train(params, dtrain, num_boost_round) returning a Booster. The objective chooses the task (binary:logistic, multi:softprob, reg:squarederror), and tree_method="hist" with device="cuda" moves training to the GPU.

XGBoost train panel: train a classifier, train a regressor, train via the native API, pick the objective, choose CPU or GPU.

One engine, two estimators: XGBClassifier for labels, XGBRegressor for numbers.

clf = xgb.XGBClassifier(n_estimators=300, tree_method="hist").fit(X_train, y_train)  # labels
reg = xgb.XGBRegressor(objective="reg:squarederror").fit(X_train, y_train)           # numbers

bst = xgb.train(params, dtrain, num_boost_round=300)  # native API returns a Booster

params = {"objective": "binary:logistic", "eval_metric": "logloss"}  # pick the task
xgb.XGBClassifier(tree_method="hist", device="cuda")                  # train on the GPU

See scikit-learn estimator interface. The objective sets the task and loss.

Key hyperparameters

Capacity comes from three knobs: n_estimators (number of trees), max_depth (complexity of each tree), and learning_rate (shrinkage applied to each tree’s contribution); a smaller learning rate with more trees usually generalizes better. Add subsample / colsample_bytree for stochastic regularization and reg_lambda / min_child_weight to constrain leaves; in the sklearn API use the underscored names (learning_rate, reg_lambda), not the native aliases (eta, lambda).

XGBoost hyperparameters panel: number of boosting rounds, tree depth, shrinkage per tree, subsample rows and columns, regularize leaf weights, avoid native aliases in the sklearn API.

Three knobs set capacity: number of trees, depth per tree, shrinkage per tree.

xgb.XGBClassifier(n_estimators=500)        # more trees: lower bias, slower
xgb.XGBClassifier(max_depth=6)             # complexity of each tree
xgb.XGBClassifier(learning_rate=0.05)      # shrinkage applied to each tree
xgb.XGBClassifier(subsample=0.8, colsample_bytree=0.8)  # stochastic regularization
xgb.XGBClassifier(reg_lambda=1.0, min_child_weight=1)   # constrain leaf weights

# native aliases (xgb.train only): {"eta": 0.05, "lambda": 1.0, "alpha": 0.0}
# sklearn API uses learning_rate, reg_lambda, reg_alpha (not eta, lambda, alpha)

See XGBoost parameters. The sklearn API uses underscored names, not the native aliases.
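If you move a params dict from xgb.train to an estimator, a tiny translation step avoids the alias mix-up. The helper below is hypothetical (to_sklearn_kwargs is not part of xgboost) and only covers the three aliases named above plus the round count.

```python
# Hypothetical helper: rename native aliases to the sklearn-API spellings.
NATIVE_TO_SKLEARN = {
    "eta": "learning_rate",
    "lambda": "reg_lambda",
    "alpha": "reg_alpha",
}

def to_sklearn_kwargs(native_params, num_boost_round=None):
    """Return estimator keyword arguments for a native params dict."""
    kwargs = {NATIVE_TO_SKLEARN.get(k, k): v for k, v in native_params.items()}
    if num_boost_round is not None:
        kwargs["n_estimators"] = num_boost_round  # rounds become the tree count
    return kwargs

native = {"eta": 0.05, "lambda": 1.0, "alpha": 0.0, "max_depth": 6}
sk_kwargs = to_sklearn_kwargs(native, num_boost_round=500)
print(sk_kwargs)
```

The resulting dict could then be unpacked into the constructor, for example xgb.XGBClassifier(**sk_kwargs).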

Early stopping

Instead of guessing n_estimators, set a generous ceiling and let a held-out eval_set stop training when the validation metric has not improved for early_stopping_rounds consecutive rounds. Configure early_stopping_rounds and eval_metric in the constructor (passing them to fit was deprecated in 1.6 and is no longer accepted by current releases), then read best_iteration and best_score; the EarlyStopping callback with save_best=True returns the model truncated to its best round.

XGBoost early stopping panel: turn on early stopping in the constructor, provide the watch set in fit, read where it stopped, native-API early stopping, the EarlyStopping callback, the deprecated fit-time form.

Let a validation set choose the tree count; read best_iteration when it stops.

clf = xgb.XGBClassifier(n_estimators=1000,            # generous ceiling
                        early_stopping_rounds=20,     # patience (in the constructor)
                        eval_metric="logloss")
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])  # eval_set still goes in fit
clf.best_iteration, clf.best_score                        # where it landed

bst = xgb.train(params, dtrain, num_boost_round=1000,     # native-API early stopping
                evals=[(dvalid, "valid")], early_stopping_rounds=20)
xgb.callback.EarlyStopping(rounds=20, save_best=True)     # callback keeps the best

# deprecated in 1.6, since removed from fit: clf.fit(..., early_stopping_rounds=20, eval_metric=...)

See Early stopping. Put early_stopping_rounds and eval_metric in the constructor.

Feature importance (gain)

Ask which features the trees used: feature_importances_ gives normalized scores and get_booster().get_score(importance_type="gain") gives the raw average loss reduction per feature, which is usually the most meaningful ranking. weight (split count) and cover can rank differently, and plot_importance / plot_tree render the chart and individual trees.

XGBoost feature importance panel: normalized importances, raw gain per feature, compare gain and weight and cover, plot importance, draw a single tree.

Gain is the average loss reduction a feature buys: the default for feature_importances_ on tree boosters and usually the most informative score.

clf.feature_importances_                                  # normalized, sums to 1.0
clf.get_booster().get_score(importance_type="gain")       # raw gain per feature

clf.get_booster().get_score(importance_type="weight")     # split count
clf.get_booster().get_score(importance_type="cover")      # coverage; rankings can differ

xgb.plot_importance(clf, importance_type="gain", max_num_features=10)  # top-10 bar chart
xgb.plot_tree(clf, tree_idx=0)                           # render tree 0 (tree_idx replaces num_trees in 3.x)

See Plotting. gain is usually the most meaningful ranking; get_score and plot_importance default to weight, so pass importance_type="gain" explicitly.

Persist: save & load a booster

Serialize the fitted model with save_model to a portable .json or compact binary .ubj (UBJSON) file and reload it with load_model in another process, with no retraining needed. Prefer these stable, version-checked model formats over pickling or the legacy binary (.bin) format, neither of which is guaranteed to load across releases.

XGBoost persist panel: save the fitted model as JSON, save as compact binary UBJSON, load it back into an estimator, save and load a native Booster, avoid pickle and the legacy .bin path.

Train once, serve many times: save to portable .json or compact .ubj, then load.

clf.save_model("model.json")          # portable JSON
clf.save_model("model.ubj")           # compact binary UBJSON: smaller, faster to load

clf2 = xgb.XGBClassifier()            # reload into a fresh estimator
clf2.load_model("model.json")

bst.save_model("booster.ubj")                     # native Booster round-trip
bst2 = xgb.Booster()                              # load_model returns None, so keep the object
bst2.load_model("booster.ubj")

# avoid: pickle.dump(bst, ...) and ".bin" files (not portable across versions)

See Saving and loading models. The .json and .ubj formats are stable and version-checked.

Cross-validation

xgb.cv runs k-fold CV on a DMatrix, supports stratified folds and early_stopping_rounds, and (with pandas installed) returns a tidy DataFrame of per-round train / test metric means and standard deviations so you can pick a defensible num_boost_round. Because the sklearn estimators are scikit-learn compatible, you can also drive cross_val_score and GridSearchCV on an XGBClassifier directly.

XGBoost cross-validation panel: k-fold CV on a DMatrix, stratified folds for classification, CV with early stopping, read the results table, or use sklearn cross_val_score.

xgb.cv runs k-fold CV and, with early stopping, picks num_boost_round honestly.

cv = xgb.cv(params, dtrain, num_boost_round=500, nfold=5)        # k-fold CV
xgb.cv(params, dtrain, nfold=5, stratified=True)                # balanced folds
xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
       early_stopping_rounds=20)                                # stop at the best round
cv  # pandas DataFrame: train-logloss-mean, test-logloss-mean, test-logloss-std

from sklearn.model_selection import cross_val_score
cross_val_score(clf, X, y, cv=5)        # drive the estimator through scikit-learn

See Cross-validation (xgb.cv). With pandas installed it returns a tidy results DataFrame.

Predict & explain (SHAP-style)

predict returns labels, predict_proba returns class probabilities, and the native bst.predict(dtest, pred_contribs=True) returns SHAP values so each prediction decomposes into a base value plus one signed contribution per feature. For richer plots, pass the fitted model to the shap library’s TreeExplainer, and use .score for a quick default metric.

XGBoost predict and explain panel: predict labels, predict probabilities, SHAP-style contributions, use the shap library, explain one prediction, score on a test set.

Labels, probabilities, or per-feature contributions: a base value plus one push per feature.

y_pred = clf.predict(X_test)             # class labels
proba = clf.predict_proba(X_test)        # class probabilities, P(class)

bst.predict(dtest, pred_contribs=True)   # SHAP values: base value + one push per feature

import shap
shap.TreeExplainer(clf).shap_values(X_test)   # tree-specific SHAP values, ready for shap plots

bst.predict(dtest, pred_contribs=True)[i]     # explain one prediction
clf.score(X_test, y_test)                     # accuracy (clf) / R^2 (reg)

See Prediction. pred_contribs=True returns SHAP values straight from the booster.

Quick Reference

Key XGBoost calls.
Command	What it does	Area
`xgb.DMatrix(X, label=y)`	Native optimized data container	Data
`xgb.QuantileDMatrix(X, label=y)`	Pre-bucketed container for `hist`	Data
`xgb.XGBClassifier(...).fit(X, y)`	Train a classifier (sklearn API)	Train
`xgb.XGBRegressor(...).fit(X, y)`	Train a regressor (sklearn API)	Train
`xgb.train(params, dtrain, num_boost_round=N)`	Train a native `Booster`	Train
`n_estimators`, `max_depth`, `learning_rate`	The three capacity knobs	Tune
`early_stopping_rounds=20` + `eval_set=[...]`	Stop on a validation metric	Early stop
`clf.best_iteration`, `clf.best_score`	Where early stopping landed	Early stop
`clf.feature_importances_`	Normalized importances	Importance
`get_booster().get_score(importance_type="gain")`	Raw gain per feature	Importance
`clf.save_model("model.json")` / `.ubj`	Persist the model	Persist
`clf.load_model("model.json")`	Reload the model	Persist
`xgb.cv(params, dtrain, nfold=5)`	K-fold cross-validation	CV
`clf.predict` / `clf.predict_proba`	Labels / probabilities	Predict
`bst.predict(dtest, pred_contribs=True)`	SHAP-style contributions	Explain

Common XGBoost parameters (sklearn API names).
Param	Meaning	Typical
`n_estimators`	Number of boosting rounds (trees)	100 to 1000
`max_depth`	Maximum tree depth	3 to 8
`learning_rate`	Shrinkage per tree	0.01 to 0.3
`subsample`	Row fraction per tree	0.6 to 1.0
`colsample_bytree`	Column fraction per tree	0.6 to 1.0
`reg_lambda`	L2 regularization on weights	1.0
`reg_alpha`	L1 regularization on weights	0.0
`min_child_weight`	Minimum sum of instance weight (hessian) required in a child	1
`tree_method`	`"hist"` (default), `"exact"`, `"approx"`	`"hist"`
`device`	`"cpu"` or `"cuda"`	`"cpu"`
`early_stopping_rounds`	Patience for early stopping	10 to 50
`eval_metric`	Watch metric, e.g. `"logloss"`, `"rmse"`	task-dependent

Spellings to avoid in current XGBoost.
Avoid	Use instead	Note
`eta` (sklearn API)	`learning_rate`	`eta` is the native-param alias
`lambda` / `alpha` (sklearn API)	`reg_lambda` / `reg_alpha`	underscored names in the estimator
`clf.fit(..., early_stopping_rounds=...)`	constructor `early_stopping_rounds=...`	moved to `__init__` in 1.6
`clf.fit(..., eval_metric=...)`	constructor `eval_metric=...`	moved to `__init__` in 1.6
`use_label_encoder=...`	(remove it)	removed; pass integer labels `0..n_classes-1` yourself (e.g. sklearn `LabelEncoder`)
`gpu_hist`, `gpu_id`	`tree_method="hist"` + `device="cuda"`	unified device API
pickle / `.bin` model files	`save_model(".json")` / `.ubj`	portable, version-checked formats

XGBoost and LightGBM, side by side.
Concept	XGBoost	LightGBM
Import	`import xgboost as xgb`	`import lightgbm as lgb`
Classifier / regressor	`XGBClassifier` / `XGBRegressor`	`LGBMClassifier` / `LGBMRegressor`
Data container	`xgb.DMatrix`	`lgb.Dataset`
Tree growth	level-wise (`max_depth`)	leaf-wise (`num_leaves`)
Trees	`n_estimators`	`n_estimators`
Shrinkage	`learning_rate`	`learning_rate`
Early stopping	`early_stopping_rounds` (constructor)	`lgb.early_stopping(...)` callback
Save model	`save_model(".json"/".ubj")`	`booster.save_model(".txt")`

Appendix: Sample Code

The whole workflow with the sklearn API (the canonical pattern)

import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 1. Data + a held-out validation set
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2-4. Train with early stopping (early_stopping_rounds + eval_metric go in the constructor)
clf = xgb.XGBClassifier(
    n_estimators=1000,            # generous ceiling
    max_depth=6,                  # tree complexity
    learning_rate=0.05,           # shrinkage per tree
    subsample=0.8,
    colsample_bytree=0.8,
    reg_lambda=1.0,
    tree_method="hist",           # device="cuda" for GPU
    early_stopping_rounds=20,
    eval_metric="mlogloss",
    random_state=0,
)
clf.fit(X_tr, y_tr, eval_set=[(X_te, y_te)], verbose=False)

print("best_iteration:", clf.best_iteration)    # e.g. 178
print("best_score:", round(clf.best_score, 4))  # e.g. 0.46
print("test accuracy:", round(clf.score(X_te, y_te), 3))

# 5. Feature importance (gain)
gain = clf.get_booster().get_score(importance_type="gain")

# 6. Persist the fitted model
clf.save_model("model.json")     # or "model.ubj" for compact binary

The native API: DMatrix, train, cross-validate

import xgboost as xgb

dtrain = xgb.DMatrix(X_tr, label=y_tr)
dvalid = xgb.DMatrix(X_te, label=y_te)

params = {
    "objective": "multi:softprob",
    "num_class": 3,
    "max_depth": 6,
    "eta": 0.05,                 # native name for learning_rate
    "subsample": 0.8,
    "tree_method": "hist",
    "device": "cpu",
    "eval_metric": "mlogloss",
}

# Cross-validate to pick num_boost_round (returns a pandas DataFrame)
cv = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
            stratified=True, early_stopping_rounds=20, seed=0, as_pandas=True)
best_round = len(cv)             # rounds kept after early stopping
print(cv.tail(1))                # train/test mlogloss mean and std

# Train the final Booster for that many rounds, watching the valid set
bst = xgb.train(params, dtrain, num_boost_round=best_round,
                evals=[(dvalid, "valid")], early_stopping_rounds=20,
                verbose_eval=False)
bst.save_model("booster.ubj")

Reload and serve (separate process)

import xgboost as xgb

clf = xgb.XGBClassifier()
clf.load_model("model.json")     # no training data or original code needed
print(clf.predict(X_new))        # class labels
print(clf.predict_proba(X_new))  # class probabilities

Explain predictions (SHAP-style contributions)

import xgboost as xgb

bst = xgb.Booster()
bst.load_model("booster.ubj")
dtest = xgb.DMatrix(X_te)

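# pred_contribs returns SHAP values. For regression or binary models the shape is
# (n_samples, n_features + 1); for a multi-class booster like the one saved above it is
# (n_samples, n_classes, n_features + 1). The last column is the base (bias) value,
# and each row sums to the margin for that class.
contribs = bst.predict(dtest, pred_contribs=True)
print(contribs.shape)            # e.g. (1000, 3, 21) for 3 classes and 20 features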

# Or use the shap library for ready-made plots:
# import shap
# explainer = shap.TreeExplainer(clf)
# shap_values = explainer.shap_values(X_te)
# shap.summary_plot(shap_values, X_te)

Reproducible environment header

import xgboost as xgb
print(xgb.__version__)           # 3.3.0
# Pin in pyproject / requirements, e.g.:
#   xgboost==3.3.0
#   scikit-learn==1.9.0
#   numpy==2.4.2
#   scipy==1.17.1
#   shap (optional, for richer explanations)

Behavior notes

Two front doors, one engine. The native API (xgb.DMatrix + xgb.train returning a Booster) and the sklearn API (XGBClassifier / XGBRegressor with .fit / .predict) wrap the same gradient-boosting core; the sklearn estimators build the DMatrix for you.
Early stopping config moved to the constructor. Since 1.6, early_stopping_rounds and eval_metric belong in __init__ (or set_params), not in fit; eval_set is still passed to fit.
Use the underscored sklearn names. learning_rate, reg_lambda, reg_alpha in the estimator; eta, lambda, alpha are the native-params aliases used with xgb.train.
gain is the default for feature_importances_. On tree boosters it is normalized gain; get_score and plot_importance default to weight (split count), and weight or cover can rank features differently.
Prefer .json / .ubj over pickle. These model formats are stable and version-checked across releases; the legacy .bin path and pickling are not guaranteed to load on a newer xgboost.
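The unified device API replaces gpu_hist. Set tree_method="hist" and device="cuda" for the GPU; gpu_hist and gpu_id are gone, as is use_label_encoder (XGBClassifier expects integer labels 0..n_classes-1, so encode other label types yourself, e.g. with scikit-learn's LabelEncoder).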

References

XGBoost documentation (stable)

Install and the Python package introduction (mental model)
Python API reference (all symbols), Data interface, scikit-learn estimator interface
XGBoost parameters, Early stopping, Plotting
Saving and loading models, Cross-validation (xgb.cv), Prediction, Parameter tuning notes

Project and related