XGBoost builds an ensemble of decision trees, one at a time, where each new tree corrects the residual errors of the trees before it (gradient boosting). You have two front doors to the same engine: the native API (xgb.DMatrix data plus xgb.train(params, ...) returning a Booster) and the scikit-learn API (XGBClassifier / XGBRegressor with .fit / .predict, drop-in compatible with pipelines and GridSearchCV). The three knobs that matter most are n_estimators (how many trees), max_depth (how complex each tree is), and learning_rate (how much each tree contributes); early stopping uses a validation eval_set to pick the tree count for you. X is a 2-D array or DataFrame of shape (n_samples, n_features) and y is the 1-D target. Where this sheet says “gradient-boosted trees on tabular data,” LightGBM is the sibling library with a near-identical sklearn surface; the Quick Reference maps one to the other. The conventional import is import xgboost as xgb, and everything here is xgboost v3 (native-param aliases and removed options are flagged per section).
Data: DMatrix vs the sklearn API
The native path wraps your arrays in an xgb.DMatrix (an optimized, often pre-bucketed container) and trains with xgb.train, while the scikit-learn path lets XGBClassifier / XGBRegressor take plain NumPy arrays or DataFrames and handle the DMatrix internally. Use QuantileDMatrix with tree_method="hist" for speed and memory, and keep a separate validation DMatrix for early stopping.
import xgboost as xgb
dtrain = xgb.DMatrix(X_train, label=y_train) # native optimized container
dtrain = xgb.QuantileDMatrix(X_train, label=y_train) # pre-bucketed for tree_method="hist"
clf = xgb.XGBClassifier() # or skip DMatrix entirely:
clf.fit(X_train, y_train) # sklearn API takes plain arrays
dvalid = xgb.DMatrix(X_valid, label=y_valid) # held-out set for early stopping
xgb.DMatrix(X, label=y, weight=w, feature_names=cols) # carry names and per-row weights
See Data interface. QuantileDMatrix pairs with tree_method="hist" for memory savings.
Train: classifier and regressor
One boosting engine powers two estimators: XGBClassifier for labels and XGBRegressor for numbers, both with the familiar .fit(X, y) / .predict(X) interface; the native equivalent is xgb.train(params, dtrain, num_boost_round) returning a Booster. The objective chooses the task (binary:logistic, multi:softprob, reg:squarederror), and tree_method="hist" with device="cuda" moves training to the GPU.
clf = xgb.XGBClassifier(n_estimators=300, tree_method="hist").fit(X_train, y_train) # labels
reg = xgb.XGBRegressor(objective="reg:squarederror").fit(X_train, y_train) # numbers
params = {"objective": "binary:logistic", "eval_metric": "logloss"} # pick the task (define before xgb.train)
bst = xgb.train(params, dtrain, num_boost_round=300) # native API returns a Booster
xgb.XGBClassifier(tree_method="hist", device="cuda") # train on the GPU
See scikit-learn estimator interface. The objective sets the task and loss.
Key hyperparameters
Capacity comes from three knobs: n_estimators (number of trees), max_depth (complexity of each tree), and learning_rate (shrinkage applied to each tree’s contribution); a smaller learning rate with more trees usually generalizes better. Add subsample / colsample_bytree for stochastic regularization and reg_lambda / min_child_weight to constrain leaves; in the sklearn API use the underscored names (learning_rate, reg_lambda), not the native aliases (eta, lambda).
xgb.XGBClassifier(n_estimators=500) # more trees: lower bias, slower
xgb.XGBClassifier(max_depth=6) # complexity of each tree
xgb.XGBClassifier(learning_rate=0.05) # shrinkage applied to each tree
xgb.XGBClassifier(subsample=0.8, colsample_bytree=0.8) # stochastic regularization
xgb.XGBClassifier(reg_lambda=1.0, min_child_weight=1) # constrain leaf weights
# native aliases (xgb.train only): {"eta": 0.05, "lambda": 1.0, "alpha": 0.0}
# sklearn API uses learning_rate, reg_lambda, reg_alpha (not eta, lambda, alpha)
See XGBoost parameters. The sklearn API uses underscored names, not the native aliases.
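To make the interplay between n_estimators and learning_rate concrete, here is a minimal pure-Python sketch of gradient boosting for squared error, using depth-1 stumps on a single feature. It is an illustration only, not how XGBoost is implemented (there are no second-order gradients, regularization terms, or histograms), and the helper names fit_stump and boost are hypothetical.

```python
def fit_stump(x, residuals):
    """Best single-threshold split on one feature, minimizing squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the max so the right side is never empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda xi: lmean if xi <= t else rmean


def boost(x, y, n_estimators, learning_rate):
    """Fit stumps one at a time to the current residuals; return per-round training MSE."""
    pred = [sum(y) / len(y)] * len(y)  # start from the mean prediction
    history = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        # shrink each tree's contribution by learning_rate
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    return history


# Toy data (made up for illustration): y = x^2 on 20 points
x = [i / 10 for i in range(20)]
y = [xi ** 2 for xi in x]
fast = boost(x, y, n_estimators=20, learning_rate=0.5)
slow = boost(x, y, n_estimators=20, learning_rate=0.05)
```

On this toy data, 20 rounds at learning_rate=0.5 reach a lower training error than 20 rounds at 0.05; the smaller rate needs more trees to catch up, which is why the two knobs are tuned together.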
Early stopping
Instead of guessing n_estimators, set a generous ceiling and let a held-out eval_set stop training when the validation metric stops improving for early_stopping_rounds rounds. Configure early_stopping_rounds and eval_metric in the constructor (passing them to fit is deprecated since 1.6), then read best_iteration and best_score; the EarlyStopping callback with save_best=True keeps the best model.
clf = xgb.XGBClassifier(n_estimators=1000, # generous ceiling
early_stopping_rounds=20, # patience (in the constructor)
eval_metric="logloss")
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)]) # eval_set still goes in fit
clf.best_iteration, clf.best_score # where it landed
bst = xgb.train(params, dtrain, num_boost_round=1000, # native-API early stopping
evals=[(dvalid, "valid")], early_stopping_rounds=20)
xgb.callback.EarlyStopping(rounds=20, save_best=True) # pass via callbacks=[...]; keeps the best model
# deprecated since 1.6: clf.fit(..., early_stopping_rounds=20, eval_metric=...)
See Early stopping. Put early_stopping_rounds and eval_metric in the constructor.
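The patience rule described above can be sketched in a few lines of plain Python. This is a simplified illustration for a metric that is minimized (such as logloss), not XGBoost's own code; the function name early_stop and the validation scores are made up.

```python
def early_stop(scores, rounds):
    """Return (best_iteration, best_score) for a metric where lower is better.

    Training stops once `rounds` consecutive iterations fail to beat the best score.
    """
    best_iteration, best_score = 0, float("inf")
    for i, score in enumerate(scores):
        if score < best_score:
            best_iteration, best_score = i, score
        elif i - best_iteration >= rounds:
            break  # patience exhausted; later rounds are never evaluated
    return best_iteration, best_score


# Hypothetical per-round validation logloss
valid_logloss = [0.69, 0.60, 0.55, 0.53, 0.54, 0.535, 0.56, 0.52]
print(early_stop(valid_logloss, rounds=3))  # (3, 0.53)
print(early_stop(valid_logloss, rounds=5))  # (7, 0.52)
```

With a patience of 3 the run stops at round 6 and never sees the later improvement at round 7, which is the trade-off behind choosing early_stopping_rounds: too small and training halts on noise, too large and it wastes rounds.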
Feature importance (gain)
Ask which features the trees used: feature_importances_ gives normalized scores and get_booster().get_score(importance_type="gain") gives the raw average loss reduction per feature, which is usually the most meaningful ranking. weight (split count) and cover can rank differently, and plot_importance / plot_tree render the chart and individual trees.
clf.feature_importances_ # normalized, sums to 1.0
clf.get_booster().get_score(importance_type="gain") # raw gain per feature
clf.get_booster().get_score(importance_type="weight") # split count
clf.get_booster().get_score(importance_type="cover") # coverage; rankings can differ
xgb.plot_importance(clf, importance_type="gain", max_num_features=10) # top-10 bar chart
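xgb.plot_tree(clf, num_trees=0) # render tree 0 of N
See Plotting. gain is the default for feature_importances_ on tree boosters and usually the most meaningful ranking; get_score and plot_importance default to weight, so pass importance_type="gain" explicitly there.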
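The relationship between the raw get_score dictionary and the normalized feature_importances_ array can be sketched without the library. The helper normalize_importance, the feature names, and the gain values below are all hypothetical; the sketch assumes that features never used in a split are simply absent from the raw dictionary and count as zero.

```python
def normalize_importance(raw_scores, feature_names):
    """Turn a get_score-style dict into per-feature shares that sum to 1.

    Features missing from raw_scores (never used in a split) get a share of 0.
    """
    values = [raw_scores.get(name, 0.0) for name in feature_names]
    total = sum(values)
    if total == 0:
        return {name: 0.0 for name in feature_names}
    return {name: v / total for name, v in zip(feature_names, values)}


# Made-up raw gain per feature, in the shape get_score returns
raw_gain = {"age": 12.0, "income": 6.0, "tenure": 2.0}
features = ["age", "income", "tenure", "zip_code"]

shares = normalize_importance(raw_gain, features)
ranked = sorted(shares, key=shares.get, reverse=True)  # most important first
```

Ranking by the normalized shares and ranking by the raw values give the same order; normalization only makes scores comparable across models.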
Persist: save & load a booster
Serialize the fitted model with save_model to a portable .json or compact binary .ubj (UBJSON) file and reload with load_model in another process, no retraining needed. Prefer these stable, version-checked model formats over pickling or the legacy .bin path, which is not guaranteed across releases.
clf.save_model("model.json") # portable JSON
clf.save_model("model.ubj") # compact binary UBJSON: smaller, faster to load
clf2 = xgb.XGBClassifier() # reload into a fresh estimator
clf2.load_model("model.json")
bst.save_model("booster.ubj") # native Booster round-trip
bst2 = xgb.Booster() # load_model returns None, so keep a handle on the Booster
bst2.load_model("booster.ubj")
# avoid: pickle.dump(bst, ...) and ".bin" files (not portable across versions)
See Saving and loading models. The .json and .ubj formats are stable and version-checked.
Cross-validation
xgb.cv runs k-fold CV on a DMatrix, supports stratified folds and early_stopping_rounds, and (with pandas installed) returns a tidy DataFrame of per-round train / test metric means and standard deviations so you can pick a defensible num_boost_round. Because the sklearn estimators are scikit-learn compatible, you can also drive cross_val_score and GridSearchCV on an XGBClassifier directly.
cv = xgb.cv(params, dtrain, num_boost_round=500, nfold=5) # k-fold CV
xgb.cv(params, dtrain, nfold=5, stratified=True) # balanced folds
xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
early_stopping_rounds=20) # stop at the best round
cv # pandas DataFrame: train-logloss-mean, test-logloss-mean, test-logloss-std
from sklearn.model_selection import cross_val_score
cross_val_score(clf, X, y, cv=5) # drive the estimator through scikit-learn
See Cross-validation (xgb.cv). With pandas installed it returns a tidy results DataFrame.
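What "balanced folds" means for stratified=True can be shown with a small stand-alone sketch: each class is dealt across the folds so every fold keeps roughly the overall class ratio. The helper stratified_folds is hypothetical and deliberately simple (no shuffling); it is not the splitter xgb.cv uses internally.

```python
from collections import defaultdict


def stratified_folds(labels, nfold):
    """Assign row indices to nfold folds, dealing each class round-robin."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(nfold)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % nfold].append(idx)
    return folds


# Made-up imbalanced labels: 8 negatives, 4 positives
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, nfold=4)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Here every fold ends up with two negatives and one positive, so each test fold sees the minority class; an unstratified split could leave a fold with none.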
Predict & explain (SHAP-style)
predict returns labels, predict_proba returns class probabilities, and the native bst.predict(dtest, pred_contribs=True) returns SHAP values so each prediction decomposes into a base value plus one signed contribution per feature. For richer plots, pass the fitted model to the shap library’s TreeExplainer, and use .score for a quick default metric.
y_pred = clf.predict(X_test) # class labels
proba = clf.predict_proba(X_test) # class probabilities, P(class)
bst.predict(dtest, pred_contribs=True) # SHAP values: base value + one push per feature
import shap
shap.TreeExplainer(clf).shap_values(X_test) # tree-specific SHAP values for the shap library's plots
bst.predict(dtest, pred_contribs=True)[i] # explain one prediction
clf.score(X_test, y_test) # accuracy (clf) / R^2 (reg)
See Prediction. pred_contribs=True returns SHAP values straight from the booster.
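The decomposition is easy to check by hand for a single row. The sketch below assumes a binary:logistic model, where the contributions plus the base value add up to the raw margin (log-odds) and a sigmoid turns that into a probability. The helper explain_row, the feature names, and the numbers are made up for illustration.

```python
import math


def explain_row(contrib_row, feature_names):
    """Split one pred_contribs-style row into feature pushes, base, margin, probability."""
    *pushes, base = contrib_row          # the last entry is the base (bias) value
    margin = base + sum(pushes)          # raw score, in log-odds for binary:logistic
    probability = 1.0 / (1.0 + math.exp(-margin))
    return dict(zip(feature_names, pushes)), base, margin, probability


# Hypothetical row: three feature contributions followed by the base value
row = [0.8, -0.3, 0.1, -0.6]
pushes, base, margin, prob = explain_row(row, ["age", "income", "tenure"])

# Largest absolute push first: which feature moved this prediction the most
top = max(pushes, key=lambda name: abs(pushes[name]))
```

In this example the positive pushes exactly cancel the negative base value, so the margin is 0 and the predicted probability is 0.5, with age as the largest single contributor.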
Quick Reference
| Command | What it does | Area |
|---|---|---|
| xgb.DMatrix(X, label=y) | Native optimized data container | Data |
| xgb.QuantileDMatrix(X, label=y) | Pre-bucketed container for hist | Data |
| xgb.XGBClassifier(...).fit(X, y) | Train a classifier (sklearn API) | Train |
| xgb.XGBRegressor(...).fit(X, y) | Train a regressor (sklearn API) | Train |
| xgb.train(params, dtrain, num_boost_round=N) | Train a native Booster | Train |
| n_estimators, max_depth, learning_rate | The three capacity knobs | Tune |
| early_stopping_rounds=20 + eval_set=[...] | Stop on a validation metric | Early stop |
| clf.best_iteration, clf.best_score | Where early stopping landed | Early stop |
| clf.feature_importances_ | Normalized importances | Importance |
| get_booster().get_score(importance_type="gain") | Raw gain per feature | Importance |
| clf.save_model("model.json") / .ubj | Persist the model | Persist |
| clf.load_model("model.json") | Reload the model | Persist |
| xgb.cv(params, dtrain, nfold=5) | K-fold cross-validation | CV |
| clf.predict / clf.predict_proba | Labels / probabilities | Predict |
| bst.predict(dtest, pred_contribs=True) | SHAP-style contributions | Explain |
| Param | Meaning | Typical |
|---|---|---|
| n_estimators | Number of boosting rounds (trees) | 100 to 1000 |
| max_depth | Maximum tree depth | 3 to 8 |
| learning_rate | Shrinkage per tree | 0.01 to 0.3 |
| subsample | Row fraction per tree | 0.6 to 1.0 |
| colsample_bytree | Column fraction per tree | 0.6 to 1.0 |
| reg_lambda | L2 regularization on weights | 1.0 |
| reg_alpha | L1 regularization on weights | 0.0 |
| min_child_weight | Minimum sum of instance weight needed in a child | 1 |
| tree_method | "hist" (default), "exact", "approx" | "hist" |
| device | "cpu" or "cuda" | "cpu" |
| early_stopping_rounds | Patience for early stopping | 10 to 50 |
| eval_metric | Watch metric, e.g. "logloss", "rmse" | task-dependent |
| Avoid | Use instead | Note |
|---|---|---|
| eta (sklearn API) | learning_rate | eta is the native-param alias |
| lambda / alpha (sklearn API) | reg_lambda / reg_alpha | underscored names in the estimator |
| clf.fit(..., early_stopping_rounds=...) | constructor early_stopping_rounds=... | moved to the constructor in 1.6 |
| clf.fit(..., eval_metric=...) | constructor eval_metric=... | moved to the constructor in 1.6 |
| use_label_encoder=... | (remove it) | removed; labels are encoded automatically |
| gpu_hist, gpu_id | tree_method="hist" + device="cuda" | unified device API |
| pickle / .bin model files | save_model(".json") / .ubj | portable, version-checked formats |
| Concept | XGBoost | LightGBM |
|---|---|---|
| Import | import xgboost as xgb | import lightgbm as lgb |
| Classifier / regressor | XGBClassifier / XGBRegressor | LGBMClassifier / LGBMRegressor |
| Data container | xgb.DMatrix | lgb.Dataset |
| Tree growth | level-wise (max_depth) | leaf-wise (num_leaves) |
| Trees | n_estimators | n_estimators |
| Shrinkage | learning_rate | learning_rate |
| Early stopping | early_stopping_rounds (constructor) | lgb.early_stopping(...) callback |
| Save model | save_model(".json"/".ubj") | booster.save_model(".txt") |
Appendix: Sample Code
The whole workflow with the sklearn API (the canonical pattern)
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# 1. Data + a held-out validation set
X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
X, y, test_size=0.2, stratify=y, random_state=0
)
# 2-4. Train with early stopping (early_stopping_rounds + eval_metric go in the constructor)
clf = xgb.XGBClassifier(
n_estimators=1000, # generous ceiling
max_depth=6, # tree complexity
learning_rate=0.05, # shrinkage per tree
subsample=0.8,
colsample_bytree=0.8,
reg_lambda=1.0,
tree_method="hist", # device="cuda" for GPU
early_stopping_rounds=20,
eval_metric="mlogloss",
random_state=0,
)
clf.fit(X_tr, y_tr, eval_set=[(X_te, y_te)], verbose=False)
print("best_iteration:", clf.best_iteration) # e.g. 178
print("best_score:", round(clf.best_score, 4)) # e.g. 0.46
print("test accuracy:", round(clf.score(X_te, y_te), 3))
# 5. Feature importance (gain)
gain = clf.get_booster().get_score(importance_type="gain")
# 6. Persist the fitted model
clf.save_model("model.json") # or "model.ubj" for compact binary
The native API: DMatrix, train, cross-validate
import xgboost as xgb
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dvalid = xgb.DMatrix(X_te, label=y_te)
params = {
"objective": "multi:softprob",
"num_class": 3,
"max_depth": 6,
"eta": 0.05, # native name for learning_rate
"subsample": 0.8,
"tree_method": "hist",
"device": "cpu",
"eval_metric": "mlogloss",
}
# Cross-validate to pick num_boost_round (returns a pandas DataFrame)
cv = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
stratified=True, early_stopping_rounds=20, seed=0, as_pandas=True)
best_round = len(cv) # rounds kept after early stopping
print(cv.tail(1)) # train/test mlogloss mean and std
# Train the final Booster for that many rounds, watching the valid set
bst = xgb.train(params, dtrain, num_boost_round=best_round,
evals=[(dvalid, "valid")], early_stopping_rounds=20,
verbose_eval=False)
bst.save_model("booster.ubj")
Reload and serve (separate process)
import xgboost as xgb
clf = xgb.XGBClassifier()
clf.load_model("model.json") # no training data or original code needed
print(clf.predict(X_new)) # class labels
print(clf.predict_proba(X_new)) # class probabilities
Explain predictions (SHAP-style contributions)
import xgboost as xgb
bst = xgb.Booster()
bst.load_model("booster.ubj")
dtest = xgb.DMatrix(X_te)
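# pred_contribs returns SHAP values. For this 3-class booster the shape is
# (n_samples, n_classes, n_features + 1); single-output models return (n_samples, n_features + 1).
# The last column is the base (bias) value; all columns together sum to the raw margin.
contribs = bst.predict(dtest, pred_contribs=True)
print(contribs.shape) # e.g. (1000, 3, 21): 1000 rows, 3 classes, 20 features + bias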
# Or use the shap library for ready-made plots:
# import shap
# explainer = shap.TreeExplainer(clf)
# shap_values = explainer.shap_values(X_te)
# shap.summary_plot(shap_values, X_te)
Reproducible environment header
import xgboost as xgb
print(xgb.__version__) # 3.3.0
# Pin in pyproject / requirements, e.g.:
# xgboost==3.3.0
# scikit-learn==1.9.0
# numpy==2.4.2
# scipy==1.17.1
# shap (optional, for richer explanations)
Behavior notes
- Two front doors, one engine. The native API (xgb.DMatrix + xgb.train returning a Booster) and the sklearn API (XGBClassifier / XGBRegressor with .fit / .predict) wrap the same gradient-boosting core; the sklearn estimators build the DMatrix for you.
- Early stopping config moved to the constructor. Since 1.6, early_stopping_rounds and eval_metric belong in __init__ (or set_params), not in fit; eval_set is still passed to fit.
- Use the underscored sklearn names. learning_rate, reg_lambda, reg_alpha in the estimator; eta, lambda, alpha are the native-params aliases used with xgb.train.
- gain is the default importance. feature_importances_ is normalized gain; weight (split count) and cover can rank features differently.
- Prefer .json / .ubj over pickle. These model formats are stable and version-checked across releases; the legacy .bin path and pickling are not guaranteed to load on a newer xgboost.
- The unified device API replaces gpu_hist. Set tree_method="hist" and device="cuda" for the GPU; gpu_hist and gpu_id are gone, as is use_label_encoder (labels encode automatically).
References
XGBoost documentation (stable)
- Install and the Python package introduction (mental model)
- Python API reference (all symbols), Data interface, scikit-learn estimator interface
- XGBoost parameters, Early stopping, Plotting
- Saving and loading models, Cross-validation (xgb.cv), Prediction, Parameter tuning notes
Project and related