AI Core Module

Integrates machine learning workflows into materials research.

Highlights:

Automated regression, classification, and clustering workflows
Access to 140+ pre-trained models
Fine-tuning and inference on user data
Predict material properties: band gap, formation energy, hardness
Built-in tools for data splitting, validation, and evaluation

class PyGamLab.ai_core.GamAI_io(**kwargs)[source]

Bases: object

A unified input/output handler for saving and loading .gam_ai model packages.

This class bundles a serialized machine learning model and its metadata (e.g., author info, model parameters, training details) into a single portable .gam_ai file. Storing the encoded model object and the human-readable metadata in one structured JSON container supports model deployment, archival, and reproducibility.

Parameters:

model_name (str, optional) – A short, descriptive name for the model. Used as the filename during saving.
model_type (str, optional) – The algorithmic or architectural family of the model (e.g., “RandomForest”, “XGBoost”, “CNN”, “Transformer”).
description (str, optional, default="") – A brief summary describing the model’s purpose, training dataset, or key features.
author_name (str, optional, default="") – Full name of the model creator.
author_email (str, optional, default="") – Contact email for correspondence or citation.
trainer_name (str, optional, default="") – The individual or system responsible for model training.
best_accuracy (float, optional) – The highest validation or test accuracy achieved during training.
doi (str, optional) – Digital Object Identifier (DOI) associated with the published model or dataset.
hyperparam_range (dict, optional, default={}) – Dictionary defining hyperparameter search ranges used during optimization.
best_params (dict, optional, default={}) – Dictionary of the final optimized hyperparameter values.
ml_model (object, optional) – The trained machine learning model instance (e.g., scikit-learn estimator).

Notes

The .gam_ai file format consists of a JSON object with two main sections:

metadata: Contains descriptive fields such as author, model type, and hyperparameters.
model_data: Contains the Base64-encoded binary serialization of the trained model, produced via joblib.

Because the whole package is JSON text, the file is portable across systems, and the metadata can be inspected programmatically or by hand without deserializing the model.
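The two-section layout described above can be sketched with the standard library alone. This is a minimal illustration, not the PyGamLab implementation: the helper name pack_gam_ai is hypothetical, a plain dict stands in for a trained estimator, and pickle is swapped in for joblib so the example has no third-party dependencies. Only the "metadata" and "model_data" keys come from the format description.

```python
import base64
import io
import json
import pickle


def pack_gam_ai(metadata, model):
    """Return a JSON string with 'metadata' and Base64 'model_data' sections.

    Hypothetical helper for illustration; pickle stands in for joblib here.
    """
    buffer = io.BytesIO()
    pickle.dump(model, buffer)  # the real format uses joblib serialization
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return json.dumps({"metadata": metadata, "model_data": encoded}, indent=2)


# A plain dict stands in for a trained estimator (assumption for the sketch).
toy_model = {"coef": [0.5, -1.2], "intercept": 0.1}
package_text = pack_gam_ai(
    {"model_name": "toy_v1", "model_type": "LinearRegression"}, toy_model
)

# Metadata can be read back without touching the serialized model bytes.
package_metadata = json.loads(package_text)["metadata"]
print(package_metadata["model_name"])
```

Reading only the "metadata" section, as in the last two lines, is what makes manual or scripted inspection possible without loading the model itself.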

Examples

>>> from PyGamLab.ai_core import GamAI_io
>>> from sklearn.ensemble import RandomForestClassifier
>>> model = RandomForestClassifier(n_estimators=100, random_state=42)
>>> model.fit(X_train, y_train)
>>>
>>> package = GamAI_io(
...     model_name="rf_classifier_v1",
...     model_type="RandomForest",
...     description="Predicts material phases using compositional data",
...     author_name="John Doe",
...     author_email="john.doe@example.com",
...     best_accuracy=0.94,
...     hyperparam_range={"n_estimators": [50, 100, 200]},
...     best_params={"n_estimators": 100, "max_depth": 10},
...     ml_model=model
... )
>>>
>>> # Save to file
>>> package.save(save_dir="models")
💾 Saved ml_model package: models/rf_classifier_v1.gam_ai
>>>
>>> # Load from file
>>> loaded_package = GamAI_io.load("models/rf_classifier_v1.gam_ai")
>>> restored_model = loaded_package.ml_model
>>> restored_model.predict(X_test[:5])

See also

joblib.dump: Efficient serialization of Python objects.
json: Standard JSON encoder/decoder.
base64: Encoding binary model data for safe JSON storage.

static load(filepath)[source]

Load a .gam_ai model package from disk.

Parameters: filepath (str) – Full path to the .gam_ai file to be loaded.
Returns: A GamAI_io instance containing both metadata and the deserialized machine learning model (ml_model).
Return type: GamAI_io

Notes

The loading process reverses the Base64 encoding and joblib serialization to reconstruct the original model object.
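The reversal described in this note might look roughly like the following. It is a sketch under the same assumptions as the format description: unpack_gam_ai is a hypothetical name, and pickle replaces joblib so the block runs with the standard library only.

```python
import base64
import io
import json
import pickle


def unpack_gam_ai(package_text):
    """Return (metadata, model) from JSON text in the two-section layout.

    Hypothetical helper for illustration; pickle stands in for joblib.load.
    """
    package = json.loads(package_text)
    raw_bytes = base64.b64decode(package["model_data"])  # undo the Base64 step
    model = pickle.load(io.BytesIO(raw_bytes))  # undo the serialization step
    return package["metadata"], model


# Build a tiny package in memory so the example runs on its own.
original = {"coef": [0.5, -1.2]}
text = json.dumps({
    "metadata": {"model_name": "toy_v1"},
    "model_data": base64.b64encode(pickle.dumps(original)).decode("ascii"),
})
metadata, restored = unpack_gam_ai(text)
```

Since joblib deserialization is pickle-based and can execute code embedded in the payload, this step is best applied only to files from a trusted source.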

save(save_dir='models')[source]

Serialize and save the current model and metadata as a .gam_ai package.

Parameters: save_dir (str, optional, default="models") – The target directory to save the .gam_ai file. The directory will be created if it does not exist.
Raises: ValueError – If no model (ml_model) is attached to the current instance.

Notes

The model is serialized via joblib and encoded with Base64 to ensure JSON compatibility. Because the resulting file is plain text, it can be shared or uploaded to repositories without the corruption risk that raw binary files face in text-only channels.
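The documented save behavior (directory creation, a file named after model_name, and a ValueError when no model is attached) can be approximated as below. The function save_package and its arguments are hypothetical, and pickle again stands in for joblib; this is not the library's own code.

```python
import base64
import json
import os
import pickle
import tempfile


def save_package(model_name, metadata, model, save_dir="models"):
    """Write <save_dir>/<model_name>.gam_ai and return its path (illustrative)."""
    if model is None:
        raise ValueError("No model attached; nothing to save.")
    os.makedirs(save_dir, exist_ok=True)  # create the directory if missing
    path = os.path.join(save_dir, model_name + ".gam_ai")
    payload = {
        "metadata": metadata,
        # pickle is a stand-in for joblib in this sketch
        "model_data": base64.b64encode(pickle.dumps(model)).decode("ascii"),
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return path


# Save into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target_dir = os.path.join(tmp, "models")
    saved_path = save_package("toy_v1", {"author_name": "Jane Doe"}, {"w": 1.0}, target_dir)
    saved_exists = os.path.exists(saved_path)
    saved_name = os.path.basename(saved_path)

# Saving without a model fails before anything is written.
try:
    save_package("empty", {}, None)
    missing_model_error = None
except ValueError as exc:
    missing_model_error = str(exc)
```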

class PyGamLab.ai_core.Gam_Ai_Workflow(model_name, base_dir=None)[source]

Bases: object

Intelligent, type-aware workflow manager for .gam_ai model packages.

This class provides a unified interface for loading, evaluating, refitting, and visualizing machine learning models saved in the .gam_ai format. It automatically detects the model type (classifier, regressor, or unsupervised) and routes the evaluation pipeline accordingly.

Parameters:

model_name (str) – The name (without extension) of the .gam_ai file to be loaded.
base_dir (str, optional, default=None) – Directory containing saved .gam_ai model files. When omitted, the documented default directory is "gam_models".

Raises:

FileNotFoundError – If the specified .gam_ai file cannot be located within base_dir.

Notes

This class depends on the GamAI_io handler for deserializing .gam_ai files. Each .gam_ai file contains both model metadata and a serialized model object. Once loaded, Gam_Ai_Workflow provides:

Smart evaluation (evaluate())
Visualization (visualize_unsupervised())
Refit capabilities (refit())
Summaries (summary())

Examples

>>> workflow = Gam_Ai_Workflow("rf_classifier_v1", base_dir="models")
✅ Loaded model 'rf_classifier_v1' (classifier) successfully.
>>>
>>> workflow.summary()
📘 MODEL SUMMARY
model_name: rf_classifier_v1
model_type: classifier
author_name: John Doe
best_accuracy: 0.94
...
>>>
>>> workflow.evaluate(X_test, y_test)
🎯 Accuracy: 0.9470
📊 Classification Report:
...

evaluate(X, y_true=None)[source]

Automatically dispatch model evaluation based on its declared type.

Parameters:

X (array-like) – Input data.
y_true (array-like, optional) – Ground-truth labels or target values (required for supervised models).

Raises:

ValueError – If the model type is not recognized.

Notes

The evaluation routine is chosen from the model's declared type:

evaluate_classifier() for classification models
evaluate_regressor() for regression models
visualize_unsupervised() for unsupervised models
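One common way to implement this kind of type-based routing is a lookup table, sketched here. The function dispatch_evaluation and the lambda handlers are hypothetical stand-ins; how Gam_Ai_Workflow performs the dispatch internally is not stated in this reference.

```python
def dispatch_evaluation(model_type, handlers):
    """Return the handler registered for model_type, or raise ValueError."""
    try:
        return handlers[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model type: {model_type!r}") from None


# Each handler just reports which routine would run (stand-ins for the methods).
handlers = {
    "classifier": lambda X, y_true: "evaluate_classifier",
    "regressor": lambda X, y_true: "evaluate_regressor",
    "unsupervised": lambda X, y_true=None: "visualize_unsupervised",
}

chosen = dispatch_evaluation("regressor", handlers)([[0.0]], [1.0])

# An unknown type surfaces as ValueError, matching the documented behavior.
try:
    dispatch_evaluation("ensemble", handlers)
    unknown_error = None
except ValueError as exc:
    unknown_error = str(exc)
```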

Examples

>>> workflow.evaluate(X_test, y_test)
🎯 Accuracy: 0.9470
📊 Classification Report:
...

evaluate_classifier(X, y_true)[source]

Evaluate a classification model and display key performance metrics.

Parameters:

X (array-like) – Input test features.
y_true (array-like) – Ground-truth class labels.

Return type:

None

Notes

This method computes and displays:

Accuracy score
Classification report
Confusion matrix plot
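To make the first and third quantities concrete, the sketch below computes accuracy and confusion counts in plain Python. The helper names and the phase labels are invented for illustration; the library presumably relies on an existing metrics package rather than code like this.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Map (true_label, predicted_label) to how often that pair occurs."""
    return Counter(zip(y_true, y_pred))


# Made-up crystal-structure labels, purely for illustration.
y_true = ["fcc", "bcc", "fcc", "hcp", "bcc"]
y_pred = ["fcc", "bcc", "bcc", "hcp", "bcc"]

acc = accuracy(y_true, y_pred)
counts = confusion_counts(y_true, y_pred)
print(acc)                     # 4 of 5 predictions are correct
print(counts[("fcc", "bcc")])  # one fcc sample was predicted as bcc
```

The off-diagonal entries, such as ("fcc", "bcc"), are the cells a confusion matrix plot highlights as misclassifications.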

evaluate_regressor(X, y_true)[source]

Evaluate a regression model and visualize performance trends.

Parameters:

X (array-like) – Input test features.
y_true (array-like) – True continuous target values.

Return type:

None

Notes

This method prints and plots:

Coefficient of determination (R²)
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Predicted vs. True value scatter plot
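The three scalar metrics follow standard definitions, written out below in plain Python. The function regression_metrics is a hypothetical helper and the numbers are toy data, not results from any PyGamLab model.

```python
def regression_metrics(y_true, y_pred):
    """Return R2, MAE, and MSE using their standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MAE": mae, "MSE": mse}


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

For this toy data the residuals are -0.5, 0, 0.5, and 0, giving MAE = 0.25, MSE = 0.125, and R² = 1 - 0.5/5 = 0.9. The R² formula is undefined when all true values are identical, since the total sum of squares is then zero.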

get_GAM_AI_MODEL()[source]

Retrieve the fully loaded .gam_ai model instance associated with this workflow.

Returns: The GAM_AI_MODEL object currently managed by this workflow instance. This object encapsulates both the model’s metadata (e.g., author info, training details, performance metrics) and the deserialized scikit-learn model accessible via the attribute ml_model.
Return type: GAM_AI_MODEL

Examples

>>> workflow = Gam_Ai_Workflow("cu-nanocomposites-poisson-ratio-lr")
✅ Loaded model 'cu-nanocomposites-poisson-ratio-lr' (train/test) successfully.

>>> gam_model = workflow.get_GAM_AI_MODEL()
>>> type(gam_model)
<class 'PyGamLab.ai_core.gam_ai.GAM_AI_MODEL'>

>>> gam_model.summary()
📘 MODEL METADATA SUMMARY
model_name: cu-nanocomposites-poisson-ratio-lr
author_name: Shaoyu Zhao, Yingyan Zhang, Yihe Zhang et al.
best_accuracy: {'MAE': 0.0541, 'MSE': 0.0042, 'R2': 0.39}
⚙️ ML Model: <class 'sklearn.linear_model._base.LinearRegression'>

Notes

This method serves as a safe accessor for the underlying GAM_AI_MODEL instance (self.gam) loaded during Gam_Ai_Workflow initialization. It can be used to directly inspect model metadata, retrieve the raw ML model, or perform low-level analysis without invoking higher-level workflow methods.

predict(X)[source]

Run model inference on input data.

Parameters: X (array-like) – Input features compatible with the trained model.
Returns: Model predictions corresponding to X.
Return type: np.ndarray
Raises: ValueError – If no model is loaded.

refit(X, y)[source]

Retrain (refit) the loaded model on new data.

Parameters:

X (array-like) – Training features.
y (array-like) – Corresponding training labels or targets.

Raises:

NotImplementedError – If the model does not support the .fit() method.

Notes

This method modifies the current model in place and does not automatically update the .gam_ai file on disk. To persist changes, re-save the model using GamAI_io.save() after refitting.
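The in-place semantics can be illustrated with a toy model. Both MeanRegressor and refit_in_place are hypothetical and exist only for this sketch; the point is that the same object is retrained and nothing is written to disk.

```python
class MeanRegressor:
    """Toy model: predicts the mean of the targets seen in fit()."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def refit_in_place(model, X, y):
    """Retrain the given object; raise if it has no fit() method."""
    if not hasattr(model, "fit"):
        raise NotImplementedError("Model does not support fit().")
    model.fit(X, y)
    return model


model = MeanRegressor().fit([[0], [1]], [2.0, 4.0])
same_object = refit_in_place(model, [[0], [1], [2]], [10.0, 20.0, 30.0])

# An object without fit() is rejected, mirroring the documented error.
try:
    refit_in_place(object(), [[0]], [1.0])
    refit_error = None
except NotImplementedError as exc:
    refit_error = str(exc)
```

After the call, model and same_object are the same instance with updated state, so any file saved earlier still holds the old parameters until the package is saved again.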

summary()[source]

Display a concise summary of the loaded model and its metadata.

Prints all metadata fields stored in the .gam_ai file, including model type, author information, and hyperparameters.

Return type: None

visualize_unsupervised(X)[source]

Visualize cluster assignments or feature transformations for unsupervised models.

Parameters: X (array-like of shape (n_samples, n_features)) – Input data to visualize.
Return type: None
Raises: NotImplementedError – If the model lacks both predict() and transform() methods.

Notes

If the model has a .predict() method, cluster assignments are plotted.
If the model has a .transform() method, the transformed feature space is shown.
This visualization assumes that the first two components or features are suitable for 2D projection.
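The selection logic in these notes might be sketched as follows, leaving out the plotting itself. The names projection_for_plot and ThresholdClusterer are hypothetical, and checking predict() before transform() is an assumption; the reference does not say which takes precedence when a model offers both.

```python
def projection_for_plot(model, X):
    """Return (mode, 2D points, labels) describing what a scatter plot would show."""
    if hasattr(model, "predict"):  # assumed to take precedence over transform()
        labels = list(model.predict(X))
        points = [tuple(row[:2]) for row in X]  # first two input features
        return "clusters", points, labels
    if hasattr(model, "transform"):
        points = [tuple(row[:2]) for row in model.transform(X)]  # first two components
        return "transform", points, None
    raise NotImplementedError("Model has neither predict() nor transform().")


class ThresholdClusterer:
    """Toy clusterer: label 1 if the first feature exceeds 0.5, else 0."""

    def predict(self, X):
        return [int(row[0] > 0.5) for row in X]


mode, points, labels = projection_for_plot(
    ThresholdClusterer(), [[0.1, 2.0, 9.0], [0.9, 3.0, 9.0]]
)
```

Only the first two columns survive the projection, so the third feature in this toy input is ignored, which is the limitation the last note points to.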