CHANGELOG
Next: API changes with the large backends refactoring!
API Changes: backends and related packages moved to github.com/gomlx/compute
- Packages `backends`, `dtypes`, `shapes` and `distributed` moved to `github.com/gomlx/compute`:
  - The new `github.com/gomlx/compute` repo will host what was in package `backends`.
  - The `backends/simplego` backend moved to `github.com/gomlx/compute/gobackend` (no longer “simple”).
  - The `backends/xla` backend moved to `github.com/gomlx/go-xla/compute/xla`.
  - `backends/default` still imports both the “go” and the “xla” backends by default.
- XLA backends are used for tests in `support/testutil` only where the platform is supported (and if the tag `noxla` is NOT selected).
- New `tensors/dtensor` to hold distributed tensors (previously in `pkg/core/distributed`).
v0.27.3: Improved sub-byte support, optimized raw-byte data transfers, FNN ensembles, and major transformer updates including BERT and Gemma models.
Core:
- Package `tensors`:
  - Improved support for sub-byte data types (`Int4`, `Int2`, `Uint4`, `Uint2`).
  - Added `FromShapeForBackend()` to create new tensors with shared memory if possible.
  - Added `ToDevice()` and improved the performance of `MaterializeOnDevice` when copying to shared buffers.
  - Improved `MutableBytes()`, which should improve performance in some cases.
  - Added `FromRaw()` to upload a tensor to a backend with the given raw data – the most performant way to upload raw data to the backend accelerator.
- Package `dtypes`:
  - Added several slice creation/copying/casting functions for dtypes, using a switch on types instead of the slower `reflect` package. Incorporated that into the `tensors` implementation, and also exposed it publicly (the `go-huggingface` project uses it).
  - Deprecated `dtypes.Memory()`.
- Package `shapes`:
  - Added the `ByteSize()` method, with proper support for packed dtypes.
  - Deprecated `shapes.Memory()`.
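For packed sub-byte dtypes the byte size is no longer simply element count times dtype size, since several elements share a byte. A stdlib-only sketch of the arithmetic (illustrative only; the function name is an assumption, not the actual `Shape.ByteSize()` implementation):

```go
package main

import "fmt"

// byteSizeForPacked returns the storage in bytes for n elements of a
// dtype using `bits` bits per element (e.g. 4 for Int4/Uint4, 2 for
// Int2/Uint2). Elements are packed, rounding up to whole bytes.
// Illustrative sketch, not the GoMLX implementation.
func byteSizeForPacked(numElements, bits int) int {
	return (numElements*bits + 7) / 8 // ceiling division
}

func main() {
	fmt.Println(byteSizeForPacked(10, 4)) // 5 bytes for 10 Int4 values
	fmt.Println(byteSizeForPacked(10, 2)) // 3 bytes for 10 Int2 values
	fmt.Println(byteSizeForPacked(3, 4))  // 2 bytes: odd count rounds up
}
```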
- Package `graph`:
  - Added the `CrossCosineSimilarity()` operation.
  - `Graph.Compile()` now returns an error instead of panicking. API change: it’s not a common way to compile graphs (in GoMLX it’s only used in tests), but it requires a change for anyone calling it directly.
Packages in pkg/ml
- Package `activations`:
  - Added the `HardSigmoid` activation.
  - Modified the parametrized activations to be suffixed with `With`: `LeakyReluWith`, `HardSigmoidWith`, `HardSwishWith`.
- Package `attention`:
  - API cleanup.
  - Added `WithMask`, a simplified mask input for when only padding is used.
  - Dropout now takes a `*Node` instead of a `float64`, allowing dynamic dropout control.
  - Fixed mask and causal mask handling.
- Package `attention/pos`:
  - Split the `Encoder` API into `QKEncoder` and `PreEncoder`, to support different types of positional encoders.
  - Added `WithSlidingWindow` to support “sliding attention” (the slow way).
- Package `layers/fnn`:
  - Added `WithEnsembleSize` and `WithEnsembleAxis` methods for configuring parallel independent executions via ensembles.
- Package `transformers`:
  - Updates and fixes to the API; added methods to build partial models: `AllLayers`, `ForwardLayer`, `LogitsFromEmbeddings`, `EmbedTokens`, etc.
  - Updated positional-encoder support.
  - Added options `WithFinalNormalization`, `WithScalingOfTokenEmbeddings`, `WithArchitecture`, `WithSlidingWindow`, `WithLayerTypes`, `WithEmbedNormalization`, `WithTokenTypeEmbedding`.
  - Added BERT and Gemma types of models.
Packages in pkg/support
- Added `humanize` package:
  - It includes `Bytes()`, `Count()`, `Speed()`, `Underscores()` and `Duration()`.
  - Replaces `fsutil.BytesETC` and `dustin/go-humanize`.
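The exact signatures of the new `humanize` package are not shown in this entry; the following is a stdlib-only sketch of the kind of byte formatting such a helper provides (the function name and output format are illustrative assumptions, not the GoMLX API):

```go
package main

import "fmt"

// humanBytes formats a byte count with binary prefixes, similar in
// spirit to a humanize.Bytes() helper. Illustrative sketch only; not
// the actual pkg/support/humanize implementation.
func humanBytes(n uint64) string {
	const unit = 1024
	if n < unit {
		return fmt.Sprintf("%d B", n)
	}
	div, exp := uint64(unit), 0
	for m := n / unit; m >= unit; m /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %ciB", float64(n)/float64(div), "KMGTPE"[exp])
}

func main() {
	fmt.Println(humanBytes(512))      // 512 B
	fmt.Println(humanBytes(1536))     // 1.5 KiB
	fmt.Println(humanBytes(22 << 30)) // 22.0 GiB
}
```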
Backends:
- Package `xla`: `BufferToFlatData` and `BufferFromFlatData` now transfer using raw bytes, for a slight gain in performance.
v0.27.2: DotGeneral with AccumulatorDType; transformer architecture parameter.
Core:
- Package `graph`:
  - `DotGeneral` now passes `AccumulatorDType` and `OutputDType` to the backend (instead of assuming it doesn’t implement them and converting). Also, by default, half-precision floats use float32 as accumulator.
  - For the `xla` backend: added a “hacky” dependency from the variable (weights) to the lhs operand of the `DotGeneral` operation: because XLA CPU creates a temporary re-layout of the weights, this dependency ensures that only one temporary buffer is allocated at a time along the layers of a model (in a 22GB model with 48 layers, it saved 48GB of temporary memory!).
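Why the accumulator dtype matters: when summing many products in a narrow float type, small addends can be absorbed entirely by a large running sum. A stdlib-only illustration of the effect using float32 vs. float64 (Go has no native float16, so this only demonstrates the general phenomenon, not GoMLX’s code path):

```go
package main

import "fmt"

func main() {
	// At 1<<24 the spacing (ulp) between consecutive float32 values is 2,
	// so adding 1 to the running sum is rounded away entirely.
	big := float32(1 << 24) // 16777216
	fmt.Println(big+1 == big) // true: the +1 is absorbed in float32

	// Accumulating in a wider type keeps the small contribution:
	fmt.Println(float64(big)+1 == float64(big)) // false: float64 keeps it
}
```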
- Package `ml/model/transformer`:
  - Added an architecture parameter (“standard” or “gemma” values).
  - Activation is now passed as an `activations.Type` value (instead of a string) – but conversion from string as a context hyperparameter still works.
  - Added `WithTransposedWeights()` and `WithCausalMask()` options.
  - Simplified code.
- Package `ml/layers`:
  - Added constants for normalization types.
Backends:
- Backend `xla`:
  - Updated dependency to `github.com/gomlx/go-xla` v0.2.2, with a fix to the NVIDIA CUDA drivers path.
  - `DotGeneral` with unsupported accumulation dtypes (only float32 is supported): it automatically converts the input dtype to the accumulation dtype first.
  - Added executable memory-consumption logging when passing `-vmodule=executable=1`.
  - Added the `OptimizationBarrier` operation (not exposed in `graph` though).
  - Added a “hack” dependency from the weights of a `DotGeneral` operation to its lhs operand, to hugely decrease temporary memory usage. See the issue in https://github.com/openxla/stablehlo/issues/2923.
- Backend `simplego` (“go”):
  - `DotGeneral` with accumulation dtypes: it automatically converts the input dtype to the accumulation dtype first (with the exception of half-precision types, which use float32 as accumulator by default).
v0.27.1: Minor fixes and updates; ONNX-GoMLX examples now use v0.4.1.
- Package `backends`:
  - Added the `QuantGGML` quantization scheme with the `GGMLQuantType` enum for native GGML block formats (Q4_0, Q8_0, IQ4_NL, Q4_K, Q6_K).
  - Added `IQ4NLLookupTable` for IQ4_NL non-linear dequantization.
  - Added `QuantizedEmbeddingLookup` to the `FusedOps` interface for quantized embedding lookups.
  - Added `ShiftLeft`, `ShiftRightArithmetic`, `ShiftRightLogical` operations.
- Package `examples/...`:
  - Updated `gemma3`, `mxbai-rerank` and `bert-base-ner` to use the new `onnx-gomlx` v0.4.1 API; bumped the dependency.
- Package `graph`:
  - `Floor` and `Ceil` operations are now the identity for integer dtypes.
  - Added the `BackendQuantizedEmbeddingLookup` graph-level op.
  - Added `LogicalShiftLeft`, `LogicalShiftRight` ops for sub-byte unpacking.
- Package `nn`:
  - Added the `QuantizedGather` layer for quantized embedding lookups with automatic fallback.
- Package `simplego`:
  - Removed panics during execution: errors are returned instead.
  - Fixed missing annotation/stacktrace on not-implemented errors.
  - Implemented the `Pad()` operation (and added more tests in `graph.TestPad`).
  - Added `FusedQuantizedDense` support for GGML-quantized weights (Q4_0, Q8_0, IQ4_NL, Q4_K, Q6_K).
  - Added `QuantizedEmbeddingLookup` for quantized embedding lookups with on-the-fly dequantization.
  - Added shift-operation executors.
  - Fixed `execBitcast` buffer reuse for cross-bit-width types (e.g. Uint8 ↔ Float16). See #374.
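For reference, the GGML Q4_0 format these fused ops consume stores 32 weights per block: a per-block scale plus 16 bytes of packed 4-bit quants, each decoding to `(q - 8) * scale`. A hedged stdlib-only sketch of the dequantization (the on-disk format stores the scale as a float16 and the nibble interleaving follows the common GGML layout; both are simplified/assumed here and may differ from GoMLX’s internal code):

```go
package main

import "fmt"

// dequantQ4_0Block expands one Q4_0-style block: 16 bytes of packed
// 4-bit quants plus a per-block scale. Following the usual GGML layout,
// the low nibble of byte j maps to element j and the high nibble to
// element j+16; each quant q decodes to (q-8)*scale.
// Sketch only: the real format stores the scale as float16.
func dequantQ4_0Block(qs [16]byte, scale float32) [32]float32 {
	var out [32]float32
	for j := 0; j < 16; j++ {
		lo := int(qs[j] & 0x0F)
		hi := int(qs[j] >> 4)
		out[j] = float32(lo-8) * scale
		out[j+16] = float32(hi-8) * scale
	}
	return out
}

func main() {
	var qs [16]byte
	qs[0] = 0xC8 // low nibble 8 (-> 0), high nibble 12 (-> 4*scale)
	w := dequantQ4_0Block(qs, 0.5)
	fmt.Println(w[0], w[16]) // 0 2
}
```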
- Package `ggml`:
  - Added `dense.go`, `dequant.go`, `gather.go` for GGML model-weight handling.
- Package `xla`:
  - Changed `TF_CPP_MIN_LOG_LEVEL` to default to 3. See https://github.com/openxla/xla/issues/26466.
v0.27.0: Graph functions; Improved Go backend (fusion ops); Quantization dtypes; more ML layers & fixes
- Package `backends`: major refactoring to add support for functions/closures.
  - Added `backends.Function`, which now holds all the “ops” methods.
  - Added `NewFunction`, `Closure` and `Call`.
  - Renamed `backends.Op` -> `backends.Value`.
  - Added `FusedOps`, allowing backends to expose fused (more efficient) operations – with proper/automatic fallback to decomposed operations when not supported or for gradients.
  - Added the `ErrNotImplemented` error and the `IsNotImplemented(err)` function.
  - Added the `Quantization` struct, `QuantizationScheme` (Linear, NF4), and `NF4LookupTable`.
  - Removed the `Dot()` operation (redundant with `DotGeneral`). `DotGeneral()` now takes a `DotGeneralConfig` struct, with options for setting the accumulator and output dtypes.
- Package `simplego`:
  - Added `Float16` support (thx @timkaye11).
  - Added dedup of computation nodes (aka “common subexpression elimination”, CSE) (thx @timkaye11, @janpfeifer): ~6% speedup for the CSI-Adult demo training.
  - `DotGeneral`: pre-blocking of the blocked path, which may lead to deduplication of blocking nodes (thx @timkaye11).
  - `DotGeneral`: added a small-MatMul execution path, optimized for small matrix multiplications (thx @timkaye11).
  - Experimental `packgemm` support leveraging SIMD operations (thx @ajroetker, @janpfeifer).
  - Functions/closures support (thx @ajroetker).
  - Added the `Reverse` operation.
  - Added fused operations: `FusedGelu`, `FusedDense`, `FusedSoftmax`, `FusedLayerNorm`, `FusedScaledDotProductAttention`, `FusedAttentionQKVProjection`.
  - Added `FusedQuantizedDense`: fused dequantization + matmul + bias + activation for Int4/Int8 weights with Linear and NF4 quantization schemes, block-wise scales, and optional zero points.
  - `FusedScaledDotProductAttention`: added a `ScaledDotProductAttentionConfig` options struct with a `QuantizedMatmuls` flag for optional uint8 quantized Q@K/attn@V matmuls (awaiting a go-highway release for actual acceleration).
  - `Bitcast` refactored to pure bit reinterpretation; sub-byte unpacking moved to `ConvertDType`.
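The node dedup (“common subexpression elimination”) mentioned above can be sketched as canonicalizing each node to a key of (op, input ids) and reusing an existing node when the key repeats. A minimal illustrative version, unrelated to the actual simplego data structures:

```go
package main

import "fmt"

// node is a computation-graph node identified by its op and input ids.
type node struct {
	id     int
	op     string
	inputs []int
}

// builder dedups structurally identical nodes: two nodes with the same
// op and input ids share one entry (CSE). Illustrative sketch only.
type builder struct {
	nodes []node
	seen  map[string]int // canonical key -> node id
}

func newBuilder() *builder { return &builder{seen: map[string]int{}} }

func (b *builder) add(op string, inputs ...int) int {
	key := fmt.Sprint(op, inputs)
	if id, ok := b.seen[key]; ok {
		return id // common subexpression found: reuse the node.
	}
	id := len(b.nodes)
	b.nodes = append(b.nodes, node{id: id, op: op, inputs: inputs})
	b.seen[key] = id
	return id
}

func main() {
	b := newBuilder()
	x := b.add("param:x")
	s1 := b.add("sqrt", x)
	s2 := b.add("sqrt", x) // structurally identical to s1: deduped.
	_ = b.add("add", s1, s2)
	fmt.Println(s1 == s2, len(b.nodes)) // true 3
}
```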
- New package `bucketing`:
  - Tools to manage bucketing of tensors (or anything else) – thx @ajroetker.
- Package `dtypes`:
  - Added `Uint2`, `Uint4`, `Int2`, `Int4`.
- Package `graph`:
  - Added the `Function` concept (and support for closures) and the `Function.Call` operation.
  - Control flow: added `While` and `If` operations.
  - Order operations: added `Sort`, `SortFunc`, `TopK`, `BottomK`.
  - Fixed `Bitcast` for packed sub-byte types (`Int4`, `Int2`, `Uint4` and `Uint2`), so they can be “bitcast” back and forth from/to `uint8` (bytes), to ease quantization.
  - Added the `Atan2` function.
  - Added test helper functions to test various backends at once.
  - Fixed `Gather` validation of `indexVectorAxis` to check against the `startIndices` rank instead of the `operand` rank.
  - `Exec` graph compilation is now concurrent, avoiding redundant compilations for the same graph shape.
- Package `ml/layers/attention`: improved `MultiHeadAttention`; added `KVCache` support.
  - Added Grouped Query Attention (GQA) support.
  - Added the `UseQKVProjection()` option for fused Q/K/V Dense projection.
  - `FusedScaledDotProductAttention` now supports boolean masks.
- Package `ml/layers/attention/pos`: added the `PositionalEncoder` interface, and a “RoPE” (Rotary Positional Encoding) implementation.
- Package `ml/models/transformers`:
  - Added a `Transformer` “model”: a collection of transformer layers set up based on a given configuration.
- Package `ml/decode`: added a `Decoder` object to generate text given a sequential model.
- Package `ml/decode/sample`:
  - Added implementations of various sampling strategies (greedy, temperature, beam-search, top-k, top-p, etc.), used by the `decode` package.
- Package `ml/layers/activations`:
  - Added `HardSwish`.
- Package `examples`:
  - Separated into its own sub-module, to isolate its dependencies.
  - Added `gpt2`: a simple GPT-2 implementation using the new transformers and decode packages. It downloads the model from HuggingFace.
  - Added `textgen`: a minimal transformer text-generation model that can be trained.
  - Added `gemma3`: a simple Gemma 3 implementation using the `onnx-gomlx` package to convert the model, and `go-huggingface` to download the model and run the tokenizer.
  - Added `mxbai-rerank`: a cross-encoder reranking example using the MixedBread Reranker v1. It uses the `onnx-gomlx` package to convert the model, and `go-huggingface` to download the model and run the tokenizer.
  - Added `BERT-base-NER`: a BERT-base model fine-tuned for Named Entity Recognition.
- Bumped GitHub actions versions to the new “Node24” ones.
v0.26.0: Using the new github.com/gomlx/go-xla library. Added linux/arm64 and windows/amd64 support for XLA CPU.
API Change: dtypes package moved from github.com/gomlx/gopjrt/dtypes to github.com/gomlx/gomlx/pkg/core/dtypes.
It should require only a simple import-path change.
XLA:
- go-xla (replacing the now-deprecated stablehlo and gopjrt libraries):
  - Added auto-installation of standard plugins (CPU, and GPU/TPU when available). Can be disabled by setting the environment variable `GOMLX_NO_AUTO_INSTALL` to anything.
  - Fixed some memory leaks on plugin destruction.
  - Improved performance in some low-latency scenarios (using GenPool as opposed to sync.Pool).
- Removed the old `gomlx/backends/xla` (the one that used the retired `xlabuilder` API for XLA).
- Renamed `gomlx/backends/stablehlo` -> `gomlx/backends/xla`, using the new `go-xla` library.
- Added `xla.EnableAutoInstall(enabled bool)` to enable/disable auto-installation of standard plugins, and `xla.AutoInstall()` to immediately auto-install them.
- Conversion from/to the new `gomlx/pkg/core/dtypes` (and `bfloat16`) to/from the previous `dtypes` package (and the corresponding `bfloat16`).
- Added linux/arm64 and windows/amd64 support for XLA CPU.
Other updates:
- Package `tensors`:
  - Added `CopyFlatData()` that returns an error (the previous version was renamed to `MustCopyFlatData`).
- Package `graph`:
  - Added the `RNGStateFromSeedForGraph` function to create an RNG state from a seed for a graph.
- Package `pkg/core/dtypes`: new, copied from the now-deprecated Gopjrt.
- Package `simplego`: registration of executors with priority.
v0.25.0: Distributed execution; API cleanup (more Go idiomatic)
Highlights:
Distributed (cross-device) execution: with AutoSharding and SPMD strategies; also added support for “portable device” execution.
API changes: (will require simple fixes)
- Most non-graph-building APIs now return errors (as opposed to panicking). Graph-building functions still use panic to report errors – otherwise it’s too painful to express math.
- All “Rng” renamed to “RNG” – acronyms in Go are usually capitalized.
Distributed computation improvements and refactorings:
- Package `graph`:
  - Fixed/improved documentation.
  - Added `IsNegative`, `IsPositive`, `IsNonNegative`, `IsNonPositive`.
  - Added `SubScalar` and tests for the `*Scalar` functions.
  - Added `Graph.WithDistributedStrategy`, `Graph.WithDeviceMesh`, `Graph.DeviceMesh` and `Graph.NumDevices`.
  - Added `Graph.Distributed()` with “collective” (across devices) operations (like `AllReduce`).
  - Renamed: s/`Exec.InDevice`/`Exec.WithDevice`; s/`Exec.SetName`/`Exec.WithName`.
  - Added `RunOnDevice`.
  - Added `Exec.AutoSharding` and `Exec.SPMD`.
- Package `context`:
  - Added `context.MustGetParam[T](ctx, key)` and `context.MustGetGraphParam[T](ctx, graph, key)`.
  - Added `Exec.AutoSharding` and `Exec.SPMD`.
  - Added `Variable.DistributedValue` and `Variable.SetDistributedValue`.
- Package `train`:
  - Added `train.DistributedDataset` and `train.BaseDataset`.
  - `Dataset.Reset` now returns an error.
  - `Trainer.TrainStep`, `Trainer.EvalStep` and `Trainer.Eval` now return errors as opposed to panicking.
  - Added `Trainer.WithDeviceAssignment`.
  - Added `Trainer.DistributedTrainStep`, `Trainer.DistributedEvalStep` and `Trainer.DistributedEval`.
- Package `datasets`:
  - Added `datasets.DistributedAccumulator`: converts a normal `Dataset` into a `DistributedDataset`.
  - Added `datasets.OnDevice`: pre-uploads data to devices.
- Package `backend`:
  - Added `Backend.CopyToDevice`.
  - `Builder.Parameter()` now takes an optional `ShardingSpec` for sharded inputs.
  - Added ops: `AllReduce`.
  - `Backend.NumDevices()` now returns an int.
- Package `backends/notimplemented`:
  - Added a dummy `Backend` that can be used to easily mock backends.
- Package `pkg/core/distributed`: added `DeviceMesh`, `ShardSpec` and `dtensor.Tensor` objects.
- Package `pkg/core/tensors`:
  - Added `Tensor.CheckValid()`, `Tensor.Device()`, `Tensor.Backend()`.
  - Changed it to return errors (as opposed to panicking) where possible.
Other improvements:
- Package `simplego`:
  - Cleanups and improvements: thanks to @wunderbarb!
  - Fixed the issue of not handling the default value of the `donate` parameter in the `Execute` method.
- Package `cosineschedule`:
  - Added `WarmUpSteps` and `NumCycles` hyperparameters; removed the overloading of `periodSteps`.
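A cosine schedule with warm-up typically ramps the learning rate linearly for the first warm-up steps, then follows a cosine decay over each cycle. A hedged stdlib sketch of the arithmetic (parameter and function names are illustrative assumptions, not the package’s hyperparameter keys):

```go
package main

import (
	"fmt"
	"math"
)

// cosineLR returns the learning rate at `step`: a linear warm-up for the
// first warmUpSteps, then a cosine decay from baseLR to 0 over each of
// numCycles cycles spread across the remaining steps. Illustrative sketch.
func cosineLR(step, warmUpSteps, totalSteps, numCycles int, baseLR float64) float64 {
	if step < warmUpSteps {
		return baseLR * float64(step) / float64(warmUpSteps)
	}
	cycleLen := (totalSteps - warmUpSteps) / numCycles
	pos := (step - warmUpSteps) % cycleLen
	frac := float64(pos) / float64(cycleLen)
	return baseLR * 0.5 * (1 + math.Cos(math.Pi*frac))
}

func main() {
	// Half-way through warm-up: half the base learning rate.
	fmt.Printf("%.3f\n", cosineLR(50, 100, 1100, 1, 0.1)) // 0.050
	// Right after warm-up: the full base learning rate.
	fmt.Printf("%.3f\n", cosineLR(100, 100, 1100, 1, 0.1)) // 0.100
	// Half-way through the cycle: cos(pi/2) ~ 0, so half the base rate.
	fmt.Printf("%.3f\n", cosineLR(600, 100, 1100, 1, 0.1)) // 0.050
}
```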
- Added sponsorship badge and section to README.md; also added `FUNDING.yml` pointing to sponsorship.
- Added `.golangci.yml` and fixed many lint warnings (still a long way to go).
- GitHub actions (workflows):
  - Renamed tests to “Linux” and “Darwin.”
  - Updated badges in README.md.
- Updated dependency to Gopjrt v0.8.5, fixing xlabuilder for new C compilers.
- Removed `ui/fyneui`: it was incomplete, and it would be better offered as a separate package to avoid the dependencies.
- Package `graph`:
  - Added a negative and out-of-bounds indices test for `Gather`.
- Package `simplego`:
  - Partially fixed a race condition where the executable was finalized during execution, causing crashes – thanks @ajroetker!
v0.24.1: 2025/10/23 Adding Darwin (Mac) support for CPU PJRT plugin
- Updated dependency to Gopjrt v0.8.4: added macOS (darwin/arm64) support and cpu PJRT plugin.
- Include `stablehlo` (== `xla`) by default for macOS (Darwin).
- GitHub actions:
  - Added macOS tests.
  - Removed unnecessary `apt install` of packages.
v0.24.0: 2025/10/21 API change: package tree restructured under pkg, Exec normalization; Backend xla now provided by stablehlo
- Highlights of this release:
  - Deprecating the old “xla” backend (now called “oldxla”) in favor of “stablehlo” (also aliased to “xla”): in most cases nothing needs to be done (`github.com/gomlx/gomlx/backends/default` will replace one by the other automatically), but special cases may require small changes.
  - Large refactoring: exported GoMLX packages moved under `/pkg`. The following changes:
    - This requires changes to the import paths: core packages (`tensors`, `shapes` and `graph`) are under `pkg/core`; machine learning packages (`context`, `layers`, `train`, `datasets`, …) are under `pkg/ml`; supporting packages (`fsutil`, `sets`, `xslices`, `xsync`) are under `pkg/support`.
    - Normalized `graph.Exec` and `context.Exec`, slightly changing the API: the `Exec.Exec...` methods now return an error, and the `Exec.MustExec...` methods panic (replacing the old `Exec.Call` format); `graph.NewExec` and `context.NewExec` return errors, and `graph.MustNewExec` and `context.MustNewExec` panic.
    - File utilities under the old `ml/data` are now under `pkg/support/fsutil`, and the package `ml/data` itself was renamed `pkg/ml/datasets` and now only holds the various dataset types.
    - Packages that were not moved:
      - The `backends` package: it will move to its own repository later in the year (or early 2026).
      - The `ui` and `examples` packages: since they are just extras, we keep them where they are for now. The core GoMLX doesn’t depend on them, so we are more lax with their external dependencies.
- Copied the external trivial `must` and `exceptions` packages to `/internal/...`, to remove external dependencies.
- Package `xla` (the old one): now DEPRECATED and called `oldxla`. The package `stablehlo` replaces it, including aliasing the `xla` backend name.
  - The old version is now registered as backend “oldxla”.
  - Only included in `github.com/gomlx/gomlx/backends/default` if compiled with the tag `oldxla`.
- Package `stablehlo`:
  - Now completely replaces `xla` by default. Using `GOMLX_BACKEND=xla` will actually use the `stablehlo` backend.
  - Added `github.com/gomlx/gomlx/backends/stablehlo/cpu/dynamic` and `github.com/gomlx/gomlx/backends/stablehlo/cpu/static` to optionally force dynamic/static linking of the CPU PJRT plugin.
  - Disabled XLA logs by default by setting `TF_CPP_MIN_LOG_LEVEL` to 2 (errors level), if it is not already set.
- Package `graph`:
  - `NewExec`, `NewExecAny`, `Exec`, `ExecOnce` and `ExecOnceN` now return an error on failure. `MustNewExec`, `MustNewExecAny`, `MustExec`, `MustExecOnce` and `MustExecOnceN` panic on failure.
  - Introduced `Exec[1-4]` and `MustExec[1-4]` to execute the graph and return exactly 1 to 4 values.
  - If no seeds are given, initialize new random number generators with a cryptographically secure seed, on OSes that provide one.
  - Improved `Exec` tests.
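The `Exec.../MustExec...` split follows a common Go pattern: the error-returning variant for library code, and a `Must` wrapper that panics for tests and quick scripts. A generic stdlib sketch of the pattern itself (not GoMLX code; `compile` is a hypothetical stand-in):

```go
package main

import (
	"errors"
	"fmt"
)

// must converts a (value, error) pair into a value, panicking on error.
// This mirrors the relationship between MustNewExec/MustExec and the
// error-returning NewExec/Exec variants.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

// compile is a hypothetical stand-in for an error-returning constructor.
func compile(ok bool) (string, error) {
	if !ok {
		return "", errors.New("compile failed")
	}
	return "executable", nil
}

func main() {
	exe := must(compile(true)) // panics instead of returning the error
	fmt.Println(exe)           // executable
}
```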
- Package `context`:
  - `NewExec`, `NewExecAny`, `Exec`, `ExecOnce` and `ExecOnceN` now return an error on failure. `MustNewExec`, `MustNewExecAny`, `MustExec`, `MustExecOnce` and `MustExecOnceN` panic on failure.
  - Introduced `Exec[1-4]` and `MustExec[1-4]` to execute the graph and return exactly 1 to 4 values.
  - Improved documentation.
- Packages `pkg/support/...`:
  - Generic supporting functionality that is not core to GoMLX, but that users may also find useful, is now better (and hopefully more definitively) organized in packages under `pkg/support/`. The following packages were moved/created:
    - `xslices`, `xmaps`, `xsync`: extensions to the corresponding standard packages.
    - `sets`: previously known as package `types`.
    - `fsutil`: file-system handling utilities, previously in `data`.
- Package `inceptionv3` moved to `examples`.
- Package `ui/commandline`: fixed the progress bar in GoNB notebooks.
- Package `kan`: fixed `PiecewiseConstant*` layers for inputs of rank 1.
- Packages `downloader` and `huggingface`: had already been deprecated for a while, now removed. See https://github.com/gomlx/go-huggingface for a replacement.
- Package `hdf5` moved under `examples/inceptionv3`, for now the only example that uses it. If you need it, please let us know: maybe we move it under support, or to https://github.com/gomlx/go-huggingface.
- Package `data` renamed to `datasets`; downloading functionality split under `examples/downloader`.
- Package `commandline`:
  - The progress bar now shows the median step duration.
- Updated and refreshed all notebooks, including the tutorial.
v0.23.2: 2025/10/01: Updated dependencies on github.com/gomlx/go-xla/stablehlo@v0.0.5 and github.com/gomlx/gopjrt@v0.8.2.
- Updated dependency to new Gopjrt v0.8.2 because of CUDA PJRT (lack of) backward compatibility issues.
- Package `stablehlo`:
  - Added support for comparison of bool values, and added corresponding tests.
  - Fixed wrong checking for `Gather` during shape inference.
v0.23.1: 2025/09/25: Small bug fixes.
- Package `backends`:
  - Removed op `Broadcast`: it was unnecessary, since `BroadcastInDim` is a superset.
- Package `graph`:
  - Backprop of `BroadcastPrefix` was not defined. Now that it uses `BroadcastInDim` instead, it works.
- Package `simplego`:
  - Only log ConvGeneral statistics on error.
v0.23.0: 2025/09/21: beta stablehlo backend release
- Package `shapes`:
  - Added `FromAnyValue`: extracts the shape from a Go value.
- New backend: `stablehlo` (or simply “hlo” for short), using https://github.com/gomlx/go-xla/stablehlo.
  - All standard binary and unary ops implemented.
  - A handful of the other standard ops also implemented.
  - If `backends/default` is compiled with `-tags=stablehlo`, it will include the `stablehlo` backend.
  - Large cleanup of generators: most no longer depend on `gopjrt/xlabuilder`.
- Package `graph`:
  - `ArgMin`, `ArgMax`:
    - Fixed `ArgMin` to accept negative axes.
    - For the `stablehlo` and `go` backends, NaNs will be deliberately selected (in line with Jax/TensorFlow/PyTorch).
  - `Clip` now uses the backend operation `Clamp`.
  - `Inverse` renamed to `Reciprocal` – `Inverse` is now a deprecated alias to `Reciprocal`.
  - Added tests for various reduce operations.
  - Added `IsNaN`.
  - Fixed `MaskedReduceMean` when the provided mask is only a prefix rank of the input.
- Package `nanlogger`:
  - `NanLogger.WithStopAtFirst` can now be used to control the default behavior of `NanLogger.Trace`.
- Package `backends`:
  - Ops are no longer auto-generated: the package is now its own source of truth (as opposed to being generated from XLA code).
  - Added `IsNaN`.
  - Many comment improvements.
  - Removed `SelectAndScatterSum`, which was wrong and is now deprecated in `gopjrt`.
- Package `train`:
  - `Loop.EveryNSteps` takes into account the current global step (as opposed to always starting the count from 0).
  - Datasets implementing `train.Dataset` can now also implement `ShortName() string` to provide a short name to be used in metrics.
- Package `losses`:
  - `MeanSquaredError`: fixed the expected weights/mask handling.
- Package `commandline`:
  - Exposed `RefreshPeriod`, controlling the frequency of command-line updates.
  - Fixed flickering of the progress bar / table of metrics.
  - Improved colors; “humanized” steps printing.
- `gomlx_checkpoints` CLI tool:
  - Added `-plot` to generate plots for all metrics. It accepts various models, so one can use it to compare models.
v0.22.1: 2025/08/22 Convolutions
(release v0.22.0 was skipped due to a bug noticed slightly after release)
- Package `backends`:
  - `ConvGeneralDilated` renamed to `ConvGeneral`.
- Package `backends/shapeinference`:
  - Added `ConvGeneralOp` to infer the output shape of a convolution.
- Package `backends/simplego`:
  - Implemented the `ConvGeneral` operation: supporting strides, padding, dilations (input and kernel), and grouping (channels or batch), as well as transposed (arbitrary axes) convolutions.
- Package `types/shapes`:
  - `Shape.Iter()` and `Shape.IterOn()` now also yield the flat index being iterated.
  - Added `Shape.Strides()` and `Shape.IterOnAxes()`.
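Row-major strides, as returned by a method like `Shape.Strides()`, give the flat-index step for each axis: the last axis has stride 1, and each earlier axis multiplies by the dimensions to its right. An illustrative stdlib computation (not the actual `shapes` implementation):

```go
package main

import "fmt"

// rowMajorStrides computes, for each axis, how far the flat index moves
// when that axis increments by one, assuming row-major layout. Sketch only.
func rowMajorStrides(dims []int) []int {
	strides := make([]int, len(dims))
	stride := 1
	for axis := len(dims) - 1; axis >= 0; axis-- {
		strides[axis] = stride
		stride *= dims[axis]
	}
	return strides
}

func main() {
	// For a [2, 3, 4] tensor: one step on axis 0 skips 3*4=12 elements.
	fmt.Println(rowMajorStrides([]int{2, 3, 4})) // [12 4 1]
}
```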
- Package `graph`:
  - Names of parameters for `ConvGeneral` were standardized to “input,” “kernel” and “channels.”
  - `ConvGeneralDilated` is now aliased to `ConvGeneral`; the former will be deprecated in a future version.
  - `ConvGeneral`: added the gradient for grouped (by channels or by batch) convolutions.
  - Fixed the shape of the kernel for the `images.ChannelFirst` configuration.
  - Added `Split`.
  - `TransposeAllDims` -> `TransposeAllAxes`.
- Package `layers`:
  - Updated the configuration names for `Convolution`, to match the standards in the `graph` package.
  - Added `ChannelGroupCount()` and `BatchGroupCount()` to the `Convolution` configuration.
- Updated to gopjrt v0.8.0, with the changes to the convolution API.
v0.21.1: 2025/08/16 Added Zero-dim tensors support and other small improvements.
- Packages `tensors` and `graph`:
  - Added support for zero-dim tensors.
- Package `backends`:
  - Method `New()` now returns an error (as opposed to panicking). The temporary `NewOrErr` was marked as deprecated; use `New` instead.
- Package `optimizers`:
  - New `AdamConfig.WithBackoffSteps()` (or the hyperparameter `adam_backoff`) that prevents gradient steps from being taken until the given number of steps has executed. This allows a better estimate (moving average) of the gradients (“momentum”) and their variances to be calculated before applying them.
  - New `optimizers.ParamAdamBeta1` and `optimizers.ParamAdamBeta2` hyperparameters to control Adam’s beta1 and beta2.
- Package `context`:
  - Added `Variable.DType()`.
  - Variable `#rngstate` marked as non-trainable during creation.
- `gomlx_checkpoints`:
  - Added `-perturb`.
  - Now has its own `go.mod`, separating its dependencies.
- Docker:
  - Included `openssh-client` (ssh) and `dlv` (Go debugger) by default.
- SimpleGo (“go”) backend:
  - Fixed mishandling of multi-output operations and a race condition on parallel execution (#197).
  - Refactoring and cleanup of execution loops.
  - Moved `TestDotGeneral_PerformanceTable` behind the build tag `perf`.
v0.21.0: 2025/07/01 Summer Edition
- Package `simplego`:
  - Added `GetBackend`, which returns a singleton backend, created with the default configuration at the first request.
- Package `ui/commandline`:
  - Added optional extra arbitrary metrics to print in the command line with `AttachProgressBar`.
  - Added `FormatDuration` to pretty-print durations.
- Package `graph`:
  - Added the gradients of `Cos` and `Sin`, which were missing.
  - Fixed (removed) the extra empty line in auto-generated function comments that was preventing the documentation from being attached to the functions.
  - Added parameters `sorted` and `unique` to `Scatter` (like the other `Scatter*` functions) – small API change.
  - Added `ScatterUpdate`, for now only for `unique=true`.
- Package `nanlogger`:
  - Allow traces that only report (without stopping).
  - Created the context parameter `optimizer.ParamNanLogger`: if set to a NanLogger, it will trace all occurrences of NaN values in the gradient: great for debugging where NaNs first appear in the model.
- Package `ml/train`:
  - Improved support for accumulated gradients. Fixed evaluation (context reuse) when using accumulated gradients.
  - Added `Trainer.WithMaxExecutors`.
- Package `ml/train/metrics`:
  - `MeanMetric` allows disabling dynamic batch weighting. API slightly changed: `NewMeanMetric` now returns a `MeanMetric` struct, not an interface.
  - Added `StreamingMedianMetric`.
- Package `ml/train/optimizers`:
  - Added the `RMSProp()` optimizer.
- Package `ml/layers`:
  - Added the normalizing 1/sqrt(d_k) factor to attention logits in the MultiHeadAttention layer: this will break current models using it.
  - Added the `RMSNorm` normalizer.
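RMSNorm scales features by the reciprocal of their root-mean-square, x_i / sqrt(mean(x²) + ε), without the mean-centering step of LayerNorm. A stdlib-only sketch of the math (the learnable gain vector is omitted; illustrative, not the `ml/layers` code):

```go
package main

import (
	"fmt"
	"math"
)

// rmsNorm normalizes x by its root-mean-square: x[i] / sqrt(mean(x^2)+eps).
// Unlike LayerNorm, it does not subtract the mean. The learnable gain is
// omitted in this sketch.
func rmsNorm(x []float64, eps float64) []float64 {
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	rms := math.Sqrt(sumSq/float64(len(x)) + eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v / rms
	}
	return out
}

func main() {
	// RMS of {3,-3,3,-3} is 3, so each element normalizes to +/-1.
	fmt.Println(rmsNorm([]float64{3, -3, 3, -3}, 0)) // [1 -1 1 -1]
}
```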
- `gomlx_checkpoints` command-line tool:
  - Added support for multiple models, to allow comparing models.
  - Fixed the printing of metrics with tiny values.
- Package `context`:
  - Allow variable initializers to use the `context.Context` itself, with its own random initializer.
  - `DefaultInitializer` now creates an initializer. The new default uses the He initializer, the same used in PyTorch.
- Package `initializers`:
  - They now use the `context` random-number-generator state, which simplifies things.
  - `ParamInitialSeed` removed, since the RNG is initialized by `Context.RngStateWithSeed()`.
- Fixed some flaky tests.
v0.20.1: 2025/06/12 Trainer.AccumulateGradients (when the batch doesn’t fit in memory); VNN fixes; Numpy improvements.
- Package `train`:
  - Better handling of loss (without regularization) in metrics. Added `SetLossNoRegularization` and `GetLossNoRegularization`.
  - Added `Trainer.AccumulateGradients(n)` to accumulate n steps of gradients before applying them. This is useful when the desired batch size doesn’t fit in memory: it accumulates the gradients until the virtual-batch-size gradient is calculated.
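Gradient accumulation effectively turns n micro-batches into one virtual batch: gradients are summed for n steps, and only then is the averaged update applied. A toy stdlib illustration of the bookkeeping with a single scalar “gradient” (not the trainer’s API):

```go
package main

import "fmt"

// accumulator sums per-micro-batch gradients and applies the averaged
// update only every n steps, emulating a virtual batch n times larger.
// Toy sketch with a single scalar parameter.
type accumulator struct {
	n, count int
	sum      float64
}

// step adds one micro-batch gradient and returns the update to apply:
// zero while accumulating, -lr*mean(gradients) every n-th step.
func (a *accumulator) step(grad, lr float64) float64 {
	a.sum += grad
	a.count++
	if a.count < a.n {
		return 0 // keep accumulating; no parameter update yet.
	}
	update := -lr * a.sum / float64(a.n)
	a.sum, a.count = 0, 0
	return update
}

func main() {
	acc := &accumulator{n: 4}
	param := 1.0
	for _, g := range []float64{0.2, 0.4, 0.6, 0.8} {
		param += acc.step(g, 0.1)
	}
	// One averaged update of -0.1*0.5 after 4 micro-batches.
	fmt.Printf("%.2f\n", param) // 0.95
}
```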
- Package `optimizers`:
  - Added support for the new `train.OptimizeWithGradients` interface, to support gradient accumulators.
  - Cleaned up the `StochasticGradientDescent` API. Added an option to disable decay, for testing.
- Package `vnn`:
  - Added `Config.Scaler` to add a scaler operator just after the linear projection of a layer. It allows the VNN to operate on magnitude-independent vectors.
  - Fixed `LayerNormalization`, making it more stable in backprop.
  - Fixed `Relu`: added support for non-shared non-linearities and a “leak” parameter (`vnn_relu_negative_slope`).
  - Added `VNN().ActivationFn()` to allow setting arbitrary activation functions.
- Package `types/tensors/numpy`:
  - Added support for “Fortran order” files.
- Package `tensors`:
  - Attempting to finalize an “on-device” tensor whose backend has already been finalized is now a no-op, as opposed to a panic.
  - Access to an on-device or shared buffer now checks that the backend hasn’t been finalized; if it has, it panics with a meaningful error message.
  - Added integration tests.
v0.20.0: Small API change: backends.NewWithConfig() changed to return an error.
- Package `backends`:
  - API CHANGE: method `NewWithConfig()` changed to return an error.
  - Method `New()` will be changed to return an error (as opposed to panicking) at the next version. Temporarily, the methods `MustNew()` (which panics on errors, like today) and `NewOrErr` (which returns an error) were created to provide a clear API, and `New()` was marked as deprecated. At the next version, `New()` will change its API.
  - Added `IsFinalized()` to the Backend API, to better handle attempts to access finalized backends.
  - Fixed a bug in the `xla` backend where an error was not returned when the Backend was already finalized.
- Package `types/tensors/numpy`: with methods to read and write tensors from/to `.npy` and `.npz` files.
- Package `simplego`:
  - Fixed a bug introduced in the parallelized version of Erf(x).
- Package
tensors:- Added
Tensor.ToLocal()to detach a tensor from its backend.
- Added
- Package
ui/gonb/plotly:- Update dependencies to new go-plotly v0.7.0 (many changes to the API), while preserving as much as possible the GoMLX api offered.
- Updated example notebooks to use
github.com/gomlx/gomlx/backends/default(instead of only/xla) and to use the newbackends.MustNew().
v0.19.5: 2025/05/30 SimpleGo (go) backend optimizations
- Package `simplego`, the pure Go backend:
  - Added several benchmarks for SimpleGo DotGeneral. Run with: `go test ./backends/simplego/ -test.v -test.run PerformanceTable -perf`
  - DotGeneral reimplemented in 2 different versions:
    - A version for small inner matrices, with block iteration and loop unrolling.
    - A version for larger inner matrices: re-package inputs in ~4K blocks, and recursively partition the matrices.
    - Added parallelization: at the batch level and in the partitioning of the larger matrices.
  - Parallel execution of the ops: it helps a lot during training (cut the training time almost in half for the adult dataset), but it may hurt inference if you are running many batches in parallel. So it dynamically decides to run sequentially or in parallel depending on the number of computations being executed concurrently. Also added the configurations `GOMLX_BACKEND=go:ops_sequential` and `GOMLX_BACKEND=go:ops_parallel` to force one type of execution or the other.
  - Parallelized `Erf(x)`: this will become a template for how to parallelize other unary functions, probably when SIMD is available.
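The chunked-parallelization pattern described for `Erf(x)` can be sketched in standalone Go: split the flat data into fixed-size chunks and process each on its own goroutine, falling back to a sequential loop for small inputs. The chunk size and threshold below are illustrative values, not the ones SimpleGo uses.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// parallelUnary applies fn to every element of data in place, splitting
// the work into fixed-size chunks that run on separate goroutines.
func parallelUnary(data []float64, fn func(float64) float64) {
	const chunk = 1024 // illustrative chunk size
	if len(data) <= chunk {
		for i, v := range data { // small inputs: sequential is cheaper
			data[i] = fn(v)
		}
		return
	}
	var wg sync.WaitGroup
	for start := 0; start < len(data); start += chunk {
		end := min(start+chunk, len(data))
		wg.Add(1)
		go func(part []float64) {
			defer wg.Done()
			for i, v := range part {
				part[i] = fn(v)
			}
		}(data[start:end])
	}
	wg.Wait()
}

func main() {
	x := make([]float64, 4096)
	for i := range x {
		x[i] = float64(i) / 4096
	}
	parallelUnary(x, math.Erf)
	fmt.Println(x[0], x[4095] > 0.8) // erf(0) = 0; erf(~1) ≈ 0.84
}
```

Because each goroutine owns a disjoint sub-slice, no locking is needed on the data itself.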
v0.19.4: 2025/05/24 Added Vector Neural Networks (VNNs)
- Vector Neural Networks (VNN): allow one to build 3D-rotation (SO(3)) equivariant and/or invariant networks. See package `ml/layers/vnn`.
- Package `xla`:
  - Removed dependencies on `gopjrt` internal protos: requires an updated Gopjrt.
- Package `tensors`:
  - Fixed pretty-printing of booleans.
v0.19.3: 2025/05/20 Many SimpleGo improvements.
- v0.19.2 skipped due to issues with the release.
- Package `simplego`:
  - Fixed `Gather` of scalar values.
  - Fixed `Where` checking of shapes.
  - New ops: `NotEqual`, `Erf`, `ArgMinMax`, `ReduceWindow`, `ReduceBitwise{And,Or,Xor}` and `ReduceLogical{And,Or,Xor}`.
  - Fixed initialization of re-used buffers where needed.
- Package `backends/default`:
  - Only include XLA by default on linux/amd64 platforms.
- Package `shapeinference`:
  - Changed to return errors instead of exceptions.
- Package `types/tensors`:
  - Removed dependency on `gopjrt/pjrt`, otherwise we would always need to install the C/C++ library.
- Package `types/shapes`:
  - Added `Shape.Iter()` and `Shape.IterOn()`.
- Package `backends`: the `Backend` interface now returns errors instead of panicking.
- Package `graph`:
  - Added `NewExecOrError` and `Exec.CallOrError` as error-returning alternatives.
- gofmt cleanups by @zjtv
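Iterating over every index combination of a shape, in the spirit of the new `Shape.Iter()`, amounts to an odometer-style increment over the dimensions. The sketch below is standalone and uses a callback; the actual GoMLX signature and semantics may differ.

```go
package main

import "fmt"

// indices calls yield once for every index combination of a shape with
// the given dimensions, in row-major order (last axis varies fastest).
// Note: yield receives the same backing slice on every call.
func indices(dims []int, yield func(idx []int) bool) {
	idx := make([]int, len(dims))
	for {
		if !yield(idx) {
			return
		}
		// Increment like an odometer, starting from the last axis.
		axis := len(dims) - 1
		for ; axis >= 0; axis-- {
			idx[axis]++
			if idx[axis] < dims[axis] {
				break
			}
			idx[axis] = 0
		}
		if axis < 0 {
			return // every axis wrapped around: done
		}
	}
}

func main() {
	count := 0
	indices([]int{2, 3}, func(idx []int) bool {
		fmt.Println(idx)
		count++
		return true
	})
	fmt.Println("total:", count) // 2*3 = 6 index combinations
}
```

The callback shape matches Go 1.23's `iter.Seq` convention, so a function like this can be used directly in a `for range` loop on recent Go versions.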
v0.19.1: 2025/04/30 SimpleGo fixes and new ops; new XLA, requires Gopjrt v0.7.0 update.
- `go mod tidy`
- Package `simplego`:
  - "Not implemented" errors now include the name of the corresponding method that was not implemented.
  - Several memory fixes.
  - Added `Slice` and `RngBitsGenerator` ops.
- Updated to Gopjrt v0.7.0, with more memory fixes. Requires an update of the C++ libraries.
v0.19.0: 2025/04/29 Added SimpleGo, a pure Go backend
- Package `backends`:
  - Added `simplego`, a portable and simple, albeit slow, backend.
    - Implemented the ~50 most common ops (see `backends/simplego/capabilities`) and the most common numeric types (including BFloat16).
  - Added sub-package `notimplemented`: a helper to implement new backends.
  - Added sub-package `shapeinference`: a helper to implement new backends.
  - Added sub-package `default`, which includes the default packages.
  - Added a `List()` function that returns the currently registered (compiled-in) backends.
- Package `checkpoints`:
  - Added `Config.FromEmbed`, which allows loading a checkpoint from an embedded variable.
- Package `graph`:
  - `Gather` and `GatherSlices` now have an extra argument called `indicesAreSorted` that tells whether the start indices are guaranteed to be sorted, which allows optimizations on some platforms.
  - Exposed `BackendGather`, `BackendScatterMax`, `BackendScatterMin` and `BackendScatterSum` for test and debugging purposes.
- Moved code-generation tools from the `cmd` to the `internal/cmd` directory.
v0.18.1: 2025/04/13 Many fixes, XLA update, Tensor clone.
- XLA Backend:
  - Updated gopjrt dependency: fix to Scatter flags.
- Package `graph`:
  - Removed spurious logging.
  - Added gradients for `ScatterSum`, `ScatterMax`, `ScatterMin`. Only for simple shapes for now.
  - Fixed `ExecOnceN` to return many outputs.
- Package `tensors`:
  - Added `Tensor.Clone` and `Tensor.OnDeviceClone`.
- Package `context`:
  - Removed deprecated `NewContext`.
  - Added `Variable.CloneToContext`.
  - Added `Context.Clone`.
  - Variable `graphToNodeId` is now an `xsync.SyncMap`, solving concurrency issues when multiple graphs are created/executed at the same time for the same `Context.Exec` object (with different shapes).
  - Added `Variable.Finalize` and `Context.Finalize`.
- Updated all dependencies and re-tested.
v0.18.0: Ragged2D; XLA update; fixed Scatter functions; fixed memory leaks.
- XLA Backend:
  - Updated dependency to the newest Gopjrt 0.6.3: small memory-leak fixes.
  - Updated CPU PJRT and XlaBuilder.
  - Fixed Scatter* functions.
- Package `graph`:
  - Fixed `ScatterSum` (renamed from the now-deprecated `ScatterAdd`), `ScatterMax` and `ScatterMin`. No gradients for `ScatterMax` and `ScatterMin` yet.
  - Added `Ragged2D` with some utilities, in particular `Ragged2D.Softmax`.
  - `DefaultNodeLogger` now accepts the `#full` prefix, which forces printing the full value of a tensor, in Go-code format.
v0.17.1: 2025/02/26 CosineSimilarity, Bitcast and many fixes and improvements.
- Added MNIST example (thanks to @TuSKan).
- `gomlx_checkpoints` now displays the value of scalar variables.
- Package `checkpoints`:
  - Loading a checkpoint overwrites the values of variables already present in the context.
  - Fixes when saving, in particular if using `Immediate()` loading.
- Package `tensors`:
  - Allow shared tensors to be donated.
- Package `graph`:
  - Fixed using axes != -1 for `L1Norm`.
  - Added the `IsZero` shortcut.
  - Fixed `L2Normalize` to handle 0s without NaN, both in the forward evaluation and in the gradient.
  - Renamed indicator functions to `PositiveIndicator`, `NonNegativeIndicator`, `NegativeIndicator` and `NonPositiveIndicator`.
  - Added backprop for `ReduceMin`, which was missing (thx @TuSKan).
  - Added `CosineSimilarity`, numerically safe for 0 vectors.
  - Added `BitcastConvert`.
- Package `ml/context`:
  - Added support for string-derived types in `context.GetParamsOr[T]`.
- Package `ml/train`:
  - Created `ExecPerStepUpdateGraphFn` for those creating custom "TrainStep" functions.
- Package `ml/train/losses`:
  - Triplet losses now work with context.
  - `CheckExtraLabelsForWeightsAndMask` now (1) accepts weights and mask in any order; (2) normalizes weights such that their sum is the (non-masked) batchSize, preserving the ratios. This way the mean will be 1.
  - Losses with masks and weights fixed so weights/mask can be given in any order. Also, now using `MaskedReduceMean` if there is a mask, and all losses return a scalar.
- Package `xla`:
  - Removed suppression of logging: new PJRTs no longer output random debug messages.
  - Updated dependency to `gopjrt` v0.6.2.
  - Replaced `stringer` with `enumer` everywhere.
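A cosine similarity made numerically safe for zero vectors, as described for `CosineSimilarity`, can be sketched in plain Go. The epsilon guard below is an illustrative choice, not necessarily the one GoMLX uses.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between a and b.
// The epsilon in the denominator keeps the result finite (0) when
// either vector is all zeros, instead of producing NaN from 0/0.
func cosineSimilarity(a, b []float64) float64 {
	const epsilon = 1e-12
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / math.Max(math.Sqrt(na)*math.Sqrt(nb), epsilon)
}

func main() {
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{1, 0})) // 1
	fmt.Println(cosineSimilarity([]float64{0, 0}, []float64{1, 0})) // 0, not NaN
}
```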
v0.17.0: bitwise ops, triplet losses, new layers, fixes, and more.
- Backend API change: separating logical and bitwise versions of various ops derived from And, Or, Xor and Not.
- Updated dependency to gopjrt v0.6.0.
- Added "Flow Matching" examples/demo.
- Package `layers`:
  - Added `layers.DropBlock`, a type of dropout for images.
  - Added `layers.DropPath` and `layers.DropPathFromContext`, a type of dropout used in residual connections, to drop full paths.
  - `layers.LayerNormalization`: up-scales precision by default if the input is Float16 or BFloat16 (low precision leads to NaNs when reducing values for normalization). Also added a hyperparameter to configure the normalization DType.
- Added `Context.RandomBernoulli` to sample from a Bernoulli (binary) distribution.
- Correctly pretty-print Float16 and BFloat16 tensors.
- Several fixes and small improvements to the command-line tool `gomlx_checkpoint`.
- Package `nanlogger`:
  - Store only the stack-trace, and trim the stack inside the nanlogger package.
  - Does not exit; it simply reports. Users can define a handler if they want the training to exit.
  - Uses `IsFinite` to check for NaNs and Infs, but we lose the type of NaN that happened.
  - Fixed nanlogger for Float16 and BFloat16; also, it first prints other logged tensors before failing with a NaN.
- Package `losses`:
  - Added `ParamLoss`: a hyperparameter to define the loss, and many constant values.
  - Added `LossFromContext`, using the `ParamLoss` hyperparameter.
  - Added `MakeHuberLossFromContext`.
  - Added experimental `MakeAdaptivePowerLoss` and `MakeAdaptivePowerLossFromContext`.
  - Added `TripletLoss`: various negative sampling strategies and distance metrics.
- Package `graph`:
  - More unit tests.
  - Node aliases: allow setting aliases on nodes and retrieving nodes by those aliases. Useful for layers or models to export intermediary nodes by their aliases. Aliases are prefixed by scope. New methods are: `Node.WithAlias`, `Node.GetAlias`, `Graph.GetNodeByAlias`, `Graph.PushAliasScope`, `Graph.PopAliasScope` and `Graph.IterAliasedNodes`.
  - Added optional node aliases for the `inceptionv3` model.
  - Added `ReduceSkewness` and the alias `Skewness`.
  - Added bitwise ops: `BitwiseShiftLeft`, `BitwiseShiftRightLogical`, `BitwiseShiftRightArithmetic`, `BitwiseAnd`, `BitwiseOr`, `BitwiseXor`, `BitwiseNot`.
  - Kept aliases from `And`, `Or` and `Not` to `LogicalAnd`, `LogicalOr`, `LogicalXor` and `LogicalNot`.
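The triplet loss mentioned above, in its standard margin form, can be written down in a few lines. This is a generic sketch using squared Euclidean distance; GoMLX's `TripletLoss` offers richer configuration (negative-sampling strategies and other distance metrics).

```go
package main

import (
	"fmt"
	"math"
)

// squaredDistance is the squared Euclidean distance between two vectors.
func squaredDistance(a, b []float64) float64 {
	var d float64
	for i := range a {
		diff := a[i] - b[i]
		d += diff * diff
	}
	return d
}

// tripletLoss pulls the anchor towards the positive and pushes it away
// from the negative until they are separated by at least margin:
// max(0, d(a,p) - d(a,n) + margin).
func tripletLoss(anchor, positive, negative []float64, margin float64) float64 {
	return math.Max(0, squaredDistance(anchor, positive)-squaredDistance(anchor, negative)+margin)
}

func main() {
	a := []float64{0, 0}
	p := []float64{1, 0} // d(a,p) = 1
	n := []float64{3, 0} // d(a,n) = 9
	fmt.Println(tripletLoss(a, p, n, 1.0)) // 1 - 9 + 1 = -7, clamped to 0
	fmt.Println(tripletLoss(a, n, p, 1.0)) // 9 - 1 + 1 = 9
}
```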
v0.16.1 - 2024/12/19 MatMul fixes
- MatMul fixed for some edge shape configurations and greatly accelerated in some cases.
v0.16.0 - 2024/12/19 Benchmarks, speed improvements with gopjrt v0.5.0, shared buffers.
- The XLA backend now accepts the absolute path to the PJRT plugin (`GOMLX_BACKEND="xla:<pjrt_path>"`).
- Updated GitHub action (`go.yaml`) to only change the README.md with the result of the change if pushing to the `main` branch.
- Added `Pow()` gradient.
- Package `tensors`:
  - Added `Tensor` transfer to/from device benchmarks.
  - Added `Tensor.CopyFrom()` to transfer from one tensor (potentially on device) directly to another tensor, handy for pre-allocated tensors.
  - Added the convenience `Tensor.AssignFromFlat[T](toTensor, fromFlat)`.
  - Added "shared" tensors, with `Tensor.IsShared()` to check whether one is in use. This saves one copy when using a tensor as input that is changed by the host in between executions of a graph.
  - `Tensor.ConstFlatData` now avoids a copy if `Backend.BufferData` is available.
- Updated dependency to gopjrt v0.5.0, with support for shared buffers.
- Packages `backends` and `backends/xla`:
  - Added `Backend.HasSharedBuffer`, `Backend.NewSharedBuffer` and `Backend.BufferData`.
v0.15.3 - 2024/11/25
- Added pre-linking of CPU PJRT packages, both statically and dynamically.
- Re-enabled the Mac version: currently only statically linked.
v0.15.2 - 2024/11/17
- Fixed printing of `uint` tensors.
- Fixed Dockerfile.
- Example CIFAR (changes will break previous checkpoints):
  - Added inference example for Cifar models.
  - Fixed model scope issue.
  - Fixed KAN model issue.
- Added `checkpoints.Load()`: just like `checkpoints.Build`, but it complains if a checkpoint doesn't exist.
- Package `graph`:
  - Added `ReduceVariance` and an alias `Variance`. Fixed `ReduceAndKeep` if no axes are given.
  - Added `Stack`: similar to `Concatenate`, but it creates a new axis.
- BSpline(Standard)-KAN:
  - Better initialization: constant variance across layers.
  - Extrapolate constant.
  - Knots from -1.0 to 1.0.
- PiecewiseLinear-KAN: better initialization (constant variance across layers).
- Added `layers/lstm` to create LSTM layers (experimental), in use by the ONNX conversion to GoMLX.
- Updated dependencies; gopjrt v0.4.7.
v0.15.1 - 2024/11/11 Updated downloader, in support for
- Updated dependency to gopjrt 0.4.5.
- Moved packages `huggingface` and `downloader` to "github.com/gomlx/go-huggingface"; the originals are marked as deprecated.
- Added checks and better error reporting for misuse of rngState in random functions.
- Added `graph.RandomIntN` and `context.Context.RandomIntN`.
v0.15.0 - 2024/11/01 Some API clean-up; added support for ONNX model conversion.
- Package `graph`:
  - Added `MatMul`, with semantics similar to `numpy.matmul`.
  - Renamed `ExpandDims` to `InsertAxes` and added `ExpandAxes`: the old `ExpandDims` had slightly different semantics than the usual (numpy) `expand_dims`, which I hadn't realized. The name change reflects that difference, and the new `ExpandAxes` matches the more common semantics of `expand_dims`. Added proper documentation. BREAKING CHANGE: easy to convert, but breaking anyway: it requires attention. A deprecated `ExpandedDims` that maps to `InsertAxes` was defined, but it will be removed in the next release.
  - Graph/Node introspection: added `node.ConstantValue` and `node.IsConstantExpression`.
- Package `context`:
  - Fixed `ExecOnce`: it was missing the variadic args for the computation graph.
  - `InspectVariableInScope` and `InspectVariable` renamed to `GetVariable` and `GetVariableByScopeAndName` respectively. Aliases to the older names were left for compatibility (and marked as deprecated), but they will be removed in future versions.
- Package `tensors`:
  - Added `Tensor.Summary(precision int)` to pretty-print a summary of the values of a tensor, numpy-like.
v0.14.0 - 2024/10/24
- Package `context`:
  - New `VariableWithValueGraph` to create/get a variable with its value set to a graph value (`Node`).
  - New `IterVariables` and `IterVariablesInScope` that use the new Go 1.23 iterators.
- New directory `ui` with various front-ends for displaying training progress, plots, etc.
  - BREAKING CHANGE: refactored all UI tools under the `ui` directory. It only requires changing the import; the APIs are not changed.
  - New package `fyneui`, a window-based training UI built using Fyne.io (EXPERIMENTAL).
- Package `commandline`:
  - `ParseContextSettings` now allows parsing settings from a text file.
  - Fixed `SprintContextSettings` for scoped hyperparameters.
  - Added `SprintModifiedContextSettings` to enumerate only hyperparameters set on the command line.
- New package `cosineschedule`, refactored from the `optimizers` package.
  - Added handling of negative values for the hyperparameter `cosine_schedule_steps`: they set the period of the cosine schedule as a fraction of the total number of steps being trained.
- Package `train`:
  - Extensions to the `Dataset` interface through additional interfaces.
  - Added optional `IsOnwershipTransfer() bool` that allows a Dataset to specify it should maintain ownership of the yielded tensors.
- Updated `gopjrt` to v0.4.4, with the static XlaBuilder library and experimental support for Apple/Metal.
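A cosine-annealing learning-rate schedule of the kind the new `cosineschedule` package implements can be sketched standalone. The function and parameter names below are illustrative, not GoMLX's API.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSchedule interpolates the learning rate from initialLR to
// finalLR over periodSteps, following half a cosine cycle: fast decay
// in the middle, slow at both ends.
func cosineSchedule(step, periodSteps int, initialLR, finalLR float64) float64 {
	if step >= periodSteps {
		return finalLR
	}
	progress := float64(step) / float64(periodSteps)
	cosine := 0.5 * (1 + math.Cos(math.Pi*progress)) // goes from 1 to 0
	return finalLR + (initialLR-finalLR)*cosine
}

func main() {
	fmt.Println(cosineSchedule(0, 100, 0.1, 0.0))   // 0.1 at the start
	fmt.Println(cosineSchedule(50, 100, 0.1, 0.0))  // ~0.05 mid-way
	fmt.Println(cosineSchedule(100, 100, 0.1, 0.0)) // 0.0 at the end
}
```

The negative `cosine_schedule_steps` handling described above would correspond to computing `periodSteps` as a fraction of the total planned training steps before calling a function like this.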
v0.13.0 - 2024/10/07
- Package `initializers`:
  - All random initializers (`RandomUniformFn`, `RandomNormalFn`, `GlorotUniformFn`, `XavierUniformFn`) changed to take the context as a parameter, instead of `initialSeed`.
  - The `initialSeed` is instead read from the hyperparameter `initializers.ParamInitialSeed` ("initializers_seed") and defaults to `initializers.NoSeed` (0), which means the seed is randomly initialized.
- Added learnable rational functions (`ml/layers/rational`): they can be used for activations or as univariate learnable functions for KAN.
  - Added a rational notebook to generate initial values with approximations to arbitrary univariate functions.
- Package `graph`:
  - Added `ConstCachedTensor` to allow caching of constant tensors.
  - Fixed gradient of `Where` when operands are broadcast.
  - Added `ConsecutiveDifference`, `SliceAxis`, `BitsCount`, `IsFinite`.
- Package `context`:
  - Added `context.ExecOnce` and `context.ExecOnceN`.
  - `context.GetParamOr` now returns the default value for a hyperparameter if it is set to nil.
- Package `train`:
  - Added `GetTrainLastStepVar` with information about the last step of training: used for setting up various schedules.
  - Added `ResetComputationGraphs` to allow the trainer to recreate computation graphs if hyperparameters change in the middle of training, e.g. for training with very different schedules, for instance with freezing variables.
- Added `initializers.BroadcastTensorToShape`: allows variables to be initialized with a base value that is broadcast to each requested variable shape.
- Package `optimizers`:
  - Added `MonotonicProjection` to project values (usually variables) to monotonically increasing values, with a margin.
  - Added `ParamClipNaN` to prevent NaNs from going into gradient updates.
- Added `regularizers.ConstantL1`.
- Added `data.NewConstantDataset`, a dummy dataset that can be used when training a model that generates its own input and labels.
- Package `kan`:
  - Discrete-KAN:
    - Added separate (per-input) split points.
    - Added support for hyperparameter-configured split points.
    - Added monotonic projection of split points.
    - Added ConstantL1 regularizer for control points.
    - Added various types of schedules for smoothness: cosine, linear, exponential.
    - Added normal-distribution-based perturbation.
    - Added input grouping.
  - Added GR-KAN (rational functions).
  - Added PWL-KAN (piecewise-linear) with `kan.New().PiecewiseLinear()`.
- Fixed OGBN-MAG GNN tests and demo.
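One simple way to realize a monotonic projection with a margin, as described for `optimizers.MonotonicProjection`, is a single left-to-right pass that pushes each violating value up. This is a sketch of the idea only; the actual GoMLX algorithm may differ (for example, it could distribute the correction among neighbors).

```go
package main

import "fmt"

// monotonicProjection returns a copy of values where each element is at
// least margin greater than its predecessor, pushing values up in a
// single left-to-right pass.
func monotonicProjection(values []float64, margin float64) []float64 {
	out := make([]float64, len(values))
	copy(out, values)
	for i := 1; i < len(out); i++ {
		if out[i] < out[i-1]+margin {
			out[i] = out[i-1] + margin
		}
	}
	return out
}

func main() {
	fmt.Println(monotonicProjection([]float64{0, 2, 1, 5}, 0.5))
	// [0 2 2.5 5]: the violating 1 is pushed above its predecessor 2
	// by the 0.5 margin; already-monotonic values are untouched.
}
```

For split points of a Discrete-KAN layer, such a projection keeps the points strictly ordered after each gradient update.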
v0.12.0 - 2024/09/23
- Updated dependency to gopjrt v0.4.0.
- Added package `ml/data/downloader` for parallel downloads, with support for authentication tokens.
- Added package `ml/data/huggingface` to download and load HuggingFace models into tensors.
- Removed dependency on gonb/common. Added package `types/xsync` with the required synchronization constructs.
- Added `Shape.Dim(axis)` as a shortcut, where `axis` can take negative values.
- Package `graph`:
  - `Scalar()`, `AddScalar()`, `MulScalar()`, `DivScalar()`, … are now generic, and take as input any non-complex number type, for improved convenience.
  - Added `ShapedLowerTriangular()`, `TakeLowerTriangular()` and `TakeUpperTriangular()`.
- Added `activations.Gelu` and `activations.GeluApproximate`.
- Added `Erf`, the "error function", used when integrating the normal distribution, and its gradient.
- Better Discrete-KAN support: configurable by hyperparameters.
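The lower-triangular helpers can be pictured with a plain-Go mask builder. This is only a sketch of the concept: `ShapedLowerTriangular` in GoMLX operates on graph nodes and shapes, not Go slices.

```go
package main

import "fmt"

// lowerTriangular builds a rows x cols boolean mask that is true at
// (i, j) whenever j <= i, i.e. on and below the main diagonal.
func lowerTriangular(rows, cols int) [][]bool {
	mask := make([][]bool, rows)
	for i := range mask {
		mask[i] = make([]bool, cols)
		for j := range mask[i] {
			mask[i][j] = j <= i
		}
	}
	return mask
}

func main() {
	for _, row := range lowerTriangular(3, 3) {
		fmt.Println(row)
	}
	// [true false false]
	// [true true false]
	// [true true true]
}
```

Masks like this are the building block for causal attention and for extracting triangular parts of matrices.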
v0.11.3 - 2024/08/29
- Immediately free accelerator (GPU) memory where possible, as opposed to waiting for the garbage collector.
  - This impacts `train.Loop` and `train.Trainer`: they both immediately finalize the inputs and labels after use.
- Fixed a nil-pointer exception where the initializer was not properly set if the value of a variable was loaded from a checkpoint.
  - This impacted restarting training with batch normalization.
- Fixes to the notebooks: some small things were broken in the v0.11.0 transition; large speed-up with the v0.11.1 fixes.
v0.11.2 (was v0.11.1) - 2024/08/28
- Added support for `dtypes.BFloat16`.
- Added `tensors.FromScalar`.
- Updated to gopjrt v0.3.0.
- Package `graph`:
  - Added `ExecOnce` and `ExecOnceN`.
  - Added `CumSum`.
  - `ConvertDType` to the same dtype is now a no-op.
  - Added `LogicalAll` and `LogicalAny`.
  - Added `DynamicSlice` and `DynamicUpdateSlice`.
- Package `backends`:
  - Added `DynamicUpdateSlice`, `DynamicSlice`, `ReduceAnd` and `ReduceOr`.
- Package `tensors`:
  - Fixed a race condition in `Tensor.DonateBuffer`.
  - Fixed unnecessary copying of tensor data in `Tensor.MaterializeOnDevices`.
- Small fixes to documentation.
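BFloat16 is essentially a float32 with the low 16 mantissa bits dropped, so a conversion sketch needs only bit operations. This shows the generic format by simple truncation; GoMLX's `dtypes.BFloat16` implementation also handles rounding details and is not this code.

```go
package main

import (
	"fmt"
	"math"
)

// toBFloat16 keeps the high 16 bits of a float32: the sign, the full
// 8-bit exponent, and the top 7 mantissa bits (truncating, no rounding).
func toBFloat16(f float32) uint16 {
	return uint16(math.Float32bits(f) >> 16)
}

// fromBFloat16 re-expands the 16 stored bits into a float32 by
// zero-filling the dropped mantissa bits.
func fromBFloat16(b uint16) float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	x := float32(3.1415927)
	y := fromBFloat16(toBFloat16(x))
	fmt.Println(y) // ≈ 3.14: only about 2-3 decimal digits survive
}
```

Because bfloat16 keeps float32's full exponent range, round-tripping loses precision but never overflows, which is why it is popular for training.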
0.11.0 BREAKING CHANGE: Multi-backend support; added XLA/PJRT support (with gopjrt); meaningful speed-ups; no more C code (all goes through gopjrt)
- MAJOR REFACTORING. Many compatibility-breaking changes: it would be a major release number change, if it were > v1 already.
- New package `backends`: now GoMLX can support different backends, but for now only XLA is implemented.
  - Sub-package `xla` implements the XLA/PJRT version, based on the `github.com/gomlx/gopjrt` project.
- Package `tensors`:
  - `tensor` -> `tensors`, more in line with the other package names, and it allows one to use `tensor` as a variable name.
  - Now there is only one `Tensor` type (not an interface), which manages local and on-device storage.
  - Local storage using Go.
  - On-device storage now uses the generic `backends.Backend` API.
  - Improved testing using xla, greatly simplified.
- Package `graph`:
  - Added support for donated tensors during execution.
  - Added `Nodes` to introspect nodes of the graph, e.g. to investigate the largest nodes if one is running out of memory.
  - Updated `OneHot` to use `Where`.
  - Added `GrowLeft`, `GrowRight`, `Infinity`, `LogSoftmax`, `MaskedLogSoftmax`.
  - `BroadcastToDims` and `BroadcastToShape` will automatically expand x to match.
  - `AdjustAxisToOperandRank` made public.
- Package `layers`:
  - Added sub-package `fnn` for a simplified feedforward neural network implementation.
  - Added sub-package `kan` for Kolmogorov-Arnold Networks, and Discrete-KAN.
    - Included a bspline GoMLX implementation.
  - Added sub-package `regularizers` with automatic regularizer configuration. Layers `Dense`, `DenseWithBias` and `kan` use it by default.
  - Added sub-package `activations`: just a refactor of the code already in layers.
  - Added sub-package `batchnorm`: refactored-out batch normalization code.
    - Added `batchnorm.AveragesUpdate` to update the averages of the means and variances used for normalization. Also connected it to evaluation in the plots libraries.
- Package `initializers`:
  - Added the `XavierFn` initializer.
- Package `losses`:
  - Fixed `CategoricalCrossEntropyLogits` and `SparseCategoricalCrossEntropyLogits`.
  - Added `MakeHuberLoss`.
- Package `metrics`:
  - Fixed.
- Package `exceptions` moved to a separate repository in `github.com/gomlx/exceptions`.
- Package `slices` renamed to `xslices`, so as not to mix it up with the new standard package `slices`.
- Package `tensors/image` renamed to `tensors/images`.
  - Added support for all numeric dtypes; added conversion tests for all types.
  - Added support for `dtypes.Float16`.
- Package `context`:
  - Renamed `context.NewContext` to `context.New`.
  - Added `Variable.Reset`: resets a variable, to be reinitialized.
- Package `checkpoints`: added `ExcludeParams` and `ExcludeAllParams`.
- Package `plots`:
  - Added `Point.Short` for short names of metrics in saved metrics.
- C/C++ code:
  - Completely removed; all C/C++ dependencies are in the `gopjrt` project now.
  - Removed references to AOT compilation, see #52.
- Added command-line tool `gomlx_checkpoints` to introspect checkpoints.
- Added `cmd/run_coverage.sh`.
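The Huber loss added as `MakeHuberLoss` follows the standard definition: quadratic near zero, linear beyond a delta. This is a generic sketch of the formula, independent of GoMLX's API.

```go
package main

import (
	"fmt"
	"math"
)

// huberLoss is quadratic for |err| <= delta and linear beyond it,
// making it less sensitive to outliers than plain squared error.
func huberLoss(predicted, target, delta float64) float64 {
	err := math.Abs(predicted - target)
	if err <= delta {
		return 0.5 * err * err
	}
	return delta * (err - 0.5*delta)
}

func main() {
	fmt.Println(huberLoss(1.5, 1.0, 1.0)) // small error: 0.5*0.5*0.5 = 0.125
	fmt.Println(huberLoss(4.0, 1.0, 1.0)) // large error: 1*(3-0.5) = 2.5
}
```

The two branches meet smoothly at `|err| = delta`: both value and first derivative agree there, which is what keeps gradient descent stable across the boundary.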
0.10.0 - 2024/06/12
- Package `types/shapes`:
  - Added support for `Float16` training, tested with GNNs:
    - Up-precision metrics dtypes if they are `Float16`.
    - Allow an arbitrary dtype for the `Adam` optimizer: it requires at least `float32`, even if the model runs on `float16`.
    - DType-dependent `epsilon` values for `Softmax` and `Adam`: the previous values would lead to `NaN` with `float16`.
    - Added `DType.IsFloat16` to check for `Float16` or `BFloat16` (not yet well supported).
  - Added support for `Int8`, `Int16`, `Uint8` and `Uint16`.
  - Renamed `UInt{X}` to `Uint{X}` and added deprecated aliases to the old forms (so code still compiles).
- Added logging of the time to build and compile graphs. The last version improved execution time a lot, but slowed down compilation.
- `Context.Variable`:
  - Fixed `Variable.SetValueGraph` when the shape changes. Improved some documentation.
  - Fixed `Variable.SetValuePreservingOld` when shapes change.
  - Fixed checking of loaded variables: that they are not newly created.
- Package `optimizers`:
  - Fixed the optimizer constructor `FromContext` to allow further configuration of the optimizer by setting other hyperparameters in the context.
  - Added hyperparameter `clip_step_by_value`, a clip-by-value applied to gradient updates.
  - `Adam` optimizer: support for the `"clip_step_by_value"`, `"adam_epsilon"` and `"adam_dtype"` hyperparameters.
  - `MustOptimizerByName` now also takes the context for the optimizer hyperparameters: this breaks the API.
- Package `checkpoints`:
  - Allow adding variables to exclude from saving after the checkpoint is created, for newly created variables.
- Added `slices.CloseToEpsilon` to easily customize tests.
- `Scatter` no longer assumes indices are sorted or unique.
- Plotly training plots: added `WithCustomMetricFn` for custom metrics, and `ScheduleEveryNSteps`.
- Added OGBN-MAG GNN example:
  - Including layer-wise inference.
- Package `graph`:
  - Added `Shift`, `ShiftLeft`, `ShiftRight`, `ShiftWithScalar`, `ShiftWithValue`.
- Dummy package for the `xla.AOT` and `xla.StableHLO` APIs, enabled when using the "google3" build tag: this allows the dependency on the corresponding C++ code to be dropped. (Thanks @tdegris.)
- Removed `xla.AOTExecute`: see issue #52.
0.9.1 - 2024/04/19
- XLA integration:
  - Added "SKIP_ABSL_INITIALIZE_LOG" for conflict cases, while https://github.com/abseil/abseil-cpp/issues/1656 is not solved.
0.9.0 - 2024/04/18
- Binary GOMLX+XLA distribution:
  - Now requires package `libnccl > 2.21` to be installed.
  - Updated to CUDA version `12.3` and Cudnn `8.9`.
  - With the newer version, GPU performance measured on a GNN model improved significantly (in one model the median train step went from 160ms to 110ms). CPU performance measured on the "CSI Adult" dataset remained the same.
- Open Graph Benchmark OGBN-MAG dataset support and example models (FNN and GNN).
  - Added sampler library.
- Package `graph`:
  - Added `MirroredLog1P`.
  - Functions that take masked inputs are being renamed to use a "Masked" prefix (e.g.: `MaskedReduceSum`, `MaskedReduceMean`, `MaskedReduceMax`, `MaskedReduceAndKeep`).
  - Added `MaskedReduceMean`.
  - Added `IdentityWithCustomGradient`, to allow manual tweaks to the gradient.
  - Fixed a special case of the gradient in `broadcastInDimVJP`.
- Package `context`:
  - Added the `Manager()` accessor method.
  - Added `SetParams` to set various parameters at once.
  - Renamed parameter names to be prefixed with "Param".
- Package `context/initializers`:
  - Added `GlorotUniformFn`.
  - Random initializers use zeros for non-float variables by default (as opposed to crashing).
  - The default initializer now matches Keras (random uniform from `[-0.05, 0.05]`).
- Package `context/checkpoints`:
  - Added `ExcludeVarsFromSaving` to allow preventing saving large static variables.
  - Fixed an issue with lazy-loading of variables.
- Package `shapes`:
  - Added `Check()` and `Assert()` to check for both dtype and dimensions.
  - Added `EqDimensions()` to compare dimensions.
  - `Make(dtype, dimensions...)` now makes a copy of the given `dimensions` slice.
- `exceptions`: refactored to use the separate package `github.com/gomlx/exceptions`.
- Package `layers`:
  - Added the `...FromContext` family of functions, which apply layers according to parameters set in the context: `ActivationFromContext`, `DropoutFromContext`, `NormalizeFromContext` and `MaskedNormalizeFromContext`.
  - `LayerNormalization`: fixed a shaping bug, and renamed `scale` to `gain`, more aligned with the original paper.
    - This will break previous models using LayerNormalization! This is not taken lightly, but as it is, it is wrong, and depending on the shape it may be adversely affecting some models.
  - `LayerNormalization`: added `Mask` support; added defaults from context parameters.
  - `DropoutStatic`: Dropout API where one can pass a static dropout rate as a Go float.
  - `AddL2RegularizationStatic`: adds L2 regularization on values, where the amount of regularization is static.
- Package `optimizers`:
  - Added `CosineAnnealingSchedule.FromContext`.
  - The new `MinLearningRate` is 0.0 (the same used in Keras).
- Package `losses`:
  - Added support for `weights` and `mask`.
- Package `ml/data`:
  - Renamed `Map` -> `MapWithGraphFn`: to make it explicit that the transformation happens in the accelerator.
  - Added `Map`: a map function for a dataset that runs on the host (as opposed to in the accelerator/XLA).
  - Added `Freeing`: a dataset wrapper that frees inputs and labels between each call to `Yield`, to control GPU memory usage. It replaces `loop.FreeInput()`.
- Package `commandline`:
  - `AttachProgressBar` now displays a continuously updated table with the metrics generated during training. This only works on the command line (not in notebooks).
  - Asynchronous display of updates: it works better with very fast training loops or when running over a slow terminal connection (network).
  - Added `CreateContextSettingsFlag` and `ParseContextSettings`.
- Packages `plots`, `margaid` and `plotly`:
  - Added `margaid.Plots.PlotEveryNSteps`.
  - Removed `margaid.Plots.Done`: no longer needed, as closing of the written file is done automatically at the end of the training loop.
  - Added Plotly plots.
- Ahead-of-time compilation:
  - Not yet working, and actually broken. This still requires some XLA hacking to get right (if at all possible).
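The masked reductions introduced above (e.g. `MaskedReduceMean`) average only over the positions where the mask is true. A plain-Go sketch of the idea, not the GoMLX graph operation:

```go
package main

import "fmt"

// maskedMean averages only the values whose mask entry is true, so
// padded or otherwise invalid positions do not dilute the result.
func maskedMean(values []float64, mask []bool) float64 {
	var sum float64
	var count int
	for i, v := range values {
		if mask[i] {
			sum += v
			count++
		}
	}
	if count == 0 {
		return 0 // an illustrative convention for an all-false mask
	}
	return sum / float64(count)
}

func main() {
	v := []float64{1, 2, 100, 3}
	m := []bool{true, true, false, true}
	fmt.Println(maskedMean(v, m)) // (1+2+3)/3 = 2: the 100 is masked out
}
```

This is the same behavior the losses gained with `weights` and `mask` support: masked-out examples contribute neither to the sum nor to the denominator.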
0.8.0 - 2023/11/28
- DType and Tensors:
  - Added support for Go's `int64`: breaks compatibility because DType Int64, when converted back to Go, becomes `int64` and not `int`.
  - Renamed `Local.Flat` -> `Local.FlatCopy`: not to be confused with `LocalRef.Flat` (which is not a copy).
- C++ code integrating with XLA:
  - Enabled copy elision, which makes `std::move` unnecessary.
  - Temporarily copied the `xla/mlir/utils` library to `deps/xla_mlir`, since it is not available in all XLA distributions.
- Package `context`:
  - Added `context.GetParamOr` and `context.GetGraphParamOr`: they use generics to cast to the desired type, and allow a default value to be returned.
  - Added `Context.DeleteVariable` and `Context.DeleteVariablesInScope`.
- Package `checkpoints`:
  - Added recovery of some basic types (numeric and slices) when loading params from JSON.
  - Added a unique incrementing id to checkpoint file names.
- Package `exceptions`: special-case runtime panics to preserve their stack-trace.
- Package `train`:
  - `Loop` automatically sets LoopStep to the context's "global_step" parameter.
  - Models (e.g.: unsupervised) can return `nil` for predictions.
- Package `optimizers`:
  - Added `GetGlobalStep`.
  - The interface now includes `Clear(ctx)` to clear all variables used by an optimizer: this also breaks compatibility for any custom optimizer, unfortunately. But if it broke you, it should be a very easy fix, since most optimizers use a fixed scope for their variables, and `Context.DeleteVariablesInScope` will do the job.
  - Added `DeleteGlobalStep`.
- Package `context`: added the `Context.EnumerateVariablesInScope()` method.
- Package `graph`:
  - Added an optional `reduceAxes` parameter to `L2Norm` and `L1Norm`.
  - Added `L2NormSquare`, `L2Normalize` and `L2NormalizeWithEpsilon`.
- Package `nanlogger`: added `AttachToTrainer`; improved docs.
- Package `margaid`:
  - Automatically ends the plot when the loop finishes.
  - Option to plot evaluation losses separately from training losses, for when they include different terms.
- Example "Dogs vs Cats":
  - Added BYOL (Bootstrap Your Own Latent) regularized models.
  - Added support for generating pairs of images for the BYOL model.
v0.7.2 - 2023/10/27
- Fixed C/C++ mismatched malloc/new/new[] and free/delete/delete[].
- Formatted C/C++ code using clang-format.
- Increased the static size threshold for the string memory-leak test.
- Small StableHLO support improvements.
- Fixed and updated the devel docker ('janpfeifer/gomlx_devel:latest').
v0.7.1 - 2023/10/26
- Fixed search of CUDA paths under /usr/local.
- Fixed(?) XLA ShapedBuffers issue causing spurious crashes after the update.
- The JupyterLab docker image uses the gomlx_xla C library from local disk (as opposed to downloading it).
v0.7.0 - 2023/10/25
- Updated OpenXLA/XLA dependencies:
  - Updated `devel/Dockerfile` with fixed dependencies, and better instructions to work around the Bazel cache.
  - Fixed several build-breaking issues intersecting XLA and CUDA.
  - Added automatic finding of the CUDA directory (for the `libdevice.10.bc` file).
- Oxford Flowers 102: added support for GoNB widgets; improved the image `Generator`.
- Fixed a rare race-condition with GC and CGO.
- Minor typos and reformatting (no execution code change).
- Added various badges to README.md.
v0.6.0 - 2023/08/07
- FFT, RealFFT, InverseFFT and InverseRealFFT operations.
  - Added a small notebook demo for FFT.
- Added Complex/Imag/Real/Conj operations to manipulate complex numbers (and their gradients).
- Added support for complex numbers in ConvertType. Defined the gradient for ConvertType.
- Added Complex128 and Complex64 dtype support.
- Added "spacers" (like "*" for axis ranges) and `AxisElem()` for `Slice()`.
- Package `examples/notebook/gonb/margaid`: added `Plots.AddValues` and `Plots.PlotToHTML`; fixed `Plots` returned by `New()` to be linear scale by default.
- Included a build of tcmalloc (`gperftools`) from the `c/` directory when building `libgomlx_xla.so`. The `libtcmalloc.so` is still needed at runtime; a copy is included in the `gomlx_xla.tar.gz` package (under `lib/gomlx`) and can be copied from there if needed. This enables builds for Macs; see #23.
v0.5.0 - 2023/07/10
- Error handling revamp: using `panic` to report errors – they work like exceptions. This is a very large change affecting most of the code.
- Added `NewManager`, a simpler interface to create a `Manager` object with defaults.
- Added `margaid.NewDefault`, simplifying the addition of plots for the default cases.
- Examples:
  - UCI-Adult: replaced `adult.Dataset` with the much simpler and more powerful `data.InMemoryDataset`.
- Removed `tensor.Local.Data()`: all access is now done through `tensor.Local.AcquireData()` and release, to prevent a race condition with the garbage collector.
- Updated the XLA C++ library.
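The panic-as-exceptions pattern means deep graph-building code panics with an error value and a top-level wrapper recovers it. A minimal sketch of the idea in plain Go – the `TryCatch` and `buildGraph` names here are illustrative assumptions, not GoMLX's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// TryCatch runs fn and converts a panic carrying an error back into a
// returned error, so callers can use ordinary Go error handling.
func TryCatch(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if e, ok := r.(error); ok {
				err = e
			} else {
				err = fmt.Errorf("panic: %v", r)
			}
		}
	}()
	fn()
	return nil
}

// buildGraph stands in for graph-building code that reports errors by
// panicking instead of returning them.
func buildGraph(ok bool) {
	if !ok {
		panic(errors.New("invalid shape")) // reported like an exception
	}
}

func main() {
	fmt.Println(TryCatch(func() { buildGraph(true) }))  // <nil>
	fmt.Println(TryCatch(func() { buildGraph(false) })) // invalid shape
}
```

The advantage is that intermediate graph-building code stays free of `if err != nil` boilerplate, while the boundary to normal Go code still exposes a plain `error`.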
v0.4.1
- Diffusion example: Added conditioning on flower type; Improved documentation; several other small improvements.
- NanLogger: added tool to report back (with stack trace and scope) on the occurrences of NaN/Inf in the computation graph.
- Checkpoints: added the `Handler.LoadedVariables()` method for inspection of a loaded checkpoint.
- Bug fixes:
  - RandomNormal: fixed rare numerical issues that would generate -Inf.
  - Context: fixed a rare condition when feeding variable values to the executor.
  - InMemory dataset: handled cases where the dataset returns the same tensor as input and label.
- Slices: refactored `IotaSlice()` to `Iota[T number]()`.
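The `IotaSlice()` → `Iota[T number]()` refactor replaces a fixed-type helper with a Go generic. A minimal sketch of what such a generic might look like – the `number` constraint and exact signature below are assumptions for illustration, not GoMLX's actual code:

```go
package main

import "fmt"

// number mirrors a numeric-constraint interface; the actual GoMLX
// constraint may include more types.
type number interface {
	~int | ~int32 | ~int64 | ~float32 | ~float64
}

// Iota returns the slice [0, 1, ..., n-1] in the requested numeric type,
// replacing a non-generic IotaSlice() fixed to a single element type.
func Iota[T number](n int) []T {
	s := make([]T, n)
	for i := range s {
		s[i] = T(i)
	}
	return s
}

func main() {
	fmt.Println(Iota[int](3))     // [0 1 2]
	fmt.Println(Iota[float32](4)) // [0 1 2 3]
}
```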
v0.4.0
- Models: Diffusion example model (working draft); added Kernel Inception Distance (KID) metric implementation.
- Contexts: added `context.NumParameters()`, `context.Memory()`, `context.RandomUniform`, `context.RandomNormal`, `context.RngStateWithSeed` and `context.RngStateReset`.
- Random numbers revamped, making the graph purely functional. Also, `context.Context` provides the facilities to carry around random-number-generator state.
- Added ops: `ArgMax`, `ArgMin`, `ExpandLeftToRank`, `RandomUniform` and `RandomNormal`.
- Datasets: `InMemoryFromData` (for testing); `Normalization()` returns the mean and standard deviation of a dataset; `Map()` creates a new dataset that maps a function over the wrapped dataset; `Take(n)` takes n elements from a dataset.
- Layers: Added `layers.Activation`, which takes the activation type as a string (easy to plug into a flag).
- Metrics: added context as the first parameter to `metrics.BaseMetricGraph`.
- Plots (margaid): added support for saving and restoring points (when continuing training); optional log-scale plots; allow an arbitrary rate of updates; added support for loading data from multiple models.
- Losses: added `losses.MeanAbsoluteError`.
- Optimizers: added `optimizers.GetGlobalStepVar`.
- Training loop (`train.Loop`): added `MeanTrainingStepDuration()`; checks for infinity and NaN losses – training is immediately interrupted with an error.
- Added to the slices package: `Flag()`, `At()`, `Last()`, `Copy`.
- Force download of the correct version of the C++ library in the Jupyter docker – this prevents the Docker cache from using an older version.
- Improved error messages in some cases.
- Tensors: added new dtypes `UInt32` and `UInt64`; changed the return type of `tensor.FromAnyValue()` to `tensor.Tensor`.
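The `Map()` and `Take(n)` dataset helpers wrap an underlying dataset and transform or truncate its stream. Their behavior can be sketched with a simplified iterator-style interface – the `Dataset` interface and struct names below are illustrative assumptions, not the actual `train.Dataset` API:

```go
package main

import (
	"fmt"
	"io"
)

// Dataset is a simplified stand-in for a yield-style dataset:
// Next returns the next value, or io.EOF when exhausted.
type Dataset interface{ Next() (int, error) }

// sliceDS yields values from a fixed slice.
type sliceDS struct {
	data []int
	pos  int
}

func (d *sliceDS) Next() (int, error) {
	if d.pos >= len(d.data) {
		return 0, io.EOF
	}
	v := d.data[d.pos]
	d.pos++
	return v, nil
}

// mapDS applies fn to every element of the wrapped dataset, like Map().
type mapDS struct {
	ds Dataset
	fn func(int) int
}

func (m *mapDS) Next() (int, error) {
	v, err := m.ds.Next()
	if err != nil {
		return 0, err
	}
	return m.fn(v), nil
}

// takeDS yields at most `left` elements of the wrapped dataset, like Take(n).
type takeDS struct {
	ds   Dataset
	left int
}

func (t *takeDS) Next() (int, error) {
	if t.left == 0 {
		return 0, io.EOF
	}
	t.left--
	return t.ds.Next()
}

// collect drains a dataset into a slice.
func collect(ds Dataset) []int {
	var out []int
	for {
		v, err := ds.Next()
		if err != nil {
			return out
		}
		out = append(out, v)
	}
}

func main() {
	base := &sliceDS{data: []int{1, 2, 3, 4, 5}}
	ds := &takeDS{ds: &mapDS{ds: base, fn: func(v int) int { return v * 10 }}, left: 3}
	fmt.Println(collect(ds)) // [10 20 30]
}
```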
v0.3.1
- DogsVsCats: added inception model type; fix of metrics types for plotting.
- BatchNormalization: differentiable inference code; added Trainable() support.
- Fixed notebooks broken with v0.3.0 changes.
- Skip plotting batch loss (we keep the moving average of the batch loss though).
v0.3.0, 2023-06-01
- Inception V3 model: including downloading pre-trained weights and various configurations.
- Tensors: added Load, Save for Local tensors.
- Added HDF5 format support for loading values.
- Skip evaluation during test of demos.
- Fixed dogsvscat demo’s inconsistent mixed datasets issue, by yielding a correct spec.
- Added SumPool and MeanPool.
- Changed API for defining images channels axis configuration (in pooling and convolution operations).
v0.2.1, 2023-05-20
- Tensors: cleanup; fixed a memory race (Go’s GC not knowing about the C++ pointers); improved docs and tests.
- Created tests from the Adult, Cifar, “Dog vs Cats” and Imdb demos.
v0.2.0, 2023-05-18
- Added Oxford Flowers 102 Dataset example (no model yet).
- Added Datasets tools: Parallel (improved), Batch, InMemory.
- Added ops: GatherSlices (and its gradient), EinsumAxes, MaxScalar, MinScalar, ExpandAndBroadcast.
- Added Interpolate operation – for series/image/video resizing.
- Added support for int32 (`shapes.I32`) and uint8 (`shapes.UInt8`, or `shapes.U8` for short).
- Added `Set[]` to the `types` package.
- Added `types/tensor/image` with image conversion tools to/from tensors.
- Added `Serialize` and `Deserialize` for `tensor.Local` and `Shape`.
- Fixed issue with `tensor.Device` not using the correct `clientId`.
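A generic `Set[]` in Go is typically a thin wrapper around `map[T]struct{}`. The sketch below shows the idea – the method names are assumptions for illustration and may not match the `types` package exactly:

```go
package main

import "fmt"

// Set is a generic set of comparable values, backed by a map whose
// struct{} values occupy no memory.
type Set[T comparable] map[T]struct{}

// Insert adds the given keys to the set; duplicates are collapsed.
func (s Set[T]) Insert(keys ...T) {
	for _, k := range keys {
		s[k] = struct{}{}
	}
}

// Has reports whether k is a member of the set.
func (s Set[T]) Has(k T) bool {
	_, ok := s[k]
	return ok
}

func main() {
	s := make(Set[string])
	s.Insert("cat", "dog", "cat")
	fmt.Println(s.Has("cat"), s.Has("bird"), len(s)) // true false 2
}
```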
v0.1.1, 2023-04-29
- Small fixes to example notebooks.
- Added documentation to the various dataset libraries.
- Renamed release asset not to include the version name to simplify downloading the latest one.
- Updated `docker/jupyterlab` for the new release.
v0.1.0, 2023-04-28
- Updated OpenXLA/XLA dependency to the current at 2023-04-28.
- Added `docker/devel` for development and building the Go/C++ bridge library.
- Changed the `Exec.Call` method to return an error directly.
- Added the `docker/` subdirectory.
- Added `docker/jupyterlab`: a docker image that includes JupyterLab and GoNB for quickly getting started. Available at janpfeifer/gomlx_jupyterlab for now.
- Fixed various documentation issues.
- Tutorial cleanup; added links.
- Fixed a cache issue in `ParallelDataset`.
v0.0.1
- Initial upload of experimental but functional GoMLX including examples.
Last updated April 25, 2026