Release Notes
All notable changes to this project will be documented in this file.
Version 0.3.x
v0.3.0 (2026-05-09)
Major refresh of the user-facing API around a single MuData object. The new entry points (setup_mudata, MIDAS(mdata), get_latent_representation, get_imputed_values, save / load) compose directly with mdata.obsm, sc.pp.neighbors(use_rep=...), and the rest of the standard single-cell stack. A new plotting namespace, scmidas.pl, and a data-prep tutorial round out the package for users coming straight from raw 10x output.
New: MIDAS entry points centred on MuData
MIDAS.setup_mudata(mdata, batch_key=...): register a MuData (writes config to mdata.uns['_scmidas']).
MIDAS(mdata, ...): construct directly from a registered MuData; uses instance state instead of class-level state (fixes a multi-instance interference bug).
model.get_latent_representation(kind='c'|'u'|'joint'): returns the joint latent aligned to mdata.obs_names. Drop straight into mdata.obsm['X_midas'].
model.get_imputed_values(modality='rna'): returns imputed counts aligned to mdata.obs_names.
model.save(dir) / MIDAS.load(dir, mdata): symmetric save/load (writes model.pt + setup.json).
MIDAS(mdata) now defaults to transform={'atac': 'binarize'} whenever 'atac' is among the modalities (override by passing your own transform dict).
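The entry points above might be chained roughly as in the following sketch. The import path scmidas.model.MIDAS, the 'batch' column name, and the argument-free model.train() call are assumptions that these notes do not confirm; the helper name integrate_with_midas is hypothetical.

```python
def integrate_with_midas(mdata, out_dir, batch_key="batch"):
    """Register, train, embed, and save: a sketch of the v0.3.0 flow."""
    # Assumed import path. scmidas is imported lazily so that this
    # definition loads even where the package is not installed.
    from scmidas.model import MIDAS

    MIDAS.setup_mudata(mdata, batch_key=batch_key)  # writes mdata.uns['_scmidas']
    model = MIDAS(mdata)
    model.train()  # assumption: no required arguments

    # The latent rows are aligned to mdata.obs_names, so they can be
    # stored directly under an obsm key.
    mdata.obsm["X_midas"] = model.get_latent_representation(kind="c")
    model.save(out_dir)  # writes model.pt + setup.json
    return model
```

The stored embedding can then be handed to downstream tools through use_rep='X_midas', and MIDAS.load(out_dir, mdata) restores the model in a later session.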
New: scmidas.pl plotting namespace
scmidas.pl.umap(mdata, basis='X_midas', color=[...]): one-line UMAP that works around the current scanpy + MuData plotting limitations via a thin AnnData wrapper.
scmidas.pl.modality_grid(model, mdata, label_key=...): collapses the per-modality vs per-batch grid (~22 lines in the previous demos) into one call. Modality columns are ordered ATAC, RNA, ADT, Joint when present.
New: scmidas.datasets.from_dir
Loads the directory-format datasets (mat/<m>.mtx, mask/<m>.csv, feat/feat_dims.toml) into a MuData, including masks, labels, and ATAC chunk dims.
New tutorial: Preparing your data
docs/source/tutorials/basics/preparing_your_data.ipynb walks from a public 10x Genomics 5k PBMC CITE-seq sample through QC, HVG selection, MuData wrap, MIDAS integration, Leiden clustering, and a synthetic mosaic example.
Docs cleanup
inputs.rst + outputs.rst merged into data_layout.rst, a single page describing the MuData input/output contract. The directory format is moved to an "advanced" section.
All three demos (demo1, demo2, demo3) rewritten to use the new API: from_dir → setup_mudata → MIDAS(mdata) → get_latent_representation. The 22-line per-modality grid block became scmidas.pl.modality_grid(model, mdata). Each demo gained a 6.4 "After integration" section (Leiden + UMAP).
README adds a "Bring your own data" section linking the new tutorial and the data-layout reference.
Backwards compatibility
MIDAS.configure_data_from_mdata and MIDAS.configure_data_from_dir still work; they emit a DeprecationWarning and will be removed in 0.4.0.
save_checkpoint / load_checkpoint still work; new code should use save / load.
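A deprecation shim of this kind is commonly written as below. This is a generic sketch with hypothetical names (Model, old_entry_point, new_entry_point), not the scmidas implementation.

```python
import warnings


class Model:
    """Toy stand-in for a class that renamed one of its entry points."""

    def new_entry_point(self, data):
        return f"configured {data}"

    def old_entry_point(self, data):
        # The old name stays as a thin shim that forwards to the new one.
        warnings.warn(
            "old_entry_point is deprecated; use new_entry_point instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.new_entry_point(data)
```

Running a script with python -W error::DeprecationWarning turns such warnings into exceptions, which is a quick way to find remaining calls to deprecated names before they are removed.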
Fixes
predict(joint_latent=False) no longer raises KeyError: 'z_c'.
Multiple MIDAS() instances in one process now have independent state (state was previously class-level, so a second instance would clobber the first).
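The multi-instance bug is an instance of a general Python pitfall, shown here with toy classes. SharedConfig and OwnConfig are hypothetical names for illustration and are not taken from scmidas.

```python
class SharedConfig:
    """Buggy pattern: the dict lives on the class, so instances share it."""

    settings = {}

    def __init__(self, name):
        self.settings["name"] = name  # mutates the single class-level dict


class OwnConfig:
    """Fixed pattern: each instance builds its own dict in __init__."""

    def __init__(self, name):
        self.settings = {"name": name}


first, second = SharedConfig("a"), SharedConfig("b")
print(first.settings["name"])  # b  (the second instance overwrote the first)

first, second = OwnConfig("a"), OwnConfig("b")
print(first.settings["name"])  # a  (state is independent)
```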
Version 0.2.x
v0.2.0 (2026-05-03)
New: scmidas.integrate(mdata) one-line entry point
A thin top-level wrapper around MIDAS.configure_data_from_mdata and train() with toy-tuned defaults (batch_size=128, max_epochs=65, lr=3e-4) so that the bundled quickstart dataset converges in roughly one minute on a single mid-range GPU. The full MIDAS class API is unchanged for users who need control.
Warning: the defaults are tuned for the toy quickstart only. For real datasets, override max_epochs (1000-2000) and consider batch_size=256.
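Overriding the toy defaults might look like the sketch below. It assumes that integrate() accepts max_epochs and batch_size as keyword arguments, which the wording above suggests but does not spell out; the wrapper name integrate_real_dataset is hypothetical.

```python
def integrate_real_dataset(mdata, max_epochs=1000, batch_size=256):
    """Call scmidas.integrate with the toy defaults overridden."""
    import scmidas  # imported lazily; requires scmidas >= 0.2.0

    # Assumption: integrate() forwards these keyword arguments to training.
    # Its return value is passed through unchanged because it is not
    # documented in these notes.
    return scmidas.integrate(mdata, max_epochs=max_epochs, batch_size=batch_size)
```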
New: bundled quickstart dataset
scmidas.datasets.quickstart() returns a 1600-cell PBMC RNA+ADT mosaic MuData (4 batches, full mosaic structure: one RNA-only, one ADT-only, two paired). 500 RNA HVGs + 224 ADT features, 2.66 MB shipped inside the wheel.
Source: hand-tuned subset of wnn_mosaic_8batch_mtx. Build script: scripts/build_quickstart_demo.py.
Documentation
New examples/quickstart.ipynb: pre-rendered notebook that users can open in Colab via the new badge in the README, no local install required.
README quickstart rewritten: replaces the previous ... API sketch with a runnable five-line snippet using scmidas.datasets.quickstart() + scmidas.integrate(), followed by the rendered UMAP image.
Packaging
pyproject.toml ships data/*.h5mu as package data so the quickstart dataset travels with the wheel.
Module-level logging.basicConfig(level=INFO) removed from five files (config, data, model, nn, utils); each now does the canonical logger = logging.getLogger(__name__) instead. Demo notebooks call logging.basicConfig themselves so visible output is unchanged. Libraries should not call basicConfig: doing so at import time can pre-empt or override the user's own logging configuration.
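The library-side logging pattern can be sketched as follows. The logger name "mylib.model" is a placeholder for what logging.getLogger(__name__) would produce inside a real module, and fit() is a hypothetical function.

```python
import logging

# Library module: obtain a named logger and never call basicConfig here.
logger = logging.getLogger("mylib.model")  # normally logging.getLogger(__name__)


def fit():
    """Hypothetical library function that reports progress via the logger."""
    logger.info("training started")
    return "done"


# Application or notebook: the user decides whether and how logs are shown.
if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    fit()
```

With this split, importing the library attaches no handlers, and the INFO message appears only when the caller has configured logging.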
Version 0.1.x
v0.1.19 (2026-05-03)
Packaging: narrow torch upper bound to <2.11
torch 2.11 dropped Volta (V100, CC 7.0) and Pascal (P100, GTX 10xx, CC 6.x) from its default cu128 / cu129 wheels (to ship cuDNN 9.15.1, which is incompatible with those archs). On those GPUs, pip install scmidas==0.1.18 would silently install a torch that fails at the first CUDA op with "no kernel image is available for execution on the device".
The pin now reads torch>=2.5,<2.11 (with matching torchvision<0.26 / torchaudio<2.11). Users on Ampere/Hopper/Ada/Blackwell GPUs can manually upgrade past the cap; users on Volta/Pascal stay on a working default install.
No source-code change: same scmidas as 0.1.18.
Enhancements
import scmidas now runs a one-time GPU self-check: if the local torch wheel has no kernels for the local GPU, scmidas emits a UserWarning with actionable guidance (downgrade torch or use the cu126 wheel) instead of the user later seeing a raw "no kernel image is available" error from somewhere deep in their training loop. The check no-ops on CPU-only environments and on working GPU setups.
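A self-check along these lines could be sketched as below. This is not the scmidas implementation: the function names are hypothetical, and the compatibility rule is a simplification that considers only compiled sm_XY targets and ignores PTX (compute_XY) and suffixed entries.

```python
import re
import warnings


def has_compatible_kernel(capability, arch_list):
    """Return True if any compiled sm_XY target can run on the device.

    A binary built for sm_XY runs on devices with the same major version
    and an equal or higher minor version.
    """
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)(\d)", arch)
        if match is None:
            continue  # skip PTX-only ("compute_XY") and suffixed entries
        arch_major, arch_minor = int(match.group(1)), int(match.group(2))
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def gpu_self_check():
    """Warn once if the installed torch build lacks kernels for the GPU."""
    try:
        import torch
    except ImportError:
        return  # nothing to check without torch
    if not torch.cuda.is_available():
        return  # CPU-only environment: no-op
    capability = torch.cuda.get_device_capability()
    if not has_compatible_kernel(capability, torch.cuda.get_arch_list()):
        warnings.warn(
            f"The installed torch build has no kernels for compute capability "
            f"{capability[0]}.{capability[1]}; consider downgrading torch or "
            f"using a cu126 wheel.",
            UserWarning,
            stacklevel=2,
        )
```

Keeping the comparison in a pure function makes it testable without a GPU; only gpu_self_check() touches torch.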
CI
Test matrix gained a torch 2.10 job (the new upper bound) and dropped the previous experimental torch latest job. Lower bound remains torch 2.5.1 across Python 3.10 / 3.11 / 3.12.
v0.1.18 (2026-05-02)
Bug Fixes (DDP + mosaic data)
Default sampler_type='auto' now picks the DDP sampler when a process group is initialized. Previously 'auto' silently fell back to MultiBatchSampler (a rank-agnostic sampler), so DDP runs computed each batch on every rank in parallel: correct, but with no throughput gain over single-GPU. Users who already passed sampler_type='ddp' explicitly are unaffected.
MyDistributedSampler now derives its shuffle order from a seeded random.Random instance (cross-rank-consistent for the dataset visit order, rank-specific for the within-dataset shuffle), and properly initialises the base DistributedSampler. Previously it used the global Python random module, so each DDP rank sampled a different sub-batch at the same step. With non-uniform per-sub-batch modality combinations (mosaic data), this produced different encoder graphs per rank and caused NCCL all-reduce to hang under find_unused_parameters=False (Lightning default), eventually triggering a watchdog timeout.
Heads-up (DDP reproducibility): the DDP sampling order has changed as a side-effect of the fix. Existing seeded DDP runs will produce different numerics; checkpoints from prior versions still load and continue training, but the post-fix sampling sequence is not bit-equivalent to the pre-fix one. Single-GPU users (using MultiBatchSampler) are unaffected.
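The seeding idea behind the sampler fix can be illustrated with a standard-library sketch. It is not MyDistributedSampler itself: plan_epoch is a hypothetical function, the string-based seeds are one possible choice, and padding of uneven shards is omitted.

```python
import random


def plan_epoch(dataset_sizes, seed, epoch, rank, world_size):
    """Return [(dataset_id, local_indices)] for one rank and one epoch."""
    # Shared stream: seeded without the rank, so every rank visits the
    # datasets in the same order at the same step.
    shared = random.Random(f"{seed}-{epoch}")
    order = list(range(len(dataset_sizes)))
    shared.shuffle(order)

    # Rank-specific stream: shuffles only this rank's shard.
    local = random.Random(f"{seed}-{epoch}-{rank}")
    plan = []
    for dataset_id in order:
        # Strided sharding keeps the ranks' index sets disjoint.
        shard = list(range(rank, dataset_sizes[dataset_id], world_size))
        local.shuffle(shard)
        plan.append((dataset_id, shard))
    return plan
```

Because the dataset order comes from a stream that does not depend on the rank, all ranks process the same modality combination at each step, which is the property the fix restores; the global random module gives no such guarantee across processes.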
Bug Fixes (API hardening)
MIDAS.configure_optimizers no longer raises AttributeError when entered through the simpler configure_data path (load_optimizer_state was only set by configure_data_from_dir / configure_data_from_mdata / load_checkpoint).
MIDAS.configure_data default batch_names now use f-string formatting (f'batch_{i}') instead of the literal string 'batch_%d' repeated len(datalist) times.
Bad ATAC configuration in configure_data now raises ValueError instead of calling exit() (which killed the Jupyter kernel without a traceback).
download_file now accepts both str and pathlib.Path for dest_path. The signature was annotated str but the body called .name.
Encoder.forward no longer mutates the caller's batch dict. The mask multiply is now out-of-place; the previous in-place data[m] *= mask corrupted upstream tensors for any modality without a trsf_before_enc_* transform. Mathematically equivalent (the mask is a 0/1 modality-presence indicator, and calc_recon_loss already multiplies the loss by the same mask), but makes the encoder safe to re-call on the same batch (e.g. predict's mod_latent / translate paths).
VAE.forward no longer wraps the PoE call in a bare try/except that swallowed real errors with a malformed logging.debug call.
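Two of the fixes above are easy to show in isolation. The snippet reconstructs the batch-name behaviour from the description (the exact old code is not shown in these notes) and sketches one common way to accept both str and Path; dest_name is a hypothetical helper, not the download_file signature.

```python
from pathlib import Path

datalist = ["rna_only", "adt_only", "paired"]  # placeholder inputs

# Old behaviour as described: the unformatted template repeated N times.
buggy = ["batch_%d"] * len(datalist)

# Fixed behaviour: one distinct name per entry.
fixed = [f"batch_{i}" for i in range(len(datalist))]

print(buggy)  # ['batch_%d', 'batch_%d', 'batch_%d']
print(fixed)  # ['batch_0', 'batch_1', 'batch_2']


def dest_name(dest_path):
    """Accept str or Path by normalising to Path before using .name."""
    return Path(dest_path).name
```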
Tests
Added tests/test_invariants.py pinning down the bugs above plus the DDP sampler determinism fix (cross-rank disjoint indices, set_epoch actually changes ordering).
Documentation
Each basics demo now exposes a single # === GPU configuration === block (GPUS + STRATEGY) at the top so switching from single-GPU to multi-GPU only requires editing two values.
Removed the redundant standalone advanced/multi_gpu.rst tutorial; its contents now live inline in the basics demos where the failure modes would actually be encountered.
README: removed the duplicated MuData section (the from_mdata path is one link away in the docs), corrected the Quick Example comment about input format, and fixed the License badge link.
Packaging
Version is now single-sourced from pyproject.toml; scmidas.__version__ and the Sphinx release both read it via importlib.metadata.version("scmidas") instead of duplicating the literal in three files.
Relaxed the torch pin from >=2.5,<2.6 to >=2.5,<3 (and the matching torchvision / torchaudio companions). The previous <2.6 cap was a workaround for a suspected Lightning-DDP incompatibility; torch 2.8 has now been verified end-to-end in the mosaic DDP path (1000-epoch run with UMAP and numerics consistent with the single-GPU baseline), so users on torch 2.6 / 2.7 / 2.8 no longer have to manually override the pin.
v0.1.17 (2026-03-17)
Bug Fixes
Remove multi-threading for UMAP visualization during training.
v0.1.16 (2026-03-05)
Enhancements
Asynchronous UMAP visualization during training
UMAP plots can now be generated asynchronously to avoid blocking the training loop.
Improved prediction API
Flexible prediction outputs: choose between returning results in memory or saving directly to disk.
Support streaming prediction to disk, enabling inference on large datasets with minimal memory usage.
Added support for .npy format for faster saving and loading of prediction outputs.
More flexible load_predicted function
Allow loading specific batch names and variable groups (e.g. z_c, z_u, x_impt).
Improves efficiency when working with large prediction outputs.
Add model.train()
Bug Fixes
Allow loading specific batch names and variable groups (e.g. z_c, z_u, x_impt).
Delete init in the sampler class. Issue #28.
v0.1.15 (2025-11-28)
Enhancements
Updated MIDAS.predict() functionality:
Added support for AnnData output format.
Optimized output handling to reduce GPU memory usage (offloaded outputs to CPU).
Batch Handling: Added support for automatically fetching batch names.
Bug Fixes
Removed redundant RNA and ADT layers.
v0.1.13 (2025-08-28)
Documentation
Enhanced and clarified tutorials for a better user learning experience.
v0.1.12 (2025-07-03)
Enhancements
Updated the source for fetching pre-trained models to ensure reliability.
v0.1.10 (2025-07-02)
Enhancements
Adjusted default layer dimensions for the ATAC encoder/decoder (dims_before_enc_atac=[128, 32] and dims_after_dec_atac=[32, 128]) to improve model performance.
v0.1.9 (2025-06-23)
Miscellaneous
Updated the minimum required Python version to >=3.10.
v0.1.8 (2025-06-12)
New Features
Added support for the .mtx (Matrix Market) input format for broader data compatibility.
Introduced live UMAP visualization during training via TensorBoard. This can be enabled with MIDAS.configure_data_from_dir(viz_umap_tb=True).
Added a new utility function data.download_models() to easily fetch pre-trained models.
Enhancements
The MIDAS.predict() method now returns prediction results directly, improving efficiency and making it easier to chain operations.
Updated the demonstration dataset with more relevant examples.
Bug Fixes
Fixed a critical bug that prevented the optimizer from being re-initialized after loading a checkpoint with MIDAS.load_checkpoint().
Documentation
Updated and expanded documentation and tutorials to reflect recent changes.
v0.1.7 (2025-01-22)
Bug Fixes
Resolved a bug reported in Issue #22.
v0.1.6 (2025-01-20)
Bug Fixes
Fixed a bug where the dims_before_enc_atac configuration was applied incorrectly. It is now used only when multiple ATAC input dimensions are provided.
v0.1.5 (2025-01-17)
Bug Fixes
Fixed an issue where Gaussian sampling was incorrectly performed during inference for modality-specific embeddings. With the fix, these embeddings are more deterministic.
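The distinction between training-time sampling and inference can be sketched with a generic VAE-style helper. The (mu, logvar) parameterisation and the function name sample_latent are assumptions for illustration; the MIDAS code itself is not shown in these notes.

```python
import math
import random


def sample_latent(mu, logvar, training, rng=random):
    """Reparameterised sample in training, posterior mean at inference."""
    if not training:
        return list(mu)  # deterministic: no noise is added
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)  # mu + sigma * eps
        for m, lv in zip(mu, logvar)
    ]
```

Calling the helper with training=False returns the mean unchanged on every call, whereas training=True draws fresh noise unless a seeded random.Random is passed as rng.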
v0.1.4 (2024-12-31)
Bug Fixes
Corrected the data loading logic in MIDAS.get_emb_umap() by fixing the load_predicted() utility.
v0.1.3 (2024-12-21)
New Features
Integrated with PyTorch Lightning to enable streamlined multi-GPU training.
Integrated with TensorBoard to facilitate real-time visualization of training and validation losses.
Refactored the MIDAS architecture to support easier integration of new custom modalities.
Version 0.0.x
v0.0.18 (2024-07-29)
Bug Fixes
In utils.viz_mod_latent(), rotated the visualization for better interpretation and fixed a bug that caused an error when processing a batch of inputs.
v0.0.17 (2024-07-16)
New Features
Added the eval_mod() function for modality evaluation.
Added the skip_s parameter to init_model() for more flexible model initialization.
Enhancements
Removed the deprecated eval_scmib() function.
Documentation
Added Tutorial 3, covering new evaluation methods.
v0.0.16 (2024-07-11)
Bug Fixes
Fixed an issue in utils.load_predicted() as reported in Issue #5.
v0.0.15 (2024-07-11)
Bug Fixes
Fixed an issue in the reduce_data() function.
Corrected the sorting logic in utils.ref_sort() as reported in Issue #9.
v0.0.14 (2024-07-04)
Enhancements
Improved compatibility and performance on Windows operating systems.
Enhanced functionality for environments without GPU support.
v0.0.13 (2024-07-04)
New Features
Introduced scmidas.datasets.GenDataFromPath() for more flexible data input from custom paths.
Added viz_diff() and viz_mod_latent() for advanced visualizations.
Added new evaluation functions.
Enhancements
Renamed pack() to reduce_data() for better clarity.
Bug Fixes
Addressed several minor bugs to improve stability.
Miscellaneous
Upgraded the minimum required Python version from 3.8 to >=3.9 to accommodate scib dependencies.
Documentation
Updated all tutorials to align with the latest API changes.
v0.0.8 (2024-06-20)
Initial Release
First public version of the project.