astir.models package

Module contents

Classes:

AstirModel(dset, random_seed, dtype[, device])

Abstract base class for statistical inference; the superclass of CellTypeModel and CellStateModel.

CellStateModel([dset, const, dropout_rate, …])

Class to perform statistical inference on the activation of states (pathways) across cells.

CellTypeModel([dset, random_seed, dtype, device])

Class to perform statistical inference to assign cells to cell types.

StateRecognitionNet(C, G[, const, …])

State Recognition Neural Network to get mean of z and standard deviation of z.

TypeRecognitionNet(C, G[, hidden_size])

Type Recognition Neural Network.

class astir.models.AstirModel(dset, random_seed, dtype, device=device(type='cpu'))

Bases: object

Abstract class to perform statistical inference. It is the superclass of CellTypeModel and CellStateModel and is not meant to be instantiated directly.

Methods:

fit(max_epochs, learning_rate, batch_size, …)

Runs training loops until the change in loss over the last delta_loss_batch epochs falls below delta_loss, or until max_epochs epochs have run

get_assignment()

Get the final assignment of the dataset.

get_data()

Get model data

get_losses()

Getter for losses.

get_scdataset()

Getter for the SCDataset.

get_variables()

Returns all variables

is_converged()

Returns True if the model converged

fit(max_epochs, learning_rate, batch_size, delta_loss, delta_loss_batch, msg)

Runs training loops until the change in loss over the last delta_loss_batch epochs falls below delta_loss, or until max_epochs epochs have run

Return type

None
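The stopping rule described for fit() can be sketched in plain Python. This is a minimal sketch under stated assumptions, not astir's implementation: it assumes one loss value is recorded per epoch, and that convergence means the relative change between the mean loss of the last delta_loss_batch epochs and the same window shifted back by one epoch drops below delta_loss. The helpers has_converged and train are hypothetical names used only for illustration.

```python
from typing import Callable, List, Tuple


def has_converged(losses: List[float], delta_loss: float = 0.001,
                  delta_loss_batch: int = 10) -> bool:
    """Hypothetical stopping rule (an assumption, not astir's code)."""
    # Need delta_loss_batch + 1 values to form two overlapping windows.
    if len(losses) <= delta_loss_batch:
        return False
    curr = sum(losses[-delta_loss_batch:]) / delta_loss_batch
    prev = sum(losses[-delta_loss_batch - 1:-1]) / delta_loss_batch
    if prev == 0:
        return curr == 0
    # Assumed criterion: relative change between the two window means.
    return abs(prev - curr) / abs(prev) < delta_loss


def train(step: Callable[[], float], max_epochs: int = 50,
          delta_loss: float = 0.001,
          delta_loss_batch: int = 10) -> Tuple[List[float], bool]:
    """Run up to max_epochs epochs, stopping early once has_converged is True."""
    losses: List[float] = []
    for _ in range(max_epochs):
        losses.append(step())  # one epoch of training, returning its loss
        if has_converged(losses, delta_loss, delta_loss_batch):
            return losses, True
    return losses, False  # hit max_epochs without converging
```

Here step stands in for one epoch of training that returns that epoch's loss; with a constant loss the sketch stops after delta_loss_batch + 1 epochs, the first point at which two windows can be compared.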

get_assignment()

Get the final assignment of the dataset.

Return type

DataFrame

Returns

the final assignment of the dataset

get_data()

Get model data

Return type

Dict[str, Tensor]

Returns

data

get_losses()

Getter for losses.

Return type

Tensor

Returns

self.losses

get_scdataset()

Getter for the SCDataset.

Return type

SCDataset

Returns

self._dset

get_variables()

Returns all variables

Return type

Dict[str, Tensor]

Returns

self._variables

is_converged()

Returns True if the model converged

Return type

bool

Returns

self._is_converged

class astir.models.CellStateModel(dset=None, const=2, dropout_rate=0, batch_norm=False, random_seed=42, dtype=torch.float64, device=device(type='cpu'))

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference on the activation of states (pathways) across cells

Parameters
  • dset (Optional[SCDataset]) – the input gene expression dataset, defaults to None

  • const (int) – See parameter const in astir.models.StateRecognitionNet(), defaults to 2

  • dropout_rate (float) – See parameter dropout_rate in astir.models.StateRecognitionNet(), defaults to 0

  • batch_norm (bool) – See parameter batch_norm in astir.models.StateRecognitionNet(), defaults to False

  • random_seed (int) – the random seed number to reproduce results, defaults to 42

  • dtype (dtype) – torch datatype to use in the model, defaults to torch.float64

  • device (device) – the torch.device to run the model on (CPU or GPU), defaults to torch.device(“cpu”)

Methods:

diagnostics()

Run diagnostics on cell state assignments

fit([max_epochs, learning_rate, batch_size, …])

Runs training loops until the change in loss over the last delta_loss_batch epochs falls below delta_loss, or until max_epochs epochs have run

get_correlations()

Returns a C (number of pathways) x G (number of proteins) matrix in which each element is the correlation between a pathway and a protein

get_final_mu_z([new_dset])

Returns the mean of the predicted z values for each core

get_recognet()

Getter for the recognition net

load_hdf5(hdf5_name)

Initializes the cell state model from an HDF5 file

diagnostics()

Run diagnostics on cell state assignments

See astir.Astir.diagnostics_cellstate() for full documentation

Return type

DataFrame

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')

Runs training loops until the change in loss over the last delta_loss_batch epochs falls below delta_loss, or until max_epochs epochs have run

Parameters
  • max_epochs (int) – number of train loop iterations, defaults to 50

  • learning_rate (float) – the learning rate, defaults to 0.001

  • batch_size (int) – the batch size, defaults to 128

  • delta_loss (float) – training stops once the change in loss falls below delta_loss, defaults to 0.001

  • delta_loss_batch (int) – the number of recent epochs over which the change in loss is measured, defaults to 10

  • msg (str) – message shown on the progress bar, defaults to an empty string

Return type

None

get_correlations()

Returns a C (number of pathways) x G (number of proteins) matrix in which each element is the correlation between a pathway and a protein

Return type

array

Returns

matrix of correlation between all pathway and protein pairs.
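As a rough illustration of what a C x G correlation matrix contains, the sketch below computes a Pearson correlation between each column of an N x C matrix of per-cell pathway scores and each column of an N x G matrix of protein expression. That the model correlates its inferred z values with the input expression, and that the coefficient is Pearson's, are assumptions here; pathway_protein_correlations is a hypothetical helper, and plain lists stand in for arrays.

```python
from math import nan, sqrt
from typing import List

Matrix = List[List[float]]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return nan  # undefined for a constant column
    return cov / sqrt(va * vb)


def pathway_protein_correlations(z: Matrix, y: Matrix) -> Matrix:
    """Hypothetical helper: z is N cells x C pathways, y is N cells x G proteins.

    Returns a C x G matrix; entry [c][g] correlates pathway c with protein g.
    """
    C, G = len(z[0]), len(y[0])
    z_cols = [[row[c] for row in z] for c in range(C)]
    y_cols = [[row[g] for row in y] for g in range(G)]
    return [[pearson(zc, yc) for yc in y_cols] for zc in z_cols]
```

With z = [[1, 0], [2, 1], [3, 0]] and y = [[2, 5], [4, 5.5], [6, 5]], the result is (up to rounding) the identity matrix: each pathway column tracks exactly one protein column and is uncorrelated with the other.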

get_final_mu_z(new_dset=None)

Returns the mean of the predicted z values for each core

Parameters

new_dset (Optional[SCDataset]) – returns the predicted z values of this dataset on the existing model. If None, it predicts using the existing dataset, defaults to None

Return type

Tensor

Returns

the mean of the predicted z values for each core

get_recognet()

Getter for the recognition net

Return type

StateRecognitionNet

Returns

the recognition net

load_hdf5(hdf5_name)

Initializes the cell state model from an HDF5 file

Parameters

hdf5_name (str) – file path

Return type

None

class astir.models.CellTypeModel(dset=None, random_seed=1234, dtype=torch.float64, device=device(type='cpu'))

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference to assign cells to cell types.

Parameters
  • dset (Optional[SCDataset]) – the input gene expression dataset, defaults to None

  • random_seed (int) – the random seed for parameter initialization, defaults to 1234

  • dtype (dtype) – the data type of parameters, should match the dtype of dset, defaults to torch.float64

  • device (device) – the torch.device to run the model on (CPU or GPU), defaults to torch.device(“cpu”)

Methods:

diagnostics(cell_type_assignments, alpha)

Run diagnostics on cell type assignments

fit([max_epochs, learning_rate, batch_size, …])

Runs training loops until the change in loss over the last delta_loss_batch epochs falls below delta_loss, or until max_epochs epochs have run

get_celltypes([threshold, assignment_type, …])

Get the most likely cell types.

get_recognet()

Getter for the recognition net.

load_hdf5(hdf5_name)

Initializes the cell type model from an HDF5 file

plot_clustermap([plot_name, threshold, …])

Save the heatmap of protein content in cells with cell types labeled.

predict(new_dset)

Feed new_dset to the recognition net to get a prediction.

diagnostics(cell_type_assignments, alpha)

Run diagnostics on cell type assignments

See astir.Astir.diagnostics_celltype() for full documentation

Return type

DataFrame

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')

Runs training loops until the change in loss over the last delta_loss_batch epochs falls below delta_loss, or until max_epochs epochs have run

Parameters
  • max_epochs (int) – number of train loop iterations, defaults to 50

  • learning_rate (float) – the learning rate, defaults to 0.001

  • batch_size (int) – the batch size, defaults to 128

  • delta_loss (float) – training stops once the change in loss falls below delta_loss, defaults to 0.001

  • delta_loss_batch (int) – the number of recent epochs over which the change in loss is measured, defaults to 10

  • msg (str) – message shown on the progress bar, defaults to an empty string

Return type

None

get_celltypes(threshold=0.7, assignment_type='threshold', prob_assign=None)

Get the most likely cell types. A cell is assigned to a cell type if the probability is greater than threshold. If no cell types have a probability higher than threshold, then “Unknown” is returned.

Parameters
  • assignment_type (str) – either ‘threshold’ or ‘max’. If ‘threshold’, a cell is assigned to a cell type when that type’s probability is above threshold. If ‘max’, a cell is assigned to the cell type with the highest probability, or “Unknown” if several cell types share the maximum probability. Defaults to ‘threshold’.

  • threshold (float) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7

Return type

DataFrame

Returns

a data frame with the most likely cell type for each cell
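The two assignment rules described above can be illustrated for a single cell in plain Python. This is a sketch, not astir's code: probs is a hypothetical mapping from cell type name to assignment probability, assign_cell_type is a hypothetical helper, and the handling of several types passing a low threshold (take the most probable of them) is an assumption the documentation does not specify.

```python
from typing import Dict


def assign_cell_type(probs: Dict[str, float], threshold: float = 0.7,
                     assignment_type: str = "threshold") -> str:
    """Hypothetical single-cell version of the 'threshold' and 'max' rules."""
    if assignment_type == "threshold":
        # Keep only the types whose probability exceeds the threshold.
        above = [t for t, p in probs.items() if p > threshold]
        if not above:
            return "Unknown"
        # Assumption: if several pass (possible when threshold < 0.5),
        # return the most probable of them.
        return max(above, key=lambda t: probs[t])
    if assignment_type == "max":
        best = max(probs.values())
        winners = [t for t, p in probs.items() if p == best]
        # A tie for the maximum probability yields "Unknown".
        return winners[0] if len(winners) == 1 else "Unknown"
    raise ValueError("assignment_type must be 'threshold' or 'max'")
```

For a cell with probabilities {"T cell": 0.6, "B cell": 0.4}, the default 'threshold' rule gives "Unknown" because nothing exceeds 0.7, while 'max' gives "T cell".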

get_recognet()

Getter for the recognition net.

Return type

TypeRecognitionNet

Returns

the trained recognition net

load_hdf5(hdf5_name)

Initializes the cell type model from an HDF5 file

Parameters

hdf5_name (str) – file path

Return type

None

plot_clustermap(plot_name='celltype_protein_cluster.png', threshold=0.7, figsize=(7.0, 5.0), prob_assign=None)

Save the heatmap of protein content in cells with cell types labeled.

Parameters
  • plot_name (str) – file name of the plot, including an extension (e.g. .png or .jpg), defaults to “celltype_protein_cluster.png”

  • threshold (float) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7

  • figsize (Tuple[float, float]) – the size of the figure, defaults to (7.0, 5.0)

Return type

None

predict(new_dset)

Feed new_dset to the recognition net to get a prediction.

Parameters

new_dset (DataFrame) – the dataset to be predicted

Return type

array

Returns

the resulting cell type assignment

class astir.models.StateRecognitionNet(C, G, const=2, dropout_rate=0, batch_norm=False)

Bases: torch.nn.modules.module.Module

State Recognition Neural Network to get the mean and standard deviation of z. The network architecture is G -> const * C -> const * C -> G (for mu) or -> G (for std). If batch_norm is True, batch normalization layers follow each activation, and dropout with rate dropout_rate is applied to the activation units.

Parameters
  • C (int) – the number of pathways

  • G (int) – the number of proteins

  • const (int) – the hidden layer size is const times C, defaults to 2

  • dropout_rate (float) – the dropout rate, defaults to 0

  • batch_norm (bool) – apply batch normalization layers if True, defaults to False
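Reading the architecture string above literally, the Linear layer shapes follow from C, G and const. The sketch below lists those (in_features, out_features) pairs and counts the weights and biases they imply. It ignores batch-normalization parameters and treats the mu and std outputs as two separate heads on the shared hidden layers, which is an interpretation of the description rather than a copy of the library's code; state_net_layer_shapes and count_linear_params are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]  # (in_features, out_features) of a Linear layer


def state_net_layer_shapes(C: int, G: int, const: int = 2) -> Dict[str, List[Shape]]:
    """Layer shapes implied by G -> const*C -> const*C -> G (mu) / G (std)."""
    hidden = const * C  # hidden layer size is const times C
    return {
        "shared": [(G, hidden), (hidden, hidden)],  # G -> const*C -> const*C
        "mu_head": [(hidden, G)],   # -> G (for mu)
        "std_head": [(hidden, G)],  # -> G (for std)
    }


def count_linear_params(shapes: Dict[str, List[Shape]]) -> int:
    """Each Linear layer has in*out weights plus out biases (no batch norm)."""
    return sum(i * o + o for layers in shapes.values() for i, o in layers)
```

For C=3 pathways, G=10 proteins and the default const=2, the hidden size is 6 and the four Linear layers hold 66 + 42 + 70 + 70 = 248 parameters. Dropout adds no parameters, so dropout_rate does not change this count.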

Methods:

forward(x)

One forward pass of the StateRecognitionNet

Attributes:

training

forward(x)

One forward pass of the StateRecognitionNet

Parameters

x (Tensor) – the input to the recognition network model

Return type

Tuple[Tensor, Tensor]

Returns

the values of the output layers of the network: the mean and standard deviation of z

training: bool

class astir.models.TypeRecognitionNet(C, G, hidden_size=20)

Bases: torch.nn.modules.module.Module

Type Recognition Neural Network.

Parameters
  • C (int) – number of classes

  • G (int) – number of features

  • hidden_size (int) – size of hidden layers, defaults to 20
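To make the roles of C, G and hidden_size concrete, the sketch below runs one forward pass of a tiny classifier in plain Python: a vector of G features goes through a hidden layer of hidden_size units and comes out as C class probabilities. The single hidden ReLU layer, the softmax output, and the class name TinyTypeNet are all assumptions for illustration; the documented TypeRecognitionNet is a torch.nn.Module whose exact layers are not described here.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Small random weights, zero biases.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(v: List[float]) -> List[float]:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]


class TinyTypeNet:
    """Hypothetical G -> hidden_size -> C network (assumed architecture)."""

    def __init__(self, C: int, G: int, hidden_size: int = 20, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = make_layer(G, hidden_size, rng)
        self.output = make_layer(hidden_size, C, rng)

    def forward(self, x: List[float]) -> List[float]:
        h = [max(0.0, a) for a in linear(x, self.hidden)]  # ReLU
        return softmax(linear(h, self.output))  # C class probabilities
```

For example, TinyTypeNet(C=4, G=6).forward([1.0] * 6) returns four non-negative values that sum to 1, one per class.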

Methods:

forward(x)

One forward pass.

Attributes:

training

forward(x)

One forward pass.

Parameters

x (Tensor) – the input vector

Return type

Tensor

Returns

the output of the network for x

training: bool