astir.models package

Module contents

Classes:

`AstirModel`(dset, random_seed, dtype[, device])	Abstract base class for models that perform statistical inference.
`CellStateModel`([dset, const, dropout_rate, …])	Class to perform statistical inference on the activation of states (pathways) across cells.
`CellTypeModel`([dset, random_seed, dtype, device])	Class to perform statistical inference to assign cells to cell types.
`StateRecognitionNet`(C, G[, const, …])	State Recognition Neural Network to get mean of z and standard deviation of z.
`TypeRecognitionNet`(C, G[, hidden_size])	Type Recognition Neural Network.

class astir.models.AstirModel(dset, random_seed, dtype, device=device(type='cpu'))

Bases: object

Abstract base class for models that perform statistical inference. It is the superclass of CellTypeModel and CellStateModel and is not meant to be instantiated directly.

Methods:

`fit`(max_epochs, learning_rate, batch_size, …)	Runs training loops until the loss converges (the change in loss reaches delta_loss, evaluated over delta_loss_batch losses) or until max_epochs loops have run
`get_assignment`()	Get the final assignment of the dataset.
`get_data`()	Get model data
`get_losses`()	Getter for losses.
`get_scdataset`()	Getter for the SCDataset.
`get_variables`()	Returns all variables
`is_converged`()	Returns True if the model converged

fit(max_epochs, learning_rate, batch_size, delta_loss, delta_loss_batch, msg)

Runs training loops until the loss converges (the change in loss reaches delta_loss, evaluated over delta_loss_batch losses) or until max_epochs loops have run

Return type: None

get_assignment()

Get the final assignment of the dataset.

Return type: DataFrame
Returns: the final assignment of the dataset

get_data()

Get model data

Return type: Dict[str, Tensor]
Returns: data

get_losses()

Getter for losses.

Return type: Tensor
Returns: self.losses

get_scdataset()

Getter for the SCDataset.

Return type: SCDataset
Returns: self._dset

get_variables()

Returns all variables

Return type: Dict[str, Tensor]
Returns: self._variables

is_converged()

Returns True if the model converged

Return type: bool
Returns: self._is_converged

class astir.models.CellStateModel(dset=None, const=2, dropout_rate=0, batch_norm=False, random_seed=42, dtype=torch.float64, device=device(type='cpu'))

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference on the activation of states (pathways) across cells.

Parameters

dset (Optional[SCDataset]) – the input gene expression dataset, defaults to None
const (int) – See parameter const in astir.models.StateRecognitionNet(), defaults to 2
dropout_rate (float) – See parameter dropout_rate in astir.models.StateRecognitionNet(), defaults to 0
batch_norm (bool) – See parameter batch_norm in astir.models.StateRecognitionNet(), defaults to False
random_seed (int) – the random seed number to reproduce results, defaults to 42
dtype (dtype) – torch datatype to use in the model, defaults to torch.float64
device (device) – the torch.device to run on (CPU or GPU), defaults to torch.device(“cpu”)

Methods:

`diagnostics`()	Run diagnostics on cell state assignments
`fit`([max_epochs, learning_rate, batch_size, …])	Runs training loops until the loss converges (the change in loss reaches delta_loss, evaluated over delta_loss_batch losses) or until max_epochs loops have run
`get_correlations`()	Returns a C (# of pathways) X G (# of proteins) matrix where each element represents the correlation value of the pathway and the protein
`get_final_mu_z`([new_dset])	Returns the mean of the predicted z values for each core
`get_recognet`()	Getter for the recognition net
`load_hdf5`(hdf5_name)	Initializes the Cell State Model from an HDF5 file

diagnostics()

Run diagnostics on cell state assignments

See astir.Astir.diagnostics_cellstate() for full documentation

Return type: DataFrame

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')

Runs training loops until the loss converges (the change in loss reaches delta_loss, evaluated over delta_loss_batch losses) or until max_epochs loops have run

Parameters

max_epochs (int) – number of train loop iterations, defaults to 50
learning_rate (float) – the learning rate, defaults to 0.001
batch_size (int) – the batch size, defaults to 128
delta_loss (float) – convergence threshold: iteration stops once the change in loss reaches delta_loss, defaults to 0.001
delta_loss_batch (int) – the number of losses over which the change in loss is evaluated, defaults to 10
msg (str) – iterator bar message, defaults to empty string

Return type: None
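The stopping behaviour described by max_epochs, delta_loss and delta_loss_batch can be sketched in plain Python. This is a minimal illustration under stated assumptions, not astir's implementation: the helper name train_until_converged, the run_epoch callback, and the specific rule (relative change between the newest loss and the mean of the previous delta_loss_batch losses) are all hypothetical.

```python
from typing import Callable, List


def train_until_converged(
    run_epoch: Callable[[], float],
    max_epochs: int = 50,
    delta_loss: float = 0.001,
    delta_loss_batch: int = 10,
) -> List[float]:
    """Call run_epoch() until the loss plateaus or max_epochs is reached.

    Assumed rule (not taken from astir): stop once the relative change
    between the newest loss and the mean of the previous delta_loss_batch
    losses is below delta_loss.
    """
    losses: List[float] = []
    for _ in range(max_epochs):
        loss = run_epoch()
        if len(losses) >= delta_loss_batch:
            # Compare the new loss with the mean of the most recent window.
            window = losses[-delta_loss_batch:]
            mean_prev = sum(window) / len(window)
            if mean_prev != 0 and abs(mean_prev - loss) / abs(mean_prev) < delta_loss:
                losses.append(loss)
                break
        losses.append(loss)
    return losses


# Toy usage: a loss that is already flat stops as soon as the window is full.
flat = iter([1.0] * 50)
history = train_until_converged(lambda: next(flat))
print(len(history))
```

In this sketch a larger delta_loss_batch makes the rule less sensitive to a single noisy loss value, at the cost of running at least that many loops before convergence can be declared.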

get_correlations()

Returns a C (# of pathways) X G (# of proteins) matrix where each element represents the correlation value of the pathway and the protein

Return type: array
Returns: matrix of correlation between all pathway and protein pairs.
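To make the C x G layout concrete, the following sketch builds such a matrix from per-cell values. It assumes Pearson correlation computed across cells between each pathway's activation and each protein's expression; the description above does not say which correlation measure is used, and the helper names pearson and correlation_matrix are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(
    states: Sequence[Sequence[float]],
    proteins: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return a C x G matrix; row c, column g correlates pathway c with protein g.

    states holds C sequences (one per pathway) and proteins holds G sequences
    (one per protein), each with one value per cell.
    """
    return [[pearson(s, p) for p in proteins] for s in states]


# Toy usage: one pathway, two proteins, four cells.
matrix = correlation_matrix([[1, 2, 3, 4]], [[2, 4, 6, 8], [4, 3, 2, 1]])
print(len(matrix), len(matrix[0]))
```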

get_final_mu_z(new_dset=None)

Returns the mean of the predicted z values for each core

Parameters: new_dset (Optional[SCDataset]) – the dataset whose z values are predicted with the existing model. If None, the model's existing dataset is used; defaults to None
Return type: Tensor
Returns: the mean of the predicted z values for each core

get_recognet()

Getter for the recognition net

Return type: StateRecognitionNet
Returns: the recognition net

load_hdf5(hdf5_name)

Initializes the Cell State Model from an HDF5 file

Parameters: hdf5_name (str) – file path
Return type: None

class astir.models.CellTypeModel(dset=None, random_seed=1234, dtype=torch.float64, device=device(type='cpu'))

Bases: astir.models.abstract.AstirModel

Class to perform statistical inference to assign cells to cell types.

Parameters

dset (Optional[SCDataset]) – the input gene expression dataset, defaults to None
random_seed (int) – the random seed for parameter initialization, defaults to 1234
dtype (dtype) – the data type of parameters, should be the same as dset, defaults to torch.float64
device (device) – the torch.device to run on (CPU or GPU), defaults to torch.device(“cpu”)

Methods:

`diagnostics`(cell_type_assignments, alpha)	Run diagnostics on cell type assignments
`fit`([max_epochs, learning_rate, batch_size, …])	Runs training loops until the loss converges (the change in loss reaches delta_loss, evaluated over delta_loss_batch losses) or until max_epochs loops have run
`get_celltypes`([threshold, assignment_type, …])	Get the most likely cell types.
`get_recognet`()	Getter for the recognition net.
`load_hdf5`(hdf5_name)	Initializes the Cell Type Model from an HDF5 file
`plot_clustermap`([plot_name, threshold, …])	Save the heatmap of protein content in cells with cell types labeled.
`predict`(new_dset)	Feed new_dset to the recognition net to get a prediction.

diagnostics(cell_type_assignments, alpha)

Run diagnostics on cell type assignments

See astir.Astir.diagnostics_celltype() for full documentation

Return type: DataFrame

fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')

Runs training loops until the loss converges (the change in loss reaches delta_loss, evaluated over delta_loss_batch losses) or until max_epochs loops have run

Parameters

max_epochs (int) – number of train loop iterations, defaults to 50
learning_rate (float) – the learning rate, defaults to 0.001
batch_size (int) – the batch size, defaults to 128
delta_loss (float) – convergence threshold: iteration stops once the change in loss reaches delta_loss, defaults to 0.001
delta_loss_batch (int) – the number of losses over which the change in loss is evaluated, defaults to 10
msg (str) – iterator bar message, defaults to empty string

Return type: None

get_celltypes(threshold=0.7, assignment_type='threshold', prob_assign=None)

Get the most likely cell types. A cell is assigned to a cell type if the probability is greater than threshold. If no cell types have a probability higher than threshold, then “Unknown” is returned.

Parameters

assignment_type (str) – either ‘threshold’ or ‘max’. If ‘threshold’, a cell is assigned to its most likely cell type only when that probability is above threshold. If ‘max’, a cell is assigned to the cell type with the highest probability, or “Unknown” if several cell types share the highest probability. Defaults to ‘threshold’.
threshold (float) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7

Return type: DataFrame
Returns: a data frame with the most likely cell type for each cell
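The two assignment modes can be illustrated for a single cell with a small stand-alone function. This is a sketch of the rule as described above, not astir's code: the helper name assign_cell_type and the dictionary input are hypothetical, and the real method works on a whole dataset and returns a DataFrame.

```python
from typing import Dict


def assign_cell_type(
    probs: Dict[str, float],
    threshold: float = 0.7,
    assignment_type: str = "threshold",
) -> str:
    """Pick a cell type label for one cell from its type probabilities."""
    top = max(probs.values())
    if assignment_type == "threshold":
        # Assign the most likely type only if it clears the threshold.
        best = max(probs, key=probs.get)
        return best if top > threshold else "Unknown"
    if assignment_type == "max":
        # Assign the most likely type unless the maximum is tied.
        winners = [cell_type for cell_type, p in probs.items() if p == top]
        return winners[0] if len(winners) == 1 else "Unknown"
    raise ValueError("assignment_type must be 'threshold' or 'max'")


# Toy usage: 0.6 does not clear the default threshold of 0.7,
# but it is still the unique maximum.
cell = {"T cell": 0.6, "B cell": 0.4}
print(assign_cell_type(cell))
print(assign_cell_type(cell, assignment_type="max"))
```

The 'max' mode always yields a label unless there is an exact tie, so it labels more cells than 'threshold' but with weaker evidence for borderline cases.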

get_recognet()

Getter for the recognition net.

Return type: TypeRecognitionNet
Returns: the trained recognition net

load_hdf5(hdf5_name)

Initializes the Cell Type Model from an HDF5 file

Parameters: hdf5_name (str) – file path
Return type: None

plot_clustermap(plot_name='celltype_protein_cluster.png', threshold=0.7, figsize=(7.0, 5.0), prob_assign=None)

Save the heatmap of protein content in cells with cell types labeled.

Parameters

plot_name (str) – name of the plot; an extension (e.g. .png or .jpg) is required, defaults to “celltype_protein_cluster.png”
threshold (float) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7
figsize (Tuple[float, float]) – the size of the figure, defaults to (7.0, 5.0)

Return type: None

predict(new_dset)

Feed new_dset to the recognition net to get a prediction.

Parameters: new_dset (DataFrame) – the dataset to be predicted
Return type: array
Returns: the resulting cell type assignment

class astir.models.StateRecognitionNet(C, G, const=2, dropout_rate=0, batch_norm=False)

Bases: torch.nn.modules.module.Module

State Recognition Neural Network to get the mean and standard deviation of z. The network architecture is: G -> const * C -> const * C -> G (for mu) or -> G (for std). Optional batch normalization layers follow each activation output, along with dropout on the activation units.

Parameters

C (int) – the number of pathways
G (int) – the number of proteins
const (int) – each hidden layer has const * C units, defaults to 2
dropout_rate (float) – the dropout rate, defaults to 0
batch_norm (bool) – apply batch normalization layers if True, defaults to False
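As a quick check on how C, G and const determine the layer sizes, the sketch below lists the (input, output) dimensions of each linear layer, following the G -> const * C -> const * C -> G chain exactly as written in the class description. The helper name recognition_layer_shapes and the layer labels are hypothetical; the function only does the arithmetic and does not build a torch module.

```python
from typing import Dict, Tuple


def recognition_layer_shapes(C: int, G: int, const: int = 2) -> Dict[str, Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer.

    Sizes follow the chain stated in the class description:
    G -> const * C -> const * C -> G (mu head) or -> G (std head).
    """
    hidden = const * C
    return {
        "hidden_1": (G, hidden),
        "hidden_2": (hidden, hidden),
        "mu": (hidden, G),
        "std": (hidden, G),
    }


# Toy usage: 3 pathways, 10 proteins, default const of 2.
for name, shape in recognition_layer_shapes(C=3, G=10).items():
    print(name, shape)
```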