astir.models package
Module contents
Classes:
AstirModel – Abstract class to perform statistical inference.
CellStateModel – Class to perform statistical inference on the activation of states (pathways) across cells.
CellTypeModel – Class to perform statistical inference to assign cells to cell types.
StateRecognitionNet – State Recognition Neural Network to get the mean and standard deviation of z.
TypeRecognitionNet – Type Recognition Neural Network.
- class astir.models.AstirModel(dset, random_seed, dtype, device=device(type='cpu'))
Bases:
object
Abstract class to perform statistical inference. It is the superclass of CellTypeModel and CellStateModel and is not meant to be instantiated directly.
Methods:
fit(max_epochs, learning_rate, batch_size, …) – Runs training loops until the loss converges to within delta_loss, evaluated over a window of size delta_loss_batch, or until max_epochs is reached.
get_assignment() – Get the final assignment of the dataset.
get_data() – Get model data.
Getter for losses.
Getter for the SCDataset.
Returns all variables.
Returns True if the model converged.
- fit(max_epochs, learning_rate, batch_size, delta_loss, delta_loss_batch, msg)
Runs training loops until the loss converges to within delta_loss, evaluated over a window of size delta_loss_batch, or until max_epochs is reached.
- Return type
None
- get_assignment()
Get the final assignment of the dataset.
- Return type
DataFrame
- Returns
the final assignment of the dataset
- class astir.models.CellStateModel(dset=None, const=2, dropout_rate=0, batch_norm=False, random_seed=42, dtype=torch.float64, device=device(type='cpu'))
Bases:
astir.models.abstract.AstirModel
Class to perform statistical inference on the activation of states (pathways) across cells.
- Parameters
dset (Optional[SCDataset]) – the input gene expression dataset, defaults to None
const (int) – see parameter const in astir.models.StateRecognitionNet(), defaults to 2
dropout_rate (float) – see parameter dropout_rate in astir.models.StateRecognitionNet(), defaults to 0
batch_norm (bool) – see parameter batch_norm in astir.models.StateRecognitionNet(), defaults to False
random_seed (int) – the random seed used to reproduce results, defaults to 42
dtype (dtype) – torch datatype to use in the model, defaults to torch.float64
device (device) – the torch.device to run on (CPU or GPU), defaults to torch.device(“cpu”)
Methods:
diagnostics() – Run diagnostics on cell state assignments.
fit([max_epochs, learning_rate, batch_size, …]) – Runs training loops until the loss converges to within delta_loss, evaluated over a window of size delta_loss_batch, or until max_epochs is reached.
get_correlations() – Returns a C (# of pathways) × G (# of proteins) matrix in which each element is the correlation between a pathway and a protein.
get_final_mu_z([new_dset]) – Returns the mean of the predicted z values for each core.
Getter for the recognition net.
load_hdf5(hdf5_name) – Initializes the cell state model from an HDF5 file.
- diagnostics()
Run diagnostics on cell state assignments.
See astir.Astir.diagnostics_cellstate() for full documentation.
- Return type
DataFrame
- fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')
Runs training loops until the loss converges to within delta_loss, evaluated over a window of size delta_loss_batch, or until max_epochs is reached.
- Parameters
max_epochs (int) – maximum number of training loop iterations, defaults to 50
learning_rate (float) – the learning rate, defaults to 0.001
batch_size (int) – the batch size, defaults to 128
delta_loss (float) – stops iteration once the loss rate reaches delta_loss, defaults to 0.001
delta_loss_batch (int) – the window size over which delta loss is evaluated, defaults to 10
msg (str) – iterator bar message, defaults to empty string
- Return type
None
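The stopping rule described for fit can be illustrated with a minimal, library-free sketch. It assumes that convergence is judged by comparing the mean loss of the latest delta_loss_batch epochs with the mean of the window before it; the exact criterion used inside astir is not stated here, and the names train_until_converged, run_epoch, and toy_epoch are hypothetical.

```python
from statistics import mean
from typing import Callable, List


def train_until_converged(
    run_epoch: Callable[[], float],
    max_epochs: int = 50,
    delta_loss: float = 1e-3,
    delta_loss_batch: int = 10,
) -> List[float]:
    """Run epochs until the windowed loss change drops below delta_loss.

    Assumption (not taken from astir): the relative change between the
    mean loss of the last delta_loss_batch epochs and the mean of the
    preceding window is the convergence measure.
    """
    losses: List[float] = []
    for _ in range(max_epochs):
        losses.append(run_epoch())
        if len(losses) >= 2 * delta_loss_batch:
            recent = mean(losses[-delta_loss_batch:])
            previous = mean(losses[-2 * delta_loss_batch:-delta_loss_batch])
            # Relative change between the two windows.
            if abs(previous - recent) / max(abs(previous), 1e-12) < delta_loss:
                break
    return losses


# Toy "epoch" whose loss decays geometrically towards 1.0.
state = {"loss": 10.0}


def toy_epoch() -> float:
    state["loss"] = 1.0 + 0.5 * (state["loss"] - 1.0)
    return state["loss"]


history = train_until_converged(toy_epoch, max_epochs=50, delta_loss_batch=3)
print(f"stopped after {len(history)} epochs, final loss {history[-1]:.5f}")
```

With this toy loss the loop stops well before max_epochs, which is the behaviour the two stopping conditions are meant to provide: delta_loss ends training early once progress stalls, and max_epochs bounds the run otherwise.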
- get_correlations()
Returns a C (# of pathways) X G (# of proteins) matrix where each element represents the correlation value of the pathway and the protein
- Return type
array
- Returns
matrix of correlation between all pathway and protein pairs.
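To make the shape of the returned matrix concrete, the following sketch builds a C × G matrix of Pearson correlations in plain Python. It assumes that each pathway and each protein is represented by one value per cell; which quantities astir actually correlates is not specified here, and the function names pearson and correlation_matrix are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(
    pathways: List[Sequence[float]], proteins: List[Sequence[float]]
) -> List[List[float]]:
    """Return a C x G matrix: row c, column g holds corr(pathway c, protein g)."""
    return [[pearson(p, q) for q in proteins] for p in pathways]


# Two pathways (C = 2) and three proteins (G = 3), four cells each.
pathways = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
proteins = [[2.0, 4.0, 6.0, 8.0], [1.0, 0.0, 1.0, 0.0], [8.0, 6.0, 4.0, 2.0]]
corr = correlation_matrix(pathways, proteins)
```

Rows index pathways and columns index proteins, matching the C × G orientation stated above.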
- get_final_mu_z(new_dset=None)
Returns the mean of the predicted z values for each core
- Parameters
new_dset (Optional[SCDataset]) – if given, the z values are predicted for this dataset using the existing model; if None, the model's existing dataset is used. Defaults to None
- Return type
Tensor
- Returns
the mean of the predicted z values for each core
- class astir.models.CellTypeModel(dset=None, random_seed=1234, dtype=torch.float64, device=device(type='cpu'))
Bases:
astir.models.abstract.AstirModel
Class to perform statistical inference to assign cells to cell types.
- Parameters
dset (Optional[SCDataset]) – the input gene expression dataset
random_seed (int) – the random seed for parameter initialization, defaults to 1234
dtype (dtype) – the data type of parameters, should be the same as that of dset, defaults to torch.float64
Methods:
diagnostics(cell_type_assignments, alpha) – Run diagnostics on cell type assignments.
fit([max_epochs, learning_rate, batch_size, …]) – Runs training loops until the loss converges to within delta_loss, evaluated over a window of size delta_loss_batch, or until max_epochs is reached.
get_celltypes([threshold, assignment_type, …]) – Get the most likely cell types.
get_recognet() – Getter for the recognition net.
load_hdf5(hdf5_name) – Initializes the cell type model from an HDF5 file.
plot_clustermap([plot_name, threshold, …]) – Save the heatmap of protein content in cells with cell types labeled.
predict(new_dset) – Feed new_dset to the recognition net to get a prediction.
- diagnostics(cell_type_assignments, alpha)
Run diagnostics on cell type assignments.
See astir.Astir.diagnostics_celltype() for full documentation.
- Return type
DataFrame
- fit(max_epochs=50, learning_rate=0.001, batch_size=128, delta_loss=0.001, delta_loss_batch=10, msg='')
Runs training loops until the loss converges to within delta_loss, evaluated over a window of size delta_loss_batch, or until max_epochs is reached.
- Parameters
max_epochs (int) – maximum number of training loop iterations, defaults to 50
learning_rate (float) – the learning rate, defaults to 0.001
batch_size (int) – the batch size, defaults to 128
delta_loss (float) – stops iteration once the loss rate reaches delta_loss, defaults to 0.001
delta_loss_batch (int) – the window size over which delta loss is evaluated, defaults to 10
msg (str) – iterator bar message, defaults to empty string
- Return type
None
- get_celltypes(threshold=0.7, assignment_type='threshold', prob_assign=None)
Get the most likely cell types. A cell is assigned to a cell type if the probability is greater than threshold. If no cell type has a probability higher than threshold, then “Unknown” is returned.
- Parameters
assignment_type (str) – either ‘threshold’ or ‘max’. If ‘threshold’, type assignment is based on whether the assignment probability is above threshold. If ‘max’, type assignment is based on the maximum probability, or “Unknown” if several types share the maximum. Defaults to ‘threshold’.
threshold (float) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7
- Return type
DataFrame
- Returns
a data frame with the most likely cell type for each cell
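The two assignment modes described for get_celltypes can be sketched without astir as follows. This is an illustration of the documented rule only, not the library's implementation; the function assign_cell_types and its input format (one dict of type probabilities per cell) are assumptions made for the example.

```python
from typing import Dict, List


def assign_cell_types(
    probabilities: List[Dict[str, float]],
    threshold: float = 0.7,
    assignment_type: str = "threshold",
) -> List[str]:
    """Pick one label per cell from its type probabilities.

    "threshold": the most probable type is used only if its probability
    is greater than threshold, otherwise "Unknown".
    "max": the most probable type is used unless several types share
    the maximum, in which case "Unknown" is returned.
    """
    labels: List[str] = []
    for probs in probabilities:
        best_type = max(probs, key=probs.get)
        best_prob = probs[best_type]
        if assignment_type == "threshold":
            labels.append(best_type if best_prob > threshold else "Unknown")
        elif assignment_type == "max":
            tied = [t for t, p in probs.items() if p == best_prob]
            labels.append(best_type if len(tied) == 1 else "Unknown")
        else:
            raise ValueError("assignment_type must be 'threshold' or 'max'")
    return labels


cells = [
    {"T cell": 0.90, "B cell": 0.10},
    {"T cell": 0.50, "B cell": 0.50},
    {"T cell": 0.35, "B cell": 0.65},
]
by_threshold = assign_cell_types(cells, threshold=0.7)
by_max = assign_cell_types(cells, assignment_type="max")
```

The third cell shows the difference between the modes: its best probability of 0.65 is below the 0.7 threshold, so it is "Unknown" under 'threshold' but "B cell" under 'max'.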
- get_recognet()
Getter for the recognition net.
- Return type
- Returns
the trained recognition net
- load_hdf5(hdf5_name)
Initializes the cell type model from an HDF5 file.
- Parameters
hdf5_name (str) – file path
- Return type
None
- plot_clustermap(plot_name='celltype_protein_cluster.png', threshold=0.7, figsize=(7.0, 5.0), prob_assign=None)
Save the heatmap of protein content in cells with cell types labeled.
- Parameters
plot_name (str) – name of the plot; an extension (e.g. .png or .jpg) is required, defaults to “celltype_protein_cluster.png”
threshold (float) – the probability threshold above which a cell is assigned to a cell type, defaults to 0.7
figsize (Tuple[float, float]) – the size of the figure, defaults to (7.0, 5.0)
- Return type
None
- class astir.models.StateRecognitionNet(C, G, const=2, dropout_rate=0, batch_norm=False)
Bases:
torch.nn.modules.module.Module
State Recognition Neural Network to get the mean and standard deviation of z. The network architecture is G -> const * C -> const * C -> G (for mu) or -> G (for std), with batch normalization layers after each activation output and dropout applied to the activation units.
- Parameters
C (int) – the number of pathways
G (int) – the number of proteins
const (int) – the hidden layers have size const times C, defaults to 2
dropout_rate (float) – the dropout rate, defaults to 0
batch_norm (bool) – apply batch normalization layers if True, defaults to False
Methods:
forward(x) – One forward pass of the StateRecognitionNet
Attributes:
- forward(x)
One forward pass of the StateRecognitionNet
- Parameters
x (Tensor) – the input to the recognition network model
- Return type
Tuple[Tensor, Tensor]
- Returns
the value from the output layer of the network
- training: bool
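As a quick check on the layer sizes implied by the StateRecognitionNet description (G -> const * C -> const * C, followed by one output head for mu and one for std), the helper below lists the (input, output) width of each linear layer. It only restates the documented layout; the helper name and the layer keys are hypothetical and do not correspond to attributes of the actual class.

```python
from typing import Dict, Tuple


def state_recognition_layer_shapes(
    C: int, G: int, const: int = 2
) -> Dict[str, Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer.

    Follows the documented layout G -> const * C -> const * C -> G,
    with separate output heads for mu and std. Layer names are
    illustrative only.
    """
    hidden = const * C
    return {
        "hidden_1": (G, hidden),
        "hidden_2": (hidden, hidden),
        "mu_head": (hidden, G),
        "std_head": (hidden, G),
    }


# Example: 3 pathways, 10 proteins, default const of 2.
shapes = state_recognition_layer_shapes(C=3, G=10)
for name, (n_in, n_out) in shapes.items():
    print(f"{name}: {n_in} -> {n_out}")
```

Because the hidden width is const * C, raising const widens both hidden layers without changing the input or output sizes.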
- class astir.models.TypeRecognitionNet(C, G, hidden_size=20)
Bases:
torch.nn.modules.module.Module
Type Recognition Neural Network.
- Parameters
C (int) – number of classes
G (int) – number of features
hidden_size (int) – size of hidden layers, defaults to 20
Methods:
forward(x) – One forward pass.
Attributes:
- forward(x)
One forward pass.
- Parameters
x (Tensor) – the input vector
- Return type
Tensor
- Returns
the calculated cost value
- training: bool