Projects | Stanislav Bratchikov

Automated and Scalable Analysis of Clinical Flow Cytometry Samples Using Machine Learning

Sun, 01 Oct 2023 00:00:00 +0000

Background:
Multiparameter flow cytometry is an invaluable tool for translational research that provides in-depth immunophenotyping of clinical samples. Traditionally, flow cytometry data are analyzed manually by experts, who must account for both technical and biological variability. This approach is prone to expert-specific biases and does not scale to hundreds of samples. We hypothesize that a machine learning classifier applied to clinical flow cytometry samples can achieve cell typing accuracy comparable to expert annotation, overcome technical variability while preserving biological variability, and scale to hundreds of clinical samples.
Methods:
We assembled a diverse set of 172 expert-annotated clinical flow cytometry samples of bronchoalveolar lavage fluid from patients with lung diseases (including severe pneumonia and respiratory PASC), lung transplant patients, and healthy volunteers. We split this dataset with a 7:3 train-validation ratio to train and optimize a model based on the LightGBM gradient boosting classifier. We then validated the model on a previously annotated external dataset of hundreds of clinical samples from patients with lung diseases.
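As a minimal sketch of the 7:3 split, the snippet below partitions sample IDs rather than individual cells, so that all cells from one sample land on the same side. The text does not say whether the split was made per sample or per cell, so the per-sample choice is an assumption, and the function name `split_samples`, the seed, and the sample IDs are hypothetical. Training the LightGBM classifier itself is not shown.

```python
import random

def split_samples(sample_ids, train_fraction=0.7, seed=0):
    """Shuffle sample IDs and split them into train and validation lists."""
    ids = sorted(sample_ids)      # sort first so the result depends only on the seed
    rng = random.Random(seed)     # local generator, does not touch global state
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Hypothetical IDs standing in for the 172 annotated samples.
sample_ids = [f"sample_{i:03d}" for i in range(172)]
train_ids, val_ids = split_samples(sample_ids)
print(len(train_ids), len(val_ids))  # 120 52
```

Splitting by sample is a common way to avoid cells from the same sample appearing in both the training and validation sets.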
Results:
After hyperparameter optimization, our model achieved high classification accuracy for both common and rare cell types. It performed well on both phenotypically distinct cell types (such as T cell subsets) and phenotypically plastic cell types (such as neutrophils and macrophages), thus preserving biological variability. It also performed well when applied to large clinical datasets (>1,100 samples) from multiple studies. The model required only ~1 second per sample, saving about 4 minutes of expert time per sample.
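To put the per-sample figures in context, here is a back-of-the-envelope calculation for a dataset of 1,100 samples. The dataset size is the lower bound quoted above, and the two per-sample figures are the approximate values from the text. The function name is hypothetical.

```python
MODEL_SECONDS_PER_SAMPLE = 1        # "~1 second per sample"
EXPERT_MINUTES_SAVED_PER_SAMPLE = 4  # "about 4 minutes" saved per sample

def annotation_time(n_samples):
    """Return (model runtime in minutes, expert time saved in hours)."""
    model_minutes = n_samples * MODEL_SECONDS_PER_SAMPLE / 60
    expert_hours_saved = n_samples * EXPERT_MINUTES_SAVED_PER_SAMPLE / 60
    return model_minutes, expert_hours_saved

model_minutes, expert_hours_saved = annotation_time(1100)
print(f"model: {model_minutes:.1f} min, expert time saved: {expert_hours_saved:.1f} h")
# model: 18.3 min, expert time saved: 73.3 h
```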
Conclusions:
We present a scalable and generalizable approach for rapid and accurate annotation of clinical samples acquired in translational research settings. Our approach addresses many issues inherent to the analysis of clinical flow cytometry samples and increases reproducibility.

Predicting Pneumonia Outcomes: Deep Learning and Traditional Methods

Fri, 01 Sep 2023 00:00:00 +0000

Background:
A large-scale, NIAID-funded study of hospitalized patients with severe pneumonia (Successful Clinical Response In Pneumonia Treatment), conducted over several years at Northwestern Memorial Hospital, is collecting multiple types of patient data, including single-cell RNA-seq, flow cytometry of bronchoalveolar lavage fluid, cytokine abundance, and electronic health records covering many clinical and biological factors, such as patient gender or days since intubation in the ICU. These data may help answer the following questions:

Is immune response to various pathogens pre-programmed or adaptive?
Does the response depend on whether the infection is primary or secondary?
Can we predict ventilator-acquired pneumonia onset within the next 7 days in the intensive care unit?
Can we predict ventilator-acquired pneumonia outcome?
How do long COVID patients stratify/cluster based on scRNA-seq?

Recent advances in biomedical deep learning have introduced several useful tools for exploring and integrating multimodal biological data. These tools can be used to address the questions above.
Methods:
We collected a diverse dataset of 1,741 samples of bronchoalveolar lavage fluid from patients with lung diseases and from healthy volunteers. For 263 of these samples, single-cell RNA-seq data were also available. Several deep learning methods were selected to compare healthy and SARS-CoV-2 conditions, including factor decomposition methods, latent perturbation methods, and single-cell large language models. These methods were used to discover gene expression patterns that differ between conditions and to identify genetic drivers of the diseases studied. A comparison benchmark was introduced to fine-tune these models and interpret the results.
Results:
I performed state-of-the-art differential gene expression analysis using a pseudobulk subsampling technique. The resulting sets of genes that differed across conditions were used to benchmark the models. Finally, I trained a gradient-boosting-based model to select the most informative deep learning method for predicting a patient's clinical outcome.
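The pseudobulk idea can be illustrated with a minimal sketch: raw counts from all cells of the same sample and cell type are summed into one profile, and differential expression is then tested between those profiles rather than between individual cells. The exact procedure used here, including the subsampling step, is not described in the text, so this shows only the generic aggregation step. The function name, data layout, gene names, and counts are made up for illustration.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum raw counts per gene over all cells of each (sample, cell_type) group."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}

# Toy cells with made-up counts.
cells = [
    {"sample": "S1", "cell_type": "macrophage", "counts": {"GENE_A": 3, "GENE_B": 0}},
    {"sample": "S1", "cell_type": "macrophage", "counts": {"GENE_A": 1, "GENE_B": 2}},
    {"sample": "S2", "cell_type": "macrophage", "counts": {"GENE_A": 0, "GENE_B": 5}},
]
profiles = pseudobulk(cells)
print(profiles[("S1", "macrophage")])  # {'GENE_A': 4, 'GENE_B': 2}
```

Aggregating per sample treats each patient sample, not each cell, as the unit of replication in the downstream test.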
Conclusions:
We present a comprehensive study of patients with severe pneumonia using deep learning and traditional methods. Using clinical samples acquired in translational research settings, we identified the most informative methods for predicting ventilator-acquired pneumonia onset, associating pathogens with clinical outcomes, and characterizing pathogen-associated immune responses.

Developing a Pipeline for Single-Cell Spatial Transcriptomics Analysis

Tue, 01 Aug 2023 00:00:00 +0000

Background:
Single-cell spatial transcriptomics is a rapidly developing area within the field of scRNA-seq. Unlike spot-based spatial transcriptomics, single-cell spatial transcriptomics can provide cellular and even subcellular resolution. Information about the spatial coordinates of RNA transcripts, both within a cell and across the tissue, can provide critical insights into tissue structure and function. Multiple companies have entered this niche with commercially available platforms. Popular platforms include 10x Xenium, NanoString CosMx, and Vizgen MERSCOPE (MERFISH).
Methods:
We examined publicly available datasets and our own in-house data to compare CosMx and Xenium. For this purpose, we used currently available Python and R analysis packages: Squidpy, Giotto, stLearn, Spapros, and Seurat.
Results:
I compared the coverage of RNA transcripts within the given samples to find the platform that provides the best transcript coverage. Additionally, I estimated experiment-specific and tissue-specific batch effects and evaluated ways to integrate the data that preserve biological variability while reducing technical variation. Finally, I studied multiple tools for constructing targeted gene panels used in spatial transcriptomics. These tools allowed me to select 50 additional genes to add to existing commercial gene panels that would best preserve the biological variation of human lung tissue.
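The text does not define how transcript coverage was measured. One simple proxy, assumed here purely for illustration, is the median number of transcripts detected per cell. The sketch below computes it for two placeholder platforms; the platform names and counts are made up and do not represent any measured result.

```python
from statistics import median

def median_transcripts_per_cell(counts_by_platform):
    """Map each platform name to the median transcript count across its cells."""
    return {name: median(counts) for name, counts in counts_by_platform.items()}

# Made-up per-cell transcript counts for two placeholder platforms.
counts_by_platform = {
    "platform_A": [120, 95, 210, 160],
    "platform_B": [80, 60, 150, 110],
}
summary = median_transcripts_per_cell(counts_by_platform)
print(summary)  # {'platform_A': 140.0, 'platform_B': 95.0}
```

A fair comparison would also need matched tissue and restriction to the genes shared by both panels, which this sketch does not attempt.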
Conclusions:
Based on the data used in my comparison, the Xenium platform provided better transcript coverage and specificity. The lack of field-of-view stitching in CosMx produces duplicated cells that may influence the analysis. This comparison supported bringing these platforms into our laboratory and provided a standard pipeline for handling spatial transcriptomics data there.