The Role of Artificial Intelligence in Genetic Sequencing
Written By: Mark Titleman
Over the years, market demand for high-throughput DNA sequencing in scientific
and medical contexts has both accelerated and diversified the development of these
technologies. Illumina sequencing has remained the standard high-throughput technology,
combining fluorescently labelled reversible-terminator (RT) bases with flow cells for massive
parallelization, while nanopore and SOLiD sequencing, amongst others, offer different
sequencing methods. Artificial intelligence has become useful in virtually all scientific
fields and genetic sequencing is no exception: data analysis, throughput, and accuracy can
all be improved, and there is some chance that AI will drive further diversification of
sequencing technology. Faster base calling and alignment to a reference genome will in fact
be needed simply to keep pace with the volume of human genome data, which is estimated to
require 40 exabytes of storage by 2025. Additionally, improved variant calling will lead to
better disease diagnosis and treatment, and these medical uses should become more affordable
overall. The data analysis itself, however, offers the most tantalizing potential for
scientific advancement in that, in conjunction with increased data, it can provide a fuller
understanding of the human genome and gene expression.
Artificial intelligence reduces the time and expense of genome sequencing by speeding up
base calling and alignment. A neural network trained to translate electrical signals into a
nucleotide sequence via comparison to a consensus sequence can learn from previous data and
develop improved consensus accuracy, while genomic alignment within an assembly uses a
majority-rules process that is similarly self-learning. Accuracy can also be improved by
training on a taxon-specific dataset or by using a larger neural network. This self-learning
and error reduction will lead to significant future improvement in sequencing efficiency,
including eventually lower costs in associated medical technology.
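The majority-rules idea mentioned above can be illustrated with a minimal sketch. It assumes the reads have already been aligned to equal length, and it shows only the voting step, not a neural base caller or any particular assembler. The function name `majority_consensus` and the example reads are hypothetical, invented for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Return a consensus string from equal-length, pre-aligned reads.

    Hypothetical helper: each column is decided by a simple majority vote.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Ignore gap characters so a missing base does not outvote real calls.
        counts = Counter(base for base in column if base != gap)
        if not counts:
            consensus.append(gap)
            continue
        # most_common(1) yields the most frequent base in this column.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


# Invented toy reads: each disagrees with the others at no more than one position.
reads = [
    "ACGTAC",
    "ACGTTC",
    "ACCTAC",
    "A-GTAC",
]
print(majority_consensus(reads))
```

Isolated errors in individual reads are outvoted by the other reads at the same position, which is why deeper coverage tends to improve consensus accuracy.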
Self-learning AI in genetic sequencing can handle large genomic datasets and identify
patterns in the genome: genetic variants that modify gene expression to produce novel traits
or disease, variants at a population level, and structural variations. Ultimately it can
inform personalized treatments based on sequences as well as integrated clinical,
environmental, and lifestyle data. Identifying these patterns and their relation to illness
will improve the understanding of disease pathogenesis and prevention, while treatments will
develop in both genetically tailored and holistic directions, depending on medical context.
Cancer, for example, is the result of genetic mutations, which are increasingly catalogued.
Structural variation analysis can be performed, with identified treatments available for
associated diseases, while data integration can be relied on for multifactorial illnesses
such as heart disease and psychiatric disorders. Cancer in particular can be analyzed in
terms of recurrence probability prediction, subtype categorization, and early diagnosis.
Some recent examples of genetic variants identified in the context of precision medicine are
those in serine/threonine kinase 11 (STK11), epidermal growth factor receptor (EGFR), FAT1,
SETBP1, and Kirsten rat sarcoma viral oncogene homolog (KRAS).
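To make the notion of variant calling concrete, the following is a deliberately naive sketch: it compares a pileup of pre-aligned reads to a reference and reports positions where a non-reference base dominates. Production variant callers rely on statistical or deep-learning models rather than fixed cut-offs. The function `call_variants`, the thresholds `min_depth` and `min_fraction`, and the sequences are all assumptions made for illustration.

```python
from collections import Counter


def call_variants(reference, aligned_reads, min_depth=3, min_fraction=0.6):
    """Report (position, ref_base, alt_base, fraction) for dominant mismatches.

    Hypothetical helper; the thresholds are arbitrary illustrative values.
    """
    variants = []
    for pos, ref_base in enumerate(reference):
        # Collect the base each read reports at this position, skipping gaps.
        column = [r[pos] for r in aligned_reads if pos < len(r) and r[pos] != "-"]
        if len(column) < min_depth:
            continue  # too little evidence to make a call
        base, count = Counter(column).most_common(1)[0]
        fraction = count / len(column)
        if base != ref_base and fraction >= min_fraction:
            variants.append((pos, ref_base, base, round(fraction, 2)))
    return variants


# Invented toy data: three of the four informative reads carry T at position 2.
reference = "ACGTAC"
reads = ["ACGTAC", "ACTTAC", "ACTTAC", "ACTTAC", "AC-TAC"]
print(call_variants(reference, reads))
```

Positions are zero-based here. The depth and fraction thresholds stand in for the confidence estimates that a trained model would otherwise supply.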
These types of sequencing analyses have also provided much information about human
biological variation in both healthy and diseased individuals. Cytokine types and amounts,
for example, were already known to depend on environmental factors, genetic background, and
intestinal microbiota composition, a relationship made clear through genetic sequencing and
data integration. The Human Functional Genomics Project (HFGP), which focused on 500
healthy adult individuals, was used to assess biological variation in this context. Eleven
host variables accounted for approximately 67% of the variation in the production of
activated cytokines in healthy people. Artificial intelligence thus greatly enhanced the
understanding of cytokine production and function.
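A statement that host variables "accounted for" a share of variation is a statement about explained variance. The text does not say how the 67% figure was derived, so the sketch below only illustrates one common measure, the coefficient of determination (R squared), for a single variable fitted by least squares. The function `r_squared` and the numbers are invented toy values, not HFGP data.

```python
def r_squared(x, y):
    """Fraction of variance in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept for a single predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variation left after the fit versus total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented toy values standing in for one host variable and one cytokine readout.
host_variable = [1, 2, 3, 4, 5]
cytokine_level = [2, 4, 5, 4, 5]
print(round(r_squared(host_variable, cytokine_level), 2))
```

With several host variables the same ratio would be computed from a multivariable model, but the interpretation is unchanged: it is the share of total variation that the model captures.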
At a population level, AI in genetic sequencing can identify at-risk populations via large
databases, in addition to serving varied commercial uses such as determining ancestry or
historical migration. Inherited disorders or disease susceptibility can be identified using
artificial intelligence and addressed through public health initiatives. Genetic markers
compared against reference datasets predict ancestry, and haplogroups defined by the
relatively unchanging Y chromosome can be traced more easily to uncover the historical
migration patterns of male ancestors. Thus artificial intelligence, by allowing
population-wide comparison of sequences, can identify the pressing needs of a group as a
whole or tailor knowledge of this group to the specific medical or commercial needs of the
individual.
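Comparing genetic markers to reference datasets can be sketched as a likelihood comparison. The example assumes each reference panel is summarized by alternate-allele frequencies and that genotypes follow simple binomial (Hardy-Weinberg) proportions. It is a simplification for illustration, not a description of any commercial ancestry method. The panel names, marker names, and frequencies are all invented.

```python
import math


def log_likelihood(genotype, freqs):
    """Log-probability of a genotype given a panel's alternate-allele frequencies."""
    total = 0.0
    for marker, alt_count in genotype.items():
        p = freqs[marker]
        # Binomial probability of alt_count alternate alleles in two draws.
        prob = math.comb(2, alt_count) * p**alt_count * (1 - p) ** (2 - alt_count)
        total += math.log(prob)
    return total


def assign_population(genotype, reference_panels):
    """Return the best-scoring panel name and all panel scores."""
    scores = {
        name: log_likelihood(genotype, freqs)
        for name, freqs in reference_panels.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical panels; frequencies lie strictly between 0 and 1 to avoid log(0).
reference_panels = {
    "panel_A": {"m1": 0.9, "m2": 0.2, "m3": 0.7},
    "panel_B": {"m1": 0.3, "m2": 0.8, "m3": 0.4},
}
# Number of alternate alleles (0, 1, or 2) carried at each marker.
genotype = {"m1": 2, "m2": 0, "m3": 1}
best, scores = assign_population(genotype, reference_panels)
print(best)
```

Real analyses use far more markers and account for admixture, but the core step remains scoring an individual against each reference dataset and selecting the closest match.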
Most interestingly, there is great potential for artificial intelligence-based genetic
sequencing in understanding the human genome and gene expression. The increased accuracy and
efficiency of gene sequencing will allow a much larger database of genomes and gene variants
to be assembled. Patterns can be found in this sequencing data pertaining to disease, to the
specific and overall relationships between genotypes and phenotypes, and to variations
within those relationships that invite further discovery. Gene expression is very likely to
be better understood as well.
Artificial intelligence in genetic sequencing, although still in its infancy, has provided
and will continue to provide fruitful results for the understanding of human genetics and
disease and for the correlated development of technologies. The throughput and accuracy of
high-throughput genetic sequencing have benefited from the self-learning of AI. Variant
calling has also improved, which is useful for disease diagnosis and treatment,
understanding structural variation, and population studies. Taken further, much of the
variability in disease pathogenesis can be revealed, partly through AI-based data
integration for multifactorial disorders, while population-level AI-based genetic sequencing
can be used to understand disease across a group as well as sequence variation within it,
for both medical and commercial purposes. Data analysis and increased data offer potential
for advances in the understanding of gene expression and human genomes, contributing in
important ways to the scientific corpus.
Bibliography
“Artificial Intelligence, Machine Learning and Genomics.” 2022. Genome.gov. January 12, 2022.
https://www.genome.gov/about-genomics/educational-resources/fact-sheets/artificial-intelligence-machine-learning-and-genomics.
Pereira, Daniel. 2024. “The Future of AI-Based Gene Sequencing.” OODA Loop. January 12,
2024. https://www.oodaloop.com/ooda-original/2024/01/12/the-future-of-ai-based-gene-sequencing/.
Vacek, George. 2023. “How AI Is Transforming Genomics.” NVIDIA Blog. February 24, 2023.
https://blogs.nvidia.com/blog/how-ai-is-transforming-genomics/.
Vilhekar, Rohit S., and Alka Rawekar. 2024. “Artificial Intelligence in Genetics.” Cureus 16 (1).