The Role of Artificial Intelligence in Genetic Sequencing

Written By: Mark Titleman

Over the years, market demand for high-throughput DNA sequencing in scientific and medical contexts has both accelerated and diversified the development of these technologies. Illumina sequencing has remained the standard high-throughput technology, incorporating reversible-terminator (RT) bases with fluorescent labelling and flow cells for massive parallelization, while nanopore and SOLiD sequencing technologies, amongst others, offer alternative sequencing methods. Artificial intelligence has become useful in virtually all scientific fields, and genetic sequencing is no exception: data analysis, throughput, and accuracy can all be improved, with some chance of further diversification in sequencing technology. Data analysis will in fact require faster base calling and alignment to a reference genome because of the sheer volume of human genome data, which is estimated to need 40 exabytes of storage by 2025. Additionally, improved variant calling will lead to better disease diagnosis and treatment, while these medical uses become more affordable overall. The data analysis itself, however, offers the most tantalizing potential for scientific advancement in that, in conjunction with increased data, it can provide a fuller understanding of the human genome and gene expression.

Artificial intelligence reduces the time and expense of genome sequencing by speeding up base calling and alignment. A neural network trained to translate electrical signals into nucleotide sequence via comparison to a consensus sequence can learn from previous data and improve consensus accuracy, while genomic alignment within an assembly uses a majority-rules process that is similarly self-learning. AI can also be trained using a taxon-specific dataset or a larger neural network. This self-learning and error reduction should lead to significant future improvements in sequencing efficiency, including eventually lower costs in associated medical technology.
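
The majority-rules idea mentioned above can be illustrated with a minimal Python sketch. It assumes the reads have already been aligned to equal length with "-" marking gaps; the function name `majority_consensus` and the example reads are hypothetical, and real assemblers and neural base callers also weigh quality scores and signal-level data that this toy ignores.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Return the per-column majority base across equal-length aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        # Count only real bases; gaps do not vote.
        counts = Counter(base for base in column if base != gap)
        if not counts:
            # A column made only of gaps keeps the gap symbol.
            consensus.append(gap)
            continue
        # Ties are broken deterministically: alphabetically first base wins.
        best = max(sorted(counts), key=lambda base: counts[base])
        consensus.append(best)
    return "".join(consensus)


# Hypothetical aligned reads with a few disagreements and one gap.
reads = ["ACGTAC", "ACGTTC", "ACCTAC", "A-GTAC"]
print(majority_consensus(reads))  # prints ACGTAC
```

Each column is decided independently, so a single erroneous read is outvoted wherever the other reads agree.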

AI self-learning in genetic sequencing can identify patterns in the genome and handle large genomic datasets, identifying genetic variants that modify gene expression to produce novel traits or disease, variants at a population level, and structural variations, and ultimately informing personalized treatments based on sequences as well as integrated clinical, environmental, and lifestyle data. Identifying these patterns and their relation to illness will improve the understanding of disease pathogenesis and prevention, while treatments will develop in both genetically tailored and holistic directions, depending on medical context. Cancer, for example, is the result of genetic mutations, which are increasingly catalogued. Structural variation analysis can be performed, with identified treatments available for associated diseases, while data integration can be relied on for multifactorial illnesses such as heart disease and psychiatric disorders. Cancer in particular can be analyzed in terms of recurrence probability prediction, subtype categorization, and early diagnosis. Some recent examples of genetic variants identified in the context of precision medicine are those in serine/threonine kinase 11 (STK11), epidermal growth factor receptor (EGFR), FAT1, SETBP1, and KRAS (Kirsten rat sarcoma viral oncogene homolog).
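
To make "identifying genetic variants" concrete, the sketch below compares a sample sequence with a reference and reports single-nucleotide differences. It is a deliberately simplified illustration, not how production variant callers work: those operate on pileups of many aligned reads with quality scores and statistical or learned models. The function name `call_snvs` and the sequences are hypothetical.

```python
def call_snvs(reference, sample):
    """List single-nucleotide differences as (1-based position, ref, alt)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")

    variants = []
    for index, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        # Skip ambiguous calls ("N") rather than report them as variants.
        if "N" in (ref_base, alt_base):
            continue
        if ref_base != alt_base:
            variants.append((index + 1, ref_base, alt_base))
    return variants


# Hypothetical reference and sample sequences differing at one position.
print(call_snvs("ACGTACGT", "ACGAACGT"))  # prints [(4, 'T', 'A')]
```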

These types of sequencing analyses have also provided much information about human biological variety in both healthy and diseased individuals. Cytokine types and amounts, for example, were already known to depend on environmental factors, genetic background, and intestinal microbiota composition, a relationship made clearer through genetic sequencing and data integration. The Human Functional Genomics Project (HFGP), which focused on 500 healthy adult individuals, was used to assess biological variety in this context. Eleven host variables accounted for approximately 67% of the variation in the production of activated cytokines in healthy people. Artificial intelligence thus greatly enhanced the understanding of cytokine production and function.
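
A statement that a set of variables "accounted for" a share of variation is usually a statement about explained variance. The sketch below shows one common way to compute it, the coefficient of determination (R²), under the assumption that a model's predictions are compared with observed measurements. The numbers are invented for illustration and are not HFGP data, and the function name `fraction_of_variance_explained` is hypothetical.

```python
def fraction_of_variance_explained(observed, predicted):
    """Compute R^2 = 1 - SS_res / SS_tot for paired observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")

    mean_observed = sum(observed) / len(observed)
    # Total variation of the observations around their mean.
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    # Variation left unexplained by the model's predictions.
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have no variation to explain")
    return 1 - ss_residual / ss_total


# Invented cytokine-level measurements and model predictions.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(round(fraction_of_variance_explained(observed, predicted), 2))  # prints 0.95
```

On this reading, a value near 0.67 would correspond to the roughly 67% figure quoted above.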

At a population level, AI in genetic sequencing can identify at-risk populations via large databases, in addition to supporting varied commercial uses such as determining ancestry or historical migration. Inherited disorders or disease susceptibility can be identified using artificial intelligence and combatted through public health initiatives. Genetic markers compared against reference datasets predict ancestry, and haplogroups defined by the relatively stable Y chromosome can be traced more easily to uncover the historical migration patterns of male ancestors. Thus, by allowing population-wide comparison of sequences, artificial intelligence can identify pressing needs for a group as a whole or tailor knowledge of this group to the specific medical or commercial needs of the individual.
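
Comparing genetic markers against reference datasets can be sketched as a simple matching score. The example below assumes each reference panel is reduced to one representative allele per marker, which is a strong simplification: real ancestry inference relies on allele frequencies across many thousands of markers and on statistical or machine-learning models. The marker identifiers, panel names, and the function `best_matching_panel` are all hypothetical.

```python
def best_matching_panel(sample, reference_panels):
    """Score each panel by the fraction of shared markers with matching alleles."""
    if not reference_panels:
        raise ValueError("at least one reference panel is required")

    scores = {}
    for name, panel in reference_panels.items():
        shared = [marker for marker in sample if marker in panel]
        if not shared:
            # No overlapping markers means there is no evidence for this panel.
            scores[name] = 0.0
            continue
        matches = sum(sample[marker] == panel[marker] for marker in shared)
        scores[name] = matches / len(shared)

    # Ties are broken deterministically by panel name.
    best = max(sorted(scores), key=lambda name: scores[name])
    return best, scores


# Hypothetical markers and panels, for illustration only.
sample = {"marker_a": "A", "marker_b": "G", "marker_c": "T"}
panels = {
    "panel_1": {"marker_a": "A", "marker_b": "G", "marker_c": "C"},
    "panel_2": {"marker_a": "C", "marker_b": "G", "marker_c": "C"},
}
best, scores = best_matching_panel(sample, panels)
print(best)  # prints panel_1
```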

Most interestingly, there is great potential for artificial intelligence-based genetic sequencing in understanding the human genome and gene expression. The increased accuracy and efficiency of gene sequencing will allow a much larger database of genomes and gene variants to be assembled. Patterns can be found in this sequencing data pertaining to disease, the specific and overall relationship between genotypes and phenotypes, and variations within these for even further discovery. Gene expression should become better understood as well.

Artificial intelligence in genetic sequencing, although still in its infancy, has provided and will continue to provide fruitful results for the understanding of human genetics and disease and the correlated development of technologies. The throughput and accuracy of high-throughput genetic sequencing have benefited from AI self-learning. Variant calling has also improved, which is useful for disease diagnosis and treatment, understanding structural variation, and population studies. Taken further, much variability in disease pathogenesis can be seen, partly through AI-based data integration for multifactorial disorders, while population-level AI-based genetic sequencing can be used to understand disease within a group as well as any sequence variation within it for medical and commercial purposes. Data analysis and increased data offer potential for advances in the understanding of gene expression and human genomes, contributing in important ways to the scientific corpus.

Bibliography

“Artificial Intelligence, Machine Learning and Genomics.” 2022. Genome.gov. January 12, 2022. https://www.genome.gov/about-genomics/educational-resources/fact-sheets/artificial-intelligence-machine-learning-and-genomics.

Pereira, Daniel. 2024. “The Future of AI-Based Gene Sequencing.” OODA Loop. January 12, 2024. https://www.oodaloop.com/ooda-original/2024/01/12/the-future-of-ai-based-gene-sequencing/.

Vacek, George. 2023. “How AI Is Transforming Genomics.” NVIDIA Blog. February 24, 2023. https://blogs.nvidia.com/blog/how-ai-is-transforming-genomics/.

Vilhekar, Rohit S., and Alka Rawekar. 2024. “Artificial Intelligence in Genetics.” Cureus 16 (1).
