Decoding DNA with Artificial Intelligence
Dr Divya Tej Sowpati's journey in epigenetics began after he completed his Master's in Zoology. Taking its name from the Greek prefix “epi”, meaning above, epigenetics studies changes that happen on the DNA without altering its sequence. Essentially, the field examines how environmental factors such as stress or diet can change how genes behave without changing the DNA sequence itself. Identical twins – who share the same DNA – can develop different traits over time due to epigenetic differences caused by unique environments or experiences.
Dr Tej was formally trained in epigenetics when he undertook a PhD at the Centre for DNA Fingerprinting and Diagnostics in Hyderabad. Following this, he joined the Centre for Cellular & Molecular Biology (CCMB), under the Council of Scientific and Industrial Research, a research and development organisation aligned with the Ministry of Science and Technology, Government of India. At CCMB, which conducts high-quality research and training in various fields of modern biology, Dr Tej had his first brush with AI and its capabilities in genetics.
Although Dr Tej’s work is largely focused on gene expression and DNA methylation, it is intertwined with AI. “AI happens to be a tool I use in my research. I wouldn’t say I’m AI-driven, but I’ve always watched with great interest how the AI field has changed in the past decade and a half - particularly when it has become integral in our everyday lives, such as image processing, language translation and more. Most of my AI interest started after I began my work as a postdoctoral fellow and continued with my growth at CCMB. Earlier, we were using simple machine learning-based models to do some of our tasks. That was my interaction with AI up until I started with large-scale genomics work,” he shares.
Mapping India’s genetic landscape
Dr Tej’s work focuses on genomics and epigenetics. “Genomics essentially deals with understanding what our genome or genetic material says about us. When I say us, I don't necessarily mean how we look, because that also is controlled by genetics. I'm more interested in the health consequences of genomics, how our genetic material, or changes in genetic material can predispose us to certain disorders, common as well as rare,” he says.
Currently, Dr Tej is working on the Genome India project, which was initiated in 2020 by the Department of Biotechnology, Government of India. The project has successfully analysed the genomic data of 10,000 Indians from various corners of the country. The results of this project will form a reference for genetics-based health, resulting in more personalised medicine. Until recently, India relied on data available from Western countries or Caucasian patients. With the baseline genetic diversity of India mapped, genetic problems that are prevalent in the country will be diagnosed much faster.
Another program Dr Tej is working on – the Indian Breast Cancer Genome Atlas – aims to understand the genomic or molecular basis of a subset of breast cancer known as triple-negative breast cancer, which is more prevalent in India.
Where does AI fit into the picture? Projects like Genome India generate large-scale data. AI is used to identify patterns that human researchers may miss. “We work with large genomes which have billions of DNA bases, and we want to ask what these bases mean in terms of health disparities or phenotypic outcomes. So, it was a natural gravitation for me to see if the improvements in AI models can somehow be applied to the questions we ask. We primarily use AI in understanding epigenetics,” says Dr Tej.
The field of genomics is heavily reliant upon AI. In the last decade, scientists and researchers have witnessed an explosion of AI applications in the field, such as variant effect prediction (understanding the result of a DNA sequence change), modelling gene expression levels (identifying which genes are active and at what level), and identifying functional and regulatory regions of the genome.
From an ethical perspective, Dr Tej shares his apprehensions about privacy. Genomic data, derived from people, is anonymised before it is released to the public. However, the risk remains that a powerful AI model could still identify sensitive information from the data.
He credits AWS with handling the large-scale nature of genomics projects. Currently, his team at CCMB is leveraging AWS to scale up their GPU workloads. “Our local GPU infrastructure is under a lot of workload, and we run our AI inferences on AWS instances. This was particularly helpful when we had to process many samples in a short period of time,” he says.
In his spare time, Dr Tej likes to take on new hobbies and interests. Currently, he is absorbed in playing chess. “In another six months…who knows,” he says.
The future of genomics and AI
The field of genomics will become more personalised and cell-type specific. Dr Tej has already witnessed the unprecedented, large-scale generation of genomic and transcriptomic (which genes are transcribed into RNA, and at what levels) data. AI and large language models, he believes, will be invaluable in both analysing and understanding this data.
He also acknowledges that while he remains optimistic, researchers can be slow to adopt newer technologies or deviate from established paths.
Dr Tej’s team stays abreast of the latest developments in AI by reading articles and listening to podcasts. Advising fellow researchers, he says, “While it is important to exercise caution in adopting anything new, it's worth a shot at the same time. It is important to have some ground checks in place, so that when you are deviating too much from what is expected you have a way of tackling the problem. I believe that researchers should be willing to try and move away from the norm.”