Interview with Joseph Marvin Imperial: aligning generative AI with technical standards

Apr 2, 2025 - 09:18
In this interview series, we’re meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. The Doctoral Consortium provides an opportunity for a group of PhD students to discuss and explore their research interests and career objectives in an interdisciplinary workshop together with a panel of established researchers. In the latest interview, we hear from Joseph Marvin Imperial, who is focussed on aligning generative AI with technical standards for regulatory and operational compliance.

Tell us a bit about your PhD – where are you studying, and what is the topic of your research?

I’m doing research at the beautiful University of Bath, focusing on aligning generative AI (GenAI) with technical standards for regulatory and operational compliance. Standards are documents created by industry and/or academic experts and recognized as ensuring the quality, accuracy, and interoperability of systems and processes (in other words, “the best way of doing things”). You’ll see standards in almost all sectors and domains, including the sciences, healthcare, education, finance, journalism, law, and engineering. Given the rapid adoption of GenAI-based systems across these domains, it is logical to explore research that anchors these systems to the same rules and regulations that experts follow through standards. I believe this direction of research will play a key role in both the oversight and trustworthiness of ever more powerful GenAI-based systems in the future.

Could you give us an overview of the research you’ve carried out so far during your PhD?

So far, I have explored evaluating and benchmarking the current capabilities of GenAI, specifically large language models (LLMs), on how well they can capture constraints from standards through prompting-based approaches such as in-context learning (ICL). My motivation was to first measure how good or bad LLMs are at following specifications from standards pasted into prompts, which is also how interdisciplinary practitioners such as teachers, engineers, and journalists use AI.

In my first paper, published at the GEM Workshop at EMNLP 2023, in the context of education and language proficiency, I empirically showed that the way teachers use GenAI models such as ChatGPT does not actually maximize their instruction-following capabilities to generate content that aligns with standards in education such as the Common European Framework of Reference for Languages (CEFR). In my second long paper, published in the main conference of EMNLP 2024, I proposed a novel method called ‘Standardize’ that leverages knowledge artifacts such as specifications from standards, exemplar information, and linguistic flags to significantly improve how LLMs, including GPT-4, produce standard-aligned content that teachers can use.
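To give a flavour of the general idea (this is an illustrative sketch, not the actual ‘Standardize’ implementation, and the level descriptions below are simplified paraphrases rather than official CEFR text), a knowledge-artifact-style prompt might combine a standard’s specification and an exemplar with the task instruction:

```python
# Illustrative sketch of composing an in-context-learning prompt that embeds
# a standard's specification and an exemplar, so an LLM is steered toward
# standard-aligned generation. The CEFR descriptions here are simplified
# paraphrases for demonstration only.

CEFR_SPECS = {
    "A2": "Use short, simple sentences and high-frequency everyday vocabulary.",
    "B1": "Use connected text on familiar topics with some subordinate clauses.",
}

def build_standard_aligned_prompt(level: str, topic: str, exemplar: str) -> str:
    """Combine the standard's specification, an exemplar at the target level,
    and the task instruction into a single prompt."""
    spec = CEFR_SPECS[level]
    return (
        f"Target standard: CEFR level {level}.\n"
        f"Specification: {spec}\n"
        f"Exemplar text at this level:\n{exemplar}\n\n"
        f"Task: Write a short reading passage about '{topic}' that "
        f"complies with the specification above."
    )

prompt = build_standard_aligned_prompt(
    "A2", "visiting a market", "Tom goes to the shop. He buys bread and milk."
)
print(prompt)
```

The point of structuring the prompt this way is that the model sees the constraint explicitly rather than having to infer it from a bare request like “write an A2-level text”.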

In another paper at EMNLP 2024, I also built a benchmark called SpeciaLex to evaluate how state-of-the-art LLMs such as Gemma, Llama, GPT-4o, BLOOM, and OLMo perform on checking, identification, rewriting, and open-generation tasks based on the proper usage of vocabularies from standards. For this paper, I explored vocabularies from education (using CEFR data) and from the aviation and engineering domains by partnering with ASD Simple Technical English.
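To illustrate what a vocabulary “checking” task looks like in spirit (a hypothetical sketch, not SpeciaLex’s actual evaluation code, with a toy approved-word list rather than a real controlled vocabulary), one can flag words that fall outside a standard’s approved vocabulary:

```python
# Hypothetical sketch of a vocabulary-compliance check: given a controlled
# vocabulary (here a toy stand-in for a standard such as ASD Simple Technical
# English), flag any words in a sentence that are not approved.

APPROVED = {"the", "pilot", "checks", "fuel", "level", "before", "flight"}

def flag_non_compliant(text: str) -> list[str]:
    """Return the words in `text` that are outside the approved vocabulary,
    ignoring case and trailing punctuation."""
    return [w for w in text.lower().split() if w.strip(".,") not in APPROVED]

# "inspects" is not in the approved vocabulary, so it gets flagged;
# a compliant rewrite would use "checks" instead.
print(flag_non_compliant("The pilot inspects fuel before flight."))
```

A benchmark can then score an LLM on whether it makes the same judgment, or on whether its rewrites remove the flagged words.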

Is there an aspect of your research that has been particularly interesting?

Whenever I talk to non-AI colleagues working in standards- and regulation-driven domains such as healthcare, education, and finance, I always find it interesting and motivating to hear them express the need for technological innovations to ease the burden of following regulations and protocols, including standard compliance. Likewise, it’s interesting and challenging to learn how much standard reference data (SRD) across domains and sectors is inaccessible to researchers who want to automate or do research on compliance checking. I see this as an opportunity to build useful AI benchmarks, like the vocabulary-based one, SpeciaLex, that I started for the education and engineering domains.

What are your plans for building on your research so far during the PhD – what aspects will you investigate next?

I’m trying to be more ambitious with my next projects for the remaining two years of my PhD. In the coming months, I plan to explore the reasoning traces (also known as chains of thought) of LLMs and analyze how they arrive at certain conclusions when following conflicting and complementary rules in technical standards. This direction is particularly challenging, as the reasoning models I would need to use, such as DeepSeek-R1, are extremely large and computationally expensive. I also plan to build a larger standard-compliance benchmark spanning more than 10 domains to test how well state-of-the-art LLMs can follow standards on tasks including compliance checking and compliance-aligned generation.

How was the AAAI Doctoral Consortium, and the AAAI conference experience in general?

I think it’s the best Doctoral Consortium out of all the Doctoral Consortiums! (Disclaimer: I’ve only attended AAAI.) I love how well the event was organized by the program chairs. My favorite part was the mentorship session, where they paired us with senior researchers in AI; I was excited to talk face-to-face and share my research with Dr Francesca Rossi from IBM, the current AAAI president. I also met Dr Jindong Wang from William & Mary, who also works on NLP and LLM research, and went out to lunch with him, along with other PhD student participants, at a nearby Chinese restaurant. Overall, I highly recommend the consortium to any PhD student who wants to meet new people outside their circles.

Joseph presenting his poster at the AAAI 2025 Doctoral Consortium with Program Chair Dr Lijing Wang.

What advice would you give to someone considering doing a PhD in the field?

Pursuing a PhD in anything related to AI is ridiculously competitive these days. Your success can depend on several key factors: your mental drive, your interest in the research topic, your physical health, your financial standing, and the advisor you will be working with. I’ve talked to enough PhD students, and have my own experience, to say that problems with any of these variables can impact your success. However, don’t let this scare you off. I think mental, physical, intellectual, and financial maturity will get you a long way toward completing a PhD. Also, it wouldn’t hurt to do a PhD in a beautiful and scenic place in another country.

Could you tell us an interesting (non-AI related) fact about you?

I grew up in the Philippines and speak two other languages, Filipino and Bikol, besides English. When I moved to the UK for my PhD, I became a casual football and rugby fan and have toured three stadiums (Anfield, Old Trafford, and Stamford Bridge) during holidays. Whenever I’m available, I always try to travel around beautiful and scenic places in the UK. Nothing is more mentally rejuvenating than being on trains and looking out the window while traveling through the English and Welsh countryside.

About Joseph

Joseph Marvin Imperial is a PhD Candidate at the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI (ART-AI) and a researcher of the BathNLP group at the University of Bath, UK. His research focuses on the alignment and controllability of generative AI through expert-defined rules and guidelines known as standards for regulatory and operational compliance.

Outside of the university, Joseph maintains active collaborations with international partners, including serving as 1) a committee member of the British Standards Institution (BSI), contributing to the AI standardization landscape for the UK; 2) an advisory board member of SIGSEA and SEACrowd, advancing Southeast Asian language research; and 3) an academic member of the MLCommons AI Risks and Responsibility Working Group, exploring frontiers in AI safety research.