How I built a spelling game with AWS Serverless and GenAI
Intro Last December 2024, as part of the AWS Game Builder Challenge, I built a simple spelling game. For that, I used AWS Serverless and GenAI services and CDK was used as the IAC tool. This was my first experience building something with the help of GenAI and using Bedrock in an application. This blog post explains the project and my experience building this simple spelling game. This is the final result - the game you can play :) https://spelling-game.pubudu.dev/ How to play the game? Player can select a language to play the game. As of now, only English (US) and Dutch are the available languages. When the player selects a language, there will be a maximum 5 words generated. This includes an audio, brief meaning of the word and the number of characters for each word. Then the player needs to fill the word in the text box given. There is an indicator next to the text box how many characters have been entered in the text box and how many characters are required for the word. This indicator will be red until the required number of characters are filled, then it turns into green. There is a timer that starts as soon as the words are generated. This is based on the number of words generated. When the remaining time is less than 30 seconds, the background of the page as well as the background of the timer turns into red. Player can submit the answers before the timer runs out, else it will be automatically submitted. Then, the answers are evaluated and based on the number of correct answers, there is a pop up visible. If all the answers are correct, there will be a "Confetti" effect appearing on the page. Clicking on the 'show results' button on the pop up, the player can see the correct/incorrect answers and the correct word (in case of an incorrect answer). Implementation There are 2 main parts of this application. Backend - where words are generated and APIs are available. Frontend - Vue.js application for the player to interact. Source code There are two repositories available with the complete source code. Backend - https://github.com/pubudusj/spelling-game-backend Frontend - https://github.com/pubudusj/spelling-game-frontend Deployment instructions: Backend is implemented using AWS CDK and can be deployed as a generic CDK application. The URL of the Cloudfront distribution is required for the frontend to work. In the frontend, add the Cloudfront API URL to the VITE_API_BASE_URL in the env file. Install necessary dependencies and then run the application using npm run dev in the dev mode. Or you can build the frontend app using npm run build. When the backend stack is deployed, it will output the S3 hosting bucket name. You can copy the built frontend app into this bucket which will host the frontend via the Cloudfront distribution created in the backend stack. Backend There are 2 main components of the backend. Words generator component - to generate and save words in the database. API component - to serve frontend. Backend - words generator component Here is the high level overview of the words generator component and the steps within the state machine. Within Step Functions, there will be many AWS services called as explained below. Image: Words generator overview: Image: Words generator state machine: Words Generator Step Functions state machine is responsible for generating words. This Step function execution takes the language code as an input. ex: en-US, nl-NL, etc. As the first step, Bedrock InvokeModel is being called to generate 5 words with descriptions for each word, based on the language. Here, the Anthropic Claude 3 Haiku model is being used which gives better balance of accuracy and the pricing in this scenario. Here is the prompt I used to generate the words: "Generate 5 unique words that have a random number of characters more than 4 and less than 10 in Dutch language. For each word, provide a brief description of its meaning in English with more than a couple of words. Produce output only in a minified JSON array with the keys word and description. Word must always be in lowercase." Here, the response is a JSON string. Within the step's Result selector, the response will be converted to an array using the intrinsic function - StringToJson. { "words.$": "States.StringToJson($.Body.content[0].text)" } Then there is a map state where each word is an input to each map. Within the map state, there are branches (currently two) based on the language. In each branch, there is a step to synthesise the word using Polly using StartSpeechSynthesisTask. StartSpeechSynthesisTask is an async operation. So, there is an immediate step to check if the synthesis task is completed using Polly's GetSpeechSynthesisTask. Great thing about the StartSpeechSynthesisTask api is that it not only synthesises, but automatically saves the mp3 into the given S3 bucket. If the speech synthesis ta

Intro
Last December 2024, as part of the AWS Game Builder Challenge, I built a simple spelling game. For that, I used AWS Serverless and GenAI services and CDK was used as the IAC tool. This was my first experience building something with the help of GenAI and using Bedrock in an application. This blog post explains the project and my experience building this simple spelling game.
This is the final result - the game you can play :)
How to play the game?
Player can select a language to play the game. As of now, only English (US) and Dutch are the available languages.
When the player selects a language, there will be a maximum 5 words generated. This includes an audio, brief meaning of the word and the number of characters for each word.
Then the player needs to fill the word in the text box given.
There is an indicator next to the text box how many characters have been entered in the text box and how many characters are required for the word. This indicator will be red until the required number of characters are filled, then it turns into green.
There is a timer that starts as soon as the words are generated. This is based on the number of words generated.
When the remaining time is less than 30 seconds, the background of the page as well as the background of the timer turns into red.
Player can submit the answers before the timer runs out, else it will be automatically submitted.
Then, the answers are evaluated and based on the number of correct answers, there is a pop up visible.
If all the answers are correct, there will be a "Confetti" effect appearing on the page.
Clicking on the 'show results' button on the pop up, the player can see the correct/incorrect answers and the correct word (in case of an incorrect answer).
Implementation
There are 2 main parts of this application.
- Backend - where words are generated and APIs are available.
- Frontend - Vue.js application for the player to interact.
Source code
There are two repositories available with the complete source code.
Backend - https://github.com/pubudusj/spelling-game-backend
Frontend - https://github.com/pubudusj/spelling-game-frontend
Deployment instructions:
Backend is implemented using AWS CDK and can be deployed as a generic CDK application. The URL of the Cloudfront distribution is required for the frontend to work.
In the frontend, add the Cloudfront API URL to the VITE_API_BASE_URL in the env file. Install necessary dependencies and then run the application using npm run dev
in the dev mode. Or you can build the frontend app using npm run build
.
When the backend stack is deployed, it will output the S3 hosting bucket name. You can copy the built frontend app into this bucket which will host the frontend via the Cloudfront distribution created in the backend stack.
Backend
There are 2 main components of the backend.
- Words generator component - to generate and save words in the database.
- API component - to serve frontend.
Backend - words generator component
Here is the high level overview of the words generator component and the steps within the state machine. Within Step Functions, there will be many AWS services called as explained below.
Image: Words generator overview:
Image: Words generator state machine:
Words Generator Step Functions state machine is responsible for generating words.
This Step function execution takes the language code as an input. ex: en-US, nl-NL, etc.
As the first step, Bedrock InvokeModel is being called to generate 5 words with descriptions for each word, based on the language.
Here, the Anthropic Claude 3 Haiku model is being used which gives better balance of accuracy and the pricing in this scenario.
-
Here is the prompt I used to generate the words:
"Generate 5 unique words that have a random number of characters more than 4 and less than 10 in Dutch language. For each word, provide a brief description of its meaning in English with more than a couple of words. Produce output only in a minified JSON array with the keys word and description. Word must always be in lowercase."
Here, the response is a JSON string.
-
Within the step's Result selector, the response will be converted to an array using the intrinsic function - StringToJson.
{ "words.$": "States.StringToJson($.Body.content[0].text)" }
Then there is a map state where each word is an input to each map.
Within the map state, there are branches (currently two) based on the language.
In each branch, there is a step to synthesise the word using Polly using StartSpeechSynthesisTask.
StartSpeechSynthesisTask is an async operation. So, there is an immediate step to check if the synthesis task is completed using Polly's GetSpeechSynthesisTask.
Great thing about the StartSpeechSynthesisTask api is that it not only synthesises, but automatically saves the mp3 into the given S3 bucket.
If the speech synthesis task is not finished based on the status of the GetSpeechSynthesisTask, it waits and retries the status check.
Once synthesis is done, the execution continues to save word to DynamoDB step.
-
In this step, we use DynamoDB's PutItem api to save the generated data to the table. One record consists of below data:
pk - Primary key in the format of Word#{LanguageCode}. ex: Word#en-US sk - MD5 hash of the word - here, intrinsic function States.Hash($.word, 'MD5')is in use. word - The word generated by Bedrock. description - The description generated by Bedrock. s3file - Mp3 file location provided by Polly synthesis task. charcount - Character count of the word. This is retrieved from Polly synthesis task. updated_at - Update timestamp.
Since this synthesis and save to db task runs on a map state, after a single execution, there will be a maximum 5 new words available in the ddb for the given language.
There are two EventBridge Schedules running every 5 minutes to call this State Machine with the different language codes - English and Dutch.
Backend - API component
API component is to create resources to interact with the frontend.
There are two APIs available in the backend.
-
POST /questions
- To generate questions using Step Functions to appear on the frontend. -
POST /answers
- To validate the answers submitted by the player.
Generate questions API
Generate questions API accept one argument. Which is the language code.
This /questions api has a proxy integration to a Lambda Function which will start an execution in Questions Generator State Machine synchronously using start_sync_execution SDK call.
-
This Questions Generator State Machine is in type - Express.
-
A sample input is as follows:
{ "language":"en-US", "iterate":[1, 2, 3, 4, 5] }
Here, "iterate" is a hard coded array to start a map execution within the state machine.
Within the state machine, first the map state is executed based on the "iterate" array from the input.
-
Inside the map state, first, it fetches max 50 records from DynamoDB. Here, DynamoDB scan is used. However, in order to fetch some random data, a random ExclusiveStartKey is in use with the help of the intrinsic function UUID(). Also, FilterExpression is used to filter the records applicable only for the given language code.
{ "TableName": "arn:aws:dynamodb:****", "Limit": 50, "ExclusiveStartKey": { "pk": { "S.$": "States.Format('Word#{}', $$.Execution.Input.language)" }, "sk": { "S.$": "States.UUID()" } }, "FilterExpression": "pk = :pk", "ExpressionAttributeValues": { ":pk": { "S.$": "States.Format('Word#{}', $$.Execution.Input.language)" } }, "ReturnConsumedCapacity": "TOTAL" }
Next, the number of item counts returned from the previous step is checked.
-
If the count is more than 0, then in the next step, single random record is selected from them. This step is a Pass state with transformation using Parameters, which uses intrinsic functions - ArrayGetItem, MathRandom and ArrayLength.
{ "item.$": "States.ArrayGetItem($.items,States.MathRandom(0, States.ArrayLength($.items)))" }
Then, the selected record is being sent to the Generate Pre signed URL step. This is a Lambda function, which generates a pre-signed URL for the s3file path of the record. So, from the frontend the mp3 file can be played using this pre-signed url. Also there is a transformation of data within this Lambda function. The expiry of the pre-signed url is set to minimum because it is only required within the session of the game.
-
This is the last step within the map state which outputs the record in below format.
{ "id":"XXXX", "description":"Description of the word", "charcount":8, "language":"en-US", "url":"Presigned-url" }
Once all the map steps are completed, there is a final aggregation Lambda function - GetUniqueResultsLambda. Since each map step is independent, there is a possibility of selecting the same record in more than one map state. This Lambda function simply removes such duplications.
And the response is returned to the frontend as the response to the
/questions
endpoint.
Validate answers API
POST /answers
API is responsible for validating the answers submitted in the frontend.-
This API endpoint has a proxy Lambda function which accepts the payload in below format:
{ "language":"en-US", "answers":[ { "id":"312e8b6583d4b65b", "word":"effect" }, { "id":"99b788c54c1a8265", "word":"perilous" }, ... ... ] }
Within the Lambda function, it does a DynamoDB's batch_get_item SDK call to fetch words per ids and match with the word provided in the API.
-
Then it returns the response in below format:
[ { "id": "312e8b6583d4b65b", "original_word": "affect", "correct": false }, { "id": "99b788c54c1a8265", "original_word": "perilous", "correct": true }, ... ... ]
Based on the value of "correct", frontend will calculate and show the results.
Frontend
For the frontend, I have used a simple single page application built with Vue.js. I have very limited knowledge on frontend technologies. Because of that, I used Amazon Q Developer on VSCode to implement the frontend application.
Almost 95% of the frontend application was built by Amazon Q Developer. I have asked different questions and in most of the cases, Amazon Q was able to analyse the code and generate the code as per my requirements. This was a step by step process where I asked Amazon Q to generate specific functionality at a time.
Here are some work in progress "versions" of the application that was implemented and fine tuned step by step using Amazon Q.
Here are some examples for the questions I asked from Amazon Q and how it analysed and generated code.
Lessons learnt
Below are some of the lessons I learnt while I was working on this project, including some feedback on some of the services I used here.
Bedrock does not always return JSON. Within the prompt I used, I stated - "Produce output only in a minified JSON array with the keys word and description". However, once in a while, Bedrock returns data in different formats. This could have been more accurate if I add the beginning of the response in the prompt, so Bedrock can continue from there. However, this will increase the request token count for each API call. So, to avoid additional cost and also, the error rate is acceptable (since this is anyway a background job), I kept this prompt as it is.
Amazon Polly's StartSpeechSynthesisTask doesn't support S3 path. We can provide the OutputS3BucketName where the generated audio will be stored. However, we cannot specify a path to save the object. Instead, I have used the OutputS3KeyPrefix parameter to provide the path with the language code so, the audio is saved in
s3://bucket_name/language_code/file_name.mp3
However, one minor issue with that is, Polly always adds a dot (.) between the file name and the prefix. So, all the files generated in the sub path start with a dot.Cost of Polly. Polly has Generative and Neural text-to-speech engines apart from Standard. However, the cost of them is quite high compared to Standard. Also, those are available only for a limited number of languages. https://aws.amazon.com/polly/pricing/
Selecting random items from the DynamoDB table is hard. There is no straightforward way to achieve this. That's why I had to use a ExclusiveStartKey and fetch maximum 50 items and select one random item. This might introduce some duplications. That's the reason I had to use the GetUniqueResultsLambda which removes any duplicates from the map job.
I initially used direct integrations to start the Step Functions express workflow to generate questions directly from API Gateway. However, the VTL is complex to build specially to get the response in a specific format. So I stuck to the Lambda proxy option which was more simple.
Initially I have hosted the frontend using Amplify Hosting. However, I wanted to restrict the API Gateway access only from Amplify project, but there is no option yet. So I switched to Cloudfront, S3 set up. This is described in detail in one of my previous blogs: Enforce CloudFront-Only Access for AWS API Gateway
For the frontend, I indeed did Vibe coding. So, there is a high chance that any frontend specialist might find unacceptable code in it