Redesigning the Reference Component on Gabe the Bear Blog

The reference component is one of the things on Gabe the Bear Blog I'm most proud of. It doesn't look pretty, but it offers a place for references and attachments (GitHub repos, relevant links, etc.). Few blogs I see have such a prominent component for references. Most websites have a "See Also" component with a list of links below it. Some don't even have a dedicated "See Also" component, in which case it is just another heading under the content. Some websites do not reference anything, even when the text says "according to a report", which is amazing.
If you have not noticed, the reference component is each item listed under the "Reference" heading on a blog post (see above), shown on the right on a wide screen and at the bottom on a narrow one. It looks like nothing more than a list item, except that the icon is determined by the reference kind. For some kinds, there might be extra text before the link.
I have always used this component to link to the material I referred to in Gabe the Bear blog posts. It worked really well until this Pi Day.
Problem on the Blog
If you look at the reference section on one of this year's Pi Day posts, you will find that the text looks rather bizarre.
The text under each title is much longer than in my other posts. It also appears to be in a machine-readable format rather than one meant for bears. In fact, it is BibTeX, a reference format used in academia to cite materials, usually together with LaTeX.
When I was writing the posts (actually, Jupyter Notebooks) for the Pi Day 2025 celebration, I didn't expect the citations to look like that. That's when I realised that the component was initially designed for documentation pages and GitHub repos, not academic papers. Since the Pi Day celebration is important for bears, I expect to cite more papers in the future. Therefore, the component needs a change.
Data Structure
Currently, the prop data structure for the reference component is:
    type Reference = {
      kind: ReferenceKind; // Enum for deciding the icon, details omitted
      name: string; // Title of the page/video/paper etc.
      url: string; // URL
      description: string; // Why this reference was added
    };
My first question was whether I needed to change this. In other words, are there other fields that are important but that I never listed on my posts? The description stating why a reference was added has always felt slightly weird to me. Maybe it needs a replacement.
As a tangent, I applied to several PhD programmes over the past year, most of which asked for a research proposal. While writing those, I learned of a citation management tool called Zotero. It generates a citation from a URL and converts it to citation text in a specified format. It also has a browser extension to be used with its desktop tool. The fields it collects from pages can be useful for this problem.
So I listed all the kinds of things I think I will ever refer to in my posts.
- A paper (like this one)
- A documentation page (like this one)
- A Kaggle dataset (like this one)
- A web-based game (like this one)
- A conference talk on YouTube (like this one)
- A GitHub repository (like this one)
- A song (like this one)
I then saved them to Zotero and checked what information it saved for each. I also checked the citation it generated to see which of the collected fields matter most. I chose IEEE as the reference format, only because I'm more familiar with it. For example, for the YouTube video I chose, Zotero generated this citation in IEEE format:
[1] PyData, Raymond Hettinger: Numerical Marvels Inside Python - Keynote | PyData Tel Aviv 2022, (Jan. 11, 2023). Accessed: Mar. 17, 2025. [Online Video]. Available: https://www.youtube.com/watch?v=wiGkV37Kbxk
Although Zotero collected many more fields, it seemed only the channel, title and URL were important to the IEEE format.
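To make the shape of that citation concrete, here is a minimal sketch that rebuilds the string above from its fields. This is not how Zotero does it; the `VideoCitation` type, the `formatVideoCitation` function and the month abbreviation table are all made up for illustration, and only the output pattern is taken from the citation shown above.

```typescript
// Hypothetical types for this sketch; not Zotero's data model.
interface SimpleDate {
  year: number;
  month: number; // 1-12
  day: number;
}

interface VideoCitation {
  channel: string; // Zotero files this under "Director"
  title: string;
  url: string;
  published: SimpleDate;
  accessed: SimpleDate;
}

// Assumed abbreviations; only "Jan." and "Mar." are confirmed by the citation above.
const MONTH_ABBREVIATIONS = [
  "Jan.", "Feb.", "Mar.", "Apr.", "May", "Jun.",
  "Jul.", "Aug.", "Sep.", "Oct.", "Nov.", "Dec.",
];

function formatCitationDate(d: SimpleDate): string {
  return `${MONTH_ABBREVIATIONS[d.month - 1]} ${d.day}, ${d.year}`;
}

// Mirrors the layout of the IEEE-style video citation Zotero produced.
function formatVideoCitation(index: number, v: VideoCitation): string {
  return (
    `[${index}] ${v.channel}, ${v.title}, (${formatCitationDate(v.published)}). ` +
    `Accessed: ${formatCitationDate(v.accessed)}. [Online Video]. Available: ${v.url}`
  );
}

const pyDataTalk: VideoCitation = {
  channel: "PyData",
  title: "Raymond Hettinger: Numerical Marvels Inside Python - Keynote | PyData Tel Aviv 2022",
  url: "https://www.youtube.com/watch?v=wiGkV37Kbxk",
  published: { year: 2023, month: 1, day: 11 },
  accessed: { year: 2025, month: 3, day: 17 },
};

console.log(formatVideoCitation(1, pyDataTalk));
```

Besides the channel, title and URL, the pattern also carries the publish and access dates, which is one reason dates are worth keeping in the component's data.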
I did this for all seven items above. Here are the results. Note that these are not all the fields Zotero can collect, but rather all the fields that were not empty in my experiment.
- For the paper, Zotero collected the item as "Preprint". It also collected the title, authors, repository (arXiv, in this case), DOI (Digital Object Identifier) and the URL.
- For the documentation page, the dataset, the game and the song, Zotero collected all of them as "Web Page", with their title, author, website title and URL. In particular, because the song was collected from YouTube Music, which was both a complex web application and a Premium-only feature, the URL was incorrect.
- For the conference talk, Zotero collected it as "Video Recording", with title, director, URL, date and running time. Surprisingly, it seemed to list the name of the YouTube channel as "Director".
- For the GitHub repository, Zotero collected it as "Software", with the repo title, library catalogue ("GitHub") and programming language. It also somehow collected Panda3D as "Company" and the repo description as "Abstract". I needed to wait until the page had loaded completely, otherwise it would be collected as a "Web Page".
Note that on the Kaggle dataset page, there is a section under "Metadata" called "DOI Citation", which might give how the dataset should be cited. Look at this ASL Alphabet dataset, for example.
The "About Dataset" section might also specify how the dataset should be cited.
Given what information Zotero and the IEEE format consider important for each of these resources, I will change the data structure to the following:
    type Reference = {
      kind: ReferenceKind; // Enum for deciding the icon, details omitted
      name: string; // Title of the page/video/paper etc.
      url: string; // URL
      /**
       * "S -> T" means "S if S is not None else T" in Python
       *
       * For papers: archive ID -> DOI
       * For web pages: website title -> host name
       * For conference talks: library catalogue + channel title ("director") -> library catalogue
       * For code repositories: library catalogue + author ("company")
       */
      description: string;
      dates: {
        kind: "Accessed" | "Published";
        date: Date;
      }[]; // Dates of access and publication, as available
    };
I am not planning to change the component's appearance, other than adding the dates at the bottom.
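As a rough sketch of how the "S -> T" fallbacks and the date line could be computed, the helpers below use TypeScript's `??` operator in place of Python's `S if S is not None else T`. The function names, the ", " separator standing in for "+", the ISO date format and all sample values are assumptions for illustration, not part of the component.

```typescript
// Same shape as the entries of `dates` in the revised type above.
interface ReferenceDate {
  kind: "Accessed" | "Published";
  date: Date;
}

// Papers: archive ID -> DOI
function paperDescription(archiveId: string | undefined, doi: string): string {
  return archiveId ?? doi;
}

// Web pages: website title -> host name of the URL
function webPageDescription(websiteTitle: string | undefined, url: string): string {
  return websiteTitle ?? new URL(url).hostname;
}

// Conference talks: library catalogue + channel title -> library catalogue.
// Joining with ", " is an assumption; the post does not say how "+" is rendered.
function talkDescription(catalogue: string, channel: string | undefined): string {
  return channel !== undefined ? `${catalogue}, ${channel}` : catalogue;
}

// One possible rendering of the dates shown at the bottom of the component.
function formatReferenceDates(dates: ReferenceDate[]): string {
  return dates
    .map((d) => `${d.kind}: ${d.date.toISOString().slice(0, 10)}`)
    .join("; ");
}

// Placeholder values only; none of these identify a real resource.
console.log(paperDescription(undefined, "10.0000/example")); // falls back to the DOI
console.log(webPageDescription(undefined, "https://docs.example.org/guide"));
console.log(talkDescription("YouTube", "PyData"));
console.log(formatReferenceDates([{ kind: "Accessed", date: new Date(Date.UTC(2025, 2, 17)) }]));
```

Keeping the fallbacks in small per-kind functions means `description` stays a plain string in the props, so the component itself would not need to know where the text came from.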
Intermezzo: Metadata
One interesting observation was that, of all the "Web Page" type resources, only the game page had the "Author" and "Website Title" fields filled in correctly. In particular, for Kaggle datasets, the page must be opened by entering its URL in the address bar for Zotero to collect "Title" (the page title) correctly. If the page is reached by navigating from the Kaggle search page instead, the title is collected as "Find Open Dataset and Machine Learning Projects | Kaggle".
Why is this? I looked into the page source of the game page above.
- Author came from the `author` meta tag
- Website title came from the `og:site_name` meta tag
- Also, the page's `og:type` meta tag states it's an `article`; while a more proper option would be "game", there's no such type in the Open Graph protocol
Also, the GitHub repo's description came from the `description` meta tag. This is also the description entered when creating a repository. I didn't find where the "Company" field came from, though. This might require reading the source code of Zotero.
All this shows the importance of writing correct metadata for a page. It's not just for search engines any more. Reference management tools like Zotero, which many PhD students use, can also get messed up without correct metadata.
The Kaggle quirk arose because, when navigating within the site, the page updated only its content and not the `og:title` meta property. Hopefully this will have been fixed by the time you visit Kaggle.
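The meta tag lookups described in this section can be sketched as follows. This is only an illustration of reading `name`/`property` and `content` pairs from raw HTML, not how Zotero actually collects fields; `readMetaTags` is a hypothetical helper, the regex scan stands in for a proper HTML parser, and the sample page values are invented.

```typescript
// Hypothetical helper: map each <meta> tag's name (or property) to its content.
// A regex scan is enough for a sketch; a real tool would use an HTML parser.
function readMetaTags(html: string): Record<string, string> {
  const result: Record<string, string> = {};
  const tags = html.match(/<meta\b[^>]*>/gi) ?? [];
  for (const tag of tags) {
    // Collect the double-quoted attributes of this one tag.
    const attrs: Record<string, string | undefined> = {};
    const attrPattern = /([\w:-]+)\s*=\s*"([^"]*)"/g;
    let m: RegExpExecArray | null;
    while ((m = attrPattern.exec(tag)) !== null) {
      attrs[m[1].toLowerCase()] = m[2];
    }
    // Standard tags use `name`; Open Graph tags use `property`.
    const key = attrs["name"] ?? attrs["property"];
    const content = attrs["content"];
    if (key !== undefined && content !== undefined) {
      result[key] = content;
    }
  }
  return result;
}

// Invented sample page; the values do not come from any real site.
const sampleGamePage = `
<head>
  <title>Sample Game</title>
  <meta name="author" content="A. Bear">
  <meta property="og:site_name" content="Example Arcade">
  <meta property="og:type" content="article">
  <meta content="A tiny browser game" name="description">
</head>`;

const sampleMeta = readMetaTags(sampleGamePage);
console.log(sampleMeta["author"]);       // "A. Bear"
console.log(sampleMeta["og:site_name"]); // "Example Arcade"
console.log(sampleMeta["og:type"]);      // "article"
```

A page that leaves these tags out, or leaves them stale after client-side navigation, gives a tool reading them nothing correct to collect.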
Citations Across the Page
Another thing I want to be able to do is to associate each link with its corresponding citation. I had thought this was impossible until I found a page that did it: the Python documentation. See this page, for example. Each entry in the "Footnotes" section has a link back to the location in the text where the note was made.
There's nothing special about the HTML used to implement this feature, other than that the numbers "[1]" to "[4]" in the "Footnotes" section had IDs of `id1` to `id4`, while the corresponding items in the text had IDs of `id5` to `id8`. The tags for those labels simply linked to the matching IDs. For citations, however, links should only go from the text to the reference section, as the same citation can appear in many places in the text.
The Python documentation is built from a markup format called reStructuredText, which has explicit syntax constructs for footnotes and citations. Core Markdown has no footnote or citation syntax, but there seem to be Remark and Rehype plugins for it. I have yet to try any of them. My ideal plugin for citations in Markdown is one such that:
- I won't need to type the numbers manually; these numbers should be updated according to their location in the text
- I am able to style the bibliography, including its position
- I am able to put custom data structure in the reference, i.e., reference style
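The first requirement, automatic numbering by position in the text, can be sketched in a few lines. This is not an existing Remark or Rehype plugin; the `[@key]` marker syntax, the `ref-N` anchor IDs and the `numberCitations` function are all assumptions made up for this example.

```typescript
interface NumberedCitations {
  html: string;    // text with each marker replaced by a numbered link
  order: string[]; // citation keys in order of first appearance
}

// Replace every "[@key]" marker with a link to "#ref-N", where N is the
// position of the key's first appearance. Repeated keys reuse their number,
// so links only ever go from the text to the reference section.
function numberCitations(text: string): NumberedCitations {
  const order: string[] = [];
  const html = text.replace(/\[@([\w-]+)\]/g, (_match: string, key: string) => {
    let index = order.indexOf(key);
    if (index === -1) {
      order.push(key);
      index = order.length - 1;
    }
    const n = index + 1;
    return `<a href="#ref-${n}">[${n}]</a>`;
  });
  return { html, order };
}

const numbered = numberCitations("Zotero [@zotero] can emit IEEE [@ieee] citations [@zotero].");
console.log(numbered.html);
console.log(numbered.order); // ["zotero", "ieee"]
```

The returned `order` array is what a bibliography renderer would walk to emit the entries with matching `ref-N` IDs, which leaves the styling and position of the bibliography up to the page.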
As another tangent, the idea of collecting URLs in the page and showing them as references came from a web scraping project of mine. At the time, I wanted to see whether a news article drew its information from only one site, and was therefore likely to be biased and unreliable. The link sources could also reveal the most popular sites for finding relevant information.
It might be useful to borrow a similar idea: collect URLs from the post, then generate citations for them. Because this process happens after compiling Markdown to HTML, it is more likely a Rehype thing. Zotero seems to support custom reference formats, so it might be possible to generate a mapping from the URL to the data needed for the component.
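A minimal sketch of the URL-collecting step, assuming the post has already been compiled to an HTML string. It is not a Rehype plugin; `countLinkHosts` is a hypothetical helper that uses a regex where a real plugin would walk the syntax tree, and it only sees absolute `http(s)` links in double-quoted `href` attributes.

```typescript
// Count how many links in an HTML string point to each host.
function countLinkHosts(html: string): Record<string, number> {
  const counts: Record<string, number> = {};
  const hrefPattern = /<a\b[^>]*\bhref="(https?:\/\/[^"]+)"/gi;
  let m: RegExpExecArray | null;
  while ((m = hrefPattern.exec(html)) !== null) {
    const host = new URL(m[1]).hostname;
    counts[host] = (counts[host] ?? 0) + 1;
  }
  return counts;
}

const samplePost =
  '<p>See the <a href="https://docs.python.org/3/">docs</a>, ' +
  'the <a href="https://docs.python.org/3/library/">library reference</a> ' +
  'and <a class="ext" href="https://www.zotero.org/">Zotero</a>.</p>';

console.log(countLinkHosts(samplePost));
// { "docs.python.org": 2, "www.zotero.org": 1 }
```

A post whose links all fall under a single host would stand out immediately in such a count, which is the same signal the scraping project was looking for in news articles.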
Conclusion
I have been writing a conference talk this week, so I was unable to finish all my experiments, test all my hypotheses, implement the redesign, or read the Zotero source code to understand what fields it collects and how it collects them. But I think this was a good reflection on my Pi Day 2025 celebration. My intention is to show that a citation is more than a link. It is a description of the materials I built my post on top of. The description also lets me and every bear who sees the post decide whether the post is reliable and trustworthy.