Designing a database for a social media app
Making clones of websites and recreating their functionality is a good way to develop skills and even challenge oneself in ways you may not have expected to. So as a challenge to myself, I decided to create a clone of the social media website, Tumblr. I chose Tumblr because it's the social media website that I've used the most. The Planning Phase The biggest mistake to make with any large project is to open VSCode and stare at your code editor for a while, wondering where to begin (and often times even rewriting code you just wrote after realizing what else needs to be done). The best way to start a project like this is to ask questions about what needs to be done, how the web app is supposed to behave for users. Having a full plan reduces roadblocks and gives you a clear map of what your end result is supposed to be. Some questions I had to ask myself: How to handle user authentication? What will my routes look like, and what data will they require / send to the user? What are the most important features for my clone website (an important question, as I'm only one person. Tumblr as a whole is managed by hundreds of employees) What persistent data will I need to handle? For the sake of this article, I'll only focus on the answer I came up with for the last one for now. What persistent data will I need to handle? Users Users need a username. For their credentials, I'll need to store their email address and their (hashed) password. Users should be able to customize their profiles. Unlike the above two, this is optional. Users can save some info about themselves in an "About" section. They can also save a header image for their blog, and a profile image. For this project, I won't store the images in my database. I'll use cloudinary for that, and I'll store the links to the images (and the image ids) in my database instead. Posts Here's where things got complicated. Tumblr has a reblog feature, which allows users to share posts from other blogs to their own. They can't edit content written by other users, but they can add their own content to posts. Reblogged posts are brand new posts that "belong" to the one reblogging. So I'll need to replicate that functionality. Another thing to consider is how shared posts function when the original poster edits them. Imagine if someone made a post saying "I love cats". A thousand other users can share / reblog these posts, only to have this post suddenly change on them when the original poster edits the post to say, "I hate cats" instead. Tumblr doesn't allow this behavior, and editing a post will only edit that post for the specific user. There's also the matter of post content. Posts on tumblr can be made up of text, music, videos, photos, or links. Then you need a way to manage which part of a post was created by which user, as well as what type of content the post contains. My solution to this involved breaking posts down into segments. A post could be made up of several segments, which would be responsible for the post content. And the content would be displayed in order of when the segment was created, starting with the oldest segment. This allows me to simplify the posts themselves. Now I just need to care about the following When was a post created? (a timestamp) Who made the post? (Its author) If the post is a reblog, what is the original post? If the post is reblog, what post was it immediately reblogged from? Segments As said above, segments will be responsible for the actual post content. Whenever a user creates a new post or adds content to a reblogged post, they create a post segment. Whenever a user edits a post, they will only be able to a segment that they themselves have created. As far as Segment data goes, I'll need: The segment author When the segment was created What type of content a segment holds (text, audio, video, etc). This is so the front end knows how to render the content. Tumblr allows users to mix and match post types, so I could technically use JSON to store post segment types. But I decided to keep it simple The content of the post segment Replies Users can leave comments, or replies, on posts. Fortunately, tumblr keeps this pretty simple. Replies can only be text. So there isn't much to track here aside from: Who made the reply The content of the reply The post the reply is being made on Tags Tumblr has a post tagging system, which lets users search for posts that aren't reblogs by tag, and lets them search their own blogs for any posts that they've tagged. Tags should be unique (multiple tags named "travel" don't need to exist). The only thing to worry about as far as tags are concerned is the content of the tag. Relations Now it's important to figure out how all of this data is supposed to relate to each other. Users -> Posts: One to many relationship. Users can create multiple posts. Each post is only created by one

Making clones of websites and recreating their functionality is a good way to develop skills and even challenge oneself in ways you may not have expected to. So as a challenge to myself, I decided to create a clone of the social media website, Tumblr. I chose Tumblr because it's the social media website that I've used the most.
The Planning Phase
The biggest mistake to make with any large project is to open VSCode and stare at your code editor for a while, wondering where to begin (and often times even rewriting code you just wrote after realizing what else needs to be done). The best way to start a project like this is to ask questions about what needs to be done, how the web app is supposed to behave for users. Having a full plan reduces roadblocks and gives you a clear map of what your end result is supposed to be.
Some questions I had to ask myself:
- How to handle user authentication?
- What will my routes look like, and what data will they require / send to the user?
- What are the most important features for my clone website (an important question, as I'm only one person. Tumblr as a whole is managed by hundreds of employees)
- What persistent data will I need to handle?
For the sake of this article, I'll only focus on the answer I came up with for the last one for now.
What persistent data will I need to handle?
Users
- Users need a username.
- For their credentials, I'll need to store their email address and their (hashed) password.
- Users should be able to customize their profiles. Unlike the above two, this is optional. Users can save some info about themselves in an "About" section. They can also save a header image for their blog, and a profile image. For this project, I won't store the images in my database. I'll use cloudinary for that, and I'll store the links to the images (and the image ids) in my database instead.
Posts
Here's where things got complicated. Tumblr has a reblog feature, which allows users to share posts from other blogs to their own. They can't edit content written by other users, but they can add their own content to posts. Reblogged posts are brand new posts that "belong" to the one reblogging. So I'll need to replicate that functionality.
Another thing to consider is how shared posts function when the original poster edits them. Imagine if someone made a post saying "I love cats". A thousand other users can share / reblog these posts, only to have this post suddenly change on them when the original poster edits the post to say, "I hate cats" instead. Tumblr doesn't allow this behavior, and editing a post will only edit that post for the specific user.
There's also the matter of post content. Posts on tumblr can be made up of text, music, videos, photos, or links. Then you need a way to manage which part of a post was created by which user, as well as what type of content the post contains.
My solution to this involved breaking posts down into segments. A post could be made up of several segments, which would be responsible for the post content. And the content would be displayed in order of when the segment was created, starting with the oldest segment.
This allows me to simplify the posts themselves. Now I just need to care about the following
- When was a post created? (a timestamp)
- Who made the post? (Its author)
- If the post is a reblog, what is the original post?
- If the post is reblog, what post was it immediately reblogged from?
Segments
As said above, segments will be responsible for the actual post content. Whenever a user creates a new post or adds content to a reblogged post, they create a post segment. Whenever a user edits a post, they will only be able to a segment that they themselves have created. As far as Segment data goes, I'll need:
- The segment author
- When the segment was created
- What type of content a segment holds (text, audio, video, etc). This is so the front end knows how to render the content. Tumblr allows users to mix and match post types, so I could technically use JSON to store post segment types. But I decided to keep it simple
- The content of the post segment
Replies
Users can leave comments, or replies, on posts. Fortunately, tumblr keeps this pretty simple. Replies can only be text. So there isn't much to track here aside from:
- Who made the reply
- The content of the reply
- The post the reply is being made on
Tags
Tumblr has a post tagging system, which lets users search for posts that aren't reblogs by tag, and lets them search their own blogs for any posts that they've tagged. Tags should be unique (multiple tags named "travel" don't need to exist). The only thing to worry about as far as tags are concerned is the content of the tag.
Relations
Now it's important to figure out how all of this data is supposed to relate to each other.
Users -> Posts: One to many relationship. Users can create multiple posts. Each post is only created by one user.
Users -> Segments: This is the same relationship as users to posts.
Posts <-> Segments: Many to many relationship. A post can be made up of multiple segments. Segments can be associated with multiple posts via reblogging.
Posts <-> Tags: Many to many relationship. A post can have several tags. And the same tag can be associated with multiple posts.
Users following Users: Every social media site needs to have a follow feature! Users can follow many users, and users can be followed by many other users.
Users liking Posts: Users can like many posts. Posts can be liked by many users. This is another many to many relationship. For reblogged posts, likes are shared between them and the parent post.
Post -> Replies: Posts can have many replies, but replies are unique to one post. This is a one to many relationship.
After all of this, we can now decide what our database schema is going to look like.
This is just one of many steps. In my next post, I'll delve into using an ORM, writing tests, and creating my routes and controllers.