Amazon Redshift Data Warehousing (Sample Project)
Overview This project focuses on analyzing two key business processes using Amazon Redshift: Identifying the most common frequency of purchases per season. Finding products with high review ratings. Step 1: Setting Up Amazon Redshift Create a Redshift Cluster Navigate to AWS Redshift Console. Click Create Cluster. Choose a single-node cluster. Select an appropriate node type (e.g., dc2.large). Set the database name, username, and password. Click Create cluster. Create an IAM Role Go to the IAM Console. Create a new role with AmazonS3ReadOnlyAccess. Attach the role to your Redshift cluster. Create an S3 Bucket and Upload Data Navigate to Amazon S3. Create a new bucket (e.g., zzetu). Upload shopping_data.csv to this bucket. Step 2: Loading Data into Redshift Use the Amazon Redshift Query Editor to run SQL commands. Create a table to store raw data: CREATE TABLE shopping ( CustomerID INTEGER, Age INTEGER, Gender VARCHAR(50), Category VARCHAR(50), Location VARCHAR(50), Season VARCHAR(50), ReviewRating REAL, SubscriptionStatus VARCHAR(50), PaymentMethod VARCHAR(50), ShippingType VARCHAR(50), DiscountApplied VARCHAR(50), PromoCodeUsed VARCHAR(50), PreviousPurchases INTEGER, PreferredPaymentMethod VARCHAR(50), FrequencyOfPurchases VARCHAR(50) ); Copy data from S3 into Redshift: COPY shopping FROM 's3://zzetu/shopping_data.csv' IAM_ROLE 'arn:aws:iam::your-account-id:role/your-redshift-role' FORMAT AS CSV IGNOREHEADER 1; Step 3: Schema Design We are using a star schema with 1 fact table and 3 dimension tables: 1. Dimension Tables CREATE TABLE dimCustomer ( CustomerID INTEGER PRIMARY KEY, Age INTEGER, Gender VARCHAR(50), Location VARCHAR(50), SubscriptionStatus VARCHAR(50), PreferredPaymentMethod VARCHAR(50) ); CREATE TABLE dimProduct ( Category VARCHAR(50) PRIMARY KEY, DiscountApplied VARCHAR(50), PromoCodeUsed VARCHAR(50) ); CREATE TABLE dimTransaction ( TransactionID INTEGER IDENTITY(1,1) PRIMARY KEY, CustomerID INTEGER, PaymentMethod VARCHAR(50), ShippingType VARCHAR(50), Season VARCHAR(50), FrequencyOfPurchases VARCHAR(50), FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID) ); 2. Fact Table CREATE TABLE factPurchases ( CustomerID INTEGER, Age INTEGER, ReviewRating REAL, PreviousPurchases INTEGER, FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID) ); Step 4: Data Insertion -- Insert data into dimCustomer INSERT INTO dimCustomer (CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod) SELECT DISTINCT CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod FROM shopping; -- Insert data into dimProduct INSERT INTO dimProduct (Category, DiscountApplied, PromoCodeUsed) SELECT DISTINCT Category, DiscountApplied, PromoCodeUsed FROM shopping; -- Insert data into dimTransaction INSERT INTO dimTransaction (CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases) SELECT CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases FROM shopping; -- Insert data into factPurchases INSERT INTO factPurchases (CustomerID, Age, ReviewRating, PreviousPurchases) SELECT CustomerID, Age, ReviewRating, PreviousPurchases FROM shopping; Step 5: Business Queries 1. Most Used Frequency of Purchase per Season SELECT t.Season, t.FrequencyOfPurchases, COUNT(*) AS PurchaseCount FROM factPurchases f JOIN dimTransaction t ON f.CustomerID = t.CustomerID GROUP BY t.Season, t.FrequencyOfPurchases ORDER BY t.Season, PurchaseCount DESC; 2. Products with High Review Ratings SELECT AVG(f.ReviewRating) AS AvgRating FROM factPurchases f ORDER BY AvgRating DESC; Setup Guide Create Redshift Cluster and configure IAM roles. Load Data from S3 into Redshift tables. Run the SQL scripts for creating tables and inserting data. Execute business queries to generate insights. Repository Link GitHub Repository

Overview
This project focuses on analyzing two key business processes using Amazon Redshift:
- Identifying the most common frequency of purchases per season.
- Finding products with high review ratings.
Step 1: Setting Up Amazon Redshift
-
Create a Redshift Cluster
- Navigate to AWS Redshift Console.
- Click Create Cluster.
- Choose a single-node cluster.
- Select an appropriate node type (e.g.,
dc2.large
). - Set the database name, username, and password.
- Click Create cluster.
-
Create an IAM Role
- Go to the IAM Console.
- Create a new role with AmazonS3ReadOnlyAccess.
- Attach the role to your Redshift cluster.
-
Create an S3 Bucket and Upload Data
- Navigate to Amazon S3.
- Create a new bucket (e.g.,
zzetu
). - Upload
shopping_data.csv
to this bucket.
Step 2: Loading Data into Redshift
- Use the Amazon Redshift Query Editor to run SQL commands.
- Create a table to store raw data:
CREATE TABLE shopping (
CustomerID INTEGER,
Age INTEGER,
Gender VARCHAR(50),
Category VARCHAR(50),
Location VARCHAR(50),
Season VARCHAR(50),
ReviewRating REAL,
SubscriptionStatus VARCHAR(50),
PaymentMethod VARCHAR(50),
ShippingType VARCHAR(50),
DiscountApplied VARCHAR(50),
PromoCodeUsed VARCHAR(50),
PreviousPurchases INTEGER,
PreferredPaymentMethod VARCHAR(50),
FrequencyOfPurchases VARCHAR(50)
);
- Copy data from S3 into Redshift:
COPY shopping
FROM 's3://zzetu/shopping_data.csv'
IAM_ROLE 'arn:aws:iam::your-account-id:role/your-redshift-role'
FORMAT AS CSV
IGNOREHEADER 1;
Step 3: Schema Design
We are using a star schema with 1 fact table and 3 dimension tables:
1. Dimension Tables
CREATE TABLE dimCustomer (
CustomerID INTEGER PRIMARY KEY,
Age INTEGER,
Gender VARCHAR(50),
Location VARCHAR(50),
SubscriptionStatus VARCHAR(50),
PreferredPaymentMethod VARCHAR(50)
);
CREATE TABLE dimProduct (
Category VARCHAR(50) PRIMARY KEY,
DiscountApplied VARCHAR(50),
PromoCodeUsed VARCHAR(50)
);
CREATE TABLE dimTransaction (
TransactionID INTEGER IDENTITY(1,1) PRIMARY KEY,
CustomerID INTEGER,
PaymentMethod VARCHAR(50),
ShippingType VARCHAR(50),
Season VARCHAR(50),
FrequencyOfPurchases VARCHAR(50),
FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID)
);
2. Fact Table
CREATE TABLE factPurchases (
CustomerID INTEGER,
Age INTEGER,
ReviewRating REAL,
PreviousPurchases INTEGER,
FOREIGN KEY (CustomerID) REFERENCES dimCustomer(CustomerID)
);
Step 4: Data Insertion
-- Insert data into dimCustomer
INSERT INTO dimCustomer (CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod)
SELECT DISTINCT CustomerID, Age, Gender, Location, SubscriptionStatus, PreferredPaymentMethod FROM shopping;
-- Insert data into dimProduct
INSERT INTO dimProduct (Category, DiscountApplied, PromoCodeUsed)
SELECT DISTINCT Category, DiscountApplied, PromoCodeUsed FROM shopping;
-- Insert data into dimTransaction
INSERT INTO dimTransaction (CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases)
SELECT CustomerID, PaymentMethod, ShippingType, Season, FrequencyOfPurchases FROM shopping;
-- Insert data into factPurchases
INSERT INTO factPurchases (CustomerID, Age, ReviewRating, PreviousPurchases)
SELECT CustomerID, Age, ReviewRating, PreviousPurchases FROM shopping;
Step 5: Business Queries
1. Most Used Frequency of Purchase per Season
SELECT t.Season, t.FrequencyOfPurchases, COUNT(*) AS PurchaseCount
FROM factPurchases f
JOIN dimTransaction t ON f.CustomerID = t.CustomerID
GROUP BY t.Season, t.FrequencyOfPurchases
ORDER BY t.Season, PurchaseCount DESC;
2. Products with High Review Ratings
SELECT AVG(f.ReviewRating) AS AvgRating
FROM factPurchases f
ORDER BY AvgRating DESC;
Setup Guide
- Create Redshift Cluster and configure IAM roles.
- Load Data from S3 into Redshift tables.
- Run the SQL scripts for creating tables and inserting data.
- Execute business queries to generate insights.