From Data to Dashboards: Building an EC2 Cost Analysis Tool with Flask and AWS S3

Disclaimer: This article was previously published on Medium.com; link to article: https://medium.com/@alex.curtis_luit/from-data-to-dashboards-building-an-ec2-cost-analysis-tool-with-flask-and-aws-s3-4c0d312ea38f
If you don’t have your own magic hat but need to analyze 100,000 EC2 instances and provide recommendations… follow along!
This is how to decode, decipher, and recommend changes that will save money on your business’s 100K-instance EC2 operation. No magic hat?! No problem… enter Python!
This article explores how to build a Flask-based web application that analyzes EC2 cost data, generates insightful visualizations, and leverages AWS S3 for storage and retrieval.
The Application’s Core Functionality:
Our application aims to provide a user-friendly interface for analyzing EC2 cost data. It performs the following key tasks (tied together end to end in the pipeline sketched after this list):
1️⃣Data Ingestion: Reads EC2 cost data from a CSV file using Pandas.
2️⃣Data Analysis: Performs various cost-related analyses, such as calculating total costs, average costs per instance type, and potential savings.
3️⃣Visualization: Generates visualizations using Matplotlib and Seaborn to represent the analysis results.
4️⃣Storage: Uploads the analysis results and visualizations to AWS S3.
5️⃣Retrieval: Allows users to download the analysis results and view visualizations through a web interface.
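Before diving into the individual snippets, here is how these pieces fit together. This is a minimal sketch, assuming the analyze_ec2_costs(), generate_visualizations(), and upload_to_s3() helpers shown below and a hypothetical ec2_costs.csv export file:
import logging
import pandas as pd

def run_cost_pipeline(csv_path="ec2_costs.csv"):
    # 1. Ingestion: load the raw EC2 cost export into a DataFrame
    df = pd.read_csv(csv_path)
    # 2. Analysis: compute totals, averages, and potential savings
    analysis = analyze_ec2_costs(df)
    # 3. Visualization: write the PNG charts to the working directory
    generate_visualizations(df)
    # 4. Storage: push the charts and the source data to S3
    for artifact in ["instance_type_distribution.png", csv_path]:
        upload_to_s3(artifact, object_name=artifact)
    logging.info(f"Pipeline finished with {len(analysis)} metrics computed")
    return analysis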
Key Components and Code Snippets:
- Data Analysis with Pandas:
We use Pandas to read and process the EC2 cost data. The analyze_ec2_costs() function performs the core analysis:
import logging

def analyze_ec2_costs(df):
    analysis = {}
    if df.empty:
        logging.error("DataFrame is empty for analysis. Analysis aborted.")
        return analysis
    # Normalize column names: lowercase and strip spaces and punctuation
    df.columns = [str(col).strip().lower().replace(" ", "").replace("$", "").replace("(", "").replace(")", "").replace("%", "") for col in df.columns]
    # ... (rest of the analysis logic)
    return analysis
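The elided analysis logic depends on the columns in your cost export. As an example of the kind of metrics the function could compute — the column names monthlycost, instancetype, region, and cpuutilization are assumptions about the normalized headers, not the actual schema — a standalone sketch:
import pandas as pd

def summarize_costs(df):
    # Assumed normalized columns: instancetype, region, monthlycost, cpuutilization
    analysis = {}
    df["monthlycost"] = pd.to_numeric(df["monthlycost"], errors="coerce").fillna(0)
    analysis["total_monthly_cost"] = float(df["monthlycost"].sum())
    analysis["avg_cost_per_instance_type"] = (
        df.groupby("instancetype")["monthlycost"].mean().round(2).to_dict()
    )
    analysis["cost_per_region"] = df.groupby("region")["monthlycost"].sum().to_dict()
    # Rough savings estimate: treat instances averaging under 5% CPU as stoppable
    if "cpuutilization" in df.columns:
        cpu = pd.to_numeric(df["cpuutilization"], errors="coerce")
        analysis["potential_savings"] = float(df.loc[cpu < 5, "monthlycost"].sum())
    return analysis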
- Visualization with Matplotlib and Seaborn:
Visualizations are generated using Matplotlib and Seaborn. The generate_visualizations() function creates plots for instance type distribution, cost per region, CPU utilization, and recommendation breakdown:
import matplotlib.pyplot as plt
import seaborn as sns
import os

def generate_visualizations(df):
    # ... (data preprocessing)
    plt.figure(figsize=(10, 6))
    sns.countplot(y="instancetype", data=df,
                  order=df["instancetype"].value_counts().index[:10],
                  palette="pastel", hue="instancetype", legend=False)
    plt.savefig("instance_type_distribution.png")
    # ... (other visualizations)
    plt.close()
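The other charts follow the same pattern: aggregate, plot, save, close. For example, a cost-per-region chart could look like the sketch below (reusing the plt and sns imports above; the region and monthlycost column names are assumptions about the normalized data):
def plot_cost_per_region(df):
    # Total monthly cost per region, largest first, limited to the top 10
    costs = df.groupby("region")["monthlycost"].sum().sort_values(ascending=False).head(10)
    plt.figure(figsize=(10, 6))
    sns.barplot(x=costs.values, y=costs.index, color="steelblue")
    plt.xlabel("Total monthly cost ($)")
    plt.ylabel("Region")
    plt.title("Top 10 regions by EC2 cost")
    plt.tight_layout()
    plt.savefig("cost_per_region.png")
    plt.close()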
- AWS S3 Integration:
We use the boto3 library to interact with AWS S3. The upload_to_s3() function uploads files to our S3 bucket:
import boto3
import logging

S3_BUCKET_NAME = "alexas-ec2-cost-analysis-bucket"
S3_REGION = "us-east-1"

def upload_to_s3(file_name, object_name=None):
    # Default the S3 key to the local file name when none is supplied
    if object_name is None:
        object_name = file_name
    s3_client = boto3.client('s3', region_name=S3_REGION)
    try:
        logging.info(f"Attempting to upload '{file_name}' to S3: {S3_BUCKET_NAME}/{object_name}")
        s3_client.upload_file(file_name, S3_BUCKET_NAME, object_name)
        logging.info(f"File '{file_name}' uploaded to S3: {S3_BUCKET_NAME}/{object_name}")
    except Exception as e:
        logging.error(f"Error uploading file '{file_name}' to S3: {e}")
        print(f"Upload error: {e}")
        print(f"File path that was attempted to upload: {file_name}")
        print(f"S3 path that was attempted to use: {S3_BUCKET_NAME}/{object_name}")
- Flask Web Interface:
We use Flask to create a web interface for our application. The /run_analysis route triggers the analysis and visualization generation:
from flask import Flask, render_template, send_from_directory
import os

app = Flask(__name__)

@app.route("/run_analysis")
def run_analysis():
    # ... (read data, analyze, generate visualizations, upload to S3)
    return "Analysis completed and results uploaded to S3."

@app.route("/visualizations/<filename>")
def get_visualization(filename):
    # ... (download visualization from S3 and return)
    return send_from_directory(".", local_filename, as_attachment=False)
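The elided body of get_visualization() can be filled in along these lines — a minimal sketch that reuses the boto3 import and bucket constants from the S3 section and simply pulls the requested chart down before serving it:
@app.route("/visualizations/<filename>")
def get_visualization(filename):
    local_filename = os.path.join(".", filename)
    # Download the requested chart from S3 into the working directory
    s3_client = boto3.client("s3", region_name=S3_REGION)
    s3_client.download_file(S3_BUCKET_NAME, filename, local_filename)
    # send_from_directory guards against serving paths outside the directory
    return send_from_directory(".", filename, as_attachment=False)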
- Matplotlib Backend Configuration:
To ensure compatibility with our Flask application, we explicitly set Matplotlib to use the “Agg” backend. The backend should be selected before pyplot is imported anywhere in the app:
import matplotlib
matplotlib.use('Agg')  # Force Matplotlib to use the Agg backend
- Results:
A password-protected S3 static website that serves as the repository for all of the information we need, with the ability to query and download the results, including the source .csv and the visualization charts.
Key Considerations:
1️⃣Security: Password protection and secure storage of sensitive data are crucial.
2️⃣Scalability: Consider using asynchronous tasks or a message queue for long-running analysis (a minimal sketch follows this list).
3️⃣Error Handling: Implement robust error handling to gracefully handle exceptions and provide informative messages.
4️⃣User Experience: Design a user-friendly interface that provides clear and concise information.
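As a simple illustration of the scalability point, the long-running analysis can be pushed off the request thread with Python’s built-in ThreadPoolExecutor — a minimal sketch, assuming the Flask app from earlier and the hypothetical run_cost_pipeline() helper sketched near the top of this article (a real deployment would more likely use Celery or an SQS-backed worker):
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

@app.route("/run_analysis_async")
def run_analysis_async():
    # Start the analysis in the background and return immediately
    executor.submit(run_cost_pipeline)
    return "Analysis started; results will appear in S3 when complete.", 202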
This application provides a foundation for building a comprehensive EC2 cost analysis tool. By leveraging the power of Pandas, Matplotlib, and AWS S3, we can create a powerful and insightful application, and we get to use our magic hat!
GitHub link: Python_Applications/EC2_Costs.py at main · alexcurtis1969/Python_Applications