My expertise with Python allows me to handle diverse and complex data challenges with precision and efficiency. Here are some of the ways I leverage Python in my work:
Data Analysis and Manipulation: Utilizing Pandas and NumPy for advanced data cleaning, manipulation, and analysis, ensuring high data quality and integrity.
Data Visualization: Creating compelling and interactive visualizations using libraries like Matplotlib, Seaborn, and Plotly to uncover insights and tell data-driven stories.
Machine Learning: Building, training, and deploying machine learning models with scikit-learn, TensorFlow, and PyTorch for predictive analytics and intelligent systems.
Statistical Analysis: Applying statistical methods to understand data distributions, trends, and patterns, providing deeper insights into datasets.
Automation and Scripting: Automating repetitive tasks and workflows with custom Python scripts, enhancing productivity and efficiency.
API Integration: Integrating with various APIs to fetch, process, and analyze real-time data, enabling dynamic and responsive applications.
Data Pipelines: Designing and implementing data pipelines using frameworks like Apache Airflow to streamline data processing and ensure seamless data flow.
Algorithm Development: Developing custom algorithms to solve complex problems and optimize processes.
Object-Oriented Programming (OOP): Writing clean, modular, and reusable code leveraging OOP principles.
Project Overview
This project involved validating and analyzing sales data using Python to understand customer interactions and revenue distribution over time. The goal was to determine the most effective sales strategy and provide actionable recommendations based on data insights.
Objective
To validate the dataset, clean the data, analyze revenue trends over time for different sales methods, and evaluate the impact of customer loyalty on revenue.
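A minimal, hypothetical sketch of the kind of pandas workflow this objective implies; the file name and the column names (sales_method, revenue, week, years_as_customer) are assumptions for illustration rather than the project's actual schema.

```python
import pandas as pd

# Hypothetical file and column names -- the project's real schema is not reproduced here.
sales = pd.read_csv("product_sales.csv")

# Validation: inspect types, missing values, and inconsistent category labels.
print(sales.dtypes)
print(sales.isna().sum())
print(sales["sales_method"].value_counts())

# Cleaning: normalize the sales-method labels and drop rows with no recorded revenue.
sales["sales_method"] = sales["sales_method"].str.strip().str.lower()
sales = sales.dropna(subset=["revenue"])

# Revenue trend over time for each sales method.
weekly_revenue = sales.groupby(["week", "sales_method"])["revenue"].sum().unstack()
print(weekly_revenue)

# Effect of customer loyalty (tenure in years) on revenue.
print(sales.groupby("years_as_customer")["revenue"].mean())
```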
Methods
Results
1. Data Validation:
2. Customer Distribution:
3. Revenue Distribution:
Business Metric:
Insights
Conclusion
The analysis validates the dataset, highlights customer engagement through different sales methods, and uncovers revenue distribution patterns. The Email + Call method shows the highest revenue potential, emphasizing its effectiveness.
Recommendations
Project Overview
This project focuses on exploring stock prices and assessing the stock market performance of six U.S. financial institutions, using visualization techniques while building proficiency with pandas. The packages used are pandas, NumPy, Matplotlib, Seaborn, datetime, pandas_datareader, Plotly, and cufflinks. The project starts with distplots and concludes with heatmaps and graphs produced using Plotly. The data reader was configured with the stock tickers, using Stooq as the data source for compatibility reasons; 'start' and 'end' refer to the beginning and end dates of the data range.
Objective
To analyze stock prices, assess market performance, and visualize the results to understand the trends and risks associated with six U.S. financial institutions.
Methods
Data Preparation:
Generated a list comprising all the tickers.
Created an empty DataFrame to store returns.
Applied the pct_change() method from pandas to the 'Close' column to calculate return values.
Constructed a for loop to cycle through each bank's stock ticker and create a corresponding returns column, as sketched below.
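A minimal sketch of this preparation step, assuming an illustrative date range and the usual tickers for the six banks; the exact symbols and dates used in the project may differ.

```python
import datetime

import pandas as pd
import pandas_datareader.data as web

# Illustrative tickers for the six banks and an assumed date range.
tickers = ["BAC", "C", "GS", "JPM", "MS", "WFC"]
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2016, 1, 1)

# Pull each bank's prices from Stooq; sort_index() because Stooq serves newest rows first.
bank_stocks = {
    tick: web.DataReader(tick, "stooq", start, end).sort_index() for tick in tickers
}

# Empty DataFrame to hold the daily returns of every ticker.
returns = pd.DataFrame()

# Cycle through each ticker and create a returns column from the 'Close' prices.
for tick in tickers:
    returns[tick + " Return"] = bank_stocks[tick]["Close"].pct_change()

print(returns.head())
```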
Visualization:
Utilized distplots to visualize data distributions.
Generated heatmaps and graphs with Plotly for enhanced visual insights.
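A hedged sketch of these plots, reusing the returns DataFrame from the preparation step above; seaborn's distplot has since been deprecated, so histplot stands in for it here, and plotly.express stands in for the cufflinks wrapper.

```python
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns

# Distribution of one bank's daily returns (histplot with a KDE overlay
# in place of the older distplot).
sns.histplot(returns["MS Return"].dropna(), kde=True, bins=100)
plt.title("Morgan Stanley daily returns")
plt.show()

# Correlation heatmap of the banks' daily returns, rendered with Plotly.
fig = px.imshow(returns.corr(), text_auto=".2f", title="Return correlations")
fig.show()
```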
Event Analysis:
Observed notable swings in returns around January 20th, potentially linked to the inauguration of Barack Obama, indicating that real-world events can influence the stock market.
Risk Assessment:
Analyzed the standard deviation of stock returns to identify the most and least risky stocks.
Conducted a year-specific analysis for 2015 to compare the risk profiles of different banks.
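A short sketch of the two risk checks, again reusing the returns DataFrame built above:

```python
# Overall risk: standard deviation of each bank's daily returns
# (a larger value means a more volatile, riskier stock).
print(returns.std().sort_values(ascending=False))

# Year-specific comparison: the same statistic restricted to 2015.
print(returns.loc["2015-01-01":"2015-12-31"].std().sort_values(ascending=False))
```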
Results
Inauguration Effect: Minimum returns were observed on January 20th, likely due to political shifts.
JPM Return Analysis: The highest and lowest values were close in time, suggesting real-world events might not always significantly influence stock movements.
Standard Deviation Analysis:
Citigroup was identified as the riskiest stock, with a standard deviation of 0.038.
Goldman Sachs emerged as the most stable stock with a standard deviation of 0.025.
In 2015, Wells Fargo had the least risk (0.0125), while Morgan Stanley and Bank of America were the most vulnerable with standard deviations of 0.0161 and 0.0160, respectively.
Conclusion
The analysis of stock returns and volumes reveals common trends among six U.S. financial institutions:
Significant minimum returns occurred on January 20th, coinciding with the presidential inauguration.
The highest and lowest returns for JPM were close in time, indicating a limited impact from the inauguration day.
Standard deviation analysis shows Citigroup as the riskiest and Goldman Sachs as the most stable stock.
The risk profiles of banks were similar in 2015, with variations in specific institutions' risk levels.
Historical data highlighted the impact of the 2008-2009 economic downturn on Goldman Sachs and Citigroup more than other banks.
A detailed 2015 analysis showed rising stock prices from March to August, followed by a sharp decline in Morgan Stanley and Bank of America.
This project demonstrates the importance of understanding stock market trends and the influence of external events on financial institutions.
Project Overview
This project utilizes a dataset sourced from emergency calls to 911, consisting of 99,493 rows and 9 columns. The data reveals that the top five townships for 911 calls are Lower Merion, Abington, Norristown, Upper Merion, and Cheltenham.
Objective
To analyze the 911 call data to identify patterns and trends, helping to understand the distribution and frequency of emergency calls.
Methods
Data Extraction: Utilized lambda functions to extract and categorize call reasons, with primary categories being EMS (Emergency Medical Services), Fire, and Traffic. The most frequent reasons identified were EMS (48,877 calls), Traffic (35,695 calls), and Fire (14,920 calls).
Data Visualization: Used Seaborn to create joint plots and lmplots. The lmplot showed a decline in the number of calls over the months, though its linear fit extended the axis to a nonexistent month 14, since Seaborn does not account for there being only twelve months (sketched below).
Heatmap Creation: Restructured the data frame using the unstack method to have Hours as columns and Days of the week as the index, facilitating the creation of a heatmap.
Cluster Map: Created cluster maps to show the contrast in call volumes between different days and hours.
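A hedged sketch of the steps above, assuming the standard Kaggle 911 schema with title and timeStamp columns; seaborn's current API is used where the older one has been deprecated.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("911.csv")

# Extract the reason category (EMS / Fire / Traffic) from titles such as "EMS: BACK PAINS/INJURY".
df["Reason"] = df["title"].apply(lambda title: title.split(":")[0])
print(df["Reason"].value_counts())

# Derive hour, day of week, and month from the timestamp column.
df["timeStamp"] = pd.to_datetime(df["timeStamp"])
df["Hour"] = df["timeStamp"].dt.hour
df["Day of Week"] = df["timeStamp"].dt.day_name()
df["Month"] = df["timeStamp"].dt.month

# Calls per month; an lmplot's linear fit extends the axis past month 12,
# which is where the "month 14" artifact comes from.
by_month = df.groupby("Month").count()["title"].reset_index()
sns.lmplot(x="Month", y="title", data=by_month)
plt.show()

# Restructure with unstack: days of the week as the index, hours as the columns.
day_hour = df.groupby(["Day of Week", "Hour"]).size().unstack()

# Heatmap and cluster map of call volume by day and hour.
sns.heatmap(day_hour, cmap="viridis")
plt.show()
sns.clustermap(day_hour, cmap="viridis")
plt.show()
```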
Results
Call Frequency Analysis: The heatmap showed a decrease in calls between 12 am and 5 am, correlating with typical sleep hours. Most calls occurred during the daytime, with fewer calls on Sundays and Saturdays.
High-Volume Days: The data indicated high call volumes on Mondays, Wednesdays, and Tuesdays, as well as on Fridays between 8 am and 5 pm. There was also a noticeable decrease in call volume on weekend mornings.
Seasonal Trends: An increase in call frequency during the summer months compared to the winter months was observed.
Conclusion
By analyzing the 911 call data, we identified specific hours and days with the highest call frequencies. Most emergency calls occur during the day and on weekdays, with a significant drop during late-night hours and weekends. The seasonal spike in call frequency during the summer months highlights the need for increased emergency services during this period. This analysis provides valuable insights for optimizing emergency response strategies and resource allocation.