Hi, I'm Kalyani Tekade.
A self-driven, quick-starting, and passionate analyst with a curious mind who enjoys solving complex and challenging real-world problems.
About
I’ve always been a curious person, eager to understand the reasoning behind everything. Growing up, I enjoyed solving puzzles and found satisfaction in seeing the complete picture come together. This natural curiosity led me toward a career in data analytics.
I'm currently pursuing my Master’s in Computer Science at Arizona State University, where I'm expanding my skills in Data Analytics and Machine Learning. I bring hands-on experience from Doosan Bobcat, Unilever and Defense Research and Development Organization.
- Languages/Databases: Python, Microsoft SQL Server, MySQL, PostgreSQL, Oracle DB
- Data Analysis and Visualization: Excel, Power BI, NumPy, Pandas, Matplotlib, Seaborn, Plotly
- Machine and Deep Learning Frameworks: Sklearn, Surprise, TensorFlow, PyTorch, Keras, SciPy
- Tools/Software: Microsoft Power Automate, Salesforce, Git, Jira, VS Code, Confluence, Kanban, Apache Spark
- ML Techniques: Classification, Regression, Clustering, Recommendation, Neural Networks
- Certificates:
- Microsoft Certified Power BI Data Analyst Associate
- Google Data Analytics
- Microsoft Certified Office Specialist: Excel
- Project Management
Looking for an opportunity to work in a challenging position that combines my skills in Data Analysis and Machine Learning while providing professional development and personal growth.
Experience
- Wrote a Python script leveraging the Gemma3 LLM to determine whether additional labor claims were attributable to diagnostic time, generating insights that drove policy reformulation with projected savings of over $3M.
- Analyze dealership activity using Excel and Salesforce to identify inactive dealerships, directly driving $150,000 in cost savings.
- Deploy and maintain interactive Power BI dashboards to monitor supplier recovery, enhancing visibility and decision-making processes.
- Develop Python automation scripts that reduce manual image retrieval time by 90% through bulk-processing warranty claim documents via server APIs based on Excel inputs, enabling non-conformance detection.
- Implement an Excel-based application to calculate warranty refund amounts based on product usage, lowering processing time by 70% and saving an estimated 250 labor hours annually.
- Engineer a Dealer Auditing Dashboard using Power BI to streamline the auditing process and enable effective tracking and analysis of non-conformance activities across dealerships, improving audit accuracy by 35% and reducing manual effort.
- Build a Selenium-based web scraping automation tool that extracts, downloads, and consolidates PDFs into centralized repositories, reducing manual document retrieval time by 85% (sketched below).
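For illustration, a minimal sketch of that Selenium-based PDF retrieval pattern; the URL, selector, and output folder are hypothetical stand-ins, not the production tool:

```python
# Minimal sketch: collect every PDF link on a page and download the files.
# The URL and selector are hypothetical placeholders.
import os
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/documents")  # placeholder document index page

os.makedirs("downloads", exist_ok=True)
for link in driver.find_elements(By.CSS_SELECTOR, "a[href$='.pdf']"):
    url = link.get_attribute("href")
    name = os.path.join("downloads", url.rsplit("/", 1)[-1])
    with open(name, "wb") as f:
        f.write(requests.get(url).content)  # consolidate into one local repository

driver.quit()
```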
- Assisting in course preparations and developing class materials.
- Grading assignments, quizzes, and exams meticulously.
- Engaging with students to promptly address questions and concerns, fostering a dynamic and interactive learning environment.
- Monitoring online platforms to maintain academic integrity and discourage unauthorized sharing.
- Conducted data-driven analysis on past student performance using Excel, providing insights to the instructor that helped identify areas where students needed additional support, leading to a 20% improvement in overall class performance.
- Collaborated with cross-functional teams to design and optimize data collection process for the Unilever Technology Portfolio dashboard, leading to a 70% improvement in data standardization by ensuring consistent formats.
- Engineered an Excel system with auto-fill functions and dynamic drop-down menus, increasing data submission by 40% from over 50 global teams.
- Created pivot tables and charts to assign teams red, amber and green statuses based on the data filled, helping managers and stakeholders quickly identify discrepancies in data and take corrective actions.
- Constructed multiple comprehensive mock-up dashboards in Excel for requirement analysis, collaborating with managers and stakeholders, resulting in a 30% improvement in project alignment.
- Created a Power BI dashboard, leveraging the Excel system as a data source and using DAX-calculated fields to track KPIs, reducing manual processing time by 60%.
- Built an end-to-end automated data pipeline using Python script to extract data from multiple sources and consolidate it into a single Excel sheet, reducing Power BI's data processing time by 30% and improving report generation efficiency.
- Accelerated the completion of ad hoc requests using Excel and Python, enabling faster data-driven decision-making across teams.
- Implemented Jira timeline to track data update frequency across teams, offering stakeholders visibility into upcoming data refresh schedules.
- Gathered and centralized experimental results from 20+ raw material batches per month, utilizing SQL to store data in a high-integrity database and reduce retrieval time by 30%.
- Cleaned and performed statistical analysis on consolidated test data using Python and Excel, cutting manual analysis time by 60%.
- Deployed regression models to analyze parameter relationships, identifying trends and patterns that boosted predictive accuracy by 15% (see the sketch after this list).
- Created documentation on workflows and analysis findings using Confluence, lowering repetitive queries by 25% and speeding ad hoc requests.
- Automated onboarding and email notification workflows using Power Automate, saving approximately 15 hours per week.
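As referenced above, a minimal sketch of the regression step; the file name, parameter columns, and outcome variable are hypothetical stand-ins for the actual batch data:

```python
# Minimal sketch: regress a test outcome on process parameters and inspect
# which parameters drive it. File and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("batch_results.csv")               # hypothetical SQL export
X = df[["temperature", "pressure", "cure_time"]]    # placeholder parameters
y = df["tensile_strength"]                          # placeholder outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LinearRegression().fit(X_train, y_train)

print("R^2 on held-out batches:", r2_score(y_test, model.predict(X_test)))
print(dict(zip(X.columns, model.coef_)))            # per-parameter effect sizes
```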
Projects

Personalized book suggestions using a recommendation model
- Tools: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Surprise
- Engineered a content-based recommendation system leveraging a TF-IDF vectorizer to compute book similarities from metadata, producing personalized book suggestions based on user preferences.
- Implemented collaborative filtering techniques, including KNNBasic and SVD using the Surprise library, to predict user-book ratings and provide recommendations across genres.
- Optimized the SVD model through GridSearchCV hyperparameter tuning, achieving a test RMSE of 0.827 (sketched below).
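A minimal sketch of the SVD tuning step with the Surprise library, assuming a ratings file with user_id, book_id, and rating columns (hypothetical names):

```python
# Minimal sketch: tune Surprise's SVD with GridSearchCV and predict a rating.
# The CSV path and column names are hypothetical.
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import GridSearchCV

ratings = pd.read_csv("ratings.csv")
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[["user_id", "book_id", "rating"]], reader)

# 3-fold cross-validated search over a small hyperparameter grid.
param_grid = {"n_factors": [50, 100], "lr_all": [0.002, 0.005], "reg_all": [0.02, 0.1]}
gs = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=3)
gs.fit(data)
print("best RMSE:", gs.best_score["rmse"], "params:", gs.best_params["rmse"])

# Refit the best model on all ratings and predict one user-book pair.
model = gs.best_estimator["rmse"]
model.fit(data.build_full_trainset())
print(model.predict(uid="user_1", iid="book_42").est)
```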

Clustering customer behaviors through RFM Modelling
- Tools: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
- Performed Exploratory Data Analysis and Feature Engineering with Python libraries to create RFM values from raw data and applied log transformation to address data skewness, improving clustering performance by 15%.
- Fine-tuned clustering models (K-means, K-means++, DBSCAN, Agglomerative) to optimize customer clusters, achieving a Silhouette Score of 0.47 (see the sketch after this list).
- Classified customers into four distinct segments (Best, At-Risk, New, Lost) enabling data-driven targeted marketing strategies.
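A minimal sketch of the RFM-plus-K-means pipeline, assuming a transactions table with customer_id, invoice_date, and amount columns (hypothetical names):

```python
# Minimal sketch: build RFM features, log-transform to reduce skew, cluster,
# and score the clustering. File and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

tx = pd.read_csv("transactions.csv", parse_dates=["invoice_date"])
snapshot = tx["invoice_date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("customer_id").agg(
    recency=("invoice_date", lambda d: (snapshot - d.max()).days),
    frequency=("invoice_date", "count"),
    monetary=("amount", "sum"),
)

X = StandardScaler().fit_transform(np.log1p(rfm))   # log1p tames the skew

km = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42)
labels = km.fit_predict(X)
print("silhouette:", silhouette_score(X, labels))
```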

A Chatbot Intent Recognition System
- Tools: Pandas, NumPy, Scikit-learn, TensorFlow, Keras, NLTK, Matplotlib
- Developed a Chatbot Intent Recognition System that leverages machine learning to accurately classify user intents from text inputs.
- Built a machine learning pipeline for training a neural network model with TensorFlow and Keras, designed specifically for intent recognition tasks (see the sketch after this list).
- Utilized clustering techniques to group similar intents and improve the accuracy and efficiency of the chatbot's response system.
- Evaluated the model's performance using various metrics, including accuracy, precision, recall, and F1-score, to ensure high reliability and performance.
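A minimal sketch of that training pipeline; the example utterances and intent labels are placeholders standing in for the real training data:

```python
# Minimal sketch: tokenize utterances, pad them, and train a small Keras
# classifier over intent labels. The example data is a placeholder.
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder

texts = ["hi there", "book a flight", "what's the weather"]   # placeholder utterances
intents = ["greeting", "booking", "weather"]                  # placeholder labels
labels = LabelEncoder().fit_transform(intents)

tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=1000, oov_token="<oov>")
tokenizer.fit_on_texts(texts)
X = tf.keras.preprocessing.sequence.pad_sequences(
    tokenizer.texts_to_sequences(texts), maxlen=20)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=32),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(len(set(intents)), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, labels, epochs=10, verbose=0)
```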

Predictive Modeling for Ride Request Demand
- Tools: Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, TensorFlow/Keras, Pandas Profiling, HTML
- The project aims to forecast ride request demand using historical data.
- An HTML report generated with Pandas Profiling provides a comprehensive overview of the dataset, including descriptive statistics and correlation analysis.
- Various machine learning models, including Decision Trees, Random Forest, Support Vector Machines, and Neural Networks, were trained and evaluated to optimize performance (see the sketch after this list).
- A prediction pipeline is created to process new input data and generate predictions using the trained model.
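A minimal sketch of the forecasting setup, assuming an hourly request series with timestamp and requests columns (hypothetical names):

```python
# Minimal sketch: derive calendar and lag features from the timestamp, split
# chronologically, and fit a Random Forest. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("rides.csv", parse_dates=["timestamp"])
df["hour"] = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek
df["lag_1h"] = df["requests"].shift(1)          # demand one hour earlier
df = df.dropna()

cut = int(len(df) * 0.8)                        # train on the past, test on the recent 20%
features = ["hour", "dayofweek", "lag_1h"]
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(df[features][:cut], df["requests"][:cut])

preds = model.predict(df[features][cut:])
print("MAE:", mean_absolute_error(df["requests"][cut:], preds))
```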

Optimizing Customer Retention with Ensemble Models
- Tools: Pandas, NumPy, Scikit-learn, Matplotlib, SMOTE
- The project involves building a machine learning model to classify and predict customer churn.
- Developed feature engineering and model validation techniques to enhance predictive performance, aiding in better identification of at-risk customers.
- SMOTE (Synthetic Minority Over-sampling Technique) is used to address class imbalance, ensuring the minority class is well-represented (see the sketch after this list).
- Various machine learning models, including Logistic Regression, Random Forest, AdaBoost, and Gradient Boosting, are trained and evaluated using metrics like ROC-AUC and confusion matrices.
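A minimal sketch of the resampling-and-evaluation loop; synthetic data from make_classification stands in for the churn dataset:

```python
# Minimal sketch: oversample the training split with SMOTE, fit one of the
# ensembles, and report ROC-AUC plus the confusion matrix.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

# Imbalanced stand-in for the churn data: roughly 10% positive class.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# Resample only the training split so the test set stays untouched.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = GradientBoostingClassifier(random_state=42).fit(X_res, y_res)
print("ROC-AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
print(confusion_matrix(y_test, clf.predict(X_test)))
```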

A system that matches your resume to the job description
- Tools: Pandas, NumPy, Scikit-learn, Matplotlib, Gensim, NLTK, Flask
- The project aims to build a resume matching system that evaluates how well a candidate's resume matches a given job description.
- The matching algorithm utilizes the Doc2Vec model from Gensim to convert resumes and job descriptions into numerical vectors (see the sketch after this list).
- The model is trained and evaluated using metrics like ROC-AUC and confusion matrices to ensure its accuracy.
- A Flask application is built to provide a user interface for uploading resumes and job descriptions, and to display the matching results in a user-friendly manner.
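A minimal sketch of the Doc2Vec matching step, with two placeholder documents standing in for the real resume/job-description corpus:

```python
# Minimal sketch: train Doc2Vec on a corpus, infer vectors for a resume and
# a job description, and score the match with cosine similarity.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

docs = ["experienced data analyst skilled in python and sql",
        "seeking analyst with sql reporting experience"]       # placeholder corpus
tagged = [TaggedDocument(simple_preprocess(d), [i]) for i, d in enumerate(docs)]

model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

resume_vec = model.infer_vector(simple_preprocess("python sql dashboards"))
jd_vec = model.infer_vector(simple_preprocess("sql reporting and dashboards"))
score = np.dot(resume_vec, jd_vec) / (np.linalg.norm(resume_vec) * np.linalg.norm(jd_vec))
print(f"match score: {score:.2f}")
```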

Forecasting Daily Sales for Rossmann Stores
- Tools: Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib, Flask, Joblib
- The project focuses on predicting sales using a machine learning model.
- The process involves data cleaning to handle missing values, transforming categorical features into numerical values using encoding techniques, and assessing feature importance to identify significant predictors for sales.
- The machine learning models used include Linear Regression, Stochastic Gradient Descent (SGD) Regression, Random Forest Regression, and Decision Tree Regression.
- The Decision Tree model is deployed in a Flask web application, allowing users to input data and receive sales predictions (sketched below).
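A minimal sketch of that deployment pattern; the model file name, route, and feature set are hypothetical:

```python
# Minimal sketch: load a saved Decision Tree with joblib and serve predictions
# over a Flask endpoint. File name and feature names are hypothetical.
import joblib
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.pkl")   # previously saved Decision Tree regressor

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"store": 1, "promo": 1, "dayofweek": 4}
    p = request.get_json()
    features = np.array([[p["store"], p["promo"], p["dayofweek"]]])
    return jsonify({"sales": float(model.predict(features)[0])})

if __name__ == "__main__":
    app.run(debug=True)
```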

Identifying similar images using multiple feature descriptors
- Tools: PyTorch, Torchvision, Pandas, NumPy, PIL, Scikit-image, Matplotlib, Scipy
- The project aims to identify similar images in the Caltech101 dataset using various feature descriptors.
- Color moments (mean, standard deviation, skewness) and histograms of oriented gradients (HOG) are calculated to capture image features.
- These feature descriptors are then used to compute distances (Euclidean, Cosine, Pearson, etc.) between images (see the sketch after this list).
- The system utilizes concurrent processing to enhance efficiency and visualizes the most similar images based on the computed distances.
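A minimal sketch of the descriptor computation on two stand-in images; the resize shape and HOG parameters are illustrative choices, not the project's exact settings:

```python
# Minimal sketch: compute color moments and HOG for two images, then compare
# the combined descriptors with Euclidean and cosine distances.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize
from scipy.spatial.distance import euclidean, cosine

def color_moments(img):
    """Per-channel mean, standard deviation, and skewness of an RGB image."""
    feats = []
    for c in range(3):
        ch = img[..., c].ravel()
        mean, std = ch.mean(), ch.std()
        skew = np.cbrt(((ch - mean) ** 3).mean())   # signed cube root of 3rd moment
        feats += [mean, std, skew]
    return np.array(feats)

def hog_features(img):
    gray = resize(rgb2gray(img), (128, 64))          # illustrative fixed shape
    return hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

img_a = np.random.rand(128, 64, 3)                   # stand-ins for loaded images
img_b = np.random.rand(128, 64, 3)
fa = np.concatenate([color_moments(img_a), hog_features(img_a)])
fb = np.concatenate([color_moments(img_b), hog_features(img_b)])
print("euclidean:", euclidean(fa, fb), "cosine:", cosine(fa, fb))
```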
Skills
Languages and Databases
Python, Microsoft SQL Server, MySQL, PostgreSQL, Oracle DB

Data Analysis and Visualization
Excel, Power BI, NumPy, Pandas, Matplotlib, Seaborn, Plotly

Machine and Deep Learning Frameworks
Sklearn, Surprise, TensorFlow, PyTorch, Keras, SciPy
Education
Arizona State University
Tempe, USA
Degree: Master of Science in Computer Science
CGPA: 4.0/4.0
Relevant Coursework:
- Data Mining
- Statistical Machine Learning
- Data Processing at Scale
- Time Series Analysis/Forecasting
- Multimedia and Web Databases
Ramaiah Institute of Technology
Bangalore, India
Degree: Bachelor of Engineering in Electronics and Communication Engineering
CGPA: 9.33/10
Relevant Coursework:
- Machine and Deep Learning
- Data Structures using C++
- Fundamentals of Computing
- Cryptography, Network and Cyber Security
- Information, Learning and Inference