With over 4 years of experience in data analytics and applied machine learning, I specialize in transforming complex data into meaningful insights that inform strategy and drive impact. My work spans various domains, supporting teams in solving business challenges through data. Feel free to explore some of my recent work.
I hold a Master of Science in Applied Data Science from Syracuse University, where I focused on data-driven problem solving, AI systems, and scalable analytical solutions. My background combines consulting, research, and product-oriented work, enabling me to bridge the gap between data science and real-world outcomes.
I’m particularly interested in how AI and machine learning can be applied responsibly and effectively across industries, from building smarter tools to enabling better decisions. I’m always open to opportunities where data, innovation, and impact intersect.
B.Tech. in Electronics & Communication Engineering
9.15/10
Snagged a full-tuition scholarship 💰 for grad school
Proud Recipient of the Graduate Student Excellence Award
Honored to receive the Master’s Degree Award for academic excellence and research contributions in the Applied Data Science program at Syracuse University. This recognition is awarded to one graduating student each year and reflects my commitment to impactful, innovation-driven work at the intersection of data science and real-world problem solving. Watch me receive the award ⬇️
Badges Earned
These badges show my ongoing efforts to keep learning and growing. Each one represents a new skill I’ve picked up or a course I’ve completed. They’re not just symbols; they’re proof of my dedication to staying updated and improving in my field. From learning new tools to exploring advanced analytics, these badges highlight my passion for continuous improvement.
Currently working as a Senior Research Analyst at the School of Information Studies, Syracuse University, where I contribute to AI-focused research, exploring real-world applications of emerging technologies and helping to shape innovative, data-driven solutions.
Inferenz
As a Jr. AI/ML Engineer (Intern) at Inferenz, I contributed to the development of scalable, production-ready AI applications with a focus on NLP and generative AI. My work involved optimizing conversational systems and integrating Retrieval-Augmented Generation (RAG) with enterprise-grade infrastructure.
AI-Powered Chatbot Development
Redesigned the chatbot architecture using asynchronous API calls and parallel processing in Snowflake, resulting in faster task handling and improved performance across key NLP workflows.
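A minimal sketch of the fan-out pattern behind that redesign, using asyncio and aiohttp; the endpoint URL, payload shape, and helper names are hypothetical stand-ins, not the production Snowflake setup.

import asyncio
import aiohttp

API_URL = "https://example.com/v1/chat"  # placeholder endpoint, not the real service

async def fetch_completion(session: aiohttp.ClientSession, prompt: str) -> str:
    # One request; errors surface immediately instead of failing silently.
    async with session.post(API_URL, json={"prompt": prompt}) as resp:
        resp.raise_for_status()
        data = await resp.json()
        return data.get("text", "")

async def handle_batch(prompts: list[str]) -> list[str]:
    # Fan out all requests concurrently instead of awaiting them one by one.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_completion(session, p) for p in prompts]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    answers = asyncio.run(handle_batch(["Summarize Q1 sales.", "List open tickets."]))
    print(answers)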
Prompt Engineering & Token Efficiency
Applied context pruning and advanced prompt design to reduce unnecessary token consumption. Achieved more efficient interactions while preserving the quality of model outputs.
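The pruning idea, sketched as a fixed token budget applied to conversation history; the budget, encoding choice, and function names are assumptions for illustration only.

import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")
TOKEN_BUDGET = 1024  # hypothetical context budget

def count_tokens(text: str) -> int:
    return len(ENC.encode(text))

def prune_context(turns: list[str], budget: int = TOKEN_BUDGET) -> list[str]:
    # Walk the history backwards so the newest turns survive first.
    kept, used = [], 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))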
RAG System Implementation
Deployed a RAG pipeline combining OpenAI with a Snowflake-hosted vector store. Integrated semantic search using embeddings to ground responses with contextual accuracy and up-to-date information.
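A minimal retrieval sketch under stated assumptions: OpenAI embeddings ranked by cosine similarity, with a plain in-memory list standing in for the Snowflake-hosted vector store.

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Rank documents by cosine similarity to the query embedding.
    doc_vecs = embed(docs)
    q_vec = embed([query])[0]
    sims = (doc_vecs @ q_vec) / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    top = np.argsort(sims)[::-1][:k]
    return [docs[i] for i in top]  # prepend these to the prompt to ground the answer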
Accuracy & Performance Tuning
Improved chatbot accuracy by leveraging LangChain caching, parallel document handling, and Snowflake’s scalable compute engine—boosting overall response reliability and speed.
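The caching idea shown in plain Python rather than LangChain's own cache classes, so the sketch stays self-contained; call_model is a stand-in for the real chain.

import hashlib

_cache: dict[str, str] = {}

def cached_call(prompt: str, call_model) -> str:
    # Identical prompts skip the model call entirely; only misses pay latency.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]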
Impact Delivered
40%
Reduction in Response Time
28%
Decrease in Token Usage
25%
Improvement in Chatbot Accuracy
✔
Productionized RAG in Snowflake
Tredence Inc.
As an Analytics Consultant at Tredence Inc., I worked with Unilever to deliver data-driven strategies that improved market expansion, store performance, and data pipeline scalability. My role involved forecasting demand, optimizing store placements, and streamlining ETL workflows using cloud and distributed computing platforms.
Store Expansion & Market Intelligence
Forecasted demand and identified optimal locations for 1,200 Unilever stores across 16 markets using geospatial analytics, Power BI, and demographic analysis—contributing over $1M in annual revenue gains through delivery efficiency.
Performed spatial analysis using foot traffic, competitor proximity, and market data to increase profitability by 15% and store visibility by 22% (a simplified scoring sketch follows).
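A hypothetical site-scoring sketch in pandas: normalize a few signals and blend them into one rank. The column names and weights are illustrative, not the actual Unilever model.

import pandas as pd

def score_sites(df: pd.DataFrame) -> pd.DataFrame:
    # Scale each signal to [0, 1] so the weights are comparable.
    norm = (df - df.min()) / (df.max() - df.min())
    scored = df.assign(
        score=0.5 * norm["foot_traffic"]
        + 0.3 * norm["demand_forecast"]
        + 0.2 * norm["competitor_distance_km"]  # farther from rivals scores higher
    )
    return scored.sort_values("score", ascending=False)

candidates = pd.DataFrame({
    "foot_traffic": [1200, 800, 2300],
    "demand_forecast": [410, 520, 390],
    "competitor_distance_km": [0.4, 2.1, 1.3],
})
print(score_sites(candidates))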
ETL Pipeline Development & Automation
Designed scalable ETL pipelines in Databricks and PySpark, integrating and cleaning 52+ CSV data sources to ensure consistent, high-quality data for downstream analytics.
Automated data validation and quality monitoring through custom alert workflows, reducing manual checks by 80% while ensuring schema compliance and data integrity (a PySpark sketch follows).
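A simplified PySpark sketch of that load-validate-quarantine pattern; the paths, schema, and quality rules are assumptions.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("etl_sketch").getOrCreate()

# Explicit schema so malformed files fail loudly instead of inferring wrong types.
schema = StructType([
    StructField("store_id", StringType(), False),
    StructField("sales", DoubleType(), True),
])

df = (
    spark.read.csv("data/*.csv", header=True, schema=schema)
    .dropDuplicates(["store_id"])
)

# Basic validation: null keys or negative sales are routed to a quarantine set.
bad = df.filter(F.col("store_id").isNull() | (F.col("sales") < 0))
good = df.subtract(bad)
print(f"{bad.count()} rows quarantined, {good.count()} rows passed")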
Dashboarding & Reporting
Built Power BI dashboards with DAX-based KPIs to monitor data completeness, consistency, and accuracy, achieving 98% data coverage and improving decision-making efficiency.
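The production KPIs were DAX measures inside Power BI; the same completeness metric expressed in pandas, for illustration:

import pandas as pd

def completeness(df: pd.DataFrame) -> pd.Series:
    # Share of non-null values per column, as a percentage.
    return (df.notna().mean() * 100).round(2)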
Cloud Optimization & Performance Tuning
Used Azure Data Factory and Databricks to optimize data ingestion and scheduling workflows, reducing pipeline latency by 35% while ensuring seamless scalability.
Performed root cause analysis using SQL and profiling tools to diagnose and fix data pipeline bottlenecks, achieving a 20% improvement in processing time (a timing sketch follows).
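The diagnosis itself relied on SQL and profiling tools; a generic Python sketch of the stage-timing idea, with made-up stage names:

import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

with timed("extract"):
    time.sleep(0.1)  # stand-in for the real extract step
with timed("transform"):
    time.sleep(0.2)  # stand-in for the real transform step

# Slowest stages first: these are the bottleneck candidates.
for stage, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{stage}: {secs:.2f}s")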
Agile Project Delivery
Led a team of 3 analysts using Jira and Azure Project Management to execute sprint-based development, ensuring agile delivery and cross-functional collaboration with Unilever stakeholders.
Impact Delivered
$1M+
Revenue from Store Optimization
15%
Increase in Store Profitability
80%
Reduction in Manual QA Checks
35%
Pipeline Latency Reduction
Cognizant
During my internship at Cognizant, I gained foundational experience in backend data handling and front-end web development. I contributed to internal tools by managing databases and creating web interfaces while sharpening my problem-solving and collaborative skills.
Database Management
Developed and maintained efficient relational databases to support project requirements, focusing on scalability and performance.
Implemented optimized data retrieval and transformation logic using SQL queries to enhance data processing speed (see the indexing sketch after this list).
Worked with PostgreSQL and MySQL to ensure secure and structured data handling across modules.
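A small illustration of the indexing idea, run with sqlite3 for portability (PostgreSQL and MySQL in practice); the table and column names are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"
print("before:", conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())  # full scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print("after: ", conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())  # index search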
Web Development
Built responsive, interactive web pages using HTML and CSS, improving user experience for internal tools.
Applied mobile-first design principles to ensure accessibility and performance across multiple screen sizes and browsers.
Impact Delivered
Seamless
Back-End Integration for Internal Tools
Accelerated
Transition from Theory to Practical Development
Campus Leadership
Recitation Lead – IST 195
Selected to lead and mentor a class of over 100 undergraduate students for the course "Information Technologies."
Delivered weekly sessions simplifying technical concepts, assisted in exam prep, and served as a bridge between students and faculty.
Board Member – University Conduct Board
Appointed to Syracuse University's Conduct Board to review student conduct cases and uphold the institution’s values of fairness, integrity, and accountability. Worked closely with administration to ensure due process and equitable resolution.
What Others Say
Jeff Rubin
SVP & Chief Digital Officer, Syracuse University
“Shashank consistently went above and beyond as a recitation lead — delivering on time, enhancing the student experience, and contributing meaningfully to the class culture. A natural leader that any team would benefit from.”
Jeff Saltz
Professor, School of Information Studies, Syracuse University
“Shashank is a smart, curious, and hardworking student who consistently goes above and beyond. In our Generative AI class, he led his team in building an impressive chatbot and actively supported his peers — a true reflection of why he earned the Graduate Student Excellence Award.”
Scott Bryan
President & CEO, Macronomics Inc. & Advisor, E78 Partners
“Shashank is a brilliant, driven, and highly skilled data science consultant with a rare ability to turn complex ideas into impactful solutions. His work ethic, leadership, and collaborative mindset make him an asset to any team. I highly recommend him, he will exceed expectations and deliver outstanding results.”
Keval R Menon
Senior Manager (Analytics), Tredence Inc.
“Shashank brought deep analytical thinking, technical expertise, and strong leadership to our data science team. He took initiative on high-impact projects, automated complex pipelines, and consistently delivered results under pressure, all with clarity, ownership, and professionalism.”
Rahul Kumar
Manager, Tredence Inc.
“Shashank has a sharp analytical mind and a knack for solving complex problems. His solutions consistently exceeded expectations, and his collaborative nature made him a valuable asset to the team.”
Archana Mishra
Associate Manager, Tredence Inc.
“From technical execution to research passion, Shashank stood out across projects. His performance on the Unilever initiative and award-winning delivery reflect his excellence and commitment to impact.”
This project focuses on predicting injury risk in basketball players by analyzing their performance and physiological metrics. The model is designed to support sports scientists and trainers in minimizing injury occurrences through early detection and intervention.
Dataset: 2,604 records of 14 players (Jan–Dec 2023), containing performance stats, muscle imbalance data, and injury logs.
Data Analysis: Explored injury trends, positional risk factors, and muscle imbalance patterns using visualizations and statistical methods (p-values, correlations).
Injury Prediction Model:
Random Forest Classifier: Achieved high recall (0.98) for injured players and an AUC of 0.90.
Risk Scoring: Players categorized into Very Low, Low, Moderate, and High risk based on prediction scores (see the sketch after this list).
Key Insights: Muscle imbalances, especially in the hamstring-to-quad and calf regions, were strong predictors. Guards showed the highest average risk.
Challenges: High class imbalance and sparse injury-related fields, requiring careful handling and domain-specific feature engineering.
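A minimal scikit-learn sketch of the modeling and risk-binning steps, trained on synthetic data; the features, class-weighting choice, and tier cut-offs are illustrative, not the project's exact configuration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                            # stand-in performance features
y = (X[:, 0] + rng.normal(size=500) > 1.2).astype(int)   # rare "injured" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" counteracts the heavy class imbalance noted above.
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
tiers = np.select(
    [proba < 0.25, proba < 0.5, proba < 0.75],
    ["Very Low", "Low", "Moderate"],
    default="High",
)
print(list(zip(proba[:5].round(2), tiers[:5])))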
Anomaly Detection in Auxiliary Power Unit (APU) of Metro Trains
This project focuses on detecting anomalies in the Auxiliary Power Unit (APU) of metro trains using sensor data. The goal is to enable predictive maintenance, enhance system reliability, and minimize downtime by identifying potential failures early.
Dataset: MetroPT dataset with 1,516,948 rows and 17 columns (February to August 2020).
Data Preprocessing: Schema definition, data cleaning, and exploratory data analysis (EDA) using correlation heatmaps and temporal analysis.
Anomaly Detection Techniques:
K-Means Clustering: Identified normal (Cluster 0) and anomalous (Cluster 1) operations.
LSTM Autoencoder: Detected anomalies based on reconstruction error (95th percentile threshold); a compact sketch follows this list.
Key Results: Anomalies peaked during early morning hours (2 AM - 5 AM) and aligned with recorded failure events.
Challenges: Implementing Isolation Forest and One-Class SVM with PySpark and determining anomaly thresholds.
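A compact Keras sketch of the LSTM-autoencoder approach on a synthetic signal; the window size, layer widths, and training settings are illustrative.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW, FEATURES = 30, 1
signal = np.sin(np.linspace(0, 60, 2000)).reshape(-1, 1)  # stand-in sensor stream
windows = np.stack([signal[i:i + WINDOW] for i in range(len(signal) - WINDOW)])

model = keras.Sequential([
    layers.Input(shape=(WINDOW, FEATURES)),
    layers.LSTM(32),                                  # encode the window
    layers.RepeatVector(WINDOW),                      # expand back to sequence length
    layers.LSTM(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(FEATURES)),   # reconstruct each timestep
])
model.compile(optimizer="adam", loss="mse")
model.fit(windows, windows, epochs=3, batch_size=64, verbose=0)

errors = np.mean((model.predict(windows, verbose=0) - windows) ** 2, axis=(1, 2))
threshold = np.percentile(errors, 95)                 # the 95th-percentile rule above
print(f"{(errors > threshold).sum()} windows flagged above {threshold:.4f}")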
COMPASS is an AI-powered university guidance system that helps international students find and track university programs, living expenses, and career opportunities in the United States. The system provides personalized recommendations based on user preferences and maintains an interactive chat interface for queries about universities, costs, weather, and job prospects.
Personalized University Recommendations: Based on field of study, budget, location, and weather preferences (a filtering sketch follows this list).
Interactive Chat Interface: Ask about university programs, living expenses, weather conditions, and job market trends.
Application Tracking: Manage applications, deadlines, and document requirements with downloadable templates.
Resource Generation: Generate application checklists in DOCX format and CSV templates for tracking.
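A hypothetical sketch of the recommendation filter in pandas; the column names, thresholds, and ranking choice are illustrative only.

import pandas as pd

def recommend(programs: pd.DataFrame, field: str, max_budget: int, climate: str) -> pd.DataFrame:
    # Narrow by the user's stated preferences, then surface the cheapest matches.
    match = programs[
        (programs["field"] == field)
        & (programs["annual_cost_usd"] <= max_budget)
        & (programs["climate"] == climate)
    ]
    return match.sort_values("annual_cost_usd").head(5)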
This project aims to build a cloud-based data pipeline using Azure services to analyze and visualize the 2021 Tokyo Olympics dataset. The pipeline integrates data ingestion, transformation, and visualization to unlock insights into athlete demographics, country performance, and event participation.
Technologies Used:
Data Ingestion (Azure Data Factory): Automated extraction of data from a GitHub-hosted CSV file into Azure.
Data Storage (Azure Data Lake Gen2): Scalable and secure storage for raw and processed data.
Data Transformation (Azure Databricks): Cleansing and processing data using Spark.
Data Analysis (Azure Synapse Analytics): SQL-based querying and advanced analytics.
Visualization (Power BI): Interactive dashboards displaying insights and performance metrics.
LEAP is a web application that generates a personalized learning path based on users' educational background, skills, and career goals. It leverages AI-driven models to create customized plans that include key concepts, curated resources, and estimated timelines, making it easier for users to achieve their learning objectives.
Built a web application using Python, Streamlit, and the GROQ LLM API to design personalized learning paths (a minimal generation sketch follows this list).
Integrated AI to recommend curated resources, breaking down complex transitions into actionable steps.
Developed a feature to generate downloadable .docx files for users, enabling structured offline access to their learning plans.
Provided estimated completion timelines and progress tracking, optimizing the user's learning experience.
Designed an intelligent recommendation system to suggest resources from trusted platforms, enhancing learning efficiency.
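A minimal sketch of the generation step using the Groq Python SDK; the model name and prompt wording are assumptions, not the app's exact configuration.

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def generate_path(background: str, goal: str) -> str:
    resp = client.chat.completions.create(
        model="llama-3.1-8b-instant",  # assumed model choice
        messages=[
            {"role": "system", "content": "You design step-by-step learning plans."},
            {"role": "user", "content": f"Background: {background}. Goal: {goal}. "
                                        "List key concepts, resources, and a timeline."},
        ],
    )
    return resp.choices[0].message.content

print(generate_path("BSc in economics, basic Python", "become a data analyst"))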
Sage is a health chatbot developed using Python, LangChain, and Streamlit, designed to diagnose injuries and provide probable precautions based on user input. Leveraging advanced natural language processing and machine learning techniques, Sage offers accurate and timely health advice, ensuring users receive relevant information and guidance for their symptoms.
Developed a health-focused chatbot using Python, integrating LangChain for natural language processing and Streamlit for the user interface.
Implemented advanced NLP techniques to accurately interpret user-reported symptoms and health concerns.
Integrated Large Language Models (LLMs) to enhance the chatbot's language understanding and response generation capabilities.
Created an interactive, conversational user experience that provides real-time health advice and injury diagnosis.
Designed the system to offer tailored recommendations based on individual user inputs, ensuring personalized health guidance.
Received recognition through the Wolfram Award, highlighting the project's innovation and potential impact in the health tech space.
EqualEyes aims to advance image captioning technology by combining recent advances in image recognition and language modeling to generate rich and detailed descriptions beyond simple object identification. Through inclusive design and training on diverse datasets, the project seeks to create a system accessible to all users, particularly benefiting individuals with visual impairments. Stakeholders include visually impaired individuals, educators, and developers.
Developed an image captioning system that generates rich, descriptive captions going beyond naming objects by combining advanced image recognition and language modeling techniques.
Implemented data preprocessing pipelines, including image augmentation, text tokenization, and vectorization to prepare diverse datasets for model training.
Explored and evaluated multiple state-of-the-art architectures, including CNN encoder-decoders, Vision Transformers (ViT-GPT2), and BLIP, for image encoding and caption generation (see the sketch after this list).
Conducted extensive data exploration and analysis on the image-caption dataset, examining image size/orientation distributions, caption lengths, word frequencies, and image quality assessments.
Implemented evaluation metrics focused on measuring how well generated captions capture the full context of images beyond just object presence.
Developed a working web application that takes images as input, processes them through the trained captioning model, and generates descriptive captions with audio output for accessibility.
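An illustrative captioning call through Hugging Face transformers with a public ViT-GPT2 checkpoint, one of the architecture families evaluated; the project's own trained weights are not assumed.

from transformers import pipeline

# Public checkpoint used purely for illustration.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
result = captioner("photo.jpg")  # path or URL to any test image
print(result[0]["generated_text"])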
Analyzing Austin Animal Center Data for Enhanced Adoption Strategies
This project involves a comprehensive analysis of data from the Austin Animal Center to understand trends in animal intakes, outcomes, and stray locations. By merging and analyzing multiple structured datasets, the project aims to identify factors contributing to stray animal cases and develop strategies to address the issue. The analysis includes exploratory data analysis, preprocessing, and actionable insights to improve adoption rates and animal welfare.
Preprocessed and integrated three datasets (Austin Animal Center Intakes, Outcomes, and Stray Map) by handling missing values, removing duplicates, and performing inner joins to create a unified dataset for analysis (sketched after this list).
Conducted exploratory data analysis on the intake dataset to examine distributions of animal types, intake conditions, sexes, ages, and breeds, identifying trends and potential areas of focus.
Analyzed outcome data to determine common outcomes (adoption, transfer, euthanasia) across different animal types, ages, and assessed top breeds for targeted adoption efforts.
Performed geospatial analysis on the stray animal map data, pinpointing urban hotspots and frequent locations for stray animal findings to guide targeted interventions and resource allocation.
Investigated correlations between animal age at intake and outcome to derive insights for optimizing adoption strategies based on age groups and tailoring marketing and fostering approaches.
Developed visualizations, including bar charts, heatmaps, and geographic maps, to effectively communicate key findings and patterns related to intake sources, outcome distributions, and stray locations.
Synthesized analysis results to propose data-driven recommendations for the Austin Animal Center, such as sterilization programs, adoption campaigns, resource allocation, and improvements to recordkeeping and identification practices.
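A sketch of the integration step in pandas; the file and column names are assumptions loosely based on the public Austin Animal Center exports.

import pandas as pd

intakes = pd.read_csv("intakes.csv").drop_duplicates(subset="animal_id")
outcomes = pd.read_csv("outcomes.csv").drop_duplicates(subset="animal_id")

# Inner join keeps only animals present in both datasets.
merged = intakes.merge(outcomes, on="animal_id", how="inner", suffixes=("_in", "_out"))
merged = merged.dropna(subset=["animal_type_in", "outcome_type"])

# Outcome mix per animal type, the starting point for the adoption analysis.
print(merged.groupby("animal_type_in")["outcome_type"].value_counts().head(10))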
Data Analysis For Energy Consumption & Conservation Strategies For eSC
In this project, we led a comprehensive analysis of energy consumption patterns, focusing on peak demand during the hot summer months, particularly July. Using RStudio, Shiny app development, and careful data cleaning and merging techniques, we examined the energy data to derive meaningful insights.
Conducted a meticulous analysis of energy usage data, employing data cleaning and merging techniques to ensure the integrity and accuracy of the dataset.
Utilized RStudio to identify key drivers of high demand during peak periods, specifically in July, shedding light on the factors contributing to increased energy consumption.
Developed predictive models using linear modeling, decision trees, and random forest algorithms. These models were instrumental in forecasting future energy demand scenarios, providing a quantitative basis for understanding the potential impact of conservation initiatives.
Formulated strategic recommendations for the Energy Services Company (eSC) aimed at managing demand during peak periods. Explored alternative approaches beyond the traditional method of building additional power plants, considering innovative conservation initiatives.
Presented the comprehensive analysis and strategic plan to key stakeholders, highlighting the findings and recommendations. A customized Shiny dashboard gave stakeholders an interactive, intuitive platform to engage with the insights.
In the creation of Harmony Hub, a Database Management System (DBMS) tailored for a Music Streaming Service, I led the design and development of a robust and comprehensive solution that seamlessly organized and stored data related to tracks, artists, and streaming history.
Key Contributions:
Designed and developed an end-to-end music streaming database solution using SQL and Azure Data Studio. This solution provided a structured, efficient platform for organizing a vast array of data while ensuring optimal performance.
Engineered optimized table schemas to efficiently ingest streaming data from source systems. These schemas were designed to transform raw streaming data into analysis-ready datasets, laying the foundation for detailed usage analytics (a simplified schema sketch follows this section).
Implemented a streamlined process for ingesting streaming data from various source systems, ensuring the continuous flow of information into the database. This facilitated real-time updates and maintained the integrity of the dataset.
Utilized the power of SQL queries and stored procedures to grant key stakeholders self-service access to streaming analytics. This empowerment enabled decision-makers to delve into usage patterns, contributing to data-driven decision-making in areas such as artist payments and content recommendations.
The implementation of self-service analytics played a pivotal role in enhancing decision-making processes related to artist payments and content recommendations. Stakeholders could navigate and extract insights independently, fostering a more agile and responsive approach to business strategies.
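A pared-down schema sketch, run here with sqlite3 for portability rather than the production database; the tables and columns are a simplified illustration, not the full Harmony Hub design.

import sqlite3

DDL = """
CREATE TABLE artist (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE track (
    track_id   INTEGER PRIMARY KEY,
    artist_id  INTEGER NOT NULL REFERENCES artist(artist_id),
    title      TEXT NOT NULL,
    duration_s INTEGER
);
CREATE TABLE stream_history (
    stream_id INTEGER PRIMARY KEY,
    track_id  INTEGER NOT NULL REFERENCES track(track_id),
    user_id   INTEGER NOT NULL,
    played_at TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# The kind of self-service aggregate stakeholders pulled: plays per artist.
query = """
SELECT a.name, COUNT(*) AS plays
FROM stream_history s
JOIN track t  ON t.track_id  = s.track_id
JOIN artist a ON a.artist_id = t.artist_id
GROUP BY a.name
ORDER BY plays DESC;
"""
print(conn.execute(query).fetchall())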