My Portfolio

Select Projects

Data Science Capstone

Communication Unbreakdown

In the last few weeks of the 2020 election, it became increasingly difficult to reach a particularly elusive group of people: undecided voters.

The grassroots organization People's Action wanted to find undecided voters so they might have an important discussion. The purpose: to change hearts and minds using an empathy-driven persuasion method called Deep Canvassing.

DATA + PEOPLE = POWER

Communication Unbreakdown, our Data Science capstone team, has paired grassroots organizing with cutting-edge data science to help community organizers reach target populations for civic engagement. Our solution helped People’s Action reach 33% more conflicted voters in the last two weeks of the U.S. presidential election in five key swing states where the technology was used.

The technology was also used for the Georgia Senate runoff. During the final days of voting, 45% of the people contacted were conflicted voters, compared with 18% in a human-informed control group.

How it works | Watch a video | Read the case study | Play with DARTS | Visit the site

TOOLS & METHODS

Deep Learning | NLP

FireBERT: Hardening BERT-based classifiers against adversarial attack

The introduction of BERT in 2018 brought natural language understanding to a level where it quickly became the standard for real-world classification tasks. But with broad adoption comes an attractive attack surface.

The 2019 project “TextFooler” used word similarities to generate samples that near-completely fool BERT-based classifiers. Thus, our research starts with the question: “Is it possible to build a firewall to protect BERT?”

Our resulting body of work, FireBERT, consists of three classification techniques to harden BERT against adversarial attack. Published in the Springer Series Advances in Information and Communication for FICC 2021, FireBERT is evaluated against MNLI and IMDB Movie Review tasks, and demonstrates it is possible to harden BERT against 95% of adversarial samples without significantly reducing the performance of regular benchmarks.

Read the paper | View the architecture | Watch the conference recording | Visit the repo

TOOLS & METHODS

Machine Learning at Scale | Spark

Flights and Weather: Predicting Airline Delays

Flight delays create problems in scheduling for airlines and airports, leading to passenger inconvenience and huge economic loss. As a result there is growing interest in predicting flight delays beforehand in order to optimize operations and to improve customer satisfaction.

In this project, two massive datasets were obtained, consisting of five years of national flight and weather data. My work on the project focused on developing a data pipeline consisting of a multi-zone data lake, a multi-stage cleanup and data transformation process, complex queries and window joins, and the engineering of novel features used in model building and prediction. The pipeline was built in Databricks using Delta Lake, Spark SQL and PySpark.

View the Pipeline and EDA notebook

TOOLS & METHODS

Data Visualization | Tableau

How do people die?

In the United States, 50 million people died between 1999 and 2018. What did they die from?

In this interactive visualization the viewer is encouraged to explore leading causes of death in the United States and compare death rates across geographic regions to learn more about where and when they happen, to inform public policy.

The exploration yields key insights that show a geographic and seasonal correlation to the prevalence of certain types of disease. For example, by exploring the geographic heatmap the viewer can see that clusters of states with lower economic activity also show a higher incidence of deaths caused by diabetes.

Watch the interactive presentation

TOOLS & METHODS

Data Visualization | D3.js

Excess Deaths That May Be Related to COVID-19

Are deaths from COVID-19 being undercounted or overcounted? One way to understand the accuracy in classification is by taking a view of excess deaths.

Excess deaths are defined as the difference between observed deaths and expected deaths based on historic norms. Because cause of death can often include co-morbidities, a view of all causes can provide insights about the proportion of deaths that may be misclassified as they pertain to COVID-19.
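The definition above can be sketched in a few lines of Python. This is only an illustration: the counts are invented, the function name `excess_deaths` is hypothetical, and taking the expected value as the mean of prior years is an assumption rather than the method used in the visualization.

```python
from statistics import mean


def excess_deaths(observed, historic):
    """Return observed deaths minus expected deaths.

    Assumption: "expected" is the plain mean of the historic counts.
    Real baselines are usually more sophisticated than this.
    """
    expected = mean(historic)
    return observed - expected


# Hypothetical weekly death counts, invented purely for illustration.
historic_same_week = [1000, 1040, 960, 1000]  # same week in four prior years
observed_this_week = 1250

print(excess_deaths(observed_this_week, historic_same_week))
```

A positive result means more deaths were observed than the historic norm would predict; a negative result means fewer.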

In this interactive data visualization viewers gain insights into excess deaths as reported by all 50 states over time; moreover, the visualization allows the viewer to see that excess deaths can reasonably be attributed to COVID-19.

See the data visualization

TOOLS & METHODS

Applied Machine Learning

Microsoft Malware Prediction

The malware industry continues to be a well-organized, well-funded market dedicated to evading traditional security measures. Once a computer is infected by malware, criminals can hurt consumers and enterprises in many ways.

Microsoft takes this problem very seriously. As one part of their overall strategy for security, Microsoft challenged the data science community on Kaggle to develop techniques to predict if a machine will soon be hit with malware.

Over the course of three weeks, my team created two submission entries for Kaggle from our best-performing models. We obtained results that surpassed our goal: 61.6% from the LightGBM method and 60.4% from our PyTorch neural net.

View the notebook

TOOLS & METHODS

Data Engineering | Pipelines | Containers

Data Pipelines for Data Science

Roughly 80% of data science projects never make it into production. This is because the foundations for storing, managing, and processing large datasets, and for ensuring quality, security and availability, lack a reliable and resilient infrastructure.

Data engineering is the act of designing and building those reliable and resilient processes that transform and transport data along a series of "pipes" before arriving in a final state useful to data scientists.

Pipelines begin by ingesting data from many disparate sources where it is collected in raw form as a single source of truth. Then, as data is transported through the pipeline various data engineering tools and platforms are used to assemble, transform, connect and deliver data to the underlying storage and processing architectures needed by analytics and data science applications.

Below are some examples of constructing and consuming from a data pipeline.

Building a Pipeline | A Pipeline with Event Streaming | Data Analytics using BigQuery

TOOLS & METHODS

Data Science Community

Launching of Berkeley-Data

Data Science is a constantly evolving discipline. Practitioners need to participate in an active community to hone their craft. Berkeley Data is a hub for sharing knowledge, tools and resources within the Data Science community.

Actively maintained on GitHub, Berkeley Data is an evolving body of work curated by current and former students of the Master of Information and Data Science (MIDS) program at UC Berkeley. The repositories contain guidelines, procedures, and code that support a data science practice.

Visit the repo

View More Projects

Experiments & Causal Inference

Moving the Needle on Public Opinion: An Experiment on the Persuasive Effects of Moral Frames

Read the paper

TOOLS & METHODS

Descriptive Statistics | R | LaTeX

Local Policy Recommendations for Crime Reduction

Read the paper

TOOLS & METHODS

Research Design | Bias | Ethics

Beyond Plug and Pray: How to ensure self-determination in a massively connected society.

Watch the presentation

TOOLS & METHODS

Experience

Databricks

Partner Solutions Lead

Establishing the Databricks Data Intelligence Platform and the industry-defining Lakehouse within customers of all shapes and sizes.
I lead an incredible team of Partner Solutions Architects that work with global and regional partners to develop GenAI and Analytics solutions for customers, create Champions of the Data Intelligence platform, and solve some of the world's toughest data problems.

University of California - Berkeley

Adjunct Faculty / Data Science

TA and then Lecturer for Data Science Capstone where students combine technical, analytic, interpretive, and social dimensions to design and build a data science product.
The class provides an opportunity to integrate all the core skills and concepts learned in the program and prepares students for long-term professional success in the field.
Students are evaluated on their ability to work collaboratively, proper identification of the problem space and a solution that meets the intended needs of stakeholders, application of appropriate data science and engineering methods, societal impact, and the final presentation delivery that clearly articulates the problem and solution while addressing required elements from the course rubric.

Unify Consulting

National Managing Director of Data and AI Solutions

Practice Development

Grew our Data Science staff 20% and increased our ML Engineering staff 5-fold.
Contributed $14M in sales revenue through solutions.
Brought three net-new accounts to the organization.
Developed repeatable solutions for clients tailored to each stage of their Data+AI journey.
Directly recruited Data Scientists and ML Engineers into the organization through my personal brand and network.
Advanced the skill and development of our team through practice meetups, book clubs, paper readings, show and tells, one-on-ones, mentoring, creating personal training paths and leading by doing.
Created sales and marketing assets, presentations, proposals, SoWs, and drove opportunities to close by providing thought leadership, subject matter expertise, and right-sized solutions.
Implemented an evaluation problem for new Data Science recruits conducted over a live session between candidate and interviewer with pair programming to immediately assess technical skill, consulting acumen and critical thinking abilities.

Databricks Partnership

Socialized, enacted, and led our partnership with Databricks and nurtured it to a burgeoning program.
Coordinating partner training events, developing programs, and managing certifications for our Data Engineers and ML Professionals to maintain our partnership status.
Actively managing activity with our shared accounts, facilitating meetings, events, sponsorships, and assembling joint go-to-market programs. Activity has led to large, net-new opportunities that leverage our partner affiliation and credential status.
Enjoying deep professional relationships with Partner Sales Directors, Account Executives, Solution Architects, Managing Directors and GMs/VPs at Databricks.
Certified Databricks Machine Learning Professional and invited Databricks Champion.

Client Work

ML Platform Designer for a multi-billion dollar energy company

Served as Machine Learning Architect and platform lead for a multi-billion-dollar energy company designing the organization’s new data platform in support of emerging ML use cases for the modern electric grid.
- Defined the needs and capabilities of the core data platform, conducted platform assessments with the organization’s lead engineers and architects, and produced a scorecard comparing three cloud data platform providers to support both traditional BI analytics and operational ML use cases.
- Implemented a dedicated Data Science environment to drive experiments and train models against the organization’s operational data from the new data platform, and tied the workspace together with end-to-end MLOps custom-tailored to address specific challenges in this highly regulated industry.
- Led the engineering and deployment of production models for Propensity Modeling, Power Outages and Load Forecasting.

AI/ML SOP Engagement Lead for a mega-cloud technology company

Served as subject matter expert, facilitator and engagement lead for a mega-cloud technology company that provides platform services that enable digital workflows.
- Defined the organization’s SOPs for AI/ML Dataset Management, AI/ML Model Development and AI/ML Deployment Operations.
- Drove the standardization of processes across different ML teams by facilitating collaborative working sessions in Miro, providing subject matter expertise and driving consensus toward a set of three core operating documents that captured the standard approaches and procedures for AI/ML.
- Included a detailed write up of recommendations for future process improvements to move the organization toward better operational ML and more responsible AI.

MLOps Architect for a multi-billion dollar provider of residential and commercial security solutions

Served as the ML Architect and SME for a multi-billion dollar provider of residential and commercial security solutions to build the firm’s next-generation MLOps system.

System included automated pipelines and proactive model monitoring.
Additionally deployed a model on the platform that optimized remote service workflows to prevent costly truck rolls.

Data Science Engagement Lead for a mega-cap multinational food, snack and beverage corporation

Led effort to develop a churn model for a mega-cap multinational food, snack and beverage corporation that proactively identifies attrition risk for front-line service workers. The model finds the bulk of voluntary churn targets (68%) in 19% of the population, saving the organization $10M in talent acquisition costs.
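The two percentages quoted above imply a lift figure, which can be worked out as below. This is a back-of-the-envelope sketch: the function name `lift` is hypothetical, and reading the result as "times better than contacting workers at random" assumes churners would otherwise be spread evenly across the population.

```python
def lift(capture_rate, population_share):
    """Share of positives captured divided by share of population targeted."""
    if not 0 < population_share <= 1:
        raise ValueError("population_share must be in (0, 1]")
    return capture_rate / population_share


# Figures quoted in the text: 68% of voluntary churn found in 19% of the population.
model_lift = lift(0.68, 0.19)
print(round(model_lift, 2))  # about 3.58
```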

Data+AI Product Advisor for a provider in digital human resources, health and wealth benefits

Served as strategic Data+AI advisor to the senior product leaders responsible for growing and building the provider platform for a large, multinational technology and consulting company that provides services in digital human resources, health and wealth benefits management, and global payroll. Gathered ideas and iterated on feature concepts by applying design thinking principles, identifying “jobs-to-be-done” and discovering “how-might-we” with available data.

ML Use Case Program Facilitator for a multi-billion dollar energy company

Facilitated workshops and provided subject matter expertise for a multi-billion dollar energy company to develop high-value use cases that can be solved with ML that optimize call center operations. Established ROI metrics for each use case then refined and prioritized the backlog for program management.

Data Science & MLOps Engagement Lead for a cloud platform company in the healthcare payments space

Led effort in developing several prototype ML models for a cloud platform company in the healthcare payments space to address churn and propensity use cases for care providers in the company’s payments network. Currently advancing the prototypes to production and initiating an engagement to build and scale MLOps for the entire enterprise.

Independent Technology and Business Consulting

Product Manager | ML Engineering | Data Scientist

Independent Consulting

Providing Digital Solutions at the intersection of Business Strategy, Value Proposition and Technology Innovation

Client Work

Data Scientist for a Legal Automation Startup

Designed experiments and developed a machine learning algorithm to detect high-risk clauses in legal contracts, leveraging Large Language Models (LLMs) from Hugging Face.

Solution Architect for Next Generation Data Management Platform

Designed cloud-native technology solution for a next generation data management platform in the medical accreditation industry using domain-driven design and serverless microservices.

Product Manager for Global CRM Solution

Managed delivery of a large-scale CRM solution with third party data integrations for a global industrial packaging manufacturer.

Product Manager for Fortune 500 financial services client

Determined product-market fit and managed delivery of a software product in the financial and investment services industry using Business Model Canvas, Design Sprints, Lean Product Playbook, Design Thinking and Agile Software Delivery.

ML Engineer for Fortune 500 B2B client

Assembled and optimized a data pipeline process in Apache NiFi, Spark, Livy, Kafka, and Spark Structured Streaming.

Data Scientist for Fortune 500 B2B client

Trained machine-learning model for fraud detection using Python and Scikit-learn, and developed a data pipeline process for deployment.

SPR Consulting

Vice President

Delivery Leadership | Emerging Technology | Client Partner

Managed key client relationships and delivery engagements.
Researched and delivered emerging technology solutions for strategic clients.
Nurtured company culture of highly productive cross-functional teams.
Defined 300-page collection of best practices for professional services software delivery.
Pursued strategic initiatives on behalf of the CEO and CTO.

IoT Practice Lead

Defined IoT market strategy and offerings, and initiated partnerships.
Performed R&D in the consumer, commercial and Industry 4.0 spaces.
Built unique internal IoT products and facilitated hands-on IoT training labs.
Mentored staff in the IoT space.

Product Manager for Venture Capital Projects

Developed four startups into a $5M portfolio of investments from ideation to execution using business modeling, technology strategy, and delivery management.

Redpoint Technologies

Principal Architect

Facilitated hundreds of interactive workshops for ideation, scoping and project definition.
Managed 50+ projects across industries, from Fortune 500 to early startups.
Built system architectures, from e-commerce portals and data analytics applications to sophisticated yield-management processes and multi-threaded grid solutions.
Managed teams and delivered solutions using Agile methods and industry-leading architecture and design practices.
Conducted Agile training camps, assessments and organizational transformations.
Advised business owners in refining and validating their ideas and business models.
Conducted architecture reviews and provided technology and strategy recommendations.

Leapnet

Senior Software Architect

Designed and built several revenue-enhancement and reporting systems for clients in the hospitality industry. Systems include Data Warehouses and custom applications providing forecasting, yield-management and revenue tracking for hotel managers and corporate revenue-management personnel.

Parthenon Consulting Group

Principal Consultant

Designed and built several system applications for an insurance industry client. Systems include a Data Warehouse for Investment Fund Performance Review and a variety of payroll, benefit and compensation applications for HR.

Parian Development Group

Senior Software Developer

Delivered client projects for a software development consultancy startup, including work in insurance, banking, financial, realty, automotive, and records-retention fields.

U.S. Army Construction Engineering Research Lab

Student Contractor | Programmer

Developed software modules for a railroad inspection and track-management system in C and RBASE.

Testimonials

George Goehl

Director at People's Action Institute

George was Kevin's stakeholder for Data Science Capstone

Our partnership with Kevin has been game-changing.

When Kevin first approached me with his capstone project I knew we had to take him up on his offer. In a matter of weeks he helped us build something massive that no one else has done.

Every aspect of the project we worked on together was more effective because of the rigorous data science that Kevin brought to the partnership. He has the skills that could be used for most anything. And he is a delight to work with!

Joyce J. Shen

Venture Investor | Board Member | Data Science & AI at Berkeley | Book Author | ex-Managing Director of Emerging Tech & Investments Thomson Reuters; CFO & COO IBM Cloud Platform; Corp Dev IBM

Joyce advised Kevin on Data Science Capstone

From Joyce's feedback on Data Science Capstone:

I want to acknowledge how thorough this plan is. It is well researched, detailed, and well written. You have set the bar for future students.

Mike Tamir, PhD

Chief ML Scientist & Head of Machine Learning/AI at SIG, Data Science Faculty-Berkeley

Mike was Kevin’s teacher for Deep Learning with NLP

I met Kevin in the UC Berkeley Data Science program where he took Deep Learning for NLP. This is one of the most advanced courses in the curriculum and Kevin excelled.

As a major component of this class students submit a Deep Learning research project to demonstrate their ability to contribute to the field. Kevin’s team produced a sophisticated adversarial training framework, FireBERT, for hardening BERT based algorithms against adversarial attacks. Kevin was responsible for the co-tuning strategy of this project. His contributions stood out not just for engaging in advanced DL techniques, but also for the level of engineering capabilities he demonstrated.

He created a repeatable process to set up and execute the adversarial TextFooler algorithm for the team. He was able to refactor the original code to use PyTorch Lightning to consistently create their classifier models and train in different GPU deployment environments. He also made contributions to their SWITCH logic, utilizing base and abstract classes in Python and implementing a random search routine for hyper-parameter optimization.

The work Kevin did in class showcases the sort of skills I often seek when hiring industry professionals, and I recommend him to hiring managers looking for ML engineers.

Clarence Chio

Co-founder & CTO/President, Unit21

Clarence was Kevin’s teacher for Applied Machine Learning

Kevin was a student in the Applied Machine Learning class I was teaching as part of Berkeley’s Masters in Information and Data Science program. From the start, it was clear that Kevin was a highly capable student, achieving top grades consistently. He was also passionate about the topic, increasing the quality of discussions in class with thoughtful questions that provoke me and his classmates to think about problems in a different way.

After graduating from the program, Kevin continued to pursue his passion in data science and machine learning, embarking on projects in data visualization, research on adversarial ML (even publishing this work in a peer-reviewed journal!), and applying what he learned in class to real-world applications. He is clearly in the top 1% of all students that I’ve had the pleasure of teaching.

Kevin’s aptitude in data science and hustle in delivering on projects make him an invaluable asset.

Stephen Ramsey

President at Redpoint Technologies

Stephen managed Kevin directly

Kevin is a brilliant technologist and an exceptional leader, coach and mentor.

He consistently demonstrated a solid work ethic at Redpoint plus a dedication to success.

Kevin is self-motivated, methodical and very capable. It was a pleasure to have Kevin on my team and I would welcome the opportunity to work with him again.

Stacie Keller

Team Lead at West Monroe Partners

Stacie worked with Kevin in different groups

I had the pleasure of working with Kevin during several discovery workshop sessions.

Kevin expertly worked with our clients to identify requirements and was flexible during the sessions to ensure the most relevant topics were discussed.

Thanks to Kevin's leadership during these sessions, the projects that followed were able to get off to a great start with clearly defined requirements.

Michael Rivera, PhD

Assistant Adjunct Professor at University of California, Berkeley

Michael was Kevin's teacher for Research Design

Kevin has strong technical skills and informed business acumen.

He is incredibly talented and persistent.

He's pleasant to work with and will be a strong addition to any team.

Kyle Hamilton

Chief Innovation & Data Officer, iQ4; Lecturer, UCBerkeley; PhD Candidate, ML-Labs

Kyle was Kevin's teacher for Machine Learning at Scale

I had the pleasure of meeting Kevin when he was a student in my Machine Learning course at Berkeley in the Summer of 2020.

He not only excelled in the course, he continued to show his dedication to the work even after the course was finished, organizing a program wide course project review encouraging others to learn from each other's challenges and experiences.

A true team player!

John Alexis Guerra Gómez

Data Visualization Instructor and Course Coordinator at University of California, Berkeley

John Alexis was Kevin's teacher for Data Visualizations

Kevin was one of those students that you dream to have in your classes.

His thirst for knowledge and willingness to invest the time and hard work needed to master the most complicated concepts of my classes made him stand out clearly. He was a great driving force for the group, always eager to ask more questions, and share the answers he had worked on.

From his work in my class, it was evident that he is a very insight-driven individual who will define clear objectives in his work and lead the group through the process of achieving them.

Gunnar Kleemann, MIDS PhD.

Principal Data Scientist and Owner of Austin Capital Data

Gunnar was Kevin's teacher for Statistics

I taught Kevin Statistics for Data Science for the UC Berkeley Masters in Data Science program.

Even as a student I easily saw how professional and capable Kevin is. He is a very good R coder, he can think critically and break down mathematical concepts, and he is enthusiastic about collaborative work.

He works with a good measure of both ability and humility. This makes him effective as an analyst and data scientist. Kevin will be an asset to any team.

Kevin R. Crook

Lecturer in Cybersecurity and Data Science at University of California, Berkeley

Kevin R. was Kevin's teacher for Data Engineering

Kevin was a student at the University of California, Berkeley in the master's of data science program.

More specifically he was a student in my data engineering course, which included numerous skills, such as analysis, cloud-based computing, virtual machines, Docker containers, Linux, Python object-oriented programming, source code control, big data architecture (lambda), and massively parallel processing using Spark.

Kevin excelled in the course, going above and beyond the required elements.

I highly recommend, without reservation, Kevin Hartman.

MQ Qureshi

Award Winning Digital Executive | Keynote Speaker | Successful Startup Founder | Digital Innovation & Product Strategy Leader | Transformation Expert

MQ was Kevin's client for Xoobies

Kevin is a consummate professional.

He is warm, kind and exceptionally smart. His ability to listen, collaborate and seek solutions is not found often, and his lifelong pursuit of learning (evidenced by his recent pursuit and accomplishments in Data Science) makes him an incredible asset to any organization.

I'm proud to have worked with him and even more to call him a friend.

All words shared are shared with permission.

Kevin Hartman

Product Management | ML Engineering | Data Science

About Me

Education

University of California, Berkeley

Master of Information and Data Science

University of Illinois at Urbana-Champaign

Bachelor of Science in Electrical and Computer Engineering
