Kevin Hartman

Product Management | ML Engineering | Data Science

Download Resume

About Me

Hi, I'm Kevin. I solve complex problems using technology and data.

In my day job I work as a partner solutions leader at Databricks where I help our consulting partners build Data and AI solutions on the Data Intelligence Platform.

In my evenings I'm a lecturer at UC Berkeley where I teach and advise students working on their Data Science Capstone projects.

My specialty is formulating high-impact solutions that leverage technology innovation. I've inspired, influenced, and led the creation of hundreds of digital products, working both inside and alongside teams of researchers, scientists, designers, engineers and enthusiasts.

I'm lucky to have worked with and been taught by some really brilliant people. Some of them have a few words to share.

I get my hands dirty in the technical details a lot. But I also have the acumen to manage the 10,000-foot view in a room with executives. My leadership experience includes:

  • Delivering outcomes using Agile and Lean methods
  • Building high-performing, cross-functional teams
  • Driving stakeholder alignment
  • Executing strategic plans

When I'm not spending time working on my next project, I'm an avid cook and aspiring family photographer. My favorite subjects (and taste-testers) are my wife and two children. With my family I also love to travel.

Wherever I go, I carry with me a master’s degree in Information and Data Science from UC Berkeley. I also have in my backpack a bachelor’s degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign.

Select Projects

Data Science Capstone

Communication Unbreakdown

In the last few weeks of the 2020 election it was becoming increasingly difficult to reach a particularly elusive group of people: the undecided voter.

The grassroots organization People's Action wanted to find undecided voters so they might have an important discussion. The purpose: to change hearts and minds using an empathy-driven persuasion method called Deep Canvassing.

DATA + PEOPLE = POWER

Communication Unbreakdown, our Data Science capstone team, has paired grassroots organizing with cutting-edge data science to help community organizers reach target populations for civic engagement. Our solution helped People’s Action reach 33% more conflicted voters in the last two weeks of the U.S. presidential election in five key swing states where the technology was used.

The technology was also used for the Georgia Senate runoff. During the final days of voting, 45% of the people contacted were conflicted voters, compared with 18% in a human-informed control group.

TOOLS & METHODS
Civis Platform
Thrutalk
Zapier
Amazon SageMaker
Amazon Elastic MapReduce
Apache Spark
scikit-learn
D3.js
Flask
Python
Data Pipelines
Data Transformation
Exploratory Data Analysis
Feature Engineering
Supervised Machine Learning
Reinforcement Learning

Deep Learning | NLP

FireBERT: Hardening BERT-based classifiers against adversarial attack

The introduction of BERT in 2018 brought natural language understanding to a level where it quickly became the standard for real-world classification tasks. But with broad adoption comes an attractive attack surface.

The 2019 project “TextFooler” used word similarities to generate samples that almost completely fool BERT-based classifiers. Thus, our research starts with the question: “Is it possible to build a firewall to protect BERT?”

Our resulting body of work, FireBERT, consists of three classification techniques to harden BERT against adversarial attack. Published in the Springer Series Advances in Information and Communication for FICC 2021, FireBERT is evaluated against MNLI and IMDB Movie Review tasks, and demonstrates it is possible to harden BERT against 95% of adversarial samples without significantly reducing the performance of regular benchmarks.

TOOLS & METHODS
PyTorch
PyTorch Lightning
Hugging Face
Python
Adam
BERT
Dense Connections
Dropout
GELU
Layer Normalization
Multi-Head Attention
Residual Connection
Scaled Dot-Product Attention
Softmax
Weight Decay
WordPiece

Machine Learning at Scale | Spark

Flights and Weather: Predicting Airline Delays

Flight delays create problems in scheduling for airlines and airports, leading to passenger inconvenience and huge economic loss. As a result there is growing interest in predicting flight delays beforehand in order to optimize operations and to improve customer satisfaction.

In this project, two massive datasets were obtained consisting of five years of national flight and weather data. The focus of my work for the project was in developing a data pipeline consisting of a multi-zone data lake, a multi-stage cleanup and data transformation process, complex queries and window joins, and the engineering of novel features used in model building and prediction. The pipeline was built in Databricks using Delta Lake, Spark SQL and PySpark.
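
One way to picture the kind of time-aware join described above is an "as-of" lookup: each flight is matched to the latest weather observation available some time before departure, so that no future information leaks into the features. The sketch below is a plain-Python stand-in, not the project's Spark SQL or PySpark code. The airport code, the records, the `latest_weather_before` helper, and the two-hour lead time are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical weather observations per airport, sorted by time.
weather = {
    "ORD": [
        (datetime(2019, 1, 1, 6, 0), {"wind_mph": 10}),
        (datetime(2019, 1, 1, 9, 0), {"wind_mph": 25}),
    ],
}

# Hypothetical flights: (flight id, origin airport, scheduled departure).
flights = [
    ("F1", "ORD", datetime(2019, 1, 1, 10, 30)),
    ("F2", "ORD", datetime(2019, 1, 1, 12, 0)),
    ("F3", "ORD", datetime(2019, 1, 1, 7, 0)),
]

LEAD_TIME = timedelta(hours=2)  # assumed prediction horizon, not from the project


def latest_weather_before(airport, cutoff):
    """Return the last observation at or before cutoff, or None if there is none."""
    observations = weather.get(airport, [])
    times = [t for t, _ in observations]
    idx = bisect_right(times, cutoff)  # count of observations with time <= cutoff
    return observations[idx - 1][1] if idx else None


# Join every flight to the weather known LEAD_TIME before its departure.
joined = [
    (fid, latest_weather_before(origin, dep - LEAD_TIME))
    for fid, origin, dep in flights
]
print(joined)
```

Flights with no observation before the cutoff come back with `None`, which makes missing weather explicit instead of silently borrowing a later reading.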

TOOLS & METHODS
Databricks
Delta Lake
PySpark
Spark SQL
Python
Data Pipelines
Data Transformation
Exploratory Data Analysis
Feature Engineering
Supervised Machine Learning

Data Visualization | Tableau

How do people die?

In the United States, 50 million people died between 1999 and 2018. What did they die from?

In this interactive visualization the viewer is encouraged to explore leading causes of death in the United States and compare death rates along geographic regions to learn more about where they happen, and when, to inform public policy.

The exploration yields key insights that show a geographic and seasonal correlation to the prevalence of certain types of disease. For example, by exploring the geographic heatmap the viewer can see that clusters of states with lower economic activity also show a higher incidence of deaths caused by diabetes.

TOOLS & METHODS
Tableau
Tableau Desktop
Tableau Prep
TabPy
JavaScript
Python
Data Vis Analysis Framework
Data Transformation
DATA.GOV
National Center for Health Statistics

Data Visualization | D3.js

Excess Deaths That May Be Related to COVID-19

Are deaths from COVID-19 being undercounted or overcounted? One way to understand the accuracy in classification is by taking a view of excess deaths.

Excess deaths are defined as the difference between observed deaths and expected deaths based on historic norms. Because cause of death can often include co-morbidities, a view of all causes can provide insights about the proportion of deaths that may be misclassified as they pertain to COVID-19.
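
The definition above reduces to a simple subtraction, which the following minimal sketch makes concrete. The `WeeklyDeaths` record, the `percent_excess` helper, and the sample numbers are hypothetical and are not taken from the visualization's data.

```python
from dataclasses import dataclass


@dataclass
class WeeklyDeaths:
    week: str        # label for the reporting week (hypothetical format)
    observed: int    # deaths actually recorded that week
    expected: float  # baseline derived from historic norms


def excess_deaths(row: WeeklyDeaths) -> float:
    """Excess deaths = observed deaths - expected deaths."""
    return row.observed - row.expected


def percent_excess(row: WeeklyDeaths) -> float:
    """Excess deaths expressed as a percentage of the expected baseline."""
    return 100.0 * excess_deaths(row) / row.expected


# Illustrative numbers only, not real CDC data.
sample = [
    WeeklyDeaths("week-1", observed=1200, expected=1000.0),
    WeeklyDeaths("week-2", observed=950, expected=1000.0),
]
for row in sample:
    print(row.week, excess_deaths(row), round(percent_excess(row), 1))
```

A negative value means fewer deaths were observed than the historic baseline predicted.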

In this interactive data visualization viewers gain insights into excess deaths as reported by all 50 states over time; moreover, the visualization allows the viewer to see that excess deaths can reasonably be attributed to COVID-19.

TOOLS & METHODS
D3.js
Flask
JavaScript
Python
Data Vis Analysis Framework
DATA.GOV
Centers for Disease Control

Applied Machine Learning

Microsoft Malware Prediction

The malware industry continues to be a well-organized, well-funded market dedicated to evading traditional security measures. Once a computer is infected by malware, criminals can hurt consumers and enterprises in many ways.

Microsoft takes this problem very seriously. As one part of their overall strategy for security, Microsoft challenged the data science community on Kaggle to develop techniques to predict if a machine will soon be hit with malware.

Over the course of three weeks, my team created two submission entries for Kaggle from our best-performing models. We obtained results that surpassed our goal: 61.6% from the LightGBM method and 60.4% from our PyTorch neural net.

TOOLS & METHODS
PyTorch
LightGBM
scikit-learn
Jupyter
Python

Data Engineering | Pipelines | Containers

Data Pipelines for Data Science

Roughly 80% of data science projects never make it into production. This is often because the foundations for storing, managing, and processing large datasets, and for ensuring quality, security and availability, lack a reliable and resilient infrastructure.

Data engineering is the act of designing and building those reliable and resilient processes that transform and transport data along a series of "pipes" before arriving in a final state useful to data scientists.

Pipelines begin by ingesting data from many disparate sources where it is collected in raw form as a single source of truth. Then, as data is transported through the pipeline various data engineering tools and platforms are used to assemble, transform, connect and deliver data to the underlying storage and processing architectures needed by analytics and data science applications.
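
The ingest, transform, and deliver stages described above can be sketched in a few lines of plain Python. This is only an illustration of the idea under simple assumptions. The two in-memory sources, the `temp_f` field, and the function names are hypothetical, and a real pipeline would use platforms such as those listed under Tools & Methods.

```python
import json
from typing import Iterable, Iterator, List

# Hypothetical raw inputs: JSON lines arriving from two different sources.
RAW_SOURCE_A = ['{"id": 1, "temp_f": "68.0"}', '{"id": 2, "temp_f": ""}']
RAW_SOURCE_B = ['{"id": 3, "temp_f": "50"}']


def ingest(*sources: Iterable[str]) -> Iterator[dict]:
    """Collect raw records from every source, unchanged, into one stream."""
    for source in sources:
        for line in source:
            yield json.loads(line)


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Drop records with missing readings and convert Fahrenheit to Celsius."""
    for rec in records:
        if not rec["temp_f"]:
            continue  # skip incomplete rows instead of guessing a value
        temp_c = (float(rec["temp_f"]) - 32.0) * 5.0 / 9.0
        yield {"id": rec["id"], "temp_c": round(temp_c, 1)}


def deliver(records: Iterable[dict]) -> List[dict]:
    """Materialize the cleaned records for a downstream consumer."""
    return list(records)


cleaned = deliver(transform(ingest(RAW_SOURCE_A, RAW_SOURCE_B)))
print(cleaned)
```

Because each stage is a generator, records flow through one at a time and nothing is held in memory until the final delivery step.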

Below are some examples of constructing and consuming from a data pipeline.

TOOLS & METHODS
Google Cloud Platform
Google BigQuery
Docker Containers
Apache Spark
Apache Kafka
Flask
Jupyter
Python
Data Pipelines
Data Transformation

Data Science Community

Launching of Berkeley-Data

Data Science is a constantly evolving discipline. Practitioners need to participate in an active community to hone their craft. Berkeley Data is a hub for sharing knowledge, tools and resources within the Data Science community.

Actively maintained on GitHub, Berkeley Data is an evolving body of work curated by current and former students of the Master's of Information and Data Science (MIDS) program at UC Berkeley. The repositories contain guidelines, procedures, and code that enable support of a data science practice.


View More Projects

Experiments & Causal Inference

Moving the Needle on Public Opinion: An Experiment on the Persuasive Effects of Moral Frames

TOOLS & METHODS
Coming soon...

Descriptive Statistics | R | LaTeX

Local Policy Recommendations for Crime Reduction

TOOLS & METHODS
Coming soon...

Research Design | Bias | Ethics

Beyond Plug and Pray: How to ensure self-determination in a massively connected society.

TOOLS & METHODS
Coming soon...

Experience

Databricks

Partner Solutions Lead

  • Establishing the Databricks Data Intelligence Platform and the industry-defining Lakehouse within customers of all shapes and sizes.
  • Leading an incredible team of Partner Solutions Architects who work with global and regional partners to develop GenAI and Analytics solutions for customers, create Champions of the Data Intelligence Platform, and solve some of the world's toughest data problems.

University of California - Berkeley

Adjunct Faculty / Data Science

  • TA and then Lecturer for Data Science Capstone where students combine technical, analytic, interpretive, and social dimensions to design and build a data science product.
  • The class provides an opportunity to integrate all the core skills and concepts learned in the program and prepares students for long-term professional success in the field.
  • Students are evaluated on their ability to work collaboratively, proper identification of the problem space and a solution that meets the intended needs of stakeholders, application of appropriate data science and engineering methods, societal impact, and the final presentation delivery that clearly articulates the problem and solution while addressing required elements from the course rubric.

Unify Consulting

National Managing Director of Data and AI Solutions

Practice Development
  • Grew our Data Science staff 20% and increased our ML Engineering staff 5-fold.
  • Contributed $14M in sales revenue through solutions.
  • Brought three net-new accounts to the organization.
  • Developed repeatable solutions for clients tailored to each stage of their Data+AI journey.
  • Directly recruited Data Scientists and ML Engineers into the organization through my personal brand and network.
  • Advanced the skill and development of our team through practice meetups, book clubs, paper readings, show and tells, one-on-ones, mentoring, creating personal training paths and leading by doing.
  • Created sales and marketing assets, presentations, proposals, SoWs, and drove opportunities to close by providing thought leadership, subject matter expertise, and right-sized solutions.
  • Implemented an evaluation problem for new Data Science recruits conducted over a live session between candidate and interviewer with pair programming to immediately assess technical skill, consulting acumen and critical thinking abilities.
Databricks Partnership
  • Socialized, enacted, and led our partnership with Databricks and nurtured it to a burgeoning program.
  • Coordinating partner training events, developing programs, and managing certifications for our Data Engineers and ML Professionals to maintain our partnership status.
  • Actively managing activity with our shared accounts, facilitating meetings, events, sponsorships, and assembling joint go-to-market programs. Activity has led to large, net-new opportunities that leverage our partner affiliation and credential status.
  • Enjoying deep professional relationships with Partner Sales Directors, Account Executives, Solution Architects, Managing Directors and GMs/VPs at Databricks.
  • Certified Databricks Machine Learning Professional and invited Databricks Champion.
Client Work
ML Platform Designer for a multi-billion dollar energy company
  • Served as Machine Learning Architect and platform lead for a multi-billion-dollar energy company designing the organization’s new data platform in support of emerging ML use cases for the modern electric grid.
    • Defined the needs and capabilities of the core data platform, conducted platform assessments with the organization’s lead engineers and architects, and produced a scorecard comparing three cloud data platform providers to support both traditional BI analytics and operational ML use cases.
    • Implemented a dedicated Data Science environment to drive experiments and train models against the organization’s operational data from the new data platform, and tied the workspace together with end-to-end MLOps custom-tailored to address specific challenges in this highly regulated industry.
    • Led the engineering and deployment of production models for Propensity Modeling, Power Outages and Load Forecasting.
AI/ML SOP Engagement Lead for a mega-cloud technology company
  • Served as subject matter expert, facilitator and engagement lead for a mega-cloud technology company that provides platform services that enable digital workflows.
    • Defined the organization’s SOPs for AI/ML Dataset Management, AI/ML Model Development and AI/ML Deployment Operations.
    • Drove the standardization of processes across different ML teams by facilitating collaborative working sessions in Miro, providing subject matter expertise and driving consensus toward a set of three core operating documents that captured the standard approaches and procedures for AI/ML.
    • Included a detailed write up of recommendations for future process improvements to move the organization toward better operational ML and more responsible AI.
MLOps Architect for a multi-billion dollar provider of residential and commercial security solutions
  • Served as the ML Architect and SME for a multi-billion dollar provider of residential and commercial security solutions to build the firm’s next-generation MLOps system.
    • System included automated pipelines and proactive model monitoring.
    • Additionally deployed a model on the platform that optimized remote service workflows to prevent costly truck rolls.
Data Science Engagement Lead for a mega-cap multinational food, snack and beverage corporation
  • Led effort to develop a churn model for a mega-cap multinational food, snack and beverage corporation that proactively identifies attrition risk for front-line service workers. The model finds the bulk of voluntary churn targets (68%) in 19% of the population, saving the organization $10M in talent acquisition costs.
Data+AI Product Advisor for a provider in digital human resources, health and wealth benefits
  • Served as strategic Data+AI advisor to the senior product leaders responsible for growing and building the provider platform for a large, multinational technology and consulting company that provides services in digital human resources, health and wealth benefits management, and global payroll. Gathered ideas and iterated on feature concepts by applying design thinking principles, identifying “jobs-to-be-done” and discovering “how-might-we” with available data.
ML Use Case Program Facilitator for a multi-billion dollar energy company
  • Facilitated workshops and provided subject matter expertise for a multi-billion dollar energy company to develop high-value use cases that can be solved with ML that optimize call center operations. Established ROI metrics for each use case then refined and prioritized the backlog for program management.
Data Science & MLOps Engagement Lead for a cloud platform company in the healthcare payments space
  • Led effort in developing several prototype ML models for a cloud platform company in the healthcare payments space to address churn and propensity use cases for care providers in the company’s payments network. Currently advancing the prototypes to production and initiating an engagement to build and scale MLOps for the entire enterprise.

Independent Technology and Business Consulting

Product Manager | ML Engineering | Data Scientist

Independent Consulting
  • Providing Digital Solutions at the intersection of Business Strategy, Value Proposition and Technology Innovation
Client Work
Data Scientist for a Legal Automation Startup
  • Designed experiments and developed a machine learning algorithm to detect high-risk clauses in legal contracts, leveraging Large Language Models (LLMs) from Hugging Face.
Solution Architect for Next Generation Data Management Platform
  • Designed cloud-native technology solution for a next generation data management platform in the medical accreditation industry using domain-driven design and serverless microservices.
Product Manager for Global CRM Solution
  • Managed delivery of a large-scale CRM solution with third party data integrations for a global industrial packaging manufacturer.
Product Manager for Fortune 500 financial services client
  • Determined product-market fit and managed delivery of a software product in financial and investment services industry using Business Model Canvas, Design Sprints, Lean Product Playbook, Design Thinking and Agile Software Delivery.
ML Engineer for Fortune 500 B2B client
  • Assembled and optimized a data pipeline process in Apache NiFi, Spark, Livy, Kafka, and Spark Structured Streaming.
Data Scientist for Fortune 500 B2B client
  • Trained machine-learning model for fraud detection using Python and Scikit-learn, and developed a data pipeline process for deployment.

SPR Consulting

Vice President

Delivery Leadership | Emerging Technology | Client Partner
  • Managed key client relationships and delivery engagements.
  • Researched and delivered emerging technology solutions for strategic clients.
  • Nurtured company culture of highly productive cross-functional teams.
  • Defined 300-page collection of best practices for professional services software delivery.
  • Pursued strategic initiatives on behalf of the CEO and CTO.
IoT Practice Lead
  • Defined IoT market strategy and offerings, and initiated partnerships.
  • Performed R&D in consumer, commercial and industry 4.0 space.
  • Built unique internal IoT products and facilitated hands-on IoT training labs.
  • Mentored staff in the IoT space.
Product Manager for Venture Capital Projects
  • Developed four startups into a $5M portfolio of investments from ideation to execution using business modeling, technology strategy, and delivery management.

Redpoint Technologies

Principal Architect

  • Facilitated hundreds of interactive workshops for ideation, scoping and project definition.
  • Managed 50+ projects across industries, from Fortune 500 to early startups.
  • Built system architectures, from e-commerce portals and data analytics applications to sophisticated yield-management processes and multi-threaded grid solutions.
  • Managed teams and delivered solutions using Agile methods and industry-leading architecture and design practices.
  • Conducted Agile training camps, assessments and organizational transformations.
  • Advised business owners in refining and validating their ideas and business models.
  • Conducted architecture reviews and provided technology and strategy recommendations.

Leapnet

Senior Software Architect

Designed and built several revenue-enhancement and reporting systems for clients in the hospitality industry. Systems include Data Warehouses and custom applications providing forecasting, yield-management and revenue tracking for hotel managers and corporate revenue-management personnel.

Parthenon Consulting Group

Principal Consultant

Designed and built several system applications for an insurance industry client. Systems include a Data Warehouse for Investment Fund Performance Review and a variety of payroll, benefit and compensation applications for HR.

Parian Development Group

Senior Software Developer

Delivered client projects for a software development consultancy startup, including work in insurance, banking, financial, realty, automotive, and records-retention fields.

U.S. Army Construction Engineering Research Lab

Student Contractor | Programmer

Developed software modules for a railroad inspection and track-management system in C and RBASE.

Education

Testimonials

All words shared are shared with permission.

Technical Skills

Get in Touch

On our travels we inevitably run into a complex problem. What problem did you see and how will you solve it?



Grab a Time

If you have more than a few travel stories to exchange, here are some times you can reach me.

Schedule a Meeting