
Kevin Hartman

Product Management | ML Engineering | Data Science

Download Resume

About Me

Hi, I'm Kevin. I solve complex problems using technology and data.

In my day job I work as a consultant in software delivery. My specialty is formulating high-impact solutions that leverage technology innovation. I've inspired, influenced, and led the creation of hundreds of digital products, working both inside and alongside teams of researchers, scientists, designers, engineers and enthusiasts.

I'm lucky to have worked with and been taught by some really brilliant people. Some of them have a few words to share.

I get my hands dirty a lot in the technical details. But I also have the acumen to manage the 10,000-foot view in a room with executives. My leadership experience includes:

  • Delivering outcomes using Agile and Lean methods
  • Building high-performing, cross-functional teams
  • Driving stakeholder alignment
  • Executing strategic plans

When I'm not spending time working on my next project, I'm an avid cook and aspiring family photographer. My favorite subjects (and taste-testers) are my wife and two children. With my family I also love to travel.

Wherever I go, I carry with me a master’s degree in Information and Data Science from UC Berkeley. I also have in my backpack a bachelor’s degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign.

Select Projects

Data Science Capstone

Communication Unbreakdown

In the last few weeks of the 2020 election, it was becoming increasingly difficult to reach a particularly elusive group of people: undecided voters. The grassroots organization People's Action wanted to find these voters and engage them in an important discussion. The purpose: to change hearts and minds using an empathy-driven persuasion method called Deep Canvassing.

DATA + PEOPLE = POWER

Communication Unbreakdown, our Data Science capstone team, paired grassroots organizing with cutting-edge data science to help community organizers reach target populations for civic engagement. Our solution helped People’s Action reach 33% more conflicted voters in the last two weeks of the U.S. presidential election in the five key swing states where the technology was used. The technology was also used for the Georgia Senate runoff. During the final days of voting, 45% of the people contacted were conflicted voters, compared with 18% in a human-informed control group.

TOOLS & METHODS
Civis Platform
Thrutalk
Zapier
Amazon SageMaker
Amazon Elastic MapReduce (EMR)
Apache Spark
scikit-learn
D3.js
Flask
Python
Data Pipelines
Data Transformation
Exploratory Data Analysis
Feature Engineering
Supervised Machine Learning
Reinforcement Learning

Deep Learning | Natural Language Processing

FireBERT: Hardening BERT-based classifiers against adversarial attack

The introduction of BERT in 2018 brought natural language understanding to a level where it quickly became the standard for real-world classification tasks. But with broad adoption comes an attractive attack surface. The 2019 project “TextFooler” used word similarities to generate samples that almost completely fool BERT-based classifiers. Thus, our research starts with the question: “Is it possible to build a firewall to protect BERT?”

Our resulting body of work, FireBERT, consists of three classification techniques to harden BERT against adversarial attack. Published in the Springer series Advances in Information and Communication for FICC 2021, FireBERT is evaluated against the MNLI and IMDB Movie Review tasks, and demonstrates that it is possible to harden BERT against 95% of adversarial samples without significantly reducing performance on regular benchmarks.

TOOLS & METHODS
PyTorch
PyTorch Lightning
Hugging Face
Python
Adam
BERT
Dense Connections
Dropout
GELU
Layer Normalization
Multi-Head Attention
Residual Connection
Scaled Dot-Product Attention
Softmax
Weight Decay
WordPiece

Machine Learning at Scale | Spark | MapReduce

Flights and Weather: Predicting Airline Delays

Flight delays create problems in scheduling for airlines and airports, leading to passenger inconvenience and huge economic loss. As a result there is growing interest in predicting flight delays beforehand in order to optimize operations and to improve customer satisfaction.

In this project, two massive datasets were obtained, consisting of five years of national flight and weather data. My work on the project focused on developing a data pipeline consisting of a multi-zone data lake, a multi-stage cleanup and data transformation process, complex queries and window joins, and the engineering of novel features used in model building and prediction. The pipeline was built in Databricks using Delta Lake, Spark SQL and PySpark.
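
As a rough illustration of one reading of a "window join", the sketch below attaches to a flight the most recent weather observation recorded at its origin within a fixed window before departure. It is written in plain Python rather than PySpark so it runs without a cluster. The field names (`origin`, `departure`, `station`, `time`, `wind_kt`), the two-hour window, and the sample records are all assumptions made for illustration, not details of the project's actual pipeline.

```python
from datetime import datetime, timedelta

def latest_weather_before(flight, observations, window=timedelta(hours=2)):
    """Return the most recent observation at the flight's origin that falls
    within `window` before the scheduled departure, or None if there is none."""
    candidates = [
        obs for obs in observations
        if obs["station"] == flight["origin"]
        and flight["departure"] - window <= obs["time"] <= flight["departure"]
    ]
    return max(candidates, key=lambda obs: obs["time"], default=None)

# Hypothetical records (illustrative only).
flight = {"origin": "ORD", "departure": datetime(2019, 1, 1, 12, 0)}
observations = [
    {"station": "ORD", "time": datetime(2019, 1, 1, 9, 30), "wind_kt": 10},   # too old
    {"station": "ORD", "time": datetime(2019, 1, 1, 10, 45), "wind_kt": 18},
    {"station": "ORD", "time": datetime(2019, 1, 1, 11, 50), "wind_kt": 22},
    {"station": "ATL", "time": datetime(2019, 1, 1, 11, 55), "wind_kt": 5},   # other airport
    {"station": "ORD", "time": datetime(2019, 1, 1, 12, 10), "wind_kt": 30},  # after departure
]

match = latest_weather_before(flight, observations)
print(match)
```

Restricting the join to observations at or before departure keeps information from the future out of the features, which matters when the goal is to predict a delay ahead of time.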

TOOLS & METHODS
Databricks
Delta Lake
PySpark
Spark SQL
Python
Data Pipelines
Data Transformation
Exploratory Data Analysis
Feature Engineering
Supervised Machine Learning

Data Visualization | Tableau

How do people die?

In the United States, 50 million people died between 1999 and 2018. What did they die from?

In this interactive visualization, the viewer is encouraged to explore leading causes of death in the United States and compare death rates across geographic regions to learn more about where and when they happen, in order to inform public policy.

The exploration yields key insights that show a geographic and seasonal correlation with the prevalence of certain types of disease. For example, by exploring the geographic heatmap, the viewer can see that clusters of states with lower economic activity also show a higher incidence of deaths caused by diabetes.

TOOLS & METHODS
Tableau
Tableau Desktop
Tableau Prep
TabPy
JavaScript
Python
Data Vis Analysis Framework
Data Transformation
DATA.GOV
National Center for Health Statistics

Data Visualization | D3.js

Excess Deaths That May Be Related to COVID-19

Are deaths from COVID-19 being undercounted or overcounted? One way to assess the accuracy of classification is to look at excess deaths.

Excess deaths are defined as the difference between observed deaths and expected deaths based on historic norms. Because cause of death can often include co-morbidities, a view of all causes can provide insights about the proportion of deaths that may be misclassified as they pertain to COVID-19.
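
As a minimal sketch of that definition, the example below subtracts an expected count from an observed count. The expected count is assumed here to be the simple mean of the same period in prior years, which is a simplification; the function name and the weekly counts are invented for illustration and are not taken from the visualization's data.

```python
from statistics import mean

def excess_deaths(observed, prior_years):
    """Return observed deaths minus the expected count.

    The expected count is assumed to be the mean of the same period
    in prior years; real baselines are usually modeled more carefully.
    """
    expected = mean(prior_years)
    return observed - expected

# Hypothetical counts for one state and one week (illustrative only).
prior_years = [1000, 1020, 980, 1010]  # same week in four earlier years
observed = 1300                        # same week in the current year

print(excess_deaths(observed, prior_years))
```

A positive result means more deaths were observed than the historical baseline would suggest; a negative result means fewer.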

In this interactive data visualization viewers gain insights into excess deaths as reported by all 50 states over time; moreover, the visualization allows the viewer to see that excess deaths can reasonably be attributed to COVID-19.

TOOLS & METHODS
D3.js
Flask
JavaScript
Python
Data Vis Analysis Framework
DATA.GOV
Centers for Disease Control

Applied Machine Learning

Microsoft Malware Prediction

The malware industry continues to be a well-organized, well-funded market dedicated to evading traditional security measures. Once a computer is infected by malware, criminals can hurt consumers and enterprises in many ways. Microsoft takes this problem very seriously. As one part of their overall strategy for security, Microsoft challenged the data science community on Kaggle to develop techniques to predict if a machine will soon be hit with malware.

Over the course of three weeks, my team created two submission entries for Kaggle from our best performing models. We obtained results that surpassed our goal: 61.6% from the LightGBM method and 60.4% from our PyTorch neural net.

TOOLS & METHODS
PyTorch
LightGBM
scikit-learn
Jupyter
Python

Data Engineering | Queries | Pipelines | Containers

Data Pipelines for Data Science

Roughly 80% of data science projects never make it into production. This is because the foundations for storing, managing, and processing large datasets, and for ensuring quality, security and availability, lack reliable and resilient infrastructure. Data engineering is the act of designing and building the reliable and resilient processes that transform and transport data along a series of "pipes" until it arrives in a final state useful to data scientists.

Pipelines begin by ingesting data from many disparate sources and collecting it in raw form as a single source of truth. Then, as data is transported through the pipeline, various data engineering tools and platforms are used to assemble, transform, connect and deliver data to the underlying storage and processing architectures needed by analytics and data science applications.
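
The stages described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`ingest`, `transform`, `deliver`), the JSON-lines input, and the in-memory list standing in for a destination table are all hypothetical, and a production pipeline would rely on tools such as those listed below.

```python
import json

def ingest(raw_lines):
    """Keep every raw record exactly as received (the 'raw' zone)."""
    return list(raw_lines)

def transform(raw_records):
    """Parse JSON, skip malformed rows, and normalize field names to lowercase."""
    cleaned = []
    for line in raw_records:
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input is skipped; the raw copy is still retained
        cleaned.append({key.lower(): value for key, value in row.items()})
    return cleaned

def deliver(rows, sink):
    """Append cleaned rows to the destination (a list stands in for a table)."""
    sink.extend(rows)
    return sink

# Hypothetical input: two valid records and one malformed line.
raw = ['{"ID": 1, "Value": 3.5}', 'not json', '{"ID": 2, "Value": 4.0}']
warehouse = deliver(transform(ingest(raw)), sink=[])
print(warehouse)
```

Keeping the raw records untouched in the first stage means a transformation bug can be fixed and the data reprocessed without going back to the original sources.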

TOOLS & METHODS
Google Cloud Platform
Google BigQuery
Docker Containers
Apache Spark
Apache Kafka
Flask
Jupyter
Python
Data Pipelines
Data Transformation

Data Science Community

Launching of Berkeley-Data

Berkeley Data is a hub for sharing knowledge, tools and resources within the Data Science Community.

Actively maintained on GitHub, Berkeley Data is an evolving body of work curated by current and former students of the Master of Information and Data Science (MIDS) program at UC Berkeley. The repositories contain guidelines, procedures, and code that support a data science practice.


View More Projects

Experiments & Causal Inference

Moving the Needle on Public Opinion: An Experiment on the Persuasive Effects of Moral Frames

TOOLS & METHODS
Coming soon...

Descriptive Statistics | R | LaTeX

Local Policy Recommendations for Crime Reduction

TOOLS & METHODS
Coming soon...

Research Design | Bias | Ethics

Beyond Plug and Pray: How to ensure self-determination in a massively connected society.

TOOLS & METHODS
Coming soon...

Experience

Independent Technology and Business Consulting

Product Manager | Data Scientist | Data Engineer | Architect

  • Designed a cloud-native technology solution for a next-generation data management platform in the medical accreditation industry using domain-driven design and serverless microservices.
  • Managed delivery of a large-scale CRM solution with third-party data integrations for a global industrial packaging manufacturer.
  • Determined product-market fit and managed delivery of a software product in the financial and investment services industry using Business Model Canvas, Design Sprints, Lean Product Playbook, Design Thinking and Agile Software Delivery.
  • Assembled and optimized a data pipeline process in Apache NiFi, Spark, Hadoop, HDFS, Kafka, Spark Structured Streaming and PySpark.
  • Trained a machine-learning model for fraud detection using scikit-learn, Python and Jupyter, and developed a live data pipeline for deployment.

SPR Consulting

Vice President

Delivery Leadership | Emerging Technology
  • Managed key client relationships and delivery engagements.
  • Researched and delivered emerging technology solutions for strategic clients.
  • Nurtured company culture of highly productive cross-functional teams.
  • Defined 300-page collection of best practices for professional services software delivery.
  • Pursued strategic initiatives on behalf of the CEO and CTO.
IoT Practice Lead
  • Defined IoT market strategy and offerings, and initiated partnerships.
  • Performed R&D in consumer, commercial and industry 4.0 space.
  • Built unique internal IoT products and facilitated hands-on IoT training labs.
  • Mentored staff in the IoT space.
Digital Strategist | Digital Lead

Developed four startups into a $5M portfolio of investments from ideation to execution using business modeling, technology strategy, and delivery management.

Redpoint Technologies

Principal Architect

  • Facilitated hundreds of interactive workshops for ideation, scoping and project definition.
  • Managed 50+ projects across industries, from Fortune 500 to early startups.
  • Built system architectures, from e-commerce portals and data analytics applications to sophisticated yield-management processes and multi-threaded grid solutions.
  • Managed teams and delivered solutions using Agile methods and industry-leading architecture and design practices.
  • Conducted Agile training camps, assessments and organizational transformations.
  • Advised business owners in refining and validating their ideas and business models.
  • Conducted architecture reviews with recommendations on technology and strategy.

Leapnet

Senior Software Architect

Designed and built several revenue-enhancement and reporting systems for clients in the hospitality industry. Systems include Data Warehouses and custom applications providing forecasting, yield-management and revenue tracking for hotel managers and corporate revenue-management personnel.

Parthenon Consulting Group

Principal Consultant

Designed and built several system applications for an insurance industry client. Systems include a Data Warehouse for Investment Fund Performance Review and a variety of payroll, benefit and compensation applications for HR.

Parian Development Group

Senior Software Developer

Delivered client projects for a software development consultancy startup, including work in the insurance, banking, financial, realty, automotive, and records-retention fields.

U.S. Army Construction Engineering Research Lab

Student Contractor | Programmer

Developed software modules for a railroad inspection and track-management system in C and RBASE.

Education

Testimonials

All words shared are shared with permission.

Technical Skills

Get in Touch

On our travels we inevitably run into a complex problem. What problem did you see and how will you solve it?



Grab a Time

If you have more than a few travel stories to exchange, here are some times you can reach me.

Schedule a Meeting