Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark | Book Series

Overview:

This book sets out to introduce a new methodology for analytics product development, and it accomplishes its stated goal. Although somewhat lengthy, the book stays focused on the critical path to the end product while covering documentation, facilitation, exploration, and discovery. A recurring theme throughout is aligning data science with the rest of the organization.

With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka and other tools.

Author Russell Jurney demonstrates how to compose a data platform for building, deploying and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application and effect meaningful change in your organization.

Authors:

Russell Jurney

Published In:

2017

 

Cumulative Flow | Glossary

Definition:

A Cumulative Flow diagram is a tool used in the Kanban method to reveal bottlenecks in real time. It helps the team visualize the effort put into completing a project and spot impediments that are about to occur in the process. Cumulative Flow diagrams visually represent how many items are in progress on each day.

The Cumulative Flow Diagram (CFD) shows how tasks accumulate and how they are distributed across the different process stages. The graph is built from colored bands, one color per column of the board: each band tells you how many tasks sit in a given stage of the process at a given time (the horizontal axis). The CFD requires only three basic things from the process: a Backlog, an In Progress column, and a Done section. This division is what makes the information in the diagram readable. Any team that uses this kind of workflow division can therefore benefit from Cumulative Flow information. Whether you use Scrum, Kanban, or a custom project management method, the CFD will be helpful as long as you organize work into task groups.
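
As an illustration of the counting behind such a diagram, the following Python sketch tallies how many tasks sit in each stage on each day. It is a minimal example under stated assumptions, not a description of any particular Kanban tool: the task records, the dates, and the helper names `stage_on` and `cumulative_flow` are all hypothetical.

```python
from datetime import date, timedelta

# The three stages the CFD needs, in workflow order.
STAGES = ["Backlog", "In Progress", "Done"]

# Hypothetical task history: the day each task entered each stage.
# None means the task has not reached that stage yet.
tasks = [
    {"Backlog": date(2024, 1, 1), "In Progress": date(2024, 1, 2), "Done": date(2024, 1, 4)},
    {"Backlog": date(2024, 1, 1), "In Progress": date(2024, 1, 3), "Done": None},
    {"Backlog": date(2024, 1, 2), "In Progress": None, "Done": None},
]


def stage_on(task, day):
    """Return the stage a task sits in on `day`, or None if it does not exist yet."""
    current = None
    for stage in STAGES:
        entered = task[stage]
        if entered is not None and entered <= day:
            current = stage  # the latest stage entered on or before `day`
    return current


def cumulative_flow(tasks, start, end):
    """Return a list of (day, {stage: count}) rows, one per day from start to end."""
    rows = []
    day = start
    while day <= end:
        counts = {stage: 0 for stage in STAGES}
        for task in tasks:
            stage = stage_on(task, day)
            if stage is not None:
                counts[stage] += 1
        rows.append((day, counts))
        day += timedelta(days=1)
    return rows


cfd = cumulative_flow(tasks, date(2024, 1, 1), date(2024, 1, 4))
for day, counts in cfd:
    print(day.isoformat(), counts)
```

Each row corresponds to one vertical slice of the diagram. Stacking the per-stage counts as colored bands over time would give the chart itself, and an In Progress band that keeps widening would point to a bottleneck.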

Further Reading:

Lean from the Trenches: Managing Large-Scale Projects with Kanban by Henrik Kniberg