DIDM = Data-Informed Decision Making
[1.0.0] Ask: Question
[1.1.0] Define Problem
[1.2.0] Choose Measure
[1.3.0] Determine Factors
[2.0.0] Acquire: Data
[2.1.0] Design Experiment
[2.2.0] Collect Data
[2.2.1] Data Extract
[2.2.2] Data Load: Excel
> Data > Get Data > From File > From Excel Workbook
> Select WB > Navigator > Select Sheets > Transform Data
[2.2.3] Data Transform
[3.0.0] Analyze: Data
[3.1.0] Box-Plot
IQR = Inter-quartile Range = Q3 - Q1
Q4_Max = largest value in sample (outlier if > FU)
+------------------------------------- FU = Fence_Upper = Q3 + 1.5 * IQR
+--------------------------- L = Whisker_High = (largest value <= FU)
Q3
Q2_Median, Mean, Mode
Q1
+--------------------------- S = Whisker_Low = (smallest value >= FL)
+------------------------------------- FL = Fence_Lower = Q1 - 1.5 * IQR
Q0_Min = smallest value in sample (outlier if < FL)
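The box-plot quantities above can be sketched in code. This is a minimal illustration, not a definitive method: the function name box_plot_summary and the sample values are made up, the quartiles use the "inclusive" interpolation from Python's statistics module (other tools may use a different quartile rule), and a value lying exactly on a fence is treated as inside it.

```python
import statistics


def box_plot_summary(values):
    """Return quartiles, IQR, fences, whisker ends and outliers for a sample."""
    data = sorted(values)
    # With n=4, quantiles() returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    fence_lower = q1 - 1.5 * iqr
    fence_upper = q3 + 1.5 * iqr
    # Values inside the fences determine where the whiskers end.
    inside = [v for v in data if fence_lower <= v <= fence_upper]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
        "FL": fence_lower, "FU": fence_upper,
        "S": min(inside),  # Whisker_Low
        "L": max(inside),  # Whisker_High
        "outliers": [v for v in data if v < fence_lower or v > fence_upper],
    }


# Hypothetical sample, chosen only to show one value beyond the upper fence.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For this sample, Q1 = 4, Q3 = 8, IQR = 4, so FL = -2 and FU = 14; the value 30 lies above FU and is flagged as an outlier, while the whiskers end at 2 and 9.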
[3.2.0] Iterate
[4.0.0] Apply: Expertise, Insights, Data Story
Who is audience?
What key insight?
What story main message?
What chart type for visualization?
What is audience action target?
[5.0.0] Announce: Decide, Communicate
[6.0.0] Assess: Monitor outcome
Bias?
ASK = Questions
What is happening?
Why?
Will it happen again?
What will happen if we change the inputs?
What is the data telling us?
ANALYZE = Analytical Methods
Mean, Median, Mode, SD, min, max
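A quick sketch of these descriptive statistics using only Python's standard library. The dwell_days values are hypothetical, and SD is assumed here to mean the sample standard deviation (n - 1 denominator).

```python
import statistics

# Hypothetical dwell times in days, for illustration only.
dwell_days = [3, 5, 5, 6, 8, 9, 13]

summary = {
    "mean": statistics.mean(dwell_days),
    "median": statistics.median(dwell_days),
    "mode": statistics.mode(dwell_days),
    "sd": statistics.stdev(dwell_days),  # sample SD (n - 1 denominator)
    "min": min(dwell_days),
    "max": max(dwell_days),
}
print(summary)
```

Use statistics.pstdev instead of stdev if the data is the whole population rather than a sample.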
ANALYZE = Predictive Analytics
Logistic Regression
Linear Regression
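As a minimal sketch of the linear regression idea, the least-squares line y = intercept + slope * x can be fitted by hand. The function name fit_line and the distance/dwell numbers are hypothetical, and the data is deliberately a perfect line so the fit is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: distance to location (km) vs. dwell time (days).
distance_km = [100, 200, 300, 400]
dwell_days = [3, 5, 7, 9]

slope, intercept = fit_line(distance_km, dwell_days)
predicted = intercept + slope * 500  # predicted dwell for a 500 km location
print(slope, intercept, predicted)
```

Real data will not sit exactly on a line, so the residuals (actual minus predicted) should be inspected before trusting any prediction.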
ANALYZE = Prescriptive Analytics
How to make the best happen?
Maximize profit
Maximize revenue
Minimize cost
Step 1 = Discovery
ASK = Define business challenge
How can we maximize profit?
Maximize Revenue
Match demand and supply of containers, by location
Reduce Volume of Empty Container
Minimize Cost
Reduce Dwell Time
ASK = Identify key analytical questions
What factors contribute to Dwell time?
Recognize stakeholders
The company and shareholders
ACQUIRE = Evaluate data sources (internal, external)
Internal Company data
Volume, Dwell
External data
Location holidays
Location distances
Define Success
Get valuable insights for decision making from the data
Data source?
Data sample represent population?
Data distribution include outliers? Affect results?
Assumptions behind analysis?
Any conditions violate assumptions ==> model invalid?
Analytical approach? Any alternatives?
Causality: dX causes dY?
Correlation != Causality
Root Cause = Why * 5
[Population]
+-- Random Select
+-- [Sample]
+-- Random Assign
+-- [Group_Control]
+-- [Group_Treatment]
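The Population > Sample > Control/Treatment flow above can be sketched as two random steps. This is an illustration under assumptions: the function name select_and_assign, the equal group split, and the integer IDs standing in for population members are all made up.

```python
import random


def select_and_assign(population, sample_size, seed=0):
    """Randomly select a sample, then randomly assign it to two groups."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    # Random Select: draw without replacement from the population.
    sample = rng.sample(population, sample_size)
    # Random Assign: shuffle, then cut the sample into two halves.
    rng.shuffle(sample)
    half = sample_size // 2
    group_control = sample[:half]
    group_treatment = sample[half:]
    return group_control, group_treatment


# Hypothetical population of 1000 unit IDs.
population = list(range(1000))
control, treatment = select_and_assign(population, 100)
print(len(control), len(treatment))
```

Random selection helps the sample represent the population; random assignment helps keep the two groups comparable, so a difference in outcome can be attributed to the treatment.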
Avoid Bias
+-- First-conclusion
+-- Confirmation
+-- Survivorship
Step 2 = Data Prep
ACQUIRE = Extract data from DB
.xlsx file
ACQUIRE = Load data into workspace
Databricks
ANALYZE = Data quality check
Data Exploration
Excel
Power BI
Python, Streamlit
Step 3 = Model Planning
ACQUIRE = Transform data
Numerical
Categorical
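The notes do not say which transformations are meant. As one common reading, numerical columns are rescaled and categorical columns are one-hot encoded before modeling. The sketch below assumes exactly that; the function names and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers to the range 0..1 (assumes not all values are equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode categories as 0/1 columns, one column per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical columns: container volume (numerical), location (categorical).
scaled_volume = min_max_scale([10, 20, 30])
categories, encoded_location = one_hot(["LA", "NY", "LA"])
print(scaled_volume)
print(categories, encoded_location)
```

For regression with an intercept, one of the one-hot columns is usually dropped to avoid perfectly collinear predictors.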
Step 4 = Model Building
ANALYZE = Multiple Linear Regression
Simple Linear Regression
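Multiple linear regression extends the single-predictor line to several predictors: y = b0 + b1*x1 + b2*x2 + ... A dependency-free sketch via the normal equations (X'X) b = X'y follows. The function names and data are hypothetical, and in practice a library such as scikit-learn or statsmodels would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(rows, ys):
    """Least squares fit; returns [intercept, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no noise.
features = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
target = [3, 4, 6, 8, 9]

coefficients = fit_multiple_regression(features, target)
print(coefficients)
```

Because the data was built without noise, the fit recovers the coefficients 1, 2 and 3 up to floating-point rounding; with real dwell-time data the coefficients are estimates and should be checked against the assumptions of the model.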
ANALYZE = Visualize
Step 5 = Results Communicate
ANNOUNCE = Teams
ANNOUNCE = TLoom
ANNOUNCE = In-person classes (after)
Step 6 = Operationalize
ASSESS = Predict Dwell Times
APPLY = Do the results make sense?

