Global Container Volumes and Dwell Times in 2020
Data Period = 1 Year (2020)
Data Frequency = Monthly
by Continents
by Countries
by Movement Type
Volume [containers]
Dwell [days]
Here we explore the entire dataset for 2020, focusing on 2 numerical factors:
1) Container Volumes, measured in containers
2) Container Dwell times, measured in days
Each datapoint is for one month at one location,
so the 2 numerical factors can be expressed as:
1) Volume( t=month, loc=country )
2) Dwell( t=month, loc=country )
Both factors are clearly
- all non-negative
- positively skewed (long right tail)
So we try two methods to transform the data:
1) Exclude Outliers
2) Logarithmic Transformation (for non-negative values)
2.1) y = ln( x ) : for data without zeros
2.2) y = log( 1 + x ) : for data with zeros
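The two logarithmic options differ mainly in how they handle zeros: ln(0) is undefined, while log(1 + 0) = 0. A minimal Python sketch of both, assuming the values sit in plain lists (the helper names `ln_transform` and `log10_1p_transform` are hypothetical, not taken from the original analysis):

```python
import math


def ln_transform(values):
    """2.1) y = ln(x): only defined for strictly positive x."""
    if any(v <= 0 for v in values):
        raise ValueError("ln(x) needs strictly positive values; use log(1 + x) instead")
    return [math.log(v) for v in values]


def log10_1p_transform(values):
    """2.2) y = log10(1 + x): defined for every x >= 0 and maps 0 to 0."""
    if any(v < 0 for v in values):
        raise ValueError("log10(1 + x) is used here for non-negative values only")
    return [math.log10(1 + v) for v in values]


if __name__ == "__main__":
    # Hypothetical monthly volumes for one location, including a zero month.
    volumes = [0, 3, 12, 250, 4800]
    print(log10_1p_transform(volumes))  # the zero month stays at 0.0
    try:
        ln_transform(volumes)
    except ValueError as err:
        print("ln failed:", err)
```

Using base 10 in 2.2 only rescales the values by a constant, since log10(1 + x) = ln(1 + x) / ln(10), so it does not change the shape of the transformed distribution or any correlation computed from it.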
Data Original
Top Row = Association Between
y = Volume [containers]
x = Dwell [days]
Top Left = Split High-Low by Mean
Top Right = Split High-Low by Median
Bottom Left = Scatter Plot Between
y = Volume [containers]
x = Dwell [days]
n = Sample Size
= 22049
r = Sample (Pearson) Correlation Coefficient
= CORREL( Volume, Dwell )
= -0.0118
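For anyone reproducing the top panels and the r value outside a spreadsheet, here is a minimal Python sketch. The spreadsheet function CORREL computes the Pearson product-moment correlation, which `pearson_r` below reproduces. How the panels assign "High" and "Low" is an assumption here: a value counts as High if it is strictly above the mean (or median) of its own variable, otherwise Low. The function names and toy data are hypothetical.

```python
import math
import statistics
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation, the same quantity as the spreadsheet CORREL."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def high_low_counts(x, y, center=statistics.fmean):
    """2x2 counts of (x High/Low, y High/Low); ties at the centre count as Low (assumption)."""
    cx, cy = center(x), center(y)
    return Counter(
        ("High" if a > cx else "Low", "High" if b > cy else "Low")
        for a, b in zip(x, y)
    )


# Hypothetical toy data: one outlier volume, dwell falling as volume rises.
dwell = [5, 4, 3, 2, 1]
volume = [10, 20, 30, 40, 1000]
print(pearson_r(dwell, volume))
print(high_low_counts(dwell, volume, center=statistics.fmean))   # split by mean
print(high_low_counts(dwell, volume, center=statistics.median))  # split by median
```

With right-skewed data the mean typically sits well above the median (here 220 vs 30 for volume), so the mean split puts fewer points in the High bins, which is one reason the two top panels can look different.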
Relationship between Volume and Dwell is
NOT LINEAR (r is close to 0, i.e. there is essentially no linear association)
Try a Log-transformation
Data Transformed = ln( Data ) = Natural Logarithm of Data
Top Row = Association Between
y = ln( Volume [containers] )
x = ln( Dwell [days] )
Top Left = Split High-Low by Mean
Top Right = Split High-Low by Median
Bottom Left = Scatter Plot Between
y = ln( Volume [containers] )
x = ln( Dwell [days] )
n = Sample Size
= 22049
r = Sample (Pearson) Correlation Coefficient
= CORREL( ln(Volume), ln(Dwell) )
= +0.0545
Relationship between Volume and Dwell is
STILL NOT LINEAR
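Why take ln of both axes at all? If the two variables followed a power law, Volume = a * Dwell^b, then ln(Volume) = ln(a) + b * ln(Dwell) would be exactly linear and r on the ln-ln scale would be close to +1 or -1. A small r such as +0.0545 therefore suggests that a single power law does not describe the data well. The sketch below uses hypothetical data with y = x^2 to show the effect; `pearson_r` is a plain Pearson correlation, defined here so the block runs on its own.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation (what a spreadsheet CORREL returns)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical power-law data: y = x**2 (not real container data).
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

r_raw = pearson_r(x, y)  # about 0.981: strong, but not perfectly linear
r_ln = pearson_r([math.log(v) for v in x], [math.log(v) for v in y])  # 1.0 up to rounding

print(f"raw r = {r_raw:.3f}, ln-ln r = {r_ln:.6f}")
```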
Data Transformed = log_{10}( 1+Data ) = Logarithm Base-10 of 1-Plus-Data
Top Row = Association Between
y = log( 1 + Volume [containers] )
x = log( 1 + Dwell [days] )
Top Left = Split High-Low by Mean
Top Right = Split High-Low by Median
Bottom Left = Scatter Plot Between
y = log( 1 + Volume [containers] )
x = log( 1 + Dwell [days] )
n = Sample Size
= 22049
r = Sample (Pearson) Correlation Coefficient
= CORREL( log( 1 + Volume ), log( 1 + Dwell ) )
= +0.0727
Relationship between Volume and Dwell is
STILL NOT LINEAR
but some structure seems to be emerging.
There appear to be
- a faint cluster with (+) correlation
- a faint cluster with (-) correlation
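One way to follow up on the two faint clusters would be to check whether they line up with a categorical field the dataset already carries, such as Movement Type, by computing r separately within each group. This is a hypothetical follow-up, not something the analysis above does; the group labels and values below are invented. The toy rows show how two groups with opposite correlations can pool to an r of about 0.

```python
import math
from collections import defaultdict


def pearson_r(x, y):
    """Plain Pearson correlation (what a spreadsheet CORREL returns)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_by_group(rows):
    """rows: iterable of (group, x, y). Returns {group: Pearson r within that group}."""
    xs, ys = defaultdict(list), defaultdict(list)
    for group, x, y in rows:
        xs[group].append(x)
        ys[group].append(y)
    return {g: pearson_r(xs[g], ys[g]) for g in xs}


# Hypothetical rows: (group, log10(1 + Dwell), log10(1 + Volume)); values are made up.
rows = [
    ("group_a", 1.0, 1.0), ("group_a", 2.0, 2.0), ("group_a", 3.0, 3.0),
    ("group_b", 1.0, 3.0), ("group_b", 2.0, 2.0), ("group_b", 3.0, 1.0),
]
print(r_by_group(rows))  # group_a: +1, group_b: -1
pooled = pearson_r([x for _, x, _ in rows], [y for _, _, y in rows])
print(pooled)  # 0.0: the opposite trends cancel when pooled
```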
Data Transformed = sqrt( Data ) = Square-Root of Data
Top Row = Association Between
y = sqrt( Volume [containers] )
x = sqrt( Dwell [days] )
Top Left = Split High-Low by Mean
Top Right = Split High-Low by Median
Bottom Left = Scatter Plot Between
y = sqrt( Volume [containers] )
x = sqrt( Dwell [days] )
n = Sample Size
= 22049
r = Sample (Pearson) Correlation Coefficient
= CORREL( sqrt(Volume), sqrt(Dwell) )
= +0.0209
Relationship between Volume and Dwell is
STILL NOT LINEAR
but some structure seems to be emerging.
There appears to be
a 'spike' cluster with (+) correlation (red oval)
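The square root compresses large values less strongly than the logarithm, so it typically removes less of the right skew. A minimal sketch comparing the sample skewness (Fisher-Pearson coefficient g1, without small-sample correction) of one hypothetical right-skewed series on the raw, sqrt and log10(1 + x) scales; the series is invented, not taken from the 2020 data.

```python
import math


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (population moments, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed series (e.g. monthly volumes); not real data.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40, 120]

scales = {
    "raw": raw,
    "sqrt": [math.sqrt(v) for v in raw],
    "log10(1+x)": [math.log10(1 + v) for v in raw],
}
for name, vals in scales.items():
    print(f"{name:>10}: skewness = {skewness(vals):.2f}")
```

On this series the skewness drops from raw to sqrt to log10(1 + x), but stays positive in all three.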
Data Transformed = Data^{-0.05} = Data to the Power of -0.05
Top Row = Association Between
y = ( Volume [containers] )^{-0.05}
x = ( Dwell [days] )^{-0.05}
Top Left = Split High-Low by Mean
Top Right = Split High-Low by Median
Bottom Left = Scatter Plot Between
y = ( Volume [containers] )^{-0.05}
x = ( Dwell [days] )^{-0.05}
n = Sample Size
= 22049
r = Sample (Pearson) Correlation Coefficient
= CORREL( Volume^{-0.05}, Dwell^{-0.05} )
= -0.0977
Relationship between Volume and Dwell is
STILL NOT LINEAR
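Most of these panels can be read as steps along a ladder of power transforms applied to both variables: x^1 (original), x^0.5 (sqrt), ln x (the usual stand-in for power 0) and x^(-0.05). The sketch below recomputes r for each step on hypothetical data; `power_transform` and `r_by_power` are illustrative names, not from the original analysis. It encodes two caveats: for negative powers the transform is decreasing, so it reverses the order of the values, and, like ln, it is undefined at zero.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation (same quantity as a spreadsheet CORREL)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def power_transform(values, p):
    """x**p for p != 0 and ln(x) for p == 0; p <= 0 needs strictly positive x.

    For p < 0 the map is decreasing, so large values become small ones.
    """
    if p <= 0 and any(v <= 0 for v in values):
        raise ValueError("power p <= 0 needs strictly positive values")
    if p == 0:
        return [math.log(v) for v in values]
    return [v ** p for v in values]


def r_by_power(volume, dwell, powers=(1, 0.5, 0, -0.05)):
    """Pearson r between transformed dwell and transformed volume for each power."""
    return {
        p: pearson_r(power_transform(dwell, p), power_transform(volume, p))
        for p in powers
    }


# Hypothetical (dwell, volume) pairs; not the 2020 dataset.
dwell = [1.5, 2.0, 3.0, 4.5, 7.0, 12.0, 20.0]
volume = [900, 40, 3000, 15, 250, 60, 8]
for p, r in r_by_power(volume, dwell).items():
    print(f"power {p:>5}: r = {r:+.4f}")
```

Because Pearson's r is not invariant under non-linear transforms, each step can give a different r from the same data, which is why the values above differ from panel to panel.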

