Overview

Dataset statistics

Number of variables: 9
Number of observations: 398
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.1 KiB
Average record size in memory: 72.3 B

Variable types

Numeric: 5
Categorical: 4

Alerts

horsepower has a high cardinality: 94 distinct values (High cardinality)
car name has a high cardinality: 305 distinct values (High cardinality)
mpg is highly correlated with cylinders and 4 other fields (High correlation)
cylinders is highly correlated with mpg and 3 other fields (High correlation)
displacement is highly correlated with mpg and 3 other fields (High correlation)
weight is highly correlated with mpg and 3 other fields (High correlation)
model year is highly correlated with mpg (High correlation)
origin is highly correlated with mpg and 3 other fields (High correlation)
mpg is highly correlated with cylinders and 4 other fields (High correlation)
cylinders is highly correlated with mpg and 4 other fields (High correlation)
displacement is highly correlated with mpg and 4 other fields (High correlation)
weight is highly correlated with mpg and 3 other fields (High correlation)
acceleration is highly correlated with cylinders and 1 other field (High correlation)
model year is highly correlated with mpg (High correlation)
origin is highly correlated with mpg and 3 other fields (High correlation)
mpg is highly correlated with cylinders and 2 other fields (High correlation)
cylinders is highly correlated with mpg and 3 other fields (High correlation)
displacement is highly correlated with mpg and 3 other fields (High correlation)
weight is highly correlated with mpg and 2 other fields (High correlation)
origin is highly correlated with cylinders and 1 other field (High correlation)
cylinders is highly correlated with horsepower (High correlation)
horsepower is highly correlated with cylinders and 1 other field (High correlation)
origin is highly correlated with horsepower (High correlation)
mpg is highly correlated with cylinders and 6 other fields (High correlation)
cylinders is highly correlated with mpg and 5 other fields (High correlation)
displacement is highly correlated with mpg and 5 other fields (High correlation)
horsepower is highly correlated with mpg and 6 other fields (High correlation)
weight is highly correlated with mpg and 5 other fields (High correlation)
acceleration is highly correlated with mpg and 4 other fields (High correlation)
model year is highly correlated with mpg and 1 other field (High correlation)
origin is highly correlated with mpg and 4 other fields (High correlation)
car name is uniformly distributed (Uniform)

Reproduction

Analysis started: 2022-06-09 10:43:57.316610
Analysis finished: 2022-06-09 10:44:03.612143
Duration: 6.3 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

mpg
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 129
Distinct (%): 32.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.51457286
Minimum: 9
Maximum: 46.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 9
5-th percentile: 13
Q1: 17.5
median: 23
Q3: 29
95-th percentile: 37.03
Maximum: 46.6
Range: 37.6
Interquartile range (IQR): 11.5

Descriptive statistics

Standard deviation: 7.815984313
Coefficient of variation (CV): 0.3323889555
Kurtosis: -0.5107812652
Mean: 23.51457286
Median Absolute Deviation (MAD): 6
Skewness: 0.457066344
Sum: 9358.8
Variance: 61.08961077
Monotonicity: Not monotonic
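
Several of the mpg figures above are simple functions of one another. The sketch below recomputes the mean, variance, coefficient of variation, range and IQR from the other reported values; it only checks the arithmetic and is not how pandas-profiling derives them internally. The variable names are illustrative.

```python
# Figures copied from the mpg summary in this report.
n_obs = 398
total = 9358.8            # Sum
std = 7.815984313         # Standard deviation
q1, q3 = 17.5, 29.0       # Q1 and Q3
minimum, maximum = 9.0, 46.6

mean = total / n_obs              # Mean = Sum / number of observations
variance = std ** 2               # Variance is the squared standard deviation
cv = std / mean                   # Coefficient of variation (CV)
iqr = q3 - q1                     # Interquartile range
value_range = maximum - minimum   # Range

print(f"mean={mean:.8f} variance={variance:.5f} cv={cv:.10f}")
print(f"iqr={iqr} range={value_range:.1f}")
```

The recomputed values agree with the reported Mean, Variance, CV, IQR and Range up to rounding.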
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
13                  20     5.0%
14                  19     4.8%
18                  17     4.3%
15                  16     4.0%
26                  14     3.5%
16                  13     3.3%
19                  12     3.0%
25                  11     2.8%
24                  11     2.8%
22                  10     2.5%
Other values (119)  255    64.1%

Minimum values

Value  Count  Frequency (%)
9      1      0.3%
10     2      0.5%
11     4      1.0%
12     6      1.5%
13     20     5.0%
14     19     4.8%
14.5   1      0.3%
15     16     4.0%
15.5   5      1.3%
16     13     3.3%

Maximum values

Value  Count  Frequency (%)
46.6   1      0.3%
44.6   1      0.3%
44.3   1      0.3%
44     1      0.3%
43.4   1      0.3%
43.1   1      0.3%
41.5   1      0.3%
40.9   1      0.3%
40.8   1      0.3%
39.4   1      0.3%

cylinders
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value  Count
4      204
8      103
6      84
3      4
5      3

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 398
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 8
2nd row: 8
3rd row: 8
4th row: 8
5th row: 8

Common Values

Value  Count  Frequency (%)
4      204    51.3%
8      103    25.9%
6      84     21.1%
3      4      1.0%
5      3      0.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
4      204    51.3%
8      103    25.9%
6      84     21.1%
3      4      1.0%
5      3      0.8%

Most occurring characters

Value  Count  Frequency (%)
4      204    51.3%
8      103    25.9%
6      84     21.1%
3      4      1.0%
5      3      0.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  398    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
4      204    51.3%
8      103    25.9%
6      84     21.1%
3      4      1.0%
5      3      0.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  398    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
4      204    51.3%
8      103    25.9%
6      84     21.1%
3      4      1.0%
5      3      0.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  398    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
4      204    51.3%
8      103    25.9%
6      84     21.1%
3      4      1.0%
5      3      0.8%

displacement
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 82
Distinct (%): 20.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 193.4258794
Minimum: 68
Maximum: 455
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 68
5-th percentile: 85
Q1: 104.25
median: 148.5
Q3: 262
95-th percentile: 400
Maximum: 455
Range: 387
Interquartile range (IQR): 157.75

Descriptive statistics

Standard deviation: 104.2698382
Coefficient of variation (CV): 0.5390687042
Kurtosis: -0.7465966296
Mean: 193.4258794
Median Absolute Deviation (MAD): 58.5
Skewness: 0.7196451643
Sum: 76983.5
Variance: 10872.19915
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
97                 21     5.3%
98                 18     4.5%
350                18     4.5%
318                17     4.3%
250                17     4.3%
140                16     4.0%
400                13     3.3%
225                13     3.3%
91                 12     3.0%
232                11     2.8%
Other values (72)  242    60.8%

Minimum values

Value  Count  Frequency (%)
68     1      0.3%
70     3      0.8%
71     2      0.5%
72     1      0.3%
76     1      0.3%
78     1      0.3%
79     6      1.5%
80     1      0.3%
81     1      0.3%
83     1      0.3%

Maximum values

Value  Count  Frequency (%)
455    3      0.8%
454    1      0.3%
440    2      0.5%
429    3      0.8%
400    13     3.3%
390    1      0.3%
383    2      0.5%
360    4      1.0%
351    8      2.0%
350    18     4.5%

horsepower
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 94
Distinct (%): 23.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value              Count
150                22
90                 20
88                 19
110                18
100                17
Other values (89)  302

Length

Max length: 3
Median length: 2
Mean length: 2.404522613
Min length: 1

Characters and Unicode

Total characters: 957
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 35
Unique (%): 8.8%

Sample

1st row: 130
2nd row: 165
3rd row: 150
4th row: 150
5th row: 140

Common Values

Value              Count  Frequency (%)
150                22     5.5%
90                 20     5.0%
88                 19     4.8%
110                18     4.5%
100                17     4.3%
75                 14     3.5%
95                 14     3.5%
105                12     3.0%
70                 12     3.0%
67                 12     3.0%
Other values (84)  238    59.8%

Length

Histogram of lengths of the category

Value              Count  Frequency (%)
150                22     5.5%
90                 20     5.0%
88                 19     4.8%
110                18     4.5%
100                17     4.3%
75                 14     3.5%
95                 14     3.5%
105                12     3.0%
70                 12     3.0%
67                 12     3.0%
Other values (84)  238    59.8%

Most occurring characters

Value  Count  Frequency (%)
1      197    20.6%
0      171    17.9%
5      129    13.5%
8      106    11.1%
7      90     9.4%
9      75     7.8%
6      67     7.0%
2      52     5.4%
4      35     3.7%
3      29     3.0%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     951    99.4%
Other Punctuation  6      0.6%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      197    20.7%
0      171    18.0%
5      129    13.6%
8      106    11.1%
7      90     9.5%
9      75     7.9%
6      67     7.0%
2      52     5.5%
4      35     3.7%
3      29     3.0%

Other Punctuation
Value  Count  Frequency (%)
?      6      100.0%
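
The six `?` characters counted under Other Punctuation are a plausible reason why horsepower is typed as Categorical rather than numeric. The report itself does not say so; this is an inference from the character counts. A minimal sketch, assuming `?` is a placeholder for an unknown value: the helper `parse_horsepower` is hypothetical, and the input list is illustrative (its first five entries mirror the Sample rows above).

```python
from typing import Optional


def parse_horsepower(raw: str) -> Optional[float]:
    """Return the numeric value, or None when the entry is not a number (e.g. '?')."""
    try:
        return float(raw)
    except ValueError:
        return None


# Illustrative values only; the trailing "?" stands in for a placeholder entry.
raw_values = ["130", "165", "150", "150", "140", "?"]
parsed = [parse_horsepower(value) for value in raw_values]
n_missing = sum(value is None for value in parsed)

print(parsed, n_missing)
```

With the placeholders converted to missing values, the column could be profiled as a numeric variable with six missing cells instead of a categorical one.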

Most occurring scripts

Value   Count  Frequency (%)
Common  957    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      197    20.6%
0      171    17.9%
5      129    13.5%
8      106    11.1%
7      90     9.4%
9      75     7.8%
6      67     7.0%
2      52     5.4%
4      35     3.7%
3      29     3.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  957    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      197    20.6%
0      171    17.9%
5      129    13.5%
8      106    11.1%
7      90     9.4%
9      75     7.8%
6      67     7.0%
2      52     5.4%
4      35     3.7%
3      29     3.0%

weight
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 351
Distinct (%): 88.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2970.424623
Minimum: 1613
Maximum: 5140
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 1613
5-th percentile: 1923.5
Q1: 2223.75
median: 2803.5
Q3: 3608
95-th percentile: 4464
Maximum: 5140
Range: 3527
Interquartile range (IQR): 1384.25

Descriptive statistics

Standard deviation: 846.8417742
Coefficient of variation (CV): 0.2850911508
Kurtosis: -0.7855289051
Mean: 2970.424623
Median Absolute Deviation (MAD): 637.5
Skewness: 0.5310625126
Sum: 1182229
Variance: 717140.9905
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1985                4      1.0%
2130                4      1.0%
2125                3      0.8%
2945                3      0.8%
2265                3      0.8%
2300                3      0.8%
2155                3      0.8%
2720                3      0.8%
1800                2      0.5%
2408                2      0.5%
Other values (341)  368    92.5%

Minimum values

Value  Count  Frequency (%)
1613   1      0.3%
1649   1      0.3%
1755   1      0.3%
1760   1      0.3%
1773   1      0.3%
1795   2      0.5%
1800   2      0.5%
1825   2      0.5%
1834   1      0.3%
1835   2      0.5%

Maximum values

Value  Count  Frequency (%)
5140   1      0.3%
4997   1      0.3%
4955   1      0.3%
4952   1      0.3%
4951   1      0.3%
4906   1      0.3%
4746   1      0.3%
4735   1      0.3%
4732   1      0.3%
4699   1      0.3%

acceleration
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct: 95
Distinct (%): 23.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.56809045
Minimum: 8
Maximum: 24.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 8
5-th percentile: 11.285
Q1: 13.825
median: 15.5
Q3: 17.175
95-th percentile: 20.415
Maximum: 24.8
Range: 16.8
Interquartile range (IQR): 3.35

Descriptive statistics

Standard deviation: 2.75768893
Coefficient of variation (CV): 0.1771372628
Kurtosis: 0.419496883
Mean: 15.56809045
Median Absolute Deviation (MAD): 1.7
Skewness: 0.2787768446
Sum: 6196.1
Variance: 7.604848234
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value              Count  Frequency (%)
14.5               23     5.8%
15.5               21     5.3%
14                 16     4.0%
16                 16     4.0%
13.5               15     3.8%
17                 14     3.5%
15                 14     3.5%
16.5               13     3.3%
19                 12     3.0%
13                 12     3.0%
Other values (85)  242    60.8%

Minimum values

Value  Count  Frequency (%)
8      1      0.3%
8.5    2      0.5%
9      1      0.3%
9.5    2      0.5%
10     4      1.0%
10.5   1      0.3%
11     7      1.8%
11.1   1      0.3%
11.2   1      0.3%
11.3   1      0.3%

Maximum values

Value  Count  Frequency (%)
24.8   1      0.3%
24.6   1      0.3%
23.7   1      0.3%
23.5   1      0.3%
22.2   2      0.5%
22.1   1      0.3%
21.9   1      0.3%
21.8   1      0.3%
21.7   1      0.3%
21.5   1      0.3%

model year
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 13
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76.01005025
Minimum: 70
Maximum: 82
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 70
5-th percentile: 70
Q1: 73
median: 76
Q3: 79
95-th percentile: 82
Maximum: 82
Range: 12
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.697626647
Coefficient of variation (CV): 0.04864654917
Kurtosis: -1.181231743
Mean: 76.01005025
Median Absolute Deviation (MAD): 3
Skewness: 0.01153459402
Sum: 30252
Variance: 13.67244282
Monotonicity: Increasing
Histogram with fixed size bins (bins=13)
Common values

Value             Count  Frequency (%)
73                40     10.1%
78                36     9.0%
76                34     8.5%
82                31     7.8%
75                30     7.5%
70                29     7.3%
79                29     7.3%
80                29     7.3%
81                29     7.3%
71                28     7.0%
Other values (3)  83     20.9%

Minimum values

Value  Count  Frequency (%)
70     29     7.3%
71     28     7.0%
72     28     7.0%
73     40     10.1%
74     27     6.8%
75     30     7.5%
76     34     8.5%
77     28     7.0%
78     36     9.0%
79     29     7.3%

Maximum values

Value  Count  Frequency (%)
82     31     7.8%
81     29     7.3%
80     29     7.3%
79     29     7.3%
78     36     9.0%
77     28     7.0%
76     34     8.5%
75     30     7.5%
74     27     6.8%
73     40     10.1%

origin
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value  Count
1      249
3      79
2      70

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 398
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
1      249    62.6%
3      79     19.8%
2      70     17.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
1      249    62.6%
3      79     19.8%
2      70     17.6%

Most occurring characters

Value  Count  Frequency (%)
1      249    62.6%
3      79     19.8%
2      70     17.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  398    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
1      249    62.6%
3      79     19.8%
2      70     17.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  398    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
1      249    62.6%
3      79     19.8%
2      70     17.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  398    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
1      249    62.6%
3      79     19.8%
2      70     17.6%

car name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 305
Distinct (%): 76.6%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value               Count
ford pinto          6
toyota corolla      5
amc matador         5
ford maverick       5
chevrolet chevette  4
Other values (300)  373

Length

Max length: 36
Median length: 28
Mean length: 16.09547739
Min length: 6

Characters and Unicode

Total characters: 6406
Distinct characters: 45
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 249
Unique (%): 62.6%

Sample

1st row: chevrolet chevelle malibu
2nd row: buick skylark 320
3rd row: plymouth satellite
4th row: amc rebel sst
5th row: ford torino

Common Values

Value               Count  Frequency (%)
ford pinto          6      1.5%
toyota corolla      5      1.3%
amc matador         5      1.3%
ford maverick       5      1.3%
chevrolet chevette  4      1.0%
amc gremlin         4      1.0%
chevrolet impala    4      1.0%
peugeot 504         4      1.0%
amc hornet          4      1.0%
toyota corona       4      1.0%
Other values (295)  353    88.7%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
ford                51     4.9%
chevrolet           43     4.1%
plymouth            31     3.0%
sw                  28     2.7%
amc                 28     2.7%
dodge               28     2.7%
toyota              25     2.4%
datsun              23     2.2%
custom              18     1.7%
buick               17     1.6%
Other values (305)  748    71.9%

Most occurring characters

Value              Count  Frequency (%)
(space)            642    10.0%
o                  530    8.3%
a                  501    7.8%
e                  419    6.5%
r                  390    6.1%
t                  383    6.0%
c                  352    5.5%
l                  332    5.2%
d                  267    4.2%
i                  255    4.0%
Other values (35)  2335   36.5%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   5363   83.7%
Space Separator    642    10.0%
Decimal Number     309    4.8%
Open Punctuation   36     0.6%
Close Punctuation  36     0.6%
Dash Punctuation   10     0.2%
Other Punctuation  8      0.1%
Math Symbol        2      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
o                  530    9.9%
a                  501    9.3%
e                  419    7.8%
r                  390    7.3%
t                  383    7.1%
c                  352    6.6%
l                  332    6.2%
d                  267    5.0%
i                  255    4.8%
s                  247    4.6%
Other values (16)  1687   31.5%

Decimal Number
Value  Count  Frequency (%)
0      100    32.4%
1      56     18.1%
2      48     15.5%
4      26     8.4%
5      25     8.1%
3      15     4.9%
6      13     4.2%
8      11     3.6%
9      11     3.6%
7      4      1.3%

Other Punctuation
Value  Count  Frequency (%)
/      3      37.5%
.      3      37.5%
@      1      12.5%
'      1      12.5%

Space Separator
Value    Count  Frequency (%)
(space)  642    100.0%

Open Punctuation
Value  Count  Frequency (%)
(      36     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      36     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      10     100.0%

Math Symbol
Value  Count  Frequency (%)
+      2      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   5363   83.7%
Common  1043   16.3%

Most frequent character per script

Latin
Value              Count  Frequency (%)
o                  530    9.9%
a                  501    9.3%
e                  419    7.8%
r                  390    7.3%
t                  383    7.1%
c                  352    6.6%
l                  332    6.2%
d                  267    5.0%
i                  255    4.8%
s                  247    4.6%
Other values (16)  1687   31.5%

Common
Value             Count  Frequency (%)
(space)           642    61.6%
0                 100    9.6%
1                 56     5.4%
2                 48     4.6%
(                 36     3.5%
)                 36     3.5%
4                 26     2.5%
5                 25     2.4%
3                 15     1.4%
6                 13     1.2%
Other values (9)  46     4.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  6406   100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            642    10.0%
o                  530    8.3%
a                  501    7.8%
e                  419    6.5%
r                  390    6.1%
t                  383    6.0%
c                  352    5.5%
l                  332    5.2%
d                  267    4.2%
i                  255    4.0%
Other values (35)  2335   36.5%

Interactions

[25 interaction plots; images not preserved in this text export]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
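
The calculation described above can be sketched in plain Python. This is an illustration of the definition, not the code pandas-profiling runs. The function names are hypothetical, tied values are given the average of their ranks (one common convention, assumed here), and the data are made up to show a nonlinear but monotonic relationship.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def covariance_over_stds(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    std_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Apply the covariance/std formula to the rank variables of x and y."""
    return covariance_over_stds(average_ranks(x), average_ranks(y))


# Illustrative data: y = x**3 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r_raw = covariance_over_stds(x, y)
print(rho, r_raw)
```

On this example ρ reaches 1 (up to floating-point error) because the ranks agree exactly, while the same formula applied to the raw values stays below 1.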

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
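
As a minimal sketch of that formula (not the report's own implementation), the function below computes r directly and then checks the invariance property on a shifted and rescaled copy of one variable. The function name is hypothetical; the weight and mpg values are the first five sample rows of this report, and the shift and scale constants are arbitrary.

```python
def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    std_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (std_x * std_y)


# weight and mpg from the first five sample rows of this report
weight = [3504, 3693, 3436, 3433, 3449]
mpg = [18.0, 15.0, 18.0, 16.0, 17.0]
r = pearson_r(weight, mpg)

# Shifting one variable and rescaling it by a positive factor leaves r unchanged.
rescaled = [2.0 * w + 10.0 for w in weight]
r_rescaled = pearson_r(rescaled, mpg)
print(r, r_rescaled)
```

Five rows are far too few to say anything about the dataset; the point is only that `r` and `r_rescaled` agree.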

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
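
The pair-counting definition above translates almost directly into code. The sketch below follows that sentence literally, which corresponds to the variant usually called τ-a; statistical libraries often default to a tie-adjusted variant (τ-b), so values can differ when the data contain ties. The function name and the data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # pairs tied in x or y count as neither
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Illustrative data: one swapped pair (3 and 2) among six pairs in total.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.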

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
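
A minimal sketch of a bias-corrected Cramér's V, written from the published correction rather than taken from pandas-profiling's source. With r and k the numbers of distinct categories and n the sample size, it subtracts (k-1)(r-1)/(n-1) from φ² = χ²/n and shrinks r and k accordingly. The function name and the example data are illustrative.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Illustrative data: y is fully determined by x in the first case, unrelated in the second.
perfect = cramers_v_corrected(["a", "b"] * 10, ["u", "v"] * 10)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 5, ["u", "v", "u", "v"] * 5)
print(perfect, independent)
```

The first call returns a value of 1 (up to rounding) and the second returns 0, matching the two ends of the range described above.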

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     mpg   cylinders  displacement  horsepower  weight  acceleration  model year  origin  car name
0    18.0  8          307.0         130         3504    12.0          70          1       chevrolet chevelle malibu
1    15.0  8          350.0         165         3693    11.5          70          1       buick skylark 320
2    18.0  8          318.0         150         3436    11.0          70          1       plymouth satellite
3    16.0  8          304.0         150         3433    12.0          70          1       amc rebel sst
4    17.0  8          302.0         140         3449    10.5          70          1       ford torino
5    15.0  8          429.0         198         4341    10.0          70          1       ford galaxie 500
6    14.0  8          454.0         220         4354    9.0           70          1       chevrolet impala
7    14.0  8          440.0         215         4312    8.5           70          1       plymouth fury iii
8    14.0  8          455.0         225         4425    10.0          70          1       pontiac catalina
9    15.0  8          390.0         190         3850    8.5           70          1       amc ambassador dpl

Last rows

     mpg   cylinders  displacement  horsepower  weight  acceleration  model year  origin  car name
388  26.0  4          156.0         92          2585    14.5          82          1       chrysler lebaron medallion
389  22.0  6          232.0         112         2835    14.7          82          1       ford granada l
390  32.0  4          144.0         96          2665    13.9          82          3       toyota celica gt
391  36.0  4          135.0         84          2370    13.0          82          1       dodge charger 2.2
392  27.0  4          151.0         90          2950    17.3          82          1       chevrolet camaro
393  27.0  4          140.0         86          2790    15.6          82          1       ford mustang gl
394  44.0  4          97.0          52          2130    24.6          82          2       vw pickup
395  32.0  4          135.0         84          2295    11.6          82          1       dodge rampage
396  28.0  4          120.0         79          2625    18.6          82          1       ford ranger
397  31.0  4          119.0         82          2720    19.4          82          1       chevy s-10