Overview

Dataset statistics

Number of variables: 5
Number of observations: 2604
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 101.8 KiB
Average record size in memory: 40.0 B
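The overview counts can be reproduced on any in-memory table. The sketch below is a minimal illustration that assumes the rows are a list of dicts with identical keys, that `None` marks a missing cell, and that a duplicate row is any row identical to an earlier one (the same convention as pandas' `DataFrame.duplicated` with its default `keep="first"`). The helper name `dataset_statistics` is hypothetical and not part of pandas-profiling.

```python
from typing import Any


def dataset_statistics(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Overview counts for a table held as a list of dicts (hypothetical helper)."""
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    # Assumption: None marks a missing cell.
    missing = sum(value is None for row in rows for value in row.values())
    # A row is a duplicate if an identical row appeared earlier;
    # the first occurrence itself is not counted.
    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {
        "variables": n_vars,
        "observations": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100.0 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100.0 * duplicates / n_obs if n_obs else 0.0,
    }


# Two rows taken from the "First rows" sample of this report.
sample = [
    {"refjunction": "urn:ngsi-ld:Junction:54201", "date": "2022-02-25",
     "hour": 18, "avg_co2": 11.31, "avg_numvehicles": 29.0},
    {"refjunction": "urn:ngsi-ld:Junction:54201", "date": "2022-02-25",
     "hour": 19, "avg_co2": 10.4, "avg_numvehicles": 44.454545},
]
print(dataset_statistics(sample))
```

Hashing each row as a sorted tuple of its items keeps the duplicate check linear in the number of rows, at the cost of requiring hashable cell values.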

Variable types

Categorical: 2
Numeric: 3

Alerts

refjunction has constant value "urn:ngsi-ld:Junction:54201" [Constant]
date has a high cardinality: 110 distinct values [High cardinality]
avg_co2 is highly correlated with avg_numvehicles [High correlation]
avg_numvehicles is highly correlated with avg_co2 [High correlation]
hour is highly correlated with avg_co2 and 1 other field [High correlation]
avg_co2 is highly correlated with hour and 1 other field [High correlation]
avg_numvehicles is highly correlated with hour and 1 other field [High correlation]
date is uniformly distributed [Uniform]
hour has 108 (4.1%) zeros [Zeros]

Reproduction

Analysis started: 2022-06-14 11:18:25.529793
Analysis finished: 2022-06-14 11:18:31.470588
Duration: 5.94 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

refjunction
Categorical

CONSTANT
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 20.5 KiB
urn:ngsi-ld:Junction:54201 | 2604

Length

Max length: 26
Median length: 26
Mean length: 26
Min length: 26

Characters and Unicode

Total characters: 67704
Distinct characters: 19
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
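As a rough illustration of the category breakdown, Python's standard `unicodedata.category` returns the two-letter general-category code of each character. The sketch assumes the report's category names correspond to the standard Unicode general categories (Ll, Lu, Nd, Po, Pd); the helper `category_counts` and the `CATEGORY_NAMES` mapping are hypothetical. The standard library exposes general categories but not scripts or blocks, so only the category view is sketched.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this column
# (assumed to match the names used in the report).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values: list[str]) -> Counter:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # e.g. "Ll" for "u", "Po" for ":"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# One refjunction value: 16 lowercase letters, 5 digits, 3 colons,
# 1 hyphen and 1 uppercase "J".
print(category_counts(["urn:ngsi-ld:Junction:54201"]))
```

Scaling the per-value counts by the 2604 rows gives 41664, 13020, 7812, 2604 and 2604, the figures listed under "Most occurring categories".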

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: urn:ngsi-ld:Junction:54201
2nd row: urn:ngsi-ld:Junction:54201
3rd row: urn:ngsi-ld:Junction:54201
4th row: urn:ngsi-ld:Junction:54201
5th row: urn:ngsi-ld:Junction:54201

Common Values

Value | Count | Frequency (%)
urn:ngsi-ld:Junction:54201 | 2604 | 100.0%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

Value | Count | Frequency (%)
urn:ngsi-ld:junction:54201 | 2604 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
n | 10416 | 15.4%
: | 7812 | 11.5%
u | 5208 | 7.7%
i | 5208 | 7.7%
c | 2604 | 3.8%
0 | 2604 | 3.8%
2 | 2604 | 3.8%
4 | 2604 | 3.8%
5 | 2604 | 3.8%
o | 2604 | 3.8%
Other values (9) | 23436 | 34.6%
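Character frequencies like these come from counting every character across all values of the column and dividing by the total character count. A minimal sketch with `collections.Counter` follows; the helper `top_characters` is hypothetical, and the report does not state how it orders tied counts (`Counter.most_common` keeps first-seen order among ties).

```python
from collections import Counter


def top_characters(values: list[str], k: int = 10) -> list[tuple[str, int, float]]:
    """Return the k most common characters with their count and share in percent."""
    counts = Counter(ch for value in values for ch in value)
    total = sum(counts.values())
    # most_common() keeps first-seen order among equal counts.
    return [(ch, n, round(100.0 * n / total, 1)) for ch, n in counts.most_common(k)]


column = ["urn:ngsi-ld:Junction:54201"] * 2604  # the constant refjunction column
for ch, n, pct in top_characters(column, k=4):
    print(f"{ch} | {n} | {pct}%")
```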

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 41664 | 61.5%
Decimal Number | 13020 | 19.2%
Other Punctuation | 7812 | 11.5%
Uppercase Letter | 2604 | 3.8%
Dash Punctuation | 2604 | 3.8%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 10416 | 25.0%
u | 5208 | 12.5%
i | 5208 | 12.5%
c | 2604 | 6.2%
o | 2604 | 6.2%
t | 2604 | 6.2%
d | 2604 | 6.2%
r | 2604 | 6.2%
l | 2604 | 6.2%
s | 2604 | 6.2%

Decimal Number
Value | Count | Frequency (%)
0 | 2604 | 20.0%
2 | 2604 | 20.0%
4 | 2604 | 20.0%
5 | 2604 | 20.0%
1 | 2604 | 20.0%

Other Punctuation
Value | Count | Frequency (%)
: | 7812 | 100.0%

Uppercase Letter
Value | Count | Frequency (%)
J | 2604 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 2604 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 44268 | 65.4%
Common | 23436 | 34.6%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 10416 | 23.5%
u | 5208 | 11.8%
i | 5208 | 11.8%
c | 2604 | 5.9%
o | 2604 | 5.9%
t | 2604 | 5.9%
d | 2604 | 5.9%
J | 2604 | 5.9%
r | 2604 | 5.9%
l | 2604 | 5.9%
Other values (2) | 5208 | 11.8%

Common
Value | Count | Frequency (%)
: | 7812 | 33.3%
0 | 2604 | 11.1%
2 | 2604 | 11.1%
4 | 2604 | 11.1%
5 | 2604 | 11.1%
- | 2604 | 11.1%
1 | 2604 | 11.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 67704 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 10416 | 15.4%
: | 7812 | 11.5%
u | 5208 | 7.7%
i | 5208 | 7.7%
c | 2604 | 3.8%
0 | 2604 | 3.8%
2 | 2604 | 3.8%
4 | 2604 | 3.8%
5 | 2604 | 3.8%
o | 2604 | 3.8%
Other values (9) | 23436 | 34.6%

date
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 110
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Memory size: 20.5 KiB
2022-04-21 | 24
2022-05-17 | 24
2022-05-15 | 24
2022-05-14 | 24
2022-05-13 | 24
Other values (105) | 2484

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 26040
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2022-02-25
2nd row: 2022-02-25
3rd row: 2022-02-25
4th row: 2022-02-25
5th row: 2022-02-25

Common Values

Value | Count | Frequency (%)
2022-04-21 | 24 | 0.9%
2022-05-17 | 24 | 0.9%
2022-05-15 | 24 | 0.9%
2022-05-14 | 24 | 0.9%
2022-05-13 | 24 | 0.9%
2022-05-12 | 24 | 0.9%
2022-05-11 | 24 | 0.9%
2022-05-10 | 24 | 0.9%
2022-05-09 | 24 | 0.9%
2022-05-08 | 24 | 0.9%
Other values (100) | 2364 | 90.8%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
2022-04-21 | 24 | 0.9%
2022-04-20 | 24 | 0.9%
2022-03-02 | 24 | 0.9%
2022-03-03 | 24 | 0.9%
2022-03-04 | 24 | 0.9%
2022-03-05 | 24 | 0.9%
2022-03-06 | 24 | 0.9%
2022-03-07 | 24 | 0.9%
2022-03-08 | 24 | 0.9%
2022-03-09 | 24 | 0.9%
Other values (100) | 2364 | 90.8%

Most occurring characters

Value | Count | Frequency (%)
2 | 8938 | 34.3%
0 | 6308 | 24.2%
- | 5208 | 20.0%
1 | 1141 | 4.4%
3 | 1124 | 4.3%
5 | 990 | 3.8%
4 | 977 | 3.8%
6 | 590 | 2.3%
7 | 262 | 1.0%
8 | 262 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 20832 | 80.0%
Dash Punctuation | 5208 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 8938 | 42.9%
0 | 6308 | 30.3%
1 | 1141 | 5.5%
3 | 1124 | 5.4%
5 | 990 | 4.8%
4 | 977 | 4.7%
6 | 590 | 2.8%
7 | 262 | 1.3%
8 | 262 | 1.3%
9 | 240 | 1.2%

Dash Punctuation
Value | Count | Frequency (%)
- | 5208 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 26040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 8938 | 34.3%
0 | 6308 | 24.2%
- | 5208 | 20.0%
1 | 1141 | 4.4%
3 | 1124 | 4.3%
5 | 990 | 3.8%
4 | 977 | 3.8%
6 | 590 | 2.3%
7 | 262 | 1.0%
8 | 262 | 1.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 26040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 8938 | 34.3%
0 | 6308 | 24.2%
- | 5208 | 20.0%
1 | 1141 | 4.4%
3 | 1124 | 4.3%
5 | 990 | 3.8%
4 | 977 | 3.8%
6 | 590 | 2.3%
7 | 262 | 1.0%
8 | 262 | 1.0%

hour
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 24
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.50384025
Minimum: 0
Maximum: 23
Zeros: 108
Zeros (%): 4.1%
Negative: 0
Negative (%): 0.0%
Memory size: 20.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 6
Median: 12
Q3: 17
95th percentile: 22
Maximum: 23
Range: 23
Interquartile range (IQR): 11
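A sketch of how the quantile block can be computed, assuming linear interpolation between order statistics (NumPy's default percentile method, which pandas' `Series.quantile` also uses by default). Python's `statistics.quantiles` with `method="inclusive"` interpolates the same way; the helper `quantile_summary` is hypothetical.

```python
import statistics


def quantile_summary(data: list[float]) -> dict[str, float]:
    """Quantile statistics using linear interpolation between order statistics."""
    # method="inclusive" treats min(data) and max(data) as the 0th and
    # 100th percentiles, like NumPy's default "linear" method.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    vigintiles = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "minimum": min(data),
        "5th_percentile": vigintiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "95th_percentile": vigintiles[-1],
        "maximum": max(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }


# Toy input: every hour 0..23 exactly once.
print(quantile_summary(list(range(24))))
```

On this toy input the quartiles land between integers (5.75, 11.5, 17.25). The real column has 108 or 109 rows per hour, so its interpolation points fall on repeated values, which is why the report shows whole numbers.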

Descriptive statistics

Standard deviation: 6.919019026
Coefficient of variation (CV): 0.6014529825
Kurtosis: -1.202047064
Mean: 11.50384025
Median Absolute Deviation (MAD): 6
Skewness: -0.0008799774518
Sum: 29956
Variance: 47.87282428
Monotonicity: Not monotonic
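Several of these figures are tied together by simple identities: the variance is the square of the standard deviation, the CV is the standard deviation divided by the mean, and the MAD is the median of the absolute deviations from the median. A minimal sketch, assuming the sample (n - 1) standard deviation that pandas uses by default; `dispersion_summary` is a hypothetical helper.

```python
import statistics


def dispersion_summary(data: list[float]) -> dict[str, float]:
    """Spread statistics as labelled in the report (hypothetical helper)."""
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    median = statistics.median(data)
    return {
        "mean": mean,
        "standard_deviation": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation
        # Median of the absolute deviations from the median.
        "mad": statistics.median([abs(x - median) for x in data]),
    }


# Consistency check against the reported hour statistics.
std, mean = 6.919019026, 11.50384025
print(round(std / mean, 6), round(std ** 2, 4))  # compare with CV and Variance above
```

Kurtosis and skewness are omitted because their bias corrections vary between libraries.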
[Figure: Histogram with fixed size bins (bins=24)]

Common Values
Value | Count | Frequency (%)
16 | 109 | 4.2%
21 | 109 | 4.2%
22 | 109 | 4.2%
15 | 109 | 4.2%
14 | 109 | 4.2%
13 | 109 | 4.2%
2 | 109 | 4.2%
3 | 109 | 4.2%
12 | 109 | 4.2%
11 | 109 | 4.2%
Other values (14) | 1514 | 58.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 108 | 4.1%
1 | 108 | 4.1%
2 | 109 | 4.2%
3 | 109 | 4.2%
4 | 108 | 4.1%
5 | 108 | 4.1%
6 | 108 | 4.1%
7 | 108 | 4.1%
8 | 108 | 4.1%
9 | 109 | 4.2%

Maximum 10 values
Value | Count | Frequency (%)
23 | 108 | 4.1%
22 | 109 | 4.2%
21 | 109 | 4.2%
20 | 108 | 4.1%
19 | 108 | 4.1%
18 | 108 | 4.1%
17 | 108 | 4.1%
16 | 109 | 4.2%
15 | 109 | 4.2%
14 | 109 | 4.2%

avg_co2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1862
Distinct (%): 71.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.02014226
Minimum: 1.451666667
Maximum: 27.612
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 20.5 KiB

Quantile statistics

Minimum: 1.451666667
5th percentile: 1.847944196
Q1: 9.178928571
Median: 9.663333333
Q3: 10.062
95th percentile: 22.894885
Maximum: 27.612
Range: 26.16033333
Interquartile range (IQR): 0.8830714286

Descriptive statistics

Standard deviation: 5.612256754
Coefficient of variation (CV): 0.5600975121
Kurtosis: 1.124355843
Mean: 10.02014226
Median Absolute Deviation (MAD): 0.4357738095
Skewness: 0.9663222026
Sum: 26092.45046
Variance: 31.49742587
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values
Value | Count | Frequency (%)
9.75 | 16 | 0.6%
9.62 | 11 | 0.4%
9.88 | 11 | 0.4%
1.95 | 10 | 0.4%
9.555 | 9 | 0.3%
1.82 | 9 | 0.3%
10.01 | 8 | 0.3%
1.86 | 7 | 0.3%
1.885 | 7 | 0.3%
9.9775 | 7 | 0.3%
Other values (1852) | 2509 | 96.4%

Minimum 10 values
Value | Count | Frequency (%)
1.451666667 | 1 | < 0.1%
1.456 | 1 | < 0.1%
1.4625 | 1 | < 0.1%
1.603333333 | 1 | < 0.1%
1.69 | 1 | < 0.1%
1.703 | 1 | < 0.1%
1.7056 | 1 | < 0.1%
1.711666667 | 1 | < 0.1%
1.713636364 | 1 | < 0.1%
1.718888889 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
27.612 | 1 | < 0.1%
26.70571429 | 1 | < 0.1%
26.585 | 1 | < 0.1%
25.545 | 1 | < 0.1%
25.21071429 | 1 | < 0.1%
25.09 | 1 | < 0.1%
24.9275 | 1 | < 0.1%
24.78 | 1 | < 0.1%
24.479 | 1 | < 0.1%
24.466 | 1 | < 0.1%

avg_numvehicles
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 2100
Distinct (%): 80.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.2049626
Minimum: 6.4
Maximum: 117.0769231
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 20.5 KiB

Quantile statistics

Minimum: 6.4
5th percentile: 8.468080645
Q1: 43.2861991
Median: 44.65625
Q3: 45.87931034
95th percentile: 104.8432692
Maximum: 117.0769231
Range: 110.6769231
Interquartile range (IQR): 2.59311125

Descriptive statistics

Standard deviation: 25.76278655
Coefficient of variation (CV): 0.5575761802
Kurtosis: 1.064287342
Mean: 46.2049626
Median Absolute Deviation (MAD): 1.28587963
Skewness: 0.9352263849
Sum: 120317.7226
Variance: 663.7211708
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Common Values
Value | Count | Frequency (%)
44.75 | 15 | 0.6%
8.5 | 12 | 0.5%
44.25 | 9 | 0.3%
44.5 | 7 | 0.3%
8.75 | 7 | 0.3%
45.25 | 7 | 0.3%
45 | 6 | 0.2%
45.75 | 6 | 0.2%
44 | 6 | 0.2%
45.125 | 6 | 0.2%
Other values (2090) | 2523 | 96.9%

Minimum 10 values
Value | Count | Frequency (%)
6.4 | 1 | < 0.1%
6.848484848 | 1 | < 0.1%
7 | 1 | < 0.1%
7.346153846 | 1 | < 0.1%
7.541666667 | 2 | 0.1%
7.595744681 | 1 | < 0.1%
7.625 | 1 | < 0.1%
7.878787879 | 1 | < 0.1%
7.9 | 1 | < 0.1%
7.90625 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
117.0769231 | 1 | < 0.1%
116.59375 | 1 | < 0.1%
111.55 | 1 | < 0.1%
111 | 1 | < 0.1%
110.78125 | 1 | < 0.1%
110.5694444 | 1 | < 0.1%
110.22 | 1 | < 0.1%
109.8541667 | 1 | < 0.1%
109.3365385 | 1 | < 0.1%
109.3125 | 1 | < 0.1%

Interactions

[Figure: pairwise interaction plots (9 panels)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the angle of the line to the x-axis does not affect r, apart from its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative correlation, 0 indicates no correlation, and +1 indicates perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2 for n observations.
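A direct O(n²) transcription of that definition follows. This is the tau-a variant: pairs tied in X or Y count as neither concordant nor discordant but still sit in the denominator. pandas' `DataFrame.corr(method="kendall")` delegates to SciPy's `kendalltau`, whose default tau-b variant corrects the denominator for ties, so if the report computes τ through pandas, its values for heavily tied columns such as `hour` may differ from this sketch. `kendall_tau_a` is a hypothetical helper name.

```python
from itertools import combinations


def kendall_tau_a(x: list[float], y: list[float]) -> float:
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0: the pair is tied in x or y and counts as neither.
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant out of 6 pairs
```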

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependencies, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik package has extensive documentation.

Missing values

[Figure: nullity by column]
A simple visualization of nullity by column.
[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Sample

First rows

index | refjunction | date | hour | avg_co2 | avg_numvehicles
0 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 18 | 11.310000 | 29.000000
1 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 19 | 10.400000 | 44.454545
2 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 20 | 11.635000 | 55.000000
3 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 21 | 26.705714 | 103.000000
4 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 22 | 24.479000 | 116.593750
5 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 23 | 9.663333 | 28.937500
6 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 0 | 10.920000 | 46.375000
7 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 1 | 9.139000 | 43.974359
8 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 2 | 9.378571 | 42.733333
9 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 3 | 9.100000 | 45.619048

Last rows

index | refjunction | date | hour | avg_co2 | avg_numvehicles
2594 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 7 | 1.920645 | 8.467742
2595 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 8 | 9.650333 | 44.000000
2596 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 9 | 9.733750 | 45.375000
2597 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 10 | 9.436207 | 42.991379
2598 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 11 | 22.397742 | 104.596774
2599 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 12 | 10.317667 | 46.025000
2600 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 13 | 9.529722 | 44.868056
2601 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 14 | 9.840606 | 45.175182
2602 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 15 | 9.746176 | 42.530303
2603 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 16 | 10.382667 | 41.183333