Dataset statistics
Number of variables | 5 |
---|---|
Number of observations | 2604 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 101.8 KiB |
Average record size in memory | 40.0 B |
Variable types
Categorical | 2 |
---|---|
Numeric | 3 |
refjunction has constant value "urn:ngsi-ld:Junction:54201" | Constant |
date has a high cardinality: 110 distinct values | High cardinality |
avg_co2 is highly correlated with avg_numvehicles | High correlation |
avg_numvehicles is highly correlated with avg_co2 | High correlation |
hour is highly correlated with avg_co2 and 1 other fields | High correlation |
avg_co2 is highly correlated with hour and 1 other fields | High correlation |
avg_numvehicles is highly correlated with hour and 1 other fields | High correlation |
date is uniformly distributed | Uniform |
hour has 108 (4.1%) zeros | Zeros |
Reproduction
Analysis started | 2022-06-14 11:18:25.529793 |
---|---|
Analysis finished | 2022-06-14 11:18:31.470588 |
Duration | 5.94 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
refjunction
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 20.5 KiB |
urn:ngsi-ld:Junction:54201 |
---|
Length
Max length | 26 |
---|---|
Median length | 26 |
Mean length | 26 |
Min length | 26 |
Characters and Unicode
Total characters | 67704 |
---|---|
Distinct characters | 19 |
Distinct categories | 5 |
Distinct scripts | 2 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
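The character counts in this section can be reproduced with Python's standard library. The snippet below is a minimal sketch, not the profiler's own implementation: it takes the single constant value of refjunction and tallies its characters and their Unicode general categories with `unicodedata.category`. Script and block counts are not covered, because the standard library does not expose those properties.

```python
import unicodedata
from collections import Counter

# The constant value reported for refjunction in this report.
value = "urn:ngsi-ld:Junction:54201"

# Tally individual characters and their Unicode general categories
# ("Ll" = Lowercase Letter, "Lu" = Uppercase Letter, "Nd" = Decimal Number,
#  "Po" = Other Punctuation, "Pd" = Dash Punctuation).
char_counts = Counter(value)
category_counts = Counter(unicodedata.category(ch) for ch in value)

print(len(value))             # string length, 26
print(len(char_counts))       # distinct characters, 19
print(len(category_counts))   # distinct categories, 5
print(dict(category_counts))  # per-category counts for one row
```

Because every one of the 2604 rows holds the same value, multiplying the per-row counts by 2604 gives the totals in the tables, for example 26 × 2604 = 67704 total characters and 4 × 2604 = 10416 occurrences of "n".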
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | urn:ngsi-ld:Junction:54201 |
---|---|
2nd row | urn:ngsi-ld:Junction:54201 |
3rd row | urn:ngsi-ld:Junction:54201 |
4th row | urn:ngsi-ld:Junction:54201 |
5th row | urn:ngsi-ld:Junction:54201 |
Common Values
Value | Count | Frequency (%) |
urn:ngsi-ld:Junction:54201 | 2604 |
Length
Histogram of lengths of the category
Category Frequency Plot
Value | Count | Frequency (%) |
urn:ngsi-ld:junction:54201 | 2604 |
Most occurring characters
Value | Count | Frequency (%) |
n | 10416 | |
: | 7812 | 11.5% |
u | 5208 | 7.7% |
i | 5208 | 7.7% |
c | 2604 | 3.8% |
0 | 2604 | 3.8% |
2 | 2604 | 3.8% |
4 | 2604 | 3.8% |
5 | 2604 | 3.8% |
o | 2604 | 3.8% |
Other values (9) | 23436 |
Most occurring categories
Value | Count | Frequency (%) |
Lowercase Letter | 41664 | |
Decimal Number | 13020 | 19.2% |
Other Punctuation | 7812 | 11.5% |
Uppercase Letter | 2604 | 3.8% |
Dash Punctuation | 2604 | 3.8% |
Most frequent character per category
Lowercase Letter
Value | Count | Frequency (%) |
n | 10416 | |
u | 5208 | |
i | 5208 | |
c | 2604 | 6.2% |
o | 2604 | 6.2% |
t | 2604 | 6.2% |
d | 2604 | 6.2% |
r | 2604 | 6.2% |
l | 2604 | 6.2% |
s | 2604 | 6.2% |
Decimal Number
Value | Count | Frequency (%) |
0 | 2604 | |
2 | 2604 | |
4 | 2604 | |
5 | 2604 | |
1 | 2604 |
Other Punctuation
Value | Count | Frequency (%) |
: | 7812 |
Uppercase Letter
Value | Count | Frequency (%) |
J | 2604 |
Dash Punctuation
Value | Count | Frequency (%) |
- | 2604 |
Most occurring scripts
Value | Count | Frequency (%) |
Latin | 44268 | |
Common | 23436 |
Most frequent character per script
Latin
Value | Count | Frequency (%) |
n | 10416 | |
u | 5208 | |
i | 5208 | |
c | 2604 | 5.9% |
o | 2604 | 5.9% |
t | 2604 | 5.9% |
d | 2604 | 5.9% |
J | 2604 | 5.9% |
r | 2604 | 5.9% |
l | 2604 | 5.9% |
Other values (2) | 5208 |
Common
Value | Count | Frequency (%) |
: | 7812 | |
0 | 2604 | 11.1% |
2 | 2604 | 11.1% |
4 | 2604 | 11.1% |
5 | 2604 | 11.1% |
- | 2604 | 11.1% |
1 | 2604 | 11.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 67704 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
n | 10416 | |
: | 7812 | 11.5% |
u | 5208 | 7.7% |
i | 5208 | 7.7% |
c | 2604 | 3.8% |
0 | 2604 | 3.8% |
2 | 2604 | 3.8% |
4 | 2604 | 3.8% |
5 | 2604 | 3.8% |
o | 2604 | 3.8% |
Other values (9) | 23436 |
date
Distinct | 110 |
---|---|
Distinct (%) | 4.2% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 20.5 KiB |
2022-04-21 | 24 |
---|---|
2022-05-17 | 24 |
2022-05-15 | 24 |
2022-05-14 | 24 |
2022-05-13 | 24 |
Other values (105) |
Length
Max length | 10 |
---|---|
Median length | 10 |
Mean length | 10 |
Min length | 10 |
Characters and Unicode
Total characters | 26040 |
---|---|
Distinct characters | 11 |
Distinct categories | 2 |
Distinct scripts | 1 |
Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
Unique | 0 |
---|---|
Unique (%) | 0.0% |
Sample
1st row | 2022-02-25 |
---|---|
2nd row | 2022-02-25 |
3rd row | 2022-02-25 |
4th row | 2022-02-25 |
5th row | 2022-02-25 |
Common Values
Value | Count | Frequency (%) |
2022-04-21 | 24 | 0.9% |
2022-05-17 | 24 | 0.9% |
2022-05-15 | 24 | 0.9% |
2022-05-14 | 24 | 0.9% |
2022-05-13 | 24 | 0.9% |
2022-05-12 | 24 | 0.9% |
2022-05-11 | 24 | 0.9% |
2022-05-10 | 24 | 0.9% |
2022-05-09 | 24 | 0.9% |
2022-05-08 | 24 | 0.9% |
Other values (100) | 2364 |
Length
Histogram of lengths of the category
Value | Count | Frequency (%) |
2022-04-21 | 24 | 0.9% |
2022-04-20 | 24 | 0.9% |
2022-03-02 | 24 | 0.9% |
2022-03-03 | 24 | 0.9% |
2022-03-04 | 24 | 0.9% |
2022-03-05 | 24 | 0.9% |
2022-03-06 | 24 | 0.9% |
2022-03-07 | 24 | 0.9% |
2022-03-08 | 24 | 0.9% |
2022-03-09 | 24 | 0.9% |
Other values (100) | 2364 |
Most occurring characters
Value | Count | Frequency (%) |
2 | 8938 | |
0 | 6308 | |
- | 5208 | |
1 | 1141 | 4.4% |
3 | 1124 | 4.3% |
5 | 990 | 3.8% |
4 | 977 | 3.8% |
6 | 590 | 2.3% |
7 | 262 | 1.0% |
8 | 262 | 1.0% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 20832 | |
Dash Punctuation | 5208 | 20.0% |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
2 | 8938 | |
0 | 6308 | |
1 | 1141 | 5.5% |
3 | 1124 | 5.4% |
5 | 990 | 4.8% |
4 | 977 | 4.7% |
6 | 590 | 2.8% |
7 | 262 | 1.3% |
8 | 262 | 1.3% |
9 | 240 | 1.2% |
Dash Punctuation
Value | Count | Frequency (%) |
- | 5208 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 26040 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
2 | 8938 | |
0 | 6308 | |
- | 5208 | |
1 | 1141 | 4.4% |
3 | 1124 | 4.3% |
5 | 990 | 3.8% |
4 | 977 | 3.8% |
6 | 590 | 2.3% |
7 | 262 | 1.0% |
8 | 262 | 1.0% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 26040 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
2 | 8938 | |
0 | 6308 | |
- | 5208 | |
1 | 1141 | 4.4% |
3 | 1124 | 4.3% |
5 | 990 | 3.8% |
4 | 977 | 3.8% |
6 | 590 | 2.3% |
7 | 262 | 1.0% |
8 | 262 | 1.0% |
hour
Distinct | 24 |
---|---|
Distinct (%) | 0.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 11.50384025 |
Minimum | 0 |
---|---|
Maximum | 23 |
Zeros | 108 |
Zeros (%) | 4.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 20.5 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 6 |
median | 12 |
Q3 | 17 |
95-th percentile | 22 |
Maximum | 23 |
Range | 23 |
Interquartile range (IQR) | 11 |
Descriptive statistics
Standard deviation | 6.919019026 |
---|---|
Coefficient of variation (CV) | 0.6014529825 |
Kurtosis | -1.202047064 |
Mean | 11.50384025 |
Median Absolute Deviation (MAD) | 6 |
Skewness | -0.0008799774518 |
Sum | 29956 |
Variance | 47.87282428 |
Monotonicity | Not monotonic |
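Several of the statistics reported for hour are derived from others, so they can be cross-checked by hand. The sketch below uses only numbers copied from the tables above. The variable names are illustrative, and the formulas are the standard definitions, assumed here to match what the profiler computes.

```python
import math

# Values copied from the hour tables in this report.
n_obs = 2604
total = 29956          # Sum
std = 6.919019026      # Standard deviation
q1, q3 = 6, 17         # Q1 and Q3
minimum, maximum = 0, 23
zeros = 108

mean = total / n_obs             # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum
zeros_pct = 100 * zeros / n_obs  # Zeros (%)

# Compare with the reported figures, allowing for rounding in the report.
print(math.isclose(mean, 11.50384025, rel_tol=1e-6))
print(math.isclose(cv, 0.6014529825, rel_tol=1e-6))
print(math.isclose(variance, 47.87282428, rel_tol=1e-6))
print(iqr, value_range, round(zeros_pct, 1))
```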
Histogram with fixed size bins (bins=24)
Value | Count | Frequency (%) |
16 | 109 | 4.2% |
21 | 109 | 4.2% |
22 | 109 | 4.2% |
15 | 109 | 4.2% |
14 | 109 | 4.2% |
13 | 109 | 4.2% |
2 | 109 | 4.2% |
3 | 109 | 4.2% |
12 | 109 | 4.2% |
11 | 109 | 4.2% |
Other values (14) | 1514 |
Value | Count | Frequency (%) |
0 | 108 | |
1 | 108 | |
2 | 109 | |
3 | 109 | |
4 | 108 | |
5 | 108 | |
6 | 108 | |
7 | 108 | |
8 | 108 | |
9 | 109 |
Value | Count | Frequency (%) |
23 | 108 | |
22 | 109 | |
21 | 109 | |
20 | 108 | |
19 | 108 | |
18 | 108 | |
17 | 108 | |
16 | 109 | |
15 | 109 | |
14 | 109 |
avg_co2
Distinct | 1862 |
---|---|
Distinct (%) | 71.5% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 10.02014226 |
Minimum | 1.451666667 |
---|---|
Maximum | 27.612 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 20.5 KiB |
Quantile statistics
Minimum | 1.451666667 |
---|---|
5-th percentile | 1.847944196 |
Q1 | 9.178928571 |
median | 9.663333333 |
Q3 | 10.062 |
95-th percentile | 22.894885 |
Maximum | 27.612 |
Range | 26.16033333 |
Interquartile range (IQR) | 0.8830714286 |
Descriptive statistics
Standard deviation | 5.612256754 |
---|---|
Coefficient of variation (CV) | 0.5600975121 |
Kurtosis | 1.124355843 |
Mean | 10.02014226 |
Median Absolute Deviation (MAD) | 0.4357738095 |
Skewness | 0.9663222026 |
Sum | 26092.45046 |
Variance | 31.49742587 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
9.75 | 16 | 0.6% |
9.62 | 11 | 0.4% |
9.88 | 11 | 0.4% |
1.95 | 10 | 0.4% |
9.555 | 9 | 0.3% |
1.82 | 9 | 0.3% |
10.01 | 8 | 0.3% |
1.86 | 7 | 0.3% |
1.885 | 7 | 0.3% |
9.9775 | 7 | 0.3% |
Other values (1852) | 2509 |
Value | Count | Frequency (%) |
1.451666667 | 1 | |
1.456 | 1 | |
1.4625 | 1 | |
1.603333333 | 1 | |
1.69 | 1 | |
1.703 | 1 | |
1.7056 | 1 | |
1.711666667 | 1 | |
1.713636364 | 1 | |
1.718888889 | 1 |
Value | Count | Frequency (%) |
27.612 | 1 | |
26.70571429 | 1 | |
26.585 | 1 | |
25.545 | 1 | |
25.21071429 | 1 | |
25.09 | 1 | |
24.9275 | 1 | |
24.78 | 1 | |
24.479 | 1 | |
24.466 | 1 |
avg_numvehicles
Distinct | 2100 |
---|---|
Distinct (%) | 80.6% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 46.2049626 |
Minimum | 6.4 |
---|---|
Maximum | 117.0769231 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 20.5 KiB |
Quantile statistics
Minimum | 6.4 |
---|---|
5-th percentile | 8.468080645 |
Q1 | 43.2861991 |
median | 44.65625 |
Q3 | 45.87931034 |
95-th percentile | 104.8432692 |
Maximum | 117.0769231 |
Range | 110.6769231 |
Interquartile range (IQR) | 2.59311125 |
Descriptive statistics
Standard deviation | 25.76278655 |
---|---|
Coefficient of variation (CV) | 0.5575761802 |
Kurtosis | 1.064287342 |
Mean | 46.2049626 |
Median Absolute Deviation (MAD) | 1.28587963 |
Skewness | 0.9352263849 |
Sum | 120317.7226 |
Variance | 663.7211708 |
Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%) |
44.75 | 15 | 0.6% |
8.5 | 12 | 0.5% |
44.25 | 9 | 0.3% |
44.5 | 7 | 0.3% |
8.75 | 7 | 0.3% |
45.25 | 7 | 0.3% |
45 | 6 | 0.2% |
45.75 | 6 | 0.2% |
44 | 6 | 0.2% |
45.125 | 6 | 0.2% |
Other values (2090) | 2523 |
Value | Count | Frequency (%) |
6.4 | 1 | |
6.848484848 | 1 | |
7 | 1 | |
7.346153846 | 1 | |
7.541666667 | 2 | |
7.595744681 | 1 | |
7.625 | 1 | |
7.878787879 | 1 | |
7.9 | 1 | |
7.90625 | 1 |
Value | Count | Frequency (%) |
117.0769231 | 1 | |
116.59375 | 1 | |
111.55 | 1 | |
111 | 1 | |
110.78125 | 1 | |
110.5694444 | 1 | |
110.22 | 1 | |
109.8541667 | 1 | |
109.3365385 | 1 | |
109.3125 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
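The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the one the profiler uses: `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names, and tied values are assumed to receive the average of their ranks.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman's rho is Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: the ranks agree perfectly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y))
print(pearson_r(x, y))
```

For the quadratic example the ranks of x and y are identical, so ρ reaches its maximum of 1 while Pearson's r stays below 1.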
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
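As a minimal sketch of this definition, the function below computes r in plain Python and illustrates the invariance under shifts and positive rescaling. The function name `pearson_r` is illustrative. The sample values are the first five avg_co2 and avg_numvehicles rows shown in this report, used only as example input.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    The 1/n factors in the covariance and in both standard deviations cancel,
    so plain sums of deviations are enough.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# First five rows of avg_co2 and avg_numvehicles from this report.
avg_co2 = [11.31, 10.4, 11.635, 26.705714, 24.479]
avg_numvehicles = [29.0, 44.454545, 55.0, 103.0, 116.59375]

r = pearson_r(avg_co2, avg_numvehicles)

# Shifting and positively rescaling one variable leaves r unchanged.
rescaled = [2.0 * v + 3.0 for v in avg_co2]
r_rescaled = pearson_r(rescaled, avg_numvehicles)

print(r, r_rescaled)
```

Five rows are far too few to reproduce the correlation reported for the full dataset; the example only shows the mechanics of the calculation.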
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
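The pair-counting rule above corresponds to the variant usually called τ-a. The sketch below implements exactly that rule in plain Python. The name `kendall_tau_a` is illustrative, and the profiler may well use a tie-corrected variant such as τ-b, so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)

# Three observations give three pairs: two concordant, one discordant.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```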
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion.
First rows
index | refjunction | date | hour | avg_co2 | avg_numvehicles |
---|---|---|---|---|---|
0 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 18 | 11.310000 | 29.000000 |
1 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 19 | 10.400000 | 44.454545 |
2 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 20 | 11.635000 | 55.000000 |
3 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 21 | 26.705714 | 103.000000 |
4 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 22 | 24.479000 | 116.593750 |
5 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 23 | 9.663333 | 28.937500 |
6 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 0 | 10.920000 | 46.375000 |
7 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 1 | 9.139000 | 43.974359 |
8 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 2 | 9.378571 | 42.733333 |
9 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 3 | 9.100000 | 45.619048 |
Last rows
index | refjunction | date | hour | avg_co2 | avg_numvehicles |
---|---|---|---|---|---|
2594 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 7 | 1.920645 | 8.467742 |
2595 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 8 | 9.650333 | 44.000000 |
2596 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 9 | 9.733750 | 45.375000 |
2597 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 10 | 9.436207 | 42.991379 |
2598 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 11 | 22.397742 | 104.596774 |
2599 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 12 | 10.317667 | 46.025000 |
2600 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 13 | 9.529722 | 44.868056 |
2601 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 14 | 9.840606 | 45.175182 |
2602 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 15 | 9.746176 | 42.530303 |
2603 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 16 | 10.382667 | 41.183333 |