Dataset statistics
| Statistic | Value |
|---|---|
| Number of variables | 5 |
| Number of observations | 2604 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 101.8 KiB |
| Average record size in memory | 40.0 B |
Variable types
| Type | Count |
|---|---|
| Categorical | 2 |
| Numeric | 3 |
refjunction has constant value "urn:ngsi-ld:Junction:54201" | Constant |
date has a high cardinality: 110 distinct values | High cardinality |
avg_co2 is highly correlated with avg_numvehicles | High correlation |
avg_numvehicles is highly correlated with avg_co2 | High correlation |
hour is highly correlated with avg_co2 and 1 other fields | High correlation |
avg_co2 is highly correlated with hour and 1 other fields | High correlation |
avg_numvehicles is highly correlated with hour and 1 other fields | High correlation |
date is uniformly distributed | Uniform |
hour has 108 (4.1%) zeros | Zeros |
Reproduction
| Item | Value |
|---|---|
| Analysis started | 2022-06-14 11:18:25.529793 |
| Analysis finished | 2022-06-14 11:18:31.470588 |
| Duration | 5.94 seconds |
| Software version | pandas-profiling v3.2.0 |
| Download configuration | config.json |
refjunction
| Statistic | Value |
|---|---|
| Distinct | 1 |
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 20.5 KiB |
| urn:ngsi-ld:Junction:54201 |
|---|
Length
| Statistic | Value |
|---|---|
| Max length | 26 |
| Median length | 26 |
| Mean length | 26 |
| Min length | 26 |
Characters and Unicode
| Statistic | Value |
|---|---|
| Total characters | 67704 |
| Distinct characters | 19 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | urn:ngsi-ld:Junction:54201 |
| 2nd row | urn:ngsi-ld:Junction:54201 |
| 3rd row | urn:ngsi-ld:Junction:54201 |
| 4th row | urn:ngsi-ld:Junction:54201 |
| 5th row | urn:ngsi-ld:Junction:54201 |
Common Values
| Value | Count | Frequency (%) |
| urn:ngsi-ld:Junction:54201 | 2604 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| urn:ngsi-ld:junction:54201 | 2604 |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 10416 | |
| : | 7812 | 11.5% |
| u | 5208 | 7.7% |
| i | 5208 | 7.7% |
| c | 2604 | 3.8% |
| 0 | 2604 | 3.8% |
| 2 | 2604 | 3.8% |
| 4 | 2604 | 3.8% |
| 5 | 2604 | 3.8% |
| o | 2604 | 3.8% |
| Other values (9) | 23436 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 41664 | |
| Decimal Number | 13020 | 19.2% |
| Other Punctuation | 7812 | 11.5% |
| Uppercase Letter | 2604 | 3.8% |
| Dash Punctuation | 2604 | 3.8% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 10416 | |
| u | 5208 | |
| i | 5208 | |
| c | 2604 | 6.2% |
| o | 2604 | 6.2% |
| t | 2604 | 6.2% |
| d | 2604 | 6.2% |
| r | 2604 | 6.2% |
| l | 2604 | 6.2% |
| s | 2604 | 6.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2604 | |
| 2 | 2604 | |
| 4 | 2604 | |
| 5 | 2604 | |
| 1 | 2604 |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 7812 |
Uppercase Letter
| Value | Count | Frequency (%) |
| J | 2604 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 2604 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 44268 | |
| Common | 23436 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 10416 | |
| u | 5208 | |
| i | 5208 | |
| c | 2604 | 5.9% |
| o | 2604 | 5.9% |
| t | 2604 | 5.9% |
| d | 2604 | 5.9% |
| J | 2604 | 5.9% |
| r | 2604 | 5.9% |
| l | 2604 | 5.9% |
| Other values (2) | 5208 |
Common
| Value | Count | Frequency (%) |
| : | 7812 | |
| 0 | 2604 | 11.1% |
| 2 | 2604 | 11.1% |
| 4 | 2604 | 11.1% |
| 5 | 2604 | 11.1% |
| - | 2604 | 11.1% |
| 1 | 2604 | 11.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 67704 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 10416 | |
| : | 7812 | 11.5% |
| u | 5208 | 7.7% |
| i | 5208 | 7.7% |
| c | 2604 | 3.8% |
| 0 | 2604 | 3.8% |
| 2 | 2604 | 3.8% |
| 4 | 2604 | 3.8% |
| 5 | 2604 | 3.8% |
| o | 2604 | 3.8% |
| Other values (9) | 23436 |
date
| Statistic | Value |
|---|---|
| Distinct | 110 |
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 20.5 KiB |
| 2022-04-21 | 24 |
|---|---|
| 2022-05-17 | 24 |
| 2022-05-15 | 24 |
| 2022-05-14 | 24 |
| 2022-05-13 | 24 |
| Other values (105) |
Length
| Statistic | Value |
|---|---|
| Max length | 10 |
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Statistic | Value |
|---|---|
| Total characters | 26040 |
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Statistic | Value |
|---|---|
| Unique | 0 |
| Unique (%) | 0.0% |
Sample
| Row | Value |
|---|---|
| 1st row | 2022-02-25 |
| 2nd row | 2022-02-25 |
| 3rd row | 2022-02-25 |
| 4th row | 2022-02-25 |
| 5th row | 2022-02-25 |
Common Values
| Value | Count | Frequency (%) |
| 2022-04-21 | 24 | 0.9% |
| 2022-05-17 | 24 | 0.9% |
| 2022-05-15 | 24 | 0.9% |
| 2022-05-14 | 24 | 0.9% |
| 2022-05-13 | 24 | 0.9% |
| 2022-05-12 | 24 | 0.9% |
| 2022-05-11 | 24 | 0.9% |
| 2022-05-10 | 24 | 0.9% |
| 2022-05-09 | 24 | 0.9% |
| 2022-05-08 | 24 | 0.9% |
| Other values (100) | 2364 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 2022-04-21 | 24 | 0.9% |
| 2022-04-20 | 24 | 0.9% |
| 2022-03-02 | 24 | 0.9% |
| 2022-03-03 | 24 | 0.9% |
| 2022-03-04 | 24 | 0.9% |
| 2022-03-05 | 24 | 0.9% |
| 2022-03-06 | 24 | 0.9% |
| 2022-03-07 | 24 | 0.9% |
| 2022-03-08 | 24 | 0.9% |
| 2022-03-09 | 24 | 0.9% |
| Other values (100) | 2364 |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 8938 | |
| 0 | 6308 | |
| - | 5208 | |
| 1 | 1141 | 4.4% |
| 3 | 1124 | 4.3% |
| 5 | 990 | 3.8% |
| 4 | 977 | 3.8% |
| 6 | 590 | 2.3% |
| 7 | 262 | 1.0% |
| 8 | 262 | 1.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 20832 | |
| Dash Punctuation | 5208 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 8938 | |
| 0 | 6308 | |
| 1 | 1141 | 5.5% |
| 3 | 1124 | 5.4% |
| 5 | 990 | 4.8% |
| 4 | 977 | 4.7% |
| 6 | 590 | 2.8% |
| 7 | 262 | 1.3% |
| 8 | 262 | 1.3% |
| 9 | 240 | 1.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5208 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 26040 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 2 | 8938 | |
| 0 | 6308 | |
| - | 5208 | |
| 1 | 1141 | 4.4% |
| 3 | 1124 | 4.3% |
| 5 | 990 | 3.8% |
| 4 | 977 | 3.8% |
| 6 | 590 | 2.3% |
| 7 | 262 | 1.0% |
| 8 | 262 | 1.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 26040 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 2 | 8938 | |
| 0 | 6308 | |
| - | 5208 | |
| 1 | 1141 | 4.4% |
| 3 | 1124 | 4.3% |
| 5 | 990 | 3.8% |
| 4 | 977 | 3.8% |
| 6 | 590 | 2.3% |
| 7 | 262 | 1.0% |
| 8 | 262 | 1.0% |
hour
| Statistic | Value |
|---|---|
| Distinct | 24 |
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.50384025 |
| Minimum | 0 |
| Maximum | 23 |
| Zeros | 108 |
| Zeros (%) | 4.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 20.5 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 0 |
| 5-th percentile | 1 |
| Q1 | 6 |
| median | 12 |
| Q3 | 17 |
| 95-th percentile | 22 |
| Maximum | 23 |
| Range | 23 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 6.919019026 |
| Coefficient of variation (CV) | 0.6014529825 |
| Kurtosis | -1.202047064 |
| Mean | 11.50384025 |
| Median Absolute Deviation (MAD) | 6 |
| Skewness | -0.0008799774518 |
| Sum | 29956 |
| Variance | 47.87282428 |
| Monotonicity | Not monotonic |
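The descriptive statistics above are tied together by simple arithmetic, which makes them easy to sanity-check. The sketch below recomputes a few of them for this variable from the figures printed in the report. The variable names are illustrative only, and comparisons should allow a small tolerance because the report rounds its output.

```python
# Figures copied from the report tables for this variable.
n_observations = 2604          # "Number of observations"
total = 29956                  # "Sum"
std_dev = 6.919019026          # "Standard deviation"
q1, q3 = 6, 17                 # "Q1" and "Q3"
minimum, maximum = 0, 23       # "Minimum" and "Maximum"

mean = total / n_observations  # Mean = Sum / number of observations
variance = std_dev ** 2        # Variance = standard deviation squared
cv = std_dev / mean            # Coefficient of variation = std / mean
iqr = q3 - q1                  # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.8f} variance={variance:.8f} cv={cv:.10f}")
print(f"iqr={iqr} range={value_range}")
```

Each recomputed figure should agree with the corresponding table entry to within rounding.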
Histogram with fixed size bins (bins=24)
| Value | Count | Frequency (%) |
| 16 | 109 | 4.2% |
| 21 | 109 | 4.2% |
| 22 | 109 | 4.2% |
| 15 | 109 | 4.2% |
| 14 | 109 | 4.2% |
| 13 | 109 | 4.2% |
| 2 | 109 | 4.2% |
| 3 | 109 | 4.2% |
| 12 | 109 | 4.2% |
| 11 | 109 | 4.2% |
| Other values (14) | 1514 |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 108 | 4.1% |
| 1 | 108 | 4.1% |
| 2 | 109 | 4.2% |
| 3 | 109 | 4.2% |
| 4 | 108 | 4.1% |
| 5 | 108 | 4.1% |
| 6 | 108 | 4.1% |
| 7 | 108 | 4.1% |
| 8 | 108 | 4.1% |
| 9 | 109 | 4.2% |
| Value | Count | Frequency (%) |
|---|---|---|
| 23 | 108 | 4.1% |
| 22 | 109 | 4.2% |
| 21 | 109 | 4.2% |
| 20 | 108 | 4.1% |
| 19 | 108 | 4.1% |
| 18 | 108 | 4.1% |
| 17 | 108 | 4.1% |
| 16 | 109 | 4.2% |
| 15 | 109 | 4.2% |
| 14 | 109 | 4.2% |
avg_co2
| Statistic | Value |
|---|---|
| Distinct | 1862 |
| Distinct (%) | 71.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.02014226 |
| Minimum | 1.451666667 |
| Maximum | 27.612 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 20.5 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 1.451666667 |
| 5-th percentile | 1.847944196 |
| Q1 | 9.178928571 |
| median | 9.663333333 |
| Q3 | 10.062 |
| 95-th percentile | 22.894885 |
| Maximum | 27.612 |
| Range | 26.16033333 |
| Interquartile range (IQR) | 0.8830714286 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 5.612256754 |
| Coefficient of variation (CV) | 0.5600975121 |
| Kurtosis | 1.124355843 |
| Mean | 10.02014226 |
| Median Absolute Deviation (MAD) | 0.4357738095 |
| Skewness | 0.9663222026 |
| Sum | 26092.45046 |
| Variance | 31.49742587 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 9.75 | 16 | 0.6% |
| 9.62 | 11 | 0.4% |
| 9.88 | 11 | 0.4% |
| 1.95 | 10 | 0.4% |
| 9.555 | 9 | 0.3% |
| 1.82 | 9 | 0.3% |
| 10.01 | 8 | 0.3% |
| 1.86 | 7 | 0.3% |
| 1.885 | 7 | 0.3% |
| 9.9775 | 7 | 0.3% |
| Other values (1852) | 2509 |
| Value | Count | Frequency (%) |
| 1.451666667 | 1 | |
| 1.456 | 1 | |
| 1.4625 | 1 | |
| 1.603333333 | 1 | |
| 1.69 | 1 | |
| 1.703 | 1 | |
| 1.7056 | 1 | |
| 1.711666667 | 1 | |
| 1.713636364 | 1 | |
| 1.718888889 | 1 |
| Value | Count | Frequency (%) |
| 27.612 | 1 | |
| 26.70571429 | 1 | |
| 26.585 | 1 | |
| 25.545 | 1 | |
| 25.21071429 | 1 | |
| 25.09 | 1 | |
| 24.9275 | 1 | |
| 24.78 | 1 | |
| 24.479 | 1 | |
| 24.466 | 1 |
avg_numvehicles
| Statistic | Value |
|---|---|
| Distinct | 2100 |
| Distinct (%) | 80.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 46.2049626 |
| Minimum | 6.4 |
| Maximum | 117.0769231 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 20.5 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 6.4 |
| 5-th percentile | 8.468080645 |
| Q1 | 43.2861991 |
| median | 44.65625 |
| Q3 | 45.87931034 |
| 95-th percentile | 104.8432692 |
| Maximum | 117.0769231 |
| Range | 110.6769231 |
| Interquartile range (IQR) | 2.59311125 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 25.76278655 |
| Coefficient of variation (CV) | 0.5575761802 |
| Kurtosis | 1.064287342 |
| Mean | 46.2049626 |
| Median Absolute Deviation (MAD) | 1.28587963 |
| Skewness | 0.9352263849 |
| Sum | 120317.7226 |
| Variance | 663.7211708 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 44.75 | 15 | 0.6% |
| 8.5 | 12 | 0.5% |
| 44.25 | 9 | 0.3% |
| 44.5 | 7 | 0.3% |
| 8.75 | 7 | 0.3% |
| 45.25 | 7 | 0.3% |
| 45 | 6 | 0.2% |
| 45.75 | 6 | 0.2% |
| 44 | 6 | 0.2% |
| 45.125 | 6 | 0.2% |
| Other values (2090) | 2523 |
| Value | Count | Frequency (%) |
| 6.4 | 1 | |
| 6.848484848 | 1 | |
| 7 | 1 | |
| 7.346153846 | 1 | |
| 7.541666667 | 2 | |
| 7.595744681 | 1 | |
| 7.625 | 1 | |
| 7.878787879 | 1 | |
| 7.9 | 1 | |
| 7.90625 | 1 |
| Value | Count | Frequency (%) |
| 117.0769231 | 1 | |
| 116.59375 | 1 | |
| 111.55 | 1 | |
| 111 | 1 | |
| 110.78125 | 1 | |
| 110.5694444 | 1 | |
| 110.22 | 1 | |
| 109.8541667 | 1 | |
| 109.3365385 | 1 | |
| 109.3125 | 1 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
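As a minimal sketch of that calculation, the code below ranks each variable (tied values share the average of their positions) and then divides the covariance of the ranks by the product of their standard deviations. The function names are illustrative and not part of the profiling tool. The sample values are the first five rows shown in this report, used only to demonstrate the mechanics, and the inputs are assumed to be non-constant.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the ranks divided by the product of their std devs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# First five rows of the report's sample (illustration only).
avg_co2 = [11.31, 10.4, 11.635, 26.705714, 24.479]
avg_numvehicles = [29.0, 44.454545, 55.0, 103.0, 116.59375]
rho = spearman_rho(avg_co2, avg_numvehicles)
print(round(rho, 3))  # 0.8
```

Five rows are far too few to reproduce the coefficient the report computes over all 2604 observations; the point is only the rank-then-correlate procedure.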
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
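The calculation can be sketched in a few lines of plain Python. The function name `pearson_r` is illustrative, the inputs are assumed to be non-constant, and the sample values are the first five rows shown in this report, so the result will not match the coefficient computed over the full dataset. The last lines illustrate the invariance under a change of location and scale.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their std devs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n factors in covariance and variances cancel, so they are omitted.
    return cov / math.sqrt(var_x * var_y)


# First five rows of the report's sample (illustration only).
avg_co2 = [11.31, 10.4, 11.635, 26.705714, 24.479]
avg_numvehicles = [29.0, 44.454545, 55.0, 103.0, 116.59375]
r = pearson_r(avg_co2, avg_numvehicles)

# Shifting and rescaling one variable (positive scale) leaves r unchanged.
rescaled = [2.0 * x + 3.0 for x in avg_co2]
r_rescaled = pearson_r(rescaled, avg_numvehicles)
print(r, r_rescaled)
```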
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
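The pair-counting definition can be sketched directly. This is the simple form described in the text (often called τ-a); statistical libraries commonly report tie-adjusted variants, so their values can differ when the data contain ties. The function name is illustrative, and the sample values are the first five rows shown in this report.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Tied pairs (direction == 0) count toward the total only.
    return (concordant - discordant) / len(pairs)


# First five rows of the report's sample (illustration only).
avg_co2 = [11.31, 10.4, 11.635, 26.705714, 24.479]
avg_numvehicles = [29.0, 44.454545, 55.0, 103.0, 116.59375]
tau = kendall_tau_a(avg_co2, avg_numvehicles)
print(round(tau, 3))  # 0.6
```

Of the ten possible pairs here, eight are concordant and two are discordant, giving (8 - 2) / 10.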
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | refjunction | date | hour | avg_co2 | avg_numvehicles |
|---|---|---|---|---|---|
| 0 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 18 | 11.310000 | 29.000000 |
| 1 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 19 | 10.400000 | 44.454545 |
| 2 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 20 | 11.635000 | 55.000000 |
| 3 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 21 | 26.705714 | 103.000000 |
| 4 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 22 | 24.479000 | 116.593750 |
| 5 | urn:ngsi-ld:Junction:54201 | 2022-02-25 | 23 | 9.663333 | 28.937500 |
| 6 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 0 | 10.920000 | 46.375000 |
| 7 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 1 | 9.139000 | 43.974359 |
| 8 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 2 | 9.378571 | 42.733333 |
| 9 | urn:ngsi-ld:Junction:54201 | 2022-02-26 | 3 | 9.100000 | 45.619048 |
Last rows
| | refjunction | date | hour | avg_co2 | avg_numvehicles |
|---|---|---|---|---|---|
| 2594 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 7 | 1.920645 | 8.467742 |
| 2595 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 8 | 9.650333 | 44.000000 |
| 2596 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 9 | 9.733750 | 45.375000 |
| 2597 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 10 | 9.436207 | 42.991379 |
| 2598 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 11 | 22.397742 | 104.596774 |
| 2599 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 12 | 10.317667 | 46.025000 |
| 2600 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 13 | 9.529722 | 44.868056 |
| 2601 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 14 | 9.840606 | 45.175182 |
| 2602 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 15 | 9.746176 | 42.530303 |
| 2603 | urn:ngsi-ld:Junction:54201 | 2022-06-14 | 16 | 10.382667 | 41.183333 |