You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SELECT
count(distinct changeset_id) as changesets,
count(distinct user_id) as users,
ifNull(sum(road_length_delta)/1000, 0) as roads,
ifNull(sum(building_edit), 0) as buildings,
count(map_feature_edit) as edits,
max(changeset_timestamp) as latest
FROM (
SELECT
min(changeset_id) as changeset_id,
user_id,
min(road_length_delta) as road_length_delta,
min(building_edit) as building_edit,
min(map_feature_edit) as map_feature_edit,
changeset_timestamp,
groupArray(hashtag) as hashtags,
min(country_iso_a3)
FROM "stats"
WHERE
startsWith(hashtag, 'm')
AND changeset_timestamp > parseDateTimeBestEffort('2020-01-01T00:00:00Z')
AND changeset_timestamp < parseDateTimeBestEffort('2022-01-01T00:00:00Z')
GROUP BY
changeset_timestamp,
user_id,
osm_id
) as grouped_stats
WHERE
arrayExists(hashtag -> startsWith(hashtag, 'm'), hashtags)
-> gives ~2/3rd the buildings of our current approach
-> Issue: takes 6 seconds instead of 0.4s
I investigated further and tried to create a aggregated projection like so:
ALTER TABLE stats ADD PROJECTION combined_hashtag (
SELECT
min(changeset_id) as changeset_id,
user_id,
min(road_length_delta) as road_length_delta,
min(building_edit) as building_edit,
min(map_feature_edit) as map_feature_edit,
changeset_timestamp,
groupArray(hashtag) as hashtags,
min(country_iso_a3) as country_iso_a3
GROUP BY
changeset_timestamp,
user_id,
osm_id
)
;
Sadly there seems to be no way to access the newly defined column "hashtags" so I cannot query for arrayExists(hashtag -> startsWith(hashtag, 'm'), hashtags), rendering it useless.
Conclusion
Currently we have no feasible (sub-1-seconds-execition time) way to re-aggregate our hashtag column and thus cannot guarantee correct values for all wildcard queries.
contributions could be counted more than once with the current implementation of the
/stats
endpoint when the wildcard*
filter is used.When using the wildcard filter we could first group values by contribution_id before deriving the sum(buildings) or sum(road_length_delta).
ohsome-now-stats-service/src/main/kotlin/org/heigit/ohsome/stats/StatsRepo.kt
Lines 27 to 40 in ba1bedb
The text was updated successfully, but these errors were encountered: