Data import: better detection of documentation type #4308

Closed
AntoineAugusti opened this issue Nov 13, 2024 · 0 comments · Fixed by #4309

AntoineAugusti commented Nov 13, 2024

Detect the HTML, PDF and SVG formats as documentation files for all data types, not only for the public-transit category.

For example, the 🅿️ Indigo dataset has an HTML resource that is not flagged as documentation.

I don't know why we previously limited this to the public transit category.

@doc """
Determines if a format is likely a documentation format.
Only used for the `public-transit` type, other types use
`documentation?/1` which is stricter.

iex> documentation_format?("PDF")
true
iex> documentation_format?("GTFS")
false
"""
def documentation_format?(%{"format" => format}), do: documentation_format?(format)

def documentation_format?(format) do
  format?(format, ["pdf", "svg", "html"])
end
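
To illustrate the requested behaviour, here is a minimal, self-contained sketch in which the format check applies whatever the dataset type is. The module name `DocumentationFormatSketch`, the `resource_type/3` helper and its `"main"` default are hypothetical and only serve the example; the case-insensitive comparison is an assumption inferred from the doctest above, where `"PDF"` is accepted. This is not the project's actual implementation.

```elixir
defmodule DocumentationFormatSketch do
  @moduledoc false
  # Hypothetical module: names and defaults are illustrative assumptions,
  # not the project's real code.

  @documentation_formats ["pdf", "svg", "html"]

  # Accepts a resource payload (a map with a "format" key).
  def documentation_format?(%{"format" => format}), do: documentation_format?(format)

  # Case-insensitive comparison against the known documentation formats.
  def documentation_format?(format) when is_binary(format) do
    String.downcase(format) in @documentation_formats
  end

  # Missing or non-string formats are never treated as documentation.
  def documentation_format?(_), do: false

  # Hypothetical helper: the dataset type is deliberately ignored, so the
  # same rule applies to every category and not only to `public-transit`.
  def resource_type(_dataset_type, resource, default \\ "main") do
    if documentation_format?(resource), do: "documentation", else: default
  end
end
```

With this sketch, `DocumentationFormatSketch.resource_type("any-type", %{"format" => "HTML"})` returns `"documentation"`, while a resource whose format is `"GTFS"` falls back to the default.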

Existing data

select format, type, count(1)
from resource
where format in ('pdf', 'html', 'svg')
group by 1, 2

| format | type          | count |
|--------|---------------|-------|
| pdf    | documentation | 30    |
| html   | documentation | 7     |
| html   | main          | 6     |
| pdf    | other         | 3     |
| html   | other         | 2     |
| pdf    | main          | 1     |

cc @etalab/transport-bizdev