Media data causing batch job to fail. #188

Open
gilesdring opened this issue Sep 11, 2023 · 0 comments

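Looks like a media file is in an incorrect format: one input file has no `uv` column (its column list in the log shows `desktop_uvpm` instead), so `convert_numbers` in `scripts/metrics/media/cision.py` fails with `KeyError: 'uv'`.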

Traceback (most recent call last):
  File "/home/runner/.local/share/virtualenvs/leeds-2023-a_g5KJIi/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'uv'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/leeds-2023/leeds-2023/scripts/metrics/media/cision.py", line 133, in convert_numbers
    data['uv'] = data['uv'].fillna(0).astype('Int64')
  File "/home/runner/.local/share/virtualenvs/leeds-2023-a_g5KJIi/lib/python3.10/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)

Log: https://github.com/open-innovations/leeds-2023/actions/runs/6127402427/job/16633101580#step:7:377

Column indexes printed in the same log (interleaved with the traceback in the raw output):

Index(['news_date', 'news_headline', 'outlet_name', 'outlet_country',
       'audience_reach', 'uv', 'news_attachment_name', 'tone', 'medium',
       'news_company_mentions', 'source_file'],
      dtype='object')
Index(['news_date', 'news_headline', 'contact_name', 'outlet_name',
       'outlet_type', 'audience_reach', 'uv', 'news_attachment_name', 'tone',
       'medium', 'custom_tags', 'news_company_mentions', 'source_file'],
      dtype='object')
Index(['news_date', 'news_headline', 'outlet_name', 'audience_reach', 'uv',
       'news_company_mentions', 'news_attachment_name', 'tone', 'medium',
       'custom_tags', 'source_file'],
      dtype='object')
Index(['news_date', 'news_headline', 'contact_name', 'outlet_name',
       'outlet_type', 'outlet_city', 'outlet_country', 'audience_reach', 'uv',
       'news_text', 'news_creation_date', 'news_attachment_name', 'tone',
       'medium', 'outlet_dma', 'custom_tags', 'news_company_mentions',
       'source_file'],
      dtype='object')
Index(['news_date', 'news_headline', 'outlet_name', 'outlet_type',
       'outlet_country', 'audience_reach', 'uv', 'news_attachment_name',
       'tone', 'medium', 'custom_tags', 'news_company_mentions',
       'source_file'],
      dtype='object')
Index(['news_date', 'news_headline', 'outlet_name', 'audience_reach',
       'desktop_uvpm', 'news_attachment_name', 'tone', 'medium',
       'source_file'],
      dtype='object')
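
One possible way to make the failing step tolerant of this, as a minimal sketch rather than the actual fix: only the `data['uv'] = data['uv'].fillna(0).astype('Int64')` line comes from the traceback. The `COLUMN_ALIASES` table is hypothetical, and treating `desktop_uvpm` as the same measure as `uv` is an assumption that would need checking against the Cision export before relying on it.

```python
import pandas as pd

# Hypothetical alias table. Assumption: 'desktop_uvpm' carries the same
# measure as 'uv'; this has not been confirmed against the source data.
COLUMN_ALIASES = {"desktop_uvpm": "uv"}


def convert_numbers(data: pd.DataFrame) -> pd.DataFrame:
    """Sketch of a tolerant variant of the step that fails in cision.py."""
    # Rename known alternative column names, but only when the target
    # column is absent, so an existing 'uv' column is never overwritten.
    aliases = {
        old: new
        for old, new in COLUMN_ALIASES.items()
        if old in data.columns and new not in data.columns
    }
    data = data.rename(columns=aliases)  # returns a new frame

    if "uv" not in data.columns:
        # No usable column at all: add an all-missing nullable integer
        # column instead of letting data['uv'] raise KeyError.
        data["uv"] = pd.Series(pd.NA, index=data.index, dtype="Int64")

    # Same conversion as line 133 of cision.py in the traceback.
    data["uv"] = data["uv"].fillna(0).astype("Int64")
    return data
```

If silently defaulting to 0 is not acceptable for reporting, the same check could raise an error that names the offending `source_file` instead, which would at least make the bad input easy to find.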
@gilesdring gilesdring added the data-quality Data quality issue label Sep 11, 2023
@gilesdring gilesdring self-assigned this Sep 11, 2023