Fix/Using apply to set timestamp (#215)
* wip: adding log

* chores: downgrading numpy

* lock

* fix: use apply

* fix: apply

* test

* numpy 2.0.1
polomarcus authored Jul 30, 2024
1 parent 934edb1 commit 2b47ea9
Showing 3 changed files with 53 additions and 41 deletions.
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -137,7 +137,7 @@ services:
#entrypoint: ["python", "quotaclimat/data_processing/mediatree/api_import.py"]
environment:
ENV: docker # change me to prod for real cases
-LOGLEVEL: INFO # Change me to info (debug, info, warning, error) to have less log
+LOGLEVEL: DEBUG # Change me to info (debug, info, warning, error) to have less log
PYTHONPATH: /app
POSTGRES_USER: user
POSTGRES_DB: barometre
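The compose file hands the log level to the container through the `LOGLEVEL` environment variable. How the project reads that variable is not part of this diff (the coverage report below lists a `quotaclimat/utils/logger.py`, but its contents are not shown). The sketch below is a hypothetical helper, `configure_logging`, illustrating one common way to map such a variable onto Python's standard `logging` levels; it is an assumption, not the project's code.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Hypothetical helper: map the LOGLEVEL environment variable (as set in
    docker-compose.yml) to a logging level and apply it to the root logger."""
    name = os.environ.get("LOGLEVEL", default).strip().upper()
    # getLevelName returns the numeric level for a known name such as "DEBUG",
    # and a string like "Level FOO" for an unknown one.
    level = logging.getLevelName(name)
    if not isinstance(level, int):
        level = logging.INFO  # fall back instead of failing on a typo
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, force=True)
    return level


if __name__ == "__main__":
    configure_logging()
    logging.debug("only emitted when LOGLEVEL=DEBUG")
    logging.info("emitted for LOGLEVEL=DEBUG or INFO")
```

Under this sketch, `LOGLEVEL: DEBUG` lets `logging.debug` calls such as `"setting timestamp"` in `api_import.py` through, and setting it back to `INFO` hides them.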
83 changes: 46 additions & 37 deletions poetry.lock

Some generated files are not rendered by default.

9 changes: 6 additions & 3 deletions quotaclimat/data_processing/mediatree/api_import.py
@@ -230,13 +230,15 @@ def parse_reponse_subtitle(response_sub, channel = None, channel_program = "", c
logging.getLogger("modin.logging.default").setLevel(logging.WARNING)
if(total_results > 0):
logging.info(f"{total_results} 'total_results' field")

new_df : pd.DataFrame = json_normalize(response_sub.get('data'))
logging.debug("Schema from API before formatting :\n%s", new_df.dtypes)
pd.set_option('display.max_columns', None)
logging.debug("head: :\n%s", new_df.head())
new_df['timestamp'] = pd.to_datetime(new_df['start'], unit='s', utc=True)

logging.debug("setting timestamp")
new_df['timestamp'] = new_df.apply(lambda x: pd.to_datetime(x['start'], unit='s', utc=True), axis=1)
logging.debug("timestamp was set")

new_df.drop('start', axis=1, inplace=True)
logging.debug("renaming columns")
new_df.rename(columns={'channel.name':'channel_name',
@@ -246,13 +248,14 @@ def parse_reponse_subtitle(response_sub, channel = None, channel_program = "", c
},
inplace=True
)

logging.debug(f"setting program {channel_program} type { type(channel_program)}")

# weird error if not using this way: (ValueError) format number 1 of "20h30 le samedi" is not recognized
new_df['channel_program'] = new_df.apply(lambda x: channel_program, axis=1)
new_df['channel_program_type'] = new_df.apply(lambda x: channel_program_type, axis=1)

logging.debug("programs were set")

log_dataframe_size(new_df, channel)

logging.debug("Parsed Schema\n%s", new_df.dtypes)
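The `api_import.py` hunks build columns with `DataFrame.apply(..., axis=1)`: a UTC `timestamp` from the epoch-seconds `start` field (the commit title says the change is to use `apply` to set the timestamp, and the hunk also shows a column-wide `pd.to_datetime` line) and two constant program columns. The sketch below, in plain pandas on a made-up two-row frame, checks that the row-wise forms give the same values as their vectorised equivalents. The function names, the sample rows and the `"news"` program type are hypothetical. It does not reproduce the `ValueError` quoted in the diff comment, whose origin the commit does not identify, and the project may run these calls through modin (the function silences a `modin` logger) rather than pandas.

```python
import pandas as pd


def timestamps_vectorised(df: pd.DataFrame) -> pd.Series:
    # One call over the whole column: epoch seconds -> tz-aware UTC datetimes.
    return pd.to_datetime(df["start"], unit="s", utc=True)


def timestamps_row_wise(df: pd.DataFrame) -> pd.Series:
    # Same conversion done row by row, mirroring the apply-based line in the diff.
    return df.apply(lambda x: pd.to_datetime(x["start"], unit="s", utc=True), axis=1)


def program_columns_row_wise(df: pd.DataFrame, channel_program: str,
                             channel_program_type: str) -> pd.DataFrame:
    # The lambda ignores the row, so every row receives the same constant.
    out = df.copy()
    out["channel_program"] = out.apply(lambda x: channel_program, axis=1)
    out["channel_program_type"] = out.apply(lambda x: channel_program_type, axis=1)
    return out


def program_columns_broadcast(df: pd.DataFrame, channel_program: str,
                              channel_program_type: str) -> pd.DataFrame:
    # Plain scalar assignment: pandas broadcasts each value to every row.
    return df.assign(channel_program=channel_program,
                     channel_program_type=channel_program_type)


# Hypothetical two-row sample shaped like the API payload ('start' in epoch seconds).
sample = pd.DataFrame({"start": [1722297600, 1722301200], "text": ["a", "b"]})

if __name__ == "__main__":
    print(timestamps_row_wise(sample))  # 2024-07-30 00:00 and 01:00 UTC
    print(program_columns_row_wise(sample, "20h30 le samedi", "news"))
```

On large frames the vectorised calls are normally faster, since `apply(axis=1)` invokes the Python lambda once per row; the commit records no timing comparison either way.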

1 comment on commit 2b47ea9

@github-actions

Coverage Report
File                                 Stmts  Miss  Cover  Missing
postgres
   insert_data.py                       43     7    84%  36–38, 56–58, 63
   insert_existing_data_example.py      19     3    84%  25–27
postgres/schemas
   models.py                           147    10    93%  121–128, 140–141, 199–200, 214–215
quotaclimat/data_ingestion
   scrap_sitemap.py                    134    17    87%  27–28, 33–34, 66–71, 95–97, 138–140, 202, 223–228
quotaclimat/data_ingestion/ingest_db
   ingest_sitemap_in_db.py              55    37    33%  21–42, 45–58, 62–73
quotaclimat/data_ingestion/scrap_html
   scrap_description_article.py         36     3    92%  19–20, 32
quotaclimat/data_processing/mediatree
   api_import.py                       213   131    38%  44–48, 53–69, 73–76, 82, 85–127, 133–148, 152–153, 166–178, 182–188, 201–212, 215–219, 225, 265–266, 270, 274–308, 311–313
   channel_program.py                  136    51    62%  30–32, 43–45, 59, 95, 104, 142–183
   config.py                            15     2    87%  7, 16
   detect_keywords.py                  213     8    96%  169–172, 216, 271–273
   update_pg_keywords.py                54    39    28%  14–100, 125–129, 152–178, 184
   utils.py                             69    22    68%  27–51, 54, 63, 84–85
quotaclimat/utils
   healthcheck_config.py                29    14    52%  22–24, 27–38
   logger.py                            24    11    54%  22–24, 28–37
   sentry.py                            10     2    80%  21–22
TOTAL                                 1223   357    71%
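The Cover column is the share of statements executed, (Stmts - Miss) / Stmts, rounded to a whole percent. The short check below recomputes it from the Stmts and Miss values transcribed from the report above; `ROWS`, `TOTAL` and `cover_pct` are illustrative names, not part of the project, and the rounding rule (Python's `round`, which rounds halves to even) is an assumption that happens to match every row, including 62.5% shown as 62% for `channel_program.py`.

```python
# (file, Stmts, Miss, Cover %) transcribed from the coverage report.
ROWS = [
    ("insert_data.py", 43, 7, 84),
    ("insert_existing_data_example.py", 19, 3, 84),
    ("models.py", 147, 10, 93),
    ("scrap_sitemap.py", 134, 17, 87),
    ("ingest_sitemap_in_db.py", 55, 37, 33),
    ("scrap_description_article.py", 36, 3, 92),
    ("api_import.py", 213, 131, 38),
    ("channel_program.py", 136, 51, 62),
    ("config.py", 15, 2, 87),
    ("detect_keywords.py", 213, 8, 96),
    ("update_pg_keywords.py", 54, 39, 28),
    ("utils.py", 69, 22, 68),
    ("healthcheck_config.py", 29, 14, 52),
    ("logger.py", 24, 11, 54),
    ("sentry.py", 10, 2, 80),
]
TOTAL = (1223, 357, 71)


def cover_pct(stmts: int, miss: int) -> int:
    """Percent of statements executed; round() is round-half-to-even (62.5 -> 62)."""
    return round(100 * (stmts - miss) / stmts)


mismatches = [name for name, s, m, c in ROWS if cover_pct(s, m) != c]
listed_stmts = sum(s for _, s, _, _ in ROWS)
listed_miss = sum(m for _, _, m, _ in ROWS)

if __name__ == "__main__":
    print("rows that disagree:", mismatches)
    print("listed Stmts:", listed_stmts, "listed Miss:", listed_miss)
```

Every row matches, and the Miss column sums exactly to the TOTAL of 357, while the listed Stmts add up to 1197, 26 fewer than 1223. That gap is consistent with fully covered files being left out of the comment, though the report itself does not say so.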

Tests: 83 | Skipped: 0 💤 | Failures: 0 ❌ | Errors: 0 🔥 | Time: 1m 36s ⏱️