Usage manual would be nice #2

Open
JoaoPedroAlex opened this issue Feb 10, 2020 · 7 comments

Comments

@JoaoPedroAlex

I ran it like this: python3 was.py SystemOut.mpm.log0
and received the following error:
Writing TraditionalWASLogEntries.xlsx ... Traceback (most recent call last):
File "was.py", line 2014, in <module>
print_data_frame(df, options, name)
File "was.py", line 1861, in print_data_frame
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
File "/usr/lib64/python3.7/site-packages/pandas/core/generic.py", line 2256, in to_excel
engine=engine,
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 739, in write
freeze_panes=freeze_panes,
File "/usr/lib64/python3.7/site-packages/pandas/io/excel/_xlsxwriter.py", line 214, in write_cells
for cell in cells:
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 687, in get_formatted_cells
cell.val = self._format_value(cell.val)
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 437, in _format_value
"Excel does not support datetimes with "
ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.

@JoaoPedroAlex
Author

I see in the output:
== Options ==
Namespace(available_only=False, available_skip=False, clean_output_directory=True, create_csvs=False, create_excels=True, create_pickles=False, create_texts=False, debug=False, encoding='utf-8', end_date=None, file=['SystemOut.mpm.log'], filter_access_log_url=None, filter_to_well_known_threads=False, important_messages='CWOBJ7852W,DCSV0004W,HMGR0152W,TRAS0017I,TRAS0018I,UTLS0008W,UTLS0009W,WSVR0001I,WSVR0024I,WSVR0605W,WSVR0606W', only=None, output_directory='was_data_mining', parent_directory_as_name=True, prefilter_informational=False, print_full=True, print_stdout=False, print_summaries=False, print_top_messages=True, recurse=True, remove_raw_timestamps=True, show_plots=False, skip=None, skip_well_known_stack_frames=True, start_date=None, time_grouping='1s', top_hitters=10, trim_stack_frames=True, tz=None)

Where do I set the timezone option, or any of the other options?

@kgibm
Owner

kgibm commented Feb 10, 2020

Hi, you can see usage with was.py -h:

optional arguments:
  -h, --help            show this help message and exit
  --available-only      Print available --only options
  --available-skip      Print available --skip options
  -c, --clean-output-directory
                        Clean the output directory before starting
  -C, --do-not-clean-output-directory
                        Do not clean the output directory before starting
  --create-csvs         Create CSVs
  --create-excels       Create Excels
  --create-pickles      Create Pickles
  --create-texts        Create txt files
  --debug               Print debug information
  --do-not-create-csvs  Don't create CSVs
  --do-not-create-excels
                        Don't create Excels
  --do-not-create-pickles
                        Don't create Pickles
  --do-not-create-texts
                        Don't create txt files
  --do-not-trim-stack-frames
                        Don't trim stack frames
  --do-not-print-full   Do not print full data summary
  --do-not-print-top-messages
                        Do not print top messages
  --do-not-recurse      Do not recurse
  --do-not-skip-well-known-stack-frames
                        Don't skip well known stack frames
  --encoding ENCODING   File encoding. For example, --encoding 'ISO-8859-1'
  --end-date END_DATE   Filter any time-series data before 'YYYY-MM-DD(
                        HH:MM:SS)?'
  --filter-access-log-url FILTER_ACCESS_LOG_URL
                        Only process access log entry if the value is included
                        in the URL. May be specified multiple times.
  --filter-to-well-known-threads
                        Filter to well known threads
  --important-messages IMPORTANT_MESSAGES
                        Important messages to search for
  --keep-raw-timestamps
                        Keep raw timestamp and raw TZ columns
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        Output directory
  --only ONLY           Only process certain types of files. May be specified
                        multiple times. For example, --only
                        TraditionalWASSystemOutLog
  --parent-directory-as-name
                        Use the parent directory as the name
  --prefilter-informational
                        Pre-filter informational messages to improve
                        performance
  --print-full          Print full data summary
  --print-stdout        Print tables to stdout
  --recurse             Recurse
  --show-plots          Show each plot interactively
  --skip SKIP           Skip certain types of files. May be specified multiple
                        times. For example, --skip TraditionalWASTrace
  --start-date START_DATE
                        Filter any time-series data after 'YYYY-MM-DD(
                        HH:MM:SS)?'
  -t TZ, --tz TZ        Output timezone (olson format). Times are converted
                        from their parsed time zones. Example: -t
                        America/New_York
  --time-grouping TIME_GROUPING
                        See https://pandas.pydata.org/pandas-
                        docs/stable/user_guide/timeseries.html#offset-aliases
  --top-hitters TOP_HITTERS
                        Top X items to process for top hitters plots

The timezone option is:

  -t TZ, --tz TZ        Output timezone (olson format). Times are converted
                        from their parsed time zones. Example: -t
                        America/New_York
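
As a rough illustration of how flags like these end up in the Namespace(...) line printed under == Options ==, here is a minimal sketch. It assumes was.py builds its options with Python's argparse (the help layout and the Namespace output suggest so, but the real parser is not shown in this thread); build_parser and the reduced set of options are hypothetical.

```python
import argparse


def build_parser():
    # Hypothetical, cut-down parser; the real was.py defines many more options.
    parser = argparse.ArgumentParser(prog="was.py")
    parser.add_argument("file", nargs="+", help="Log files to process")
    # -t/--tz defaults to None, which is why the output shows tz=None
    # when the flag is not given.
    parser.add_argument("-t", "--tz", default=None,
                        help="Output timezone (olson format)")
    # Paired on/off flags that write to the same destination.
    parser.add_argument("--create-excels", dest="create_excels",
                        action="store_true")
    parser.add_argument("--do-not-create-excels", dest="create_excels",
                        action="store_false")
    parser.set_defaults(create_excels=True)
    return parser


parser = build_parser()
defaults = parser.parse_args(["SystemOut.mpm.log"])
custom = parser.parse_args(
    ["SystemOut.mpm.log", "-t", "Europe/Lisbon", "--do-not-create-excels"]
)
print(defaults)
print(custom)
```

In other words, each option is set on the command line after the file name, for example python3 was.py SystemOut.mpm.log -t Europe/Lisbon.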

That said, I haven't seen the error "ValueError: Excel does not support datetimes with timezones" before. It suggests that somewhere we're sending enriched (timezone-aware) timestamps to Excel. Can you provide a snippet of the log it's failing on?

@JoaoPedroAlex
Author

I will try the -t TZ option.
Meanwhile, here is the output without that option:
[was@272353f3ec4a was_data_mining]$ python3 was.py SystemOut.mpm.log
[2020-02-10 15:50:50] Started
[2020-02-10 15:50:50] Processing (1/1 100.00%) SystemOut.mpm.log (413947 bytes) as FileType.TraditionalWASSystemOutLog [.log]
[2020-02-10 15:50:51] Post-processing...
{}
[2020-02-10 15:50:51] Processing 1382 rows for TraditionalWASLogEntries
[2020-02-10 15:50:52] Timestamps converted
[2020-02-10 15:50:52] Sorting
[2020-02-10 15:50:52] Finished creating timestamps and sorting
== Options ==
Namespace(available_only=False, available_skip=False, clean_output_directory=True, create_csvs=False, create_excels=True, create_pickles=False, create_texts=False, debug=False, encoding='utf-8', end_date=None, file=['SystemOut.mpm.log'], filter_access_log_url=None, filter_to_well_known_threads=False, important_messages='CWOBJ7852W,DCSV0004W,HMGR0152W,TRAS0017I,TRAS0018I,UTLS0008W,UTLS0009W,WSVR0001I,WSVR0024I,WSVR0605W,WSVR0606W', only=None, output_directory='was_data_mining', parent_directory_as_name=True, prefilter_informational=False, print_full=True, print_stdout=False, print_summaries=False, print_top_messages=True, recurse=True, remove_raw_timestamps=True, show_plots=False, skip=None, skip_well_known_stack_frames=True, start_date=None, time_grouping='1s', top_hitters=10, trim_stack_frames=True, tz=None)
== OutputTZ ==
WET
== ThreadDumpInfo ==
None
== Threads ==
None

== TraditionalWASLogEntries ==
Writing TraditionalWASLogEntries.xlsx ... Traceback (most recent call last):
File "was.py", line 2014, in <module>
print_data_frame(df, options, name)
File "was.py", line 1861, in print_data_frame
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
File "/usr/lib64/python3.7/site-packages/pandas/core/generic.py", line 2256, in to_excel
engine=engine,
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 739, in write
freeze_panes=freeze_panes,
File "/usr/lib64/python3.7/site-packages/pandas/io/excel/_xlsxwriter.py", line 214, in write_cells
for cell in cells:
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 687, in get_formatted_cells
cell.val = self._format_value(cell.val)
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 437, in _format_value
"Excel does not support datetimes with "
ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.

@JoaoPedroAlex
Author

Here is the result with option -t:
[was@272353f3ec4a was_data_mining]$ python3 was.py SystemOut.mpm.log -t Europe/Lisbon
[2020-02-10 17:23:59] Started
[2020-02-10 17:23:59] Processing (1/1 100.00%) SystemOut.mpm.log (413947 bytes) as FileType.TraditionalWASSystemOutLog [.log]
[2020-02-10 17:24:00] Post-processing...
{}
[2020-02-10 17:24:00] Processing 1382 rows for TraditionalWASLogEntries
[2020-02-10 17:24:01] Timestamps converted
[2020-02-10 17:24:01] Sorting
[2020-02-10 17:24:01] Finished creating timestamps and sorting
== Options ==
Namespace(available_only=False, available_skip=False, clean_output_directory=True, create_csvs=False, create_excels=True, create_pickles=False, create_texts=False, debug=False, encoding='utf-8', end_date=None, file=['SystemOut.mpm.log'], filter_access_log_url=None, filter_to_well_known_threads=False, important_messages='CWOBJ7852W,DCSV0004W,HMGR0152W,TRAS0017I,TRAS0018I,UTLS0008W,UTLS0009W,WSVR0001I,WSVR0024I,WSVR0605W,WSVR0606W', only=None, output_directory='was_data_mining', parent_directory_as_name=True, prefilter_informational=False, print_full=True, print_stdout=False, print_summaries=False, print_top_messages=True, recurse=True, remove_raw_timestamps=True, show_plots=False, skip=None, skip_well_known_stack_frames=True, start_date=None, time_grouping='1s', top_hitters=10, trim_stack_frames=True, tz='Europe/Lisbon')
== OutputTZ ==
Europe/Lisbon
== ThreadDumpInfo ==
None
== Threads ==
None

== TraditionalWASLogEntries ==
Writing TraditionalWASLogEntries.xlsx ... Traceback (most recent call last):
File "was.py", line 2014, in <module>
print_data_frame(df, options, name)
File "was.py", line 1861, in print_data_frame
df.to_excel(writer, sheet_name=sheetname, freeze_panes=(df.columns.nlevels, df.index.nlevels))
File "/usr/lib64/python3.7/site-packages/pandas/core/generic.py", line 2256, in to_excel
engine=engine,
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 739, in write
freeze_panes=freeze_panes,
File "/usr/lib64/python3.7/site-packages/pandas/io/excel/_xlsxwriter.py", line 214, in write_cells
for cell in cells:
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 687, in get_formatted_cells
cell.val = self._format_value(cell.val)
File "/usr/lib64/python3.7/site-packages/pandas/io/formats/excel.py", line 437, in _format_value
"Excel does not support datetimes with "
ValueError: Excel does not support datetimes with timezones. Please ensure that datetimes are timezone unaware before writing to Excel.

@kgibm
Owner

kgibm commented Feb 10, 2020

I can reproduce the issue. There was a change in Pandas 0.25.0 that causes this new exception:

Series.to_excel() and DataFrame.to_excel() will now raise a ValueError when saving timezone aware data. (GH27008, GH7056)

I'll see what we can do...

kgibm added a commit that referenced this issue Feb 10, 2020
… change. The date is still correct and the timezone is in the column name.
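
For readers who hit the same ValueError in their own scripts, here is a minimal sketch of one way to make a DataFrame safe for to_excel. It is not the code from the commit above; it only follows the idea in the commit note (the date stays correct and the timezone moves into the column name). The helper make_excel_safe and the sample data are hypothetical.

```python
import pandas as pd


def make_excel_safe(df, tz_name):
    """Return a copy of df with tz-aware datetime columns made tz-naive.

    Each such column is first converted to tz_name, then stripped of its
    timezone, and the zone name is appended to the column name.
    """
    out = df.copy()
    renames = {}
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            # tz_convert changes the wall-clock time to the target zone;
            # tz_localize(None) then drops the zone but keeps that wall time.
            out[col] = out[col].dt.tz_convert(tz_name).dt.tz_localize(None)
            renames[col] = f"{col} ({tz_name})"
    return out.rename(columns=renames)


df = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["2020-02-10 15:50:50", "2020-02-10 15:50:51"], utc=True
    ),
    "Message": ["Started", "Post-processing..."],
})

safe = make_excel_safe(df, "America/New_York")
print(safe.dtypes)
```

After this step the datetime column is timezone-unaware, which is the condition named in the pandas error message for writing to Excel.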
@kgibm
Owner

kgibm commented Feb 10, 2020

I think it's fixed now. Please update and try again.

@JoaoPedroAlex
Author

> I think it's fixed now. Please update and try again.

Yes, it is fixed. Thank you.
