"somewhat erratically" -> no visible logical trigger, but consistent over time (I've been running the same trials over the last few months)
It took me several hours to understand what was really going on here (and describe it properly).
While the immediate error one will witness is that data is missing for some Fridays (and an exception is raised when
trying to retrieve it), the problem runs deeper: it basically makes the retrieval of historical data "random", with a possible
shift by one day. One day is not that much on average, but if there's a spike or a crash, it can matter a lot.
Here is code to reproduce the issue, along with my explanations:
import datetime

import yahoo_fin.stock_info

def reproduce_issue(ticker, date_iso_format):
    date_iso = datetime.date.fromisoformat(date_iso_format)
    date_exception = date_iso.strftime("%m/%d/%Y")
    date_iso -= datetime.timedelta(days=3)
    date_begin_us_format = date_iso.strftime("%m/%d/%Y")
    date_iso += datetime.timedelta(days=6)
    date_end_us_format = date_iso.strftime("%m/%d/%Y")
    panda_data = yahoo_fin.stock_info.get_data(ticker, start_date=date_begin_us_format, end_date=date_end_us_format)
    print(panda_data)
    print("Trying to get value for " + ticker + " on " + date_exception + " - Exception follows")
    panda_data.at[date_exception, 'open']
# I have several dozen examples (see at the end) but let's focus on one to try to reproduce :
ticker = "USDEUR=X"
date_iso_format = datetime.date.fromisoformat("2021-03-11")
date_begin_us_format = date_iso_format.strftime("%m/%d/%Y")
date_iso_format += datetime.timedelta(days=30)
date_end_us_format = date_iso_format.strftime("%m/%d/%Y")
panda_data=yahoo_fin.stock_info.get_data(ticker, start_date= date_begin_us_format, end_date= date_end_us_format)
print(panda_data)
What are we seeing here?
Data before 26 March is in line with https://finance.yahoo.com/quote/EUR%3DX/history?p=EUR%3DX
Data after that date is shifted by one day.
Indeed, the website has no data for 28 March, which is a Sunday (rightfully so).
But the pandas DataFrame does have a row for that day: it holds the value the website shows for the following day, and the one-day shift starts from there.
It becomes obvious if you try to retrieve the values for 2 April, which is a Friday (so the data is on the website and should be returned by the lib):
# Add the following lines after the previous code extract
date_iso_format = datetime.date.fromisoformat("2021-04-02")
exception_date = date_iso_format.strftime("%m/%d/%Y")
print("Trying to get value for 2nd of April - Exception follows")
print("#######################")
panda_data.at[exception_date, 'open']
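A quick way to spot where the shift starts, without waiting for the exception, is to look for rows dated on a weekend, since the website shows none for this ticker. Below is a minimal sketch using only the standard library. `find_weekend_dates` is a hypothetical helper name (it is not part of yahoo_fin), and it assumes each index entry exposes `weekday()` the way `datetime.date` and pandas timestamps do. The sample dates are hand-made for illustration, not real output from the lib.

```python
import datetime

def find_weekend_dates(dates):
    """Return the dates that fall on a Saturday or a Sunday."""
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6
    return [d for d in dates if d.weekday() >= 5]

# Hand-made sample mimicking the symptom: a row dated Sunday 28 March 2021
sample = [
    datetime.date(2021, 3, 25),
    datetime.date(2021, 3, 26),
    datetime.date(2021, 3, 28),
    datetime.date(2021, 3, 29),
]
print(find_weekend_dates(sample))
```

On the DataFrame retrieved above, `find_weekend_dates(panda_data.index)` should presumably list 28 March if the shift is present.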
Other examples follow.
All those days are Fridays. I don't know if we can infer anything from that, but it suggests the shift only goes one way:
I didn't find any example for a Monday.
All my examples are currencies because of what I needed to code for my taxes. But there's no reason to believe other tickers aren't affected.
(In particular, EURAUD=X is over-represented because of what I happen to work with. Don't deduce anything from it.)
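To list the affected Fridays for any ticker, one could compare the returned dates against every weekday in the requested range. This is a minimal sketch, not a definitive check: `missing_weekdays` is a hypothetical helper name, and it assumes that a weekday without a row is suspicious, which holds less well for stocks because of market holidays. The sample dates are hand-made to mimic the shifted week described above.

```python
import datetime

def missing_weekdays(dates, start, end):
    """Return the Monday-to-Friday dates in [start, end] that are absent from `dates`."""
    # Normalise to plain dates so pandas timestamps and datetime.date compare equal
    present = {datetime.date(d.year, d.month, d.day) for d in dates}
    missing = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in present:
            missing.append(day)
        day += datetime.timedelta(days=1)
    return missing

# Hand-made sample: rows run from Sunday 28 March to Thursday 1 April, Friday is absent
sample = [
    datetime.date(2021, 3, 28),
    datetime.date(2021, 3, 29),
    datetime.date(2021, 3, 30),
    datetime.date(2021, 3, 31),
    datetime.date(2021, 4, 1),
]
print(missing_weekdays(sample, datetime.date(2021, 3, 29), datetime.date(2021, 4, 2)))
```

With real data, `dates` would be the DataFrame index and `start` / `end` the bounds passed to `get_data`.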
Examples of other dates failing for ticker EURAUD=X (just uncomment):
Notice how 2017-04-07 is bad for several tickers. Or 2018-06-22.
Also, I want to take this opportunity to thank atreadw1492 a lot for this lib, as well as all other contributors. This is a very commendable project and I am grateful. Big thumbsup !
Yes, that may be, although it'd require a bit more digging in to confirm.
What I am sure of about this bug is that it is erratic. I haven't found a pattern: not all weeks are shifted.