Hi! I just cloned your project and am messing around with it. Though I am an experienced software engineer, I am new to machine learning so feel free to tell me my insights are incorrect!
After reading the code I noticed that the prediction modeling relies heavily on the KeyStats data, but that data is extremely limited. Would it not be SUPER beneficial to backfill it with one record per quarter? The provided data is very erratic, yet most 'feature' data points are reported by the company every quarter.
In addition to this, a cron job or a simple get_missing_quartly_keystats.py script that can be invoked on demand to fill in new stats would keep the project's data current over time. That would make the modeling more accurate (more data to train on) and bring the project closer to being a practical live-use tool.
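As a rough illustration of what such an on-demand script would need to decide first, here is a minimal sketch that works out which calendar quarters have no KeyStats snapshot for a ticker. The helper names `quarter_of` and `missing_quarters` and the sample dates are hypothetical and not part of the project; the sketch also assumes calendar quarters, whereas a company's fiscal quarters may differ.

```python
from datetime import date


def quarter_of(d: date) -> tuple:
    """Map a date to its (year, calendar quarter) pair."""
    return (d.year, (d.month - 1) // 3 + 1)


def missing_quarters(snapshot_dates, start: date, end: date) -> list:
    """Return the (year, quarter) pairs between start and end with no snapshot."""
    have = {quarter_of(d) for d in snapshot_dates}
    wanted = []
    year, q = quarter_of(start)
    last = quarter_of(end)
    while (year, q) <= last:
        wanted.append((year, q))
        # Step to the next quarter, rolling over into the next year after Q4.
        year, q = (year, q + 1) if q < 4 else (year + 1, 1)
    return [yq for yq in wanted if yq not in have]


# Hypothetical, erratic snapshot dates for one ticker.
snapshots = [date(2012, 1, 15), date(2012, 2, 3), date(2012, 11, 20)]
gaps = missing_quarters(snapshots, date(2012, 1, 1), date(2013, 3, 31))
print(gaps)
```

A cron job could run the same check per ticker and only fetch the quarters that come back in `gaps`.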
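Most of the historical quarterly feature data points can be found directly or through calculations on https://www.macrotrends.net/. Example: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/financial-statements
There are many categories with subcategories that can most likely be scraped and parsed. For example, the full historical market cap chart served here: https://www.macrotrends.net/stocks/charts/GNW/genworth-financial/market-cap
can be parsed out, because the HTML contains a <script> tag that defines var chartData with all the values by date.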
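To make the chartData idea concrete, here is a minimal sketch of pulling that variable out of a page's HTML. It assumes the page assigns a plain JSON array literal to `var chartData` and ends the statement with a semicolon; the field names `date` and `v1` in the sample are made up for illustration, and the real page may use a different layout. It runs against an inline sample string rather than fetching the site.

```python
import json
import re

# Assumption: the series is embedded as a JSON array literal, e.g.
#   var chartData = [{"date": "2020-03-31", "v1": 1.5}, ...];
CHART_DATA_RE = re.compile(r"var\s+chartData\s*=\s*(\[.*?\])\s*;", re.DOTALL)


def extract_chart_data(html: str) -> list:
    """Return the list assigned to `var chartData` in the given HTML."""
    match = CHART_DATA_RE.search(html)
    if match is None:
        raise ValueError("no `var chartData = [...]` assignment found")
    return json.loads(match.group(1))


# Hypothetical sample page; field names are placeholders.
sample_html = """
<html><body>
<script>
var chartData = [{"date": "2019-12-31", "v1": 2.1}, {"date": "2020-03-31", "v1": 1.5}];
</script>
</body></html>
"""

rows = extract_chart_data(sample_html)
by_date = {row["date"]: row["v1"] for row in rows}
print(by_date)
```

If the embedded value turns out not to be strict JSON (for example, unquoted keys), `json.loads` will raise and a more tolerant parser would be needed.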
Between the balance sheets and financial records they provide, you may even find other influential data points to add to the ML portion of this script.
Let me know what you think, or if my logic is simply way off. If you think it is a good idea, I can help out with refactoring!
You have struck upon the core issue when it comes to financial data science – data availability. I fully agree that this current collection of keystats data is not great. This project is meant to be a starting point for people to see a complete machine learning pipeline applied to investing.
Good find regarding macrotrends – the data looks pretty good! If you submit a PR with a scraper I'd be more than happy to merge it and credit you in the readme.