Bug: the original code may lose data #141

Open
PeikaiLi opened this issue May 27, 2022 · 4 comments

Comments

@PeikaiLi

PeikaiLi commented May 27, 2022

if self.db.find_one(collection='DXYArea', data=area):
    continue

This check can cause data to be lost.

For example:

area =
 {'provinceName': '西藏自治区',
  'provinceShortName': '西藏',
  'currentConfirmedCount': 0,
  'confirmedCount': 1,
  'suspectedCount': 0,
  'curedCount': 1,
  'deadCount': 0,
  'comment': '',
  'locationId': 540000,
  'statisticsData': 'https://file1.dxycdn.com/2020/0223/353/3398299755968039885-135.json',
  'highDangerCount': 0,
  'midDangerCount': 0,
  'detectOrgCount': 32,
  'vaccinationOrgCount': 16}
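  • Over a long enough period, the values of a new record can be identical to those of an older record already in the database (one that is not from today), so the new record is never inserted.
  • Proposed fix: add a date with day precision to the record, then call db.find_one().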
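The proposed fix can be sketched roughly as follows. This is only an illustration under stated assumptions: `InMemoryDB` is a hypothetical stand-in that mimics the `find_one(collection=..., data=...)` call from the snippet above using exact field matching, and `crawlDate` and `save_if_new` are made-up names, not part of the project.

```python
from datetime import date


class InMemoryDB:
    """Hypothetical stand-in for the crawler's DB wrapper (exact-match lookup)."""

    def __init__(self):
        self.collections = {}

    def find_one(self, collection, data):
        # Return the first stored document whose fields all equal those in `data`.
        for doc in self.collections.get(collection, []):
            if all(doc.get(k) == v for k, v in data.items()):
                return doc
        return None

    def insert(self, collection, data):
        self.collections.setdefault(collection, []).append(dict(data))


def save_if_new(db, collection, record, day):
    """Insert `record` unless an identical record already exists for the same day."""
    # `crawlDate` is an assumed field name holding the day-precision date.
    keyed = dict(record, crawlDate=day.isoformat())
    if db.find_one(collection=collection, data=keyed):
        return False  # same values already stored for this day: skip
    db.insert(collection=collection, data=keyed)
    return True
```

With the date folded into the lookup, an unchanged record is still skipped when it is crawled twice on the same day, but it is stored again on a later day instead of being silently dropped.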
@PeikaiLi
Author

Following your coding style, I made some changes that fix a few bugs and add some features, and submitted a pull request. I hope it helps.

@BlankerL
Owner

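Hi, sorry for only seeing this issue now.

curedCount and deadCount in the data are monotonically increasing series. If a city reports confirmed cases, those cases should move into curedCount or deadCount once they recover or die, so these two fields should be able to stand in for a date comparison.

If that logic does not hold, comparing dates can still miss data within the same day.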
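To make the trade-off concrete, here is a small sketch that replays a series of snapshots through two deduplication keys. The `(day, value)` snapshots are invented for illustration and stand for a table with no cumulative field; `replay` is a hypothetical helper, not code from the project.

```python
def replay(snapshots, key):
    """Return the snapshots that survive when duplicates (by `key`) are skipped."""
    seen = set()
    kept = []
    for snap in snapshots:
        k = key(snap)
        if k in seen:
            continue  # treated as already stored
        seen.add(k)
        kept.append(snap)
    return kept


# Hypothetical (day, value) snapshots; the value is not monotonic.
snapshots = [
    ("2022-07-18", 3),
    ("2022-07-18", 5),
    ("2022-07-18", 3),  # same value recurs later on the same day
    ("2022-07-19", 3),  # same value again on a new day
]

# Deduplicate on the value alone, as the original check effectively does.
by_value = replay(snapshots, key=lambda s: s[1])

# Deduplicate on day plus value, as the date-based proposal does.
by_day_and_value = replay(snapshots, key=lambda s: s)
```

Here `by_value` keeps only the first two snapshots, so the 2022-07-19 record is lost. `by_day_and_value` keeps the 2022-07-19 record but still drops the third snapshot, which is the same-day omission mentioned above.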

@PeikaiLi
Author

Thanks for your reply; your code is fine as it is.
The issue I mentioned only appears in tables that have no cumulative data columns.

@PeikaiLi
Author

PeikaiLi commented Jul 19, 2022

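It shows up when crawling the asymptomatic-case data (the asymptomatic-case table does not provide a cumulative, monotonically increasing series).
That is the case where my per-day deduplication filter actually makes sense.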
image
