Urban Region Function Classification Top18 Solution
- 浑南摸鱼队
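- 王德君: second-year undergraduate in Computer Science, Northeastern University (东北大学) (Team Leader)
- 姚来刚: master's in Computer Science, Northeastern University (东北大学); research focus on machine learning and data mining (Key Teammate)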
- Preliminary: Rank 17
- Semi-Final: Rank 18
Build models that classify the functions of urban areas from satellite images ({Area_ID}.jpg) and user-behavior records ({Area_ID}.txt) for the given geographical areas.
- Tables of the functions of urban areas:
CategoryID | Functions of Areas |
---|---|
001 | Residential area |
002 | School |
003 | Industrial park |
004 | Railway station |
005 | Airport |
006 | Park |
007 | Shopping area |
008 | Administrative district |
009 | Hospital |
For a more detailed task description, see the competition details page (赛题详情).
- Ubuntu 18.04.1 LTS
- GTX 1080Ti x 1 + GTX 1060M x 1
- Baidu AI Studio (Tesla V100 x 36, used to train the 36 networks of Model I)
- Anaconda 4.7.10
- python 3.6
- pytorch 1.1.0
- keras 2.2.4
- opencv3
- sklearn
- numpy
- matplotlib
Model | Baseline Acc | Top Result |
---|---|---|
Model I | None | 77.08% |
Model II | 77.08% | 81.6200% |
Model III | 81.6200%, 81.2440% | 82.1800% |
Model I: Stacking of Several Neural Networks (Deep Learning)
# Model Descriptions:
8 networks
5-fold stacking
Trained networks = 7*5+1 = 36 (Net6 was trained on fold 1 only because of its low local accuracy).
# Result:
After stacking the 36 networks, we reached a top online accuracy of 77.08%.
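The write-up does not say how the 36 networks' outputs were combined. As a minimal sketch, the snippet below assumes the simplest option, averaging the class-probability vectors of all networks per area and taking the arg-max. The function names and the toy area ID are hypothetical, and a learned second-stage model could be swapped in for the plain average.

```python
from typing import Dict, List

CATEGORIES = ["001", "002", "003", "004", "005", "006", "007", "008", "009"]


def average_probabilities(per_net_probs: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Average the class-probability vectors of all networks for every area.

    per_net_probs holds one dict per trained network (or per fold), each mapping
    an area ID to a probability vector ordered like CATEGORIES.
    """
    merged = {}
    for area_id in per_net_probs[0]:
        vectors = [probs[area_id] for probs in per_net_probs]
        # zip(*vectors) iterates column-wise, i.e. per category.
        merged[area_id] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return merged


def to_labels(merged: Dict[str, List[float]]) -> Dict[str, str]:
    """Pick the category with the highest averaged probability."""
    return {
        area_id: CATEGORIES[max(range(len(p)), key=p.__getitem__)]
        for area_id, p in merged.items()
    }


# Toy example with two networks and one area (values are made up).
net_a = {"000001": [0.6, 0.4, 0, 0, 0, 0, 0, 0, 0]}
net_b = {"000001": [0.1, 0.9, 0, 0, 0, 0, 0, 0, 0]}
merged = average_probabilities([net_a, net_b])
labels = to_labels(merged)
```

Here network A alone would predict 001, but the average favours 002 because network B is more confident.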
Network Name | Baseline | Description | Online Top-1 Acc on Test (5 Folds Merged) |
---|---|---|---|
Net1_raw | DPN26+Resnext50 | Trained on raw NPY arrays (400k arrays of shape 182x24) | 76.21% |
Net2_1 | Net1 | Introduced resampled fold split for training | About 76% |
Net3_w | Net1 | Class ratios taken into account in the loss calculation | About 76% |
Net4_TTA | Net1 | Introduced TTA (Test-Time Augmentation) | About 76% |
Net5_HR | Net1 | Introduced HighResample (linear) | About 76% |
Net6_Features | DenseNet | Introduced feature engineering (175 features) | 61.19% |
Net7_MS | Net1 | Introduced MultiScale | About 76% |
Net8_MS_cat | Net1 | Introduced MultiScale & Concatenate | About 76% |
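The table only says that Net3_w takes class ratios into account in the loss. One common way to do that, assumed here purely for illustration, is inverse-frequency class weights inside a weighted cross-entropy. The sketch is pure Python so it stays self-contained; the function names are hypothetical, and the normalisation by the sum of weights mirrors what PyTorch's `nn.CrossEntropyLoss(weight=...)` does with mean reduction.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {category: n / (k * count) for category, count in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class).

    probs  : list of dicts mapping category -> predicted probability
    labels : list of true categories, aligned with probs
    weights: dict mapping category -> class weight
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum


# Toy labels: 001 is three times as frequent as 002.
toy_labels = ["001", "001", "001", "002"]
toy_weights = inverse_frequency_weights(toy_labels)
perfect = [{"001": 1.0, "002": 0.0}] * 3 + [{"001": 0.0, "002": 1.0}]
loss_perfect = weighted_cross_entropy(perfect, toy_labels, toy_weights)
```

With these toy labels the minority class 002 gets weight 2.0 and the majority class 001 gets about 0.67, so a mistake on 002 costs three times as much.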
Model II: Txt Processing (Feature Engineering)
# Steps:
1) Identical-txt check (completed)
2) Multi-voter prediction based on the total number of times a user appeared in the same category (completed)
3) Multi-voter prediction based on the total number of hours a user appeared in the same category (only 3 of 2000 JSON files were processed)
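Step 1 can be illustrated with a short sketch: hash the content of every labelled txt file, then copy the label to any test file with the same hash. This is an assumed reading of the step, not the repository's SelfDuplicateCheck code; the function names, the toy record strings, and the choice of MD5 are all hypothetical, and the sketch works on in-memory strings rather than files.

```python
import hashlib


def content_key(text: str) -> str:
    """Fingerprint of a txt file's content."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def labels_from_duplicates(train_records, test_records):
    """Copy labels from training files to test files with identical content.

    train_records: iterable of (text, category) pairs
    test_records : dict mapping area ID -> text
    Returns a dict with an entry only for test areas that have an exact duplicate.
    """
    seen = {}
    for text, category in train_records:
        # If the same content appears with several labels, the first one wins.
        seen.setdefault(content_key(text), category)

    answers = {}
    for area_id, text in test_records.items():
        key = content_key(text)
        if key in seen:
            answers[area_id] = seen[key]
    return answers


# Toy data; the record strings are placeholders, not the real file format.
train = [("user_a visits ...", "002"), ("user_b visits ...", "006")]
test = {"100000": "user_a visits ...", "100001": "something else"}
answers = labels_from_duplicates(train, test)
```

Test areas without an exact duplicate are simply absent from the result, so their existing predictions can be kept.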
# Result:
After the three steps above, we obtained a submission scoring 81.62% (81.6200%.txt).
# Notes:
[1] Step 3 was not completed because of limited time and computational resources; only 3 of the 2000 JSON files were processed.
[2] In this project we abbreviate {Preliminary,Semi-Final}-{Train,Test}-Datasets as {P,S}{Tr,Te} ==> {PTr,PTe,STr,STe}.
Step | Description | Original Score | Improved Score | Source Code |
---|---|---|---|---|
(1) | Use the categories of identical txt files in PTr & STr to provide answers for STe | 77.04% | 78.74% | SelfDuplicateCheck |
(2) | Multi-voter prediction based on the total number of times a user appeared in the same category | 78.74% | 81.62% | MergeVotes |
(3) | Multi-voter prediction based on the total number of hours a user appeared in the same category | 81.62% | - | AdvM2_train |
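The voting in step 2 is described only briefly, so the following is a sketch of one plausible reading rather than the MergeVotes code itself: count how often each user appeared in training areas of each category, then let every user seen in a test area cast those counts as votes. The function names and the fallback to an existing prediction are assumptions. Step 3 would use the same structure with hours in place of visit counts.

```python
from collections import Counter, defaultdict


def build_user_profiles(train_visits):
    """Accumulate, per user, how many times they appeared in each category.

    train_visits: iterable of (user_id, category, times) tuples
    """
    profiles = defaultdict(Counter)
    for user_id, category, times in train_visits:
        profiles[user_id][category] += times
    return profiles


def vote(area_users, profiles, fallback):
    """Sum the category counts of all known users in a test area.

    Returns the category with the most votes, or `fallback` (for example the
    existing network prediction) when none of the users is known.
    """
    tally = Counter()
    for user_id in area_users:
        tally.update(profiles.get(user_id, {}))
    if not tally:
        return fallback
    return tally.most_common(1)[0][0]


# Toy data (made up): u1 is mostly seen in schools, u2 in parks.
visits = [("u1", "002", 5), ("u1", "001", 1), ("u2", "006", 3)]
profiles = build_user_profiles(visits)
known = vote(["u1", "u2"], profiles, fallback="001")
unknown = vote(["u9"], profiles, fallback="001")
```

In the toy data the area visited by u1 and u2 is voted 002 (5 votes against 3 for 006 and 1 for 001), while an area with only unseen users keeps its fallback label.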
Model III: Merge & Rebalance the Predicts in Submissions (Post-processing)
Directly Modify Submission.txt:
We compared the predictions in 81.2440%.txt and 81.6200%.txt and found that 001 was heavily over-predicted (about 4k more than the estimated true value), 003 and 005 were slightly over-predicted, and all other categories were under-predicted.
- Category distribution in our submissions (taking 81.6200%.txt as an example)
Category | Total Predictions in 81.6200%.txt | Estimated True Value | Difference (Predicted - Estimated) | Description |
---|---|---|---|---|
001 | 34542 | 30092 | +4450 | Heavily over-predicted |
002 | 22026 | 22763 | -737 | Heavily under-predicted |
003 | 13247 | 12753 | +494 | Slightly over-predicted |
004 | 1510 | 1647 | -137 | Slightly under-predicted |
005 | 4314 | 4123 | +191 | Slightly over-predicted |
006 | 12978 | 15671 | -2693 | Heavily under-predicted |
007 | 4986 | 5283 | -297 | Slightly under-predicted |
008 | 2247 | 3295 | -1048 | Heavily under-predicted |
009 | 4150 | 4370 | -220 | Slightly under-predicted |
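The difference column can be reproduced with a few lines of Python. The sketch below uses the counts from the table as given and derives which categories are over-predicted; the variable and function names are hypothetical, and `count_predictions` assumes a submission has already been loaded into a dict of area ID to category.

```python
from collections import Counter

# Counts copied from the table above (81.6200%.txt vs. estimated true values).
PREDICTED = {"001": 34542, "002": 22026, "003": 13247, "004": 1510, "005": 4314,
             "006": 12978, "007": 4986, "008": 2247, "009": 4150}
ESTIMATED = {"001": 30092, "002": 22763, "003": 12753, "004": 1647, "005": 4123,
             "006": 15671, "007": 5283, "008": 3295, "009": 4370}


def count_predictions(submission):
    """Count how many areas a submission assigns to each category."""
    return Counter(submission.values())


# Positive gap: predicted more often than estimated; negative gap: less often.
gaps = {category: PREDICTED[category] - ESTIMATED[category] for category in PREDICTED}
over_predicted = sorted(category for category, gap in gaps.items() if gap > 0)
under_predicted = sorted(category for category, gap in gaps.items() if gap < 0)

toy_counts = count_predictions({"a": "001", "b": "001", "c": "006"})
```

With these numbers `over_predicted` contains 001, 003 and 005, and every other category is under-predicted.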
We therefore merged the predictions of our two best previous submissions (81.6200%.txt and 81.2440%.txt).
Strategy & Rules:
1) Compare the predictions in the two txt files and, where they disagree, replace a '001' prediction with the other file's under-predicted category.
2) When the two files agree, or when both predictions fall in the over-predicted categories ['001','003','005'], keep the answer from 81.6200%.txt because of its higher accuracy.
This produced our best final submission, 82.1800%.txt, which scored 82.18%.
- Related Source Code (Submission_Check)
def MergeDict(Dict1,Dict2,ModCates):
    # Dict1 and Dict2 map each key to a predicted category; Dict1 is the higher-scoring submission.
    Merge_Dict = {}
    _identical,_new_choice,_prior = 0,0,0  # counters, not used in this excerpt
    for key,val1 in Dict1.items():
        val2 = Dict2[key]
        if val1==val2:
            # Both submissions agree.
            Merge_Dict[key] = val1
        elif '001' in [val1,val2] and '003' not in [val1,val2] and '005' not in [val1,val2]:
            # One side says '001' and the other is an under-predicted category: drop the '001'.
            Merge_Dict[key] = val2 if val1=='001' else val1
        elif val2 in ModCates:
            # Optionally prefer Dict2 for selected categories.
            Merge_Dict[key] = val2
        else:
            # Otherwise trust the higher-scoring submission.
            Merge_Dict[key] = val1
    return Merge_Dict

txt1 = '../81.6200%.txt'
txt2 = '../81.2440%.txt'
Dict1 = LoadDictFromTxt(txt1)
Dict2 = LoadDictFromTxt(txt2)
priorlist = ['001',
#             '006',
#             '003',
#             '008'
             ]
submit_txt = MergeName(txt1,txt2,'{}_MOD'.format('_'.join(priorlist)))
Merge_Dict = MergeDict(Dict1,Dict2,[])
##Merge_Dict = MergeDict(Dict1,Dict2,['006'])
##Merge_Dict = MergeDict(Dict1,Dict2,['003','008'])
WriteDictToTxt(submit_txt,Merge_Dict)
Statistics(Merge_Dict)
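To make the merge rules concrete, here is a self-contained restatement of the same branching logic run on toy dictionaries. The helpers LoadDictFromTxt, MergeName, WriteDictToTxt and Statistics are not reproduced; the function name `merge_predictions` and the keys "a" to "f" are hypothetical and the data is made up for illustration.

```python
def merge_predictions(primary, secondary, mod_cates):
    """Merge two submissions, following the same branches as MergeDict above.

    primary  : predictions of the higher-scoring submission (key -> category)
    secondary: predictions of the other submission
    mod_cates: categories for which the secondary prediction is preferred
    """
    merged = {}
    for key, val1 in primary.items():
        val2 = secondary[key]
        pair = (val1, val2)
        if val1 == val2:
            merged[key] = val1
        elif "001" in pair and "003" not in pair and "005" not in pair:
            # Replace the over-predicted '001' with the other file's category.
            merged[key] = val2 if val1 == "001" else val1
        elif val2 in mod_cates:
            merged[key] = val2
        else:
            merged[key] = val1
    return merged


primary = {"a": "001", "b": "001", "c": "003", "d": "006", "e": "002", "f": "006"}
secondary = {"a": "006", "b": "003", "c": "005", "d": "001", "e": "002", "f": "008"}

merged_default = merge_predictions(primary, secondary, [])
merged_prefer_008 = merge_predictions(primary, secondary, ["008"])
```

Key "a" flips from 001 to 006, "d" keeps 006 instead of the other file's 001, and "b" and "c" keep the primary answer because both candidates are over-predicted categories. Key "f" changes to 008 only when 008 is passed in `mod_cates`.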