{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Recommendations with IBM\n",
"\n",
"In this notebook, you will be putting your recommendation skills to use on real data from the IBM Watson Studio platform. \n",
"\n",
"\n",
"You may either submit your notebook through the workspace here, or work from your local machine and submit through the next page. Either way, ensure that your code passes the project [RUBRIC](https://review.udacity.com/#!/rubrics/2322/view). **Please save regularly.**\n",
"\n",
"By following the table of contents, you will build out a number of different methods for making recommendations that can be used for different situations. \n",
"\n",
"\n",
"## Table of Contents\n",
"\n",
"I. [Exploratory Data Analysis](#Exploratory-Data-Analysis)<br>\n",
"II. [Rank Based Recommendations](#Rank)<br>\n",
"III. [User-User Based Collaborative Filtering](#User-User)<br>\n",
"IV. [Content Based Recommendations (EXTRA - NOT REQUIRED)](#Content-Recs)<br>\n",
"V. [Matrix Factorization](#Matrix-Fact)<br>\n",
"VI. [Extras & Concluding](#conclusions)\n",
"\n",
"At the end of the notebook, you will find directions for how to submit your work. Let's get started by importing the necessary libraries and reading in the data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" article_id title \\\n",
"0 1430.0 using pixiedust for fast, flexible, and easier... \n",
"1 1314.0 healthcare python streaming application demo \n",
"2 1429.0 use deep learning for image classification \n",
"3 1338.0 ml optimization using cognitive assistant \n",
"4 1276.0 deploy your python model as a restful api \n",
"\n",
" email \n",
"0 ef5f11f77ba020cd36e1105a00ab868bbdbf7fe7 \n",
"1 083cbdfa93c8444beaa4c5f5e0f5f9198e4f9e0b \n",
"2 b96a4f2e92d8572034b1e9b28f9ac673765cd074 \n",
"3 06485706b34a5c9bf2a0ecdac41daf7e7654ceb7 \n",
"4 f01220c46fc92c6e6b161b1849de11faacd7ccb2 "
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>article_id</th>\n <th>title</th>\n <th>email</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1430.0</td>\n <td>using pixiedust for fast, flexible, and easier...</td>\n <td>ef5f11f77ba020cd36e1105a00ab868bbdbf7fe7</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1314.0</td>\n <td>healthcare python streaming application demo</td>\n <td>083cbdfa93c8444beaa4c5f5e0f5f9198e4f9e0b</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1429.0</td>\n <td>use deep learning for image classification</td>\n <td>b96a4f2e92d8572034b1e9b28f9ac673765cd074</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1338.0</td>\n <td>ml optimization using cognitive assistant</td>\n <td>06485706b34a5c9bf2a0ecdac41daf7e7654ceb7</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1276.0</td>\n <td>deploy your python model as a restful api</td>\n <td>f01220c46fc92c6e6b161b1849de11faacd7ccb2</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 1
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import project_tests as t\n",
"import pickle\n",
"\n",
"%matplotlib inline\n",
"\n",
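"# Load the user-article interactions and the article metadata\n",
"df = pd.read_csv('data/user-item-interactions.csv')\n",
"df_content = pd.read_csv('data/articles_community.csv')\n",
"\n",
"# Drop the unnamed index column read in from each CSV\n",
"del df['Unnamed: 0']\n",
"del df_content['Unnamed: 0']\n",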
"\n",
"# Show df to get an idea of the data\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" doc_body \\\n",
"0 Skip navigation Sign in SearchLoading...\\r\\n\\r... \n",
"1 No Free Hunch Navigation * kaggle.com\\r\\n\\r\\n ... \n",
"2 ☰ * Login\\r\\n * Sign Up\\r\\n\\r\\n * Learning Pat... \n",
"3 DATALAYER: HIGH THROUGHPUT, LOW LATENCY AT SCA... \n",
"4 Skip navigation Sign in SearchLoading...\\r\\n\\r... \n",
"\n",
" doc_description \\\n",
"0 Detect bad readings in real time using Python ... \n",
"1 See the forest, see the trees. Here lies the c... \n",
"2 Here’s this week’s news in Data Science and Bi... \n",
"3 Learn how distributed DBs solve the problem of... \n",
"4 This video demonstrates the power of IBM DataS... \n",
"\n",
" doc_full_name doc_status article_id \n",
"0 Detect Malfunctioning IoT Sensors with Streami... Live 0 \n",
"1 Communicating data science: A guide to present... Live 1 \n",
"2 This Week in Data Science (April 18, 2017) Live 2 \n",
"3 DataLayer Conference: Boost the performance of... Live 3 \n",
"4 Analyze NY Restaurant data using Spark in DSX Live 4 "
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>doc_body</th>\n <th>doc_description</th>\n <th>doc_full_name</th>\n <th>doc_status</th>\n <th>article_id</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>Skip navigation Sign in SearchLoading...\\r\\n\\r...</td>\n <td>Detect bad readings in real time using Python ...</td>\n <td>Detect Malfunctioning IoT Sensors with Streami...</td>\n <td>Live</td>\n <td>0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>No Free Hunch Navigation * kaggle.com\\r\\n\\r\\n ...</td>\n <td>See the forest, see the trees. Here lies the c...</td>\n <td>Communicating data science: A guide to present...</td>\n <td>Live</td>\n <td>1</td>\n </tr>\n <tr>\n <th>2</th>\n <td>☰ * Login\\r\\n * Sign Up\\r\\n\\r\\n * Learning Pat...</td>\n <td>Here’s this week’s news in Data Science and Bi...</td>\n <td>This Week in Data Science (April 18, 2017)</td>\n <td>Live</td>\n <td>2</td>\n </tr>\n <tr>\n <th>3</th>\n <td>DATALAYER: HIGH THROUGHPUT, LOW LATENCY AT SCA...</td>\n <td>Learn how distributed DBs solve the problem of...</td>\n <td>DataLayer Conference: Boost the performance of...</td>\n <td>Live</td>\n <td>3</td>\n </tr>\n <tr>\n <th>4</th>\n <td>Skip navigation Sign in SearchLoading...\\r\\n\\r...</td>\n <td>This video demonstrates the power of IBM DataS...</td>\n <td>Analyze NY Restaurant data using Spark in DSX</td>\n <td>Live</td>\n <td>4</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 2
}
],
"source": [
"# Show df_content to get an idea of the data\n",
"df_content.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### <a class=\"anchor\" id=\"Exploratory-Data-Analysis\">Part I : Exploratory Data Analysis</a>\n",
"\n",
"Use the dictionary and cells below to provide some insight into the descriptive statistics of the data.\n",
"\n",
"`1.` What is the distribution of how many articles a user interacts with in the dataset? Provide a visualization and descriptive statistics that show the number of times each user interacts with an article."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"email\n",
"0000b6387a0366322d7fbfc6434af145adf7fed1 13\n",
"001055fc0bb67f71e8fa17002342b256a30254cd 4\n",
"00148e4911c7e04eeff8def7bbbdaf1c59c2c621 3\n",
"001a852ecbd6cc12ab77a785efa137b2646505fe 6\n",
"001fc95b90da5c3cb12c501d201a915e4f093290 2\n",
" ..\n",
"ffc6cfa435937ca0df967b44e9178439d04e3537 2\n",
"ffc96f8fbb35aac4cb0029332b0fc78e7766bb5d 4\n",
"ffe3d0543c9046d35c2ee3724ea9d774dff98a32 32\n",
"fff9fc3ec67bd18ed57a34ed1e67410942c4cd81 10\n",
"fffb93a166547448a0ff0232558118d59395fecd 13\n",
"Name: article_id, Length: 5148, dtype: int64"
]
},
"metadata": {},
"execution_count": 3
}
],
"source": [
"# Number of article interactions per user\n",
"df.groupby('email')['article_id'].count()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 432x288 with 1 Axes>",
"image/svg+xml": "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>\r\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\r\n \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\r\n<!-- Created with matplotlib (https://matplotlib.org/) -->\r\n<svg height=\"277.314375pt\" version=\"1.1\" viewBox=\"0 0 395.328125 277.314375\" width=\"395.328125pt\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\r\n <metadata>\r\n <rdf:RDF xmlns:cc=\"http://creativecommons.org/ns#\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\r\n <cc:Work>\r\n <dc:type rdf:resource=\"http://purl.org/dc/dcmitype/StillImage\"/>\r\n <dc:date>2021-01-23T17:05:43.205620</dc:date>\r\n <dc:format>image/svg+xml</dc:format>\r\n <dc:creator>\r\n <cc:Agent>\r\n <dc:title>Matplotlib v3.3.3, https://matplotlib.org/</dc:title>\r\n </cc:Agent>\r\n </dc:creator>\r\n </cc:Work>\r\n </rdf:RDF>\r\n </metadata>\r\n <defs>\r\n <style type=\"text/css\">*{stroke-linecap:butt;stroke-linejoin:round;}</style>\r\n </defs>\r\n <g id=\"figure_1\">\r\n <g id=\"patch_1\">\r\n <path d=\"M 0 277.314375 \r\nL 395.328125 277.314375 \r\nL 395.328125 0 \r\nL 0 0 \r\nz\r\n\" style=\"fill:none;\"/>\r\n </g>\r\n <g id=\"axes_1\">\r\n <g id=\"patch_2\">\r\n <path d=\"M 53.328125 239.758125 \r\nL 388.128125 239.758125 \r\nL 388.128125 22.318125 \r\nL 53.328125 22.318125 \r\nz\r\n\" style=\"fill:#ffffff;\"/>\r\n </g>\r\n <g id=\"patch_3\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 68.546307 239.758125 \r\nL 88.837216 239.758125 \r\nL 88.837216 32.672411 \r\nL 68.546307 32.672411 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_4\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 88.837216 239.758125 \r\nL 109.128125 239.758125 \r\nL 109.128125 227.297883 \r\nL 88.837216 227.297883 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_5\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 109.128125 
239.758125 \r\nL 129.419034 239.758125 \r\nL 129.419034 236.335946 \r\nL 109.128125 236.335946 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_6\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 129.419034 239.758125 \r\nL 149.709943 239.758125 \r\nL 149.709943 238.047035 \r\nL 129.419034 238.047035 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_7\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 149.709943 239.758125 \r\nL 170.000852 239.758125 \r\nL 170.000852 239.363258 \r\nL 149.709943 239.363258 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_8\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 170.000852 239.758125 \r\nL 190.291761 239.758125 \r\nL 190.291761 239.363258 \r\nL 170.000852 239.363258 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_9\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 190.291761 239.758125 \r\nL 210.58267 239.758125 \r\nL 210.58267 239.451006 \r\nL 190.291761 239.451006 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_10\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 210.58267 239.758125 \r\nL 230.87358 239.758125 \r\nL 230.87358 239.758125 \r\nL 210.58267 239.758125 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_11\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 230.87358 239.758125 \r\nL 251.164489 239.758125 \r\nL 251.164489 239.758125 \r\nL 230.87358 239.758125 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_12\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 251.164489 239.758125 \r\nL 271.455398 239.758125 \r\nL 271.455398 239.758125 \r\nL 251.164489 239.758125 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_13\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 271.455398 239.758125 \r\nL 291.746307 239.758125 \r\nL 291.746307 239.758125 \r\nL 271.455398 239.758125 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_14\">\r\n <path clip-path=\"url(#p9319ac19d7)\" 
d=\"M 291.746307 239.758125 \r\nL 312.037216 239.758125 \r\nL 312.037216 239.758125 \r\nL 291.746307 239.758125 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_15\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 312.037216 239.758125 \r\nL 332.328125 239.758125 \r\nL 332.328125 239.758125 \r\nL 312.037216 239.758125 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_16\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 332.328125 239.758125 \r\nL 352.619034 239.758125 \r\nL 352.619034 239.758125 \r\nL 332.328125 239.758125 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"patch_17\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 352.619034 239.758125 \r\nL 372.909943 239.758125 \r\nL 372.909943 239.670377 \r\nL 352.619034 239.670377 \r\nz\r\n\" style=\"fill:#1f77b4;\"/>\r\n </g>\r\n <g id=\"matplotlib.axis_1\">\r\n <g id=\"xtick_1\">\r\n <g id=\"line2d_1\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 67.70784 239.758125 \r\nL 67.70784 22.318125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_2\">\r\n <defs>\r\n <path d=\"M 0 0 \r\nL 0 3.5 \r\n\" id=\"m7b9c543c6a\" style=\"stroke:#000000;stroke-width:0.8;\"/>\r\n </defs>\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"67.70784\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_1\">\r\n <!-- 0 -->\r\n <g transform=\"translate(64.52659 254.356562)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 31.78125 66.40625 \r\nQ 24.171875 66.40625 20.328125 58.90625 \r\nQ 16.5 51.421875 16.5 36.375 \r\nQ 16.5 21.390625 20.328125 13.890625 \r\nQ 24.171875 6.390625 31.78125 6.390625 \r\nQ 39.453125 6.390625 43.28125 13.890625 \r\nQ 47.125 21.390625 47.125 36.375 \r\nQ 47.125 51.421875 43.28125 58.90625 \r\nQ 39.453125 66.40625 31.78125 66.40625 \r\nz\r\nM 31.78125 74.21875 \r\nQ 44.046875 74.21875 50.515625 64.515625 \r\nQ 56.984375 54.828125 56.984375 36.375 \r\nQ 56.984375 
17.96875 50.515625 8.265625 \r\nQ 44.046875 -1.421875 31.78125 -1.421875 \r\nQ 19.53125 -1.421875 13.0625 8.265625 \r\nQ 6.59375 17.96875 6.59375 36.375 \r\nQ 6.59375 54.828125 13.0625 64.515625 \r\nQ 19.53125 74.21875 31.78125 74.21875 \r\nz\r\n\" id=\"DejaVuSans-48\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"xtick_2\">\r\n <g id=\"line2d_3\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 109.631205 239.758125 \r\nL 109.631205 22.318125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_4\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"109.631205\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_2\">\r\n <!-- 50 -->\r\n <g transform=\"translate(103.268705 254.356562)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 10.796875 72.90625 \r\nL 49.515625 72.90625 \r\nL 49.515625 64.59375 \r\nL 19.828125 64.59375 \r\nL 19.828125 46.734375 \r\nQ 21.96875 47.46875 24.109375 47.828125 \r\nQ 26.265625 48.1875 28.421875 48.1875 \r\nQ 40.625 48.1875 47.75 41.5 \r\nQ 54.890625 34.8125 54.890625 23.390625 \r\nQ 54.890625 11.625 47.5625 5.09375 \r\nQ 40.234375 -1.421875 26.90625 -1.421875 \r\nQ 22.3125 -1.421875 17.546875 -0.640625 \r\nQ 12.796875 0.140625 7.71875 1.703125 \r\nL 7.71875 11.625 \r\nQ 12.109375 9.234375 16.796875 8.0625 \r\nQ 21.484375 6.890625 26.703125 6.890625 \r\nQ 35.15625 6.890625 40.078125 11.328125 \r\nQ 45.015625 15.765625 45.015625 23.390625 \r\nQ 45.015625 31 40.078125 35.4375 \r\nQ 35.15625 39.890625 26.703125 39.890625 \r\nQ 22.75 39.890625 18.8125 39.015625 \r\nQ 14.890625 38.140625 10.796875 36.28125 \r\nz\r\n\" id=\"DejaVuSans-53\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-53\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"xtick_3\">\r\n <g id=\"line2d_5\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 151.554571 239.758125 
\r\nL 151.554571 22.318125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_6\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"151.554571\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_3\">\r\n <!-- 100 -->\r\n <g transform=\"translate(142.010821 254.356562)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 12.40625 8.296875 \r\nL 28.515625 8.296875 \r\nL 28.515625 63.921875 \r\nL 10.984375 60.40625 \r\nL 10.984375 69.390625 \r\nL 28.421875 72.90625 \r\nL 38.28125 72.90625 \r\nL 38.28125 8.296875 \r\nL 54.390625 8.296875 \r\nL 54.390625 0 \r\nL 12.40625 0 \r\nz\r\n\" id=\"DejaVuSans-49\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-49\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"xtick_4\">\r\n <g id=\"line2d_7\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 193.477937 239.758125 \r\nL 193.477937 22.318125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_8\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"193.477937\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_4\">\r\n <!-- 150 -->\r\n <g transform=\"translate(183.934187 254.356562)scale(0.1 -0.1)\">\r\n <use xlink:href=\"#DejaVuSans-49\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-53\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"xtick_5\">\r\n <g id=\"line2d_9\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 235.401303 239.758125 \r\nL 235.401303 22.318125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_10\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"235.401303\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n 
</g>\r\n <g id=\"text_5\">\r\n <!-- 200 -->\r\n <g transform=\"translate(225.857553 254.356562)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 19.1875 8.296875 \r\nL 53.609375 8.296875 \r\nL 53.609375 0 \r\nL 7.328125 0 \r\nL 7.328125 8.296875 \r\nQ 12.9375 14.109375 22.625 23.890625 \r\nQ 32.328125 33.6875 34.8125 36.53125 \r\nQ 39.546875 41.84375 41.421875 45.53125 \r\nQ 43.3125 49.21875 43.3125 52.78125 \r\nQ 43.3125 58.59375 39.234375 62.25 \r\nQ 35.15625 65.921875 28.609375 65.921875 \r\nQ 23.96875 65.921875 18.8125 64.3125 \r\nQ 13.671875 62.703125 7.8125 59.421875 \r\nL 7.8125 69.390625 \r\nQ 13.765625 71.78125 18.9375 73 \r\nQ 24.125 74.21875 28.421875 74.21875 \r\nQ 39.75 74.21875 46.484375 68.546875 \r\nQ 53.21875 62.890625 53.21875 53.421875 \r\nQ 53.21875 48.921875 51.53125 44.890625 \r\nQ 49.859375 40.875 45.40625 35.40625 \r\nQ 44.1875 33.984375 37.640625 27.21875 \r\nQ 31.109375 20.453125 19.1875 8.296875 \r\nz\r\n\" id=\"DejaVuSans-50\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-50\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"xtick_6\">\r\n <g id=\"line2d_11\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 277.324669 239.758125 \r\nL 277.324669 22.318125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_12\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"277.324669\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_6\">\r\n <!-- 250 -->\r\n <g transform=\"translate(267.780919 254.356562)scale(0.1 -0.1)\">\r\n <use xlink:href=\"#DejaVuSans-50\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-53\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"xtick_7\">\r\n <g id=\"line2d_13\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 319.248035 239.758125 \r\nL 319.248035 
22.318125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_14\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"319.248035\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_7\">\r\n <!-- 300 -->\r\n <g transform=\"translate(309.704285 254.356562)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 40.578125 39.3125 \r\nQ 47.65625 37.796875 51.625 33 \r\nQ 55.609375 28.21875 55.609375 21.1875 \r\nQ 55.609375 10.40625 48.1875 4.484375 \r\nQ 40.765625 -1.421875 27.09375 -1.421875 \r\nQ 22.515625 -1.421875 17.65625 -0.515625 \r\nQ 12.796875 0.390625 7.625 2.203125 \r\nL 7.625 11.71875 \r\nQ 11.71875 9.328125 16.59375 8.109375 \r\nQ 21.484375 6.890625 26.8125 6.890625 \r\nQ 36.078125 6.890625 40.9375 10.546875 \r\nQ 45.796875 14.203125 45.796875 21.1875 \r\nQ 45.796875 27.640625 41.28125 31.265625 \r\nQ 36.765625 34.90625 28.71875 34.90625 \r\nL 20.21875 34.90625 \r\nL 20.21875 43.015625 \r\nL 29.109375 43.015625 \r\nQ 36.375 43.015625 40.234375 45.921875 \r\nQ 44.09375 48.828125 44.09375 54.296875 \r\nQ 44.09375 59.90625 40.109375 62.90625 \r\nQ 36.140625 65.921875 28.71875 65.921875 \r\nQ 24.65625 65.921875 20.015625 65.03125 \r\nQ 15.375 64.15625 9.8125 62.3125 \r\nL 9.8125 71.09375 \r\nQ 15.4375 72.65625 20.34375 73.4375 \r\nQ 25.25 74.21875 29.59375 74.21875 \r\nQ 40.828125 74.21875 47.359375 69.109375 \r\nQ 53.90625 64.015625 53.90625 55.328125 \r\nQ 53.90625 49.265625 50.4375 45.09375 \r\nQ 46.96875 40.921875 40.578125 39.3125 \r\nz\r\n\" id=\"DejaVuSans-51\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-51\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"xtick_8\">\r\n <g id=\"line2d_15\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 361.171401 239.758125 \r\nL 361.171401 22.318125 \r\n\" 
style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_16\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"361.171401\" xlink:href=\"#m7b9c543c6a\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_8\">\r\n <!-- 350 -->\r\n <g transform=\"translate(351.627651 254.356562)scale(0.1 -0.1)\">\r\n <use xlink:href=\"#DejaVuSans-51\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-53\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"text_9\">\r\n <!-- Number of Interactions -->\r\n <g transform=\"translate(162.913281 268.034687)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 9.8125 72.90625 \r\nL 23.09375 72.90625 \r\nL 55.421875 11.921875 \r\nL 55.421875 72.90625 \r\nL 64.984375 72.90625 \r\nL 64.984375 0 \r\nL 51.703125 0 \r\nL 19.390625 60.984375 \r\nL 19.390625 0 \r\nL 9.8125 0 \r\nz\r\n\" id=\"DejaVuSans-78\"/>\r\n <path d=\"M 8.5 21.578125 \r\nL 8.5 54.6875 \r\nL 17.484375 54.6875 \r\nL 17.484375 21.921875 \r\nQ 17.484375 14.15625 20.5 10.265625 \r\nQ 23.53125 6.390625 29.59375 6.390625 \r\nQ 36.859375 6.390625 41.078125 11.03125 \r\nQ 45.3125 15.671875 45.3125 23.6875 \r\nL 45.3125 54.6875 \r\nL 54.296875 54.6875 \r\nL 54.296875 0 \r\nL 45.3125 0 \r\nL 45.3125 8.40625 \r\nQ 42.046875 3.421875 37.71875 1 \r\nQ 33.40625 -1.421875 27.6875 -1.421875 \r\nQ 18.265625 -1.421875 13.375 4.4375 \r\nQ 8.5 10.296875 8.5 21.578125 \r\nz\r\nM 31.109375 56 \r\nz\r\n\" id=\"DejaVuSans-117\"/>\r\n <path d=\"M 52 44.1875 \r\nQ 55.375 50.25 60.0625 53.125 \r\nQ 64.75 56 71.09375 56 \r\nQ 79.640625 56 84.28125 50.015625 \r\nQ 88.921875 44.046875 88.921875 33.015625 \r\nL 88.921875 0 \r\nL 79.890625 0 \r\nL 79.890625 32.71875 \r\nQ 79.890625 40.578125 77.09375 44.375 \r\nQ 74.3125 48.1875 68.609375 48.1875 \r\nQ 61.625 48.1875 57.5625 43.546875 \r\nQ 53.515625 38.921875 53.515625 30.90625 \r\nL 53.515625 0 \r\nL 44.484375 0 \r\nL 44.484375 32.71875 \r\nQ 
44.484375 40.625 41.703125 44.40625 \r\nQ 38.921875 48.1875 33.109375 48.1875 \r\nQ 26.21875 48.1875 22.15625 43.53125 \r\nQ 18.109375 38.875 18.109375 30.90625 \r\nL 18.109375 0 \r\nL 9.078125 0 \r\nL 9.078125 54.6875 \r\nL 18.109375 54.6875 \r\nL 18.109375 46.1875 \r\nQ 21.1875 51.21875 25.484375 53.609375 \r\nQ 29.78125 56 35.6875 56 \r\nQ 41.65625 56 45.828125 52.96875 \r\nQ 50 49.953125 52 44.1875 \r\nz\r\n\" id=\"DejaVuSans-109\"/>\r\n <path d=\"M 48.6875 27.296875 \r\nQ 48.6875 37.203125 44.609375 42.84375 \r\nQ 40.53125 48.484375 33.40625 48.484375 \r\nQ 26.265625 48.484375 22.1875 42.84375 \r\nQ 18.109375 37.203125 18.109375 27.296875 \r\nQ 18.109375 17.390625 22.1875 11.75 \r\nQ 26.265625 6.109375 33.40625 6.109375 \r\nQ 40.53125 6.109375 44.609375 11.75 \r\nQ 48.6875 17.390625 48.6875 27.296875 \r\nz\r\nM 18.109375 46.390625 \r\nQ 20.953125 51.265625 25.265625 53.625 \r\nQ 29.59375 56 35.59375 56 \r\nQ 45.5625 56 51.78125 48.09375 \r\nQ 58.015625 40.1875 58.015625 27.296875 \r\nQ 58.015625 14.40625 51.78125 6.484375 \r\nQ 45.5625 -1.421875 35.59375 -1.421875 \r\nQ 29.59375 -1.421875 25.265625 0.953125 \r\nQ 20.953125 3.328125 18.109375 8.203125 \r\nL 18.109375 0 \r\nL 9.078125 0 \r\nL 9.078125 75.984375 \r\nL 18.109375 75.984375 \r\nz\r\n\" id=\"DejaVuSans-98\"/>\r\n <path d=\"M 56.203125 29.59375 \r\nL 56.203125 25.203125 \r\nL 14.890625 25.203125 \r\nQ 15.484375 15.921875 20.484375 11.0625 \r\nQ 25.484375 6.203125 34.421875 6.203125 \r\nQ 39.59375 6.203125 44.453125 7.46875 \r\nQ 49.3125 8.734375 54.109375 11.28125 \r\nL 54.109375 2.78125 \r\nQ 49.265625 0.734375 44.1875 -0.34375 \r\nQ 39.109375 -1.421875 33.890625 -1.421875 \r\nQ 20.796875 -1.421875 13.15625 6.1875 \r\nQ 5.515625 13.8125 5.515625 26.8125 \r\nQ 5.515625 40.234375 12.765625 48.109375 \r\nQ 20.015625 56 32.328125 56 \r\nQ 43.359375 56 49.78125 48.890625 \r\nQ 56.203125 41.796875 56.203125 29.59375 \r\nz\r\nM 47.21875 32.234375 \r\nQ 47.125 39.59375 43.09375 43.984375 \r\nQ 39.0625 
48.390625 32.421875 48.390625 \r\nQ 24.90625 48.390625 20.390625 44.140625 \r\nQ 15.875 39.890625 15.1875 32.171875 \r\nz\r\n\" id=\"DejaVuSans-101\"/>\r\n <path d=\"M 41.109375 46.296875 \r\nQ 39.59375 47.171875 37.8125 47.578125 \r\nQ 36.03125 48 33.890625 48 \r\nQ 26.265625 48 22.1875 43.046875 \r\nQ 18.109375 38.09375 18.109375 28.8125 \r\nL 18.109375 0 \r\nL 9.078125 0 \r\nL 9.078125 54.6875 \r\nL 18.109375 54.6875 \r\nL 18.109375 46.1875 \r\nQ 20.953125 51.171875 25.484375 53.578125 \r\nQ 30.03125 56 36.53125 56 \r\nQ 37.453125 56 38.578125 55.875 \r\nQ 39.703125 55.765625 41.0625 55.515625 \r\nz\r\n\" id=\"DejaVuSans-114\"/>\r\n <path id=\"DejaVuSans-32\"/>\r\n <path d=\"M 30.609375 48.390625 \r\nQ 23.390625 48.390625 19.1875 42.75 \r\nQ 14.984375 37.109375 14.984375 27.296875 \r\nQ 14.984375 17.484375 19.15625 11.84375 \r\nQ 23.34375 6.203125 30.609375 6.203125 \r\nQ 37.796875 6.203125 41.984375 11.859375 \r\nQ 46.1875 17.53125 46.1875 27.296875 \r\nQ 46.1875 37.015625 41.984375 42.703125 \r\nQ 37.796875 48.390625 30.609375 48.390625 \r\nz\r\nM 30.609375 56 \r\nQ 42.328125 56 49.015625 48.375 \r\nQ 55.71875 40.765625 55.71875 27.296875 \r\nQ 55.71875 13.875 49.015625 6.21875 \r\nQ 42.328125 -1.421875 30.609375 -1.421875 \r\nQ 18.84375 -1.421875 12.171875 6.21875 \r\nQ 5.515625 13.875 5.515625 27.296875 \r\nQ 5.515625 40.765625 12.171875 48.375 \r\nQ 18.84375 56 30.609375 56 \r\nz\r\n\" id=\"DejaVuSans-111\"/>\r\n <path d=\"M 37.109375 75.984375 \r\nL 37.109375 68.5 \r\nL 28.515625 68.5 \r\nQ 23.6875 68.5 21.796875 66.546875 \r\nQ 19.921875 64.59375 19.921875 59.515625 \r\nL 19.921875 54.6875 \r\nL 34.71875 54.6875 \r\nL 34.71875 47.703125 \r\nL 19.921875 47.703125 \r\nL 19.921875 0 \r\nL 10.890625 0 \r\nL 10.890625 47.703125 \r\nL 2.296875 47.703125 \r\nL 2.296875 54.6875 \r\nL 10.890625 54.6875 \r\nL 10.890625 58.5 \r\nQ 10.890625 67.625 15.140625 71.796875 \r\nQ 19.390625 75.984375 28.609375 75.984375 \r\nz\r\n\" id=\"DejaVuSans-102\"/>\r\n <path d=\"M 
9.8125 72.90625 \r\nL 19.671875 72.90625 \r\nL 19.671875 0 \r\nL 9.8125 0 \r\nz\r\n\" id=\"DejaVuSans-73\"/>\r\n <path d=\"M 54.890625 33.015625 \r\nL 54.890625 0 \r\nL 45.90625 0 \r\nL 45.90625 32.71875 \r\nQ 45.90625 40.484375 42.875 44.328125 \r\nQ 39.84375 48.1875 33.796875 48.1875 \r\nQ 26.515625 48.1875 22.3125 43.546875 \r\nQ 18.109375 38.921875 18.109375 30.90625 \r\nL 18.109375 0 \r\nL 9.078125 0 \r\nL 9.078125 54.6875 \r\nL 18.109375 54.6875 \r\nL 18.109375 46.1875 \r\nQ 21.34375 51.125 25.703125 53.5625 \r\nQ 30.078125 56 35.796875 56 \r\nQ 45.21875 56 50.046875 50.171875 \r\nQ 54.890625 44.34375 54.890625 33.015625 \r\nz\r\n\" id=\"DejaVuSans-110\"/>\r\n <path d=\"M 18.3125 70.21875 \r\nL 18.3125 54.6875 \r\nL 36.8125 54.6875 \r\nL 36.8125 47.703125 \r\nL 18.3125 47.703125 \r\nL 18.3125 18.015625 \r\nQ 18.3125 11.328125 20.140625 9.421875 \r\nQ 21.96875 7.515625 27.59375 7.515625 \r\nL 36.8125 7.515625 \r\nL 36.8125 0 \r\nL 27.59375 0 \r\nQ 17.1875 0 13.234375 3.875 \r\nQ 9.28125 7.765625 9.28125 18.015625 \r\nL 9.28125 47.703125 \r\nL 2.6875 47.703125 \r\nL 2.6875 54.6875 \r\nL 9.28125 54.6875 \r\nL 9.28125 70.21875 \r\nz\r\n\" id=\"DejaVuSans-116\"/>\r\n <path d=\"M 34.28125 27.484375 \r\nQ 23.390625 27.484375 19.1875 25 \r\nQ 14.984375 22.515625 14.984375 16.5 \r\nQ 14.984375 11.71875 18.140625 8.90625 \r\nQ 21.296875 6.109375 26.703125 6.109375 \r\nQ 34.1875 6.109375 38.703125 11.40625 \r\nQ 43.21875 16.703125 43.21875 25.484375 \r\nL 43.21875 27.484375 \r\nz\r\nM 52.203125 31.203125 \r\nL 52.203125 0 \r\nL 43.21875 0 \r\nL 43.21875 8.296875 \r\nQ 40.140625 3.328125 35.546875 0.953125 \r\nQ 30.953125 -1.421875 24.3125 -1.421875 \r\nQ 15.921875 -1.421875 10.953125 3.296875 \r\nQ 6 8.015625 6 15.921875 \r\nQ 6 25.140625 12.171875 29.828125 \r\nQ 18.359375 34.515625 30.609375 34.515625 \r\nL 43.21875 34.515625 \r\nL 43.21875 35.40625 \r\nQ 43.21875 41.609375 39.140625 45 \r\nQ 35.0625 48.390625 27.6875 48.390625 \r\nQ 23 48.390625 18.546875 47.265625 
\r\nQ 14.109375 46.140625 10.015625 43.890625 \r\nL 10.015625 52.203125 \r\nQ 14.9375 54.109375 19.578125 55.046875 \r\nQ 24.21875 56 28.609375 56 \r\nQ 40.484375 56 46.34375 49.84375 \r\nQ 52.203125 43.703125 52.203125 31.203125 \r\nz\r\n\" id=\"DejaVuSans-97\"/>\r\n <path d=\"M 48.78125 52.59375 \r\nL 48.78125 44.1875 \r\nQ 44.96875 46.296875 41.140625 47.34375 \r\nQ 37.3125 48.390625 33.40625 48.390625 \r\nQ 24.65625 48.390625 19.8125 42.84375 \r\nQ 14.984375 37.3125 14.984375 27.296875 \r\nQ 14.984375 17.28125 19.8125 11.734375 \r\nQ 24.65625 6.203125 33.40625 6.203125 \r\nQ 37.3125 6.203125 41.140625 7.25 \r\nQ 44.96875 8.296875 48.78125 10.40625 \r\nL 48.78125 2.09375 \r\nQ 45.015625 0.34375 40.984375 -0.53125 \r\nQ 36.96875 -1.421875 32.421875 -1.421875 \r\nQ 20.0625 -1.421875 12.78125 6.34375 \r\nQ 5.515625 14.109375 5.515625 27.296875 \r\nQ 5.515625 40.671875 12.859375 48.328125 \r\nQ 20.21875 56 33.015625 56 \r\nQ 37.15625 56 41.109375 55.140625 \r\nQ 45.0625 54.296875 48.78125 52.59375 \r\nz\r\n\" id=\"DejaVuSans-99\"/>\r\n <path d=\"M 9.421875 54.6875 \r\nL 18.40625 54.6875 \r\nL 18.40625 0 \r\nL 9.421875 0 \r\nz\r\nM 9.421875 75.984375 \r\nL 18.40625 75.984375 \r\nL 18.40625 64.59375 \r\nL 9.421875 64.59375 \r\nz\r\n\" id=\"DejaVuSans-105\"/>\r\n <path d=\"M 44.28125 53.078125 \r\nL 44.28125 44.578125 \r\nQ 40.484375 46.53125 36.375 47.5 \r\nQ 32.28125 48.484375 27.875 48.484375 \r\nQ 21.1875 48.484375 17.84375 46.4375 \r\nQ 14.5 44.390625 14.5 40.28125 \r\nQ 14.5 37.15625 16.890625 35.375 \r\nQ 19.28125 33.59375 26.515625 31.984375 \r\nL 29.59375 31.296875 \r\nQ 39.15625 29.25 43.1875 25.515625 \r\nQ 47.21875 21.78125 47.21875 15.09375 \r\nQ 47.21875 7.46875 41.1875 3.015625 \r\nQ 35.15625 -1.421875 24.609375 -1.421875 \r\nQ 20.21875 -1.421875 15.453125 -0.5625 \r\nQ 10.6875 0.296875 5.421875 2 \r\nL 5.421875 11.28125 \r\nQ 10.40625 8.6875 15.234375 7.390625 \r\nQ 20.0625 6.109375 24.8125 6.109375 \r\nQ 31.15625 6.109375 34.5625 8.28125 \r\nQ 
37.984375 10.453125 37.984375 14.40625 \r\nQ 37.984375 18.0625 35.515625 20.015625 \r\nQ 33.0625 21.96875 24.703125 23.78125 \r\nL 21.578125 24.515625 \r\nQ 13.234375 26.265625 9.515625 29.90625 \r\nQ 5.8125 33.546875 5.8125 39.890625 \r\nQ 5.8125 47.609375 11.28125 51.796875 \r\nQ 16.75 56 26.8125 56 \r\nQ 31.78125 56 36.171875 55.265625 \r\nQ 40.578125 54.546875 44.28125 53.078125 \r\nz\r\n\" id=\"DejaVuSans-115\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-78\"/>\r\n <use x=\"74.804688\" xlink:href=\"#DejaVuSans-117\"/>\r\n <use x=\"138.183594\" xlink:href=\"#DejaVuSans-109\"/>\r\n <use x=\"235.595703\" xlink:href=\"#DejaVuSans-98\"/>\r\n <use x=\"299.072266\" xlink:href=\"#DejaVuSans-101\"/>\r\n <use x=\"360.595703\" xlink:href=\"#DejaVuSans-114\"/>\r\n <use x=\"401.708984\" xlink:href=\"#DejaVuSans-32\"/>\r\n <use x=\"433.496094\" xlink:href=\"#DejaVuSans-111\"/>\r\n <use x=\"494.677734\" xlink:href=\"#DejaVuSans-102\"/>\r\n <use x=\"529.882812\" xlink:href=\"#DejaVuSans-32\"/>\r\n <use x=\"561.669922\" xlink:href=\"#DejaVuSans-73\"/>\r\n <use x=\"591.162109\" xlink:href=\"#DejaVuSans-110\"/>\r\n <use x=\"654.541016\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"693.75\" xlink:href=\"#DejaVuSans-101\"/>\r\n <use x=\"755.273438\" xlink:href=\"#DejaVuSans-114\"/>\r\n <use x=\"796.386719\" xlink:href=\"#DejaVuSans-97\"/>\r\n <use x=\"857.666016\" xlink:href=\"#DejaVuSans-99\"/>\r\n <use x=\"912.646484\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"951.855469\" xlink:href=\"#DejaVuSans-105\"/>\r\n <use x=\"979.638672\" xlink:href=\"#DejaVuSans-111\"/>\r\n <use x=\"1040.820312\" xlink:href=\"#DejaVuSans-110\"/>\r\n <use x=\"1104.199219\" xlink:href=\"#DejaVuSans-115\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"matplotlib.axis_2\">\r\n <g id=\"ytick_1\">\r\n <g id=\"line2d_17\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 53.328125 239.758125 \r\nL 388.128125 239.758125 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n 
</g>\r\n <g id=\"line2d_18\">\r\n <defs>\r\n <path d=\"M 0 0 \r\nL -3.5 0 \r\n\" id=\"m7250963a62\" style=\"stroke:#000000;stroke-width:0.8;\"/>\r\n </defs>\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"53.328125\" xlink:href=\"#m7250963a62\" y=\"239.758125\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_10\">\r\n <!-- 0 -->\r\n <g transform=\"translate(39.965625 243.557344)scale(0.1 -0.1)\">\r\n <use xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"ytick_2\">\r\n <g id=\"line2d_19\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 53.328125 195.884033 \r\nL 388.128125 195.884033 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_20\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"53.328125\" xlink:href=\"#m7250963a62\" y=\"195.884033\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_11\">\r\n <!-- 1000 -->\r\n <g transform=\"translate(20.878125 199.683252)scale(0.1 -0.1)\">\r\n <use xlink:href=\"#DejaVuSans-49\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"190.869141\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"ytick_3\">\r\n <g id=\"line2d_21\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 53.328125 152.009941 \r\nL 388.128125 152.009941 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_22\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"53.328125\" xlink:href=\"#m7250963a62\" y=\"152.009941\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_12\">\r\n <!-- 2000 -->\r\n <g transform=\"translate(20.878125 155.80916)scale(0.1 -0.1)\">\r\n <use xlink:href=\"#DejaVuSans-50\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"190.869141\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g 
id=\"ytick_4\">\r\n <g id=\"line2d_23\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 53.328125 108.135849 \r\nL 388.128125 108.135849 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_24\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"53.328125\" xlink:href=\"#m7250963a62\" y=\"108.135849\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_13\">\r\n <!-- 3000 -->\r\n <g transform=\"translate(20.878125 111.935068)scale(0.1 -0.1)\">\r\n <use xlink:href=\"#DejaVuSans-51\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"190.869141\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"ytick_5\">\r\n <g id=\"line2d_25\">\r\n <path clip-path=\"url(#p9319ac19d7)\" d=\"M 53.328125 64.261757 \r\nL 388.128125 64.261757 \r\n\" style=\"fill:none;stroke:#b0b0b0;stroke-linecap:square;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"line2d_26\">\r\n <g>\r\n <use style=\"stroke:#000000;stroke-width:0.8;\" x=\"53.328125\" xlink:href=\"#m7250963a62\" y=\"64.261757\"/>\r\n </g>\r\n </g>\r\n <g id=\"text_14\">\r\n <!-- 4000 -->\r\n <g transform=\"translate(20.878125 68.060976)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 37.796875 64.3125 \r\nL 12.890625 25.390625 \r\nL 37.796875 25.390625 \r\nz\r\nM 35.203125 72.90625 \r\nL 47.609375 72.90625 \r\nL 47.609375 25.390625 \r\nL 58.015625 25.390625 \r\nL 58.015625 17.1875 \r\nL 47.609375 17.1875 \r\nL 47.609375 0 \r\nL 37.796875 0 \r\nL 37.796875 17.1875 \r\nL 4.890625 17.1875 \r\nL 4.890625 26.703125 \r\nz\r\n\" id=\"DejaVuSans-52\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-52\"/>\r\n <use x=\"63.623047\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"127.246094\" xlink:href=\"#DejaVuSans-48\"/>\r\n <use x=\"190.869141\" xlink:href=\"#DejaVuSans-48\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"text_15\">\r\n <!-- Users -->\r\n <g transform=\"translate(14.798438 
145.038906)rotate(-90)scale(0.1 -0.1)\">\r\n <defs>\r\n <path d=\"M 8.6875 72.90625 \r\nL 18.609375 72.90625 \r\nL 18.609375 28.609375 \r\nQ 18.609375 16.890625 22.84375 11.734375 \r\nQ 27.09375 6.59375 36.625 6.59375 \r\nQ 46.09375 6.59375 50.34375 11.734375 \r\nQ 54.59375 16.890625 54.59375 28.609375 \r\nL 54.59375 72.90625 \r\nL 64.5 72.90625 \r\nL 64.5 27.390625 \r\nQ 64.5 13.140625 57.4375 5.859375 \r\nQ 50.390625 -1.421875 36.625 -1.421875 \r\nQ 22.796875 -1.421875 15.734375 5.859375 \r\nQ 8.6875 13.140625 8.6875 27.390625 \r\nz\r\n\" id=\"DejaVuSans-85\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-85\"/>\r\n <use x=\"73.193359\" xlink:href=\"#DejaVuSans-115\"/>\r\n <use x=\"125.292969\" xlink:href=\"#DejaVuSans-101\"/>\r\n <use x=\"186.816406\" xlink:href=\"#DejaVuSans-114\"/>\r\n <use x=\"227.929688\" xlink:href=\"#DejaVuSans-115\"/>\r\n </g>\r\n </g>\r\n </g>\r\n <g id=\"patch_18\">\r\n <path d=\"M 53.328125 239.758125 \r\nL 53.328125 22.318125 \r\n\" style=\"fill:none;stroke:#000000;stroke-linecap:square;stroke-linejoin:miter;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"patch_19\">\r\n <path d=\"M 388.128125 239.758125 \r\nL 388.128125 22.318125 \r\n\" style=\"fill:none;stroke:#000000;stroke-linecap:square;stroke-linejoin:miter;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"patch_20\">\r\n <path d=\"M 53.328125 239.758125 \r\nL 388.128125 239.758125 \r\n\" style=\"fill:none;stroke:#000000;stroke-linecap:square;stroke-linejoin:miter;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"patch_21\">\r\n <path d=\"M 53.328125 22.318125 \r\nL 388.128125 22.318125 \r\n\" style=\"fill:none;stroke:#000000;stroke-linecap:square;stroke-linejoin:miter;stroke-width:0.8;\"/>\r\n </g>\r\n <g id=\"text_16\">\r\n <!-- User Interactions with Articles Distribution -->\r\n <g transform=\"translate(93.643438 16.318125)scale(0.12 -0.12)\">\r\n <defs>\r\n <path d=\"M 4.203125 54.6875 \r\nL 13.1875 54.6875 \r\nL 24.421875 12.015625 \r\nL 35.59375 54.6875 \r\nL 46.1875 54.6875 \r\nL 
57.421875 12.015625 \r\nL 68.609375 54.6875 \r\nL 77.59375 54.6875 \r\nL 63.28125 0 \r\nL 52.6875 0 \r\nL 40.921875 44.828125 \r\nL 29.109375 0 \r\nL 18.5 0 \r\nz\r\n\" id=\"DejaVuSans-119\"/>\r\n <path d=\"M 54.890625 33.015625 \r\nL 54.890625 0 \r\nL 45.90625 0 \r\nL 45.90625 32.71875 \r\nQ 45.90625 40.484375 42.875 44.328125 \r\nQ 39.84375 48.1875 33.796875 48.1875 \r\nQ 26.515625 48.1875 22.3125 43.546875 \r\nQ 18.109375 38.921875 18.109375 30.90625 \r\nL 18.109375 0 \r\nL 9.078125 0 \r\nL 9.078125 75.984375 \r\nL 18.109375 75.984375 \r\nL 18.109375 46.1875 \r\nQ 21.34375 51.125 25.703125 53.5625 \r\nQ 30.078125 56 35.796875 56 \r\nQ 45.21875 56 50.046875 50.171875 \r\nQ 54.890625 44.34375 54.890625 33.015625 \r\nz\r\n\" id=\"DejaVuSans-104\"/>\r\n <path d=\"M 34.1875 63.1875 \r\nL 20.796875 26.90625 \r\nL 47.609375 26.90625 \r\nz\r\nM 28.609375 72.90625 \r\nL 39.796875 72.90625 \r\nL 67.578125 0 \r\nL 57.328125 0 \r\nL 50.6875 18.703125 \r\nL 17.828125 18.703125 \r\nL 11.1875 0 \r\nL 0.78125 0 \r\nz\r\n\" id=\"DejaVuSans-65\"/>\r\n <path d=\"M 9.421875 75.984375 \r\nL 18.40625 75.984375 \r\nL 18.40625 0 \r\nL 9.421875 0 \r\nz\r\n\" id=\"DejaVuSans-108\"/>\r\n <path d=\"M 19.671875 64.796875 \r\nL 19.671875 8.109375 \r\nL 31.59375 8.109375 \r\nQ 46.6875 8.109375 53.6875 14.9375 \r\nQ 60.6875 21.78125 60.6875 36.53125 \r\nQ 60.6875 51.171875 53.6875 57.984375 \r\nQ 46.6875 64.796875 31.59375 64.796875 \r\nz\r\nM 9.8125 72.90625 \r\nL 30.078125 72.90625 \r\nQ 51.265625 72.90625 61.171875 64.09375 \r\nQ 71.09375 55.28125 71.09375 36.53125 \r\nQ 71.09375 17.671875 61.125 8.828125 \r\nQ 51.171875 0 30.078125 0 \r\nL 9.8125 0 \r\nz\r\n\" id=\"DejaVuSans-68\"/>\r\n </defs>\r\n <use xlink:href=\"#DejaVuSans-85\"/>\r\n <use x=\"73.193359\" xlink:href=\"#DejaVuSans-115\"/>\r\n <use x=\"125.292969\" xlink:href=\"#DejaVuSans-101\"/>\r\n <use x=\"186.816406\" xlink:href=\"#DejaVuSans-114\"/>\r\n <use x=\"227.929688\" xlink:href=\"#DejaVuSans-32\"/>\r\n <use x=\"259.716797\" 
xlink:href=\"#DejaVuSans-73\"/>\r\n <use x=\"289.208984\" xlink:href=\"#DejaVuSans-110\"/>\r\n <use x=\"352.587891\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"391.796875\" xlink:href=\"#DejaVuSans-101\"/>\r\n <use x=\"453.320312\" xlink:href=\"#DejaVuSans-114\"/>\r\n <use x=\"494.433594\" xlink:href=\"#DejaVuSans-97\"/>\r\n <use x=\"555.712891\" xlink:href=\"#DejaVuSans-99\"/>\r\n <use x=\"610.693359\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"649.902344\" xlink:href=\"#DejaVuSans-105\"/>\r\n <use x=\"677.685547\" xlink:href=\"#DejaVuSans-111\"/>\r\n <use x=\"738.867188\" xlink:href=\"#DejaVuSans-110\"/>\r\n <use x=\"802.246094\" xlink:href=\"#DejaVuSans-115\"/>\r\n <use x=\"854.345703\" xlink:href=\"#DejaVuSans-32\"/>\r\n <use x=\"886.132812\" xlink:href=\"#DejaVuSans-119\"/>\r\n <use x=\"967.919922\" xlink:href=\"#DejaVuSans-105\"/>\r\n <use x=\"995.703125\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"1034.912109\" xlink:href=\"#DejaVuSans-104\"/>\r\n <use x=\"1098.291016\" xlink:href=\"#DejaVuSans-32\"/>\r\n <use x=\"1130.078125\" xlink:href=\"#DejaVuSans-65\"/>\r\n <use x=\"1198.486328\" xlink:href=\"#DejaVuSans-114\"/>\r\n <use x=\"1239.599609\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"1278.808594\" xlink:href=\"#DejaVuSans-105\"/>\r\n <use x=\"1306.591797\" xlink:href=\"#DejaVuSans-99\"/>\r\n <use x=\"1361.572266\" xlink:href=\"#DejaVuSans-108\"/>\r\n <use x=\"1389.355469\" xlink:href=\"#DejaVuSans-101\"/>\r\n <use x=\"1450.878906\" xlink:href=\"#DejaVuSans-115\"/>\r\n <use x=\"1502.978516\" xlink:href=\"#DejaVuSans-32\"/>\r\n <use x=\"1534.765625\" xlink:href=\"#DejaVuSans-68\"/>\r\n <use x=\"1611.767578\" xlink:href=\"#DejaVuSans-105\"/>\r\n <use x=\"1639.550781\" xlink:href=\"#DejaVuSans-115\"/>\r\n <use x=\"1691.650391\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"1730.859375\" xlink:href=\"#DejaVuSans-114\"/>\r\n <use x=\"1771.972656\" xlink:href=\"#DejaVuSans-105\"/>\r\n <use x=\"1799.755859\" xlink:href=\"#DejaVuSans-98\"/>\r\n <use 
x=\"1863.232422\" xlink:href=\"#DejaVuSans-117\"/>\r\n <use x=\"1926.611328\" xlink:href=\"#DejaVuSans-116\"/>\r\n <use x=\"1965.820312\" xlink:href=\"#DejaVuSans-105\"/>\r\n <use x=\"1993.603516\" xlink:href=\"#DejaVuSans-111\"/>\r\n <use x=\"2054.785156\" xlink:href=\"#DejaVuSans-110\"/>\r\n </g>\r\n </g>\r\n </g>\r\n </g>\r\n <defs>\r\n <clipPath id=\"p9319ac19d7\">\r\n <rect height=\"217.44\" width=\"334.8\" x=\"53.328125\" y=\"22.318125\"/>\r\n </clipPath>\r\n </defs>\r\n</svg>\r\n",
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEWCAYAAACXGLsWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAgQUlEQVR4nO3deZhdVZnv8e+PMCkBAoZOQ4iEIQ4MihAGr6gJKJNosG9QNEhQWtQLjXbjAKISEJShEUVQjMIlDBIQsYmgQgSK4SqzAQJpJEAYwiSZoEBoEt77x1oVdyqnalUVZyr4fZ7nPLX32tO7V+1z3rPW3mdvRQRmZma9WaXVAZiZWftzsjAzsyInCzMzK3KyMDOzIicLMzMrcrIwM7MiJwsbVCSdJenbrY6jO0nvl3R/L9NHSwpJqzY5rj7Vl6QOSf/ajJi6bfebkn5Rx/V1StosD58r6fg6rrstj71mcbJokfzBsUW3simSLmjCtvv8JmpWTD1s+yBJN1XLIuKLEfHdVsTTm4i4MSLe3jUuaZ6kD73W9eb6D0k79WHetqqvnIBekvS8pOck3SHpSElrVOL7XkQUk1Rfk1lEDI2Ih+oQe1vVZTtwsnida/Y32Xbb/mAmScCBwML8t7d527WeD4uItYENgSOA/YHf5X2rmzbe/9cNJ4s2JWm4pCskLZa0UNKNklbJ0zaS9GtJf5P0sKTDK8tNkXSppAskPQccVNhOV/fIZEmPSnpW0tF52p7AN4FP5ub9Xbl8XUlnS3pS0nxJx0sakqcdJOn/STpN0gJgiqTNJV0raUFe/4WShlViGCXpsrw/CySdIemdwFnAe/O2F+d5V2gVSfq8pLm5jmZI2qgyLSR9UdIDuR7P7PqQkrSFpOslLckxXdxD/UyTdEQeHpnXeWge3zxvdxVJ4yQ9nsvPB94K/DbH/vXKKid1r+devJ/0IXs4sL+k1Stxda/ni/tYXxMkzcrf9B/M/+Na+/05SXMkLZJ0laRNcrnyNp/J67hH0taF/SAiXoiIDuBjwHuBj+T1LW+5SlozH7cL8v/rNkkjJJ2Q6+KMvG9n5PlD0qGSHgAeqJRVW+zDJc1Uat1cX9mPlboFlVsvjT72Bisni/Z1BPA4sAEwgvShHUoJ47fAXcBIYDfgK5L2qCw7AbgUGAZc2Mft7QK8Pa/vO5LeGRF/AL4HXJyb9+/O854LLAW2AN4D7A5Uuwh2Ah7KcZ8ACPg+sBHwTmAUMAVAKclcATwCjM77ND0i5gBfBP6ctz2se8CSds3r/QTpQ/URYHq32fYBdgDelefrqqfvAlcD6wEbAz/uoV6uB8bl4Q/m/fpAZfzGiHi1ukBEfAZ4FPhojv3kyuSV6rmH7QJMJv2vL8njH+02vVrPB1Curx2B84CvkY6NDwDzasw3gXS8/Qvp+LsRuChP3j0v9zZgXVKdLuhlH1YQEY8Ct5M+/LubnNc5CnhL3p+/R8TROYbD8r4dVllmX1I9bNnDJieR/tfDgVn04f3QhGNvUHKyaF+vkA7CTSLildwnHqSDb4OIOC4i/if3z/6c1Lzv8ueI+K+IeDUi/t7H7R0bEX+PiLtIiejdtWaSNALYG/hK/rb4DHBat+0/ERE/joileZ1zI2JmRLwcEX8DfkD6oAXYkZREvpbX91JErNBX3ItJwDkRcWdEvAwcRfo2OLoyz4kRsTh/SF0HbJvLXwE2ATYqbPN6YJecpD8AnAy8L0/7YJ7eH32t5zcD+wG/jIhXSMm/e1fUCvXch20fTKqvmfnYmB8R/11jvi8C34+IORGxlPSFYdv8rfwVYG3gHYDyPE/2YdsrxA2sX6P8FVKS2CIilkXEHRHxXGFd34+Ihb3s/5URcUM+Po4mHR+j+hlvLa/l2BuUnCxaZxmwWrey1UhvGIBTgLnA1ZIeknRkLt8E2Cg3bRfnJvI3Sd8uuzw2gHieqgy/CAztYb5NcpxPVrb/M+Cfetp+7kqYrtRl9RxwAe
mbHqRvkY/kD6X+2oj0jQ6AiOgkfcsdWZmnp/36OqnFc6ukeyV9rtYGIuJB4AXSG/39pFbQE5LezsCSRV/r+eOk1tvv8viFwF6SNqjM09//8yjgwT7Mtwnwo8r/dyGprkZGxLXAGcCZwDOSpkpap59xjMzr7O584CpguqQnJJ0sqft7pLtSHSyfno+PhaTj5rV6LcfeoORk0TqPkrpdqjYlH4AR8XxEHBERm5H6ef9D0m6kg//hiBhWea0dEXtX1lPPWwl3X9djwMvA8Mr214mIrXpZ5nu5bJuIWIfUZdLVf/sY8FbVPkFZ2o8nSB9sAEhai/TNdH5hOSLiqYj4fERsBHwB+Em3vu6q64GJwOoRMT+PTyZ1Yc3qaROlGAomkz5cHpX0FPArUpL+dC/bKG3zMWDzPmz7MeAL3Y6xN0XEnwAi4vSI2J7U9fM2UrdWn+Rv9duTupVWDD61oI+NiC2B/0XqxulqTfW0b6V9Xt6KkDSU1KJ5gvQFAODNlXn/uR/rHfCxN1g5WbTOxcC3JG2cT5B+iNQnfSmApH2UTsIKWEJqibwK3Ao8L+kbkt4kaYikrSXt0KA4nwZG524YcpfD1cCpktbJsW8u6YO9rGNtoBNYImkkK3643Ao8CZwoaa18krOrm+dpYGNVTux2cxHwWUnbKl2O+T3gloiYV9opSftJ2jiPLiJ9OLzaw+zXA4cBN+Txjjx+U0Qs62GZp4HNSnH0EFvXuah9SC2abUndVSfR+1VRpfo6m1Rfu+X/20hJ76gx31nAUZK2yvGsK2m/PLyDpJ3yN/4XgJfoud6q+/TmfIxcTvqf/67GPOMlbZPPYz1HamV3rXug9bm3pF1ynXwXuDkiHsvdofOBA/J76HOsmEgbduwNVk4WrXMc8CfgJtKH1cnApIiYnaePAf5I+pD9M/CTiLgufzh1fYg8DDwL/IJ0YrARfpX/LpB0Zx4+EFgduC/Hfinp/EpPjgW2IyW9K4HLuibk/fko6WT5o6ST+p/Mk68F7gWekvRs95VGxB+BbwO/JiWczVnx3ElvdgBukdQJzAC+3Mv1+deTEl5XsriJ9I30hh7mh3Ty81u5K+erfYypy2eAWRFxdW4BPRURTwGnA+9Sz1cflerrVuCzpHNMS/J+bVJjvt+QEtP03G04G9grT16HdI5sEakVvIDUZdqTMyQ9T/rw/SHpf7Vn94sCsn8mHUvPAXNyfOfnaT8CJipdnXV6L9vr7pfAMaTup+1Jrdounyd9cVkAbEV6P3Zp5LE3KCn88CMzMytwy8LMzIqcLMzMrMjJwszMipwszMys6HV5863hw4fH6NGjB7TsCy+8wFprrVXfgBrAcdbPYIgRHGe9Oc6V3XHHHc9GxAY1J0bE6+61/fbbx0Bdd911A162mRxn/QyGGCMcZ705zpUBt0cPn6vuhjIzsyInCzMzK3KyMDOzIicLMzMrcrIwM7MiJwszMytysjAzsyInCzMzK3KyMDOzotfl7T5eq9FHXlnX9c078SN1XZ+ZWbO5ZWFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRQ1PFpKGSPqLpCvy+KaSbpE0V9LFklbP5Wvk8bl5+ujKOo7K5fdL2qPRMZuZ2Yqa0bL4MjCnMn4ScFpEbAEsAg7O5QcDi3L5aXk+JG0J7A9sBewJ/ETSkCbEbWZmWUOThaSNgY8Av8jjAnYFLs2zTAP2zcMT8jh5+m55/gnA9Ih4OSIeBuYCOzYybjMzW9GqDV7/D4GvA2vn8bcAiyNiaR5/HBiZh0cCjwFExFJJS/L8I4GbK+usLrOcpEOAQwBGjBhBR0fHgALu7OzkiG2WDWjZng
w0lt50dnY2ZL31NhjiHAwxguOsN8fZPw1LFpL2AZ6JiDskjWvUdrpExFRgKsDYsWNj3LiBbbKjo4NTb3qhjpHBvEkDi6U3HR0dDHQfm2kwxDkYYgTHWW+Os38a2bJ4H/AxSXsDawLrAD8ChklaNbcuNgbm5/nnA6OAxyWtCqwLLKiUd6kuY2ZmTdCwcxYRcVREbBwRo0knqK+NiEnAdcDEPNtk4PI8PCOPk6dfGxGRy/fPV0ttCowBbm1U3GZmtrJGn7Oo5RvAdEnHA38Bzs7lZwPnS5oLLCQlGCLiXkmXAPcBS4FDI6K+JxXMzKxXTUkWEdEBdOThh6hxNVNEvATs18PyJwAnNC5CMzPrjX/BbWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRU4WZmZW5GRhZmZFThZmZlbkZGFmZkUNSxaS1pR0q6S7JN0r6dhcvqmkWyTNlXSxpNVz+Rp5fG6ePrqyrqNy+f2S9mhUzGZmVlsjWxYvA7tGxLuBbYE9Je0MnAScFhFbAIuAg/P8BwOLcvlpeT4kbQnsD2wF7An8RNKQBsZtZmbdNCxZRNKZR1fLrwB2BS7N5dOAffPwhDxOnr6bJOXy6RHxckQ8DMwFdmxU3GZmtjJFRONWnloAdwBbAGcCpwA359YDkkYBv4+IrSXNBvaMiMfztAeBnYApeZkLcvnZeZlLu23rEOAQgBEjRmw/ffr0AcXc2dnJw0uWDWjZnmwzct26rg9SnEOHDq37euttMMQ5GGIEx1lvjnNl48ePvyMixtaatmojNxwRy4BtJQ0DfgO8o4HbmgpMBRg7dmyMGzduQOvp6Ojg1JteqGNkMG/SwGLpTUdHBwPdx2YaDHEOhhjBcdab4+yfplwNFRGLgeuA9wLDJHUlqY2B+Xl4PjAKIE9fF1hQLa+xjJmZNUEjr4baILcokPQm4MPAHFLSmJhnmwxcnodn5HHy9Gsj9ZHNAPbPV0ttCowBbm1U3GZmtrJGdkNtCEzL5y1WAS6JiCsk3QdMl3Q88Bfg7Dz/2cD5kuYCC0lXQBER90q6BLgPWAocmru3zMysSRqWLCLibuA9NcofosbVTBHxErBfD+s6ATih3jGamVnf+BfcZmZW5GRhZmZFThZmZlbkZGFmZkVOFmZmVuRkYWZmRX1KFpLeJ2mtPHyApB9I2qSxoZmZWbvoa8vip8CLkt4NHAE8CJzXsKjMzKyt9DVZLM233pgAnBERZwJrNy4sMzNrJ339Bffzko4CDgA+IGkV0vMpzMzsDaCvLYtPkp58d3BEPEW68+spDYvKzMzaSrFlkW8EeFFEjO8qi4hH8TkLM7M3jGLLIt/h9VVJ9X/cm5mZDQp9PWfRCdwjaSaw/DFyEXF4Q6IyM7O20tdkcVl+mZnZG1CfkkVETMtPu3trRNzf4JjMzKzN9PUX3B8FZgF/yOPbSprRwLjMzKyN9PXS2Smkp9stBoiIWcBmDYnIzMzaTl+TxSsRsaRb2av1DsbMzNpTX09w3yvp08AQSWOAw4E/NS4sMzNrJ31tWfwbsBXpV9wXAc8BX2lQTGZm1mb6ejXUi8DRwNH5F91rRcRLDY3MzMzaRl+vhvqlpHXyMy3uAe6T9LXGhmZmZu2ir91QW0bEc8C+wO+BTYHPNCooMzNrL31NFqtJWo2ULGZExCtANCwqMzNrK31NFmcBDwNrATfkR6o+17CozMysrfR6glvSf1RGTyO1Jg4AbgLG11zIzMxed0oti7Urr6H571jSeYuJjQ3NzMzaRa8ti4g4tla5pPWBPwLTGxGUmZm1l76es1hBRCwEVO
dYzMysTQ0oWUgaDyyqcyxmZtamSie472HlS2TXB54ADmxUUGZm1l5Kt/vYp9t4AAsi4oVaM5uZ2etT6QT3I80KxMzM2teAzlmYmdkbS8OShaRRkq6TdJ+keyV9OZevL2mmpAfy3/VyuSSdLmmupLslbVdZ1+Q8/wOSJjcqZjMzq62RLYulwBERsSWwM3CopC2BI4FrImIMcE0eB9gLGJNfhwA/heW/6TgG2In0aNdjuhKMmZk1R8OSRUQ8GRF35uHngTnASGACMC3PNo10c0Jy+XmR3AwMk7QhsAcwMyIWRsQiYCawZ6PiNjOzlSmi8TePlTQauAHYGng0IoblcgGLImKYpCuAEyPipjztGuAbwDhgzYg4Ppd/G/h7RPxnt20cQmqRMGLEiO2nTx/Yj8s7Ozt5eMmyAS3bk21GrlvX9UGKc+jQoXVfb70NhjgHQ4zgOOvNca5s/Pjxd0TE2FrT+voM7gGTNBT4NfCViHgu5YckIkJSXbJVREwFpgKMHTs2xo0bN6D1dHR0cOpN9b0yeN6kgcXSm46ODga6j800GOIcDDGC46w3x9k/Db0aKj8D49fAhRFxWS5+Oncvkf8+k8vnA6Mqi2+cy3oqNzOzJmnk1VACzgbmRMQPKpNmAF1XNE0GLq+UH5ivitoZWBIRTwJXAbtLWi+f2N49l5mZWZM0shvqfaRHr94jaVYu+yZwInCJpIOBR4BP5Gm/A/YG5gIvAp+FdNNCSd8FbsvzHZdvZGhmZk3SsGSRT1T3dGfa3WrMH8ChPazrHOCc+kVnZmb94V9wm5lZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWVHDkoWkcyQ9I2l2pWx9STMlPZD/rpfLJel0SXMl3S1pu8oyk/P8D0ia3Kh4zcysZ41sWZwL7Nmt7EjgmogYA1yTxwH2Asbk1yHATyElF+AYYCdgR+CYrgRjZmbN07BkERE3AAu7FU8ApuXhacC+lfLzIrkZGCZpQ2APYGZELIyIRcBMVk5AZmbWYIqIxq1cGg1cERFb5/HFETEsDwtYFBHDJF0BnBgRN+Vp1wDfAMYBa0bE8bn828DfI+I/a2zrEFKrhBEjRmw/ffr0AcXc2dnJw0uWDWjZnmwzct26rg9SnEOHDq37euttMMQ5GGIEx1lvjnNl48ePvyMixtaatmpTIqghIkJS3TJVREwFpgKMHTs2xo0bN6D1dHR0cOpNL9QrLADmTRpYLL3p6OhgoPvYTIMhzsEQIzjOenOc/dPsq6Gezt1L5L/P5PL5wKjKfBvnsp7KzcysiZqdLGYAXVc0TQYur5QfmK+K2hlYEhFPAlcBu0taL5/Y3j2XmZlZEzWsG0rSRaRzDsMlPU66qulE4BJJBwOPAJ/Is/8O2BuYC7wIfBYgIhZK+i5wW57vuIjoftLczMwarGHJIiI+1cOk3WrMG8ChPaznHOCcOoZmZmb95F9wm5lZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZ0aqtDuCNYPSRV9Z1ffNO/Ehd12dmVuKWhZmZFTlZmJlZkZOFmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkX+UNwiNPvJKjthmKQfV6cd+/pGfmZUMmpaFpD0l3S9prqQjWx2Pmd
kbyaBoWUgaApwJfBh4HLhN0oyIuK+1kb0+1Pt2JODWitnrzaBIFsCOwNyIeAhA0nRgAuBk0aa6ElA9u8sapV4xOkHa65kiotUxFEmaCOwZEf+axz8D7BQRh1XmOQQ4JI++Hbh/gJsbDjz7GsJtFsdZP4MhRnCc9eY4V7ZJRGxQa8JgaVkURcRUYOprXY+k2yNibB1CaijHWT+DIUZwnPXmOPtnsJzgng+MqoxvnMvMzKwJBkuyuA0YI2lTSasD+wMzWhyTmdkbxqDohoqIpZIOA64ChgDnRMS9Ddrca+7KahLHWT+DIUZwnPXmOPthUJzgNjOz1hos3VBmZtZCThZmZlbkZJG18+1EJM2TdI+kWZJuz2XrS5op6YH8d70WxHWOpGckza6U1YxLyem5fu+WtF2L45wiaX6u01mS9q5MOyrHeb+kPZoY5yhJ10m6T9K9kr6cy9uqTnuJs63qVNKakm6VdFeO89hcvqmkW3I8F+eLZpC0Rh6fm6ePbmGM50p6uFKX2+bylr2PiIg3/It00vxBYDNgdeAuYMtWx1WJbx4wvFvZycCRefhI4KQWxPUBYDtgdikuYG/g94CAnYFbWhznFOCrNebdMv//1wA2zcfFkCbFuSGwXR5eG/hrjqet6rSXONuqTnO9DM3DqwG35Hq6BNg/l58FfCkP/x/grDy8P3BxC2M8F5hYY/6WvY/cskiW304kIv4H6LqdSDubAEzLw9OAfZsdQETcACzsVtxTXBOA8yK5GRgmacMWxtmTCcD0iHg5Ih4G5pKOj4aLiCcj4s48/DwwBxhJm9VpL3H2pCV1muulM4+ull8B7Apcmsu712dXPV8K7CZJLYqxJy17HzlZJCOBxyrjj9P7wd9sAVwt6Y58WxOAERHxZB5+ChjRmtBW0lNc7VjHh+Wm/DmVbry2iDN3gbyH9E2zbeu0W5zQZnUqaYikWcAzwExSq2ZxRCytEcvyOPP0JcBbmh1jRHTV5Qm5Lk+TtEb3GGvE31BOFoPDLhGxHbAXcKikD1QnRmqftt010O0aV/ZTYHNgW+BJ4NSWRlMhaSjwa+ArEfFcdVo71WmNONuuTiNiWURsS7rrw47AO1ob0cq6xyhpa+AoUqw7AOsD32hdhImTRdLWtxOJiPn57zPAb0gH/dNdzc/895nWRbiCnuJqqzqOiKfzm/RV4Of8o1ukpXFKWo30AXxhRFyWi9uuTmvF2a51mmNbDFwHvJfUddP1g+RqLMvjzNPXBRa0IMY9c1dfRMTLwP+lDerSySJp29uJSFpL0tpdw8DuwGxSfJPzbJOBy1sT4Up6imsGcGC+mmNnYEmla6XpuvXzfpxUp5Di3D9fGbMpMAa4tUkxCTgbmBMRP6hMaqs67SnOdqtTSRtIGpaH30R6Hs4c0gfyxDxb9/rsqueJwLW5JdfsGP+78uVApHMq1bpszfuoWWfS2/1Fusrgr6Q+zaNbHU8lrs1IV5LcBdzbFRupL/Ua4AHgj8D6LYjtIlJ3wyukvtODe4qLdPXGmbl+7wHGtjjO83Mcd5PegBtW5j86x3k/sFcT49yF1MV0NzArv/ZutzrtJc62qlPgXcBfcjyzge/k8s1IyWou8CtgjVy+Zh6fm6dv1sIYr811ORu4gH9cMdWy95Fv92FmZkXuhjIzsyInCzMzK3KyMDOzIicLMzMrcrIwM7MiJwtrK5JC0qmV8a9KmlKndZ8raWJ5zte8nf0kzZF0Xbfy0arc+baX5b/ZuOhWiOXTlfGxkk5v9HZt8HKysHbzMvAvkoa3OpCqyi9+++Jg4PMRMX6Am+t3spA0pJ+LjAaWJ4uIuD0iDu/vdu2Nw8nC2s1S0jOH/737hO4tA0md+e84SddLulzSQ5JOlDQpPyfgHkmbV1bzIUm3S/qrpH3y8kMknSLptnzjti9U1nujpBnAfTXi+VRe/2xJJ+Wy75B+tHa2pFN62klJB0m6TNIflJ5TcXIuPxF4k9IzDC7MZQfkfZkl6WddiU
FSp6RTJd0FvFfSd/I+zJY0Nf/6F0lbSPqj0jMT7sz1cSLw/rzOf8/7ekWef31J/5Xr4mZJ78rlU5RuENiR6/nwXL6WpCvz+mdL+mT532yDTrN+/eeXX315AZ3AOqRneKwLfBWYkqedS+Ue/0Bn/jsOWEx6zsIapHvlHJunfRn4YWX5P5C+JI0h/Zp7TeAQ4Ft5njWA20nPXRgHvABsWiPOjYBHgQ2AVUm/uN03T+ugxi9rSd/mZ+fhg4CH8j6uCTwCjKruVx5+J/BbYLU8/hPgwDwcwCcq865fGT4f+GgevgX4eB5eE3hz3rcrKvMvHwd+DByTh3cFZuXhKcCfch0NJ903aTXgfwM/r6xr3VYfR37V/+WWhbWdSHcwPQ/oT7fIbZFuvvYy6VYIV+fye0gf0l0uiYhXI+IB0of1O0j32zpQ6TbRt5BurzEmz39rpGcwdLcD0BERf4t0O+sLSQ9Z6o9rImJJRLxEarlsUmOe3YDtgdtyfLuRblcBsIx0M78u45We8HYP6UN+K6X7io2MiN8ARMRLEfFiIa5dSMmGiLgWeIukdfK0KyM9l+JZ0g0NR5Dq+MOSTpL0/ohY0p9KsMGhP/2wZs30Q+BO0h03uywld51KWoX0VMMuL1eGX62Mv8qKx3n3+9sE6X47/xYRV1UnSBpHalk0SjXmZdR+PwqYFhFH1Zj2UkQsg/R4TlKrY2xEPJYvClizzvFCjZgj4q9Kj/fcGzhe0jURcVwDtm0t5JaFtaWIWEh6/OXBleJ5pG/ZAB8jdYH0136SVsn99puRbmx3FfAlpdtuI+ltSnf47c2twAclDc/nED4FXD+AeGp5pSsW0g0EJ0r6pxzb+pJqtUC6EsOzSs+ZmAjLn2T3uKR98/JrSHoz8Dzpkai13AhMyvOPA56Nbs/VqJK0EfBiRFwAnEJ6hK29zrhlYe3sVOCwyvjPgcvzCd0/MLBv/Y+SPujXAb4YES9J+gWpq+rOfFL4bxQeUxsRT0o6knS7a5G6Z+p1m/ipwN2S7oyISZK+RXpS4iqkO+ceSjrHUY1nsaSfk+5S+hTptvtdPgP8TNJxefn9SHc5XZbr8lzSnU+7TAHOkXQ38CL/uG13T7YBTpH0al7/l/q/y9bufNdZMzMrcjeUmZkVOVmYmVmRk4WZmRU5WZiZWZGThZmZFTlZmJlZkZOFmZkV/X+97AgjZVTZGgAAAABJRU5ErkJggg==\n"
},
"metadata": {
"needs_background": "light"
}
}
],
"source": [
"# Histogram of user interactions with articles\n",
"df['email'].value_counts().hist(bins = 15)\n",
"plt.title('User Interactions with Articles Distribution')\n",
"plt.xlabel('Number of Interactions')\n",
"plt.ylabel('Users')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"count 5148.000000\n",
"mean 8.930847\n",
"std 16.802267\n",
"min 1.000000\n",
"25% 1.000000\n",
"50% 3.000000\n",
"75% 9.000000\n",
"max 364.000000\n",
"Name: article_id, dtype: float64"
]
},
"metadata": {},
"execution_count": 5
}
],
"source": [
"# Summary statistics for the number of interactions per user\n",
"user_inter = df.groupby('email')['article_id'].count()\n",
"user_inter.describe()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"3.0"
]
},
"metadata": {},
"execution_count": 6
}
],
"source": [
"# Median number of interactions per user (50% of users have this many or fewer)\n",
"df.groupby('email')['article_id'].count().median()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"364"
]
},
"metadata": {},
"execution_count": 7
}
],
"source": [
"# Maximum number of user-article interactions by any one user\n",
"df.groupby('email')['article_id'].count().max()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Median and maximum number of user-article interactions per user\n",
"\n",
"median_val = 3 # 50% of individuals interact with 3 or fewer articles.\n",
"max_views_by_user = 364 # The maximum number of user-article interactions by any one user is 364."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`2.` Explore and remove duplicate articles from the **df_content** dataframe. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0\n"
]
}
],
"source": [
"# Number of fully duplicated rows (every column identical) in df_content\n",
"print(df_content.duplicated().sum())"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"5"
]
},
"metadata": {},
"execution_count": 10
}
],
"source": [
"# Number of rows whose article_id repeats that of an earlier row\n",
"df_content.duplicated(['article_id']).sum()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Drop rows with a repeated article_id, keeping only the first occurrence\n",
"df_content.drop_duplicates(subset='article_id', keep='first', inplace=True)"
]
},
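{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two checks above give different counts because `duplicated()` with no arguments flags only rows that match an earlier row in every column, while `duplicated(['article_id'])` flags any row that reuses an earlier `article_id`. The cell below is a minimal sketch of that difference on a made-up frame. `toy_content` and its values are hypothetical and are not part of the IBM data. It also shows a non-mutating alternative to `inplace=True`, which may be easier to re-run safely."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for df_content: article_id 1 appears twice with different text\n",
"toy_content = pd.DataFrame({\n",
"    'article_id': [0, 1, 1, 2],\n",
"    'doc_full_name': ['a', 'b', 'b (revised)', 'c'],\n",
"})\n",
"\n",
"# Whole-row duplicates: a row counts only if every column matches an earlier row\n",
"full_row_dupes = toy_content.duplicated().sum()\n",
"\n",
"# Key duplicates: a row counts if its article_id was already seen\n",
"id_dupes = toy_content.duplicated(subset='article_id').sum()\n",
"\n",
"# keep=False marks every copy, which helps when inspecting the repeated rows\n",
"repeated_rows = toy_content[toy_content.duplicated(subset='article_id', keep=False)]\n",
"\n",
"# Non-mutating removal: returns a new frame and leaves toy_content unchanged\n",
"deduped = toy_content.drop_duplicates(subset='article_id', keep='first')\n",
"\n",
"print(full_row_dupes, id_dupes, len(deduped))\n",
"print(repeated_rows)"
]
},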
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`3.` Use the cells below to find:\n",
"\n",
"**a.** The number of unique articles that have an interaction with a user.<br>\n",
"**b.** The number of unique articles in the dataset (whether they have any interactions or not).<br>\n",
"**c.** The number of unique users in the dataset (excluding null values).<br>\n",
"**d.** The number of user-article interactions in the dataset."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"714"
]
},
"metadata": {},
"execution_count": 12
}
],
"source": [
"# The number of unique articles that have an interaction with a user\n",
"df.article_id.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"article_id\n",
"0.0 12\n",
"2.0 44\n",
"4.0 13\n",
"8.0 82\n",
"9.0 10\n",
" ..\n",
"1440.0 8\n",
"1441.0 6\n",
"1442.0 4\n",
"1443.0 12\n",
"1444.0 5\n",
"Name: email, Length: 714, dtype: int64"
]
},
"metadata": {},
"execution_count": 13
}
],
"source": [
"# Number of unique users (emails) that interacted with each article\n",
"df.groupby('article_id')['email'].nunique()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1051"
]
},
"metadata": {},
"execution_count": 14
}
],
"source": [
"# The number of unique articles in the dataset\n",
"df_content.article_id.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"5148"
]
},
"metadata": {},
"execution_count": 15
}
],
"source": [
"# The number of unique users(email) in the dataset\n",
"df.email.nunique()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"45993"
]
},
"metadata": {},
"execution_count": 16
}
],
"source": [
"# The number of user-article interactions in the dataset\n",
"df.shape[0]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"unique_articles = 714 # The number of unique articles that have at least one interaction\n",
"total_articles = 1051 # The number of unique articles on the IBM platform\n",
"unique_users = 5148 # The number of unique users\n",
"user_article_interactions = 45993 # The number of user-article interactions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`4.` Use the cells below to find the most viewed **article_id**, as well as how often it was viewed. After talking to the company leaders, the `email_mapper` function was deemed a reasonable way to map users to ids. There were a small number of null values, and it was found that all of these null values likely belonged to a single user (which is how they are stored using the function below)."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"article_id\n",
"1429.0 937\n",
"1330.0 927\n",
"1431.0 671\n",
"1427.0 643\n",
"1364.0 627\n",
"Name: email, dtype: int64"
]
},
"metadata": {},
"execution_count": 18
}
],
"source": [
"# The top article id with the most views in the dataset \n",
"df.groupby(['article_id'])['email'].count().sort_values(ascending = False).head()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"most_viewed_article_id = '1429.0' # The most viewed article in the dataset as a string with one value following the decimal \n",
"max_views = 937 # The most viewed article in the dataset was viewed how many times?"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" article_id title user_id\n",
"0 1430.0 using pixiedust for fast, flexible, and easier... 1\n",
"1 1314.0 healthcare python streaming application demo 2\n",
"2 1429.0 use deep learning for image classification 3\n",
"3 1338.0 ml optimization using cognitive assistant 4\n",
"4 1276.0 deploy your python model as a restful api 5"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>article_id</th>\n <th>title</th>\n <th>user_id</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>1430.0</td>\n <td>using pixiedust for fast, flexible, and easier...</td>\n <td>1</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1314.0</td>\n <td>healthcare python streaming application demo</td>\n <td>2</td>\n </tr>\n <tr>\n <th>2</th>\n <td>1429.0</td>\n <td>use deep learning for image classification</td>\n <td>3</td>\n </tr>\n <tr>\n <th>3</th>\n <td>1338.0</td>\n <td>ml optimization using cognitive assistant</td>\n <td>4</td>\n </tr>\n <tr>\n <th>4</th>\n <td>1276.0</td>\n <td>deploy your python model as a restful api</td>\n <td>5</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {},
"execution_count": 20
}
],
"source": [
"## No need to change the code here - this will be helpful for later parts of the notebook\n",
"# Run this cell to map the user email to a user_id column and remove the email column\n",
"\n",
"def email_mapper():\n",
" coded_dict = dict()\n",
" cter = 1\n",
" email_encoded = []\n",
" \n",
" for val in df['email']:\n",
" if val not in coded_dict:\n",
" coded_dict[val] = cter\n",
" cter+=1\n",
" \n",
" email_encoded.append(coded_dict[val])\n",
" return email_encoded\n",
"\n",
"email_encoded = email_mapper()\n",
"del df['email']\n",
"df['user_id'] = email_encoded\n",
"\n",
"# show header\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"It looks like you have everything right here! Nice job!\n"
]
}
],
"source": [
"## If you stored all your results in the variable names above, \n",
"## you shouldn't need to change anything in this cell\n",
"\n",
"sol_1_dict = {\n",
" '`50% of individuals have _____ or fewer interactions.`': median_val,\n",
" '`The total number of user-article interactions in the dataset is ______.`': user_article_interactions,\n",
" '`The maximum number of user-article interactions by any 1 user is ______.`': max_views_by_user,\n",
" '`The most viewed article in the dataset was viewed _____ times.`': max_views,\n",
" '`The article_id of the most viewed article is ______.`': most_viewed_article_id,\n",
" '`The number of unique articles that have at least 1 rating ______.`': unique_articles,\n",
" '`The number of unique users in the dataset is ______`': unique_users,\n",
" '`The number of unique articles on the IBM platform`': total_articles\n",
"}\n",
"\n",
"# Test your dictionary against the solution\n",
"t.sol_1_test(sol_1_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### <a class=\"anchor\" id=\"Rank\">Part II: Rank-Based Recommendations</a>\n",
"\n",
"Unlike in the earlier lessons, we don't actually have ratings for whether a user liked an article or not. We only know that a user has interacted with an article. In these cases, the popularity of an article can really only be based on how often an article was interacted with.\n",
"\n",
"`1.` Fill in the function below to return the **n** top articles ordered with most interactions as the top. Test your function using the tests below."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['use deep learning for image classification',\n",
" 'insights from new york car accident reports',\n",
" 'visualize car data with brunel',\n",
" 'use xgboost, scikit-learn & ibm watson machine learning apis',\n",
" 'predicting churn with the spss random tree algorithm']"
]
},
"metadata": {},
"execution_count": 22
}
],
"source": [
"# Top articles title list\n",
"top_articles = list(df.title.value_counts().head().index)\n",
"top_articles"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1429.0, 1330.0, 1431.0, 1427.0, 1364.0]"
]
},
"metadata": {},
"execution_count": 23
}
],
"source": [
"# Top article id list\n",
"top_articles = list(df.article_id.value_counts().head().index)\n",
"top_articles"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_top_articles(n, df=df):\n",
" '''\n",
" INPUT:\n",
" n - (int) the number of top articles to return\n",
" df - (pandas dataframe) df as defined at the top of the notebook \n",
" \n",
" OUTPUT:\n",
" top_articles - (list) A list of the top 'n' article titles \n",
" \n",
" '''\n",
" # Your code here\n",
" top_articles = list(df.title.value_counts().head(n).index)\n",
" return top_articles # Return the top article titles from df (not df_content)\n",
"\n",
"def get_top_article_ids(n, df=df):\n",
" '''\n",
" INPUT:\n",
" n - (int) the number of top articles to return\n",
" df - (pandas dataframe) df as defined at the top of the notebook \n",
" \n",
" OUTPUT:\n",
" top_articles - (list) A list of the top 'n' article titles \n",
" \n",
" '''\n",
" # Your code here\n",
" top_articles = list(df.article_id.value_counts().head(n).index) \n",
"\n",
" return top_articles # Return the top article ids"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"['use deep learning for image classification', 'insights from new york car accident reports', 'visualize car data with brunel', 'use xgboost, scikit-learn & ibm watson machine learning apis', 'predicting churn with the spss random tree algorithm', 'healthcare python streaming application demo', 'finding optimal locations of new store using decision optimization', 'apache spark lab, part 1: basic concepts', 'analyze energy consumption in buildings', 'gosales transactions for logistic regression model']\n[1429.0, 1330.0, 1431.0, 1427.0, 1364.0, 1314.0, 1293.0, 1170.0, 1162.0, 1304.0]\n"
]
}
],
"source": [
"print(get_top_articles(10))\n",
"print(get_top_article_ids(10))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Your top_5 looks like the solution list! Nice job.\nYour top_10 looks like the solution list! Nice job.\nYour top_20 looks like the solution list! Nice job.\n"
]
}
],
"source": [
"# Test your function by returning the top 5, 10, and 20 articles\n",
"top_5 = get_top_articles(5)\n",
"top_10 = get_top_articles(10)\n",
"top_20 = get_top_articles(20)\n",
"\n",
"# Test each of your three lists from above\n",
"t.sol_2_test(get_top_articles)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### <a class=\"anchor\" id=\"User-User\">Part III: User-User Based Collaborative Filtering</a>\n",
"\n",
"\n",
"`1.` Use the function below to reformat the **df** dataframe to be shaped with users as the rows and articles as the columns. \n",
"\n",
"* Each **user** should only appear in each **row** once.\n",
"\n",
"\n",
"* Each **article** should only show up in one **column**. \n",
"\n",
"\n",
"* **If a user has interacted with an article, then place a 1 where the user-row meets for that article-column**. It does not matter how many times a user has interacted with the article, all entries where a user has interacted with an article should be a 1. \n",
"\n",
"\n",
"* **If a user has not interacted with an item, then place a zero where the user-row meets for that article-column**. \n",
"\n",
"Use the tests to make sure the basic structure of your matrix matches what is expected by the solution."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"article_id 0.0 2.0 4.0 8.0 9.0 12.0 14.0 15.0 \\\n",
"user_id \n",
"1 0 0 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 1 0 0 \n",
"4 0 0 0 0 0 0 0 0 \n",
"5 0 0 0 0 0 0 0 0 \n",
"... ... ... ... ... ... ... ... ... \n",
"5145 0 0 0 0 0 0 0 0 \n",
"5146 0 0 0 0 0 0 0 0 \n",
"5147 0 0 0 0 0 0 0 0 \n",
"5148 0 0 0 0 0 0 0 0 \n",
"5149 0 0 0 0 0 0 0 0 \n",
"\n",
"article_id 16.0 18.0 ... 1434.0 1435.0 1436.0 1437.0 1439.0 \\\n",
"user_id ... \n",
"1 0 0 ... 0 0 1 0 1 \n",
"2 0 0 ... 0 0 0 0 0 \n",
"3 0 0 ... 0 0 1 0 0 \n",
"4 0 0 ... 0 0 0 0 0 \n",
"5 0 0 ... 0 0 0 0 0 \n",
"... ... ... ... ... ... ... ... ... \n",
"5145 0 0 ... 0 0 0 0 0 \n",
"5146 0 0 ... 0 0 0 0 0 \n",
"5147 0 0 ... 0 0 0 0 0 \n",
"5148 0 0 ... 0 0 0 0 0 \n",
"5149 1 0 ... 0 0 0 0 0 \n",
"\n",
"article_id 1440.0 1441.0 1442.0 1443.0 1444.0 \n",
"user_id \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"5 0 0 0 0 0 \n",
"... ... ... ... ... ... \n",
"5145 0 0 0 0 0 \n",
"5146 0 0 0 0 0 \n",
"5147 0 0 0 0 0 \n",
"5148 0 0 0 0 0 \n",
"5149 0 0 0 0 0 \n",
"\n",
"[5149 rows x 714 columns]"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>article_id</th>\n <th>0.0</th>\n <th>2.0</th>\n <th>4.0</th>\n <th>8.0</th>\n <th>9.0</th>\n <th>12.0</th>\n <th>14.0</th>\n <th>15.0</th>\n <th>16.0</th>\n <th>18.0</th>\n <th>...</th>\n <th>1434.0</th>\n <th>1435.0</th>\n <th>1436.0</th>\n <th>1437.0</th>\n <th>1439.0</th>\n <th>1440.0</th>\n <th>1441.0</th>\n <th>1442.0</th>\n <th>1443.0</th>\n <th>1444.0</th>\n </tr>\n <tr>\n <th>user_id</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n 
<td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>...</th>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n <td>...</td>\n </tr>\n <tr>\n <th>5145</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5146</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5147</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5148</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n 
<td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5149</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n </tbody>\n</table>\n<p>5149 rows × 714 columns</p>\n</div>"
},
"metadata": {},
"execution_count": 27
}
],
"source": [
"# Create user-article matrix with 1's and 0's\n",
"user_item = df.groupby(['user_id', 'article_id'])['title'].max().unstack()\n",
"user_item = user_item.notnull() # True and False\n",
"user_item = user_item.apply(lambda x: x*1) # Set 1s and 0s\n",
"user_item"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# create the user-article matrix with 1's and 0's\n",
"\n",
"def create_user_item_matrix(df):\n",
" '''\n",
" INPUT:\n",
" df - pandas dataframe with article_id, title, user_id columns\n",
" \n",
" OUTPUT:\n",
" user_item - user item matrix \n",
" \n",
" Description:\n",
" Return a matrix with user ids as rows and article ids on the columns with 1 values where a user interacted with \n",
" an article and a 0 otherwise\n",
" '''\n",
" # Fill in the function here\n",
" user_item = df.groupby(['user_id', 'article_id'])['title'].max().unstack()\n",
" user_item = user_item.notnull()\n",
" user_item = user_item.apply(lambda x: x*1)\n",
" return user_item # return the user_item matrix \n",
"\n",
"user_item = create_user_item_matrix(df)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"article_id 0.0 2.0 4.0 8.0 9.0 12.0 14.0 15.0 \\\n",
"user_id \n",
"1 0 0 0 0 0 0 0 0 \n",
"2 0 0 0 0 0 0 0 0 \n",
"3 0 0 0 0 0 1 0 0 \n",
"4 0 0 0 0 0 0 0 0 \n",
"5 0 0 0 0 0 0 0 0 \n",
"\n",
"article_id 16.0 18.0 ... 1434.0 1435.0 1436.0 1437.0 1439.0 \\\n",
"user_id ... \n",
"1 0 0 ... 0 0 1 0 1 \n",
"2 0 0 ... 0 0 0 0 0 \n",
"3 0 0 ... 0 0 1 0 0 \n",
"4 0 0 ... 0 0 0 0 0 \n",
"5 0 0 ... 0 0 0 0 0 \n",
"\n",
"article_id 1440.0 1441.0 1442.0 1443.0 1444.0 \n",
"user_id \n",
"1 0 0 0 0 0 \n",
"2 0 0 0 0 0 \n",
"3 0 0 0 0 0 \n",
"4 0 0 0 0 0 \n",
"5 0 0 0 0 0 \n",
"\n",
"[5 rows x 714 columns]"
],
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th>article_id</th>\n <th>0.0</th>\n <th>2.0</th>\n <th>4.0</th>\n <th>8.0</th>\n <th>9.0</th>\n <th>12.0</th>\n <th>14.0</th>\n <th>15.0</th>\n <th>16.0</th>\n <th>18.0</th>\n <th>...</th>\n <th>1434.0</th>\n <th>1435.0</th>\n <th>1436.0</th>\n <th>1437.0</th>\n <th>1439.0</th>\n <th>1440.0</th>\n <th>1441.0</th>\n <th>1442.0</th>\n <th>1443.0</th>\n <th>1444.0</th>\n </tr>\n <tr>\n <th>user_id</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n 
<td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>...</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n </tbody>\n</table>\n<p>5 rows × 714 columns</p>\n</div>"
},
"metadata": {},
"execution_count": 29
}
],
"source": [
"user_item.head()"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"You have passed our quick tests! Please proceed!\n"
]
}
],
"source": [
"## Tests: You should just need to run this cell. Don't change the code.\n",
"assert user_item.shape[0] == 5149, \"Oops! The number of users in the user-article matrix doesn't look right.\"\n",
"assert user_item.shape[1] == 714, \"Oops! The number of articles in the user-article matrix doesn't look right.\"\n",
"assert user_item.sum(axis=1)[1] == 36, \"Oops! The number of articles seen by user 1 doesn't look right.\"\n",
"print(\"You have passed our quick tests! Please proceed!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`2.` Complete the function below which should take a user_id and provide an ordered list of the most similar users to that user (from most similar to least similar). The returned result should not contain the provided user_id, as we know that each user is similar to him/herself. Because the results for each user here are binary, it (perhaps) makes sense to compute similarity as the dot product of two users. \n",
"\n",
"Use the tests to test your function."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def find_similar_users(user_id, user_item=user_item):\n",
" '''\n",
" INPUT:\n",
" user_id - (int) a user_id\n",
" user_item - (pandas dataframe) matrix of users by articles: \n",
" 1's when a user has interacted with an article, 0 otherwise\n",
" \n",
" OUTPUT:\n",
" similar_users - (list) an ordered list where the closest users (largest dot product users)\n",
" are listed first\n",
" \n",
" Description:\n",
" Computes the similarity of every pair of users based on the dot product\n",
" Returns an ordered\n",
" \n",
" '''\n",
" # Compute similarity of each user to the provided user\n",
" similarity = user_item.dot(user_item.loc[user_id]) \n",
" # Sort by similarity\n",
" similarity = similarity.sort_values(ascending=False)\n",
" # Create list of just the ids\n",
" most_similar_users = list(similarity.index)\n",
" # Remove the own user's id\n",
" most_similar_users.remove(user_id)\n",
"\n",
" return most_similar_users # return a list of the users in order from most to least similar\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": true
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [