% Lecture 11: 10 November 2014
\sektion{11}{Spam}
Focus on email.
Scope of problem:
\begin{itemize}
\item Vast majority of email is spam (99+\%)
\item Lots is fraudulent (or inappropriate)
\item 5\% of US users have bought something from a spammer.
The anonymity makes this attractive for certain kinds of products.
\item Spamming often pays (sending costs so little that only a tiny success
rate is needed to
profit)
\end{itemize}
\sidenote{{\bf Review: how email works}
\begin{itemize}
\item Messages written in standard format
\begin{itemize}
\item Headers: To, From, Date, ...
\item Body: can encode different media types in body
\end{itemize}
\item Traditionally:
\framebox{\parbox[c]{2cm}{sender's computer}} $\xrightarrow{\text{SMTP}}$
\framebox{\parbox[c]{2cm}{sender's MTA}} $\xrightarrow{\text{SMTP}}$
\framebox{\parbox[c]{2cm}{recipient's MTA}} $\xrightarrow{\text{IMAP}}$
\framebox{\parbox[c]{2cm}{recipient's computer}}
(MTA: Mail Transfer Agent)
\item Webmail model:
\framebox{\parbox[c]{2cm}{sender's computer}} $\xrightarrow{\text{HTTP(S)}}$
\framebox{\parbox[c]{2cm}{sender's mail\\service}} $\xrightarrow{\text{SMTP}}$
\framebox{\parbox[c]{2cm}{recipient's mail\\service}} $\xrightarrow{\text{HTTP(S)}}$
\framebox{\parbox[c]{2cm}{recipient's computer}}
\item More complexities:
\begin{itemize}
\item Forwarding
\item Mailing lists
\item Autoresponders
\end{itemize}
\end{itemize}
}
\subsektion{Economics of spam}
It is very cheap to send email.
Most of the cost falls on the recipient:
\begin{itemize}
\item Needs to store message
\item Human takes the time to actually read the message
\end{itemize}
Sender will send when sender's cost $<$ sender's benefit.
What we would like: send when total cost $<$ total benefit.
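
As a rough sketch of the two conditions above (the symbols and numbers here are
illustrative assumptions, not figures from the lecture), let $c_s$ and $b_s$ be
the sender's cost and expected benefit per message, and $c_r$ and $b_r$ the
recipient's:
\[
  \text{sender sends when } c_s < b_s,
  \qquad
  \text{we would like: send when } c_s + c_r < b_s + b_r .
\]
Spam lives in the gap between the two: messages with $c_s < b_s$ but
$c_s + c_r > b_s + b_r$. If a fraction $p$ of recipients respond and each
response is worth $v$ to the sender, then $b_s = p\,v$, and sending pays
whenever
\[
  p > \frac{c_s}{v} .
\]
With a hypothetical sending cost of \$0.0001 per message and \$10 of revenue
per response, the break-even response rate is $p = 10^{-5}$, i.e.\ one
response per 100{,}000 messages.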
\subsektion{Anti-spam strategies}
Laws, technology, or a combination.
Note that most spam is already illegal (either a fraudulent offer, false
advertising, or offering an illegal product).
\begin{definition}{Spam}
\begin{enumerate}
\item Email the recipient doesn't want to receive.
Problems:
\begin{itemize}
\item Defined after the fact
\item Legally problematic (anyone can say they didn't want some
message)
\item Not what you want (just not wanting it doesn't make it spam)
\end{itemize}
\item Unsolicited email.
Problems:
\begin{itemize}
\item What does this mean? (May not explicitly have asked for a
given email)
\item Lots of unsolicited email is wanted
\end{itemize}
\item Unsolicited commercial email.
Problems: fewer than with definition 2, but the same issues remain.
\end{enumerate}
\end{definition}
{\bf Free speech:} (legal constraint, principle we'd like to honor)
\begin{itemize}
\item Minimum: don't stop a communication if both parties want it
\item Legally, there's no absolute right not to hear undesired speech.
\item Commercial speech gets less protection than political speech.
\end{itemize}
\begin{definition}{Spam (CAN SPAM Act)}
Any email is spam unless:
\begin{enumerate}[(a)]
\item it's non-commercial, or
\item it's political speech, or
\item recipient has explicitly consented to receive it, or
\item sender has a continuing business relationship with recipient, or
\item email relates to an ongoing commercial transaction between the sender
and receiver
\end{enumerate}
\end{definition}
Note: this definition is not testable in software.
Vigorous enforcement against wire fraud, false medical claims, etc.\ (can follow
the money).

A law against forging the From address is surprisingly effective.\\
Disadvantages for a spammer who uses a genuine From address:
\begin{itemize}
\item Forced to identify themselves
\item Easy to filter
\end{itemize}
Private lawsuits
\begin{itemize}
\item ISPs sue spammers to recover the cost of their server resources, etc.
(AOL: sue spammers and give a random user their stuff!)
\item Serves as a deterrent (expensive to bring a lawsuit)
\end{itemize}
Laws have succeeded in drawing a line between spam and legitimate businesses.

Anti-spam technologies:
\begin{itemize}
\item Blacklist:
List of ``known spammers'', refuse to accept mail from them
\begin{itemize}
\item If list holds From addresses: Spammer will spoof From address
(then spammer is also breaking the law)
\item If list holds IP addresses: Spammers move around, compromise
innocent users' machines and send spam from there (very common);
also note that outgoing email IP address is often shared
\end{itemize}
How aggressive should you be about putting people on the blacklist?
\begin{itemize}
\item Too slow: spammers can get away with spamming
\item Too quick: false blocking
%(Felten's domain blocked on SpamCop's list of banned IP
%addresses, got cancelled by webhosting company for policy
%violation -- had never sent an email from that domain, so had a
%good argument against being a spammer; had written a blog post,
%and someone linked to that post then sent it to a friend who
%marked it as spam and couldn't unreport; blacklisted 2 weeks)
\end{itemize}
Denial of service: an attacker can forge spam ``from'' a victim to get the
victim blacklisted
\item Whitelist:
List of people you know, reject email from everybody else
\begin{itemize}
\item Blocks too much
\item But: can combine with other countermeasures (exempt certain
people from another countermeasure)
\end{itemize}
\item Require payment (Pay-to-send):
\begin{itemize}
\item Pay in money:
Sender pays receiver\\
OR sender pays receiver IF receiver reports message as spam
(generates incentive problem for receiver)\\
OR sender pays charity if receiver reports as spam\\
Problem: really expensive for large mailing lists
\item Pay in wasted computing time:
Sender must solve some difficult computational puzzle.\\
Works internationally, but is a big problem for large mailing lists
and wastes computing time
\item Pay in human attention:
CAPTCHA. %(Completely Automated Public Turing test to tell
%Computers and Humans Apart)
Spammers can hire out the solving of CAPTCHAs in various ways (sweatshops,
making people solve them to see porn, ...)
\item Pros and Cons
\begin{enumerate}[(a)]
\item raises cost of spamming (+)
\item often raises cost of legit mail (-)
\item often wastes resources rather than transferring them (-)
\end{enumerate}
\end{itemize}
\item Sender authentication
\begin{itemize}
\item address-based (e.g.\ Princeton says which IPs can send mail with an @princeton.edu From address)
\item signature by domain owner
\end{itemize}
\item Content-based filtering:
Recipient applies a filter algorithm to the content of incoming email
\begin{itemize}
\item Early days: keyword-based.
Lots of false positives, and there are ways to work around the keywords.
\item Now: word-based machine learning, often also personalized
(relying on the user to label messages as spam or not).
Spammers could test the non-personalized part by making an account
and seeing what got marked as spam -- until filters started
looking for the word-salad test messages.
\item Collaborative filtering: use other users' reports as indicator
of spam
\end{itemize}
\end{itemize}
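
To see why paying in computing time hurts large mailing lists, it may help to
assume a specific puzzle; the hash-based form below is an illustrative
assumption, since the notes do not fix a particular puzzle. Suppose the sender
must find a value $x$ such that the hash of the message header together with
$x$ begins with $k$ zero bits. If the hash behaves like a random function, each
guess succeeds with probability $2^{-k}$, so
\[
  \text{expected sender hashes per message} = 2^{k},
  \qquad
  \text{recipient hashes to verify} = 1 .
\]
For a hypothetical $k = 20$ this is about $10^{6}$ hashes, negligible for a
single message. A mailing list with $n = 10^{5}$ subscribers, however, needs
about $n \cdot 2^{k} \approx 10^{11}$ hashes per posting, and all of that work
is discarded rather than transferred to anyone.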
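
The word-based machine learning in the content-filtering item can be sketched
with Bayes' rule. This is one common formulation (naive Bayes), assumed here
for illustration rather than taken from the lecture. For a message containing
words $w_1, \dots, w_n$, treat the words as independent given the class and
compute
\[
  \frac{P(\text{spam} \mid w_1, \dots, w_n)}{P(\text{ham} \mid w_1, \dots, w_n)}
  =
  \frac{P(\text{spam})}{P(\text{ham})}
  \prod_{i=1}^{n} \frac{P(w_i \mid \text{spam})}{P(w_i \mid \text{ham})},
\]
where ``ham'' means legitimate mail and each $P(w_i \mid \cdot)$ is estimated
as the fraction of previously labeled messages of that class containing $w_i$.
The message is marked as spam when the ratio exceeds a chosen threshold;
raising the threshold above 1 trades more missed spam for fewer false
positives. Personalization corresponds to estimating these probabilities from
the individual user's own labels.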
Robocalls are now a huge problem: the FTC issued a public challenge to solve it.