Support for localising regexps #18

Open
wants to merge 6 commits into
base: master
28 changes: 12 additions & 16 deletions email_reply_parser/__init__.py
@@ -1,4 +1,5 @@
 import re
+from regexps import regexps

 """
 email_reply_parser is a python library port of GitHub's Email Reply Parser.
@@ -12,17 +13,17 @@ class EmailReplyParser(object):
     """

     @staticmethod
-    def read(text):
+    def read(text, locale='en'):
         """ Factory method that splits email into list of fragments

             text - A string email body

             Returns an EmailMessage instance
         """
-        return EmailMessage(text).read()
+        return EmailMessage(text, locale).read()

     @staticmethod
-    def parse_reply(text):
+    def parse_reply(text, locale='en'):
         """ Provides the reply portion of email.

             text - A string email body
@@ -36,17 +37,12 @@ class EmailMessage(object):
     """ An email message represents a parsed email body.
     """

-    SIG_REGEX = r'(--|__|-\w)|(^Sent from my (\w+\s*){1,3})'
-    QUOTE_HDR_REGEX = r'^:etorw.*nO'
-    MULTI_QUOTE_HDR_REGEX = r'(?!On.*On\s.+?wrote:)(On\s(.+?)wrote:)'
-    QUOTED_REGEX = r'(>+)'
-    HEADER_REGEX = r'^(From|Sent|To|Subject): .+'
-
-    def __init__(self, text):
+    def __init__(self, text, locale='en'):
         self.fragments = []
         self.fragment = None
         self.text = text.replace('\r\n', '\n')
         self.found_visible = False
+        self.regexps = regexps(locale)

     def read(self):
         """ Creates new fragment for each line
@@ -57,9 +53,9 @@ def read(self):

         self.found_visible = False

-        is_multi_quote_header = re.search(self.MULTI_QUOTE_HDR_REGEX, self.text, re.MULTILINE | re.DOTALL)
+        is_multi_quote_header = re.search(self.regexps['multi_quote_hdr'], self.text, re.MULTILINE | re.DOTALL)
         if is_multi_quote_header:
-            expr = re.compile(self.MULTI_QUOTE_HDR_REGEX, flags=re.DOTALL)
+            expr = re.compile(self.regexps['multi_quote_hdr'], flags=re.DOTALL)
             self.text = expr.sub(
                 is_multi_quote_header.groups()[0].replace('\n', ''),
                 self.text)
@@ -92,11 +88,11 @@ def _scan_line(self, line):
             line - a row of text from an email message
         """

-        is_quoted = re.match(self.QUOTED_REGEX, line) is not None
-        is_header = re.match(self.HEADER_REGEX, line) is not None
+        is_quoted = re.match(self.regexps['quoted'], line) is not None
+        is_header = re.match(self.regexps['header'], line) is not None

         if self.fragment and len(line.strip()) == 0:
-            if re.match(self.SIG_REGEX, self.fragment.lines[-1]):
+            if re.match(self.regexps['sig'], self.fragment.lines[-1]):
                 self.fragment.signature = True
                 self._finish_fragment()

@@ -115,7 +111,7 @@ def quote_header(self, line):

             Returns True or False
         """
-        return re.match(self.QUOTE_HDR_REGEX, line[::-1]) != None
+        return re.match(self.regexps['quote_hdr'], line[::-1]) != None

     def _finish_fragment(self):
         """ Creates fragment
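
The change above replaces the class-level regex constants with a table keyed by locale and pattern name. A minimal sketch of that lookup, for illustration only: the `REGEXPS` dict and the `is_signature` helper are hypothetical stand-ins (the PR loads its table from `locales.yaml`), and the two `sig` patterns are copied from that file.

```python
import re

# Hypothetical in-memory stand-in for the table the PR loads from locales.yaml.
REGEXPS = {
    'en': {'sig': r'(--|__|-\w)|(^Sent from my (\w+\s*){1,3})'},
    'it': {'sig': r'(--|__|-\w)|(^Inviato da (\w+\s*){1,3})'},
}


def is_signature(line, locale='en'):
    """Return True if `line` looks like a signature line for `locale`."""
    # Same call shape as the PR: re.match(self.regexps['sig'], line).
    return re.match(REGEXPS[locale]['sig'], line) is not None


if __name__ == '__main__':
    print(is_signature('Sent from my iPhone'))             # English default
    print(is_signature('Inviato da iPhone', locale='it'))  # Italian table
    print(is_signature('Inviato da iPhone'))               # wrong locale
```

Note that an unknown locale raises `KeyError` here, as it would with the dictionary lookup in the PR.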
13 changes: 13 additions & 0 deletions email_reply_parser/locales.yaml
@@ -0,0 +1,13 @@
regexps:
  en:
    sig: '(--|__|-\w)|(^Sent from my (\w+\s*){1,3})'
    quote_hdr: '^:etorw.*nO'
    multi_quote_hdr: '(?!On.*On\s.+?wrote:)(On\s(.+?)wrote:)'
    quoted: '(>+)'
    header: '^(From|Sent|To|Subject): .+'
  it:
    sig: '(--|__|-\w)|(^Inviato da (\w+\s*){1,3})'
    quote_hdr: '^:ottircs\sah.*lI'
    multi_quote_hdr: '(?!Il.*Il\s.+?ha\sscritto:)(Il\s(.+?)ha\sscritto:)'
    quoted: '(>+)'
    header: '^(Da|Data|A|Ogg): .+'
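
The `quote_hdr` patterns are written backwards because `quote_header` matches them against the reversed line (`line[::-1]`), so `^` effectively anchors at the end of the original text. A small sketch of that trick, using the two `quote_hdr` values from this file; the `QUOTE_HDR` dict, the `is_quote_header` helper and the example address are assumptions made for illustration.

```python
import re

# quote_hdr patterns copied from locales.yaml; each is the header text reversed.
QUOTE_HDR = {
    'en': r'^:etorw.*nO',        # "On ... wrote:" read right to left
    'it': r'^:ottircs\sah.*lI',  # "Il ... ha scritto:" read right to left
}


def is_quote_header(line, locale='en'):
    """Return True if `line` ends like a quote header for `locale`."""
    # Reversing the line lets re.match anchor on the trailing "wrote:" /
    # "ha scritto:" while `.*` absorbs the date and sender in between.
    return re.match(QUOTE_HDR[locale], line[::-1]) is not None


if __name__ == '__main__':
    it_line = 'Il 19/dic/2015 19:37, "Claudio Cardinali" <c@example.com> ha scritto:'
    print(is_quote_header(it_line, 'it'))
    print(is_quote_header(it_line, 'en'))
```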
6 changes: 6 additions & 0 deletions email_reply_parser/regexps.py
@@ -0,0 +1,6 @@
from yaml import load

def regexps(locale):
    with open('email_reply_parser/locales.yaml', 'r') as stream:
        return load(stream)['regexps'][locale]
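
As written, the loader reopens and reparses `locales.yaml` each time an `EmailMessage` is constructed, and the relative path is resolved against the current working directory. One possible refinement, sketched under assumptions: build the table once, compile each pattern, and cache the result per locale. `_RAW` is a hypothetical in-memory stand-in for the parsed YAML (only two keys per locale are shown), and `compiled_regexps` is not part of this PR. For reading the file itself, PyYAML's `yaml.safe_load` is the usual choice.

```python
import re
from functools import lru_cache

# Hypothetical stand-in for the parsed contents of locales.yaml.
_RAW = {
    'en': {'quoted': r'(>+)', 'header': r'^(From|Sent|To|Subject): .+'},
    'it': {'quoted': r'(>+)', 'header': r'^(Da|Data|A|Ogg): .+'},
}


@lru_cache(maxsize=None)
def compiled_regexps(locale):
    """Return a dict of compiled patterns for `locale`, built once per locale."""
    try:
        raw = _RAW[locale]
    except KeyError:
        raise ValueError('unsupported locale: %r' % (locale,))
    # Compiling here surfaces a malformed pattern at load time,
    # not in the middle of parsing a message.
    return {name: re.compile(pattern) for name, pattern in raw.items()}


if __name__ == '__main__':
    patterns = compiled_regexps('it')
    print(patterns['header'].match('Da: Dan Watson') is not None)
```

The cached dict is shared between callers, so it should be treated as read-only.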

4 changes: 3 additions & 1 deletion setup.py
@@ -32,5 +32,7 @@
         "Programming Language :: Python :: 3.2",
         "Programming Language :: Python :: 3.3",
         "Programming Language :: Python :: 3.4",
-    ]
+    ],
+    requires=['pyyaml'],
+    tests_require=['pyyaml']
 )
Original file line number Diff line number Diff line change
@@ -126,6 +126,5 @@ def get_email(self, name):
             text = f.read()
         return EmailReplyParser.read(text)

-
 if __name__ == '__main__':
     unittest.main()
4 changes: 4 additions & 0 deletions test/emails/it/correct_sig.txt
@@ -0,0 +1,4 @@
this is an email with a correct -- signature.

--
rick
13 changes: 13 additions & 0 deletions test/emails/it/email_1_1.txt
@@ -0,0 +1,13 @@
Hi folks

What is the best way to clear a Riak bucket of all key, values after
running a test?
I am currently using the Java HTTP API.

-Abhishek Kona


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
51 changes: 51 additions & 0 deletions test/emails/it/email_1_2.txt
@@ -0,0 +1,51 @@
Hi,
Il giorno sabato 26 maggio 2012 16:58:20 UTC+2, Avv. Michele D'Auria ha scritto:
> Hi folks
>
> What is the best way to clear a Riak bucket of all key, values after
> running a test?
> I am currently using the Java HTTP API.

You can list the keys for the bucket and call delete for each. Or if you
put the keys (and kept track of them in your test) you can delete them
one at a time (without incurring the cost of calling list first.)

Something like:

String bucket = "my_bucket";
BucketResponse bucketResponse = riakClient.listBucket(bucket);
RiakBucketInfo bucketInfo = bucketResponse.getBucketInfo();

for(String key : bucketInfo.getKeys()) {
riakClient.delete(bucket, key);
}


would do it.

See also

http://wiki.basho.com/REST-API.html#Bucket-operations

which says

"At the moment there is no straightforward way to delete an entire
Bucket. There is, however, an open ticket for the feature. To delete all
the keys in a bucket, you’ll need to delete them all individually."

>
> -Abhishek Kona
>
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
55 changes: 55 additions & 0 deletions test/emails/it/email_1_3.txt
@@ -0,0 +1,55 @@
Oh thanks.

Having the function would be great.

-Abhishek Kona

Il giorno 12/dic/2015 11:35, "studiolaportamichele via Domiciliazioni Legali" <[email protected]> ha scritto:
> Hi,
> On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote:
>> Hi folks
>>
>> What is the best way to clear a Riak bucket of all key, values after
>> running a test?
>> I am currently using the Java HTTP API.
> You can list the keys for the bucket and call delete for each. Or if you
> put the keys (and kept track of them in your test) you can delete them
> one at a time (without incurring the cost of calling list first.)
>
> Something like:
>
> String bucket = "my_bucket";
> BucketResponse bucketResponse = riakClient.listBucket(bucket);
> RiakBucketInfo bucketInfo = bucketResponse.getBucketInfo();
>
> for(String key : bucketInfo.getKeys()) {
> riakClient.delete(bucket, key);
> }
>
>
> would do it.
>
> See also
>
> http://wiki.basho.com/REST-API.html#Bucket-operations
>
> which says
>
> "At the moment there is no straightforward way to delete an entire
> Bucket. There is, however, an open ticket for the feature. To delete all
> the keys in a bucket, you’ll need to delete them all individually."
>
>> -Abhishek Kona
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
5 changes: 5 additions & 0 deletions test/emails/it/email_1_4.txt
@@ -0,0 +1,5 @@
Awesome! I haven't had another problem with it.

Il Domenica 6 Dicembre 2015 7:34, 'Elisa Di Maggio' via Domiciliazioni Legali <[email protected]> ha scritto:

> Loader seems to be working well.
15 changes: 15 additions & 0 deletions test/emails/it/email_1_5.txt
@@ -0,0 +1,15 @@
One: Here's what I've got.

- This would be the first bullet point that wraps to the second line
to the next
- This is the second bullet point and it doesn't wrap
- This is the third bullet point and I'm having trouble coming up with enough
to say
- This is the fourth bullet point

Two:
- Here is another bullet point
- And another one

This is a paragraph that talks about a bunch of stuff. It goes on and on
for a while.
15 changes: 15 additions & 0 deletions test/emails/it/email_1_6.txt
@@ -0,0 +1,15 @@
I get proper rendering as well.

Sent from a magnificent torch of pixels

Il 19/dic/2015 19:37, "Claudio Cardinali"
<[email protected]>
ha scritto:

> Was this caching related or fixed already? I get proper rendering here.
>
> ![](https://img.skitch.com/20111216-m9munqjsy112yqap5cjee5wr6c.jpg)
>
> ---
> Reply to this email directly or view it on GitHub:
> https://github.com/github/github/issues/2278#issuecomment-3182418
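
This fixture wraps its quote header across three lines, which is the case `multi_quote_hdr` handles: `EmailMessage.read` joins the header onto one line before the line-by-line scan. A rough sketch of that step with the Italian pattern; `collapse_quote_header` is a hypothetical helper, the address is a placeholder, and the function-style replacement is a small departure from the PR, which passes the joined header to `expr.sub` as a plain string.

```python
import re

# multi_quote_hdr for the `it` locale, copied from locales.yaml.
MULTI_QUOTE_HDR_IT = r'(?!Il.*Il\s.+?ha\sscritto:)(Il\s(.+?)ha\sscritto:)'


def collapse_quote_header(text, pattern=MULTI_QUOTE_HDR_IT):
    """Join a quote header that wraps over several lines into one line."""
    match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    if match is None:
        return text
    header = match.group(1).replace('\n', '')
    # A callable replacement inserts `header` literally, so any backslashes
    # in it are not interpreted as escapes by re.sub.
    return re.sub(pattern, lambda _m: header, text, flags=re.DOTALL)


if __name__ == '__main__':
    body = ('I get proper rendering as well.\n\n'
            'Il 19/dic/2015 19:37, "Claudio Cardinali"\n'
            '<claudio@example.com>\n'
            'ha scritto:\n\n'
            '> Was this caching related or fixed already?\n')
    print(collapse_quote_header(body))
```

The line breaks are removed without inserting spaces, so the joined header reads `..."Claudio Cardinali"<claudio@example.com>ha scritto:`; the reversed `quote_hdr` pattern still matches it because only the space inside `ha scritto:` is required.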
11 changes: 11 additions & 0 deletions test/emails/it/email_1_7.txt
@@ -0,0 +1,11 @@
:+1:

Il 19/dic/2015 19:37, "Claudio Cardinali" <[email protected]> ha scritto:

> Steps 0-2 are in prod. Gonna let them sit for a bit then start cleaning up
> the old code with 3 & 4.
>
>
> Reply to this email directly or view it on GitHub.
>
>
25 changes: 25 additions & 0 deletions test/emails/it/email_2_1.txt
@@ -0,0 +1,25 @@
Outlook with a reply


------------------------------

*Da:* Google Apps Sync Team [mailto:[email protected]]
*Data:* Thursday, February 09, 2012 1:36 PM
*A:* [email protected]
*Ogg:* Google Apps Sync was updated!



Dear Google Apps Sync user,

Google Apps Sync for Microsoft Outlook® was recently updated. Your computer
now has the latest version (version 2.5). This release includes bug fixes
to improve product reliability. For more information about these and other
changes, please see the help article here:

http://www.google.com/support/a/bin/answer.py?answer=153463

Sincerely,

The Google Apps Sync Team.

3 changes: 3 additions & 0 deletions test/emails/it/email_BlackBerry.txt
@@ -0,0 +1,3 @@
Here is another email

Inviato da BlackBerry
22 changes: 22 additions & 0 deletions test/emails/it/email_bullets.txt
@@ -0,0 +1,22 @@
test 2 this should list second

and have spaces

and retain this formatting


- how about bullets
- and another


Il 19/dic/2015 19:37, "Claudio Cardinali" <[email protected]> ha scritto:

> Give us an example of how you applied what they learned to achieve
> something in your organization




--

*Joe Smith | Director, Product Management*
15 changes: 15 additions & 0 deletions test/emails/it/email_headers_no_delimiter.txt
@@ -0,0 +1,15 @@
And another reply!

Da: Dan Watson [mailto:[email protected]]
Data: Monday, November 26, 2012 10:48 AM
A: Watson, Dan
Ogg: Re: New Issue

A reply

--
Sent from my iPhone

Il 19/dic/2015 19:37, "Claudio Cardinali" <[email protected]> ha scritto:
This is a message.
With a second line.
3 changes: 3 additions & 0 deletions test/emails/it/email_iPhone.txt
@@ -0,0 +1,3 @@
Here is another email

Inviato da iPhone
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Here is another email

Inviato da Verizon Wireless BlackBerry
10 changes: 10 additions & 0 deletions test/emails/it/email_one_is_not_on.txt
@@ -0,0 +1,10 @@
Thank, this is really helpful.

One outstanding question I had:

Locally (on development), when I run...

Il 19/dic/2015 19:37, "Claudio Cardinali" <[email protected]> ha scritto:

> The good news is that I've found a much better query for lastLocation.
>
13 changes: 13 additions & 0 deletions test/emails/it/email_partial_quote_header.txt
@@ -0,0 +1,13 @@
On your remote host you can run:

telnet 127.0.0.1 52698

This should connect to TextMate (on your Mac, via the tunnel). If that
fails, the tunnel is not working.

Il 19/dic/2015 19:37, "Claudio Cardinali" <[email protected]> ha scritto:

> I am having an odd issue wherein suddenly port forwarding stopped
> working in a particular scenario for me. By default I have ssh set to
> use the following config (my ~/.ssh/config file):
> […]