Add functionality to redact/filter sensitive data #1
Comments
It might be better to do redaction by simply removing the window-title property of the event entirely.
I've done some basic work in ActivityWatch/aw-research@eb45bf6
Would it make more sense to implement some kind of data encryption instead of redaction? For instance, with some form of asymmetric cryptography (public/private key), encrypting would add no requirements, but accessing the data would require a private key and/or a password.
I'd also really like to have a filtering feature, so I'll add my two cents here. (Edit: Well, that grew to a bit more than two cents... o.O) In my wanton imagination it would look something like this:

1) Main idea
Via an HTML form (on the ActivityWatch website) the user can create filters (e.g.

2) Filters
Filters should consist of two parts, namely:

2.1) Filter criteria
Specifies when the filter should be applied.
(2.1.1) if [title/data.incognito] [equals/differs/includes/regex/>/>=/</<=] [comparison]
Examples:
(2.1.2) logical operators
Examples:
(2.1.3) metadata checks [watcher_name, is_test, ...]
e.g.
(2.1.4) time ranges
Examples:

2.2) Filter action
Specifies what should be done if the filter applies.
(2.2.1) remove [event/event.data/event.title/...]
Examples:
(2.2.2) replace [target] with [val]
Examples:

3) Implementation
(3.1) User interface
Filters should be creatable via an HTML interface on the localhost site (http://localhost:5600/filters)
(3.1.1) Filters page
(3.1.1.1) A list of the active filters with the options to [edit/copy/disable/delete] the filter
(3.1.2) Add new filter UI
Should be easy to understand for non-coders. Likely with dropdowns and predefined fields.
(3.2) Server part
Does anyone know of a library for that...??? o.o
(3.3) Standardized events format per bucket
This would be really nice, as we could then give users a list of available options when creating filters (e.g. data.[dropdown: 1) pizza, 2) pasta, 3) ...]) and make sure a filter is valid.

Notes
Of course, I am realistic that it would take time to implement this, especially if there's no library for it. But from my point of view, this would enhance the tool a lot. Also, much of this is just nice-to-have and doesn't need to be implemented right from the beginning.
I just thought I would write out everything, so that while developing we can keep an eye on these (and maybe code in a way that lets these other options be implemented easily). From next week on I will have more time for developing, so until then maybe we can discuss if/how we should implement this? :)
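To make the criteria/action split in the proposal above a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Filter` class, the operator and action names, and the event shape (a dict with a `data` dict inside) are not taken from any existing ActivityWatch code.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical comparison operators, named after the proposal's
# [equals/differs/includes/regex] options.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": lambda value, cmp: value == cmp,
    "differs": lambda value, cmp: value != cmp,
    "includes": lambda value, cmp: cmp in str(value),
    "regex": lambda value, cmp: re.search(cmp, str(value)) is not None,
}


@dataclass
class Filter:
    key: str                 # key inside event["data"], e.g. "title"
    operator: str            # one of the OPERATORS keys
    comparison: Any          # value to compare against
    action: str              # "remove_event", "remove_field" or "replace"
    replacement: Any = None  # only used by the "replace" action

    def matches(self, event: dict) -> bool:
        data = event.get("data", {})
        if self.key not in data:
            return False
        return OPERATORS[self.operator](data[self.key], self.comparison)

    def apply(self, event: dict) -> Optional[dict]:
        """Return the filtered event, or None if it should be dropped."""
        if not self.matches(event):
            return event
        if self.action == "remove_event":
            return None
        data = dict(event["data"])  # copy so the input is not mutated
        if self.action == "remove_field":
            del data[self.key]
        elif self.action == "replace":
            data[self.key] = self.replacement
        else:
            raise ValueError(f"unknown action: {self.action}")
        return {**event, "data": data}
```

A filter built as `Filter("title", "includes", "Private Browsing", "replace", "REDACTED")` would then overwrite matching titles and pass every other event through untouched. Logical operators, metadata checks and time ranges are left out of this sketch.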
Had a bit of time, so here is a quick draft showing what I mean by these filters: https://github.com/Otto-AA/aw-filter/blob/master/filter.py Nonetheless, before getting into the details we should agree on how to implement it ^_^
Any thoughts on this proposal?
@Otto-AA I've only skimmed through it so far, but it seems to be roughly in line with what we have been thinking as well. Right now I want to prioritize editor format and visualizations, and once that's done the more important feature IMO is tagging (which would have some similarities in the datastore, making this easier later on). But even more important is making a final 0.8 release. This task is huge (just planning and prototyping the design would probably take 2 complete days of full work), so I'm not sure I want to prioritize discussing the design of this right now. I'm sorry, I really want this feature as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Since categorization is now done, I'd just like to throw out a suggestion: one way to do this is to have a "sensitive/to-redact" category and then wipe the title/URL/app of all the events that match the category.
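As a rough illustration of the category idea above, the sketch below treats the "sensitive" category as a single regex and blanks title/url/app on every matching event. The pattern, the `wipe_sensitive` name and the dict-based event shape are assumptions for the example, not the actual categorization code.

```python
import re

# Hypothetical "sensitive" category: a single regex, similar in spirit to a
# categorization rule. The pattern itself is only an example.
SENSITIVE = re.compile(r"private browsing|incognito", re.IGNORECASE)
WIPED_KEYS = ("title", "url", "app")


def wipe_sensitive(events):
    """Return copies of events with title/url/app blanked when they match."""
    result = []
    for event in events:
        data = event.get("data", {})
        # Search all wipeable fields at once so a match in any of them counts.
        haystack = " ".join(str(data.get(k, "")) for k in WIPED_KEYS)
        if SENSITIVE.search(haystack):
            data = {k: ("REDACTED" if k in WIPED_KEYS else v) for k, v in data.items()}
            event = {**event, "data": data}
        result.append(event)
    return result
```

Timestamps and durations are kept, so the time spent still shows up in totals while the identifying fields are gone.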
@ErikBjare That is not a good solution in terms of security. To make it truly secure we have to never even add the data to the buckets in the first place, rather than filter it when querying. We could use the new settings API to solve this: add a way in the web-ui to add regexes which should be filtered, and then let aw-watcher-window check those on startup and filter matching events before they get sent. There's also a duplicate feature request on the forum.
Agreed.
That's not what I mean. I mean to classify & filter when a heartbeat is received.
That makes the watchers depend on the server settings, and also requires us to implement the same filtering in all watchers. It's a bit more secure than what I had in mind since the server would never see the sensitive info at all, but not sure if it's worth it. It's worth mentioning that the rules themselves are sensitive information, especially if they only contain a few things, making the "anonymity set" for redacted events small. However, it would be less of a problem if we went for deleting events entirely. In any case, I've been thinking of building a feature in aw-webui that lets you search for events matching a particular pattern, and then let you delete them or replace them with redacted versions of the events. Wouldn't take that much work to build, search would be a generally useful feature anyway, and wouldn't add any code to the server or watchers.
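The search-then-delete-or-redact flow described above could look roughly like this. It is a sketch in Python rather than web-ui code, and it assumes each event is a dict with a unique `id` and a `data` dict; the `search` and `resolve` names are made up for the example.

```python
import re


def search(events, pattern):
    """Return the events whose data values match the regex, for review."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        e for e in events
        if any(regex.search(str(v)) for v in e.get("data", {}).values())
    ]


def resolve(events, matches, delete):
    """Delete the reviewed matches, or replace their data with a placeholder."""
    matched_ids = {e["id"] for e in matches}
    kept = []
    for e in events:
        if e["id"] not in matched_ids:
            kept.append(e)
        elif not delete:
            # Keep the event (and its timing) but blank every data field.
            kept.append({**e, "data": {k: "REDACTED" for k in e["data"]}})
    return kept
```

Splitting the search from the destructive step leaves room for the user to review the matches before anything is changed.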
Oh, alright. Might still be an issue though: we would need to be aware of bucket types, so that we don't corrupt events in buckets we don't expect to (for example by replacing "afk" with "redacted" or something).
Agreed, currently that's just a few watchers (aw-watcher-window and aw-watcher-web) but in the future it might become more.
Very good point, didn't think of that.
Definitely a good start! Not sure myself which one of our suggested solutions is the best, both have their pros and cons really.
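On the bucket-type concern raised in this comment, one option would be an allowlist of redactable fields per bucket type, so that buckets not on the list pass through unchanged. The type names and field lists below are assumptions for illustration, as is the `redact_for_bucket` helper.

```python
# Fields considered safe to overwrite, per bucket type. The type names and
# field lists are assumptions for illustration.
REDACTABLE_FIELDS = {
    "currentwindow": ("title",),
    "web.tab.current": ("title", "url"),
}


def redact_for_bucket(bucket_type, event, placeholder="REDACTED"):
    """Overwrite only the fields registered for this bucket type."""
    fields = REDACTABLE_FIELDS.get(bucket_type, ())
    data = {
        k: (placeholder if k in fields else v)
        for k, v in event.get("data", {}).items()
    }
    return {**event, "data": data}
```

With this shape, an event from an AFK bucket is returned as-is because its type has no registered fields.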
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I've added a new example
Why not encrypt the data going in? The goal should be to not leak any private information if your machine gets hacked (unfortunately very common). Then require 2FA to view your own data.
I've been looking at making a change to the Rust server engine so that it filters out events on its side. That way, no matter which tracker is sending data to the server, the server itself is responsible for filtering events out based on regex matches. For simplicity I'm looking at making the configuration part of the server config file at this point, but ultimately I think having a filter table built into the database would be useful, so that the front end could easily send new filters to the backend. @ErikBjare @johan-bjareholt - Would this be a PR you would be interested in?
Just to throw out a couple of ideas relating to this/window titles that I'd like to see realised (some points mentioned earlier by others): A way to:
I've known about ActivityWatch for many years and have probably installed it once every year or two, but the lack of any way to disable window title capturing completely has always caused me to inevitably uninstall. Until there are ways for a user to handle window title capturing, an on/off switch would be great. Excuse any ignorance as my overall experience with ActivityWatch is quite limited. I hope that will change because I'd really like to use your useful program. Thanks.
Shouldn't this be as easy as creating a filtering list on the server side, thus "if entry has a match with a filter, don't add it to the sql database"? I can easily create a Category with a "Private browsing" pattern which correctly identifies all my "Private Browsing" data. Currently, there is no solution or compromise which would fix or alleviate the problem, apart from using this pull or running "redact_sensitive.py" periodically.
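The "if entry has a match with a filter, don't add it to the sql database" idea can be sketched with Python's built-in `sqlite3`. The table layout, the `FILTERS` list and the function names are assumptions for the example and do not reflect the real server schema.

```python
import json
import re
import sqlite3

# Hypothetical filter list; in practice it would come from server settings.
FILTERS = [re.compile(r"\(Private Browsing\)"), re.compile(r"Incognito")]


def open_db(path=":memory:"):
    """Open a database with a simplified, made-up events table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (timestamp TEXT, data TEXT)")
    return conn


def store_event(conn, event):
    """Insert the event unless a filter matches its title. Returns True if stored."""
    title = event.get("data", {}).get("title", "")
    if any(f.search(title) for f in FILTERS):
        return False  # matching events are never written to the database
    conn.execute(
        "INSERT INTO events (timestamp, data) VALUES (?, ?)",
        (event["timestamp"], json.dumps(event["data"])),
    )
    return True
```

Because the check runs before the insert, a filtered event leaves no row behind, unlike a periodic cleanup script that has to find and rewrite rows after the fact.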
Doesn't Chrome already know where you've been? (Unless you turned off all settings to track you?) I believe most AW users' expectations differ wildly from those of a https://www.qubes-os.org/ user. If you really, really can't trust yourself with what you're doing on your computer, simply use a different operating system that allows you to hide entire compute workloads from yourself. ActivityWatch, in my mind, is not for PEPs or investigative journalists. It's for everyone else who wants more control (but not total control, as if that were even a thing...) over their digital crumbs, and trusts themselves enough with a local database, on a non-air-gapped computer, likely connected to the internet. If you need an even lower level of trust, go for https://puri.sm/ with Qubes on it :-) No need to overcomplicate AW, IMO
The default should be to respect private browsing, with an opt-out option if somebody wants to record it. Most people will not want to record private browsing time, which for most people is not work-related anyway.
I do want to record private browsing time. The reason I use private browsing and VPNs is to hide my activity from others on the web, not from myself. The reason I use AW is to surface insights into my own digital behavior (on and off the web, work and personal, both), private browsing included. Actually, I use multiple computers (and VMs) and I'd like to track my behavior across all these (virtual) devices, not just my "main" device. I do trust my LAN/VPNs to not be compromised... and AW fits the bill quite nicely. :-)
We need a model to filter out sensitive data by default.
For example if a window title contains: "[title] - Firefox (Private Browsing)" we should redact [title] to some magic string such as "REDACTED".
In some cases we might want to filter the window out entirely, giving no information about which window is running; it is better to catch too much than too little.
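A minimal sketch of the two behaviours described so far, in Python: replace only the `[title]` part with "REDACTED", or drop the title entirely. The regex is built from the example title format above and the `redact_title` name is hypothetical; other browsers and locales would need their own patterns.

```python
import re

# Matches "<anything> - Firefox (Private Browsing)". The suffix is taken from
# the example above; it is an assumption that real titles follow this format.
PRIVATE_TITLE = re.compile(r"^(?P<title>.*) - (?P<suffix>Firefox \(Private Browsing\))$")


def redact_title(title, drop=False):
    """Return the title with its private part replaced, or None to drop it."""
    match = PRIVATE_TITLE.match(title)
    if match is None:
        return title  # not a private window, leave untouched
    if drop:
        return None  # caller should discard the event entirely
    return f"REDACTED - {match.group('suffix')}"
```

Keeping the suffix preserves the fact that a private window was open, while `drop=True` covers the stricter case where even that should not be recorded.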
It should be the goal that every user has a set of "clean" data. The filtering should also be able to be run on an existing database of data, so that cleaner data can be output. Preferably, the data should be so clean that there is little (or even no) reason not to share it (which would be great since easy access to a large dataset could make research in some areas a lot easier!).
The remaining question is where this processing step should take place. We want the filtering/redacting to happen before data is sent anywhere, but it should also be enforceable on a server (for example, if the server owner doesn't trust the server's security because it runs in the cloud), with clients notified of this so that they can do the filtering on their side, removing the need to send sensitive data at all. It might therefore be prudent to write a module in aw-core that implements this functionality, since it should be usable from the server and from all clients that transmit sensitive data.
This feature should be on by default. We don't need anything advanced yet: the first priority is to redact titles from Incognito/Private Browsing, which is a good step in the right direction.
This should have a far higher priority than Zero-Knowledge storage right now, because it's a lot easier and more user-friendly (in ZK storage, if you lose your keys you lose your data).
Useful when:
This issue was originally moved from ActivityWatch/aw-server#4. It ended up here because its scope turned out to be wider than aw-server alone.