A Node.js Framework to imitate human internet browsing behavior on Chrome
HOSIT is a Node.js framework which extends the Headless Chrome API Puppeteer with additional functionalities making the automated internet browsing behavior as humanlike as possible.
We developed this framework for studies on online services which required our browser automation to be as humanlike as possible. In our work, Is This Really You? An Empirical Study on Risk-Based Authentication Applied in the Wild, we used HOSIT to find out more about the Risk-based Authentication practices of big online services (which were kept secret by the companies).
Further details about HOSIT and how you can use it for your own studies are presented in our publication Even Turing Should Sometimes Not Be Able To Tell: Mimicking Humanoid Usage Behavior for Exploratory Studies of Online Services.
We recorded this video on May 17th, 2019, so the generated search query reflects a current event covered in German media here (the search query was automatically generated by HOSIT).
- Table of Contents
- HOSIT compared to Puppeteer
- Install
- Documentation
- Getting started
- Notice
- Publications
- License
Puppeteer 0.13.0 | HOSIT |
---|---|
Constant typing speed with each key | Randomized typing which varies with each key around a defined deviation |
Clicks inside the element's exact center | Clicks around 1/4 of the element's center with random deviation |
0 ms delay between pressing and releasing the mouse button | Randomized delay between pressing and releasing the mouse button |
No scrolling | Scrolling function with included reading simulation |
No search query generation | Extendable Search query generator |
No function to select tabs or frames | Selecting tabs/frames containing a certain URL. |
No logging | Extended logging functionalities (e.g. taking screenshots, logging all events into a database) |
No CAPTCHA solving | Automated solving of (re)CAPTCHAs (Anti-Captcha API key required) |
Commonly used workflows take a lot of code, especially when simulating humanlike identities | Controller-Class which simulates behavior of a user identity and eases application of commonly used Puppeteer workflows (e.g. initializing browser, opening/closing tabs, typing/clicking, taking screenshots). |
-
With npm:
npm install hosit-browser
-
With yarn:
yarn install hosit-browser
Note: At least Node.js v7.6.0 is needed since this framework relies on the await
command. All requirements needed to use Puppeteer also apply for this framework.
The documentation is available at docs/api.md
You can also generate the documentation on your own with JsDoc:
jsdoc -c jsdoc-conf.json
The code has to be entered in an async environment:
(async() => {
// Enter your code here
})();
-
Import module:
// Import module const HOSIT = require("hosit-browser")
-
Create an Identity
const testidentity = new HOSIT.Identity("Firstname","Lastname", new Date(1992, 5, 19), "[email protected]", "PASSW0RD", "Company", "Position", 456, 265, global.GENDER_MALE);
-
Initiate the Controller and assign it to the created identity.
// Initiate controller with test identity const controller = await new HOSIT.Controller(testidentity); // Start browser session and open new tab await controller.init();
-
Now you can control your identity with the contoller object, e.g.:
// Open example.com await controller.goto("https://example.com"); // Wait until "More Information"-Link is visible await controller.waitForSelector("a[href='http://www.iana.org/domains/example']"); // Wait a random time period with the standard values await controller.randomWait(); // Click on the "More Information"-Link await controller.click("a[href='http://www.iana.org/domains/example']"); // Wait until the page is loaded await controller.waitForNavigation(); // Wait a random time period with the standard values await controller.randomWait(); // Open new page tab with example.net await controller.newPage("http://ixquick.com"); // Wait around 5 seconds await controller.randomWait(5000); // Wait until the text field is loaded await controller.waitForSelector("input[type='text']"); // Enter Stuff inside the text field await controller.type("input[type='text']", "Here is an example search query: "); await controller.typeSearchQuery("input[type='text']"); // Wait around 5 seconds await controller.randomWait(5000); // Close the new Page tab await controller.closePage(); // Wait around 5 seconds await controller.randomWait(5000); // Scroll to the bottom of the last opened page (Simulate reading) await controller.scrollToBottom();
The test script is located at examples/example.js.
Remember to use await
in most cases due to the asynchronous functions provided
by the Puppeteer API.
In order to imitate human browsing behavior as humanlike as possible, the browser is started in the display (GUI) mode. We discovered that the Chromium headless mode was detected and partially blocked by big online services, which was not the case with the remote controlled GUI mode.
The GUI mode can be started headlessly on a linux server by starting it in a "headful" mode with the X virtual framebuffer (Xvfb).
Install (e.g. Ubuntu):
# apt-get install xvfb
Launch with 1366x768 pixels resolution:
$ xvfb-run --server-args="-screen 0 1366x768x24" ./[path to your NodeJS script].js
Most of the common known headless Chrome detection methods (at June 2018) can
already be eliminated with the "headful" mode on unmodified versions of
Chrome/Chromium. However, we recommend to build your own Chromium version with
additional adjustments (e.g. modify or remove navigator.webdriver
object) to reduce
detectability even more. See the works by Antoine Vastel for more details on
how headless browsers might be detected by services.
Settings like browser's user agent string and the window size can be adjusted by changing the Settings parameters inside the imported HOSIT module.
Following the example above, we could examplary change the user agent to Firefox 58.0 with:
HOSIT.Settings.USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0";
Note that some settings have to be set before the Controller.init
command in
order to be active. All possible settings with its default values can be seen
inside include/settings.js.
The Anti-Captcha API key needed for CAPTCHA solving can be added there as well.
You can enhance the list of search query generators with your own RSS feeds generators. This might be useful if you want to generate search queries for different geolocations.
In our example we take the RSS feed for the Google Hot Trends for the US geolocation:
-
Define properties for the new entry (just like you see in include/settings.js):
let trendsus = { rssFeedURL: 'https://trends.google.com/trends/hottrends/atom/feed?pn=p1', cacheFileName: "trendsusQueries", // Name of the cache file created in tmp-folder queryArrayObject: global.TRENDSUS_QUERIES, // our Array object where we're saving the queries evaluationFunction: function(item) { // Function which evaluates every feed item and generates the query for it // Writes item title in upper case for fun return item.title.toUpperCase(); } };
-
Add new entry to the list of search query generators defined in the settings:
HOSIT.Settings.SEARCH_QUERY_GENERATORS.hottrends = trendsus;
If you want to use your query generator as the default generator, just change the default object of the list:
HOSIT.Settings.SEARCH_QUERY_GENERATORS.default = trendsus;
-
Now we can call our new search query generator with the keyword we used
hottrends
in our case following the example above:await controller.typeSearchQuery("input[type='text']", "hottrends");
If you have changed the default generator, you can use the shorter function call as well:
await controller.typeSearchQuery("input[type='text']");
The test script with the example is located at examples/example-new-querygenerator.js
The Framework can be extended for database logging (MongoDB recommended). See the project hosit-logger for details.
This framework was used and tested with Puppeteer 0.13.0 between December 2017 and March 2018. However, we assume that it works with newer Puppeteer versions as well.
We are convinced that headless detection methods will improve over time (see e.g. headless-cat-n-mouse on Github). After publication of the technical paper, it is very likely that online services try to protect against HOSIT. So feel free to improve this framework.
We provide further details about HOSIT in the following publication. There you can also find information on how to use HOSIT for your own research studies. Please cite the paper when using HOSIT in your own studies:
Even Turing Should Sometimes Not Be Able To Tell: Mimicking Humanoid Usage Behavior for Exploratory Studies of Online Services (2019)
Stephan Wiefling, Nils Gruschka, and Luigi Lo Iacono.
24th Nordic Conference on Secure IT Systems (NordSec 2019), Aalborg, Denmark.
@inproceedings{Wiefling_Even_2019,
author = {Wiefling, Stephan and Gruschka, Nils and Lo Iacono, Luigi},
title = {Even {Turing} {Should} {Sometimes} {Not} {Be} {Able} {To} {Tell}: {Mimicking} {Humanoid} {Usage} {Behavior} for {Exploratory} {Studies} of {Online} {Services}},
booktitle = {24th {Nordic} {Conference} on {Secure} {IT} {Systems} ({NordSec} 2019)},
series = {{Lecture} {Notes} in {Computer} {Science}},
volume = {11875},
pages = {188--203},
isbn = {978-3-030-35055-0},
doi = {10.1007/978-3-030-35055-0_12},
publisher = {Springer Nature},
location = {Aalborg, Denmark},
month = nov,
year = {2019}
}
We also did a comprehensive study on Risk-based Authentication using this tool:
Is This Really You? An Empirical Study on Risk-Based Authentication Applied in the Wild (2019)
Stephan Wiefling, Luigi Lo Iacono, and Markus Dürmuth.
34th IFIP TC-11 International Conference on Information Security and Privacy Protection (IFIP SEC 2019), Lisbon, Portugal.
@inproceedings{Wiefling_Is_2019,
author = {Wiefling, Stephan and Lo Iacono, Luigi and D\"{u}rmuth, Markus},
title = {Is {This} {Really} {You}? {An} {Empirical} {Study} on {Risk}-{Based} {Authentication} {Applied} in the {Wild}},
booktitle = {34th {IFIP} {TC}-11 {International} {Conference} on {Information} {Security} and {Privacy} {Protection} ({IFIP} {SEC} 2019)},
series = {{IFIP} {Advances} in {Information} and {Communication} {Technology}},
volume = {562},
pages = {134--148},
isbn = {978-3-030-22311-3},
doi = {10.1007/978-3-030-22312-0_10},
publisher = {Springer International Publishing},
location = {Lisbon, Portugal},
month = jun,
year = {2019}
}
This project is licensed under the MIT License (LICENSE).