Commit

Merge pull request #50 from nelsonic/master
Revive Hits - Better (Much Faster!) than Ever
iteles authored Sep 4, 2017
2 parents f4413d3 + 3a8a2f8 commit 4f5d73f
Showing 29 changed files with 943 additions and 215 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -26,6 +26,7 @@ build/Release
# https://www.npmjs.org/doc/misc/npm-faq.html#should-i-check-my-node_modules-folder-into-git
node_modules

config.env
*.env
dump.rdb
npm-debug.log
data/
195 changes: 154 additions & 41 deletions README.md
@@ -1,78 +1,168 @@
# hits

A _simple & easy_ way to see how many people have _viewed_ your GitHub Repository.

[![Build Status](https://img.shields.io/travis/dwyl/hits.svg?style=flat-square)](https://travis-ci.org/dwyl/hits)
[![HitCount](http://hits.dwyl.io/dwyl/hits.svg)](https://github.com/dwyl/hits)
[![codecov.io](https://img.shields.io/codecov/c/github/dwyl/hits/master.svg?style=flat-square)](http://codecov.io/github/dwyl/hits?branch=master)
[![Dependency Status](https://img.shields.io/david/dwyl/hits.svg?style=flat-square)](https://david-dm.org/dwyl/hits)
[![devDependency Status](https://img.shields.io/david/dev/dwyl/hits.svg?style=flat-square)](https://david-dm.org/dwyl/hits#info=devDependencies)


## Why?

We have a _few_ projects on GitHub ... <br />
_Sadly_, we ~~have~~ _had_ no idea how many people
are _reading/using_ the projects (_unless they star/watch the repo_),
because GitHub only shares "[traffic](https://github.com/blog/1672-introducing-github-traffic-analytics)" stats
for the [_past 14 days_](https://github.com/dwyl/hits/issues/49) and **not** in "***real time***".
Also, _manually_ checking who has viewed a project
is _exceptionally_ tedious when you have more than a handful of projects.

We want to *know* the popularity of _each_ of our repos
to learn what people are finding _useful_ and to help us
decide where we should be investing our time.

## What?

A simple way to add (*very basic*) analytics to your GitHub repos.

There are already *many* "badges" that people use in their repos.
See: [github.com/dwyl/**repo-badges**](https://github.com/dwyl/repo-badges) <br />
But we haven't seen one that gives a "***hit counter***"
of the number of times a GitHub page has been viewed ... <br />
So, in today's mini project we're going to _create_ a _basic **Web Counter**_.

https://en.wikipedia.org/wiki/Web_counter

### What Data to Capture/Store?

The _first_ question we asked ourselves was:
What is the ***minimum possible*** amount of (_useful/unique_)
**info** we can store ***per visit*** (_to one of our projects_)?

1. **date + time** (_timestamp_) ***when***
the person visited the site/page. <br />
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/now

2. **url** being visited, i.e. which project was viewed.

3. **user-agent**: the browser/device (_or "crawler"_) visiting the site/page.
https://en.wikipedia.org/wiki/User_agent

4. **IP address** of the client (_for checking uniqueness_).

5. **Language** of the person's web browser.
_Note: While not "essential", we added **Browser Language**
as the **5th** piece of data (when it is set/sent by the browser/device)
because it's **insightful** to know what language people are using,
so that we can determine whether we should be **translating**/"**localising**"
our content._
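A minimal sketch of capturing those five values from a request with Node's built-in `http` module. `extractHit` is a hypothetical helper, not the app's actual code. Node lower-cases incoming header names, hence `req.headers['user-agent']`. Behind a proxy or load balancer (e.g. Heroku's router), `req.socket.remoteAddress` is the proxy's address, and the client IP usually has to be read from the `X-Forwarded-For` header instead.

```javascript
// Sketch: pull the five pieces of "hit" data out of a Node.js request.
// `extractHit` is a hypothetical helper, not the app's actual code.
function extractHit(req) {
  // Accept-Language may list several tags, e.g. "en-GB,en;q=0.9";
  // keep only the first (preferred) one, or null when the header is absent.
  const acceptLanguage = req.headers['accept-language'];
  const language = acceptLanguage ? acceptLanguage.split(',')[0].trim() : null;

  return {
    timestamp: Date.now(),                      // 1. date + time (ms since epoch)
    url: req.url,                               // 2. url being visited
    userAgent: req.headers['user-agent'] || '', // 3. browser/device or crawler
    ip: req.socket.remoteAddress,               // 4. client IP (proxy IP if proxied)
    language: language                          // 5. browser language (optional)
  };
}

// Usage (hypothetical):
// require('http').createServer((req, res) => {
//   console.log(extractHit(req));
//   res.end();
// }).listen(8000);
```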

### "Common Log Format" (CLF)?

We initially _considered_ using the "Common Log Format" (CLF)
because it's well-known and widely understood.
See: https://en.wikipedia.org/wiki/Common_Log_Format

An example log entry:
```
127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
```

Real example:
```
84.91.136.21 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) 007 [05/Aug/2017:16:50:51 -0000] "GET github.com/dwyl/phase-two HTTP/1.0" 200 42247
```

The data makes sense when viewed as a table:

| IP Address of Client | User Identifier | User ID | Date+Time of Request | Request "Verb" and URL of Request | HTTP Status Code | Size of Response (bytes) |
| -------------|:-----------|:--|:------------:|:--------:|:--|:--|
| 84.91.136.21 | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) | 007 | [05/Aug/2017:16:50:51 -0000] | "GET github.com/dwyl/phase-two HTTP/1.0" | 200 | 42247 |
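For comparison, a CLF line can be split into the seven columns of the table with one regular expression. This is a sketch: `parseClf` and the field names are our own labels. Note that the "real example" above does *not* parse. Its second field holds a user agent containing spaces, which CLF's single-token "user identifier" slot cannot represent, one more sign that CLF is a poor fit for this data.

```javascript
// Sketch: split one Common Log Format line into its seven fields.
// `parseClf` and the field names are hypothetical labels.
const CLF_PATTERN = /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)$/;

function parseClf(line) {
  const m = CLF_PATTERN.exec(line);
  if (!m) return null; // not a well-formed CLF line
  return {
    ip: m[1],
    userIdentifier: m[2],
    userId: m[3],
    time: m[4],      // e.g. "10/Oct/2000:13:55:36 -0700"
    request: m[5],   // e.g. "GET /apache_pb.gif HTTP/1.0"
    status: Number(m[6]),
    size: m[7] === '-' ? null : Number(m[7]) // "-" when no body was returned
  };
}
```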

On further reflection, we think the "Common Log Format" is _inefficient_
as it contains a lot of _duplicate_ and some _useless_ data.

We can do better.

### Alternative Log Format ("ALF")

From the CLF we can remove:

+ **IP Address**, **User Identifier** and **User ID** can be condensed into a single hash (_see below_).
+ "**GET**" - the word is implied by the service we are running (_we only accept GET requests_)
+ **Response size** is _irrelevant_ and will be the same for most requests.

| Timestamp | URL | User Agent | IP Address | Language | Hit Count |
| ------------- |:------------|:------------|:------------:|:--------:|:--|
| 1436570536950 | github.com/dwyl/the-book | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) | 84.91.136.21 | EN-GB | 42 |


In the log entry (_example_) described above, the first 3 pieces of data
identify the "user" requesting the page/resource, so rather than duplicating that data in an inefficient string, we can _hash_ it!

Any repeating user-identifying data should be concatenated (_and then hashed_).

Log entries are stored as a (_"pipe" delimited_) `String`
which can be parsed and re-formatted into any other format:

```
1436570536950|github.com/dwyl/phase-two|Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)|88.88.88.88|EN-US|42
```
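Parsing and re-formatting a line is a `split('|')` / `join('|')` round-trip. A sketch, assuming the six fields appear in the order of the table above and that no field contains a `|` itself (nothing in a user-agent string forbids one, so a real implementation might need escaping). `parseHitLine` and `formatHitLine` are hypothetical names.

```javascript
// Sketch: pipe-delimited hit line <-> object.
// Field order follows the table above; function names are hypothetical.
const FIELDS = ['timestamp', 'url', 'userAgent', 'ip', 'language', 'count'];

function parseHitLine(line) {
  const parts = line.split('|');
  if (parts.length !== FIELDS.length) {
    throw new Error('expected ' + FIELDS.length + ' fields, got ' + parts.length);
  }
  const hit = {};
  FIELDS.forEach((name, i) => { hit[name] = parts[i]; });
  hit.timestamp = Number(hit.timestamp); // ms since epoch, usable with new Date(...)
  hit.count = Number(hit.count);
  return hit;
}

// Re-formatting is the reverse: emit the fields in order, joined by "|".
function formatHitLine(hit) {
  return FIELDS.map((name) => hit[name]).join('|');
}
```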
### Reducing Storage (_Costs_)

If a person views _multiple_ pages, _three_ pieces of data are duplicated:
User Agent, IP Address and Language.
Rather than storing this data multiple times, we _hash_ the data
and store the hash as a lookup.

#### Hash Long Repeating (Identical) Data

If we run the following `Browser|IP|Language` `String`:
```sh
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)|84.91.136.21|EN-US'
```
through a **SHA** hash function we get: `8HKg3NB5Cf` (_always_)<sup>1</sup>.

_Sample_ code:
```js
// hash(string, length) is this project's own helper in lib/hash.js
var hash = require('./lib/hash.js');
var user_agent_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)|84.91.136.21|EN-US';
var agent_hash = hash(user_agent_string, 10); // 8HKg3NB5Cf
```
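`lib/hash.js` is not shown in this diff, so the following is only a stand-in with the same shape, `(string, length) -> truncated hash`, built on Node's `crypto` module. ASSUMPTION: it uses SHA-256 and base64 with the non-alphanumeric characters removed, so it will **not** reproduce `8HKg3NB5Cf`. The point it illustrates is that the same input always yields the same short, alphanumeric string.

```javascript
// Stand-in for lib/hash.js (ASSUMPTION: SHA-256 + base64, alphanumerics only).
// Output will differ from the project's own hash; only the idea is the same.
const crypto = require('crypto');

function shortHash(input, length) {
  const digest = crypto.createHash('sha256').update(input).digest('base64');
  // base64 uses A-Z, a-z, 0-9, "+", "/" and "=" padding: keep only alphanumerics,
  // then truncate to the requested length
  return digest.replace(/[^A-Za-z0-9]/g, '').slice(0, length);
}
```

For example, `shortHash('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5)|84.91.136.21|EN-US', 10)` returns the same 10-character string on every call, and changing any character of the input changes the result.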

<sup>1</sup>Note: a SHA-1 hash is _always_ 40 characters long when written in hex,
but we _truncate_ it because 10 alphanumeric characters (_selected from a set of 26 letters + 10 digits_)
means there are 36<sup>10</sup> = [3,656,158,440,062,976](http://www.wolframalpha.com/input/?i=36%5E10)
(_roughly three and a half [**quadrillion**](http://www.wolframalpha.com/input/?i=3,656,158,440,062,976+in+english)_)
possible strings, which we consider "_enough_" entropy.
If both upper and lower case letters are used (_as in `8HKg3NB5Cf`_), the set has 62 characters
and there are 62<sup>10</sup> ≈ 8.4 × 10<sup>17</sup> possible strings.
(_If you disagree, tell us why in an
[issue](https://github.com/dwyl/hits/issues)!_)
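To make "enough" concrete, the birthday approximation estimates the probability that at least two of *n* distinct visitors end up with the same truncated hash (and would therefore be merged into one). With *N* possible hashes, and taking one million distinct visitors as a purely illustrative number:

```latex
p(n) \approx 1 - e^{-n(n-1)/(2N)} \approx \frac{n^2}{2N},
\qquad N = 36^{10} \approx 3.66 \times 10^{15}

p(10^6) \approx \frac{(10^6)^2}{2 \times 3.66 \times 10^{15}} \approx 1.4 \times 10^{-4}
```

A collision only becomes significant (about 39%, since 1 - e<sup>-1/2</sup> ≈ 0.39) once the number of distinct visitors approaches √N ≈ 6 × 10<sup>7</sup>.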

#### Hit Data With Hash

```
1436570536950|github.com/dwyl/the-book|8HKg3NB5Cf|42
```
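Putting the pieces together, a sketch of recording one hit: the long `Browser|IP|Language` string is stored once in a lookup keyed by its hash, and only the short hash goes into the hit line. The in-memory `Map`s and the names `recordHit`/`shortHash` are hypothetical (a real deployment would persist both to Redis or the filesystem), and `shortHash` is a stand-in for the project's `lib/hash.js`, using SHA-256 + base64 by assumption.

```javascript
// Sketch: hit line = timestamp|url|agentHash|count, with a hash -> agent lookup.
// In-memory Maps and all names are hypothetical.
const crypto = require('crypto');

// Stand-in for lib/hash.js (ASSUMPTION: SHA-256 + base64, alphanumerics only).
function shortHash(input, length) {
  const digest = crypto.createHash('sha256').update(input).digest('base64');
  return digest.replace(/[^A-Za-z0-9]/g, '').slice(0, length);
}

const agents = new Map(); // agentHash -> "Browser|IP|Language" (stored once)
const counts = new Map(); // url -> running hit count

function recordHit(url, userAgent, ip, language, timestamp) {
  const agentString = [userAgent, ip, language].join('|');
  const agentHash = shortHash(agentString, 10);
  if (!agents.has(agentHash)) agents.set(agentHash, agentString);

  const count = (counts.get(url) || 0) + 1;
  counts.set(url, count);
  return [timestamp, url, agentHash, count].join('|');
}
```

A repeat visit by the same browser/IP/language adds one short line to the log while the lookup stays the same size.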


## How?

Place a badge (*image*) in your repo `README.md` so others
can see how popular the page is and you can track it.

### Fetch SVG from shields.io and serve it just-in-time

Given that shields.io has a badge creation service
with acceptable latency, we are proxying their service.
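A minimal sketch of the first half of that proxy: building a shields.io static-badge URL for a given count. The function name, label and default colour are hypothetical; the escaping rules (`--` for a literal dash, `__` for a literal underscore, `_` for a space) are shields.io's static-badge conventions. Fetching the URL (e.g. with Node's `https.get`) and piping the SVG response to the client would complete the proxy.

```javascript
// Sketch: build a shields.io "static badge" URL for a hit count.
// Label text and colour are illustrative, not necessarily what the app uses.
function shieldsBadgeUrl(label, count, color) {
  // "-" separates label/message/colour in the path, so literal dashes are
  // doubled; underscores are doubled too, and spaces become "_".
  const escape = (s) => String(s)
    .replace(/-/g, '--')
    .replace(/_/g, '__')
    .replace(/ /g, '_');
  return 'https://img.shields.io/badge/' +
    encodeURIComponent(escape(label)) + '-' +
    encodeURIComponent(escape(count)) + '-' +
    encodeURIComponent(color || 'lightgrey') + '.svg';
}
```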

## _Run_ it Your_self_!

Download (clone) the code to your local machine:

```sh
git clone https://github.com/dwyl/hits.git && cd hits
```
> Note: you will need to have Redis running on your localhost,
> if you are new to Redis see: https://github.com/dwyl/learn-redis

> Note: you will need to have Node.js installed on your localhost.

Install dependencies:

@@ -85,6 +175,20 @@ npm run dev
Visit: http://localhost:8000/any/url/count.svg


## Data Storage

Recording the "hit" data is _essential_
for this app to _work_ and be _useful_.

We have built it to work with _two_ "data stores":
Filesystem and Redis <!-- and PostgreSQL. --> <br />
> _**Note**: you only need **one** storage option to be available_.
### Filesystem
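As a hedged sketch only, one simple filesystem layout is an append-only log file per URL holding the pipe-delimited hit lines. The layout and the `saveHit`/`countHits` names are hypothetical. Counting by re-reading the whole file is O(n) per request, and the file-name sanitising can map two different URLs (e.g. `a/b` and `a_b`) to the same file, so treat this as an illustration rather than a design.

```javascript
// Sketch: filesystem storage, one append-only log file per URL.
// Directory layout and function names are hypothetical.
const fs = require('fs');
const path = require('path');

function logFile(dir, url) {
  // "github.com/dwyl/hits" -> "<dir>/github.com_dwyl_hits.log"
  return path.join(dir, url.replace(/[^A-Za-z0-9.-]/g, '_') + '.log');
}

function saveHit(dir, url, line) {
  fs.mkdirSync(dir, { recursive: true }); // no-op if it already exists
  fs.appendFileSync(logFile(dir, url), line + '\n');
}

function countHits(dir, url) {
  const file = logFile(dir, url);
  if (!fs.existsSync(file)) return 0;
  // one hit per non-empty line
  return fs.readFileSync(file, 'utf8').split('\n').filter(Boolean).length;
}
```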




## Research

### User Agents
@@ -108,3 +212,12 @@ http://www.monitorware.com/en/logsamples/apache.php
### Node.js http module headers

https://nodejs.org/api/http.html#http_message_rawheaders

## Running the Test Suite locally

The test suite includes tests for 3 databases,
so running the tests on your `localhost`
requires all 3 to be running.

Deploying and _using_ the app only requires _one_
of the databases to be available.
53 changes: 48 additions & 5 deletions lib/client.js
@@ -1,9 +1,52 @@
// connect to websocket server
$( document ).ready(function() {
  var root = document.getElementById("hits");
  console.log('Ready!', window.location.host);

  setTimeout(function(){
    var socket = io(window.location.host);
    socket.on('news', function (data) {
      console.log(data);
      socket.emit('hello', { msg: 'Hi!' });
    });

    // prepend each new hit so the most recent one is listed first
    socket.on('hit', function (data) {
      root.insertBefore(div(Date.now(), data.hit), root.firstChild);
    });

    // borrowed from: https://git.io/v536m
    function div(divid, text) {
      var div = document.createElement('div');
      div.id = divid;
      div.className = divid;
      if(text !== undefined) { // if text is passed in render it in a "Text Node"
        var txt = document.createTextNode(text);
        div.appendChild(txt);
      }
      return div;
    }
    document.getElementById("how").classList.remove('dn'); // show form if JS available (progressive enhancement)
    document.getElementById("nojs").classList.add('dn'); // hide the "no JS" notice (progressive enhancement)
    display_badge_markdown(); // render initial markdown template
  }, 500);

  // Markdown Template
  var mt = '[![HitCount](http://hits.dwyl.io/{user}/{repo}.svg)](http://hits.dwyl.io/{user}/{repo})';

  function generate_markdown () {
    var user = document.getElementById("username").value || '{username}';
    var repo = document.getElementById("repo").value || '{project}';
    // console.log('user: ', user, 'repo: ', repo);
    return mt.replace(/{user}/g, user).replace(/{repo}/g, repo);
  }

  function display_badge_markdown() {
    // textContent (not innerHTML) so typed input is shown as text, never parsed as HTML
    document.getElementById("badge").textContent = generate_markdown();
  }

  // re-render the badge markdown on every keystroke in any input
  var inputs = document.getElementsByTagName('input');
  for (var i = 0; i < inputs.length; i++) {
    inputs[i].addEventListener('keyup', display_badge_markdown, false);
  }
});
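The template substitution in `generate_markdown()` does not need the DOM, so it can be pulled out into a pure function that is easy to unit-test. A sketch: `badgeMarkdown` is a hypothetical name, and the template is copied from `mt` above. Passing a *function* to `String.prototype.replace` means replacement patterns such as `$&` in a typed username are inserted literally instead of being expanded.

```javascript
// Sketch: the template substitution from generate_markdown(), without the DOM.
// `badgeMarkdown` is hypothetical; the template string is copied from `mt`.
const TEMPLATE = '[![HitCount](http://hits.dwyl.io/{user}/{repo}.svg)](http://hits.dwyl.io/{user}/{repo})';

function badgeMarkdown(user, repo) {
  // fall back to the same placeholders the page shows for empty inputs
  const u = user || '{username}';
  const r = repo || '{project}';
  // replacer functions insert u/r literally ("$&" etc. are not expanded)
  return TEMPLATE.replace(/\{user\}/g, () => u).replace(/\{repo\}/g, () => r);
}
```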
21 changes: 0 additions & 21 deletions lib/climate.svg

This file was deleted.
