GitHub - ryanachten/ensemble: Visualise musical connections ✨

Web application visualising connections between bands, artists and genres.

Overview

Ensemble uses Wikipedia as a data source by scraping metadata for a given query and recursively visiting relevant links. These relationships are dynamically built up as an adjacency list before being formatted for client consumption once a given degree-of-separation is met.

Stack

Ensemble is built using the following technologies:

Client

Svelte + Sveltekit - Client framework
Tailwind + DaisyUI - Styling
Cytoscape - Graph rendering and searching

API

Go + Gin - API framework
Colly - Web scraping

Performance testing

Node - tooling
K6 - performance testing

Graph relationships

In the band or artist graph, the following relationships are supported in addition to their corresponding data source in the Wikipedia website. The genre graph is a little different, where only genre nodes are currently included.

Degrees of Separation

To prevent infinite recursion, we need a hard limit how how many times to recursion cycles we'll complete when building the graph. In the Ensemble, this is limit is referred to as degrees of separation from the original target.

Pathfinding

Currently, Ensemble uses Cytoscape's implementation of the Floyd-Warshall search algorithm for finding the shortest paths between two nodes. Genre nodes are weighted slightly lower than band or artist nodes to discourage routing via genres for more interesting paths.

Concurrency

The initial approach when scraping information from Wikipedia and building up the graph was by sequentially visiting each node using depth-first search. This approach produced a bottleneck awaiting HTTP requests to Wikipedia to retrieve node metadata.

To address this HTTP request bottleneck, node visitations were parallelized using goroutines. This produced a secondary issue, where concurrent read and writes lead to data loss when compared to the graph produced sequentially.

To address concurrent read and write conflicts, graph updates were sequenced using a queue. A Go channel watches for queue entries and executes graph updates accordingly.

There are a couple of ways we can create thread-safe graphs in Golang; the sync package comes with a concurrent-safe Map, or we can implement our own synchronisation using Mutex mutual exclusion locks. We decided to test both approaches to see which would yield the best results when compared to the sequential approach.

Performance testing

To evaluate which strategy and graph implementation performed the best, we created performance tests using K6. We tested for 3 different factors:

Reliability - did data loss occur
Latency - how long did the graph take to generate
Request failure - did the response return a non-200 status code

These were also evaluated using different numbers of concurrent users and degrees of separation.

A custom test runner was built in Node to execute these tests and format the results in CSV for client display - the charted results can be viewed on the Ensemble website.

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
.github/workflows		.github/workflows
.vscode		.vscode
api		api
client		client
docs		docs
performance		performance
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Stack

Graph relationships

Degrees of Separation

Pathfinding

Concurrency

Performance testing

Further information

About

Releases

Packages

Languages

ryanachten/ensemble

Folders and files

Latest commit

History

Repository files navigation

Overview

Stack

Graph relationships

Degrees of Separation

Pathfinding

Concurrency

Performance testing

Further information

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages