
Introduction to PyCharm, Git and GitHub

Matt Graham edited this page Jan 19, 2022 · 2 revisions



PyCharm

PyCharm is a Python integrated development environment (IDE).

Integrates a text editor, terminal, Python interpreter and version control (Git) interface all within one application.

Other IDEs with Python support, such as VS Code, Atom and Spyder, are available, and most of what we describe carries over to them.


PyCharm interface

Depending on your configuration and version some elements may appear different / in different locations.

Main components that we will use

  • Project explorer - opening and managing files
  • File editor area - editing and viewing files and file changes
  • Terminal - running system commands
  • Python Console - running Python commands
  • Commit tab - adding and committing changes *
  • Pull Requests tab - creating and viewing pull requests *
  • Git tab - navigating commit history and branches *

(*) We will return to these in the second Git part of these notes.


Project explorer

Main point of entry for opening and creating files.

Can also be used to perform some Git operations via the context menu.

Colours of filenames indicate Git status - for example ignored files in yellow, untracked files in red.


File editor

Allows viewing and editing text based files, most commonly .py Python source files.

Inbuilt syntax highlighting and code analysis help writing code and spotting bugs.

Documentation pane can be used to show documentation for the currently selected object.

We can set breakpoints for debugging by clicking in gutter to right of line numbers.

Using context menu when selecting a name in file allows navigating to definition of object and/or usages.

Problems tab and highlighting in editor help flag up potential issues in code.


File editor

PyCharm also includes support for other text file formats including Markdown and reStructuredText documents.

In-built rich-text preview simplifies writing documentation pages.


Terminal

Allows running commands in system shell (terminal).

On Windows defaults to PowerShell and on macOS / Linux to the default system shell (commonly bash or zsh).

Can be useful for using git command-line interface or submitting Azure runs.


Python console

Python interpreter running in virtual environment created as part of environment set up.

Can be used interactively, for example to test snippets of code, inspect objects.

Variable inspector allows inspecting objects in current namespace.


Running scripts and tests

Shortcut toolbar under top menu bar allows quick access to running scripts and tests.

Options for running tests with a debugger or profiler active.

We can specify run configurations for commonly used scripts and tests.


Run configurations

Dialog can be used to add pre-defined configurations for running commonly used scripts or test suites.

For scripts generally can just specify script path and leave other settings as defaults unless script accepts arguments.

Can be useful to set up configurations for analysis scripts and tests for modules you are developing.
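
To illustrate a script that accepts arguments, here is a minimal sketch of a hypothetical analysis script. The script, its `--n-runs` and `--output-dir` options and their default values are assumptions made for this example and are not part of TLOmodel. Arguments like these are what a run configuration would need to supply in addition to the script path.

```python
import argparse


def parse_arguments(argv=None):
    """Parse command-line arguments for a hypothetical analysis script."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis script")
    # Both options are illustrative assumptions, not real TLOmodel options.
    parser.add_argument("--n-runs", type=int, default=1, help="number of runs to perform")
    parser.add_argument("--output-dir", default="outputs", help="directory to write results to")
    return parser.parse_args(argv)


def main(argv=None):
    """Entry point: parse the arguments and report what would be run."""
    args = parse_arguments(argv)
    print(f"Performing {args.n_runs} run(s), writing results to {args.output_dir}")
    return args


# Passing an explicit list mimics arguments supplied by a run configuration;
# calling main() with no arguments would read them from the command line instead.
example_args = main(["--n-runs", "3"])
```

If the script is run without any arguments the defaults are used, which is the case where only the script path needs specifying.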


Running scripts

When running a script, any output, for example logger output or the simulation progress bar, will be shown in a Python interpreter in the Run tab.

We can halt a running script at any time using the ⏹ stop button.


Running tests

When tests are running the Run tab will show details of currently running test and test passes / fails.

Can rerun only failed tests - useful when trying to fix problems in implementation causing failures.
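
As a rough illustration of what an individual test looks like, here is a minimal sketch that assumes a test runner such as pytest, which collects functions whose names start with `test_`. The `mean` function and both tests are hypothetical and exist only for this example.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # A passing test: the mean of 1, 2 and 3 is 2.
    assert mean([1, 2, 3]) == 2


def test_mean_of_empty_sequence_raises():
    # Checks the error case: an empty sequence should raise a ValueError.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError for an empty sequence")
```

Each test function is reported as a separate pass or fail, so if only one of them failed it could be rerun on its own while fixing the implementation.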


Git and GitHub

Git is an example of a version control system (VCS) - a tool for managing and tracking changes to a set of files.

Git is free and open-source and currently the most widely used VCS.

Git has a distributed design which makes it simple for multiple people to be working on a project concurrently - ideal for TLO!

GitHub is a web service which allows hosting Git repositories online and provides a web interface with additional features for collaboration.


Git glossary

Repository : Collection of all files and their history associated with a particular project

Commits : Snapshots of the state of the files in a repository

Branches : A linear sequence of commits originating from a particular point in the commit tree, often implementing a particular feature or fix with an associated label

Cloning : Copying a repository hosted remotely (for example on GitHub) to your local machine

Pulling : Synchronising changes from a remote repository to your local repository

Pushing : Synchronising changes from your local repository to a remote repository

Forking : Creating a new copy of a repository on a hosting service such as GitHub that you can synchronise changes to


Repositories

Git's distributed design means there can be multiple copies of a repository on different machines (or even the same machine).

Typically each person collaborating on a project will have their own local repository, and there will also be a remote repository hosted on a service such as GitHub to which the changes made in each individual's repository are synchronised.

In TLO's case this central repository is hosted at https://github.com/UCL/TLOmodel

This model allows each person to simultaneously work on their own updates without affecting other people's files. While there can be conflicts when changes are merged, Git has powerful tools for helping to resolve these.


Commits

A commit corresponds to a snapshot of the files in a repository plus some associated metadata.

An example of GitHub's representation of a commit on the TLOmodel repository is https://github.com/UCL/TLOmodel/commit/4578425b4a6136bb1026d330876001463e1c430f

Commits are tagged with the author, the date and time of creation, a message (short description) and a reference to the parent commit (the previous commit changes were made from).

A commit is uniquely identified by a long hexadecimal string (using characters 0-9 and a-f) called the commit hash, for example 4578425b4a6136bb1026d330876001463e1c430f

We can use the leading characters of this hash to refer to a commit, for example 4578425, provided this prefix is unique amongst all commits in the repository.
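
The uniqueness requirement can be illustrated with a short sketch. This is not how Git itself is implemented; it is a toy lookup over a list of hashes. The first hash is the one quoted above, while the other two are made up for the example.

```python
def resolve_abbreviated_hash(prefix, known_hashes):
    """Return the single full hash starting with prefix, or raise if none or several match."""
    matches = [full for full in known_hashes if full.startswith(prefix)]
    if not matches:
        raise KeyError(f"no commit hash starts with {prefix!r}")
    if len(matches) > 1:
        raise ValueError(f"prefix {prefix!r} is ambiguous: {len(matches)} commits match")
    return matches[0]


known_hashes = [
    "4578425b4a6136bb1026d330876001463e1c430f",  # the commit hash quoted in the text
    "4578" + "0" * 34 + "aa",  # made-up hash sharing the first four characters
    "9f" + "1" * 38,  # made-up unrelated hash
]

# Seven characters are enough to pick out a single commit here.
full_hash = resolve_abbreviated_hash("4578425", known_hashes)
print(full_hash)
```

In this toy example the shorter prefix `4578` would be rejected as ambiguous because two of the hashes start with it.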

Typically however we work with branches which are pointers to commits with a more human readable name.


Branches

Typically there will be multiple simultaneous lines of development of a Git repository as people work on adding different features or fixes.

A branch is an automatically updating pointer to the latest in a chain of commits representing a line of development, and associated with a human readable name.

A branch present in many repositories is the master or main branch, which by convention is often used to represent the main line of development which the changes in other branches are merged in to when considered 'complete'.

Each copy of a repository will have its own set of branches. However, we can pull changes from a branch on a remote repository to a branch on our local repository, and conversely we can push changes from a branch on our local repository to a branch on a remote repository.

The GitHub TLOmodel branches are listed at https://github.com/UCL/TLOmodel/branches
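
The idea of a branch as an automatically updating pointer can be sketched with a toy model. This is a deliberately simplified illustration and not how Git stores its data; the `Commit` and `ToyRepository` classes are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    """A snapshot label plus a reference to the parent commit it was made from."""
    message: str
    parent: Optional["Commit"] = None


class ToyRepository:
    """Toy model: branches are names pointing at the latest commit in a chain."""

    def __init__(self):
        self.branches = {}
        self.head = "master"  # name of the currently checked out branch

    def commit(self, message):
        # The new commit records the current branch tip as its parent,
        # then the branch pointer moves forward to the new commit.
        new_commit = Commit(message, self.branches.get(self.head))
        self.branches[self.head] = new_commit
        return new_commit

    def create_branch(self, name):
        # A new branch starts out pointing at the same commit as the current one.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        if name not in self.branches:
            raise KeyError(f"no branch named {name!r}")
        self.head = name

    def log(self, name=None):
        # Follow parent references back from the branch tip, newest first.
        commit = self.branches[name or self.head]
        messages = []
        while commit is not None:
            messages.append(commit.message)
            commit = commit.parent
        return messages


repo = ToyRepository()
repo.commit("Initial commit")
repo.create_branch("feature")
repo.checkout("feature")
repo.commit("Add new feature")
print(repo.log("master"))
print(repo.log("feature"))
```

Committing on the `feature` branch moves only that branch's pointer, so `master` still refers to the earlier commit until the changes are merged.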


Using Git and GitHub in PyCharm

We will now go through a worked example of the basics of interacting with Git and GitHub in PyCharm.

We will use an example 'Travel guide' repository rather than the actual TLOmodel repository to allow us to make edits without polluting the TLO codebase!

https://github.com/matt-graham/git-example

This example is taken from an exercise designed by David Perez-Suarez for the UCL Research Software Engineering with Python course.


Creating a new project from the repository

To create a new project from a Git repository in PyCharm we need to clone the repository by going to Git > Clone... on the menu bar

A Get from Version Control dialog will then appear

In the URL field enter https://github.com/matt-graham/git-example

By default the repository will be cloned to a directory named git-example in your PyCharm projects directory - you may change this to something else if you wish.

Once you have entered the URL and if desired changed the directory, click the Clone button at the bottom right of the dialog

This will then show a progress bar while the repository is cloned to your machine - as this is a very small Git repository this should be quick!

Once the repository has finished loading you should be able to browse the files from the Project tab on the left of the PyCharm interface. You should see a list of directories for each continent and a README.md file.


Viewing Git history

By default when you clone the git-example repository the master branch is checked out and the files you see correspond to the latest commit on this branch.

If we open the Git tab from the toolbar at the bottom of the interface, we see a view of the repository's commit history and branches.

The tabular area in the centre shows the commit history of the current master branch. We can see there are three commits, each with an associated commit message, author, commit time and (short) hexadecimal hash.

The 🏷️ origin & master label indicates the commit currently referenced both by the local master branch and the master branch on the remote (GitHub) origin repository.


Switching branches

The tree navigator interface in the left column of the Git tab shows the branches in the local and remote repositories.

As well as the current master branch we see that there is a folder icon 📁 mmg (my initials!). If we expand this we see that it contains a branch named wuerzburg-entry.

The branch is shown in this directory-tree-like manner as it was given the name mmg/wuerzburg-entry, with the forward slash being interpreted as a directory separator. While naming branches like this is not necessary, it can be a useful way of organizing the branches you are personally working on to allow easier access in a large repository like TLO.
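
The grouping shown in the branch navigator can be mimicked with a short sketch. It only illustrates the naming convention and is not PyCharm's implementation; the function name and the use of an empty string for ungrouped branches are choices made for this example.

```python
from collections import defaultdict


def group_branches_by_prefix(branch_names):
    """Group branch names by the text before the first forward slash."""
    groups = defaultdict(list)
    for name in branch_names:
        prefix, separator, remainder = name.partition("/")
        if separator:
            groups[prefix].append(remainder)
        else:
            # Branches without a slash are listed at the top level.
            groups[""].append(name)
    return dict(groups)


grouped = group_branches_by_prefix(["master", "mmg/wuerzburg-entry"])
print(grouped)
```

Here `master` stays at the top level while `wuerzburg-entry` is filed under `mmg`, matching the folder-like display described above.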

We can switch to the mmg/wuerzburg-entry branch by right-clicking on the entry in the navigator and selecting Checkout from the context menu.

If we select Branch: HEAD from the dropdown in the central history viewer column, we will see the commit history for the branch we just checked out with HEAD being the Git term for the currently checked out branch (or other commit).

We see that the mmg/wuerzburg-entry branch currently has one commit on top of the current master branch with message Adding initial entry for Würzburg, Germany. There are now two 🏷️ tags showing the commits pointed to by the mmg/wuerzburg-entry and master branches.

Although we will not do so at the moment, we can also create new branches from the currently checked out branch from the Git tab by clicking the ➕ icon in the left sidebar.

A Create branch from ... dialogue will then show where the new branch name can be specified and a checkbox used to select whether to also checkout (switch to) this new branch at the same time as creating it.


Adding changes to be committed

A typical workflow is for a branch to be used to manage the changes associated with a particular unit of work for example adding a new feature.

While we are working on this feature, we commit changes we make as we proceed to the branch. This allows us to keep track of what changes we have made and also allows for the possibility of going back to an earlier point in the commit history or reverting certain changes.

Ideally we should make small, regular commits and give them informative descriptions to make it easier for us to navigate to a particular point in the history later.

As an example, here we will consider adding a commit to the mmg/wuerzburg-entry branch which performs some file reorganisation.

If we browse the files from the Project explorer tab we see that this branch has two new files README.md and wuerzburg.md under europe.

We might later decide we would prefer to have the files associated with each place further grouped into per-country directories.

To make this change we would create a new directory germany in the europe directory.

We then move the wuerzburg.md file into this new subdirectory by dragging it in the Project explorer.

PyCharm will then display a Move dialog. As well as actually moving the files, PyCharm has the useful feature that it can automatically update any references to the files in other files to reflect the updated location. Here the wuerzburg.md file is linked to from the README.md file in the europe directory so if we select Search for references and click Refactor, PyCharm will automatically update this link for us.
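
To make concrete what updating a reference means here, the following sketch rewrites a Markdown link target in a piece of text. It is only an illustration of the effect and not how PyCharm's refactoring works internally; the README line used is an assumed example and may differ from the actual file contents.

```python
import re


def update_markdown_link_target(text, old_target, new_target):
    """Replace Markdown link targets equal to old_target with new_target."""
    # Match the '](target)' part of a Markdown link such as [label](target).
    pattern = re.compile(r"\]\(" + re.escape(old_target) + r"\)")
    # A function is used for the replacement so new_target is inserted literally.
    return pattern.sub(lambda match: "](" + new_target + ")", text)


readme_line = "[Würzburg](wuerzburg.md)"  # assumed example of a link in europe/README.md
updated_line = update_markdown_link_target(readme_line, "wuerzburg.md", "germany/wuerzburg.md")
print(updated_line)
```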


Creating a commit

We have now updated the files in our local working copy of the branch but we have not yet committed these changes to the local repository. To do this we use the Commit tab on the left sidebar of the PyCharm interface.

The Commit tab shows a tree navigator interface listing two top-level options, Changes and Unversioned Files. We will ignore the latter for now. If we expand the Changes entry we will see that the europe/README.md and europe/wuerzburg.md files are both listed as having changes.

If we click on the README.md entry we are shown a summary of the changes made to this file as a side-by-side diff (difference).

We see that the URL for the Würzburg link has been updated to reflect the new location.
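
A diff can also be produced outside PyCharm. As a minimal sketch, Python's standard `difflib` module can generate a text diff between two versions of a file; the file contents below are assumed for the example and are not copied from the repository.

```python
import difflib

# Assumed before and after contents of europe/README.md, one list entry per line.
before = ["# Europe", "", "[Würzburg](wuerzburg.md)"]
after = ["# Europe", "", "[Würzburg](germany/wuerzburg.md)"]

diff_lines = list(
    difflib.unified_diff(
        before,
        after,
        fromfile="a/europe/README.md",
        tofile="b/europe/README.md",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Lines starting with `-` are present only in the old version and lines starting with `+` only in the new version, which is the same information the side-by-side view presents in two columns.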

To stage these changes ready for committing we add them to the commit by toggling the checkboxes next to the individual files (or we can select all changes by toggling the top-level Changes checkbox).

Once we have added the changes to be committed, our final task is to write a short descriptive message for the changes made in the commit in the text field at the bottom of the Commit tab.

Once we have entered a commit message we click the Commit button in the bottom left to perform the commit.

If we now look at the commit history in the Git tab we see the new commit has been added

Importantly there are now separate 🏷️ tags for the local mmg/wuerzburg-entry branch and the mmg/wuerzburg-entry branch on the remote origin repository, with the latter still pointing to the previous commit. This is because while we have added this commit to our local branch we have not yet pushed this update to the remote repository.


Pulling and pushing

In Git parlance, the operation of synchronising changes from a remote repository to the local repository is called pulling and the operation in the opposite direction of synchronising changes from a local repository to a remote repository is called pushing.

In PyCharm, while the latter operation is still referred to as Push, the former operation is instead termed Update.

While we want to ultimately push our changes to the remote repository here, a good habit to get in to is to always update (or pull) from the remote repository before pushing. This will make sure if there have been any changes to the branch on the remote repository since you last updated these will be merged in to your local branch first.

In PyCharm we can update our local branch by right-clicking on it within the branch tree navigator column in the Git tab and selecting Update from the context menu.

This will pull in any commits from the branch on the remote repository and merge them in to the local branch. If there are commits to be pulled in, in some cases Git can automatically 'rewind' your local commits and reapply them on top of the incoming commits. In other cases there may be conflicts between the commits that need to be resolved.

Here no new commits have been made to the branch on the remote repository, so no updates occur.

We are now finally ready to push our local changes to the remote repository. To do this we again right-click the branch name in the branch tree explorer in the Git tab and select Push... from the context menu.

A Push Commits to git-example dialogue will then appear. This summarises the commits that will be pushed and allows reviewing the changes made to the files. It is a good idea to double-check you have not unintentionally added any changes you did not want to commit at this point as undoing changes that are only present in your local repository is much simpler than doing so once they have been pushed to a remote repository.

Once you are happy that you do want to push the commits, click the Push button at the bottom right of the dialogue.

As pulling and pushing changes to the current branch is a very common operation, PyCharm provides shortcut icons to update (pull to) and push from the currently checked out branch on the toolbar at the top right of the interface, with the ↙️ indicating updating / pulling and the ↗️ indicating pushing.
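
For reference, the same workflow can be carried out with the git command-line interface from the Terminal tab. The sketch below only builds and prints a plausible sequence of commands rather than running them. It shows rough command-line equivalents and is not a record of what PyCharm executes. The commit message is a made-up example, and the remote name `origin` is assumed.

```python
import shlex


def workflow_commands(branch, message, remote="origin"):
    """Return git commands roughly matching the checkout, commit, update and push steps."""
    return [
        ["git", "checkout", branch],  # switch to the branch
        ["git", "add", "--all"],  # stage all changes, including new and moved files
        ["git", "commit", "-m", message],  # commit the staged changes locally
        ["git", "pull", remote, branch],  # update from the remote before pushing
        ["git", "push", remote, branch],  # push the local commits to the remote
    ]


commands = workflow_commands(
    "mmg/wuerzburg-entry",
    "Move wuerzburg.md into germany directory",  # made-up commit message
)
for command in commands:
    # shlex.join quotes arguments containing spaces so the line can be pasted into a shell.
    print(shlex.join(command))
```

Note that `git add --all` stages every change in the working copy, so the per-file checkboxes in the Commit tab give finer control than this sketch does.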


Creating a pull request

While pushing synchronises local changes to your branch to the remote repository, eventually you will want to merge these changes in to the main master branch on the remote repository.

To ensure changes are only merged in once they have been reviewed by another member of the team, TLO, as with many other open-source projects, uses GitHub's pull request feature to manage the process of merging in changes from a feature branch.

Pull requests allow you to describe the changes you have pushed to a branch in a GitHub repository, and discuss these changes with other team members. You can also request for your change to be reviewed and follow up with further commits to address comments from reviewers.

For the TLOmodel repository we also have continuous integration set up using GitHub Actions that automatically runs all of our tests with the proposed updates to the code in a pull request every time new commits are pushed to it. This allows us to ensure that any changes that are being considered for merging in do not cause failures in existing tests. Generally if adding a new feature it will also be necessary to add new tests to check the validity of the new functionality.

Once reviewers have approved the changes made in a pull request and all tests are passing, the final feature branch can then be merged into the main master branch.

Once a branch has been pushed to the remote repository on GitHub, we can go to the GitHub web page for the repository to create a pull request using GitHub's web interface. It is also possible however to open pull requests directly from PyCharm.

This is performed using the Pull Requests tab accessible from the left sidebar.

On first opening the Pull Requests tab a list of any open pull requests will be shown along with a search bar that can be used to search within / filter the pull requests. To create a new pull request we click the ➕ icon on the top toolbar.

A New Pull Request at ... dialogue then appears. In the main Info tab we can enter a title for the pull request as well as a longer description of the changes made. Ideally the description should summarise both the changes made and the rationale for them, and point reviewers to things you think need checking. The pull request opening description can use GitHub Flavored Markdown to add rich formatting, and can also use autolinking features to automatically link to related GitHub issues or other pull requests, and to tag people with their GitHub username.

We can also request reviews from specific team members when opening a pull request by clicking the 📝 icon next to text currently showing No reviewers. This then shows a pop-up field into which the GitHub username of another collaborator on the repository can be entered to request a review; multiple reviews can also be requested.

The Files and Commits tabs on the New Pull Request at... dialogue can be used to check which files have been changed and what the commit history is of the branch the pull request is being made for.

Once you are happy with the information entered for the pull request, the Create Pull Request button at the bottom of the dialogue can be clicked to open the pull request on the GitHub repository 🎉


Exercise:

  1. Create a new branch named <initials>/<place>-entry where <initials> are your initials and <place> is the name of a place you would like to visit or have visited.
  2. Create a new Markdown file placename.md in the relevant subdirectory of the repository, creating any necessary intermediate subdirectories.
  3. Add a title # Place name and short description of the place to the file and save.
  4. Commit the new file to your branch.
  5. Push the branch to the remote repository on GitHub.
  6. Create a new pull request with the branch, adding a brief description of the changes made.
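
Steps 1 to 3 can also be sketched in code, as a way of making the naming pattern and file layout concrete. The exercise itself is intended to be done through the PyCharm interface. The helper names, the lower-case file name convention and the example place are all assumptions made for this illustration, and the file is written to a temporary directory rather than a real repository.

```python
import tempfile
from pathlib import Path


def branch_name(initials, place):
    """Build a branch name following the <initials>/<place>-entry pattern."""
    return f"{initials}/{place}-entry"


def create_place_entry(repository_root, relative_directory, place, description):
    """Create a Markdown entry for a place, making any missing intermediate directories."""
    file_name = place.lower().replace(" ", "-") + ".md"  # assumed naming convention
    path = Path(repository_root) / relative_directory / file_name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(f"# {place}\n\n{description}\n", encoding="utf-8")
    return path


# A temporary directory stands in for the cloned repository in this sketch.
with tempfile.TemporaryDirectory() as temporary_root:
    entry_path = create_place_entry(
        temporary_root, "europe/italy", "Bologna", "A city in northern Italy."
    )
    entry_text = entry_path.read_text(encoding="utf-8")
    entry_relative_path = entry_path.relative_to(temporary_root).as_posix()

print(branch_name("abc", "bologna"))
print(entry_relative_path)
```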

Resources for learning more about using Git
