data structures and pointers in the C language
- Practice development in the C programming language.
- Practice explicit memory allocation and deallocation.
- Practice creating and manipulate data structures (arrays and
struct
s). - Practice working with pointers.
- Practice writing algorithms for these data structures.
- Prepare for C2.
- Develop good coding style.
This is a warmup assignment, meant to give you a problem which involves major features of the C language but is not hard in itself. You need to read in a file with English words and output a file where the words are counted and sorted. You need to use a binary tree as the fundamental structure.
You need to fork this repository and git clone
it to your development machines. When you are done and your code works, git commit
all your changes and git push
to your forked (aka remote repository. Work in the master branch. This assignment will not have a test suite like the following ones, so you don't have to install cmocka. Your output will only be compared to the correct output.
Submissions are one per team. If you haven't done so, create a git account for your team. The name should look like msl-os-[color]-[school]-[term]. For example, msl-os-orange-msud-s17 or msl-os-black-ucd. (If you want, you can create an organization but that might be overkill.) Make all team submissions from this account.
This assignment is pass/fail and carries no score. Note that good coding style will be important for further assignments so use this opportunity to develop it.
The assignment is due on Mon, Feb 6, at 23:59 Mountain time. The last commit to your repository before the deadline will be graded.
Free Github repositories are public so you can look at each other's code. Please, don't do that. You can discuss any programming topics and the assignments in general but sharing of solutions diminishes the individual learning experience of many people. Assignments might be randomly checked for plagiarism and a plagiarism claim may be raised against you.
Note that PA1 one is an individual assignment, not a team assignment like the upcoming Pintos assignments.
For this assignment, no external libraries should be used, except for the ANSI C Standard Library. The implementation of the data structures should be your own. We will use library implementations of data structures and programming primitives in the Pintos assignments.
Familiarize yourself with and start the following coding style guide. While you are not expected to follow every point of it, you should try to follow it enought to get a feel for what is good style and bad style. C code can quickly become unreadable and difficult to maintain.
A minimal C Reference, which should be sufficient for your needs.
The C98 Library Reference is more complete.
The C11 Standard is just provided for completeness, and you shouldn't need to read it, except peruse it out of curiosity.
Two guides for implementation of malloc()
: here and here.
The input will be a file containing a single line (i.e. no new lines) of all-lower-case English words. You need to read it in to sort them. The file will be in the same directory as your executable and will appear as the first argument when your program is run. To test input files are provided for you. A much larger file will be used in addition to test your program.
For each input file input01.txt
, generate an output file in the same directory, called myoutput01.txt
. Use the provided output01.file
to compare against. The output should be an alphabetized list of unique words and their counts in the file, as follows:
one: 4
three: 5
years: 23
You should use a binary tree to keep the running count for words and keep them in alphabetical order. This means that if you have four words, say one, two, three, go, one, that come in this order, you will end up with a tree that looks like:
one(2)
/ \
go(1) two(1)
/
three(1)
Think what traversal you need to print the words in the tree in alphabetical order.
The tree has to be a self-referential C struct
, containing a dynamically allocated word
, its integer count
, and pointers to the left
and right
subtrees. In other words, a tree is equivalent to a single node of the tree.
- Read words from the file. Don't read in the whole file and then process it. Read it in chunks using a buffer.
- Tree lookup for the next word in the input.
- If a word is in the tree, increment the count; if it isn't, dynamically allocate a new node; position and link it properly, and initialize the count to 1.
- When you are done with the input, you should have a complete tree. Use it to print out the output file.
- Destroy the tree, making sure you
free()
dynamic structures in the proper order. - Your tree functions should be recursive.
Overwrite the README.md file and describe your project (approach, data structures, algorithms, etc.). Use this opportunity to get to know markdown. It's a useful skill.