data structures and pointers in the C language
- Learn to work with Git and Github.
- Practice development in the C programming language.
- Practice explicit memory allocation and deallocation.
- Practice working with pointers, raw references to memory locations, usually dynamically allocated.
- Practice creating and manipulating data structures (arrays and
struct
s). - Practice writing algorithms for these data structures.
- Prepare for C2.
- Develop good coding style.
This is a warmup assignment, meant to give you a problem which involves important features of the C language (pointers, dynamic memory allocation with malloc
and deallocation with free
, struct
data structures), but is not in itself hard conceptually. You need to read in a file with English words and output a file where the words are counted and sorted. You are required to use an unbalanced binary tree as the fundamental structure.
You need to fork this repository and git clone
it to your development environment. When you are done and your code works, git commit
all your changes and git push
to your forked (aka remote) repository. Work in the master branch. This assignment will not have a test suite like the following ones, so you don't have to install cmocka. Your output will only be compared to the correct output.
Submissions are one per team. If you haven't done so, create a git account for your team. The name should look like ucd-[course]-[team-color]-s18. For example, ucd-os-orange-s18 or ucd-unix-black-s18. (If you want, you can create an organization but that might be overkill.) Make all team submissions from this account.
Both team members have to make an assignment submission on Canvas before the deadline. You only need to enter the clone URL of your project repository (e.g. https://github.com/ivogeorg/msl-clang-001.git).
50
This assignment has no test suite, so your grade is at the discretion of the instructor. Criteria:
- Correctness
- Performance
- Style
The assignment is due on Sun, Jan 21, at 23:59 Mountain time. The last commit to your repository before the deadline will be evaluated.
Free Github repositories are public so you can look at each other's code. Please, don't do that. You can discuss any programming topics and the assignments in general but sharing of solutions diminishes the individual learning experience of many people. Assignments might be randomly checked for plagiarism and a plagiarism claim may be raised against you. You definitely don't want that.
For this assignment, no external libraries should be used, except for the ANSI C Standard Library. The implementation of the data structures should be your own. We will use library implementations of data structures and programming primitives in the Pintos assignments.
Familiarize yourself with and start the following coding style guide. While you are not expected to follow every point of it, you should try to follow it enought to get a feel for what is good style and bad style. C code can quickly become unreadable and difficult to maintain.
The C Reference, which you should get confortable consulting.
The ISO C Standards defines the language. A freely available draft C11 Standard, if you want to dig deep.
This C98 Library Reference seems to be the standard reference. You should not expect many changes, though it's always good to work off a latest copy of your library reference, which should be available through the vendor/implementor.
Two guides for implementation of malloc()
: here and here.
The input will be a file containing a single line (i.e. no new lines) of all-lower-case English words. You need to read it in to sort them. The file will be in the same directory as your executable and will appear as the first argument when your program is run. To test input files are provided for you. A much larger file will be used in addition to test your program.
For each input file input01.txt
, generate an output file in the same directory, called myoutput01.txt
. Use the provided output01.file
to compare against. The output should be an alphabetized list of unique words and their counts in the file, as follows:
one: 4
three: 5
years: 23
You should use a binary tree to keep the running count for words and keep them in alphabetical order. This means that if you have four words, say one, two, three, go, one, that come in this order, you will end up with a tree that looks like:
one(2)
/ \
go(1) two(1)
/
three(1)
Think what traversal you need to print the words in the tree in alphabetical order.
The tree has to be a self-referential C struct
, containing a dynamically allocated word
, its integer count
, and pointers to the left
and right
subtrees. In other words, a tree is equivalent to a single node of the tree.
- Read words from the file. Don't read in the whole file and then process it. Read it in chunks using a buffer.
- Tree lookup for the next word in the input.
- If a word is in the tree, increment the count; if it isn't, dynamically allocate a new node; position and link it properly, and initialize the count to 1.
- When you are done with the input, you should have a complete tree. Use it to print out the output file.
- Destroy the tree, making sure you
free()
dynamic structures in the proper order. - Your tree functions should be recursive.
Overwrite the README.md file and describe your project (approach, data structures, algorithms, etc.). Use this opportunity to get to know markdown. It's a useful skill.