Slides - Full Pages, 4 per page
Federal Election Commission (FEC) data
- https://www.fec.gov/data/browse-data/?tab=bulk-data
- "Contributions by individuals" panel
- zip file
- itcont.txt
- Data/FEC/itcont100K.txt - a smaller version with 100,000 lines
- Exploring reading the FEC Data
- [R Session](Lectures/Day2/Rsession2)
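A minimal sketch of reading the individual-contributions file, assuming (as with the FEC bulk downloads) that itcont.txt is `|`-delimited with no header row; the column names are documented separately on the FEC site and are not reproduced here. `readFEC` is a hypothetical name. Turning off quote and comment processing matters because names such as O'BRIEN contain apostrophes and fields can contain `#`.

```r
# Hypothetical helper: read an FEC "contributions by individuals" file,
# assuming "|" separates the fields and there is no header line.
readFEC = function(file, nrows = -1L) {
  read.table(file, sep = "|", header = FALSE,
             quote = "",                # apostrophes (O'BRIEN) are not quotes
             comment.char = "",         # "#" may appear inside a field
             colClasses = "character",  # keep IDs, ZIP codes and dates as text
             nrows = nrows)
}
```

For example, `fec = readFEC("Data/FEC/itcont100K.txt")`. Before reading, `table(count.fields("Data/FEC/itcont100K.txt", sep = "|", quote = "", comment.char = ""))` is a quick check that every line has the same number of fields. Reading everything as character first and converting columns afterwards avoids losing leading zeros in ZIP codes and dates.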
- R Session
- Code in Data/NCBI
- Strategy
- Different versions of functions
- test.R - run the functions
- funs0.R, funs1.R, funs2.R
- funs3.R - problems/bugs
- funs4.R - improvements on funs3.R
- Debugging funs3.R
- The functions we wrote.
- Also see funs3.R and funs4.R from Day 3 (above)
- Day 5 R Session
- See Debugging funs3.R for the narrative of this debugging session.
- reading the web logs
- using read.table() and fixing the results
- start of reading via regular expressions and capture groups as described in the Web log case study.
- reading the web logs
- Day 9 R Session
- reading the web logs
- regular expressions and capture groups
- Variation of the approach in the Web log case study.
- specifically, extracting the request method (GET), file path and HTTP version in the same regular expression as all of the other columns/capture groups.
- Some code to get the data into a data.frame and clean up the variables
- reading the web logs
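A minimal sketch of the single-regular-expression approach, assuming the logs are in Apache's common log format (`host ident user [time] "METHOD path protocol" status bytes`); `parseLogLines` is a hypothetical name, and the case study's own pattern and cleanup may differ. `regexec()` returns the positions of every capture group and `regmatches()` extracts them, so the method, path and protocol come out alongside the other columns.

```r
# Hypothetical helper: parse common-log-format lines with one regular
# expression; each capture group becomes one column.
parseLogLines = function(lines) {
  pattern = '^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] "(\\S+) (\\S+) (\\S+)" (\\d{3}) (\\d+|-)$'
  m = regmatches(lines, regexec(pattern, lines, perl = TRUE))
  ok = lengths(m) > 0                 # non-matching lines give character(0)
  if(!all(ok))
    warning(sum(!ok), " line(s) did not match the pattern")
  if(!any(ok))
    stop("no lines matched the pattern")
  # drop the full match (element 1); keep the 9 groups, one row per line
  tbl = do.call(rbind, lapply(m[ok], function(x) x[-1]))
  df = as.data.frame(tbl, stringsAsFactors = FALSE)
  names(df) = c("host", "ident", "user", "time", "method", "path",
                "protocol", "status", "bytes")
  # clean up the variables
  df$status = as.integer(df$status)
  df$bytes[df$bytes == "-"] = NA      # "-" means no bytes were sent
  df$bytes = as.integer(df$bytes)
  # %b expects English month abbreviations (assumes an English/C locale)
  df$time = as.POSIXct(df$time, format = "%d/%b/%Y:%H:%M:%S %z", tz = "UTC")
  df
}
```

For a log file, this would be called as `parseLogLines(readLines(f))`. Lines that do not fit the pattern are dropped with a warning rather than silently misparsed, which makes it easy to go back and look at them.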
- Day 13 SQL Session
- SQL Code for Self Join
- Example of Self Join to Compute Time/Year Difference
- Example of using SUM() to count the number of tuples in a particular category/logical condition
- Exploring, Debugging SQL and Joins
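The session's SQL is not reproduced here. As a hedged base-R analogue of the same two ideas (a swapped-in technique, not the session's code): a self join is a join of a table with itself, which `merge()` can express, and counting rows that meet a condition is a sum over a logical vector. In SQLite a comparison evaluates to 0 or 1, which is why `SUM(condition)` counts rows. The `obs` table is made up.

```r
# Made-up table: one row per (name, year) observation.
obs = data.frame(name = c("A", "A", "B", "B", "B"),
                 year = c(2001, 2004, 2000, 2003, 2010))

# Self join: pair every row with every row for the same name, then keep the
# pairs where the first year is earlier. Roughly the SQL
#   SELECT a.name, a.year, b.year FROM obs AS a JOIN obs AS b
#     ON a.name = b.name AND a.year < b.year
joined = merge(obs, obs, by = "name", suffixes = c(".a", ".b"))
joined = joined[joined$year.a < joined$year.b, ]
joined$diff = joined$year.b - joined$year.a   # year difference for each pair

# Count rows satisfying a condition within each category: sum() of a logical
# vector, like SUM(b.year - a.year > 5) ... GROUP BY a.name in SQLite.
counts = tapply(joined$diff > 5, joined$name, sum)
```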
- Web Scraping Introduction slides
- R Session
- Reading an HTML table - Wikipedia's table of U.S. state populations
github.com/search via HTML
- R Session
- Functions to scrape github.com/search
- general approach: code for scraping the search results and processing successive pages of results
- example call: results = ghSearch("R URL decode")
- XPath Axes
Call graphs for StackOverflow functions
Writing functions
- General guidelines and principles
- email messages
- outline of implementations
- R functions
- Data/examples
- Individual email with body and attachment
- Multiple emails in a single file, without attachments, from the R-help mailing list archives.
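A minimal sketch of one possible outline for these functions, assuming mbox-style files in which each message starts with a line beginning with `From ` (as in the R-help archives) and a blank line separates a message's header from its body. `splitMbox` and `parseMessage` are hypothetical names; attachments (MIME parts) and escaped `>From ` lines in bodies are not handled.

```r
# Hypothetical helper: split one message into header and body at the first
# blank line, unfold continuation lines, and return the header fields as a
# named character vector.
parseMessage = function(lines) {
  blank = match("", lines)
  if(is.na(blank)) blank = length(lines) + 1L     # header only, no body
  hdr = lines[seq_len(blank - 1L)]
  body = if(blank < length(lines)) lines[(blank + 1L):length(lines)] else character()
  # a line starting with a space or tab continues the previous field
  field = cumsum(!grepl("^[ \t]", hdr))
  unfolded = vapply(split(trimws(hdr), field), paste, "", collapse = " ")
  unfolded = unfolded[grepl("^[^: \t]+:", unfolded)]   # keep "Name: value"
  header = sub("^[^:]*:[ \t]*", "", unfolded)
  names(header) = sub(":.*", "", unfolded)
  list(header = header, body = body)
}

# Hypothetical helper: split the lines of an mbox-style file into messages,
# each starting at a "From " line (the separator line itself is dropped).
splitMbox = function(lines) {
  id = cumsum(grepl("^From ", lines))
  keep = id > 0                        # ignore anything before the first message
  msgs = split(lines[keep], id[keep])
  unname(lapply(msgs, function(m) parseMessage(m[-1])))
}
```

Keeping the two steps separate means `parseMessage()` can be reused unchanged for the single-email example, while `splitMbox()` only deals with finding message boundaries.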
- slides
- R essentials
- Vectors and lists
- apply() functions
- subsetting
- preallocating vectors rather than growing them by concatenation
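A small generic illustration (not taken from the slides) of why preallocation matters: each `c()` call allocates a new vector and copies everything accumulated so far, so growing a result one element at a time does quadratic work overall, whereas assigning into a vector allocated once, or using a single vectorized expression, does not. The function names are hypothetical.

```r
# Grow the result by concatenation: every c() copies all earlier elements.
squaresGrow = function(n) {
  out = numeric()
  for(i in seq_len(n)) out = c(out, i^2)
  out
}

# Allocate the result once, then assign into it.
squaresPrealloc = function(n) {
  out = numeric(n)
  for(i in seq_len(n)) out[i] = i^2
  out
}

# Vectorized: one operation on the whole vector, no R-level loop.
squaresVec = function(n) seq_len(n)^2
```

All three return the same values; comparing, e.g., `system.time(squaresGrow(5e4))` with `system.time(squaresPrealloc(5e4))` and `system.time(squaresVec(5e4))` shows the difference in run time.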
- Run times for URLdecode implementations
- run time for utils::URLdecode(), which grows quadratically with the length of the input.
- Extrapolation to 600K
- run times for original, preallocated and vectorized versions
- run times for only the preallocated and vectorized versions, to show more detail.
- Note: these were run on a slow, debugging build of R (compiled without optimization).
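The preallocated and vectorized versions that were timed are not reproduced in this outline. A minimal sketch of what such versions can look like, assuming well-formed `%XX` escapes in ASCII input (like `utils::URLdecode()`, `+` is not turned into a space). `URLdecodePrealloc` and `URLdecodeVec` are hypothetical names that handle one string at a time; they are not the course's code. A common cause of quadratic run time is appending to the output with `c()` one element at a time, which both versions avoid.

```r
# Hypothetical preallocated version: the decoded result is never longer than
# the input, so allocate a raw vector of that length once and trim at the end.
URLdecodePrealloc = function(URL) {
  x = charToRaw(URL)
  pct = charToRaw("%")
  out = raw(length(x))
  i = 1L
  j = 0L                               # number of output bytes written
  while(i <= length(x)) {
    j = j + 1L
    if(x[i] == pct) {
      # decode the two hex digits that follow "%"
      out[j] = as.raw(strtoi(rawToChar(x[i + 1:2]), 16L))
      i = i + 3L
    } else {
      out[j] = x[i]
      i = i + 1L
    }
  }
  rawToChar(out[seq_len(j)])
}

# Hypothetical vectorized version: find every "%" at once, decode all hex
# pairs in one strtoi() call, then drop the two digits after each "%".
URLdecodeVec = function(URL) {
  x = charToRaw(URL)
  pct = which(x == charToRaw("%"))
  if(length(pct) == 0L) return(URL)
  hex = paste0(rawToChar(x[pct + 1L], multiple = TRUE),
               rawToChar(x[pct + 2L], multiple = TRUE))
  x[pct] = as.raw(strtoi(hex, 16L))
  rawToChar(x[-c(pct + 1L, pct + 2L)])
}
```

On a long input such as `u = paste(rep("a%20", 1e4), collapse = "")`, `system.time()` can be used to compare `utils::URLdecode(u)`, `URLdecodePrealloc(u)` and `URLdecodeVec(u)`; the preallocated version still loops in R, so the vectorized one is expected to be the fastest.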
- Avoiding redundant computations
- riverdist package and the whoconnected() function.
- See slides
- Start of riverdist package and removemicrosegs() function, and removing the for(jj ...) for(jjj ...) nested loops.
- See slides
- and also slides from Day 20.
- riverdist package and removemicrosegs() function
- remove for(jj ...) for(jjj ...) nested loops.
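The riverdist code itself is not shown here. As a generic, hypothetical illustration of replacing two nested loops over the same set of items (like the for(jj ...) for(jjj ...) loops) with whole-matrix operations: decide which line segments touch, where two segments count as connected if any of their endpoints lie within a tolerance `tol`. The function names, the data layout and `tol` are assumptions, not riverdist's API.

```r
# Hypothetical layout: `ends` is a two-column matrix of endpoint coordinates
# (x, y) and seg[k] is the segment that row k of `ends` belongs to.

# Nested-loop version: visits every ordered pair of endpoints, so each
# distance is computed twice.
connectedLoop = function(ends, seg, tol) {
  nseg = max(seg)
  conn = matrix(FALSE, nseg, nseg)
  for(a in seq_len(nrow(ends)))
    for(b in seq_len(nrow(ends)))
      if(seg[a] != seg[b] && sqrt(sum((ends[a, ] - ends[b, ])^2)) <= tol)
        conn[seg[a], seg[b]] = TRUE
  conn
}

# Vectorized version: all pairwise Euclidean distances in one dist() call,
# then a single matrix-indexed assignment replaces the loop body.
connectedVec = function(ends, seg, tol) {
  nseg = max(seg)
  d = as.matrix(dist(ends))
  close = d <= tol & outer(seg, seg, "!=")   # ignore pairs within a segment
  idx = which(close, arr.ind = TRUE)         # row/column of each close pair
  conn = matrix(FALSE, nseg, nseg)
  conn[cbind(seg[idx[, 1]], seg[idx[, 2]])] = TRUE
  conn
}
```

The trade-off is memory: the full distance matrix needs space proportional to the square of the number of endpoints, which is fine for moderate sizes but not for very large networks.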
- slides on vectorization
- Example of combining pairs of words in data.frame based on alphabetical order
- Vectorized creation of sample words
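A minimal sketch of both ideas, with hypothetical function names: `sampleWords()` draws all the letters with a single call to `sample()` and pastes the columns of a letter matrix together element-wise, and `pairKey()` puts each pair of words into alphabetical order with `pmin()`/`pmax()` (which work element-wise on character vectors), so that ("cat", "dog") and ("dog", "cat") get the same key.

```r
# Hypothetical: n random "words" of k letters each, with no loop over words.
sampleWords = function(n, k = 3) {
  m = matrix(sample(letters, n * k, replace = TRUE), nrow = n)
  # paste0() the k columns together element-wise: one word per row
  do.call(paste0, unname(split(m, col(m))))
}

# Hypothetical: an order-independent key for each row's pair of words.
pairKey = function(w1, w2) paste(pmin(w1, w2), pmax(w1, w2))

df = data.frame(w1 = sampleWords(10), w2 = sampleWords(10))
df$key = pairKey(df$w1, df$w2)
```

Once every row has a key, `table(df$key)` or `split(df, df$key)` combines the pairs regardless of which word came first.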
- Organizing Markdown
- example of running the timing computations in a script, saving the timing results to a file, and then reading these into the R Markdown document in order to plot them, etc.
- and keeping the functions in a separate file, e.g., URLdecodeFuns.R, and using source() in the R Markdown document to read and define the functions.
- the R Markdown example
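A minimal sketch of the split between a timing script and the R Markdown document, with hypothetical file names: the script runs the slow timings once and writes them to a CSV file, and the R Markdown chunk only reads that file, so knitting does not rerun them. Here `utils::URLdecode()` is timed on made-up inputs, and the last lines fit a quadratic in `n` with `lm()` and extrapolate it with `predict()`.

```r
## ---- timing script (hypothetical timeURLdecode.R, run once) ----
# In a real project the functions being timed would live in their own file,
# e.g. source("URLdecodeFuns.R"), shared with the R Markdown document.
makeInput = function(nchars) paste(rep("a%20", nchars %/% 4), collapse = "")

sizes = c(2000, 4000, 8000, 16000)            # input lengths in characters
seconds = vapply(sizes, function(n) {
  u = makeInput(n)
  system.time(utils::URLdecode(u))[["elapsed"]]
}, numeric(1))

# tempdir() keeps this sketch runnable anywhere; a project would use a fixed path.
timingFile = file.path(tempdir(), "URLdecodeTimes.csv")
write.csv(data.frame(n = sizes, seconds = seconds), timingFile, row.names = FALSE)

## ---- R Markdown chunk: read the saved results instead of re-timing ----
tm = read.csv(timingFile)
fit = lm(seconds ~ n + I(n^2), data = tm)       # quadratic model of run time
predict(fit, newdata = data.frame(n = 6e5))     # extrapolated seconds
```

In the chunk, `plot(seconds ~ n, data = tm, type = "b")` then draws the timings. Since the document never calls the slow code, it knits quickly and the numbers do not change from one knit to the next.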