
CRScraper

Indecisive about picking schedules when CRS enlistment comes around again? Here's the answer.

This project is a simple schedule maker with a probability-based ranking system for the University of the Philippines Diliman's Computerized Registration System (CRS). Basically, it generates all possible schedules you can make from the list of courses you want to enlist in! You just have to input your CRS login credentials and your courseURLs as a comma-separated list in this format: <course_link0>, <course_link1>, <course_link2>, ....

Sample courseURLs input: https://crs.upd.edu.ph/preenlistment/class_search/5670, https://crs.upd.edu.ph/preenlistment/class_search/18849, https://crs.upd.edu.ph/preenlistment/class_search/18843
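The comma-separated format above can be turned into a Python list with a few lines. This is only a sketch for illustration: `parse_course_urls` is a hypothetical helper, not a function from this repository, and the host check is an assumption about what counts as a valid input.

```python
from urllib.parse import urlparse


def parse_course_urls(raw: str) -> list[str]:
    """Split a comma-separated courseURLs string into a clean list of URLs.

    Hypothetical helper, not part of CRScraper itself.
    """
    urls = []
    for piece in raw.split(","):
        url = piece.strip()
        if not url:
            continue  # tolerate trailing commas and blank entries
        parsed = urlparse(url)
        # Assumption: every valid link points at the CRS host over HTTPS.
        if parsed.scheme != "https" or parsed.netloc != "crs.upd.edu.ph":
            raise ValueError(f"Not a CRS URL: {url!r}")
        urls.append(url)
    return urls


sample = (
    "https://crs.upd.edu.ph/preenlistment/class_search/5670, "
    "https://crs.upd.edu.ph/preenlistment/class_search/18849, "
    "https://crs.upd.edu.ph/preenlistment/class_search/18843"
)
course_urls = parse_course_urls(sample)
print(course_urls)
```

Each entry in the resulting list is one course search page, which matches the list format used in test.py.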

Note: Though working, this project is still UNDER DEVELOPMENT. Feel free to contribute too!


Simple Testing

  1. Clone the repository

    git clone https://github.com/meezlung/CRScraper.git
    cd CRScraper/
    
  2. Search for your preferred courses in CRS, then copy the URL of each search result into a list like this:

    all_course_table_schedule_url_cs_test = [
        "https://crs.upd.edu.ph/preenlistment/class_search/19405",
        "https://crs.upd.edu.ph/preenlistment/class_search/19398",
        "https://crs.upd.edu.ph/preenlistment/class_search/19403",
        "https://crs.upd.edu.ph/preenlistment/class_search/19404",
        "https://crs.upd.edu.ph/preenlistment/class_search/19480",
    ]  # Sample format for CS 136, CS 21, CS 33, CS 132, and Eng 30
    
    # Note: Each URL corresponds to a search result table of a DESIRED COURSE.
    # You may edit this list as you please.
    
  3. Open test.py in a text editor, and modify the all_course_table_schedule_url_cs_test variable.

  4. Feel free to edit the filename variable as well.

  5. Save the file and run test.py in the terminal.

    python test.py
    
  6. The ranked schedules will be saved as schedules_ranked_test.csv in the same directory as test.py.


Use the App through Docker

  1. Install Docker (if you don't have it yet).

  2. Clone the repository

    git clone https://github.com/meezlung/CRScraper.git
    cd CRScraper/
    
  3. Build and start the containers (make sure Docker is running in the background)

    docker-compose up --build
    

    Note: To stop and remove the containers created by Compose, run the following:

    docker-compose down
    
  4. Go to http://localhost:3000


Demo

CRScraperSimpleTestingDemo.mp4
CRS.Scraper.Preenlistment.Run.mp4

Overview of the files

crs_scraper_preenlistment.py, crs_scraper_waitlist.py

  • Contains the CRScraper class.
  • CRScraper
    • This just scrapes everything from the CRS website, then outputs the data as a list[dict[str, str | list[str]]].
    • I ran this during the last Midyear CRS enlistment (2023) and saved the output as raw_data_CS_2ndYear_1stSem_AY_2024-2025.json and raw_data_CS_2ndYear_1stSem_AY_2024-2025.txt.
    • These are raw data, so I had to use data_sorter.py to pick out and sort the important properties. What I'm thinking now is that I should really optimize this by sorting and organizing properties while scraping the website (see optimized_crscraper.py).
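To make the list[dict[str, str | list[str]]] shape concrete, here is a small sketch of what one scraped row could look like. The key names and values are made up for illustration and are an assumption; the real scraper's keys may differ. `is_valid_row` is a hypothetical helper that only checks the shape.

```python
from __future__ import annotations

import json

# Hypothetical example of one scraped class section.
# Keys and values are invented for illustration only.
raw_data: list[dict[str, str | list[str]]] = [
    {
        "Class": "CS 21 LAB 1",                      # a plain str value
        "Schedule": ["TTh 10-11:30AM", "W 1-4PM"],   # a list[str] value
        "Demand": "40",
    },
]


def is_valid_row(row: dict[str, str | list[str]]) -> bool:
    """Check that every key is a str and every value is a str or a list of str."""
    return all(
        isinstance(key, str)
        and (
            isinstance(value, str)
            or (isinstance(value, list) and all(isinstance(v, str) for v in value))
        )
        for key, value in row.items()
    )


# The shape survives a JSON round trip, which is why it can be dumped to a .json file.
restored = json.loads(json.dumps(raw_data))
print(all(is_valid_row(row) for row in restored))
```

Because every value is either a string or a list of strings, numbers such as demand stay as text until a later step parses them.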

crs_data.py (for debugging purposes only)

  • This contains the raw data generated by crs_scraper.py during the last Midyear CRS enlistment for the subjects Physics 72, Math 23, Math 40, CS 20, and CS 32 (my courses for the first semester of second year).

data_sorter.py

  • Contains the DataSorter and ScheduleGenerator classes.
  • DataSorter
    • It needs the raw data from crs_scraper.py as input.
    • It only pretty-formats the raw data, so it's kind of unnecessary.
  • ScheduleGenerator
    • This generates all possible combinations of schedules with no time conflict.
    • It also ranks all generated schedule combinations by their probability property.
    • TODO:
      • It would also be good to consider other constraints (e.g. Restrictions/Remarks).
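The conflict-free generation and ranking described above can be sketched with a small backtracking search. This is a simplified illustration, not the actual code in data_sorter.py: the `Section` type, the function names, the (day, start, end) representation in minutes, and ranking by the product of per-section probabilities are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Hypothetical section record; times are minutes since midnight."""
    course: str
    name: str
    meetings: tuple[tuple[str, int, int], ...]  # (day, start, end)
    probability: float


def conflicts(a: Section, b: Section) -> bool:
    """Two sections conflict if any of their meetings overlap on the same day."""
    return any(
        d1 == d2 and s1 < e2 and s2 < e1
        for d1, s1, e1 in a.meetings
        for d2, s2, e2 in b.meetings
    )


def generate_schedules(courses: list[list[Section]]) -> list[list[Section]]:
    """Pick one section per course, backtracking whenever a time conflict appears."""
    results: list[list[Section]] = []

    def backtrack(index: int, chosen: list[Section]) -> None:
        if index == len(courses):
            results.append(list(chosen))
            return
        for section in courses[index]:
            if all(not conflicts(section, other) for other in chosen):
                chosen.append(section)
                backtrack(index + 1, chosen)
                chosen.pop()

    backtrack(0, [])
    return results


def combined_probability(schedule: list[Section]) -> float:
    """Assumption: sections are independent, so probabilities multiply."""
    result = 1.0
    for section in schedule:
        result *= section.probability
    return result


cs21 = [
    Section("CS 21", "A", (("T", 600, 690), ("Th", 600, 690)), 0.5),
    Section("CS 21", "B", (("M", 780, 870),), 0.9),
]
cs33 = [
    Section("CS 33", "A", (("T", 660, 750),), 0.8),
    Section("CS 33", "B", (("W", 600, 690),), 0.4),
]
ranked = sorted(generate_schedules([cs21, cs33]), key=combined_probability, reverse=True)
for schedule in ranked:
    print([f"{s.course} {s.name}" for s in schedule], round(combined_probability(schedule), 2))
```

In this toy input, CS 21 A and CS 33 A overlap on Tuesday, so that pairing is pruned before the search goes any deeper.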

probability_calculator.py

  • Contains the ProbabilityCalculator class.
  • ProbabilityCalculator
    • This is heavily inspired by Leonard Ang's code in his UPD-Course-Probability-Calculator.
    • This calculates probability based on available slots, total demand, and preenlistment priority.
    • I think probability should not be the only factor in deciding which schedule ranks best.
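As a rough illustration of how available slots, demand, and priority can combine into one number, here is a toy model. It is not the formula used by ProbabilityCalculator or by Leonard Ang's calculator. It assumes that lower priority numbers are served first, that slots inside a priority group are handed out uniformly at random, and that the demand counts already include you. `enlistment_probability` and its parameters are hypothetical names.

```python
def enlistment_probability(
    available_slots: int,
    demand_by_priority: dict[int, int],
    my_priority: int,
) -> float:
    """Toy estimate of the chance of getting a slot (assumptions in the text above)."""
    if my_priority not in demand_by_priority:
        raise KeyError(f"No demand recorded for priority {my_priority}")

    remaining = available_slots
    for priority in sorted(demand_by_priority):
        demand = demand_by_priority[priority]
        if priority == my_priority:
            if demand <= 0:
                return 1.0 if remaining > 0 else 0.0
            # Slots left for my group, shared evenly among its demand.
            return max(0.0, min(1.0, remaining / demand))
        # Higher-priority groups take their slots first.
        remaining = max(0, remaining - demand)
    return 0.0  # unreachable, kept for type checkers


demand = {1: 10, 2: 20}
print(enlistment_probability(20, demand, my_priority=1))
print(enlistment_probability(20, demand, my_priority=2))
print(enlistment_probability(5, demand, my_priority=2))
```

With 20 slots, the first group is fully served and the second group competes for the 10 slots that are left.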

crs_main.py

  • This controls everything, pulling in crs_data.py, data_sorter.py, and crs_scraper.py through module imports.
  • This will also serve as the main backend file for the Svelte frontend via Flask.

test.py

  • Same behavior as crs_main.py, but only for local testing purposes.

What's Still Missing (though I'm not sure I'll still get to these):

  • Ranking system based on Rate UP Profs (RUPP) or Restrictions/Remarks. (Still don't know how to implement)
  • A JavaScript scraper in the future so this can be published as a Chrome extension?! If not, host it on the internet? That would need a way to host the backend and frontend online (consider a homemade Linux server).
  • I think we need to scrape Preenlistment Priority as well to feed it into the Course Probability Calculator.
  • Also add criteria/filtering functions for organizing schedules (e.g. no weekend classes, no classes after 4 pm, no classes starting before 9 am). (Still don't know how to implement)
  • Apply DP to overlapping subproblems in the backtrack function. (Haven't gotten around to it yet)
  • Fix type hints, organize code later.
  • Write documentation for each function.
  • Add a feature for utilizing similar subjects.
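One possible way to approach the criteria/filtering item in the list above is to treat each criterion as a small predicate and keep only the schedules that pass all of them. This is a sketch under assumptions, not existing project code: the (day, start, end) tuples in minutes since midnight, the day codes, and all function names are hypothetical.

```python
from typing import Callable

Meeting = tuple[str, int, int]        # (day, start_minute, end_minute)
Schedule = list[Meeting]
Criterion = Callable[[Schedule], bool]


def no_weekend_classes(schedule: Schedule) -> bool:
    """Reject schedules with any Saturday or Sunday meeting."""
    return all(day not in {"Sat", "Sun"} for day, _, _ in schedule)


def ends_by(latest_minute: int) -> Criterion:
    """Build a criterion: every meeting must end at or before latest_minute."""
    return lambda schedule: all(end <= latest_minute for _, _, end in schedule)


def starts_at_or_after(earliest_minute: int) -> Criterion:
    """Build a criterion: every meeting must start at or after earliest_minute."""
    return lambda schedule: all(start >= earliest_minute for _, start, _ in schedule)


def apply_criteria(schedules: list[Schedule], criteria: list[Criterion]) -> list[Schedule]:
    """Keep only the schedules that satisfy every criterion."""
    return [s for s in schedules if all(check(s) for check in criteria)]


early = [("Mon", 7 * 60, 8 * 60 + 30)]
weekend = [("Sat", 10 * 60, 11 * 60 + 30)]
late = [("Tue", 15 * 60, 17 * 60)]
good = [("Mon", 10 * 60, 11 * 60 + 30), ("Wed", 13 * 60, 14 * 60 + 30)]

criteria = [no_weekend_classes, ends_by(16 * 60), starts_at_or_after(9 * 60)]
kept = apply_criteria([early, weekend, late, good], criteria)
print(kept)
```

Keeping each rule as its own function means a frontend could switch criteria on and off without touching the schedule generator.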