HackIdea is a platform to generate hackathon ideas. The platform will come in two forms:
- choice-based: users narrow down the ideas using filters
- chat-based: users describe their idea and it is generated in a conversational manner, ideally allowing for follow-up questions
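
As a rough illustration of the choice-based form, here is a minimal sketch of filter-style narrowing. Everything in it is an assumption made for the example: the `Idea` class, the `narrow_down` helper, the tag names, and the hard-coded sample ideas are all hypothetical and not part of HackIdea itself.

```python
from dataclasses import dataclass, field


@dataclass
class Idea:
    # Hypothetical shape for an idea: a title plus a set of filterable tags.
    title: str
    tags: set[str] = field(default_factory=set)


def narrow_down(ideas, required_tags):
    """Keep only the ideas that carry every selected tag."""
    required = set(required_tags)
    # `required <= idea.tags` is a subset check: all chosen filters must match.
    return [idea for idea in ideas if required <= idea.tags]


# Hypothetical sample data, hard-coded purely for illustration.
IDEAS = [
    Idea("Campus carpool matcher", {"mobile", "sustainability"}),
    Idea("Recipe generator from fridge photos", {"ai", "food"}),
    Idea("Carbon footprint tracker", {"ai", "sustainability"}),
]

matches = narrow_down(IDEAS, ["ai", "sustainability"])
print([idea.title for idea in matches])
```

Each extra filter the user picks shrinks the list further, and picking no filters returns every idea.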
These ideas have to come from somewhere though, right?
Goals:
- Learn web scraping with Python
- Fine-tune some models
- Implement some of that caching and data lake stuff I learnt about in class
- (stretch goal) create a robust API to provide the data
Phase 1: Scraper, the prepmaster
To get the data I needed, I had to "obtain" it from a reliable source. The whole process took 12 hours and 28 minutes. I think that was mostly down to my somewhat poor scraping logic, as this is my first (and last?) time scraping. Here is a screenshot taken at the end of the process: