The file contains metadata and linguistic analysis of forum posts from 2002 to 2011. Working with 20,000 posts randomly selected from the original file, the task was to:
- Analyse activity and language on the forum over time
- Analyse the language used by groups
- Identify social networks online
This was done by cleaning the dataset, creating graphs to visualise the results, and running null hypothesis tests to check whether the patterns found were statistically significant.
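
As a rough illustration of the hypothesis-testing step, the sketch below runs a two-sided permutation test on the difference in means between two groups. This summary does not say which tests the report actually used, so the permutation test is a stand-in. The function name `permutation_test`, its parameters, and the sample values are all hypothetical and not taken from the dataset.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: both groups come from the same distribution, so
    the group labels are exchangeable. Returns an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical values of one linguistic measure for two groups of posts.
group_a = [1, 2, 3, 4, 5]
group_b = [101, 102, 103, 104, 105]
p_value = permutation_test(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the difference between the groups is unlikely under the null hypothesis; it does not prove that the difference is real.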
The full report is available in my private GitHub repository: https://github.com/TingHanGan/FIT3152_Assignment1