Rate My Professor: is it always bad?

As a college student myself, I can tell you that you would have to hold me hostage with no internet access to keep me from checking RateMyProfessor reviews before enrolling in any course. If a professor has negative review after negative review after negative review, including one saying “I would rate this professor 0/5 if I can, avoid him at any rate especially those who enjoy this field in computer science” (yes, this is an actual review at my university), then I might think twice before signing up for their course.

Yet, it seems other students don’t share my reliance on RMP, making such ludicrous statements as “it’s all negative reviews since you’re more likely to leave a negative review than a positive one.” This raises the question… are the reviews on RateMyProfessor inherently more negative than positive? But that seems too easy, so let’s go one step further into my only issue with RMP: do the extensive written comments for each review match the single-number estimates for course quality and difficulty? And just for the fun of it, what are the highest (and lowest) reviewed courses on RMP? It’ll take some scraping and processing, but in just a few minutes, we will have the answer!

Lots of Web Scraping

Let’s get through the messy logistics first before we can get into the analysis: web scraping. As a quick aside: I used Jake Daniel’s wonderful web-scraping R code template you can read more about here and the Google Chrome tool SelectorGadget found here to make this a bit easier on myself. With that, we’ll start our journey at the RMP professor search page for my university, Illinois Institute of Technology. 

The first web scrape will start here, where we will collect the name, subject, and URL to the reviews page for each professor. One minor challenge was scraping more than 20 professors at a time without hard-coding a limit to search up to (if the limit is 654 today, next semester it could be 672, and we would miss those newly added professors). To do this, we scrape the total number of results, round it to a multiple of 20, and iterate through each multiple of 20 up to that limit, setting the website offset to each value to load a new page of professors. Here’s my code:
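The original R listing is not reproduced here. As a stand-in, here is a minimal Python sketch of just the offset arithmetic described above. The function name `page_offsets` is hypothetical, and rounding the total up (rather than down) is an assumption made so that the last, partially filled page is not skipped.

```python
import math

PAGE_SIZE = 20  # the search page shows 20 professors at a time


def page_offsets(total_results, page_size=PAGE_SIZE):
    """Return the offset of every results page needed to cover total_results."""
    if total_results <= 0:
        return []
    # Round the total up to the next multiple of page_size (assumption:
    # rounding up keeps the final, partially filled page).
    limit = math.ceil(total_results / page_size) * page_size
    return list(range(0, limit, page_size))


offsets = page_offsets(654)
print(len(offsets), offsets[:3], offsets[-1])  # 33 [0, 20, 40] 640
```

Each offset would then be plugged into the search URL in turn. Because the total is scraped fresh on every run, newly added professors are picked up without editing the code.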

Great – so now we have all the URLs that we will visit to scrape the information. The world is almost our oyster, so what information shall we scrape here? I chose to scrape the professor’s name, subject, and RMP URL. Here we go:

Our professor_links dataframe looks pretty messy right now… 

| Name <chr> | URL <chr> | Subject <chr> |
| --- | --- | --- |
| Stagliano, | /ShowRatings.jsp?tid=125537 | Illinois Institute of Technology, Science |
| Poros, | /ShowRatings.jsp?tid=495895 | Illinois Institute of Technology, Sociology |
| Saniie, Jafar | /ShowRatings.jsp?tid=54696 | Illinois Institute of Technology, Engineering |

… but after a bit of cleanup… 

… our professor_links dataframe finally looks nice and tidy: 

| Name <chr> | URL <chr> | Subject <chr> |
| --- | --- | --- |
| Stagliano | http://www.ratemyprofessors.com/ShowRatings.jsp?tid=125537 | Science |
| Poros | http://www.ratemyprofessors.com/ShowRatings.jsp?tid=495895 | Sociology |
| Jafar Saniie | http://www.ratemyprofessors.com/ShowRatings.jsp?tid=54696 | Engineering |
| … | … | … |
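The cleanup between the messy and tidy versions of professor_links can be sketched roughly as follows. This is a Python illustration inferred from the before-and-after rows shown here, not the author's R code. The helper names are hypothetical.

```python
BASE_URL = "http://www.ratemyprofessors.com"
SCHOOL_PREFIX = "Illinois Institute of Technology, "


def clean_name(raw):
    """Turn 'Last, First' into 'First Last'; drop a dangling trailing comma."""
    last, _, first = raw.partition(",")
    return f"{first.strip()} {last.strip()}".strip()


def clean_url(raw):
    """Turn the relative link into an absolute URL."""
    return BASE_URL + raw


def clean_subject(raw):
    """Strip the school name that precedes the subject."""
    if raw.startswith(SCHOOL_PREFIX):
        return raw[len(SCHOOL_PREFIX):]
    return raw


row = ("Saniie, Jafar", "/ShowRatings.jsp?tid=54696",
       "Illinois Institute of Technology, Engineering")
print(clean_name(row[0]), clean_url(row[1]), clean_subject(row[2]))
```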

Our objective is now clear: copy and paste what we just did above with our web scraper and use it again to get individual ratings for each of the URLs in our dataframe. Sounds simple, but here comes a classic 🚨🚨🚨 ‘uh-oh, I don’t have that much time to run a web scraper for hundreds and hundreds of professors’ moment 🚨🚨🚨. So, as a simple example, I filtered the list down to the professors in my department (Computer Science), which still took about a minute to run:

In this current implementation, there is a glaring issue – for each professor, the maximum number of reviews we can scrape is capped at 20. I could not figure out how to scrape any more without telling the website to load more reviews – perhaps a future update to this blog post can address this issue. Regardless, keeping only the 20 most recent reviews for each professor ensures that the reviews are not incredibly old (both professors’ teaching styles and their courses change quite a bit over many years) and that newer reviews are prioritized, as they should be.

Slight issue aside, the dataframe we just scraped together is definitely a bit messy. One particular issue I found was with the course – some included a “CS” in front, others were reviews for a single course, some two courses combined in one review, and some just said the full course name with no abbreviations (which is more work than just writing a number in my opinion). 

| Name <chr> | Course <chr> | Rating <fctr> | Quality <dbl> | Difficulty <dbl> | Comments <chr> |
| --- | --- | --- | --- | --- | --- |
| \r\n David \r\n Grossman\r\n \r\n | DATAMINING | good | 4.0 | 5.0 | \r\n No Comments\r\n |
| \r\n Michael \r\n Lee\r\n \r\n | CS331CS350 | awful | 1.0 | 5.0 | \r\n Isn’t clear whatsoever about grades, assignments, lectures and much more. Overall, bad teacher.\r\n |
| \r\n Irina \r\n Matveeva\r\n \r\n | CS422CS522 | awesome | 4.5 | 2.0 | \r\n Teaching and motivations are just amazing… opens up goods opportunities fpr students through guests from industry and real projects…\r\n |
| … | … | … | … | … | … |

It’s a bit crude, but the code below cleans up the dataframe as well as possible in a completely automated way. To address the weird course issue, I mark every non-CS course or course without a number as “Other” (if a CS professor teaches a non-CS course, I ignore it for now), strip all of the courses down to just their numbers, split six-digit numbers (AKA two courses reviewed in one) into two separate courses with the same review, and make it all pretty. The code below does a much better job of explaining it than I can.
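The author's R listing is not shown here. As a hypothetical stand-in, this Python sketch follows the rules just described: anything that is not a CS course with a number becomes "Other", and a six-digit number becomes two three-digit courses sharing the same review. The function names and the assumption that every course number has exactly three digits are mine.

```python
def course_numbers(course, dept="CS"):
    """Return the course numbers found in a raw course string, or ['Other']."""
    # Drop spaces and the department prefix, e.g. 'CS331CS350' -> '331350'.
    text = course.upper().replace(" ", "").replace(dept, "")
    if not text.isdigit():
        return ["Other"]  # other departments, or spelled-out course names
    if len(text) == 3:
        return [text]
    if len(text) == 6:
        return [text[:3], text[3:]]  # two courses reviewed together
    return ["Other"]


def expand_reviews(reviews):
    """Duplicate each review once per course number it mentions."""
    rows = []
    for review in reviews:
        for number in course_numbers(review["Course"]):
            rows.append({**review, "Course_Number": number})
    return rows


sample = [{"Name": "Michael Lee", "Course": "CS331CS350", "Quality": 1.0}]
for row in expand_reviews(sample):
    print(row["Name"], row["Course"], row["Course_Number"])
```

Run on the Michael Lee review, this yields two rows, one for 331 and one for 350, matching the shape of the cleaned table.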

Finally, oh finally, we have clean data: 

| Name <chr> | Course <chr> | Rating <fctr> | Quality <dbl> | Difficulty <dbl> | Comments <chr> | Course_Number <chr> |
| --- | --- | --- | --- | --- | --- | --- |
| Michael Lee | CS331CS350 | awful | 1.0 | 5 | Isn’t clear whatsoever about grades, assignments, lectures and much more. Overall, bad teacher. | 331 |
| Michael Lee | CS331CS350 | awful | 1.0 | 5 | Isn’t clear whatsoever about grades, assignments, lectures and much more. Overall, bad teacher. | 350 |
| Kyle C Hale | CS350 | awesome | 5.0 | 4 | I’ve never written here about any prof, but Prof. Hale is one of those profs who teach exactly what you need… | 350 |
| … | … | … | … | … | … | … |

Let the analysis begin. 

Looking at the Best and Worst Courses

This task initially stumped me. If a class has a single negative review with quality 1.0 and another class has three negative reviews with quality 1.5, which class is worse? I believe the latter, but how can we quantify this? The answer is, actually, IMDb, believe it or not. 
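The formula meant here is presumably the weighted rating IMDb has published for its Top 250 list: WR = (v / (v + m)) × R + (m / (v + m)) × C, where R is the item's own mean rating, v its number of votes, m a minimum-votes threshold, and C the mean rating across everything. A minimal Python sketch of it applied to the two hypothetical classes above follows. The value m = 2 is an arbitrary assumption, and C = 3.23 borrows the average CS quality reported later in this post.

```python
def weighted_rating(mean_quality, n_reviews, prior_mean, min_reviews):
    """IMDb-style weighted rating: shrink a course's mean toward the overall mean."""
    v, m = n_reviews, min_reviews
    return (v / (v + m)) * mean_quality + (m / (v + m)) * prior_mean


PRIOR_MEAN = 3.23  # C: overall mean quality (taken from the post)
MIN_REVIEWS = 2    # m: hypothetical weight given to the overall mean

one_bad_review = weighted_rating(1.0, 1, PRIOR_MEAN, MIN_REVIEWS)
three_bad_reviews = weighted_rating(1.5, 3, PRIOR_MEAN, MIN_REVIEWS)
print(round(one_bad_review, 2), round(three_bad_reviews, 2))  # 2.49 2.19
```

With these assumed values, the class with three 1.5 reviews scores lower than the class with a single 1.0 review, because more reviews pull the estimate further from the overall mean. That matches the intuition above.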

Applying this true ‘Bayesian estimate’ formula to our CS professor reviews, we can conclude that the highest-reviewed (and lowest-reviewed) CS courses at Illinois Tech are…

CS 595 👍 and CS 553 👎, respectively. 

Just taking a look at the reviews table for these courses shows that this isn’t too hard to believe… 

| Name <chr> | Course <chr> | Rating <fctr> | Quality <dbl> | Difficulty <dbl> | Comments <chr> | Course_Number <chr> |
| --- | --- | --- | --- | --- | --- | --- |
| Ioan Raicu | CS553 | awful | 1 | 5 | Professor Ioan is by far the worst professor that I have taken at CS and that’s saying something. Terrible communicate… | 553 |
| Ioan Raicu | CS553 | awful | 1 | 5 | There will be general 3 program assignments and a big project this semester, but actually he delayed second program assignment that leads to a rush in the final semester…. | 553 |
| Ioan Raicu | CS553 | awful | 1 | 5 | I would rate this professor 0/5 if I can, avoid him at any rate especially those who enjoy this field in computer science… | 553 |
| … | … | … | … | … | … | … |

… compared with… 

| Name <chr> | Course <chr> | Rating <fctr> | Quality <dbl> | Difficulty <dbl> | Comments <chr> | Course_Number <chr> |
| --- | --- | --- | --- | --- | --- | --- |
| Kyle C Hale | CS595 | awesome | 5 | 4 | I would say he is the best and coolest Professor I have never met in IIT. Knowledgeable, skillful, programming assignment is not trivial, but he always explains everything… | 595 |
| Kyle C Hale | CS595 | awesome | 5 | 5 | If you Want to build an Operating System or would like to you should take his class. | 595 |
| Kyle C Hale | CS595 | awesome | 5 | 4 | Coolest professor ever. Research oriented, very highly skilled..!! | 595 |
| … | … | … | … | … | … | … |

And to think I almost took CS 553… 

Average Course Quality Rating 

Alright, I’m finally ready to answer the big-momma question: are a majority of the RateMyProfessor reviews inherently negative or positive (specifically in the Illinois Tech CS department, for starters)? A very simple bar plot below shows that, actually…

Even though I’ve already made my point that it’s not all negative, with an average quality of 3.23 and a median even higher (AKA the reviews are not mostly negative at all), it’s hard to summarize an entire course’s worth of thoughts in a single number. The most useful part of RMP’s system is the extensive comments reviewers leave for each course. If only there were a way to do some sort of “sentiment analysis” on the comments or something…

RMP Comments, meet NLP 

Yup – let’s do this. I’m no NLP expert; I truly only have experience working with Stanford’s CoreNLP package, but since the installation is inherently broken on macOS, I’ll be going with an alternative: sentimentr, which uses an augmented dictionary lookup to assign a numerical sentiment to each sentence. The more negative the score, the more negative the sentence. For example,

… results in… 

| sentence_id <int> | word_count <int> | sentiment <dbl> |
| --- | --- | --- |
| 1 | 9 | 0.0000000 |
| 2 | 13 | -0.1386750 |
| 3 | 15 | -0.3872983 |

… while happier sentences such as… 

| sentence_id <int> | word_count <int> | sentiment <dbl> |
| --- | --- | --- |
| 1 | 8 | 0.1767767 |
| 2 | 4 | 0.0000000 |
| 3 | 15 | 0.387298 |

You can read more about the package on its GitHub here.
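To give a feel for what a dictionary-lookup score is, here is a deliberately tiny Python sketch. It is not sentimentr itself. The lexicon, the negator list, and the function name are all made up for illustration, and the real package handles valence shifters (negators, amplifiers, and so on) far more carefully. The one idea it does borrow is scaling the summed word polarities by the square root of the sentence's word count, which is how the package's documentation describes its scores and which is consistent with the numbers in the tables above.

```python
import math
import re

# Hypothetical mini-lexicon; real polarity dictionaries hold thousands of words.
LEXICON = {"amazing": 1.0, "good": 0.75, "bad": -0.75, "worst": -1.0, "terrible": -1.0}
NEGATORS = {"not", "never", "no"}


def sentence_sentiment(sentence):
    """Sum word polarities (flipped after a negator) and divide by sqrt(word count)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        polarity = LEXICON.get(word, 0.0)
        # A negator directly before a polarized word flips its sign.
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total / math.sqrt(len(words))


print(sentence_sentiment("Overall, a bad teacher."))  # -0.375
print(sentence_sentiment("This class is not bad"))
```

Sentences with no words from the lexicon score exactly zero, which is why some rows in the tables above show a sentiment of 0.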

Throwing all of the comments extracted for CS courses at Illinois Tech into the sentiment analysis and plotting it shows…

… 

… 

… 

… exactly the same results as the quality – most reviews truly are more positive than negative. I know – it’s almost unbelievable! 

As a little bonus, I’ve included the exact sentiment analysis breakdown for each and every comment extracted for this analysis – it’s very pretty to look at and worth a read if you’d like a quick chuckle reading about students suffering in class for a semester. You can read that here (and I could not recommend reading it more). 

Conclusion 

It took about a day’s work, but I think our conclusions are valid in this small, controlled sample: RateMyProfessor reviews are not inherently more negative than positive, in both quality rating and comment sentiment (and also CS 553 is perhaps not the best course at Illinois Tech). To the interested reader out there, download the full source code (found right below this) and try it out with your school and your department and see what you get! The answers may surprise you! 

So the next time your friend decides to take a course without reading the reviews because they are convinced the reviews are disproportionately negative, you might want to send them this post first. They’ll thank you later. 

Enough CS – How about the entire school?! Update (as of 1/26/19) 

Alright, fine. You asked – I listened. How about this same analysis for every course at Illinois Tech? 

Deal. 

The source code is just about the same thing – run the scraper for a very long time to get every review (not just those for CS professors), run just about the same cleaning for all general courses, and let it run! Here’s what we get:

Using the same true ‘Bayesian’ rating system as before, our lowest-rated course is STILL CS 553, Cloud Computing, with a score of 1.92 (mega-ouch), and the best course is HUM 380, Topics in Humanities (as someone who has taken a HUM 380 course, yes, this sounds about right). Let’s inspect some of these reviews together then, shall we:

| Name <chr> | Course <chr> | Rating <fctr> | Quality <dbl> | Difficulty <dbl> | Comments <chr> | Course_Number <chr> |
| --- | --- | --- | --- | --- | --- | --- |
| Ioan Raicu | CS553 | awful | 1 | 5 | Professor Ioan is by far the worst professor that I have taken at CS and that’s saying something. Terrible communicate both verbally and written. His exams are very difficult and very unreasonable. He barely does one example of each calculation and will not go back to previous days material. He teaches the class as if you should already know this… | 553 |
| Ioan Raicu | CS553 | awful | 1 | 5 | There will be general 3 program assignments and a big project this semester, but actually he delayed second program assignment that leads to a rush in the final semester… | 553 |
| Ioan Raicu | CS553 | awful | 1 | 5 | I would rate this professor 0/5 if I can, avoid him at any rate especially those who enjoy this field in computer science. Too many tests, pop quiz, talk more than do. If you plan to take this professor’s course, be careful, there will be dragon. Choose other professor instead. | 553 |
| … | … | … | … | … | … | … |

… vs. … 

| Name <chr> | Course <chr> | Rating <fctr> | Quality <dbl> | Difficulty <dbl> | Comments <chr> | Course_Number <chr> |
| --- | --- | --- | --- | --- | --- | --- |
| Glenn Broadhead | HUM380 | awesome | 5.0 | 1 | I would reccomend Broadhead. He is a very kind grader and is also friendly and funny in class. I learned a lot through videos he put on. The only assignments were a 5 page and 10 page paper and a presentation at the end. | 380 |
| Catherine Bronson | HUM380 | awesome | 5.0 | 1 | Awesome class, awesome professor, very interesting topic. One or two essays, open book exams are easy if you’ve paid attention to interesting class discussion. | 380 |
| Teresa Moreno | HUM380 | awesome | 4.5 | 1 | Good teacher. Really interested in women’s issues and will make you think. If you show up and do the work you’ll get a good grade. | 380 |
| … | … | … | … | … | … | … |

As a CS major, this definitely worries me.

Looking again at our plots, we see that, actually, the quality of Illinois Tech courses is well above average, the average difficulty is right in the sweet spot, and even the sentiment analysis of our RateMyProfessor comments still shows more positives than negatives!

So now, school-wide, I think I’ve definitively made the case for RateMyProfessor! 


Full source code with exciting bonus features can be found here. A direct link to donate me money via hiring me for a job can be found here
