Continuing from the last exercise: detecting titles with Korean letters was easy, but excluding them wasn’t. I asked Allison for help figuring out how to filter out the titles containing Korean, so that ideally only English titles are collected.
The problem was that I sometimes visited content that was in neither Korean nor English. After reviewing it, I found that most of it was Japanese, so I excluded those titles as well, despite their small number.
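Here is a minimal sketch of this kind of filter, assuming the titles are plain Python strings. It is not necessarily the approach Allison suggested. The names `HANGUL`, `KANA`, `keep_english`, and the sample titles are made up for illustration. It checks Unicode ranges for Hangul and for Japanese kana only, so a title written entirely in kanji would slip through.

```python
import re

# Hangul syllables, Hangul Jamo, and Hangul compatibility Jamo
# (an assumption about what counts as "Korean letters" here).
HANGUL = re.compile(r"[\uac00-\ud7a3\u1100-\u11ff\u3130-\u318f]")

# Hiragana and Katakana blocks only; kanji are deliberately not checked.
KANA = re.compile(r"[\u3040-\u30ff]")


def is_korean(title):
    """True if the title contains at least one Hangul character."""
    return bool(HANGUL.search(title))


def is_japanese(title):
    """True if the title contains at least one kana character."""
    return bool(KANA.search(title))


def keep_english(titles):
    """Drop every title that contains Korean or Japanese characters."""
    return [t for t in titles if not is_korean(t) and not is_japanese(t)]


# Hypothetical sample data, not taken from the real history.
sample_titles = [
    "How to bake bread",
    "김치찌개 만드는 법",
    "東京の天気 ニュース",
    "Seoul travel guide 서울",
]
english_only = keep_english(sample_titles)
```

Note that a title mixing English and Korean is dropped too, since a single Hangul character is enough to exclude it. Whether that is the right call depends on how strict the "English only" set should be.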
The next step was to go into more detail. One of the things I tried was checking how usage of the same service differs between the two languages. It will be more useful if I can collect interesting keywords and filter the results based on them.
A project by Google Fonts, saved in this post as a reference. Not sure how relevant it will be, but it was interesting to see the same content in different compositions.