Tokenization and noise removal are staples of almost all text pre-processing pipelines. However, some data may require further processing through text normalization. Text normalization is a catch-all term for various text pre-processing tasks. In the next few exercises, we'll cover a few of them.
The simplest of these approaches is to change the case of a string. We can use Python's built-in string methods .upper() and .lower() to make a string all uppercase or all lowercase:
my_string = 'tHiS HaS a MiX oF cAsEs'
print(my_string.upper())  # THIS HAS A MIX OF CASES
print(my_string.lower())  # this has a mix of cases
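To see why case normalization is useful, here is a minimal sketch that lowercases a small list of tokens. The tokens list and the normalize_case helper are hypothetical, chosen only for illustration; they are not part of the exercise. After lowercasing, spelling variants such as "Python" and "PYTHON" collapse into a single form, so they are counted as the same word.

```python
# Hypothetical sample tokens; not taken from the exercise data.
tokens = ["Python", "PYTHON", "python", "PyThOn", "Java"]


def normalize_case(words):
    """Return a new list with every string converted to lowercase."""
    return [word.lower() for word in words]


normalized = normalize_case(tokens)
print(normalized)  # ['python', 'python', 'python', 'python', 'java']

# Before normalization every variant is a distinct string;
# afterwards only two unique words remain.
print(len(set(tokens)))      # 5
print(len(set(normalized)))  # 2
```

The same pattern works with .upper(); which case you pick matters less than applying it consistently across the whole corpus.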
Make all the characters in brands lowercase and save the results to
Make all the letters in brands uppercase and save the results to