Dear regex, Merry Christmas!
Ho ho ho, Christmas is coming! So today we want to write a script that automates reading and analyzing German children’s Christmas wishes.
Step 1 - The Data.
The whole world is getting more and more digital. A Christmas elf converted all the children’s letters to a digital format, saved them as a JSON file, and uploaded them.
First of all, your script should get the Christmas wishes from the web, using the module urllib or requests. [1]
The full URL is http://fsr.github.io/python-lessons/misc/christmas.json
Then parse the data using the json module [2] to get it into a Python-readable format. Now analyze the data you got. Does it follow a pattern?
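As a minimal sketch of this step, the download and the parsing can be kept in two small functions so that the parsing part can be tried without a network connection. The function names `parse_wishes` and `fetch_wishes` are made up for this example, and nothing here assumes a particular structure of the JSON file; finding that out is part of the exercise.

```python
import json
import urllib.request

URL = "http://fsr.github.io/python-lessons/misc/christmas.json"


def parse_wishes(raw: bytes):
    """Decode the downloaded bytes as UTF-8 and parse them as JSON."""
    return json.loads(raw.decode("UTF-8"))


def fetch_wishes(url: str = URL):
    """Download the JSON file and return the parsed Python object."""
    with urllib.request.urlopen(url) as response:
        return parse_wishes(response.read())
```

Calling `fetch_wishes()` and printing the type and the first few entries of the result is a quick way to see which pattern the data follows.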
Step 2 - Sort out the Spam.
Some kids were not able to write a proper wish letter or just wanted to send Santa some spam mail. Those letters are easy to recognize, since they do not start with “Lieber Weihnachtsmann, …” (“Dear Santa, …”).
Start sorting the wish lists by saving the names of the kids who were not able or willing to write a good letter in a new list for naughty children. We will need those names for step 5.
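The spam check could look roughly like the sketch below. It assumes, purely for illustration, that the parsed data is a dict mapping each child's name to the letter text; the real file may be structured differently. The helper name `split_letters` is also made up.

```python
import re

# A valid letter has to begin with this greeting (leading whitespace is tolerated).
GREETING = re.compile(r"\s*Lieber Weihnachtsmann,")


def split_letters(letters):
    """Split a name -> letter mapping into valid letters and naughty names."""
    valid = {}
    naughty = []
    for name, text in letters.items():
        # re.match only matches at the beginning of the string.
        if GREETING.match(text):
            valid[name] = text
        else:
            naughty.append(name)
    return valid, naughty
```

Using `match` instead of `search` matters here: a letter that mentions the greeting somewhere in the middle should still count as spam.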
Step 3 - “He knows if you’ve been bad or good…”
Find out who was naughty and who was nice, or at least check whether the child was nice all the time.
Check with a regex whether the wish list contains 'immer lieb', 'immer artig', or 'immer brav' (each roughly “always good”).
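One possible regex for this check uses an alternation for the three adjectives. This is only a sketch; the function name `claims_to_be_nice` is made up, and the pattern is case-sensitive, which may or may not fit the real letters.

```python
import re

# "immer" followed by one of the three accepted adjectives.
NICE = re.compile(r"immer (?:lieb|artig|brav)")


def claims_to_be_nice(letter: str) -> bool:
    """Return True if the letter contains one of the three phrases."""
    # search looks anywhere in the text; a single match is enough here.
    return NICE.search(letter) is not None
```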
Step 4 - What does Santa have to buy?
Santa doesn’t want to read whole letters. He wants to get a list of items to buy for every good child. [3]
The wishes follow a pattern (every new wish starts with a -). [4] [5]
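A sketch of the extraction with `finditer` and a capturing group follows. It assumes that every wish sits on its own line and that the line begins with the dash; if the real letters separate wishes differently, the pattern needs adjusting. The function name `extract_wishes` is made up for this example.

```python
import re

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# The group captures the wish text without the dash and surrounding blanks.
WISH = re.compile(r"^[ \t]*-[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_wishes(letter: str):
    """Return the list of wishes found in one letter."""
    return [match.group(1) for match in WISH.finditer(letter)]
```

Applied to every good child's letter, this gives the values for a dict that maps each name to a shopping list, for example `{name: extract_wishes(text) for name, text in good_letters.items()}`, where `good_letters` stands for whatever the previous steps produced.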
Step 5 - Results for Santa!
Gather all your results and print them out in a nicely readable format, like:
Note that children that didn’t write a correct wishlist are naughty in Santa’s eyes, too!
Also dump your generated list to a text file that can be sent to Santa later.
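The final step could be sketched as follows. The report layout is just one invented example of a "nicely readable format", and the names `format_report` and `write_report` are hypothetical. The input is assumed to be a dict of shopping lists for the nice children plus a list of naughty names collected in the earlier steps.

```python
from pathlib import Path


def format_report(shopping_lists, naughty):
    """Build a plain-text report from the shopping lists and naughty names."""
    lines = ["Nice children:"]
    for name in sorted(shopping_lists):
        lines.append(f"  {name}: {', '.join(shopping_lists[name])}")
    lines.append("Naughty children:")
    for name in sorted(naughty):
        lines.append(f"  {name}")
    return "\n".join(lines) + "\n"


def write_report(report, path):
    """Write the report to a UTF-8 text file."""
    Path(path).write_text(report, encoding="UTF-8")
```

Printing the string returned by `format_report` and passing the same string to `write_report` keeps the console output and the file identical.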
Done!
Merry Christmas!
[1] Remember to import urllib.request and have a look at our last lesson regarding the URL library. Hint: use something like urllib.request.urlopen() with .read().decode("UTF-8") to get the data.
[2] Have a look at the docs if you need a little refresher on “How to read JSON data”.
[3] So start to build a new dict with an entry for every good child.
[4] In the previous steps we only needed a single match. Here we should use finditer to get an iterable of all available match objects.
[5] Do not forget: You can use groups to get the information out of them!