I recently undertook learning Python to solve a challenge: parsing an RSS feed from Microsoft for their Office 365 URLs and IPs. I had attempted to learn Python in the past just by reading online tutorials, without much success. I had also always been skeptical of learning via YouTube, but I noticed that my children were very successful at using the service to teach themselves useful skills.
I did some research and discovered that Google had a nice learning curriculum for Python that incorporated YouTube videos, online documentation, and corresponding exercises. My overall experience with the Google curriculum was good, and I really learned the basics of the language well. Of course, I still needed to research which modules to incorporate into my program, but the course got me going down the right path.
I would recommend leveraging their course. For those interested, my script reads the RSS feed from the Microsoft URL and then parses it into an output.csv file. Here is the full script:
```python
##
#
# 12/29/2017
# XML Extract
# Take MS XML for O365 URLS and IPs and move to CSV
# Outputs with headers
#
##
"""Lightweight extract program to take in URL, parse XML and output CSV file"""

import re
import datetime
import urllib.request
from xml.dom import minidom


# Define a main() function
def main():
    # hard-coded date to search after
    checkDate = datetime.datetime(2017, 11, 1)
    # print(checkDate)

    # MS URL to RSS XML file
    url = 'https://support.office.com/en-us/o365ip/rss'
    xmldoc = minidom.parse(urllib.request.urlopen(url))

    outputList = []
    headerStr = 'Pub Date, Action, Effective Date, Product impact, Express Route, IP/FQDN, Notes \n'
    outputList.append(headerStr)

    rss = xmldoc.documentElement
    items = rss.getElementsByTagName('item')
    for item in items:
        datetime_object = datetime.datetime.strptime(
            item.getElementsByTagName('pubDate')[0].firstChild.data,
            '%a, %d %b %Y %H:%M:%S %Z')
        if datetime_object > checkDate:
            description = item.getElementsByTagName('description')[0].firstChild.data
            description = re.sub(r'\n', ' ', description)
            splList = description.split(";")
            if len(splList) > 1:
                ##
                # Split out notes field / will repeat the append to each
                # row to help grouping
                # Need to add some logic if the field just contains notes
                ##
                dataSplit = splList[1].split("Notes:")
                if len(dataSplit) > 1:
                    dataList = dataSplit[0].split(',')
                    for data in dataList:
                        match = re.search(r'(\d/\[)(.+)(\])', data)
                        # Split would not work due to use of periods in IPs
                        colList = match.group(2).split('.')
                        # second regular expression to parse content
                        # extracted between square brackets
                        match2 = re.search(r'(.+\.)(.+\.)(.+\.)(.+)', match.group(2))
                        # useful debug step to see FQDNs or IPs
                        # print("here " + match2.group(4))
                        ##
                        # Construct Output String before adding to file
                        ##
                        outputStr = datetime_object.strftime("%m/%d/%Y") + ',' + splList[0] + ','
                        outputStr = (outputStr + match2.group(1) + "," + match2.group(2) + ","
                                     + match2.group(3) + "," + match2.group(4) + ",")
                        outputStr = outputStr + dataSplit[1] + ", \n"
                        outputList.append(outputStr)
                else:
                    print('hi' + splList[0])
                    outputList.append(datetime_object.strftime("%m/%d/%Y")
                                      + ', , , , , ,' + splList[0] + '\n')

    # construct and write the file
    with open('output.csv', 'w') as outf:
        outf.write(''.join(outputList) + '\n')


# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
    main()
```
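The two parsing steps the script leans on can be tried in isolation: `datetime.strptime` with the RFC 822-style date format that RSS feeds use, and `re.search` to pull the content out from between the square brackets. The sample strings below are made up for illustration, not taken from the live feed:

```python
import datetime
import re

# Parse an RSS-style pubDate string (sample value) and compare it
# against the script's hard-coded cutoff date.
pub = datetime.datetime.strptime('Fri, 29 Dec 2017 10:30:00 GMT',
                                 '%a, %d %b %Y %H:%M:%S %Z')
print(pub > datetime.datetime(2017, 11, 1))  # True

# Extract the content between "n/[" and "]" the same way the script does;
# group(2) is everything inside the brackets.
match = re.search(r'(\d/\[)(.+)(\])', '1/[13.107.64.0/18]')
print(match.group(2))  # 13.107.64.0/18
```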