Learning Python

I recently set out to learn Python in order to solve a specific problem: parsing the RSS feed that Microsoft publishes for its Office 365 URLs and IPs. I had tried to learn Python in the past just by reading online tutorials and was not very successful. I had also always been skeptical of learning via YouTube, but I noticed that my children were very successful at using the service to teach themselves useful skills.

I did some research and discovered that Google had a nice learning curriculum for Python that combined YouTube videos, online documentation, and corresponding exercises. My overall experience with the Google curriculum was good, and I learned the basics of the language well. I still needed to research which modules my program would need, but the course got me going down the right path.

I would recommend their course. For those interested, my script reads the RSS feed from the Microsoft URL and parses it into output.csv. The full script is below:


##
#
# 12/29/2017
# XML Extract
# Take MS XML for O365 URLS and IPs and move to CSV
# Outputs with headers 
#
#
##

"""
Lightweight extract program to take in URL, parse XML and output CSV file

"""
import sys
import re
from xml.dom import minidom  
import urllib.request
import datetime



# Define a main() function
def main():
  #hard coded date to search after
  checkDate = datetime.datetime(2017, 11, 1)
  #print (checkDate)
  #MS URL to RSS XML file
  url = 'https://support.office.com/en-us/o365ip/rss'
  xmldoc = minidom.parse(urllib.request.urlopen(url))
  outputList = []
  headerStr = 'Pub Date, Action, Effective Date, Product impact, Express Route, IP/FQDN, Notes \n'
  outputList.append(headerStr)
  rss = xmldoc.documentElement
  items = rss.getElementsByTagName('item')
  
  for item in items:
    datetime_object = datetime.datetime.strptime(item.getElementsByTagName('pubDate')[0].firstChild.data, '%a, %d %b %Y %H:%M:%S %Z')
    if datetime_object > checkDate:
      description = item.getElementsByTagName('description')[0].firstChild.data
      description = re.sub(r'\n', ' ', description)
      splList = description.split(";")
      if len(splList) > 1:
        ##
        # Split out notes field / will repeat the append to each 
        # row to help grouping
        # Need to add some logic if the field just contains notes
        ##
        dataSplit = splList[1].split("Notes:")
        if len(dataSplit) > 1:
          dataList = dataSplit[0].split(',')
          for data in dataList:
            match = re.search(r'(\d/\[)(.+)(\])', data)
            if match is None:
              # skip entries that do not contain a bracketed section
              continue
            # second regular expression to parse content extracted between square brackets
            match2 = re.search(r'(.+\.)(.+\.)(.+\. )(.+)', match.group(2))
            if match2 is None:
              continue
            #useful debug step to see FQDNs or IPs
            #print("here " + match2.group(4))
            # Split would not work due to use of periods in IPs
            ##
            # Construct Output String before adding to file
            ##
            outputStr = datetime_object.strftime("%m/%d/%Y") + ',' + splList[0] + ','
            outputStr = outputStr + match2.group(1) + "," + match2.group(2) + "," + match2.group(3) + "," + match2.group(4) + ","
            outputStr = outputStr + dataSplit[1] + ", \n"
            outputList.append(outputStr)
      else:
        print('hi' + splList[0])
        outputList.append(datetime_object.strftime("%m/%d/%Y") + ', , , , , ,' + splList[0] + '\n')
  # construct and write the file
  # (joined into outputText rather than a variable named str, which would shadow the built-in)
  outputText = ''.join(outputList)
  with open('output.csv', 'w') as outf:
    outf.write(outputText + '\n')
            
              

        


# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
  main()
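
One limitation of building each row by string concatenation, as the script above does, is that a comma inside a description or note shifts the CSV columns. Below is a minimal sketch of one alternative, assuming the standard library's csv module is acceptable for the output step. The sample feed text and the function names items_after and to_csv are hypothetical, made up for illustration. The real Microsoft feed has a richer description format that still needs the regular-expression parsing shown above.

```python
import csv
import datetime
import io
from xml.dom import minidom

# Hypothetical two-item feed used only for illustration; not real feed data.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
<item><pubDate>Fri, 01 Dec 2017 08:00:00 GMT</pubDate>
<description>Sample change, with a comma</description></item>
<item><pubDate>Mon, 02 Oct 2017 08:00:00 GMT</pubDate>
<description>Older sample entry</description></item>
</channel></rss>"""


def items_after(xml_text, cutoff):
    """Return (pub_date, description) pairs published after cutoff."""
    doc = minidom.parseString(xml_text)
    rows = []
    for item in doc.getElementsByTagName('item'):
        pub_text = item.getElementsByTagName('pubDate')[0].firstChild.data
        # Same format string as the script above; yields a naive datetime
        pub_date = datetime.datetime.strptime(pub_text, '%a, %d %b %Y %H:%M:%S %Z')
        if pub_date > cutoff:
            desc = item.getElementsByTagName('description')[0].firstChild.data
            # Collapse newlines and repeated whitespace into single spaces
            rows.append((pub_date, ' '.join(desc.split())))
    return rows


def to_csv(rows):
    """Render rows as CSV text; csv.writer quotes fields that contain commas."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerow(['Pub Date', 'Description'])
    for pub_date, desc in rows:
        writer.writerow([pub_date.strftime('%m/%d/%Y'), desc])
    return buf.getvalue()


if __name__ == '__main__':
    # The cutoff date here is an assumed value for the example
    print(to_csv(items_after(SAMPLE_RSS, datetime.datetime(2017, 11, 1))))
```

Because csv.writer wraps any field containing a comma in quotes, a spreadsheet program opening the result should keep each description in a single column.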

 
