
Automated SEO Audits with Slack + Python

In this age of extreme SEO competition, scheduled SEO audits have become essential to sustaining consistent traffic and rankings.

Of course, there are many SEO audit tools from well-known platforms, but they are costly and an unnecessary burden on your digital marketing budget.

At PEMAVOR, we’d like to share our expertise to fulfill your digital marketing needs.

So, here we are with another terrific Python script that will make your life easier!

With our Python: Compare the Content script, we showed a practical way of comparing your content against your strongest competitor’s.

Then we opened the door a bit further: Python Autosuggest Trends and Python Semantic Keyword Clustering let you come up with keyword ideas without paying extra fees. After all, we’re performance marketing experts, and we don’t want you to pay extra for anything, neither PPC nor SEO.

Now, it’s time to take a massive step into the mysterious and practical world of Python.

I will now show you how to set up your SEO monitoring solution in Slack using three audit jobs.

✔️ Add settings to Slack for notifications and file uploads

✔️ Audit Job #1: “Sitemap Status Code Checker”
Report the number of cases with status codes other than 20x.
Attach a file with each URL and its bad status code to the message.

✔️ Audit Job #2: “Internal Link Checker”
Check all internal links found on the website and report the number of cases with bad status codes.
Attach a file listing, for each bad case, the URL where the link was found, the link URL, the link status code, and the link anchor text.

✔️ Audit Job #3: “Missing Meta Description Checker”
Check all URLs for a missing meta description and report the number of cases.
Attach the URLs with a missing meta description as a file.

The running example below bundles these audits into a single script, and I believe you’ll come up with many more Python SEO audit routines as long as you’re creative. You can automate almost everything with Python.

Monitoring App in Slack

First, you’ll need a Slack workspace. Slack has a free plan, which is fine for this setup.

  1. Now that you have Slack, go to the Slack API apps page (api.slack.com/apps).
  2. Click on “Create New App.”
  3. Choose an app name, e.g., “SEO Audit,” and select your Slack workspace.
  4. You need to add the features for notifications and file uploads coming out of your Python script. Go to “OAuth & Permissions.”
  5. Under “Bot Token Scopes,” add the following OAuth scopes:

files:write
channels:join
chat:write

  6. Now, click “Install to Workspace,” and you’ll see your “OAuth Access Token.” Copy it and paste it into your Python script.
  7. With that, the Slack part is almost finished. Choose a channel where you want your messages to land, click the “Add apps” menu, and look for your newly created app. You can verify the token and channel with the quick check below.
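
Here is a minimal sketch for that check, using the same requests library as the audit script below. The token and channel values are placeholders; replace them with your own.

import requests

slack_token = "xoxb-your-bot-token"   # the OAuth Access Token copied in step 6
slack_channel = "seo-monitoring"      # placeholder - use the channel where you added the app

# auth.test confirms the token is valid and shows which bot user it belongs to
auth = requests.post("https://slack.com/api/auth.test",
                     headers={"Authorization": f"Bearer {slack_token}"}).json()
print(auth.get("ok"), auth.get("user"))

# chat.postMessage sends a short test message to the channel
msg = requests.post("https://slack.com/api/chat.postMessage",
                    data={"token": slack_token, "channel": slack_channel,
                          "text": "SEO Audit bot is connected."}).json()
print(msg.get("ok"), msg.get("error"))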

3 Simple SEO Audits in Python

As I already mentioned, this is just a blueprint for building your own SEO audits. You can add as many routines as you wish.

Just don’t forget to change the sitemap URL and add your own Slack OAuth Access Token, and then you’re more than ready.

Here is the Python code:

# Pemavor.com SEO Monitoring with Slack Notifications
# Author: Stefan Neefischer

import requests
from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup
import advertools as adv
import sys
import json
import time
import pandas as pd
import warnings

warnings.filterwarnings("ignore")

def slack_notification_message(slack_token, slack_channel, message):
    # Send a plain text notification to the given Slack channel
    data = {
        'token': slack_token,
        'channel': slack_channel,
        'text': message
    }
    url_chat = 'https://slack.com/api/chat.postMessage'
    response = requests.post(url=url_chat, data=data)

def slack_notification_file(slack_token, slack_channel, filename, filetype):
    # Upload a file to the given Slack channel via the files.upload method
    url = "https://slack.com/api/files.upload"
    querystring = {"token": slack_token}
    payload = {"channels": slack_channel}
    file_upload = {"file": (filename, open(filename, 'rb'), filetype)}
    # requests sets the multipart Content-Type (including the boundary) itself
    response = requests.post(url, data=payload, params=querystring, files=file_upload)

def getStatuscode(url):
    # Return the HTTP status code of `url` and whether a meta description is present
    # (1 = found, -1 = missing or request failed)
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
        r = requests.get(url, headers=headers, verify=False, timeout=25, allow_redirects=False)
        soup = BeautifulSoup(r.text, "html.parser")
        metas = soup.find_all('meta')
        description = [meta.attrs['content'] for meta in metas if 'name' in meta.attrs and meta.attrs['name'] == 'description']
        if len(description) > 0:
            des = 1
        else:
            des = -1
        return r.status_code, des
    except:
        return -1, -1

def is_valid(url):
    """
    Checks whether `url` is a valid URL.
    """
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

def get_all_website_links(url):
    """
    Returns all URLs found on `url` that belong to the same website.
    """
    # internal links found on `url`, stored as [href, anchor text] pairs
    internal_urls = list()
    # domain name of the URL without the protocol
    domain_name = urlparse(url).netloc
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
    r_content = requests.get(url, headers=headers, verify=False, timeout=25, allow_redirects=False).content
    soup = BeautifulSoup(r_content, "html.parser")
    for a_tag in soup.find_all("a"):
        href = a_tag.attrs.get("href")
        if href == "" or href is None:
            # empty href attribute
            continue
        # join the URL if it's relative (not an absolute link)
        href = urljoin(url, href)
        parsed_href = urlparse(href)
        # remove URL GET parameters, URL fragments, etc.
        href = parsed_href.scheme + "://" + parsed_href.netloc + parsed_href.path
        if not is_valid(href):
            # not a valid URL
            continue
        if any(href == link[0] for link in internal_urls):
            # already collected on this page
            continue
        if domain_name not in href:
            # external link
            continue
        internal_urls.append([href, a_tag.string])
    return internal_urls

def get_sitemap_urls(site):
    # Read the XML sitemap with advertools and return the list of URLs
    sitemap = adv.sitemap_to_df(site)
    sitemap_urls = sitemap['loc'].dropna().to_list()
    return sitemap_urls

def sitemap_internallink_status_code_checker(site, SLEEP, slack_token, slack_channel):
    print("Start scraping internal links for all sitemap urls")
    sitemap_urls = get_sitemap_urls(site)
    sub_links_dict = dict()
    for url in sitemap_urls:
        sub_links = get_all_website_links(url)
        sub_links_dict[url] = list(sub_links)

    print("Checking status code and meta description")
    scrapped_url = dict()
    description_url = dict()
    url_statuscodes = []
    for link in sub_links_dict.keys():
        int_link_list = sub_links_dict[link]
        for int_link in int_link_list:
            internal_link = int_link[0]
            linktext = int_link[1]
            if internal_link in scrapped_url.keys():
                # URL already checked - reuse the cached status and description
                check = [link, internal_link, linktext, scrapped_url[internal_link], description_url[internal_link]]
            else:
                linkstatus, descriptionstatus = getStatuscode(internal_link)
                scrapped_url[internal_link] = linkstatus
                description_url[internal_link] = descriptionstatus
                check = [link, internal_link, linktext, linkstatus, descriptionstatus]
                time.sleep(SLEEP)
            url_statuscodes.append(check)
    url_statuscodes_df = pd.DataFrame(url_statuscodes, columns=["url", "internal_link", "link_text", "status_code", "description_status"])

    # check status code for all sitemap urls
    sitemap_statuscodes = []
    for url in sitemap_urls:
        if url in scrapped_url.keys():
            check = [url, scrapped_url[url]]
        else:
            linkstatus, descriptionstatus = getStatuscode(url)
            check = [url, linkstatus]
            time.sleep(SLEEP)
        sitemap_statuscodes.append(check)
    sitemap_statuscodes_df = pd.DataFrame(sitemap_statuscodes, columns=["url", "status_code"])

    # statistics, then send to Slack
    strstatus = ""
    df_internallink_status = url_statuscodes_df[url_statuscodes_df["status_code"] != 200]
    if len(df_internallink_status) > 0:
        df_internallink_status = df_internallink_status[["url", "internal_link", "link_text", "status_code"]]
        df_internallink_status["status_group"] = (df_internallink_status['status_code'] / 100).astype(int) * 100
        for status in df_internallink_status["status_group"].unique():
            ststatus = f'{status}'
            noUrls = len(df_internallink_status[df_internallink_status["status_group"] == status])
            sts = ststatus[:-1] + 'X'
            if sts == 'X':
                sts = "-1"
            strstatus = f">*{noUrls}* internal links with status code *{sts}*\n" + strstatus
        df_internallink_status = df_internallink_status[["url", "internal_link", "link_text", "status_code"]]
        df_internallink_status.to_csv("internallinks.csv", index=False)
    else:
        strstatus = ">*Great news!* There are no internal links with a bad status code\n"

    strdescription = ""
    df_description = url_statuscodes_df[url_statuscodes_df["description_status"] == -1]
    if len(df_description) > 0:
        df_description = df_description[["internal_link", "status_code", "description_status"]]
        df_description = df_description.drop_duplicates(subset=["internal_link"])
        df_description.rename(columns={'internal_link': 'url'}, inplace=True)
        df_description.to_csv("linksdescription.csv", index=False)
        lendesc = len(df_description)
        strdescription = f">*{lendesc}* urls that don't have a *meta description*.\n"
    else:
        strdescription = ">*Great news!* There are no urls without a *meta description*\n"

    sitemapstatus = ""
    df_sitemap_status = sitemap_statuscodes_df[sitemap_statuscodes_df["status_code"] != 200]
    if len(df_sitemap_status) > 0:
        df_sitemap_status = df_sitemap_status[["url", "status_code"]]
        df_sitemap_status["status_group"] = (df_sitemap_status['status_code'] / 100).astype(int) * 100
        for status in df_sitemap_status["status_group"].unique():
            ststatus = f'{status}'
            noUrls = len(df_sitemap_status[df_sitemap_status["status_group"] == status])
            sts = ststatus[:-1] + 'X'
            if sts == 'X':
                sts = "-1"
            sitemapstatus = f">*{noUrls}* urls with status code *{sts}*\n" + sitemapstatus
        df_sitemap_status = df_sitemap_status[["url", "status_code"]]
        df_sitemap_status.to_csv("sitemaplinks.csv", index=False)
    else:
        sitemapstatus = ">*Great news!* There is no url in the sitemap with a bad status code\n"

    if (len(df_sitemap_status) + len(df_internallink_status) + len(df_description)) > 0:
        message = f"After analysing {site} sitemap: \n" + strstatus + strdescription + sitemapstatus + "For more details see the attached files."
    else:
        message = f"After analysing {site} sitemap: \n" + strstatus + strdescription + sitemapstatus

    print("Send slack notifications")
    # send notification and report files to slack
    slack_notification_message(slack_token, slack_channel, message)
    if len(df_sitemap_status) > 0:
        slack_notification_file(slack_token, slack_channel, "sitemaplinks.csv", "text/csv")
    if len(df_internallink_status) > 0:
        slack_notification_file(slack_token, slack_channel, "internallinks.csv", "text/csv")
    if len(df_description) > 0:
        slack_notification_file(slack_token, slack_channel, "linksdescription.csv", "text/csv")

# Enter your XML sitemap
sitemap = "https://www.pemavor.com/sitemap.xml"
SLEEP = 0.5  # Time in seconds the script should wait between requests

#-------------------------------------------------------------------------
# Enter your Slack OAuth token here
slack_token = "XXXX-XXXXXXXX-XXXXXX-XXXXXXX"
# Change the Slack channel to your target one
slack_channel = "SEO Monitoring"

sitemap_internallink_status_code_checker(sitemap, SLEEP, slack_token, slack_channel)

Where to Run and Schedule your Scripts?

  • It’s best to host your script in the cloud. We use Cloud Functions or Cloud Run, triggered by Pub/Sub.
  • Or, you can simply use a small virtual server, as provided by many web hosting services. Since they generally run on Linux, add your Python code there and schedule it with good old crontab (see the example entry below).
  • A Raspberry Pi is also a good solution if you want to hack around a little bit. You can run your own home-based Linux server 24×7. It is pretty cheap, around $60, and mobile, so you can place and hide it somewhere, maybe from your wife ☺. A perfect project for Covid lockdowns!
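
As an illustration of the crontab option: assuming the script above is saved as seo_monitoring.py on a Linux machine (the filename, paths, and schedule here are placeholders you would adjust), an entry like this runs the audit once a week:

# run the SEO audit every Monday at 07:00 and append the output to a log file
0 7 * * 1 /usr/bin/python3 /home/youruser/seo_monitoring.py >> /home/youruser/seo_audit.log 2>&1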

Quite fun – not only in boring Covid-19 times: Automating your stuff with a Raspberry Pi home server
