oxylabs/backlink-monitoring

Backlink checker

Backlink checker is a simple tool that checks backlink quality, identifies problematic backlinks, and reports them to a specified Slack channel.

The tool tries to reach each backlink, that is, a page that is supposed to contain a link to the referent page. If the backlink is reachable, the tool retrieves its HTML, checks whether the referent link is actually present, and looks at certain HTML elements that indicate the quality of the backlink.

Packages Required

The first step is to prepare the environment. The backlink checker is written in Python. The most common Python packages for building a web crawling tool are Requests and Beautiful Soup 4, a library for pulling data out of HTML. Also make sure the pandas package is installed, as it is used for some simple data wrangling.

These packages can be installed using the pip install command.

pip install beautifulsoup4 requests pandas

This installs all three required packages.

Important: Note that version 4 of Beautiful Soup is being installed here. Earlier versions are obsolete.

Checking backlinks

The script scrapes each backlink page and checks several quality signals:

  • whether the backlink is reachable
  • whether the backlink page contains a noindex element
  • whether the backlink page contains a link to the referent page
  • whether the link to the referent page is marked as nofollow

STEP 1: Check if backlink is reachable

The first step is to try to reach the backlink. This can be done using the Requests library's get() method.

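# This snippet is part of the get_page() function, hence the return statements
try:
    resp = requests.get(
        backlink,
        allow_redirects=True
    )
except requests.RequestException:
    # Connection errors, timeouts, invalid URLs, and similar failures
    return ("Backlink not reachable", "None")

# Any status other than 200 OK is treated as unreachable
response_code = resp.status_code
if response_code != 200:
    return ("Backlink not reachable", response_code)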

If the request returns any status other than 200 (such as 404 Not Found), or the backlink cannot be reached at all, the backlink is assigned the Backlink not reachable status.
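
The call above sets no timeout, so a server that never answers can stall the whole run. The sketch below shows one possible way to bound the wait. The helper name check_reachable, the http_get parameter, and the 10-second default are illustrative assumptions and are not part of the original script.

```python
from typing import Callable, Optional, Tuple


def check_reachable(
    backlink: str, http_get: Callable, timeout: float = 10.0
) -> Optional[Tuple[str, object]]:
    """Return a "not reachable" status tuple, or None if the page answered 200.

    http_get is any callable that accepts the same arguments as requests.get;
    passing it in keeps this sketch free of third-party imports.
    """
    try:
        resp = http_get(backlink, allow_redirects=True, timeout=timeout)
    except OSError:
        # requests.RequestException derives from IOError (an alias of OSError),
        # so connection errors, timeouts, and invalid URLs all end up here.
        # The string "None" mirrors the return value used in the README snippet.
        return ("Backlink not reachable", "None")
    if resp.status_code != 200:
        return ("Backlink not reachable", resp.status_code)
    return None
```

With Requests installed, the helper could be called as check_reachable(backlink, requests.get); a None result means the remaining checks can proceed.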

STEP 2: Check if backlink HTML has noindex element

To navigate the HTML of a backlink, a Beautiful Soup object needs to be created.

# `encoding` must be defined beforehand (see the complete code);
# passing None lets Beautiful Soup detect the encoding itself
bsObj = BeautifulSoup(resp.content, 'lxml', from_encoding=encoding)

If lxml is not installed yet, install it by running pip install lxml.

Beautiful Soup's find_all() method can be used to find <meta> tags whose content attribute contains noindex. If any are found, the backlink is assigned the Noindex status.

# Requires `import re`; find_all() returns an empty (falsy) list when nothing matches
if bsObj.find_all('meta', content=re.compile("noindex")):
    return ('Noindex', response_code)

STEP 3: Check if backlink HTML contains a link to a referent page

Next, check whether the HTML contains an anchor tag (<a>) whose href points to the referent page. If no such link is found, the backlink is assigned the Link was not found status.

# re.escape() keeps characters such as "." and "?" in the URL from acting as regex operators
elements = bsObj.find_all('a', href=re.compile(re.escape(our_link)))
if not elements:
    return ('Link was not found', response_code)

STEP 4: Check if referent page is marked as nofollow

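Finally, let's check whether the anchor element that links to the referent page carries the nofollow value, which is set in the element's rel attribute.

# Assumption: the first matching link is the one inspected.
# Beautiful Soup returns rel as a list of values, or nothing if the attribute is absent.
element = elements[0]
if 'nofollow' in element.get('rel', []):
    return ('Link found, nofollow', response_code)
return ('Link found, dofollow', response_code)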

Based on the result, let's assign either Link found, nofollow or Link found, dofollow status.
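
For reference, steps 2 to 4 can be condensed into a single function that works on an HTML string. The sketch below deliberately swaps Beautiful Soup for the standard library's html.parser so that it runs without extra packages. BacklinkScanner and classify_html are hypothetical names, plain substring matching stands in for the regular expressions, matching is made case-insensitive, and treating the first matching link as decisive is an assumption.

```python
from html.parser import HTMLParser


class BacklinkScanner(HTMLParser):
    """Collects the signals used in steps 2 to 4 while parsing a page."""

    def __init__(self, our_link):
        super().__init__()
        self.our_link = our_link
        self.noindex = False
        self.link_rels = []  # one list of rel tokens per matching <a> tag

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None (e.g. <a href>), hence the `or ""`
        attrs = dict(attrs)
        if tag == "meta" and "noindex" in (attrs.get("content") or "").lower():
            self.noindex = True
        elif tag == "a" and self.our_link in (attrs.get("href") or ""):
            self.link_rels.append((attrs.get("rel") or "").lower().split())


def classify_html(html, our_link):
    """Return the status string for a page that was already fetched."""
    scanner = BacklinkScanner(our_link)
    scanner.feed(html)
    scanner.close()
    if scanner.noindex:
        return "Noindex"
    if not scanner.link_rels:
        return "Link was not found"
    # Assumption: the first matching link decides the result
    if "nofollow" in scanner.link_rels[0]:
        return "Link found, nofollow"
    return "Link found, dofollow"
```

Keeping the classification separate from the HTTP request makes it possible to try the rules on saved HTML without touching the network.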

Assigning results to Pandas DataFrame

After getting the status for each backlink and referent link pair, let's collect this information (along with the response code from the backlink) in a pandas DataFrame.

rows = []
for backlink, referent_link in zip(backlinks_list, referent_links_list):
    status, response_code = get_page(backlink, referent_link)
    rows.append([backlink, status, response_code])

# Build the DataFrame once; DataFrame.append() was removed in pandas 2.0
df = pd.DataFrame(rows, columns=['Backlink', 'Status', 'Response code'])

The get_page() function implements the 4-step process described above (see the complete code for details).

Pushing results to Slack

To report backlinks and their statuses automatically, a Slack app can be used. Create an app in Slack and add an incoming webhook that connects the app to the channel where notifications should be posted. More on Slack apps and webhooks: https://api.slack.com/messaging/webhooks

SLACK_WEBHOOK = "YOUR_SLACK_CHANNEL_WEBHOOK"

Although the following piece of code may look a bit complicated, all it does is format the data into a readable message and push it to the Slack channel via a POST request to the Slack webhook.

rows = []
for record in df.itertuples(index=False):
    # Wrap each value in backticks so that Slack renders it as inline code
    row = ' '.join('`' + str(value) + '`' for value in record)
    rows.append(':black_small_square:' + row)

# Bold title followed by one line per backlink
data = ["*Backlinks*\n"] + rows

slack_data = {
    "text": '\n'.join(data)
}

# requests.post() takes the target URL as its first argument
requests.post(SLACK_WEBHOOK, json=slack_data)
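
If the message layout needs to be checked without posting to a real channel, the formatting can be split from the network call. The following is a minimal sketch under that assumption; build_slack_payload is a hypothetical helper, and it expects plain (backlink, status, response code) tuples rather than a DataFrame.

```python
def build_slack_payload(records, title="Backlinks"):
    """Format (backlink, status, response_code) records as a Slack message."""
    lines = ["*" + title + "*\n"]  # bold title followed by an empty line
    for record in records:
        # Each value is wrapped in backticks so Slack shows it as inline code
        cells = " ".join("`" + str(value) + "`" for value in record)
        lines.append(":black_small_square:" + cells)
    return {"text": "\n".join(lines)}


payload = build_slack_payload([("https://a.example/post", "Noindex", 200)])
print(payload["text"])
```

The resulting dictionary could then be sent with requests.post(SLACK_WEBHOOK, json=payload).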

That's it! In this example, Slack was used for reporting, but the code can be adjusted to export backlinks and their statuses to a .csv file, Google Sheets, or a database.
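
As an illustration of the .csv option, here is a small sketch that uses only the standard library's csv module. The write_csv helper and the sample record are assumptions made for the example; the column names follow the DataFrame used earlier.

```python
import csv
import io


def write_csv(records, out):
    """Write (backlink, status, response_code) records to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["Backlink", "Status", "Response code"])
    writer.writerows(records)


# Writing to an in-memory buffer; a real file should be opened with newline=""
buffer = io.StringIO()
write_csv([("https://a.example/post", "Link found, dofollow", 200)], buffer)
print(buffer.getvalue())
```

Statuses that contain a comma, such as Link found, dofollow, are quoted automatically by the writer. When the results are already in a DataFrame, df.to_csv("backlinks.csv", index=False) achieves the same in one call (the file name here is only an example).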

Please see backlink_monitoring_oxylabs.py for the complete code.
