Using Claude 3.7 Sonnet to Help Categorise Blog Posts for SEO Topic Clustering


Contents
  • Using Claude for Content Topic Clustering
  • How the Google Apps Script Works
  • The Importance of Content Clustering for SEO
  • The App Script in Full

Last week I came across Giles Thomas’ blog post ‘On the benefits of learning in public’ via Hacker News. So on Friday afternoon, while working through a topic clustering task for SEO that Claude made dramatically more efficient and less time-consuming, I decided to share the steps that got me there.

The use of content pillars for SEO isn’t new. It wasn’t new when it blew up as a topic a few years back and every SEO and their dog was talking about it, nor is it new now. But it can still be a beneficial practice to audit existing blog posts and see how they align with your core services or offerings.

In this post I’ll explore how I used this on a commercial website. That’s not to say it can’t be used on a personal blog; in fact, it would probably be pretty interesting to see the range of topics and categories covered on a blog of any kind. But here I’ll focus on businesses offering core services, and on how their blog content either strengthens their position within a category or leaves gaps and opportunities to be filled.

Using Claude for Content Topic Clustering

Starting from the beginning, let’s see the exact prompt I started with. It is nothing fancy, and my prompt writing could definitely improve.

Actually, this wasn’t the very first step. Before this, I used Screaming Frog to extract the content of all blog posts on the website I was analysing, by copying the XPath for the blog content section. I think I’ll leave that for a post of its own to keep this entry from becoming too long.

Claude’s first reply to this was actually in the form of a Python script. I’m certainly no expert there, although I think learning it would make me a better SEO, so perhaps I’ll dedicate some time to that soon. This is also an example of how my prompt writing could be improved: I could have told Claude up front that I would ideally be using Google Sheets. The next reply came in the form of an App Script.

So, as you’ll see at the bottom of the screenshot above, Claude was already giving me the option to ‘Setup Topic Keywords’ after the App Script had run. If I were using the App Script across multiple projects, this would probably be the best approach, as I could customise it for every new project. But going through it for the first time, I didn’t know whether the App Script would work or how effective it would be.

So I decided to stay in Claude for now. I gave it the topics I wanted each blog post to be sorted into. These related to the core services offered by the client I was working with. In this instance, they remained quite broad, think ‘Brand’ and ‘Marketing’.

Claude then took these topics and started assigning keywords to them.

These were a good start to be honest and probably would have worked fine. But I felt on a roll and that I could be on to something.

So I went a step further. I copied over the content of each of the Service pages on the client website, aiming to refine the keywords related to each topic, which it did.

And that was pretty much it. I was ready for the complete App Script.

How the Google Apps Script Works: Step-by-Step Explanation

For this section I’ll hand over to Claude:

The Google Apps Script I created automates the process of analyzing and categorizing your blog posts. Here’s a detailed breakdown of how it works:

1. Core Components

Topic Keywords Definition

const TOPIC_KEYWORDS = {
  'Brand': [
    'brand', 'branding', 'logo', 'identity', ...
  ],
  // Other topics defined similarly
};
  • These predefined keyword lists for each topic are what the script uses to identify and categorize content
  • Each keyword is assigned to a specific topic category (Brand, Marketing etc.)

Custom Menu Creation

function onOpen() {
  const ui = SpreadsheetApp.getUi();
  ui.createMenu('Blog Topic Analysis')
      .addItem('Analyze Topics', 'analyzeBlogTopics')
      // Other menu items
      .addToUi();
}
  • This function runs automatically when you open the sheet
  • Creates a custom menu with options for analysis and configuration

2. The Analysis Process

Finding the Content Column

// Find header row and identify columns
const headers = data[0];
const contentColIndex = headers.indexOf('Content');
  • The script first looks for a column named “Content”
  • If not found, it tries alternative names like “Body”, “Text”, etc.
  • If still not found, it asks you to specify which column contains the blog content
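
As a rough sketch of that fallback lookup, written as plain JavaScript so it can be tried outside of Sheets: the helper name findContentColumn is hypothetical, and the candidate list simply reuses the header names mentioned in this post.

```javascript
// Header names taken from this post; extend the list to suit your own export.
const CONTENT_HEADER_CANDIDATES = ['content', 'body', 'text', 'post content'];

/**
 * Returns the index of the first header matching a candidate name,
 * ignoring case and surrounding whitespace, or -1 if none match.
 */
function findContentColumn(headers, candidates = CONTENT_HEADER_CANDIDATES) {
  const normalised = headers.map(h => String(h).trim().toLowerCase());
  for (const name of candidates) {
    const index = normalised.indexOf(name);
    if (index !== -1) return index;
  }
  return -1; // the caller can then prompt the user for the right column
}

console.log(findContentColumn(['Address', 'Title', 'Body'])); // 2
```

Candidates are checked in order, so a sheet with both “Content” and “Text” columns would resolve to “Content”.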

Processing Each Blog Post

for (let i = 1; i < data.length; i++) {
  const content = data[i][contentColIndex] || '';
  // Analyze content and assign topics
  const { topic, confidence, keywords } = analyzeContent(content);
  // Update the sheet with results
}
  • For each row (blog post), the script:
    1. Extracts the content text
    2. Sends it to the analyzeContent function
    3. Writes the results back to the sheet

The Keyword Matching Algorithm

function analyzeContent(content) {
  // Convert to lowercase for case-insensitive matching
  const lowerContent = content.toLowerCase();
  
  // Count keyword occurrences for each topic
  const topicScores = {};
  const keywordsFound = {};
  
  for (const topic in TOPIC_KEYWORDS) {
    // Count matches using regex
    // ...
  }
  
  // Find the topic with the highest score
  // ...
}
  • For each blog post, the script:
    1. Converts the text to lowercase (for case-insensitive matching)
    2. Searches for each keyword using regular expressions that match whole words
    3. Counts how many times keywords from each topic appear
    4. The topic with the most keyword matches wins
    5. Calculates a confidence score based on the relative strength of the match
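
The elided counting loop could look something like the sketch below. It is an illustration rather than the exact code Claude produced: the keyword lists are made up, and scoreTopics and escapeRegExp are hypothetical names.

```javascript
// Illustrative keyword lists, not the client's real ones.
const SAMPLE_KEYWORDS = {
  Brand: ['brand', 'logo', 'identity'],
  Marketing: ['marketing', 'campaign', 'audience'],
};

// Escape regex metacharacters so a keyword is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Count whole-word keyword hits per topic and remember which keywords matched.
// Keywords are assumed to be lowercase already.
function scoreTopics(content, topicKeywords) {
  const lowerContent = String(content).toLowerCase();
  const scores = {};
  const found = {};
  for (const [topic, keywords] of Object.entries(topicKeywords)) {
    scores[topic] = 0;
    found[topic] = [];
    for (const keyword of keywords) {
      // \b anchors the match to word boundaries, so "brand" skips "branding"
      const regex = new RegExp('\\b' + escapeRegExp(keyword) + '\\b', 'g');
      const matches = lowerContent.match(regex);
      if (matches) {
        scores[topic] += matches.length;
        found[topic].push(keyword);
      }
    }
  }
  return { scores, found };
}

const result = scoreTopics(
  'Our brand refresh: new logo, new brand voice, one campaign.',
  SAMPLE_KEYWORDS
);
console.log(result.scores); // { Brand: 3, Marketing: 1 }
```

Because matching is on whole words, plurals and variations such as “brands” or “branding” only count if they are listed as keywords in their own right.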

3. Output and Results

Primary Topic Assignment

// Update the sheet with the topic
sheet.getRange(i + 1, topicColIndex + 1).setValue(topic);
sheet.getRange(i + 1, topicColIndex + 2).setValue(confidence);
sheet.getRange(i + 1, topicColIndex + 3).setValue(keywords.slice(0, 5).join(', '));
  • For each post, the script adds:
    1. The primary topic (Brand, Marketing, etc.)
    2. A confidence score (0-100%)
    3. The top keywords found in the content
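
One way to read “relative strength of the match” is the winning topic’s share of all keyword hits. That reading is an assumption on my part, and the helper name pickPrimaryTopic is hypothetical; the script in full further down divides by the number of configured keywords instead.

```javascript
// Assumption: confidence = winning topic's share of all keyword hits, as a percentage.
function pickPrimaryTopic(scores) {
  const entries = Object.entries(scores);
  const total = entries.reduce((sum, [, score]) => sum + score, 0);
  if (total === 0) return { topic: 'Uncategorized', confidence: 0 };
  // Keep the entry with the highest score; on a tie the earlier topic wins
  const [topic, best] = entries.reduce((a, b) => (b[1] > a[1] ? b : a));
  return { topic, confidence: Math.round((best / total) * 100) };
}

console.log(pickPrimaryTopic({ Brand: 3, Marketing: 1 })); // { topic: 'Brand', confidence: 75 }
```

With this definition, a post that only ever mentions one topic’s keywords scores 100, while a post split evenly across two topics scores 50.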

Secondary Topic Analysis

function addSecondaryTopicAnalysis() {
  // Get a secondary topic excluding the primary one
  const { topic: secondaryTopic, confidence } = getSecondaryTopic(content, primaryTopic);
  // Update the sheet with the secondary topic
}
  • This optional function finds the second-most relevant topic
  • Useful for identifying content that spans multiple categories
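
The body of getSecondaryTopic isn’t shown above, so the following is only a guess at the idea: rank the per-topic scores and take the best one that isn’t the primary topic. The name pickSecondaryTopic and the return shape are hypothetical.

```javascript
// Hypothetical helper: return the highest-scoring topic other than the primary one.
function pickSecondaryTopic(scores, primaryTopic) {
  const ranked = Object.entries(scores)
    .filter(([topic, score]) => topic !== primaryTopic && score > 0)
    .sort((a, b) => b[1] - a[1]);
  if (ranked.length === 0) return { topic: '', score: 0 };
  const [topic, score] = ranked[0];
  return { topic, score };
}

console.log(pickSecondaryTopic({ Brand: 5, Marketing: 3, Growth: 1 }, 'Brand'));
// { topic: 'Marketing', score: 3 }
```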

4. Customization Features

Topic Keywords Configuration

function setupTopicKeywords() {
  // Create a sheet for customizing keywords
  // Add existing topic keywords
  // Format for ease of use
}
  • Creates a separate sheet where you can view and modify the keywords
  • Any changes here can improve the accuracy of future analyses

5. Technical Features

  • Progress Tracking: Shows toast notifications as it processes to keep you informed
  • Error Handling: Gracefully handles missing data and provides user prompts
  • Performance Optimizations: Adds delays to prevent timeouts on large datasets
  • Format Enhancement: Automatically formats the output columns for readability
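
On the performance point, a commonly recommended alternative to writing cell by cell is to build all the results in memory and write them back in one go. The sketch below shows the in-memory part as plain JavaScript; buildResultRows is a hypothetical name, and analyze stands in for whatever function returns the topic and confidence for a post.

```javascript
// Build every output row in memory first.
// `analyze` is any function returning { topic, confidence } for a post's text.
function buildResultRows(data, contentColIndex, analyze) {
  return data.slice(1).map(row => {
    const content = row[contentColIndex];
    if (!content) return ['', '']; // blank result for rows with no content
    const { topic, confidence } = analyze(content);
    return [topic, confidence];
  });
}

// In Apps Script the rows could then be written with a single call, for example:
//   sheet.getRange(2, topicColIndex + 1, rows.length, 2).setValues(rows);
// (guard against rows.length === 0, since a range needs at least one row)

const demo = buildResultRows(
  [['Address', 'Content'], ['/a', 'logo ideas'], ['/b', '']],
  1,
  text => ({ topic: 'Brand', confidence: 100 })
);
console.log(demo); // [ [ 'Brand', 100 ], [ '', '' ] ]
```

One trade-off to be aware of: a single bulk write overwrites the whole output range, so rows without content are blanked rather than skipped.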

The script essentially works like a basic text mining algorithm – it searches for predefined keywords in each blog post and makes a statistical determination of the most likely topic based on keyword frequency and distribution. The more specific and relevant your keywords are, the more accurate the categorization will be.


Of the whole process I would consider these three steps to be the most important:

And to be honest, they’re not terribly sophisticated, as Claude itself alludes to: “The script essentially works like a basic text mining algorithm”. When I return to this tool for another client, I would likely revisit these steps to fine-tune them a touch further. Even so, the results were more than satisfactory for a script that took less than an afternoon to create and categorised 600+ blog posts.

I was then able to head back to Claude for a full breakdown of each topic and the % of overall content it made up.
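
The same breakdown can also be worked out directly from the Topic column. Here is a minimal sketch, assuming the assigned topics have been read into a plain array; topicShare is a hypothetical helper name.

```javascript
// Percentage of posts assigned to each topic, rounded to one decimal place.
function topicShare(topics) {
  const counts = {};
  for (const topic of topics) {
    counts[topic] = (counts[topic] || 0) + 1;
  }
  const share = {};
  for (const [topic, count] of Object.entries(counts)) {
    share[topic] = Math.round((count / topics.length) * 1000) / 10;
  }
  return share;
}

console.log(topicShare(['Brand', 'Brand', 'Marketing', 'Brand'])); // { Brand: 75, Marketing: 25 }
```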

The Importance of Content Clustering for SEO

So I guess all of the above still raises the question: OK, great, so now what? Why bother with the categorisation in the first place?

Well, I guess that depends on how large your belief is in the importance of ‘owning a space’ or ‘displaying authority’ within a certain category and its importance in being able to rank organically for topics and keywords in that area.

I do believe this to be important in the way that Google decides to rank a website, so the next steps for me using the data from 600+ blog posts that have been assigned topics or categories aligning with the website focus areas or offerings would be:

  1. Review the data and confidence scores assigned by the App Script
  2. Analyse the data to identify:
    • Where we are strongest and have the most visibility.
    • Where opportunities exist to grow further.
  3. Build topical authority further through strong internal linking between pages in the same category.
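
For the internal linking step, a simple starting point is to group URLs by their assigned topic so that each group becomes a list of candidate pages to link between. This is a minimal sketch with made-up URLs; groupUrlsByTopic is a hypothetical helper.

```javascript
// Group post URLs by assigned topic; each group is a pool of internal-link candidates.
function groupUrlsByTopic(posts) {
  const groups = {};
  for (const { url, topic } of posts) {
    if (!groups[topic]) groups[topic] = [];
    groups[topic].push(url);
  }
  return groups;
}

// Made-up example data
const samplePosts = [
  { url: '/blog/logo-refresh', topic: 'Brand' },
  { url: '/blog/email-tips', topic: 'Marketing' },
  { url: '/blog/brand-voice', topic: 'Brand' },
];
console.log(groupUrlsByTopic(samplePosts));
```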

The App Script in Full

/**
 * Blog Post Topic Analyzer
 * 
 * This script helps categorize blog posts based on keyword frequency
 * and predefined topic categories.
 */

// Define your topics and associated keywords
// Modify these based on your specific blog content
const TOPIC_KEYWORDS = {
  'SEO': ['seo', 'search engine', 'ranking', 'keyword', 'backlink', 'serp', 'google', 'meta', 'indexing'],
  'Content Marketing': ['content', 'marketing', 'blog', 'article', 'storytelling', 'audience', 'engagement'],
  'Social Media': ['social media', 'facebook', 'twitter', 'instagram', 'linkedin', 'platform', 'engagement'],
  'Email Marketing': ['email', 'newsletter', 'campaign', 'subscriber', 'open rate', 'click', 'conversion'],
  'PPC': ['ppc', 'pay per click', 'ad', 'adwords', 'campaign', 'cpc', 'conversion', 'landing page'],
  'Web Design': ['design', 'website', 'ui', 'ux', 'responsive', 'mobile', 'layout', 'template'],
  'Analytics': ['analytics', 'data', 'metric', 'tracking', 'conversion', 'report', 'insight', 'measurement'],
  'E-commerce': ['ecommerce', 'e-commerce', 'shop', 'store', 'product', 'cart', 'checkout', 'customer'],
  'Local SEO': ['local', 'business', 'listing', 'map', 'google my business', 'review', 'citation'],
  'Content Strategy': ['strategy', 'planning', 'editorial', 'calendar', 'persona', 'journey', 'funnel']
};

/**
 * Adds a menu to the Google Sheet when it opens
 */
function onOpen() {
  const ui = SpreadsheetApp.getUi();
  ui.createMenu('Blog Topic Analysis')
      .addItem('Analyze Topics', 'analyzeBlogTopics')
      .addItem('Setup Topic Keywords', 'setupTopicKeywords')
      .addToUi();
}

/**
 * Main function to analyze blog topics
 */
function analyzeBlogTopics() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  const data = sheet.getDataRange().getValues();
  
  // Find header row and identify columns
  const headers = data[0];
  const contentColIndex = headers.indexOf('Content');
  const urlColIndex = headers.indexOf('Address');
  
  if (contentColIndex === -1 || urlColIndex === -1) {
    SpreadsheetApp.getUi().alert(
      'Column not found. Please make sure your sheet has "Content" and "Address" columns.'
    );
    return;
  }
  
  // Check if Topic column exists, if not, add it
  let topicColIndex = headers.indexOf('Topic');
  if (topicColIndex === -1) {
    topicColIndex = headers.length;
    sheet.getRange(1, topicColIndex + 1).setValue('Topic');
    
    // Add a column for topic confidence
    sheet.getRange(1, topicColIndex + 2).setValue('Topic Confidence');
  }
  
  // Load the keyword lists once, preferring any edits saved in the
  // "Topic Keywords" sheet over the defaults defined above
  const topicKeywords = getCustomTopicKeywords();
  
  // Process each row
  for (let i = 1; i < data.length; i++) {
    const content = data[i][contentColIndex] || '';
    
    // Skip rows with no content
    if (!content) continue;
    
    // Analyze content and assign topics
    const { topic, confidence } = analyzeContent(content, topicKeywords);
    
    // Update the sheet with the topic
    sheet.getRange(i + 1, topicColIndex + 1).setValue(topic);
    sheet.getRange(i + 1, topicColIndex + 2).setValue(confidence);
    
    // Add a slight delay to prevent hitting quotas
    if (i % 10 === 0) {
      SpreadsheetApp.flush();
      Utilities.sleep(100);
    }
  }
  
  SpreadsheetApp.getUi().alert('Analysis complete!');
}

/**
 * Analyzes content and returns the most likely topic.
 * topicKeywords defaults to the built-in TOPIC_KEYWORDS lists.
 */
function analyzeContent(content, topicKeywords = TOPIC_KEYWORDS) {
  // Convert to lowercase for case-insensitive matching
  // (String() guards against cells that hold numbers or dates)
  const lowerContent = String(content).toLowerCase();
  
  // Count keyword occurrences for each topic
  const topicScores = {};
  
  for (const topic in topicKeywords) {
    const keywords = topicKeywords[topic];
    let score = 0;
    
    for (const keyword of keywords) {
      // Escape regex metacharacters so keywords containing characters
      // such as "." or "?" are matched literally
      const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      // Count whole-word occurrences of each keyword
      const regex = new RegExp('\\b' + escaped + '\\b', 'g');
      const matches = lowerContent.match(regex);
      if (matches) {
        score += matches.length;
      }
    }
    
    topicScores[topic] = score;
  }
  
  // Find the topic with the highest score
  let bestTopic = 'Uncategorized';
  let highestScore = 0;
  
  for (const topic in topicScores) {
    if (topicScores[topic] > highestScore) {
      highestScore = topicScores[topic];
      bestTopic = topic;
    }
  }
  
  // If the highest score is 0, keep as uncategorized
  if (highestScore === 0) {
    return { topic: 'Uncategorized', confidence: 0 };
  }
  
  // Calculate a simple confidence metric (0-100)
  const totalKeywords = Object.values(topicKeywords).flat().length;
  const confidence = Math.min(100, Math.round((highestScore / totalKeywords) * 100));
  
  return { topic: bestTopic, confidence: confidence };
}

/**
 * Creates a new sheet to configure topic keywords
 */
function setupTopicKeywords() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  let sheet = ss.getSheetByName('Topic Keywords');
  
  // Create the sheet if it doesn't exist
  if (!sheet) {
    sheet = ss.insertSheet('Topic Keywords');
    sheet.getRange(1, 1, 1, 2).setValues([['Topic', 'Keywords (comma-separated)']]);
    
    // Add existing topic keywords
    let row = 2;
    for (const topic in TOPIC_KEYWORDS) {
      sheet.getRange(row, 1).setValue(topic);
      sheet.getRange(row, 2).setValue(TOPIC_KEYWORDS[topic].join(', '));
      row++;
    }
    
    // Add instructions
    sheet.getRange(row + 1, 1, 1, 2).merge();
    sheet.getRange(row + 1, 1).setValue('Add or modify topics and keywords, then run "Analyze Topics" again.');
  }
  
  ss.setActiveSheet(sheet);
  SpreadsheetApp.getUi().alert('Topic keywords configuration sheet created!');
}

/**
 * Gets custom topic keywords from the Topic Keywords sheet
 */
function getCustomTopicKeywords() {
  const ss = SpreadsheetApp.getActiveSpreadsheet();
  const sheet = ss.getSheetByName('Topic Keywords');
  
  if (!sheet) {
    return TOPIC_KEYWORDS;
  }
  
  const data = sheet.getDataRange().getValues();
  const customTopics = {};
  
  // Skip header row
  for (let i = 1; i < data.length; i++) {
    const topic = data[i][0];
    const keywordsString = data[i][1];
    
    if (topic && keywordsString) {
      customTopics[topic] = keywordsString.split(',').map(k => k.trim().toLowerCase());
    }
  }
  
  return Object.keys(customTopics).length > 0 ? customTopics : TOPIC_KEYWORDS;
}

/**
 * Advanced content analysis (optional enhancement)
 * Performs TF-IDF like analysis on the content
 */
function advancedContentAnalysis(content, allContents) {
  // This function would implement a more sophisticated analysis
  // For a Google Apps Script implementation, we're keeping it simple
  // A full TF-IDF implementation would be more complex
  
  // For now, we'll use the basic keyword matching approach
  return analyzeContent(content);
}

Step-by-Step Guide to Categorize Your Blog Posts

1. Prepare Your Data in Google Sheets

  1. Create a new Google Sheet and import your Screaming Frog data
  2. Ensure your data has a column with the blog content
    • The script will look for a column named “Content”
    • It will also try alternative names like “Body”, “Text”, “Post Content”
    • If it can’t find the content column, it will prompt you to specify it

2. Set Up the Script

  1. Open the Script Editor:
    • Click “Extensions” > “Apps Script” in your Google Sheet
  2. Delete any code that appears in the editor
  3. Copy and paste the entire code from the “Custom Topic Analyzer” above
  4. Save the project (give it a name like “Blog Topic Analyzer”)
  5. Return to your sheet and refresh the page

3. Run the Topic Analysis

  1. After refreshing, you’ll see a new “Blog Topic Analysis” menu at the top
  2. Click “Setup Topic Keywords” to see the default keywords I’ve added for your topics
  3. Customize the keywords if needed:
    • Add industry-specific terms
    • Remove irrelevant keywords
    • Add variations of important terms
  4. Go back to your main data sheet and click “Analyze Topics”
  5. The script will process all your posts and add:
    • A “Primary Topic” column (Brand, Marketing, Growth, Experience, or Organization)
    • A “Confidence Score” showing how certain the categorization is
    • A “Keywords Found” column showing which terms triggered the categorization

4. Review and Refine

  1. Sort by confidence score to identify posts that might need manual review
  2. Look for patterns in misclassified posts
  3. Update keywords in the Topic Keywords sheet based on your findings
  4. If needed, run the analysis again with your improved keywords
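
The first review step can be sketched as a small filter over the results. The threshold of 30 is an arbitrary assumption rather than a recommended value, and postsNeedingReview is a hypothetical helper name.

```javascript
// Return posts whose confidence falls below a threshold, least confident first.
// The default threshold is an arbitrary starting point; tune it to your data.
function postsNeedingReview(posts, threshold = 30) {
  return posts
    .filter(post => post.confidence < threshold)
    .sort((a, b) => a.confidence - b.confidence);
}

// Made-up example data
const reviewSample = [
  { url: '/a', topic: 'Brand', confidence: 20 },
  { url: '/b', topic: 'Marketing', confidence: 80 },
  { url: '/c', topic: 'Brand', confidence: 5 },
];
console.log(postsNeedingReview(reviewSample).map(post => post.url)); // [ '/c', '/a' ]
```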

5. Advanced Features

The script includes a few helpful extras:

  • “Add Secondary Topic Analysis” identifies overlapping topics (e.g., a post primarily about Brand but with significant Marketing content)
  • The script handles special characters and variations in your content
  • You can easily expand to add more topics if needed

A Reminder: What the Script Does

  • Automatically identifies which column contains your blog content
  • Analyzes each post against the keyword sets for each topic
  • Assigns a primary topic and confidence score to each post
  • Shows which keywords were found in each post
  • Provides a secondary topic analysis option to identify content with overlapping themes
  • Formats the output for easy reading and analysis
