How to leverage Data Science for SEO

  • March 23, 2020
  • Steven van Vessum
JR Oakes


Hi JR! Can you tell us a little about yourself?

My name is JR Oakes. I am a Technical SEO working at Locomotive, an agency near Raleigh, NC. I am a former architectural glass artist, turned developer, turned SEO. I love solving problems and pride myself in having a unique ability to approach things in novel ways.

I am a constant reader, learner, and tinkerer. My true nature is to be a recluse. I don’t like large crowds or being the center of attention. Despite that, I have developed a love for the SEO community over the last few years by forcing myself to do what I am uncomfortable with. I have been an organizer for the Raleigh SEO Meetup and Beer & SEO Meetup, along with some other amazing SEOs from Raleigh.

This year, a few of us wanted to launch a different kind of SEO Conference, called the Findability Conference, which focused more on the research behind finding things online. This is currently postponed till later in 2020, or until we get a better sense of where the Coronavirus is headed.

We have recently also launched i.codeseo.dev which is an open source documentation website seeking to offer a Wikipedia-style approach to Technical SEO content, as well as development resources for SEOs.

You recently launched a new initiative: iCodeSEO. Can you explain the idea behind it?

Technical SEO has been a strongly growing trend over the last few years. There are many ideas about what Technical SEOs do, but the definition is evolving toward this: any SEO who embraces technology and development to gain understanding and insight, with the goal of achieving better visibility in search engines. (Russ Jones)

For a while, I have been interested in a space that consolidates all of the new, more data-science-oriented code being written. The impetus, though, was trying to write a Technical SEO Wikipedia page and not finding any sources that I thought would pass muster as authoritative. The SEO industry is definitely a bubble, with little trickling out to the broader world, other than large Google events.

What part of your job is tinkering with new and upcoming technologies, and what are you currently working on?

I would say that my job is equally split between tinkering, deliverables and training.

Technology is changing so fast and new possibilities open up daily. I never want to be the SEO that has a checklist they have used for ten years.

My main focus is working on language understanding to inform IA, intent-fulfillment, and targeting.

For example, users may type or say hundreds of thousands of unique search queries to find your content. How do you boil that information down into meaningful query sets, then topics, then intents, and finally prioritized action?

Three things I am working on, which I think are cool, are:

  1. Better incorporating SEO testing into CI/CD processes.
  2. Using data to inform site IA via query refinement, link flow, and user behavior.
  3. Language generation and fact-checking algorithms.
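As one illustration of the first item, here is a minimal sketch of an SEO check that a CI job could run against rendered HTML before a deploy. It is not JR's actual tooling: the `SeoTagCollector` class, the `seo_problems` function, and the three checks chosen (title present, no `noindex`, expected canonical) are all assumptions made for the example, using only Python's standard library.

```python
from html.parser import HTMLParser


class SeoTagCollector(HTMLParser):
    """Collects the title, meta robots, and canonical link from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.robots = ""
        self.canonical = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def seo_problems(html, expected_canonical):
    """Return a list of problems; an empty list means the page passes."""
    page = SeoTagCollector()
    page.feed(html)
    problems = []
    if not page.title.strip():
        problems.append("missing <title>")
    if "noindex" in page.robots:
        problems.append("page is set to noindex")
    if page.canonical != expected_canonical:
        problems.append(
            f"canonical is {page.canonical!r}, expected {expected_canonical!r}"
        )
    return problems
```

A test runner could call `seo_problems` for a handful of key templates and fail the build whenever the returned list is not empty.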

I love Python because it helps make me a better SEO.

There are many processes that would take hours in spreadsheets, but take minutes using Python. I love the intuitiveness of the Python language. I also like that it is very well supported in the machine learning and data science space, meaning it allows me to easily download and test work by incredibly smart people, to see if it is valuable for SEO.

That said, there are plenty of incredible SEOs who can’t code.

I am not a great writer, and I am not nearly as strong as some in the industry at JS SEO, internationalization, etc.

Coding is my thing and it helps make us valuable, like the mastery that Aleyda Solis and Jamie Alberico have in their focus areas. When you think of an area of expertise, and someone’s name pops to mind, that is what we should be aiming for. Not whether you code or not.

When it comes to keyword research, what data gathering do you do and what data sources do you pull data from?

I love the Google Search Console API, and I also use SEMRush and Ahrefs quite a bit.
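For readers who have not used the Search Console API, here is a rough sketch of paging through Search Analytics data. The request fields (`startDate`, `endDate`, `dimensions`, `rowLimit`, `startRow`) and the `rows` key in the response come from the Search Analytics query method. The `fetch_all_rows` helper and the `run_query` callable are hypothetical: `run_query` stands in for whatever authenticated client you use to send the request body and return the parsed JSON response.

```python
def fetch_all_rows(run_query, start_date, end_date,
                   dimensions=("query", "page"), page_size=25000):
    """Page through Search Analytics results until a short page comes back.

    run_query is a hypothetical callable: it takes a request body (dict)
    and returns the parsed JSON response (dict) from the API.
    """
    rows = []
    start_row = 0
    while True:
        body = {
            "startDate": start_date,      # format: YYYY-MM-DD
            "endDate": end_date,
            "dimensions": list(dimensions),
            "rowLimit": page_size,
            "startRow": start_row,
        }
        # The response has no "rows" key when there is no more data.
        page = run_query(body).get("rows", [])
        rows.extend(page)
        if len(page) < page_size:
            return rows
        start_row += page_size
```

Keeping the HTTP call behind `run_query` makes the paging logic easy to test with a fake function before pointing it at a real property.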

We augment with scraped clean content, and data from APIs like Watson and Google’s Knowledge API.

I also tend to like just using Screaming Frog to grab data and pass that data into various notebooks. It is really easy to run clustering, TF-IDF, LDA, and other analysis on the raw document content to define key topics covered across large collections of content.
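To make the TF-IDF part concrete, here is a minimal sketch written with only the standard library. In a real notebook you would more likely reach for an established library and add clustering or LDA on top. The `tokenize` and `top_terms` helpers are names invented for this example, and the tokenizer is deliberately naive (lowercase alphabetic words only, no stop-word removal).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(documents, n=3):
    """Return the n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    total_docs = len(documents)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # TF is the term's share of the document; IDF is log(N / df).
        scores = {
            term: (count / len(tokens)) * math.log(total_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:n])
    return results
```

Feeding it the body text exported from a crawl gives a quick view of which terms distinguish each page from the rest of the collection. Terms that appear in every document score zero, which is the intended effect.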

What’s the most challenging part of your job, and why?

I tend to want to keep innovating and improving. It is sometimes hard to resist the urge to try a new approach when something just needs to be done.

If you had to create an SEO strategy to rank a new site in a competitive vertical such as finance, what would your approach be?

I’d start with the basics:

  1. Develop KPIs and realistic measurement cadences.
  2. Strong technical foundation for the site.
  3. Strong analytics, crawl, behavior tracking.
  4. Competitive topic gap to determine the universe and priorities for growth.
  5. Several weeks understanding the IA, taxonomy, trends, and personas.
  6. Hire authoritative writers with strong topical authority.
  7. Develop process for Target > Research > Write > Design > Build for each page.
  8. Hire creative for asset development and outreach.
  9. Content planning and development backlog.
  10. Develop a strategy for brand awareness and social character.
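The competitive topic gap in step 4 can be sketched very simply once each site has been reduced to a list of topics. The version below is an illustration under assumptions, not a description of JR's process: the `topic_gap` function and its inputs (a list of our topics and a dict mapping competitor names to their topics) are hypothetical, and it ranks missing topics by how many competitors cover them.

```python
from collections import Counter


def topic_gap(our_topics, competitor_topics):
    """Rank topics we do not cover by how many competitors cover them."""
    ours = {topic.lower() for topic in our_topics}
    coverage = Counter()
    for topics in competitor_topics.values():
        # Count each competitor at most once per topic we are missing.
        coverage.update({topic.lower() for topic in topics} - ours)
    # Most widely covered gaps first; ties are broken alphabetically.
    return sorted(coverage.items(), key=lambda item: (-item[1], item[0]))
```

In practice the counts would usually be weighted by something like search demand or traffic, but the set difference is the core of the idea.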

While Google’s getting better at understanding query intent, it’s far from perfect. How do you think Google will try to get better and better at this?

I think Google is already good at this, for the most part.

The most interesting thing of the last couple of years, IMHO, has been that Google has taken a much more multi-modal approach than just ads and links. We see results really responding to new types of content preferences, such as images, videos, and knowledge graphs.

I think the more interesting things involve Google triangulating content by mapping authors, brands, and search history to the knowledge graph.

I think understanding search intent is hard for search engines because people are messy and have strange ways of communicating.

If Google can cobble together patterns of interest by mining search history, align that with either authors, brands, or topic, and serve that to you based on search or journey stage, that is much more interesting than trying to tell what you mean when you type “ball”.

What should technical SEOs be doing now to stay ahead of the curve?

I think the most important things that SEOs should do, almost universally, are:

  1. Read: organize a list of who gives really good information and follow them.
  2. Test: never read, and then execute. Always read, test, then execute.
  3. Specialize: technical is becoming fragmented similar to SEO. The money is in deep knowledge and know-how. Data science, JavaScript SEO, Big Data Attribution, Page Speed, Edge SEO, Crawl Analysis, etc.

What people inspire you to keep innovating and exploring?

I can’t answer this; there are just too many. Every day, I learn from the SEO industry.

If you review my Twitter followers, for the most part, those are the people I have found I learn from, with a great signal-to-noise ratio.

Steven van Vessum

Steven is ContentKing’s VP of Community. This means he’s involved in everything community and content marketing related. Right where he wants to be. He gets a huge kick out of letting websites rank and loves to talk SEO, content marketing and growth.
