MySQL: Count occurrences of string

Drupal and MySQL

Today I needed to quickly find out a list of node's which did not contain a certain token (one we use on PPOnline to insert adverts into articles). There were meant to be 2 of these tokens per article but we had noticed some only had one. Some didn't have any at all!

Now, this might already exist as a neat function somewhere, but I couldn't find it. So I wrote this little query to find the number of occurrences of a substring in a larger block of text.

I did look into using the SUBSTRING() and SUBSTRING_INDEX() commands, however I couldn't see a way of efficiently using them to find any number of occurrences. So I settled on the following…

[adsense:468x60:4496506397]
SET @findme="MySQL";
SELECT
  n.nid, n.title,
  CAST((LENGTH(nr.body) - LENGTH(REPLACE(nr.body, @findme, ""))) / LENGTH(@findme) AS UNSIGNED) AS findme_count
FROM node n
INNER JOIN node_revisions nr ON nr.vid = n.vid
WHERE n.type = "blog"
ORDER BY findme_count DESC;
[adsense:468x60:4496506397]

The above example will list all node's of type blog. It will then compare the length of the body before and after removing all instances of the string you're looking for. The result should be n * LENGTH(find_me) where n is the number of occurrences. This is why we divide by the length of the find_me string.

This is a pretty fast query too. On our live database, it searched over 1,400 nodes in just over 31 milliseconds.

I hope this saves someone some time in the future!

Comment Icon

8 Comments

The most recent comment was on Mon, 5th Dec 2011 - 12:43

Thank you for idea!
How to simply count the substrings - even easier:

select length(haystack)-length(replace(haystack,"needle","")) as needlecount

Was helpful, thank you!! Saved me an hour to be sure!

Shouldnt it be

select (length(haystack)-length(replace(haystack,"needle","")))/length("needle") as needlecount

?

DannyB is correct... But essentially, its the same command. The CAST is to avoid floating point rounding errors.

smart and simple

WOW!
That is some proper out-of-the-box thinking. Saved me some in-the-box thinking loops :)

Add new comment

Filtered HTML

  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <pre> <ul> <ol> <li> <dl> <dt> <dd> <img> <p>
  • You can use BBCode tags in the text. URLs will automatically be converted to links.
  • You can enable syntax highlighting of source code with the following tags: <code>, <pre>, <bash>, <css>, <html>, <js>, <jquery>, <mysql>, <php>. PHP source code can also be enclosed in <?php ... ?> or <% ... %>.

Plain text

  • No HTML tags allowed.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Lines and paragraphs break automatically.
Type the characters you see in this picture. (verify using audio)
Type the characters you see in the picture above; if you can't read them, submit the form and a new image will be generated. Not case sensitive.