Page 1 of 1

Designing a System to Flag Profanity in MSSQL

#1 codeprada  Icon User is offline

  • Changed Man With Different Priorities
  • member icon

Reputation: 944
  • View blog
  • Posts: 2,353
  • Joined: 15-February 11

Posted 16 September 2011 - 07:33 PM

In this hopefully short tutorial I will explain a way in which you can check user input for profanity.

Requirements
  • CHECK Constraint
  • Triggers
  • Functions
  • Variables
  • Cursors


In this tutorial we are going to:
  • Create a table that will store the words that are flagged as profane.
  • Index that table since we are going to be rarely changing the data in there.
  • Create a function that is able to test a string for profanity.
  • Create a dummy comments table that will accept some user input.
  • Place a trigger on the aforementioned table to test for profanity when a row is added or updated
  • Create a table solely for keeping track of our profanity flag on each comment.


Creating the profanity table
This table only really needs one column to hold the flagged words. If you wish to add an IDENTITY column to satisfy your needs then by all means. Nothing in this tutorial is set in stone.

The syntax for our profanity table is as follows
--Create table to hold profanity
CREATE TABLE profanity
(
	word VARCHAR(32) NOT NULL
)



Notice there isn't an index on it as yet. We'll create the index after we've inserted the data. I grabbed a list of words from the internet that proved to be quite handy.
Spoiler


Now to create our index
CREATE CLUSTERED INDEX idxProfanity ON dbo.profanity(word)


Creating our function to test profanity
Our function will use the LIKE operator to determine if there is a match. The trick to this is to not detect words within other words.
For example grass contains ass and a mere 'grass' LIKE '%ass%' will be flagged as profane. It may seem backward having the user's input on the left side of the LIKE operator but it's totally valid. The reason for this is that when we're working with an input like The cows love to eat green grass our LIKE statement would look like this
'ass' LIKE '%The cows love to eat green grass%'
If you're familiar with the LIKE operator you'll see that you will never get a match there.
Let's create our function called profanityCheck that will accept a VARCHAR of size 32 and return a TINYINT (0 or 1).
--Create function to check for profanity
CREATE FUNCTION profanityCheck(@text VARCHAR(32))
RETURNS TINYINT
AS
BEGIN
	--Our code goes here
	--Don't run this just yet
END
GO


Within our BEGIN END block we shall declare a variable with a datatype of TINYINT.
DECLARE @result TINYINT

We will then set @result to the result of a query which will count the amount of matches found.
--Detect matches from the profanity tables
SET @result = (
	SELECT 
		COUNT(*) 
	FROM 
		dbo.profanity 
	WHERE 
		@text LIKE ('%[^a-z]' + word + '[^a-z]%')
	OR
		@text LIKE (word + '[^a-z]%')
	OR
		@text LIKE ('%[^a-z]' + word)
	OR 
		@text LIKE word
)



Finally we must return out result.
--Return the number of results found
RETURN @result


We now have a entire function. Feel free to run it. To create our dummy comments table you must add this code
--Create a sample table to accept user input
CREATE TABLE comments
(
	id INT IDENTITY PRIMARY KEY,
	content VARCHAR(255) NOT NULL,
)
GO


Now to create the table that will keep track of the entries that contain profanity or not.
CREATE TABLE flag
(
	commentId INT,
	flag TINYINT,
	PRIMARY KEY(commentId)
)
GO



Our final step is creating a trigger that will be executed before an insertion or update takes place on our comments table.
--Create trigger
CREATE TRIGGER runProfanityCheck
ON comments
INSTEAD OF INSERT, UPDATE
AS
	DECLARE @content VARCHAR(255)
	DECLARE @id INT
	DECLARE @cursor CURSOR
	DECLARE @flag TINYINT
	DECLARE @insertion TINYINT
	
	--Determine if we're dealing with an UPDATE or INSERT 
	IF EXISTS (SELECT * FROM deleted)
		SET @insertion = 0 --We're not inserting but updating
	ELSE
		SET @insertion = 1 --We're inserting data
	
	--Set the cursor to read from the columns ID and CONTENT
	SET @cursor = CURSOR FOR SELECT inserted.id, inserted.content FROM inserted
	
	--Must open the cursor in order to use it
	OPEN @cursor
	
	--Similar concept as reading a file with ifstream in C++
	FETCH FROM @cursor INTO @id, @content
	
	--If the status is anything other than 0 then 
	--the fetch failed, went pass the last row or 
	--found a missing row
	WHILE(@@FETCH_STATUS = 0)
	BEGIN
		IF dbo.profanityCheck(@content) > 0
			SET @flag = 1 --Flagged for profanity 
		ELSE
			SET @flag = 0 --Not flagged for profanity
		IF @insertion = 1
		BEGIN
			--Populate our tables
			INSERT INTO dbo.comments VALUES (@content)	
			INSERT INTO dbo.flag VALUES(@@IDENTITY, @flag)
		END
		ELSE
		BEGIN
			--Update our tables
			UPDATE dbo.comments SET content = @content WHERE id = @id
			UPDATE dbo.flag SET flag = @flag WHERE commentId = @id
		END	
		--Attempt to fetch the next row
		FETCH FROM @cursor INTO @id, @content
	END
	
	CLOSE @cursor
	DEALLOCATE @cursor
GO



I'll explain why the cursor is used in our trigger. A trigger isn't executed each time a row is inserted, deleted or updated but each time INSERT, DELETED or UPDATED statement is executed (depending on what the trigger is set to respond to). Without the cursor in our trigger queries like
INSERT INTO 
    dbo.comments 
VALUES
    ('This is a row I would like to insert'),
    ('Here is another just for the fun of it')

will populate the deleted or inserted table with more than one row. This will cause a simple code like
SET @content = (SELECT content FROM inserted)

to fail.


The entire code is in the spoiler below.
Spoiler


Is This A Good Question/Topic? 2
  • +

Page 1 of 1