Subscribe to andrewsw's Blog        RSS Feed
-----

Basic SQL Knowledge

Icon 1 Comments
It surprises me how often we encounter people who do not know how to join two tables together in a single query. People seem to find just enough information to create a table, to define a primary key, and to write a simple SELECT query based on the table. They do not read beyond this basic knowledge.

I suppose we could blame the internet to an extent. If someone wants to create a database table they can find this information quite quickly. Creating and working with a database is a complex topic. Designing a database, with valid, and effective, relationships (foreign and primary keys) is a complex task. It requires study; and time, and more study.

We have many free and online tools (sqlFiddle, for example) that we can use to build and test a database. Get to know your database's console, command panel, or command-line. Before attempting to create and run a query from within your programming language, test the query, and understand it, using your database's control panel. Once you've understood it, then you can start writing programming code to execute the query. There is no point in starting to code unless you understand your query, and know that it will work.

Below I discuss some common mistakes and misunderstandings.



Odd characters and reserved words

Don't include spaces or other non-alphanumeric characters in your database, table or field names. With some databases it is common practice to use underscores within tables and/or field-names, so that's fine, but don't use any other non alphanumeric characters, and don't start the name with a number (even though many databases will accept this). Programming languages don't allow odd characters in identifiers, or for them to start with a number, so you might as well carry these standards across for your database work.

If you are using such characters then you will need to delimit these table or field names. The delimiters for MS SQL Server and MS Access are [square brackets], other databases use `back-ticks`. (There are other characters that some databases use.)

If you use reserved words as identifiers, such as [table] or [user] then these also need to be delimited. It is far better that you don't use reserved words as identifiers, and don't use non-alphanumeric characters. Also bear in mind that each database has its own set of reserved words so, when you move between database-types, it is easy for you to be caught out, not realising that a word that is acceptable in one database is reserved in another. For example, the words Password and Date are reserved in MS Access, but acceptable in many other databases.

Don't use the word Date (or Time) as a field-name, even if you've checked that it isn't reserved by your database-type. Date alone is meaningless: the Date of what? TransactionDate or TransDate (StartDate, etc.) is much more descriptive.

Also, the list of reserved words has changed over time for each database, and each list is sometimes longer than you think it is. To labour the point:

Problem names and reserved words in Access

Note: Doubling-up a field-name makes it far less likely that it will be a keyword: StartDate, FirstDate, etc.. Nevertheless, bookmark a page that lists your database's reserved words. I am not recommending doubling-up field-names simply to avoid reserved words (possibly making a name longer than it needs to be), I am just letting you know that such a name is less likely to conflict.

Avoid using * (all fields)

SELECT * FROM TableName

I know that this is handy, to select all fields from a table, but it is inefficient, and mostly unnecessary, to do this. In most cases you will know which fields you need, and their names are very unlikely to change. Besides, you have limited control over the data retrieved using *, unless you know the field-names anyway. You should definitely not assume that the fields will be returned in a particular order, using code such as .Fields(0).

I know that it is a pain to list, comma separated, 50 fields (although there are ways to obtain such a list without typing it yourself in full). But the question I will ask is: why do you need all 50 fields? Fair enough, you can work comfortably with this number of columns (and more) in Excel, but 50 columns is too many for most other GUI/forms and reports. A widely acknowledged principle, and practice, with databases is to only return the data you need. This reduces the activity, and burden, against the database, and is more targeted to your user's needs.

Don't use meaningless names

SELECT field1, field2 FROM table1

Even at a very early stage in your studies, what is the point in doing this? and how much longer does it take to create meaningful names? You will see this pattern in books, tutorials, and even in answers to forum questions. The author's point is to convey the general meaning and syntax of a SQL statement, but you should be replacing field1, table1, etc., with meaningful names.

In a similar vein, I've seen this a few times:

SELECT FirstName AS [FirstName] FROM SomeTable

There is nothing gained by giving a field (or table) an alias which is the same as the original field-name. It will probably also cause problems or errors with some databases.

Joining two tables

SELECT tblStaff.FirstName, tblStaff.Surname, tblDepts.Department 
FROM tblStaff INNER JOIN tblDepts 
ON tblStaff.DeptID = tblDepts.DeptID;

This is how we obtain fields from two tables that are related (joined) by a common field, the DeptID.

If a field-name only occurs in one of the two tables then there is no need to qualify it with the table-name:

SELECT FirstName, Surname, Department 
FROM tblStaff INNER JOIN tblDepts 
ON tblStaff.DeptID = tblDepts.DeptID;

Note that the DeptID's are qualified with their respective table-names, because the same field-name is used in both tables.

However, many people prefer to qualify every field with its table-name, so that they don't have to be concerned about possible duplicates, and it is obvious where each field originates.

Study table joins:

SQL Joins :techonthenet

A Visual Explanation of SQL Joins

There are good tutorials here @DIC as well, click the Tutorials link at the top of this page. There is a specific Database Tutorials section.

Don't use LIKE without a wildcard

SELECT FirstName, LastName FROM tblStaff WHERE FirstName LIKE 'Bob'

If you are looking for names beginning with 'Bob' use the wildcard for your database (typically * or %) that matches any other characters:

SELECT FirstName, LastName FROM tblStaff WHERE FirstName LIKE 'Bob*'

If you are matching the name EXACTLY just use equality:

SELECT FirstName, LastName FROM tblStaff WHERE FirstName = 'Bob'

There is no point in using LIKE without a wildcard character. You are just checking for equality and should use =, and the query is likely to be less efficient. (Perhaps your database is capable of optimizing the expression to use equality. Nevertheless, the use of LIKE is redundant, and probably confusing.)

Store dates as dates

Don't store dates in a TEXT (VARCHAR, etc.) field, store them using the correct DATE (or DATETIME, etc.) data-type. You can then accurately match dates using date-functions and date arithmetic. You'll be able to easily find all those dates in a particular year and/or month, etc..

Stored as text, a statement like this:

SELECT StartDate FROM SomeTable WHERE StartDate='01/02/2015'

is assuming that ALL the dates have been entered with the format 'MM/DD/YYYY'. Because it is a text-field this isn't guaranteed.

1 Comments On This Entry

Page 1 of 1

Curtis Rutland 

29 July 2015 - 05:11 PM
One bit of advice that I'd like to add:

If your language and DBMS allows for it, use parameters when dealing with input. Avoid concatenating user input directly into SQL strings (or any kind of string interpolation/sprintf/string.Format/cout/etc...). A very common pattern:

var sql = "select * from mytable where fieldval='" + variableHere + "'"


This is bad, for a lot of reasons. One, it's difficult to read when you're maintaining code. Two, and this is much bigger, it opens you up for what's called "SQL Injection Attacks". Without going too deeply into it, you're allowing the user to directly manipulate the command you're sending to the database. With some guesswork, an attacker can insert values that can cause your database to dump private info back to himself, or even send commands to ruin your database by attempting to delete or truncate your tables.

Obviously this is a bad thing. And it's more dangerous than you think; it doesn't take a determined attacker to find these flaws. I accidentally found one on a website when I accidentally mistyped a single-quote character into a search box, and suddenly saw a database error message displayed. It isn't inconceivable that your average user might put something in that breaks your query. Like I said, in mine all it took was a single apostrophe; what if someone's name is "O'Brien"? Even if it doesn't corrupt your data, it can break the application experience for a user.

How to avoid? Parameters are the answer. The implementation varies from language and database, of course. But look them up. A parameterized sql statement might look like this:

var sql = "select * from mytable where [email protected]"


Note a couple of things:

1) I didn't have to put quote marks around it, even though it's textual data (or at least, pretend I specified that fieldval is a varchar). Text needs to be quoted in a sql string, but the parameter represents a value rather than being a value. It's not text itself.

2) Doesn't that look easier to read? You don't have to mentally put the string back together.

Now, of course by itself that will do nothing. It has to be sent to the DB with a value as well. This is going to vary, language to language. I'll show how it looks in C#:

private static void Select() {
    string cmdStr = "SELECT FirstName, LastName, Telephone FROM Person WHERE FirstName = @FirstName";
    using (SqlConnection connection = new SqlConnection(ConnectionString))
    using (SqlCommand command = new SqlCommand(cmdStr, connection)) {
        command.Parameters.AddWithValue("@FirstName", "John");
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        while (reader.Read()) {
            string output = "First Name: {0} \t Last Name: {1} \t Phone: {2}";
            Console.WriteLine(output, reader["FirstName"], reader["LastName"], reader["Telephone"]);
        }
    }
}



Here's a link on how to do this in C#: http://www.dreaminco...ery-a-database/

But I strongly recommend you learn how to do this in whatever language you use, for all your querying.
1
Page 1 of 1

Trackbacks for this entry [ Trackback URL ]

There are no Trackbacks for this entry

December 2019

S M T W T F S
1234567
891011 12 1314
15161718192021
22232425262728
293031    

Tags

    Recent Entries

    Recent Comments

    Search My Blog

    1 user(s) viewing

    1 Guests
    0 member(s)
    0 anonymous member(s)

    Categories