SAS/Access for Database Access

The College of Public Health and Health Professions uses SAS/Access to move data to and from the PHHP MySQL database. SAS/Access provides fast uploads of large database tables as well as the ability to pass SQL queries to the MySQL database where queries on indexed fields provide for fast subsetting and sorting of large datasets.

Methods and Requirements | Libnam | Proc SQL | SQL Pass Through | Performance Tests


Methods and Requirements

Communication with the MySQL server can be done via the MySQL support in SAS or via ODBC. To use the MySQL support you will need SAS 9.1.2 or higher. ODBC support has been in SAS for quite some time. Current documentation for all of SAS can be found online at SAS 9.1.3 Documentation.

SAS Institute recommends the use of the SAS libname statement along with Proc SQL to make connections and interact with databases. A libname statement defines a name called a “libref” that will point to some data source. That data source can be a file, directory, or database on some system. The libref can then be used as shorthand for that data source in other SAS statements. SAS provides lots of engines so it can read (and usually write to) SAS, SPSS, MySQL, ODBC, Oracle, MS SQL and many other forms of content.

Proc SQL is a command in SAS which can be used to execute SQL against data sources, i.e. librefs. For remote systems it can be instructed to do an SQL pass-through which sends the query to the database server so it can be executed more efficiently. This is more efficient when doing queries that reduce the dataset with a “where” clause or aggregate it with a “group by” clause.


Libname

The libname statement points at container which might be a directory, a database, or–in special cases–a single file.

ODBC & MySQL Connectors

For the ODBC and the MySQL connectors, the container is a database. The objects inside are tables. A libname called “fin” connecting to an ODBC data source name (DSN) of “MySQL-Database” as user “alice” to access the “finance” database would look like this:

  libname fin ODBC datasrc="MySQL-Database" user=alice
  password=our_little_secret schema=finance;

Note that we have not yet referred to a table. To access a table, use refer to it as “libref.table_name”. E.g. To refer to the table ‘payables’ in the fin libref, write

  fin.payables

The libname statement also accepts database options. Should you be creating MySQL tables through the ODBC interface, this libname statement would guarantee that they are InnoDB tables.

  libname fin ODBC datasrc="MySQL-Database" user=alice
  password=our_little_secret schema=finance
  DBCREATE_TABLE_OPTS='TYPE=InnoDB';

SAS Connector

The SAS connector use directories as containers. The objects inside the containers are just files. Revising the above example, here’s a libname called “fin” pointing a directory c:\finance that we will access via the SAS connector:

  libname fin 'c:\finance';

To access a sas dataset in the “c:\finance” directory, refer to it as “libref.base-filename”. To refer to the SAS dataset “payables” in the file “c:\finance\payables.sas7bdat”, write

  fin.payables

SPSS Connector

The SPSS connector uses a full path to a filename as the container. As it is a file is has only one object inside. Revising the above example, here’s a libname called “fin” pointing a file c:\finance\payables.por that we will access via the SPSS connector:

  libname fin SPSS 'c:\finance\payables.por';

To access an SPSS dataset with a libref, refer to it as “libref._first_”. To refer to the SPSS dataset at “c:\finance\payables.por”, with the libref defined above write

  fin._first_

Proc SQL

Proc SQL can be used to access any datasets SAS can access using SQL. An example that would work the ODBC or SAS data sources defined above would look this:

  proc sql; select account, balance, due_date from fin.payables where
  balance > 100 order by due_date asc; quit;

Proc SQL can also create datasets via Proc SQL. You could create a table in a database via the ODBC connector and copy a local dataset into it like this:

  libname remote ODBC datasrc="MySQL-Database" user=alice
  password=our_little_secret schema=finance
  DBCREATE_TABLE_OPTS='TYPE=InnoDB';

  libname local 'c:\finance';

  proc sql; create table remote.payables as select * from
  local.payables ; quit;

To insert the first 3000 records of a local dataset into an existing table you could do this:

  libname remote ODBC datasrc="MySQL-Database" user=alice
  password=our_little_secret schema=finance;

  libname local 'c:\finance';

  proc sql inobs=3000 outobs=3000; insert into remote.payables select
  * from local.payables; quit;

SQL Pass Through

To boost speed of select statements against remote databases, use Remote SQL Pass Through (RSPT):

  proc sql; connect to odbc as remote (datasrc="MySQL-Database"
  user=alice password=our_little_secret schema=finance); select * from
  connection to remote ( select account, balance, due_date from
  fin.payables where balance > 100 order by due_date asc ); quit;

Performance tests were run using the ODBC interface to communicate with a MySQL 4.0 server to assess the merits of remote SQL pass through. These results were found:

Test Rows Columns Size in MySQL (mb) Size in local format (mb) Time(s) mb/s row/s
select returning 699717 rows, with index and RSPT 3,078,064 25 445 N/A 1.6 N/A 1,923,790
select returning 699717 rows, with index but not RSPT 3,078,064 25 445 N/A 115.6 N/A 26,627

More disturbing than the 70-fold loss of performance, opening the result sets within SAS causes SAS to re-execute the query. Thus casual manipulation of the results sets is quite painful if the query is not created carefully and executed with RSPT.


Performance Tests

Performance tests were run using the ODBC interface to communicate with a MySQL 4.0 server to assess the data load speed from SAS. These results were found:

Test Rows Columns Size in MySQL (mb) Size in local format (mb) Time(s) mb/s row/s
create and insert on server 100,000 40 19.6 25 (SAS) 53 0.37 1887
create and insert on server 3,078,064 25 445 451 1315 0.34 2341

Performance test were run with mysqlcc and mysql against a MySQL 4.0 server to assess the value of indicies. These results were found:

Test Rows Columns Size in MySQL (mb) Size in local format (mb) Time(s) mb/s row/s
table scan returning 1 row, without index 3,078,064 25 445 N/A 11.6 N/A N/A
table scan returning 1 row, with index 3,078,064 25 445 N/A 7.27 N/A N/A
select returning 699717 rows, with index 3,078,064 25 445 N/A 1.28 N/A 2,404,737
select returning 699717 rows, with index on 5 of 9 chars in varchar(9) 3,078,064 25 445 N/A 5.67 N/A 542,868
select returning 699717 rows, without index 3,078,064 25 445 N/A 8.62 N/A 357,083
create index on varchar(9) 100,000 N/A N/A N/A 1.90 N/A 52,632
create index on varchar(9) 3,078,064 N/A N/A N/A 59.92 N/A 51,370
create index on varchar(9)(5) 3,078,064 N/A N/A N/A 70.57 N/A 43,618
create index on varchar(17) 3,078,064 N/A N/A N/A 64.69 N/A 47,582
create index on decimal(11,0) 3,078,064 N/A N/A N/A 78.88 N/A 39,022

PHHP IT Information

Hours: 7:30am – 5pm Monday – Friday

Phone: 352.273.6200

Email: support@phhp.ufl.edu

If you have an issue with any PHHP website, please send an email to webmaster@phhp.ufl.edu.

PHHP IT Contact QR Code
Scan this QR code to create a contact for us on your smartphone!