
2EE and .NET are two comprehensive frameworks, especially when you consider their data access layers. JDBC (Java DataBase Connectivity) on one side, and ADO.NET (evolution of ActiveX Data Objects) on the other side are two efficient and flexible APIs that enable us to connect to many different data storage systems. In this article, we'll focus on the usage of those APIs to access relational data.
After a short reminder of each technology historic, we'll review different ways to interact with databases, and we'll ask the following recurring questions :
Should we adopt a connected or a disconnected data interaction technique ?
How to manage transactions ?
Do JDBC and ADO.NET integrate a connection pool ?
We won't write about object/relational mapping here (on this topic, you can consult Sami Jaber's articles on ObjectSpaces and competitors), but only about the data access API features.
JDBC is born in 1996 along with JDK 1.1. Even at that time, this API enabled to connect to any relational database, supposing a connection "driver" existed for this database. Indeed, JDBC is only a programming interface aiming at interacting with a database. In order to really use it, it is mandatory to own a specific implementation of JDBC. That's what we call a Driver.
There are 4 different JDBC drivers types :
Type 1 : this is a simple JDBC bridge to... ODBC, a database connection API from Microsoft (cf next section). The aim of this driver is to enable Java developers to connect to any database for which an ODBC driver exists. Unfortunately, this driver has some major drawbacks :
An ODBC driver is a shared library, ie it is implemented in the form of native code (dll). An application that would rely on this kind of driver wouldn't be 100% Java, which harms its portability.
Each desktop on which a Java application is installed (an application that uses this driver) must be configured : the driver must be installed on each machine, and the ODBC data source must be well set. This makes client-server Java applications difficult to deploy at a large scale.
Last, ODBC is slow and has some security weaknesses. Adding a Java layer (a wrapper) on top of such a library won't solve anything... Please note that these critiques are coherent with Microsoft DB connection tools evolution : ODBC isn't used any more in recent applications.
Type 2 : it's quite the same idea, but this time the native library (often written in C) is encapsulated in Java by the database editor. Hence, performances match those of a C/C++ application that would connect to the database using the same library. But even though performance is good, we still have the deployment problem : a Type 2 driver isn't written in 100% Java either and can prove to be difficult to deploy at a large scale in a client-server mode. Of course, in a 3 tiers architecture, client desktops don't connect to the database directly, so this problem is limited to the server configuration.
Type 4 : the driver is written 100% in Java and becomes trivial to deploy. The question is : what about performance ? They can be excellent (sometimes even better than for a Type 2 driver for the same database !) or deplorable. It depends on the implementation quality...
Type 3 : an old-fashioned driver, used in particular
by Java Applets as a workaround. Indeed, an Applet can only connect
(via TCP/IP) on the HTTP server it comes from; our database must be
placed on the Web server so that an Applet can access it. Tell your
database administrators, they will probably not agree with that !
Hence, the idea behind Type 3 driver is to place a "JDBC
server" on the Web server; all our Applets can connect on this
specific server which will simply forward SQL queries to a database.
Then, the database can be anywhere in the server local area
network.
But honestly, considering the way Applets degenerate,
it becomes pretty rare to speak about this kind of driver.
Database vendors generally provide a JDBC driver (generally a Type 2 or 4). But sometimes, performance or ease of deployment considerations may lead you to buy a third party driver, developed by firms expert in this domain. Have a look at Sun web site to get an exhaustive list of existing JDBC drivers (cf resources).
JDBC 1.0 (born in 1996) : the first version of the API enables of course to connect to a database, to run SQL queries and to handle query results. But we have to admit that JDBC 1.0 contains serious gaps, which have been filled in by subsequent versions.
JDBC 2.0 (born in 1998) : the API now splits into two parts named "core" and "optional package". In addition to basic features the previous version already proposed, we find in JDBC 2.0 :
the possibility to use bidirectional cursors, supporting relative and absolute moves
a set of advanced data types such as BLOBs and SQL3 structured types
data update simplifications : records can be programmatically updated on the fly in Java (while iterating on a SQL query results) and not in SQL
an optimization named "Batch updates" that can group and send a whole set of SQL queries at once to the database. This progress is significant in the case of many precise database updates. Imagine a dozen of INSERTs, UPDATEs or record DELETEs; if we had to send queries one after the other (and then systematically establish a common context between the application and the database), the communication overhead would become prohibitive. Grouping this kind of queries enables us to send a whole set of requests using only one communication, and the database interpreter / optimizer to sequence and optimize queries in a totally autonomous way.
a new way of handling query results : "Rowsets". Those new classes enable disconnected data management so if we use them appropriately they will :
limit network interaction
improve applications responsiveness
improve data management parallelism.
last, JDBC 2.0 natively support distributed transactions and connections pooling.
JDBC 3.0 (born in 2002) : this last version can mask some proprietary parts of relational databases engines and can standardize some advanced optimizations. More specifically, JDBC 3.0 offers the possibility to :
get back values of auto-generated keys (some engines call them "sequence", other "auto-incremented integers") when an insert occurs in a table of the database
iterate over many ResultSets in parallel : a procedure or a SQL query can send back many record sets; up to now (using JDBC 2.0), the only way to handle those record sets was sequentially. Switching to the next record set closed the preceding. This limitations disappears and we can now run across many record sets in parallel.
standardize some connection pools configuration parameters (initial, max and min size, inactivity timeout duration before opened connections in the pool are really closed...)
PreparedStatement pooling : as we'll see in the following sections, it is possible with JDBC to pre-compile a SQL query execution plan. Unfortunately, this pre-compilation used to be associated with a connection (up to JDBC 2.0). But in multi-tiered architectures, using connections pools becomes a standard, hence we cannot be sure that the connection the pool will "lend" to us is the one we pre-compiled a query a moment before ! Well, it is the case in JDBC 3.0 since query pre-compilation is now shared by all the connections of a same pool.
last, JDBC transactional model evolves. Up to now, only flat transactions were supported : begin() followed by either commit() or rollback(). Today, we can also place "SavePoints" that punctuate our relationship with the database. This new concept will make our rollback() much more precise : it can put the database back to a saved state that is identified by a SavePoint.
To conclude, let's say that (unfortunately) all drivers don't support all the features we just spoke about. It will take some time till JDBC drivers developers can implement all JDBC 3.0 optimizations. Except for this, everybody will admit that JDBC is a sharp and flexible tool that enables any Java application to connect to any relational database. And even if we don't mention it here, do know that JDBC can also interact with other storage systems such as spreadsheets or structured files...
The very first database connection interface in Microsoft environments is of course ODBC (Open DataBase Connectivity), which appeared before JDBC. This generic interface also enables an application (developed in Delphi, Python, VB or VC++ at that time) to connect to any database as soon as an ODBC driver exists for this it.
ODBC is a pretty low level interface. It is not very easy to use, especially in Visual Basic. This need lead Microsoft to introduce RDO (Remote Data Object) so that VB can easily communicate with a relational database. RDO is in fact a layer on top of ODBC, and enables VB to connect to a local or a remote database.
RDO has never been made accessible to VC++ (via Microsoft Foundation Classes), hence developers in this language had to wait for DAO (Data Access Object) to see the integration of data access classes into MFC. But initially, DAO only allowed applications to connect to a Microsoft Access database (via the Jet engine). Knowing DAO success, it was a wise decision to extend the list of databases DAO could access; the technique was simple : DAO can connect either to the Jet engine or to a standard ODBC driver via the ODBCDirect bridge.
Let's see... This makes a good deal of interfaces : ODBC, RDO, DAO... having the same use ! Then Microsoft decided to standardize data sources access with OLE-DB (Object Linking and Embedding - DataBase). This new API enables any language (OLE-DB is based on COM components, hence agnostic to programming languages) to connect to any kind of data source (relational databases, structured files, spreadsheets, message exchange systems). OLE-BD performance generally are far better than those of the previous APIs : to each data source corresponds an "OLE-DB provider", totally re-implemented for this new API and hence that doesn't inherit from its predecessor weaknesses. At worst, if no OLE-DB provider exist for a data source, we can use an OLE-DB / ODBC bridge.
Unfortunately, OLE-DB is a COM API (Component Object Model), hence pretty low level. It has rapidly been simplified by ADO (ActiveX Data Objects), in which we can manipulate interfaces such as Connection, Command, and Recordset. ADO was available for every developer, whatever their programming language. Its philosophy was "connected", that's to say that it was typically made to maintain the connection between applications and databases while dealing with SQL queries results.
With the advent of .NET, the standard data source connection API becomes ADO.NET, initially called ADO+. In this API, the philosophy is more "disconnected" : applications execute their queries on the database, take back a copy of the results, and immediately free the connection before handling data. We'll come back on this style in the next sections.
Like JDBC, ADO.NET is only a programming interface. In order to connect to a data source, we have to get an implementation of this interface. That's what we call a ".NET Data Provider".
Good news ! 2 different .NET Data Providers are included in the .NET framework by default :
SQL Server .NET Data Provider : this provider enables to connect efficiently to SQL Server 7.0 and upper. Using this provider, our applications benefits from excellent performance since this new provider doesn't rely on any other data access layer but directly implements SQL Server access protocol.
OLE DB .NET Data Provider : Microsoft already standardized the way to connect to any kind of data source through an efficient API with OLE DB. This provider is in fact a bridge between the ADO.NET and OLE DB APIs. In fact, any data source accessible via OLE DB becomes accessible for a .NET application. A special case : SQL Server previous to 6.5 must use OLE DB.
If your data source can only be accessed by ODBC (Excel for example), note that there is also an ODBC .NET Data Provider, freely downloadable on Microsoft's Web Site (cf References).
However, we have one little regret regarding this data access API design : an application code using the SQL Server provider isn't the same as the one using OLE DB. More specifically, there are as many class hierarchies as .NET data providers ! A little sample :
// Using the OLE DB .NET Data Provider
OleDbConnection cnx = new OleDbConnection("...");
cnx.Open();
OleDbCommand cmd = new OleDbCommand ("...", cnx);
OleDbDataReader reader = cmd.ExecuteReader();
// Using the SQL SERVER .NET Data Provider
SqlConnection cnx = new SqlConnection("...");
cnx.Open();
SqlCommand cmd = new SqlCommand ("...", cnx);
SqlDataReader reader = cmd.ExecuteReader();
The bets are on that Microsoft will review the design of this API in future versions of the .NET framework and will use a Factory class as well as a set of abstract classes to reduce the coupling between user classes and the type of provider it uses to connect to the data source...
Well. Now that we know where ADO.NET and JDBC come from, we are going to review their respective features and to put in parallel Java and C# code samples showing those features.
Executing a SQL query using JDBC is trivial. Have a look :
import java.sql.*;
public class PremiereConnexion {
public static void main(String[] argv){
try{
// Loading "Hypersonic SQL" database driver
Class.forName("org.hsql.jdbcDriver");
// Opening the connection
Connection cnx = DriverManager.getConnection
("jdbc:HypersonicSQL:hsql://localhost") ;
// Executing a SQL query on this connection
Statement stmt = cnx.createStatement();
ResultSet rs = stmt.executeQuery("SELECT nom, prenom, age FROM Gurus");
// Handling results
while (rs.next()){
String nom = rs.getString("nom"); // or rs.getString(1);
String prenom = rs.getString("prenom");
int age = rs.getInt("age");
// Do something useful with data
}
} catch(Exception e){
// Potential JDBC exceptions management...
} finally {
try{
if (cnx != null){
cnx.close();
}
}catch(Exception e){
// Handling the fact that the connection cannot be closed.
}
}
}
}
PremiereConnexion.java
ADO.NET isn't outdone :
public class PremiereConnexion {
public static void Main(string[] argv){
// Connection configuration
OleDbConnection cnx = new OleDbConnection
("Provider= SQLOLEDB;data source= PAR-GIL;initial catalog=DNG;"
+ "password= héhéhé;persist security info=True;user id=sa;"
+ "workstation id=PAR-GIL;packet size=4096");
cnx.Open();
OleDbCommand cmd = new OleDbCommand
("SELECT Nom, Prenom, Age FROM Gurus", cnx);
OleDbDataReader reader = cmd.ExecuteReader();
while (reader.Read()) {
String nom = reader.GetString(0);
String prenom = reader.GetString(1);
int age = reader.GetInt32(2);
// Do something useful with data
}
cnx.Close();
}
}
PremiereConnexion.csThe preceding Java code isn't optimal : if you notice that you execute the same queries many times in the same program, it is recommended to use a PreparedStatement instead of a Statement. It will accept parameters that may vary from one invocation to the other. He added value of a PreparedStatement is that the SQL query interpretation and the execution plan in the database will only be done once even if we execute this query many times, which noticeably improves data access performance.
// Only the query execution chances.
// Everything else is the same
PreparedStatement stmt = cnx.prepareStatement
("SELECT nom, prenom FROM Gurus WHERE age < ?");
stmt.setInt(0, 26); // Equivalent to "WHERE age < 26"
ResultSet rs = stmt.executeQuery();
UtilisationPreparedStatement.javaYou can notice that a PreparedStatement simplifies dynamic SQL queries design : we can replace ? by the values we want with clean invocation of dedicated methods such as "setInt(indice)". This is even true for String values : it is not necessary to double simple quotes contained in strings nor to manually handle special characters.
The preceding C# code already used an intermediary object called the Command object (OleDbCommand here) which plays exactly the same role as JDBC' PreparedStatement. Hence, in ADO.NET as well, we can put parameters in the query that we will set afterwards in the program (before launching the precompiled query of course). ADO.NET code simplicity is comparable to JDBC :
// Only the query execution chances.
// Everything else is the same
OleDbCommand cmd = new OleDbCommand
("SELECT Nom, Prenom FROM Gurus WHERE age < ?", cnx);
cmd.Parameters.Add(new OleDbParameter("age", 26));
OleDbDataReader reader = cmd.ExecuteReader();
UtilisationParametresCommand.csOf course, both JDBC and ADO.NET can execute stored and precompiled procedures on the database. We won't debate here about the never-ending argue between :
stored procedures supporters saying that procedures :
mask the database schema to external applications (and hence encapsulate the database),
speed up applications,
are pre-compiled therefore validated towards the database schema
critics who answer that stored procedures :
are written in a proprietary language (PL/SQL, Transact-SQL...), which is not object oriented, therefore less flexible and less evolutionary than Java, C# or VB.NET code...
run the risk to be assigned many more responsibilities than the simple data access one; this may overload computations in an inappropriate execution environment (in particular, it is difficult or impossible to implement dynamic load balancing on a cluster of database instances that would run the same stored procedures)
It is up to you to adopt a position (a compromise seems to be a wise decision as usual...). We'll only give you an example of how to trigger a stored procedure via our preferred interfaces. But first, here is the Transact-SQL code of the stored procedure we will invoke (compiled and stored in SQL Server 2000) :
CREATE PROCEDURE [dbo].[GetYoungGurus] @age int AS SELECT Nom, Prenom FROM Gurus WHERE Age < @age GO GetYoungGurusProc.sql
Invoking this procedure using JDBC isn't a problem :
CallableStatement stmt = cnx.prepareCall("{call dbo.GetYoungGurus(?)}");
stmt.setInt(1, 26);
ResultSet rs = stmt.executeQuery();
InvocProcStock.javaSame thing in ADO.NET :
OleDbCommand cmd = new OleDbCommand("GetYoungGurus", cnx);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add(new OleDbParameter("age", 26));
OleDbDataReader reader = cmd.ExecuteReader();
InvcProcStock.csStill in the objective to optimize our database access code, note that Java applications can send a set of SQL queries at once to the database. The JDBC API calls this “Batch Update”. The principle is very simple, and well depicted by the following lines :
Statement stmt = cnx.createStatement();
stmt.addBatch("INSERT INTO Gurus VALUES ('Sami', 'Jaber', 28)");
stmt.addBatch("INSERT INTO Gurus VALUES('Thomas', 'Gil', 25)");
int [] updateCounts = stmt.executeBatch();
UtilisationBatchUpdate.javaADO.NET doesn't directly propose an equivalent of the Batch Update, but you'll see in the following section that it is not penalizing since we'll be able to simulate it using a DataSet.
Java RowSets are JavaBeans (therefore serializable) that can be connected or disconnected (since JDBC 2.0). There may be as many RowSets as database vendors or as data access frameworks vendors, but Sun typically provides us with 3 RowSets by default :
the "CachedRowSet" : disconnected RowSet that caches its info in memory (data and meta-data). This object is not adapted to manipulating huge sets of information, but its memory print is quite light (indeed, in terms of libraries, no need to have a full JDBC driver ! The CachedRowSet is ideal for light clients that don't need a full implementation of JDBC.
The "JDBCRowSet" : a simple wrapper of the ResultSet which makes it compatible with the JavaBeans specification. The aim is typically to enable graphical application designers to graphically “bind” datasources to graphical widgets (Jtable, Jtree, JList)
the "WebRowSet" : a connected RowSet that uses HTTP to dialog with a server side component responsible of the data access (typicallly a JSP/Servlet, but why not an ASP.NET...).
In this list, only the CachedRowSet corresponds to a disconnected data access architecture. Here is a simple example of Java code manipulating this class :
Connection cnx = DriverManager.getConnection
("jdbc:HypersonicSQL:hsql://localhost");
Statement stmt = cnx.createStatement();
ResultSet rs = stmt.executeQuery("SELECT nom, prenom, age FROM Gurus");
CachedRowSet crset = new CachedRowSet();
crset.populate(rs);
crset.execute();
// Disconnect from the data source
rs.close();
stmt.close();
cnx.close();
// Disconnected data processing
while (crset.next()) {
System.out.println(crset.getString("nom"));
}
UtilisationCachedRowSet.javaIn this short Java code, you will have noticed that a RowSet stands for a record collection in our program memory. It is absolutely not connected to the data source we used. And to fill it, we just have to invoke « populate() ». In this example, we simply read information of the RowSet; we could have done that through a standard ResultSet and it would have been more efficient and less expensive in terms of memory occupation. But if the same data can be useful to our program again in the future, or if a complex data processing must be implemented in Java, our RowSet will act as a cache and will become very interesting because it will avoid many interactions between our application an the database.
The ADO.NET framework can also offers a disconnected data access mechanism; its central class is DataSet and globally behave as a RowSet. What is very interesting in the .NET platform is that everything is organized to easily manipulate a DataSet :
Creating and configuring a DataSet can be done by a simple drag and drop in VisualStudio.NET. Then, in the scope of a Web application, it is trivial to display our DataSet in an HTML table or a list using the <asp:Repeater> or <asp:DataGrid> Web controls.
If you develop a data access WebService, it is also very easy to build a DataSet and to serialize it in XML as a return type of our WebService (of course, the clients of this WebService had better be able to understand the XML grammar used by the DataSet serialization. In practice, it is preferable that clients of such a WebService also be implemented in .NET, so that they can directly deserialize XML data into a DataSet on the client side).
WindowsForms rich clients can also be built to present tabular information contained by a DataSet (in a ListBox or a DataGrid control for example).
As you can see, the .NET framework is centered on this disconnected data processing object, the DataSet, even though it is still possible do handle query results on the fly through a DataReader.
In terms of programming, the code enabling to fill a DataSet is conceptually very close to the Java / RowSet equivalent. Have a look :
SqlCommand cmd = new SqlCommand();
cmd.CommandText = "SELECT Nom, Prenom, Age FROM Gurus";
cmd.Connection = cnx;
// A DataAdapter can execute queries on the database and fill
// the DataSet with those queries results
SqlDataAdapter da = new SqlDataAdapter();
da.SelectCommand = cmd;
DataSet ds = new DataSet();
da.Fill(ds);
foreach (DataRow row in ds.Tables[0].Rows){
Console.WriteLine("Age du gourou : " + row["age"]);
}
UtilisationDataSet.csEverything is done to simplify our lives. But wait, there's more : VisualStudio.NET also helps us to design typed DataSets, ie to shape DataSets for special cases. In our example, using a typed DataSet would enable both to improve (a bit) performance and to make our C# code clearer. Suppose we created a DataSet named “GuruDataSet” using VisualStudio.NET; our code becomes :
SqlCommand cmd = new SqlCommand();
cmd.CommandText = "SELECT Nom, Prenom, Age FROM Gurus";
cmd.Connection = cnx;
SqlDataAdapter da = new SqlDataAdapter();
da.SelectCommand = cmd;
// Typed DataSet : all tables and columns are available
// through intellisense at coding time !
GuruDataSet gds = new GuruDataSet();
da.Fill(gds);
foreach (GuruDataSet.GurusRow row in gds.Gurus.Rows){
Console.WriteLine("Age du gourou : " + row.Age);
}
UtilisationDataSetType.csThe beginning of the code doesn't change. But as soon as our GuruDataSet is filled, iterating over it becomes much more intuitive and more robust than in the previous example. In fact, the GuruDataSet can only contain (in our design) the "Gurus" table, whose columns are respectively "nom, prénom, age". Of course, designing a typed DataSet is fully graphical in VS.NET :

Since the DataSet is typed, developers cannot mistake any more on tables or columns names : they know at compile time if they misspell anything, which helps developing robust applications very rapidly.
Up to now, we used RowSet / DataSet to read data. It is the simplest and the more frequent usage. But it is also possible to modify data in memory, to add new records, to delete others, and of course to apply these modifications to the original data source. Let's do it in Java ...
// Moving to an absolute row number
// (relative moves are also possible)
crset.absolute(1);
// In memory modification
crset.updateInt("age", 26); // Updating the guru's age
crset.updateRow();
// Inserting a new record
crset.insertRow();
crset.updateString("nom", "Roques");
crset.updateString("prenom", "Pascal");
crset.updateInt("age", 35);
// Deleting an existing record
crset.deleteRow();
// Updating the data source
crset.acceptChanges();
ModificationRowSet.java... and in C# (using a typed DataSet here) :
// In-memory modification gds.Gurus[2].Age = 26; // Updating a guru's age gds.AcceptChanges(); // Inserting a new record GuruDataSet.GurusRow row = gds.Gurus.NewRow(); row.Nom = "Roques"; row.Prenom = "Pascal"; row.Age = 35; gds.Gurus.AddGurusRow(row); // Deleting an existing record gds.Gurus[2].Delete(); // Updating the data crset.acceptChanges(); ModificationRowSet.cs
But we have to pay attention to the fact that applying modifications to the data source is much more complex that those example may suggest. Indeed, we have to handle complex situations in which many people work in parallel on different DataSets / RowSets that stand for the same source data ! We won't enter those details here, but it is possible to be informed of data merging problems through exceptions, and to handle those errors programmatically.
Both JDBC and ADO.NET offer two data access strategies : one is connected (ResultSet or DataReader), the other disconnected (RowSet or DataSet). What is subtle here is to determine the criteria that will lead us to choose one or the other in each context...
Using a DataSet/RowSet in a system where simultaneous clients are few is possible (imagine light clients using ASP.NET or Servlets/JSP on the server side, or rich clients using WindowsForms or JavaSwing) when data access is read-only. In this case, there is absolutely no risk of incoherence or merge problem between different copies of data used in parallel by each client.
Using a DataSet/RowSet in this case is interesting for simplicity reasons : building a rich or a light client that displays the contents of a DataSet is incredibly simplified by VisualStudio.NET (everything can be done graphically !), and we can even handle pagination on a big DataSet.
Of course, the constraint will be memory consumption : if each client runs different queries, we will have to maintain as many DataSet/RowSet as simultaneous clients :
For rich clients, it means that we download a lot of data on the client side; hence latency is big at the startup, but small afterwards. In addition, users desktops need to have more memory than in other architectures (remember : CLR and JRE already are pretty big...).
For light clients, it means that we will need to have a lot of memory available on the Web server : if we keep one DataSet in each user session, we'll have to take care to sessions timeout and to be sure that the web site doesn't have too many simultaneous clients (the system scalability is reduced). However, in this case, we could setup an optimization technique such as a shared pool of DataSets : each DataSet would be associated to a SQL query, and if two clients (two ASP.NET or UserControls) need to display a presentation of the same data, the'll only have to create a DataView on a pre-configurated and pre-filled DataSet.
Our applications rarely limit their data access to read-only. In our disconnected mode hypothetic situation, if a client wishes to update the data she sees, she has two options :
either she modifies data directly in the DataSet/RowSet she sees, then she asks to synchronize the data cache with the original data source. In this case, our application must anticipate errors : if another client already has modified these data, or if the record has disappeared...
or she uses the DataSet to view data (in order to take advantage of Java and .NET IDEs) and uses a connected mode to implement updates. Then she can use transactions to reduce simultaneous data access conflicts.
There is a situation maybe less intuitive or less often implemented in which our DataSet/RowSet can prove to be very interesting as well : a client who would wish to see data from different points of view. We can imagine a stock exchange addict (let's say a private trader) who wants to download day to day stock historic for each stock she plays with in order to do a technical analysis of their trends.
Her software could be implemented in WindowsForms and could download at 7 PM a DataSet that would contain all required data and that would calculate on the client side all the technical indicators that will help her to do her analysis (MACD, Stochastic, RSI, etc...). The user could ask her software to zoom on a specific period of the past, to compare a stock trend with an index such as the Dow Jones, or even to try to calculate forecasts in order to be able to decide whether she should buy or sell. The user could also want to display quotations as curves, Japanese candles, and so on... She could do all that stuff without having to stay connected to the network and without undergo any back and forth between client and server. The DataSet enables users to take a snapshot of stock data.
On the server side, it is easy to imagine a WebService that would send the DataSet in XML format back to the client software. This application would be very simple to setup.
Let's imagine a Web site (say www.dotnetguru.org for example) on which crowds are enormous ( ;-) ). On this site, you can see some statistics such as rush hours, number of hits per week, per month... all those data being aggregated by a dynamic page. In order to improve the site responsiveness, the simplest thing to do wouldn't be to use a data cache but a page (or output) cache. If we consider ASP.NET, the OutputCache would be set to a certain timeout, so that data be close to reality. This way, not only the database wouldn't be stressed too much, but neither would the application : we could short-circuit the whole processing sequence and send back the same HTML page to user during 10 minutes.
In the situation when simultaneous clients are very numerous, we have to pay attention to our systems memory consumption. Using a data cache (especially on the server side) would become prohibitive. Another argument against DataSet / RowSet is that the more simultaneous users, the more simultaneous data access. Merging caches to the original data source may put the whole system into jeopardy : many conflicts will rapidly imply a performance loss, and many errors sent back to user interfaces. Our software may not remain usable.
In this case, we need more flexibility regarding transactions management and memory consumption. Using a DataReader / ResultSet is recommended, and must be associated with advanced techniques such as stored procedures, connection pools and explicit transactions (that we'll see in the next sections).
Of course in the real world, a detailed analysis must be lead in order to determine the best compromise.
JDBC and ADO.NET allow us to drive (very precisely) transactions while we run SQL code on our database. This supposes of course that transactions are supported by the target database; and as you know, there are some differences between transactions implementations, especially concerning transactional isolation.
In both APIs, transaction are managed in the scope of a connection. By default, connections set transaction management mode to auto-commit : each SQL query execution is done in a new transaction, which is closed as soon as the query returns. But for performance reasons or to isolate differently the database interactions, you will generally need to disable this automatic mode and to choose a manual commit mode. A short sample ?
In Java:
// Suppose the "cnx" connection is open
// We switch to "explicit transactions" mode
// and start a new transaction (this is implicit)
cnx.setAutoCommit(false);
// Execute a PreparedStatement in the scope of our transaction
PreparedStatement stmt = cnx.prepareStatement
("INSERT INTO Gurus (Nom, Prenom, Age) VALUES (?, ?, ?)");
try {
stmt.setString("Nom", "Jaber");
stmt.setString("Prenom", "Sami");
stmt.setInt("Age", 28);
stmt.executeUpdate();
stmt.setString("Nom", "Gil");
stmt.setString("Prenom", "Thomas");
stmt.setInt("Age", 25);
stmt.executeUpdate();
stmt.setString("Nom", "Roques");
stmt.setString("Prenom", "Pascal");
stmt.setInt("Age", 35);
stmt.executeUpdate();
cnx.Commit();
System.out.println("Les enregistrements ont été écrits en base.");
}
catch (SQLException ex) {
System.err.println("SQLException: " + ex.getMessage());
if (cnx != null) {
try {
System.err.print("Aucun enregistrement n'a été écrit en base.");
cnx.rollback();
}
catch (SQLException excep) {
System.err.print("SQLException: ");
System.err.println(excep.getMessage());
}
}
} finally {
try{
if (cnx != null){
cnx.close();
}
}catch(Exception e){
// Gestion de l'impossibilité de fermer la connexion
}
}
TransactionsExplicites.javaAnd in C# :
// Suppose the "cnx" connection is open
cnx.Open();
// Start an explicit transaction
OleDbTransaction trans = cnx.BeginTransaction();
// Execute a command in the scope of our transaction
OleDbCommand cmd = new OleDbCommand
("INSERT INTO Gurus (Nom, Prenom, Age) VALUES (?, ?, ?)", cnx);
cmd.Transaction = trans;
try
{
cmd.Parameters.Add(new OleDbParameter("Nom", "Jaber"));
cmd.Parameters.Add(new OleDbParameter("Prenom", "Sami"));
cmd.Parameters.Add(new OleDbParameter("Age", 28));
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
cmd.Parameters.Add(new OleDbParameter("Nom", "Gil"));
cmd.Parameters.Add(new OleDbParameter("Prenom", "Thomas"));
cmd.Parameters.Add(new OleDbParameter("Age", 25));
cmd.ExecuteNonQuery();
cmd.Parameters.Clear();
cmd.Parameters.Add(new OleDbParameter("Nom", "Roques"));
cmd.Parameters.Add(new OleDbParameter("Prenom", "Pascal"));
cmd.Parameters.Add(new OleDbParameter("Age", 35));
cmd.ExecuteNonQuery();
trans.Commit();
Console.WriteLine("Les enregistrements ont été écrits en base.");
}
catch (Exception e)
{
trans.Rollback();
Console.WriteLine(e.ToString());
Console.WriteLine("Aucun enregistrement n'a été écrit en base.");
}
finally
{
cnx.Close();
}
TransactionsExplicites.csAt the beginning of this article, we said that JDBC 3.0 enabled to set marks in transactions, what we call « SavePoints », that support a partial cancel of the actions encompassed by a transaction (a partial rollback()). Here is how to use this feature (reusing the same code as previously) :
cnx.setAutoCommit(false);
PreparedStatement stmt = cnx.prepareStatement
("INSERT INTO Gurus (Nom, Prenom, Age) VALUES (?, ?, ?)");
// We omit exceptions management here
stmt.setString("Nom", "Jaber");
stmt.setString("Prenom", "Sami");
stmt.setInt("Age", 28);
stmt.executeUpdate();
// Let's put a SavePoint
Savepoint svpt1 = cnx.setSavepoint("SAVEPOINT_1");
stmt.setString("Nom", "Gil");
stmt.setString("Prenom", "Thomas");
stmt.setInt("Age", 25);
stmt.executeUpdate();
// Let's put another SavePoint
Savepoint svpt2 = cnx.setSavepoint("SAVEPOINT_2");
stmt.setString("Nom", "Roques");
stmt.setString("Prenom", "Pascal");
stmt.setInt("Age", 35);
stmt.executeUpdate();
// Partial cancel of modifications we made
cnx.rollback(svpt1);
System.out.println("Seul le premier enregistrement a été écrit en base.");
// Validate the modifications we made before the SavePoint
cnx.commit();
cnx.setAutoCommit(true);
UtilisationSavePoints.javaAnd without any surprise, we find the same feature in ADO.NET :
OleDbTransaction trans = cnx.BeginTransaction();
OleDbCommand cmd = new OleDbCommand
("INSERT INTO Gurus (Nom, Prenom, Age) VALUES (?, ?, ?)", cnx);
cmd.Transaction = trans;
// We omit exceptions management here
cmd.Parameters.Add(new OleDbParameter("Nom", "Jaber"));
cmd.Parameters.Add(new OleDbParameter("Prenom", "Sami"));
cmd.Parameters.Add(new OleDbParameter("Age", 28));
cmd.ExecuteNonQuery();
// Let's put a SavePoint
trans.Save("SAVEPOINT_1");
cmd.Parameters.Clear();
cmd.Parameters.Add(new OleDbParameter("Nom", "Gil"));
cmd.Parameters.Add(new OleDbParameter("Prenom", "Thomas"));
cmd.Parameters.Add(new OleDbParameter("Age", 25));
cmd.ExecuteNonQuery();
// Let's put another SavePoint
trans.Save("SAVEPOINT_2");
cmd.Parameters.Clear();
cmd.Parameters.Add(new OleDbParameter("Nom", "Roques"));
cmd.Parameters.Add(new OleDbParameter("Prenom", "Pascal"));
cmd.Parameters.Add(new OleDbParameter("Age", 35));
cmd.ExecuteNonQuery();
// Partial cancel of the modifications we made
trans.Rollback("SAVEPOINT_1");
Console.WriteLine("Seul le premier enregistrement a été écrit en base.");
// Validate the modifications we made before the SavePoint
trans.Commit();
UtilisationSavePoints.csWhen database access tools undergo a huge stress, it becomes prohibitive to systematically open and close connection between application code and databases. Indeed, establishing a connection is a costly process that runs the risk to become a bottleneck for our applications.
As usual in object oriented applications, a fine-grained management of the connections life-cycle will be implemented by a « pool », ie a set of connections to the database that are permanently maintained open, and that are dynamically (on demand) assigned to object that need database connections.
Connections pools are implemented in JDBC as well as in ADO.NET. However, beware of the fact that all JDBC drivers don't implement this part of the specification (or at least not completely). Depending on the requirements of your application, make sure that the driver you use really implements a connections pool, and that this pool is configurable enough for your needs.
The general advice in Java is to externalize database configuration (and hence the connection pool settings) in a directory. Client applications will query the configuration from the directory via JNDI (Java Naming Directory Interface), which avoids incoherencies and gives the opportunity to move the database or to refine the settings without having any impact on client applications.
Getting a reference on a connection pool using JNDI is very simple :
// Directory root
Context ctx = new InitialContext();
// Get a reference on the pool (this is an objet of type DataSource)
DataSource ds = (DataSource)ctx.lookup("jdbc/GuruBD");
Connection cnx = ds.getConnection("tom", "héhéhé");
UtilisationPoolConnexions.javaAs you can see, we end up with a 100% standard connection reference, from which we can orchestrate transactions, prepare statements and execute SQL queries. There is absolutely no impact on the rest of the code, except in terms of performance : when you invoke « cnx.close(); » the connection won't be physically closed but simply given back to the pool. It may be reused by another object that asks for a connection via « ds.getConnection(); ».
Of course, depending on the pool settings, we'll support more or less simultaneous connections, and it will be possible to parameterize the increment (the amount of connections the pool must be augmented when running out of connections), and the idle timeout (after which the pool will reduce its size by closing unused connections).
The concept being exactly the same, we'll put the stress on technical differences between an ADO.NET pool and a JDBC DataSource.
In fact, using a connection pool is automatic in ADO.NET since OLE-DB and SQL Server providers that are included in the .NET framework support it transparently. How ? Pertty simply : by associating a connection pool to each connection string used by our applications ! Suppose you open two connections using the same connection string, your connections will be managed by a transparent pool :
string cnxString = "Provider=SQLOLEDB;Data Source=localhost;" + "Initial Catalog=DNG;Integrated Security=SSPI;"; OleDbConnection cnx = new OleDbConnection (cnxString); cnx.Open(); OleDbConnection cnx2 = new OleDbConnection (cnxString); cnx2.Open(); OleDbConnection cnx3 = new OleDbConnection (cnxString); cnx3.Open(); // Our pool contains 0 connection, the application uses 3 open connections cnx3.Close(); cnx.Open(); cnx2.Close(); // Our pool contains 3 available connections, the application 0 cnx = new OleDbConnection (cnxString); // Our pool contains 2 available connections, the application 1 // No connection was opened by the last line of code UtilisationPoolConnexions.cs
By default, pools have an initial size of 0 connections and a maximum size of 100. You can modify these settings via the connection string itself :
cnx.ConnectionString = "Provider=SQLOLEDB;Data Source=localhost;" + "Initial Catalog=DNG;Integrated Security=SSPI;" + "Max Pool Size=50;Min Pool Size=10"; ParametresPoolConnexions.cs
JDBC and ADO.NET have comparable features. Both APIs evolved in parallel, they progressively introduced the same concepts (ResultSets or in-memory scrollable and modifiable RecordSets, RowSets or DataSets for disconnected architectures, the connection pool pattern, and the ability to manage transaction in a fine grained way...).
JDBC and ADO.NET don't impose any technical architecture. They both can be used :
in a client-serveur mode (a rich client directly accesses a database),
in a Web architecture (an ASP.NET or a Servlet / JSP directly connects to the database),
in a multi-tiers architecture, with or without Web access (a rich client or an ASP.NET interacts with a business component that implements persistence on top of ADO.NET).
in a deconnected mode, with or without WebServices
We could even imagine that serializing an ADO.NET DataSet could be deserialized into a JDBC RowSet in order to reduce the gap between J2EE and .NET even more. Any candidate ?
Auteur : Thomas GIL
Translator : Thomas GIL
Copyright : DotNetGuru ©
JDBC drivers search engine : http://industry.java.sun.com/products/jdbc/drivers
ODBC .NET Data Provider : http://msdn.microsoft.com/downloads