Wednesday, March 7, 2012

Hourly batch process locking my database

Hello everyone,

I have around 20 reports in an ASP web-application which connects to a
SQL Server 2000 dB, executes stored procedures based on input
parameters and returns the data in a nice tabular format.

The data used in these reports actually originates from a 3rd-party
accounting application called Exchequer. I have written a VB
application (I call it the extractor) which extracts data from
Exchequer and dumps it into the SQL Server dB every hour. The
extractor takes an average of 10 minutes to run. During these 10
minutes, while the extractor seems to run happily, my ASP
web-application, which queries the same dB that the extractor is
updating, becomes dead slow.

Is there any way I can get the extractor to be nice to SQL Server and
not take up all its resources, so that the ASP web-application users do
not have to contend with a very, very slow application during those
times?

I am using a DSN to connect to the dB from the server that runs the
web-application as well as from the other server which runs the
extractor. Connection pooling has been enabled on both (using the ODBC
Administrator). The Detach Database dialog gives me a list of open
connections to the dB. I have been monitoring this and have noted
10-15 open connections at most times, even while the extractor is
running.

All connection objects in the ASP as well as VB applications are
closed and then set to nothing.

This system has been in use since 2002. My data file has grown to 450MB
and my Transaction Log is close to 2GB. Can the Transaction Log be a
problem? For some reason, the size of the Transaction Log does not go
down even after a complete dB backup is done. Once a complete dB
backup is done, doesn't the Transaction Log lose its significance, so
that it can actually be deleted? Anyway, that is another post I'm
making to the group today.

In the extractor program:
1) I create a temporary table
2) I create an empty recordset out of the table
3) I loop through the Exchequer records using Exchequer's APIs, adding
records into the recordset of the temporary table as I go along.
4) I do an UpdateBatch of the Recordset intermittently
5) I open an SQL Transaction
6) I delete all records from the main table
7) I run an INSERT INTO main_table SELECT * FROM #temp_table
8) I commit the transaction (steps 5 to 8 are sketched in T-SQL below)
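
In T-SQL terms, steps 5 to 8 boil down to roughly this (error handling
omitted; main_table and #temp_table are the names used above):

    BEGIN TRANSACTION
        -- step 6: clear the reporting table
        DELETE FROM main_table
        -- step 7: reload it from the temporary table filled through ADO
        INSERT INTO main_table SELECT * FROM #temp_table
    -- step 8
    COMMIT TRANSACTION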

I hope this information is sufficient.

Thanks
Sam

Sam (simantahandique@.indiatimes.com) writes:
> Is there any way I can get the extractor to be nice to SQL Server and
> not take up all its resources, so that the ASP web-application users do
> not have to contend with a very, very slow application during those
> times?

There probably is, but not by means of a low-priority setting. You
simply will have to analyse in what way the extractor is slowing the
rest down. It could be blocking, or it could be that the extractor
consumes a lot of CPU. Indeed, during these steps:

> 5) I open an SQL Transaction
> 6) I delete all records from the main table
> 7) I run an INSERT INTO main_table SELECT * FROM #temp_table
> 8) I commit the transaction

The other processes are likely to be blocked, as the main table is
being reloaded.

I don't think you said how much data is loaded each time, but there may
well be room for improvement here. In step 6, you could use TRUNCATE
TABLE to empty the table; this is faster than a DELETE statement,
because it is only minimally logged. TRUNCATE TABLE will not work if
the table is referenced by foreign keys, and the extractor must be
running as dbo to have permission to issue it.
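
A minimal sketch of that change inside the existing transaction,
assuming main_table is not referenced by any foreign keys:

    BEGIN TRANSACTION
        -- TRUNCATE TABLE only logs the page deallocations, so it is much
        -- faster than DELETE, and it can still be rolled back here
        TRUNCATE TABLE main_table
        INSERT INTO main_table SELECT * FROM #temp_table
    COMMIT TRANSACTION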

Then again, does data really change that wildly? Maybe you are better off
loading deltas only?
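
If the rows carry some key, a delta load could look roughly like this
(TransactionID, Amount and DocDate are made-up names, just to show the
pattern):

    -- update rows that already exist in the target
    UPDATE m
    SET    m.Amount  = t.Amount,
           m.DocDate = t.DocDate
    FROM   main_table m
    JOIN   #temp_table t ON t.TransactionID = m.TransactionID

    -- insert rows that are new
    INSERT INTO main_table (TransactionID, Amount, DocDate)
    SELECT t.TransactionID, t.Amount, t.DocDate
    FROM   #temp_table t
    WHERE  NOT EXISTS (SELECT * FROM main_table m
                       WHERE m.TransactionID = t.TransactionID)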

Also: you load the data through a recordset. You are likely to get better
performance with bulk-load operations.

> This system has been in use since 2002. My data file has grown to 450MB
> and my Transaction Log is close to 2GB. Can the Transaction Log be a
> problem? For some reason, the size of the Transaction Log does not go
> down even after a complete dB backup is done. Once a complete dB
> backup is done, doesn't the Transaction Log lose its significance, so
> that it can actually be deleted?

Well, someone has to make that decision, and SQL Server is not doing it
behind your back. You need to back up the transaction log as well to
keep its size down. The transaction log is kept so that you can back
it up and perform an up-to-the-point recovery.

Judging from the description of the database, it sounds as if you might
be interested in setting the database to simple recovery. This means
that you cannot perform up-to-the-point recovery, only restore from
the most recent backup. The flip side is that SQL Server will regularly
truncate the transaction log of all committed transactions.
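
In T-SQL the two options look about like this (YourDb and the backup
path are placeholders):

    -- stay in full recovery, but back up the log regularly to keep it small
    BACKUP LOG YourDb TO DISK = 'D:\Backup\YourDb_log.bak'

    -- or give up point-in-time recovery and let SQL Server truncate the
    -- log at checkpoint
    ALTER DATABASE YourDb SET RECOVERY SIMPLE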

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Thank you for your post. A couple of points.

Erland Sommarskog <esquel@.sommarskog.se> wrote in message news:<Xns952F720586F5AYazorman@.127.0.0.1>...
> Then again, does data really change that wildly? Maybe you are better off
> loading deltas only?
I run the extractor every hour to synchronize data from the past 7
days from Exchequer. This is very much the requirement, so I do not
have any choice there. This results in an average of 10,000 records
being moved into the main_table from the #temp_table, which is when
the dead-lock happens. Does this information help you in any way to
suggest some improvements?

I was thinking of running two parallel dBs in some sort of a
round-robin load balanced setup. (I'm sure "round-robin load balanced"
is not the correct term for this setup, but I am using it for lack of
a better phrase to describe my plan.) In a typical scenario, the
Extractor will run on dB1. The Intranet web-application will be
connected to dB2. Once the extractor completes its execution, it can
alter the connection string in the conf file for my Intranet
web-application so that the Intranet connects to the updated dB1
henceforth. The next time the extractor executes, it will update dB2
instead of dB1 and switch the Intranet back to dB2 after its
execution, and so on. The only problem is that the dB has around 20
tables out of which only 6 are being updated by Extractor. How do I
synchronize the other 14 tables between dB1 and dB2? Does SQL Server
provide any facility for that?

> Also: you load the data through a recordset. You are likely to get better
> performance with bulk-load operations.
What are these bulk-load operations? Are they a group of SQL Commands?
If you can point me to the right place in the SQL Server help file, I
can read up from there.

> Judging from the description of the database, it sounds as if you might
> be interested in setting the database to simple recovery. This means
> that you cannot perform up-to-the-point recovery, only restore from
> the most recent backup. The flip side is that SQL Server will regularly
> truncate the transaction log of all committed transactions.
I just enquired with the administration team. They are using
third-party software called Backup Exec from Veritas for all backup
activities. The backup software uses something it calls SQL Server
agents to back up the dB. I'm not sure whether these SQL Server agents
cause the transaction log to be truncated once they are done with the
backups.
If I use the simple recovery model, is the transaction log still used
by the dB? Would SQL statements that use a BEGIN TRANS use the
Transaction Log? Once the transaction is committed in this case, does
it automatically truncate the Transaction Log and thus keep its file
size in check?
These questions may have been answered in the previous post. I'm sorry
if I'm repeating the same question, but I could not understand the
last few sentences completely, especially where it was mentioned: "The
flip side is that ...". Why do you say it is a "flip" side?

Thanks and regards,
Sam

Just a suggestion for you. It sounds similar to something we do here,
so it may be worth a thought.

We have to extract a large number of records from several tables. From
these we need to perform some complex calculations and play around with
data formatting to get the data into the required output, which is a
single table. We need to do this every 10 minutes, with no longer than
3 minutes taken up by the processing. Blocking users is a major
concern, as previous methods that worked directly on the 'live' tables
caused us problems.

What we found worked best for us was to have a database in which we
could store the structure of the tables we were copying (specifically
just the fields we needed). Using a DTS package and stored procedures,
we extract the data from the master db into CSV files (this is very
quick - for half a million records across all of the tables, it takes
about 1 minute to export and import) and re-import the data into the
copied structure in the new database. We then perform all of the
calculations on these tables as opposed to the master 'live' ones.
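
As a rough illustration with made-up names (LiveDB, ReportDB and a
Sales table) - the actual CSV export and import is done by the DTS
package:

    -- one-off: copy just the columns we need, structure only, no rows
    SELECT CustomerID, DocDate, Amount
    INTO   ReportDB.dbo.SalesCopy
    FROM   LiveDB.dbo.Sales
    WHERE  1 = 0

    -- every cycle: empty the copy before the DTS package re-imports the CSV
    TRUNCATE TABLE ReportDB.dbo.SalesCopy

    -- all the heavy calculations then run against SalesCopy, never the
    -- live table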

I know it probably seems daft to export and re-import, but it's the
quickest way of getting the data without causing the users problems.
SQL Server can easily handle the workload on the data, but we were
causing too many problems with locking. By doing this, we have
improved the speed dramatically.

The advantages are that you can treat the data as read-only, locking
is reduced (what little remains is minimal), and the export and import
themselves are fast.

A simple scheduled task runs this for us every 10 minutes.

If you get a chance, try this and see how it performs against your
normal method; it may prove quick enough for you to take further.

Hope that helps, let me know if you need more info.

Ryan

Sam (simantahandique@.indiatimes.com) writes:
> Erland Sommarskog <esquel@.sommarskog.se> wrote in message
> news:<Xns952F720586F5AYazorman@.127.0.0.1>...
>>
>> Then again, does data really change that wildly? Maybe you are better off
>> loading deltas only?
>>
> I run the extractor every hour to synchronize data from the past 7
> days from Exchequer. This is very much the requirement, so I do not
> have any choice there. This results in an average of 10,000 records
> being moved into the main_table from the #temp_table, which is when
> the dead-lock happens. Does this information help you in any way to
> suggest some improvements?

Not really. If I understood your previous post, you flush the target
database once an hour and reload everything from Exchequer. This
sounds very inefficient to me. Is there no way to find out what has
changed, and load only the changes?

> I was thinking of running two parallel dBs in some sort of a
> round-robin load balanced setup.
> ...
> The only problem is that the dB has around 20 tables out of which only 6
> are being updated by Extractor. How do I synchronize the other 14 tables
> between dB1 and dB2? Does SQL Server provide any facility for that?

You could use triggers for this, although you need to take some care
that the triggers don't fire back and forth between the two databases.

However, having two databases is a bit overkill. It might be possible
to do this with two tables that you rename once an hour. Not that this
is particularly exciting. (Loading only deltas still seems like the best
bet.)
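
A sketch of the rename variant, with a made-up main_table_new that the
extractor loads while readers keep using main_table (sp_rename prints a
caution about renaming objects, so treat this as a sketch only and keep
the swap as short as possible):

    -- the extractor has just finished filling main_table_new
    BEGIN TRANSACTION
        EXEC sp_rename 'main_table', 'main_table_old'
        EXEC sp_rename 'main_table_new', 'main_table'
    COMMIT TRANSACTION
    -- main_table_old is then emptied and becomes the next hour's
    -- main_table_new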

> What are these bulk-load operations? Are they a group of SQL Commands?
> If you can point me to the right place in the SQL Server help file, I
> can read up from there.

There are several methods to bulk-load:

o BCP, which is a command-line tool.
o BULK INSERT which is an SQL command.
o There are bulk-copy APIs both in OLE DB and ODBC.
o DTS.

The easiest to use, in my opinion, is BCP. But these methods have many
things in common, including how you specify what the data to import
looks like. (With the reservation that I don't know much about DTS.)

With bulk-load, you could get all data into a staging table, and then
work from there. Preferably set-based, and not one-by-one processing
like in ADO.
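
With BULK INSERT, for instance, it could look roughly like this (the
file path, terminators and staging_table are assumptions; the file must
be readable from the server itself):

    -- load the extracted file into a permanent staging table
    BULK INSERT staging_table
    FROM 'D:\Extract\exchequer.txt'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK)

    -- then refresh the main table set-based in one short transaction
    BEGIN TRANSACTION
        TRUNCATE TABLE main_table
        INSERT INTO main_table SELECT * FROM staging_table
    COMMIT TRANSACTION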

> I just enquired with the administration team. They are using
> third-party software called Backup Exec from Veritas for all backup
> activities. The backup software uses something it calls SQL Server
> agents to back up the dB. I'm not sure whether these SQL Server agents
> cause the transaction log to be truncated once they are done with the
> backups.

I don't know about Backup Exec; you will have to sort that out with
your admin people. But I would guess that it's a configuration matter
whether you also back up the log with Backup Exec.

Then again... You said 10,000 rows on average each time. If I assume
an average row size of 500 bytes, this means that you insert and delete
10,000 * 500 * 2 * 24 = 240,000,000 bytes a day. If your transaction
log had never been truncated since 2002, it would be hundreds of
gigabytes by now, not 2 GB. So I would guess that it is being backed
up after all.

> If I use the simple recovery model, is the transaction log still used
> by the dB? Would SQL statements that use a BEGIN TRANS use the
> Transaction Log? Once the transaction is committed in this case, does
> it automatically truncate the Transaction Log and thus keep its file
> size in check?

More or less, yes. That is, transaction atomicity is still maintained,
so that a transaction can be rolled back (or forward). The log is never
truncated past the oldest active open transaction. The actual
truncation does not really happen at commit, but is performed by the
checkpoint process, which runs about once a minute or so. But that's
really nothing you have to bother about.
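
If you want to see how much of that 2 GB log is actually in use, and to
reclaim the space once the log has been backed up or the database is in
simple recovery, something along these lines should do (the logical log
file name is an assumption; sp_helpfile shows the real one):

    -- log size and percent used for every database on the server
    DBCC SQLPERF(LOGSPACE)

    -- find the logical name of the log file for the current database
    EXEC sp_helpfile

    -- shrink the log file down to about 200 MB
    DBCC SHRINKFILE (YourDb_Log, 200)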

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp
