Post by Robert AH Prins
Mon 18-Oct-2004: Got from the update author
459244 Oct 10 2004 ftp://garbo.uwasa.fi/pc/turbopa7/bpl70v20.zip
bpl70v20.zip Borland-Pascal 7.01 RT-Libary Update Rel2.0, R.Prins
FWIW, this was the original message that accompanied the Garbo upload:
Subject: BPL70V20.ZIP Partial 32-bit replacement libraries for BP7
File name: BPL70V20.ZIP
One line description: Optimized BP7 runtime libraries
Replaces: NONE - THIS IS AN ALTERNATIVE TO BPL70N16.ZIP!
Suggested Garbo directory: TURBOPA7
Uploader name & email: Robert AH Prins <prino at onetel dot com>
Author or company: Robert AH Prins
Email address: <prino at onetel dot com>
Surface address: 52 Lummis Vale, Kesgrave, IPSWICH, SUFFOLK, IP5 2FJ, UK
Special requirements: 80386+ required
Shareware payment required from private users: N
Shareware payment required from corporates: N
Distribution limitations: None
Garbo CD-ROM distribution allowed without preconditions: Yes
Demo: No
Nagware: No
Self-documenting: Yes
External documentation included: Yes
Source included: Yes
Size: 449 K / 1415 K
10 lines description:
BPL70V20.ZIP contains optimized Run Time Libraries for Borland Pascal 7.
The code is based on Norbert Juffa's BPL70N16, but incorporates changes
Borland made in BP 7.01. Unlike the libraries in Norbert's BPL70N16, the
ones included here only run on 386+ CPUs due to the extensive use of
32-bit instructions. Many of the original RTL routines have also been
split up further to increase the smart-link granularity.
In addition to the System unit, the RTLs also include optimized Dos, CRT
(removal of RTE 200) and Overlay units.
Long description:
The most notable "flaws" in the replacement libraries written by Norbert
Juffa in BPL70N16.ZIP are:
1) they are based on the original BP 7.00 RTL code
2) they do not use any 32-bit code
To quote Norbert:
"Those users already familiar with my previous project, the fast
replacement library for Turbo Pascal 6.0 (distributed as TPL60N19.ZIP),
may be disappointed that not all the features of that program have been
included in BPL70N16.ZIP yet. I don't have much time at the moment but
still wanted to provide a BP 7.0 version of my library as soon as
possible. So I decided to port the performance relevant stuff first and
work on the other aspects later."
Now, better (10 years...) late than never, BPL70V20.ZIP contains some of
these "other aspects" Norbert never got around to.
The most significant differences between BPL70N16 & BPL70V20 are
1) the new RTLs are based on the BP 7.01 RTL
2) they _require_ a 32-bit CPU
3) much of the code is Pentium+ friendly: most slow CISC instructions
have been replaced with their RISC equivalents, which can be executed
in parallel on Pentium/PII/PIII/P4/Athlon CPUs
4) they are more smartlink-friendly
5) in the default configuration, they no longer support the software
FPU emulator (but two additional System units incorporating the
SW emulator are provided)
6) they include a copy of my non-RTE200 smartlink friendly CRT unit
7) they include more smartlink-friendly DOS & Overlay units
As far as performance is concerned, the greatest gains, compared to
Norbert's libs, can be found in the Longint arithmetic. Norbert removed
all traces of 32-bit code; I removed all traces of 16-bit code,
essentially reverting to the original Borland unit, but without the
Test8087 tests. The output from LONGTEST indicates that my code is
about 35% faster than Norbert's and some 5% faster than the original
BP 7 RTL.
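
To illustrate why Longint arithmetic gains so much from 32-bit
instructions, here is a minimal sketch in C. It is not taken from any of
the RTL sources; the function names are hypothetical. It shows the kind
of work a 16-bit-only multiply has to do (several 16x16 partial
products) next to the single 32-bit multiply available on a 386+. The
low 32 bits of the product are the same for signed and unsigned
operands, so unsigned types are used for simplicity.

```c
#include <stdint.h>

/* Low 32 bits of a 32x32 multiply built from 16-bit halves, roughly the
   work a 16-bit-only routine has to do (hypothetical illustration). */
uint32_t mul32_from_16bit_parts(uint32_t a, uint32_t b)
{
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint32_t low   = a_lo * b_lo;               /* full 32-bit partial product   */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo; /* only its low 16 bits survive  */

    /* a_hi * b_hi would land entirely above bit 31, so it is not needed. */
    return low + (cross << 16);
}

/* With 32-bit registers the same result is a single multiply. */
uint32_t mul32_native(uint32_t a, uint32_t b)
{
    return a * b;
}
```

Both functions return identical results for all inputs; the difference
is only in the number of multiplies and additions needed.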
Another noticeable improvement is in the Set handling code. My code is
about 10% faster than the code in BPL70N16, which is due to the fact
that I use RISCified 32-bit code there.
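
As a rough illustration of what 32-bit set handling means, the C sketch
below forms the union of two full-size Pascal sets (32 bytes, one bit
per element) one 32-bit word at a time, so 8 OR operations instead of
16 or 32. This is an assumption-laden sketch, not the RTL code: the
names are hypothetical, both sets are assumed to be the full 32 bytes,
and the bit layout (element e in bit e mod 8 of byte e div 8) is
assumed.

```c
#include <stdint.h>
#include <string.h>

#define SET_BYTES 32  /* a full-size set: 256 elements, one bit each */

/* dst := dst + src (set union), processed 32 bits at a time. */
void set_union32(uint8_t dst[SET_BYTES], const uint8_t src[SET_BYTES])
{
    for (size_t i = 0; i < SET_BYTES; i += 4) {
        uint32_t d, s;
        memcpy(&d, dst + i, 4);   /* load 4 bytes of each set */
        memcpy(&s, src + i, 4);
        d |= s;                   /* union of 32 elements in one operation */
        memcpy(dst + i, &d, 4);
    }
}

/* Membership test for element e (0..255), assuming the layout above. */
int set_contains(const uint8_t set[SET_BYTES], unsigned e)
{
    return (set[e >> 3] >> (e & 7)) & 1;
}
```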
The results of WHETST87, where my code is some 10 to 14% faster than
Norbert's, should be taken with a pinch of salt. Yes, it IS faster, but
this might mostly be due to the much faster conversion of INT 34-3D
calls to real FPU instructions.
The two most notable other improvements that are not shown in the
results of any of the supplied testing programs are:
1) the Move() routine auto-aligns the destination of the move and moves
4 bytes at a time, making it, for longer moves, close to four times
as fast as the original Borland RTL and up to twice as fast as
Norbert's code.
2) The 'internal' move also moves 4 bytes at a time, but doesn't perform
auto-alignment of the destination. This move is, among others, used
during assignments of records and arrays.
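
The auto-aligning move described in 1) can be sketched in C as follows.
This is only an illustration of the technique, not the actual RTL
routine (which is written in assembler); the function name is
hypothetical, and the sketch assumes a forward copy of non-overlapping
regions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Forward copy that first aligns the destination to a 4-byte boundary,
   then moves 4 bytes per step, then finishes the tail byte by byte.
   Assumes the source and destination regions do not overlap. */
void move_aligned32(void *dest, const void *src, size_t count)
{
    uint8_t *d = dest;
    const uint8_t *s = src;

    /* 1. Byte copies until the destination address is a multiple of 4. */
    while (count > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        count--;
    }

    /* 2. Main loop: one 32-bit load and one 32-bit store per iteration. */
    while (count >= 4) {
        uint32_t word;
        memcpy(&word, s, 4);   /* the source may still be unaligned */
        memcpy(d, &word, 4);
        d += 4;
        s += 4;
        count -= 4;
    }

    /* 3. Remaining 0..3 bytes. */
    while (count > 0) {
        *d++ = *s++;
        count--;
    }
}
```

Dropping step 1 gives the shape of the 'internal' move from 2): still
4 bytes at a time, but without aligning the destination first.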
Other routines also use some 32-bit instructions, but the quality of
Norbert's original code was so high to start with that improvements
are marginal. On my AMD64 all benchmarks execute faster; on the old
Cyrix some are slower. At this moment (October 2004) I do not have
access to any Intel hardware to generate more performance data.
As for the sources included in the various archives in the main file,
ARISOURC.ZIP & STRSOURC.ZIP contain Norbert's arithmetic and string
handling sources, with my tweaks. RAHPSRCE.ZIP contains all source that
no longer contains any Borland copyrightable code and the full sources
for the hardware and software emulators used in the libraries. Be aware
that this code has been significantly changed from the code extracted
from the original EM86/7 .OBJ files supplied by Borland.
RAHPSRCE.ZIP also contains, in \RTL\BIN\TPU & \RTL\BIN\TPP, versions of
all units compiled with full debugging information and versions of
System.TPU/TPP that support the software emulator.
The units included are:
- nnnnD.TPU/TPP : Unit nnnn compiled with full debugging support
- SystemE.TPU/TPP : System units compiled with SW emulator support
- SystemED.TPU/TPP: System units compiled with SW emulator and full
debugging support
To use any of these alternative units, rename them and use TPUMOVER to
add them to the appropriate TPL.
Last but not least, RAHPSRCE also contains the two makefiles I use
to generate the whole mess...
Robert AH Prins
October 2004