Update March 2022: The PLX9054 is no longer in production at PLX/Broadcom. An alternative chip with a PCI-to-local-bus interface is difficult to locate, as Texas Instruments and Maxim seem to focus on PCIe these days.
PCI is dated on desktops, but still fairly active in embedded systems. A similar bus situation can be expected on PCIe,
where the difference between DMA and non-DMA is probably even larger.
The PLX9054 (the linked page is no longer available, March 2022) is a configurable
interface chip used on PCI cards, which interfaces the PCI bus to a local bus on the card. The local bus can be 8, 16 or 32 bit wide,
with separate or multiplexed address and data bus lines.
The PLX9054 handles bus arbitration and timing on both the PCI and the local bus side, optionally converts endianness, and offers a
read-ahead FIFO, IRQ generation, mailboxes and DMA control.
This text summarizes a timing comparison between DMA and non-DMA mode. For Linux kernel DMA setup, consult
Documentation/DMA-API-HOWTO.txt and Documentation/PCI/pci.txt in your Linux kernel source.
The following diagrams show a read of 4096 bytes, as 32bit words, from a 32bit local bus to kernel memory. Non-DMA reads use readl()
in a for-loop, with IRQs not disabled. Two configs of the PLX9054 were used for non-DMA access: the first disables look-ahead FIFO reading
from the local bus and disallows BURST timing, the second enables both. On this board, the local bus uses multiplexed address/data signals
on the 32bit lines (J-mode in PLX9054 parlance).
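The non-DMA access pattern can be sketched in plain C. This is a minimal user-space illustration, not the driver code used for the measurements: mock_readl() and pio_read_words() are hypothetical names, and mock_readl() only stands in for the kernel's readl(), which operates on an ioremap'ed address inside a driver. The point is that every loop iteration issues one separate 32bit read on the bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's readl(): one 32bit
 * read through a volatile pointer, so the compiler cannot merge or drop it. */
static uint32_t mock_readl(const volatile uint32_t *addr)
{
    return *addr;
}

/* Copy nwords 32bit words from a (simulated) memory-mapped window to RAM,
 * one read per loop iteration, as in the non-DMA measurements. */
static void pio_read_words(uint32_t *dst, const volatile uint32_t *window,
                           size_t nwords)
{
    assert(dst != NULL && window != NULL);
    for (size_t i = 0; i < nwords; i++)
        dst[i] = mock_readl(&window[i]);
}
```

For the 4096 byte transfer described here, nwords would be 1024. In DMA mode this CPU loop disappears and the PLX9054 moves the data itself.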
Scope screen-dumps show signals on the local bus, top-to-bottom: LCLK (clock on local bus, 50MHz), ADS (address-strobe, beginning of read
cycle, address put on bus), BLAST (end-of-cycle) and READY (signal to PLX when data ready), all three active-low.
In BURST mode, each rising edge of LCLK can read a data word at a consecutive address, following the initial setting of the start address.
units: 1MB = 1024*1024 bytes
| | no read-ahead, no DMA | with read-ahead, no DMA | with read-ahead, DMA |
|---|---|---|---|
| overall timing of 1k x 32bit words, 100us/div | 680us, 5.7 MB/s | 420us, 9.3 MB/s | 33us, 118 MB/s |
| detailed timing, 400ns/div | (scope screen-dump) | (scope screen-dump) | (scope screen-dump) |
Top row, left-to-right:
No read-ahead, no DMA: Reading 4096 bytes as 1024 32bit words into RAM takes around 680us, or around 5.7 MB/s. The gap in the trace is an
unrelated interrupt that happened during the read.
With read-ahead, no DMA: The same 4096 bytes are read in around 420us, or around 9.3 MB/s. Again, an interrupt happened during the read.
With read-ahead and DMA: The same 4096 bytes are read in around 33us, or around 118 MB/s, roughly 20 times faster than the first config.
For comparison, the nominal bus capacity of a 33MHz 32bit PCI bus is 133 MB/s counted in units of 10^6 bytes, which is about 127 MB/s in
the units used here.
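The throughput figures above follow from simple arithmetic, which the following sketch reproduces. The function names are illustrative and not taken from any driver; the unit is 1 MB = 1024*1024 bytes, as stated earlier in this text.

```c
#include <assert.h>
#include <stddef.h>

/* 1 MB = 1024*1024 bytes, the unit used in this text. */
#define BYTES_PER_MB (1024.0 * 1024.0)

/* Throughput in MB/s for a transfer of `bytes` taking `microseconds`. */
static double throughput_mb_per_s(size_t bytes, double microseconds)
{
    assert(microseconds > 0.0);
    return ((double)bytes / BYTES_PER_MB) / (microseconds * 1e-6);
}

/* Nominal peak of a bus moving one full-width word per clock, in MB/s. */
static double bus_peak_mb_per_s(double clock_hz, unsigned bus_bits)
{
    return clock_hz * (bus_bits / 8.0) / BYTES_PER_MB;
}
```

Calling throughput_mb_per_s(4096, 680.0), throughput_mb_per_s(4096, 420.0) and throughput_mb_per_s(4096, 33.0) gives roughly 5.7, 9.3 and 118 MB/s. bus_peak_mb_per_s(33.33e6, 32) gives about 127 MB/s for PCI, and bus_peak_mb_per_s(50e6, 32) about 191 MB/s for a 50MHz local bus in BURST mode, so the DMA transfer comes close to the PCI limit while the non-DMA reads stay far below it.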
Second row, left-to-right:
No read-ahead, no DMA: Each ADS strobe is followed quickly by BLAST: the local bus is active for one 32bit word only. In between, the
local bus and the PCI bus are inactive for long intervals.
With read-ahead, no DMA: The initial read from PCI triggers a BURST read on the local bus (the PLX9054 pre-fetches multiple words from
the local bus, hence the longer time between ADS and BLAST). Subsequent reads from PCI result in one-word reads on the local bus.
Intervals between reads are shorter, since each PCI read except the first is immediately fed from the FIFO inside the PLX9054, while
the next word is read from the local bus in parallel.
With read-ahead and DMA: At the start of DMA, a BURST read on the local bus fills the FIFO, followed by shorter BURST periods (same timing
on PCI and local bus as in the previous config). Dead intervals between reads are much shorter than in the non-DMA case.
Peter Apian-Bennewitz, info[AT]pab-opto.de,
text and images are under the GNU_Free_Documentation_License,
reference to this text appreciated.