Monday, September 24, 2012

Routing Differential Signals

Differential signalling uses a pair of complementary signals to carry data: each wire carries the opposite of the other, and the receiver subtracts the two, so any noise coupled equally onto both wires cancels out while the wanted signal remains. This technique is used with high-speed signals to reduce crosstalk and EMI; examples of differential signalling include the twisted pairs inside Ethernet cables, HDMI and USB.
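To make the subtraction idea concrete, here's a toy C sketch (nothing to do with the PCB itself, just an illustration with made-up voltage values) of why noise coupled equally onto both wires disappears at the receiver:

#include <stdio.h>

int main(void)
{
    double signal = 0.5;  /* transmitted level */
    double noise  = 0.2;  /* interference coupled onto BOTH wires equally */

    double wire_p =  signal + noise;  /* the _P wire */
    double wire_n = -signal + noise;  /* the _N wire */

    /* The receiver subtracts the pair: the common-mode noise cancels
     * and only (twice) the wanted signal remains. */
    printf("received = %.2f\n", wire_p - wire_n);  /* prints 1.00 */
    return 0;
}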
When routing differential signals, both traces of a pair should have exactly the same length: because the signals are subtracted from each other at the receiver, they need to arrive at the same time, otherwise the scheme breaks down. The same requirement applies to other high-speed, non-differential signals, such as the signals of this DRAM; note the wiggly tracks on the PCB, which are used to make the tracks equal in length:

Eagle 6 has some facilities to help with differential routing; if you use an older version you can still do it manually. For Eagle 6, first give both signals the same name, one ending with _N and the other with _P, for example USB_N and USB_P. Now when you start routing either one of the signals, they will both be routed together, as you can see:

If you can't route both signals together, because there isn't enough space, one will end up shorter than the other, and that's when you will need the wiggly traces, or more formally the Meander function. After you finish routing the signals as usual, run the meander command with the length you wish to add to the shorter trace, then click on the trace and move the mouse; it will start adding wiggles to the trace:

You can also use the length.ulp script to find out the length difference between the tracks before using the meander function:
run length USB*

Saturday, September 1, 2012

STM32F4 Discovery Quick Start Guide

I finally got my hands on an STM32F4 Discovery board. These are super cheap STM32F407 ARM Cortex-M4 boards from ST. The Cortex-M4 is perfect for DSP applications as it has SIMD instructions and an FPU, and it runs at 168 MHz. As for the board, it seems okay given its price; I really hate the headers, though, and it only has a few components, namely an audio DAC, a MEMS microphone, an accelerometer and a few LEDs.


There's also an ST-LINK debugger on board, which means you don't need anything else to program and debug the chip. This is a quick tutorial on how to set up a toolchain and the necessary software.

Setting up a toolchain
The first thing you need to do is build or install a toolchain. I couldn't get myself to use any of the toolchains/IDEs that work with the board, as they are all proprietary and they all have some sort of limitation in their free editions.

There's always the CodeSourcery toolchain; it works fine, but it was not built with hard-float enabled, only soft and softfp, which means you have a choice between slow FP and really slow FP emulation. I also tried the summon-arm-toolchain, but it seems to have the same problem: no hard-float...

I then accidentally stumbled upon a GCC toolchain that seems to be maintained by ARM and supports soft, softfp and hard-float. I think I will be using this toolchain for every ARM Cortex chip I have from now on. Anyway, now that you have a working toolchain you can start compiling programs, but you still need drivers.

Building the driver library
ST provides a package that includes drivers for all the peripherals, a DSP library, API documentation and lots of examples. If you look at the examples, you will see that you have to copy all the *.c files you need into your project directory, along with the linker script, system initialization and startup code. Personally, I like to build a monolithic library and just link against that, so you can skip this step if you want:

First, download and extract the STM32F4 DSP and standard peripherals library, then copy the startup and initialization code into the src directory:
$ cp Libraries/CMSIS/Device/ST/STM32F4xx/Source/Templates/system_stm32f4xx.c Libraries/STM32F4xx_StdPeriph_Driver/src/ 
$ cp Libraries/CMSIS/Device/ST/STM32F4xx/Source/Templates/TrueSTUDIO/startup_stm32f4xx.s Libraries/STM32F4xx_StdPeriph_Driver/src/
Before you build the library, note that most of the drivers use a macro called assert_param. This macro is normally defined in stm32f4xx_conf.h; when you build a project you can enable/disable assertions there, but since we're compiling the drivers into a library the macro has to be visible at compile time, otherwise the compiler will just assume it's an external symbol and you'll get undefined references later. So you need to add this macro to the stm32f4xx header:
#Libraries/CMSIS/Device/ST/STM32F4xx/Include/stm32f4xx.h
#ifdef  USE_FULL_ASSERT
  #define assert_param(expr) ((expr) ? (void)0 : assert_failed((uint8_t *)__FILE__, __LINE__))
  void assert_failed(uint8_t* file, uint32_t line);
#else
  #define assert_param(expr) ((void)0)
#endif /* USE_FULL_ASSERT */
If you #define USE_FULL_ASSERT, assertions are enabled; alternatively, you can compile two libraries, one for normal use and one for debugging.
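Note that with USE_FULL_ASSERT defined you also have to provide assert_failed() somewhere in your project; the ST example templates ship a stub along the lines of this minimal sketch:

#include <stdint.h>

/* Called whenever an assert_param() check fails; report or hang here. */
void assert_failed(uint8_t *file, uint32_t line)
{
    (void)file;   /* you could print these over a UART instead */
    (void)line;
    while (1) {
    }
}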

Finally, copy this Makefile into the src directory and compile the library. I'm assuming here that you have the new toolchain in your PATH:
#Libraries/STM32F4xx_StdPeriph_Driver/src/Makefile
LIB     =  libcm4.a 
SRCS    = $(wildcard *.c) 
LIB_OBJS= $(SRCS:.c=.o)
AFLAGS  = -mcpu=cortex-m4 -mthumb -mthumb-interwork -mlittle-endian -mfloat-abi=hard -mfpu=fpv4-sp-d16 
CFLAGS  = -mcpu=cortex-m4 -mthumb -mthumb-interwork -mlittle-endian -mfloat-abi=hard -mfpu=fpv4-sp-d16 -O2 \
-I../inc -I../../CMSIS/Include/ -I../../CMSIS/Device/ST/STM32F4xx/Include/

CC = arm-none-eabi-gcc
AS = arm-none-eabi-as
LD = arm-none-eabi-ld
AR = arm-none-eabi-ar

all:: $(LIB)

$(LIB): $(LIB_OBJS)
    $(AR) -r $(LIB) $(LIB_OBJS)
    echo $(LIB_OBJS)

clean:
    $(RM) *.o *.a

.c.o :
    $(CC) $(CFLAGS) -c $<

.s.o :
    $(AS) $(AFLAGS) $< -o $@
When it's done, copy all the headers (including the CMSIS headers) and the library somewhere like /opt or /usr/local. There's just one last piece of the puzzle missing: the linker script. You can find one among the many templates included with the library, or use the one here, which also comes with a Makefile template.

Blinky... Well not exactly!
To keep this simple, this example just lights up one of the LEDs on the board:

#include "stm32f4xx_rcc.h"
#include "stm32f4xx_gpio.h"
GPIO_InitTypeDef  GPIO_InitStructure;
int main(void)
{
    /* GPIOD peripheral clock enable */
    RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_GPIOD, ENABLE);

    /* Configure PD12 in output mode */
    GPIO_InitStructure.GPIO_Pin   = GPIO_Pin_12;
    GPIO_InitStructure.GPIO_Mode  = GPIO_Mode_OUT;
    GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
    GPIO_InitStructure.GPIO_Speed = GPIO_Speed_2MHz;
    GPIO_InitStructure.GPIO_PuPd  = GPIO_PuPd_NOPULL;
    GPIO_Init(GPIOD, &GPIO_InitStructure);

    /* Set PD12 high */
    GPIO_SetBits(GPIOD, GPIO_Pin_12);

    /* Do nothing */
    while (1) {
    }
}
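If you want an actual blink rather than a steady LED, one option is to toggle the pin in the loop with a crude busy-wait delay; a quick sketch (the delay count is arbitrary):

    /* Replace the empty loop above with something like this */
    while (1) {
        GPIO_ToggleBits(GPIOD, GPIO_Pin_12);
        for (volatile uint32_t i = 0; i < 1000000; i++) {
            /* crude busy-wait delay */
        }
    }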
Programming and debugging
There's no "official" stlink utility for Linux, so I used this one instead:
$ git clone https://github.com/texane/stlink.git
$ cd stlink
$ ./autogen.sh
$ ./configure
$ make
This will build two binaries: st-util, a GDB server, and st-flash, which is supposed to let you write the binary to flash but doesn't really work! Anyway, the project includes a udev rules file; you should use that instead of running sudo each time:
$ cp 49-stlinkv2.rules /etc/udev/rules.d/
$ udevadm control --reload-rules

Now just run st-util and in another terminal run gdb:
$ arm-none-eabi-gdb blinky.elf
(gdb) target extended-remote :4242
(gdb) load
(gdb) continue

Notes
Take care not to remap the pins used by the ST-LINK debugger; if you do, the debugger won't be able to talk to the chip and you could actually brick the board :) If you do find yourself in this situation, pull the BOOT0 pin high. There's a VDD pin right next to it, so just place a jumper across those pins; the board will then run the bootloader instead of your code and you can erase the entire flash.

The library is configured for a 25 MHz external oscillator, while the Discovery board has an 8 MHz one, so you will need to change two values: set HSE_VALUE in stm32f4xx.h to (8000000) and PLL_M in system_stm32f4xx.c to (8).
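In other words, something along these lines (the values are the ones mentioned above; the rest of those two files stays as it is):

/* Libraries/CMSIS/Device/ST/STM32F4xx/Include/stm32f4xx.h */
#define HSE_VALUE    ((uint32_t)8000000) /* 8 MHz crystal on the Discovery */

/* Libraries/CMSIS/Device/ST/STM32F4xx/Source/Templates/system_stm32f4xx.c */
#define PLL_M      8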

Wednesday, July 11, 2012

Portable SDR with Olinuxino for Tracking Wildlife

I was working on a project for a biologist friend of mine that involved tracking RF tag implants on wildlife using an SDR DVB-T dongle. He wanted a portable solution to be placed near the sprainting sites of otters (Lutra lutra), the main subject of the study, where they regularly come to deposit scent marks.

An RF tag is implanted in the animal; it sends pulses on a frequency unique to each tag, so we're able to tell the animals apart. This is what a cheap RF tag looks like; this one has a frequency of 148.354 MHz.
Software-defined radio (SDR) became a bit more accessible recently, when a kernel hacker discovered that DVB-T dongles based on the RTL2832U chip can be used as a cheap SDR, since the chip allows transferring the raw samples to the host. Since the project was on a tight budget, that seemed like a perfect solution...

The osmocom project provides a user-space driver for the dongle. I started with pyrtlsdr, a Python binding for librtlsdr, and pylab for signal processing of the samples collected by the SDR, to compute and plot the PSD; the last spike corresponds to the signal of the RF tag above:

Finally, I ported everything to an Olinuxino ARM-based board running Linux. A script runs on boot and logs the detected frequencies to a file on the SD card for later processing. The board hasn't been tested in the field yet; I'm still working out some issues with the power management.

Thursday, May 3, 2012

MIPS1 ISA Implementation with Verilog

uMIPS is a 32-bit pipelined processor implementing the MIPS1 ISA, written in synthesizable Verilog and tested on a DE0 Cyclone III FPGA board. A great deal of the ISA is implemented, enough to run small C programs compiled with the GCC toolchain.


The design is based on the classic five-stage pipeline, with fetch, decode, execute, memory and write-back stages, each represented by an architectural state register; it's worth mentioning that saving and restoring those registers is enough for a context switch, if one is needed. Separate data and instruction memories are used, each initialized from a memory hex file generated by the toolchain.

The design includes a hazard unit that resolves control and data hazards by forwarding or stalling the pipeline whenever necessary. Some registers are memory-mapped to high addresses to control the LED strip on the board and the 2x16 LCD screen.

Testing
I tested the processor successfully by running a program that computes the factorial of a number, and by running the LCD driver.



Todo
  • Dynamic branch prediction.
  • Load programs from Flash and link to libc.
  • Exceptions/Interrupts coprocessor.
  • The instruction memory does not get inferred properly.


Sources
hg clone https://code.google.com/p/umips/ 


References
David Harris, Sarah Harris. "Digital Design and Computer Architecture".
David A. Patterson, John L. Hennessy. "Computer Organization and Design", 4th Edition.


Wednesday, January 25, 2012

nRFCam

The nRFCam is an open-source 2.4 GHz wireless camera built using the TCM8230MD camera, an ARM Cortex-M3 LPC176x MCU and a Nordic nRF24L01+ chip for the radio link.

Overview
On reset, the Nordic chip is initialized and a PWM channel is configured to generate the clock signal for EXTCLK; fast GPIO is used for the data bus and the other sync signals. Once the camera is configured over the I2C interface, it starts sending frames over the 8-bit-wide data bus as usual, synchronized by the HSYNC, VSYNC and DCLK signals (for more details on this see the TCM8230MD-Breakout post).


The VSYNC interrupt handler is written entirely in assembly; unfortunately this makes it harder to port, but it's much better optimized. Each scanline is read in a loop, converted from RGB565 to grayscale and then stored in the frame buffer.

Once a complete frame has been read, it is compressed using lzf and then sent out over SPI to the Nordic radio chip. A simple header precedes the frame data, consisting of a start-of-frame marker (0xAA), used to synchronise the frames, followed by the length of the frame and then the frame data itself.
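For reference, a possible C layout for that header might look like the following sketch (the actual field sizes in the nRFCam source may differ):

#include <stdint.h>

struct frame_header {
    uint8_t  sof;     /* start-of-frame marker, 0xAA */
    uint16_t length;  /* length of the compressed frame in bytes */
    /* compressed frame data follows immediately after the header */
} __attribute__((packed));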

Timing
The datasheet does not mention how the data bus clock (DCLK) frequency relates to the external clock (EXTCLK) frequency; however, the maximum DCLK frequency can be inferred from the timing characteristics and from the following diagram. Given that the minimum setup time TSU and hold time THD are 10 ns each, if the camera is running at the maximum external clock (EXTCLK) frequency of 25 MHz, then the maximum DCLK frequency is 1/(10 ns + 10 ns) = 50 MHz.
If you sample the data at the rising edge of DCLK, that leaves only 10 ns to read DOUT; if the MCU is running at 100 MHz that's one clock cycle to read DOUT, which is not possible. However, since DCLK is a function of EXTCLK, lowering EXTCLK lowers DCLK and increases the time window we have to read the data. The nRFCam generates a 6 MHz EXTCLK for the camera, which gives us a few cycles to read the data.

Compression
The Nordic chip has an on-air data rate of 2 Mbps, or 250 KB/s (ignoring any protocol overhead), which means that even with the camera set to output the smallest possible frame, 24 KB (128x96x2), the most we can send is about 10 FPS. In other words, there's no point in trying to keep up with a higher frame rate when all you can send is 10 FPS.

The frame is converted from RGB565 to grayscale, reducing the frame size by half (12 KB). Since we don't have enough time between DCLK edges, the whole scanline is read first and then converted before the next scanline starts (before the next HSYNC edge).
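A plain C version of that per-scanline conversion could look like this (the actual nRFCam code does it in assembly; the buffer names, width parameter and luma weights here are just illustrative):

#include <stdint.h>

void scanline_to_gray(const uint16_t *rgb565, uint8_t *gray, int width)
{
    for (int i = 0; i < width; i++) {
        uint16_t p = rgb565[i];
        /* expand the 5/6/5-bit fields to 8 bits each */
        uint8_t r = (uint8_t)((p >> 11) << 3);
        uint8_t g = (uint8_t)(((p >> 5) & 0x3F) << 2);
        uint8_t b = (uint8_t)((p & 0x1F) << 3);
        /* integer approximation of luma: 0.30R + 0.59G + 0.11B */
        gray[i] = (uint8_t)((77 * r + 150 * g + 29 * b) >> 8);
    }
}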

After the whole frame has been read, it is compressed using the lzf compression algorithm. There's no particular reason for selecting lzf other than that it was the easiest to port and configure for a low memory footprint; maybe there's an algorithm that can better exploit the data. In any case, at this point the frame size is reduced to 4-6 KB (depending on the frame).

On the other side, the frame can be received by any compatible Nordic chip and decompressed, and the grayscale level of each pixel is repeated in the RGB components of the new pixel. Note that since the green component has 6 bits in RGB565, you will need to multiply the grayscale value by two, or left-shift it by one, for the green component.
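As a sketch of that expansion on the receiving side (assuming an 8-bit grayscale value coming out of the decompressor):

#include <stdint.h>

/* Repeat one grayscale level into the R, G and B fields of RGB565;
 * the 6-bit green field gets one extra bit of the value. */
static inline uint16_t gray_to_rgb565(uint8_t gray)
{
    uint16_t c5 = gray >> 3;  /* 5-bit value for red and blue */
    uint16_t c6 = gray >> 2;  /* 6-bit value for green        */
    return (uint16_t)((c5 << 11) | (c6 << 5) | c5);
}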

The Prototype
I haven't tested the new board yet, so it's not in the repository. There is, however, a new breakout, the one used in the prototype. The new breakout is basically the same as the old one, except that this one is meant to be connected as a module rather than plugged into a breadboard, and I've removed the crystal oscillator and instead exposed the EXTCLK pin so it can be driven by a PWM. This is the prototype in action:
Eagle files and source code
hg clone https://code.google.com/p/nrfcam/