1. Prologue: What's an ‘Oric’, Anyway?
People keep reading this article for some reason.
It's been 35 years since it was released and Oric users were never many, so this article really needs an explanation. The Oric-1, Oric Atmos, Oric IQ-164 or Stratos, and Telestrat were European home microcomputers (micros for short) from the 1980s. The first two came in 16K and 48K versions: that's 16,384 or 49,152 bytes of RAM. Bytes, not megabytes or gigabytes. All Oric machines used 6502 microprocessors, the same family as the Apple II and Commodore 64. The Orics run their 6502s at a staggering 1 MHz: that's 0.001 GHz. Most people used a TV as an output device, and a tape recorder for storage. People who could afford it bought a 3" (not 3½") floppy drive, which also expanded the 48K models' memory to 64K. The extra 16K was used by the disk OS.
Like many relics of the Eighties, the Oric went through a tender, nostalgic revival in the Nineties. That called for more modern software, but also more modern ways of writing it. Luckily, there were already C cross compilers that targeted the 6502 microprocessor. Someone (I forget who) compiled one that allowed you to use a PC to compile C programs that ran on a real or emulated Oric as tape or disk images. A number of us spent long nights coding various rudimentary C libraries using such necessary monstrosities as self-modifying code.
Eventually, I was asked to write a C tutorial for people starting out with the language. People were already very familiar with the limitations of the Oric architecture, and most of them could program in the BASIC interpreter the machine booted to. Those two facts set the target audience. The tutorial was serialised in Club Europe Oric magazine over several months.
What this is meant to convey is that this is an old article: 26 years old and counting. It covers ANSI C from 1989 and was published years before C99 (1999) or C11 (2011). Follow it and you'll be learning a version of ANSI C that, while compilers still happily grok it, is far from modern. Also, you'll be learning for an architecture you're almost certainly not going to target with your code! Oh, and the standard libraries are drastically different from what you might expect.
You are, of course, welcome to read through this. And if you're an Oric user from the days of yore, very welcome to reminisce with me.
— Alexios, March 2018.
2. Introduction
This series of articles is a C language tutorial, focused on using C efficiently on Oric computers. The language combines a set of features which make it an excellent choice for most programming tasks, from system programming to Artificial Intelligence. Unfortunately, it was not designed with small eight-bit systems in mind. As a result, the Oric implementation of C is a cut-down version.
Although this is supposed to be programming on the Oric, I will try to provide a generic C tutorial, stressing differences between ANSI C and the compiler we use on the Oric. I might fail utterly, in which case Kernighan and Ritchie will appear to me in a vision and damn me to eternal debugging, but it will be fun anyway.
So, the most important question: why use C? Why not stay with Assembly, Forth or BASIC? There are many answers to the Question. C is a very compact language: it comprises a minimalist core of fewer than fifteen statements. Everything else (including I/O, maths, and many data types) is in external libraries, which can be linked to the program at will. This allows for very small programs and extreme flexibility. Also, C is a high-level language but, depending on the style and knowledge of the programmer, it may work with all the power of Assembly, or with all the structure and readability of Pascal. This makes it a good choice for writing fast Oric programs easily and quickly. And of course, a language which provides such ample opportunities for puns and jokes has to be good, right?
These articles will assume you are using a 48K Oric-1 or Atmos as your testing platform. Since there is no native Oric C compiler, we use a cross-compiler (i.e. the compiler runs on one platform, and produces code which will run on another). The compiler is Christopher Fraser and David R. Hanson's retargetable ANSI C compiler, lcc. The Oric version, lcc65, runs on IBM PCs and compatibles (DOS or Linux based). You can run your programs either on a real Oric, or on Fabrice Frances' Euphoric emulator.
3. Using the Compiler
Let's start with some hands-on explanations of how to use the compiler. Type in the following archetypal program using your favourite PC text editor.
```c
#include "stdlib.h"

/* The famous Hello World program! */
void main()
{
    printf("Hello world!\n");
}
```
Please take care with capital and small letters: C is a case-sensitive language. Save it as `hello.c`. Exit the editor, and enter the following:
```
cc65 hello
```
After a while, the DOS or Linux prompt will be displayed. Your program is now compiled. Yes, that was all. If you get any error messages, you probably made a typo. Check the file and try again.
Now, let us run the program. A successful compilation will yield a `.out` file (`hello.out` in this case). This is a ‘tape’ file. Run Euphoric and load it.
So, there you have it: your first C program on the Oric. Now, let us see why it does what it does.
To understand that, we need an important piece of information: the C compiler is not a single program, but a pipeline of different programs:
- The Preprocessor is responsible for preparing the source code for compilation. It handles all lines beginning with a hash (`#`). It outputs the source code it inputs, with a few changes we'll discuss soon.
- The Compiler reads the source code and translates it into Assembly Language (I make it sound so easy).
- The Assembler translates the Assembly source code into machine language.
- Finally, the Linker reads one or more machine language files (object files), as well as some libraries, and creates an executable program.
In the case of the Oric compiler, the linker (by Vaggelis Blathras) runs between the compiler and assembler, and there is also an extra program which translates the final machine code object file into an Oric ‘tape’ file.
All this might sound a bit too technical (and definitely useless). However, each of these stages has its own command set. In the Hello World program, for example, the first line is handled by the preprocessor.
The `#include "stdlib.h"` directive instructs the preprocessor to literally include the file `stdlib.h` at the point in `hello.c` where the `#include` command was seen. The preprocessor actually processes and outputs the contents of the specified file, and then goes on with the rest of the original file (of course, `#include` directives can be nested).
The next line is a comment. Anything between `/*` and `*/` is considered a comment. Comments may be located anywhere in a line, and they may span several lines. They can't be nested, though: the first `*/` ends the comment.
Next, we define a function. A C program is split into functions, which call one another. Even the main program is a function, called `main`. This function is automatically called when the program starts. `void` means the function returns nothing (this makes C functions differ from the mathematical concept of a function). Modern standard C insists that `main` returns an `int`, but the Oric compiler is happy with `void`. `main` is the name of the function. `()` means that the function accepts no parameters. Note that you have to include the parentheses even if the function accepts no parameters. The parentheses are C's way of knowing that we are calling or declaring a function (something like the `$` in BASIC string variable names).
The curly brackets (`{` and `}`) denote the beginning and end of the body of the function. `printf()` is a function (as you might guess by the `()`). It prints the string passed to it as a parameter (note that C strings are enclosed in double quotes: `"Hello world!\n"`). `printf()` is obviously not defined in our little program. It is not defined by C itself, either (remember, C has no built-in functions). It's declared inside `stdlib.h`[1]. This is why we need to `#include "stdlib.h"`. (Standard ANSI C compilers declare `printf()` in `stdio.h` instead.) By the way, the `\n` at the end of the string means ‘start a new line’. Standard C calls it the newline character; compilers on ASCII machines normally encode it as LF (Line Feed, ASCII 10, CTRL-J).
Another interesting point is the final semicolon (`;`). In C, every declaration and simple statement ends with this symbol. Try not to forget it, or the compiler will stop with all sorts of weird error messages. Since semicolons delimit declarations and statements, white space (spaces, TABs and RETURNs) is not important to the compiler. As long as you use semicolons, you might as well write your whole program on a single line (I wouldn't recommend it: it makes debugging a nightmare). The one exception is preprocessor directives such as `#include`, which must each sit on a line of their own.
4. Simple Data Types and Variables
Like most high-level languages, C has its own set of data types. Only simple data types are built into C: all other data types are derived from the simple ones, either in your own program or in (guess what) libraries. Here is a table of simple data types and their widths in bits.
Type | Description | Bits |
---|---|---|
char | A single byte | 8 |
int | A signed integer | 16/32 |
float | Floating point number | 32 |
double | Double precision float | 64 |
Strange as this may sound, this is all. To make things easier, there are modifiers which change the width and format of the four data types. The modifiers are `short`, `long`, `signed`, and `unsigned`. The first two change the width and range of the data types; the latter two change between signed and unsigned formats. Here is a table of all the meaningful combinations of modifiers and data types:
Modified Type | Bits | Range |
---|---|---|
char, signed char | 8 | -128 … 127 |
unsigned char | 8 | 0 … 255 |
int, signed int | 16/32 | -32768 … 32767 or -2147483648 … 2147483647 |
unsigned int | 16/32 | 0 … 65535 or 0 … 4294967295 |
short int, signed short int | 16 | -32768 … 32767 |
unsigned short int | 16 | 0 … 65535 |
long int, signed long int | 32 | -2147483648 … 2147483647 |
unsigned long int | 32 | 0 … 4294967295 |
float | 32 | ±1.2E-38 … ±3.4E+38 |
double | 64 | ±2.2E-308 … ±1.7E+308 |
long double | 64/80 | as double, or ±3.4E-4932 … ±1.1E+4932 |
As you can see, this gives a rather impressive collection. Note that there are defaults to each modifier. If you do not specify the data type, `int` is assumed (so it is more common to write `long` than `long int`). The default format is `signed` (except for plain `char`, which may be signed or unsigned, depending on the compiler) and the default size is normal (no modifier).
An important concept in C is that, although there are very concretely defined data types, you are not forced to obey them: you can store a value of 255 in a `signed char`. The compiler might warn you of possible problems, but will not generate an error. The interpretation of the bit patterns, however, depends on the variable's data type. So, when reading the `signed char` from the previous example, we will normally get not 255, but -1 (binary 11111111 is 255 as an `unsigned` value, but -1 in `signed` or two's complement form).
Another point (important to BASIC users) is that C does not create variables on the fly: you must declare them before use. The declaration is a line like one of the following:
```c
char x, y;   /* x and y are chars */
int i;       /* i is an int */
float X;     /* note that x is NOT X */
long a=15;   /* variable initialisation */
```
Variable declarations are placed at the start of a block, between an opening curly bracket (`{`) and the block's first statement, or outside function definitions, at the top level of the program. In the first case, they are local variables: accessible only within the block they were defined in (i.e. only to the statements within the curly brackets). Variables defined at the top level are global: they are available to your whole program.
Literals are actual (literal) constant values written directly in C code. The `15` in the previous code snippet and the `"Hello world!"` in the canonical ‘Hello World’ program are both literals. Like most languages, C allows us to specify literal values in different ways. Here they are:
Constant | Description |
---|---|
13 | 13 decimal |
13. | 13.0 in floating point |
13.45 | 13.45 in floating point |
0xd | 13 in hexadecimal |
015 | 13 in octal |
'A' | ASCII ‘A’ or 65 |
'\101' | ASCII ‘A’ in octal |
'\0' | ASCII 0 or NUL |
'\n' | Newline (ASCII 10 or LF on most systems) |
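You can assign any of these numeric and character literals to any of the data types described above! If you assign the character literal `'@'` to an `int`, the value you read back will be `64`. If you assign 33 to a `char` and print it as a character, you'll see `!`. If you assign `42` to a `float`, it's automatically converted to `42.0`. Try not to assign floating point values like `42.7` to an `int`: the fraction is silently thrown away (you get `42`), and if the value is too large for an `int`, bad things may happen.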
Warning
Depending on the system's architecture and how `int` and `float` are represented by the computer's CPU, there may or may not be magic at play here. There's no magic at play on the Oric, and to avoid problems, you should always add the decimal point when you assign literals to a `float`. For example, say `float x = 10.;`, not `float x = 10;`. I once spent a frustrating week trying to locate a bug caused by something like this. Decimal points are difficult to spot in print.

5. Expressions
The focus of this section is on expressions: a very important concept in the language. This is rather natural, since C handles almost everything as expressions (borrowing a little of that Lisp functionality).
Let's start with a very simple program:
```c
#include "stdlib.h"

void main()
{
    int result;
    result = 10 * 3 + 100 / 2;
    printf ("result: %d\n", result);
}
```
There are two new things here. One is the way we print numbers using `printf()`. The function interprets a set of format specifiers, all of which start with a percent sign (`%`). Each specifier is replaced by the value of a variable, passed to `printf()` as an additional parameter. You can have any number of specifiers and any number of additional parameters, as long as their numbers are equal: weird things will happen if you have more specifiers than parameters! You can print other things besides numbers, too. Here's a short table:
Specifier | Description |
---|---|
%d | Print an int |
%x | Likewise, but in hex |
%f | Print a float |
%s | Print a string |
%c | Print a single character |
Oric-Specific
On larger compilers, format specifiers are a lot more complicated. For example, you can specify padding, field widths and number formats, and some libraries even let you define your own! The Oric libraries are small, however, and do not implement a complete version of `printf()` yet. C, as usual, doesn't care what parameter you pass. It will try to print a string as an int (and fail completely), so be careful in matching specifiers and parameters!
The second and most important element of our program is an expression: we assign the result of a calculation to a variable named `result` (yes, C variable names can be fairly long: ANSI C guarantees at least 31 significant characters for names used inside a single file). If you know the first thing about programming, this will be quite obvious to you. The syntax is no different from BASIC's. The interesting part is that the expression is `result = 10 * 3 + 100 / 2`, and not `10 * 3 + 100 / 2` as the case would be in BASIC.
This happens because in C, a variable assignment (denoted by `=`) is an operator, and not a statement (as in BASIC). The `=` operator calculates the expression at its right hand side, and assigns the result to the variable at the left hand side. It then returns the value assigned. This gives C its cryptic style, because you can actually write `x = (y = 10)`. In this case, `y` will be assigned the value `10`, and `x` will be assigned the value assigned to `y`. To obfuscate things further, the parentheses are unnecessary, since assignment works from right to left and has very low precedence: `y = 10` is performed first, then `x = y`.
Getting used to this fact of life (or C, rather) is fundamental. It makes most of the difference between beginners and seasoned programmers. It also accounts for many C bugs, but that is another story. Apart from this (and some of the weird symbols used), C expressions are everyday programming language expressions. Another point: C does not have a separate Boolean type or conditions as such. Like conditions in Oric BASIC, `if` statements, loops, etc. evaluate an expression, and consider it true if it is non-zero. In BASIC, you could write:
```
A=10: IF A THEN PRINT "A IS NON-ZERO"
A IS NON-ZERO
Ready
```
Exactly the same thing goes on in C. Tests for equality and inequality are (surprise!) operators which return `1` if the test succeeds (Oric BASIC returns `-1`), and `0` if it fails.
Now, you'll probably begin to wonder: since even assignments are expressions, what happens to the result of an assignment? In (old, non-Turbo) Pascal, you can't have a result that isn't used. Well, in C you can. Once a semicolon (`;`) is met, the result of the expression is thrown away. In this sense, `result = 10 * 3 + 100 / 2` evaluates the expression, stores the outcome in the named variable, and then throws the value of the whole expression away. Strangely enough, the same thing is done with function calls: `printf()` could well return some value (the Oric library's version doesn't, really; standard C's returns the number of characters printed), but we'll never know what that is, because it's discarded. Yes, you could try `result = printf("blah")` on the Oric, but there would be no point: its `printf()` returns `void` (i.e. doesn't return anything), so you'd get a warning or error from the compiler.
Finally, here's another thing: until you get accustomed to C expressions and their little quirks (they have many), use plenty of parentheses. They won't make your program slower and they won't make it bigger. They will make clear to the compiler what you want evaluated first. Here's a quick list of the types of operators:
- Arithmetic: addition, subtraction, etc., including a couple of special operators for increasing/decreasing variables.
- Conditionals and logic: equality, inequality and Boolean operators (AND, OR, NOT). No surprises apart from some strange symbols.
- Bitwise operators: bitwise AND, OR, NOT, XOR (quite different from Boolean operators!), bitwise shifts (left and right).
- Special operators: assignment and more obscure ones.
6. Operators
Now we'll discuss the various operators you can use in C expressions. We'll start off with the easy ones, and move on to more exotic cases.
6.1. Arithmetic Operators
OP | Name | Example | Description |
---|---|---|---|
+ | Addition | x+y | Add x and y. |
- | Subtraction | x-y | Subtract y from x. |
* | Multiplication | x*y | Multiply x by y. |
/ | Division | x/y | Divide x by y (between integers, the fraction is discarded). |
% | Modulo | x%y | Remainder of x/y (integers only). |
++ | Increment | x++ or ++x | Increase variable by 1. |
-- | Decrement | x-- or --x | Decrease variable by 1. |
- | Unary negation | -x | Negate the value of x. |
Most of these are extremely straightforward and don't even deserve discussion. A couple of them need some clarification, though.
6.1.1. Modulo or Remainder
This calculates the remainder of the division of its two operands. Oric BASIC doesn't have it, but Pascal users will be familiar with it as the `mod` operator.
6.1.2. Increment and Decrement Operators
These are two very useful operators. Given a variable, they increment or decrement its contents by `1`. The `++` or `--` operator can be put before or after a variable, and it behaves differently depending on its position. When put before a variable, it changes the variable's value before that variable is used in the expression. If put after a variable, it alters its value after the variable has been used in the expression. This sounds obscure, and requires an example (or two):
Assume we have two variables, `x=10` and `y=20`. We calculate the expression `x=x+(++y)`. The contents of `y` are incremented before the variable is used, so the result will be `x=10+21`, or `x=31`. The value of `y` is `21`.
Now let's again assume `x=10` and `y=20`. In calculating `x=x+(y++)`, the contents of `y` are used before it gets incremented. Thus, `x=10+20`, or `x=30`. The value of `y` at the end of the calculation is again `21`, but the result of the expression is not the same. This double syntax comes in very handy when programming loops.
6.1.3. Unary Negation
This is none other than our very own `-` operator, as used on single operands. For example, assuming `x=10`, the result of `-x` is `-10`. This should be simple enough.
6.1.4. Conditions, Comparisons and Logic
These operators are used in conditionals, comparisons and logic operations. Most should look very familiar.
OP | Name | Example | Description |
---|---|---|---|
> | Greater than | x>y | 1 if x>y, else 0 |
>= | Greater/equal | x>=y | 1 if x>=y, else 0 |
< | Less than | x<y | 1 if x<y, else 0 |
<= | Less/equal | x<=y | 1 if x<=y, else 0 |
== | Equal to | x==y | 1 if x equals y, else 0 |
!= | Not equal to | x!=y | 1 if x and y differ, else 0 |
! | Logical NOT | !x | 1 if x is zero, else 0 |
&& | Logical AND | x&&y | 1 if both x and y non-zero, else 0 |
\|\| | Logical OR | x\|\|y | 1 if either x or y non-zero, else 0 |
As you can see, all these operations return one of two possible values: `1` to signify Boolean true, `0` for Boolean false. Oric BASIC works much the same way, except that its ‘true’ is `-1`. Although the logical and conditional operators return only `1` or `0`, they consider any non-zero value as true.
6.1.5. Greater Than, Less Than, etc.
BASIC users, beware! The ‘greater than/equal to’ and ‘less than/equal to’ operators have to be written exactly as shown: C won't recognise ‘`=>`’ as ‘greater than or equal to’, or ‘`=<`’ as ‘less than or equal to’.
6.1.6. Equality (==)
You should be very careful of this one. Do not confuse it with the assignment operator (`=`). To check two expressions for equality, only use `==`. In many cases the compiler will warn you if you confuse them, but not always.
6.1.7. Inequality (!=)
No surprises here. Just remember that `<>` does not work in C!
6.1.8. AND (&&), OR (||)
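These work just as they do in most programming languages, with one useful twist: they evaluate their left operand first, and skip the right one if the answer is already known (a `0` on the left of `&&`, or a non-zero value on the left of `||`). An important note: `&&` and `||` are Boolean operators. There is another set of operators for dealing with bit values. This means, for example, that `255 && 31` will return `1` (remember, this set of operators considers all non-zero values as `1`), while the bitwise `255 & 31` gives `31`. I cannot stress this enough: careless use of these logical operators is the source of many bugs in C programs.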
6.1.9. NOT (!)
This is a unary operator. Put it before an expression, and it will return `0` if the expression is non-zero, and `1` if the expression evaluates to zero. Just your ordinary Boolean NOT operator. Let me again caution you: `!` will not reverse bit values! It, too, only deals with zero and non-zero values. Bitwise negation is handled by another operator (`~`).
6.2. Bitwise Operators
These operators deal with the bit patterns that make up values in C (and in every other programming language). All the well-known, standard bit operations are supported.
OP | Name | Example | Description |
---|---|---|---|
~ | Complement | ~x | Toggle all bits 1↔0 |
& | Bitwise AND | x&y | Bitwise AND of x, y |
\| | Bitwise OR | x\|y | Bitwise OR of x, y |
^ | Bitwise XOR | x^y | Bitwise XOR of x, y |
<< | Left shift | x<<y | Shift x left by y bits |
>> | Right shift | x>>y | Shift x right by y bits |
6.2.1. Bitwise Complement
This operator changes all of its operand's bits from `0` to `1` and vice versa.
6.2.2. Bitwise AND, OR and XOR
These apply the respective Boolean operators to all bits in their operands. They work like their BASIC counterparts.
6.2.3. Left and Right Shifts
These operators move (shift) the bits of their first operand N places to the left or right, where N is the second operand. Bits ‘pushed off’ are lost, and ‘empty places’ are filled with `0` bits. So, shifting `00010111` two places to the right, we get `00000101`. Be careful with signed operands, though: when you right-shift a negative number, the compiler may fill the empty places with `1` bits instead, so stick to unsigned values when you're fiddling with bits. Shifting a number one place to the left doubles it; shifting a non-negative number one place to the right halves it, dropping any remainder. This is much faster than multiplication or division (the 6502 has no instructions for either), so shifts are often preferred, although most compilers, the Oric one included, take advantage of this fact themselves to speed up programs.
6.3. The Strange Ones
This section discusses C's arsenal of weird and wonderful operators, many of which aren't even readily recognisable as such!
OP | Name | Example | Description |
---|---|---|---|
= | Assignment | x=y | Put value of y into x. |
X= | Compound Assignment | x*=y | See description below. |
[] | Array element | x[n] | Access element n of array x. |
. | Member Selection | s.x | Member named x in structure s. |
-> | Member Selection | p->x | Member named x in structure p points to. |
* | Indirection | *p | Contents of location whose address is in p. |
& | Address of | &x | Address of symbol x. |
sizeof | Size of | sizeof(x) | Size of x in bytes. |
() | Function call | foo(99) | Call function foo with argument 99. |
(type) | Type Cast | (int)x | Convert x to an int. |
?: | Conditional | x1?x2:x3 | See description below. |
, | Sequential Evaluation | x++,y++ | See description below. |
6.3.1. Assignment and Compound Assignment
We have already discussed the assignment operator: it evaluates its right hand side, stores the result in the left hand side variable, and returns that result. Compound assignments are an extension of plain vanilla assignments. The `X` should be replaced by any of these operators: `+ - * / % << >> & ^ |`. So, the expression `x+=100` is equivalent to `x=x+100`. I agree it's strange, but compound assignment is a very handy trick, and one that is very commonly used in C programs.
6.3.2. Array Element Operator
This is a simple one. Assume you have an array `a`. By writing `a[n]`, you refer to element number `n` of the array. `a[0]` is always the first element of the array. Note that the brackets are square, like in Pascal. Sorry, C arrays have to start with element `0`. The reason will be made plain later on, when we discuss arrays (if I haven't been assassinated by BASIC or Pascal activists by then; will it help if I say I really adore Pascal?).
6.3.3. Member Selection Operators
BASIC users will be generally unfamiliar with C structures; Pascal programmers may identify them with Pascal records. In short, structures are how you define custom data types in C. Worry not, structures will be discussed in the near future as well! Say we define a data type to represent complex numbers, so that it contains two `float`s, called `real` and `imag` (for ‘real’ and ‘imaginary’). Say we declare a variable `x` of this data type. To refer to the real part of `x`, we write `x.real`. Pascal people, you should be at home here! Now comes the strange stuff. Before you panic: I will explain pointers in the near future, too. Say we have the address of `x` stored in a pointer `p`. We'd refer to the imaginary part of the complex number whose address is stored in `p` using `(*p).imag` (the parentheses are needed because `.` binds more tightly than `*`). We can write exactly the same thing as `p->imag`. This saves some typing, but otherwise the two forms are equivalent. Again, don't panic! All will be explained soon!
6.3.4. Indirections and Addresses
A warning before I describe these: they are unary operators, and not to be confused with the binary operators `*` (multiplication) and `&` (bitwise AND). Things are really simple: if we need the address of a variable `x`, we write `&x`. To refer to the contents of the memory address stored in `y`, we write `*y`.
6.3.5. The sizeof Operator
It may look like a function, but it's not (notice I don't write it as ‘`sizeof()`’). This operator accepts a data type or a variable, and returns its size in memory, in bytes. `sizeof(int)` returns `2` on 16-bit compilers and `4` on most 32-bit ones. `sizeof(x)` returns the same value, assuming `x` is an `int`; for variables (but not for type names) the parentheses are optional, so `sizeof x` works too. This is a really useful operator in C. It allows you to deal with data types whose size you don't know, or whose size changes between platforms (modularity and portability are big words in C).
6.3.6. The Function Call Operator
Simple, really: this operator calls the function at its left, and passes it the arguments within the parentheses. It needs to be an operator because the name of a function by itself has a different meaning (the address of the function in memory, actually). This is why you need to append `()` to a function, even when it doesn't take any arguments. The parentheses ask the compiler to execute the function instead of dealing with its address. This address thing is an advanced topic, and you might as well forget about it if its purpose is not entirely clear to you.
6.3.7. Type Casts
Another straightforward operator, although with a little complication of its own (do you C why C is so strange? You have side effects [and puns, for that matter] everywhere). Let `x` be a `float`, and `y` an `int`. If you divide the two, ANSI C is supposed to convert `y` to a `float` behind your back, but relying on implicit conversions is the root of many an evil in C (divide two `int`s, for instance, and the fraction silently disappears). The best thing when doing arithmetic is to use the same type of operands. However, this is impossible most of the time. Here's the solution. We can divide `x` by `y` by writing `x/(float)y`. This converts `y` to a `float` and performs the division. You can apply it to anything you like: `y+(signed int)foo(99)` works as well. Here's the catch: type casts only convert between simple C data types, that is, types the C compiler knows about before reading your program. Type casting to anything else (a pointer type, for instance) just makes the compiler think the operand has been converted. This is used all the time to get rid of spurious compiler warnings of the ‘expected `char`, found `int`’ type.
6.3.8. The Conditional or Ternary Operator (?:)
C is one of very few languages with a hard-wired ternary operator (one that takes three operands), and most of the others inherited it from C. The `?:` operator is among the most powerful C constructs. It is a sort of `if` statement for expressions, very similar to the LISP `if` special form. It's written like this:
```c
cond ? if_true : if_false
```
If the expression `cond` evaluates to `0`, the expression `if_false` is evaluated and returned. Otherwise, the expression `if_true` is evaluated and returned. Only one of the two is ever evaluated. This allows us to write something as cryptic as this:
```c
x = (x >= 10) ? 10 : (x + 1);
```
This checks if `x>=10`; if it is, it returns `10` (which is assigned to `x`). Otherwise, it returns `x+1`, which is assigned to `x`.
6.3.9. Sequential Evaluation
Right. This one is easy and straightforward. Assume you have the expression `x=(y++,z*2)`. The `=` operator evaluates its right hand side to get the value to assign to `x`. So it encounters `y++` and increases `y` by `1`. Next it sees the comma. It throws away the result we just calculated, and goes on to calculate `z*2`. It does so and, seeing no more commas, returns the value for assignment. Thus, the comma operator evaluates the expression to its left, discards its result, evaluates the expression to its right and returns its result. (The parentheses matter here: the comma has the lowest precedence of all, so without them the assignment would happen first.) Sounds silly, but it's useful now and then (in loops, for example, or when you want to really confuse someone; yes, I will indeed explain loops soon enough!).
7. Operator Precedence
Like many programming languages, C applies its operators to expressions according to their precedence. Operators with high precedence are applied first. This parallels the precedence of operators in arithmetic, where multiplications take place before additions. The table below lists the numerous C operators grouped by their precedence (higher precedence first).
Operator Type | Operators |
---|---|
Expression | () [] . -> |
Unary | - + ~ ! * & ++ -- (type) sizeof |
Multiplicative | * / % |
Additive | + - |
Shift | << >> |
Relational (inequality) | < <= > >= |
Relational (equality) | == != |
Bitwise AND | & |
Bitwise XOR | ^ |
Bitwise OR | \| |
Logical AND | && |
Logical OR | \|\| |
Conditional | ?: |
Assignment | = += -= *= /= %= <<= >>= &= \|= ^= |
Sequential Evaluation | , |

Operators in the same group are applied from left to right, except for the unary, conditional and assignment operators, which work from right to left.
8. Statements
In this section, we'll delve into the not-so-deep matter of C statements. Twelve different keywords are used to construct the eleven possible statements. We'll take them one by one.
8.1. Special Statements
We'll do this the wrong way round and explain the odd ones out first. Since the odd ones out happen to be both useful and simple, this actually makes more sense.
8.1.1. The (Emperor's New) Null Statement
Try this very useful program:
```c
void main() /* Do absolutely nothing! */
{
    ; /* These are two null statements */
    ;
}
```
If you can't see where the null statement is, don't worry. It has no keyword: a semicolon (`;`) on its own is all there is to it. A null statement does (guess!) absolutely nothing. Its only purpose in life is to be a placeholder, when you have to use a statement, but you don't want to. Empty loops are an example of this. Pascal lovers will know: they have a null statement, too.
8.1.2. Evaluation Statement
An evaluation statement is an expression which we want evaluated. An expression might be a calculation, a function call, etc. Here's a short example (the function consists of one variable declaration and two evaluation statements):
#include <stdio.h>

void main()
{
    int a;
    printf ("Power to little soup pots!\n");
    a=1;
}
8.2. Branching
This section discusses various ways of controlling program flow in C, using branching.
8.2.1. The goto Statement
Yes, it's our old favourite, the goto statement. Here's an example:
#include <stdio.h>

void main()
{
    goto that_label_over_there;
    printf ("This never gets printed.\n");
that_label_over_there: /* a label (surprised?) */
    printf ("This always gets printed.\n");
}
To jump to another place in a C program, you have to define a label at that point. A label is any valid C identifier followed by a colon (:). In our case, the label is eloquently named ‘that_label_over_there’.
Labels are local to the function they are defined in: you can't jump from one function to another using goto. You can goto into and out of loops and other constructs. Bear in mind, however, that using goto to jump into a loop is very bad C and bound to create problems (for a start, it skips the loop's initialisation).
Generally, try not to overuse this statement. Anything you can do with a goto, you can do with the other C constructs. Do not ignore it, though! It may not be the best in structured programming, but when you're struggling to save just a byte more RAM, anything is acceptable.
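One place where goto is genuinely handy is leaving two nested loops at once (break, which we'll meet in section 8.3.1, only leaves the innermost loop). Here's a sketch, assuming a standard ANSI C compiler; the function has_factor_pair and its label are made up for illustration:

```c
/* Returns 1 if target can be written as i * j with 1 <= i, j <= 9,
   0 otherwise. The goto leaves both loops as soon as a pair is found. */
int has_factor_pair(int target)
{
    int i, j;
    int found = 0;

    for (i = 1; i <= 9; i++)
        for (j = 1; j <= 9; j++)
            if (i * j == target) {
                found = 1;
                goto done;      /* jump out of both loops at once */
            }
done:
    return found;
}
```

The same effect is possible without goto (using an extra flag tested in both loop conditions), but it costs more code.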
8.2.2. Returning from a Function
The return keyword does two things: it exits the current function (this is why it's listed among the other branching statements), and returns an evaluated expression to the caller (if the function allows that, of course). Returning from the main function effectively exits the program. Here's an example:
#include <stdio.h>

int foo(int bar) /* A new function */
{
    return bar * 2; /* return a value */
    printf ("This is never printed.\n");
}

void main()
{
    printf ("foo(10)=%d\n", foo (10));
    return; /* exit from the program */
    printf ("Never printed either.\n");
}
Limitations
You can't return anything bigger than 32 bits (4 bytes) on the 32-bit compiler, or 16 bits (2 bytes) on the 16-bit compiler. This limits return values to chars, ints, and other short data types. Soon we'll see how to return bigger ones (yes, the solution involves pointers).
8.2.3. The if…else Statement
This statement is the equivalent of the IF…THEN…ELSE statement in BASIC, only more powerful. Here's its syntax:
if (expression) statement1
or
if (expression) statement1 else statement2
The if statement first evaluates expression. If it is non-zero (i.e. true), statement1 is executed. Otherwise, if the optional else part is there, statement2 is executed.
The parentheses around expression are not part of the expression; they're part of the statement and are obligatory. statement1 and statement2 can be single statements, or blocks (any amount of C code enclosed in curly brackets {…}; a block counts as a single statement).
Since they can be any statement, they can also be other if statements. This creates a complication: you can have nested ifs, and then you have to decide which if an else ‘sticks’ to. The answer is simple: an else always pairs with the closest preceding if that doesn't already have an else. You can avoid this (admittedly embarrassing) situation by putting the nested ifs inside curly brackets. Let's see a couple of examples:
if (game_over)
    if (rudeness > 100) blow_raspberry();
else go_on_with_game();

if (game_over)
{
    if (rudeness > 100) blow_raspberry();
    else be_nice();
}
else go_on_with_game();
The first example is wrong in the sense that it doesn't do what we want it to (rather, it does what we tell it to). The else sticks to the closest if, which is the one on the line above it. The indentation is misleading: the else really belongs under the inner if, and the compiler ignores indentation anyway. If you try to express this in words, you'll get the same kind of ambiguity. So it's better to use blocks to disambiguate nested ifs, as in the second example, which is much clearer.
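To make the difference concrete, here are the two versions as functions returning a made-up action code (0 for nothing, 1 for blow_raspberry, 2 for go_on_with_game). This is a sketch for a standard ANSI C compiler; the function names and codes are hypothetical, and the first one is deliberately indented the misleading way (a modern compiler will probably warn about it):

```c
/* Without braces: the else pairs with the INNER if. */
int without_braces(int game_over, int rudeness)
{
    int action = 0;

    if (game_over)
        if (rudeness > 100) action = 1;
    else action = 2;   /* looks like the outer else, but it isn't! */
    return action;
}

/* With braces: the else pairs with the outer if, as intended. */
int with_braces(int game_over, int rudeness)
{
    int action = 0;

    if (game_over) {
        if (rudeness > 100) action = 1;
    }
    else action = 2;
    return action;
}
```

When the game isn't over, the first version does nothing at all; when it is over and the player was polite, it carries on with the game instead of doing nothing.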
Of course, if statement2 is another if statement, you get something like the following:
if (condition1) statement1
else if (condition2) statement2
else if ...
...
else if (conditionN) statementN
or
if (condition1) statement1
else if (condition2) statement2
else if ...
...
else if (conditionN) statementN
else statementN1
This allows you to check several conditions in turn: the first one that is true wins, and the rest are skipped. The final else part is optional. Here is an example. See if you can figure it out.
void move_around()
{
    /* get a character, convert to upper case */
    int c = toupper (getchar());
    if (c == 'A') move_up();
    else if (c == 'Z') move_down();
    else if (c == 'L')
    {
        /* using a block and a nested if */
        if (can_move_right()) move_right();
        else beep();
    }
    else if (c == 'K')
        /* a nested if without a block: its own else (beep) pairs
           with it, which leaves the final else free for the
           top-level if. Without the else beep(), the final else
           would pair with "if (can_move_left())" instead, so you
           would have to use a block. */
        if (can_move_left()) move_left();
        else beep();
    else wrong_key();
}
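Since move_around() reads the keyboard, it's hard to try out. Here's a testable sketch of the same else-if chain that takes the key as an argument and returns a made-up direction code instead of calling the hypothetical movement functions. It assumes a standard ANSI C compiler:

```c
#include <ctype.h>

/* Returns 1 (up), 2 (down), 3 (left), 4 (right) or 0 (unknown key).
   The conditions are tested top to bottom; the first true one wins. */
int key_direction(int key)
{
    int c = toupper(key);

    if (c == 'A') return 1;
    else if (c == 'Z') return 2;
    else if (c == 'K') return 3;
    else if (c == 'L') return 4;
    else return 0;   /* none of the above */
}
```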
8.2.4. The switch Statement
The above example is a bit awkward. switch provides a way to test an expression for a number of values, and act differently depending on the value. Its syntax is as follows:
switch (expression)
{
    case constant1:
        statements1;
        break;
    case constant2:
        statements2;
        break;
    ...
    case constantN:
        statementsN;
        break;
    default:
        statementsN1;
}
switch evaluates the expression within the parentheses (which are mandatory, like in if); the expression must have an integer type, such as char or int. Then it goes through the case labels and compares the result of the expression with the constant next to each case (these must be integer constants). If they are equal, the statements from that case up to the next break are executed. If no match is found, and the optional default: part exists, the statements between default: and the closing curly bracket are executed. This allows for a default case, which applies if nothing else does.
Strangely enough, break is also optional! If a case has no break, the statements for the next case are executed as well, until a break is found or we reach the closing curly bracket at the end of the switch statement. Be careful of this: it's a nice feature in some cases (pun intended), but try not to forget the breaks when you need them! Here's an example:
void interpret_commands()
{
    switch (toupper (getchar()))
    {
        case 'A':
            move_up();
            break;
        case 'Z':
            move_down();
            break;
        case 'K':
            if (can_move_left()) move_left();
            else beep();
            printf ("2 statements in this case.");
            break;
        case 'L':
            if (can_move_right()) move_right();
            else beep();
            break;
        case 'Q': /* save and exit */
            save(); /* no break! */
        case 'X': /* exit */
            bye_bye();
            break;
        default: /* oops, wrong key! */
            bzzz_wrong_key();
    }
}
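To watch fall-through happen, here's a testable sketch modelled on the 'Q' and 'X' cases above. It counts the actions a key would trigger instead of calling the hypothetical save() and bye_bye() functions; the name count_actions is made up, and a standard ANSI C compiler is assumed. It also shows that several case labels may share the same statements:

```c
/* Returns how many actions the given key would trigger. */
int count_actions(int key)
{
    int actions = 0;

    switch (key) {
    case 'Q':            /* save and exit */
        actions++;       /* the save */
        /* fall through: no break, so the 'X' statements run as well */
    case 'X':            /* exit */
        actions++;       /* the exit */
        break;
    case 'A':
    case 'Z':            /* two case labels sharing the same statements */
        actions++;       /* one move */
        break;
    default:             /* unknown key: nothing happens */
        break;
    }
    return actions;
}
```

Pressing 'Q' counts two actions (the save, then the exit it falls into), while 'X' counts only one.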
8.3. Loops
This concludes the section on branching. C's collection of looping constructs is among the most comprehensive and powerful around. Here they are.
8.3.1. ‘For’ Loops
Let's have a look at BASIC's FOR-TO-STEP statement: it assigns a value to a variable, and then loops through the statements, increasing the variable by the value declared in STEP (1 by default), until the variable goes past the value specified in the TO part.
The for statement in C has the same parts: initialisation, termination condition and step. The similarity ends here. This is the general form of a for loop:
for (expression1; expression2; expression3) statement
This is the algorithm a for loop follows:
- Evaluate expression1.
- If expression2 is true (non-zero), execute statement, otherwise end the loop.
- Evaluate expression3.
- Go to step 2.
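The four steps above can be spelled out with goto. Here's a sketch, assuming a standard ANSI C compiler, that sums the numbers 1 to n twice: once with an ordinary for loop, and once by following the algorithm literally. The function names and the labels check and done are made up for illustration, and the goto version is for understanding only, not for real programs:

```c
/* Sums 1..n with an ordinary for loop. */
int sum_for(int n)
{
    int i, total = 0;

    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same loop, following the four steps one by one. */
int sum_goto(int n)
{
    int i, total = 0;

    i = 1;                      /* step 1: evaluate expression1 */
check:
    if (!(i <= n)) goto done;   /* step 2: stop if expression2 is false... */
    total += i;                 /*   ...otherwise execute the statement */
    i++;                        /* step 3: evaluate expression3 */
    goto check;                 /* step 4: go to step 2 */
done:
    return total;
}
```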
The point here is that you're not limited to a simple variable which increases or decreases: any expression will do. Here are a few examples (time to brush up on expressions, too):
int i,j;

for (i = 1; i <= 10; i++) printf ("%d\n", i);

for (i = 1; i <= 256; i *= 2) printf ("%d\n", i);

for (i = 1, j = 0; j <= 8; i <<= 1, j++)
    printf ("%d\n", i);

for (i = 1, j = 0; j <= 8; j++) {
    printf ("%d\n", i);
    i <<= 1;
}

for (i = 1, j = 0; j <= 8; printf ("%d\n", i), i <<= 1, j++);
These examples cover a range of different styles (they get progressively more difficult to follow).
The first one is the easiest: it loops, allowing i to take values from 1 to 10 (inclusive), and prints each value of i to the screen.
The second one demonstrates the fact that for loops are not limited in the same way BASIC and Pascal loops are: it starts with i=1, prints it, doubles i, and loops as long as i<=256.
The third one does exactly the same thing, but I thought I'd show what all those strange C operators can do in a real program. Careful: the initialisation expression is not i=1, but i=1,j=0. Remember, for expressions are delimited by semicolons (;), nothing else! Here, the loop goes on for values of j in the range 0 to 8. The current value of i is printed, and then i is shifted one place to the left (which is the same as doubling it), and j is increased by one. The apparently useless comma operator gives tremendous power to for loops.
The fourth example does the same thing again, but in a way which is slightly easier to understand (and introduces the next example). Note that C allows you to play with the looping variables in any way you want, even inside the looping statement. If you feel like it, go ahead and reset i to zero right after the printf() call in the first example (of course, that loop will then never terminate).
The fifth example is the strangest of them all: the printf() call is inside the for header itself! No problems, just remember that a function call is an expression. Since it's in the third part of the for loop, printf() gets called right after the looping statement (which is null, so it does nothing anyway), and before the variables are changed. Try it out using the algorithm above; you'll see it does the same as the previous three examples.
An interesting addition to all that: the three expressions in for loops are optional. If you don't need one, don't include it! Never forget the semicolons, though.
- If you don't include an initialisation expression, no initialisation occurs.
- If you omit the looping condition (the second expression), for assumes the condition is always TRUE.
- If you omit the stepping expression (the third one), nothing is evaluated to step the loop (it's assumed you do that inside the looping statement).
Omitting all of the expressions results in an infinite for loop: for (;;) will execute forever.
Which brings us to two more statements you can use inside all C loops: break and continue. break immediately leaves the innermost loop that contains it (it also ends a switch, as we've seen). Here's a little example:
int i;
for (i = 1; ; i++){
    if (i > 10) break;
    printf ("%d\n",i);
}
The if statement will invoke break to end the loop when i>10. This plays the role of a terminating condition, since the for loop doesn't have one. It doesn't look exceptionally useful, but sometimes you need to stop a loop quickly, without waiting for the next test of the looping condition.
continue is very similar. Whenever it is met inside a looping statement, it skips the rest of the looping statement and moves straight on to the next iteration. In a for loop, the step expression (expression3) is still evaluated before the looping condition is tested again. Here's an example of that:
int i;
for (i = 1; i <= 100; i++){
    if (i % 2) continue;
    printf ("%d\n", i);
}
This loop will go through all numbers in the range 1…100. The if statement will check the remainder of the division i/2. If it is non-zero (i.e. i is not divisible by 2), it will invoke continue, skipping all the rest of the statements (only the call to printf() in this case). The net effect is that the loop will print all the even numbers in the range 1…100. Again, continue's virtues may not be so obvious, but there are uses for it.
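Here's a sketch that uses both statements in one loop, assuming a standard ANSI C compiler (the function sum_evens is made up for illustration). It adds up the even numbers from 1 to limit, but stops early once the running total would exceed cap:

```c
/* Sums the even numbers in 1..limit, stopping before the total exceeds cap. */
int sum_evens(int limit, int cap)
{
    int i, total = 0;

    for (i = 1; i <= limit; i++) {
        if (i % 2) continue;          /* odd: skip to i++ and the test */
        if (total + i > cap) break;   /* leave the loop immediately */
        total += i;
    }
    return total;
}
```

For example, sum_evens(10, 10) adds 2 and 4, then breaks out at 6, because 6 + 6 would exceed 10.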
8.3.2. ‘While’ Loops
The general form of this loop is as follows:
while (expression) statement
A while loop works like this:
- Is expression true (non-zero)?
- If yes, execute statement, then go to step 1.
- If no, end the loop.
This should be simple enough. Here's a brief example:
printf ("Please press space to continue.\n");
/* Not very friendly... if you press any- */
/* thing else you get FOO-ed at. :-) */
while (getchar() != ' ') printf ("FOO!\n");
The while loop's test expression calls getchar() to read the keyboard, and compares the result with the space character. As long as it's not a space, we FOO at the user and go back to reading the keyboard.
Unlike for's, while's test expression is mandatory. If you want an endless while loop, you'll have to write it as while (1). 1 is usually non-zero (and when it becomes zero, it'll probably be the end of time anyway).
break and continue work here, as well.
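Here's a non-interactive while loop to try out: Euclid's algorithm for the greatest common divisor. It's a sketch for a standard ANSI C compiler, and the function name gcd is my own choice. Because the condition is tested before every pass, gcd(a, 0) returns a without running the body at all:

```c
/* Greatest common divisor of a and b, by Euclid's algorithm. */
unsigned int gcd(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int r = a % b;   /* remainder of a / b */
        a = b;
        b = r;
    }
    return a;
}
```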
8.3.3. ‘Do’ Loops
The general form of do loops is as follows:
do statement while (expression);
It's a reversed sort of while loop. Here is its algorithm:
- Execute statement.
- If expression is true (non-zero), go to step 1.
Straightforward, right? Example time!
/* Slightly friendlier... this time */
/* you don't get a FOO, but you still */
/* have to look for the space bar. :-) */
do {
    printf ("Press space to continue.\n");
} while (getchar() != ' ');
The example should be self-documented. We print the ‘press space’ message, and read the keyboard. As long as the user doesn't press space, we go back to printing the message and reading the keyboard.
By the way, the curly brackets aren't necessary; I used them because it looks a bit better that way (that's the effect of three solid years of Pascal).
Everything that holds for the while keyword in a while loop also holds for the same keyword in a do loop: don't leave the parentheses out, and you can't omit the expression. Remember that break and continue also work in do loops.
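A do loop is the natural choice when the body must run at least once. Here's a sketch, assuming a standard ANSI C compiler (count_digits is a made-up name), that counts the decimal digits of a number. Even 0 has one digit, so testing the condition before the first pass, as while does, would give the wrong answer for it:

```c
/* Returns the number of decimal digits in n (1 for n == 0). */
unsigned int count_digits(unsigned int n)
{
    unsigned int digits = 0;

    do {
        digits++;
        n /= 10;    /* drop the last digit */
    } while (n != 0);
    return digits;
}
```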
8.4. Intermission
This concludes the discussion of C statements. Coming next: C functions, pointers, advanced data structures and more. For the moment, here's a little joke program to test your understanding of C. Try to find the bug (it's really simple)!
#include <stdlib.h>
#include <radar.h>
#include <missiles.h>

/* The World's Last C Bug */
void main()
{
    for (;;){
        int invasion = get_radar_status();
        if (invasion = 1) launch_missiles();
    }
}
9. Functions
C functions are the building blocks of all C programs. In fact, in C there is no such concept as a ‘main program’ (like in BASIC or Pascal, for example). The main program is a function in its own right, which must be called main.
Most of C programming consists of writing and calling functions. A huge, monolithic program is considered bad practice since this is a structured language. There's also another reason to use functions: you avoid the useless repetition of code, which, on the Oric, is definitely a bad idea.
Here's the syntax of a C function, along with an example of a function to calculate the sum of the squares of its two parameters (called arguments in C).
type function_id (type p1, type p2, ..., type pN)
{
    statements
}

int sum_of_squares (int x, int y)
{
    return x * x + y * y;
}
Here are the basic parts of the function, in order of appearance:
9.1. The Function Type
This is the type of value the function returns. We may return any data type, as long as it takes four bytes or less (two bytes or less for the 16-bit compiler). This may not seem very useful (how do you return a string?), but we'll soon see that it all gets solved using pointers (i.e. instead of returning the data type itself, we return its address in memory). More of this later on. By the way, an interesting data type is void, which is the empty data type. If you use it here, you're effectively saying that your function will not return anything (something like a BASIC subroutine or a Pascal procedure). You can skip the type altogether: int is the default.
9.2. The Function Identifier
Simply put, the name of the function. Like all C identifiers, function identifiers may only consist of lower or upper-case letters, digits and the underscore (_), and they can't start with a digit. The case of the letters matters (as usual), and the identifier may be as long as you want, although ANSI C only guarantees that the first 31 characters are significant (and as few as six for names the linker has to see). A point about this: don't use short, cryptic identifiers to save memory. This is a compiled language, which means that identifiers are for the programmer's reference only and are not stored in the executable program.
9.3. The Argument List
This is a list of comma-delimited argument declarations, enclosed in parentheses. If there aren't any arguments, the argument list is empty, but you still have to have the parentheses (i.e. (); strictly, ANSI C prefers (void) for a function with no arguments). An argument declaration consists of a type and an argument identifier: it's just like declaring variables. Unlike variable declarations, though, you cannot declare two int arguments x and y as (int x, y). C, for some perverted reason I won't even try to guess, wants a data type to the left of each and every argument identifier.
9.4. The Body of the Function
The body of the function must be enclosed in curly brackets. It contains (of course) variable declarations and statements. The arguments of the function are accessible here just like perfectly normal variables (which they are).
An important point: nothing stops you from declaring a variable with the same name as an argument inside a nested block of the function. This is called shadowing an argument. (An ANSI C compiler won't let you do it at the outermost level of the function body, where the arguments already live.) The variable takes precedence over the argument, because it belongs to the inner block. Say you have an argument x, and you also declare x as a variable inside such a block. Every time you refer to x within that block, you refer to the variable, and not the argument. This may result in a lot of problems if you haven't noticed what's going on.
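Here's a sketch of shadowing, assuming a standard ANSI C compiler (shadow_demo is a made-up name). Inside the inner block, x means the local variable; outside it, x is the argument again:

```c
/* Returns x + 100 + x, using a shadowing variable for the 100. */
int shadow_demo(int x)
{
    int result = x;        /* here x is the argument */

    {
        int x = 100;       /* shadows the argument inside this block */
        result += x;       /* adds 100, not the argument */
    }
    return result + x;     /* the argument is visible again */
}
```

So shadow_demo(1) returns 102. Some compilers can warn about shadowing if you ask them to, which is a good way to catch it by accident.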
Speaking of variables, remember that, unlike global variables, which are always filled with 0, variables declared inside functions start out with undefined values (whatever happens to be in that memory). Take care to initialise your function variables to a suitable value.
Finally, don't forget that your function may have to return a value. If the function type is void, you don't need to return anything; if you need to bail out of the function early, use return on its own, with no expression. If the function type is anything else but void, you have to return a value whenever you exit the function. Use return followed by an expression to return a value. You can see this in the short example function. If you don't explicitly return a value, anything can get back to the caller. Most compilers warn you of this, of course.
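Here's a sketch of both kinds of return, assuming a standard ANSI C compiler. The names max_of, counter and bump_if_positive are made up for illustration; counter is a global, so it starts at 0 without being initialised:

```c
/* Every way out of a non-void function ends in return with an expression. */
int max_of(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

int counter;   /* a global variable: starts at 0 */

/* A void function can bail out early with a bare return. */
void bump_if_positive(int n)
{
    if (n <= 0)
        return;    /* nothing to do, and nothing to return */
    counter += n;
}
```

After calling bump_if_positive(5) and then bump_if_positive(-2), counter holds 5: the second call returns before touching it.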
9.5. Another Way
Here's an alternative way of declaring a function (with the same example):
type function_id (p1, p2, ..., pN)
type p1;
type p2;
...
type pN;
{
    statements
}

int sum_of_squares (x, y)
int x, y;
{
    return x * x + y * y;
}
What's different here? Basically, the argument list may just contain the identifiers of the arguments. Their types are declared between the closing parenthesis and the opening curly bracket, in exactly the same way as variables. As you can see in the example, we declare x and y as ‘int x, y;’.
This redundancy has a reason: by not declaring any types for the arguments, you stop the compiler from complaining whenever you try to pass a value of the wrong type to the function. This is quite useful sometimes, but it's also risky: the value is passed unchanged (apart from the usual promotions, such as char to int), so if its type doesn't match the declaration, the function will simply misread it.
9.6. What's Your Order?
Here's the catch. You can't use a C function unless it's already been declared earlier on in the program. Well, that's not exactly true. You can do this, but the compiler seldom likes it, because it needs to check whether the values you pass to the function are of the right type and, unless the function is already declared, the types are unknown.
The solution involves prototype declarations: function declarations without bodies. To use a function A before it's defined, copy its header (the type, the name and the argument list in parentheses), add a semicolon, and put it somewhere above the function where you want to call A (usually near the top of the file). The prototype for sum_of_squares() would be like this:
int sum_of_squares(int x, int y);
This is also what the C header files (.h files) contain: the various library functions are merely prototyped there, not actually implemented. The implementations live elsewhere (in the Oric library's case, they're written in Assembly, not C).
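Here's a sketch of a prototype in use, assuming a standard ANSI C compiler. squared_hypotenuse is a made-up name; it calls sum_of_squares() before that function's definition appears, and the prototype at the top is what lets the compiler check the call:

```c
/* Prototype: the header plus a semicolon, with no body. */
int sum_of_squares(int x, int y);

/* Uses sum_of_squares() before its definition; the prototype
   tells the compiler the argument and return types. */
int squared_hypotenuse(int a, int b)
{
    return sum_of_squares(a, b);
}

/* The definition can come later in the file. */
int sum_of_squares(int x, int y)
{
    return x * x + y * y;
}
```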
9.7. Limitations
Due to the way the 16-bit compiler is implemented, the variables (including arguments) local to a function cannot exceed 256 bytes in size. In fact, this holds for all non-global variables (i.e. all variables inside curly brackets). I don't have complete details on the nature of this limitation, but it can be quite annoying (you'll have to consider making some of your variables global). The 32-bit compiler is not limited in this way, fortunately.
10. Iteration vs. Recursion
And now, at the risk of being accused of being impossible to understand, I'll try to outline a technique of which Pascal users are aware, but one that (because of large memory overheads) should probably be avoided on the Oric: recursion. To describe recursion, I'll compare it to iteration by using a simple example (which will probably only complicate things, but here goes anyway).
C (like Pascal) does not have a built-in power operator, probably so that people like me can use raising to a power as an example of recursion. So, how do we define the familiar power function? There are two obvious ways to do it. One is to say that $x^y$ is 1 if $y = 0$; otherwise, $x^y = x \times x \times \cdots \times x$, with y factors of x. Here's a C function to do this:
unsigned int
power (unsigned int x, unsigned int y)
{
    unsigned int result;
    if (y == 0) return 1;
    for (result = x; y > 1; y--) result *= x;
    return result;
}
This is the iterative way of doing things: in short, use a loop. It's nice, it's fast, it takes minimal amounts of memory (well, it does in this case). However, we may also define power as follows 2: $x^0 = 1$, and $x^y = x \cdot x^{y-1}$. Think about it. What this says is that, if you have the (y-1)-th power of x, you can always find the y-th power by multiplying it by x. It makes sense, doesn't it? In mathematics, this is called a recurrence relation: defining something in terms of itself. In programming, it's called recursion. Here's a function to do just that:
unsigned int
power (unsigned int x, unsigned int y)
{
    if (y == 0) return 1;
    else return x * power (x, y - 1);
}
That's all! As you can see, you may call a C function from within itself. The recursive version of power does the following: if y==0, the result is 1. Otherwise, the result is x times $x^{y-1}$. Let's see a trace of the function call power(2,3):
power (2,3)
= 2 * power (2,2)
= 2 * (2 * power (2,1))
= 2 * (2 * (2 * power (2,0)))
= 2 * (2 * (2 * 1))
= 2 * (2 * 2)
= 2 * 4
= 8
The recursive power is a good-looking function: it adheres to the mathematical model for raising to a power, and it's much simpler than the iterative one. However, always remember that some memory is allocated on the C stack for every function call. This, coupled with the slightly lower speed of recursion, makes the whole thing rather undesirable for a small 8-bit machine. Take this with a pinch of salt, though. There are always cases where you might want to use recursion.
In case you feel like using it, though, here is the most important tip on recursion: never forget that there are two cases in a recursive function. The base case, which does not recurse any more (i.e. $x^0$ in our example), and the recursing case (or cases). In order for the recursion to end (we usually want this to happen), the base case must always be reached, sooner or later.
There may be multiple base cases! Here's an improved recursive power() function. It has an extra base case, which saves us one function call and one multiplication (both of which are expensive on the Oric):
unsigned int
power (unsigned int x, unsigned int y)
{
    if (y == 0) return 1;
    else if (y == 1) return x;
    else return x * power (x, y - 1);
}
Let's see a trace of this new function being used to calculate power(2,3):
power (2,3)
= 2 * power (2,2)
= 2 * (2 * power (2,1))
= 2 * (2 * 2)
= 2 * 4
= 8
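To check the saving for yourself, here's a sketch of both recursive versions with a call counter added, assuming a standard ANSI C compiler. The names power1, power2 and calls are made up, so both versions can live in one file; the counter exists only to make the extra call visible:

```c
unsigned int calls = 0;   /* counts every call, for illustration only */

/* First recursive version: a single base case. */
unsigned int power1(unsigned int x, unsigned int y)
{
    calls++;
    if (y == 0) return 1;
    else return x * power1(x, y - 1);
}

/* Improved version: a second base case for y == 1. */
unsigned int power2(unsigned int x, unsigned int y)
{
    calls++;
    if (y == 0) return 1;
    else if (y == 1) return x;
    else return x * power2(x, y - 1);
}
```

Computing 2 to the power 3 takes four calls with power1 (y = 3, 2, 1, 0) but only three with power2, matching the two traces above.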
11. Data Structures
The next sections will deal with C's advanced data structures. These include arrays and strings (including an explanation of why they behave so strangely), but also composite data structures (those made up of other, simpler data structures).
11.1. Arrays
Arrays are the simplest of these data structures. An array in C is just like an array in any other language: a group of data items of the same type. Let's skip the boring theoretical bit. Here's a small program that copies the first line of the screen to an array of characters.
/* Need version 0.5 of the compiler. */
#include <stdlib.h>
#include <sys/oric.h>

void main()
{
    unsigned int i;
    char status_line [40];
    for (i = 0; i <= 39; i++) {
        status_line [i] = peek (48000 + i);
    }
}
First of all I have to say that the header file sys/oric.h contains various Oric-specific definitions (of which this program uses peek()). A note for DOS users: I've used a (forward) slash (/) as the directory separator. This is perfectly acceptable (although undocumented) on DOS, and has the extra advantage of allowing the same program to be compiled with the Linux version of the compiler. Linux, on the other hand, only allows forward slashes (/) as directory separators, so we use /, which is understood by everyone.
Right, back to the point. This program shows how to define and use arrays (in this case, we define a forty-character array named status_line). You define an array just like any normal variable, but you append to it the number of elements enclosed in square brackets:
datatype name [number_of_elements];
Where datatype is the data type of the array's elements, name is the name of the array, and number_of_elements is the price of fish in India 3.
You can access an array element by using the array element operator ([]). In the case of our example, status_line[0] is the array's first element and status_line[39] is the last element. C arrays always start with element 0. Because of this, this array's last element does not have index 40, but 39. Don't forget this; it's a common mistake among people who have just migrated from a more ‘reasonable’ language (like Pascal).
The reason for this will be explained later in this article. As for how you use an array element, (I hope) it's made clear in the example: you can treat it like any ordinary variable. Assign values to it, use it in expressions, etc.
How much space does it take? Simple. You can always use sizeof(status_line) to find out. You'll see it takes 40 bytes of memory. C arrays are contiguous and packed: all array elements are grouped together in one large block of memory and there is no unused space between array elements. So, the size of an array is the size of one element times the number of elements in the array.
Another warning: never use off-range array elements (e.g. status_line[40] is off-range, since the last element is number 39). C performs no range checking (for reasons of speed), and you won't get a compiler error if you do that. Your program will probably fail in very strange ways, though.
11.1.1. Multi-Dimensional Arrays
Of course, in some cases, you'll want to use arrays of more than one dimension (for example, storing a list of co-ordinates will need a two-dimensional array). It's easy:
datatype name [n1][n2]...[ni];
Example (a two-dimensional array of co-ordinates in space):
int coord_list [10][3];
How much space does this one take? It's still the same, of course: the number of elements times the size of one element. In this case, coord_list has a total of 10*3 elements; sizeof(int) is 4 (2 for the 16-bit compiler), so the total size of the array in memory is 10*3*4 = 120 bytes (60 bytes with the 16-bit compiler).
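Here's a sketch that lets sizeof do these sums, assuming a standard ANSI C compiler. The 4 and 2 byte figures above are for the Oric compilers; a modern compiler usually has a 4-byte int, which is why the code uses sizeof(int) rather than a fixed number. The function names are made up for illustration:

```c
#include <stddef.h>

int coord_list[10][3];

/* Total size in bytes: 10 * 3 elements times sizeof(int). */
size_t total_bytes(void)
{
    return sizeof(coord_list);
}

/* One row, coord_list[0], is itself an array of 3 ints. */
size_t row_bytes(void)
{
    return sizeof(coord_list[0]);
}

/* Number of elements: the total size divided by the size of one element. */
size_t element_count(void)
{
    return sizeof(coord_list) / sizeof(coord_list[0][0]);
}
```

The last function is a common idiom: it keeps working if you later change the array's dimensions.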
To address an element of a multi-dimensional array, you have to give all dimensions inside separate square brackets: coord_list[3][0] is an example of how you do it. You might be tempted to say coord_list[3,0]. The compiler may not say anything, but it will be wrong. Remember the comma operator? It evaluates everything, but discards the result of the expression to its left and returns the result of the right-hand one. So this will be the same as coord_list[0]. You might think that addressing a two-dimensional array by using a single dimension is an error. In C, it's not. I'll make it clear when I explain how C handles arrays internally.
How is it organised in memory? A multi-dimensional array is still contiguous and packed. The sequence of elements in memory is like this:
int A [10][3];
Sequence of elements in memory:
A[0][0], A[0][1], A[0][2], A[1][0], A[1][1], A[1][2], A[2][0], ..., A[2][2], ... A[9][0], ..., A[9][2]
11.2. Strings
By now you should be wondering about these. C knows about strings (printf() prints a string, for example), but there is no such thing as a string data type. Here's the answer: in C, strings are represented as character arrays. Here's a little program to demonstrate the use of strings.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

void main()
{
    char s[14];
    strcpy (s, "Hello World!\n");
    printf ("%s", s); /* safer than printf (s): s might contain a % */
}
strcpy() copies the source string (the second argument) to the target string (the first argument). For extra functionality, it also returns the target string (strictly, its address). The target string is modified anyway, so we don't need to use the function's result. printf() works as usual, and prints the string.
As you can see, we use a simple character array to hold the string. Note the length necessary to store the string: "Hello World!\n" is 13 characters long (\n is only one character, the newline: ASCII 10 on most systems). Yet, s is 14 characters long. This is because C uses null-terminated strings. That is, there is an ASCII 0 after the string's last character so that the language knows where the string ends. If the terminating null is missing, the computer doesn't know when to stop processing your string and you'll get weird behaviour.
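The terminating null is all a function needs to find the end of a string. Here's a sketch that measures a string by walking along the array until it meets the ASCII 0, assuming a standard ANSI C compiler. string_length is a made-up name (the real library function is strlen(), declared in string.h):

```c
/* Counts the characters before the terminating null. */
unsigned int string_length(const char s[])
{
    unsigned int n = 0;

    while (s[n] != '\0')   /* stop at the ASCII 0 */
        n++;
    return n;
}
```

Given "Hello World!\n", it returns 13; the fourteenth element, the null, is what makes it stop. Without it, the loop would run on through whatever memory follows the array.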
Intermission with Strings
Two strings walk into a bar. The first string says: ‘I think I'll have a beer quag fulk boorg jdk^CjfdLk jk3s d%f67howe%^U r89nvy~owmc63^Dz x.xvc’. ‘Please excuse my friend’, the second string says, ‘he isn't null-terminated.’
Version 0.5 of the Oric C compiler includes a complete implementation of the ANSI C string library (the header file for this is string.h). I'd like to describe some of the string-related functions, but it's impossible because of the lack of space. Please refer to the comments included with the header file; they are very helpful.
11.2.1. Initialising Arrays and Strings
We've seen that you can initialise C variables as you declare them. You can do the same for arrays:
int random_numbers[7] = {1,2,3,17,21,42,666};
int coord_list[4][2] = {0,0, 1,0, 1,1, 0,1};
char hello[6] = {'h','e','l','l','o',0};
char foobar[7] = "foobar";
The constants inside the curly brackets are stored in the array elements starting from element 0. You don't have to provide values for all elements: any elements you leave out are set to zero. Values are stored according to their sequence in memory (so, for the multi-dimensional array, things can be slightly more complicated; I've grouped elements together to show how it's done, and you can also give each row its own curly brackets, as in {{0,0}, {1,0}, {1,1}, {0,1}}).
In the case of character arrays, you can either use character constants inside curly brackets (hello demonstrates this; note the terminating null, the last element), or you can just do what I've done with foobar and initialise the array with a string (C adds the terminating null automatically).
If you don't want to adjust the number of elements every time you change the initial values, you can declare the array in this fashion:
int random_numbers[] = {1, 2, 3, 17, 21, 42, 666};
char foobar[] = "foobar";
C will set the number of elements to fit the initial values (if you use a string initialiser, the terminating null is counted as well). Please note that this only works if you initialise an array. The size of the array is still constant; the only difference is that its size is calculated by the compiler during compilation.
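Here's a sketch that checks these initialisation rules, assuming a standard ANSI C compiler. The array names follow the examples above, but the two functions are made up for illustration. The partial initialisation is done on a local array, because globals are filled with 0 anyway and would prove nothing:

```c
#include <stddef.h>

int random_numbers[] = {1, 2, 3, 17, 21, 42, 666};   /* 7 elements */
char foobar[] = "foobar";                             /* 6 letters + '\0' = 7 */
int coord_list[4][2] = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};  /* one pair per row */

/* The compiler counted the elements for us. */
size_t random_count(void)
{
    return sizeof(random_numbers) / sizeof(random_numbers[0]);
}

/* Only two values are given; partial[2] to partial[4] become 0. */
int partial_sum(void)
{
    int partial[5] = {7, 8};

    return partial[0] + partial[1] + partial[2] + partial[3] + partial[4];
}
```

partial_sum() returns 15 (7 + 8 + 0 + 0 + 0), and sizeof(foobar) is 7 because the terminating null is counted.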
11.3. Inner Sanctum
By this point you should be wondering about the weird, seemingly inconsistent use of arrays (and strings) in C. It's puzzling until you learn one little secret. This is where this secret is unveiled.
Everything is explained by this single fact: when we declare an array a, a[0] is the array's first element, but a by itself is the address in memory where the array sits. The a[n] operator works by adding n times the size of one element of a to the address a, and using the resulting address. In this way it's easy to translate array references to simple machine code instructions. This is why it's not a good idea to do array range-checking in C (there wouldn't be a point in designing something so fast if we had an extra couple of instructions after it).
This is also why writing a on its own is not an error. There are times when you want to refer to an array by its address. strcpy() needs the address of the target string so it can store its results, hence we used just s in the example program above.
Finally, this also explains why it's okay to declare an array as int b[10][2] and sometimes refer to it as just b[3]. Multi-dimensional arrays are thought of as arrays of arrays. In this case, we have a 10-element array of 2-element int arrays. So b[3] refers to the address of the fourth (remember, they start at zero!) 2-element int sub-array. This coincides with the address of element b[3][0], just like b is the address of the array's first element, b[0][0]. But beware: an address is not an element! Unless you specify all indices, you can't access a single element, and if you try to use something like b[3] as an int (say, b[3] = 5;) you'll get a compiler error.
11.4. Composite Data Structures
Assume that Laurent, for some elusive reason, needs to keep a record of all CEO members using a C program. He needs to keep diverse information on each member like their name, their s-mail 4 and e-mail addresses, the date when they first subscribed and the type of machine they have (Oric-1, Atmos or Telestrat). People with previous Pascal experience will be smiling, BASIC users will be curious, and I hope there aren't any COBOL users around.
The way to do this is by defining our very own composite data structure. Such data structures are called structs in C. Let's define the struct for Laurent's database 5:
struct ceo_member {
char name[40];
char smail[80];
char email[40];
short unsigned int year_subscribed;
short unsigned int month_subscribed;
short unsigned int day_subscribed;
char machine[10];
};
Place all data structure definitions outside functions. The new data structure is called struct ceo_member. It's not called simply ceo_member! C also wants the struct keyword. We define 7 fields within this new struct. The name field is a 40-character string (Laurent will be forced to change it if he gets many Greek subscribers). An s-mail address might be long, so we allocate 80 bytes for smail. email also gets 40 characters. The subscription date is stored as three short unsigned ints. They have a range of 0..65535 (0..255 on the 16-bit compiler). Finally, we allocate a 10-byte string for the machine type owned by that person. The string is just long enough to hold "Telestrat" (remember, C strings are null-terminated: the nine letters need a 10th character to store the final '\0').
How do we use this new struct? Well, we've just defined a new data structure. We can now declare variables of type struct ceo_member.
struct ceo_member one_member;
struct ceo_member members[1000];
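one_member is a record of a single member. Simple, isn't it? Of course, we can have arrays of structs. members is an array with room for 1000 CEO members. (Don't try this size on a real Oric: each record takes over 170 bytes, so a thousand of them would need well over 170,000 bytes.)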
The usual operation on a struct is to access one of its fields. To access the name of the CEO member stored in one_member, we write one_member.name (remember the dot (.) operator?). It's simple, really. The following example prints the names and addresses of the first 10 CEO members stored in members. It also shows you how to use arrays of structs.
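#include <stdio.h>

/* Assumes struct ceo_member and members[] are declared as above. */
int main(void)
{
    int i;

    for (i = 0; i < 10; i++) {
        printf("%d. NAME=%s, ADDRESS=%s\n",
               i,
               members[i].name,
               members[i].smail);
    }
    return 0;
}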
Of course, you can use the fields as normal variables. You can involve one_member.year_subscribed in any expression you want!
structs may contain any data structure, even other structs. This piece of information will allow us to make struct ceo_member look better by making a new data structure:
struct date {
short unsigned int year, month, day;
};
Now we can change struct ceo_member as follows:
struct ceo_member {
char name[40];
char smail[80];
char email[40];
struct date subscription_date;
char machine[10];
};
Now we can access the subscription year of the member stored in one_member by writing one_member.subscription_date.year (as you can see, this can get quite long).
What's the sizeof() of a struct? On the Oric, it's equal to the sum of the sizeof()s of its fields. A struct is only a way of grouping data; it does not take any space by itself.
Word Alignment and Packing
The above holds on the Oric, but not necessarily on other computers. Most modern processors prefer their values to be ‘aligned’, i.e. to start on an address divisible by the value's size (for a 4-byte int on a 32-bit computer, an address divisible by four). This can introduce gaps between the fields of a struct, so its sizeof can be larger than the sum of the sizes of its fields. Compilers for these machines accept the wasted bytes, partly because there's usually memory to spare and partly because some processors simply can't access a value unless it's properly aligned! On the Oric, space is of the essence, so we leave no gaps. ‘Grown-up’ compilers for some architectures let you select which behaviour you prefer, creating ‘packed’ or ‘unpacked’ structs.
11.5. Enumerated Types
The struct ceo_member we discussed previously takes quite a lot of memory. For example, why do we allocate a ten-byte string for the machine type when there are only three different machines? You might argue that a single byte would be enough to store this information. You'd be right! We can and should do this. One way to do it is by making machine a char. Then, different values would ‘mean’ different machines. There's a nicer way: an enumerated data type, known as an enum in C:
enum Machine {oric_1, atmos, telestrat};
Here we define a new data type, called enum Machine (not just Machine). It can take three values: oric_1, atmos, or telestrat. We can define variables of this type and use them in the normal way:
{
enum Machine my_oric;
my_oric = telestrat;
printf ("I have an Oric");
switch (my_oric) {
case oric_1:
printf ("-1!\n");
break;
case atmos:
printf (" Atmos!\n");
break;
case telestrat:
printf (" Telestrat!\n");
}
}
This is nothing but an int with special values declared for it! You can do arithmetic on it (for example, if I'm upgrading from an Atmos to a Telestrat, I can write my_oric++;), although C won't stop you from stepping past the last value. In fact, the symbols you name in the enum definition are assigned integer values, starting at 0 and increasing. So, oric_1==0, atmos==1 and telestrat==2. You can override this and set the assigned values yourself:
enum Machine {oric_1=0, atmos=10, telestrat=20};
You can even have different constants share the same values. If a value for a constant is left out, it's set to the value of the previous constant in the enum plus one. There is one argument against making extensive use of enums: compilers usually make them as big as an int, which is 4 bytes for the 32-bit compiler and 2 bytes for the 16-bit one. Of course, this means that, unlike Pascal, enums in C can have more than 256 different values. On the Oric, though, this is more of a curse than a blessing…
11.6. Name It!
All this business with struct this and enum that is getting annoying, right? Why should you have to stick struct or enum before the name of your data type? Well, one reason is that you can easily have both a struct x and an enum x without C getting confused. But it's still annoying. This is why you can name your own data types using the typedef keyword:
typedef old_data_type new_name_for_it;
typedef struct ceo_member ceo_member_t;
typedef enum Machine machine_t;
typedef short unsigned int nice_int_t;
This example assumes you have previously defined struct ceo_member and enum Machine. After these typedefs, you can declare variables by simply writing something like ceo_member_t one_member or machine_t my_oric. The _t at the end of every typedef name isn't mandatory, but it's generally a good idea, so that you can tell what sort of name this is. As you can see from the third example, you can also give new names to simple C data types.
11.7. Unions
Okay, this is a difficult one to explain. I've tried it twice. I've failed twice. Here goes anyway! It all boils down to the fact that information is only bytes which we interpret to suit our needs. The hex number #60 (0x60 for us C freaks) can be interpreted in a number of ways: in 6502 machine language, it's the RTS instruction. If you treat it as an ASCII character, it's the back-quote character (`). As an integer, it's the number 96. Data is data. Data plus interpretation equals meaningful information: a fundamental Computer Science axiom if there ever was one.
Unions are a way of imposing different interpretations (‘views’) on the same block of memory. This is required quite often. For example, there are eight bytes on the Oric-1/Atmos' page 2, organised as four 16-bit ints. They are used for storing the arguments to various calls to graphics and sound functions. When calling the CURSET command, these contain the X and Y coordinates of the pixel we want to change; the third integer contains the action to be performed on the pixel. When calling the SOUND command, the same three integers contain the channel number, frequency and volume needed by the command. Got it so far? Right. Here's the same example in C:
enum CURSET_mode {reset=0, set=1, xor=2, none=3};
struct CURSET_data {
unsigned short int x, y;
enum CURSET_mode mode;
};
struct SOUND_data {
unsigned short int channel, freq, volume;
};
union OS_call {
struct CURSET_data CURSET;
struct SOUND_data SOUND;
};
union OS_call data;
We define various data structures, and a union of two of them. We also declare a variable data of type union OS_call. Say we want to call the Oric operating system to draw a pixel. We'll access CURSET's view of the data by using the fields data.CURSET.x, data.CURSET.y, and data.CURSET.mode (note a typical use of enum there). To make a SOUND, we'll access data.SOUND.channel, data.SOUND.freq and data.SOUND.volume. The CURSET and SOUND parts of the union (and as many others as we wish to define) refer to the same block of memory, but allow you to interpret it in different ways. By the way, a union takes as much space as its biggest member (a compiler that aligns its data may round this up a little).
Don't Reinvent the Wheel!
In practice, you don't have to do any of the above, of course! The graphics and sound routines by Vaggelis Blathras are as simple as their BASIC counterparts, and in fact call the BASIC counterparts in the Oric's BASIC ROM.
12. Conclusion, Epilogue, Colophon and the like
I have no clue why, but people drop by to read parts of this article on occasion. There are better C tutorials for small devices now, and nearly all small devices are more capable than the Oric: I have an 80-MHz RISC machine that cost me €7 and does nothing but water my plants, and I consider it severely limited.
‘Oric C programming’ is now more than twenty years old and never really had a proper ending. I was at university, and university life (invariably meaning exams; it really isn't like Hollywood makes it out to be) caught up with me. There really wasn't much more left in the series anyway, though.
The one thing I'd have liked to include is a longish, annotated example. For this, I suspect your best bet would be to check the source code of either my Font Editor program, or the C parts of ‘Slime!’ (most of which is written in 6502 Assembly).