     1011 0101          1011 0101
AND  1110 1110      OR  1110 1110
     ---------          ---------
     1010 0100          1111 1111
      164 DEC            255 DEC
 

If you have an eight-bit binary value 'X' and you want to guarantee
that bits four through seven contain zeros, you could logically AND
the value 'X' with the binary value 0000 1111.

Using the logical AND, OR, and XOR operations to
manipulate bit strings in this fashion is known as masking bit strings.
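
For instance, here is a minimal C sketch of these masking operations
(the values match the figure above; the variable names are illustrative,
not from the original text):

    #include <stdio.h>

    int main(void) {
        unsigned char x = 0xB5;          /* 1011 0101 */

        unsigned char lo = x & 0x0F;     /* AND clears bits 4..7: 0000 0101 */
        unsigned char hi = x | 0xF0;     /* OR  sets   bits 4..7: 1111 0101 */

        printf("%02X %02X\n", lo, hi);   /* prints: 05 F5 */
        return 0;
    }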

 What about negative numbers?

Negative values are objects in their own right, just like positive numbers.
We'll use half of the 256 different values to represent negative numbers,
so we can represent the negative values -128..-1
and the positive values 0..127 with one byte (8 bits).


 
The 80x86 microprocessor uses the two's complement notation.
In the two's complement system, the high-order (H.O.) bit of a number is a sign bit.
If the H.O. bit is zero, the number is positive;
if the H.O. bit is one, the number is negative.
 
 

Examples:

For 16-bit numbers:

8000h is negative because the H.O. bit is one.

100h is positive because the H.O. bit is zero.

7FFFh is positive.

0FFFFh is negative.

0FFFh is positive.

If the H.O. bit is zero, then the number is positive and is stored as a standard binary value.
If the H.O. bit is one, then the number is negative and is stored in the two's complement form.
 

To convert a positive number to its negative, two's complement form,
use the following algorithm: invert all the bits in the number,
then add one to the inverted result. (You can check your work by running
the same algorithm on the result; it should return the original value.)

For example, to compute the eight-bit equivalent of -5:

  0000 0101       Five (in binary).
  1111 1010       Invert all the bits.
  1111 1011       Add one to obtain the result.
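
A quick C sketch of this invert-and-add-one algorithm (variable names
are illustrative):

    #include <stdio.h>

    int main(void) {
        unsigned char five = 0x05;                        /* 0000 0101 */
        unsigned char neg  = (unsigned char)(~five + 1);  /* invert, add one */

        printf("%02X\n", neg);               /* FB, i.e. 1111 1011 */
        printf("%d\n", (signed char)neg);    /* prints -5          */
        return 0;
    }
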
The following examples provide some positive and negative 16-bit signed values:

        7FFFh: +32767, the largest 16-bit positive number.
        8000h: -32768, the smallest 16-bit negative number.
        4000h: +16,384.

To negate the numbers above:

7FFFh:  0111 1111 1111 1111     +32,767t
        1000 0000 0000 0000     Invert all the bits (8000h)
        1000 0000 0000 0001     Add one (8001h or -32,767t)

8000h:  1000 0000 0000 0000     -32,768t
        0111 1111 1111 1111     Invert all the bits (7FFFh)
        1000 0000 0000 0000     Add one (8000h or -32,768t)

4000h:  0100 0000 0000 0000     +16,384t
        1011 1111 1111 1111     Invert all the bits (0BFFFh)
        1100 0000 0000 0000     Add one (0C000h or -16,384t)
 

Note what happened with 8000h: inverted, it becomes 7FFFh,
and after adding one we obtain 8000h again!

Oops, wrong answer: -(-32,768) is -32,768? Of course not.

The problem is that the value +32,768 cannot be represented with a 16-bit signed number,
so we cannot negate the smallest negative value.
Aside from this special case, most other operations in the two's complement system
are as easy as in unsigned binary.
 

For example, suppose you were to perform the addition 5+(-5). The result is zero. Consider what happens when we add these two values in the two's complement system:
 

      00000101
      11111011
      --------
    1 00000000
 

We end up with a carry into the ninth bit and all other bits are zero.
As it turns out, if we ignore the carry out of the H.O. bit, adding two signed values always produces the correct result when using the two's complement numbering system.
 

This means we can use the same hardware for signed and unsigned addition and subtraction. This wouldn't be the case with some other numbering systems.
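
A short C sketch of this point (illustrative): the same eight-bit addition
yields the right answer under both the signed and the unsigned
interpretation of the bits.

    #include <stdio.h>

    int main(void) {
        unsigned char a = 0x05;                      /* 5, or +5 signed         */
        unsigned char b = 0xFB;                      /* 251 unsigned, -5 signed */
        unsigned char sum = (unsigned char)(a + b);  /* carry out of bit 7 lost */

        printf("%u\n", (unsigned)sum);     /* 0 = (5 + 251) mod 256 */
        printf("%d\n", (signed char)sum);  /* 0 = 5 + (-5)          */
        return 0;
    }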
 

The 80x86 microprocessor provides an instruction, NEG (negate), which performs this operation. Furthermore, most hexadecimal calculators will perform this operation when you press the change-sign key (+/- or CHS).
 

The data represented by a set of binary bits depends entirely on the context.
The eight bit binary value 11000000b could represent an IBM/ASCII character,
it could represent the unsigned decimal value 192,
or it could represent the signed decimal value -64, etc.
As a programmer, it is your responsibility to use this data consistently.
 
Since two's complement format integers have a fixed length, a small problem develops.
What happens if you need to convert an eight bit two's complement value to 16 bits?
 
Consider the value "-64".
The eight bit two's complement value for this number is 0C0h.
The 16-bit equivalent of this number is 0FFC0h.
 

Now consider the value "+64".
The eight and 16 bit versions of this value are 40h and 0040h.
The difference between the eight and 16 bit numbers can be described by the rule:
 

"If the number is negative, the H.O. byte of the 16 bit number contains 0FFh;
  if the number is positive, the H.O. byte of the 16 bit quantity is zero."
 
To sign extend a value from some number of bits to a greater number of bits is easy:
just copy the sign bit into all the additional bits in the new format.
For example, to sign extend an eight bit number to a 16 bit number,
simply copy bit seven of the eight bit number into bits 8..15 of the 16 bit number.
 

To sign extend a 16 bit number to a double word,
simply copy bit 15 into bits 16..31 of the double word.
Sign extension is required when manipulating signed values of varying lengths.
Often you'll need to add a byte quantity to a word quantity.
 

You must sign extend the byte quantity to a word before the operation takes place.
Other operations may require a sign extension to 32-bits.
You must not sign extend unsigned values.
 

Examples of sign extension:
Eight Bits      Sixteen Bits    Thirty-two Bits
   80h             FF80h          FFFFFF80h
   28h             0028h          00000028h
   9Ah             FF9Ah          FFFFFF9Ah
   7Fh             007Fh          0000007Fh
   ---             1020h          00001020h
   ---             8088h          FFFF8088h
 
 
To extend an unsigned byte you must zero extend the value.
Zero extension is very easy - just store a zero into the H.O. byte(s) of the larger, extended value.
To zero extend the value 82h to 16 bits you simply store a zero into the H.O. byte, yielding 0082h.
 
Eight Bits      Sixteen Bits    Thirty-two Bits
   80h             0080h          00000080h
   28h             0028h          00000028h
   9Ah             009Ah          0000009Ah
   7Fh             007Fh          0000007Fh
   ---             1020h          00001020h
   ---             8088h          00008088h
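
In C, these two extensions correspond to widening through a signed or an
unsigned type; a minimal sketch (the stdint.h type names are standard C,
the variable names are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t b = 0x80;

        int16_t  se16 = (int8_t)b;   /* sign extend: FF80h     */
        int32_t  se32 = (int8_t)b;   /* sign extend: FFFFFF80h */
        uint16_t ze16 = b;           /* zero extend: 0080h     */
        uint32_t ze32 = b;           /* zero extend: 00000080h */

        printf("%04X %08X\n", (unsigned)(uint16_t)se16, (unsigned)(uint32_t)se32);
        printf("%04X %08X\n", (unsigned)ze16, (unsigned)ze32);
        return 0;
    }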
 
Sign contraction, converting a value with some number of bits to the identical value with a
fewer number of bits, is a little more troublesome.
 

Sign extension never fails. Given an m-bit signed value you can always convert it to an n-bit number (where n > m) using sign extension.
 

Unfortunately, given an n-bit number, you cannot always convert it to an m-bit number if m < n.
 

For example, consider the value -448.
As a 16-bit hexadecimal number, its representation is 0FE40h.
Unfortunately, the magnitude of this number is too great to fit into an eight bit value,
so you cannot sign contract it to eight bits.
 

This is an example of an overflow condition that occurs upon conversion.

To properly sign contract one value to another, you must look at the H.O. byte(s) you want
to discard.

The H.O. bytes you wish to remove must all contain either zero or 0FFh. If you
encounter any other values, you cannot contract the number without overflow.
Finally, the H.O. bit of the resulting value must match every bit
you've removed from the number.
 

Examples (16 bits to eight bits):

                FF80h can be sign contracted to 80h
                0040h can be sign contracted to 40h
                FE40h cannot be sign contracted to 8 bits.
                0100h cannot be sign contracted to 8 bits.
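
A C sketch of this contraction check for the 16-to-8-bit case (the helper
name is hypothetical, not from the original text):

    #include <stdint.h>
    #include <stdio.h>

    /* Store the 8-bit result and return 1 if value fits, else return 0. */
    int sign_contract_16_to_8(int16_t value, int8_t *out) {
        if (value < -128 || value > 127)   /* H.O. byte isn't a copy of the sign bit */
            return 0;                      /* overflow: cannot contract */
        *out = (int8_t)value;
        return 1;
    }

    int main(void) {
        int8_t r;
        printf("%d\n", sign_contract_16_to_8(-128, &r));  /* 1: FF80h -> 80h    */
        printf("%d\n", sign_contract_16_to_8(-448, &r));  /* 0: FE40h overflows */
        return 0;
    }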
 
Another set of logical operations which apply to bit strings are the shift and rotate operations.
These two categories can be further broken down into left shifts, left rotates, right shifts, and
right rotates.

The left shift operation moves each bit in a bit string one position to the left. We'll shift the
value zero into the L.O. bit, and the previous value of bit seven will be the carry out of this
operation.

Note that shifting a value to the left is the same thing as multiplying it by its radix. For example,
shifting a decimal number one position to the left (adding a zero to the right of the number)
effectively multiplies it by ten (the radix):

1234 SHL 1 = 12340       (SHL 1 = shift left one position)
 

Since the radix of a binary number is two, shifting it left multiplies it by two.
If you shift a binary value to the left twice,
you multiply it by two twice (i.e., you multiply it by four).
 

If you shift a binary value to the left three times, you multiply it by eight (2*2*2).
In general, if you shift a value to the left n times, you multiply that value by 2**n.
 

A right shift operation works the same way. Bit seven moves into bit six,
bit six moves into bit five, bit five moves into bit four, etc.
During a right shift, we'll move a zero into bit seven, and bit zero will be the carry out of the
operation:
 
 

Since a left shift is equivalent to a multiplication by two, it should come as no surprise that a
right shift is roughly comparable to a division by two
(or, in general, a division by the radix of the number).
If you perform n right shifts, you will divide that number by 2**n.
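
In C, the << and >> operators perform these shifts; a minimal sketch of
shifting as multiplication and (unsigned) division:

    #include <stdio.h>

    int main(void) {
        unsigned v = 5;           /* 0000 0101 */

        printf("%u\n", v << 1);   /* 10: one left shift multiplies by 2   */
        printf("%u\n", v << 3);   /* 40: three left shifts multiply by 8  */
        printf("%u\n", v >> 1);   /*  2: one right shift divides by 2     */
        return 0;
    }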

There is one problem with right shifts with respect to division: as described above, a shift right
is only equivalent to an unsigned division by two.
 

For example, if you shift the unsigned representation of 254 (0FEh) one place to the right, you get 127 (07Fh), exactly what you would expect.
However, if you shift the binary representation of -2 (0FEh) to the right one position,
you get 127 (07Fh), which is not correct.

This problem occurs because we're shifting a zero into bit seven.
If bit seven previously contained a one, we're changing it from a negative to a positive number.
Not a good thing when dividing by two.

To use the shift right as a division operator, we must define a third shift operation:
arithmetic shift right.
An arithmetic shift right works just like the normal shift right operation
(a logical shift right) with one exception: instead of shifting a zero into bit seven,
an arithmetic shift right operation leaves bit seven alone, that is,
during the shift operation it does not modify the value of bit seven.
 
 

This generally produces the result you expect. For example, if you perform the arithmetic shift
right operation on -2 (0FEh) you get -1 (0FFh).
Keep one thing in mind about arithmetic shift right, however.
This operation always rounds the numbers to the closest integer which is less
than or equal to the actual result.

Based on experiences with high level programming languages and the standard rules of integer truncation, most people assume this means that a division always truncates towards zero.
But this simply isn't the case.
 

For example, if you apply the arithmetic shift right operation to -1 (0FFh), the result is -1,
not zero. -1 is less than zero, so the arithmetic shift right operation rounds towards minus one.
This (rounding towards minus infinity) is a perfectly valid way to define integer division.
Note, however, that the 80x86 integer division instruction truncates towards zero,
so it would produce zero, not -1, for this example.
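
A C sketch of the difference (note: on mainstream compilers, >> applied to a
negative signed value is an arithmetic shift, although the C standard has
traditionally left this implementation-defined):

    #include <stdio.h>

    int main(void) {
        int a = -2;
        int b = -1;

        printf("%d\n", a >> 1);   /* -1: arithmetic shift keeps the sign bit    */
        printf("%d\n", b >> 1);   /* -1: rounds toward minus infinity...        */
        printf("%d\n", b / 2);    /*  0: ...while C division truncates toward 0 */
        return 0;
    }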

Another pair of useful operations are rotate left and rotate right.
These operations behave like the shift left and shift right operations with one major difference:
the bit shifted out from one end is shifted back in at the other end.
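
C has no rotate operator, but the usual idiom builds one from two shifts and
an OR; a sketch for eight-bit values (the helper name is illustrative):

    #include <stdint.h>
    #include <stdio.h>

    /* Rotate an 8-bit value left by n positions. */
    uint8_t rol8(uint8_t x, unsigned n) {
        n &= 7;                                            /* keep count in 0..7 */
        return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
    }

    int main(void) {
        printf("%02X\n", rol8(0x81, 1));   /* 03: bit 7 wraps around into bit 0 */
        return 0;
    }

Many modern compilers recognize this pattern and emit a single rotate
instruction (ROL on the 80x86).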
 
Although the 80x86 operates most efficiently on byte, word, and double word data types,
occasionally you'll need to work with a data type that uses some number of bits other than eight,
16, or 32. For example, consider a date of the form "4/2/88".
 

It takes three numeric values to represent this date: a month, day, and year value. Months, of course, take on the values 1..12. It will require at least four bits (maximum of sixteen different values) to represent the month. Days range between 1..31.
 

So it will take five bits (maximum of 32 different values) to represent the day entry. The year value, assuming that we're working with values in the range 0..99, requires seven bits (which can be used to represent up to 128 different values).
Four plus five plus seven is 16 bits, or two bytes.
 

In other words, we can pack our date data into two bytes rather than the three that would be required if we used a separate byte for each of the month, day, and year values.
This saves one byte of memory for each date stored,
which could be a substantial saving if you need to store a lot of dates.

MMMM represents the four bits making up the month value, DDDDD represents the five bits
making up the day, and YYYYYYY is the seven bits comprising the year. Each collection of
bits representing a data item is a bit field. April 2nd, 1988 would be represented as 4158h:

     MMMM  DDDDD  YYYYYYY
     0100  00010  1011000      = 0100 0001 0101 1000b or 4158h
        4      2       88
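
A C sketch of packing and unpacking this layout (the helper name is
illustrative, not from the original text):

    #include <stdint.h>
    #include <stdio.h>

    /* Pack month (4 bits), day (5 bits), year (7 bits) into one word. */
    uint16_t pack_date(unsigned m, unsigned d, unsigned y) {
        return (uint16_t)((m << 12) | (d << 7) | y);
    }

    int main(void) {
        uint16_t date = pack_date(4, 2, 88);
        printf("%04X\n", (unsigned)date);        /* 4158 */

        /* Unpacking costs the extra shift-and-mask instructions: */
        printf("%u/%u/%u\n",
               (unsigned)((date >> 12) & 0x0F),  /* month: 4  */
               (unsigned)((date >> 7)  & 0x1F),  /* day:   2  */
               (unsigned)(date         & 0x7F)); /* year:  88 */
        return 0;
    }

C's bit-field struct syntax can express the same layout, though the exact
bit placement is then left to the compiler.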

Although packed values are space efficient (that is, very efficient in terms of memory usage),
they are computationally inefficient (slow!).
The reason?
 

It takes extra instructions to unpack the data packed into the various bit fields. These extra instructions take additional time to execute (and additional bytes to hold the instructions);
hence, you must carefully consider whether packed data fields will save you anything.
 

Examples of practical packed data types abound. You could pack eight boolean values into a
single byte, you could pack two BCD digits into a byte, etc.
 

The ASCII character set (excluding the extended characters defined by IBM) is divided into four groups of 32 characters.
The first 32 characters, ASCII codes 0 through 1Fh (31), form a special set of non-printing characters called the control characters.
We call them control characters because they perform various printer/display control operations
rather than displaying symbols; there is very little standardization of these operations among output devices.
 

The second group of 32 ASCII character codes comprise various punctuation symbols, special
characters, and the numeric digits. The most notable characters in this group include the space
character (ASCII code 20h) and the numeric digits (ASCII codes 30h..39h). Note that the
numeric digits differ from their numeric values only in the H.O. nibble. By subtracting 30h from
the ASCII code for any particular digit you can obtain the numeric equivalent of that digit.
 

The third group of 32 ASCII characters is reserved for the upper case alphabetic characters.
The ASCII codes for the characters "A".."Z" lie in the range 41h..5Ah (65..90). Since there are
only 26 different alphabetic characters, the remaining six codes hold various special symbols.
 

The fourth, and final, group of 32 ASCII character codes are reserved for the lower case
alphabetic symbols, five additional special symbols, and another control character (delete).
Note that the lower case character symbols use the ASCII codes 61h..7Ah. If you convert the
codes for the upper and lower case characters to binary, you will notice that the upper case
symbols differ from their lower case equivalents in exactly one bit position.
 

Upper case characters always contain a zero in bit five; lower case alphabetic characters always contain a one in bit five.
You can use this fact to quickly convert between upper and lower case. If you have an upper case character you can force it to lower case by setting bit five to one.
If you have a lower case character and you wish to force it to upper case, you can do so by setting bit five to zero. You can toggle an alphabetic character between upper and lower case by simply inverting bit five.
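
A C sketch of these three conversions (bit five has the value 20h):

    #include <stdio.h>

    int main(void) {
        char c = 'A';                   /* 41h: bit five is zero */

        printf("%c\n", c | 0x20);       /* 'a': set bit five    -> lower case */
        printf("%c\n", 'a' & ~0x20);    /* 'A': clear bit five  -> upper case */
        printf("%c\n", c ^ 0x20);       /* 'a': invert bit five -> toggle     */
        return 0;
    }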
 

Indeed, bits five and six determine which of the four groups
in the ASCII character set you're in:
        Bit 6   Bit 5   Group
          0       0     Control Characters
          0       1     Digits & Punctuation
          1       0     Upper Case & Special
          1       1     Lower Case & Special
 
 

So you could, for instance, convert any upper or lower case (or corresponding special) character to its equivalent control character by setting bits five and six to zero.
 

Consider, for a moment, the ASCII codes of the numeric digit characters:

"0" 48 30h
"1" 49 31h
"2" 50 32h
"3" 51 33h
"4" 52 34h
"5" 53 35h
"6" 54 36h
"7" 55 37h
"8" 56 38h
"9" 57 39h
 Char Dec Hex
 

The decimal representations of these ASCII codes are not very enlightening.
However, the hexadecimal representation of these ASCII codes reveals something very important - the L.O. nibble of the ASCII code is the binary equivalent of the represented number.
 

By stripping away  (i.e., setting to zero) the H.O. nibble of a numeric character, you can convert that character code to the corresponding binary representation. Conversely, you can convert a binary value in the range 0..9 to its ASCII character representation by simply setting the H.O. nibble to three. Note that you can use the logical-AND operation to force the H.O. bits to zero; likewise, you can use the logical-OR operation to force the H.O. bits to 0011 (three).
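
A C sketch of both directions (AND to strip the H.O. nibble, OR to restore it):

    #include <stdio.h>

    int main(void) {
        char digit = '5';                    /* ASCII 35h */

        int  value = digit & 0x0F;           /* AND clears the H.O. nibble: 5 */
        char back  = (char)(value | 0x30);   /* OR sets the H.O. nibble to 3  */

        printf("%d %c\n", value, back);      /* prints: 5 5 */
        return 0;
    }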
 

Note that you cannot convert a string of numeric characters to their equivalent binary
representation by simply stripping the H.O. nibble from each digit in the string. Converting 123
(31h 32h 33h) in this fashion yields three bytes: 010203h, not the correct value which is 7Bh.
Converting a string of digits to an integer requires more sophistication than this; the conversion
above works only for single digits.
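
For reference, a sketch of the usual multi-digit technique: multiply the
running total by ten before adding each new digit's value.

    #include <stdio.h>

    int main(void) {
        const char *s = "123";                  /* 31h 32h 33h */
        int value = 0;

        for (; *s >= '0' && *s <= '9'; s++)
            value = value * 10 + (*s & 0x0F);   /* strip the nibble, then scale */

        printf("%d = %Xh\n", value, (unsigned)value);  /* 123 = 7Bh */
        return 0;
    }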
 

Bit seven in standard ASCII is always zero. This means that the ASCII character set consumes
only half of the possible character codes in an eight bit byte. IBM uses the remaining 128
character codes for various special characters including international characters (those with
accents, etc.), math symbols, and line drawing characters. Note that these extra characters are a
non-standard extension to the ASCII character set. Of course, the name IBM has considerable
clout, so almost all modern personal computers based on the 80x86 with a video display
support the extended IBM/ASCII character set. Most printers support IBM's character set as
well.
 

Should you need to exchange data with other machines which are not PC-compatible, you have
only two alternatives: stick to standard ASCII or ensure that the target machine supports the
extended IBM-PC character set.
 

Some machines, like the Apple Macintosh, do not provide native support for the extended IBM-PC character set; however, you may obtain a PC font which lets you display the extended character set. Other machines (e.g., Amiga and Atari ST) have similar capabilities. However, the 128 characters in the standard ASCII character set are the only ones you should count on transferring from system to system.
 

Despite the fact that it is a "standard", simply encoding your data using standard ASCII
characters does not guarantee compatibility across systems. While it's true that an "A" on one
machine is most likely an "A" on another machine, there is very little standardization across
machines with respect to the use of the control characters. Indeed, of the 32 control codes plus
delete, there are only four control codes commonly supported - backspace (BS), tab, carriage
return (CR), and line feed (LF).
 

Worse still, different machines often use these control codes in different ways. End of line is a particularly troublesome example. MS-DOS, CP/M, and other systems mark end of line by the two-character sequence CR/LF. Apple Macintosh, Apple II, and many other systems mark the end of line by a single CR character.
 

UNIX systems mark the end of a line with a single LF character. Needless to say, attempting to exchange simple text files between such systems can be an experience in frustration. Even if you use standard ASCII characters in all your files on these systems, you will still need to convert the data when exchanging files between them. Fortunately, such conversions are rather simple.
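
As an example of such a conversion, here is a minimal C filter (a sketch,
assuming CR/LF pairs and bare CRs should both become a single LF):

    #include <stdio.h>

    int main(void) {
        int c, prev = 0;
        while ((c = getchar()) != EOF) {
            if (c == '\r')
                putchar('\n');               /* CR marks an end of line     */
            else if (!(c == '\n' && prev == '\r'))
                putchar(c);                  /* drop only the LF of a CR/LF */
            prev = c;
        }
        return 0;
    }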
 

Despite some major shortcomings, ASCII data is the standard for data interchange across
computer systems and programs. Most programs can accept ASCII data; likewise most programs can produce ASCII data.
 

Copyright 1996 by Randall Hyde