395 lines
21 KiB
Markdown
395 lines
21 KiB
Markdown
<p>
|
|
This file will cover in more detail some of the concepts needed to understand
|
|
C++. The goal here is <em>not</em> to make you be able to sit down and write
|
|
your own program from scratch.
|
|
</p>
|
|
|
|
|
|
<p> </p>
|
|
<h2>Objects and Classes</h2>
|
|
<p>
|
|
An <strong>object</strong> in C++ is a single, conceptual unit that contains
|
|
<strong>data</strong> (information about the state of that object) and
|
|
<strong>methods</strong> (functions associated with that class of objects) by
|
|
which the object is modified or can interact with other objects. The data in
|
|
an object can be either normal variables (<em>e.g.</em> characters, floating
|
|
point numbers, or integers) or previously-defined objects. A category of
|
|
objects is a <strong>class</strong>; an object is a single instance of a class.
|
|
</p>
|
|
<p>
|
|
For example, in Avida one of the most important classes is called <code>cOrganism</code>
|
|
-- it is the class that all organism objects belong to. Here is an abbreviated
|
|
version of the cOrganism class declaration (explained further below).
|
|
|
|
<pre>
|
|
class <span class="class">cOrganism</span>
|
|
{
|
|
private: <span class="comment">// Data in this class cannot be directly accessed from outside.</span>
|
|
const <span class="class">Genome</span> <span class="object">m_genome</span>; <span class="comment">// The initial genome that this organism was born with.</span>
|
|
<span class="class">cPhenotype</span> <span class="object">m_phenotype</span>; <span class="comment">// Maintains the status of this organism's phenotypic traits.</span>
|
|
|
|
<span class="class">cOrgInterface</span>* <span class="object">m_interface</span>; <span class="comment">// Interface back to the population.</span>
|
|
|
|
<span class="class">cGenotype</span>* <span class="object">m_genotype;</span> <span class="comment">// A pointer to the genotype that this organism belongs to.</span>
|
|
<span class="class">cHardwareBase</span>* <span class="object">m_hardware;</span> <span class="comment">// The virtual machinery that this organism's genome is run on.</span>
|
|
|
|
public: <span class="comment"> // The methods are accessible to other classes.</span>
|
|
<span class="method">cOrganism</span>(<span class="class">cWorld</span>* <span class="object">world</span>, <span class="class">cAvidaContext</span>& <span class="object">ctx</span>, const <span class="class">Genome</span>& <span class="object">in_genome</span>);
|
|
<span class="method">~cOrganism</span>();
|
|
|
|
<span class="comment">// This batch of methods involve interaction with the population to resolve.</span>
|
|
<span class="class">cOrganism</span>* <span class="method">GetNeighbor</span>() { return <span class="object">m_interface</span>-><span class="method">GetNeighbor</span>(); }
|
|
int <span class="method">GetNeighborhoodSize</span>() { return <span class="object">m_interface</span>-><span class="method">GetNumNeighbors</span>(); }
|
|
void <span class="method">Rotate</span>(int direction) { <span class="object">m_interface</span>-><span class="method">Rotate</span>(direction); }
|
|
int <span class="method">GetInput</span>() { return <span class="object">m_interface</span>-><span class="method">GetInput</span>(); }
|
|
void <span class="method">Die</span>() { <span class="object">m_interface</span>-><span class="method">Die</span>(); }
|
|
|
|
<span class="comment">// Accessors -- these are used to gain access to private data.</span>
|
|
const <span class="class">Genome</span>& <span class="method">GetGenome()</span> const { return <span class="object">m_genome</span>; }
|
|
<span class="class">cHardwareBase</span>& <span class="method">GetHardware()</span> { return *<span class="object">m_hardware</span>; }
|
|
<span class="class">cPhenotype</span>& <span class="method">GetPhenotype()</span> { return <span class="object">m_phenotype</span>; }
|
|
<span class="class">cGenotype</span>* <span class="method">GetGenotype()</span> { return <span class="object">m_genotype</span>; }
|
|
};
|
|
</pre>
|
|
|
|
|
|
<p> </p>
|
|
<h2>Style and Syntax Guide</h2>
|
|
|
|
<p>
|
|
Don't worry too much about how the syntax works. The code presented above
|
|
is a definition of a class in C++. It is broken into two parts; one labeled
|
|
<code>private:</code> for those portions of the definition that can only
|
|
be interacted with from within the class, and another labeled <code>public:</code>
|
|
which defines the interface to the outside. In this case, we've made all
|
|
of the variables private and the methods public.
|
|
</p>
|
|
<p>
|
|
A variable is defined by a description of the type of variable (such
|
|
a <code>cPhenotype</code>) and then the name of this particular instance
|
|
of the variable. In this case, since organisms only have one phenotype,
|
|
we called it merely <code>m_phenotype</code>. Not that because this
|
|
variable is a <em>member</em> of <code>cOrganism</code> instances, it is
|
|
prefixed with '<code>m_</code>'.
|
|
</p>
|
|
|
|
<p>
|
|
Methods are slightly more complex. The declaration of a method starts with the
|
|
type of data the method returns (such as <code>int</code> for integer), or else
|
|
lists <code>void</code> if there is no return value. Then the method name is
|
|
given, followed by a set of parenthesis (which are what indicates to C++ that
|
|
you are declaring a method). Inside of those parentesis, can be
|
|
<strong>arguments</strong>, which are variables that must be given to the
|
|
method in order for it to operate correctly. The declaration can stop at this
|
|
point (ending in a semi-colon) if the method <strong>body</strong> is defined
|
|
elsewhere. The body of the method is the sourcecode that details how the method
|
|
operates, and can be included immediately after the declaration (within braces)
|
|
or be placed elsewhere in the code. Typically short method bodies are included
|
|
in the class definition, while longer ones are placed outside of it. A method is
|
|
performed on an object, by listing the object name, followed by a dot ('.'), and
|
|
then the name of the method to be called with all necessary arguments included.
|
|
This is explained further below.
|
|
</p>
|
|
<p>
|
|
The C++ language will accept variable names, class names, and method names of
|
|
any alpha-numeric sequence as long as all names begin with a letter. The only
|
|
other character allowed in a name is the underscore ('_'). To make reading code
|
|
easier, we have adopted certain conventions.
|
|
</p>
|
|
|
|
<dl>
|
|
<dt><span class="object">an_example_variable</span></dt>
|
|
<dd>
|
|
Variable names (including object names) are always all in lowercase
|
|
letters, with individual words separated by underscores. Variables are
|
|
either user-defined classes, numbers (integers, boolean values, floating
|
|
point numbers, etc.) or characters (single symbols)</dd>
|
|
</dd>
|
|
<dt><span class="method">ExampleMethod</span></dt>
|
|
<dd>
|
|
Method names always have the first letter of each word capitalized, with the
|
|
remainder of the word in lowercase. The one exception to this is Constructors
|
|
and Destructors, which must have the same name as the class (see below).
|
|
</dd>
|
|
<dt><span class="class">cExampleClass</span></dt>
|
|
<dd>
|
|
Classes use a similar format to methods, but always begin with a single,
|
|
lowercase 'c'. Some other specialized types also used this format, but with a
|
|
different initial letter. For example, an initial 't' indicates a template,
|
|
which is a special type of class.
|
|
</dd>
|
|
<dt>CONSTANT_VALUE</dt>
|
|
<dd>
|
|
Any constant values (that is, numerical values that will never change
|
|
during the course of the run) are given in all upper-case letters, with
|
|
individual words separated by underscores.
|
|
</dd>
|
|
</dl>
|
|
|
|
<p>
|
|
Different software projects will each use their own style conventions; these
|
|
are the ones you'll end up working with in Avida. Some exceptions do exist.
|
|
For example, the C++ language itself does not follow many style rules;
|
|
built-in C++ names are all lowercase letters, regardless of what they
|
|
represent. For more details, including spacing and other code formatting
|
|
standards you must follow in Avida, see the
|
|
<a href="code_standards.html">Coding Standards</a>.
|
|
|
|
|
|
<p> </p>
|
|
<h2>Description of Data Elements</h2>
|
|
|
|
<p>
|
|
The section labeled <code>private</code> above lists those data that are unique
|
|
to each organism; these are objects and pointers that exist <em>inside</em> of
|
|
an organism object. First, <span class="object">m_genome</span> keeps
|
|
the initial state of the organism. Since we never want this genome to change
|
|
over the organism's life, we place a <code>const</code> directive in front of
|
|
it. The <code>const</code> command exists so that C++ knows to warn the
|
|
programmer if they accidentally try to change an object (or variable) that
|
|
is not supposed to be altered.
|
|
</p>
|
|
<p>
|
|
The internal <span class="object">m_phenotype</span> object is used to
|
|
record the behaviors and abilities that the organism demonstrates during its
|
|
life. This class has variables to track everything from the tasks performed to
|
|
the gestation time of the organism and the number of offspring it has ever
|
|
produced. The <span class="object">m_interface</span> allows an
|
|
organism to communicate with the environment (either the
|
|
<span class="class">cPopulation</span> or the
|
|
<span class="class">cTestCPU</span>) that it is part of. This is used,
|
|
for example, when an organism is ready to finish replicating and needs its
|
|
offspring to be placed into the population. If an organism is being run on a
|
|
test CPU rather than in a proper population object, then this interface will
|
|
cause the statistics about offspring to be recorded for later use instead of
|
|
activating it.
|
|
</p>
|
|
<p>
|
|
Next, we have two <strong>pointers</strong>. A pointer is a value that
|
|
represents ("points to") a location in the physical memory of the computer. A
|
|
pointer can be identified by the asterisk ('*') that follows the type name. The
|
|
code "<code>cGenotype* genotype</code>" indicates that the variable
|
|
<span class="object">genotype</span> points to a location in memory
|
|
where an object of class <span class="class">cGenotype</span> is
|
|
stored. In this case, all of the organisms that are of a single genotype all
|
|
point to the <em>same</em> cGenotype object so that the genotypic information
|
|
is accessible to all organisms that may need to make use of it.
|
|
</p>
|
|
<p>
|
|
The final data element is <span class="object">m_hardware</span>, a
|
|
pointer to an object of type <span class="class">cHardwareBase</span>.
|
|
This variable is a pointer for a different reason than the genotype. Where a
|
|
single genotype is shared by many different organisms, each organism does
|
|
possess its own hardware. However, Avida supports more than one type of
|
|
hardware, where any of them can be at the other end of that hardware pointer.
|
|
The cHardwareBase class is used as an interface to the actual hardware that is
|
|
used. This is explained in more detail later in the section on inherited
|
|
classes. For the moment, the key idea is that a pointer can sometimes point to
|
|
a general type of object, not just those of a very specific class.
|
|
</p>
|
|
|
|
|
|
<p> </p>
|
|
<h2>Description of Methods</h2>
|
|
|
|
<p>
|
|
Class descriptions (with limited exceptions) must contain two specific
|
|
methods called the <strong>constructor</strong> and the
|
|
<strong>destructor</strong>. The constructor always has the same name as the
|
|
class (it's called <span class="method">cOrganism</span>(...) in this
|
|
case), and is executed in order to create a new object of that class. The
|
|
arguments for the constructor must include all of the information required to
|
|
build on object of the desired class. For an organism, we need the world
|
|
object within which the organism resides, the current execution context, and
|
|
perhaps most importantly the genome of the organism. The method is not defined
|
|
here, only <strong>declared</strong>. A declared method must be defined
|
|
elsewhere in the program. All methods must be, at least, declared in the class
|
|
definition. Note that if there are many ways to create an object, multiple
|
|
constructors are allowed as long as they take different inputs.
|
|
</p>
|
|
<p>
|
|
Whereas the constructor is called when an object is created, the destructor
|
|
is called when the object is destroyed, whereupon it must do any cleanup,
|
|
such as freeing allocated memory (see the section on memory management
|
|
below). The name of a destructor is always the same as the class name,
|
|
but with a tilde ('~') in front of it. Thus, the cOrganism's destructor
|
|
is called <span class="method">~cOrganism</span>(). A destructor can
|
|
never take any arguments, and there must be only one of them in a class
|
|
definition.
|
|
</p>
|
|
<p>
|
|
The next group of five methods are all called when an organism needs
|
|
to perform some behavior, which in all of these cases involves it interacting
|
|
with the population. For example, if you need to know at whom an organism
|
|
is facing, you can call the method
|
|
<span class="method">GetNeighbor</span>()
|
|
on it, and a pointer to the neighbor currently faced will be returned.
|
|
Likewise, if you need to kill an organism, you can call the method
|
|
<span class="method">Die</span>() on it, and it will be terminated.
|
|
Since each of these require interaction on the population level, the population
|
|
itself takes care of the bulk of the functionality.
|
|
</p>
|
|
<p>
|
|
The 'Accessors' are methods that provide access to otherwise private
|
|
data. For example, the method <span class="method">GetGenome</span>()
|
|
will literally pass the genome of the organism to the object that calls it. In
|
|
particular, the hardware object associated with an organism will often call
|
|
<span class="method">GetPhenotype</span>() in order to get the current
|
|
state of the organism's phenotype and update it with something new the organism
|
|
has done. Several things to take note of here. In the first three accessors,
|
|
the name of the class being returned is followed by an ampersand ('&').
|
|
This means that the actual object is being passed back, and not just a copy of
|
|
all the values contained in it. See the next section on pointers, references,
|
|
and values for more information about this. Also, in the very first accessor,
|
|
the keyword <code>const</code> is used twice. The first time is to say that
|
|
the object being passed out of the method is constant (that is, the programmer
|
|
should be warned if somewhere else in the code it is being changed) and the
|
|
second time is to say that the actions of this method will never change
|
|
anything about the object they are being run on (that is, the object is being
|
|
left constant even when the method is run.) The net effect of this is that an
|
|
object marked const can only have const methods run on it. The compiler will
|
|
assume that a non-const method being run will make a change to the object, and
|
|
is therefore an error.
|
|
</p>
|
|
<p>
|
|
This section has contained information about a particular C++ class found
|
|
in Avida. The next sections will more generally explain some of the
|
|
principles of the language. If you haven't already, now might be a good
|
|
time to take a deep breath before you dive back in.
|
|
</p>
|
|
|
|
|
|
<p> </p>
|
|
<h2>Pointers, References, and Values</h2>
|
|
|
|
<p>
|
|
The three ways of passing information around in a
|
|
C++ program is through sending a pointer to the location of that information,
|
|
sending a reference to it, or actually just sending the value of the information.
|
|
For the moment, lets consider the return value of a method. Consider
|
|
the three methods below:
|
|
</p>
|
|
<pre>
|
|
<span class="class">Genome</span> <span class="method">GetGenomeValue</span>();
|
|
<span class="class">Genome</span>* <span class="method">GetGenomePointer</span>();
|
|
<span class="class">Genome</span>& <span class="method">GetGenomeReference</span>();
|
|
</pre>
|
|
<p>
|
|
These three cases are all very different. In the first case
|
|
(<strong>Pass-by-Value</strong>), the value of the genome in question is
|
|
returned. That means that the genome being returned is analyzed, and the exact
|
|
sequence of instruction in it are sent to the object calling this method. Once
|
|
the requesting object gets this information, however, any changes made to it do
|
|
<em>not</em> affect the original genome that was copied. The second case
|
|
(<strong>Pass-by-Pointer</strong>),only a few bytes of information are returned
|
|
that give the location in memory of this genome. The requesting object can then
|
|
go and modify that memory if it chooses to, but it must first 'resolve' the
|
|
pointer to do so. Finally, the last case (<strong>Pass-by-Reference</strong>)
|
|
actually passes the whole object out. It is used in a very similar way to
|
|
pass-by-value, but any changes made to the genome after it is passed out will
|
|
affect the genome in the actual organism itself! Pass-by-reference does not
|
|
add any new functionality over pass-by-pointer, but in practice it is often
|
|
easier to use.
|
|
</p>
|
|
|
|
|
|
<p> </p>
|
|
<h2>Memory Management</h2>
|
|
|
|
<p>
|
|
Memory management in C++ can be as simple or complicated as the programmer
|
|
wants it to be. If you never explicitly allocate a chunk of memory, than
|
|
you never need to worry about freeing it up when you're done using it.
|
|
However, there are many occasions where a program can be made faster or
|
|
more flexible by dynamically allocating objects. The command
|
|
<strong><code>new</code></strong> is used to allocate memory; for example if
|
|
you wanted to allocate memory for a new genome containing 100 instructions,
|
|
you could type:
|
|
</p>
|
|
<pre>
|
|
<span class="class">Genome</span>* <span class="object">created_genome</span> = new <span class="method">Genome</span>(100);
|
|
</pre>
|
|
<p>
|
|
The variable created_genome is defined as a pointer to a memory location
|
|
containing a genome. This is assigned the location of the newly allocated
|
|
genome in memory, which is the return value of the <code>new</code> command.
|
|
The Genome constructor (called with new) takes as an argument a single number
|
|
that is the sequence length of the genome.
|
|
</p>
|
|
<p>
|
|
Unfortunately, C++ won't know when we're done using this genome. If we
|
|
need to create many different genomes and we only use each of them once,
|
|
our memory can rapidly fill up. So, we need tell the memory management
|
|
that we are finished with the current object. Thus, when we're done using
|
|
it, we type:
|
|
</p>
|
|
<pre>
|
|
delete <span class="object">created_genome</span>;
|
|
</pre>
|
|
<p>
|
|
And the memory pointed to by the created_genome
|
|
variable will be freed up.
|
|
</p>
|
|
<p>
|
|
An excellent example of when allocation and freeing of memory is employed
|
|
in Avida is with the genotype. Every time a new genotype is created during
|
|
evolution, Avida needs to allocate a new object of class cGenotype. During
|
|
the course of a run, millions of genotypes may be created, so we need to be
|
|
sure to free genotypes whenever they will no longer be needed in the run.
|
|
</p>
|
|
|
|
|
|
<p> </p>
|
|
<h2>Inherited Classes</h2>
|
|
|
|
<p>
|
|
One of the beauties of C++ is that well written code
|
|
is inherently very reusable. As part of that, there is the concept
|
|
of the <strong>class inheritance</strong>. When a new class is built in C++,
|
|
it is possible to build it off of an existing class, then referred to as
|
|
a <strong>base class</strong>. The new <strong>derived</strong>
|
|
class will have access to all of the methods in the base class, and can
|
|
<strong>overload</strong> them; that is, it can change how any of those
|
|
methods operate.
|
|
</p>
|
|
<p>
|
|
For example, in the Avida scheduler, we use a class called
|
|
<span class="class">cSchedule</span> to determine which organism's
|
|
virtual CPU executes the next instruction. Well, this cSchedule object is not
|
|
all that clever. In fact, all that it does is run through the list of
|
|
organisms that need to go, lets them each execute a single instruction, and
|
|
then does it all over again. But sometimes we need to make some organisms
|
|
execute instructions at a faster rate than others. For that reason, there are
|
|
several derived classes, including
|
|
<span class="class">cIntegratedSchedule</span>, which takes in a merit
|
|
value for each organism, and assigns CPU cycles proportional to that merit.
|
|
Since this new class uses cSchedule as its base class, it can be dynamically
|
|
plugged in during run time, after looking at what the user chooses in the
|
|
configuration file.
|
|
</p>
|
|
<p>
|
|
If a method is not implemented in a base class (left to be implemented in the
|
|
derived classes) it is called an <strong>abstract</strong> method. If a base
|
|
class does not implement <em>any</em> of its methods and is only used in order
|
|
to specify what methods need to be included in the derived classes, it is
|
|
referred to as an <strong>abstract base class</strong> (or sometimes an
|
|
<strong>interface class</strong> or <strong>protocol</strong>) and is used
|
|
simply as a method contract for derived classes. An example in Avida where this
|
|
is used is the organism interface with the environemnt. The class
|
|
<span class="class">cOrgInterface</span> is an abstract base class,
|
|
with <span class="class">cPopulationInterface</span> and
|
|
<span class="class">cTestCPUInterface</span> as derived classes. This
|
|
organization allows for organism objects to interact with both the population
|
|
and the test environment, without having to write separate code for each.
|
|
</p>
|
|
|
|
|
|
<p> </p>
|
|
<h2>Other C++ Resources:</h2>
|
|
|
|
<ul>
|
|
<li><a href="http://www.intap.net/~drw/cpp/">Online C++ Tutorial</a></li>
|
|
<li><a href="http://www.cplusplus.com/doc/tutorial/">Another Tutorial</a></li>
|
|
<li><a href="http://www.quiver.freeserve.co.uk/OOP1.htm">Object Oriented Concepts Tutorial</a></li>
|
|
<li><a href="http://www.research.att.com/~bs/glossary.html">A Glossary of C++ Terms</a></li>
|
|
</ul> |