<p>
This file covers in more detail some of the concepts needed to understand
C++. The goal here is <em>not</em> to enable you to sit down and write
your own program from scratch.
</p>
<p>&nbsp;</p>
<h2>Objects and Classes</h2>
<p>
An <strong>object</strong> in C++ is a single, conceptual unit that contains
<strong>data</strong> (information about the state of that object) and
<strong>methods</strong> (functions associated with that class of objects) by
which the object is modified or can interact with other objects. The data in
an object can be either normal variables (<em>e.g.</em> characters, floating
point numbers, or integers) or previously-defined objects. A category of
objects is a <strong>class</strong>; an object is a single instance of a class.
</p>
<p>
For example, in Avida one of the most important classes is called <code>cOrganism</code>
-- it is the class that all organism objects belong to. Here is an abbreviated
version of the cOrganism class declaration (explained further below).
</p>
<pre>
class <span class="class">cOrganism</span>
{
private:  <span class="comment">// Data in this class cannot be directly accessed from outside.</span>
  const <span class="class">Genome</span> <span class="object">m_genome</span>;        <span class="comment">// The initial genome that this organism was born with.</span>
  <span class="class">cPhenotype</span> <span class="object">m_phenotype</span>;     <span class="comment">// Maintains the status of this organism's phenotypic traits.</span>
  <span class="class">cOrgInterface</span>* <span class="object">m_interface</span>; <span class="comment">// Interface back to the population.</span>
  <span class="class">cGenotype</span>* <span class="object">m_genotype</span>;      <span class="comment">// A pointer to the genotype that this organism belongs to.</span>
  <span class="class">cHardwareBase</span>* <span class="object">m_hardware</span>;  <span class="comment">// The virtual machinery that this organism's genome is run on.</span>

public:  <span class="comment">// The methods are accessible to other classes.</span>
  <span class="method">cOrganism</span>(<span class="class">cWorld</span>* <span class="object">world</span>, <span class="class">cAvidaContext</span>&amp; <span class="object">ctx</span>, const <span class="class">Genome</span>&amp; <span class="object">in_genome</span>);
  <span class="method">~cOrganism</span>();

  <span class="comment">// This batch of methods involve interaction with the population to resolve.</span>
  <span class="class">cOrganism</span>* <span class="method">GetNeighbor</span>() { return <span class="object">m_interface</span>-&gt;<span class="method">GetNeighbor</span>(); }
  int <span class="method">GetNeighborhoodSize</span>() { return <span class="object">m_interface</span>-&gt;<span class="method">GetNumNeighbors</span>(); }
  void <span class="method">Rotate</span>(int direction) { <span class="object">m_interface</span>-&gt;<span class="method">Rotate</span>(direction); }
  int <span class="method">GetInput</span>() { return <span class="object">m_interface</span>-&gt;<span class="method">GetInput</span>(); }
  void <span class="method">Die</span>() { <span class="object">m_interface</span>-&gt;<span class="method">Die</span>(); }

  <span class="comment">// Accessors -- these are used to gain access to private data.</span>
  const <span class="class">Genome</span>&amp; <span class="method">GetGenome</span>() const { return <span class="object">m_genome</span>; }
  <span class="class">cHardwareBase</span>&amp; <span class="method">GetHardware</span>() { return *<span class="object">m_hardware</span>; }
  <span class="class">cPhenotype</span>&amp; <span class="method">GetPhenotype</span>() { return <span class="object">m_phenotype</span>; }
  <span class="class">cGenotype</span>* <span class="method">GetGenotype</span>() { return <span class="object">m_genotype</span>; }
};
</pre>
<p>&nbsp;</p>
<h2>Style and Syntax Guide</h2>
<p>
Don't worry too much about how the syntax works. The code presented above
is a definition of a class in C++. It is broken into two parts: one labeled
<code>private:</code> for those portions of the definition that can only
be accessed from within the class, and another labeled <code>public:</code>
that defines the interface to the outside. In this case, we've made all
of the variables private and the methods public.
</p>
<p>
A variable is defined by a description of the type of variable (such
as <code>cPhenotype</code>) and then the name of this particular instance
of the variable. In this case, since organisms only have one phenotype,
we called it merely <code>m_phenotype</code>. Note that because this
variable is a <em>member</em> of <code>cOrganism</code> instances, it is
prefixed with '<code>m_</code>'.
</p>
<p>
Methods are slightly more complex. The declaration of a method starts with the
type of data the method returns (such as <code>int</code> for integer), or else
lists <code>void</code> if there is no return value. Then the method name is
given, followed by a set of parentheses (which indicate to C++ that
you are declaring a method). Inside those parentheses can be
<strong>arguments</strong>, which are variables that must be given to the
method in order for it to operate correctly. The declaration can stop at this
point (ending in a semicolon) if the method <strong>body</strong> is defined
elsewhere. The body of the method is the source code that details how the method
operates, and can be included immediately after the declaration (within braces)
or be placed elsewhere in the code. Typically, short method bodies are included
in the class definition, while longer ones are placed outside of it. A method is
performed on an object by listing the object name, followed by a dot ('.'), and
then the name of the method to be called with all necessary arguments included.
This is explained further below.
</p>
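<p>
As a rough illustration of the paragraph above, the sketch below shows one
method defined inside the class definition, one defined outside of it, and
both being called with the dot syntax. The class <code>cCounter</code> and the
function <code>RunCounterExample</code> are hypothetical names made up for this
example; they are not part of Avida.
</p>

```cpp
// Hypothetical class for illustration only; it is not part of Avida.
class cCounter
{
private:
  int m_count;  // Member data, hidden from code outside the class.

public:
  cCounter() : m_count(0) { }  // Constructor (described later in this file).

  // Short body, so it is defined right here in the class definition.
  int GetCount() const { return m_count; }

  // Declared only; the body is defined below, outside the class.
  void Add(int amount);
};

// Out-of-class definition: 'cCounter::' ties this body to the class.
void cCounter::Add(int amount)
{
  m_count += amount;
}

// Calls methods on an object with the dot syntax and returns the final count.
int RunCounterExample()
{
  cCounter counter;  // Create an object of class cCounter.
  counter.Add(2);    // Object name, dot, method name, arguments.
  counter.Add(3);
  return counter.GetCount();
}
```

<p>
Calling <code>RunCounterExample()</code> from a <code>main</code> function
returns the sum of the two amounts that were added.
</p>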
<p>
The C++ language accepts variable names, class names, and method names made up
of letters, digits, and underscores ('_'), as long as the name does not begin
with a digit. To make reading code
easier, we have adopted certain conventions.
</p>
<dl>
<dt><span class="object">an_example_variable</span></dt>
<dd>
Variable names (including object names) are always all in lowercase
letters, with individual words separated by underscores. Variables are
either user-defined classes, numbers (integers, boolean values, floating
point numbers, etc.) or characters (single symbols).
</dd>
<dt><span class="method">ExampleMethod</span></dt>
<dd>
Method names always have the first letter of each word capitalized, with the
remainder of the word in lowercase. The one exception to this is Constructors
and Destructors, which must have the same name as the class (see below).
</dd>
<dt><span class="class">cExampleClass</span></dt>
<dd>
Classes use a similar format to methods, but always begin with a single,
lowercase 'c'. Some other specialized types also use this format, but with a
different initial letter. For example, an initial 't' indicates a template,
which is a special type of class.
</dd>
<dt>CONSTANT_VALUE</dt>
<dd>
Any constant values (that is, numerical values that will never change
during the course of the run) are given in all upper-case letters, with
individual words separated by underscores.
</dd>
</dl>
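<p>
To see the conventions side by side, here is a small sketch that uses each of
them once. Every name in it is hypothetical and chosen only to show the naming
style; none of them come from the Avida source.
</p>

```cpp
// All names below are hypothetical and only illustrate the conventions.
const int MAX_GENOME_LENGTH = 100;  // Constant: all upper-case with underscores.

class cExampleClass                 // Class: begins with a lowercase 'c'.
{
private:
  int m_update_count;               // Member variable: lowercase, 'm_' prefix.

public:
  cExampleClass() : m_update_count(0) { }  // Constructor: same name as the class.

  void ProcessUpdate() { m_update_count++; }  // Method: each word capitalized.
  int GetUpdateCount() const { return m_update_count; }
};

// Runs ProcessUpdate() the requested number of times and reports the count.
int CountUpdates(int num_updates)
{
  cExampleClass example_object;     // Variable: lowercase with underscores.
  for (int i = 0; i < num_updates; i++) {
    example_object.ProcessUpdate();
  }
  return example_object.GetUpdateCount();
}
```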
<p>
Different software projects will each use their own style conventions; these
are the ones you'll end up working with in Avida. Some exceptions do exist.
For example, the C++ language itself does not follow these conventions;
built-in C++ names are all lowercase letters, regardless of what they
represent. For more details, including spacing and other code formatting
standards you must follow in Avida, see the
<a href="code_standards.html">Coding Standards</a>.
</p>
<p>&nbsp;</p>
<h2>Description of Data Elements</h2>
<p>
The section labeled <code>private</code> above lists those data that are unique
to each organism; these are objects and pointers that exist <em>inside</em> of
an organism object. First, <span class="object">m_genome</span> keeps
the initial state of the organism. Since we never want this genome to change
over the organism's life, we place the <code>const</code> keyword in front of
it. The <code>const</code> keyword exists so that the compiler reports an
error if the programmer accidentally tries to change an object (or variable) that
is not supposed to be altered.
</p>
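<p>
A minimal sketch of a <code>const</code> data member follows. The class
<code>cFixedGenomeOrganism</code> is a hypothetical stand-in, and a
<code>std::string</code> is used in place of a real genome purely to keep the
example short.
</p>

```cpp
#include <string>

// Hypothetical stand-in for an organism; not the real cOrganism.
class cFixedGenomeOrganism
{
private:
  const std::string m_genome;  // Set once at construction; cannot change later.
  int m_age;                   // Not const, so it may change over time.

public:
  cFixedGenomeOrganism(const std::string& in_genome)
    : m_genome(in_genome), m_age(0) { }

  void GrowOlder() { m_age++; }            // Fine: m_age is not const.
  // void Mutate() { m_genome = "xyz"; }   // Would not compile: m_genome is const.

  const std::string& GetGenome() const { return m_genome; }
  int GetAge() const { return m_age; }
};
```

<p>
Removing the comment marks from the <code>Mutate</code> line should make the
compiler reject the file, which is exactly the protection that
<code>const</code> is meant to provide.
</p>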
<p>
The internal <span class="object">m_phenotype</span> object is used to
record the behaviors and abilities that the organism demonstrates during its
life. This class has variables to track everything from the tasks performed to
the gestation time of the organism and the number of offspring it has ever
produced. The <span class="object">m_interface</span> allows an
organism to communicate with the environment (either the
<span class="class">cPopulation</span> or the
<span class="class">cTestCPU</span>) that it is part of. This is used,
for example, when an organism is ready to finish replicating and needs its
offspring to be placed into the population. If an organism is being run on a
test CPU rather than in a proper population object, then this interface will
cause the statistics about offspring to be recorded for later use instead of
activating the offspring.
</p>
<p>
Next, we have two <strong>pointers</strong>. A pointer is a value that
represents ("points to") a location in the memory of the computer. A
pointer can be identified by the asterisk ('*') that follows the type name. The
code &quot;<code>cGenotype* m_genotype</code>&quot; indicates that the variable
<span class="object">m_genotype</span> points to a location in memory
where an object of class <span class="class">cGenotype</span> is
stored. In this case, all of the organisms that are of a single genotype
point to the <em>same</em> cGenotype object so that the genotypic information
is accessible to all organisms that may need to make use of it.
</p>
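<p>
The idea of several organisms pointing to one shared genotype can be sketched
as follows. The classes <code>cDemoGenotype</code> and
<code>cDemoOrganism</code> are hypothetical and far simpler than the real
cGenotype and cOrganism; only the use of the pointer matters here.
</p>

```cpp
// Hypothetical, simplified classes; the real cGenotype and cOrganism differ.
class cDemoGenotype
{
private:
  int m_num_organisms;  // How many organisms share this genotype.

public:
  cDemoGenotype() : m_num_organisms(0) { }
  void AddOrganism() { m_num_organisms++; }
  int GetNumOrganisms() const { return m_num_organisms; }
};

class cDemoOrganism
{
private:
  cDemoGenotype* m_genotype;  // Stores a memory location, not a copy.

public:
  cDemoOrganism(cDemoGenotype* in_genotype) : m_genotype(in_genotype)
  {
    m_genotype->AddOrganism();  // '->' follows the pointer to the object.
  }
  cDemoGenotype* GetGenotype() { return m_genotype; }
};

// Creates two organisms of one genotype and asks the first for the shared count.
int CountSharedGenotype()
{
  cDemoGenotype genotype;
  cDemoOrganism first(&genotype);   // '&' takes the address of genotype.
  cDemoOrganism second(&genotype);  // Both organisms point to the same object.
  return first.GetGenotype()->GetNumOrganisms();
}
```

<p>
Because both organisms hold the same address, the count seen through
<code>first</code> includes the organism registered by <code>second</code>.
</p>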
<p>
The final data element is <span class="object">m_hardware</span>, a
pointer to an object of type <span class="class">cHardwareBase</span>.
This variable is a pointer for a different reason than the genotype. Where a
single genotype is shared by many different organisms, each organism does
possess its own hardware. However, Avida supports more than one type of
hardware, where any of them can be at the other end of that hardware pointer.
The cHardwareBase class is used as an interface to the actual hardware that is
used. This is explained in more detail later in the section on inherited
classes. For the moment, the key idea is that a pointer can sometimes point to
a general type of object, not just those of a very specific class.
</p>
<p>&nbsp;</p>
<h2>Description of Methods</h2>
<p>
Class descriptions (with limited exceptions) must contain two specific
methods called the <strong>constructor</strong> and the
<strong>destructor</strong>. The constructor always has the same name as the
class (it's called <span class="method">cOrganism</span>(...) in this
case), and is executed in order to create a new object of that class. The
arguments for the constructor must include all of the information required to
build an object of the desired class. For an organism, we need the world
object within which the organism resides, the current execution context, and
perhaps most importantly the genome of the organism. The method is not defined
here, only <strong>declared</strong>. A declared method must be defined
elsewhere in the program. All methods must be, at least, declared in the class
definition. Note that if there are many ways to create an object, multiple
constructors are allowed as long as they take different inputs.
</p>
<p>
Whereas the constructor is called when an object is created, the destructor
is called when the object is destroyed, whereupon it must do any cleanup,
such as freeing allocated memory (see the section on memory management
below). The name of a destructor is always the same as the class name,
but with a tilde ('~') in front of it. Thus, the cOrganism's destructor
is called <span class="method">~cOrganism</span>(). A destructor can
never take any arguments, and there must be only one of them in a class
definition.
</p>
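<p>
The sketch below illustrates when constructors and destructors run. The class
<code>cTracked</code> and the function <code>RunLifetimeExample</code> are
hypothetical; the class simply counts how many of its objects exist at a given
moment, using a pointer to a counter supplied by the caller.
</p>

```cpp
// Hypothetical class that counts how many of its objects currently exist.
class cTracked
{
private:
  int* m_live_count;  // Points to a counter owned by the caller.
  int m_id;

public:
  // Two constructors are allowed because they take different arguments.
  cTracked(int* live_count)
    : m_live_count(live_count), m_id(0) { (*m_live_count)++; }
  cTracked(int* live_count, int id)
    : m_live_count(live_count), m_id(id) { (*m_live_count)++; }

  // Exactly one destructor, and it takes no arguments.
  ~cTracked() { (*m_live_count)--; }

  int GetID() const { return m_id; }
};

// Returns the count while two objects exist; *final_count receives the
// count after both objects have been destroyed.
int RunLifetimeExample(int* final_count)
{
  int live_count = 0;
  int count_inside = 0;
  {
    cTracked first(&live_count);      // Uses the one-argument constructor.
    cTracked second(&live_count, 7);  // Uses the two-argument constructor.
    count_inside = live_count;        // Two objects exist at this point.
  }  // Both objects go out of scope here, so both destructors run.
  *final_count = live_count;
  return count_inside;
}
```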
<p>
The next group of five methods are all called when an organism needs
to perform some behavior, which in all of these cases involves it interacting
with the population. For example, if you need to know which organism another
organism is facing, you can call the method
<span class="method">GetNeighbor</span>()
on it, and a pointer to the neighbor currently faced will be returned.
Likewise, if you need to kill an organism, you can call the method
<span class="method">Die</span>() on it, and it will be terminated.
Since each of these requires interaction on the population level, the population
itself takes care of the bulk of the functionality.
</p>
<p>
The 'Accessors' are methods that provide access to otherwise private
data. For example, the method <span class="method">GetGenome</span>()
passes the genome of the organism to whatever code calls it. In
particular, the hardware object associated with an organism will often call
<span class="method">GetPhenotype</span>() in order to get the current
state of the organism's phenotype and update it with something new the organism
has done. There are several things to take note of here. In the first three accessors,
the name of the class being returned is followed by an ampersand ('&amp;').
This means that the actual object is being passed back, and not just a copy of
all the values contained in it. See the next section on pointers, references,
and values for more information about this. Also, in the very first accessor,
the keyword <code>const</code> is used twice. The first time says that
the object being passed out of the method is constant (that is, the compiler
reports an error if code elsewhere tries to change it), and the
second time says that this method never changes
anything about the object it is run on (that is, the object is
left constant even when the method is run). The net effect is that an
object marked const can only have const methods run on it. The compiler
assumes that a non-const method may change the object, so calling one on a
const object is an error.
</p>
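<p>
The two uses of <code>const</code> on an accessor can be sketched like this.
The class <code>cNamedThing</code> and the function <code>NameLength</code> are
hypothetical names used only for this example.
</p>

```cpp
#include <string>

// Hypothetical class; only the const rules matter here.
class cNamedThing
{
private:
  std::string m_name;
  int m_uses;

public:
  cNamedThing(const std::string& name) : m_name(name), m_uses(0) { }

  // First const: the caller may not modify the returned string.
  // Second const: calling this method does not modify the object.
  const std::string& GetName() const { return m_name; }

  int GetUses() const { return m_uses; }

  // Non-const method: it changes the object.
  void Use() { m_uses++; }
};

// 'thing' is a const reference, so only const methods may be called on it.
int NameLength(const cNamedThing& thing)
{
  // thing.Use();  // Would not compile: Use() is not a const method.
  return static_cast<int>(thing.GetName().size());
}
```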
<p>
This section has contained information about a particular C++ class found
in Avida. The next sections will more generally explain some of the
principles of the language. If you haven't already, now might be a good
time to take a deep breath before you dive back in.
</p>
<p>&nbsp;</p>
<h2>Pointers, References, and Values</h2>
<p>
The three ways of passing information around in a
C++ program are sending a pointer to the location of that information,
sending a reference to it, or actually just sending the value of the information.
For the moment, let's consider the return value of a method. Consider
the three methods below:
</p>
<pre>
<span class="class">Genome</span> <span class="method">GetGenomeValue</span>();
<span class="class">Genome</span>* <span class="method">GetGenomePointer</span>();
<span class="class">Genome</span>&amp; <span class="method">GetGenomeReference</span>();
</pre>
<p>
These three cases are all very different. In the first case
(<strong>Pass-by-Value</strong>), the value of the genome in question is
returned. That means that the genome being returned is copied, and the exact
sequence of instructions in it is sent to the object calling this method. Once
the requesting object gets this information, however, any changes made to it do
<em>not</em> affect the original genome that was copied. In the second case
(<strong>Pass-by-Pointer</strong>), only a few bytes of information are returned
that give the location in memory of this genome. The requesting object can then
go and modify that memory if it chooses to, but it must first dereference
('resolve') the pointer to do so. Finally, the last case
(<strong>Pass-by-Reference</strong>) passes out the original object itself
rather than a copy. It is used in a very similar way to
pass-by-value, but any changes made to the genome after it is passed out will
affect the genome in the actual organism itself! Pass-by-reference does not
add any new functionality over pass-by-pointer, but in practice it is often
easier to use.
</p>
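<p>
The difference between the three cases is easiest to see by modifying what each
method returns. In the sketch below, the class <code>cGenomeHolder</code> is
hypothetical and a <code>std::string</code> stands in for a real Genome, so the
example stays self-contained.
</p>

```cpp
#include <string>

// Hypothetical holder; a std::string stands in for a real Genome.
class cGenomeHolder
{
private:
  std::string m_genome;

public:
  cGenomeHolder(const std::string& genome) : m_genome(genome) { }

  std::string  GetGenomeValue()     { return m_genome; }   // Returns a copy.
  std::string* GetGenomePointer()   { return &m_genome; }  // Returns the address.
  std::string& GetGenomeReference() { return m_genome; }   // Returns an alias.
};

// Modifies the genome through each kind of return value and reports the result.
std::string RunPassingExample()
{
  cGenomeHolder holder("abc");

  std::string copy = holder.GetGenomeValue();
  copy[0] = 'X';       // Changes only the copy; the original is untouched.

  std::string* pointer = holder.GetGenomePointer();
  (*pointer)[1] = 'Y'; // Dereference with '*', then modify the original.

  std::string& reference = holder.GetGenomeReference();
  reference[2] = 'Z';  // Modifies the original directly.

  return holder.GetGenomeValue();  // "aYZ": only the last two changes stuck.
}
```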
<p>&nbsp;</p>
<h2>Memory Management</h2>
<p>
Memory management in C++ can be as simple or complicated as the programmer
wants it to be. If you never explicitly allocate a chunk of memory, then
you never need to worry about freeing it up when you're done using it.
However, there are many occasions where a program can be made faster or
more flexible by dynamically allocating objects. The command
<strong><code>new</code></strong> is used to allocate memory; for example if
you wanted to allocate memory for a new genome containing 100 instructions,
you could type:
</p>
<pre>
<span class="class">Genome</span>* <span class="object">created_genome</span> = new <span class="method">Genome</span>(100);
</pre>
<p>
The variable created_genome is defined as a pointer to a memory location
containing a genome. This is assigned the location of the newly allocated
genome in memory, which is the return value of the <code>new</code> command.
The Genome constructor (called with new) takes as an argument a single number
that is the sequence length of the genome.
</p>
<p>
Unfortunately, C++ won't know when we're done using this genome. If we
need to create many different genomes and we only use each of them once,
our memory can rapidly fill up. So, we need to tell the memory management
system that we are finished with the current object. Thus, when we're done using
it, we type:
</p>
<pre>
delete <span class="object">created_genome</span>;
</pre>
<p>
And the memory pointed to by the created_genome
variable will be freed up.
</p>
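<p>
Putting <code>new</code> and <code>delete</code> together, a loop that creates
many short-lived genomes might look roughly like the sketch below. The class
<code>cSimpleGenome</code> is a hypothetical stand-in for Genome, whose real
interface is not shown in this file.
</p>

```cpp
#include <vector>

// Hypothetical stand-in for Genome; the real class has a different interface.
class cSimpleGenome
{
private:
  std::vector<int> m_instructions;

public:
  cSimpleGenome(int length) : m_instructions(length, 0) { }
  int GetSize() const { return static_cast<int>(m_instructions.size()); }
};

// Allocates, uses, and frees one genome per loop iteration.
int SumGenomeSizes(int num_genomes, int length)
{
  int total_size = 0;
  for (int i = 0; i < num_genomes; i++) {
    cSimpleGenome* created_genome = new cSimpleGenome(length);  // Allocate.
    total_size += created_genome->GetSize();  // Use it through the pointer.
    delete created_genome;                    // Free it before the pointer is lost.
  }
  return total_size;
}
```

<p>
Without the <code>delete</code> line, each pass through the loop would leave
one genome allocated with no remaining pointer to it, which is the kind of
memory build-up described above.
</p>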
<p>
An excellent example of when allocation and freeing of memory is employed
in Avida is with the genotype. Every time a new genotype is created during
evolution, Avida needs to allocate a new object of class cGenotype. During
the course of a run, millions of genotypes may be created, so we need to be
sure to free genotypes whenever they will no longer be needed in the run.
</p>
<p>&nbsp;</p>
<h2>Inherited Classes</h2>
<p>
One of the beauties of C++ is that well-written code
is inherently very reusable. As part of that, there is the concept
of <strong>class inheritance</strong>. When a new class is built in C++,
it is possible to build it off of an existing class, then referred to as
a <strong>base class</strong>. The new <strong>derived</strong>
class will have access to all of the methods in the base class, and can
<strong>override</strong> them; that is, it can change how any of those
methods operate.
</p>
<p>
For example, in the Avida scheduler, we use a class called
<span class="class">cSchedule</span> to determine which organism's
virtual CPU executes the next instruction. This cSchedule object is not
all that clever. In fact, all that it does is run through the list of
organisms that need to go, let each of them execute a single instruction, and
then do it all over again. But sometimes we need to make some organisms
execute instructions at a faster rate than others. For that reason, there are
several derived classes, including
<span class="class">cIntegratedSchedule</span>, which takes in a merit
value for each organism, and assigns CPU cycles proportional to that merit.
Since this new class uses cSchedule as its base class, it can be dynamically
plugged in during run time, after looking at what the user chooses in the
configuration file.
</p>
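<p>
The scheduler arrangement can be sketched in a few lines. The classes
<code>cSimpleSchedule</code> and <code>cMeritSchedule</code> below are
hypothetical and heavily simplified; they are not Avida's cSchedule or
cIntegratedSchedule, and the <code>use_merit</code> flag merely stands in for a
configuration setting.
</p>

```cpp
#include <vector>

// Hypothetical, heavily simplified schedulers; Avida's real classes differ.
class cSimpleSchedule
{
public:
  virtual ~cSimpleSchedule() { }
  // Base behavior: every organism gets one CPU cycle (merit is ignored).
  virtual int GetCycles(int) { return 1; }
};

class cMeritSchedule : public cSimpleSchedule  // Derived from the base class.
{
public:
  // Replaces the base behavior: cycles are proportional to merit.
  virtual int GetCycles(int merit) { return merit; }
};

// Works with any schedule, because it only uses the base class interface.
int TotalCycles(cSimpleSchedule* schedule, const std::vector<int>& merits)
{
  int total = 0;
  for (int i = 0; i < static_cast<int>(merits.size()); i++) {
    total += schedule->GetCycles(merits[i]);
  }
  return total;
}

// Picks a scheduler at run time, as a configuration setting might.
int RunScheduleExample(bool use_merit)
{
  std::vector<int> merits;
  merits.push_back(1);
  merits.push_back(2);
  merits.push_back(3);

  cSimpleSchedule* schedule = 0;
  if (use_merit) schedule = new cMeritSchedule();
  else schedule = new cSimpleSchedule();

  int total = TotalCycles(schedule, merits);
  delete schedule;
  return total;
}
```

<p>
The keyword <code>virtual</code> is what lets the call through the base class
pointer reach the derived class's version of the method.
</p>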
<p>
If a method is not implemented in a base class (left to be implemented in the
derived classes) it is called an <strong>abstract</strong> method (in C++
terms, a pure virtual method). If a base
class does not implement <em>any</em> of its methods and is only used in order
to specify what methods need to be included in the derived classes, it is
referred to as an <strong>abstract base class</strong> (or sometimes an
<strong>interface class</strong> or <strong>protocol</strong>) and is used
simply as a method contract for derived classes. An example in Avida where this
is used is the organism interface with the environment. The class
<span class="class">cOrgInterface</span> is an abstract base class,
with <span class="class">cPopulationInterface</span> and
<span class="class">cTestCPUInterface</span> as derived classes. This
organization allows for organism objects to interact with both the population
and the test environment, without having to write separate code for each.
</p>
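<p>
A minimal sketch of an abstract base class with two derived classes is given
below. All four class names are hypothetical; the real cOrgInterface declares
many more methods, and the neighbor counts returned here are made-up values
chosen only so the two environments behave differently.
</p>

```cpp
// Hypothetical interface classes; the real cOrgInterface has many more methods.
class cDemoInterface
{
public:
  virtual ~cDemoInterface() { }
  // '= 0' marks a method with no implementation in this class.
  virtual int GetNumNeighbors() = 0;
};

class cDemoGridInterface : public cDemoInterface
{
public:
  virtual int GetNumNeighbors() { return 8; }  // Made-up value for a grid cell.
};

class cDemoTestInterface : public cDemoInterface
{
public:
  virtual int GetNumNeighbors() { return 0; }  // Made-up value for a test run.
};

// Organism code is written once, against the abstract base class.
class cDemoOrganism
{
private:
  cDemoInterface* m_interface;  // May point to either derived class.

public:
  cDemoOrganism(cDemoInterface* in_interface) : m_interface(in_interface) { }
  int GetNeighborhoodSize() { return m_interface->GetNumNeighbors(); }
};
```

<p>
An organism built with a <code>cDemoGridInterface</code> and one built with a
<code>cDemoTestInterface</code> run the same <code>GetNeighborhoodSize</code>
code, yet each gets the answer supplied by its own environment.
</p>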
<p>&nbsp;</p>
<h2>Other C++ Resources:</h2>
<ul>
<li><a href="http://www.intap.net/~drw/cpp/">Online C++ Tutorial</a></li>
<li><a href="http://www.cplusplus.com/doc/tutorial/">Another Tutorial</a></li>
<li><a href="http://www.quiver.freeserve.co.uk/OOP1.htm">Object Oriented Concepts Tutorial</a></li>
<li><a href="http://www.research.att.com/~bs/glossary.html">A Glossary of C++ Terms</a></li>
</ul>