Homework 1 feedback

Note: for your grade in the course, full credit for the assignment will be awarded so long as you fully complete the homework and hand it in on time. The grades assigned and feedback provided are intended to help make sure you understand the material, and to show you the kinds of things I will look for when grading the final project assignment.

This page is intended to give you some sense of what I was looking for in grading homework one. The sample answers that are provided are not the only acceptable ones. For some questions, it is necessary to base the grade on a somewhat subjective sense of the depth of your understanding of the relevant issues.

Part 1. Jets and Sharks

1A [5 points]: Why are some of Ken’s properties more strongly activated than others?

Example answer:

Ken’s name unit activates the Ken_in instance unit which, in turn, activates the rest of Ken’s properties (Sharks, 20s, HS, Single, Burglar). These property units, in turn, partially activate the instance units of other individuals who are similar to Ken in that they share many of his properties (e.g., Nick, Neal). The partially activated instance units then provide additional support to their own property units, which sometimes support and sometimes compete with Ken’s properties, causing differing levels of final activation.

Feedback:

People generally did well on this question, although some made only general statements about the network without reasoning specifically about interactions among the units. It is always a good idea to provide a specific example.

1B [5 points]: Several other instance units are partially active, but none of the other name units are active. Explain the difference in how these two groups behave.

Example answer:

Each name unit receives positive input from only a single instance unit, which is insufficient to overcome the direct inhibition from other name units. By contrast, each instance unit receives positive input from multiple property units, which can sometimes overcome the direct inhibition from other instance units.

Feedback:

People generally did well here, though some people failed to note that name units only receive a single positive input while instance units receive multiple positive inputs.
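The contrast in 1B can be made concrete with a rough calculation of net input. The sketch below is only an illustration: the scaling parameters and the activation values are assumed, not taken from the actual simulation, and the helper `net_input` is hypothetical.

```python
# Assumed scaling parameters for excitatory and inhibitory input.
# The values used in the actual simulation may differ.
ALPHA = 0.1  # scale on excitatory input (assumed)
GAMMA = 0.1  # scale on inhibitory input (assumed)


def net_input(excitatory, inhibitory):
    """Scaled sum of excitatory sender outputs minus scaled sum of inhibitory ones."""
    return ALPHA * sum(excitatory) - GAMMA * sum(inhibitory)


# A name unit other than Ken's: excited by ONE partially active instance unit,
# inhibited by Ken's strongly active name unit (illustrative activations).
name_net = net_input(excitatory=[0.3], inhibitory=[0.8])

# An instance unit similar to Ken: excited by SEVERAL active property units,
# inhibited by Ken's strongly active instance unit (illustrative activations).
instance_net = net_input(excitatory=[0.7, 0.7, 0.7], inhibitory=[0.8])

print(f"name unit net input:     {name_net:+.2f}")
print(f"instance unit net input: {instance_net:+.2f}")
```

With these made-up numbers the name unit's net input is negative while the instance unit's is positive, which is the qualitative pattern the example answer describes.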

1C [5 points]: Why are some of the instance units more active than others?

Example answer:

Individuals differ in the degree to which they share properties with Ken (and with each other), and hence the degree to which activation of Ken’s properties (via his name unit) provides support for the instance units of those individuals. In particular, although Nick, Neal, and Rick all share three features with Ken, Nick and Neal share four properties with each other but only three with Rick, so they support each other (via the property units) more than either supports Rick.

Feedback:

Some people just said that certain instance units received more input from the property units without explaining why (i.e., not mentioning that this depended on the level of overlap between the individuals’ properties and those of Ken and other individuals similar to Ken). Most people understood that individuals similar to Ken will be more strongly activated, but a further important observation is that the overlap amongst these other individuals is also relevant (e.g., Nick and Neal share many properties with each other, and fewer properties with Rick). This is what makes the model different from a straightforward similarity-based model!
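The overlap argument in 1C can be checked by simple counting. In the sketch below, Ken's properties come from the example answer to 1A; the property sets for Nick, Neal, and Rick are assumed for illustration and chosen to match the overlap counts quoted above, so check them against the actual network before relying on them. The helper names are hypothetical.

```python
# Ken's properties are from the example answer to 1A. The other three sets
# are ASSUMED for illustration and should be checked against the network.
people = {
    "Ken":  {"Sharks", "20s", "HS", "Single", "Burglar"},
    "Nick": {"Sharks", "30s", "HS", "Single", "Pusher"},
    "Neal": {"Sharks", "30s", "HS", "Single", "Bookie"},
    "Rick": {"Sharks", "30s", "HS", "Divorced", "Burglar"},
}


def overlap(a, b):
    """Number of properties shared by individuals a and b."""
    return len(people[a] & people[b])


def peer_overlap(name, peers):
    """Total overlap between `name` and the other members of `peers`."""
    return sum(overlap(name, other) for other in peers if other != name)


neighbours = ["Nick", "Neal", "Rick"]
for n in neighbours:
    print(n, "shares", overlap("Ken", n), "with Ken and",
          peer_overlap(n, neighbours), "in total with the other two")
```

Similarity to Ken alone cannot separate the three, since each shares three properties with him. The total overlap with the other partially active individuals is what differs, and it favors Nick and Neal over Rick.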

1D [10 points]: [after providing input to Sharks and 20s units] Explain why the occupation units show partial activations of units other than Ken’s occupation, which is Burglar. Be sure to contrast the current case with the one with Ken as input.

Example answer:

The Sharks and 20s input causes a greater degree of partial activation among a number of instance units other than Ken (e.g., Pete, Fred, Nick, Neal) than when the Ken name unit alone is provided as input. Two of these instances support Bookie and two support Pusher, so these alternative occupations receive significant support. Also, when the Ken name unit is input, the Burglar unit is activated much earlier than when Sharks and 20s are input, and this early activation provides an additional advantage in the competition with the Bookie and Pusher units.

Feedback:

Most people got the basic idea here.

1E [15 points]: [after removing the Lance-Burglar connections] Describe how the model was able to fill in the correct occupation for Lance. Also, explain why the model tends to activate the Div. (divorced) unit as well as the Mar. (married) unit.

Example answer:

Lance’s correct occupation (Burglar) is still activated because the instance units that share several of Lance’s properties (Al, Jim, John, George) also happen to share the same occupation (i.e., the network generalizes on the basis of similarity). The divorced unit is partially activated because two of the instances that are similar to Lance (Jim, George) are divorced rather than married.

Feedback:

Most people did well here too.

Part 2. Schemas

2A [10 points]: What does each of the notions of variable, value, and default value correspond to within Rumelhart et al.’s PDP formulation of a schema?

Basic answer:

This is more-or-less straight from the text (p. 33): “The variables of a schema correspond to those parts of the pattern that are not completely determined by the remainder of the structure of the pattern itself” and, thus, “vary from one situation to another.” The specific pattern these portions take in a given situation corresponds to the value assigned to the variable. “Default values represent variable subpatterns that tend to get filled-in in the absence of any specific input.”

Feedback:

People had a little trouble with the notion of a variable in terms of connectionist networks. Most people thought that a variable = a unit that did not get external input, but this is not exactly right. If, in a given schema, a unit does not get external input, but its activation is strongly constrained (so it is not free to vary across different instantiations of the schema), it isn’t a variable. Also, some definitions of a variable were overly restricted. For example, a variable need not correspond to a single unit; the various different units coding size together correspond to a single variable, with different values of the variable corresponding to different patterns of activation across these. Most people were good with “value”, but some had trouble with the default value: some seemed to think that the default was the unit’s initial activation; others that it was equal to the average activation value across schemas.

2B [10 points]: Summarize briefly how, according to Rumelhart and colleagues, schema embedding is instantiated in a constraint satisfaction network. That is, what does it mean for a particular collection of descriptors to be a subschema?

Basic answer:

On p. 35, Rumelhart et al. state “Under our interpretation, subschemata correspond to small configurations of units which cohere and which may be a part of many different stable patterns (and therefore constitute a schema on their own right).” So the critical property is that the descriptors within a subschema have to cohere (come and go together) across many different contexts. Rumelhart et al. use relative goodness across contexts as a measure of coherence.

Feedback:

Students generally did well on this question, though many did not indicate that the subschema needed to be coherent across many different schemas. If, for instance, fridge and stove appear either together or not at all, this is not sufficient to make them a subschema, because the two items only appear together within the kitchen schema. This means that each is also highly correlated with, and predicted by, all other kitchen properties.

2C [20 points]: Based on your answer to 2B, critically evaluate the claim that the “windows” and “drapes” features do, in fact, form a subschema. Be sure to provide evidence from running the simulation (e.g., activation levels, goodness values) to support your argument.

Basic answer:

The text (see Figure 13) operationalizes a subschema as a set of units for which the goodness is higher when they occur together or not at all than when they occur separately. This is clearly true for “windows” and “drapes” only in the office schema (and maybe weakly in the bathroom); the effect is not found in the bedroom and living room schemas. Thus “windows” and “drapes” do not cohere within “many different stable patterns” and so don’t satisfy the criteria in 2B. [Now go on to provide specific evidence from your simulations, reporting goodness for different schemas when window/drapes are on together, off together, or in different states].

Feedback:

Many people had difficulty with this question. They had the idea that goodness needed to be higher when both features were on than when either was on by itself. However, they didn’t always show that having neither on is better than having only one on. Also, they tended to give very little evidence to test whether this was true of windows and drapes. Moreover, and this was perhaps the main difficulty, most people didn’t seek to evaluate goodness in different contexts. In fact, many people believed that a subschema “belonged” to a particular schema: that, for example, “window and drapes” was a subschema for office.

As a general rule, it is important to support your claims with evidence—in this case, that means listing the actual goodness values of configurations in different contexts.
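As a rough illustration of the kind of comparison I was looking for, the sketch below computes goodness for all four on/off states of a pair of units in two different contexts. Everything in it is assumed: the weights and biases are invented, the unit names `desk` and `bed` merely stand in for a context, and the goodness formula (pairwise weight terms plus bias terms) is one common form rather than the exact one used by the simulator. Your own answer should report the goodness values produced by the actual network.

```python
from itertools import product

# INVENTED symmetric weights and biases, for illustration only.
WEIGHTS = {
    ("windows", "drapes"): 1.0,
    ("desk", "windows"): 0.2,
    ("desk", "drapes"): -0.2,
    ("bed", "windows"): 0.9,
    ("bed", "drapes"): -0.1,
}
BIAS = {"windows": -0.5, "drapes": -0.5, "desk": 0.0, "bed": 0.0}


def goodness(act):
    """Assumed goodness: sum of w_ij * a_i * a_j over pairs, plus bias_i * a_i."""
    pair_term = sum(w * act[i] * act[j] for (i, j), w in WEIGHTS.items())
    bias_term = sum(BIAS[unit] * a for unit, a in act.items())
    return pair_term + bias_term


def pair_table(context, a="windows", b="drapes"):
    """Goodness for each on/off state of units a and b, with the context clamped."""
    table = {}
    for state_a, state_b in product((1, 0), repeat=2):
        act = dict(context)
        act[a], act[b] = state_a, state_b
        table[(state_a, state_b)] = goodness(act)
    return table


def coheres(table):
    """True if both-on AND both-off each beat either unit being on alone."""
    together = min(table[(1, 1)], table[(0, 0)])
    apart = max(table[(1, 0)], table[(0, 1)])
    return together > apart


contexts = {
    "office-like": {"desk": 1, "bed": 0},
    "bedroom-like": {"desk": 0, "bed": 1},
}
results = {name: coheres(pair_table(ctx)) for name, ctx in contexts.items()}
for name, ok in results.items():
    print(name, "-> pair coheres" if ok else "-> pair does not cohere")
```

Note that the check requires both "on together" and "off together" to beat the mixed states, and that it is run in more than one context. A pair that passes in only one context does not meet the criterion from 2B.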

2D [20 points]: Find another combination of features that, based on your intuitions of the contents of the various room types, ought to form a subschema but do not. Explain your choice of features and provide evidence for your claims (as in 2C). Does the pattern of weights among units (see the actual network or Figure 5 in the Rumelhart chapter) provide any insight into why the network behaves as it does?

Feedback:

Many choices of candidate subschema are possible; common ones were “book” and “bookshelf”, or “desk” and “desk-chair”. As mentioned above, I was looking for you to argue that these (or other) features do not form a subschema using evidence gathered by running simulations. A natural way to do this would be to repeat 2C, determining goodness values for each standard schema with various combinations of your proposed features.

This question was also problematic for many students. Common problems were: 1) not considering the case where both members of the subschema are off; 2) only discussing the subschema in a single context; 3) not considering that individual items might occur independently (for example, “bed” and “clock” was one common choice, because these often occur together, even though it’s evident that clock occurs without bed in many contexts); 4) not providing evidence from simulations; and 5) not commenting on how analysis of the weights might shed light on the network’s