B.F. Skinner - History

B.F. Skinner

1904–1990

Scientist

Burrhus Frederic Skinner was born on March 20, 1904, in Susquehanna, Pennsylvania. He attended Hamilton College, where he received a BA in English literature. He then went to Harvard, where he began to study behaviorism. After briefly returning home to attempt to write a novel, he went back to Harvard and received a Ph.D. in psychology in 1931. After teaching at the University of Minnesota and Indiana University, he returned to Harvard in 1948 as a tenured professor, and there he remained for the rest of his life.

B.F. Skinner believed that behavior of any complexity could be explained by genetic predisposition to conditioned responses. His famed "Skinner box" was used experimentally with animals to demonstrate Skinner's theories of behavior and reinforcement.

Books

B.F. Skinner: A Life


B. F. Skinner

B.F. Skinner was the 20th century’s most influential psychologist, pioneering the science of behaviorism. Inventor of the Skinner Box, he discovered the power of positive reinforcement in learning, and he designed the first psychological experiments to give quantitatively repeatable and predictable results.

Skinner wrote a bestselling novel for the general public exploring the effects of utilizing his behaviorism principles in a community of people.

He courted controversy by denying the existence of free will, claiming that humans act according to rules programmed by a combination of our genes and the circumstances of our environment.

Beginnings

Burrhus Frederic Skinner was born in the small railroad town of Susquehanna, Pennsylvania, USA on March 20, 1904. He was always known to his family as Fred. His mother nearly died in childbirth.

Fred’s father, William Arthur Skinner, was an attorney. His mother, Grace Madge Burrhus, was a typist who gave up her job when she married; she was also an accomplished pianist and singer. Both parents graduated second in their high school classes. Fred had one younger brother, Edward.

Country Boys

Fred and Edward enjoyed a rural upbringing. The two brothers harvested grapes, apples, and plums from their home’s overgrown garden. In the forest they collected berries and nuts; they set traps for mice and chipmunks, which they released, and for snakes, which they killed. They gathered cocoons so they could watch moths and butterflies emerge later in the season. They watched the blacksmith and carpenter at work and watched trains passing through town.

Burning in the Fires of Hell

Fred’s parents were not very religious – they often skipped their Presbyterian church on Sundays.

Fred’s grandmother terrified him by pointing to a coal fire and telling him that if he ever told a lie he would be held in a fire like that forever.

Later, his father confirmed to him that bad boys ended up in the fires of hell. For years Fred would lie awake at night sobbing over a lie he had once told, unable to reveal even to his mother what was upsetting him so much.

Fred was not baptized. His father believed children should only be baptized when they understood what was happening, and for Fred that didn’t happen.

Although his father’s confirmation that hell was real had unintentionally traumatized Fred, his father never physically punished his sons. This was unusual for the era. Later in life, Fred’s behaviorist psychology would focus on rewards rather than punishments.

Toys and Books

Fred grew up with toys that stimulated his mind. He bolted together devices from Meccano, and he experimented with a working miniature steam engine. When he was a little older, he carried out his own chemistry and electrical experiments. He designed but did not build a perpetual motion machine. He built a steam cannon.

Fred spent a lot of time working on mathematical and word problems in a big puzzle book. His favorite books were Daniel Defoe’s Robinson Crusoe and Jules Verne’s Mysterious Island – he liked the idea of self-sufficiency. He wrote poems and short stories on his father’s typewriter.

As part of their education, their father took Fred and his brother to see factories and find out how things were produced.

Beating Teacher

His 12 years of schooling all took place in one small building. In eighth grade, he had trouble with a teacher. Fred argued with her in class, telling her that what she was saying was factually incorrect. His teacher complained to the principal, and the principal asked her about the specific arguments she was having with Fred. The principal realized Fred was correct, so Fred was not punished.

Francis Bacon

In eighth grade, Fred became a devotee of the works of William Shakespeare and also bought into the conspiracy theory that the true author of Shakespeare’s works was Francis Bacon. He read further works by Bacon, including his famous Novum Organum describing the Baconian Method of doing science.

Graduating from High School

Fred enjoyed his time at Susquehanna High School, where mathematics was his favorite subject. He graduated second in his class, just as his parents had at the very same school.

The school’s principal, ‘Professor’ Bowles, liked Fred, but near the end of Fred’s schooling he grew worried that Fred had become an agnostic or even an atheist. He tried to guide Fred by giving him a book entitled God or Gorilla, which attempted to refute the theory of evolution. Fred was not guided by the book.

Hamilton College & Tragedy

In September 1922, age 18, Frederic Skinner matriculated at Hamilton College, Clinton, New York, a men’s liberal arts school. He soon learned that only a few of the staff had PhDs.

Fred Skinner as a Hamilton College freshman.

In the Easter vacation, Fred’s younger brother Edward fell violently ill. Fred called the doctor, who told him to go fetch his parents from church. By the time they returned, Edward had died. The cause was determined to be ‘acute indigestion.’ Later in life, Fred reasoned that the true cause had been a massive cerebral hemorrhage. Edward’s death at age 16 left the family, including Fred, in shock for a long time.

At the end of his freshman year, Fred wrote an essay about the year. He had not enjoyed it. Too many of his fellow students looked down on scholarly students like Fred, who wanted to learn about literature rather than engage in athletics. This had drained his enthusiasm. His disdain for his fellow students was matched by theirs for him: he was thought of as a somewhat vain, standoffish, highbrow character.

Fred enjoyed his sophomore year and by junior year had decided to major in English Language and Literature. He graduated with a Bachelor of Arts in 1926, intent on becoming a novelist.

Failed Novelist Turns to Psychology

By the end of summer 1926, Fred realized he could not write a novel and began considering an alternative future. He realized novel-writing was attractive to him because it involved describing and analyzing human behavior. In November 1927, he decided to abandon literature and study psychology. He became a graduate student at Harvard University’s psychology department in the fall of 1928, age 24. Three years later he graduated with a Ph.D. in psychology.



B.F. Skinner was an American behaviourist inspired by John Watson's philosophy of behaviourism. [5] Skinner was captivated by the idea of systematically controlling behaviour to produce desirable or beneficial outcomes. This passion led Skinner to become the father of operant conditioning. [4] Skinner made significant contributions to research on reinforcement, punishment, schedules of reinforcement, behaviour modification and behaviour shaping. [6] The mere existence of the instinctive drift phenomenon challenged Skinner's initial beliefs about operant conditioning and reinforcement. [4]

Operant conditioning

Skinner described operant conditioning as strengthening behaviour through reinforcement. Operant conditioning rests on four contingencies: positive reinforcement, in which a desirable stimulus is added; negative reinforcement, in which an undesirable stimulus is taken away; positive punishment, in which an undesirable stimulus is added; and negative punishment, in which a desirable stimulus is taken away. [7] Through these practices, animals shape their behaviour and are motivated to perform the learned behaviour to benefit optimally from rewards or to avoid punishment. Through operant conditioning, the presence of instinctive drift was discovered. [3]
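
The four contingencies are easy to confuse, so here is a minimal illustrative sketch in Python; the tabulation and the (valence, operation) keys are our own framing, not Skinner's:

    # Sketch only: the four operant contingencies described above,
    # keyed by (stimulus valence, operation applied to the stimulus).
    CONTINGENCIES = {
        ("desirable", "added"):     ("positive reinforcement", "behaviour strengthened"),
        ("undesirable", "removed"): ("negative reinforcement", "behaviour strengthened"),
        ("undesirable", "added"):   ("positive punishment", "behaviour weakened"),
        ("desirable", "removed"):   ("negative punishment", "behaviour weakened"),
    }

    for (valence, operation), (name, effect) in CONTINGENCIES.items():
        print(f"{valence} stimulus {operation}: {name} ({effect})")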

The term instinctive drift was coined by the married couple Keller and Marian Breland Bailey, former psychology graduate students of B.F. Skinner at the University of Minnesota. Keller and Marian were recruited to work with B.F. Skinner on a project to train pigeons to pilot bombs towards targets to aid with World War II efforts. [3] This project was terminated when the development of the atom bomb took precedence. [3] The Brelands, however, still enthralled with the application of animal behaviour, adopted Skinner's principles and began a life of training animals. They profited from these animals performing complex and amusing behaviours for the public's entertainment. They founded their successful business, "Animal Behaviour Enterprises", in 1943. [4] [3] Their business soon gained nationwide attention and even had a partnership with General Mills to train chickens, via operant conditioning, for business promotion. [4] [3]

Keller and Marian Breland were the discoverers of instinctive drift. [4] [3] They first noted this behavioural pattern when animals they had been training for years interrupted their learned behaviours to satisfy innate patterns of feeding behaviour. [3] This discovery debunked the once-assumed ideas that animals are a "tabula rasa" prior to purposeful training and that all responses are equally conditionable. [3] The Brelands described their first exposure to this phenomenon when working with chickens that had been trained to appear as if they were turning on a jukebox and subsequently dancing. The breakdown in operant conditioning appeared when over half the chickens they had trained to stand on a platform developed an unplanned scratching or pecking pattern. [3] The scratching pattern was subsequently used to create the "dancing chicken" performance. [3]

The Brelands had their second, and more perplexing, encounter with instinctive drift when working with raccoons. They were training raccoons to perform a captivating sequence of events to aid with the advertisement of a bank. This project involved teaching raccoons to deposit money into a bank slot. The project began well: the raccoons were initially very successful at depositing coins into the bank. The Brelands then noticed that over time, and as the reinforcement schedule was spaced out, the raccoons began to dip the coins in and out of the bank and rub them with their paws rather than depositing them. They concluded that this was an instinct that was interfering with the raccoons’ performance on the task. [4] In nature, raccoons dip their food in water several times in order to wash it. This instinct was seemingly triggered by the similar action sequence involved in retrieving and depositing coins into a bank. Instinctive behaviour is usually automatic and unplanned, a natural reaction that an animal often prefers over learned and unnatural actions. [2] This instinctive drift was successfully avoided when they instead taught the raccoons to place a basketball into a basket. Because of the size of the ball and the different body position involved in this action, the raccoons did not experience instinctive drift (they did not dip the balls in and out of the basket).

A similar training regimen was applied to pigs, animals that are known to condition rapidly. [4] These pigs were trained to insert wooden coins into a piggy bank. [8] Over time, the pigs stopped depositing the coins and instead began to drop them in the dirt, push them down with their noses, drag them back out, and fling them into the air. [8] These actions are part of a behaviour known as rooting, an instinctual pattern of behaviour that pigs use to dig for food and to communicate. [8] The pigs chose to engage in rooting rather than performing their trained action (depositing the coin); this is yet another clear example of instinctive drift interfering with operant conditioning. [8]

The nature vs. nurture controversy is a major topic in psychology and pertains to animal training as well. [9] A common question asked by experts in various fields is whether behaviour is due to life experiences or is predisposed in DNA. [9] Today, partial credit is given to both sides, and in many cases nature and nurture are given equal weight. With animal training it is often asked whether training and shaping cause a behaviour exhibited by an animal (nurture), or whether the behaviour is actually innate to the species (nature). [9] [10] Instinctive drift centres on the nature side rather than treating learning as the sole cause of behaviour. Species are obviously capable of learning behaviours; instinctive drift does not deny this. [9] Instinctive drift says that animals often revert to innate (nature) behaviours that can interfere with conditioned responses (nurture). [9]

Instinctive drift can also be discussed in association with evolution. [11] Instinctive drift holds that animals behave in accordance with evolutionary contingencies, as opposed to the operant contingencies of their specific training. [11] Instinct has evolutionary roots. [12] Traits and behaviours evolve over time, and it is by means of evolution and natural selection that adaptive traits and behaviours are passed on to the next generation while maladaptive traits are weeded out. It is these adaptive traits, accumulated over generations, that animals revert to in instinctive drift and that interfere with operant conditioning. [12] [11] Much knowledge of evolution and natural selection can be credited to Charles Darwin, whose theory of evolution made it possible to better understand phenomena such as instinctive drift. [11] [12]


A day of great illumination: B. F. Skinner's discovery of shaping

Despite the seminal studies of response differentiation by the method of successive approximation detailed in chapter 8 of The Behavior of Organisms (1938), B. F. Skinner never actually shaped an operant response by hand until a memorable incident of startling serendipity on the top floor of a flour mill in Minneapolis in 1943. That occasion appears to have been a genuine eureka experience for Skinner, causing him to appreciate as never before the significance of reinforcement mediated by biological connections with the animate social environment, as opposed to purely mechanical connections with the inanimate physical environment. This insight stimulated him to coin a new term (shaping), and also led directly to a shift in his perspective on verbal behavior from an emphasis on antecedents and molecular topographical details to an emphasis on consequences and more molar, functional properties in which the social dyad inherent to the shaping process became the definitive property of verbal behavior. Moreover, the insight seems to have emboldened Skinner to explore the greater implications of his behaviorism for human behavior writ large, an enterprise that characterized the bulk of his post-World War II scholarship.


Issues and Debates

Free will vs Determinism

The behavioral approach is strongly determinist: all behavior is learned from our environment through classical and operant conditioning, and we are the sum total of our previous conditioning.

Social learning theory involves a softer determinism, as it recognises an element of choice as to whether we imitate a behavior or not.

Nature vs Nurture

Behaviorism is very much on the nurture side of the debate as it argues that our behavior is learnt from the environment.

The social learning theory is also on the nurture side because it argues that we learn our behavior from role models in our environment.

The behaviorist approach proposes that apart from a few innate reflexes and the capacity for learning, all complex behavior is learned from the environment.

Holism vs Reductionism

The behaviorist approach and social learning are reductionist; they isolate parts of complex behaviors to study.

The behaviorists take the view that all behavior, no matter how complex, can be broken down into the fundamental processes of conditioning.

Idiographic vs Nomothetic

It is a nomothetic approach, as it views all behavior as governed by the same laws of conditioning.

However, it does account for individual differences, explaining them in terms of differences in each person's history of conditioning.

Are the research methods used scientific?

The behaviorist approach introduced scientific methods to psychology. Laboratory experiments were used, with tight control of extraneous variables.

These experiments were replicable and the data obtained was objective (not influenced by an individual’s judgement or opinion) and measurable. This gave psychology more credibility.

However, the behaviorists relied on animal experiments, which assumes that humans learn in the same way as animals.


Contribution to Psychology

Over the course of his long career, Skinner developed many theories and inventions, and he remains one of the best-known and most controversial figures in psychology. His behaviorist theories remain hotly contested and have influenced fields ranging from education to dog training. Skinner influenced behaviorism through his research on reinforcement; he focused heavily on negative and positive reinforcement and the effects each had on behavior. He believed that his behaviorist theories could save humanity from itself, and he argued in favor of positive reinforcement to shape political and social behavior. His theory of radical behaviorism argues that what is internally perceived is not a nonphysical level of consciousness but the individual's own physical body.

Among Skinner's many inventions was a highly controversial one, known as the "Air-Crib", that he developed while teaching at Indiana University. Designed to support child rearing, the crib was a temperature-controlled, sterile, soundproof box that was meant to encourage a child's independence while minimizing discomfort. The most famous of Skinner's inventions is commonly known as the "Skinner box", a device designed to employ "operant conditioning", the manipulation of behaviors through reinforcement. For example, an animal would receive a reward for small acts representing a desired behavior, and the rewards would increase as the animal came closer to completing the desired behavior.

Skinner conducted extensive research into reinforcement as a method of teaching. Continuous reinforcement involves delivering a reward after every instance of a desired behavior, but Skinner found the method impractical and ineffective. Interval-based reinforcement, on the other hand, is delivered according to a time schedule and tends to produce slow and steady change; it might follow a fixed-interval or variable-interval schedule, providing reinforcement after a fixed or variable amount of time. Alternatively, ratio-based reinforcement can follow a fixed-ratio schedule, in which reinforcement is given after a certain number of responses, or a variable-ratio schedule, in which reinforcement is provided after an average number of responses. Skinner concluded that variable-ratio schedules tend to produce the most compliance, particularly when rewards occur frequently. For example, a person training a dog might reward the dog, on average, every five times it obeys, but vary the number of obedience tasks between each reward.
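
As a rough sketch of that dog-training example, the following Python simulation (our own illustration; the function name and parameters are invented for it) implements a variable-ratio schedule averaging one reward per five responses, so the total reward count comes out near one fifth of the responses while each individual reward stays unpredictable:

    import random

    def variable_ratio_rewards(n_responses, mean_ratio=5, seed=0):
        """Reward after a randomised number of responses whose mean is
        `mean_ratio` (the "on average every five times" example above)."""
        rng = random.Random(seed)
        def next_threshold():
            return rng.randint(1, 2 * mean_ratio - 1)  # mean == mean_ratio
        threshold, since_reward, rewards = next_threshold(), 0, 0
        for _ in range(n_responses):
            since_reward += 1
            if since_reward >= threshold:   # reward; draw a new random ratio
                rewards += 1
                since_reward = 0
                threshold = next_threshold()
        return rewards

    print(variable_ratio_rewards(1000))  # roughly 1000 / 5 = 200 rewards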



Skinner was born in Susquehanna, Pennsylvania, to Grace and William Skinner, the latter of whom was a lawyer. Skinner became an atheist after a Christian teacher tried to assuage his fear of the hell that his grandmother described. [15] His brother Edward, two and a half years younger, died at age 16 of a cerebral hemorrhage. [16]

Skinner's closest friend as a young boy was Raphael Miller, whom he called Doc because his father was a doctor. Doc and Skinner became friends through their parents' shared religiousness, and both had an interest in contraptions and gadgets. They set up a telegraph line between their houses to send messages to each other, although they had to call each other on the telephone because of the confusing messages sent back and forth. One summer, Doc and Skinner started an elderberry business, gathering berries and selling them door to door. They found that when they picked the ripe berries, the unripe ones came off the branches too, so they built a device that was able to separate them: a piece of metal bent to form a trough. They would pour water down the trough into a bucket, and the ripe berries would sink into the bucket while the unripe ones were pushed over the edge to be thrown away. [17]

Education

Skinner attended Hamilton College in New York with the intention of becoming a writer. He found himself at a social disadvantage at the college because of his intellectual attitude. [18] He was a member of the Lambda Chi Alpha fraternity. [17]

He wrote for the school paper, but, as an atheist, he was critical of the traditional mores of his college. After receiving his Bachelor of Arts in English literature in 1926, he attended Harvard University, where he would later research and teach. While attending Harvard, a fellow student, Fred Keller, convinced Skinner that he could make an experimental science from the study of behaviour. This led Skinner to invent a prototype for the Skinner Box and to join Keller in the creation of other tools for small experiments. [18]

After graduation, Skinner unsuccessfully tried to write a great novel while he lived with his parents, a period that he later called the 'Dark Years.' [18] He became disillusioned with his literary skills despite encouragement from the renowned poet Robert Frost, concluding that he had little world experience and no strong personal perspective from which to write. His encounter with John B. Watson's behaviourism led him into graduate study in psychology and to the development of his own version of behaviourism. [18]

Later life

Skinner received a PhD from Harvard in 1931, and remained there as a researcher for some years. In 1936, he went to the University of Minnesota in Minneapolis to teach. [19] In 1945, he moved to Indiana University, [20] where he was chair of the psychology department from 1946 to 1947, before returning to Harvard as a tenured professor in 1948. He remained at Harvard for the rest of his life. In 1973, Skinner was one of the signers of the Humanist Manifesto II. [21]

In 1936, Skinner married Yvonne (Eve) Blue. The couple had two daughters, Julie (married name Vargas) and Deborah (who married Barry Buzan). [22] [23] Yvonne died in 1997, [24] and is buried in Mount Auburn Cemetery, Cambridge, Massachusetts. [25]

Skinner's public exposure increased in the 1970s, and he remained active even after his retirement in 1974, until his death. In 1989, Skinner was diagnosed with leukemia; he died on August 18, 1990, in Cambridge, Massachusetts. Ten days before his death, he was given a lifetime achievement award by the American Psychological Association and gave a talk concerning his work. [26]

Behaviourism

Skinner referred to his approach to the study of behaviour as radical behaviourism, [27] which originated in the early 1900s as a reaction to depth psychology and other traditional forms of psychology, which often had difficulty making predictions that could be tested experimentally. This philosophy of behavioural science assumes that behaviour is a consequence of environmental histories of reinforcement (see applied behaviour analysis). In his words:

The position can be stated as follows: what is felt or introspectively observed is not some nonphysical world of consciousness, mind, or mental life but the observer's own body. This does not mean, as I shall show later, that introspection is a kind of psychological research, nor does it mean (and this is the heart of the argument) that what are felt or introspectively observed are the causes of the behaviour. An organism behaves as it does because of its current structure, but most of this is out of reach of introspection. At the moment we must content ourselves, as the methodological behaviourist insists, with a person's genetic and environment histories. What are introspectively observed are certain collateral products of those histories.… In this way we repair the major damage wrought by mentalism. When what a person does [is] attributed to what is going on inside him, investigation is brought to an end. Why explain the explanation? For twenty-five hundred years people have been preoccupied with feelings and mental life, but only recently has any interest been shown in a more precise analysis of the role of the environment. Ignorance of that role led in the first place to mental fictions, and it has been perpetuated by the explanatory practices to which they gave rise. [27]

Foundations of Skinner's behaviourism

Skinner's ideas about behaviourism were largely set forth in his first book, The Behavior of Organisms (1938). [9] Here he gives a systematic description of the manner in which environmental variables control behaviour. He distinguished two sorts of behaviour, which are controlled in different ways:

  • Respondent behaviours are elicited by stimuli, and may be modified through respondent conditioning, often called classical (or Pavlovian) conditioning, in which a neutral stimulus is paired with an eliciting stimulus. Such behaviours may be measured by their latency or strength.
  • Operant behaviours are 'emitted,' meaning that initially they are not induced by any particular stimulus. They are strengthened through operant conditioning (also known as instrumental conditioning), in which the occurrence of a response yields a reinforcer. Such behaviours may be measured by their rate.

Both of these sorts of behaviour had already been studied experimentally, most notably: respondents, by Ivan Pavlov [28] and operants, by Edward Thorndike. [29] Skinner's account differed in some ways from earlier ones, [30] and was one of the first accounts to bring them under one roof.

The idea that behaviour is strengthened or weakened by its consequences raises several questions. Among the most commonly asked are these:

  1. Operant responses are strengthened by reinforcement, but where do they come from in the first place?
  2. Once it is in the organism's repertoire, how is a response directed or controlled?
  3. How can very complex and seemingly novel behaviours be explained?

1. Origin of operant behaviour

Skinner's answer to the first question was very much like Darwin's answer to the question of the origin of a 'new' bodily structure: namely, variation and selection. Similarly, the behaviour of an individual varies from moment to moment; a variation that is followed by reinforcement is strengthened and becomes prominent in that individual's behavioural repertoire. Shaping was Skinner's term for the gradual modification of behaviour by the reinforcement of desired variations. Skinner believed that 'superstitious' behaviour can arise when a response happens to be followed by reinforcement to which it is actually unrelated.
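
Variation and selection of behaviour can be caricatured in a few lines of Python. This is our own toy model (behaviour reduced to a single number, reinforcement modelled as retaining any variation nearer a target), not anything Skinner formalised:

    import random

    def shape(target=100.0, trials=500, spread=5.0, seed=7):
        """Successive approximation as variation plus selection: behaviour
        varies randomly around its current level, and reinforced variations
        (those nearer the target) become the new baseline."""
        rng = random.Random(seed)
        level = 0.0
        for _ in range(trials):
            variation = level + rng.gauss(0.0, spread)
            if abs(target - variation) < abs(target - level):
                level = variation  # the reinforced variation is retained
        return level

    print(round(shape(), 1))  # drifts from 0 toward 100 over the trials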

2. Control of operant behaviour

The second question, "how is operant behaviour controlled?" arises because, to begin with, the behaviour is "emitted" without reference to any particular stimulus. Skinner answered this question by saying that a stimulus comes to control an operant if it is present when the response is reinforced and absent when it is not. For example, if lever-pressing only brings food when a light is on, a rat, or a child, will learn to press the lever only when the light is on. Skinner summarised this relationship by saying that a discriminative stimulus (e.g. light or sound) sets the occasion for the reinforcement (food) of the operant (lever-press). This three-term contingency (stimulus-response-reinforcer) is one of Skinner's most important concepts, and sets his theory apart from theories that use only pair-wise associations. [30]
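
A minimal sketch of that three-term contingency, again as our own toy model rather than Skinner's formulation: presses are reinforced only when the light is on, and the press probability under each stimulus condition drifts toward the outcome it produces.

    import random

    def train_discrimination(trials=5000, lr=0.05, seed=1):
        """Lever presses yield food only when the light is on; each press
        nudges the press probability for that stimulus condition toward
        1.0 if it was reinforced and toward 0.0 if it was not."""
        rng = random.Random(seed)
        p_press = {"light on": 0.5, "light off": 0.5}
        for _ in range(trials):
            state = rng.choice(["light on", "light off"])
            if rng.random() < p_press[state]:  # a press is emitted
                target = 1.0 if state == "light on" else 0.0
                p_press[state] += lr * (target - p_press[state])
        return p_press

    print(train_discrimination())
    # Pressing becomes likely with the light on and rare with it off:
    # the light now "sets the occasion" for the reinforced operant.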

3. Explaining complex behaviour

Most behaviour of humans cannot easily be described in terms of individual responses reinforced one by one, and Skinner devoted a great deal of effort to the problem of behavioural complexity. Some complex behaviour can be seen as a sequence of relatively simple responses, and here Skinner invoked the idea of "chaining". Chaining is based on the fact, experimentally demonstrated, that a discriminative stimulus not only sets the occasion for subsequent behaviour, but it can also reinforce a behaviour that precedes it. That is, a discriminative stimulus is also a "conditioned reinforcer". For example, the light that sets the occasion for lever pressing may also be used to reinforce "turning around" in the presence of a noise. This results in the sequence "noise – turn-around – light – press lever – food." Much longer chains can be built by adding more stimuli and responses.
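
That "noise – turn-around – light – press lever – food" chain can be written out directly. In this sketch (our framing), each stimulus sets the occasion for the next response while serving as a conditioned reinforcer for the response that produced it:

    # Each link pairs a discriminative stimulus with the response it
    # occasions; the next link's stimulus doubles as a conditioned
    # reinforcer, and food at the end is the primary reinforcer.
    chain = [
        ("noise", "turn around"),
        ("light", "press lever"),
        ("food", None),  # terminal link: primary reinforcer, no response
    ]

    for (stimulus, response), (next_stimulus, _) in zip(chain, chain[1:]):
        print(f"{stimulus}: occasions '{response}', reinforced by {next_stimulus}")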

However, Skinner recognised that a great deal of behaviour, especially human behaviour, cannot be accounted for by gradual shaping or the construction of response sequences. [31] Complex behaviour often appears suddenly in its final form, as when a person first finds his way to the elevator by following instructions given at the front desk. To account for such behaviour, Skinner introduced the concept of rule-governed behaviour. First, relatively simple behaviours come under the control of verbal stimuli: the child learns to "jump," "open the book," and so on. After a large number of responses come under such verbal control, a sequence of verbal stimuli can evoke an almost unlimited variety of complex responses. [31]

Reinforcement

Reinforcement, a key concept of behaviourism, is the primary process that shapes and controls behaviour, and occurs in two ways: positive and negative. In The Behavior of Organisms (1938), Skinner defined negative reinforcement as synonymous with punishment, i.e. the presentation of an aversive stimulus. This definition would subsequently be revised in Science and Human Behavior (1953).

In what has now become the standard set of definitions, positive reinforcement is the strengthening of behaviour by the occurrence of some event (e.g., praise after some behaviour is performed), whereas negative reinforcement is the strengthening of behaviour by the removal or avoidance of some aversive event (e.g., opening and raising an umbrella over your head on a rainy day is reinforced by the cessation of rain falling on you).

Both types of reinforcement strengthen behaviour, or increase the probability of a behaviour recurring; the difference lies in whether the reinforcing event is something applied (positive reinforcement) or something removed or avoided (negative reinforcement). Punishment can be the application of an aversive stimulus/event (positive punishment, or punishment by contingent stimulation) or the removal of a desirable stimulus (negative punishment, or punishment by contingent withdrawal). Though punishment is often used to suppress behaviour, Skinner argued that this suppression is temporary and has a number of other, often unwanted, consequences. [32] Extinction is the withholding of a rewarding stimulus, which weakens behaviour.

Writing in 1981, Skinner pointed out that Darwinian natural selection is, like reinforced behaviour, "selection by consequences." Though, as he said, natural selection has now "made its case," he regretted that essentially the same process, "reinforcement", was less widely accepted as underlying human behaviour. [33]

Schedules of reinforcement

Skinner recognised that behaviour is typically reinforced more than once, and, together with Charles Ferster, he carried out an extensive analysis of the various ways in which reinforcements could be arranged over time, calling them schedules of reinforcement. [10]

The most notable schedules of reinforcement studied by Skinner were continuous, interval (fixed or variable), and ratio (fixed or variable). All are methods used in operant conditioning; a toy implementation of each schedule is sketched after the list below.

  • Continuous reinforcement (CRF): each time a specific action is performed, the subject receives a reinforcement. This method is effective when teaching a new behaviour because it quickly establishes an association between the target behaviour and the reinforcer. [34]
  • Interval schedule: based on the time intervals between reinforcements. [7]
    • Fixed Interval schedule (FI): A procedure in which reinforcements are presented at fixed time periods, provided that the appropriate response is made. This schedule yields a response rate that is low just after reinforcement and becomes rapid just before the next reinforcement is scheduled.
    • Variable Interval schedule (VI): A procedure in which behaviour is reinforced after scheduled but unpredictable time durations following the previous reinforcement. This schedule yields the most stable rate of responding, with the average frequency of reinforcement determining the frequency of response.
    • Fixed Ratio schedule (FR): A procedure in which reinforcement is delivered after a specific number of responses have been made.
    • Variable Ratio schedule (VR): [7] A procedure in which reinforcement comes after a number of responses that is randomized from one reinforcement to the next (e.g. slot machines). The lower the number of responses required, the higher the response rate tends to be. Variable ratio schedules tend to produce very rapid and steady responding rates in contrast with fixed ratio schedules where the frequency of response usually drops after the reinforcement occurs.
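
As sketched below (our own toy formulation, with response counts and elapsed times as plain numbers, not Skinner's apparatus logic), each schedule reduces to a simple reward-decision rule:

    import random

    rng = random.Random(42)

    def fixed_ratio(responses_since_reward, ratio=10):
        # FR: reinforce exactly every `ratio` responses.
        return responses_since_reward >= ratio

    def variable_ratio(mean_ratio=10):
        # VR: reinforce each response with probability 1/mean_ratio, so the
        # number of responses required is randomised around the mean.
        return rng.random() < 1.0 / mean_ratio

    def fixed_interval(seconds_since_reward, interval=30.0):
        # FI: the first response after `interval` seconds is reinforced.
        return seconds_since_reward >= interval

    def variable_interval(seconds_since_reward, sampled_interval):
        # VI: as FI, but `sampled_interval` is redrawn after each reward,
        # e.g. sampled_interval = rng.expovariate(1 / 30.0).
        return seconds_since_reward >= sampled_interval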

    Token economy

    "Skinnerian" principles have been used to create token economies in a number of institutions, such as psychiatric hospitals. When participants behave in desirable ways, their behaviour is reinforced with tokens that can be changed for such items as candy, cigarettes, coffee, or the exclusive use of a radio or television set. [35]

    Verbal behaviour

    Challenged by Alfred North Whitehead during a casual discussion while at Harvard to provide an account of a randomly provided piece of verbal behaviour, [36] Skinner set about attempting to extend his then-new functional, inductive approach to the complexity of human verbal behaviour. [37] Developed over two decades, his work appeared in the book Verbal Behavior. Although Noam Chomsky was highly critical of Verbal Behavior, he conceded that Skinner's "S-R psychology" was worth a review. [38] (Behaviour analysts reject the "S-R" characterization: operant conditioning involves the emission of a response which then becomes more or less likely depending on its consequence.) [38]

    Verbal Behavior had an uncharacteristically cool reception, partly as a result of Chomsky's review and partly because of Skinner's failure to address or rebut any of Chomsky's criticisms. [39] Skinner's peers may also have been slow to adopt the ideas presented in Verbal Behavior because it lacked the empirical density that marked Skinner's experimental work. [40]

    Operant conditioning chamber

    An operant conditioning chamber (also known as a Skinner box) is a laboratory apparatus used in the experimental analysis of animal behaviour. It was invented by Skinner while he was a graduate student at Harvard University. As used by Skinner, the box had a lever (for rats) or a disk in one wall (for pigeons). A press on this "manipulandum" could deliver food to the animal through an opening in the wall, and responses reinforced in this way increased in frequency. By controlling this reinforcement together with discriminative stimuli such as lights and tones, or punishments such as electric shocks, experimenters have used the operant box to study a wide variety of topics, including schedules of reinforcement, discriminative control, delayed response ("memory"), punishment, and so on. By channeling research in these directions, the operant conditioning chamber has had a huge influence on the course of research in animal learning and its applications. It enabled great progress on problems that could be studied by measuring the rate, probability, or force of a simple, repeatable response. However, it discouraged the study of behavioural processes not easily conceptualised in such terms: spatial learning, in particular, which is now studied in quite different ways, for example, by the use of the water maze. [30]

    Cumulative recorder

    The cumulative recorder makes a pen-and-ink record of simple repeated responses. Skinner designed it for use with the operant chamber as a convenient way to record and view the rate of responses such as a lever press or a key peck. In this device, a sheet of paper gradually unrolls over a cylinder. Each response steps a small pen across the paper, starting at one edge; when the pen reaches the other edge, it quickly resets to the initial side. The slope of the resulting ink line graphically displays the rate of response: rapid responding yields a steeply sloping line, while slow responding yields a line of low slope. The cumulative recorder was a key tool in Skinner's analysis of behaviour, and it was very widely adopted by other experimenters, gradually falling out of use with the advent of the laboratory computer and line graphs. [41] Skinner's major experimental exploration of response rates, presented in his book with Charles Ferster, Schedules of Reinforcement, is full of cumulative records produced by this device. [10]
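
    The slope-equals-rate idea is easy to reproduce. Below is a crude text-mode sketch of a cumulative record (our own stand-in for pen and paper, omitting the pen reset at the paper's edge):

        def cumulative_record(response_times, width=60, height=12):
            """Plot cumulative responses against time using asterisks: steep
            regions mean rapid responding, flat regions mean slow responding."""
            t_max = max(response_times)
            per_column = [0] * (width + 1)
            for t in response_times:
                per_column[min(width, int(width * t / t_max))] += 1
            cumulative, total = [], 0
            for count in per_column:
                total += count
                cumulative.append(total)
            for row in range(height, 0, -1):
                print("".join(
                    "*" if c * height / total >= row else " "
                    for c in cumulative
                ))

        # 100 rapid responses, then 20 slow ones: the line visibly flattens.
        times = [0.1 * i for i in range(100)] + [10.0 + i for i in range(20)]
        cumulative_record(times)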

    Air crib

    The air crib is an easily cleaned, temperature- and humidity-controlled box-bed intended to replace the standard infant crib. [42] Skinner invented the device to help his wife cope with the day-to-day tasks of child rearing. It was designed to make early childcare simpler (by reducing laundry, diaper rash, cradle cap, etc.), while allowing the baby to be more mobile and comfortable, and less prone to cry. Reportedly it had some success in these goals. [43]

    The air crib was a controversial invention. It was popularly mischaracterized as a cruel pen, and it was often compared to Skinner's operant conditioning chamber (aka the 'Skinner Box'). This association with laboratory animal experimentation discouraged its commercial success, though several companies attempted production. [43] [44]

    In 2004, therapist Lauren Slater repeated unfounded rumors that Skinner had used his baby daughter in some of his experiments and that she had subsequently committed suicide. [45] His outraged daughter publicly accused Slater of giving new life to old lies about her and her father and of inventing new ones, and faulted her for not making a good-faith effort to check her facts before publishing. [46]

    Teaching machine

    The teaching machine was a mechanical device whose purpose was to administer a curriculum of programmed learning. The machine embodied key elements of Skinner's theory of learning and had important implications for education in general and classroom instruction in particular. [47]

    In one incarnation, the machine was a box that housed a list of questions that could be viewed one at a time through a small window. There was also a mechanism through which the learner could respond to each question. Upon delivering a correct answer, the learner would be rewarded. [48]

    Skinner advocated the use of teaching machines for a broad range of students (e.g., preschool aged to adult) and instructional purposes (e.g., reading and music). For example, one machine that he envisioned could teach rhythm. He wrote: [49]

    A relatively simple device supplies the necessary contingencies. The student taps a rhythmic pattern in unison with the device. "Unison" is specified very loosely at first (the student can be a little early or late at each tap) but the specifications are slowly sharpened. The process is repeated for various speeds and patterns. In another arrangement, the student echoes rhythmic patterns sounded by the machine, though not in unison, and again the specifications for an accurate reproduction are progressively sharpened. Rhythmic patterns can also be brought under the control of a printed score.

    The instructional potential of the teaching machine stemmed from several factors: it provided automatic, immediate and regular reinforcement without the use of aversive control; the material presented was coherent, yet varied and novel; and the pace of learning could be adjusted to suit the individual. As a result, students were interested, attentive, and learned efficiently by producing the desired behaviour, "learning by doing." [50]

    Teaching machines, though perhaps rudimentary, were not rigid instruments of instruction. They could be adjusted and improved based upon the students' performance. For example, if a student made many incorrect responses, the machine could be reprogrammed to provide less advanced prompts or questions—the idea being that students acquire behaviours most efficiently if they make few errors. Multiple-choice formats were not well-suited for teaching machines because they tended to increase student mistakes, and the contingencies of reinforcement were relatively uncontrolled.

    Not only useful in teaching explicit skills, machines could also promote the development of a repertoire of behaviours that Skinner called self-management. Effective self-management means attending to stimuli appropriate to a task, avoiding distractions, reducing the opportunity of reward for competing behaviours, and so on. For example, machines encourage students to pay attention before receiving a reward. Skinner contrasted this with the common classroom practice of initially capturing students’ attention (e.g., with a lively video) and delivering a reward (e.g., entertainment) before the students have actually performed any relevant behaviour. This practice fails to reinforce correct behaviour and actually counters the development of self-management.

    Skinner pioneered the use of teaching machines in the classroom, especially at the primary level. Today computers run software that performs similar teaching tasks, and there has been a resurgence of interest in the topic related to the development of adaptive learning systems. [51]

    Pigeon-guided missile

    During World War II, the US Navy required a weapon effective against surface ships, such as the German Bismarck class battleships. Although missile and TV technology existed, the size of the primitive guidance systems available rendered automatic guidance impractical. To solve this problem, Skinner initiated Project Pigeon, [52] [53] which was intended to provide a simple and effective guidance system. This system divided the nose cone of a missile into three compartments, with a pigeon placed in each. Lenses projected an image of distant objects onto a screen in front of each bird. Thus, when the missile was launched from an aircraft within sight of an enemy ship, an image of the ship would appear on the screen. The screen was hinged, such that pecks at the image of the ship would guide the missile toward the ship. [54]

    Despite an effective demonstration, the project was abandoned, and eventually more conventional solutions, such as those based on radar, became available. Skinner complained that "our problem was no one would take us seriously." [55]

    Verbal summator

    Early in his career Skinner became interested in "latent speech" and experimented with a device he called the verbal summator. [56] This device can be thought of as an auditory version of the Rorschach inkblots. [56] When using the device, human participants listened to incomprehensible auditory "garbage" but often read meaning into what they heard. Thus, as with the Rorschach blots, the device was intended to yield overt behaviour that projected subconscious thoughts. Skinner's interest in projective testing was brief, but he later used observations with the summator in creating his theory of verbal behaviour. The device also led other researchers to invent new tests such as the tautophone test, the auditory apperception test, and the Azzageddi test. [57]

    Along with psychology, education has also been influenced by Skinner's views, which are extensively presented in his book The Technology of Teaching, as well as reflected in Fred S. Keller's Personalized System of Instruction and Ogden R. Lindsley's Precision Teaching.

    Skinner argued that education has two major purposes:

    1. to teach repertoires of both verbal and nonverbal behaviour and
    2. to interest students in learning.

    He recommended bringing students’ behaviour under appropriate control by providing reinforcement only in the presence of stimuli relevant to the learning task. Because he believed that human behaviour can be affected by small consequences, something as simple as "the opportunity to move forward after completing one stage of an activity" can be an effective reinforcer. Skinner was convinced that, to learn, a student must engage in behaviour, and not just passively receive information. [47] : 389

    Skinner believed that effective teaching must be based on positive reinforcement, which is, he argued, more effective at changing and establishing behaviour than punishment. He suggested that the main thing people learn from being punished is how to avoid punishment. For example, if a child is forced to practice playing an instrument, the child comes to associate practicing with punishment, dreads it, and seeks to avoid the instrument. This view had obvious implications for the then-widespread practice of rote learning and punitive discipline in education. The use of educational activities as punishment may induce rebellious behaviour such as vandalism or absence. [58]

    Because teachers are primarily responsible for modifying student behaviour, Skinner argued that teachers must learn effective ways of teaching. In The Technology of Teaching (1968), Skinner has a chapter on why teachers fail. [59] : 93–113 He says that teachers have not been given an in-depth understanding of teaching and learning. Without knowing the science underpinning teaching, teachers fall back on procedures that work poorly or not at all, such as:

    • using aversive techniques (which produce escape and avoidance and undesirable emotional effects)
    • relying on telling and explaining ("Unfortunately, a student does not learn simply when he is shown or told.") [59] : 103
    • failing to adapt learning tasks to the student's current level and
    • failing to provide positive reinforcement frequently enough.

    Skinner suggests that any age-appropriate skill can be taught. The steps are

    1. Clearly specify the action or performance the student is to learn.
    2. Break down the task into small achievable steps, going from simple to complex.
    3. Let the student perform each step, reinforcing correct actions.
    4. Adjust so that the student is always successful until finally the goal is reached.
    5. Shift to intermittent reinforcement to maintain the student's performance.

    Skinner is popularly known mainly for his books Walden Two (1948) and Beyond Freedom and Dignity (1971), for which he made the cover of TIME Magazine. [60] The former describes a fictional "experimental community" [61] in 1940s United States. The productivity and happiness of citizens in this community are far greater than in the outside world because the residents practice scientific social planning and use operant conditioning in raising their children.

    Walden Two, like Thoreau's Walden, champions a lifestyle that does not support war, or foster competition and social strife. It encourages a lifestyle of minimal consumption, rich social relationships, personal happiness, satisfying work, and leisure. [62] In 1967, Kat Kinkade and others founded the Twin Oaks Community, using Walden Two as a blueprint. The community still exists and continues to use the Planner-Manager system and other aspects of the community described in Skinner's book, though behaviour modification is not a community practice. [63]

    In Beyond Freedom and Dignity, Skinner suggests that a technology of behaviour could help to make a better society. We would, however, have to accept that an autonomous agent is not the driving force of our actions. Skinner offers alternatives to punishment, and challenges his readers to use science and modern technology to construct a better society.

    Skinner's political writings emphasized his hopes that an effective and human science of behavioural control – a technology of human behaviour – could help with problems as yet unsolved and often aggravated by advances in technology such as the atomic bomb. Indeed, one of Skinner's goals was to prevent humanity from destroying itself. [64] He saw political activity as the use of aversive or non-aversive means to control a population. Skinner favored the use of positive reinforcement as a means of control, citing Jean-Jacques Rousseau's novel Emile: or, On Education as an example of literature that "did not fear the power of positive reinforcement." [3]

    Skinner's book, Walden Two, presents a vision of a decentralized, localized society, which applies a practical, scientific approach and behavioural expertise to deal peacefully with social problems. (For example, his views led him to oppose corporal punishment in schools, and he wrote a letter to the California Senate that helped lead it to a ban on spanking. [65] ) Skinner's utopia is both a thought experiment and a rhetorical piece. In Walden Two, Skinner answers the problem that exists in many utopian novels – "What is the Good Life?" The book's answer is a life of friendship, health, art, a healthy balance between work and leisure, a minimum of unpleasantness, and a feeling that one has made worthwhile contributions to a society in which resources are ensured, in part, by minimizing consumption.

    If the world is to save any part of its resources for the future, it must reduce not only consumption but the number of consumers.

    Skinner described his novel as "my New Atlantis", in reference to Bacon's utopia. [66]

    When Milton's Satan falls from heaven, he ends in hell. And what does he say to reassure himself? 'Here, at least, we shall be free.' And that, I think, is the fate of the old-fashioned liberal. He's going to be free, but he's going to find himself in hell.

    One of Skinner's experiments examined the formation of superstition in one of his favorite experimental animals, the pigeon. Skinner placed a series of hungry pigeons in a cage attached to an automatic mechanism that delivered food to the pigeon "at regular intervals with no reference whatsoever to the bird's behaviour." [67] He discovered that the pigeons associated the delivery of the food with whatever chance actions they had been performing as it was delivered, and that they subsequently continued to perform these same actions. [67]

    One bird was conditioned to turn counter-clockwise about the cage, making two or three turns between reinforcements. Another repeatedly thrust its head into one of the upper corners of the cage. A third developed a 'tossing' response, as if placing its head beneath an invisible bar and lifting it repeatedly. Two birds developed a pendulum motion of the head and body, in which the head was extended forward and swung from right to left with a sharp movement followed by a somewhat slower return.

    Skinner suggested that the pigeons behaved as if they were influencing the automatic mechanism with their "rituals", and that this experiment shed light on human behaviour: [67]

    The experiment might be said to demonstrate a sort of superstition. The bird behaves as if there were a causal relation between its behaviour and the presentation of food, although such a relation is lacking. There are many analogies in human behaviour. Rituals for changing one's fortune at cards are good examples. A few accidental connections between a ritual and favorable consequences suffice to set up and maintain the behaviour in spite of many unreinforced instances. The bowler who has released a ball down the alley but continues to behave as if she were controlling it by twisting and turning her arm and shoulder is another case in point. These behaviours have, of course, no real effect upon one's luck or upon a ball half way down an alley, just as in the present case the food would appear as often if the pigeon did nothing—or, more strictly speaking, did something else.
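
    Skinner's adventitious-reinforcement account can be caricatured in a few lines of Python. This toy model is our own (and, per the next paragraph, the account itself is disputed): food arrives on a fixed, response-independent schedule, yet whichever action happens to precede it is strengthened, so one arbitrary "ritual" typically comes to dominate.

        import random

        def superstition(trials=3000, food_every=15, gain=1.1, seed=3):
            """Food is delivered on a fixed time schedule regardless of
            behaviour; the action emitted just before food is strengthened
            multiplicatively (a rich-get-richer dynamic)."""
            rng = random.Random(seed)
            actions = ["turn", "head-thrust", "toss", "pendulum"]
            weights = {a: 1.0 for a in actions}
            for t in range(1, trials + 1):
                chosen = rng.choices(actions,
                                     weights=[weights[a] for a in actions])[0]
                if t % food_every == 0:      # food, unrelated to behaviour...
                    weights[chosen] *= gain  # ...strengthens the last action
            return weights

        print(superstition())  # one action's weight typically dwarfs the rest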

    Modern behavioural psychologists have disputed Skinner's "superstition" explanation for the behaviours he recorded. Subsequent research (e.g. Staddon and Simmelhag, 1971), while finding similar behaviour, failed to find support for Skinner's "adventitious reinforcement" explanation for it. By looking at the timing of different behaviours within the interval, Staddon and Simmelhag were able to distinguish two classes of behaviour: the terminal response, which occurred in anticipation of food, and interim responses, that occurred earlier in the interfood interval and were rarely contiguous with food. Terminal responses seem to reflect classical (as opposed to operant) conditioning, rather than adventitious reinforcement, guided by a process like that observed in 1968 by Brown and Jenkins in their "autoshaping" procedures. The causation of interim activities (such as the schedule-induced polydipsia seen in a similar situation with rats) also cannot be traced to adventitious reinforcement and its details are still obscure (Staddon, 1977). [68]

    Noam Chomsky

    Noam Chomsky, a prominent critic of Skinner, published a review of Skinner's Verbal Behavior two years after the book appeared. [69] Chomsky argued that Skinner's attempt to use behaviourism to explain human language amounted to little more than word games: conditioned responses could not account for a child's ability to create or understand an infinite variety of novel sentences. Chomsky's review has been credited with launching the cognitive revolution in psychology and other disciplines. Skinner, who rarely responded directly to critics, never formally replied to Chomsky's critique. Many years later, Kenneth MacCorquodale's reply was endorsed by Skinner. [70]

    Chomsky also reviewed Skinner's Beyond Freedom and Dignity, using the same basic arguments as his Verbal Behavior review. Among Chomsky's criticisms were that Skinner's laboratory work could not be extended to humans; that when it was extended to humans it represented "scientistic" behaviour, attempting to emulate science without being scientific; that Skinner was not a scientist because he rejected the hypothetico-deductive model of theory testing; and that Skinner had no science of behaviour. [71]

    Psychodynamic psychology

    Skinner has been repeatedly criticized for his supposed animosity towards Sigmund Freud, psychoanalysis, and psychodynamic psychology. Some have argued, however, that Skinner shared several of Freud's assumptions, and that he was influenced by Freudian points of view in more than one field, among them the analysis of defense mechanisms, such as repression. [72] [73] To study such phenomena, Skinner even designed his own projective test, the "verbal summator" described above. [74]

    J. E. R. Staddon

    As understood by Skinner, ascribing dignity to individuals involves giving them credit for their actions. To say "Skinner is brilliant" means that Skinner is an originating force. If Skinner's determinist theory is right, he is merely the focus of his environment: he is not an originating force, and he had no choice in saying the things he said or doing the things he did. Skinner's environment and genetics both allowed and compelled him to write his book. Similarly, the environment and genetic potentials of the advocates of freedom and dignity cause them to resist the reality that their own activities are deterministically grounded. J. E. R. Staddon has argued the compatibilist position: [75] Skinner's determinism is not in any way contradictory to traditional notions of reward and punishment, contrary to what Skinner believed. [76] [77]


    Early Years

    B.F. Skinner described his Pennsylvania childhood as "warm and stable." As a boy, he enjoyed building and inventing things, a skill he would later use in his own psychological experiments. He received a B.A. in English literature in 1926 from Hamilton College, and spent some time as a struggling writer before discovering the writings of Watson and Pavlov. Inspired by these works, Skinner decided to abandon his career as a novelist and entered the psychology graduate program at Harvard University.

    Skinner married Yvonne Blue in 1936, and the couple went on to have two daughters, Julie and Deborah.


    The Long and Surprising History of ‘Teaching Machines’

    Long before the advent of personal computers, inventors and researchers created what they called “teaching machines” in hopes of revolutionizing education. Some of these creations date back to the 1920s, and were made from wood and brass.

    Yet today’s edtech leaders often ignore or choose to forget this history, argues Audrey Watters, a longtime critical observer of edtech, who calls it “historical amnesia of the past.”

    “It's part of the narrative of this idea [by edtech founders] that, ‘we're innovators and we came up with these ideas and no one, no one has ever thought of this stuff before. So thank goodness we're here to save education,’” she says. “I wanted to show that in fact, people have been using technology in the classroom since the beginning, … and these ideas of personalized learning in particular aren't new either.”

    Watters traces the history of these pre-computer-age gadgets in her new book, “Teaching Machines: The History of Personalized Learning.” We connected with Watters for this week’s EdSurge Podcast.

    She argues that it’s important for today’s educators and policy leaders to know this history to understand the types of people and institutions who have pushed to bring automation to education. Since the beginning, she adds, there has been a contradiction between the promise of making learning more personalized and the reality that teaching machines often required a higher level of standardization.

    “I wanted to tell a story that didn’t have anything to do with computers,” she says, noting how mechanical early teaching machines were. “Because I think that too often in edtech, we get so hung up on the tech,” she adds. “We're so committed to talking about the latest gadget, the new software, this or that app, that we really act as though somehow it's that the tech is all there is to talk about. That the tech is the driving force of change. That the tech is the driving force of history.”

    Instead, Watters says, people have been driving an idea of how to use tech in education, an idea that supports a specific narrative of what education should be.

    One of the people she focuses on in the book is B.F. Skinner, who invented an early teaching machine when he was a psychology professor at Harvard in the 1950s. He was the kind of public intellectual who would be giving TED talks if he were alive today, and he was a leading proponent of “behaviorism,” which argued that people’s actions can be shaped (some might say manipulated) through positive and negative reinforcements.

    He often experimented on pigeons, some of which he even trained to play ping-pong. “And that was really the foundation for his education technology—that we'll build machines and they'll give students—just like pigeons—positive reinforcement and students—just like pigeons—will learn new skills.”

    Much of B.F. Skinner's research involved training pigeons to do things like play ping-pong.

    Skinner made a case that his machines would let students learn at their own pace. But Watters worries that by design, the machines limited what students would learn. “There was very, very little freedom in Skinner's vision,” says Watters. “Indeed Skinner wrote a very well-known book, ‘Beyond Freedom and Dignity,’ in the early 1970s, in which he said freedom doesn't exist. Freedom is a facade.”

    After visiting his daughter's school, Skinner was inspired to build a 'teaching machine' of his own.

    Watters is known for her long-running blog, “Hack Education,” and for her skepticism of and critical look at the edtech industry.

    “A lot of people actually accused me of being a pessimist,” she says. “And I'm not a pessimist. I actually am hopeful.”

    She says she is currently reading the book “Hope in the Dark,” by Rebecca Solnit, about how studying history can be an important source of hope for the future.

    “We're despairing when we don't know the past,” says Watters. “We're despairing when we don't know that people have fought back before, that people have resisted before.”

    In her research she discovered people who resisted the teaching machines and determinist philosophy of folks like Skinner: “That's the place I find hope today, is where I see students who are questioning, students who are resisting and communities who are building practices that serve their needs rather than serving the needs of engineers.”

    Jeffrey R. Young (@jryoung) is producer and host of the EdSurge Podcast and the managing editor of EdSurge.



    ARE THEORIES OF LEARNING NECESSARY? [1]

    First published in Psychological Review, 57, 193-216.

    Certain basic assumptions, essential to any scientific activity, are sometimes called theories. That nature is orderly rather than capricious is an example. Certain statements are also theories simply to the extent that they are not yet facts. A scientist may guess at the result of an experiment before the experiment is carried out. The prediction and the later statement of result may be composed of the same terms in the same syntactic arrangement, the difference being in the degree of confidence. No empirical statement is wholly non-theoretical in this sense, because evidence is never complete, nor is any prediction probably ever made wholly without evidence. The term "theory" will not refer here to statements of these sorts but rather to any explanation of an observed fact which appeals to events taking place somewhere else, at some other level of observation, described in different terms, and measured, if at all, in different dimensions.

    Three types of theory in the field of learning satisfy this definition. The most characteristic is to be found in the field of physiological psychology. We are all familiar with the changes that are supposed to take place in the nervous system when an organism learns. Synaptic connections are made or broken, electrical fields are disrupted or reorganized, concentrations of ions are built up or allowed to diffuse away, and so on. In the science of neurophysiology statements of this sort are not necessarily theories in the present sense. But in a science of behavior, where we are concerned with whether or not an organism secretes saliva when a bell rings, or jumps toward a gray triangle, or says bik when a card reads tuz, or loves someone who resembles his mother, all statements about the nervous system are theories in the sense that they are not expressed in the same terms and could not be confirmed with the same methods of observation as the facts for which they are said to account.

    A second type of learning theory is in practice not far from the physiological, although there is less agreement about the method of direct observation. Theories of this type have always dominated the field of human behavior. They consist of references to "mental" events, as in saying that an organism learns to behave in a certain way because it "finds something pleasant" or because it "expects something to happen." To the mentalistic psychologist these explanatory events are no more theoretical than synaptic connections to the neurophysiologist, but in a science of behavior they are theories because the methods [p. 194] and terms appropriate to the events to be explained differ from the methods and terms appropriate to the explaining events.

    In a third type of learning theory the explanatory events are not directly observed. The writer's suggestion that the letters CNS be regarded as representing, not the Central Nervous System, but the Conceptual Nervous System (2, p. 421), seems to have been taken seriously. Many theorists point out that they are not talking about the nervous system as an actual structure undergoing physiological or bio-chemical changes but only as a system with a certain dynamic output. Theories of this sort are multiplying fast, and so are parallel operational versions of mental events. A purely behavioral definition of expectancy has the advantage that the problem of mental observation is avoided and with it the problem of how a mental event can cause a physical one. But such theories do not go so far as to assert that the explanatory events are identical with the behavioral facts which they purport to explain. A statement about behavior may support such a theory but will never resemble it in terms or syntax. Postulates are good examples. True postulates cannot become facts. Theorems may be deduced from them which, as tentative statements about behavior, may or may not be confirmed, but theorems are not theories in the present sense. Postulates remain theories until the end.

    It is not the purpose of this paper to show that any of these theories cannot be put in good scientific order, or that the events to which they refer may not actually occur or be studied by appropriate sciences. It would be foolhardy to deny the achievements of theories of this sort in the history of science. The question of whether they are necessary, however, has other implications and is worth asking. If the answer is no, then it may be possible to argue effectively against theory in the field of learning. A science of behavior must eventually deal with behavior in its relation to certain manipulable variables. Theories -- whether neural, mental, or conceptual -- talk about intervening steps in these relationships. But instead of prompting us to search for and explore relevant variables, they frequently have quite the opposite effect. When we attribute behavior to a neural or mental event, real or conceptual, we are likely to forget that we still have the task of accounting for the neural or mental event. When we assert that an animal acts in a given way because it expects to receive food, then what began as the task of accounting for learned behavior becomes the task of accounting for expectancy. The problem is at least equally complex and probably more difficult. We are likely to close our eyes to it and to use the theory to give us answers in place of the answers we might find through further study. It might be argued that the principal function of learning theory to date has been, not to suggest appropriate research, but to create a false sense of security, an unwarranted satisfaction with the status quo.

    Research designed with respect to theory is also likely to be wasteful. That a theory generates research does not prove its value unless the research is valuable. Much useless experimentation results from theories, and much energy and skill are absorbed by them. Most theories are eventually overthrown, and the greater part of the associated research is discarded. This could be justified if it were true that productive research requires a theory, as is, of course, often claimed. It is argued that research would be aimless and disorganized without a theory to guide it. The view is supported by psychological texts that take their cue from the logicians rather than empirical science and [p. 195] describe thinking as necessarily involving stages of hypothesis, deduction, experimental test, and confirmation. But this is not the way most scientists actually work. It is possible to design significant experiments for other reasons and the possibility to be examined is that such research will lead more directly to the kind of information that a science usually accumulates.

    The alternatives are at least worth considering. How much can be done without theory? What other sorts of scientific activity are possible? And what light do alternative practices throw upon our present preoccupation with theory?

    It would be inconsistent to try to answer these questions at a theoretical level. Let us therefore turn to some experimental material in three areas in which theories of learning now flourish and raise the question of the function of theory in a more concrete fashion.[2]

    The Basic Datum in Learning

    What actually happens when an organism learns is not an easy question. Those who are interested in a science of behavior will insist that learning is a change in behavior, but they tend to avoid explicit references to responses or acts as such. "Learning is adjustment, or adaptation to a situation." But of what stuff are adjustments and adaptations made? Are they data, or inferences from data? "Learning is improvement." But improvement in what? And from whose point of view? "Learning is restoration of equilibrium." But what is in equilibrium and how is it put there? "Learning is problem solving." But what are the physical dimensions of a problem -- or of a solution? Definitions of this sort show an unwillingness to take what appears before the eyes in a learning experiment as a basic datum. Particular observations seem too trivial. An error score falls but we are not ready to say that this is learning rather than merely the result of learning. An organism meets a criterion of ten successful trials but an arbitrary criterion is at variance with our conception of the generality of the learning process.

    This is where theory steps in. If it is not the time required to get out of a puzzle box that changes in learning, but rather the strength of a bond, or the conductivity of a neural pathway, or the excitatory potential of a habit, then problems seem to vanish. Getting out of a box faster and faster is not learning; it is merely performance. The learning goes on somewhere else, in a different dimensional system. And although the time required depends upon arbitrary conditions, often varies discontinuously, and is subject to reversals of magnitude, we feel sure that the learning process itself is continuous, orderly, and beyond the accidents of measurement. Nothing could better illustrate the use of theory as a refuge from the data.

    But we must eventually get back to an observable datum. If learning is the process we suppose it to be, then it must appear so in the situations in which we study it. Even if the basic process belongs to some other dimensional system, our measures must have relevant and comparable properties. But productive experimental situations are hard to find, particularly if we accept certain plausible restrictions. To show an orderly change in the behavior of the average rat or ape or child is not enough, since learning is a process in the behavior of [p. 196] the individual. To record the beginning and end of learning or a few discrete steps will not suffice, since a series of cross-sections will not give complete coverage of a continuous process. The dimensions of the change must spring from the behavior itself; they must not be imposed by an external judgment of success or failure or an external criterion of completeness. But when we review the literature with these requirements in mind, we find little justification for the theoretical process in which we take so much comfort.

    The energy level or work-output of behavior, for example, does not change in appropriate ways. In the sort of behavior adapted to the Pavlovian experiment (respondent behavior) there may be a progressive increase in the magnitude of response during learning. But we do not shout our responses louder and louder as we learn verbal material, nor does a rat press a lever harder and harder as conditioning proceeds. In operant behavior the energy or magnitude of response changes significantly only when some arbitrary value is differentially reinforced -- when such a change is what is learned.

    The emergence of a right response in competition with wrong responses is another datum frequently used in the study of learning. The maze and the discrimination box yield results which may be reduced to these terms. But a behavior-ratio of right vs. wrong cannot yield a continuously changing measure in a single experiment on a single organism. The point at which one response takes precedence over another cannot give us the whole history of the change in either response. Averaging curves for groups of trials or organisms will not solve this problem.

    Increasing attention has recently been given to latency, the relevance of which, like that of energy level, is suggested by the properties of conditioned and unconditioned reflexes. But in operant behavior the relation to a stimulus is different. A measure of latency involves other considerations, as inspection of any case will show. Most operant responses may be emitted in the absence of what is regarded as a relevant stimulus. In such a case the response is likely to appear before the stimulus is presented. It is no solution to escape this embarrassment by locking a lever so that an organism cannot press it until the stimulus is presented, since we can scarcely be content with temporal relations that have been forced into compliance with our expectations. Runway latencies are subject to this objection. In a typical experiment the door of a starting box is opened and the time that elapses before a rat leaves the box is measured. Opening the door is not only a stimulus, it is a change in the situation that makes the response possible for the first time. The time measured is by no means as simple as a latency and requires another formulation. A great deal depends upon what the rat is doing at the moment the stimulus is presented. Some experimenters wait until the rat is facing the door, but to do so is to tamper with the measurement being taken. If, on the other hand, the door is opened without reference to what the rat is doing, the first major effect is the conditioning of favorable waiting behavior. The rat eventually stays near and facing the door. The resulting shorter starting-time is not due to a reduction in the latency of a response, but to the conditioning of favorable preliminary behavior.


    Latencies in a single organism do not follow a simple learning process. Relevant data on this point were obtained as part of an extensive study of reaction time. A pigeon, enclosed in a box, is conditioned to peck at a recessed disc in one wall. Food is presented as reinforcement by exposing a hopper through [p. 197] a hole below the disc. If responses are reinforced only after a stimulus has been presented, responses at other times disappear. Very short reaction times are obtained by differentially reinforcing responses which occur very soon after the stimulus (4). But responses also come to be made very quickly without differential reinforcement. Inspection shows that this is due to the development of effective waiting. The bird comes to stand before the disc with its head in good striking position. Under optimal conditions, without differential reinforcement, the mean time between stimulus and response will be of the order of 1/3 sec. This is not a true reflex latency, since the stimulus is discriminative rather than eliciting, but it is a fair example of the latency used in the study of learning. The point is that this measure does not vary continuously or in an orderly fashion. By giving the bird more food, for example, we induce a condition in which it does not always respond. But the responses that occur show approximately the same temporal relation to the stimulus (Fig. 1, middle curve). In extinction, of special interest here, there is a scattering of latencies because lack of reinforcement generates an emotional condition. Some responses occur sooner and others are delayed, but the commonest value remains unchanged (bottom curve in Fig. 1). The longer latencies are easily explained by inspection. Emotional behavior, of which examples will be mentioned later, is likely to be in progress when the ready-signal is presented. It is often not discontinued before the "go" signal is presented, and the result is a long starting-time. Cases also begin to appear in which the bird simply does not respond at all during a specified time. If we average a large number of readings, either from one bird or many, we may create what looks like a progressive lengthening of latency. But the data for an individual organism do not show a continuous process.
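
    Skinner's point about averaged latencies is easy to reproduce numerically. The sketch below (Python; all the numbers are illustrative assumptions, not data from the paper) mixes a fixed common latency with a growing share of delayed "emotional" responses: the mean climbs while the typical value barely moves.

    ```python
    import random

    random.seed(1)

    def latency(p_delayed: float) -> float:
        """One simulated latency: most responses cluster near the modal
        1/3-second value; a fraction are delayed by ongoing emotional
        behavior (all numbers here are illustrative assumptions)."""
        if random.random() < p_delayed:
            return 0.33 + random.expovariate(1 / 2.0)  # long, scattered delays
        return random.gauss(0.33, 0.05)                # unchanged common value

    for p in (0.0, 0.2, 0.4):  # growing share of delayed responses
        samples = sorted(latency(p) for _ in range(10_000))
        mean = sum(samples) / len(samples)
        median = samples[len(samples) // 2]
        print(f"p_delayed={p:.1f}  mean={mean:.2f}s  median={median:.2f}s")
    ```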

    Another datum to be examined is the rate at which a response is emitted. Fortunately the story here is different. We study this rate by designing a situation in which a response may be freely repeated, choosing a response (for example, touching or pressing a small lever or key) that may be easily observed and counted. The responses may be recorded on a polygraph, but a more convenient form is a cumulative curve from which rate of responding is immediately read as slope. The rate at which a response is emitted in such a situation comes close to our preconception of the learning-process. As the organism learns, the rate rises. As it unlearns (for example, in extinction) the rate falls. Various sorts of discriminative stimuli may be brought into control of the response with corresponding modifications of the rate. Motivational changes alter the rate in a sensitive way. So do those events which we speak of as generating emotion. The range through which the rate varies significantly may be of the order of 1000:1. Changes in rate are satisfactorily smooth in the individual case, so that it is not necessary to average [p. 198] cases. A given value is often quite stable: in the pigeon a rate of four or five thousand responses per hour may be maintained without interruption for as long as fifteen hours.
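
    A cumulative record is equally simple to mock up. In the sketch below (Python; the response times are fabricated), the local slope of the count-versus-time record is the rate of responding, the datum being recommended here.

    ```python
    # Build a cumulative record from a list of response times (seconds).
    # The slope of the record over any window is the rate of responding.

    def cumulative_record(response_times):
        """(time, cumulative count) pairs, ready for plotting."""
        return [(t, i + 1) for i, t in enumerate(sorted(response_times))]

    def rate(response_times, t0, t1):
        """Responses per second between t0 and t1 (the slope there)."""
        n = sum(1 for t in response_times if t0 <= t < t1)
        return n / (t1 - t0)

    # Fabricated example: ~1.5 responses/s early, ~0.25/s late, roughly
    # the shape of an extinction curve.
    times = [i / 1.5 for i in range(90)] + [60 + i / 0.25 for i in range(30)]
    print(cumulative_record(times)[:2])  # first points of the record
    print(rate(times, 0, 60))            # ~1.5
    print(rate(times, 60, 180))          # ~0.25
    ```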

    Rate of responding appears to be the only datum that varies significantly and in the expected direction under conditions which are relevant to the "learning process." We may, therefore, be tempted to accept it as our long-sought-for measure of strength of bond, excitatory potential, etc. Once in possession of an effective datum, however, we may feel little need for any theoretical construct of this sort. Progress in a scientific field usually waits upon the discovery of a satisfactory dependent variable. Until such a variable has been discovered, we resort to theory. The entities which have figured so prominently in learning theory have served mainly as substitutes for a directly observable and productive datum. They have little reason to survive when such a datum has been found.

    It is no accident that rate of responding is successful as a datum, because it is particularly appropriate to the fundamental task of a science of behavior. If we are to predict behavior (and possibly to control it), we must deal with probability of response. The business of a science of behavior is to evaluate this probability and explore the conditions that determine it. Strength of bond, expectancy, excitatory potential, and so on, carry the notion of probability in an easily imagined form, but the additional properties suggested by these terms have hindered the search for suitable measures. Rate of responding is not a "measure" of probability but it is the only appropriate datum in a formulation in these terms.

    As other scientific disciplines can attest, probabilities are not easy to handle. We wish to make statements about the likelihood of occurrence of a single future response, but our data are in the form of frequencies of responses that have already occurred. These responses were presumably similar to each other and to the response to be predicted. But this raises the troublesome problem of response-instance vs. response-class. Precisely what responses are we to take into account in predicting a future instance? Certainly not the responses made by a population of different organisms, for such a statistical datum raises more problems than it solves. To consider the frequency of repeated responses in an individual demands something like the experimental situation just described.

    This solution of the problem of a basic datum is based upon the view that operant behavior is essentially an emissive phenomenon. Latency and magnitude of response fail as measures because they do not take this into account. They are concepts appropriate to the field of the reflex, where the all but invariable control exercised by the eliciting stimulus makes the notion of probability of response trivial. Consider, for example, the case of latency. Because of our acquaintance with simple reflexes we infer that a response that is more likely to be emitted will be emitted more quickly. But is this true? What can the word "quickly" mean? Probability of response, as well as prediction of response, is concerned with the moment of emission. This is a point in time, but it does not have the temporal dimension of a latency. The execution may take time after the response has been initiated, but the moment of occurrence has no duration.[3] In recognizing [p. 199] the emissive character of operant behavior and the central position of probability of response as a datum, latency is seen to be irrelevant to our present task.

    Various objections have been made to the use of rate of responding as a basic datum. For example, such a program may seem to bar us from dealing with many events which are unique occurrences in the life of the individual. A man does not decide upon a career, get married, make a million dollars, or get killed in an accident often enough to make a rate of response meaningful. But these activities are not responses. They are not simple unitary events lending themselves to prediction as such. If we are to predict marriage, success, accidents, and so on, in anything more than statistical terms, we must deal with the smaller units of behavior which lead to and compose these unitary episodes. If the units appear in repeatable form, the present analysis may be applied. In the field of learning a similar objection takes the form of asking how the present analysis may be extended to experimental situations in which it is impossible to observe frequencies. It does not follow that learning is not taking place in such situations. The notion of probability is usually extrapolated to cases in which a frequency analysis cannot be carried out. In the field of behavior we arrange a situation in which frequencies are available as data, but we use the notion of probability in analyzing and formulating instances or even types of behavior which are not susceptible to this analysis.

    Another common objection is that a rate of response is just a set of latencies and hence not a new datum at all. This is easily shown to be wrong. When we measure the time elapsing between two responses, we are in no doubt as to what the organism was doing when we started our clock. We know that it was just executing a response. This is a natural zero -- quite unlike the arbitrary point from which latencies are measured. The free repetition of a response yields a rhythmic or periodic datum very different from latency. Many periodic physical processes suggest parallels.

    We do not choose rate of responding as a basic datum merely from an analysis of the fundamental task of a science of behavior. The ultimate appeal is to its success in an experimental science. The material which follows is offered as a sample of what can be done. It is not intended as a complete demonstration, but it should confirm the fact that when we are in possession of a datum which varies in a significant fashion, we are less likely to resort to theoretical entities carrying the notion of probability of response.

    We may define learning as a change in probability of response but we must also specify the conditions under which it comes about. To do this we must survey some of the independent variables of which probability of response is [p. 200] a function. Here we meet another kind of learning theory.

    An effective class-room demonstration of the Law of Effect may be arranged in the following way. A pigeon, reduced to 80 per cent of its ad lib weight, is habituated to a small, semi-circular amphitheatre and is fed there for several days from a food hopper, which the experimenter presents by closing a hand switch. The demonstration consists of establishing a selected response by suitable reinforcement with food. For example, by sighting across the amphitheatre at a scale on the opposite wall, it is possible to present the hopper whenever the top of the pigeon's head rises above a given mark. Higher and higher marks are chosen until, within a few minutes, the pigeon is walking about the cage with its head held as high as possible. In another demonstration the bird is conditioned to strike a marble placed on the floor of the amphitheatre. This may be done in a few minutes by reinforcing successive steps. Food is presented first when the bird is merely moving near the marble, later when it looks down in the direction of the marble, later still when it moves its head toward the marble, and finally when it pecks it. Anyone who has seen such a demonstration knows that the Law of Effect is no theory. It simply specifies a procedure for altering the probability of a chosen response.
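
    The head-raising demonstration is, in effect, a loop that reinforces any response exceeding a criterion and then raises the criterion. A minimal sketch of that logic (Python; the "bird" is a toy stand-in of my own devising, not a model from the paper):

    ```python
    import random

    random.seed(0)

    typical_height = 0.0  # toy state: the bird's current typical head height

    def observe() -> float:
        """The head bobs randomly around its current typical height."""
        return typical_height + random.uniform(-1.0, 1.0)

    def reinforce() -> None:
        """Reinforced postures become more typical: behavior shifts upward."""
        global typical_height
        typical_height += 0.3

    mark = 0.5  # the sighting mark on the opposite wall
    for _ in range(200):
        if observe() > mark:              # head rises above the given mark
            reinforce()                   # present the food hopper
            mark = typical_height + 0.5   # choose a higher mark
    print(f"typical head height after shaping: {typical_height:.1f}")
    ```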

    But when we try to say why reinforcement has this effect, theories arise. Learning is said to take place because the reinforcement is pleasant, satisfying, tension reducing, and so on. The converse process of extinction is explained with comparable theories. If the rate of responding is first raised to a high point by reinforcement and reinforcement then withheld, the response is observed to occur less and less frequently thereafter. One common theory explains this by asserting that a state is built up which suppresses the behavior. This "experimental inhibition" or "reaction inhibition" must be assigned to a different dimensional system, since nothing at the level of behavior corresponds to opposed processes of excitation and inhibition. Rate of responding is simply increased by one operation and decreased by another. Certain effects commonly interpreted as showing release from a suppressing force may be interpreted in other ways. Disinhibition, for example, is not necessarily the uncovering of suppressed strength; it may be a sign of supplementary strength from an extraneous variable. The process of spontaneous recovery, often cited to support the notion of suppression, has an alternative explanation, to be noted in a moment.

    Let us evaluate the question of why learning takes place by turning again to some data. Since conditioning is usually too rapid to be easily followed, the process of extinction will provide us with a more useful case. A number of different types of curves have been consistently obtained from rats and pigeons using various schedules of prior reinforcement. By considering some of the relevant conditions we may see what room is left for theoretical processes.

    The mere passage of time between conditioning and extinction is a variable that has surprisingly little effect. The rat is too short-lived to make an extended experiment feasible, but the pigeon, which may live ten or fifteen years, is an ideal subject. More than five years ago, twenty pigeons were conditioned to strike a large translucent key upon which a complex visual pattern was projected. Reinforcement was contingent upon the maintenance of a high and steady rate of responding and upon striking a particular feature of the visual pattern. These birds were set aside in order to study retention. They were transferred to the usual living [p. 201] quarters, where they served as breeders. Small groups were tested for extinction at the end of six months, one year, two years, and four years. Before the test each bird was transferred to a separate living cage. A controlled feeding schedule was used to reduce the weight to approximately 80 per cent of the ad lib weight. The bird was then fed in the dimly lighted experimental apparatus in the absence of the key for several days, during which emotional responses to the apparatus disappeared. On the day of the test the bird was placed in the darkened box. The translucent key was present but not lighted. No responses were made. When the pattern was projected upon the key, all four birds responded quickly and extensively. Fig. 2 shows the largest curve obtained. This bird struck the key within two seconds after presentation of a visual pattern that it had not seen for four years, and at the precise spot upon which differential reinforcement had previously been based. It continued to respond for the next hour, emitting about 700 responses. This is of the order of one-half to one-quarter of the responses it would have emitted if extinction had not been delayed four years, but otherwise, the curve is fairly typical.

    Level of motivation is another variable to be taken into account. An example of the effect of hunger has been reported elsewhere (3). The response of pressing a lever was established in eight rats with a schedule of periodic reinforcement. They were fed the main part of their ration on alternate days so that the rates of responding on successive days were alternately high and low. Two subgroups of four rats each were matched on the basis of the rate maintained under periodic reinforcement under these conditions. The response was then extinguished -- in one group on alternate days when the hunger was high, in the other group on alternate days when the hunger was low. (The same amount of food was eaten on the non-experimental days as before.) The result is shown in Fig. 3. The upper graph gives the raw data. The levels of hunger are indicated by the points at P on the abscissa, the rates prevailing under periodic reinforcement. The subsequent points show the decline in extinction. If we multiply the lower curve through by a factor chosen to superimpose the points at P, the curves are reasonably closely superimposed, as shown in the lower graph. Several other experiments on both rats and pigeons have confirmed this general principle. If a given ratio of responding prevails under periodic reinforcement, the slopes of later extinction curves show the same ratio. Level of hunger determines the slope of the extinction curve but not its curvature.
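
    The superposition test in Fig. 3 amounts to asking whether the two extinction curves differ only by a multiplicative constant. A sketch of that check (Python; the rate values are invented), which applies equally to the preference curves of Fig. 13 later on:

    ```python
    # Do two declining response-rate curves share the same curvature,
    # differing only in slope by a constant factor? Skinner reports this
    # for extinction under high vs. low hunger.

    def superimposes(curve_a, curve_b, tol=0.05):
        """True if k * curve_b matches curve_a, with k chosen to match
        the first points (the rates prevailing at P)."""
        k = curve_a[0] / curve_b[0]
        return all(abs(a - k * b) <= tol * a for a, b in zip(curve_a, curve_b))

    high_hunger = [100, 70, 50, 36, 26]    # invented rates per session
    low_hunger = [40, 28, 20, 14.4, 10.4]  # same shape, scaled by 0.4

    print(superimposes(high_hunger, low_hunger))  # True
    ```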

    [p. 202] Another variable, difficulty of response, is especially relevant because it has been used to test the theory of reaction inhibition (1), on the assumption that a response requiring considerable energy will build up more reaction inhibition than an easy response and lead, therefore, to faster extinction. The theory requires that the curvature of the extinction curve be altered, not merely its slope. Yet there is evidence that difficulty of response acts like level of hunger simply to alter the slope. Some data have been reported but not published (5). A pigeon is suspended in a jacket which confines its wings and legs but leaves its head and neck free to respond to a key and a food magazine. Its behavior in this situation is quantitatively much like that of a bird moving freely in an experimental box. But the use of the jacket has the advantage that the response to the key may be made easy or difficult by changing the distance the bird must reach. In one experiment these distances were expressed in seven equal but arbitrary units. At distance 7 the bird could barely reach the key, at 3 it could strike without appreciably extending its neck. Periodic reinforcement gave a straight base-line upon which it was possible to observe the effect of difficulty by quickly changing position during the experimental period. Each of the five records in Fig. 4 covers a fifteen minute experimental period under periodic reinforcement. Distances of the bird from the key are indicated by numerals above the records. It will be observed that the rate of responding at distance 7 is generally quite low while that at distance 3 is high. Intermediate distances produce intermediate slopes. It should also be noted that the change from one position to another is felt immediately. If repeated responding in a difficult position were to build a considerable amount of reaction inhibition, we should expect the rate to be low for some little time after returning to an easy response. Contrariwise, if an easy response were to build little reaction inhibition, we should expect a fairly high rate of responding for some time after a difficult position is assumed. Nothing like this occurs. The "more rapid extinction" of a difficult response is an ambiguous expression. The slope constant is affected and with it the number of responses in extinction to a criterion, but there may be no effect upon curvature.

    One way of considering the question of why extinction curves are curved is to regard extinction as a process of exhaustion [p. 203] comparable to the loss of heat from source to sink or the fall of the level of a reservoir when an outlet is opened. Conditioning builds up a predisposition to respond -- a "reserve" -- which extinction exhausts. This is perhaps a defensible description at the level of behavior. The reserve is not necessarily a theory in the present sense, since it is not assigned to a different dimensional system. It could be operationally defined as a predicted extinction curve, even though, linguistically, it makes a statement about the momentary condition of a response. But it is not a particularly useful concept, nor does the view that extinction is a process of exhaustion add much to the observed fact that extinction curves are curved in a certain way.


    There are, however, two variables that affect the rate, both of which operate during extinction to alter the curvature. One of these falls within the field of emotion. When we fail to reinforce a response that has previously been reinforced, we not only initiate a process of extinction, we set up an emotional response -- perhaps what is often meant by frustration. The pigeon coos in an [p. 204] identifiable pattern, moves rapidly about the cage, defecates, or flaps its wings rapidly in a squatting position that suggests treading (mating) behavior. This competes with the response of striking a key and is perhaps enough to account for the decline in rate in early extinction. It is also possible that the probability of a response based upon food deprivation is directly reduced as part of such an emotional reaction. Whatever its nature, the effect of this variable is eliminated through adaptation. Repeated extinction curves become smoother, and in some of the schedules to be described shortly there is little or no evidence of an emotional modification of rate.

    A second variable has a much more serious effect. Maximal responding during extinction is obtained only when the conditions under which the response was reinforced are precisely reproduced. A rat conditioned in the presence of a light will not extinguish fully in the absence of the light. It will begin to respond more rapidly when the light is again introduced. This is true for other kinds of stimuli, as the following classroom experiment illustrates. Nine pigeons were conditioned to strike a yellow triangle under intermittent reinforcement. In the session represented by Fig. 5 the birds were first reinforced on this schedule for 30 minutes. The combined cumulative curve is essentially a straight line, showing more than 1100 responses per bird during this period. A red triangle was then substituted for the yellow and no responses were reinforced thereafter. The effect was a sharp drop in responding, with only a slight recovery during the next fifteen minutes. When the yellow triangle was replaced, rapid responding began immediately and the usual extinction curve followed. Similar experiments have shown that the pitch of an incidental tone, the shape of a pattern being struck, or the size of a pattern, if present during conditioning, will to some extent control the rate of responding during extinction. Some properties are more effective than others, and a quantitative evaluation is possible. By changing to several values of a stimulus in random order repeatedly during the extinction process, the gradient for stimulus generalization may be read directly in the rates of responding under each value.
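
    Reading the generalization gradient out of such a session is pure bookkeeping: accumulate responses and exposure time separately for each stimulus value presented in random order, then compare rates. A sketch (Python; the stimulus values and counts are invented for illustration):

    ```python
    from collections import defaultdict

    # (stimulus value, responses emitted, seconds of exposure) for each
    # randomized presentation during extinction -- invented numbers.
    presentations = [
        (550, 40, 60), (570, 22, 60), (590, 9, 60),
        (570, 18, 60), (550, 35, 60), (590, 7, 60),
    ]

    responses = defaultdict(int)
    exposure = defaultdict(int)
    for value, n, secs in presentations:
        responses[value] += n
        exposure[value] += secs

    # The gradient: rate of responding as a function of stimulus value,
    # falling off with distance from the conditioned value (550 here).
    for value in sorted(responses):
        print(value, round(responses[value] / exposure[value], 2))
    ```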

    Something very much like this must go on during extinction. Let us suppose that all responses to a key have been reinforced and that each has been followed by a short period of eating. When we extinguish the behavior, we create a situation in which responses are not reinforced, in which no eating takes place, and in which there are probably new emotional responses. The situation could easily be as novel as a red triangle after a yellow. If so, it could explain the decline in rate during extinction. We might have obtained a [p. 205] smooth curve, shaped like an extinction curve, between the vertical lines in Fig. 5 by gradually changing the color of the triangle from yellow to red. This might have happened even though no other sort of extinction were taking place. The very conditions of extinction seem to presuppose a growing novelty in the experimental situation. Is this why the extinction curve is curved?

    Some evidence comes from the data of "spontaneous recovery." Even after prolonged extinction an organism will often respond at a higher rate for at least a few moments at the beginning of another session. One theory contends that this shows spontaneous recovery from some sort of inhibition, but another explanation is possible. No matter how carefully an animal is handled, the stimulation coincident with the beginning of an experiment must be extensive and unlike anything occurring in the later part of an experimental period. Responses have been reinforced in the presence of, or shortly following, this stimulation. In extinction it is present for only a few moments. When the organism is again placed in the experimental situation, the stimulation is restored; further responses are emitted as in the case of the yellow triangle. The only way to achieve full extinction in the presence of the stimulation of starting an experiment is to start the experiment repeatedly.

    Other evidence of the effect of novelty comes from the study of periodic reinforcement. The fact that intermittent reinforcement produces bigger extinction curves than continuous reinforcement is a troublesome difficulty for those who expect a simple relation between number of reinforcements and number of responses in extinction. But this relation is actually quite complex. One result of periodic reinforcement is that emotional changes adapt out. This may be responsible for the smoothness of subsequent extinction curves but probably not for their greater extent. The latter may be attributed to the lack of novelty in the extinction situation. Under periodic reinforcement many responses are made without reinforcement and when no eating has recently taken place. The situation in extinction is therefore not wholly novel.

    Periodic reinforcement is not, however, a simple solution. If we reinforce [p. 206] on a regular schedule -- say, every minute -- the organism soon forms a discrimination. Little or no responding occurs just after reinforcement, since stimulation from eating is correlated with absence of subsequent reinforcement. How rapidly the discrimination may develop is shown in Fig. 6, which reproduces the first five curves obtained from a pigeon under periodic reinforcement in experimental periods of fifteen minutes each. In the fifth period (or after about one hour of periodic reinforcement) the discrimination yields a pause after each reinforcement, resulting in a markedly stepwise curve. As a result of this discrimination the bird is almost always responding rapidly when reinforced. This is the basis for another discrimination. Rapid responding becomes a favorable stimulating condition. A good example of the effect upon the subsequent extinction curve is shown in Fig. 7. This pigeon had been reinforced once every minute during daily experimental periods of fifteen minutes each for several weeks. In the extinction curve shown, the bird begins to respond at the rate prevailing under the preceding schedule. A quick positive acceleration at the start is lost in the reduction of the record. The pigeon quickly reaches and sustains a rate that is higher than the overall-rate during periodic reinforcement. During this period the pigeon creates a stimulating condition previously optimally correlated with reinforcement. Eventually, as some sort of exhaustion intervenes, the rate falls off rapidly to a much lower but fairly stable value and then to practically zero. A condition then prevails under which a response is not normally reinforced. The bird is therefore not likely to begin to respond again. When it does respond, however, the situation is slightly improved and, if it continues to respond, the conditions rapidly become similar to those under which reinforcement has been received. Under this "autocatalysis" a high rate is quickly reached, and more than 500 responses are emitted, in a second burst. The rate then declines quickly and fairly smoothly, again to nearly zero. This curve is not by any means disorderly. Most of the curvature is smooth. But the burst of responding at forty-five minutes shows a considerable residual strength which, if extinction were merely exhaustion, should have appeared earlier in the curve. The curve may be reasonably accounted for by assuming that the [p. 207] bird is largely controlled by the preceding spurious correlation between reinforcement and rapid responding.
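
    The contingency in such a regular ("periodic") schedule is easy to state in code: reinforcement becomes available once the interval has elapsed, and the first response after that moment collects it. A sketch of the scheduling logic only (Python; it says nothing about the bird's behavior):

    ```python
    # Periodic (fixed-interval) scheduling logic: reinforcement becomes
    # available INTERVAL seconds after the last reinforcement, and the
    # first response after that moment is reinforced. All other responses
    # go unreinforced, which is why a post-reinforcement pause can
    # develop as a discrimination.

    INTERVAL = 60.0  # reinforced once every minute

    def run_schedule(response_times):
        """Return the times of the reinforced responses."""
        reinforced = []
        available_at = INTERVAL
        for t in sorted(response_times):
            if t >= available_at:
                reinforced.append(t)
                available_at = t + INTERVAL
        return reinforced

    print(run_schedule([10, 30, 61, 65, 122, 130, 190]))  # [61, 122, 190]
    ```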

    This assumption may be checked by constructing a schedule of reinforcement in which a differential contingency between rate of responding and reinforcement is impossible. In one such schedule of what may be called "aperiodic reinforcement" one interval between successive reinforced responses is so short that no unreinforced responses intervene while the longest interval is about two minutes. Other intervals are distributed arithmetically between these values, the average remaining one minute. The intervals are roughly randomized to compose a program of reinforcement. Under this program the probability of reinforcement does not change with respect to previous reinforcements, and the curves never acquire the stepwise character of curve E in Fig. 6. (Fig. 9 shows curves from a similar program.) As a result no correlation between different rates of responding and different probabilities of reinforcement can develop.
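
    The arithmetic program can be generated directly from that description: intervals evenly spaced between a very short value and about two minutes, averaging one minute, in roughly random order. A sketch (Python; the count of thirteen intervals is my assumption, since the text gives only the endpoints and the average):

    ```python
    import random

    random.seed(0)

    # Arithmetic series of interreinforcement intervals, in seconds.
    # Endpoints and the one-minute average follow the text; the number
    # of intervals is an assumption.
    N = 13
    SHORTEST, LONGEST = 1.0, 119.0
    step = (LONGEST - SHORTEST) / (N - 1)
    intervals = [SHORTEST + i * step for i in range(N)]
    print(sum(intervals) / N)   # 60.0 seconds: the one-minute average

    random.shuffle(intervals)   # roughly randomized program
    print([round(i) for i in intervals])
    ```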

    An extinction curve following a brief exposure to aperiodic reinforcement is shown in Fig. 8. It begins characteristically at the rate prevailing under aperiodic reinforcement and, unlike the curve following regular periodic reinforcement, does not accelerate to a higher overall rate. There is no evidence of the "autocatalytic" production of an optimal stimulating condition. Also characteristically, there are no significant discontinuities or sudden changes in rate in either direction. The curve extends over a period of eight hours, as against not quite two hours in Fig. 7, and seems to represent a single orderly process. The total number of responses is higher, perhaps because of the greater time allowed for emission. All of this can be explained by the single fact that we have made it impossible for the pigeon to form a pair of discriminations based, first, upon stimulation from eating and, second, upon stimulation from rapid responding.

    Since the longest interval between reinforcement was only two minutes, a certain novelty must still have been introduced as time passed. Whether this explains the curvature in Fig. 8 may be tested to some extent with other programs of reinforcement containing much longer intervals. A geometric progression was constructed by beginning with 10 seconds as the shortest interval and repeatedly multiplying through by a ratio of 1.54. This yielded a set of intervals averaging 5 minutes, the longest of which was more than 21 minutes. Such a set was randomized in a program [p. 208] of reinforcement repeated every hour.
    In changing to this program from the arithmetic series, the rates first declined during the longer intervals, but the pigeons were soon able to sustain a constant rate of responding under it. Two records in the form in which they were recorded are shown in Fig. 9. (The pen resets to zero after every thousand responses. In order to obtain a single cumulative curve it would be necessary to cut the record and to piece the sections together to yield a continuous line. The raw form may be reproduced with less reduction.) Each reinforcement is represented by a horizontal dash. The time covered is about 3 hours. Records are shown for two pigeons that maintained different overall rates under this program of reinforcement.
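
    The geometric program can be reconstructed the same way: start at 10 seconds and multiply repeatedly by 1.54. A sketch (Python; the term count is my assumption, so the computed average and maximum land near, though not exactly on, the quoted figures):

    ```python
    # Geometric series of interreinforcement intervals: 10 s shortest,
    # each interval 1.54 times the last. Twelve terms is an assumption;
    # the text gives only the start, the ratio, the ~5-minute average,
    # and the >21-minute maximum.
    SHORTEST, RATIO, N = 10.0, 1.54, 12

    intervals = [SHORTEST * RATIO ** i for i in range(N)]
    print(f"average: {sum(intervals) / N / 60:.1f} min")  # ~4.6 min
    print(f"longest: {max(intervals) / 60:.1f} min")      # ~19.3 min
    ```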

    Under such a schedule a constant rate of responding is sustained for at least 21 minutes without reinforcement, after which a reinforcement is received. Less novelty should therefore develop during succeeding extinction. In Curve 1 of Fig. 10 the pigeon had been exposed to several sessions of several hours each with this geometric set of intervals. The number of responses emitted in extinction is about twice that of the curve in Fig. 8 after the arithmetic set of intervals averaging one minute, but the curves are otherwise much alike. Further exposure to the geometric schedule builds up longer runs during which the rate does not change significantly. Curve 2 followed Curve 1 after two and one-half hours of further aperiodic reinforcement. On the day shown in Curve 2 a few aperiodic reinforcements were first given, as marked at the beginning of the curve. When reinforcement was discontinued, a fairly constant rate of responding prevailed for several thousand responses. After another experimental session of two and one-half hours with the geometric series, Curve 3 was recorded. This session also began with a short series of aperiodic reinforcements, followed by a sustained run of more than 6000 unreinforced responses with little change in rate (A). There seems to be no reason why other series averaging perhaps more than five minutes per interval and containing much longer exceptional intervals would not carry such a straight line much further.

    In this attack upon the problem of extinction we create a schedule of reinforcement which is so much like the conditions that will prevail during extinction that no decline in rate takes [p. 209] place for a long time. In other words we generate extinction with no curvature. Eventually some kind of exhaustion sets in, but it is not approached gradually. The last part of Curve 3 (unfortunately much reduced in the figure) may possibly suggest exhaustion in the slight overall curvature, but it is a small part of the whole process. The record is composed mainly of runs of a few hundred responses each, most of them at approximately the same rate as that maintained under periodic reinforcement. The pigeon stops abruptly; when it starts to respond again, it quickly reaches the rate of responding under which it was reinforced. This recalls the spurious correlation between rapid responding and reinforcement under regular reinforcement. We have not, of course, entirely eliminated this correlation. Even though there is no longer a differential reinforcement of high against low rates, practically all reinforcements have occurred under a constant rate of responding.

    Further study of reinforcing schedules may or may not answer the question of whether the novelty appearing in the extinction situation is entirely responsible for the curvature. It would appear to be necessary to make the conditions prevailing during extinction identical with the conditions prevailing during conditioning. This may be impossible, but in that case the question is academic. The hypothesis, meanwhile, is not a theory in the present sense, [p. 210] since it makes no statements about a parallel process in any other universe of discourse.[4]

    The study of extinction after different schedules of aperiodic reinforcement is not addressed wholly to this hypothesis. The object is an economical description of the conditions prevailing during reinforcement and extinction and of the relations between them. In using rate of responding as a basic datum we may appeal to conditions that are observable and manipulable and we may express the relations between them in objective terms. To the extent that our datum makes this possible, it reduces the need for theory. When we observe a pigeon emitting 7000 responses at a constant rate without reinforcement, we are not likely to explain an extinction curve containing perhaps a few hundred responses by appeal to the piling up of reaction inhibition or any other fatigue product. Research which is conducted without commitment to theory is more likely to carry the study of extinction into new areas and new orders of magnitude. By hastening the accumulation of data, we speed the departure of theories. If the theories have played no part in the design of our experiments, we need not be sorry to see them go.

    A third type of learning theory is illustrated by terms like preferring, choosing, discriminating, and matching. An effort may be made to define these solely in terms of behavior, but in traditional practice they refer to processes in another dimensional system. A response to one of two available stimuli may be called choice, but it is commoner to say that it is the result of choice, meaning by the latter a theoretical pre-behavioral activity. The higher mental processes are the best examples of theories of this sort; neurological parallels have not been well worked out. The appeal to theory is encouraged by the fact that choosing (like discriminating, matching, and so on) is not a particular piece of behavior. It is not a response or an act with specified topography. The term characterizes a larger segment of behavior in relation to other variables or events. Can we formulate and study the behavior to which these terms would usually be applied without recourse to the theories which generally accompany them?

    Discrimination is a relatively simple case. Suppose we find that the probability of emission of a given response is not significantly affected by changing from one of two stimuli to the other. We then make reinforcement of the response contingent upon the presence of one of them. The well-established result is that the probability of response remains high under this stimulus and reaches a very low point under the other. We say that the organism now discriminates between the stimuli. But discrimination is not itself an action, or necessarily even a unique process. Problems in the field of discrimination may be stated in other terms. How much induction obtains between stimuli of different magnitudes or classes? What are the smallest differences in stimuli that yield a difference in control? And so on. Questions of this sort do not presuppose theoretical activities in other dimensional systems.


    A somewhat larger segment must be specified in dealing with the behavior of choosing one of two concurrent stimuli. This has been studied in the pigeon by examining responses to two keys differing [p. 211] in position (right or left) or in some property like color randomized with respect to position. By occasionally reinforcing a response on one key or the other without favoring either key, we obtain equal rates of responding on the two keys. The behavior approaches a simple alternation from one key to the other. This follows the rule that tendencies to respond eventually correspond to the probabilities of reinforcement. Given a system in which one key or the other is occasionally connected with the magazine by an external clock, then if the right key has just been struck, the probability of reinforcement via the left key is higher than that via the right since a greater interval of time has elapsed during which the clock may have closed the circuit to the left key. But the bird's behavior does not correspond to this probability merely out of respect for mathematics. The specific result of such a contingency of reinforcement is that changing-to-the-other-key-and-striking is more often reinforced than striking-the-same-key-a-second-time. We are no longer dealing with just two responses. In order to analyze "choice" we must consider a single final response, striking, without respect to the position or color of the key, and in addition the responses of changing from one key or color to the other.
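
    The clock-driven contingency can be simulated to show why changing over is differentially reinforced. In the sketch below (Python; all parameters are invented), reinforcement is set up on each key at random moments and held until collected, so a key left unstruck for longer is more likely to pay off when the bird changes to it.

    ```python
    import random

    random.seed(2)

    P_SETUP = 0.02  # chance per step that the external clock arms a key

    def simulate(steps=200_000):
        armed = {"L": False, "R": False}
        current = "R"
        tallies = {"stay": [0, 0], "switch": [0, 0]}  # [reinforced, total]
        for _ in range(steps):
            for key in armed:  # the clock arms each key independently
                if random.random() < P_SETUP:
                    armed[key] = True
            nxt = random.choice(("L", "R"))  # undirected responding
            kind = "stay" if nxt == current else "switch"
            tallies[kind][1] += 1
            if armed[nxt]:          # reinforcement was waiting on that key
                armed[nxt] = False  # collected
                tallies[kind][0] += 1
            current = nxt
        for kind, (hits, total) in tallies.items():
            print(f"{kind}: reinforced on {hits / total:.1%} of responses")

    simulate()  # switching pays off more often than staying
    ```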

    Quantitative results are compatible with this analysis. If we periodically reinforce responses to the right key only, the rate of responding on the right will rise while that on the left will fall. The response of changing-from-right-to-left is never reinforced while the response of changing-from-left-to-right is occasionally so. When the bird is striking on the right, there is no great tendency to change keys; when it is striking on the left, there is a strong tendency to change. Many more responses come to be made to the right key. The need for considering the behavior of changing over is clearly shown if we now reverse these conditions and reinforce responses to the left key only. The ultimate result is a high rate of responding on the left key and a low rate on the right. By reversing the conditions again the high rate can be shifted back to the right key. In Fig. 11 a group of eight curves has been averaged to follow this change during six experimental periods of 45 minutes each. Beginning on the second day in the graph, responses to the right key (R_R) decline in extinction while responses to the left key (R_L) increase through periodic reinforcement. The mean rate shows no significant variation, since periodic reinforcement is continued on the same schedule. The mean rate reflects the strength of the response of striking a key regardless of position. The distribution of responses between right and left depends upon the relative strength of the responses of changing over. If this were simply a case of the extinction of one response and the concurrent reconditioning of another, the mean curve would not remain approximately horizontal, since reconditioning occurs much more rapidly than extinction.[5]

    The rate at which the bird changes from one key to the other depends upon the distance between the keys. This distance is a rough measure of the stimulus-difference between the two keys. It also determines the scope of the response of changing over, with an implied difference in sensory feedback. It also modifies the spread of reinforcement to responses supposedly not reinforced, since if the keys are close together, a response reinforced on one side may occur sooner after a preceding response on the other side. In Fig. 11 the two keys were about one inch apart. They were therefore fairly similar with respect to position in the experimental box. Changing from one to the other involved a minimum of sensory feedback, and reinforcement of a response to one key could follow very shortly upon a response to the other. When the keys are separated by as much as four inches, the change in strength is much more rapid. Fig. 12 shows two curves recorded simultaneously from a single pigeon during one experimental period of about 40 minutes. A high rate to the right key and a low rate to the left had previously been established. In the figure no responses to the right were reinforced, but those to the left were reinforced every minute, as indicated by the vertical dashes above curve L. The slope of R declines in a fairly smooth fashion while that of L increases, also fairly smoothly, to a value comparable to the initial value of R. The bird has conformed to the changed contingency within a single experimental period. The mean rate of responding is shown by a dotted line, which again shows no significant curvature.

    What is called "preference" enters into this formulation. At any stage of the process shown in Fig. 12 preference might be expressed in terms of the relative rates of responding to the two keys. This preference, however, is not in striking a key but in changing from one key to the other. The probability that the bird will strike a key regardless of its identifying properties behaves independently of the preferential response of changing from one key to the other. Several experiments have revealed an additional fact. A preference remains fixed if reinforcement is withheld. Fig. 13 is an example. It shows simultaneous extinction curves from two keys during seven daily experimental periods of one hour each. Prior to extinction the relative strength of the responses of changing-to-R and changing-to-L yielded a "preference" of about 3 to 1 for R. The constancy of the rate throughout the process of extinction has been shown in the figure by multiplying L through by a suitable constant and entering the points as small circles on R. If extinction altered the preference, the two curves could not be superimposed in this way.

    These formulations of discrimination and choosing enable us to deal with what is generally regarded as a much more complex process -- matching to sample. Suppose we arrange three translucent keys, each of which may be illuminated with red or green light. The middle key functions as the sample and we color it either red or green in random order. We color the two side keys one red and one green, also in random order. The "problem" is to strike the side key which corresponds in color to the middle key. There are only four three-key patterns in such a case, and it is possible that a pigeon could learn to make an appropriate response to each pattern. This does not happen, at least within the temporal span of the experiments to date. If we simply present a series of settings of the three colors and reinforce successful responses, the pigeon will strike the side keys without respect to color or pattern and be reinforced 50 per cent of the time. This is, in effect, a schedule of "fixed ratio" reinforcement which is adequate to maintain a high rate of responding.
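
    The 50 per cent figure follows directly from the four equiprobable patterns, as a short simulation confirms. The trial generator below is only a guess at the programming, made for illustration:

        import random

        TRIALS = 100_000
        COLORS = ("red", "green")
        wins = 0
        for _ in range(TRIALS):
            sample = random.choice(COLORS)             # middle key, random order
            left = random.choice(COLORS)               # side keys: one red, one green
            right = "green" if left == "red" else "red"
            pecked = random.choice((left, right))      # a side key struck blindly
            wins += pecked == sample
        print(wins / TRIALS)                           # ~0.5, pattern by pattern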

    Nevertheless it is possible to get a pigeon to match to sample by reinforcing the discriminative responses of striking-red-after-being-stimulated-by-red and striking-green-after-being-stimulated-by-green while extinguishing the other two possibilities. The difficulty is in arranging the proper stimulation at the time of the response. The sample might be made conspicuous -- for example, by having the sample color in the general illumination of the experimental box. In such a case the pigeon would learn to strike red keys in a red light and green keys in a green light (assuming a neutral illumination of the background of the keys). But a procedure which holds more closely to the notion of matching is to induce the pigeon to "look at the sample" by means of a separate reinforcement. We may do this by presenting the color on the middle key first, leaving the side keys uncolored. A response to the middle key is then reinforced (secondarily) by illuminating the side keys. The pigeon learns to make two responses in quick succession -- to the middle key and then to one side key. The response to the side key follows quickly upon the visual stimulation from the middle key, which is the requisite condition for a discrimination. Successful matching was readily established in all ten pigeons tested with this technique. Choosing the opposite is also easily set up. The discriminative response of striking-red-after-being-stimulated-by-red is apparently no easier to establish than striking-red-after-being-stimulated-by-green. When the response is to a key of the same color, however, generalization may make it possible for the bird to match a new color. This is an extension of the notion of matching that has not yet been studied with this method.

    Even when matching behavior has been well established, the bird will not respond correctly if all three keys are now presented at the same time. The bird does not possess strong behavior of looking at the sample. The experimenter must maintain a separate reinforcement to keep this behavior in strength. In monkeys, apes, and human subjects the ultimate success in choosing is apparently sufficient to reinforce and maintain the behavior of looking at the sample. It is possible that this species difference is simply a difference in the temporal relations required for reinforcement.

    The behavior of matching survives unchanged when all reinforcement is withheld. An intermediate case has been established in which the correct matching response is only periodically reinforced. In one experiment one color appeared on the middle key for one minute; it was then changed or not changed, at random, to the other color. A response to this key illuminated the side keys, one red and one green, in random order. A response to a side key cut off the illumination of both side keys until the middle key had again been struck. The apparatus recorded all matching responses on one graph and all non-matching on another. Pigeons which have acquired matching behavior under continuous reinforcement have maintained this behavior when reinforced no oftener than once per minute on the average. They may make thousands of matching responses per hour while being reinforced for no more than sixty of them. This schedule will not necessarily develop matching behavior in a naive bird, for the problem can be solved in three ways. The bird will receive practically as many reinforcements if it responds to (1) only one key or (2) only one color, since the programming of the experiment makes any persistent response eventually the correct one.
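
    A toy model shows why the spurious solutions pay so well. The discrete-minute clock, the arming probability, and the rule that a setup is held until collected are all assumptions about the programming; but under them a bird that works only when the sample is red still collects essentially every reinforcement, merely a little later:

        import random

        MINUTES = 10_000
        armed = False            # at most one reinforcement set up at a time
        armed_total = 0
        collected = 0

        for minute in range(MINUTES):
            if not armed and random.random() < 0.5:   # clock arms a setup (assumed rate)
                armed = True
                armed_total += 1
            sample_is_red = random.random() < 0.5     # sample changed or not, at random
            # the "spurious" bird pecks the sample, then the red side key, only on
            # red-sample minutes; a held setup simply waits out the green ones
            if armed and sample_is_red:
                collected += 1
                armed = False

        print(collected, "of", armed_total, "setups collected")   # all, or all but one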

    A sample of the data obtained in a complex experiment of this sort is given in Fig. 14. Although this pigeon had learned to match color under continuous reinforcement, it changed to the spurious solution of a color preference under periodic reinforcement. Whenever the sample was red, it struck both the sample and the red side key and received all reinforcements. When the sample was green, it did not respond and the side keys were not illuminated. The result shown at the beginning of the graph in Fig. 14 is a high rate of responding on the upper graph, which records matching responses. (The record is actually step-wise, following the presence or absence of the red sample, but this is lost in the reduction in the figure.) A color preference, however, is not a solution to the problem of opposites. By changing to this problem, it was possible to change the bird's behavior as shown between the two vertical lines in the figure. The upper curve between these lines shows the decline in matching responses which had resulted from the color preference. The lower curve between the same lines shows the development of responding to and matching the opposite color. At the second vertical line the reinforcement was again made contingent upon matching. The upper curve shows the reestablishment of matching behavior while the lower curve shows a decline in striking the opposite color. The result was a true solution: the pigeon struck the sample, no matter what its color, and then the corresponding side key. The lighter line connects the means of a series of points on the two curves. It seems to follow the same rule as in the case of choosing: changes in the distribution of responses between two keys do not involve the over-all rate of responding to a key. This mean rate will not remain constant under the spurious solution achieved with a color preference, as at the beginning of this figure.

    These experiments on a few higher processes have necessarily been very briefly described. They are not offered as proving that theories of learning are not necessary, but they may suggest an alternative program in this difficult area. The data in the field of the higher mental processes transcend single responses or single stimulus-response relationships. But they appear to be susceptible to formulation in terms of the differentiation of concurrent responses, the discrimination of stimuli, the establishment of various sequences of responses, and so on. There seems to be no a priori reason why a complete account is not possible without appeal to theoretical processes in other dimensional systems.

    Perhaps to do without theories altogether is a tour de force that is too much to expect as a general practice. Theories are fun. But it is possible that the most rapid progress toward an understanding of learning may be made by research that is not designed to test theories. An adequate impetus is supplied by the inclination to obtain data showing orderly changes characteristic of the learning process. An acceptable scientific program is to collect data of this sort and to relate them to manipulable variables, selected for study through a common-sense exploration of the field.

    This does not exclude the possibility of theory in another sense. Beyond the collection of uniform relationships lies the need for a formal representation of the data reduced to a minimal number of terms. A theoretical construction may yield greater generality than any assemblage of facts. But such a construction will not refer to another dimensional system and will not, therefore, fall within our present definition. It will not stand in the way of our search for functional relations because it will arise only after relevant variables have been found and studied. Though it may be difficult to understand, it will not be easily misunderstood, and it will have none of the objectionable effects of the theories here considered.

    We do not seem to be ready for theory in this sense. At the moment we make little effective use of empirical, let alone rational, equations. A few of the present curves could have been fairly closely fitted. But the most elementary preliminary research shows that there are many relevant variables, and until their importance has been experimentally determined, an equation that allows for them will have so many arbitrary constants that a good fit will be a matter of course and a cause for very little satisfaction.
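
    The arithmetic of the complaint can be shown in a few lines. In the sketch below -- synthetic data and an arbitrary choice of curve -- ten adjustable constants fit ten noisy points essentially perfectly, whatever process generated them:

        import numpy as np

        rng = np.random.default_rng(0)
        t = np.linspace(0.1, 1.0, 10)          # ten observation times, arbitrary units
        data = 100 * (1 - np.exp(-3 * t)) + rng.normal(0, 5, t.size)   # noisy curve

        coeffs = np.polyfit(t, data, deg=9)    # ten arbitrary constants for ten points
        residuals = data - np.polyval(coeffs, t)
        # residuals are essentially zero: the good fit is a matter of course,
        # and it certifies nothing about the underlying law
        print(np.max(np.abs(residuals)))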

    [MS. received December 5, 1949]

    [1] Address of the president, Midwestern Psychological Association, Chicago, Illinois, May, 1949.

    [2] Some of the material that follows was obtained in 1941-42 in a cooperative study on the behavior of the pigeon in which Keller Breland, Norman Guttman, and W. K. Estes collaborated. Some of it is selected from subsequent, as yet unpublished, work on the pigeon conducted by the author at Indiana University and Harvard University. Limitations of space make it impossible to report full details here.

    [3] It cannot, in fact, be shortened or lengthened. Where a latency appears to be forced toward a minimal value by differential reinforcement, another interpretation is called for. Although we may differentially reinforce more energetic behavior or the faster execution of behavior after it begins, it is meaningless to speak of differentially reinforcing responses with short or long latencies. What we actually reinforce differentially are (a) favorable waiting behavior and (b) more vigorous responses. When we ask a subject to respond "as soon as possible" in the human reaction-time experiment, we essentially ask him (a) to carry out as much of the response as possible without actually reaching the criterion of emission, (b) to do as little else as possible, and (c) to respond energetically after the stimulus has been given. This may yield a minimal measurable time between stimulus and response, but this time is not necessarily a basic datum nor have our instructions altered it as such. A parallel interpretation of the differential reinforcement of long "latencies" is required. In the experiments with pigeons previously cited, preliminary behavior is conditioned that postpones the responses to the key until the proper time. Behavior that "marks time" is usually conspicuous.

    [4] It is true that it appeals to stimulation generated in part by the pigeon's own behavior. This may be difficult to specify or manipulate, but it is not theoretical in the present sense. So long as we are willing to assume a one-to-one correspondence between action and stimulation, a physical specification is possible.

    [5] Two topographically independent responses, capable of emission at the same time and hence not requiring change-over, show separate processes of reconditioning and extinction, and the combined rate of responding varies.

