Association for the Study of Canadian Radio and Television
at the Learned Societies Conference
May 28, 1996
Storage and retrieval of moving images:
a research agenda
©James M. Turner
Professeur adjoint
École de bibliothéconomie et des sciences de
l'information
Université de Montréal
voice +1 514 343 2454
fax +1 514 343 5753
turner@ere.umontreal.ca
http://tornade.ere.umontreal.ca/~turner
Table of contents
- Background
- Overview of research projects
- The 1994 study
- The 1995 study
- Summary of results
- Gender differences
- Transferability of language
- Upcoming replication studies
- Other upcoming work
- Using audio description texts
- Conclusion
- References
- To be of use to researchers, archival film and television material needs to
be indexed at the shot level
- In terms of storage and retrieval technology, this means that each shot
needs to have its own record in an online database
- From a technical point of view, it is not yet practical to include the moving
image along with the textual metainformation, so databases have
information about the shots, but not the shots themselves
- Even if the shots could be included, the metainformation is still crucial to
searching, so it is worth investing our time in the creation of this
information
- Because of costs involved in creating the metainformation, any advances in
automating the process are useful
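As an illustration of what a shot-level record might look like, here is a minimal Python sketch; the record fields, shot ids, and example data are hypothetical, not taken from any actual system:

```python
from dataclasses import dataclass, field

# Minimal sketch of a shot-level database: one record per shot, holding
# only textual metainformation (the moving image itself is not stored).
@dataclass
class ShotRecord:
    shot_id: str                 # hypothetical identifier
    timecode_in: str             # e.g. "00:00:10:00"
    timecode_out: str
    description: str             # running description (visual synopsis)
    index_terms: list = field(default_factory=list)

def search(records, term):
    """Return the ids of shots whose index terms include the query term."""
    return [r.shot_id for r in records if term in r.index_terms]

records = [
    ShotRecord("S001", "00:00:10:00", "00:00:14:12",
               "A sailboat crosses the harbour at dusk",
               ["sailboat", "harbour"]),
    ShotRecord("S002", "00:00:14:13", "00:00:19:02",
               "Children play in a park",
               ["children", "park"]),
]
print(search(records, "sailboat"))  # -> ['S001']
```

Because retrieval runs entirely against the textual metainformation, the quality of the index terms determines what can be found, which is why investing in their creation pays off.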
- Previous work in the area of shot-level indexing for storage and retrieval
(Turner 1994) showed that users tend to provide retrieval cues by naming
objects and events that appear in the frame
- Further research (Turner 1995) indicated that the terms named most
often for a given shot also appear in a very large number of cases in the
indexing created by professional indexers, as well as in the running
description created for use in the database
- Another study looked at gender differences in describing the same images
for storage and retrieval
- Present research (Turner 1996) is studying the transferability of the
findings with French-language data
- Next, all these studies will be repeated with a new sample. The results
from the four data sets combined (2 English and 2 French, 800 participants
in total) are expected to confirm the original results
- Expected outcome: results strongly suggest that shot-level indexing of the
subject matter of moving images can be successfully automated on the
basis of textual representations
- Future work will look at automatically generating a shot-level index from
the textual representation accompanying moving images
- Another study will investigate whether the narration texts created for
audio description can be recycled as the source for automatically
generating shot-level indexes to the described material.
- The focus is subject access to "ordinary" (i.e. non-art) images which are
potential elements of visual products, catalogued individually in order to
permit storage and retrieval
- An important finding is that when participants are asked to supply
indexing terms (without being told so explicitly), what takes place is
essentially a naming activity: most participants simply named what they
saw in the pictures
- However, people seeing the same visual stimulus do not necessarily put
the same name to it
- The terms supplied by participants in the study fall into patterns similar to
those found in data for other types of stimuli, notably that few words are
named many times and most terms are named only once (Zipf's
distribution)
- The words named most often fared quite well, being supplied on average
by almost 60% of the participants, in spite of many ties for top term (of
the 44 shots used, 17 had a single most popular term, 16 had two terms
competing, and 11 shots had 3 or more terms competing)
- These words are the focus of our interest, since they are the potential
indexing terms
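The counting involved can be sketched in a few lines of Python; the participant responses below are invented for illustration, but they show the Zipf-like shape described above, with one term dominating while most terms appear only once:

```python
from collections import Counter

# Invented responses from participants asked to describe one shot.
responses = ["boat", "sailboat", "boat", "sea", "boat", "ship",
             "boat", "water", "sailboat", "boat"]

counts = Counter(responses)
top_term, top_count = counts.most_common(1)[0]
share = top_count / len(responses)  # fraction of participants naming it

print(top_term, share)  # -> boat 0.5
```

The top term and its share of participants are exactly the figures of interest when deciding whether it can serve as an indexing term.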
- This was a follow-up study to the 1994 study, the results of which were
reported at last year's ASCRT conference and at the ASIS conference
- In the first part of the study, the top terms given for all the shots were
compared with those supplied by professional indexers
- In the second part of the study, the top terms were compared with the
text of the running descriptions (visual synopses) of the shots
| | Shots in the database | % | Shots in the card file | % |
|---|---|---|---|---|
| Term appears in indexing | 9/11 | 82 | 28/31 | 90 |
| Term appears in description | 9/11 | 82 | 25/33 | 76 |
- The results strongly suggest that automating subject access to non-art
images can be successfully achieved
- However, the basis for making this claim needs to be confirmed
- Another study (Turner and Rabinovitch 1995) looked at gender
differences in describing the same pictures
- Analysis of variance (ANOVA) was done on the percentage of participants
of each gender who named the top term for each shot
- A statistically significant difference (p = .025) was found
- 56.38% of females named the top term (averaged across all the shots), as
did 60.48% of males
- The results lend empirical support to the notion that men are more likely
to take things at face value, while women engage in more interpretation
of what they see
- Limitations of the results: only the top term could be studied, so
differences might be absorbed if the top 3 terms were considered; the
pictures were non-art images (stockshots)
- The application to information systems: if responses returned to the user
are ranked by pertinence to the search request, gender differences in the
indexing may influence the system's notion of the most important
descriptor. Because of this, it is useful to study the question further
- The other studies underway (3 more data sets) will include analysis of
gender differences, and the results will confirm (or refute) this finding
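For two groups, a one-way ANOVA amounts to comparing between-group and within-group variance. The sketch below computes the F statistic from scratch on invented per-shot percentages (the real study's per-shot figures are not reproduced here):

```python
# Invented per-shot percentages of each gender naming the top term.
female = [52.0, 58.0, 55.0, 60.0]
male = [61.0, 59.0, 63.0, 58.0]

def one_way_anova_f(*groups):
    """F statistic: between-group variance over within-group variance."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    # between-group sum of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within-group sum of squares
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ssb / df_between) / (ssw / df_within)

f_stat = one_way_anova_f(female, male)
print(round(f_stat, 2))  # -> 3.73
```

In practice the F statistic would then be compared against the F distribution to obtain a p-value such as the .025 reported above.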
- Work presently underway (Turner 1996) is studying cross-language
transferability of the findings, using French-language data
- A French-language version of the same research tapes was prepared, and
data collected from approximately 200 French-speaking participants
- Analysis is now underway (spring 1996) and preliminary results indicate a
very close correspondence with the 1994 (English-language) results
- The application for information systems: for non-art pictures, automated
dictionaries and translation software can probably be used to generate the
index in English or French once it is created in the other language
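A minimal sketch of that idea, assuming a simple bilingual term dictionary (the term pairs below are illustrative, not from the study):

```python
# Hypothetical English-to-French dictionary of indexing terms.
en_to_fr = {"boat": "bateau", "children": "enfants", "park": "parc"}

def translate_index(terms, dictionary):
    """Generate the index in the second language; terms missing from
    the dictionary are kept as-is for human review."""
    return [dictionary.get(t, t) for t in terms]

print(translate_index(["boat", "park", "harbour"], en_to_fr))
# -> ['bateau', 'parc', 'harbour']
```

The close correspondence between the English and French results is what makes such a mechanical mapping plausible for non-art pictures, where terms tend to name concrete objects.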
- The 44 shots appearing on the original research tapes were distilled from
a random sample of 200, on the basis of agreement on the subject matter
by professional indexers
- The goal of this exercise was to show that even when there is agreement
on the subject matter, a wide spread in the terminology is usually
observed in descriptors supplied for everyday pictures
- It is possible that using only the shots for which there was unanimous
agreement on the subject category (among 11 indexers) influenced the
results obtained
- A new random sample has been taken and will be used to prepare
research tapes for data collection in English and French to replicate all the
studies to date
- If the results across all the studies reported here are consistent, there will
be strong empirical support for the notion that shot-level indexing of the
subject matter of moving images can be automated on the basis of textual
representations of those images
- The next step will be building practical applications with several searching
approaches:
- Multilingual subject access via semantic networks (a web structure the
user can navigate to identify the search terms sought)
- An online classification providing context for concepts and multilingual
access
- Automated generation of some form of PRECIS strings from tagged
description text
- Using a visual dictionary showing whole-part relationships with multilingual
labels
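As a rough sketch of the semantic-network idea, a web of concepts with multilingual labels might be represented and navigated like this (all terms and links are invented for illustration):

```python
# Hypothetical semantic network: nodes are concepts with a French label;
# edges link related concepts the user can browse to find search terms.
labels = {"boat": "bateau", "harbour": "port", "sea": "mer"}
related = {"boat": ["harbour", "sea"], "harbour": ["boat"], "sea": ["boat"]}

def neighbours(term):
    """Related concepts, each with its French label, for browsing."""
    return [(t, labels[t]) for t in related.get(term, [])]

print(neighbours("boat"))  # -> [('harbour', 'port'), ('sea', 'mer')]
```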
- Seen mostly on PBS and largely through the work of the Descriptive Video
Service (DVS) at WGBH in Boston, audio description uses the "second
audio program" (SAP) channel to insert in the otherwise empty sound
spaces a running narration of what's going on in the picture for the benefit
of the visually impaired
- The wordprocessing file from which the narrated text is read into a
microphone for inclusion in the production thus provides a pre-existing
machine-readable text describing the image
- Although created for the blind, these text files might be recycled for a
completely different purpose: they could take on a new role as the source
for automatically generating shot-level indexes to the described material.
- A research project to study the feasibility of doing this has been funded
and will be getting underway this year
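The recycling step might look like the following sketch: narration segments carry timing from the production, so each can be matched to the shot on screen when it is spoken, and its text becomes raw material for that shot's index (timecodes and text here are invented):

```python
# Invented audio-description narration: (start time in seconds, text).
narration = [
    (12.0, "A woman opens the door and steps into the garden"),
    (25.5, "Rain falls on the empty street"),
]
# Invented shot list: (shot id, start, end) in seconds.
shots = [("S001", 10.0, 20.0), ("S002", 20.0, 30.0)]

def assign_to_shots(narration, shots):
    """Attach each narration segment to the shot during which it begins."""
    index = {}
    for start_time, text in narration:
        for shot_id, start, end in shots:
            if start <= start_time < end:
                index.setdefault(shot_id, []).append(text)
    return index

result = assign_to_shots(narration, shots)
print(result["S002"])  # -> ['Rain falls on the empty street']
```

The attached texts could then feed the same term-extraction step sketched earlier, yielding candidate indexing terms per shot.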
- The mass of documentation found in film and television archives and
stockshot libraries steadily grows, increasing the need for shot-level
indexing. This is important for researchers of all kinds
- If ways can be found of providing shot-level indexing in some cost-effective
manner, we can hope to see much more of it
- Automating the creation of such indexes seems to be a reasonable
prospect, and there is reason to be optimistic about it
Turner, James. 1994. Determining the subject content of still and moving
image documents for storage and retrieval: an experimental
investigation. PhD thesis, University of Toronto.
Turner, James. 1995. Comparing user-assigned terms with indexer-assigned
terms for storage and retrieval of moving images. Proceedings of the 1995
ASIS Conference, Chicago.
Turner, James M., and Lori Rabinovitch. 1995. Does he see what she sees?
Gender differences in describing pictures for purposes of storage and
retrieval. Unpublished manuscript.
Turner, James M. 1996. Cross-language transfer of indexing concepts for
storage and retrieval of moving images: preliminary results. Proceedings of
the 1996 ASIS Conference, Baltimore [in press].