1. Introduction
The transfer of knowledge
within an organisation, across organisations, between an individual and an
organisation, and between individuals is facilitated through a number of
sign systems. Such systems include natural languages, mathematical
equations, subject specific notations, and other conventions including
graphical conventions. Facilitation is a broad term; its key, however, is
a common consensus on the meanings of words of
natural language, kinds of mathematical equations, and agreement on
notations and conventions. So, in some respects, the transfer of knowledge
requires a consensus amongst organisations and individuals.
Much
knowledge management literature has focused on the “sharing” of know-how
and expertise through protocols devised by managers (Nonaka and Takeuchi
1995, Davenport and Probst 2002) or the focussed discussion of problems
related to the sociology of organisations (Scarbrough 1996). Some have
even looked at this problem from a cybernetic point of view in terms of
feedback and control systems (Morgan 1996). Management Studies, sociology,
and cybernetic models address fairly high-level conceptual issues.
However, the surface form of knowledge, the trace of knowledge left behind
on a document, whether paper or electronic, is amongst the few discernible
forms of knowledge. We will focus on how this trace is transferred.
The
long-standing controversy about the relationship between knowledge and
language (see Baker and Hacker on Wittgenstein 1988) notwithstanding, it
is almost universally true that the development of a subject or the
development of a subdomain within a subject discipline invariably leads to
the appropriation of certain words from the everyday natural languages of
the emergent subject or subdomain workers. Words are given specialist
interpretation; words like energy, mass and force existed in the English
language prior to Isaac Newton. However, after Newton propounded his theory
relating to the material nature of being, these three words assumed a more
specialist meaning and spawned a whole new discipline, i.e. physics.
Physicists, initially called natural philosophers, started discussing
different kinds of forces, different sources of energy and problems
relating to the metrication and instrumentation of quantities related to
energy, mass and force. No journal of physics, standard textbooks or
encyclopaedias of physics will accept an alternative term for these
concepts. There is no obvious coercion but there is a consensus. The
consensus is brought about partly through patronage: for instance, having a
degree in physics will allow one to write a doctoral dissertation or
indeed obtain a job in various physics establishments, but one has to speak
and write in the specialist language of physics. Much the same is true of
other disciplines.
We
mentioned the development of subdomains within a specialism. Sometimes the
subdomain relates specifically to the application of principles and
empirical results related to the parent domain. In our times, gene therapy
is a good example of such a transfer. Starting from the rather abstract
concept of the molecular basis of animal or plant life, originally a
theoretical and experimental enterprise variously called biochemistry and
molecular biology, one sees the development of industrial methods and
instrumentation for extracting and harvesting so-called genetic material –
an enterprise now called genetic engineering. From genetic engineering the
notion developed that some genetic material can malfunction giving rise to
sickness of various organs within an organism; by replacing the defective
genetic material, the organ will recover - hence gene therapy. Each of
these different subjects, i.e. molecular biology, genetic engineering and gene therapy, has its own
vocabulary and, indeed, writing styles for the discussion of theories and
the reportage of experimental results.
Consensus relating to terminology, and elements of other sign systems, is
used to show a commitment to certain concepts within a particular domain.
This commitment is, in one sense, philosophical: for example, Newton’s
notion of the material being of nature is a philosophical commitment to
materialism articulated through words of the English language which were
given specialist meaning. The commitment also relates to the basis of
methods and techniques of the new science of the material being – physics
– in that Newton chose differential calculus over algebra or geometry to
describe the movement of material beings. A series of graphical
conventions were adopted for displaying the results of experimental
observations and tabulation protocols were set up to show the relationship
between two or more variables. There is a third sense of this commitment
which relates to the structure of knowledge – also referred to as
epistemological commitment – in that Newton argued about the primacy of
the three concepts, mass, force and energy, and emphasised that the other
physical concepts could be derived from these three. The umbrella term for
different kinds of commitment adopted by a domain community at a given
time in their genesis relates to the existence of that community and of
the ideas propounded by the community. This umbrella term is ontology –
the study of the existence of being: the commitments could be called
different kinds of ontological commitments.
In this
paper, we discuss some of the challenges and opportunities related to
sharing knowledge between experts and practitioners within a specialist
domain and the sharing between the two groups and the potential end-users
of the knowledge of the domain or those upon whom the knowledge will have
an impact. The case in point here is that of breast cancer therapy. This
is an extensively researched topic involving major laboratories and
academic departments working on cancer treatment. The results of their
deliberations are published in learned journals, written in a formal style
for peer-to-peer communication – if you are not an expert or aspiring to
be one in oncology or radiation therapy, for example, learned papers in
these disciplines will mean very little to you. The knowledge of the
experts is refined, related to the knowledge of other experts, and then
passed on to the practitioners including cancer therapists working in
hospitals, some having close links with the laboratories/departments, and
nurses specialising in cancer therapy together with technicians involved
in the operation of complex radiotherapy machines, various imaging
devices, and/or highly toxic drug treatments. This refined and correlated
knowledge is documented in a peer-to-operative language and practitioners
themselves write some of the documents. Another important development in
recent times has been that of digital libraries and documentation archives
that can be accessed through the Internet. Nowadays, the Internet is the
first place people go to seek clarification and knowledge related to
complex topics; cancer patients, especially those who have just
been diagnosed or are about to receive (novel) therapy, sometimes consult the
Internet. Major cancer charity organisations have devised documents in a
language which is more accessible to this new audience. These documents
are written in an operative/expert-to-lay person language.
We
report on the development of an information spider: a computer program
that can allow access to a range of documents, for example learned papers,
practice manuals, and fact sheets. The spider not only allows access but
helps in creating a text archive and in extracting terms from documents
for indexing purposes as well.
2. Shared concepts, terminology and knowledge spirals
Early
literature on knowledge management focused on sharing knowledge related to
industrial innovation: there are two well-cited examples of this genre of
sharing. The first relates to the development of new product lines by
persuading researchers, product designers, manufacturing and sales
personnel to work together across departmental and status boundaries (Nonaka
and Takeuchi 1995:95-123). The second example relates to the sharing of
‘local innovation’ in the design of usable technology by sharing the
knowledge of the end-users of the products (Seely-Brown 1998). Both of
these classic examples describe how large organisations used brainstorming
methods, and software systems for co-designing and for cross levelling the
knowledge within the organisations.
Knowledge sharing in more recent literature stresses more indirect
interaction between the constituent members of a (geographically
distributed) organisation. For instance, organisations keen on their staff
sharing ‘best practices’ typically use a document repository – for example
reports of past successful/failed projects, employee, product, and service
profiles (e.g. the so-called Yellow Pages) – and tools for inputting and
extracting knowledge from such repositories (Davenport and Probst 2002).
The range of knowledge sharing systems includes document management
systems, systems that manage documents which have been selected and
annotated by experts for the use of others (Gibbert, Jonczyk and Völpel
2000), to the ambitiously-titled intelligent systems (Fisher and Ostwald
2001).
Knowledge sharing within a community is a more recent phenomenon and
appears to be supported by public-sector organisations. For example, the
US National Cancer Institute, a US government agency, is ‘cross levelling’
knowledge across the sub-communities of cancer researchers, cancer-care
professionals, and the public at large (Cancer 2003). Again, a document
repository is at the heart of the National Cancer Institute’s system. The
repository comprises newsletters, fact-files, journal papers, application
notes for care workers, information specific to cancer for the public at
large, and a glossary of terms.
2.1 Intra-organisational knowledge sharing and exchange
Classical knowledge sharing models suggest that the knowledge
transfer/sharing process involves the conversion of tacit knowledge into
explicit knowledge and vice versa. En route there are processes that help
share explicit and implicit knowledge without conversion. These models
focus largely on how knowledge is shared within an organisation or
intraorganisationally. The sharing of knowledge within an organisation at
one level should be part of the natural functioning of the organisation.
At another level there are a number of bottlenecks impeding this
transfer, including physical problems of disseminating information, social
problems related to prestige and power, and linguistic problems of sharing
knowledge across different levels and kinds of expertise. As we show
later, interorganisational transfer of knowledge can pose equally severe
challenges.
The
terms implicit and explicit knowledge are ambiguous and subject to much
philosophical debate. For Nonaka and Takeuchi (1995) the conversion of
knowledge from implicit to explicit and finally to implicit is the basis
of knowledge creation. Choi and Lee (2002) have observed a close
relationship between the management strategies of Korean enterprises and
the knowledge conversion modes suggested in Nonaka and Takeuchi.
Generally, explicit knowledge is formalised consensually, and is
articulated in the language of a specialist domain through texts. These
texts are either informative (learned texts) or instructive (instruction
manuals). Implicit knowledge is articulated mainly through the spoken word
and is suffused with metaphors, similes, and analogies. Implicit knowledge
is largely informal and idiosyncratic of individuals. Documents like
inter-office memos, product catalogues, advertisements for goods and
services, comprise both implicit and explicit knowledge.
The
knowledge conversion process involves a close interaction between, and
understanding amongst, the key players - the knowledge crew of an
organisation: these include the experts, professional workers, including
production/marketing/sales staff, researchers and design engineers, the
end-users of the artefacts created by the experts and professional
workers. The artefacts may include goods and services.
There
are four modes of knowledge conversion, according to Nonaka and Takeuchi
(1995:71-73), and we discuss these modes with reference to the exchange of
terminology and concepts amongst the crew during each of the modes:
(i) In the SOCIALISATION mode the crew works on an informal basis: verbal
exchanges enable the crew to understand each other’s vocabulary.
(ii) SOCIALISATION is followed by EXTERNALISATION. Here, an
inventory of novel, revised, and abolished concepts is produced in a
written document.
(iii) SOCIALISATION and EXTERNALISATION produce fragmented
knowledge. The knowledge crew then tends to fuse concepts and terminology
in the so-called COMBINATION mode. The fusion is implicit in the
development of new methods of working or new products.
(iv) Once the methods and products are established, the crew
internalises the operational details, sometimes improving on them and at
other times jettisoning some of the new knowledge. This is the
INTERNALISATION mode of knowledge transfer. This ultimately leads back to
SOCIALISATION, EXTERNALISATION and COMBINATION.
The
articulated public and consensual development of a shared conceptual
system and its vocabulary is more vivid in a loosely-organised setting,
e.g. systems for sharing best practice, than in the high-pressured setting
as encountered in the creation of a new type of automobile, home bakery (Nonaka
and Takeuchi 1995), or smarter and non-intrusive photocopiers (Seely-Brown
1998) where an organisation explicitly plans for a targeted change.
Best
practice is shared across an organisation and the recipients of
collated/created knowledge are not as well defined as may be the case for
design and production engineers sharing the ideas of an architect
(product/services) and a marketing expert. Recent developments in
knowledge creation are broad-spectrum. This we discuss next.
2.2 Inter-organisational knowledge sharing and exchange
Mergers
and acquisitions (M&A) between organisations present a major challenge to
knowledge management in that M&A precipitate lasting changes in the
participating organisations, and the acquiring organisation undergoes
changes when it takes over the other organisation. The example of Siemens’
Information and Communication Mobile (ICM) segment is quite apt here (Kalpers
et al 2002).
There
are a number of tasks that involve the workers in the two (or more)
organisations during a merger and acquisition: Kalpers et al describe the
workers as a Business Community: ‘a [geographically and organizationally
distributed] group of people who share existing knowledge, create new
knowledge, and help one another on the basis of a common interest in a
business-related topic’ (2002:197). The Business Community ‘was designed
as socio-technical system’ for facilitating the ‘combination of knowledge
and the creation of new knowledge’ (ibid:198). The five main activities of
the Business Community suggest that the exchange of knowledge is primarily
through social interaction and quadri-modal as per Nonaka and Takeuchi
(Table 1).
Table 1: Activities of the Business Community and knowledge conversion modes.
| Key Activities of the Business Community | Soc | Ext | Comb | Int |
| Sharing regular events: face-to-face and phone conference | ✓ |  |  |  |
| Urgent request forum: Discussion forum with email and Net-meeting sessions | ✓ | ✓ |  |  |
| Information-platform process for knowledge packages and project information |  | ✓ | ✓ |  |
| Merger and Acquisition (M&A) process improvement work-shops |  |  | ✓ | ✓ |
| Disseminating information related to M&A projects through information brokering and debriefing | ✓ |  |  | ✓ |
The
technical component of the Business Community is an information system
that helps in the storage, annotation and retrieval of documents. Kalpers
and colleagues talk about K(knowledge) Packs: clearly formatted structures
for encapsulating meta-level and summarised contents of documents. The
documents can be classified in different facets: (i) according to the type
of change – merger, acquisition, divestment; (ii) according to the
relevant business process – human resources, logistics, product design;
(iii) according to M&A processes and phases - monitoring, evaluation,
integration/post closing; (iv) according to IT topics - data,
applications, infrastructure, security; and (v) according to the
organisational structure of Siemens – group-wide, business-unit wide,
region-wide. K-Packs range from informative (contacts, project
documentation, laws, contracts) to instructive documents (checklists,
document templates, lessons learnt/annotated histories).
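The five classification facets can be pictured as fields of a record. The following is a minimal sketch only: the class name KPack, its field names, and the browse helper are hypothetical illustrations of the faceted structure described above, not the actual design of the Siemens system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DocumentKind(Enum):
    INFORMATIVE = "informative"   # e.g. contacts, project documentation, laws, contracts
    INSTRUCTIVE = "instructive"   # e.g. checklists, document templates, lessons learnt


@dataclass
class KPack:
    """Hypothetical record for one K-Pack; all field names are illustrative."""
    title: str
    summary: str
    kind: DocumentKind
    change_type: str        # facet (i): merger, acquisition, divestment
    business_process: str   # facet (ii): human resources, logistics, product design
    ma_phase: str           # facet (iii): monitoring, evaluation, integration/post closing
    it_topic: str           # facet (iv): data, applications, infrastructure, security
    org_scope: str          # facet (v): group-wide, business-unit wide, region-wide
    keywords: List[str] = field(default_factory=list)


def browse(packs, **facets):
    """Return the K-Packs whose fields equal every facet value given."""
    return [p for p in packs
            if all(getattr(p, name) == value for name, value in facets.items())]


packs = [
    KPack("Integration checklist", "Steps after closing", DocumentKind.INSTRUCTIVE,
          "acquisition", "human resources", "integration", "applications",
          "group-wide", ["checklist", "integration"]),
    KPack("Contract archive", "Signed contracts", DocumentKind.INFORMATIVE,
          "merger", "logistics", "evaluation", "data", "region-wide"),
]
hits = browse(packs, change_type="acquisition", ma_phase="integration")
```

Filtering on several facets at once narrows the repository in the way a multi-faceted classification is meant to allow; a keyword search would operate on the keywords field instead.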
This
multi-faceted information platform is called an information spider or an
infospider. There is a team of authors and editors involved in providing
potentially ‘reusable knowledge’ to this document repository. According to
Kalpers et al ‘a sophisticated search engine allows the user to
keyword-search (sic) the K-Packs …[and there are facilities] to browse the
most popular and often used K-Packs’ (2002:201). The initial evaluation of
the Siemens’ M&A Knowledge Exchange (MAKE) appears to be encouraging. What
interests us is how the M&A experts built up the knowledge of the mergers
and acquisitions business.
3. Special language and knowledge sharing
The
different modes of knowledge conversion help in the articulation,
explanation, revision, and acceptance/rejection of key concepts within a
group with diverse interests: the players in the group ensure that the
terminology they use in articulation and explanation of concepts is
clearly understood by others. The group interaction helps the group in
achieving a shared understanding of concepts by sharing the terminology of
each other. There is anecdotal/case study evidence in Nonaka and Takeuchi
suggesting that ‘speaking a common language and having discussions can
assemble the power of the group. This is a vital point, even though it
takes time to develop a common language’ (1995:99). The development of the
understanding of the vocabulary of a specialism is discussed under the
rubric of languages for special purpose (LSP) (Sager, Dungworth and
MacDonald 1980; Schröder 1991): this subject has an active constituency in
Northern Europe and North America as evidenced by academic journals (e.g.
Fachsprache). The use of LSP in shaping specialist written knowledge is a
subject of debate in pure and applied linguistics (Halliday and Martin
1993; Bazerman 1988). One major area of research in LSP is the growing
gulf between language used by experts and by the layperson.
3.1 Knowledge exchange and LSP terminology
Any
specialist language is a part of the natural language of the authors of
specialist texts: ‘Scientific English may be distinctive, but it is still
a kind of English, likewise scientific Chinese is a kind of Chinese’ (Halliday
and Martin 1993:4). Pejorative remarks that equate specialist talk with
obfuscating jargon notwithstanding, specialist languages are an excellent
example of parsimony that hallmarks human cognition: a small set of
keywords is used to represent a large body of knowledge, or, more
specifically, these keywords usually comprise a significant proportion of
specialist texts. This parsimony is essential for reducing ambiguity and
increasing precision. An even smaller set of single words is used by the
community as their (specialist) signature: physicists will write around
and about mass, energy, force, time and space, biologists around and about
life forms, evolution, heredity, and environment for instance.
The role
of shared terminology in knowledge creation is perceptible in the MAKE
system. Each K-Pack has associated keywords and MAKE has access to a
search engine that presumably makes use of the keywords. Human editors
append the keywords to the documents. The editors make a judgement about
the suitability of the keywords for a given document and assume that a
potential user will be familiar with the keywords. This is a
time-consuming and expensive process.
In the
following, we outline a method for automatically extracting candidate
single word terms and compound terms, and for automatically identifying
relationships between terms based solely on the behaviour of the
candidates in relation to other terms and words used in everyday
discourse, the so-called general language discourse. Our method is
domain-independent and relies only on a representative but random sample
of texts used in a given specialism – cancer care for example – together
with a sample of texts used in general language.
3.2 A text-based method for identifying shared knowledge
The
introduction, usage, and obsolescence of words in a language is complex
and creative. Language experts, particularly lexicographers, have advanced
a plausible explanation in relation to the birth, currency, and death of
words: they argue that the frequency of a word generally correlates with
its acceptability by the language community (Quirk et al 1985). The
frequency is computed by examining a collection of written texts (or
speech fragments) randomly sampled from a universe of texts. Such sampling
is essential especially since the language system is open-ended.
Corpus
linguistics is a branch of linguistics where the emphasis is on the use of
systematically organised text collections – text corpora or text corpus
(singular) – as a starting point of linguistic description or as a means
of verifying hypotheses about a language. Machine-readable versions of
such collections have been developed for major languages of the world. One
major beneficiary of corpus linguistics is lexicography – and many
individual dictionary publishers have their own in-house corpora.
The
British National Corpus (BNC) of 20th century English language comprises
over 100 million words including written text (c. 90%) and speech
fragments (10%) (Aston & Barnard 1998). The written component comprises
3,209 texts published mainly between 1975-1993: two-thirds of the texts
belong to imaginative genres (novels, literary magazines), the arts, world
affairs and leisure, and the other third to natural, pure, applied and
social sciences. There are approximately 250,000 unique words including
plurals of nouns and verbs in different tenses. Some of the words are used
in most texts and most frequently - 6% of the BNC is the word the (6
million instances) - and yet others are used rarely; the word cancer is
used 949 times in the BNC, neutron appears 247 times and radionuclide 40
times. Words like ‘the’ and other determiners (a, an), conjunctions (and,
but), and prepositions (in, on) are the most frequent and comprise a
quarter of the BNC. These are called closed-class words as
English-language users seldom invent new determiners or prepositions.
Words
belonging to the open-class category, nouns, adjectives, adverbs, are not
as frequent. Indeed, amongst the 100 most frequent words in the BNC
comprising about half the words in the corpus there are only two nouns,
time and people.
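The raw counts quoted above become comparable across corpora once they are expressed as relative frequencies. The short sketch below does this arithmetic for the counts cited in the text; the corpus size of exactly 100,000,000 words is an assumption (the text says only "over 100 million"), and the helper name per_million is ours.

```python
# Assumption: the BNC is treated as exactly 100 million words for this illustration.
BNC_SIZE = 100_000_000

# Counts as quoted in the text above.
bnc_counts = {"the": 6_000_000, "cancer": 949, "neutron": 247, "radionuclide": 40}


def per_million(count, corpus_size=BNC_SIZE):
    """Relative frequency expressed as occurrences per million words."""
    return count / corpus_size * 1_000_000


for word, count in bnc_counts.items():
    print(f"{word:>12}: {per_million(count):10.2f} per million words")
```

On these assumptions cancer occurs about 9.5 times per million words of general English and radionuclide less than once, against 60,000 times per million for the determiner the.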
3.2.1 Language-related and subject-related signatures
Recall
that a specialist writing about his or her domain of specialist knowledge
writes in a form of natural language. A specialist document typically has
two signatures. The first signature signifies the natural language of the
document and the second signifies the special domain.
A
corpus-based analysis of a number of individual subject domains, ranging
from subjects as diverse as nuclear physics to dance studies, philosophy
of science to sewer engineering, theoretical linguistics to cancer
research, suggests the existence of the two signatures (Ahmad 2001 and
references therein). A corpus was created for each domain usually by
keying in a subject name on a search engine and selecting texts of
different genres: journal papers, text books, advertisements for goods and
services, conference announcements specifically dealing with topics in the
domain. The corpora varied from 150,000 words to 750,000 words.
The
language-related signature of an English LSP shows itself in the
distribution of closed-class words. This distribution is similar to that
of the British National Corpus: the 10 most frequent words in almost
every domain included determiners, prepositions, and conjunctions.
The subject related signature of an LSP is reflected in the profusion of
open-class words, mainly nouns, in the 100 most frequent words: in some
disciplines as many as 30 nouns comprise the 100 most frequent words and
in others about 10 or so.
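The two signatures can be approximated by ranking the words of a corpus by frequency and separating closed-class from open-class items. The sketch below is illustrative only: the tiny CLOSED_CLASS list stands in for a full list of determiners, prepositions and conjunctions, "open class" is approximated as "not in that list" rather than established by part-of-speech tagging, and the function names are ours.

```python
from collections import Counter
import re

# Assumption: a deliberately small stand-in for a complete closed-class word list.
CLOSED_CLASS = {"the", "a", "an", "and", "but", "or", "in", "on", "of", "to", "by", "with"}


def tokenize(text):
    """Lower-case the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def signatures(text, top_n=100):
    """Split the top_n most frequent words into (language, subject) signatures."""
    ranked = [word for word, _ in Counter(tokenize(text)).most_common(top_n)]
    language_signature = [w for w in ranked if w in CLOSED_CLASS]
    subject_signature = [w for w in ranked if w not in CLOSED_CLASS]
    return language_signature, subject_signature


sample = "the nucleus and the neutron in the nucleus"
language_sig, subject_sig = signatures(sample, top_n=5)
```

The length of the subject signature for top_n=100 corresponds to the number of open-class words among the 100 most frequent words discussed above.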
The most
frequent nouns refer to a small group of concepts in the domain: in
nuclear physics the 100 most frequent words include the names of key
objects of study in nuclear physics - the atomic nucleus, constituent
particles of the nucleus, protons and neutrons - and key concepts in
physics - energy, force and mass. In linguistics, the 100 most frequent
words include the names of the grammatical categories of words, noun,
verb, adjective, together with important theoretical notions of
transformation, structure and grammar.
The
subject-related signature discussed above refers to single words.
Specialist language differs more sharply from general language in the
usage of compound words, containing as many as six single words. It turns
out that the most frequent single words, nucleus and nuclear, are the key
ingredients of many of the most frequent compound terms in nuclear
physics, i.e., nuclear structure and nuclear reaction, target nucleus,
stable/unstable nucleus.
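One simple way to surface such compounds is to count adjacent word pairs that contain a signature word. The sketch below is a simplification under stated assumptions: it looks only at two-word sequences (the compounds described above can be as long as six words), it filters with a small illustrative stopword list, and the function name and min_count parameter are ours.

```python
from collections import Counter
import re


def compound_candidates(text, signature_words, stopwords=frozenset(), min_count=2):
    """Count adjacent word pairs that contain a signature word and no stopword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {
        " ".join(pair): count
        for pair, count in pair_counts.items()
        if count >= min_count
        and any(word in signature_words for word in pair)
        and not any(word in stopwords for word in pair)
    }


sample = ("the target nucleus absorbs a neutron and the target nucleus decays "
          "via a nuclear reaction; a nuclear reaction follows")
candidates = compound_candidates(
    sample,
    signature_words={"nucleus", "nuclear"},
    stopwords={"the", "a", "and", "via"},
)
```

Here the pairs target nucleus and nuclear reaction survive because each occurs twice and contains a signature word, whereas a nuclear is discarded by the stopword filter.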
3.2.2 Automatic identification of terms
It is
the profusion of subject-related nouns that distinguishes a special
language text from a text written in general language. For example, for
one instance of the term nucleus in the BNC there may be as many as 300
instances in a typical nuclear physics corpus – the ratio rising to over
5000 for the plural nuclei.
The
ratio of the relative frequency of a word in a specialist corpus and in a
general language corpus may suggest whether or not the word is a term. As
closed-class words have a similar distribution in the two corpora, the
ratio of relative frequencies of these words in the two corpora, one
specialist and the other general language, is generally around unity. But
the ratio of the relative frequency of subject-related nouns within a
specialist text (corpus) to that in the BNC is generally greater than 1
and indicates a candidate term. This ratio is sometimes called the
weirdness ratio. The computation of weirdness is the first step in
automatic extraction.
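The weirdness ratio described above is the relative frequency of a word in the specialist corpus divided by its relative frequency in the general corpus. A minimal sketch follows, assuming word counts are already available as dictionaries; the function names, the handling of words absent from the general corpus, and the threshold parameter are illustrative choices rather than part of the method as published.

```python
def weirdness(f_special, n_special, f_general, n_general):
    """Return (f_special / n_special) / (f_general / n_general)."""
    if f_general == 0:
        # Assumption: a word unseen in the general corpus is treated as maximally weird.
        return float("inf")
    return (f_special / n_special) / (f_general / n_general)


def candidate_terms(special_counts, general_counts, threshold=1.0):
    """List words whose weirdness exceeds the threshold, weirdest first."""
    n_special = sum(special_counts.values())
    n_general = sum(general_counts.values())
    scores = {
        word: weirdness(freq, n_special, general_counts.get(word, 0), n_general)
        for word, freq in special_counts.items()
    }
    return sorted((w for w, s in scores.items() if s > threshold),
                  key=lambda w: scores[w], reverse=True)


# Hypothetical toy counts, chosen only to make the arithmetic easy to follow.
special = {"the": 60, "nucleus": 30, "of": 10}                  # N = 100
general = {"the": 600, "of": 300, "nucleus": 1, "people": 99}   # N = 1000
terms = candidate_terms(special, general)
```

In this toy example the closed-class word the has a ratio of exactly 1 and is not selected, while nucleus has a ratio of 300 and is the only candidate term. In practice a threshold well above 1 may be preferable, since small fluctuations around unity are to be expected for ordinary words.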
3.2.3 Subject-related signatures and knowledge sharing
One
example of knowledge sharing is the emergence of an applied science or
engineering science around a theoretical subject. The example of nuclear
physics (NP) will illustrate this point. The systematic use of nuclear
radiation in medicine and agriculture is discussed in the radiation
physics (RP) literature. RP is based on key concepts in nuclear physics:
concepts that help explain naturally radioactive elements, or unstable
elements that emit nuclear radiation, or concepts that describe how stable
elements can be made unstable, or radioactive, by bombarding or
irradiating these elements with other radiation. The controlled use of
emitted radiation is used in radiation therapy or diagnosis. Nuclear
(reactor) engineering is a branch of engineering based on the theoretical
concepts of nuclear fission in nuclear physics.
The
applied sciences and engineering are regulated by law to ensure the safety
and well being of humans whilst promoting the use of potentially lethal
artefacts like nuclear radiation. Radiation protection/safety has emerged
as a discipline following the extensive use of radiation physics.
In order
to be autonomous disciplines, both radiation physics and radiation
protection have to have their own concepts and associated terminology, a
terminology that manifests itself as subject-related signatures. A
three-way comparison between the three subjects will show the influences
of the parent and the progeny’s own identity. We have created three
corpora to study these influences and identity: theoretical nuclear
physics (151 texts comprising 444,540 words, published between 1970-1999),
radiation physics (91 texts, comprising 286,676 words, published between
2001-2003), and radiation safety (16 texts, comprising 127,704 words,
published in 2003). The texts are written in American and British English
and are drawn from journals, textbooks, public announcements and
advertisements.
Table 2
shows the ten most frequent single words in each of the corpora: nuclear
physics and radiation physics ‘share’ two key terms: energy and neutron;
radiation physics and radiation safety ‘share’ the terms dose and
radiation. The other eight terms show the autonomy of the disciplines.
Table 2: Subject-related signatures in three disciplines in physics
| Nuclear Physics | N = 444540 | Radiation Physics | N = 286676 | Radiation Safety | N = 127704 |
| Term | f/N | Term | f/N | Term | f/N |
| energy | 0.57% | dose | 0.79% | mutation | 0.91% |
| nucleus | 0.52% | neutron | 0.41% | dose | 0.75% |
| neutron | 0.41% | beam | 0.40% | disease | 0.60% |
| nucleon | 0.35% | radiation | 0.33% | gene | 0.59% |
| nuclear | 0.32% | energy | 0.30% | radiation | 0.57% |
| potential | 0.32% | system | 0.27% | risk | 0.47% |
| target | 0.25% | treatment | 0.24% | rate | 0.45% |
| scattering | 0.24% | image | 0.22% | exposure | 0.32% |
| interaction | 0.21% | rays | 0.22% | cancer | 0.31% |
| mass | 0.20% | detector | 0.19% | radionuclide | 0.30% |
| Total | 3.390% |  | 3.356% |  | 5.254% |
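The sharing visible in Table 2 can be checked mechanically by intersecting the three signatures. The sketch below simply transcribes the ten terms per discipline from Table 2 into sets; the variable and function names are ours.

```python
# The ten most frequent terms per corpus, transcribed from Table 2.
nuclear_physics = {"energy", "nucleus", "neutron", "nucleon", "nuclear",
                   "potential", "target", "scattering", "interaction", "mass"}
radiation_physics = {"dose", "neutron", "beam", "radiation", "energy",
                     "system", "treatment", "image", "rays", "detector"}
radiation_safety = {"mutation", "dose", "disease", "gene", "radiation",
                    "risk", "rate", "exposure", "cancer", "radionuclide"}


def shared_terms(signature_a, signature_b):
    """Return the terms common to two subject-related signatures, sorted."""
    return sorted(signature_a & signature_b)


print(shared_terms(nuclear_physics, radiation_physics))   # ['energy', 'neutron']
print(shared_terms(radiation_physics, radiation_safety))  # ['dose', 'radiation']
print(shared_terms(nuclear_physics, radiation_safety))    # []
```

The empty intersection between nuclear physics and radiation safety is consistent with radiation physics acting as the intermediary through which terminology passes from the parent discipline.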
Let us
now compare the distribution of five of the most frequent terms in each of
our corpora and in the BNC (see Table 3). What one sees in the
distributions is that the term energy is used 43 and 23 times more
frequently in the NP and RP corpora respectively than in the BNC; more
demonstrably, the term dose is used 337 and 291 times more in the RP and
RS corpora respectively than in the BNC, and the term neutron is used 790,
1379 and 54 times more in NP, RP and RS corpora respectively than in the
BNC. The term nucleon, the weirdest in the three corpora, is used only in
our nuclear physics corpus.
Table 3: Weirdness ratio for the most frequent open-class words in the three corpora
| Nuclear Physics | N = 444540 | Radiation Physics | N = 286676 | Radiation Safety | N = 127704 |
| Term | fNucPhys/fBNC | Term | fRadPhys/fBNC | Term | fRadSafety/fBNC |
| energy | 43 | dose | 337 | mutation | 629 |
| nucleus | 535 | neutron | 790 | dose | 291 |
| neutron | 790 | beam | 218 | disease | 50 |
| nucleon | 6402 | radiation | 125 | gene | 309 |
| nuclear | 39 | energy | 23 | radiation | 409 |
The 10
subject-related signature terms (in Table 2) help in the formation of
compound terms and illustrate the linguistic parsimony and linguistic
productivity of specialist writers. The term nucleus is used as a head
word for two frequent compound terms, target nucleus and halo nucleus, and
the neologism nucleon acts as a modifier for the most frequent compound in
our nuclear physics corpus, nucleon-nucleon amplitude. In radiation
physics neutron is used as a head word for the frequently occurring
thermal neutron, or as a modifier in neutron-capture therapy and the other
noun in the noun-noun compound neutron fluence. Radiation acts as a
dominant constituent in the radiation safety corpus, as a modifier in
radiation exposure and radiation dose, in its derivative form radiological
protection, and as a head word in ionizing radiation.
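The distinction between head word and modifier can be approximated positionally. The sketch below assumes that English noun compounds are right-headed, so that the last word is the head and any earlier word is a modifier; this is a simplification, and the function name is ours.

```python
def role_in_compound(term, compound):
    """Return 'head', 'modifier', or None for a term within a compound.

    Assumption: the final word of the compound is its head.
    """
    words = compound.lower().replace("-", " ").split()
    if term not in words:
        return None
    return "head" if words[-1] == term else "modifier"


examples = [
    ("nucleus", "target nucleus"),
    ("nucleon", "nucleon-nucleon amplitude"),
    ("neutron", "thermal neutron"),
    ("neutron", "neutron-capture therapy"),
    ("radiation", "radiation dose"),
    ("radiation", "ionizing radiation"),
]
for term, compound in examples:
    print(f"{term!r} in {compound!r}: {role_in_compound(term, compound)}")
```

Applied to the compounds mentioned above, the heuristic reproduces the roles given in the text: nucleus and neutron are heads in target nucleus and thermal neutron, while nucleon, neutron and radiation are modifiers in nucleon-nucleon amplitude, neutron-capture therapy and radiation dose.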
Table 4: Most frequent compound terms in the three corpora. Terms in italics are neologisms
| Nuclear Physics | Radiation Physics | Radiation Safety |
| nucleon-nucleon amplitude | dose distribution | radiation exposure |
| neutron star | thermal neutron | congenital abnormalities |
| nuclear physics | neutron capture therapy | Multi-factorial disease |
| angular distribution | radiation therapy | ionising radiation |
| target nucleus | neutron fluence | air concentration |
| halo nucleus | spatial resolution | genetic disease |
| nuclear reaction | fluorescence reabsorption | transfer coefficient |
| nuclear structure | maximum dose | radiological protection |
| angular momentum | intensity matrix | breast cancer |
| radioactive beam | radiation physics | radiation dose |
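The head and modifier roles discussed for the compounds in Table 4 can be approximated mechanically. The sketch below is only a heuristic, assuming that English compound terms are right-headed, so that the last word is the head and every earlier word is a modifier. The function name `constituent_roles` is hypothetical, and the heuristic does not distinguish a true modifier from the first noun of a noun-noun compound.

```python
import re


def constituent_roles(term, compounds):
    """Map each compound containing `term` to 'head' or 'modifier'.

    Assumes right-headed compounds: the final word is treated as the head.
    Words are split on whitespace and hyphens, and compared in lower case.
    """
    roles = {}
    for compound in compounds:
        words = re.split(r"[\s-]+", compound.lower().strip())
        if len(words) < 2 or term not in words:
            continue  # not a compound, or the term is not a constituent
        roles[compound] = "head" if words[-1] == term else "modifier"
    return roles


# Compounds taken from the Radiation Physics column of Table 4.
rp_compounds = ["thermal neutron", "neutron capture therapy", "neutron fluence"]
for compound, role in constituent_roles("neutron", rp_compounds).items():
    print(compound, "->", role)
```

Applied to the Nuclear Physics column, the same heuristic marks nucleus as the head of target nucleus and halo nucleus, and nucleon as a modifier in nucleon-nucleon amplitude.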
The
theoretical notion of a structured and composite nucleus, and of interaction
between the constituents of two nucleons (as in n-n amplitude), shows the
physico-philosophical bias of the subject and that of the terms. In
radiation physics, the term dose (or the energy of the radiation), and its
control, dominate the discussion and show the applied physics/engineering
bias of the subject. Radiation safety deals with exposure to the risk of
nuclear radiation; hence the most frequent terms, radiation exposure and
radiation dose, together with the current interest in breast cancer,
dominate the discussion in the RS corpus, demonstrating the ethico-legal
aspects of the subject.
We have
attempted to describe how knowledge sharing can be monitored using a text
and terminology management system by identifying the subject-related
signature of specialist subjects, and particularly how the sharing of
terminology across disciplines indicates the sharing of concepts. The
explication of knowledge in nuclear physics resulted in the development of
radiation physics, and explication of radiation physics knowledge led to
the domain of radiation safety. Each of the two explications has led to
the internalisation of knowledge which, when explicated, has its own
terminology.
The
results in nuclear physics and related disciplines have been replicated in
the transfer of knowledge from theoretical solid state physics to electron
device engineering (Al-Thubaity and Ahmad 2003); in knowledge transfer
from civil engineering to environmental planning systems (Ahmad and Miles
2001); and in a study of how concepts in cognitive psychology and
structuralism found their way into theoretical linguistics (Ahmad 2002).
In the
next section we discuss how the automatic extraction of terminology for
identifying the subject-related signature of a domain, and for identifying
its impact on its application/applied domain, can be used to build an
information spider semi-automatically. Such a method will facilitate the
automatic annotation of key terms for each of the documents and the
stronger and weaker cross-referencing between the parent and progeny
domains.
Our
chosen domain is cancer care where experts are attempting to share their
knowledge with professional workers, including therapists, nurses, and
radiation workers, and where both experts and professionals are attempting
to do the same with increasingly Internet-aware actual or potential cancer
patients. Ours is a corpus-based study.
4. Monitoring and documenting change and differences: A health infospider
Health-care is an all-pervasive domain where advances in medicine and the
concomitant costs respectively encourage and discourage the use of new
knowledge. In this domain documentation is the ‘main means of
communication between care providers’ (Ruch et al 1999), and effective
healthcare delivery systems have become increasingly dependent on accurate
and detailed clinical information based on best practices (Chute, Cohn and
Campbell 1998).
Knowledge of advances and best practice can be shared and refined by
formal knowledge dissemination outlets, for example journal papers,
workshops and seminars, and through learning-by-doing during encounters
with patients. The Internet facilitates sharing of scientific results
either through digital journals or through research notes posted on secure
websites relating to drug trials, for example. The widespread use of the
Internet has led to potential and actual patients, or their friends and
relatives, going online for information after receiving news that the
patient is or might be suffering from cancer.
Health-care knowledge has to be shared between many organisations and
increasingly that knowledge has to be shared with an open-ended audience.
In health-care or its sub-domain cancer care, as in any other specialist
domain, terminology management is of the essence: including new terms and
expunging old ones. Maintainers of controlled medical vocabularies
recognize that such vocabularies are not static (Cimino 1996).
The US
National Cancer Institute (NCI) is attempting to provide up-to-date online
information on cancer to two groups: health-care professionals and
patients. The NCI website provides a facility for searching the contents
of its document base; there is also a glossary of cancer terms. The
website is organised and is accessible according to different facets:
users can look at individual types of cancer, at different types of
treatments, and at the results of studies being carried out. Information
for professionals is generally in the form of an extended abstract or
summary about a specific topic together with an extensive bibliography.
References to published journal articles in the bibliography of a given
extended abstract are generally hyperlinked to the abstract of the cited
article. Information for patients is provided without extensive references
to journal articles and is mainly in the form of fact sheets: highlights
of a recent diagnostic or therapeutic discovery, of a long-term study and
other useful information. In addition to the US NCI, and other national
cancer charities like Cancer Research UK, pharmaceutical companies also
provide information about their drugs as fact sheets.
4.1 Building a cancer infospider
In order
to ascertain the subject-related signature of the language used by experts,
for cancer-care professionals, and for addressing laypersons, especially
patients, we have created three text corpora. We are not considering the
parent discipline, cancer research, but are rather focusing on its three
progenies to determine the extent to which knowledge is shared between
them by measuring terminological commonalities. In order to
illustrate our ideas we have focused on aspects of diagnosis (specifically
the breast cancer gene), therapy and after-care of breast cancer patients.
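One simple way to quantify terminological commonality between corpora is the Jaccard overlap of their term sets. It is assumed here purely for illustration and is not necessarily the measure used in this study. In the sketch below, the function names, the corpus labels and the three small term sets are all hypothetical.

```python
from itertools import combinations


def jaccard(terms_a, terms_b):
    """Size of the intersection of two term sets divided by the size of their union."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_commonality(term_sets):
    """Jaccard overlap for every pair of corpora, keyed by sorted corpus names."""
    return {
        (x, y): jaccard(term_sets[x], term_sets[y])
        for x, y in combinations(sorted(term_sets), 2)
    }


# Hypothetical signature terms for three corpora (not taken from our data).
term_sets = {
    "expert": {"brca1", "mutation", "tamoxifen"},
    "professional": {"mutation", "tamoxifen", "mastectomy"},
    "patient": {"tamoxifen", "mastectomy", "lump"},
}
for pair, score in pairwise_commonality(term_sets).items():
    print(pair, round(score, 2))
```

A set-based overlap ignores how often each term is used, so a frequency-weighted measure may be preferable where two corpora share a term but use it at very different rates.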
The
breast-cancer expert corpus comprised 300 texts, abstracts, and full
papers (114,394 words). The texts were collected by navigat |