OF SHEAVES AND BALES; INFORMATICS AT WORK FOR E-COMMERCE.
International Business Law Services, Inc.
2445 McCabe Way
Irvine, CA 92614
The terms Sheaves and Bales of e-Commerce are intended to evoke the question of Intermediated versus Dis-Intermediated information. The farmer who cuts the wheat by hand and gathers the sheaves for threshing is the paradigm for intermediated database construction. The mechanized harvester that cuts, threshes, and bales the straw is the paradigm for the dis-intermediated amalgamation of information carried out by spiders and bots.
Layered on top of the concept of database construction is the concept of Informatics. Informatics began in the 1970s as a way of discussing information technology that is not necessarily concerned with the underlying subject material itself, but with the technology that makes information available to the end user. Informatics is limited to the discussion of information technology that can be used as a knowledge management system in a specific discipline, as in the example of Medical Informatics.
The University of Kyoto has posted a definition of Informatics that I like. I am quoting only a portion of the definition below.
Informatics studies the creation, recognition, representation, collection, organization, optimization, transformation, communication, evaluation, and control of information in complex and dynamic systems. 
Informatics, as being discussed in this paper, does not include subjects like Artificial Intelligence, Intelligent Systems, and Cybernetics. These topics are far beyond the scope of this discussion.
Consider Informatics in the e-Commerce arena and a central question arises. How do you get your arms around the subject of e-Commerce as an information resource, and then present that information to an interested searcher? Librarians, for centuries, have attempted to create systems of classification that represent the universe of knowledge. Communications systems advanced from papyrus to movable type to electronic transmission of data. Systems of classification have become more important as the volume and variety of formats render hard copy catalogs moot.
More recently, Internet metacrawlers (spiders) and robots (bots) go out from search engines to search for web pages. When they return, the information is interpreted, sorted, and indexed. There are two problems with these bots. First, they can be stopped in their tracks by several methods employed by web page designers who wish to avoid the server traffic that bots cause. In effect, a metatag can make a site invisible. Second, the interpretation, sorting, and indexing of bot-retrieved raw material usually takes place without human intervention.
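As an illustration of how a metatag can make a page invisible, the following is a minimal sketch of the check a well-behaved bot might run before indexing a page. It assumes the widely used `robots` meta tag with `noindex` or `none` directives; the sample page and the function name `may_index` are hypothetical, and real search engine bots apply many more rules than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_index(html_text):
    """Return False when the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" not in parser.directives and "none" not in parser.directives


# Hypothetical page whose designer wants to keep bots away.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Hypothetical members-only page</title>
</head><body>Fee-based content</body></html>"""

print(may_index(SAMPLE_PAGE))
print(may_index("<html><head><title>Open page</title></head></html>"))
```

A bot that honors the tag simply skips the first page, so the page never appears in the search engine's index.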
A database of useful and easily accessible information does not merely arrive on your plate. A huge army of farmers, transportation systems, wholesalers, and kitchen staff is at work behind the scenes to bring a steak to the table. In much the same way, a huge number of data amalgamators, writers, editors, and database and web page designers work to extrude information into usable electronic formats.
Basic systems such as effective labeling and thesaurus construction must be created for an effective intermediated database. Once the labeling and thesaurus links are made, there is room for full text search engines that supply hits to documents that might otherwise escape the researcher's or cataloger's attention. It need not be a universal knowledge system like the Library of Congress Subject Heading (LCSH) classification system; however, there is a need for a controlled vocabulary, a language that the researcher understands. The controlled vocabulary should include terms that the searcher understands in context, not terms that the librarian cataloger must arbitrarily assign.
For instance, LCSH has long favored such strings as “Technology -- United States -- Law and Legislation” as the required parent term for a Senate Legislative History Report.
With a targeted, controlled vocabulary, one can even eliminate the problem of homographs (expressions that have the same spelling but different contextual meanings). The meaning is understood when the targeted vocabulary excludes all alternatives not related to the target audience, such as researchers interested in E-Commerce.
Example: In an E-Commerce controlled vocabulary, VAT is never construed as a vessel, but is always understood as Value Added Tax. Therefore VAT, in our controlled vocabulary, will not need an added parenthetical qualifier: VAT (Value Added Tax). A bot does not make the distinction between VAT and vat. A bot cannot clarify the relationship unless the term is included in a phrase search. Even then, the phrase search may bring back hits that exactly match the phrase but miss the mark in the case of non-standard English usage, American or British spelling, synonyms or quasi-synonyms, or non-preferred terms.
System designers may want to have MARC Formats, the Metadata Encoding and Transmission Standard (METS), or the Dublin Core Metadata Initiative XML Standards waiting in the wings, but for the early stages of database development they are not necessary.
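The VAT example can be sketched as a small lookup table that maps entry terms (synonyms, spelling variants, and non-preferred terms) to a single preferred term. This is only an illustration under stated assumptions: the entries in `PREFERRED_TERMS` and the function names are hypothetical and do not reproduce any actual vocabulary.

```python
# Hypothetical entry terms mapped to preferred terms in a targeted vocabulary.
PREFERRED_TERMS = {
    "vat": "VAT",
    "value added tax": "VAT",
    "value-added tax": "VAT",
    "e-commerce": "E-Commerce",
    "ecommerce": "E-Commerce",
    "electronic commerce": "E-Commerce",
    "licence": "Licenses",   # British spelling
    "license": "Licenses",   # American spelling
    "licenses": "Licenses",
}


def normalize(term):
    """Return the preferred term, or None when the term is outside the vocabulary."""
    key = " ".join(term.lower().split())  # ignore case and extra spaces
    return PREFERRED_TERMS.get(key)


def index_terms(raw_terms):
    """Split raw terms into preferred terms and terms needing human review."""
    preferred, unmatched = set(), []
    for term in raw_terms:
        match = normalize(term)
        if match is None:
            unmatched.append(term)
        else:
            preferred.add(match)
    return sorted(preferred), unmatched


print(index_terms(["vat", "Value  Added Tax", "licence", "vessel"]))
```

Because the vocabulary is targeted at E-Commerce researchers, "vat" resolves only to the tax sense, and a term such as "vessel" is set aside for a human indexer instead of being guessed at.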
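For readers unfamiliar with these standards, here is a rough sketch of what a simple Dublin Core description of one summary might look like when the time comes. The element names (`title`, `subject`, `language`, `type`) and the namespace URI come from the Dublin Core element set; the wrapper element `record`, the function name, and the sample values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Wrap Dublin Core elements in a hypothetical <record> container."""
    record = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


# Hypothetical values describing one law summary.
summary = build_record({
    "title": "Summary of value added tax rules for online sales",
    "subject": "VAT",
    "language": "en",
    "type": "Text",
})

xml_text = ET.tostring(summary, encoding="unicode")
print(xml_text)
```

The `subject` element is where a preferred term from the controlled vocabulary would be recorded, which is one reason the vocabulary can be built first and the metadata standard adopted later.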
BOTS AND INTELLECTUAL PROPERTY:
The automatic harvesting of pieces of intellectual property is of great concern to information providers. After all the efforts of the founders and creators bring forth an exciting new addition to the electronic information universe, the intellectual property owner looks up to discover that someone has committed a trespass to his chattels by deep-linking to his page, or by extracting the content to another, unrelated web page. It may have been a long time since attorneys had to research the law of Chattels, but, on December 8, 2000, Judge Barbara S. Jones of the Southern District of New York used the theory of Trespass to Chattels in granting a preliminary injunction barring Verio, Inc. from using an automated software program to access a database created by Register.com. Verio’s software was “flooding” Register.com’s computer system with traffic bent on extracting information from the database. Verio’s bot is an example of a database application that harvests intermediated data and republishes it to Internet users. Yes, search engine bots do make the information freely available to anyone who uses the Internet to do research, but they may also subvert the originating source’s potential to reap a profit on the data it has placed on the market.
An intermediated data resource, in the wild-west days of the early Internet, can be characterized as allowing free and open access to any page that could be found. Early fee-based systems like Westlaw and Dialog showed entrepreneurs that they had a chance of putting a data resource on the net that might turn a profit, if they could control access to the database and if they could charge a reasonable fee for the intellectual content in the database. Once database builders left government and academic institutions and emerged into a competitive market system, there was an immediate, basic understanding that someone was going to have to pay for the intellectual property that many hands had brought to the market.
Legal Informatics arrived on the world stage in the guise of Lexis, Westlaw, Dialog, Disclosure, and LiveEdgar, to mention only a few of the major vendors in the market. A large number of other vendors have also appeared on the world market and are advancing legal informatics in a number of ways. While the large vendors named above have created ways of presenting documents by placing a proprietary search engine structure over public documents and previously published articles, there remains a need for an intermediated product that presents itself as a sign post and directs the reader to other primary documents.
Let’s use another real world illustration. When a lawyer goes to a library, pulls down a book, and finds information on a subject that he or she needs to research, the book will have a note that directs the lawyer to another resource. Leaving the book open, the lawyer places the new book on top of the first book. This is repeated until the lawyer finds the answer to the problem he or she has been researching. Librarians call this building a learning tree. At the root is the principal research question. Each of the books in the pile adds to the understanding of the topic, from the roots, through the branches, to the leaves. At the end of the research session, there is one leaf that will be the answer to the question.
With dis-intermediated computer systems, a lawyer types in a few terms and the database hits are returned. This is like knowing only the root and the leaves. Computer systems return a lot of leaves that may or may not relate to the original root question. And if a relationship between root and leaf exists, the researcher still cannot know what that relationship is without looking at each leaf in turn and discarding the false hits.
As much as computer technology tries to eliminate the human factor, dis-intermediated systems often lack the single thing that is required to make the system efficient. And that single thing is a human who can ferret out the relational links that make for an efficient research database.
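The difference between a learning tree and a flat list of hits can be sketched in a few lines. The example below assumes that a human editor has recorded the cross-references between resources; every title in `CROSS_REFERENCES` is hypothetical, and the breadth-first search is just one simple way of walking such links.

```python
from collections import deque

ROOT = "Is an online click-through contract enforceable?"

# Hypothetical cross-references recorded by a human editor:
# each resource points to the resources its notes direct the reader to.
CROSS_REFERENCES = {
    ROOT: ["Treatise on contract formation"],
    "Treatise on contract formation": [
        "Electronic signature statute",
        "Law review article on online assent",
    ],
    "Law review article on online assent": [
        "Appellate ruling on click-through terms",
    ],
    "Electronic signature statute": [],
    "Appellate ruling on click-through terms": [],
}


def research_path(root, leaf):
    """Breadth-first search for the chain of references from root to leaf."""
    queue = deque([[root]])
    visited = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == leaf:
            return path
        for next_resource in CROSS_REFERENCES.get(path[-1], []):
            if next_resource not in visited:
                visited.add(next_resource)
                queue.append(path + [next_resource])
    return None  # no recorded relationship between root and leaf


for step in research_path(ROOT, "Appellate ruling on click-through terms"):
    print(step)
```

A keyword search would return the leaves alone; the recorded links are what let the researcher see the branches that connect the question to the answer.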
INTERNATIONAL BUSINESS LAW SERVICES MISSION:
Characteristics of an e-Commerce legal informatics database are its:
- Focus on end users’ information needs.
- Explication of the law: The writer will take a major law, regulation, or concept and break it into bite-sized pieces, providing FAQ-like summaries.
- Summary of Country Commercial Laws
- Regulation of the Law of the Internet
- E-Commerce law applications in the international arena.
- Recent changes in commercial law based on new cases.
- Advising small and medium business entities on commercial law having strictly to do with electronic b2b or b2c applications.
- Considerations taken for the organization of e-commerce information.
  - Topical divisions of information managed by a controlled, targeted vocabulary.
    - Legal Categories
    - Business Categories
    - Country Categories
- Considerations taken for the storage of information that may include the intellectual property of others.
  - How to present the information in bite-size (or byte-size) packets
  - Whether to link or store original documents
  - Use of Public documents vs. Copyrighted documents
  - Use by For-Profit entities vs. Not-For-Profit entities
  - Whether justified under the Fair Use doctrine of Copyright law
  - Re-publishing edited portions of documents with the permission of the copyright holder.
- Considerations regarding forward pointing links to original documents. Sources for:
  - Legislation – Both U.S. and foreign Acts
  - Case law – Current rulings that change the face of e-commerce
  - Secondary resources – Publications that help business people avoid the pitfalls of doing business online
  - News sources – Breaking information that will affect commerce
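The organizing considerations listed above (legal, business, and country categories, plus forward pointing links to original documents) can be pictured as a simple record structure. This is a sketch under assumptions, not a description of the actual IBLS database: the class `Summary`, its field names, and the catalog entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Summary:
    """One law summary, labeled with terms from the controlled vocabulary."""
    title: str
    legal: str       # legal category
    business: str    # business category
    country: str     # country category
    source_links: list = field(default_factory=list)  # links to original documents


def find(summaries, **facets):
    """Return the summaries that match every requested category."""
    return [
        s for s in summaries
        if all(getattr(s, name) == value for name, value in facets.items())
    ]


# Hypothetical catalog entries.
CATALOG = [
    Summary("Distance selling rules for online retailers", "Consumer Protection", "b2c", "Germany"),
    Summary("VAT on cross-border digital services", "Taxation", "b2b", "Germany"),
    Summary("Electronic signatures in commercial contracts", "Contracts", "b2b", "Brazil"),
]

for summary in find(CATALOG, country="Germany", business="b2b"):
    print(summary.title)
```

Storing links in `source_links` rather than copies of the documents reflects the link-or-store question raised in the list; either choice would fit the same structure.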
Our mission is to connect the docs in an architecture that allows our registered users to quickly cut through the information jumble. Better access to e-commerce information depends on a simplified introduction to a topic, together with sign posts that link the user to other primary resources on worldwide laws and regulations focused on commercial applications on the Internet. Law summaries are written in easy-to-understand business language and are accessible in multiple languages. Links provide immediate access to the sources of the laws, regulations, and case law used in the summary. Registered members can request that summaries be written on legal subjects of their own interest.
Another service is linking to attorneys practicing in 31 countries through our Charter Partner program.
As the database has grown, we have made it available to e-commerce students through a partnering arrangement with St. Thomas University School of Law in Miami. The St. Thomas LL.M. program is an ABA-approved law specialization in Tax. IBLS provides an e-commerce component that complements the LL.M. course of study. This distance-learning program can be licensed to other educational institutions. Our Edu component can travel beyond the traditional university setting by establishing partnerships with professional organizations, publishers, and continuing legal education providers.
Original materials such as the Legal Summaries, and Supplementary Reading material from published case law, Law Reviews, and commentary, are mounted on a database. Along with our Summaries database, we use WebExplorer to transfer documents from Westlaw to our St. Thomas Blackboard site. On the Blackboard site (developed by LexisNexis for Law School Web Courses), the course description, weekly modules, and discussion board assignments are posted. The Discussion Board is an interactive element of the course offering. Each week a discussion forum is posted and students can begin or add to a thread. The teaching faculty member has the option to interact intensively, or to stand off and direct the discussion by adding to a thread or posting an announcement or a concluding note on the discussion that has taken place during the previous week.
The IBLS database and course offering make use of a hybrid of Intermediated content and Informatics tools.
Dis-intermediated sources, such as those found through searches on the major Internet search engines, can only go so far in providing quality information. Sifting through the hits from a search engine is not generally the quickest way of accessing information. Intermediated, fee-based systems can take the researcher farther and faster. In the current marketplace, profitability is the essential ingredient for Internet funding, either by advertising revenue or public offering revenue.
Are there any questions?
University of Kyoto, Graduate School of Informatics, Introduction. [old link] http://www.i.kyoto-u.ac.jp/English/introduction.html
L. Rosenfeld and P. Morville, Information Architecture for the World Wide Web (101 Morris Street, Sebastopol, CA 95472: O’Reilly & Associates, Inc., 1998).
Prof. Tim Craven, Thesaurus Construction, Welcome to the Introductory Tutorial on Thesaurus Construction. Faculty of Information and Media Studies, University of Western Ontario, Canada, N6A 5B7.
Library of Congress, Network Development and MARC Standards Office, Frequently Asked Questions (FAQ). MARC is the acronym for MAchine-Readable Cataloging.
Library of Congress, Metadata Encoding and Transmission Standard (METS).
Dublin Core Metadata Initiative: Making it easier to find information. Dublin Core Metadata Initiative XML Standards.