
Methods for Information Server Selection

DAVID HAWKING and PAUL THISTLEWAITE

Australian National University

The problem of using a broker to select a subset of available information servers in order to achieve a good trade-off between document retrieval effectiveness and cost is addressed. Server selection methods which are capable of operating in the absence of global information, and where servers have no knowledge of brokers, are investigated. A novel method using Lightweight Probe queries (LWP method) is compared with several methods based on data from past query processing, while Random and Optimal server rankings serve as controls. Methods are evaluated, using TREC data and relevance judgments, by computing ratios, both empirical and ideal, of recall and early precision for the subset versus the complete set of available servers. Estimates are also made of the best-possible performance of each of the methods. LWP and Topic Similarity methods achieved best results, each being capable of retrieving about 60% of the relevant documents for only one-third of the cost of querying all servers. Subject to the applicable cost model, the LWP method is likely to be preferred because it is suited to dynamic environments. The good results obtained with a simple automatic LWP implementation were replicated using different data and a larger set of query topics.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems—distributed databases; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—search process; selection process; H.3.4 [Information Storage and Retrieval]: Systems and Software—information networks; H.3.6 [Information Storage and Retrieval]: Library Automation—large text archives

General Terms: Design, Experimentation, Performance

Additional Key Words and Phrases: Information servers, Lightweight Probe queries, network servers, server ranking, server selection, text retrieval

1. INTRODUCTION

The problem of locating relevant text documents from distributed network servers is partially solved by large-scale centralized indexing services such as Alta Vista and Excite. This centralized model suffers from four limitations. First, even systems with enormous capacity are likely to index only a fraction of all documents on the Internet. Examples of documents not indexed include those barred by robot exclusion, those missed by the indexing robot, those served by parameterized scripts, and those accessible only through a search service operated by the server on which they are located. Second, index information may be out-of-date. Third, a serious researcher has no effective means of restricting searches to authoritative primary sources. Finally, public Internet indexing services do not index documents on private "intranets."

The authors wish to acknowledge that this work was carried out within the Cooperative Research Centre for Advanced Computational Systems established under the Australian Government's Cooperative Research Centres Program.

Authors' address: Cooperative Research Centre for Advanced Computational Systems, Department of Computer Science, Australian National University, Canberra, ACT 0200, Australia; email: dave@...au; pbt@...au.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

© 1999 ACM 1046-8188/99/0100-0040 $5.00

ACM Transactions on Information Systems, Vol. 17, No. 1, January 1999, Pages 40-76.


Various studies (summarized below) have examined variants of an alternative distributed model in which a user's search requests are forwarded by a broker to a carefully selected subset F of a long list S of known servers. S is not likely to include all servers on the Internet but rather a shorter list prepared by the searcher's organization or by an Internet search service. S may include, or entirely consist of, servers on an organization's private network.

In the simplest distributed model, all servers operate a query-processing service restricted to the documents which they serve. This model is assumed throughout the present article.

The broker endeavors to merge the results obtained from the servers in F into a ranked list which will best meet the user's need. The main reasons for selecting a subset of S are reduction of network or server access costs and improvement in the timeliness of result delivery. However, it is possible that a highly effective server selection method might also result in a better combined ranking of documents. The relationship between search effectiveness (precision-recall performance) and cost of searching is of particular interest.

In general, successful implementation of the broker model requires solutions to the following problems:

(1) How to translate the user's statement of information need into the query languages of the respective search servers (Query Translation).

(2) How to select the members of subset F (Server Ranking/Server Selection).

(3) How to merge search results from the different servers so as to achieve precision-recall goals (Result Merging, also known as the Collection Fusion problem).

The interesting problems of Query Translation and Result Merging are beyond the scope of the present work, where the focus is on Server Ranking and Selection. Empirical comparisons of server ranking methods reported here avoid the query translation problem by assuming that all servers operate the same retrieval system. Similarly, a special type of relevance scoring (described below) is used to obtain server ranking results which are not confounded by incorrect result merging.


1.1 Terminology

Elsewhere in the literature, the terms server, source, and subcollection have been used synonymously. Here, following (for example) Yuwono and Lee [1997], the term server is chosen for use in this context, and the term source is reserved for describing the organization which created or supplied a collection of documents, e.g., Associated Press. The term subcollection is not used, as it does not convey the idea that the data may be distributed across a network.

The literature uses many different terms for the entity which refers a user query to a subset of available servers. Examples include meta search engine, broker, metabroker, search manager, receptionist, metaservice, and query intermediary. Following (for example) Gravano and García-Molina [1996], the term broker is somewhat arbitrarily chosen for use here.

Following the TREC convention [Voorhees and Harman 1996], the English language statement of a user's information need is called a topic description. Topic descriptions may directly serve as queries for a retrieval system or may be converted into a system-dependent query language.

The term perfect merging describes an unrealistic system capable of combining the document lists returned by multiple servers into a merged list with all relevant documents at the head. By contrast, the term correct merging refers to a merging process which is capable of producing a merged ranking effectively identical to that which would have been produced had all documents from all selected servers been searched as a single collection by a single retrieval system. Correct merging requires that document scores produced by independent servers are strictly comparable, which almost certainly requires that all servers use the same document-scoring algorithm.
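As a small illustration, correct merging reduces to an ordinary merge of per-server result lists once scores are strictly comparable. The sketch below assumes each server returns (document identifier, score) pairs already sorted by descending score; the document identifiers shown are illustrative only.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

def correct_merge(server_results: Iterable[List[Tuple[str, float]]],
                  cutoff: int = 1000) -> List[Tuple[str, float]]:
    """Merge per-server result lists (each sorted by descending score) into a
    single ranking. Valid only when scores are globally comparable, i.e. all
    servers use the same document-scoring algorithm."""
    merged = heapq.merge(*server_results, key=lambda hit: hit[1], reverse=True)
    return list(islice(merged, cutoff))

# Two servers whose scores come from the same scoring algorithm:
ranking = correct_merge([
    [("AP880101-0001", 9.2), ("AP880102-0042", 3.1)],
    [("FT911-0007", 7.5), ("FT911-0190", 0.8)],
])
# -> [("AP880101-0001", 9.2), ("FT911-0007", 7.5),
#     ("AP880102-0042", 3.1), ("FT911-0190", 0.8)]
```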

1.2 Summary of Relevant Literature

Several approaches to Server Selection use term frequency data to rank servers. Callan et al. [1995] use Collection Retrieval Inference (CORI) Nets whose arcs are weighted by term document frequency (df) and inverse collection frequency (icf). Having ranked servers, they use a clustering method to determine how many servers to consult.

The gGlOSS system of Gravano and García-Molina [1996] allows for a hierarchical broker structure in which first-level brokers characterize servers according to term dfs and aggregated term weights and where second-level brokers characterize first-level brokers in the same fashion. Servers need to know how to contact all first-level brokers, and first-level brokers need to know how to contact all second-level brokers, in order to keep information up-to-date. The goal is to use the information held by the brokers to estimate the number of relevant documents likely to be held by each server. This is done according to two alternative extreme-case hypotheses about term cooccurrence, which are experimentally compared.

Yuwono and Lee [1997] describe a centralized broker architecture in which the broker maintains df tables for all servers. The variance of df values across servers is used to select terms from the user query which best discriminate between servers, and then servers with higher df values for those terms are selected to process the query. Servers must transmit changes in df values when they occur and therefore need to know the address of the broker.

By contrast, a number of other approaches to the Server Selection problem are based on the use of server descriptions. Chakravarthy and Haase [1995] propose a method for locating the Internet server containing a known item. They manually characterize a large set of servers using semantic structures expressed in WordNet [Miller 1995] style. For each new search specification, a semantic distance based on WordNet hyponymy trees is computed for each server, and the server with the smallest such distance is chosen. They also use the SMART retrieval system to select servers based on unstructured, manually generated server descriptions and compare performance of the two methods. Kirk et al. [1995] propose the use of Knowledge Representation technology to represent both queries and server content and have developed algorithms for choosing a necessary and sufficient subset of servers described in this way.

The Pharos system proposed by Dolin et al. [1996] uses decentralized, hierarchical metadata descriptions for selecting appropriate servers. The authors suggest generating metadata automatically by performing cluster analysis of the server data and classifying the clusters within a manually defined metadata taxonomy.

Finally, isolated Database Merging approaches assume that neither global collection statistics nor server descriptions are available. The two approaches taken by Voorhees et al. [1995] rank servers using information derived from past query processing. In one, similarities are computed between the weighted term vector representing a new search topic and vectors representing each available past topic. The relevant-document distribution for the average of the k most similar past queries is used to score the utility of servers. In the other approach, vectors representing past queries are clustered into groups believed to represent topic areas, and similarities are computed between the new topic vector and the centroids of each past topic cluster. It should be noted that Voorhees and her colleagues do not attempt to restrict the number of servers accessed but rather to determine the optimum number of documents to retrieve from each server i, based on usefulness of servers to past research topics.

1.3 Overview of the Present Article

The present study compares server selection methods capable of operating in the absence of both global collection information and server descriptions. The methods studied do not require servers to have knowledge of brokers. Three methods based on historical data, including a variant of the Voorhees et al. [1995] methods, are empirically compared with a novel Lightweight Probes method (described below) that requires no past data. Comparisons are made using TREC [Voorhees and Harman 1996] data, research topics, and relevance judgments.

Section 2 details the experimental framework used in the present experiments. Section 3 contains an evaluation of each of the methods in turn. These evaluations are followed by comparisons of the estimated best-possible performance of each method (Section 3.6), comparisons of performance of practical implementations of each method (Section 3.7), and comparisons of automatic implementations (Section 3.8). Supplementary experiments using different data and a much larger set of queries are presented in Section 3.9 to confirm the generality of results obtained using automatically generated Lightweight Probes. Section 3.10 discusses cost models, compares server selection methods on the basis of equal estimated cost, and addresses cost-benefit issues. Section 3.11 briefly addresses the problem of choosing the number of servers to access. Finally, Section 4 discusses the performance and applicability of the methods studied and identifies a number of areas for further research.

2. EXPERIMENTAL FRAMEWORK

In essence, the evaluation methodology was as follows. A large test collection of text documents was distributed across approximately 100 simulated servers. The test documents had been previously judged for relevance to a set of topics. For each topic, a server ranking was generated using the method to be evaluated, and the cumulative numbers of relevant documents held by the servers up to given points in the ranking were calculated. Server selections were compared on the basis of relative proportion of relevant documents held, or alternatively, on the basis of the ratio of precision/recall performance of merged retrieval results from the selected subset to that achieved by merging results from all servers.

Four different basic approaches to server ranking were evaluated: Server General Utility (Section 3.2), Collection Promise (Section 3.3), Topic Similarity (Section 3.4), and Lightweight Probes (Section 3.5). Manual and automatic versions of the latter two were implemented and studied. These approaches were compared with two controls: Random, representing a performance floor, and Optimal (see Section 3.1.2), representing a performance ceiling. The availability of TREC-5 [Voorhees and Harman 1996] data and relevance judgments enabled accurate evaluation of the performance of each method over a large number of retrieval topics. Given a distribution of TREC documents across servers, it is possible to know the locations of all the documents relevant to each topic.

Server rankings were evaluated in two ways. The first assumed an ideal retrieval system on each server and resulted in a theoretical measure of the effectiveness of the ranking. Unfortunately, the theoretical measure can be based only on the Recall (proportion of relevant documents retrieved) dimension and gives no information about Precision (proportion of retrieved documents which were relevant). Accordingly, server ranking methods were also compared empirically, using actual queries and a real retrieval system on each server. Definitions of the measures used are given in Section 2.6.

For each method and variant, actual retrieval runs were performed over four different server subset sizes, corresponding to one-half (49), one-third (33), one-fifth (20), and one-tenth (10) of the total number (98) of servers. Each retrieval run processed 50 queries and was evaluated in comparison to the performance achieved by the same queries processed over all servers. Finally, it was recognized that server selection methods may be implemented in a range of different ways and that different implementations may vary in performance. Accordingly, lower-bound estimates of the best-possible performance achievable by any implementation of the methods are derived using "hindsight" and compared with actual manual and automatic implementations. For example, the best-possible performance of Topic Similarity is estimated by using a topic similarity measure based on the known distributions of relevant documents rather than on the text of the topic.

Server selection methods were compared on the basis of equal numbers of servers accessed. The problem of choosing an optimum number of servers for a given user and a given information need is considered to be independent of the server ranking method. (See Section 3.11.)

2.1 Text Data

The text data used in the experiment comprised TREC CDs 2 and 4, representing a total of approximately 2 gigabytes divided into 524,929 documents. Six distinct collections (each corresponding to a distinct document source) were represented: Associated Press (11), US Federal Register (28, comprising 1988 and 1994 groupings), Wall Street Journal (14), Ziff-Davis computer publications (8), Financial Times (26), and US Congressional Record (11). Each collection was divided across the number of servers shown in parentheses. The total amount of data (measured in bytes) on each server was roughly constant, although the number of documents per server varied from 1,548 to 8,527, with a mean value of 5,356. The division of data was as specified in the TREC-5 Database Merging task. The distribution of number of documents per server is shown in Figure 2. No server held data from more than one collection.

Fig. 2. Frequency distribution of number of documents per server.

2.2 Hardware and Software

The experiments reported below were carried out using the PADRE [Hawking and Bailey 1997; Hawking and Thistlewaite 1995] text retrieval system running on a 128-node parallel distributed-memory machine, the Fujitsu AP1000. Details of this machine are given by Horie et al. [1991] and are summarized in Figure 1.

Each of the 98 subcollections was assigned to its own processor node, leaving 30 idle. Each of the nodes thus modeled a network server, with the front-end machine playing the role of the broker and client interface. This approach required no modification to PADRE's indexing structures, which are naturally based on splitting up the data.

The use of a parallel machine was convenient rather than essential to the present study. Although the Fujitsu AP1000 architecture is logically a cluster of workstations and could potentially simulate a more realistic network (with random delays inserted to model real-world network latencies), the configuration available featured shared disks. Consequently, observed query-processing times on a node could not be used in the study, as they are not independent of activity on other nodes. A simplified cost model is described in Section 2.8.

Fig. 1. Overview of the experimental model used throughout this article. The front-end of the Fujitsu AP1000 simulates the broker and the retrieval client interface. AP1000 nodes simulate network servers, each of which manages a subcollection of the overall text. In each experiment a fixed-sized subset F of available servers is used to process the full query. A variety of alternative methods for selecting F, represented as ellipses clustering around the broker, are compared. Some of them are based on knowledge derived from queries processed by all servers in the past. Another relies on transmission of a very cheap probe query to all nodes and reception of small reply packets of frequency data (as illustrated on the left of the diagram).

PADRE mechanisms specifically implemented to support this study included


(1) a mode in which local rather than global statistics are returned to the front-end, and

(2) a command to select a subset of processors and to prevent nonmembers from contributing to the processing of a query.

2.3 Merging Rankings

It is assumed here that result merging is independent of the server selection method, though this may not always be the case.

"Theoretical" comparisons of server selection methods presented below assume perfect retrieval and perfect merging. However, server selection experiments using actual queries need to control for the potentially confounding effect of incorrect result merging. The difficulties inherent in merging rankings, even when the servers operate the same (tf.idf-based) algorithm, are described by Dumais [1992].

Hawking and Thistlewaite [1995] demonstrate that the use of a distance-based relevance scoring method enables correct merging, provided that all servers use it. Accordingly, empirical comparisons in the present study employ distance-based scoring to achieve correct merging and thereby allow direct comparison of server selection methods. Naturally, precision-recall performance would be expected to deteriorate if correct merging were not possible.

Note that the distance-based scoring is used only to achieve correct merging of results; the server selection methods examined do not rely on the relevance scoring method. A description of distance-based scoring appears in Section 2.4, and an example query is shown in Figure 3.

2.4 Research Topics and Queries

The research topics used in the experiments were topics 251-300 of the TREC set. Relevance judgments for all these topics are available for all of the data used.

A set of manually devised queries Q_T5 for these topics was used in all the retrieval runs reported here. These queries were oriented toward good TREC performance and are consequently quite long. The average number of terms in each query was approximately 65. Long queries such as these are typical of high-performing queries in the TREC AdHoc tasks [Allan et al. 1995; Buckley et al. 1995]. An example query is shown in Figure 3.

Note that none of the server selection methods depend upon the manual generation of queries.

Relevance scoring of the queries was based on the method first described by Hawking and Thistlewaite [1995]. (Clarke et al. [1995] simultaneously reported an independently developed but nearly identical method.) These methods are based on lexical distance between term occurrences and are independent of collection statistics. They have been shown to be capable of achieving good precision-recall results on the TREC AdHoc tasks.

In processing the example query, each document is examined for spans of text including one term from each of the anyof lists. The fragment "...terrorists using stolen plutonium from Kazakhstan..." is an example of such a span.

The length of each span may be computed as the number of intervening nonquery terms (two in the example). The relevance score of the document which contains it is increased by an amount which depends upon some inverse function of span length. The rationale is that the fewer the number of intervening terms, the more likely that the query term occurrences are semantically related to each other. Hawking and Thistlewaite [1996] and Hawking et al. [1996] provide further explanation of the method and discuss the effectiveness of different functions of span length and the treatment of partial spans.

Fig. 3. One of the distance-based queries used in the present experiments. The topic related to "dangers posed by fissionable materials in the states of the former Soviet Union." Each anyof creates a set of all match points for all terms in the given list of alternatives. The span 4 scores the relevance of documents according to the spans they contain over the four match sets. An explanation of spans is given in the text and in cited references.
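The sketch below is one simplified reading of the span mechanism just described, not the PADRE implementation: it scores a document by its single shortest window containing one term from each anyof list, using 1/(1 + intervening nonquery terms) as the inverse function. PADRE accumulates contributions from multiple spans and also handles partial spans; the alternative terms in the example sets are purely illustrative.

```python
from itertools import product
from typing import List, Sequence, Set

def shortest_span_score(tokens: Sequence[str], anyof_sets: List[Set[str]]) -> float:
    """Score one document by its shortest window containing at least one term
    from every anyof set; the score is an inverse function of the number of
    intervening nonquery terms within that window."""
    positions = [[i for i, t in enumerate(tokens) if t in s] for s in anyof_sets]
    if any(not p for p in positions):
        return 0.0                                 # some list never matched
    best_gap = None
    for combo in product(*positions):              # one occurrence per anyof list
        lo, hi = min(combo), max(combo)
        gap = (hi - lo + 1) - len(anyof_sets)      # intervening nonquery terms
        if best_gap is None or gap < best_gap:
            best_gap = gap
    return 1.0 / (1.0 + max(best_gap, 0))

doc = "terrorists using stolen plutonium from kazakhstan".split()
sets = [{"terrorists", "terrorism"}, {"stolen", "smuggled"},
        {"plutonium", "uranium"}, {"kazakhstan", "russia"}]
print(shortest_span_score(doc, sets))  # two intervening terms ("using", "from") -> 1/3
```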

2.5 Query-Processing Data for Past Topics

The Topic Similarity and Server General Utility methods studied below rely on the availability of data about past query processing over all servers. TREC topics 202-250 were used as the past topics, but, unfortunately, relevance judgments were not available for these topics on CD4 documents. Accordingly, results of processing Q_T4, the best available set of PADRE queries for the past topics (a selection of the best of three independently generated sets), were used to estimate the missing information. Q_T4 achieves an (unofficial) average precision of 0.3634 on the TREC-4 task. Of more relevance in this context, its R-precision (precision at the point when the number of documents retrieved equals the total number of relevant documents) was 0.4069. Q_T4 queries use only distance-based relevance scoring.

2.6 Server Selection Effectiveness Measures

Server selection methods may be compared, for a given number |F| = n of servers accessed, using

    R_n = (rel docs on F) / (rel docs on S).

This use of the R_n notation follows Lu et al. [1996].

Gravano and García-Molina [1996] criticize the use of measures, such as R_n, which are based on the location of relevant documents, pointing out that there is no benefit to the user in accessing a server containing relevant documents if the search engine there is unable to retrieve them. They propose alternative measures which are based on the ability of the server selection algorithm to predict which servers will return high-scoring documents.

Despite the above objection, the R_n measure has the considerable advantage that it is independent of the search engine(s) employed. Accordingly, R_n is used here. However, the performance of server selection methods in the context of a real retrieval system was also investigated, using measures based on relevant documents retrieved by the search engine in use. Search engines are normally compared on the basis of recall and precision. The proposed server selection effectiveness measures R̂_n and P̂_n report the proportions of all-server recall and all-server precision respectively which were obtained by accessing only n servers:


    R̂_n = (recall for Q_T5 over F) / (recall for Q_T5 over S)

and

    P̂_n = (precision@20 for Q_T5 over F) / (precision@20 for Q_T5 over S).

R_n may be regarded as an ideal or theoretical form of R̂_n. Recall is actually the number of relevant documents within the first 1000 retrieved. Precision@20 is the number of relevant documents within the first 20 retrieved. The latter measure was chosen because it is a very real determinant of user satisfaction in the context of Internet searches.
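These measures can be computed directly from ranked result lists and the relevance judgments. The sketch below is only an illustration of the definitions above, not the evaluation code used in the paper; the function names and the guards against empty denominators are ours.

```python
from typing import Dict, List, Sequence, Set

def r_n(selected: Sequence[str], all_servers: Sequence[str],
        rel_per_server: Dict[str, int]) -> float:
    """R_n: proportion of all relevant documents that are held by the
    selected subset F of servers (independent of any search engine)."""
    total = sum(rel_per_server.get(s, 0) for s in all_servers)
    held = sum(rel_per_server.get(s, 0) for s in selected)
    return held / total if total else 0.0

def recall_1000(ranking: List[str], relevant: Set[str]) -> int:
    return sum(1 for d in ranking[:1000] if d in relevant)   # relevant in top 1000

def precision_at_20(ranking: List[str], relevant: Set[str]) -> int:
    return sum(1 for d in ranking[:20] if d in relevant)     # relevant in top 20

def hat_ratios(subset_ranking: List[str], all_server_ranking: List[str],
               relevant: Set[str]) -> tuple:
    """R-hat and P-hat: recall and precision@20 obtained from the subset F,
    as fractions of the same measures obtained from the full set S."""
    r = recall_1000(subset_ranking, relevant) / max(recall_1000(all_server_ranking, relevant), 1)
    p = precision_at_20(subset_ranking, relevant) / max(precision_at_20(all_server_ranking, relevant), 1)
    return r, p
```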

Finally, an apparently satisfactory performance when averaged over 50 topics may conceal dismal failures on some topics. To measure this, the percentage of topics for which F held less than 10% of all the relevant documents is reported as the failure rate. Like R_n, this measure is independent of the search engine(s) employed.

2.6.1 Relevant Set. The relevant set for the experiments was defined as the set of documents judged relevant by the TREC assessors. The number of relevant documents per topic ranged from 1 to 594, and the mean was 110. Figure 4 shows the distribution of number of relevant documents per topic.

Fig. 4. Frequency distribution of number of relevant documents per topic.


The distribution of relevant documents across servers was also determined and is shown in Figure 5. The number of servers holding relevant documents for a topic ranged from 1 to 71, and the mean was 29.

2.6.2 Evaluation Framework. A server ranking was evaluated by calculating R_n values and failure rates for |F| = 1, ..., 98. Then Q_T5 queries were run over server subsets of size |F| = 10, 20, 33, 49, 98, and values of R̂_n and P̂_n were calculated.

When measuring the performance of a ranking method, the results are specific to the particular implementation of the method; a better implementation of the method will lead to better performance. For example, poor performance of an implementation of the Topic Similarity method may be due to poor choice of similarity metric rather than to a general deficiency of the method. This makes it difficult to make useful cross-method comparisons.

However, the availability of complete relevance judgments for the "new" topics makes it possible to estimate the maximum possible performance of a particular method on the stated task. Using the Topic Similarity example again, it is possible to rank past topics on the basis of the extent to which their distribution of relevant documents across servers matches that of the new topic, thus producing an estimated best-possible ranking of available past topics. Implementations of a method which rely on relevance judgments for new topics are totally impractical and are referred to here as "hindsight" implementations.

Fig. 5. Frequency distribution of number of servers holding relevant documents on a per-topic basis.


Results are presented for hindsight as well as for practical implementations of the methods considered here.

2.7 Incomplete Rankings

It was sometimes the case that less than |F| servers achieved nonzero scores, either because the ranking method correctly assigned zero utility to useless servers or because the ranking procedure was deficient. It was assumed that the latter explanation was more often correct than the former, and, when necessary, additional servers were chosen by random selection to permit equal-cost (as measured by number of servers selected) comparison of methods. If zero scores are in fact due to a failure of the ranking method, this approach leads to fair comparisons of the methods. However, as server selection methods approach the optimal, costs will tend to be overstated, particularly for large |F|. The issue of how to determine appropriate values for |F| based on user and topic characteristics is discussed in Section 3.11.
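A minimal sketch of the padding rule just described follows; the fixed random seed is an assumption made only so that comparisons are repeatable, and the function name is ours.

```python
import random
from typing import List, Sequence

def pad_ranking(nonzero_ranked: List[str], all_servers: Sequence[str],
                subset_size: int, seed: int = 0) -> List[str]:
    """Take up to |F| servers with nonzero scores in ranked order; if fewer
    than |F| qualify, append randomly chosen servers so that every method is
    compared at the same cost in servers accessed."""
    chosen = nonzero_ranked[:subset_size]
    leftover = [s for s in all_servers if s not in chosen]
    random.Random(seed).shuffle(leftover)
    return chosen + leftover[:subset_size - len(chosen)]
```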

2.8 Cost Model

Practical applications of distributed information retrieval may be subject to a variety of different cost models, depending upon the user and the technological and economic circumstances in force at a particular time. Cost may be measured in terms of monetary charges levied by server, broker, and/or network operators, or in terms of service delays experienced by users and loads experienced by servers, brokers, and networks.

The cost model used here is oriented toward monetary charges, but it is assumed that these may be derived directly from the computational and network resources used. This is almost certainly unrealistic, but in the absence of an accepted charging model, there is little practical alternative. Under this model, the cost associated with distributed query processing depends upon the following:

(1) The number of servers accessed. It is simplistically assumed that the cost of processing an arbitrary query is equal for all servers. Thus the cost of processing a query over |F| servers is |F|/|S| of the cost of processing it over all servers. Actual query-processing times on the nodes of the parallel machine could not be used because the nodes share disks and thus do not function independently.

(2) The complexity of the query submitted to a server. Short queries are cheaper than long ones.

(3) The amount of data transferred between servers and the researcher's workstation. Transfer of documents for viewing is not included here, as it should be independent of the server selection method. Consequently, the majority of transmitted data which must be considered consists of lists of document identifiers (including scores, etc.).



As yet, network latencies and timeouts have not been included. It would be necessary to include these factors if server selection comparisons were to be made on the basis of quality of service.

2.8.1 Global Ranking. In the actual retrieval runs, the local rankings at each server were actually merged using an efficient global reduction mechanism available on the parallel machine. In reality, the global ranked list would be formed by sending local lists (or sublists) to the broker and merging them there. However, the results would be identical due to the use of distance-based scoring. The network traffic costs estimated in Section 3.10.3 assume the use of merging at the broker.

3. EVALUATION OF SERVER RANKING METHODS

Three of the experimental server selection methods—"Collection Promise," "Topic Similarity," and "Server General Utility"—rely on historical information derived from past query processing over all servers. In contrast, the "Lightweight Probes" method characterizes servers by requiring all of them to process a very short, low-cost query and return a packet of frequency information.

3.1 Controls

Two impractical methods—"Random" and "Optimal"—serve as controls.

3.1.1 Random. The method of randomly choosing a set of servers to process each query was used to verify that the simulation machinery was working and to establish a performance floor against which the other methods could be judged. Table I records the performance of the method. As can be seen, the proportion of relevant documents retrieved very closely approximates the proportion of servers used. The proportion of "failed" queries (as defined in Section 2.6) is quite high unless more than one-third of available servers are used.

Table I. Retrieval Performance of Random Server Ranking

Number of Servers    10     20     33     49
R_n                  0.094  0.196  0.320  0.485
Failure rate         64%    16%    6%     0%

3.1.2 Optimal. Given knowledge of the complete set of relevant documents, it is possible to choose a set of n servers which will achieve the best-possible recall of relevant documents for that value of n. Optimal runs are performed to serve as a performance ceiling.

Figure 6 illustrates the benefits to be gained if a close-to-optimal server ranking can be discovered in a real situation. On average, the best server for a topic holds nearly 11% of the relevant documents, and the best four hold approximately 30%. Table II reports the performance of the method on the chosen measures. Even when accessing only 10% of the servers, no queries "fail."

Fig. 6. The percentage of relevant documents for a topic held by servers as a function of their position in an optimal ranking. Data have been averaged over all 50 topics. Note that distributions for some individual topics depart considerably from this shape.

Table II. Retrieval Performance of the Optimal Server Ranking

Number of Servers    10     20     33     49
R_n                  0.545  0.755  0.896  0.976
Failure rate         0%     0%     0%     0%
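The Optimal control can be computed directly from the relevance judgments. Because each document in these experiments is held by exactly one server, ranking servers by their counts of judged-relevant documents is already optimal for every subset size; the sketch below assumes a precomputed map from server identifier to relevant-document count for the topic, and the function name is ours.

```python
from typing import Dict, List

def optimal_ranking(rel_per_server: Dict[str, int]) -> List[str]:
    """Hindsight control: each document is held by exactly one server, so
    sorting servers by their count of judged-relevant documents for the topic
    gives the best-possible recall for every subset size n."""
    return sorted(rel_per_server, key=rel_per_server.get, reverse=True)
```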

3.2 Server General Utility

This method assumes that some servers are better sources of relevant documents than others, regardless of the topic. Server utility was estimated with reference to the set of past topics (202-250). Because no past relevance judgments exist for these topics over CD4, the document ranks obtained using a full run with Q_T4 queries were employed. If PADRE ranked a document r-th among the documents retrieved by a Q_T4 query, it contributed 1/r to the ranking weight of the server which held it.
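A minimal sketch of the utility accumulation just described follows, assuming each past run is available as a ranked list of (document, server) pairs; the function name and data layout are ours, not PADRE's.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def general_utility(past_runs: Iterable[List[Tuple[str, str]]]) -> Dict[str, float]:
    """Accumulate a topic-independent server weight: the server holding the
    document ranked r-th by a past query receives a contribution of 1/r.
    Each run is a ranked list of (doc_id, server_id) pairs."""
    weight: Dict[str, float] = defaultdict(float)
    for run in past_runs:
        for r, (_, server) in enumerate(run, start=1):
            weight[server] += 1.0 / r
    return dict(weight)

# Servers are then ranked once, by descending weight, regardless of the new topic.
```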

The best-possible performance of this method on this task was estimated, using the complete set of relevance judgments over all new topics to compute the total counts for each server.

Table III and Figure 7 show the performance of this method as implemented compared with the estimated best-possible implementation. On both measures, results are considerably better than the Random method. Clearly some servers are generally more useful than others.

The estimated best-possible performance of the method is much inferior to the Optimal (only about half as good at 10 servers, using the R_n metric). This is not surprising, as optimal rankings vary from topic to topic, whereas the Server General Utility method takes no account of topic differences.

Table III. Retrieval Performance of Server General Utility Ranking

                     Implemented                   Estimated Best Possible
Number of Servers    10     20     33     49       10     20     33     49
R_n                  0.177  0.388  0.577  0.739    0.281  0.461  0.638  0.820
Failure rate         38%    8%     0%     0%       18%    0%     0%     0%

Fig. 7. R_n values for Server Utility methods compared with the controls.

3.3 Collection Promise

This method assumes that subgroups of available servers serve particular categories of documents. Such an assumption is likely to apply in some real-world environments and is the case to a certain extent here. Collection promise could be estimated using a broad-brush version of the topic similarity methods discussed below. Instead, however, each new topic was manually assigned a list of collections (corresponding to single sources such as Wall Street Journal, Federal Register, etc.) considered most likely to supply documents in the area of the topic. If the target number of servers was less than the total number of servers for the listed sources, then a random sample of servers was drawn from each source.

Table IV records the results for this method. Using 10 servers, the method as implemented is much better than the Random method, but performance relative to the Random control declines as the number of servers increases. It is possible that the author of the collection orderings tended to do a reasonable job of choosing the most promising collection but a poor job of choosing the second and third best.

The same table also shows the estimated best-possible performance of the Collection Promise method on this task. These figures were obtained using knowledge of the server locations of all the judged relevant documents. Collections were ranked on a topic-by-topic basis according to the ratio of relevant documents held per server. Results obtained are much better than those achieved by the human-assigned rankings and show a much lower percentage of failures. The actual and estimated best-possible Collection Promise rankings are compared with the controls in Figure 8.

Table IV. Retrieval Performance of the Collection Promise Method as Implemented, Compared with the Estimated Best-Possible Performance of the Method

                     Implemented                   Estimated Best Possible
Number of Servers    10     20     33     49       10     20     33     49
R_n                  0.202  0.282  0.412  0.549    0.280  0.448  0.622  0.780
Failure rate         38%    18%    10%    0%       4%     0%     0%     0%

Fig. 8. R_n values for Collection Promise methods compared with the controls.
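The hindsight Collection Promise ranking described above can be sketched as follows, assuming a map from each collection to its servers and per-server relevant-document counts for the topic. How servers are drawn from within a collection is not specified in the text, so the sketch simply takes them in the given order; names are illustrative.

```python
from typing import Dict, List, Sequence

def hindsight_collection_ranking(rel_per_server: Dict[str, int],
                                 servers_by_collection: Dict[str, Sequence[str]],
                                 subset_size: int) -> List[str]:
    """Rank collections by judged-relevant documents per constituent server,
    then take servers collection by collection until |F| servers are chosen."""
    ratio = {c: sum(rel_per_server.get(s, 0) for s in servers) / len(servers)
             for c, servers in servers_by_collection.items()}
    chosen: List[str] = []
    for c in sorted(ratio, key=ratio.get, reverse=True):
        for s in servers_by_collection[c]:
            if len(chosen) == subset_size:
                return chosen
            chosen.append(s)
    return chosen
```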

3.4 Topic Similarity

This approach is derived from the Query Clustering (QC) and Modeling of Relevant Document Distributions (MRDD) methods of Voorhees et al. [1995].

Like General Utility, the Topic Similarity method assumes

(1) The existence of relevant historical data obtained by processing queries on ALL servers.

(2) That there is some sort of semantic relationship between the content of documents held by a single server. If documents are distributed across servers without regard to content, then this technique is unlikely to be useful.

As explained in Section 2.5 above, tables of the number of relevant documents per server for TREC topics 201-250 were used as the store of knowledge about past queries. Because no relevance judgments were available for these topics on CD4, Q_T4 queries were first run over CD2 documents only. A relevance-score threshold was then set for each topic so that the number of documents above the threshold was equal to the number of CD2 documents actually judged relevant. The Q_T4 queries were then run over CD2 and CD4 together, and documents exceeding the threshold score for the topic were assumed to be relevant. The number of "relevant" documents held by each server for each past topic was then tabulated. Note that the use of an absolute score cutoff here is justified by the use of distance-based scoring. A document's score may be determined in isolation from any collection.

When generating server subsets for a new topic, the distributions of relevant documents across servers for each of the similar past topics were accumulated, and the resulting server-indexed array was sorted in descending order of number of relevant documents, giving a server ranking.

Two alternative methods of topic similarity computation were used: manual and automatic. In the manual method, a list of past topics considered most likely to involve documents in the same subcollection was manually derived by scanning the set of topics. The maximum number of similar past topics assigned to any one new topic was six, and the mean number was 3.7.
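A minimal sketch of the accumulation step just described follows, assuming the per-server relevant-document distributions for past topics are stored in a dictionary keyed by topic; servers that receive no weight would then be appended by random selection as described in Section 2.7. The function name and data layout are ours.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def topic_similarity_ranking(similar_past_topics: Sequence[str],
                             rel_dist: Dict[str, Dict[str, int]]) -> List[str]:
    """Accumulate, over the past topics judged similar to the new topic, the
    per-server counts of (pseudo-)relevant documents, and rank servers by the
    accumulated total."""
    score: Dict[str, float] = defaultdict(float)
    for topic in similar_past_topics:
        for server, count in rel_dist.get(topic, {}).items():
            score[server] += count
    return sorted(score, key=score.get, reverse=True)
```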

The automatic method used the SMART [Buckley et al. 1996] retrieval system to compute vector-space similarities. In the latter computations, relevant, document, and contain were added to the stopword list, and idf values derived from the full two-gigabyte data set were used in generating weights. It is not unreasonable to do this, since the method assumes past access to data on all servers. A straight cosine match with a tf*idf variant was used (SMART ltc).


A maximum of six most similar past topics were chosen (average 5.9), and these were used to generate results reported below. A run based on a smaller average number (matching that of the manually generated similarities) of similar past topics yielded slightly poorer performance on all measures.

3.4.1 Results for Topic Similarity Methods. An estimate of the best-possible performance of Topic Similarity methods on this particular problem was made by creating a normalized vector, for both new and past topics, of the number of relevant documents per server. For each new topic, inner products were computed between its vector and those of each of the past topics, allowing past topics to be ranked by "similarity" to the new topic. Heuristics were used to choose the number of similar topics in each case and resulted in a maximum of six and an average of 3.7. Better heuristics may result in better performance.

Table V. Retrieval Performance of the Automatic and Manual Topic Similarity Implementations Compared with Estimated Best Possible Performance of Any Topic Similarity Implementation. The automatic version used an average of 5.9 similar topics.

                     Automatic                     Manual                        Estimated Best Possible
Number of Servers    10     20     33     49       10     20     33     49       10     20     33     49
R_n                  0.181  0.335  0.512  0.702    0.199  0.368  0.538  0.755    0.399  0.571  0.726  0.864
Failure rate         28%    8%     0%     0%       16%    8%     4%     2%       0%     0%     0%     0%

Fig. 9. R_n values for Topic Similarity methods compared with the controls.


Table V and Figure 9 show that excellent results on this task were obtainable using hindsight-generated topic similarities. The proportion of relevant documents obtained from 10 servers was four times higher than that from Random ranking. No queries failed.

Unfortunately, neither manual nor automatic assignment of similarities approached the performance level of the hindsight method. However, results for both categories are much better than random, with manual outperforming automatic.
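A sketch of the hindsight similarity computation of Section 3.4.1 follows, under the assumption that each topic's distribution of relevant documents is a map from server to count; the heuristics that decide how many of the top-ranked past topics to keep are not reproduced here, and the function names are ours.

```python
import math
from typing import Dict, List, Tuple

def _normalize(dist: Dict[str, int]) -> Dict[str, float]:
    norm = math.sqrt(sum(v * v for v in dist.values())) or 1.0
    return {server: v / norm for server, v in dist.items()}

def rank_past_topics_by_hindsight(new_dist: Dict[str, int],
                                  past_dists: Dict[str, Dict[str, int]]) -> List[Tuple[str, float]]:
    """Hindsight similarity: inner product between the normalized per-server
    relevant-document vector of the new topic and that of each past topic."""
    nv = _normalize(new_dist)
    sims = []
    for topic, dist in past_dists.items():
        pv = _normalize(dist)
        sims.append((topic, sum(nv.get(s, 0.0) * pv.get(s, 0.0) for s in nv)))
    return sorted(sims, key=lambda pair: pair[1], reverse=True)
```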

3.5 Lightweight Probes

Experience in TREC has shown that high precision-recall performance requires complex queries; most of the best-performing systems make use of large-scale query expansion. However, complex queries take much longer to process than short queries, raising the question of whether very short, efficiently processed "queries" could be used to select servers to process the full query.

The Lightweight Probe method proposed here (believed novel) broadcasts a small number p of terms to all available servers, each of which responds with a small packet of term frequency information. The frequency data are then used to rank the likely utility of the servers.

There are some similarities between the Lightweight Probe method and the facilities provided within the Stanford Protocol Proposal for Internet Retrieval and Search (STARTS) [Gravano et al. 1997] for extracting metadata and content summaries from servers. However, in the STARTS proposal it is envisaged that full-content summaries are obtained from servers "periodically." By contrast, full content summaries are never obtained in the Lightweight Probe method. Instead, each query is preceded by a request for a minimal amount of such information.

The Lightweight Probe approach assumes that

(1) tiny probes can be processed with a significant cost saving over a full query;

(2) useful information about the holdings of a server may be deduced using a low-cost probe; and

(3) probe bandwidth and latency are low.

It should be noted that a p-term probe is likely to be significantly cheaper to process on a server than a p-term query, because no document ranking is required. In all experiments reported below, p = 2, and the frequency information returned by the servers comprised the following (a broker-side sketch follows the list):

(1) D_i: the total number of documents on the server,

(2) f_prox: the number of documents containing a specified number of the terms within a specified proximity of each other,

(3) f_cooccur: the number of documents in which a specified number of the terms cooccur,
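The text above describes the probe/reply exchange but not how the broker combines the returned frequencies into a server score, so the combination rule in the sketch below is an illustrative assumption; only the shape of the reply packet follows the description above, and all names are ours.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

@dataclass
class ProbeReply:
    """Frequency packet returned by one server for a two-term probe."""
    num_docs: int      # D_i: total number of documents on the server
    f_prox: int        # documents with the probe terms within a given proximity
    f_cooccur: int     # documents in which the probe terms cooccur

def lwp_rank(probe_terms: Sequence[str],
             send_probe: Callable[[str, Sequence[str]], ProbeReply],
             servers: Sequence[str]) -> List[str]:
    """Broadcast a tiny probe to every server and rank servers on the replies.
    The combination rule below (proximity count weighted above plain
    cooccurrence, normalized by collection size) is an illustrative
    assumption, not the formula used in the paper."""
    score: Dict[str, float] = {}
    for s in servers:
        reply = send_probe(s, probe_terms)
        score[s] = ((2.0 * reply.f_prox + reply.f_cooccur) / reply.num_docs
                    if reply.num_docs else 0.0)
    return sorted(servers, key=lambda s: score[s], reverse=True)
```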

ACM Transactions on Information Systems,Vol.17,No.1,January1999.

SQL中的case-when,if-else实例

create database EXAM go create table student (stuName varchar(10)not null, stuNO int primary key not null, stuSex char(2)check(stuSex='男'or stuSex='女'), stuAge int, stuSeat int, stuAddress varchar(40) ) GO insert into student values('张秋丽','25301','女','21','1','北京海淀'), ('李文才','25302','男','25','2','天津'), ('张三','25303','男','22','3','北京海淀'), ('红尘','25304','女','21','4','湖南长沙'), ('段林希','25305','女','20','5','江西赣州'), ('魏晨','25306','男','23','6','河北石家庄'), ('郑爽','25307','女','20','7',''), ('张杰','25308','男','21','8',''), ('王洁','25309','女','23','9','湖南怀化'), ('刘欣','253010','女','21','10','北京') create table exam (ExamNO int primary key, stuNO int not null, WrittenExam int, LabExam int ) GO insert into exam values(01,250301,86,89), (02,250302,67,78), (03,250303,76,80), (04,250304,79,56), (05,250305,56,63), (06,250306,67,60), (07,250307,90,83), (08,250308,80,79), (09,250309,92,90), (10,250310,55,58)

项目开发中常用到的SQL语句

项目开发中常用到的SQL语句1、循环示例 循环示例代码: ? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 DECLARE @i int DECLARE @name varchar(10) DECLARE @password varchar(10) Set @i = 1000 WHILE @i < 1200 BEGIN Set @i =@i +1 SET @name = RIGHT('0000' + CAST(@i AS varchar(10)),4) set @password = @name select @name insert into dbo.LocomotiveTeminalBase (li_ID,t_ID,lt_IDNumber,lt_MiM,lt_FuWQIP,lt_FuWQDKH,lt_CreatedBy) values('d82575c0-2d21-4c47-a406-7771d7d2c80a','fb5d9a7b-9cd6-4a55-9e90-881706eaf @name,@password,'192.168.1.187','2000','9015c234-e185-4e15-96c6-f53426dd6690') END 2、数据库缓存依赖中用到的SQL语句代码示例: ? 1 2 3 4 5 6 7 8 --查看状态 Select DATABASEpRoPERTYEX('soft_LocomotiveRM_DB','IsBrokerEnabled') --启用broker ALTER DATABASE soft_LocomotiveRM_DB SET NEW_BROKER WITH ROLLBACK IMMEDIATE ALTER DATABASE soft_LocomotiveRM_DB SET ENABLE_BROKER --添加用户

for循环的简介及break和continue的区别

for循环的简介及break和continue的区别 1.for循环 for循环是更加简洁的循环语句,大部分情况下,for循环可以代替while循环、do-while循环。 for循环的格式为: for( 初始语句 ; 执行条件 ; 增量) { 循环体 } 执行顺序:1、初始语句2、执行条件是否符合?3、循环体4、增加增量 初始化语句只在循环开始前执行一次,每次执行循环体时要先判断是否符合条件,如果循环条件还会true,则执行循环体,在执行迭代语句。 所以对于for循环,循环条件总比循环体多执行一次。 注意:for循环的循环体和迭代语句不在一起(while和do-while是在一起的)所以如果使用continue来结束本次循 环,迭代语句还有继续运行,而while和do-while的迭代部分是不运行的。 来个例子:输入一个数n(n>1),输出n!的值。n!(n的阶层)=1*2*3*……*n #include void main() { long num=1; int n,i; printf("请输入n:");

scanf("%d",&n); for(i=1;i<=n;i++) num=num*i; printf("%d的阶层是%d\n",n,num); } 2.break和continue的区别和作用 break和continue都是用来控制循环结构的,主要是停止循环。 1.break 有时候我们想在某种条件出现的时候终止循环而不是等到循环条件为false才终止。 这是我们可以使用break来完成。break用于完全结束一个循环,跳出循环体执行循环后面的语句。 2.continue continue和break有点类似,区别在于continue只是终止本次循环,接着还执行后面的循环,break则完全终止循环。 可以理解为continue是跳过当次循环中剩下的语句,执行下一次循环。 例子: #include void main() { int sum,i; sum=0; for(i=1;i<=100;i++) { sum=sum+i; if(i==2) {

for循环语句的翻译

课程设计任务书 学生姓名:辛波专业班级:计算机0707班 指导教师:彭德巍工作单位:计算机科学与技术学院 题目: FOR循环语句的翻译程序设计(递归下降法、输出四元式) 初始条件: 理论:学完编译课程,掌握一种计算机高级语言的使用。 实践:计算机实验室提供计算机及软件环境。如果自己有计算机可以在其上进行设计。 要求完成的主要任务:(包括课程设计工作量及其技术要求,以及说明书撰写等具体要求) (1)写出符合给定的语法分析方法的文法及属性文法。 (2)完成题目要求的中间代码四元式的描述。 (3)写出给定的语法分析方法的思想,完成语法分析和语义分析程序设计。 (4)编制好分析程序后,设计若干用例,上机测试并通过所设计的分析程序。 (5)设计报告格式按附件要求书写。课程设计报告书正文的内容应包括: 1 系统描述(问题域描述); 2 文法及属性文法的描述; 3 语法分析方法描述及语法分析表设计; 4 按给定的题目给出中间代码形式的描述及中间代码序列的结构设计; 5 编译系统的概要设计; 6 详细的算法描述(流程图或伪代码); 7 软件的测试方法和测试结果; 8 研制报告(研制过程,本设计的评价、特点、不足、收获与体会等); 9 参考文献(按公开发表的规范书写)。 时间安排: 设计安排一周:周1、周2:完成系统分析及设计。 周3、周4:完成程序调试及测试。 周5:撰写课程设计报告。 设计验收安排:设计周的星期五第1节课开始到实验室进行上机验收。 设计报告书收取时间:设计周的次周星期一上午10点。 指导教师签名: 2010年 01月 08日 系主任(或责任教师)签名: 2010年 01月 08日

DB2常用SQL语句集

DB2常用SQL语句集 1、查看表结构: describe table tablename describe select * from tablename 2、列出系统数据库目录的内容: list database directory 3、查看数据库配置文件的内容: get database configuration for DBNAME 4、启动数据库: restart database DBNAME 5、关闭表的日志 alter table TBLNAME active not logged inially 6、重命名表 rename TBLNAME1 to TBLNAME2 7、取当前时间 select current time stamp from sysibm.sysdummy1 8、创建别名 create alias ALIASNAME for PRONAME(table、view、alias、nickname) 9、查询前几条记录 select * from TBLNAME fetch first N rows 10、联接数据库 db2 connect to DB user db2 using PWD 11、绑定存储过程命令 db2 bind BND.bnd 12、整理优化表 db2 reorgchk on table TBLNAME db2 reorg table TBLNAME db2 runstats on table TBNAME with distribution and indexes all 13、导出表 db2 export to TBL.txt of del select * from TBLNAME db2 export to TBL.ixf of ixf select * from TBLNAME 以指定分隔符‘|’下载数据: db2 "export to cmmcode.txt of del modified by coldel| select * from cmmcode”14、导入表 db2 import from TBL.txt of del insert into TBLNAME db2 import from TBL.txt of del commitcount 5000 insert into TBLNAME db2 import from TBL.ixf of ixf commitcount 5000 insert into TBLNAME db2 import from TBL.ixf of ixf commitcount 5000 insert_update into TBLNAME db2 import from TBL.ixf of ixf commitcount 5000 replace into TBLNAME db2 import from TBL.ixf of ixf commitcount 5000 create into TBLNAME (仅IXF) db2 import from TBL.ixf of ixf commitcount 5000 replace_create into TBLNAME (仅 IXF) 以指定分隔符“|”加载:

for循环实例

for循环实例 读取的是数组expr的行数,然后程序执行循环体(loopbody),所以expr有多少列,循环体就循环多少次。expr经常用捷径表达式的方式,即first:incr:last。 在for和end之间的语句我们称之为循环体。在for循环运转的过程中,它将被重复的执行。For循环结构函数如下: 1.在for循环开始之时,matlab产生了控制表达式。 2.第一次进入循环,程序把表达式的第一列赋值于循环变量index,然后执行循环体内的语句。 3.在循环体的语句被执行后,程序把表达式的下一列赋值于循环变量index,程序将再一次执行循环体语句。 4.只要在控制表达式中还有剩余的列,步骤3将会一遍一遍地重复执行。 10次。循环系数ii在第一次执行的时侯是1,第二次执行的时侯为2,依次类推,当最后一次执行时,循环指数为10。在第十次执行循环体之后,再也没有新的列赋值给控制表达式,程序将会执行end语句后面的第一句。注意在循环体最后一次执行后,循环系数将会一直为10。 环指数ii在第一次执行时为1,第二次执行时为3,依此类推,最后一次执行时为9。在第五次执行循环体之后,再也没有新的列赋值给控制表达式,程序将会执行end语句后面的第一句。注意循环体在最后一次执行后,循环系数将会一直为9。 循环指数ii在第一次执行时为1,第二次执行时为3,第三次执行时为7。循环指数在循环结束之后一直为7。

循环指数ii 在第一次执行时为行向量??????41,第二次执行时为??????54,第三次执行时为?? ????76。这个例子说明循环指数可以为向量。 例1 阶乘(factorial )函数 这种循环将会执行5次,ii 值按先后顺序依次为1,2,3,4,5。n_factorial 最终的计算结果为1ⅹ2ⅹ3ⅹ4ⅹ5=120。 例2 统计分析 执行如下算法: 输入一系列的测量数,计算它们的平均数和标准差。这些数可以是正数,负数或0。 答案: 这个程序必须能够读取大量数据,并能够计算出这些测量值的平均数和标准差。这些测量值可以是正数,负数或0。 因为我们再也不能用一个数来表示数据中止的标识了,我们要求用户给出输入值的个数,然后用for 循环读取所有数值。 下面的就是这个修定版本的程序。它允许各种输入值,请你自己验证下面5个输入值的

sql循环语句的写法

sql循环语句的写法 SQL循环语句 declare @i int set @i=1 while @i<30 begin insert into test (userid) values(@i) set @i=@i+1 end --------------- while 条件 begin 执行操作 set @i=@i+1 end WHILE 设置重复执行SQL 语句或语句块的条件。只要指定的条件为真,就重复执行语句。可以使用BREAK 和CONTINUE 关键字在循环内部控制WHILE 循环中语句的执行。语法WHILE Boolean_expression { sql_statement | statement_block } [ BREAK ] { sql_statement | statement_block } [ CONTINUE ] 参数

Boolean_expression 返回TRUE 或FALSE 的表达式。如果布尔表达式中含有SELECT 语句,必须用圆括号将SELECT 语句括起来。{sql_statement | statement_block} Transact-SQL 语句或用语句块定义的语句分组。若要定义语句块,请使用控制流关键字BEGIN 和END。BREAK 导致从最内层的WHILE 循环中退出。将执行出现在END 关键字后面的任何语句,END 关键字为循环结束标记。CONTINUE 使WHILE 循环重新开始执行,忽略CONTINUE 关键字后的任何语句。注释 如果嵌套了两个或多个WHILE 循环,内层的BREAK 将导致退出到下一个外层循环。首先运行内层循环结束之后的所有语句,然后下一个外层循环重新开始执行。示例 A. 在嵌套的IF...ELSE 和WHILE 中使用BREAK 和CONTINUE 在下例中,如果平均价格少于$30,WHILE 循环就将价格加倍,然后选择最高价。如果最高价少于或等于$50,WHILE 循环重新启动并再次将价格加倍。该循环不断地将价格加倍直到最高价格超过$50,然后退出WHILE 循环并打印一条消息。USE pubs GO WHILE (SELECT A VG(price) FROM titles) < $30 BEGIN

实验10 T-SQL语言编程基础

实验十 T-SQL语言编程基础 姓名:学号: 专业:网络工程班级: 同组人:无实验日期:2012-4-19【实验目的与要求】 1.熟练掌握变量的定义和赋值。 2.熟练掌握各种运算符。 3.熟练掌握流程控制语句,尤其是条件语句和循环语句。【实验内容与步骤】 10.1. 变量的定义与输出 1.变量的定义和赋值 1) 局部变量的声明: DECLARE @variable_name DataType 例如: declare @stuname varchar(20)--声明一个存放学员姓名的变量stuname. declare @stuseat int--声明一个存放学员座位号的变量stuseat 2) 局部变量的赋值: 局部变量的赋值有两种方法: a) 使用Set语句 Set @variable_name=value b) 使用Select语句 Select @variable_name=value 实验: 运行以下程序段,理解变量的使用。

--局部变量的赋值与使用 declare @customer_name varchar(20)--声明变量用来存放客户名称 set @ customer_name ='家电市场'--使用SET语句给变量赋值 select* from xss where客户名称=@customer_name --通过局部变理向sql语句传递数据 请给出运行结果: 练习: 创建一名为 Product_name的局部变量,并在SELECT语句中使用该变量查找“冰箱”的”价格”和”库存量”。 给出相应的语句 declare @Product_name varchar(20) set @Product_name ='冰箱' select价格,库存量 from CP where产品名称= @Product_name 请给出运行测试结果:

《For循环语句》

《F o r循环语句》教学设计 池州市第八中学杜亦麟 课题 2.4.1 For循环语句 教学内容 粤教版信息技术(选修1)《算法与程序设计》第二章《程序设计基础》第四节《程序的循环结构》第一小节《For循环语句》 教学目标 知识与能力: 1.理解循环结构的基本思想及For语句的执行过程。 2.培养和提高学生逻辑思维能力,使其可以独立完成简单循环结构算法的设计。 3.能够利用For循环语句实现循环结构,解决实际问题。 过程与方法: 1.通过简单的数学问题的分析、讲解,让学生掌握For循环语句语法知识,及其执行原理。 2.以任务驱动,学生分组合作探究的方式,进一步让学生理解For循环语句的基本思想,同时培养学生自主探究和合作学习的能力。 3.通过自评和互评活动,培养学生语言表达能力和归纳总结能力。 情感态度与价值观: 1.提高学生学习兴趣,培养学习的主动性和探究性。 2.培养学生团结协作精神,体验成功的快乐。 教学重点 1.掌握For循环语句的格式和功能; 2.理解For循环语句的执行过程。 教学难点 控制循环的条件、确定循环体的内容 教材分析 第二章是程序设计基础,也是全书的基础。它沿着分析问题、设计算法、编写程序等运用计算机解决问题之路,开始学习如何使用VB程序设计编写程序解决问题。本节课的主要内容For语句的基本格式、执行过程及语句的实际应用。又是本章的重点和难点内容。而循环结构是程序设计的三种基本结构之一,其作用是使一段程序反复执行。For循环语句在程序设计中频繁出现,也是三种结构中较难的一种,因此,学好本节课非常重要,本节课的学习会使学生对算法有一个更深刻的理解,为以后的程序设计打下一个良好的基础,也可以培养学生的创新能力、分析问题和解决问题的能力以及探究精神。

实验7_T-SQL语言编程基础[1]1

实验七T-SQL语言编程基础 【实验目的与要求】 1.熟练掌握变量的定义和赋值。 2.熟练掌握各种运算符。 3.熟练掌握流程控制语句,尤其是条件语句和循环语句。 【实验内容与步骤】 一、准备实验数据 CPXS数据库包含如下三个表: CP(产品编号,产品名称,价格,库存量); XSS(客户编号,客户名称,地区,负责人,电话); CPXSB(产品编号,客户编号,销售日期,数量,销售额); 三个表结构如图2.1~图2.3所示,请在企业管理器中完成表的创建。 图2.1CP表结构

图2.2XSS表结构 图2.3CPXSB表结构 2.1数据写入操作 在企业管理器中输入如图2.4~图2.6的CP表、XSS表和CPXSB表的样本数据。 图2.4CP表的样本数据

图2.5XSS表的样本数据 图2.6CPXSB表的样本数据 10.1.变量的定义与输出 1.变量的定义和赋值 1)局部变量的声明: DECLARE@variable_name DataType 例如: declare@stuname varchar(20)--声明一个存放学员姓名的变量stuname. declare@stuseat int--声明一个存放学员座位号的变量stuseat 2)局部变量的赋值: 局部变量的赋值有两种方法: a)使用Set语句 Set@variable_name=value b)使用Select语句 Select@variable_name=value 实验: 运行以下程序段,理解变量的使用。 --局部变量的赋值与使用 declare@customer_name varchar(20)--声明变量用来存放客户名称set@customer_name='家电市场'--使用SET语句给变量赋值select* from xss where客户名称=@customer_name--通过局部变理向sql语句传递数据请给出运行结果:

SQL循环语句的写法

SQL循环语句的写法 SQL循环语句 declare @i int set @i=1 while @i<30 begin insert into test (userid) values(@i) set @i=@i+1 end --------------- while 条件 begin 执行操作 set @i=@i+1 end WHILE 设置重复执行 SQL 语句或语句块的条件。只要指定的条件为真,就重复执行语句。可以使用 BREAK 和 CONTINUE 关键字在循环内部控制 WHILE 循环中语句的执行。 语法 WHILE Boolean_expression { sql_statement | statement_block } [ BREAK ] { sql_statement | statement_block } [ CONTINUE ] 参数 Boolean_expression 返回 TRUE 或 FALSE 的表达式。如果布尔表达式中含有 SELECT 语句,必须用圆括号将 SELECT 语句括起来。 {sql_statement | statement_block} Transact-SQL 语句或用语句块定义的语句分组。若要定义语句块,请使用控制流关键字 BEGIN 和 END。 BREAK

导致从最内层的 WHILE 循环中退出。将执行出现在 END 关键字后面的任何语句,END 关键字为循环结束标记。 CONTINUE 使 WHILE 循环重新开始执行,忽略 CONTINUE 关键字后的任何语句。 注释 如果嵌套了两个或多个 WHILE 循环,内层的 BREAK 将导致退出到下一个外层循环。首先运行内层循环结束之后的所有语句,然后下一个外层循环重新开始执行。 示例 A. 在嵌套的 IF...ELSE 和 WHILE 中使用 BREAK 和 CONTINUE 在下例中,如果平均价格少于 $30,WHILE 循环就将价格加倍,然后选择最高价。如果最高价少于或等于 $50,WHILE 循环重新启动并再次将价格加倍。该循环不断地将价格加倍直到最高价格超过 $50,然后退出 WHILE 循环并打印一条消息。 USE pubs GO WHILE (SELECT AVG(price) FROM titles) < $30 BEGIN UPDATE titles SET price = price * 2 SELECT MAX(price) FROM titles IF (SELECT MAX(price) FROM titles) > $50 BREAK ELSE CONTINUE END PRINT 'Too much for the market to bear' B. 在带有游标的过程中使用 WHILE 以下的 WHILE 结构是名为 count_all_rows 过程中的一部分。下例中,该 WHILE 结构测试用于游标的函数 @@FETCH_STATUS 的返回值。因为 @@FETCH_STATUS 可能返回–2、-1 或 0,所以,所有的情况都应进行测试。如果某一行在开始执行此存储过程以后从游标结果中删除,将跳过该行。成功提取(0) 后将执行 BEGIN...END 循环内部的 SELECT 语句。 USE pubs DECLARE tnames_cursor CURSOR FOR SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES

《C语言中的for循环》教案

《C语言中的for循环》教学设计 班级:计科软件对131 学号:124 姓名:李泽倩 日期:2016.6.12

《C语言中的for循环》教学设计 一、前端分析 (一)教材内容分析 C语言是国内外广泛使用的计算机语言,学会使用C语言进行程序设计是计算机专业本科生需要掌握的一项基本功。它在各高校计算机专业中既是其他课程的前期基础课,又是培养学生具有程序设计、调试能力的专业核心课程。程序设计的三种基本结构重中之重就是循环结构。而循环中的for循环是程序中运用最多的,它既是前面知识的延续,又是后面知识的基础。本文针对学生的实际情况,具体阐述for循环语句的教学方法和过程,使学生理解for循环语句的格式、功能和特点及其在具体编程时的灵活应用。 (二)学习者特征分析 大学生在智能发展上呈现出进一步成熟的特征。他们的思维有了更高的抽象性和理论性,并由抽象逻辑思维逐渐向辩证逻辑思维发展。他们观察事物的目的性和系统性进一步增强,已能按程序掌握事物本质属性的细节特征,思维的组织性、深刻性和批判性有了进一步的发展,独立性更为加强,注意更为稳定,集中注意的范围也进一步扩大。 二、教学目标设计 (一)知识与技能 1、领会程序设计中构成循环的方法

2. Be able to write C programs with the for statement and use it to solve practical problems in program design.
(2) Process and method
Teaching of the for statement in C takes action-oriented teaching as its main line and moves to application and practice through the sequence "pose a problem – analyze it – solve it – extend it – discuss – summarize – practice".
The lesson is taught with multimedia courseware; combining text and pictures makes the material easier to understand and the learning more efficient. During class discussion and practice the teacher gives appropriate guidance while the students explore actively and summarize what they have learned, which both helps them absorb new material and gives full play to their role as the main actors. To break through the key points, example-comparison teaching is used: concrete cases let students master the knowledge through typical examples, and comparing these with programs written using while and do...while deepens the impression, so that students quickly master the basic structure and use of the for statement.
(3) Attitudes and values
1. Let students build a sense of achievement by solving problems on their own, laying a good foundation for independent learning.
2. Develop students' initiative, stimulate their enthusiasm for learning, and foster a spirit of teamwork.

3. Design of the teaching content
Key point: to use the for statement in C loop-structure programs, students must first master the statement's basic format, understand the role of each expression, and understand the execution process, so the key point is "the structure of the for statement".
Difficulty: the application of the for statement. Mastering the structure and usage of the statement is not hard; what is hard is knowing, in practice, which kind of loop solves a given problem most concisely and efficiently, so the difficulty of this lesson is "the application of the for statement".

4. Analysis of the teaching strategy
(1) Teaching methods
1. Classroom lecture presenting the main content.

A Collection of Basic Database SQL Statements

A Collection of Basic Database SQL Statements

1. Basics
1. Create a database:
   Create DATABASE database-name
2. Drop a database:
   drop database dbname
3. Back up SQL Server:
   --- create a backup device
   USE master
   EXEC sp_addumpdevice 'disk', 'testBack', 'c:\mssql7backup\MyNwind_1.dat'
   --- start the backup
   BACKUP DATABASE pubs TO testBack
4. Create a new table:
   create table tabname(col1 type1 [not null] [primary key], col2 type2 [not null], ..)
   Create a new table from an existing one:
   A: create table tab_new like tab_old  (create the new table using the old one)
   B: create table tab_new as select col1, col2… from tab_old definition only
5. Drop a table:
   drop table tabname
6. Add a column:
   Alter table tabname add column col type
   Note: in some systems an added column cannot simply be dropped again; in DB2 the data type of an added column cannot be changed either, the only allowed change being to increase the length of a varchar.
7. Add a primary key:   Alter table tabname add primary key(col)
   Drop a primary key:  Alter table tabname drop primary key(col)
8. Create an index:  create [unique] index idxname on tabname(col….)
   Drop an index:    drop index idxname
   Note: an index cannot be altered; to change it you must drop it and create it again.
9. Create a view:  create view viewname as select statement
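As a quick check of how several of the statements above fit together, here is a hedged SQL Server sketch that creates a table, adds a column, builds an index and a view, and then removes them again; all object names are made up for illustration.

CREATE TABLE demo_tab (
    id   int NOT NULL PRIMARY KEY,
    name varchar(50) NOT NULL
)
ALTER TABLE demo_tab ADD created datetime         -- item 6 (SQL Server itself omits the COLUMN keyword)
CREATE INDEX idx_demo_name ON demo_tab (name)     -- item 8
GO
CREATE VIEW v_demo AS SELECT id, name FROM demo_tab   -- item 9 (CREATE VIEW must be alone in its batch)
GO
DROP VIEW v_demo
DROP INDEX demo_tab.idx_demo_name                 -- older syntax; newer versions also accept DROP INDEX idx_demo_name ON demo_tab
DROP TABLE demo_tab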

Teaching Design: The FOR Loop Statement


The FOR Loop Statement

1. Textbook analysis: this lesson belongs to "The Loop Structure of a Program", Section 4 of Chapter 2 of "Algorithms and Programming" (elective). It is preceded by the sequence and selection structures, and immediately after the FOR statement come the DO statement and nested loops. This is the first lesson on the FOR statement and concentrates on the basics — its format and execution process — without touching harder uses such as double loops. The loop structure is one of the three basic structures of program design and a foundation of programming.
2. Learner analysis: before this lesson the students have mastered the execution flow of VB's sequence and selection structures, understand conditional statements well, and have some grounding in algorithms and in comparing and generalizing.
3. Teaching objectives
(1) Knowledge and skills: 1) master the basic format of the FOR loop statement; 2) understand its execution process; 3) be able to write simple programs with a for loop.
(2) Process and method: 1) develop the ability to analyze and solve problems; 2) deepen understanding of the process and method of solving problems with a computer.
(3) Attitudes and values: stimulate enthusiasm for learning and foster a positive attitude.
4. Key points, difficulties and rationale
Key points: 1. master the basic format of the FOR loop statement; 2. understand its execution process.
Difficulty: solving practical problems and writing simple programs.
5. Teaching methods: lecture; task-driven learning.
6. Teaching environment: computer room.
7. Teaching process
1. Lead-in. A story introduces the lesson: Archimedes plays chess with the king and the king loses. Asked what reward he wants, Archimedes says: just put one grain of rice on the first square of the board, two on the second, four on the third, eight on the fourth … and keep going at that rate until all 64 squares are filled. The king assumes it will not take much grain, yet a whole granary of rice cannot even cover half the squares. Once the whole board is filled, do you know how many grains it takes? Write the expression using the mathematics you know.
Students answer: 2^0 + 2^1 + 2^2 + … + 2^63
How would a VB program compute this? This leads into the loop structure.
2. Presenting the new lesson. Practical problems often involve repeated calculations that follow a definite pattern; in a program this means executing a group of statements that performs a specific task many times over. The group of statements executed repeatedly is called the loop body. Each time the loop body is repeated, the program must decide whether to continue or to stop, by testing whether a particular condition holds.
For example:
For i = 1 To 10
    s = s + i
Next i
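The lesson itself goes on to solve this in VB. Purely as an illustration, and because the other excerpts in this collection use T-SQL, here is a hedged T-SQL sketch of the same 64-square summation written with a WHILE loop; decimal is used because the total does not fit in an int or bigint, and the variable names are illustrative.

DECLARE @i int, @term decimal(38,0), @total decimal(38,0)
SET @i = 0
SET @term = 1                    -- 2^0: the grain on the first square
SET @total = 0
WHILE (@i <= 63)                 -- 64 squares, exponents 0 through 63
BEGIN
    SET @total = @total + @term
    SET @term = @term * 2
    SET @i = @i + 1
END
PRINT 'Total grains: ' + CAST(@total AS varchar(40))   -- 18446744073709551615, i.e. 2^64 - 1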

Sharing High-Performance Batch-Insert and Batch-Delete SQL

Sharing High-Performance Batch-Insert and Batch-Delete SQL

Technical skill always improves through argument and nit-picking. If you never "pick nits", you may never learn that if(str != "") is less efficient than if(str != string.Empty), how batch-insert and batch-delete SQL must be written to run fastest, that the difference between interfaces and abstract classes is not just a language matter, how permission management ought to be designed, or how a class should be designed so its responsibility stays single and easy to extend…
The previous two posts were really written to practice with the cnblogs editor control; seeing readers in the comments ask about batch insert and batch delete, I decided to write this up and share it.
We only discuss how to write ordinary SQL statements more efficiently; special approaches such as importing through an intermediate file are not considered, since calling SQL statements or stored procedures from code is more convenient anyway.
Batch delete is simple and most people have used it:
DELETE FROM TestTable WHERE ID IN (1, 3, 54, 68)   -- verified under SQL Server 2005
When the user selects several non-adjacent items in the UI and deletes them, this statement is far more efficient than calling delete in a loop, or than joining several DELETE statements with semicolons and issuing them in one call.
The focus of this post is how to write a batch insert.
SQL Server:
INSERT INTO TestTable
SELECT 1, 'abc' UNION
SELECT 2, 'bcd' UNION
SELECT 3, 'cde'
-- TestTable has no primary key; ID is not a primary key
Oracle:
INSERT INTO TestTable
SELECT 1, 'abc' FROM dual UNION
SELECT 2, 'bcd' FROM dual
-- TestTable has no primary key; ID is not a primary key
I have tested this: inserting 1000 rows this way is far more efficient than calling INSERT 1000 times in a loop or than simply concatenating 1000 INSERT statements into one call — roughly 20-odd times faster (measured in debug mode, so not very precise). It is nothing more than UNION (UNION ALL also works), but the test result was still a pleasant surprise at the time.
Two conditions must hold to get this result:
1. The table has no primary key, or the primary key is database-generated (an identity column in SQL Server, a sequence in Oracle).
2. The SQL must be composed by direct string concatenation; you cannot use a parameterized SQL statement (that is, @parm placeholders in the composed SQL with Parameters added to the Command object).
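A side note that is not from the original post: on SQL Server 2008 and later, the multi-row VALUES row constructor expresses the same batch insert directly, and when you do use the SELECT form, UNION ALL is preferable to UNION because it skips the duplicate-elimination step. The column list (ID, Name) is an assumption, since the post never names TestTable's columns.

-- SQL Server 2008+ row constructors: one statement, many rows, no UNION needed
INSERT INTO TestTable (ID, Name)
VALUES (1, 'abc'),
       (2, 'bcd'),
       (3, 'cde')

-- Pre-2008, prefer UNION ALL so the server does not spend time removing duplicates
INSERT INTO TestTable (ID, Name)
SELECT 1, 'abc' UNION ALL
SELECT 2, 'bcd' UNION ALL
SELECT 3, 'cde'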


Teaching Plan: "The FOR Loop Statement"

Teaching Plan: "The FOR Loop Statement" — Wuchang Vocational Education Center School (五常市职教中心学校), Ma Ruixue (马瑞雪)

Teaching Plan: "The FOR Loop Statement"
Wuchang Vocational Education Center School (五常市职教中心学校), Ma Ruixue (马瑞雪)

1. Teaching objectives
1. Cognitive goals:
2. Operational goals:
   a. learn to write the general format of the for statement;
   b. be able to read programs written with the for structure;
   c. gain a basic grasp of solving problems with the for statement.
3. Affective goals: improve students' thinking ability, stimulate their spirit of exploration, teach scientific ways of thinking, and build the habit of thinking hard and enjoying the pursuit of new knowledge.

2. Key points and difficulties
1. Key points: (1) the concept of a loop; (2) the format and use of the FOR statement.
2. Difficulty: the concept of a loop and its use.

3. Teaching and learning methods
1. Teaching methods: lecture, heuristic questioning, case analysis.
2. Learning method: discussion.

4. Teaching aids: multimedia courseware, networked multimedia classroom equipment, textbook.

5. Teaching procedure
(1) Quiz check — 8 minutes
Students are asked to write a simple checkout program (input unit price and quantity, display the amount due; input the amount actually paid, output the change).
[Teacher] walk the room and guide the students; check and evaluate their programs, pointing out problems.
[Students] write the program; some students demonstrate theirs.
(2) Review and lead-in — 2 minutes
Recall the checkout scene in a shop. Question: would a shop serve only one customer a day? Would you really re-run the program for every sale?
[Teacher] ask heuristic questions.
[Students] think, then answer.

(3) Presenting the new lesson — 30 minutes
1. The concept of a loop
A loop is the computer repeatedly executing a statement or statement body.
Going by this concept, can you think of situations where a loop would be used, or that count as a loop?
[Teacher] 1) explain the concept of a loop through examples; 2) move from the concept to the purpose of loops and prompt students to think of further examples; 3) help the students analyze whether the situations they suggest really are loops.
[Students] 1) listen; 2) think, then answer.
2. The C loop statement: the for statement (also called the for loop)
The for statement is the most flexible loop in C. It can be used not only when the number of iterations is known in advance but also when only the condition for ending the loop is given. Its format is:

for (expression1; expression2; expression3)
    loop-body statement

It executes as follows (the flowchart is shown in the courseware):
1> evaluate expression1 (give the loop variable its initial value);
2> evaluate expression2 (the loop condition); if its value is 0 the loop ends, if it is non-zero go to step 3;
3> execute the loop-body statement, which may be a single statement or a compound statement;
4> evaluate expression3 (update the loop variable);
5> go back to step 2.
Pay attention to what the three expressions mean: the initial value, the loop condition and the loop increment.
For example, to run a loop three times: for(i=0;i<3;i++) or for(i=3;i>0;i--); to run it ten times: for(i=0;i<10;i++) or for(i=10;i>0;i--).
[Teacher] 1) lecture; 2) demonstrate the flow and explain it; 3) lead the students to understand the meaning of the three expressions; 4) point out what must be understood, work through a simple example together with the students, then help them finish on their own.
[Students] 1) listen; 2) listen and watch the demonstration; 3) watch the demonstration, answer the teacher's questions, and grasp the meanings; 4) state their own understanding and write out the expressions.
3. Example analysis
1) main()
   { int n, i=100;

T-SQL Program Loop Structure

T-SQL program loop structure: WHILE
1. Characteristics: a WHILE loop repeatedly executes a T-SQL statement or statement block as long as a condition holds.
2. Syntax:
WHILE (condition)
BEGIN
    statement or statement block
END
3. Debugging: ALT+F5 starts the debugger; once it is running, F9 toggles a breakpoint, F10 steps over, F11 steps into.
4. Example: use a loop to compute the sum of 1 to 10.
DECLARE @sum int, @i int
SET @sum = 0
SET @i = 1
-- accumulate the sum with a loop
WHILE (@i <= 10)
BEGIN
    SET @sum = @i + @sum
    SET @i = @i + 1
END
PRINT '1-10之间的累加和为:' + CAST(@sum AS varchar(4))

GO
III. The CONTINUE, BREAK and RETURN keywords
1. Characteristics:
1) CONTINUE: makes the program skip the statements that follow CONTINUE and jump back to the first line of the WHILE loop (the condition test).
2) BREAK: makes the program leave the loop entirely, ending execution of the WHILE loop.
3) RETURN: exits unconditionally from a query or procedure. RETURN can be used at any point to exit from a procedure, batch or statement block; statements after RETURN are not executed. (A small sketch after example [2] below shows RETURN in use.)
2. Examples
[1] Using CONTINUE in a loop; when CONTINUE is reached, control returns to the WHILE condition test.
Compute the sum of the even numbers between 1 and 10:
DECLARE @sum int, @i int
SET @sum = 0        -- @sum must be initialized, otherwise it stays NULL and nothing is printed
SET @i = 1
WHILE (@i <= 10)
BEGIN
    IF (@i % 2 = 1)
    BEGIN
        SET @i = @i + 1
        CONTINUE
    END
    ELSE

    BEGIN
        SET @sum = @sum + @i
        SET @i = @i + 1
    END
END  -- end of the WHILE loop
PRINT '1-10之间的偶数和为:' + CONVERT(varchar(2), @sum)
GO
[2] Using BREAK in a loop: leave the loop when the number 5 is reached.
DECLARE @i int
SET @i = 1
WHILE (@i <= 10)
BEGIN
    IF (@i = 5)
    BEGIN
        BREAK   -- leave the loop structure; the condition is not tested again even if it is still true
    END
    ELSE
        SET @i = @i + 1
END
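The notes above describe RETURN but neither example demonstrates it, so here is a minimal sketch (not part of the original handout); the procedure and parameter names are made up for illustration.

CREATE PROCEDURE usp_CheckValue @val int
AS
BEGIN
    IF (@val < 0)
    BEGIN
        PRINT 'negative input - leaving the procedure'
        RETURN                   -- unconditional exit; nothing after this line runs
    END
    PRINT 'processing value ' + CAST(@val AS varchar(10))
END
GO
EXEC usp_CheckValue -1           -- prints only the first message
EXEC usp_CheckValue 7            -- prints the processing message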

for Statement Syntax

for statement syntax (JavaScript)

for (initialization expression; test expression; update expression)
{
    statements;
}

Notes on the for statement
The for statement is very flexible and can completely replace while and do...while. Execution proceeds as follows: the initialization expression is evaluated first, then the test expression decides whether the loop runs. While the test expression is true, the statements in the loop body run, then the update expression runs, and control returns to the start of the loop for the next round; when the test expression is false, the loop body is not executed and the for loop exits. (true and false are the JavaScript boolean values.)

Example

Compute the sum of all the integers from 1 to 100 (inclusive):

for (var i = 0, iSum = 0; i <= 100; i++)
{
    iSum += i;
}
document.write("1-100的所有数之和为" + iSum);

Points to note when using for
- Use braces {} to enclose multiple statements (it is best to use braces even for a single statement).
- The initialization expression may contain several expressions, and so may the update expression, for example:
for (var i = 0, iSum = 0, j = 0; i <= 100; i++, j--)
{
    iSum = i + j;
}
- The initialization, test and update expressions can all be omitted, for example:
for (;;)
{
}
The loop in this example will never stop.

Converting between for and while
for and while can be converted into each other:

for (var i = 0, iSum = 0; i <= 100; i++)
{
    iSum += i;
}

var i = 0;
var iSum = 0;
while (i <= 100)
{
    iSum += i;

    i++;
}

break and continue
As mentioned earlier, break can jump out of a switch...case statement and continue with whatever follows the switch. break can also jump out of a loop, that is, end the execution of the loop statement.
continue ends the current iteration and goes straight on to the test that decides whether the next iteration runs.
The essential difference between break and continue: break ends the loop completely, while continue only ends the current iteration.
break example
Find the position of the first 'd' in a string; break can be used:

var sUrl = "";
var iLength = sUrl.length;
var iPos = 0;
for (var i = 0; i < iLength; i++)
{
    if (sUrl.charAt(i) == 'd') { iPos = i; break; }   // body reconstructed from the description above; the source excerpt breaks off mid-statement
}
